Prefectural difference in spontaneous intracerebral hemorrhage incidence in Japan analyzed with publically accessible diagnosis procedure combination data: possibilities and limitations

OBJECTIVES: Annually reported, publicly accessible Diagnosis Procedure Combination (DPC) data from the Japanese government are part of the total DPC database of the Japanese medical reimbursement system for hospitalization. Although medical issues can be evaluated promptly with these data, their applicability to epidemiological analyses has not been assessed. METHODS: We performed analyses using only statistical indices reported on a government website. As a preliminary step, the prefectural consistency of spontaneous intracerebral hemorrhage (sICH) was examined with prefectural mortality over 20 years. Then the prefectural incidence of sICH for four years was calculated, utilizing publicly accessible DPC data. To determine its reliability, the consistency was examined, and correlations were analyzed with three prefectural factors expected to have an effect: the elderly rate, mortality due to sICH, and the non-DPC bed rate. In addition, a comparison model between prefectures with this method was developed by analyzing other prefecture-specific factors. RESULTS: Prefectural mortality due to sICH and prefectural sICH incidence in the DPC database were both consistent over the years. Prefectural sICH incidence had a constant positive correlation with the elderly rate, a partial correlation with mortality due to sICH, and no correlation with the non-DPC bed rate, which is one of the major biases when utilizing the DPC database. In the comparison model, low income and high alcohol consumption were associated with increased sICH incidence. CONCLUSIONS: Although careful attention to its limitations is required, publicly accessible DPC data will provide insights into epidemiological issues.


INTRODUCTION
Although many reports have estimated spontaneous intracerebral hemorrhage (sICH) incidence in certain regions [1,2], no nationwide survey capturing all sICH patients has been reported. To obtain the exact sICH incidence, all newly diagnosed sICH cases in a population with available baseline information must be captured, which is difficult even in a small community, and any reported sICH incidence should be regarded as involving some error. Although higher age [1] and male sex [1,3] increase sICH incidence, large regional differences are likely to exist even after adjusting for these factors [2]. Hence, estimated sICH incidences in certain regions should not be interpreted as having universal meaning. These estimations would have significance if they were comparable among different regions.
The recent expansion of electronic medical records has enabled large amounts of information to be prepared for medical analysis [4]. In Japan, a novel medical reimbursement system for hospitalization, named the Diagnosis Procedure Combination (DPC) system, was launched in 2003, based on electronic medical records from physicians, submitted by hospitals participating in this system to the Japan Ministry of Health, Labor, and Welfare (JMHLW) [5]. These data are analyzed by the government and the summarized data are officially reported once a year. The number of hospitals submitting their hospitalized patient information to the DPC system (DPC hospitals) has gradually increased over 10 years, and as of March 2015, among 894,216 general beds for admission in Japan, 582,367 beds have been covered by the DPC system, which is 65.1% of the total. The DPC system is believed to cover approximately 90% of the total number of acute inpatient hospitalizations [6,7], as most active hospitals are now DPC hospitals. Since the information from DPC hospitals directly relates to the reimbursement of hospitalization expenses, the data are carefully checked at each hospital, and the submitted information is investigated by the government. The platform format of submissions is uniform throughout the nation, and the data to be submitted includes: age, sex, primary diagnosis, consciousness status and comorbidities on admission, all interventions performed during admission, and activity status at discharge [8]. We believe the quality of the DPC database is quite high in spite of its size.
However, in utilizing publicly accessible DPC data, careful attention should be paid to the limitations of the data [5]. As illustrated in Figure 1, the actual sICH incidence in Japan is unknown because of uncaptured sICH patients at hospitals that do not submit DPC data (non-DPC hospitals). Although the number of sICHs at DPC hospitals is known by JMHLW, patients who died within 24 hours are not reported to the public; thus the number of sICHs that can be gathered from publicly accessible DPC data is smaller than the number contained in the whole DPC database. In publicly accessible DPC data, counts below 10 in the fields of each hospital segmented by DPC code are masked, although the total patient numbers of DPC hospitals are reported. When prefectural patient numbers are obtained by summing the patient numbers of the hospitals in the same prefecture, this masking results in a further reduction in the number of patients compared with the total patient number in the DPC database [2]. Considering these limitations, whether publicly accessible DPC data are worth analyzing has yet to be determined, although their enormity, their accessibility, and their expected further development are attractive.
Figure 1. Conceptual diagram indicating the relation between sICH incidence and mortality due to sICH. sICH, spontaneous intracerebral hemorrhage; DPC, Diagnosis Procedure Combination. 1) Sampling errors in obtaining patients in the DPC database include misdiagnosis and inappropriate coding or double counts of the same patients. 2) Unreported sICH patients from publicly accessible DPC data include patients who died or were discharged within 24 hours. 3) Masked patients are unreported patients treated at hospitals with fewer than 10 patients in a year. Specifically, for sICH patients, patients with ICH due to arteriovenous malformation rupture are included due to the lack of corresponding codes.
The purpose of this paper was to examine the adequacy of utilizing publicly accessible DPC data in epidemiological fields.

MATERIALS AND METHODS
This study required no institutional review board approval or informed consent, since only publicly accessible data obtained from government homepages through the Internet were used.

Prefectural consistency in spontaneous intracerebral hemorrhage mortality
Prior to examining the adequacy of prefectural sICH incidence in the DPC database, whether sICH occurs with a constant tendency in each prefecture should be examined, since randomly fluctuating medical events are inappropriate for epidemiological analysis. Although no comparable measurement of prefectural sICH incidence has been available so far, the number of deaths due to sICH was utilized as a related index for this purpose. The Japanese government has successively reported the total annual number of deaths categorized by cause. This number is obtained from all of the death certificates produced by physicians, which the government then regroups by cause of death according to the International Classification of Diseases, 10th revision. Although mortality and incidence are different indices, it can be assumed that they are correlated, and as a preliminary analysis, the prefectural consistency concerning sICH was confirmed utilizing this officially reported mortality. Every five years, the government reported the age-adjusted rate of death by cause for each prefecture by sex (Appendix 1A). If death due to sICH is affected by surrounding conditions specific to each prefecture, the rank of this rate should be highly constant and the tendencies should be close between sexes. We statistically analyzed the prefectural consistency of sICH mortality between 1990 and 2010, and the correlation between the tendencies in both sexes was evaluated for each year.

Obtaining spontaneous intracerebral hemorrhage incidence in the Diagnosis Procedure Combination database
From publicly accessible DPC data, the number of patients hospitalized due to sICH in DPC hospitals can be obtained. The corresponding codes indicating sICH are those beginning with 010040, and in this article, the number of patients with those codes divided by the population of the corresponding year was defined as the sICH incidence in the DPC database. However, this number was contaminated by two inevitable biases: one is that patients who died within the first day were excluded from the publicly accessible DPC data, and the other is that intracerebral hemorrhages (ICH) due to arteriovenous malformation rupture were counted as sICH, although ICH due to aneurysm rupture, tumor, cerebral infarction, or trauma, which have different corresponding codes, were ruled out. As for the prefectural sICH incidence, hospitals were grouped according to the prefecture in which they were located, and the sICH patient numbers were then summed for each prefecture. This number divided by the prefectural population is defined as the prefectural sICH incidence; however, this number is further contaminated by the "masked rate" [5]. This occurs because patient numbers for hospitals that treated fewer than 10 patients in a year were not reported; thus the patient numbers used for the prefectural sICH incidence were smaller than those captured by the DPC database. This "masked rate," the percentage of unreported numbers in the Excel files giving the patient numbers of each hospital, was obtained as 100 × summation / total number. All calculated sICH incidences were expressed as the number per 100,000 people. The files used for the calculation are shown in Appendix 1B and 1C.
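The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the record layout, the function names, and all numbers are hypothetical, and the masked rate is computed under one reading of the text, namely the percentage of the reported national total that cannot be recovered by summing the unmasked hospital-level counts.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hospital-level rows: (prefecture, sICH patient count).
# None stands for a count that is masked because it is below 10.
HospitalRow = Tuple[str, Optional[int]]


def prefectural_incidence(rows: List[HospitalRow],
                          population: Dict[str, int]) -> Dict[str, float]:
    """Sum unmasked hospital counts per prefecture, expressed per 100,000 people."""
    totals: Dict[str, int] = {}
    for prefecture, count in rows:
        if count is None:  # masked cell: contributes nothing to the sum
            continue
        totals[prefecture] = totals.get(prefecture, 0) + count
    return {p: 100_000 * n / population[p] for p, n in totals.items()}


def masked_rate(rows: List[HospitalRow], national_total: int) -> float:
    """Percentage of the reported national total missing from the hospital rows.

    Assumption: 'masked rate' = 100 * (total - sum of unmasked counts) / total.
    """
    summed = sum(count for _, count in rows if count is not None)
    return 100 * (national_total - summed) / national_total


# Made-up example data (not taken from the DPC files).
rows = [("A", 120), ("A", None), ("B", 45), ("B", 30)]
population = {"A": 1_000_000, "B": 500_000}

incidence = prefectural_incidence(rows, population)
masked = masked_rate(rows, national_total=200)
print(incidence, masked)
```

Because masked cells are simply skipped, the prefectural incidence obtained this way is a lower bound on the incidence captured by the whole DPC database, which is the bias the text describes.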

Evaluation of prefectural spontaneous intracerebral hemorrhage incidence in the Diagnosis Procedure Combination database
In order to evaluate the adequacy of the prefectural sICH incidence in the DPC database, obtained with the abovementioned method, as an epidemiologically applicable index, analyses were performed concerning two aspects: its consistency over a number of years and its correlation with other factors that might have an effect. The consistency of the prefectural sICH incidence in the DPC database between 2011 and 2014 was analyzed in the same manner as in the consistency analysis for prefectural mortality. Then, the correlations with three potentially influential factors were assessed for each year. The first factor is the prefectural rate of the elderly population aged 75 and over, which was obtained for each year from the files shown in Appendix 1C. A large elderly population will increase the sICH incidence, and without a correlation with this factor, the adequacy of the obtained prefectural sICH incidence would be doubtful. As a second factor, the correlation with the prefectural crude mortality due to sICH was evaluated, since mortality due to sICH is a major outcome. The crude prefectural sICH mortality was calculated by dividing the total reported deaths due to sICH in each prefecture (from the files shown in Appendix 1D) by the total prefectural population in each corresponding year (Appendix 1C). The third factor is a major potential bias: the rate of non-DPC beds in each prefecture. As mentioned above, the prefectural sICH incidence may be contaminated by several biases, and the most significant bias would be the unknown number of sICH patients treated at non-DPC hospitals. If prefectures with a larger number of non-DPC general beds tend to have smaller sICH incidences, the effect of this bias may contaminate the analysis, and the prefectural sICH incidence obtained with this method should then be considered unreliable.
The numbers of DPC beds in each hospital were obtained from the files shown in Appendix 1E, and the total number of general hospital beds in each prefecture from the files shown in Appendix 1F, for the calculation of the prefectural rate of non-DPC general beds.
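As a minimal sketch of the bed-rate calculation, assuming the rate of non-DPC general beds is defined as the share of general beds not covered by the DPC system, the following Python function (a hypothetical name) computes it as a percentage. Since the prefectural files are not reproduced here, the example applies it to the national figures quoted in the Introduction (582,367 DPC beds among 894,216 general beds as of March 2015) purely as a sanity check.

```python
from typing import Iterable


def non_dpc_bed_rate(dpc_beds_by_hospital: Iterable[int],
                     total_general_beds: int) -> float:
    """Percentage of general beds not covered by the DPC system.

    Assumption: rate = 100 * (total general beds - DPC beds) / total general beds.
    """
    if total_general_beds <= 0:
        raise ValueError("total_general_beds must be positive")
    dpc_beds = sum(dpc_beds_by_hospital)
    return 100 * (total_general_beds - dpc_beds) / total_general_beds


# National figures from the Introduction, passed as a single aggregate entry.
national_rate = non_dpc_bed_rate([582_367], 894_216)
print(round(national_rate, 1))
```

For a prefectural analysis, the first argument would hold the DPC bed counts of the hospitals located in that prefecture and the second the prefecture's total number of general beds.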

Comparison model between prefectural sICH incidences in the DPC database
Although the purpose of this article is to evaluate the adequacy of prefectural sICH incidence in the DPC database as an epidemiological index, it is also important to understand how this index can be utilized in further analysis. In order to demonstrate a comparison model as an example, the latest prefectural sICH incidence in the DPC database, that of 2014, was analyzed with several factors specific to each prefecture. The factors used for analysis were 1) population per 1 km² of inhabitable area, 2) average yearly temperature, 3) yearly hours of sunshine, 4) yearly precipitation, 5) prefectural income per person, and 6) prefectural alcohol consumption per adult person. These factors were obtained from the government databases indicated in Appendix 1G and 1H.

Statistical analysis
The analysis for the time-course consistency of prefectures was performed by calculating Kendall's coefficient of concordance [9,10]. For the correlation, the nonparametric Spearman's correlation coefficient was used for the univariate analysis, and multiple forward stepwise regression analysis was employed for the multivariate analysis. The results were considered statistically significant at p< 0.05, and all p-values were two-sided.
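The two rank-based statistics named above can be illustrated with a short, dependency-free Python sketch. It is not the software used in the study: the function names and example values are hypothetical, Kendall's coefficient of concordance is computed without a tie correction, and p-values and the stepwise regression are left out.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(rows: Sequence[Sequence[float]]) -> float:
    """Kendall's W for m rows (e.g. years) by n columns (e.g. prefectures).

    Uses W = 12 * S / (m^2 * (n^3 - n)), with no correction for ties.
    """
    m, n = len(rows), len(rows[0])
    rank_sums = [0.0] * n
    for row in rows:
        for i, r in enumerate(average_ranks(row)):
            rank_sums[i] += r
    mean = sum(rank_sums) / n
    s = sum((r - mean) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Made-up values: three years of incidence for four prefectures, same ordering.
yearly = [[10.0, 12.0, 15.0, 20.0],
          [11.0, 13.0, 16.0, 22.0],
          [9.5, 12.5, 14.0, 19.0]]
w = kendalls_w(yearly)
rho = spearman_rho([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 25.0])
print(w, rho)
```

A value of W near 1 means the prefectures keep nearly the same rank order from year to year, which is the sense of "consistency" used in this article; W near 0 means the yearly rankings are unrelated.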

RESULTS
Factors affecting prefectural spontaneous intracerebral hemorrhage incidence in the DPC database
Table 2 shows the Spearman's correlation coefficients of the three examined factors in each year, and Figure 2 shows the relations between sICH incidence and these factors in 2014. Among the examined factors, only the rate of the elderly aged 75 and over showed a significant positive correlation in all four years. The crude mortality due to sICH showed a positive correlation and the rate of non-DPC beds showed a negative correlation, but neither was significant. In the multiple regression analysis, no factor significantly affected the prefectural sICH incidence in the DPC database in 2011 or 2012; however, in 2013, the crude mortality due to sICH had a significant correlation (β= 0.37, p= 0.01), and in 2014, the rate of elderly persons had a significant correlation (β= 0.30, p < 0.05). All data for the three examined factors are shown in Appendices 4-6.

Comparison model between prefectural spontaneous intracerebral hemorrhage incidences in the Diagnosis Procedure Combination database
Among the six factors examined, only the prefectural income per person correlated negatively with prefectural sICH incidence according to the nonparametric Spearman's correlation coefficient (Rs= -0.32, p< 0.05). In the multivariate analysis, the prefectural alcohol consumption per adult person independently affected prefectural sICH incidence (β= 0.32, p< 0.05). The other factors did not show a significant relation in either analysis. Figure 3 shows the relations between sICH incidence and these significant factors. All data are shown in Appendix 7.

DISCUSSION
The reliability of a large medical database has been difficult to determine. Ordinarily, comparison with a "gold standard" database is desirable for novel database evaluation. However, there is no "gold standard" database regarding sICH incidence [2]. Among medical indices relating to sICH, the standard one is mortality due to sICH, which can be extracted from the vital statistics survey reported annually by the JMHLW, and these data have been used for international comparison by the World Health Organization [11]. In our preliminary analysis, mortality due to sICH in Japan revealed a constant tendency among prefectures. However, our study also revealed only partial positive tendencies between the prefectural mortality and prefectural sICH incidence in the DPC database. This does not necessarily mean that the sICH incidence in the DPC database is unreliable, since the mortality is a different parameter from the incidence, as shown in Figure 1. The mortality due to sICH was reported to be lower in Japan than in any other region [1], and this might result from an intensive medical intervention attitude even for comatose patients with sICH [12]. Further, we should be skeptical about the accuracy of the reported mortality number, since this also definitely contains errors due to misdiagnosis [13,14].
Several population-based studies of sICH incidence in Japan have been reported, all of which were from regional communities [3,15-17]. The Japanese government holds a National Database (NDB) composed of all health insurance claims (HIC) for payment, which covers the entire nation, including non-DPC hospitals [18]. Although the NDB has been refined with more computerized HIC submissions [19], it is less adequate for analysis than the DPC database, owing to the lack of detailed patient profiles or to uncoded diagnoses [18,19]. We showed constant positive correlations between prefectural sICH incidence in the DPC database and the elderly population rate of each prefecture, indicating a rational tendency: a more aged population will have more sICHs [1]. No significant correlation was obtained with the rate of non-DPC beds in each prefecture, which is a considerable bias. These positive and negative results, and some correlations with mortality due to sICH, may suggest some potential for the sICH incidence obtained by this method. When considering the reliability of the database, its extensive coverage of all concerned patients is a necessary factor to take into account. Due to the severity of the symptoms, it is very rare for patients with sICH to be treated without hospitalization [20]. Thus, analysis of hospitalized patient data is effective for sICH research [20], and has been reported to differ little from analysis with a population-based study [21]. As a hospital patient database, the DPC database has been expanding and enhancing its coverage. In addition, the credibility of the gathered data is important for the reliability of the database. We believe the quality of the recorded data is high due to its relation to reimbursement under government supervision.
[Figure panel: prefectural sICH incidence in the DPC database versus crude mortality due to sICH; Rs= 0.22, p= 0.14 (Spearman's correlation coefficient).]
Although the DPC database has attractive features, we should be careful regarding its limitations when utilizing it. The major limitation is the unknown number of patients treated at non-DPC hospitals, although this number is decreasing as medical reimbursement for hospitalization using the DPC system expands. However, when utilizing publicly accessible DPC data, as in this article, other specific limitations occur. The limitations inherent to the publicly accessible DPC data are significant, and this may account for the paucity of articles utilizing these data [5,22]. Analyses with the whole body of the DPC database are able to utilize more detailed information for each patient. In order to avoid the limitations inherent to publicly accessible DPC data, it is preferable that the whole body of the DPC database be made accessible to any qualified researcher with a medically meaningful research proposal. Since determining the causative factors influencing sICH incidence is not the purpose of this article, we performed only correlation analysis with a limited number of factors, in order to introduce a comparison model between prefectures using the latest publicly accessible DPC data. Among the factors examined in this model, the prefectural income per person correlated negatively with the prefectural sICH incidence, and prefectural alcohol consumption was a factor with a positive effect. The risk of alcohol consumption has been proven in Japan [17], and considering the correlation of sICH incidence with low income, some social behaviors relating to drinking habits may influence sICH onset [23]. The meteorological factors did not show a significant relation with sICH incidence, although several reports indicate meteorological [15] or seasonal effects [24,25] on sICH onset. Further analyses with reliable data representing the characteristics of each prefecture may identify causative factors with this comparison model.
Although careful attention to its limitations is required, further evaluation of publicly accessible DPC data in Japan as a tool for epidemiological analyses is warranted.

CONFLICT OF INTEREST
The authors have no conflicts of interest to declare for this study.

C. The population in Japan in each prefecture and the numbers of people aged 75 and over were obtained from the Excel file on e-Stat, indicated as
H. The total amount of alcohol consumption was obtained from the Excel file on the homepage of the National Tax Agency at https://www.nta.go.jp/kohyo/tokei/kokuzeicho/jikeiretsu/xls/13.xls (2013 as the latest data). Each prefectural amount of alcohol consumption was divided by the number of people aged 20 and over in each prefecture, obtained from the Excel file, indicated as