Comparison of estimates and time series stability of Korea Community Health Survey and Korea National Health and Nutrition Examination Survey

OBJECTIVES In South Korea, there are two nationwide health surveys conducted by the Korea Centers for Disease Control and Prevention: the Korea Community Health Survey (KCHS) and Korea National Health and Nutrition Examination Survey (KNHANES). The two surveys are directly comparable, as they have the same target population with some common items, and because both surveys are used in various analyses, identifying the similarities and disparities between the two surveys would promote their appropriate use. Therefore, this study aimed to compare the estimates of six variables in KCHS and eight variables in KNHANES over a six-year period and compare time series stability of region-specific and sex- and age-specific subgroup estimates. METHODS Data from adults aged 19 years or older in the 2010-2015 KCHS and KNHANES were examined to analyze the differences of estimates and 95% confidence interval for self-rated health, current smoking rate, monthly drinking rate, hypertension diagnosis rate, diabetes diagnosis rate, obesity prevalence, hypertension prevalence, and diabetes prevalence. The variables were then clustered into subgroups by city as well as sex and age to assess the time series stability of the estimates based on mean square error. RESULTS With the exception of self-rated health, the estimates taken based on questionnaires, namely current smoking rate, monthly drinking rate, hypertension diagnosis rate, and diabetes diagnosis rate, only differed by less than 1.0%p for both KCHS and KNHANES. However, for KNHANES, estimates taken from physical examination data, namely obesity prevalence, hypertension prevalence, and diabetes prevalence, differed by 1.9-8.4%p, which was greater than the gap in the estimates taken from questionnaires. KCHS had a greater time series stability for subgroup estimates than KNHANES. 
CONCLUSIONS When using the data from KCHS and KNHANES, the data should be selected and used based on the purpose of analysis and policy and in consideration of the various differences between the two data.


Study data
Data from KCHS and KNHANES between 2010 and 2015 were used. Both surveys were conducted by the KCDC. KCHS data were collected via interviews with all adult members (aged ≥ 19 years) of the sample households. Nine hundred people per city, gun, and gu are surveyed, for a total of 220,000 people every year. Sampling is performed to ensure proportional sampling probability in consideration of household sizes based on the number of households by home type within tong, ban, and ri, and secondary sample households are selected via systematic sampling [4]. KNHANES is conducted on the members (aged ≥ 1 year) of sample households, with about 10,000 people surveyed every year. Similar to KCHS, sampling is performed with a complex sample design with city/province, dong/eup/myeon, and home type as the stratification variables. A total of 3,840 households in 192 districts are chosen every year, and the members are classified as children (aged 1-11 years), adolescents (aged 12-18 years), and adults (aged ≥ 19 years). Age-specific items are used for each group, and a health questionnaire, physical examination, and nutritional survey are administered to all participants [5].
In this study, participants were limited to adults aged 19 years or older, and the number of participants is shown in Table 1. Although the number of subjects surveyed each year differs by a factor of about 40 between KCHS and KNHANES, the weighted number of subjects is similar between the two surveys once weights are applied to represent the target population (the Korean population), given that a complex sample design was used for both surveys. The weighted number of subjects differed by about 2% between 2010 and 2013 and by 1% from 2014 to 2015, an average difference of 2% over six years.
When the weighted subjects are divided into subgroups by city and by sex and age, the weighted numbers differ by an average of 9% and 5%, respectively, taking the 2015 current smoking rate as the reference. The differences vary across years and variables, and arise from nonresponses. In the present study, we considered these differences a feature of the data and thus analyzed the data as is, without age standardization.
The two representative public health surveys conducted in South Korea (hereafter Korea) are the Korea Community Health Survey (KCHS) and the Korea National Health and Nutrition Examination Survey (KNHANES), both administered by the Korea Centers for Disease Control and Prevention (KCDC). KCHS presents health statistics in units of city, gun, and gu required for establishing community health care plans, thereby enabling interregional comparisons and serving as indices of community health projects [4]. KNHANES computes national statistics for people's health, health-related awareness and behavior, and food and nutrition intake and is used for goal-setting and assessment of the Health Plan. Further, it provides national statistical data requested by the World Health Organization (WHO) and the Organization for Economic Cooperation and Development, such as smoking, drinking, physical activity, and obesity data [5]. The target population for both surveys is Korean citizens, and they both cover the entire region of Korea. Furthermore, they share many survey items although the two surveys serve different purposes.
In other countries, the estimates of surveys with different purposes but overlapping items are continually compared to assess the validity of the surveys. The most accurate method to validate survey estimates is to examine the entire population, but this is practically impossible; therefore, these estimates can be compared with those of other surveys [6]. In the USA, studies have compared self-rated health estimates among four national health surveys, namely the Behavioral Risk Factor Surveillance System (BRFSS), Current Population Survey, National Health and Nutrition Examination Survey (NHANES), and National Health Interview Survey [7], binge drinking rate estimates between BRFSS and the National Survey on Drug Use and Health [8], and obesity estimates between BRFSS and NHANES [9].
However, there is a lack of studies comparing the estimates between different national surveys in Korea. This is because KCHS and KNHANES are virtually the only two national health surveys and, because of their different fundamental purposes, assessing the validity of one survey with reference to the other is controversial. Nevertheless, both surveys are actively utilized for policymaking and research, including diverse analyses that depart from their original purposes [10-13]. Identifying the similarities and disparities between the estimates of the two surveys would promote appropriate use of the data. Policymakers and researchers would be able to understand current health problems more accurately if they select the relevant data or appropriately utilize both data sets based on data features and the purpose of their analysis, as opposed to their convenience.
Therefore, this study aimed to compare estimates between KCHS and KNHANES, two surveys that serve as the foundation for computing important national statistics. To this end, we compared the estimates for six questionnaire variables in KCHS and eight questionnaire and physical examination variables in KNHANES over a six-year period and analyzed the time series stability of estimates of city-specific and sex- and age-specific subgroups.

Definition of variables
To minimize bias, we selected variables for which the questions correspond between the two surveys. From KCHS, the following six variables were analyzed: self-rated health, current smoking rate, monthly drinking rate, hypertension diagnosis rate, diabetes diagnosis rate, and obesity prevalence. From KNHANES, the following eight variables were analyzed: self-rated health, current smoking rate, monthly drinking rate, hypertension diagnosis rate, diabetes diagnosis rate, obesity prevalence, hypertension prevalence, and diabetes prevalence. Obesity prevalence, hypertension prevalence, and diabetes prevalence in KNHANES were analyzed using physical examination data, and the remaining variables were analyzed based on questionnaire data.
Self-rated health was defined as the percentage of participants who perceived their health to be very good or good, and current smoking rate was defined as the percentage of participants who reported having smoked at least five packs (100 cigarettes) in their lifetime and who currently smoke every day or occasionally. Monthly drinking rate referred to the percentage of participants who have had at least one drink in their lifetime and currently drink at least once a month. In KCHS, hypertension diagnosis rate and diabetes diagnosis rate referred to the percentage of participants who had been diagnosed with hypertension and diabetes, respectively, by a physician. In KNHANES, hypertension diagnosis rate and diabetes diagnosis rate referred to the percentage of participants who responded "yes" to the question asking whether they have been diagnosed with hypertension and diabetes, respectively, by a physician. KCHS calculated body mass index (BMI) using self-reported height and weight, while KNHANES calculated BMI using height and weight measured during physical examination. Obesity was defined as a BMI of 25 kg/m² or greater based on the WHO Asia-Pacific criteria for obesity [14], and obesity rates were compared between the two sets of data. Hypertension prevalence in KNHANES was defined as a systolic blood pressure (BP) of 140 mmHg or higher or diastolic BP of 90 mmHg or higher, based on three repeated BP measurements in a physical examination, or the use of hypertension drugs. The hypertension prevalence was compared with the hypertension diagnosis rate in KCHS. Diabetes prevalence in KNHANES was defined as a fasting (≥ 8 hours) blood glucose level of 126 mg/dL or higher measured in a physical examination, diagnosis by a physician, use of a hypoglycemic agent, or use of insulin injections, and diabetes prevalence was compared with the diabetes diagnosis rate in KCHS.
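The examination-based case definitions above can be written as simple classification rules. The following is a minimal sketch in Python, not the authors' SAS or R code; the function and parameter names are hypothetical, and the blood pressure inputs are assumed to be the already summarized values from the repeated measurements, since the text does not state how the three readings are combined.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def is_obese(weight_kg, height_cm, cutoff=25.0):
    """Obesity: BMI of 25 kg/m^2 or greater (WHO Asia-Pacific criterion)."""
    return bmi(weight_kg, height_cm) >= cutoff


def has_hypertension(systolic_mmhg, diastolic_mmhg, on_bp_medication):
    """Hypertension: systolic >= 140 mmHg, or diastolic >= 90 mmHg,
    or current use of hypertension drugs.
    Assumption: the BP values passed in are already summarized
    from the repeated measurements."""
    return systolic_mmhg >= 140 or diastolic_mmhg >= 90 or on_bp_medication


def has_diabetes(fasting_glucose_mgdl, diagnosed, on_hypoglycemic_agent, on_insulin):
    """Diabetes: fasting glucose >= 126 mg/dL, or physician diagnosis,
    or use of a hypoglycemic agent, or use of insulin injections."""
    return (
        fasting_glucose_mgdl >= 126
        or diagnosed
        or on_hypoglycemic_agent
        or on_insulin
    )
```

The same `is_obese` rule applies to both surveys; only the source of height and weight differs (self-reported in KCHS, measured in KNHANES).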
Because hypertension and diabetes are chronic diseases that can be controlled, as opposed to being cured, we deemed it appropriate to consider individuals to have the disease if they had ever been diagnosed with it in their lifetime. We also deemed it appropriate to compare this estimate with an objective assessment of the disease based on health examination data.

Statistical analysis
Because both surveys used a complex sample design, we accounted for weights, stratification, and clustering when computing the estimates. When comparing the estimates, the absolute difference, 95% confidence interval (CI) of the difference, and relative difference (ratio of absolute difference to mean KNHANES estimate) were analyzed for the annual estimates for each variable. The absolute difference is the absolute value of the difference between estimates, and the relative difference is the absolute difference expressed as a proportion of the mean estimate. Variables with low estimates tend to have low absolute differences, so examining the relative difference shows the difference between estimates more clearly, regardless of the size of the estimates.
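As an illustration of these three quantities, the sketch below computes them for a single variable and year. It is not the authors' code: the function name and inputs are hypothetical, the CI assumes the two surveys are independent samples and uses a normal approximation (the text does not state the formula used), and the relative difference uses the KNHANES estimate passed in as the denominator. The standard errors are assumed to come from a design-based analysis that already accounts for weights, strata, and clusters.

```python
import math


def compare_estimates(p_kchs, se_kchs, p_knhanes, se_knhanes, z=1.96):
    """Compare two survey estimates given in percent.

    Returns the absolute difference (%p), an approximate 95% CI of the
    signed difference (%p), and the relative difference (%).
    Assumptions: independent surveys, normal approximation, and
    design-based standard errors supplied by the caller.
    """
    diff = p_kchs - p_knhanes
    # Standard error of a difference between two independent estimates
    se_diff = math.sqrt(se_kchs ** 2 + se_knhanes ** 2)
    ci = (diff - z * se_diff, diff + z * se_diff)
    abs_diff = abs(diff)
    # Relative difference: absolute difference as a share of the KNHANES estimate
    rel_diff = abs_diff / p_knhanes * 100.0
    return {"abs_diff": abs_diff, "ci": ci, "rel_diff": rel_diff}


# Hypothetical inputs for illustration only (not values from the study)
result = compare_estimates(p_kchs=45.0, se_kchs=0.2, p_knhanes=35.0, se_knhanes=0.9)
```

A CI that excludes zero indicates that the two estimates differ beyond what sampling error alone would explain under these assumptions.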
Time series stability was compared among 16 city subgroups, excluding the city of Sejong, which was not separately surveyed until the sixth KNHANES (2013-2015), and among 14 sex and age subgroups (19-88 years divided into 10-year units for males and females). A simple linear regression line was fitted to the six years of estimates for each subgroup, and the variability of the estimates was assessed using the mean square error (MSE) of the estimates around the line. For example, if there is low variability in the estimates over six years, the MSE would be lower, and this would indicate high time series stability. All analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) and R version 3.4.4 (https://cran.r-project.org/bin/windows/base/old/3.4.4/).
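The stability measure can be sketched as follows. This is a minimal Python illustration rather than the authors' SAS or R code; the function name is hypothetical, and dividing the residual sum of squares by n - 2 (the usual regression convention) is an assumption, since the text does not state the denominator.

```python
def trend_mse(years, estimates):
    """MSE of yearly estimates around their simple linear regression line.

    Assumption: MSE = residual sum of squares / (n - 2).
    A lower value means the estimates stay closer to a straight-line
    trend, i.e. higher time series stability.
    """
    n = len(years)
    if n != len(estimates) or n < 3:
        raise ValueError("need at least three paired (year, estimate) points")
    x_bar = sum(years) / n
    y_bar = sum(estimates) / n
    # Ordinary least squares slope from centered sums
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, estimates))
    slope = sxy / sxx
    # Residuals of each estimate from the fitted line
    residuals = [
        (y - y_bar) - slope * (x - x_bar) for x, y in zip(years, estimates)
    ]
    sse = sum(r ** 2 for r in residuals)
    return sse / (n - 2)


# Hypothetical subgroup series for illustration only (not values from the study)
years = [2010, 2011, 2012, 2013, 2014, 2015]
steady = trend_mse(years, [24.0, 23.0, 22.0, 21.0, 20.0, 19.0])
volatile = trend_mse(years, [10.0, 12.0, 10.0, 12.0, 10.0, 12.0])
```

Under this definition a subgroup whose estimates fall steadily along a line scores lower (more stable) than one whose estimates swing from year to year, even if both have the same overall trend.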

Ethical statement
This study was exempted from review by the Institutional Review Board (IRB) at Seoul National University (IRB No. E1711/003002).

Comparison of estimates by variable between KCHS and KNHANES
Figure 1 and Table 2 show the comparison of estimates by variable between the two surveys over six years from 2010 to 2015.
The mean absolute difference of self-rated health over six years was 10.8%p, with 10.5%p in 2010, 9.2%p in 2011, 11.4%p in 2012, 9.6%p in 2013, 9.9%p in 2014, and 14.0%p in 2015. The mean relative difference was 33.0%p. The differences were greater than those for other variables.
The mean absolute difference of current smoking rate over six years was 1.
Figure 2 shows the graph of time series trends by city as well as by sex and age for current smoking rate, hypertension diagnosis rate, and prevalence of obesity from 2010 to 2015. When divided by city, changes in current smoking rate estimates in KCHS were smaller than those in KNHANES in nearly all regions. However, when divided by sex and age, changes in estimates were similar between the two surveys. The trends for hypertension diagnosis rate were nearly identical to those for current smoking rate in both the city graphs and the sex and age graphs. Regarding prevalence of obesity, KCHS computes the prevalence using self-reported height and weight, while KNHANES computes the prevalence using actually measured height and weight. For this reason, the difference of estimates is larger than that for current smoking rate or hypertension diagnosis rate. Similar to other variables, time series trends varied greatly for KNHANES in the city graph, but the variability was smaller in the sex and age graph.

DISCUSSION
This study compared the estimates for six variables measured based on a questionnaire in KCHS and eight variables measured based on a questionnaire or physical examination in KNHANES from 2010 to 2015 and divided them by city as well as by sex and age to compare time series stability. With the exception of self-rated health, all estimates measured based on questionnaires, namely current smoking rate, monthly drinking rate, hypertension diagnosis rate, and diabetes diagnosis rate, showed an absolute difference of less than 1.0%p and a relative difference ranging from 1.9-9.0%p. For prevalence of obesity, hypertension diagnosis rate and prevalence of hypertension, and diabetes diagnosis rate and prevalence of diabetes, using questionnaire data in KCHS and physical examination data in KNHANES, the absolute difference ranged from 1.9-8.4%p and the relative difference ranged from 21.6-32.0%p, showing greater differences in the estimates compared to those measured based on questionnaire data. Time series stability by subgroup was higher for KCHS than for KNHANES, and in KNHANES, time series stability for sex and age subgroups was greater than that for regional subgroups.
In both surveys, the difference of estimates for variables measured using questionnaire data was small, but there was a large difference of the estimates in self-rated health. Although both KCHS and KNHANES showed a declining trend in self-rated health, the difference between the two surveys was relatively large despite the fact that both surveys used the same question, "how would you rate your health?", and both surveys collected this data using a questionnaire. Differences also varied across variables in the study conducted by Fahimi et al. [3] comparing BRFSS and other national surveys, where there were relatively small differences of current smoking rate and influenza vaccination rate in the past year in various subgroups but larger differences in prevalence of asthma and self-rated health. Particularly, the absolute difference of the percentage of "fair" or "poor" responses regarding self-rated health between surveys was greater (4.2%; relative difference 33.9%) than that for other variables [3]. Although the absolute difference of self-rated health between surveys in the present study (10.8%) was greater than that found by Fahimi et al. [3], the relative difference was similar at about 33.0%. Salomon et al. [7] compared self-rated health data among various national surveys from 1971 to 2007. In that study, the estimates of self-rated health were compared among four surveys by dividing the participants by sex, age, race, and education. In general, the estimates differed greatly and showed inconsistent trends across surveys. In contrast, variables such as diabetes and BMI differed less and showed consistent time series trends across surveys. Based on these results, Salomon et al. [7] suggested that it is difficult to provide a simple explanation of the differences in the estimates of self-rated health across surveys and that self-rated health is not an appropriate variable for monitoring the health of different groups over time.
In a study comparing the estimates among three national surveys by Li et al. [15], the absolute difference of current smoking rate, prevalence of obesity, prevalence of hypertension, and no health insurance rate ranged from 0.7-3.9%p. Particularly, the absolute difference of self-rated health ranged from 0.4-3.1%p, which suggests similar estimates across surveys, but the trends were inconsistent. As shown here, multiple studies report that self-rated health estimates differ greatly across surveys. The difference of prevalence between data measured using questionnaires and physical examinations was greater than that between data measured using questionnaires in other studies as well. In a study that compared the prevalence of obesity computed from self-reported height and weight in KCHS with that computed from actually measured height and weight in KNHANES in 2010, the prevalence of obesity differed by 8.6%p and the prevalence of overweight differed by 7.8%p. This was because overestimation of height increased with age, while weight was underestimated in males in their 20s and 30s and females in their 20s to 40s in KCHS [16]. In a study investigating the effects of prevalence of obesity on diabetes according to method of survey in adults aged 45 years or older, the difference between obesity prevalence by questionnaire in KCHS and that by physical examination in KNHANES was 6.4%p and the difference of diabetes prevalence was 2.6%p [17]. Another study also reported that self-reported data and actually measured data differ, particularly according to sex [18].
The time series trend by subgroup was more stable in KCHS than in KNHANES. This may be attributable to the fact that KCHS has about 40 times more subjects than KNHANES. Variability of estimates in the regional subgroups was high in KNHANES, particularly in regions with a small number of subjects (≤ 200). There was a tendency toward higher variability of estimates in the 79-88 years group for males and females compared to other age groups in KNHANES, but this age group had fewer subjects (≤ 150) than other age groups. This suggests that time series stability of regional estimates is not ensured in KNHANES, so extra caution should be taken when using regional analyses as evidence for policies or research.
Nevertheless, this study has a few limitations. We were able to compare the estimates of only a few items, so the findings cannot represent the overall differences of estimates between the two surveys. Furthermore, although both KCHS and KNHANES use complex samples, they differ in the number of subjects and sampling method. KCHS surveys about 220,000 people every year with 900 individuals in each of the 251 public health centers in 16 cities nationwide, which enables even distribution of samples throughout all regions in Korea [19]. In contrast, KNHANES has a relatively smaller sample of 10,000 people in 3,840 households in 192 districts every year using sex, age, living space, and education of head of household as implicit stratification standards, and it aims to compute yearly national statistics, which may cause uneven distribution of subjects across regions. Such differences in features limit direct comparison of the two surveys. Furthermore, KNHANES is conducted throughout the year, whereas KCHS is conducted over three months from August to October, so there is a chance that the estimates may differ due to the difference in survey periods. In addition, KCHS collects data through in-person interviews, while KNHANES collects data either through interviews or self-reported questionnaires depending on the survey item. Quality management of interviewers also differs between the two surveys. For KCHS, interviewers are selected for each region in June, and they undergo short-term training [20]. For KNHANES, on the other hand, eight specialists comprise a team for the interview survey and physical examination survey, two of whom take charge of the health interview, and a total of four professional survey teams travel around the country for the survey [21].
As shown here, the two surveys differ in several aspects, and estimates can differ not only due to the differences in the number of subjects, sampling method, and data collection method but also due to subtle differences, such as those in the nature of health parameters, phrasing of questions, and order of questions [22,23]. Therefore, such minor differences should be meticulously reviewed when comparing different surveys [15]. When using KCHS and KNHANES data, the differences between the two surveys should be noted, and either one data set should be selected or both should be used to complement each other, depending on the purpose of analysis or policy.
Despite these limitations, this study clearly has strengths as well. Compared to similar Korean studies [16,17], we performed a more comprehensive comparison using six and eight variables in KCHS and KNHANES, respectively, over six years, and also analyzed the differences of estimates according to method of survey by comparing estimates taken from interviews and estimates taken from physical examination data. Furthermore, we analyzed the time series stability of subgroup estimates and demonstrated that subgroup estimates might differ between the two surveys even when the overall yearly estimates differ little.
This study identified the similarities and disparities between two major health surveys in Korea that are utilized in policies and research, and our findings should help prevent errors that may occur when only one set of data is used as the basis of policies or research. Recent inter-survey comparison studies have expanded the scope of comparison to surveys across countries [24]. In the future, studies should compare Korean surveys with foreign surveys to lay a foundation for comparing and sharing international policies.