### INTRODUCTION

The Korean Genome and Epidemiology Study (KoGES), a large population-based genomic cohort study supported by the Korea Centers for Disease Control and Prevention (KCDC), investigates risk factors for major diseases among Koreans, focusing in particular on gene-environment and gene-gene interactions [1,2]. According to the standardized KoGES protocol, 12 biochemical analyses (fasting blood sugar [FBS]; gamma-glutamyl transpeptidase [γ-GTP]; aspartate aminotransferase [AST]; alanine aminotransferase [ALT]; total cholesterol; triglyceride [TG]; high-density lipoprotein [HDL]-cholesterol; low-density lipoprotein [LDL]-cholesterol; albumin; blood urea nitrogen [BUN]; uric acid [UA]; and creatinine), complete blood cell counts including eight hematologic indices (white blood cells [WBC]; red blood cells [RBC]; hemoglobin [Hb]; hematocrit [HCT]; mean corpuscular volume [MCV]; mean corpuscular hemoglobin [MCH]; mean corpuscular hemoglobin concentration [MCHC]; and platelets), and analysis of high-sensitivity C-reactive protein (HS-CRP) were conducted at two clinical laboratories and authenticated via external quality assessment (QA) by the Clinical and Laboratory Standards Institute (CLSI) and the Korean Society for Laboratory Medicine (KSLM).

Clinical results from multiple laboratories may differ systematically [3-5]. Despite standardization of the blood test in the KoGES, potential discrepancies that may lead to significantly biased results in the pooled analyses remain. Moreover, the fundamental differences between the laboratories involving instruments (Hitachi Co., Tokyo, Japan vs. Bayer HealthCare Ltd., Tarrytown, NY, USA), methods (enzymatic vs. colorimetry for γ-GTP) and reference values (adapted to each instrument) may result in heterogeneity.

To minimize the potential analytic drift, appropriate standardization such as calibration or statistical adjustment is warranted [6]. Thus, we investigated whether 1) the KoGES laboratories measure consistent test values regardless of time and environmental conditions, 2) the laboratories are able to produce the same test values for future data pooling, and 3) future pooled analysis for serological parameters is possible using a statistical compensation model even though absolute test values differ between laboratories. In the present study, the intra- and inter-laboratory reliability of the two clinical laboratories was assessed using quadruplicated standardized serological samples and external quality control (QC) data, and the possibility that serological parameters produced by different laboratories can be integrated through statistical compensation models was evaluated.

### METHODS

To check the overall reliability of each laboratory, three years of QC data (2005, 2006, and 2007) for ten blood chemical tests (albumin, ALT, AST, γ-GTP, BUN, creatinine, UA, FBS, cholesterol, and TG) were obtained from the KSLM. To evaluate intra- and inter-laboratory reliability, quadruplicated standardized serological samples were prepared according to the Clinical and Laboratory Standards Institute (CLSI) guidelines [7]. Basic principles including 'at least 40 patient samples', 'at least 5 operating days', and 'analysis of each sample in duplicate within the same run' were adopted. The principle of 'at least 50% of samples outside the reference range' was revised based on the true range of each test result in the KoGES and the general distribution in the Korean population. Finally, 40 samples were divided into five groups with two outlier groups A and E (Table 1).

The standardized serological samples were manufactured at the Department of Laboratory Medicine at the Seoul National University College of Medicine. Eight anonymous patient samples were prepared for five consecutive operating days. During the repeated trial, a total of 3,200 test serological samples were manufactured (eight patients×five operating days×duplication×repeated trial×ten test items×two laboratories). In detail, sample preparation was carried out as follows: 1) a nurse at the Department of Laboratory Medicine selected all serum samples with a volume of more than 3 mL before the day of the test, 2) a well-trained clinical pathologist selected eight serum samples on the morning of the test, 3) four 250 µL aliquots of each serum (quadruplicated samples) were divided into plain tubes, and 4) a set of duplicated samples was delivered to each center. To minimize other sources of variation, sample delivery, storage conditions, and test time were consistent with a standardized protocol. On the day of the test, prepared samples were delivered to each laboratory between noon and 1 pm, and tests were conducted simultaneously at 3 pm. Serum selection and tests were not conducted on a Saturday, Sunday, or Monday. The first trial was conducted on November 29-30 and December 4-7, 2007. On the 6th day of the first trial, additional tests for FBS, BUN, UA, and albumin with unacceptable values outside the reference range (from Groups A or E) were conducted. The second trial was conducted over five days on December 11, 12, 13, 14, and 20, 2007.
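The sample count stated above can be checked with a few lines of arithmetic. The sketch below simply multiplies the design factors given in the text; the dictionary keys are illustrative labels of ours, not names from the KoGES protocol.

```python
from math import prod

# Design factors as stated in the text (key names are illustrative only).
design = {
    "patients_per_day": 8,
    "operating_days": 5,
    "duplicates": 2,
    "trials": 2,
    "test_items": 10,
    "laboratories": 2,
}

# Total number of test samples across the whole experiment.
total_measurements = prod(design.values())

# Distinct patient samples per trial, matching the '40 patient samples' principle.
samples_per_trial = design["patients_per_day"] * design["operating_days"]

print(total_measurements, samples_per_trial)
```

The product reproduces the 3,200 samples reported, and the per-trial count of 40 matches the CLSI-based minimum adopted in the design.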

The mean of the difference and the coefficient of variation (CV) between the duplicated test results at each laboratory were calculated to determine intra-rater reliability. Pearson's correlation coefficients and intra-class correlation (ICC) coefficients were estimated as indices of intra-rater reliability. To determine the inter-laboratory reliability, we estimated the mean and CV of the differences between the mean of 40 duplicated test values at Centers X and Y. The level of agreement between the two laboratories was assessed using Pearson's correlation coefficients, ICC coefficients and the Bland-Altman plot [8]. Kappa statistics were used to measure the agreement in diagnosis (normal vs. abnormal). Coefficient values of 0.81-1.00 and >0.75 indicated 'almost perfect' and 'substantial and excellent', respectively [9,10].
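Three of the agreement measures named above can be illustrated with a minimal sketch in plain Python. This is not the SAS or MedCalc code used in the study. The function names and the example values are hypothetical, the Bland-Altman limits assume the common mean difference ± 1.96 SD form, and the kappa shown is the unweighted Cohen's kappa.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bland_altman_limits(x, y):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the paired differences.
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d, mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d

def cohen_kappa(labels_x, labels_y):
    """Unweighted Cohen's kappa for two sets of categorical calls."""
    n = len(labels_x)
    categories = set(labels_x) | set(labels_y)
    p_obs = sum(a == b for a, b in zip(labels_x, labels_y)) / n
    p_exp = sum(
        (labels_x.count(c) / n) * (labels_y.count(c) / n) for c in categories
    )
    if p_exp == 1:
        # Both raters used a single identical category; agreement is trivial.
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical paired results for one test item (not study data).
center_x = [4.1, 4.3, 3.9, 4.6, 4.0, 4.4]
center_y = [4.2, 4.3, 4.0, 4.8, 4.1, 4.4]
print(pearson_r(center_x, center_y))
print(bland_altman_limits(center_x, center_y))
print(cohen_kappa(["normal", "normal", "abnormal", "abnormal"],
                  ["normal", "abnormal", "abnormal", "abnormal"]))
```

The ICC is left out because the text does not state which ICC form was estimated, and choosing one here would be a guess.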

Statistical compensation models were constructed using linear regression analysis to determine the potential for data integration. The regression equation was as follows: (Test value at Center Y) = α (intercept) + β (slope) × (Test value at Center X). Model fit was checked by both regression residual analysis and R-square values. All statistical analyses were performed using SAS version 9.2 (SAS Institute Inc., Cary, NC, USA) and MedCalc version 11.5 (MedCalc Software, Mariakerke, Belgium).
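The regression equation above can be sketched as an ordinary least-squares fit in plain Python. This is an illustration under assumptions, not the SAS procedure used in the study. The function names `fit_compensation_model` and `to_center_y_scale` are hypothetical, and the example values are invented.

```python
def fit_compensation_model(x, y):
    """Least-squares fit of y = alpha + beta * x.

    x: test values at Center X; y: paired test values at Center Y.
    Returns (alpha, beta, r_squared, residuals).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual = observed Center Y value minus the value predicted from X.
    residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((b - my) ** 2 for b in y)
    return alpha, beta, 1 - ss_res / ss_tot, residuals

def to_center_y_scale(value_x, alpha, beta):
    """Map a Center X value onto the Center Y scale before pooling."""
    return alpha + beta * value_x

# Hypothetical paired values for one test item (not study data).
center_x = [88.0, 95.0, 102.0, 110.0, 126.0]
center_y = [90.0, 96.5, 104.0, 113.0, 128.5]
alpha, beta, r2, residuals = fit_compensation_model(center_x, center_y)
print(alpha, beta, r2, sum(residuals))
print(to_center_y_scale(100.0, alpha, beta))
```

A separate fit per test item, as the study's per-test equations suggest, keeps each α and β specific to one analyte.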

The present study was approved by the Institutional Review Boards of Seoul National University Hospital and the National Cancer Center of Korea (C-0707-072-214).

### RESULTS

For intra-rater reliability at Center X, the mean differences ranged from 0.027 to 7.387. With the exception of AST (0.842), all correlation coefficients were greater than 0.997 (p<0.001). The ICC coefficients were all greater than 0.836, indicating excellent reliability. Similarly, the mean differences at Center Y ranged from 0.018 to 3.125, all Pearson's correlation coefficients ranged from 0.989 to 0.999, and ICC coefficients ranged from 0.990 to 1.000, indicating almost perfect intra-rater reliability. Inter-laboratory reliability between Centers X and Y was highly comparable. The means of the absolute values of the differences between the mean of the 40 duplicated test values at Centers X and Y ranged from 0.095 to 9.737. All correlation coefficients were reliable. Though AST showed the lowest Pearson's correlation coefficient and an ICC coefficient equal to the intra-rater reliability, the values remained good at 0.872 and 0.871, respectively. The kappa statistics showed attenuated reliability, but all values were over 0.75, indicating substantial reliability (Table 2). Bland-Altman plots showed that most absolute differences between Centers X and Y were within the 95% limits of agreement (Figure 1).

Using the standardized serological samples, statistical compensation models using linear regression analysis were constructed to yield a mathematical relationship between the results of the two laboratories. With the exception of AST (R²=0.70), all regression equations presented strong correlation coefficients (R²>0.97). The regression equations for the three years, 2005, 2006, and 2007, were estimated using external QC data acquired from the KSLM. All correlation coefficient values were greater than 0.95 with the exception of TG, which used a different QC method in the two laboratories in 2005 and 2006. All the results of the standardized serological samples and external QC data were consistent in the regression models (Table 3).

In the residual analysis, the sums of residuals were all zero, and in the residual plots (residuals being the differences between the actual and predicted y-values) all residuals were randomly distributed around zero (data not shown).
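A short derivation shows where the zero sums come from. It reuses α and β from the regression equation in the Methods; the symbols $x_i$, $y_i$, and $e_i$ (paired values at Centers X and Y and their residual) are notation introduced here for illustration. Least squares minimizes the sum of squared residuals, and setting its derivative with respect to the intercept to zero gives:

$$
S(\alpha,\beta)=\sum_{i=1}^{n}\left(y_i-\alpha-\beta x_i\right)^2,
\qquad
\frac{\partial S}{\partial \alpha}\bigg|_{\hat\alpha,\hat\beta}
=-2\sum_{i=1}^{n}\left(y_i-\hat\alpha-\hat\beta x_i\right)=0
\;\Longrightarrow\;
\sum_{i=1}^{n} e_i=0 .
$$

Under this assumption of an ordinary least-squares fit with an intercept, the residuals sum to zero by construction, so the random scatter of the residual plots is the more informative check of model fit.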

### DISCUSSION

This study evaluated the possibility of integrating serological parameters from different laboratories participating in the KoGES via a statistical adjustment algorithm with good intra- and inter-laboratory reliability. Our results indicated that the two laboratories had excellent intra- (correlation coefficient>0.84, ICC>0.83) and inter-laboratory reliability (correlation coefficient>0.87, ICC>0.87, and kappa>0.75) for ten chemical test values. Moreover, linear regression analysis to compensate for the discrepancy in test values between the two centers gave excellent R-square values and zero sums of residuals.

Poorly controlled data can lead to significantly biased results [11]. Given the participation of two laboratories in the KoGES, the issue of quality control should be carefully addressed. Likewise, a strategy for data integration should be established based on the reliability within and between the laboratories. The present study indicated that regression equations with higher R-square values can compensate for the potential discrepancies caused by the use of different laboratories. Although the absolute values differed slightly, if intra- and inter-laboratory reliability can be assured, data integration may be successfully conducted using statistical compensation models. In terms of AST, γ-GTP and TG with relatively unstable results or insufficient external QC data, further replication studies focused on clinical features and test methodology are required.

The issue of multiple laboratories is relevant not only for the KoGES but also for other large cohorts. Ideally, a single laboratory that passes a strict QC system should conduct all clinical tests according to accurate and standardized methods. However, this is logistically difficult in practice. The most reasonable alternatives are 1) choosing reliable laboratories, 2) developing a standardized protocol for clinical tests, 3) conducting and monitoring regular QC, and 4) integrating the KoGES database after statistical adjustment. For statistical adjustment, the regression analysis method can be used to check intra- and inter-rater reliability.

Using the KoGES data, this study aimed to determine the possibility of integrating serological parameters produced from different laboratories via statistical compensation models. Our results indicate that the ten blood chemical tests analyzed at the two laboratories can be integrated through statistical adjustments using regression equations. The existing external QC data should be used to correct the discrepancies in the other biochemical tests and complete blood cell counts, or additional trials with the standardized serological samples should be conducted before data integration into the KoGES database.

### ACKNOWLEDGEMENT

This study was supported by a research grant from the Korea Centers for Disease Control and Prevention (2007-E71010-00).

### CONFLICT OF INTEREST

The authors have no conflicts of interest to declare for this study.

### REFERENCES

1. Khoury MJ, Beaty TH, Cohen BH. Fundamentals of genetic epidemiology. 1993. New York: Oxford University Press.

3. Schulze MB, Kroke A, Saracci R, Boeing H. The effect of differences in measurement procedure on the comparability of blood pressure estimates in multi-centre studies. Blood Press Monit 2002;7:95-104. PMID: 12048426.

4. Iida M, Sato S, Nakamura M. Standardization of laboratory test in the JPHC study. Japan Public Health Center-based Prospective Study on Cancer and Cardiovascular Diseases. J Epidemiol 2001;11:S81-S86. PMID: 11763143.

5. McGuinness C, Seccombe DW, Frohlich JJ, Ehnholm C, Sundvall J, Steiner G. Laboratory standardization of a large international clinical trial: the DAIS experience. DAIS Project Group. Diabetes Atherosclerosis Intervention Study. Clin Biochem 2000;33:15-24. PMID: 10693982.

6. Whitney CW, Lind BK, Wahl PW. Quality assurance and quality control in longitudinal studies. Epidemiol Rev 1998;20:71-80. PMID: 9762510.

7. In: Krouwer JS, ed. Method comparison and bias estimation using patient samples: approved guidelines. 2002. Wayne, PA: National Committee for Clinical Laboratory Standards.

8. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-310. PMID: 2868172.

9. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973;33:613-619.

10. Altman DG. Practical statistics for medical research. 1991. London: Chapman & Hall.

11. Desmet M, Linder SN. The challenge of auditing clinical laboratories. Drug Inf J 1997;31:197-205.