A suggestion for quality assessment in systematic reviews of observational studies in nutritional epidemiology

OBJECTIVES: It is important to control the quality level of the observational studies in conducting meta-analyses. The Newcastle-Ottawa Scale (NOS) is a representative tool used for this purpose. We investigated the relationship between high-quality (HQ) defined using NOS and the results of subgroup analysis according to study design. METHODS: We selected systematic review studies with meta-analysis which performed a quality evaluation on observational studies of diet and cancer by NOS. HQ determinations and the distribution of study designs were examined. Subgroup analyses according to quality level as defined by the NOS were also extracted. Equivalence was evaluated based on the summary effect size (sES) and 95% confidence intervals computed in the subgroup analysis. RESULTS: The meta-analysis results of the HQ and cohort groups were identical. The overall sES, which was obtained by combining the sES when equivalence was observed between the cohort and case-control groups, also showed equivalence. CONCLUSIONS: The results of this study suggest that it is more reasonable to control for quality level by performing subgroup analysis according to study design rather than by using HQ based on the NOS quality assessment tool.


INTRODUCTION
Systematic reviews (SRs) with meta-analyses are a useful methodology to assess inconsistent epidemiological study findings [1]. However, controversy arose in the early 1990s regarding meta-analysis of the results of observational studies rather than randomized-controlled trials (RCT) [2][3][4][5]. Researchers have concluded that meta-analyses are not beneficial in cases where there are differences in the quality of the included studies [6,7]; in addition, the necessity for a scoring system to assess the quality levels of selected publications has also been emphasized [8,9].
The Newcastle-Ottawa Scale (NOS) is a representative tool developed for meta-analysis of observational studies [10]. The NOS is suitable for SR due to its easy application [11], and has been widely used due to recommendations from the Cochrane Collaboration [12]. Recently, however, the validity and reliability of the NOS have been questioned [13][14][15]. It has been argued that the NOS needs to be supplemented because the guidelines for applying each evaluation item are unclear [13,14]. Furthermore, Stang [13] suggested that SRs assessing the quality of observational studies with the NOS contain serious errors.
Since it is impossible to apply RCTs to nutritional epidemiology studies that investigate the relationships between daily food intake and the incidence of various cancers, only SRs of observational studies are available. Thus, Yang et al. [16] added 'data analysis that used an energy-adjusted residual or nutrient-density model' to the existing nine items of the NOS as an adjustment item to evaluate the quality of articles included in meta-analyses, and the resulting quality level was then reflected in subgroup analyses. However, in their quality assessment [16], all three cohort studies were determined to be of high quality (HQ), whereas only two of eight case-control studies were determined to be HQ. The difference in HQ determination between study designs appears to be attributable to the characteristics of the NOS, which was designed to give a higher score to cohort studies that are more scientifically persuasive [15].
Based on these observations, is it possible to replace the NOS with study design as the basis for evaluating the quality level of observational studies in meta-analysis? It would be more reasonable and efficient if quality levels could be controlled according to study design rather than by spending labor and time on applying the NOS. In other words, if there were consistency between HQ classification and study design, subgroup analysis by study design would be sufficient, instead of quality assessment using the NOS. Thus, the present study investigated the equivalence of meta-analysis results between the HQ group classified by the NOS and the cohort study group within the same SR of observational studies in nutritional epidemiology.

Subject article searches and selection criteria
The articles selected for the present study were SRs of analytical epidemiologic studies that investigated the relationships between daily food intake and the incidence of various cancers. In addition, they had to use the NOS for quality evaluation and report the results of subgroup analysis by study design. The PubMed literature database (www.ncbi.nlm.nih.gov/pubmed) was used to search for articles using search terms corresponding to foods (diet, food, fruit, vegetable, or meat) and SR or meta-analysis for cancer incidence in the article title, abstract, and keywords, among articles published between January 2000 and October 2015.
After obtaining a list of articles, the following exclusion criteria were applied: (1) study hypotheses that did not assess the association between diet and cancer, (2) RCT rather than observational study, (3) SR without meta-analysis, (4) SR without quality assessment, (5) SR with quality assessment by a tool other than the NOS, and (6) SR that assessed quality using the NOS but did not report subgroup analysis results.

Collection of related information and statistical analysis
Inclusion of articles in the analysis of quality assessment was based on statements in the methods section of each article, from which the type of tool used for evaluation and HQ decision criteria were identified. In order to determine the distribution of HQ subjects in each SR article according to study design, the articles were divided into cohort and case-control studies, and the statistical differences between the selected fractions (%) were examined using chi-squared tests.
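The chi-squared comparison described above can be sketched as follows, using the HQ counts reported in Table 1 (81 of 89 cohort studies and 72 of 209 case-control studies). This is a minimal illustration, not the authors' code: it assumes a Pearson chi-squared test without continuity correction, which the source does not specify, and the function name `chi_squared_2x2` is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Assumption: the source only says "chi-squared tests"; the exact variant
    used by the authors is not stated.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the text: HQ studies out of all studies, by design.
cohort_hq, cohort_total = 81, 89
case_control_hq, case_control_total = 72, 209

chi2, p_value = chi_squared_2x2(
    cohort_hq, cohort_total - cohort_hq,
    case_control_hq, case_control_total - case_control_hq,
)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Under these assumptions the p-value falls far below 0.01, in line with the reported significance level.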
In addition, the summary effect size (sES) and 95% confidence intervals (CI) estimated from the HQ group and the cohort study group in subgroup analysis were extracted. The equivalence between the results of the HQ group and the cohort study group was determined based on a consistent direction of the sES values relative to the null (= 1) as well as consistent statistical significance based on the 95% CI.

RESULTS
Figure 1 shows the process of selecting the final articles included in the analysis. After duplicate publications were removed from the list obtained with the search formula, 371 articles were identified for further review. Among them, (1) 256 articles were excluded because their study hypotheses did not assess the relationship between diet and cancer, (2) 14 articles were RCTs, and (3) six articles performed SR without meta-analysis. Of the remaining 95 papers, 14 SRs conducted quality assessments using the NOS as stated in their methods sections and also applied the quality assessment in subgroup analysis [16][17][18][19][20][21][22][23][24][25][26][27][28][29]. Of those 14 articles, four [16][17][18][19] used a modified NOS that included energy intake. Except for the study by Liu et al. [21], all used seven points or more as the criterion for HQ.
Table 1 presents the results of HQ assessment by the NOS for cohort and case-control studies. Among the studies pooled in the 14 SRs included in the current study, 81 (91%) of 89 cohort studies and 72 (34%) of 209 case-control studies were considered HQ, a statistically significant difference (p < 0.01).
Table 2 summarizes the sES and 95% CI of the HQ and cohort groups by food item. The paper by Wang et al. [22] was excluded because it considered only cohort studies. From the remaining 13 papers, 19 datasets by food item were summarized; of these, 15 datasets showed the same magnitude and the same statistical significance. The remaining four datasets lost statistical significance as the CI became wider, although their magnitudes were consistent.
For the 15 datasets that showed equivalence between the HQ and cohort groups in Table 2, Table 3 was constructed to compare the results of subgroup analysis in the cohort and case-control study groups with the overall sES obtained by combining both results. Eight datasets showed equivalence in the sES of the cohort and case-control study groups, and their overall sES also showed equivalence. In the seven datasets with non-equivalence, the results of the case-control studies, which included more articles, greatly influenced the overall sES.
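The equivalence criterion used in these comparisons (same direction of the sES relative to the null value of 1, and the same statistical significance judged from the 95% CI) can be expressed as a small check. The sketch below is illustrative only: the names `Summary`, `direction`, `is_significant`, and `is_equivalent` are hypothetical, and the numeric values are made up for demonstration and are not taken from Table 2 or Table 3.

```python
from typing import NamedTuple

NULL_VALUE = 1.0  # null value for a ratio-type summary effect size


class Summary(NamedTuple):
    ses: float    # summary effect size
    lower: float  # lower bound of the 95% CI
    upper: float  # upper bound of the 95% CI


def direction(s: Summary) -> int:
    """Return -1, 0, or +1 for an sES below, at, or above the null."""
    if s.ses > NULL_VALUE:
        return 1
    if s.ses < NULL_VALUE:
        return -1
    return 0


def is_significant(s: Summary) -> bool:
    """Significant when the 95% CI excludes the null value."""
    return s.lower > NULL_VALUE or s.upper < NULL_VALUE


def is_equivalent(a: Summary, b: Summary) -> bool:
    """Equivalence as defined in the text: same direction and same significance."""
    return direction(a) == direction(b) and is_significant(a) == is_significant(b)


# Hypothetical values, for illustration only.
cohort = Summary(0.83, 0.72, 0.96)
hq_narrow = Summary(0.80, 0.68, 0.94)  # same direction, CI excludes 1
hq_wide = Summary(0.80, 0.60, 1.07)    # same direction, CI includes 1

print(is_equivalent(hq_narrow, cohort))  # True
print(is_equivalent(hq_wide, cohort))    # False
```

The second comparison mirrors the pattern described for the four non-equivalent datasets: the magnitude is consistent, but a wider CI removes statistical significance.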

DISCUSSION
In summary, the HQ and cohort groups had similar meta-analysis results because most cohort studies were classified as HQ based on quality assessment using the NOS. In other words, for SR of observational studies in nutritional epidemiology, quality assessment by the NOS is decisively dependent on study methodology. Thus, subgroup analysis by study design may be more valid than quality assessment based on the NOS, until a new quality assessment tool is developed in consideration of the characteristics of nutritional epidemiology.
Table 3. Summary effect size (sES) and 95% confidence intervals (CI) of the overall and case-control study groups for the datasets showing equivalence of direction and statistical significance between the cohort and high-quality groups in Table 2
As shown in Table 2, four datasets had no equivalence between the HQ and cohort groups. However, their non-equivalence resulted from changes in the width of the CI while the magnitude of the sES remained consistent. Since the width of the CI changes according to the number of subject articles included in the meta-analysis, the non-equivalence was attributed to the difference in the number of subject articles rather than to differences between the HQ and cohort groups. The current study also assessed whether non-equivalence could be controlled by assessing results according to study design rather than the NOS. The results showed that when the sES of the cohort and case-control groups showed equivalence, the overall sES for the combination of both groups also showed equivalence. In addition, in the paper by Hu et al. [29], the CIs of the sES for the cohort and case-control groups were wide and not statistically significant; however, the CI of the overall sES of both groups combined was narrower, resulting in statistical significance. Thus, these findings suggest that it is reasonable to combine both groups when the sES calculated for each of the cohort and case-control groups shows equivalence. However, non-equivalence in the sES between these groups indicates that the overall sES should be interpreted carefully, as the value is dependent on the number of articles.
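Why combining two equivalent subgroups can narrow the CI enough to reach statistical significance can be illustrated with a simple inverse-variance calculation on the log scale. This is only a sketch under stated assumptions: it pools two subgroup summaries with a fixed-effect model, whereas the reviewed SRs pooled individual studies and may have used random-effects models; the function names are hypothetical, and the numbers are invented and are not the values reported by Hu et al. [29].

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% CI


def log_se_from_ci(lower, upper):
    """Recover the standard error of log(sES) from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (sES, lower, upper) tuples."""
    weights = []
    weighted_logs = []
    for ses, lower, upper in estimates:
        weight = 1.0 / log_se_from_ci(lower, upper) ** 2
        weights.append(weight)
        weighted_logs.append(weight * math.log(ses))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Hypothetical subgroup results: same direction, both CIs include 1.
cohort = (0.85, 0.70, 1.03)
case_control = (0.82, 0.66, 1.02)

overall = pool_fixed_effect([cohort, case_control])
print("overall sES = {:.2f} (95% CI {:.2f}-{:.2f})".format(*overall))
```

In this made-up example neither subgroup is statistically significant on its own, but the pooled CI excludes 1 because the combined weight is larger than either subgroup's weight.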
Colditz et al. [30] proposed that study design, quality of implementation, exposure, and covariates contribute to heterogeneity in SR. Since meta-analysis in nutritional epidemiology uses both cohort and case-control studies, there is heterogeneity by study design [31,32]. Therefore, while NOS evaluation within a group with the same study design would be meaningful, subgroup analysis only with the NOS while ignoring differences in study design may compromise the results.
In the present study, only 16.8% (= 16/95) of papers (Figure 1) applied quality assessment results to subgroup analysis in their SR of nutritional epidemiology studies, even after including two papers that used their own quality assessment criteria instead of the NOS [33,34]. In addition, the first article to apply the NOS was published in 2006 [34], although the search included publication dates from January 2000. The primary reason for the lack of quality assessment in SR of nutritional epidemiology studies was the lack of valid assessment tools [33,34]. Although the NOS has been used since its development, the present study found that the HQ decision in the NOS is fully dependent on study design. Thus, subgroup analysis should be performed separately for each study design until a quality assessment tool specific to nutritional epidemiology is developed.
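The selection counts behind the 16.8% figure can be checked with a few lines of arithmetic. The numbers below are those stated in the text; the variable names and dictionary labels are illustrative only.

```python
# Articles identified after removing duplicates.
identified = 371

# Exclusions reported in the selection process.
excluded = {
    "hypothesis not on diet and cancer": 256,
    "RCT": 14,
    "SR without meta-analysis": 6,
}
remaining = identified - sum(excluded.values())

# SRs that applied quality assessment results to subgroup analysis.
nos_papers = 14           # used the NOS
own_criteria_papers = 2   # used their own quality criteria
applied = nos_papers + own_criteria_papers

share = applied / remaining * 100
print(remaining, applied, round(share, 1))  # 95 16 16.8
```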
The current study has several limitations. First, only the PubMed literature database was searched to investigate the level of quality assessment, and a narrow set of terms related to diet items was applied in the search formula. In particular, items such as dietary fat, fiber, and vitamins, which are indirectly assessed through diet measurement, were excluded. Thus, the proportion of papers that performed subgroup analysis after quality assessment, 16.8% (= 16/95) (Figure 1), seems to be overestimated. The results of the present study emphasize the importance of quality level control through subgroup analysis according to study design even without application of the NOS. Second, the equivalence of the sES between the cohort and case-control study groups was compared using study results that had applied the NOS. Thus, further investigations are necessary to determine whether quality level can be controlled for in meta-analysis of nutritional epidemiology based on subgroup analysis results according to study design and application methods.
In conclusion, it is advisable to conduct subgroup analysis by study design for quality assessment of meta-analysis in nutritional epidemiology rather than applying the NOS assessment tool, and interpretation of overall sES should rely on the equivalence of sES by study design. These suggestions are consistent with the statement from Greenland [35] in 1994: "Just as a diet and health study needs to examine the effects of each major dietary factor, quality scoring should be replaced by direct regression or stratification on objective quality-related study characteristics, such as study design (cohort, case-control, etc.), sources of data (direct interviews, mailed questionnaire, medical records, etc.), and sources of subjects (registry, hospital, etc.)."