Integrated database-based Screening Cohort for Asian Nomadic descendants in China (Scan-China): Insights on prospective ethnicity-focused cancer screening

Established in 2017, the Screening Cohort for Asian Nomadic descendants in China (Scan-China) has benefited over 180,000 members of a multi-ethnic population, particularly individuals of Mongolian descent compared with the general population (Han ethnicity), in the Inner Mongolia Autonomous Region, China. This cohort study aims to evaluate the effectiveness of cancer screening and serve as a real-world data platform for cancer studies. The 6 most prevalent cancers in China are considered—namely, breast, lung, colorectal, gastric, liver and esophageal cancer. After baseline cancer risk assessments and screening tests, both active and passive follow-up (based on the healthcare insurance database, cancer registry, the front page of hospital medical records, and death certificates) will be conducted to trace participants’ onset and progression of cancers and other prevalent chronic diseases. Scan-China has preliminarily found a disproportionately lower screening participation rate and higher incidence/mortality rates of esophageal and breast cancer among the Mongolian population than among their Han counterparts. Further research will explore the cancer burden, natural history, treatment patterns, and risk factors of the target cancers.


INTRODUCTION
Cancer screening programs are highly efficient in identifying high-risk and early-onset populations and cost-effective in reducing cancer incidence and mortality, yet most are one-size-fits-all [1][2][3]. Racial/ethnic minorities worldwide, compared with general populations, are reported to have lower and less timely uptake and completion of screening [4][5][6][7][8][9][10], along with a higher risk of cancer health disparities due to genetic, geographic, and socioeconomic factors [11][12][13]. In Western countries, Asian minorities show some of the most severe patterns of these disparities [6]. However, this issue has been persistently overlooked on the multicultural Asian continent.
Mongolians, who descend from Asian nomads inhabiting grassland areas, are among the largest ethnic minorities in East Asia (over 10 million people worldwide, including 6.3 million Chinese residents). Mongolians' traditional dietary habits are characterized by highly caloric, salty food and high consumption of red meat and dairy products to accommodate their migratory patterns [14]. Although they currently live mixed with the general population (Han ethnicity) in China, Mongolians' distinctive living patterns, together with genetic factors conferring susceptibility [15,16] and cultural beliefs [17] inherited from their ancestors, may contribute to low uptake and poor effectiveness of screening. However, Mongolian-oriented screening programs have not been initiated, and studies have not focused on cancer disparities between Mongolians and the general population in China. These gaps have left cancer health disparities persistently unaddressed.
The Screening Cohort for Asian Nomadic descendants in China (Scan-China) project is the first and largest screening program tailored for the Mongolian ethnicity, initiated in the Inner Mongolia Autonomous Region of China. Inner Mongolia is the area with the largest population of the Mongolian ethnicity globally, wherein Mongolians constitute the second-largest ethnicity (19.3% of the total population) [18]. Derived from the Cancer Screening Program in Urban China (detailed information can be found in the related literature [19,20] and in Supplementary Material 1), Scan-China offers cancer risk assessments and screening tests for the 6 most prevalent cancers in urban areas (lung, breast, liver, esophageal, gastric, and colorectal). Scan-China aims to evaluate the effectiveness of screening, particularly for ethnic minorities; to describe the natural history and explore risk factors of the targeted cancer types and prevalent comorbidities; and to portray treatment patterns among different ethnicities, all contributing to addressing cancer health disparities across ethnicities.
Scan-China targets urban residents in 5 districts of the 2 major cities with the largest populations (over 1 million per city) in Inner Mongolia: Xincheng, Huimin, Yuquan, and Saihan Districts in the city of Hohhot, and Keerqin District in the city of Tongliao (Figure 1). The districts were selected based on population size, representativeness of multiple ethnicities (Han, Mongolian, and other ethnic minorities living mixed together), and the feasibility of project implementation considering healthcare resources and hospital collaborations. In detail, Hohhot is the provincial capital and Tongliao is one of the birthplaces of the Mongolian ethnicity, respectively representing developed (top-level gross domestic product) and developing (medium-level gross domestic product) levels of healthcare resources in Inner Mongolia [18].

STUDY PARTICIPANTS
This dynamic cohort was planned to benefit 30,000 new participants annually. Participants who meet the following inclusion criteria are recruited into Scan-China after a baseline cancer risk assessment: (1) Chinese residents of the catchment areas, or individuals with a minimum residence of 3 years in Hohhot or Tongliao; (2) aged 40-74 years at the cohort entry date; (3) community-dwelling national healthcare insurance beneficiaries; and (4) having voluntarily signed a written informed consent form to participate in Scan-China. Individuals of all ethnicities have equal rights and opportunities to participate; enrollment is voluntary and not restricted by ethnicity. More detailed inclusion and exclusion criteria are shown in Supplementary Material 1 and Figure 2.
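The four inclusion criteria above can be read as a simple screening rule. The following is a minimal sketch, not the project's actual enrollment software: the `Candidate` fields and the helper names are hypothetical, and the detailed exclusion criteria in Supplementary Material 1 are not modeled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    # Hypothetical field names; not taken from the Scan-China data dictionary.
    birth_date: date
    registered_resident: bool  # Chinese resident of a catchment area
    years_in_city: float       # years lived in Hohhot or Tongliao
    insured: bool              # community-dwelling national healthcare insurance beneficiary
    consented: bool            # signed the written informed consent form


def age_on(birth_date: date, on: date) -> int:
    """Completed years of age on a given date."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def is_eligible(c: Candidate, entry_date: date) -> bool:
    """Apply criteria (1) to (4) as listed in the text."""
    resident_ok = c.registered_resident or c.years_in_city >= 3
    age_ok = 40 <= age_on(c.birth_date, entry_date) <= 74
    return resident_ok and age_ok and c.insured and c.consented
```

For example, `is_eligible(Candidate(date(1970, 6, 1), False, 5.0, True, True), date(2017, 1, 1))` evaluates a 46-year-old with 5 years of residence. Age is computed at the cohort entry date, as criterion (2) specifies.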
Follow-up is conducted both actively and passively. All participants in Scan-China will be passively followed via annual linkage of the baseline results of risk assessment and screening tests to multi-source electronic health data (EHD) databases in Inner Mongolia. The EHD databases of Scan-China include the Urban Residents' Healthcare Insurance Database (URHID), the Cancer Registry (CaR), the front page of hospital medical records (FPMR), and death certificates (DCs), all officially governed by the Inner Mongolia Center for Disease Control and Prevention.
Data linkage and integration are conducted using personal identity numbers, which are de-identified during data analysis for privacy protection. Standardized coding systems are used across the EHD databases of Scan-China. The diagnoses of diseases and causes of death are coded using the International Classification of Diseases, 10th revision (ICD-10). The prescriptions in URHID are coded using the Anatomical Therapeutic Chemical (ATC) system. Moreover, the accuracy of EHD-sourced diagnoses has been validated independently by 2 clinical experts, who checked the diagnoses in a random sample drawn from the whole cohort population (an initial assessment found about 94% accuracy in the diagnoses of cardiovascular diseases).
Active follow-up is tailored for participants who receive positive screening results for any cancer. It is carried out via phone calls or in-house visits, together with a review of medical records to confirm cancer onset. Clinical experts from collaborative tertiary hospitals will conduct gold-standard examinations to diagnose cancer. Participants who receive positive screening results but negative examination findings will undergo annual rescreening for the next 5 years. Participants with confirmed diagnoses of cancer will be recommended to visit doctors for professional treatment.
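The identifier-based linkage described in this section can be illustrated with a short sketch. It assumes, purely for illustration, that de-identification is done with a keyed hash (HMAC-SHA256) of the personal identity number. The text does not state which de-identification method Scan-China uses, and the function names and record layout below are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(personal_id: str, secret_key: bytes) -> str:
    """Replace an identity number with a keyed hash (assumed method: HMAC-SHA256)."""
    return hmac.new(secret_key, personal_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link_sources(baseline, sources, secret_key):
    """Attach records from each EHD source to baseline participants.

    baseline: list of dicts, each with an "id_number" key
    sources:  dict mapping a source name (e.g. "URHID") to a list of dicts,
              each with an "id_number" key
    Returns a dict keyed by pseudonym; raw identity numbers are dropped.
    """
    linked = {}
    for row in baseline:
        pid = pseudonymize(row["id_number"], secret_key)
        payload = {k: v for k, v in row.items() if k != "id_number"}
        linked[pid] = {"baseline": payload, **{name: [] for name in sources}}
    for name, rows in sources.items():
        for row in rows:
            pid = pseudonymize(row["id_number"], secret_key)
            if pid in linked:  # keep only records that belong to cohort members
                payload = {k: v for k, v in row.items() if k != "id_number"}
                linked[pid][name].append(payload)
    return linked
```

Because the same key yields the same pseudonym in every source, records can still be joined after the identity number has been removed from the analysis dataset.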

Ethics statement
The project was approved by the Institutional Review Board (IRB) of the Ethics Committee of the Inner Mongolia Autonomous Region Center for Disease Control and Prevention (IRB No. NM-CDCIRB2021001). Informed consent was confirmed by the IRB. The project has been submitted for registration in the China Clinical Trial Registration Center. The recruited participants all signed written informed consent at baseline. The authors affirm that the human research participants provided informed consent for the publication of all results involved in this paper.

MEASUREMENTS
Information throughout the screening is collected via a baseline questionnaire and biological tests for cancer risk assessment, screening tests and blood samples if necessary, and active and passive follow-up via EHD databases. A detailed statistical analysis plan, covering the analyses to be conducted and the results to be reported when the follow-up period is extended to 10 years or 20 years, can be found in Supplementary Material 1. Cancer-related information is mainly collected during follow-up, involving diagnoses, treatments, and details on cancer onset and survival status (Table 1).

Baseline cancer risk assessment: questionnaire survey for all participants
All eligible participants are required to engage in cancer risk assessments via a paper-based questionnaire with instructions from trained staff (Table 1). The assessment is based on the Harvard Risk Index [21][22][23]. For each participant, the information collected at baseline includes socio-demographic characteristics, behaviors and environmental/occupational exposures to cancer-related risk factors, psychological conditions, and personal and family history of diseases. Details on lifestyle habits, such as food preferences regarding temperature and flavor, are also collected, filling a gap in most previous screening cohorts.

Baseline screening and blood sample collection: biochemical tests for the high-risk population
Only participants classified as high-risk in the baseline assessment are recommended to receive screening tests for the respective targeted cancers. All screening tests are provided by collaborating hospitals, free of charge, and are conducted by physicians with over 5 years of clinical experience. Meanwhile, an expert panel from the National Cancer Center of China has been assembled as a third party to provide consultation if physicians are unsure whether a result should be classified as positive or negative. For participants at high risk for liver cancer, upper gastrointestinal cancer, and colorectal cancer, a 5-mL blood sample per person is collected prior to the above screening tests. Each participant's screening test results and details on pathology reports are archived both physically and electronically in the screening database [22,24].
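The triage rule in this subsection can be sketched in a few lines. The sketch uses the RR ≥1.50 high-risk threshold given in the Table 2 notes and makes two assumptions that are not stated in the text: that RR denotes a relative risk score per cancer type, and that "upper gastrointestinal cancer" corresponds to the esophageal and gastric cancers among the 6 target types. The function and key names are hypothetical.

```python
HIGH_RISK_RR = 1.50  # threshold from the Table 2 notes (RR >= 1.50)

# Assumption: "upper gastrointestinal" is read as esophageal + gastric.
BLOOD_SAMPLE_CANCERS = {"liver", "esophageal", "gastric", "colorectal"}


def triage(risk_scores: dict[str, float]) -> dict:
    """Return the cancers to screen for and whether a 5-mL blood sample is drawn first.

    risk_scores maps a cancer type (e.g. "lung") to the participant's RR.
    """
    high_risk = sorted(c for c, rr in risk_scores.items() if rr >= HIGH_RISK_RR)
    return {
        "screen_for": high_risk,
        "blood_sample_5ml": any(c in BLOOD_SAMPLE_CANCERS for c in high_risk),
    }
```

A participant with `{"lung": 1.8, "liver": 1.2}` would be recommended lung screening only, without a blood draw, because the liver score falls below the threshold.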

Electronic health data-integrated follow-up strategy: annual dynamic updates for all participants
A detailed description of the core variables available in Scan-China databases is shown in Table 1 and Supplementary Material 2. In brief, information on disease diagnoses, prescriptions, hospitalizations, and medical expenses throughout an individual's entire course of hospital visits will be available in the URHID. The FPMR offers details related to clinical diagnoses and in-hospital treatments, while the CaR concentrates on tumor-related information, including cancer onset, clinical status, and pathological findings. The survival status, causes of death, and date of death will be monitored using DCs. The passive follow-up will be conducted annually to dynamically update each participant's health condition.
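As an illustration of how one annual passive follow-up update might collapse a participant's linked records into a status, the sketch below derives the earliest malignant diagnosis (ICD-10 C00-C97) and the date of death. The tuple layout, the function names, and the choice to take the earliest dated record across sources are assumptions for this example, not the project's documented procedure.

```python
from datetime import date


def is_malignant(icd10: str) -> bool:
    """True for ICD-10 codes in the malignant neoplasm range C00-C97."""
    code = icd10.strip().upper()
    return (
        len(code) >= 3
        and code[0] == "C"
        and code[1:3].isdigit()
        and int(code[1:3]) <= 97
    )


def summarize_follow_up(records):
    """Collapse one participant's linked records into a follow-up status.

    records: iterable of (source, event_date, icd10) tuples, where source is
    one of "URHID", "CaR", "FPMR", or "DC" (hypothetical layout).
    """
    first_cancer = None  # (date, icd10, source) of the earliest malignant diagnosis
    death_date = None
    for source, event_date, icd10 in records:
        if source == "DC":
            if death_date is None or event_date < death_date:
                death_date = event_date
        elif is_malignant(icd10):
            if first_cancer is None or event_date < first_cancer[0]:
                first_cancer = (event_date, icd10, source)
    return {"first_cancer": first_cancer, "death_date": death_date}
```

Taking the earliest date across the URHID, CaR, and FPMR is one way to use the multiple sources for timestamp selection; a real pipeline would also need rules for conflicting codes between sources.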

KEY FINDINGS
Scan-China was established on January 1, 2017, and residents can enter the cohort dynamically. By December 31, 2021, Scan-China had benefited 180,255 people (about 11% Mongolian) in Inner Mongolia (70,109 at the first-wave baseline completed by December 31, 2018), with an average increase of 4,500 new participants annually. Disease onset and ongoing disease status were reported for 48,471 participants in the URHID (over 2.3 million records on diagnoses of the Scan-China participants were captured from January 1, 2017 to December 31, 2021), 792 in the CaR, and 1,004 in the DCs.

STRENGTHS AND WEAKNESSES
Scan-China is the first EHD-integrated dynamic screening cohort targeting multiple ethnicities in Inner Mongolia. With the aim of addressing poor screening effectiveness among ethnic minorities, Scan-China is a unique platform for popularizing applicable health interventions for the Mongolian minority.
Cancer cohorts integrated with EHD databases show greater cost-effectiveness and time-effectiveness [25][26][27]. The current findings from Scan-China demonstrated accurate linkage between the baseline population and the respective EHD databases. In particular, the capture rate of passive follow-up through the claims database reached 92.2%, which is better than that of existing high-quality passive EHD follow-up [28], indicating the feasibility and high efficiency of the EHD-integrated follow-up strategy. Although data quality remains a stubborn challenge for almost all real-world studies [29,30], the multi-source EHD databases enable Scan-China to achieve information validation, timestamp selection, and progression tracking of diseases. More importantly, Scan-China sets a framework for the linkage and integration of heterogeneous EHD databases in the scope of a cancer screening cohort, advancing beyond the CHERRY study [31], which utilized inherently linked EHD on cardiovascular diseases.
Another distinctive merit of Scan-China is that it presents information on comorbidities, complications, and treatment patterns among Mongolian patients. It should be mentioned that traditional Mongolian medicine accounts for a majority of treatment strategies among this population, in parallel with Western medicine and traditional Chinese medicine. Scan-China will shed light on cancer-related treatment priorities regarding drug effectiveness and safety. For example, cardiovascular complications have been increasingly reported as a major adverse drug reaction during chemotherapy [32]. Differences in treatment patterns among the Mongolian ethnicity, in comparison with the general population, might provide insights into how to ameliorate disparities in the prognosis of cancer and other prevalent cardiovascular diseases.
Nonetheless, the project has some limitations. First, Scan-China only targets urban residents and, for reasons of study feasibility, lacks representativeness of the rural population. Furthermore, the baseline questionnaire was only answered by participants who volunteered to take part in the study, which might have generated selection bias. Second, the baseline information on cancer-related risk factors (such as lifestyle habits) was all self-reported, which unavoidably introduces recall bias. Third, the inclusion criteria in terms of the age range might have caused information loss on early-life exposures. Fourth, overdiagnosis, overtreatment, and misinterpretation of clinical data may have taken place [33]. However, all the aforementioned limitations are inherent to the design of most screening cohorts and cannot be avoided [34]. Moreover, problems with EHD quality are inherently unavoidable. Previous studies have reported that cancer incidence has been underestimated [35,36]. However, combining multi-source EHD databases might improve the completeness and reliability of records. Furthermore, the 3-year cumulative cancer incidence from Scan-China showed smaller differences between males and females than reported in the previous literature [37,38]. This may have been partly because the denominator used for incidence calculation was only composed of the first-wave population at the current preliminary stage. Alternatively, the larger differences could be explained by the inclusion of all cancer patients in other studies, rather than high-risk groups in the age range of 40-74 years or participants in cancer screening programs. Therefore, the findings from Scan-China need to be cautiously generalized in the context of comparable screening programs, population proportions, and data sources.
Notes to Table 2: For the categories of "average consumption (per week)" of different types of food, "not much" of fresh vegetables, fresh fruits, red meat, and coarse grains was defined as <2,500 g, <1,250 g, <350 g, and <500 g/wk, respectively; meeting the recommended amount was defined as ≥2,500 g, ≥1,250 g, ≥350 g, and ≥500 g/wk, respectively. High risk for the 6 targeted types of cancer was defined as an RR ≥1.50 using an established risk score system based on the Harvard Risk Index.

DATA ACCESSIBILITY
Scan-China is not an open-access database. The data utilized in its future studies will be available in de-identified form upon reasonable request, with approval from the expert panel of Scan-China, the Inner Mongolia Autonomous Region Center for Disease Control and Prevention, and the Ethics Committee of National Cancer Center/Cancer Hospital, China Academy of Medical Sciences and Peking Union Medical College. Collaborations and external investigations of the Scan-China dataset are welcomed to make further contributions to cancer health promotion. Applications should include the study protocol, statistical analysis plan, and contribution statement. The expert panel of Scan-China will contact applicants via e-mail if the application is considered meaningful and data use is approved by the above committees.

Figure 1. Spatial distribution of participants in the baseline survey of the Screening Cohort for Asian Nomadic descendants in China (Scan-China), Inner Mongolia Autonomous Region.

Figure 2. Flowchart of the Screening Cohort for Asian Nomadic descendants in China (Scan-China), Inner Mongolia Autonomous Region. HBsAg, hepatitis B surface antigen; H. pylori, Helicobacter pylori; FOBT, fecal occult blood test; LDCT, low-dose computed tomography; AFP, alpha-fetoprotein; EHDs, electronic healthcare databases. The numbers attached to each section of the flowchart are from the first wave in 2017-2018 and show one complete annual procedure of Scan-China. Data from later waves are still being processed and are not presented here.



Table 2. Baseline characteristics of the first-wave Han and Mongolian populations in the Screening Cohort for Asian Nomadic descendants (n=68,349). Two participants' ethnicities were missing and 1,760 participants' ethnicities were neither Han nor Mongolian; these participants were excluded, and only Han and Mongolian participants were included in the analysis (n=68,349).

Table 3. Incidence and mortality density of each targeted cancer type for the first-wave Han and Mongolian populations in the Screening Cohort for Asian Nomadic descendants (n=68,349). None of the differences in incidence/mortality density between the Han and Mongolian ethnicities, regardless of sex, showed statistical significance (all p-values >0.05).