Identifying pregnancy episodes and estimating the last menstrual period using an administrative database in Korea: an application to patients with systemic lupus erythematosus

OBJECTIVES This study developed an algorithm for identifying pregnancy episodes and estimating the last menstrual period (LMP) in an administrative claims database and applied it to investigate the use of pregnancy-incompatible immunosuppressants among pregnant women with systemic lupus erythematosus (SLE). METHODS An algorithm was developed and applied to a nationwide claims database in Korea. Pregnancy episodes were identified using a hierarchy of pregnancy outcomes and clinically plausible periods for subsequent episodes. The LMP was estimated using preterm delivery, sonography, and abortion procedure codes. Otherwise, outcome-specific estimates were applied, assigning a fixed gestational age to the corresponding pregnancy outcome. The algorithm was used to examine the prevalence of pregnancies and utilization of pregnancy-incompatible immunosuppressants (cyclophosphamide [CYC]/mycophenolate mofetil [MMF]/methotrexate [MTX]) and non-steroidal anti-inflammatory drugs (NSAIDs) during pregnancy in SLE patients. RESULTS The pregnancy outcomes identified in SLE patients included live births (67%), stillbirths (2%), and abortions (31%). The LMP was mostly estimated with outcome-specific estimates for full-term births (92.3%) and using sonography procedure codes (54.7%) and preterm delivery diagnosis codes (37.9%) for preterm births. The use of CYC/MMF/MTX decreased from 7.6% during preconception to 0.2% at the end of pregnancy. CYC/MMF/MTX use was observed in 3.6% of women within 3 months preconception and 2.5% during 0-7 weeks of pregnancy. CONCLUSIONS This study presents the first pregnancy algorithm using a Korean administrative claims database. Although further validation is necessary, this study provides a foundation for evaluating the safety of medications during pregnancy using secondary databases in Korea, especially for rare diseases.


INTRODUCTION
Pregnant women represent a unique population typically not included in clinical trials, and most clinical decisions regarding medication use during pregnancy are based on animal studies, case reports, and a few post-surveillance studies using registry data. Considering the scarcity of safety data with respect to pregnant women, recent studies have attempted to examine medication safety during pregnancy using routinely compiled administrative claims databases. Administrative claims databases offer several advantages, including large sample sizes with relatively low costs and no recall bias; however, misclassification of pregnancy outcomes and the last menstrual period (LMP) due to missing information, miscoding, and cross-coding is a common disadvantage [1,2].
In several countries, pregnancy algorithms based on administrative claims databases have been developed and adapted according to data availability, clinical practice patterns, and the healthcare system to minimize the misclassification of pregnancy outcomes and the LMP [3-9]. The agreement, expressed as a percentage, between pregnancy outcomes identified by algorithms and reviewer decisions is 96-100% for live births, 70.8-100% for stillbirths, and 92-100% for spontaneous abortions [4,6,9,10]. Despite the validity of published algorithms, they are not directly applicable to the Korean national claims database, and a new algorithm that considers code availability and practices in Korea is needed.
Pregnancy studies using claims databases in Korea often only involve live births, which can be linked to infant claims, and identify the LMP by subtracting a fixed number of weeks, which is usually the average gestational age (GA) for corresponding pregnancy outcomes [11-13]. The associated algorithms can have high validity for pregnancy outcomes while overlooking pregnancy loss and teratogenic effects resulting in pregnancy loss [14]. Consecutive pregnancy episodes of an individual also cannot be fully captured, and overlapping episodes are not considered. Refining the pregnancy algorithm would further reduce the misclassification bias of medication exposure during preconception and certain periods of pregnancy (first, second, and third trimesters). Therefore, developing a pregnancy algorithm using a nationwide claims database in Korea would facilitate further investigation of the safety of medication use during pregnancy, considering the timing of exposure and including outcomes with pregnancy loss.
The objectives of our study were (1) to develop an algorithm for identifying pregnancy episodes and estimating the LMP using a Korean claims database and (2) to examine pregnancy outcomes in patients with systemic lupus erythematosus (SLE) and the use of pregnancy-incompatible immunosuppressants before and during pregnancy. SLE is an autoimmune disease predominant in women of childbearing age, and patients with SLE have a higher incidence of adverse pregnancy outcomes than observed in the general population [15,16]. Women with SLE who are considering pregnancy are advised to discontinue pregnancy-incompatible immunosuppressants for up to 3 months before the LMP [15,17]. However, although pre-family planning is recommended, it is not universally implemented in clinical practice [18]. Therefore, investigating the prevalence of pregnancy-incompatible drug use in clinical settings is important, as it may play a role in the higher risk of adverse pregnancy outcomes in patients with SLE.

Data sources
We used nationwide healthcare claims data from the National Health Information Database (NHID) of Korea, which provides all citizens' eligibility data (income-based insurance contributions, demographic variables, and date of death), national screening data, and healthcare utilization data (inpatient and outpatient usage and prescription records) [19].

Algorithm
The study algorithm used codes from the Korean Standard Classification of Diseases, seventh revision (KCD-7), which are based on the International Classification of Diseases, 10th revision (ICD-10), and procedure codes for claims data from 2002 to 2018. It involved the following two steps: (1) determination of pregnancy outcomes (delivery, stillbirth, and abortion) and (2) estimation of the LMP. Women could contribute multiple pregnancy episodes, and all episodes indicative of pregnancy in women of childbearing age (12-49 years) at the time of the pregnancy outcome were included.

Determination of pregnancy outcomes
The following pregnancy outcomes were defined using KCD-7 or procedure codes based on pregnancy studies in Korea: delivery (full-term, preterm), stillbirth, and abortion (spontaneous, induced) [12,20-22]. A list of codes defining each outcome is provided in Supplementary Material 1. An obstetrician-gynecologist confirmed the definitions, which used KCD-7 codes for stillbirth and abortion and procedure codes for delivery, against clinical practice. During our pre-analysis of the NHID, we found that procedure codes for delivery were recorded in 99.4% of potential delivery claims, defined as claims indicating delivery by either KCD-7 or procedure codes (Supplementary Material 2). KCD-7 codes for abortion were present in 97.5% of potential abortion claims, defined as claims indicating abortion by either KCD-7 or procedure codes (Supplementary Material 3). Therefore, we chose to use procedure codes for delivery, as they are commonly recorded for reimbursement purposes, whereas for abortions, we opted for diagnosis codes, as not all abortions require medical procedures.
The clinically plausible duration required for subsequent episodes and a hierarchical approach were adapted from published pregnancy algorithms to identify true episodes from multiple claims (Figure 1A) [4-6,10]. Pregnancy episodes were defined as the period between the estimated date of the LMP and the pregnancy outcome. As illustrated in Supplementary Material 4, the first claim of pregnancy-related codes within the available timeframe of an individual patient was assigned as the first pregnancy episode. Multiple claims within the clinically plausible duration for subsequent episodes were considered to be part of the same episode, and claims outside the minimum duration were categorized as subsequent episodes (Supplementary Material 5). This step was repeated for all pregnancy outcomes. Subsequently, a hierarchical approach was applied to resolve overlapping episodes. Pregnancy episodes with a lower hierarchy were considered miscoded and excluded. Considering the validity of outcomes, the hierarchy of pregnancy outcomes was adapted from pregnancy studies using Korean claims databases [4-6,10]. A study using a claims database with a mother-infant link in Korea confirmed that 95.3% of all deliveries were linked to the infant record, indicating high validity of the delivery codes [20]. Stillbirth is considered a higher priority than delivery because it can coexist with delivery claims. Therefore, the following hierarchy of pregnancy outcomes was used in our study: stillbirth > delivery > abortion.
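The two steps described above, grouping claims into episodes and resolving overlaps by hierarchy, can be sketched in Python as follows. This is a minimal illustration and not the study's implementation: the function names, the tuple layout, and the 180-day gap in the usage example are hypothetical (the actual outcome-specific durations are listed in Supplementary Material 5), and measuring the gap from the first claim of each episode is an assumption.

```python
from datetime import date

# Hierarchy stated in the text: stillbirth > delivery > abortion.
PRIORITY = {"stillbirth": 0, "delivery": 1, "abortion": 2}


def group_claims(claim_dates, min_gap_days):
    """Collapse claims for ONE outcome type into episode anchor dates.

    Assumption: a claim starts a new episode only if it falls at least
    `min_gap_days` after the first claim of the current episode.
    """
    anchors = []
    for d in sorted(claim_dates):
        if not anchors or (d - anchors[-1]).days >= min_gap_days:
            anchors.append(d)
    return anchors


def resolve_overlaps(episodes):
    """Drop episodes that overlap a higher-priority episode.

    `episodes` is a list of (lmp, outcome_date, outcome) tuples
    (a hypothetical layout chosen for this sketch).
    """
    kept = []
    for ep in sorted(episodes, key=lambda e: PRIORITY[e[2]]):
        lmp, end, outcome = ep
        clash = any(
            PRIORITY[k[2]] < PRIORITY[outcome] and lmp <= k[1] and k[0] <= end
            for k in kept
        )
        if not clash:
            kept.append(ep)
    return sorted(kept, key=lambda e: e[1])


# Usage with made-up dates and a hypothetical 180-day minimum gap.
claims = [date(2010, 1, 1), date(2010, 1, 15), date(2011, 3, 1)]
anchors = group_claims(claims, 180)  # two episodes: 2010-01-01 and 2011-03-01

episodes = [
    (date(2010, 1, 1), date(2010, 10, 1), "delivery"),
    (date(2010, 7, 1), date(2010, 9, 10), "abortion"),  # overlaps the delivery
    (date(2011, 5, 1), date(2011, 7, 10), "abortion"),
]
resolved = resolve_overlaps(episodes)  # the overlapping abortion is dropped
```

Overlaps between episodes of the same outcome type are deliberately left untouched here, since the text handles successive episodes through the retry period rather than the hierarchy.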

Estimation of the last menstrual period
We estimated the LMP using two distinct methods, depending on the presence of procedures for abortion, sonography, and diagnosis for preterm delivery within each episode, as described in Figure 1B [3-5].
First, when abortion or sonography procedures or preterm delivery diagnoses were present, we estimated the LMP by subtracting the midpoint of the specified trimester or GA from the outcome date [6] (Supplementary Material 6-1). To ensure clinical validity, a hierarchy was applied based on reimbursement guidelines, verified by obstetrician-gynecologists, and codes with a narrower range of the indicated GA or trimester had higher priority. In cases of conflicting information on GA or trimesters on the same date, the lowest GA was chosen. Priority among sonography codes was given in the following order: second-trimester to third-trimester sonography (target scan), first-trimester sonography (target scan), first-trimester sonography, and second-trimester to third-trimester sonography. Preterm deliveries with a specified GA were given precedence over those without, with the LMP estimated by subtracting 35 weeks from the outcome date for unspecified preterm deliveries [4,23]. Among the candidate estimates, the latest LMP, corresponding to the smallest estimated GA, was selected, using abortion or sonography codes for abortions and sonography and preterm codes for deliveries.
Second, in the absence of codes indicating GA or trimester, we estimated the LMP by subtracting 39 weeks, 28 weeks, and 10 weeks from the date of the pregnancy outcomes for delivery, stillbirth, and abortion, respectively (Supplementary Material 6-2). This method is referred to as "outcome-specific estimates" and is similar to the conventionally used fixed-week subtraction method [4].
Final adjustment of the LMP was conducted using a retry period and setting minimum and maximum clinically plausible GAs for each outcome. The retry period, which represents the duration clinically required for subsequent pregnancy episodes to start after a pregnancy outcome, was applied when two successive pregnancy episodes overlapped [4] (Supplementary Material 7). As illustrated in Supplementary Material 8, if the estimated LMP was earlier than the date of the previous pregnancy outcome, the LMP of the subsequent pregnancy episode was adjusted to the date of the previous pregnancy outcome plus the retry period.
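The choice between the two estimation methods can be sketched as below. The fixed GAs (39, 28, and 10 weeks) come from the text; everything else is an assumption made for illustration. In particular, `estimate_lmp` and its `code_ga_weeks` argument are hypothetical names, and the mapping from each sonography, abortion, or preterm code to a GA at the outcome date (Supplementary Material 6-1) is not reproduced here, so the caller is assumed to supply those values.

```python
from datetime import date, timedelta

# Outcome-specific estimates stated in the text (weeks before the outcome date).
OUTCOME_SPECIFIC_WEEKS = {"delivery": 39, "stillbirth": 28, "abortion": 10}


def estimate_lmp(outcome_date, outcome, code_ga_weeks=()):
    """Return (estimated LMP, method used).

    `code_ga_weeks` holds GA estimates (in weeks, at the outcome date)
    implied by codes within the episode; how each code maps to a GA is
    an assumption left to the caller. When several are present, the
    smallest GA, i.e. the latest LMP, is used, as described in the text.
    """
    if code_ga_weeks:
        ga = min(code_ga_weeks)
        method = "code-based"
    else:
        ga = OUTCOME_SPECIFIC_WEEKS[outcome]
        method = "outcome-specific"
    return outcome_date - timedelta(weeks=ga), method


# No GA-bearing codes: fall back to 39 weeks for a delivery.
lmp_a, method_a = estimate_lmp(date(2017, 10, 2), "delivery")

# Two hypothetical code-derived GAs (35 and 33 weeks): the smaller one is used.
lmp_b, method_b = estimate_lmp(date(2017, 10, 2), "delivery", code_ga_weeks=[35, 33])
```

Returning the method alongside the date makes it straightforward to tabulate the proportion of episodes estimated by each approach.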
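A minimal sketch of the retry-period adjustment follows. The function name and the 28-day value in the usage example are hypothetical; the outcome-specific retry periods actually used are given in Supplementary Material 7.

```python
from datetime import date, timedelta


def apply_retry_period(lmp, prev_outcome_date, retry_days):
    """Shift an LMP that falls before the previous pregnancy outcome.

    If the estimated LMP is earlier than the previous outcome date, it is
    moved to that date plus the retry period; otherwise it is unchanged.
    `prev_outcome_date` is None for a woman's first episode.
    """
    if prev_outcome_date is not None and lmp < prev_outcome_date:
        return prev_outcome_date + timedelta(days=retry_days)
    return lmp


# Hypothetical example: the estimated LMP precedes the previous outcome.
adjusted = apply_retry_period(
    lmp=date(2015, 2, 10),
    prev_outcome_date=date(2015, 3, 1),
    retry_days=28,  # assumed value for illustration only
)
# adjusted is 2015-03-29 (previous outcome date + 28 days)
```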
Episodes not within the minimum and maximum GAs for each outcome were either reclassified or excluded [6-23]. Stillbirth episodes at less than 20 weeks GA were reclassified as abortions through case review. Deliveries at less than 37 weeks were classified as preterm and deliveries at 37 weeks or more as full-term births. Any pregnancy episodes with estimated GAs exceeding 42 weeks and deliveries at less than 20 weeks were considered miscoded and excluded. Abortion was further categorized into induced abortions, defined as abortions with diagnosis codes or procedure codes for induced abortion, and spontaneous abortions, defined as abortions without induced abortion codes (Supplementary Material 1).
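The plausibility rules in this paragraph can be written as a small decision function. This is a sketch under simplifying assumptions: the function name and return labels are hypothetical, the reclassification of early stillbirths is automated here although the text describes it as done through case review, and the induced versus spontaneous split of abortions (which depends on the codes in Supplementary Material 1) is omitted.

```python
def classify_episode(outcome, ga_weeks):
    """Apply the GA limits from the text; return a label or None if excluded."""
    if ga_weeks > 42:
        return None  # any episode exceeding 42 weeks is treated as miscoded
    if outcome == "delivery":
        if ga_weeks < 20:
            return None  # deliveries before 20 weeks are treated as miscoded
        return "preterm birth" if ga_weeks < 37 else "full-term birth"
    if outcome == "stillbirth":
        # Assumption: automated here; the study used case review.
        return "abortion" if ga_weeks < 20 else "stillbirth"
    return outcome  # abortions pass through unchanged


labels = [
    classify_episode("delivery", 39),    # full-term birth
    classify_episode("delivery", 35),    # preterm birth
    classify_episode("delivery", 18),    # excluded
    classify_episode("stillbirth", 16),  # reclassified as abortion
    classify_episode("abortion", 43),    # excluded
]
```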

Application of the pregnancy algorithm among patients with systemic lupus erythematosus
The pregnancy episodes of women of childbearing age (15-49 years) with SLE (KCD-7: M32.0, and rare intractable disease registration: V136) were identified from the NHID from 2002 to 2018. We included pregnancy episodes that occurred after the index date of SLE diagnosis and had an LMP between 2005 and 2018 (Supplementary Material 9). A 3-year history period was examined to confirm the diagnosis of SLE before pregnancy. In the drug utilization analysis, pregnancy episodes that started at least 1 year after the index date of SLE and had an LMP before 2018 were included to examine exposure during preconception and pregnancy. Episodes missing age, gender, or insurance information at the time of the LMP were excluded.

Identification of pregnancy outcomes and gestational age estimation
The prevalence of live births (full-term and preterm births), stillbirths, and abortions (spontaneous and induced abortions) was assessed in the pregnancies of patients with SLE between 2005 and 2018. The mean estimated GA and the proportion of each method used for estimating the LMP were calculated for each pregnancy outcome. Since the national health insurance in Korea expanded reimbursement for prenatal sonography and extended coverage for preterm infant care in 2016, corresponding changes were made to the sonography procedure codes and preterm delivery diagnosis codes from that year onwards [24,25] (Supplementary Materials 10-12). Therefore, subgroups of 2005-2015 and 2016-2017 were also assessed to detect the impact of the code changes implemented in 2016. The prevalence of pregnancy outcomes among patients with SLE, as determined by our algorithm, was compared with that reported in cohort studies, primarily utilizing institutional data, to assess external validity [21,26-31].

Medication use during pregnancy among patients with systemic lupus erythematosus
The European Alliance of Associations for Rheumatology and American College of Rheumatology guidelines for the management of pregnancy recommend avoiding cyclophosphamide (CYC), mycophenolate mofetil (MMF), and methotrexate (MTX) to prevent fetal loss or malformation during pregnancy [15,17]. It is recommended to avoid MTX and CYC before conception, and MMF must be discontinued at least 6 weeks before conception [17]. CYC is reserved for use only in the second and third trimesters in case of life-threatening disease [15,17].
The use of CYC/MMF/MTX during preconception (every 3 months before the LMP) and pregnancy periods (each trimester) was examined. To ascertain drug exposure and reduce misclassification bias, exposure to CYC/MMF/MTX was defined as having received more than one ( > 1) prescription during the relevant period, and a sensitivity analysis was conducted using an exposure definition of one or more ( ≥ 1) prescription [14]. Exposure to CYC/MMF/MTX was investigated based on pregnancy outcomes, and subgroup analysis was conducted for the periods of 2005-2015 and 2016-2017. The medication exposure window was defined based on our new algorithm to estimate the LMP and the conventional algorithm (using only outcome-specific estimates). All analyses were also applied to non-steroidal anti-inflammatory drugs (NSAIDs).
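The exposure definition can be illustrated with a short function that counts prescriptions falling in a window expressed in days relative to the LMP. The function name, the half-open window convention, and the day offsets in the usage example (90 days for 3 months, 56 days for 0-7 weeks) are assumptions for this sketch; the text specifies only the > 1 and ≥ 1 prescription thresholds.

```python
from datetime import date


def is_exposed(rx_dates, lmp, start_day, end_day, min_prescriptions=2):
    """Return True if enough prescriptions fall in [start_day, end_day).

    Offsets are days relative to the LMP (negative = preconception).
    min_prescriptions=2 corresponds to the main definition (> 1);
    min_prescriptions=1 corresponds to the sensitivity analysis (>= 1).
    """
    n = sum(start_day <= (d - lmp).days < end_day for d in rx_dates)
    return n >= min_prescriptions


# Made-up prescription dates for one episode.
lmp = date(2016, 6, 1)
rx = [date(2016, 4, 1), date(2016, 5, 1), date(2016, 6, 20)]

precon_main = is_exposed(rx, lmp, -90, 0)                       # True: two prescriptions
early_main = is_exposed(rx, lmp, 0, 56)                         # False: only one
early_sens = is_exposed(rx, lmp, 0, 56, min_prescriptions=1)    # True under >= 1
```

Because the windows are anchored to the LMP, the same function can be run with the LMP from either the new algorithm or the outcome-specific estimate to compare the resulting exposure classification.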

Ethics statement
The study protocol for the analysis of de-identified patient data was approved by Hanyang University Bioethics Committee (IRB No. HYUH 2020-05-041). The requirement for informed consent was waived by the institutional review board.

RESULTS
The LMP estimation for full-term births was primarily based on outcome-specific estimates (92.3%) (Table 2). For preterm births, sonography procedure codes (54.7%) and preterm delivery diagnosis codes (37.9%) most frequently determined the estimates. The LMP for stillbirths was mostly estimated using outcome-specific estimates (90.2%). The LMP for spontaneous abortion was estimated using outcome-specific estimates (58.2%) and abortion procedure codes (33.7%), whereas the LMP for induced abortion was primarily estimated using abortion procedure codes (95.2%). In Korea, sonography codes for pregnant women were implemented in 2013 and have been used widely since 2016 with coverage expansion for pregnant women (Supplementary Materials 10 and 11). As a result, since 2016, most LMP estimates have been based on sonography codes (Supplementary Material 14). The mean GAs in 2005-2015 and 2016-2017 showed statistically significant differences for live birth and spontaneous abortion (p < 0.001 and p = 0.005, respectively) (Supplementary Material 15). Additionally, a higher prevalence of preterm birth and abortion was observed in 2016-2017 than in 2005-2015 (Supplementary Material 16).
The utilization patterns of CYC/MMF/MTX and NSAIDs before and during pregnancy were investigated in patients with SLE (Figure 2). The percentage of CYC/MMF/MTX use was similar between the two definitions ( > 1 and ≥ 1 prescription). The use of CYC/MMF/MTX ( > 1) decreased from 7.6% at 1 year preconception to 0.2% at the end of pregnancy. Exposure to CYC/MMF/MTX occurred in 3.6% of cases during the 3 months before the LMP (-3 months to 0 months preconception) and in 2.5% of cases at 0-7 weeks of pregnancy. NSAID use decreased from 21.2% at 1 year preconception to 0.9% at the end of pregnancy. A sharp decrease in NSAID use was observed between -3 months to 0 months preconception and 0-7 weeks of pregnancy. Greater exposure to CYC/MMF/MTX during -3 months to 0 months preconception was observed in cases of spontaneous abortion than in cases of full-term birth (8.2% vs. 1.4%), which is consistent with observations in the subgroups of 2005-2015 and 2016-2017 (Supplementary Material 17).

DISCUSSION
We developed the present algorithm based on a previous algorithm and adapted it to a nationwide claims database in Korea by applying a hierarchy of pregnancy outcomes and incorporating preterm delivery, sonography, and abortion procedure codes to approximate the LMP. The current approach is the first to use the Korean administrative claims database to determine pregnancy outcomes and refine conventional outcome-specific methods for estimating the LMP.
This algorithm builds upon previous ones, particularly those developed by Matcho et al. [4] and Moll et al. [6], to improve the validity of pregnancy outcomes and the LMP using national administrative databases. Matcho et al. [4] and Moll et al. [6] demonstrated improved validity in LMP estimation compared to conventional outcome-specific approaches, which had previously reported 76.3% agreement rates of live birth with the clinical GA. This improvement could be attributed to the application of screening tests, minimum and maximum pregnancy terms, and exclusion of episodes not adhering to clinical guidelines [4,6]. Likewise, our algorithm incorporated screening tests, minimum and maximum pregnancy terms, and a clinically plausible period for the subsequent outcome and LMP.
Identifying preterm births from administrative claims databases is a known challenge in the literature, given the significant impact even a 1-week misclassification can have on pregnancy outcomes. Moll et al. [6] reported an agreement rate of 62.4% for preterm births with adjudicator-identified results, while revealing a much higher rate of 97.8% for full-term births. In another study using a British Columbia administrative database, Margulis et al. [23] reported positive predictive values (PPVs) ranging from 74% to 91% for the ICD-9/10-based definition of preterm status. In our study, the prevalence of preterm birth among patients with SLE was found to be 11%, which is lower than the range reported in institutional studies (13-40%) [26-30,32]. The underuse of preterm and sonography codes was observed before 2016 in the NHID, reflecting the expansion of reimbursement policy coverage in 2016 for preterm infant care and prenatal sonography [24,25]. Furthermore, preterm delivery diagnoses with a specified GA window were implemented in the Korean administrative claims database in 2016 (Supplementary Material 12). These changes in reimbursement policy contributed to underestimation of preterm birth prevalence before 2016.
From 2005 to 2015, underutilized preterm and sonography codes led to an underestimation of preterm births, while in 2016 and 2017, there was an observable shift towards overestimation (Supplementary Material 16). Interestingly, two institutional studies on SLE in Korea, conducted by Seo et al. [27] and Koh et al. [26], reported higher preterm birth rates (27.2% and 32.4%, respectively) than our estimates. These findings align with the prevalence estimated in our study after 2016, which was 33% for preterm birth < 37 weeks and 21% for preterm birth < 34 weeks (Supplementary Material 16). As such, while the use of our algorithm requires caution due to the absence of a validity study, we anticipate that our algorithm will achieve improved accuracy for the LMP with the increasing use of preterm diagnosis and sonography codes specifying GA or trimesters.
We also observed a higher prevalence of spontaneous abortion (28%) among patients with SLE compared to the previously reported range (5-22%) [21,27-30]. A prior study, which used nationwide claims databases in Korea, reported a spontaneous abortion prevalence of 22% but utilized a more restricted definition (O02-O06) than the definition used in our study (O01-O08) [21]. Applying the same definition led to a slightly lower estimated prevalence of spontaneous abortion at 26% (Supplementary Material 13). Several factors may contribute to the higher prevalence observed in our study, including the recent increase in maternal age in Korea and a longer follow-up period (2005-2017 vs. 2013-2015) [21,33].
Although a high prevalence of spontaneous abortion was observed in the present study, we implemented a hierarchical differentiation of various pregnancy outcomes, with abortion being the lowest category. Specifically, (1) abortion cases adjacent to delivery or stillbirth cases and (2) clinically implausible abortion cases based on both previous and subsequent episodes were excluded from abortion episodes in our final analysis. This approach is in line with the best-performing method identified in a previous validation study for non-live births using Medicaid in the Mass General Brigham Patient Registry, which showed adequate PPVs when excluding cases with adjacent codes for other pregnancy outcomes from the definition of abortion [32]. Further validation of abortion codes within the NHID is warranted due to differences in the healthcare system and coding practices.
In the drug utilization analysis, CYC/MMF/MTX exposure was successfully reduced from 7.6% before conception to 0.2% during the third trimester. However, CYC/MMF/MTX was still prescribed in 3.6% of cases during -3 months to 0 months before conception. A gradual reduction in CYC/MMF/MTX was observed before conception, in contrast to the drastic decrease in utilization seen with NSAIDs at conception. This pattern may reflect the clinical practice of discontinuing NSAIDs before the first trimester and suggests insufficient therapeutic interventions to discontinue CYC/MMF/MTX before conception in patients with SLE. Further studies are required to investigate patterns of CYC/MMF/MTX use during pregnancy in patients with SLE.
A key strength of our study is the large dataset of pregnant women with SLE, which is particularly valuable due to the rarity of this population. In contrast to previous studies with 100-200 SLE pregnancy cases, our study benefits from increased statistical power [26,27,29,31,32]. Our pregnancy algorithm refined previous outcome-specific methods; however, it is important to emphasize that this study is not a validation study using a database linked with electronic medical records or a pregnancy registry. Acknowledging the algorithm's limitations is essential, especially in the context of the Korean reimbursement system, where the widespread use of sonography procedures and preterm diagnosis codes began in 2016. Caution is needed when applying and interpreting data for different periods as well as when estimating preterm births. The utilization of O60 codes, which continued until 2010, might have included preterm labor without delivery, potentially resulting in an overestimation of preterm births. Preterm births were predominantly identified through sonography, which may include cases categorized as preterm based on GA but not necessarily requiring clinical care for preterm infants. Furthermore, sonography codes might underestimate GA and be given priority over the LMP estimated by the preterm diagnosis, as our algorithm prioritizes the LMP with the lowest GA. To address these complexities, additional validation studies should investigate the hierarchy between sonography and preterm codes.
Misclassification bias and limited clinical data are inherent drawbacks of claims databases, particularly in capturing pregnancy episodes without healthcare utilization, potentially resulting in outcome underestimation. However, algorithms such as those from Matcho et al. [4] and Moll et al. [6], using hierarchical approaches with various pregnancy markers, have demonstrated improved validity compared to conventional methods.
In this study, we developed an advanced algorithm for determining pregnancy outcomes and estimating the LMP using an administrative claims database in Korea. The algorithm incorporates a hierarchical approach to mitigate miscoding or cross-coding, a clinically plausible gap for subsequent LMP and outcomes, and diagnoses and procedures with the trimester or GA. Further validation studies, including comparison with pregnancy registries or electronic health records, are needed [34].