Risk of COVID-19 transmission in heterogeneous age groups and effective vaccination strategy in Korea: a mathematical modeling study

OBJECTIVES This study aims to analyze the possibility and conditions of maintaining an effective reproductive number below 1 using a mathematical model. METHODS The total population was divided into five age groups (0-17, 18-29, 30-59, 60-74, and ≥75 years). Maximum likelihood estimation (MLE) was used to estimate the transmission rate of each age group. Mathematical model simulation was conducted until December 31, 2021, by establishing various strategies for vaccination and social distancing without considering variants. RESULTS MLE results revealed that the group aged 0-17 years had a lower risk of transmission than other age groups, and the older age group had relatively high risks of infection. If 70% of the population will be vaccinated by the end of 2021, then simulations showed that even if social distancing was eased, the effective reproductive number would remain below 1 near August if it was not at the level of the third re-spreading period. However, if social distancing was eased and it reached the level of the re-spreading period, the effective reproductive number could be below 1 at the end of 2021. CONCLUSIONS Considering both stable and worsened situations, simulation results emphasized that sufficient vaccine supply and control of the epidemic by maintaining social distancing to prevent an outbreak at the level of the re-spreading period are necessary to minimize mortality and maintain the effective reproductive number below 1.


INTRODUCTION
was implemented in the metropolitan area.
Vaccination in Korea started for the first time on February 26, 2021, and as of May 25 2021, a total of 3,864,784 people has received at least one dose [6]. The Korean government plans to administer 190 million doses of the vaccine by the end of 2021. In order to minimize the number of severely ill patients and deaths, vaccinations are scheduled for the elderly in the second quarter, starting with residents and workers of long-term care hospitals and nursing facilities and employees at high-risk medical institutions [7,8]. According to the press release dated May 6, 2021 of the Korea Disease Control and Prevention Agency (KDCA), the effectiveness of vaccines was found to be 86% for AZD1222 and 89.7% for BNT162 two weeks after the first dose for those aged ≥ 60 years, and the adverse reaction reporting rate was 0.1% [8].
Mathematical models have been developed to study the spread of COVID-19 in Korea even before the first epidemic wave. When COVID-19 was only prevalent in China in the early 2020s, a study was conducted to investigate the scale of an outbreak once the disease arrives in Korea. The model was used to interpret initial epidemic situation considering changes in the population behavior. Another study was conducted to analyze the effect of social distancing using mobility data observed during the first epidemic wave [9][10][11].
Motivated by earlier studies, this study aims to formulate a mathematical model that considers age groups and vaccination. The period for the model simulation is from October 12, 2020, to May 25, 2021, and the extended simulation includes the vaccination plan for 2021. Considering factors such as social distancing, amount of vaccination, and vaccination method, this study aims to examine the possibility of maintaining the effective reproductive number below 1 and design strategies to minimize mortality.

Estimation of transmission rate matrix between age groups using maximum likelihood estimation
Age, presence of symptoms, symptom onset date, and diagnosis date of individual confirmed cases from October 12, 2020, to February 14, 2021, provided by the KDCA were used to estimate transmission rates between age groups using the maximum likelihood estimation (MLE) method. The total population was divided into five groups considering vaccination policy and behavioral heterogeneity, and accordingly each age group and population were as follows [12]: I: 0-17 years (7,625,814 people); II: 18-29 years (7,781,166 people); III: 30-59 years (23,650,843 people); IV: 60-74 years (9,020,169 people); V: ≥ 75 years (3,624,108 people).
The transmission rates between age groups form a 5 × 5 matrix. It is assumed that the transmission rate matrix is symmetric, and the number of estimated transmission rates is 15. To establish the MLE, we assume a homogenous mixing of infectors and infectees in the community and an exponential distribution for the event of infection. Hence, if the rate at which an individual in age group Y infects an individual in age group X is β XY , then the probability that individual i in age group X at time t is not infected at time t+1 is p sur,X,i (t)= The transmission rates between age groups form a 5×5 matrix is symmetric, and the number of estimated transmission rates is homogenous mixing of infectors and infectees in the community infection. Hence, if the rate at which an individual in age group then the probability that individual in age group at time [13]. Here, ( ) is the number of infected p disease at time , and is the total population. Conversely, the ). The likelihood of all individ moment) can be formulated using the probability of being not in Since there was no information on the date when individual pati [13]. Here, I Y (t) is the number of infected people in age group Y that can transmit the disease at time t, and N is the total population. Conversely, the probability of being infected at time t+1 is The transmission rates between age groups form a 5 is symmetric, and the number of estimated transmissio homogenous mixing of infectors and infectees in the c infection. Hence, if the rate at which an individual in a then the probability that individual in age group [13]. Here, ( ) is the number of disease at time , and is the total population. Conv ). The likelihood of moment) can be formulated using the probability of be Since there was no information on the date when indiv The likelihood of all individuals (including those not infected until the last moment) can be formulated using the probability of being not infected and infected, which is as follows: and population were as follows [12]: Ⅰ: 0-17 years (7,625,814 people); Ⅱ: 18-29 years (7,781,166 people); Ⅲ : 30-59 years (23,650,843 people); Ⅳ: 60-74 years (9,020,169 people); Ⅴ: ≥75 years (3,624,108 people). ) [13]. Here, ( ) is the number of infected people in age group that can transmit the disease at time , and is the total population. Conversely, the probability of being infected at time + 1 is ). The likelihood of all individuals (including those not infected until the last moment) can be formulated using the probability of being not infected and infected, which is as follows Since there was no information on the date when individual patients were infected, symptomatic patients are Since there was no information on the date when individual patients were infected, symptomatic patients are assumed to be infected four days before symptom onset and can transmit the virus two days before symptom onset until one day before diagnosis [14,15]. The periods of infection transmission for asymptomatic infected patients were set in the same way as those with symptoms using the average value of symptomatic patients.

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecting vaccination was developed. According to the epidemiological characteristics of COVID-19, the total population was divided into Susceptible (S), Latent or Exposed (E), Infectious (I), Isolated (Q), and Recovered (R) groups. Three additional groups (U, V, P) were considered to reflect vaccination: the group who is vaccinated but not effectively (Unprotected: U), the group that had been effectively vaccinated but not yet immunized (Vaccinated: V), and the group that was effectively vaccinated and became immunized after a certain period (Protected: P). Therefore, the groups S, U, and V can be infected, and it was assumed that all had the same susceptibility. The mathematical model flowchart is shown in Figure 1, and the governing differential equations are as follows.
assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti According to the epidemiological characteristics of COVID-19, the total popula ( ), Latent or Exposed ( ), Infectious ( ), Isolated ( ), and Recovered ( ) grou , ) were considered to reflect vaccination: the group who is vaccinated but n the group that had been effectively vaccinated but not yet immunized (Vaccinat effectively vaccinated and became immunized after a certain period (Protected:   and can be infected, and it was assumed that all had the same susceptibility.
flowchart is shown in Figure 1, and the governing differential equations are as f assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti According to the epidemiological characteristics of COVID-19, the total popula ( ), Latent or Exposed ( ), Infectious ( ), Isolated ( ), and Recovered ( ) grou , ) were considered to reflect vaccination: the group who is vaccinated but n the group that had been effectively vaccinated but not yet immunized (Vaccinat effectively vaccinated and became immunized after a certain period (Protected: and can be infected, and it was assumed that all had the same susceptibility.
flowchart is shown in Figure 1, and the governing differential equations are as f assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti According to the epidemiological characteristics of COVID-19, the total popula ( ), Latent or Exposed ( ), Infectious ( ), Isolated ( ), and Recovered ( ) grou , ) were considered to reflect vaccination: the group who is vaccinated but n the group that had been effectively vaccinated but not yet immunized (Vaccinat effectively vaccinated and became immunized after a certain period (Protected: and can be infected, and it was assumed that all had the same susceptibility.
flowchart is shown in Figure 1, and the governing differential equations are as f assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti According to the epidemiological characteristics of COVID-19, the total popula ( ), Latent or Exposed ( ), Infectious ( ), Isolated ( ), and Recovered ( ) grou , ) were considered to reflect vaccination: the group who is vaccinated but n the group that had been effectively vaccinated but not yet immunized (Vaccinat effectively vaccinated and became immunized after a certain period (Protected: and can be infected, and it was assumed that all had the same susceptibility.
flowchart is shown in Figure 1, and the governing differential equations are as f assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti According to the epidemiological characteristics of COVID-19, the total popula ( ), Latent or Exposed ( ), Infectious ( ), Isolated ( ), and Recovered ( ) grou , ) were considered to reflect vaccination: the group who is vaccinated but n the group that had been effectively vaccinated but not yet immunized (Vaccinat effectively vaccinated and became immunized after a certain period (Protected: and can be infected, and it was assumed that all had the same susceptibility.
flowchart is shown in Figure 1, and the governing differential equations are as f assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti According to the epidemiological characteristics of COVID-19, the total popula ( ), Latent or Exposed ( ), Infectious ( ), Isolated ( ), and Recovered ( ) grou , ) were considered to reflect vaccination: the group who is vaccinated but n the group that had been effectively vaccinated but not yet immunized (Vaccinat effectively vaccinated and became immunized after a certain period (Protected: and can be infected, and it was assumed that all had the same susceptibility.
flowchart is shown in Figure 1, and the governing differential equations are as f assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti According to the epidemiological characteristics of COVID-19, the total popula ( ), Latent or Exposed ( ), Infectious ( ), Isolated ( ), and Recovered ( ) grou , ) were considered to reflect vaccination: the group who is vaccinated but n the group that had been effectively vaccinated but not yet immunized (Vaccinat effectively vaccinated and became immunized after a certain period (Protected: and can be infected, and it was assumed that all had the same susceptibility.
flowchart is shown in Figure 1, and the governing differential equations are as f assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti According to the epidemiological characteristics of COVID-19, the total popula ( ), Latent or Exposed ( ), Infectious ( ), Isolated ( ), and Recovered ( ) grou , ) were considered to reflect vaccination: the group who is vaccinated but n the group that had been effectively vaccinated but not yet immunized (Vaccinat effectively vaccinated and became immunized after a certain period (Protected: and can be infected, and it was assumed that all had the same susceptibility.
flowchart is shown in Figure 1, and the governing differential equations are as f assumed to be infected four days before symptom onset and can transmit the vir onset until one day before diagnosis [14,15]. The periods of infection transmiss patients were set in the same way as those with symptoms using the average val

Mathematical modeling for the COVID-19 pandemic
Based on the five age groups considered earlier, a mathematical model reflecti  In the model, the subscript X denotes the age group, and β XY was the estimated transmission rate between age groups using MLE, which represented the average transmission among age groups during the third wave of the epidemic. It was assumed that the transmission rates between age groups were the same during the entire simulation period. However, epidemic trends change depending on intervention policies and transmission patterns, so an adjusting constant, β 0 (t), was considered to the force of infection to account for intervention policy and transmission pattern that changed every period. Vaccination was reflected in the model using ν X , and the actual amount of vaccination was reflected until the estimated simulation period of β 0 (t) [6]. The phase dependent parameter β 0 (t) was estimated by mini mizing the square error of the number of cumulative confirmed cases in the model (∫∑αI X dt) and the reported data. The period of estimation was from October 12, 2020, the start of social distancing phase 1 before the third wave in Korea, to May 25, 2021 [16]. The parameter, p, represents the average vaccine effectiveness. The amount of vaccination and vaccine effectiveness of each type of vaccine administered in Korea were calculated as a weighted average and set to 0.84 [17]. The values of the model parameters are listed in Table 1, and the mortality rate by age group was calculated based on the data of individual confirmed cases who were released from isolation. The effective reproductive number R(t) calculated using the next-generation method was presented in the results section [18].

Vaccination scenarios
In order to predict the situation of the COVID-19 epidemic following vaccination, scenarios were considered based on the phase transition criteria of the social distancing system reorganization plan of the Ministry of Health and Welfare [21]. Under the reorganization plan, the social distancing phase is determined by the weekly average number of new confirmed cases per 100,000 population, considering the availability of regional intensive care units. At the national level, the transition from phase 1 to phase 2 requires that the weekly average number of new confirmed cases per 100,000 population is 363. To transition to phases 3 and 4, the weekly average number of new confirmed cases per 100,000 population are 778 and 1,556 people, respectively. Phase 3 was not declared until May 25, 2021, since the weekly average of confirmed cases was below 600. Scenarios were designed by assuming that phases 1 and 2 would continue during the second half of 2021 without the emergence of variants, re-spreading, or eradication of the COVID-19 epidemic. In both scenarios, the obtained transmission rate matrix and estimated values of adjustment factor during the third epidemic wave β 0 (t) were used. ■ Scenario 1 (S1): Phase 1 is applied with the largest value of β 0 (t) during the third wave (when R(t) = 1.68). ■ Scenario 2 (S2): Phase 1 is applied with the value of β 0 (t) during the period right before the third wave (when R(t) = 1.22, October 12 to November 3, 2020). Note that phase 2 is applied with the smallest value of β 0 (t) during the third wave both in scenarios 1 and 2 (when R(t) = 0.71).
Three vaccination strategies were considered; priority vaccination for the elderly group (in the order of group V-IV-III-II), simultaneous vaccination with the same proportion of all age groups except for those under aged 17 (I), and no vaccination. In the priority vaccination strategy, the next age group beginning from the oldest, was vaccinated after all the individuals in their age group were given the full dose of vaccine. In both priority and simultaneous vaccination strategies, individuals in age group I are not vaccinated. Since the total amount of vaccination may change flexibly depending on the situation, the target proportion of the population to be vaccinated by December 31, 2021 is set to 60%, 70%, or 80% (31,021,260, 36,191,470, and 41,361,680) of the total population of Korea. This means that if 60% (70 to 80%) of the total population is vaccinated, approximately 70% (82 to 94%) of the remaining age groups except for the age group I will be vaccinated. Accordingly, the total number of simulations is 14. The simulation period for the scenarios begins from May 25, 2021, the last day of parameter estimation, to December 31, 2021, and the daily vaccination dose is given equally, on average, according to the set total vaccination dose. The observed figures are the number of additional daily new confirmed cases since May 25, 2021, cumulative confirmed cases and deaths, effective reproductive number, and time when the effective reproductive number is consistently below 1 (when the effective reproductive number is below 1 even if the level of social distancing is eased).

Estimation of transmission rate matrix between age groups using maximum likelihood estimation
Estimated transmission rate between age groups (β XY ) is visualized in Figure 2. Figure 2A is the value of the log-scaled transmission rates, and Figure 2B is the result of calculating the risk of being infected adjusted according to the population of each age group. The entries of the matrix are listed in Supplementary Material 1. The transmission rate was estimated to be high in the diagonal components; the highest was for the group aged 18-29 years (II) at 0.79, and the second highest was for the group aged ≥ 75 years (V) at 0.78. Figure 2B displays the group dependent risk of exposure to infection using the adjusted population. This risk was calculated by multiplying the sum of the column elements by the value obtained after dividing the population in the age group corresponding to each column by the total population. This relative risk of infection is in the order of IV-II-V-III-I, from the highest. The group aged 60-74 years (IV) had the highest risk of being infected by other groups at 0.15, and among them, the infection risk by the group aged 30-59 years (III) accounted for more than half with 0.09. The age groups from 0 years to 59 years (I, II, III) had the lowest risk of infection by other groups at 0.07. The risk of infection among the same age group was the highest at 0.12 in the group aged 18-29 years (II) and the lowest at 0.05 in those aged 60-74 years (IV). Figure 3 shows the estimation results of the adjusting constant, β 0 (t), according to the government's intervention policy. Figure  3A is the effective reproductive number, Figure 3 are the mathematical model simulation results (dark curve) and the KD CA press release data (red squares) for the number of daily and cumulative confirmed cases for all age groups, respectively. The graphs comparing the number of confirmed cases by age group and the model simulation results are shown in Figure 3D and E.

Analysis of the effective reproductive number and the ratio of confirmed cases by age group
The adjusting constant, β 0 (t), was estimated to be 1.09 (effective reproductive number of 1.22) between October 12 and November 3, 2020, when phase 1 of social distancing was implemented at the start time of the model, and it was estimated to be the highest at 1.50 (effective reproductive number of 1.68) between November 4 and November 24, 2020. It was estimated to be the smallest at 0.64 (effective reproductive number of 0.71) between December 23, 2020 and January 18, 2021, when the social distancing was strengthened to phase 2.5, and when the ban on private gatherings with five or more people began in the metropolitan area. Figure 4 shows mathematical model simulation results for each age group until May 25, 2021. In Figure 4A, the group aged 30-59 years (III) accounts for the largest number (n = 51,331) of the total number of confirmed cases. Figure 4B shows the number of confirmed cases per 100,000 people, adjusted with the number of people in each age group. It was observed that the group aged 0-17  Figure 5 shows the simulation results for the scenarios when 70% of the total population (approximately 140,000 vaccination per day) is vaccinated by December 31, 2021. The colors of the graph indicate the vaccination strategies. The gray curve shows the case when there is no vaccination, the red curve indicates that all age groups are vaccinated at the same rate, and the blue curve indicates that the older age group is vaccinated first. In each strategy, the solid and the dotted lines describe scenarios where β 0 (t) is varied according to the social distancing level. The dotted line is the simulation result of scenario 1, and the solid line is scenario 2. If there is a point when the effective reproductive number is consistently below 1, the time is marked with an asterisk. The results of vaccinating 60% and 80% of the total population and varying the vaccine effectiveness to 65%, 70%, 75%, 80%, and 90% are summarized in Supplementary Material 2. Table 2 summarizes the time when the reproductive number reaches consistently below 1, the number of additional confirmed cases, and the additional mortality number for each scenario. It was observed that the case when the total population was simultaneously vaccinated with equal rates, except for the group aged under 17, had fewer confirmed cases than the elderly priority vaccination strategy. If 80% of the total population was vaccinated at the same rate, the number of additional confirmed cases was 44,859, the smallest among all scenarios. It was identified that in all sce-

DISCUSSION
Estimation of the transmission rate between age groups using MLE showed a high transmission rate among the group aged 18-29 years and in the group aged ≥ 60 years ( Figure 2). The result adjusted by the population showed that the risk of spreading the disease in the third wave of the epidemic was not limited to a specific age group but was observed in all age groups. The group aged 0-17 years showed a relatively low transmission rate compared to other age groups, not only due to the effect of school closures, delay of school opening, and online lessons, which are part of social distancing, but also due to the nature of the virus itself, in the sense that the transmission rate of the virus among children and adolescents is lower than that of adults [22]. The school-age group had an estimated nine times less transmission rate than the rest of the age groups [23].
The results of the extended model simulation from May 25, 2021 extended to December 31, 2021, show that maintaining social distancing is crucial for an effective vaccination strategy. Scenarios S1 and S2 were established considering different vaccination strategies (simultaneous vaccination for all age groups and priority vaccination for the elderly) and social distancing situations (considering whether the transmission rate would increase to the level of a re-spreading of third wave (S1, at the time when R(t) = 1.68), or whether it would be similar to phase 1 of distancing implemented on October 12, 2020 (S2, at the time when R(t)= 1.22)). In S1, with the elderly having vaccination priority, even if 70% of the population is vaccinated during 2021, it is not until December that the effective reproductive number can start to be consistently below 1. This is not, however, conclude that priority vaccination for the elderly is not appropriate. Rather, when the elderly is vaccinated on priority, mortality number is always lower than of the scenarios when the vaccine is administered in equal proportion to the total population, excluding 588 people in the worst-case scenario (vaccination rate of 60%, scenario 1) ( Table  2). Simulation results showed that if social distancing is maintained as in S2, it is possible that the effective reproductive number begins to remain consistently below 1 in August when 70% of the population is vaccinated by the end of December ( Figure 5D and E solid lines).
This study had some limitations. First, to reduce the complexity of the model, second dose vaccination was not considered, and the type of vaccine and vaccine effectiveness were reflected on an average. In addition, waning immunity to vaccines and having less probability of being infected due to vaccination were not considered. Second, the transmission rate between age groups is likely to have changed due to vaccination that has begun recently, and in particular, the transmission rate in the elderly group may have been overestimated in this study. However, this study reflected the overall average transmission during the third wave of the epidemic. Due to data limitations, the estimated transmission rate, assuming homogeneous mixing in the community, does not imply the causal relationship of the actual infection. Hence, it may differ from the actual contact trend. Third, this study did not reflect the increase in transmission due to the appearance of variants. Hence, reaching a consistent effective reproductive number of below 1 may be delayed in each scenario due to the emergence of variants. However, in the eased social distancing scenario of this study, the high transmission rate (at the time of R(t) = 1.68) can also be interpreted as one reflecting the increase in transmission caused by variants. Further studies will be conducted considering second dose vaccination, transmission rate by epidemic period, waning immunity, detailed vaccine effectiveness, and variants.
In conclusion, this study developed a mathematical model where the transmission rates between age groups are estimated using MLE based on Korean data and analyzed the results in terms of effective reproductive number, the number of infected people, and minimizing mortality number by simulating various scenarios considering the 2021 vaccination plan. A critical result of this study is that effective vaccination strategies should be combined with vaccination priority and rapid vaccination as well as social distancing to prevent the effective reproductive number from reaching the level of re-spreading of third wave. The priority vaccination strategy for the elderly significantly reduces the mortality number, indicating that the purpose and effect of currently established vaccination priorities are the same. However, in the worsening situation, simulation results show that the epidemic will continue in 2021, and even in such situations, the effectiveness of minimizing the mortality number remains unchanged. Thus, the results of the study emphasize the importance of social distancing rather than changing vaccination priorities. It is identified through mathematical modeling, that if 70% of the population is vaccinated by the end of December 2021 and social distancing is implemented to prevent the level of re-spreading during the third wave, the effective reproductive number may continue to be below 1 from around August 2021.