

Epidemiol Health, Volume 42, 2020
Kanagarathinam and Sekar: Estimation of the reproduction number and early prediction of the COVID-19 outbreak in India using a statistical computing approach


Coronavirus disease 2019 (COVID-19), which causes severe respiratory illness, has become a pandemic, and the World Health Organization has declared it a public health emergency of international concern. We developed a susceptible-exposed-infected-recovered (SEIR) model for COVID-19 to show the importance of estimating the reproduction number (R0). This work focuses on predicting the COVID-19 outbreak in India at an early stage, based on an estimate of R0. The developed model will help policymakers take active measures before COVID-19 spreads further. Data on daily new cases in India from March 2, 2020 to April 2, 2020 were used to estimate R0 with the earlyR package. The maximum-likelihood approach was used to analyze the distribution of R0 values, and bootstrap resampling was applied to identify the most likely R0 value. We estimated the median value of R0 to be 1.471 (95% confidence interval [CI], 1.351 to 1.592) and predicted that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in 30 days.


Coronavirus disease 2019 (COVID-19) has spread rapidly worldwide, with 896,450 confirmed cases and 45,526 deaths reported globally as of April 2, 2020 [1]. The disease first emerged as a cluster of 27 cases of pneumonia of unknown cause in Wuhan, China. The first COVID-19 case in India was identified on January 30, 2020, and the total number of reported cases had reached 2,322 as of April 3, 2020 [2]. On March 3, 2020, the Indian government suspended all new visas and previously issued visas for nationals of Iran, Italy, Japan, and Korea, and on the following day it implemented compulsory screening of all international passengers. On March 24, 2020, the Indian government declared a 21-day nationwide lockdown to control the spread of COVID-19, which has developed into a pandemic. The transmission rate of COVID-19 has been relatively low in most countries, although major outbreaks have occurred in a few, such as Iran, Italy, Japan, and Korea. Most countries are likely to experience at least an early phase of COVID-19 spread before any mitigation measures take effect [3]. Myers et al. [4] stated that accurate epidemic forecasting models would noticeably improve epidemic prevention and control capabilities. No vaccine is available for COVID-19, and vaccination is typically not a viable option for stopping the spread of a new epidemic, because developing a safe and effective vaccine takes considerable time (approximately 10 years) [5]. Li et al. [6] estimated the mean incubation period of COVID-19 to be 5.2 days (95% confidence interval [CI], 4.1 to 7.0) and found evidence of human-to-human transmission among close contacts. Because India is the second most populous country in the world, it is important to estimate the transmissibility of COVID-19 and to predict the total number of new cases, which will help focus attention on this public health crisis.
Mathematically based epidemic models, such as susceptible-infected-recovered (SIR) models [7], susceptible-infected-susceptible (SIS) models [8], susceptible-exposed-infected-recovered (SEIR) models [9], and susceptible-exposed-infected-recovered-susceptible (SEIRS) models [10], are used to predict the trajectory of epidemics. The reproduction number (R0) can be estimated statistically or empirically. In this work, we used the earlyR package (https://cran.r-project.org/) to estimate R0 and predict the trajectory of the outbreak.


Susceptible-exposed-infected-recovered mathematical model

SEIR models can be used to predict the number of infected people on the basis of R0. In this study, we present an SEIR model with delay [11] to demonstrate the importance of estimating R0. COVID-19 has an incubation period, also referred to as the latent period or latent delay (τ), of 2 to 14 days. The following assumptions were made in developing the mathematical model for COVID-19:
- The population of the region/country grows exponentially, and the COVID-19 epidemic occurs over a sufficiently short period
- Infected individuals are assumed not to give birth
- Individuals leaving the infected class recover and acquire permanent immunity with probability f (0 ≤ f ≤ 1) or die from the disease with probability 1 − f
With S denoting susceptible individuals, E individuals who were exposed within the preceding latent period τ and are not yet infectious, I infected individuals, and R individuals who have recovered from COVID-19, the resulting delay differential equations are:
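$$\frac{dS(t)}{dt} = bS(t) + bE(t) + bR(t) - \mu S(t) - \gamma I(t)\frac{S(t)}{N(t)}$$
$$\frac{dE(t)}{dt} = \gamma I(t)\frac{S(t)}{N(t)} - \gamma I(t-\tau)\frac{S(t-\tau)}{N(t-\tau)}e^{-\mu\tau} - \mu E(t)$$
$$\frac{dI(t)}{dt} = \gamma I(t-\tau)\frac{S(t-\tau)}{N(t-\tau)}e^{-\mu\tau} - \mu I(t) - \alpha I(t)$$
$$\frac{dR(t)}{dt} = f\alpha I(t) - \mu R(t)$$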
where μ is the per capita death rate from causes other than the disease, γ is the contact (transmission or infection) rate, α is the recovery rate, and b is the per capita birth rate (with b > μ).
At any instant,
$$S(t) + E(t) + I(t) + R(t) = N(t)$$
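The delay system above can be explored numerically. The sketch below is a minimal forward-Euler integration in Python, written under assumptions that are not part of the article: nobody is exposed or recovered at t = 0, no infections occurred before t = 0 (so the delayed term is zero until t = τ), and a fixed step of dt = 0.01 day is small enough for Euler's method. The function name `simulate_delay_seir` and all parameter values in the usage line are hypothetical and chosen only for illustration.

```python
import math


def simulate_delay_seir(b, mu, gamma, alpha, f, tau, S0, I0, days, dt=0.01):
    """Forward-Euler sketch of the delay SEIR equations for S, E, I, R.

    Assumptions (not from the article): E(0) = R(0) = 0, no infections
    before t = 0, and a fixed Euler step dt.
    """
    S, E, I, R = float(S0), 0.0, float(I0), 0.0
    lag = int(round(tau / dt))       # latent delay tau expressed in time steps
    history = []                     # past values of gamma * I * S / N
    for n in range(int(round(days / dt))):
        N = S + E + I + R            # conservation: N(t) = S + E + I + R
        new_exposed = gamma * I * S / N
        history.append(new_exposed)
        # People exposed at t - tau who survive the latent period become infectious;
        # assumed zero for t < tau because nobody was infected before t = 0.
        delayed = history[n - lag] * math.exp(-mu * tau) if n >= lag else 0.0
        dS = b * S + b * E + b * R - mu * S - new_exposed
        dE = new_exposed - delayed - mu * E
        dI = delayed - mu * I - alpha * I
        dR = f * alpha * I - mu * R
        S, E, I, R = S + dt * dS, E + dt * dE, I + dt * dI, R + dt * dR
    return S, E, I, R


# Hypothetical per-day parameters, for illustration only.
S, E, I, R = simulate_delay_seir(b=5e-5, mu=2e-5, gamma=0.25, alpha=1 / 7, f=0.98,
                                 tau=5.2, S0=1_000_000, I0=10, days=60)
print(round(S), round(E), round(I), round(R))
```

Keeping the full history of the incidence term γI(t)S(t)/N(t) is what lets the code look up the delayed term at t − τ; a dedicated delay-differential-equation solver would be the more accurate choice for a real analysis.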
R0 is defined as
$$R_0 = \frac{\gamma e^{-b\tau}}{b+\alpha}$$
This constant is essential for characterizing the spread of COVID-19: it represents the average number of secondary infections caused by a single infectious individual in a fully susceptible population. In general, if R0 > 1, each case generates more than one secondary case on average and the disease spreads through the population; if R0 < 1, the outbreak eventually dies out. According to WHO information as of January 23, 2020, the R0 of COVID-19 lies between 1.4 and 2.5. R0 may vary considerably not only between infectious diseases but also for the same disease in different populations [12].
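As a quick numerical illustration of the threshold, the formula for R0 in the SEIR model with delay can be evaluated directly. The sketch below is a minimal Python version; the function name `reproduction_number` and the per-day parameter values are hypothetical and are not estimates from this study.

```python
import math


def reproduction_number(gamma, b, alpha, tau):
    """R0 = gamma * exp(-b * tau) / (b + alpha) for the delay SEIR model."""
    return gamma * math.exp(-b * tau) / (b + alpha)


# Hypothetical per-day parameters, for illustration only.
r0 = reproduction_number(gamma=0.25, b=5e-5, alpha=1 / 7, tau=5.2)
print(round(r0, 2), "spreading" if r0 > 1 else "dying out")  # prints: 1.75 spreading
```

Because b is small compared with α in this example, the factor e^(−bτ) barely changes R0, and the ratio γ/(b + α) dominates.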


All data shown in Table 1 were collected from the official website of the Ministry of Health and Family Welfare, Government of India [2]. The daily counts of new confirmed cases from March 2, 2020 to April 2, 2020 (Table 1) were used to estimate R0. A higher R0 indicates a higher likelihood of new infections.

Model development

The transmissibility of COVID-19 in India was evaluated using the earlyR package. We assumed that the interventions implemented so far had had minimal impact on COVID-19 transmission in India. The model used herein is a simplified version of the model introduced by Cori et al. [13]. Estimating R0 requires the mean and standard deviation (SD) of the serial interval distribution; based on existing research [14], we assumed a mean of 4.7 days and an SD of 2.9 days. The maximum-likelihood (ML) approach was applied to obtain the distribution of R0, and bootstrap resampling (1,000 replicates) was applied to obtain likely R0 values. The R package projections was used to predict the cumulative daily incidence [15], and we forecast the cumulative number of new cases over the next 30 days. The daily incidence is assumed to follow a Poisson distribution with mean R0 λ(t), where λ(t) is the daily infectiousness, defined as
$$\lambda(t) = \sum_{k=1}^{t-1} X_k V(t-k)$$
where X_k is the observed incidence on day k and V(t − k) is the probability mass function of the discretized serial interval distribution evaluated at a lag of t − k days. The forecasts therefore depend on the current incidence and the serial interval distribution, and the projections were based on resampling and probability computations. The statistical analysis and model development were performed using R version 3.6.3 (https://cran.r-project.org/bin/windows/base/old/3.6.3/).
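To make the estimation step concrete, the following is a minimal Python sketch of this Poisson model applied to the Table 1 counts. It is not the earlyR implementation: the serial interval is discretized crudely (a gamma density with mean 4.7 days and SD 2.9 days, evaluated at integer lags of 1 to 30 days and normalized), which is an assumption, so its estimate is not expected to reproduce 1.471 exactly. Under X_t ~ Poisson(R λ(t)), the log-likelihood Σ[X_t log(R λ(t)) − R λ(t)] has derivative ΣX_t/R − Σλ(t), so it is maximized at R = ΣX_t / Σλ(t), which the code uses directly. All function names are hypothetical.

```python
import math

# Daily new confirmed cases in India, March 2 to April 2, 2020 (Table 1).
CASES = [2, 1, 22, 2, 1, 3, 5, 5, 6, 10, 13, 8, 16, 10, 11, 19,
         14, 22, 50, 60, 77, 74, 85, 87, 88, 140, 84, 106, 227, 146, 437, 235]


def serial_interval_pmf(mean=4.7, sd=2.9, max_lag=30):
    """Crude discretization (assumption): gamma density at lags 1..max_lag, normalized.

    List index = lag in days; lag 0 gets probability 0.
    """
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    w = [0.0] + [math.exp((shape - 1) * math.log(k) - k / scale
                          - math.lgamma(shape) - shape * math.log(scale))
                 for k in range(1, max_lag + 1)]
    total = sum(w)
    return [x / total for x in w]


def infectiousness(cases, V):
    """lambda(t) = sum over k < t of X_k * V(t - k), ignoring lags beyond len(V) - 1."""
    return [sum(cases[k] * V[t - k] for k in range(max(0, t - len(V) + 1), t))
            for t in range(len(cases))]


def log_likelihood(R, cases, lam):
    """Poisson log-likelihood of R, without the constant -log(X_t!) terms."""
    return sum(cases[t] * math.log(R * lam[t]) - R * lam[t]
               for t in range(1, len(cases)) if lam[t] > 0)


def ml_reproduction_number(cases, lam):
    """Closed-form maximizer of the Poisson likelihood: sum(X_t) / sum(lambda(t))."""
    used = [t for t in range(1, len(cases)) if lam[t] > 0]
    return sum(cases[t] for t in used) / sum(lam[t] for t in used)


V = serial_interval_pmf()
lam = infectiousness(CASES, V)
print(f"ML estimate of R: {ml_reproduction_number(CASES, lam):.3f}")
```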

Ethics statement

The analysis in this article is based on publicly available data; therefore, approval from an ethics committee was not required.


Figure 1 shows the daily incidence of COVID-19 in India from March 2, 2020 to April 2, 2020. Figure 2 shows the distribution of likely values of R0 for COVID-19 in India. We estimated the ML value of R0 in the early stage of the outbreak in India as 1.471 (95% CI, 1.351 to 1.592). Figure 3 shows a histogram of 1,000 likely R0 values obtained by bootstrap resampling.
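The article describes this step as bootstrap resampling. As an assumption about how such likely R values can be generated, the sketch below draws 1,000 values of R on a grid with probability proportional to the Poisson likelihood described in the Model development section; because that likelihood is proportional to R^(ΣX_t) e^(−R Σλ(t)), only the two sums are needed. The function names, grid settings, and the input sums (2,000 and 1,000) are hypothetical placeholders, not values from this study.

```python
import math
import random
import statistics


def sample_R(sum_cases, sum_lambda, n=1000, r_max=5.0, grid_size=1000, seed=1):
    """Draw n likely R values with probability proportional to the Poisson likelihood.

    With X_t ~ Poisson(R * lambda(t)), the likelihood of R is proportional to
    R**sum_cases * exp(-R * sum_lambda).
    """
    grid = [r_max * (i + 1) / grid_size for i in range(grid_size)]   # grid over (0, r_max]
    log_lik = [sum_cases * math.log(r) - r * sum_lambda for r in grid]
    peak = max(log_lik)
    weights = [math.exp(v - peak) for v in log_lik]   # rescale to avoid underflow
    return random.Random(seed).choices(grid, weights=weights, k=n)


def summarize(samples):
    """Median and central 95% interval of the sampled R values."""
    cuts = statistics.quantiles(samples, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.median(samples), cuts[0], cuts[-1]


# Hypothetical sums, for illustration only: sum(X_t) = 2000, sum(lambda(t)) = 1000.
draws = sample_R(sum_cases=2000, sum_lambda=1000)
print(summarize(draws))
```

A histogram of `draws` corresponds to the kind of plot shown in Figure 3; a finer grid or a larger `r_max` changes only the resolution and range of the sampled values.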
Figure 4 shows the global spread of COVID-19 during the same period. The vertical gray bars indicate the presence of cases, and the black dots denote the dates of symptom onset. The dashed vertical blue line indicates the current date (April 3, 2020). The vertical axis in Figure 4 shows the relative scale of infections. Figure 5 shows the predicted cumulative number of new cases over the next 30 days.
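The projection step can be sketched as a branching process: for each likely R value, future daily counts are drawn as X_t ~ Poisson(R λ(t)), where λ(t) is recomputed from both observed and simulated cases, and the 30-day totals are summarized by their median and central 95% interval. This is a minimal stand-alone Python illustration, not the projections package; the crude serial interval discretization, the chunked Poisson sampler, the function names, and the short list of R values (the reported estimate and CI bounds repeated 20 times in place of resampled values) are all assumptions, so the output should not be read as reproducing the 39,382 figure.

```python
import math
import random
import statistics

# Daily new confirmed cases in India, March 2 to April 2, 2020 (Table 1).
CASES = [2, 1, 22, 2, 1, 3, 5, 5, 6, 10, 13, 8, 16, 10, 11, 19,
         14, 22, 50, 60, 77, 74, 85, 87, 88, 140, 84, 106, 227, 146, 437, 235]


def serial_interval_pmf(mean=4.7, sd=2.9, max_lag=30):
    """Crude discretization (assumption): gamma density at lags 1..max_lag, normalized."""
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    w = [0.0] + [math.exp((shape - 1) * math.log(k) - k / scale
                          - math.lgamma(shape) - shape * math.log(scale))
                 for k in range(1, max_lag + 1)]
    total = sum(w)
    return [x / total for x in w]


def poisson_draw(rng, mean):
    """Exact Poisson draw: Knuth's method on chunks of mean <= 20, since
    Poisson(a + b) is the sum of independent Poisson(a) and Poisson(b) draws."""
    total, remaining = 0, mean
    while remaining > 0:
        chunk = min(remaining, 20.0)
        remaining -= chunk
        threshold, k, p = math.exp(-chunk), 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def project_cumulative(cases, V, r_values, horizon=30, seed=1):
    """Simulate `horizon` days of X_t ~ Poisson(R * lambda(t)) for each R in r_values
    and return the cumulative number of new cases in each simulated trajectory."""
    rng = random.Random(seed)
    totals = []
    for R in r_values:
        series = list(cases)
        for _ in range(horizon):
            t = len(series)
            lam = sum(series[k] * V[t - k] for k in range(max(0, t - len(V) + 1), t))
            series.append(poisson_draw(rng, R * lam))
        totals.append(sum(series[len(cases):]))
    return totals


# Illustrative R values (reported estimate and CI bounds); in practice, use resampled values.
totals = project_cumulative(CASES, serial_interval_pmf(), [1.351, 1.471, 1.592] * 20)
cuts = statistics.quantiles(totals, n=40)
print(statistics.median(totals), cuts[0], cuts[-1])
```

Splitting the Poisson mean into chunks keeps Knuth's multiplication method numerically safe for large daily means, at the cost of speed; a vectorized library sampler would be the practical choice for many simulations.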
We computed that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. R0 was estimated from the existing COVID-19 data for March 2, 2020 to April 2, 2020. The Indian government has already announced a nationwide lockdown. According to WHO information as of January 23, 2020, the R0 of COVID-19 lies between 1.4 and 2.5; our median estimate for India, 1.471 (95% CI, 1.351 to 1.592), lies at the lower end of this range. However, various studies have indicated that R0 is challenging to estimate precisely, because it depends on environmental conditions, demography, and the modeling method. In our method, the accuracy of the R0 estimate depends on the premise that all COVID-19 cases in India were identified during the study period, and the projection assumes that the current scenario continues. We believe that our forecasts may help in several respects, such as developing the required medical infrastructure and focusing efforts on mitigating the economic impact of the pandemic. Our findings were derived from a limited time frame, and the results may change once a considerable number of additional cases occur. R0 can be reduced by strict social distancing in daily life, wearing masks, frequent hand-washing with soap or sanitizer, quarantining infected people, identifying cases using rapid diagnostic methods, and similar measures.


We estimated the median value of R0 to be 1.471 (95% CI, 1.351 to 1.592) and predicted that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. The predicted outbreak size depends largely on changes in R0, and effective measures against COVID-19 will help to reduce R0. The presence of numerous unidentified cases during the study period may introduce uncertainty into the estimated R0 used in the forecasting model.


The authors have no conflicts of interest to declare for this study.




Conceptualization: KK. Data curation: KS. Formal analysis: KS. Funding acquisition: None. Methodology: KK. Writing – original draft: KK. Writing – review & editing: KK, KS.



Figure 1.
Actual daily incidence of coronavirus disease 2019 in India.
Figure 2.
Maximum-likelihood value of reproduction number (R0).
Figure 3.
Sample of likely values of reproduction number (R0).
Figure 4.
Global spread of infections.
Figure 5.
Predicted cumulative new cases in the next 30 days.
Table 1.
Actual coronavirus disease 2019 daily new confirmed cases in India
Date in 2020 | New confirmed cases (n) | Date in 2020 | New confirmed cases (n)
Mar 2 | 2 | Mar 18 | 14
Mar 3 | 1 | Mar 19 | 22
Mar 4 | 22 | Mar 20 | 50
Mar 5 | 2 | Mar 21 | 60
Mar 6 | 1 | Mar 22 | 77
Mar 7 | 3 | Mar 23 | 74
Mar 8 | 5 | Mar 24 | 85
Mar 9 | 5 | Mar 25 | 87
Mar 10 | 6 | Mar 26 | 88
Mar 11 | 10 | Mar 27 | 140
Mar 12 | 13 | Mar 28 | 84
Mar 13 | 8 | Mar 29 | 106
Mar 14 | 16 | Mar 30 | 227
Mar 15 | 10 | Mar 31 | 146
Mar 16 | 11 | Apr 1 | 437
Mar 17 | 19 | Apr 2 | 235


1. World Health Organization. Coronavirus disease 2019 (COVID-19) situation report-73. [cited 2020 Apr 3]. Available from: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200402-sitrep-73-covid-19.pdf?sfvrsn=5ae25bc7_2.

2. Ministry of Health and Family Welfare, Government of India. COVID-19 India. [cited 2020 Apr 3]. Available from: http://www.mohfw.gov.in/index.html#.

3. Anderson RM, Heesterbeek H, Klinkenberg D, Hollingsworth TD. How will country-based mitigation measures influence the course of the COVID-19 epidemic? Lancet 2020; 395: 931-934. PMID: 32164834
4. Myers MF, Rogers DJ, Cox J, Flahault A, Hay SI. Forecasting disease risk for increased epidemic preparedness in public health. Adv Parasitol 2000; 47: 309-330. PMID: 10997211
5. Pronker ES, Weenen TC, Commandeur H, Claassen EH, Osterhaus AD. Risk in vaccine research and development quantified. PLoS One 2013; 8: e57755. PMID: 23526951
6. Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N Engl J Med 2020; 382: 1199-1207. PMID: 31995857
7. Eksin C, Paarporn K, Weitz JS. Systematic biases in disease forecasting - the role of behavior change. Epidemics 2019; 27: 96-105. PMID: 30922858
8. Pinsent A, Liu F, Deiner M, Emerson P, Bhaktiari A, Porco TC, et al. Probabilistic forecasts of trachoma transmission at the district level: a statistical model comparison. Epidemics 2017; 18: 48-55. PMID: 28279456
9. Funk S, Camacho A, Kucharski AJ, Eggo RM, Edmunds WJ. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics 2018; 22: 56-61. PMID: 28038870
10. Khan MA, Badshah Q, Islam S, Khan I, Shafie S, Khan SA. Global dynamics of SEIRS epidemic model with non-linear generalized incidences and preventive vaccination. Adv Differ Equ 2015; 88.
11. Yan P, Liu S. SEIR epidemic model with delay. ANZIAM J 2006; 48: 119-134.
12. Dietz K. The estimation of the basic reproduction number for infectious diseases. Stat Methods Med Res 1993; 2: 23-41.
13. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol 2013; 178: 1505-1512. PMID: 24043437
14. Nishiura H, Linton NM, Akhmetzhanov AR. Serial interval of novel coronavirus (COVID-19) infections. Int J Infect Dis 2020; 93: 284-286. PMID: 32145466
15. Jombart T, Nouvellet P, Bhatia S, Kamvar ZN. Projections: project future case incidence; 2018 [cited 2020 Apr 3]. Available from: https://cran.r-project.org/web/packages/projections/index.html.



Editorial Office
Department of Preventive Medicine, Yonsei University College of Medicine
50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, Korea
TEL: +82-2-745-0662   FAX: +82-2-764-8328    E-mail: office.epih@gmail.com

Copyright © 2021 by Korean Society of Epidemiology.
