Estimation of the reproduction number and early prediction of the COVID-19 outbreak in India using a statistical computing approach

Coronavirus disease 2019 (COVID-19), which causes severe respiratory illness, has become a pandemic. The World Health Organization has declared it a public health crisis of international concern. We developed a susceptible, exposed, infected, recovered (SEIR) model for COVID-19 to show the importance of estimating the reproduction number (R0). This work is focused on predicting the COVID-19 outbreak in its early stage in India based on an estimation of R0. The developed model will help policymakers to take active measures prior to the further spread of COVID-19. Data on daily newly infected cases in India from March 2, 2020 to April 2, 2020 were used to estimate R0 with the earlyR package. The maximum-likelihood approach was used to analyze the distribution of R0 values, and the bootstrap strategy was applied for resampling to identify the most likely R0 value. We estimated the median value of R0 to be 1.471 (95% confidence interval [CI], 1.351 to 1.592) and predicted that the new case count may reach 39,382 (95% CI, 34,300 to 47,351) in 30 days.


INTRODUCTION
Coronavirus disease 2019 (COVID-19) has rapidly spread worldwide, with 896,450 confirmed cases and 45,526 deaths globally as of April 2, 2020 [1]. The disease emerged as 27 cases of pneumonia with an unknown cause in Wuhan, China. The first COVID-19 case in India was identified on January 30, 2020, and the total number of reported cases reached 2,322 as of April 3, 2020 [2]. On March 3, 2020, the Indian government suspended all new visas, as well as visas already issued to nationals of Iran, Italy, Japan, and Korea, and on the next day implemented compulsory screening of all international passengers. The Indian government declared a countrywide lockdown for 21 days on March 24, 2020 as a measure to control the spread of COVID-19, which has developed into a pandemic. The transmission rate of COVID-19 has been relatively low in most countries, although major outbreaks have occurred in a few countries, such as Iran, Italy, Japan, and Korea. Most countries experience at least an early stage of COVID-19 spread before any mitigation measures have an impact [3]. Myers et al. [4] stated that accurate epidemic forecasting models would noticeably improve epidemic prevention and control capabilities. No vaccine is available for COVID-19, and vaccination is typically not a good option for stopping the spread of a new epidemic, as considerable time is required to develop a safe and effective vaccine (approximately 10 years) [5]. Li et al. [6] found that the COVID-19 incubation period was 5.2 days (95% confidence interval [CI], 4.1 to 7.0) and found indications that human-to-human transmission occurred among close contacts. Because India is the second most populous country, it is important to estimate the transmissibility of COVID-19 and to predict the total number of new cases, which will help direct focus towards this public health crisis.
Mathematically based epidemic models, such as susceptible-infected-recovered (SIR) models [7], susceptible-infected-susceptible (SIS) models [8], susceptible-exposed-infected-recovered (SEIR) models [9], and susceptible-exposed-infected-recovered-susceptible (SEIRS) models [10], are used to predict the trajectory of epidemics. The reproduction number (R0) can be estimated statistically or empirically. In this work, we used the earlyR package (https://cran.r-project.org/) to estimate R0 and predict the trajectory of the outbreak.

Susceptible-exposed-infected-recovered-susceptible mathematical model
SEIR models can be used to predict the number of people infected based on R0. We present an SEIR model in this study to demonstrate the importance of estimating R0 [11]. COVID-19 has an incubation period, also known as a latent period or latent delay (τ), of 2-14 days. The following assumptions were made for developing the mathematical model for COVID-19.
- The population growth of the region/country is exponential, and the COVID-19 epidemic occurs over a sufficiently short period.
- Infected individuals are assumed not to give birth.
- Recovered individuals acquire permanent immunity with a probability f (0 ≤ f ≤ 1) or die from the disease with a probability of (1 - f).

With S referring to susceptible individuals, E to susceptible individuals that become exposed at time t-τ, I to individuals who are infected, and R to those who have recovered from COVID-19, the model is a system of differential equations in which μ is the per capita death rate due to causes other than the disease, γ is the contact (transmission or infection) rate, α is the recovery rate, and b is the per capita birth rate (with b > μ).

At any instant, S(t) + E(t) + I(t) + R(t) = N(t), where N(t) is the total population size.

R0 is a constant that is extremely important in characterizing the spread of COVID-19. It reflects how many people contract the disease from an infectious individual. In general, if R0 > 1, secondary infections will occur and the disease will spread throughout the population. According to WHO information as of January 23, 2020, the R0 of COVID-19 lies between 1.4 and 2.5. R0 may vary considerably between infectious diseases, and also for the same disease in different populations [12].
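The role of R0 as a threshold can be illustrated with a minimal sketch. The code below is not the delayed SEIR model with births and deaths described in this section; it assumes a basic closed-population SEIR model in which R0 = γ/α, and the function name, the 7-day infectious period, the population size, and the initial conditions are all hypothetical values chosen for illustration. Only the 5.2-day incubation period is taken from the text.

```python
def simulate_seir(r0, days=120, population=1_000_000, latent_days=5.2,
                  infectious_days=7.0, initial_infected=10, dt=0.1):
    """Integrate a basic SEIR model (no births, deaths, or delay) with Euler steps.

    Returns the peak number of simultaneously infected individuals and the
    number recovered at the end of the simulated period.
    """
    sigma = 1.0 / latent_days       # rate of leaving the exposed class
    alpha = 1.0 / infectious_days   # recovery rate (assumed value)
    gamma = r0 * alpha              # transmission rate implied by R0 = gamma / alpha

    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    peak_infected = i

    for _ in range(int(days / dt)):
        new_exposed = gamma * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = alpha * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        peak_infected = max(peak_infected, i)

    return peak_infected, r


if __name__ == "__main__":
    # Compare a sub-threshold value with the estimate and the upper WHO bound.
    for r0 in (0.8, 1.471, 2.5):
        peak, recovered = simulate_seir(r0)
        print(f"R0={r0}: peak infected={peak:.0f}, recovered={recovered:.0f}")
```

In this simplified setting, the number of infected individuals never exceeds its initial value when R0 < 1, whereas it grows when R0 > 1.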

Data
All the data shown in Table 1 were collected from an Indian official website [13]. The epidemiological data from March 2, 2020 to April 2, 2020, as shown in Table 1, were utilized to estimate R0. A higher R0 indicates a higher likelihood of new infections.

Model development
The transmissibility of COVID-19 in India was evaluated using the earlyR package. It was assumed that interventions so far have had a minimal impact on COVID-19 transmission in India. The model used herein is a simplified version of the model introduced by Cori et al. [14]. The serial interval distribution (i.e., its mean and standard deviation [SD]) is required to estimate R0. We assumed that the mean and SD were 4.7 days and 2.9 days, respectively, based on existing research [15]. The maximum-likelihood (ML) approach was applied to obtain the distribution of R0. The bootstrap strategy was applied for resampling 1,000 times to obtain likely R0 values. The R package projections was used to predict the cumulative daily incidence [16]. We forecast the cumulative total new cases after 30 days. The daily incidence obeys a Poisson distribution determined by daily infectiousness, which is denoted as

$$\lambda_t = \sum_{k=1}^{t-1} V(t-k)\, X_k$$

where V(t-k) is the vector of the probability mass function of the serial interval and X_k is the real-time incidence at time k. The forecasting model depended on the present incidence and serial interval distributions. The projections were based on resampling and probability computations. The statistical analysis and model development were done using R version 3.6.3 (https://cran.r-project.org/bin/windows/base/old/3.6.3/).

Figure 1 shows the daily incidence of COVID-19 in India from March 2, 2020 to April 2, 2020. Figure 2 shows the distribution of likely values of the R0 of COVID-19 in India. We estimated the ML value of R0 as 1.471 (95% CI, 1.351 to 1.592) for COVID-19 in the early stage in India. Figure 3 shows a histogram of R0 values using the bootstrap strategy with 1,000 likely samples. Figure 4 shows the global spread of COVID-19 during the same period. The vertical gray bars indicate the presence of cases and black dots denote the dates of symptom onset. The dashed vertical blue line indicates the current date (April 3, 2020). The vertical scale in Figure 4 shows the relative scale of infections. Figure 5 shows the predicted cumulative cases in the next 30 days.
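The ML step described above can be sketched in a few lines. This is not the earlyR implementation; it is an illustration that assumes the daily incidence X_t follows a Poisson distribution with mean R0 multiplied by the daily infectiousness, in which case the likelihood is maximized at the ratio of total cases to total infectiousness. The serial interval is discretized crudely by evaluating a gamma density at whole days, and all function names and the example case counts are hypothetical (they are not the data of Table 1).

```python
import math


def discretized_gamma_pmf(mean, sd, max_days=30):
    """Crude serial-interval PMF: gamma density at whole days, renormalized."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    log_norm = math.lgamma(shape) + shape * math.log(scale)
    density = [math.exp((shape - 1) * math.log(k) - k / scale - log_norm)
               for k in range(1, max_days + 1)]
    total = sum(density)
    return [d / total for d in density]  # index 0 is a 1-day interval


def infectiousness(incidence, pmf):
    """Daily infectiousness: sum over earlier days k of V(t - k) * X_k."""
    return [sum(pmf[t - k - 1] * incidence[k]
                for k in range(t) if t - k <= len(pmf))
            for t in range(len(incidence))]


def ml_reproduction_number(incidence, pmf):
    """ML estimate of R0 under the Poisson assumption stated above."""
    lam = infectiousness(incidence, pmf)
    pressure = sum(lam[1:])  # the first day has no earlier cases
    if pressure <= 0:
        raise ValueError("incidence must contain cases before the last day")
    return sum(incidence[1:]) / pressure


if __name__ == "__main__":
    serial_interval = discretized_gamma_pmf(mean=4.7, sd=2.9)
    daily_cases = [3, 2, 4, 5, 4, 7, 9, 8, 12, 15, 14, 20]  # hypothetical counts
    print(round(ml_reproduction_number(daily_cases, serial_interval), 3))
```

The closed-form ratio gives the same maximum that a grid search over candidate R0 values would find under this likelihood.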

Ethics statement
The analysis in this article is based on publicly available data. Therefore, ethical committee approval was not required.

RESULTS AND DISCUSSION
We computed that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. The R0 values were estimated based on the existing COVID-19 data from March 2, 2020 to April 2, 2020. The Indian government has already announced a nationwide lockdown. According to WHO information as of January 23, 2020, the R0 of COVID-19 lies between 1.4 and 2.5. Our estimation indicates that for India, the median R0 value of 1.471 (95% CI, 1.351 to 1.592) is in the lower part of this range. However, various studies have indicated that precisely estimating R0 is challenging, because R0 depends on environmental conditions, demography, and the modeling method. In our method, the accuracy of R0 depended on the premise that all cases of COVID-19 in India were identified in the study period. This forecast holds only if the same scenario continues. We believe that our forecasting numbers may help in various aspects, such as developing the required medical infrastructure and focusing efforts on mitigating the economic impact of the pandemic. Our findings were derived based on a limited time frame, and the results may change after the occurrence of a considerable number of additional cases. The R0 value corresponding to the spread of COVID-19 can be reduced by strictly following social distancing in daily life, wearing masks, frequent hand-washing with soap or sanitizers, quarantining infected people, identifying cases using rapid diagnostic methods, and so on.
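The way such a 30-day forecast depends on R0 can be sketched with a small forward simulation. This is not the projections package; it assumes that each future day's incidence is a Poisson draw with mean R0 multiplied by the daily infectiousness, holds R0 fixed rather than sampling it, uses a crude gamma-based serial interval, and approximates large Poisson draws with a normal distribution. The function names and the starting case counts are hypothetical.

```python
import math
import random


def serial_interval_pmf(mean=4.7, sd=2.9, max_days=30):
    """Crude discretization: gamma density at whole days, renormalized."""
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    density = [k ** (shape - 1) * math.exp(-k / scale)
               for k in range(1, max_days + 1)]
    total = sum(density)
    return [d / total for d in density]


def poisson_draw(rng, lam):
    """Poisson sample: Knuth's method for small means, normal approximation otherwise."""
    if lam <= 0:
        return 0
    if lam > 50:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def project_cumulative(incidence, r_value, pmf, n_days=30, n_sim=1000, seed=1):
    """Return (median, 2.5th, 97.5th percentile) of simulated new cases over n_days."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sim):
        series = list(incidence)
        for _ in range(n_days):
            t = len(series)
            lam = sum(pmf[t - k - 1] * series[k]
                      for k in range(max(0, t - len(pmf)), t))
            series.append(poisson_draw(rng, r_value * lam))
        totals.append(sum(series[len(incidence):]))
    totals.sort()
    return (totals[n_sim // 2],
            totals[int(0.025 * n_sim)],
            totals[int(0.975 * n_sim) - 1])


if __name__ == "__main__":
    observed = [3, 2, 4, 5, 4, 7, 9, 8, 12, 15, 14, 20]  # hypothetical counts
    pmf = serial_interval_pmf()
    for r_value in (1.351, 1.471, 1.592):
        print(r_value, project_cumulative(observed, r_value, pmf))
```

Running the sketch at the lower bound, point estimate, and upper bound of R0 shows how strongly the projected total depends on the assumed reproduction number.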

CONCLUSION
We estimated the median value of R0 to be 1.471 (95% CI, 1.351 to 1.592) and predicted that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. The predicted size largely depends on changes in R0. Effective measures against COVID-19 will help to reduce R0. The presence of numerous unidentified cases in the study period may result in uncertainties in the estimated value of R0 used in the developed forecasting model.