Introduction

Infectious diseases modelling attracted great deal of attention by scientists and people around the world during the COVID-19 pandemic. The novel coronavirus (SARS-CoV-2) started to quickly spread in the early months of 2020 and was announced as a global pandemic by the World Health Organization1 in March 2020. Population-wide vaccination is critical for achieving herd immunity and for controlling the COVID-19 pandemic while combined with effective testing and preventive measures.

Inevitably, the development of vaccines became the highest priority of governments and pharmaceutical companies2. Several vaccines were available in the last months of 2020. As of 30 July 2021, over 28% of the world population is partly or fully vaccinated3. Not knowing much about the coronavirus disease in the early months, mathematical models have played an important role in shedding some light on the disease dynamics.

The SIR model categorizes individuals as Susceptible, Infectious, and Recovered. Mathematical infectious disease models based on the classical SIR model4 are widely used to examine the spread of a disease. These models display compelling results especially during the early period of the pandemic5,6,7,8,9,10,11,12,13,14,15,16,17,18,19. In a recent article20, the authors studied the immune response of recovered from COVID-19 individuals up to 8 months patients and found that they have considerable immune memory.

Vaccination is a common method of reducing infectious diseases spread21,22,23,24,25 because it reduces the number of susceptible, from where the reproduction number naturally also decreases. The reproductive number is an indicator of the transmissibility of the virus caused by infectious individuals. It is affected by the population density, vaccinations, quarantines, social distancing, mask wearing and other measures25,26.

In a recent work27, the authors present a study of the temporal evolution of epidemic outbreaks accounting for vaccinations with monitored real time COVID-19 data, using the SIRV model, V denoting the relative fractions of currently vaccinated. They make certain assumptions and reduce the time-dependent general SIRV equations to an analytical model. SIR and SEIR models with vaccination are used to simulate and predict the development of the COVID-19 spread, e.g.25,28,29,30.

In several very recent publications31,32,33,34,35 applied to the COVID-19 epidemic, researchers have developed and used SIR and SEIR based models with vaccination to overcome the limitations of the conventional SIR model. The work in31 presents an investigation of the dynamics of a stochastic SIRV epidemic model with general non-linear incidence and vaccination. The introduced random fluctuations controls the disease outbreak. Zhao et al.32 use improved SIRV to evaluate the performance of non-pharmaceutical interventions in reducing the number of daily new cases of COVID-19 in South and Southeast Asia. They apply statistical methods to estimate parameters.

The research presented in33 proposes an SIRV evolutionary game model for infectious disease vaccination strategies based on the scale-free networks with tunable clustering. Their model analyzes the vaccination strategies taking into account factors such as vaccination effectiveness, vaccination cost, treatment cost after illness, government subsidy rate and treatment discount rate. Other notable work is about modelling infectious diseases with herd immunity in a randomly mixed population34. The authors formulate two new SIR models to mimic the declining transmission rate of infectious diseases at different stages of transmission. They found that natural herd immunity might not be sufficiently effective in infectious diseases with high reproduction numbers.

Researchers implemented a modification of a SIR model35 to study the role of the rate of vaccination, rate of transmission and the likelihood of emergence of resistant strains. They used parameters realistically resembling SARS-CoV-2 transmission to run simulations for a total time of three years, with vaccination starting one year into the model.

The goal of the present work is to develop and to demonstrate the effectiveness of an inverse method for identifying the time-dependent functions and parameters of the SIRV model simultaneously (Sect. 2). We apply the method to the adaptive SIRV (A-SIRV) epidemic model using publicly available COVID-19 data. In contrast to other works which use statistical approaches to estimate parameters, we apply an inverse problem approach to identify these parameters. The dynamically estimated rates can be particularly useful in running other simulations, such as in35, to study the epidemic.

The modified SIR model for the spread of an infectious disease and vaccination (SIRV)

The standard notations in the SIR model are: S(t) denotes the number of susceptible, I(t) – infectives, and R(t) – removed individuals. Assume that the time-dependent function \(u=u(t)\) represents the vaccination rate. Then the total number of vaccinated individuals is given by

$$\begin{aligned} V(t)=\int _0^t u(t)dt. \end{aligned}$$
(1)

Every vaccine has different level of efficacy. We assume that the vaccine efficacy impacts disease spread and prevents transmission at the same rate, which is a reasonable assumption according to the study35. If people are vaccinated with the same vaccine type / brand, it is possible to introduce the efficacy of the vaccine in the model. Since there is lack of information about the vaccine types and other details concerning the vaccinated individuals, we make the following assumptions, which may not be true for all

  • Individuals being vaccinated belong to the class S (susceptible) before the vaccine. There are recovered people who vaccinate themselves; this is not included in the model because of lack of data.

  • Individuals move to the class R (removed) after vaccination. In other words, the model assumes the vaccine is 100% effective against the disease, namely vaccinated people become fully immune.

Due to unavailability of data about COVID-19 variants and vaccine details on a country level, at this stage, the model does not include important assumptions, such as:

  • Vaccinated individuals could be infected and be infectious if the infection is caused by other variants, known as the vaccine breakthrough problem; there might be multiple variants circulated in the same country, not only a single strain.

  • Infected individuals could be vaccinated again to improve their immune level; vaccinated individuals could be infected again due to waning of immunity for COVID-19.

The transmission rate \(\beta >0\) gives the probability that a random infective person infects a random susceptible person. A major approximation here is the assumption that the population under study is well mixed so that every person has equal probability of coming into contact with every other person. The removal rate \(\gamma >0\) gives the probability that an infective person recovers. In the classical SIR model \(\beta \) and \(\gamma \) are constants, while they are functions of time in14. The diagram in Fig. 1 describes the adaptive SIR (A-SIR) model with vaccinations (A-SIRV) and time-dependent coefficients:

$$\begin{aligned} L_1 (S,I,\beta )&=\frac{dS(t)}{dt} +\beta (t) S(t)I(t) + u(t) = 0 \end{aligned}$$
(2)
$$\begin{aligned} L_2 (S,I,\beta , \gamma )&= \frac{dI(t)}{dt} - \beta (t) S(t)I(t) +\gamma (t) I(t) = 0 \end{aligned}$$
(3)
$$\begin{aligned} L_3 (I,R, \gamma )&=\frac{dR(t)}{dt} - \gamma (t) I(t) - u(t) = 0. \end{aligned}$$
(4)

The total population \(N = S(t)+I(t)+R(t)\) is considered constant in the equations. Fig. 1 shows the diagram for the A-SIRV model, corresponding to the system (2)–(4).

Figure 1
figure 1

The A-SIRV epidemic model.

The A-SIRV model used in this study assumes that the removed individuals are no longer susceptible nor infectious. The number of cases for recovered from COVID-19 individuals who are re-infected at the present moment is very limited and the rate cannot be estimated; thus, this possibility in not taken into account.

Effective and basic reproductive ratios \(R_e\) and \(R_0\)

An epidemic occurs if an infective individual introduced into a population of susceptible individuals infects on average more than one other person, namely I(t) is increasing in time. The original SIR model assumes that the transmission and removal rates are constants. Equations (2)–(4), with proper initial conditions, allow the determination of I(t), and S(t), and R(t), if the coefficients \(\beta \) and \(\gamma \) are known constants. However, in the case of a pandemic, the rates may vary in time; hence, \(\beta =\beta (t)\) and \(\gamma =\gamma (t)\).

The so-called effective reproduction number (also effective reproduction rate or ratio) for a given epidemic is the parameter \(R_e(t)\) given by

$$\begin{aligned} R_e(t)=\frac{\beta (t) S(t)}{\gamma (t)}. \end{aligned}$$
(5)

An epidemic occurs when \(R_e(t)>1\). Then the fraction of the population that is immune increases (because of vaccination or because of recovering from the disease) so much that \(R_e(t)<1\), ”herd immunity” is achieved. Hence, the number of new cases occurring in the population will decrease to zero.

Another important characteristic of an epidemic, the basic reproduction number (or ratio, or rate) \(R_0\), is defined as:

$$\begin{aligned} R_0=\frac{\beta (t) N}{\gamma (t)}, \end{aligned}$$
(6)

where N is the size of the total population.

Multiple factors may cause the rates to change over time. In the case of COVID-19, examples include social distancing, restrictions imposed by governments, and preventive treatments. Therefore, we define the effective reproduction number and the basic reproduction number to be functions of time, defined in (5) and (6), respectively.

The main goal of the present work is to identify the time-dependent reproduction rates directly using the SIRV model.

Inverse problem formulation

The initial-value problem consisting of the system of equations (2)–(4), with coefficients \(\beta (t)\) and \(\gamma (t)\) known, along with proper initial conditions derived from the given data, constitutes the direct problem. Note that the vaccination rate function u(t) in equations (2) and (4) can be obtained from the function V(t), which is known from the reported public data on a daily basis.

In reality, the values of the time-dependent parameters \(\beta (t)\) and \(\gamma (t)\) are unknown for a new epidemic disease. Hence, the simultaneous determination of the coefficients and functions from the available data is an inverse problem.

Figure 2
figure 2

The time nodes \(\nu _i\) where the data values are given.

Let [0, P], where P is a number of days, be a time sub-interval (Fig. 2), where approximate values of the susceptible S(t) and currently infectious I(t) are known at specified time moments \(\nu _0, \nu _1, \dots , \nu _P\),

$$\begin{aligned} S(\nu _i)\approx \sigma _i, \qquad I(\nu _i) \approx \lambda _i,\qquad i=0,1,\dots , P. \end{aligned}$$
(7)

At the same time, the values of \(V(\nu _i)\) are given exactly since the number of vaccinated people is available, as reported in3. Therefore, we can approximate the data values of V(t) using a function with a continuous second derivative. From this function, the values \(u(t)=\frac{d V}{d t}\) can be estimated.

Since the values of the vaccination rate function u(t) can be found from the available data of vaccinated individuals, we can find I(t), S(t), \(\beta (t)\), and \(\gamma (t)\) from equations (2), (3), and (7). Note that this problem is inverse and require a special treatment. The approach used here is based on the Method of Variational Imbedding (MVI)14,36. Following the idea of MVI, we construct a functional using the original equations (2), (3), and the available data values (7). We define the functional over the sub-interval [0, P] as

$$\begin{aligned} {\mathscr {F}}= \int _0^P \left[ L_1^2+L_2^2 +\sum _{i=1}^{P-1} \delta (t-\nu _i) \ \mu _i \left( (S(t)-\sigma _{i})^2+(I(t)-\lambda _{i})^2 \right) \right] d t, \end{aligned}$$
(8)

where \(\mu _i\) are the weights prescribed for the i-th node and \(\delta (t)\) is the Dirac delta function \(\delta (t-\nu _i)\) defined as:

$$\begin{aligned} \delta (t-\nu _i) = {\left\{ \begin{array}{ll} \infty , &{} t = \nu _i \\ 0, &{} t \ne \nu _i \end{array}\right. } \quad \text { and } \quad \int _{-\infty }^{\infty } \delta (t-\nu _i) d t = 1. \end{aligned}$$

In other words, we substitute the problem for finding the unknown functions I(t), S(t) and the coefficients \(\beta (t)\), \(\gamma (t)\) in equations (2), (3), and (7) with a problem for minimization of the functional \({\mathscr {F}}\) defined by (8). Finding R(t) is straightforward after knowing the values of I(t), S(t), \(\beta (t)\), and \(\gamma (t)\).

The absolute minimum of \({\mathscr {F}}\) is equal to zero with the functional becoming zero if and only if equations (2), (3) and conditions (7) are satisfied. We want to emphasize here that the functions (S, I, V) and the parameters \((\beta ,\gamma , u)\) are unknown and one have to identify them simultaneously by solving a minimization problem.

Method for solving the inverse problem with time-dependent rates

In order to solve the inverse problem with time-dependent rates, we first find the minimum of the functional \({\mathscr {F}}\), defined in (8), numerically over a sub-interval of the entire period assuming constant transmission and recovery rates. The details of solving the minimization sub-problem are given in “Appendix A”.

Let the values of S(t) and I(t) be known at some time moments \(\nu _0,\nu _1,\dots ,\nu _m\), shown in Fig. 3, namely

$$\begin{aligned} S(\nu _l)=\sigma _l, \quad I(\nu _l)=\lambda _l, \quad \text { for }\quad l=0,1,\dots , m. \end{aligned}$$
(9)
Figure 3
figure 3

The time nodes and the subsets of fixed length of \(P+1\) days for identifying \(\beta (t)\) and \(\gamma (t)\).

We consider the following two approaches for estimating the coefficients \(\beta _l\) and \(\gamma _l\) in the system of equations (2), (3):

  1. i.

    Solve the inverse problem for the system of equations (2), (3) under the boundary conditions

    $$\begin{aligned} S(\nu _{l-1})=\sigma _{l-1}, \quad I(\nu _{l-1})=\lambda _{l-1}, \quad S(\nu _l)=\sigma _l, \quad I(\nu _l)=\lambda _l, \end{aligned}$$
    (10)

    for \(l=1,2,\dots , m\). This approach is similar to the method developed in37 for estimating the coefficient in Euler-Bernoulli equation, later modified in36 for the SIR equations. It works well if the data represent the exact values of the functions S(t) and I(t). In a real-world data, e.g. the available public data for the COVID-19 pandemic, there usually exists random noise causing oscillations in the numerical results.

  2. ii.

    Use the solution method, proposed in Sect. 3, for estimating the parameters \(\beta \) and \(\gamma \) as constants on every sub-interval \([\nu _l, \nu _{l+P}]\), Then, we use the obtained constant values to approximate the non-constant values \(\beta _{l+P}\) and \(\gamma _{l+P}\) for \(l=0,1,\ldots ,m-P\). This approach is smoothing the data automatically. Knowing the approximate values of the transmission and recovery rates, we obtain the reproduction rates

    $$\begin{aligned} R_{0,k} = \frac{\beta _k}{\gamma _k} N \quad \text {and} \quad R_{e,k} = \frac{\beta _k}{\gamma _k} \sigma _l, \qquad k=P,P+1,\ldots ,m. \end{aligned}$$

Results

The numerical simulations have been performed using the available public data from3 and38 websites. The number of currently infected persons, \(\lambda _k\), is reported daily on38. Both websites3 and38, report the total number of infected individuals, \(T_k= I_k+R_k\) from the beginning of the COVID-19 pandemic.

The daily number of vaccinated people, \(V_k\), is taken from3. We assume that the reported data is correct. In order to to obtain a smooth approximation of the vaccinated persons V(t) and the vaccination rate \(u(t)=dV/dt\), we use cubic spline approximation based on the values \(V_k\). Consequently, the number of susceptible individuals, \(\sigma _k\), for a given country can be found in the following way: \(\sigma _k=N-T_k-V_k\), where N is the total population.

We do not pretend that the available data are accurate. For sure, the so-called ”hidden cases” (asymptotic cases or cases without official tests) are not included in the reported data. Moreover, the posted data contain random (human) mistakes which can be treated as random noise. Here, we use the posted data to illustrate the method described in "Method for solving the inverse problem with time-dependent rates" section.

The presented results for the rates \(N \beta \), \(\gamma (t)\), \(R_0(t)\), and \(R_e(t)\) are based on 28-day, 35-day, and 42-day sub-periods, over the entire multi-month period, from August 2020 until August 3, 2021.

The selected countries (Israel, United States, and Japan) for this study represent population with different vaccination dynamics. Israel performed early aggressive vaccination during the first half of 2021, United States is catching up, whereas Japan is behind compared to them. The World is aa good example of aggregated global data. Table 1.

Table 1 Vaccinated, recovered, and susceptible for the World and selected countries as of August 3, 2021.

The World

Estimated rates \(N \beta \), \(\gamma \), \(R_0\), and \(R_e\) for the World are shown at Fig. 4. According to the reported data as of August 3, 2021: \(14.78\%\) of the population was fully vaccinated; about \(2.56\%\) of the population met the virus. Thus, \(82.66\%\) of the population remained susceptible. The basis and effective reproduction rates have been near 1 since January 2021.

Figure 4
figure 4

Estimated rates \(N \beta \), \(\gamma \), \(R_0\), and \(R_e\) for the World.

Israel

During the first half of 2021, Israel was the COVID-19 vaccination champion – over \(57.7\%\) of the population of the country was vaccinated by the end of July. At the same time, according to the reports, approximately \(9.43\%\) of the population met the virus; hence, only \(32.87\%\) of the population was susceptible. According to21, early mass vaccination programs predict a reduction of the effective reproduction rate of infection within communities. The estimated values of the basic and effective reproduction ratios \(R_0\) and \(R_e\) for Israel are shown in Fig. 5. The effective reproduction rate is practically constant, slightly less then 1 from January to middle June 2021. In June 2021, the rates increased significantly for a short period of time. Then, the reproduction rates decreased but they still remained above 1.

Figure 5
figure 5

Estimated rates \(\beta N\), \(\gamma \), \(R_0\), and \(R_e\) for Israel.

The United States of America

The reported data for The United States of America state: \(49.54\%\) were fully vaccinated, about \(10.79\%\) of the population met the virus; hence, \(39.67\%\) of the population were susceptible. The estimated values of the transmission rate \(N\beta \), \(\gamma \), the basic and effective reproduction ratios \(R_0\) and \(R_e\) are shown at Fig. 6. The transmission rate was the highest in fall 2020. It started to increase again in June-July 2021. COVID-19 is on the rise in many countries, casing a new wave. This surge is due to widespread resumption of normal activities.

Figure 6
figure 6

Estimated rates \(N \beta \), \(\gamma \), \(R_0\), and \(R_e\) for United States.

Japan

Japan is with about \(30.10\%\) fully vaccinated individuals, approximately \(0.75\%\) reported to be totally infected; this, about \(69.15\%\) are susceptible;. The estimated values of the transmission rate \(N\beta \), \(\gamma \), the basic and effective reproduction rates \(R_0\) and \(R_e\) are shown in Fig. 7. While the rates had been relatively reasonable and low until recently, they started to increase lately. This growth can be explained with the 2020 Summer Olympics held from 23 July to 8 August 2021 in Japan.

Figure 7
figure 7

Estimated rates \(\beta N\), \(\gamma \), \(R_0\), and \(R_e\) for Japan.

Discussion

We performed numerical simulations with the developed method for the A-SIRV model for identifying the transmission, removal and reproduction rates globally and for three countries (Israel, United States, and Japan) using publicly available data. Estimated rates for the World show the presence of three waves and a forth wave being formed since July 2021.

The values for the identified rates for the World do not oscillate, regardless of the number of days P. By contrast, the obtained rates of the selected individual countries have fluctuations during 2021, which are smaller for \(P=42\) compared to \(P=28\) and \(P=35\). This is because a larger time period is having a smoothing effect of the oscillating functions.

We observe oscillations during time periods that include vaccination data. The vaccination rate u(t) is approximated from the given data for the number of vaccinated individuals V(t). It clearly affects the computed rates in the A-SIRV equations. The oscillations in the rates for Israel, Japan, and to some extent United States can be explained with fluctuations in the vaccination rate u(t) due to irregularities in the data for vaccinated people V(t). We assume that the problem is posed correctly, namely the data have ”physical meaning” and, therefore, a solution of the problem exists, see39. The randomness of the vaccination data introduces oscillations in the obtained numerical values of the rates.

Finally, it is important to mention that the A-SIRV model considers only one homogeneous population for the selected countries: thus, giving aggregated results for the estimated rates and the unknown functions. The populations of The World and The United State of America are obviously not homogeneous. Such results can still help see trends of potential future growth of the epidemics and, if needed, guide the design of alternative interventions.

Conclusions

Mathematical models can help with visualizing and predicting the long-term behaviour of an infectious disease, despite of the fact that there are many limitations to using them. For instance, the SIR type models contain many assumptions such as: accuracy and completeness of reported data; mixing of the population; no reinfection; constant population; and so forth. The present work studies the performance of a method for an epidemic based on an inverse problem approach for estimating the time-dependent transmission and removal rates in the A-SIRV epidemic model. The inverse problem is solved by defining a minimization problem using the entire dataset for the examined population, with available COVID-19 data. The work utilizes an inverse problem approach to the time-dependent transmission and removal rates identification as well as the unknown functions in the A-SIRV (Adaptive SIRV) model. This can give insight into how well the method identifies the parameters that can be used to predict the infectious disease spread. If conditions change, then the predictions may no longer be accurate; hence, adjustments will be required based on the existing conditions, for obtaining new predictions.