1 Introduction

A pandemic (e.g., caused by influenza viruses such as H1N1, H5N1) is an epidemic of an infectious disease spreading through human populations across a large region (e.g., Potter 2001). Recently, the risk of pandemic influenza has been a significant public-health concern, and much attention has been paid to obtaining more precise and timely estimates and predictions of influenza activity.

Compartment epidemic models, such as the susceptible-infectious-recovered (SIR) model, have been widely used to study the dynamical evolution of an infectious disease in a large population. The SIR model was first proposed by Kermack and McKendrick (1927) to explain plague and cholera epidemics, and it was extended to the SIR-susceptible (SIRS) model to allow imperfect immunity (Kermack and McKendrick 1932, 1933; Dushoff et al. 2004). Assume that at any given time \(t\), a fixed population can be split into three compartments: \(S(t)\) denotes the number of susceptibles; \(I(t)\) denotes the number of infectious; and \(R(t)\) denotes the number of “recovereds” (which includes deaths). Then in an SIRS model, the dynamical process is captured through the following set of deterministic (i.e., non-stochastic) nonlinear ordinary differential equations (ODEs):

$$\begin{aligned} \frac{\mathrm{d}S}{\mathrm{d}t}&= -\beta S I+\phi R, \end{aligned}$$
(1)
$$\begin{aligned} \frac{\mathrm{d}I}{\mathrm{d}t}&= \beta S I-\gamma I,\end{aligned}$$
(2)
$$\begin{aligned} \frac{\mathrm{d}R}{\mathrm{d}t}&= \gamma I-\phi R, \end{aligned}$$
(3)

In (1)–(3), \(\beta \) denotes the transmission rate (also referred to as the contact or infection rate) per unit time, which can be expressed as the fraction of contacts between susceptible individuals and infectious individuals that result in an infection. Further, \(\gamma \) denotes the rate of “recovery” per unit time, which is the rate at which infectious individuals are removed from being infectious due to recovery (or death); then \(\frac{1}{\gamma }\) is the average duration of the infectious period. Finally, \(\phi \) denotes the rate of loss of immunity of recovered individuals per unit time, which is the rate at which recovered individuals become susceptible again (Anderson and May 1991; Hethcote 2000); then \(\frac{1}{\phi }\) is the average duration of the immunity period.

Notice that by adding Eqs. (1)–(3), we can obtain

$$\begin{aligned} \frac{\mathrm{d}S}{\mathrm{d}t}+\frac{\mathrm{d}I}{\mathrm{d}t}+ \frac{\mathrm{d}R}{\mathrm{d}t}=0. \end{aligned}$$
(4)

Thus, the model postulates a fixed total population, \(N\), without entries and exits of demographic type (i.e., there are no births or deaths from causes other than the disease itself). Clearly, it represents a short-term phenomenon where the total number of people in all three compartments together is constant. Hence, (4) implies that for \(t\) in an interval of \(\mathbb {R}_+,\)

$$\begin{aligned} S(t) + I(t) + R(t) = N. \end{aligned}$$
(5)

This assumption (5) is commonly referred to as mass balance (Reluga 2004). As discussed above, the SIRS model in (1)–(3) assumes a fraction \(\phi \) of members of the recovered class can rejoin the susceptible class; the traditional SIR model is obtained when \(\phi =0\).
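As a quick numerical check of the mass-balance property (4), the right-hand sides of (1)–(3) can be evaluated at any state and confirmed to sum to zero. The sketch below does this in Python; the state and rate values are illustrative, not taken from the article.

```python
# Evaluate the SIRS derivatives (1)-(3) at one state and check that they
# sum to zero, which is the mass-balance property (4).
# The state (S, I, R) and rates (beta, gamma, phi) below are illustrative.

def sirs_derivatives(S, I, R, beta, gamma, phi):
    """Right-hand sides of the SIRS ODEs (1)-(3)."""
    dS = -beta * S * I + phi * R
    dI = beta * S * I - gamma * I
    dR = gamma * I - phi * R
    return dS, dI, dR

dS, dI, dR = sirs_derivatives(S=990.0, I=10.0, R=0.0,
                              beta=5e-4, gamma=0.33, phi=0.05)
print(abs(dS + dI + dR) < 1e-12)  # mass balance: the derivatives sum to 0
```

The cancellation holds term by term: \(\beta S I\), \(\gamma I\), and \(\phi R\) each appear once with a plus sign and once with a minus sign across the three equations.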

A discretized version of (1)–(3) can be written as a set of deterministic difference equations. That is, when time \(t\) is discrete (in units of \(\varDelta \) days),

$$\begin{aligned} S(t+1)&= S(t)- \beta S(t)I(t)\varDelta +\phi R(t)\varDelta ,\end{aligned}$$
(6)
$$\begin{aligned} I(t+1)&= I(t)+\beta S(t)I(t)\varDelta -\gamma I(t)\varDelta ,\end{aligned}$$
(7)
$$\begin{aligned} R(t+1)&= R(t)+ \gamma I(t)\varDelta -\phi R(t)\varDelta ; \end{aligned}$$
(8)

and the discrete-time mass-balance constraint is,

$$\begin{aligned} S(t) + I(t) + R(t) = N, \end{aligned}$$
(9)

where \(t=1,2,\ldots , T\).
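The recursion (6)–(8) is straightforward to iterate. The following sketch steps a small hypothetical population forward with \(\varDelta =1\) day and confirms the discrete mass-balance constraint (9) at every step; the population size and parameter values are illustrative only.

```python
# Sketch of the discrete-time CSIRS recursion (6)-(8), with Delta = 1 day.
# Population size and parameter values are illustrative only.

def csirs_step(S, I, R, beta, gamma, phi, delta=1.0):
    """One step of the difference equations (6)-(8)."""
    S_new = S - beta * S * I * delta + phi * R * delta
    I_new = I + beta * S * I * delta - gamma * I * delta
    R_new = R + gamma * I * delta - phi * R * delta
    return S_new, I_new, R_new

N = 1000.0
S, I, R = 999.0, 1.0, 0.0
for t in range(60):
    S, I, R = csirs_step(S, I, R, beta=1e-3, gamma=0.33, phi=0.05)
    assert abs(S + I + R - N) < 1e-6   # discrete mass balance (9)

print(round(S, 1), round(I, 1), round(R, 1))
```

Because each flow term (\(\beta S I\varDelta \), \(\gamma I\varDelta \), \(\phi R\varDelta \)) is subtracted from one compartment and added to another, the total \(S(t)+I(t)+R(t)\) is conserved at every step.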

In past decades, the deterministic SIR model and its various extensions (e.g., SIRS, SIR including birth and death rates, migration, etc.) have been used extensively for infectious-disease estimation and prediction in large, well-mixed populations (e.g., Schenzle 1984; Anderson and May 1991; Keeling and Rohani 2007). The classic SIRS (CSIRS) model given by (1)–(3) or (6)–(8) is appealing because of its straightforward modeling strategy and its easily interpretable parameters. However, there are various sources of uncertainty in the model: First, there is uncertainty in the counts \(\{S(t), I(t),R(t)\}\) themselves; that is, the counts in the compartments are observed with error. Second, the rather simple model (1)–(3) [or (6)–(8) in the discrete-time setting] may not capture the uncertainties in the hidden dynamical epidemic process, such as those caused by the presence of heterogeneous populations. Third, the values of the parameters \(\beta ,\,\gamma \), and \(\phi \) are uncertain.

A variety of stochastic models have been developed recently, through a probabilistic mechanism that involves a Markov chain of SIR states (e.g., Bailey 1975; Andersson and Britton 2000; Allen 2003; Xu et al. 2007). Some more recent stochastic models involve complex networks (e.g., Halloran et al. 2002; Zhou et al. 2006; Volz 2008) or drug resistance (e.g., Chao et al. 2012) to avoid the assumption of homogeneous mixing. However, these stochastic models ignore the noisy nature of the data, and they apply mass balance to the observed counts rather than to the true counts. Furthermore, these models typically rely on many carefully chosen parameters, such as transmission rates, recovery rates, and so forth in heterogeneous populations; uncertainty about where the model’s parameter vector is located in the parameter space is not accounted for. Our strategy is to deal with each source of uncertainty using a Bayesian hierarchical statistical model, where the counts are observed with error and the dynamical evolution embodied in (6)–(8) is stochastic.

Bayesian percolation models have proven popular for modeling spatio-temporal dynamical processes (e.g., Catterall et al. 2012; Gibson et al. 2006) and have been applied to epidemics (e.g.,  Cook et al. 2007), but they ignore the true process hidden behind the noisy data. More recent Bayesian hierarchical models, which are widely used for mapping non-infectious diseases, aim to capture the true spatial process (e.g., Besag et al. 1991; Carlin and Banerjee 2002), but their process models and parameter models are not appropriate for epidemics. Those that do have a dynamical spatial statistical component have not generally been parameterized in terms of the interpretable components of the epidemic (e.g., Mugglin et al. 2002; Wood 2010).

Recently, partially observed nonlinear stochastic dynamical systems (also known as partially observed Markov processes, or state-space models) have been used extensively for infectious-disease estimation and prediction. A wide range of inference techniques have been proposed and implemented in the R statistical language as part of the package pomp (http://cran.at.r-project.org/web/packages/pomp/), such as nonlinear forecasting (e.g., Kendall et al. 1999, 2005), iterated filtering (Ionides et al. 2006; King et al. 2008; He et al. 2010), and approximate Bayesian particle filtering (e.g., Liu and West 2001; Arulampalam et al. 2001; Dukić et al. 2012; Rasmussen et al. 2011). Some more recent state-space models involve spatial components in the dynamical process model (e.g., Patterson et al. 2008). Some of these models are not appropriate for modeling epidemic flows (e.g., Kendall et al. 1999, 2005; Patterson et al. 2008). Those that are extensions of classic compartment epidemic models (e.g., the SIR model and the SEIR model) and that do pay attention to the underlying true process hidden behind the noisy data, either ignore the source of variation that captures randomness in the (hidden) epidemic process (e.g., Rasmussen et al. 2011), or they do not preserve the mass-balance property (e.g., Liu and West 2001; Dukić et al. 2012), which may introduce biased results. Recent extensions to stochastic models with a master equation have similar problems with mass balance (e.g., Alonso et al. 2007).

Notice that it is important to impose mass balance appropriately, in order to improve the accuracy of estimating counts in all compartments. If one only pays attention to estimating the infectious population (I) and ignores modeling the susceptible population (S) and the recovered population (R) (for example, simply using a value of R from the literature and setting \(S=N-I-R\), as if it were observed), the result may be biased estimates of important model parameters (e.g., the recovery rate \(\gamma \)) and, finally, bias in the estimation of I. Further, bias in \(\gamma \) may lead to bias in the reproduction number, \(R_0\) (defined in Sect. 4.1), which in turn may affect the estimate of the proportion of susceptibles that need to be vaccinated to achieve “herd immunity” (Fine et al. 2011). Accurate estimates to support public-health decisions are critical for disease control and prevention.

In this article, we return to the classic SIRS (CSIRS) model (6)–(8) for motivation, and we propose a discrete-time, mass-balanced (Bayesian) hierarchical SIRS, or HSIRS, model, which is based directly on counts and imposes mass balance appropriately on the underlying true counts, rather than on the observed counts. Our model captures the randomness in the epidemic process by assuming that the dynamical process occurs on a log-odds-ratio scale, transformed from the scale where the true counts are mass-balanced. This new dynamical approach to infectious-disease modeling also shows how infectious diseases in heterogeneous populations could be modeled hierarchically.

In Sect. 2, we propose the HSIRS model for infectious-disease data. The actual computations associated with the posterior analysis involve local linearization of difference equations; see Sect. 3. In Sect. 4, we simulate in discrete time, a dataset from the HSIRS model and from a CSIRS model that is modified to incorporate observation error, and then we infer all unknowns of the models through Markov chain Monte Carlo (MCMC) analysis. Comparisons are given to CSIRS-model-based inference. In Sect. 5, we extend the example to a carefully designed simulation experiment with sufficient replication to make definitive comparisons between the HSIRS model and the CSIRS model. Discussion and conclusions are given in Sect. 6. Some technical derivations are given in the “Appendices”.

2 Bayesian hierarchical statistical SIRS (HSIRS) model

We assume that there is a true (unobserved) process underlying the observed epidemic counts, which we incorporate into the framework of a Bayesian hierarchical statistical model. This typically consists of three components: the data model (i.e., the conditional distribution of the data given hidden processes and parameters); the process model (i.e., the conditional distribution of the hidden processes given parameters); and the parameter model (i.e., the prior distribution of the parameters).

2.1 Data model

We model the raw counts directly rather than modeling the rates derived from the counts (e.g., Dukić et al. 2012, use Gaussian distributions to model the rates) and assume that the data model consists of (conditionally) independent Poisson distributions evolving at discrete time points. That is, for time points \(t=1,2,\ldots ,T\), in units of \(\varDelta \) days, the data model is

$$\begin{aligned} Z_S(t)|P_S(t)\sim \hbox {ind. }\mathrm{Poisson}(\lambda _N P_S(t)), \end{aligned}$$
(10)

independent from

$$\begin{aligned} Z_I(t)|P_I(t) \sim \hbox {ind. }\mathrm{Poisson}(\lambda _N P_I(t)), \end{aligned}$$
(11)

where \(Z_S(t)\) and \(Z_I(t)\) are the observed number of susceptible and infectious individuals at time \(t\), respectively; “ind.” is shorthand for “independent”; \(\lambda _N\) denotes the true total population count, and \(P_S(t)\) and \(P_I(t)\) are the underlying true rates of susceptible and infectious individuals at time \(t\), respectively. Since \(\lambda _N\) is known from demography and \(\lambda _N=Z_S(t)+Z_I(t)+Z_R(t)\), the value of \(Z_R(t)\) follows. Thus, the data are \(\{(Z_S(t), Z_I(t)): t=1,2,\ldots ,T\}\).
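A minimal sketch of sampling from the data model (10)–(11) is given below. The population size \(\lambda _N\) and true rates are illustrative (and deliberately small), and the pure-Python Poisson sampler is included only to keep the sketch self-contained.

```python
# Sketch: draw one time point of observed counts from the data model
# (10)-(11), with conditionally independent Poisson observations.
# lam_N, P_S, and P_I below are illustrative values, not from the article.
import math
import random

random.seed(42)

def _knuth(lam):
    """Knuth's Poisson sampler, adequate for small means."""
    if lam <= 0.0:
        return 0
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def rpois(mean):
    """Poisson sampler for larger means, using additivity of Poissons."""
    total = 0
    while mean > 25.0:
        total += _knuth(25.0)
        mean -= 25.0
    return total + _knuth(mean)

lam_N = 1000          # small population, for illustration only
P_S, P_I = 0.6, 0.1   # illustrative true rates at some time t

Z_S = rpois(lam_N * P_S)    # data model (10)
Z_I = rpois(lam_N * P_I)    # data model (11)
Z_R = lam_N - Z_S - Z_I     # follows, since lam_N is known
print(Z_S, Z_I, Z_R)
```

Note that, as in the text, \(Z_R(t)\) is not modeled separately: it is recovered from \(\lambda _N\) and the two observed counts.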

2.2 Process model

Recall the discrete-time CSIRS model defined by (6)–(8), which assumes that mass balance holds for the observed population. However, the appropriate place to impose mass balance is on the true (hidden) process. That is, for \(t=1,2,\ldots , T\), we have

$$\begin{aligned} \lambda _S(t)+\lambda _I(t)+\lambda _R(t)=\lambda _N, \end{aligned}$$
(12)

where \(\lambda _S(t),\,\lambda _I(t)\), and \(\lambda _R(t)\) are the underlying true (but hidden) counts of susceptible, infectious, and recovered individuals at time \(t\), respectively. Now define the true (hidden) rates, \(P_S(t),\,P_I(t)\), and \(P_R(t)\), via

$$\begin{aligned}&\lambda _S(t)\equiv \lambda _NP_S(t),\end{aligned}$$
(13)
$$\begin{aligned}&\lambda _I(t)\equiv \lambda _NP_I(t),\end{aligned}$$
(14)
$$\begin{aligned}&\lambda _R(t)\equiv \lambda _NP_R(t), \end{aligned}$$
(15)

where \(P_R(t)\) denotes the underlying true rate of recovered individuals at time \(t\). Then by substituting (13)–(15) into (12), it is straightforward to see that the mass balance in (12) can be rewritten as,

$$\begin{aligned} P_S(t)+P_I(t)+P_R(t)=1, \end{aligned}$$
(16)

and hence for \(t=1,2,\ldots , T\),

$$\begin{aligned} P_R(t)=1-P_S(t)-P_I(t). \end{aligned}$$
(17)

Recall that the difference equations defined in (6)–(8) give easily interpretable dynamics, in which individuals move from the susceptible state, to the infectious state, then to the recovered state (and some individuals may become susceptible again). We wish to model this “SIRS flow” on the hidden process where \(t\) is discrete (in units of \(\varDelta \) days). We do this by deriving a set of deterministic difference equations on \(\lambda _S(t),\,\lambda _I(t)\), and \(\lambda _R(t)\). That is, for \(t=1,2,\ldots \), our process model is,

$$\begin{aligned} \lambda _S(t+1)&= \lambda _S(t)-\beta \varDelta \lambda _S(t)\lambda _I(t)+ \phi \varDelta \lambda _R(t),\end{aligned}$$
(18)
$$\begin{aligned} \lambda _I(t+1)&= \lambda _I(t)+ \beta \varDelta \lambda _S(t)\lambda _I(t)-\gamma \varDelta \lambda _I(t),\end{aligned}$$
(19)
$$\begin{aligned} \lambda _R(t+1)&= \lambda _R(t)+\gamma \varDelta \lambda _I(t)-\phi \varDelta \lambda _R(t). \end{aligned}$$
(20)

In (18)–(20), the SIRS flow has been preserved, and the rate parameters \(\beta ,\,\phi \), and \(\gamma \) are in units of per day (\(d^{-1}\)).

According to the definition of \(\lambda _S(t),\,\lambda _I(t)\), and \(\lambda _R(t)\) in (13)–(15), Eqs. (18)–(20) can be rewritten in terms of the true proportions, \(P_S(t),\,P_I(t)\), and \(P_R(t)\):

$$\begin{aligned} P_S(t+1)&= P_S(t)- \beta \lambda _N P_S(t)P_I(t)\varDelta +\phi P_R(t)\varDelta ,\end{aligned}$$
(21)
$$\begin{aligned} P_I(t+1)&= P_I(t)+\beta \lambda _N P_S(t)P_I(t)\varDelta -\gamma P_I(t)\varDelta , \end{aligned}$$
(22)
$$\begin{aligned} P_R(t+1)&= P_R(t)+ \gamma P_I(t)\varDelta -\phi P_R(t)\varDelta . \end{aligned}$$
(23)

Deterministic equations, such as those in (21)–(23), are unable to capture the uncertainties in the hidden epidemic model. Possible sources of uncertainty are the presence of heterogeneous populations and the existence of other categories such as exposed individuals. To handle the complexity while still preserving the mass balance, we apply the logit transformation to the true rates, which changes the scale of variability from \([0,1]\) to \((-\infty , \infty )\). That is, for \(t=1,2,\ldots ,\) define

$$\begin{aligned} W_S(t)&\equiv \mathrm{log}\left( \frac{P_S(t)}{P_R(t)}\right) ,\end{aligned}$$
(24)
$$\begin{aligned} W_I(t)&\equiv \mathrm{log}\left( \frac{P_I(t)}{P_R(t)}\right) , \end{aligned}$$
(25)

where \(W_S(t)\) and \(W_I(t)\) are the log odds ratios of susceptible-over-recovered populations, and infectious-over-recovered populations, respectively, at time \(t\). On the log-odds-ratio scale (\(\mathbf {W}\)-scale), we construct the process model in terms of \(\mathbf {W}(t)\equiv (W_S(t), W_I(t))^\prime \):

$$\begin{aligned} \mathbf {W}(t+1)={\varvec{\mu }}^W(t)+{\varvec{\xi }}(t+1), \end{aligned}$$
(26)

for discrete time \(t=1,2,\ldots ,\) in units of \(\varDelta \) days. We now discuss each of the components of (26), in turn. The vector \({\varvec{\mu }}^W(t)\equiv (\mu ^W_S(t), \mu ^W_I(t))'\) is the dynamical process that captures the temporal dependence. In “Appendix A.1”, we derive the nonlinear dynamical structure of \({\varvec{\mu }}^W(t)\) using (21)–(25). This derivation retains the SIRS flow on the hidden process; that is, for discrete time \(t=1,2,\ldots ,\) in units of \(\varDelta \) days,

$$\begin{aligned} \mu ^W_S(t)&= W_S(t)\nonumber \\&\quad +\,\mathrm{log}\left[ 1+\frac{\phi \varDelta }{\mathrm{exp} \left( W_S(t)\right) }-\frac{\beta \lambda _N\mathrm{exp}\left( W_I(t) \right) \varDelta }{1+\mathrm{exp}\left( W_S(t)\right) + \mathrm{exp}\left( W_I(t)\right) }\right] \nonumber \\&\quad +\,\mathrm{log}\left[ \frac{1}{1+\gamma \mathrm{exp}\left( W_I(t) \right) \varDelta -\phi \varDelta }\right] ,\nonumber \\ \end{aligned}$$
(27)
$$\begin{aligned} \mu ^W_I(t)&= W_I(t)\nonumber \\&\quad +\,\mathrm{log}\left[ 1-\gamma \varDelta +\frac{\beta \lambda _N \mathrm{exp}\left( W_S(t)\right) \varDelta }{1+\mathrm{exp}\left( W_S(t) \right) +\mathrm{exp}\left( W_I(t)\right) }\right] \nonumber \\&\quad +\,\mathrm{log}\left[ \frac{1}{1+\gamma \mathrm{exp}\left( W_I(t) \right) \varDelta -\phi \varDelta }\right] ,\nonumber \\ \end{aligned}$$
(28)

where recall that \(\beta ,\,\gamma \), and \(\phi \), are the transmission rate, recovery rate, and loss-of-immunity rate per day, respectively.
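The dynamics (27)–(28) can be checked numerically against one step of the \(\mathbf {P}\)-scale recursion (21)–(23): transforming to proportions, stepping, and transforming back should reproduce \({\varvec{\mu }}^W(t)\) exactly, since that is how (27)–(28) are derived. The sketch below performs this cross-check; the parameter and state values are illustrative.

```python
# Implement the nonlinear W-scale dynamics (27)-(28) and cross-check
# against one step of the P-scale recursion (21)-(23).
# Parameter and state values below are illustrative only.
import math

def mu_W(W_S, W_I, beta, gamma, phi, lam_N, delta=1.0):
    """The nonlinear dynamical process mu^W(t) of (27)-(28)."""
    denom = 1.0 + math.exp(W_S) + math.exp(W_I)
    common = math.log(1.0 / (1.0 + gamma * math.exp(W_I) * delta
                             - phi * delta))
    mu_S = (W_S
            + math.log(1.0 + phi * delta / math.exp(W_S)
                       - beta * lam_N * math.exp(W_I) * delta / denom)
            + common)
    mu_I = (W_I
            + math.log(1.0 - gamma * delta
                       + beta * lam_N * math.exp(W_S) * delta / denom)
            + common)
    return mu_S, mu_I

beta, gamma, phi, lam_N, delta = 5e-7, 0.33, 0.05, 1.0e6, 1.0
W_S, W_I = 2.0, -1.0

# P-scale state implied by (W_S, W_I):
denom = 1.0 + math.exp(W_S) + math.exp(W_I)
P_S, P_I, P_R = (math.exp(W_S) / denom, math.exp(W_I) / denom, 1.0 / denom)

# One step of (21)-(23):
P_S1 = P_S - beta * lam_N * P_S * P_I * delta + phi * P_R * delta
P_I1 = P_I + beta * lam_N * P_S * P_I * delta - gamma * P_I * delta
P_R1 = P_R + gamma * P_I * delta - phi * P_R * delta

mu_S, mu_I = mu_W(W_S, W_I, beta, gamma, phi, lam_N, delta)
print(abs(mu_S - math.log(P_S1 / P_R1)) < 1e-9,
      abs(mu_I - math.log(P_I1 / P_R1)) < 1e-9)
```

The agreement confirms that (27)–(28) are exactly (21)–(23) re-expressed on the \(\mathbf {W}\)-scale, so the SIRS flow is retained on the hidden process.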

We denote the vector \({\varvec{\xi }}(t)\equiv (\xi _S(t), \xi _I(t))'\), to be the small-scale variation that captures the uncertainties in the hidden epidemic process. For \(t=1,2,\ldots \), we define

$$\begin{aligned} {\varvec{\xi }}(t)\sim \mathrm{MVN}(\mathbf {0}, {\varvec{\varSigma }}_{{\varvec{\xi }}}(t)), \end{aligned}$$
(29)

a multivariate normal (MVN) distribution with mean \(\mathbf {0}\) and diagonal covariance matrix \({\varvec{\varSigma }}_{{\varvec{\xi }}}(t) \equiv \mathrm{diag}(\sigma ^2_{\xi _S}(t), \sigma ^2_{\xi _I}(t)),\) with nonnegative variance components, \(\sigma ^2_{\xi _S}(t)\) and \(\sigma ^2_{\xi _I}(t)\). For the sake of simplicity, in this article we assume that \(\sigma ^2_{\xi _S}(t)=\sigma ^2_{\xi _S}\) and \(\sigma ^2_{\xi _I}(t)=\sigma ^2_{\xi _I}\), for all \(t=1,2,\ldots \). Notice that the (negative) covariation of \(W_S(t)\) and \(W_I(t)\) [defined in (24) and (25)] due to the mass-balance constraint in (16) has been structured through the covariance of the nonlinear dynamical process \({\varvec{\mu }}^W(t)\). Therefore, it is reasonable to assume that \({\varvec{\varSigma }}_{{\varvec{\xi }}}(t)\), which is the covariance of the conditional distribution, \([\mathbf {W}(t+1)|{\varvec{\mu }}^W(t)]\), has a diagonal structure. Also notice that we can still preserve mass balance while adding the small-scale variation, \({\varvec{\xi }}(t)\), on the \(\mathbf {W}\)-scale, because the mass balance is preserved when

$$\begin{aligned} P_S(t)&= \frac{\mathrm{exp}(W_S(t))}{1+\mathrm{exp}(W_S(t)) +\mathrm{exp}(W_I(t))},\\ P_I(t)&= \frac{\mathrm{exp}(W_I(t))}{1+\mathrm{exp}(W_S(t)) +\mathrm{exp}(W_I(t))},\\ P_R(t)&= \frac{1}{1+\mathrm{exp}(W_S(t))+\mathrm{exp}(W_I(t))}. \end{aligned}$$

Finally, notice that these equations do not put a constraint on the range of the values \(W_S(t)\) and \(W_I(t)\), so adding the error vector \({\varvec{\xi }}(\cdot )\) in (26) will not destroy the balance. However, if an error vector is added directly on the \(\mathbf {P}\)-scale, the mass balance constraint (16) is difficult to retain.
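The identities above make this point concrete: any pair \((W_S(t), W_I(t))\in \mathbb {R}^2\), whether or not it has been perturbed by \({\varvec{\xi }}(\cdot )\), maps back to proportions that sum to exactly 1. A minimal sketch, with illustrative \(\mathbf {W}\) values and noise scale:

```python
# Sketch: adding Gaussian small-scale variation on the W-scale, as in (26),
# cannot violate mass balance, because the inverse transform always yields
# proportions summing to 1. The W values and noise scale are illustrative.
import math
import random

random.seed(0)

def w_to_p(W_S, W_I):
    """Inverse transform from the W-scale back to mass-balanced proportions."""
    denom = 1.0 + math.exp(W_S) + math.exp(W_I)
    return math.exp(W_S) / denom, math.exp(W_I) / denom, 1.0 / denom

W_S = 2.0 + random.gauss(0.0, 0.3)    # perturbed log odds ratios
W_I = -1.0 + random.gauss(0.0, 0.3)

P_S, P_I, P_R = w_to_p(W_S, W_I)
print(abs(P_S + P_I + P_R - 1.0) < 1e-12)
```

By contrast, Gaussian noise added directly to \((P_S(t), P_I(t), P_R(t))\) would generally break the sum-to-one constraint (16), which is the point made in the text.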

The strategy of transforming from the hidden proportion scale (\(\mathbf {P}\)-scale) to the hidden log-odds-ratio scale (\(\mathbf {W}\)-scale) and adding the small-scale variation on the \(\mathbf {W}\)-scale rather than on the \(\mathbf {P}\)-scale, is key to retaining the mass-balance constraint while allowing flexible SIRS flow to be handled. To our knowledge, this is a new approach to infectious-disease modeling; other Bayesian approaches (e.g., Dukić et al. 2012) do not preserve mass balance after building uncertainties into the process model.

2.3 Parameter model

To complete the model, we now specify the joint prior distribution for the parameters, which includes the transmission rate per unit time, \(\beta \); the rate of recovery per unit time, \(\gamma \); the loss-of-immunity rate per unit time, \(\phi \); and the variance components, \(\sigma ^2_{\xi _S}\) and \(\sigma ^2_{\xi _I}\). Notice that the difference equations in (6)–(8) impose a natural constraint on \(\beta \); that is, for any time \(t=1,2,\ldots \),

$$\begin{aligned} \beta S(t) I(t) \le N-(1-\phi )R(t), \end{aligned}$$
(30)

because the number of individuals that become infectious at a certain time \(t\) should be less than or equal to the total number that could be infected at that time. Hence,

$$\begin{aligned} 0\le \beta \le \frac{N-(1-\phi ) R(t)}{S(t)I(t)}. \end{aligned}$$
(31)

In the context of the HSIRS model, this amounts to ensuring that \(\beta \) is bounded above by a hyperparameter, \(\beta _{\mathrm{max}}\). Furthermore, it is straightforward to see that \(\gamma \in [0,1]\) and \(\phi \in [0,1]\), by their definitions. Typically, information about the recovery rate \(\gamma \) is easier to obtain than about the other rate parameters; we apply a logit transformation to \(\gamma \),

$$\begin{aligned} \theta _{\gamma }\equiv \mathrm{log}\left( \frac{\gamma }{1-\gamma }\right) , \end{aligned}$$
(32)

and assign a Gaussian prior to \(\theta _{\gamma }\).

Assuming statistical independence of parameters and using \([Y]\) as generic notation for the probability distribution of \(Y\), we assume that the parameter model can be written as

$$\begin{aligned} \left[ \beta , \theta _\gamma ,\phi , \sigma ^2_{\xi _S}, \sigma ^2_{\xi _I}\right] =[\beta ][\theta _\gamma ][\phi ][\sigma ^2_{\xi _S}][\sigma ^2_{\xi _I}], \end{aligned}$$
(33)

where we specify the prior distributions of individual parameters as follows:

$$\begin{aligned}&\beta \sim \mathrm{Uniform}[0, \beta _{max}],\\&\theta _{\gamma }\sim \mathrm{Normal}(\mu _{\theta _{\gamma }},\sigma ^2_{\theta _{\gamma }}),\\&\phi \sim \mathrm{Uniform}[0,1],\\&\sigma ^2_{\xi _S}\sim \hbox { Inverse Gamma }\left( a_{\xi _S},b_{\xi _S}\right) ,\\&\sigma ^2_{\xi _I}\sim \hbox { Inverse Gamma }\left( a_{\xi _I},b_{\xi _I}\right) . \end{aligned}$$

Notice that the uniform distributions on \(\beta \) and \(\phi \) could easily be replaced by the very flexible Generalized Beta distributions on their supports.

In practice, the hyperparameter \(\beta _{max}\) could be data-driven using (31):

$$\begin{aligned} \beta _{max}\equiv \min _t\left( \frac{\lambda _N-Z_R(t)}{Z_S(t)Z_I(t)} \right) , \end{aligned}$$
(34)

where

$$\begin{aligned} Z_R(t)=\max \left\{ 1, \lambda _N-Z_S(t)-Z_I(t)\right\} . \end{aligned}$$
(35)

The hyperparameter \(\mu _{\theta _{\gamma }}\) could be data-driven through the inverse relationship between recovery rate and the mean duration of the infectious period for individuals (e.g., Lloyd 2001), which is about 3 days for common influenza (Centers for Disease Control; http://www.cdc.h1n1flu/recommendations.htm). In our case, we use \(\mu _{\theta _{\gamma }}= \mathrm{log}(0.33/(1-0.33))=-0.708\), with \(\sigma ^2_{\theta _{\gamma }}=0.01\). The other hyperparameters, namely, \(a_{\xi _S},\,b_{\xi _S},\,a_{\xi _I}\), and \(b_{\xi _I}\), need to be specified; for example, the choice, \(a_{\xi _S}=a_{\xi _I}=0.25\) and \(b_{\xi _S}=b_{\xi _I}=0.4\), results in a fairly vague prior for the variance components.
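A short sketch of these data-driven hyperparameter choices follows: \(\beta _{max}\) from (34)–(35), and \(\mu _{\theta _{\gamma }}\) from a 3-day mean infectious period. The observed counts below are toy values for illustration only.

```python
# Sketch of the data-driven hyperparameters of Sect. 2.3:
# beta_max via (34)-(35), and mu_theta_gamma via logit(gamma) with
# gamma = 0.33 per day. The observed counts are toy values.
import math

lam_N = 10000
Z_S = [9000, 8500, 8000]   # toy observed susceptible counts
Z_I = [100, 500, 800]      # toy observed infectious counts

# Eq. (35): implied recovered counts, floored at 1.
Z_R = [max(1, lam_N - s - i) for s, i in zip(Z_S, Z_I)]

# Eq. (34): data-driven upper bound for beta.
beta_max = min((lam_N - r) / (s * i)
               for s, i, r in zip(Z_S, Z_I, Z_R))

# Prior mean for theta_gamma = logit(gamma), with gamma about 1/3 per day:
mu_theta_gamma = math.log(0.33 / (1 - 0.33))

print(beta_max > 0, round(mu_theta_gamma, 3))
```

The logit value reproduces the \(-0.708\) quoted in the text, and \(\beta _{max}\) scales like \(1/(\lambda _N P_S P_I)\), so it shrinks as the epidemic grows.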

3 \(\mathbf {W}\)-scale approximations for the HSIRS model

Here, we derive a calibrated approximation to the nonlinear \(\mathbf {W}\)-scale process in the HSIRS model. From “Appendix A.2”, for \(t=1,2,\ldots \), Eq. (26) in the HSIRS model can be approximated by

$$\begin{aligned} \mathbf {W}(t+1)={\varvec{\mu }}^{LW}(t)+{\varvec{\zeta }}(t+1), \end{aligned}$$
(36)

where recall that \(\mathbf {W}(t)\equiv (W_S(t), W_I(t))'\) is the true log-odds-ratio vector.

The vector \({\varvec{\zeta }} (t)\equiv (\zeta _S(t), \zeta _I(t))'\) in (36) is the small-scale-variation vector that captures the uncertainties in the epidemic process as well as the higher-order terms in the Taylor-series expansions. For \(t=1,2,\ldots \), we assume a Multivariate Normal (MVN) distribution,

$$\begin{aligned} {\varvec{\zeta }}(t)\sim \mathrm{MVN}\left( \mathbf {0}, {\varvec{\varSigma }}_{{\varvec{\zeta }}}(t)\right) \!, \end{aligned}$$
(37)

where \({\varvec{\varSigma }}_{{\varvec{\zeta }}}(t)\equiv \mathrm{diag}\left( \sigma ^2_{\zeta _S}(t),\sigma ^2_{\zeta _I}(t)\right) \) is the covariance matrix of \({\varvec{\zeta }} (t)\), and \(\sigma ^2_{\zeta _S}(t)\) and \(\sigma ^2_{\zeta _I}(t) \) are nonnegative variance-component parameters. Typically, the components of \({\varvec{\varSigma }}_{{\varvec{\zeta }}}(t)\) are larger than the respective components of \({\varvec{\varSigma }}_{{\varvec{\xi }}}(t)\), because \(\{{\varvec{\zeta }}(t)\}\) also captures the higher-order terms left after matching the linear approximation.

In (36), the vector \({\varvec{\mu }}^{LW}(t)\equiv (\mu ^{LW}_S(t), \mu ^{LW}_I(t))'\) is a linear dynamical process derived through Taylor-series expansions that approximates the nonlinear stochastic process \({\varvec{\mu }}^{W}(t)\) defined in (27)–(28). From “Appendix A.2”, for \(t=1,2,\ldots \),

$$\begin{aligned} \mu ^{LW}_{S}(t)&= J_0(t)+J_1(t)W_S(t)+J_2(t)W_I(t),\end{aligned}$$
(38)
$$\begin{aligned} \mu ^{LW}_{I}(t)&= J_3(t)+J_4(t)W_S(t)+J_5(t)W_I(t), \end{aligned}$$
(39)

where

$$\begin{aligned} J_0(t)&\equiv \mathrm{log}\hat{A}_1(t)+\frac{1}{\hat{A}_1(t)} +\phi \left[ \frac{\mathrm{exp}(\hat{A}_5(t))(1- \hat{A}_5(t))}{\hat{A}_1(t)}+\frac{1}{\hat{A}_2(t)} \right] \varDelta \nonumber \\&\quad -\,\frac{\beta \lambda _N}{\hat{A}_1(t)}\left( 1- \frac{1}{1-\hat{A}_7(t)}+\frac{B_0(t)+\hat{A}_7(t)}{(1- \hat{A}_7(t))^2}\right) \varDelta \nonumber \\&\quad -\,\mathrm{log}\hat{A}_2(t)-\frac{1}{\hat{A}_2(t)}- \frac{\gamma }{\hat{A}_2(t)}e^{\hat{A}_6(t)}(1-\hat{A}_6(t))\varDelta , \end{aligned}$$
(40)
$$\begin{aligned}&\displaystyle J_1(t)\equiv 1-\frac{ \phi \mathrm{exp}\left( \hat{A}_5(t) \right) \varDelta }{\hat{A}_1(t)}-\frac{\beta \lambda _NB_1(t) \varDelta }{\hat{A}_1(t)(1-\hat{A}_7(t))^2},\end{aligned}$$
(41)
$$\begin{aligned}&\displaystyle J_2(t)\equiv -\frac{\beta \lambda _NB_2(t)\varDelta }{\hat{A}_1(t)(1- \hat{A}_7(t))^2}-\frac{\gamma e^{\hat{A}_6(t)} \varDelta }{\hat{A}_2(t)}, \end{aligned}$$
(42)
$$\begin{aligned} J_3(t)&\equiv \mathrm{log}\hat{A}_3(t)+ \frac{1-\gamma \varDelta }{\hat{A}_3(t)}+ \frac{\beta \lambda _N\varDelta }{\hat{A}_3(t)(1-\hat{A}_{10}(t))}- \frac{\beta \lambda _N(B_3(t)+\hat{A}_{10}(t))\varDelta }{\hat{A}_3(t)(1- \hat{A}_{10}(t))^2}\nonumber \\&\quad -\,\mathrm{log}\hat{A}_2(t)-\frac{1}{\hat{A}_2(t)}- \frac{\gamma e^{\hat{A}_6(t)}(1-\hat{A}_6(t))\varDelta }{\hat{A}_2(t)}+ \frac{\phi \varDelta }{\hat{A}_2(t)}, \end{aligned}$$
(43)
$$\begin{aligned}&\displaystyle J_4(t)\equiv \frac{-\beta \lambda _N B_4(t) \varDelta }{\hat{A}_3(t)(1-\hat{A}_{10}(t))^2},\end{aligned}$$
(44)
$$\begin{aligned}&\displaystyle J_5(t)\equiv 1-\frac{\beta \lambda _N B_5(t)\varDelta }{\hat{A}_3(t)(1- \hat{A}_{10}(t))^2}-\frac{\gamma e^{\hat{A}_6(t)} \varDelta }{\hat{A}_2(t)}. \end{aligned}$$
(45)

Also,

$$\begin{aligned} B_0(t)&\equiv e^{\hat{A}_4(t)}(1-\hat{A}_4(t))B^*(t),\end{aligned}$$
(46)
$$\begin{aligned} B_1(t)&\equiv \frac{e^{\hat{A}_5(t)+\hat{A}_4(t)}(1- \hat{A}_4(t))}{(1-\hat{A}_9(t))^2}-e^{\hat{A}_4(t)}B^*(t),\end{aligned}$$
(47)
$$\begin{aligned} B_2(t)&\equiv e^{\hat{A}_4(t)}B^*(t),\end{aligned}$$
(48)
$$\begin{aligned} B_3(t)&\equiv e^{\hat{A}_4(t)}(1-\hat{A}_4(t))+e^{\hat{A}_5(t)}(1-\hat{A}_5(t)),\end{aligned}$$
(49)
$$\begin{aligned} B_4(t)&\equiv -e^{\hat{A}_4(t)}-e^{\hat{A}_5(t)},\end{aligned}$$
(50)
$$\begin{aligned} B_5(t)&\equiv e^{\hat{A}_4(t)}, \end{aligned}$$
(51)

where

$$\begin{aligned} B^*(t)=\frac{1}{1-\hat{A}_9(t)}+\frac{e^{\hat{A}_5(t)}\left( \hat{A}_5(t)-1\right) -\hat{A}_9(t)}{(1-\hat{A}_9(t))^2}, \end{aligned}$$
(52)

and \(\{\hat{A}_j(t): j=1,\ldots ,10\}\) are discussed below.

Notice that by combining Eqs. (36)–(52), we are able to write \(\mathbf {W}(t+1)\) given by (36) in a multivariate autoregressive form:

$$\begin{aligned} \mathbf {W}(t+1)=\mathbf {C}(t)+\mathbf {H}(t)\mathbf {W}(t) +{\varvec{\zeta }}(t+1), \end{aligned}$$
(53)

where \(\mathbf {C}(t)\equiv (J_0(t), J_3(t))'\) and the \(2\times 2\) matrix \(\mathbf {H}(t)\) is

$$\begin{aligned} \mathbf {H}(t)\equiv \left( \begin{array}{cc} J_1(t) &{} J_2(t) \\ J_4(t) &{} J_5(t) \\ \end{array} \right) . \end{aligned}$$

The general idea behind (38)–(39) is to use \(\hat{A}_i(t),\,i=1,\ldots ,10\), as an initialization of the Taylor-series expansion of the nonlinear process \({\varvec{\mu }}^{W}(t)\) in (26). Formulas for \(\hat{A}_i(t)\) and the quantity \(A_i(t)\) that it approximates are given in Table 1. In practice, we use empirical values obtained from data \(\{Z_S(t)\}\) and \(\{Z_I(t)\}\) to obtain \(\hat{A}_i(t)\) close to \(A_i(t)\), where from Table 1 \(\{A_i(t):\,i=1,\ldots ,10\}\) are nonlinear components in the HSIRS model. From the transformation in (24) and (25), we can obtain

$$\begin{aligned} \lambda _NP_S(t)&= \frac{\lambda _N\mathrm{exp}(W_S(t))}{1 +\mathrm{exp}(W_S(t))+\mathrm{exp}(W_I(t))},\\ \lambda _NP_I(t)&= \frac{\lambda _N\mathrm{exp}(W_I(t))}{1+ \mathrm{exp}(W_S(t))+\mathrm{exp}(W_I(t))},\\ \lambda _NP_R(t)&= \frac{\lambda _N}{1+\mathrm{exp}(W_S(t))+ \mathrm{exp}(W_I(t))}. \end{aligned}$$

Now \(\lambda _NP_S(t),\,\lambda _NP_I(t)\), and \(\lambda _NP_R(t)\) are the means of \(\{Z_S(t)\},\,\{Z_I(t)\}\), and \(\{Z_R(t)\}\), and hence the values in the column “Initializations (\(\hat{A}(t)\))” in Table 1 are reasonable choices for \(\{\hat{A}_i(t)\}\). If data are not available at some time point, we obtain \(\{\hat{A}_i(t)\}\) in Table 1 by replacing the observed data \(\{Z_S(t)\}\) and \(\{Z_I(t)\}\) with the predicted counts provided by a simple model, such as the CSIRS model. The goal is to obtain \(\{\hat{A}_i(t)\}\) as close as possible to \(\{A_i(t)\}\).

Table 1 Table of the initializations \(\{\hat{A}_i(t):i=1,\ldots ,10\}\) and the quantities \(\{A_i(t):i=1,\ldots ,10\}\) that they approximate

In Table 1, \(\beta _0,\,\gamma _0\), and \(\phi _0\) are initial values of \(\beta ,\,\gamma \), and \(\phi \), respectively, which are used to enhance the Taylor-series expansions. We use values obtained from the CSIRS model (e.g., Anderson and May 1991; Wearing et al. 2005; Burr and Chowell 2006) for \(\beta _0,\,\gamma _0\), and \(\phi _0\). Further details of implementation can be found in Sect. 4.3. We performed sensitivity studies and noticed that the Markov chain Monte Carlo (MCMC) based on this linear approximation is not sensitive to the initializations (even in forecasting), because the small-scale-variation vector in (36) can absorb the higher-order terms in the Taylor-series expansion (e.g., Cressie and Wikle 2011, Section 7.3.3). Like the data model, the parameter model is unchanged, except that subscript \(\xi \) is replaced with subscript \(\zeta \); see Sect. 2.3 for details.

Finally, the approximate HSIRS model, which we call the ASIRS model, consists of the data model defined in (10)–(11), the linear dynamical process model for \(\{\mathbf {W}(t)\}\) defined in (36), and the parameter model defined in (33) with \(\xi \) replaced by \(\zeta \). Because of its computational efficiency, the ASIRS model is used in the MCMC algorithm that produces our posterior analysis.
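The autoregressive form (53) is what makes the ASIRS model computationally convenient. The sketch below propagates \(\mathbf {W}(t)\) forward using fixed illustrative values in place of the time-varying coefficients \(\mathbf {C}(t)\) and \(\mathbf {H}(t)\) (which in the ASIRS model are built from (40)–(45)), with independent Gaussian \({\varvec{\zeta }}\) as in (37), and confirms that each state still maps back to mass-balanced proportions.

```python
# Generic sketch of the linear-Gaussian propagation (53),
#   W(t+1) = C(t) + H(t) W(t) + zeta(t+1).
# C, H, and the noise scales are illustrative stand-ins, NOT the J_i(t)
# coefficients (40)-(45) of the ASIRS model.
import math
import random

random.seed(7)

C = (0.02, -0.05)                    # stands in for (J_0(t), J_3(t))'
H = ((0.98, -0.03),                  # stands in for the 2x2 matrix H(t)
     (0.01, 0.97))
sigma_zeta = (0.05, 0.05)            # standard deviations of zeta

W = (2.0, -1.0)                      # initial log odds ratios
traj = [W]
for t in range(30):
    zeta = tuple(random.gauss(0.0, s) for s in sigma_zeta)
    W = (C[0] + H[0][0] * W[0] + H[0][1] * W[1] + zeta[0],
         C[1] + H[1][0] * W[0] + H[1][1] * W[1] + zeta[1])
    traj.append(W)

# Every state on the W-scale maps back to mass-balanced proportions:
denom = 1.0 + math.exp(W[0]) + math.exp(W[1])
P = (math.exp(W[0]) / denom, math.exp(W[1]) / denom, 1.0 / denom)
print(abs(sum(P) - 1.0) < 1e-12)
```

In the MCMC algorithm, this linear-Gaussian structure is what allows efficient full-conditional updates of \(\{\mathbf {W}(t)\}\).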

4 Posterior analysis of a simulated epidemic

4.1 Simulated data

In this section, an epidemic is simulated from each of two processes meant to mimic an H1N1 outbreak in Franklin County, Ohio. The two datasets are from an HSIRS model (defined in Sect. 2) and a modified-CSIRS model (defined below), which we refer to as the HSIRS dataset and the modified-CSIRS dataset, respectively. Henceforth, the time step of the difference equations is \(\varDelta =1\) day. Notice that for epidemics that develop on a different time scale, the value of \(\varDelta \) can easily be adjusted according to the infectious nature of the underlying virus.

4.1.1 The HSIRS dataset

For simulating the HSIRS dataset from the HSIRS model, we select \(\gamma ^*=0.33\) per day as the true value of \(\gamma \), since \(\gamma \) is the reciprocal of the mean duration of the infectious period, which is about 3 days for common influenza (Centers for Disease Control; http://www.cdc.h1n1flu/recommendations.htm). (The inverse relation between rate and average duration follows from the assumption that recovery is a process with constant intensity, that is, a constant rate per unit of time, which implies a negative-exponential density.) Furthermore, in epidemiology, the basic reproduction number, \(R_0\), is defined as the number of secondary infections caused by a single infective, introduced into a population made up entirely of susceptible individuals, over this individual’s course of the infection. Therefore, \(R_0\) can be obtained by

$$\begin{aligned} R_0=\frac{\beta \lambda _N}{\gamma }. \end{aligned}$$
(54)

Typically, \(R_0\) has a value between 1 and 2 for new strains of Influenza A in human communities (e.g., Anderson 2006). Therefore, since \(\lambda _N=1{,}068{,}978\) in the 2000 Census in Franklin County, Ohio, and \(\gamma ^*=0.33\), if we select \(\beta ^*=5.1\times 10^{-7}\) as the true value of \(\beta \), then we obtain \(R_0\approx 1.65\), which mimics a pandemic flu. Furthermore, for illustration, we simply assume \(\phi ^*=0.05\), a typical value of the rate of loss of immunity used for pandemic strains over long time periods (e.g., Bansal et al. 2010).
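As a quick numerical check of (54), the selected true values can be verified to give \(R_0\approx 1.65\); the following illustrative Python snippet (not part of the original analysis) performs the calculation:

```python
# Illustrative check of Eq. (54): R_0 = beta * lambda_N / gamma,
# using the true values selected for the simulation.
beta_star = 5.1e-7        # transmission rate per day
gamma_star = 0.33         # recovery rate per day
lambda_N = 1_068_978      # Franklin County, Ohio (2000 Census)

R0 = beta_star * lambda_N / gamma_star
print(round(R0, 2))       # approximately 1.65
```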

We now turn to the components of variance. Recall that the signal-to-noise ratio (SNR) can be defined as

$$\begin{aligned} \mathrm{SNR}\equiv \frac{\mu }{\sigma }, \end{aligned}$$
(55)

where \(\mu \) is the signal mean and \(\sigma \) is the standard deviation of the noise. We denote \(\mathrm{SNR}_{W_S}(t)\) and \(\mathrm{SNR}_{W_I}(t)\) to be the SNR for the log odds ratios \(W_S(t)\) and \(W_I(t)\), respectively, at time \(t\). Then, for \(t=1,2,\ldots ,T+F\), we have

$$\begin{aligned} \mathrm{SNR}_{W_S}(t)&\equiv \frac{\mu ^W_{S}(t)}{\sigma _{\xi _S}(t)},\end{aligned}$$
(56)
$$\begin{aligned} \mathrm{SNR}_{W_I}(t)&\equiv \frac{\mu ^W_{I}(t)}{\sigma _{\xi _I}(t)}, \end{aligned}$$
(57)

where the right-hand sides of (56) and (57) are given by (27)–(29), and the time points \(T+1,\ldots ,T+F\) represent a forecast period of \(F\) days.

As a baseline at time \(t=1\), we assume the proportions of susceptible and infectious individuals, namely \(P^*_S(1)\) and \(P^*_I(1)\), to be \(0.94\) and \(0.01\), respectively. Hence,

$$\begin{aligned} \mu ^{W*}_{S}(1)&= \mathrm{log}\left( \frac{0.94}{1-0.94-0.01} \right) =2.934,\end{aligned}$$
(58)
$$\begin{aligned} \mu ^{W*}_{I}(1)&= \mathrm{log}\left( \frac{0.01}{1-0.94-0.01} \right) =-1.609, \end{aligned}$$
(59)

which results in

$$\begin{aligned} {\varvec{\mu }}^{W*}(1)=\left( \mu ^{W*}_{S}(1), \mu ^{W*}_{I}(1)\right) '=(2.934,-1.609)', \end{aligned}$$

the initial mean of the hidden log-odds-ratio vector, \(\mathbf {W}(1)\).

For the simulated epidemic, we assume that the true values of SNR at time \(t=1\) are

$$\begin{aligned} \mathrm{SNR}^*_{W_S}(1)=\mathrm{SNR}^*_{W_I}(1)=15. \end{aligned}$$

More choices of SNR are discussed in Sect. 5. Then from (56)–(57), we have

$$\begin{aligned} \sigma ^{2}_{\xi _S}(1)&= \left( 2.934/15\right) ^2=0.038,\end{aligned}$$
(60)
$$\begin{aligned} \sigma ^{2}_{\xi _I}(1)&= \left( -1.609/15\right) ^2=0.012. \end{aligned}$$
(61)

Recall that we assume \(\sigma ^{2}_{\xi _S}(t)=\sigma ^{2}_{\xi _S}\) and \(\sigma ^{2}_{\xi _I}(t)=\sigma ^{2}_{\xi _I}\), for \(t=1,2,\ldots \). Therefore, from (60)–(61), we select \(\sigma ^{2*}_{\xi _S}=0.038\) and \(\sigma ^{2*}_{\xi _I}=0.012\) as the true values of \(\sigma ^{2}_{\xi _S}\) and \(\sigma ^{2}_{\xi _I}\), respectively, and we write \({\varvec{\varSigma }}^*_{\xi }\equiv \mathrm{diag}(\sigma ^{2*}_{\xi _S}, \sigma ^{2*}_{\xi _I})\).
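The baseline calculations in (58)–(61) can be reproduced in a few lines; the following is an illustrative Python sketch:

```python
import math

# Baseline proportions at t = 1; P_R(1) follows from mass balance.
P_S1, P_I1 = 0.94, 0.01
P_R1 = 1.0 - P_S1 - P_I1                 # = 0.05

# Log odds ratios relative to the recovered category, Eqs. (58)-(59).
mu_WS1 = math.log(P_S1 / P_R1)           # 2.934
mu_WI1 = math.log(P_I1 / P_R1)           # -1.609

# SNR = mu/sigma, so sigma^2 = (mu/SNR)^2; Eqs. (60)-(61) with SNR = 15.
SNR = 15.0
sigma2_xi_S = (mu_WS1 / SNR) ** 2        # 0.038
sigma2_xi_I = (mu_WI1 / SNR) ** 2        # 0.012
```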

We simulate daily data for \(T+F=45\) days, where \(T=35\) and \(F=10\). Specifically, for \(t=1\), we simulate

$$\begin{aligned} \mathbf {W}(1)\sim \mathrm{MVN}\left( {\varvec{\mu }}^{W*}(1), {\varvec{\varSigma }}^*_{\xi }\right) ; \end{aligned}$$
(62)

then for \(t=2,\ldots ,45\), we simulate \(\{\mathbf {W}(t)\}\) using (26) and obtain \(\{\mathbf {P}(t)\}\) using transformations defined by (78)–(79) in “Appendix A.1”. Finally, we generate observed counts of susceptible and infectious individuals, \(\{Z_S(t)\}\) and \(\{Z_I(t)\}\), from the Poisson distribution defined in (10)–(11), conditional on \(\{\mathbf {P}(t)\}\). These counts, \(\{Z_S(t)\}\) and \(\{Z_I(t)\}\), represent the HSIRS dataset.
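The generation scheme just described can be sketched as follows. This is an illustrative sketch: the function `step`, which implements one transition of the nonlinear W-scale dynamics in (26), is assumed to be supplied by the user and is not reproduced here; the transformations (78)–(79) and the Poisson data model (10)–(11) are as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
lambda_N, T_plus_F = 1_068_978, 45

def w_to_p(w):
    # Transformations (78)-(79): map the log odds ratios
    # W = (W_S, W_I)' back to the proportions (P_S, P_I).
    e_s, e_i = np.exp(w[0]), np.exp(w[1])
    denom = 1.0 + e_s + e_i
    return np.array([e_s / denom, e_i / denom])

def simulate_hsirs(step, mu_W1, Sigma_xi):
    # Initial state from Eq. (62); subsequent states via the (assumed)
    # one-step dynamics `step`; counts from the Poisson data model.
    W = np.empty((T_plus_F, 2))
    W[0] = rng.multivariate_normal(mu_W1, Sigma_xi)
    for t in range(1, T_plus_F):
        W[t] = step(W[t - 1])
    P = np.array([w_to_p(w) for w in W])
    Z_S = rng.poisson(lambda_N * P[:, 0])
    Z_I = rng.poisson(lambda_N * P[:, 1])
    return Z_S, Z_I
```

With the true parameter values of this subsection built into `step`, the returned \(\{Z_S(t)\}\) and \(\{Z_I(t)\}\) constitute one realization of the HSIRS dataset.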

4.1.2 The modified-CSIRS dataset

Before we illustrate the procedure for generating the modified-CSIRS dataset, we first define the modified-CSIRS model. Recall that the CSIRS model is deterministic, which we now modify to incorporate uncertainty in the observations and in the parameters. Specifically, the modified-CSIRS model is a hierarchical statistical model that consists of a data model defined by (10)–(11), a deterministic process model defined by the CSIRS model in (21)–(23), and a parameter model. That is, for \(t=1,2,\ldots ,\) in units of \(\varDelta \) days, the modified-CSIRS model can be written as:

Data model:

$$\begin{aligned} Z_S(t)|\lambda _S(t)&\sim \hbox {ind. }\mathrm{Poisson}(\lambda _NP_S(t)),\\ Z_I(t)|\lambda _I(t)&\sim \hbox {ind. }\mathrm{Poisson}(\lambda _NP_I(t)). \end{aligned}$$

Process model:

$$\begin{aligned} P_S(t+1)&= P_S(t)-\beta \lambda _N P_S(t)P_I(t)\varDelta +\phi P_R(t)\varDelta ,\\ P_I(t+1)&= P_I(t)+\beta \lambda _N P_S(t)P_I(t)\varDelta -\gamma P_I(t)\varDelta ,\\ P_R(t+1)&= P_R(t)+\gamma P_I(t)\varDelta -\phi P_R(t)\varDelta . \end{aligned}$$

Parameter model:

$$\begin{aligned} {[}\beta , \gamma , \phi ]=[\beta ][\gamma ][\phi ]. \end{aligned}$$

In this hierarchical statistical model, recall that \(Z_S(t)\) and \(Z_I(t)\) are the observed susceptible and infectious counts, respectively; \(P_S(t),\,P_I(t)\), and \(P_R(t)\) are the hidden true proportions; \(\beta ,\,\gamma \), and \(\phi \) are the transmission rate, the recovery rate, and the loss-of-immunity rate, respectively, per day; and \(\varDelta \) is the time step. As specified at the beginning of this section, \(\varDelta =1\). Notice that in the modified-CSIRS model, the process model is deterministic and does not capture any uncertainty in the hidden epidemic process.

For simulating the modified-CSIRS dataset from the modified-CSIRS model, we likewise simulate daily data for \(T+F=45\) days, where \(T=35\) and \(F=10\). For the unknown parameters, \(\beta ,\,\gamma \), and \(\phi \), we select the same values as in Sect. 4.1.1, and we select the same starting proportions, \(P^*_S(1)=0.94\) and \(P^*_I(1)=0.01\). Then for \(t=2,\ldots ,45\), we simulate \(\{\mathbf {P}(t)\}\) from the deterministic process defined in (21)–(23); and finally we generate \(\{Z_S(t)\}\) and \(\{Z_I(t)\}\) using the Poisson distribution defined in (10)–(11), conditional on \(\{\mathbf {P}(t)\}\). These counts, \(\{Z_S(t)\}\) and \(\{Z_I(t)\}\), represent the modified-CSIRS dataset.
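The modified-CSIRS dataset can be generated with the short sketch below (illustrative Python; the process model is the difference form of (21)–(23) given earlier in this subsection, and the data model is (10)–(11)):

```python
import numpy as np

rng = np.random.default_rng(1)

# True values from Sect. 4.1.1 (per day); Delta = 1 day.
beta, gamma, phi = 5.1e-7, 0.33, 0.05
lambda_N, Delta, T_plus_F = 1_068_978, 1.0, 45

P = np.empty((T_plus_F, 3))               # columns: P_S, P_I, P_R
P[0] = [0.94, 0.01, 0.05]
for t in range(T_plus_F - 1):
    S, I, R = P[t]
    # Deterministic process model (difference form of (21)-(23)).
    P[t + 1] = [S - beta * lambda_N * S * I * Delta + phi * R * Delta,
                I + beta * lambda_N * S * I * Delta - gamma * I * Delta,
                R + gamma * I * Delta - phi * R * Delta]

# Poisson data model (10)-(11), conditional on {P(t)}.
Z_S = rng.poisson(lambda_N * P[:, 0])
Z_I = rng.poisson(lambda_N * P[:, 1])
```

Note that the three increments in the process model cancel, so mass balance (\(P_S+P_I+P_R=1\)) is preserved at every step.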

Notice that the HSIRS dataset mimics noisy observations from an epidemic with uncertainties in the underlying epidemic process, whereas the modified-CSIRS dataset mimics noisy observations from a deterministic epidemic process. Although for each dataset, we simulated data for \(T+F=45\) days, when we fit models, we assume that data are only available on the first \(T=35\) days and are missing in the last \(F=10\) days. Thus, we can assess the performance of these models on forecasting, by comparing their forecasts of \(\lambda _S(t)\) and \(\lambda _I(t)\) to the true values known from simulation.

Figure 1a–d show the daily observed susceptible counts \(\{Z_S(t)\}\) and infectious counts \(\{Z_I(t)\}\), for each of the two datasets, as a function of time (for all 45 days). We use the vertical line on each plot to emphasize that we assume data are only available for the first \(T=35\) days. Comparing these plots, we can clearly see that the epidemic patterns in the modified-CSIRS dataset (Fig. 1c, d) are much smoother than those in the HSIRS dataset. Nevertheless, the two datasets suggest similar epidemic patterns over the 45 days: both clearly indicate an epidemic between day 15 and day 40, and in both the infectious population reaches its peak between day 20 and day 30.

Fig. 1
figure 1

Plots of counts \(\{\mathbf {Z}(t)\}\) for the HSIRS and the modified-CSIRS datasets as a function of time. The left-hand plots a and c show \(\{Z_S(t)\}\), and the right-hand plots b and d show \(\{Z_I(t)\}\). The vertical line in each plot emphasizes that for the analysis the data are only available on days 1–35, and they are missing on days 36–45. a HSIRS dataset, b HSIRS dataset, c modified-CSIRS dataset, d modified-CSIRS dataset

4.2 Fitting the HSIRS model

As mentioned in Sect. 3, we derive a well-calibrated Gaussian linear process (the ASIRS model) to approximate the nonlinear W-scale process in the HSIRS model, which improves computational efficiency in posterior analysis and forecasting. Recall that within the 45-day study period, data are only available on the first \(T=35\) days and are missing for the last \(F=10\) days.

Based on the ASIRS model specified in Sect. 3, the joint posterior distribution of all “unknowns” is proportional to a product of the data model, the process model, and the parameter model. Notice that \(\{W_S(t)\}\) and \(\{W_I(t)\}\) are transformations of \(\{P_S(t)\}\) and \(\{P_I(t)\}\), and hence the conditioning in the data model can be equivalently written in terms of \(\mathbf {W}(t)\). Therefore, combining Eqs. (78)–(79) in “Appendix A.1”, the data model given by (10) and (11) can be written as follows: For \(t=1,\ldots ,T\),

$$\begin{aligned} Z_S(t)|\mathbf {W}(t)\sim \hbox { ind. }\mathrm{Poisson}\left( \frac{\lambda _N \mathrm{exp}(W_S(t))}{1+\mathrm{exp}(W_I(t))+ \mathrm{exp}(W_S(t))}\right) ,\\ Z_I(t)|\mathbf {W}(t)\sim \hbox { ind. }\mathrm{Poisson}\left( \frac{\lambda _N \mathrm{exp}(W_I(t))}{1+\mathrm{exp}(W_I(t))+ \mathrm{exp}(W_S(t))}\right) . \end{aligned}$$

Therefore, the joint posterior distribution of all unknowns is

$$\begin{aligned}&\left[ \beta , \theta _{\gamma },\phi , \sigma ^2_{\zeta _S}, \sigma ^2_{\zeta _I}, \{W_S(t)\}, \{W_I(t)\}|Z_S(1),\ldots ,Z_S(T), Z_I(1), \ldots , Z_I(T)\right] \nonumber \\&\quad \propto \prod _{t=1}^{T}[Z_S(t)|W_S(t),W_I(t)]\cdot \prod _{t=1}^{T}[Z_I(t)|W_S(t), W_I(t)]\cdot [W_S(1), W_I(1)| \sigma ^2_{\zeta _S}, \sigma ^2_{\zeta _I}]\nonumber \\&\qquad \times \left( \prod _{t=2}^{T+F}[W_S(t), W_I(t)|\beta ,\theta _{\gamma }, \phi , W_S(t-1), W_I(t-1), \sigma ^2_{\zeta _S},\sigma ^2_{\zeta _I}]\right) \cdot [\beta ][ \theta _{\gamma }][\phi ][\sigma ^2_{\zeta _S}][\sigma ^2_{\zeta _I}], \end{aligned}$$
(63)

where at \(t=1,\,\mathbf {W}(1)\equiv (W_S(1), W_I(1))'\) has distribution,

$$\begin{aligned} \mathbf {W}(1)|\sigma ^2_{\zeta _S},\sigma ^2_{\zeta _I} \thicksim \mathrm{MVN}\left( {\varvec{\mu }}^W(1),{\varvec{\varSigma }}_{\zeta } \right) , \end{aligned}$$

and recall that \({\varvec{\varSigma }}_{\zeta }= \mathrm{diag}(\sigma ^2_{\zeta _S},\sigma ^2_{\zeta _I}).\) There is strong prior information on what happens at \(t=1\), which allows the hyperparameter \({\varvec{\mu }}^W(1)\) to be specified. For example, in Sect. 4.1, we specify it as,

$$\begin{aligned} {\varvec{\mu }}^W(1)={\varvec{\mu }}^{W*}(1)\equiv (2.934,-1.609)'. \end{aligned}$$

(Recall from Sect. 4.1 that \({\varvec{\mu }}^{W*}(1)\) is the initial mean of the log-odds-ratio vector used for simulating the data.) The parameter model defined in Sect. 2.3 consists of independent prior distributions on each parameter. Regarding the parameter-model specification for \({\varvec{\varSigma }}_{\zeta }\), we specify fairly vague priors for \(\sigma ^2_{\zeta _S}\) and \(\sigma ^2_{\zeta _I}\) by choosing independent Inverse Gamma distributions with hyperparameters, \(a_{\zeta _S}=a_{\zeta _I}=0.25,\,b_{\zeta _S}=b_{\zeta _I}=0.4\). Recall from Sect. 2.3 that we also specify \(\beta \sim \mathrm{Uniform}(0, \beta _{max})\), where \(\beta _{max}\) is given by (34); \(\theta _{\gamma }\sim \mathrm{Normal}(0.33, 0.01)\); \(\phi \sim \mathrm{Uniform}[0, 1]\).

Even under the ASIRS model, the posterior distribution is not available analytically, owing to a normalizing constant that cannot be obtained in closed form. However, we can sample from the posterior distribution using a Markov chain Monte Carlo (MCMC) algorithm with a Gibbs sampler that incorporates Metropolis–Hastings steps where necessary (e.g., Waller et al. 1997). These are based on the full conditional distributions given in “Appendix A.3”.

The MCMC algorithm draws samples cyclically from each full conditional distribution, conditioning on the most recent samples drawn from the other full conditionals. This iterative procedure defines a Markov chain whose stationary distribution is the joint distribution of all the unknowns given the data (i.e., the posterior distribution). Hence, after a “burn-in” number of iterations, we obtain samples from the posterior distribution. In our case, the full conditionals \([\sigma ^2_{\zeta _S}|\mathrm{rest}]\) and \([\sigma ^2_{\zeta _I}|\mathrm{rest}]\) are Inverse Gamma distributions, and \([W_S(t), W_I(t)|\mathrm{rest}]\) for \(t>T\) is a Multivariate Normal (MVN) distribution; here, “rest” represents all other unknowns as well as the data \(\{Z_S(1),\ldots ,Z_S(T)\}\) and \(\{Z_I(1),\ldots ,Z_I(T)\}\) (see “Appendix A.3”). The full conditional distributions of all the other unknowns cannot be simulated from directly, and so Metropolis–Hastings updates are applied (e.g., Robert and Casella 2004).
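To fix ideas, the Metropolis–Hastings update used for such non-standard full conditionals can be sketched generically as follows. This is an illustrative sketch only: the actual full conditional densities are those of “Appendix A.3”, and a random-walk proposal with tuning parameter `scale` is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

def mh_update(x, log_full_conditional, scale):
    # One random-walk Metropolis-Hastings update of a scalar unknown x,
    # given the log density of its full conditional (up to a constant).
    proposal = x + scale * rng.standard_normal()
    log_ratio = log_full_conditional(proposal) - log_full_conditional(x)
    return proposal if np.log(rng.uniform()) < log_ratio else x
```

One Gibbs cycle then applies `mh_update` (or a direct draw, where the full conditional is Inverse Gamma or MVN) to each unknown in turn, conditioning on the current values of all the others.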

4.3 Posterior analysis and forecasting

4.3.1 Posterior analysis based on the HSIRS model

For the purpose of comparison, we fit the approximate HSIRS (i.e., ASIRS) model to each of the two datasets, and we refer to them as:

  • case AH: fit the ASIRS model to the HSIRS dataset

  • case AM: fit the ASIRS model to the modified-CSIRS dataset

For each of these two cases, we ran an MCMC chain of 30,000 iterations. After a burn-in of 3,000 iterations, we obtained a total of 27,000 samples from the posterior distribution for each case listed above. Notice that this is not a particle-filtering approach, in which new data are used to update current and past posteriors, thereby avoiding the need to re-run the MCMC.

Figure 2 shows the posterior behavior of \(\beta ,\,\gamma ,\,\sigma ^2_{\zeta _S},\,\sigma ^2_{\zeta _I}\), and \(\phi \), for the case AH. Similar figures are obtained for the case AM (not shown). The posterior median, mean, variance, and 95 % Bayes credible interval (95 % CI) for parameters in each of the two cases are shown in Tables 2 and 3. For both cases, we can see that, except for \(\gamma \) (recall that \(\theta _{\gamma }\) in the tables is the logit transformation of \(\gamma \)), the posteriors for the parameters are much tighter than the priors. See also Fig. 3, which shows the prior and posterior distributions of \(\beta \) (chosen as a representative parameter) for both cases. Hence, there is substantial learning about these parameters, but there is almost no learning for \(\gamma \) because it already has a tight prior.

Fig. 2
figure 2

Trace plots for the case AH: a trace plot for \(\beta \); b trace plot for \(\gamma \); c trace plot for \(\sigma ^2_{\zeta _S}\); d trace plot for \(\sigma ^2_{\zeta _I}\); e trace plot for \(\phi \). Red lines in plots ae indicate the true values of the parameters used for simulating the HSIRS dataset, namely, \(\beta ^*,\,\gamma ^*,\,\sigma ^{2*}_{\xi _S},\,\sigma ^{2*}_{\xi _I}\), and \(\phi ^*\), respectively. (Color figure online)

Fig. 3
figure 3

Prior (red line) and Posterior (histogram) distributions of \(\beta \) for a case AH; b case AM. In each plot, blue dashed lines indicate 95 % posterior credible intervals; the green solid line indicates the true value. (Color figure online)

Table 2 Table of parameters and summaries of their priors and posteriors for the case AH
Table 3 Table of parameters and summaries of their priors and posteriors for the case AM

Because the data are simulated, we have the opportunity to compare the posterior samples with the true values. As mentioned in Sect. 4.1, for both datasets, we use \(\beta ^*=5.1\times 10^{-7},\,\gamma ^*=0.33\), and \(\phi ^*=0.05\) as the true values of the transmission rate, the recovery rate, and the loss-of-immunity rate, respectively (in units of per day). Tables 2 and 3 show that ASIRS, the (approximate) hierarchical model, performs well in recovering these parameters, and the posterior 95 % CIs of \(\beta \) and \(\phi \) are much tighter than those of their priors. (The parameter \(\gamma \) already has a tight prior, and its posterior hardly changes.) Figure 2a, b, e show the true values of the three parameters as red lines for the case AH. We find the agreement is excellent, especially for \(\beta \) and \(\gamma \). The same conclusions hold for the case AM (not shown).

Recall that the HSIRS dataset has uncertainties associated with the hidden epidemic process, and we selected \(\sigma ^{2*}_{\xi _S}=0.038,\,\sigma ^{2*}_{\xi _I}=0.012\) as the true values of the small-scale variance components. Tables 2 and 3 indicate that when fitting the ASIRS model to the HSIRS dataset (case AH), the posterior samples of the small-scale variances in the ASIRS model (i.e., \(\sigma ^{2}_{\zeta _S}\) and \(\sigma ^{2}_{\zeta _I}\)) tend to be quite a bit larger than these values. These results are expected since, as explained in Sect. 3, the small-scale variation terms in the ASIRS model have the flexibility to capture higher-order terms not included in the linear approximations.

To assess the general performance of the ASIRS model to approximate the HSIRS model, we use a discrepancy measure, as described in Cressie and Wikle (2011), Section 2.2.2, due to Gelman et al. (1996). Based on a given discrepancy measure, we obtain a posterior predictive p value and perform a diagnostic procedure to determine whether the model fits the data.

Specifically, for the \(k\)th sample from the MCMC, \(k=1,\ldots ,m=27{,}000\), we define a discrepancy measure, \(\psi \left( \mathbf {Z};\varvec{\lambda }^{(k)}\right) \), as follows:

$$\begin{aligned} \psi \left( \mathbf {Z};\varvec{\lambda }^{(k)}\right) \!\equiv \!\sum _{t=1}^{T}\left[ \mathbf {Z}(t)\!-\! \mathrm{E}\left( \mathbf {Z}(t)|{\varvec{\lambda }}^{(k)}(t) \right) \right] '\left( {\varvec{\varSigma }}^{(k)}_Z(t) \right) ^{-1}\left[ \mathbf {Z}(t)\!-\!\mathrm{E}\left( \mathbf {Z}(t)|{\varvec{\lambda }}^{(k)}(t)\right) \right] ,\nonumber \\ \end{aligned}$$
(64)

where \({\varvec{\lambda }}^{(k)} (t)\equiv \left( \lambda _S^{(k)}(t), \lambda _I^{(k)}(t)\right) '\) denotes the vector of true counts at time \(t\) for the \(k\)th MCMC sample, and

$$\begin{aligned} \lambda _S^{(k)}(t)&\equiv \lambda _NP_S^{(k)}(t),\end{aligned}$$
(65)
$$\begin{aligned} \lambda _I^{(k)}(t)&\equiv \lambda _NP_I^{(k)}(t). \end{aligned}$$
(66)

Recall that we assume data are only available on the first \(T=35\) days for each of the two datasets. Based on the Poisson data model defined in (10)–(11) and on the definitions of \({\varvec{\lambda }}^{(k)}(t)\) in (65) and (66),

$$\begin{aligned} {\varvec{\varSigma }}^{(k)}_Z(t)\equiv \mathrm{diag}\left( \lambda _S^{(k)}(t), \lambda _I^{(k)}(t)\right) , \end{aligned}$$

represents the data model’s covariance matrix, and hence (64) takes the form of a Wald statistic. Furthermore, for \(t=1,\ldots ,T\), if we use \(\mathbf {Z}_{rep}(t)\) to denote an independent replicate of the data, then the posterior-predictive distribution of \(\mathbf {Z}_{rep}(t)\) can be defined as (Gelman et al. 1996),

$$\begin{aligned} \left[ \mathbf {Z}_{rep}(t)|\mathbf {Z}(t)\right] =\int \int \left[ \mathbf {Z}_{rep}(t)|\{\mathbf {W}(t)\},{\varvec{\varTheta }} \right] \left[ \{\mathbf {W}(t)\},{\varvec{\varTheta }}|\mathbf {Z}(t) \right] d\{\mathbf {W}(t)\}\,d{\varvec{\varTheta }}, \end{aligned}$$
(67)

where

$$\begin{aligned} {\varvec{\varTheta }}\equiv \{\beta ,\gamma , \phi , \sigma ^2_{\zeta _S}, \sigma ^2_{\zeta _I}\}. \end{aligned}$$

Thus, for the \(k\)th sample, \(\mathbf {Z}^{(k)}_{rep}(t)\) is drawn from the posterior distribution \(\left[ \mathbf {Z}_{rep}(t)|\mathbf {Z}(t)\right] ,\,t=1,\ldots ,T\), and we obtain the discrepancy measure \(\psi \left( \mathbf {Z}^{(k)}_{rep};\varvec{\lambda }^{(k)}\right) \) by replacing \(\mathbf {Z}\) in (64) with \(\mathbf {Z}^{(k)}_{rep}(t)\) defined in (67). Therefore, the replicates,

$$\begin{aligned} \{\mathbf {Z}^{(1)}_{rep}(t),\mathbf {Z}^{(2)}_{rep}(t), \ldots , \mathbf {Z}^{(m)}_{rep}(t)\}, \end{aligned}$$

should “look like” the data \(\mathbf {Z}(t)\) if the model is appropriate. Based on this idea, we can apply posterior-predictive diagnostics to the case AH and the case AM. Since the HSIRS dataset is simulated from the HSIRS model, we can also assess the performance of the ASIRS model in approximating the HSIRS model.

The posterior predictive p value is obtained as follows (Gelman et al. 1996):

$$\begin{aligned} \mathrm{Pr}\left( \psi (\mathbf {Z}_{rep};\varvec{\lambda }) \ge \psi (\mathbf {Z};\varvec{\lambda })\right) = \frac{1}{m}\sum _{k=1}^{m}\mathrm{I}\left[ \psi \left( \mathbf {Z}^{(k)}_{rep};\varvec{\lambda }^{(k)}\right) \ge \psi \left( \mathbf {Z};\varvec{\lambda }^{(k)}\right) \right] , \end{aligned}$$
(68)

where \(\mathrm{I}(\cdot )\) denotes an indicator function.

Based on (68), we obtain the posterior predictive p value for the case AH to be 0.458; hence, there is no striking evidence for a lack of model fit when fitting the ASIRS model to the HSIRS dataset. This conclusion is further supported by Fig. 4, which plots \(\psi \left( \mathbf {Z}^{(k)}_{rep};\varvec{\lambda }^{(k)}\right) \) against \(\psi \left( \mathbf {Z};\varvec{\lambda }^{(k)}\right) \) for the case AH; there is no striking departure from the 45-degree line. Indeed, the same is true for the case AM, and we conclude that there is no evidence for lack of fit for the ASIRS model when fitted to either dataset. That is, the ASIRS model appears to have considerable flexibility.
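The diagnostic computation can be summarized in the following sketch: given posterior draws of the true-count trajectories, it forms the Wald-type discrepancy (64), draws replicate datasets from the Poisson data model, and evaluates (68). (Illustrative Python; the array shapes are assumptions.)

```python
import numpy as np

rng = np.random.default_rng(3)

def discrepancy(Z, lam):
    # Eq. (64): Z and lam are (T, 2) arrays of counts and true means
    # (lambda_S, lambda_I); the data-model covariance is diag(lam).
    return np.sum((Z - lam) ** 2 / lam)

def posterior_predictive_pvalue(Z, lam_samples):
    # Eq. (68): lam_samples has shape (m, T, 2), one trajectory per
    # MCMC draw; each replicate Z_rep is drawn from Poisson(lam).
    exceed = 0
    for lam in lam_samples:
        Z_rep = rng.poisson(lam)
        exceed += discrepancy(Z_rep, lam) >= discrepancy(Z, lam)
    return exceed / len(lam_samples)
```

A p value near 0 or 1 signals a discrepancy between the data and the replicates; values in the middle of (0, 1), such as the 0.458 reported above, indicate no evidence of lack of fit.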

Fig. 4
figure 4

Scatter plot of \(\psi \left( \mathbf {Z};\varvec{\lambda }^{(k)}\right) \) versus \(\psi \left( \mathbf {Z}^{(k)}_{rep};\varvec{\lambda }^{(k)}\right) \) for the case AH

4.3.2 Comparisons of the fitted ASIRS model and the fitted CSIRS model

We now fit the (discrete-time) CSIRS model to each of the two simulated datasets described in Sect. 4.1. In practice, estimates of \(\beta ,\,\gamma \), and \(\phi \), denoted as \(\hat{\beta },\,\hat{\gamma }\), and \(\hat{\phi }\), are needed in order to solve for \(\{S(t)\}\) and \(\{I(t)\}\) in Eqs. (6)–(8). Here, we minimize the sum of squares between the estimated and observed infectious counts over time (e.g., Anderson and May 1991; Wearing et al. 2005; Burr and Chowell 2006). That is,

$$\begin{aligned} (\hat{\beta },\hat{\gamma },\hat{\phi })= \mathrm{arg}\min _{(\beta ,\gamma ,\phi )}\left[ \sum _{t=1}^{T} \left( \hat{I}(t;\beta ,\gamma ,\phi )-Z_I(t)\right) ^2\right] , \end{aligned}$$
(69)

where \(\hat{I}(t; \beta ,\gamma ,\phi )\) is the deterministic estimate of infectious counts at time \(t\), obtained after substituting \(\beta ,\,\gamma \), and \(\phi \) into Eqs. (6)–(8). Recall that here \(T=35\); also notice that these least-squares estimates, \(\hat{\beta },\,\hat{\gamma }\), and \(\hat{\phi }\), are the values selected in Sect. 3 for the initial MCMC values (denoted \(\beta _0,\,\gamma _0\), and \(\phi _0\) there). Finally, the CSIRS forecasts are given by (6)–(8) for \(t=T+1,\ldots ,T+F;\) using obvious notation, they are \(\{\hat{S}(t; \hat{\beta }, \hat{\gamma }, \hat{\phi }):t=T+1,\ldots ,T+F\},\,\{\hat{I}(t; \hat{\beta }, \hat{\gamma }, \hat{\phi }):t=T+1,\ldots ,T+F\}\), and \(\{\hat{R}(t; \hat{\beta }, \hat{\gamma }, \hat{\phi }):t=T+1,\ldots ,T+F\}\), where \(T=35\) and \(F=10\). For \(t=1,\ldots ,T\), the observed counts are taken to be the CSIRS-predicted counts.
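The least-squares fit in (69) can be sketched as below, with the forward trajectory computed from the discrete-time CSIRS recursions. The optimizer choice (Nelder–Mead via SciPy) and the starting values are illustrative assumptions, not the authors’ exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

lambda_N, Delta, T = 1_068_978, 1.0, 35

def infectious_trajectory(beta, gamma, phi, P1=(0.94, 0.01, 0.05)):
    # Deterministic infectious counts I-hat(t; beta, gamma, phi) from
    # the discrete-time CSIRS recursions (cf. Eqs. (6)-(8)).
    S, I, R = P1
    out = np.empty(T)
    for t in range(T):
        out[t] = lambda_N * I
        S, I, R = (S - beta * lambda_N * S * I * Delta + phi * R * Delta,
                   I + beta * lambda_N * S * I * Delta - gamma * I * Delta,
                   R + gamma * I * Delta - phi * R * Delta)
    return out

def fit_csirs(Z_I, theta0=(5.0e-7, 0.3, 0.05)):
    # Minimize the sum of squares in Eq. (69) over (beta, gamma, phi).
    sse = lambda th: np.sum((infectious_trajectory(*th) - Z_I) ** 2)
    return minimize(sse, theta0, method="Nelder-Mead").x
```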

Now, for the HSIRS model (approximated by ASIRS), the appropriate predictions are obtained from posterior inference on \(\lambda _S(t)\), and \(\lambda _I(t)\), for the entire study period \(t=1,\ldots ,T,T+1,\ldots ,T+F\), but based only on data from \(t=1,\ldots ,T\). The Bayesian predictions are actually given for \(P_S(t)\) and \(P_I(t)\); then to obtain ASIRS-predicted counts, Eqs. (13)–(15) along with mass balance given by (17) are used.

In the infectious-disease setting, the number of infectious individuals is one of the most important quantities of interest. The estimated trajectories given in Fig. 5 show infectious counts for the entire 45-day study period, obtained by fitting the ASIRS model and the CSIRS model to each of the two datasets. These are compared to the true (hidden) values of infectious counts. Consider the modified-CSIRS dataset: upon inspection of Fig. 5b, we see that both models can capture the overall epidemic pattern very well, not only during the first 35 days when data are available, but also in the last 10 days when there are no data. Now consider the HSIRS dataset: when the underlying epidemic process has stochastic components, the disadvantage of the CSIRS model becomes apparent; upon inspection of Fig. 5a, estimates from fitting the CSIRS model are oversmoothed, even on the days when data are available. In contrast, the agreement between the true value and the posterior median of \(\{\lambda _I(t)\}\), obtained from fitting the ASIRS model, is excellent when data are available. On the last 10 days when there are no data, the hierarchical model is able to predict the general downward trend at the end of the epidemic process; however, the CSIRS model is unable to deal with the uncertainties, mistakenly forecasting that the epidemic is maintained in the last 10 days.

Fig. 5
figure 5

Estimated infectious counts obtained from fitting the ASIRS model and the CSIRS model to the HSIRS and modified-CSIRS datasets, as a function of time. The vertical line in each plot emphasizes that data are only available on days 1–35, and that they are missing on days 36–45. The blue solid line is the posterior median of \(\{\lambda _I(t)\}\) obtained from the ASIRS model. The black thin line is the estimate of the infectious counts from the CSIRS model. The pink stars give the true infectious counts. a Results based on HSIRS dataset (infectious), b results based on modified-CSIRS dataset (infectious). (Color figure online)

Another disadvantage of the CSIRS model is that it is unable to provide any uncertainty measures to accompany its deterministic-modeling strategy. In contrast, when fitting the hierarchical model, we can obtain uncertainty measures for any quantity of interest, based on its posterior distribution. For example, Fig. 6 shows the 0.025, 0.25, 0.5, 0.75, and 0.975 quantiles of the posterior distributions of the hidden infectious counts, \(\lambda _I(t)\), during the forecasting period \(\{T+1,\ldots ,T+F\}\). We can see that in all cases, the posterior 50 % CI of \(\lambda _I(t)\) obtained from fitting the ASIRS model covers the true values at all times.
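Such quantile bands are straightforward to compute from the MCMC output; an illustrative sketch follows (the array shape is an assumption):

```python
import numpy as np

def forecast_quantiles(lam_I_draws):
    # lam_I_draws: (m, F) posterior draws of lambda_I(t) over the
    # forecast days t = T+1, ..., T+F.  Rows of the result are the
    # 0.025, 0.25, 0.5, 0.75, and 0.975 quantile bands (cf. Fig. 6).
    return np.percentile(lam_I_draws, [2.5, 25, 50, 75, 97.5], axis=0)
```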

Fig. 6
figure 6

Posterior quantiles of \(\{\lambda _I(t)\}\) for the infectious population, obtained from the ASIRS model during the forecasting period from days 36 to 45 (based on data from day 1 to day 35, from the respective datasets). Lines (from bottom to top) indicate the 0.025, 0.25, 0.5, 0.75 and 0.975 quantiles, and pink stars give the true values of \(\{\lambda _I(t)\}\). a Case AH forecasting (infectious), b case AM forecasting (infectious). (Color figure online)

Now consider the susceptible counts. Figures 7 and 8 clearly indicate that the ASIRS model performs better than the CSIRS model when fitted to the HSIRS dataset.

Fig. 7
figure 7

Estimated susceptible counts obtained from fitting the ASIRS model and the CSIRS model to the HSIRS and modified-CSIRS datasets, as a function of time. The vertical line in each plot emphasizes that data are only available on days 1–35, and that they are missing on days 36–45. The blue solid line is the posterior median of \(\{\lambda _S(t)\}\) obtained from the ASIRS model. The black thin line is the estimate of the susceptible counts from the CSIRS model. The pink stars give the true susceptible counts. a Results based on HSIRS dataset (susceptible), b results based on modified-CSIRS dataset (susceptible). (Color figure online)

Fig. 8
figure 8

Posterior quantiles of \(\{\lambda _S(t)\}\) for the susceptible population, obtained from the ASIRS model during the forecasting period from day 36 to day 45 (based on data from day 1 to day 35, from the respective datasets). Lines (from bottom to top) indicate the 0.025, 0.25, 0.5, 0.75 and 0.975 quantiles, and pink stars give the true values of \(\{\lambda _S(t)\}\). a Case AH forecasting (susceptible), b case AM forecasting (susceptible). (Color figure online)

The analyses given in this section illustrate the advantages and the practicalities associated with a hierarchical statistical approach to inferring the dynamical evolution of an infectious disease. In the next section, we confirm those inferential advantages through a simulation experiment.

5 A simulation experiment

In this section, a simulation experiment is presented to compare the performances of the HSIRS model (actually, the approximate ASIRS model) and the CSIRS model using design-based criteria under various factor combinations (i.e., a factorial experimental design), assuming that data are from an HSIRS model (defined in Sect. 2). That is, the processes \(\{\mathbf {W}(t)\}\) and \(\{\mathbf {P}(t)\}\) are simulated according to (26) and (78)–(79), from which the data processes \(\{Z_S(t)\}\) and \(\{Z_I(t)\}\) are simulated from the Poisson distributions defined in (10)–(11), conditional on \(\{\mathbf {P}(t)\}\). As in Sect. 4.1.1, we set the true values of the parameters in (26) as \(\gamma ^*=0.33\) and \(\phi ^*=0.05\), and we simulate the baseline \(\mathbf {W}(1)\) from (62) with \({\varvec{\mu }}^{W*}(1)=(2.934,-1.609)'\). The baseline small-scale variation \({\varvec{\varSigma }}^*_{\xi }\) in (62), the values of \(T\) and \(F\), and the parameter \(\beta \) in (26) are chosen in ways that relate to factors in the experiment and will be described in Sect. 5.1. For each of the factor combinations, we simulate \(L=100\) replications.

5.1 Factors of the simulation experiment

Four factors are considered in this simulation experiment: fitted model (FM) is a factor that compares the HSIRS model and the CSIRS model; in the terminology of experimental design, this is considered the “treatment.” Three other factors related to the data-generating schemes are also included; these are data information (DI), small-scale variation (SV), and transmission rate (TR). The details of these four factors are now presented.

Fitted Model (FM)

The factor FM has two levels, where FM \(=\) 0 represents the CSIRS model and FM \(=\) 1 represents the HSIRS model. The HSIRS model is specified by (10), (11), and (26) in Sect. 2, and by (78), (79), and (80) in “Appendix A.1”. The CSIRS model is specified by (6)–(8) in Sect. 1.

Data Information (DI)

Two different levels of DI are considered. One is where there are data for 15 days (DI \(=\) 0), and the other for 35 days (DI \(=\) 1). In each situation, we perform forecasting for up to 10 days beyond the last day where there were data. The choice of 15 days (DI \(=\) 0) was based on CDC’s response to the 2009 H1N1 pandemic. According to “The 2009 H1N1 Pandemic: Summary Highlights, April 2009–April 2010” (CDC: http://www.cdc.gov/h1n1flu/cdcresponse.html), an H1N1 infection was first detected in the US in a 10-year-old patient in California on April 15, 2009; 15 days later, that is, on April 29, 2009, WHO raised the influenza pandemic alert from phase 4 to phase 5, signaling that a pandemic was imminent. We selected 35 days (DI \(=\) 1) because, in retrospect, the 2009 H1N1 pandemic took 35 days to develop from the time the first case occurred in Mexico (Fig. 9, from “Outbreak of Swine-Origin Influenza A (H1N1) Virus Infection—Mexico, March–April 2009”; CDC: http://www.cdc.gov/mmwR/preview/mmwrhtml/mm58d0430a2.html).

Fig. 9

A figure from “Outbreak of Swine-Origin Influenza A (H1N1) Virus Infection—Mexico, March–April 2009,” Centers for Disease Control and Prevention (CDC)

Small-scale Variation (SV)

Two different levels of SV are considered. Recall the definitions of signal-to-noise ratios, \(\mathrm{SNR}_{W_S}(t)\) and \(\mathrm{SNR}_{W_I}(t)\), in Eqs. (56) and (57) in Sect. 4.1.1. These are put equal to each other, and the two levels are SNR \(=\) 15 (SV \(=\) 0) and SNR \(=\) 5 (SV \(=\) 1). Consider the initial time \(t=1\). Then for SV \(=\) 0,

$$\begin{aligned} \mathrm{SNR}_{W_S}(1)= \mathrm{SNR}_{W_I}(1)=15; \end{aligned}$$

hence (60)–(61) give \(\sigma ^{2}_{\xi _S}(1)=0.038,\, \sigma ^{2}_{\xi _I}(1)=0.012\). For SV \(=\) 1,

$$\begin{aligned} \mathrm{SNR}_{W_S}(1)= \mathrm{SNR}_{W_I}(1)=5; \end{aligned}$$

then from (56)–(57), we have

$$\begin{aligned} \sigma ^{2}_{\xi _S}(1)&= \left( 2.934/5\right) ^2=0.344,\\ \sigma ^{2}_{\xi _I}(1)&= \left( -1.609/5\right) ^2=0.104. \end{aligned}$$

To control this factor in the simulation, we assume henceforth that \(\sigma ^{2}_{\xi _S}(t)\equiv \sigma ^{2}_{\xi _S}(1)\) and \(\sigma ^{2}_{\xi _I}(t)\equiv \sigma ^{2}_{\xi _I}(1)\), for \(t=1,2,\ldots \). Therefore, SV \(=\) 0 represents {\(\sigma ^{2}_{\xi _S}(t)=0.038,\,\sigma ^{2}_{\xi _I}(t)=0.012\), SNR \(=\) 15}, and SV \(=\) 1 represents {\(\sigma ^{2}_{\xi _S}(t)=0.344,\,\sigma ^{2}_{\xi _I}(t)=0.104\), SNR \(=\) 5}.
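The small-scale variances quoted above can be reproduced directly from the two SNR levels. The sketch below assumes \(\sigma_{\xi}(1) = |\mu^W(1)|/\mathrm{SNR}\), which is consistent with the quoted values; the formal definitions are Eqs. (56)–(57) of the paper and are not reproduced here.

```python
# Small-scale variances implied by the two SNR levels at t = 1, assuming
# sigma_xi(1) = |mu_W(1)| / SNR (consistent with the values quoted in the text).
mu_W1 = (2.934, -1.609)   # baseline mean vector mu_W(1)

def xi_variances(snr, mu=mu_W1):
    """Return (sigma2_xi_S(1), sigma2_xi_I(1)) for a common SNR."""
    return tuple((abs(m) / snr) ** 2 for m in mu)

for snr in (15, 5):
    s2_S, s2_I = xi_variances(snr)
    print(f"SNR = {snr}: sigma2_xi_S(1) = {s2_S:.3f}, sigma2_xi_I(1) = {s2_I:.3f}")
```

Running this recovers the pairs (0.038, 0.012) for SNR \(=\) 15 and (0.344, 0.104) for SNR \(=\) 5.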

Transmission Rate (TR)

Two levels of the transmission rate (TR) per unit time are considered: \(\beta =3.3958\times 10^{-7}\) (TR \(=\) 0) and \(\beta =5.8654\times 10^{-7}\) (TR \(=\) 1). These two levels of \(\beta \) result in the basic reproduction number \(R_0=1.1\) and \(1.9\), respectively, given by Eq. (54). According to Anderson (2006), \(R_0\) usually has a value between 1 and 2 for new strains of influenza in the human community. Therefore, by selecting these two levels of TR, we are aiming to mimic scenarios with small and large transmission rates, respectively, but still within the usual range.
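As a consistency check on these two levels, one can assume the standard SIR relation \(R_0 = \beta N/\gamma\) (the exact form of Eq. (54) is not reproduced in this excerpt) and back out the population size implied by the TR \(=\) 0 level:

```python
# Cross-check of the two transmission-rate levels against R0, assuming the
# standard relation R0 = beta * N / gamma (a stand-in for Eq. (54)).
gamma = 0.33

beta_0 = 3.3958e-7   # TR = 0, stated to give R0 = 1.1
beta_1 = 5.8654e-7   # TR = 1, stated to give R0 = 1.9

N_implied = 1.1 * gamma / beta_0          # population consistent with TR = 0
R0_check = beta_1 * N_implied / gamma     # should then be close to 1.9
print(N_implied, R0_check)
```

Under this assumed relation, the implied population (about \(1.07\times 10^6\)) makes the second level of \(\beta\) yield \(R_0\approx 1.9\), as stated.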

5.2 Results of the simulation experiment

In order to compare the approximate HSIRS model and the CSIRS model, we define a response variable based on the empirical mean squared prediction error (MSPE) in each simulation run: Let \(T^O\) denote the times at which data are observed, and let \(T^M\) denote the times at which data are missing; note that in our case, \(T^M\) is a 10-day forecast period that follows \(T^O\). Let \(P_X(t,l)\) denote the \(l\)-th simulated realization of the underlying true rate of the population in subgroup \(X\) [see Eqs. (13)–(15) for the definition of “true rate”]; in our case, \(X\) denotes either the susceptible (S) or the infectious (I) population. Let \(\hat{P}_X(t,l)\) denote a generic predictor of \(P_X(t,l)\). Note that \(P_X(t,l)\) contains the same information as \(\lambda _X(t,l)\), which denotes the \(l\)-th realization of the simulated true counts for the population in subgroup \(X\), since \(\lambda _X(t,l)=\lambda _NP_X(t,l)\) and \(\lambda _N\) is a known constant total population. Then we define

$$\begin{aligned} MSPE _X(T^*,l)&\equiv \frac{1}{|T^*|}\sum _{t\in T^*}\left( \hat{P}_X(t,l)-P_X(t,l)\right) ^2;\quad l\!=\!1,\ldots ,L, \end{aligned}$$
(70)

where \(T^*=T^O\) or \(T^M,\,|T^*|\) is the number of days in the time period \(T^*\), and \(L\) is the total number of simulation runs for each of the factor combinations in this study. The bias of the predictor can be studied through

$$\begin{aligned} BIAS_X(T^*,l)\equiv \frac{1}{|T^*|}\sum _{t\in T^*}\left( \hat{P}_X(t,l)-P_X(t,l)\right) ;\quad l=1,\ldots ,L. \end{aligned}$$
(71)
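The two criteria (70)–(71) are straightforward to compute per replicate; a minimal sketch:

```python
import numpy as np

def mspe(P_hat, P_true):
    """Empirical MSPE over a window T* for one replicate, as in (70)."""
    P_hat, P_true = np.asarray(P_hat, float), np.asarray(P_true, float)
    return np.mean((P_hat - P_true) ** 2)

def bias(P_hat, P_true):
    """Empirical bias over a window T* for one replicate, as in (71)."""
    P_hat, P_true = np.asarray(P_hat, float), np.asarray(P_true, float)
    return np.mean(P_hat - P_true)
```

Here `P_hat` and `P_true` hold the predicted and simulated true rates over the days in \(T^O\) or \(T^M\), for one subgroup \(X\) and one replicate \(l\).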

We performed an analysis of variance (ANOVA) to investigate which factors are important and under which scenarios the approximate HSIRS model provides substantial improvement over the CSIRS model. First, we consider MSPE. Because of the skewness and mean-variance dependence in \(\left\{ MSPE _X(T^*,l)\right\} \), we apply a fourth-root transformation to the response (Cressie et al. 2010). The boxplots in Figs. 10 and 11 show that the distribution of the differences of the fourth-root transformations (superscripts “C” and “H” denote, respectively, analysis under the CSIRS model and under the approximate HSIRS model):

$$\begin{aligned} \left\{ MSPE ^C_X(T^*,l)^{1/4}\!-\! MSPE ^H_X(T^*,l)^{1/4}:l\!=\!1,\ldots ,L;\, X\!=\!S,I; \,T^*\!=\!T^O, T^M \right\} , \end{aligned}$$

is suitable for a classical ANOVA. We then define

$$\begin{aligned} A^Y_X(T^*)\equiv \mathrm{ave}_l\left\{ MSPE _X(T^*,l)^{1/4}\right\} ; \,X=S,I; Y=C,H; \, T^*=T^O,T^M, \end{aligned}$$
(72)

where the average is taken over \(L=100\) replications for each of the factor combinations. We construct the “response” variable of the ANOVA study as \(A^C_X(T^*)-A^H_X(T^*)\), for \(X=S\) and \(I\). These paired comparisons will tell us for which factor combinations the approximate HSIRS model shows improvement over the CSIRS model. Four ANOVAs showing up to two-way interactions are reported in Tables 4, 5, 6 and 7; they are based on the response variables, \(A^C_S(T^O)-A^H_S(T^O),\,A^C_S(T^M)-A^H_S(T^M),\,A^C_I(T^O)-A^H_I(T^O)\), and \(A^C_I(T^M)-A^H_I(T^M)\), respectively.
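Constructing the ANOVA response (72) from the replicated MSPEs can be sketched as follows; a positive value indicates that the approximate HSIRS model outperforms the CSIRS model for that factor combination:

```python
import numpy as np

def anova_response(mspe_C, mspe_H):
    """Paired ANOVA response A^C - A^H of (72): average the fourth-root-
    transformed MSPEs over the L replicates under each model, then
    take the difference. A positive value favors the HSIRS model."""
    A_C = np.mean(np.asarray(mspe_C, float) ** 0.25)
    A_H = np.mean(np.asarray(mspe_H, float) ** 0.25)
    return A_C - A_H
```

In the experiment, `mspe_C` and `mspe_H` would be the length-\(L\) vectors of \(MSPE_X(T^*,l)\) obtained under the CSIRS and approximate HSIRS analyses, respectively, for a fixed \(X\) and \(T^*\).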

Fig. 10

Boxplots of differences, \(\left\{ MSPE ^C_S(T^*,l)^{1/4}- MSPE ^H_S(T^*,l)^{1/4}:l=1,\ldots ,L\right\} \), for the susceptible population, at given levels of DI (Data Information), TR (Transmission Rate), and SV (Small-scale Variation). Left panels \(T^*=T^O\); right panels \(T^*=T^M\). Upper panels DI \(=\) 0; lower panels DI \(=\) 1. The x axis labels are ‘ab’, where ‘a’ (=0 and 1) represents the level for the factor TR, and ‘b’ (=0 and 1) represents the level for the factor SV. The triangles show the mean of the differences

Fig. 11

Boxplots of differences, \(\left\{ MSPE ^C_I(T^*,l)^{1/4}- MSPE ^H_I(T^*,l)^{1/4}:l=1,\ldots ,L\right\} \), for the infectious population, at given levels of DI (Data Information), TR (Transmission Rate), and SV (Small-scale Variation). Left panels \(T^*=T^O\); right panels \(T^*=T^M\). Upper panels DI \(=\) 0; lower panels DI \(=\) 1. The x axis labels are ‘ab’, where ‘a’ (=0 and 1) represents the level for the factor TR, and ‘b’ (=0 and 1) represents the level for the factor SV. The triangles show the mean of the differences

Table 4 Analysis of variance (ANOVA) on \(A^C_S(T^O)-A^H_S(T^O)\) up to two-way interactions
Table 5 Analysis of variance (ANOVA) on \(A^C_I(T^O)-A^H_I(T^O)\) up to two-way interactions
Table 6 Analysis of variance (ANOVA) on \(A^C_S(T^M)-A^H_S(T^M)\) up to two-way interactions
Table 7 Analysis of variance (ANOVA) on \(A^C_I(T^M)-A^H_I(T^M)\) up to two-way interactions

Table 4 gives the paired-comparison ANOVA for the susceptible population, for \(T^*=T^O\). It indicates that the main effect of the small-scale variation factor, SV, explains over 65 % of the variability in the response. We also find that the difference, \(A^C_S(T^O)-A^H_S(T^O)\), increases by 83 % when SV \(=\) 1, relative to its value when SV \(=\) 0. The remaining variability is mostly explained by the main effects of DI and TR. Figure 12 supports the results shown in Table 4, where the red and blue lines show that the SVTR, SVDI, and TRDI interactions for \(A^C_S(T^O)-A^H_S(T^O)\) are not obvious. Moreover, from Fig. 12, we see that for all factor combinations, the approximate HSIRS model always outperforms the CSIRS model, for \(T^*=T^O\), since the responses are always greater than zero. Notice that they increase as the data record becomes longer (DI \(=\) 1), as the SNR becomes smaller (SV \(=\) 1), and as the transmission rate becomes larger (TR \(=\) 1), which are all reasonable results that match our intuition. Clearly, as more variability is introduced into the pandemic and its observed counts, the relative inability of the CSIRS model to handle it becomes more obvious.

Fig. 12

Plots showing interaction for the susceptible population, between SV (Small-scale Variation) and TR (Transmission Rate), between SV and DI (Data Information), and between TR and DI in the ANOVA of \(A^C_S(T^*)-A^H_S(T^*)\) (red and blue lines \(T^*=T^O\); green and black lines \(T^*=T^M\)). (Color figure online)

Table 5 gives the paired-comparison ANOVA for the infectious population, again for \(T^*=T^O\). It indicates that SV and TR are the two most important factors; their main effects together explain over 89 % of the variability in the response. By investigating the main effects of SV and TR, we find that the difference, \(A^C_I(T^O)-A^H_I(T^O)\), increases by 85 % when SV \(=\) 1, relative to its value at SV \(=\) 0; and it increases by 81 % when TR \(=\) 1, relative to its value at TR \(=\) 0. The pattern is broadly consistent with that for the susceptible population, and the responses increase when the data record becomes longer (DI \(=\) 1), as we expect. These results are supported further by Fig. 13, where the red and blue lines show the SVTR, SVDI, and TRDI interactions for \(A^C_I(T^O)-A^H_I(T^O)\); it again indicates no evident interactions. Importantly, Fig. 13 shows that for \(T^*=T^O\) and all factor combinations, the approximate HSIRS model outperforms the CSIRS model.

Fig. 13

Plots showing interaction for the infectious population between SV (Small-scale Variation) and TR (Transmission rate), between SV and DI (Data Information), and between TR and DI in the ANOVA of \(A^C_I(T^*)-A^H_I(T^*)\) (red and blue lines: \(T^*=T^O\); green and black lines: \(T^*=T^M\)). (Color figure online)

Forecasting in the period \(T^M\) is of more interest, so we turn our attention to the susceptible population, Table 6, and the ANOVA of \(A^C_S(T^M)-A^H_S(T^M)\). It indicates that DI is the dominant factor, explaining over 60 % of the variability in the response. We also find that the difference, \(A^C_S(T^M)-A^H_S(T^M)\), increases by 145 % when DI \(=\) 1, relative to its value when DI \(=\) 0. The remaining variability is mostly explained by the main effect of SV and its interaction with TR. The green and black lines in Fig. 12 show the SVTR, SVDI, and TRDI interactions for \(A^C_S(T^M)-A^H_S(T^M)\). They support the results shown in Table 6, namely that the SVTR interaction is more evident than the other two interactions. We see that when TR \(=\) 0, the value of \(A^C_S(T^M)-A^H_S(T^M)\) increases substantially when the SNR becomes smaller (SV \(=\) 1), in contrast to very little change between the two levels of SV when TR \(=\) 1. From Fig. 12, the values of \(A^C_S(T^M)-A^H_S(T^M)\) are always greater than zero, and they increase as the SNR becomes smaller (SV \(=\) 1) or as the data record becomes longer (DI \(=\) 1), which matches our intuition. This again indicates that for the susceptible population, the approximate HSIRS model always outperforms the CSIRS model holding all other factors the same, even though the difference is not as pronounced as during the period \(T^O\). Since forecasting is inherently difficult, this is not surprising.

Table 7 gives the paired-comparison ANOVA for the infectious population in the period \(T^M\): the SVTR interaction has the largest impact on the response and explains about 50 % of the variability. The main effects of TR, SV, and the SVDI interaction explain most of the remaining variation. These results are supported further by Fig. 13, where the green and black lines show the SVTR, SVDI, and TRDI interactions for \(A^C_I(T^M)-A^H_I(T^M)\). We see that the SVTR interaction is similar to that for the susceptible population, but it is more evident, because when TR \(=\) 1, the value of \(A^C_I(T^M)-A^H_I(T^M)\) decreases as the SNR becomes smaller (SV \(=\) 1). A similar pattern is seen for the SVDI interaction; however, the TRDI interaction is not as evident as the other two interactions. Notice that at both levels of DI, the values of \(A^C_I(T^M)-A^H_I(T^M)\) are larger when TR is smaller (TR \(=\) 0). Importantly, Fig. 13 shows that for \(T^*=T^M\) and all factor combinations, the values of \(A^C_I(T^M)-A^H_I(T^M)\) are always greater than zero; that is, the approximate HSIRS model always outperforms the CSIRS model.

To investigate further the performances of the approximate HSIRS model and the CSIRS model, we turn our attention to bias [as specified in (71)] and define the fraction of MSPE that can be explained by the squared bias as

$$\begin{aligned} \kappa ^Y_X(T^*,l)\!\equiv \! \frac{BIAS^Y_X(T^*,l) ^2}{ MSPE ^Y_X(T^*,l)}; \,X\!=\!S,I;\quad T^*\!=\!T^O,T^M;\,\, Y\!=\!H,C; \,\, l\!=\!1,\ldots ,L. \end{aligned}$$
(73)

Notice that \(0\le \kappa ^Y_X(T^*,l)\le 1\). The boxplots in Figs. 14 and 15 show the distributions of \(\{\kappa ^H_S(T^*,l)\}\) and \(\{\kappa ^H_I(T^*,l)\}\), respectively, for all factor combinations. For both the susceptible and infectious populations, Figs. 14 and 15 show that the fractions of MSPE explained by the squared bias are much smaller (around 0.1 or less) in the period \(T^O\) than those in the period \(T^M\) (around 0.5), for all factor combinations. Again because forecasting is inherently difficult, this is not surprising. By comparing Figs. 14 and 15 with Figs. 10 and 11, we see that those factor combinations where squared bias is relatively small are the same combinations where the approximate HSIRS model yields much better MSPE than that for the CSIRS model. In particular, Figs. 14 and 15 clearly show that in the period \(T^O\), the responses decrease (or do not increase) as the data record becomes longer (DI \(=\) 1), as the SNR becomes smaller (SV \(=\) 1), and as the transmission rate becomes larger (TR \(=\) 1), which supports our conclusion that the approximate HSIRS model has better predictive capability even when more variability is introduced into the pandemic through its observed counts. Figures 14 and 15 indicate strong interactions among DI, SV, and TR in the period \(T^M\), which agrees with the conclusions drawn from Figs. 12 and 13.
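Computing (73) per replicate is immediate; note that \(\kappa\) is guaranteed to lie in \([0,1]\) because the mean of the squared errors is at least the square of the mean error:

```python
def kappa(bias_val, mspe_val):
    """Fraction of MSPE explained by squared bias, as in (73).
    Lies in [0, 1]: the mean of squared errors is at least the
    square of the mean error (Jensen's inequality)."""
    return bias_val ** 2 / mspe_val
```

Here `bias_val` and `mspe_val` are the outputs of (71) and (70) for the same subgroup \(X\), window \(T^*\), model \(Y\), and replicate \(l\).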

Fig. 14

Boxplots of fractions of MSPE explained by squared bias, \(\left\{ \kappa ^H_S(T^*,l):l=1,\ldots ,L\right\} \), for the susceptible population, at given levels of DI (Data Information), TR (Transmission Rate), and SV (Small-scale Variation). Left panels \(T^*=T^O\); right panels \(T^*=T^M\). Upper panels DI \(=\) 0; lower panels DI \(=\) 1. The x axis labels are ‘ab’, where ‘a’ (=0 and 1) represents the level for the factor TR, and ‘b’ (=0 and 1) represents the level for the factor SV. The triangles show the mean of the fraction

Fig. 15

Boxplots of fractions of MSPE explained by squared bias, \(\left\{ \kappa ^H_I(T^*,l):l=1,\ldots ,L\right\} \), for the infectious population, at given levels of DI (Data Information), TR (Transmission Rate), and SV (Small-scale Variation). Left panels \(T^*=T^O\); right panels \(T^*=T^M\). Upper panels DI \(=\) 0; lower panels DI \(=\) 1. The x axis labels are ‘ab’, where ‘a’ (=0 and 1) represents the level for the factor TR, and ‘b’ (=0 and 1) represents the level for the factor SV. The triangles show the mean of the fractions

6 Discussion and conclusions

In this article, we develop a Bayesian hierarchical SIRS (HSIRS) model that captures the various sources of uncertainties in modeling infectious diseases such as seasonal or pandemic influenza. Important features of our HSIRS model are that it preserves mass balance on the (hidden) true counts rather than on the observed counts, and that the dynamical process is modeled on a log-odds-ratio scale. Furthermore, our approach captures the stochastic and discrete nature of the epidemic process, as well as keeping the SIRS flow that underlies the CSIRS model.

In Sect. 4, we simulated two datasets, an HSIRS dataset and a modified-CSIRS dataset, where we assumed that data were available on the first 35 days and missing on the following 10 days. Then we used an MCMC algorithm to fit the HSIRS model to each of the two datasets; for computational efficiency, a well calibrated linear approximation was used. We saw that the approximate HSIRS (ASIRS) model was a very good approximation, and that accounting for all known uncertainties led to a superior performance over the deterministic classic SIRS (CSIRS) model.

In Sect. 5, a carefully designed simulation experiment, with sufficient replication at various levels of four factors, was presented. It allows us to conclude that the approximate HSIRS model offers an accurate and computationally efficient approach to analyzing infectious-disease data. The comparisons given there clearly show that the HSIRS model is better than the CSIRS model, according to both the MSPE and bias criteria, during both the observation period and the forecast period, for all the conditions tested.

In this research, the HSIRS model is based on assuming a constant population for a short period of time; however, the model is not restricted to this assumption and can be extended to include population turnover that handles long-term influenza dynamics. For example, Eqs. (21)–(23) in Sect. 2.2 could instead be derived from an SIRS flow with birth and death rates (see the SIR flow incorporating birth and death rates given in Anderson and May 1991). This extension is useful because recovery from seasonal influenza may give immunity to that specific pathogen, but this immunity can be lost over the years due to the evolution of the virus (e.g., Dushoff et al. 2004). In ongoing research, we are investigating more complicated epidemic dynamics that incorporate not only birth and death but also emigration/immigration processes for appropriate time periods. Also, spatial (e.g., Hooten and Wikle 2010; Oleson and Wikle 2013) and multivariate (e.g., Zhuang et al. 2013) aspects could be incorporated into these hierarchical dynamical models through vector-valued processes, although the form of such models would require careful integration of the aforementioned emigration/immigration processes.