Automated Importance Sampling via Optimal Control for Stochastic Reaction Networks: A Markovian Projection–based Approach

Chiheb Ben Hammouda Utrecht University, Mathematical Institute, 3584 CD Utrecht, The Netherlands. Nadhir Ben Rached University of Leeds, School of Mathematics, Woodhouse, Leeds LS2 9JT, UK. Raúl Tempone King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Thuwal 23955-6900, Saudi Arabia. RWTH Aachen University, Alexander von Humboldt Professor in Mathematics for Uncertainty Quantification, 52062 Aachen, Germany. Sophia Wiechert Corresponding author: wiechert@uq.rwth-aachen.de RWTH Aachen University, Chair of Mathematics for Uncertainty Quantification, Pontdriesch 14-16, 52062 Aachen, Germany.

Abstract

We propose a novel alternative approach to our previous work (Ben Hammouda et al., 2023) to improve the efficiency of Monte Carlo (MC) estimators for rare event probabilities for stochastic reaction networks (SRNs). In the same spirit as (Ben Hammouda et al., 2023), an efficient path-dependent measure change is derived based on a connection between determining optimal importance sampling (IS) parameters within a class of probability measures and a stochastic optimal control formulation, corresponding to solving a variance minimization problem. In this work, we address the encountered curse of dimensionality by mapping the problem to a significantly lower-dimensional space via a Markovian projection (MP) idea. The output of this model reduction technique is a low-dimensional SRN (potentially even one-dimensional) that preserves the marginal distribution of the original high-dimensional SRN system. The dynamics of the projected process are obtained by solving a related optimization problem via a discrete $L^{2}$ regression. By solving the resulting projected Hamilton–Jacobi–Bellman (HJB) equations for the reduced-dimensional SRN, we obtain projected IS parameters, which are then mapped back to the original full-dimensional SRN system, resulting in an efficient IS-MC estimator for rare event probabilities of the full-dimensional SRN. Our analysis and numerical experiments reveal that the proposed MP-HJB-IS approach substantially reduces the MC estimator variance, resulting in a lower computational complexity in the rare event regime than standard MC estimators.

Keywords: stochastic reaction networks, tau-leap, importance sampling, stochastic optimal control, Markovian projection, rare event.

1 Introduction

This paper proposes an efficient estimator for rare event probabilities for a particular class of continuous-time Markov processes, stochastic reaction networks (SRNs). We design an automated importance sampling (IS) approach based on the approximate explicit tau-leap (TL) scheme to build a Monte Carlo (MC) estimator for rare event probabilities of SRNs. The IS change of measure we use was introduced in [11], wherein the optimal IS controls were determined via a stochastic optimal control (SOC) formulation. In that same work, we also presented a learning-based approach to avoid the curse of dimensionality. Building on that work, we propose an alternative method for high-dimensional SRNs that leverages dimension reduction through Markovian projection (MP) and then recovers the optimal IS controls of the full-dimensional SRNs as a mapping from the solution in a lower-dimensional space, potentially one-dimensional. To the best of our knowledge, we are the first to establish the MP framework for the SRN setting to solve an IS problem.

An SRN (refer to Section 1.1 for a brief introduction and [9] for more details) describes the time evolution of a set of species through reactions and can be found in a wide range of applications, such as biochemical reactions, epidemic processes [15, 5], and transcription and translation in genomics and virus kinetics [48, 32]. For a $d$ -dimensional SRN, $\mathbf{X}:[0,T]\rightarrow\mathbb{N}^{d}$ , with the given final time $T>0$ , we aim to determine accurate and computationally efficient MC estimations for the expected value $\mathbb{E}[g(\mathbf{X}(T))]$ . The observable $g:\mathbb{N}^{d}\to\mathbb{R}$ is a given scalar function of $\mathbf{X}$ , where indicator functions $g(\mathbf{x})=\mathbbm{1}_{\{\mathbf{x}\in\mathcal{B}\}}$ are of interest to estimate the rare event probability $\mathbb{P}(\mathbf{X}(T)\in\mathcal{B})\ll 1$ , where $\mathcal{B}\subset\mathbb{R}^{d}$ .

The quantity of interest, $\mathbb{E}[g(\mathbf{X}(T))]$ , is the solution to the corresponding Kolmogorov backward equations [8]. Solving these ordinary differential equations (ODEs) in closed form is infeasible for most SRNs; thus, numerical approximations based on discretized schemes are used to derive solutions. A drawback of these approaches is that, without using dimension reduction techniques, the computational cost scales exponentially with the number of species $d$ . To avoid the curse of dimensionality, we propose estimating $\mathbb{E}[g(\mathbf{X}(T))]$ using MC methods.

Numerous schemes have been developed to simulate the exact sample paths of SRNs. These include the stochastic simulation algorithm introduced by Gillespie in [26] and the modified next reaction method proposed by Anderson in [3]. However, when SRNs involve reaction channels with high reaction rates, simulating exact realizations of the system can be computationally expensive. To address this issue, Gillespie [27] and Aparicio and Solari [6] independently proposed the explicit-TL method (see Section 1.2), which approximates the paths of $\mathbf{X}$ by evolving the process with fixed time steps while maintaining constant reaction rates within each time step. Additionally, other simulation schemes have been proposed to handle situations with well-separated fast and slow time scales [16, 45, 1, 2, 42, 12].

In order to compute MC estimates of $\mathbb{E}[g(\mathbf{X}(T))]$ more efficiently, different variance reduction techniques have been proposed in the context of SRNs. In the spirit of the multilevel MC (MLMC) idea [22, 23], various MLMC-based methods [4, 38, 42, 12, 10] have been introduced to overcome different challenges in this context. Moreover, as the naive MC and MLMC estimators have high computational costs when used for estimating rare event probabilities, different IS approaches [36, 25, 47, 19, 17, 24, 46, 11] have been proposed.

To estimate various statistical quantities efficiently for SRNs (specifically rare event probabilities), we use the path-dependent IS approach originally introduced in [11]. This class of probability measure change is based on modifying the rates of the Poisson random variables used to construct the TL paths. In [11], it is shown how optimal IS controls are obtained by minimizing the second moment of the IS estimator (equivalently, the variance), representing the cost function of the associated SOC problem, and that the corresponding value function solves a dynamic programming relation (see Section 2.1 for a review of these results). In this work, we generalize the discrete-time dynamic programming relation to a set of continuous-time ODEs, the Hamilton–Jacobi–Bellman (HJB) equations, allowing the formulation of optimal IS controls in continuous time. Compared to the discrete-time IS control formulation presented in [11], the continuous-time formulation offers the advantage that it provides a curve of IS controls over time instead of a discrete set. This allows its application for any time stepping in the IS-TL paths and thereby eliminates the need for ad-hoc interpolations often needed in the discrete setting.

In the multidimensional setting, the cost of solving the backward HJB equations increases exponentially with respect to the dimension $d$ (curse of dimensionality). In [11], we proposed a learning-based approach to reduce this effect. In that approach, the value function is approximated using an ansatz function, the parameters of which are learned through a stochastic optimization algorithm (see Figure 1.1 for a schematic illustration of the approach). In this work, we present an alternative method using a dimension reduction approach for SRNs (see Figure 1.2 for a schematic illustration of the approach). The proposed methodology is to adapt the MP idea originally introduced in [29] for the setting of diffusion-type stochastic differential equations (SDEs) to the SRN framework, resulting in a significantly lower-dimensional process, preserving the marginal distribution of the original full-dimensional SRN. The propensities characterizing the lower-dimensional MP process can be approximated using $L^{2}$ regression. Using the resulting low-dimensional SRN, we derive an approximate value function and, consequently, near-optimal IS controls while reducing the effect of the curse of dimensionality. By mapping the IS controls to the original full-dimensional SRNs, we derive an unbiased IS-MC estimator for the TL scheme. Compared to the learning-based approach presented in [11], this novel MP-IS approach eliminates the need for an ansatz function to model the value function. This approach allows its application to general observables $g$ that differ from indicator functions for rare event estimation, because no prior knowledge regarding the shape of the value function and suitable ansatz functions is required.

To the best of our knowledge, we are the first to establish the MP idea for SRNs and apply it to derive an efficient pathwise IS for MC methods. Initially, the MP idea was introduced for Itô stochastic processes in [34, 29] and was later generalized to martingales and semimartingales [35, 14]. In addition, MP has been widely applied for dimension reduction in SDEs [37], particularly in financial applications [43, 20]. For instance, in [7], solving HJB equations for an MP process was pursued but in the setting of Itô SDEs with the application of pricing American options. In [31], MP was used for control problems and IS problems for rare events in high-dimensional diffusion processes with multiple time scales. In this work, we introduce the general dimension reduction framework of MP for SRNs such that it can be applied to other problems beyond the selected IS application (e.g., solving the chemical master equation [40] or the Kolmogorov backward equations [41]).

The remainder of this work is organized as follows. Sections 1.1, 1.2, 1.3, and 1.4 recall the relevant SRN, TL, MC, and IS notations and definitions from [11]. Next, Section 2 reviews the connection between IS and SOC by introducing the IS scheme, the value function, and the corresponding dynamic programming theorem from [11] in Section 2.1. Then, Section 2.2 extends the framework to a continuous-time formulation leading to the continuous-time value function and deriving the corresponding HJB equations. Section 3 presents the MP technique for SRNs and shows how the projected dynamics can be computed using $L^{2}$ regression. Next, Section 4 addresses the curse of dimensionality of high-dimensional SRNs arising from the optimal IS scheme in Section 2 by combining the IS scheme with MP (Section 3) to derive near-optimal IS controls. Finally, Section 5 presents the numerical results for the rare event probability estimation to demonstrate the efficiency of the proposed MP-IS approach compared to a standard TL-MC estimator.

Figure 1.1: Schematic diagram of the learning-based approach in [11].

1.1 Stochastic Reaction Networks (SRNs)

We recall from [11] that an SRN describes the time evolution for a homogeneously mixed chemical reaction system, in which $d$ distinct species interact through $J$ reaction channels. Each reaction channel $\mathcal{R}_{j}$ , $j=1,\dots,J$ , is given by the relation

\alpha_{j,1}S_{1}+\dots+\alpha_{j,d}S_{d}\overset{\theta_{j}}{\rightarrow}\beta_{j,1}S_{1}+\dots+\beta_{j,d}S_{d},

(1.1)

where $\alpha_{j,i}$ molecules of species $S_{i}$ are consumed and $\beta_{j,i}$ molecules are produced. The positive constants $\{\theta_{j}\}_{j=1}^{J}$ represent the reaction rates.

This process can be modeled by a Markovian pure jump process, $\mathbf{X}:[0,T]\times\Omega\to\mathbb{N}^{d}$ , where $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space. We are interested in the time evolution of the state vector,

\mathbf{X}(t)=\left(X_{1}(t),\ldots,X_{d}(t)\right)\in\mathbb{N}^{d},

(1.2)

where the $i$ -th component, $X_{i}(t)$ , describes the abundance of the $i$ -th species present in the system at time $t$ . The process $\mathbf{X}$ is a continuous-time, discrete-space Markov process characterized by Kurtz’s random time change representation [21]:

\mathbf{X}(t)=\mathbf{x}_{0}+\sum_{j=1}^{J}Y_{j}\left(\int_{0}^{t}a_{j}(\mathbf{X}(s))\,ds\right)\boldsymbol{\nu}_{j},

(1.3)

where $Y_{j}:\mathbb{R}_{+}{\times}\Omega\to\mathbb{N}$ are independent unit-rate Poisson processes and the stoichiometric vector is defined as $\boldsymbol{\nu}_{j}=\left(\beta_{j,1}-\alpha_{j,1},\dots,\beta_{j,d}-\alpha_{j,d}\right)\in\mathbb{Z}^{d}$ .

For each reaction channel $\mathcal{R}_{j}$ , the propensity function $a_{j}:\mathbb{N}^{d}\rightarrow\mathbb{R}_{+}$ obeys the non-negativity assumption (i.e., the system cannot attain negative states), which is that $a_{j}(\mathbf{x})=0$ for $\mathbf{x}$ such that $\mathbf{x}+\boldsymbol{\nu}_{j}\notin\mathbb{N}^{d}$ . In our numerical simulations, we consider a propensity derived from the stochastic mass-action kinetic principle

a_{j}(\mathbf{x}):=\theta_{j}\prod_{i=1}^{d}\frac{x_{i}!}{(x_{i}-\alpha_{j,i})!}\mathbf{1}_{\{x_{i}\geq\alpha_{j,i}\}},

(1.4)

where $x_{i}$ is the counting number for species $S_{i}$ . However, the approach presented in this work is not restricted to the particular structure of the propensity function in (1.4) (see Remark 3.4).
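As a minimal sketch of how the mass-action propensity (1.4) can be evaluated, the following Python function uses the falling factorial $x_{i}!/(x_{i}-\alpha_{j,i})!$ . The function name and the example reaction are hypothetical and chosen only for illustration.

```python
import math

def mass_action_propensity(x, alpha_j, theta_j):
    """Stochastic mass-action propensity a_j(x) as in (1.4).

    x:       tuple of species counts (x_1, ..., x_d)
    alpha_j: tuple of consumed molecules (alpha_{j,1}, ..., alpha_{j,d})
    theta_j: positive reaction rate constant
    """
    a = theta_j
    for x_i, alpha_ji in zip(x, alpha_j):
        if x_i < alpha_ji:
            return 0.0  # the indicator 1_{x_i >= alpha_{j,i}} vanishes
        # math.perm(n, k) = n! / (n - k)!, the falling factorial
        a *= math.perm(x_i, alpha_ji)
    return float(a)

# Hypothetical reaction 2 S_1 + S_2 -> S_3 with theta = 0.5 in state (4, 3, 0):
# 0.5 * (4 * 3) * 3 = 18.0
print(mass_action_propensity((4, 3, 0), (2, 1, 0), 0.5))
```

Returning zero whenever a species count is below its consumption coefficient enforces the non-negativity assumption stated above.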

1.2 Explicit Tau-Leap Approximation

The explicit-TL scheme is a pathwise approximate method based on Kurtz’s random time change representation (1.3) [27, 6]. It was originally introduced to overcome the computational drawbacks of exact methods, which become computationally expensive when many reactions fire during a short time interval. For a uniform time mesh $\{t_{0}=0,t_{1},...,t_{N}=T\}$ with step size $\Delta t=\frac{T}{N}$ and a given initial value $\mathbf{X}(0)=\mathbf{x}_{0}$ , the explicit-TL approximation for $\mathbf{X}$ is defined by

\displaystyle\widehat{\mathbf{X}}^{\Delta t}_{0}:=\mathbf{x}_{0},
\displaystyle\widehat{\mathbf{X}}^{\Delta t}_{k}:=\max\left(\mathbf{0},\widehat{\mathbf{X}}^{\Delta t}_{k-1}+\sum_{j=1}^{J}\mathcal{P}_{k-1,j}\left(a_{j}(\widehat{\mathbf{X}}^{\Delta t}_{k-1})\Delta t\right)\boldsymbol{\nu}_{j}\right),\quad 1\leq k\leq N,

(1.5)

where $\{\mathcal{P}_{k,j}(r_{k,j})\}_{\{1\leq j\leq J\}}$ are independent Poisson random variables with respective rates $r_{k,j}:=a_{j}(\widehat{\mathbf{X}}^{\Delta t}_{k})\Delta t$ conditioned on the current state $\widehat{\mathbf{X}}^{\Delta t}_{k}$ . The maximum in (1.5) is applied entry-wise. In each TL step, any negative component of the updated state is set to zero to prevent the process from exiting the lattice (i.e., producing negative values).
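The explicit-TL recursion (1.5) can be sketched in a few lines of Python with NumPy. This is an illustrative implementation under stated assumptions, not the code used for the experiments: the function name, the argument layout (stoichiometric vectors as columns of `nu`), and the one-species decay example are all hypothetical.

```python
import numpy as np

def tau_leap_path(x0, nu, propensities, T, N, rng):
    """Explicit tau-leap path (1.5) on a uniform mesh with N steps.

    nu:           (d, J) integer array whose columns are the vectors nu_j
    propensities: callable mapping a state of shape (d,) to a(x) of shape (J,)
    """
    dt = T / N
    x = np.array(x0, dtype=np.int64)
    path = [x.copy()]
    for _ in range(N):
        rates = propensities(x) * dt      # Poisson rates a_j(x) * dt
        p = rng.poisson(rates)            # independent Poisson increments
        x = np.maximum(0, x + nu @ p)     # entry-wise maximum with zero
        path.append(x.copy())
    return np.array(path)

# Hypothetical one-species decay network S -> 0 with rate constant 1
nu = np.array([[-1]])
a = lambda x: np.array([1.0 * x[0]])
path = tau_leap_path([100], nu, a, T=1.0, N=16, rng=np.random.default_rng(0))
print(path[:, 0])
```

The propensities are frozen at the state from the start of each step, which is what distinguishes the TL approximation from an exact path simulation.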

1.3 Biased Monte Carlo estimator

We let $\mathbf{X}$ be an SRN and $g:\mathbb{R}^{d}\rightarrow\mathbb{R}$ be a scalar observable. For a given final time $T$ , we estimate $\mathbb{E}\left[g(\mathbf{X}(T))\right]$ using the standard MC-TL estimator:

\mu_{M}:=\frac{1}{M}\sum_{m=1}^{M}g(\widehat{\mathbf{X}}^{\Delta t}_{[m]}(T)),

(1.6)

where $\{\widehat{\mathbf{X}}^{\Delta t}_{[m]}(T)\}_{m=1}^{M}$ are independent TL samples.

The global error for the proposed MC estimator has the following error decomposition:

\displaystyle\left|\mathbb{E}[g(\mathbf{X}(T))]-\mu_{M}\right|\leq\underbrace{\left|\mathbb{E}[g(\mathbf{X}(T))]-\mathbb{E}[g(\widehat{\mathbf{X}}^{\Delta t}(T))]\right|}_{\text{Bias}}+\underbrace{\left|\mathbb{E}[g(\widehat{\mathbf{X}}^{\Delta t}(T))]-\mu_{M}\right|}_{\text{Statistical Error}}.

(1.7)

Under some assumptions, the TL scheme has a weak order, ${\mathcal{O}}\left(\Delta t\right)$ [39], that is, for sufficiently small $\Delta t$ ,

\displaystyle\left|\mathbb{E}\left[g(\mathbf{X}(T))-g(\widehat{\mathbf{X}}^{\Delta t}(T))\right]\right|\leq C\Delta t

(1.8)

where $C>0$ .

To achieve the desired accuracy, TOL, with a confidence level of $1-\alpha$ for $\alpha\in(0,1)$ , we bound the bias and the statistical error by $\frac{\text{TOL}}{2}$ each, which is achieved by the step size:

\displaystyle\Delta t(\text{TOL})=\frac{\text{TOL}}{2\cdot C}

(1.9)

and

\displaystyle M^{*}(\text{TOL})=C_{\alpha}^{2}\frac{4\cdot\text{Var}[g(\widehat{\mathbf{X}}^{\Delta t}(T))]}{\text{TOL}^{2}}

(1.10)

sample paths, where the constant $C_{\alpha}$ is the $(1-\frac{\alpha}{2})$ -quantile of the standard normal distribution. We select $C_{\alpha}=1.96$ for a $95\%$ confidence level corresponding to $\alpha=0.05$ .

When estimating rare event probabilities, we are interested in the relative error

\displaystyle\frac{\left|\mathbb{E}[g(\mathbf{X}(T))]-\mu_{M}\right|}{\left|\mathbb{E}[g(\mathbf{X}(T))]\right|}.

In this context, to achieve a prescribed relative tolerance $\text{TOL}_{rel}$ , we use step size

\displaystyle\Delta t_{rel}(\text{TOL}_{rel})=\frac{\text{TOL}_{rel}\left|\mathbb{E}[g(\mathbf{X}(T))]\right|}{2\cdot C}

(1.11)

and

\displaystyle M^{*}_{rel}(\text{TOL}_{rel})=C_{\alpha}^{2}\frac{4\cdot\text{Var}[g(\widehat{\mathbf{X}}^{\Delta t}(T))]}{\text{TOL}_{rel}^{2}\left|\mathbb{E}[g(\mathbf{X}(T))]\right|^{2}}

(1.12)

sample paths.

Given that the computational cost to simulate a single path is ${\mathcal{O}}\left({\Delta t}^{-1}\right)$ , the expected total computational complexity is ${\mathcal{O}}\left(\text{TOL}^{-3}\right)$ and ${\mathcal{O}}\left(\text{TOL}_{rel}^{-3}\right)$ for the absolute and relative errors, respectively.
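To make the effect of the relative tolerance concrete, the following sketch evaluates the step size and sample size formulas (1.9) to (1.12). The bias constant $C$ is unknown in practice, so the value used here is a placeholder assumption, as are the function names and the hypothetical probability $10^{-5}$ ; the variance of the indicator is taken as $q(1-q)$ for a probability $q$ .

```python
import math
from statistics import NormalDist

def tl_mc_parameters(tol, bias_const, variance, alpha=0.05):
    """Step size (1.9) and number of sample paths (1.10) for tolerance tol."""
    c_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # (1 - alpha/2)-quantile
    dt = tol / (2.0 * bias_const)
    n_samples = math.ceil(c_alpha**2 * 4.0 * variance / tol**2)
    return dt, n_samples

def tl_mc_parameters_rel(tol_rel, bias_const, variance, q, alpha=0.05):
    """Relative-error versions (1.11) and (1.12): tol = tol_rel * |q|."""
    return tl_mc_parameters(tol_rel * abs(q), bias_const, variance, alpha)

# Hypothetical rare event with probability q = 1e-5 and a 5% relative tolerance.
# bias_const = 1.0 is a placeholder, not a value from the paper.
q = 1e-5
dt, n_samples = tl_mc_parameters_rel(0.05, bias_const=1.0, variance=q * (1 - q), q=q)
print(dt, n_samples)
```

For this hypothetical setting, the crude estimator requires several hundred million sample paths, since the required number of samples grows like $1/q$ when the variance is not reduced.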

1.4 Importance Sampling

Using IS techniques [36, 25, 19, 17, 24, 46] can reduce the computational cost of the crude MC estimator through variance reduction in (1.10). For a general motivation, we refer to Section 1.4 of [11]. To illustrate the IS method, let us consider the general problem of estimating $\mathbb{E}[g(Y)]$ , where $g$ is a given observable and $Y$ is a random variable taking values in $\mathbb{R}$ with the probability density function $\rho_{Y}$ . We let $\widehat{\rho}_{Z}$ be the probability density function for an auxiliary real random variable $Z$ . The MC estimator under the IS measure is

\displaystyle\mu_{M}^{IS}=\frac{1}{M}\sum_{j=1}^{M}L(Z_{[j]})\cdot g(Z_{[j]}),

(1.13)

where $Z_{[j]}$ are independent and identically distributed samples from $\widehat{\rho}_{Z}$ for $j=1,\dots,M$ and the likelihood factor is given by $L(Z_{[j]}):=\frac{\rho_{Y}(Z_{[j]})}{\widehat{\rho}_{Z}(Z_{[j]})}$ . The IS estimator retains the expected value of (1.6) (i.e., $\mathbb{E}[L(Z)g(Z)]=\mathbb{E}[g(Y)]$ ), but the variance can be reduced due to a different second moment $\mathbb{E}\left[\left(L(Z)\cdot g(Z)\right)^{2}\right]$ .
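The estimator (1.13) can be illustrated with a toy problem that is unrelated to SRNs: estimating the Gaussian tail probability $\mathbb{P}(Y>\gamma)$ for $Y\sim\mathcal{N}(0,1)$ by sampling from the shifted density $\widehat{\rho}_{Z}=\mathcal{N}(\gamma,1)$ . The choice of this auxiliary density, the threshold, and the sample size are assumptions made for the sketch.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
M, gamma = 100_000, 4.0

# Crude MC: samples from rho_Y = N(0, 1)
y = rng.standard_normal(M)
crude = (y > gamma).astype(float)

# IS: samples from the auxiliary density rho_Z = N(gamma, 1)
z = rng.normal(gamma, 1.0, M)
# Likelihood factor rho_Y(z) / rho_Z(z) = exp(-gamma * z + gamma^2 / 2)
likelihood = np.exp(-gamma * z + 0.5 * gamma**2)
weighted = likelihood * (z > gamma)

exact = 0.5 * math.erfc(gamma / math.sqrt(2.0))  # P(Y > gamma)
print("exact:", exact)
print("crude MC:", crude.mean(), "IS:", weighted.mean())
print("sample variance, crude MC:", crude.var(), "IS:", weighted.var())
```

Both estimators target the same expected value, but under the shifted density roughly half of the samples hit the event, so the weighted samples have a far smaller second moment than the raw indicator.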

Determining an auxiliary probability measure that substantially reduces the variance compared with the original measure is challenging and strongly depends on the structure of the considered problem. In addition, the derivation of the new measure must come with a moderate additional computational cost to ensure an efficient IS scheme. This work uses the path-dependent change of probability measure introduced in [11], employing an IS measure derived from changing the Poisson random variable rates in the TL paths. Section 2.1 recalls the SOC formulation for optimal IS parameters from [11] and extends it with a novel HJB formulation. We conclude this consideration in Section 4, combining the IS scheme with a dimension reduction approach to reduce the computational cost.

2 Importance Sampling via Stochastic Optimal Control Formulation

2.1 Dynamic Programming for Importance Sampling Parameters

This section revisits the connection between optimal IS measure determination within a class of probability measures and the SOC formulation originally introduced in [11]. We let $\mathbf{X}$ be an SRN as defined in Section 1.1 and let $\widehat{\mathbf{X}}^{\Delta t}$ denote its TL approximation as given by (1.5). Then, the goal is to derive a near-optimal IS measure to estimate $\mathbb{E}\left[g(\mathbf{X}(T))\right]$ . We limit ourselves to the parameterized class of IS schemes used in [10, 11]:

\displaystyle\overline{\mathbf{X}}_{n+1}^{\Delta t}=\max\left(\mathbf{0},\overline{\mathbf{X}}_{n}^{\Delta t}+\sum_{j=1}^{J}\overline{P}_{n,j}\boldsymbol{\nu}_{j}\right),\quad n=0,\dots,N-1,
\displaystyle\overline{\mathbf{X}}_{0}^{\Delta t}=\mathbf{x}_{0},

(2.1)

where the measure change is obtained by modifying the Poisson random variable rates of the TL paths:

\overline{P}_{n,j}=\overline{\mathcal{P}}_{n,j}\left(\delta_{n,j}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})\Delta t\right),\quad n=0,\dots,N-1,\;j=1,\dots,J.

(2.2)

In (2.2), $\delta_{n,j}^{\Delta t}(\mathbf{x})\in\mathcal{A}_{\mathbf{x},j}$ is the control parameter at time step $n$ , under reaction $j$ , and in state $\mathbf{x}\in\mathbb{N}^{d}$ . In addition, $\{\overline{\mathcal{P}}_{n,j}(r_{n,j})\}_{\{1\leq j\leq J\}}$ are independent Poisson random variables, conditioned on $\overline{\mathbf{X}}^{\Delta t}_{n}$ , with the respective rates $r_{n,j}:=\delta_{n,j}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})\Delta t$ . The set of admissible controls is

\displaystyle\mathcal{A}_{\mathbf{x},j}=\begin{cases}\{0\},&\text{if }a_{j}(\mathbf{x})=0,\\ \{y\in\mathbb{R}:y>0\},&\text{otherwise}.\end{cases}

(2.3)

In the following, we use the vector notation $\left(\boldsymbol{\delta}_{n}^{\Delta t}(\mathbf{x})\right)_{j}:=\delta_{n,j}^{\Delta t}(\mathbf{x})$ and $\left(\overline{\mathbf{P}}_{n}\right)_{j}:=\overline{P}_{n,j}$ for $j=1,\dots,J$ .

The corresponding likelihood ratio of the path $\{\overline{\mathbf{X}}^{\Delta t}_{n}:n=0,\dots,N\}$ for the IS parameters $\boldsymbol{\delta}_{n}^{\Delta t}(\mathbf{x})\in\times_{j=1}^{J}\mathcal{A}_{\mathbf{x},j}$ is

L\left(\left(\overline{\mathbf{P}}_{0},\dots,\overline{\mathbf{P}}_{N-1}\right),\left(\boldsymbol{\delta}_{0}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{0}),\dots,\boldsymbol{\delta}_{N-1}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{N-1})\right)\right)=\prod_{n=0}^{N-1}L_{n}(\overline{\mathbf{P}}_{n},\boldsymbol{\delta}_{n}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})),

(2.4)

where the likelihood ratio update at time step $n$ is

\displaystyle L_{n}(\overline{\mathbf{P}}_{n},\boldsymbol{\delta}_{n}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n}))=\prod_{j=1}^{J}\exp\left(-\left(a_{j}(\overline{\mathbf{X}}_{n}^{\Delta t})-\delta_{n,j}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})\right)\Delta t\right)\left(\frac{a_{j}(\overline{\mathbf{X}}_{n}^{\Delta t})}{\delta_{n,j}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})}\right)^{\overline{P}_{n,j}}
\displaystyle=\exp\left(-\sum_{j=1}^{J}\left(a_{j}(\overline{\mathbf{X}}_{n}^{\Delta t})-\delta_{n,j}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})\right)\Delta t\right)\cdot\prod_{j=1}^{J}\left(\frac{a_{j}(\overline{\mathbf{X}}_{n}^{\Delta t})}{\delta_{n,j}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})}\right)^{\overline{P}_{n,j}}.

(2.5)

To simplify the notation, we use the convention that $\frac{a_{j}(\overline{\mathbf{X}}_{n}^{\Delta t})}{\delta_{n,j}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})}=1$ , whenever both $a_{j}(\overline{\mathbf{X}}_{n}^{\Delta t})=0$ and $\delta_{n,j}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{n})=0$ in (2.5). From (2.3), this results in a factor of $1$ in the likelihood ratio for reactions where $a_{j}(\overline{\mathbf{X}}_{n}^{\Delta t})=0$ .

Using the introduced change of measure (2.2), the quantity of interest can be expressed with respect to the new measure:

\mathbb{E}[g(\widehat{\mathbf{X}}^{\Delta t}_{N})]=\mathbb{E}\left[L\left(\left(\overline{\mathbf{P}}_{0},\dots,\overline{\mathbf{P}}_{N-1}\right),\left(\boldsymbol{\delta}_{0}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{0}),\dots,\boldsymbol{\delta}_{N-1}^{\Delta t}(\overline{\mathbf{X}}^{\Delta t}_{N-1})\right)\right)\cdot g(\overline{\mathbf{X}}^{\Delta t}_{N})\right],

(2.6)

with the expectation on the right-hand side of (2.6) taken with respect to the dynamics in (2.1).
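The IS scheme (2.1) and (2.2) together with the likelihood ratio (2.4) and (2.5) can be sketched as follows. The controls are passed as an arbitrary callable, so this sketch says nothing about how good controls are chosen; the function name, the pure-birth example, and the constant control value 6 are assumptions for illustration. The likelihood is accumulated in log form for numerical stability.

```python
import numpy as np

def is_tau_leap_sample(x0, nu, propensities, controls, T, N, rng):
    """One IS tau-leap path (2.1)-(2.2) and its likelihood ratio (2.4)-(2.5).

    controls(n, x) returns delta_n(x) of shape (J,), with delta_j = 0
    exactly when a_j(x) = 0, as required by the admissible set (2.3).
    """
    dt = T / N
    x = np.array(x0, dtype=np.int64)
    log_L = 0.0
    for n in range(N):
        a = propensities(x)
        delta = controls(n, x)
        p = rng.poisson(delta * dt)           # Poisson increments with modified rates
        active = a > 0                        # a_j = delta_j = 0 contributes a factor 1
        log_L += -np.sum(a - delta) * dt
        log_L += np.sum(p[active] * np.log(a[active] / delta[active]))
        x = np.maximum(0, x + nu @ p)
    return x, np.exp(log_L)

# Hypothetical pure-birth network 0 -> S with a(x) = 1, X(0) = 0, T = 1.
# The constant control delta = 6 pushes paths toward the event {X_N > 5}.
nu = np.array([[1]])
a = lambda x: np.array([1.0])
delta = lambda n, x: np.array([6.0])
rng = np.random.default_rng(2)
samples = [is_tau_leap_sample([0], nu, a, delta, 1.0, 4, rng) for _ in range(10_000)]
estimate = np.mean([L * (x[0] > 5) for x, L in samples])
print(estimate)
```

Because the propensity in this example is constant, the TL terminal state is Poisson distributed with mean 1, so the estimate can be compared against the exact Poisson tail probability.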

Next, we recall the connection between the optimal second moment minimizing IS parameters $\{\boldsymbol{\delta}_{n}^{\Delta t}(\mathbf{x})\}_{n=0,\dots,N-1;\mathbf{x}\in\mathbb{N}^{d}}$ and the corresponding discrete-time dynamic programming relation from [11]. We revisit the definition of the discrete-time value function $u_{\Delta t}(\cdot,\cdot)$ in Definition 2.1, allowing the formulation of the dynamic programming equations in Theorem 2.2. The proof and further details for Theorem 2.2 are provided in [11].

Definition 2.1 (Value function).

For a given $\Delta t>0$ , the discrete-time value function $u_{\Delta t}(\cdot,\cdot)$ is defined as the optimal (infimum) second moment for the proposed IS estimator. For time step $0\leq n\leq N$ and state $\mathbf{x}\in\mathbb{N}^{d}$ ,

\displaystyle u_{\Delta t}(n,\mathbf{x})=\inf_{\{\boldsymbol{\delta}^{\Delta t}_{k}\}_{k=n,\dots,N-1}\in\mathcal{A}^{N-n}}\mathbb{E}\left[g^{2}\left(\overline{\mathbf{X}}_{N}^{\Delta t}\right)\prod_{k=n}^{N-1}L_{k}^{2}\left(\overline{\mathbf{P}}_{k},\boldsymbol{\delta}_{k}^{\Delta t}(\overline{\mathbf{X}}_{k}^{\Delta t})\right)\middle|\overline{\mathbf{X}}_{n}^{\Delta t}=\mathbf{x}\right],

(2.7)

where $\mathcal{A}=\bigtimes_{\mathbf{x}\in\mathbb{N}^{d}}\bigtimes_{j=1}^{J}\mathcal{A}_{\mathbf{x},j}\subset\mathbb{R}^{\mathbb{N}^{d}\times J}$ is the admissible set for the IS parameters, and $u_{\Delta t}(N,\mathbf{x})=g^{2}(\mathbf{x})$ , for any $\mathbf{x}\in\mathbb{N}^{d}$ .

Theorem 2.2 (Dynamic programming for IS parameters [11]).

For $\mathbf{x}\in\mathbb{N}^{d}$ and the given step size $\Delta t>0$ , the discrete-time value function $u_{\Delta t}(n,\mathbf{x})$ fulfills the dynamic programming relation:

\displaystyle u_{\Delta t}(N,\mathbf{x})=g^{2}(\mathbf{x}),

and for $n=N-1,\dots,0$ and $\mathcal{A}_{\mathbf{x}}:=\bigtimes_{j=1}^{J}\mathcal{A}_{\mathbf{x},j}$ ,

\displaystyle u_{\Delta t}(n,\mathbf{x})=\inf_{\boldsymbol{\delta}_{n}^{\Delta t}(\mathbf{x})\in\mathcal{A}_{\mathbf{x}}}\exp\left(\left(-2\sum_{j=1}^{J}a_{j}(\mathbf{x})+\sum_{j=1}^{J}\delta_{n,j}^{\Delta t}(\mathbf{x})\right)\Delta t\right)\times\sum_{\mathbf{p}\in\mathbb{N}^{J}}\left(\prod_{j=1}^{J}\frac{(\Delta t\cdot\delta_{n,j}^{\Delta t}(\mathbf{x}))^{p_{j}}}{p_{j}!}\left(\frac{a_{j}(\mathbf{x})}{\delta_{n,j}^{\Delta t}(\mathbf{x})}\right)^{2p_{j}}\right)\cdot u_{\Delta t}(n+1,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}\mathbf{p})),

(2.8)

where $\boldsymbol{\nu}=\left(\boldsymbol{\nu}_{1},\dots,\boldsymbol{\nu}_{J}\right)\in\mathbb{Z}^{d\times J}$ .

Analytically solving the minimization problem (2.8) is challenging due to the infinite sum. In [11], the problem is solved by approximating the value function (2.7) using a truncated Taylor expansion of the dynamic programming relation (2.8). To overcome the curse of dimensionality, a learning-based approach for the value function was proposed. Instead, in this work, we utilize a continuous-time SOC formulation, leading to a set of coupled $d$ -dimensional ODEs, the HJB equations (refer to Section 2.2). We deal with the curse of dimensionality issue by using a dimension reduction technique, namely the MP, as explained in Section 3.

2.2 Derivation of Hamilton–Jacobi–Bellman (HJB) Equations

In Corollary 2.3, the discrete-time dynamic programming relation in Theorem 2.2 is replaced by its analogous continuous-time relation, resulting in a set of ODEs known as the HJB equations. The continuous-time value function $\tilde{u}(\cdot,\mathbf{x}):[0,T]\rightarrow\mathbb{R}$ , $\mathbf{x}\in\mathbb{N}^{d}$ , is the limit of the discrete value function $u_{\Delta t}(\cdot,\mathbf{x})$ as the step size $\Delta t$ approaches zero. In addition, the IS controls $\boldsymbol{\delta}(\cdot,\mathbf{x}):[0,T]\rightarrow\mathcal{A}_{\mathbf{x}}$ become time-continuous curves for $\mathbf{x}\in\mathbb{N}^{d}$ .

Corollary 2.3 (HJB equations for IS parameters).

For all $\mathbf{x}\in\mathbb{N}^{d}$ , the continuous-time value function $\tilde{u}(t,\mathbf{x})$ fulfills (2.9) for $t\in[0,T]$ :

\displaystyle\tilde{u}(T,\mathbf{x})=g^{2}(\mathbf{x}),
\displaystyle-\frac{d\tilde{u}}{dt}(t,\mathbf{x})=\inf_{\boldsymbol{\delta}(t,\mathbf{x})\in\mathcal{A}_{\mathbf{x}}}\left(-2\sum_{j=1}^{J}a_{j}(\mathbf{x})+\sum_{j=1}^{J}\delta_{j}(t,\mathbf{x})\right)\tilde{u}(t,\mathbf{x})+\sum_{j=1}^{J}\frac{a_{j}(\mathbf{x})^{2}}{\delta_{j}(t,\mathbf{x})}\tilde{u}\left(t,\max\left(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j}\right)\right),

(2.9)

where $\delta_{j}(t,\mathbf{x}):=\left(\boldsymbol{\delta}(t,\mathbf{x})\right)_{j}$ .

Proof.

The proof of the corollary is presented in Appendix A. ∎

If $\tilde{u}(t,\mathbf{x})>0$ for all $\mathbf{x}\in\mathbb{N}^{d}$ and $t\in[0,T]$ , we can solve the minimization problem in (2.9) in closed form, such that the optimal controls are given by

\displaystyle\tilde{\delta}_{j}(t,\mathbf{x})=a_{j}(\mathbf{x})\sqrt{\frac{\tilde{u}\left(t,\max\left(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j}\right)\right)}{\tilde{u}\left(t,\mathbf{x}\right)}}

(2.10)

and (2.9) simplifies to

\displaystyle\frac{d\tilde{u}}{dt}(t,\mathbf{x})=-2\sum_{j=1}^{J}a_{j}(\mathbf{x})\left(\sqrt{\tilde{u}(t,\mathbf{x})\tilde{u}(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j}))}-\tilde{u}(t,\mathbf{x})\right).

(2.11)
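The closed-form minimizer and the simplified ODE can be checked term by term. The following sketch considers a single reaction $j$ with $a_{j}(\mathbf{x})>0$ and assumes that the value function is strictly positive at both $\mathbf{x}$ and $\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j})$ ; the auxiliary function $f_{j}$ is notation introduced here only for this check and collects the two terms of the infimum in Corollary 2.3 that depend on $\delta_{j}$ .

```latex
\begin{align*}
f_{j}(\delta) &:= \delta\,\tilde{u}(t,\mathbf{x})
  + \frac{a_{j}(\mathbf{x})^{2}}{\delta}\,
    \tilde{u}\left(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j})\right),
  \qquad \delta>0,\\
f_{j}'(\delta) &= \tilde{u}(t,\mathbf{x})
  - \frac{a_{j}(\mathbf{x})^{2}}{\delta^{2}}\,
    \tilde{u}\left(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j})\right) = 0
  \;\Longrightarrow\;
  \tilde{\delta}_{j} = a_{j}(\mathbf{x})
  \sqrt{\frac{\tilde{u}\left(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j})\right)}
             {\tilde{u}(t,\mathbf{x})}},\\
f_{j}(\tilde{\delta}_{j}) &= 2\,a_{j}(\mathbf{x})
  \sqrt{\tilde{u}(t,\mathbf{x})\,
        \tilde{u}\left(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j})\right)}.
\end{align*}
```

Since $f_{j}$ is convex on $\delta>0$ , the stationary point is the minimizer. Summing $f_{j}(\tilde{\delta}_{j})$ over $j$ and adding the control-independent term $-2\sum_{j=1}^{J}a_{j}(\mathbf{x})\tilde{u}(t,\mathbf{x})$ gives $-\frac{d\tilde{u}}{dt}(t,\mathbf{x})=2\sum_{j=1}^{J}a_{j}(\mathbf{x})\left(\sqrt{\tilde{u}(t,\mathbf{x})\tilde{u}(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j}))}-\tilde{u}(t,\mathbf{x})\right)$ , which agrees with the simplified ODE after multiplying by $-1$ .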

To estimate rare event probabilities with an observable $g(\mathbf{x})=\mathbbm{1}_{\{x_{i}>\gamma\}}$ , we encounter $\tilde{u}(t,\mathbf{x})=0$ for some $\mathbf{x}\in\mathbb{N}^{d}$ ; therefore, we modify (2.9) by approximating the final condition $g(\mathbf{x})$ using a sigmoid:

\displaystyle\tilde{g}(\mathbf{x})=\frac{1}{1+\exp(-b-\beta x_{i})}>0

(2.12)

with appropriately chosen parameters $b\in\mathbb{R}$ and $\beta\in\mathbb{R}$ . By incorporating the modified final condition, we obtain an approximate value function by solving (2.11) using an ODE solver (e.g., ode23s from MATLAB). When using the numerical solver, we truncate the infinite state space $\mathbb{N}^{d}$ using sufficiently large upper bounds. The approximated near-optimal IS controls are then expressed by (2.10). By the truncation of the infinite state space and the approximation of the final condition $g$ by $\tilde{g}$ , we introduce a bias to the value function. This can impact the amount of variance reduction in the IS-MC forward run; however, the IS-MC estimator is bias-free.
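A minimal one-dimensional sketch of this procedure is given below. It is not the MATLAB implementation referred to above: it replaces ode23s with a fixed-step classical Runge-Kutta scheme in NumPy, which is adequate only for this small, non-stiff example. The birth-death network, the truncation bound `K`, the clamping of neighbor indices at `K`, and the sigmoid parameters are all assumptions chosen for illustration.

```python
import numpy as np

def hjb_rhs(u, rates, shifts):
    """Right-hand side of (2.11) on the truncated grid {0, ..., K}.

    rates:  (J, K+1) array of propensities a_j(x)
    shifts: (J, K+1) integer array of neighbor indices max(0, x + nu_j),
            clamped to K (the clamping is a truncation assumption)
    """
    u_pos = np.maximum(u, 0.0)              # guard the square root
    neighbor = u_pos[shifts]                # u(t, x + nu_j), shape (J, K+1)
    return -2.0 * np.sum(rates * (np.sqrt(u_pos * neighbor) - u_pos), axis=0)

def solve_hjb_backward(u_T, rates, shifts, T, n_steps):
    """Integrate (2.11) from t = T down to t = 0 with classical RK4."""
    h = -T / n_steps                        # negative step: backward in time
    u = u_T.copy()
    history = [u.copy()]
    for _ in range(n_steps):
        k1 = hjb_rhs(u, rates, shifts)
        k2 = hjb_rhs(u + 0.5 * h * k1, rates, shifts)
        k3 = hjb_rhs(u + 0.5 * h * k2, rates, shifts)
        k4 = hjb_rhs(u + h * k3, rates, shifts)
        u = u + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        history.append(u.copy())
    return np.array(history)[::-1]          # row 0 is t = 0, last row is t = T

# Hypothetical birth-death network: 0 -> S with a_1(x) = 1, S -> 0 with a_2(x) = 0.1 x
K, T, gamma = 30, 1.0, 10
x = np.arange(K + 1)
nu = np.array([1, -1])
rates = np.vstack([np.full(K + 1, 1.0), 0.1 * x])
shifts = np.clip(x[None, :] + nu[:, None], 0, K)

beta = 2.0
b = -beta * (gamma + 0.5)                   # centers the sigmoid between gamma and gamma + 1
g_tilde = 1.0 / (1.0 + np.exp(-b - beta * x))   # sigmoid final condition (2.12)
u = solve_hjb_backward(g_tilde**2, rates, shifts, T, n_steps=2000)

def control(t_index, state):
    """Near-optimal controls (2.10) at time-grid index t_index and a given state."""
    ratio = u[t_index, shifts[:, state]] / u[t_index, state]
    return rates[:, state] * np.sqrt(ratio)

print(control(0, 0))
```

In this example, the control of the birth reaction at state 0 exceeds its propensity, so the IS paths are pushed toward the rare region, while the death reaction keeps a zero control where its propensity vanishes, in line with the admissible set (2.3).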

The cost for the ODE solver scales exponentially with the dimension $d$ of the SRNs, making this approach infeasible for high-dimensional SRNs. Section 3 presents a dimension reduction approach for SRNs employed in Section 4 to derive suboptimal IS controls for a lower-dimensional SRN. We later demonstrate how these controls are mapped to the full-dimensional SRN system.

Remark 2.4 (Continuous-time IS controls).

In Corollary 2.3 and Theorem 2.2, we present two alternative methods to express the value function (2.7) and the IS controls. Utilizing the HJB framework, we can derive continuous controls across time. This allows any time stepping $\Delta t$ in the IS-TL forward run and eliminates the need for ad-hoc interpolations.

3 Markovian Projection for Stochastic Reaction Networks

3.1 Formulation

To address the curse of dimensionality problem when deriving near-optimal IS controls, we project the SRN to a lower-dimensional network while preserving the marginal distribution of the original high-dimensional SRN system. We adapt the MP idea originally introduced in [29] for the setting of diffusion type stochastic differential equations to the SRNs framework. For an $d$ -dimensional SRN state vector, $\mathbf{X}(t)$ , we introduce a projection to a $\overline{d}$ -dimensional space such that $1\leq\overline{d}\ll d$ :

\displaystyle P:\mathbb{R}^{d}\rightarrow\mathbb{R}^{\overline{d}}:\>\mathbf{x}\mapsto\mathbf{P}\mathbf{x},

where $\mathbf{P}\in\mathbb{R}^{\overline{d}\times d}$ is a given matrix. This section develops a general MP framework for arbitrary projections with $\overline{d}\geq 1$ . However, the choice of the projection depends on the quantity of interest. In particular, when considering rare event probabilities with an observable $g(\mathbf{x})=\mathbbm{1}_{\{x_{i}>\gamma\}},\gamma\in\mathbb{R}$ as we do in Section 4, the projection operator is of the form

\displaystyle P(\mathbf{x})=\left(0,\dots,\underset{i-1}{0},\underset{i}{1},\underset{i+1}{0},\dots,0\right)\mathbf{x}.

(3.1)

In the derivation of the MP, we assume that

\mathbb{E}[a_{j}(\mathbf{X}(t))\mid P(\mathbf{X}(t))=\mathbf{s},\mathbf{X}(0)=\mathbf{x}_{0}]<\infty

(3.2)

for $\mathbf{s}\in\mathbb{N}^{\overline{d}}$ , $t\in[0,T]$ and $1\leq j\leq J$ .

Theorem 3.1 shows that there exists a $\overline{d}$ -dimensional SRN, $\overline{\boldsymbol{S}}(t)$ , that has the same conditional distribution as $\boldsymbol{S}(t):=P(\mathbf{X}(t))$ , conditioned on the initial state $\mathbf{X}(0)=\mathbf{x}_{0}$ , for all $t\in[0,T]$ .

Theorem 3.1 (MP for SRNs).

We let $\overline{\boldsymbol{S}}(t)$ be a $\overline{d}$ -dimensional stochastic process whose dynamics are given by

\displaystyle\overline{\boldsymbol{S}}(t)=P(\mathbf{x}_{0})+\sum_{j=1}^{J}\overline{Y}_{j}\left(\int_{0}^{t}\overline{a}_{j}(\tau,\overline{\boldsymbol{S}}(\tau))d\tau\right)\underbrace{P(\boldsymbol{\nu}_{j})}_{=:\overline{\boldsymbol{\nu}}_{j}},

(3.3)

for $t\in[0,T]$ , where $\overline{Y}_{j}$ denotes independent unit-rate Poisson processes and $\overline{a}_{j}$ , $j=1,\dots,J$ , are characterized by

\displaystyle\overline{a}_{j}(t,\boldsymbol{s}):=\mathbb{E}\left[a_{j}(\mathbf{X}(t))\middle|P\left(\mathbf{X}(t)\right)=\boldsymbol{s},\mathbf{X}(0)=\mathbf{x}_{0}\right]\text{, for }1\leq j\leq J,\boldsymbol{s}\in\mathbb{N}^{\overline{d}}.

(3.4)

Thus, $\boldsymbol{S}(t)\mid_{\{\mathbf{X}(0)=\mathbf{x}_{0}\}}=P(\mathbf{X}(t))\mid_{\{\mathbf{X}(0)=\mathbf{x}_{0}\}}$ and $\overline{\boldsymbol{S}}(t)\mid_{\{\mathbf{X}(0)=\mathbf{x}_{0}\}}$ have the same distribution for all $t\in[0,T]$ .

Proof.

The proof for Theorem 3.1 is given in Appendix B. ∎

In Theorem 3.1, we require the assumption in (3.2) to hold in order to ensure that the MP propensity in (3.4) is well-defined. Assumption (3.2) does not hold for all SRNs. However, in Remark 3.2, we present sufficient conditions guaranteeing (3.2).

Remark 3.2 (Sufficient conditions for assumption (3.2)).

If the propensity functions $\{a_{j}(\cdot)\}_{j=1}^{J}$ are bounded, i.e., there exist bounds $K_{j}\in\mathbb{R}^{+}$ such that $a_{j}(\mathbf{x})<K_{j}$ for all $\mathbf{x}\in\mathbb{N}^{d}$ and $1\leq j\leq J$ , then the expectation in (3.2) is bounded by the same bounds. An alternative condition is that the set

\displaystyle\mathcal{K}_{t,\mathbf{s}}:=\left\{\mathbf{y}\in\mathbb{N}^{d}:P(\mathbf{y})=\mathbf{s}\quad\text{and}\quad\mathbb{P}(\mathbf{X}(t)=\mathbf{y})>0\right\}

is finite for all $\mathbf{s}\in\mathbb{N}^{\overline{d}}$ and $t\in[0,T]$ . In this case, the expectation in (3.2) corresponds to a finite sum and is therefore finite. An in-depth analysis is required to establish sharp conditions under which (3.2) holds. Our numerical experiments show that the MP propensity is well defined in many examples.

The mimicking process $\{\overline{\boldsymbol{S}}(t)\}_{t\in[0,T]}$ is an SRN and consequently Markovian, whereas the projected full-dimensional process $\{\boldsymbol{S}(t)\}_{t\in[0,T]}$ is, in general, non-Markovian. Moreover, the propensities of the full-dimensional process $\{a_{j}\}_{j=1}^{J}$ are time-homogeneous functions of the state, whereas the resulting propensities $\{\overline{a}_{j}\}_{j=1}^{J}$ of the MP-SRN $\overline{\boldsymbol{S}}$ are time-dependent (see (3.4)). Reactions with $P(\boldsymbol{\nu}_{j})=0$ do not contribute to the MP dynamics in (3.3). For reactions with $P(\boldsymbol{\nu}_{j})\neq 0$ , the corresponding projected propensity may be known analytically. We denote the index set of reactions requiring an estimation of (3.4) (e.g., via an $L^{2}$ regression as described in Section 3.2) by $\mathcal{J}_{MP}$ . This index set is described as follows:

\displaystyle\mathcal{J}_{MP}:=\left\{1\leq j\leq J:\quad P(\boldsymbol{\nu}_{j})\neq 0\quad\text{and}\quad\underbrace{a_{j}(\mathbf{x})\neq f(P(\mathbf{x}))\text{ for all functions }f:\mathbb{R}^{\overline{d}}\rightarrow\mathbb{R}}_{(*)}\right\},

(3.5)

where condition (*) excludes reaction channels for which the MP propensity is only dependent on $s$ and given in closed form by $\overline{a}_{j}(t,s)=f(s)$ for the function $f$ .
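As a hedged illustration of (3.5), the following Python sketch computes the projected stoichiometric vectors and a candidate index set $\mathcal{J}_{MP}$ for the Michaelis–Menten network of Example 5.1, projected onto the complex $C$ ( $i=3$ ). It assumes mass-action propensities and the coordinate projection (3.1); in that setting, a propensity is a function of $P(\mathbf{x})$ alone exactly when all of its reactants coincide with the projected species. The names `REACTANTS`, `project`, and `mp_index_set` are hypothetical helpers introduced only for this illustration.

```python
# Michaelis-Menten network of Example 5.1; species order (E, S, C, P).
# One stoichiometric vector per reaction (columns of the stoichiometric matrix).
NU = [
    (-1, -1, 1, 0),  # reaction 1: E + S -> C
    (1, 1, -1, 0),   # reaction 2: C -> E + S
    (1, 0, -1, 1),   # reaction 3: C -> E + P
]
# Hypothetical helper data: 0-based indices of the reactant species of each
# mass-action propensity (a_1 = theta_1*E*S, a_2 = theta_2*C, a_3 = theta_3*C).
REACTANTS = [{0, 1}, {2}, {2}]


def project(vec, i):
    """Coordinate projection (3.1) onto the species with 0-based index i."""
    return vec[i]


def mp_index_set(nu, reactants, i):
    """Return the 1-based reaction indices that need an estimate of (3.4)."""
    j_mp = []
    for j, (nu_j, reac_j) in enumerate(zip(nu, reactants), start=1):
        changes_projection = project(nu_j, i) != 0
        # Assumption: under mass action, a_j depends on x_i only if every
        # reactant is the projected species itself.
        closed_form = reac_j <= {i}
        if changes_projection and not closed_form:
            j_mp.append(j)
    return j_mp


projected_nu = [project(nu_j, 2) for nu_j in NU]  # projection onto C
j_mp = mp_index_set(NU, REACTANTS, 2)
print(projected_nu, j_mp)
```

Under these assumptions, only reaction 1 requires a regression estimate: $a_{2}$ and $a_{3}$ depend on $C$ alone, so that $\overline{a}_{j}(t,s)=\theta_{j}s$ for $j=2,3$ .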

3.2 Discrete $L^{2}$ Regression for Approximating Projected Propensities

To approximate the Markovian propensity $\overline{a}_{j}$ for $j\in\mathcal{J}_{MP}$ , we reformulate (3.4) as a minimization problem and then use discrete $L^{2}$ regression as described below.

We let $V:=\left\{f:[0,T]\times\mathbb{R}^{\overline{d}}\rightarrow\mathbb{R}:\int_{0}% ^{T}\mathbb{E}[f(t,P(\mathbf{X}(t)))^{2}]dt<\infty\right\}$ . Then, the projected propensities via the MP for $j\in\mathcal{J}_{MP}$ are approximated by

$\displaystyle\overline{a}_{j}(\cdot,\cdot)$	$\displaystyle=\text{argmin}_{h\in V}\int_{0}^{T}\mathbb{E}\left[\left(a_{j}(\mathbf{X}(t))-h(t,P(\mathbf{X}(t)))\right)^{2}\right]dt$
	$\displaystyle\approx\text{argmin}_{h\in V}\mathbb{E}\left[\frac{1}{N}\sum_{n=0}^{N-1}\left(a_{j}(\widehat{\mathbf{X}}^{\Delta t}_{n})-h(t_{n},P(\widehat{\mathbf{X}}^{\Delta t}_{n}))\right)^{2}\right]$
	$\displaystyle\approx\text{argmin}_{h\in V}\frac{1}{M}\sum_{m=1}^{M}\frac{1}{N}\sum_{n=0}^{N-1}\left(a_{j}(\widehat{\mathbf{X}}^{\Delta t}_{[m],n})-h(t_{n},P(\widehat{\mathbf{X}}^{\Delta t}_{[m],n}))\right)^{2},$	(3.6)

where $\left\{\widehat{\mathbf{X}}^{\Delta t}_{[m]}\right\}_{m=1}^{M}$ are $M$ independent TL paths with a uniform time grid $0=t_{0}<t_{1}<\dots<t_{N}=T$ with step size $\Delta t$ .

To solve (3.6), we use a discrete $L^{2}$ regression approach. For the case $\overline{d}=1$ , we employ a set of basis functions in $V$ , $\{\phi_{p}(\cdot,\cdot)\}_{p\in\Lambda}$ , where $\Lambda\subset\mathbb{N}^{2}$ is a finite index set. In Remark 3.3, we provide more details on the choice of the basis. Consequently, the projected propensities via MP are approximated by

\displaystyle\overline{a}_{j}(t,s)\approx\sum_{p\in\Lambda}c_{p}^{(j)}\phi_{p}(t,s),\quad j\in\mathcal{J}_{MP},

(3.7)

where the coefficients $c_{p}^{(j)}$ must be derived for $j\in\mathcal{J}_{MP}$ and $p\in\Lambda$ .

Next, we derive the linear system of equations solved by $\{c_{p}^{(j)}\}_{p\in\Lambda}$ from (3.7) for $j\in\mathcal{J}_{MP}$ . For a given one-dimensional indexing of $\{1,\dots,M\}\times\{0,\dots,N-1\}$ , the corresponding design matrix $\boldsymbol{D}\in\mathbb{R}^{MN\times\#\Lambda}$ is given by

\displaystyle D_{k,p}=\phi_{p}(t_{n},P(\widehat{\mathbf{X}}^{\Delta t}_{[m],n})),\text{ for }k=(m,n)\in\{1,\dots,M\}\times\{0,\dots,N-1\},\quad p\in\Lambda.

Further, we set $\psi_{k}^{(j)}=a_{j}(\widehat{\mathbf{X}}^{\Delta t}_{[m],n})$ ( $\boldsymbol{\psi}^{(j)}\in\mathbb{R}^{MN}$ ) for $k\in\{1,\dots,M\}\times\{0,\dots,N-1\}$ , and $j\in\mathcal{J}_{MP}$ .

Then, the minimization problem in (3.6) becomes

	$\displaystyle\mathbf{c}^{(j)}$	$\displaystyle=\text{argmin}_{\{c_{p}\}_{p\in\Lambda}}\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=0}^{N-1}\left(a_{j}(\widehat{\mathbf{X}}^{\Delta t}_{[m],n})-\sum_{p\in\Lambda}c_{p}\phi_{p}\left(t_{n},P(\widehat{\mathbf{X}}^{\Delta t}_{[m],n})\right)\right)^{2}$
		$\displaystyle=\text{argmin}_{\mathbf{c}\in\mathbb{R}^{\#\Lambda}}\left(\boldsymbol{\psi}^{(j)}-\boldsymbol{D}\mathbf{c}\right)^{\top}\left(\boldsymbol{\psi}^{(j)}-\boldsymbol{D}\mathbf{c}\right)$
		$\displaystyle=\text{argmin}_{\mathbf{c}\in\mathbb{R}^{\#\Lambda}}\underbrace{{\boldsymbol{\psi}^{(j)}}^{\top}\boldsymbol{\psi}^{(j)}-2\mathbf{c}^{\top}\boldsymbol{D}^{\top}\boldsymbol{\psi}^{(j)}+\mathbf{c}^{\top}\boldsymbol{D}^{\top}\boldsymbol{D}\mathbf{c}}_{=:I(\mathbf{c})}.$

We minimize $I(\mathbf{c})$ with respect to $\mathbf{c}$ by solving

\displaystyle\frac{\partial I(\mathbf{c})}{\partial\mathbf{c}}=-2\boldsymbol{D}^{\top}\boldsymbol{\psi}^{(j)}+2\boldsymbol{D}^{\top}\boldsymbol{D}\mathbf{c}=0

and obtain the normal equation for $j\in\mathcal{J}_{MP}$ :

\displaystyle\left(\boldsymbol{D}^{\top}\boldsymbol{D}\right)\mathbf{c}^{(j)}=\boldsymbol{D}^{\top}\boldsymbol{\psi}^{(j)}.

(3.8)
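The assembly and solution of the normal equation (3.8) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: it uses a monomial basis $t^{i_{1}}s^{i_{2}}$ , a plain Gaussian elimination in place of a library solver, and synthetic regression data instead of TL paths. All function names are hypothetical.

```python
from itertools import product


def monomial_basis(max_deg_t, max_deg_s):
    """Index set Lambda = {0..max_deg_t} x {0..max_deg_s} for phi_p = t^i1 * s^i2."""
    return list(product(range(max_deg_t + 1), range(max_deg_s + 1)))


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a c = b."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    c = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * c[k] for k in range(r + 1, n))
        c[r] = (aug[r][n] - tail) / aug[r][r]
    return c


def fit_projected_propensity(samples, targets, basis):
    """Solve (D^T D) c = D^T psi for samples (t_n, s) and targets a_j(X_n)."""
    design = [[t ** i1 * s ** i2 for (i1, i2) in basis] for (t, s) in samples]
    q = len(basis)
    gram = [[sum(row[p] * row[r] for row in design) for r in range(q)]
            for p in range(q)]
    rhs = [sum(row[p] * y for row, y in zip(design, targets)) for p in range(q)]
    return solve_linear_system(gram, rhs)


# Synthetic check (not TL data): the target lies in the span of the basis,
# so the regression should recover its coefficients.
basis = monomial_basis(1, 1)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
samples = [(0.25 * n, float(s)) for n in range(4) for s in range(6)]
targets = [2.0 + 3.0 * s - 0.5 * t * s for (t, s) in samples]
coeffs = fit_projected_propensity(samples, targets, basis)
print(coeffs)
```

In practice, the pairs $(t_{n},P(\widehat{\mathbf{X}}^{\Delta t}_{[m],n}))$ and the values $a_{j}(\widehat{\mathbf{X}}^{\Delta t}_{[m],n})$ from the TL paths would replace the synthetic samples and targets.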

Remark 3.3 (Orthonormal basis approach via empirical inner product).

For the case $\overline{d}=1$ , the normal equation with a set of polynomials $\{\phi_{p}\}_{p\in\Lambda}$ in $\mathbb{R}^{2}$ can be used to derive the MP propensity $\overline{a}_{j}$ for $j\in\mathcal{J}_{MP}$ . We use the standard basis $\left\{\phi_{(i_{1},i_{2})}\right\}_{(i_{1},i_{2})\in\Lambda}$ for a two-dimensional index set $\Lambda$ , where

\phi_{(i_{1},i_{2})}:\mathbb{R}^{2}\rightarrow\mathbb{R},(t,x)\mapsto t^{i_{1}}x^{i_{2}}.

For better stability [18], we apply the Gram–Schmidt orthogonalization algorithm with respect to the empirical scalar product

\displaystyle\left\langle\phi_{i},\phi_{j}\right\rangle_{M}=\frac{1}{N}\sum_{n=0}^{N-1}\frac{1}{M}\sum_{m=1}^{M}\phi_{i}\left(t_{n},P(\widehat{\mathbf{X}}^{\Delta t}_{[m],n})\right)\phi_{j}\left(t_{n},P(\widehat{\mathbf{X}}^{\Delta t}_{[m],n})\right)

(3.9)

to find an orthonormal set of functions. We base the empirical scalar product and the normal equation (3.8) on the same set of TL paths, $\{\widehat{\mathbf{X}}^{\Delta t}_{[m]}\}_{m=1,\dots,M}$ , such that the matrix condition number becomes $\text{cond}(\boldsymbol{D}^{\top}\boldsymbol{D})=1$ and $\boldsymbol{D}^{\top}\boldsymbol{D}=\text{diag}\left(\frac{T}{\Delta t}M,\dots,\frac{T}{\Delta t}M\right)$ [18].
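The effect of the orthonormalization on $\boldsymbol{D}^{\top}\boldsymbol{D}$ can be checked numerically. The sketch below is an illustration under assumptions: each basis function is represented only by its values at the sample points, a modified Gram–Schmidt variant is used, and the samples are a synthetic grid rather than TL paths. The function names are hypothetical.

```python
def empirical_inner(u, v):
    """Empirical scalar product (3.9): mean of pointwise products over all samples."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def gram_schmidt(columns):
    """Modified Gram-Schmidt on basis functions given by their sample values."""
    ortho = []
    for col in columns:
        w = list(col)
        for q in ortho:
            coef = empirical_inner(w, q)
            w = [wi - coef * qi for wi, qi in zip(w, q)]
        norm = empirical_inner(w, w) ** 0.5
        ortho.append([wi / norm for wi in w])
    return ortho


# Synthetic sample points (t_n, s); K = M * N = 24 points in total.
samples = [(0.25 * n, float(s)) for n in range(4) for s in range(6)]
index_set = [(0, 0), (0, 1), (1, 0), (1, 1)]
columns = [[t ** i1 * s ** i2 for (t, s) in samples] for (i1, i2) in index_set]

ortho = gram_schmidt(columns)
# Entries of D^T D for the orthonormalized design matrix.
gram = [[sum(a * b for a, b in zip(u, v)) for v in ortho] for u in ortho]
print([round(gram[p][p], 6) for p in range(len(ortho))])
```

With this normalization, $\boldsymbol{D}^{\top}\boldsymbol{D}$ equals $MN$ times the identity, which matches the diagonal form stated above since $N=T/\Delta t$ . To evaluate the orthonormal functions at new points, one would additionally need to store the Gram–Schmidt coefficients, which this sketch omits.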

Remark 3.4 (Propensity functions).

The proof of Theorem 3.1 and the derivation of the normal equation (3.8) are general with respect to the structure of the propensity function. The Markovian projection can be applied to SRNs with arbitrary propensity functions, i.e., it is not restricted to the mass-action kinetics principle introduced in (1.4) (e.g., Hill-type reaction rate laws [49] are admissible). The choice of suitable basis functions $\{\phi_{p}\}_{p\in\Lambda}$ for the $L^{2}$ regression depends on the given example and on the type of propensity.

3.3 Computational Cost of Markovian Projection

The computational work to derive an MP for an SRN with $J$ reactions, based on $M$ TL paths with time step $\Delta t$ and an orthonormal set of polynomials (see Remark 3.3) of size $\#\Lambda$ , splits into three types of costs:

\displaystyle W_{MP}(\#\Lambda,\Delta t,M)\approx M\cdot W_{TL}(\Delta t)+W_{G-S}(\#\Lambda,\Delta t,M)+W_{L^{2}}(\#\Lambda,\Delta t,M),

(3.10)

where $W_{TL}$ , $W_{G-S}$ , and $W_{L^{2}}$ denote the computational costs to simulate a TL path, derive an orthonormal basis (as described in Remark 3.3), and derive and solve the normal equation in (3.8), respectively. The dominant terms of these costs contribute as follows:

	$\displaystyle W_{TL}(\Delta t)$	$\displaystyle\approx\frac{T}{\Delta t}\cdot J\cdot C_{Poi},$
	$\displaystyle W_{G-S}(\#\Lambda,\Delta t,M)$	$\displaystyle\approx M\cdot\frac{T}{\Delta t}\cdot\left(\#\Lambda\right)^{3},$
	$\displaystyle W_{L^{2}}(\#\Lambda,\Delta t,M)$	$\displaystyle\approx M\cdot\frac{T}{\Delta t}\cdot\left(\left(\#\Lambda\right)^{2}+\#\mathcal{J}_{MP}\cdot\#\Lambda\right),$

where $C_{Poi}$ represents the cost to simulate one realization of a Poisson random variable. The main computational cost results from deriving an orthonormal basis (see Remark 3.3). A more detailed derivation of the cost terms is provided in Appendix C. For many applications, such as the MP-IS approach presented in Section 4, the MP must be computed only once, such that the computational cost $W_{MP}(\#\Lambda,\Delta t,M)$ can be regarded as an off-line cost.
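To give a feeling for the relative size of the three contributions in (3.10), the dominant terms can be evaluated directly. The following sketch is only an arithmetic illustration: it assumes $C_{Poi}=1$ in the same unit as one basis-function operation, takes $J=3$ and $\#\Lambda=9$ as for the Michaelis–Menten example with $\Lambda=\{0,1,2\}\times\{0,1,2\}$ , and assumes $\#\mathcal{J}_{MP}=1$ . The function name `mp_cost` is hypothetical.

```python
def mp_cost(num_basis, dt, num_paths, horizon, num_reactions,
            num_mp_reactions, c_poi=1.0):
    """Dominant cost terms of (3.10); c_poi = 1 is an assumed unit cost."""
    steps = horizon / dt
    w_paths = num_paths * steps * num_reactions * c_poi  # M * W_TL
    w_gs = num_paths * steps * num_basis ** 3            # W_{G-S}
    w_l2 = num_paths * steps * (num_basis ** 2 + num_mp_reactions * num_basis)
    return {
        "paths": w_paths,
        "gram_schmidt": w_gs,
        "regression": w_l2,
        "total": w_paths + w_gs + w_l2,
    }


cost = mp_cost(num_basis=9, dt=2 ** -8, num_paths=10 ** 4, horizon=1.0,
               num_reactions=3, num_mp_reactions=1)
for name, value in cost.items():
    print(f"{name}: {value:.3e}")
```

Under these assumptions, the Gram–Schmidt term accounts for most of the total, in line with the statement that deriving the orthonormal basis dominates the cost.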

Remark 3.5 (Simulating MP paths).

The MP-SRN $\overline{\mathbf{S}}(t)$ can be simulated as an SRN with time-inhomogeneous propensity functions. The explicit TL scheme in (1.2) can be naturally adapted to this setting. However, we note that the IS approach presented in Section 4 does not require explicitly simulating paths of the MP-SRN.

4 Importance Sampling for Higher-dimensional Stochastic Reaction Networks via Markovian Projection

Next, we employ MP to overcome the curse of dimensionality when deriving IS controls from solving (2.3). Specifically, we solve the HJB equations in (2.11) for a reduced-dimensional MP system (refer to Figure 1.2 for a schematic illustration of the approach). Given a suitable projection $P:\mathbb{R}^{d}\rightarrow\mathbb{R}^{\overline{d}}$ and a corresponding final condition $\tilde{g}:\mathbb{N}^{\overline{d}}\rightarrow\mathbb{R}$ with $\tilde{g}(P(\mathbf{x}))=g(\mathbf{x})$ , the HJB equations (2.11) for the MP process are

	$\displaystyle\tilde{u}_{\overline{d}}(T,\boldsymbol{s})$	$\displaystyle=\tilde{g}^{2}(\boldsymbol{s}),\quad\boldsymbol{s}\in\mathbb{N}^{\overline{d}},$
	$\displaystyle\frac{d\tilde{u}_{\overline{d}}}{dt}(t,\boldsymbol{s})$	$\displaystyle=-2\sum_{j=1}^{J}\overline{a}_{j}(t,\boldsymbol{s})\left(\sqrt{\tilde{u}_{\overline{d}}(t,\boldsymbol{s})\tilde{u}_{\overline{d}}(t,\max(0,\boldsymbol{s}+\overline{\boldsymbol{\nu}}_{j}))}-\tilde{u}_{\overline{d}}(t,\boldsymbol{s})\right),\quad t\in[0,T],\boldsymbol{s}\in\mathbb{N}^{\overline{d}}.$		(4.1)
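For $\overline{d}=1$ , the system (4.1) is a set of coupled ODEs on a truncated grid $\{0,\dots,s_{\max}\}$ that is integrated backward from $t=T$ . The following Python sketch illustrates this under several assumptions that are not taken from the original implementation: an explicit Heun scheme replaces a stiff solver such as ode23s, jumps leaving the truncated grid are clipped to $s_{\max}$ , and the birth-death propensities and the logistic final condition `sigmoid_final` (with made-up parameters) are toy choices. All names are hypothetical.

```python
import math


def solve_projected_hjb(propensities, nu_bar, final_condition, horizon,
                        s_max, n_steps):
    """Integrate (4.1) backward in time on the truncated grid {0, ..., s_max}.

    propensities: callables a_bar_j(t, s); nu_bar: integer jumps P(nu_j).
    Returns u with u[n][s] approximating the value function at t_n = n * dt.
    """
    dt = horizon / n_steps
    states = range(s_max + 1)

    def rhs(t, u):
        out = []
        for s in states:
            total = 0.0
            for a_bar, nu in zip(propensities, nu_bar):
                s_next = min(max(0, s + nu), s_max)  # clip to truncated grid
                total += a_bar(t, s) * (math.sqrt(u[s] * u[s_next]) - u[s])
            out.append(-2.0 * total)
        return out

    u = [[0.0] * (s_max + 1) for _ in range(n_steps + 1)]
    u[n_steps] = [final_condition(s) ** 2 for s in states]
    for n in range(n_steps, 0, -1):
        t = n * dt
        # Heun step backward in time: u(t - dt) ~ u(t) - dt * du/dt.
        k1 = rhs(t, u[n])
        pred = [max(ui - dt * ki, 0.0) for ui, ki in zip(u[n], k1)]
        k2 = rhs(t - dt, pred)
        u[n - 1] = [max(ui - 0.5 * dt * (a + b), 0.0)
                    for ui, a, b in zip(u[n], k1, k2)]
    return u


def sigmoid_final(s, b=-13.0, beta=2.0):
    """Toy positive surrogate of an indicator; b and beta are made-up values."""
    return 1.0 / (1.0 + math.exp(-(b + beta * s)))


# Toy birth-death MP-SRN: births at rate 1, deaths at rate 0.1 * s.
toy_propensities = [lambda t, s: 1.0, lambda t, s: 0.1 * s]
u_toy = solve_projected_hjb(toy_propensities, [1, -1], sigmoid_final,
                            horizon=1.0, s_max=30, n_steps=200)
print(u_toy[0][:5])
```

A simple consistency check is that a constant final condition yields a constant solution, because the bracket in (4.1) then vanishes. For stiff problems or large $s_{\max}$ , an implicit or adaptive solver would be preferable to the explicit scheme used here.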

For observables of the type $g(\mathbf{x})=\mathbbm{1}_{\{x_{i}>\gamma\}}$ , we use an MP to a ( $\overline{d}=1$ )-dimensional process via projection (3.1), and the final condition is approximated by a positive sigmoid (see (2.12)). The solution of (4.1) is the value function $\tilde{u}_{\overline{d}}$ of the $\overline{d}$ -dimensional MP process. To obtain continuous-time IS controls for the $d$ -dimensional SRN, we substitute the value function $\tilde{u}\left(t,\mathbf{x}\right)$ of the full-dimensional process in (2.10) with the value function $\tilde{u}_{\overline{d}}(t,P(\mathbf{x}))$ of the MP-SRN:

\displaystyle\overline{\delta}_{j}(t,\mathbf{x})=a_{j}(\mathbf{x})\sqrt{\frac{\tilde{u}_{\overline{d}}\left(t,\max\left(0,P(\mathbf{x}+\boldsymbol{\nu}_{j})\right)\right)}{\tilde{u}_{\overline{d}}\left(t,P(\mathbf{x})\right)}}\text{ for }\mathbf{x}\in\mathbb{N}^{d},t\in[0,T].

(4.2)
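The mapping (4.2) from a tabulated one-dimensional value function to a control of the full-dimensional SRN can be sketched as follows. The sketch assumes $\overline{d}=1$ with the coordinate projection (3.1), a value function stored as a list over $s=0,\dots,s_{\max}$ at a fixed time, and clipping of states beyond $s_{\max}$ ; the function name `mp_is_control` and the toy numbers are hypothetical.

```python
import math


def mp_is_control(a_j, nu_j, x, u_t, i):
    """Control (4.2) for one reaction at state x and a fixed time t.

    a_j: propensity value a_j(x); nu_j: stoichiometric vector of reaction j;
    u_t: list with u_t[s] ~ u_bar(t, s) for s = 0..s_max (must be positive);
    i: 0-based index of the projected species.
    """
    s_max = len(u_t) - 1
    s_now = min(x[i], s_max)
    s_next = min(max(0, x[i] + nu_j[i]), s_max)  # P(x + nu_j), clipped
    return a_j * math.sqrt(u_t[s_next] / u_t[s_now])


# Toy value function that increases towards the rare-event region.
u_t = [2.0 ** -12, 2.0 ** -8, 2.0 ** -4, 1.0]
x = (5, 1)

# Reaction that increases the projected species: its rate is amplified.
delta_up = mp_is_control(2.0, (-1, 1), x, u_t, i=1)
# Reaction with P(nu_j) = 0: its rate is left unchanged.
delta_same = mp_is_control(0.7, (1, 0), x, u_t, i=1)
print(delta_up, delta_same)
```

The second call illustrates that reactions with $P(\boldsymbol{\nu}_{j})=0$ keep their original propensity under (4.2), since the ratio of value functions equals one.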

Remark 4.1 (Alternative MP-IS approach).

In the presented approach, we map the value function of the $\overline{d}$ -dimensional MP process to the full-dimensional SRNs. Alternatively, one could also map the optimal controls from the $\overline{d}$ -dimensional MP-SRN to the full-dimensional SRNs, leading to the following controls:

\displaystyle\tilde{\delta}^{\overline{d}}_{j}(t,\mathbf{x})=\overline{a}_{j}(t,P(\mathbf{x}))\sqrt{\frac{\tilde{u}_{\overline{d}}\left(t,\max\left(0,P(\mathbf{x}+\boldsymbol{\nu}_{j})\right)\right)}{\tilde{u}_{\overline{d}}\left(t,P(\mathbf{x})\right)}},\text{ for }\mathbf{x}\in\mathbb{N}^{d},t\in[0,T].

(4.3)

The numerical experiments demonstrate that this approach results in a comparable variance reduction to the approach presented in (4.2).

Remark 4.2 (Adaptive MP for $\overline{d}>1$ ).

In (4.2), when utilizing $\tilde{u}_{\overline{d}}$ as the value function for the $d$ -dimensional control, we introduce a bias to the optimal IS controls by approximating $\tilde{u}(t,\mathbf{x})$ by $\tilde{u}_{\overline{d}}(t,P(\mathbf{x}))$ for $\mathbf{x}\in\mathbb{N}^{d}$ and $t\in[0,T]$ . For the case $\overline{d}=d$ , we have $\tilde{u}_{\overline{d}}(t,P(\mathbf{x}))=\tilde{u}(t,\mathbf{x})$ and the MP produces the optimal IS control for the full-dimensional SRNs. For $\overline{d}<d$ , this equality does not hold because the interactions (correlation effects) between non-projected species are not taken into account in the MP-SRN: the MP only ensures that the marginal distributions of $P(\mathbf{X}(t))\mid_{\{\mathbf{X}(0)=\mathbf{x}_{0}\}}$ and $\overline{\boldsymbol{S}}(t)\mid_{\{\mathbf{X}(0)=\mathbf{x}_{0}\}}$ are identical. This can be seen in examples in which reactions occur with $P(\boldsymbol{\nu}_{j})=0$ . Those reactions are not present in the MP and, thus, are not included in the IS scheme. For the extreme case $\overline{d}=1$ , we expect to achieve the least variance reduction, which could already be substantial and satisfactory for many examples, as we show in our numerical experiments. However, examples could exist where a projection to dimension $\overline{d}=1$ is insufficient to achieve a desired variance reduction. In this case, we can adaptively choose a better projection with increased dimension $\overline{d}=1,2,\dots$ until a sufficient variance reduction is achieved. This implies an increased computational cost in the MP and in solving the HJB equations (4.1) for $\boldsymbol{s}\in\mathbb{N}^{\overline{d}}$ . Investigating the effect of $\overline{d}$ on improving the variance reduction of our approach is left for future work.

To derive an MP-IS-MC estimator for a given uniform time grid $0=t_{0}\leq t_{1}\leq\dots\leq t_{N}=T$ with step size $\Delta t$ , we generate IS paths using the scheme in (2.1) with IS control parameters $\delta_{n,j}^{\Delta t}(\mathbf{x})=\overline{\delta}_{j}(t_{n},\mathbf{x})$ , as in (4.2), for $j=1,\dots,J,\mathbf{x}\in\mathbb{R}^{d},n=0,\dots,N-1$ . Figure 4.1 presents a schematic illustration of the entire derivation of the MP-IS-MC estimator.

The computational work of the MP-IS-MC estimator consists of three cost contributions:

	$\displaystyle W_{MP-IS-MC}$	$\displaystyle(\#\Lambda,\Delta t,M,M_{fw})$		(4.4)
		$\displaystyle\approx W_{MP}(\#\Lambda,\Delta t,M)+W_{HJB}(\#\Lambda)+W_{forward}(\Delta t,M_{fw}),$

where $W_{MP}(\#\Lambda,\Delta t,M)$ denotes the off-line cost to derive the MP (see (3.10)), $W_{HJB}(\#\Lambda)$ represents the cost to solve the HJB equations (4.1) for the $\overline{d}$ -dimensional MP-SRN, and $W_{forward}(\Delta t,M_{fw})$ indicates the cost of simulating $M_{fw}$ IS paths. The cost $W_{HJB}(\#\Lambda)$ to solve the HJB equations (4.1) depends on the solver used, and the cost for the forward run has the following dominant terms:

\displaystyle W_{forward}(\Delta t,M_{fw})\approx M_{fw}\cdot\frac{T}{\Delta t}\cdot(J\cdot C_{Poi}+C_{lik}+\#\mathcal{J}_{MP}\cdot C_{\delta}),

(4.5)

where $C_{\delta}$ is the cost to evaluate (4.2).
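The per-step likelihood update of the forward run, which we read as the operation behind the term $C_{lik}$ in (4.5), can be sketched as follows. The sketch assumes that the IS-TL scheme replaces the Poisson rates $a_{j}\Delta t$ by $\delta_{j}\Delta t$ , so that each step contributes the ratio of the two Poisson probability mass functions; this is a standard identity, but its exact form in the scheme (2.1) lies outside this section and is therefore an assumption. The function name `step_likelihood` is hypothetical.

```python
import math


def step_likelihood(a, delta, jumps, dt):
    """Likelihood ratio of one TL step under a Poisson change of measure.

    For each reaction, Poi(a_j*dt) / Poi(delta_j*dt) evaluated at the sampled
    jump count p_j equals exp(-(a_j - delta_j)*dt) * (a_j / delta_j)**p_j.
    """
    factor = 1.0
    for a_j, d_j, p_j in zip(a, delta, jumps):
        if d_j > 0.0:
            factor *= math.exp(-(a_j - d_j) * dt) * (a_j / d_j) ** p_j
        # d_j == 0 is only admissible when a_j == 0 (no firing either way).
    return factor


dt = 2 ** -8
a = (10.0, 0.05, 0.1)      # original propensities at the current state
delta = (20.0, 0.05, 0.1)  # controls: only the first reaction is amplified
jumps = (1, 0, 0)          # jump counts sampled under the IS measure

weight = step_likelihood(a, delta, jumps, dt)
unchanged = step_likelihood(a, a, jumps, dt)
print(weight, unchanged)
```

The path weight is the product of these factors over all time steps, and the IS-MC estimator averages $g$ evaluated at the final state multiplied by this weight. With $\delta_{j}=a_{j}$ for all reactions, every factor equals one and the standard TL estimator is recovered.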

Figure 4.1: Schematic diagram MP-IS-MC. The costs of the operations in the first line (blue boxes) are off-line.


Remark 4.3 (Further applications of MP in the context of SRNs).

In this work, we use the described MP for dimension reduction to derive a sub-optimal change of measure for IS, but the same MP framework can be used for other applications, such as solving the chemical master equation [40] or the Kolmogorov backward equations [41]. We intend to explore these directions in future work.

5 Numerical Experiments and Results

Through Examples 5.1 and 5.2, we demonstrate the advantages of the proposed MP-IS approach compared with the standard MC approach for rare event estimations. We numerically demonstrate that the proposed approach achieves a substantial variance reduction compared with standard MC estimators when applied to SRNs with various dimensions.

Example 5.1 (Michaelis–Menten enzyme kinetics [44]).

The Michaelis–Menten enzyme kinetics describe enzyme-catalyzed reactions in which an enzyme $E$ interacts with a substrate $S$ to form a complex $C$ , resulting in a product $P$ :

\displaystyle E+S\overset{\theta_{1}}{\rightarrow}C,\quad C\overset{\theta_{2}}{\rightarrow}E+S,\quad C\overset{\theta_{3}}{\rightarrow}E+P,

where $\theta=(0.001,0.005,0.01)^{\top}$ . We consider the initial state $\mathbf{X}_{0}=(E(0),S(0),C(0),P(0))^{\top}=(100,100,0,0)^{\top}$ and the final time $T=1$ . The corresponding propensity and the stoichiometric matrix are given by

\displaystyle a(\mathbf{x})=\left(\begin{array}{c}\theta_{1}ES\\ \theta_{2}C\\ \theta_{3}C\end{array}\right),\quad\boldsymbol{\nu}=\left(\begin{array}{ccc}-1&1&1\\ -1&1&0\\ 1&-1&-1\\ 0&0&1\end{array}\right).

The observable of interest is $g(\mathbf{x})=\mathbbm{1}_{\{x_{3}>22\}}$ .
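For concreteness, Example 5.1 can be simulated with an explicit TL scheme in a few lines of Python. The sketch below is an illustration under assumptions: it uses the common form $\widehat{\mathbf{X}}_{n+1}=\max(0,\widehat{\mathbf{X}}_{n}+\sum_{j}\mathcal{P}_{j}\boldsymbol{\nu}_{j})$ with $\mathcal{P}_{j}\sim\text{Poisson}(a_{j}(\widehat{\mathbf{X}}_{n})\Delta t)$ , which we take to correspond to the scheme (1.2), and a simple Poisson sampler suited to the small rates of a single step. The function names and the sample size of 200 paths are hypothetical choices.

```python
import math
import random

THETA = (0.001, 0.005, 0.01)
# One stoichiometric vector per reaction; species order (E, S, C, P).
NU = ((-1, -1, 1, 0), (1, 1, -1, 0), (1, 0, -1, 1))
X0 = (100, 100, 0, 0)


def propensity(x):
    """Mass-action propensities a(x) of Example 5.1."""
    e, s, c, _ = x
    return (THETA[0] * e * s, THETA[1] * c, THETA[2] * c)


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates only."""
    if rate <= 0.0:
        return 0
    threshold = math.exp(-rate)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def tau_leap_path(x0, horizon, n_steps, rng):
    """Explicit TL path; returns the state at the final time."""
    dt = horizon / n_steps
    x = list(x0)
    for _ in range(n_steps):
        jumps = [sample_poisson(a_j * dt, rng) for a_j in propensity(x)]
        x = [max(0, x_k + sum(p * nu_j[k] for p, nu_j in zip(jumps, NU)))
             for k, x_k in enumerate(x)]
    return x


rng = random.Random(0)
final_states = [tau_leap_path(X0, horizon=1.0, n_steps=2 ** 8, rng=rng)
                for _ in range(200)]
hits = sum(1 for x in final_states if x[2] > 22)
print("fraction of paths with C(T) > 22:", hits / len(final_states))
```

With only a few hundred standard TL paths, the event $\{x_{3}>22\}$ is typically not observed at all, which is the practical motivation for the IS approach.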

Example 5.2 (Goutsias’s model of regulated transcription [28, 33]).

The model describes transcription regulation through the following six species: protein monomer ( $M$ ), transcription factor ( $D$ ), mRNA ( $RNA$ ), unbound DNA ( $DNA$ ), DNA bound at one site ( $DNA\cdot D$ ), and DNA bound at two sites ( $DNA\cdot 2D$ ). These species interact through the following 10 reaction channels:

\displaystyle RNA\overset{\theta_{1}}{\rightarrow}RNA+M,\quad M\overset{\theta_{2}}{\rightarrow}\varnothing,\quad DNA\cdot D\overset{\theta_{3}}{\rightarrow}RNA+DNA\cdot D,\quad RNA\overset{\theta_{4}}{\rightarrow}\varnothing,

\displaystyle DNA+D\overset{\theta_{5}}{\rightarrow}DNA\cdot D,\quad DNA\cdot D\overset{\theta_{6}}{\rightarrow}DNA+D,\quad DNA\cdot D+D\overset{\theta_{7}}{\rightarrow}DNA\cdot 2D,

\displaystyle DNA\cdot 2D\overset{\theta_{8}}{\rightarrow}DNA\cdot D+D,\quad 2M\overset{\theta_{9}}{\rightarrow}D,\quad D\overset{\theta_{10}}{\rightarrow}2M,

where $(\theta_{1},\dots,\theta_{10})=(0.043,0.0007,0.0715,0.0039,0.0199,0.479,0.000199,8.77\times 10^{-12},0.083,0.5)$ . As the initial state, we use $\mathbf{X}_{0}=(M(0),D(0),RNA(0),DNA(0),DNA\cdot D(0),DNA\cdot 2D(0))=(2,6,0,0,2,0)$ , and the final time is $T=1$ . We aim to estimate the rare event probability $\mathbb{P}(D(T)>8)$ .

5.1 Markovian Projection Results

Through simulations for Examples 5.1 and 5.2, we numerically demonstrate that the distribution of the MP process $\overline{\boldsymbol{S}}(T)\mid_{\{\mathbf{X}_{0}=\mathbf{x}_{0}\}}$ matches the conditional distribution of the projected process $\boldsymbol{S}(t)\mid_{\{\mathbf{X}_{0}=\mathbf{x}_{0}\}}=P(\mathbf{X}(t))\mid_{\{\mathbf{X}_{0}=\mathbf{x}_{0}\}}$ , as shown in Theorem 3.1. For both examples, we use an MP with $\overline{d}=1$ based on the projection given in (3.1), where the projected species is indexed as $i=3$ in Example 5.1 and as $i=2$ in Example 5.2.

The MP is based on $M=10^{4}$ TL sample paths with a step size of $\Delta t=2^{-8}$ and uses the orthonormal basis of polynomials described in Remark 3.3 with $\Lambda=\{0,1,2\}\times\{0,1,2\}$ for the $L^{2}$ regression. For the numerical comparison of the full-dimensional SRN with the MP-SRN, we use explicit TL paths with step size $\Delta t=2^{-8}$ (see Remark 3.5). In Figure 5.1, we show sample paths of the original SRN and the mimicking MP-SRN. Figure 5.2 shows the relative occurrences of states at final time $T$ with $M_{fw}=10^{4}$ sample paths, comparing the TL estimate of $P(\mathbf{X}(T))\mid_{\{\mathbf{X}_{0}=\mathbf{x}_{0}\}}$ and the TL-MP estimate of $\overline{\boldsymbol{S}}(T)\mid_{\{\mathbf{X}_{0}=\mathbf{x}_{0}\}}$ . In both examples, the one-dimensional MP process mimics the distribution of the state of interest $X_{i}(T)$ of the original SRNs. Further quantification and analyses of the MP error are left for future work. In this work, a detailed analysis of the MP error is less relevant because the MP is used as a tool to derive IS controls for the full-dimensional process, and the IS is bias-free with respect to the TL scheme.

5.2 Markovian Projection-Importance Sampling Results

For the numerical experiments, we use a four-dimensional and a six-dimensional SRN with the observable $g(\mathbf{x})=\mathbbm{1}_{\{x_{i}>\gamma\}}$ , where $i$ and $\gamma$ are specified in Examples 5.1 and 5.2. Figure 5.2 indicates that this observable leads to a rare event probability estimation for which a standard MC estimate is insufficient. We use the workflow in Figure 4.1 with separate simulations for various $\Delta t$ values for the MP-IS simulations. In each case, the MP is based on $M=10^{4}$ TL sample paths. The MP-IS-MC estimator, the sample variance, and the kurtosis estimate are based on $M_{fw}=10^{6}$ IS sample paths.

The relative error is more relevant than the absolute error for rare event probabilities. Therefore, we display the squared coefficient of variation [11, 13] in the simulation results, which, for a random variable $X$ , is given by

\displaystyle Var_{rel}[X]=\frac{Var[X]}{\mathbb{E}[X]^{2}}.

(5.1)

The kurtosis is a good indicator of the robustness of the variance estimator (see [10, 11] for the connection between the sample variance and kurtosis).
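The two diagnostics can be computed from a sample as in the following sketch. It makes several assumptions: the sample variance uses the unbiased normalization, the kurtosis is the non-excess fourth standardized moment, and the sample-size rule $M\approx(C_{\alpha}/TOL)^{2}\,Var_{rel}[X]$ is the usual central-limit-theorem heuristic, stated here as an assumption since the precise criterion in (1.11) and (1.12) lies outside this section. The function names are hypothetical.

```python
import math


def squared_cov(samples):
    """Squared coefficient of variation (5.1) with the unbiased sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return var / mean ** 2


def kurtosis(samples):
    """Non-excess sample kurtosis m4 / m2^2 based on central moments."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / m2 ** 2


def bernoulli_squared_cov(p):
    """Exact value of (5.1) for an indicator with success probability p."""
    return (1.0 - p) / p


def required_samples(var_rel, tol, c_alpha=1.96):
    """CLT-based heuristic (assumed): M ~ (c_alpha / tol)^2 * Var_rel."""
    return math.ceil((c_alpha / tol) ** 2 * var_rel)


# Standard MC for a probability of magnitude 1e-5 and 5% relative tolerance.
var_rel_mc = bernoulli_squared_cov(1e-5)
print(var_rel_mc, required_samples(var_rel_mc, tol=0.05))
```

For an indicator observable with probability $p$ , the squared coefficient of variation is $(1-p)/p\approx 1/p$ , which explains why standard MC requires on the order of $1/(p\cdot TOL^{2})$ samples in the rare event regime.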

Figure 5.3 shows the simulation results for the four-dimensional Example 5.1 for different step sizes $\Delta t$ . The quantity of interest is a rare event probability with a magnitude of $10^{-5}$ . For a step size of $\Delta t=2^{-10}$ , the proposed MP-IS approach reduces the squared coefficient of variation by a factor of $10^{6}$ compared to the standard MC-TL approach. The third plot in Figure 5.3 indicates that the kurtosis of the proposed MP-IS approach is below the kurtosis for standard TL for all observed step sizes $\Delta t$ , confirming that the proposed approach results in a robust variance estimator. In Figure 5.3 (d), we show the required number of sample paths to reach a prescribed relative tolerance (see (1.12) and (1.11)). We compare the required number of sample paths for standard TL with the sample paths in the forward run of the IS-TL estimator, $M_{fw}$ , and the total number of paths $M+M_{fw}$ including $M=10^{4}$ TL paths to derive the MP (see Section 3.3). We observe that for small tolerances, the additional paths to derive the MP become negligible compared to the cost of the forward IS-TL run. Further, we reduce the number of paths significantly compared to the standard TL approach. For small tolerances, the reduction of sample paths is up to a factor of approximately $10^{6}$ compared to the standard TL approach.

The second application of the proposed IS approach is the six-dimensional Example 5.2. Figure 5.4 shows that this rare event probability has a magnitude of $10^{-3}$ . We observe that, for $\Delta t\leq 2^{-3}$ , the squared coefficient of variation of the proposed MP-IS approach is reduced compared to the standard TL-MC approach. For a step size of $\Delta t=2^{-10}$ , this corresponds to a variance reduction by a factor of approximately $500$ . Note that this example achieves less variance reduction than Example 5.1 because the quantity of interest is less rare. For most step sizes $\Delta t$ , the kurtosis of the proposed IS approach is moderately increased compared to the standard TL estimator, with decreasing kurtosis for smaller $\Delta t$ . This outcome indicates a potentially non-robust variance estimator for coarse time steps of $\Delta t>2^{-7}$ . For finer time steps, we expect a robust variance estimator. For small tolerances, the total number of paths is reduced by a factor of approximately $500$ .

6 Conclusion

In conclusion, this work presented an efficient IS scheme for estimating rare event probabilities for SRNs. We utilized a class of parameterized IS measure changes originally introduced in [11], for which near-optimal IS controls can be derived through a SOC formulation. We showed that the value function associated with this formulation can be expressed as a solution of a set of coupled ODEs, the HJB equations. One challenge encountered in solving the HJB equations is the curse of dimensionality, arising from the high-dimensional SRN. To address this issue, we introduced a dimension reduction approach for the setting of SRNs, namely MP. Then, we used a discrete $L^{2}$ regression to approximate the propensities of the MP-SRN. We demonstrated how the MP-SRN can be used for solving a significantly lower-dimensional HJB system, and how the resulting parameters are then mapped back to the full-dimensional SRNs to derive near-optimal IS controls. Our numerical simulations showed substantial variance reduction for the MP-IS-MC estimator compared to the standard MC-TL estimator for rare event probability estimations.

Acknowledgments This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-2019-CRG8-4033. This work was performed as part of the Helmholtz School for Data Science in Life, Earth and Energy (HDS-LEE) and received funding from the Helmholtz Association of German Research Centres and the Alexander von Humboldt Foundation. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

References Cited

[1] Assyr Abdulle, Yucheng Hu, and Tiejun Li. Chebyshev methods with discrete noise: the $\tau$ -ROCK methods. Journal of Computational Mathematics, pages 195–217, 2010.
[2] Tae-Hyuk Ahn, Adrian Sandu, and Xiaoying Han. Implicit simulation methods for stochastic chemical kinetics. arXiv preprint arXiv:1303.3614, 2013.
[3] David F Anderson. A modified next reaction method for simulating chemical systems with time dependent propensities and delays. The Journal of chemical physics, 127(21):214107, 2007.
[4] David F Anderson and Desmond J Higham. Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics. Multiscale Modeling & Simulation, 10(1):146–179, 2012.
[5] David F Anderson and Thomas G Kurtz. Stochastic analysis of biochemical systems, volume 1. Springer, 2015.
[6] Juan P Aparicio and Hernán G Solari. Population dynamics: Poisson approximation and its relation to the Langevin process. Physical Review Letters, 86(18):4183, 2001.
[7] Christian Bayer, Juho Häppölä, and Raúl Tempone. Implied stopping rules for American basket options from Markovian projection. Quantitative Finance, 19(3):371–390, 2019.
[8] Christian Bayer, Alvaro Moraes, Raúl Tempone, and Pedro Vilanova. An efficient forward–reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks. Stochastic Analysis and Applications, 34(2):193–231, 2016.
[9] Chiheb Ben Hammouda. Hierarchical approximation methods for option pricing and stochastic reaction networks. PhD thesis, 2020.
[10] Chiheb Ben Hammouda, Nadhir Ben Rached, and Raúl Tempone. Importance sampling for a robust and efficient multilevel Monte Carlo estimator for stochastic reaction networks. Statistics and Computing, 30:1665–1689, 2020.
[11] Chiheb Ben Hammouda, Nadhir Ben Rached, Raúl Tempone, and Sophia Wiechert. Learning-based importance sampling via stochastic optimal control for stochastic reaction networks. Statistics and Computing, 33(3):58, 2023.
[12] Chiheb Ben Hammouda, Alvaro Moraes, and Raúl Tempone. Multilevel hybrid split-step implicit tau-leap. Numerical Algorithms, 74(2):527–560, 2017.
[13] Nadhir Ben Rached, Abdul-Lateef Haji-Ali, Gerardo Rubino, and Raúl Tempone. Efficient importance sampling for large sums of independent and identically distributed random variables. Statistics and Computing, 31(6):1–13, 2021.
[14] Amel Bentata and Rama Cont. Mimicking the marginal distributions of a semimartingale. arXiv preprint arXiv:0910.3992, 2009.
[15] Fred Brauer and Carlos Castillo-Chavez. Mathematical models in population biology and epidemiology, volume 40. Springer.
[16] Yang Cao and Linda Petzold. Trapezoidal tau-leaping formula for the stochastic simulation of biochemical systems. Proceedings of Foundations of Systems Biology in Engineering (FOSBE 2005), pages 149–152, 2005.
[17] Youfang Cao and Jie Liang. Adaptively biased sequential importance sampling for rare events in reaction networks with comparison to exact solutions from finite buffer dCME method. The Journal of chemical physics, 139(2):07B605_1, 2013.
[18] Albert Cohen, Mark A Davenport, and Dany Leviatan. On the stability and accuracy of least squares approximations. Foundations of computational mathematics, 13:819–834, 2013.
[19] Bernie J Daigle Jr, Min K Roh, Dan T Gillespie, and Linda R Petzold. Automated estimation of rare event probabilities in biochemical systems. The Journal of chemical physics, 134(4):01B628, 2011.
[20] Boualem Djehiche and Björn Löfdahl. Risk aggregation and stochastic claims reserving in disability insurance. Insurance: Mathematics and Economics, 59:100–108, 2014.
[21] Stewart N Ethier and Thomas G Kurtz. Markov processes: characterization and convergence. Wiley series in probability and mathematical statistics. J. Wiley & Sons, New York, Chichester, 1986.
[22] Michael B Giles. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607–617, 2008.
[23] Michael B Giles. Multilevel Monte Carlo methods. Acta Numerica, 24:259–328, 2015.
[24] Colin S Gillespie and Andrew Golightly. Guided proposals for efficient weighted stochastic simulation. The Journal of chemical physics, 150(22):224103, 2019.
[25] Dan T Gillespie, Min Roh, and Linda R Petzold. Refining the weighted stochastic simulation algorithm. The Journal of chemical physics, 130(17):174103, 2009.
[26] Daniel T Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of computational physics, 22(4):403–434, 1976.
[27] Daniel T Gillespie. Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics, 115(4):1716–1733, 2001.
[28] John Goutsias. Quasiequilibrium approximation of fast reaction kinetics in stochastic biochemical systems. The Journal of chemical physics, 122(18):184102, 2005.
[29] István Gyöngy. Mimicking the one-dimensional marginal distributions of processes having an Itô differential. Probability theory and related fields, 71(4):501–516, 1986.
[30] Floyd B. Hanson. Applied Stochastic Processes and Control for Jump-Diffusions: Modeling, Analysis and Computation. SIAM, Philadelphia, PA, 2007.
[31] Carsten Hartmann, Christof Schütte, and Wei Zhang. Model reduction algorithms for optimal control and importance sampling of diffusions. Nonlinearity, 29(8):2298, 2016.
[32] Sebastian C Hensel, James B Rawlings, and John Yin. Stochastic kinetic modeling of vesicular stomatitis virus intracellular growth. Bulletin of mathematical biology, 71(7):1671–1692, 2009.
[33] Hye-Won Kang and Thomas G Kurtz. Separation of time-scales and model reduction for stochastic reaction networks. 2013.
[34] Nikolai Vladimirovich Krylov. Nonlinear elliptic and parabolic equations of the second order. Springer, 1987.
[35] Thomas Kurtz and Richard Stockbridge. Stationary solutions and forward equations for controlled and singular martingale problems. 2001.
[36] Hiroyuki Kuwahara and Ivan Mura. An efficient and exact stochastic simulation method to analyze rare events in biochemical systems. The Journal of chemical physics, 129(16):10B619, 2008.
[37] Frédéric Legoll and Tony Lelievre. Effective dynamics using conditional expectations. Nonlinearity, 23(9):2131, 2010.
[38] Christopher Lester, Christian Adam Yates, Michael B Giles, and Ruth E Baker. An adaptive multi-level simulation algorithm for stochastic biological systems. The Journal of chemical physics, 142(2):01B612_1, 2015.
[39] Tiejun Li. Analysis of explicit tau-leaping schemes for simulating chemically reacting systems. Multiscale Modeling & Simulation, 6(2):417–436, 2007.
[40] Linar Mikeev and Werner Sandmann. Approximate numerical integration of the chemical master equation for stochastic reaction networks. arXiv preprint arXiv:1907.10245, 2019.
[41] Alvaro Moraes. Simulation and statistical inference of stochastic reaction networks with applications to epidemic models. PhD thesis, 2015.
[42] Alvaro Moraes, Raúl Tempone, and Pedro Vilanova. A multilevel adaptive reaction-splitting simulation method for stochastic reaction networks. SIAM Journal on Scientific Computing, 38(4):A2091–A2117, 2016.
[43] Vladimir Piterbarg. Markovian projection method for volatility calibration. Available at SSRN 906473, 2006.
[44] Christopher V Rao and Adam P Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. The Journal of chemical physics, 118(11):4999–5010, 2003.
[45] Muruhan Rathinam and Hana El Samad. Reversible-equivalent-monomolecular tau: A leaping method for “small number and stiff” stochastic chemical systems. Journal of Computational Physics, 224(2):897–923, 2007.
[46] Min K Roh. Data-driven method for efficient characterization of rare event probabilities in biochemical systems. Bulletin of mathematical biology, 81(8):3097–3120, 2019.
[47] Min K Roh, Dan T Gillespie, and Linda R Petzold. State-dependent biasing method for importance sampling in the weighted stochastic simulation algorithm. The Journal of chemical physics, 133(17):174106, 2010.
[48] Ranjan Srivastava, Lingchong You, Jesse Summers, and John Yin. Stochastic vs. deterministic modeling of intracellular viral kinetics. Journal of theoretical biology, 218(3):309–321, 2002.
[49] James N Weiss. The Hill equation revisited: uses and misuses. The FASEB Journal, 11(11):835–841, 1997.

Appendix A Proof for Corollary 2.3

Proof.

For $\mathbf{x}\in\mathbb{N}^{d}$ , we define $\tilde{u}(\cdot,\mathbf{x};\Delta t)$ as the continuous smooth extension of $u_{\Delta t}(\cdot,\mathbf{x})$ (defined in (2.2)) on $[0,T]$ . Consequently, we denote the continuous-time IS controls by $\boldsymbol{\delta}(\cdot,\mathbf{x}):[0,T]\rightarrow\mathcal{A}_{\mathbf{x}}$ for $\mathbf{x}\in\mathbb{N}^{d}$ . Then, the Taylor expansion of $\tilde{u}(t+\Delta t,\mathbf{x};\Delta t)$ in $t$ results in the following:

$$\tilde{u}(t+\Delta t,\mathbf{x};\Delta t)=\tilde{u}(t,\mathbf{x};\Delta t)+\Delta t\,\partial_{t}\tilde{u}(t,\mathbf{x};\Delta t)+\mathcal{O}\left(\Delta t^{2}\right),\quad\mathbf{x}\in\mathbb{N}^{d}. \tag{A.1}$$

By the definition of the value function (Definition 2.1), the final condition is given by

$$\tilde{u}(T,\mathbf{x};\Delta t)=g^{2}(\mathbf{x}),\quad\mathbf{x}\in\mathbb{N}^{d}.$$

For $t=T-\Delta t,\ldots,0$, we start from (2.2) in Theorem 2.2, expand the exponential term in a Taylor series around $\Delta t=0$, and apply (A.1) to $\tilde{u}(t+\Delta t,\mathbf{x};\Delta t)$:

$$\begin{aligned}
\tilde{u}(t,\mathbf{x};\Delta t)
&=\inf_{\boldsymbol{\delta}(t,\mathbf{x})\in\mathcal{A}_{\mathbf{x}}}\exp\left(\left(-2\sum_{j=1}^{J}a_{j}(\mathbf{x})+\sum_{j=1}^{J}\delta_{j}(t,\mathbf{x})\right)\Delta t\right)\\
&\qquad\times\sum_{\mathbf{p}\in\mathbb{N}^{J}}\left(\prod_{j=1}^{J}\frac{(\Delta t\cdot\delta_{j}(t,\mathbf{x}))^{p_{j}}}{p_{j}!}\left(\frac{a_{j}(\mathbf{x})}{\delta_{j}(t,\mathbf{x})}\right)^{2p_{j}}\right)\cdot\tilde{u}(t+\Delta t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}\mathbf{p});\Delta t)\\
&=\inf_{\boldsymbol{\delta}(t,\mathbf{x})\in\mathcal{A}_{\mathbf{x}}}\left(1+\left(-2\sum_{j=1}^{J}a_{j}(\mathbf{x})+\sum_{j=1}^{J}\delta_{j}(t,\mathbf{x})\right)\Delta t+\mathcal{O}(\Delta t^{2})\right)\\
&\qquad\times\left[\sum_{\mathbf{p}\in\mathbb{N}^{J}}\left(\prod_{j=1}^{J}\frac{(\Delta t\cdot\delta_{j}(t,\mathbf{x}))^{p_{j}}}{p_{j}!}\left(\frac{a_{j}(\mathbf{x})}{\delta_{j}(t,\mathbf{x})}\right)^{2p_{j}}\right)\right.\\
&\qquad\left.\vphantom{\prod_{j=1}^{J}}\cdot\left(\tilde{u}\left(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}\mathbf{p});\Delta t\right)+\Delta t\,\partial_{t}\tilde{u}\left(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}\mathbf{p});\Delta t\right)+\mathcal{O}(\Delta t^{2})\right)\right]\\
\overset{(*)}{\Longrightarrow}\quad-\partial_{t}\tilde{u}(t,\mathbf{x};\Delta t)
&=\inf_{\boldsymbol{\delta}(t,\mathbf{x})\in\mathcal{A}_{\mathbf{x}}}\left(-2\sum_{j=1}^{J}a_{j}(\mathbf{x})+\sum_{j=1}^{J}\delta_{j}(t,\mathbf{x})\right)\tilde{u}(t,\mathbf{x};\Delta t)+\mathcal{O}(\Delta t)\\
&\qquad+(1+\mathcal{O}(\Delta t))\left[\sum_{\mathbf{p}\neq\mathbf{0}}\Delta t^{|\mathbf{p}|-1}\left(\prod_{j=1}^{J}\frac{a_{j}(\mathbf{x})^{2p_{j}}}{p_{j}!\cdot\left(\delta_{j}(t,\mathbf{x})\right)^{p_{j}}}\right)\right.\\
&\qquad\left.\vphantom{\prod_{j=1}^{J}}\cdot\left(\tilde{u}\left(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}\mathbf{p});\Delta t\right)+\mathcal{O}(\Delta t)\right)\right],
\end{aligned}\tag{A.2}$$

where $|\mathbf{p}|:=\sum_{j=1}^{J}p_{j}$, and $\delta_{j}(t,\mathbf{x}):=\left(\boldsymbol{\delta}(t,\mathbf{x})\right)_{j}$. In (*), we split the sum into the term $\mathbf{p}=\mathbf{0}$ and the terms $\mathbf{p}\neq\mathbf{0}$, rearrange the terms, divide by $\Delta t$, and collect the terms of $\mathcal{O}(\Delta t)$. The limit of $\tilde{u}(t,\mathbf{x};\Delta t)$ in (A.2) for $\Delta t\rightarrow 0$ is denoted by $\tilde{u}(t,\mathbf{x})$, leading to (2.3) for $0<t<T$ and $\mathbf{x}\in\mathbb{N}^{d}$. ∎
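As a reading aid, the limit step can be written out explicitly. The display below is only a formal sketch: it assumes that the infimum and the limit $\Delta t\rightarrow 0$ can be interchanged and that the series over $\mathbf{p}$ can be passed to the limit term by term (neither is verified here). Under these assumptions, only the multi-indices with $|\mathbf{p}|=1$, that is, $\mathbf{p}=\mathbf{e}_{j}$ (the $j$-th unit vector, a notation introduced here for illustration), carry the factor $\Delta t^{0}$, whereas all terms with $|\mathbf{p}|\geq 2$ vanish. Writing $\boldsymbol{\nu}\mathbf{e}_{j}=\boldsymbol{\nu}_{j}$, the formal limit of (A.2) reads as follows and is expected to coincide with (2.3):

```latex
-\partial_{t}\tilde{u}(t,\mathbf{x})
  = \inf_{\boldsymbol{\delta}(t,\mathbf{x})\in\mathcal{A}_{\mathbf{x}}}
    \left(
      \left(-2\sum_{j=1}^{J}a_{j}(\mathbf{x})+\sum_{j=1}^{J}\delta_{j}(t,\mathbf{x})\right)
      \tilde{u}(t,\mathbf{x})
      +\sum_{j=1}^{J}\frac{a_{j}(\mathbf{x})^{2}}{\delta_{j}(t,\mathbf{x})}\,
      \tilde{u}\left(t,\max(\mathbf{0},\mathbf{x}+\boldsymbol{\nu}_{j})\right)
    \right),
\qquad
\tilde{u}(T,\mathbf{x})=g^{2}(\mathbf{x}).
```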

Appendix B Proof for Theorem 3.1

Proof.

We let $f:\mathbb{R}^{\overline{d}}\rightarrow\mathbb{R}$ be an arbitrary bounded continuous function and $\overline{\boldsymbol{S}}$ be defined in (3.3). We consider the following weak approximation error:

$$\varepsilon_{T}:=\mathbb{E}\left[f(P(\mathbf{X}(T)))\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]-\mathbb{E}\left[f(\overline{\boldsymbol{S}}(T))\,\middle|\,\overline{\boldsymbol{S}}(0)=P(\mathbf{x}_{0})\right]. \tag{B.1}$$

For $t\in[0,T]$, we define the cost-to-go function as

$$\overline{v}(t,\boldsymbol{s}):=\mathbb{E}\left[f(\overline{\boldsymbol{S}}(T))\,\middle|\,\overline{\boldsymbol{S}}(t)=\boldsymbol{s}\right].$$

Then, we can represent the weak error in (B.1) as follows:

$$\varepsilon_{T}=\mathbb{E}\left[\overline{v}(T,P(\mathbf{X}(T)))\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]-\overline{v}(0,P(\mathbf{x}_{0})). \tag{B.2}$$

Using Dynkin’s formula [30] and (1.3), we can express the first term in (B.2) as follows:

$$\begin{aligned}
\mathbb{E}\left[\overline{v}(T,P(\mathbf{X}(T)))\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]
&=\overline{v}(0,P(\mathbf{x}_{0}))+\int_{0}^{T}\mathbb{E}\left[\vphantom{\sum_{j=1}^{J}}\partial_{t}\overline{v}(\tau,P(\mathbf{X}(\tau)))\right.\\
&\quad\left.+\sum_{j=1}^{J}a_{j}(\mathbf{X}(\tau))\left(\overline{v}(\tau,P(\mathbf{X}(\tau)+\boldsymbol{\nu}_{j}))-\overline{v}(\tau,P(\mathbf{X}(\tau)))\right)\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]d\tau.
\end{aligned}$$

The Kolmogorov backward equations [30] of $\overline{\boldsymbol{S}}$ are given as

$$\partial_{\tau}\overline{v}(\tau,\boldsymbol{s})=-\sum_{j=1}^{J}\overline{a}_{j}(\tau,\boldsymbol{s})\left(\overline{v}(\tau,\boldsymbol{s}+\overline{\boldsymbol{\nu}}_{j})-\overline{v}(\tau,\boldsymbol{s})\right),\quad\boldsymbol{s}\in\mathbb{N}^{\overline{d}},$$

implying that the weak error simplifies to

$$\begin{aligned}
\varepsilon_{T}
&=\sum_{j=1}^{J}\int_{0}^{T}\Big(\mathbb{E}\left[a_{j}(\mathbf{X}(\tau))\overline{v}(\tau,P(\mathbf{X}(\tau)+\boldsymbol{\nu}_{j}))-\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))\overline{v}(\tau,P(\mathbf{X}(\tau))+\overline{\boldsymbol{\nu}}_{j})\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]\\
&\qquad-\mathbb{E}\left[\left(a_{j}(\mathbf{X}(\tau))-\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))\right)\overline{v}(\tau,P(\mathbf{X}(\tau)))\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]\Big)\,d\tau.
\end{aligned}\tag{B.3}$$

Next, we choose $\overline{a}_{j}$ and $\overline{\boldsymbol{\nu}}_{j}$ for $j=1,\dots,J$ such that $\varepsilon_{T}=0$ for any function $f$. We consider the second term in (B.3) and use the tower property to obtain

$$\begin{aligned}
&\mathbb{E}\left[\left(a_{j}(\mathbf{X}(\tau))-\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))\right)\overline{v}(\tau,P(\mathbf{X}(\tau)))\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]\\
&=\mathbb{E}\left[\mathbb{E}\left[\left(a_{j}(\mathbf{X}(\tau))-\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))\right)\overline{v}(\tau,P(\mathbf{X}(\tau)))\,\middle|\,P(\mathbf{X}(\tau)),\mathbf{X}(0)=\mathbf{x}_{0}\right]\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]\\
&=\mathbb{E}\left[\left(\mathbb{E}\left[a_{j}(\mathbf{X}(\tau))\,\middle|\,P(\mathbf{X}(\tau)),\mathbf{X}(0)=\mathbf{x}_{0}\right]-\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))\right)\overline{v}(\tau,P(\mathbf{X}(\tau)))\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right].
\end{aligned}\tag{B.4}$$

To ensure that (B.4) vanishes for any function $f$, we choose the projected propensities as follows:

$$\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))=\mathbb{E}\left[a_{j}(\mathbf{X}(\tau))\,\middle|\,P(\mathbf{X}(\tau)),\mathbf{X}(0)=\mathbf{x}_{0}\right],\quad j=1,\dots,J. \tag{B.5}$$

Applying (B.5) and the tower property to the first term in (B.3), we derive

$$\begin{aligned}
&\mathbb{E}\left[a_{j}(\mathbf{X}(\tau))\overline{v}(\tau,P(\mathbf{X}(\tau)+\boldsymbol{\nu}_{j}))-\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))\overline{v}(\tau,P(\mathbf{X}(\tau))+\overline{\boldsymbol{\nu}}_{j})\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]\\
&=\mathbb{E}\left[\mathbb{E}\left[a_{j}(\mathbf{X}(\tau))\overline{v}(\tau,P(\mathbf{X}(\tau)+\boldsymbol{\nu}_{j}))\right.\right.\\
&\qquad\left.\left.-\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))\overline{v}(\tau,P(\mathbf{X}(\tau))+\overline{\boldsymbol{\nu}}_{j})\,\middle|\,P(\mathbf{X}(\tau)),\mathbf{X}(0)=\mathbf{x}_{0}\right]\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]\\
&=\mathbb{E}\left[\mathbb{E}\left[a_{j}(\mathbf{X}(\tau))\,\middle|\,P(\mathbf{X}(\tau)),\mathbf{X}(0)=\mathbf{x}_{0}\right]\overline{v}(\tau,P(\mathbf{X}(\tau))+P(\boldsymbol{\nu}_{j}))\right.\\
&\qquad\left.-\overline{a}_{j}(\tau,P(\mathbf{X}(\tau)))\overline{v}(\tau,P(\mathbf{X}(\tau))+\overline{\boldsymbol{\nu}}_{j})\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right]\\
&=\mathbb{E}\left[\mathbb{E}\left[a_{j}(\mathbf{X}(\tau))\,\middle|\,P(\mathbf{X}(\tau)),\mathbf{X}(0)=\mathbf{x}_{0}\right]\right.\\
&\qquad\cdot\left.\left(\overline{v}(\tau,P(\mathbf{X}(\tau))+P(\boldsymbol{\nu}_{j}))-\overline{v}(\tau,P(\mathbf{X}(\tau))+\overline{\boldsymbol{\nu}}_{j})\right)\,\middle|\,\mathbf{X}(0)=\mathbf{x}_{0}\right].
\end{aligned}\tag{B.6}$$

Moreover, (B.6) vanishes for any function $f$ if we set

$$\overline{\boldsymbol{\nu}}_{j}=P(\boldsymbol{\nu}_{j}),\quad j=1,\dots,J. \tag{B.7}$$

With this choice for $\overline{a}_{j}$ and $\overline{\boldsymbol{\nu}}_{j}$ , we derive $\varepsilon_{T}=0$ . The derivation holds for arbitrary bounded and smooth functions $f$ , for all fixed times $T$ ; thus, the process $\boldsymbol{S}(t)=P(\mathbf{X}(t))$ has the same conditional distribution as $\overline{\boldsymbol{S}}(t)$ conditioned on the initial value $\mathbf{X}(0)=\mathbf{x}_{0}$ .
∎
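The choices (B.5) and (B.7) can be illustrated with a small numerical sketch. The Python code below is not the $L^{2}$ regression of Section 3.2; it is a plain sample-average estimate of the conditional expectation in (B.5) at one fixed time $\tau$, assuming a linear projection $P(\mathbf{x})=\sum_{i}w_{i}x_{i}$ with $\overline{d}=1$ and assuming that samples of $\mathbf{X}(\tau)$ are already available. All function names, the toy propensity, and the sample values are hypothetical.

```python
from collections import defaultdict


def project(x, weights):
    """Linear projection P(x) = sum_i weights[i] * x[i] (assumed form, d_bar = 1)."""
    return sum(w * xi for w, xi in zip(weights, x))


def projected_propensity(samples, propensity, weights):
    """Estimate a_bar(s) = E[a(X) | P(X) = s], cf. (B.5), by averaging
    the propensity over all samples that share the projected value s."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x in samples:
        s = project(x, weights)
        totals[s] += propensity(x)
        counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Hypothetical samples of X(tau) for a two-species system.
samples = [(1, 2), (2, 1), (0, 3), (1, 1)]
weights = (1, 1)                       # P(x) = x_1 + x_2


def toy_propensity(x):
    """Toy mass-action propensity x_1 * x_2 with rate constant 1 (assumption)."""
    return x[0] * x[1]


a_bar = projected_propensity(samples, toy_propensity, weights)

# Projected stoichiometric vector as in (B.7): nu_bar = P(nu).
nu = (-1, -1)
nu_bar = project(nu, weights)
```

In this toy setting, three samples share the projected value $s=3$, so `a_bar[3]` is the average of their propensities. States with few samples would give noisy estimates, which is one motivation for a regression over a basis rather than a state-by-state average.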

Appendix C Markovian Projection Cost Derivation

We present details on the computational cost of MP, as provided in (3.10):

• The number of operations to generate one TL path is given by

$$W_{TL}(\Delta t)=\frac{T}{\Delta t}\cdot(C_{prop}+J\cdot C_{Poi}+d(J+2)), \tag{C.1}$$

where $C_{prop}$ is the cost of one evaluation of the propensity function (1.4). The dominant cost in (C.1) is $C_{Poi}$ (the cost of generating a Poisson random variable).

• The number of operations for the Gram–Schmidt algorithm, as described in Remark 3.3, is given by

$$W_{G-S}(\#\Lambda,\Delta t,M)=\#\Lambda\cdot(C_{inner}+\#\Lambda+1)+\frac{(\#\Lambda-1)\#\Lambda}{2}(2\#\Lambda+C_{inner}), \tag{C.2}$$

where $C_{inner}$ is the cost of the evaluation of the empirical inner product (3.9) given by

$$C_{inner}=\frac{T}{\Delta t}\cdot M(2+2C_{pol})+3=\mathcal{O}\left(\frac{T}{\Delta t}\cdot M\cdot\#\Lambda\right). \tag{C.3}$$

The cost $C_{pol}$ in (C.3) is the computational cost for one evaluation of a polynomial in the space $\langle\phi_{p}\rangle_{p\in\Lambda}$, which is $\mathcal{O}(\#\Lambda)$.

In the simulations, we apply the setting $\#\Lambda\ll\frac{T}{\Delta t}\cdot M$ (see Section 5.2, using the parameters $\#\Lambda=9$, $M=10^{4}$, $\frac{T}{\Delta t}=2^{4}$). Therefore, the dominant cost in (C.2) is $\mathcal{O}(M\cdot\frac{T}{\Delta t}\cdot\left(\#\Lambda\right)^{3})$.

• The cost $W_{L^{2}}(\#\Lambda,\Delta t,M)$ is split into two parts: the cost to (i) derive and (ii) solve the normal equation (3.8). The number of operations to derive the design matrix $D$ is $M\cdot\frac{T}{\Delta t}\cdot\#\Lambda\cdot C_{pol}$, and the cost to derive one right-hand side $(\Psi^{(j)})_{j\in\mathcal{J}_{MP}}$ is $M\cdot\frac{T}{\Delta t}\cdot C_{prop}$. In (3.8), the cost for the matrix product $D^{\top}D$ is $\mathcal{O}(\#\Lambda^{2}\cdot M\cdot\frac{T}{\Delta t})$, and the cost for $\#\mathcal{J}_{MP}$ matrix-vector products is $\mathcal{O}(\#\mathcal{J}_{MP}\cdot\#\Lambda\cdot M\cdot\frac{T}{\Delta t})$. Finally, solving (3.8) costs $\mathcal{O}(\#\mathcal{J}_{MP}\cdot\#\Lambda^{3})$, which is a nondominant term under the given setting, $\#\Lambda\ll\frac{T}{\Delta t}\cdot M$.

