BréhierEtal 2016
1.1. Motivation and mathematical setting. Let us consider the Markov chain
(X_t)_{t∈ℕ} defined as the time discretization of the overdamped Langevin dynamics:
(1) ∀t ∈ ℕ, X_{t+1} − X_t = −∇V(X_t) h + √(2β^{−1}) (W_{(t+1)h} − W_{th}).
Metastable states are for instance regions located around local minima of V. This
actually corresponds to a physical reality: the timescale at the molecular level (given by h, which
is typically chosen at the limit of stability for the stochastic differential equation) is
much smaller than the timescales of interest, which correspond to hopping events
between metastable states. Let us denote by A ⊂ ℝ^{3N} and B ⊂ ℝ^{3N} two (disjoint)
metastable states. One problem of interest is then the following: for some initial
condition outside A and B and close to A, how to efficiently sample paths which
reach B before A. In the context of molecular dynamics, such paths are called re-
active paths. The efficient sampling of reactive paths is a very important subject in
many applications since it is a way to understand the mechanism of the transition
between metastable states. In mathematical terms, one is interested in computing,
for a given test function ϕ : (ℝ^{3N})^ℕ → ℝ depending on the path (X_t)_{t∈ℕ} of the
Markov chain, the expectation
(2) E[ϕ((X_t)_{t∈ℕ}) 1_{τ_B < τ_A}],
where τ_A = inf{t ∈ ℕ : X_t ∈ A}, τ_B = inf{t ∈ ℕ : X_t ∈ B} and X_0 = x_0 ∉ A ∪ B
is assumed to be a deterministic initial position close to A: most trajectories
starting from x0 hit A before B. Generalization to a random initial condition is
straightforward, by a conditioning argument. If ϕ = 1, the above expectation is
P(τB < τA ), namely the probability that the Markov chain reaches B before A.
This is typically a very small probability: since A is metastable and x0 is close
to A, for most of the realizations, τA is smaller than τB . This is why naive Monte
Carlo methods will not give reliable estimates of (2). We refer for instance to [8,
17] for some examples in the context of molecular simulation. The problem we
would like to address in this article is thus the following: how to build “good”
estimators of (2), where (Xt )t∈N is a general Markov chain, and τA and τB are two
stopping times.
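As a concrete illustration of why naive Monte Carlo fails here, the sketch below discretizes dynamics (1) for a one-dimensional example. Everything in it is a hypothetical choice for illustration, not taken from the paper: the double-well potential V(x) = (x² − 1)², the sets A = (−∞, −0.8], B = [0.8, +∞), and all numerical parameters.

```python
import numpy as np

def grad_V(x):
    # Gradient of the (hypothetical) double-well potential V(x) = (x^2 - 1)^2.
    return 4.0 * x * (x * x - 1.0)

def reaches_B_before_A(x0, h, beta, rng, a=-0.8, b=0.8, max_steps=100_000):
    """One trajectory of recursion (1): X_{t+1} = X_t - grad V(X_t) h + sqrt(2h/beta) G_t,
    with G_t standard Gaussian, run until it enters A = (-inf, a] or B = [b, +inf)."""
    x = x0
    sigma = np.sqrt(2.0 * h / beta)  # std of sqrt(2/beta) * (W_{(t+1)h} - W_{th})
    for _ in range(max_steps):
        x = x - grad_V(x) * h + sigma * rng.standard_normal()
        if x <= a:
            return False
        if x >= b:
            return True
    return False  # step budget exceeded; counted as a failure to reach B

def naive_mc(n_traj=1000, x0=-0.7, h=0.01, beta=8.0, seed=0):
    """Naive Monte Carlo estimator of P(tau_B < tau_A)."""
    rng = np.random.default_rng(seed)
    hits = sum(reaches_B_before_A(x0, h, beta, rng) for _ in range(n_traj))
    return hits / n_traj
```

Already at this modest inverse temperature the transition is rare: almost all trajectories fall back into A, so the estimator is dominated by a handful of hits (often zero), which is exactly the failure mode that motivates the splitting methods discussed next.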
1.2. A short review of the literature on rare event simulation. A complete re-
view of rare event simulation techniques is out of the scope of this article; we
instead refer the interested reader to [1, 11, 12, 38], for instance. Our aim in this
section is mainly to explain the interest of splitting techniques in general, and
Adaptive Multilevel Splitting (AMS) in particular, to simulate rare events in some
specific contexts.
Two main families of algorithms for the efficient sampling of rare events have
been studied and applied successfully in many contexts since the pioneering works
on Monte Carlo methods in the 1950s [28, 30, 37].
The first family is known as importance sampling: the probability distribution
of interest is modified using an importance function, in order to enhance the real-
ization of the rare events; unbiased estimators are recovered thanks to the use of
appropriate likelihood ratios.
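The likelihood-ratio mechanism can be sketched on a self-contained toy problem (not from the paper; the Gaussian target, the mean-shifted proposal and all parameters are illustrative assumptions): estimating p = P(X > 4) for X ~ N(0, 1).

```python
import numpy as np
from statistics import NormalDist

def importance_sampling_tail(level=4.0, n=100_000, seed=0):
    """Estimate p = P(X > level) for X ~ N(0,1) by sampling Y ~ N(level, 1)
    and reweighting with the likelihood ratio
    w(y) = phi(y) / phi(y - level) = exp(-level * y + level**2 / 2)."""
    rng = np.random.default_rng(seed)
    y = level + rng.standard_normal(n)            # samples from the shifted proposal
    w = np.exp(-level * y + 0.5 * level * level)  # likelihood ratios
    return float(np.mean(w * (y > level)))        # unbiased estimator of p

exact = 1.0 - NormalDist().cdf(4.0)  # about 3.17e-5, for comparison
```

With the same budget, a crude Monte Carlo estimate would see on average only about three successes; the reweighted estimator instead concentrates all samples where the event happens and recovers unbiasedness through the weights w.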
It may happen, for example for industrial applications in engineering or chem-
istry, that the stochastic model is only given as a black-box and cannot be modified.
UNBIASEDNESS OF SOME GAMS ALGORITHMS 3561
In such situations, importance sampling algorithms are impractical, and one may
rely on splitting strategies as described below, for which by construction the model
does not need to be modified and is used as a black-box.
The second family of methods is given by splitting algorithms, which are it-
erative procedures based on interacting systems of replicas. These are selected
using an importance function, and then possibly duplicated and weighted accord-
ingly. A common interpretation is that the state space is decomposed into a nested
sequence of subsets (which are level-sets for the importance function), such that
the rare event probability can be written as a (telescoping) product of conditional
probabilities, which are easier to compute.
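In formulas (a standard identity, stated here with generic nested events rather than the paper's notation): if the nested subsets define events A_0 ⊃ A_1 ⊃ · · · ⊃ A_K, with A_K the rare event, then

```latex
\mathbb{P}(A_K) \;=\; \mathbb{P}(A_0)\prod_{k=1}^{K}\mathbb{P}(A_k \mid A_{k-1}),
```

so the tiny probability P(A_K) is traded for K + 1 factors of moderate size, each of which can be estimated separately by a population of replicas.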
Note that in the last 20 years, splitting algorithms have been studied exten-
sively, and many variants appeared: to cite a few, Subset Simulation [2], (Multi-
level) Splitting [25, 32, 33], Nested Sampling [41] and RESTART [43]. Note also
the relations with Genealogical Particle models [14, 19, 21] and Sequential Monte
Carlo methods [13, 20]. In the nonadaptive versions of the splitting method, the
nested sequence of subsets is fixed a priori and it is easy to build unbiased estima-
tors. However, to the best of our knowledge, similar results do not exist for adaptive
versions such as the Adaptive Multilevel Splitting (AMS) algorithm, where the se-
quence of subsets is built on-the-fly using the ensemble of replicas. This is the
focus of this article.
Adaptive versions, as proposed in [15], of multilevel splitting algorithms are
needed in practice: the performance of the estimation indeed depends on the choice
of the levels (i.e., of the nested subsets), which is not trivial if no additional
information on the system is available. In previous theoretical studies [5, 9, 10, 15, 27,
40, 44], the consistency (estimators are unbiased) and the efficiency (the algorithm
outperforms crude Monte Carlo methods in the rare event regime, when the size of
the system of replicas increases) have been analyzed under very restrictive condi-
tions, in so-called idealized settings. In particular, these conditions are not satisfied
for processes in dimension larger than one, or for discrete-time dynamics.
Note that the sensitivity of the performance with respect to the choice of the im-
portance function is a well-known fact for both importance sampling and splitting
algorithms: the variance deteriorates for inappropriate choices; see [24, 26], and
numerical simulations in Section 5. Note that a general strategy to design impor-
tance functions for diffusions in small noise regimes is by approximating solutions
of Hamilton–Jacobi–Bellman equations; these equations are obtained by asymp-
totic analysis (often related to large deviations behavior); see, for instance, [4, 22,
42] for importance sampling and [18] for splitting algorithms. In the absence of
a small parameter (typically small noise) in order to perform asymptotic analysis,
such a technique breaks down.
Let us roughly describe the principle of the method; see Section 4.5 for the pre-
cise definition of the algorithm. The crucial ingredient we need is an importance
function:
(3) ξ : ℝ^{3N} → ℝ
which will be used to measure the advance of the paths towards B. This function
is known as a reaction coordinate in the molecular dynamics community, and this
is the terminology we will use here. In this paper, we also call ξ(Xt ) the level of
the process Xt at time t, and [see (25)]
Ξ(X) = sup{ξ(X_{t∧τ_A}) : t ∈ ℕ}
the maximum level of the Markov chain path X. A useful requirement on ξ is the
existence of a level zmax ∈ R such that
B ⊂ {x ∈ ℝ^{3N} : ξ(x) ∈ ]z_max, ∞[}.
Then, starting from a system of nrep replicas (all starting from the same initial
condition x0 and stopped at time τA ), the idea is to remove the least fit paths and
to duplicate the remaining paths while keeping a fixed number of replicas. The
least fit paths are those with the smallest maximum levels Ξ(X). As soon as one
of the least fit paths is removed, one of the remaining paths is duplicated and then
partially resampled: the new path is a copy of the path up to the maximum level
of the removed least fit paths, and the end of the trajectory is then sampled using
independent random numbers. The algorithm thus goes through three steps: (i) a
level computation step (to determine the level under which paths will be removed:
this level is computed as an empirical quantile over the maximum levels among
the replicas); (ii) a splitting step (to determine which paths will be removed and
which ones of the remaining paths will be duplicated); (iii) a partial resampling
step (to generate new paths from the selected paths). By iterating these three steps,
one obtains successively systems of nrep paths with an increasing minimum of the
maximum levels among the replicas. The algorithm is stopped when the current
level is larger than zmax , and an estimator of (2) is then built using a weighted
empirical average over the replicas. The adaptive feature of the algorithm is in
the first step (the level computation step): indeed, at each iteration, paths are re-
moved if their maximum level is below some threshold, and these thresholds are
determined iteratively using empirical quantiles, rather than by fixing a priori a
deterministic sequence of levels (as it would be the case in nonadaptive splitting,
or more generally in standard sequential Monte Carlo algorithms; see [13, 19]).
All the details of the algorithm will be given in Section 4.5.
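The three steps above can be sketched in code. This is a minimal illustration, not the paper's full algorithm of Section 4.5: the one-dimensional driftless chain, the choice ξ(x) = x, the sets A = (−∞, 0], B = [1, +∞) and all parameters are hypothetical, and the empirical quantile is simply the lowest maximum level. Ties at that level are all removed together, as required in the discrete-time setting.

```python
import numpy as np

A_BOUND, B_BOUND = 0.0, 1.0   # hypothetical sets A = (-inf, 0], B = [1, +inf)

def extend(path, step, rng, max_steps=100_000):
    """Continue a path of the chain X_{t+1} = X_t + step * G_t until it hits A or B."""
    path = list(path)
    x = path[-1]
    while A_BOUND < x < B_BOUND and len(path) < max_steps:
        x += step * rng.standard_normal()
        path.append(x)
    return np.asarray(path)

def ams(nrep=50, x0=0.1, step=0.05, zmax=B_BOUND, rng=None):
    """One realization of the AMS estimator of P(tau_B < tau_A)."""
    if rng is None:
        rng = np.random.default_rng()
    paths = [extend([x0], step, rng) for _ in range(nrep)]
    weight = 1.0
    while True:
        max_levels = np.array([p.max() for p in paths])  # xi(x) = x: level = position
        z = max_levels.min()          # (i) level computation: empirical minimum
        if z >= zmax:                 # every replica has reached the B-level
            break
        killed = np.flatnonzero(max_levels <= z)  # (ii) splitting: remove ALL ties at z
        if killed.size == nrep:
            return 0.0                # extinction: no replica climbed above z
        weight *= 1.0 - killed.size / nrep
        survivors = np.setdiff1d(np.arange(nrep), killed)
        for i in killed:
            donor = paths[rng.choice(survivors)]
            cut = int(np.argmax(donor > z)) + 1        # first time the donor exceeds z
            paths[i] = extend(donor[:cut], step, rng)  # (iii) partial resampling
    return weight * np.mean([p[-1] >= B_BOUND for p in paths])
```

Averaging independent realizations of ams() gives an estimate of P(τ_B < τ_A); for this driftless chain, optional stopping gives P ≈ (x_0 − 0)/(1 − 0) = 0.1 up to overshoot effects, which serves as a sanity check.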
In this work, we focus on the application of the AMS algorithm to sample
Markov chains, namely discrete time stochastic dynamics, and not continuous time
stochastic dynamics as in [15] for example. The reason is mainly practical: in most
cases of interest, even if the original model is continuous in time, it is discretized
in time when numerical approximations are needed. There are actually also many
cases where the original model is discrete in time (e.g. kinetic Monte Carlo or
Markov state models in the context of molecular dynamics).
The discrete time setting, which is thus of practical interest, raises specific ques-
tions in the context of the AMS algorithm. First, in the partial resampling step,
a natural question which is answered in this article is whether the path should be
copied up to the last time before or first time after it reaches the level of the re-
moved paths. Second, in the discrete time context, it may happen that several paths
have exactly the same maximum level. This implies some subtleties in the imple-
mentation of the splitting step which have a large influence on the quality of the
estimators. We refer to [6] and [7], Section 5.1 for concrete examples where bad
implementations lead to strongly biased results. One objective of the present arti-
cle is actually to elucidate a correct implementation of the AMS algorithm in such
a discrete time setting.
1.4. Main results and outline. The main results and outline of this work are
the following:
• In Section 2, we introduce the Generalized Adaptive Multilevel Splitting
(GAMS) framework, which encompasses the AMS algorithm presented above.
The interest of this generalized setting is twofold. First, it is very useful to write
variants of the classical AMS algorithm (see in particular [7], Section 3.5). Sec-
ond, it highlights the essential mathematical properties that are required to pro-
duce unbiased estimators of quantities such as (2).
• In Section 3, we state and prove the main theoretical result (Theorem 3.2) of
this article: algorithms which fit in the GAMS framework yield unbiased esti-
mators of the rare event probability, and more generally of any nonnormalized
expectation related to the rare event, of the form (2).
• Section 4 is devoted to the detailed presentation of the AMS algorithm discussed
above; appropriate implementations of the level computation and splitting steps
are made explicit in Section 4.5. In particular, in Section 4.6 we prove that the
algorithm fits in the GAMS framework of Section 2.
• Section 5 is entirely devoted to some numerical experiments which illustrate the
unbiasedness result, and discuss the efficiency of the AMS algorithm to sample
rare events. We discuss through various numerical experiments the influence of
the choice of the reaction coordinate ξ on the variance of the estimators and we
end up with some practical recommendations in order to get reliable estimates
using the AMS algorithm (see Section 5.2). In particular, using the unbiased-
ness property proven in this paper, it is possible to compare the results obtained
using different parameters (in particular different reaction coordinates) in order
to assess the quality of the numerical results. In addition, it is easy to build an
unbiased estimator with smaller variance by parallelizing computations using
many independent realizations of AMS with a fixed number of replicas.
3564 C.-E. BRÉHIER ET AL.
Let us mention that we very often refer to [7], which is an extended version of
the present work containing, in particular, additional numerical experiments and
detailed extensions of the AMS algorithm which enter into the GAMS framework.
From a theoretical perspective, this article presents extensions of previous re-
sults on splitting algorithms in two directions. On the one hand, it is well known
that unbiased estimators of quantities such as (2) can be built for nonadaptive split-
ting algorithms (i.e., with a fixed sequence of levels), and one of the main contribu-
tions of this article is to extend this highly desirable property, under appropriate as-
sumptions, for adaptive algorithms (when levels are computed on-the-fly). On the
other hand, compared to previous results in the literature concerning the AMS al-
gorithm, the main novelty of this work is the proof of the unbiasedness in a general
setting and whatever the parameters: the number of replicas, the (minimum) num-
ber of replicas sampled at each iteration and the reaction coordinate ξ . In previous
works (see, for instance, [10, 27, 40]), unbiasedness is proved in an idealized set-
ting, namely when the reaction coordinate is given by ξ(x) = Px (τB < τA ) (known
as the committor function; here, the subscript x ∈ ℝ^{3N} indicates that the Markov
chain Xt has x as an initial condition), and for a different partial resampling step,
where new replicas are sampled according to the conditional distribution of paths
conditioned to reach the level of the removed replicas. In many cases of practical
interest, in particular for sampling paths of discrete time processes, these two con-
ditions are not met, and the AMS algorithm of Section 4 is a suitable generalization
of existing algorithms to deal with such situations.
The proof of unbiasedness is inspired by the interpretation of the AMS algo-
rithm as a sequential Monte Carlo algorithm in path space, in the spirit of [29]
(the selection and mutation steps respectively correspond to the branching step
and the partial resampling step in the AMS algorithm). In this interpretation, the
iteration index (or “time” index) of the sequential algorithm is given by the in-
creasing levels defined by the reaction coordinate. We refer the interested reader
to [7], Section 3.4, where this analogy is made precise.
As explained above, the bias is only one part of the error when using the AMS
algorithm: the statistical error (namely the variance) also plays a crucial role in
the quality of the estimator as will be shown numerically in Section 5. There are
unfortunately very few theoretical results concerning the influence of the choice of
ξ on the statistical error. We refer to [9, 16] for an analysis of the statistical error.
For discussions about the role of ξ on the statistical error, we also refer to [23, 24,
36]. In particular, in the numerical experiments, we discuss situations for which
the confidence intervals of the estimators associated with different reaction coor-
dinates do not overlap if the number of independent realizations of the algorithm
is not sufficiently large. We relate this observation to the well-known phenomenon
of “apparent bias” for splitting algorithms; see [24].
We would like to stress that our results hold in the setting where a family of
partial resampling kernels indexed by the levels is available (see Section 2.1.2 for
a precise definition). This is particularly well suited to the sampling of trajectories
of Markov dynamics (see [7], Section 3.4, for another possible setting). In the ter-
minology of [29], we have in mind the dynamic setting (considered, e.g., in [15]),
and not the static setting (considered, e.g., in [13, 16]). Nested sampling [41] is
one instance of an adaptive multilevel splitting algorithm devised in the static set-
ting, and which does not enter into our framework; indeed, the partial resampling
kernels used in practice (based on few steps of a Metropolis–Hastings algorithm)
do not satisfy our assumptions (see Assumption 2 below).
1.5. Notation. Before going into the details, let us provide some general nota-
tion which is useful in the following:
• The underlying probability space is denoted by (Ω, F, P). For m σ-fields
F_1, . . . , F_m ⊂ F, F_1 ∨ · · · ∨ F_m denotes the smallest σ-field on Ω contain-
ing all the σ-fields F_1, . . . , F_m. For any t, s ∈ ℕ = {0, 1, . . .}, t ∧ s = min{t, s}
and t ∨ s = max{t, s}. We use the convention inf ∅ = +∞. For two sets A and
B which are disjoint, A ⊔ B denotes the disjoint set union.
• We work in the following standard setting: random variables take values in state
spaces E which are Polish (namely metrizable, complete for some distance dE
and separable). The associated Borel σ -field is denoted by B (E ). We will give
precise examples below (see, e.g., Section 4.1 for the space of trajectories for
Markov chains).
Then Proba(E) denotes the set of probability distributions on E. It is endowed
with the standard Polish structure associated with the Lévy–Prokhorov metric,
which metrizes convergence in distribution, that is, weak convergence of proba-
bility measures tested against continuous and bounded test functions (see, e.g., [3]).
The distribution of an E-valued random variable X will be denoted by Law(X).
• If E_1 and E_2 are two Polish state spaces, a Markov kernel (or transition probabil-
ity kernel) Π(x_1, dx_2) from E_1 to E_2 is a measurable map from points x_1 ∈ E_1
to probability measures Π(x_1, ·) ∈ Proba(E_2).
• We use the following standard notation associated with probability transitions:
for ϕ : E_2 → ℝ a bounded and measurable test function,
(4) (Πϕ)(x_1) = ∫_{x_2∈E_2} ϕ(x_2) Π(x_1, dx_2), ∀x_1 ∈ E_1.
Similarly, we use the notation π(ϕ) = ∫_{x∈E_2} ϕ(x) π(dx) for π ∈ Proba(E_2).
• Let X_1 and X_2 be random variables with values respectively in E_1 and E_2, and
Π a Markov kernel from E_1 to E_2. We say that X_2 is sampled according to
Π(X_1, ·) [and we denote X_2 ∼ Π(X_1, ·)] if X_2 = f(X_1, U) a.s. where, on the
one hand, U is a random variable independent of X_1 and of all the random
variables introduced before (namely at previous iterations of the algorithm) and,
on the other hand, f is a measurable function which is such that Π(x_1, ·) =
Law(f(x_1, U)), for Law(X_1)-almost every x_1 ∈ E_1.
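This representation is easy to make concrete (a toy illustration; the Gaussian kernel below is an arbitrary assumption, not one used in the paper): the kernel's randomness is carried entirely by a single uniform variable U independent of X_1.

```python
import random
from statistics import NormalDist

def f(x1, u):
    # Realizes the (hypothetical) kernel Pi(x1, .) = Normal(x1 / 2, 1) by inverse
    # transform: if U ~ Uniform(0, 1), then f(x1, U) ~ Pi(x1, .).
    return x1 / 2.0 + NormalDist().inv_cdf(u)

def sample_from_kernel(x1, rng=random):
    u = rng.random()  # U independent of x1 and of everything sampled before
    return f(x1, u)
```

In the GAMS framework this is exactly how a resampling step consumes fresh independent randomness at each iteration: the function f is deterministic, and all new randomness enters through U.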
where I ⊂ ℕ* = ℕ \ {0} is a random finite subset of labels and the (X^{(n)})_{n∈I} are
elements of E. The space E^rep is endowed with the following distance: for X^1 =
(x^{(1,n)})_{n∈I^1} and X^2 = (x^{(2,n)})_{n∈I^2} in E^rep, we set
d(X^1, X^2) = 2, if I^1 ≠ I^2;  d(X^1, X^2) = max_{n∈I^1} min{d_E(x^{(1,n)}, x^{(2,n)}), 1}, if I^1 = I^2.
Endowed with this distance, the set E^rep is Polish and we denote by B(E^rep) the
Borel σ-field. This σ-field can also be written as follows:
B(E^rep) = ⊔_{I∈𝕀} B(E)^{⊗I},
2.1. The setting. In this section, we introduce the main ingredients and as-
sumptions we need in order to define the GAMS framework.
Let us introduce the state space P (typically the path space of a random pro-
cess); it is assumed to be a Polish space and we denote by B(P) its Borel σ-field.
Let X be a random variable with values in (P , B (P )) and with probability distri-
bution
π = P ◦ X^{−1} ∈ Proba(P).
The aim of the algorithms we present is to estimate
(6) π(ϕ) = ∫_P ϕ(x) π(dx) = E[ϕ(X)]
2.1.2. The partial resampling kernel πz (x, ·). We introduce a transition prob-
ability kernel from R × P to P :
(z, x) ∈ R × P → πz (x, ·) ∈ Proba(P ).
2.1.3. Assumptions on (filt_z)_{z∈ℝ} and (π_z)_{z∈ℝ}. We will need two assumptions
on (filt_z)_{z∈ℝ} and (π_z)_{z∈ℝ}. The first one states a right continuity property of the
mapping z → π_z(ϕ)(x) and is required to apply Doob's optional stopping the-
orem in the proof of Lemma 3.6.
Second, we require a consistency relation between the filtration (filtz )z∈R and
the transition probability kernel (πz )z∈R .
ASSUMPTION 2. Let us consider a random variable X, (filt^X_z)_{z∈ℝ} and (π_z)_{z∈ℝ}
as introduced above. We assume the following consistency relation: if X is dis-
tributed according to π_z(x, ·) for some (z, x) ∈ ℝ × P, then for any z′ ≥ z and for
any bounded measurable test function ϕ : P → ℝ,
E[ϕ(X) | filt^X_{z′}] = π_{z′}(ϕ)(X) a.s.
where filt^X_z = X^{−1}(filt^rep_z).
REMARK 2.1 (On the definition of the filtrations). In many cases of practical
interest, for any z ∈ ℝ, filt_z is defined as the smallest σ-field which makes an
application F_z : P → (E, B(E)) measurable, for some Polish space E. Then filt^rep_z
is the smallest σ-field which makes the application G_z : P^rep → E^rep measurable,
with G_z((x^{(n)})_{n∈I}) = (F_z(x^{(n)}))_{n∈I}.
DEFINITION 2.2 (Stopping level, stopped σ-field). Let (F_z)_{z∈ℝ} be a filtration
on (Ω, F, P). A stopping level Z with respect to (F_z)_{z∈ℝ} is a random variable
with values in ℝ ∪ {−∞, +∞} such that {Z ≤ z} ∈ F_z for any z ∈ ℝ ∪ {−∞, +∞}. The stopped
σ -field, denoted by FZ , is characterized as follows:
A ∈ FZ if and only if ∀z ∈ R, A ∩ {Z ≤ z} ∈ Fz .
2.2. The generalized adaptive multilevel splitting framework. The aim of this
section is to introduce a general framework for splitting algorithms [which we
refer to as the Generalized Adaptive Multilevel Splitting (GAMS) framework in
the sequel]. It is a convenient, general procedure which makes it possible to define
practically implementable algorithms.
It iterates over three successive steps: (1) a branching or splitting step, (2) a
partial resampling step and (3) a level computation step. These steps are performed
until a suitable stopping criterion is satisfied. We denote by Qiter the number of
iterations, which in general is a random variable.
At each iteration step q ≥ 0 of the algorithm, the distribution π is approximated
by an empirical distribution
(9) π̂^{(q)} = Σ_{n∈I^{(q)}} G^{(n,q)} δ_{X^{(n,q)}},
over a system of weighted replicas X (q) := (X (n,q) , G(n,q) )n∈I (q) ∈ P rep , where
I (q) ⊂ N∗ is the (random) finite set of labels at step q of the algorithm and G(n,q) ∈
R+ is the (random) weight attached to the replica X(n,q) .
As will be proven in Section 3, any algorithm which enters into the GAMS
framework is such that, for any bounded measurable function ϕ : P → ℝ, π̂^{(q)}(ϕ)
is an unbiased estimator of π(ϕ): for any q ≥ 0, E(π̂^{(q)}(ϕ)) = π(ϕ). Moreover,
under appropriate assumptions (see Theorem 3.2), this statement can be gener-
alized when q is replaced by the random number of iterations Qiter of the algo-
rithm:
E[Σ_{n∈I^{(Q_iter)}} G^{(n,Q_iter)} ϕ(X^{(n,Q_iter)})] = π(ϕ).
The proof of this result (Theorem 3.2) is given in Sections 3.2 and 3.3.
As will become clear, in order to obtain a fully implementable algorithm from
the GAMS framework, three procedures need to be made precise: (i) the stopping
criterion, (ii) the computation rule of the branching numbers and (iii) the computa-
tion of the stopping levels. These procedures require the definition of three sets of
random variables that are used in the GAMS framework presented in Section 2.2.1:
(S^{(q)})_{q≥0} (the exit rule defining Q_iter), (B^{(n,q+1)})_{q≥0,n∈I^{(q)}} (the branching num-
bers defining duplication of replicas) and (Z^{(q)})_{q≥0} (the random levels defining
the partial resampling of replica states). The precise assumptions on these ran-
dom variables required to have unbiasedness for (9) (Theorem 3.2) are stated in
Assumption 3 below (Section 2.2.2). A concrete example of sets of random vari-
ables will be given in Section 4, where the AMS algorithm in a Markov chain
context is presented.
2.2.1. Precise definition of the GAMS framework. We now introduce the Gen-
eralized Adaptive Multilevel Splitting (GAMS) framework, which is an iterative
procedure on an integer index q ≥ 0.
that P^{(q+1)}(n′) = n,
(11) G^{(n′,q+1)} = G^{(n,q)} / E(B^{(n,q+1)} | F^{(q)}).
REMARK 2.3. The choice of the new weights in (11) can be generalized in
the following way. In the splitting step (v), sample the new weights G^{(n′,q+1)} con-
ditionally on F^{(q)} and P^{(q+1)}, and assume that the weights satisfy: for all n ∈ I^{(q)},
E[Σ_{n′ : P^{(q+1)}(n′)=n} 1_{B^{(n,q+1)} ≥ 1} G^{(n′,q+1)} | F^{(q)}] = G^{(n,q)}.
Note that the weights defined by (11) satisfy this generalized requirement, but
other strategies are also allowed, such as: for all n′ ∈ I^{(q+1)} and n ∈ I^{(q)} \ I^{(q+1)}_killed
ASSUMPTION 3. The random variables (S^{(q)})_{q≥0}, (B^{(n,q+1)})_{q≥0,n∈I^{(q)}} and
(Z^{(q)})_{q≥0} satisfy the following properties:
• the random variables (S^{(q)})_{q≥0} needed for defining the stopping criterion are
such that S^{(q)} is F^{(q)}-measurable;
• for each q ∈ ℕ, the branching numbers (B^{(n,q+1)})_{n∈I^{(q)}} take values in ℕ,
and are assumed to be sampled conditionally on F^{(q)} (see Section 1.5 for a
precise definition), such that E(B^{(n,q+1)} | F^{(q)}) > 0 a.s.;
• the stopping levels (Z^{(q)})_{q≥0} take values in ℝ, satisfy Z^{(q+1)} ≥ Z^{(q)} and are
such that Z^{(q)} is a stopping level with respect to (F^{(q)}_z)_{z∈ℝ} (see Definition 2.2).
Once these three sets of random variables have been defined, the GAMS frame-
work becomes a practical splitting algorithm which yields an unbiased estimator
of (6) (this is the claim of Theorem 3.2 proved in Section 3). A concrete example
of such an algorithm is given in Section 4.5.
Let us emphasize that the requirement that Z^{(q)} is a (F^{(q)}_z)_{z∈ℝ}-stopping level
will be instrumental to apply Doob's optional stopping theorem for martingales
indexed by levels z, in order to prove Theorem 3.2.
Note the following result, which is a straightforward consequence of the hy-
pothesis on (S (q) )q≥0 in Assumption 3.
PROPOSITION 2.4. The random variable Q_iter is a stopping time with respect
to the filtration (F^{(q)})_{q≥0}.
3.1. Statement of the main result. We start with the definition of a useful prop-
erty.
DEFINITION 3.1. A splitting algorithm which enters into the GAMS frame-
work satisfies the almost sure mass conservation property if
(13) ∀q ≥ 0, Σ_{n∈I^{(q)}} G^{(n,q)} = 1 a.s.
Note that an algorithm which enters into the GAMS framework does not nec-
essarily satisfy the almost sure mass conservation property of Definition 3.1. Ac-
tually, Theorem 3.2 states that the mass conservation holds on average: ∀q ≥ 0,
E[Σ_{n′∈I^{(q)}} G^{(n′,q)}] = 1, by taking ϕ(x) = 1 and Q_iter = q.
Notice that by choosing a given deterministic sequence of levels Z (q) = zq in
the GAMS framework, one actually recovers the well-known unbiasedness result
for nonadaptive splitting algorithms, where the levels are fixed a priori.
The strategy we follow to prove Theorem 3.2 is to introduce the sequence of
random variables
(14) M^{(q)}(ϕ) = E[π̂^{(q)}(ϕ) | F^{(q)}]
5. To obtain Q_iter = q_0, one simply has to choose S^{(q)} = 1 if q < q_0, and S^{(q)} = 0 if q ≥ q_0.
for a given bounded measurable test function ϕ : P → R, and to show that the
process (M (q) (ϕ))q∈N indexed by q is a martingale with respect to the filtration
(F (q) )q∈N . Since, by Proposition 2.4, Qiter is a stopping time for this filtration,
Doob’s stopping theorem for discrete-time martingales can then be applied to ob-
tain Theorem 3.2. The next two sections are devoted to the proof of Theorem 3.2.
Let us now state two intermediate propositions before proving Theorem 3.2. The
first proposition states that, in the sense of Definition 3.3, the replicas with indices
in I^{(q)} [resp., I^{(q+1)}] are F^{(q)}-conditionally independent [resp., F^{(q)} ∨
σ(P^{(q+1)})-conditionally independent] with explicit distributions.
PROPOSITION 3.4. Let us consider the setting of Theorem 3.2. For any integer
q ≥ 0:
(i)_q The replicas (X^{(n,q)})_{n∈I^{(q)}} are independent conditionally on the σ-field F^{(q)},
with distributions (π_{Z^{(q)}}(X^{(n,q)}, ·))_{n∈I^{(q)}}.
(ii)_q The replicas (X^{(n′,q+1)})_{n′∈I^{(q+1)}} are independent conditionally on the σ-field
F^{(q)} ∨ σ(P^{(q+1)}), with distributions (π_{Z^{(q)}}(X^{(n′,q+1)}, ·))_{n′∈I^{(q+1)}}.
PROPOSITION 3.5. Let us consider the setting of Theorem 3.2. For any integer
q ≥ 0 and for any bounded measurable test function ϕ : P → R:
(iii)_q E[π̂^{(q+1/2)}(ϕ) | F^{(q)}] = E[π̂^{(q)}(ϕ) | F^{(q)}].
(iv)_q E[π̂^{(q+1)}(ϕ) | F^{(q)} ∨ σ(P^{(q+1)})] = E[π̂^{(q+1/2)}(ϕ) | F^{(q)} ∨ σ(P^{(q+1)})].
The proofs of both Proposition 3.4 and Proposition 3.5 are postponed to Sec-
tion 3.3. We are now in position to prove Theorem 3.2.
PROOF OF THEOREM 3.2. The proof consists of first proving that the process
(M^{(q)}(ϕ))_{q≥0} defined by (14) is a (F^{(q)})_{q≥0}-martingale and then applying
Doob's optional stopping theorem.
Notice that E[M^{(q+1)}(ϕ) | F^{(q)}] = E[π̂^{(q+1)}(ϕ) | F^{(q)}] and let us compute the
right-hand side. First, from point (iv)_q of Proposition 3.5 and since F^{(q)} ⊂ F^{(q)} ∨
σ(P^{(q+1)}), we get
E[π̂^{(q+1)}(ϕ) | F^{(q)}] = E[E[π̂^{(q+1)}(ϕ) | F^{(q)} ∨ σ(P^{(q+1)})] | F^{(q)}]
= E[π̂^{(q+1/2)}(ϕ) | F^{(q)}].
Second, from point (iii)_q of Proposition 3.5 we have
E[π̂^{(q+1/2)}(ϕ) | F^{(q)}] = E[π̂^{(q)}(ϕ) | F^{(q)}].
We thus have, for any q ≥ 0,
(16) E[π̂^{(q+1)}(ϕ) | F^{(q)}] = E[π̂^{(q)}(ϕ) | F^{(q)}],
and (M^{(q)}(ϕ))_{q∈ℕ} is therefore a (F^{(q)})_{q∈ℕ}-martingale.
We now focus on stopping the latter martingale at the random iteration Q_iter. By
assumption, either the almost sure mass conservation property (13) is satisfied, in
which case (M^{(q)}(ϕ))_{q∈ℕ} is a bounded martingale [since |M^{(q)}(ϕ)| ≤ ‖ϕ‖_∞], or
Q_iter ≤ q_max for some deterministic integer q_max ∈ ℕ. In both cases, we apply
Doob's optional stopping theorem (see, for instance, [39], Chapter 7, Section 2,
Theorem 1 and Corollaries 1 and 2) to the martingale (M^{(q)}(ϕ))_{q∈ℕ} with the
stopping time Q_iter with respect to the filtration (F^{(q)})_{q∈ℕ}. We obtain
E[π̂^{(Q_iter)}(ϕ)] = E[M^{(0)}(ϕ)] = π(ϕ),
which completes the proof of Theorem 3.2.
3.3. Proofs of Propositions 3.4 and 3.5. Proposition 3.4 requires an additional
intermediate result, namely the propagation Lemma 3.6 below. This lemma gives
rigorous conditions under which the property, for a system of replicas (X^{(n)})_{n∈I}, of
being independently distributed with distributions (π_Z(X^{(n)}, ·))_{n∈I} conditionally
on a σ-field G can be transported from G to a larger σ-field. It is based on
Doob’s optional stopping theorem for martingales indexed by the level variable z.
Notice that it is the only result where the right continuity property of Assumption 1
is used.
and assume that Z′ ∈ ℝ ∪ {−∞, +∞} is a stopping level for the filtration (G_z)_{z∈ℝ}
such that, almost surely, Z′ ≥ Z.
Then the replicas (X^{(n)})_{n∈I} are independently distributed conditionally on G_{Z′},
with distributions (π_{Z′}(X^{(n)}, ·))_{n∈I}.
PROOF. Step 1. The first step consists in proving that, for any fixed z ∈ ℝ, the
system of replicas is independently distributed with distributions (π_{Z∨z}(X^{(n)}, ·))_{n∈I}
conditionally on G ∨ filt^X_z. By a standard monotone class argument, it is sufficient
to show that
E[∏_{n∈I} ϕ_n(X^{(n)}) ψ_n(X^{(n)}) Y] = E[∏_{n∈I} π_{Z∨z}(ϕ_n)(X^{(n)}) ψ_n(X^{(n)}) Y],
where (ϕ_n)_{n≥1} ranges over bounded measurable test functions from P to ℝ,
(ψ_n)_{n≥1} ranges over filt_z-measurable test functions from P to ℝ, and Y over
bounded G-measurable random variables.
Let us denote by I = E[∏_{n∈I} ϕ_n(X^{(n)}) ψ_n(X^{(n)}) Y] the left-hand side. Since Y is
G-measurable, by Definition 3.3 of the conditional independence we get that
I = E[∏_{n∈I} π_Z(ϕ_n ψ_n)(X^{(n)}) Y].
The functions (ψ_n)_{n≥1} being filt_z-measurable, they are a fortiori filt_{z′∨z}-measu-
rable for any z′ ∈ ℝ. Assumption 2 on the partial resampling kernel (π_z)_{z∈ℝ} then
yields
π_{z′}(ϕ_n ψ_n)(x) = π_{z′}(π_{z′∨z}(ψ_n ϕ_n))(x) = π_{z′}(ψ_n π_{z′∨z}(ϕ_n))(x).
As a consequence, using again that the system of replicas (X(n) )n∈I is indepen-
dently distributed with distribution (πZ (X (n) , ·))n∈I conditionally on G and that Y
is G -measurable, we get the following identity:
I = E[∏_{n∈I} π_Z(ψ_n π_{Z∨z}(ϕ_n))(X^{(n)}) Y] = E[∏_{n∈I} π_{Z∨z}(ϕ_n)(X^{(n)}) ψ_n(X^{(n)}) Y],
P ROOF OF (i)_q ⇒ (ii)_q. Assume that (i)_q holds. We rewrite property (ii)_q as follows:

(18) E[∏_{n′∈I^{(q+1)}} ϕ_{n′}(X^{(n′,q+1)}) | F^{(q)} ∨ σ(P^{(q+1)})] = ∏_{n′∈I^{(q+1)}} π_{Z^{(q)}}(ϕ_{n′})(X^{(n′,q+1)}),
Next, from the induction hypothesis (i)_q, the replicas (X^{(n,q)})_{n∈I^{(q)}} are independent with distributions (π_{Z^{(q)}}(X^{(n,q)}, ·))_{n∈I^{(q)}} conditionally on F^{(q)}. Since P^{(q+1)} is sampled conditionally on F^{(q)}, the replicas (X^{(n,q)})_{n∈I^{(q)}} are also independent conditionally on F^{(q)} ∨ σ(P^{(q+1)}), with the same distributions. Therefore [notice that I^{(q)} and I^{(q+1)}_{killed} are F^{(q)} ∨ σ(P^{(q+1)})-measurable],

E[∏_{n∈I^{(q)}∖I^{(q+1)}_{killed}} ϕ_n(X^{(n,q)}) | F^{(q)} ∨ σ(P^{(q+1)})]
= ∏_{n∈I^{(q)}∖I^{(q+1)}_{killed}} π_{Z^{(q)}}(ϕ_n)(X^{(n,q)})
= ∏_{n′∈I^{(q)}∖I^{(q+1)}_{killed}} π_{Z^{(q)}}(ϕ_{n′})(X^{(P^{(q+1)}(n′),q)}).
3580 C.-E. BRÉHIER ET AL.
P ROOF OF (ii)_q ⇒ (i)_{q+1}. Let us now assume that (ii)_q holds. To prove that (i)_{q+1} holds, it is sufficient to check that

E[∏_{n∈I^{(q+1)}} ϕ_n(X^{(n,q+1)}) | F^{(q+1)}] = ∏_{n∈I^{(q+1)}} π_{Z^{(q+1)}}(ϕ_n)(X^{(n,q+1)}).

This is again exactly the result of Lemma 3.6 applied to X^{(q+1)}, taking Z = Z^{(q)}, Z′ = Z^{(q+1)} and G = F^{(q)} ∨ σ(P^{(q+1)}), so that G_z = F^{(q+1)}_z [where, we recall, F^{(q+1)}_z is defined by equation (12)].
P ROOF OF (i)_q + (ii)_q ⇒ (iv)_q. Using successively (i)_q, the identity (19) and (ii)_q, we have

E[π̂^{(q+1/2)}(ϕ) | F^{(q)} ∨ σ(P^{(q+1)})] = ∑_{n′∈I^{(q+1)}} G^{(n′,q+1)} π_{Z^{(q)}}(ϕ)(X^{(P^{(q+1)}(n′),q)})
= ∑_{n′∈I^{(q+1)}} G^{(n′,q+1)} π_{Z^{(q)}}(ϕ)(X^{(n′,q+1)})
= E[π̂^{(q+1)}(ϕ) | F^{(q)} ∨ σ(P^{(q+1)})].
4. The AMS algorithm for Markov chains. The goal of this section is to
define an Adaptive Multilevel Splitting algorithm based on the GAMS framework
of Section 2.2.1, when applied to paths of a Markov chain (namely a discrete
time stochastic process). In particular, we provide explicit examples of filtrations
(filtz )z∈R and partial resampling kernels (πz )z∈R introduced in Section 2.1, as well
as examples of level computation and branching rules in the algorithm.
In order to satisfy Assumptions 1, 2 and 3, and thus to obtain unbiased estimators in this setting, special care is required to treat the situations where many replicas share the same maximum level, or where the population of replicas becomes extinct. These aspects, which are specific to the discrete time setting, were not treated in detail in many previous works, where continuous time diffusions were considered.
4.1. The Markov chain setting. Let X̃ = (X̃t )t∈N be a Markov chain, with
probability transition P , which takes values in a Polish state space S . Without
loss of generality, we assume that X̃0 = x0 where x0 ∈ S is a deterministic initial
condition.
The state space is the path space

(20) P = {x = (x_t)_{t∈N} : x_t ∈ S for all t ∈ N}.

It is well known that, by introducing the distance d_P(x, y) = ∑_{t∈N} 2^{−t} (1 ∧ sup_{s≤t} d_S(x_s, y_s)) (which is a metric for the product topology), the space (P, d_P) is complete and separable. We thus see X̃ as a random variable with values in P.
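For illustration purposes, the distance d_P can be evaluated by truncating the series; the Python sketch below is our own and assumes S = R with d_S(a, b) = |a − b|, the neglected tail of the series being bounded by 2^{−T}:

```python
def d_path(x, y, T=50):
    """Truncated product-topology distance between two real-valued paths:
    d_P(x, y) = sum_t 2^{-t} (1 ∧ sup_{s<=t} |x_s - y_s|), summed for t <= T."""
    total, running_sup = 0.0, 0.0
    for t in range(T + 1):
        # sup over s <= t is maintained incrementally
        running_sup = max(running_sup, abs(x[t] - y[t]))
        total += min(1.0, running_sup) / 2.0 ** t
    return total
```

Two paths differing by 1 at every time contribute the full geometric series, so the truncated distance is close to 2.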
4.2. The rare event of interest. Given two disjoint Borel subsets A and B of S ,
our objective is the efficient sampling of events such as {τB < τA } where
τA = inf{t ∈ N : X̃t ∈ A} and τB = inf{t ∈ N : X̃t ∈ B}
are respectively the first entrance times in A and B. Both τA and τB are stopping
times with respect to the natural filtration of the process X̃.
We are mainly interested in the estimation of the probability P(τ_B < τ_A) in the rare event regime, namely when this probability is very small (typically less than 10^{−8}). As explained in the Introduction, this occurs for example if the initial
condition x0 ∈ Ac ∩ B c is such that x0 is close to A, and A and B are metastable
regions for the dynamics. The Markov chain starting from (a neighborhood of)
A (resp., B) remains for a very long time near A (resp. B) before going to B
(resp., A), and thus, the Markov chain starting from x0 reaches A before B with a
probability close to one. A specific example will be studied in Section 5.
The P -valued random variable X considered to apply the GAMS framework
and the associated results is the Markov chain stopped at time τA : X = (Xt )t∈N
where
(21) Xt = X̃t∧τA for any t ∈ N.
The probability distribution on (P , B (P )) is π = Law(X), namely the law of
the stopped Markov chain X. The aim of the AMS algorithm is to estimate the
(small) probability
(22) p = P(τB < τA ) = E(1TB (X)<TA (X) ) = π(1TB (·)<TA (·) ),
where we denote for any path x ∈ P
(23) TA (x) = inf{t ∈ N : xt ∈ A} and TB (x) = inf{t ∈ N : xt ∈ B}.
More generally, we build unbiased estimators of E(ϕ((X_t)_{t∈N}) 1_{τ_B<τ_A}), for any bounded measurable function ϕ : P → R [see equation (2)].
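The first entrance times and the indicator 1_{T_B(x)<T_A(x)} are straightforward to evaluate on a stored portion of a path; the sketch below uses illustrative names of our own, with math.inf encoding the infimum of an empty set:

```python
import math

def first_entrance(path, region):
    """T_C(x) = inf{t in N : x_t in C}; math.inf if the stored path never enters C."""
    for t, xt in enumerate(path):
        if region(xt):
            return t
    return math.inf

def reaches_B_before_A(path, in_A, in_B):
    """Indicator of the event {T_B(x) < T_A(x)}."""
    return first_entrance(path, in_B) < first_entrance(path, in_A)
```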
R EMARK 4.1 (On the stopping times τA and τB ). We defined above the stop-
ping times as first entrance times in some sets A and B. As will become clear
below, the definition of the algorithm and the unbiasedness result only require τA
and τB to be stopping times with respect to the natural filtration of the chain X̃.
In what follows, the values of ξ are called levels, and they precisely allow us to
specify the levels Z (q) computed at each iteration of the GAMS framework. We
will very often refer to the maximum level of a path, defined as follows.
4.4. Filtration and partial resampling kernel. We are now in position to define
the filtration (filtz )z∈R and the partial resampling kernel (πz )z∈R of Section 2.1 in
the Markov chain setting.
For all z ∈ R, filt_z is the smallest σ-field which makes the map x ∈ P ↦ (x_{t∧T_z(x)})_{t∈N} ∈ P measurable, where T_z is defined by (26):

(27) filt_z = σ(x ↦ (x_{t∧T_z(x)})_{t≥0}).
For all z ∈ R, the partial resampling kernel π_z is defined as follows: for any x ∈ P, π_z(x, dx′) ∈ Proba(P) is the law of the P-valued random variable Y such that

(28) Y_t = x_t, if t ≤ T_z(x),
     Law(Y_t | Y_s, 0 ≤ s ≤ t − 1) = P(Y_{t−1}, ·), if t > T_z(x),

and Y is stopped at T_A(Y), when Y hits A. We recall that P is the transition kernel of the Markov chain X̃. In other words, for t ≤ T_z(x), Y coincides with x, while for t > T_z(x), Y_t is generated according to the Markov dynamics on S, with probability transition P, and stopped when reaching A. The partial resampling kernel thus performs a branching of the path x at time T_z(x) and position x_{T_z(x)}.
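One draw from the partial resampling kernel can be sketched in Python under illustrative assumptions of our own: a real-valued state, user-supplied functions xi, step and in_A standing for ξ, one transition of P and membership in A, and a safety horizon that is not part of the mathematical definition:

```python
def partial_resample(x, z, xi, step, in_A, horizon=1000):
    """One sample from pi_z(x, ·): copy x up to T_z(x) = inf{t : xi(x_t) > z},
    then continue with fresh Markov transitions, stopping when A is hit."""
    # T_z(x): first time the path strictly exceeds level z (None if never)
    Tz = next((t for t, xt in enumerate(x) if xi(xt) > z), None)
    if Tz is None:
        return list(x)            # maximum level of x is <= z: Dirac mass, x unchanged
    y = list(x[: Tz + 1])         # keep x_0, ..., x_{T_z(x)} (branching point included)
    while not in_A(y[-1]) and len(y) < horizon:
        y.append(step(y[-1]))     # one fresh transition P(y_{t-1}, ·)
    return y
```

With a deterministic step the branching behavior is visible directly: the prefix up to the first exceedance of z is kept, the rest is regenerated until A is reached.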
Notice from the definition of the partial resampling kernel that if Ξ(x) ≤ z, then this kernel does not modify x: T_z(x) = +∞ and π_z(x, dx′) is a Dirac mass: Y_t = x_{t∧T_A(x)} for any t ∈ N.
Let us now check that Assumptions 1 and 2 are satisfied. The conditions of
Assumption 2 are direct consequences of the strong Markov property applied to
the chain t → Xt ∈ S defined by (21) at the stopping time τz (the strong Markov
property always holds true for discrete-time Markov processes).
The right-continuity property of Assumption 1 crucially relies on the definition (26) of T_z(x) as the entrance time of the path t ↦ x_t in the level set ξ^{−1}(]z, +∞[): the fact that ]z, +∞[ is an open set implies that z ↦ T_z(x) is right continuous. Indeed, we have the following lemma.
L EMMA 4.4. Assumption 1 is satisfied for the partial resampling kernel defined by (28). More precisely, for any x ∈ P, the function z ∈ R ↦ π_z(x, ·) ∈ Proba(P) is piecewise constant and right continuous.
P ROOF. First, assume that T_z(x) = +∞, which means that Ξ(x) ≤ z. Then, for any ε ≥ 0 we still have T_{z+ε}(x) = +∞. In that case, π_z(x, ·) is a Dirac mass:
π_z(x, ·) = π_{z+ε}(x, ·) = δ_{(x_{t∧T_A(x)})_{t≥0}}.
Now, assume that T_z(x) < +∞. Then, for ε ∈ ]0, ξ(x_{T_z(x)}) − z[, T_z(x) = T_{z+ε}(x), and by the definition of the partial resampling kernel, π_z(x, ·) = π_{z+ε}(x, ·).
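The case analysis of this proof can be observed numerically; in the sketch below (our own illustration, with ξ the identity on a real-valued path), T_z jumps exactly at the values ξ(x_t) and is right continuous thanks to the strict inequality in its definition:

```python
def T_z(x, xi, z):
    """T_z(x) = inf{t : xi(x_t) > z} (strict inequality), +inf if never exceeded."""
    return next((t for t, xt in enumerate(x) if xi(xt) > z), float("inf"))

# A fixed path with maximum level 3: z -> T_z(x) is piecewise constant,
# jumping only at the levels xi(x_t), and right continuous at each jump.
x = [0, 1, 3, 2]
xi = lambda s: s
assert T_z(x, xi, 0.9) == 1           # x_1 = 1 > 0.9
assert T_z(x, xi, 1.0) == 2           # strict inequality: level exactly 1 is not enough
assert T_z(x, xi, 1.0 + 1e-12) == 2   # right continuity at z = 1
assert T_z(x, xi, 3.0) == float("inf")
```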
4.5. The AMS algorithm. In this section, we introduce the AMS algorithm for
the sampling of Markov chain trajectories. It is based on the GAMS framework of
Section 2.2.1; in addition to the framework, as explained in Section 2.2.2, we make
precise the following rules: the stopping criterion, the computation of branching
numbers and the computation of the stopping levels. We check below that they
satisfy Assumption 3. As a consequence, the GAMS framework encompasses the
AMS algorithm, and unbiased estimators of (6) can be defined.
The AMS algorithm iteratively generates a system of weighted replicas in the state space P, using selection and partial resampling steps. The set of all the labels of the replicas generated by the algorithm up to iteration q is denoted by I^{(q)} = I^{(q)}_{on} ⊔ I^{(q)}_{off}, where I^{(q)}_{on} is the set of labels of “working” replicas and I^{(q)}_{off} is the set of labels of replicas which have been declared “retired.” We recall that ⊔ denotes the disjoint set union.
The cardinality of I^{(q)} is increasing, while the number of “working” replicas is kept fixed: card I^{(q)}_{on} = n_rep for any q, where n_rep is specified by the user of the algorithm.
An additional parameter k ∈ {1, . . . , n_rep − 1} is finally needed: it is the (minimum) number of replicas sampled at each step of the algorithm. The levels Z^{(q)} are computed as kth order statistics of the maximum levels of the “working” replicas.
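The computation of a level as a kth order statistic, and its insensitivity to ties between maximum levels, can be sketched as follows (illustrative code of our own, not part of the algorithm's formal description):

```python
def compute_level(max_levels, k):
    """Z = k-th order statistic of the working replicas' maximum levels.

    `max_levels` maps each working label n to the maximum level of X^(n);
    ties are harmless: Z does not depend on the sorting permutation chosen."""
    return sorted(max_levels.values())[k - 1]

# Four working replicas, two of which share the lowest maximum level.
levels = {1: 0.7, 2: 0.2, 3: 0.9, 4: 0.2}
assert compute_level(levels, 2) == 0.2   # tie at the bottom: Z still well defined
assert compute_level(levels, 3) == 0.7
```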
UNBIASEDNESS OF SOME GAMS ALGORITHMS 3585
F IG . 1. Schematic representation of the first iteration of the AMS algorithm, with nrep = 4 and
k = 2. The replicas numbered 2 and 4 are declared retired at the first iteration, and are replaced
by the replicas with label 5 and 6, which are respectively partially resampled from the replicas with
labels 3 and 1.
At iteration q, all replicas with maximum level lower than or equal to Z^{(q)} are declared “retired,” and new replicas are sampled in order to keep a fixed number n_rep of replicas with maximum level strictly larger than Z^{(q)}.
We are now in position to introduce the AMS algorithm in full detail (see Fig-
ure 1 for a schematic representation of one iteration of the algorithm).
The initialization step (q = 0):
(i) Let (X^{(n,0)})_{1≤n≤n_rep} be i.i.d. replicas of the stopped Markov chain in P, distributed according to π. Initialize the sets of labels of working and retired replicas: I^{(0)} = I^{(0)}_{on} = {1, . . . , n_rep} and I^{(0)}_{off} = ∅.
(ii) Initialize the weights: G^{(n,0)} = 1/n_rep for n ∈ {1, . . . , n_rep}.
(iii) Compute a permutation Σ^{(0)} of I^{(0)}_{on} = {1, . . . , n_rep} such that
Ξ(X^{(Σ^{(0)}(1),0)}) ≤ · · · ≤ Ξ(X^{(Σ^{(0)}(n_rep),0)}).
6 Notice that Σ^{(0)} is not necessarily unique since several replicas may have the same maximum level. Nevertheless, the level Z^{(0)} does not depend on the choice of Σ^{(0)}. The same remark applies to the definition of the level Z^{(q+1)} at iteration q ≥ 0.
The stopping criterion. If Z^{(q)} > z_max, then the algorithm stops and we set Q_iter = q. Else perform the following four steps.
The splitting (branching) step:
(i) Set
I^{(q)}_{on,>Z^{(q)}} = {n ∈ I^{(q)}_{on} : Ξ(X^{(n,q)}) > Z^{(q)}} and I^{(q)}_{on,≤Z^{(q)}} = I^{(q)}_{on} ∖ I^{(q)}_{on,>Z^{(q)}}.
Set K^{(q+1)} = card I^{(q)}_{on,≤Z^{(q)}} = n_rep − card I^{(q)}_{on,>Z^{(q)}} ≥ k.
(ii) Introduce a new set I^{(q+1)}_{new} = {card I^{(q)} + 1, . . . , card I^{(q)} + K^{(q+1)}} ⊂ N^* ∖ I^{(q)}.
The stopping criterion. In the AMS algorithm, we set S^{(q)} = 1_{Z^{(q)} ≤ z_max}, which is indeed an F^{(q)}-measurable random variable, since Z^{(q)} is an (F^{(q)}_z)_{z∈R}-stopping level; see Lemma 4.6 below.
Thus, for n ∈ I^{(q)}_{on,>Z^{(q)}} (and since P^{(q+1)}(n) = n), the formula G^{(n,q+1)} = ((n_rep − K^{(q+1)})/n_rep) G^{(n,q)} in (29) for the AMS algorithm is indeed consistent with the updating formula (11) for the weights in the GAMS framework.
which is again consistent with the updating formula (11) for the weights in the GAMS framework, since (n_rep − K^{(q+1)})/n_rep = 1/E(B^{(P^{(q+1)}(n),q+1)} | F^{(q)}).
Computation of the stopping levels. Let us now check that the requirements on
Z (q) in Assumption 3 are satisfied. By definition of Z (q+1) [see the level computa-
tion step of the AMS algorithm, with (30)], it is clear that Z (q+1) ≥ Z (q) (actually,
the strict inequality Z^{(q+1)} > Z^{(q)} holds). It remains to prove that Z^{(q)} is a stopping level for the filtration (F^{(q)}_z)_{z∈R}.
We start with an elementary result, which again highlights the importance of the
strict inequality > z in the definitions (26) of Tz (x) and of τz = Tz (X).
L EMMA 4.6. Let X : Ω → P be a Markov chain over the state space S [see equation (20)]. Then the random variable Ξ(X) [where, we recall, the maximum level mapping Ξ is defined by (25)] is a (filt^X_z)_{z∈R}-stopping level: for any z ∈ R,
{Ξ(X) ≤ z} ∈ filt^X_z.
We are now in position to prove the last result which is needed for Assumption 3
to hold.
L EMMA 4.7. For any q ≥ 0, Z^{(q)} is a stopping level with respect to the filtration (F^{(q)}_z)_{z∈R}: for any z ∈ R, {Z^{(q)} ≤ z} ∈ F^{(q)}_z.
Therefore, for any z ∈ R (using the partition Ω = {L^{(q+1)}_max ≤ z} ⊔ {L^{(q+1)}_max > z}),

{Z^{(q+1)} ≤ z} = {L^{(q+1)}_k ≤ z} ∩ {L^{(q+1)}_k < L^{(q+1)}_max}
= {L^{(q+1)}_k < L^{(q+1)}_max ≤ z} ⊔ {L^{(q+1)}_k ≤ z < L^{(q+1)}_max}.
4.7. The AMS estimator for the probability (22). An immediate corollary of
the results of Sections 4.4 and 4.6 above is that the GAMS framework encom-
passes the AMS algorithm, and that the unbiasedness result, Theorem 3.2, proven
in Section 3 can be applied. We detail the particular example of the estimation of
the probability p = P(τB < τA ); see (22).
Observe that the weights are easily computed: for n ∈ I^{(q+1)}_{on}, G^{(n,q+1)} = (1/n_rep) ∏_{q′=1}^{q+1} (n_rep − K^{(q′)})/n_rep. The estimator (31) thus writes

p̂ = ∑_{n∈I^{(Q_iter)}_{on}} G^{(n,Q_iter)} 1_{T_B(X^{(n,Q_iter)})<T_A(X^{(n,Q_iter)})} = [∏_{q=1}^{Q_iter} (n_rep − K^{(q)})/n_rep] × (1/n_rep) card{n ∈ I^{(Q_iter)}_{on} : T_B(X^{(n,Q_iter)}) < T_A(X^{(n,Q_iter)})},

namely, up to the product of the successive weight factors, the proportion of working replicas that have reached B before A at the
final iteration. The properties of this estimator will be numerically investigated in
Section 5.
Note that the AMS algorithm presented in this section satisfies the almost sure
mass conservation property, Definition 3.1. Applying Theorem 3.2, we obtain the
following result.
C OROLLARY 4.8. For any choice of the number of replicas n_rep, of k, and of the reaction coordinate ξ [provided it satisfies condition (24)], p̂ defined by (31) is an unbiased estimator of the probability p = P(τ_B < τ_A) defined by (22): E(p̂) = p.
In the numerical experiments, we consider the empirical mean over N independent runs of the algorithm,

(33) p̄_N = (1/N) ∑_{m=1}^{N} p̂_m,

and denote by δ_N [see (34)] the size of the 95% empirical confidence interval computed using the empirical variance obtained over the N independent runs of the algorithm.
The section is organized as follows. In Section 5.1, we give an example on
which we discuss the efficiency of the AMS algorithm by studying how the con-
vergence of the estimator depends on the reaction coordinate ξ . In Section 5.2,
we draw some conclusions and practical recommendations from these numerical
experiments. We refer to [7], Section 5, for additional simulations.
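To make the whole procedure concrete, here is a self-contained Python sketch of the AMS algorithm of Section 4.5 on a toy example of our own: the fair ±1 random walk started at 1, with A = {0}, B = {10} and ξ(x) = x, so that the gambler's ruin formula gives p = P(τ_B < τ_A) = 1/10 exactly. It retires all replicas with maximum level ≤ Z (ties included) and handles extinction, as discussed above:

```python
import random

def walk_until_absorbed(start, lo, hi, rng):
    """Fair ±1 random walk from `start`, run until it enters A = {lo} or B = {hi}."""
    path = [start]
    while lo < path[-1] < hi:
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

def ams_run(n_rep=20, k=1, lo=0, hi=10, start=1, rng=random):
    """One realization of the AMS estimator p_hat of p = P(tau_B < tau_A)."""
    paths = [walk_until_absorbed(start, lo, hi, rng) for _ in range(n_rep)]
    weight = 1.0
    while True:
        maxes = [max(p) for p in paths]
        Z = sorted(maxes)[k - 1]               # level = k-th order statistic
        if Z >= hi:                            # stopping criterion (z_max = hi)
            break
        survivors = [n for n in range(n_rep) if maxes[n] > Z]
        if not survivors:                      # extinction of the population
            return 0.0
        weight *= len(survivors) / n_rep       # weight update: (n_rep - K)/n_rep
        for n in range(n_rep):
            if maxes[n] <= Z:                  # retire n, branch from a survivor
                src = paths[rng.choice(survivors)]
                T = next(t for t, s in enumerate(src) if s > Z)   # T_Z(src)
                paths[n] = src[:T + 1] + walk_until_absorbed(src[T], lo, hi, rng)[1:]
    hits_B = sum(1 for p in paths if p[-1] == hi)
    return weight * hits_B / n_rep
```

Averaging ams_run over many independent runs gives an empirical mean p̄_N close to 1/10, in line with the unbiasedness result; the tie-handling and extinction branches correspond to the discrete-time subtleties emphasized in this section.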
so that x0 ∈ R2 \ (A ∪ B).
5.1.2. Evolution of the empirical mean. Let us first perform simulations with N independent runs of the algorithm, N varying between 1 and 6·10^6. We represent in Figure 3 the evolution as a function of N of the empirical mean p̄_N [defined by (33)] and of the associated 95% confidence intervals [p̄_N − δ_N/2, p̄_N + δ_N/2] computed using the empirical variance; see (34).
The colors in the figures are as follows: green (solid line) for ξ 1 , red (line with
crosses) for ξ 2 and blue (line with circles) for ξ 3 . The full lines represent the
evolution of the upper and lower bounds of the confidence intervals, while dashed
lines represent the evolution of the empirical means.
From these simulations, we observe that:
• When N is sufficiently large, the confidence intervals overlap. This is in agree-
ment with the fact that p̂ is an unbiased estimator of p whatever the choice of
the reaction coordinate.
• The statistical fluctuations depend a lot on the reaction coordinate. In particular,
the results obtained with ξ 1 seem much better than with ξ 2 or ξ 3 . We will come
back to this in Section 5.1.3.
• The confidence interval being computed empirically, one may wrongly conclude that the algorithm is biased by considering the results for N too small (see, e.g., the graphs in the right column in Figure 3). This is due to the fact that the empirical variance dramatically underestimates the real variance if N is too small. This is a well-known phenomenon for splitting algorithms in general, called “apparent bias”; see [24]. As β gets larger (namely as the temperature gets smaller), the number of independent runs N required to observe overlapping confidence intervals gets larger.

F IG . 3. Evolution as a function of N of the empirical mean p̄_N and of the associated 95% confidence intervals [p̄_N − δ_N/2, p̄_N + δ_N/2]. Upper to lower: β = 8.67, 9.33, 10. The right inserts are zooms of the left graphs on smaller values of N, in order to illustrate the “apparent bias” phenomenon.
We observe that there are some realizations for which the estimator of the prob-
ability is very large. These realizations have small probability but they dramati-
cally increase the value of the empirical mean and of the empirical variance. This
explains the large variations which are observed on the empirical average and confidence interval as a function of the number of realizations; see Figure 3. As is usually the case with Monte Carlo methods for rare event simulation, it
is impossible to decide a priori if the sample size N is sufficiently large to give
an accurate estimation. However, using the unbiasedness property (Theorem 3.2),
a sensible way to choose a number of realizations N is to set it sufficiently large so
that the confidence intervals obtained with different reaction coordinates overlap.
TABLE 1
The bi-channel case. Proportions and conditional probabilities for two reaction coordinates: the norm to the initial point (ξ^1) and the abscissa (ξ^3); e–n stands for 10^{−n}

ξ^1  8.67  2·10^6  0.81  0.45    0.03  0.52  2.7e–09  3.0e–09  2.3e–09  1.7e–09
ξ^3  8.67  2·10^6  0.99  0.0008  0.02  0.98  2.3e–06  5.9e–10  5.5e–10  2.4e–09
ξ^1  9.33  4·10^6  0.72  0.51    0.02  0.47  6.2e–10  6.3e–10  2.5e–10  3.2e–10
ξ^3  9.33  4·10^6  0.97  0.0005  0.02  0.98  1.0e–06  5.6e–11  9.7e–11  6.0e–10
ξ^1  10    6·10^6  0.62  0.51    0.01  0.48  1.5e–10  1.4e–10  5.2e–11  6.2e–11
ξ^3  10    6·10^6  0.92  0.0004  0.01  0.99  1.4e–07  1.5e–11  1.8e–11  6.8e–11
y > 0.5 (resp., such that y ≤ 0.5). More precisely, let us define the coordinate mappings Φ_1(x, y) = x and Φ_2(x, y) = y for any (x, y) ∈ R^2. For a replica X = (X_t)_{t∈N} such that τ = inf{t ∈ N : Φ_1(X_t) > 0} < ∞, X ∈ Upper if Φ_2(X_τ) > 0.5 and X ∈ Lower if Φ_2(X_τ) ≤ 0.5.
For each realization of the algorithm, we compute the three following quantities:
• the number of replicas which reach B before A:
M_B = ∑_{n∈I^{(Q_iter)}_{on}} 1_{T_B(X^{(n,Q_iter)})<T_A(X^{(n,Q_iter)})};
• the number of replicas which reach B before A and go through the upper channel:
M_{B,upper} = ∑_{n∈I^{(Q_iter)}_{on}} 1_{T_B(X^{(n,Q_iter)})<T_A(X^{(n,Q_iter)})} 1_{X^{(n,Q_iter)}∈Upper};
• the number of replicas which reach B before A and go through the lower channel:
M_{B,lower} = ∑_{n∈I^{(Q_iter)}_{on}} 1_{T_B(X^{(n,Q_iter)})<T_A(X^{(n,Q_iter)})} 1_{X^{(n,Q_iter)}∈Lower}.
• Only the lower channel is used by the replicas reaching B before A:
E^{lower}_N = {m ∈ E_N : M^m_{B,upper} = 0} and ρ^{lower}_N = card E^{lower}_N / card E_N.
• Both channels are used by the replicas reaching B before A:
E^{mix}_N = E_N ∖ (E^{upper}_N ∪ E^{lower}_N) and ρ^{mix}_N = card E^{mix}_N / card E_N.
Obviously, ρ^{upper}_N + ρ^{lower}_N + ρ^{mix}_N = 1. Finally, we define conditional estimators for p̂ associated with the partition of E_N defined above:

p̃^{upper}_N = (∑_{m∈E^{upper}_N} p̂_m) / card E^{upper}_N,  p̃^{lower}_N = (∑_{m∈E^{lower}_N} p̂_m) / card E^{lower}_N  and  p̃^{mix}_N = (∑_{m∈E^{mix}_N} p̂_m) / card E^{mix}_N.

Notice that

p̄_N = R_N (ρ^{upper}_N p̃^{upper}_N + ρ^{lower}_N p̃^{lower}_N + ρ^{mix}_N p̃^{mix}_N).
In other words, we have separated the nonzero contributions to p̄_N into (i) realizations for which all the replicas go through the upper channel (first term in the
parenthesis), (ii) realizations for which all the replicas go through the lower chan-
nel (second term in the parenthesis) and finally (iii) realizations for which the two
channels are used by the replicas (third term in the parenthesis).
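The decomposition identity can be checked on synthetic data; in the sketch below (illustrative names of our own; E_N is assumed to collect the runs with a nonzero estimate, so that the runs labeled "zero" have p̂_m = 0):

```python
def decompose(p_hats, channels):
    """Empirical decomposition of the mean of p_hat over the channel partition.

    channels[m] in {"upper", "lower", "mix", "zero"} classifies run m; returns
    (R, rho, p_tilde) such that mean(p_hats) = R * sum_c rho[c] * p_tilde[c]."""
    N = len(p_hats)
    nonzero = [m for m in range(N) if channels[m] != "zero"]
    R = len(nonzero) / N                      # proportion of nonzero realizations
    rho, p_tilde = {}, {}
    for c in ("upper", "lower", "mix"):
        idx = [m for m in nonzero if channels[m] == c]
        rho[c] = len(idx) / len(nonzero) if nonzero else 0.0
        p_tilde[c] = sum(p_hats[m] for m in idx) / len(idx) if idx else 0.0
    return R, rho, p_tilde
```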
Let us emphasize that, contrary to p̄_N, the limit when N → ∞ of the estimators R_N, ρ^{upper}_N, ρ^{lower}_N, ρ^{mix}_N, p̃^{upper}_N, p̃^{mix}_N or p̃^{lower}_N (for a given value of n_rep) depends on the choice of the reaction coordinate ξ; see [7], Remark 5.5.
From Table 1, we observe that for ξ^1, approximately half of the realizations use exclusively the upper channel and the other half use the lower channel. The associated conditional estimators p̃^{upper}_N and p̃^{lower}_N are very close. This is not the case for ξ^3: only very few realizations go through the upper channel, while the associated probability p̃^{upper}_N is much larger than the two other ones, p̃^{lower}_N and p̃^{mix}_N. This means that a few realizations contribute a lot to the empirical average p̄_N. This explains the very large confidence intervals observed with ξ^3 (in comparison with those observed for ξ^1) in Figure 3.
REFERENCES
[1] Asmussen, S. and Glynn, P. W. (2007). Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability 57. Springer, New York. MR2331321
[2] Au, S. K. and Beck, J. L. (2001). Estimation of small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics 16 263–277.
[3] Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed. Wiley Series in Probability and Statistics: Probability and Statistics. Wiley, New York. MR1700749
[4] Blanchet, J., Glynn, P. and Liu, J. C. (2006). State-dependent importance sampling and large deviations. In Proceedings of the 1st International Conference on Performance Evaluation Methodologies and Tools, Valuetools '06. ACM, New York.
[5] Bréhier, C.-E. (2015). Large deviations principle for the adaptive multilevel splitting algorithm in an idealized setting. ALEA Lat. Am. J. Probab. Math. Stat. 12 717–742. MR3446035
[6] Bréhier, C.-E., Chaudru de Raynal, P. E., Lemaire, V., Panloup, F. and Rey, C. (2015). Recent advances in various fields of numerical probability. ESAIM Proceedings and Reviews 51 272–292.
[7] Bréhier, C.-E., Gazeau, M., Goudenège, L., Lelièvre, T. and Rousset, M. (2015). Unbiasedness of some generalized adaptive multilevel splitting algorithms. Preprint. Available at arXiv:1505.02674.
[8] Bréhier, C.-E., Gazeau, M., Goudenège, L. and Rousset, M. (2015). Analysis and simulation of rare events for SPDEs. ESAIM Proceedings and Reviews 48 364–384.
[9] Bréhier, C.-E., Goudenège, L. and Tudela, L. (2014). Central limit theorem for adaptive multilevel splitting estimators in an idealized setting. In Monte Carlo and Quasi-Monte Carlo Methods. Springer, MCQMC, Leuven, Belgium.
[10] Bréhier, C.-E., Lelièvre, T. and Rousset, M. (2015). Analysis of adaptive multilevel splitting algorithms in an idealized case. ESAIM Probab. Stat. 19 361–394. MR3417480
[11] Bucklew, J. A. (2004). Introduction to Rare Event Simulation. Springer, New York. MR2045385
[12] Caron, V., Guyader, A., Munoz Zuniga, M. and Tuffin, B. (2014). Some recent results in rare event estimation. In Journées MAS 2012. ESAIM Proc. 44 239–259. EDP Sci., Les Ulis. MR3178620
[13] Cérou, F., Del Moral, P., Furon, T. and Guyader, A. (2012). Sequential Monte Carlo for rare event estimation. Stat. Comput. 22 795–808. MR2909622
[14] Cérou, F., Del Moral, P., Le Gland, F. and Lezaud, P. (2006). Genetic genealogical models in rare event analysis. ALEA Lat. Am. J. Probab. Math. Stat. 1 181–203. MR2249654
[15] Cérou, F. and Guyader, A. (2007). Adaptive multilevel splitting for rare event analysis. Stoch. Anal. Appl. 25 417–443. MR2303095
[16] Cérou, F. and Guyader, A. (2016). Fluctuation analysis of adaptive multilevel splitting. Ann. Appl. Probab. 26 3319–3380.
[17] Cérou, F., Guyader, A., Lelièvre, T. and Pommier, D. (2011). A multiple replica approach to simulate reactive trajectories. J. Chem. Phys. 134 054108.
[18] Dean, T. and Dupuis, P. (2009). Splitting for rare event simulation: A large deviation approach to design and analysis. Stochastic Process. Appl. 119 562–587. MR2494004
[19] Del Moral, P. (2004). Feynman–Kac Formulae. Springer, New York. MR2044973
[20] Del Moral, P., Doucet, A. and Jasra, A. (2006). Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 411–436. MR2278333
[21] Del Moral, P. and Garnier, J. (2005). Genealogical particle analysis of rare events. Ann. Appl. Probab. 15 2496–2534. MR2187302
[22] Dupuis, P. and Wang, H. (2004). Importance sampling, large deviations, and differential games. Stoch. Stoch. Rep. 76 481–508. MR2100018
[23] Garvels, M. J. J., Kroese, D. P. and Van Ommeren, J. C. W. (2002). On the importance function in splitting simulation. European Transactions on Telecommunications 13 363–371.
[24] Glasserman, P., Heidelberger, P., Shahabuddin, P. and Zajic, T. (1998). A large deviations perspective on the efficiency of multilevel splitting. IEEE Trans. Automat. Control 43 1666–1679. MR1658685
[25] Glasserman, P., Heidelberger, P., Shahabuddin, P. and Zajic, T. (1999). Multilevel splitting for estimating rare event probabilities. Oper. Res. 47 585–600. MR1710951
[26] Glasserman, P. and Wang, Y. (1997). Counterexamples in importance sampling for large deviations probabilities. Ann. Appl. Probab. 7 731–746. MR1459268
[27] Guyader, A., Hengartner, N. and Matzner-Løber, E. (2011). Simulation and estimation of extreme quantiles and extreme probabilities. Appl. Math. Optim. 64 171–196. MR2822407
[28] Hammersley, J. M. and Handscomb, D. C. (1965). Monte Carlo Methods. Methuen, London. MR0223065
[29] Johansen, A. M., Del Moral, P. and Doucet, A. (2006). Sequential Monte Carlo samplers for rare events. In Proceedings of the 6th International Workshop on Rare Event Simulation (RESIM 2006) 256–267, Bamberg.
[30] Kahn, H. and Harris, T. E. (1951). Estimation of particle transmission by random sampling. National Bureau of Standards 12 27–30.
[31] Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Graduate Texts in Mathematics 113. Springer, New York. MR1121940
[32] Lagnoux-Renaudie, A. (2008). Effective branching splitting method under cost constraint. Stochastic Process. Appl. 118 1820–1851. MR2454466
[33] Lagnoux-Renaudie, A. (2009). A two-step branching splitting model under cost constraint for rare event analysis. J. Appl. Probab. 46 429–452. MR2535824
[34] Metzner, P., Schütte, C. and Vanden-Eijnden, E. (2006). Illustration of transition path theory on a collection of simple examples. J. Chem. Phys. 125 084110. Available at http://dx.doi.org/10.1063/1.2335447.
[35] Park, S., Sener, M. K., Lu, D. and Schulten, K. (2003). Reaction paths based on mean first-passage times. J. Chem. Phys. 119 1313–1319.
[36] Rolland, J. and Simonnet, E. (2015). Statistical behaviour of adaptive multilevel splitting algorithms in simple models. J. Comput. Phys. 283 541–558. MR3294688
[37] Rosenbluth, M. N. and Rosenbluth, A. W. (1955). Monte Carlo calculation of the average extension of molecular chains. J. Chem. Phys. 23 356–359.
[38] Rubino, G. and Tuffin, B. (2009). Introduction to rare event simulation. In Rare Event Simulation Using Monte Carlo Methods 1–13. Wiley, Chichester. MR2730759
[39] Shiryayev, A. N. (1984). Probability. Graduate Texts in Mathematics 95. Springer, New York. MR0737192
[40] Simonnet, E. (2014). Combinatorial analysis of the adaptive last particle method. Stat. Comput. 26 211–230.
[41] Skilling, J. (2006). Nested sampling for general Bayesian computation. Bayesian Anal. 1 833–859 (electronic). MR2282208
[42] Vanden-Eijnden, E. and Weare, J. (2012). Rare event simulation of small noise diffusions. Comm. Pure Appl. Math. 65 1770–1803. MR2982641
[43] Villén-Altamirano, M. and Villén-Altamirano, J. (1991). RESTART: A method for accelerating rare events simulations. In Proceedings of the Thirteenth International Teletraffic Congress (Copenhagen, Denmark, June 19–26), Queueing, Performance and Control in ATM: ITC-13 Workshops 71–76. North-Holland, Amsterdam.
[44] Walter, C. Moving particles: A parallel optimal multilevel splitting method with applications in quantiles estimation and meta-model-based algorithms. Struct. Saf. 55 10–25.