BréhierEtal 2016

The Annals of Applied Probability
2016, Vol. 26, No. 6, 3559–3601

DOI: 10.1214/16-AAP1185
© Institute of Mathematical Statistics, 2016
UNBIASEDNESS OF SOME GENERALIZED ADAPTIVE MULTILEVEL SPLITTING ALGORITHMS
BY CHARLES-EDOUARD BRÉHIER∗,1, MAXIME GAZEAU†, LUDOVIC GOUDENÈGE‡, TONY LELIÈVRE§,2 AND MATHIAS ROUSSET§,2
CNRS and Institut Camille Jordan, Université Lyon 1∗, University of Toronto†, Fédération de Mathématiques de l’École Centrale Paris‡ and Université Paris-Est, CERMICS (ENPC), INRIA§
We introduce a generalization of the Adaptive Multilevel Splitting algo-
rithm in the discrete time dynamic setting, namely when it is applied to sam-
ple rare events associated with paths of Markov chains. We build an estimator
of the rare event probability (and of any nonnormalized quantity associated
with this event) which is unbiased, whatever the choice of the importance
function and the number of replicas. This has practical consequences on the
use of this algorithm, which are illustrated through various numerical experi-
ments.
1. Introduction. The efficient sampling of rare events is a very important

topic in various application fields such as reliability analysis, computational statis-
tics or molecular dynamics. Let us start with describing the typical problem of
interest in the context of molecular dynamics.
Received June 2015; revised November 2015.
1 Supported by SNF Grant 200020-149871/1 for having funded his postdoctoral position at University of Neuchâtel (January 2015–August 2015).
2 Supported by the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement Number 614492.
MSC2010 subject classifications. 65C05, 65C35.
Key words and phrases. Rare event, adaptive multilevel splitting algorithms, unbiased estimator.
1.1. Motivation and mathematical setting. Let us consider the Markov chain $(X_t)_{t\in\mathbb{N}}$ defined as the time discretization of the overdamped Langevin dynamics:

$$\forall t \in \mathbb{N}, \quad X_{t+1} - X_t = -\nabla V(X_t)\,h + \sqrt{2\beta^{-1}}\,\big(W_{(t+1)h} - W_{th}\big). \tag{1}$$

Typically, $X_t \in \mathbb{R}^{3N}$ is a high-dimensional vector giving the positions of $N$ particles in $\mathbb{R}^3$ at time $th$ ($h > 0$ being the time step size), $V : \mathbb{R}^{3N} \to \mathbb{R}$ is the potential function [for any set of positions $x \in \mathbb{R}^{3N}$, $V(x)$ is the energy of the configuration], $\beta = (k_B T)^{-1}$ is the inverse temperature and $W_t$ is a standard Brownian motion (so that $W_{(t+1)h} - W_{th}$ is a vector of $3N$ i.i.d. centered Gaussian random variables with variance $h$). In many cases of interest, the dynamics (1) is metastable: the $N$ particles remain trapped for very long times in some so-called metastable states. These are for instance regions located around local minima of $V$. This actually corresponds to a physical reality: the timescale at the molecular level (given by $h$, which
is typically chosen at the limit of stability for the stochastic differential equation) is
much smaller than the timescales of interest, which correspond to hopping events
between metastable states. Let us denote by A ⊂ R3N and B ⊂ R3N two (disjoint)
metastable states. One problem of interest is then the following: for some initial
condition outside A and B and close to A, how to efficiently sample paths which
reach B before A. In the context of molecular dynamics, such paths are called re-
active paths. The efficient sampling of reactive paths is a very important subject in
many applications since it is a way to understand the mechanism of the transition
between metastable states. In mathematical terms, one is interested in computing, for a given test function $\varphi : (\mathbb{R}^{3N})^{\mathbb{N}} \to \mathbb{R}$ depending on the path $(X_t)_{t\in\mathbb{N}}$ of the Markov chain, the expectation

$$\mathbb{E}\big[\varphi\big((X_t)_{t\in\mathbb{N}}\big)\,\mathbf{1}_{\tau_B < \tau_A}\big], \tag{2}$$

where $\tau_A = \inf\{t \in \mathbb{N} : X_t \in A\}$, $\tau_B = \inf\{t \in \mathbb{N} : X_t \in B\}$ and $X_0 = x_0 \notin (A \cup B)$ is assumed to be a deterministic initial position close to $A$: most trajectories starting from $x_0$ hit $A$ before $B$. Generalization to a random initial condition is straightforward, by a conditioning argument. If $\varphi = 1$, the above expectation is $\mathbb{P}(\tau_B < \tau_A)$, namely the probability that the Markov chain reaches $B$ before $A$.
This is typically a very small probability: since A is metastable and x0 is close
to A, for most of the realizations, τA is smaller than τB . This is why naive Monte
Carlo methods will not give reliable estimates of (2). We refer for instance to [8,
17] for some examples in the context of molecular simulation. The problem we
would like to address in this article is thus the following: how to build “good”
estimators of (2), where (Xt )t∈N is a general Markov chain, and τA and τB are two
stopping times.
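To make the difficulty concrete, here is a minimal sketch of the naive Monte Carlo estimator of $\mathbb{P}(\tau_B < \tau_A)$ for the scheme (1) in dimension one. The double-well potential, the sets $A = ]-\infty, -1]$ and $B = [1, +\infty[$, and all function names are illustrative assumptions, not taken from the article.

```python
import math
import random

def grad_V(x):
    # Hypothetical double-well potential V(x) = x**4/4 - x**2/2 (minima at -1 and +1).
    return x**3 - x

def reaches_B_first(x0, beta, h, rng):
    """Run scheme (1) from x0 until the chain enters A = ]-inf, -1] or B = [1, +inf[."""
    x = x0
    # Standard deviation of sqrt(2/beta) * (W_{(t+1)h} - W_{th}).
    noise_scale = math.sqrt(2.0 * h / beta)
    while -1.0 < x < 1.0:
        x += -grad_V(x) * h + noise_scale * rng.gauss(0.0, 1.0)
    return x >= 1.0

def naive_monte_carlo(n_paths, x0, beta, h, seed=0):
    """Fraction of independent paths that reach B before A."""
    rng = random.Random(seed)
    hits = sum(reaches_B_first(x0, beta, h, rng) for _ in range(n_paths))
    return hits / n_paths
```

When $x_0$ is moved close to $A$ and $\beta$ is increased, almost all simulated paths return to $A$, so the number of paths needed to observe even a single transition grows like the inverse of the probability being estimated.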
1.2. A short review of the literature on rare event simulation. A complete re-
view of rare event simulation techniques is out of the scope of this article; we
instead refer the interested reader to [1, 11, 12, 38], for instance. Our aim in this
section is mainly to explain the interest of splitting techniques in general, and
Adaptive Multilevel Splitting (AMS) in particular, to simulate rare events in some
specific contexts.
Two main families of algorithms for the efficient sampling of rare events have
been studied and applied successfully in many contexts since the pioneering works
on Monte Carlo methods in the 1950s [28, 30, 37].
The first family is known as importance sampling: the probability distribution
of interest is modified using an importance function, in order to enhance the real-
ization of the rare events; unbiased estimators are recovered thanks to the use of
appropriate likelihood ratios.
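As an illustration of the likelihood ratio mechanism, the following sketch estimates a Gaussian tail probability by sampling from a shifted Gaussian. The toy problem and the function name are assumptions chosen for simplicity; they are not taken from the references above.

```python
import math
import random

def tail_probability_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by sampling Y ~ N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(threshold, 1.0)
        if y > threshold:
            # Likelihood ratio: density of N(0, 1) over density of N(threshold, 1) at y.
            total += math.exp(-threshold * y + 0.5 * threshold**2)
    return total / n_samples
```

The estimator is unbiased because each sample is reweighted by the ratio of the original density to the sampling density; note that this requires modifying the sampled model.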
It may happen, for example for industrial applications in engineering or chem-
istry, that the stochastic model is only given as a black-box and cannot be modified.
In such situations, importance sampling algorithms are impractical, and one may rely on splitting strategies as described below, for which by construction the model does not need to be modified and is used as a black-box.
The second family of methods is given by splitting algorithms, which are it-
erative procedures based on interacting systems of replicas. These are selected
using an importance function, and then possibly duplicated and weighted accord-
ingly. A common interpretation is that the state-space is decomposed in a nested
sequence of subsets (which are level-sets for the importance function), such that
the rare event probability can be written as a (telescoping) product of conditional
probabilities, which are easier to compute.
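The telescoping product can be written out as follows, in notation introduced here only for illustration: $z_1 < \cdots < z_J$ is a sequence of levels fixed a priori, and $E_j$ is the event that the path reaches the level set $\{\xi > z_j\}$ before $A$, so that $E_{j+1} \subset E_j$.

```latex
\mathbb{P}(E_J)
  = \mathbb{P}(E_1) \prod_{j=1}^{J-1} \mathbb{P}(E_{j+1} \mid E_j),
\qquad E_J \subset E_{J-1} \subset \cdots \subset E_1 .
```

Each factor is typically much larger than $\mathbb{P}(E_J)$ itself, which is why it is easier to estimate by direct simulation.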
Note that in the last 20 years, splitting algorithms have been studied exten-
sively, and many variants appeared: to cite a few, Subset Simulation [2], (Multi-
level) Splitting [25, 32, 33], Nested Sampling [41] and RESTART [43]. Note also
the relations with Genealogical Particle models [14, 19, 21] and Sequential Monte
Carlo methods [13, 20]. In the nonadaptive versions of the splitting method, the
nested sequence of subsets is fixed a priori and it is easy to build unbiased estima-
tors. However, to the best of our knowledge, similar results do not exist for adaptive
versions such as the Adaptive Multilevel Splitting (AMS) algorithm, where the se-
quence of subsets is built on-the-fly using the ensemble of replicas. This is the
focus of this article.
Adaptive versions of multilevel splitting algorithms, as proposed in [15], are needed in practice: the performance of the estimation indeed depends on the choice of the levels (i.e., of the nested subsets), which is not trivial if no additional information on the system is available. In previous theoretical studies [5, 9, 10, 15, 27, 40, 44], the consistency (estimators are unbiased) and the efficiency (the algorithm outperforms crude Monte Carlo methods in the rare event regime, when the size of the system of replicas increases) have been analyzed under very restrictive conditions, in so-called idealized settings. In particular, these conditions are not satisfied for processes in dimension larger than one, or for discrete-time dynamics.
Note that the sensitivity of the performance with respect to the choice of the im-
portance function is a well-known fact for both importance sampling and splitting
algorithms: the variance deteriorates for inappropriate choices; see [24, 26], and
numerical simulations in Section 5. Note that a general strategy to design impor-
tance functions for diffusions in small noise regimes is by approximating solutions
of Hamilton–Jacobi–Bellman equations; these equations are obtained by asymp-
totic analysis (often related to large deviations behavior); see, for instance, [4, 22,
42] for importance sampling and [18] for splitting algorithms. In the absence of
a small parameter (typically small noise) in order to perform asymptotic analysis,
such a technique breaks down.
1.3. The adaptive multilevel splitting algorithm. In this article, we focus on

the Adaptive Multilevel Splitting (AMS) method which has been proposed in [15].
Let us roughly describe the principle of the method; see Section 4.5 for the pre-
cise definition of the algorithm. The crucial ingredient we need is an importance
function:
$$\xi : \mathbb{R}^{3N} \to \mathbb{R} \tag{3}$$

which will be used to measure the advance of the paths towards $B$. This function is known as a reaction coordinate in the molecular dynamics community, and this is the terminology we will use here. In this paper, we also call $\xi(X_t)$ the level of the process $X_t$ at time $t$, and [see (25)]

$$\Xi(X) = \sup\{\xi(X_{t\wedge\tau_A}) : t \in \mathbb{N}\}$$

the maximum level of the Markov chain path $X$. A useful requirement on $\xi$ is the existence of a level $z_{\max} \in \mathbb{R}$ such that

$$B \subset \{x \in \mathbb{R}^{3N} : \xi(x) \in\, ]z_{\max}, \infty[\,\}.$$

Then, starting from a system of $n_{\mathrm{rep}}$ replicas (all starting from the same initial condition $x_0$ and stopped at time $\tau_A$), the idea is to remove the least fit paths and to duplicate the remaining paths while keeping a fixed number of replicas. The least fit paths are those with the smallest maximum levels $\Xi(X)$. As soon as one of the least fit paths is removed, one of the remaining paths is duplicated and then partially resampled: the new path is a copy of the path up to the maximum level of the removed least fit paths, and the end of the trajectory is then sampled using independent random numbers. The algorithm thus goes through three steps: (i) a
level computation step (to determine the level under which paths will be removed:
this level is computed as an empirical quantile over the maximum levels among
the replicas); (ii) a splitting step (to determine which paths will be removed and
which ones of the remaining paths will be duplicated); (iii) a partial resampling
step (to generate new paths from the selected paths). By iterating these three steps,
one obtains successively systems of nrep paths with an increasing minimum of the
maximum levels among the replicas. The algorithm is stopped when the current
level is larger than zmax , and an estimator of (2) is then built using a weighted
empirical average over the replicas. The adaptive feature of the algorithm is in
the first step (the level computation step): indeed, at each iteration, paths are re-
moved if their maximum level is below some threshold, and these thresholds are
determined iteratively using empirical quantiles, rather than by fixing a priori a
deterministic sequence of levels (as it would be the case in nonadaptive splitting,
or more generally in standard sequential Monte Carlo algorithms; see [13, 19]).
All the details of the algorithm will be given in Section 4.5.
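The three steps described above can be sketched on a toy example. The code below is only an illustration under stated assumptions, not the precise algorithm of Section 4.5: the Markov chain is a simple symmetric random walk on the integers with $A = \{a\}$ and $B = \{b\}$, the reaction coordinate is $\xi(x) = x$ with $z_{\max} = b - 1$, all replicas whose maximum level is at most the current level are removed, parents are chosen uniformly among survivors, and a parent is copied up to the first time it strictly exceeds the level. All names are hypothetical.

```python
import random

def run_path(x, a, b, rng):
    """Simple symmetric random walk started at x, stopped on hitting a or b."""
    path = [x]
    while a < path[-1] < b:
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

def ams_estimate(n_rep, k, x0, a, b, rng):
    """Sketch of an AMS estimator of P(tau_B < tau_A) with xi(x) = x."""
    z_max = b - 1  # B = {b} is contained in {xi > z_max}
    paths = [run_path(x0, a, b, rng) for _ in range(n_rep)]
    weight = 1.0
    while True:
        # (i) Level computation: k-th order statistic of the maximum levels.
        levels = [max(p) for p in paths]
        z = sorted(levels)[k - 1]
        if z > z_max:
            break
        # (ii) Splitting: remove every replica whose maximum level is <= z.
        survivors = [i for i in range(n_rep) if levels[i] > z]
        killed = [i for i in range(n_rep) if levels[i] <= z]
        if not survivors:
            return 0.0  # all replicas removed: the estimate is zero
        weight *= len(survivors) / n_rep
        # (iii) Partial resampling: copy a surviving path up to the first time
        # it is strictly above z, then continue with fresh random numbers.
        for i in killed:
            parent = paths[rng.choice(survivors)]
            t = next(s for s, x in enumerate(parent) if x > z)
            paths[i] = parent[: t + 1] + run_path(parent[t], a, b, rng)[1:]
    reached_b = sum(p[-1] >= b for p in paths)
    return weight * reached_b / n_rep
```

For this random walk the exact value is $\mathbb{P}(\tau_B < \tau_A) = (x_0 - a)/(b - a)$, so averaging many independent calls to `ams_estimate` gives a simple numerical check of the unbiasedness property, including in the presence of ties between maximum levels.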
In this work, we focus on the application of the AMS algorithm to sample
Markov chains, namely discrete time stochastic dynamics, and not continuous time
stochastic dynamics as in [15] for example. The reason is mainly practical: in most
cases of interest, even if the original model is continuous in time, it is discretized
in time when numerical approximations are needed. There are actually also many
cases where the original model is discrete in time (e.g. kinetic Monte Carlo or
Markov state models in the context of molecular dynamics).
The discrete time setting, which is thus of practical interest, raises specific ques-
tions in the context of the AMS algorithm. First, in the partial resampling step,
a natural question which is answered in this article is whether the path should be
copied up to the last time before or first time after it reaches the level of the re-
moved paths. Second, in the discrete time context, it may happen that several paths
have exactly the same maximum level. This implies some subtleties in the imple-
mentation of the splitting step which have a large influence on the quality of the
estimators. We refer to [6] and [7], Section 5.1 for concrete examples where bad
implementations lead to strongly biased results. One objective of the present arti-
cle is actually to elucidate a correct implementation of the AMS algorithm in such
a discrete time setting.
1.4. Main results and outline. The main results and outline of this work are
the following:
• In Section 2, we introduce the Generalized Adaptive Multilevel Splitting
(GAMS) framework, which encompasses the AMS algorithm presented above.
The interest of this generalized setting is twofold. First, it is very useful to write
variants of the classical AMS algorithm (see in particular [7], Section 3.5). Sec-
ond, it highlights the essential mathematical properties that are required to pro-
duce unbiased estimators of quantities such as (2).
• In Section 3, we state and prove the main theoretical result (Theorem 3.2) of
this article: algorithms which fit in the GAMS framework yield unbiased esti-
mators of the rare event probability, and more generally of any nonnormalized
expectation related to the rare event, of the form (2).
• Section 4 is devoted to the detailed presentation of the AMS algorithm discussed
above; appropriate implementations of the level computation and splitting steps
are made explicit in Section 4.5. In particular, in Section 4.6 we prove that the
algorithm fits in the GAMS framework of Section 2.
• Section 5 is entirely devoted to some numerical experiments which illustrate the
unbiasedness result, and discuss the efficiency of the AMS algorithm to sample
rare events. We discuss through various numerical experiments the influence of
the choice of the reaction coordinate ξ on the variance of the estimators and we
end up with some practical recommendations in order to get reliable estimates
using the AMS algorithm (see Section 5.2). In particular, using the unbiased-
ness property proven in this paper, it is possible to compare the results obtained
using different parameters (in particular different reaction coordinates) in order
to assess the quality of the numerical results. In addition, it is easy to build an
unbiased estimator with smaller variance by parallelizing computations using
many independent realizations of AMS with a fixed number of replicas.
Let us mention that we very often refer to [7], which is an extended version of
the present work with in particular additional numerical experiments, and detailed
extensions of the AMS algorithm which enter into the GAMS framework.
From a theoretical perspective, this article presents extensions of previous results on splitting algorithms in two directions. On the one hand, it is well known that unbiased estimators of quantities such as (2) can be built for nonadaptive splitting algorithms (i.e., with a fixed sequence of levels), and one of the main contributions of this article is to extend this highly desirable property, under appropriate assumptions, to adaptive algorithms (when levels are computed on-the-fly). On the other hand, compared to previous results in the literature concerning the AMS algorithm, the main novelty of this work is the proof of the unbiasedness in a general setting and whatever the parameters: the number of replicas, the (minimum) number of replicas sampled at each iteration and the reaction coordinate $\xi$. In previous works (see, for instance, [10, 27, 40]), unbiasedness is proved in an idealized setting, namely when the reaction coordinate is given by $\xi(x) = \mathbb{P}_x(\tau_B < \tau_A)$ (known as the committor function; here, the subscript $x \in \mathbb{R}^{3N}$ indicates that the Markov chain $X_t$ has $x$ as an initial condition), and for a different partial resampling step, where new replicas are sampled according to the conditional distribution of paths conditioned to reach the level of the removed replicas. In many cases of practical interest, in particular for sampling paths of discrete time processes, these two conditions are not met, and the AMS algorithm of Section 4 is a suitable generalization of existing algorithms to deal with such situations.
The proof of unbiasedness is inspired by the interpretation of the AMS algorithm as a sequential Monte Carlo algorithm in path space, in the spirit of [29] (the selection and mutation steps correspond respectively to the branching step and the partial resampling step in the AMS algorithm). In this interpretation, the iteration index (or “time” index) of the sequential algorithm is given by the increasing levels defined by the reaction coordinate. We refer the interested reader to [7], Section 3.4, where this analogy is made precise.
As explained above, the bias is only one part of the error when using the AMS
algorithm: the statistical error (namely the variance) also plays a crucial role in
the quality of the estimator as will be shown numerically in Section 5. There are
unfortunately very few theoretical results concerning the influence of the choice of
ξ on the statistical error. We refer to [9, 16] for an analysis of the statistical error.
For discussions about the role of ξ on the statistical error, we also refer to [23, 24,
36]. In particular, in the numerical experiments, we discuss situations for which
the confidence intervals of the estimators associated with different reaction coor-
dinates do not overlap if the number of independent realizations of the algorithm
is not sufficiently large. We relate this observation to the well-known phenomenon
of “apparent bias” for splitting algorithms; see [24].
We would like to stress that our results hold in the setting where a family of
partial resampling kernels indexed by the levels is available (see Section 2.1.2 for
a precise definition). This is particularly well suited to the sampling of trajectories
of Markov dynamics (see [7], Section 3.4, for another possible setting). In the ter-
minology of [29], we have in mind the dynamic setting (considered, e.g., in [15]),
and not the static setting (considered, e.g., in [13, 16]). Nested sampling, [41], is
one instance of an adaptive multilevel splitting algorithm devised in the static set-
ting, and which does not enter into our framework; indeed, the partial resampling
kernels used in practice (based on few steps of a Metropolis–Hastings algorithm)
do not satisfy our assumptions (see Assumption 2 below).
1.5. Notation. Before going into the details, let us provide some general nota-
tion which is useful in the following:
• The underlying probability space is denoted by $(\Omega, \mathcal{F}, \mathbb{P})$. For $m$ σ-fields $\mathcal{F}^1, \ldots, \mathcal{F}^m \subset \mathcal{F}$, $\mathcal{F}^1 \vee \cdots \vee \mathcal{F}^m$ denotes the smallest σ-field on $\Omega$ containing all the σ-fields $\mathcal{F}^1, \ldots, \mathcal{F}^m$. For any $t, s \in \mathbb{N} = \{0, 1, \ldots\}$, $t \wedge s = \min\{t, s\}$ and $t \vee s = \max\{t, s\}$. We use the convention $\inf \emptyset = +\infty$. For two sets $A$ and $B$ which are disjoint, $A \sqcup B$ denotes the disjoint set union.
• We work in the following standard setting: random variables take values in state
spaces E which are Polish (namely metrizable, complete for some distance dE
and separable). The associated Borel σ -field is denoted by B (E ). We will give
precise examples below (see, e.g., Section 4.1 for the space of trajectories for
Markov chains).
Then Proba($E$) denotes the set of probability distributions on $E$. It is endowed with the standard Polish structure associated with the Prohorov–Levy metric which metrizes convergence in distribution, that is, weak convergence of probabilities tested against continuous and bounded test functions (see, e.g., [3]). The distribution of an $E$-valued random variable $X$ will be denoted by Law($X$).
• If $E_1$ and $E_2$ are two Polish state spaces, a Markov kernel (or transition probability kernel) $\Pi(x_1, dx_2)$ from $E_1$ to $E_2$ is a measurable map from states $x_1 \in E_1$ to probability measures in Proba($E_2$).
• We use the following standard notation associated with probability transitions: for $\varphi : E_2 \to \mathbb{R}$ a bounded and measurable test function,

$$\Pi(\varphi)(x_1) = \int_{x_2 \in E_2} \varphi(x_2)\,\Pi(x_1, dx_2), \quad \forall x_1 \in E_1. \tag{4}$$

Similarly, we use the notation $\pi(\varphi) = \int_{x \in E_2} \varphi(x)\,\pi(dx)$ for $\pi \in$ Proba($E_2$).
• Let $X_1$ and $X_2$ be random variables with values respectively in $E_1$ and $E_2$, and $\Pi$ a Markov kernel from $E_1$ to $E_2$. We say that $X_2$ is sampled according to $\Pi(X_1, \cdot)$ [and we denote $X_2 \sim \Pi(X_1, \cdot)$] if $X_2 = f(X_1, U)$ a.s. where, on the one hand, $U$ is a random variable independent of $X_1$ and of all the random variables introduced before (namely at previous iterations of the algorithm) and, on the other hand, $f$ is a measurable function which is such that $\Pi(x_1, \cdot) = \mathrm{Law}(f(x_1, U))$, for Law($X_1$)-almost every $x_1 \in E_1$.
• Let X be a random variable with values in E , and G be a sub-σ -field of F .

We say that X is sampled conditionally on G if there exists a random variable
XG which is G -measurable, a random variable U independent of XG and all
random variables introduced before, and a measurable function f such that X =
f (XG , U ).
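• A random system of replicas in $E$ is denoted by

$$\mathcal{X} = (X^{(n)})_{n \in I} \in E^{\mathrm{rep}}, \quad \operatorname{card} I < +\infty, \tag{5}$$

where $I \subset \mathbb{N}^* = \mathbb{N} \setminus \{0\}$ is a random finite subset of labels and $(X^{(n)})_{n \in I}$ are elements of $E$. The space $E^{\mathrm{rep}}$ is endowed with the following distance: for $\mathcal{X}^1 = (x^{(1,n)})_{n \in I^1}$ and $\mathcal{X}^2 = (x^{(2,n)})_{n \in I^2}$ in $E^{\mathrm{rep}}$, we set

$$d(\mathcal{X}^1, \mathcal{X}^2) = \begin{cases} 2, & \text{if } I^1 \neq I^2, \\ \min\Big(\sum_{n \in I^1} d_E\big(x^{(1,n)}, x^{(2,n)}\big),\, 1\Big), & \text{if } I^1 = I^2. \end{cases}$$

Endowed with this distance, the set $E^{\mathrm{rep}}$ is Polish and we denote by $\mathcal{B}(E^{\mathrm{rep}})$ the Borel σ-field. This σ-field can also be written as follows:

$$\mathcal{B}(E^{\mathrm{rep}}) = \bigsqcup_{I \in \mathcal{I}} \mathcal{B}(E)^{\otimes I},$$

where $\mathcal{I}$ denotes the ensemble of finite subsets of $\mathbb{N}^*$ (which is a countable set).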

• When we consider systems of weighted replicas, to each replica X (n) of the sys-
tem X with label n ∈ I is attached a weight G(n) ∈ R+ , and we use the notation
X = (X(n) , G(n) )n∈I . The topological setting is the same as in the previous item,
E being replaced by the augmented state space E × R.
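For concreteness, the distance on systems of replicas introduced above can be sketched as follows, assuming (for illustration only) that a system is stored as a dictionary from labels to states and that the distance on the state space is passed as a function `d_E`.

```python
def replica_distance(system_1, system_2, d_E):
    """Distance between two systems of replicas.

    system_1, system_2: dicts mapping labels (positive integers) to states.
    d_E: distance function on the state space.
    """
    # Systems with different label sets are at distance 2.
    if system_1.keys() != system_2.keys():
        return 2.0
    # Same labels: sum of state distances, truncated at 1.
    total = sum(d_E(system_1[n], system_2[n]) for n in system_1)
    return min(total, 1.0)
```

Two systems with different label sets are thus always farther apart than two systems sharing the same labels, which is what makes the space a disjoint union over the finite sets of labels.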
2. Generalized adaptive multilevel splitting. In this section, we introduce

a general framework for adaptive multilevel splitting algorithms, which contains
in particular the AMS algorithm which is described in Section 4 and applied on
a concrete example in Section 5. We refer to this framework as the Generalized
Adaptive Multilevel Splitting (GAMS) framework.
The interest of this abstract presentation is twofold. First, it highlights the es-
sential mathematical properties that are required to produce unbiased estimators of
quantities such as (2), and more generally (6) below. As will be proven in Section 3,
any algorithm which enters into the GAMS framework yields unbiased estimators
of such quantities. Second, it is very useful to propose variants of the classical
AMS algorithm which still yield unbiased estimators; see [7], Section 3.5.
In Section 2.1, we introduce in a general setting the quantities we are interested
in computing, and the main ingredients we need to state the GAMS framework,
which is then presented in Section 2.2.
2.1. The setting. In this section, we introduce the main ingredients and as-
sumptions we need in order to define the GAMS framework.
Let us introduce the state space P (typically the path space of a random pro-
cess); it is assumed to be a Polish space and let us denote B (P ) its Borel σ -field.
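Let $X$ be a random variable with values in $(\mathcal{P}, \mathcal{B}(\mathcal{P}))$ and with probability distribution

$$\pi = \mathbb{P} \circ X^{-1} \in \mathrm{Proba}(\mathcal{P}).$$

The aim of the algorithms we present is to estimate

$$\pi(\varphi) = \int_{\mathcal{P}} \varphi(x)\,\pi(dx) = \mathbb{E}[\varphi(X)] \tag{6}$$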
for a given bounded measurable observable ϕ : P → R. Typically, ϕ has strong

variations within a specific set of states occurring with small probability (rare
event). Devising an algorithm which is able to efficiently sample these states [so
as to obtain an estimator of π(ϕ) with a small variance] is the main goal of the
presented algorithms.
We need to introduce two ingredients: a filtration on (P , B (P )) denoted by
(filtz )z∈R and a transition probability kernel πz (x, ·) from R × P to P . Concrete
examples (mainly for sampling paths of Markov chains) of the objects and proce-
dures considered in the present section are given in Section 4.
2.1.1. The filtration. We need an additional structure on (P , B (P )), namely

a filtration indexed by a real number z ∈ R (that we refer to as “level” in the
following). It is denoted by
(7) (filtz )z∈R .
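This filtration is a nondecreasing family of sub-σ-fields of $\mathcal{B}(\mathcal{P})$: $\mathrm{filt}_z \subset \mathrm{filt}_{z'} \subset \mathcal{B}(\mathcal{P})$ for any $z < z'$. For a given $z \in \mathbb{R}$, the σ-field $\mathrm{filt}_z$ is interpreted below as containing the information required to partially resample replicas, conditionally on the knowledge of states up to level $z$. By convention, we set

$$\mathrm{filt}_{-\infty} = \{\emptyset, \mathcal{P}\}, \quad \mathrm{filt}_{+\infty} = \mathcal{B}(\mathcal{P}).$$

For any random variable $X : (\Omega, \mathcal{F}) \to (\mathcal{P}, \mathcal{B}(\mathcal{P}))$, we define a filtration $(\mathrm{filt}^X_z)_{z \in \mathbb{R}}$ on the probability space by pulling back the filtration $(\mathrm{filt}_z)_{z \in \mathbb{R}}$:

$$\mathrm{filt}^X_z = X^{-1}(\mathrm{filt}_z). \tag{8}$$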
2.1.2. The partial resampling kernel $\pi_z(x, \cdot)$. We introduce a transition probability kernel from $\mathbb{R} \times \mathcal{P}$ to $\mathcal{P}$:

$$(z, x) \in \mathbb{R} \times \mathcal{P} \mapsto \pi_z(x, \cdot) \in \mathrm{Proba}(\mathcal{P}).$$

By convention, for any $x \in \mathcal{P}$, we set

$$\pi_{-\infty}(x, \cdot) = \pi, \quad \pi_{+\infty}(x, \cdot) = \delta_x,$$

which is consistent with Assumption 1 below.
This kernel is used to perform the partial resampling in the algorithm: for a given level $z \in \mathbb{R}$ and a given state $x \in \mathcal{P}$, $\pi_z(x, dx')$ is the probability distribution of the offspring which is resampled from $x$ knowing the state $x$ up to level $z$.
In the following, we will refer to this transition probability kernel as a partial
resampling kernel.
Let us emphasize that it is implicitly assumed that one knows a practical proce-
dure to sample according to the probability measure π [step (ii) of the initialization
step below] and according to the probability distribution πz (x, ·), for any x ∈ P
and z ∈ R [step (ii) of the partial resampling step below].
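2.1.3. Assumptions on $(\mathrm{filt}_z)_{z \in \mathbb{R}}$ and $(\pi_z)_{z \in \mathbb{R}}$. We will need two assumptions on $(\mathrm{filt}_z)_{z \in \mathbb{R}}$ and $(\pi_z)_{z \in \mathbb{R}}$. The first one states a right-continuity property of the mapping $z \mapsto \pi_z(\varphi)(x)$ and is required to apply Doob’s optional stopping theorem in the proof of Lemma 3.6.
ASSUMPTION 1. For any $x \in \mathcal{P}$, and any continuous bounded test function $\varphi : \mathcal{P} \to \mathbb{R}$, the mapping

$$\mathbb{R} \to \mathbb{R}, \quad z \mapsto \pi_z(\varphi)(x) = \int_{y \in \mathcal{P}} \varphi(y)\,\pi_z(x, dy)$$

is right-continuous. Moreover, $\lim_{z \to -\infty} \pi_z(\varphi)(x) = \pi_{-\infty}(\varphi)(x) = \pi(\varphi)$.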
Second, we require a consistency relation between the filtration (filtz )z∈R and
the transition probability kernel (πz )z∈R .
ASSUMPTION 2. Let us consider a random variable $X$, $(\mathrm{filt}^X_z)_{z \in \mathbb{R}}$ and $(\pi_z)_{z \in \mathbb{R}}$ as introduced above. We assume the following consistency relation: if $X$ is distributed according to $\pi_z(x, \cdot)$ for some $(z, x) \in \mathbb{R} \times \mathcal{P}$, then for any $z' \geq z$ and for any bounded measurable test function $\varphi : \mathcal{P} \to \mathbb{R}$,

$$\mathbb{E}\big[\varphi(X) \,\big|\, \mathrm{filt}^X_{z'}\big] = \pi_{z'}(\varphi)(X) \quad \text{a.s.}$$

As a consequence (by letting $z \to -\infty$ in the previous assumption), if $X$ is distributed according to $\pi$, then, for any $z \in \mathbb{R}$, $\pi_z(X, \cdot)$ is a version of the law of $X$ conditional on $\mathrm{filt}^X_z$. Therefore, the σ-field $\mathrm{filt}_z$ defines $\pi_z(x, \cdot)$ for $\pi$-almost all $x$, and can be interpreted as containing all the information on a replica $X$ required to partially resample its state from level $z$.
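2.1.4. Filtrations generated by systems of replicas. We can construct a filtration $(\mathrm{filt}^{\mathrm{rep}}_z)_{z \in \mathbb{R}}$ on the space of replicas $(\mathcal{P}^{\mathrm{rep}}, \mathcal{B}(\mathcal{P}^{\mathrm{rep}}))$ [defined by (5)] by considering for all $z \in \mathbb{R}$ the σ-field

$$\mathrm{filt}^{\mathrm{rep}}_z = \bigsqcup_{I \in \mathcal{I}} (\mathrm{filt}_z)^{\otimes I},$$

where, we recall, $\mathcal{I}$ denotes the ensemble of finite subsets of $\mathbb{N}^*$.
Then, if $\mathcal{X} = (X^{(n)})_{n \in I} \in \mathcal{P}^{\mathrm{rep}}$ denotes a random system of replicas, that is, a random variable $\mathcal{X} : (\Omega, \mathcal{F}) \to (\mathcal{P}^{\mathrm{rep}}, \mathcal{B}(\mathcal{P}^{\mathrm{rep}}))$, we also define the filtration $(\mathrm{filt}^{\mathcal{X}}_z)_{z \in \mathbb{R}}$ by the pulling-back procedure:

$$\mathrm{filt}^{\mathcal{X}}_z = \mathcal{X}^{-1}\big(\mathrm{filt}^{\mathrm{rep}}_z\big).$$

We also consistently set

$$\mathrm{filt}^{\mathcal{X}}_{-\infty} = \mathcal{X}^{-1}\big(\mathrm{filt}^{\mathrm{rep}}_{-\infty}\big) = \sigma(I) \quad \text{and} \quad \mathrm{filt}^{\mathcal{X}}_{+\infty} = \mathcal{X}^{-1}\big(\mathrm{filt}^{\mathrm{rep}}_{+\infty}\big) = \sigma(\mathcal{X}),$$

where $\sigma(I)$ is the σ-field generated by the random set of labels $I$.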
REMARK 2.1 (On the definition of the filtrations). In many cases of practical interest, for any $z \in \mathbb{R}$, $\mathrm{filt}_z$ is defined as the smallest σ-field which makes a map $F_z : \mathcal{P} \to (E, \mathcal{B}(E))$ measurable, for some Polish space $E$. Then $\mathrm{filt}^{\mathrm{rep}}_z$ is the smallest σ-field which makes the map $G_z : \mathcal{P}^{\mathrm{rep}} \to E^{\mathrm{rep}}$ measurable, with $G_z((x^{(n)})_{n \in I}) = (F_z(x^{(n)}))_{n \in I}$.
2.1.5. Stopping levels. We finally introduce the notion of stopping level,

which is simply a reformulation of the notion of stopping time in our context where
the filtrations are indexed by levels instead of times.
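DEFINITION 2.2 (Stopping level, Stopped σ-field). Let $(\mathcal{F}_z)_{z \in \mathbb{R}}$ be a filtration on $(\Omega, \mathcal{F}, \mathbb{P})$. A stopping level $Z$ with respect to $(\mathcal{F}_z)_{z \in \mathbb{R}}$ is a random variable with values in $\mathbb{R}$ such that $\{Z \leq z\} \in \mathcal{F}_z$ for any $z \in \mathbb{R} \cup \{-\infty, +\infty\}$. The stopped σ-field, denoted by $\mathcal{F}_Z$, is characterized as follows:

$$A \in \mathcal{F}_Z \text{ if and only if } \forall z \in \mathbb{R},\ A \cap \{Z \leq z\} \in \mathcal{F}_z.$$

In particular, $Z$ is an $\mathcal{F}_Z$-measurable random variable.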
We are now in position to introduce the GAMS framework in the following

section.
2.2. The generalized adaptive multilevel splitting framework. The aim of this section is to introduce a general framework for splitting algorithms [which we refer to as the Generalized Adaptive Multilevel Splitting (GAMS) framework in the sequel]. It is a convenient, general procedure which makes it possible to define practically implementable algorithms.
It iterates over three successive steps: (1) a branching or splitting step, (2) a
partial resampling step and (3) a level computation step. These steps are performed
until a suitable stopping criterion is satisfied. We denote by Qiter the number of
iterations, which in general is a random variable.
At each iteration step $q \geq 0$ of the algorithm, the distribution $\pi$ is approximated by an empirical distribution

$$\hat{\pi}^{(q)} = \sum_{n \in I^{(q)}} G^{(n,q)}\,\delta_{X^{(n,q)}}, \tag{9}$$

over a system of weighted replicas $\mathcal{X}^{(q)} := (X^{(n,q)}, G^{(n,q)})_{n \in I^{(q)}} \in \mathcal{P}^{\mathrm{rep}}$, where $I^{(q)} \subset \mathbb{N}^*$ is the (random) finite set of labels at step $q$ of the algorithm and $G^{(n,q)} \in \mathbb{R}_+$ is the (random) weight attached to the replica $X^{(n,q)}$.
As will be proven in Section 3, any algorithm which enters into the GAMS framework is such that, for any bounded measurable function $\varphi : \mathcal{P} \to \mathbb{R}$, $\hat{\pi}^{(q)}(\varphi)$ is an unbiased estimator of $\pi(\varphi)$: for any $q \geq 0$, $\mathbb{E}(\hat{\pi}^{(q)}(\varphi)) = \pi(\varphi)$. Moreover, under appropriate assumptions (see Theorem 3.2), this statement can be generalized when $q$ is replaced by the random number of iterations $Q_{\mathrm{iter}}$ of the algorithm:

$$\mathbb{E}\Big(\sum_{n \in I^{(Q_{\mathrm{iter}})}} G^{(n,Q_{\mathrm{iter}})}\,\varphi\big(X^{(n,Q_{\mathrm{iter}})}\big)\Big) = \pi(\varphi).$$

The proof of this result (Theorem 3.2) is given in Sections 3.2 and 3.3.
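In practice, evaluating the estimator $\hat{\pi}^{(q)}(\varphi)$ amounts to a weighted sum over the current replicas. A minimal sketch, assuming (for illustration only) that the weighted system is stored as a dictionary from labels to (state, weight) pairs:

```python
def weighted_estimate(replicas, phi):
    """Return the sum over labels n of G^(n) * phi(X^(n)).

    replicas: dict mapping labels to (state, weight) pairs.
    phi: observable, a function of the state.
    """
    return sum(weight * phi(state) for state, weight in replicas.values())
```

Note that the weights are not renormalized to sum to one: the estimator targets the nonnormalized quantity $\pi(\varphi)$ directly.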
As will become clear, in order to obtain a fully implementable algorithm from the GAMS framework, three procedures need to be made precise: (i) the stopping criterion, (ii) the computation rule of the branching numbers and (iii) the computation of the stopping levels. These procedures require the definition of three sets of random variables that are used in the GAMS framework presented in Section 2.2.1: $(S^{(q)})_{q \geq 0}$ (the exit rule defining $Q_{\mathrm{iter}}$), $(B^{(n,q+1)})_{q \geq 0, n \in I^{(q)}}$ (the branching numbers defining duplication of replicas) and $(Z^{(q)})_{q \geq 0}$ (the random levels defining the partial resampling of replica states). The precise assumptions on these random variables required to have unbiasedness for (9) (Theorem 3.2) are stated in Assumption 3 below (Section 2.2.2). A concrete example of sets of random variables will be given in Section 4, where the AMS algorithm in a Markov chain context is presented.
2.2.1. Precise definition of the GAMS framework. We now introduce the Gen-
eralized Adaptive Multilevel Splitting (GAMS) framework, which is an iterative
procedure on an integer index q ≥ 0.
The initialization step (q = 0):

(i) Define the initial set of labels I (0) = {1, . . . , card I (0) } ⊂ N∗ , where card I (0)
is assumed to be positive and finite.
(ii) Let (X(n,0) )n∈I (0) be a sequence of P -valued i.i.d. random variables, dis-
tributed according to the probability measure π .
(iii) Initialize uniformly the weights: for any n ∈ I (0) set G(n,0) = 1/ card I (0) .
(iv) Define the system of weighted replicas $\mathcal{X}^{(0)} = (G^{(n,0)}, X^{(n,0)})_{n \in I^{(0)}}$ and, for
any $z \in \mathbb{R}$, define the σ-field of events $\mathcal{F}_z^{(0)} = \mathrm{filt}_z^{\mathcal{X}^{(0)}}$.
(v) Sample the initial level $Z^{(0)}$.
(vi) Define the σ-field of events³ $\mathcal{F}^{(0)} = \mathcal{F}^{(0)}_{Z^{(0)}}$.
Iteration. Iterate on q ≥ 0, while the stopping criterion is not satisfied.
The stopping criterion. Sample the random variable S (q) ∈ {0, 1}.
If S (q) = 0, then the algorithm stops and we set Qiter = q.
Otherwise, if S (q) = 1, the three following steps are performed.
The splitting (branching) step:
(i) Conditionally on F (q) , sample the N-valued random branching numbers
(B (n,q+1) )n∈I (q) .
Introduce the set of labels of replicas which are removed from the system:
\[ I^{(q+1)}_{\mathrm{killed}} = \{ n \in I^{(q)} : B^{(n,q+1)} = 0 \}. \]
(ii) Compute $K^{(q+1)} = \sum_{n \in I^{(q)}} \max\{B^{(n,q+1)} - 1, 0\}$, the total number of new
replicas.
(iii) Introduce the set $I^{(q+1)}_{\mathrm{new}} = \{\max I^{(q)} + 1, \ldots, \max I^{(q)} + K^{(q+1)}\} \subset \mathbb{N}^* \setminus I^{(q)}$
of new labels and update the current set of labels:
\[ I^{(q+1)} = \big( I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}} \big) \sqcup I^{(q+1)}_{\mathrm{new}}. \tag{10} \]
(iv) Set a children-parent map $P^{(q+1)} : I^{(q+1)}_{\mathrm{new}} \to I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}$ such that for any
$n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}$ we have
\[ \mathrm{card}\big\{ n' \in I^{(q+1)}_{\mathrm{new}} : P^{(q+1)}(n') = n \big\} = B^{(n,q+1)} - 1. \]
This map associates to the label of a new replica the label of its parent.
The map is extended to $I^{(q+1)}$ (i.e., to all remaining replicas) as follows:
$P^{(q+1)}(n) = n$ for any $n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}$.
(v) Update the weights as follows: for all $n' \in I^{(q+1)}$ and $n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}$ such
that $P^{(q+1)}(n') = n$,
\[ G^{(n',q+1)} = \frac{G^{(n,q)}}{\mathbb{E}(B^{(n,q+1)} \mid \mathcal{F}^{(q)})}. \tag{11} \]
³ Assumption 3 ensures that $Z^{(0)}$ is a $(\mathcal{F}_z^{(0)})_{z \in \mathbb{R}}$-stopping level, so that the
σ-field $\mathcal{F}^{(0)}_{Z^{(0)}}$ is well defined.
The partial resampling step:

(i) Replicas in $I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}$ are not modified, that is, for any
$n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}$, $X^{(n,q+1)} = X^{(n,q)}$.
(ii) For each $n' \in I^{(q+1)}_{\mathrm{new}}$, $X^{(n',q+1)}$ is sampled independently according to the
distribution $\pi_{Z^{(q)}}(X^{(P^{(q+1)}(n'),q)}, dx')$, that is, by partially resampling the state of
its parent replica $X^{(P^{(q+1)}(n'),q)}$ at level $Z^{(q)}$.
Then set $\mathcal{X}^{(q+1)} = (X^{(n,q+1)}, G^{(n,q+1)})_{n \in I^{(q+1)}}$.
The level computation step:
(i) For any $z \in \mathbb{R}$, define the σ-field of events
\[ \mathcal{F}_z^{(q+1)} = \mathcal{F}^{(q)} \vee \sigma\big(P^{(q+1)}\big) \vee \mathrm{filt}_z^{\mathcal{X}^{(q+1)}}. \tag{12} \]
Note that the σ-field $\sigma(P^{(q+1)})$ generated by $P^{(q+1)}$ contains in particular
the σ-field generated by $(B^{(n,q+1)})_{n \in I^{(q)}}$.
(ii) Sample the next level $Z^{(q+1)} \ge Z^{(q)}$.
(iii) Define the σ-field of events⁴ $\mathcal{F}^{(q+1)} = \mathcal{F}^{(q+1)}_{Z^{(q+1)}}$.
Increment. Increment q ← q + 1 and go back to the stopping criterion.
The branching number B (n,q+1) introduced in the splitting step (i) represents
the number of offspring of the replica X (n,q) . If B (n,q+1) ≥ 1, the replica X (n,q)
will be split into B (n,q+1) replicas: the old one (parent) X (n,q) with label n ∈ I (q)
and, if B (n,q+1) > 1, B (n,q+1) − 1 new ones (children) that are defined in the partial
resampling step. If B (n,q+1) = 0, the replica X (n,q) is removed from the system.
Note that the system of weighted replicas (X (q) )q≥0 and the associated filtration
(F (q) )q≥0 are actually defined for all q ≥ 0 (and not only up to the iteration Qiter ).
This is simply verified by considering the iterative procedure above with S (q) = 1
for all q ≥ 0.
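To fix ideas, the label and weight bookkeeping of the splitting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not part of the framework itself: the function name `splitting_step` and the dictionary-based data layout are hypothetical choices made here, and the conditional means E(B (n,q+1) |F (q) ) are assumed to be supplied by the caller rather than computed.

```python
from fractions import Fraction


def splitting_step(labels, weights, branching, mean_branching):
    """Sketch of the GAMS splitting step (labels, parent map, weights).

    labels         -- current labels I^(q) (positive integers)
    weights        -- dict n -> G^(n,q)
    branching      -- dict n -> sampled branching number B^(n,q+1)
    mean_branching -- dict n -> E(B^(n,q+1) | F^(q)), assumed given and > 0
    """
    # Replicas with zero offspring are removed from the system.
    killed = {n for n in labels if branching[n] == 0}
    kept = [n for n in labels if n not in killed]

    # Surviving replicas are their own parent; each gets B - 1 children
    # carrying fresh labels max I^(q) + 1, max I^(q) + 2, ...
    parent = {n: n for n in kept}
    next_label = max(labels) + 1
    for n in kept:
        for _ in range(branching[n] - 1):
            parent[next_label] = n
            next_label += 1

    # Weight rule (11): parent weight divided by the conditional mean.
    new_labels = sorted(parent)
    new_weights = {
        m: weights[parent[m]] / mean_branching[parent[m]] for m in new_labels
    }
    return new_labels, parent, new_weights
```

For instance, with three replicas of weight 1/3, branching numbers 0, 1 and 3, and (hypothetical) conditional means 1/2, 1 and 3/2, the first replica is killed, the second keeps weight 1/3, and the third together with its two children each receive weight 2/9. Exact rational arithmetic (`Fraction`) is used so that such identities can be checked without rounding.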
⁴ Assumption 3 ensures that $Z^{(q+1)}$ is a $(\mathcal{F}_z^{(q+1)})_{z \in \mathbb{R}}$-stopping level, so that the
σ-field $\mathcal{F}^{(q+1)}_{Z^{(q+1)}}$ is well defined.
REMARK 2.3. The choice of the new weights in (11) can be generalized in
the following way. In the splitting step (v), sample the new weights $G^{(n',q+1)}$
conditionally on $\mathcal{F}^{(q)}$ and $P^{(q+1)}$ and assume that the weights satisfy: for all
$n \in I^{(q)}$,
\[ \mathbb{E}\Big( \mathbf{1}_{B^{(n,q+1)} \ge 1} \sum_{n' : P^{(q+1)}(n') = n} G^{(n',q+1)} \,\Big|\, \mathcal{F}^{(q)} \Big) = G^{(n,q)}. \]
Note that the weights defined by (11) satisfy this generalized requirement, but
other strategies are also allowed, such as: for all $n' \in I^{(q+1)}$ and
$n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}$ such that $P^{(q+1)}(n') = n$,
\[ G^{(n',q+1)} = \frac{G^{(n,q)}}{B^{(n,q+1)} \, \mathbb{P}(B^{(n,q+1)} \ge 1 \mid \mathcal{F}^{(q)})}. \]
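The requirement of Remark 2.3 can be checked by direct enumeration for a single replica. The Python sketch below is an illustration only: it assumes that the conditional law of the branching number given F (q) is a known finite distribution, and the names `expected_mass`, `rule_11` and `rule_alt` as well as the numerical law are hypothetical. It uses the fact that, when B ≥ 1, exactly B replicas (the parent and its B − 1 children) have the given replica as parent.

```python
from fractions import Fraction


def expected_mass(law, weight_rule, g):
    """E[ 1_{B >= 1} * (sum of the weights of the B replicas with parent n) ].

    law         -- dict b -> P(B = b | F^(q)), a finite branching law
    weight_rule -- function (g, b, mean_b, p_alive) -> weight of each replica
    g           -- the parent weight G^(n,q)
    """
    mean_b = sum(b * p for b, p in law.items())
    p_alive = sum(p for b, p in law.items() if b >= 1)
    total = Fraction(0)
    for b, p in law.items():
        if b >= 1:
            # b replicas (parent and children) share the same new weight.
            total += p * b * weight_rule(g, b, mean_b, p_alive)
    return total


def rule_11(g, b, mean_b, p_alive):
    """Weight rule (11): divide by the conditional mean of B."""
    return g / mean_b


def rule_alt(g, b, mean_b, p_alive):
    """Alternative rule of Remark 2.3: divide by B * P(B >= 1)."""
    return g / (b * p_alive)
```

With, say, P(B = 0) = 1/4, P(B = 1) = 1/4, P(B = 3) = 1/2 and g = 1/5, both rules return an expected mass equal to g, as the remark requires. The two rules differ in how the mass is distributed across realizations, not in its conditional expectation.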
2.2.2. From the GAMS framework to a practical algorithm. In the GAMS
framework, we have defined [see (12)] a family of σ-fields $(\mathcal{F}_z^{(q)})_{q \ge 0, z \in \mathbb{R}}$, which
is indexed both by the level $z \in \mathbb{R}$ and by the iteration index $q \ge 0$. At the
end of the $q$th iteration of the algorithm ($q \ge 0$), one can think of the σ-field
$\mathcal{F}^{(q+1)} = \mathcal{F}^{(q+1)}_{Z^{(q+1)}}$ as containing all the necessary information required to perform
the next step of the algorithm.
To make a practical splitting algorithm which enters into the GAMS framework,
three sets of random variables need to be defined: $(S^{(q)})_{q \ge 0}$, $(B^{(n,q+1)})_{q \ge 0, n \in I^{(q)}}$
and $(Z^{(q)})_{q \ge 0}$. From now on, we assume the following on these random variables.
ASSUMPTION 3. The random variables $(S^{(q)})_{q \ge 0}$, $(B^{(n,q+1)})_{q \ge 0, n \in I^{(q)}}$ and
$(Z^{(q)})_{q \ge 0}$ satisfy the following properties:
• the random variables (S (q) )q≥0 needed for defining the stopping criterion are
such that S (q) is F (q) -measurable;
• for each $q \in \mathbb{N}$, the branching numbers $(B^{(n,q+1)})_{n \in I^{(q)}}$ take values in $\mathbb{N}$
and are assumed to be sampled conditionally on $\mathcal{F}^{(q)}$ (see Section 1.5 for a
precise definition), in such a way that $\mathbb{E}(B^{(n,q+1)} \mid \mathcal{F}^{(q)}) > 0$ a.s.;
• the stopping levels $(Z^{(q)})_{q \ge 0}$ take values in $\mathbb{R}$, satisfy $Z^{(q+1)} \ge Z^{(q)}$ and are
such that $Z^{(q)}$ is a stopping level with respect to $(\mathcal{F}_z^{(q)})_{z \in \mathbb{R}}$ (see Definition 2.2).
Once these three sets of random variables have been defined, the GAMS frame-
work becomes a practical splitting algorithm which yields an unbiased estimator
of (6) (this is the claim of Theorem 3.2 proved in Section 3). A concrete example
of such an algorithm is given in Section 4.5.
Let us emphasize that the requirement that $Z^{(q)}$ is a $(\mathcal{F}_z^{(q)})_{z \in \mathbb{R}}$-stopping level
will be instrumental in applying Doob's optional stopping theorem for martingales
indexed by levels $z$, in order to prove Theorem 3.2.
Note the following result, which is a straightforward consequence of the hy-
pothesis on (S (q) )q≥0 in Assumption 3.
P ROPOSITION 2.4. The random variable Qiter is a stopping time with respect
to the filtration (F (q) )q≥0 .
3. The unbiasedness theorem. In the present section, the unbiasedness of

the empirical distribution (9) which estimates π is proven. The main result, Theo-
rem 3.2, is stated in Section 3.1. The last two Sections 3.2 and 3.3 are devoted to
the proof of Theorem 3.2.
3.1. Statement of the main result. We start with the definition of a useful prop-
erty.
D EFINITION 3.1. A splitting algorithm which enters into the GAMS frame-
work satisfies the almost sure mass conservation property if

\[ \forall q \ge 0, \quad \sum_{n \in I^{(q)}} G^{(n,q)} = 1 \quad \text{a.s.} \tag{13} \]
The main theoretical result of this paper is the following.
T HEOREM 3.2. Let (X (q) )0≤q≤Qiter be the sequence of random systems of

weighted replicas generated by an algorithm which enters into the GAMS frame-
work of Section 2. In particular, the Assumptions 1 and 2 on the general setting
(see Section 2.1) as well as the Assumption 3 on the stopping criterion, branching
numbers and level computations (see Section 2.2.2) are supposed to hold.
Assume moreover that the number of iterations Qiter is almost surely finite [this
condition writes P(Qiter < +∞) = 1] and that one of the following conditions is
satisfied:
• Qiter is bounded from above by a deterministic constant,
• or the almost sure mass conservation (13) is satisfied.
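Then, for any bounded measurable test function $\varphi : \mathcal{P} \to \mathbb{R}$,
\[ \mathbb{E}\big( \hat\pi^{(Q_{\mathrm{iter}})}(\varphi) \big) = \pi(\varphi). \]
In particular,⁵
\[ \forall q \in \mathbb{N}, \quad \mathbb{E}\big( \hat\pi^{(q)}(\varphi) \big) = \pi(\varphi). \]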
Note that an algorithm which enters into the GAMS framework does not
necessarily satisfy the almost sure mass conservation property of Definition 3.1.
Actually, Theorem 3.2 states that the mass conservation holds on average:
$\forall q \ge 0$, $\mathbb{E}\big( \sum_{n' \in I^{(q)}} G^{(n',q)} \big) = 1$, by taking $\varphi(x) = 1$ and $Q_{\mathrm{iter}} = q$.
Notice that by choosing a given deterministic sequence of levels Z (q) = zq in
the GAMS framework, one actually recovers the well-known unbiasedness result
for nonadaptive splitting algorithms, where the levels are fixed a priori.
⁵ To obtain $Q_{\mathrm{iter}} = q_0$, one simply has to choose $S^{(q)} = 1$ if $q < q_0$ and
$S^{(q)} = 0$ if $q \ge q_0$.
The strategy we follow to prove Theorem 3.2 is to introduce the sequence of
random variables
\[ M^{(q)}(\varphi) = \mathbb{E}\big( \hat\pi^{(q)}(\varphi) \mid \mathcal{F}^{(q)} \big) \tag{14} \]
for a given bounded measurable test function $\varphi : \mathcal{P} \to \mathbb{R}$, and to show that the
process $(M^{(q)}(\varphi))_{q \in \mathbb{N}}$ indexed by $q$ is a martingale with respect to the filtration
$(\mathcal{F}^{(q)})_{q \in \mathbb{N}}$. Since, by Proposition 2.4, $Q_{\mathrm{iter}}$ is a stopping time for this filtration,
Doob's stopping theorem for discrete-time martingales can then be applied to
obtain Theorem 3.2. The next two sections are devoted to the proof of Theorem 3.2.
3.2. Proof of Theorem 3.2. The following definition of conditionally indepen-

dent replicas will be useful in the proof.
D EFINITION 3.3. Let Z be a random level, I ⊂ N∗ a finite random set of

indices and G a σ -field of events. We assume that σ (I )∨σ (Z) ⊂ G . We say that the
random system of replicas (X(n) )n∈I is independently distributed with distribution
(πZ (X (n) , ·))n∈I conditionally on G , if for any sequence of bounded measurable
functions (ϕn )n≥1 from P to R, we have

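\[ \mathbb{E}\Big( \prod_{n \in I} \varphi_n\big(X^{(n)}\big) \,\Big|\, \mathcal{G} \Big) = \prod_{n \in I} \pi_Z(\varphi_n)\big(X^{(n)}\big). \]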
Let us now state two intermediate propositions before proving Theorem 3.2. The
first proposition states that, in the sense of Definition 3.3, the set of replicas with
indices in I (q) (resp., I (q+1) ) are F (q) -conditionally independent [resp., F (q) ∨
σ (P (q+1) )-conditionally independent] with explicit distributions.
P ROPOSITION 3.4. Let us consider the setting of Theorem 3.2. For any integer
q ≥ 0:
(i)$_q$ The replicas $(X^{(n,q)})_{n \in I^{(q)}}$ are independent conditionally on the σ-field $\mathcal{F}^{(q)}$,
with distribution $(\pi_{Z^{(q)}}(X^{(n,q)}, \cdot))_{n \in I^{(q)}}$.
(ii)$_q$ The replicas $(X^{(n',q+1)})_{n' \in I^{(q+1)}}$ are independent conditionally on the σ-field
$\mathcal{F}^{(q)} \vee \sigma(P^{(q+1)})$, with distribution $(\pi_{Z^{(q)}}(X^{(n',q+1)}, \cdot))_{n' \in I^{(q+1)}}$.
The second proposition states intermediate equalities between conditional aver-

ages of the empirical distributions, required to obtain the desired martingale prop-
erty of (M (q) (ϕ))q≥0 , and which are easily obtained from Proposition 3.4. We
introduce the following notation for the weighted empirical distribution at the end
of the splitting step (i.e., before the partial resampling step is performed):

\[ \hat\pi^{(q+1/2)} = \sum_{n' \in I^{(q+1)}} G^{(n',q+1)} \, \delta_{X^{(P^{(q+1)}(n'),q)}}. \tag{15} \]
P ROPOSITION 3.5. Let us consider the setting of Theorem 3.2. For any integer
q ≥ 0 and for any bounded measurable test function ϕ : P → R:
(iii)q E(π̂ (q+1/2) (ϕ)|F (q) ) = E(π̂ (q) (ϕ)|F (q) ).
(iv)q E(π̂ (q+1) (ϕ)|F (q) ∨ σ (P (q+1) )) = E(π̂ (q+1/2) (ϕ)|F (q) ∨ σ (P (q+1) )).
The proofs of both Proposition 3.4 and Proposition 3.5 are postponed to Sec-
tion 3.3. We are now in position to prove Theorem 3.2.
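PROOF OF THEOREM 3.2. The proof consists of first proving that the process
$(M^{(q)}(\varphi))_{q \ge 0}$ defined by (14) is a $(\mathcal{F}^{(q)})_{q \ge 0}$-martingale and then applying
Doob's optional stopping theorem.
Notice that $\mathbb{E}(M^{(q+1)}(\varphi) \mid \mathcal{F}^{(q)}) = \mathbb{E}(\hat\pi^{(q+1)}(\varphi) \mid \mathcal{F}^{(q)})$ and let us compute the
right-hand side. First, from point (iv)$_q$ of Proposition 3.5 and since
$\mathcal{F}^{(q)} \subset \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)})$, we get
\[ \mathbb{E}\big( \hat\pi^{(q+1)}(\varphi) \mid \mathcal{F}^{(q)} \big)
 = \mathbb{E}\Big( \mathbb{E}\big( \hat\pi^{(q+1)}(\varphi) \mid \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \big) \,\Big|\, \mathcal{F}^{(q)} \Big)
 = \mathbb{E}\big( \hat\pi^{(q+1/2)}(\varphi) \mid \mathcal{F}^{(q)} \big). \]
Second, from point (iii)$_q$ of Proposition 3.5 we have
\[ \mathbb{E}\big( \hat\pi^{(q+1/2)}(\varphi) \mid \mathcal{F}^{(q)} \big) = \mathbb{E}\big( \hat\pi^{(q)}(\varphi) \mid \mathcal{F}^{(q)} \big). \]
We thus have for any $q \ge 0$,
\[ \mathbb{E}\big( \hat\pi^{(q+1)}(\varphi) \mid \mathcal{F}^{(q)} \big) = \mathbb{E}\big( \hat\pi^{(q)}(\varphi) \mid \mathcal{F}^{(q)} \big), \tag{16} \]
and $(M^{(q)}(\varphi))_{q \in \mathbb{N}}$ is therefore a $(\mathcal{F}^{(q)})_{q \in \mathbb{N}}$-martingale.
We now focus on stopping the latter martingale at the random iteration $Q_{\mathrm{iter}}$. By
assumption, either the almost sure mass conservation property (13) is satisfied, in
which case $(M^{(q)}(\varphi))_{q \in \mathbb{N}}$ is a bounded martingale [since $|M^{(q)}(\varphi)| \le \|\varphi\|_\infty$], or
$Q_{\mathrm{iter}} \le q_{\max}$ for some deterministic integer $q_{\max} \in \mathbb{N}$. In both cases, we apply
Doob's optional stopping theorem (see, for instance, [39], Chapter 7, Section 2,
Theorem 1 and Corollaries 1 and 2) to the martingale $(M^{(q)}(\varphi))_{q \in \mathbb{N}}$ and to the
stopping time $Q_{\mathrm{iter}}$ with respect to the filtration $(\mathcal{F}^{(q)})_{q \in \mathbb{N}}$. We obtain
\[ \mathbb{E}\big( \hat\pi^{(Q_{\mathrm{iter}})}(\varphi) \big) = \mathbb{E}\big( M^{(0)}(\varphi) \big) = \pi(\varphi), \]
which completes the proof of Theorem 3.2.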
3.3. Proofs of Propositions 3.4 and 3.5. Proposition 3.4 requires an additional
intermediate result, namely the propagation Lemma 3.6 below. This lemma gives
rigorous conditions under which the property on a system of replicas (X (n) )n∈I of
being independently distributed with distribution (πZ (X (n) , ·))n∈I conditionally
on F can be transported from the σ -field F to a larger σ -field. It is based on
Doob’s optional stopping theorem for martingales indexed by the level variable z.
Notice that it is the only result where the right continuity property of Assumption 1
is used.
LEMMA 3.6. Let us assume that Assumptions 1 and 2 hold. Let
$Z \in \mathbb{R} \cup \{-\infty, +\infty\}$ be a random level, $\mathcal{G}$ a σ-field, and $I \subset \mathbb{N}^*$ a finite random set of
labels. Assume that $\sigma(I) \vee \sigma(Z) \subset \mathcal{G}$. Consider a random system of replicas
$\mathcal{X} = (X^{(n)})_{n \in I}$, which is independently distributed with distribution $(\pi_Z(X^{(n)}, \cdot))_{n \in I}$
conditionally on $\mathcal{G}$ (in the sense of Definition 3.3). Set
\[ \forall z \in \mathbb{R}, \quad \mathcal{G}_z = \mathcal{G} \vee \mathrm{filt}_z^{\mathcal{X}}, \tag{17} \]
and assume that $Z' \in \mathbb{R} \cup \{-\infty, +\infty\}$ is a stopping level for the filtration $(\mathcal{G}_z)_{z \in \mathbb{R}}$
such that, almost surely, $Z' \ge Z$.
Then the replicas $(X^{(n)})_{n \in I}$ are independently distributed conditionally on $\mathcal{G}_{Z'}$,
with distribution $(\pi_{Z'}(X^{(n)}, \cdot))_{n \in I}$.
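PROOF. Step 1. The first step consists in proving that for any fixed $z \in \mathbb{R}$, the
system of replicas is independently distributed with distribution $(\pi_{Z \vee z}(X^{(n)}, \cdot))_{n \in I}$
conditionally on $\mathcal{G} \vee \mathrm{filt}_z^{\mathcal{X}}$. By a standard monotone class argument, it is sufficient
to show that
\[ \mathbb{E}\Big( \prod_{n \in I} \varphi_n\big(X^{(n)}\big) \psi_n\big(X^{(n)}\big) \, Y \Big)
 = \mathbb{E}\Big( \prod_{n \in I} \pi_{Z \vee z}(\varphi_n)\big(X^{(n)}\big) \psi_n\big(X^{(n)}\big) \, Y \Big), \]
where $(\varphi_n)_{n \ge 1}$ ranges over bounded measurable test functions from $\mathcal{P}$ to $\mathbb{R}$,
$(\psi_n)_{n \ge 1}$ ranges over $\mathrm{filt}_z$-measurable test functions from $\mathcal{P}$ to $\mathbb{R}$, and $Y$ over
bounded $\mathcal{G}$-measurable random variables.
Let us denote by $\mathcal{I} = \mathbb{E}\big( \prod_{n \in I} \varphi_n(X^{(n)}) \psi_n(X^{(n)}) \, Y \big)$ the left-hand side. Since $Y$ is
$\mathcal{G}$-measurable, by Definition 3.3 of the conditional independence we get that
\[ \mathcal{I} = \mathbb{E}\Big( \prod_{n \in I} \pi_Z(\varphi_n \psi_n)\big(X^{(n)}\big) \, Y \Big). \]
The functions $(\psi_n)_{n \ge 1}$ being $\mathrm{filt}_z$-measurable, they are a fortiori
$\mathrm{filt}_{z' \vee z}$-measurable for any $z' \in \mathbb{R}$. Assumption 2 on the partial resampling kernel
$(\pi_z)_{z \in \mathbb{R}}$ then yields
\[ \pi_{z'}(\varphi_n \psi_n)(x) = \pi_{z'}\big( \pi_{z' \vee z}(\psi_n \varphi_n) \big)(x) = \pi_{z'}\big( \psi_n \, \pi_{z' \vee z}(\varphi_n) \big)(x). \]
As a consequence, using again that the system of replicas $(X^{(n)})_{n \in I}$ is independently
distributed with distribution $(\pi_Z(X^{(n)}, \cdot))_{n \in I}$ conditionally on $\mathcal{G}$ and that $Y$
is $\mathcal{G}$-measurable, we get the following identity:
\[ \mathcal{I} = \mathbb{E}\Big( \prod_{n \in I} \pi_Z\big( \psi_n \, \pi_{Z \vee z}(\varphi_n) \big)\big(X^{(n)}\big) \, Y \Big)
 = \mathbb{E}\Big( \prod_{n \in I} \pi_{Z \vee z}(\varphi_n)\big(X^{(n)}\big) \psi_n\big(X^{(n)}\big) \, Y \Big), \]
and this concludes the first step.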

Step 2. We now prove the main claim of this lemma, namely the fact that the
replicas $(X^{(n)})_{n \in I}$ are independent with distribution $(\pi_{Z'}(X^{(n)}, \cdot))_{n \in I}$
conditionally on $\mathcal{G}_{Z'}$. Let us first assume that the test functions $(\varphi_n)_{n \in I}$ are
continuous from $\mathcal{P}$ to $\mathbb{R}$.
In order to come back to a classical setting to apply Doob's optional stopping
theorem, let us introduce a continuous, one-to-one and strictly increasing change of
level parametrization $\mathcal{Z} : [0, 1] \to \mathbb{R} \cup \{-\infty, +\infty\}$. Let us consider the following
stochastic process indexed by $t \in [0, 1]$:
\[ N_t = \mathbb{E}\Big( \prod_{n \in I} \varphi_n\big(X^{(n)}\big) \,\Big|\, \mathcal{G}_{\mathcal{Z}(t)} \Big). \]
It is a bounded (since $I$ is $\mathcal{G}_{\mathcal{Z}(t)}$-measurable for all $t$) and thus uniformly
integrable martingale with respect to the filtration $(\mathcal{G}_{\mathcal{Z}(t)})_{t \in [0,1]}$. In addition,
$N_1 = \mathbb{E}\big( \prod_{n \in I} \varphi_n(X^{(n)}) \mid \mathcal{G}_{+\infty} \big)$, where $\mathcal{G}_{+\infty} = \mathcal{G} \vee \mathrm{filt}_{+\infty}^{\mathcal{X}}$.
Thanks to Step 1 above, we get: almost surely, for all $t \in [0, 1]$,
\[ N_t = \prod_{n \in I} \pi_{Z \vee \mathcal{Z}(t)}(\varphi_n)\big(X^{(n)}\big). \]
Therefore, $N_t$ is almost surely a right-continuous bounded martingale from
Assumption 1 on $(\pi_z)_{z \in \mathbb{R}}$. By assumption, $T = \mathcal{Z}^{-1}(Z')$ is a $(\mathcal{G}_{\mathcal{Z}(t)})_{t \in [0,1]}$-stopping
level, and we can use a Doob's optional stopping argument for right-continuous
bounded martingales (see, for instance, [31], Theorem 3.22) to get
\[ \mathbb{E}\big( N_1 \mid \mathcal{G}_{\mathcal{Z}(T)} \big) = N_T, \]
which can be rewritten as (since $Z' \ge Z$ a.s.)
\[ \mathbb{E}\Big( \prod_{n \in I} \varphi_n\big(X^{(n)}\big) \,\Big|\, \mathcal{G}_{Z'} \Big) = \prod_{n \in I} \pi_{Z'}(\varphi_n)\big(X^{(n)}\big). \]
This equality actually holds for any sequence of bounded measurable functions
$(\varphi_n)_{n \in I}$ since continuous bounded functions are separating. This completes the
proof of Lemma 3.6.
Thanks to Lemma 3.6 we can now prove Proposition 3.4.
P ROOF OF P ROPOSITION 3.4. We proceed by induction on the iteration index

q ≥ 0. We first prove directly the statement (i)q ⇒ (ii)q and then (ii)q ⇒ (i)q+1
using Lemma 3.6. The initialization step consists in proving (i)0 using Lemma 3.6.
In this proof, (ϕn )n≥1 denotes a sequence of bounded measurable test functions
from P to R.
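PROOF OF (i)$_0$. The statement (i)$_0$ reads
\[ \mathbb{E}\Big( \prod_{n \in I^{(0)}} \varphi_n\big(X^{(n,0)}\big) \,\Big|\, \mathcal{F}^{(0)} \Big) = \prod_{n \in I^{(0)}} \pi_{Z^{(0)}}(\varphi_n)\big(X^{(n,0)}\big), \]
where $\mathcal{F}^{(0)} = \mathrm{filt}^{\mathcal{X}^{(0)}}_{Z^{(0)}}$. This is exactly the result of Lemma 3.6, taking $Z = -\infty$,
$Z' = Z^{(0)}$, $\mathcal{G} = \sigma(I^{(0)})$, and recalling that the replicas are initially independent
and distributed according to $\pi$.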

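PROOF OF (i)$_q$ ⇒ (ii)$_q$. Assume that (i)$_q$ holds. We rewrite property (ii)$_q$ as
follows:
\[ \mathbb{E}\Big( \prod_{n' \in I^{(q+1)}} \varphi_{n'}\big(X^{(n',q+1)}\big) \,\Big|\, \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \Big)
 = \prod_{n' \in I^{(q+1)}} \pi_{Z^{(q)}}(\varphi_{n'})\big(X^{(n',q+1)}\big), \tag{18} \]
and we now prove this identity.
Let us recall that in the partial resampling step, the replicas with labels in $I^{(q)}$
are not modified and the replicas $(X^{(n',q+1)})_{n' \in I^{(q+1)}_{\mathrm{new}}}$ are sampled in such a way
that they are independently distributed conditionally on $\mathcal{F}^{(q)} \vee \sigma(P^{(q+1)})$, with
distribution $(\pi_{Z^{(q)}}(X^{(P^{(q+1)}(n'),q)}, \cdot))_{n' \in I^{(q+1)}_{\mathrm{new}}}$. Therefore, by definition of the total
set of labels $I^{(q+1)}$ given in (10), one obtains
\[ \begin{aligned}
 & \mathbb{E}\Big( \prod_{n' \in I^{(q+1)}} \varphi_{n'}\big(X^{(n',q+1)}\big) \,\Big|\, \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \Big) \\
 &\quad = \mathbb{E}\Big( \prod_{n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}} \varphi_n\big(X^{(n,q)}\big) \prod_{n' \in I^{(q+1)}_{\mathrm{new}}} \varphi_{n'}\big(X^{(n',q+1)}\big) \,\Big|\, \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \Big) \\
 &\quad = \prod_{n' \in I^{(q+1)}_{\mathrm{new}}} \pi_{Z^{(q)}}(\varphi_{n'})\big(X^{(P^{(q+1)}(n'),q)}\big)
 \times \mathbb{E}\Big( \prod_{n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}} \varphi_n\big(X^{(n,q)}\big) \,\Big|\, \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \Big).
\end{aligned} \]
Next, from the induction hypothesis (i)$_q$, the replicas $(X^{(n,q)})_{n \in I^{(q)}}$ are independent
with distribution $(\pi_{Z^{(q)}}(X^{(n,q)}, \cdot))_{n \in I^{(q)}}$ conditionally on $\mathcal{F}^{(q)}$. Since $P^{(q+1)}$
is sampled conditionally on $\mathcal{F}^{(q)}$, the replicas $(X^{(n,q)})_{n \in I^{(q)}}$ are also independent
conditionally on $\mathcal{F}^{(q)} \vee \sigma(P^{(q+1)})$, with the same distributions. Therefore [notice
that $I^{(q)}$ and $I^{(q+1)}_{\mathrm{killed}}$ are $\mathcal{F}^{(q)} \vee \sigma(P^{(q+1)})$-measurable],
\[ \begin{aligned}
 \mathbb{E}\Big( \prod_{n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}} \varphi_n\big(X^{(n,q)}\big) \,\Big|\, \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \Big)
 &= \prod_{n \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}} \pi_{Z^{(q)}}(\varphi_n)\big(X^{(n,q)}\big) \\
 &= \prod_{n' \in I^{(q)} \setminus I^{(q+1)}_{\mathrm{killed}}} \pi_{Z^{(q)}}(\varphi_{n'})\big(X^{(P^{(q+1)}(n'),q)}\big).
\end{aligned} \]
Gathering the results leads to
\[ \mathbb{E}\Big( \prod_{n' \in I^{(q+1)}} \varphi_{n'}\big(X^{(n',q+1)}\big) \,\Big|\, \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \Big)
 = \prod_{n' \in I^{(q+1)}} \pi_{Z^{(q)}}(\varphi_{n'})\big(X^{(P^{(q+1)}(n'),q)}\big). \]
From the partial resampling step and Assumption 2, the following identity holds:
\[ \forall q \ge 0, \ \forall n' \in I^{(q+1)}, \quad \pi_{Z^{(q)}}\big(X^{(n',q+1)}, \cdot\big) = \pi_{Z^{(q)}}\big(X^{(P^{(q+1)}(n'),q)}, \cdot\big). \tag{19} \]
This completes the proof of (18).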
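PROOF OF (ii)$_q$ ⇒ (i)$_{q+1}$. Let us now assume that (ii)$_q$ holds. To prove that
(i)$_{q+1}$ holds, it is sufficient to check that
\[ \mathbb{E}\Big( \prod_{n \in I^{(q+1)}} \varphi_n\big(X^{(n,q+1)}\big) \,\Big|\, \mathcal{F}^{(q+1)} \Big) = \prod_{n \in I^{(q+1)}} \pi_{Z^{(q+1)}}(\varphi_n)\big(X^{(n,q+1)}\big). \]
This is again exactly the result of Lemma 3.6 applied to $\mathcal{X}^{(q+1)}$, taking $Z = Z^{(q)}$,
$Z' = Z^{(q+1)}$ and $\mathcal{G} = \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)})$ so that $\mathcal{G}_z = \mathcal{F}_z^{(q+1)}$ [where, we recall,
$\mathcal{F}_z^{(q+1)}$ is defined by equation (12)].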
Finally, let us prove Proposition 3.5.
P ROOF OF P ROPOSITION 3.5. The first equality (iii)q is a direct consequence

of the definition of the branching numbers. The second equality (iv)q is obtained
as a consequence of Proposition 3.4 by combining (i)q and (ii)q .
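PROOF OF (iii)$_q$. The proof of this assertion is a direct application of the
branching rule. Indeed, by definition of the weights $G^{(n',q+1)}$ given in (11), by
definition of the branching numbers $(B^{(n,q+1)})_{n \in I^{(q)}}$ as the number of offspring
of the $n$th replica, and because these branching numbers are independent of
$(G^{(n,q)}, X^{(n,q)})_{n \in I^{(q)}}$ conditionally on $\mathcal{F}^{(q)}$, we get
\[ \begin{aligned}
 \mathbb{E}\big( \hat\pi^{(q+1/2)}(\varphi) \mid \mathcal{F}^{(q)} \big)
 &= \mathbb{E}\Big( \sum_{n' \in I^{(q+1)}} \frac{G^{(P^{(q+1)}(n'),q)}}{\mathbb{E}(B^{(P^{(q+1)}(n'),q+1)} \mid \mathcal{F}^{(q)})} \, \varphi\big(X^{(P^{(q+1)}(n'),q)}\big) \,\Big|\, \mathcal{F}^{(q)} \Big) \\
 &= \mathbb{E}\Big( \sum_{n \in I^{(q)}} \frac{G^{(n,q)}}{\mathbb{E}(B^{(n,q+1)} \mid \mathcal{F}^{(q)})} \, B^{(n,q+1)} \, \varphi\big(X^{(n,q)}\big) \,\Big|\, \mathcal{F}^{(q)} \Big) \\
 &= \mathbb{E}\big( \hat\pi^{(q)}(\varphi) \mid \mathcal{F}^{(q)} \big).
\end{aligned} \]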
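PROOF OF (i)$_q$ + (ii)$_q$ ⇒ (iv)$_q$. Using successively (i)$_q$, the identity (19) and
(ii)$_q$, we have
\[ \begin{aligned}
 \mathbb{E}\big( \hat\pi^{(q+1/2)}(\varphi) \mid \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \big)
 &= \sum_{n' \in I^{(q+1)}} G^{(n',q+1)} \, \pi_{Z^{(q)}}(\varphi)\big(X^{(P^{(q+1)}(n'),q)}\big) \\
 &= \sum_{n' \in I^{(q+1)}} G^{(n',q+1)} \, \pi_{Z^{(q)}}(\varphi)\big(X^{(n',q+1)}\big) \\
 &= \mathbb{E}\big( \hat\pi^{(q+1)}(\varphi) \mid \mathcal{F}^{(q)} \vee \sigma(P^{(q+1)}) \big).
\end{aligned} \]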
4. The AMS algorithm for Markov chains. The goal of this section is to
define an Adaptive Multilevel Splitting algorithm based on the GAMS framework
of Section 2.2.1, when applied to paths of a Markov chain (namely a discrete
time stochastic process). In particular, we provide explicit examples of filtrations
(filtz )z∈R and partial resampling kernels (πz )z∈R introduced in Section 2.1, as well
as examples of level computation and branching rules in the algorithm.
In order to satisfy Assumptions 1, 2 and 3, and thus to obtain unbiased
estimators in this setting, special care is required to treat the situations when many
replicas have the same maximum level, or the situations when there is extinction
of the population of replicas. These aspects, which are specific to the discrete time
setting, were not treated in detail in many previous works where continuous time
diffusions were considered.
4.1. The Markov chain setting. Let X̃ = (X̃t )t∈N be a Markov chain, with
probability transition P , which takes values in a Polish state space S . Without
loss of generality, we assume that X̃0 = x0 where x0 ∈ S is a deterministic initial
condition.
The state space is the path space

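\[ \mathcal{P} = \big\{ x = (x_t)_{t \in \mathbb{N}} : x_t \in \mathcal{S} \text{ for all } t \in \mathbb{N} \big\}. \tag{20} \]
It is well known that, by introducing the distance
\[ d_{\mathcal{P}}(x, y) = \sum_{t \in \mathbb{N}} \frac{1}{2^t} \Big( 1 \wedge \sup_{s \le t} d_{\mathcal{S}}(x_s, y_s) \Big) \]
(which is a metric for the product topology), the space $(\mathcal{P}, d_{\mathcal{P}})$
is complete and separable. We thus see X̃ as a random variable with values in $\mathcal{P}$.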
4.2. The rare event of interest. Given two disjoint Borel subsets A and B of S ,
our objective is the efficient sampling of events such as {τB < τA } where
τA = inf{t ∈ N : X̃t ∈ A} and τB = inf{t ∈ N : X̃t ∈ B}
are respectively the first entrance times in A and B. Both τA and τB are stopping
times with respect to the natural filtration of the process X̃.
We are mainly interested in the estimation of the probability P(τB < τA ) in
the rare event regime, namely when this probability is very small (typically less
than 10−8 ). As explained in the Introduction, this occurs for example if the initial
condition x0 ∈ Ac ∩ B c is such that x0 is close to A, and A and B are metastable
regions for the dynamics. The Markov chain starting from (a neighborhood of)
A (resp., B) remains for a very long time near A (resp. B) before going to B
(resp., A), and thus, the Markov chain starting from x0 reaches A before B with a
probability close to one. A specific example will be studied in Section 5.
The P -valued random variable X considered to apply the GAMS framework
and the associated results is the Markov chain stopped at time τA : X = (Xt )t∈N
where
(21) Xt = X̃t∧τA for any t ∈ N.
The probability distribution on (P , B (P )) is π = Law(X), namely the law of
the stopped Markov chain X. The aim of the AMS algorithm is to estimate the
(small) probability
\[ p = \mathbb{P}(\tau_B < \tau_A) = \mathbb{E}\big( \mathbf{1}_{T_B(X) < T_A(X)} \big) = \pi\big( \mathbf{1}_{T_B(\cdot) < T_A(\cdot)} \big), \tag{22} \]
where we denote for any path x ∈ P
(23) TA (x) = inf{t ∈ N : xt ∈ A} and TB (x) = inf{t ∈ N : xt ∈ B}.
More generally, we build unbiased estimators of E(ϕ((Xt )t∈N )1τB <τA ), for
bounded measurable function ϕ : S → R [see equation (2)].
R EMARK 4.1 (On the stopping times τA and τB ). We defined above the stop-
ping times as first entrance times in some sets A and B. As will become clear
below, the definition of the algorithm and the unbiasedness result only require τA
and τB to be stopping times with respect to the natural filtration of the chain X̃.
4.3. Reaction coordinate. The crucial ingredient we need to introduce the

AMS algorithm is a reaction coordinate, to play the role of an importance function.
This is a measurable R-valued mapping defined on the state space S :
ξ : S → R.
The choice of a good function ξ for given sets A and B is a difficult problem in gen-
eral. One of the main aims of this paper is to show that whatever the choice of ξ , it
is possible to define an unbiased estimator of (22), (2) and more generally (6). The
only requirement we impose on ξ is that there exists a constant zmax ∈ R such that

\[ B \subset \xi^{-1}\big( ]z_{\max}, +\infty[ \big). \tag{24} \]
REMARK 4.2 [On assumption (24)]. Assumption (24) is extremely useful in
practice when computing estimators of (22) and (2); see Section 4.7. It allows one to
remove from memory the replicas which are declared “retired” in the splitting step
at each iteration of the AMS algorithm described in the sequel, since by
construction we know in advance that they will not contribute to the computation of the
associated estimator. The algorithm thus only needs to retain a fixed number of
replicas, denoted by nrep below.
In what follows, the values of ξ are called levels, and they precisely allow us to
specify the levels Z (q) computed at each iteration of the GAMS framework. We
will very often refer to the maximum level of a path, defined as follows.
DEFINITION 4.3. For any path $x \in \mathcal{P}$, the maximum level of $x$ is defined as
the supremum of $\xi$ along the path $x$ stopped at $T_A(x)$:
\[ \Xi(x) = \sup\big\{ \xi(x_{t \wedge T_A(x)}) : t \in \mathbb{N} \big\} \in \mathbb{R} \cup \{+\infty\}. \tag{25} \]
The function $\Xi$ can be seen as a reaction coordinate on the path space $\mathcal{P}$.

We also introduce, for any level $z \in \mathbb{R}$ and any path $x \in \mathcal{P}$,
\[ T_z(x) = \inf\big\{ t \in \{0, \ldots, T_A(x)\} : \xi(x_t) > z \big\}, \tag{26} \]
which is the first entrance time of the path $x$ stopped at $T_A(x)$ in the set
$\xi^{-1}(]z, +\infty[)$. We emphasize the strict inequality in the above definition of
the entrance times $T_z(x)$: it is one of the important ingredients which allows us to
apply the GAMS framework in the Markov chain setting, and thus to obtain
unbiased estimators of (6). Notice that the above assumption (24) on $B$ is equivalent to
the inequality
\[ \forall x \in \mathcal{P}, \quad T_{z_{\max}}(x) \le \inf\big\{ t \in \{0, \ldots, T_A(x)\} : x_t \in B \big\}. \]
We denote by τz = Tz (X) the entrance time associated with the (stopped)
Markov chain X. It is a stopping time for the natural filtration of the Markov chain.
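The role of the strict inequality can be made concrete on a short example. The following Python sketch is an illustration only: it assumes that a path is stored as a finite list of states already stopped at T_A, that ξ is supplied as a function `xi`, and the names `max_level` and `entrance_time` are hypothetical.

```python
import math


def max_level(path, xi):
    """Maximum level of a finite path (assumed already stopped at T_A)."""
    return max(xi(x) for x in path)


def entrance_time(path, xi, z):
    """T_z(x): first index t with xi(x_t) > z (strict inequality).

    Returns math.inf when the path never goes strictly above level z,
    following the convention inf(empty set) = +infinity.
    """
    for t, x in enumerate(path):
        if xi(x) > z:
            return t
    return math.inf
```

For the integer-valued path 1, 2, 1, 3, 2, 0 with ξ the identity, the maximum level is 3 and the entrance time above level 2 is t = 3, which is also the entrance time above level 2 + ε for any ε in ]0, 1[: the map z → T_z(x) does not jump immediately to the right of z = 2. With a non-strict inequality, the entrance time at level 2 would be t = 1 while the entrance time at level 2 + ε would be t = 3, so right continuity in z would fail at z = 2.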
4.4. Filtration and partial resampling kernel. We are now in position to define
the filtration (filtz )z∈R and the partial resampling kernel (πz )z∈R of Section 2.1 in
the Markov chain setting.
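For all $z \in \mathbb{R}$, $\mathrm{filt}_z$ is the smallest σ-field which makes the mapping
$x \in \mathcal{P} \mapsto (x_{t \wedge T_z(x)})_{t \ge 0} \in \mathcal{P}$ measurable, where $T_z$ is defined by (26):
\[ \mathrm{filt}_z = \sigma\big( x \mapsto (x_{t \wedge T_z(x)})_{t \ge 0} \big). \tag{27} \]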
For all $z \in \mathbb{R}$ the partial resampling kernel $\pi_z$ is defined as follows: for any
$x \in \mathcal{P}$, $\pi_z(x, dx') \in \mathrm{Proba}(\mathcal{P})$ is the law of the $\mathcal{P}$-valued random variable $Y$ such
that
\[ \begin{cases}
 Y_t = x_t, & \text{if } t \le T_z(x), \\
 \mathrm{Law}(Y_t \mid Y_s, \ 0 \le s \le t - 1) = P(Y_{t-1}, \cdot), & \text{if } t > T_z(x),
\end{cases} \tag{28} \]
and is stopped at $T_A(Y)$, when $Y$ hits $A$. We recall that $P$ is the transition kernel of
the Markov chain X. In other words, for $t \le T_z(x)$, $Y$ is identically $x$, while for
$t > T_z(x)$, $Y_t$ is generated according to the Markov dynamics on $\mathcal{S}$, with probability
transition $P$, and stopped when reaching $A$. The partial resampling kernel thus
performs a branching of the path $x$ at time $T_z(x)$ and position $x_{T_z(x)}$.
Notice from the definition of the partial resampling kernel that if $\Xi(x) \le z$,
then this kernel does not modify $x$: $T_z(x) = +\infty$ and $\pi_z(x, dx')$ is a Dirac mass:
$Y_t = x_{t \wedge T_A(x)}$ for any $t \in \mathbb{N}$.
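Sampling from the kernel (28) can be sketched as follows. This Python fragment is an illustration under stated assumptions and not the authors' implementation: paths are stored as finite lists already stopped at T_A, the transition kernel is represented by a user-supplied function `step(x, rng)`, the set A by a predicate `in_a`, and the chain is assumed to reach A almost surely so that the loop terminates. The names `partial_resample` and `biased_step` are hypothetical.

```python
import random


def partial_resample(path, z, xi, step, in_a, rng):
    """Draw one sample from the partial resampling kernel pi_z(path, .).

    The prefix of `path` up to and including the entrance time T_z is kept;
    the remainder is re-simulated with the dynamics `step` until A is reached.
    """
    for t, x in enumerate(path):
        if xi(x) > z:  # strict inequality, as in the definition of T_z
            break
    else:
        # Maximum level <= z: T_z is infinite and the kernel is a Dirac mass.
        return list(path)
    new_path = list(path[: t + 1])
    while not in_a(new_path[-1]):
        new_path.append(step(new_path[-1], rng))
    return new_path


def biased_step(x, rng):
    """Toy dynamics (an assumption): walk on the integers drifting downward."""
    return x + 1 if rng.random() < 0.3 else x - 1
```

With the parent path 1, 2, 1, 3, 2, 1, 0, A = {0} and z = 2, the branching time is t = 3, so every sample shares the prefix 1, 2, 1, 3 with its parent and then evolves independently until it hits 0. With z = 3, the parent path is returned unchanged.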
Let us now check that Assumptions 1 and 2 are satisfied. The conditions of
Assumption 2 are direct consequences of the strong Markov property applied to
the chain t → Xt ∈ S defined by (21) at the stopping time τz (the strong Markov
property always holds true for discrete-time Markov processes).
The right-continuity property of Assumption 1 crucially relies on the defi-
nition (26) of Tz (x) as the entrance time of the path t → xt in the level set
ξ −1 (]z, +∞[): the fact that ]z, ∞[ is an open set implies z → Tz (x) is right con-
tinuous. Indeed, we have the following lemma.
L EMMA 4.4. Assumption 1 is satisfied for the partial resampling kernel de-
fined by (28). More precisely, for any x ∈ P , the function z ∈ R → πz (x, ·) ∈
Proba(P ) is piecewise constant and right continuous.
PROOF. First, assume that $T_z(x) = +\infty$, which means that $\Xi(x) \le z$. Then,
for any $\varepsilon \ge 0$ we still have $T_{z+\varepsilon}(x) = +\infty$. In that case, $\pi_z(x, \cdot)$ is a Dirac mass:
$\pi_z(x, \cdot) = \pi_{z+\varepsilon}(x, \cdot) = \delta_{(x_{t \wedge T_A(x)})_{t \ge 0}}$.
Now, assume that $T_z(x) < +\infty$. Then, for $\varepsilon \in \, ]0, \xi(x_{T_z(x)}) - z[$, $T_z(x) = T_{z+\varepsilon}(x)$,
and by the definition of the partial resampling kernel, $\pi_z(x, \cdot) = \pi_{z+\varepsilon}(x, \cdot)$.
4.5. The AMS algorithm. In this section, we introduce the AMS algorithm for
the sampling of Markov chain trajectories. It is based on the GAMS framework of
Section 2.2.1; in addition to the framework, as explained in Section 2.2.2, we make
precise the following rules: the stopping criterion, the computation of branching
numbers and the computation of the stopping levels. We check below that they
satisfy Assumption 3. As a consequence, the GAMS framework encompasses the
AMS algorithm, and unbiased estimators of (6) can be defined.
The AMS algorithm iteratively generates a system of weighted replicas in the
state space $\mathcal{P}^{\mathrm{rep}}$, using selection and partial resampling steps. The set of all the
labels of the replicas generated by the algorithm up to iteration $q$ is denoted by
$I^{(q)} = I^{(q)}_{\mathrm{on}} \sqcup I^{(q)}_{\mathrm{off}}$, where $I^{(q)}_{\mathrm{on}}$ is the set of labels of “working” replicas and $I^{(q)}_{\mathrm{off}}$ is
the set of labels of replicas which have been declared “retired.” We recall that
$\sqcup$ denotes the disjoint set union.
The cardinality of $I^{(q)}$ is nondecreasing in $q$, while the number of “working” replicas is
kept fixed: $\mathrm{card}\, I^{(q)}_{\mathrm{on}} = n_{\mathrm{rep}}$ for any $q$, where $n_{\mathrm{rep}}$ is specified by the user of the
algorithm.
An additional parameter $k \in \{1, \ldots, n_{\mathrm{rep}} - 1\}$ is finally needed: it is the
(minimum) number of replicas sampled at each step of the algorithm. The levels $Z^{(q)}$
are computed as $k$th order statistics of the maximum levels of “working” replicas.
FIG. 1. Schematic representation of the first iteration of the AMS algorithm, with nrep = 4 and
k = 2. The replicas numbered 2 and 4 are declared retired at the first iteration, and are replaced
by the replicas with label 5 and 6, which are respectively partially resampled from the replicas with
labels 3 and 1.
At iteration q, all replicas with maximum level lower than or equal to Z (q) are
declared “retired”, and new replicas are sampled in order to keep a fixed number nrep
of replicas with maximum level strictly larger than Z (q) .
We are now in position to introduce the AMS algorithm in full detail (see Fig-
ure 1 for a schematic representation of one iteration of the algorithm).
The initialization step (q = 0):
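(i) Let $(X^{(n,0)})_{1 \le n \le n_{\mathrm{rep}}}$ be i.i.d. replicas of the stopped Markov chain in $\mathcal{P}$,
distributed according to $\pi$. Initialize the sets of labels of working and retired
replicas: $I^{(0)} = I^{(0)}_{\mathrm{on}} = \{1, \ldots, n_{\mathrm{rep}}\}$ and $I^{(0)}_{\mathrm{off}} = \emptyset$.
(ii) Initialize the weights: $G^{(n,0)} = 1/n_{\mathrm{rep}}$ for $n \in \{1, \ldots, n_{\mathrm{rep}}\}$.
(iii) Compute a permutation $\Sigma^{(0)}$ of $I^{(0)}_{\mathrm{on}} = \{1, \ldots, n_{\mathrm{rep}}\}$ such that
\[ \Xi\big(X^{(\Sigma^{(0)}(1),0)}\big) \le \cdots \le \Xi\big(X^{(\Sigma^{(0)}(n_{\mathrm{rep}}),0)}\big) \]
and set the initial level as the $k$th order statistic:⁶
\[ Z^{(0)} = \Xi\big(X^{(\Sigma^{(0)}(k),0)}\big). \]
(iv) If $\mathrm{card}\{ n \in I^{(0)}_{\mathrm{on}} : \Xi(X^{(n,0)}) \le Z^{(0)} \} = n_{\mathrm{rep}}$, then set $Z^{(0)} = +\infty$.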
Iterations. Iterate on q ≥ 0, while the following stopping criterion is not satis-
fied.
⁶ Notice that $\Sigma^{(0)}$ is not necessarily unique since several replicas may have the same maximum
level. Nevertheless, the level $Z^{(0)}$ does not depend on the choice of $\Sigma^{(0)}$. The same remark applies
to the definition of the level $Z^{(q+1)}$ at iteration $q \ge 0$.
The stopping criterion. If Z (q) > zmax , then the algorithm stops and we set
Qiter = q. Otherwise, perform the following four steps.
The splitting (branching) step:
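(i) Set
\[ I^{(q)}_{\mathrm{on},>Z^{(q)}} = \big\{ n \in I^{(q)}_{\mathrm{on}} : \Xi(X^{(n,q)}) > Z^{(q)} \big\}
 \quad \text{and} \quad I^{(q)}_{\mathrm{on},\le Z^{(q)}} = I^{(q)}_{\mathrm{on}} \setminus I^{(q)}_{\mathrm{on},>Z^{(q)}}. \]
Set $K^{(q+1)} = \mathrm{card}\, I^{(q)}_{\mathrm{on},\le Z^{(q)}} = n_{\mathrm{rep}} - \mathrm{card}\, I^{(q)}_{\mathrm{on},>Z^{(q)}} \ge k$.
(ii) Introduce a new set $I^{(q+1)}_{\mathrm{new}} = \{\mathrm{card}\, I^{(q)} + 1, \ldots, \mathrm{card}\, I^{(q)} + K^{(q+1)}\} \subset \mathbb{N}^* \setminus I^{(q)}$
of labels for the new replicas sampled at iteration $q$.
(iii) Sample the children-parent mapping $P^{(q+1)} : I^{(q+1)}_{\mathrm{new}} \to I^{(q)}_{\mathrm{on},>Z^{(q)}}$, by picking
the $K^{(q+1)}$ labels $(P^{(q+1)}(n'))_{n' \in I^{(q+1)}_{\mathrm{new}}}$ independently and uniformly in
$I^{(q)}_{\mathrm{on},>Z^{(q)}}$.
Extend the map by setting $P^{(q+1)}(n) = n$ for all $n \in I^{(q)}$.
(iv) Update the sets of labels:
\[ I^{(q+1)}_{\mathrm{on}} = I^{(q)}_{\mathrm{on},>Z^{(q)}} \sqcup I^{(q+1)}_{\mathrm{new}}, \qquad
 I^{(q+1)}_{\mathrm{off}} = I^{(q)}_{\mathrm{off}} \sqcup I^{(q)}_{\mathrm{on},\le Z^{(q)}}, \qquad
 I^{(q+1)} = I^{(q+1)}_{\mathrm{on}} \sqcup I^{(q+1)}_{\mathrm{off}}. \]
Update the weights:
\[ \begin{cases}
 G^{(n',q+1)} = G^{(P^{(q+1)}(n'),q)}, & n' \in I^{(q+1)}_{\mathrm{off}}, \\[4pt]
 G^{(n',q+1)} = \dfrac{n_{\mathrm{rep}} - K^{(q+1)}}{n_{\mathrm{rep}}} \, G^{(P^{(q+1)}(n'),q)}, & n' \in I^{(q+1)}_{\mathrm{on}}.
\end{cases} \tag{29} \]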
The partial resampling step:
(i) Replicas in $I^{(q)}$ are not modified: for $n \in I^{(q)}$, $X^{(n,q+1)} = X^{(n,q)}$.
(ii) For each $n' \in I_{\mathrm{new}}^{(q+1)}$, sample independently $X^{(n',q+1)}$ according to the partial resampling distribution: $\pi_{Z^{(q)}}\bigl(X^{(P^{(q+1)}(n'),q)}, dx'\bigr)$.
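The level computation step. Compute a bijective mapping $\Sigma^{(q+1)} : \{1, \ldots, n_{\mathrm{rep}}\} \to I_{\mathrm{on}}^{(q+1)}$ such that
\[
\Xi\bigl(X^{(\Sigma^{(q+1)}(1),q+1)}\bigr) \le \cdots \le \Xi\bigl(X^{(\Sigma^{(q+1)}(n_{\mathrm{rep}}),q+1)}\bigr)
\]
and set the new level as the kth order statistics:
\[
Z^{(q+1)} = \Xi\bigl(X^{(\Sigma^{(q+1)}(k),q+1)}\bigr).
\tag{30}
\]
If $\mathrm{card}\{n \in I_{\mathrm{on}}^{(q+1)} : \Xi(X^{(n,q+1)}) \le Z^{(q+1)}\} = n_{\mathrm{rep}}$, then set $Z^{(q+1)} = +\infty$.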
Increment. Increment q ← q + 1, and go back to the stopping criterion step.
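To make the steps of the algorithm concrete, here is a minimal Python sketch on a toy model that is not taken from the paper: a symmetric nearest-neighbour random walk on the integers, with A = {0}, B = {L} and reaction coordinate ξ(x) = x, so that the maximum level of a stopped path is simply its maximum. The function names (`run_chain`, `stopping_level`, `ams_estimate`) and the choice z_max = L − 1 are assumptions made for illustration. Only the working replicas are stored, since retired replicas do not contribute to the estimator of P(τ_B < τ_A).

```python
import math
import random


def run_chain(x, L, rng):
    """Simulate the walk from x until it hits A = {0} or B = {L}; return the path."""
    path = [x]
    while 0 < x < L:
        x += 1 if rng.random() < 0.5 else -1
        path.append(x)
    return path


def stopping_level(replicas, k):
    """k-th order statistic of the maximum levels, or +inf if no replica is strictly above it."""
    levels = sorted(max(path) for path in replicas)
    z = levels[k - 1]
    return math.inf if levels[-1] <= z else z


def ams_estimate(n_rep, k, L, x0, rng):
    """One realization of the AMS estimator of P(tau_B < tau_A) for the toy walk."""
    z_max = L - 1                      # B = {L} is contained in {xi > z_max}
    replicas = [run_chain(x0, L, rng) for _ in range(n_rep)]
    weight = 1.0                       # running product of (n_rep - K) / n_rep
    z = stopping_level(replicas, k)
    while z <= z_max:                  # the algorithm stops as soon as Z > z_max
        # Splitting step: replicas with maximum level <= Z are retired.
        working = [path for path in replicas if max(path) > z]
        n_new = n_rep - len(working)   # K >= k, strictly larger when levels are tied
        weight *= (n_rep - n_new) / n_rep
        # Partial resampling: copy the parent up to the first time xi > Z,
        # then continue with fresh, independent dynamics.
        children = []
        for _ in range(n_new):
            parent = rng.choice(working)
            t = next(i for i, x in enumerate(parent) if x > z)
            children.append(parent[:t] + run_chain(parent[t], L, rng))
        replicas = working + children
        # Level computation step (extinction is encoded by Z = +inf).
        z = stopping_level(replicas, k)
    p_corr = sum(path[-1] == L for path in replicas) / n_rep
    return weight * p_corr
```

For this walk the exact value is P(τ_B < τ_A) = x0/L (gambler's ruin), so averaging many independent calls such as `ams_estimate(20, 1, 6, 1, random.Random(0))` gives a simple sanity check of unbiasedness. Because the levels are integers, ties are frequent and K is often larger than k, which is precisely the situation where the strict inequality in the splitting step matters.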

This algorithm follows the steps of the GAMS framework, as introduced in Section 2.2. The filtrations $(\mathcal{F}_z^{(q)})_{z \in \mathbb{R}}$ and the σ-fields $\mathcal{F}^{(q)}$, for $q \in \mathbb{N}$, are defined for the AMS algorithm as explained in Section 2.2.
Important remarks on the algorithm need to be made.
Notice that necessarily, in the splitting step (i), $\mathrm{card}\, I_{\mathrm{on},>Z^{(q)}}^{(q)} \ge 1$ (otherwise $Z^{(q)} = +\infty$ and the stopping criterion has been fulfilled before entering the splitting step of iteration q). As a consequence, the sampling in the splitting step (iii) is well defined.
Note that by construction $\mathrm{card}\, I_{\mathrm{on}}^{(q+1)} = n_{\mathrm{rep}}$.
Notice that the number of times the loop consisting of the three steps (splitting / partial resampling / level computation) is performed is
\[
Q_{\mathrm{iter}} = \inf\bigl\{q \ge 0 : Z^{(q)} > z_{\max}\bigr\}.
\]
If $Z^{(Q_{\mathrm{iter}})} = +\infty$, none of the working replicas at the iteration $Q_{\mathrm{iter}} - 1$ is above the new level $\Xi\bigl(X^{(\Sigma^{(Q_{\mathrm{iter}})}(k),Q_{\mathrm{iter}})}\bigr)$ and thus, all of them would have been declared retired at the iteration $Q_{\mathrm{iter}}$: this situation is referred to as extinction. We refer to [7], Remark 2.4, for a discussion of this phenomenon.
Finally, it is very important to notice that the number of replicas sampled at a given iteration q is at least k, but may be larger than k: $K^{(q+1)}$ is not necessarily equal to k. This requires at least two replicas to have $Z^{(q)}$ as the maximum level at the beginning of iteration q, see [7], Figure 2. Actually, it may even happen that, in the level computation step, all the replicas in $I_{\mathrm{on}}^{(q+1)}$ have $Z^{(q+1)}$ as the maximum level, which implies extinction: $\mathrm{card}\{n \in I_{\mathrm{on}}^{(q+1)} : \Xi(X^{(n,q+1)}) \le Z^{(q+1)}\} = n_{\mathrm{rep}}$, $Z^{(q+1)} = +\infty$ and the algorithm stops, as explained above.
REMARK 4.5 (Variants and extensions). Using the Generalized Adaptive Multilevel Splitting framework, it is possible to propose variants of the AMS algorithm presented in this section. We refer to [7], Section 3.5, for examples which make it possible to remove extinction, to reduce the computational cost associated with sorting procedures (computing the level $Z^{(q)}$ at each iteration q using a subset of the ensemble of replicas) or to apply additional selection.
The GAMS framework also allows for different general settings: splitting algorithms can be used to sample random variables other than paths of Markov chains with levels defined as $\sup_{t \ge 0} \xi(X_{t \wedge \tau_A})$ for some stopping time $\tau_A$ and some reaction coordinate function ξ. Actually, under appropriate assumptions, the following cases also enter into the setting of the GAMS framework: path-dependent reaction coordinates (duration of the path, integral over the path), sampling of continuous time stochastic processes (diffusions, jump processes, branching processes), sampling of nonhomogeneous stochastic processes, etc. Finally, the GAMS framework can for instance be applied to the sampling of a Gaussian bridge distribution; see [7], Section 3.5.
4.6. Verification of Assumption 3. We prove that the three procedures (the

stopping criterion, the computation rule of the branching numbers and the compu-
tation of the stopping levels) which are implemented in the AMS algorithm above
satisfy the requirements of Assumption 3.
The stopping criterion. In the AMS algorithm, we set $S^{(q)} = \mathbf{1}_{Z^{(q)} \le z_{\max}}$ which is indeed a $\mathcal{F}^{(q)}$-measurable random variable, since $Z^{(q)}$ is a $(\mathcal{F}_z^{(q)})_{z \in \mathbb{R}}$-stopping level; see Lemma 4.6 below.
The computation rule of the branching numbers. The branching numbers $B^{(n,q+1)}$ are defined by
\[
B^{(n,q+1)} = 1 + \mathrm{card}\bigl\{n' \in I_{\mathrm{new}}^{(q+1)} : P^{(q+1)}(n') = n\bigr\}
\]
for any $n \in I_{\mathrm{on},>Z^{(q)}}^{(q)}$. We extend the definition for $n \in I^{(q)} \setminus I_{\mathrm{on},>Z^{(q)}}^{(q)}$ by simply setting $B^{(n,q+1)} = 1$. It is then easy to check that they satisfy the requirements of Assumption 3. Notice that in the AMS algorithm, the total number of new replicas $K^{(q+1)} = \sum_{n \in I^{(q)}} \max\{B^{(n,q+1)} - 1, 0\}$ is given by $K^{(q+1)} = \mathrm{card}\, I_{\mathrm{on},\le Z^{(q)}}^{(q)}$. Moreover, all branching numbers are positive, so that $I_{\mathrm{killed}}^{(q+1)} = \emptyset$. Another particular feature of the AMS algorithm is that the map $P^{(q+1)}$ takes values in the strict subset $I_{\mathrm{on},>Z^{(q)}}^{(q)}$ of $I^{(q)}$.
Let us check that the computation rule (29) for the weights in the AMS algorithm is indeed consistent with the formula (11) given in the GAMS framework. First, for $n \in I_{\mathrm{off}}^{(q+1)} = I_{\mathrm{on},\le Z^{(q)}}^{(q)} \sqcup I_{\mathrm{off}}^{(q)}$, $B^{(n,q+1)} = 1$, $P^{(q+1)}(n) = n$ and, consistently, $G^{(n,q+1)} = G^{(n,q)}$. Second, for $n \in I_{\mathrm{on},>Z^{(q)}}^{(q)}$, it is clear that $E(B^{(n,q+1)} \mid \mathcal{F}^{(q)})$ does not depend on n (since the random variables are exchangeable in $n \in I_{\mathrm{on},>Z^{(q)}}^{(q)}$). In addition, by construction, $\sum_{n' \in I_{\mathrm{on},>Z^{(q)}}^{(q)}} B^{(n',q+1)} = n_{\mathrm{rep}}$. Thus, we have by a simple counting argument: for any $n \in I_{\mathrm{on},>Z^{(q)}}^{(q)}$,
\[
E\bigl(B^{(n,q+1)} \mid \mathcal{F}^{(q)}\bigr)
= \frac{1}{\mathrm{card}\, I_{\mathrm{on},>Z^{(q)}}^{(q)}} \sum_{n' \in I_{\mathrm{on},>Z^{(q)}}^{(q)}} E\bigl(B^{(n',q+1)} \mid \mathcal{F}^{(q)}\bigr)
= \frac{E\bigl(\sum_{n' \in I_{\mathrm{on},>Z^{(q)}}^{(q)}} B^{(n',q+1)} \mid \mathcal{F}^{(q)}\bigr)}{\mathrm{card}\, I_{\mathrm{on},>Z^{(q)}}^{(q)}}
= \frac{n_{\mathrm{rep}}}{n_{\mathrm{rep}} - K^{(q+1)}}.
\]
Thus, for $n \in I_{\mathrm{on},>Z^{(q)}}^{(q)}$ (and since $P^{(q+1)}(n) = n$) the formula $G^{(n,q+1)} = \frac{n_{\mathrm{rep}} - K^{(q+1)}}{n_{\mathrm{rep}}} G^{(n,q)}$ in (29) for the AMS algorithm is indeed consistent with the updating formula (11) for the weights in the GAMS framework.
Third, for $n \in I_{\mathrm{new}}^{(q+1)}$, $G^{(n,q+1)} = G^{(P^{(q+1)}(n),q+1)} = \frac{n_{\mathrm{rep}} - K^{(q+1)}}{n_{\mathrm{rep}}} G^{(P^{(q+1)}(n),q)}$ which is again consistent with the updating formula (11) for the weights in the GAMS framework since $\frac{n_{\mathrm{rep}} - K^{(q+1)}}{n_{\mathrm{rep}}} = 1/E\bigl(B^{(P^{(q+1)}(n),q+1)} \mid \mathcal{F}^{(q)}\bigr)$.
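The counting argument for the branching numbers can be checked empirically. The sketch below is an illustration under stated assumptions, not part of the paper: the function name `branching_numbers` is hypothetical, and surviving replicas are simply relabelled 0, ..., n_rep − K − 1. It samples the children-parent mapping uniformly and returns the branching numbers of the surviving replicas.

```python
import random
from collections import Counter


def branching_numbers(n_rep, n_new, rng):
    """Sample the branching numbers of the n_rep - n_new surviving replicas.

    Each of the n_new children picks its parent independently and uniformly
    among the survivors; a survivor's branching number is 1 plus its number
    of children.
    """
    survivors = range(n_rep - n_new)
    children_per_parent = Counter(rng.choice(survivors) for _ in range(n_new))
    return [1 + children_per_parent[n] for n in survivors]
```

With n_rep = 10 and K = 3, every sample sums to 10, and the empirical mean of any single branching number should be close to 10/7, in line with the value n_rep/(n_rep − K) derived above.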
Computation of the stopping levels. Let us now check that the requirements on $Z^{(q)}$ in Assumption 3 are satisfied. By definition of $Z^{(q+1)}$ [see the level computation step of the AMS algorithm, with (30)], it is clear that $Z^{(q+1)} \ge Z^{(q)}$ (actually, the strict inequality $Z^{(q+1)} > Z^{(q)}$ holds). It remains to prove that $Z^{(q)}$ is a stopping level for the filtration $(\mathcal{F}_z^{(q)})_{z \in \mathbb{R}}$.
We start with an elementary result, which again highlights the importance of the strict inequality $> z$ in the definitions (26) of $T_z(x)$ and of $\tau_z = T_z(X)$.
LEMMA 4.6. Let $X : \Omega \to \mathcal{P}$ be a Markov chain over the state space S [see equation (20)]. Then the random variable $\Xi(X)$ [where, we recall, the maximum level mapping $\Xi$ is defined by (25)] is a $(\mathrm{filt}_z^X)_{z \in \mathbb{R}}$-stopping level: for any $z \in \mathbb{R}$,
\[
\{\Xi(X) \le z\} \in \mathrm{filt}_z^X.
\]
PROOF. On one hand, we clearly have the equality of subsets of $\mathcal{P}$:
\[
\{x \in \mathcal{P} : \Xi(x) \le z\} = \{x \in \mathcal{P} : T_z(x) = +\infty\}.
\]
On the other hand, $\tau_z = T_z(X)$ is a $\mathrm{filt}_z^X$-measurable random variable. The result is then a consequence of these two facts.
We are now in position to prove the last result which is needed for Assumption 3
to hold.
LEMMA 4.7. For any $q \ge 0$, $Z^{(q)}$ is a stopping level with respect to the filtration $(\mathcal{F}_z^{(q)})_{z \in \mathbb{R}}$: for any $z \in \mathbb{R}$, $\{Z^{(q)} \le z\} \in \mathcal{F}_z^{(q)}$.
PROOF. Set by convention $Z^{(-1)} = -\infty$ and let us consider $q \ge 0$. Let us introduce the kth order statistics over the maximum levels at iteration q: $L_k^{(q+1)} = \Xi\bigl(X^{(\Sigma^{(q+1)}(k),q+1)}\bigr)$. Let us also introduce $L_{\max}^{(q+1)} = \max\{\Xi(X^{(n,q+1)}) : n \in I_{\mathrm{on}}^{(q+1)}\}$. By definition of $Z^{(q+1)}$ (see the level computation step of the AMS algorithm),
\[
Z^{(q+1)} = L_k^{(q+1)} \mathbf{1}_{\{L_k^{(q+1)} < L_{\max}^{(q+1)}\}} + (+\infty)\, \mathbf{1}_{\{L_k^{(q+1)} = L_{\max}^{(q+1)}\}}.
\]
Therefore, for any $z \in \mathbb{R}$, (using the partition $\Omega = \{L_{\max}^{(q+1)} \le z\} \sqcup \{L_{\max}^{(q+1)} > z\}$)
\[
\bigl\{Z^{(q+1)} \le z\bigr\}
= \bigl\{L_k^{(q+1)} \le z\bigr\} \cap \bigl\{L_k^{(q+1)} < L_{\max}^{(q+1)}\bigr\}
= \bigl\{L_k^{(q+1)} < L_{\max}^{(q+1)} \le z\bigr\} \sqcup \bigl\{L_k^{(q+1)} \le z < L_{\max}^{(q+1)}\bigr\}.
\]
These events belong to $\sigma\bigl(\{\Xi(X^{(n,q+1)}) \le z\}, \{\Xi(X^{(n,q+1)}) \le Z^{(q)}\}, n \in I^{(q+1)}\bigr)$ (in particular, the set of labels $I_{\mathrm{on}}^{(q+1)}$ is measurable with respect to the sigma-field generated by the events $\{\Xi(X^{(n,q+1)}) \le Z^{(q)}\}$). To conclude, note that by construction [level computation step, (i)] and thanks to Lemma 4.6: for any $z \in \mathbb{R}$,
\[
\sigma\bigl(\{\Xi(X^{(n,q+1)}) \le z\}, \{\Xi(X^{(n,q+1)}) \le Z^{(q)}\}, n \in I^{(q+1)}\bigr) \subset \mathcal{F}_z^{(q+1)}.
\]
4.7. The AMS estimator for the probability (22). An immediate corollary of
the results of Sections 4.4 and 4.6 above is that the GAMS framework encom-
passes the AMS algorithm, and that the unbiasedness result, Theorem 3.2, proven
in Section 3 can be applied. We detail the particular example of the estimation of
the probability p = P(τB < τA ); see (22).
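Observe that the weights are easily computed: for $n \in I_{\mathrm{on}}^{(q+1)}$,
\[
G^{(n,q+1)} = \frac{n_{\mathrm{rep}} - K^{(q+1)}}{n_{\mathrm{rep}}} \, \frac{n_{\mathrm{rep}} - K^{(q)}}{n_{\mathrm{rep}}} \cdots \frac{n_{\mathrm{rep}} - K^{(1)}}{n_{\mathrm{rep}}} \, \frac{1}{n_{\mathrm{rep}}}.
\]
Moreover, the weight of a replica remains constant as soon as it is declared retired (namely from the first iteration q such that its label is in $I_{\mathrm{off}}^{(q+1)}$).
Finally, due to the assumption (24) on B, only replicas with labels in $I_{\mathrm{on}}^{(Q_{\mathrm{iter}})}$ contribute to the estimation of p, and thus, from one iteration to the other, only replicas with labels in $I_{\mathrm{on}}^{(q)}$ have to be retained, namely a system of $n_{\mathrm{rep}}$ replicas.
Then, for the specific observable $\varphi(x) = \mathbf{1}_{T_B(x) < T_A(x)}$, we obtain the following unbiased estimator of $p = P(\tau_B < \tau_A)$:
\[
\hat p = \sum_{n \in I_{\mathrm{on}}^{(Q_{\mathrm{iter}})}} G^{(n,Q_{\mathrm{iter}})} \, \mathbf{1}_{T_B(X^{(n,Q_{\mathrm{iter}})}) < T_A(X^{(n,Q_{\mathrm{iter}})})}
= \frac{n_{\mathrm{rep}} - K^{(Q_{\mathrm{iter}})}}{n_{\mathrm{rep}}} \cdots \frac{n_{\mathrm{rep}} - K^{(1)}}{n_{\mathrm{rep}}} \, P_{\mathrm{corr}},
\tag{31}
\]
where the so-called “corrector term” is given by
\[
P_{\mathrm{corr}} = \frac{1}{n_{\mathrm{rep}}} \sum_{n \in I_{\mathrm{on}}^{(Q_{\mathrm{iter}})}} \mathbf{1}_{T_B(X^{(n,Q_{\mathrm{iter}})}) < T_A(X^{(n,Q_{\mathrm{iter}})})},
\tag{32}
\]
namely the proportion of working replicas that have reached B before A at the final iteration. The properties of this estimator will be numerically investigated in Section 5.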
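As a small worked illustration of (31) and (32), the following sketch evaluates the estimator from the recorded numbers of resampled replicas and the final count of replicas that reached B before A. The function name `ams_estimator` and its argument names are hypothetical; exact rational arithmetic is used only to make the example easy to check by hand.

```python
from fractions import Fraction


def ams_estimator(n_rep, new_counts, n_reached_b):
    """Evaluate the AMS estimator from K^(1), ..., K^(Q_iter) and the final count.

    new_counts  -- numbers of replicas resampled at each iteration
    n_reached_b -- number of working replicas that reached B before A
    """
    weight = Fraction(1)
    for n_new in new_counts:
        weight *= Fraction(n_rep - n_new, n_rep)
    p_corr = Fraction(n_reached_b, n_rep)   # corrector term (32)
    return weight * p_corr
```

For instance, with n_rep = 4, two iterations with K = 2 and K = 1, and 3 working replicas ending in B, the estimator is (2/4)(3/4)(3/4) = 9/32.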
Note that the AMS algorithm presented in this section satisfies the almost sure
mass conservation property, Definition 3.1. Applying Theorem 3.2, we obtain the
following result.
COROLLARY 4.8. For any choice of the number of replicas $n_{\mathrm{rep}}$, of k, and of the reaction coordinate ξ [provided it satisfies condition (24)], $\hat p$ defined by (31) is an unbiased estimator of the probability $p = P(\tau_B < \tau_A)$ defined by (22):
\[
E(\hat p) = p = P(\tau_B < \tau_A).
\]
5. Numerical illustration. The aim of this section is to illustrate the behav-

ior of the AMS algorithm as defined in Section 4, in various situations involving
discrete-time approximations (1) of the overdamped Langevin dynamics in dimen-
sion 2.
We would like to discuss in particular the unbiasedness (Corollary 4.8) of the
AMS estimator p̂ of p = P(τB < τA ) [defined by the formula (31)] whatever the
choice of the reaction coordinate ξ , the number of replicas nrep and the minimal
number k of replicas which are declared retired and resampled at each iteration of
the AMS algorithm.
In the following, $(\hat p_m)_{1 \le m \le N}$ refers to i.i.d. random variables distributed like the estimator $\hat p$, obtained by N independent realizations of the algorithm. The associated empirical mean is denoted by
\[
\overline{p}_N = \frac{1}{N} \sum_{m=1}^{N} \hat p_m.
\tag{33}
\]
The variance of the estimator $\hat p$ is also investigated numerically, and it is shown for a two-dimensional process that the variance heavily depends on the choice of the reaction coordinate. In the following, we will denote by
\[
\delta_N = 2 \times \frac{1.96}{\sqrt{N}} \times \sqrt{\frac{1}{N} \sum_{m=1}^{N} (\hat p_m)^2 - (\overline{p}_N)^2}
\tag{34}
\]
the size of the 95% empirical confidence interval computed using the empirical variance obtained over N independent runs of the algorithm.
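The empirical mean (33) and the width (34) of the confidence interval are straightforward to compute from a list of independent realizations. The sketch below is one possible implementation; the function name `empirical_mean_and_ci` is an assumption made for illustration, and the clipping of the variance at zero is an added safeguard against floating-point round-off rather than something stated in the text.

```python
import math


def empirical_mean_and_ci(estimates):
    """Return the empirical mean and the full width of the 95% confidence interval."""
    n = len(estimates)
    mean = sum(estimates) / n
    second_moment = sum(p * p for p in estimates) / n
    # Empirical variance; clipped at zero to guard against round-off.
    variance = max(second_moment - mean * mean, 0.0)
    delta = 2 * 1.96 / math.sqrt(n) * math.sqrt(variance)
    return mean, delta
```

The interval itself is then [mean − delta/2, mean + delta/2], which matches the intervals plotted as a function of N.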
The section is organized as follows. In Section 5.1, we give an example on
which we discuss the efficiency of the AMS algorithm by studying how the con-
vergence of the estimator depends on the reaction coordinate ξ . In Section 5.2,
we draw some conclusions and practical recommendations from these numerical
experiments. We refer to [7], Section 5, for additional simulations.
5.1. The bi-channel problem. The aim of this two-dimensional example is to

investigate the importance of the choice of the reaction coordinate on the efficiency
of the algorithm, on a typical example which has been used in previous numerical
studies; see [17, 34, 35].
5.1.1. The model. For an initial condition X0 = x0 ∈ R2 , a time step size

h > 0, and for t ∈ N, the Markov process in R2 we consider is defined by (1)
(discretization of the overdamped Langevin dynamics).
In the simulations below, the initial condition is $X_0 = x_0 = (-0.9, 0)$ and the time step is $h = 0.05$. The potential function $V : \mathbb{R}^2 \to \mathbb{R}$ is given by
\[
V(x, y) = 0.2 x^4 + 0.2 \Bigl(y - \frac{1}{3}\Bigr)^4 + 3 e^{-x^2 - (y - \frac{1}{3})^2} - 3 e^{-x^2 - (y - \frac{5}{3})^2} - 5 e^{-(x-1)^2 - y^2} - 5 e^{-(x+1)^2 - y^2}.
\]
This function is plotted on Figure 2. It has two global minima connected one to
another by two channels: the upper channel [which goes through the shallow min-
imum around (0, 1.5)] and the lower channel [which goes through the saddle point
around $(0, -0.5)$]. The two global minima are close to $m_A = (x_A, y_A) = (-1, 0)$ and $m_B = (x_B, y_B) = (1, 0)$. For some $\rho \in\, ]0, 1[$ (in the numerical applications, we take $\rho = 0.05$), we consider the sets A and B defined as the Euclidean open balls of radius ρ around the two minima $m_A$ and $m_B$, namely
\[
\begin{cases}
A = \mathcal{B}(m_A, \rho) = \bigl\{(x, y) \in \mathbb{R}^2 : \sqrt{(x - x_A)^2 + (y - y_A)^2} < \rho\bigr\},\\[4pt]
B = \mathcal{B}(m_B, \rho) = \bigl\{(x, y) \in \mathbb{R}^2 : \sqrt{(x - x_B)^2 + (y - y_B)^2} < \rho\bigr\},
\end{cases}
\]
so that $x_0 \in \mathbb{R}^2 \setminus (A \cup B)$.
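For readers who wish to reproduce the model, the following sketch writes down the potential, its gradient (derived by hand from the formula for V), one step of the discretized overdamped Langevin dynamics (1), and the membership test for the balls A and B. All function and parameter names are assumptions made for illustration; the Brownian increment over a time step h is taken as a centered Gaussian with variance h in each coordinate.

```python
import math
import random


def potential(x, y):
    """Bi-channel potential V(x, y)."""
    return (0.2 * x**4 + 0.2 * (y - 1 / 3)**4
            + 3 * math.exp(-x**2 - (y - 1 / 3)**2)
            - 3 * math.exp(-x**2 - (y - 5 / 3)**2)
            - 5 * math.exp(-(x - 1)**2 - y**2)
            - 5 * math.exp(-(x + 1)**2 - y**2))


def grad_potential(x, y):
    """Gradient of V, obtained by differentiating each term."""
    e1 = math.exp(-x**2 - (y - 1 / 3)**2)
    e2 = math.exp(-x**2 - (y - 5 / 3)**2)
    e3 = math.exp(-(x - 1)**2 - y**2)
    e4 = math.exp(-(x + 1)**2 - y**2)
    dx = (0.8 * x**3 - 6 * x * e1 + 6 * x * e2
          + 10 * (x - 1) * e3 + 10 * (x + 1) * e4)
    dy = (0.8 * (y - 1 / 3)**3 - 6 * (y - 1 / 3) * e1 + 6 * (y - 5 / 3) * e2
          + 10 * y * e3 + 10 * y * e4)
    return dx, dy


def langevin_step(x, y, beta, h, rng):
    """One step of X_{t+1} = X_t - grad V(X_t) h + sqrt(2 h / beta) * N(0, I)."""
    dx, dy = grad_potential(x, y)
    sigma = math.sqrt(2 * h / beta)
    return (x - dx * h + sigma * rng.gauss(0, 1),
            y - dy * h + sigma * rng.gauss(0, 1))


def in_ball(x, y, cx, cy, rho=0.05):
    """True if (x, y) lies in the open Euclidean ball of radius rho around (cx, cy)."""
    return math.hypot(x - cx, y - cy) < rho
```

A stopped path is then obtained by iterating `langevin_step` from (−0.9, 0) until `in_ball` returns true for either (−1, 0) or (1, 0).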
FIG. 2. Plot of the potential function V for the bi-channel problem.

Most of the trajectories starting from x0 hit A before B. Moreover, A and B

are metastable states: in the small temperature regime, starting from A (resp. B),
it takes a lot of time to leave A (resp. B).
We are interested in the estimation of the probability p = P(τB < τA ), where
τA = TA (X) and τB = TB (X) are the first hitting times of sets A and B by the
process X; see equation (23).
We will consider the results of the AMS algorithm for the three reaction coordinates $\xi^i$ with $i \in \{1, 2, 3\}$:
1. The norm to the initial point $m_A$:
\[
\xi^1(x, y) = \sqrt{(x - x_A)^2 + (y - y_A)^2}.
\]
2. The norm to the final point $m_B$:
\[
\xi^2(x, y) = \xi^1(x_B, y_B) - \sqrt{(x - x_B)^2 + (y - y_B)^2}.
\]
3. The abscissa: $\xi^3(x, y) = x$.
The maximum levels used in the stopping criterion of the algorithm are $z_{\max}^1 = z_{\max}^2 = 1.9$ and $z_{\max}^3 = 0.9$. Notice that for $i \in \{1, 2, 3\}$, we have $B \subset (\xi^i)^{-1}(]z_{\max}^i, +\infty[)$, and thus (24) is satisfied.
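The three reaction coordinates and the inclusion of B in the super-level sets can be written down directly. In the sketch below, the names `xi1`, `xi2`, `xi3`, `Z_MAX` and `RHO` are illustrative choices, and the norms are taken to be Euclidean distances, consistent with A and B being Euclidean balls.

```python
import math

M_A = (-1.0, 0.0)   # minimum near which A is centered
M_B = (1.0, 0.0)    # minimum near which B is centered
RHO = 0.05          # radius of the balls A and B


def xi1(x, y):
    """Distance to the initial minimum m_A."""
    return math.hypot(x - M_A[0], y - M_A[1])


def xi2(x, y):
    """Distance between the minima minus the distance to the final minimum m_B."""
    return xi1(*M_B) - math.hypot(x - M_B[0], y - M_B[1])


def xi3(x, y):
    """Abscissa."""
    return x


# Maximum levels used in the stopping criterion for each reaction coordinate.
Z_MAX = {xi1: 1.9, xi2: 1.9, xi3: 0.9}
```

On the closure of B, the smallest values are 2 − 0.05 = 1.95 for the first two coordinates and 1 − 0.05 = 0.95 for the abscissa, all strictly above the corresponding maximum levels.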

In this section, we take $k = 1$ and the number of replicas is $n_{\mathrm{rep}} = 100$. The values of β belong to the set {8.67, 9.33, 10} which are associated with probabilities p ranging approximately from $2 \cdot 10^{-9}$ to $10^{-10}$.
5.1.2. Evolution of the empirical mean. Let us first perform simulations with N independent runs of the algorithm, N varying between 1 and $6 \cdot 10^6$. We represent on Figure 3 the evolution as a function of N of the empirical mean $\overline{p}_N$ [defined by (33)] and of the associated 95% confidence intervals $[\overline{p}_N - \delta_N/2, \overline{p}_N + \delta_N/2]$ computed using the empirical variance; see (34).
The colors in the figures are as follows: green (solid line) for ξ 1 , red (line with
crosses) for ξ 2 and blue (line with circles) for ξ 3 . The full lines represent the
evolution of the upper and lower bounds of the confidence intervals, while dashed
lines represent the evolution of the empirical means.
From these simulations, we observe that:
• When N is sufficiently large, the confidence intervals overlap. This is in agree-
ment with the fact that p̂ is an unbiased estimator of p whatever the choice of
the reaction coordinate.
• The statistical fluctuations depend a lot on the reaction coordinate. In particular,
the results obtained with ξ 1 seem much better than with ξ 2 or ξ 3 . We will come
back to this in Section 5.1.3.
• The confidence interval being computed empirically, one may conclude that the algorithm is biased by considering the results for N too small (see, e.g., the graphs in the right column in Figure 3). This is due to the fact that the empirical variance dramatically underestimates the real variance if N is too small. This is a well-known phenomenon for splitting algorithms in general called “apparent bias”; see [24]. As β gets larger (namely as the temperature gets smaller), the number of independent runs N required to observe overlapping confidence intervals gets larger.
FIG. 3. Evolution as a function of N of the empirical mean $\overline{p}_N$ and of the associated 95% confidence intervals $[\overline{p}_N - \delta_N/2, \overline{p}_N + \delta_N/2]$. Upper to lower β = 8.67, 9.33, 10. The right inserts are zooms of the left graphs on smaller values of N, in order to illustrate the “apparent bias” phenomenon.
We observe that there are some realizations for which the estimator of the prob-
ability is very large. These realizations have small probability but they dramati-
cally increase the value of the empirical mean and of the empirical variance. This
explains the large variations which are observed on the empirical average and confidence interval as a function of the number of realizations; see Figure 3. As is usually the case with Monte Carlo methods for rare event simulation, it is impossible to decide a priori if the sample size N is sufficiently large to give
an accurate estimation. However, using the unbiasedness property (Theorem 3.2),
a sensible way to choose a number of realizations N is to set it sufficiently large so
that the confidence intervals obtained with different reaction coordinates overlap.
5.1.3. Fluctuations induced by the two channels. In this section, we compare

the results when using two reaction coordinates: ξ 1 (norm to the initial point) and
ξ 3 (abscissa). Since the typical behavior we observe in Figure 3 and in Table 1 is
the same for ξ 2 and ξ 3 , we do not repeat the analysis for ξ 2 . Our aim in this section
is to relate the large variations observed in Figure 3 with the fact that two channels
connect A to B.
TABLE 1
The bi-channel case. Proportion and conditional probabilities for two reaction coordinates: the norm to the initial point (ξ^1) and the abscissa (ξ^3). e-n stands for 10^{-n}

       β      N        R_N    ρ_N^upper  ρ_N^mix  ρ_N^lower  p̃_N^upper  p̃_N^mix   p̃_N^lower  p̄_N
ξ^1    8.67   2·10^6   0.81   0.45       0.03     0.52       2.7e-09     3.0e-09   2.3e-09     1.7e-09
ξ^3    8.67   2·10^6   0.99   0.0008     0.02     0.98       2.3e-06     5.9e-10   5.5e-10     2.4e-09
ξ^1    9.33   4·10^6   0.72   0.51       0.02     0.47       6.2e-10     6.3e-10   2.5e-10     3.2e-10
ξ^3    9.33   4·10^6   0.97   0.0005     0.02     0.98       1.0e-06     5.6e-11   9.7e-11     6.0e-10
ξ^1    10     6·10^6   0.62   0.51       0.01     0.48       1.5e-10     1.4e-10   5.2e-11     6.2e-11
ξ^3    10     6·10^6   0.92   0.0004     0.01     0.99       1.4e-07     1.5e-11   1.8e-11     6.8e-11

As explained above, there are two possible channels for the reactive trajectories going from A to B: the upper channel and the lower channel. For each realization m, one can distinguish the contributions to the estimator $\hat p_m$ of the replicas going through the upper channel and the ones going through the lower channel. In the following, for a given path, the trajectory is associated to the upper (resp., lower) channel if the first hitting point of the y-axis is such that $y > 0.5$ (resp., such that $y \le 0.5$). More precisely, let us define $\Pi_1(x, y) = x$ and $\Pi_2(x, y) = y$ for any $(x, y) \in \mathbb{R}^2$. For a replica $X = (X_t)_{t \in \mathbb{N}}$ such that $\tau = \inf\{t \in \mathbb{N} : \Pi_1(X_t) > 0\} < \infty$, $X \in \mathrm{Upper}$ if $\Pi_2(X_\tau) > 0.5$ and $X \in \mathrm{Lower}$ if $\Pi_2(X_\tau) \le 0.5$.
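The rule assigning a path to a channel can be sketched as follows. The function name `channel` and the representation of a path as a sequence of (x, y) pairs are assumptions made for illustration.

```python
def channel(path):
    """Classify a path by its ordinate at the first time its abscissa is positive.

    path -- sequence of (x, y) positions of a replica
    Returns "upper" if y > 0.5 at that time, "lower" if y <= 0.5,
    and None if the abscissa never becomes positive.
    """
    for x, y in path:
        if x > 0:
            return "upper" if y > 0.5 else "lower"
    return None
```

A replica that reaches B before A necessarily crosses x = 0 from a starting point with negative abscissa, so it is always assigned to exactly one of the two channels.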
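For each realization of the algorithm, we compute the three following quantities:
• the number of replicas which reach B before A:
\[
M^B = \sum_{n \in I_{\mathrm{on}}^{(Q_{\mathrm{iter}})}} \mathbf{1}_{T_B(X^{(n,Q_{\mathrm{iter}})}) < T_A(X^{(n,Q_{\mathrm{iter}})})};
\]
• the number of replicas which reach B before A and go through the upper channel:
\[
M^{B,\mathrm{upper}} = \sum_{n \in I_{\mathrm{on}}^{(Q_{\mathrm{iter}})}} \mathbf{1}_{T_B(X^{(n,Q_{\mathrm{iter}})}) < T_A(X^{(n,Q_{\mathrm{iter}})})} \, \mathbf{1}_{X^{(n,Q_{\mathrm{iter}})} \in \mathrm{Upper}};
\]
• the number of replicas which reach B before A and go through the lower channel:
\[
M^{B,\mathrm{lower}} = \sum_{n \in I_{\mathrm{on}}^{(Q_{\mathrm{iter}})}} \mathbf{1}_{T_B(X^{(n,Q_{\mathrm{iter}})}) < T_A(X^{(n,Q_{\mathrm{iter}})})} \, \mathbf{1}_{X^{(n,Q_{\mathrm{iter}})} \in \mathrm{Lower}}.
\]
Notice that $M^B = M^{B,\mathrm{upper}} + M^{B,\mathrm{lower}}$ and that $M^B = 0$ is equivalent to $\hat p = 0$.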

When needed, we explicitly indicate the dependence of these quantities on the realization by a subscript m: for $m \in \{1, \ldots, N\}$, we thus denote by $M_m^B$, $M_m^{B,\mathrm{upper}}$ and $M_m^{B,\mathrm{lower}}$ the mth realization of $M^B$, $M^{B,\mathrm{upper}}$ and $M^{B,\mathrm{lower}}$.
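Let us introduce the set $E_N = \{m : \hat p_m \ne 0\}$ of realizations which lead to a nonzero $\hat p$ and the proportion $R_N = \mathrm{card}\, E_N / N$ of such realizations. We now divide the realizations in $E_N$ into three disjoint subsets, with associated proportions.
• All replicas reaching B before A go through the upper channel:
\[
E_N^{\mathrm{upper}} = \bigl\{m \in E_N : M_m^{B,\mathrm{lower}} = 0\bigr\}
\quad\text{and}\quad
\rho_N^{\mathrm{upper}} = \frac{\mathrm{card}\, E_N^{\mathrm{upper}}}{\mathrm{card}\, E_N}.
\]
• All replicas reaching B before A go through the lower channel:
\[
E_N^{\mathrm{lower}} = \bigl\{m \in E_N : M_m^{B,\mathrm{upper}} = 0\bigr\}
\quad\text{and}\quad
\rho_N^{\mathrm{lower}} = \frac{\mathrm{card}\, E_N^{\mathrm{lower}}}{\mathrm{card}\, E_N}.
\]
• Both channels are used by the replicas reaching B before A:
\[
E_N^{\mathrm{mix}} = E_N \setminus \bigl(E_N^{\mathrm{upper}} \cup E_N^{\mathrm{lower}}\bigr)
\quad\text{and}\quad
\rho_N^{\mathrm{mix}} = \frac{\mathrm{card}\, E_N^{\mathrm{mix}}}{\mathrm{card}\, E_N}.
\]
Obviously, $\rho_N^{\mathrm{upper}} + \rho_N^{\mathrm{lower}} + \rho_N^{\mathrm{mix}} = 1$. Finally, we define conditional estimators for $\hat p$ associated with the partition of $E_N$ defined above:
\[
\tilde p_N^{\mathrm{upper}} = \frac{\sum_{m \in E_N^{\mathrm{upper}}} \hat p_m}{\mathrm{card}\, E_N^{\mathrm{upper}}},
\qquad
\tilde p_N^{\mathrm{lower}} = \frac{\sum_{m \in E_N^{\mathrm{lower}}} \hat p_m}{\mathrm{card}\, E_N^{\mathrm{lower}}}
\qquad\text{and}\qquad
\tilde p_N^{\mathrm{mix}} = \frac{\sum_{m \in E_N^{\mathrm{mix}}} \hat p_m}{\mathrm{card}\, E_N^{\mathrm{mix}}}.
\]
Notice that
\[
\overline{p}_N = R_N \bigl(\rho_N^{\mathrm{upper}} \tilde p_N^{\mathrm{upper}} + \rho_N^{\mathrm{lower}} \tilde p_N^{\mathrm{lower}} + \rho_N^{\mathrm{mix}} \tilde p_N^{\mathrm{mix}}\bigr).
\]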
In other words, we have separated the nonzero contributions to p N into (i) real-
izations for which all the replicas go through the upper channel (first term in the
parenthesis), (ii) realizations for which all the replicas go through the lower chan-
nel (second term in the parenthesis) and finally (iii) realizations for which the two
channels are used by the replicas (third term in the parenthesis).
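The decomposition of the empirical mean into channel contributions can be computed as in the sketch below. The input format (one triple per realization containing the estimate and the numbers of successful replicas in the upper and lower channels), the function name `channel_decomposition`, and the convention of returning 0.0 for an empty group are assumptions made for illustration.

```python
def channel_decomposition(runs):
    """Split the realizations with nonzero estimate by the channel(s) they used.

    runs -- list of (p_hat, m_upper, m_lower), one triple per realization
    Returns (R_N, rho, p_tilde), where rho and p_tilde are dictionaries
    indexed by "upper", "lower" and "mix".
    """
    nonzero = [run for run in runs if run[0] != 0]
    if not nonzero:
        return 0.0, {}, {}
    groups = {
        "upper": [p for p, up, lo in nonzero if lo == 0],
        "lower": [p for p, up, lo in nonzero if up == 0],
        "mix": [p for p, up, lo in nonzero if up > 0 and lo > 0],
    }
    r_n = len(nonzero) / len(runs)
    rho = {name: len(g) / len(nonzero) for name, g in groups.items()}
    # Conditional estimators; 0.0 is a placeholder when a group is empty.
    p_tilde = {name: sum(g) / len(g) if g else 0.0 for name, g in groups.items()}
    return r_n, rho, p_tilde
```

Multiplying R_N by the sum of rho[c] * p_tilde[c] over the three groups recovers the plain empirical mean of the estimates, which is a convenient consistency check on the bookkeeping.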
Let us emphasize that contrary to $\overline{p}_N$, the limit when $N \to \infty$ of the estimators $R_N$, $\rho_N^{\mathrm{upper}}$, $\rho_N^{\mathrm{lower}}$, $\rho_N^{\mathrm{mix}}$, $\tilde p_N^{\mathrm{upper}}$, $\tilde p_N^{\mathrm{mix}}$ or $\tilde p_N^{\mathrm{lower}}$ (for a given value of $n_{\mathrm{rep}}$) depends on the choice of the reaction coordinate ξ; see [7], Remark 5.5.
From Table 1, we observe that for $\xi^1$, approximately half of the realizations use exclusively the upper channel and the other half use the lower channel. The associated conditional estimators $\tilde p_N^{\mathrm{upper}}$ and $\tilde p_N^{\mathrm{lower}}$ are very close. This is not the case for $\xi^3$: only very few realizations go through the upper channel while the associated probability $\tilde p_N^{\mathrm{upper}}$ is much larger than the two other ones $\tilde p_N^{\mathrm{lower}}$ and $\tilde p_N^{\mathrm{mix}}$. This means that a few realizations contribute a lot to the empirical average $\overline{p}_N$. This explains the very large confidence intervals observed with $\xi^3$ (in comparison with those observed for $\xi^1$) on Figure 3.
5.2. Conclusions and practical recommendations. Let us summarize our find-

ings on these numerical simulations.
• We always observe that for sufficiently large values of N (number of indepen-
dent Monte Carlo simulations), the confidence intervals of the estimator p̂ over-
lap, whatever nrep , k or ξ . This is in accordance with our theoretical result on
the unbiasedness of this estimator.
• In multiple channel cases (namely when multiple pathways exist from A to B),
one may observe nonoverlapping empirical confidence intervals of the estima-
tor for different reaction coordinates if the number of independent realizations
N is too small. This is related to the fact that very large contributions to the
average of the estimator are associated with trajectories going through very un-
likely (for the considered reaction coordinate and value of nrep ) channels. This
is a known phenomenon for splitting algorithms in general; see [24], where it is
referred to as “apparent bias”. In [7], Section 5.3, we report on additional exper-
iments which indeed show that when a single channel connects A to B (again
in a two-dimensional setting), the apparent bias phenomenon disappears. As explained in [24], a good reaction coordinate in a multiple channel case is such that, conditionally on reaching a certain maximum level z, the relative likelihood of the channels used by the paths to reach this maximum level does not depend too much on z. For example, a reaction coordinate close to the committor function
is a good candidate to achieve this purpose. This opens the route to adaptive al-
gorithms, where the reaction coordinate would be updated in order to get closer
and closer to the committor function as long as successive AMS algorithms are
launched (see [17]). We intend to investigate this direction in future works.
Let us also mention that we observed in our simulations that the estimator p̂ has
a heavy tail so that very few realizations contribute a lot to the empirical average
(which is also consistent with the results presented in Section 5.1.3). We refer
to [7], Section 5.2.3, for a discussion on this aspect.
As a conclusion to these numerical results, we thus recommend the following
in order to get reliable estimates of the probability P(τB < τA ) with the AMS
algorithm.
• Thanks to the unbiasedness property, one should check the independence of the
computed probability on the choice of the parameters: the number of replicas
$n_{\mathrm{rep}}$, the minimum number of sampled replicas per iteration k and, more importantly, the reaction coordinate ξ. In particular, we recommend performing simulations with various reaction coordinates and setting the minimal number of independent realizations such that the empirical confidence intervals overlap.
• Thanks to the unbiasedness property, one can perform many independent real-
izations of the algorithm with a relatively small number of replicas, instead of
using a few independent realizations with a large number of replicas. Indeed, assume that we are in a regime where the variance scales like $1/(n_{\mathrm{rep}} \times N)$; this is the case for instance in the so-called ideal case for sufficiently large $n_{\mathrm{rep}}$ and N; see [10]. Since the parallelization of independent runs of the algorithm is trivial, for a fixed product $n_{\mathrm{rep}} \times N$ (namely for a fixed CPU cost), the strategy with fewer replicas is thus much more interesting in terms of wall-clock time (which scales like $n_{\mathrm{rep}}$) than the strategy with more replicas.
Finally, let us recall that one should be careful in the implementation of the split-
ting and branching steps, in particular in the treatment of replicas which have the
same maximum level and in the definition of the branching point in the partial
resampling procedure. One output of this work is to identify correct implementa-
tions in such cases. For incorrect implementations, strong biases may be observed;
see [7], Section 5.1.
Acknowledgments. We would like to thank the referees for their comments which helped us improve the presentation of the paper.
C.-E. Bréhier is grateful to INRIA Rocquencourt for having funded his post-
doctoral position (September 2013–December 2014). M. Gazeau is grateful to IN-
RIA Lille Nord Europe where a part of this research was conducted and to Labex
CEMPI (ANR-11-LABX-0007-01). The authors are grateful to the Labex Bezout

(ANR-10-LABX-58-01) which supported this project at an early stage, during the
CEMRACS 2013. T. Lelièvre would like to acknowledge the Laboratoire Interna-
tional Associé between the Centre National de la Recherche Scientifique (CNRS)
and the University of Illinois at Urbana-Champaign (UIUC). Finally, the authors
would like to thank F. Bouchet, F. Cérou, A. Guyader, J. Rolland and E. Simonnet
for many fruitful discussions.
REFERENCES
[1] A SMUSSEN , S. and G LYNN , P. W. (2007). Stochastic Simulation: Algorithms and Analysis.
Stochastic Modelling and Applied Probability 57. Springer, New York. MR2331321
[2] AU , S. K. and B ECK , J. L. (2001). Estimation of small failure probabilities in high dimensions
by subset simulation. Journal of Probabilistic Engineering Mechanics 16 263–277.
[3] B ILLINGSLEY, P. (1999). Convergence of Probability Measures, 2nd ed. Wiley Series in Prob-
ability and Statistics: Probability and Statistics. Wiley, New York. MR1700749
[4] B LANCHET, J., G LYNN , P. and L IU , J. C. (2006). State-dependent importance sampling and
large deviations. In Proceedings of the 1st International Conference on Performance Eval-
uation Methodolgies and Tools, Valuetools ’06 ACM, New York.
[5] B RÉHIER , C.-E. (2015). Large deviations principle for the adaptive multilevel splitting al-
gorithm in an idealized setting. ALEA Lat. Am. J. Probab. Math. Stat. 12 717–742.
MR3446035
[6] B RÉHIER , C. E., C HAUDRU DE R AYNAL , P. E., L EMAIRE , V., PANLOUP, F. and R EY, C.
(2015). Recent advances in various fields of numerical probability. ESAIM Proceedings
and Reviews 51 272–292.
[7] B RÉHIER , C.-E., G AZEAU , M., G OUDENÈGE , L., L ELIÈVRE , T. and ROUSSET, M. (2015).
Unbiasedness of some generalized adaptive multilevel splitting algorithms. Preprint.
Available at arXiv:1505.02674.
[8] B RÉHIER , C.-E., G AZEAU , M., G OUDENÈGE , L. and ROUSSET, M. (2015). Analysis and
simulation of rare events for SPDEs. ESAIM Proceedings and Reviews 48 364–384.
[9] B RÉHIER , C.-E., G OUDENÈGE , L. and T UDELA , L. (2014). Central limit theorem for adaptive
multilevel splitting estimators in an idealized setting. In Monte Carlo and Quasi Monte
Carlo Methods. Springer, MCQMC, Leuven, Belgium.
[10] B RÉHIER , C.-E., L ELIÈVRE , T. and ROUSSET, M. (2015). Analysis of adaptive multilevel
splitting algorithms in an idealized case. ESAIM Probab. Stat. 19 361–394. MR3417480
[11] B UCKLEW, J. A. (2004). Introduction to Rare Event Simulation. Springer, New York.
MR2045385
[12] C ARON , V., G UYADER , A., M UNOZ Z UNIGA , M. and T UFFIN , B. (2014). Some recent results
in rare event estimation. In Journées MAS 2012. ESAIM Proc. 44 239–259. EDP Sci., Les
Ulis. MR3178620
[13] C ÉROU , F., D EL M ORAL , P., F URON , T. and G UYADER , A. (2012). Sequential Monte Carlo
for rare event estimation. Stat. Comput. 22 795–808. MR2909622
[14] C ÉROU , F., D EL M ORAL , P., L E G LAND , F. and L EZAUD , P. (2006). Genetic genealogi-
cal models in rare event analysis. ALEA Lat. Am. J. Probab. Math. Stat. 1 181–203.
MR2249654
[15] C ÉROU , F. and G UYADER , A. (2007). Adaptive multilevel splitting for rare event analysis.
Stoch. Anal. Appl. 25 417–443. MR2303095
[16] C ÉROU , F. and G UYADER , A. (2016). Fluctuations analysis of adaptive multilevel splitting.
Ann. Appl. Probab. 26 3319–3380.
[17] C ÉROU , F., G UYADER , A., L ELIÈVRE , T. and P OMMIER , D. (2011). A multiple replica ap-
proach to simulate reactive trajectories. J. Chem. Phys. 134 054108 (16 p.).
[18] D EAN , T. and D UPUIS , P. (2009). Splitting for rare event simulation: A large deviation ap-
proach to design and analysis. Stochastic Process. Appl. 119 562–587. MR2494004
[19] D EL M ORAL , P. (2004). Feynman–Kac Formulae. Springer, New York. MR2044973
[20] D EL M ORAL , P., D OUCET, A. and JASRA , A. (2006). Sequential Monte Carlo samplers. J. R.
Stat. Soc. Ser. B. Stat. Methodol. 68 411–436. MR2278333
[21] D EL M ORAL , P. and G ARNIER , J. (2005). Genealogical particle analysis of rare events. Ann.
Appl. Probab. 15 2496–2534. MR2187302
[22] D UPUIS , P. and WANG , H. (2004). Importance sampling, large deviations, and differential
games. Stoch. Stoch. Rep. 76 481–508. MR2100018
[23] G ARVELS , M. J. J., K ROESE , D. P. and VAN O MMEREN , J. C. W. (2002). On the importance
function in splitting simulation. European Transactions on Telecom-Munications 13 363–
371.
[24] GLASSERMAN, P., HEIDELBERGER, P., SHAHABUDDIN, P. and ZAJIC, T. (1998). A large deviations
perspective on the efficiency of multilevel splitting. IEEE Trans. Automat. Control
43 1666–1679. MR1658685
[25] GLASSERMAN, P., HEIDELBERGER, P., SHAHABUDDIN, P. and ZAJIC, T. (1999). Multilevel
splitting for estimating rare event probabilities. Oper. Res. 47 585–600. MR1710951
[26] GLASSERMAN, P. and WANG, Y. (1997). Counterexamples in importance sampling for large
deviations probabilities. Ann. Appl. Probab. 7 731–746. MR1459268
[27] GUYADER, A., HENGARTNER, N. and MATZNER-LØBER, E. (2011). Simulation and estimation
of extreme quantiles and extreme probabilities. Appl. Math. Optim. 64 171–196.
MR2822407
[28] HAMMERSLEY, J. M. and HANDSCOMB, D. C. (1965). Monte Carlo Methods. Methuen, London.
MR0223065
[29] JOHANSEN, A. M., DEL MORAL, P. and DOUCET, A. (2006). Sequential Monte Carlo samplers
for rare events. In Proceedings of the 6th International Workshop on Rare Event
Simulation (RESIM 2006) 256–267, Bamberg.
[30] KAHN, H. and HARRIS, T. E. (1951). Estimation of particle transmission by random sampling.
National Bureau of Standards 12 27–30.
[31] KARATZAS, I. and SHREVE, S. E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed.
Graduate Texts in Mathematics 113. Springer, New York. MR1121940
[32] LAGNOUX-RENAUDIE, A. (2008). Effective branching splitting method under cost constraint.
Stochastic Process. Appl. 118 1820–1851. MR2454466
[33] LAGNOUX-RENAUDIE, A. (2009). A two-step branching splitting model under cost constraint
for rare event analysis. J. Appl. Probab. 46 429–452. MR2535824
[34] METZNER, P., SCHÜTTE, C. and VANDEN-EIJNDEN, E. (2006). Illustration of transition path
theory on a collection of simple examples. J. Chem. Phys. 125 084110. Available at
http://dx.doi.org/10.1063/1.2335447.
[35] PARK, S., SENER, M. K., LU, D. and SCHULTEN, K. (2003). Reaction paths based on mean
first-passage times. J. Chem. Phys. 119 1313–1319.
[36] ROLLAND, J. and SIMONNET, E. (2015). Statistical behaviour of adaptive multilevel splitting
algorithms in simple models. J. Comput. Phys. 283 541–558. MR3294688
[37] ROSENBLUTH, M. N. and ROSENBLUTH, A. W. (1955). Monte Carlo calculation of the average
extension of molecular chains. J. Chem. Phys. 23 356–359.
[38] RUBINO, G. and TUFFIN, B. (2009). Introduction to rare event simulation. In Rare Event Simulation
Using Monte Carlo Methods 1–13. Wiley, Chichester. MR2730759
[39] SHIRYAYEV, A. N. (1984). Probability. Graduate Texts in Mathematics 95. Springer, New
York. MR0737192
[40] SIMONNET, E. (2014). Combinatorial analysis of the adaptive last particle method. Stat. Comput.
26 211–230.
[41] SKILLING, J. (2006). Nested sampling for general Bayesian computation. Bayesian Anal. 1
833–859 (electronic). MR2282208
[42] VANDEN-EIJNDEN, E. and WEARE, J. (2012). Rare event simulation of small noise diffusions.
Comm. Pure Appl. Math. 65 1770–1803. MR2982641
[43] VILLÉN-ALTAMIRANO, M. and VILLÉN-ALTAMIRANO, J. (1991). RESTART: A method for
accelerating rare events simulations. In Proceeding of the Thirteenth International Teletraffic
Congress, Volume Copenhagen, Denmark, June 19–26 of Queueing, Performance
and Control in ATM: ITC-13 Workshops 71–76. North-Holland, Amsterdam.
[44] WALTER, C. Moving particles: A parallel optimal multilevel splitting method with applications
in quantiles estimation and meta-model-based algorithms. Struct. Saf. 55 10–25.

C.-E. BRÉHIER
CNRS UMR 5208, INSTITUT CAMILLE JORDAN
UNIV LYON, UNIVERSITÉ CLAUDE BERNARD LYON 1
43 BD. DU 11 NOVEMBRE 1918
F-69622 VILLEURBANNE CEDEX
FRANCE
E-MAIL: brehier@math.univ-lyon1.fr

M. GAZEAU
DEPARTMENT OF MATHEMATICS
UNIVERSITY OF TORONTO
40 ST. GEORGE ST.
TORONTO M5S 2E4
CANADA
E-MAIL: gazeauma@math.toronto.edu

L. GOUDENÈGE
FÉDÉRATION DE MATHÉMATIQUES
DE L’ÉCOLE CENTRALE PARIS
CNRS
GRANDE VOIE DES VIGNES
92295 CHÂTENAY-MALABRY
FRANCE
E-MAIL: goudenege@math.cnrs.fr

T. LELIÈVRE
M. ROUSSET
UNIVERSITÉ PARIS-EST
CERMICS (ENPC), INRIA
6-8 AVENUE BLAISE PASCAL
CITÉ DESCARTES
F-77455 MARNE-LA-VALLÉE
FRANCE
E-MAIL: lelievre@cermics.enpc.fr
roussetm@cermics.enpc.fr

Received June 2015; revised November 2015.
