Markov Chains

Pierre Brémaud

December 9, 2015
Contents

2 Recurrence
  2.1 The potential matrix criterion
  2.2 Stationary distribution criterion
  2.3 Foster's theorem
  2.4 Examples
  2.5 Exercises

3 Long-run behaviour
  3.1 Ergodic theorem
  3.2 Convergence in variation
  3.3 Monte Carlo
  3.4 Absorption
  3.5 Exercises

4 Solutions

A Appendix
  A.1 Greatest Common Divisor
  A.2 Dominated convergence for series
Chapter 1

The transition matrix

[Figure: a transition graph on the states 1, 2, 3, 4, with arrows labelled by the transition probabilities p11, p12, p23, p32, p34, p41.]
Recall that a sequence {Xn }n≥0 of random variables with values in a set E is called
a discrete-time stochastic process with state space E. In this chapter, the state
space is countable, and its elements will be denoted by i, j, k,. . . If Xn = i, the
process is said to be in state i at time n, or to visit state i at time n.
If for all integers n ≥ 0 and all states i0, . . . , in−1, i, j,

  P(Xn+1 = j | Xn = i, Xn−1 = in−1, . . . , X0 = i0) = P(Xn+1 = j | Xn = i),

this stochastic process is called a Markov chain, and a homogeneous Markov chain (hmc) if, in addition, the right-hand side is independent of n.
In the homogeneous case, the matrix P = {pij}i,j∈E, where pij := P(Xn+1 = j | Xn = i), is called the transition matrix of the hmc. Since the entries are probabilities, and since a transition from any state i must be to some state, it follows that

  pij ≥ 0, and ∑k∈E pik = 1
for all states i, j. A matrix P indexed by E and satisfying the above properties is
called a stochastic matrix. The state space may be infinite, and therefore such a
matrix is in general not of the kind studied in linear algebra. However, the basic
operations of addition and multiplication will be defined by the same formal rules.
The notation x = {x(i)}i∈E formally represents a column vector, and xT is the
corresponding row vector.
The Markov property easily extends (Exercise 1.5.2) to
P (A | Xn = i, B) = P (A | Xn = i) ,
where
P (A ∩ B | Xn = i) = P (A | Xn = i)P (B | Xn = i).
That is, A and B are conditionally independent given Xn = i. In other words, the future at time n and the past at time n are conditionally independent given the present state Xn = i.
The matrix P^m is called the m-step transition matrix because its general term is

  pij(m) = P(Xn+m = j | Xn = i).

In fact, by the Bayes sequential rule and the Markov property, the right-hand side equals ∑_{i1,...,im−1∈E} pii1 pi1i2 · · · pim−1j, which is the general term of the m-th power of P.
The probability distribution ν0 of the initial state X0 is called the initial distribution. From the Bayes sequential rule and in view of the homogeneous Markov property and the definition of the transition matrix,

  P(X0 = i0, X1 = i1, . . . , Xn = in) = ν0(i0) pi0i1 · · · pin−1in.

Therefore,
Theorem 1.1.1 The distribution of a discrete-time hmc is uniquely determined
by its initial distribution and its transition matrix.
Proof. Iteration of recurrence (1.2) shows that for all n ≥ 1, there is a function
gn such that Xn = gn (X0 , Z1 , . . . , Zn ), and therefore P (Xn+1 = j | Xn = i, Xn−1 =
in−1 , . . . , X0 = i0 ) = P (f (i, Zn+1 ) = j | Xn = i, Xn−1 = in−1 , . . . , X0 = i0 ) =
P (f (i, Zn+1 ) = j), since the event {X0 = i0 , . . . , Xn−1 = in−1 , Xn = i} is express-
ible in terms of X0 , Z1 , . . . , Zn and is therefore independent of Zn+1 . Similarly,
P (Xn+1 = j | Xn = i) = P (f (i, Zn+1 ) = j). We therefore have a Markov chain,
and it is homogeneous since the right-hand side of the last equality does not depend
on n. Explicitly:
pij = P (f (i, Z1 ) = j) . (1.3)
Not all homogeneous Markov chains receive a “natural” description of the type
featured in Theorem 1.1.2. However, it is always possible to find a “theoretical”
description of the kind. More exactly,
Theorem 1.1.3 For any transition matrix P on E, there exists a homogeneous
Markov chain with this transition matrix and with a representation such as in
Theorem 1.1.2.
Proof. Define

  Xn+1 := j  if  ∑_{k=0}^{j−1} pXnk ≤ Zn+1 < ∑_{k=0}^{j} pXnk,

where {Zn}n≥1 is iid, uniform on [0, 1]. By application of Theorem 1.1.2 and of formula (1.3), we check that this hmc has the announced transition matrix.
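This inverse-transform construction is easy to try numerically. The following sketch is ours, not the book's (the function name simulate_hmc and the example matrix are illustrative choices): at each step it draws Zn+1 uniformly on [0, 1) and selects the unique state j whose cumulative-probability interval contains it.

```python
import random

def simulate_hmc(P, x0, n_steps, rng=random.Random(0)):
    """Simulate an hmc with transition matrix P (list of rows), started at x0,
    via the construction of Theorem 1.1.3: X_{n+1} = j iff
    sum_{k<j} p_{X_n,k} <= Z_{n+1} < sum_{k<=j} p_{X_n,k}."""
    path = [x0]
    for _ in range(n_steps):
        z = rng.random()            # Z_{n+1}, uniform on [0, 1)
        row = P[path[-1]]
        cum = 0.0
        for j, p in enumerate(row):
            cum += p
            if z < cum:             # first j whose cumulative interval contains z
                break
        path.append(j)
    return path

# Illustrative two-state chain: empirical one-step frequencies should
# approximate the entry p_01 = 0.1.
P = [[0.9, 0.1], [0.5, 0.5]]
path = simulate_hmc(P, 0, 100_000)
moves_from_0 = [(a, b) for a, b in zip(path, path[1:]) if a == 0]
frac_01 = sum(1 for a, b in moves_from_0 if b == 1) / len(moves_from_0)
```

The same function works for any finite stochastic matrix; only the positivity and row-sum properties above are used.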
As we already mentioned, not all homogeneous Markov chains are naturally de-
scribed by the model of Theorem 1.1.2. A slight modification of this result con-
siderably enlarges its scope.
Theorem 1.1.4 Let things be as in Theorem 1.1.2 except for the joint distribu-
tion of X0 , Z1 , Z2 , . . .. Suppose instead that for all n ≥ 0, Zn+1 is condition-
ally independent of Zn , . . . , Z1 , Xn−1 , . . . , X0 given Xn , and that for all i, j ∈ E,
P(Zn+1 = k | Xn = i) is independent of n. Then {Xn}n≥0 is an hmc, with transition
probabilities
pij = P (f (i, Z1 ) = j | X0 = i).
Proof. The proof is quite similar to that of Theorem 1.1.2 (Exercise ??).
Example 1.1.2: The Ehrenfest urn, take 1. This idealized model of dif-
fusion through a porous membrane, proposed in 1907 by the Austrian physicists
Tatiana and Paul Ehrenfest to describe in terms of statistical mechanics the ex-
change of heat between two systems at different temperatures, considerably helped
understanding the phenomenon of thermodynamic irreversibility (see Example ??).
It features N particles that can be either in compartment A or in compartment B.

[Figure: when Xn = i, compartment A contains i particles and compartment B contains N − i.]

If Xn = i denotes the number of particles in compartment A at time n, then

  Xn+1 = Xn + Zn+1,

where Zn+1 ∈ {−1, +1} and P(Zn+1 = −1 | Xn = i) = i/N. The nonzero entries of the transition matrix are therefore

  pi,i+1 = (N − i)/N,  pi,i−1 = i/N.
[Figure: transition graph of the Ehrenfest chain: from state i, the chain moves to i + 1 with probability 1 − i/N and to i − 1 with probability i/N; from 0 it moves to 1, and from N to N − 1, with probability 1.]
First-step analysis
Some functionals of homogeneous Markov chains such as probabilities of absorption by a closed set (A is called closed if ∑j∈A pij = 1 for all i ∈ A) and average times before absorption can be evaluated by a technique called first-step analysis.
Example 1.1.3: The gambler’s ruin, take 1. Two players A and B play
“heads or tails”, where heads occur with probability p ∈ (0, 1), and the successive
outcomes form an iid sequence. Calling Xn the fortune in dollars of player A at
time n, then Xn+1 = Xn + Zn+1 , where Zn+1 = +1 (resp., −1) with probability
p (resp., q := 1 − p), and {Zn }n≥1 is iid. In other words, A bets $1 on heads at
each toss, and B bets $1 on tails. The respective initial fortunes of A and B are
a and b (positive integers). The game ends when a player is ruined, and therefore
the process {Xn }n≥1 is a random walk as described in Example 1.1.1, except that
it is restricted to E = {0, . . . , a, a + 1, . . . , a + b = c}. The duration of the game is
T , the first time n at which Xn = 0 or c, and the probability of winning for A is
u(a) = P (XT = c | X0 = a).
Instead of computing u(a) alone, first-step analysis computes
u(i) = P (XT = c | X0 = i)
for all states i, 0 ≤ i ≤ c, and for this, it first generates a recurrence equation
for u(i) by breaking down event “A wins” according to what can happen after the
first step (the first toss) and using the rule of exclusive and exhaustive causes. If
X0 = i, 1 ≤ i ≤ c − 1, then X1 = i + 1 (resp., X1 = i − 1) with probability p (resp.,
q), and the probability of winning for A with updated initial fortune i + 1 (resp.,
i − 1) is u(i + 1) (resp., u(i − 1)). Therefore, for all i, 1 ≤ i ≤ c − 1,

  u(i) = p u(i + 1) + q u(i − 1),

with the boundary conditions u(0) = 0, u(c) = 1.

[Figure: The gambler's ruin. A sample path starting from a, absorbed at c = a + b ("A wins") at time T = 11.]
The characteristic equation associated with this linear recurrence equation is pr² − r + q = 0. It has two distinct roots, r1 = 1 and r2 = q/p, if p ≠ 1/2, and a double root, r1 = 1, if p = 1/2. Therefore, the general solution is u(i) = λr1^i + µr2^i = λ + µ(q/p)^i when p ≠ q, and u(i) = λr1^i + µir1^i = λ + µi when p = q = 1/2. Taking into account the boundary conditions, one can determine the values of λ and µ. The result is, for p ≠ q,

  u(i) = (1 − (q/p)^i) / (1 − (q/p)^c),

and for p = q = 1/2,

  u(i) = i/c.
In the case p = q = 1/2, the probability v(i) that B wins when the initial fortune of B is c − i is obtained by replacing i by c − i in the expression for u(i): v(i) = (c − i)/c = 1 − i/c. One checks that u(i) + v(i) = 1, which means in particular that the probability that the game lasts forever is null. The reader is invited to check that the same is true in the case p ≠ q.
First-step analysis can also be used to compute average times before absorption
(Exercise 1.5.5).
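The closed-form expressions for u(i) can be checked against a direct simulation of the game. The sketch below is ours (the function names and the parameters i = 3, c = 10, p = 0.6 are illustrative choices); it compares the formula with a Monte Carlo estimate.

```python
from random import Random

def ruin_probability_formula(i, c, p):
    """u(i) = P(A reaches c before 0 | X_0 = i), from the closed form above."""
    q = 1 - p
    if abs(p - 0.5) < 1e-12:
        return i / c
    r = q / p
    return (1 - r**i) / (1 - r**c)

def ruin_probability_mc(i, c, p, n_games=20_000, rng=Random(1)):
    """Monte Carlo estimate of u(i): play the game to absorption n_games times."""
    wins = 0
    for _ in range(n_games):
        x = i
        while 0 < x < c:                       # game still running
            x += 1 if rng.random() < p else -1
        wins += (x == c)                       # A wins iff absorbed at c
    return wins / n_games

u_exact = ruin_probability_formula(3, 10, 0.6)
u_mc = ruin_probability_mc(3, 10, 0.6)
```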
In particular, a state i is always accessible from itself, since pii(0) = 1 (P^0 = I, the identity).
For M ≥ 1, pij(M) = ∑_{i1,...,iM−1} pii1 · · · piM−1j, and therefore pij(M) > 0 if and only if there exists at least one path i, i1, . . . , iM−1, j from i to j such that pii1 pi1i2 · · · piM−1j > 0. The communication relation i ↔ j satisfies

  i ↔ i (reflexivity),
  i ↔ j ⇒ j ↔ i (symmetry),
  i ↔ j, j ↔ k ⇒ i ↔ k (transitivity).
Definition 1.2.3 If there exists only one communication class, then the chain, its
transition matrix, and its transition graph are said to be irreducible.
Period
Consider the random walk on Z (Example 1.1.1). Since 0 < p < 1, it is irreducible.
Observe that E = C0 + C1, where C0 and C1, the sets of even and odd integers respectively, have the following property. If you start from i ∈ C0 (resp.,
C1 ), then in one step you can go only to a state j ∈ C1 (resp., C0 ). The chain
{Xn} passes alternately from one cyclic class to the other. In this sense, the chain has a periodic behaviour, corresponding to the period 2. More generally, for any irreducible Markov chain, one can find a unique partition of E into d classes C0, C1, . . . , Cd−1 such that for all k and all i ∈ Ck,

  ∑_{j∈Ck+1} pij = 1,

where by convention Cd = C0.
Theorem 1.2.1 If states i and j communicate they have the same period.
Proof. As i and j communicate, there exist integers N and M such that pij(M) > 0 and pji(N) > 0. For any k ≥ 1,

  pii(M + N + k) ≥ pij(M) pjj(k) pji(N),

and also pii(M + N) ≥ pij(M) pji(N) > 0. Therefore the period di of state i divides M + N, and divides M + N + k for every k such that pjj(k) > 0; hence di divides every such k, and therefore di divides their gcd dj. By symmetry, dj divides di, so that di = dj.
Proof. It suffices to prove the theorem for i = j. Indeed, there exists m such
that pij (m) > 0, because j is accessible from i, the chain being irreducible, and
therefore, if for some n0 ≥ 0 we have pjj (nd) > 0 for all n ≥ n0 , then pij (m+nd) ≥
pij (m)pjj (nd) > 0 for all n ≥ n0 .
The rest of the proof is an immediate consequence of a classical result of number
theory (Theorem A.1.1). Indeed, the gcd of the set A = {k ≥ 1; pjj (k) > 0} is
d, and A is closed under addition. The set A therefore contains all but a finite
number of the positive multiples of d. In other words, there exists n0 such that
n > n0 implies pjj (nd) > 0.
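The characterization of the period as the gcd of A = {k ≥ 1 ; pjj(k) > 0} can be computed mechanically from the supports of the powers of P. A small sketch, ours rather than the book's (it uses boolean matrix products, so only the positivity pattern of P matters):

```python
from math import gcd

def return_lengths(P, j, max_k):
    """Return {k in 1..max_k : p_jj(k) > 0}, via boolean matrix powers."""
    n = len(P)
    reach = [[p > 0 for p in row] for row in P]   # support of P^1
    ks, cur = set(), reach
    for k in range(1, max_k + 1):
        if cur[j][j]:
            ks.add(k)
        # boolean matrix product: support of P^(k+1)
        cur = [[any(cur[i][m] and reach[m][l] for m in range(n))
                for l in range(n)] for i in range(n)]
    return ks

def period(P, j, max_k=50):
    """gcd of the set A = {k : p_jj(k) > 0} (truncated at max_k)."""
    d = 0
    for k in return_lengths(P, j, max_k):
        d = gcd(d, k)
    return d

# Deterministic 3-cycle 0 -> 1 -> 2 -> 0: period 3 at every state.
C3 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Truncation at max_k is harmless here precisely because, as the theorem states, A eventually contains all sufficiently large multiples of the period.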
[Figure: the cyclic classes C0, C1, . . . , Cd−1, visited in cyclic order.]
A probability distribution π on E satisfying

  π^T = π^T P (1.4)

is called a stationary distribution of the transition matrix P, or of the corresponding hmc.
The global balance equation (1.4) says that for all states i,
  π(i) = ∑_{j∈E} π(j) pji.
Example 1.3.1: Two-state Markov chain. Take E = {1, 2} and define the transition matrix

      | 1 − α    α   |
  P = |   β    1 − β |,

where α, β ∈ (0, 1). The global balance equations are

  π(1) = π(1)(1 − α) + π(2)β,  π(2) = π(1)α + π(2)(1 − β).

These two equations are dependent and reduce to the single equation π(1)α = π(2)β, to which must be added the constraint π(1) + π(2) = 1 expressing that π is a probability vector. We obtain

  π(1) = β/(α + β),  π(2) = α/(α + β).
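A quick numerical sanity check of these formulas (the helper names below are ours): the vector (β/(α + β), α/(α + β)) should be left unchanged by one application of π^T P.

```python
def stationary_two_state(alpha, beta):
    """Solution of the global balance equations for the two-state chain:
    pi(1)*alpha = pi(2)*beta together with pi(1) + pi(2) = 1."""
    return beta / (alpha + beta), alpha / (alpha + beta)

def step_distribution(pi, P):
    """One application of pi^T P."""
    n = len(P)
    return tuple(sum(pi[i] * P[i][j] for i in range(n)) for j in range(n))

alpha, beta = 0.3, 0.1
P = [[1 - alpha, alpha], [beta, 1 - beta]]
pi = stationary_two_state(alpha, beta)     # (0.25, 0.75) for these values
pi_next = step_distribution(pi, P)         # should equal pi
```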
Example 1.3.2: The Ehrenfest urn, take 2. The global balance equations are, for i ∈ [1, N − 1],

  π(i) = π(i − 1)(1 − (i − 1)/N) + π(i + 1)(i + 1)/N.
Reversible chains
The notions of time-reversal and time-reversibility are very productive, as we shall
see in several occasions in the sequel.
Let {Xn }n≥0 be an hmc with transition matrix P and admitting a stationary
distribution π > 0 (meaning π(i) > 0 for all states i). Define the matrix Q,
indexed by E, by
π(i)qij = π(j)pji . (1.5)
This is a stochastic matrix since

  ∑_{j∈E} qij = ∑_{j∈E} (π(j)/π(i)) pji = (1/π(i)) ∑_{j∈E} π(j) pji = π(i)/π(i) = 1,
where the third equality uses the global balance equations. Its interpretation is the
following: Suppose that the initial distribution of the chain is π, in which case for
all n ≥ 0, all i ∈ E, P (Xn = i) = π(i). Then, from Bayes’s retrodiction formula,
  P(Xn = j | Xn+1 = i) = P(Xn+1 = i | Xn = j) P(Xn = j) / P(Xn+1 = i),

that is, in view of (1.5), P(Xn = j | Xn+1 = i) = qij.
We see that Q is the transition matrix of the initial chain when time is reversed.
The following is a very simple observation that will be promoted to the rank of a
theorem in view of its usefulness and also for the sake of easy reference.
Theorem 1.3.2 Let P be a stochastic matrix indexed by a countable set E, and
let π be a probability distribution on E. Define the matrix Q indexed by E by (1.5).
If Q is a stochastic matrix, then π is a stationary distribution of P.
Definition 1.3.2 A stationary Markov chain with initial distribution π (a stationary distribution) is called reversible if for all i, j ∈ E, we have the so-called
detailed balance equations
π(i)pij = π(j)pji . (1.6)
We then say: the pair (P, π) is reversible.
In this case, qij = pij , and therefore the chain and the time-reversed chain are
statistically the same, since the distribution of a homogeneous Markov chain is
entirely determined by its initial distribution and its transition matrix.
The following is an immediate corollary of Theorem 1.3.2.
Theorem 1.3.3 Let P be a transition matrix on the countable state space E, and
let π be some probability distribution on E. If for all i, j ∈ E, the detailed balance
equations (1.6) are satisfied, then π is a stationary distribution of P.
Example 1.3.3: The Ehrenfest urn, take 3. The verification of the detailed
balance equations π(i)pi,i+1 = π(i + 1)pi+1,i is immediate.
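Indeed, with the classical stationary distribution of the Ehrenfest chain, the binomial distribution π(i) = C(N, i)/2^N (which solves the global balance equations of take 2), the detailed balance equations can be verified numerically. The sketch below is ours; it measures the largest violation over all edges.

```python
from math import comb

def ehrenfest_detailed_balance_error(N):
    """Largest |pi(i) p_{i,i+1} - pi(i+1) p_{i+1,i}| for the Ehrenfest chain,
    with the binomial distribution pi(i) = C(N, i) / 2^N."""
    pi = [comb(N, i) / 2**N for i in range(N + 1)]
    up = [(N - i) / N for i in range(N + 1)]      # p_{i,i+1}
    down = [i / N for i in range(N + 1)]          # p_{i,i-1}
    return max(abs(pi[i] * up[i] - pi[i + 1] * down[i + 1]) for i in range(N))

err = ehrenfest_detailed_balance_error(20)        # zero up to rounding
```

The identity behind the verification is C(N, i)(N − i) = C(N, i + 1)(i + 1), which holds exactly.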
Stopping times
Let {Xn}n≥0 be a stochastic process with values in the denumerable set E. For an event A, the notation A ∈ X0n means that there exists a function ϕ : E^{n+1} → {0, 1} such that

  1A(ω) = ϕ(X0(ω), . . . , Xn(ω)).

In other terms, this event is expressible in terms of X0(ω), . . . , Xn(ω). Let now τ be a random variable with values in ℕ. It is called an X0n-stopping time if for all m ∈ ℕ, {τ = m} ∈ X0m. In other words, it is a non-anticipative random time with respect to {Xn}n≥0, since in order to check if τ = m, one needs only observe the process up to time m and not beyond. It is immediate to check that if τ is an X0n-stopping time, then so is τ + n for all n ≥ 1.
Example 1.4.1: Return time. Let {Xn }n≥0 be an hmc with state space E.
Define for i ∈ E the return time to i by
  Ti := inf{n ≥ 1 ; Xn = i},

using the convention inf ∅ = ∞ for the empty set of ℕ. This is an X0n-stopping time since for all m ∈ ℕ,

  {Ti = m} = {X1 ≠ i, . . . , Xm−1 ≠ i, Xm = i} ∈ X0m.
Example 1.4.2: Successive return times. This continues the previous ex-
ample. Let us fix a state, conventionally labeled 0, and let T0 be the return time
to 0. We define the successive return times to 0, τk , k ≥ 1 by τ1 = T0 and for
k ≥ 1,
τk+1 := inf{n ≥ τk + 1 ; Xn = 0}
with the above convention that inf ∅ = ∞. In particular, if τk = ∞ for some k,
then τk+ℓ = ∞ for all ℓ ≥ 1. The identity

  {τk = m} ≡ { ∑_{n=1}^{m−1} 1{Xn=0} = k − 1, Xm = 0 }

shows that each τk is an X0n-stopping time.
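These successive return times are easy to extract from a simulated trajectory. A small sketch (ours; the two-state chain is an arbitrary illustrative choice) checks the counting identity above on one sample path:

```python
import random

def successive_return_times(path, state=0):
    """tau_k = time of the k-th visit to `state` at a time n >= 1."""
    return [n for n, x in enumerate(path) if x == state and n >= 1]

rng = random.Random(42)
# Simple recurrent chain on {0, 1}: stay put or flip, each with probability 1/2.
path = [0]
for _ in range(10_000):
    path.append(path[-1] if rng.random() < 0.5 else 1 - path[-1])

taus = successive_return_times(path, 0)
# The identity: at time m = tau_k there are exactly k - 1 visits to 0
# among the times 1, ..., m - 1, and X_m = 0.
k = 5
m = taus[k - 1]                                   # tau_k (k is 1-indexed in the text)
earlier_visits = sum(1 for n in range(1, m) if path[n] == 0)
```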
Let {Xn }n≥0 be a stochastic process with values in the countable set E and let
τ be a random time taking its values in ℕ̄ := ℕ ∪ {+∞}. In order to define Xτ
when τ = ∞, one must decide how to define X∞ . This is done by taking some
arbitrary element ∆ not in E, and setting
X∞ = ∆.
Proof. (α) We have to show that for all times k ≥ 1, n ≥ 0, and all states
i0 , . . . , in , i, j1 , . . . , jk ,
P (Xτ +1 = j1 , . . . , Xτ +k = jk | Xτ = i, Xτ ∧0 = i0 , . . . , Xτ ∧n = in )
= P (Xτ +1 = j1 , . . . , Xτ +k = jk | Xτ = i).
The general case is obtained by the same arguments. The left-hand side of (⋆) equals

  P(Xτ+k = j, Xτ = i, Xτ∧n = in) / P(Xτ = i, Xτ∧n = in).

The numerator of the above expression can be developed as

  ∑_{r∈ℕ} P(τ = r, Xr+k = j, Xr = i, Xr∧n = in). (⋆⋆)
Therefore, the left-hand side of (⋆) is just pij (k). Similar computations show that
the right-hand side of (⋆) is also pij (k), so that (α) is proven.
(β) We must show that for all states i, j, k, in−1 , . . . , i1 ,
But the first equality follows from the fact proven in (α) that for the stopping time
τ ′ = τ + n, the processes before and after τ ′ are independent given Xτ ′ = j. The
second equality is obtained by the same calculations as in the proof of (α).
The successive pieces of trajectory between the visits to state 0 are independent and identically distributed. Such pieces are called the regenerative cycles of the chain. Each random time τk is a regeneration
time, in the sense that {Xτk +n }n≥0 is independent of the past X0 , . . . , Xτk −1 and
has the same distribution as {Xn }n≥0 . In particular, the sequence {τk − τk−1 }k≥1
is iid.
1.5 Exercises
Exercise 1.5.1. A counterexample.
The Markov property does not imply that the past and the future are independent
given any information concerning the present. Find a simple example of an hmc
{Xn }n≥0 with state space E = {1, 2, 3, 4, 5, 6} such that
Exercise 1.5.3.
Let {Xn}n≥0 be an hmc with state space E and transition matrix P. Show that for
all n ≥ 1, all k ≥ 2, Xn is conditionally independent of X0 , . . . , Xn−2 , Xn+2 , . . . , Xn+k
given Xn−1 , Xn+1 and compute the conditional distribution of Xn given Xn−1 , Xn+1 .
Chapter 2

Recurrence
and
Pi (Ti < ∞) < 1 ⇐⇒ Pi (Ni = ∞) = 0 ⇐⇒ Ei [Ni ] < ∞. (2.1)
In particular, the event {Ni = ∞} has Pi -probability 0 or 1.
The potential matrix G associated with the transition matrix P is defined by

  G = ∑_{n≥0} P^n.

Its general term

  gij = ∑_{n≥0} pij(n)

is the average number of visits to state j, given that the chain starts from state i.
Recall that Ti denotes the return time to state i. State i is called recurrent if

  Pi(Ti < ∞) = 1,

and positive recurrent if, in addition,

  Ei[Ti] < ∞.
Example 2.1.1: 1-D random walk. The state space of this Markov chain is E := ℤ and the non-null terms of its transition matrix are pi,i+1 = p, pi,i−1 = 1 − p, where p ∈ (0, 1). Since this chain is irreducible, it suffices to elucidate the nature (recurrent or transient) of any one of its states, say, 0. We have p00(2n + 1) = 0 and

  p00(2n) = ((2n)! / (n! n!)) p^n (1 − p)^n.
By Stirling's equivalence formula n! ∼ (n/e)^n √(2πn), the above quantity is equivalent to

  [4p(1 − p)]^n / √(πn), (⋆)

and the nature of the series ∑_{n=0}^∞ p00(n) (convergent or divergent) is that of the series with general term (⋆). If p ≠ 1/2, in which case 4p(1 − p) < 1, the latter series converges, and if p = 1/2, in which case 4p(1 − p) = 1, it diverges. In summary, the states of the 1-D random walk are transient if p ≠ 1/2, recurrent if p = 1/2.
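The dichotomy can be observed numerically from the partial sums of p00(2n). The sketch below is ours; it uses the term-ratio recursion p00(2n) = p00(2n − 2) · (2n)(2n − 1)/n² · p(1 − p) to avoid huge factorials. For p ≠ 1/2 the full series has the exact value ∑ C(2n, n) x^n = 1/√(1 − 4x) with x = p(1 − p) < 1/4, while for p = 1/2 the partial sums keep growing like a constant times √n.

```python
def p00_partial_sum(n_max, p):
    """Partial sum of p_00(2n) = C(2n, n) (p(1-p))^n over n = 0..n_max,
    computed term by term with the ratio (2n)(2n-1)/n^2 * p(1-p)."""
    term, total = 1.0, 1.0
    for n in range(1, n_max + 1):
        term *= (2 * n) * (2 * n - 1) / n**2 * p * (1 - p)
        total += term
    return total

sym = p00_partial_sum(5000, 0.5)    # keeps growing: recurrence (G has infinite entries)
asym = p00_partial_sum(5000, 0.6)   # converges to 1/sqrt(1 - 4*0.24) = 5: transience
```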
Example 2.1.2: 3-D random walk. The state space of this hmc is E =
Z3 . Denoting by e1 , e2 , and e3 the canonical basis vectors of R3 (respectively
(1, 0, 0), (0, 1, 0), and (0, 0, 1)), the nonnull terms of the transition matrix of the
3-D symmetric random walk are given by
1
px,x±ei = .
6
We elucidate the nature of state, say, 0 = (0, 0, 0). Clearly, p00(2n + 1) = 0 for all n ≥ 0, and (exercise)

  p00(2n) = ∑_{0≤i+j≤n} ((2n)! / (i! j! (n − i − j)!)²) (1/6)^{2n}.
and this shows that for large n, i0 ∼ n/3 and j0 ∼ n/3. Therefore, for large n,

  p00(2n) ∼ ((2n)! / (n! n! 2^{2n})) × (n! / (((n/3)!)³ 3^n)).

By Stirling's equivalence formula, the right-hand side of the latter equivalence is in turn equivalent to

  3√3 / (2(πn)^{3/2}),
the general term of a convergent series. State 0 is therefore transient.
One might wonder at this point about the symmetric random walk on ℤ², which moves at each step northward, southward, eastward and westward equiprobably. Exercise ?? asks you to show that it is null recurrent. Exercise ?? asks you to prove that the symmetric random walks on ℤ^p, p ≥ 4, are transient.
Proof. By definition, i and j communicate if and only if there exist integers M and
N such that pij (M ) > 0 and pji (N ) > 0. Going from i to j in M steps, then from
j to j in n steps, then from j to i in N steps, is just one way of going from i back
to i in M + n + N steps. Therefore, pii (M + n + N ) ≥ pij (M ) × pjj (n) × pji (N ).
Similarly, pjj(N + n + M) ≥ pji(N) × pii(n) × pij(M). Therefore, with α := pij(M) pji(N) (a strictly positive quantity), we have pii(M + N + n) ≥ α pjj(n) and pjj(M + N + n) ≥ α pii(n). This implies that the series ∑_{n=0}^∞ pii(n) and ∑_{n=0}^∞ pjj(n) either both converge or both diverge. The potential matrix criterion concludes the proof.
xT = xT P . (2.2)
x0 = 1.
Also,

  ∑_{i∈E} ∑_{n≥1} 1{Xn=i} 1{n≤T0} = ∑_{n≥1} ∑_{i∈E} 1{Xn=i} 1{n≤T0} = ∑_{n≥1} 1{n≤T0} = T0,

and therefore

  ∑_{i∈E} xi = E0[T0]. (2.4)
Here 0p0i(n) := P0(Xn = i, n ≤ T0). This is the probability, starting from state 0, of visiting i at time n before returning to 0. From the definition of x,

  xi = ∑_{n≥1} 0p0i(n). (†)
If xi were null for some i ∈ E, i ≠ 0, the latter equality would imply that 0p0i(n) = 0 for all n ≥ 1, which means that 0 and i do not communicate, in contradiction to
the irreducibility assumption.
It remains to show that xi < ∞ for all i ∈ E. As before, we find that
  1 = x0 = ∑_{j∈E} xj pj0(n)
for all n ≥ 1, and therefore if xi = ∞ for some i, necessarily pi0 (n) = 0 for all
n ≥ 1, and this also contradicts irreducibility.
Proof. In the proof of Theorem 2.2.1, we showed that for an invariant measure y
of an irreducible chain, yi > 0 for all i ∈ E, and therefore, one can define, for all
i, j ∈ E, the matrix Q by
  qji = (yi/yj) pij. (⋆)

It is a transition matrix, since ∑_{i∈E} qji = (1/yj) ∑_{i∈E} yi pij = yj/yj = 1. The general term of Q^n is

  qji(n) = (yi/yj) pij(n). (⋆⋆)
Indeed, supposing (⋆⋆) true for n,

  qji(n + 1) = ∑_{k∈E} qjk qki(n) = ∑_{k∈E} (yk/yj) pkj (yi/yk) pik(n)
             = (yi/yj) ∑_{k∈E} pik(n) pkj = (yi/yj) pij(n + 1).
We therefore see that the sequences {y0 0p0i(n)} and {yi gi0(n)} satisfy the same recurrence equation. Their first terms (n = 1), respectively y0 0p0i(1) = y0 p0i and yi gi0(1) = yi qi0, are equal in view of (⋆). Therefore, for all n ≥ 1,

  0p0i(n) = (yi/y0) gi0(n).

Summing up with respect to n ≥ 1 and using ∑_{n≥1} gi0(n) = 1 (Q is recurrent), we obtain that xi = yi/y0.
Equality (2.4) and the definition of positive recurrence give the following.
An hmc may well be irreducible and possess an invariant measure, and yet not be
recurrent. The simplest example is the 1-D non-symmetric random walk, which
was shown to be transient and yet admits xi ≡ 1 for invariant measure. It turns
out, however, that the existence of a stationary probability distribution is neces-
sary and sufficient for an irreducible chain (not a priori assumed recurrent) to be
recurrent positive.
Theorem 2.2.4 An irreducible hmc is positive recurrent if and only if there exists
a stationary distribution. Moreover, the stationary distribution π is, when it exists,
unique, and π > 0.
Proof. The direct part follows from Theorems 2.2.1 and 2.2.3. For the converse
part, assume the existence of a stationary distribution π. Iterating π^T = π^T P, we obtain π^T = π^T P^n, that is, for all i ∈ E, π(i) = ∑_{j∈E} π(j) pji(n). If the chain
were transient, then, for all states i, j,
In particular, limn↑∞ pji(n) = 0. Since pji(n) is bounded uniformly in j and n by 1,
by dominated convergence (Theorem A.2.1):
  π(i) = lim_{n↑∞} ∑_{j∈E} π(j) pji(n) = ∑_{j∈E} π(j) lim_{n↑∞} pji(n) = 0.
This contradicts the assumption that π is a stationary distribution (∑_{i∈E} π(i) = 1). The chain must therefore be recurrent, and by Theorem 2.2.3, it is positive
recurrent.
The stationary distribution π of an irreducible positive recurrent chain is unique
(use Theorem 2.2.2 and the fact that there is no choice for a multiplicative factor
but 1). Also recall that π(i) > 0 for all i ∈ E (see Theorem 2.2.1).
Theorem 2.2.5 Let π be the unique stationary distribution of an irreducible positive recurrent hmc, and let Ti be the return time to state i. Then, for all i ∈ E,

  π(i) Ei[Ti] = 1. (2.5)
Proof. This equality is a direct consequence of expression (2.3) for the invariant measure. Indeed, π is obtained by normalization of x: for all i ∈ E,

  π(i) = xi / ∑_{j∈E} xj,

and in particular, since x0 = 1 and, by (2.4), ∑_{j∈E} xj = E0[T0],

  π(0) = 1 / E0[T0].

Since state 0 does not play a special role in the analysis, (2.5) is true for all i ∈ E.
and in particular, the limit of the left hand side is 1. If the chain were transient,
then, as we saw in the proof of Theorem 2.2.4, for all i, j ∈ E,
for some finite set F and some ε > 0. Then the corresponding hmc is positive recurrent.
Proof. Since inf_i h(i) > −∞, one may assume without loss of generality that h ≥ 0, by adding a constant if necessary. Call τ the return time to F, and define Yn = h(Xn) 1{n<τ}. Equality (2.7) is just E[h(Xn+1) | Xn = i] ≤ h(i) − ε for all i ∉ F. For i ∉ F,

  Ei[Yn+1 | X0n] = Ei[Yn+1 1{n<τ} | X0n] + Ei[Yn+1 1{n≥τ} | X0n]
                 = Ei[Yn+1 1{n<τ} | X0n] ≤ Ei[h(Xn+1) 1{n<τ} | X0n]
                 = 1{n<τ} Ei[h(Xn+1) | X0n] = 1{n<τ} Ei[h(Xn+1) | Xn]
                 ≤ 1{n<τ} h(Xn) − ε 1{n<τ},

where the third equality comes from the fact that 1{n<τ} is a function of X0n, the fourth equality is the Markov property, and the last inequality is true because Pi-a.s., Xn ∉ F on {n < τ}. Therefore, Pi-a.s., Ei[Yn+1 | X0n] ≤ Yn − ε 1{n<τ}, and taking expectations,

  Ei[Yn+1] ≤ Ei[Yn] − ε Pi(τ > n).
Iterating the above inequality, and observing that Yn is non-negative, we obtain

  0 ≤ Ei[Yn+1] ≤ Ei[Y0] − ε ∑_{k=0}^{n} Pi(τ > k).

But Y0 = h(i), Pi-a.s., and ∑_{k=0}^{∞} Pi(τ > k) = Ei[τ]. Therefore, for all i ∉ F,

  Ei[τ] ≤ ε^{−1} h(i).
For j ∈ F, first-step analysis yields

  Ej[τ] = 1 + ∑_{i∉F} pji Ei[τ].

Thus Ej[τ] ≤ 1 + ε^{−1} ∑_{i∉F} pji h(i), and this quantity is finite in view of assumption (2.6).
Therefore, the return time to F starting anywhere in F has finite expectation. Since F
is a finite set, this implies positive recurrence in view of the following lemma.
Lemma 2.3.1 Let {Xn }n≥0 be an irreducible hmc, let F be a finite subset of the
state space E, and let τ (F ) be the return time to F . If Ej [τ (F )] < ∞ for all j ∈ F ,
the chain is positive recurrent.
and therefore

  Ei[Ti] = ∑_{k=0}^{∞} Ei[Sk 1{k<T̃i}].

Now,

  Ei[Sk 1{k<T̃i}] = ∑_{ℓ∈F} Ei[Sk 1{k<T̃i} 1{Xτk=ℓ}],
and by the strong Markov property applied to {Xn }n≥0 and the stopping time τk ,
and the fact that the event {k < T̃i } belongs to the past of {Xn }n≥0 at time τk ,
Ei [Sk 1{k<T̃i } 1{Xτk =ℓ} ] = Ei [Sk | k < T̃i , Xτk = ℓ]Pi (k < T̃i , Xτk = ℓ)
= Ei [Sk | Xτk = ℓ]Pi (k < T̃i , Xτk = ℓ) .
Observing that Ei [Sk | Xτk = ℓ] = Eℓ [τ (F )], we see that the latter expression is
bounded by (maxℓ∈F Eℓ[τ(F)]) Pi(k < T̃i, Xτk = ℓ), and therefore

  Ei[Ti] ≤ (maxℓ∈F Eℓ[τ(F)]) ∑_{k=0}^{∞} Pi(T̃i > k) = (maxℓ∈F Eℓ[τ(F)]) Ei[T̃i] < ∞.
Corollary 2.3.1 Let {Xn}n≥0 be an irreducible hmc on E = ℕ such that for all n ≥ 0 and all i ∈ E,

  E[|Xn+1 − Xn| | Xn = i] < ∞

and

  lim sup_{i↑∞} E[Xn+1 − Xn | Xn = i] < 0. (2.8)

Then the chain is positive recurrent.
Proof. Let −2ε be the left-hand side of (2.8). In particular, ε > 0. By (2.8), for i sufficiently large, say i > i0, E[Xn+1 − Xn | Xn = i] < −ε. We are therefore in the conditions of Foster's theorem with h(i) = i and F = {i ; i ≤ i0}.
Now, with j ∉ F,

  ∑_{n=0}^{∞} Ej[|h(Xn+1) − h(Xn)| 1{τ>n}]
  = ∑_{n=0}^{∞} Ej[ Ej[|h(Xn+1) − h(Xn)| | X0n] 1{τ>n} ]
  = ∑_{n=0}^{∞} Ej[ Ej[|h(Xn+1) − h(Xn)| | Xn] 1{τ>n} ]
  ≤ K ∑_{n=0}^{∞} Pj(τ > n)
for some finite positive constant K by (2.10). Therefore, if the chain is positive recurrent, the latter bound is KEj[τ] < ∞. Therefore

  Ej[h(Xτ)] = Ej[h(Xτ) 1{τ<∞}]
            = h(j) + Ej[ ∑_{n=0}^{∞} (h(Xn+1) − h(Xn)) 1{τ>n} ] > h(j),
by (2.11). In view of assumption (2.9), we have h(j) > maxi∈F h(i) ≥ Ej [h(Xτ )],
hence a contradiction. The chain therefore cannot be positive recurrent.
2.4 Examples
Birth-and-death Markov chain
We first define the birth-and-death process with a bounded population. The state
space of such a chain is E = {0, 1, . . . , N} and its transition matrix is

      | r0  p0                            |
      | q1  r1  p1                        |
      |     q2  r2  p2                    |
  P = |          . . .                    |
      |        qi  ri  pi                 |
      |             . . .                 |
      |        qN−1  rN−1  pN−1           |
      |               qN    rN            |

(all entries not shown being 0),
where pi > 0 for all i ∈ E\{N }, qi > 0 for all i ∈ E\{0}, ri ≥ 0 for all i ∈ E, and
pi + qi + ri = 1 for all i ∈ E. The positivity conditions placed on the pi ’s and qi ’s
guarantee that the chain is irreducible. Since the state space is finite, it is positive
recurrent (Theorem 2.2.6), and it has a unique stationary distribution. Motivated
by the Ehrenfest hmc which is reversible in the stationary state, we make the
educated guess that the birth and death process considered has the same property.
This will be the case if and only if there exists a probability distribution π on E
satisfying the detailed balance equations, that is, such that for all 1 ≤ i ≤ N ,
π(i − 1) pi−1 = π(i) qi. Letting w0 = 1 and, for all 1 ≤ i ≤ N,

  wi = ∏_{k=1}^{i} pk−1/qk,

we find that

  π(i) = wi / ∑_{j=0}^{N} wj (2.12)
indeed satisfies the detailed balance equations and is therefore the (unique) sta-
tionary distribution of the chain.
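Formula (2.12) is straightforward to evaluate. A sketch (ours, with illustrative names): the Ehrenfest chain, seen as a birth-and-death chain with pi = (N − i)/N and qi = i/N, should give back the binomial distribution.

```python
def birth_death_stationary(p, q):
    """Stationary distribution (2.12) of a birth-and-death chain on {0,...,N}:
    w_0 = 1, w_i = prod_{k=1}^{i} p_{k-1}/q_k, pi(i) = w_i / sum_j w_j.
    Here p[i] = p_i for i = 0..N-1 and q[i] = q_{i+1} for i = 0..N-1."""
    w = [1.0]
    for pk, qk in zip(p, q):
        w.append(w[-1] * pk / qk)
    s = sum(w)
    return [wi / s for wi in w]

# Ehrenfest with N = 4: the result should be Binomial(4, 1/2) = (1,4,6,4,1)/16.
N = 4
p = [(N - i) / N for i in range(N)]      # p_0, ..., p_{N-1}
q = [(i + 1) / N for i in range(N)]      # q_1, ..., q_N
pi = birth_death_stationary(p, q)
```

Note that, as remarked above, the ri's never enter the computation.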
We now consider the unbounded birth-and-death process. This chain has the state
space E = N and its transition matrix is as in the previous example (only, it is
unbounded on the right). In particular, we assume that the pi ’s and qi ’s are positive
in order to guarantee irreducibility. The same reversibility argument as above
applies with a little difference. In fact we can show that the wi ’s defined above
satisfy the detailed balance equations and therefore the global balance equations.
Therefore the vector {wi}i∈E is the unique, up to a multiplicative factor, invariant
measure of the chain. It can be normalized to a probability distribution if and
only if

  ∑_{j=0}^{∞} wj < ∞.
Therefore, in this case and only in this case there exists a (unique) stationary
distribution, also given by (2.12).
Note that the stationary distribution, when it exists, does not depend on the ri ’s.
The recurrence properties of the above unbounded birth-and-death process are
therefore the same as those of the chain below, which is however not aperiodic.
For aperiodicity, it suffices to suppose at least one of the ri ’s to be positive.
[Figure: the birth-and-death chain on ℕ with p0 = 1: from state i ≥ 1, up to i + 1 with probability pi, down to i − 1 with probability qi.]
For this, consider for any given k ∈ {0, 1, . . . , N } the truncated chain, which moves
on the state space {0, 1, . . . , k} as the original chain, except in state k where it
moves one step down with probability qk and stays still with probability pk + rk .
Write Ẽ for expectations of the modified chain. The unique stationary distribution of this chain is given by

  π̃ℓ = wℓ / ∑_{j=0}^{k} wj
for all 0 ≤ ℓ ≤ k. First-step analysis shows that Ẽk[Tk] = (rk + pk) × 1 + qk(1 + Ẽk−1[Tk]), that is,

  Ẽk[Tk] = 1 + qk Ẽk−1[Tk].
Also

  Ẽk[Tk] = 1/π̃k = (1/wk) ∑_{j=0}^{k} wj,

and therefore, in the special case where pi ≡ p and qi ≡ q := 1 − p,

  Ẽk−1[Tk] = (1/p) ∑_{j=0}^{k−1} (q/p)^j = (1/(p − q)) (1 − (q/p)^k).
The chain satisfies the recurrence Xn+1 = (Xn − 1)+ + Zn+1, where a+ = max(a, 0). The sequence {Zn}n≥1 is assumed to be an iid sequence,
independent of the initial state X0 , with common probability distribution
P (Z1 = k) = ak , k ≥ 0
2.4. EXAMPLES 39
and therefore z · z^{Xn+1} − z^{Xn} z^{Zn+1} = (z − 1) 1{Xn=0} z^{Zn+1}. From the independence of Xn and Zn+1, E[z^{Xn} z^{Zn+1}] = E[z^{Xn}] gZ(z), and E[1{Xn=0} z^{Zn+1}] = π(0) gZ(z), where π(0) = P(Xn = 0). Therefore, z E[z^{Xn+1}] − gZ(z) E[z^{Xn}] = (z − 1) π(0) gZ(z).
= π(0) (gZ (z) + (z − 1)gZ′ (z)), and let z = 1, to obtain, taking into account the
equalities gX (1) = gZ (1) = 1 and gZ′ (1) = E[Z],
But the stationary distribution of an irreducible hmc is positive, hence the neces-
sary condition of positive recurrence:
E[Z1 ] < 1.
We now show this condition is also sufficient for positive recurrence. This follows
immediately from Pakes’s lemma, since for i ≥ 1, E[Xn+1 − Xn | Xn = i] = E[Z] −
1 < 0.
From (2.15) and (2.16), we have the generating function of the stationary distri-
bution:
  ∑_{i=0}^{∞} π(i) z^i = (1 − E[Z]) (z − 1) gZ(z) / (z − gZ(z)). (2.17)
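Equation (2.17), together with the relation π(0) = 1 − E[Z] that it implies at z = 0, can be checked against a truncated numerical model of the chain Xn+1 = (Xn − 1)+ + Zn+1. The sketch below is ours; the distribution of Z and the truncation level are arbitrary illustrative choices.

```python
def queue_stationary(a, n_states=200, n_iter=2000):
    """Power iteration for the chain X_{n+1} = (X_n - 1)^+ + Z_{n+1}, where
    P(Z = k) = a[k]. The state space is truncated to {0, ..., n_states - 1}."""
    pi = [1.0] + [0.0] * (n_states - 1)          # start at state 0
    for _ in range(n_iter):
        new = [0.0] * n_states
        for i, mass in enumerate(pi):
            base = max(i - 1, 0)                 # (X_n - 1)^+
            for k, ak in enumerate(a):
                j = min(base + k, n_states - 1)  # absorb overflow at the boundary
                new[j] += mass * ak
        pi = new
    return pi

# Z in {0, 1, 2} with E[Z] = 0.7 < 1: the chain is positive recurrent.
a = [0.5, 0.3, 0.2]
EZ = sum(k * ak for k, ak in enumerate(a))       # = 0.7
pi = queue_stationary(a)
# Right-hand side of (2.17) at z = 1/2: gZ(1/2) = 0.7, giving
# 0.3 * (-0.5)(0.7)/(-0.2) = 0.525.
gX_half = sum(p * 0.5**i for i, p in enumerate(pi))
```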
therefore
X n
Xn ≥ Zk − n,
k=1
[Figure: a random walk on a graph, with each transition probability equal to 1/di, the inverse of the degree of the current vertex (values such as 1/2, 1/3, 1/4 label the arrows).]
directions. Clearly, whatever the dimension n ≥ 2, di = n, and the stationary distribution is the uniform distribution.
The lazy random walk on the graph is, by definition, the Markov chain on V with
the transition probabilities p_ii = 1/2 and, for i, j ∈ V such that i and j are connected
by an edge of the graph, p_ij = 1/(2d_i). This modified chain admits the same stationary
distribution as the original random walk. The difference is that the lazy version is
always aperiodic, whereas the original version may be periodic.
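As a quick numerical check of these claims, the following sketch (with a small hypothetical graph) verifies that π(i) = d_i/(2|E|) is stationary both for the plain random walk and for its lazy version; the computation is exact thanks to rational arithmetic.

```python
from fractions import Fraction

# A small graph on 4 vertices (hypothetical example), given by adjacency lists.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
n = len(adj)
deg = {i: len(adj[i]) for i in adj}

# Plain random walk: p_ij = 1/d_i for j adjacent to i.
P = [[Fraction(1, deg[i]) if j in adj[i] else Fraction(0) for j in range(n)]
     for i in range(n)]

# Lazy walk: p_ii = 1/2, p_ij = 1/(2 d_i) for j adjacent to i.
L = [[Fraction(1, 2) if i == j
      else (Fraction(1, 2 * deg[i]) if j in adj[i] else Fraction(0))
      for j in range(n)] for i in range(n)]

# Stationary distribution of the walk on a graph: pi(i) = d_i / (2|E|).
m = sum(deg.values())  # = 2|E|
pi = [Fraction(deg[i], m) for i in range(n)]

def is_stationary(pi, M):
    # check pi^T M = pi^T, exactly
    return all(sum(pi[i] * M[i][j] for i in range(n)) == pi[j] for j in range(n))

print(is_stationary(pi, P), is_stationary(pi, L))  # True True
```

Since the lazy chain has p_ii = 1/2 > 0 at every state, it is aperiodic whatever the graph.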
2.5 Exercises
Exercise 2.5.1. Truncated hmc.
Let P be a transition matrix on the countable state space E, with the positive
stationary distribution π. Let A be a subset of the state space, and define the
truncation of P on A to be the transition matrix Q indexed by A and given by
q_ij = p_ij if i, j ∈ A, i ≠ j,
q_ii = p_ii + Σ_{k∈Ā} p_ik.
Show that if (P, π) is reversible, then so is (Q, π/π(A)).
the configuration is not altered. At each step, stone Si is selected with probability
αi > 0. Call Xn the situation at time n, for instance Xn = Si1 · · · SiM , meaning
that stone Sij is in the jth position. Show that {Xn }n≥0 is an irreducible hmc
and that it has a stationary distribution given by the formula
π(S_{i_1} · · · S_{i_M}) = C α_{i_1}^{M} α_{i_2}^{M−1} · · · α_{i_M},
for some normalizing constant C.
Long-run behaviour
where T_0 is the return time to 0. Define for n ≥ 1, ν(n) := Σ_{k=1}^{n} 1_{X_k=0}.
By the independence property of the regenerative cycles, {Up }p≥1 is an iid se-
quence. Moreover, assuming f ≥ 0 and using the strong Markov property,
"T #
X 0
E[U1 ] = E0 f (Xn )
n=1
" T0 X
# " T0
#
X X X
= E0 f (i)1{Xn =i} = f (i)E0 1{Xn =i}
n=1 i∈E i∈E n=1
X
= f (i)xi .
i∈E
By hypothesis, this quantity is finite, and therefore the strong law of large numbers
applies, to give
lim_{n↑∞} (1/n) Σ_{p=1}^{n} U_p = Σ_{i∈E} f(i) x_i,
that is,
lim_{n↑∞} (1/n) Σ_{k=T_0+1}^{τ_{n+1}} f(X_k) = Σ_{i∈E} f(i) x_i.   (3.4)
Observing that
τ_{ν(n)} ≤ n < τ_{ν(n)+1},
we have
Σ_{k=1}^{τ_{ν(n)}} f(X_k) / ν(n) ≤ Σ_{k=1}^{n} f(X_k) / ν(n) ≤ Σ_{k=1}^{τ_{ν(n)+1}} f(X_k) / ν(n).
Corollary 3.1.1 Let {Xn }n≥0 be an irreducible positive recurrent Markov chain
with the stationary distribution π, and let f : E → R be such that
Σ_{i∈E} |f(i)| π(i) < ∞.   (3.5)
Now, f satisfying (3.5) also satisfies (3.2), since x and π are proportional, and
therefore, Pµ -a.s.,
lim_{N↑∞} (1/ν(N)) Σ_{k=1}^{N} f(X_k) = Σ_{i∈E} f(i) x_i.
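The ergodic theorem lends itself to a short simulation: the time average of f along one trajectory should approach the π-average of f. The two-state chain and the function f below are illustrative choices, not taken from the text.

```python
import random

random.seed(0)
# Two-state chain (states 0, 1): p01 = 0.3, p10 = 0.6; stationary pi = (2/3, 1/3).
p01, p10 = 0.3, 0.6
pi = (p10 / (p01 + p10), p01 / (p01 + p10))

def f(i):
    # an arbitrary cost function on the state space
    return 5.0 if i == 1 else 1.0

N = 200_000
x, total = 0, 0.0
for _ in range(N):
    total += f(x)
    u = random.random()
    x = (1 if u < p01 else 0) if x == 0 else (0 if u < p10 else 1)

time_avg = total / N                       # (1/N) sum_{k<N} f(X_k)
space_avg = f(0) * pi[0] + f(1) * pi[1]    # sum_i f(i) pi(i) = 7/3
print(time_avg, space_avg)
```

For a positive recurrent chain the two printed numbers agree up to the simulation noise.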
Corollary 3.1.2 Let {Xn }n≥1 be an irreducible positive recurrent Markov chain
with the stationary distribution π, and let g : E L+1 → R be such that
Σ_{i_0,...,i_L} |g(i_0, . . . , i_L)| π(i_0) p_{i_0 i_1} · · · p_{i_{L−1} i_L} < ∞
Then, Pµ-a.s.,
lim_{N↑∞} (1/N) Σ_{k=1}^{N} g(X_k, X_{k+1}, . . . , X_{k+L}) = Σ_{i_0,i_1,...,i_L} g(i_0, i_1, . . . , i_L) π(i_0) p_{i_0 i_1} · · · p_{i_{L−1} i_L}.
Proof. Apply Corollary 3.1.1 to the snake chain {(Xn , Xn+1 , . . . , Xn+L )}n≥0 ,
which is irreducible recurrent and admits the stationary distribution
Note that
Σ_{i_0,i_1,...,i_L} g(i_0, i_1, . . . , i_L) π(i_0) p_{i_0 i_1} · · · p_{i_{L−1} i_L} = E_π[g(X_0, . . . , X_L)]
d_V(α, β) := (1/2) Σ_{i∈E} |α(i) − β(i)|.   (3.7)
Lemma 3.2.1 Let α and β be two probability distributions on the same countable
space E. Then
d_V(α, β) = sup_{A⊆E} {α(A) − β(A)} = sup_{A⊆E} |α(A) − β(A)|.
Proof. For the second equality observe that for each subset A there is a subset B
such that |α(A) − β(A)| = α(B) − β(B) (take B = A or Ā). For the first equality,
write
α(A) − β(A) = Σ_{i∈E} 1_A(i) {α(i) − β(i)}
and observe that the right-hand side is maximal for A = {i ∈ E; α(i) > β(i)}.
Therefore, with g(i) = α(i) − β(i),
sup_{A⊆E} {α(A) − β(A)} = Σ_{i∈E} g^+(i) = (1/2) Σ_{i∈E} |g(i)|,
where the equality Σ_{i∈E} g(i) = 0 was taken into account.
The distance in variation between two random variables X and Y with values in
E is the distance in variation between their probability distributions, and it is
denoted (with a slight abuse of notation) by dV (X, Y ). Therefore
d_V(X, Y) := (1/2) Σ_{i∈E} |P(X = i) − P(Y = i)|.
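Formula (3.7) and the sup characterization of Lemma 3.2.1 are straightforward to implement; the two distributions below are arbitrary examples.

```python
def d_V(alpha, beta):
    """Distance in variation between two probability distributions on a common
    countable set, given as dictionaries i -> probability (formula (3.7))."""
    support = set(alpha) | set(beta)
    return 0.5 * sum(abs(alpha.get(i, 0.0) - beta.get(i, 0.0)) for i in support)

alpha = {'a': 0.5, 'b': 0.3, 'c': 0.2}
beta  = {'a': 0.2, 'b': 0.3, 'd': 0.5}

# Lemma 3.2.1: the same value is sup_A (alpha(A) - beta(A)), attained at
# A = {i : alpha(i) > beta(i)}.
A = {i for i in set(alpha) | set(beta) if alpha.get(i, 0.0) > beta.get(i, 0.0)}
sup_val = sum(alpha.get(i, 0.0) for i in A) - sum(beta.get(i, 0.0) for i in A)

print(d_V(alpha, beta), sup_val)  # 0.5 0.5
```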
For two probability distributions α and β on the countable set E, let D(α, β) be
the collection of pairs of random variables (X, Y) taking their values in E × E,
and with marginal distributions α and β, that is, P(X = i) = α(i) and P(Y = i) = β(i) for all i ∈ E.
Theorem 3.2.1 For any pair (X, Y ) ∈ D(α, β), we have the fundamental cou-
pling inequality
d_V(α, β) ≤ P(X ≠ Y),   (3.9)
and equality is attained by some pair (X, Y ) ∈ D(α, β), which is then said to
realize maximal coincidence.
and therefore
P (U = 1) = 1 − dV (α, β),
P (Z = i) = (α(i) ∧ β(i))/ (1 − dV (α, β)) ,
P (V = i) = (α(i) − β(i))+ /dV (α, β) ,
P (W = i) = (β(i) − α(i))+ /dV (α, β) .
(X, Y) = (Z, Z) if U = 1, and (X, Y) = (V, W) if U = 0,
we have
P (X = i) = P (U = 1, Z = i) + P (U = 0, V = i)
= P (U = 1)P (Z = i) + P (U = 0)P (V = i)
= α(i) ∧ β(i) + (α(i) − β(i))+ = α(i),
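The construction above, with the variables U, Z, V, W, is easy to turn into a sampler. The sketch below draws pairs (X, Y) and checks empirically that the frequency of {X ≠ Y} matches d_V(α, β); the two distributions are illustrative.

```python
import random

def pick(rng, dist):
    # inverse-cdf sampling from a dict i -> probability
    u, acc = rng.random(), 0.0
    for i, p in dist.items():
        acc += p
        if u < acc:
            return i
    return i  # guard against rounding at the tail

def sample_maximal_coupling(alpha, beta, rng):
    """Sample (X, Y) with marginals alpha, beta and P(X != Y) = d_V(alpha, beta),
    following the construction with U, Z, V, W above."""
    support = sorted(set(alpha) | set(beta))
    d = 0.5 * sum(abs(alpha.get(i, 0) - beta.get(i, 0)) for i in support)
    if rng.random() < 1 - d:   # U = 1: X = Y = Z ~ (alpha ∧ beta)/(1 - d)
        z = pick(rng, {i: min(alpha.get(i, 0), beta.get(i, 0)) / (1 - d)
                       for i in support})
        return z, z
    # U = 0: X = V ~ (alpha - beta)^+/d and Y = W ~ (beta - alpha)^+/d, independently
    v = pick(rng, {i: max(alpha.get(i, 0) - beta.get(i, 0), 0) / d for i in support})
    w = pick(rng, {i: max(beta.get(i, 0) - alpha.get(i, 0), 0) / d for i in support})
    return v, w

rng = random.Random(1)
alpha = {0: 0.5, 1: 0.5}
beta = {0: 0.2, 1: 0.8}      # here d_V(alpha, beta) = 0.3
draws = [sample_maximal_coupling(alpha, beta, rng) for _ in range(100_000)]
freq_neq = sum(x != y for x, y in draws) / len(draws)
print(freq_neq)              # close to d_V = 0.3
```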
Observe that Definition 3.2.3 concerns only the marginal distributions of the
stochastic process, not the stochastic process itself. Therefore, if there exists an-
other stochastic process {X′_n}_{n≥0} such that X_n and X′_n have the same distribution for all n ≥ 0, and if there
exists a third one {X′′_n}_{n≥0} such that X′′_n ∼ π for all n ≥ 0, then (3.13) follows
from
lim_{n↑∞} d_V(X′_n, X′′_n) = 0.   (3.10)
This trivial observation is useful because of the resulting freedom in the choice of
{Xn′ } and {Xn′′ }. An interesting situation occurs when there exists a finite random
time τ such that Xn′ = Xn′′ for all n ≥ τ .
Definition 3.2.2 Two stochastic processes {X′_n}_{n≥0} and {X′′_n}_{n≥0} taking their val-
ues in the same state space E are said to couple if there exists an almost surely
finite random time τ such that
n ≥ τ ⇒ X′_n = X′′_n.   (3.11)
Theorem 3.2.2 For any coupling time τ of {Xn′ }n≥0 and {Xn′′ }n≥0 , we have the
coupling inequality
dV (Xn′ , Xn′′ ) ≤ P (τ > n) . (3.12)
lim dV (αn , β) = 0 .
n↑∞
(B) An E-valued random sequence {Xn }n≥0 such that for some probability dis-
tribution π on E,
lim dV (Xn , π) = 0, (3.13)
n↑∞
If the state space is finite, computation of the n-th iterate of the transition matrix
P is all that we need, in principle, to prove (3.14). Such computation requires some
knowledge of the eigenstructure of P, and there is a famous result of linear algebra,
the Perron–Frobenius theorem, that does the work. We shall give the details in
Subsection 3.2. However, in the case of infinite state space, linear algebra fails to
provide the answer, and recourse to other methods is necessary.
In fact, (3.14) can be drastically improved:
Theorem 3.2.3 Let {Xn }n≥0 be an ergodic hmc on the countable state space E
with transition matrix P and stationary distribution π, and let µ be an arbitrary
initial distribution. Then
lim_{n↑∞} Σ_{i∈E} |P_µ(X_n = i) − π(i)| = 0,
lim_{n↑∞} d_V(µ^T P^n, ν^T P^n) = 0.
The announced results correspond to the particular case where ν is the stationary
distribution π, and particularizing further, µ = δj . From the discussion preceding
Definition 3.2.2, it suffices to construct two coupling chains with initial distribu-
tions µ and ν, respectively. This is done in the next lemma.
Lemma 3.2.2 Let {X_n^{(1)}}_{n≥0} and {X_n^{(2)}}_{n≥0} be two independent ergodic hmcs
with the same transition matrix P and initial distributions µ and ν, respectively.
Let τ = inf{n ≥ 0; X_n^{(1)} = X_n^{(2)}}, with τ = ∞ if the chains never intersect. Then
τ is, in fact, almost surely finite. Moreover, the process {X′_n}_{n≥0} defined by
X′_n = X_n^{(1)} if n ≤ τ, and X′_n = X_n^{(2)} if n ≥ τ   (3.15)
is an hmc with transition matrix P and initial distribution µ.
Proof. Step 1. Consider the product hmc {Z_n}_{n≥0} defined by Z_n = (X_n^{(1)}, X_n^{(2)}).
It takes values in E × E, and the probability of transition from (i, k) to (j, ℓ) in n
steps is p_ij(n) p_kℓ(n). We first show that this chain is irreducible. Since P is irreducible and
aperiodic, by Theorem 1.2.2, there exists m such that for all pairs (i, j) and (k, ℓ),
n ≥ m implies p_ij(n) p_kℓ(n) > 0. This implies irreducibility. (Note the essential
role of aperiodicity. A simple counterexample is that of the symmetric random
walk on Z, which is irreducible but of period 2. The product of two independent
such hmcs is the symmetric random walk on Z², which has two communication
classes.)
Step 2. Next we show that the two independent chains meet in finite time. Clearly,
the distribution σ̃ defined by σ̃(i, j) := π(i)π(j) is a stationary distribution for
the product chain, where π is the stationary distribution of P. Therefore, by
the stationary distribution criterion, the product chain is positive recurrent. In
particular, it reaches the diagonal of E 2 in finite time, and consequently, P (τ <
∞) = 1.
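Steps 1 and 2 can be checked numerically: run two independent copies of a small ergodic chain (the matrix below is a hypothetical example) until they first meet, and compare the empirical tail of the meeting time τ with the exact distance in variation of the two n-step distributions, as in the coupling inequality (3.12).

```python
import random

P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]

def step(i, rng):
    u, acc = rng.random(), 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if u < acc:
            return j
    return len(P) - 1

def meeting_time(rng, x=0, y=2):
    # two independent copies run until they first occupy the same state
    t = 0
    while x != y:
        x, y, t = step(x, rng), step(y, rng), t + 1
    return t

rng = random.Random(2)
taus = [meeting_time(rng) for _ in range(50_000)]
n = 3
p_tau_gt_n = sum(t > n for t in taus) / len(taus)

# exact d_V(delta_0 P^n, delta_2 P^n), from the n-step distributions
def row_times(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M))]

mu, nu = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
for _ in range(n):
    mu, nu = row_times(mu, P), row_times(nu, P)
dv = 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

# coupling inequality (3.12), up to the sampling error in p_tau_gt_n
print(dv, p_tau_gt_n)
```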
It remains to show that {Xn′ }n≥0 given by (3.15) is an hmc with transition matrix
P. For this we use the following lemma.
Lemma 3.2.3 Let X^1_0, X^2_0, Z^1_n, Z^2_n (n ≥ 1) be independent random variables, and
suppose moreover that Z^1_n, Z^2_n (n ≥ 1) are identically distributed. Let τ be a non-
negative integer-valued random variable such that for all m ∈ N, the event {τ = m}
is expressible in terms of X^1_0, X^2_0, Z^1_n, Z^2_n (n ≤ m). Define the sequence {Z_n}_{n≥1} by
Z_n = Z^1_n if n ≤ τ, and Z_n = Z^2_n if n > τ.
Then, {Z_n}_{n≥1} has the same distribution as {Z^1_n}_{n≥1} and is independent of X^1_0, X^2_0.
Step 3. We now complete the proof. The statement of the theorem concerns
only the distributions of {X^1_n}_{n≥0} and {X^2_n}_{n≥0}, and therefore we can assume a
representation
X^ℓ_{n+1} = f(X^ℓ_n, Z^ℓ_{n+1}) (ℓ = 1, 2),
where X^1_0, X^2_0, Z^1_n, Z^2_n (n ≥ 1) satisfy the conditions stated in Lemma 3.2.3. The
random time τ satisfies the condition of Lemma 3.2.3. Defining {Z_n}_{n≥1} in the
same manner as in this lemma, we therefore have
Perron–Frobenius
When the state space of an hmc is finite, we can rely on the standard results of linear
algebra to study the asymptotic behavior of the n-step transition matrix Pn , which
depends on the eigenstructure of P. The Perron–Frobenius theorem detailing the
eigenstructure of non-negative matrices is therefore all that is needed, at least in
theory.
The main result of Perron and Frobenius is that convergence to steady state of
an ergodic finite state space hmc is geometric, with relative speed equal to the
second-largest eigenvalue modulus (slem). Although there exist a few interesting
models, especially in biology, for which the eigenstructure of the transition matrix
can be extracted, this situation remains exceptional. It is therefore
important to find estimates of the slem.
From the basic results of the theory of matrices relative to eigenvalues and eigen-
vectors we quote the following one, relative to a square matrix A of dimension r
with distinct eigenvalues denoted λ_1, . . . , λ_r. Let u_1, . . . , u_r and v_1, . . . , v_r be the
associated sequences of left and right eigenvectors, respectively. Then, u_1, . . . , u_r
form an independent collection of vectors, and so do v_1, . . . , v_r. Also, u_i^T v_j = 0 if
i ≠ j. Since eigenvectors are determined up to multiplication by an arbitrary non-
null scalar, one can choose them in such a way that u_i^T v_i = 1 for all i, 1 ≤ i ≤ r.
We then have the spectral decomposition
A^n = Σ_{i=1}^{r} λ_i^n v_i u_i^T.   (3.17)
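For a two-state stochastic matrix the eigenstructure is computable by hand (eigenvalues 1 and 1 − a − b), so (3.17) can be verified directly; the sketch below compares the spectral formula with the brute-force matrix power, for arbitrary illustrative values of a and b.

```python
# Spectral decomposition (3.17) checked on P = [[1-a, a], [b, 1-b]],
# whose eigenvalues are 1 and 1 - a - b (a hypothetical numeric example).
a, b = 0.3, 0.6
P = [[1 - a, a], [b, 1 - b]]
lam2 = 1 - a - b

# left/right eigenvectors, normalized so that u_i^T v_i = 1
v1, u1 = [1.0, 1.0], [b / (a + b), a / (a + b)]    # lambda_1 = 1, u1 = pi
v2, u2 = [a / (a + b), -b / (a + b)], [1.0, -1.0]  # lambda_2 = 1 - a - b

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

n = 8
Pn = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(n):
    Pn = matmul(Pn, P)

# A^n = sum_i lambda_i^n v_i u_i^T
Spec = [[v1[i] * u1[j] + lam2 ** n * v2[i] * u2[j] for j in range(2)]
        for i in range(2)]

err = max(abs(Pn[i][j] - Spec[i][j]) for i in range(2) for j in range(2))
print(err < 1e-12)  # True
```

The λ₂ⁿ term is exactly the geometric convergence to 1πᵀ discussed below.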
lim_{n↑∞} P^n = 1 π^T = P_∞,
is obtained for this special case in a purely algebraic way. In addition, this algebraic
method gives the convergence speed, which is exponential and determined by the
second-largest eigenvalue absolute value. This is a general fact, which follows from
the Perron–Frobenius theory of non-negative matrices below.
A matrix A = {a_ij}_{1≤i,j≤r} with real coefficients is called non-negative (resp., posi-
tive) if all its entries are non-negative (resp., positive). A non-negative matrix A
is called stochastic if Σ_{j=1}^{r} a_ij = 1 for all i, and substochastic if Σ_{j=1}^{r} a_ij ≤ 1 for
all i, with strict inequality for at least one i.
Non-negativity (resp., positivity) of A will be denoted by A ≥ 0 (resp., A > 0).
If A and B are two matrices of the same dimensions with real coefficients, the
notation A ≥ B (resp., A > B) means that A − B ≥ 0 (resp., A − B > 0).
The communication graph of a square non-negative matrix A is the directed graph
with the state space E = {1, . . . , r} as its set of vertices and a directed edge from
vertex i to vertex j if and only if a_ij > 0.
We may always order the eigenvalues in such a way that if |λ2 | = |λj | for some
j ≥ 3, then m2 ≥ mj , where mj is the algebraic multiplicity of λj . Then
Approximate sampling
The quest for a random generator without these ailments is at the origin of the
Monte Carlo Markov chain (mcmc) sampling methodology. The basic principle
is the following. One constructs an irreducible aperiodic hmc {Xn }n≥0 with state
space E and stationary distribution π. Since the state space is finite, the chain
is ergodic, and therefore, by Theorem 3.2.3, for any initial distribution µ and all
i ∈ E,
lim Pµ (Xn = i) = π(i) . (3.21)
n→∞
Therefore, when n is “large,” we can consider that Xn has a distribution close to
π.
The first task is that of designing the mcmc algorithm. One must find an ergodic
transition matrix P on E, the stationary distribution of which is π. In the Monte
Carlo context, the transition mechanism of the chain is called a sampling algorithm,
and the asymptotic distribution π is called the target distribution, or sampled
distribution.
There are infinitely many transition matrices with a given target distribution, and
among them there are infinitely many that correspond to a reversible chain, that
is, such that
π(i)pij = π(j)pji .
We seek solutions of the form
pij = qij αij (3.22)
for j 6= i, where Q = {qij }i,j∈E is an arbitrary irreducible transition matrix on
E, called the candidate-generator matrix. When the present state is i, the next
tentative state j is chosen with probability qij . When j 6= i, this new state is
accepted with probability αij . Otherwise, the next state is the same state i. Hence,
the resulting probability of moving from i to j when i 6= j is given by (3.22). It
remains to select the acceptance probabilities αij .
In physics, it often arises, and we shall understand why later, that the distribution
π is of the form (3.23):
π(i) = e^{−U(i)} / Z ,   (3.23)
where U : E → R is the “energy function” and Z is the “partition function”,
the normalizing constant ensuring that π is indeed a probability vector. The
acceptance probability of the transition from i to j is then, assuming the candidate-
generating matrix to be symmetric,
α_ij = min(1, e^{−(U(j)−U(i))}) .
α_ij = e^{−U(j)} / (e^{−U(i)} + e^{−U(j)}) .
This corresponds to the basic principle of statistical thermodynamics: when choosing between two
states 1 and 2 with energies E_1 and E_2, choose 1 with probability e^{−E_1}/(e^{−E_1} + e^{−E_2}).
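A minimal sketch of the resulting sampler (the Metropolis acceptance rule above, with a symmetric nearest-neighbour candidate kernel on a ring; the energy values are hypothetical):

```python
import math
import random

# Metropolis sampling of pi(i) = exp(-U(i))/Z on E = {0,...,4}, with a
# symmetric candidate kernel (move one step left or right, wrapping around).
U = [0.0, 1.0, 2.0, 1.0, 0.5]          # hypothetical energy function
r = len(U)
Z = sum(math.exp(-u) for u in U)
pi = [math.exp(-u) / Z for u in U]

rng = random.Random(3)
x, counts, N = 0, [0] * r, 400_000
for _ in range(N):
    j = (x + rng.choice([-1, 1])) % r                        # candidate: q_ij symmetric
    if rng.random() < min(1.0, math.exp(-(U[j] - U[x]))):    # acceptance alpha_ij
        x = j
    counts[x] += 1

est = [c / N for c in counts]
err = max(abs(p - q) for p, q in zip(pi, est))
print(err)   # small: empirical occupation approximates pi
```

Note that Z cancels in the acceptance ratio, which is the whole point: the sampler never needs the normalizing constant.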
on a set E = Λ^N, where Λ is countable. The basic step of the Gibbs sampler for
the multivariate distribution π consists in selecting a coordinate number 1 ≤ i ≤ N
at random, and choosing the new value y(i) of the corresponding coordinate, given
the present values x(1), . . . , x(i − 1), x(i + 1), . . . , x(N ) of the other coordinates,
with probability
π(y(i) | x(1), . . . , x(i − 1), x(i + 1), . . . , x(N )).
One checks as above that π is the stationary distribution of the corresponding
chain.
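A minimal sketch of the basic step, for an explicitly tabulated π on {0, 1}² (the table is a hypothetical example): a coordinate is chosen at random and resampled from its conditional distribution given the other coordinate.

```python
import random

# Gibbs sampler for a distribution pi on E = {0,1}^2 given by an explicit table.
pi = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}

def gibbs_step(x, rng):
    i = rng.randrange(2)                     # coordinate to update
    other = x[1 - i]
    # conditional pi(y_i | x_{-i}), obtained by renormalizing the table
    w0 = pi[(0, other)] if i == 0 else pi[(other, 0)]
    w1 = pi[(1, other)] if i == 0 else pi[(other, 1)]
    y_i = 1 if rng.random() < w1 / (w0 + w1) else 0
    return (y_i, other) if i == 0 else (other, y_i)

rng = random.Random(4)
x, N = (0, 0), 400_000
counts = {k: 0 for k in pi}
for _ in range(N):
    x = gibbs_step(x, rng)
    counts[x] += 1

err = max(abs(counts[k] / N - pi[k]) for k in pi)
print(err)   # small: the empirical occupation approximates pi
```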
Exact sampling
We attempt to construct an exact sample of a given π on a finite state space E,
that is, a random variable Z such that P(Z = i) = π(i) for all i ∈ E. The following
algorithm (Propp–Wilson algorithm) is based on a coupling idea. One starts as
usual from an ergodic transition matrix P with stationary distribution π, just as
in the classical mcmc method.
The algorithm is based on a representation of P in terms of a recurrence equation:
for a given function f and an iid sequence {Z_n}_{n≥1} independent of the
initial state, the chain satisfies the recurrence
Xn+1 = f (Xn , Zn+1 ) . (3.25)
The Propp–Wilson algorithm constructs a family of hmcs with this transition ma-
trix with the help of a unique iid sequence of random vectors {Y_n}_{n∈Z}, called the
updating sequence, where Y_n = (Z_{n+1}(1), . . . , Z_{n+1}(r)) is an r-dimensional random
vector, and where the coordinates Z_{n+1}(i) have a common distribution, that of Z_1.
For each N ∈ Z and each k ∈ E, a process {X^N_n(k)}_{n≥N} is defined recursively by:
X^N_N(k) = k,
and, for n ≥ N,
X^N_{n+1}(k) = f(X^N_n(k), Z_{n+1}(X^N_n(k))).
(Thus, if the chain is in state i at time n, it will be at time n + 1 in state j =
f(i, Z_{n+1}(i)).) Each of these processes is therefore an hmc with the transition matrix
P. Note that for all k, ℓ ∈ E, and all M, N ∈ Z, the hmcs {X^N_n(k)}_{n≥N} and
{X^M_n(ℓ)}_{n≥M} use at any time n ≥ max(M, N) the same updating random vector
Y_{n+1}.
If, in addition to the independence of {Y_n}_{n∈Z}, the components Z_{n+1}(1), Z_{n+1}(2),
. . . , Z_{n+1}(r) are, for each n ∈ Z, independent, we say that the updating is compo-
nentwise independent.
is called the forward coupling time (Fig. 3.1). The random time
[Figure 3.1. Backward and forward coupling; in the realization shown, τ− = 7 and τ+ = 4.]
Thus, τ + is the first time at which the chains {Xn0 (i)}n≥0 , 1 ≤ i ≤ r, coalesce.
Lemma 3.3.1 When the updating is componentwise independent, the forward cou-
pling time τ + is almost surely finite.
Proof. Consider the (immediate) extension of Lemma 3.2.2 to the case of r inde-
pendent hmc’s with the same transition matrix. It cannot be applied directly to
our situation, because the chains are not independent. However, the probability
of coalescence in our situation is bounded below by the probability of coalescence
in the completely independent case. To see this, first construct the independent
chains model, using r independent iid componentwise independent updating se-
quences. The difference with our model is that we use too many updatings. In
order to construct from this a set of r chains as in our model, it suffices to use for
two chains the same updatings as soon as they meet. Clearly, the forward cou-
pling time of the so modified model is smaller than or equal to that of the initial
completely independent model.
Z = X0−τ (i).
Proof. We shall show at the end of the current proof that for all k ∈ N, P(τ ≤
k) = P(τ+ ≤ k), and therefore the finiteness of τ follows from that of τ+ proven
in the last lemma. Now, since for n ≥ τ, X^{−n}_0(i) = Z,
P (Z = j) = P (Z = j, τ > n) + P (Z = j, τ ≤ n)
= P (Z = j, τ > n) + P (X0−n (i) = j, τ ≤ n)
= P (Z = j, τ > n) − P (X0−n (i) = j, τ > n) + P (X0−n (i) = j)
= P (Z = j, τ > n) − P (X0−n (i) = j, τ > n) + pij (n)
= A_n − B_n + p_ij(n).
But A_n and B_n are bounded above by P(τ > n), a quantity that tends to 0 as
n ↑ ∞ since τ is almost surely finite. Therefore
P(Z = j) = lim_{n↑∞} p_ij(n) = π(j).
It remains to prove the equality of the distributions of the forward and backward
coupling times. For this, select an arbitrary integer k ∈ N. Consider an updating
sequence constructed from a bona fide updating sequence {Y_n}_{n∈Z}, by replacing
Y_{−k+1}, Y_{−k+2}, . . . , Y_0 by Y_1, Y_2, . . . , Y_k. Call τ′ the backward coupling time in the
modified model. Clearly τ and τ′ have the same distribution.
[Figure 2. τ+ ≤ k implies τ′ ≤ k.]
P (τ + ≤ k) ≤ P (τ ′ ≤ k) = P (τ ≤ k).
[Figure 3. τ′ ≤ k implies τ+ ≤ k.]
Now, suppose that τ′ ≤ k. Then, in the modified model, the chains starting at
time −k from states 1, . . . , r have coalesced by time 0; since the updating vectors
used there are Y_1, . . . , Y_k, the forward chains starting at time 0 coalesce by time k
(see Fig. 3). Therefore τ′ ≤ k implies τ+ ≤ k, so that
P (τ ≤ k) = P (τ ′ ≤ k) ≤ P (τ + ≤ k).
Note that the coalesced value at the forward coupling time is not a sample of π
(see Exercise 3.5.12).
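The backward scheme can be sketched as follows. The transition matrix is a hypothetical example, the updating is componentwise independent, and the starting times go back by doubling (a common implementation choice, not imposed by the theory). The key point is that the updating vectors for times already visited are reused on each restart.

```python
import random

# Propp-Wilson exact sampling for a small transition matrix (hypothetical numbers).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
r = len(P)

def f(i, u):
    # recurrence representation (3.25): inverse-cdf step from state i driven by u
    acc = 0.0
    for j in range(r):
        acc += P[i][j]
        if u < acc:
            return j
    return r - 1

def propp_wilson(rng):
    Y = {}                       # updating vectors Y_n, n = -1, -2, ..., reused on restarts
    N = 1
    while True:
        x = list(range(r))       # chains started at time -N from every state
        for n in range(-N, 0):
            if n not in Y:
                Y[n] = [rng.random() for _ in range(r)]   # componentwise independent
            x = [f(x[k], Y[n][x[k]]) for k in range(r)]
        if all(v == x[0] for v in x):
            return x[0]          # common value at time 0: an exact sample of pi
        N *= 2                   # go further into the past, keeping old updatings

# stationary distribution of this P, solved by hand: pi = (1/4, 1/2, 1/4)
rng = random.Random(5)
M = 20_000
counts = [0] * r
for _ in range(M):
    counts[propp_wilson(rng)] += 1
est = [c / M for c in counts]
print(est)   # close to (0.25, 0.5, 0.25)
```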
Sandwiching
The above exact sampling algorithm is often prohibitively time-consuming when
the state space is large. However, if the algorithm required the coalescence of
two, instead of r processes, then it would take less time. The Propp and Wilson
algorithm does this in a special, yet not rare, case.
It is now assumed that there exists a partial order relation on E, denoted by ⪯,
with a minimal and a maximal element (say, respectively, 1 and r), and that we
can perform the updating in such a way that for all i, j ∈ E, all N ∈ Z, all n ≥ N,
i ⪯ j ⇒ X^N_n(i) ⪯ X^N_n(j).
[Figure: monotone backward coupling of the extremal chains; in the realization shown, the coupling time is τ = 6.]
Theorem 3.3.2 The monotone backward coupling time τ_m is almost surely finite.
Also, the random variable X^{−τ_m}_0(1) = X^{−τ_m}_0(r) has the distribution π.
Proof. We can use most of the proof of Theorem 3.3.1. We need only to prove
independently that τ + is finite. It is so because τ + is dominated by the first time
n ≥ 0 such that Xn0 (r) = 1, and the latter is finite in view of the recurrence
assumption.
Monotone coupling will occur with representations of the form (3.25) such that for
all z,
i ⪯ j ⇒ f(i, z) ⪯ f(j, z),
and if for all n ∈ Z, all i ∈ {1, . . . , r}, Z_{n+1}(i) = Z_{n+1}. This is the case for the
recurrence equation
X_{n+1} = min(r, max(0, X_n + Z_{n+1})),
where, as usual, {Zn }n≥1 is iid. In this specific model, Xn is the content at time
n of a dam reservoir with maximum capacity r, and Zn+1 = An+1 − c, where An+1
is the input into the reservoir during the time period from n to n + 1, and c is the
maximum release during the same period. The updating rule is then monotone.
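A sketch of the resulting monotone sandwiching algorithm for the dam chain, under the recurrence X_{n+1} = min(r, max(0, X_n + Z_{n+1})); the input distribution of Z and the capacity r are hypothetical choices, and only the two extremal chains are tracked.

```python
import random

r = 4
z_vals, z_probs = [-1, 0, 1], [0.5, 0.2, 0.3]   # hypothetical distribution of Z

def f(x, z):
    # dam update: monotone in x for every fixed z
    return min(r, max(0, x + z))

def sample_z(rng):
    u, acc = rng.random(), 0.0
    for z, p in zip(z_vals, z_probs):
        acc += p
        if u < acc:
            return z
    return z_vals[-1]

def monotone_pw(rng):
    Z = {}                          # common updating variables, reused on restarts
    N = 1
    while True:
        lo, hi = 0, r               # sandwiching chains from the extremal states
        for n in range(-N, 0):
            if n not in Z:
                Z[n] = sample_z(rng)
            lo, hi = f(lo, Z[n]), f(hi, Z[n])
        if lo == hi:
            return lo               # all chains are squeezed: exact sample of pi
        N *= 2

# reference: stationary distribution by power iteration on the 5-state matrix
P = [[0.0] * (r + 1) for _ in range(r + 1)]
for i in range(r + 1):
    for z, p in zip(z_vals, z_probs):
        P[i][f(i, z)] += p
pi = [1.0 / (r + 1)] * (r + 1)
for _ in range(500):
    pi = [sum(pi[i] * P[i][j] for i in range(r + 1)) for j in range(r + 1)]

rng = random.Random(6)
M = 20_000
counts = [0] * (r + 1)
for _ in range(M):
    counts[monotone_pw(rng)] += 1
est = [c / M for c in counts]
print(max(abs(a - b) for a, b in zip(est, pi)))   # small
```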
3.4 Absorption
Before absorption
We now consider the absorption problem for hmcs based only on the transition
matrix P, not necessarily assumed irreducible. The state space E is then decom-
posable as E = T + Σ_j R_j, where R_1, R_2, . . . are the disjoint recurrent classes and
T is the collection of transient states. (Note that the number of recurrent classes
as well as the number of transient states may be infinite.) The transition matrix
can therefore be block-partitioned as
P = | P_1   0    · · ·  0 |
    | 0     P_2  · · ·  0 |
    | ...   ...  . . . ... |
    | B(1)  B(2) · · ·  Q |
or in condensed notation,
P = | D  0 |
    | B  Q | .   (3.26)
This structure of the transition matrix accounts for the fact that one cannot go
from a state in a given recurrent class to any state not belonging to this recurrent
class. In other words, a recurrent class is closed.
What is the probability of being absorbed by a given recurrent class when starting
from a given transient state? This kind of problem was already addressed when
the first-step analysis method was introduced. It led to systems of linear equations
with boundary conditions, for which the solution was unique, due to the finiteness
of the state space. With an infinite state space, the uniqueness issue cannot be
overlooked, and the absorption problem will be reconsidered with this in mind,
and also with the intention of finding general matrix-algebraic expressions for the
solutions. Another phenomenon not manifesting itself in the finite case is the
possibility, when the set of transient states is infinite, of never being absorbed by
the recurrent set. We shall consider this problem first, and then proceed to derive
the distribution of the time to absorption by the recurrent set, and the probability
of being absorbed by a given recurrent class.
Let A be a subset of the state space E (typically the set of transient states, but
not necessarily). We aim at computing for any initial state i ∈ A the probability
of remaining forever in A, namely v(i) := P_i(X_n ∈ A for all n ≥ 0), i ∈ A. Denoting
by Q the restriction of P to A and letting v_n(i) := P_i(X_1 ∈ A, . . . , X_n ∈ A), we have, in matrix form,
v_n = Q^n 1_A ,
where 1A is the column vector indexed by A with all entries equal to 1. From this
equality we obtain
vn+1 = Qvn ,
and by dominated convergence v = Qv. Moreover, 0A ≤ v ≤ 1A , where 0A is the
column vector indexed by A with all entries equal to 0. The above result can be
refined as follows:
Theorem 3.4.1 The vector v is the maximal solution of
v = Qv, 0_A ≤ v ≤ 1_A.
Moreover, either v = 0_A or sup_{i∈A} v(i) = 1.
Proof. Only maximality and the last statement remain to be proven. To prove
maximality consider a vector u indexed by A such that u = Qu and 0A ≤ u ≤ 1A .
Iteration of u = Qu yields u = Qn u, and u ≤ 1A implies that Qn u ≤ Qn 1A = vn .
Therefore, u ≤ vn , which gives u ≤ v by passage to the limit.
To prove the last statement of the theorem, let c = supi∈A v(i). From v ≤ c1A , we
obtain v ≤ cvn as above, and therefore, at the limit, v ≤ cv. This implies either
v = 0A or c = 1.
When the set T is finite, the probability of infinite sojourn in T is null, because
otherwise at least one transient state would be visited infinitely often.
Equation v = Qv reads
v(i) = Σ_{j∈A} p_ij v(j) (i ∈ A).
Example 3.4.1: The repair shop once more. We shall prove in a different
way a result already obtained in Subsection 2.4, namely: the chain is recurrent if
and only if ρ ≤ 1. Observe that the restriction of P to A_i := {i + 1, i + 2, . . .},
namely
Q = | a_1  a_2  a_3  · · · |
    | a_0  a_1  a_2  · · · |
    |      a_0  a_1  · · · |
    |                · · · | ,
does not depend on i ≥ 0. In particular, the maximal solution of v = Qv, 0A ≤ v ≤
1A when A ≡ Ai has, in view of Theorem 3.4.1, the following two interpretations.
Firstly, for i ≥ 1, 1 − v(i) is the probability of visiting 0 when starting from i ≥ 1.
Secondly, (1 − v(1)) is the probability of visiting {0, 1, . . . , i} when starting from
i + 1. But when starting from i + 1, the chain visits {0, 1, . . . , i} if and only if it
visits i, and therefore (1 − v(1)) is also the probability of visiting i when starting
from i + 1. The probability of visiting 0 when starting from i + 1 is
that is,
(1 − β) = a_1(1 − β) + a_2(1 − β²) + · · · .
Since Σ_{i≥0} a_i = 1, this reduces to
β = g(β),   (⋆)
where g is the generating function of the probability distribution (ak , k ≥ 0). Also,
all other equations of v = Qv reduce to (⋆).
Under the irreducibility assumptions a_0 > 0, a_0 + a_1 < 1, (⋆) has only one solution
in [0, 1], namely β = 1, if ρ ≤ 1, whereas if ρ > 1, it has two solutions in [0, 1],
namely β = 1 and β = β_0 ∈ (0, 1). We must take the smallest solution.
Therefore, if ρ > 1, the probability of visiting state 0 when starting from state
i ≥ 1 is 1 − v(i) = β0i < 1, and therefore the chain is transient. If ρ ≤ 1, the latter
probability is 1 − v(i) = 1, and therefore the chain is recurrent.
Example 3.4.2: 1-D random walk, take 5. The transition matrix of the
random walk on N with a reflecting barrier at 0,
P = | 0  1            |
    | q  0  p         |
    |    q  0  p      |
    |       q  0  p   |
    |          . . .  | ,
where p ∈ (0, 1), is clearly irreducible. Intuitively, if p > q, there is a drift to the
right, and one expects the chain to be transient. This will be proven formally by
showing that the probability v(i) of never visiting state 0 when starting from state
i ≥ 1 is strictly positive. In order to apply Theorem 3.4.1 with A = N − {0}, we
must find the general solution of u = Qu. This equation reads
u(1) = pu(2),
u(2) = qu(1) + pu(3),
u(3) = qu(2) + pu(4),
···
and its general solution is
u(i) = u(1) Σ_{j=0}^{i−1} (q/p)^j .
The largest value of u(1) respecting the constraint u(i) ∈ [0, 1] is u(1) = 1 − q/p.
The solution v(i) is therefore
v(i) = 1 − (q/p)^i .
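The claimed solution can be checked term by term with exact arithmetic: v(i) = 1 − (q/p)^i must satisfy u(1) = p u(2) and u(i) = q u(i − 1) + p u(i + 1) for i ≥ 2. The value of p below is an arbitrary choice with p > q.

```python
from fractions import Fraction

# Check that v(i) = 1 - (q/p)^i solves u = Qu for the reflected walk when p > q.
p = Fraction(2, 3)
q = 1 - p

def v(i):
    return 1 - (q / p) ** i

assert v(1) == p * v(2)                       # first equation of u = Qu
for i in range(2, 50):
    assert v(i) == q * v(i - 1) + p * v(i + 1)  # remaining equations
print("v = Qv holds; v(1) =", v(1))           # v(1) = 1/2 here
```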
Time to absorption
We now turn to the determination of the distribution of τ , the time of exit from the
transient set T . Theorem 3.4.1 tells that v = {v(i)}i∈T , where v(i) = Pi (τ = ∞),
is the largest solution of v = Qv subject to the constraints 0T ≤ v ≤ 1T , where
Q is the restriction of P to the transient set T . The probability distribution of τ
when the initial state is i ∈ T is readily computed starting from the identity
P_i(τ = n) = P_i(τ ≥ n) − P_i(τ ≥ n + 1)
and the equality
P_i(τ > n) = {Q^n 1_T}_i .
P_i(n < τ ≤ n + m) = Σ_{j=0}^{m−1} {(Q^{n+j} − Q^{n+j+1}) 1_T}_i = {(Q^n − Q^{n+m}) 1_T}_i ,
Absorption destination
We seek to compute the probability of absorption by a given recurrent class when
starting from a given transient state. As we shall see later, it suffices for the theory
to treat the case where the recurrent classes are singletons. We therefore suppose
that the transition matrix has the form
P = | I  0 |
    | B  Q | .   (3.28)
Let fij be the probability of absorption by recurrent class Rj = {j} when starting
from the transient state i. We have
P^n = | I    0   |
      | L_n  Q^n | ,
We find
S = (I − Q)^{−1} = (1/6) | 16   8   4   1 |
                         |  8  16   8   2 |
                         |  4   8  16   1 |
                         |  8  16   8   8 | ,
and the absorption probability matrix is
SB = | 3/4  1/4 |
     | 1/2  1/2 |
     | 1/4  3/4 |
     | 1/2  1/2 | .
For instance, the (3, 2) entry, 3/4, is the probability that, when starting from a couple
of ancestors of type Aa × aa, the race will end up in genotype aa × aa.
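These matrices can be reproduced with exact rational arithmetic. The blocks Q and B below are a reconstruction of the sib-mating chain with transient states (1) AA×Aa, (2) Aa×Aa, (3) Aa×aa, (4) AA×aa and absorbing states AA×AA, aa×aa; since the setup of the example is not restated in this excerpt, treat them as an assumption, checked here by the fact that they reproduce S and SB.

```python
from fractions import Fraction as F

Q = [[F(1, 2), F(1, 4), F(0), F(0)],
     [F(1, 4), F(1, 4), F(1, 4), F(1, 8)],
     [F(0), F(1, 4), F(1, 2), F(0)],
     [F(0), F(1), F(0), F(0)]]
B = [[F(1, 4), F(0)],
     [F(1, 16), F(1, 16)],
     [F(0), F(1, 4)],
     [F(0), F(0)]]

def inverse(M):
    # Gauss-Jordan elimination over the rationals
    n = len(M)
    A = [row[:] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

n = len(Q)
I_minus_Q = [[F(int(i == j)) - Q[i][j] for j in range(n)] for i in range(n)]
S = inverse(I_minus_Q)                                     # fundamental matrix
SB = [[sum(S[i][k] * B[k][j] for k in range(n)) for j in range(2)]
      for i in range(n)]
print(S[0])       # first row of S, i.e. (1/6)(16, 8, 4, 1)
print(SB[2][1])   # absorption in aa x aa from Aa x aa: prints 3/4
```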
3.5 Exercises
Exercise 3.5.1. abbabaa!
A sequence of A’s and B’s is formed as follows. The first item is chosen at random,
P (A) = P (B) = 12 , as is the second item, independently of the first one. When
the first n ≥ 2 items have been selected, the (n + 1)st is chosen, independently of
the letters in positions k ≤ n − 2 conditionally on the pair at position n − 1 and
n, as follows:
P(A | AA) = 1/2, P(A | AB) = 1/2, P(A | BA) = 1/4, P(A | BB) = 1/4.
What is the proportion of A’s and B’s in a long chain?
Exercise 3.5.4.
Let {Z_n}_{n≥1} be an iid sequence of {0, 1}-valued random variables with P(Z_n =
1) = p ∈ (0, 1). Show that for all k ≥ 1,
Hint: modulo k.
Exercise 3.5.5.
Let P be an ergodic transition matrix on the finite state space E. Prove that
for any initial distributions µ and ν, one can construct two hmc’s {Xn }n≥0 and
{Yn }n≥0 on E with the same transition matrix P, and the respective initial dis-
tributions µ and ν, in such a way that they couple at a finite time τ such that
E[eατ ] < ∞ for some α > 0.
Let {Xn }n≥0 be an hmc with state space E and transition matrix P. Define for
L ≥ 1, Yn = (Xn , Xn+1 , . . . , Xn+L ).
(a) The process {Yn }n≥0 takes its values in F = E L+1 . Prove that {Yn }n≥0 is an
hmc and give the general entry of its transition matrix. (The chain {Yn }n≥0 is
called the snake chain of length L + 1 associated with {Xn }n≥0 .)
(b) Show that if {Xn }n≥0 is irreducible, then so is {Yn }n≥0 if we restrict the state
space of the latter to be F = {(i0 , . . . , iL ) ∈ E L+1 ; pi0 i1 pi1 i2 · · · piL−1 iL > 0}. Show
that if the original chain is irreducible aperiodic, so is the snake chain.
(c) Show that if {Xn }n≥0 has a stationary distribution π, then {Yn }n≥0 also has a
stationary distribution. Which one?