
Basic Markov Chains

Pierre Brémaud

December 9, 2015
Contents

1 The transition matrix 5


1.1 The distribution of a Markov chain . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Communication and period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Stationarity and reversibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Strong Markov property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2 Recurrence 23
2.1 The potential matrix criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Stationary distribution criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Foster’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3 Long-run behaviour 45
3.1 Ergodic theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Convergence in variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4 Absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4 Solutions 77

A Appendix 89
A.1 Greatest Common Divisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
A.2 Dominated convergence for series . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

Chapter 1

The transition matrix

1.1 The distribution of a Markov chain


A particle moves on a denumerable set E. If at time n the particle is in position
Xn = i, it will at time n + 1 be in a position Xn+1 = j, chosen independently of
the past trajectory Xn−1 , Xn−2 , . . . with probability pij . This can be represented by a
labeled directed graph, called the transition graph, whose set of vertices is E, and
for which there is a directed edge from i ∈ E to j ∈ E with label pij if and only
if the latter quantity is positive. Note that there may be “self-loops”, corresponding
to positions i such that pii > 0.

[Figure: transition graph on E = {1, 2, 3, 4}, with self-loop p11 and edge labels p12 , p23 , p32 , p34 , p41 ]

This graphical interpretation of a Markov chain in terms of a “random walk” on a


set E is adapted to the study of random walks on graphs. Since the interpretation
of a Markov chain in such terms is not always the natural one, we proceed to give
a more formal definition.


Recall that a sequence {Xn }n≥0 of random variables with values in a set E is called
a discrete-time stochastic process with state space E. In this chapter, the state
space is countable, and its elements will be denoted by i, j, k,. . . If Xn = i, the
process is said to be in state i at time n, or to visit state i at time n.

Definition 1.1.1 If for all integers n ≥ 0 and all states i0 , i1 , . . . , in−1 , i, j,

P (Xn+1 = j | Xn = i, Xn−1 = in−1 , . . . , X0 = i0 ) = P (Xn+1 = j | Xn = i) ,

this stochastic process is called a Markov chain, and a homogeneous Markov chain
(hmc) if, in addition, the right-hand side is independent of n.

The matrix P = {pij }i,j∈E , where

pij = P (Xn+1 = j | Xn = i),

is called the transition matrix of the hmc. Since the entries are probabilities, and
since a transition from any state i must be to some state, it follows that
pij ≥ 0, and ∑k∈E pik = 1

for all states i, j. A matrix P indexed by E and satisfying the above properties is
called a stochastic matrix. The state space may be infinite, and therefore such a
matrix is in general not of the kind studied in linear algebra. However, the basic
operations of addition and multiplication will be defined by the same formal rules.
The notation x = {x(i)}i∈E formally represents a column vector, and xT is the
corresponding row vector.
The Markov property easily extends (Exercise 1.5.2) to

P (A | Xn = i, B) = P (A | Xn = i) ,

where

A = {Xn+1 = j1 , . . . , Xn+k = jk }, B = {X0 = i0 , . . . , Xn−1 = in−1 }.

This is in turn equivalent to

P (A ∩ B | Xn = i) = P (A | Xn = i)P (B | Xn = i).

That is, A and B are conditionally independent given Xn = i. In other words, the
future at time n and the past at time n are conditionally independent given the

present state Xn = i. In particular, the Markov property is independent of the


direction of time.
Notation. We shall from now on abbreviate P (A | X0 = i) as Pi (A). Also, if µ is
a probability distribution on E, then Pµ (A) is the probability of A given that the
initial state X0 is distributed according to µ.
The distribution at time n of the chain is the vector νn := {νn (i)}i∈E , where

νn (i) := P (Xn = i).


From the Bayes rule of exclusive and exhaustive causes, νn+1 (j) = ∑i∈E νn (i)pij ,
that is, in matrix form, νn+1T = νnT P. Iteration of this equality yields

νnT = ν0T Pn . (1.1)

The matrix Pm is called the m-step transition matrix because its general term is

pij (m) = P (Xn+m = j | Xn = i).

In fact, by the Bayes sequential rule and the Markov property, the right-hand side
equals ∑i1 ,...,im−1 ∈E pii1 pi1 i2 · · · pim−1 j , which is the general term of the m-th power
of P.
The probability distribution ν0 of the initial state X0 is called the initial distri-
bution. From the Bayes sequential rule and in view of the homogeneous Markov
property and the definition of the transition matrix,

P (X0 = i0 , X1 = i1 , . . . , Xk = ik ) = ν0 (i0 )pi0 i1 · · · pik−1 ik .

Therefore,
Theorem 1.1.1 The distribution of a discrete-time hmc is uniquely determined
by its initial distribution and its transition matrix.
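Equation (1.1) can be checked numerically. The sketch below (the 3-state stochastic matrix is invented for the example, not taken from the text) propagates an initial distribution step by step and compares the result with ν0T Pn computed directly:

```python
import numpy as np

# A hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

nu0 = np.array([1.0, 0.0, 0.0])   # initial distribution: start in state 0

# nu_n^T = nu_0^T P^n, computed by repeated right-multiplication
nu = nu0.copy()
for _ in range(50):
    nu = nu @ P

# Equivalently, via the n-step transition matrix P^n
nu_direct = nu0 @ np.linalg.matrix_power(P, 50)
assert np.allclose(nu, nu_direct)
```

For this particular matrix the iterates converge quickly to the stationary vector (1/4, 1/2, 1/4).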

Sample path realization


Many hmc’s receive a natural description in terms of a recurrence equation.
Theorem 1.1.2 Let {Zn }n≥1 be an iid sequence of random variables with values
in an arbitrary space F . Let E be a countable space, and f : E × F → E be some
function. Let X0 be a random variable with values in E, independent of {Zn }n≥1 .
The recurrence equation
Xn+1 = f (Xn , Zn+1 ) (1.2)
then defines a hmc.

Proof. Iteration of recurrence (1.2) shows that for all n ≥ 1, there is a function
gn such that Xn = gn (X0 , Z1 , . . . , Zn ), and therefore P (Xn+1 = j | Xn = i, Xn−1 =
in−1 , . . . , X0 = i0 ) = P (f (i, Zn+1 ) = j | Xn = i, Xn−1 = in−1 , . . . , X0 = i0 ) =
P (f (i, Zn+1 ) = j), since the event {X0 = i0 , . . . , Xn−1 = in−1 , Xn = i} is express-
ible in terms of X0 , Z1 , . . . , Zn and is therefore independent of Zn+1 . Similarly,
P (Xn+1 = j | Xn = i) = P (f (i, Zn+1 ) = j). We therefore have a Markov chain,
and it is homogeneous since the right-hand side of the last equality does not depend
on n. Explicitly:
pij = P (f (i, Z1 ) = j) . (1.3)


Example 1.1.1: 1-D random walk, take 1. Let X0 be a random variable


with values in Z. Let {Zn }n≥1 be a sequence of iid random variables, independent
of X0 , taking the values +1 or −1, and with the probability distribution
P (Zn = +1) = p,
where p ∈ (0, 1). The process {Xn }n≥0 defined by
Xn+1 = Xn + Zn+1
is, in view of Theorem 1.1.2, an hmc, called a random walk on Z. It is called a
“symmetric” random walk if p = 1/2.
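A minimal simulation of this recurrence (a sketch; the seed and parameters below are arbitrary choices):

```python
import random

def random_walk(x0, p, n_steps, seed=0):
    """Simulate X_{n+1} = X_n + Z_{n+1} on Z, with P(Z = +1) = p."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        z = +1 if rng.random() < p else -1
        path.append(path[-1] + z)
    return path

path = random_walk(x0=0, p=0.5, n_steps=10)   # a symmetric walk
```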

Not all homogeneous Markov chains receive a “natural” description of the type
featured in Theorem 1.1.2. However, it is always possible to find a “theoretical”
description of the kind. More exactly,
Theorem 1.1.3 For any transition matrix P on E, there exists a homogeneous
Markov chain with this transition matrix and with a representation such as in
Theorem 1.1.2.

Proof. Define
Xn+1 := j if ∑_{k=0}^{j−1} pXn k ≤ Zn+1 < ∑_{k=0}^{j} pXn k ,

where {Zn }n≥1 is iid, uniform on [0, 1]. By application of Theorem 1.1.2 and of
formula (1.3), we check that this hmc has the announced transition matrix. 
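The construction in this proof is the classical inverse-transform sampler. A sketch in Python, with a hypothetical 3-state stochastic matrix standing in for P:

```python
import random

# An invented stochastic matrix on E = {0, 1, 2} (rows sum to 1).
P = [[0.2, 0.5, 0.3],
     [0.6, 0.1, 0.3],
     [0.3, 0.3, 0.4]]

def next_state(i, u):
    """Return j such that sum_{k<j} p_ik <= u < sum_{k<=j} p_ik."""
    acc = 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if u < acc:
            return j
    return len(P[i]) - 1   # guard against floating-point round-off

rng = random.Random(42)
x = 0                      # X_0 = 0
path = [x]
for _ in range(1000):
    x = next_state(x, rng.random())   # Z_{n+1} uniform on [0, 1)
    path.append(x)
```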

As we already mentioned, not all homogeneous Markov chains are naturally de-
scribed by the model of Theorem 1.1.2. A slight modification of this result con-
siderably enlarges its scope.

Theorem 1.1.4 Let things be as in Theorem 1.1.2 except for the joint distribu-
tion of X0 , Z1 , Z2 , . . .. Suppose instead that for all n ≥ 0, Zn+1 is condition-
ally independent of Zn , . . . , Z1 , Xn−1 , . . . , X0 given Xn , and that for all i, j ∈ E,
P (Zn+1 = k | Xn = i) is independent of n. Then {Xn }n≥0 is a hmc, with transition
probabilities
pij = P (f (i, Z1 ) = j | X0 = i).

Proof. The proof is quite similar to that of Theorem 1.1.2 (Exercise ??). 

Example 1.1.2: The Ehrenfest urn, take 1. This idealized model of dif-
fusion through a porous membrane, proposed in 1907 by the Austrian physicists
Tatiana and Paul Ehrenfest to describe in terms of statistical mechanics the ex-
change of heat between two systems at different temperatures, considerably helped
understanding the phenomenon of thermodynamic irreversibility (see Example ??).
It features N particles that can be either in compartment A or in compartment
B.

[Figure: two compartments A and B, containing Xn = i and N − i particles respectively]

Suppose that at time n ≥ 0, Xn = i particles are in A. One then chooses a particle


at random, and this particle is moved at time n + 1 from where it is to the other
compartment. Thus, the next state Xn+1 is either i − 1 (the displaced particle was
found in compartment A) with probability i/N , or i + 1 (it was found in B) with
probability (N − i)/N . This model pertains to Theorem 1.1.4. For all n ≥ 0,

Xn+1 = Xn + Zn+1 ,

where Zn+1 ∈ {−1, +1} and P (Zn+1 = −1 | Xn = i) = i/N . The nonzero entries of
the transition matrix are therefore

pi,i+1 = (N − i)/N ,   pi,i−1 = i/N .

[Figure: transition graph of the Ehrenfest chain on states 0, 1, . . . , N , with pi,i+1 = (N − i)/N and pi,i−1 = i/N ]
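The mechanism of Theorem 1.1.4 can be simulated directly, drawing Zn+1 from its conditional distribution given Xn (a sketch; N and the starting point are arbitrary):

```python
import random

def ehrenfest(N, x0, n_steps, seed=0):
    """Simulate the Ehrenfest chain: X_{n+1} = X_n + Z_{n+1},
    where P(Z_{n+1} = -1 | X_n = i) = i/N."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        z = -1 if rng.random() < x / N else +1
        x += z
        path.append(x)
    return path

path = ehrenfest(N=10, x0=5, n_steps=10_000)
```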

First-step analysis
Some functionals of homogeneous Markov chains, such as probabilities of absorption
by a closed set (A is called closed if ∑j∈A pij = 1 for all i ∈ A) and average times
before absorption, can be evaluated by a technique called first-step analysis.

Example 1.1.3: The gambler’s ruin, take 1. Two players A and B play
“heads or tails”, where heads occur with probability p ∈ (0, 1), and the successive
outcomes form an iid sequence. Calling Xn the fortune in dollars of player A at
time n, then Xn+1 = Xn + Zn+1 , where Zn+1 = +1 (resp., −1) with probability
p (resp., q := 1 − p), and {Zn }n≥1 is iid. In other words, A bets $1 on heads at
each toss, and B bets $1 on tails. The respective initial fortunes of A and B are
a and b (positive integers). The game ends when a player is ruined, and therefore
the process {Xn }n≥0 is a random walk as described in Example 1.1.1, except that
it is restricted to E = {0, . . . , a, a + 1, . . . , a + b = c}. The duration of the game is
T , the first time n at which Xn = 0 or c, and the probability of winning for A is
u(a) = P (XT = c | X0 = a).
Instead of computing u(a) alone, first-step analysis computes

u(i) = P (XT = c | X0 = i)

for all states i, 0 ≤ i ≤ c, and for this, it first generates a recurrence equation
for u(i) by breaking down event “A wins” according to what can happen after the
first step (the first toss) and using the rule of exclusive and exhaustive causes. If
X0 = i, 1 ≤ i ≤ c − 1, then X1 = i + 1 (resp., X1 = i − 1) with probability p (resp.,
q), and the probability of winning for A with updated initial fortune i + 1 (resp.,
i − 1) is u(i + 1) (resp., u(i − 1)). Therefore, for i, 1 ≤ i ≤ c − 1,

u(i) = pu(i + 1) + qu(i − 1),

with the boundary conditions u(0) = 0, u(c) = 1.



[Figure: the gambler’s ruin — a sample trajectory starting at a and absorbed at c = a + b (“A wins”) at time T = 11]

The characteristic equation associated with this linear recurrence equation is pr2 −
r + q = 0. It has two distinct roots, r1 = 1 and r2 = q/p, if p ≠ 1/2, and a double root,
r1 = 1, if p = 1/2. Therefore, the general solution is u(i) = λr1^i + µr2^i = λ + µ(q/p)^i
when p ≠ q, and u(i) = λr1^i + µir1^i = λ + µi when p = q = 1/2. Taking into account
the boundary conditions, one can determine the values of λ and µ. The result is,
for p ≠ q,

u(i) = (1 − (q/p)^i) / (1 − (q/p)^c) ,

and for p = q = 1/2,

u(i) = i/c .

In the case p = q = 1/2, the probability v(i) that B wins when the initial fortune of
B is c − i is obtained by replacing i by c − i in the expression for u(i): v(i) =
(c − i)/c = 1 − i/c. One checks that u(i) + v(i) = 1, which means in particular that
the probability that the game lasts forever is null. The reader is invited to check
that the same is true in the case p ≠ q.

First-step analysis can also be used to compute average times before absorption
(Exercise 1.5.5).

1.2 Communication and period


Communication and period are topological properties in the sense that they concern
only the naked transition graph (with only the arrows, without the labels).

Communication and irreducibility


Definition 1.2.1 State j is said to be accessible from state i if there exists M ≥ 0
such that pij (M ) > 0. States i and j are said to communicate if i is accessible
from j and j is accessible from i, and this is denoted by i ↔ j.

In particular, a state i is always accessible from itself, since pii (0) = 1 (P0 = I,
the identity).
For M ≥ 1, pij (M ) = ∑i1 ,...,iM −1 pii1 · · · piM −1 j , and therefore pij (M ) > 0 if and
only if there exists at least one path i, i1 , . . . , iM −1 , j from i to j such that

pii1 pi1 i2 · · · piM −1 j > 0,

or, equivalently, if there is a directed path from i to j in the transition graph G.


Clearly,

i↔i (reflexivity),
i↔j⇒j↔i (symmetry),
i ↔ j, j ↔ k ⇒ i ↔ k (transitivity).

Therefore, the communication relation (↔) is an equivalence relation, and it gen-


erates a partition of the state space E into disjoint equivalence classes called com-
munication classes.
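Communication classes can be computed mechanically from the transition graph, by intersecting forward and backward reachability. A sketch (the 4-state matrix below is invented for illustration):

```python
def reachable(graph, s):
    """Set of vertices accessible from s (s included, since p_ss(0) = 1)."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def communication_classes(P):
    n = len(P)
    graph = {i: [j for j in range(n) if P[i][j] > 0] for i in range(n)}
    fwd = [reachable(graph, i) for i in range(n)]
    classes, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        cls = {j for j in fwd[i] if i in fwd[j]}   # j accessible from i and i from j
        classes.append(sorted(cls))
        seen |= cls
    return classes

# States 0 and 1 communicate; 2 is closed; 3 leads into {0, 1}.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.9, 0.0, 0.1, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.3, 0.0, 0.0, 0.7]]
classes = communication_classes(P)   # [[0, 1], [2], [3]]
```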

Definition 1.2.2 A state i such that pii = 1 is called closed. More generally, a
set C of states such that for all i ∈ C, ∑j∈C pij = 1 is called closed.

Definition 1.2.3 If there exists only one communication class, then the chain, its
transition matrix, and its transition graph are said to be irreducible.

Period
Consider the random walk on Z (Example 1.1.1). Since 0 < p < 1, it is irreducible.
Observe that E = C0 + C1 , where C0 and C1 , the sets of even and odd integers
respectively, have the following property. If you start from i ∈ C0 (resp.,
C1 ), then in one step you can go only to a state j ∈ C1 (resp., C0 ). The chain

{Xn } passes alternately from one cyclic class to the other. In this sense, the chain
has a periodic behavior, corresponding to the period 2. More generally, for any
irreducible Markov chain, one can find a unique partition of E into d classes C0 ,
C1 , . . ., Cd−1 such that for all k, i ∈ Ck ,
∑j∈Ck+1 pij = 1,

where by convention Cd = C0 , and where d is maximal (that is, there is no other


such partition C0′ , C1′ , . . . , Cd′ ′ −1 with d′ > d). The proof follows directly from
Theorem 1.2.2 below.
The number d ≥ 1 is called the period of the chain (resp., of the transition ma-
trix, of the transition graph). The classes C0 , C1 , . . . , Cd−1 are called the cyclic
classes. The chain therefore moves from one class to the other at each transition,
and this cyclically.
We now give the formal definition of period. It is based on the notion of greatest
common divisor of a set of positive integers.

Definition 1.2.4 The period di of state i ∈ E is, by definition,

di = gcd{n ≥ 1 ; pii (n) > 0},

with the convention di = +∞ if there is no n ≥ 1 with pii (n) > 0. If di = 1, the


state i is called aperiodic.
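For a small finite chain, the period of a state can be computed by scanning pii (n) over the first few powers of P and taking the gcd of the n for which it is positive (a sketch; scanning up to a fixed horizon suffices for small examples but is not a general algorithm):

```python
from math import gcd
import numpy as np

def period(P, i, max_n=50):
    """gcd of {n >= 1 : p_ii(n) > 0}, scanned up to max_n steps."""
    P = np.asarray(P, dtype=float)
    d = 0                      # gcd(0, n) == n, so d accumulates correctly
    Pn = np.eye(len(P))
    for n in range(1, max_n + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            d = gcd(d, n)
    return d

# Deterministic 3-cycle: period 3; flip chain: period 2; self-loop: aperiodic.
assert period([[0, 1, 0], [0, 0, 1], [1, 0, 0]], 0) == 3
assert period([[0, 1], [1, 0]], 0) == 2
assert period([[0.5, 0.5], [0.5, 0.5]], 0) == 1
```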

Theorem 1.2.1 If states i and j communicate they have the same period.

Proof. As i and j communicate, there exist integers N and M such that pij (M ) >
0 and pji (N ) > 0. For any k ≥ 1,

pii (M + nk + N ) ≥ pij (M )(pjj (k))n pji (N )

(indeed, the path X0 = i, XM = j, XM +k = j, . . . , XM +nk = j, XM +nk+N = i is


just one way of going from i to i in M + nk + N steps). Therefore, for any k ≥ 1
such that pjj (k) > 0, we have pii (M + nk + N ) > 0 for all n ≥ 1. Therefore, di
divides M +nk +N for all n ≥ 1, and in particular, di divides k. We have therefore
shown that di divides all k such that pjj (k) > 0, and in particular, di divides dj .
By symmetry, dj divides di , and therefore, finally, di = dj . 

We can therefore speak of the period of a communication class or of an irreducible


chain.

The important result concerning periodicity is the following.


Theorem 1.2.2 Let P be an irreducible stochastic matrix with period d. Then for
all states i, j there exist m ≥ 0 and n0 ≥ 0 (m and n0 possibly depending on i, j)
such that
pij (m + nd) > 0, for all n ≥ n0 .

Proof. It suffices to prove the theorem for i = j. Indeed, there exists m such
that pij (m) > 0, because j is accessible from i, the chain being irreducible, and
therefore, if for some n0 ≥ 0 we have pjj (nd) > 0 for all n ≥ n0 , then pij (m+nd) ≥
pij (m)pjj (nd) > 0 for all n ≥ n0 .
The rest of the proof is an immediate consequence of a classical result of number
theory (Theorem A.1.1). Indeed, the gcd of the set A = {k ≥ 1; pjj (k) > 0} is
d, and A is closed under addition. The set A therefore contains all but a finite
number of the positive multiples of d. In other words, there exists n0 such that
n > n0 implies pjj (nd) > 0. 

[Figure: behaviour of a Markov chain with period 3, cycling through the classes C0 , C1 , C2 = Cd−1 ]

1.3 Stationarity and reversibility


The central notion of the stability theory of discrete-time hmc’s is that of station-
ary distribution.

Definition 1.3.1 A probability distribution π satisfying

πT = πT P (1.4)

is called a stationary distribution of the transition matrix P, or of the corresponding


hmc.

The global balance equation (1.4) says that for all states i,
π(i) = ∑j∈E π(j)pji .

Iteration of (1.4) gives π T = π T Pn for all n ≥ 0, and therefore, in view of (1.1), if


the initial distribution ν = π, then νn = π for all n ≥ 0. Thus, if a chain is started
with a stationary distribution, it keeps the same distribution forever. But there is
more, because then,

P (Xn = i0 , Xn+1 = i1 , . . . , Xn+k = ik ) = P (Xn = i0 )pi0 i1 . . . pik−1 ik


= π(i0 )pi0 i1 . . . pik−1 ik
does not depend on n. In this sense the chain is stationary. One also says that the
chain is in a stationary regime, or in equilibrium, or in steady state. In summary:
Theorem 1.3.1 A hmc whose initial distribution is a stationary distribution is
stationary.
The balance equation π T P = π T , together with the requirement that π be a
probability vector, that is, π T 1 = 1 (where 1 is a column vector with all its entries
equal to 1), constitute when E is finite, |E|+1 equations for |E| unknown variables.
One of the |E| equations in π T P = π T is superfluous given the constraint π T 1 = 1.
Indeed, summing up all equalities of π T P = π T yields the equality π T P1 = π T 1,
that is, π T 1 = 1.

Example 1.3.1: Two-state Markov chain. Take E = {1, 2} and define the
transition matrix

P = ( 1 − α     α
       β     1 − β ) ,
where α, β ∈ (0, 1). The global balance equations are
π(1) = π(1)(1 − α) + π(2)β , π(2) = π(1)α + π(2)(1 − β) .
These two equations are dependent and reduce to the single equation π(1)α =
π(2)β, to which must be added the constraint π(1) + π(2) = 1 expressing that π
is a probability vector. We obtain
π(1) = β/(α + β) ,   π(2) = α/(α + β) .
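A quick numerical confirmation of this formula (α and β below are arbitrary values in (0, 1)):

```python
import numpy as np

alpha, beta = 0.3, 0.7
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

pi = np.array([beta, alpha]) / (alpha + beta)   # the formula above
assert np.allclose(pi @ P, pi)                  # global balance: pi^T P = pi^T
assert abs(pi.sum() - 1.0) < 1e-12              # pi is a probability vector
```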

Example 1.3.2: The Ehrenfest urn, take 2. The global balance equations
are, for i ∈ [1, N − 1],
 
π(i) = π(i − 1) (1 − (i − 1)/N ) + π(i + 1) (i + 1)/N

and, for the boundary states,


π(0) = π(1)/N ,   π(N ) = π(N − 1)/N .
Leaving π(0) undetermined, one can solve the balance equations for i = 0, 1, . . . , N
successively, to obtain π(i) = π(0) C(N, i), where C(N, i) denotes the binomial
coefficient. The value of π(0) is then determined by writing that π is a probability
vector: 1 = ∑_{i=0}^{N} π(i) = π(0) ∑_{i=0}^{N} C(N, i) = π(0) 2^N . This gives for π
the binomial distribution of size N and parameter 1/2:

π(i) = C(N, i) / 2^N .

This is the distribution one would obtain by placing each particle independently in
the compartments, with probability 1/2 for each compartment.
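The claim that the binomial distribution is stationary is easy to verify numerically, by building the Ehrenfest transition matrix and checking the global balance equation π T = π T P (a sketch with N = 10):

```python
import numpy as np
from math import comb

N = 10
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        P[i, i + 1] = (N - i) / N   # a particle in B moves to A
    if i > 0:
        P[i, i - 1] = i / N         # a particle in A moves to B

# Binomial(N, 1/2) candidate stationary distribution
pi = np.array([comb(N, i) for i in range(N + 1)]) / 2 ** N
assert np.allclose(pi @ P, pi)      # pi is stationary
```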

There may be several stationary distributions. Take the identity as transition matrix.
Then any probability distribution on the state space is a stationary distribution.
Also, there may well not exist any stationary distribution. See Exercise 2.5.5.

Reversible chains
The notions of time-reversal and time-reversibility are very productive, as we shall
see in several occasions in the sequel.
Let {Xn }n≥0 be an hmc with transition matrix P and admitting a stationary
distribution π > 0 (meaning π(i) > 0 for all states i). Define the matrix Q,
indexed by E, by
π(i)qij = π(j)pji . (1.5)
This is a stochastic matrix since
∑j∈E qij = ∑j∈E (π(j)/π(i)) pji = (1/π(i)) ∑j∈E π(j)pji = π(i)/π(i) = 1,

where the third equality uses the global balance equations. Its interpretation is the
following: Suppose that the initial distribution of the chain is π, in which case for
all n ≥ 0, all i ∈ E, P (Xn = i) = π(i). Then, from Bayes’s retrodiction formula,
P (Xn+1 = i | Xn = j)P (Xn = j)
P (Xn = j | Xn+1 = i) = ,
P (Xn+1 = i)
that is, in view of (1.5),

P (Xn = j | Xn+1 = i) = qji .



We see that Q is the transition matrix of the initial chain when time is reversed.
The following is a very simple observation that will be promoted to the rank of a
theorem in view of its usefulness and also for the sake of easy reference.
Theorem 1.3.2 Let P be a stochastic matrix indexed by a countable set E, and
let π be a probability distribution on E. Define the matrix Q indexed by E by (1.5).
If Q is a stochastic matrix, then π is a stationary distribution of P.

Proof. For fixed i ∈ E, sum equalities (1.5) with respect to j ∈ E to obtain


∑j∈E π(i)qij = ∑j∈E π(j)pji .

This is the global balance equation, since the left-hand side is equal to π(i) ∑j∈E qij =
π(i). 

Definition 1.3.2 One calls reversible a stationary Markov chain with initial dis-
tribution π (a stationary distribution) if for all i, j ∈ E, we have the so-called
detailed balance equations
π(i)pij = π(j)pji . (1.6)
We then say: the pair (P, π) is reversible.

In this case, qij = pij , and therefore the chain and the time-reversed chain are
statistically the same, since the distribution of a homogeneous Markov chain is
entirely determined by its initial distribution and its transition matrix.
The following is an immediate corollary of Theorem 1.3.2.
Theorem 1.3.3 Let P be a transition matrix on the countable state space E, and
let π be some probability distribution on E. If for all i, j ∈ E, the detailed balance
equations (1.6) are satisfied, then π is a stationary distribution of P.

Example 1.3.3: The Ehrenfest urn, take 3. The verification of the detailed
balance equations π(i)pi,i+1 = π(i + 1)pi+1,i is immediate.

1.4 Strong Markov property


The Markov property, that is, the independence of past and future given the
present state, extends to the situation where the present time is a stopping time,
a notion which we now introduce.

Stopping times
Let {Xn }n≥0 be a stochastic process with values in the denumerable set E. For an
event A, the notation A ∈ X0n means that there exists a function ϕ : E^{n+1} → {0, 1}
such that
1A (ω) = ϕ(X0 (ω), . . . , Xn (ω)) .
In other terms, this event is expressible in terms of X0 (ω), . . . , Xn (ω). Let now τ
be a random variable with values in ℕ ∪ {+∞}. It is called an X0n -stopping time if
for all m ∈ ℕ, {τ = m} ∈ X0m . In other words, it is a non-anticipative random time
with respect to {Xn }n≥0 , since in order to check whether τ = m, one needs only
observe the process up to time m and not beyond. It is immediate to check that if
τ is an X0n -stopping time, then so is τ + n for all n ≥ 1.

Example 1.4.1: Return time. Let {Xn }n≥0 be an hmc with state space E.
Define for i ∈ E the return time to i by

Ti := inf{n ≥ 1 ; Xn = i}

using the convention inf ∅ = ∞. This is an X0n -stopping time since for all m ∈ ℕ,

{Ti = m} = {X1 ≠ i, X2 ≠ i, . . . , Xm−1 ≠ i, Xm = i} .

Note that Ti ≥ 1. It is a “return” time, not to be confused with the closely


related “hitting” time of i, defined as Si := inf{n ≥ 0 ; Xn = i}, which is also an
X0n -stopping time, equal to Ti if and only if X0 ≠ i.
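The distinction between the hitting time Si and the return time Ti can be illustrated on a concrete trajectory (a sketch; the path below is made up):

```python
def hitting_and_return_times(path, i):
    """S_i = inf{n >= 0 : X_n = i}, T_i = inf{n >= 1 : X_n = i};
    None stands in for the convention inf(empty set) = infinity."""
    S = next((n for n, x in enumerate(path) if x == i), None)
    T = next((n for n, x in enumerate(path) if x == i and n >= 1), None)
    return S, T

path = [2, 0, 1, 0, 0, 3]   # a made-up trajectory starting at X_0 = 2

# X_0 != 0, so S_0 = T_0; X_0 = 2 and 2 is never revisited, so S_2 < T_2 = inf.
assert hitting_and_return_times(path, 0) == (1, 1)
assert hitting_and_return_times(path, 2) == (0, None)
```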

Example 1.4.2: Successive return times. This continues the previous ex-
ample. Let us fix a state, conventionally labeled 0, and let T0 be the return time
to 0. We define the successive return times to 0, τk , k ≥ 1 by τ1 = T0 and for
k ≥ 1,
τk+1 := inf{n ≥ τk + 1 ; Xn = 0}
with the above convention that inf ∅ = ∞. In particular, if τk = ∞ for some k,
then τk+ℓ = ∞ for all ℓ ≥ 1. The identity
{τk = m} = { ∑_{n=1}^{m−1} 1{Xn =0} = k − 1 , Xm = 0 }

for m ≥ 1 shows that τk is an X0n -stopping time.



Let {Xn }n≥0 be a stochastic process with values in the countable set E and let
τ be a random time taking its values in ℕ̄ := ℕ ∪ {+∞}. In order to define Xτ
when τ = ∞, one must decide how to define X∞ . This is done by taking some
arbitrary element ∆ not in E, and setting

X∞ = ∆.

By definition, the “process after τ ” is the stochastic process

{Sτ Xn }n≥0 := {Xn+τ }n≥0 .

The “process before τ ,” or the “process stopped at τ ,” is the process

{Xnτ }n≥0 := {Xn∧τ }n≥0 ,

which freezes at time τ at the value Xτ .


Theorem 1.4.1 Let {Xn }n≥0 be an hmc with state space E and transition matrix
P. Let τ be a X0n -stopping time. Then for any state i ∈ E,
(α) Given that Xτ = i, the process after τ and the process before τ are independent.
(β) Given that Xτ = i, the process after τ is an hmc with transition matrix P.

Proof. (α) We have to show that for all times k ≥ 1, n ≥ 0, and all states
i0 , . . . , in , i, j1 , . . . , jk ,

P (Xτ +1 = j1 , . . . , Xτ +k = jk | Xτ = i, Xτ ∧0 = i0 , . . . , Xτ ∧n = in )
= P (Xτ +1 = j1 , . . . , Xτ +k = jk | Xτ = i).

We shall prove a simplified version of the above equality, namely

P (Xτ +k = j | Xτ = i, Xτ ∧n = in ) = P (Xτ +k = j | Xτ = i) . (⋆)

The general case is obtained by the same arguments. The left-hand side of (⋆)
equals

P (Xτ +k = j, Xτ = i, Xτ ∧n = in ) / P (Xτ = i, Xτ ∧n = in ) .
The numerator of the above expression can be developed as

∑_{r∈ℕ} P (τ = r, Xr+k = j, Xr = i, Xr∧n = in ) . (⋆⋆)

(The sum is over ℕ because Xτ = i ≠ ∆ implies that τ < ∞.) But P (τ =
r, Xr+k = j, Xr = i, Xr∧n = in ) = P (Xr+k = j | Xr = i, Xr∧n = in , τ = r)
P (τ = r, Xr∧n = in , Xr = i), and since r ∧ n ≤ r and {τ = r} ∈ X0r , the
event B := {Xr∧n = in , τ = r} is in X0r . Therefore, by the Markov property,
P (Xr+k = j | Xr = i, Xr∧n = in , τ = r) = P (Xr+k = j | Xr = i) = pij (k). Finally,
expression (⋆⋆) reduces to

∑_{r∈ℕ} pij (k)P (τ = r, Xr∧n = in , Xr = i) = pij (k)P (Xτ = i, Xτ ∧n = in ).

Therefore, the left-hand side of (⋆) is just pij (k). Similar computations show that
the right-hand side of (⋆) is also pij (k), so that (α) is proven.
(β) We must show that for all states i, j, k, in−1 , . . . , i1 ,

P (Xτ +n+1 = k | Xτ +n = j, Xτ +n−1 = in−1 , . . . , Xτ = i)


= P (Xτ +n+1 = k | Xτ +n = j) = pjk .

But the first equality follows from the fact proven in (α) that for the stopping time
τ ′ = τ + n, the processes before and after τ ′ are independent given Xτ ′ = j. The
second equality is obtained by the same calculations as in the proof of (α). 

The cycle independence property

Consider a Markov chain with a state conventionally denoted by 0 such that


P0 (T0 < ∞) = 1. In view of the strong Markov property, the chain starting
from state 0 will return infinitely often to this state. Let τ1 = T0 , τ2 , . . . be the
successive return times to 0, and set τ0 ≡ 0.
By the strong Markov property, for any k ≥ 1, the process after τk is independent
of the process before τk (observe that condition Xτk = 0 is always satisfied), and
the process after τk is a Markov chain with the same transition matrix as the
original chain, and with initial state 0, by construction. Therefore, the successive
times of visit to 0, the pieces of trajectory

{Xτk , Xτk +1 , . . . , Xτk+1 −1 }, k ≥ 0,

are independent and identically distributed. Such pieces are called the regenerative
cycles of the chain between visits to state 0. Each random time τk is a regeneration
time, in the sense that {Xτk +n }n≥0 is independent of the past X0 , . . . , Xτk −1 and
has the same distribution as {Xn }n≥0 . In particular, the sequence {τk − τk−1 }k≥1
is iid.

1.5 Exercises
Exercise 1.5.1. A counterexample.
The Markov property does not imply that the past and the future are independent
given any information concerning the present. Find a simple example of an hmc
{Xn }n≥0 with state space E = {1, 2, 3, 4, 5, 6} such that

P (X2 = 6 | X1 ∈ {3, 4}, X0 = 2) ≠ P (X2 = 6 | X1 ∈ {3, 4}).

Exercise 1.5.2. Past, present, future.


For an hmc {Xn }n≥0 with state space E, prove that for all n ∈ N, and all states
i0 , i1 , . . . , in−1 , i, j1 , j2 , . . . , jk ∈ E,

P (Xn+1 = j1 , . . . , Xn+k = jk | Xn = i, Xn−1 = in−1 , . . . , X0 = i0 )

= P (Xn+1 = j1 , . . . , Xn+k = jk | Xn = i).

Exercise 1.5.3.
Let {Xn }n≥0 be a hmc with state space E and transition matrix P. Show that for
all n ≥ 1, all k ≥ 2, Xn is conditionally independent of X0 , . . . , Xn−2 , Xn+2 , . . . , Xn+k
given Xn−1 , Xn+1 and compute the conditional distribution of Xn given Xn−1 , Xn+1 .

Exercise 1.5.4. Streetgangs.


Three characters, A, B, and C, armed with guns, suddenly meet at the corner of a
Washington D.C. street, whereupon they naturally start shooting at one another.
Each street-gang kid shoots every tenth second, as long as he is still alive. The
probability of a hit for A, B, and C are α, β, and γ respectively. A is the most
hated, and therefore, as long as he is alive, B and C ignore each other and shoot
at A. For historical reasons not developed here, A cannot stand B, and therefore
he shoots only at B while the latter is still alive. Lucky C is shot at if and only if
he is in the presence of A alone or B alone. What are the survival probabilities of
A, B, and C, respectively?

Exercise 1.5.5. The gambler’s ruin.


(This exercise continues Example 1.1.3.) Compute the average duration of the
game when p = 1/2.

Exercise 1.5.6. Records.


Let {Zn }n≥1 be an iid sequence of geometric random variables: For k ≥ 0, P (Zn =
k) = (1 − p)^k p, where p ∈ (0, 1). Let Xn = max(Z1 , . . . , Zn ) be the record value

at time n, and suppose X0 is an N-valued random variable independent of the


sequence {Zn }n≥1 . Show that {Xn }n≥0 is an hmc and give its transition matrix.

Exercise 1.5.7. Aggregation of states.


Let {Xn}n≥0 be an hmc with state space E and transition matrix P, and let (Ak, k ≥ 1) be a countable partition of E. Define the process {X̂n}n≥0 with state space Ê = {1̂, 2̂, . . .} by X̂n = k̂ if and only if Xn ∈ Ak. Show that if \sum_{j∈A_ℓ} p_{ij} is independent of i ∈ Ak for all k, ℓ, then {X̂n}n≥0 is an hmc with transition probabilities
p̂_{k̂ℓ̂} = \sum_{j∈A_ℓ} p_{ij}   (any i ∈ Ak).
Chapter 2

Recurrence

2.1 The potential matrix criterion


The distribution, given X0 = j, of N_i = \sum_{n≥1} 1_{\{X_n=i\}}, the number of visits to state i strictly after time 0, is
P_j(N_i = r) = f_{ji} f_{ii}^{r−1} (1 − f_{ii})   (r ≥ 1),
P_j(N_i = 0) = 1 − f_{ji},
where f_{ji} = P_j(T_i < ∞) and T_i is the return time to i.

Proof. We first go from j to i (probability f_{ji}) and then, r − 1 times in succession, from i to i (each time with probability f_{ii}), and the last time, that is, the (r + 1)-st time, we leave i never to return to it (probability 1 − f_{ii}). By the cycle independence property, all these “cycles” are independent, so that the successive probabilities multiply. □

The distribution of N_i given X0 = j and given N_i ≥ 1 is geometric. This has two main consequences. Firstly, P_i(T_i < ∞) = 1 ⟺ P_i(N_i = ∞) = 1. In words: if, starting from i, the chain almost surely returns to i, then it visits i infinitely often. Secondly,
E_i[N_i] = \sum_{r=1}^{∞} r P_i(N_i = r) = \sum_{r=1}^{∞} r f_{ii}^{r} (1 − f_{ii}) = \frac{f_{ii}}{1 − f_{ii}}.


In particular, Pi (Ti < ∞) < 1 ⇐⇒ Ei [Ni ] < ∞.


We collect these results for future reference. For any state i ∈ E,

Pi (Ti < ∞) = 1 ⇐⇒ Pi (Ni = ∞) = 1

and
Pi (Ti < ∞) < 1 ⇐⇒ Pi (Ni = ∞) = 0 ⇐⇒ Ei [Ni ] < ∞. (2.1)
In particular, the event {Ni = ∞} has Pi -probability 0 or 1.
The potential matrix G associated with the transition matrix P is defined by
G = \sum_{n≥0} P^n.

Its general term
g_{ij} = \sum_{n=0}^{∞} p_{ij}(n) = \sum_{n=0}^{∞} P_i(X_n = j) = \sum_{n=0}^{∞} E_i[1_{\{X_n=j\}}] = E_i\Big[\sum_{n=0}^{∞} 1_{\{X_n=j\}}\Big]
is the average number of visits to state j, given that the chain starts from state i.
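When the sum is restricted to a finite set of transient states, the entries of G can be computed explicitly: if Q is the (substochastic) restriction of P to those states and its spectral radius is less than 1, then \sum_{n≥0} Q^n = (I − Q)^{−1}. This is standard linear algebra, not part of the text; the sketch below (our own toy example: states 1 and 2 sandwiched between two absorbing barriers, as in the gambler's ruin) checks the partial sums against the closed form.

```python
# Transient block of a gambler's-ruin-type chain on {0,1,2,3} with
# absorbing barriers 0 and 3: from 1 go to 2 w.p. p (else absorbed),
# from 2 go to 1 w.p. q (else absorbed).
p, q = 0.5, 0.4
Q = [[0.0, p], [q, 0.0]]

# Partial sums of G = sum_n Q^n, via the recursion G_{m+1} = I + Q G_m.
G = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(200):
    G = [[(1.0 if i == j else 0.0) + sum(Q[i][k] * G[k][j] for k in range(2))
          for j in range(2)] for i in range(2)]

# Closed form (I - Q)^{-1} = 1/(1 - p*q) * [[1, p], [q, 1]].
d = 1.0 - p * q
expected = [[1.0 / d, p / d], [q / d, 1.0 / d]]
print(G)
print(expected)  # the partial sums converge to this matrix
```

The diagonal entry g_{11} = 1/(1 − pq) is the mean number of visits to state 1 starting from 1, finite precisely because the state is transient.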
Recall that Ti denotes the return time to state i.

Definition 2.1.1 State i ∈ E is called recurrent if

Pi (Ti < ∞) = 1,

and otherwise it is called transient. A recurrent state i ∈ E such that

Ei [Ti ] < ∞,

is called positive recurrent, and otherwise it is called null recurrent.

Although the next criterion of recurrence is of theoretical rather than practical


interest, it can be helpful in a few situations, for instance in the study of recurrence
of random walks (see the examples below).
Theorem 2.1.1 State i ∈ E is recurrent if and only if
\sum_{n=0}^{∞} p_{ii}(n) = ∞.

Proof. This merely rephrases Eqn. (2.1), since E_i[N_i] = \sum_{n=1}^{∞} p_{ii}(n). □



Example 2.1.1: 1-D random walk. The state space of this Markov chain is E := Z and the non-null terms of its transition matrix are p_{i,i+1} = p, p_{i,i−1} = 1 − p, where p ∈ (0, 1). Since this chain is irreducible, it suffices to elucidate the nature (recurrent or transient) of any one of its states, say, 0. We have p_{00}(2n + 1) = 0 and
p_{00}(2n) = \frac{(2n)!}{n! \, n!} p^n (1 − p)^n.
By Stirling's equivalence formula n! ∼ (n/e)^n \sqrt{2πn}, the above quantity is equivalent to
\frac{[4p(1 − p)]^n}{\sqrt{πn}}   (⋆)
and the nature of the series \sum_{n=0}^{∞} p_{00}(n) (convergent or divergent) is that of the series with general term (⋆). If p ≠ 1/2, in which case 4p(1 − p) < 1, the latter series converges, and if p = 1/2, in which case 4p(1 − p) = 1, it diverges. In summary, the states of the 1-D random walk are transient if p ≠ 1/2, recurrent if p = 1/2.
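This dichotomy is easy to observe numerically (an illustration of ours, not part of the text). For p ≠ 1/2 the series even has a classical closed form, \sum_{n≥0} \binom{2n}{n}[p(1−p)]^n = 1/\sqrt{1 − 4p(1−p)} = 1/|2p − 1|, while for p = 1/2 the partial sums grow like 2\sqrt{n/π}:

```python
def p00_terms(p, n_max):
    """Yield p00(2n) = C(2n, n) * (p*(1-p))**n for n = 0..n_max,
    computed iteratively to avoid huge factorials."""
    x = p * (1 - p)
    t = 1.0  # n = 0 term
    for n in range(n_max + 1):
        yield t
        t *= 2 * (2 * n + 1) / (n + 1) * x  # C(2n+2,n+1)/C(2n,n) = 2(2n+1)/(n+1)

# p = 0.6: the series converges, to 1/|2p - 1| = 5.
s_biased = sum(p00_terms(0.6, 2000))
print(s_biased)

# p = 1/2: the partial sums keep growing (like 2*sqrt(n/pi)).
s_sym = sum(p00_terms(0.5, 2000))
print(s_sym)
```

The first sum stabilizes near 5 (transience: finitely many expected visits), while the second keeps increasing with the truncation level (recurrence).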

Example 2.1.2: 3-D random walk. The state space of this hmc is E = Z³. Denoting by e₁, e₂, and e₃ the canonical basis vectors of R³ (respectively (1, 0, 0), (0, 1, 0), and (0, 0, 1)), the non-null terms of the transition matrix of the 3-D symmetric random walk are given by
p_{x, x ± e_i} = \frac{1}{6}.
We elucidate the nature of state, say, 0 = (0, 0, 0). Clearly, p_{00}(2n + 1) = 0 for all n ≥ 0, and (exercise)
p_{00}(2n) = \sum_{0 ≤ i+j ≤ n} \frac{(2n)!}{(i! \, j! \, (n−i−j)!)^2} \left(\frac{1}{6}\right)^{2n}.
This can be rewritten as
p_{00}(2n) = \sum_{0 ≤ i+j ≤ n} \frac{1}{2^{2n}} \binom{2n}{n} \left(\frac{n!}{i! \, j! \, (n−i−j)!}\right)^2 \left(\frac{1}{3}\right)^{2n}.
Using the trinomial formula
\sum_{0 ≤ i+j ≤ n} \frac{n!}{i! \, j! \, (n−i−j)!} \left(\frac{1}{3}\right)^n = 1,
we obtain the bound
p_{00}(2n) ≤ K_n \frac{1}{2^{2n}} \binom{2n}{n} \left(\frac{1}{3}\right)^n,
where
K_n = \max_{0 ≤ i+j ≤ n} \frac{n!}{i! \, j! \, (n−i−j)!}.

For large values of n, K_n is bounded as follows. Let i₀ and j₀ be the values of i, j that maximize n!/(i! j! (n − i − j)!) in the domain of interest 0 ≤ i + j ≤ n. From the definition of i₀ and j₀, the quantities
\frac{n!}{(i_0−1)! \, j_0! \, (n−i_0−j_0+1)!}, \quad \frac{n!}{(i_0+1)! \, j_0! \, (n−i_0−j_0−1)!},
\frac{n!}{i_0! \, (j_0−1)! \, (n−i_0−j_0+1)!}, \quad \frac{n!}{i_0! \, (j_0+1)! \, (n−i_0−j_0−1)!}
are bounded by
\frac{n!}{i_0! \, j_0! \, (n−i_0−j_0)!}.
The corresponding inequalities reduce to
n − i_0 − 1 ≤ 2j_0 ≤ n − i_0 + 1 \quad and \quad n − j_0 − 1 ≤ 2i_0 ≤ n − j_0 + 1,

and this shows that for large n, i_0 ∼ n/3 and j_0 ∼ n/3. Therefore, for large n,
p_{00}(2n) ∼ \frac{1}{2^{2n}} \binom{2n}{n} \left(\frac{1}{3}\right)^n \frac{n!}{((n/3)!)^3}.
By Stirling's equivalence formula, the right-hand side of the latter equivalence is in turn equivalent to
\frac{3\sqrt{3}}{2(πn)^{3/2}},
the general term of a convergent series. State 0 is therefore transient.
One might wonder at this point about the symmetric random walk on Z², which moves at each step northward, southward, eastward and westward equiprobably. Exercise ?? asks you to show that it is null recurrent. Exercise ?? asks you to prove that the symmetric random walks on Z^p, p ≥ 4, are transient.
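The dimension dependence can be glimpsed by simulation (again our illustration; the horizon, trial count, and seed are arbitrary): estimate, for the symmetric walks on Z, Z² and Z³, the fraction of paths started at the origin that return to it within a fixed horizon.

```python
import random

def return_within(dim, horizon, trials, seed=0):
    """Fraction of simple symmetric random walks on Z^dim (started at 0)
    that revisit the origin within `horizon` steps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = [0] * dim
        for _ in range(horizon):
            axis = rng.randrange(dim)          # pick a coordinate...
            pos[axis] += rng.choice((-1, 1))   # ...and move +-1 along it
            if all(c == 0 for c in pos):
                hits += 1
                break
    return hits / trials

est = {d: return_within(d, horizon=500, trials=1000) for d in (1, 2, 3)}
print(est)
```

The estimates decrease with the dimension, consistent with recurrence in dimensions 1 and 2 (return probability 1, approached slowly in dimension 2) and transience in dimension 3, where the return probability is a constant strictly less than 1 (Pólya's constant ≈ 0.34, not derived here).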

A theoretical application of the potential matrix criterion is to the proof that


recurrence is a (communication) class property.
Theorem 2.1.2 If i and j communicate, they are either both recurrent or both
transient.

Proof. By definition, i and j communicate if and only if there exist integers M and N such that p_{ij}(M) > 0 and p_{ji}(N) > 0. Going from i to j in M steps, then from j to j in n steps, then from j to i in N steps, is just one way of going from i back to i in M + n + N steps. Therefore, p_{ii}(M + n + N) ≥ p_{ij}(M) p_{jj}(n) p_{ji}(N). Similarly, p_{jj}(N + n + M) ≥ p_{ji}(N) p_{ii}(n) p_{ij}(M). Therefore, with α := p_{ij}(M) p_{ji}(N) (a strictly positive quantity), we have p_{ii}(M + N + n) ≥ α p_{jj}(n) and p_{jj}(M + N + n) ≥ α p_{ii}(n). This implies that the series \sum_{n=0}^{∞} p_{ii}(n) and \sum_{n=0}^{∞} p_{jj}(n) either both converge or both diverge. The potential matrix criterion concludes the proof. □

2.2 Stationary distribution criterion


Invariant measure
The notion of invariant measure plays an important technical role in the recurrence
theory of Markov chains. It extends the notion of stationary distribution.

Definition 2.2.1 A non-trivial (that is, non-null) vector x (indexed by E) of non-


negative real numbers (notation: 0 ≤ x < ∞) is called an invariant measure of the
stochastic matrix P (indexed by E) if

x^T = x^T P.   (2.2)

Theorem 2.2.1 Let P be the transition matrix of an irreducible recurrent hmc


{Xn}n≥0. Let 0 be an arbitrary state and let T0 be the return time to 0. Define for all i ∈ E
x_i = E_0\Big[\sum_{n=1}^{T_0} 1_{\{X_n=i\}}\Big].   (2.3)
(For i ≠ 0, x_i is the expected number of visits to state i before returning to 0.) Then, 0 < x < ∞ and x is an invariant measure of P.

Proof. We make three preliminary observations. First, it will be convenient to rewrite (2.3) as
x_i = E_0\Big[\sum_{n≥1} 1_{\{X_n=i\}} 1_{\{n≤T_0\}}\Big].
Next, when 1 ≤ n ≤ T_0, X_n = 0 if and only if n = T_0. Therefore,
x_0 = 1.
Also, \sum_{i∈E} \sum_{n≥1} 1_{\{X_n=i\}} 1_{\{n≤T_0\}} = \sum_{n≥1} \big(\sum_{i∈E} 1_{\{X_n=i\}}\big) 1_{\{n≤T_0\}} = \sum_{n≥1} 1_{\{n≤T_0\}} = T_0, and therefore
\sum_{i∈E} x_i = E_0[T_0].   (2.4)

We introduce the quantity
{}_0p_{0i}(n) := E_0[1_{\{X_n=i\}} 1_{\{n≤T_0\}}] = P_0(X_1 ≠ 0, . . . , X_{n−1} ≠ 0, X_n = i).
This is the probability, starting from state 0, of visiting i at time n before returning to 0. From the definition of x,
x_i = \sum_{n≥1} {}_0p_{0i}(n).   (†)

We first prove (2.2). Observe that {}_0p_{0i}(1) = p_{0i}, and, by first-step analysis, for all n ≥ 2, {}_0p_{0i}(n) = \sum_{j≠0} {}_0p_{0j}(n − 1) p_{ji}. Summing up all the above equalities, and taking (†) into account, we obtain
x_i = p_{0i} + \sum_{j≠0} x_j p_{ji},
that is, (2.2), since x_0 = 1.


Next we show that xi > 0 for all i ∈ E. Indeed, iterating (2.2), we find xT = xT Pn ,
that is, since x0 = 1,
X X
xi = xj pji (n) = p0i (n) + xj pji (n).
j∈E j6=0

If xi were null for some i ∈ E, i 6= 0, the latter equality would imply that p0i (n) =
0 for all n ≥ 0, which means that 0 and i do not communicate, in contradiction to
the irreducibility assumption.
It remains to show that x_i < ∞ for all i ∈ E. As before, we find that
1 = x_0 = \sum_{j∈E} x_j p_{j0}(n)
for all n ≥ 1, and therefore if x_i = ∞ for some i, necessarily p_{i0}(n) = 0 for all n ≥ 1, and this also contradicts irreducibility. □

Theorem 2.2.2 The invariant measure of an irreducible recurrent hmc is unique


up to a multiplicative factor.

Proof. In the proof of Theorem 2.2.1, we showed that for an invariant measure y of an irreducible chain, y_i > 0 for all i ∈ E, and therefore one can define, for all i, j ∈ E, the matrix Q by
q_{ji} = \frac{y_i}{y_j} p_{ij}.   (⋆)
It is a transition matrix, since \sum_{i∈E} q_{ji} = \frac{1}{y_j} \sum_{i∈E} y_i p_{ij} = \frac{y_j}{y_j} = 1. The general term of Q^n is
q_{ji}(n) = \frac{y_i}{y_j} p_{ij}(n).   (⋆⋆)
Indeed, supposing (⋆⋆) true for n,
q_{ji}(n + 1) = \sum_{k∈E} q_{jk} q_{ki}(n) = \sum_{k∈E} \frac{y_k}{y_j} p_{kj} \frac{y_i}{y_k} p_{ik}(n) = \frac{y_i}{y_j} \sum_{k∈E} p_{ik}(n) p_{kj} = \frac{y_i}{y_j} p_{ij}(n + 1),
and (⋆⋆) follows by induction.


Clearly, Q is irreducible, since P is irreducible (just observe that q_{ji}(n) > 0 if and only if p_{ij}(n) > 0, in view of (⋆⋆)). Also, p_{ii}(n) = q_{ii}(n), and therefore \sum_{n≥0} q_{ii}(n) = \sum_{n≥0} p_{ii}(n), so that Q is recurrent by the potential matrix criterion. Call g_{ji}(n) the probability, relative to the chain governed by the transition matrix Q, of returning to state i for the first time at step n when starting from j. First-step analysis gives
g_{i0}(n + 1) = \sum_{j≠0} q_{ij} g_{j0}(n),
that is, using (⋆),
y_i \, g_{i0}(n + 1) = \sum_{j≠0} (y_j \, g_{j0}(n)) \, p_{ji}.
Recall that {}_0p_{0i}(n + 1) = \sum_{j≠0} {}_0p_{0j}(n) p_{ji}, or, equivalently,
y_0 \, {}_0p_{0i}(n + 1) = \sum_{j≠0} (y_0 \, {}_0p_{0j}(n)) \, p_{ji}.

We therefore see that the sequences {y_0 \, {}_0p_{0i}(n)} and {y_i \, g_{i0}(n)} satisfy the same recurrence equation. Their first terms (n = 1), respectively y_0 \, {}_0p_{0i}(1) = y_0 p_{0i} and y_i \, g_{i0}(1) = y_i q_{i0}, are equal in view of (⋆). Therefore, for all n ≥ 1,
{}_0p_{0i}(n) = \frac{y_i}{y_0} \, g_{i0}(n).
Summing up with respect to n ≥ 1 and using \sum_{n≥1} g_{i0}(n) = 1 (Q is recurrent), we obtain x_i = \frac{y_i}{y_0}. □

Equality (2.4) and the definition of positive recurrence give the following.

Theorem 2.2.3 An irreducible recurrent hmc is positive recurrent if and only if its invariant measures x satisfy
\sum_{i∈E} x_i < ∞.

Stationary distribution criterion of positive recurrence

An hmc may well be irreducible and possess an invariant measure, and yet not be recurrent. The simplest example is the 1-D non-symmetric random walk, which was shown to be transient and yet admits x_i ≡ 1 as an invariant measure. It turns out, however, that the existence of a stationary probability distribution is necessary and sufficient for an irreducible chain (not a priori assumed recurrent) to be positive recurrent.
Theorem 2.2.4 An irreducible hmc is positive recurrent if and only if there exists
a stationary distribution. Moreover, the stationary distribution π is, when it exists,
unique, and π > 0.

Proof. The direct part follows from Theorems 2.2.1 and 2.2.3. For the converse part, assume the existence of a stationary distribution π. Iterating π^T = π^T P, we obtain π^T = π^T P^n, that is, for all i ∈ E, π(i) = \sum_{j∈E} π(j) p_{ji}(n). If the chain were transient, then, for all states i, j,
\lim_{n↑∞} p_{ji}(n) = 0.

The following is a formal proof¹:
\sum_{n≥1} p_{ji}(n) = \sum_{n≥1} \sum_{k=1}^{n} P_j(T_i = k) \, p_{ii}(n − k)
 = \sum_{k≥1} P_j(T_i = k) \sum_{n≥k} p_{ii}(n − k)
 ≤ \Big(\sum_{k≥1} P_j(T_i = k)\Big) \Big(\sum_{n≥0} p_{ii}(n)\Big)
 = P_j(T_i < ∞) \sum_{n≥0} p_{ii}(n) ≤ \sum_{n≥0} p_{ii}(n) < ∞.

In particular, lim_n p_{ji}(n) = 0. Since p_{ji}(n) is bounded uniformly in j and n by 1, by dominated convergence (Theorem A.2.1):
π(i) = \lim_{n↑∞} \sum_{j∈E} π(j) p_{ji}(n) = \sum_{j∈E} π(j) \Big(\lim_{n↑∞} p_{ji}(n)\Big) = 0.
This contradicts the assumption that π is a stationary distribution (\sum_{i∈E} π(i) = 1). The chain must therefore be recurrent, and by Theorem 2.2.3, it is positive recurrent.
The stationary distribution π of an irreducible positive recurrent chain is unique (use Theorem 2.2.2 and the fact that there is no choice for the multiplicative factor but 1). Also recall that π(i) > 0 for all i ∈ E (see Theorem 2.2.1). □
Theorem 2.2.5 Let π be the unique stationary distribution of an irreducible positive recurrent hmc, and let T_i be the return time to state i. Then
π(i) E_i[T_i] = 1.   (2.5)

Proof. This equality is a direct consequence of expression (2.3) for the invariant measure. Indeed, π is obtained by normalization of x: for all i ∈ E,
π(i) = \frac{x_i}{\sum_{j∈E} x_j},
and in particular, for i = 0, recalling that x_0 = 1 and using (2.4),
π(0) = \frac{1}{E_0[T_0]}.
¹ Rather awkward, but using only the elementary tools available.

Since state 0 does not play a special role in the analysis, (2.5) is true for all i ∈ E.
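Relation (2.5) is easy to check numerically. The sketch below is our own illustration (the three-state transition matrix is arbitrary): it computes π by power iteration of the balance equations, computes each E_i[T_i] by first-step analysis on hitting times, and verifies that every product π(i) E_i[T_i] equals 1.

```python
# Arbitrary irreducible three-state chain (rows sum to 1).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
n = len(P)

# Stationary distribution by power iteration: pi^T = pi^T P.
pi = [1.0 / n] * n
for _ in range(2000):
    pi = [sum(pi[j] * P[j][i] for j in range(n)) for i in range(n)]

def mean_return_time(i):
    """E_i[T_i] via first-step analysis: h[j] = E_j[hitting time of i],
    with h[i] = 0 and h[j] = 1 + sum_k P[j][k] h[k] for j != i,
    solved here by fixed-point iteration (a contraction for this chain)."""
    h = [0.0] * n
    for _ in range(5000):
        h = [0.0 if j == i else 1.0 + sum(P[j][k] * h[k] for k in range(n))
             for j in range(n)]
    return 1.0 + sum(P[i][k] * h[k] for k in range(n))

for i in range(n):
    print(i, pi[i] * mean_return_time(i))  # each product is close to 1
```

The products come out equal to 1 up to the iteration tolerance, as (2.5) predicts.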


The situation is extremely simple when the state space is finite.


Theorem 2.2.6 An irreducible hmc with finite state space is positive recurrent.

Proof. We first show recurrence. We have
\sum_{j∈E} p_{ij}(n) = 1,
and in particular, the limit of the left-hand side is 1. If the chain were transient, then, as we saw in the proof of Theorem 2.2.4, for all i, j ∈ E,
\lim_{n↑∞} p_{ij}(n) = 0,
and therefore, since the state space is finite,
\lim_{n↑∞} \sum_{j∈E} p_{ij}(n) = 0,
a contradiction. Therefore, the chain is recurrent. By Theorem 2.2.1 it has an invariant measure x. Since E is finite, \sum_{i∈E} x_i < ∞, and therefore the chain is positive recurrent, by Theorem 2.2.3. □

2.3 Foster’s theorem


The stationary distribution criterion of positive recurrence of an irreducible chain
requires solving the balance equations, and this is not always feasible. Therefore
one needs more efficient conditions guaranteeing positive recurrence. The following
result (Foster’s theorem) gives a sufficient condition of positive recurrence.
Theorem 2.3.1 Let the transition matrix P on the countable state space E be irreducible and suppose that there exists a function h : E → R such that inf_i h(i) > −∞ and
\sum_{k∈E} p_{ik} h(k) < ∞ \quad for all i ∈ F,   (2.6)
\sum_{k∈E} p_{ik} h(k) ≤ h(i) − ǫ \quad for all i ∉ F,   (2.7)
for some finite set F and some ǫ > 0. Then the corresponding hmc is positive recurrent.

Proof. Since inf_i h(i) > −∞, one may assume without loss of generality that h ≥ 0, by adding a constant if necessary. Call τ the return time to F, and define Y_n = h(X_n) 1_{\{n<τ\}}. Equality (2.7) is just E[h(X_{n+1}) | X_n = i] ≤ h(i) − ǫ for all i ∉ F. For i ∉ F,
E_i[Y_{n+1} | X_0^n] = E_i[Y_{n+1} 1_{\{n<τ\}} | X_0^n] + E_i[Y_{n+1} 1_{\{n≥τ\}} | X_0^n]
 = E_i[Y_{n+1} 1_{\{n<τ\}} | X_0^n] ≤ E_i[h(X_{n+1}) 1_{\{n<τ\}} | X_0^n]
 = 1_{\{n<τ\}} E_i[h(X_{n+1}) | X_0^n] = 1_{\{n<τ\}} E_i[h(X_{n+1}) | X_n]
 ≤ 1_{\{n<τ\}} h(X_n) − ǫ 1_{\{n<τ\}},
where the third equality comes from the fact that 1_{\{n<τ\}} is a function of X_0^n, the fourth equality is the Markov property, and the last inequality is true because, P_i-a.s., X_n ∉ F on {n < τ}. Therefore, P_i-a.s., E_i[Y_{n+1} | X_0^n] ≤ Y_n − ǫ 1_{\{n<τ\}}, and taking expectations,
E_i[Y_{n+1}] ≤ E_i[Y_n] − ǫ P_i(τ > n).
Iterating the above inequality, and observing that Y_n is non-negative, we obtain
0 ≤ E_i[Y_{n+1}] ≤ E_i[Y_0] − ǫ \sum_{k=0}^{n} P_i(τ > k).
But Y_0 = h(i), P_i-a.s., and \sum_{k=0}^{∞} P_i(τ > k) = E_i[τ]. Therefore, for all i ∉ F,
E_i[τ] ≤ ǫ^{−1} h(i).
For j ∈ F, first-step analysis yields
E_j[τ] = 1 + \sum_{i∉F} p_{ji} E_i[τ].
Thus E_j[τ] ≤ 1 + ǫ^{−1} \sum_{i∉F} p_{ji} h(i), and this quantity is finite in view of assumption (2.6). Therefore, the return time to F starting anywhere in F has finite expectation. Since F is a finite set, this implies positive recurrence, in view of the following lemma. □

Lemma 2.3.1 Let {Xn }n≥0 be an irreducible hmc, let F be a finite subset of the
state space E, and let τ (F ) be the return time to F . If Ej [τ (F )] < ∞ for all j ∈ F ,
the chain is positive recurrent.

Proof. Select i ∈ F, and let T_i be the return time of {X_n} to i. Let τ_1 = τ(F), τ_2, τ_3, . . . be the successive return times to F. It follows from the strong Markov property that {Y_n}_{n≥0} defined by Y_0 = X_0 = i and Y_n = X_{τ_n} for n ≥ 1 is an hmc with state space F. Since {X_n} is irreducible, so is {Y_n}. Since F is finite, {Y_n} is positive recurrent, and in particular, E_i[T̃_i] < ∞, where T̃_i is the return time to i of {Y_n}. Defining S_0 = τ_1 and S_k = τ_{k+1} − τ_k for k ≥ 1, we have
T_i = \sum_{k=0}^{∞} S_k 1_{\{k < T̃_i\}},

and therefore
E_i[T_i] = \sum_{k=0}^{∞} E_i[S_k 1_{\{k < T̃_i\}}].
Now,
E_i[S_k 1_{\{k < T̃_i\}}] = \sum_{ℓ∈F} E_i[S_k 1_{\{k < T̃_i\}} 1_{\{X_{τ_k} = ℓ\}}],
and by the strong Markov property applied to {X_n}_{n≥0} and the stopping time τ_k, and the fact that the event {k < T̃_i} belongs to the past of {X_n}_{n≥0} at time τ_k,
E_i[S_k 1_{\{k < T̃_i\}} 1_{\{X_{τ_k} = ℓ\}}] = E_i[S_k | k < T̃_i, X_{τ_k} = ℓ] \, P_i(k < T̃_i, X_{τ_k} = ℓ)
 = E_i[S_k | X_{τ_k} = ℓ] \, P_i(k < T̃_i, X_{τ_k} = ℓ).
Observing that E_i[S_k | X_{τ_k} = ℓ] = E_ℓ[τ(F)], we see that the latter expression is bounded by (\max_{ℓ∈F} E_ℓ[τ(F)]) P_i(k < T̃_i, X_{τ_k} = ℓ), and therefore
E_i[T_i] ≤ \Big(\max_{ℓ∈F} E_ℓ[τ(F)]\Big) \sum_{k=0}^{∞} P_i(T̃_i > k) = \Big(\max_{ℓ∈F} E_ℓ[τ(F)]\Big) E_i[T̃_i] < ∞. □

The function h in Foster's theorem is called a Lyapunov function because it plays a role similar to that of the Lyapunov functions in the stability theory of ordinary differential equations. The corollary below is referred to as Pakes's lemma.

Corollary 2.3.1 Let {X_n}_{n≥0} be an irreducible hmc on E = N such that for all n ≥ 0 and all i ∈ E,
E[X_{n+1} − X_n | X_n = i] < ∞
and
\limsup_{i↑∞} E[X_{n+1} − X_n | X_n = i] < 0.   (2.8)
Such an hmc is positive recurrent.

Proof. Let −2ǫ be the left-hand side of (2.8). In particular, ǫ > 0. By (2.8), for i
sufficiently large, say i > i0 , E[Xn+1 − Xn | Xn = i] < −ǫ. We are therefore in the
conditions of Foster’s theorem with h(i) = i and F = {i; i ≤ i0 }. 

Example 2.3.1: A random walk on N. Let {Z_n}_{n≥1} be an iid sequence of integrable random variables with values in Z such that
E[Z_1] < 0,
and define {X_n}_{n≥0}, an hmc with state space E = N, by
X_{n+1} = (X_n + Z_{n+1})^+,
where X_0 is independent of {Z_n}_{n≥1}. Assume irreducibility (the industrious reader will find the necessary and sufficient condition for this). Here
E[X_{n+1} − i | X_n = i] = E[(i + Z_{n+1})^+ − i] = E[−i \, 1_{\{Z_{n+1} ≤ −i\}} + Z_{n+1} 1_{\{Z_{n+1} > −i\}}] ≤ E[Z_1 1_{\{Z_1 > −i\}}].
By dominated convergence, the limit of E[Z_1 1_{\{Z_1 > −i\}}] as i tends to ∞ is E[Z_1] < 0, and therefore, by Pakes's lemma, the hmc is positive recurrent.
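The drift condition can be made concrete with a small numerical check (our illustration; the increment distribution is an arbitrary toy choice). Take Z ∈ {−2, 1} with P(Z = −2) = P(Z = 1) = 1/2, so E[Z] = −1/2 < 0, and compute the drift E[X_{n+1} − X_n | X_n = i] for each i:

```python
# Increment law for the reflected walk X_{n+1} = (X_n + Z_{n+1})^+ :
# an assumed toy distribution with negative mean.
law = {-2: 0.5, 1: 0.5}          # E[Z] = -0.5 < 0
mean_z = sum(z * p for z, p in law.items())

def drift(i):
    """E[X_{n+1} - X_n | X_n = i] = E[(i + Z)^+ - i]."""
    return sum(p * (max(i + z, 0) - i) for z, p in law.items())

for i in range(6):
    print(i, drift(i))
# For i >= 2 the reflection at 0 is never active, so drift(i) = E[Z] = -0.5:
# the hypotheses of Pakes's lemma (and of Foster's theorem with h(i) = i
# and F = {0, 1}) hold.
```

The drift is positive at 0, zero at 1, and equal to E[Z] < 0 from state 2 upward, matching the inequality E[X_{n+1} − i | X_n = i] ≤ E[Z_1 1_{\{Z_1 > −i\}}] used above.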

The following is a Foster-type theorem, only with a negative conclusion.


Theorem 2.3.2 Let the transition matrix P on the countable state space E be irreducible, and suppose that there exist a finite set F and a function h : E → R₊ such that
there exists j ∉ F such that h(j) > \max_{i∈F} h(i),   (2.9)
\sup_{i∈E} \sum_{k∈E} p_{ik} |h(k) − h(i)| < ∞,   (2.10)
\sum_{k∈E} p_{ik} (h(k) − h(i)) ≥ 0 \quad for all i ∉ F.   (2.11)
Then the corresponding hmc cannot be positive recurrent.

Proof. Let τ be the return time to F. Observe that
h(X_τ) 1_{\{τ<∞\}} = h(X_0) + \sum_{n=0}^{∞} (h(X_{n+1}) − h(X_n)) 1_{\{τ>n\}}.
Now, with j ∉ F,
\sum_{n=0}^{∞} E_j[|h(X_{n+1}) − h(X_n)| \, 1_{\{τ>n\}}]
 = \sum_{n=0}^{∞} E_j[E_j[|h(X_{n+1}) − h(X_n)| \mid X_0^n] \, 1_{\{τ>n\}}]
 = \sum_{n=0}^{∞} E_j[E_j[|h(X_{n+1}) − h(X_n)| \mid X_n] \, 1_{\{τ>n\}}]
 ≤ K \sum_{n=0}^{∞} P_j(τ > n)
for some finite positive constant K, by (2.10). Therefore, if the chain is positive recurrent, the latter bound is K E_j[τ] < ∞. Therefore
E_j[h(X_τ)] = E_j[h(X_τ) 1_{\{τ<∞\}}] = h(j) + \sum_{n=0}^{∞} E_j[(h(X_{n+1}) − h(X_n)) 1_{\{τ>n\}}] ≥ h(j),
by (2.11). In view of assumption (2.9), and since X_τ ∈ F, we have h(j) > \max_{i∈F} h(i) ≥ E_j[h(X_τ)], hence a contradiction. The chain therefore cannot be positive recurrent. □

2.4 Examples
Birth-and-death Markov chain
We first define the birth-and-death process with a bounded population. The state space of such a chain is E = {0, 1, . . . , N} and its transition matrix is
P = \begin{pmatrix}
r_0 & p_0 & & & & \\
q_1 & r_1 & p_1 & & & \\
 & q_2 & r_2 & p_2 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & q_{N−1} & r_{N−1} & p_{N−1} \\
 & & & & q_N & r_N
\end{pmatrix},
where p_i > 0 for all i ∈ E∖{N}, q_i > 0 for all i ∈ E∖{0}, r_i ≥ 0 for all i ∈ E, and p_i + q_i + r_i = 1 for all i ∈ E (with the convention q_0 = p_N = 0). The positivity conditions placed on the p_i's and q_i's guarantee that the chain is irreducible. Since the state space is finite, it is positive recurrent (Theorem 2.2.6), and it has a unique stationary distribution. Motivated by the Ehrenfest hmc, which is reversible in the stationary state, we make the educated guess that the birth-and-death process considered has the same property. This will be the case if and only if there exists a probability distribution π on E satisfying the detailed balance equations, that is, such that for all 1 ≤ i ≤ N, π(i − 1) p_{i−1} = π(i) q_i. Letting w_0 = 1 and, for all 1 ≤ i ≤ N,
π(i − 1)pi−1 = π(i)qi . Letting w0 = 1 and for all 1 ≤ i ≤ N ,
i
Y pk−1
wi =
k=1
qk
we find that
wi
π(i) = PN (2.12)
j=0 wj
2.4. EXAMPLES 37

indeed satisfies the detailed balance equations and is therefore the (unique) sta-
tionary distribution of the chain.
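A short computation (ours, with arbitrary rates) builds the w_i of (2.12), normalizes them, and checks both the detailed balance equations and the global balance equation π^T = π^T P:

```python
# Bounded birth-and-death chain on {0,...,N}; the rates below are arbitrary.
N = 5
p = [0.4, 0.3, 0.5, 0.2, 0.4, 0.0]   # p[N] = 0
q = [0.0, 0.2, 0.3, 0.1, 0.5, 0.6]   # q[0] = 0
r = [1 - p[i] - q[i] for i in range(N + 1)]

# w_0 = 1, w_i = prod_{k=1}^{i} p_{k-1}/q_k, then normalize: formula (2.12).
w = [1.0]
for i in range(1, N + 1):
    w.append(w[-1] * p[i - 1] / q[i])
Z = sum(w)
pi = [x / Z for x in w]

# Detailed balance: pi(i-1) p_{i-1} = pi(i) q_i.
for i in range(1, N + 1):
    assert abs(pi[i - 1] * p[i - 1] - pi[i] * q[i]) < 1e-12

# Global balance: pi(j) = pi(j-1) p_{j-1} + pi(j) r_j + pi(j+1) q_{j+1}.
for j in range(N + 1):
    inflow = pi[j] * r[j]
    if j > 0:
        inflow += pi[j - 1] * p[j - 1]
    if j < N:
        inflow += pi[j + 1] * q[j + 1]
    assert abs(inflow - pi[j]) < 1e-12

print(pi)
```

As expected from the detailed balance derivation, the global balance equations follow automatically.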
We now consider the unbounded birth-and-death process. This chain has the state space E = N and its transition matrix is as in the previous example (only, it is unbounded on the right). In particular, we assume that the p_i's and q_i's are positive in order to guarantee irreducibility. The same reversibility argument as above applies, with a little difference. In fact, we can show that the w_i's defined above satisfy the detailed balance equations and therefore the global balance equations. Therefore the vector {w_i}_{i∈E} is the unique, up to a multiplicative factor, invariant measure of the chain. It can be normalized to a probability distribution if and only if
\sum_{j=0}^{∞} w_j < ∞.
Therefore, in this case and only in this case, there exists a (unique) stationary distribution, also given by (2.12).
Note that the stationary distribution, when it exists, does not depend on the r_i's. The recurrence properties of the above unbounded birth-and-death process are therefore the same as those of the chain below, which is however not aperiodic. For aperiodicity, it suffices to suppose at least one of the r_i's to be positive.

(Transition diagram: states 0, 1, 2, . . . in a line, with p_0 = 1; from each state i an arrow to i + 1 labelled p_i and an arrow to i − 1 labelled q_i.)

We now compute, for the (bounded or unbounded) irreducible birth-and-death process, the average time it takes to reach a state b from a state a < b. In fact, we shall prove that
E_a[T_b] = \sum_{k=a+1}^{b} \frac{1}{q_k w_k} \sum_{j=0}^{k−1} w_j.   (2.13)
Since obviously E_a[T_b] = \sum_{k=a+1}^{b} E_{k−1}[T_k], it suffices to prove that
E_{k−1}[T_k] = \frac{1}{q_k w_k} \sum_{j=0}^{k−1} w_j.   (⋆)

For this, consider for any given k ∈ {0, 1, . . . , N} the truncated chain, which moves on the state space {0, 1, . . . , k} as the original chain, except in state k, where it moves one step down with probability q_k and stays still with probability p_k + r_k. Write Ẽ for expectations of the modified chain. The unique stationary distribution of this chain is given by
π̃_ℓ = \frac{w_ℓ}{\sum_{j=0}^{k} w_j}
for all 0 ≤ ℓ ≤ k. First-step analysis shows that Ẽ_k[T_k] = (r_k + p_k) × 1 + q_k (1 + Ẽ_{k−1}[T_k]), that is,
Ẽ_k[T_k] = 1 + q_k Ẽ_{k−1}[T_k].
Also,
Ẽ_k[T_k] = \frac{1}{π̃_k} = \frac{1}{w_k} \sum_{j=0}^{k} w_j,
and therefore, since Ẽ_{k−1}[T_k] = E_{k−1}[T_k], we have (⋆).
In the special case where (p_j, q_j, r_j) = (p, q, r) for all j ≠ 0, N, (p_0, q_0, r_0) = (p, 0, q + r) and (p_N, q_N, r_N) = (0, q, p + r), we have w_i = (p/q)^i, and for 1 ≤ k ≤ N,
E_{k−1}[T_k] = \frac{1}{q (p/q)^k} \sum_{j=0}^{k−1} \left(\frac{p}{q}\right)^j = \frac{1}{p − q} \left(1 − \left(\frac{q}{p}\right)^k\right).
In the further particularization where p = q, w_i = 1 for all i and
E_{k−1}[T_k] = \frac{k}{p}.
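Formula (2.13) can be cross-checked against a direct first-step-analysis computation (our sketch; the rates are arbitrary). Writing m_i = E_i[T_b] for i < b, the system (1 − r_i) m_i = 1 + p_i m_{i+1} + q_i m_{i−1} (with m_b = 0 and q_0 = 0) is tridiagonal; below it is solved exactly with rational arithmetic and compared with (2.13):

```python
from fractions import Fraction as F

# Arbitrary rates on {0,...,b}; p up, q down, r stay (q[0] = 0).
p = [F(1, 2), F(1, 3), F(1, 4), F(2, 5), F(1, 3)]
q = [F(0),    F(1, 4), F(1, 2), F(1, 5), F(1, 3)]
b = 4  # target state; we compute E_0[T_b]
r = [1 - p[i] - q[i] for i in range(b + 1)]

# Formula (2.13) with a = 0: sum_{k=1}^{b} (1/(q_k w_k)) sum_{j<k} w_j.
w = [F(1)]
for i in range(1, b + 1):
    w.append(w[-1] * p[i - 1] / q[i])
formula = sum(sum(w[:k]) / (q[k] * w[k]) for k in range(1, b + 1))

# Direct first-step analysis: tridiagonal system for m_i = E_i[T_b],
#   -q_i m_{i-1} + (1 - r_i) m_i - p_i m_{i+1} = 1,  m_b = 0,
# solved by forward elimination + back-substitution (Thomas algorithm).
n = b
diag = [1 - r[i] for i in range(n)]
rhs = [F(1)] * n
for i in range(1, n):            # eliminate the sub-diagonal entry -q[i]
    factor = q[i] / diag[i - 1]
    diag[i] -= factor * p[i - 1]
    rhs[i] += factor * rhs[i - 1]
m = [F(0)] * n
m[n - 1] = rhs[n - 1] / diag[n - 1]
for i in range(n - 2, -1, -1):   # back-substitution (using m_b = 0)
    m[i] = (rhs[i] + p[i] * m[i + 1]) / diag[i]

print(formula, m[0])  # the two exact computations agree
```

Because everything is done with `fractions.Fraction`, the agreement is exact, not merely up to rounding.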

The repair shop


During day n, Zn+1 machines break down, and they enter the repair shop on
day n + 1. Every day one machine among those waiting for service is repaired.
Therefore, denoting by Xn the number of machines in the shop on day n,

Xn+1 = (Xn − 1)+ + Zn+1 , (2.14)

where a+ = max(a, 0). The sequence {Zn }n≥1 is assumed to be an iid sequence,
independent of the initial state X0 , with common probability distribution

P (Z1 = k) = ak , k ≥ 0

of generating function g_Z. The stochastic process {X_n}_{n≥0} is an hmc with transition matrix
P = \begin{pmatrix}
a_0 & a_1 & a_2 & a_3 & \cdots \\
a_0 & a_1 & a_2 & a_3 & \cdots \\
0 & a_0 & a_1 & a_2 & \cdots \\
0 & 0 & a_0 & a_1 & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{pmatrix}.
Indeed, by formula (1.3), p_{ij} = P((i − 1)^+ + Z_1 = j) = P(Z_1 = j − (i − 1)^+).


The repair shop model may also be interpreted in terms of communications. It
describes a communications link in which time is divided into successive intervals
(the “slots”) of equal length, conventionally taken to be equal to 1. In slot n
(extending from time n included to time n+1 excluded), there arrive Zn+1 messages
requiring transmission. Since the link can transmit at most one message in a given
slot, the messages may have to be buffered, and Xn represents the number of
messages in the buffer (supposed of infinite capacity) at time n. The dynamics of
the buffer content are therefore those of Eqn. (2.14).
A necessary and sufficient condition for irreducibility of this chain is that P(Z_1 = 0) > 0 and P(Z_1 ≥ 2) > 0, as we now prove formally. Looking at (2.14), we make the following observations. If P(Z_{n+1} = 0) = 0, then X_{n+1} ≥ X_n a.s. and there is no way of going from i to i − 1. If P(Z_{n+1} ≤ 1) = 1, then X_{n+1} ≤ X_n and there is no way of going from i to i + 1. Therefore, the two conditions P(Z_1 = 0) > 0 and P(Z_1 ≥ 2) > 0 are necessary for irreducibility. They are also sufficient. Indeed, if there exists k ≥ 2 such that P(Z_{n+1} = k) > 0, then one can go from any i > 0 to i + k − 1 > i, or from i = 0 to k > 0, with positive probability. Also, if P(Z_{n+1} = 0) > 0, one can go from i > 0 to i − 1 with positive probability; in particular, one can go from i to any j < i with positive probability. Therefore, to go from i to j ≥ i, one can take several successive steps of height at least k − 1, reach a state ℓ ≥ j, and then, in the case ℓ > j, go down one by one from ℓ to j. All this with positive probability.
Assuming irreducibility, we now seek a necessary and sufficient condition for positive recurrence. For any complex number z with modulus not larger than 1, it follows from the recurrence equation (2.14) that
z^{X_{n+1}+1} = z^{(X_n−1)^+ + 1} z^{Z_{n+1}} = \big(z^{X_n} − 1_{\{X_n=0\}} + z \, 1_{\{X_n=0\}}\big) z^{Z_{n+1}},
and therefore z \, z^{X_{n+1}} − z^{X_n} z^{Z_{n+1}} = (z − 1) 1_{\{X_n=0\}} z^{Z_{n+1}}. From the independence of X_n and Z_{n+1}, E[z^{X_n} z^{Z_{n+1}}] = E[z^{X_n}] g_Z(z), and E[1_{\{X_n=0\}} z^{Z_{n+1}}] = π(0) g_Z(z), where π(0) = P(X_n = 0). Therefore, z E[z^{X_{n+1}}] − g_Z(z) E[z^{X_n}] = (z − 1) π(0) g_Z(z).

But in steady state, E[z^{X_{n+1}}] = E[z^{X_n}] = g_X(z), and therefore
g_X(z) (z − g_Z(z)) = π(0)(z − 1) g_Z(z).   (2.15)
This gives the generating function g_X(z) = \sum_{i=0}^{∞} π(i) z^i, as long as π(0) is available. To obtain π(0), differentiate (2.15):
g_X'(z) (z − g_Z(z)) + g_X(z) (1 − g_Z'(z)) = π(0) (g_Z(z) + (z − 1) g_Z'(z)),
and let z = 1, to obtain, taking into account the equalities g_X(1) = g_Z(1) = 1 and g_Z'(1) = E[Z],
π(0) = 1 − E[Z].   (2.16)
But the stationary distribution of an irreducible hmc is positive, hence the necessary condition of positive recurrence:
E[Z_1] < 1.
We now show this condition is also sufficient for positive recurrence. This follows immediately from Pakes's lemma, since for i ≥ 1, E[X_{n+1} − X_n | X_n = i] = E[Z] − 1 < 0.
From (2.15) and (2.16), we have the generating function of the stationary distribution:
\sum_{i=0}^{∞} π(i) z^i = (1 − E[Z]) \frac{(z − 1) g_Z(z)}{z − g_Z(z)}.   (2.17)
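Relation (2.16) lends itself to a quick numerical check (ours; the arrival distribution and truncation level are arbitrary choices). For Z ∈ {0, 2} with P(Z = 2) = 0.3, so E[Z] = 0.6 < 1, truncate the chain to {0, . . . , 100} and power-iterate the balance equations; the mass at 0 should approach 1 − E[Z] = 0.4:

```python
# Truncated repair-shop chain: X_{n+1} = (X_n - 1)^+ + Z_{n+1},
# with an assumed arrival law Z in {0, 2}; truncating at N is an
# approximation, negligible here because the stationary tail decays fast.
a = {0: 0.7, 2: 0.3}          # E[Z] = 0.6 < 1
N = 100

def step(pi):
    """One application of pi^T P for the truncated chain."""
    new = [0.0] * (N + 1)
    for i, mass in enumerate(pi):
        base = max(i - 1, 0)
        for z, pz in a.items():
            new[min(base + z, N)] += mass * pz  # mass beyond N folded to N
    return new

pi = [1.0 / (N + 1)] * (N + 1)
for _ in range(5000):
    pi = step(pi)

print(pi[0])  # close to 1 - E[Z] = 0.4
```

The same experiment with E[Z] ≥ 1 would show the mass drifting away from 0, in line with the transience/null recurrence discussion that follows.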

If E[Z_1] > 1, the chain is transient, as a simple argument based on the strong law of large numbers shows. In fact,
X_n = X_0 + \sum_{k=1}^{n} Z_k − n + \sum_{k=0}^{n−1} 1_{\{X_k=0\}},
and therefore
X_n ≥ \sum_{k=1}^{n} Z_k − n,
which tends to ∞ because, by the strong law of large numbers,
\frac{\sum_{k=1}^{n} Z_k − n}{n} → E[Z] − 1 > 0.
This is of course incompatible with recurrence.


We finally examine the case E[Z1 ] = 1, for which there are only two possibilities
left: transient or null recurrent. It turns out that the chain is null recurrent in
this case.

The pure random walk on a graph


Consider a finite non-directed connected graph G = (V, E) where V is the set
of vertices, or nodes, and E is the set of edges. Let di be the index of vertex i
(the number of edges “adjacent” to vertex i). Since there is no isolated nodes (a
consequence of the connectedness assumption), di > 0 for all i ∈ V . Transform
this graph into a directed graph by splitting each edge into two directed edges of
opposite directions, and make it a transition graph by associating to the directed
edge from i to j the transition probability d1i (see the figure below). Note that
P
i∈V di = 2|E|.

(Figure: a small connected graph with four vertices; each edge is split into two opposite directed edges, the edge leaving vertex i carrying the transition probability 1/d_i.)

A random walk on a graph

The corresponding hmc with state space E ≡ V is irreducible (G is connected). It therefore admits a unique stationary distribution π, which we attempt to find via Theorem 1.3.3. Let i and j be connected by an edge; then p_{ij} = 1/d_i and p_{ji} = 1/d_j, so that the detailed balance equation between these two states is
π(i) \frac{1}{d_i} = π(j) \frac{1}{d_j}.
This gives π(i) = K d_i, where K is obtained by normalization: K = \big(\sum_{j∈V} d_j\big)^{−1} = (2|E|)^{−1}. Therefore,
π(i) = \frac{d_i}{2|E|}.
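A small check of π(i) = d_i/2|E| (our illustration, on an arbitrary graph): build the walk from an adjacency list, power-iterate the balance equations, and compare with the degree formula.

```python
# Arbitrary small connected graph, as an adjacency list.
# It contains a triangle, so the walk is aperiodic and power iteration converges.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
deg = {i: len(nb) for i, nb in adj.items()}
two_E = sum(deg.values())  # sum of degrees = 2|E|

# Power iteration of pi^T = pi^T P, with p_ij = 1/d_i along each edge.
pi = {i: 1.0 / len(adj) for i in adj}
for _ in range(1000):
    new = {i: 0.0 for i in adj}
    for i, nb in adj.items():
        for j in nb:
            new[j] += pi[i] / deg[i]
    pi = new

for i in adj:
    print(i, pi[i], deg[i] / two_E)  # the two columns agree
```

Here the degrees are (2, 3, 3, 2) and 2|E| = 10, so the stationary distribution is (0.2, 0.3, 0.3, 0.2), proportional to the degrees as the detailed balance argument predicts.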

Example 2.4.1: Random walk on the hypercube, take 1. The random walk on the (n-dimensional) hypercube is the random walk on the graph with set of vertices E = {0, 1}^n and edges between vertices x and y that differ in just one coordinate. For instance, in three dimensions, the only possible motions of a particle performing the random walk on the cube are along its edges, in both directions. Clearly, whatever be the dimension n ≥ 2, d_i = n, and the stationary distribution is the uniform distribution.

The lazy random walk on the graph is, by definition, the Markov chain on V with the transition probabilities p_{ii} = 1/2 and, for i, j ∈ V such that i and j are connected by an edge of the graph, p_{ij} = 1/(2d_i). This modified chain admits the same stationary distribution as the original random walk. The difference is that the lazy version is always aperiodic, whereas the original version may be periodic.

2.5 Exercises
Exercise 2.5.1. Truncated hmc.
Let P be a transition matrix on the countable state space E, with the positive stationary distribution π. Let A be a subset of the state space, and define the truncation of P on A to be the transition matrix Q indexed by A and given by
q_{ij} = p_{ij} \quad if i, j ∈ A, i ≠ j,
q_{ii} = p_{ii} + \sum_{k∈Ā} p_{ik}.
Show that if (P, π) is reversible, then so is (Q, π/π(A)).

Exercise 2.5.2. Extension to negative times.


Let {Xn }n≥0 be a hmc with state space E, transition matrix P, and suppose that
there exists a stationary distribution π > 0. Suppose moreover that the initial
distribution is π. Define the matrix Q = {qij }i,j∈E by (1.5). Construct {X−n }n≥1 ,
independent of {Xn }n≥1 given X0 , as follows:

P (X−1 = i1 , X−2 = i2 , . . . , X−k = ik | X0 = i, X1 = j1 , . . . , Xn = jn )


= P (X−1 = i1 , X−2 = i2 , . . . , X−k = ik | X0 = i) = qii1 qi1 i2 · · · qik−1 ik

for all k ≥ 1, n ≥ 1, i, i1 , . . . , ik , j1 , . . . , jn ∈ E. Prove that {Xn }n∈Z is a hmc with


transition matrix P and P (Xn = i) = π(i), for all i ∈ E, all n ∈ Z.

Exercise 2.5.3. Moving stones.


Stones S1 , . . . , SM are placed in line. At each time n a stone is selected at random,
and this stone and the one ahead of it in the line exchange positions. If the
selected stone is at the head of the line, nothing is changed. For instance, with
M = 5: Let the current configuration be S2 S3 S1 S5 S4 (S2 is at the head of the
line). If S5 is selected, the new situation is S2 S3 S5 S1 S4 , whereas If S2 is selected,
2.5. EXERCISES 43

the configuration is not altered. At each step, stone Si is selected with probability
αi > 0. Call Xn the situation at time n, for instance Xn = Si1 · · · SiM , meaning
that stone Sij is in the jth position. Show that {Xn }n≥0 is an irreducible hmc
and that it has a stationary distribution given by the formula
π(Si1 · · · SiM ) = C α_{i1}^M α_{i2}^{M−1} · · · α_{iM} ,
for some normalizing constant C.

Exercise 2.5.4. Aperiodicity.


a. Show that an irreducible transition matrix P with at least one state i ∈ E such
that pii > 0 is aperiodic.
b. Let P be an irreducible transition matrix on the finite state space E. Show
that a necessary and sufficient condition for P to be aperiodic is the existence of
an integer m such that Pm has all its entries positive.
c. Consider a hmc that is irreducible with period d ≥ 2. Show that the restriction
of the transition matrix to any cyclic class is irreducible. Show that the restriction
of Pd to any cyclic class is aperiodic.
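The criterion of part b. lends itself to a mechanical check: an irreducible transition matrix P is aperiodic if and only if some power of P is entrywise positive. A minimal sketch (the function name and the example matrices are illustrative, not from the text); Wielandt's bound (r − 1)² + 1 caps the number of powers that need to be examined for an r × r primitive matrix:

```python
import numpy as np

def is_primitive(P, max_power=None):
    """Return True if some power of the non-negative matrix P is entrywise
    positive; for an irreducible stochastic matrix this is equivalent to
    aperiodicity (part b. of the exercise)."""
    r = P.shape[0]
    if max_power is None:
        max_power = (r - 1) ** 2 + 1      # Wielandt's bound for primitive matrices
    M = (P > 0).astype(int)               # only the support of P matters
    Q = M.copy()
    for _ in range(max_power):
        if M.all():
            return True
        M = (M @ Q > 0).astype(int)       # support of the next power
    return bool(M.all())

# Period-2 chain on two states: no power is entrywise positive.
P_periodic = np.array([[0.0, 1.0], [1.0, 0.0]])
# Adding a self-loop (part a.) makes it aperiodic, hence primitive.
P_aperiodic = np.array([[0.5, 0.5], [1.0, 0.0]])
assert not is_primitive(P_periodic)
assert is_primitive(P_aperiodic)
```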

Exercise 2.5.5. No stationary distribution.


Show that the symmetric random walk on Z cannot have a stationary distribution.

Exercise 2.5.6. An interpretation of invariant measure.


A countable number of particles move independently in the countable space E,
each according to a Markov chain with the transition matrix P. Let An (i) be the
number of particles in state i ∈ E at time n ≥ 0, and suppose that the random
variables A0 (i), i ∈ E, are independent Poisson random variables with respective
means µ(i), i ∈ E, where µ = {µ(i)}i∈E is an invariant measure of P. Show that
for all n ≥ 1, the random variables An (i), i ∈ E, are independent Poisson random
variables with respective means µ(i), i ∈ E.

Exercise 2.5.7. Doubly stochastic transition matrix.


A stochastic
P matrix P on the state space E is called doubly stochastic if for all
states i, j∈E pji = 1. Suppose in addition that P is irreducible, and that E
is infinite. Find the invariant measure of P. Show that P cannot be positive
recurrent.

Exercise 2.5.8. Return time to the initial state.


Let τ be the first return time to the initial state of an irreducible positive recurrent
hmc {Xn }n≥0 , that is,
τ = inf{n ≥ 1; Xn = X0 },

with τ = +∞ if Xn ≠ X0 for all n ≥ 1. Compute the expectation of τ when


the initial distribution is the stationary distribution π. Conclude that it is finite
if and only if E is finite. When E is infinite, is this in contradiction to positive
recurrence?
Chapter 3

Long-run behaviour

3.1 Ergodic theorem


An important application of the strong law of large numbers is to the ergodic theo-
rem for Markov chains. This theorem gives conditions guaranteeing that empirical
averages of the type
(1/N ) Σ_{k=1}^N g(Xk , . . . , Xk+L )

converge to probabilistic averages. As a matter of fact, if the chain is irreducible


positive recurrent with the stationary distribution π, the above empirical average
converges Pµ -almost-surely to Eπ [g(X0 , . . . , XL )] for any initial distribution µ
(Corollary 3.1.2), at least if Eπ [|g(X0 , . . . , XL )|] < ∞.
We shall obtain this result as a corollary of the following proposition concerning
irreducible recurrent (not necessarily positive recurrent) hmc’s.
Let {Xn }n≥0 be an irreducible recurrent hmc, and let x denote the canonical
invariant measure associated with state 0 ∈ E,
" #
X
xi = E 0 1{Xn =i} 1{n≤T0 } , (3.1)
n≥1

Pn
where T0 is the return time to 0. Define for n ≥ 1, ν(n) := k=1 1{Xk =0} .


Theorem 3.1.1 Let f : E → R be such that


X
|f (i)|xi < ∞. (3.2)
i∈E

Then, for any initial distribution µ, Pµ -a.s.,


lim_{N ↑∞} (1/ν(N )) Σ_{k=1}^N f (Xk ) = Σ_{i∈E} f (i) xi .    (3.3)

Proof. Let T0 = τ1 , τ2 , τ3 , . . . be the successive return times to state 0, and define


Up = Σ_{n=τp +1}^{τp+1} f (Xn ).

By the independence property of the regenerative cycles, {Up }p≥1 is an iid se-
quence. Moreover, assuming f ≥ 0 and using the strong Markov property,
"T #
X 0

E[U1 ] = E0 f (Xn )
n=1
" T0 X
# " T0
#
X X X
= E0 f (i)1{Xn =i} = f (i)E0 1{Xn =i}
n=1 i∈E i∈E n=1
X
= f (i)xi .
i∈E

By hypothesis, this quantity is finite, and therefore the strong law of large numbers
applies, to give
lim_{n↑∞} (1/n) Σ_{p=1}^n Up = Σ_{i∈E} f (i) xi ,

that is,
lim_{n↑∞} (1/n) Σ_{k=T0 +1}^{τn+1} f (Xk ) = Σ_{i∈E} f (i) xi .    (3.4)

Observing that
τν(n) ≤ n < τν(n)+1 ,
we have

( Σ_{k=1}^{τν(n)} f (Xk ) ) / ν(n) ≤ ( Σ_{k=1}^n f (Xk ) ) / ν(n) ≤ ( Σ_{k=1}^{τν(n)+1} f (Xk ) ) / ν(n) .

Since the chain is recurrent, limn↑∞ ν(n) = ∞, and therefore, from (3.4), the
extreme terms of the above chain of inequalities tend to Σ_{i∈E} f (i)xi as n goes to
∞, and this implies (3.3). The case of a function f of arbitrary sign is obtained by
considering (3.3) written separately for f + = max(0, f ) and f − = max(0, −f ), and
then taking the difference of the two equalities obtained this way. The difference
is not an undetermined form ∞ − ∞ due to hypothesis (3.2). 

Corollary 3.1.1 Let {Xn }n≥0 be an irreducible positive recurrent Markov chain
with the stationary distribution π, and let f : E → R be such that
Σ_{i∈E} |f (i)| π(i) < ∞.    (3.5)

Then for any initial distribution µ, Pµ -a.s.,


lim_{N ↑∞} (1/N ) Σ_{k=1}^N f (Xk ) = Σ_{i∈E} f (i) π(i).    (3.6)

Proof. Apply Theorem 3.1.1 to f ≡ 1. Condition (3.2) is satisfied, since in the
positive recurrent case, Σ_{i∈E} xi = E0 [T0 ] < ∞. Therefore, Pµ -a.s.,

lim_{N ↑∞} N/ν(N ) = Σ_{j∈E} xj .

Now, f satisfying (3.5) also satisfies (3.2), since x and π are proportional, and
therefore, Pµ -a.s.,
lim_{N ↑∞} (1/ν(N )) Σ_{k=1}^N f (Xk ) = Σ_{i∈E} f (i) xi .

Combination of the above equalities gives, Pµ -a.s.,


lim_{N →∞} (1/N ) Σ_{k=1}^N f (Xk ) = lim_{N →∞} (ν(N )/N ) (1/ν(N )) Σ_{k=1}^N f (Xk ) = ( Σ_{i∈E} f (i) xi ) / ( Σ_{j∈E} xj ) ,

from which (3.6) follows, since π is obtained by normalization of x. 
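Corollary 3.1.1 lends itself to a quick numerical check. The following sketch (the two-state chain, its parameters, and the function f are illustrative choices, not taken from the text) simulates a chain whose stationary distribution is π = (β, α)/(α + β) = (2/3, 1/3) and compares the empirical average of f with Σi f (i)π(i):

```python
import random

# Two-state chain: p01 = alpha, p10 = beta; for these values the stationary
# distribution is pi = (beta, alpha)/(alpha + beta) = (2/3, 1/3).
alpha, beta = 0.3, 0.6
P = {0: [1 - alpha, alpha], 1: [beta, 1 - beta]}
f = {0: 5.0, 1: -1.0}                     # an arbitrary f : E -> R

random.seed(1)
x, total, n = 0, 0.0, 200_000
for _ in range(n):
    x = 0 if random.random() < P[x][0] else 1
    total += f[x]

empirical = total / n
expected = f[0] * 2/3 + f[1] * 1/3        # sum_i f(i) pi(i) = 3
assert abs(empirical - expected) < 0.05   # the ergodic theorem in action
```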

Corollary 3.1.2 Let {Xn }n≥1 be an irreducible positive recurrent Markov chain
with the stationary distribution π, and let g : E L+1 → R be such that
Σ_{i0 ,...,iL} |g(i0 , . . . , iL )| π(i0 ) pi0 i1 · · · piL−1 iL < ∞

(see Example 3.5.8). Then for all initial distributions µ, Pµ -a.s.,

lim_{N ↑∞} (1/N ) Σ_{k=1}^N g(Xk , Xk+1 , . . . , Xk+L ) = Σ_{i0 ,i1 ,...,iL} g(i0 , i1 , . . . , iL ) π(i0 ) pi0 i1 · · · piL−1 iL .

Proof. Apply Corollary 3.1.1 to the snake chain {(Xn , Xn+1 , . . . , Xn+L )}n≥0 ,
which is irreducible recurrent and admits the stationary distribution

π(i0 )pi0 i1 · · · piL−1 iL .


Note that

Σ_{i0 ,i1 ,...,iL} g(i0 , i1 , . . . , iL ) π(i0 ) pi0 i1 · · · piL−1 iL = Eπ [g(X0 , . . . , XL )] .
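Corollary 3.1.2 with L = 1 and g the indicator of a fixed pair (i, j) says that the empirical frequency of the transition i → j along the path converges to π(i)pij. A simulation sketch (the two-state chain and its parameters are illustrative choices, not from the text):

```python
import random

alpha, beta = 0.3, 0.6                     # stationary distribution pi = (2/3, 1/3)
P = [[1 - alpha, alpha], [beta, 1 - beta]]
pi = [beta / (alpha + beta), alpha / (alpha + beta)]

random.seed(5)
x, n = 0, 300_000
counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
for _ in range(n):
    y = 0 if random.random() < P[x][0] else 1
    counts[(x, y)] += 1                    # one occurrence of the pair (x, y)
    x = y

# Empirical pair frequencies against pi(i) p_ij (snake chain with L = 1).
for (i, j), c in counts.items():
    assert abs(c / n - pi[i] * P[i][j]) < 0.01
```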

3.2 Convergence in variation


The purpose is to bound the “distance” between two probability distributions,
this distance being the variation distance. One interest of this, among others, is
to replace a random element by another for which computations may be easier.

Definition 3.2.1 Let E be a countable space. The distance in variation between


two probability distributions α and β on E is the quantity

dV (α, β) := (1/2) Σ_{i∈E} |α(i) − β(i)|.    (3.7)

That dV is indeed a distance is clear.

Lemma 3.2.1 Let α and β be two probability distributions on the same countable
space E. Then

dV (α, β) = sup_{A⊆E} {α(A) − β(A)} = sup_{A⊆E} {|α(A) − β(A)|} .

Proof. For the second equality observe that for each subset A there is a subset B
such that |α(A) − β(A)| = α(B) − β(B) (take B = A or Ā). For the first equality,
write

α(A) − β(A) = Σ_{i∈E} 1A (i) {α(i) − β(i)} ,

and observe that the right-hand side is maximal for A = {i ∈ E; α(i) > β(i)}.
Therefore, with g(i) = α(i) − β(i),
sup_{A⊆E} {α(A) − β(A)} = Σ_{i∈E} g + (i) = (1/2) Σ_{i∈E} |g(i)| ,

where the equality Σ_{i∈E} g(i) = 0 was taken into account.
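On a small state space, Lemma 3.2.1 can be verified by brute force: compute dV by the half-sum formula (3.7) and by maximizing α(A) − β(A) over all subsets of E (the two distributions below are arbitrary examples):

```python
from itertools import chain, combinations

alpha = {0: 0.5, 1: 0.3, 2: 0.2}
beta  = {0: 0.2, 1: 0.3, 2: 0.5}
E = list(alpha)

# Half-sum formula (3.7).
d_half = 0.5 * sum(abs(alpha[i] - beta[i]) for i in E)

# sup over all 2^|E| subsets A of E of alpha(A) - beta(A).
subsets = chain.from_iterable(combinations(E, k) for k in range(len(E) + 1))
d_sup = max(sum(alpha[i] - beta[i] for i in A) for A in subsets)

assert abs(d_half - d_sup) < 1e-9   # both equal 0.3 for these distributions
```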

The distance in variation between two random variables X and Y with values in
E is the distance in variation between their probability distributions, and it is
denoted (with a slight abuse of notation) by dV (X, Y ). Therefore
dV (X, Y ) := (1/2) Σ_{i∈E} |P (X = i) − P (Y = i)| .

The distance in variation between a random variable X with values in E and a


probability distribution α on E denoted (again with a slight abuse of notation) by
dV (X, α) is defined by
dV (X, α) := (1/2) Σ_{i∈E} |P (X = i) − α(i)| .

The coupling inequality

Coupling two discrete probability distributions π ′ on E ′ and π ′′ on E ′′ consists in


the construction of a probability distribution π on E := E ′ × E ′′ such that the
marginal distributions of π on E ′ and E ′′ respectively are π ′ and π ′′ , that is
Σ_{j∈E ′′} π(i, j) = π ′ (i)  and  Σ_{i∈E ′} π(i, j) = π ′′ (j) .

For two probability distributions α and β on the countable set E, let D(α, β) be
the collection of pairs of random variables (X, Y ) taking their values in E × E,
and with marginal distributions α and β, that is,

P (X = i) = α(i), P (Y = i) = β(i) . (3.8)



Theorem 3.2.1 For any pair (X, Y ) ∈ D(α, β), we have the fundamental coupling inequality

dV (α, β) ≤ P (X ≠ Y ),    (3.9)

and equality is attained by some pair (X, Y ) ∈ D(α, β), which is then said to
realize maximal coincidence.

Proof. For arbitrary A ⊂ E,

P (X ≠ Y ) ≥ P (X ∈ A, Y ∈ Ā) = P (X ∈ A) − P (X ∈ A, Y ∈ A) ≥ P (X ∈ A) − P (Y ∈ A),

and therefore

P (X ≠ Y ) ≥ sup_{A⊆E} {P (X ∈ A) − P (Y ∈ A)} = dV (α, β).

We now construct (X, Y ) ∈ D(α, β) realizing equality. Let U, Z, V , and W be


independent random variables; U takes its values in {0, 1}, and Z, V, W take their
values in E. The distributions of these random variables are given by

P (U = 1) = 1 − dV (α, β),
P (Z = i) = (α(i) ∧ β(i))/ (1 − dV (α, β)) ,
P (V = i) = (α(i) − β(i))+ /dV (α, β) ,
P (W = i) = (β(i) − α(i))+ /dV (α, β) .

Observe that P (V = W ) = 0. Defining

(X, Y ) = (Z, Z) if U = 1
= (V, W ) if U = 0 ,

we have

P (X = i) = P (U = 1, Z = i) + P (U = 0, V = i)
= P (U = 1)P (Z = i) + P (U = 0)P (V = i)
= α(i) ∧ β(i) + (α(i) − β(i))+ = α(i),

and similarly, P (Y = i) = β(i). Therefore, (X, Y ) ∈ D(α, β). Also, P (X = Y ) =
P (U = 1) = 1 − dV (α, β), that is, P (X ≠ Y ) = dV (α, β).
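The construction in the proof translates directly into code. The sketch below (the function names and the example distributions are mine) samples (X, Y) exactly as in the proof, using the auxiliary variables U, Z, V, W, and checks empirically that P(X ≠ Y) ≈ dV(α, β):

```python
import random

def _sample(dist, rng):
    # Inverse-method sampling from a finite distribution given as a dict.
    u, acc = rng.random(), 0.0
    for i, p in dist.items():
        acc += p
        if u < acc:
            return i
    return i                                    # guard against rounding

def maximal_coupling(alpha, beta, rng=random):
    """Sample (X, Y) with marginals alpha, beta and P(X != Y) = d_V(alpha, beta),
    following the construction in the proof of Theorem 3.2.1."""
    d = 0.5 * sum(abs(alpha[i] - beta[i]) for i in alpha)
    if rng.random() < 1 - d:                    # U = 1: set X = Y = Z
        z = _sample({i: min(alpha[i], beta[i]) / (1 - d) for i in alpha}, rng)
        return z, z
    # U = 0: draw V, W from the normalized positive/negative parts (disjoint supports).
    v = _sample({i: max(alpha[i] - beta[i], 0) / d for i in alpha}, rng)
    w = _sample({i: max(beta[i] - alpha[i], 0) / d for i in alpha}, rng)
    return v, w

random.seed(0)
alpha = {0: 0.5, 1: 0.3, 2: 0.2}
beta  = {0: 0.2, 1: 0.3, 2: 0.5}
n = 100_000
mismatch = sum(x != y for x, y in (maximal_coupling(alpha, beta) for _ in range(n))) / n
assert abs(mismatch - 0.3) < 0.02               # d_V(alpha, beta) = 0.3 here
```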

A sequence {Xn }n≥1 of discrete random variables with values in E is said to


converge in distribution to the probability distribution π on E if for all i ∈ E,
limn↑∞ P (Xn = i) = π(i). It is said to converge in variation to this distribution if
lim_{n↑∞} Σ_{i∈E} |P (Xn = i) − π(i)| = 0 .

Observe that Definition 3.2.3 concerns only the marginal distributions of the
stochastic process, not the stochastic process itself. Therefore, if there exists an-
other stochastic process {Xn′ }n≥0 such that Xn and Xn′ have the same distribution
for all n ≥ 0, and if there exists a third one {Xn′′ }n≥0 such that Xn′′ has the
distribution π for all n ≥ 0, then (3.13) follows from

lim_{n↑∞} dV (Xn′ , Xn′′ ) = 0.    (3.10)

This trivial observation is useful because of the resulting freedom in the choice of
{Xn′ } and {Xn′′ }. An interesting situation occurs when there exists a finite random
time τ such that Xn′ = Xn′′ for all n ≥ τ .

Definition 3.2.2 Two stochastic processes {Xn′ }n≥0 and {Xn′′ }n≥0 taking their val-
ues in the same state space E are said to couple if there exists an almost surely
finite random time τ such that

n ≥ τ ⇒ Xn′ = Xn′′ . (3.11)

The random variable τ is called a coupling time of the two processes.

Theorem 3.2.2 For any coupling time τ of {Xn′ }n≥0 and {Xn′′ }n≥0 , we have the
coupling inequality
dV (Xn′ , Xn′′ ) ≤ P (τ > n) . (3.12)

Proof. For all A ⊆ E,

P (Xn′ ∈ A) − P (Xn′′ ∈ A) = P (Xn′ ∈ A, τ ≤ n) + P (Xn′ ∈ A, τ > n)


− P (Xn′′ ∈ A, τ ≤ n) − P (Xn′′ ∈ A, τ > n)
= P (Xn′ ∈ A, τ > n) − P (Xn′′ ∈ A, τ > n)
≤ P (Xn′ ∈ A, τ > n) ≤ P (τ > n).

Inequality (3.12) then follows from Lemma 3.2.1. 

Therefore, if the coupling time is P-a.s. finite, that is limn↑∞ P (τ > n) = 0,

lim_{n↑∞} dV (Xn , π) = lim_{n↑∞} dV (Xn′ , Xn′′ ) = 0 .

Definition 3.2.3 (A) A sequence {αn }n≥0 of probability distributions on E is said


to converge in variation to the probability distribution β on E if

lim_{n↑∞} dV (αn , β) = 0 .

(B) An E-valued random sequence {Xn }n≥0 such that for some probability dis-
tribution π on E,
lim_{n↑∞} dV (Xn , π) = 0,    (3.13)

is said to converge in variation to π.

Kolmogorov’s hmc convergence theorem


Consider a hmc that is irreducible and positive recurrent. If its initial distribution
is the stationary distribution, it keeps the same distribution at all times. The chain
is then said to be in the stationary regime, or in equilibrium, or in steady state.
A question arises naturally: What is the long-run behavior of the chain when the
initial distribution µ is arbitrary? For instance, will it converge to equilibrium? in
what sense? The classical form of the result is that for arbitrary states i and j,
lim_{n↑∞} pij (n) = π(j) ,    (3.14)

if the chain is ergodic, according to the following definition:

Definition 3.2.4 An irreducible positive recurrent and aperiodic hmc is called


ergodic.

If the state space is finite, computation of the n-th iterate of the transition matrix
P is all that we need, in principle, to prove (3.14). Such computation requires some
knowledge of the eigenstructure of P, and there is a famous result of linear algebra,
the Perron–Frobenius theorem, that does the work. We shall give the details in
Subsection 3.2. However, in the case of infinite state space, linear algebra fails to
provide the answer, and recourse to other methods is necessary.
In fact, (3.14) can be drastically improved:
Theorem 3.2.3 Let {Xn }n≥0 be an ergodic hmc on the countable state space E
with transition matrix P and stationary distribution π, and let µ be an arbitrary
initial distribution. Then
lim_{n↑∞} Σ_{i∈E} |Pµ (Xn = i) − π(i)| = 0,

and in particular, for all j ∈ E,


lim_{n↑∞} Σ_{i∈E} |pji (n) − π(i)| = 0.

The proof will be given in Section 3.2.



The coupling proof


The proof of Theorem 3.2.3 will be given via the coupling method.

Proof. We prove that, for all probability distributions µ and ν on E,

lim_{n↑∞} dV (µT Pn , ν T Pn ) = 0.

The announced results correspond to the particular case where ν is the stationary
distribution π, and particularizing further, µ = δj . From the discussion preceding
Definition 3.2.2, it suffices to construct two coupling chains with initial distribu-
tions µ and ν, respectively. This is done in the next lemma. 

Lemma 3.2.2 Let {Xn(1) }n≥0 and {Xn(2) }n≥0 be two independent ergodic hmcs
with the same transition matrix P and initial distributions µ and ν, respectively.
Let τ = inf{n ≥ 0; Xn(1) = Xn(2) }, with τ = ∞ if the chains never intersect. Then
τ is, in fact, almost surely finite. Moreover, the process {Xn′ }n≥0 defined by

Xn′ = Xn(1) if n ≤ τ,  and  Xn′ = Xn(2) if n ≥ τ    (3.15)

is an hmc with transition matrix P.

Proof. Step 1. Consider the product hmc {Zn }n≥0 defined by Zn = (Xn(1) , Xn(2) ).
It takes values in E × E, and the probability of transition from (i, k) to (j, ℓ) in n
steps is pij (n)pkℓ (n). We first show that this chain is irreducible. Since P is irreducible and
aperiodic, by Theorem 1.2.2, there exists m such that for all pairs (i, j) and (k, ℓ),
n ≥ m implies pij (n)pkℓ (n) > 0. This implies irreducibility. (Note the essential
role of aperiodicity. A simple counterexample is that of the symmetric random
walk on ℤ, which is irreducible but of period 2. The product of two independent
such hmc’s is the symmetric random walk on ℤ2 , which has two communication
classes.)
Step 2. Next we show that the two independent chains meet in finite time. Clearly,
the distribution σ̃ defined by σ̃(i, j) := π(i)π(j) is a stationary distribution for
the product chain, where π is the stationary distribution of P. Therefore, by
the stationary distribution criterion, the product chain is positive recurrent. In
particular, it reaches the diagonal of E 2 in finite time, and consequently, P (τ <
∞) = 1.

It remains to show that {Xn′ }n≥0 given by (3.15) is an hmc with transition matrix
P. For this we use the following lemma.

Lemma 3.2.3 Let X01 , X02 , Zn1 , Zn2 (n ≥ 1) be independent random variables, and
suppose moreover that Zn1 , Zn2 (n ≥ 1) are identically distributed. Let τ be a non-
negative integer-valued random variable such that for all m ∈ ℕ, the event {τ = m}
is expressible in terms of X01 , X02 , Zn1 , Zn2 (n ≤ m). Define the sequence {Zn }n≥1 by

Zn = Zn1 if n ≤ τ,  and  Zn = Zn2 if n > τ .

Then, {Zn }n≥1 has the same distribution as {Zn1 }n≥1 and is independent of X01 , X02 .

Proof. For any sets C1 , C2 , A1 , . . . , Ak in the appropriate spaces,


P (X01 ∈ C1 , X02 ∈ C2 , Zℓ ∈ Aℓ , 1 ≤ ℓ ≤ k)
k
X
= P (X01 ∈ C1 , X02 ∈ C2 , Zℓ ∈ Aℓ , 1 ≤ ℓ ≤ k, τ = m)
m=0
+ P (X01 ∈ C1 , X02 ∈ C2 , Z1 ∈ A1 , . . . , Zk ∈ Ak , τ > k)
k
X
= P (X01 ∈ C1 , X02 ∈ C2 , Zℓ1 ∈ Aℓ , 1 ≤ ℓ ≤ m, τ = m, Zr2 ∈ Ar , m + 1 ≤ r ≤ k)
m=0
+ P (X01 ∈ C1 , X02 ∈ C2 , Zℓ1 ∈ Aℓ , 1 ≤ ℓ ≤ k, τ > k) .
Since the event {τ = m} is independent of Zr2 ∈ Ar , m + 1 ≤ r ≤ k, this equals
k
X
= P (X01 ∈ C1 , X02 ∈ C2 , Zℓ1 ∈ Aℓ , 1 ≤ ℓ ≤ m, τ = m)P (Zr2 ∈ Ar , m + 1 ≤ r ≤ k)
m=0
+ P (X01 ∈ C1 , X02 ∈ C2 , Zℓ1 ∈ Aℓ , 1 ≤ ℓ ≤ k, τ > k)
k
X
= P (X01 ∈ C1 , X02 ∈ C2 , Zℓ1 ∈ Aℓ , 1 ≤ ℓ ≤ m, τ = m, Zr1 ∈ Ar , m + 1 ≤ r ≤ k)
m=0
+ P (X01 ∈ C1 , X02 ∈ C2 , Zℓ1 ∈ Aℓ , 1 ≤ ℓ ≤ k, τ > k)
= P (X01 ∈ C1 , X02 ∈ C2 , Z11 ∈ A1 , . . . , Zk1 ∈ Ak ) .


Step 3. We now complete the proof. The statement of the theorem concerns
only the distributions of {Xn1 }n≥0 and {Xn2 }n≥0 , and therefore we can assume a
representation

Xn+1ℓ = f (Xnℓ , Zn+1ℓ ) (ℓ = 1, 2) ,

where X01 , X02 , Zn1 , Zn2 (n ≥ 1) satisfy the conditions stated in Lemma 3.2.3. The
random time τ satisfies the condition of Lemma 3.2.3. Defining {Zn }n≥1 in the
same manner as in this lemma, we therefore have

Xn+1 = f (Xn , Zn+1 ) ,

which proves the announced result. 

Null recurrent case


Theorem 3.2.3 concerns the positive recurrent case. In the null recurrent case we
have Orey’s theorem:
Theorem 3.2.4 Let P be an irreducible null recurrent transition matrix on E.
Then for all i, j ∈ E,
lim_{n↑∞} pij (n) = 0.    (3.16)

Perron–Frobenius
When the state space of a hmc is finite, we can rely on the standard results of linear
algebra to study the asymptotic behavior of the n-step transition matrix Pn , which
depends on the eigenstructure of P. The Perron–Frobenius theorem detailing the
eigenstructure of non-negative matrices is therefore all that is needed, at least in
theory.
The main result of Perron and Frobenius is that convergence to steady state of
an ergodic finite state space hmc is geometric, with relative speed equal to the
second-largest eigenvalue modulus (slem). Even if there are a few interesting
models, especially in biology, where the eigenstructure of the transition matrix
can be extracted, this situation remains nevertheless exceptional. It is therefore
important to find estimates of the slem.
From the basic results of the theory of matrices relative to eigenvalues and eigen-
vectors we quote the following one, relative to a square matrix A of dimension r
with distinct eigenvalues denoted λ1 , . . . , λr . Let u1 , . . . , ur and v1 , . . . , vr be the
associated sequences of left and right eigenvectors, respectively. Then, u1 , . . . , ur
form an independent collection of vectors, and so do v1 , . . . , vr . Also, uTi vj = 0 if
i ≠ j. Since eigenvectors are determined up to multiplication by an arbitrary
non-null scalar, one can choose them in such a way that uTi vi = 1 for all i, 1 ≤ i ≤ r.
We then have the spectral decomposition
An = Σ_{i=1}^r λin vi uiT .    (3.17)

Example 3.2.1: Two-state chain. Consider the transition matrix on E = {1, 2}

P = [[1 − α, α], [β, 1 − β]] ,
where α, β ∈ (0, 1). Its characteristic polynomial (1 − α − λ)(1 − β − λ) − αβ
admits the roots λ1 = 1 and
λ2 = 1 − α − β.
Observe at this point that λ = 1 is always an eigenvalue of a stochastic r×r matrix
P, associated with the right eigenvector v = 1 with all entries equal to 1, since
P1 = 1. Also, the stationary distribution π T = (1/(α + β)) (β, α) is the left eigenvector
corresponding to the eigenvalue 1. In this example, the representation (3.17) takes
the form

Pn = (1/(α + β)) [[β, α], [β, α]] + ((1 − α − β)n /(α + β)) [[α, −α], [−β, β]] ,
and therefore, since |1 − α − β| < 1,
 
lim_{n↑∞} Pn = (1/(α + β)) [[β, α], [β, α]] .

In particular, the result of convergence to steady state,

lim_{n↑∞} Pn = 1π T = P∞ ,

is obtained for this special case in a purely algebraic way. In addition, this algebraic
method gives the convergence speed, which is exponential and determined by the
second-largest eigenvalue absolute value. This is a general fact, which follows from
the Perron–Frobenius theory of non-negative matrices below.
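The spectral formula for Pn in this example can be checked numerically. A sketch with the illustrative values α = 0.3, β = 0.6 (NumPy is used for the matrix powers):

```python
import numpy as np

alpha, beta = 0.3, 0.6
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])

def Pn_spectral(n):
    # P^n = (1/(a+b)) [[b,a],[b,a]] + ((1-a-b)^n/(a+b)) [[a,-a],[-b,b]]
    A = np.array([[beta, alpha], [beta, alpha]]) / (alpha + beta)
    B = np.array([[alpha, -alpha], [-beta, beta]]) / (alpha + beta)
    return A + (1 - alpha - beta) ** n * B

for n in (1, 5, 20):
    assert np.allclose(np.linalg.matrix_power(P, n), Pn_spectral(n))

# Geometric convergence to 1 pi^T at rate |1 - alpha - beta| = 0.1.
P_inf = np.array([[beta, alpha], [beta, alpha]]) / (alpha + beta)
assert np.allclose(np.linalg.matrix_power(P, 50), P_inf)
```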

A matrix A = {aij }1≤i,j≤r with real coefficients is called non-negative (resp., posi-
tive) if all its entries are non-negative (resp., positive). A non-negative matrix A
is called stochastic if Σ_{j=1}^r aij = 1 for all i, and substochastic if Σ_{j=1}^r aij ≤ 1 for
all i, with strict inequality for at least one i.
Non-negativity (resp., positivity) of A will be denoted by A ≥ 0 (resp., A > 0).
If A and B are two matrices of the same dimensions with real coefficients, the
notation A ≥ B (resp., A > B) means that A − B ≥ 0 (resp., A − B > 0).
The communication graph of a square non-negative matrix A is the directed graph
with the state space E = {1, . . . , r} as its set of vertices and a directed edge from
vertex i to vertex j if and only if aij > 0.

A non-negative square matrix A is called irreducible (resp., irreducible aperiodic) if


it has the same communication graph as an irreducible (resp., irreducible aperiodic)
stochastic matrix. It is called primitive if there exists an integer k such that Ak > 0.

Example 3.2.2: A non-negative matrix is primitive if and only if it is irreducible


and aperiodic (Exercise ??).

Let A be a non-negative primitive r × r matrix. There exists a real eigenvalue


λ1 with algebraic as well as geometric multiplicity one such that λ1 > 0, and
λ1 > |λj | for any other eigenvalue λj . Moreover, the left eigenvector u1 and
the right eigenvector v1 associated with λ1 can be chosen positive and such that
uT1 v1 = 1.
Let λ2 , λ3 , . . . , λr be the eigenvalues of A other than λ1 ordered in such a way that

λ1 > |λ2 | ≥ · · · ≥ |λr | (3.18)

We may always order the eigenvalues in such a way that if |λ2 | = |λj | for some
j ≥ 3, then m2 ≥ mj , where mj is the algebraic multiplicity of λj . Then

An = λn1 v1 uT1 + O(nm2 −1 |λ2 |n ). (3.19)

If in addition, A is stochastic (resp., substochastic), then λ1 = 1 (resp., λ1 < 1).


If A is stochastic and irreducible with period d > 1, then there are exactly d
distinct eigenvalues of modulus 1, namely the d-th roots of unity, and all other
eigenvalues have modulus strictly less than 1.

Example 3.2.3: Convergence rates via Perron–Frobenius. If P is a


transition matrix on E = {1, . . . , r} that is irreducible and aperiodic, and therefore
primitive, then
v1 = 1, u1 = π,

where π is the unique stationary distribution. Therefore

Pn = 1π T + O(nm2 −1 |λ2 |n ), (3.20)

which generalizes the observation in Example 3.2.1.



3.3 Monte Carlo


Recall the method of the inverse in order to generate a discrete random variable
Z with distribution P (Z = ai ) = pi (0 ≤ i ≤ K). A crude algorithm based on
this method would perform successively the tests U ≤ p0 ?, U ≤ p0 + p1 ?, . . ., until
the answer is positive. Although very simple in principle, the inverse method has
the following drawbacks when the size r of the state space E is large.
(a) Problems arise that are due to the small size of the intervals partitioning
[0, 1] and to the cost of precision in computing.
(b) Another situation is that in which the probability distribution π is known only
up to a normalizing factor, that is, π(i) = K π̃(i), and when the sum Σ_{i∈E} π̃(i) =
K −1 that gives the normalizing factor is prohibitively difficult to compute. In
physics, this is a frequent case.

Approximate sampling

The quest for a random generator without these ailments is at the origin of the
Monte Carlo Markov chain (mcmc) sampling methodology. The basic principle
is the following. One constructs an irreducible aperiodic hmc {Xn }n≥0 with state
space E and stationary distribution π. Since the state space is finite, the chain
is ergodic, and therefore, by Theorem 3.2.3, for any initial distribution µ and all
i ∈ E,
lim_{n→∞} Pµ (Xn = i) = π(i) .    (3.21)
Therefore, when n is “large,” we can consider that Xn has a distribution close to
π.
The first task is that of designing the mcmc algorithm. One must find an ergodic
transition matrix P on E, the stationary distribution of which is π. In the Monte
Carlo context, the transition mechanism of the chain is called a sampling algorithm,
and the asymptotic distribution π is called the target distribution, or sampled
distribution.
There are infinitely many transition matrices with a given target distribution, and
among them there are infinitely many that correspond to a reversible chain, that
is, such that
π(i)pij = π(j)pji .
We seek solutions of the form
pij = qij αij (3.22)
for j ≠ i, where Q = {qij }i,j∈E is an arbitrary irreducible transition matrix on
E, called the candidate-generator matrix. When the present state is i, the next

tentative state j is chosen with probability qij . When j ≠ i, this new state is
accepted with probability αij . Otherwise, the next state is the same state i. Hence,
the resulting probability of moving from i to j when i ≠ j is given by (3.22). It
remains to select the acceptance probabilities αij .

Example 3.3.1: Metropolis, take 1. In the Metropolis algorithm


 
αij = min { 1, (π(j)qji )/(π(i)qij ) } .

In physics, it often arises, and we shall understand why later, that the distribution
π is of the form

π(i) = e−U (i) /Z ,    (3.23)
where U : E → ℝ is the “energy function” and Z is the “partition function”,
the normalizing constant ensuring that π is indeed a probability vector. The
acceptance probability of the transition from i to j is then, assuming the candidate-
generating matrix to be symmetric,

αij = min { 1, e−(U (j)−U (i)) } .
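A minimal sketch of the Metropolis sampler for a target of the form (3.23) on a small illustrative state space (the energy function U and the proposal are arbitrary choices of mine; the candidate j = i ± 1 (mod r) is symmetric, so the acceptance probability reduces to min(1, e−(U(j)−U(i)))):

```python
import math, random

# Illustrative energy function U on E = {0,...,4}; target pi(i) = exp(-U(i))/Z.
U = [0.0, 1.0, 2.0, 0.5, 1.5]
r = len(U)

def metropolis_step(i, rng):
    j = (i + rng.choice((-1, 1))) % r            # symmetric candidate: q_ij = q_ji
    accept = min(1.0, math.exp(-(U[j] - U[i])))  # Metropolis acceptance probability
    return j if rng.random() < accept else i

rng = random.Random(42)
x, n = 0, 200_000
counts = [0] * r
for _ in range(n):
    x = metropolis_step(x, rng)
    counts[x] += 1

Z = sum(math.exp(-u) for u in U)                 # used only to check the result;
pi = [math.exp(-u) / Z for u in U]               # the sampler never needed Z
assert all(abs(counts[i] / n - pi[i]) < 0.02 for i in range(r))
```

Note that the normalizing constant Z never appears in the sampling loop, which is precisely what makes the method attractive when Z is intractable.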

Example 3.3.2: Barker’s algorithm. The Barker algorithm, corresponds to


the choice

αij = π(j)qji / (π(j)qji + π(i)qij ) .    (3.24)
When the distribution π is of the form (3.23), the acceptance probability of the
transition from i to j is, assuming the candidate-generating matrix to be symmetric,

αij = e−U (i) / (e−U (i) + e−U (j) ) .

This corresponds to the basic principle of statistical thermodynamics: when two
states 1 and 2 with energies E1 and E2 compete, state 1 is chosen with probability
e−E1 /(e−E1 + e−E2 ).

Example 3.3.3: The Gibbs algorithm. Consider a multivariate probability


distribution
π(x(1), . . . , x(N ))

on a set E = ΛN , where Λ is countable. The basic step of the Gibbs sampler for
the multivariate distribution π consists in selecting a coordinate number 1 ≤ i ≤ N
at random, and choosing the new value y(i) of the corresponding coordinate, given
the present values x(1), . . . , x(i − 1), x(i + 1), . . . , x(N ) of the other coordinates,
with probability
π(y(i) | x(1), . . . , x(i − 1), x(i + 1), . . . , x(N )).
One checks as above that π is the stationary distribution of the corresponding
chain.
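The basic Gibbs step can be sketched on the smallest non-trivial case, N = 2 coordinates with Λ = {0, 1} (the table defining π is an arbitrary example of mine): pick a coordinate at random, then resample it from its conditional distribution given the other coordinate.

```python
import random

# Illustrative bivariate target pi(x(1), x(2)) on {0,1}^2, given as a table.
pi = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def gibbs_step(x, rng):
    x = list(x)
    i = rng.randrange(2)                       # select a coordinate at random
    x1, x0 = x.copy(), x.copy()
    x1[i], x0[i] = 1, 0
    # Conditional probability that coordinate i equals 1, given the other one.
    p1 = pi[tuple(x1)] / (pi[tuple(x0)] + pi[tuple(x1)])
    x[i] = 1 if rng.random() < p1 else 0
    return tuple(x)

rng = random.Random(7)
x, n = (0, 0), 200_000
counts = {k: 0 for k in pi}
for _ in range(n):
    x = gibbs_step(x, rng)
    counts[x] += 1
assert all(abs(counts[k] / n - pi[k]) < 0.02 for k in pi)
```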

Exact sampling
We attempt to construct an exact sample of a given π on a finite state space E,
that is, a random variable Z such that P (Z = i) = π(i) for all i ∈ E. The following
algorithm (Propp–Wilson algorithm) is based on a coupling idea. One starts as
usual from an ergodic transition matrix P with stationary distribution π, just as
in the classical mcmc method.
The algorithm is based on a representation of P in terms of a recurrence equation,
that is, for a given function f and an iid sequence {Zn }n≥1 independent of the
initial state, the chain satisfies the recurrence
Xn+1 = f (Xn , Zn+1 ) . (3.25)
The Propp–Wilson algorithm constructs a family of hmcs with this transition
matrix with the help of a unique iid sequence of random vectors {Yn }n∈ℤ , called the
updating sequence, where Yn = (Zn+1 (1), · · · , Zn+1 (r)) is an r-dimensional random
vector, and where the coordinates Zn+1 (i) have a common distribution, that of Z1 .
For each N ∈ ℤ and each k ∈ E, a process {XnN (k)}n≥N is defined recursively by:
XNN (k) = k,
and, for n ≥ N ,
Xn+1N (k) = f (XnN (k), Zn+1 (XnN (k))).

(Thus, if the chain is in state i at time n, it will be at time n + 1 in state j =
f (i, Zn+1 (i)).) Each of these processes is therefore a hmc with the transition matrix
P. Note that for all k, ℓ ∈ E, and all M, N ∈ ℤ, the hmc’s {XnN (k)}n≥N and
{XnM (ℓ)}n≥M use at any time n ≥ max(M, N ) the same updating random vector
Yn+1 .
If, in addition to the independence of {Yn }n∈ℤ , the components Zn+1 (1), Zn+1 (2),
. . . , Zn+1 (r) are, for each n ∈ ℤ, independent, we say that the updating is compo-
nentwise independent.

Definition 3.3.1 The random time

τ + = inf{n ≥ 0; Xn0 (1) = Xn0 (2) = · · · = Xn0 (r)}

is called the forward coupling time (Fig. 3.1). The random time

τ − = inf{n ≥ 1; X0−n (1) = X0−n (2) = · · · = X0−n (r)}

is called the backward coupling time (Fig. 3.1).

Figure 1. Backward and forward coupling (in the realization depicted, τ − = 7 and τ + = 4)

Thus, τ + is the first time at which the chains {Xn0 (i)}n≥0 , 1 ≤ i ≤ r, coalesce.

Lemma 3.3.1 When the updating is componentwise independent, the forward cou-
pling time τ + is almost surely finite.

Proof. Consider the (immediate) extension of Lemma 3.2.2 to the case of r inde-
pendent hmc’s with the same transition matrix. It cannot be applied directly to
our situation, because the chains are not independent. However, the probability
of coalescence in our situation is bounded below by the probability of coalescence
in the completely independent case. To see this, first construct the independent
chains model, using r independent iid componentwise independent updating se-
quences. The difference with our model is that we use too many updatings. In
order to construct from this a set of r chains as in our model, it suffices to use for
two chains the same updatings as soon as they meet. Clearly, the forward cou-
pling time of the so modified model is smaller than or equal to that of the initial
completely independent model. 
62 CHAPTER 3. LONG-RUN BEHAVIOUR

For easier notation, we set τ − = τ . Let

Z = X0−τ (i).

(This random variable is independent of i. In Figure 1, Z = 2.) Then,


Theorem 3.3.1 With a componentwise independent updating sequence, the back-
ward coupling time τ is almost surely finite. Also, the random variable Z has
the distribution π.

Proof. We shall show at the end of the current proof that for all k ∈ ℕ, P (τ ≤
k) = P (τ + ≤ k), and therefore the finiteness of τ follows from that of τ + proven
in the last lemma. Now, since for n ≥ τ , X0−n (i) = Z,

P (Z = j) = P (Z = j, τ > n) + P (Z = j, τ ≤ n)
= P (Z = j, τ > n) + P (X0−n (i) = j, τ ≤ n)
= P (Z = j, τ > n) − P (X0−n (i) = j, τ > n) + P (X0−n (i) = j)
= P (Z = j, τ > n) − P (X0−n (i) = j, τ > n) + pij (n)
= An − Bn + pij (n)

But An and Bn are bounded above by P (τ > n), a quantity that tends to 0 as
n ↑ ∞ since τ is almost-surely finite. Therefore

P (Z = j) = lim_{n↑∞} pij (n) = π(j).

It remains to prove the equality of the distributions of the forward and backward
coupling times. For this, select an arbitrary integer k ∈ ℕ. Consider an updating
sequence constructed from a bona fide updating sequence {Yn }n∈ℤ , by replacing
Y−k+1 , Y−k+2 , . . . , Y0 by Y1 , Y2 , . . . , Yk . Call τ ′ the backward coupling time in the
modified model. Clearly τ and τ ′ have the same distribution.
Figure 2. τ + ≤ k implies τ ′ ≤ k

Suppose that τ + ≤ k. Consider in the modified model the chains starting at


time −k from states 1, . . . , r. They coalesce at time −k + τ + ≤ 0 (see Fig. 2), and
consequently τ ′ ≤ k. Therefore τ + ≤ k implies τ ′ ≤ k, so that

P (τ + ≤ k) ≤ P (τ ′ ≤ k) = P (τ ≤ k).

Figure 3. τ ′ ≤ k implies τ + ≤ k

Now, suppose that τ′ ≤ k. Then, in the modified model, the chains starting at time −k from states 1, . . . , r coalesce at time −k + τ′ ≤ 0, and since the updating variables used on this stretch are Y_1, . . . , Y_k, the chains of the original model starting at time 0 from states 1, . . . , r coalesce by time k (see Figure 3). Therefore τ′ ≤ k implies τ⁺ ≤ k, so that

P(τ ≤ k) = P(τ′ ≤ k) ≤ P(τ⁺ ≤ k).  □

Note that the coalesced value at the forward coupling time is not a sample of π
(see Exercise 3.5.12).
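The backward coupling scheme with componentwise independent updating can be sketched as follows. This is a minimal illustration on a hypothetical 3-state ergodic chain; the transition matrix, the inverse-CDF update rule and the doubling-of-horizon schedule are choices of this sketch, not prescribed by the text.

```python
import random

# Hypothetical ergodic transition matrix on E = {0, 1, 2}.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
r = len(P)

def update(i, u):
    # Move from state i using the uniform draw u (inverse-CDF update).
    acc = 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if u < acc:
            return j
    return r - 1

def propp_wilson(rng=random.Random(0)):
    # Componentwise independent updating: one uniform draw per (time, state),
    # fixed once generated and reused when the horizon is extended.
    updates = {}
    n = 1
    while True:
        # Run all r chains from time -n to 0, reusing stored draws.
        states = list(range(r))
        for t in range(-n, 0):
            states = [update(s, updates.setdefault((t, s), rng.random()))
                      for s in states]
        if len(set(states)) == 1:   # coalescence: Z = X_0^{-n}(i) for any i
            return states[0]
        n *= 2                      # extend the starting time backwards
```

Each call returns one exact sample Z of the stationary distribution π; repeated calls advance the shared random stream and produce iid samples.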

Sandwiching
The above exact sampling algorithm is often prohibitively time-consuming when
the state space is large. However, if the algorithm required the coalescence of
two, instead of r processes, then it would take less time. The Propp and Wilson
algorithm does this in a special, yet not rare, case.
It is now assumed that there exists a partial order relation on E, denoted by ⪯, with a minimal and a maximal element (say, respectively, 1 and r), and that we can perform the updating in such a way that for all i, j ∈ E, all N, and all n ≥ N,

i ⪯ j ⇒ X_n^N(i) ⪯ X_n^N(j).



However, we do not require componentwise independent updating (though the sequence of updating vectors remains iid). The corresponding sampling procedure is called the monotone Propp–Wilson algorithm.
Define the backwards monotone coupling time

τ_m = inf{n ≥ 1 ; X_0^{−n}(1) = X_0^{−n}(r)} .

Figure 4. Monotone Propp–Wilson algorithm (here τ_m = 6)

Theorem 3.3.2 The monotone backwards coupling time τ_m is almost surely finite. Also, the random variable X_0^{−τ_m}(1) = X_0^{−τ_m}(r) has the distribution π.

Proof. We can use most of the proof of Theorem 3.3.1. We need only prove independently that τ⁺ is finite. This is so because τ⁺ is dominated by the first time n ≥ 0 such that X_n^0(r) = 1, and the latter is finite in view of the recurrence assumption. □

Monotone coupling will occur with representations of the form (3.25) such that for all z,

i ⪯ j ⇒ f(i, z) ⪯ f(j, z),

and if for all n ∈ ℕ and all i ∈ {1, . . . , r},

Z_{n+1}(i) = Z_{n+1}.

Example 3.3.4: A dam model. We consider the following model of a dam reservoir. The corresponding hmc, with values in E = {0, 1, . . . , r}, satisfies the recurrence equation

X_{n+1} = min{r, max(0, X_n + Z_{n+1})},

where, as usual, {Zn }n≥1 is iid. In this specific model, Xn is the content at time
n of a dam reservoir with maximum capacity r, and Zn+1 = An+1 − c, where An+1
is the input into the reservoir during the time period from n to n + 1, and c is the
maximum release during the same period. The updating rule is then monotone.
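A sketch of the monotone Propp–Wilson algorithm for this dam model follows. The input distribution for A_{n+1}, the capacity r = 10 and the release c = 1 are hypothetical choices of this sketch; by the sandwiching property, only the two extreme chains, started at 0 and r, need to be simulated.

```python
import random

r, c = 10, 1                       # hypothetical capacity and release

def f(x, z):
    # the monotone update rule X_{n+1} = min(r, max(0, X_n + Z_{n+1}))
    return min(r, max(0, x + z))

def monotone_propp_wilson(rng=random.Random(1)):
    zs = []                        # Z_0, Z_{-1}, Z_{-2}, ..., fixed once drawn
    n = 1
    while True:
        while len(zs) < n:
            # hypothetical input A uniform on {0, 1, 2}, so Z = A - c
            zs.append(rng.choice([0, 1, 2]) - c)
        lo, hi = 0, r              # chains from the minimal and maximal states
        for t in range(n - 1, -1, -1):
            lo, hi = f(lo, zs[t]), f(hi, zs[t])
        if lo == hi:               # sandwiching: all chains have coalesced
            return lo
        n *= 2                     # extend the starting time backwards
```

For this particular input distribution the chain is a reflected lazy random walk whose stationary distribution is uniform on {0, . . . , 10}.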

3.4 Absorption
Before absorption
We now consider the absorption problem for hmc's based only on the transition matrix P, not necessarily assumed irreducible. The state space E is then decomposable as E = T + Σ_j R_j, where R_1, R_2, . . . are the disjoint recurrent classes and T is the collection of transient states. (Note that the number of recurrent classes as well as the number of transient states may be infinite.) The transition matrix can therefore be block-partitioned as

P =
⎡ P_1    0     ⋯   0 ⎤
⎢ 0      P_2   ⋯   0 ⎥
⎢ ⋮      ⋮     ⋱   ⋮ ⎥
⎣ B(1)   B(2)  ⋯   Q ⎦

or, in condensed notation,

P =
⎡ D   0 ⎤
⎣ B   Q ⎦ .        (3.26)
This structure of the transition matrix accounts for the fact that one cannot go
from a state in a given recurrent class to any state not belonging to this recurrent
class. In other words, a recurrent class is closed.
What is the probability of being absorbed by a given recurrent class when starting
from a given transient state? This kind of problem was already addressed when
the first-step analysis method was introduced. It led to systems of linear equations
with boundary conditions, for which the solution was unique, due to the finiteness
of the state space. With an infinite state space, the uniqueness issue cannot be
overlooked, and the absorption problem will be reconsidered with this in mind,
and also with the intention of finding general matrix-algebraic expressions for the
solutions. Another phenomenon not manifesting itself in the finite case is the

possibility, when the set of transient states is infinite, of never being absorbed by
the recurrent set. We shall consider this problem first, and then proceed to derive
the distribution of the time to absorption by the recurrent set, and the probability
of being absorbed by a given recurrent class.
Let A be a subset of the state space E (typically the set of transient states, but not necessarily). We aim at computing, for any initial state i ∈ A, the probability of remaining forever in A,

v(i) = P_i(X_r ∈ A for all r ≥ 0).

Defining v_n(i) := P_i(X_1 ∈ A, . . . , X_n ∈ A), we have, by monotone sequential continuity,

lim_{n↑∞} ↓ v_n(i) = v(i).

But for j ∈ A,

P_i(X_1 ∈ A, . . . , X_{n−1} ∈ A, X_n = j) = Σ_{i_1∈A} · · · Σ_{i_{n−1}∈A} p_{i i_1} · · · p_{i_{n−1} j}

is the general term q_{ij}(n) of the n-th iterate of the restriction Q of P to the set A. Therefore v_n(i) = Σ_{j∈A} q_{ij}(n), that is, in vector notation,

v_n = Q^n 1_A ,

where 1_A is the column vector indexed by A with all entries equal to 1. From this equality we obtain

v_{n+1} = Q v_n ,

and by dominated convergence, v = Qv. Moreover, 0_A ≤ v ≤ 1_A, where 0_A is the column vector indexed by A with all entries equal to 0. The above result can be refined as follows:

Theorem 3.4.1 The vector v is the maximal solution of

v = Qv,   0_A ≤ v ≤ 1_A .

Moreover, either v = 0_A or sup_{i∈A} v(i) = 1. In the case of a finite transient set T, the probability of infinite sojourn in T is null.

Proof. Only maximality and the last statement remain to be proven. To prove maximality, consider a vector u indexed by A such that u = Qu and 0_A ≤ u ≤ 1_A. Iteration of u = Qu yields u = Q^n u, and u ≤ 1_A implies that Q^n u ≤ Q^n 1_A = v_n. Therefore u ≤ v_n, which gives u ≤ v by passage to the limit.

To prove the last statement of the theorem, let c = sup_{i∈A} v(i). From v ≤ c 1_A, we obtain v ≤ c v_n as above, and therefore, in the limit, v ≤ cv. This implies either v = 0_A or c = 1.

When the set T is finite, the probability of infinite sojourn in T is null, because otherwise at least one transient state would be visited infinitely often. □

The equation v = Qv reads

v(i) = Σ_{j∈A} p_{ij} v(j)   (i ∈ A) .

First-step analysis gives this equality as a necessary condition. However, it does


not help to determine which solution to choose, in case there are several.

Example 3.4.1: The repair shop once more. We shall prove in a different way a result already obtained in Subsection 2.4, namely: the chain is recurrent if and only if ρ ≤ 1. Observe that the restriction of P to A_i := {i + 1, i + 2, . . .}, namely

Q =
⎡ a_1   a_2   a_3   ⋯ ⎤
⎢ a_0   a_1   a_2   ⋯ ⎥
⎢       a_0   a_1   ⋯ ⎥
⎣             ⋯       ⎦ ,
does not depend on i ≥ 0. In particular, the maximal solution of v = Qv, 0_A ≤ v ≤ 1_A, when A ≡ A_i has, in view of Theorem 3.4.1, the following two interpretations. Firstly, for i ≥ 1, 1 − v(i) is the probability of visiting 0 when starting from i. Secondly, 1 − v(1) is the probability of visiting {0, 1, . . . , i} when starting from i + 1. But when starting from i + 1, the chain visits {0, 1, . . . , i} if and only if it visits i, and therefore 1 − v(1) is also the probability of visiting i when starting from i + 1. The probability of visiting 0 when starting from i + 1 is therefore

1 − v(i + 1) = (1 − v(1))(1 − v(i)),

because in order to go from i + 1 to 0, one must first reach i, and then go from i to 0.


Therefore, for all i ≥ 1,

v(i) = 1 − β^i ,

where β = 1 − v(1). To determine β, write the first equality of v = Qv:

v(1) = a_1 v(1) + a_2 v(2) + · · · ,

that is,

(1 − β) = a_1 (1 − β) + a_2 (1 − β²) + · · · .

Since Σ_{i≥0} a_i = 1, this reduces to

β = g(β) ,    (⋆)

where g is the generating function of the probability distribution (a_k, k ≥ 0). Moreover, all the other equations of v = Qv reduce to (⋆).

Under the irreducibility assumptions a_0 > 0 and a_0 + a_1 < 1, equation (⋆) has only one solution in [0, 1] when ρ ≤ 1, namely β = 1, whereas when ρ > 1 it has two solutions in [0, 1], namely β = 1 and β = β_0 ∈ (0, 1). We must take the smallest solution, since v is the maximal solution. Therefore, if ρ > 1, the probability of visiting state 0 when starting from state i ≥ 1 is 1 − v(i) = β_0^i < 1, and therefore the chain is transient. If ρ ≤ 1, the latter probability is 1 − v(i) = 1, and therefore the chain is recurrent.
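The fixed-point equation (⋆) is easy to solve numerically. Below is a minimal sketch with a hypothetical distribution a = (0.2, 0.3, 0.5), for which ρ = 1.3 > 1; iterating β ← g(β) from 0 converges to the smallest root β₀ (here the roots of β = g(β) in [0, 1] are 0.4 and 1):

```python
# hypothetical distribution (a_0, a_1, a_2) with rho = a_1 + 2*a_2 = 1.3 > 1
a = [0.2, 0.3, 0.5]

def g(beta):
    # generating function of the distribution (a_k)
    return sum(ak * beta**k for k, ak in enumerate(a))

beta = 0.0
for _ in range(200):               # fixed-point iteration started from 0
    beta = g(beta)

# beta is now (numerically) the smallest root beta_0 = 0.4; the probability
# of visiting 0 from state i is then 1 - v(i) = beta_0**i < 1 (transience)
```

The iteration converges to the smallest root because g is increasing and g′(β₀) < 1 at that root.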

Example 3.4.2: 1-D random walk, take 5. The transition matrix of the
random walk on ℕ with a reflecting barrier at 0,

P =
⎡ 0   1            ⎤
⎢ q   0   p        ⎥
⎢     q   0   p    ⎥
⎢         q   0   p ⎥
⎣             ⋱    ⎦ ,

where p ∈ (0, 1) and q = 1 − p, is clearly irreducible. Intuitively, if p > q, there is a drift to the right, and one expects the chain to be transient. This will be proven formally by showing that the probability v(i) of never visiting state 0 when starting from state i ≥ 1 is strictly positive. In order to apply Theorem 3.4.1 with A = ℕ \ {0}, we must find the general solution of u = Qu. This equation reads

u(1) = p u(2),
u(2) = q u(1) + p u(3),
u(3) = q u(2) + p u(4),
· · ·

and its general solution is u(i) = u(1) Σ_{j=0}^{i−1} (q/p)^j. The largest value of u(1) respecting the constraint u(i) ∈ [0, 1] is u(1) = 1 − (q/p). The solution v(i) is therefore

v(i) = 1 − (q/p)^i .
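The closed form v(i) = 1 − (q/p)^i is easy to check by simulation. Below is a rough Monte Carlo sketch; the parameters p = 0.7, starting state i = 2, horizon and run count are hypothetical choices, and truncating each run at a finite horizon introduces only a negligible bias when p > q.

```python
import random

p, i0 = 0.7, 2                     # hypothetical parameters (p > q)
horizon, runs = 500, 4000
rng = random.Random(42)

def escapes():
    # does the walk started at i0 avoid state 0 up to the horizon?
    x = i0
    for _ in range(horizon):
        if x == 0:
            return False
        x += 1 if rng.random() < p else -1
    return True

est = sum(escapes() for _ in range(runs)) / runs
exact = 1 - ((1 - p) / p) ** i0    # v(2) = 1 - (3/7)^2, about 0.816
```

Since surviving trajectories drift to the right at rate p − q, the probability of a return to 0 after the horizon is vanishingly small, so the truncation bias is negligible compared to the Monte Carlo error.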

Time to absorption
We now turn to the determination of the distribution of τ, the time of exit from the transient set T. Theorem 3.4.1 tells us that v = {v(i)}_{i∈T}, where v(i) = P_i(τ = ∞), is the largest solution of v = Qv subject to the constraints 0_T ≤ v ≤ 1_T, where Q is the restriction of P to the transient set T. The probability distribution of τ when the initial state is i ∈ T is readily computed starting from the identity

P_i(τ = n) = P_i(τ ≥ n) − P_i(τ ≥ n + 1)

and the observation that for n ≥ 1, {τ ≥ n} = {X_{n−1} ∈ T}, from which we obtain, for n ≥ 1,

P_i(τ = n) = P_i(X_{n−1} ∈ T) − P_i(X_n ∈ T) = Σ_{j∈T} (p_{ij}(n − 1) − p_{ij}(n)).

Now, for i, j ∈ T, p_{ij}(n) is the general term of Q^n, and therefore:

Theorem 3.4.2

P_i(τ = n) = {(Q^{n−1} − Q^n) 1_T}_i .    (3.27)

In particular, if P_i(τ = ∞) = 0,

P_i(τ > n) = {Q^n 1_T}_i .

Proof. Only the last statement remains to be proved. From (3.27),

P_i(n < τ ≤ n + m) = Σ_{j=0}^{m−1} {(Q^{n+j} − Q^{n+j+1}) 1_T}_i = {(Q^n − Q^{n+m}) 1_T}_i ,

and therefore, if P_i(τ = ∞) = 0, the second formula follows by letting m ↑ ∞. □
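Numerically, the tail P_i(τ > n) = {Q^n 1_T}_i is obtained by repeated matrix–vector products. Below is a minimal sketch on a hypothetical 2-state transient block Q (not taken from the text):

```python
# hypothetical substochastic restriction Q to a two-state transient set
Q = [[0.50, 0.25],
     [0.25, 0.25]]

def matvec(M, v):
    # product of a matrix (list of rows) with a column vector
    return [sum(m * x for m, x in zip(row, v)) for row in M]

tails = [[1.0, 1.0]]               # tails[n] = Q^n 1_T, i.e. P_i(tau > n)
for _ in range(50):
    tails.append(matvec(Q, tails[-1]))

# By (3.27), P_i(tau = n) = tails[n-1][i] - tails[n][i]; these masses
# telescope, so their sum over n <= 50 equals 1 - tails[50][i]
mass = sum(tails[n - 1][0] - tails[n][0] for n in range(1, 51))
```

Since Q here is strictly substochastic, Q^n 1_T → 0 and the absorption time is almost surely finite, so the total mass approaches 1.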

Absorption destination
We seek to compute the probability of absorption by a given recurrent class when starting from a given transient state. As we shall see later, it suffices for the theory to treat the case where the recurrent classes are singletons. We therefore suppose that the transition matrix has the form

P =
⎡ I   0 ⎤
⎣ B   Q ⎦ .        (3.28)

Let f_{ij} be the probability of absorption by the recurrent class R_j = {j} when starting from the transient state i. We have

P^n =
⎡ I     0   ⎤
⎣ L_n   Q^n ⎦ ,

where L_n = (I + Q + · · · + Q^{n−1})B. Therefore, lim_{n↑∞} L_n = SB, where S is the potential matrix (I − Q)^{−1}. For i ∈ T, the (i, j) entry of L_n is

L_n(i, j) = P(X_n = j | X_0 = i).

Now, if T_{R_j} is the first time of visit to R_j after time 0, then

L_n(i, j) = P_i(T_{R_j} ≤ n),

since the state j is absorbing. Letting n go to ∞ gives the following:


Theorem 3.4.3 For an hmc with transition matrix P of the form (3.28), the probability of absorption by the recurrent class R_j = {j} starting from the transient state i is

P_i(T_{R_j} < ∞) = (SB)_{ij} .


The general case, where the recurrent classes are not necessarily singletons, can be reduced to the singleton case as follows. Let P* be the matrix obtained from the transition matrix P by grouping, for each j, the states of the recurrent class R_j into a single state ĵ:

P* =
⎡ 1      0      ⋯   0 ⎤
⎢ 0      1      ⋯   0 ⎥
⎢ ⋮      ⋮      ⋱   0 ⎥        (3.29)
⎣ b_{1̂}  b_{2̂}  ⋯   Q ⎦

where b_{ĵ} = B(j)1 is obtained by summation of the columns of B(j), the matrix consisting of the columns i ∈ R_j of B. The probability f_{iR_j} of absorption by class R_j when starting from i ∈ T equals f̂_{iĵ}, the probability of ever visiting ĵ when starting from i, computed for the chain with transition matrix P*.

Example 3.4.3: Sibmating. In the reproduction model called sibmating (sister–


brother mating), two individuals are mated and two individuals from their offspring
are chosen at random to be mated, and this incestuous process goes on through
the subsequent generations.
We shall denote by Xn the genetic type of the mating pair at the nth genera-
tion. Clearly, {Xn }n≥0 is a hmc with six states representing the different pairs
of genotypes AA × AA, aa × aa, AA × Aa, Aa × Aa, Aa × aa, AA × aa, de-
noted respectively 1, 2, 3, 4, 5, 6. The following table gives the probabilities of

occurrence of the three possible genotypes in the descent of a mating pair:

                       descendant's genotype
  parents' genotype     AA      Aa      aa
      AA × AA           1       0       0
      aa × aa           0       0       1
      AA × Aa           1/2     1/2     0
      Aa × Aa           1/4     1/2     1/4
      Aa × aa           0       1/2     1/2
      AA × aa           0       1       0
The transition matrix of {X_n}_{n≥0} is then easily deduced:

P =
⎡ 1     0     0     0     0     0   ⎤
⎢ 0     1     0     0     0     0   ⎥
⎢ 1/4   0     1/2   1/4   0     0   ⎥
⎢ 1/16  1/16  1/4   1/4   1/4   1/8 ⎥
⎢ 0     1/4   0     1/4   1/2   0   ⎥
⎣ 0     0     0     1     0     0   ⎦ .
The set R = {1, 2} is absorbing, and the restriction of the transition matrix to the transient set T = {3, 4, 5, 6} is

Q =
⎡ 1/2   1/4   0     0   ⎤
⎢ 1/4   1/4   1/4   1/8 ⎥
⎢ 0     1/4   1/2   0   ⎥
⎣ 0     1     0     0   ⎦ .
We find

S = (I − Q)^{−1} = (1/6) ×
⎡ 16   8    4    1 ⎤
⎢ 8    16   8    2 ⎥
⎢ 4    8    16   1 ⎥
⎣ 8    16   8    8 ⎦ ,
and the absorption probability matrix is

        ⎡ 1/4    0    ⎤   ⎡ 3/4   1/4 ⎤
SB = S  ⎢ 1/16   1/16 ⎥ = ⎢ 1/2   1/2 ⎥
        ⎢ 0      1/4  ⎥   ⎢ 1/4   3/4 ⎥
        ⎣ 0      0    ⎦   ⎣ 1/2   1/2 ⎦ .

For instance, the (3, 2) entry, 3/4, is the probability that, when starting from a couple of ancestors of type Aa × aa, the lineage will end up in genotype aa × aa.
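These computations are readily checked numerically. The sketch below uses NumPy to recompute S = (I − Q)^{−1} and SB from the Q and B blocks above:

```python
import numpy as np

# transient block Q (states 3, 4, 5, 6) and absorption block B (into 1 and 2)
Q = np.array([[1/2,  1/4,  0,    0  ],
              [1/4,  1/4,  1/4,  1/8],
              [0,    1/4,  1/2,  0  ],
              [0,    1,    0,    0  ]])
B = np.array([[1/4,  0   ],
              [1/16, 1/16],
              [0,    1/4 ],
              [0,    0   ]])

S = np.linalg.inv(np.eye(4) - Q)   # potential matrix of the transient block
absorb = S @ B                     # absorption probabilities SB

# each row of absorb sums to 1: since the transient set is finite,
# absorption in {1} or {2} is certain from any transient state
```
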

3.5 Exercises
Exercise 3.5.1. abbabaa!
A sequence of A's and B's is formed as follows. The first item is chosen at random, P(A) = P(B) = 1/2, as is the second item, independently of the first one. When the first n ≥ 2 items have been selected, the (n + 1)st is chosen, independently of the letters in positions k ≤ n − 2, conditionally on the pair in positions n − 1 and n, as follows:

P(A | AA) = 1/2,  P(A | AB) = 1/2,  P(A | BA) = 1/4,  P(A | BB) = 1/4.
What is the proportion of A’s and B’s in a long chain?

Exercise 3.5.2. Fixed-age retirement policy.


Let {U_n}_{n≥1} be a sequence of iid random variables taking their values in ℕ₊ = {1, 2, . . .}. The random variable U_n is interpreted as the lifetime of some equipment, or “machine”, the nth one, which is replaced by the (n + 1)st one upon failure. Thus at time 0, machine 1 is put in service until it breaks down at time U_1, whereupon it is immediately replaced by machine 2, which breaks down at time U_1 + U_2, and so on. The time to next failure of the current machine at time n is denoted by X_n. More precisely, the process {X_n}_{n≥0} takes its values in E = ℕ, equals 0 at time R_k = Σ_{i=1}^{k} U_i, equals U_{k+1} − 1 at time R_k + 1, and then decreases by one unit per unit of time until it reaches the value 0 at time R_{k+1}. It is assumed that for all k ∈ ℕ₊, P(U_1 > k) > 0, so that the state space E is ℕ. Then {X_n}_{n≥0} is an irreducible hmc called the forward recurrence chain. We assume positive recurrence, that is, E[U] < ∞, where U = U_1.
A. Show that the chain is irreducible. Give the necessary and sufficient condition
for positive recurrence. Assuming positive recurrence, what is the stationary dis-
tribution? A visit of the chain to state 0 corresponds to a breakdown of a machine.
What is the empirical frequency of breakdowns?
B. Suppose that the cost of a breakdown is so important that it is better to replace
a working machine during its lifetime (breakdown implies costly repairs, whereas
replacement only implies moderate maintenance costs). The fixed-age retirement
policy fixes an integer T ≥ 1 and requires that a machine having reached age T
be immediately replaced. What is the empirical frequency of breakdowns (not
replacements)?

Exercise 3.5.3. Convergence speed via coupling.


Suppose that the coupling time τ in Theorem 3.2.3 satisfies

E[ψ(τ)] < ∞

for some non-decreasing function ψ : ℕ → ℝ₊ such that lim_{n↑∞} ψ(n) = ∞. Show that for any initial distributions µ and ν,

|µ^T P^n − ν^T P^n| = o(1/ψ(n)).

Exercise 3.5.4.
Let {Z_n}_{n≥1} be an iid sequence of {0, 1}-valued random variables with P(Z_n = 1) = p ∈ (0, 1). Show that for all k ≥ 1,

lim_{n↑∞} P(Z_1 + Z_2 + · · · + Z_n is divisible by k) = 1/k .

Hint: consider the chain Z_1 + · · · + Z_n modulo k.

Exercise 3.5.5.
Let P be an ergodic transition matrix on the finite state space E. Prove that
for any initial distributions µ and ν, one can construct two hmc’s {Xn }n≥0 and
{Yn }n≥0 on E with the same transition matrix P, and the respective initial dis-
tributions µ and ν, in such a way that they couple at a finite time τ such that
E[eατ ] < ∞ for some α > 0.

Exercise 3.5.6. The lazy random walk on the circle.
Consider N points on a circle forming the state space E := {0, 1, . . . , N − 1}. Two points i, j are said to be neighbours if j = i ± 1 modulo N. Consider the Markov chain {(X_n, Y_n)}_{n≥0} with state space E × E, representing two particles moving on E as follows. At each time n, choose X_n or Y_n with probability 1/2 and move the corresponding particle to the left or to the right equiprobably, while the other particle remains still. The initial positions of the particles are a and b respectively. Compute the average time it takes until the two particles collide (the average coupling time of two lazy random walks).

Exercise 3.5.7. Coupling time for the 2-state hmc.


Find the distribution of the first meeting time of two independent hmc's with state space E = {1, 2} and transition matrix

P =
⎡ 1−α   α   ⎤
⎣ β     1−β ⎦ ,

where α, β ∈ (0, 1), when their initial states are different.

Exercise 3.5.8. The snake chain.



Let {X_n}_{n≥0} be an hmc with state space E and transition matrix P. Define, for L ≥ 1, Y_n = (X_n, X_{n+1}, . . . , X_{n+L}).
(a) The process {Y_n}_{n≥0} takes its values in F = E^{L+1}. Prove that {Y_n}_{n≥0} is an hmc and give the general entry of its transition matrix. (The chain {Y_n}_{n≥0} is called the snake chain of length L + 1 associated with {X_n}_{n≥0}.)
(b) Show that if {X_n}_{n≥0} is irreducible, then so is {Y_n}_{n≥0} if we restrict the state space of the latter to be F = {(i_0, . . . , i_L) ∈ E^{L+1} ; p_{i_0 i_1} p_{i_1 i_2} · · · p_{i_{L−1} i_L} > 0}. Show that if the original chain is irreducible and aperiodic, so is the snake chain.
(c) Show that if {X_n}_{n≥0} has a stationary distribution π, then {Y_n}_{n≥0} also has a stationary distribution. Which one?

Exercise 3.5.9. Target time.


Let π be the stationary distribution of an ergodic Markov chain with finite state
space, and denote by Ti the return time to state i. Let SZ be the time necessary
to visit for the first time the random state Z chosen according to the distribution
π, independently of the chain. Show that Ei [SZ ] is independent of i, and give its
expression in terms of the fundamental matrix.

Exercise 3.5.10. Mean time between successive visits of a set.


Let {Xn }n≥0 be an irreducible positive recurrent hmc with stationary distribution
π. Let A be a subset of the state space E and let {τ (k)}k≥1 be the sequence of
return times to A. Show that

lim_{k↑∞} τ(k)/k = 1 / Σ_{i∈A} π(i) .

(This extends Formula (2.5).)

Exercise 3.5.11. Irreducibility of the Barker sampling chain.


Show that for both the Metropolis and Barker samplers, if Q is irreducible and U
is not a constant, then P(T ) is irreducible and aperiodic for all T > 0.

Exercise 3.5.12. Forward coupling does not yield exact sampling.


Refer to the Propp–Wilson algorithm. Show that the coalesced value at the for-
wards coupling time is not a sample of π. For a counterexample use the two-state
hmc with E = {1, 2}, p1,2 = 1, p2,2 = p2,1 = 1/2.

Exercise 3.5.13. The modified random walk.


Consider the usual random walk on a graph. Its stationary distribution is in
general non-uniform. We wish to modify it so as to obtain a hmc with uniform

stationary distribution. We now accept a transition from vertex i to vertex j of the original random walk with probability α_{ij}. Find one such acceptance probability, depending only on d(i) and d(j), that guarantees that the corresponding Monte Carlo Markov chain admits the uniform distribution as its stationary distribution.
