Markov Chains (4728)
Paris Dauphine
Constantin Dalyac
Supervisor: Cyril Labbé
Introduction
Markov chains were introduced by Andrei Markov in the early 20th century in an
argument with Pavel Nekrasov, who claimed that independence was necessary for
the weak law of large numbers to hold. Markov showed in 1906 that, under
some conditions, averages along a Markov chain converge to a stationary
distribution, thus proving a weak law of large numbers without the assumption
of independence. Today, Markov chains are used in many domains, ranging from
biology and physics to speech recognition. Google modelled websites and links
as a Markov chain: exploiting its mathematical properties was key in making
Google the most-used search engine in the world. We will see in the mathematical
introduction that Markov chains can be described with matrices; a central aim of
this paper is to use the tools of linear algebra to understand the different
properties of Markov chains, illustrating them with examples simulated in
Matlab. We will first explore the different characteristics of Markov chains
and the way they evolve in time.
Part I
Matrix Representation of Markov
Chains
Let Ω be a finite set of the form Ω = {x1 , x2 , . . . , xN }. A finite Markov chain is
a process which moves along the elements of Ω in the following manner: when at
xi ∈ Ω, the next position is chosen according to a fixed probability distribution
P (xi , ·). More precisely, a sequence of random variables (X₀, X₁, . . .) is a Markov
chain with state space Ω and transition matrix P if for all i, j ∈ ⟦1; N⟧, all t ≥ 1,
and all events H_{t−1} = ∩_{s=0}^{t−1} {X_s = y_s}, y_s ∈ Ω, satisfying P(H_{t−1} ∩ {X_t = x_i}) > 0, we have:

P(X_{t+1} = x_j | H_{t−1} ∩ {X_t = x_i}) = P(x_i, x_j).
This equation is called the Markov property: the conditional probability of
moving from state x_i to state x_j does not depend on the states preceding x_i.
Hence all the information on the Markov chain is contained in a matrix
P ∈ M_N([0; 1]). P is a stochastic matrix, i.e. its entries are all non-negative and

∑_{y∈Ω} P(x, y) = 1, for all x ∈ Ω.
Suppose the chain starts in state x₂, i.e. with initial distribution

µ₀ := (0, 1, 0, . . . , 0).

Then the distribution of X_t is given by µ_t = µ₀P^t. A question then arises:
can we expect µ_t to converge to a certain distribution when t goes to infinity?
And if so, does the long-term distribution depend on the initial distribution µ₀?
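The evolution µ_t = µ₀P^t can be simulated directly. The paper's simulations are in Matlab; as a minimal stand-in, here is a NumPy sketch with a hypothetical 3-state transition matrix (the matrix is an assumption for illustration, not one of the chains studied in this paper):

```python
import numpy as np

# Hypothetical 3-state stochastic matrix: each row sums to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

mu = np.array([0.0, 1.0, 0.0])  # start in state x_2, as in the text
for t in range(100):
    mu = mu @ P                 # mu_{t+1} = mu_t P

print(np.round(mu, 4))          # mu_100, essentially the stationary distribution
```

Iterating the row-vector/matrix product is all that is needed; after 100 steps µ_t has numerically converged, and one can check that it no longer moves under P.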
4
Part II
Markov Chain Mixing
Since we are interested in quantifying the speed of convergence of Markov chains,
we need to choose an appropriate metric for measuring the distance between
distributions.
The total variation distance between two probability distributions
µ and ν on Ω is defined by

‖µ − ν‖_TV := max_{A⊆Ω} |µ(A) − ν(A)| = (1/2) ∑_{x∈Ω} |µ(x) − ν(x)|.
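Since the total variation distance is half the L1 distance between the distribution vectors, it is straightforward to compute. A small sketch (the two distributions below are hypothetical examples):

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance: half the L1 distance between the vectors."""
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
print(tv_distance(mu, nu))  # 0.3
```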
π(x)P(x, y) = π(y)P(y, x), for all x, y ∈ Ω.    (2)

These equations are called the detailed balance equations.
Proposition 3.1. Let P be the transition matrix of a Markov chain with state
space Ω. Any distribution π satisfying the detailed balance equations is station-
ary for P.
Proof. π is a stationary distribution iff π = πP. Let π̃ = πP. Then for all j ∈ ⟦1; N⟧,

π̃(x_j) = ∑_{i=1}^{N} π(x_i)P(x_i, x_j) = ∑_{i=1}^{N} π(x_j)P(x_j, x_i) = π(x_j),

since P is stochastic. Hence π̃ = π.
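Proposition 3.1 can be checked numerically: build π from the detailed balance equations for a small birth-death chain and verify that πP = π. The 4-state chain below is a hypothetical example, not one of the chains studied in this paper:

```python
import numpy as np

# Hypothetical 4-state birth-death chain: only nearest-neighbour moves,
# so a distribution satisfying detailed balance always exists.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.3, 0.2, 0.5, 0.0],
              [0.0, 0.3, 0.2, 0.5],
              [0.0, 0.0, 0.3, 0.7]])

# Build pi from detailed balance: pi(x_{k+1}) = pi(x_k) P(x_k, x_{k+1}) / P(x_{k+1}, x_k).
w = [1.0]
for k in range(3):
    w.append(w[-1] * P[k, k + 1] / P[k + 1, k])
pi = np.array(w) / sum(w)

# Detailed balance (2) holds: the matrix pi(x)P(x, y) is symmetric ...
M = pi[:, None] * P
assert np.allclose(M, M.T)
# ... hence pi is stationary, as Proposition 3.1 asserts.
assert np.allclose(pi @ P, pi)
```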
Furthermore, when (2) holds,

P_π{X₀ = x₀, · · · , X_N = x_N} = P_π{X₀ = x_N, · · · , X_N = x₀}.

In other words, if a chain (X_t) satisfies (2) and has stationary initial distribution
π, then the distribution of (X₀, X₁, · · · , X_N) is the same as the distribution of
(X_N, X_{N−1}, · · · , X₀). For this reason, a chain satisfying (2) is called reversible.
4.1 The relaxation time
For a reversible transition matrix P, we label the eigenvalues of P in decreasing
order:
1 = λ1 > λ2 ≥ · · · ≥ λ|Ω| ≥ −1.
We define λ⋆ := max{|λ| : λ is an eigenvalue of P, λ ≠ 1}.
The difference γ⋆ := 1 − λ⋆ is called the absolute spectral gap. Lemma
4.1 implies that if P is aperiodic and irreducible, then γ⋆ > 0. The spectral gap of a
reversible chain is defined by γ := 1 − λ₂.
The relaxation time t_rel of a reversible Markov chain with absolute
spectral gap γ⋆ is defined to be

t_rel := 1/γ⋆.
Theorem 2. Let P be the transition matrix of a reversible, irreducible Markov
chain with state space Ω, and let π_min := min_{x∈Ω} π(x). Then

(t_rel − 1) log(1/(2ε)) ≤ t_mix(ε) ≤ log(1/(ε π_min)) t_rel.
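Theorem 2 can be illustrated numerically on a small reversible chain, for instance a lazy random walk on the 5-cycle (this example chain and the choice ε = 1/4 are assumptions made for illustration, not taken from the paper):

```python
import numpy as np

# Lazy simple random walk on the 5-cycle: reversible w.r.t. the uniform pi,
# and aperiodic thanks to the holding probability 1/2.
n = 5
P = np.zeros((n, n))
for k in range(n):
    P[k, k] = 0.5
    P[k, (k - 1) % n] = 0.25
    P[k, (k + 1) % n] = 0.25

eig = np.sort(np.linalg.eigvalsh(P))[::-1]  # P is symmetric here
lam_star = max(abs(l) for l in eig[1:])     # largest modulus among eigenvalues != 1
t_rel = 1.0 / (1.0 - lam_star)              # relaxation time

eps = 0.25
pi_min = 1.0 / n
lower = (t_rel - 1) * np.log(1.0 / (2 * eps))
upper = np.log(1.0 / (eps * pi_min)) * t_rel
print(lower, upper)  # t_mix(1/4) must lie between these two bounds
```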
We will now illustrate the previous definitions with two families of ex-
amples. The first is a Markov chain on a cyclic group; then we will see
how it is linked to a random walk on a path.
Part III
Two examples of Markov chains
We decided to study the random walk on a cycle and on a segment.
The transition matrix of the simple random walk on the n-cycle is

P = \begin{pmatrix}
0 & 1/2 & 0 & \cdots & 0 & 1/2 \\
1/2 & 0 & 1/2 & \ddots & & 0 \\
0 & \ddots & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
0 & & \ddots & \ddots & 0 & 1/2 \\
1/2 & 0 & \cdots & 0 & 1/2 & 0
\end{pmatrix}
Let f = (f(ω), f(ω²), . . . , f(1))ᵀ be an eigenfunction of P with eigenvalue λ. It
satisfies:

∀k ∈ ⟦0; n − 1⟧, λf(ω^k) = Pf(ω^k) = (f(ω^{k−1}) + f(ω^{k+1}))/2.
For 0 ≤ j ≤ n − 1, define φ_j(ω^k) := ω^{kj}. Then φ_j is an eigenfunction of P,
associated to the eigenvalue (ω^j + ω^{−j})/2 = cos(2πj/n).
Figure 2: Matlab simulation of the random walk on the group (W₅, ·) ((a) and
(b)) and (W₉, ·) ((c) and (d)). The eigenvalues are represented in blue in the
complex plane, while the width of the red band represents the spectral gap. As
n → ∞, the spectral gap tends to 0 and the relaxation time tends to ∞, as
expected from our calculations.
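The eigenvalues plotted in Figure 2 can be recomputed in a few lines: for the walk on the n-cycle, the spectrum is {cos(2πj/n), 0 ≤ j ≤ n − 1}. The paper's simulations are in Matlab; here is an equivalent NumPy check:

```python
import numpy as np

n = 9  # as in the W_9 simulation of Figure 2
P = np.zeros((n, n))
for k in range(n):
    P[k, (k - 1) % n] = 0.5
    P[k, (k + 1) % n] = 0.5

# Spectrum of the circulant matrix P vs. the closed form cos(2*pi*j/n).
computed = np.sort(np.linalg.eigvalsh(P))
expected = np.sort([np.cos(2 * np.pi * j / n) for j in range(n)])
assert np.allclose(computed, expected, atol=1e-8)

gap = 1 - np.cos(2 * np.pi / n)  # spectral gap ~ 2*pi^2/n^2 -> 0 as n -> infinity
print(gap)
```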
Figure 3: (a) Projection of the simple walk on the 12-cycle onto the real axis.
We can see that for most of the projected points, except at the ends, there is
the same probability of going to the left or to the right. Notice the reflecting
boundary conditions. (b) Projecting a random walk on the odd states of a
16-cycle gives a random walk on the 4-path, with holding probability 1/2 at
the endpoints.
Path with reflection at endpoints
Let P be the transition matrix of the simple random walk on the 2(n−1)-cycle, identi-
fied with the random walk on the multiplicative group W_{2(n−1)} = {ω, ω², · · · , ω^{2(n−1)} =
1}, where ω = e^{πi/(n−1)}. Now we choose the equivalence relation to be con-
jugation, i.e. ω^k ∼ ω^{−k}. This equivalence respects the first lemma, and
if we identify each equivalence class with the projection of its elements on the
real axis, v_k = cos(πk/(n − 1)), the projected chain is a simple random walk on
the path with n vertices, W^♯ = {v₀, v₁, · · · , v_{n−1}}. Note the reflecting boundary
conditions: when at v₀, the walk moves with probability one to v₁ (and similarly
from v_{n−1} to v_{n−2}).
According to the previous lemma and the calculation done in the pre-
vious part, the functions f_j^♯ : W^♯ → R defined for all j ∈ ⟦0; n − 1⟧ by

f_j^♯(v_k) = cos(πjk/(n − 1))

are eigenfunctions of the projected walk, associated to the eigenvalue cos(πj/(n − 1)).
We have λ₂ = cos(π/(n − 1)) = 1 − π²/(2(n − 1)²) + O(n⁻⁴), therefore the
spectral gap is of order n⁻² and the relaxation time is of order n², as for the
simple random walk on the cycle.
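These eigenvalues can be verified numerically: build the transition matrix of the path with reflecting endpoints and compare its spectrum with {cos(πj/(n−1)), 0 ≤ j ≤ n−1}. A NumPy sketch (the paper's own computations are in Matlab):

```python
import numpy as np

n = 7  # the 7-path of Figure 4, projected from the 12-cycle
P = np.zeros((n, n))
P[0, 1] = 1.0          # reflection at v_0: move to v_1 with probability one
P[n - 1, n - 2] = 1.0  # reflection at v_{n-1}
for k in range(1, n - 1):
    P[k, k - 1] = 0.5
    P[k, k + 1] = 0.5

# P is not symmetric, but it is reversible, so its spectrum is real.
computed = np.sort(np.linalg.eigvals(P).real)
expected = np.sort([np.cos(np.pi * j / (n - 1)) for j in range(n)])
assert np.allclose(computed, expected, atol=1e-8)
```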
For the walk on the 2n-cycle projected onto the odd states, the functions

f_j(ω^{2k+1}) = cos(πj(2k + 1)/(2n)),    f_j^♯(u_k) = cos(πj(2k + 1)/(2n))

are eigenfunctions of the projected walk, associated to the eigenvalue cos(πj/n).
We have λ₂ = cos(π/n) = 1 − π²/(2n²) + O(n⁻⁴), therefore the spectral gap
is of order n⁻² and the relaxation time is of order n².
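Similarly, for the n-path with holding probability 1/2 at the endpoints, the spectrum should be {cos(πj/n), 0 ≤ j ≤ n − 1}. A NumPy check for the 4-path obtained from the odd states of the 16-cycle:

```python
import numpy as np

n = 4  # the 4-path from the odd states of the 16-cycle
P = np.zeros((n, n))
P[0, 0] = P[n - 1, n - 1] = 0.5  # holding probability 1/2 at the endpoints
P[0, 1] = P[n - 1, n - 2] = 0.5
for k in range(1, n - 1):
    P[k, k - 1] = 0.5
    P[k, k + 1] = 0.5

# This P is symmetric, so eigvalsh applies directly.
computed = np.sort(np.linalg.eigvalsh(P))
expected = np.sort([np.cos(np.pi * j / n) for j in range(n)])
assert np.allclose(computed, expected, atol=1e-8)
```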
Figure 4: Matlab simulation of the random walk on the 7-path, i.e. the projected
chain of the random walk on the 12-cycle with reflection at the endpoints ((a)
and (b)). ((c) and (d)) represent the calculations for the random walk on the
4-path as a projection of a random walk on the "odd" states of a 16-cycle. The
eigenvalues are represented in blue in the complex plane, while the width of the
red band represents the spectral gap; the simulations confirm our calculations.
Acknowledgements
A big thank you to Cyril Labbé, who followed my work closely during this period,
while leaving me the pleasure of exploring the many facets of such an interesting
mathematical object. I feel I have played the apprentice researcher during these
few months, which were bathed in good humour; thank you!