STAT333 Lecture Notes Book Version
University of Waterloo
I Basic Settings
0.2 Basic Settings
X : Ω → R, ω ↦ X(ω)
Definition 0.2.3 — Stochastic Processes.
1. Process is the idea of change over time.
2. Stochastic is just the idea of randomness. The etymology is from the ancient Greek for "aim at" or "guess". This particular vocabulary was not used until the beginning of the 20th century, by Khinchine. There is a political story behind this; these notes will not go into the details.
The term was later translated into English by Feller and Doob. And the rest is history.
(a) A sequence/family of random variables (simpler, so we take this as the definition)
(b) A random function (harder to formulate)
3. Formal definition: A stochastic process {X_t}_{t∈T} is a collection of random variables defined on a common probability space, where T is an index set. In most cases, T corresponds to "time", which can be either discrete or continuous.
(a) In the discrete case, we rather write {X_n}_{n=0,1,...}
Definition 0.2.4 — States. The possible values of X_t, t ∈ T are called the states of the process. Their collection is called the state space, often denoted by S. The state space can be either discrete or continuous. The most famous example of a continuous state space is Brownian motion. In this course, we will focus more on discrete state spaces, so that we can relabel the states in S into the so-called standardized state space
S∗ = {0, 1, 2, . . . } ←− Countable State Space
or
S∗ = {0, 1, 2, . . . , n} ←− Finite State Space
Example 0.1 — White Noise. Let X_0, X_1, . . . be i.i.d. r.v.s following some common distribution. Then {X_n}_{n=0,1,2,...} is a stochastic process, also known as "white noise".
Example 0.2 — Simple Random Walk. Let X_1, X_2, . . . be i.i.d. r.v.s with
P(X_i = 1) = p, P(X_i = −1) = 1 − p.
Define S_0 = 0 and, for n ≥ 1,
S_n = ∑_{i=1}^{n} X_i
1. Q: Why do we need the concept of a stochastic process? Why don't we just look at the joint distribution of (X_0, X_1, . . . , X_n)?
2. A: The joint distribution of a large number of r.v.s is very complicated, because it does not take advantage of the special structure of T, which is time. For example, for the simple random walk, the full distribution of (S_0, S_1, . . . , S_n) is complicated for large n. However, the structure is actually simple if we focus on adjacent terms:
S_{n+1} = S_n + X_{n+1}
[Diagram: a number plus randomness gives a random variable; a random variable changing over time gives a stochastic process.]
where S_n is the last value and X_{n+1} is a new, independent ±1 increment. Also, note that S_n and X_{n+1} are independent by definition. By introducing time into the framework, things can often be greatly simplified. More precisely, from this simple random walk example, we found that for {S_n}_{n=0,1,2,...}, if we know S_n, then the distribution of S_{n+1} will not depend on the history S_i, 0 ≤ i ≤ n − 1. This is a very useful property, and it motivates the notion of a Markov chain.
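To make the one-step structure concrete, here is a minimal simulation sketch (Python with numpy; the function name and parameter values are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_random_walk(n_steps, p=0.5):
    """Simulate S_0, S_1, ..., S_n with S_0 = 0 and i.i.d. +/-1 increments."""
    # X_i = +1 with probability p, -1 with probability 1 - p
    steps = rng.choice([1, -1], size=n_steps, p=[p, 1 - p])
    # S_n = X_1 + ... + X_n; prepend S_0 = 0
    return np.concatenate(([0], np.cumsum(steps)))

path = simple_random_walk(10, p=0.6)
print(path)  # consecutive entries differ by exactly 1: S_{n+1} = S_n + X_{n+1}
```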
II STAT 333 Main Part
For events A, B with P(A) > 0, the conditional probability is
P(B|A) = P(B ∩ A) / P(A)
For a partition {A_i}_{i=1,2,...} of Ω, we then have
1. Law of Total Probability:
P(B) = ∑_{i=1}^∞ P(B|A_i) · P(A_i)
2. Bayes' Rule:
P(A_i|B) = [P(B|A_i) · P(A_i)] / [∑_{j=1}^∞ P(B|A_j) · P(A_j)]
The previous definition only depends on i, j, but this one depends on i, j and the time n.
[Yi Shen] "Given the present state, the history and the future are independent."
3. Matrix form: P = (P_{ij})_{i,j∈S}, where the row index i gives the current state and the column index j gives the next state.
Example 1.1 — Simple Random Walk Revisited. We have not yet shown that the simple random walk is a DTMC. This is left as an exercise.
Exercise 1.1 Show that the simple random walk is a DTMC.
(Hint: X_{n+1} is independent of (S_0, . . . , S_n); e.g., using X_1 ⊥ S_0,
P(S_1 = S_0 + 1 | S_0) = P(X_1 = 1) = p and P(S_1 = S_0 − 1 | S_0) = P(X_1 = −1) = 1 − p.)
Example 1.2 — Ehrenfest's Urn. Two urns A, B contain M balls in total. Each time, pick one of the M balls uniformly at random and move it to the opposite urn.
1. Let X_n be the number of balls in A after step n.
2. The state space is S = {0, . . . , M}.
3. X_0 = i means i balls in A and M − i balls in B.
The transition probabilities are
P_{ij} = P(X_1 = j | X_0 = i) =
  i/M,        j = i − 1 (the ball is from A)
  (M − i)/M,  j = i + 1 (the ball is from B)
  0,          j ≠ i ± 1
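As a sketch of how these transition probabilities fit together (each row must sum to 1), one can build the matrix numerically; this is an illustrative snippet, assuming numpy:

```python
import numpy as np

def ehrenfest_matrix(M):
    """Transition matrix of Ehrenfest's urn on S = {0, 1, ..., M}."""
    P = np.zeros((M + 1, M + 1))
    for i in range(M + 1):
        if i > 0:
            P[i, i - 1] = i / M        # the picked ball is in A: i -> i - 1
        if i < M:
            P[i, i + 1] = (M - i) / M  # the picked ball is in B: i -> i + 1
    return P

P = ehrenfest_matrix(4)
print(P)
print(P.sum(axis=1))  # every row sums to 1, as a transition matrix must
```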
The two-step transition probability satisfies
(∗)  P^{(2)}_{ij} = P(X_2 = j | X_0 = i) = ∑_{k∈S} P(X_2 = j | X_1 = k, X_0 = i) · P(X_1 = k | X_0 = i)
which is nothing else but the conditional version of the law of total probability. Note that to prove (∗) we have not yet used the Markov property! By the Markov property, X_2 ⊥ X_0 given X_1, so
(∗) → P^{(2)}_{ij} = ∑_{k∈S} P(X_2 = j | X_1 = k) P(X_1 = k | X_0 = i) = ∑_{k∈S} P_{kj} P_{ik} = ∑_{k∈S} P_{ik} P_{kj} = (P²)_{ij}
Or we can interpret P^{(2)}_{ij} = [P]_i · [P]^j, the i-th row of P dotted with the j-th column of P.
More generally, conditioning on X_m,
P^{(n+m)}_{ij} = P(X_{n+m} = j | X_0 = i)
= ∑_{k∈S} P(X_{n+m} = j | X_m = k, X_0 = i) · P(X_m = k | X_0 = i)
= ∑_{k∈S} P(X_{n+m} = j | X_m = k) · P(X_m = k | X_0 = i)
= ∑_{k∈S} P^{(m)}_{ik} P^{(n)}_{kj}
= (P^{(m)} · P^{(n)})_{ij}  →  P^{(n+m)} = P^{(n)} P^{(m)}  ⇐ C-K Equation
In particular,
P^{(1)} = P
P^{(2)} = P^{(1)} P^{(1)} = P²
P^{(3)} = P^{(1)} P^{(2)} = P³
...
P^{(n)} = Pⁿ
while the right-hand side is the n-th power of the one-step transition matrix:
Pⁿ = P · P · · · P (n factors)
R  Path interpretation: to go from i at time 0 to j at time m + n, the chain must be in some state k at time m; the term P^{(m)}_{ik} P^{(n)}_{kj} accounts for all paths passing through k, and the sum over k covers all intermediate states.
Example 1.4 — Simple Random Walk Revisited Again. Another example of Markov chain visualization with a graph representation:
[Figure: states . . . , −2, −1, 0, 1, 2, . . . in a line; each state i has an arrow to i + 1 labelled p and an arrow to i − 1 labelled 1 − p.]
Definition 1.4.1 — Initial Distribution. Let µ = (µ(0), µ(1), . . . ), where µ(i) = P(X_0 = i). The row vector µ is called the initial distribution of this Markov chain.
Definition 1.4.2 — µ_n. Similarly, we can define µ_n = (µ_n(0), µ_n(1), . . . ) to be the distribution of X_n, where µ_n(i) = P(X_n = i). In this notation, we think of µ_n as a row vector representing a distribution, while each entry µ_n(i) = P(X_n = i) is a probability.
R  We note that µ = µ_0.
Proposition 1.4.2 — Properties of µ_n. The row vector µ_n represents a distribution, hence we have
1. µ_n(i) ≥ 0, ∀i
2. ∑_{i∈S} µ_n(i) = 1, i.e., µ_n · 1′ = 1
Q: Given µ and P, what is µ_n?
A: µ_n = µPⁿ. Indeed,
µ_n(j) = P(X_n = j)
= ∑_{i∈S} P(X_n = j | X_0 = i) P(X_0 = i)   (Law of Total Probability)
= ∑_{i∈S} µ(i) P^{(n)}_{ij}   (a bunch of definitions)
= [µP^{(n)}]_j
= [µPⁿ]_j   (C-K)
Thus,
µ_n = µPⁿ
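A quick numerical check of µ_n = µPⁿ (a sketch with an arbitrary two-state transition matrix; the entry values are illustrative):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.1, 0.9]])        # an arbitrary transition matrix
mu = np.array([1.0, 0.0])         # initial distribution: start in state 0

n = 20
mu_n = mu @ np.linalg.matrix_power(P, n)   # mu_n = mu P^n
print(mu_n, mu_n.sum())                    # a valid distribution: sums to 1
```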
P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y), for all x in the range of X, provided P(Y = y) ≠ 0
R  Since the conditional pmf is a legitimate pmf, the conditional distribution is also a legitimate probability distribution (potentially different from the unconditional distribution). As a result, we can define expectation with respect to the conditional distribution.
Definition 1.5.2 — Conditional Expectation. Let X, Y be discrete random variables and g a function on the range of X. Then the conditional expectation of g(X) given Y = y is defined as
E(g(X) | Y = y) = ∑_x g(x) P(X = x | Y = y)
R  For fixed y, the conditional expectation is nothing else but the expectation under the conditional distribution.
Writing h(y) = E(g(X) | Y = y), we define the random variable E(g(X) | Y) = h(Y).
Proof. Due to independence, the conditional distribution is the same as the unconditional distribution.
4. Law of Iterated Expectation:
E(E(X|Y)) = E(X)
Alert: E(X|Y) is a random variable, a function of Y.
Proof. Discrete case: when Y = y_j, the random variable E(X|Y) takes the value
E(X | Y = y_j) = ∑_{x_i} x_i P(X = x_i | Y = y_j)
Hence
E(E(X|Y)) = ∑_{y_j} E(X | Y = y_j) P(Y = y_j) = ∑_{x_i} x_i ∑_{y_j} P(X = x_i | Y = y_j) P(Y = y_j) = ∑_{x_i} x_i P(X = x_i) = E(X)
Example 1.5 Let Y be the number of claims received by an insurance company and X a random parameter, such that
Y | X ∼ Poi(X), X ∼ Exp(λ)
What is E(Y)?
Solution: E(Y) = E(E(Y|X)). Since Y | X ∼ Poi(X), we have E(Y | X = x) = x, i.e., E(Y|X) = X.
Thus, E(Y) = E(E(Y|X)) = E(X) = 1/λ.
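A Monte Carlo sanity check of this computation (an illustrative sketch; note that numpy's exponential sampler is parameterized by the scale 1/λ):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
n = 10**6

X = rng.exponential(scale=1 / lam, size=n)   # X ~ Exp(lambda), so E(X) = 1/lambda
Y = rng.poisson(lam=X)                       # Y | X ~ Poisson(X)

print(Y.mean(), 1 / lam)   # both close to 0.5, since E(Y) = E(E(Y|X)) = E(X)
```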
1. Method 1:
E(f(X_n)) = ∑_{i∈S} f(i) P(X_n = i) = ∑_{i∈S} f(i) µ_n(i) = µ_n f′
where f′ = (f(0), f(1), . . . , f(i), . . . )ᵀ is the column vector giving the values of f on the different states. Then,
E(f(X_n)) = µ_n f′ = µ Pⁿ f′
where
(a) µ is the row vector of the initial distribution
(b) Pⁿ is the n-step transition matrix
(c) f′ is the column vector of the function values on the states
R  What happens if we calculate Pⁿ f′ first? This corresponds to finding E(f(X_n) | X_0 = i), i ∈ S, first.
2. Method 2:
E(f(X_n)) = E(E(f(X_n) | X_0))
= ∑_{i∈S} E(f(X_n) | X_0 = i) P(X_0 = i)
= ∑_{i∈S} E(f(X_n) | X_0 = i) µ(i)
= ∑_{i∈S} f^{(n)}(i) µ(i)
= µ f^{(n)′}, where f^{(n)′} = (f^{(n)}(0), f^{(n)}(1), . . . , f^{(n)}(i), . . . )ᵀ
Moreover,
f^{(n)}(i) = E(f(X_n) | X_0 = i) = ∑_j P^{(n)}_{ij} f(j) = (Pⁿ f′)_i
Thus, f^{(n)′} = Pⁿ f′ and E(f(X_n)) = µ Pⁿ f′, the same as Method 1.
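Both methods compute the same number, with the parentheses in µPⁿf′ placed differently. A minimal check (the 3-state chain and the function values are illustrative):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
mu = np.array([0.2, 0.3, 0.5])   # initial distribution (row vector)
f = np.array([1.0, -1.0, 2.0])   # f', the function on states as a column vector
n = 7

Pn = np.linalg.matrix_power(P, n)
method1 = (mu @ Pn) @ f   # (mu P^n) f': compute the distribution of X_n first
method2 = mu @ (Pn @ f)   # mu (P^n f'): compute E(f(X_n) | X_0 = i) first
print(method1, method2)   # identical up to floating-point rounding
```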
Suppose the chain starts from a distribution µ = π satisfying πP = π (a stationary distribution). Then the distribution of X_1 is
µ_1 = µP = πP = π = µ
and the distribution of X_2 is
µ_2 = µP² = πPP = πP = π = µ
Thus µ_n = µ for all n. If the Markov chain starts from a stationary distribution, its distribution will never change (stationary/invariant).
Example 1.6 An electron has two states: ground state |0⟩ and excited state |1⟩. Let X_n ∈ {0, 1} be the state at time n. At each step, the electron changes state with probability α if it is in state |0⟩, and with probability β if it is in state |1⟩.
Then {X_n} is a DTMC. Its transition matrix is
         0     1
P = 0 ( 1−α    α  )
    1 (  β    1−β )
Solution: π = πP, i.e.,
(π_0  π_1) ( 1−α  α ; β  1−β ) = (π_0  π_1)
→ π_0(1−α) + π_1 β = π_0   (1)
   π_0 α + π_1(1−β) = π_1   (2)
We have 2 equations in 2 unknowns; however, they are not linearly independent: subtracting (1) from the identity π_0 + π_1 = π_0 + π_1 gives (2). Hence, (2) is redundant.
From (1), we have απ_0 = βπ_1, i.e., π_0/π_1 = β/α. Combined with π_0 + π_1 = 1,
π_0 = β/(α+β), π_1 = α/(α+β)
Thus, there exists a unique stationary distribution π = (β/(α+β), α/(α+β)).
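One way to check this numerically is to solve π = πP as a left-eigenvector problem (a sketch; the values of α, β are illustrative):

```python
import numpy as np

alpha, beta = 0.3, 0.1
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# pi is the left eigenvector of P for eigenvalue 1, normalized to sum to 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

print(pi)                                              # [0.25 0.75]
print(beta / (alpha + beta), alpha / (alpha + beta))   # matches the closed form
```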
Definition 1.7.1 — First Return Time. T_y = min{n ≥ 1 : X_n = y}, and ρ_{yy} = P_y(T_y < ∞).
Definition 1.7.2 — Recurrent & Transient State. A state y ∈ S is called recurrent if ρ_{yy} = 1, which means the chain always returns to y.
It is called transient if ρ_{yy} < 1, which means there is a positive probability that the chain never visits y again.
R  Note that, for a transient state y,
1 − ρ_{yy} = P_y(T_y = ∞) > 0
Example 1.7 Consider the chain on S = {0, 1, 2} with transition matrix
         0     1    2
P = 0 ( 1/2  1/2   0  )
    1 (  0   1/2  1/2 )
    2 (  0    0    1  )
[Transition diagram: 0 → 0 (1/2), 0 → 1 (1/2), 1 → 1 (1/2), 1 → 2 (1/2), 2 → 2 (1).]
Starting from 0,
P(X_1 = 0 | X_0 = 0) = P(X_1 = 1 | X_0 = 0) = 1/2
Note that X_1 = 0 means T_0 = 1 and X_1 = 1 means T_0 = ∞, since 1 and 2 will never go to 0. This analysis implies that
ρ_{00} = P_0(T_0 < ∞) = 1/2 < 1
Thus, by definition, state 0 is transient. By a similar deduction, state 1 is also transient.
Given X_0 = 2, the chain stays at 2 forever, so ρ_{22} = 1 and state 2 is recurrent.
Definition 1.7.3 — x communicates to y. Let x, y ∈ S (x, y can be the same state). x is said to communicate to y, denoted by x → y, if starting from x, the probability that the chain eventually (re)visits state y is positive, i.e.,
∃ n ≥ 1 such that P^{(n)}_{xy} > 0
or say "x can go to y".
This is true since P^{(M)}_{xy} P^{(N)}_{yz} just specifies one (M + N)-step path from x to z via y, while the quantity P^{(M+N)}_{xz} captures all possible paths.
where P_{x,y_1} · P_{y_1,y_2} · · · P_{y_{κ−1},y} is the probability of a path going from x to y without returning to x, and (1 − ρ_{yx}) corresponds to the event that, once in y, the chain never goes back to x. The latter is larger than 0 since ρ_{yx} < 1. These two together describe one way to never visit x again, via y. Thus,
Corollary 1.8.2 If x is recurrent and ρ_{xy} > 0, then ρ_{yx} = 1. (Result of the contrapositive.)
"States in the same class communicate with each other; states in different classes do not communicate in both directions."
Example 1.8 Consider a chain on S = {0, 1, 2, 3}.
[Transition matrix P on S = {0, 1, 2, 3} with its transition diagram.]
Note that ρ_{01}, ρ_{12}, ρ_{20} > 0 ⟹ 0, 1, 2 are in the same class, and ρ_{23}, ρ_{32} > 0 ⟹ 2, 3 are in the same class. Then, by transitivity, 0, 1, 2, 3 are all in the same class.
Definition 1.8.2 — Irreducible Markov Chain. A Markov chain is called irreducible if all the states are in the same class; in other words, i ↔ j for all i, j ∈ S.
A set B is called irreducible if i ↔ j, ∀i, j ∈ B.
R  As a result of the theorem, we can call a class recurrent/transient if all of its states are recurrent/transient. (This is equivalent to one state in the class being recurrent/transient.)
So to decide whether a class is recurrent/transient, it is enough to check a single element.
Figure 1.8.3: Once the chain gets into the outer (closed) class, it cannot get out.
Theorem 1.8.4 — Decomposition of the State Space. The state space S can be written as a disjoint union
S = T ∪ R_1 ∪ R_2 ∪ . . .
where T is the set of all transient states (T is not necessarily one class), and R_i, i = 1, 2, . . . are closed recurrent classes. Equivalently, in other textbooks, these are called irreducible closed sets of recurrent states.
(The proof is easy once we recognize that communicating classes are equivalence classes under the equivalence relation ↔, two-way communication.)
Proof. First we collect all the transient states and put them into T. Each recurrent state belongs to at least one class, since it communicates with itself. We collect the class of each recurrent state, remove the duplicate classes, and get {R_k}_{k=1,2,...}. Also, since recurrence is a class property, the R_k, k = 1, 2, . . . are all recurrent classes. By construction, we have
S = T ∪ R_1 ∪ R_2 ∪ . . .
1. Disjointness: we still need to show R_i ∩ R_j = ∅, ∀i ≠ j ∈ {1, 2, . . . }.
For the sake of contradiction, suppose there is a state i ∈ S with i ∈ R_k and i ∈ R_{k′} for k ≠ k′. Then for all j ∈ R_k and j′ ∈ R_{k′} we have
i ↔ j, i ↔ j′ ⟹ j ↔ j′
Thus j, j′ are in the same class. Since j and j′ are arbitrary states of R_k and R_{k′}, we must have R_k = R_{k′}, contradicting the construction that R_k ≠ R_{k′} for k ≠ k′.
2. Closedness: for the sake of contradiction, say R_k is not closed. Then there exist i ∈ R_k and j ∉ R_k such that P_{ij} > 0, i.e., i → j; equivalently, ρ_{ij} > 0. As j ∉ R_k, we have i ↮ j, which implies j ↛ i, so ρ_{ji} = 0 < 1. Starting from i, with positive probability the chain enters j and never returns; hence i is a transient state, which yields a contradiction.
Theorem 1.8.5 — Strong Markov Property for (time-homogeneous) MC. The process {X_{T_y+k}}_{k=0,1,2,...} behaves like the MC with initial state y. (Forget about the history and restart at state y.)
the LHS means revisiting y at least k times, while ρ_{yy} means revisiting y for the first time.
Two possibilities:
1. ρ_{yy} < 1: y is transient, and ρ^k_{yy} → 0 as k → ∞.
2. ρ_{yy} = 1: y is recurrent, and P_y(N(y) ≥ k) = 1 for every k, so the chain revisits y infinitely often.
Indeed, we know more. Let N(y) be the total number of visits to state y. Then
P_y(N(y) ≥ k + 1) = ρ^{k+1}_{yy} ⟹ P_y(N(y) ≤ k) = 1 − ρ^{k+1}_{yy}
then
ρ_{yy} = 1 ⟹ E_y N(y) = ∞
ρ_{yy} < 1 ⟹ E_y N(y) < ∞
Proof. Note that E_y N(y) = ∑_{k=0}^∞ P_y(N(y) ≥ k + 1) = ∑_{k=0}^∞ ρ^{k+1}_{yy}, which equals ρ_{yy}/(1 − ρ_{yy}) < ∞ if ρ_{yy} < 1, and ∞ if ρ_{yy} = 1.
Lemma 1.10
E_x N(y) = ∑_{n=1}^∞ P^{(n)}_{xy}
Proof.
N(y) = ∑_{n=1}^∞ 1{X_n = y} ⟹ E_x N(y) = ∑_{n=1}^∞ E_x(1{X_n = y}) = ∑_{n=1}^∞ P_x(X_n = y) = ∑_{n=1}^∞ P^{(n)}_{xy}
Theorem — Recurrence is a class property. If x ↔ y and x is recurrent, then y is recurrent.
Proof. Suppose x, y are in the same class, which means x ↔ y, and x is recurrent. Since x → y and y → x, there exist M, N such that
P^{(M)}_{xy} > 0, P^{(N)}_{yx} > 0
Note that
P^{(M+N+k)}_{yy} = P(X_{M+N+k} = y | X_0 = y) ≥ P(X_{M+N+k} = y, X_{N+k} = x, X_N = x | X_0 = y) = P^{(N)}_{yx} P^{(k)}_{xx} P^{(M)}_{xy}
⟹ ∑_{l=1}^∞ P^{(l)}_{yy} ≥ ∑_{l=M+N+1}^∞ P^{(l)}_{yy} = ∑_{k=1}^∞ P^{(M+N+k)}_{yy}   (let k = l − M − N)
≥ ∑_{k=1}^∞ P^{(N)}_{yx} P^{(k)}_{xx} P^{(M)}_{xy} = P^{(N)}_{yx} P^{(M)}_{xy} ∑_{k=1}^∞ P^{(k)}_{xx} = ∞   (x is recurrent)
Thus, y is recurrent. Therefore, x recurrent implies y recurrent, and recurrence is a class property. This also implies that transience is a class property as well.
Lemma 1.11 In a finite closed set, there has to be at least one recurrent state.
Proof. For the sake of contradiction, suppose all states in C are transient. Then, for any two x, y ∈ C,
E_x(N(y)) = ρ_{xy} / (1 − ρ_{yy}) < ∞
⟹ ∑_{y∈C} E_x(N(y)) < ∞
However,
∑_{y∈C} E_x(N(y)) = E_x(∑_{y∈C} N(y)) = E_x(∑_{y∈C} ∑_{n=1}^∞ 1{X_n = y}) = E_x(∑_{n=1}^∞ ∑_{y∈C} 1{X_n = y})
For each n, exactly one indicator takes the value 1 and the rest are 0 (C is closed, so X_n ∈ C); this implies
∑_{y∈C} 1{X_n = y} = 1
Then we have
E_x(∑_{n=1}^∞ ∑_{y∈C} 1{X_n = y}) = E_x(∑_{n=1}^∞ 1) = ∞
This yields a contradiction. Hence, there must exist at least one recurrent state.
Combining this result with the fact that recurrence/transience are class properties, we have
Theorem 1.11.1 A finite closed class must be recurrent. In particular, an irreducible Markov
Chain with finite state space must be recurrent.
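For a finite chain, this classification can be computed mechanically from the pattern of positive entries of P: two states communicate iff each can reach the other, and (combining Theorem 1.11.1 with the decomposition theorem) a class is recurrent iff it is closed. A sketch for finite state spaces, reusing the 3-state chain from the transience example above (assumes numpy):

```python
import numpy as np

def classify_states(P):
    """Communicating classes of a finite chain; a class is recurrent iff closed."""
    n = len(P)
    adj = P > 0
    reach = adj | np.eye(n, dtype=bool)          # x -> y in >= 0 steps
    for _ in range(n):                           # Boolean transitive closure
        reach = reach | ((reach.astype(int) @ adj.astype(int)) > 0)
    seen, classes = set(), []
    for x in range(n):
        if x in seen:
            continue
        cls = sorted(y for y in range(n) if reach[x, y] and reach[y, x])
        seen |= set(cls)
        # closed: no one-step exit from the class with positive probability
        closed = not any(adj[i, j] for i in cls for j in range(n) if j not in cls)
        classes.append((cls, "recurrent" if closed else "transient"))
    return classes

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
print(classify_states(P))  # [([0], 'transient'), ([1], 'transient'), ([2], 'recurrent')]
```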
Theorem 1.12.1 Let {X_n}_{n=0,1,2,...} be an irreducible and recurrent DTMC with transition matrix P. Let x ∈ S and T_x := min{n ≥ 1 : X_n = x}. Then
µ_x(y) = ∑_{n=0}^∞ P_x(X_n = y, T_x > n), y ∈ S
is a stationary measure.
Proof (key steps). For each n,
∑_y P_x(X_n = y, T_x > n) P_{yx} = P_x(T_x = n + 1)
Thus,
(µ_x P)(x) = ∑_{n=0}^∞ ∑_y P_x(X_n = y, T_x > n) P_{yx} = ∑_{n=0}^∞ P_x(T_x = n + 1) = 1   (by total probability and recurrence)
= µ_x(x) = ∑_{n=0}^∞ P_x(X_n = x, T_x > n)
where the last equality is true since P_x(X_n = x, T_x > n) = 1 only for n = 0, and 0 for all other n.
Combining parts 1 and 2, we have, for any y and n,
1 = µ_x(x) = (µ_x Pⁿ)(x) = ∑_z µ_x(z) P^{(n)}_{zx} ≥ µ_x(y) P^{(n)}_{yx}
Irreducibility implies y → x, so there exists n such that P^{(n)}_{yx} > 0. This implies that µ_x(y) < ∞.
R  Note that
µ_x(y) = ∑_{n=0}^∞ P_x(X_n = y, T_x > n) = ∑_{n=0}^∞ E_x(1{X_n = y} · 1{T_x > n}) = E_x(∑_{n=0}^{T_x−1} 1{X_n = y})
= E_x(number of visits to y before returning to x)
Definition 1.13.1 — Period.
d(x) := gcd{n ≥ 1 : P^{(n)}_{xx} > 0}
R  Note that we are taking the gcd over the times n at which the probability of going from x back to x is positive; there is no guarantee that the chain does go back.
Definition 1.13.2 — Aperiodic. If x has period 1, then we say x is aperiodic. If all states of a MC are aperiodic, then we call the MC aperiodic.
If P_{xx} > 0, then x is obviously aperiodic. But the converse is not true: x being aperiodic does not imply P_{xx} > 0.
Example 1.14 Note that P^{(3)}_{00} > 0 and P^{(2)}_{00} > 0 imply d(0) = 1. However, P_{00} = 0.
For the simple random walk, P^{(n)}_{00} = 0 for odd n, while for even n (with p = 1/2)
P^{(n)}_{00} = C(n, n/2) (1/2)^{n/2} (1/2)^{n/2} = C(n, n/2) (1/2)ⁿ
where C(n, n/2) is the number of ways to choose the n/2 steps to the right, and (1/2)ⁿ is the probability of a particular ordering with n/2 steps to the right. By this analysis, we have
d(0) = 2
[Figure 1.13.2: Random Walk Visualization with p = 1/2.]
Period is a class property:
x → y, y → x ⟹ d(x) = d(y)
Proof. Take m, n such that P^{(m)}_{xy} > 0 and P^{(n)}_{yx} > 0; then
P^{(m+n)}_{xx} ≥ P^{(m)}_{xy} P^{(n)}_{yx} > 0
Moreover, for any l such that P^{(l)}_{yy} > 0, we have
P^{(m+n+l)}_{xx} ≥ P^{(m)}_{xy} P^{(l)}_{yy} P^{(n)}_{yx} > 0
As a result,
d(x) | m + n and d(x) | m + n + l
thus,
d(x) | l
Since this holds for every l in {l ≥ 1 : P^{(l)}_{yy} > 0}, d(x) divides their gcd d(y); by symmetry, d(y) divides d(x), hence
d(x) = d(y)
as desired.
1.15 Main Theorems
Theorem — Convergence Theorem. Suppose I (irreducible), A (aperiodic), S (a stationary distribution π exists). Then
P^{(n)}_{xy} → π(y) as n → ∞, ∀x, y ∈ S
"The limiting transition probability, hence also the limiting distribution, does not depend on where we start. (Under the conditions of I, A, S.)"
Or we can write
lim_{n→∞} P^{(n)}_{xy} = π(y), ∀x, y ∈ S ⟹ lim_{n→∞} P(X_n = y) = π(y)
Proof. To show this convergence theorem is true, we need to prove a lemma first.
Lemma 1.16 If there exists a stationary distribution π such that π(y) > 0, then y is recurrent.
Proof. Assume the DTMC {X_n}_{n=0,1,...} starts from the stationary distribution π. This means that
P(X_n = y) = π(y), ∀n = 0, 1, . . .
Since each term equals π(y) > 0,
∞ = ∑_{n=1}^∞ P(X_n = y)
= ∑_{n=1}^∞ E(1{X_n = y})
= E(∑_{n=1}^∞ 1{X_n = y})
= E(N(y))
= ∑_{x∈S} E_x N(y) π(x)
= ∑_{x∈S} π(x) ρ_{xy}/(1 − ρ_{yy})
≤ ∑_{x∈S} π(x) · 1/(1 − ρ_{yy})
= 1/(1 − ρ_{yy})
Hence 1/(1 − ρ_{yy}) = ∞. Thus,
ρ_{yy} = 1
This means the state y is recurrent.
By taking the contrapositive of the lemma above, we have the following corollary.
Corollary 1.16.2
I and S ⟹ R
Alright, back to the Convergence Theorem proof. The proof is freaking long, but the main idea is "coupling".
Consider two independent DTMCs {X_n}_{n=0,1,...} and {Y_n}_{n=0,1,...}, both having the same transition matrix P, with arbitrary initial distributions.
It is easy to show that the pairs Z_n = (X_n, Y_n) form a DTMC with transition matrix
P̄_{(x_1,y_1),(x_2,y_2)} = P_{x_1,x_2} P_{y_1,y_2}
Next, we show that under I and A, {Z_n}_{n=0,1,...} is also irreducible. Consider the following lemma.
Lemma 1.17 If y is aperiodic, then there exists n_0 such that P^{(n)}_{yy} > 0 for all n ≥ n_0.
Proof. We will use a fact from number theory, a corollary of Bezout's Lemma. It states that:
If I is a set of positive integers with greatest common divisor 1, then there are integers i_1, . . . , i_m from I and an n_0, such that any n ≥ n_0 can be written as
n = a_1 i_1 + a_2 i_2 + · · · + a_m i_m
where the a_j are positive integers.
Here, I = {n ≥ 1 : P^{(n)}_{yy} > 0}; by the aperiodicity of y, the elements of I have gcd 1. By the Bezout corollary, there exists n_0 ∈ N such that for n ≥ n_0,
P^{(n)}_{yy} ≥ (P^{(i_1)}_{yy})^{a_1} · (P^{(i_2)}_{yy})^{a_2} · · · (P^{(i_m)}_{yy})^{a_m} > 0
Since we know that {X_n} is irreducible, for any x_1, x_2 there exists K such that P^{(K)}_{x_1,x_2} > 0; similarly, for any y_1, y_2 there exists L such that P^{(L)}_{y_1,y_2} > 0.
Since the DTMCs are aperiodic, we can apply the above lemma and take M large enough such that P^{(m)}_{x_2,x_2} > 0 and P^{(m)}_{y_2,y_2} > 0 for any m ≥ M. In particular, for any n ≥ M + max{K, L},
P̄^{(n)}_{(x_1,y_1),(x_2,y_2)} ≥ P^{(K)}_{x_1,x_2} P^{(n−K)}_{x_2,x_2} · P^{(L)}_{y_1,y_2} P^{(n−L)}_{y_2,y_2} > 0
Since this holds for any (x_1, x_2), (y_1, y_2), {Z_n}_{n=0,1,...} is irreducible. Moreover, we want to show {Z_n}_{n=0,1,...} is recurrent. To see this, note that π̄ given by π̄(x, y) = π(x)π(y) is a stationary distribution. Take x such that π(x) > 0; then π̄(x, x) > 0. By the first lemma within this proof, (x, x) is a recurrent state. And since {Z_n} is irreducible, the whole chain must be recurrent as well.
Now, define T = min{n ≥ 0 : X_n = Y_n}, the first time the two chains meet, and let V(x,x) be the first time {Z_n} hits (x, x). Then
T ≤ V(x,x) < ∞
with certainty. (Recall that if i is recurrent and ρ_{ij} > 0, then ρ_{ji} = 1; applied to the irreducible recurrent chain {Z_n}, the hitting time of (x, x) is finite almost surely.)
Finally, we have proved that
"the two independent Markov chains will eventually meet."
For any state y ∈ S, partitioning over the values of T and X_T, we have
P(X_n = y, T ≤ n) = ∑_{m=0}^n ∑_{x∈S} P(T = m, X_m = x, X_n = y)
= ∑_{m=0}^n ∑_{x} P(T = m, X_m = x) P(X_n = y | X_m = x)   (Markov property)
= ∑_{m=0}^n ∑_{x} P(T = m, Y_m = x) P(Y_n = y | Y_m = x)
= P(Y_n = y, T ≤ n)
Recall that we have the freedom to choose the initial distributions of {X_n} and {Y_n}. Take X_0 = x and Y_0 ∼ π. Then Y_n ∼ π for all n, and since X_n and Y_n agree in distribution on the event {T ≤ n},
|P^{(n)}_{xy} − π(y)| = |P(X_n = y) − P(Y_n = y)| ≤ P(T > n) → 0 as n → ∞
Thus, we have
P^{(n)}_{xy} → π(y) as n → ∞
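Numerically, the convergence is visible in the rows of Pⁿ all approaching π (a sketch using the two-state chain of Example 1.6 with illustrative α = 0.3, β = 0.1):

```python
import numpy as np

alpha, beta = 0.3, 0.1
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])   # irreducible and aperiodic

Pn = np.linalg.matrix_power(P, 100)
print(Pn)
# Both rows are approximately (0.25, 0.75) = (beta, alpha)/(alpha + beta):
# the limit does not depend on the starting state.
```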
Theorem 1.17.1 — Existence of Stationary Measure. Suppose I, R. Then there exists a stationary measure µ∗ with 0 < µ∗(x) < ∞, ∀x ∈ S.
Theorem 1.17.2 Suppose I, R. Then, for every y ∈ S,
N_n(y)/n → 1/E_y(T_y) a.s. as n → ∞
where E_y(T_y) is the expected revisit time to y given that we start with y, which is also the "expected cycle length".
Proof. We can chop the time line into different cycles. Let T^{(1)}_y, T^{(2)}_y, . . . be the times at which the chain (re)visits y after time 0. By the Strong Markov Property, T^{(2)}_y − T^{(1)}_y, T^{(3)}_y − T^{(2)}_y, . . . are i.i.d.
[Figure: the timeline marked at T^{(0)}_y, T^{(2)}_y, T^{(4)}_y, . . . , split into i.i.d. cycles.]
Theorem 1.17.3 — Strong Law of Large Numbers. For X_1, X_2, . . . i.i.d. r.v.s,
(∑_{i=1}^n X_i)/n → E(X_1) a.s.
Applying the SLLN to the cycle lengths, we have
(∑_{i=1}^{k−1} (T^{(i+1)}_y − T^{(i)}_y)) / (k − 1) → E(T^{(i+1)}_y − T^{(i)}_y) = E_y(T_y)
With negligible changes (THIS IS VERY ANNOYING),
T^{(k)}_y / k → E_y(T_y) as k → ∞
Let N_n(y) be the number of visits to y up to time n. Observe that T^{(N_n(y))}_y ≤ n ≤ T^{(N_n(y)+1)}_y, so
T^{(N_n(y))}_y / N_n(y) ≤ n / N_n(y) ≤ (T^{(N_n(y)+1)}_y / (N_n(y) + 1)) · ((N_n(y) + 1) / N_n(y))
As n → ∞, both the leftmost and the rightmost terms converge to E_y(T_y), the latter since T^{(k)}_y / k → E_y(T_y) and (N_n(y)+1)/N_n(y) → 1. By the Squeeze Theorem,
lim_{n→∞} n / N_n(y) = E_y(T_y) ⟹ N_n(y)/n → 1/E_y(T_y) as n → ∞
Theorem. Suppose I, S. Then
π(y) = 1/E_y(T_y)
Proof. As I, S implies R, we can apply the result above, with the initial distribution being π. Taking expectations on both sides,
E(N_n(y)/n) → 1/E_y(T_y)
This follows from the Dominated Convergence Theorem (DCT) of real analysis, since N_n(y)/n ≤ 1. Then, note that
E(N_n(y)) = E(∑_{m=1}^n 1{X_m = y}) = ∑_{m=1}^n E(1{X_m = y}) = ∑_{m=1}^n P(X_m = y) = nπ(y)
since the chain starts from π. Thus,
π(y) = 1/E_y(T_y)
Combining with the Convergence Theorem,
π(y) = lim_{n→∞} P^{(n)}_{xy} = lim_{n→∞} N_n(y)/n = 1/E_y(T_y)
R We don’t need R actually, but we have not yet shown that relationship.
Theorem — Ergodic Theorem. Under I, S, for a (bounded) function f on S,
lim_{n→∞} (1/n) ∑_{m=1}^n f(X_m) = ∑_x f(x)π(x) = π f′
"This is pretty much telling you that the stationary distribution exists and it is unique."
Proof. Say π is a stationary distribution; then π(y) > 0 for some y ∈ S. Since the Markov chain is irreducible, we know that y → x for all x ∈ S, which means there exists n such that P^{(n)}_{yx} > 0. Then, since
π = πPⁿ
we have
π(x) = ∑_{z∈S} π(z) P^{(n)}_{zx} ≥ π(y) P^{(n)}_{yx} > 0
so under I, any stationary distribution is strictly positive.
Thus,
E_x(N_{T_x}(y)) / E_x(T_x) = π(y)
and since E_x(T_x) = 1/π(x), we have
E_x(N_{T_x}(y)) = π(y)/π(x)
Proposition 1.18.1 If I, then the stationary distribution is unique if it exists.
R
1. Reminder: f′ is the transpose of f; f′ is a column vector, so π f′ is a dot product.
2. Interpretation: ∑_{m=1}^n f(X_m) is the total reward/cost up to time n based on f, so (1/n) ∑_{m=1}^n f(X_m) is the average reward/cost per step up to time n. The whole left-hand side, lim_{n→∞} (1/n) ∑_{m=1}^n f(X_m), can then be interpreted as the long-run average reward/cost per step.
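A simulation sketch of this interpretation (the two-state chain and the reward values are illustrative; assumes numpy):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 0.3, 0.1
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
pi = np.array([beta, alpha]) / (alpha + beta)   # stationary distribution
f = np.array([5.0, -2.0])                       # reward f(0) = 5, f(1) = -2

n, x, total = 200_000, 0, 0.0
for _ in range(n):
    x = rng.choice(2, p=P[x])   # one step of the chain
    total += f[x]

print(total / n, pi @ f)   # long-run average reward per step vs. pi f' = -0.25
```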
Example 1.16
         0     1     2     3
P = 0 ( 0.1   0.2   0.4   0.3 )
    1 ( 0.1   0.2   0.4   0.3 )
    2 ( 0.3   0.4   0.3    0  )
    3 ( 0.1   0.2   0.4   0.3 )
R  Typically, we want to solve for the stationary distribution first for questions like this, e.g., to evaluate
lim_{n→∞} P^{(n)}_{xy}
Detailed balance intuition: the probability flow from x to y should equal the flow from y to x, i.e., π(x)P_{xy} = π(y)P_{yx}.
Theorem 1.20.2 If {X_m} starts from a stationary distribution π satisfying π(i) > 0 for any i ∈ S, then its reversed process {Y_m} is a DTMC, with transition matrix given below.
Proof. Since {X_m} starts from a stationary distribution, P(X_{n−(m+1)} = i_{m+1}) = π(i_{m+1}) and P(X_{n−m} = i_m) = π(i_m). Now, we have
P(Y_{m+1} = i_{m+1} | Y_m = i_m, . . . , Y_0 = i_0) = π(i_{m+1}) P_{i_{m+1}, i_m} / π(i_m)
This shows:
1. The transition probability does not depend on the history i_{m−1}, . . . , i_0. Hence, {Y_m}_{m=0}^n is a DTMC.
2. The transition probability is given by
P̂_{ij} = P(Y_{m+1} = j | Y_m = i) = π(j) P_{ji} / π(i)
R
1. We note this theorem means that reversing the chain requires a stationary distribution to exist.
2. We can check that P̂ = (P̂_{ij})_{i,j∈S} is actually a valid transition matrix:
P̂_{ij} = π(j) P_{ji} / π(i) ≥ 0
and
∑_{j∈S} P̂_{ij} = ∑_{j∈S} π(j) P_{ji} / π(i) = (πP)(i) / π(i) = π(i)/π(i) = 1
R  This is much stronger than reversibility: not every reversible DTMC is time-reversible, but the other direction is clearly true. This is related to the detailed balance condition in an intuitive way: the detailed balance condition provides a two-way balance of flow for each pair of states, which provides the "time-reversibility".
Proposition 1.20.3 A DTMC {X_m}_{m=0,1,...} is time-reversible if and only if it satisfies the detailed balance condition.
Proof. 1. Assume the detailed balance condition: we have stationarity for free. Say {X_m} starts from π and π(i)P_{ij} = π(j)P_{ji}. Then Y_0 = X_n starts from π, and the transition matrix is
P̂_{ij} = π(j) P_{ji} / π(i) = P_{ij}, ∀i, j ∈ S
Since the starting (stationary) distributions and the transition matrices are identical, the two processes are DTMCs with the same distribution.
2. Assume time-reversibility: by definition, X_0 and X_n = Y_0 have the same distribution for all n. Then X_0 follows a stationary distribution, and
P_{ij} = P̂_{ij} = π(j) P_{ji} / π(i) ⟹ π(i) P_{ij} = π(j) P_{ji}, ∀i, j ∈ S
Thus, we have the detailed balance condition.
The Metropolis-Hastings algorithm constructs a chain with stationary distribution π:
• Start an irreducible DTMC with transition matrix Q = {Q_{xy}}_{x,y∈S} and a certain initial distribution (typically, a fixed initial state).
• At each step,
1. Propose a move from the current state x to a state y ∈ S according to the probability Q_{xy}.
2. Accept this move with probability
r_{xy} = min{ π(y)Q_{yx} / (π(x)Q_{xy}), 1 }
(if the move is rejected, stay at x).
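A minimal Metropolis-Hastings sketch on a 4-point state space (the target π and the nearest-neighbour proposal Q are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
pi = np.array([0.1, 0.2, 0.3, 0.4])   # target distribution on S = {0, 1, 2, 3}

def propose(x):
    """Nearest-neighbour proposal, reflecting at the two ends."""
    if x == 0: return 1
    if x == 3: return 2
    return x + rng.choice([-1, 1])

def Q(x, y):
    """Proposal probability Q_{xy} matching propose()."""
    if x in (0, 3):
        return 1.0 if abs(x - y) == 1 else 0.0
    return 0.5 if abs(x - y) == 1 else 0.0

x, counts = 0, np.zeros(4)
for _ in range(200_000):
    y = propose(x)
    r = min(pi[y] * Q(y, x) / (pi[x] * Q(x, y)), 1.0)   # acceptance probability
    if rng.random() < r:   # accept the move; otherwise stay at x
        x = y
    counts[x] += 1

print(counts / counts.sum())   # empirical occupation frequencies: approximately pi
```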
1.21 Exit Distribution
"Starting in a state in C, what is the probability that the chain exits C by entering A rather than B?"
Mathematical formulation: define V_A = min{n ≥ 0 : X_n ∈ A} and V_B = min{n ≥ 0 : X_n ∈ B}. Then, what is P_x(V_A < V_B)?
Example 1.19
         1     2     3     4
P = 1 ( 0.25  0.6    0   0.15 )
    2 (  0    0.2   0.7  0.1  )
    3 (  0     0     1    0   )
    4 (  0     0     0    1   )
We think the entries P_{33} = 1 and P_{44} = 1 are not that important, since we only care about the chain before it reaches 3 or 4.
Let h(1) = P_1(V_3 < V_4) and h(2) = P_2(V_3 < V_4). Conditioning on the first step,
h(1) = 0.25 h(1) + 0.6 h(2) + 0 · 1 + 0.15 · 0
and similarly,
h(2) = 0.2 h(2) + 0.7
Solving gives h(2) = 7/8 and h(1) = 7/10. In general, the boundary conditions are
h(a) = 1, a ∈ A;  h(b) = 0, b ∈ B
For x ∈ C, conditioning on the first step gives
h(x) = ∑_{y∈S} P_{xy} h(y) = ∑_{y∈C} Q_{xy} h(y) + ∑_{y∈A} P_{xy} · 1 + ∑_{y∈B} P_{xy} · 0
In matrix form,
I · h′ = Q h′ + R′_A ⟹ (I − Q) h′ = R′_A ⟹ h′ = (I − Q)^{−1} R′_A
and the solution is unique as long as I − Q is invertible. Here Q and R are blocks of the modified transition matrix P′, in which the states are ordered as C, A, B and the rows for A and B are replaced by rows of the identity:
          C  |  A∪B
   C   (  Q  |   R  )
  A∪B  (  0  |   I  )
Since for P_x(V_A < V_B) we only need to observe the chain before it hits A or B, changing the rows of P corresponding to A and B does not change the answer to this problem.
By doing this change, A and B become absorbing, and all the states in C become transient (since P_x(V_A ∧ V_B < ∞) > 0).
R  Hence, the "exit distributions/exit times" are also referred to as "absorption probabilities/absorption times".
To show I − Q is invertible, note that since the states in C are transient (in P′),
0 = lim_{n→∞} P_x(X′_n ∈ C) = lim_{n→∞} ∑_{y∈C} (P′ⁿ)_{xy} = lim_{n→∞} ∑_{y∈C} (Qⁿ)_{xy}
The last equality holds because of the block structure of P′; recall that
P′ = ( Q  R ; 0  I )
This corresponds to the fact that in order to have X′_n ∈ C, we must have X′_0, . . . , X′_{n−1} ∈ C, which implies that
lim_{n→∞} Qⁿ = 0
(the zero matrix). Then all the eigenvalues of Q have modulus smaller than 1. Thus, there does not exist a non-zero f′ such that
I · f′ = f′ = Q f′ ⟺ (I − Q) f′ = 0
so I − Q is invertible.
R
1. We see that the function h in the above theorem satisfies
h(x) = ∑_y P_{xy} h(y) = E_x(h(X_1)) = h^{(1)}(x)
A function h is called harmonic at x if h(x) = ∑_y P_{xy} h(y); it is called harmonic on A ⊆ S if this holds for every x ∈ A, and simply harmonic if it holds for all x ∈ S, i.e.,
h′ = Ph′ = h^{(1)′}, where h′ = (h(0), h(1), . . . )ᵀ
2. In the block notation,
R′_A = ( ∑_{y∈A} P_{x_1,y}, ∑_{y∈A} P_{x_2,y}, . . . )ᵀ = ( P(X_1 ∈ A | X_0 = x_1), P(X_1 ∈ A | X_0 = x_2), . . . )ᵀ
1.22 Exit Time
Example — Exit Time. Consider the same chain as in Example 1.19:
         1     2     3     4
P = 1 ( 0.25  0.6    0   0.15 )
    2 (  0    0.2   0.7  0.1  )
    3 (  0     0     1    0   )
    4 (  0     0     0    1   )
Let A = {3, 4} and V_A = min{n ≥ 0 : X_n ∈ A}.
First-step analysis:
1.
g(1) = E(V_A | X_0 = 1) = ∑_{x=1}^4 E(V_A | X_1 = x, X_0 = 1) P(X_1 = x | X_0 = 1)
where
E(V_A | X_1 = x, X_0 = 1) = g(1) + 1 if x = 1; g(2) + 1 if x = 2; 1 if x = 3; 1 if x = 4
Note that this "1" comes from the one step of time that has passed. Hence
g(1) = 0.25(g(1) + 1) + 0.6(g(2) + 1) + 0.15 × 1 = 1 + 0.25 g(1) + 0.6 g(2)   [1]
2. Similarly, we have
g(2) = 1 + 0.2 g(2)   [2]
We can solve for g(1), g(2) with [1], [2]. Then,
(g(1), g(2)) = (7/3, 5/4)
Thus, starting from state 1, the expected time until the chain reaches 3 or 4 is 7/3. If starting from state 2, the expected time until the chain reaches 3 or 4 is 5/4.
For x ∈ C, first-step analysis gives in general
g(x) = 1 + ∑_{y∈C} P_{xy} g(y)
which in matrix form means
g′ = 1′ + Q g′
i.e.,
(g(x_1), g(x_2), . . . )ᵀ = (1, 1, . . . )ᵀ + Q (g(x_1), g(x_2), . . . )ᵀ
where Q is the C-to-C block of P, as in the exit distribution part. Then,
I g′ = 1′ + Q g′ ⟹ (I − Q) g′ = 1′ ⟹ g′ = (I − Q)^{−1} 1′
We are looking at exactly the same matrix I − Q as in the exit distribution part. By Theorem 1.21.1, I − Q is invertible, so g′ is the unique solution.
R
1. g′ is a |C| × 1 column vector
2. Q is a |C| × |C| matrix
3. 1′ is a |C| × 1 column vector of ones
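The same matrix I − Q also gives the expected exit times for Example 1.19 (a sketch):

```python
import numpy as np

P = np.array([[0.25, 0.6, 0.0, 0.15],
              [0.0,  0.2, 0.7, 0.1 ],
              [0.0,  0.0, 1.0, 0.0 ],
              [0.0,  0.0, 0.0, 1.0 ]])

Q = P[:2, :2]                                    # C = {1, 2}
g = np.linalg.solve(np.eye(2) - Q, np.ones(2))   # g' = (I - Q)^{-1} 1'
print(g)                                         # [2.333..., 1.25] = (7/3, 5/4)
```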
1.23 Infinite State Space
Definition 1.23.1 — Positive Recurrent and Null Recurrent. A state x is called positive recurrent if E_x(T_x) < ∞ (recall T_x = min{n ≥ 1 : X_n = x}).
A recurrent state x is called null recurrent if E_x(T_x) = ∞.
R  Recall that recurrence means P_x(T_x < ∞) = 1 and transience means P_x(T_x = ∞) > 0. The classification is as follows:
Recurrent: either Positive Recurrent (E_x(T_x) < ∞) or Null Recurrent (E_x(T_x) = ∞)
Transient
Example 1.21 — St. Petersburg paradox. A random variable can be finite with probability 1 but have infinite mean. Let X = 2ⁿ with probability 2^{−n} for n = 1, 2, . . . . Then,
∑_{n=1}^∞ 2^{−n} = 1 ⟹ P(X < ∞) = 1
But
E(X) = 2 × 1/2 + 4 × 1/4 + 8 × 1/8 + · · · = 1 + 1 + · · · = ∞
Proof. 1. 3 → 1: trivial!
2. 1 → 2: Let x be a positive recurrent state. By Theorem 1.12.1, a recurrent state gives us a stationary measure
µ(y) := µ_x(y) = ∑_{n=0}^∞ P_x(X_n = y, T_x > n), y ∈ S
Note that
∑_{y∈S} µ(y) = ∑_{y∈S} ∑_{n=0}^∞ P_x(X_n = y, T_x > n)
= ∑_{y∈S} ∑_{n=0}^∞ E_x(1{X_n = y} · 1{T_x > n})
= E_x(∑_{n=0}^∞ 1{T_x > n} ∑_{y∈S} 1{X_n = y})
= E_x(∑_{n=0}^∞ 1{T_x > n})   (the inner sum is 1)
= E_x(T_x) < ∞   (since 1{T_x > n} = 1 exactly for n = 0, 1, . . . , T_x − 1)
Therefore
π(y) = µ(y) / E_x(T_x)
is a stationary distribution.
Corollary 1.23.2 Positive recurrence and null recurrence are class properties:
x ↔ y ⟹ (x is positive recurrent ⟺ y is positive recurrent)
and
x ↔ y ⟹ (x is null recurrent ⟺ y is null recurrent)
Proof. This is just a sketch of the proof. Let x be a positive recurrent state and C the communicating class containing x. Since C is recurrent, it is closed. Hence {X_n} restricted to C is a DTMC, and its transition matrix is P_C = (P_{ij})_{i,j∈C}:
         C   |  Cᶜ
   C  ( P_C  |  0 )
The top right block is 0 since C is closed; thus P_C is a transition matrix. This restricted Markov chain is irreducible. By the theorem above, x being positive recurrent implies all the states in C are positive recurrent. (Note that for every y ∈ C, E_y(T_y) is the same under P and under P|_C.)
Since both positive recurrence and recurrence are class properties, so is null recurrence.
Corollary 1.23.3 A state x is positive recurrent if and only if there exists a stationary distribution π such that π(x) > 0.
Proof. Note that for both directions, x is recurrent. Hence it suffices to prove the result for the case when the chain is irreducible (otherwise, we can consider the chain restricted to the closed class containing x).
1. ⟹: From a previous result, we know that
µ(y) = ∑_{n=0}^∞ P_x(X_n = y, T_x > n), y ∈ S
is a stationary measure with total mass ∑_y µ(y) = E_x(T_x) < ∞; normalizing, π = µ/E_x(T_x) is a stationary distribution with π(x) = 1/E_x(T_x) > 0.
2. ⟸: if π(x) > 0, then x is recurrent by Lemma 1.16, and E_x(T_x) = 1/π(x) < ∞, so x is positive recurrent.
Corollary 1.23.4 A DTMC with finite state space must have at least one positive recurrent state.
Proof. A finite DTMC has a stationary distribution π; since ∑_x π(x) = 1, some π(x) > 0 is a stationary distribution mass. Thus, by the corollary above, that state is positive recurrent.
Corollary 1.23.5 A DTMC with finite state space does not have null recurrent states.
R  Why is this? Consider a null recurrent class. Since it is recurrent, it is closed, so the class forms an irreducible Markov chain. If the state space (hence the class) has only finitely many states, then by the previous corollary it has a positive recurrent state, which contradicts the class being null recurrent. The corollary follows.
Moreover, this tells us the intuition:
1/E_x(T_x) = lim_{n→∞} N_n(x)/n
If E_x(T_x) = ∞, this says that the long-run fraction of time spent at each state is 0, which is not possible with only finitely many states.
Example 1.22 — Simple Random Walk on Z.
S = Z, P_{i,i+1} = p, P_{i,i−1} = 1 − p = q, p ∈ (0, 1)
[Diagram: states . . . , −2, −1, 0, 1, 2, . . . with arrows to the right labelled p and to the left labelled 1 − p.]
1. It is irreducible.
2. The period is 2.
3. Fact: the simple random walk is transient for p ≠ 1/2 and null recurrent for p = 1/2.
1. Case 1: p ≠ 1/2; say p > 1/2. Write
X_n = Y_1 + · · · + Y_n
Then,
E(Y_n) = 1 · p + (−1)(1 − p) = 2p − 1 > 0
and by the SLLN,
X_n/n = (1/n) ∑_{m=1}^n Y_m → E(Y_1) = 2p − 1 a.s.
as n → ∞. Thus,
X_n → ∞ as n → ∞
Thus, for any state i ≥ 0 (in particular, state 0), there is a last visit time to i. This implies that 0 is transient and {X_n} is transient.
[Figure: a sample path of the simple random walk.]
2. Case 2: p = 1/2. Recall that a state i is recurrent if and only if ∑_{n=0}^∞ P^{(n)}_{ii} = ∞. For state 0, we have
P^{(2n+1)}_{00} = 0   (the period is 2)
P^{(2n)}_{00} = P(n steps to the left, n steps to the right) = C(2n, n) (1/2)ⁿ (1/2)ⁿ = C(2n, n) (1/4)ⁿ
By Stirling's formula, C(2n, n)(1/4)ⁿ ≈ 1/√(πn), so the sum diverges and state 0 (hence the chain) is recurrent.
Moreover, when p = 1/2 there does not exist a stationary distribution. Try to solve πP = π:
π(i) = (1/2) π(i − 1) + (1/2) π(i + 1)
⟹ π(i + 1) − π(i) = π(i) − π(i − 1)
Since this holds for all i ∈ Z, {π(i)} is an arithmetic sequence; the general form is
π(i) = π(0) + a·i
Also, π(i) ∈ [0, 1] for all i, which forces a = 0. But then π(i) = π(0) for all i, and ∑_{i∈Z} π(i) is either 0 or ∞, never 1. Hence no stationary distribution exists.
[Figure 1.23.3: Simple Random Walk π(i) with p = 1/2.]
Example 1.23 — Simple Random Walk with a Reflecting Barrier. A random walk with a reflecting barrier at 0:
S = Z+ = {0, 1, 2, . . . }
with P_{i,i+1} = p, P_{i,i−1} = 1 − p for i ≥ 1, and P_{01} = 1.
[Diagram: 0 → 1 with probability 1; for i ≥ 1, i → i + 1 labelled p and i → i − 1 labelled 1 − p.]
When p < 1/2, such a chain is positive recurrent. To see this, we solve for the stationary distribution. Since only P_{i,i+1} and P_{i,i−1} are non-zero, we can use the detailed balance condition (tridiagonal transition matrix). Then,
π(0) · 1 = π(1)(1 − p) ⟹ π(1) = π(0)/(1 − p)
π(i) · p = π(i + 1)(1 − p), i = 1, 2, . . . ⟹ π(i + 1) = (p/(1 − p)) π(i)
Iterating,
π(i) = (p/(1 − p)) π(i − 1) = (p/(1 − p))² π(i − 2) = · · · = (p/(1 − p))^{i−1} π(1) = (p/(1 − p))^{i−1} π(0)/(1 − p)
When p < 1/2, p/(1 − p) < 1 and ∑_i π(i) < ∞, so we can normalize to obtain a stationary distribution with π(0) > 0; hence the chain is positive recurrent.
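A numerical sketch of this stationary distribution (p = 0.3 is an illustrative choice; the tail is truncated at 200 terms, which is harmless since it decays geometrically):

```python
import numpy as np

p = 0.3                  # p < 1/2, so the chain is positive recurrent
r = p / (1 - p)
i = np.arange(1, 200)

# Unnormalized masses from the detailed balance recursion:
# w(0) = 1, w(i) = r^{i-1} / (1 - p) for i >= 1.
w = np.concatenate(([1.0], r ** (i - 1) / (1 - p)))
pi = w / w.sum()

print(pi[:5])            # geometric decay with ratio r = p/(1-p)
print(pi.sum())          # 1.0
```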
1.24 Branching Process (Galton-Watson Process)
Each individual has a random number of offspring Y with
P(Y = k) = P_k, P_k ≥ 0, k = 0, 1, . . . , and ∑_{k=0}^∞ P_k = 1
Start from one common ancestor, X_0 = 1. The numbers of offspring of different individuals are independent.
Let X_n be the size of the n-th generation. Then,
X_{n+1} = Y^{(n)}_1 + Y^{(n)}_2 + · · · + Y^{(n)}_{X_n}
where Y^{(n)}_1, Y^{(n)}_2, . . . are independent copies of Y, and Y^{(n)}_i is the number of offspring of the i-th individual in the n-th generation.
One thing we care about is the expected size of the n-th generation:
E(X_n) = ?
assuming E(Y) = µ.
[Figure: the first two generations of the family tree.]
Solution:
E(X_{n+1}) = E(Y^{(n)}_1 + Y^{(n)}_2 + · · · + Y^{(n)}_{X_n})
= E(E(Y^{(n)}_1 + Y^{(n)}_2 + · · · + Y^{(n)}_{X_n} | X_n))
= E(µ X_n) = µ E(X_n)
Hence E(X_{n+1}) = µ E(X_n), and
E(X_n) = µⁿ E(X_0) = µⁿ, n = 0, 1, . . .
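A simulation sketch confirming E(X_n) = µⁿ (the Poisson offspring law is an illustrative choice; any law with mean µ works):

```python
import numpy as np

rng = np.random.default_rng(4)

def generation_sizes(n_gens, offspring):
    """Simulate X_0 = 1, X_{n+1} = sum of X_n i.i.d. offspring counts."""
    sizes = [1]
    for _ in range(n_gens):
        x = sizes[-1]
        sizes.append(int(offspring(x).sum()) if x > 0 else 0)
    return sizes

mu = 1.2
runs = [generation_sizes(10, lambda k: rng.poisson(mu, k)) for _ in range(5000)]
print(np.mean([r[10] for r in runs]), mu ** 10)   # both approximately 6.19
```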
For a non-negative integer-valued r.v. η with P(η = i) = P_i, define the generating function
ϕ(s) = E(s^η) = ∑_{k=0}^∞ P_k s^k, 0 ≤ s ≤ 1
1. ϕ determines the distribution:
P_k = (1/k!) · (d^k ϕ(s)/ds^k)|_{s=0}
Indeed,
d^k ϕ(s)/ds^k = k! P_k + (· · ·)s + (· · ·)s² + · · ·
so evaluating at s = 0,
(d^k ϕ(s)/ds^k)|_{s=0} = k! P_k ⟹ P_k = (1/k!)(d^k ϕ(s)/ds^k)|_{s=0}
2. In particular, given P_1, P_2, · · · ≥ 0, ϕ(s) is increasing and convex on [0, 1] (all of its derivatives are non-negative).
3. Let η_1, . . . , η_n be independent random variables with generating functions ϕ_1, . . . , ϕ_n, and let X = η_1 + · · · + η_n. Then ϕ_X = ϕ_1 · · · ϕ_n.
Proof.
ϕ_X(s) = E(s^X) = E(s^{η_1} · · · s^{η_n}) = E(s^{η_1}) · · · E(s^{η_n})   (independence)
= ϕ_1(s) · · · ϕ_n(s)
4. Moments:
(d^k ϕ(s)/ds^k)|_{s=1} = E(d^k s^η/ds^k)|_{s=1} = E(η(η − 1) · · · (η − k + 1) s^{η−k})|_{s=1} = E(η(η − 1) · · · (η − k + 1))
In particular,
E(η) = ϕ′(1)
and
Var(η) = ϕ″(1) + ϕ′(1) − (ϕ′(1))²
Let u_n = P(X_n = 0) = P(N ≤ n), where
N = min{n ≥ 0 : X_n = 0}
is the extinction time. Then
u := lim_{n→∞} u_n = P(N < ∞) = P(the population eventually dies out) = extinction probability
[Figure: the generating function ϕ_η(s) on [0, 1]; note ϕ_η(0) = P_0.]
where N_m is the number of steps for the sub-population of the m-th child of the ancestor to die out. Conditioning on Y, the number of children of the ancestor: the population dies out within n steps iff each child's sub-population dies out within n − 1 steps, independently. Hence we can write
u_n = ∑_k P_k u^k_{n−1} = E(u^Y_{n−1}) = ϕ(u_{n−1})
Thus, the problem becomes: for the recursion
u_n = ϕ(u_{n−1})
what is lim_{n→∞} u_n = u?
Recall that
1. ϕ(0) = P_0 > 0
2. ϕ(1) = 1
3. ϕ(s) is increasing
4. ϕ(s) is convex
Draw ϕ(s) and the function f(s) = s between 0 and 1. We have two possibilities.
[Figure: two panels plotting ϕ(s) against f(s) = s on [0, 1]; in one, the curves cross only at s = 1, in the other they also cross at some s < 1.]
Theorem 1.24.2 The extinction probability u is the smallest intersection of ϕ(s) and f(s) = s. Equivalently, it is the smallest solution of ϕ(s) = s between 0 and 1.
Reason: see the dynamics in the graph below; the cobweb iteration u_0 = 0, u_n = ϕ(u_{n−1}) increases to the smallest fixed point. This dynamic process verifies the results for case 1 and case 2.
[Figure: cobweb plot of ϕ(u_0), ϕ(u_1), ϕ(u_2), . . . converging along f(s) = s to the smallest fixed point.]
E(Y) > 1 ⟹ extinction happens with probability less than 1, and u is the unique solution of ϕ(s) = s in [0, 1).
We can think of E(Y) > 1 as: on average, there is more than 1 offspring, so the population will probably explode, which diminishes the chance of wiping out the whole population. (Thanos has left the chat.)
E(Y) ≤ 1 ⟹ extinction happens for sure (with probability 1).
We can think of E(Y) ≤ 1 as: on average, there is at most 1 offspring, so there is always a risk that the population dies out.
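The recursion u_n = ϕ(u_{n−1}) also gives a practical way to compute u: iterate ϕ from u_0 = 0. A sketch with an illustrative offspring law (here ϕ(s) = 0.2 + 0.3s + 0.5s², E(Y) = 1.3 > 1, whose smallest fixed point is 0.4):

```python
import numpy as np

pk = np.array([0.2, 0.3, 0.5])            # P(Y=0), P(Y=1), P(Y=2); E(Y) = 1.3
phi = lambda s: np.polyval(pk[::-1], s)   # phi(s) = sum_k P_k s^k

u = 0.0                  # u_0 = P(X_0 = 0) = 0
for _ in range(200):     # u_n = phi(u_{n-1}) increases to the extinction prob.
    u = phi(u)

print(u)   # 0.4: the smallest root of phi(s) = s in [0, 1] (the other root is 1)
```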
1.25 Review on DTMC
C-K Equation: P^{(n)} = Pⁿ
Conditional Probability
1. E(g(X) | Y = y) is a number.
2. E(g(X) | Y) is a random variable, a function of Y.
R  Make sure you check the properties of conditional expectations, especially the iterated expectation
E(E(X|Y)) = E(X)
µ_n = µPⁿ, and E(f(X_n)) = µ Pⁿ f′ = µ f^{(n)′}, where
f^{(n)′} = (E(f(X_n) | X_0 = 0), E(f(X_n) | X_0 = 1), . . . )ᵀ
and µ and P completely characterize a DTMC.
R  We use row vectors for distributions; a function vector (on states) is a column vector when used.
Classification of a state y:
ρ_{yy} = 1: Recurrent, with E_y(T_y) < ∞ giving Positive Recurrent and E_y(T_y) = ∞ giving Null Recurrent.
ρ_{yy} < 1: Transient.
Corollary 1.25.5 A DTMC with finite state space does not have a null recurrent state/class.
Decomposition: S = T ∪ R_1 ∪ R_2 ∪ . . . , where T is the set of transient states (it can contain classes and stand-alone states) and the R_i are the recurrent (closed) classes.
Period
d(x) = gcd{n ≥ 1 : P^{(n)}_{xx} > 0}
If d(x) = 1, then x is aperiodic. If all states are aperiodic, then the MC is aperiodic. The easiest sufficient check is P_{xx} > 0 for all x.
Stationary measure: for an irreducible recurrent chain, µ_x(y) = ∑_{n=0}^∞ P_x(X_n = y, T_x > n) gives a stationary measure with µ_x(x) = 1. It gives a stationary distribution (after normalization) if and only if E_x(T_x) < ∞, i.e., the chain is positive recurrent.
Main Theorems
Theorem 1.25.10 — Convergence Theorem. Suppose I, A, S. Then
P^{(n)}_{xy} → π(y) as n → ∞, ∀x, y ∈ S
"The limiting transition probability, hence also the limiting distribution, does not depend on where we start. (Under the conditions of I, A, S.)"
Or we can write
lim_{n→∞} P^{(n)}_{xy} = π(y), ∀x, y ∈ S ⟹ lim_{n→∞} P(X_n = y) = π(y)
Theorem. Suppose I, S. Then
π(y) = 1/E_y(T_y)
where E_y(T_y) is the expected revisit time to y given that we start with y, which is also the "expected cycle length". Moreover,
π(y) = lim_{n→∞} P^{(n)}_{xy} = lim_{n→∞} N_n(y)/n = 1/E_y(T_y)
Theorem — Ergodic Theorem. Under I, S,
lim_{n→∞} (1/n) ∑_{m=1}^n f(X_m) = ∑_x f(x)π(x) = π f′
Under I: the existence of a stationary distribution (S) ⟺ positive recurrence.
Time-reversed Chain
For fixed n,
Y_m = X_{n−m}, m = 0, 1, . . . , n
Proposition 1.25.16 {Y_m} is a DTMC if X starts from a stationary distribution.
Definition 1.25.4 — Time-reversible. X is called time-reversible if {X_n} and {Y_m} have the same distribution.
Exit Distribution
S = A ∪ B ∪ C. With the states ordered as C, A, B, P has block form with Q the C-to-C block and R the exit block, and
R′_A = ( ∑_{y∈A} P_{x_1,y}, ∑_{y∈A} P_{x_2,y}, . . . )ᵀ = ( P(X_1 ∈ A | X_0 = x_1), P(X_1 ∈ A | X_0 = x_2), . . . )ᵀ
Then h(x) = P_x(V_A < V_B) solves h′ = (I − Q)^{−1} R′_A.
Exit Time
S = A ∪ C
and g(x) = E_x(V_A) is the unique solution of
g(x) = 1 + ∑_y P_{xy} g(y), x ∈ C
g(a) = 0, a ∈ A
Branching process: E(Y) = µ ⟹ E(X_n) = µⁿ