STAT 433 Course Note
Tuan Hiep Do
Contents
1. Preparation
1.1. Probability Space and Random Variables
1.2. Stochastic Processes
1.3. Review on Discrete-time Markov Chain (DTMC)
1.4. Classification and Class Properties
1.5. Stationary Distribution and Limiting Behaviour
1.6. Infinite State Space
1.7. Branching Process
1.8. Absorption Probability and Absorption Time
2. Discrete Phase-type Distribution
References
STOCHASTIC PROCESSES 2 (STAT 433) COURSE NOTE
1. Preparation
We will begin by reviewing the materials in STAT 333 before proceeding with new concepts
in STAT 433. The most fundamental of all is the concept of probability space and random
variables.
1.1. Probability Space and Random Variables.
Definition 1. A probability space, or a probability measure space, consists of a triplet
(Ω, E, P) where
(1) Ω is a sample space. This is the collection of all possible outcomes of a random experiment. For example, the collections of all possible outcomes of rolling a die, flipping a coin, and forecasting weather would be {1, 2, 3, 4, 5, 6}, {H, T}, and {sunny, cloudy, rainy, · · · } respectively.
(2) E is a σ-algebra/σ-field. It is the collection of all the events. More precisely, an event E is a subset of Ω for which we can talk about probability. For example, the event of getting an odd number when rolling a die would be {1, 3, 5} ⊆ {1, 2, 3, 4, 5, 6}.
(3) P is a probability measure which is a function mapping E to R by assigning each
E ∈ E a real value P(E). In particular, it needs to satisfy the following axioms
(a) 0 ≤ P(E) ≤ 1 for any E ∈ E,
(b) P(∅) = 0, P(Ω) = 1, and
(c) For a countable collection of pairwise disjoint events {Ei}i≥1,
P( ⋃i≥1 Ei ) = Σi≥1 P(Ei).
In the language of measure theory, (Ω, E, P) is precisely a measure space subject to the normalization condition P(Ω) = 1. Axiom (3)(c) is often referred to as countable additivity.
Readers who are interested can refer to section 1.2 of reference [1] for full details.
Definition 2. A random variable X, abbreviated as R.V., is a real-valued (measurable) function from Ω to R, mapping ω ↦ X(ω).
1.2. Stochastic Processes.
A process is a change/evolution over time, and stochastic just means random. As such, a stochastic process, abbreviated as S.P., is informally a random evolution over time. We can formulate this in two ways, as described in the following diagram.
Number → (+ randomness) → R.V. → (evolve over time) → S.P.
Number → (evolve over time) → function of time → (+ randomness) → S.P.
The first approach would be to add randomness first and then let the random variable
evolve over time. This would yield a stochastic process. For the second approach, one may
let a number change over time in order to yield a function of time which combined with
randomness produces a stochastic process. It is worth noting that the second definition is hard to make rigorous since it involves a random function. Thus, we will take the first approach in this course.
Definition 3. A stochastic process {Xt}t∈T is a collection of random variables defined on a common probability space, indexed by a set T.
In most cases, T corresponds to “time”, which could be either discrete, such as N ∪ {0}, or continuous, such as [0, ∞). In discrete cases, we typically write {Xn}n=0,1,··· . All the possible values of
Xt where t ∈ T are called the states of the process. Their collection is called the state space
which is denoted by S. Note that there is a subtle difference between the definition of a state
space and that of a sample space. The former implies that there is a time progression; that
is, the system will be in different states as time progresses. On the other hand, the latter
hints that there will be a probability measure defined on our sample space.
Naturally, the state space can be either discrete or continuous. For this section, we will
focus on discrete state space. One may relabel the states in a discrete state space S to get
the standardized state space {0, 1, 2, · · · } or {0, 1, · · · , n} if S is infinite or finite respectively.
1.3. Review on Discrete-time Markov Chain (DTMC).
For two events A and B, if P(A) > 0, then we define the conditional probability of B given A as
P(B|A) := P(B ∩ A) / P(A).
Theorem 4. (Law of Total Probability) For a countable collection of pairwise disjoint events {Ai}i≥1 such that ⋃i≥1 Ai = Ω, we have
P(B) = Σi≥1 P(B|Ai) · P(Ai).
Proof. By countable additivity and the definition of conditional probability, it follows that
P(B) = P(B ∩ Ω) = P( ⋃i≥1 (B ∩ Ai) ) = Σi≥1 P(B ∩ Ai) = Σi≥1 P(B|Ai) · P(Ai). □
Theorem 5. (Bayes’ Rule) For a countable collection of pairwise disjoint events {Ai}i≥1 such that ⋃i≥1 Ai = Ω, we have
P(Ai|B) = P(B|Ai) · P(Ai) / Σj≥1 P(B|Aj) · P(Aj).
Proof. Following the definition and the law of total probability, one has
P(Ai|B) = P(Ai ∩ B) / P(B) = P(B|Ai) · P(Ai) / Σj≥1 P(B|Aj) · P(Aj). □
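As a small numerical sanity check, Theorems 4 and 5 can be combined in a few lines of Python. The prior and likelihood values below are made up purely for illustration.

```python
# Hypothetical partition A_1, A_2, A_3 of the sample space with prior
# probabilities P(A_i), and likelihoods P(B | A_i) for some event B.
prior = [0.5, 0.3, 0.2]        # P(A_i)
likelihood = [0.9, 0.5, 0.1]   # P(B | A_i)

# Law of total probability: P(B) = sum_i P(B | A_i) * P(A_i)
p_b = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' rule: P(A_i | B) = P(B | A_i) * P(A_i) / P(B)
posterior = [l * p / p_b for l, p in zip(likelihood, prior)]

print(p_b)        # 0.62
print(posterior)  # sums to 1
```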
Recall that a discrete-time Markov chain (DTMC) {Xn} with one-step transition matrix P = {Pij}i,j∈S satisfies the Markov property
P(Xn+1 = j | Xn = i, Xn−1 = in−1, · · · , X0 = i0) = P(Xn+1 = j | Xn = i) = Pij.
Intuitively, the definition states that the past/history will only influence the future through the present (state). Recall the n-step transition matrix P^(n) = {Pij^(n)}i,j∈S where
Pij^(n) := P(Xm+n = j | Xm = i) = P(Xn = j | X0 = i).
The C-K equation states that P^(n+m) = P^(m) · P^(n) = P^(n) · P^(m) and
Pij^(n+m) = Σk∈S Pik^(n) · Pkj^(m).
As a corollary of the C-K equation, it follows that
P^(n) = P^n = P · P · · · P (n times).
If an initial distribution µ = (µ0 , µ1 , · · ·) = (P(X0 = 0), P(X0 = 1), · · ·) is given, then the distribution of Xn is equal to
µn = (P(Xn = 0), P(Xn = 1), · · ·) = µ · P^(n) = µ · P^n.
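These identities are easy to verify numerically. The sketch below (Python with NumPy; the 3-state transition matrix is an arbitrary example, not from the notes) checks the C-K identity P^(5) = P^(2) · P^(3) and computes µ5 = µ · P^5.

```python
import numpy as np

# A hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
mu = np.array([1.0, 0.0, 0.0])  # start in state 0

# C-K equation: P^(n+m) = P^(n) * P^(m); here P^(5) = P^(2) * P^(3).
P5 = np.linalg.matrix_power(P, 5)
assert np.allclose(P5, np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 3))

# Distribution of X_5: mu_5 = mu * P^5 (a row vector summing to 1).
mu_5 = mu @ P5
print(mu_5, mu_5.sum())
```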
From previous probability courses, readers are familiar with the notion of conditional expec-
tation E[g(X)|Y = y] at a specific value y of Y where X, Y are random variables and g is a
real (measurable) function. In particular,
E[g(X)|Y = y] = Σx g(x) · P(X = x|Y = y) if discrete, and
E[g(X)|Y = y] = ∫ g(x) fX|Y (x|y) dx if continuous.
This is a value which is not necessarily finite. We may regard E[g(X)|Y] as a random variable, namely a function of Y which at a specific outcome ω ∈ Ω takes the value
E[g(X)|Y](ω) = E[g(X)|Y = Y(ω)].
The law of iterated expectation states that E[E(X|Y)] = E(X) and as a result,
E[f(Xn)] = µ · P^n · f′ = µn · f′ = µ · f^(n)′.
In here, f′ = (f(0), f(1), · · ·)^T and
f^(n)′ = (E[f(Xn)|X0 = 0], E[f(Xn)|X0 = 1], · · ·)^T.
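The identity E[f(Xn)] = µ · P^n · f′ = µn · f′ = µ · f^(n)′ can likewise be checked directly; the chain, initial distribution, and function f below are made up for illustration.

```python
import numpy as np

# Hypothetical chain, initial distribution, and function f on S = {0, 1, 2}.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
mu = np.array([0.2, 0.5, 0.3])
f = np.array([1.0, 4.0, 9.0])   # the column vector f' = (f(0), f(1), f(2))^T

n = 4
Pn = np.linalg.matrix_power(P, n)

e1 = mu @ Pn @ f        # mu . P^n . f'
e2 = (mu @ Pn) @ f      # mu_n . f'
f_n = Pn @ f            # f^(n)'(i) = E[f(X_n) | X_0 = i]
e3 = mu @ f_n           # mu . f^(n)'
print(e1, e2, e3)       # all three agree
```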
1.4. Classification and Class Properties.
In this section, we will state results related to classification from STAT 333 and leave out
the proofs. Readers can refer to previous notes for full details.
Definition 9. For two states x and y, x is said to communicate to y, which we denote by x → y, if
ρxy := Px(Ty < ∞) = P(Ty < ∞|X0 = x) > 0
where Ty := min{n ≥ 1 : Xn = y}. That is, there exists n ∈ N so that Pxy^n > 0. We then say that “x can go to y”.
It is easy to see that “→” satisfies transitivity. Namely, if x, y, and z are states so that
x → y and y → z, then x → z. Thus, this inspires the definition of a communicating class.
Definition 10. We say that C ⊆ S is a communicating class if
(1) For all i, j ∈ C, i ↔ j. That is, i → j and j → i.
(2) For all i ∈ C and j ∉ C, i ↮ j. This means that either i ↛ j or j ↛ i.
A DTMC is called irreducible if all the states are in the same communicating class.
Let
N(y) = Σn≥1 1{Xn = y} = total number of visits to y.
We have the following results related to transience and recurrence from STAT 333.
Recurrence                  Transience
ρyy = 1                     ρyy < 1
Py(N(y) = ∞) = 1            Py(N(y) < ∞) = 1
Ey(N(y)) = ∞                Ey(N(y)) < ∞
Σn≥1 Pyy^n = ∞              Σn≥1 Pyy^n < ∞
It turns out that recurrence and transience are class properties. Other criteria and properties
for recurrence and transience are
(1) If ρxy > 0 and ρyx < 1, then x is transient. This makes sense intuitively because
there is a positive probability that the chain will not go back to x given that it starts
from x. Taking the contrapositive, if x is recurrent and ρxy > 0, then ρyx = 1.
(2) If A is a closed set, x ∈ A, and y ∉ A, then Pxy = 0 or equivalently, ρxy = 0. In
particular, a closed set with finite states has at least one recurrent state. As such,
a closed class with finite states must be recurrent which means that an irreducible
DTMC with finite state space is recurrent.
1.5. Stationary Distribution and Limiting Behaviour.
Theorem 18. Suppose that a DTMC is irreducible and has a stationary distribution π, then
πy = 1/Ey(Ty).
In particular, the stationary distribution is unique.
Corollary 18.1. If a DTMC is irreducible, aperiodic, and has a stationary distribution,
then
πy = limn→∞ Pxy^n = limn→∞ Nn(y)/n = 1/Ey(Ty).
That is, the stationary distribution agrees with the limiting transition probability, the long-run
fraction of time, and 1/(expected revisit time).
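Corollary 18.1 can be illustrated by simulation: compute π as the normalized left eigenvector of P for eigenvalue 1, run the chain, and compare the long-run fraction of time spent in each state against π. The 3-state matrix below is an arbitrary irreducible, aperiodic example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical irreducible, aperiodic 3-state chain.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Stationary distribution: left eigenvector of P for eigenvalue 1 (pi P = pi).
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

# Long-run fraction of time N_n(y)/n spent in each state y.
n_steps = 100_000
x = 0
counts = np.zeros(3)
for _ in range(n_steps):
    x = rng.choice(3, p=P[x])
    counts[x] += 1

print(pi)                  # stationary distribution
print(counts / n_steps)    # should be close to pi
```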
Theorem 19. (Long-run Average) Suppose that the chain is irreducible, has a stationary distribution, and Σx |f(x)| · πx < ∞. Then
limn→∞ (1/n) Σm=1..n f(Xm) = limn→∞ (1/n) Σm=0..n−1 f(Xm) = Σx f(x)π(x) = π · f′,
where f′ = (f(0), f(1), · · ·)^T.
Definition 20. A distribution π satisfies the detailed balance condition if πx Pxy = πy Pyx for
all x, y ∈ S.
It was shown in STAT 333 that the detailed balance condition implies the existence of a
stationary distribution but the converse is not true. However, the converse will hold if P is
tridiagonal.
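For a tridiagonal (birth-and-death) matrix, detailed balance determines π up to normalization, since πx · Px,x+1 = πx+1 · Px+1,x gives a one-term recursion. A sketch with a made-up 4-state chain:

```python
import numpy as np

# Hypothetical birth-and-death (tridiagonal) chain on {0, 1, 2, 3}.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.3, 0.2, 0.5, 0.0],
              [0.0, 0.3, 0.2, 0.5],
              [0.0, 0.0, 0.3, 0.7]])

# Solve detailed balance pi_x P[x, x+1] = pi_{x+1} P[x+1, x] by recursion.
pi = np.ones(4)
for x in range(3):
    pi[x + 1] = pi[x] * P[x, x + 1] / P[x + 1, x]
pi /= pi.sum()

# Detailed balance holds for every pair (off-tridiagonal terms are 0 = 0),
# and therefore pi is stationary: pi P = pi.
flow = pi[:, None] * P       # flow[x, y] = pi_x * P_xy
assert np.allclose(flow, flow.T)
assert np.allclose(pi @ P, pi)
print(pi)
```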
Definition 21. Fix n and define Ym = Xn−m for m ∈ {0, · · · , n}. The chain {Ym } is called
a time-reversed chain.
Lemma 22. {Ym} is a DTMC if {Xn} is a DTMC that starts from a stationary distribution. In particular, {Xn} and {Ym} are equal in distribution.
Theorem 23. Time reversibility is equivalent to the detailed balance condition.
1.6. Infinite State Space.
Suppose that x ∈ S, then either x is transient or recurrent. Within recurrence, there are
two further sub-categories that prove to be useful when S is infinite, namely positive re-
currence and null recurrence. A state x is said to be positive recurrent if x is recurrent
and Ex (Tx ) < ∞. On the other hand, x is said to be null recurrent if it is recurrent and
Ex (Tx ) = ∞. Once again, transience, positive recurrence, and null recurrence are class prop-
erties.
With these two notions, it follows that a stationary distribution exists if and only if there
exists at least one positive recurrent class and it is unique if and only if there exists only
one positive recurrent class. Additionally, πj = 0 for every stationary distribution π if and only if j is transient or null
recurrent. Finally, a DTMC with finite state space must have at least one positive recurrent
class and no null recurrent class/state.
Example: Recall that a simple random walk is transient for p ≠ 1/2 and null recurrent if p = 1/2.
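This dichotomy can be glimpsed by Monte Carlo. The sketch below (sample sizes and horizon chosen arbitrarily) estimates the probability that a simple random walk returns to its start within a finite horizon: for p = 1/2 the estimate creeps toward 1 as the horizon grows, while for p ≠ 1/2 it stays near the return probability 1 − |p − q| (0.6 for p = 0.7).

```python
import numpy as np

rng = np.random.default_rng(1)

def return_fraction(p, n_walks=1000, n_steps=5000):
    """Fraction of simple random walks (step +1 w.p. p, -1 w.p. 1 - p)
    that return to their starting point within n_steps."""
    steps = rng.choice([1, -1], p=[p, 1 - p], size=(n_walks, n_steps))
    paths = np.cumsum(steps, axis=1)
    return np.mean(np.any(paths == 0, axis=1))

# p = 1/2: (null) recurrent, so the fraction approaches 1 as n_steps grows.
# p = 0.7: transient, the fraction stays near 1 - |p - q| = 0.6.
print(return_fraction(0.5), return_fraction(0.7))
```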
1.7. Branching Process.
Consider a branching process {Xn } where Xn is the population size of generation n and
X0 = 1. Let Y be the random number of offspring of one individual.
1.8. Absorption Probability and Absorption Time.
And so, h(1) = 0.25 · h(1) + 0.6 · h(2). Similarly, h(2) = 0.2 · h(2) + 0.7. Solving the system of equations yields h(1) = 0.7 and h(2) = 7/8. We can generalize this approach in order to obtain
the following result.
Theorem 24. (General Result) Suppose that S = A ∪ B ∪ C where C is finite. Starting
from any state in C, we are interested in the probability that the chain gets absorbed to set
A rather than set B assuming it is positive. Define
h(x) = P(absorbed to A|X0 = x).
Then h(x) = 1 for x ∈ A and h(x) = 0 for x ∈ B. In particular, one can solve for h′ to obtain h′ = (I − Q)^{−1} · R′A where
h′ = (h(x1), h(x2), · · ·)^T,
Q = {Pxy}x,y∈C,
R′A = ( Σy∈A Px1y , Σy∈A Px2y , · · · )^T
in which xi ∈ C.
Proof. For x ∈ C, by using the law of total probability and conditioning on X1, it follows that
h(x) = Σy∈S P(absorbed to A|X1 = y, X0 = x) · P(X1 = y|X0 = x)
     = Σy∈S Pxy h(y)
     = Σy∈C Pxy h(y) + Σy∈A Pxy.
In matrix notation, this reads h′ = Q · h′ + R′A, i.e. (I − Q) · h′ = R′A. It remains
to show that I − Q is invertible. Indeed, one can modify the transition matrix as follows: keep the rows indexed by C, and replace each row indexed by A or B with the corresponding row of the identity matrix. In block form, with the states ordered as C, A ∪ B,
P = [ Q R ; ∗ ∗ ] ↦ P′ = [ Q R ; 0 I ].
Since we are only interested in observing the chain before it hits A or B, changing the
transition probabilities going out of the states in A or B will not change the result of this
problem. After this change, the states in A and B are absorbing, and all the states in C are
transient. Hence, for x ∈ C,
0 = limn→∞ Px(X′n ∈ C) = limn→∞ Σy∈C (P′)xy^n = limn→∞ Σy∈C Qxy^n,
where {X′n} denotes the modified chain.
The second and third equalities are obtained from the fact that Px(X′n = y) = (P′)xy^n together with the block form
(P′)^n = [ Q^n ··· ; 0 I ],
with the blocks again indexed by C and A ∪ B.
Now, note that
Qxy^n = Σx1,···,xn−1∈C Qxx1 Qx1x2 · · · Qxn−1y
      = Σx1,···,xn−1∈C P(X′1 = x1, X′2 = x2, · · · , X′n = y | X′0 = x).
Therefore,
Σy∈C Qxy^n = Σx1,···,xn−1,y∈C P(X′1 = x1, X′2 = x2, · · · , X′n = y | X′0 = x) = Px(X′1 ∈ C, · · · , X′n ∈ C),
which tends to 0 as n → ∞. Hence Q^n → 0 entrywise; if (I − Q)v = 0, then v = Qv = Q^n v → 0, so v = 0 and I − Q is invertible. □
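As a sanity check, Theorem 24 reproduces the worked example from the start of this section: reading Q and R′A off the equations h(1) = 0.25 h(1) + 0.6 h(2) and h(2) = 0.2 h(2) + 0.7, and solving (I − Q) h′ = R′A, recovers h(1) = 0.7 and h(2) = 7/8. A NumPy sketch:

```python
import numpy as np

# From the worked example, for C = {1, 2}:
#   h(1) = 0.25 h(1) + 0.6 h(2)        (no one-step jump from 1 into A)
#   h(2) = 0.20 h(2) + 0.7             (one-step probability 0.7 from 2 into A)
Q = np.array([[0.25, 0.6],
              [0.0,  0.2]])
R_A = np.array([0.0, 0.7])   # one-step probabilities of entering A from C

# Theorem 24: h' = (I - Q)^{-1} R'_A, computed via a linear solve.
h = np.linalg.solve(np.eye(2) - Q, R_A)
print(h)   # [0.7, 0.875], i.e. h(1) = 0.7 and h(2) = 7/8
```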
With a similar theme as above, let us now consider the idea of absorption time. The basic setting is as follows: S = A ∪ C disjoint, C is finite, and the same assumptions as for the absorption probability apply. Define
VA := min{n ≥ 0 : Xn ∈ A}.
We want to determine Ex (VA ) for x ∈ C. Denote g(x) := Ex (VA ), then by first step analysis,
g(x) = Σy∈S E(VA|X1 = y, X0 = x) · P(X1 = y|X0 = x) = Σy∈S Pxy (g(y) + 1)
     = Σy∈S Pxy g(y) + 1
     = Σy∈C Qxy g(y) + 1,
since g(y) = 0 for y ∈ A and Pxy = Qxy for x, y ∈ C.
Using matrix notation, this is g′ = Q · g′ + 1′ where g′ = (g(x1), g(x2), · · ·)^T with xi ∈ C and 1′ = (1, 1, · · ·)^T. Rearranging yields (I − Q)g′ = 1′. Observe that the matrix I − Q is exactly the same as the one from the absorption probability part. As such, this matrix is invertible, which implies that g′ = (I − Q)^{−1} 1′ and moreover, this is the unique solution.
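With the same Q as in the worked example earlier in this section (h(1) = 0.25 h(1) + 0.6 h(2), h(2) = 0.2 h(2) + 0.7), the expected times for the chain to leave the transient set C come from one more linear solve; this treats every exit from C as absorption, purely for illustration.

```python
import numpy as np

# Q from the worked example earlier in this section (transitions within C = {1, 2}).
Q = np.array([[0.25, 0.6],
              [0.0,  0.2]])

# g' = (I - Q)^{-1} 1': expected number of steps until the chain leaves C,
# started from each state of C.
g = np.linalg.solve(np.eye(2) - Q, np.ones(2))
print(g)   # [7/3, 1.25]
```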
2. Discrete Phase-type Distribution
The one-step transition probability matrix will have the following block form:
P = [ Q R ; 0 I ],
where the first block of rows and columns is indexed by the states 0, · · · , M − 1 and the second by the states M, · · · , N.
Let T := min{n : M ≤ Xn ≤ N } be the time until absorption. Assume that we start from
the initial distribution
α0 = (α0,0 , α0,1 , · · · , α0,M−1 , α0,M , · · · , α0,N ).
In here, α0,i = P(X0 = i). Let α′0 = (α0,0 , α0,1 , · · · , α0,M−1 ). In this section, we are interested in the exact distribution of T.
First,
References
[1] K. Davidson. Note for Pure Math 451 - Measure Theory. 2020.