STAT0007 Course Notes
Lecture notes
Course details
• Course organiser: Dr. Elinor Jones (elinor.jones@ucl.ac.uk).
Tutorials
Assessment
• ICA: Friday 3rd of March, 3-4pm (provisional) - open book, 45 minutes. This will be 10% of your final mark.
• Final exam in Term 3 - closed book, 2 or 2.5 hours (for STAT3102 and STAT2003 respectively). This
will be 90% of your final mark.
• There is no choice of questions in the ICA or final written exam.
Contents
1 Introduction (Important!) 4
1.1 A course overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Mathematical/ Statistical Writing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Definitions (optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Discrete random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.3 Continuous random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.4 Important distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.5 Generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.6 Probability generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.7 Moment generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.8 What else can we do with generating functions? . . . . . . . . . . . . . . . . . . . . . 12
1.4 Conditioning on events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.1 Conditional probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.2 Conditional expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.3 Useful conditioning formulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.4.4 The idea of ‘first step decomposition’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4 Continuous-time Markov processes 56
4.1 Continuous-time Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 The importance of the exponential distribution and ‘little-o’ notation . . . . . . . . . . . . . . 57
4.2.1 Order notation: o(h) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.2 The lack-of-memory property of the exponential distribution . . . . . . . . . . . . . . 58
4.2.3 Other useful properties of the exponential distribution . . . . . . . . . . . . . . . . . . 59
4.3 Breaking down the definition of a continuous time Markov process . . . . . . . . . . . . . . . 60
4.3.1 Holding times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.2 The jump chain: rates of change . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.3 The jump chain: probability of going to a particular state . . . . . . . . . . . . . . . . 62
4.4 Analysis of transition probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4.1 Transition probabilities over a small interval . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4.2 Transitions over longer periods: the Chapman-Kolmogorov equations . . . . . . . . . . 65
4.4.3 Kolmogorov’s forward equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.4.4 Kolmogorov’s backward equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4.5 Solving the KFDEs and KBDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.4.6 The generator matrix, Q . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5 Limiting behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.5.1 Invariant distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.5.2 Equilibrium distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.5.3 A limit theorem for continuous time Markov chains . . . . . . . . . . . . . . . . . . . . 73
1 Introduction (Important!)
1.1 A course overview
A stochastic process is a process which evolves randomly over time (or space, or both).
1.2 Mathematical/Statistical Writing
Being able to write clearly will help you to think clearly. It forces you to understand the material: if you
can’t write it well, you don’t understand it well enough. Learning mathematics is not about repetition, as
it was at A-Level, and so past papers are NOT A GOOD REVISION TOOL. In particular, I write ICA and
exam questions that probe your understanding, and not your ability to regurgitate methods. This reflects
the fact that proper mathematics is not about learning to apply a set of methods, but using logical reasoning
to find your way through a problem.
• The ‘equals sign’ has a very specific meaning. Please don’t abuse it.
• Do a reality check on your answer. For example, a probability should be between 0 and 1. An
expectation of how many time units it takes to reach a particular point should be at least 1 (assuming
moving between locations takes 1 unit of time, and that you’re not already there!).
• If your answer doesn’t contain written explanation, then it is not a complete solution. Think about
guiding the marker through your thought process, rather than expecting the marker to second guess
what you meant to say.
• Use tutorial sheets to practice good mathematical writing.
1.3 Random variables
It is important that you can spot a random variable, and can also define appropriate random variables for
a given scenario.
[Diagram: the link between the probability model and the real world.]
1.3.1 Definitions (optional)
With a finite sample space Ω, we can assign probabilities to individual sample points ω via a ‘weight function’,
p : Ω → [0, 1]. This allocates a value p(ω) for each possible outcome ω, which we interpret as the probability
that ω occurs. We need
∑_{ω∈Ω} p(ω) = 1.
Caution: this doesn’t always work with infinite sample spaces, but we won’t consider this further.
Notice that:
1. P(A) ≥ 0 for all events A,
2. if A1, A2, A3, . . . are events with Ai ∩ Aj = ∅ for all i ≠ j, then P(∪_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} P(Ai), i.e. P is countably additive,
3. P(Ω) = 1.
As the outcome of an experiment corresponds to a sample point ω ∈ Ω, we can make some numerical mea-
surements whose values depend on the outcome ω. This gives rise to a random variable, let’s call it X,
which is a function X : Ω → R. Its value at a sample point, X(ω), represents the numerical value of the
measurement when the outcome of the experiment is ω.
Consider, for example, the experiment of tossing a coin twice, so that Ω = {HH, HT, TH, TT}. There are different random variables you could define on this sample space. Let's define X to be the number of heads in the two tosses of the coin. In this case X maps Ω to 0 (if we see TT), 1 (if we see TH or HT) or 2 (if we see HH):
X(T T ) = 0
X(T H) = X(HT ) = 1
X(HH) = 2
All outcomes are equally likely here, and so p(ω) = 1/4 for all outcomes ω. If we define our event A = “two heads”, then since the only sample point in A is HH,
P(A) = ∑_{ω∈A} p(ω) = 1/4.
Similarly, we can also define another event B =“at least one tail”, and this is given by P (X ≤ 1), which we
can calculate as:
P(B) = P(X ≤ 1) = P({ω ∈ Ω : X(ω) ≤ 1}) (1)
Which ω satisfy (1) in this case? Now you should be able to calculate the probability required.
Note: we have defined the distribution function of X above. In general, the distribution function of a random variable X is given, for x ∈ R, by:
FX (x) = P(X ≤ x) = P({ω ∈ Ω : X(ω) ≤ x}).
Then, for a continuous random variable X, the probability density function is
f_X(x) = dF_X(x)/dx.
The expectation of X is
E(X) = ∫_{−∞}^{∞} x f_X(x) dx,
and the expectation of g(X), for some function g, is
E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx.
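As a quick illustration of these formulae, the small sketch below uses sympy to compute E(X) and E(X²) for an assumed Exponential(λ) density; the choice of density is ours, purely for illustration, and is not part of the notes' examples.

```python
# Minimal sketch (assumed example): E(X) and E(g(X)) for the density
# f_X(x) = lam * exp(-lam * x), x >= 0, i.e. an Exponential(lam) random variable.
import sympy as sp

x, lam = sp.symbols('x lam', positive=True)
f_X = lam * sp.exp(-lam * x)          # assumed density, chosen for illustration

EX = sp.integrate(x * f_X, (x, 0, sp.oo))        # E(X)
EX2 = sp.integrate(x**2 * f_X, (x, 0, sp.oo))    # E(g(X)) with g(x) = x^2

print(EX)    # 1/lam
print(EX2)   # 2/lam**2
```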
1.3.4 Important distributions
There are several probability distributions which we will use repeatedly during the course. Make sure you
are very familiar with the following:
Discrete distributions
Binomial: X ∼ Bin(n, p), k = 0, 1, . . . , n, P(X = k) = (n choose k) p^k q^{n−k} where q = 1 − p; mean np, variance npq.
Poisson: X ∼ Po(λ), where λ ≥ 0, k = 0, 1, 2, . . ., P(X = k) = exp(−λ) λ^k / k!; mean λ, variance λ.
Continuous distributions
Gamma: X ∼ Gamma(α, β), x ∈ R+, f_X(x) = β^α exp(−βx) x^{α−1} / Γ(α); mean α/β, variance α/β².
1.3.5 Generating functions
Imagine you have a sequence of real numbers, z1 , z2 , z3 , .... It can be hard to keep track of such numbers and
all the information they contain. A generating function provides a way of wrapping up the entire sequence
into one expression. When we want to extract information about the sequence, we simply interrogate this
one expression in a particular way in order to get what we want.
There are many different types of generating function. These are different in the sense that we wrap up the
sequence in different ways, and so extracting particular pieces of information from them requires different
techniques. The two types of generating function that you should be familiar with are:
1. The probability generating function (PGF);
2. The moment generating function (MGF)
E[X(X − 1)] = d²G_X(s)/ds² |_{s=1} = G''_X(1), and so on.
(b) If we expand GX (s) in powers of s the coefficient of sk is equal to P(X = k), so we can also find
P(X = k) for all k.
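To see concretely how a PGF can be 'interrogated' for moments and probabilities, here is a small sympy sketch of our own, using the Poisson(λ) PGF G_X(s) = e^{λ(s−1)} purely as an assumed example.

```python
# Minimal sketch: interrogating a PGF for moments and probabilities.
# The Poisson(lam) PGF is assumed here only as an illustrative example.
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
G = sp.exp(lam * (s - 1))                          # assumed PGF

mean = sp.diff(G, s).subs(s, 1)                    # E[X] = G'(1)
second_fact = sp.diff(G, s, 2).subs(s, 1)          # E[X(X-1)] = G''(1)
p3 = sp.series(G, s, 0, 5).removeO().coeff(s, 3)   # P(X = 3): coefficient of s^3

print(sp.simplify(mean))         # lam
print(sp.simplify(second_fact))  # lam**2
print(sp.simplify(p3))           # lam**3*exp(-lam)/6
```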
Example 1.1
You can also work ‘backwards’: given a PGF for a random variable X, what is the distribution of
X?
For example, suppose you are told that GX (s) = ps/[1 − (1 − p)s]. We can convert this expression
into the summation format of a PGF and spot its distribution:
G_X(s) = ps / [1 − (1 − p)s]
       = ps ∑_{k=0}^{∞} (1 − p)^k s^k
       = ∑_{j=1}^{∞} s^j (1 − p)^{j−1} p,   where j = k + 1.
The coefficient of s^j is (1 − p)^{j−1} p = P(X = j), so X ∼ Geometric(p), taking values 1, 2, 3, . . . .
In fact, different distributions have different PGF formats but random variables with the same distribution
have a PGF of the same format. You can therefore spot the distribution of a random variable instantly from
the format of its PGF.
Exercise 1.2
Find the format of the following PGFs, either by calculating the PGF directly or looking in a textbook!
X ∼ Geometric(p)
X ∼ Poisson(λ)
X ∼ Binomial(n, p)
X ∼ Bernoulli(p)
1.3.7 Moment generating functions
The moment generating function (MGF) of a random variable X with density f_X is
M_X(t) = E[exp(tX)] = ∫_R exp(tx) f_X(x) dx, for any t ∈ R such that the integral converges.
Moments can be obtained by differentiating:
E[X^n] = M_X^(n)(0) = dⁿM_X(t)/dtⁿ |_{t=0}.
The same as for PGFs, you can spot the distribution of a random variable instantly from the format of its
MGF.
Exercise 1.3
Find the format of the following MGFs, either by calculating the MGF directly or looking in a textbook!
X ∼ Normal(µ, σ 2 )
X ∼ Exponential(λ)
X ∼ Binomial(n, p)
X ∼ Bernoulli(p)
X ∼ Geometric(p)
Notice that a discrete or continuous random variable can have an MGF, whereas a PGF is for discrete random variables only (i.e. those taking values in a countable set).
1.3.8 What else can we do with generating functions?
We’ll concentrate here on (useful) things you can do with probability generating functions, though similar
conclusions can be reached using moment generating functions too.
1. Calculating the distribution of sums of random variables.
Suppose that X1 , . . . , Xn are i.i.d. random variables with common PGF GX (s). The PGF GY (s) of
Y = X1 + · · · + Xn is given by
G_Y(s) = E[s^{X_1 + ··· + X_n}] = ∏_{i=1}^{n} E[s^{X_i}] = [G_X(s)]^n,
and by looking at [G_X(s)]^n, we can spot the distribution of Y, which may otherwise be difficult. Also, if Z_1 ⊥⊥ Z_2 (that is, Z_1 and Z_2 are independent but do not necessarily have the same distribution) then
G_{Z_1+Z_2}(s) = G_{Z_1}(s) G_{Z_2}(s).
By looking at the PGF of Z_1 + Z_2, G_{Z_1+Z_2}(s), we can then deduce the distribution of Z_1 + Z_2. This can be an easier strategy than trying to deduce its distribution directly.
Example 1.4
X and Y are both Poisson random variables, with parameters λ1 and λ2 , respectively. Assume that
X and Y are also independent of each other. What is the distribution of X + Y ?
The PGF of a Poisson(λ) random variable is G(s) = e^{λ(s−1)}. Since X and Y are independent,
G_{X+Y}(s) = G_X(s) G_Y(s) = e^{λ1(s−1)} e^{λ2(s−1)} = e^{(λ1+λ2)(s−1)}.
We recognise the format of this PGF as the PGF of a Poisson distribution with parameter (λ1 + λ2).
Therefore, X + Y ∼ Poisson(λ1 + λ2).
2. Calculating the PGF of a random sum (that is, the sum of a random number of random variables).
Suppose that X1 , . . . , XN are i.i.d. random variables with common PGF GX (s) and that N has PGF
GN (s).
We’ll see this technique of expanding an expectation again later (you have already seen it if you studied
STAT2001 or STAT3101), and it will prove to be a very useful technique in solving problems. Using
this, we can deduce that
G_Y(s) = E[E[s^Y | N]] = E[(G_X(s))^N] = G_N(G_X(s)).
Example 1.5
The number of people who enter a particular bookstore on Gower Street, per day, follows a Poisson
distribution with parameter 300. Of those who visit, 60% will make a purchase. What is the
distribution of the number of people who make a purchase at the bookstore in one day?
Notice that whether a visitor makes a purchase is a Bernoulli trial, with probability of success 0.6.
Let
X_i = 1 if the ith customer makes a purchase, and X_i = 0 otherwise,
so that we are looking to find the distribution of Y = X1 + ... + XN where N ∼ Poisson(300). Notice
that the PGF of the Bernoulli random variable X is G_X(s) = 0.4 + 0.6s, and the PGF of N is G_N(s) = e^{300(s−1)}.
By the result above, we know that the PGF of Y is G_Y(s) = G_N(G_X(s)). Think of this as using G_X(s) in place of s in the PGF of N:
G_Y(s) = e^{300(G_X(s)−1)} = e^{300(0.4+0.6s−1)} = e^{180(s−1)}.
We recognise this as the PGF of a Poisson random variable with mean 180. Therefore, the number of people who make a purchase at the bookstore in one day has a Poisson(180) distribution.
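A quick way to sanity-check this conclusion is by simulation. The sketch below is our own addition, using the parameter values from the example: it draws many days and compares the sample mean and variance of the number of purchasers with the Poisson(180) prediction.

```python
# Minimal simulation check of Example 1.5: N ~ Poisson(300) visitors per day,
# each buying independently with probability 0.6; Y should behave like Poisson(180).
import numpy as np

rng = np.random.default_rng(0)
days = 100_000

N = rng.poisson(300, size=days)    # visitors on each simulated day
Y = rng.binomial(N, 0.6)           # purchasers on each day (thinned counts)

print(Y.mean(), Y.var())           # both close to 180, as a Poisson(180) would give
```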
1.4 Conditioning on events
Easy exercise
You have a deck of playing cards (52 cards in total, split equally into four suits: hearts, diamonds,
spades and clubs, with the first two considered ‘red’ cards, and the others considered ‘black’ cards).
• I pick a card at random. What is the probability that the chosen card is from the diamond
suit?
• I pick a card at random and tell you that it is a red card (i.e. either hearts or diamonds). What
is the probability that the chosen card is from the diamond suit?
The second question requires a conditional probability (it is conditional on knowing that the card is red),
but you most likely calculated the probability intuitively without using any formulae. How did you do that?
You applied Bayes’s theorem intuitively, without even realising it.
In fact, Bayes’s theorem for calculating conditional probabilities is perfectly intuitive. Suppose we are
calculating P (A|C) for some events A and C:
• This is asking for the probability of A occurring, given that C occurs.
• Assuming C occurs, the part of the sample space Ω which relates to C becomes the 'new' sample space, Ω′ (which is just the event C for our purposes).
• What we are asking here, in effect, is to calculate the probability of A in the 'new' sample space Ω′ (i.e. P(A ∩ C)), scaled to Ω′ (i.e. divided by P(C)).
• The final equation is therefore P(A | C) = P(A ∩ C)/P(C).
The following graphic may help you to visualise why Bayes’s theorem ‘works’.
1.4.1 Conditional probability
Let A and C be events with P(C) > 0. The conditional probability of A given C is
P(A | C) = P(A ∩ C) / P(C).
1. P(A | C) ≥ 0 ,
2. if A1 , A2 , . . . are mutually exclusive events, then
P(A1 ∪ A2 ∪ · · · | C) = ∑_i P(Ai | C),
3. P(C | C) = 1.
Conditional probability for random variables instead of events works in the same way.
Discrete random variables: X and Y are independent if and only if, for all x, y, p_{X,Y}(x, y) = p_X(x) p_Y(y) = P(X = x) P(Y = y).
Continuous random variables: X and Y are independent if and only if, for all x, y, f_{X,Y}(x, y) = f_X(x) f_Y(y).
1.4.2 Conditional expectation
The conditional expectation of g(X) given Y = y, for some function g, is
E(g(X) | Y = y) = ∑_x g(x) P(X = x | Y = y)   for discrete RVs,
E(g(X) | Y = y) = ∫_{−∞}^{∞} g(x) f_{X|Y=y}(x) dx   for continuous RVs.
Example 1.6
Suppose that Ω = {ω1 , ω2 , ω3 } and P(ωi ) = 1/3 for i = 1, 2, 3. Suppose also that X and Y are
random variables with X(ω1 ) = 2, X(ω2 ) = 3, X(ω3 ) = 1, and Y (ω1 ) = 2, Y (ω2 ) = 2, Y (ω3 ) = 1.
Find the conditional pmf pX|Y =2 (x) and the conditional expectation E(X | Y = 2).
p_{X|Y=2}(2) = P(X = 2 | Y = 2) = P(X = 2, Y = 2)/P(Y = 2) = P(ω1)/[P(ω1) + P(ω2)] = 1/2.
p_{X|Y=2}(3) = P(X = 3 | Y = 2) = P(X = 3, Y = 2)/P(Y = 2) = P(ω2)/[P(ω1) + P(ω2)] = 1/2.
Thus the conditional pmf of X given Y = 2 equals 1/2 for x = 2 and for x = 3 and is 0 otherwise. The
conditional expectation is
E(X | Y = 2) = ∑_x x P(X = x | Y = 2) = (1 × 0) + (2 × 1/2) + (3 × 1/2) = 5/2.
An important concept
So far, we have only calculated conditional expectations when conditioning on specific values, e.g.
E[X|Y = 2]. We can extend this idea to conditioning on random variables without equating them to
specific values.
The notation E(X | Y ) is used to denote a random variable that takes the value E(X | Y = y) with
probability P(Y = y).
Note that E(X | Y ) is a function of the random variable Y , and is itself a random variable.
Example 1.7
In Example 1.6 we found that E(X | Y = 2) = 5/2. It can similarly be shown that E(X | Y = 1) = 1.
You can also check that P(Y = 1) = 1/3 and P(Y = 2) = 2/3. Thus, the random variable E(X | Y)
has two possible values, 1 and 5/2, with probabilities 1/3 and 2/3, respectively. That is,
E[X|Y] = 1 with probability 1/3, and E[X|Y] = 5/2 with probability 2/3.
1.4.3 Useful conditioning formulae
Three important formulae:
(i) Law of total probability Let A be an event and Y be ANY random variable. Then
P(A) = ∑_y P(A | Y = y) P(Y = y)   if Y is discrete,
P(A) = ∫_{−∞}^{∞} P(A | Y = y) f_Y(y) dy   if Y is continuous,
assuming the conditional probabilities are all defined. Note the similarity with (i).
Note in particular:
(i) The law of total probability applies to ANY event A and ANY random variable Y .
(ii) The law of the conditional (iterated) expectation, E[X] = E[E[X|Y ]] applies to ANY random
variables X and Y , and we will be using this identity heavily during the course.
Example 1.8
In Example 1.7 we found the distribution of the random variable E(X | Y) was
E[X|Y] = 1 with probability 1/3, and E[X|Y] = 5/2 with probability 2/3.
By the law of iterated expectation, E[X] = E[E(X | Y)] = (1 × 1/3) + (5/2 × 2/3) = 2, which agrees with the direct calculation E[X] = (2 + 3 + 1)/3 = 2.
Example 1.9
Let X and Y have joint density
f_{X,Y}(x, y) = (1/y) e^{−x/y} e^{−y},   0 < x, y < ∞.
Find E(X).
Solution
For any y > 0,
f_Y(y) = ∫_0^∞ (1/y) e^{−x/y} e^{−y} dx = e^{−y} [−e^{−x/y}]_0^∞ = e^{−y},
so Y has an exponential distribution with parameter 1. Moreover, for x > 0,
f_{X|Y=y}(x) = f_{X,Y}(x, y)/f_Y(y) = (1/y) e^{−x/y},
so X | Y = y ∼ Exp(1/y).
It follows that E(X | Y = y) = y and so E(X | Y ) = Y . Using the Law of Conditional Expectations
we then have
E[X] = E[E(X | Y )] = E(Y ) = 1 .
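If you want to verify the conditioning argument directly, the double integral for E(X) can be evaluated with sympy. The sketch below is our own check, not part of the original solution.

```python
# Minimal check: direct computation of E[X] for f_{X,Y}(x,y) = (1/y) e^{-x/y} e^{-y}.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_XY = sp.exp(-x / y) * sp.exp(-y) / y

EX = sp.integrate(x * f_XY, (x, 0, sp.oo), (y, 0, sp.oo))   # integrate over x, then y
print(EX)   # 1, agreeing with E[X] = E[E(X | Y)] = E(Y) = 1
```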
1.4.4 The idea of 'first step decomposition'
Example 1.10
Roads connecting three locations in a city are shown in the following diagram, the time taken to cycle
along any of the roads being 1 minute. Geraint is a bicycle courier whose office is located at location
A. He has a parcel that he needs to deliver to location C. However, he is a law abiding cyclist who
will only follow the direction of travel permitted on each road, as indicated by the arrows. Whenever
he comes to a junction, he selects any of the routes available to him with probabilities as given in the
diagram below.
[Diagram: one-way roads linking A, B and C. From A, Geraint goes to B with probability 2/3 or to C with probability 1/3; from B, he goes to A with probability 1/2 or to C with probability 1/2.]
(i) How long, on average, will it take Geraint to deliver the parcel?
(ii) How many times, on average, will Geraint visit location B before delivering the parcel?
Now suppose that another location, D, is added to the mix, but that all couriers arriving at D are
stuck there indefinitely.
[Diagram: one-way roads linking A, B, C and D. From A, Geraint goes to B with probability 2/3 or to C with probability 1/3; from B, he goes to A, to C or to D, each with probability 1/3.]
(iii) With what probability does the parcel never get delivered?
(ii) How many times, on average, will Geraint visit location B before delivering the parcel?
– Such expectations must be positive as Geraint will visit location B with positive probability.
– This expectation cannot be negative! Again, this is a good reality check.
– This expectation can be anything in the interval (0, ∞). Compare this with the previous question.
(iii) For the updated map, with what probability does the parcel never get delivered?
– This must be between 0 and 1. This may be obvious, but I have known (good) students to forget
that they are calculating a probability and give either a negative answer or something greater
than 1.
There are many ways of solving these questions, but a convenient method is to use ‘first step decomposition’:
think about ‘where do we go first’ ?
This technique can be adapted to solve many different problems, and it will be used frequently in this
course. In fact, all three questions posed about Geraint’s cycling trip can be solved using the general
idea of first step decomposition.
The idea is to decompose the process on the basis of where the process goes first (or its first step, hence
the name). This, combined with the iterated law of conditional expectation (when we want to compute
an expectation), or the law of total probability (for computing probabilities), proves to be a rather
powerful method.
Typical questions which can be solved via first step decomposition include:
1. How long does it take, on average, to reach a particular point for the first time?
2. How often, on average, do we visit a certain point before a particular event occurs?
3. What’s the probability that we ever reach/ never reach a particular point?
Example 1.10 continued
How long, on average, will it take Geraint to deliver the parcel?
We’ll apply first step decomposition to solve this question. Let Xn denote Geraint’s location at time
n, and let T denote the time it takes for Geraint to deliver the parcel to location C.
We need to calculate E[T ], and we will do this by conditioning on Geraint’s next move. In the first
instance, he starts in location A, so X0 = A. Using the iterated law of conditional expectation, we
get:
E[T ] = E[E[T |X1 ]] = E[T |X1 = B]P (X1 = B) + E[T |X1 = C]P (X1 = C).
Since X1 =B with probability 2/3 and X1 =C with probability 1/3, we get:
E[T] = (2/3) E[T | X1 = B] + (1/3) E[T | X1 = C].
Now notice that E[T |X1 = C] = 1 because we are conditioning on Geraint reaching location C for
the first time at time 1, so this simplifies to
E[T] = (2/3) E[T | X1 = B] + 1/3.   (2)
Now compute E[T |X1 = B] by applying first step decomposition again: from B, where can Geraint
go next?
E[T | X1 = B] = E[E[T | X2, X1 = B]] = (1/2) E[T | X2 = C, X1 = B] + (1/2) E[T | X2 = A, X1 = B].
Now notice that E[T |X2 = C, X1 = B] = 2 and E[T |X2 = A, X1 = B] = 2 + E[T ] (the latter because
Geraint used 2 minutes going to B and then back to A, and once back in A it is as if the process
starts over from scratch).
Therefore E[T | X1 = B] = 1 + (1/2)(2 + E[T]). Substituting this back into (2), we have:
E[T] = (2/3) E[T | X1 = B] + 1/3 = (1/3) E[T] + 5/3.
Solving gives E[T ] = 2.5 minutes.
You may want to consider simplifying the notation of the solution to Example 1.10 above, by defining
new quantities. For example, let Li denote the expected time to reach C, given that we start in state i.
Then, the solution above simplifies to
L_A = 1 + (2/3) L_B + (1/3) L_C.   (3)
We must add 1 since we are using one minute to move from state A to another state. Now we know
that LC = 0 (if we start in C, it takes no time to get there!), and LB = 1 + LA /2 + LC /2 = 1 + LA /2.
Therefore, substituting into (3), we need to solve
L_A = 1 + (2/3)(1 + L_A/2) = (1/3) L_A + 5/3.
Solving gives LA = 2.5 minutes, as before.
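As a sanity check on the first step decomposition answer, you can simulate Geraint's journey. The sketch below is our own addition, using the transition probabilities stated in the example; the estimated E[T] should be close to 2.5 minutes.

```python
# Minimal simulation of Example 1.10: estimate the expected delivery time E[T].
import random

random.seed(1)

def delivery_time():
    state, t = 'A', 0
    while state != 'C':
        t += 1                                   # each road takes 1 minute
        if state == 'A':
            state = 'B' if random.random() < 2/3 else 'C'
        else:  # state == 'B'
            state = 'A' if random.random() < 1/2 else 'C'
    return t

times = [delivery_time() for _ in range(200_000)]
print(sum(times) / len(times))   # approximately 2.5
```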
Exercise 1.11
Use a similar method to solve parts (ii) and (iii) of Example 1.10.
1.5 Exercises
The following problems can be solved by a variety of methods. Try to solve them using conditioning
arguments.
Exercise 1.12
A coin, with probability of heads equal to p, is tossed repeatedly until the first head is obtained.
What is the expected number of throws required?
Exercise 1.13
Two UCL students are trapped in a corridor in Torrington Place after 7 p.m. The corridor has three
exits. The first exit leads out of the building in 2 minutes, the second leads back to the original
corridor after 5 minutes and the third is a dead end that leads back after 1 minute. Assume that
the students are trying to get out of the building, but are so engrossed in a statistical discussion that
they fail to learn from experience, and they choose an exit at random each time they return to the
corridor.
(a) Show that the students are certain to get out of the building eventually.
(b) Find the expected length of time until they get out.
(c) Find the variance of the time until they get out.
(d) Find the probability generating function of the time until they get out.
Exercise 1.14
The number of claims arriving at an insurance company in a week has a Poisson distribution with
mean λ. For any claim form, there is a probability p that it is completed incorrectly, independently
of other claim forms and of the number of claims.
(a) Find the distribution of the number of forms completed incorrectly in a week.
(b) Deduce the distribution of the number of forms completed incorrectly in a year.
(c) The sizes of the claims are independent random variables X1 , X2 , X3 , . . . that have the same
distribution, and that are independent of the number of claims. Let T be the total amount
claimed in one week (including claims on incorrect forms). Obtain formulae for the expectation
and variance of T .
2 What is a stochastic process?
2.1 Definitions and basic properties
Definition: Stochastic process
A stochastic process is a collection of random variables {Xt , t ∈ T } taking values in the state space
S. The parameter, or index, set T often represents time.
Notation
• Discrete-time processes: {Xt , t ∈ T }, where T = {0, 1, 2, . . .}. This is often written in terms of the
natural numbers (including zero) rather than T , i.e. {Xn , n ∈ N}, where N = {0, 1, 2, . . .}
• Continuous-time processes: {X(t), t ∈ T }, where T = R+ (i.e. t ≥ 0).
• Notice that the state space can be discrete or continuous. In this course we will only consider discrete
state spaces.
[Table: examples classified by whether the state space and time are discrete or continuous. With discrete time, 'how many more heads than tails in total?' has a discrete state space, while 'a gambler's winnings and losses after each bet' has a continuous state space.]
NOTE: Continuous-time processes can change their value/state (‘jump’) at any instant of time; discrete-
time processes can only do this at a discrete set of time points. For discrete-time processes, when we use the
word ‘time’ we mean ‘number of transitions/steps’.
A stochastic process {Xt, t ∈ T} is a Markov process if, for any sequence of times
0 ≤ t_0 < t_1 < · · · < t_n < t_{n+1} and any n ≥ 0,
P(X_{t_{n+1}} = j | X_{t_n} = i_n, X_{t_{n−1}} = i_{n−1}, . . . , X_{t_0} = i_0) = P(X_{t_{n+1}} = j | X_{t_n} = i_n),
for any j, i_0, . . . , i_n in S.
Let
• Xn be the present state;
• A be a past event (involving X0 , . . . , Xn−1 );
• B be a future event (involving Xn+1 , Xn+2 , . . ..)
The Markov property (MP) states that given the present state (Xn ), the future B is conditionally independent
of the past A, denoted
B ⊥⊥ A | Xn.
Therefore, P(A ∩ B | Xn) = P(A | Xn) P(B | Xn).
2.3 Why is the Markov property useful?
We’ll be using the Markov property repeatedly throughout the course, so make sure you are familiar with it!
The Markov property is a strong independence assumption. It is useful because it simplifies probability
calculations. It enables joint probabilities to be expressed as a product of simple conditional probabilities.
Example 2.1
Suppose that X is a (discrete time) Markov process taking values 0, 1 and 2. Write
P(X0 = 1, X1 = 2, X2 = 0)
as a product of conditional probabilities, simplifying your answer as much as possible, justifying each
step of your argument.
Exercise 2.2
Make sure you understand how to apply the Markov property (repeatedly!) by answering the
following.
Suppose that X is a (discrete time) Markov process taking values in some statespace S. Show that
P(X_0 = i_0, . . . , X_n = i_n) = [ ∏_{k=1}^{n} P(X_k = i_k | X_{k−1} = i_{k−1}) ] P(X_0 = i_0).
This is a very useful identity which we’ll be making use of throughout the course.
The second strategy is particularly hard, so we mainly use this technique when we suspect that the Markov
property does not hold by finding a counter example which does not satisfy the Markov property. To show
that the Markov property does not hold, find one set of states a, b, c and perhaps d such that one of the
following holds (you do not need to show both hold - one will do!):
• P(Y_{n+1} = a | Y_n = b, Y_{n−1} = c) ≠ P(Y_{n+1} = a | Y_n = b, Y_{n−1} = d) for some a, b, c, d with c ≠ d. This shows that the state of the process at time (n − 1) affects the state of the process at time (n + 1), violating the Markov property.
Many questions that you will encounter will provide you with a process {Xn , n = 0, 1, 2, ...}
which is known to be Markov, and will define a further process {Yn , n = 0, 1, 2, ...} as a function of the
original Markov process. The question is usually whether the ‘new’ process is also a Markov process.
That is, if {Xn , n = 0, 1, 2, ...} is a Markov process, and {Yn , n = 0, 1, 2, ...} is a function of this Markov
process, then does the Markov property hold for {Yn , n = 0, 1, 2, ...} too?
In this kind of situation, it is often expected (or even necessary) to take a mathematical approach to
proving or disproving that the Markov property holds in the ‘new’ process. Two strategies to keep in
mind:
• Is the function in question bijective (one-to-one)? If it is, then the new process will also be Markov.
(Think carefully as to why this has to be the case!).
• If the function in question is not bijective, then the new process may, or may not, be Markov.
In this case, if you are trying to prove that the new process is not Markov, a good strategy is to
exploit the fact that you do not have uniqueness.
We’ll see these strategies being used in the next example.
Example 2.3.
Solution:
(i) It is easy to show that ANY sequence of i.i.d. random variables satisfies the Markov property
(try it), therefore part (i) is complete.
(ii) The state space for {Yn , n = 0, 1, 2, ...} is S = {−1, 0, 1}. Notice that
– Yn = −1 only if Xn−1 = −1 and Xn = −1;
– Yn = 1 only if Xn−1 = 1 and Xn = 1;
– Yn = 0 if Xn−1 = −1 and Xn = 1 OR Xn−1 = 1 and Xn = −1;
The transformation function from X to Y is not bijective because there is ambiguity in the value
of the pair (Xn , Xn−1 ) when Yn = 0. Let’s try to show that the Markov property does not hold
by finding a counter example. After a trial-and-error approach I find a counter example, which
proves that the Markov property does not hold (note: this is not the only counter example in
this case):
(iii) The state space for {Zn , n = 0, 1, 2, ...} is S = {−1, −1/3, 1/3, 1}. Notice that
– Zn = −1 only if Xn−1 = −1 and Xn = −1;
– Zn = −1/3 only if Xn−1 = 1 and Xn = −1;
– Zn = 1/3 only if Xn−1 = −1 and Xn = 1;
– Zn = 1 only if Xn−1 = 1 and Xn = 1.
Therefore, whatever the value of Z, we can deduce exactly the values of the relevant random
variables X: we have a one-to-one relationship between pairs of X variables and values for Z.
We must therefore conclude that {Zn , n = 0, 1, 2, ...} is a Markov process.
Exercise 2.4.
Try the following exercises to make sure that you are comfortable in establishing whether the Markov
property holds for the following processes.
???
Yn = (Xn + Xn−1 )2
and Y0 = 0. State whether or not the stochastic process {Yn , n = 0, 1, 2, ...} is a Markov chain. If it is
a Markov chain, state why it must be so. If the process is not a Markov chain, then give an example
where the Markov property breaks down.
???
• Given the base at locus n, the probability that it will not change at locus n + 1 is 0.5.
• If the base at locus n + 1 is different from the base at locus n, the base at locus n + 1 is chosen randomly from the remaining bases.
Let Zn be the base at locus n. Is {Zn} a Markov chain? [HINT: No maths required! Deduce that the
process must satisfy the Markov property logically.]
3 Discrete-time Markov processes
3.1 Introduction to discrete time Markov chains
Definition: Discrete-time Markov chain
A discrete-time Markov chain, often abbreviated to Markov chain, is a sequence of random
variables X0 , X1 , X2 , . . . taking values in a finite or countable state space S such that, for all
n, i, j, i_0, i_1, . . . , i_{n−1},
P(X_{n+1} = j | X_0 = i_0, X_1 = i_1, . . . , X_{n−1} = i_{n−1}, X_n = i) = P(X_{n+1} = j | X_n = i).
A Markov chain is called time homogeneous if, in addition,
P(X_{n+1} = j | X_n = i) = P(X_{m+1} = j | X_m = i)
even if m ≠ n. In effect, this allows us to forget about when the chain moved from i to j; all that matters is
that this occurred in one time step. When a chain is time homogeneous, we also have that for any integer
r ≥ 1,
P(Xn+r = j | Xn = i) = P(Xm+r = j | Xm = i).
That is, the probability of moving from state i to state j in r time steps is the same regardless of when
this happened. In this course, we will assume that all discrete time Markov processes are time homogeneous
unless otherwise stated.
The probabilities p_ij = P(X_{n+1} = j | X_n = i), for i, j ∈ S, are called the (one-step) transition probabilities, and they are stored in the transition matrix P = (p_ij).
Important properties of a transition matrix include:
(i) All entries are non-negative (and, because they are probabilities, are ≤ 1).
(ii) Each row sums to 1, that is, ∑_{j∈S} p_ij = 1, for all i ∈ S.
Why must each row sum to 1? Why needn’t each column sum to 1?
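In practice these two properties are easy to check numerically. The small helper below is our own illustration (not part of the notes), using numpy.

```python
# Minimal helper (illustrative): check that a matrix is a valid transition matrix,
# i.e. all entries non-negative and every row summing to 1.
import numpy as np

def is_transition_matrix(P, tol=1e-10):
    P = np.asarray(P, dtype=float)
    return bool((P >= -tol).all() and np.allclose(P.sum(axis=1), 1.0, atol=tol))

print(is_transition_matrix([[0.6, 0.4], [0.5, 0.5]]))   # True
print(is_transition_matrix([[0.6, 0.3], [0.5, 0.5]]))   # False: first row sums to 0.9
```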
Exercise 3.1
A rat is put in room 1 in the maze illustrated.
[Diagram: a maze with rooms 1 and 2 on the top row and rooms 3 and 4 on the bottom row.]
Every minute the rat changes rooms, choosing its exit at random from the exits available. Let Zn
be the number of the room occupied just after the nth transition.
Justify that {Zn } is a Markov chain, and find its transition matrix.
So p_ij^(n) = P(X_{m+n} = j | X_m = i) is the probability that a process in state i will be in state j after n 'steps'. Note that due to time homogeneity, this does not depend on m.
The Chapman-Kolmogorov (C-K) equations show us how to compute p_ij^(n+m) or P^(n+m) for m ≥ 0, n ≥ 0. These are useful equations which will enable us to calculate these quantities quickly and easily.
Consider time points 0, m and m + n, where m ≥ 0, n ≥ 0 and suppose that we want to compute the
probability that we go from state i to state j in (n + m) steps,
p_ij^(n+m) = P(X_{n+m} = j | X_0 = i).
The C-K equations are derived using the simple observation that at time m, we must be in some state (let’s
call it k). By summing over all possibilities for k, we derive the C-K equations.
[Diagram: timeline from time 0 (state i), through time m (state k), to time m + n (state j).]
By conditioning on the state k at time m we derive p_ij^(m+n):
p_ij^(n+m) = P(X_{m+n} = j | X_0 = i)
  = ∑_{k∈S} P(X_{m+n} = j, X_m = k | X_0 = i)
      ↓ see (ii) in 'Useful conditioning formulae'
  = ∑_{k∈S} P(X_{m+n} = j | X_m = k, X_0 = i) P(X_m = k | X_0 = i)
      ↓ Markov property and time homogeneity
  = ∑_{k∈S} p_ik^(m) p_kj^(n),   for all n, m ≥ 0, all i, j ∈ S.
Note the plural - there is one equation for each (i, j) pair.
Since {p_ik^(m), k ∈ S} is the ith row of P^(m), and {p_kj^(n), k ∈ S} is the jth column of P^(n), the Chapman-Kolmogorov equations can be written in matrix form as
P^(m+n) = P^(m) P^(n).
This may not seem particularly helpful, but note the following:
1. P (1) is the transition matrix P .
2. P (0) is the identity matrix I.
3. Note
P^(2) = P^(1+1) = P · P = P^2,
P^(3) = P^(1+2) = P · P^(2) = P^3, etc.
In general, P^(n) = P^n.
That is, the n-step transition matrix P (n) is equal to the nth matrix power of P .
This is enormously helpful - given a Markov chain with (one-step) transition matrix P , we can now compute
its n-step transition matrix by simply multiplying P with itself n times.
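To compute n-step transition probabilities in practice, one can take matrix powers directly. The sketch below is our own illustration, using an assumed 3-state transition matrix (not one from the notes).

```python
# Minimal sketch: computing an n-step transition matrix as the n-th matrix power.
import numpy as np

P = np.array([[0.5, 0.5, 0.0],     # assumed example transition matrix
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

P2 = np.linalg.matrix_power(P, 2)    # 2-step transition matrix P^(2)
P10 = np.linalg.matrix_power(P, 10)  # 10-step transition matrix P^(10)

print(P2[0, 2])          # p_{02}^{(2)}: probability of moving from state 0 to state 2 in 2 steps
print(P10.sum(axis=1))   # every row of P^(10) still sums to 1
```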
Exercise 3.2
The weather on a particular day is either ‘rainy’ (R) or ‘fine’ (F ). We assume that a Markov chain
model is appropriate. Let Xn be the state of weather on day n. The state space is {R, F }.
The Markov chain assumption means that, given information on whether today is rainy or not, the
probability that tomorrow is a rainy day does not depend on the weather on days prior to today.
That is,
X_{n+1} ⊥⊥ {X_{n−1}, . . . , X_0} | X_n.
(a) Find the 2-step transition matrix. [HINT: Find the (1-step) transition matrix first, then apply
Chapman-Kolmogorov.]
(b) Find P(X4 = R, X3 = R | X0 = R).
The row vector p^(n) = (p_j^(n), j ∈ S), where p_j^(n) = P(X_n = j),
is a probability row vector (i.e. a row vector with non-negative entries summing to 1) specifying the distribution of Xn.
Do not confuse p_ij^(n) and p_j^(n). The former is a conditional probability, and the latter a marginal probability.
Definition: Initial distribution
When n = 0, and the state space is S = {0, 1, 2, . . .}, then p^(0) = (p_0^(0), p_1^(0), p_2^(0), . . .) = (P(X0 = 0), P(X0 = 1), P(X0 = 2), . . .) is the distribution of X0. This is the initial distribution: the distribution of states in which we start the Markov chain. This tells us the probability of the chain starting in any particular state.
If we look at p_j^(n), the probability that the chain is in state j at time n, then
p_j^(n) = ∑_{i∈S} p_i^(0) p_ij^(n),   or in matrix form, p^(n) = p^(0) P^(n) = p^(0) P^n.
Thus, the initial distribution, p(0) , and the transition matrix, P , together contain all the
probabilistic information about the Markov chain. That is, this is all we need in order to
compute or derive any quantity of interest related to the Markov chain.
For example,
P(X_0 = i_0, X_1 = i_1, . . . , X_n = i_n) = p_{i_0}^(0) p_{i_0 i_1} p_{i_1 i_2} · · · p_{i_{n−1} i_n}
is a useful identity to compute a joint probability using only the (one-step) transition probabilities and
initial distribution.
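This identity is straightforward to implement. The sketch below is our own illustration, with an assumed initial distribution and transition matrix (not values taken from the notes).

```python
# Minimal sketch: P(X_0 = i_0, ..., X_n = i_n) = p^(0)_{i_0} * p_{i_0 i_1} * ... * p_{i_{n-1} i_n}.
import numpy as np

p0 = np.array([0.2, 0.8])             # assumed initial distribution over states {0, 1}
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])            # assumed transition matrix

def path_probability(path, p0, P):
    prob = p0[path[0]]                              # probability of the starting state
    for i, j in zip(path[:-1], path[1:]):
        prob *= P[i, j]                             # one-step transition probabilities
    return prob

print(path_probability([0, 0, 1, 1], p0, P))        # 0.2 * 0.9 * 0.1 * 0.7 = 0.0126
```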
Exercise 3.3
In Exercise 3.2, suppose p_R^(0) = 0.7 and p_F^(0) = 0.3.
Therefore, the duration of stay in state i has a geometric distribution with parameter (1 − p_ii), from which we can deduce that the expected amount of time that the process remains in state i is 1/(1 − p_ii).
Where will the Markov chain go when it leaves i?
For j ≠ i, the probability that the chain goes to state j, given that it leaves state i, is p_ij/(1 − p_ii).
Summary:
The duration of stay of the Markov chain in state i has a geometric distribution with expected value 1/(1 − p_ii) steps.
When the chain moves from state i, it goes to state j (≠ i) with probability p_ij/(1 − p_ii).
Even if nobody has studied this particular Markov chain before, we are able to use some general results
on the behaviour of Markov chains to help us. In order to use this theory we need to classify each state of the chain as being one of a number of types, and hence classify the type of Markov chain that we have.
The first step in classifying Markov chains is to split the state space into non-overlapping groups (classes)
of states. We will see later that states in the same class are of the same type. This simplifies the problem of
classifying all the states in the chain because we only need to classify one state in each class.
We write i → j ('i leads to j') if p_ij^(n) > 0 for some n ≥ 0; that is, starting from state i, it is possible that the chain will eventually enter state j. States i and j are said to intercommunicate, written i ↔ j, if both i → j and j → i.
Exercise 3.5
A Markov chain has the following state-space and transition matrix.
S = {1, 2, 3, 4},   P =
  [ 1/2  1/2   0    0  ]
  [  1    0    0    0  ]
  [  0   1/2  1/3  1/6 ]
  [  0    0    0    1  ].
Draw a diagram to summarise the possible moves which can be made in one step. Which states
intercommunicate?
Properties of ↔
(i) i ↔ i for all states i.
(ii) if i ↔ j then j ↔ i.
(iii) if i ↔ j and j ↔ k then i ↔ k.
(i)+(ii)+(iii) mean that ↔ is an equivalence relation on S.
If there is only one irreducible class, then the Markov chain is said to be irreducible.
Example 3.6
For the Markov chain with state space S = {0, 1, 2} and transition matrix
P =
  [ 1/4  1/4  1/2 ]
  [  0    1    0  ]
  [ 1/2   0   1/2 ],
the state space can be partitioned into the irreducible classes {0, 2} and {1}. In particular, the
Markov chain is not irreducible because there are two irreducible classes.
Exercise 3.6
Find the irreducible classes for each of the Markov chains given below.
(a) S = {0, 1, 2},   P =
  [  1    0    0  ]
  [ 1/2   0   1/2 ]
  [  0    1    0  ].
(b) S = {0, 1, 2, 3},   P =
  [  0    0    0    1  ]
  [  0    0    0    1  ]
  [ 1/2  1/2   0    0  ]
  [  0    0    1    0  ].
Exercise 3.7
1. For the Markov chain defined in Exercise 3.6(a), deduce whether fi = 1 or fi < 1 for i = 0, 1, 2,
and hence classify each state as either recurrent or transient.
2. For the Markov chain defined in Exercise 3.6(b), deduce whether fi = 1 or fi < 1 for i = 0, 1, 2, 3,
and hence classify each state as either recurrent or transient.
Studying recurrence and transience enables us to ask further interesting questions of our Markov chain, for
example, how many times will a Markov chain hit state i, given it starts in state i?
(i) If i is transient
Let N be the number of hits on i (including the hit from the fact that X0 = i). Then
P(N = n | X_0 = i) = P(return n − 1 times to i, then never return | X_0 = i) = f_i^{n−1}(1 − f_i),
So, given X0 = i, N has a geometric distribution with parameter 1 − fi . So N is finite with probability
1 and
E(N | X_0 = i) = 1/(1 − f_i)  (< ∞).
(ii) If i is recurrent
With probability 1 the chain will eventually return to state i.
By time homogeneity and the Markov property, the chain ‘starts afresh’ on return to the initial state,
so that state i will eventually be visited again (with probability 1).
Repeating the argument shows that the chain will return to i infinitely often (with probability 1).
We also have
E(N | X0 = i) is infinite.
Another idea connected with recurrence and transience is that of first passage times. This describes the time
it takes for a Markov chain to return to state i, given that it started there.
Let
Tii = min{n ≥ 1 : Xn = i | X0 = i}.
Tii is the first passage time from state i to itself, that is, the number of steps until the chain first
returns to state i given that it starts in state i.
Note that:
fi = P(ever return to i | X0 = i) = P(Tii < ∞)
and
1 − fi = P(never return to i | X0 = i) = P(Tii = ∞).
Connecting this with the idea of transience and recurrence yields:
• if i is recurrent then P(Tii < ∞) = 1, that is, Tii is finite with probability 1.
• if i is transient then P(Tii < ∞) < 1, or equivalently P(Tii = ∞) = 1 − P(Tii < ∞) > 0, that is, Tii is
infinite with positive probability. (Therefore, µi = E[Tii ] is infinite.)
i recurrent i transient
P(Tii = ∞) 1 − fi = 0 1 − fi > 0
3.2.3 Positive recurrence and null recurrence
In fact, there are two types of recurrent state. Recall that the first passage time (also called the recurrence
time) Tii of a state i is the number of steps until the chain first returns to state i given that it starts in state i.
Let µi = E(Tii ) be the mean recurrence time of state i. We have already seen that if i is transient then
µi = ∞. We have not computed µi when i is recurrent. In fact, the value of µi when i is recurrent enables
us to distinguish between two types of recurrence: positive recurrence and null recurrence.
At first glance it may seem strange that a state i can be null recurrent: return to state i is certain, but the
expected time (number of steps) to return to i is infinite. In other words, the random variable Tii is finite
with probability 1, but its mean, E(Tii ), is infinite. Such states do exist and we will see examples later in
the course.
3.2.4 Periodicity
Definition: Periodicity
The period of a state i is the greatest common divisor of the set of integers n ≥ 1 such that p_ii^(n) > 0, that is,
d_i = gcd{ n ≥ 1 : p_ii^(n) > 0 }.
A class property is a property which is shared by all the states in an irreducible class. If it
can be shown that one of the states in a class has a certain class property then all the states in the
irreducible class must also have that same property.
I give sketch proofs for some of the following results relating to class properties. The other proofs can be
found in any good textbook on Markov processes.
Result: Recurrence/transience is a class property
A pair of intercommunicating states are either both recurrent or both transient. That is, recurrence/
transience is a class property. It is not possible to have a mixture of recurrent and transient states
in an irreducible class.
Result
Null recurrence is a class property.
Proof:
Recall that µi = E(Tii ) is the expected first passage time or the mean recurrence time of state i, and that
µ_i is finite if i is positive recurrent, and infinite if i is null recurrent or transient.
i aperiodic ⇒ p_ii^(n) → 1/µ_i as n → ∞,
i periodic with period d ⇒ p_ii^(nd) → d/µ_i as n → ∞ (and p_ii^(m) = 0 if m is not a multiple of d).
So, as n → ∞,
• if i is aperiodic and positive recurrent then p_ii^(n) → 1/µ_i (positive, finite),
• if i is periodic and positive recurrent then p_ii^(nd) → d/µ_i (positive, finite),
• if i is null recurrent then p_ii^(n) → 0,
• if i is transient then (proof omitted) ∑_{n=0}^{∞} p_ii^(n) < ∞, and p_ii^(n) → 0.
Now let i ↔ j, with p_ij^(n) > 0 and p_ji^(m) > 0. Suppose that i is null recurrent. Then, for any k,
p_ii^(m+k+n) ≥ p_ij^(n) p_jj^(k) p_ji^(m).
As k → ∞,
p_ii^(m+k+n) → 0
(because i is null recurrent), and so
p_jj^(k) → 0
(because p_ij^(n) > 0 and p_ji^(m) > 0). So j is either transient or null recurrent. But i is recurrent and i ↔ j, so j must be recurrent as well (because recurrence/transience is a class property). Hence j must be null recurrent.
Result
Positive recurrence is a class property.
Proof:
Suppose that i ↔ j and that i is positive recurrent. Then j must be recurrent (because recurrence is a
class property). If j is null recurrent then i is also null recurrent. This is a contradiction. Hence j must be
positive recurrent.
Result
Intercommunicating states have the same period. That is, periodicity is a class property.
Proof:
Let i and j be states with i ↔ j. Let di be the period of i and dj be the period of j. Let n and m be such that p_ij^(n) > 0 and p_ji^(m) > 0. Then
p_jj^(n+m) ≥ p_ji^(m) p_ij^(n) > 0,
because there is at least one way of returning to j in n + m steps: j to i in m steps and then i to j in a further n steps. Therefore, dj | (n + m). Now take any k ≥ 1 with p_ii^(k) > 0. Then
p_jj^(n+k+m) ≥ p_ji^(m) p_ii^(k) p_ij^(n) > 0,
so
dj | n + k + m,   (dj is a divisor of n + k + m).
The two facts dj | (n + m) and dj | (n + k + m)
together imply that
dj | k,   (dj is a divisor of k).
That is, dj is a factor of any k in {k ≥ 1 : p_ii^(k) > 0}.
However, by definition di is the greatest common divisor of such k. Therefore, dj ≤ di . Reversing the roles
of i and j gives di ≤ dj . Hence di = dj .
Summary
The following are all class properties:
• Recurrence/transience.
• Positive/null recurrence.
• Periodicity.
Exercise 3.8
Find and classify the irreducible classes of these Markov chains.
(a)
S = {1, 2, 3, 4},   P =
  [ 1/2  1/2   0    0  ]
  [  1    0    0    0  ]
  [  0   1/2  1/3  1/6 ]
  [  0    0    0    1  ].
(b)
S = {0, 1, 2},   P =
  [  1    0    0  ]
  [ 1/2   0   1/2 ]
  [  0    1    0  ].
Result
A finite irreducible Markov chain cannot be null recurrent.
Proof: Recall that p_j^(n) = P(X_n = j). Suppose that the chain is null recurrent. It can be shown that
p_j^(n) → 0 as n → ∞ for all j ∈ S.
We know that
∑_{j∈S} p_j^(n) = 1   for all n,
so
lim_{n→∞} ∑_{j∈S} p_j^(n) = 1.
But S is finite, so the limit can be taken inside the (finite) sum, giving ∑_{j∈S} lim_{n→∞} p_j^(n) = 0, a contradiction. Hence the chain cannot be null recurrent.
Result
A finite Markov chain cannot contain any null recurrent states.
Proof:
Suppose that the chain contains a state i which is null recurrent. Then, since null recurrence is a class
property, state i is in some irreducible finite closed class of null recurrent states. This is not possible (see
above).
Result: Finite Markov chains
It is not possible for all states in a finite state space Markov chain to be transient.
Note that this implies that all states in a finite irreducible Markov chain are recurrent (because
recurrence is a class property). That is, finite and irreducible ⇒ recurrent Markov chain.
Proof:
Suppose all states are transient and S = {0, 1, . . . , M }. A transient state will be visited only a finite number
of times. Therefore for each state i there exists a time Ti after which i will never be visited again. Therefore,
after time T = max{T0 , . . . , TM } no state will be visited again. BUT the Markov chain must be in some
state. Therefore we have a contradiction. Therefore, at least one state must be recurrent.
Exercise 3.9
Find all closed classes and absorbing states for the following Markov chains.
(a)
S = {1, 2, 3, 4},   P =
  [ 1/2  1/2   0    0  ]
  [  1    0    0    0  ]
  [  0   1/2  1/3  1/6 ]
  [  0    0    0    1  ].
(b)
S = {0, 1, 2},   P =
  [  1    0    0  ]
  [ 1/2   0   1/2 ]
  [  0    1    0  ].
Result
An irreducible class C of recurrent states is closed.
Proof:
Suppose that C is not closed. Then there exist i ∈ C and j ∉ C such that i → j. But j ↛ i, otherwise i and j would intercommunicate and j would be in C. Thus, there is positive probability that the chain leaves i and never returns to i. This means that i is transient. This is a contradiction, as i ∈ C and C is recurrent.
The next three exercises ask you to find quantities relating to closed classes and absorption. Try to use first
step decomposition to solve these questions (see section 1.4.4).
Exercise 3.10
Suppose that X is a discrete time Markov chain with state-space S = {0, 1, 2, 3, 4} and transition
matrix P given by
P =
  [  1    0    0    0    0  ]
  [ 1/2  1/4  1/4   0    0  ]
  [  0   1/2   0   1/2   0  ]
  [  0    0    0   1/2  1/2 ]
  [  0    0    0   1/2  1/2 ].
(a) Suppose that X0 = 1. Find the probability that the chain is eventually absorbed in C3 = {3, 4}.
(b) If p(0) = (0, 1/3, 2/3, 0, 0), calculate the probability that the chain is absorbed in C3 .
(c) If the chain is absorbed in C3 = {3, 4}, what is the probability that X0 = 2 ?
Exercise 3.11
Now suppose that X is a discrete time Markov chain with state-space S = {0, 1, 2, 3} and transition
matrix P given by
P =
  [ 1/3  2/3   0    0  ]
  [  1    0    0    0  ]
  [ 1/4  1/4  1/4  1/4 ]
  [  0    0    1    0  ].
(a) Suppose that X0 = 2. Find the expected time until absorption into {0, 1}.
(b) Find the expected time to absorption in one or other of the two closed classes, starting from
state 1.
Exercise 3.12
Now suppose that X is a discrete time Markov chain with state-space S = {0, 1, 2, 3} and transition
matrix P given by
P =
  [  0   1/2  1/2   0  ]
  [ 1/3   0    0   2/3 ]
  [ 1/2  1/2   0    0  ]
  [  0    0    1    0  ].
S = T ∪ Cm1 ∪ Cm2 ∪ · · · ,
where
• T is a set of transient states;
Each Cmi is either null recurrent or positive recurrent, and all states in a particular Cmi have the same
period (different Cmi ’s can have different periods). The following observations apply:
(c) If S is finite then it is impossible for the chain to remain forever in the (finite) set T of transient states.
(d) If S is finite then at least one state must be visited infinitely often and so there must be at least one
recurrent state.
We also know that if S is finite then there are no null recurrent states. The consequences of this are:
(a) If a Markov chain has a finite state space then there must be at least one positive recurrent state.
(b) If a Markov chain has a finite state space and is irreducible, then it must be positive recurrent.
(c) A finite closed irreducible class must be positive recurrent.
Exercise 3.13
Find and classify the irreducible classes of the following Markov chains.
(a)
S = {0, 1, 2, 3, 4},   P =
  [ 1/2  1/2   0    0    0  ]
  [ 1/2  1/2   0    0    0  ]
  [  0    0    0    1    0  ]
  [  0    0    1    0    0  ]
  [ 1/4   0   1/4   0   1/2 ].
(b)
S = {0, 1, 2, . . .},   P =
  [ 1/2  1/2   0    0    0   ··· ]
  [ 1/3   0   2/3   0    0   ··· ]
  [ 1/4   0    0   3/4   0   ··· ]
  [  ⋮    ⋮    ⋮    ⋮    ⋮    ⋱  ],
that is,
p_i0 = 1/(i + 2) and p_{i,i+1} = (i + 1)/(i + 2), for i = 0, 1, 2, . . . .
(c)
S = {1, 2, 3, 4},   P =
  [  0   1/2  1/2   0  ]
  [ 1/2   0    0   1/2 ]
  [  1    0    0    0  ]
  [  0    1    0    0  ].
(d) S = {0, 1, 2, . . .}, with
p_ii = q and p_{i,i+1} = p for i = 0, 1, 2, . . . (where q = 1 − p).
3.3 Limiting behaviour
This is the crux of the first half of the course: understanding a Markov chain’s behaviour in the long run
(limiting behaviour).
Our goal here is to find whether the distribution of states, p(n) , settles down (i.e. converges) as n → ∞.
If it does, what is this limit?
We now have enough tools at our disposal. It will turn out that to fully determine its long term behaviour
we must classify the Markov chain in terms of:
(a) Irreducible classes;
(b) Transience and positive/null recurrence;
(c) Periodicity;
A natural question is: are there distributions of states which, once the chain has them, it keeps at every later time? The answer to this question is 'yes', and these distributions are known as 'invariant' distributions.
Definition: A probability row vector π = (π_j, j ∈ S) is an invariant distribution for a Markov chain with transition matrix P if
π = πP.
If π is an invariant distribution, and the initial distribution of the Markov chain is π, why does this imply
that we can study its long term behaviour?
The idea is that if the chain starts in π (or equivalently, we find at some later time that the distribution
of states is π), then this tells us that the distribution of states is π forever: we’ve uncovered the long term
behaviour of the chain in this specific case.
To see this, suppose that an invariant distribution π exists and that p^(0) = π. Then
p^(1) = p^(0) P = πP = π;
p^(2) = p^(1) P = πP = π;
p^(3) = p^(2) P = πP = π; etc.
In fact p(n) = π, for n = 1, 2, . . . and we have found the limit of p(n) as n → ∞, as required!
If our initial distribution is π, with π = πP , then the distribution of the states at any given future time
n is also π. This is why it is called an invariant distribution.
An invariant distribution always exists if the state space S is finite, but it need not be unique. If
S is infinite then an invariant distribution need not exist.
Assuming that the transition matrix P is a k × k matrix, and the state space is S = {1, 2, ..., k}, solving π = πP is a matter of solving a set of k simultaneous equations, each of the form
π_j = ∑_{i=1}^{k} π_i p_ij,  for j = 1, . . . , k.
If you try this you will find that one equation is redundant (try it!): we are therefore trying to solve for k unknowns (π1, ..., πk) using (k − 1) equations. This is not enough to determine the solution uniquely, but we have the advantage that since π is itself a probability distribution, we must have π1 + ... + πk = 1. This added requirement means that we can now solve for π1, ..., πk.
Exercise 3.14
Find an invariant distribution for the Markov chain in Exercise 3.2.
Example 3.15
Recall the weather example (Exercise 3.2), with state space S = { R, F } and transition matrix P ,
P =
  [ 0.6  0.4 ]
  [ 0.5  0.5 ].
P² =
  [ 0.56  0.44 ]
  [ 0.55  0.45 ],
P³ =
  [ 0.556  0.444 ]
  [ 0.555  0.445 ],
P⁶ = P³ · P³ =
  [ 0.555556  0.444444 ]
  [ 0.555555  0.444445 ].
Looking at different initial distributions for this Markov chain and what happens in the ‘long run’
from these starting distributions:
Notice that as n → ∞, P^(n) is tending to a limiting matrix whose rows are identical to each other, and p^(n) seems to be tending to a limiting probability distribution which is the same regardless of the initial distribution. This is a very important phenomenon, which we describe in the next definition.
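Both behaviours are easy to reproduce numerically for the weather chain of Example 3.15. The sketch below is our own illustration: it raises P to a high power, and also solves π = πP together with the normalisation constraint by replacing the redundant equation.

```python
# Minimal sketch: P^n converging, and the invariant distribution of the weather chain.
import numpy as np

P = np.array([[0.6, 0.4],
              [0.5, 0.5]])

print(np.linalg.matrix_power(P, 50))      # rows approach (5/9, 4/9)

# Solve pi = pi P with sum(pi) = 1: drop one redundant equation from (P^T - I) pi^T = 0
# and replace it by the normalisation constraint.
k = P.shape[0]
A = np.vstack([(P.T - np.eye(k))[:-1], np.ones(k)])
b = np.zeros(k)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)                                 # approximately [0.5556, 0.4444]
```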
A probability row vector π = {πj , j ∈ S} is an equilibrium distribution for the discrete-time Markov
chain {Xn } if
p(n) −→ π as n −→ ∞,
independently of the initial distribution p(0) (i.e. for any initial distribution, the long-run distribution
of Xn as n → ∞ is the same).
Equivalently, p_ij^(n) → π_j as n → ∞, for all i, j ∈ S.
Notice what this says: π is an equilibrium distribution if p(n) → π as n → ∞ regardless of the start
distribution, p(0) .
It is important to note that there cannot be two or more of these limits: either such a limit does not exist,
or exactly one exists. Think about this carefully!
Though the concepts of an invariant distribution and an equilibrium distribution are therefore closely related,
they are NOT the same thing: an equilibrium distribution must also be an invariant distribution, BUT an
invariant distribution is not necessarily an equilibrium distribution.
Equilibrium distribution ⟹ invariant distribution; invariant distribution ⇏ equilibrium distribution.
Invariant distribution π:
- If the chain starts in (or equivalently, gets to) π, the distribution of states at all further times is also
π.
Equilibrium distribution π:
- If the chain is run for long enough (i.e. forever) its probabilistic behaviour settles down to that of π.
- This is regardless of the state in which the chain starts.
Notice:
• It is possible for a Markov chain to have more than one invariant distribution but not have an equilib-
rium distribution.
• It is possible for a Markov chain to have neither an invariant distribution nor an equilibrium distribu-
tion.
Result: Main limit theorem
Let {Xn } be an irreducible ergodic Markov chain. Without loss of generality we assume that S =
{0, 1, 2, . . .}. Then
(a) the chain has a unique invariant distribution π, that is, a unique probability row vector satisfying
π = πP,
(b) p_ij^(n) → π_j as n → ∞, for all i, j ∈ S,
(c) π satisfies
π_j = 1/µ_j,  for all j ∈ S,
where µ_j is the mean recurrence time of state j.
as n → ∞.
3. Part (b) of the result implies that π is the equilibrium distribution of the chain (since p(n) = p(0) P (n)
tends to π as n → ∞).
4. The following interpretations of π are important.
(i) Observation 3 above implies that πj is the limiting probability that the chain is in state j at time
n.
(ii) It can be shown that πj is also equal to the long-run proportion of time that the chain spends in
state j.
5. Note that part (a) of the result can be strengthened to:
An irreducible aperiodic Markov chain has an invariant distribution if and only if the chain is
positive recurrent.
This implies that π is unique and satisfies πj = 1/µj , j ∈ S.
This provides an alternative way to show that a Markov chain is positive recurrent: if the chain is irreducible, aperiodic and has an invariant distribution (that is, there exists a π such that π = πP) then it must be positive recurrent.
Exercise 3.16
Determine whether the Markov chains with the following features have an equilibrium distribution
and/ or invariant distribution. [HINT: are the following irreducible, ergodic Markov chains? How
many invariant distributions does each one have?]
(a) S = {R, F},   P =
  [ 0.6  0.4 ]
  [ 0.5  0.5 ].
(b) S = {0, 1},   P =
  [ 1  0 ]
  [ 0  1 ].
Periodicity
The MLT assumed that the chain was ergodic, and therefore aperiodic. What happens if we relax this
assumption, so that we have an irreducible positive recurrent Markov chain {Xn , n = 0, 1, 2, . . .} with period
d > 1?
Let state j have mean recurrence time µj . We construct a new process {Yn , n = 0, 1, 2, . . .} with Yn = Xnd .
Then {Yn , n = 0, 1, 2, . . .} is an ergodic Markov chain and in this new chain, state j has mean recurrence
time µj /d. From the Main Limit Theorem it follows that
P(Y_n = j | Y_0 = j) → d/µ_j as n → ∞,
that is,
P(X_{nd} = j | X_0 = j) = p_jj^(nd) → d/µ_j as n → ∞.
However,
p_jj^(n) = 0 for n ∉ {0, d, 2d, 3d, . . .}.
Hence {Xn } has no equilibrium distribution.
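You can see this failure of convergence concretely. The sketch below is our own illustration, using an assumed two-state chain that flips state at every step (period 2): the n-step probabilities oscillate rather than converge.

```python
# Minimal sketch: a period-2 chain whose n-step probabilities oscillate,
# so p^(n) does not converge and no equilibrium distribution exists.
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])     # assumed example: flips state every step, period 2

for n in (1, 2, 3, 4):
    print(n, np.linalg.matrix_power(P, n)[0])   # row 0 alternates between (0, 1) and (1, 0)
```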
Positive recurrence
The MLT assumed that the chain was ergodic, and therefore positive recurrent. What happens if we relax
this assumption so that we have an irreducible aperiodic Markov chain {Xn , n = 0, 1, 2, . . .} with transient
or null-recurrent states only?
Suppose that a Markov chain consists only of transient and/ or null-recurrent states. Then all states have
µj = ∞. It follows from the proof that null recurrence is a class property that for all i, j
p_ij^(n) → 1/µ_j = 0 as n → ∞,
that is, all limits exist but are zero. This limit is not a proper probability distribution so no equilibrium
distribution exists.
Irreducibility
The MLT assumed that the chain was irreducible. What happens if we relax this assumption so that the
Markov chain has more than one class? We can split this into two possibilities:
(a) Suppose that a Markov chain contains 2 or more closed classes, Cm1 , Cm2 , . . ..
If, for example, class C_{m1} is ergodic then for all j ∈ C_{m1},
p_ij^(n) → 1/µ_j as n → ∞, if i ∈ C_{m1},
whereas p_ij^(n) = 0 for all n if i lies in a different closed class.
Hence there is no equilibrium distribution (the limit depends on the initial state).
(b) Suppose that a Markov chain consists of a finite number of transient states and a single closed class
C. Then eventually the chain will be absorbed into C.
πj = 1/µj for all j.
This does not give a proper probability distribution, and so if C is not ergodic then no equilibrium
distribution exists.
(b) No closed classes (i.e. all states are transient) ⇒ no equilibrium distribution exists.
(c) If there exists exactly one closed class C, is it certain that the Markov chain will eventually be absorbed
into C?
– If “no” then no equilibrium distribution exists (the long-term behaviour of the chain depends on
p(0) ).
– If “yes” then an equilibrium distribution exists if and only if C is ergodic.
An equilibrium distribution exists if and only if there is an ergodic class C and the
Markov chain is certain to be absorbed into C eventually, wherever it starts.
Exercise 3.17
Three cards, labelled A,B and C, are placed in a row and their order changed as follows. A random
choice is made of either the left hand card or the right hand card, either card having probability 1/2
of being chosen, and this card is then placed between the other two. This process is repeated, the
successive random choices being independent.
(a) Find the transition matrix for the six-state Markov chain of successive orders X1 , X2 , . . . of the
cards.
(b) Is the chain irreducible, aperiodic?
(c) Find the two-step transition matrix.
(d) If the initial order is ABC, find approximately the probability that after 2n changes, where n
is large, the order is ABC.
Exercise 3.18
For each of these Markov chains state, with a reason, whether an equilibrium distribution exists. If
it does exist then find it.
(a) S = {0, 1, 2, 3, 4},
    P = [ 1/2  1/2   0    0    0
          1/2  1/2   0    0    0
           0    0    0    1    0
           0    0    1    0    0
          1/4   0   1/4   0   1/2 ].

(b) S = {0, 1, 2, . . .},
    P = [ 1/2  1/2   0    0    0   · · ·
          1/3   0   2/3   0    0   · · ·
          1/4   0    0   3/4   0   · · ·
           ⋮     ⋮    ⋮    ⋮    ⋮    ⋱   ].

(c) S = {1, 2, 3, 4},
    P = [  0   1/2  1/2   0
          1/2   0    0   1/2
           1    0    0    0
           0    1    0    0  ].

(d) S = {0, 1, 2, 3},
    P = [ 1/3  2/3   0    0
           1    0    0    0
          1/4  1/4  1/4  1/4
           0    0    1    0  ].

(e) S = {0, 1, 2, . . .},
    P = [ q  p  0  0  0  · · ·
          0  q  p  0  0  · · ·
          0  0  q  p  0  · · ·
          ⋮  ⋮  ⋮  ⋮  ⋮   ⋱   ].
4 Continuous-time Markov processes
Notation
Continuous-time processes can change their value/state (‘jump’) at ANY instant of time.
Remember that in this course we only consider stochastic processes {X(t), t ≥ 0} with finite or count-
able state space S.
{X(t), t ≥ 0} is a continuous-time Markov process if, for all t ≥ 0, 0 ≤ t0 < t1 < · · · < tn < s and for
all states i, j, i0 , i1 , . . . , in ∈ S
As in the discrete time case, we only consider time homogeneous processes, which implies that:
The difficulty with continuous time processes is that there is no ‘smallest’ unit of time. In the discrete time
case, we talked about 1-step transition probabilities and extended these to more general n−step transition
probabilities. Here, we have no such ‘minimum time unit’ and so our notation for the transition probabilities
must clearly show the amount of time taken to reach one state from another. We will use the notation:
for transition probabilities, and store these in a transition matrix P(t) (note again here that we are being
careful with our unit of time). If S = {0, 1, 2, . . .} this will be of the form
P (t) = [ p00 (t)  p01 (t)  p02 (t)  · · ·
          p10 (t)  p11 (t)  p12 (t)  · · ·
          p20 (t)  p21 (t)  p22 (t)  · · ·
             ⋮        ⋮        ⋮       ⋱  ]

The same ‘rules’ apply here as for the discrete time case: the elements of P (t) must satisfy, for all t ≥ 0,

pij (t) ≥ 0 and Σ_j pij (t) = 1.
In particular, when t = 0,
pij (0) = P (X(0) = j | X(0) = i) = 1 if j = i, and 0 if j ≠ i,
so P(0) = I (the identity matrix). This makes sense - the process stays in the same state if we allow no time
to pass!
A function f is said to be o(h) if
lim_{h→0} f (h)/h = 0,
that is, as h → 0, f (h) → 0 faster than h does.
Example 4.1
• f (x) = x². Then f (h)/h = h²/h = h → 0 as h → 0, so x² is o(h).
• f (x) = √x. Then f (h)/h = √h/h = 1/√h → ∞ as h → 0, so √x is not o(h).
Example 4.2 Suppose that X ∼ exponential(λ). Then for h > 0,
Exercise 4.3
Show that if a random variable X is memoryless, then
It turns out that the exponential distribution is memoryless (in fact, it is the only memoryless distribution).
Example 4.2 demonstrates this clearly: we showed that if X ∼ exponential(λ), then, according to (4) above,
or equivalently,
P(X − t ∈ (0, h] | X > t) = 1 − exp(−λh), for h > 0. (5)
The right hand side of (5) is the distribution function of an exponential(λ) distribution. This implies that
That is, if X ∼ exponential(λ) then {(X − t) | X > t} ∼ exponential(λ) too. Check back to the definition
of a memoryless distribution to see that the exponential distribution is therefore memoryless.
The identity in exercise 4.3 shows the impact of the memoryless property. Suppose that we are waiting for
an event to occur (e.g. a train arriving at Goodge Street station), the time we must wait being exponentially
distributed with parameter λ. If we wait s minutes without a train arriving, the remaining time until the
event occurs is still exponential(λ). That is, the fact that we have already waited s minutes without an event
is ‘forgotten’ and we essentially start our wait again.
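A short simulation sketch (an aside, with arbitrary choices of λ and s) illustrating this point: conditional on having waited s minutes already, the remaining wait has the same exponential(λ) distribution as the original one.

import numpy as np

rng = np.random.default_rng(0)
lam, s = 0.5, 3.0                                 # hypothetical rate and elapsed waiting time
x = rng.exponential(scale=1/lam, size=1_000_000)  # waiting times X ~ exponential(lam)

remaining = x[x > s] - s                          # condition on X > s and look at X - s
print("mean of X            :", round(x.mean(), 3))          # approx 1/lam = 2
print("mean of X - s | X > s:", round(remaining.mean(), 3))  # also approx 2: the wait 'restarts'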
Example 4.4
Question: Service times at a bank are exponentially and independently distributed with parameter
µ. One queue feeds two clerks. When you arrive both clerks are busy, but the queue is empty. You
will be next to be served; what is the probability that, of the three customers present, you will be
last to leave?
Answer: When the first customer leaves you are served. Your service time T1 has an exponential(µ)
distribution and by the lack-of-memory property of the exponential distribution so does the further
service time T2 of the other customer. T1 and T2 are independent. Therefore
• The probability that Yk is the minimum of these random variables can be shown to be
λk / (λ1 + ... + λn ).    (7)
Challenge
Prove these assertions about the minimum of exponential random variables!
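The following is not a proof, only a quick empirical check of both assertions (the minimum of independent exponentials is exponential with parameter λ1 + · · · + λn, and Yk achieves the minimum with probability λk /(λ1 + · · · + λn)), using made-up rates.

import numpy as np

rng = np.random.default_rng(1)
rates = np.array([1.0, 2.0, 3.0])                  # hypothetical lambda_1, lambda_2, lambda_3
N = 1_000_000
samples = rng.exponential(1/rates, size=(N, 3))    # column k is exponential(rates[k])

mins = samples.min(axis=1)
which = samples.argmin(axis=1)
print("mean of minimum :", mins.mean(), "(theory:", 1/rates.sum(), ")")
print("P(Y_k smallest) :", np.bincount(which, minlength=3) / N)
print("theory          :", rates / rates.sum())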
4.3 Breaking down the definition of a continuous time Markov process
A continuous-time Markov chain can be described by the distribution of holding times for each state (i.e.
the distribution of the length of time the chain stays in a particular state), together with the jump chain
(i.e. when the continuous-time Markov chain does leave a state, to which other state does it go?).
Suppose that the process starts in state i, and let T (i) be the holding time in state i. We will show that P (T (i) > s + t | T (i) > s) = P (T (i) > t) for s, t ≥ 0, and from this we can deduce that the distribution of T (i) is memoryless (and hence exponential).
Notice that the event {T (i) > u} is identical to the event {X(t) = i for all 0 ≤ t ≤ u}, for any u ≥ 0.
Now:
P (T (i) > s + t|T (i) > s) =P (X(u) = i for all 0 ≤ u ≤ t + s | X(u) = i for all 0 ≤ u ≤ s)
↓ Using information from the conditioning
=P (X(u) = i for all s ≤ u ≤ t + s | X(u) = i for all 0 ≤ u ≤ s)
↓ Markov property
=P (X(u) = i for all s ≤ u ≤ t + s | X(s) = i)
↓ Time homogeneity
=P (X(u) = i for all 0 ≤ u ≤ t | X(0) = i)
↓ Equivalent statements
=P (T (i) > t)
So T (i) is memoryless, and therefore exponentially distributed. If we let qi denote the parameter of this exponential distribution, then qi can also be thought of as the rate at which the chain leaves state i.
In the previous section we defined the rate at which the chain leaves state i to be qi . Notice that this makes
no assumptions about the state which the chain goes to, simply that the chain moves from state i. To which
state does the process move when it leaves state i?
A simple way of thinking about this question is with the ‘alarm clock’ analogy. Suppose that you are cur-
rently in state i, and from here, you could potentially go to one of states j, k, l, .... Each of these states carries
with it an alarm clock, which will ring at an exponentially distributed time with parameter qij , qik , qil ... re-
spectively, independently of each other. For example, the time at which the alarm clock in state j will ring,
Tij , is exponential with parameter qij .
Whichever alarm clock rings first, the process will next move to that state and will do so immediately. Once
the chain moves to this state, we reset all alarm clocks again, one for each of the states, noting that there is
no alarm clock for the state that the chain is currently in (why?).
Also note that if I am currently in state i, the time at which the alarm clock in state j rings (i ≠ j) is Tij ∼ exponential(qij ). However, if I am in another state, say k (with k ≠ j and k ≠ i), then the time at which the alarm clock in state j rings is Tkj ∼ exponential(qkj ). Note the change in parameter: the
distribution for the time at which the alarm clock in state j rings changes depending on where the process
is currently.
Why is the distribution of the time at which an alarm clock rings exponential?
Suppose the time at which alarm clocks ring is not exponentially distributed. It can be shown that this
implies that the holding time in a state is not memoryless, and therefore not exponential. This is a
contradiction.
Therefore, the time at which the alarm clocks ring must be exponentially distributed.
Example 4.5
A continuous time Markov chain with three states, S = {1, 2, 3}, is currently in state 1 (the green
dot denotes the location of the process in the diagram below). An exponential alarm clock is placed
in each of states 2 and 3, and whichever of these rings first, the process will immediately go to that
state. Notice that the time at which either can ring, T12 and T13 are independent of one another.
It is clear that the holding time in state 1, T (1), can be expressed as:
Now suppose that the process is in state 3. From state 3, the process can only go to state 2 next and
so only one alarm clock is set (i.e. there is no alarm clock in state 1), and the time that this alarm
clock rings is the time that the process moves to state 2.
where such direct transitions are possible (see example 4.5 above for a case where not all transitions are
possible), and
T (i) ∼ exponential(qi )
Tij ∼ exponential(qij ),
it makes sense that the parameter qi and the parameters {qij ; j ≠ i} are also related. But how?
Look back at the result in (6). This tells us exactly how the parameter qi and the parameters
{qi1 , . . . , qi,i−1 , qi,i+1 , . . . , qik } are related:
qi = Σ_{j≠i} qij .    (8)
This also makes intuitive sense: the rate at which we leave state i is the sum of the rates at which we move from i to each of the other states.
Before moving on, note one more property of these exponential times. Suppose that the process is in state
i currently and can from there move to any of the states {1, ..., i − 1, i + 1, ..., k}. What if we conditioned
on the process going to state k next? Does this affect the holding time in state i? Is the holding time
in state i, under this assumption, exponential with parameter qi as per usual, or does the conditioning im-
ply that it is now exponential with parameter qik (as this is the time at which the alarm clock in state k rings)?
The answer is that the holding time in state i is still exponential with parameter qi , even if we condition on
going to a particular state k next. Let the events A and B be defined as follows:
• A is the event min{Ti1 , ..., Ti,i−1 , Ti,i+1 , ..., Tik } > t;
• B is the event min{Ti1 , ..., Ti,i−1 , Ti,i+1 , ..., Tik } = Tik ;
• Note that we want to show that P (A|B) = P (A). Using Bayes’ theorem, together with the fact that P (B|A) = P (B) (by the lack-of-memory property, conditioning on all the clocks exceeding t does not change which one rings first),
P (A|B) = P (B|A) P (A)/P (B) = P (B) P (A)/P (B) = P (A),
as required.
Using these results, can we find the probability that the alarm clock for state j will be the first to ring,
given that the process is currently in state i? This is equivalent to asking ‘what’s the probability that
when we move from state i, we go to state j?’. Since the time until the alarm clock for state j rings is
exponential with parameter qij , then
Therefore, the probability that from state i, the process goes to state j is given by pij = qij /qi .
Note: Suppose that the continuous-time Markov chain has an absorbing state, state i. If the chain enters
state i then it will stay there forever. In this case we would set pii = 1, rather than using the equation given
above.
Notice that the jump chain is itself a discrete-time Markov chain, and therefore we can construct its transition
matrix.
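This description translates directly into a way of simulating a continuous-time Markov chain: wait an exponential(qi) holding time in the current state i, then jump to j with probability qij /qi. A minimal sketch, using a made-up 3-state rate matrix purely for illustration:

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical transition rates q_ij (zero on the diagonal); not from the notes.
q = np.array([[0.0, 1.0, 2.0],
              [0.5, 0.0, 0.5],
              [1.0, 1.0, 0.0]])

def simulate_path(state, t_end):
    """One realisation of the chain up to time t_end, as a list of (jump time, new state)."""
    t, path = 0.0, [(0.0, state)]
    while True:
        qi = q[state].sum()                          # rate of leaving the current state
        t += rng.exponential(1/qi)                   # holding time ~ exponential(q_i)
        if t > t_end:
            return path
        state = rng.choice(len(q), p=q[state]/qi)    # jump chain: P(go to j) = q_ij/q_i
        path.append((t, state))

print(simulate_path(0, 5.0))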
Example 4.6
Suppose that a continuous time Markov chain has state-space S = {1, 2, 3, 4}, and that qij denotes
the rate at which the chain moves from state i to state j (for j ≠ i). Let
qi = Σ_{j≠i} qij for i = 1, 2, 3, 4.
Notice that the diagonal of the transition matrix for a jump chain is ALWAYS ZERO, since we are
assuming that the chain moves to another state. Check that all rows sum to 1!
The jump chain can be used to answer questions which involve the states that the chain enters but not the
times at which they are entered:
• What is the probability that {X(t), t ≥ 0} ever enters a particular state?
• If a chain’s states are thought of as the number of individuals in a population (e.g. if the process is in
state 5, then there are 5 individuals in the population), what is the probability that a population ever
becomes extinct? That is, what is the probability that the chain ever enters state 0?
4.4 Analysis of transition probabilities
So far, we have only considered the rate at which a continuous time Markov chain moves from one state
to another. For discrete time processes, our interest was initially in transition probabilities. We will now
explore transition probabilities for continuous time Markov chains.
The obvious ‘problem’ with continuous time Markov chains is that, unlike in the discrete time setting, there
is no “smallest time” until the next transition. Instead, we can jump to another state at any time. The
transition probability pij (t), for states i and j (j ≠ i), is a function of t which in principle can be studied.
This is far more complex than in the discrete time case, and our aim is to dispose of the need to work with
transition probabilities to study the long term behaviour of the chain.
It turns out that this is possible, and we will later encounter the generator matrix of a continuous time
Markov chain, which will be much easier to work with than transition probabilities pij (t). To understand
where the generator matrix comes from, we will first investigate the nature of transition probabilities for
continuous time Markov processes.
We will make use of T (i), the holding time in state i, throughout this section. First we will consider transition
probabilities pij (h) over a very small period of time of length h. This will help us in studying transition
probabilities over longer periods of time, and we will be able to develop further tools, namely:
1. The continuous time version of the Chapman-Kolmogorov equations;
2. Kolmogorov’s forward equations;
3. Kolmogorov’s backward equations
Don’t forget that this is all in the hope of understanding the long term behaviour of continuous time Markov
chains!
while the probability of one transition in [0, h] and that the transition is to state j ≠ i is

pij (h) = P (T (i) < h | move from i to j in the interval [0, h]) P (move from i to j in the interval [0, h])
        = P (T (i) < h) P (move from i to j in the interval [0, h])
        = (1 − exp(−qi h)) qij /qi
        = (qi h + o(h)) qij /qi = qij h + o(h)
The probability of two transitions in [0, h] is

Σ_{k≠i} P (T (i) + T (k) < h | move from i to k first) P (move from i to k first)
   ≤ Σ_{k≠i} (qik /qi ) P (T (i) < h, T (k) < h | move from i to k first)
   ≤ Σ_{k≠i} (qik /qi ) P (T (i) < h | move from i to k first) P (T (k) < h | move from i to k first)
   = Σ_{k≠i} (qik /qi ) (qi h + o(h)) (qk h + o(h)) = o(h).
On the other hand, the probability of two transitions in [0, h] and that the chain at time h is in state j is
P (two transitions in [0, h] and that the chain at time h is in state j) ≤ P (two transitions in [0, h]) = o(h)
Similarly, we can show that P (more than two transitions in [0, h]) = o(h).
In summary:
• In a very small time interval of length h, the probability of making more than one transition is
o(h). This is negligible, and so we consider that AT MOST one transition will occur in such a
small time interval.
• The probability of moving from state i to state j in the small time interval [0, h] is
and that no transitions occur during this interval (otherwise, to get back to state i, we would
require at least two transitions which has negligible probability).
Result: Chapman-Kolmogorov equations in continuous time
P (s + t) = P (s) P (t).
[Timeline diagram: the chain is in state i at time 0, some intermediate state k at time s, and state j at time s + t.]
Matrix format:
P (s + t) = P (s)P (t)
We also have
p(t) = p(0) P (t),
where p(0) is the distribution of X(0) (the initial distribution) and p(t) = (p0^(t), p1^(t), . . .) is the distribution of the state of the process at time t.
Result: Kolmogorov forward differential equations (KFDEs)
P′(t) = P (t) Q,
where, for a continuous time Markov chain with state space S = {1, 2, 3, ...},

Q = [ −q1   q12   q13   · · ·
       q21  −q2   q23   · · ·
       q31   q32  −q3   · · ·
        ⋮     ⋮     ⋮     ⋱   ],

where qij is the transition rate from state i to state j and qi = Σ_{j≠i} qij (the rate at which the process leaves state i, as before).
The KFDEs are rather easy to prove, the main tool being the Chapman-Kolmogorov equations for continuous
time processes. We take an initial time point (0) followed later by two time points (t and t + h) with h small.
Notice that the times t and t + h are therefore close together.
[Timeline diagram: the chain is in state i at time 0, some state k at time t, and state j at time t + h.]
Now subtract pij (t) from each side, and divide each side by h:
Letting h ↓ 0 gives
p′ij (t) = −pij (t) qj + Σ_{k∈S, k≠j} pik (t) qkj , for all i, j ∈ S.
In matrix notation
P′(t) = P (t) Q.
Exercise 4.7
Convince yourself that the matrix version of the KFDEs contains all the individual differential equations
p′ij (t) = −pij (t) qj + Σ_{k∈S, k≠j} pik (t) qkj ,
for all i, j in S.
If we wish to find pij (t) = P (X(t) = j | X(0) = i), for j = 0, 1, 2 . . . then we have only one set of
equations to solve.
If we are able to do this we will get a general solution. Then we use the initial distribution p(0) (often of
the form X(0) = i with probability 1) to find the particular solution which satisfies this initial condition.
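As an aside, the forward equations can also be solved numerically rather than analytically. A sketch, assuming a hypothetical two-state generator Q and the initial condition X(0) = 0, so that p(0) = (1, 0); since p(t) = p(0) P (t), the row vector p(t) satisfies p′(t) = p(t) Q.

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical generator for a two-state chain (each row sums to zero).
Q = np.array([[-1.0,  1.0],
              [ 3.0, -3.0]])

# Integrate p'(t) = p(t) Q with p(0) = (1, 0).
sol = solve_ivp(lambda t, p: p @ Q, t_span=(0.0, 5.0), y0=[1.0, 0.0], dense_output=True)

for t in (0.5, 1.0, 5.0):
    print(t, sol.sol(t))      # p(t) settles towards (3/4, 1/4) for this made-up Q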
4.4.4 Kolmogorov’s backward equations
Exercise 4.8
Compare Kolmogorov’s forward and backward equations.
The KBDEs can be shown to hold by conditioning on the state of the process at time h (compare this with
the forward equations). We take two initial time points (0 and h) close together followed later by a single
time point (t + h).
[Timeline diagram: the chain is in state i at time 0, some state k at time h, and state j at time t + h.]
Therefore, substituting this in and separating the case k = i from k ≠ i,
pij (t + h) = (1 − qi h + o(h)) pij (t) + Σ_{k∈S, k≠i} (qik h + o(h)) pkj (t)
           = pij (t) − h qi pij (t) + h Σ_{k∈S, k≠i} qik pkj (t) + o(h).
Now subtract pij (t) from each side, and divide each side by h:
Letting h ↓ 0 gives
p′ij (t) = −qi pij (t) + Σ_{k∈S, k≠i} qik pkj (t), for all i, j ∈ S,
which shows that the KBDEs hold. The KBDEs in matrix notation are written as
P′(t) = Q P (t).
An initial condition is required to obtain a specific as opposed to general solution. It makes sense to use
P (0) = I as the initial condition (why?). In doing so, both the KFDEs and KBDEs give the same solution
for P (t) (this is a good thing, otherwise something very strange is happening!).
Subject to the initial condition P (0) = I, both KFDEs and KBDEs have the same solution for P (t),
namely
P (t) = Σ_{n=0}^∞ (t^n Q^n)/n! = exp(tQ).
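In computations, P (t) = exp(tQ) can simply be evaluated with a matrix-exponential routine. A sketch with a made-up three-state generator (not one from the notes):

import numpy as np
from scipy.linalg import expm

# Hypothetical generator (rows sum to zero).
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  2.0, -4.0]])

for t in (0.0, 0.1, 1.0, 10.0):
    P_t = expm(t * Q)
    print("t =", t, "row sums:", P_t.sum(axis=1).round(6))   # each row of P(t) sums to 1

print(expm(10.0 * Q).round(4))   # for large t every row is (approximately) the same

Note that t = 0 returns the identity matrix, matching the initial condition P (0) = I.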
Exercise 4.9
Using the solution to the differential equations, i.e.
P (t) = Σ_{n=0}^∞ (t^n Q^n)/n!,
show that, for small h,
P (h) = I + hQ + o(h).
4.4.6 The generator matrix, Q
We’ve seen that the transition rates that we introduced in section 4.3 play a significant role for continuous
time Markov processes, appearing in the KFDEs and KBDEs. The solution to these differential equations shows us that the transition matrix P (t) depends on these rates only through the matrix Q.
In fact, the matrix Q is so important that it has a name - the generator matrix. We’ll now have a look at
some properties of this matrix Q in preparation for the next section, where we will use it extensively.
For a continuous time Markov process with state-space S = {1, 2, 3, ...}, the generator matrix Q is defined
as
Q = [ −q1   q12   q13   · · ·
       q21  −q2   q23   · · ·
       q31   q32  −q3   · · ·
        ⋮     ⋮     ⋮     ⋱   ].
Since the qik are rates, they are non-negative, and therefore qi must be too. Also recall that in section 4.3 we saw
qi = Σ_{k≠i} qik .
Consider the rows of Q, and you will notice that they sum to zero. Therefore, the diagonal entries of Q are non-positive (negative or zero), while the off-diagonal entries are non-negative (positive or zero).
In fact, the matrix Q is so important that together with the initial distribution of the continuous time Markov
chain, p(0) , we can specify the process exactly and can compute all that we’re interested in, including the
long term behaviour of the chain. Compare this to the discrete time chain, where we required the initial
distribution together with the transition matrix:
Discrete time process specified by: Continuous time process specified by:
In effect, for continuous time Markov processes, the generator matrix Q takes the place of the transition
matrix P in the discrete time case.
Note: throughout this chapter, we assume that the qij s satisfy certain technical conditions which are
sufficient to prevent a Markov chain ‘behaving badly’, for example, the elements in the matrix Q (the rates)
exist and are finite. These conditions will be met in all our examples.
So far, we have thought about the generator matrix, Q, as a matrix of rates. In fact, we could also show
that
Q := dP (t)/dt |_{t=0} = P′(0).
That is, Q contains the rates of change of the transition probabilities P (X(t) = j | X(0) = i) at t = 0, just as we already know. In order for this differentiation to make sense, we assume that P (t) is
continuous at 0, that is,
pij (t) → 1 if j = i, and pij (t) → 0 if j ≠ i, as t ↓ 0.
The probability row vector π is an invariant distribution of a continuous-time Markov chain {X(t), t ≥ 0} with transition matrices P (t) if π = π P (t) for all t ≥ 0. This condition is equivalent to π Q = 0.
The consequence of this is as follows. If p(0) = π then p(t) = π for all t ≥ 0, that is, if we start the chain
using the distribution π, the distribution of the chain at any future time t is also π.
More generally, if the distribution of the state of our continuous-time Markov chain is π at some time, then the distribution of the state of the chain will be π at all future times.
Finding the invariant distribution of a continuous time Markov chain requires finding a vector π which
satisfies πQ = 0. This is much easier than solving π = π P (t) as P (t) itself is difficult to compute!
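A sketch of that computation (an aside, reusing a made-up generator): solve πQ = 0 together with Σj πj = 1 as a small linear system.

import numpy as np

# Hypothetical generator (rows sum to zero); for illustration only.
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  2.0, -4.0]])

n = Q.shape[0]
A = np.vstack([Q.T, np.ones(n)])   # equations pi Q = 0 plus the normalisation sum(pi) = 1
b = np.zeros(n + 1); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print("invariant distribution:", pi)
print("pi Q:", pi @ Q)             # numerically zero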
Suppose that such a distribution exists. Then since pij (t) → πj (a constant) as t → ∞ we expect p′ij (t) → 0 as t → ∞.
Note that it is not always true the other way around: an invariant distribution of {X(t), t ≥ 0} is not
necessarily an equilibrium distribution.
We used the KFDEs to establish this result. Since the forward and backward equations give the same solu-
tion for P (t), could we have used the KBDEs to do the same?
In this case, the answer is ‘no’. Though the KFDEs and KBDEs look very similar, it is sometimes the case
that only one of these is useful in solving a particular problem. Here, using the KFDEs allowed us to see the
connection between the invariant and equilibrium distributions. However, if we had tried to use the KBDEs
instead, we would get the following:
0 = −πj qi + Σ_{k∈S, k≠i} qik πj = −πj qi + πj Σ_{k∈S, k≠i} qik = πj ( Σ_{k∈S, k≠i} qik − qi ) = 0.
Obviously, the equation is perfectly valid and correct, but 0 = 0 is of no use whatsoever!
In other cases, which we will not consider in this course, the backward equations prove to be useful while
the forward equations are not.
The theorem will depend on the Markov chain being irreducible, which we define next.
Definition: Irreducible continuous time Markov chains
A continuous-time Markov chain is irreducible if, for every i and j in S, pij (t) > 0 for some t. That
is, all states in the chain intercommunicate.
(i) If there exists an invariant distribution π then it is unique and pij (t) → πj as t → ∞ for all
i, j ∈ S. (i.e. π is the equilibrium distribution of the chain.)
(ii) If there is no invariant distribution then pij (t) → 0 as t → ∞ for all i, j ∈ S. (i.e. no equilibrium
distribution exists.)
Note:
5 Important types of continuous-time processes
5.1 The Poisson process
This is a very simple but important continuous time Markov process which is the building block for
many complex processes. A Poisson process, {N (t), t ≥ 0}, counts the number of events in (0, t]. These
events occur one at a time and independently of each other.
• The state space of a Poisson process is infinite and consists of the natural numbers together with {0}:
S = {0, 1, 2, 3, ...}.
• And so on...
• These ‘events’ occur uniformly, i.e. at a constant rate, λ, per unit time.
You may find it helpful to visualise a Poisson process using the following state space diagram,
0 → 1 → 2 → 3 → 4 → · · ·
Notice that there are no probabilities on the arrows here, unlike the discrete time case. Remember that
moving to the next state can occur at any time. Equivalently, we can visualise a Poisson process using a
graph which plots the realisation of the process; each ‘step’ upward denotes the occurrence of an event.
Definition: Poisson process
Let N (t) denote the number of occurrences of some event in (0, t] for which there exists λ > 0 such
that for h > 0:
(i) P (1 event in (t, t + h]) = λh + o(h);
(ii) P (no events in (t, t + h]) = 1 − λh + o(h);
(iii) The number of events in (t, t + h] is independent of the process in (0, t].
Notes
1. Statement (i) can be written as P(N (t + h) − N (t) = 1) = λh + o(h).
Statement (ii) can be written as P(N (t + h) − N (t) = 0) = 1 − λh + o(h).
2. Statement (i) says that P (1 event in (t, t+h]) is approximately λh (that is, approximately proportional
to the length h of the time interval (t, t + h]) for small h.
Statement (ii) says that P (0 events in (t, t + h]) ≈ 1 − λh, for small h. Therefore, P (more than 1 event
in (t, t + h]) is o(h).
3. Statement (iii) says that the process has independent increments.
4. Note that the Poisson process satisfies the Markov property: consider times t1 < t2 < · · · < tn < tn+1 .
Then
Compare this to the definition of the Poisson process, which states that
and that this holds for all i = 0, 1, 2, .... Together, these imply that qi = λ: the rate of leaving state i is the
rate of the Poisson process.
• For k < i, qik = 0 since the Poisson process cannot go down in value - it is an increasing process!
• For k > (i + 1), qik = 0 since the Poisson process cannot jump directly from state i to state k > i + 1:
it must jump to state i + 1 first.
Notice that since we now have the generator matrix, Q, for a Poisson process, and we know that its initial
distribution is always p(0) = (1, 0, 0, ...), we can use the results from Chapter 4 on continuous time processes.
In particular, test your understanding of these results by answering the following.
Exercise 5.1
Does a Poisson process have an invariant distribution? An equilibrium distribution?
Exercise 5.2
Write down the forward and backward equations for the Poisson process (you need not solve them).
Exercise 5.3
Find the transition matrix of the embedded jump chain of the Poisson process. Describe the embedded
jump chain in words.
What is the probability that, by time t, k events have occurred? In other words, can we calculate p0k (t) =:
P (N (t) = k)? Note that this is the same as asking for the probability that the Poisson process has moved
from state 0 to state k in a time interval of length t.
We can use the forward equations for a Poisson process to establish the answer to this question. The forward equations (with starting state 0) are

p′0k (t) = −p0k (t) qk + Σ_{j∈S, j≠k} p0j (t) qjk ,

for all k = 0, 1, 2, .... We can solve these differential equations, as before, using the initial condition

p0k (0) = P(N (0) = k) = 1 if k = 0, and 0 for k = 1, 2, 3, . . . .

The solution is

p0k (t) = P(N (t) = k) = (λt)^k exp(−λt) / k!, k = 0, 1, 2, . . . ,

i.e.
N (t) ∼ Poisson(λt).
Exercise 5.4
Show that P(N (t) = k) = (λt)^k exp(−λt)/k! is indeed the solution to the forward equations, by substituting this into both sides of the equation.
Exercise 5.5
Tourists arrive at the departure point of a sightseeing bus according to a Poisson process of rate
λ per minute. Denote by N (t) the number of tourists that arrive at the departure point of the bus
during a time interval of length t minutes.
(i) For t > 0, name the distribution of N (t) and state its mean and variance.
(ii) The bus driver gets bored and drives off after 10 minutes. If λ = 3, what is the probability that
there will be 32 tourists on the bus? What is the expected number of passengers that will be
on the bus?
(iii) Given that the first tourist arrived during the first 5 minutes, find the probability that he or
she arrived during the first 2 minutes.
The number of events in (s, s + t] has a Poisson(λt) distribution (for all s).
Statements (i), (ii) and (iii) in the definition of the Poisson process are constant over time, so
P(k events in (s, s + t]) = P(k events in (0, t]) = (λt)^k exp(−λt) / k!.
[The Poisson process has stationary increments: the distribution of the number of events in a time interval
does not depend on when the interval starts. This is equivalent to time homogeneity. ]
In a small interval of length h, the probability of one event is λh + o(h). The probability of two or
more of these events in a small interval of length h is o(h).
If (a, b] and (c, d] are non-overlapping intervals, then the number of events in (a, b] is independent of
the numbers of events in (c, d]).
The times between successive events are i.i.d. exponential(λ) random variables.
Let T2 be the time between the first and second events. Firstly, we find the marginal distribution of T2 :
P(T2 > t) = ∫_0^∞ P(T2 > t | T1 = s) fT1 (s) ds
          = ∫_0^∞ P(no events in (s, s + t]) λ exp(−λs) ds
          = ∫_0^∞ exp(−λt) λ exp(−λs) ds
          = exp(−λt).
Therefore, T2 ∼ exp(λ). Now we show that T1 and T2 are independent.
P(T1 > v, T2 > t) = ∫_v^∞ P(T2 > t | T1 = u) fT1 (u) du
                  = ∫_v^∞ P(no events in (u, u + t]) λ exp(−λu) du
                  = ∫_v^∞ exp(−λt) λ exp(−λu) du
                  = exp(−λt) exp(−λv) = P(T2 > t) P(T1 > v).
Therefore, T1 and T2 are independent. Repeating the argument for (T2 , T3 ), (T3 , T4 ), . . . gives the result.
Let Sr be the time to the rth event. Then Sr = T1 + · · · + Tr . This is the sum of r i.i.d. exponential(λ)
random variables, which has a Gamma(r, λ) distribution.
The time from an arbitrary time point t to the next event is an exponential(λ) random variable.
This follows directly from the lack-of-memory property of the exponential distribution and is consistent with
the definition (i), (ii), (iii) of a Poisson process.
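These properties also give a direct way to simulate a Poisson process: generate i.i.d. exponential(λ) inter-event times and count how many events land in (0, t]. A sketch with arbitrary λ and t; the counts should look Poisson(λt).

import numpy as np

rng = np.random.default_rng(3)
lam, t, reps = 2.0, 3.0, 50_000    # hypothetical rate, time horizon and number of repetitions

def n_events(lam, t):
    """Count events in (0, t] by accumulating exponential(lam) inter-event times."""
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1/lam)
        if total > t:
            return count
        count += 1

counts = np.array([n_events(lam, t) for _ in range(reps)])
print("simulated mean and variance:", counts.mean(), counts.var())   # both approx lam*t = 6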
These properties are summarised in the following diagram.
A further very useful property is the distribution of the times at which events occur when we know exactly
the number of events that have happened. Notice that this is a result which is conditional on knowing the
number of events that have occurred.
Given that exactly k events occur in (0, t], the members of the set (U1 , . . . , Uk ) of k (unordered)
arrival times are i.i.d. random variables, each of which has distribution U (0, t). That is,
(U1 , . . . , Uk ) | N (t) = k ∼ i.i.d. U (0, t).
This makes sense: the k events are randomly scattered in the interval (0, t]. You do not need to be able to
prove this result, but you will need to use it.
Suppose that {N1 (t), t ≥ 0} and {N2 (t), t ≥ 0} are independent Poisson processes with rates λ1 and λ2
respectively, and N (t) = N1 (t)+N2 (t) for all t ≥ 0 (so that {N (t)} counts the total number of events).
We’ll show that the three conditions, (i), (ii) and (iii), required of a Poisson process, hold.
(i)
(iii) Independent increments. This holds because it holds for each process individually and the two process
are independent.
{N (t), t ≥ 0} is a Poisson process of rate λ. Each event of the process is deleted with probability
(1 − p) and kept with probability p, independently of all other events in the process.
Let {M (t), t ≥ 0} be the process containing only the events which are kept, then {M (t), t ≥ 0} is a
Poisson process of rate pλ.
We’ll show that the three conditions, (i), (ii) and (iii), required of a Poisson process, hold.
(i)
(ii)
(iii) Deleting events doesn’t affect the independent increments property because events are deleted inde-
pendently of each other.
Example 5.6
Outside Heals on Tottenham Court Road a charity worker attempts to stop people in order to
have a conversation. People walking along the street arrive at Heals in a Poisson process of rate
40 per minute. On average only 1 in every 20 people stop to talk. Each person’s decision is taken
independently of all other people. Each conversation takes a time which is exponentially distributed
with mean 1 minute, and is independent of other conversations. If the charity worker is busy, people
passing Heals do not stop.
Let X(t) denote whether the charity worker is busy at time t, with S = {0, 1}. State Q and the
forward equation for p01 (t). If the charity worker is free at time t = 0 minutes, show that the
probability that they are busy at time t > 0 is
(2/3) − (2/3) exp(−3t).
Hence, or otherwise, find the long-run proportion of the time that the charity worker is busy.
Solution
Firstly, pn (t) = P (X(t) = n | X(0) = 0) with X(0) = 0 so that p0 (0) = 1. People who stop to talk arrive in a Poisson process of rate 40 × (1/20) = 2 per minute. [Thinning of a Poisson process.]
Notice that p00 (t) = 1 − p01 (t), and substituting this into the forward equation for p01 (t) yields
We get the long-run proportion of time that the charity worker is busy by letting t → ∞, giving the
answer 2/3. Don’t forget that we could have obtained this answer more easily by solving πQ = 0,
yielding π1 = 2/3, as before.
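A numerical cross-check of this answer (an aside): with state 0 = free and state 1 = busy, the solution above corresponds to the generator with q01 = 2 and q10 = 1, so p01 (t) can be read off from exp(tQ).

import numpy as np
from scipy.linalg import expm

Q = np.array([[-2.0,  2.0],    # free -> busy at rate 2 (thinned arrivals)
              [ 1.0, -1.0]])   # busy -> free at rate 1 (conversations last exp(1) minutes)

for t in (0.1, 0.5, 1.0, 5.0):
    p01 = expm(t * Q)[0, 1]
    formula = 2/3 - (2/3) * np.exp(-3.0 * t)
    print(t, round(p01, 6), round(formula, 6))   # the two values agree; both tend to 2/3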
Exercise 5.7
A bicycle factory has three independent production lines, each of which produces bicycles according
to a Poisson Process. The first, for racing bikes, has a time in minutes between completed machines
that is exponential (1/5). The interarrival times for the second, for BMX bikes, are exponential (1/4)
and for the third, for mountain bikes, are exponential (1/6).
Exercise 5.8
On a single telephone line, calls arrive according to a Poisson process of rate λ per minute.
(a) Name the distribution of the number of phone calls that arrive during a time interval of length
t minutes (t > 0), and state its mean.
(b) Name the distribution of the time (in minutes) between the arrivals of two successive phone
calls, and state its mean.
If a caller finds the line free then his/her call is ‘effective’; effective calls have durations that are
independent, exponentially distributed random variables with a mean of 1/µ minutes, independently
of the arrival process. If a caller finds the line busy then this call is lost.
(c) Let Y be the time (in minutes) from the beginning of an effective phone call to the arrival of
the next effective phone call. Write down the mean and the variance of Y .
(d) State, with a reason, whether or not the arrivals of the effective phone calls follow a Poisson
process.
We think about the ‘states’ of a birth death process as the size of the population: if the process is ‘in
state n’ then ‘the size of the population is n’. The state identifies the number of individuals in the
population (though note that a birth-death process doesn’t have to describe the size of a population -
it can describe any process which goes up or down by one unit at each transition).
• to state (n + 1) (a birth) OR
• to state (n − 1) (a death).
Notation: if the size of the population is n, then, using the same notation as in Chapter 4,
Birth rate: qn,n+1 = λn
Death rate: qn,n−1 = µn
• Obviously we must have µ0 = 0. Why?
• The birth and death rates can depend on the population size. That is, they can change as the
population size changes which is why the general birth and death rates are indicated with a
subscript n, the population size.
How long does the process stay in state n? We know from Chapter 4 that the distribution of this time is
exponential, and since the only transitions allowable are to (n + 1) (with rate qn,n+1 = λn ) or to (n − 1) (with
rate qn,n−1 = µn ) then we have that the rate at which the process leaves state n is qn = qn,n+1 + qn,n−1 =
(λn + µn ). Therefore, the time T until the next ‘event’ will be exponential with rate (λn + µn ),
T ∼ exponential(λn + µn ).
From this we can construct the jump chain of a birth-death process. Using the birth and death rates λn and
µn , the next event will be a:
• birth, with probability λn /(λn + µn );
• death, with probability µn /(λn + µn ).
The transition matrix of the embedded jump chain is therefore

P = [      ?              ?              0              0              0        · · ·
      µ1/(λ1+µ1)          0         λ1/(λ1+µ1)          0              0        · · ·
           0         µ2/(λ2+µ2)          0         λ2/(λ2+µ2)          0        · · ·
           0              0         µ3/(λ3+µ3)          0         λ3/(λ3+µ3)    · · ·
           ⋮              ⋮              ⋮              ⋮              ⋮          ⋱   ]
Why is there ambiguity in the first line of the transition matrix of the embedded jump chain?
• For k > (i + 1), qik = 0 since the birth-death process cannot jump directly from state i to state
k > i + 1: it must jump to state i + 1 first.
Definition: Generator matrix for the birth-death process
We could also have constructed the generator matrix by computing the transition probabilities pij (h) for
small h, and reading off the associated transition rates. Let pij (h) = P(N (t + h) = j | N (t) = i). Then
pn,n+1 (h) = {λn h + o(h)}{1 − µn h + o(h)} + o(h) = λn h + o(h)
pn,n−1 (h) = {1 − λn h + o(h)}{µn h + o(h)} + o(h) = µn h + o(h)
pnn (h) = {1 − µn h + o(h)}{1 − λn h + o(h)} + o(h) = 1 − (µn + λn )h + o(h)
pnm (h) = o(h), if m ∉ {n − 1, n, n + 1}.
Now compare this to the general result for any continuous time Markov process with discrete state space:
pij (h) = qij h + o(h)
pii (h) = 1 − qi h + o(h)
Match up the rates with the qi and qij , to construct the generator matrix. Of course, this is a far more
cumbersome way of extracting the generator matrix!
Easy exercise
Which types of birth-death processes (listed above) have birth or death rates which depend on the
current population size?
We could define a process that has (i), (ii), (iii) and (iv), that is,
λn = λ, n = 0, 1, . . . and µn = 0, n = 1, 2, . . . .
Therefore, supposing that the process is currently in state n (n individuals in the population):
• the probability of a birth in (t, t + h], is pn,n+1 (h) = λn h + o(h).
• the probability of a death in (t, t + h], is pn,n−1 (h) = µn h + o(h).
Exercise 5.9
Fill in the table below, assuming that the population size at time t is n.
Birth rate
Death rate
(small h)
(small h)
5.2.5 The equilibrium distribution of the (general) birth and death process
Do all birth-death processes have an equilibrium distribution? Our result in Chapter 4 only specified whether
an irreducible continuous time Markov chain had an equilibrium distribution:
(i) If there exists an invariant distribution π then it is unique and pij (t) → πj as t → ∞ for all i, j ∈ S.
– That is, π is the equilibrium distribution of the chain.
(ii) If there is no invariant distribution then pij (t) → 0 as t → ∞ for all i, j ∈ S.
No (consider, for example, a Poisson process). When a process is not irreducible, we cannot guarantee that
an invariant distribution is an equilibrium distribution. Which birth-death processes are irreducible (and so
we can apply the result above)?
Exercise 5.10
Fill in the table to show which birth-death processes are irreducible, and which are not.
[Table to fill in; the process types listed include ‘... with immigration’, ‘linear death’, ‘emigration’, and ‘linear death with emigration’.]
Note that for a process to have an equilibrium distribution, it must have exactly one invariant distribution.
Recall that the existence of exactly one invariant distribution does not necessarily imply that the process
has an equilibrium distribution, though! Furthermore, on applying the result in Chapter 4,
• If the process is irreducible - existence of invariant distribution implies existence of equilibrium distri-
bution.
• If the process is NOT irreducible - cannot yet tell if existence of an invariant distribution implies the
existence of an equilibrium distribution.
First of all, we’ll search for conditions under which an invariant distribution exists for a birth-death process.
Once this has been established, we will show that for one particular non-irreducible birth-death process
(for which exactly one invariant distribution exists) the invariant distribution is in fact an equilibrium
distribution.
Solving πQ = 0 gives
−λ0 π0 + µ1 π1 = 0
λj−1 πj−1 − (λj + µj ) πj + µj+1 πj+1 = 0, j = 1, 2, . . . .
Therefore, π defined by
π1 = (λ0/µ1) π0 ,
πj = (λj−1 · · · λ0)/(µj · · · µ1) π0 , for j = 2, 3, . . .,
satisfies πQ = 0.
For π to be a probability distribution it must also satisfy Σ_{j=0}^∞ πj = 1. That is,
π0 ( 1 + Σ_{j=1}^∞ (λj−1 · · · λ0)/(µj · · · µ1) ) = 1,
so
π0 = ( 1 + Σ_{j=1}^∞ (λj−1 · · · λ0)/(µj · · · µ1) )^{−1}.    (10)
These equations define a probability distribution if and only if the sum in (10) is finite. If the state space S
is finite, then the sum in (10) must be finite.
In summary, if (10) is finite, then an invariant distribution exists. This is guaranteed for birth-death
processes with finite state-space.
Notice that if (10) is divergent an invariant distribution (and therefore an equilibrium distribution) cannot
exist.
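As an illustration (an aside, not in the notes), take the hypothetical choice λn = λ for all n and µn = nµ, an immigration-death type process. The sum in (10) is then finite for any λ, µ > 0, and the formula gives a Poisson(λ/µ) invariant distribution; truncating the state space gives a quick numerical check.

import numpy as np
from math import exp, factorial

lam, mu = 2.0, 1.5        # hypothetical rates: lambda_n = lam, mu_n = n * mu
N = 50                    # truncation point for the sum in (10)

w = np.ones(N + 1)        # unnormalised weights w_j = (lam_{j-1}...lam_0)/(mu_j...mu_1)
for j in range(1, N + 1):
    w[j] = w[j - 1] * lam / (j * mu)

pi = w / w.sum()          # normalise, as in (10)
print("pi_0..pi_5 from (10):", pi[:6].round(4))
print("Poisson(lam/mu)     :", [round(exp(-lam/mu) * (lam/mu)**j / factorial(j), 4) for j in range(6)])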
If the process is irreducible, the equilibrium and invariant distributions coincide (if the latter exists).
What if the process is not irreducible? Can an equilibrium distribution exist? The answer is yes. We will
now look for conditions under which an equilibrium distribution exists for a linear birth-death process
only (we will not look at other types of non-irreducible birth-death processes). This equilibrium distribution
will be linked to the probability of extinction for the process, which we look at next.
How are ‘extinction probability’ and invariant distributions linked in this case?
For linear birth-death processes, once extinction (reaching state 0) occurs, we stay in this state forever.
Therefore, π = (1, 0, 0, . . .) is an invariant distribution for the process.
Under what circumstances is π = (1, 0, 0, . . .) also an equilibrium distribution? This is equivalent to asking
‘under which conditions does the population die out (become extinct) with probability 1?’. Alternatively,
‘under what circumstances is it certain that the chain will reach the state 0?’.
An elegant way of solving this is via generating functions. Let N (t) denote the state of the linear birth-
death process at time t (i.e. the number of individuals in the population at time t). Then
G(s, t) = E[s^{N(t)}] = Σ_{n=0}^∞ s^n P(N (t) = n).
Now, the probability that extinction occurs at or before time t is given by P(N (t) = 0), which can be
extracted from the generating function via
G(0, t) = Σ_{n=0}^∞ 0^n P(N (t) = n) = P(N (t) = 0),
so that
G(0, t) = [ (µ − µ exp(−(λ−µ)t)) / (λ − µ exp(−(λ−µ)t)) ]^{N(0)}   if λ ≠ µ,
G(0, t) = [ λt / (1 + λt) ]^{N(0)}   if λ = µ.
If we let t → ∞ we get
P(eventual extinction) = 1 if λ ≤ µ, and (µ/λ)^{N(0)} if λ > µ.
If the birth rate is no larger than the death rate, the population is certain to become extinct eventually.
If, on the other hand, λ > µ, then the expected population size increases to ∞ as t → ∞. You can see this by using G(s, t) to show that
E[N (t)] = N (0) exp((λ − µ)t).
However, even under these conditions, extinction is still possible and so the population size will either
increase without limit, or become extinct. As there is uncertainty as to how the process behaves in the
future, there is no equilibrium distribution.
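A quick simulation sketch (an aside) of the extinction probability: the jump chain of the linear birth-death process goes up with probability λ/(λ + µ) and down with probability µ/(λ + µ), so we can run the jump chain until it hits 0 or exceeds a large cap (treating the latter runs as ‘never extinct’, which is a slight approximation).

import numpy as np

rng = np.random.default_rng(4)
lam, mu, n0 = 2.0, 1.0, 3          # hypothetical rates with lam > mu, and N(0) = 3
cap, reps = 1_000, 20_000

def goes_extinct(n):
    p_birth = lam / (lam + mu)     # jump-chain probability that the next event is a birth
    while 0 < n < cap:
        n += 1 if rng.random() < p_birth else -1
    return n == 0

est = np.mean([goes_extinct(n0) for _ in range(reps)])
print("simulated P(extinction)  :", est)
print("theory (mu/lam)^{N(0)}   :", (mu / lam) ** n0)    # 0.125 here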
5.3 Exercises
The next exercises are exam-type questions. They are hard, but will test your understanding of birth-death
processes.
Exercise 5.11
Suppose that arrivals to a utopian island occur as a Poisson process with rate λ per year. Those who
arrive on the island can never leave, but each person on the island at time t has probability µh + o(h)
of dying in the interval (t, t + h] years. Any individual on the island at time t has probability
θh + o(h) of giving birth in the interval (t, t + h] years.
Arrivals, births and deaths are independent and the probability of two or more of these occurring in
(t, t + h] is o(h). The parameters λ, µ, θ are all positive. Let Xt denote the population size of the
island by time t.
(d) Using the expressions given in part (c), calculate the expected time until the population reaches
size (i + 1) for the first time, for i = 0, 1, 2.
(e) Given that Xt = 0, give an expression for how long, on average, it takes for the population to
reach size 2 for the first time.
Exercise 5.12
The population of an island evolves according to an immigration-death process. Specifically, the
population evolves according to the following rules.
People immigrate onto the island in a Poisson process of rate α per year. Each person on the
island at time t has probability µh + o(h) of dying in the time interval (t, t + h] years. There is no
emigration from the island and there are no births.
Immigrations and deaths are independent and the probability of two or more events (immigrations
or deaths) in (t, t + h] is o(h). The parameters α and µ are positive.
(e) Suppose now that the island has an active volcano. When the volcano erupts the whole
population of the island is killed, reducing the population size to zero. Eruptions occur in a
Poisson process of rate φ per year. Otherwise the population of the island evolves according to
the immigration-death process described above.
Using your answer to (d)(i), or otherwise, calculate the expected number of people killed by the
next eruption to occur after time 0, given that N (0) = 0.