Hoelbook
Hoelbook
The Houghton MifHin Series in Statistics under the Editorship of Herman Chernoff
LEO BREIMAN
Probability and Stochastic Processes: With a View Toward Applications Statistics: With a View Toward Applications
PAUL G. HOEL, SIDNEY C. PORT, AND CHARLES J. STONE
Introduction to Statistics-A
Fresh Approach
J. Stone
BOSTON
New York
Atlanta
Geneva, Illinois
Dallas
Palo Alto
COPYRIGHT
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, without permission in writing from the publisher.
PRINTED IN THE U.S.A. LmRARY OF CONGRESS CATALOG CARD NUMBER: ISBN:
79-165035
0-395-12076-4
General Preface
This three-volume series grew out of a three-quarter course in probability, statistics, and stochastic processes taught for a number of years at UCLA. We felt a need for a series of books that would treat these subjects in a way that is well coordinated, but which would also give adequate emphasis to each subject as being interesting and useful on its own merits. The first volume, Introduction to Probability Theory, presents the fundamental ideas of probability theory and also prepares the student both for courses in statistics and for further study in probability theory, including stochastic processes. The second volume, Introduction to Statistical Theory, develops the basic theory of mathematical statistics in a systematic, unified manner. Together, the first two volumes contain the material that is often covered in a two-semester course in mathematical statistics. The third volume, Introduction to Stochastic Processes, treats Markov chains, Poisson processes, birth and death processes, Gaussian" processes, Brownian motion, and processes defined in terms of Brownian motion by means of elementary stochastic differential equations.
Preface
In recent years there has been an ever increasing interest in the study of systems which vary in time in a random manner. Mathematical models of such systems are known as stochastic processes. In this book we present an elementary account of some of the important topics in the theory of such processes. We have tried to select topics that are conceptually interesting and that have found fruitful application in various branches of science and technology. A stochastic process can be defined quite generally as any collection of random variables X(t), t E T, defined on a common probability space, where T is a subset of ( - 00, (0) and is thought of as the time parameter set. The process is called a continuous parameter process if T is an interval having positive length and a discrete parameter process if T is a subset of the integers. If the random variables X(t) all take on values from the fixed set !/, then !/ is called the state space of the process. Many stochastic processes of theoretical and applied interest possess the property that, given the present state of the process, the past history does not affect conditional probabilities of events defined in terms of the future. Such processes are called Markov processes. In Chapters 1 and 2 we study Markov chains, which are discrete parameter Markov processes whose state space is finite or countably infinite. In Chapter 3 we study the corresponding continuous parameter processes, with the "Poisson process" as a special case. In Chapters 4-6 we discuss continuous parameter processes whose state space is typically the real line. In Chapter 4 we introduce Gaussian processes, which are characterized by the property that every linear combination involving a finite number of the random variables X(t), t E T, is normally distributed. As an important special case, we discuss the Wiener process, which arises as a mathematical model for the physical phenomenon known as "Brownian motion." In Chapter 5 we discuss integration and differentiation of stochastic processes. There we also use the Wiener process to give a mathematical model for "white noise." In Chapter 6 we discuss solutions to nonhomogeneous ordinary differential equations having constant coefficients whose right-hand side is either a stochastic process or white noise. We also discuss estimation problems involving stochastic processes, and briefly consider the "spectral distribution" of a process.
vii
viii
Preface
This text has been designed for a one-semester course in stochastic processes. Written in close conjunction with Introduction to Probability Theory, the first volume of our three-volume series, it assumes that the student is acquainted with the material covered in a one-semester course in probability for which elementary calculus is a prerequisite. Some of the proofs in Chapters 1 and 2 are somewhat more difficult than the rest of the text, and they appear in appendices to these chapters. These proofs and the starred material in Section 2.6 probably should be omitted or discussed only briefly in an elementary course. An instructor using this text in a one-quarter course will probably not have time to cover the entire text. He may wish to cover the first three chapters thoroughly and the remainder as time permits, perhaps discussing those topics in the last three chapters that involve the Wiener process. Another option, however, is to emphasize continuous parameter processes by omitting or skimming Chapters 1 and 2 and concentrating on Chapters 3-6. (For example, the instructor could skip Sections 1.6.1, 1.6.2, 1.9, 2.2.2, 2.5.1, 2.6.1, and 2.8.) With some aid from the instructor, the student should be able to read Chapter 3 without having studied the first two chapters thoroughly. Chapters 4-6 are independent of the first two chapters and depend on Chapter 3 only in minor ways, mainly in that the Poisson process introduced in Chapter 3 is used in examples in the later chapters. The properties of the Poisson process that are needed later are summarized in Chapter 4 and can be regarded as axioms for the Poisson process. The authors wish to thank the UCLA students who tolerated preliminary versions of this text and whose comments resulted in numerous improvements. Mr. Luis Gorostiza obtained the answers to the exercises and also made many suggestions that resulted in significant improvements. Finally, we wish to thank Mrs. Ruth Goldstein for her excellent typing.
T able of Contents
Markov Chains
1.1 1.2 1.3 1.4
Markov chains having two states Transition function and initial distribution Examples Computations with transition functions 1.4.1 Hitting times 1.4.2 Transition matrix Transient and recurrent states Decomposition of the state space 1.6.1 Absorption probabilities 1.6.2 Martingales Birth and death chains Branching and queuing chains 1.8.1 Branching chain 1.8.2 Queuing chain Appendix Proof of results for the branching and queuing chains 1.9.1 Branching chain 1.9.2 Queuing chain
1.5 1.6
1.7 1.8
1 2 5 6 12 14 16 17 21 25 27 29 33 34 36 36 38 39 47 47 49 50 53 56 60 63 67 69 70
1.9
Table of Contents
2.7 2.8
Convergence to the stationary distribution Appendix Proof of convergence 2.8.1 Periodic case 2.8.2 A result from number theory Construction of jump processes Birth and death processes 3.2.1 Two-state birth and death process Poisson process 3.2.2 Pure birth process 3.2.3 3.2.4 Infinite server queue Properties of a Markov pure jump process 3.3.1 Applications to birth and death processes Mean and covariance functions Gaussian processes The Wiener process
3.3
154
159
166
6.4
Markov Chains
Consider a system that can be in anyone of a finite or countably infinite number of states. Let 51' denote this set of states. We can assume that 51' is a subset of the integers. The set 51' is called the state space of the system. Let the system be observed at the discrete moments of time n = 0, 1, 2, ... , and let Xn denote the state of the system at time n. Since we are interested in non-deterministic systems, we think of Xm n > 0, as random variables defined on a common probability space. Little can be said about such random variables unless some additional structure is imposed upon them. The simplest possible structure is that of independent random variables. This would be a good model for such systems as repeated experiments in which future states of the system are independent of past and present states. In most systems that arise in practice, however, past and present states of the system influence the future states even if they do not uniquely determine them. Many systems have the property that given the present state, the past states have no influence on the future. This property is called the Markov property, and systems having this property are called Markov chains. The Markov property is defined precisely by the requirement that
for every choice of the nonnegative integer n and the numbers Xo, ... , xn+ 1, each in 51'. The conditional probabilities P(Xn+ 1 = Y I Xn = x) are called the transition probabilities of the chain. In this book we will study Markov chains having stationary transition probabilities, i.e., those such that P(Xn+ 1 = Y I Xn = x) is independent of n. From now on, when we say that Xm n > 0, forms a Markov chain, we mean that these random variables satisfy the Markov property and have stationary transition probabilities. The study of such Markov chains is worthwhile from two viewpoints. First, they have a rich theory, much of which can be presented at an elementary level. Secondly, there are a large number of systems arising in practice that can be modeled by Markov chains, so the subject has many useful applications.
1
Markov Chains
In order to help motivate the general results that will be discussed later, we begin by considering Markov chains having only two states.
1.1.
For an example of a Markov chain having two states, consider a machine that at the start of any particular day is either broken down or in operating condition. Assume that if the machine is broken down at the start of the nth day, the probability is p that it will be successfully repaired and in operating condition at the start of the (n + l)th day. Assume also that if the machine is in operating condition at the start of the nth day, the probability is q that it will have a failure causing it to be broken down at the start of the (n + l)th day. Finally, let 1to(O)denote the probability that the machine is broken down initially, i.e., at the start of the Oth day. Let the state 0 correspond to the machine being broken down and let the state 1 correspond to the machine being in operating condition. Let Xn be the random variable denoting the state of the machine at time n. According to the above description
P(Xn+ P(Xn+
and
1 1
= =
1 I Xn 0
0) 1)
= p, = q,
I Xn =
0)
P(Xo
=
0
1tO(O).
Since there are only two states, 0 and 1, it follows immediately that
P(Xn+ P(Xn+
and that the probability 1to(I) From this information, We observe that
= =
I Xn = =
1)
0) 1)
= =
1 - p, 1 - q,
1 I Xn
= P(Xo =
1 - 1to(O).
P(Xn+1
0)
= P(Xn =
= P(Xn
=
0
0)
+ P(Xn =
= 0) = 1)
1 and Xn+1
0)
= O)P(Xn+\l = =
0)
I Xn
0
+
= =
P(Xn = I)P(Xn+1
I Xn
(1 - p)P(Xn
+ qP(Xn =
=
0)
1)
= (1 - p)P(Xn
= 0)
+ q(1 - P(Xn = 0» +
q.
(1 - p - q)P(Xn
1.1.
= 0) =
1to(O),so
P(XI
0)
(1 - P - q)1to(O)
+q
P(X2 = 0) = (1 - p - q)P(Xl
0)
+q
(l - p - q)].
(1 - p - q)21to(0)
+ q[1 + +q
n-l
P(Xn
0)
(1 - p - qt1to(O)
i=O
(1 - p - q)i.
1)
1to(1).
Suppose now that p + q > O. Then by the formula for the sum of a finite geometric progression,
n-l
(1 - p - q)i =
(1
i=O
-p- q P+q
)n
P(Xn = 0) =
q p+q
(1 - p - qt (1to(O) -
q p+q
),
P(Xn = 1) =
p+q
(1 - p - q)n (1to(l)
p+q
).
Suppose that p and q are neither both equal to zero nor both equal to 1. Then 0 < p + q < 2, which implies that 11 - p - ql < 1. In this case we can let n -+ 00 in (3) and (4) and conclude that (5) lim P(Xn
n-+ao
= 0) =
and
lim P(X
n-+ao
= 1) =
We can also obtain the probabilities q /(p + q) and p/(p + q) by a different approach. Suppose we want to choose 1to(O) and 1to(l) so that P(Xn = 0) and P(Xn = 1) are independent of n. It is clear from (3) and (4) that to do this we should choose
1to(O) =
---==----
+
q
and
1to(l) =
----=-
Thus we see that if Xn, n > 0, starts out with the initial distribution P(Xo
0)
and
P(Xo
1)
+q
Markov Chains
and
P(Xn = 1) =
p+q
The description of the machine is vague because it does not really say whether Xm n > 0, can be assumed to satisfy the Markov property. Let us suppose, however, that the Markov property does hold. We can use this added information to compute the joint distribution of Xo, Xb ... , Xn• For example, let n = 2 and let xo, Xl' and X2 each equal 0 or 1. Then
P(Xo
Xo, Xl
Xl' and X2
X2)
= P(Xo =
= P(Xo
Xo and Xl XO)P(XI
Xl)P(X2
X2 I Xo
Xo and Xl
Xl)
= Xl I Xo =
XO)P(X2
x21 Xo
Xo and Xl
Xl)·
Now P(Xo = xo) and P(XI = Xl I Xo = xo) are determined by p, q, and 1to(O); but without the Markov property, we cannot evaluate P(X2 = X2 I Xo = Xo and Xl = Xl) in terms of p, q, and 1to(O). If the Markov property is satisfied, however, then
P(X2
X2
I Xo = =
Xo and Xl
Xl)
P(X2
X2
I Xl
x.),
Xo, Xl
=
Xl' and X2
X2)
P(Xo
XO)P(XI
= Xl I Xo =
XO)P(X2
= X2
I Xl
= Xl)·
For example,
P(Xo
= 0, Xl - 1, and =
=
X2
= 0)
P(Xo
O)P(XI
1 I Xo = 0)P(X2
0 I Xl = 1)
1to(O)pq.
The reader should check the remaining entries in the following table, which gives the joint distribution of Xo, Xl' and X2.
Xo
Xl 0 0 1 1 0 0 1 1
X2
P(Xo
0 0 0 0 1 1 1 1
0 1 0 1 0 1 0 1
1to(O) (l - q) p
(1 - 1to(O»q(1 (1 - 1to(O»qp - p)
1.2.
1.2.
Let Xn, n > 0, be a Markov chain having state space f/. (The restriction to two states is now dropped ..) The function P(x, y), x E f/ and y E f/, defined by (6) P(X, y)
P(XI
I Xo =
x),
x, Y
f/,
is called the transition function of the chain. (7) and (8) P(X, y) > 0,
It is such that
E
x, Y
f/,
L P(x,
y
y) = 1,
XE
probabilities,
I Xn =
xo,""
x)
P(x, y),
n>l.
P(Xn+1
I Xo =
Xn-l
= xn-l,
Xn
x)
P(x, y).
In other words, if the Markov chain is in state x at time n, then no matter how it got to x, it has probability P(x, y) of being in state y at the next step. For this reason the numbers P(x, y) are called the one-step transition probabilities of the Markov chain. The function 1ro(x), x E f/, defined by (11) 1ro(X)
P(Xo
x),
XE
f/,
is called the initial distribution of the chain. (12) and (13) 1ro(X) > 0,
XE
L 1ro(x)
x
= 1.
The joint distribution of Xo, ... , Xn can easily be expressed in terms of the transition function and the initial distribution. For example, P(Xo Also, P(Xo
XO, Xl
Xl)
= =
P(Xo
XO)P(XI
Xl
I Xo =
Xo)
1rO(XO)P(XO' Xl)'
XO, Xl = Xl' X2
X2)
=
=
P(XO = XO, Xl
XI)P(X2
=
= I Xo
X2
I Xo
XO' Xl = Xl)
1ro(XO)P(XO' XI)P(X2
X2
= XO' Xl
= x.).
Markov Chains
Since Xn, n > 0, satisfies the Markov property and has stationary transition probabilities, we see that
P(X2
X2
I Xo =
XO, Xl
Xl)
=
= =
=
=
X2 X2
I Xl =
I Xo
=
xd Xl)
Thus
P(Xo (14) P(Xo
Xo, Xl
Xl' X2
=
xn)
X2)
TCo(Xo)P(xo,
Xo, ... , Xn
It is usually more convenient, however, to reverse the order of our definitions. We say that P(x, y), X E 9' and y E 9', is a transition/unction if it satisfies (7) and (8), and we say that TCo(X), X E 9', is an initial distribution if it satisfies (12) and (13). It can be shown that given any transition function P and any initial distribution TCo, there is a probability space and random variables Xn, n > 0, defined on that space satisfying (14). It is
not difficult to show that these random variables form a Markov chain having transition function P and initial distribution no. The reader may be bothered by the possibility that some of the conditional probabilities we have discussed may not be well defined. For example, the left side of (I) is not well defined if
P(Xo
Xo, ... , Xn
xn)
= O.
This difficulty is easily resolved. Equations (7), (8), (12), and (13) defining the transition functions and the initial distributions are well defined, and Equation (14) describing the joint distribution of Xo,"" Xn is well defined. It is not hard to show that if (14) holds, then (1), (6), (9), and (10) hold whenever the conditional probabilities in the respective equations are well defined. The same qualification holds for other equations involving conditional probabilities that will be obtained later. It will soon be appa~ent that the transition function of a Markov chain plays a much greater role in describing its properties than does the initial distribution. For this reason it is customary to study simultaneously all Markov chains having a given transition function. In fact we adhere to the usual convention that by "a Markov chain having transition function P," we really mean the family of all Markov chains having that transition function.
1.3. Examples
In this section we will briefly describe several interesting examples of Markov chains. These examples will be further developed in the sequel.
1.3.
Examples
Random walk. Let l, 2, ••• be independent integervalued random variables having common density f Let Xo be an integer-valued random variable that is independent of the e /s and set Xn = Xo + I + ... + en· The sequence X", n > 0, is called a random walk. It is a Markov chain whose state space is the integers and whose transition function is given by
Example 1.
ee
- x).
P(Xo
= =
xo,
el
Xl -
Xo,""
en
= Xn - xn-l)
= P(Xo
peen = xn - xn-l)
= 1to(XO)f(XI
- xn-l)
and thus (14) holds. Suppose a "particle" moves along the integers according to this Markov chain. Whenever the particle is in x, regardless of how it got there, it jumps to state y with probability fey - x). As a special case, consider a simple random walk in which f(l) = p, f( -1) = q, and f(O) = r, where p, q, and r are nonnegative and sum to one. The transition function is given by
P(x, y)
C'
;:
0,
Let a particle undergo such a random walk. If the particle is in state X at a given observation, then by the next observation it will have jumped to state X + 1 with probability p and to state x-I with probability q; with probability r it will still be in state x. Ehrenfest chain. The following is a simple model of the exchange of heat or of gas molecules between two isolated bodies. Suppose we have two boxes, labeled 1 and 2, and d balls labeled 1, 2, ... , d. Initially some of these balls are in box 1 and the remainder are in box 2. An integer is selected at random from 1, 2, ... , d, and the ball labeled by that integer is removed from its box and placed in the opposite box. This procedure is repeated indefinitely with the selections being independent from trial to trial. Let Xn denote the number of balls in box 1 after the nth trial. Then X", n > 0, is a Markov chain on f/ = {O, 1, 2, ... , d}.
Example 2.
Markov Chains
The transition function of this Markov chain is easily computed. Suppose that there are x balls in box I at time n. Then with probability xld the ball drawn on the (n + l)th trial will be from box 1 and will be transferred to box 2. In this case there will be x-I balls in box 1 at time n + 1. Similarly, with probability (d - x)/d the ball drawn on the (n + l)th trial will be from box 2 and will be transferred to box 1, resulting in x + I balls in box 1 at time n + 1. Thus the transition function of this Markov chain is given by
d'
P(x, y) =
x x
y y
x-I,
- d'
0,
=x
1,
elsewhere.
Note that the Ehrenfest chain can in one transition only go from state x to x-lor x + 1 with positive probability. A state a of a Markov chain is called an absorbing state if Pea, a) = 1 or, equivalently, if Pea, y) = for y '# a. The next example uses this definition. Suppose a gambler starts out with a certain initial capital in dollars and makes a series of one dollar bets against the house. Assume that he has respective probabilities p and q = 1 - P of winning and losing each bet, and that if his capital ever reaches zero, he is ruined and his capital remains zero thereafter. Let Xm n > 0, denote the gambler's capital at time n. This is a Markov chain in which is an absorbing state, and for x > 1
Gambler's ruin chain. Example 3.
q, (15)
P(x, y) =
p,
y = x-I, y = x + 1,
0,
elsewhere.
Such a chain is called a gambler's ruin chain on f/ = {O,1,2, ... }. We can modify this model by supposing that if the capital of the gambler increases to d dollars he quits playing. In this case and d are both absorbing states, and (15) holds for x = 1, ... , d - l.
For an alternative interpretation of the latter chain, we can assume that two gamblers are making a series of one dollar bets against each other and that between them they have a total capital of d dollars. Suppose the first gambler has probability p of winning any given bet, and the second gambler has probability q = 1 - p of winning. The two gamblers play until one
1.3.
Examples
of them goes broke. Let Xn denote the capital of the first gambler at time n. Then Xm n > 0, is a gambler'S ruin chain on {O,1, ... , d}.
Example 4. Birth and death chain. Consider a Markov chain either on f/' = {O,1, 2, ... } or on Y = {O, 1, ... , d} such that starting from x the chain will be at x-I, x, or x + 1 after one step. The transition
Y = x, Y = x + 1, elsewhere, where Px, qx, and r x are nonnegative numbers such that Px + qx + r x = 1. The Ehrenfest chain and the two versions of the gambler's ruin chain are examples of birth and death chains. The phrase "birth and death" stems from applications in which the state of the chain is the population of some living system. In these applications a transition from state x to state x + 1 corresponds to a "birth," while a transition from state x to state x-I corresponds to a "death." In Chapter 3 we will study birth and death processes. These processes are similar to birth and death chains, except that jumps are allowed to occur at arbitrary times instead of just at integer times. In most applications, the models discussed in Chapter 3 are more realistic than those obtainable by using birth and death chains. Consider a service facility such as a checkout counter at a supermarket. People arrive at the facility at various times and are eventually served. Those customers that have arrived at the facility but have not yet been served form a waiting line or queue. There are a variety of models to describe such systems. We will consider here only one very simple and somewhat artificial model; others will be discussed in Chapter 3.
Queuing chain. Example 5.
Let time be measured in convenient periods, say in minutes. Suppose that if there are any customers waiting for service at the beginning of any given period, exactly one customer will be served during that period, and that if there are no customers waiting for service at the beginning of a period, none will be served during that period. Let en denote the number of new customers arriving during the nth period. We assume that are independent nonnegative integer-valued random variables having common density f
10
Markov
Chains
Let Xo denote the number of customers present initially, and for n > 1, let Xn denote the number of customers present at the end of the nth period. If Xn = 0, then Xn+ 1 = en+ 1; and if Xn > 1, then Xn+ 1 = Xn + en+ 1 - 1. It follows without difficulty from the assumptions on em n > 1, that Xn, n > 0, is a Markov chain whose state space is the nonnegative integers and whose transition function P is given by P(O, y) and
= f(y)
-x
Pix, y) = f(y
1),
x>1.
Example 6. Branching chain. Consider particles such as neutrons or bacteria that can generate new particles of the same type. The initial set of objects is referred to as belonging to the Oth generation. Particles generated from the nth generation are said to belong to the (n + 1)th generation. Let Xm n > 0, denote the number of particles in the nth generation. Nothing in this description requires that the various particles in a generation give rise to new particles simultaneously. Indeed at a given time, particles from several generations may coexist. A typical situationis illustrated in Figure 1: one initial particle gives rise to two particles. Thus Xo = 1 and Xl = 2. One of the particles in the first generation gives rise to three particles and the other gives rise to one particle, so that X2 = 4, We see from Figure 1 that X3 = 2. Since neither of the particles in the third generation gives rise to new particles, we conclude that X4 = and consequently that Xn = 0 for all n > 4. In other words, the progeny of the initial particle in the zeroth generation become extinct after three generations.
Figure 1
1.3.
Examples
11
In order to model this system as a Markov chain, we suppose that each particle gives rise to ~ particles in the next generation, where ~ is a nonnegative integer-valued random variable having density f We suppose that the number of offspring of the various particles in the various generations are chosen independently according to the density f Under these assumptions Xm n > 0, forms a Markov chain whose state space is the nonnegative integers. State 0 is an absorbing state. For if there are no particles in a given generation, there will not be any particles in the next generation either. For x > 1
P(x,y)
P(~l
+ ... +
~x
y),
where ~ h ... ,~x are independent random variables having common density f In particular, P(l, y) = fey), y > O. If a particle gives rise to ~ = 0 particles, the interpretation is that the particle dies or disappears. Suppose a particle gives rise to ~ particles, which in turn give rise to other particles; but after some number of generations, all descendants of the initial particle have died or disappeared (see Figure 1). We describe such an event by saying that the descendants of the original particle eventually become extinct. An interesting problem involving branching chains is to compute the probability p of eventual extinction for a branching chain starting with a single particle or, equivalently, the probability that a branching chain starting at state 1 will eventually be absorbed at state O. Once we determine p, we can easily find the probability that in a branching chain starting with x particles the descendants of each of the original particles eventually become extinct. Indeed, since the particles are assumed to act independently in giving rise to new particles, the desired probability is just p". The branching chain was used originally to determine the probability that the male line of a given person would eventually become extinct. For this purpose only male children would be included in the various generations. Consider a gene composed of d subunits, where d is some positive integer and each subunit is either normal or mutant in form. Consider a cell with a gene composed of m mutant subunits and d - m normal subunits. Before the cell divides into two daughter cells, the gene duplicates. The corresponding gene of one of the daughter cells is composed of d units chosen at random from the 2m mutant subunits and the 2(d - m) normal subunits. Suppose we follow a fixed line of descent from a given gene. Let Xo be the number of mutant subunits initially
Example 7.
12
Markov Chains
present, and let Xm n > 1, be the number present in the nth descendant gene. Then Xm n > 0, is a Markov chain on f/ = {O,I, 2, ... , d} and
P(X, Y) =
States
1.4.
Let Xm n > 0, be a Markov chain on Y having transition function P. In this section we will show how various conditional probabilities can be expressed in terms of P. We will also define the n-step transition function of the Markov chain. We begin with the formula
(16)
P(Xn+
xn+
1, ••• ,
Xn+m
Xn+m
I Xo =
Xo, ... , Xn
••
Xn) Xn+m)'
= P(Xm xn+1)·
P(Xn+m-l,
Xn+m) Xn)
which reduces to the right side of (16). It is convenient to rewrite (16) as (17) P(Xn+1
= Yl"'"
Xn+m
Ym I Xo
Xo,"" Yl)P(Yl'
Xn-1
Xn-l'
Xn
= x)
Ym)·
= P(x,
Let Ao, ... , An-1 that
(18)
P(Xn+1
Ao,·.·,
Xn-1
An-b
Xn = x) Ym)·
P(x, Yl)P(Yl'
Y2)' .. P(Ym-l'
•••
P(Xn+l
EBb"',
I Xo
Ao,""
Xn-1
E An-h
K; = x)
Ym)·
1: ... 1:
YIEBI YmEBm
P(x, Yl)P(Yl'
1.4.
13
The m-step transition function pm(x, y), which gives the probability of going from x to y in m steps, is defined by
(20) pm(x, y) =
L'" L
Yl Ym-l
P(x, Yl)P(Yh
Y2)'"
po(x,
elsewhere.
P(Xn+m
I Xo
An-l,
Xn
x)
I Xn =
x)
(23)
P(Xn+m
= = =
I Xo =
Y z
x, Xn
z)
pm(z, y).
=
=
=
P(Xn+m
I Xo = I Xo =
x) x)P(Xn+m
L P(Xn
z z
I Xo = = z),
x, Xn
z)
L P'(», z)P(Xn+m
pn+m(x, y) =
=Y
I Xo =
x, X;
L pn(x,
z
z)pm(z,
y).
For Markov chains having a finite number of states, (24) allows us to think of P" as the nth power of the matrix P, an idea we will pursue in Section 1.4.2. Let no be an initial distribution for the Markov chain. Since
P(Xn
y)
=
=
L P(Xo
x
= =
x, X; x)P(Xn
y)
L P(Xo
x
I x, =
x),
we see that
(25) P(Xn
y)
L no(x)pn(x,
x
y).
This formula allows us to compute the distribution of Xn in terms of the initial distribution no and the n-step transition function P".
14
Markov Chains
of Xm observe
P(Xn+l
y)
= L P(Xn =
x x
X,
Xn+l
y)
= L P(Xn =
so that (26)
x)P(Xn+l
Y I X;
x),
P(Xn+l
= y) =
L P(Xn
x
x)P(x, y).
If we know the distribution of Xo, we can use (26) to find the distribution of Xl' Then, knowing the distribution of Xl' we can use (26) to find the distribution of X2• Similarly, we can find the distribution of Xn by applying (26) n times. We will use the notation Px( ) to denote probabilities of various events defined in terms of a Markov chain starting at x. Thus
denotes the probability that a Markov chain starting at x is in a state a at time 3 but not at time I or at time 2. In terms of this notation, (19) can be rewritten as
(27)
P(Xn+l
Bh· .. , Xn+m E Bm I Xo
Ao,··.,
Xn-l
E
An-I' Xn = X)
E
= PiXl
1 .4.1 . H itting ti meso T....of A is defined by
Bl, ... , Xm
Bm).
A)
if Xn E A for some n > 0, and by T.... = 00 if Xn ¢ A for all n > O. In other words, T....is the first positive time the Markov chain is in (hits) A. Hitting times play an important role in the theory of Markov chains. In this book we will be interested mainly in hitting times of sets consisting of a single point. We denote the hitting time of a point a E f/ by 'Fa rather than by the more cumbersome notation 1{a}' An important equation involving hitting times is given by (28)
pn(x, y) =
m=l
PiTy
= m)pn-m(y,
y),
n>1.
In order to verify (28) we note that the events {Ty = m, Xn = y}, 1 < m < n, are disjoint and that
{Xn
= y} = U {1'y =
m=1
m, X n
= y}.
1.4.
15
We have in effect decomposed the event {Xn = y} according to the hitting time of y. We see from this decomposition that
m=l
n
L m=l
m=l
= =
=
m)P(Xn m)P(Xn
= =
I Xo = x, Ty = m)
L
n
Y I Xo
X,
Xl =F y, ... ,
-L
Px(Ty
m)pn-m(y,
y),
m=l
and hence that (28) holds. Show that if a is an absorbing state, then P''(x, a) PxCTa < n), n ~ 1.
Example 8. If a is an absorbing state, then pn-m(a,
a)
m=l
L
n
PxCTa = m)pn-m(a,
a)
L m=l
Observe that
2)
z¢y
PxCX 1
Z,
X2
y)
z¢y
P(x, z)P(z,
y).
PxCTy = n
1) =
z¢y
n>1.
is a consequence of (27), but it should also be directly in order to go from x to y for the first time at time n + 1, to go to some state z =1= y at the first step and then go from first time at the end of n additional steps.
16
Markov Chains
1.4.2. Transition matrix. Suppose now that the state space !/ is finite, say !/ = {O, 1, ... ,d}. In this case we can think of P as the transition matrix having d + 1 rows and columns given by
~ [p(~,
d F or example, {O, 1, 2, 3} is
0)
...
d P(O;
d1
P(d, 0) matrix
the transition
0 0
1 2
p2(X, y)
L P(x,
z
z )P(z, y).
Recalling the definition of ordinary matrix multiplication, we observe that the two-step transition matrix p2 is the product of the matrix P with itself. More generally, by setting m = 1 in (24) we see that (30)
pn+ lex, y) =
L pn(x,
z
z )P(z, y).
It follows from (30) by induction that the n-step transition matrix P" is the nth power of P. An initial distribution TCo can be thought of as a (d + I)-dimensional row vector
1to
(TCo(O), ...
If we let
1tn
denote the (d
1tn
I)-dimensional
= (P(Xn =
and
The two-state Markov chain discussed in Section 1.1 is one of the few examples where P" can be found very easily.
1.5.
17
Example 9.
transition matrix
p=
[1 -
1- q ,
p]
where p
p+q
(1 - P -
qt
p+q
=
P
=0
In
(4) and
p+q
p+q
q p+q
- (1 - p - qt
q p+q
+ (1 -
p - qt
p+q
+q q
[q
p] + (1 - p P
+q
q)n [
-q
p-
Pq] .
1.5.
Let Xn, n > 0, be a Markov chain having state space !/ and transition function P. Set Pxy = PxCTy < (0). Then Pxy denotes the probability that a Markov chain starting at x will be in state y at some positive time. In particular, Pyy denotes the probability that a Markov chain starting at y will ever return to y. A state y is called recurrent if Pyy = 1 and transIent if Pyy < 1. If y is a recurrent state, a Markov chain starting at y returns to y with probability one. If y is a transient state, a Markov chain starting at y has positive probability 1 Pyy of never returning to y. If y is an absorbing state, then PiTy = 1) =
18
Markov
Chains
P(y, y) = 1 and hence Pyy = 1; thus an absorbing state is necessarily recurrent. Let ly(z), Z E !/, denote the indicator function of the set {y} defined by
= y, Z =1= y.
Z
Let N(y) denote the number of times n > 1 that the chain is in state y. Since ly(Xn) = 1 if the chain is in state y at time nand ly(Xn) = 0 otherwise, we see that (31) N(y)
=
n= 1
I:
00
liXn)'
The event {N (y) > I} is the same as the event {Ty < co}. Thus PiN(y)
> 1)
Let m and n be positive integers. By (27), the probability that a Markov chain starting at x first visits y at time m and next visits y n units of time later is PiTy = m)Py(Ty = n). Thus PiN(y)
> 2) =
m= 1 n= 1
I: I:
00
00
PiTy
m)PiTy
n)
= PxyPyy•
m>1.
> m) - PiN(y)
>m
m>1.
1),
it follows from (32) that (33) Also Px(N(y) so that (34) PiN(y)
= =
PiN(y)
m) = PXyp;y-l(1
- Pyy),
0) = 1 - PiN(y)
> 1),
0) = 1 - Pxy'
These formulas are intuitively obvious. To see why (33) should be true, for example, observe that a chain starting at x visits state y exactly m times if and only if it visits y for a first time, returns to y m - I additional times, and then never again returns to y.
19
We use the notation Ei ) to denote expectations of random variables defined in terms of a Markov chain starting at x. For example, (35)
It follows from (31) and (35) that
n=1
CXl
CXl
Ex(1y(X n))
= ~ pn(x, y).
n= 1
Set
G(x, y)
= ExCN(y)) = ~
CXl
pn(x, y).
n= 1
Then G(x, y) denotes the expected number of visits to y for a Markov chain starting at x.
Theorem 1 (i) Let y be a transient state. Then
PiNey)
and (36) G(x, y) =
< (0)
=1
Pxy , 1 - Pyy
XE
9',
9'.
Then PiNey)
(0)
1 and
PiNey)
(0)
= PiTy < = 0,
while
(0)
= Pxy,
XE
9'.
= 0,
then G(x, y)
G(x, y)
00.
This theorem describes the fundamental difference between a transient state and a recurrent state. If y is a transient state, then no matter where the Markov chain starts, it makes only a finite number of visits to y and the expected number of visits to y is finite. Suppose instead that y is a recurrent state. Then if the Markov chain starts at y, it returns to y infinitely often. If the chain starts at some other state x, it may be impossible for it to .ever hit y. If it is possible, however, and the chain does visit y at least once, then it does so infinitely often.
20
Markov Chains
Let y be a transient state. Since 0 < Pyy < 1, it follows from = 00) = lim PJlJN(y) ::::: ) = lim PXyp~y-1 = m
111-+00 111-+00
PiNey) By (33)
o.
G(x, y) = Ex(N(y»
= I:
111=1 00
00
mPiN(y)
m) - Pyy)·
I: mpXyp~y-1(1
111=1
Substituting
~ m~
111=1
~-1
1 , (1 - t)2
Pxy
<
00.
This completes the proof of (i). Now let y be recurrent. Then Pyy = 1 and it follows from (32) that
Px(N(y)
(0)
lim Px(N(y)
111-+00
> m)
= lim Pxy
111-+00
= Pxy-
In particular, PiNey) = (0) = 1. If a nonnegative random variable has positive probability of being infinite, its expectation is infinite. Thus G(y, y) 0, then PiTy implies that P"(x, y) Pxy > 0, then PiNey)
Ey(N(y»
00.
If Pxy
= m) = 0 for all finite positive integers m, so = 0, n > 1; thus G(x, y) = 0 in this case.
= (0) = Pxy
(28) If
G(x, y) = EiN(y» This completes the proof of Theorem 1. Let y be a transient state. Since
I: P"(x,
00 ,,=1
y) = G(x, y)
<
00,
XE
I/,
0,
XE
I/.
t.6.
21
A Markov chain is called a transient chain if all of its states are transient and a recurrent chain if all of its states are recurrent. It is easy to see that a Markov chain having a finite state space must have at least one recurrent state and hence cannot possibly be a transient chain. For if 9' is finite and all states are transient, then by (38)
o=
ye[F
lim pn(x, y)
n->oo
= lim
=
L
=
pn(x, y)
E
n->ooye9'
lim PiXn
n-> 00
9')
=
which is a contradiction.
lim 1
n-> 00
1,
1.6.
Let x and Y be two not necessarily distinct states. We say that x leads Y if Pxy > O. It is left as an exercise for the reader to show that x leads Y if and only if P''(x, y) > 0 for some positive integer n. It is also left the reader to show that if x leads to Y and y leads to z, then x leads to Theorem 2 Let x be a recurrent state and suppose that x leads to y. Then y is recurrent and Pxy = Pyx = 1. Proof Since
< (0)
Pxy > 0,
we see that PiTy = n) > 0 for some positive integer n. Let no be the least such positive integer, i.e., set
(39)
no
n) > 0).
It follows easily from (39) and (28) that pno(x, y) > 0 and (40) pm(x, y) = 0, 1 < m < no.
Since pnO(x, y) > 0, we can find states Yl' ... , Yno-l such that
PiXl
Yno-l' Xno
y)
= Pix,
y) > O.
None of the states Yb ... , Yno-l equals x or Y; for if one of them did equal x or y, it would be possible to go from x to Y with positive probability in fewer than no steps, in contradiction to (40).
22
Markov Chains
We will now show that Pyx = 1. Suppose on the contrary that Pyx < 1. Then a Markov chain starting at Y has positive probability 1 - Pyx of never hitting x. More to the point, a Markov chain starting at x has the positive probability
P(x, Yl) ... P(Yno-l, y)(l - Pyx)
of visiting the states Yl, ... , Yno-l' Y successively in the first no times and never returning to x after time no. But if this happens, the Markov chain never returns to x at any time n > 1, so we have contradicted the assumption that x is a recurrent state. Since Pyx = 1, there is a positive integer n1 such that pnl(y, x) > 0. Now
Pnl+n+no( y, y ) - P Y(X nl +n+no -) Y -
> P,(Xnl
Hence
G(y, y) >
n=nl+1+no
X,
Xn1
+n
X,
Xn1 +n+no
y)
L
00
pn(y,
y)
L
00
pnl +n+no(y, y)
n=l
L
00
pn(x, x) x) =
n=l
pnl(y,
x)PnO(x, y)G(x,
+ 00,
from which it follows that Y is also a recurrent state. Since Y is recurrent and Y leads to x, we see from the part of the theorem that has already been verified that Pxy = 1. This completes the proo( I A nonempty set C of states is said to be closed if no state inside of C leads to any state outside of C, i.e., if
(41) Pxy = 0, pn(x, y) = 0,
XE
C and Y 1= c.
= 0,
XE
C and Y 1= C
zeC
23
and (42) follows by induction. If C is closed, then a Markov chain starting in C will, with probability one, stay in C for all time. If a is an absorbing state, then {a} is closed. A closed set C is called irreducible if x leads to y for all choices of x and y in C. It follows from Theorem 2 that if C is an irreducible closed set, then either every state in C is recurrent or every state in C is transient. The next result is an immediate consequence of Theorems 1 and 2. Corollary 1 Let C be an irreducible closed set of recurrent states. Then Pxy = 1, Px(N(y) = (0) = 1, and G(x, y) = 00 for all choices of x and y in C. An irreducible Markov chain is a chain whose state space is irreducible, that is, a chain in which every state leads back to itself and also to every other state. Such a Markov chain is necessarily either a transient chain or a recurrent chain. Corollary 1 implies, in particular, that an irreducible recurrent Markov chain visits every state infinitely often with probability one. We saw in Section 1.5 that if [/ is finite, it contains at least one recurrent state. The same argument shows that any finite closed set of states contains at least one recurrent state. Now let C be a finite irreducible closed set. We have seen that either every state in C is transient or every state in C is recurrent, and that C has at least one recurrent state. It follows that every state in C is recurrent. We summarize this result: Theorem 3 Let C be a finite irreducible closed set of states. Then every state in C is recurrent. Consider a Markov chain having a finite number of states. Theorem 3 implies that if the chain is irreducible it must be recurrent. If the chain is not irreducible, we can use Theorems 2 and 3 to determine which states are recurrent and which are transient. Example 10. Consider a Markov chain having the transition matrix
0 0
1 2
3 4 5
Determine which states are recurrent and which states are transient.
24
Markov Chains
As a first step in studying this Markov chain, we determine by inspection which states lead to which other states. This can be indicated in matrix form as 0 0
1 1
5 0
2
3 4
+ + + + +
The x, y element of this matrix is + or 0 according as PXY is positive or zero, i.e., according as x does or does not lead to y. Of course, if P(x, y) > 0, then Pxy > O. The converse is certainly not true in general. For example, P(2, 0) = 0; but
P2(2,0)
P(2, I)P(I, 0)
= t· i- = lo >
0,
so that P20 > O. State 0 is an absorbing state, and hence also a recurrent state. We see clearly from the matrix of + 's and O's that {3, 4, 5} is an irreducible closed set. Theorem 3 now implies that 3, 4, and 5 are recurrent states. States 1 and 2 both lead to 0, but neither can be reached from O. We see from Theorem 2 that 1 and 2 must both be transient states. In summary, states 1 and 2 are transient, and states 0, 3, 4, and 5 are recurrent. Let f/ Tdenote the collection of transient states in f/, and let f/ R denote the collection of recurrent states in 9'. In Example 10, 9'T = {I, 2} and 9' R = {O,3, 4, 5}. The set 9'R can be decomposed into the disjoint irreducible closed sets C1 = {O} and C2 = {3, 4, 5}. The next theorem shows that such a decomposition is always possible whenever f/ R is nonempty.
Theorem 4 Suppose that the set 9'R of recurrent states is nonempty. Then 9'R is the union of a finite or countably infinite number of disjoint irreducible closed sets C1, C2, ••••
Choose x E 9'R and let C be the set of all states y in 9'R such that x leads to y. Since x is recurrent, Pxx = I and hence x E C. We will now verify that C is an irreducible closed set. Suppose that y is in C and y leads to z. Since y is recurrent, it follows from Theorem 2 that z is recurrent. Since x leads to y and y leads to z, we conclude that x leads to z. Thus z is in C. This shows that C is closed. Suppose that y and z are both in C. Since x is recurrent and x leads to y, it follows from
Proof
1.6.
25
Theorem 2 that y leads to x. Since y leads to x and x leads to z, we conclude that y leads to z. This shows that C is irreducible. To complete the proof of the theorem, we need only show that if C and D are two irreducible closed subsets of 9'R, they are either disjoint or identical. Suppose they are not disjoint and let x be in both C and D. Choose y in C. Now x leads to y, since x is in C and C is irreducible. Since D is closed, x is in D, and x leads to y, we conclude that y is in D. Thus every state in C is also in D. Similarly every state in D is also in C, so that C and D are identical. I We can use our decomposition of the state space of a Markov chain to understand the behavior of such a system. If the Markov chain starts out in one of the irreducible closed sets C, of recurrent states, it stays in C, forever and, with probability one, visits every state in C, infinitely often. If the Markov chain starts out in the set of transient states 9' T' it either stays in 9'T forever or, at some time, enters one of the sets C, and stays there from that time on, again visiting every state in that C, infinitely often. 1.6.1 Absorption probabilities. Let C be one of the irreducible closed sets of recurrent states, and let pc<x) = PiTc < (0) be the probability that a Markov chain starting at x eventually hits C. Since the chain remains permanently in C once it hits that set, we call Pc(x) the probability that a chain starting at x is absorbed by the set C. Clearly Pc(x) = 1, X E C, and Pc(x) = 0 if x is a recurrent state not in C. It is not so clear how to compute pcCx) for x E 9'n the set of transient states. If there are only a finite number of transient states, and in particular if 9' itself is finite, it is always possible to compute Pc(x), x E 9't» by solving a system of linear equations in which there are as many equations as unknowns, i.e., members of 9'T. To understand why this is the case, observe that if x E 9'T, a chain starting at x can enter C only by entering C at time 1 or by being in 9'T at time 1 and entering C at some future time. The former event has probability Lyec P(x, y) and the latter event has probability Lye.9'T P(x, y)Pc(y). Thus
(44)
Pc(x) =
yeC
1:
P(x, y)
+ 1:
P(x, y)pcCy),
ye.9'T
Equation (44) holds whether 9' T is finite or infinite, but it is far from clear how to solve (44)for the unknowns pcCx), x E 9'r» when fl'T is infinite. An additional difficulty is that if 9'T is infinite, then (44) need not have a unique solution. Fortunately this difficulty does not arise if 9'T is finite.
26
Markov Chains
Suppose the set 9'T of transient states is finite and let C be an irreducible closed set of recurrent states. Then the system of equations (45) f(x)
=
yeC
Theorem 5
P(x, y)
+ L P(x, y)f(y),
ye[/T
pc<x) ,
Proof
P(y, z)
+ L P(y, z)f(z),
ze
[/T
yeC
P(x, y)
ye9'T
LL
P(x, y)P(y, z)
zeC
+L
ye[/T
L
ze[/T
The sum of the first two terms is just PxCTc < 2), and the third term reduces to LZe[/T p2(X, z)f(z), which is the same as Lye[/T p2(X, y)f(y). Thus f(x) = PxCTc < 2) + L P2(X, y)f(y).
ye[/T
By repeating this argument indefinitely or by using induction, we conclude that for all positive integers n (47) f(x)
E
= PxCTc <
n)
+ L pn(x, y)f(y),
ye[/T
XE
9' and
9'T.
According to the assumptions of the theorem, 9'T is a finite set. It therefore follows from (48) that the sum in (47) approaches zero as n -+ 00. Consequently for x E 9'T f(x) as desired.
Example 11.
=
I
Consider the Markov chain discussed in Example 10. and
Find From (44) and the transition matrix in Example 10, we see that PIO and P20 are determined by the equations
1.6.
27
and
Pzo
tpI0
+ jPZO'
Solving these equations we find that PI0 = t and Pzo = t· By similar methods we conclude that P{3,4,5}(1) = t and P{3,4,5}(2) = -t. Alternatively, we can obtain these probabilities by subtracting p{o}(I) and p{0}(2) from 1, since if there are only a finite number of transient states,
(49)
f/ T
=~
i
Px(Tcl
Since there are only a finite number of transient states and each transient state is visited only finitely many times, the probability PX(T[/'R < (0) that a recurrent state will eventually be hit is 1, so (49) holds. Once a Markov chain starting at a transient state x enters an irreducible closed set C of recurrent states, it visits every state in C. Thus
(50)
Pxy
pcCx),
XE
f/ T and Y
C.
It follows from (50) that in our previous example P13 = P14 = P15 = P{3,4,5}(1)
= =
and
PZ3 = PZ4 = PZ5 = P{3,4,5}(2)
t -t.
1.6.2. Martingales. Consider a Markov chain having state space {O, ... , d} and transition function P such that
d
(51)
~
y=O
yP(x, y)
=
=
x,
x = 0, ... , d.
Now
E[Xn+l
I Xo
=
d
Xo,""
Xn-1
=
Xn-l'
Xn = x]
=
=~
d
y=O
yP[Xn+1
yP(x, y)
I Xo
Xo,""
Xn-1
xn-1, X;
x]
=~
y=O
I Xo
Xo,""
Xn-1
= xn-1,
Xn = x] = x,
i.e., that the expected value of Xn+ 1 given the past and present values of Xo, ... , Xn equals the present value of Xn. A sequence of random variables
28
Markov Chains
having this property is called a martingale. Martingales, which need not be Markov chains, playa very important role in modern probability theory. They arose first in connection with gambling. If Xn denotes the capital of a gambler after time n and if all bets are "fair," that is, if they result in zero expected gain to the gambler, then Xn, n > 0, forms a martingale. Gamblers were naturally interested in finding some betting strategy, such as increasing their bets until they win, that would give them a net expected gain after making a series of fair bets. That this has been shown to be mathematically impossible does not seem to have deterred them from their quest. It follows from (51) that
y=o
yP(O, y) = 0,
and hence that P(O, 1) = ... = P(O, d) = 0. Thus is necessarily an absorbing state. It follows similarly that d is an absorbing state. Consider now a Markov chain satisfying (51) and having no absorbing states other than and d. It is left as an exercise for the reader to show that under these conditions the states 1, ... , d - 1 each lead to state 0, and hence each is a transient state. If the Markov chain starts at x, it will eventually enter one of the two absorbing states and d and remain there permanently. It follows from Example 8 that
EiXn)
=L
=
y=o
yPiXn
y)
y=o d-l
y=l
+ +
dpn(x, d) dPiTd
d-l
y= 1
< n).
Since states 1, 2, ... , d - 1 are transient, we see that P''(x, y) ~ n ~ 00 for y = 1, 2, ... , d - 1. Consequently,
n-+ co
° as
=
lim EiXn)
dPi1d
< (0)
= dPxd'
On the other hand, it follows from (51) (see Exercise 13(a)) that EXn EXn-1 = ... = EXo and hence that EiXn) = x. Thus lim EiX n)
n-+ co
X.
1.7.
29
By equating the two values of this limit, we conclude that (53) Since Pxo
Pxd =
d'
= 0, ... , d.
Pxd
d'
0, ... , d.
Of course, once (53) is conjectured, it is easily proved directly from Theorem 5. We need only verify that for x = 1, ... , d - 1, (54)
~ = P(x, d)
d-l
y=l
Clearly (54) follows from (51). The genetics chain introduced in Example 7 satisfies (51) as does a gambler's ruin chain on {a, 1, ... ,d} having transition matrix of the form
1 tOt tot
°
1
Suppose two gamblers make a series of one dollar bets until one of them goes broke, and suppose that each gambler has probability t of winning any given bet. If the first gambler has an initial capital of x dollars and the second gambler has an initial capital of d - x dollars, then the second gambler has probability Pxd = xjd of going broke and the first gambler has probability 1 - (x/d) of going broke.
1.7. Birth and death chains
tOt
For an irreducible Markov chain either every state is recurrent or every state is transient, so that an irreducible Markov chain is either a recurrent chain or a transient chain. An irreducible Markov chain having only finitely many states is necessarily recurrent. It is generally difficult to decide whether an irreducible chain having infinitely many states is recurrent or transient. We are able to do so, however, for the birth and death chain.
30
Markov Chains
Consider a birth and death chain on the nonnegative integers or on the finite set {O,... ,d}. In the former case we set d = 00. The transition function is of the form
y = x-I, y = x, y
1,
00.
where Px + qx + rx = 1 for x E !/, qo = 0, and Pd = 0 if d < assume additionally that Px and qx are positive for 0 < x < d. For a and b in !/ such that a < b, set a < x < b,
We
and set u(a) = 1 and u(b) = O. If the birth and death chain starts at y, then in one step it goes to y - 1, y, or y + 1 with respective probabilities qy, r y' or py. It follows that a < y < b. Since ry (56) Set Yo
(57)
=
u(y
=
+ 1) -
u(y)
1 and
I
= qy Py
1
(u(y)
u(y -
1»,
a < y < b.
~, =
Y
q ... q
PI'"
Py
o<y
< d.
+ 1) -
u(y)
= ___b_ (u(y)
Yy-I
u(y -
1»,
a < y < b,
1) - u(y)
=
=
Ya+ 1 Ya Ya
•••
1) - u(a»
Yy (u(a
1) - u(a».
+ 1) = Yy (u(a)
Ya
u(a
+ 1»,
a < y < b.
Summing (58) on y = a, ... , b - 1 and recalling that u(a) = 1 and u(b) = 0, we conclude that
u(a) u(a
Ya
+ 1) -
1 Ly=a Yy
b-l .
31
1)
=
b~~
Ly=a
t,
x, ... , b -
~b-l
£...y=x
~b-l
'/y
')'y
'
<
<
b.
£...y=a
(59)
P (T < T.) =
x a b
~b-l
'/y
')'y
'
a < x < b.
P (T. < T) =
x
b
~b-l
~x-l £...y=a
'/y
')'y
£...y=a
'
<
<
b.
A gambler playing roulette makes a series of one dollar bets. He has respective probabilities 9/19 and 10/19 of winning and losing each bet. The gambler decides to quit playing as soon as his net winnings reach 25 dollars or his net losses reach 10 dollars. (a) Find the probability that when he quits playing he will have won 25 dollars. (b) Find his expected loss. The problem fits into our scheme if we let Xn denote the capital of the gambler at time n with Xo = 10. Then Xm n > 0, forms a birth and death chain on {O,1, ... , 35} with birth and death rates and Px = 9/19, qx = 10/19,
Example 12.
o o
States 0 and 35 are absorbing states. Formula (60) is applicable with a = 0, x = 10, and b = 35. We conclude that
'/y = (l0/9Y,
o<
y < 34,
and hence that P10(T35 < To) = L~=o (10/9)Y = (10/9)10 - 1 = .047. L;~o (10/9)Y (10/9)35 - 1 Thus the gambler .has probability .047 of winning 25 dollars. His expected loss in dollars is 10 - 35(.047), which equals $8.36.
32
Ms,kov Chsi".
In the remainder of this section we consider a birth and death chain on the nonnegative integers which is irreducible, i.e., such that Px > 0 for x > 0 and qx > 0 for x > 1. We will determine when such a chain is recurrent and when it is transient. As a special case of (59), (61) PI (To
<
Tn)
1-
L,=o
n- 1
Yy
'
n > 1.
Consider now a birth and death chain starting in state 1. Since the birth and death chain can move at most one step to the right at a time (considering the transition from state to state as movement along the real number line), (62) It follows from (62) that {To < Tn}, n > 1, forms a nondecreasing sequence of events. We conclude from Theorem 1 of Ch.apter 1 of Volume 11 that
(63)
n-+oo
> 1}.
Equation (62) implies that Tn ~ n and thus 1'" --4 00 as n --4 00; hence the event {To < 1'" for some n > I} occurs if and only if the event {To < co} occurs. We can therefore rewrite (63) as (64)
n-+oo
1-
:2:;=0
Y,
y=O
Yy =
00.
If the birth and death chain is recurrent, then P1(To < (0) = 1 and (66) follows from (65). To obtain the converse, we observe that P(O, y) = 0 for y > 2, and hence (67) Po(To < (0)
P(O,O)
+ P(O,
1 Paul G. Hoel, Sidney C. Port, and Charles J. Stone,/rttroductkm to Probablli'y Thor" (Boston: Houghton Mifflin Co., 1971), p. 13.
1.8.
33
Suppose (66) holds. Then by (65) P1(To < (0) = 1. From this and (67) we conclude that Po(To < (0) = P(O, 0)
Thus is a recurrent state, and since the chain is assumed to be irreducible, it must be a recurrent chain. In summary, we have shown that an irreducible birth and death chain on {a, 1, 2, ... } is recurrent if and only if
(68)
P(O, 1) = 1.
Q1'"
qx =
•• Px
00.
x= 1 Pl'
Example13. Consider the birth and death chain on {a, 1, 2, ... } defined by
x
Px
= 2(x
+2 + 1)
and
qx = 2(x
1)'
x ~ O.
X
X
,
_
it follows that
"I = q 1 =:«
x=
P1 ••• Px
(x
1·2···x 3 . 4 ... (x
+
x
2)
1)(x
2)
1
=2(
1) 2.
+1 t +t X
--
+2 + ! - t + ... )
1)
We conclude that the chain is transient. 1.8. Branching and queuing chains
In this section we will describe which branching chains are certain of extinction and which are not. We will also describe which queuing chains
34
Markov Chains
are transient and which are recurrent. The proofs of these results are somewhat complicated and will be given in the appendix to this chapter. These proofs can be skipped with no loss of continuity. It is interesting to note that the proofs of the results for the branching chain and the queuing chain are very similar, whereas the results themselves appear quite dissimilar. 1.8.1. Branching chain. Consider the branching chain introduced in Example 6. The extinction probability p of the chain is the probability that the descendants of a given particle eventually become extinct. Clearly
Suppose there are x particles present initially. Since the numbers of offspring of these particles in the various generations are chosen independently of each other, the probability Pxo that the descendants of each of the x particles eventually become extinct is just the xth power of the probability that the descendants of anyone particle eventually become extinct. In other words, (69)
Pxo _
P,
= 1, 2, ....
Recall from Example 6 that a particle gives rise to ~ particles in the next generation, where ~ is a random variable having density f If f(l) = 1, the branching chain is degenerate in that every state is an absorbing state. Thus we suppose that f(l) < 1. Then state 0 is an absorbing state. It is left as an exercise for the reader to show that every state other than 0 is transient. From this it follows that, with probability one, the branching chain is either absorbed at 0 or approaches + 00. We conclude from (69) that
x
n-+oo
= 1, 2, ....
Clearly it is worthwhile to determine p or at least to determine when p = 1 and when p < 1. This can be done using arguments based upon the formula
(70)
tI>(p) = p,
+L
y=l
00
f(y)tY,
O<t<1.
1.8.
35
pel, 0)
+L
y=1
00
00
pel, y)PyO
y=1
P(l, y)pY
f(y)pY
y=1
Let Jl denote the expected number of offspring of any given particle. Suppose Jl < 1. Then the equation <I)(t) = t has no roots in [0, 1) (under our assumption thatf(l) < 1), and hence p = 1. Thus ultimate extinction
is certain ifJl
< 1 andf(l)
Suppose instead that Jl root Po in [0, 1), and hence p equals either Po or 1. Actually p always equals Po. Consequently, if Jl > 1 the probability of ultimate extinction is
less than one.
The proofs of these results will be given in the appendix. The results themselves are intuitively very reasonable. If Jl < 1, then on the average each particle gives rise to fewer than one new particle, so we would expect the population to die out eventually. If Jl > 1, then on the average each particle gives rise to more than one new particle. In this case we would expect that the population has positive probability of growing rapidly, indeed geometrically fast, as time goes on. The case Jl = 1 is borderline; but since p = 1 when Jl < 1, it is plausible by "continuity" that p = 1 also when Jl = 1. Suppose that every man in a certain society has exactly three children, which independently have probability one-half of being a boy and one-half of being a girl. Suppose also that the number of males in the nth generation forms a branching chain. Find the probability that the male line of a given man eventually becomes extinct.
Example 14.
The density f of the number of male children of a given man is the binomial density with parameters n = 3 and p = t· Thus f(O) = i, f(l) = t,f(2) = t,f(3) = i, andf(x) = for x >4. The mean number of male offspring is Jl = t. Since Jl > 1, the extinction probability p is the root of the equation
36
Mllrkov Chllins
t3
or equivalently as
3t 2
5t
0,
(t - 1)(t2 + 4t - 1) = 0.
This equation has three roots, namely, 1, sequently, p = ,J"S - 2.
-,J"S -
2, and
,J"S -
2. Con-
1.8.2. Queuing chain. Consider the queuing chain introduced in Example 5. Let 1, and Jl be as in that example. In this section we will indicate when the queuing chain is recurrent and when it is transient. Let Jl denote the expected number of customers arriving in unit time. Suppose first that Jl > 1. Since at most one person is served at a time and on the average more than one new customer enters the queue at a time, it would appear that as time goes on more and more people will be waiting for service and that the queue length will approach infinity. This is indeed the case, so that if Jl > 1 the queuing chain is transient. In discussing the case Jl < 1, we will assume that the chain is irreducible (see Exercises 37 and 38 for necessary and sufficient conditions for irreducibility and for results when the queuing chain is not irreducible). Suppose first that Jl < 1. Then on the average fewer than one new customer will enter the queue in unit time. Since one customer is served whenever the queue is nonempty, we would expect that, regardless of the initial length of the queue, it will become empty at some future time. This is indeed the case and, in particular, 0 is a recurrent state. The case Jl = 1 is borderline, but again it turns out that 0 is a recurrent state. Thus if Jl < 1 and the
e e2' ...
In this section we will verify the results discussed in Section 1.8. To do so we need the following.
pee
(71)
Theorem 6 Let <I> be the probability generating function of a nonnegative integer-valued random variable and set Jl = (with Jl = + 00 if does not have finite expectation). If Jl < 1 and = 1) < 1, the equation
Ee
<I>(t) = t
has no roots in [0, 1). If Jl > 1, then (71) has a unique root Po in [0, 1).
1.9.
37
Graphs of cl>{t), 0 < t < 1, in three typical cases corresponding to fl < 1, fl = 1, and fl > 1 are shown in Figure 2. The fact that fl is the left-hand derivative ofcl>{t) at t = 1 plays a fundamental role in the proof of Theorem 6.
y y
t ~<1 Figure 2
Po
~>1
Proof
e.
Then
= f{O) + f(1)t
+ f(2)t2 +
3f(3)t2
+ ... + .. '.
fl.
2f(2)t
= f{O),
t-+ 1
Since cl>'{t) is nondecreasing in t, 0 < t < 1, we conclude that cl>'{t) < 1 for 0 < t < 1. Suppose next that fl = 1 and f{l) = = 1) < 1. Then fen) > 0 for some n > 2 (otherwise f{O) > 0, which implies that fl < 1, a contradiction). Therefore cl>'{t) is strictly increasing in t, o < t < 1. Since
pee
lim cl>'(t)
t"'1
= 1,
we again conclude that cl>'{t) < 1 for 0 < t < 1. Suppose now that fl ~ 1 and P{ = 1) < 1. We have shown that cl>'{t) < 1 for 0 < t < 1. Thus
- (cl>{t) - t) < 0, dt
<
< 1,
38
Markov Chains
and hence cI>(t) - t is strictly decreasing on [0, 1]. Since <1>(1) - 1 = 0, we see that <I>(t) - t > 0, < t < 1, and hence that (71) has no roots on [0, 1). This proves the first part of the theorem. Suppose next that Ii > 1. Then lim <I>'(t) > 1,
t r+ 1
so by the continuity of <1>' there is a number to such that < to < 1 and <I>'(t) > 1 for to < t < 1. It follows from the mean value theorem that
+ 3· 2f(3)t
+ ...
shows that <I>(t) = t has a
has no roots in (0, 1). This contradiction unique root in [0, 1).
Branching chain. Using Theorem 6 we see that the results for Ii < 1 follow as indicated in Section 1.8.1. Suppose u > 1. It follows from Theorem 6 that P equals Po or I, where Po is the unique root of the equation <I>(t) = tin [0, 1). We will show that P always equals Po. First we observe that since the initial particles act independently in giving rise to their offspring, the probability PiTo < n) that the descendants of each of the y > 1 particles become extinct by time n is given by Py(To < n) Consequently Pl(To for n >
1.9.1.
by Exercise 9(a)
= P(I, 0) = P(l, 0) = f(O)
< n + 1)
+ +
00
y= 1
L
00 00
y= 1
< n))Y
+L
y=l
f(y)(P1(To
< n))y,
1.9.
39
and hence (72) We will use (72) to prove by induction that (73) Now PI (To < 0) = 0 < Po, so that (73) is true for n = O. Suppose that (73) holds for a given value of n. Since <1>(/) is increasing in I, we conclude from (72) that P1(To n ~ O.
n > O.
and thus (73) holds for the next value of n. By induction all n > O. By letting n -+ 00 in (73) we see that P
Pl(To
Since P is one of the two numbers Po or 1, it must be the number Po. 1.9.2. Queuing chain. We will now verify the results of Section 1.8.2. Let en denote the number of customers arriving during the nth time period. Then l' are independent random variables having common density f, mean 11, and probability generating function <1>. It follows from Exercise 9(b) and the identity P(O, z) = P(l, z), valid for a queuing chain, that Poo = PlO' We will show that the number P = Poo = PlO satisfies the equation
e e2' ...
(74)
<I>(p) = p.
If 0 is a recurrent state, P = 1 and (74) follows immediately from the fact that <1>(1) = 1. To verify (74) in general, we observe first that by Exercise 9(b) Poo = P(O, 0) i.e., that (75) P = f(O)
y= 1
00
L
00
P(O, y)pyo,
+L
y=l
f(y)pyo·
In order to compute Pyo, Y = 1, 2, ... , we consider a queuing chain starting at the positive integer y. For n = 1,2, ... , the event {Ty-1 = n} occurs if and only if
n = min (m = min (m
> 0: > 0:
(el
1)
+ ... +
=m-
(em 1),
1) = Y -
1)
el + ... + em
40
Markov Chains
that is, if and only if n is the smallest positive integer m such that the number of new customers entering the queue by time m is one less than the number served by time m. Thus Py(TY-1 = n) is independent of y, and consequently Py,y-l = Py(TY-1 < 00) is independent of y for y = 1, 2, .... Since PIO = p, we see that
Py,y-l
Py-l,y-2
= ... = PIO = p.
Now the queuing chain can go at most one step to the left at a time, so in order to go from state y > 0 to state 0 it must pass through all the intervening states y - 1, ... , 1. By applying the Markov property we can conclude (see Exercise 39) that (76) P yO - P y,y - 1P y 1 ,y - 2
"'P
10 -
- Py •
= f(O) + ~
00
f(y)py
y=l
~(p),
so that (74) holds. Using (74) and Theorem 6 it is easy to see that if p. < 1 and the queuing chain is irreducible, then the chain is recurrent. For p satisfies (74) and by Theorem 6 this equation has no roots in [0, 1) (observe that P( 1 = 1) < 1 if the queuing chain is irreducible). We conclude that p = 1. Since Poo = p, state 0 is recurrent, and thus since the chain is irreducible, all states are recurrent. Suppose now that p. > 1. Again p satisfies (74) which, by Theorem 6, has a unique root Po in [0, 1). Thus P equals either Po or 1. We will prove that p = Po. To this end we first observ.e that by Exercise 9(a)
P1(To < n
1)
P(l, 0)
+~
00
y=l
1) = f(O)
+~
y= 1
00
f(y)Py(To
< n).
To verify (78) observe that if a queuing chain starting at y reaches 0 in n or fewer steps, it must reach y - 1 in n or fewer steps, go from y - 1 to Y - 2 in n or fewer steps, etc. By applying the Markov property we can conclude (see Exercise 39) that (79) Py(To < n) < Py(TY-1
< n)Py-1(I'y-2
< n)···
Exercises Since 1 < z < y, (78) is valid. It follows from (77) and (78) that
41
P1(To < n
i.e., that (80)
1) < f(O)
+ I:
00
y=t
f(y)(Pt(To
< n))Y,
n > O.
n > 0,
by a proof that is almost identical to the proof that (72) implies (73) (the slight changes needed are left as an exercise for the reader). Just as in the proof of the corresponding result for the branching chain, we see by letting n -+ 00 in (81) that P < Po and hence that p = Po. We have shown that if J.l > 1, then Poo = P < 1, and hence 0 is a transient state. It follows that if J.l > 1 and the chain is irreducible, then all states are transient. If J.l > I and the queuing chain is not irreducible, then case (d) of Exercise 38 holds (why?), and it is left to the reader to show that again all states are transient.
Exercises
1 Let Xm n ~ 0, be the two-state Markov chain.
0 and X2
Find
0),
Suppose we have two boxes and 2d balls, of which d are black and d are red. Initially, d of the balls are placed in box 1, and the remainder of the balls are placed in box 2. At each trial a ball is chosen at random from each of the boxes, and the two balls are put back in the opposite boxes. Let Xo denote the number of black balls initially in box I and, for n ~ 1, let Xn denote the number of black balls in box 1 after the nth trial. Find the transition function of the Markov chain Xn, n > O. more customers waiting to be served at the start of a period, there is probability p that one customer will be served during that period and probability 1 - p that no customers will be served during that period. Find the transition function for this modified queuing chain.
3 Let the queuing chain be modified by supposing that if there are one or
42
Markov Chains
sets mentioned below are all in d. (a) Show that if Di are disjoint and P(C I Di) = p independently of i, then P(C I D = p. (b) Show that if c, are disjoint, then P(Ui c, I D) = Li P(Ci I D). (c) Show that if E, are disjoint and E, = n, then
Ui i)
P(C
Ui
I D)
L P(Ei i
I D)P(C I s, n D).
= PCB I Ci) for all i,
(d) Show that if C, are disjoint and Pt A lei) then peA I UiCi) = PCB I UiCi)'
5 Let Xm n
= =
n). n).
x = 0, ... , d.
P(Xo
8
I Xl
Xo
I Xl
Xl)'
Let X and y be distinct states of a Markov chain having d < 00 states and suppose that X leads to y. Let no be the smallest positive integer such that pnO(x, y) > 0 and let Xl' ... ,xno-l be states such that P(x, Xl)P(Xl, X2)'" P(Xno-2' Xno-l)P(Xno-l,
y) > O.
(a) Show that x, Xl' ... , xno-l' yare distinct states. (b) Use (a) to show that no < d - l. (c) Conclude that PxCTy < d - 1) > O.
9 Use (29) to verify the following identities:
1) = P(x, y)
n),
n> -
o· ,
P(x, y)
+ ~ P(x, z)pzy.
z*y
10 Consider the Ehrenfest chain with d = 3. (a) Find Px(To = n) for X E 9' and 1 < n (b) Find P, p2, and
r>.
< 3.
1to
1tl, 1t2,
Exercises
43
11
Consider the genetics chain from Example 7 with d = 3. (a) Find the transition matrices P and P". (b) If Xo = (0, i, i, 0), find Xl and X2· (c) Find Pi1{0,3} = n), x E 9', for n = 1 and n = 2. Consider the Markov chain having state space {O, 1, 2} and transition matrix
0
12
P=l
2
[I
y)
=
0 0
12 1 0 1
-p
0
~].
XE
(a) Find p2. (b) Show that p4 = p2. (c) Find P", n > 1.
13
Let Xm n > 0, be a Markov chain whose state space 9' is a subset of {O,1, 2, ... } and whose transition function P is such that
L yP(x,
y
Ax
+ B,
9',
for some constants A and B. (a) Show that EXn+1 = AEXn (b) Show that if A 1= 1, then
EXn =
14
B.
I-A
+ An
(EX
0_
I-A
).
Let Xm n > 0, be the Ehrenfest chain on {O,1, ... ,d}. Show that the assumption of Exercise 13 holds and use that exercise to compute EiXn).
n=O
L
00
pn(x, y) <
n=O
L
00
pn(y, y).
16 17 18
Show that n.
Pxy
Show that if x leads to y and y leads to z, then x leads to z. Consider a Markov chain on the nonnegative integers such that, starting from x, the chain goes to state x + 1 with probability p, o < p < 1, and goes to state 0 with probability 1 - p. (a) Show that this chain is irreducible. (b) Find Po(To = n), n > 1. (c) Show that the chain is recurrent.
44
Markov Chains
19
Consider a Markov chain having transition matrix 0123 0 t 0 t .\: 1 0010 2 0001 3 0100 4 0000 5 0000 6 0000
state space {O,1, ... , 6} and 56 t00 000 000 000 t0t tt0 0tt 4
(a) Determine which states are transient and which states are recurrent. (b) Find POY' y = 0, ... , 6. 20 Consider the Markov chain on {O, 1, ... , 5} having transition matrix 0 1 2 3 4 5 12345 tt0000 0 t t 0 0 i. 0 00t080 .\: .\: 0 0 .\: .\: 0 0 i 0 .\: 0 1 1 0 5 0 .1 5 15 5 0
21
(a) Determine which states are transient and which are recurrent. (b) Find p{O,1}(x), x = 0, ... , 5. Consider a Markov chain on {O, 1, ... , d} satisfying (51) and having no absorbing states other than 0 and d. Show that the states 1, ... , d - 1 each lead to 0, and hence that each is a transient state. the genetics chain introduced in Example 7 satisfies Equation (51).
22 Show that
e) (~r(I - ~r-Y.
o
< x < d.
Find p{O}(x), 0 < x < 2d. 24 Consider a gambler's ruin chain on {O, 1, ... ,d}.
Find
respective probabilities 9/19 and 10/19 of winning and losing each bet. The gambler decides to quit playing as soon as he either is one dollar ahead or has lost his initial capital of $1000. (a) Find the probability that when he quits playing he will have lost $1000. (b) Find his expected loss.
Exerci.e.
26 Consider a birth and death chain on the nonnegative
45
Px > 0 and qx > 0 for x > 1. (a) Show that if L;'=o Yy = 00, then Pxo = 1, x > 1. (b) Show that if L;'=o Yy < 00, then P xO - L:'=x Yy £"y=O Yy
~CX) ,
x > 1. -
27 Consider a gambler's ruin chain on {O, 1, 2, ... }. (a) Show that if q > p, then Pxo = 1, x > 1. (b) Show that if q < p, then Pxo = (q/p)x, x > 1.
integers.
28 Consider
an irreducible birth and death chain on the nonnegative Show that if Px < qx for x > 1, the chain is recurrent. birth and death chain on the nonnegative
qx _ (
X
X )2
1'
x > 1.
(a) Show that this chain is transient. (b) Find PxO' x >l. Hint: Use
I:;=
l/y2 = n2/6.
Exercise
26 and
the formula
(a) Compute PxCFa < Tb) for a < x < h. (b) Compute PxO, x > O.
31
Consider a branching chain such that f(l) state other than 0 is transient.
<
1.
32 Consider
the branching chain described in Example 14. If a given man has two boys and one girl, what is the probability that his male line will continue forever? Consider a branching chain probability p of extinction. with f(O)
33
f(3)
1/2.
Find
the
34 Consider
a branching chain with f(x) = p(1 - p)x, x > 0, where o < p < 1. Show that P = 1 if p > 1/2 and that p = p/(l - p) if p < 1/2. Let Xm n > 0, be a branching Hint: See Exercise 13. chain. Show that
36
EiXn)
xp",
Let Xm n > 0, be a branching chain and suppose that the associated random variable ~ has finite variance (J2. (a) Show that E[X;+1 I X; = x] = X(12 + X2Jl2. (b) Use Exercise 35 to show that EiX;
+ 1)
= =
x}t"u1
2 + 11Ex(X ;).
Lx P(X
x)E[Y
IX =
x].
46
Markov Chains
+ ... +
p2(n-l»
x2p2n,
n ~ 1.
(d) Show that if there are x particles initially, then for n > 1
p =I 1,
p=1.
37 Consider the queuing chain.
(a) Show that if either J(O) = 0 or J(O) + J(I) = 1, the chain is not irreducible. (b) Show that if'j'(O) > 0 andJ(O) + J(1) < 1, the chain is irreducible. Hint: First verify that (i) Pxy > 0 for 0 < y < x; and (ii) if xo > 2 and J(xo) > 0, then PO,xo+n(xo-l) > 0 for n > O.
38
Determine which states of the queuing chain are absorbing, which are recurrent, and which are transient, when the chain is not irreducible. Consider the following four cases separately (see Exercise 37): (a) J(I) = 1; (b) J(O) > 0, J(I) > 0, and J(O) + J(I) = 1; (c) J(O) = 1; (d) J(O) = 0 andJ(I) < 1. Consider the queuing chain. (a) Show that for y > 2 and m a positive integer
39
Py(To
m)
m-l
k=l
Py(TY-1
k)Py-1(To
m - k).
y > 2.
(c) Why does Equation (76) follow from (b)? (d) By summing the equation in (a) on m = 1, 2, ... , n, show that
> 2.
Let Xn, n > 0, be a Markov chain having state space [/' and transition function P. If n(x), x E [/', are nonnegative numbers summing to one, and if (1)
I n(x)P(x,
x
y) = n(y),
E [/',
E [/'.
Then, as we will soon see, regardless of the initial distribution of the chain, the distribution of Xn approaches n as n -+ 00. In such cases, n is sometimes called the steady state distribution. In this chapter we will determine which Markov chains have stationary distributions, when there is such a unique distribution, and when (2) holds. 2.1. Let Elementary properties of stationary distributions
tt
be a stationary distribution.
Then z)P(z, y)
z
I n(x)p2(x,
x
y) =
I n(x) L P(x,
X
= ~ (~
=
%
L n(z )P(z,
=
n(y).
r:: lex,
we conclude that for all n (3)
y)
L pn(x, z )P(z,
z
y),
I n(x)pn(x,
x
y) = n(y),
47
E [/'.
48
Stationary Distributions
of a Markov Chain
tt
P(Xn = y) = 'Ir(Y),
E //,
and hence that the distribution of Xn is independent of n. Suppose conversely that the distribution of Xn is independent of n. Then the initial distribution 'lro is such that 'lro(y) = P(X
0
= y) = P(X
= y) =
L 'lro(x)P(x,
x
y).
Consequently 'lro is a stationary distribution. In summary, the distribution of Xn is independent of n if and only if the initial distribution is a stationary distribution. Suppose now that 'Ir is a stationary distribution and that (2) holds. Let 'lro be the initial distribution. Then
(5)
P(Xn = y) =
L 'lro(x)pn(x, x
y),
E //.
By using (2) and the bounded convergence theorem stated in Section 2.5, we can let n -+ 00 in (5), obtaining lim P(Xn = y) =
n->
00
L 'lro(x)'Ir(Y)·
x
Since
(6)
Lx
s:
Formula (6) states that, regardless of the initial distribution, for large values of n the distribution of Xn is approximately equal to the stationary distribution n: It implies that tt is the unique stationary distribution. For if there were some other stationary distribution we could use it for the initial distribution 'lro. From (4) and (6) we would conclude that 'lro(Y) = 'Ir(Y), Y E //. Consider a system described by a Markov chain having transition function P and unique stationary distribution n, Suppose we start observing the system after it has been going on for some time, say no units of time for some large positive integer no. In effect, we observe Yn, n > 0, where n>
o.
The random variables Yn, n > 0, also form a Markov chain with transition function P. In order to determine unique probabilities for events defined in terms of the Yn chain, we need to know its initial distribution, which is the same as the distribution of Xno. In most practical applications it is very
2.2.
Examples
49
hard to determine this distribution exactly. We may have no choice but to assume that Yn, n > 0, has the stationary distribution n for its initial distribution. This is a reasonable assumption if (2) holds and no is large.
2.2. Examples
In this section we will consider some examples in which we can show directly that a unique stationary distribution exists and find simple formulas for it. In Section 1.1 we discussed the two-state Markov chain on [/ = {O, I} having transition matrix
o
We saw that if p determined by
[I q
1
p
1- q .
nCO) =
and
n(1) =
For Markov chains having a finite number of states, stationary distributions can be found by solving a finite system of linear equations. Consider a Markov chain having state space [/ = {O,I, 2} and transition matrix
Example 1.
o! 1
2i
[t
o
12
Show that this chain has a unique stationary distribution n and find n, Formula (I) in this case gives us the three equations
nCO) nCO) nCO)
in-
346 323
+ n(l) + +
n( 1) n(1)
' '
+ +
342
n(2) = n(2).
50
Stationary Distributions
of a Markov Chain
Lx
n(1)
n(2)
1.
By subtracting twice the first equation from the second equation, we eliminate the term involving n(2) and find that n(l) = 5n(0)/3. We conclude from the first equation that n(2) = 3n(0)/2. From the fourth equation we now see that
n(O)(1
+ t)
=
6 2 5'
= 1,
.2.. • _[_ 3 25
--
.£ 5
It is readily seen that these numbers satisfy all four equations. Since they
n(1) =
t,
and
n(2) = f5'
Though it is not easy to see directly, (2) holds for this chain (see Section
i.e., that
r, > 0
and
for for
O<x<d
if d is finite, and that Px > 0 and for O<x<oo if d is infinite. Suppose d is infinite. The system of equations for O<x<oo
L n(x)P(x,
x
y) = n(y),
fI',
2.2.
Examples
51
becomes
n(O)ro n(y -
+ +
n(I)ql
l)qy+
I
= n(O),
I)Py-1
n(y)ry
n(y
n(y),
y > 1.
q In(1) - POn(O) = 0,
1),
y>1.
> 0,
1) = ____!!L_ n( y),
s,«,
Px-I
> O.
Po'"
s.:: s,
nCO),
x>1.
I,
·(9)
x = 0,
x-I
nx =
P 0 "'P ql'"
s,
x>1.
x >0.
that
Conversely, (1) follows from (10). Suppose now that Lx nx < 00 or, equivalently, (11) x= I
Po'" ql'"
Px-I qx
<
00.
We conclude from (10) that the birth and death chain has a unique stationary distribution, given by (12)
x >0.
Suppose instead that (11) fails to hold, i.e., that (13) x=o
l: nx
00
00.
52
Stationary Distributions
of a Markov Chain
We conclude from (10) and (13) that any solution to (1) is either identically zero or has infinite sum, and hence that there is no stationary distribution. In summary, we see that the chain has a stationary distribution if and only if (II) holds, and that the stationary distribution, when it exists, is given by (9) and (12). Suppose now that d < 00. By essentially the same arguments used to obtain (12), we conclude that the unique stationary distribution is given by
(14) 1t(x)
X
d 1t;x
, 1ty
Ly=o
o<x s
d,
Consider the Ehrenfest chain introduced in Section 1.3 and suppose that d = 3. Find the stationary distribution. The transition matrix of the chain is 0
I
2 3
[! !].
ot
0
I0
I2
1to
= I,
t1 = 3,
and
= t,
1t(1) =
i,
1t(2) =
i,
and
1t(3) =
t.
Formula (2) does not hold for the chain in Example 2 since P''(x, x) = 0 for odd values of n. We can modify the Ehrenfest chain slightly and avoid such "periodic" behavior.
Modified Ehrenfestchain. Suppose we have two boxes labeled 1 and 2 and d balls labeled 1, 2, ... ,d. Initially some of the balls Example 3.
are in box 1 and the remainder are in box 2. An integer is selected at random from 1, 2, ... , d, and the ball labeled by that integer is removed from its box. We now select at random one of the two boxes and put the removed ball into this box. The procedure is repeated indefinitely, the
2.2.
EXBmples
S3
selections being made independently. Let Xn denote the number of balls in box 1 after the nth trial. Then Xm n > 0, is a Markov chain on !/ = {O, I, ... ,d}. Find the stationary distribution of the chain for d = 3. The transition matrix of this chaill, for d
o ttoO 1 !t t
2
°
0
3, is
123
0
300tt
t!
To see why P is given as indicated, we will compute P(I, y), 0 < y < 3. We start with one ball in box 1 and two balls in box 2. Thus P(1, 0) is the probability that the ball selected is from box 1 and the box selected is box 2. Thus P(I, 0)
= t· t = i·
box 2 and
Secondly, P(I, 2) is the probability tkat the ball Hlected islrom the box selected is box 1. Thus P(I,2)
= t· t = t·
Clearly P(1, 3) = 0, since at most one ball is transferred at a time. Finally, P(l, 1) can be obtained by subtracting P(l, 0) + P(l, 2) + P(l, 3) from 1. Alternatively, P(l, 1) is the probability that either the selected ball is from box 1 and the selected box is box 1 or the selected ban is from box 2 and the selected box is box 2. Thus
P(l, 1) =
t· t + t· t
t·
The other probabilities are computed similarly. This Markov chain is an irreducible birth and death chain. It is easily seen that nx, 0 < x < 3, are the same as in the previous example and hence that the stationary distribution is again given by n(O) =
1,
n(l)
= i,
n(2) =
i,
and
n(3) =
1.
It follows from the results in Section 2.7 that (2) holds for the chain in Example 3. 2.2.2. Particle. in a box. A Markov chain that arises in several applied contexts can be described as follows. Suppose that ~n particles are added to a box at times n = 1, 2, ... , where ~m n > I, are independent aRd have a Poisson distribution with common parameter it. Suppose that lach. particle ia the box at time If, independently of all the other particles
54
in the box and independently of how particles are added to the box, has probability p < 1 of remaining in the box at time n + 1 and probability q = 1 - p of being removed from the box at time n + 1. Let Xn denote the number of particles in the box at time n. Then Xm n > 0, is a Markov chain. We will find the stationary distribution of this chain. We will also find an explicit formula for P'(», y) and use this formula to show directly that (2) holds. The same Markov chain can be used to describe a telephone exchange, where en is the number of new calls starting at time n, q is the probability that a call in progress at time n terminates by time n + 1, and Xn is the number of calls in progress at time n. We will now analyze this Markov chain. Let R(Xn) denote the number of particles present at time n that remain in the box at time n + 1. Then
Xn+1
Clearly
P(R(Xn)
en+
R(Xn)·
= z I X; =
x)
< z < x,
and
P(~n
z)
= _e_,_,
z.
A_Z
-A,
z > O.
Since
P(Xn+
1
Y I X;
x)
min(x,y)
z=O
min(x,y)
P(R(Xn) P(~n+
1
= z, =
en+
Y - z I X;
x) x),
z=o
Y - z)P(R(Xn)
= z I X; =
we conclude that
It follows from (15) or from the original description of the process that P(x, y) > 0 for all x > 0 and y > 0, and hence that the chain is irreducible. Suppose Xn has a Poisson distribution with parameter t. Then R(Xn)
y)
=L
x=y
00
00
P(Xn
x, R(Xn)
y)
= =
x=y
P(Xn = x)P(R(Xn)
t
Y I X;
x)
x=y
t tXe- (x) x! Y
pY(l _ Py-y
2.2
Examples
55
t tXepY(1 - pt-Y X=yy!(x - y)! = (pt)ye-t (t(1 - p)t-Y y! x=y (x - y)! = (pt)Ye-t (t(1 - p)y
- i:
i: f
z!
= (pt)Ye-
which shows that R(Xn) has the indicated Poisson distribution. We will now show that the stationary distribution is Poisson with parameter t for suitable t. Let Xo have such a distribution. Then Xl = I + R(Xo) is the sum of independent random variables having Poisson distributions with parameters A. and pt respectively. Thus Xl has a Poisson distribution with parameter A. + pt. The distribution of Xl will agree with that of Xo if t = A. + pt, i.e., if
t=----
A
q
1- p
We conclude that the Markov chain has a stationary distribution n which is a Poisson distribution with parameter A/q, i.e., such that
(16)
n(x)
oi« e-)./q
x!
x >0.
Finally we will derive a formula for P''(x, y). Suppose Xo has a Poisson distribution with parameter t. It is left as an exercise for the reader to show that Xn has a Poisson distribution with parameter tpn Thus
). + _ (1
_ pn).
and hence
(17) r pn(x, y) = e-)'(l- P")/qet(l- p") x=o xl
L
00
[tpn
+ ~ (1
q.,:::_
- pn)]Y
_
y!
56
Stationary Distributions
of a Markov Chain
Now if
x=
z=o
L
L
azbx_z·
min(x,y)
x=
z=o
azbx-z·
Using this with (17) and the binomial expansion, we conclude that P"(x, y) =
x! e-A,(l-l'")/q
min(x,y)
y!
z=o
(y)
Z
rr: ,
z=o
(y - z)!
Thus as n ~ 00, the terms in the sum in (18) all approach zero except for the term corresponding to z = O. We conclude that
-A,/q
(A)Y
(19)
lim P'ix, y) = e
n-+ co
y!
n(y),
x, y > O.
Thus (2) holds for this chain, and consequently the distribution n given by (16) is the unique stationary distribution of the chain.
2.3. Average number of visits to a recurrent state
Consider an irreducible birth and death chain with stationary distribution n, Suppose that P(x, x) = = 0, X E 9', as in the Ehrenfest chain and the gambler'S ruin chain. Then at each transition the birth and death chain moves either one step to the right or one step to the left. Thus the chain can return to its starting point only after an even number of transitions. In other words, P"(x, x) = 0 for odd values of n. For such a chain the formula lim P"(x, y) = n(y), Y E Y,
'x
n-+ co
57
There is a way to handle such situations. Let an' n > 0, be a sequence of numbers. If
(20)
n-+oo
lim an = L
lim
n-+oo
!f
n
m= 1
am = L.
Formula (21) can hold, however, even if (20) fails to hold. For example, if an = 0 for n odd and an = 1 for n even, then an has no limit as n ~ 00, but
n-+oo
lim -
m=l
l:
n
am = -. 2
In this section we will show that 1n lim - l: P"'(x, y) n-+oo n m= 1 exists for every pair x, y of states for an arbitrary Markov chain. In Section 2.5 we will use the existence of these limits to determine which Markov chains have stationary distributions and when there is such a unique distribution. Recall that z = y, l,(z) = (~: z =F y, and that
(22)
l:
n
1y(Xm)
y) =
m=l
l:
pm(x, y).
Then Nn(y) denotes the number of visits of the Markov chain to y during times m = 1, ... , n. The expected number of such visits for a chain starting at x is given according to (22) by (23) Let y be a transient state. Then lim N~(y)
n"'oo
N(y)
<
00
58
Stationary Distributions
of a Markov Chain
and
n-r a:
00,
XE
g.
lim Nn(y) = 0
n-+oo
and that
(25) 1· Gn(x, y) = 0 , 1m n-+ 00 n
XE
g.
Observe that Nn(y)Jn is the proportion of the first n units of time that the chain is in state y and that Gn(x, y)Jn is the expected value of this proportion for a chain starting at x. Suppose now that y is a recurrent state. Let my = Ey(Ty) denote the mean return time to y for a chain starting at y if this return time has finite expectation, and set my = 00 otherwise. Let l{Ty<oo} denote the random variable that is 1 if T; < 00 and 0 if T; = 00. We will use the strong law of large numbers to prove the main result of this section, namely, Theorem 1 below.
Strong Law of Large Numbers. Let identically distributed random variables. have finite mean u, then
~h ~2'
If
· 1 rm
n-+oo
~1
+ ... +
n
~n
If these random variables are nonnegative and fail to have finite expectation, then this limit holds, provided that we set J-t = + 00.
lim Nn(y) =
n-+oo
l{Ty<
oo}
my
XE
g.
These formulas are intuitively very reasonable. Once a chain reaches y, it returns to y "on the average every my units of time." Thus if T, < 00 and n is large, the proportion of the first n units of time that the chain is in
59
state y should be about limy. Formula (27) should follow from (26) by taking expectations. From Corollary 1 of Chapter 1 and the above theorem, we immediately obtain the next result. Corollary 1 Then (28) and if P(Xo (29) If my = (25) hold.
00
x, Y
E
C,
C)
n->oo
my
C.
the right sides of (26)-(29) all equal zero, and hence (24) and
Proof In order to verify Theorem I, we need to introduce some additional random variables. Consider a Markov chain starting at a recurrent state y. With probability one it returns to y infinitely many times. For r > 1 let denote the time of the rth visit to y, so that
T;
T;
r).
Set = = Ty and for r > 2 let = T; - T;-I denote the waiting time between the (r - I)th visit to y and the rth visit to y. Clearly T". = WIy + ... + Wyr. y The random variables W~, W;, ... are independent and identically distributed and hence they have common mean EiW~) = EiTy) = my. This result should be intuitively obvious, since every time the chain returns to y it behaves from then on just as would a chain starting out initially at y. One can give a rigorous proof of this result by using (27) of Chapter 1 to show that for r > 1
w; t;
W;
mI,·.·,
Py(W: = mr).
WI y
W2 y
+ ... +
k
Wk y=m
with probability
one,
60
Stationary Distributions
of a Markov Chain
lim
Tk
___l!_
=m
y
Set r = Niy). By time n the chain has made exactly r visits to y. Thus the rth visit to y occurs on or before time n, and the (r + l)th visit to y occurs after time n; that is,
TNn(Y) ~ n
Y
<
TNn(y)+l
y'
and hence
or at least these results hold for n large enough so that N,,(y) ~ 1. Since N,,(y) -+ 00 with probability one as n -+ 00, these inequalities and (30) together imply that
. 1im
" .... 00
n
N,,(y)
my
or, equivalently, that (29) holds. Let y be a recurrent state as before, but let Xo have an arbitrary distribution. Then the chain may never reach y. If it does reach y, however, the above argument is valid; and hence, with probability one, N,,(y)ln -+ 1{Ty < 00 }Imy as n -+ 00. Thus (26) is valid. By definition 0 < N,,(y) < n, and hence
(31)
o < N,,(y)
n
< 1.
A theorem from measure theory, known as the dominated convergence theorem, allows us to conclude from (26) and (31) that lim Ex (Nn(Y)) n = Ex
(I{TY<OO})
my
n .... co
2.4.
A recurrent state y is called null recurrent if my = we see that if y is null recurrent, then (32) · 1· 1im G,,(x, y) =lm n "....
From Theorem 1
:L:!.= 1 pm(x, y) =, 0
n
XE
Y.
" .... 00
00
2.4. Null recurrent and positive recurrent states (It can be shown that if y is null recurrent, then
61
(33)
n-+(l()
lim P'(», y) = 0,
XE
9',
which is a stronger result than (32). We will not prove (33), since it will not be needed later and its proof is rather difficult.) A recurrent state y is called positive recurrent if my < 00. It follows from Theorem I that if y is positive recurrent, then lim Gn(y, y)
n-+(l()
_1 > 0.
Thus (32) and (33) fail to hold for positive recurrent states. Consider a Markov chain starting out in a recurrent state y. It follows from Theorem 1 that if y is null recurrent, then, with probability one, the proportion of time the chain is in state y during the first n units of time approaches zero as n -+ 00. On the other hand, if y is a positive recurrent state, then, with probability one, the proportion of time the chain is in state y during the first n units of time approaches the positive limit limy as n -+ 00. The next result is closely related to Theorem 2 of Chapter 1.
Theorem 2 Let x be a positive recurrent state and suppose that x leads to y. Then y is positlie recurrent.
It follows from Theorem 2 of Chapter 1 that y leads to x. Thus there exist positive integers n, and n2 such that
Proof.
and Now pnl +m+n2(y, y) > pnl(y, x)pm(x, x)pn2(x, y), and by summing on m
Gn1+n+n/y,
=
x)pn2(x, y) Gn(x, x) .
As n -+ 00, the left side of this inequality converges to limy and the right side converges to
Hence
_!_ >
my and consequently my <
00.
pnl(y,
x)pn2(x, y) > 0, m;
62
Stationary Distributions
of a Markov Chain
From this theorem and from Theorem 2 of Chapter 1 we see that if C is an irreducible closed set, then every state in C is transient, every state in C is null recurrent, or every state in C is positive recurrent. A Markov chain is called a null recurrent chain if all its states are null recurrent and a positive recurrent chain if all its states are positive recurrent. We see therefore that an irreducible Markov chain is a transient chain, a null recurrent chain, or a positive recurrent chain. If C is a finite closed set of states, then C has at least one positive recurrent state. For
~
yeC
pm(x, y) = 1,
XE
C,
Gn(x, y) = 1,
n
XE
C.
If C is finite and each state in C is transient or null recurrent, then (25) holds and hence
1 = lim ~
n-+oo yeC
Gn(x, y)
n
=
=~
yeC
lim GnCx, y)
n-+oo
0,
The proof of this theorem is now almost immediate. Since C is a finite closed set, there is at least one positive recurrent state in C. Since C is irreducible, every state in C is positive recurrent by Theorem 2. I
Proof Corollary 2 An irreducible Markov chain having afinite number of states is positive recurrent. Corollary 3 A Markov chain having a finite number of states has no null recurrent states.
Corollary 2 follows immediately from Theorem 3. To verify Corollary 3, observe that if y is a recurrent state, then, by Theorem 4 of Chapter 1, y is contained in an irreducible closed set C of recurrent states. Since C is necessarily finite, it follows from Theorem 3 that all states in C, including y itself, are positive recurrent. Thus every recurrent state is positive recurrent, and hence there are no null recurrent states. I
Proof.
2.5.
63
Example 4. Consider the Markov chain described in Example 10 of Chapter 1. We have seen that 1 and 2 are transient states and that 0, 3,4, and 5 are recurrent states. We now see that these recurrent states are necessarily positive recurrent. 2.5. Existence and uniqueness of stationary distributions
In this section we will determine which Markov chains have stationary distributions and when there is a unique such distribution. In our discussion we will need to interchange summations and limits on several occasions. This is justified by the following standard elementary result in analysis, which we state without proof.
Let a(x), x E 9', be nonnegative numbers having finite sum, and let bn(x), x E 9' and n > 1, be such that Ibn(x)I < 1, X E 9' and n > 1, and
Bounded Convergence Theorem. lim bn(x)
n-+ <Xl
= b(x),
=
XE
9'.
Then
lim
n-+<Xl
L a(x)bn(x)
x
L a(x)b(x).
x
Then
L n(z)pm(z,
x)
n(x).
L n(z)
z
Gn(z, x) = n(x),
n
XE
9'.
Theorem 4 Let n be a stationary distribution. state or a null recurrent state, then n(x) = O. Proof
(35)
If x is a transient
XE
9',
64
Stationary Distributions
of a Markov Chain
It follows from this theorem that a Markov chain with no positive recurrent states does not have a stationary distribution.
Theorem 5 An irreducible positive recurrent Markov chain has a unique stationary distribution n, given by
(36) n(x)
= -,
mx
X E [/.
Proof
that (37)
It follows from Theorem 1 and the assumptions of this theorem lim Giz, x) = _1_ ,
n-+
00
mx
x,
Z E [/.
lim
n-+oo
L n{z) Gn{z, x)
z
=-
1 m;
L n{z)
z
=-
1 . mx
Thus if there is a stationary distribution, it must be given by (36). To complete the proof of the theorem we need to show that the function n{x), x E [/, defined by (36) is indeed a stationary distribution. It is clearly nonnegative, so we need only show that (38) and (39) 1 L-= x mx 1
L -1
x
m;
P{x, y) = -,
1 my
E [/.
L plft{z,
x
x) = 1.
L Giz,
x
x)
1,
ZE
I/.
L plft{z,
x
65
L GnCz,x)
x
If f/ is finite, we conc1ude from (37) and (40) that 1 - 1· ~ Gn(z, x) _ ~ - 1 1m i.J - i.J n~oo x n x mx
,
i.e., that (38) holds. Similarly, we conclude that (39) holds by letting n -+ 00 in (41). This completes the proof of the theorem if f/ is finite. The argument to complete the proof for f/ infinite is more complicated, since we cannot directly interchange limits and sums as we did for f/ finite (the bounded convergence theorem is not applicable). Let f/ 1 be a finite subset of f/. We see from (40) that
L
xef/l
GnCz, x) < 1, n
-+ 00
ZE
f/.
The last inequality holds for any finite subset f/ 1 of f/, and hence (42) For if the sum of l/mx over x E f/ exceeded 1, the sum over some finite subset of 9' would also exceed 1. Similarly, we conclude from (41) that if f/ 1 is a finite subset of 9', then ~ By letting n
i.J
xef/l
GnCz, x) P( ) < Gn+ I(Z, y) _ P(z, y) x, Y . n n n in this inequality and using (37), we obtain
-+ 00
xeVl
L -1
m;
P(x, y)
<-
my
<- ,
my
f/.
66
Stationary Distributions
of a Markov Chain
Next we will show that equality holds in (43). It follows from (42) that the sum on y of the right side of (43) is finite. If strict inequality held for some y, it would follow by summing (43) on y that
L _!_
y my
>
= =
L
y
(L _!_
x
P(x,
m;
y»)
1 L-,
x mx
c=--.
L_!_
x mx
Then by (39)
n(x)
= -,
mx
XE
9',
defines a stationary distribution. Thus by the first part of the proof of this theorem c 1
-=-
and hence c = 1. This proves that (38) holds and completes the proof of the theorem. I From Theorems 4 and 5 we immediately obtain
Corollary 4 An irreducible Markov chain is positive recurrent and only if it has a stationary distribution. Example 5.
if
Consider an irreducible birth and death chain on the nonnegative integers. Find necessary and sufficient conditions for the chain to be (a) positive recurrent, (b) null recurrent, (c) transient. From Section 2.2.1 we see that the chain has a stationary distribution if and only if
(44) x= 1
~ v«: 1...l
Ql'"
Px-l
<
qx
00.
67
Thus (44) is necessary and sufficient for the chain to be positive recurrent. We saw in Section 1.7 that (45) is a necessary and sufficient condition for the chain to be transient. For the chain to be null recurrent, it is necessary and sufficient that (44) and (45) both fail to hold. Thus
(46)
and
Px - 1 = x= 1 q 1 ••• qx
I: Po'"
00
00
are necessary and sufficient conditions for the chain to be null recurrent. As an immediate consequence of Corollary 2 and Theorem 5 we obtain
Corollary 5 If a Markov chain having afinite number of states is irreducible, it has a unique stationary distribution.
Recall that Nn(x) denotes the number of visits to x during times m 1, ... ,n. By combining Corollary 1 and Theorem 5 we get
Corollary 6 Let Xn, n > 0, be an irreducible positive recurrent Markov chain having stationary distribution tt. Then with probability one (47)
lim Nix)
n-+
00
= re(x) ,
XE
Y.
11
2.5.1.
Let re be a distribution on f/, i.e., let re(x), x E Y, be nonnegative numbers adding to one, and let C be a subset of Y. We say that n is concentrated on C if
re(x) = 0,
Reducible chains.
x¢ C.
By essentially the same argument used to prove Theorem 5 we can obtain a somewhat more general result.
Theorem 6 Let C be an irreducible closed set of positive recurrent states. Then the Markov chain has a unique stationary distribution t: concentrated on C. It is given by
(48)
re(x)
{~x'
0,
XE
C,
elsewhere.
68
Stationary Distributions
of a Markov Chain
Suppose Co and Clare two distinct irreducible closed sets of positive recurrent states of a Markov chain. It follows from Theorem 6 that the Markov chain has a stationary distribution 1Co concentrated on Co and a different stationary distribution 1C1 concentrated on Ct. Moreover, the distributions 1C« defined for 0 < rt < 1 by
XE
f/,
are distinct stationary distributions (see Exercise 5). By combining Theorems 4-6 and their consequences, we obtain CoroUary 7 Let ~ Markov chain.
(i)
(ii) If f/p is a nonempty irreducible set, the chain has a unique stationary distribution. (iii) If ~ is nonempty but not irreducible, the chain has an infinite number of distinct stationary distributions. Consider now a Markov chain having a finite number of states. Then every recurrent state is positive recurrent and there is at least one such state. There are two possibilities: either the set f/R of recurrent states is irreducible and there is a unique stationary distribution, or f/R can be decomposed into two or more irreducible closed sets and there is an infinite number of distinct stationary distributions. The latter possibility holds for a Markov chain on f/ = {O,1, ... , d} in which d > 0 and 0 and dare both absorbing states. The gambler's ruin chain on {O, 1, ... , d} and the genetics model in Example 7 of Chapter 1 are of this type. For such a chain any distribution 1C«, 0 < rt < 1, of the form
I1C«(X) rt '
{ 0,
rt,
x = 0, x = d,
elsewhere,
is a stationary distribution. Consider the Markov chain introduced in Example 10 of Chapter 1. Find the stationary distribution concentrated on each of the irreducible closed sets. We saw in Section 1.6 that the set of recurrent states for this chain is decomposed into the absorbing state 0 and the irreducible closed set {3, 4, 5}. Clearly the unique stationary distribution concentrated on {O} is given by 7to = (1, 0, 0, 0, 0, 0). To find the unique stationary distriExample 6.
2.6.
Queuing chain
69
but ion concentrated on {3, 4, 5}, we must find nonnegative numbers n(3), n(4), and n(5) summing to one and satisfying the three equations
n(3)
n( 4)
n(5)
= n(3)
624
n(3) = n(4)
3
n(3)
n(4)
3n(5)
n(5).
+ t)
= 1,
= i,
n(4)
= lz'
and
n(5) =
t.
Consequently
7tl
(0, 0, 0,
i, lz' t)
on {3, 4, 5}.
concentrated
2.6. Queuingchain
Consider the queuing chain introduced in Example 5 of Chapter 1. Recall that the number of customers arriving in unit time has density f and mean u. Suppose that the chain is irreducible, which means that f(O) > 0 andf(O) + f(l) < 1 (see Exercise 37 of Chapter 1). In Chapter 1 we saw that the chain is recurrent if J1 < 1 and transient if J1 > 1. In Section 2.6.1 we will show that in the recurrent case (49)
mo=--
1 - Jl
00
and hence 0 is a positive recurrent state. Thus by irreducibility the chain is positive recurrent. On the other hand, if J1 = 1, then mo = 00 and hence 0 is a null recurrent state. We conclude that the queuing chain is null recurrent in this case. Therefore an irreducible queuing chain is positive recurrent if J1 < 1 and null recurrent if Jl = 1, and transient if Jl > 1.
70
Stationary Distributions
of a Markov Chain
*2.6.1. Proof. We will now verify (49). We suppose throughout the proof of this result that J(O) > 0, J(O) + J(I) < 1 and J.l < 1, so that the chain is irreducible and recurrent. Consider such a chain starting at the positive integer x. Then Tx-1 denotes the time to go from state x to state x-I, and Ty-1 - Ty, 1 < y < x-I, denotes the time to go from state y to state y - 1. Since the queuing chain goes at most one step to the left at a time, the Markov property insures that the random variables
are independent. These random variables are identically distributed; for each of them is distributed as min (n > 0:
~1
+ ... +
~n
=n-
1),
i.e., as the smallest positive integer n such that the number of customers served by time n is one more than the number of new customers arriving by time n. Let G(t), 0 < t < 1, denote the probability generation function of the time to go from state 1 to state O. Then (50) G(t) = n=l
00
tnPl(To = n).
The probability generating function of the sum of independent nonnegative integer-valued random variables is the product of their respective probability generating functions. If the chain starts at x, then
is the sum of x independent random variables each having probability generating function G(t). Thus the probability generating function of To is (G(t)Y; that is,
(51)
(G(t)Y
n=l
00
tnpxCTo
n).
o<
< 1,
where <I> denotes the probability generating function of J. To verify (52) we rewrite (50) as G(t) = n=O
00
tn+1P1(To
+ 1) = tPC1,0) + t L tnP1(To =
n= 1
00
+ 1).
2.6.
Queuing chain
71
By using successively (29) of Chapter 1, (51) of this chapter, and the formula P(l, y) = fey), y > 0, we find that
G(t)
=
=
tP(I, tP(I,O)
0)
+ +
t t
t
n= 1
LrL L L
00
y*O
P(I,
Y)Py(To tnpy(To
=
=
n) n)
y*O y*O
pel, y)
L
n= 1
00
= tP(I,
0)
pel, y)(G(t»'
t [f(O) t<l>(G(t».
+L
y*O
f(Y)(G(t»Y]
For 0 < t < I we can differentiate both sides of (52) and obtain
G'(t)
=
<I>(G(t»
tG'(t)<I>'(G(t».
G'(t)
<I>(G(t» I - t<l>'(G(t»'
-+
O<t<1.
I and <I>(t)
1 as t
-+
I and
t--+ 1 x=
L
00
xf(x)tX-1
1
x=1
00
xf(x)
u.
By letting
(54)
t -+
1-
1
Jj
By definition
00
G(t) = ~ P1(To
n=1
n)tn.
But since P(I, x) = P(O, x), x > 0, it follows from (29) of Chapter 1 that the distribution of To for a queuing chain starting in state I is the same as that for a chain starting in state O. Consequently,
G(t)
n=1
L
00
Po(To
n)tn,
Stationary Distributions
of a Markov Chain
00
lim ~
t-+l
00
nPo(To
n= 1
n)tn-1
= ~ nPo(To = n)
n= 1
= Eo(To)
= mo' I
2.7.
We have seen earlier in this chapter that if X", n ~ 0, is an irreducible positive recurrent Markov chain having n as its stationary distribution, then lim
n-+oo
m= 1
x, Y
f/.
In this section we will see when the stronger result lim P"(x, y)
n-+oo
n(y),
x, Y
f/,
holds and what happens when it fails to hold. The positive integer d is said to be a divisor of the positive integer n if nld is an integer. If I is a nonempty set of positive integers, the greatest common divisor of I, denoted by g.c.d. I, is defined to be the largest integer d such that d is a divisor of every integer in l. It follows immediate1y that
I).
In particular, if 1 E I, then g.c.d. I = 1. The greatest common divisor of the set of even positive integers is 2. Let x be a state of a Markov chain such that P'(», x) > 0 for some n > 1, i.e., such that Pxx = PiTx < (0) > O. We define its period dx by
If P(x, x) > 0, then d; = 1. If x and yare two states, each of which leads to the other, then dx = dy• For let n, and n2 be positive integers such that
and
2.7.
73
r: +n2(X, x)
pnl+n+n2(x, x) > pnl(x, y)pn(y, y)pn2(y, x) > 0, so that dx is a divisor of n, + n + n2' Since d; is a divisor of nl + n2, it must be a divisor of n. Thus dx is a divisor of all numbers in the set {n ~ 1: pn(y, y) > O}. Since d, is the largest such divisor, we conclude that dx < dy- Similarly dy < dx, and hence d; = dy• We have shown, in other words, that the states in an irreducible Markov chain have common period d. We say that the chain is periodic with period d if d > 1 and aperiodic if d = 1. A simple sufficient condition for an irreducible Markov chain to be aperiodic is that P(x, x) > 0 for some x E 9'. Since P(O, 0) = f(O) > 0 for an irreducible queuing chain, such a chain is necessarily aperiodic. Example 7. chain. Determine the period of an irreducible birth and death
If some r x > 0, then P(x, x) = r x > 0, and the birth and death chain is aperiodic. In particular, the modified Ehrenfest chain in Example 3 is aperiodic. Suppose r x = 0 for all x. Then in one transition the state of the chain changes either from an odd numbered state to an even numbered state or from an even numbered state to an odd numbered state. In particular, a chain can return to its initial state only after an even number of transitions. Thus the period of the chain is 2 or a multiple of 2. Since p2(0, 0)
= Poq 1 >
0,
we conclude that the chain is periodic with period 2. In particular, the Ehrenfest chain introduced in Example 2 of Chapter 1 is periodic with period 2. Theorem 7 Let Xm n > 0, be an irreducible positive recurrent Markov chain having stationary distribution n, If the chain is aperiodic, (55) lim P'(», y) = n(y),
n->oo
x, Y
9'.
If the chain
is periodic with period d, then for each pair x, y of states in f/ there is an integer r, 0 < r < d, such that P''(x, y) = 0 unless n = md + r for some nonnegative integer m, and
(56)
74
Stationary Distributions
of a Markov Chain
For an illustration of the second half of this theorem, consider an irreducible positive recurrent birth and death chain which is periodic with period 2. If y - x is even, then p2m+ l(X, y) = 0 for all m > 0 and lim p2m(x, y) = 2n(y).
m->oo
We will prove this theorem in an appendix to this chapter, which can be omitted with no loss of continuity. Determine the asymptotic behavior of the matrix P" for the transition matrix P (a) from Example 3, (b) from Example 2. (a) The transition matrix P from Example 3 corresponds to an aperiodic irreducible Markov chain on {O, 1, 2, 3} having the stationary distribution given by
n(O)
Example 8.
= 1,
n(l)
= i,
"8
1
n(2)
= i,
and
n(3)
= 1.
P" .
(b) The transition matrix P from Example 2 corresponds to a periodic irreducible Markov chain on {O, 1, 2, 3} having period 2 and the same stationary distribution as the chain in Example 3. From the discussion following the statement of Theorem 7, we conclude that for n large and even
P" ...:.
i0 40i0 0i0t
0
1
i0
2.8.
Proof of convergence
75 APPENDIX
2.8.
Proof of convergence
We will first prove Theorem 7 in the aperiodic case. Consider an aperiodic, irreducible, positive recurrent Markov chain having transition function P, state space f/, and stationary distribution 'It. We will now verify that the conclusion of Theorem 7 holds for such a chain. Choose a E f/ and let [be the set of positive integers defined by
[=
Then (i) g.c.d. [= 1; (ii) if m E [ and n
E
I, then m
+ n E I.
Property (ii) follows from the inequality pm+n(a, a) > pm(a, a)pn(a, a). Properties (i) and (ii) imply that there is a positive integer n, such that n E [for all n > ni' For completeness we will prove this number theoretic result in Section 2.8.2. Using this result we conclude that P'(a, a) > 0 for n ~ tu, Let x and y be any pair of states in f/. Since the chain is irreducible, there exist positive integers n2 and n3 such that pn2(X, a) > 0 Then for n > ni pn2+n+n3(x, y) > pn2(x, a)pn(a, a)pn3(a, y) > O. We have shown, in other words, that for every pair x, y of states in f/ there is a positive integer no such that
(57)
and
pn(x, y) > 0,
Set
f/2
{(x, y) : x
E f/
and y
9'}.
Then f/2 is the set of ordered pairs of elements in f/. We will consider a Markov chain (Xm Yn) having state space f/2 and transition function P 2 defined by
It follows that Xm n > 0, and Yn, n > 0, are each Markov chains having transition function P, and the successive transitions of the Xn chain and the Yn chain are chosen independently of each other.
76
Stationary Distributions
of a Markov Chain
We will now develop properties of the Markov chain (Xn' Yn). In particular, we will show that this chain is an aperiodic, irreducible, positive recurrent Markov chain. We will then use this chain to verify the conclusion of the theorem. Choose (xo, Yo) E [/2 and (x, y) E y2. By (57) there is an no > 0 such that and Then (58) We conclude from (58) that the chain is both irreducible and aperiodic. The distribution n2 on y2 defined by n2(xO' Yo) = n(xo)n(yo) is a stationary distribution. For
(XO,Yo)
E9'2
Xo EV
LL
n(xo)n(yo)P(xo,
x)P(Yo, y)
yo E9'
= n(x)n(y)
= n2(x, y).
[/2
= min
(n
> 0 : Xn = Yn).
(a, a»
is finite with probability one. Clearly T < 1(a,a), and hence T is finite with probability one. For any n > I (regardless of the distribution of (Xo' Yo»
(59)
Y E Y.
This formula is intuitively reasonable since the two chains are indistinguishable for n > T. To make this argument precise, we choose I < m < n. Then for Z E Y (60) P(Xn = y I T = m, Xm = Ym = z)
= P(Yn =
IT =
m, X m
Ym
z),
since both conditional probabilities equal pn-m(z, y). {TL. n} is the union of the disjoint events
{T
m, X m
Ym
z},
Y,
77
IT <
n)
= P(Yn =
IT <
n)
and hence that (59) holds. Equation (59) implies that P(Xn = y) = P(Xn = y, T < n)
= P(Yn
< P(Yn
and similarly that
= =
y, T < n)
y)
+ Pt'I' > +
IP(Xn = y) - P(Yn = y)1 < Pt'I' > n), lim peT > n) = 0.
n-+
0()
9'.
9'.
Using (63), we can easily complete the proof of Theorem 7. Choose x E f/ and let the initial distribution of (Xn' Yn) be such that P(Xo = x) = 1 and
Yo
E
9'.
Since Xn, n > 0, and Ym n > 0, are each Markov chains with transition function P, we see that
(64)
P(Xn
y)
= pn(x, =
y),
f/,
P(Yn
y)
n(y),
Y E 9'.
n-+
0()
and hence the conclusion of Theorem 7 holds. We first consider a slight extension of Theorem 7 in the aperiodic case. Let C be an irreducible closed set of positive recurrent states such that each state in C has period 1, and let n
2.S.1. Periodic case.
78
Stationary Distributions
of a Markov Chain
be the unique stationary distribution concentrated on C. By looking at the Markov chain restricted to C, we conclude that lim PIl(X, y)
11->00
= n(y) = _!_ ,
my
x, Y
C.
In particular, if y is any positive recurrent state having period 1, then by letting C be the irreducible closed set containing y, we see that
(66) lim P'(», y) =
11->00
_!_ .
my
We now proceed with the proof of Theorem 7 in the periodic case. Let Xm n > 0, be an irreducible positive recurrent Markov chain which is periodic with period d > 1. Set Ym = Xmd, m > O. Then Ym, m > 0, is a Markov chain having transition function Q = pd. Choose y E Y. Then g.c.d. {m I Qm(y, y) > O} = g.c.d. {m I pmd(y, y) > O}
1 = - g.c.d. {n
I p II(y,
y) > O}
=1. Thus all states have period 1 with respect to the Ym chain. Let the XII chain and hence also the Ym chain start at y. Since the X" chain first returns to y at some multiple of d, it follows that the expected return time to y for the Ym chain is d-1my, where my is the expected return time to y for the Xn chain. In particular, y is a positive recurrent state for a Markov chain having transition function Q. By applying (66) to this transition function we conclude that lim Qm(y, y)
m->oo
= .!!_ = dn(y),
my
lim pmd(y, y)
m->oo
dn(y),
Y.
> 0).
Then, in particular, P'ttx, y) > O. We will show that P'(x, y) > 0 only if n - '1 is an integral multiple of d. Choose nl such that pnl(y, x) > O. Then
r: +nl( y, y) > PIII( y, x)prl(x, y) > 0,
2.8.
Proof of convergence
79
and hence '1 + n1 is an integral multiple of d. If P''(»; y) > 0, then by the same argument n + n1 is an integral multiple of d, and therefore so is n - '1' Thus, n = kd + '1 for some nonnegative integer k. There is a nonnegative integer m1 such that '1 = mId + r where o < r < d. We conclude that (68)
pn(x, y)
unless
n = md
+r
for some nonnegative integer m. It follows from (68) and from (28) of Chapter 1 that
(69)
pmd+r(x, y) =
L
k=O
P xCTy = kd
+ r)p(m-k)d(y, y).
Set p(m-k)d(y y) am(k) = ( 0 " , Then by (67) for each fixed k lim am(k) = dn( y).
m-+
00
We can apply the bounded convergence theorem (with f/ replaced by {O,1, 2, ... }) to conclude from (69) that
lim pmd+r(x, y)
m-+oo
= dn(y)
k=O
L
00
Px(Ty
kd
+ r)
= dn{Y)Px{Ty = dn(y),
< (0)
and hence that (56) holds. This completes the proof of Theorem 7.
2.8.2.
of positive integers such that (i) g.c.d. 1= 1; (ii) if m and n are in 1, then m + n is in I. Then there is an no such that n E I for all n > no. We will first prove that 1contains two consecutive integers. Suppose otherwise. Then there is an integer k > 2 and an n1 E I such that n, + k E I and any two distinct integers in I differ by at least k. It follows from property (i) that there is an n E I such that k is not a divisor of n. We can write
n = mk
r,
80
Stationary Distributions
of a Markov Chain
where m is a nonnegative integer and 0 < r < k. It follows from property (ii) that (m + 1)(nl + k) and n + (m + l)nI are each in I. Their difference is
(m + 1)(n₁ + k) − n − (m + 1)n₁ = (m + 1)k − mk − r = k − r,
which is positive and smaller than k. This contradicts the definition of k. We have shown that I contains two consecutive integers, say n₁ and n₁ + 1. Let n ≥ n₁². Then there are nonnegative integers m and r such that 0 ≤ r < n₁ and

n − n₁² = mn₁ + r.

Thus

n = r(n₁ + 1) + (n₁ + m − r)n₁,

which exhibits n as a sum of integers in I. By property (ii), then, n ∈ I for all n ≥ n₁².
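The lemma lends itself to a quick computational check. The sketch below is our illustration, with an arbitrary generating set of g.c.d. 1; it closes the set under addition and locates the threshold n₀.

```python
from math import gcd
from functools import reduce

def additive_closure(gens, limit):
    """All sums of the generators (with repetition) that do not exceed limit."""
    reachable = {0}
    for n in range(1, limit + 1):
        if any(n >= g and (n - g) in reachable for g in gens):
            reachable.add(n)
    reachable.discard(0)
    return reachable

gens = [6, 10, 15]                       # g.c.d. is 1, as the lemma requires
assert reduce(gcd, gens) == 1
I = additive_closure(gens, 200)
n0 = max(n for n in range(1, 201) if n not in I) + 1
print("n is in I for every n >=", n0)    # prints 30 for the set {6, 10, 15}
```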
Exercises
1  Consider a Markov chain having state space {0, 1, 2} and transition matrix

        0    1    2
   0 [ .4   .4   .2 ]
   1 [ .3   .4   .3 ]
   2 [ .2   .4   .4 ]

Show that this chain has a unique stationary distribution π and find π.

2  Consider a Markov chain having transition function P such that P(x, y) = α_y, x ∈ 𝒮 and y ∈ 𝒮, where the α_y's are constants. Show that the chain has a unique stationary distribution π, given by π(y) = α_y, y ∈ 𝒮.

3  Let π be a stationary distribution of a Markov chain. Show that if π(x) > 0 and x leads to y, then π(y) > 0.

4  Let π be a stationary distribution of a Markov chain. Suppose that y and z are two states such that for some constant c

P(x, y) = cP(x, z),   x ∈ 𝒮.

Show that π(y) = cπ(z).

5  Let π₀ and π₁ be distinct stationary distributions for a Markov chain.
(a) Show that for 0 < α < 1, the function π_α defined by

π_α(x) = (1 − α)π₀(x) + απ₁(x),   x ∈ 𝒮,

is a stationary distribution.
(b) Show that distinct values of α determine distinct stationary distributions π_α. Hint: Choose x₀ ∈ 𝒮 such that π₀(x₀) ≠ π₁(x₀) and show that π_α(x₀) = π_β(x₀) implies that α = β.
6  Consider a birth and death chain on the nonnegative integers and suppose that p₀ = 1, p_x = p > 0 for x ≥ 1, and q_x = q = 1 − p > 0 for x ≥ 1. Find the stationary distribution when it exists.

7  (a) Find the stationary distribution of the Ehrenfest chain.
(b) Find the mean and variance of this distribution.

8  Find the transition function of the modified Ehrenfest chain introduced in Example 3, and show that this chain has the same stationary distribution as does the original Ehrenfest chain.

9  Find the stationary distribution of the birth and death chain described in Exercise 2 of Chapter 1. Hint: Use the formula
10  Let X_n, n ≥ 0, be a positive recurrent irreducible birth and death chain, and suppose that X₀ has the stationary distribution π. Show that

P(X₀ = y | X₁ = x) = P(x, y),   x, y ∈ 𝒮.

Hint: Use the definition of π_x given by (9).

11  Let X_n, n ≥ 0, be the Markov chain introduced in Section 2.2.2. Show that if X₀ has a Poisson distribution with parameter t, then X_n has a Poisson distribution with parameter

tp^n + (λ/q)(1 − p^n).
12  … power series.

13  … distribution. Use the result of Exercise 12 to find cov(X_m, X_{m+n}), m ≥ 0 and n > 0.
14  Consider a Markov chain on the nonnegative integers having transition function P given by P(x, x + 1) = p and P(x, 0) = 1 − p, where 0 < p < 1. Show that this chain has a unique stationary distribution π and find it.
15  A transition function P is said to be doubly stochastic if

Σ_{x∈𝒮} P(x, y) = 1,   y ∈ 𝒮.

What is the stationary distribution of an irreducible Markov chain having d < ∞ states and a doubly stochastic transition function?
16  Consider an irreducible Markov chain having finite state space 𝒮, transition function P such that P(x, x) = 0, x ∈ 𝒮, and stationary distribution π. Let p_x, x ∈ 𝒮, be such that 0 < p_x < 1, and let Q(x, y), x ∈ 𝒮 and y ∈ 𝒮, be defined by

Q(x, x) = 1 − p_x
and
Q(x, y) = p_x P(x, y),   y ≠ x.

Show that Q is the transition function of an irreducible Markov chain having state space 𝒮 and stationary distribution π′, defined by

π′(x) = p_x^{−1}π(x) / Σ_{y∈𝒮} p_y^{−1}π(y),   x ∈ 𝒮.

The interpretation of the chain with transition function Q is that starting from x, it has probability 1 − p_x of remaining in x and probability p_x of jumping according to the transition function P.
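The exercise can be checked numerically. A minimal sketch (ours, with an arbitrary three-state P and holding probabilities p_x) builds Q and confirms that π′ is stationary for it.

```python
import numpy as np

P = np.array([[0.0, 0.6, 0.4],      # P(x, x) = 0, as the exercise assumes
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])
p = np.array([0.2, 0.9, 0.5])       # 0 < p_x < 1

# Q(x, x) = 1 - p_x and Q(x, y) = p_x P(x, y) for y != x.
Q = np.diag(1.0 - p) + p[:, None] * P

# Stationary distribution pi of P: left eigenvector at eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

pi_prime = (pi / p) / (pi / p).sum()
print(np.allclose(pi_prime @ Q, pi_prime))    # True: pi' is stationary for Q
```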
17  Consider the Ehrenfest chain. Suppose that initially all of the balls are in the second box. Find the expected amount of time until the system returns to that state. Hint: Use the result of Exercise 7(a).
18  Consider a Markov chain on the state space {1, …, c + d}, where c and d are positive integers. Starting from any one of the first c states, the particle jumps in one transition to a state chosen uniformly from the last d states; starting from any of the last d states, the particle jumps in one transition to a state chosen uniformly from the first c states.
(a) Show that the chain is irreducible.
(b) Find the stationary distribution.

19  Consider a Markov chain having the transition matrix given by Exercise 19 of Chapter 1.
(a) Find the stationary distribution concentrated on each of the irreducible closed sets.
(b) Find lim_{n→∞} G_n(x, y)/n.

20  Consider a Markov chain having transition matrix as in Exercise 20 of Chapter 1.
(a) Find the stationary distribution concentrated on each of the irreducible closed sets.
(b) Find lim_{n→∞} G_n(x, y)/n.
21  Let X_n, n ≥ 0, be …
(a) Find the approximate distribution of X_n for n large and even.
(b) Find the approximate distribution of X_n for n large and odd.

22  Consider a Markov chain on {0, 1, 2} having transition matrix
         0    1    2
    0 [  0    1    0  ]
P = 1 [  0    0    1  ]
    2 [ 1/2  1/2   0  ]

(a) Show that the chain is irreducible.
(b) Find the period.
(c) Find the stationary distribution.
23  Consider a Markov chain on {0, 1, 2, 3, 4} having transition matrix P.
(a) Show that the chain is irreducible.
(b) Find the period.
(c) Find the stationary distribution.
3. Markov Pure Jump Processes

Consider again a system that at any time can be in one of a finite or countably infinite set 𝒮 of states. We call 𝒮 the state space of the system. In Chapters 1 and 2 we studied the behavior of such systems at integer times. In this chapter we will study the behavior of such systems over all times t ≥ 0.

3.1. Construction of jump processes
Consider a system starting in state x₀ at time 0. We suppose that the system remains in state x₀ until some positive time τ₁, at which time the system jumps to a new state x₁ ≠ x₀. We allow the possibility that the system remains permanently in state x₀, in which case we set τ₁ = ∞. If τ₁ is finite, upon reaching x₁ the system remains there until some time τ₂ > τ₁ when it jumps to state x₂ ≠ x₁. If the system never leaves x₁, we set τ₂ = ∞. This procedure is repeated indefinitely. If some τ_m = ∞, we set τ_n = ∞ for n > m. Let X(t) denote the state of the system at time t, defined by

(1)   X(t) = x₀,  0 ≤ t < τ₁,
           = x₁,  τ₁ ≤ t < τ₂,
           = x₂,  τ₂ ≤ t < τ₃,
             and so on.
The process defined by (1) is called a jump process. At first glance it might appear that (1) defines X(t) for all t ≥ 0. But this is not necessarily the case. Consider, for example, a ball bouncing on the floor. Let the state of the system be the number of bounces it has made. We make the physically reasonable assumption that the time in seconds between the nth bounce and the (n + 1)th bounce is 2^{−n}. Then x_n = n and

(2)   τ_n = 1 + 2^{−1} + ⋯ + 2^{−(n−1)} = 2 − 2^{−(n−1)}.

We see that τ_n < 2 and τ_n → 2 as n → ∞. Thus (1) defines X(t) only for 0 ≤ t < 2. By the time t = 2 the ball will have made an infinite number of bounces. In this case it would be appropriate to define X(t) = ∞ for t ≥ 2.
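The convergence of the bounce times is immediate to reproduce; the short sketch below (our illustration) accumulates τ_n and watches it approach 2.

```python
# Jump times of the bouncing ball: the wait between bounce n and bounce n + 1
# is 2**-n seconds, so tau_n = 1 + 2**-1 + ... + 2**-(n - 1).
tau = 0.0
for n in range(1, 31):
    tau += 2.0 ** -(n - 1)
    if n in (1, 2, 5, 10, 30):
        print(f"tau_{n} = {tau:.10f}")    # approaches 2: infinitely many bounces by t = 2
```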
In general, if

(3)   lim_{n→∞} τ_n = ∞,
then (1) does define X(t) for all t ≥ 0.

We will now specify a probability structure for such a jump process. We suppose that all states are of one of two types, absorbing or non-absorbing. Once the process reaches an absorbing state, it remains there permanently. With each non-absorbing state x, there is associated a distribution function F_x(t), −∞ < t < ∞, which vanishes for t < 0, and transition probabilities Q_xy, y ∈ 𝒮, which are nonnegative and such that Q_xx = 0 and

(4)   Σ_{y∈𝒮} Q_xy = 1.

A process starting at x remains there for a random length of time τ₁ having distribution function F_x and then jumps to state X(τ₁) = y with probability Q_xy, y ∈ 𝒮. We assume that τ₁ and X(τ₁) are chosen independently of each other, i.e., that

P_x(τ₁ ≤ t, X(τ₁) = y) = F_x(t) Q_xy.
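This construction translates directly into a simulation. The sketch below is ours, not the text's: the rates and the matrix Q are arbitrary, and we take each F_x to be exponential for concreteness, anticipating the Markov case discussed below.

```python
import numpy as np

rng = np.random.default_rng(1)

Q = np.array([[0.0, 0.7, 0.3],     # Q_xx = 0 and each row sums to 1
              [0.4, 0.0, 0.6],
              [0.5, 0.5, 0.0]])
q = np.array([1.0, 2.0, 0.5])      # parameters of the exponential holding times F_x

def simulate(x0, horizon):
    """Return the jump times tau_n and the visited states up to the horizon."""
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        t += rng.exponential(1.0 / q[x])   # holding time in state x, drawn from F_x
        if t > horizon:
            return times, states
        x = int(rng.choice(3, p=Q[x]))     # jump to y with probability Q_xy
        times.append(t)
        states.append(x)

times, states = simulate(0, 10.0)
print([(round(t, 2), s) for t, s in zip(times, states)])
```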
Here, as in the previous chapters, we use the notation P_x( ) and E_x( ) to denote probabilities of events and expectations of random variables defined in terms of a process initially in state x. Whenever and however the process jumps to a state y, it acts just as a process starting initially at y. For example, if x and y are both non-absorbing states,

P_x(τ₁ ≤ s, X(τ₁) = y, τ₂ − τ₁ ≤ t, X(τ₂) = z) = F_x(s) Q_xy F_y(t) Q_yz.
Similar formulas hold for events defined in terms of three or more jumps. If x is an absorbing state, we set Q_xy = δ_xy, where

δ_xy = 1,  y = x,
     = 0,  y ≠ x.
Equation (4) now holds for all x ∈ 𝒮. We say that the jump process is pure or non-explosive if (3) holds with probability one regardless of the starting point. Otherwise we say the process is explosive. If the state space 𝒮 is finite, the jump process is necessarily non-explosive. It is easy to construct examples having an infinite state space which are explosive. Such processes, however, are unlikely to arise in practical applications. At any rate, to keep matters simple we assume that our process is non-explosive. The set of probability zero where (3) fails to hold can safely be ignored. We see from (1) that X(t) is then defined for all t ≥ 0.

Let P_xy(t) denote the probability that a process starting in state x will be in state y at time t. Then

P_xy(t) = P_x(X(t) = y)

and

Σ_{y∈𝒮} P_xy(t) = 1.
In particular, P_xy(0) = δ_xy. We can also choose the initial state x according to an initial distribution π₀(x), x ∈ 𝒮, where π₀(x) ≥ 0 and Σ_{x∈𝒮} π₀(x) = 1. Then

P(X(t) = y) = Σ_x π₀(x) P_xy(t).
The transition function P_xy(t) cannot be used directly to obtain such probabilities as P(X(s) = x, X(t) = y) unless the jump process satisfies the Markov property, which states that for 0 ≤ s₁ < ⋯ < s_n < s < t and x₁, …, x_n, x, y ∈ 𝒮,

P(X(t) = y | X(s₁) = x₁, …, X(s_n) = x_n, X(s) = x) = P_xy(t − s).
By a Markov pure jump process we mean a pure jump process that satisfies the Markov property. It can be shown, although not at the level of this book, that a pure jump process is Markovian if and only if every non-absorbing state x is such that the distribution of the remaining time in x does not depend on how long the process has already been in x, i.e., such that

(5)   (1 − F_x(t + s)) / (1 − F_x(s)) = 1 − F_x(t),   s ≥ 0 and t ≥ 0.
Now a distribution function F_x satisfies (5) if and only if it is an exponential distribution function (see Chapter 5 of Introduction to Probability Theory). We conclude that a pure jump process is Markovian if and only if F_x is an exponential distribution for all non-absorbing states x.
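Property (5) is the familiar memoryless property, and it is easy to check numerically that the exponential distribution satisfies it while other distributions do not. A small sketch (ours):

```python
import math

def remaining_life(F, t, s):
    """(1 - F(t + s)) / (1 - F(s)): survival of the remaining time after s."""
    return (1.0 - F(t + s)) / (1.0 - F(s))

F_exp = lambda t: 1.0 - math.exp(-2.0 * t)        # exponential with parameter 2
F_unif = lambda t: min(max(t / 3.0, 0.0), 1.0)    # uniform on [0, 3]: not memoryless

t = 0.5
for s in (0.0, 0.4, 1.0):
    print(remaining_life(F_exp, t, s), 1.0 - F_exp(t))    # equal for every s
    print(remaining_life(F_unif, t, s), 1.0 - F_unif(t))  # differs once s > 0
```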
Let X(t), 0 ≤ t < ∞, be a Markov pure jump process. If x is a non-absorbing state, then F_x has an exponential density f_x. Let q_x denote the parameter of this density. Then q_x = 1/E_x(τ₁) > 0 and

f_x(t) = q_x e^{−q_x t},  t ≥ 0,
       = 0,               t < 0.

Thus

P_x(τ₁ > t) = ∫_t^∞ q_x e^{−q_x s} ds = e^{−q_x t},   t ≥ 0.
If x is an absorbing state, we set q_x = 0. It follows from the Markov property that for 0 ≤ t₁ < ⋯ < t_n and x₁, …, x_n in 𝒮,

(6)   P(X(t₁) = x₁, …, X(t_n) = x_n)
         = P(X(t₁) = x₁) P_{x₁x₂}(t₂ − t₁) ⋯ P_{x_{n−1}x_n}(t_n − t_{n−1}).
In particular,

P_x(X(t) = z, X(t + s) = y) = P_xz(t) P_zy(s).

Since

P_xy(t + s) = Σ_z P_x(X(t) = z, X(t + s) = y),

we conclude that

(7)   P_xy(t + s) = Σ_z P_xz(t) P_zy(s),   s ≥ 0 and t ≥ 0.
Equation (7) is known as the Chapman-Kolmogorov equation. The transition function P_xy(t) satisfies the integral equation

(8)   P_xy(t) = δ_xy e^{−q_x t} + ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds,

which we will now verify. If x is an absorbing state, (8) reduces to the obvious fact that P_xy(t) = δ_xy, t ≥ 0.

Suppose x is not an absorbing state. Then the event {τ₁ ≤ t, X(τ₁) = z, and X(t) = y}, for a process starting at x, occurs if and only if the first jump occurs at some time s ≤ t and takes the process to z, and the process then goes from z to y in the remaining t − s units of time. Thus

P_x(τ₁ ≤ t, X(τ₁) = z, and X(t) = y) = ∫_0^t q_x e^{−q_x s} Q_xz P_zy(t − s) ds,
so

P_x(τ₁ ≤ t and X(t) = y) = Σ_{z≠x} P_x(τ₁ ≤ t, X(τ₁) = z, and X(t) = y)
                         = ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds.

Also

P_x(τ₁ > t and X(t) = y) = δ_xy e^{−q_x t}.

Consequently,

P_xy(t) = P_x(X(t) = y)
        = P_x(τ₁ > t and X(t) = y) + P_x(τ₁ ≤ t and X(t) = y)
        = δ_xy e^{−q_x t} + ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds,
as claimed. Replacing s by t − s, we can rewrite (8) as

(9)   P_xy(t) = δ_xy e^{−q_x t} + q_x e^{−q_x t} ∫_0^t e^{q_x s} ( Σ_{z≠x} Q_xz P_zy(s) ) ds,   t ≥ 0.
It follows from (9) that P_xy(t) is continuous in t for t ≥ 0. Therefore the integrand in (9) is a continuous function of s, so we can differentiate the right-hand side of (9), obtaining

(10)   P′_xy(t) = −q_x P_xy(t) + q_x Σ_{z≠x} Q_xz P_zy(t),   t ≥ 0.

In particular, since P_zy(0) = δ_zy,

P′_xy(0) = −q_x P_xy(0) + q_x Σ_{z≠x} Q_xz P_zy(0) = −q_x δ_xy + q_x Q_xy.
Set

(11)   q_xy = P′_xy(0),   x, y ∈ 𝒮.

Then

(12)   q_xy = −q_x,      y = x,
            = q_x Q_xy,  y ≠ x,

and consequently

(13)   Σ_{y≠x} q_xy = q_x = −q_xx.
The quantities q_xy, x ∈ 𝒮 and y ∈ 𝒮, are called the infinitesimal parameters of the process. These parameters determine q_x and Q_xy, and thus by our construction determine a unique Markov pure jump process. We can rewrite (10) in terms of the infinitesimal parameters as

(14)   P′_xy(t) = Σ_z q_xz P_zy(t),   t ≥ 0.
This equation is known as the backward equation. If 𝒮 is finite, we can differentiate the Chapman-Kolmogorov equation with respect to s, obtaining

(15)   P′_xy(t + s) = Σ_z P_xz(t) P′_zy(s),   s ≥ 0 and t ≥ 0.

Setting s = 0 and using (11), we conclude that

(16)   P′_xy(t) = Σ_z P_xz(t) q_zy,   t ≥ 0.
Formula (16) is known as the forward equation. It can be shown that (15) and (16) hold even if 𝒮 is infinite, but the proofs are not easy and will be omitted. In Section 3.2 we will describe some examples in which the backward or forward equation can be used to find explicit formulas for P_xy(t).
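For a finite state space, (14) and (16) say that the matrix P(t) = (P_xy(t)) satisfies P′(t) = AP(t) = P(t)A, where A = (q_xy), so P(t) = e^{tA}. A minimal numerical sketch (ours; the two-state rates are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

# Infinitesimal parameters of a two-state chain: q_xy = q_x Q_xy off the
# diagonal and q_xx = -q_x, so each row sums to zero (see (12) and (13)).
A = np.array([[-3.0, 3.0],
              [1.0, -1.0]])

t, h = 0.7, 1e-6
Pt = expm(t * A)                              # transition matrix P_xy(t)
deriv = (expm((t + h) * A) - Pt) / h          # numerical derivative P'(t)
print(np.allclose(deriv, A @ Pt, atol=1e-4))  # backward equation (14)
print(np.allclose(deriv, Pt @ A, atol=1e-4))  # forward equation (16)
```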
3.2. Birth and death processes

Let 𝒮 = {0, 1, …, d} or 𝒮 = {0, 1, 2, …}. By a birth and death process on 𝒮 we mean a Markov pure jump process on 𝒮 having infinitesimal parameters q_xy such that

(17)   q_xy = 0,   |y − x| > 1.
Thus a birth and death process starting at x can in one jump go only to the states x − 1 or x + 1. The parameters λ_x = q_{x,x+1}, x ∈ 𝒮, and μ_x = q_{x,x−1}, x ∈ 𝒮, are called respectively the birth rates and death rates of the process. The parameters q_x and Q_xy of the process can be expressed simply in terms of the birth and death rates. By (13)

q_x = λ_x + μ_x,   x ∈ 𝒮.
Thus x is an absorbing state if and only if λ_x = μ_x = 0. If x is a non-absorbing state, then by (12)

(18)   Q_xy = λ_x/(λ_x + μ_x),   y = x + 1,
            = μ_x/(λ_x + μ_x),   y = x − 1,
            = 0,                 elsewhere.
A birth and death process is called a pure birth process if μ_x = 0, x ∈ 𝒮, and a pure death process if λ_x = 0, x ∈ 𝒮. A pure birth process can move only to the right, and a pure death process can move only to the left. Given nonnegative numbers λ_x, x ∈ 𝒮, and μ_x, x ∈ 𝒮, it is natural to ask whether there is a birth and death process corresponding to these parameters. Of course, μ₀ = 0 is a necessary requirement, as is λ_d = 0 if 𝒮 is finite. The only additional problem is that explosions must be ruled out if 𝒮 is infinite. It is not difficult to derive a necessary and sufficient condition for the process to be non-explosive. A simple sufficient condition for the process to be non-explosive is that for some positive numbers A and B

λ_x ≤ A + Bx,   x ≥ 0.
This condition holds in all the examples we will consider. In finding the birth and death rates of specific processes, we will use some standard properties of independent exponentially distributed random variables. Let ξ₁, …, ξ_n be independent random variables having exponential distributions with respective parameters α₁, …, α_n. Then min(ξ₁, …, ξ_n) has an exponential distribution with parameter α₁ + ⋯ + α_n, and

(19)   P(ξ_k = min(ξ₁, …, ξ_n)) = α_k/(α₁ + ⋯ + α_n),   k = 1, …, n.

Moreover, with probability one, the random variables ξ₁, …, ξ_n take on n distinct values. To verify these results we observe first that

P(min(ξ₁, …, ξ_n) > t) = P(ξ₁ > t, …, ξ_n > t)
                       = P(ξ₁ > t) ⋯ P(ξ_n > t)
                       = e^{−α₁t} ⋯ e^{−α_nt} = e^{−(α₁+⋯+α_n)t},

and hence min(ξ₁, …, ξ_n) has the indicated exponential distribution.
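Both properties are easy to confirm by simulation; the sketch below (ours) estimates the mean of the minimum and the probabilities in (19).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])                 # parameters of xi_1, xi_2, xi_3
xi = rng.exponential(1.0 / alpha, size=(200_000, 3))

print(xi.min(axis=1).mean(), 1.0 / alpha.sum())   # exponential with parameter sum(alpha)
print(np.bincount(xi.argmin(axis=1), minlength=3) / len(xi),
      alpha / alpha.sum())                        # relative frequencies match (19)
```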