Hoelbook
Hoelbook
The Houghton MifHin Series in Statistics under the Editorship of Herman Chernoff
LEO BREIMAN
Probability and Stochastic Processes: With a View Toward Applications Statistics: With a View Toward Applications
PAUL G. HOEL, SIDNEY C. PORT, AND CHARLES J. STONE
Introduction to Statistics-A
Fresh Approach
J. Stone
BOSTON
New York
Atlanta
Geneva, Illinois
Dallas
Palo Alto
COPYRIGHT
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, without permission in writing from the publisher.
PRINTED IN THE U.S.A. LmRARY OF CONGRESS CATALOG CARD NUMBER: ISBN:
79-165035
0-395-12076-4
General Preface
This three-volume series grew out of a three-quarter course in probability, statistics, and stochastic processes taught for a number of years at UCLA. We felt a need for a series of books that would treat these subjects in a way that is well coordinated, but which would also give adequate emphasis to each subject as being interesting and useful on its own merits. The first volume, Introduction to Probability Theory, presents the fundamental ideas of probability theory and also prepares the student both for courses in statistics and for further study in probability theory, including stochastic processes. The second volume, Introduction to Statistical Theory, develops the basic theory of mathematical statistics in a systematic, unified manner. Together, the first two volumes contain the material that is often covered in a two-semester course in mathematical statistics. The third volume, Introduction to Stochastic Processes, treats Markov chains, Poisson processes, birth and death processes, Gaussian" processes, Brownian motion, and processes defined in terms of Brownian motion by means of elementary stochastic differential equations.
Preface
In recent years there has been an ever increasing interest in the study of systems which vary in time in a random manner. Mathematical models of such systems are known as stochastic processes. In this book we present an elementary account of some of the important topics in the theory of such processes. We have tried to select topics that are conceptually interesting and that have found fruitful application in various branches of science and technology. A stochastic process can be defined quite generally as any collection of random variables X(t), t E T, defined on a common probability space, where T is a subset of ( - 00, (0) and is thought of as the time parameter set. The process is called a continuous parameter process if T is an interval having positive length and a discrete parameter process if T is a subset of the integers. If the random variables X(t) all take on values from the fixed set !/, then !/ is called the state space of the process. Many stochastic processes of theoretical and applied interest possess the property that, given the present state of the process, the past history does not affect conditional probabilities of events defined in terms of the future. Such processes are called Markov processes. In Chapters 1 and 2 we study Markov chains, which are discrete parameter Markov processes whose state space is finite or countably infinite. In Chapter 3 we study the corresponding continuous parameter processes, with the "Poisson process" as a special case. In Chapters 4-6 we discuss continuous parameter processes whose state space is typically the real line. In Chapter 4 we introduce Gaussian processes, which are characterized by the property that every linear combination involving a finite number of the random variables X(t), t E T, is normally distributed. As an important special case, we discuss the Wiener process, which arises as a mathematical model for the physical phenomenon known as "Brownian motion." In Chapter 5 we discuss integration and differentiation of stochastic processes. There we also use the Wiener process to give a mathematical model for "white noise." In Chapter 6 we discuss solutions to nonhomogeneous ordinary differential equations having constant coefficients whose right-hand side is either a stochastic process or white noise. We also discuss estimation problems involving stochastic processes, and briefly consider the "spectral distribution" of a process.
vii
viii
Preface
This text has been designed for a one-semester course in stochastic processes. Written in close conjunction with Introduction to Probability Theory, the first volume of our three-volume series, it assumes that the student is acquainted with the material covered in a one-semester course in probability for which elementary calculus is a prerequisite. Some of the proofs in Chapters 1 and 2 are somewhat more difficult than the rest of the text, and they appear in appendices to these chapters. These proofs and the starred material in Section 2.6 probably should be omitted or discussed only briefly in an elementary course. An instructor using this text in a one-quarter course will probably not have time to cover the entire text. He may wish to cover the first three chapters thoroughly and the remainder as time permits, perhaps discussing those topics in the last three chapters that involve the Wiener process. Another option, however, is to emphasize continuous parameter processes by omitting or skimming Chapters 1 and 2 and concentrating on Chapters 3-6. (For example, the instructor could skip Sections 1.6.1, 1.6.2, 1.9, 2.2.2, 2.5.1, 2.6.1, and 2.8.) With some aid from the instructor, the student should be able to read Chapter 3 without having studied the first two chapters thoroughly. Chapters 4-6 are independent of the first two chapters and depend on Chapter 3 only in minor ways, mainly in that the Poisson process introduced in Chapter 3 is used in examples in the later chapters. The properties of the Poisson process that are needed later are summarized in Chapter 4 and can be regarded as axioms for the Poisson process. The authors wish to thank the UCLA students who tolerated preliminary versions of this text and whose comments resulted in numerous improvements. Mr. Luis Gorostiza obtained the answers to the exercises and also made many suggestions that resulted in significant improvements. Finally, we wish to thank Mrs. Ruth Goldstein for her excellent typing.
T able of Contents
Markov Chains
1.1 1.2 1.3 1.4
Markov chains having two states Transition function and initial distribution Examples Computations with transition functions 1.4.1 Hitting times 1.4.2 Transition matrix Transient and recurrent states Decomposition of the state space 1.6.1 Absorption probabilities 1.6.2 Martingales Birth and death chains Branching and queuing chains 1.8.1 Branching chain 1.8.2 Queuing chain Appendix Proof of results for the branching and queuing chains 1.9.1 Branching chain 1.9.2 Queuing chain
1.5 1.6
1.7 1.8
1 2 5 6 12 14 16 17 21 25 27 29 33 34 36 36 38 39 47 47 49 50 53 56 60 63 67 69 70
1.9
Table of Contents
2.7 2.8
Convergence to the stationary distribution Appendix Proof of convergence 2.8.1 Periodic case 2.8.2 A result from number theory Construction of jump processes Birth and death processes 3.2.1 Two-state birth and death process Poisson process 3.2.2 Pure birth process 3.2.3 3.2.4 Infinite server queue Properties of a Markov pure jump process 3.3.1 Applications to birth and death processes Mean and covariance functions Gaussian processes The Wiener process
3.3
154
159
166
6.4
Markov Chains
Consider a system that can be in anyone of a finite or countably infinite number of states. Let 51' denote this set of states. We can assume that 51' is a subset of the integers. The set 51' is called the state space of the system. Let the system be observed at the discrete moments of time n = 0, 1, 2, ... , and let Xn denote the state of the system at time n. Since we are interested in non-deterministic systems, we think of Xm n > 0, as random variables defined on a common probability space. Little can be said about such random variables unless some additional structure is imposed upon them. The simplest possible structure is that of independent random variables. This would be a good model for such systems as repeated experiments in which future states of the system are independent of past and present states. In most systems that arise in practice, however, past and present states of the system influence the future states even if they do not uniquely determine them. Many systems have the property that given the present state, the past states have no influence on the future. This property is called the Markov property, and systems having this property are called Markov chains. The Markov property is defined precisely by the requirement that
for every choice of the nonnegative integer n and the numbers Xo, ... , xn+ 1, each in 51'. The conditional probabilities P(Xn+ 1 = Y I Xn = x) are called the transition probabilities of the chain. In this book we will study Markov chains having stationary transition probabilities, i.e., those such that P(Xn+ 1 = Y I Xn = x) is independent of n. From now on, when we say that Xm n > 0, forms a Markov chain, we mean that these random variables satisfy the Markov property and have stationary transition probabilities. The study of such Markov chains is worthwhile from two viewpoints. First, they have a rich theory, much of which can be presented at an elementary level. Secondly, there are a large number of systems arising in practice that can be modeled by Markov chains, so the subject has many useful applications.
1
Markov Chains
In order to help motivate the general results that will be discussed later, we begin by considering Markov chains having only two states.
1.1.
For an example of a Markov chain having two states, consider a machine that at the start of any particular day is either broken down or in operating condition. Assume that if the machine is broken down at the start of the nth day, the probability is p that it will be successfully repaired and in operating condition at the start of the (n + l)th day. Assume also that if the machine is in operating condition at the start of the nth day, the probability is q that it will have a failure causing it to be broken down at the start of the (n + l)th day. Finally, let 1to(O)denote the probability that the machine is broken down initially, i.e., at the start of the Oth day. Let the state 0 correspond to the machine being broken down and let the state 1 correspond to the machine being in operating condition. Let Xn be the random variable denoting the state of the machine at time n. According to the above description
P(Xn+ P(Xn+
and
1 1
= =
1 I Xn 0
0) 1)
= p, = q,
I Xn =
0)
P(Xo
=
0
1tO(O).
Since there are only two states, 0 and 1, it follows immediately that
P(Xn+ P(Xn+
and that the probability 1to(I) From this information, We observe that
= =
I Xn = =
1)
0) 1)
= =
1 - p, 1 - q,
1 I Xn
= P(Xo =
1 - 1to(O).
P(Xn+1
0)
= P(Xn =
= P(Xn
=
0
0)
+ P(Xn =
= 0) = 1)
1 and Xn+1
0)
= O)P(Xn+\l = =
0)
I Xn
0
+
= =
P(Xn = I)P(Xn+1
I Xn
(1 - p)P(Xn
+ qP(Xn =
=
0)
1)
= (1 - p)P(Xn
= 0)
+ q(1 - P(Xn = 0» +
q.
(1 - p - q)P(Xn
1.1.
= 0) =
1to(O),so
P(XI
0)
(1 - P - q)1to(O)
+q
P(X2 = 0) = (1 - p - q)P(Xl
0)
+q
(l - p - q)].
(1 - p - q)21to(0)
+ q[1 + +q
n-l
P(Xn
0)
(1 - p - qt1to(O)
i=O
(1 - p - q)i.
1)
1to(1).
Suppose now that p + q > O. Then by the formula for the sum of a finite geometric progression,
n-l
(1 - p - q)i =
(1
i=O
-p- q P+q
)n
P(Xn = 0) =
q p+q
(1 - p - qt (1to(O) -
q p+q
),
P(Xn = 1) =
p+q
(1 - p - q)n (1to(l)
p+q
).
Suppose that p and q are neither both equal to zero nor both equal to 1. Then 0 < p + q < 2, which implies that 11 - p - ql < 1. In this case we can let n -+ 00 in (3) and (4) and conclude that (5) lim P(Xn
n-+ao
= 0) =
and
lim P(X
n-+ao
= 1) =
We can also obtain the probabilities q /(p + q) and p/(p + q) by a different approach. Suppose we want to choose 1to(O) and 1to(l) so that P(Xn = 0) and P(Xn = 1) are independent of n. It is clear from (3) and (4) that to do this we should choose
1to(O) =
---==----
+
q
and
1to(l) =
----=-
Thus we see that if Xn, n > 0, starts out with the initial distribution P(Xo
0)
and
P(Xo
1)
+q
Markov Chains
and
P(Xn = 1) =
p+q
The description of the machine is vague because it does not really say whether Xm n > 0, can be assumed to satisfy the Markov property. Let us suppose, however, that the Markov property does hold. We can use this added information to compute the joint distribution of Xo, Xb ... , Xn• For example, let n = 2 and let xo, Xl' and X2 each equal 0 or 1. Then
P(Xo
Xo, Xl
Xl' and X2
X2)
= P(Xo =
= P(Xo
Xo and Xl XO)P(XI
Xl)P(X2
X2 I Xo
Xo and Xl
Xl)
= Xl I Xo =
XO)P(X2
x21 Xo
Xo and Xl
Xl)·
Now P(Xo = xo) and P(XI = Xl I Xo = xo) are determined by p, q, and 1to(O); but without the Markov property, we cannot evaluate P(X2 = X2 I Xo = Xo and Xl = Xl) in terms of p, q, and 1to(O). If the Markov property is satisfied, however, then
P(X2
X2
I Xo = =
Xo and Xl
Xl)
P(X2
X2
I Xl
x.),
Xo, Xl
=
Xl' and X2
X2)
P(Xo
XO)P(XI
= Xl I Xo =
XO)P(X2
= X2
I Xl
= Xl)·
For example,
P(Xo
= 0, Xl - 1, and =
=
X2
= 0)
P(Xo
O)P(XI
1 I Xo = 0)P(X2
0 I Xl = 1)
1to(O)pq.
The reader should check the remaining entries in the following table, which gives the joint distribution of Xo, Xl' and X2.
Xo
Xl 0 0 1 1 0 0 1 1
X2
P(Xo
0 0 0 0 1 1 1 1
0 1 0 1 0 1 0 1
1to(O) (l - q) p
(1 - 1to(O»q(1 (1 - 1to(O»qp - p)
1.2.
1.2.
Let Xn, n > 0, be a Markov chain having state space f/. (The restriction to two states is now dropped ..) The function P(x, y), x E f/ and y E f/, defined by (6) P(X, y)
P(XI
I Xo =
x),
x, Y
f/,
is called the transition function of the chain. (7) and (8) P(X, y) > 0,
It is such that
E
x, Y
f/,
L P(x,
y
y) = 1,
XE
probabilities,
I Xn =
xo,""
x)
P(x, y),
n>l.
P(Xn+1
I Xo =
Xn-l
= xn-l,
Xn
x)
P(x, y).
In other words, if the Markov chain is in state x at time n, then no matter how it got to x, it has probability P(x, y) of being in state y at the next step. For this reason the numbers P(x, y) are called the one-step transition probabilities of the Markov chain. The function 1ro(x), x E f/, defined by (11) 1ro(X)
P(Xo
x),
XE
f/,
is called the initial distribution of the chain. (12) and (13) 1ro(X) > 0,
XE
L 1ro(x)
x
= 1.
The joint distribution of Xo, ... , Xn can easily be expressed in terms of the transition function and the initial distribution. For example, P(Xo Also, P(Xo
XO, Xl
Xl)
= =
P(Xo
XO)P(XI
Xl
I Xo =
Xo)
1rO(XO)P(XO' Xl)'
XO, Xl = Xl' X2
X2)
=
=
P(XO = XO, Xl
XI)P(X2
=
= I Xo
X2
I Xo
XO' Xl = Xl)
1ro(XO)P(XO' XI)P(X2
X2
= XO' Xl
= x.).
Markov Chains
Since Xn, n > 0, satisfies the Markov property and has stationary transition probabilities, we see that
P(X2
X2
I Xo =
XO, Xl
Xl)
=
= =
=
=
X2 X2
I Xl =
I Xo
=
xd Xl)
Thus
P(Xo (14) P(Xo
Xo, Xl
Xl' X2
=
xn)
X2)
TCo(Xo)P(xo,
Xo, ... , Xn
It is usually more convenient, however, to reverse the order of our definitions. We say that P(x, y), X E 9' and y E 9', is a transition/unction if it satisfies (7) and (8), and we say that TCo(X), X E 9', is an initial distribution if it satisfies (12) and (13). It can be shown that given any transition function P and any initial distribution TCo, there is a probability space and random variables Xn, n > 0, defined on that space satisfying (14). It is
not difficult to show that these random variables form a Markov chain having transition function P and initial distribution no. The reader may be bothered by the possibility that some of the conditional probabilities we have discussed may not be well defined. For example, the left side of (I) is not well defined if
P(Xo
Xo, ... , Xn
xn)
= O.
This difficulty is easily resolved. Equations (7), (8), (12), and (13) defining the transition functions and the initial distributions are well defined, and Equation (14) describing the joint distribution of Xo,"" Xn is well defined. It is not hard to show that if (14) holds, then (1), (6), (9), and (10) hold whenever the conditional probabilities in the respective equations are well defined. The same qualification holds for other equations involving conditional probabilities that will be obtained later. It will soon be appa~ent that the transition function of a Markov chain plays a much greater role in describing its properties than does the initial distribution. For this reason it is customary to study simultaneously all Markov chains having a given transition function. In fact we adhere to the usual convention that by "a Markov chain having transition function P," we really mean the family of all Markov chains having that transition function.
1.3. Examples
In this section we will briefly describe several interesting examples of Markov chains. These examples will be further developed in the sequel.
1.3.
Examples
Random walk. Let l, 2, ••• be independent integervalued random variables having common density f Let Xo be an integer-valued random variable that is independent of the e /s and set Xn = Xo + I + ... + en· The sequence X", n > 0, is called a random walk. It is a Markov chain whose state space is the integers and whose transition function is given by
Example 1.
ee
- x).
P(Xo
= =
xo,
el
Xl -
Xo,""
en
= Xn - xn-l)
= P(Xo
peen = xn - xn-l)
= 1to(XO)f(XI
- xn-l)
and thus (14) holds. Suppose a "particle" moves along the integers according to this Markov chain. Whenever the particle is in x, regardless of how it got there, it jumps to state y with probability fey - x). As a special case, consider a simple random walk in which f(l) = p, f( -1) = q, and f(O) = r, where p, q, and r are nonnegative and sum to one. The transition function is given by
P(x, y)
C'
;:
0,
Let a particle undergo such a random walk. If the particle is in state X at a given observation, then by the next observation it will have jumped to state X + 1 with probability p and to state x-I with probability q; with probability r it will still be in state x. Ehrenfest chain. The following is a simple model of the exchange of heat or of gas molecules between two isolated bodies. Suppose we have two boxes, labeled 1 and 2, and d balls labeled 1, 2, ... , d. Initially some of these balls are in box 1 and the remainder are in box 2. An integer is selected at random from 1, 2, ... , d, and the ball labeled by that integer is removed from its box and placed in the opposite box. This procedure is repeated indefinitely with the selections being independent from trial to trial. Let Xn denote the number of balls in box 1 after the nth trial. Then X", n > 0, is a Markov chain on f/ = {O, 1, 2, ... , d}.
Example 2.
Markov Chains
The transition function of this Markov chain is easily computed. Suppose that there are x balls in box I at time n. Then with probability xld the ball drawn on the (n + l)th trial will be from box 1 and will be transferred to box 2. In this case there will be x-I balls in box 1 at time n + 1. Similarly, with probability (d - x)/d the ball drawn on the (n + l)th trial will be from box 2 and will be transferred to box 1, resulting in x + I balls in box 1 at time n + 1. Thus the transition function of this Markov chain is given by
d'
P(x, y) =
x x
y y
x-I,
- d'
0,
=x
1,
elsewhere.
Note that the Ehrenfest chain can in one transition only go from state x to x-lor x + 1 with positive probability. A state a of a Markov chain is called an absorbing state if Pea, a) = 1 or, equivalently, if Pea, y) = for y '# a. The next example uses this definition. Suppose a gambler starts out with a certain initial capital in dollars and makes a series of one dollar bets against the house. Assume that he has respective probabilities p and q = 1 - P of winning and losing each bet, and that if his capital ever reaches zero, he is ruined and his capital remains zero thereafter. Let Xm n > 0, denote the gambler's capital at time n. This is a Markov chain in which is an absorbing state, and for x > 1
Gambler's ruin chain. Example 3.
q, (15)
P(x, y) =
p,
y = x-I, y = x + 1,
0,
elsewhere.
Such a chain is called a gambler's ruin chain on f/ = {O,1,2, ... }. We can modify this model by supposing that if the capital of the gambler increases to d dollars he quits playing. In this case and d are both absorbing states, and (15) holds for x = 1, ... , d - l.
For an alternative interpretation of the latter chain, we can assume that two gamblers are making a series of one dollar bets against each other and that between them they have a total capital of d dollars. Suppose the first gambler has probability p of winning any given bet, and the second gambler has probability q = 1 - p of winning. The two gamblers play until one
1.3.
Examples
of them goes broke. Let Xn denote the capital of the first gambler at time n. Then Xm n > 0, is a gambler'S ruin chain on {O,1, ... , d}.
Example 4. Birth and death chain. Consider a Markov chain either on f/' = {O,1, 2, ... } or on Y = {O, 1, ... , d} such that starting from x the chain will be at x-I, x, or x + 1 after one step. The transition
Y = x, Y = x + 1, elsewhere, where Px, qx, and r x are nonnegative numbers such that Px + qx + r x = 1. The Ehrenfest chain and the two versions of the gambler's ruin chain are examples of birth and death chains. The phrase "birth and death" stems from applications in which the state of the chain is the population of some living system. In these applications a transition from state x to state x + 1 corresponds to a "birth," while a transition from state x to state x-I corresponds to a "death." In Chapter 3 we will study birth and death processes. These processes are similar to birth and death chains, except that jumps are allowed to occur at arbitrary times instead of just at integer times. In most applications, the models discussed in Chapter 3 are more realistic than those obtainable by using birth and death chains. Consider a service facility such as a checkout counter at a supermarket. People arrive at the facility at various times and are eventually served. Those customers that have arrived at the facility but have not yet been served form a waiting line or queue. There are a variety of models to describe such systems. We will consider here only one very simple and somewhat artificial model; others will be discussed in Chapter 3.
Queuing chain. Example 5.
Let time be measured in convenient periods, say in minutes. Suppose that if there are any customers waiting for service at the beginning of any given period, exactly one customer will be served during that period, and that if there are no customers waiting for service at the beginning of a period, none will be served during that period. Let en denote the number of new customers arriving during the nth period. We assume that are independent nonnegative integer-valued random variables having common density f
10
Markov
Chains
Let Xo denote the number of customers present initially, and for n > 1, let Xn denote the number of customers present at the end of the nth period. If Xn = 0, then Xn+ 1 = en+ 1; and if Xn > 1, then Xn+ 1 = Xn + en+ 1 - 1. It follows without difficulty from the assumptions on em n > 1, that Xn, n > 0, is a Markov chain whose state space is the nonnegative integers and whose transition function P is given by P(O, y) and
= f(y)
-x
Pix, y) = f(y
1),
x>1.
Example 6. Branching chain. Consider particles such as neutrons or bacteria that can generate new particles of the same type. The initial set of objects is referred to as belonging to the Oth generation. Particles generated from the nth generation are said to belong to the (n + 1)th generation. Let Xm n > 0, denote the number of particles in the nth generation. Nothing in this description requires that the various particles in a generation give rise to new particles simultaneously. Indeed at a given time, particles from several generations may coexist. A typical situationis illustrated in Figure 1: one initial particle gives rise to two particles. Thus Xo = 1 and Xl = 2. One of the particles in the first generation gives rise to three particles and the other gives rise to one particle, so that X2 = 4, We see from Figure 1 that X3 = 2. Since neither of the particles in the third generation gives rise to new particles, we conclude that X4 = and consequently that Xn = 0 for all n > 4. In other words, the progeny of the initial particle in the zeroth generation become extinct after three generations.
Figure 1
1.3.
Examples
11
In order to model this system as a Markov chain, we suppose that each particle gives rise to ~ particles in the next generation, where ~ is a nonnegative integer-valued random variable having density f We suppose that the number of offspring of the various particles in the various generations are chosen independently according to the density f Under these assumptions Xm n > 0, forms a Markov chain whose state space is the nonnegative integers. State 0 is an absorbing state. For if there are no particles in a given generation, there will not be any particles in the next generation either. For x > 1
P(x,y)
P(~l
+ ... +
~x
y),
where ~ h ... ,~x are independent random variables having common density f In particular, P(l, y) = fey), y > O. If a particle gives rise to ~ = 0 particles, the interpretation is that the particle dies or disappears. Suppose a particle gives rise to ~ particles, which in turn give rise to other particles; but after some number of generations, all descendants of the initial particle have died or disappeared (see Figure 1). We describe such an event by saying that the descendants of the original particle eventually become extinct. An interesting problem involving branching chains is to compute the probability p of eventual extinction for a branching chain starting with a single particle or, equivalently, the probability that a branching chain starting at state 1 will eventually be absorbed at state O. Once we determine p, we can easily find the probability that in a branching chain starting with x particles the descendants of each of the original particles eventually become extinct. Indeed, since the particles are assumed to act independently in giving rise to new particles, the desired probability is just p". The branching chain was used originally to determine the probability that the male line of a given person would eventually become extinct. For this purpose only male children would be included in the various generations. Consider a gene composed of d subunits, where d is some positive integer and each subunit is either normal or mutant in form. Consider a cell with a gene composed of m mutant subunits and d - m normal subunits. Before the cell divides into two daughter cells, the gene duplicates. The corresponding gene of one of the daughter cells is composed of d units chosen at random from the 2m mutant subunits and the 2(d - m) normal subunits. Suppose we follow a fixed line of descent from a given gene. Let Xo be the number of mutant subunits initially
Example 7.
12
Markov Chains
present, and let Xm n > 1, be the number present in the nth descendant gene. Then Xm n > 0, is a Markov chain on f/ = {O,I, 2, ... , d} and
P(X, Y) =
States
1.4.
Let Xm n > 0, be a Markov chain on Y having transition function P. In this section we will show how various conditional probabilities can be expressed in terms of P. We will also define the n-step transition function of the Markov chain. We begin with the formula
(16)
P(Xn+
xn+
1, ••• ,
Xn+m
Xn+m
I Xo =
Xo, ... , Xn
••
Xn) Xn+m)'
= P(Xm xn+1)·
P(Xn+m-l,
Xn+m) Xn)
which reduces to the right side of (16). It is convenient to rewrite (16) as (17) P(Xn+1
= Yl"'"
Xn+m
Ym I Xo
Xo,"" Yl)P(Yl'
Xn-1
Xn-l'
Xn
= x)
Ym)·
= P(x,
Let Ao, ... , An-1 that
(18)
P(Xn+1
Ao,·.·,
Xn-1
An-b
Xn = x) Ym)·
P(x, Yl)P(Yl'
Y2)' .. P(Ym-l'
•••
P(Xn+l
EBb"',
I Xo
Ao,""
Xn-1
E An-h
K; = x)
Ym)·
1: ... 1:
YIEBI YmEBm
P(x, Yl)P(Yl'
1.4.
13
The m-step transition function pm(x, y), which gives the probability of going from x to y in m steps, is defined by
(20) pm(x, y) =
L'" L
Yl Ym-l
P(x, Yl)P(Yh
Y2)'"
po(x,
elsewhere.
P(Xn+m
I Xo
An-l,
Xn
x)
I Xn =
x)
(23)
P(Xn+m
= = =
I Xo =
Y z
x, Xn
z)
pm(z, y).
=
=
=
P(Xn+m
I Xo = I Xo =
x) x)P(Xn+m
L P(Xn
z z
I Xo = = z),
x, Xn
z)
L P'(», z)P(Xn+m
pn+m(x, y) =
=Y
I Xo =
x, X;
L pn(x,
z
z)pm(z,
y).
For Markov chains having a finite number of states, (24) allows us to think of P" as the nth power of the matrix P, an idea we will pursue in Section 1.4.2. Let no be an initial distribution for the Markov chain. Since
P(Xn
y)
=
=
L P(Xo
x
= =
x, X; x)P(Xn
y)
L P(Xo
x
I x, =
x),
we see that
(25) P(Xn
y)
L no(x)pn(x,
x
y).
This formula allows us to compute the distribution of Xn in terms of the initial distribution no and the n-step transition function P".
14
Markov Chains
of Xm observe
P(Xn+l
y)
= L P(Xn =
x x
X,
Xn+l
y)
= L P(Xn =
so that (26)
x)P(Xn+l
Y I X;
x),
P(Xn+l
= y) =
L P(Xn
x
x)P(x, y).
If we know the distribution of Xo, we can use (26) to find the distribution of Xl' Then, knowing the distribution of Xl' we can use (26) to find the distribution of X2• Similarly, we can find the distribution of Xn by applying (26) n times. We will use the notation Px( ) to denote probabilities of various events defined in terms of a Markov chain starting at x. Thus
denotes the probability that a Markov chain starting at x is in a state a at time 3 but not at time I or at time 2. In terms of this notation, (19) can be rewritten as
(27)
P(Xn+l
Bh· .. , Xn+m E Bm I Xo
Ao,··.,
Xn-l
E
An-I' Xn = X)
E
= PiXl
1 .4.1 . H itting ti meso T....of A is defined by
Bl, ... , Xm
Bm).
A)
if Xn E A for some n > 0, and by T.... = 00 if Xn ¢ A for all n > O. In other words, T....is the first positive time the Markov chain is in (hits) A. Hitting times play an important role in the theory of Markov chains. In this book we will be interested mainly in hitting times of sets consisting of a single point. We denote the hitting time of a point a E f/ by 'Fa rather than by the more cumbersome notation 1{a}' An important equation involving hitting times is given by (28)
pn(x, y) =
m=l
PiTy
= m)pn-m(y,
y),
n>1.
In order to verify (28) we note that the events {Ty = m, Xn = y}, 1 < m < n, are disjoint and that
{Xn
= y} = U {1'y =
m=1
m, X n
= y}.
1.4.
15
We have in effect decomposed the event {Xn = y} according to the hitting time of y. We see from this decomposition that
m=l
n
L m=l
m=l
= =
=
m)P(Xn m)P(Xn
= =
I Xo = x, Ty = m)
L
n
Y I Xo
X,
Xl =F y, ... ,
-L
Px(Ty
m)pn-m(y,
y),
m=l
and hence that (28) holds. Show that if a is an absorbing state, then P''(x, a) PxCTa < n), n ~ 1.
Example 8. If a is an absorbing state, then pn-m(a,
a)
m=l
L
n
PxCTa = m)pn-m(a,
a)
L m=l
Observe that
2)
z¢y
PxCX 1
Z,
X2
y)
z¢y
P(x, z)P(z,
y).
PxCTy = n
1) =
z¢y
n>1.
is a consequence of (27), but it should also be directly in order to go from x to y for the first time at time n + 1, to go to some state z =1= y at the first step and then go from first time at the end of n additional steps.
16
Markov Chains
1.4.2. Transition matrix. Suppose now that the state space !/ is finite, say !/ = {O, 1, ... ,d}. In this case we can think of P as the transition matrix having d + 1 rows and columns given by
~ [p(~,
d F or example, {O, 1, 2, 3} is
0)
...
d P(O;
d1
P(d, 0) matrix
the transition
0 0
1 2
p2(X, y)
L P(x,
z
z )P(z, y).
Recalling the definition of ordinary matrix multiplication, we observe that the two-step transition matrix p2 is the product of the matrix P with itself. More generally, by setting m = 1 in (24) we see that (30)
pn+ lex, y) =
L pn(x,
z
z )P(z, y).
It follows from (30) by induction that the n-step transition matrix P" is the nth power of P. An initial distribution TCo can be thought of as a (d + I)-dimensional row vector
1to
(TCo(O), ...
If we let
1tn
denote the (d
1tn
I)-dimensional
= (P(Xn =
and
The two-state Markov chain discussed in Section 1.1 is one of the few examples where P" can be found very easily.
1.5.
17
Example 9.
transition matrix
p=
[1 -
1- q ,
p]
where p
p+q
(1 - P -
qt
p+q
=
P
=0
In
(4) and
p+q
p+q
q p+q
- (1 - p - qt
q p+q
+ (1 -
p - qt
p+q
+q q
[q
p] + (1 - p P
+q
q)n [
-q
p-
Pq] .
1.5.
Let Xn, n > 0, be a Markov chain having state space !/ and transition function P. Set Pxy = PxCTy < (0). Then Pxy denotes the probability that a Markov chain starting at x will be in state y at some positive time. In particular, Pyy denotes the probability that a Markov chain starting at y will ever return to y. A state y is called recurrent if Pyy = 1 and transIent if Pyy < 1. If y is a recurrent state, a Markov chain starting at y returns to y with probability one. If y is a transient state, a Markov chain starting at y has positive probability 1 Pyy of never returning to y. If y is an absorbing state, then PiTy = 1) =
18
Markov
Chains
P(y, y) = 1 and hence Pyy = 1; thus an absorbing state is necessarily recurrent. Let ly(z), Z E !/, denote the indicator function of the set {y} defined by
= y, Z =1= y.
Z
Let N(y) denote the number of times n > 1 that the chain is in state y. Since ly(Xn) = 1 if the chain is in state y at time nand ly(Xn) = 0 otherwise, we see that (31) N(y)
=
n= 1
I:
00
liXn)'
The event {N (y) > I} is the same as the event {Ty < co}. Thus PiN(y)
> 1)
Let m and n be positive integers. By (27), the probability that a Markov chain starting at x first visits y at time m and next visits y n units of time later is PiTy = m)Py(Ty = n). Thus PiN(y)
> 2) =
m= 1 n= 1
I: I:
00
00
PiTy
m)PiTy
n)
= PxyPyy•
m>1.
> m) - PiN(y)
>m
m>1.
1),
it follows from (32) that (33) Also Px(N(y) so that (34) PiN(y)
= =
PiN(y)
m) = PXyp;y-l(1
- Pyy),
0) = 1 - PiN(y)
> 1),
0) = 1 - Pxy'
These formulas are intuitively obvious. To see why (33) should be true, for example, observe that a chain starting at x visits state y exactly m times if and only if it visits y for a first time, returns to y m - I additional times, and then never again returns to y.
19
We use the notation Ei ) to denote expectations of random variables defined in terms of a Markov chain starting at x. For example, (35)
It follows from (31) and (35) that
n=1
CXl
CXl
Ex(1y(X n))
= ~ pn(x, y).
n= 1
Set
G(x, y)
= ExCN(y)) = ~
CXl
pn(x, y).
n= 1
Then G(x, y) denotes the expected number of visits to y for a Markov chain starting at x.
Theorem 1 (i) Let y be a transient state. Then
PiNey)
and (36) G(x, y) =
< (0)
=1
Pxy , 1 - Pyy
XE
9',
9'.
Then PiNey)
(0)
1 and
PiNey)
(0)
= PiTy < = 0,
while
(0)
= Pxy,
XE
9'.
= 0,
then G(x, y)
G(x, y)
00.
This theorem describes the fundamental difference between a transient state and a recurrent state. If y is a transient state, then no matter where the Markov chain starts, it makes only a finite number of visits to y and the expected number of visits to y is finite. Suppose instead that y is a recurrent state. Then if the Markov chain starts at y, it returns to y infinitely often. If the chain starts at some other state x, it may be impossible for it to .ever hit y. If it is possible, however, and the chain does visit y at least once, then it does so infinitely often.
20
Markov Chains
Let y be a transient state. Since 0 < Pyy < 1, it follows from = 00) = lim PJlJN(y) ::::: ) = lim PXyp~y-1 = m
111-+00 111-+00
PiNey) By (33)
o.
G(x, y) = Ex(N(y»
= I:
111=1 00
00
mPiN(y)
m) - Pyy)·
I: mpXyp~y-1(1
111=1
Substituting
~ m~
111=1
~-1
1 , (1 - t)2
Pxy
<
00.
This completes the proof of (i). Now let y be recurrent. Then Pyy = 1 and it follows from (32) that
Px(N(y)
(0)
lim Px(N(y)
111-+00
> m)
= lim Pxy
111-+00
= Pxy-
In particular, PiNey) = (0) = 1. If a nonnegative random variable has positive probability of being infinite, its expectation is infinite. Thus G(y, y) 0, then PiTy implies that P"(x, y) Pxy > 0, then PiNey)
Ey(N(y»
00.
If Pxy
= m) = 0 for all finite positive integers m, so = 0, n > 1; thus G(x, y) = 0 in this case.
= (0) = Pxy
(28) If
G(x, y) = EiN(y» This completes the proof of Theorem 1. Let y be a transient state. Since
I: P"(x,
00 ,,=1
y) = G(x, y)
<
00,
XE
I/,
0,
XE
I/.
t.6.
21
A Markov chain is called a transient chain if all of its states are transient and a recurrent chain if all of its states are recurrent. It is easy to see that a Markov chain having a finite state space must have at least one recurrent state and hence cannot possibly be a transient chain. For if 9' is finite and all states are transient, then by (38)
o=
ye[F
lim pn(x, y)
n->oo
= lim
=
L
=
pn(x, y)
E
n->ooye9'
lim PiXn
n-> 00
9')
=
which is a contradiction.
lim 1
n-> 00
1,
1.6.
Let x and Y be two not necessarily distinct states. We say that x leads Y if Pxy > O. It is left as an exercise for the reader to show that x leads Y if and only if P''(x, y) > 0 for some positive integer n. It is also left the reader to show that if x leads to Y and y leads to z, then x leads to Theorem 2 Let x be a recurrent state and suppose that x leads to y. Then y is recurrent and Pxy = Pyx = 1. Proof Since
< (0)
Pxy > 0,
we see that PiTy = n) > 0 for some positive integer n. Let no be the least such positive integer, i.e., set
(39)
no
n) > 0).
It follows easily from (39) and (28) that pno(x, y) > 0 and (40) pm(x, y) = 0, 1 < m < no.
Since pnO(x, y) > 0, we can find states Yl' ... , Yno-l such that
PiXl
Yno-l' Xno
y)
= Pix,
y) > O.
None of the states Yb ... , Yno-l equals x or Y; for if one of them did equal x or y, it would be possible to go from x to Y with positive probability in fewer than no steps, in contradiction to (40).
22
Markov Chains
We will now show that Pyx = 1. Suppose on the contrary that Pyx < 1. Then a Markov chain starting at Y has positive probability 1 - Pyx of never hitting x. More to the point, a Markov chain starting at x has the positive probability
P(x, Yl) ... P(Yno-l, y)(l - Pyx)
of visiting the states Yl, ... , Yno-l' Y successively in the first no times and never returning to x after time no. But if this happens, the Markov chain never returns to x at any time n > 1, so we have contradicted the assumption that x is a recurrent state. Since Pyx = 1, there is a positive integer n1 such that pnl(y, x) > 0. Now
Pnl+n+no( y, y ) - P Y(X nl +n+no -) Y -
> P,(Xnl
Hence
G(y, y) >
n=nl+1+no
X,
Xn1
+n
X,
Xn1 +n+no
y)
L
00
pn(y,
y)
L
00
pnl +n+no(y, y)
n=l
L
00
pn(x, x) x) =
n=l
pnl(y,
x)PnO(x, y)G(x,
+ 00,
from which it follows that Y is also a recurrent state. Since Y is recurrent and Y leads to x, we see from the part of the theorem that has already been verified that Pxy = 1. This completes the proo( I A nonempty set C of states is said to be closed if no state inside of C leads to any state outside of C, i.e., if
(41) Pxy = 0, pn(x, y) = 0,
XE
C and Y 1= c.
= 0,
XE
C and Y 1= C
zeC
23
and (42) follows by induction. If C is closed, then a Markov chain starting in C will, with probability one, stay in C for all time. If a is an absorbing state, then {a} is closed. A closed set C is called irreducible if x leads to y for all choices of x and y in C. It follows from Theorem 2 that if C is an irreducible closed set, then either every state in C is recurrent or every state in C is transient. The next result is an immediate consequence of Theorems 1 and 2. Corollary 1 Let C be an irreducible closed set of recurrent states. Then Pxy = 1, Px(N(y) = (0) = 1, and G(x, y) = 00 for all choices of x and y in C. An irreducible Markov chain is a chain whose state space is irreducible, that is, a chain in which every state leads back to itself and also to every other state. Such a Markov chain is necessarily either a transient chain or a recurrent chain. Corollary 1 implies, in particular, that an irreducible recurrent Markov chain visits every state infinitely often with probability one. We saw in Section 1.5 that if [/ is finite, it contains at least one recurrent state. The same argument shows that any finite closed set of states contains at least one recurrent state. Now let C be a finite irreducible closed set. We have seen that either every state in C is transient or every state in C is recurrent, and that C has at least one recurrent state. It follows that every state in C is recurrent. We summarize this result: Theorem 3 Let C be a finite irreducible closed set of states. Then every state in C is recurrent. Consider a Markov chain having a finite number of states. Theorem 3 implies that if the chain is irreducible it must be recurrent. If the chain is not irreducible, we can use Theorems 2 and 3 to determine which states are recurrent and which are transient. Example 10. Consider a Markov chain having the transition matrix
0 0
1 2
3 4 5
Determine which states are recurrent and which states are transient.
24
Markov Chains
As a first step in studying this Markov chain, we determine by inspection which states lead to which other states. This can be indicated in matrix form as 0 0
1 1
5 0
2
3 4
+ + + + +
The x, y element of this matrix is + or 0 according as PXY is positive or zero, i.e., according as x does or does not lead to y. Of course, if P(x, y) > 0, then Pxy > O. The converse is certainly not true in general. For example, P(2, 0) = 0; but
P2(2,0)
P(2, I)P(I, 0)
= t· i- = lo >
0,
so that P20 > O. State 0 is an absorbing state, and hence also a recurrent state. We see clearly from the matrix of + 's and O's that {3, 4, 5} is an irreducible closed set. Theorem 3 now implies that 3, 4, and 5 are recurrent states. States 1 and 2 both lead to 0, but neither can be reached from O. We see from Theorem 2 that 1 and 2 must both be transient states. In summary, states 1 and 2 are transient, and states 0, 3, 4, and 5 are recurrent. Let f/ Tdenote the collection of transient states in f/, and let f/ R denote the collection of recurrent states in 9'. In Example 10, 9'T = {I, 2} and 9' R = {O,3, 4, 5}. The set 9'R can be decomposed into the disjoint irreducible closed sets C1 = {O} and C2 = {3, 4, 5}. The next theorem shows that such a decomposition is always possible whenever f/ R is nonempty.
Theorem 4 Suppose that the set 9'R of recurrent states is nonempty. Then 9'R is the union of a finite or countably infinite number of disjoint irreducible closed sets C1, C2, ••••
Choose x E 9'R and let C be the set of all states y in 9'R such that x leads to y. Since x is recurrent, Pxx = I and hence x E C. We will now verify that C is an irreducible closed set. Suppose that y is in C and y leads to z. Since y is recurrent, it follows from Theorem 2 that z is recurrent. Since x leads to y and y leads to z, we conclude that x leads to z. Thus z is in C. This shows that C is closed. Suppose that y and z are both in C. Since x is recurrent and x leads to y, it follows from
Proof
1.6.
25
Theorem 2 that y leads to x. Since y leads to x and x leads to z, we conclude that y leads to z. This shows that C is irreducible. To complete the proof of the theorem, we need only show that if C and D are two irreducible closed subsets of 9'R, they are either disjoint or identical. Suppose they are not disjoint and let x be in both C and D. Choose y in C. Now x leads to y, since x is in C and C is irreducible. Since D is closed, x is in D, and x leads to y, we conclude that y is in D. Thus every state in C is also in D. Similarly every state in D is also in C, so that C and D are identical. I We can use our decomposition of the state space of a Markov chain to understand the behavior of such a system. If the Markov chain starts out in one of the irreducible closed sets C, of recurrent states, it stays in C, forever and, with probability one, visits every state in C, infinitely often. If the Markov chain starts out in the set of transient states 9' T' it either stays in 9'T forever or, at some time, enters one of the sets C, and stays there from that time on, again visiting every state in that C, infinitely often. 1.6.1 Absorption probabilities. Let C be one of the irreducible closed sets of recurrent states, and let pc<x) = PiTc < (0) be the probability that a Markov chain starting at x eventually hits C. Since the chain remains permanently in C once it hits that set, we call Pc(x) the probability that a chain starting at x is absorbed by the set C. Clearly Pc(x) = 1, X E C, and Pc(x) = 0 if x is a recurrent state not in C. It is not so clear how to compute pcCx) for x E 9'n the set of transient states. If there are only a finite number of transient states, and in particular if 9' itself is finite, it is always possible to compute Pc(x), x E 9't» by solving a system of linear equations in which there are as many equations as unknowns, i.e., members of 9'T. To understand why this is the case, observe that if x E 9'T, a chain starting at x can enter C only by entering C at time 1 or by being in 9'T at time 1 and entering C at some future time. The former event has probability Lyec P(x, y) and the latter event has probability Lye.9'T P(x, y)Pc(y). Thus
(44)
Pc(x) =
yeC
1:
P(x, y)
+ 1:
P(x, y)pcCy),
ye.9'T
Equation (44) holds whether 9' T is finite or infinite, but it is far from clear how to solve (44)for the unknowns pcCx), x E 9'r» when fl'T is infinite. An additional difficulty is that if 9'T is infinite, then (44) need not have a unique solution. Fortunately this difficulty does not arise if 9'T is finite.
26
Markov Chains
Suppose the set 9'T of transient states is finite and let C be an irreducible closed set of recurrent states. Then the system of equations (45) f(x)
=
yeC
Theorem 5
P(x, y)
+ L P(x, y)f(y),
ye[/T
pc<x) ,
Proof
P(y, z)
+ L P(y, z)f(z),
ze
[/T
yeC
P(x, y)
ye9'T
LL
P(x, y)P(y, z)
zeC
+L
ye[/T
L
ze[/T
The sum of the first two terms is just PxCTc < 2), and the third term reduces to LZe[/T p2(X, z)f(z), which is the same as Lye[/T p2(X, y)f(y). Thus f(x) = PxCTc < 2) + L P2(X, y)f(y).
ye[/T
By repeating this argument indefinitely or by using induction, we conclude that for all positive integers n (47) f(x)
E
= PxCTc <
n)
+ L pn(x, y)f(y),
ye[/T
XE
9' and
9'T.
According to the assumptions of the theorem, 9'T is a finite set. It therefore follows from (48) that the sum in (47) approaches zero as n -+ 00. Consequently for x E 9'T f(x) as desired.
Example 11.
=
I
Consider the Markov chain discussed in Example 10. and
Find From (44) and the transition matrix in Example 10, we see that PIO and P20 are determined by the equations
1.6.
27
and
Pzo
tpI0
+ jPZO'
Solving these equations we find that PI0 = t and Pzo = t· By similar methods we conclude that P{3,4,5}(1) = t and P{3,4,5}(2) = -t. Alternatively, we can obtain these probabilities by subtracting p{o}(I) and p{0}(2) from 1, since if there are only a finite number of transient states,
(49)
f/ T
=~
i
Px(Tcl
Since there are only a finite number of transient states and each transient state is visited only finitely many times, the probability PX(T[/'R < (0) that a recurrent state will eventually be hit is 1, so (49) holds. Once a Markov chain starting at a transient state x enters an irreducible closed set C of recurrent states, it visits every state in C. Thus
(50)
Pxy
pcCx),
XE
f/ T and Y
C.
It follows from (50) that in our previous example P13 = P14 = P15 = P{3,4,5}(1)
= =
and
PZ3 = PZ4 = PZ5 = P{3,4,5}(2)
t -t.
1.6.2. Martingales. Consider a Markov chain having state space {O, ... , d} and transition function P such that
d
(51)
~
y=O
yP(x, y)
=
=
x,
x = 0, ... , d.
Now
E[Xn+l
I Xo
=
d
Xo,""
Xn-1
=
Xn-l'
Xn = x]
=
=~
d
y=O
yP[Xn+1
yP(x, y)
I Xo
Xo,""
Xn-1
xn-1, X;
x]
=~
y=O
I Xo
Xo,""
Xn-1
= xn-1,
Xn = x] = x,
i.e., that the expected value of Xn+ 1 given the past and present values of Xo, ... , Xn equals the present value of Xn. A sequence of random variables
28
Markov Chains
having this property is called a martingale. Martingales, which need not be Markov chains, playa very important role in modern probability theory. They arose first in connection with gambling. If Xn denotes the capital of a gambler after time n and if all bets are "fair," that is, if they result in zero expected gain to the gambler, then Xn, n > 0, forms a martingale. Gamblers were naturally interested in finding some betting strategy, such as increasing their bets until they win, that would give them a net expected gain after making a series of fair bets. That this has been shown to be mathematically impossible does not seem to have deterred them from their quest. It follows from (51) that
y=o
yP(O, y) = 0,
and hence that P(O, 1) = ... = P(O, d) = 0. Thus is necessarily an absorbing state. It follows similarly that d is an absorbing state. Consider now a Markov chain satisfying (51) and having no absorbing states other than and d. It is left as an exercise for the reader to show that under these conditions the states 1, ... , d - 1 each lead to state 0, and hence each is a transient state. If the Markov chain starts at x, it will eventually enter one of the two absorbing states and d and remain there permanently. It follows from Example 8 that
EiXn)
=L
=
y=o
yPiXn
y)
y=o d-l
y=l
+ +
dpn(x, d) dPiTd
d-l
y= 1
< n).
Since states 1, 2, ... , d - 1 are transient, we see that P''(x, y) ~ n ~ 00 for y = 1, 2, ... , d - 1. Consequently,
n-+ co
° as
=
lim EiXn)
dPi1d
< (0)
= dPxd'
On the other hand, it follows from (51) (see Exercise 13(a)) that EXn EXn-1 = ... = EXo and hence that EiXn) = x. Thus lim EiX n)
n-+ co
X.
1.7.
29
By equating the two values of this limit, we conclude that (53) Since Pxo
Pxd =
d'
= 0, ... , d.
Pxd
d'
0, ... , d.
Of course, once (53) is conjectured, it is easily proved directly from Theorem 5. We need only verify that for x = 1, ... , d - 1, (54)
~ = P(x, d)
d-l
y=l
Clearly (54) follows from (51). The genetics chain introduced in Example 7 satisfies (51) as does a gambler's ruin chain on {a, 1, ... ,d} having transition matrix of the form
1 tOt tot
°
1
Suppose two gamblers make a series of one dollar bets until one of them goes broke, and suppose that each gambler has probability t of winning any given bet. If the first gambler has an initial capital of x dollars and the second gambler has an initial capital of d - x dollars, then the second gambler has probability Pxd = xjd of going broke and the first gambler has probability 1 - (x/d) of going broke.
1.7. Birth and death chains
tOt
For an irreducible Markov chain either every state is recurrent or every state is transient, so that an irreducible Markov chain is either a recurrent chain or a transient chain. An irreducible Markov chain having only finitely many states is necessarily recurrent. It is generally difficult to decide whether an irreducible chain having infinitely many states is recurrent or transient. We are able to do so, however, for the birth and death chain.
30
Markov Chains
Consider a birth and death chain on the nonnegative integers or on the finite set {O,... ,d}. In the former case we set d = 00. The transition function is of the form
y = x-I, y = x, y
1,
00.
where Px + qx + rx = 1 for x E !/, qo = 0, and Pd = 0 if d < assume additionally that Px and qx are positive for 0 < x < d. For a and b in !/ such that a < b, set a < x < b,
We
and set u(a) = 1 and u(b) = O. If the birth and death chain starts at y, then in one step it goes to y - 1, y, or y + 1 with respective probabilities qy, r y' or py. It follows that a < y < b. Since ry (56) Set Yo
(57)
=
u(y
=
+ 1) -
u(y)
1 and
I
= qy Py
1
(u(y)
u(y -
1»,
a < y < b.
~, =
Y
q ... q
PI'"
Py
o<y
< d.
+ 1) -
u(y)
= ___b_ (u(y)
Yy-I
u(y -
1»,
a < y < b,
1) - u(y)
=
=
Ya+ 1 Ya Ya
•••
1) - u(a»
Yy (u(a
1) - u(a».
+ 1) = Yy (u(a)
Ya
u(a
+ 1»,
a < y < b.
Summing (58) on y = a, ... , b - 1 and recalling that u(a) = 1 and u(b) = 0, we conclude that
u(a) u(a
Ya
+ 1) -
1 Ly=a Yy
b-l .
31
1)
=
b~~
Ly=a
t,
x, ... , b -
~b-l
£...y=x
~b-l
'/y
')'y
'
<
<
b.
£...y=a
(59)
P (T < T.) =
x a b
~b-l
'/y
')'y
'
a < x < b.
P (T. < T) =
x
b
~b-l
~x-l £...y=a
'/y
')'y
£...y=a
'
<
<
b.
A gambler playing roulette makes a series of one dollar bets. He has respective probabilities 9/19 and 10/19 of winning and losing each bet. The gambler decides to quit playing as soon as his net winnings reach 25 dollars or his net losses reach 10 dollars. (a) Find the probability that when he quits playing he will have won 25 dollars. (b) Find his expected loss. The problem fits into our scheme if we let Xn denote the capital of the gambler at time n with Xo = 10. Then Xm n > 0, forms a birth and death chain on {O,1, ... , 35} with birth and death rates and Px = 9/19, qx = 10/19,
Example 12.
o o
States 0 and 35 are absorbing states. Formula (60) is applicable with a = 0, x = 10, and b = 35. We conclude that
'/y = (l0/9Y,
o<
y < 34,
and hence that P10(T35 < To) = L~=o (10/9)Y = (10/9)10 - 1 = .047. L;~o (10/9)Y (10/9)35 - 1 Thus the gambler .has probability .047 of winning 25 dollars. His expected loss in dollars is 10 - 35(.047), which equals $8.36.
32
Ms,kov Chsi".
In the remainder of this section we consider a birth and death chain on the nonnegative integers which is irreducible, i.e., such that Px > 0 for x > 0 and qx > 0 for x > 1. We will determine when such a chain is recurrent and when it is transient. As a special case of (59), (61) PI (To
<
Tn)
1-
L,=o
n- 1
Yy
'
n > 1.
Consider now a birth and death chain starting in state 1. Since the birth and death chain can move at most one step to the right at a time (considering the transition from state to state as movement along the real number line), (62) It follows from (62) that {To < Tn}, n > 1, forms a nondecreasing sequence of events. We conclude from Theorem 1 of Ch.apter 1 of Volume 11 that
(63)
n-+oo
> 1}.
Equation (62) implies that Tn ~ n and thus 1'" --4 00 as n --4 00; hence the event {To < 1'" for some n > I} occurs if and only if the event {To < co} occurs. We can therefore rewrite (63) as (64)
n-+oo
1-
:2:;=0
Y,
y=O
Yy =
00.
If the birth and death chain is recurrent, then P1(To < (0) = 1 and (66) follows from (65). To obtain the converse, we observe that P(O, y) = 0 for y > 2, and hence (67) Po(To < (0)
P(O,O)
+ P(O,
1 Paul G. Hoel, Sidney C. Port, and Charles J. Stone,/rttroductkm to Probablli'y Thor" (Boston: Houghton Mifflin Co., 1971), p. 13.
1.8.
33
Suppose (66) holds. Then by (65) P1(To < (0) = 1. From this and (67) we conclude that Po(To < (0) = P(O, 0)
Thus is a recurrent state, and since the chain is assumed to be irreducible, it must be a recurrent chain. In summary, we have shown that an irreducible birth and death chain on {a, 1, 2, ... } is recurrent if and only if
(68)
P(O, 1) = 1.
Q1'"
qx =
•• Px
00.
x= 1 Pl'
Example13. Consider the birth and death chain on {a, 1, 2, ... } defined by
x
Px
= 2(x
+2 + 1)
and
qx = 2(x
1)'
x ~ O.
X
X
,
_
it follows that
"I = q 1 =:«
x=
P1 ••• Px
(x
1·2···x 3 . 4 ... (x
+
x
2)
1)(x
2)
1
=2(
1) 2.
+1 t +t X
--
+2 + ! - t + ... )
1)
We conclude that the chain is transient. 1.8. Branching and queuing chains
In this section we will describe which branching chains are certain of extinction and which are not. We will also describe which queuing chains
34
Markov Chains
are transient and which are recurrent. The proofs of these results are somewhat complicated and will be given in the appendix to this chapter. These proofs can be skipped with no loss of continuity. It is interesting to note that the proofs of the results for the branching chain and the queuing chain are very similar, whereas the results themselves appear quite dissimilar. 1.8.1. Branching chain. Consider the branching chain introduced in Example 6. The extinction probability p of the chain is the probability that the descendants of a given particle eventually become extinct. Clearly
Suppose there are x particles present initially. Since the numbers of offspring of these particles in the various generations are chosen independently of each other, the probability Pxo that the descendants of each of the x particles eventually become extinct is just the xth power of the probability that the descendants of anyone particle eventually become extinct. In other words, (69)
Pxo _
P,
= 1, 2, ....
Recall from Example 6 that a particle gives rise to ~ particles in the next generation, where ~ is a random variable having density f If f(l) = 1, the branching chain is degenerate in that every state is an absorbing state. Thus we suppose that f(l) < 1. Then state 0 is an absorbing state. It is left as an exercise for the reader to show that every state other than 0 is transient. From this it follows that, with probability one, the branching chain is either absorbed at 0 or approaches + 00. We conclude from (69) that
x
n-+oo
= 1, 2, ....
Clearly it is worthwhile to determine p or at least to determine when p = 1 and when p < 1. This can be done using arguments based upon the formula
(70)
tI>(p) = p,
+L
y=l
00
f(y)tY,
O<t<1.
1.8.
35
pel, 0)
+L
y=1
00
00
pel, y)PyO
y=1
P(l, y)pY
f(y)pY
y=1
Let Jl denote the expected number of offspring of any given particle. Suppose Jl < 1. Then the equation <I)(t) = t has no roots in [0, 1) (under our assumption thatf(l) < 1), and hence p = 1. Thus ultimate extinction
is certain ifJl
< 1 andf(l)
Suppose instead that Jl root Po in [0, 1), and hence p equals either Po or 1. Actually p always equals Po. Consequently, if Jl > 1 the probability of ultimate extinction is
less than one.
The proofs of these results will be given in the appendix. The results themselves are intuitively very reasonable. If Jl < 1, then on the average each particle gives rise to fewer than one new particle, so we would expect the population to die out eventually. If Jl > 1, then on the average each particle gives rise to more than one new particle. In this case we would expect that the population has positive probability of growing rapidly, indeed geometrically fast, as time goes on. The case Jl = 1 is borderline; but since p = 1 when Jl < 1, it is plausible by "continuity" that p = 1 also when Jl = 1. Suppose that every man in a certain society has exactly three children, which independently have probability one-half of being a boy and one-half of being a girl. Suppose also that the number of males in the nth generation forms a branching chain. Find the probability that the male line of a given man eventually becomes extinct.
Example 14.
The density f of the number of male children of a given man is the binomial density with parameters n = 3 and p = t· Thus f(O) = i, f(l) = t,f(2) = t,f(3) = i, andf(x) = for x >4. The mean number of male offspring is Jl = t. Since Jl > 1, the extinction probability p is the root of the equation
36
Mllrkov Chllins
t3
or equivalently as
3t 2
5t
0,
(t - 1)(t2 + 4t - 1) = 0.
This equation has three roots, namely, 1, sequently, p = ,J"S - 2.
-,J"S -
2, and
,J"S -
2. Con-
1.8.2. Queuing chain. Consider the queuing chain introduced in Example 5. Let 1, and Jl be as in that example. In this section we will indicate when the queuing chain is recurrent and when it is transient. Let Jl denote the expected number of customers arriving in unit time. Suppose first that Jl > 1. Since at most one person is served at a time and on the average more than one new customer enters the queue at a time, it would appear that as time goes on more and more people will be waiting for service and that the queue length will approach infinity. This is indeed the case, so that if Jl > 1 the queuing chain is transient. In discussing the case Jl < 1, we will assume that the chain is irreducible (see Exercises 37 and 38 for necessary and sufficient conditions for irreducibility and for results when the queuing chain is not irreducible). Suppose first that Jl < 1. Then on the average fewer than one new customer will enter the queue in unit time. Since one customer is served whenever the queue is nonempty, we would expect that, regardless of the initial length of the queue, it will become empty at some future time. This is indeed the case and, in particular, 0 is a recurrent state. The case Jl = 1 is borderline, but again it turns out that 0 is a recurrent state. Thus if Jl < 1 and the
e e2' ...
In this section we will verify the results discussed in Section 1.8. To do so we need the following.
pee
(71)
Theorem 6 Let <I> be the probability generating function of a nonnegative integer-valued random variable and set Jl = (with Jl = + 00 if does not have finite expectation). If Jl < 1 and = 1) < 1, the equation
Ee
<I>(t) = t
has no roots in [0, 1). If Jl > 1, then (71) has a unique root Po in [0, 1).
1.9.
37
Graphs of cl>{t), 0 < t < 1, in three typical cases corresponding to fl < 1, fl = 1, and fl > 1 are shown in Figure 2. The fact that fl is the left-hand derivative ofcl>{t) at t = 1 plays a fundamental role in the proof of Theorem 6.
y y
t ~<1 Figure 2
Po
~>1
Proof
e.
Then
= f{O) + f(1)t
+ f(2)t2 +
3f(3)t2
+ ... + .. '.
fl.
2f(2)t
= f{O),
t-+ 1
Since cl>'{t) is nondecreasing in t, 0 < t < 1, we conclude that cl>'{t) < 1 for 0 < t < 1. Suppose next that fl = 1 and f{l) = = 1) < 1. Then fen) > 0 for some n > 2 (otherwise f{O) > 0, which implies that fl < 1, a contradiction). Therefore cl>'{t) is strictly increasing in t, o < t < 1. Since
pee
lim cl>'(t)
t"'1
= 1,
we again conclude that cl>'{t) < 1 for 0 < t < 1. Suppose now that fl ~ 1 and P{ = 1) < 1. We have shown that cl>'{t) < 1 for 0 < t < 1. Thus
- (cl>{t) - t) < 0, dt
<
< 1,
38
Markov Chains
and hence cI>(t) - t is strictly decreasing on [0, 1]. Since <1>(1) - 1 = 0, we see that <I>(t) - t > 0, < t < 1, and hence that (71) has no roots on [0, 1). This proves the first part of the theorem. Suppose next that Ii > 1. Then lim <I>'(t) > 1,
t r+ 1
so by the continuity of <1>' there is a number to such that < to < 1 and <I>'(t) > 1 for to < t < 1. It follows from the mean value theorem that
+ 3· 2f(3)t
+ ...
shows that <I>(t) = t has a
has no roots in (0, 1). This contradiction unique root in [0, 1).
Branching chain. Using Theorem 6 we see that the results for Ii < 1 follow as indicated in Section 1.8.1. Suppose u > 1. It follows from Theorem 6 that P equals Po or I, where Po is the unique root of the equation <I>(t) = tin [0, 1). We will show that P always equals Po. First we observe that since the initial particles act independently in giving rise to their offspring, the probability PiTo < n) that the descendants of each of the y > 1 particles become extinct by time n is given by Py(To < n) Consequently Pl(To for n >
1.9.1.
by Exercise 9(a)
= P(I, 0) = P(l, 0) = f(O)
< n + 1)
+ +
00
y= 1
L
00 00
y= 1
< n))Y
+L
y=l
f(y)(P1(To
< n))y,
1.9.
39
and hence (72) We will use (72) to prove by induction that (73) Now PI (To < 0) = 0 < Po, so that (73) is true for n = O. Suppose that (73) holds for a given value of n. Since <1>(/) is increasing in I, we conclude from (72) that P1(To n ~ O.
n > O.
and thus (73) holds for the next value of n. By induction all n > O. By letting n -+ 00 in (73) we see that P
Pl(To
Since P is one of the two numbers Po or 1, it must be the number Po. 1.9.2. Queuing chain. We will now verify the results of Section 1.8.2. Let en denote the number of customers arriving during the nth time period. Then l' are independent random variables having common density f, mean 11, and probability generating function <1>. It follows from Exercise 9(b) and the identity P(O, z) = P(l, z), valid for a queuing chain, that Poo = PlO' We will show that the number P = Poo = PlO satisfies the equation
e e2' ...
(74)
<I>(p) = p.
If 0 is a recurrent state, P = 1 and (74) follows immediately from the fact that <1>(1) = 1. To verify (74) in general, we observe first that by Exercise 9(b) Poo = P(O, 0) i.e., that (75) P = f(O)
y= 1
00
L
00
P(O, y)pyo,
+L
y=l
f(y)pyo·
In order to compute Pyo, Y = 1, 2, ... , we consider a queuing chain starting at the positive integer y. For n = 1,2, ... , the event {Ty-1 = n} occurs if and only if
n = min (m = min (m
> 0: > 0:
(el
1)
+ ... +
=m-
(em 1),
1) = Y -
1)
el + ... + em
40
Markov Chains
that is, if and only if n is the smallest positive integer m such that the number of new customers entering the queue by time m is one less than the number served by time m. Thus Py(TY-1 = n) is independent of y, and consequently Py,y-l = Py(TY-1 < 00) is independent of y for y = 1, 2, .... Since PIO = p, we see that
Py,y-l
Py-l,y-2
= ... = PIO = p.
Now the queuing chain can go at most one step to the left at a time, so in order to go from state y > 0 to state 0 it must pass through all the intervening states y - 1, ... , 1. By applying the Markov property we can conclude (see Exercise 39) that (76) P yO - P y,y - 1P y 1 ,y - 2
"'P
10 -
- Py •
= f(O) + ~
00
f(y)py
y=l
~(p),
so that (74) holds. Using (74) and Theorem 6 it is easy to see that if p. < 1 and the queuing chain is irreducible, then the chain is recurrent. For p satisfies (74) and by Theorem 6 this equation has no roots in [0, 1) (observe that P( 1 = 1) < 1 if the queuing chain is irreducible). We conclude that p = 1. Since Poo = p, state 0 is recurrent, and thus since the chain is irreducible, all states are recurrent. Suppose now that p. > 1. Again p satisfies (74) which, by Theorem 6, has a unique root Po in [0, 1). Thus P equals either Po or 1. We will prove that p = Po. To this end we first observ.e that by Exercise 9(a)
P1(To < n
1)
P(l, 0)
+~
00
y=l
1) = f(O)
+~
y= 1
00
f(y)Py(To
< n).
To verify (78) observe that if a queuing chain starting at y reaches 0 in n or fewer steps, it must reach y - 1 in n or fewer steps, go from y - 1 to Y - 2 in n or fewer steps, etc. By applying the Markov property we can conclude (see Exercise 39) that (79) Py(To < n) < Py(TY-1
< n)Py-1(I'y-2
< n)···
Exercises Since 1 < z < y, (78) is valid. It follows from (77) and (78) that
41
P1(To < n
i.e., that (80)
1) < f(O)
+ I:
00
y=t
f(y)(Pt(To
< n))Y,
n > O.
n > 0,
by a proof that is almost identical to the proof that (72) implies (73) (the slight changes needed are left as an exercise for the reader). Just as in the proof of the corresponding result for the branching chain, we see by letting n -+ 00 in (81) that P < Po and hence that p = Po. We have shown that if J.l > 1, then Poo = P < 1, and hence 0 is a transient state. It follows that if J.l > 1 and the chain is irreducible, then all states are transient. If J.l > I and the queuing chain is not irreducible, then case (d) of Exercise 38 holds (why?), and it is left to the reader to show that again all states are transient.
Exercises
1 Let Xm n ~ 0, be the two-state Markov chain.
0 and X2
Find
0),
Suppose we have two boxes and 2d balls, of which d are black and d are red. Initially, d of the balls are placed in box 1, and the remainder of the balls are placed in box 2. At each trial a ball is chosen at random from each of the boxes, and the two balls are put back in the opposite boxes. Let Xo denote the number of black balls initially in box I and, for n ~ 1, let Xn denote the number of black balls in box 1 after the nth trial. Find the transition function of the Markov chain Xn, n > O. more customers waiting to be served at the start of a period, there is probability p that one customer will be served during that period and probability 1 - p that no customers will be served during that period. Find the transition function for this modified queuing chain.
3 Let the queuing chain be modified by supposing that if there are one or
42
Markov Chains
sets mentioned below are all in d. (a) Show that if Di are disjoint and P(C I Di) = p independently of i, then P(C I D = p. (b) Show that if c, are disjoint, then P(Ui c, I D) = Li P(Ci I D). (c) Show that if E, are disjoint and E, = n, then
Ui i)
P(C
Ui
I D)
L P(Ei i
I D)P(C I s, n D).
= PCB I Ci) for all i,
(d) Show that if C, are disjoint and Pt A lei) then peA I UiCi) = PCB I UiCi)'
5 Let Xm n
= =
n). n).
x = 0, ... , d.
P(Xo
8
I Xl
Xo
I Xl
Xl)'
Let X and y be distinct states of a Markov chain having d < 00 states and suppose that X leads to y. Let no be the smallest positive integer such that pnO(x, y) > 0 and let Xl' ... ,xno-l be states such that P(x, Xl)P(Xl, X2)'" P(Xno-2' Xno-l)P(Xno-l,
y) > O.
(a) Show that x, Xl' ... , xno-l' yare distinct states. (b) Use (a) to show that no < d - l. (c) Conclude that PxCTy < d - 1) > O.
9 Use (29) to verify the following identities:
1) = P(x, y)
n),
n> -
o· ,
P(x, y)
+ ~ P(x, z)pzy.
z*y
10 Consider the Ehrenfest chain with d = 3. (a) Find Px(To = n) for X E 9' and 1 < n (b) Find P, p2, and
r>.
< 3.
1to
1tl, 1t2,
Exercises
43
11
Consider the genetics chain from Example 7 with d = 3. (a) Find the transition matrices P and P". (b) If Xo = (0, i, i, 0), find Xl and X2· (c) Find Pi1{0,3} = n), x E 9', for n = 1 and n = 2. Consider the Markov chain having state space {O, 1, 2} and transition matrix
0
12
P=l
2
[I
y)
=
0 0
12 1 0 1
-p
0
~].
XE
(a) Find p2. (b) Show that p4 = p2. (c) Find P", n > 1.
13
Let Xm n > 0, be a Markov chain whose state space 9' is a subset of {O,1, 2, ... } and whose transition function P is such that
L yP(x,
y
Ax
+ B,
9',
for some constants A and B. (a) Show that EXn+1 = AEXn (b) Show that if A 1= 1, then
EXn =
14
B.
I-A
+ An
(EX
0_
I-A
).
Let Xm n > 0, be the Ehrenfest chain on {O,1, ... ,d}. Show that the assumption of Exercise 13 holds and use that exercise to compute EiXn).
n=O
L
00
pn(x, y) <
n=O
L
00
pn(y, y).
16 17 18
Show that n.
Pxy
Show that if x leads to y and y leads to z, then x leads to z. Consider a Markov chain on the nonnegative integers such that, starting from x, the chain goes to state x + 1 with probability p, o < p < 1, and goes to state 0 with probability 1 - p. (a) Show that this chain is irreducible. (b) Find Po(To = n), n > 1. (c) Show that the chain is recurrent.
44
Markov Chains
19
Consider a Markov chain having transition matrix 0123 0 t 0 t .\: 1 0010 2 0001 3 0100 4 0000 5 0000 6 0000
state space {O,1, ... , 6} and 56 t00 000 000 000 t0t tt0 0tt 4
(a) Determine which states are transient and which states are recurrent. (b) Find POY' y = 0, ... , 6. 20 Consider the Markov chain on {O, 1, ... , 5} having transition matrix 0 1 2 3 4 5 12345 tt0000 0 t t 0 0 i. 0 00t080 .\: .\: 0 0 .\: .\: 0 0 i 0 .\: 0 1 1 0 5 0 .1 5 15 5 0
21
(a) Determine which states are transient and which are recurrent. (b) Find p{O,1}(x), x = 0, ... , 5. Consider a Markov chain on {O, 1, ... , d} satisfying (51) and having no absorbing states other than 0 and d. Show that the states 1, ... , d - 1 each lead to 0, and hence that each is a transient state. the genetics chain introduced in Example 7 satisfies Equation (51).
22 Show that
e) (~r(I - ~r-Y.
o
< x < d.
Find p{O}(x), 0 < x < 2d. 24 Consider a gambler's ruin chain on {O, 1, ... ,d}.
Find
respective probabilities 9/19 and 10/19 of winning and losing each bet. The gambler decides to quit playing as soon as he either is one dollar ahead or has lost his initial capital of $1000. (a) Find the probability that when he quits playing he will have lost $1000. (b) Find his expected loss.
Exerci.e.
26 Consider a birth and death chain on the nonnegative
45
Px > 0 and qx > 0 for x > 1. (a) Show that if L;'=o Yy = 00, then Pxo = 1, x > 1. (b) Show that if L;'=o Yy < 00, then P xO - L:'=x Yy £"y=O Yy
~CX) ,
x > 1. -
27 Consider a gambler's ruin chain on {O, 1, 2, ... }. (a) Show that if q > p, then Pxo = 1, x > 1. (b) Show that if q < p, then Pxo = (q/p)x, x > 1.
integers.
28 Consider
an irreducible birth and death chain on the nonnegative Show that if Px < qx for x > 1, the chain is recurrent. birth and death chain on the nonnegative
qx _ (
X
X )2
1'
x > 1.
(a) Show that this chain is transient. (b) Find PxO' x >l. Hint: Use
I:;=
l/y2 = n2/6.
Exercise
26 and
the formula
(a) Compute PxCFa < Tb) for a < x < h. (b) Compute PxO, x > O.
31
Consider a branching chain such that f(l) state other than 0 is transient.
<
1.
32 Consider
the branching chain described in Example 14. If a given man has two boys and one girl, what is the probability that his male line will continue forever? Consider a branching chain probability p of extinction. with f(O)
33
f(3)
1/2.
Find
the
34 Consider
a branching chain with f(x) = p(1 - p)x, x > 0, where o < p < 1. Show that P = 1 if p > 1/2 and that p = p/(l - p) if p < 1/2. Let Xm n > 0, be a branching Hint: See Exercise 13. chain. Show that
36
EiXn)
xp",
Let Xm n > 0, be a branching chain and suppose that the associated random variable ~ has finite variance (J2. (a) Show that E[X;+1 I X; = x] = X(12 + X2Jl2. (b) Use Exercise 35 to show that EiX;
+ 1)
= =
x}t"u1
2 + 11Ex(X ;).
Lx P(X
x)E[Y
IX =
x].
46
Markov Chains
+ ... +
p2(n-l»
x2p2n,
n ~ 1.
(d) Show that if there are x particles initially, then for n > 1
p =I 1,
p=1.
37 Consider the queuing chain.
(a) Show that if either J(O) = 0 or J(O) + J(I) = 1, the chain is not irreducible. (b) Show that if'j'(O) > 0 andJ(O) + J(1) < 1, the chain is irreducible. Hint: First verify that (i) Pxy > 0 for 0 < y < x; and (ii) if xo > 2 and J(xo) > 0, then PO,xo+n(xo-l) > 0 for n > O.
38
Determine which states of the queuing chain are absorbing, which are recurrent, and which are transient, when the chain is not irreducible. Consider the following four cases separately (see Exercise 37): (a) J(I) = 1; (b) J(O) > 0, J(I) > 0, and J(O) + J(I) = 1; (c) J(O) = 1; (d) J(O) = 0 andJ(I) < 1. Consider the queuing chain. (a) Show that for y > 2 and m a positive integer
39
Py(To
m)
m-l
k=l
Py(TY-1
k)Py-1(To
m - k).
y > 2.
(c) Why does Equation (76) follow from (b)? (d) By summing the equation in (a) on m = 1, 2, ... , n, show that
> 2.
Let Xn, n > 0, be a Markov chain having state space [/' and transition function P. If n(x), x E [/', are nonnegative numbers summing to one, and if (1)
I n(x)P(x,
x
y) = n(y),
E [/',
E [/'.
Then, as we will soon see, regardless of the initial distribution of the chain, the distribution of Xn approaches n as n -+ 00. In such cases, n is sometimes called the steady state distribution. In this chapter we will determine which Markov chains have stationary distributions, when there is such a unique distribution, and when (2) holds. 2.1. Let Elementary properties of stationary distributions
tt
be a stationary distribution.
Then z)P(z, y)
z
I n(x)p2(x,
x
y) =
I n(x) L P(x,
X
= ~ (~
=
%
L n(z )P(z,
=
n(y).
r:: lex,
we conclude that for all n (3)
y)
L pn(x, z )P(z,
z
y),
I n(x)pn(x,
x
y) = n(y),
47
E [/'.
48
Stationary Distributions
of a Markov Chain
tt
P(Xn = y) = 'Ir(Y),
E //,
and hence that the distribution of Xn is independent of n. Suppose conversely that the distribution of Xn is independent of n. Then the initial distribution 'lro is such that 'lro(y) = P(X
0
= y) = P(X
= y) =
L 'lro(x)P(x,
x
y).
Consequently 'lro is a stationary distribution. In summary, the distribution of Xn is independent of n if and only if the initial distribution is a stationary distribution. Suppose now that 'Ir is a stationary distribution and that (2) holds. Let 'lro be the initial distribution. Then
(5)
P(Xn = y) =
L 'lro(x)pn(x, x
y),
E //.
By using (2) and the bounded convergence theorem stated in Section 2.5, we can let n -+ 00 in (5), obtaining lim P(Xn = y) =
n->
00
L 'lro(x)'Ir(Y)·
x
Since
(6)
Lx
s:
Formula (6) states that, regardless of the initial distribution, for large values of n the distribution of Xn is approximately equal to the stationary distribution n: It implies that tt is the unique stationary distribution. For if there were some other stationary distribution we could use it for the initial distribution 'lro. From (4) and (6) we would conclude that 'lro(Y) = 'Ir(Y), Y E //. Consider a system described by a Markov chain having transition function P and unique stationary distribution n, Suppose we start observing the system after it has been going on for some time, say no units of time for some large positive integer no. In effect, we observe Yn, n > 0, where n>
o.
The random variables Yn, n > 0, also form a Markov chain with transition function P. In order to determine unique probabilities for events defined in terms of the Yn chain, we need to know its initial distribution, which is the same as the distribution of Xno. In most practical applications it is very
2.2.
Examples
49
hard to determine this distribution exactly. We may have no choice but to assume that Yn, n > 0, has the stationary distribution n for its initial distribution. This is a reasonable assumption if (2) holds and no is large.
2.2. Examples
In this section we will consider some examples in which we can show directly that a unique stationary distribution exists and find simple formulas for it. In Section 1.1 we discussed the two-state Markov chain on [/ = {O, I} having transition matrix
o
We saw that if p determined by
[I q
1
p
1- q .
nCO) =
and
n(1) =
For Markov chains having a finite number of states, stationary distributions can be found by solving a finite system of linear equations. Consider a Markov chain having state space [/ = {O,I, 2} and transition matrix
Example 1.
o! 1
2i
[t
o
12
Show that this chain has a unique stationary distribution n and find n, Formula (I) in this case gives us the three equations
nCO) nCO) nCO)
in-
346 323
+ n(l) + +
n( 1) n(1)
' '
+ +
342
n(2) = n(2).
50
Stationary Distributions
of a Markov Chain
Lx
n(1)
n(2)
1.
By subtracting twice the first equation from the second equation, we eliminate the term involving n(2) and find that n(l) = 5n(0)/3. We conclude from the first equation that n(2) = 3n(0)/2. From the fourth equation we now see that
n(O)(1
+ t)
=
6 2 5'
= 1,
.2.. • _[_ 3 25
--
.£ 5
It is readily seen that these numbers satisfy all four equations. Since they
n(1) =
t,
and
n(2) = f5'
Though it is not easy to see directly, (2) holds for this chain (see Section
i.e., that
r, > 0
and
for for
O<x<d
if d is finite, and that Px > 0 and for O<x<oo if d is infinite. Suppose d is infinite. The system of equations for O<x<oo
L n(x)P(x,
x
y) = n(y),
fI',
2.2.
Examples
51
becomes
n(O)ro n(y -
+ +
n(I)ql
l)qy+
I
= n(O),
I)Py-1
n(y)ry
n(y
n(y),
y > 1.
q In(1) - POn(O) = 0,
1),
y>1.
> 0,
1) = ____!!L_ n( y),
s,«,
Px-I
> O.
Po'"
s.:: s,
nCO),
x>1.
I,
·(9)
x = 0,
x-I
nx =
P 0 "'P ql'"
s,
x>1.
x >0.
that
Conversely, (1) follows from (10). Suppose now that Lx nx < 00 or, equivalently, (11) x= I
Po'" ql'"
Px-I qx
<
00.
We conclude from (10) that the birth and death chain has a unique stationary distribution, given by (12)
x >0.
Suppose instead that (11) fails to hold, i.e., that (13) x=o
l: nx
00
00.
52
Stationary Distributions
of a Markov Chain
We conclude from (10) and (13) that any solution to (1) is either identically zero or has infinite sum, and hence that there is no stationary distribution. In summary, we see that the chain has a stationary distribution if and only if (II) holds, and that the stationary distribution, when it exists, is given by (9) and (12). Suppose now that d < 00. By essentially the same arguments used to obtain (12), we conclude that the unique stationary distribution is given by
(14) 1t(x)
X
d 1t;x
, 1ty
Ly=o
o<x s
d,
Consider the Ehrenfest chain introduced in Section 1.3 and suppose that d = 3. Find the stationary distribution. The transition matrix of the chain is 0
I
2 3
[! !].
ot
0
I0
I2
1to
= I,
t1 = 3,
and
= t,
1t(1) =
i,
1t(2) =
i,
and
1t(3) =
t.
Formula (2) does not hold for the chain in Example 2 since P''(x, x) = 0 for odd values of n. We can modify the Ehrenfest chain slightly and avoid such "periodic" behavior.
Modified Ehrenfestchain. Suppose we have two boxes labeled 1 and 2 and d balls labeled 1, 2, ... ,d. Initially some of the balls Example 3.
are in box 1 and the remainder are in box 2. An integer is selected at random from 1, 2, ... , d, and the ball labeled by that integer is removed from its box. We now select at random one of the two boxes and put the removed ball into this box. The procedure is repeated indefinitely, the
2.2.
EXBmples
S3
selections being made independently. Let Xn denote the number of balls in box 1 after the nth trial. Then Xm n > 0, is a Markov chain on !/ = {O, I, ... ,d}. Find the stationary distribution of the chain for d = 3. The transition matrix of this chaill, for d
o ttoO 1 !t t
2
°
0
3, is
123
0
300tt
t!
To see why P is given as indicated, we will compute P(I, y), 0 < y < 3. We start with one ball in box 1 and two balls in box 2. Thus P(1, 0) is the probability that the ball selected is from box 1 and the box selected is box 2. Thus P(I, 0)
= t· t = i·
box 2 and
Secondly, P(I, 2) is the probability tkat the ball Hlected islrom the box selected is box 1. Thus P(I,2)
= t· t = t·
Clearly P(1, 3) = 0, since at most one ball is transferred at a time. Finally, P(l, 1) can be obtained by subtracting P(l, 0) + P(l, 2) + P(l, 3) from 1. Alternatively, P(l, 1) is the probability that either the selected ball is from box 1 and the selected box is box 1 or the selected ban is from box 2 and the selected box is box 2. Thus
P(l, 1) =
t· t + t· t
t·
The other probabilities are computed similarly. This Markov chain is an irreducible birth and death chain. It is easily seen that nx, 0 < x < 3, are the same as in the previous example and hence that the stationary distribution is again given by n(O) =
1,
n(l)
= i,
n(2) =
i,
and
n(3) =
1.
It follows from the results in Section 2.7 that (2) holds for the chain in Example 3. 2.2.2. Particle. in a box. A Markov chain that arises in several applied contexts can be described as follows. Suppose that ~n particles are added to a box at times n = 1, 2, ... , where ~m n > I, are independent aRd have a Poisson distribution with common parameter it. Suppose that lach. particle ia the box at time If, independently of all the other particles
54
in the box and independently of how particles are added to the box, has probability p < 1 of remaining in the box at time n + 1 and probability q = 1 - p of being removed from the box at time n + 1. Let Xn denote the number of particles in the box at time n. Then Xm n > 0, is a Markov chain. We will find the stationary distribution of this chain. We will also find an explicit formula for P'(», y) and use this formula to show directly that (2) holds. The same Markov chain can be used to describe a telephone exchange, where en is the number of new calls starting at time n, q is the probability that a call in progress at time n terminates by time n + 1, and Xn is the number of calls in progress at time n. We will now analyze this Markov chain. Let R(Xn) denote the number of particles present at time n that remain in the box at time n + 1. Then
Xn+1
Clearly
P(R(Xn)
en+
R(Xn)·
= z I X; =
x)
< z < x,
and
P(~n
z)
= _e_,_,
z.
A_Z
-A,
z > O.
Since
P(Xn+
1
Y I X;
x)
min(x,y)
z=O
min(x,y)
P(R(Xn) P(~n+
1
= z, =
en+
Y - z I X;
x) x),
z=o
Y - z)P(R(Xn)
= z I X; =
we conclude that
It follows from (15) or from the original description of the process that P(x, y) > 0 for all x > 0 and y > 0, and hence that the chain is irreducible. Suppose Xn has a Poisson distribution with parameter t. Then R(Xn)
y)
=L
x=y
00
00
P(Xn
x, R(Xn)
y)
= =
x=y
P(Xn = x)P(R(Xn)
t
Y I X;
x)
x=y
t tXe- (x) x! Y
pY(l _ Py-y
2.2
Examples
55
t tXepY(1 - pt-Y X=yy!(x - y)! = (pt)ye-t (t(1 - p)t-Y y! x=y (x - y)! = (pt)Ye-t (t(1 - p)y
- i:
i: f
z!
= (pt)Ye-
which shows that R(Xn) has the indicated Poisson distribution. We will now show that the stationary distribution is Poisson with parameter t for suitable t. Let Xo have such a distribution. Then Xl = I + R(Xo) is the sum of independent random variables having Poisson distributions with parameters A. and pt respectively. Thus Xl has a Poisson distribution with parameter A. + pt. The distribution of Xl will agree with that of Xo if t = A. + pt, i.e., if
t=----
A
q
1- p
We conclude that the Markov chain has a stationary distribution n which is a Poisson distribution with parameter A/q, i.e., such that
(16)
n(x)
oi« e-)./q
x!
x >0.
Finally we will derive a formula for P''(x, y). Suppose Xo has a Poisson distribution with parameter t. It is left as an exercise for the reader to show that Xn has a Poisson distribution with parameter tpn Thus
). + _ (1
_ pn).
and hence
(17) r pn(x, y) = e-)'(l- P")/qet(l- p") x=o xl
L
00
[tpn
+ ~ (1
q.,:::_
- pn)]Y
_
y!
56
Stationary Distributions
of a Markov Chain
Now if
x=
z=o
L
L
azbx_z·
min(x,y)
x=
z=o
azbx-z·
Using this with (17) and the binomial expansion, we conclude that P"(x, y) =
x! e-A,(l-l'")/q
min(x,y)
y!
z=o
(y)
Z
rr: ,
z=o
(y - z)!
Thus as n ~ 00, the terms in the sum in (18) all approach zero except for the term corresponding to z = O. We conclude that
-A,/q
(A)Y
(19)
lim P'ix, y) = e
n-+ co
y!
n(y),
x, y > O.
Thus (2) holds for this chain, and consequently the distribution n given by (16) is the unique stationary distribution of the chain.
2.3. Average number of visits to a recurrent state
Consider an irreducible birth and death chain with stationary distribution n, Suppose that P(x, x) = = 0, X E 9', as in the Ehrenfest chain and the gambler'S ruin chain. Then at each transition the birth and death chain moves either one step to the right or one step to the left. Thus the chain can return to its starting point only after an even number of transitions. In other words, P"(x, x) = 0 for odd values of n. For such a chain the formula lim P"(x, y) = n(y), Y E Y,
'x
n-+ co
57
There is a way to handle such situations. Let an' n > 0, be a sequence of numbers. If
(20)
n-+oo
lim an = L
lim
n-+oo
!f
n
m= 1
am = L.
Formula (21) can hold, however, even if (20) fails to hold. For example, if an = 0 for n odd and an = 1 for n even, then an has no limit as n ~ 00, but
n-+oo
lim -
m=l
l:
n
am = -. 2
In this section we will show that 1n lim - l: P"'(x, y) n-+oo n m= 1 exists for every pair x, y of states for an arbitrary Markov chain. In Section 2.5 we will use the existence of these limits to determine which Markov chains have stationary distributions and when there is such a unique distribution. Recall that z = y, l,(z) = (~: z =F y, and that
(22)
l:
n
1y(Xm)
y) =
m=l
l:
pm(x, y).
Then Nn(y) denotes the number of visits of the Markov chain to y during times m = 1, ... , n. The expected number of such visits for a chain starting at x is given according to (22) by (23) Let y be a transient state. Then lim N~(y)
n"'oo
N(y)
<
00
58
Stationary Distributions
of a Markov Chain
and
n-r a:
00,
XE
g.
lim Nn(y) = 0
n-+oo
and that
(25) 1· Gn(x, y) = 0 , 1m n-+ 00 n
XE
g.
Observe that Nn(y)Jn is the proportion of the first n units of time that the chain is in state y and that Gn(x, y)Jn is the expected value of this proportion for a chain starting at x. Suppose now that y is a recurrent state. Let my = Ey(Ty) denote the mean return time to y for a chain starting at y if this return time has finite expectation, and set my = 00 otherwise. Let l{Ty<oo} denote the random variable that is 1 if T; < 00 and 0 if T; = 00. We will use the strong law of large numbers to prove the main result of this section, namely, Theorem 1 below.
Strong Law of Large Numbers. Let identically distributed random variables. have finite mean u, then
~h ~2'
If
· 1 rm
n-+oo
~1
+ ... +
n
~n
If these random variables are nonnegative and fail to have finite expectation, then this limit holds, provided that we set J-t = + 00.
lim Nn(y) =
n-+oo
l{Ty<
oo}
my
XE
g.
These formulas are intuitively very reasonable. Once a chain reaches y, it returns to y "on the average every my units of time." Thus if T, < 00 and n is large, the proportion of the first n units of time that the chain is in
59
state y should be about limy. Formula (27) should follow from (26) by taking expectations. From Corollary 1 of Chapter 1 and the above theorem, we immediately obtain the next result. Corollary 1 Then (28) and if P(Xo (29) If my = (25) hold.
00
x, Y
E
C,
C)
n->oo
my
C.
the right sides of (26)-(29) all equal zero, and hence (24) and
Proof In order to verify Theorem I, we need to introduce some additional random variables. Consider a Markov chain starting at a recurrent state y. With probability one it returns to y infinitely many times. For r > 1 let denote the time of the rth visit to y, so that
T;
T;
r).
Set = = Ty and for r > 2 let = T; - T;-I denote the waiting time between the (r - I)th visit to y and the rth visit to y. Clearly T". = WIy + ... + Wyr. y The random variables W~, W;, ... are independent and identically distributed and hence they have common mean EiW~) = EiTy) = my. This result should be intuitively obvious, since every time the chain returns to y it behaves from then on just as would a chain starting out initially at y. One can give a rigorous proof of this result by using (27) of Chapter 1 to show that for r > 1
w; t;
W;
mI,·.·,
Py(W: = mr).
WI y
W2 y
+ ... +
k
Wk y=m
with probability
one,
60
Stationary Distributions
of a Markov Chain
lim
Tk
___l!_
=m
y
Set r = Niy). By time n the chain has made exactly r visits to y. Thus the rth visit to y occurs on or before time n, and the (r + l)th visit to y occurs after time n; that is,
TNn(Y) ~ n
Y
<
TNn(y)+l
y'
and hence
or at least these results hold for n large enough so that N,,(y) ~ 1. Since N,,(y) -+ 00 with probability one as n -+ 00, these inequalities and (30) together imply that
. 1im
" .... 00
n
N,,(y)
my
or, equivalently, that (29) holds. Let y be a recurrent state as before, but let Xo have an arbitrary distribution. Then the chain may never reach y. If it does reach y, however, the above argument is valid; and hence, with probability one, N,,(y)ln -+ 1{Ty < 00 }Imy as n -+ 00. Thus (26) is valid. By definition 0 < N,,(y) < n, and hence
(31)
o < N,,(y)
n
< 1.
A theorem from measure theory, known as the dominated convergence theorem, allows us to conclude from (26) and (31) that lim Ex (Nn(Y)) n = Ex
(I{TY<OO})
my
n .... co
2.4.
A recurrent state y is called null recurrent if my = we see that if y is null recurrent, then (32) · 1· 1im G,,(x, y) =lm n "....
From Theorem 1
:L:!.= 1 pm(x, y) =, 0
n
XE
Y.
" .... 00
00
2.4. Null recurrent and positive recurrent states (It can be shown that if y is null recurrent, then
61
(33)
n-+(l()
lim P'(», y) = 0,
XE
9',
which is a stronger result than (32). We will not prove (33), since it will not be needed later and its proof is rather difficult.) A recurrent state y is called positive recurrent if my < 00. It follows from Theorem I that if y is positive recurrent, then lim Gn(y, y)
n-+(l()
_1 > 0.
Thus (32) and (33) fail to hold for positive recurrent states. Consider a Markov chain starting out in a recurrent state y. It follows from Theorem 1 that if y is null recurrent, then, with probability one, the proportion of time the chain is in state y during the first n units of time approaches zero as n -+ 00. On the other hand, if y is a positive recurrent state, then, with probability one, the proportion of time the chain is in state y during the first n units of time approaches the positive limit limy as n -+ 00. The next result is closely related to Theorem 2 of Chapter 1.
Theorem 2 Let x be a positive recurrent state and suppose that x leads to y. Then y is positlie recurrent.
It follows from Theorem 2 of Chapter 1 that y leads to x. Thus there exist positive integers n, and n2 such that
Proof.
and Now pnl +m+n2(y, y) > pnl(y, x)pm(x, x)pn2(x, y), and by summing on m
Gn1+n+n/y,
=
x)pn2(x, y) Gn(x, x) .
As n -+ 00, the left side of this inequality converges to limy and the right side converges to
Hence
_!_ >
my and consequently my <
00.
pnl(y,
x)pn2(x, y) > 0, m;
62
Stationary Distributions
of a Markov Chain
From this theorem and from Theorem 2 of Chapter 1 we see that if C is an irreducible closed set, then every state in C is transient, every state in C is null recurrent, or every state in C is positive recurrent. A Markov chain is called a null recurrent chain if all its states are null recurrent and a positive recurrent chain if all its states are positive recurrent. We see therefore that an irreducible Markov chain is a transient chain, a null recurrent chain, or a positive recurrent chain. If C is a finite closed set of states, then C has at least one positive recurrent state. For
~
yeC
pm(x, y) = 1,
XE
C,
Gn(x, y) = 1,
n
XE
C.
If C is finite and each state in C is transient or null recurrent, then (25) holds and hence
1 = lim ~
n-+oo yeC
Gn(x, y)
n
=
=~
yeC
lim GnCx, y)
n-+oo
0,
The proof of this theorem is now almost immediate. Since C is a finite closed set, there is at least one positive recurrent state in C. Since C is irreducible, every state in C is positive recurrent by Theorem 2. I
Proof Corollary 2 An irreducible Markov chain having afinite number of states is positive recurrent. Corollary 3 A Markov chain having a finite number of states has no null recurrent states.
Corollary 2 follows immediately from Theorem 3. To verify Corollary 3, observe that if y is a recurrent state, then, by Theorem 4 of Chapter 1, y is contained in an irreducible closed set C of recurrent states. Since C is necessarily finite, it follows from Theorem 3 that all states in C, including y itself, are positive recurrent. Thus every recurrent state is positive recurrent, and hence there are no null recurrent states. I
Proof.
2.5.
63
Example 4. Consider the Markov chain described in Example 10 of Chapter 1. We have seen that 1 and 2 are transient states and that 0, 3,4, and 5 are recurrent states. We now see that these recurrent states are necessarily positive recurrent. 2.5. Existence and uniqueness of stationary distributions
In this section we will determine which Markov chains have stationary distributions and when there is a unique such distribution. In our discussion we will need to interchange summations and limits on several occasions. This is justified by the following standard elementary result in analysis, which we state without proof.
Let a(x), x E 9', be nonnegative numbers having finite sum, and let bn(x), x E 9' and n > 1, be such that Ibn(x)I < 1, X E 9' and n > 1, and
Bounded Convergence Theorem. lim bn(x)
n-+ <Xl
= b(x),
=
XE
9'.
Then
lim
n-+<Xl
L a(x)bn(x)
x
L a(x)b(x).
x
Then
L n(z)pm(z,
x)
n(x).
L n(z)
z
Gn(z, x) = n(x),
n
XE
9'.
Theorem 4 Let n be a stationary distribution. state or a null recurrent state, then n(x) = O. Proof
(35)
If x is a transient
XE
9',
64
Stationary Distributions
of a Markov Chain
It follows from this theorem that a Markov chain with no positive recurrent states does not have a stationary distribution.
Theorem 5 An irreducible positive recurrent Markov chain has a unique stationary distribution n, given by
(36) n(x)
= -,
mx
X E [/.
Proof
that (37)
It follows from Theorem 1 and the assumptions of this theorem lim Giz, x) = _1_ ,
n-+
00
mx
x,
Z E [/.
lim
n-+oo
L n{z) Gn{z, x)
z
=-
1 m;
L n{z)
z
=-
1 . mx
Thus if there is a stationary distribution, it must be given by (36). To complete the proof of the theorem we need to show that the function n{x), x E [/, defined by (36) is indeed a stationary distribution. It is clearly nonnegative, so we need only show that (38) and (39) 1 L-= x mx 1
L -1
x
m;
P{x, y) = -,
1 my
E [/.
L plft{z,
x
x) = 1.
L Giz,
x
x)
1,
ZE
I/.
L plft{z,
x
65
L GnCz,x)
x
If f/ is finite, we conc1ude from (37) and (40) that 1 - 1· ~ Gn(z, x) _ ~ - 1 1m i.J - i.J n~oo x n x mx
,
i.e., that (38) holds. Similarly, we conclude that (39) holds by letting n -+ 00 in (41). This completes the proof of the theorem if f/ is finite. The argument to complete the proof for f/ infinite is more complicated, since we cannot directly interchange limits and sums as we did for f/ finite (the bounded convergence theorem is not applicable). Let f/ 1 be a finite subset of f/. We see from (40) that
L
xef/l
GnCz, x) < 1, n
-+ 00
ZE
f/.
The last inequality holds for any finite subset f/ 1 of f/, and hence (42) For if the sum of l/mx over x E f/ exceeded 1, the sum over some finite subset of 9' would also exceed 1. Similarly, we conclude from (41) that if f/ 1 is a finite subset of 9', then ~ By letting n
i.J
xef/l
GnCz, x) P( ) < Gn+ I(Z, y) _ P(z, y) x, Y . n n n in this inequality and using (37), we obtain
-+ 00
xeVl
L -1
m;
P(x, y)
<-
my
<- ,
my
f/.
66
Stationary Distributions
of a Markov Chain
Next we will show that equality holds in (43). It follows from (42) that the sum on y of the right side of (43) is finite. If strict inequality held for some y, it would follow by summing (43) on y that
L _!_
y my
>
= =
L
y
(L _!_
x
P(x,
m;
y»)
1 L-,
x mx
c=--.
L_!_
x mx
Then by (39)
n(x)
= -,
mx
XE
9',
defines a stationary distribution. Thus by the first part of the proof of this theorem c 1
-=-
and hence c = 1. This proves that (38) holds and completes the proof of the theorem. I From Theorems 4 and 5 we immediately obtain
Corollary 4 An irreducible Markov chain is positive recurrent and only if it has a stationary distribution. Example 5.
if
Consider an irreducible birth and death chain on the nonnegative integers. Find necessary and sufficient conditions for the chain to be (a) positive recurrent, (b) null recurrent, (c) transient. From Section 2.2.1 we see that the chain has a stationary distribution if and only if
(44) x= 1
~ v«: 1...l
Ql'"
Px-l
<
qx
00.
67
Thus (44) is necessary and sufficient for the chain to be positive recurrent. We saw in Section 1.7 that (45) is a necessary and sufficient condition for the chain to be transient. For the chain to be null recurrent, it is necessary and sufficient that (44) and (45) both fail to hold. Thus
(46)
and
Px - 1 = x= 1 q 1 ••• qx
I: Po'"
00
00
are necessary and sufficient conditions for the chain to be null recurrent. As an immediate consequence of Corollary 2 and Theorem 5 we obtain
Corollary 5 If a Markov chain having afinite number of states is irreducible, it has a unique stationary distribution.
Recall that Nn(x) denotes the number of visits to x during times m 1, ... ,n. By combining Corollary 1 and Theorem 5 we get
Corollary 6 Let Xn, n > 0, be an irreducible positive recurrent Markov chain having stationary distribution tt. Then with probability one (47)
lim Nix)
n-+
00
= re(x) ,
XE
Y.
11
2.5.1.
Let re be a distribution on f/, i.e., let re(x), x E Y, be nonnegative numbers adding to one, and let C be a subset of Y. We say that n is concentrated on C if
re(x) = 0,
Reducible chains.
x¢ C.
By essentially the same argument used to prove Theorem 5 we can obtain a somewhat more general result.
Theorem 6 Let C be an irreducible closed set of positive recurrent states. Then the Markov chain has a unique stationary distribution t: concentrated on C. It is given by
(48)
re(x)
{~x'
0,
XE
C,
elsewhere.
68
Stationary Distributions
of a Markov Chain
Suppose Co and Clare two distinct irreducible closed sets of positive recurrent states of a Markov chain. It follows from Theorem 6 that the Markov chain has a stationary distribution 1Co concentrated on Co and a different stationary distribution 1C1 concentrated on Ct. Moreover, the distributions 1C« defined for 0 < rt < 1 by
XE
f/,
are distinct stationary distributions (see Exercise 5). By combining Theorems 4-6 and their consequences, we obtain CoroUary 7 Let ~ Markov chain.
(i)
(ii) If f/p is a nonempty irreducible set, the chain has a unique stationary distribution. (iii) If ~ is nonempty but not irreducible, the chain has an infinite number of distinct stationary distributions. Consider now a Markov chain having a finite number of states. Then every recurrent state is positive recurrent and there is at least one such state. There are two possibilities: either the set f/R of recurrent states is irreducible and there is a unique stationary distribution, or f/R can be decomposed into two or more irreducible closed sets and there is an infinite number of distinct stationary distributions. The latter possibility holds for a Markov chain on f/ = {O,1, ... , d} in which d > 0 and 0 and dare both absorbing states. The gambler's ruin chain on {O, 1, ... , d} and the genetics model in Example 7 of Chapter 1 are of this type. For such a chain any distribution 1C«, 0 < rt < 1, of the form
I1C«(X) rt '
{ 0,
rt,
x = 0, x = d,
elsewhere,
is a stationary distribution. Consider the Markov chain introduced in Example 10 of Chapter 1. Find the stationary distribution concentrated on each of the irreducible closed sets. We saw in Section 1.6 that the set of recurrent states for this chain is decomposed into the absorbing state 0 and the irreducible closed set {3, 4, 5}. Clearly the unique stationary distribution concentrated on {O} is given by 7to = (1, 0, 0, 0, 0, 0). To find the unique stationary distriExample 6.
2.6.
Queuing chain
69
but ion concentrated on {3, 4, 5}, we must find nonnegative numbers n(3), n(4), and n(5) summing to one and satisfying the three equations
n(3)
n( 4)
n(5)
= n(3)
624
n(3) = n(4)
3
n(3)
n(4)
3n(5)
n(5).
+ t)
= 1,
= i,
n(4)
= lz'
and
n(5) =
t.
Consequently
7tl
(0, 0, 0,
i, lz' t)
on {3, 4, 5}.
concentrated
2.6. Queuingchain
Consider the queuing chain introduced in Example 5 of Chapter 1. Recall that the number of customers arriving in unit time has density f and mean u. Suppose that the chain is irreducible, which means that f(O) > 0 andf(O) + f(l) < 1 (see Exercise 37 of Chapter 1). In Chapter 1 we saw that the chain is recurrent if J1 < 1 and transient if J1 > 1. In Section 2.6.1 we will show that in the recurrent case (49)
mo=--
1 - Jl
00
and hence 0 is a positive recurrent state. Thus by irreducibility the chain is positive recurrent. On the other hand, if J1 = 1, then mo = 00 and hence 0 is a null recurrent state. We conclude that the queuing chain is null recurrent in this case. Therefore an irreducible queuing chain is positive recurrent if J1 < 1 and null recurrent if Jl = 1, and transient if Jl > 1.
70
Stationary Distributions
of a Markov Chain
*2.6.1. Proof. We will now verify (49). We suppose throughout the proof of this result that J(O) > 0, J(O) + J(I) < 1 and J.l < 1, so that the chain is irreducible and recurrent. Consider such a chain starting at the positive integer x. Then Tx-1 denotes the time to go from state x to state x-I, and Ty-1 - Ty, 1 < y < x-I, denotes the time to go from state y to state y - 1. Since the queuing chain goes at most one step to the left at a time, the Markov property insures that the random variables
are independent. These random variables are identically distributed; for each of them is distributed as min (n > 0:
~1
+ ... +
~n
=n-
1),
i.e., as the smallest positive integer n such that the number of customers served by time n is one more than the number of new customers arriving by time n. Let G(t), 0 < t < 1, denote the probability generation function of the time to go from state 1 to state O. Then (50) G(t) = n=l
00
tnPl(To = n).
The probability generating function of the sum of independent nonnegative integer-valued random variables is the product of their respective probability generating functions. If the chain starts at x, then
is the sum of x independent random variables each having probability generating function G(t). Thus the probability generating function of To is (G(t)Y; that is,
(51)
(G(t)Y
n=l
00
tnpxCTo
n).
o<
< 1,
where <I> denotes the probability generating function of J. To verify (52) we rewrite (50) as G(t) = n=O
00
tn+1P1(To
+ 1) = tPC1,0) + t L tnP1(To =
n= 1
00
+ 1).
2.6.
Queuing chain
71
By using successively (29) of Chapter 1, (51) of this chapter, and the formula P(l, y) = fey), y > 0, we find that
G(t)
=
=
tP(I, tP(I,O)
0)
+ +
t t
t
n= 1
LrL L L
00
y*O
P(I,
Y)Py(To tnpy(To
=
=
n) n)
y*O y*O
pel, y)
L
n= 1
00
= tP(I,
0)
pel, y)(G(t»'
t [f(O) t<l>(G(t».
+L
y*O
f(Y)(G(t»Y]
For 0 < t < I we can differentiate both sides of (52) and obtain
G'(t)
=
<I>(G(t»
tG'(t)<I>'(G(t».
G'(t)
<I>(G(t» I - t<l>'(G(t»'
-+
O<t<1.
I and <I>(t)
1 as t
-+
I and
t--+ 1 x=
L
00
xf(x)tX-1
1
x=1
00
xf(x)
u.
By letting
(54)
t -+
1-
1
Jj
By definition
00
G(t) = ~ P1(To
n=1
n)tn.
But since P(I, x) = P(O, x), x > 0, it follows from (29) of Chapter 1 that the distribution of To for a queuing chain starting in state I is the same as that for a chain starting in state O. Consequently,
G(t)
n=1
L
00
Po(To
n)tn,
Stationary Distributions
of a Markov Chain
00
lim ~
t-+l
00
nPo(To
n= 1
n)tn-1
= ~ nPo(To = n)
n= 1
= Eo(To)
= mo' I
2.7.
We have seen earlier in this chapter that if X", n ~ 0, is an irreducible positive recurrent Markov chain having n as its stationary distribution, then lim
n-+oo
m= 1
x, Y
f/.
In this section we will see when the stronger result lim P"(x, y)
n-+oo
n(y),
x, Y
f/,
holds and what happens when it fails to hold. The positive integer d is said to be a divisor of the positive integer n if nld is an integer. If I is a nonempty set of positive integers, the greatest common divisor of I, denoted by g.c.d. I, is defined to be the largest integer d such that d is a divisor of every integer in l. It follows immediate1y that
I).
In particular, if 1 E I, then g.c.d. I = 1. The greatest common divisor of the set of even positive integers is 2. Let x be a state of a Markov chain such that P'(», x) > 0 for some n > 1, i.e., such that Pxx = PiTx < (0) > O. We define its period dx by
If P(x, x) > 0, then d; = 1. If x and yare two states, each of which leads to the other, then dx = dy• For let n, and n2 be positive integers such that
and
2.7.
73
r: +n2(X, x)
pnl+n+n2(x, x) > pnl(x, y)pn(y, y)pn2(y, x) > 0, so that dx is a divisor of n, + n + n2' Since d; is a divisor of nl + n2, it must be a divisor of n. Thus dx is a divisor of all numbers in the set {n ~ 1: pn(y, y) > O}. Since d, is the largest such divisor, we conclude that dx < dy- Similarly dy < dx, and hence d; = dy• We have shown, in other words, that the states in an irreducible Markov chain have common period d. We say that the chain is periodic with period d if d > 1 and aperiodic if d = 1. A simple sufficient condition for an irreducible Markov chain to be aperiodic is that P(x, x) > 0 for some x E 9'. Since P(O, 0) = f(O) > 0 for an irreducible queuing chain, such a chain is necessarily aperiodic. Example 7. chain. Determine the period of an irreducible birth and death
If some r x > 0, then P(x, x) = r x > 0, and the birth and death chain is aperiodic. In particular, the modified Ehrenfest chain in Example 3 is aperiodic. Suppose r x = 0 for all x. Then in one transition the state of the chain changes either from an odd numbered state to an even numbered state or from an even numbered state to an odd numbered state. In particular, a chain can return to its initial state only after an even number of transitions. Thus the period of the chain is 2 or a multiple of 2. Since p2(0, 0)
= Poq 1 >
0,
we conclude that the chain is periodic with period 2. In particular, the Ehrenfest chain introduced in Example 2 of Chapter 1 is periodic with period 2. Theorem 7 Let Xm n > 0, be an irreducible positive recurrent Markov chain having stationary distribution n, If the chain is aperiodic, (55) lim P'(», y) = n(y),
n->oo
x, Y
9'.
If the chain
is periodic with period d, then for each pair x, y of states in f/ there is an integer r, 0 < r < d, such that P''(x, y) = 0 unless n = md + r for some nonnegative integer m, and
(56)
74
Stationary Distributions
of a Markov Chain
For an illustration of the second half of this theorem, consider an irreducible positive recurrent birth and death chain which is periodic with period 2. If y - x is even, then p2m+ l(X, y) = 0 for all m > 0 and lim p2m(x, y) = 2n(y).
m->oo
We will prove this theorem in an appendix to this chapter, which can be omitted with no loss of continuity. Determine the asymptotic behavior of the matrix P" for the transition matrix P (a) from Example 3, (b) from Example 2. (a) The transition matrix P from Example 3 corresponds to an aperiodic irreducible Markov chain on {O, 1, 2, 3} having the stationary distribution given by
n(O)
Example 8.
= 1,
n(l)
= i,
"8
1
n(2)
= i,
and
n(3)
= 1.
P" .
(b) The transition matrix P from Example 2 corresponds to a periodic irreducible Markov chain on {O, 1, 2, 3} having period 2 and the same stationary distribution as the chain in Example 3. From the discussion following the statement of Theorem 7, we conclude that for n large and even
P" ...:.
i0 40i0 0i0t
0
1
i0
2.8.
Proof of convergence
75 APPENDIX
2.8.
Proof of convergence
We will first prove Theorem 7 in the aperiodic case. Consider an aperiodic, irreducible, positive recurrent Markov chain having transition function P, state space f/, and stationary distribution 'It. We will now verify that the conclusion of Theorem 7 holds for such a chain. Choose a E f/ and let [be the set of positive integers defined by
[=
Then (i) g.c.d. [= 1; (ii) if m E [ and n
E
I, then m
+ n E I.
Property (ii) follows from the inequality pm+n(a, a) > pm(a, a)pn(a, a). Properties (i) and (ii) imply that there is a positive integer n, such that n E [for all n > ni' For completeness we will prove this number theoretic result in Section 2.8.2. Using this result we conclude that P'(a, a) > 0 for n ~ tu, Let x and y be any pair of states in f/. Since the chain is irreducible, there exist positive integers n2 and n3 such that pn2(X, a) > 0 Then for n > ni pn2+n+n3(x, y) > pn2(x, a)pn(a, a)pn3(a, y) > O. We have shown, in other words, that for every pair x, y of states in f/ there is a positive integer no such that
(57)
and
pn(x, y) > 0,
Set
f/2
{(x, y) : x
E f/
and y
9'}.
Then f/2 is the set of ordered pairs of elements in f/. We will consider a Markov chain (Xm Yn) having state space f/2 and transition function P 2 defined by
It follows that Xm n > 0, and Yn, n > 0, are each Markov chains having transition function P, and the successive transitions of the Xn chain and the Yn chain are chosen independently of each other.
76
Stationary Distributions
of a Markov Chain
We will now develop properties of the Markov chain (Xn' Yn). In particular, we will show that this chain is an aperiodic, irreducible, positive recurrent Markov chain. We will then use this chain to verify the conclusion of the theorem. Choose (xo, Yo) E [/2 and (x, y) E y2. By (57) there is an no > 0 such that and Then (58) We conclude from (58) that the chain is both irreducible and aperiodic. The distribution n2 on y2 defined by n2(xO' Yo) = n(xo)n(yo) is a stationary distribution. For
(XO,Yo)
E9'2
Xo EV
LL
n(xo)n(yo)P(xo,
x)P(Yo, y)
yo E9'
= n(x)n(y)
= n2(x, y).
[/2
= min
(n
> 0 : Xn = Yn).
(a, a»
is finite with probability one. Clearly T < 1(a,a), and hence T is finite with probability one. For any n > I (regardless of the distribution of (Xo' Yo»
(59)
Y E Y.
This formula is intuitively reasonable since the two chains are indistinguishable for n > T. To make this argument precise, we choose I < m < n. Then for Z E Y (60) P(Xn = y I T = m, Xm = Ym = z)
= P(Yn =
IT =
m, X m
Ym
z),
since both conditional probabilities equal pn-m(z, y). {TL. n} is the union of the disjoint events
{T
m, X m
Ym
z},
Y,
77
IT <
n)
= P(Yn =
IT <
n)
and hence that (59) holds. Equation (59) implies that P(Xn = y) = P(Xn = y, T < n)
= P(Yn
< P(Yn
and similarly that
= =
y, T < n)
y)
+ Pt'I' > +
IP(Xn = y) - P(Yn = y)1 < Pt'I' > n), lim peT > n) = 0.
n-+
0()
9'.
9'.
Using (63), we can easily complete the proof of Theorem 7. Choose x E f/ and let the initial distribution of (Xn' Yn) be such that P(Xo = x) = 1 and
Yo
E
9'.
Since Xn, n > 0, and Ym n > 0, are each Markov chains with transition function P, we see that
(64)
P(Xn
y)
= pn(x, =
y),
f/,
P(Yn
y)
n(y),
Y E 9'.
n-+
0()
and hence the conclusion of Theorem 7 holds. We first consider a slight extension of Theorem 7 in the aperiodic case. Let C be an irreducible closed set of positive recurrent states such that each state in C has period 1, and let n
2.S.1. Periodic case.
78
Stationary Distributions
of a Markov Chain
be the unique stationary distribution concentrated on C. By looking at the Markov chain restricted to C, we conclude that lim PIl(X, y)
11->00
= n(y) = _!_ ,
my
x, Y
C.
In particular, if y is any positive recurrent state having period 1, then by letting C be the irreducible closed set containing y, we see that
(66) lim P'(», y) =
11->00
_!_ .
my
We now proceed with the proof of Theorem 7 in the periodic case. Let Xm n > 0, be an irreducible positive recurrent Markov chain which is periodic with period d > 1. Set Ym = Xmd, m > O. Then Ym, m > 0, is a Markov chain having transition function Q = pd. Choose y E Y. Then g.c.d. {m I Qm(y, y) > O} = g.c.d. {m I pmd(y, y) > O}
1 = - g.c.d. {n
I p II(y,
y) > O}
=1. Thus all states have period 1 with respect to the Ym chain. Let the XII chain and hence also the Ym chain start at y. Since the X" chain first returns to y at some multiple of d, it follows that the expected return time to y for the Ym chain is d-1my, where my is the expected return time to y for the Xn chain. In particular, y is a positive recurrent state for a Markov chain having transition function Q. By applying (66) to this transition function we conclude that lim Qm(y, y)
m->oo
= .!!_ = dn(y),
my
lim pmd(y, y)
m->oo
dn(y),
Y.
> 0).
Then, in particular, P'ttx, y) > O. We will show that P'(x, y) > 0 only if n - '1 is an integral multiple of d. Choose nl such that pnl(y, x) > O. Then
r: +nl( y, y) > PIII( y, x)prl(x, y) > 0,
2.8.
Proof of convergence
79
and hence '1 + n1 is an integral multiple of d. If P''(»; y) > 0, then by the same argument n + n1 is an integral multiple of d, and therefore so is n - '1' Thus, n = kd + '1 for some nonnegative integer k. There is a nonnegative integer m1 such that '1 = mId + r where o < r < d. We conclude that (68)
pn(x, y)
unless
n = md
+r
for some nonnegative integer m. It follows from (68) and from (28) of Chapter 1 that
(69)
pmd+r(x, y) =
L
k=O
P xCTy = kd
+ r)p(m-k)d(y, y).
Set p(m-k)d(y y) am(k) = ( 0 " , Then by (67) for each fixed k lim am(k) = dn( y).
m-+
00
We can apply the bounded convergence theorem (with f/ replaced by {O,1, 2, ... }) to conclude from (69) that
lim pmd+r(x, y)
m-+oo
= dn(y)
k=O
L
00
Px(Ty
kd
+ r)
= dn{Y)Px{Ty = dn(y),
< (0)
and hence that (56) holds. This completes the proof of Theorem 7.
2.8.2.
of positive integers such that (i) g.c.d. 1= 1; (ii) if m and n are in 1, then m + n is in I. Then there is an no such that n E I for all n > no. We will first prove that 1contains two consecutive integers. Suppose otherwise. Then there is an integer k > 2 and an n1 E I such that n, + k E I and any two distinct integers in I differ by at least k. It follows from property (i) that there is an n E I such that k is not a divisor of n. We can write
n = mk
r,
80
Stationary Distributions
of a Markov Chain
where m is a nonnegative integer and 0 < r < k. It follows from property (ii) that (m + 1)(nl + k) and n + (m + l)nI are each in I. Their difference is
(m + 1)(n₁ + k) − n − (m + 1)n₁ = (m + 1)k − mk − r = k − r,
which is positive and smaller than k. This contradicts the definition of k. We have shown that I contains two consecutive integers, say n₁ and n₁ + 1. Let n ≥ n₁². Then there are nonnegative integers m and r such that 0 ≤ r < n₁ and

n − n₁² = mn₁ + r.

Thus

n = r(n₁ + 1) + (n₁ + m − r)n₁,

which exhibits n as a sum of integers in I. By property (ii), then, n ∈ I for all n ≥ n₁².
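The lemma lends itself to a quick computational check. The sketch below is our illustration, with an arbitrary generating set of g.c.d. 1; it closes the set under addition and locates the threshold n₀.

```python
from math import gcd
from functools import reduce

def additive_closure(gens, limit):
    """All sums of the generators (with repetition) that do not exceed limit."""
    reachable = {0}
    for n in range(1, limit + 1):
        if any(n >= g and (n - g) in reachable for g in gens):
            reachable.add(n)
    reachable.discard(0)
    return reachable

gens = [6, 10, 15]                       # g.c.d. is 1, as the lemma requires
assert reduce(gcd, gens) == 1
I = additive_closure(gens, 200)
n0 = max(n for n in range(1, 201) if n not in I) + 1
print("n is in I for every n >=", n0)    # prints 30 for the set {6, 10, 15}
```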
Exercises
1  Consider a Markov chain having state space {0, 1, 2} and transition matrix

        0    1    2
   0 [ .4   .4   .2 ]
   1 [ .3   .4   .3 ]
   2 [ .2   .4   .4 ]

Show that this chain has a unique stationary distribution π and find π.

2  Consider a Markov chain having transition function P such that P(x, y) = α_y, x ∈ 𝒮 and y ∈ 𝒮, where the α_y's are constants. Show that the chain has a unique stationary distribution π, given by π(y) = α_y, y ∈ 𝒮.

3  Let π be a stationary distribution of a Markov chain. Show that if π(x) > 0 and x leads to y, then π(y) > 0.

4  Let π be a stationary distribution of a Markov chain. Suppose that y and z are two states such that for some constant c

P(x, y) = cP(x, z),   x ∈ 𝒮.

Show that π(y) = cπ(z).

5  Let π₀ and π₁ be distinct stationary distributions for a Markov chain.
(a) Show that for 0 < α < 1, the function π_α defined by

π_α(x) = (1 − α)π₀(x) + απ₁(x),   x ∈ 𝒮,

is a stationary distribution.
(b) Show that distinct values of α determine distinct stationary distributions π_α. Hint: Choose x₀ ∈ 𝒮 such that π₀(x₀) ≠ π₁(x₀) and show that π_α(x₀) = π_β(x₀) implies that α = β.
6  Consider a birth and death chain on the nonnegative integers and suppose that p₀ = 1, p_x = p > 0 for x ≥ 1, and q_x = q = 1 − p > 0 for x ≥ 1. Find the stationary distribution when it exists.

7  (a) Find the stationary distribution of the Ehrenfest chain.
(b) Find the mean and variance of this distribution.

8  Find the transition function of the modified Ehrenfest chain introduced in Example 3, and show that this chain has the same stationary distribution as does the original Ehrenfest chain.

9  Find the stationary distribution of the birth and death chain described in Exercise 2 of Chapter 1. Hint: Use the formula
10  Let X_n, n ≥ 0, be a positive recurrent irreducible birth and death chain, and suppose that X₀ has the stationary distribution π. Show that

P(X₀ = y | X₁ = x) = P(x, y),   x, y ∈ 𝒮.

Hint: Use the definition of π_x given by (9).

11  Let X_n, n ≥ 0, be the Markov chain introduced in Section 2.2.2. Show that if X₀ has a Poisson distribution with parameter t, then X_n has a Poisson distribution with parameter

tp^n + (λ/q)(1 − p^n).
12  … power series.

13  … distribution. Use the result of Exercise 12 to find cov(X_m, X_{m+n}), m ≥ 0 and n > 0.
14  Consider a Markov chain on the nonnegative integers having transition function P given by P(x, x + 1) = p and P(x, 0) = 1 − p, where 0 < p < 1. Show that this chain has a unique stationary distribution π and find it.
15  A transition function P is said to be doubly stochastic if

Σ_{x∈𝒮} P(x, y) = 1,   y ∈ 𝒮.

What is the stationary distribution of an irreducible Markov chain having d < ∞ states and a doubly stochastic transition function?
16  Consider an irreducible Markov chain having finite state space 𝒮, transition function P such that P(x, x) = 0, x ∈ 𝒮, and stationary distribution π. Let p_x, x ∈ 𝒮, be such that 0 < p_x < 1, and let Q(x, y), x ∈ 𝒮 and y ∈ 𝒮, be defined by

Q(x, x) = 1 − p_x
and
Q(x, y) = p_x P(x, y),   y ≠ x.

Show that Q is the transition function of an irreducible Markov chain having state space 𝒮 and stationary distribution π′, defined by

π′(x) = p_x^{−1}π(x) / Σ_{y∈𝒮} p_y^{−1}π(y),   x ∈ 𝒮.

The interpretation of the chain with transition function Q is that starting from x, it has probability 1 − p_x of remaining in x and probability p_x of jumping according to the transition function P.
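The exercise can be checked numerically. A minimal sketch (ours, with an arbitrary three-state P and holding probabilities p_x) builds Q and confirms that π′ is stationary for it.

```python
import numpy as np

P = np.array([[0.0, 0.6, 0.4],      # P(x, x) = 0, as the exercise assumes
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])
p = np.array([0.2, 0.9, 0.5])       # 0 < p_x < 1

# Q(x, x) = 1 - p_x and Q(x, y) = p_x P(x, y) for y != x.
Q = np.diag(1.0 - p) + p[:, None] * P

# Stationary distribution pi of P: left eigenvector at eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

pi_prime = (pi / p) / (pi / p).sum()
print(np.allclose(pi_prime @ Q, pi_prime))    # True: pi' is stationary for Q
```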
17  Consider the Ehrenfest chain. Suppose that initially all of the balls are in the second box. Find the expected amount of time until the system returns to that state. Hint: Use the result of Exercise 7(a).
18  Consider a Markov chain on the state space {1, …, c + d}, where c and d are positive integers. Starting from any one of the first c states, the particle jumps in one transition to a state chosen uniformly from the last d states; starting from any of the last d states, the particle jumps in one transition to a state chosen uniformly from the first c states.
(a) Show that the chain is irreducible.
(b) Find the stationary distribution.

19  Consider a Markov chain having the transition matrix given by Exercise 19 of Chapter 1.
(a) Find the stationary distribution concentrated on each of the irreducible closed sets.
(b) Find lim_{n→∞} G_n(x, y)/n.

20  Consider a Markov chain having transition matrix as in Exercise 20 of Chapter 1.
(a) Find the stationary distribution concentrated on each of the irreducible closed sets.
(b) Find lim_{n→∞} G_n(x, y)/n.
21  Let X_n, n ≥ 0, be …
(a) Find the approximate distribution of X_n for n large and even.
(b) Find the approximate distribution of X_n for n large and odd.

22  Consider a Markov chain on {0, 1, 2} having transition matrix
         0    1    2
    0 [  0    1    0  ]
P = 1 [  0    0    1  ]
    2 [ 1/2  1/2   0  ]

(a) Show that the chain is irreducible.
(b) Find the period.
(c) Find the stationary distribution.
23  Consider a Markov chain on {0, 1, 2, 3, 4} having transition matrix P.
(a) Show that the chain is irreducible.
(b) Find the period.
(c) Find the stationary distribution.
3. Markov Pure Jump Processes

Consider again a system that at any time can be in one of a finite or countably infinite set 𝒮 of states. We call 𝒮 the state space of the system. In Chapters 1 and 2 we studied the behavior of such systems at integer times. In this chapter we will study the behavior of such systems over all times t ≥ 0.

3.1. Construction of jump processes
Consider a system starting in state x₀ at time 0. We suppose that the system remains in state x₀ until some positive time τ₁, at which time the system jumps to a new state x₁ ≠ x₀. We allow the possibility that the system remains permanently in state x₀, in which case we set τ₁ = ∞. If τ₁ is finite, upon reaching x₁ the system remains there until some time τ₂ > τ₁ when it jumps to state x₂ ≠ x₁. If the system never leaves x₁, we set τ₂ = ∞. This procedure is repeated indefinitely. If some τ_m = ∞, we set τ_n = ∞ for n > m. Let X(t) denote the state of the system at time t, defined by

(1)   X(t) = x₀,  0 ≤ t < τ₁,
           = x₁,  τ₁ ≤ t < τ₂,
           = x₂,  τ₂ ≤ t < τ₃,
             and so on.
The process defined by (1) is called a jump process. At first glance it might appear that (1) defines X(t) for all t ≥ 0. But this is not necessarily the case. Consider, for example, a ball bouncing on the floor. Let the state of the system be the number of bounces it has made. We make the physically reasonable assumption that the time in seconds between the nth bounce and the (n + 1)th bounce is 2^{−n}. Then x_n = n and

(2)   τ_n = 1 + 2^{−1} + ⋯ + 2^{−(n−1)} = 2 − 2^{−(n−1)}.

We see that τ_n < 2 and τ_n → 2 as n → ∞. Thus (1) defines X(t) only for 0 ≤ t < 2. By the time t = 2 the ball will have made an infinite number of bounces. In this case it would be appropriate to define X(t) = ∞ for t ≥ 2.
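The convergence of the bounce times is immediate to reproduce; the short sketch below (our illustration) accumulates τ_n and watches it approach 2.

```python
# Jump times of the bouncing ball: the wait between bounce n and bounce n + 1
# is 2**-n seconds, so tau_n = 1 + 2**-1 + ... + 2**-(n - 1).
tau = 0.0
for n in range(1, 31):
    tau += 2.0 ** -(n - 1)
    if n in (1, 2, 5, 10, 30):
        print(f"tau_{n} = {tau:.10f}")    # approaches 2: infinitely many bounces by t = 2
```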
In general, if

(3)   lim_{n→∞} τ_n = ∞,
then (1) does define X(t) for all t ≥ 0.

We will now specify a probability structure for such a jump process. We suppose that all states are of one of two types, absorbing or non-absorbing. Once the process reaches an absorbing state, it remains there permanently. With each non-absorbing state x, there is associated a distribution function F_x(t), −∞ < t < ∞, which vanishes for t < 0, and transition probabilities Q_xy, y ∈ 𝒮, which are nonnegative and such that Q_xx = 0 and

(4)   Σ_{y∈𝒮} Q_xy = 1.

A process starting at x remains there for a random length of time τ₁ having distribution function F_x and then jumps to state X(τ₁) = y with probability Q_xy, y ∈ 𝒮. We assume that τ₁ and X(τ₁) are chosen independently of each other, i.e., that

P_x(τ₁ ≤ t, X(τ₁) = y) = F_x(t) Q_xy.
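This construction translates directly into a simulation. The sketch below is ours, not the text's: the rates and the matrix Q are arbitrary, and we take each F_x to be exponential for concreteness, anticipating the Markov case discussed below.

```python
import numpy as np

rng = np.random.default_rng(1)

Q = np.array([[0.0, 0.7, 0.3],     # Q_xx = 0 and each row sums to 1
              [0.4, 0.0, 0.6],
              [0.5, 0.5, 0.0]])
q = np.array([1.0, 2.0, 0.5])      # parameters of the exponential holding times F_x

def simulate(x0, horizon):
    """Return the jump times tau_n and the visited states up to the horizon."""
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        t += rng.exponential(1.0 / q[x])   # holding time in state x, drawn from F_x
        if t > horizon:
            return times, states
        x = int(rng.choice(3, p=Q[x]))     # jump to y with probability Q_xy
        times.append(t)
        states.append(x)

times, states = simulate(0, 10.0)
print([(round(t, 2), s) for t, s in zip(times, states)])
```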
Here, as in the previous chapters, we use the notation P_x( ) and E_x( ) to denote probabilities of events and expectations of random variables defined in terms of a process initially in state x. Whenever and however the process jumps to a state y, it acts just as a process starting initially at y. For example, if x and y are both non-absorbing states,

P_x(τ₁ ≤ s, X(τ₁) = y, τ₂ − τ₁ ≤ t, X(τ₂) = z) = F_x(s) Q_xy F_y(t) Q_yz.
Similar formulas hold for events defined in terms of three or more jumps. If x is an absorbing state, we set Q_xy = δ_xy, where

δ_xy = 1,  y = x,
     = 0,  y ≠ x.
Equation (4) now holds for all x ∈ 𝒮. We say that the jump process is pure or non-explosive if (3) holds with probability one regardless of the starting point. Otherwise we say the process is explosive. If the state space 𝒮 is finite, the jump process is necessarily non-explosive. It is easy to construct examples having an infinite state space which are explosive. Such processes, however, are unlikely to arise in practical applications. At any rate, to keep matters simple we assume that our process is non-explosive. The set of probability zero where (3) fails to hold can safely be ignored. We see from (1) that X(t) is then defined for all t ≥ 0.

Let P_xy(t) denote the probability that a process starting in state x will be in state y at time t. Then

P_xy(t) = P_x(X(t) = y)

and

Σ_{y∈𝒮} P_xy(t) = 1.
In particular, P_xy(0) = δ_xy. We can also choose the initial state x according to an initial distribution π₀(x), x ∈ 𝒮, where π₀(x) ≥ 0 and Σ_{x∈𝒮} π₀(x) = 1. Then

P(X(t) = y) = Σ_x π₀(x) P_xy(t).
The transition function P_xy(t) cannot be used directly to obtain such probabilities as P(X(s) = x, X(t) = y) unless the jump process satisfies the Markov property, which states that for 0 ≤ s₁ < ⋯ < s_n < s < t and x₁, …, x_n, x, y ∈ 𝒮,

P(X(t) = y | X(s₁) = x₁, …, X(s_n) = x_n, X(s) = x) = P_xy(t − s).
By a Markov pure jump process we mean a pure jump process that satisfies the Markov property. It can be shown, although not at the level of this book, that a pure jump process is Markovian if and only if every non-absorbing state x is such that the distribution of the remaining time in x does not depend on how long the process has already been in x, i.e., such that

(5)   (1 − F_x(t + s)) / (1 − F_x(s)) = 1 − F_x(t),   s ≥ 0 and t ≥ 0.
Now a distribution function F_x satisfies (5) if and only if it is an exponential distribution function (see Chapter 5 of Introduction to Probability Theory). We conclude that a pure jump process is Markovian if and only if F_x is an exponential distribution for all non-absorbing states x.
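Property (5) is the familiar memoryless property, and it is easy to check numerically that the exponential distribution satisfies it while other distributions do not. A small sketch (ours):

```python
import math

def remaining_life(F, t, s):
    """(1 - F(t + s)) / (1 - F(s)): survival of the remaining time after s."""
    return (1.0 - F(t + s)) / (1.0 - F(s))

F_exp = lambda t: 1.0 - math.exp(-2.0 * t)        # exponential with parameter 2
F_unif = lambda t: min(max(t / 3.0, 0.0), 1.0)    # uniform on [0, 3]: not memoryless

t = 0.5
for s in (0.0, 0.4, 1.0):
    print(remaining_life(F_exp, t, s), 1.0 - F_exp(t))    # equal for every s
    print(remaining_life(F_unif, t, s), 1.0 - F_unif(t))  # differs once s > 0
```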
Let X(t), 0 ≤ t < ∞, be a Markov pure jump process. If x is a non-absorbing state, then F_x has an exponential density f_x. Let q_x denote the parameter of this density. Then q_x = 1/E_x(τ₁) > 0 and

f_x(t) = q_x e^{−q_x t},  t ≥ 0,
       = 0,               t < 0.

Thus

P_x(τ₁ > t) = ∫_t^∞ q_x e^{−q_x s} ds = e^{−q_x t},   t ≥ 0.
If x is an absorbing state, we set q_x = 0. It follows from the Markov property that for 0 ≤ t₁ < ⋯ < t_n and x₁, …, x_n in 𝒮,

(6)   P(X(t₁) = x₁, …, X(t_n) = x_n)
         = P(X(t₁) = x₁) P_{x₁x₂}(t₂ − t₁) ⋯ P_{x_{n−1}x_n}(t_n − t_{n−1}).
In particular,

P_x(X(t) = z, X(t + s) = y) = P_xz(t) P_zy(s).

Since

P_xy(t + s) = Σ_z P_x(X(t) = z, X(t + s) = y),

we conclude that

(7)   P_xy(t + s) = Σ_z P_xz(t) P_zy(s),   s ≥ 0 and t ≥ 0.
Equation (7) is known as the Chapman-Kolmogorov equation. The transition function P_xy(t) satisfies the integral equation

(8)   P_xy(t) = δ_xy e^{−q_x t} + ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds,

which we will now verify. If x is an absorbing state, (8) reduces to the obvious fact that P_xy(t) = δ_xy, t ≥ 0.

Suppose x is not an absorbing state. Then the event {τ₁ ≤ t, X(τ₁) = z, and X(t) = y}, for a process starting at x, occurs if and only if the first jump occurs at some time s ≤ t and takes the process to z, and the process then goes from z to y in the remaining t − s units of time. Thus

P_x(τ₁ ≤ t, X(τ₁) = z, and X(t) = y) = ∫_0^t q_x e^{−q_x s} Q_xz P_zy(t − s) ds,
so

P_x(τ₁ ≤ t and X(t) = y) = Σ_{z≠x} P_x(τ₁ ≤ t, X(τ₁) = z, and X(t) = y)
                         = ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds.

Also

P_x(τ₁ > t and X(t) = y) = δ_xy e^{−q_x t}.

Consequently,

P_xy(t) = P_x(X(t) = y)
        = P_x(τ₁ > t and X(t) = y) + P_x(τ₁ ≤ t and X(t) = y)
        = δ_xy e^{−q_x t} + ∫_0^t q_x e^{−q_x s} ( Σ_{z≠x} Q_xz P_zy(t − s) ) ds,
as claimed. Replacing s by t − s, we can rewrite (8) as

(9)   P_xy(t) = δ_xy e^{−q_x t} + q_x e^{−q_x t} ∫_0^t e^{q_x s} ( Σ_{z≠x} Q_xz P_zy(s) ) ds,   t ≥ 0.
It follows from (9) that P_xy(t) is continuous in t for t ≥ 0. Therefore the integrand in (9) is a continuous function of s, so we can differentiate the right-hand side of (9), obtaining

(10)   P′_xy(t) = −q_x P_xy(t) + q_x Σ_{z≠x} Q_xz P_zy(t),   t ≥ 0.

In particular, since P_zy(0) = δ_zy,

P′_xy(0) = −q_x P_xy(0) + q_x Σ_{z≠x} Q_xz P_zy(0) = −q_x δ_xy + q_x Q_xy.
Set

(11)   q_xy = P′_xy(0),   x, y ∈ 𝒮.

Then

(12)   q_xy = −q_x,      y = x,
            = q_x Q_xy,  y ≠ x,

and consequently

(13)   Σ_{y≠x} q_xy = q_x = −q_xx.
The quantities q_xy, x ∈ 𝒮 and y ∈ 𝒮, are called the infinitesimal parameters of the process. These parameters determine q_x and Q_xy, and thus by our construction determine a unique Markov pure jump process. We can rewrite (10) in terms of the infinitesimal parameters as

(14)   P′_xy(t) = Σ_z q_xz P_zy(t),   t ≥ 0.
This equation is known as the backward equation. If 𝒮 is finite, we can differentiate the Chapman-Kolmogorov equation with respect to s, obtaining

(15)   P′_xy(t + s) = Σ_z P_xz(t) P′_zy(s),   s ≥ 0 and t ≥ 0.

Setting s = 0 and using (11), we conclude that

(16)   P′_xy(t) = Σ_z P_xz(t) q_zy,   t ≥ 0.
Formula (16) is known as the forward equation. It can be shown that (15) and (16) hold even if 𝒮 is infinite, but the proofs are not easy and will be omitted. In Section 3.2 we will describe some examples in which the backward or forward equation can be used to find explicit formulas for P_xy(t).
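For a finite state space, (14) and (16) say that the matrix P(t) = (P_xy(t)) satisfies P′(t) = AP(t) = P(t)A, where A = (q_xy), so P(t) = e^{tA}. A minimal numerical sketch (ours; the two-state rates are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

# Infinitesimal parameters of a two-state chain: q_xy = q_x Q_xy off the
# diagonal and q_xx = -q_x, so each row sums to zero (see (12) and (13)).
A = np.array([[-3.0, 3.0],
              [1.0, -1.0]])

t, h = 0.7, 1e-6
Pt = expm(t * A)                              # transition matrix P_xy(t)
deriv = (expm((t + h) * A) - Pt) / h          # numerical derivative P'(t)
print(np.allclose(deriv, A @ Pt, atol=1e-4))  # backward equation (14)
print(np.allclose(deriv, Pt @ A, atol=1e-4))  # forward equation (16)
```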
3.2. Birth and death processes

Let 𝒮 = {0, 1, …, d} or 𝒮 = {0, 1, 2, …}. By a birth and death process on 𝒮 we mean a Markov pure jump process on 𝒮 having infinitesimal parameters q_xy such that

(17)   q_xy = 0,   |y − x| > 1.
Thus a birth and death process starting at x can in one jump go only to the states x − 1 or x + 1. The parameters λ_x = q_{x,x+1}, x ∈ 𝒮, and μ_x = q_{x,x−1}, x ∈ 𝒮, are called respectively the birth rates and death rates of the process. The parameters q_x and Q_xy of the process can be expressed simply in terms of the birth and death rates. By (13)

q_x = λ_x + μ_x,   x ∈ 𝒮.
Thus x is an absorbing state if and only if λ_x = μ_x = 0. If x is a non-absorbing state, then by (12)

(18)   Q_xy = λ_x/(λ_x + μ_x),   y = x + 1,
            = μ_x/(λ_x + μ_x),   y = x − 1,
            = 0,                 elsewhere.
A birth and death process is called a pure birth process if μ_x = 0, x ∈ 𝒮, and a pure death process if λ_x = 0, x ∈ 𝒮. A pure birth process can move only to the right, and a pure death process can move only to the left. Given nonnegative numbers λ_x, x ∈ 𝒮, and μ_x, x ∈ 𝒮, it is natural to ask whether there is a birth and death process corresponding to these parameters. Of course, μ₀ = 0 is a necessary requirement, as is λ_d = 0 if 𝒮 is finite. The only additional problem is that explosions must be ruled out if 𝒮 is infinite. It is not difficult to derive a necessary and sufficient condition for the process to be non-explosive. A simple sufficient condition for the process to be non-explosive is that for some positive numbers A and B

λ_x ≤ A + Bx,   x ≥ 0.
This condition holds in all the examples we will consider. In finding the birth and death rates of specific processes, we will use some standard properties of independent exponentially distributed random variables. Let ξ₁, …, ξ_n be independent random variables having exponential distributions with respective parameters α₁, …, α_n. Then min(ξ₁, …, ξ_n) has an exponential distribution with parameter α₁ + ⋯ + α_n, and

(19)   P(ξ_k = min(ξ₁, …, ξ_n)) = α_k/(α₁ + ⋯ + α_n),   k = 1, …, n.

Moreover, with probability one, the random variables ξ₁, …, ξ_n take on n distinct values. To verify these results we observe first that

P(min(ξ₁, …, ξ_n) > t) = P(ξ₁ > t, …, ξ_n > t)
                       = P(ξ₁ > t) ⋯ P(ξ_n > t)
                       = e^{−α₁t} ⋯ e^{−α_nt} = e^{−(α₁+⋯+α_n)t},

and hence min(ξ₁, …, ξ_n) has the indicated exponential distribution.
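Both properties are easy to confirm by simulation; the sketch below (ours) estimates the mean of the minimum and the probabilities in (19).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])                 # parameters of xi_1, xi_2, xi_3
xi = rng.exponential(1.0 / alpha, size=(200_000, 3))

print(xi.min(axis=1).mean(), 1.0 / alpha.sum())   # exponential with parameter sum(alpha)
print(np.bincount(xi.argmin(axis=1), minlength=3) / len(xi),
      alpha / alpha.sum())                        # relative frequencies match (19)
```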