Applied Probability
Frank A. Haight
The Pennsylvania State University
University Park, Pennsylvania
Contents
2. Conditional Probability  51
2.1. Introduction. An Example  51
2.2. Conditional Probability and Bayes' Theorem  56
2.3. Conditioning  61
2.4. Independence and Bernoulli Trials  64
2.5. Moments, Distribution Functions, and Generating Functions  68
2.6. Convolutions and Sums of Random Variables  70
2.7. Computing Convolutions: Examples  74
2.8. Diagonal Distributions  76
2.9. Problems  81
3. Markov Chains  89
3.1. Introduction: Random Walk  89
3.2. Definitions  91
3.3. Matrix and Vector  95
3.4. The Transition Matrix and Initial Vector  99
3.5. The Higher-Order Transition Matrix: Regularity  101
3.6. Reducible Chains  104
3.7. Periodic Chains  106
3.8. Classification of States. Ergodic Chains  106
3.9. Finding Equilibrium Distributions: The Random Walk Revisited  108
3.10. A Queueing Model  113
3.11. The Ehrenfest Chain  117
3.12. Branching Chains  119
3.13. Probability of Extinction  123
3.14. The Gambler's Ruin  124
3.15. Probability of Ruin as Probability of Extinction  127
3.16. First-Passage Times  127
3.17. Problems  132
Discrete Probability
Example 2. Two chess players agree to a match where the winner will
be the first to accumulate five points, with one point awarded for a win,
half a point for a draw, and no points for a loss. Sample space: the number
of games played in the match, 5, 6, 7, 8, 9, 10. (Note: If the match runs to
10 games, it is a draw.)
†The term "sample space" for the possible outcomes of a probabilistic experiment has some
drawbacks: in the first place it could suggest statistical sampling, and in the second place it is
more akin to the "domain" of a variable, as that term is used in mathematics. Nevertheless, it
is a well-established expression and is used here as in virtually all textbooks.
In the first six examples, the number of outcomes in the sample space is
finite. The outcomes are called points of the sample space. In the examples
which follow, the sample spaces contain an infinite number of points.
Example 8. Light bulbs are tested as they come off the production line
and are found to be working or defective. Sample space: the number of
bulbs tested before the first defective is found, 0, 1, 2, .... Should the
possibility of an infinite duration of good bulbs be built into this model,
similar to Example 7? This is a typical problem in model building, and the
answer will depend on the nature of the model needed.
Example 9. There are n keys, only one of which fits a lock, and keys
are chosen at random† and tried, not eliminating those which have failed.
Sample space: the number of keys tried before a success, 1, 2, 3, 4, ....
Another sample space would be the number of failures before a success,
0, 1, 2, 3, ....
Example 10. Bids are invited for a contract. Sample space: the number
of bidders, 0, 1, 2, 3, .... In this case it is difficult, if not impossible, to
†For the time being, the expression "at random" is used in a purely intuitive sense; later it will
be made more precise by specifying equal probabilities.
Comparing these examples with one another, it seems that the choice of
the initial value is usually not so difficult. In Example 10, there could
presumably be zero bidders for the contract, so it would be inappropriate to
begin with one. On the other hand, in Example 12, the idea of buying zero
bus tickets makes no sense, so the sample space properly begins with one. In
Example 4, the sample space might include the point zero for a self-service
elevator, but if an operator is present, the sample space might begin with
one.
Although a clear understanding of the system being modeled often
gives the initial value of the sample space, genuine ambiguities are sometimes
encountered in deciding on the final value. In Example 12, it certainly does
not make sense to assume an enormously large number of people buying a
ticket, but on the other hand, where should the line be drawn? Before going
further into this question, it is useful to consider some further examples
where the problem is even more difficult.
Finally, the sample spaces given in this section are all discrete, whether
finite or infinite. Beginning in Chapter 4, some continuous sample spaces
will be treated. Many students are not as familiar with the mathematical
techniques of discrete variables (especially summation) as with those for
continuous variables (integration), and so in this first chapter, some empha-
sis is given to technique, while the fundamentals of probability are being
discussed. As the book progresses, more and more of the technique will be
taken for granted, and the steps in proofs correspondingly abbreviated.
(p, 1 − p),   0 < p < 1,
The parameter is usually represented by the letter λ; the probabilities are
therefore written

(e^(−λ), λe^(−λ), (λ²/2!)e^(−λ), (λ³/3!)e^(−λ), ...).
This distribution has a parameter on the unit interval and is defined over an
infinite number of values.
The Binomial Distribution. Sometimes the normalization to unity is
obtained not by dividing by the sum of the series, but by assigning parameter
values. As an example, consider the binomial series

(a + π)^n = ∑_{x=0}^{n} C(n, x) a^(n−x) π^x.

This well-known formula contains three parameters: a and π, which may be
any real numbers, and n, which is a positive integer (or possibly zero). By
setting a = 1 − π, the series is normalized to unity, and its terms can be used
as probabilities:

P(X = x) = C(n, x) π^x (1 − π)^(n−x),   x = 0, 1, ..., n.
This formula specifies a finite number (n+ 1) of probabilities. Note that one
of the two parameters of the distribution tells how many probabilities there
are, while the other lies in the unit interval.
The Fisher Distribution. The Fisher† distribution is obtained by nor-
malizing the terms of the logarithmic series to unity and making other small
changes:

P(X = x) = −β^x / [x log(1 − β)],   x = 1, 2, 3, ...,   0 < β < 1.
The five distributions introduced in this section will not provide for all
the probability models required (new distributions occur from time to time
in the sequel). They are given here to show how probability distributions
can be formed. It will be useful for the student to write out some of the
terms in each of these distributions, with specific parameter values, and see
how they might be meaningful representations of probabilities for some of
the examples given in Section 1.2. Here are some possibilities for investiga-
tion:
probability [in fact, one element of the probability distribution (1/2, 1/2)], with
the left side, which contains a sample point enclosed by the "operator" P( ).
The equality comes, as mentioned in Section 1.2, purely as a definition (of a
"fair" coin). Statement B, on the other hand, is a theorem and needs to be
proved.† In either case, the basic format is clear, and is of the form

P(sample point) = probability,

and it is necessary to be sure that if all the sample points are listed with
their corresponding probabilities, the total of the probabilities is unity. This
means only that all of the sample points are accounted for and all the
probabilities are used, with no leftovers on either side.
At this stage it is important to make the distinction between those
sample points which are quantitative (represented by a number, such as the
number of idle operators, the number of bugs on a leaf, etc.), on the one
hand, and those which are qualitative (side of coin, color of car, name of
individual, i.e., represented by a "label").
The qualitative case is useful in the beginning for introducing some
elementary examples, and, in theory, for showing that not all probabilistic
experiments must produce numerical outcomes. However, as the book
progresses and more practical situations are under discussion, the experi-
ments will be almost exclusively numerical. When the sample space is
quantitative, we speak of a random variable (usually denoted by a capital
letter). For example, let X = the number of bugs on a leaf. The sample space,
in this case 0, 1, 2, ..., then represents the values that the random variable
can take. Then the general displayed equation shown above takes on the
special form

P(X = x) = probability,   x = 0, 1, 2, ....
There is often a natural correspondence between sample points and the
values of the random variable, although in a completely theoretical sense the
random variable is chosen, just as the sample space was chosen, as part of
the definition of the experiment. In this chapter, random variable values will
almost invariably be non-negative integers, or some subset (finite or infinite)
thereof.‡ There are many cases in which a random variable can be defined
in a meaningful way for an experiment with purely qualitative sample
points.
†There is a third category of probability statements encountered in the study of statistics: the
empirical fact, established on the basis of experimental evidence.
‡The book as a whole treats positive (discrete or continuous) random variables.
In the toss of a coin, the labels "head" and "tail" can be replaced by the
random variable X = number of heads, with X = 0 or X = 1. Then the
probabilities for a fair coin would be

P(X = 0) = P(X = 1) = 1/2.

P(X = 0) = 1/2,
it would hardly be worth the effort to try to reduce these three formulas to
one comprehensive one.
Also, although the sample points and the values of the variable x often
agree, as in the examples at the end of Section 1.3, it is not compulsory that
they should do so. The probabilities can be assigned in any way at all. The
Poisson probabilities fall naturally on the non-negative integers 0, 1, 2, 3, ...,
but it is not difficult to assign them to values other than those. If the
random variable X takes values which are multiples of 3, for example,
3, 6, 9, ..., the Poisson assignment would be given by

P(X = 3x) = λ^(x−1) e^(−λ) / (x − 1)!,   x = 1, 2, 3, ...,

which is equivalent to

P(X = x) = λ^(x/3 − 1) e^(−λ) / (x/3 − 1)!,   x = 3, 6, 9, ....
In other words, the sample points and the probabilities are completely
separate entities which are combined to make a complete probability model.
Unfortunately, there is in the literature a considerable variability in
notation, both because of different traditions and because of different
purposes which need to be served. Often it is convenient to abbreviate
P(X = x) by p_x, representing the probability of the value x. If there is a
parameter λ in the distribution, this might be included in the notation p_x(λ),
representing, for example, the probability of the value x in a Poisson distribution
with parameter λ. In some cases, mainly those which occur in theoretical
probability, it is desirable to keep the random variable showing in the
notation by writing P(X = x) = p_X(x), but the right side is hardly more
compact than the left side, and in this book the full form P(X = x) will be
retained when it is desirable to mention the random variable.
Omitting the random variable from the notation is convenient when it
is fixed during an entire calculation and the calculation is rather complex.
Examples of this will occur in Section 1.7 and subsequently.
Finally, it is worthwhile to comment on the theoretical nature of
random variables. An abstract treatment of probability emphasizes that a
random variable is a function, which takes sample points into numbers. That
is, given any result of an experiment (sample point), the random variable
translates this result into a number. This fact can be appreciated by
considering different random variables for a single experiment. Suppose a
coin is thrown. Let
and so forth. The choice of one of these random variables is always a choice,
and is not intrinsic in the experiment of throwing the coin. In other words,
after the coin has been thrown, there is no number shining brilliantly from
the coin to the observer; any number must be defined. In making the
definition, the observer is choosing a function which must give a unique
value for the result of any throw.
P(E ∪ F) = P(E) + P(F).   (4)
Example 1. A fair die has six sides with the numbers 1, 2, 3, 4, 5, and 6
and a probability of one-sixth for each of the sides appearing in a single
throw. Suppose a random variable X is defined as X = number showing in
one throw of a fair die, with possible values 1, 2, 3, 4, 5, and 6. Consider the
event "X is odd." The probability can be calculated as follows:

P(X odd) = P(X = 1) + P(X = 3) + P(X = 5)
         = 1/6 + 1/6 + 1/6
         = 1/2.
This example is trivial and the answer is obvious in any case. Now some less
simplistic examples will be given, which will also permit a further acquain-
tance with probability distributions.
What is the probability that there will be more than three accidents on a
day?
P(X > 3) = P(X = 4 ∪ X = 5 ∪ X = 6 ∪ ...)
         = P(X = 4) + P(X = 5) + P(X = 6) + ...
         = (1 − p)p^4 + (1 − p)p^5 + (1 − p)p^6 + ...
         = (1 − p)p^4 (1 + p + p² + ...)
         = p^4.
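Summing the geometric series above collapses the tail to p^4. A direct numerical check, with an illustrative value of the parameter:

```python
# Geometric distribution P(X = x) = (1 − p) p^x on x = 0, 1, 2, ...;
# the tail sum P(X > 3) should collapse to p^4 (p is illustrative).
p = 0.3

tail = sum((1 - p) * p ** x for x in range(4, 200))
print(round(tail, 12), p ** 4)
```

Truncating the sum at 200 terms is harmless here: the omitted tail is of order p^200.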
P(X = x) = λ^x e^(−λ) / x!,   x = 0, 1, 2, 3, ...,   λ > 0.
What is the probability that a day will pass without more than one claim?
P(X ≤ 1) = P(X = 0 ∪ X = 1)
= 1 − P(X = n − 1) − P(X = n)
= 1 − nπ^(n−1)(1 − π) − π^n.
Probabilities for simple events have been assumed, according to the
fundamental principle, to be additive. The same is true for compound events
which are mutually exclusive, i.e., which have no sample points in common.
When there are common points between E and F, additivity is no longer
valid. In fact, if P(E) and P(F) are computed separately, all points in E ∩ F
are counted twice, once for P(E) and once for P(F). Hence the additivity
equation (4) needs to be modified as follows:

P(E ∪ F) = P(E) + P(F) − P(E ∩ F).   (6)

This formula is also an axiom of probability.
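Equation (6) can be checked by brute enumeration on a small finite sample space. Here E and F are illustrative events for one throw of a fair die:

```python
# Check P(E ∪ F) = P(E) + P(F) − P(E ∩ F) by enumerating a fair die throw.
space = [1, 2, 3, 4, 5, 6]

def prob(event):
    return sum(1 for w in space if event(w)) / len(space)

E = lambda w: w % 2 == 1         # "odd number"
F = lambda w: w > 3              # "greater than 3"
union = lambda w: E(w) or F(w)
inter = lambda w: E(w) and F(w)

lhs = prob(union)
rhs = prob(E) + prob(F) - prob(inter)
assert abs(lhs - rhs) < 1e-12
print(lhs)
```

The single common point (the number 5) is what would be double-counted without the subtracted term.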
= ∑_{j=0}^∞ j² p_j − m²   (8)
(These summations have been written with j = 0, 1, 2, ..., which includes all
the cases of concern in this chapter; naturally, if other values of the random
variable occurred, the summations would have to be suitably modified.)
The various properties of the mean and variance are especially im-
portant in the study of statistics. Here it is sufficient to note that the mean is
a kind of average value, where random variable values are weighted with the
corresponding probabilities and summed, and that the variance is a measure
of dispersion, since the weights are (squared) distances from the mean. A
small variance indicates little variability and, in fact, zero variance indicates
all probability concentrated at a single value P(X=m)= 1. Such a distribu-
tion is called causal (since the cause of the phenomenon should be known)
or deterministic.
In addition to the mean and variance, it is sometimes useful to consider
higher moments m_r, r = 1, 2, 3, ..., defined by the formula

m_r = ∑_{j=0}^∞ j^r p_j.   (9)
and this can be extremely misleading, partly because E(X) may not even be
a value of X [E(X) for the throw of a fair die is 3 1/2, but this is not a possible
result], but also because even if it is a value of X, it is not necessarily the
most probable value.
and
E(cX) = cE(X).
Therefore define

p_1 = C,
p_2 = C/4,
p_3 = C/9,
p_4 = C/16,

and, in general,

p_x = C/x²,   x = 1, 2, 3, ...,

so that

p_x = 6/(π²x²),   x = 1, 2, 3, ...,

where of course π does not represent a parameter, but the Archimedean
constant. By construction, E(X) = ∑ j p_j = ∞.
By using the same series with n = 3, a distribution can be constructed
with a finite mean but an infinite second moment, and with higher values of
n, a distribution with the first n - 2 moments finite and infinite higher
moments.
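The behavior just described is easy to observe numerically. The sketch below checks that the probabilities p_x = 6/(π²x²) sum to one, while the partial sums of the mean keep growing (logarithmically, like the harmonic series) rather than settling down:

```python
import math

# The distribution p_x = 6/(π² x²), x = 1, 2, 3, ..., sums to one, but the
# mean ∑ x p_x = (6/π²) ∑ 1/x diverges; partial means grow without bound.
p = lambda x: 6.0 / (math.pi ** 2 * x ** 2)

total = sum(p(x) for x in range(1, 10 ** 5))
partial_mean = lambda n: sum(x * p(x) for x in range(1, n))

print(round(total, 3))  # close to 1 (the truncated tail is of order 1/N)
print(partial_mean(10 ** 2), partial_mean(10 ** 4))
```

No matter how far the second sum is carried, it never approaches a limit, which is the numerical face of E(X) = ∞.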
The idea of a probability distribution with an infinite mean value,
although simple enough mathematically, has a history of confusion. One of
the most famous "paradoxes" in probability, the Petersburg Paradox
(Chapter 2, Problem 71) depends only on the failure to grasp the concept of
such a probability distribution.
x = 0, 1, 2, ...,
E(X) = ∑_{j=0}^∞ j λ^j e^(−λ) / j!.
In evaluating this sum, the first thing that springs to mind is to cancel j from
numerator and denominator. There is a slight pitfall here, which has been
known to trap students: the cancellation does not apply to the first term,
which, in fact, vanishes. The first step is therefore
E(X) = ∑_{j=1}^∞ λ^j e^(−λ) / (j − 1)!.
= λ e^(−λ) ∑_{j=0}^∞ λ^j / j!
= λ.
then

E(X) = ∑_{j=0}^∞ j (1 − p) p^j.
To evaluate the sum, let
†This fact is sometimes taken by practical people in the reverse sense, assuming that data which
exhibit equality of mean and variance must necessarily be from a Poisson distribution. This is
a serious error, inasmuch as information about two moments is insufficient to determine all the
probabilities.
so that
E(X) = p/(1 − p).
∑_{j=1}^∞ j r^(j−1) = d/dr [1/(1 − r)] = 1/(1 − r)²,

and so forth.
In finding the second moment, the series to be evaluated is

∑_{j=0}^∞ j² r^j = r(1 + r)/(1 − r)³.
var(X) = p/(1 − p)².
The steps needed to reduce this to a standard binomial sum are similar to
those used to reduce the Poisson mean to a standard exponential sum:
cancellation of the factorial, adjustment of the index, and removal of the
necessary constant:

E(X) = nπ.
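The binomial mean nπ can be checked directly from the probabilities, with illustrative parameter values:

```python
import math

# Numerical check that the binomial mean equals nπ (parameters illustrative).
n, pi = 12, 0.35
pmf = lambda x: math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

mean = sum(x * pmf(x) for x in range(n + 1))
print(round(mean, 10))  # → 4.2
```

Because the sample space is finite, the sum is exact up to floating-point rounding.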
[Figure 1. An illustrative probability distribution. Top, tail Q(x); middle, distribution function
P(x); bottom, exact probabilities p_x.]
P(X ≥ x).† The first important thing to notice is that whereas the exact
probabilities are defined only for integer values of x (and it would be more
complete always to add "zero elsewhere" in every equation), both of the
cumulative distributions are, by virtue of the inequality sign, defined for all
real values of x. This fact will be reflected in the notation by writing the
argument in parentheses, rather than as a subscript, a common mathematical
indication,

P(x) = P(X < x),

†These terms, which are quite standard, have two peculiarities: a lack of symmetry in name
("head" and "tail" might be better) and some awkwardness in using the word "distribution,"
both for the entire concept and for one particular function characterizing it.
P(0) = 0,   x = 1, 2, 3, ...,
Note. These formulas are written under the assumption that the proba-
bilities occur at the non-negative integers; some modification would be
needed in other cases.
†Many authors define the cumulative functions so that they are both right continuous:
P(x) = P(X ≤ x), Q(x) = P(X > x). The reason for a preference for left continuity appears in
Section 4.9.
lim_{y↑x} P(y) = P(x),   lim_{y↑x} Q(y) = Q(x),

lim_{x→∞} P(x) = lim_{x→−∞} Q(x) = 1,
P(x) = 1 − p^x,   Q(x) = p^x.   (11)
P(x) = 0,   x ≤ 1,
P(x) = 1,   x > n.
On the other hand, the cumulative forms of the Poisson and binomial
distributions must be reserved for Chapter 4, since they involve higher
transcendental functions.
Γ(n) = ∫_0^∞ x^(n−1) e^(−x) dx.   (12)
Γ(1/2) = 2 ∫_0^∞ e^(−x²) dx,   (14)

so that

[Γ(1/2)]² = 4 ∫_0^∞ ∫_0^∞ e^(−(x² + y²)) dx dy,
Using the recursion relation (13), a general formula for all half-integers is
easily calculated:
Γ(n + 1/2) = ((2n)! / (4^n n!)) √π.   (15)

(n)_x = n(n − 1) ⋯ (n − x + 1).   (16)

In this formula, x must be a positive integer, but n can be any real number.
Note that (n)_x vanishes for n = 0, −1, −2, ..., 1 − x. It is easy to write (n)_x
as a quotient of gamma functions:

(n)_x = Γ(n + 1)/Γ(n − x + 1).
The gamma function can also be used to extend the definition of the
binomial coefficients C(n, x), since
C(−n, x) = (−n)_x / x!
         = (1/x!)(−n − x + 1)(−n − x + 2) ⋯ (−n)
         = ((−1)^x / x!) n(n + 1) ⋯ (n + x − 1).   (17)
Γ(p)Γ(q) = 4 ∫_0^(π/2) (cos α)^(2p−1) (sin α)^(2q−1) dα ∫_0^∞ r^(2(p+q)−1) e^(−r²) dr.
(1 − λ)(1 − p) = 1
E(X) = ∑_{j=1}^∞ j C(j + n − 1, j) (1 − p)^n p^j
     = ∑_{j=1}^∞ [(j + n − 1)! / ((j − 1)!(n − 1)!)] (1 − p)^n p^j
     = (np/(1 − p)) ∑_{j=0}^∞ [(j + n)! / (j! n!)] (1 − p)^(n+1) p^j,

and the sum, being a binomial sum with parameter n + 1, equals unity. Therefore

E(X) = np/(1 − p).
P(X=O)=t
P(X=I)=t,
P(X=2)= t,
P(X=3)=t
P(X=4)=~,
P(X=7)=t,
φ(s) = ∑_{j=0}^∞ P(X = j) s^j,   (20)
Poisson Distribution

φ(s) = ∑_{j=0}^∞ (λ^j e^(−λ)/j!) s^j = e^(−λ(1−s)).
Binomial Distribution

φ(s) = ∑_{j=0}^n C(n, j)(sπ)^j (1 − π)^(n−j)
     = (1 − π)^n ∑_{j=0}^n C(n, j)(sπ/(1 − π))^j
     = (1 − π + sπ)^n.

Negative Binomial Distribution

φ(s) = (1 − p)^n / (1 − ps)^n.
Discrete Probability 35
Geometric Distribution

φ(s) = (1 − p)/(1 − ps).

Fisher Distribution

φ(s) = log(1 − βs)/log(1 − β).
In every case the proof is little more than a small variant on the sum which
originally defined the probability. Unfortunately, there will be some distri-
butions for which the sum is not quite so simple; but even if the p.g.f. is left
in summation form, it can still be used and be useful.
One difficulty which students often have in understanding p.g.f.'s is
knowing the meaning of s. This difficulty is not confined to students, for
there is some discussion among mathematicians as to whether s should be
regarded as a "label," a real variable, or something else.† For the purposes
of this book, it is best to regard s as a real variable, defined at least over the
interval 0 ≤ s ≤ 1, and φ(s) as a function having rather interesting proper-
ties.
The first of these properties is that φ(1) = 1; this equation is the p.g.f.
version of normalization to unity. In other branches of mathematics, gener-
ating functions are defined for an arbitrary sequence of constants a_0 + a_1 s +
a_2 s² + ...; the distinguishing feature of a p.g.f. is that since the coefficients
are probabilities, setting s = 1 must yield φ(1) = 1. What is the value of φ(0)?
Since φ(s) contains all the information about the distribution, it must
be possible to recover the exact probabilities. This, it is clear from Eq. (20),
can be done by differentiation and setting s = 0:

P(X = x) = (1/x!) [d^x φ(s)/ds^x]_(s=0),   (21)
so that

m = E(X) = φ'(1),
v = m_2 − m² = φ''(1) + φ'(1) − [φ'(1)]².

In a similar fashion, the third and higher moments can be written in terms
of the p.g.f. at the point s = 1.
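A sketch of how the p.g.f. delivers moments, using the geometric p.g.f. φ(s) = (1 − p)/(1 − ps) and the relations m = φ'(1) and v = φ''(1) + φ'(1) − [φ'(1)]². The derivatives are taken numerically (central differences), so the agreement with the exact geometric moments p/(1 − p) and p/(1 − p)² is approximate; the parameter value is illustrative.

```python
# Moments of the geometric distribution obtained from its p.g.f.
# φ(s) = (1 − p)/(1 − ps), via m = φ'(1) and v = φ''(1) + φ'(1) − [φ'(1)]².
p = 0.4   # illustrative parameter value
h = 1e-5  # step for numerical differentiation

phi = lambda s: (1 - p) / (1 - p * s)
d1 = (phi(1 + h) - phi(1 - h)) / (2 * h)              # ≈ φ'(1)
d2 = (phi(1 + h) - 2 * phi(1) + phi(1 - h)) / h ** 2  # ≈ φ''(1)

m = d1
v = d2 + d1 - d1 ** 2
print(round(m, 4), round(v, 4))  # compare with p/(1−p) and p/(1−p)²
```

For this distribution the derivatives could of course be taken exactly; the numerical route is shown because it works for any p.g.f., even one left in summation form.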
Change of Variable. A function of a random variable is itself a random
variable. Specifically, two functions of importance in this book are adding a
constant (X + k) and multiplying by a constant (cX). Neither of these
operations affects the probabilities, but both affect the values of the
variables. Consider the example given at the beginning of this section, with
k = 3 or with c = 5. In the first instance, the distribution (writing Y = X + 3,
with p.g.f. ψ for Y) is
P(Y=3)=t,
P(Y=4)=i,
P(Y=5)=-L
and
Discrete Probability 37
P(Z=O)=-!,
P(Z=5)= !,
P{Z=IO)=!.
and
The student can verify the general principle suggested by these exam-
ples: if Y = X + k and Z = cX, with p.g.f.'s φ_X(s), φ_Y(s), and φ_Z(s),
then

φ_Y(s) = s^k φ_X(s)

and

φ_Z(s) = φ_X(s^c).
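The general principle can be checked numerically. The rules assumed in this sketch, consistent with the examples above, are that adding k multiplies the p.g.f. by s^k and that multiplying by c replaces s by s^c; the geometric X and all parameter values are illustrative.

```python
# Change of variable at the p.g.f. level, illustrated with a geometric X:
# Y = X + k should have p.g.f. s^k φ(s); Z = cX should have p.g.f. φ(s^c).
p, k, c, s = 0.4, 3, 5, 0.7  # illustrative values
N = 400                      # truncation point for the sums

pmf = lambda x: (1 - p) * p ** x                      # X on 0, 1, 2, ...
phi = lambda t: sum(pmf(x) * t ** x for x in range(N))

phi_Y = sum(pmf(x) * s ** (x + k) for x in range(N))  # direct p.g.f. of Y = X + k
phi_Z = sum(pmf(x) * s ** (c * x) for x in range(N))  # direct p.g.f. of Z = cX

assert abs(phi_Y - s ** k * phi(s)) < 1e-12
assert abs(phi_Z - phi(s ** c)) < 1e-12
print(phi_Y, phi_Z)
```

Both identities are just reindexings of the defining sum, which is why the direct computations and the shortcut formulas agree term by term.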
Φ(s) = ∑_{j=1}^∞ P(j) s^j,

Ψ(s) = ∑_{j=0}^∞ Q(j) s^j,

Φ(s) = s φ(s)/(1 − s),   (24)

Ψ(s) = (1 − s φ(s))/(1 − s).
E(X) = Ψ(1) − 1.
Similarly,

v = 2Ψ'(1) + Ψ(1) − [Ψ(1)]².
and w_k can be calculated step by step from this formula. In fact, the student
will be able, using the recursion formula, to find w_k up to a dozen or so
factors without much trouble, especially the student with a small calculator.
This still does not give a general formula for w_k as a function of k, and
it is a real challenge to the ingenuity to try to convert the recursion relation
into such a formula. One of the simplest methods is by use of the generating
function
φ(s) = ∑_{j=1}^∞ w_j s^j.
The negative sign is chosen so that the condition φ(0) = 0 is satisfied, and s
must be less than 1/4 for real roots.
The next problem is to find the w_k from φ(s). Since the w_k are the
coefficients in the power-series expansion of φ(s), it is only necessary to
express φ(s) in powers of s. Using the binomial theorem,
Therefore

w_k = (2^(2k−1)/k!) [(1/2)(1/2)(3/2) ⋯ (k − 3/2)]
    = (2^(k−1)/k!) [1·3·5 ⋯ (2k − 3)]
    = (1/k) C(2k − 2, k − 1).
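The closed form can be compared against step-by-step computation, as the text suggests doing with a small calculator. The recursion used below, w_1 = 1 and w_k = ∑ w_j w_(k−j), is an assumption on my part: it is the convolution consistent with a quadratic relation φ(s) = s + φ(s)², which the surrounding derivation suggests.

```python
import math

# Closed form w_k = (1/k) C(2k−2, k−1) versus the convolution recursion
# w_1 = 1, w_k = sum over j of w_j * w_(k−j) (assumed form; see lead-in).
closed = lambda k: math.comb(2 * k - 2, k - 1) // k

w = {1: 1}
for k in range(2, 15):
    w[k] = sum(w[j] * w[k - j] for j in range(1, k))

assert all(w[k] == closed(k) for k in w)
print([w[k] for k in sorted(w)])  # 1, 1, 2, 5, 14, 42, ... (the Catalan numbers)
```

The sequence produced is the Catalan numbers, which is exactly what the binomial-theorem expansion above delivers.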
Remember that this sequence does not form a probability distribution
[φ(1) ≠ 1]. Nevertheless, it can be used to construct a probability distribu-
tion by the familiar method of normalization to unity. Dividing by φ(s),

P(X = x) = w_x s^x / φ(s),   x = 1, 2, 3, ...,

is a valid probability distribution with parameter φ (or s, since they are
related by the quadratic equation). In Chapter 6 this distribution occurs as
an important one in the theory of queues, with a parameter p satisfying

φ = p/(1 + p),

s = p/(1 + p)²,

P(X = x) = (1/x) C(2x − 2, x − 1) (p/(1 + p))^(x−1) (1/(1 + p))^x,   x = 1, 2, 3, ....   (27)
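A normalization check for the distribution (27). To avoid computing enormous binomial coefficients directly, successive terms are built with the Catalan-number ratio C_x/C_(x−1) = 2(2x − 1)/(x + 1); the parameter value is illustrative.

```python
# Check that (27), P(X = x) = (1/x) C(2x−2, x−1) (p/(1+p))^(x−1) (1/(1+p))^x,
# sums to one (p = 0.8 illustrative). Terms are generated iteratively via the
# Catalan ratio C_x / C_(x−1) = 2(2x − 1)/(x + 1).
p = 0.8
s = p / (1 + p) ** 2   # the quantity s related to p as in the text
term = 1 / (1 + p)     # P(X = 1)
total = 0.0
for x in range(1, 5000):
    total += term
    term *= 2 * (2 * x - 1) / (x + 1) * s

print(round(total, 6))  # → 1.0
```

The convergence is governed by (4s)^x with 4s < 1 whenever p ≠ 1, so a few thousand terms leave a negligible tail.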
The only exceptions are straight lines, and it may be well to deal with
these cases first. The right end being anchored at (1, 1), there are two cases:
(a) t = 1, a horizontal line, therefore p_0 = 1, and the distribution is
causal at the origin.
(b) t = φ(s) = p_0 + p_1 s, a diagonal line connecting (0, p_0) and (1, 1). Then
p_1 = 1 − p_0, and the distribution is a simple version of the binomial, with
n = 1. This is called the Bernoulli† distribution and is modeled by a single
throw of a (not necessarily fair) coin.
As a special case of (b), there is case (c), p_1 = 1, the causal distribution
at the value x = 1.
Except for these special cases, t = φ(s) is a curve which is at least
quadratic. Furthermore, the slope at (0, p_0) is p_1 (non-negative) and the
slope at (1, 1) is m, positive except for the degenerate case (a).
In one of the applications of Markov chain theory in Chapter 3, it is
important to know whether or not the curve t = φ(s) crosses the line t = s
between (0, p_0) and (1, 1), i.e., to find whether the equation s = φ(s) has a
root in the interval (0, 1). [It always has the root s = 1 (normalization to
unity once again).] Because of the increasing slope of the curve, there can be
at most one such root [except for the degenerate case (c)].
If there is a root in this interval, the slope of the curve must be greater
than unity at the crossing point and must therefore be greater than unity at
(1, 1). Thus m > 1. On the other hand, if there is no such root, t = φ(s) will
always be above t = s [except for case (c)], and so the slope at its largest value
φ'(1) will be less than or equal to one. This shows that, except for the trivial
cases, the existence of a root of the equation φ(s) = s in the interval 0 < s < 1
is a necessary and sufficient condition for m > 1.
As an example, consider the Poisson distribution with parameter λ. The
equation φ(s) = s can be written

e^(−λ(1−s)) = s,

or

λ = log s / (s − 1),

and log s / (s − 1) will be less than one. Reversing the inequality on s reverses
it on λ.
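The criterion can be watched in action numerically. For the Poisson p.g.f. φ(s) = exp(−λ(1 − s)), the fixed-point iteration below locates the smallest non-negative root of φ(s) = s: an interior root appears exactly when the mean λ exceeds one. The specific λ values are illustrative.

```python
import math

# Smallest non-negative root of φ(s) = s for φ(s) = exp(−λ(1 − s)).
def smallest_root(lam, tol=1e-12):
    # Fixed-point iteration s ← φ(s) starting from 0 converges upward
    # to the smallest root.
    s = 0.0
    for _ in range(100_000):
        s_new = math.exp(-lam * (1.0 - s))
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

print(round(smallest_root(2.0), 4))  # strictly inside (0, 1), since m = λ > 1
print(round(smallest_root(0.5), 4))  # → 1.0 (no interior root, since m < 1)
```

This interior root is precisely the extinction probability of the branching chains taken up in Chapter 3.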
1.14. Problems
2. For a Poisson distribution with parameter A, show that the largest probability
corresponds to x, the largest integer less than or equal to A.
3. What is the largest probability for the binomial distribution?
4. Make the first three terms of an arithmetic progression into a probability
distribution by normalizing to unity. How many parameters will there be in the
distribution? What range of values can the parameters have?
5. In Section 1.5, Example 2, find the probability that X is an odd number.
6. In Section 1.5, Example 3, find the probability that there will be more than
three claims in one day.
7. In Section 1.5, Example 4, find the probability that X is odd, square, and less
than ten.
8. In a Poisson distribution on the non-negative integers, P(X = 1) = P(X = 2).
Find P(X = 4). Ans. (2/3)e^(−2)
10. Find the mean, variance, and probability generating function for the geometric
probabilities defined for the following values: (i) 1, 2, 3, ..., (ii) 2, 3, 4, ..., (iii)
k, k + 1, k + 2, ..., (iv) 1, 3, 5, ....
11. Let E and F be events, and let E' be the event consisting of all sample points
not in E. Let I_E, I_F, and I_{E'} be the respective indicators. Show that (i)
I_{E'} = 1 − I_E, (ii) I_{E∩F} = I_E I_F, (iii) I_{E∪F} = 1 − (1 − I_E)(1 − I_F).
12. Using the indicator relationship of Problem 11, show that for three events E, F,
and G, E ∪ (F ∩ G) = (E ∪ F) ∩ (E ∪ G).
13. Generalize the red and blue ticket problem of Section 1.5 to n tickets of each
color. Using indicators, show that E( X) = 1, independent of the number of
ticket pairs.
14. Let X be a random variable with a geometric distribution over the non-negative
integers, with parameter p = 1/2. Compute the probabilities (i) X > 4, (ii) 3 ≤ X < 6,
(iii) 4 < X ≤ 6 or X > 8, (iv) 2 ≤ X ≤ 4 or 8 < X < 12.
Ans. (i) +Z, (ii) i4, (iii) -Hz
15. Let X be Poisson distributed over the non-negative integers with parameter λ.
Find the expected value of (1 + X)^(−1). Ans. (1 − e^(−λ))/λ
16. Prove that the variance of aX is equal to a²v, where v is the variance of X.
17. Let X be a binomial variable with parameters π and n. Find the expected value
of (1 + X)^(−1).
18. A random variable has distribution

E(X) = (N − Np + p^(N+1)) / (1 − p)
24. Let p_x(λ) represent the Poisson probabilities for x = 0, 1, 2, ..., and let P(x, λ)
and Q(x, λ) be the cumulative probabilities (with parameter indicated). Prove
the following formulas:

(i) ∫_0^∞ P(x − 1, λt) dt = x/λ,

(ii) ∑_{j=0}^x p_j(λ) p_{x−j}(μ) = p_x(λ + μ),

(iv) ∫_0^u Q(x − 1, λt) dt = (1/λ) ∑_{j=x}^∞ Q(j, λu),

(v) (d/dλ) Q(x, λ) = p_{x−1}(λ).
29. Find the variance of the negative binomial distribution directly, using the p.g.f.
30. Referring to Section 1.10, draw the graph of (1 − λ)(1 − p) in the λ–p plane and
verify that the statements about the values of these parameters are correct.
31. Let P(X = x) = x/6, x = 1, 2, 3. Find P(X² = x). Ans. x^(1/2)/6, x = 1, 4, 9
32. Let X be geometrically distributed over the non-negative integers with parame-
ter p. Let Y=min(X, M), where M is an integer. Compute the probability
distribution of Y. Work the same problem where X takes values in the positive
integers.
33. Let X be geometrically distributed with parameter p. Compute the following
distributions: (i) X2 if X takes values in the non-negative integers; (ii) X 2 if X
takes values in the positive odd integers; (iii) X + 4 if X takes values in the
non-negative integers; and (iv) X + 4 if X takes values in the positive odd
integers.
34. Let X be geometrically distributed over the non-negative integers with parame-
ter p, and let K be an integer. Find E(min(X, K)) and E(max(X, K)). Hint: Use
E(X) = ∑ Q(j), Problem 37.
35. With the assumptions of Problem 34, find the distribution of Y=min(X, K).
36. Let X be a random variable satisfying

P(X = x) = 2x / (N(1 + N)),   x = 1, 2, ..., N.

Show that this is a valid probability distribution, and find the mean, variance,
and p.g.f.
37. Referring to Section 1.11, show without use of the p.g.f. that E(X) = ∑ Q(j) if
x = 0, 1, 2, ... (compare Problem 23).
38. A random variable has p.g.f. (3 + s)/(6 − 2s). Find the mean, variance, and
P(X = x). Ans. P(X = x) = 3^(−x), x = 1, 2, 3, ...; P(X = 0) = 1/2.
39. In Problem 4, find the mean, variance, and p.g.f. for the following values of X:
(i) 0, 1,2; (ii) 2,3,4; (iii) 1,5,6.
40. Show that the generating function of (i) a^x is (1 − as)^(−1), (ii) x is s/(1 − s)², (iii)
x(x − 1) is 2s²/(1 − s)³, (iv) x² is s(s + 1)/(1 − s)³.
41. The moment generating function is defined to be E(e^(sX)). Show that it actually
generates (i.e., has as coefficients) the moments divided by factorials. Find a
formula connecting the moment generating function with the p.g.f.
42. Find the mean, variance, and p.g.f. for the following:
(i) The Poisson probabilities assigned to the positive odd integers, i.e.,

P(X = x) = e^(−λ) λ^((x−1)/2) / ((x − 1)/2)!,   x = 1, 3, 5, ....

(ii) The odd Poisson probabilities normalized to unity over the same odd
values, i.e.,

P(X = x) = C e^(−λ) λ^x / x!,   x = 1, 3, 5, ....
43. The concept of the p.g.f. of a random variable can be generalized to that of a
generating function of an event, as Φ(s) and Ψ(s) of Section 1.11, where the
events in question were X < x and X ≥ x. Find the generating functions for the
following events, where φ(s) is the p.g.f. of X: (i) X > x + 1, (ii) X ≤ x, (iii) X > x,
(iv) X = 2x.
44. Given that (1/6)(2s+1)(1+s) is the p.g.f. for a random variable X, find (i)
P(X=x), (ii) the p.g.f. for X+1, (iii) E(X), (iv) the variance of X.
45. Find P(x) and Q(x) for Problems 31, 36, and 38.
46. Given a random variable with φ(s) = (1/10)(s+1)(2s+3), find the mean, variance,
and P(x).
47. Let φ(s) be a p.g.f. (i) Show that 1/[2−φ(s)] is also a p.g.f. (ii) If the values of
the random variable corresponding to φ(s) are the non-negative integers, what
are the values of the random variable in part (i)? (iii) If φ(s) corresponds to a
random variable X and part (i) to a random variable Y, express P(Y=0) in
terms of the probabilities of X. (iv) Show that E(X) = E(Y).
48. Consider the probability distribution P(X=x), with p.g.f. φ(s). Define the
exponential probability generating function ψ(s) by
ψ(s) = Σ_{j=0}^∞ P(X=j) s^j / j!.
Show that
φ(s) = ∫_0^∞ e^{−t} ψ(ts) dt.
49. Random variables X and Y are connected through their p.g.f.'s φ(s) and
ψ(s) by the relation s²φ(s) = ψ(s). Find the relationships between E(X) and
E(Y) and between the variances.
50. Let p_x(λ) denote the Poisson probabilities over the non-negative integers.
Define a probability distribution q_x(λ) [random variable X] by
Find E(X).
51. Let P(X=x) = (1−p)p^x, x = 0, 1, 2, ..., and let a distribution q_x be defined
by
56. Show by Section 1.13 that the expected value of an indicator is less than unity.
57. Every indicator has a Bernoulli distribution; is the converse true?
58. A fair die is thrown and X is the sum of the number showing on top and on the
face nearest you. Find the distribution of X.
Note: Opposite faces of a die are numbered 1:6, 2:5, and 3:4.
Ans. P(X=x) = 1/6, x = 5, 6, 8, 9; P(X=x) = 1/12, x = 3, 4, 10, 11.
59. In Section 1.7 (geometric distribution) use the "derivative" method, as shown
for finding E(X), to sum the series needed to find E(X²).
60. Nine points are arranged in a square, and three of them are chosen at random.
Two points are "neighbors" if adjacent in a row, column, or diagonal. Find the
distribution of X=the number of neighbors.
61. A coin is weighted so that the probability of heads is 2/3; it is tossed three times.
Find the distribution of X = the length of the longest sequence of tails observed;
X=0 if no tails. Ans. (8/27, 14/27, 4/27, 1/27)
62. A boy comes from a family with three children; what is the probability that his
two siblings are of the same sex? Assume equal probability that a child is male
or female. Ans. 1/2
63. Given a random variable X, with
(i) Sketch the region in the a-b plane for which this is a valid probability
distribution, (ii) find P(2X − 3 < 5), (iii) find E(X), (iv) sketch the probability
generating function.
64. A die is rolled three times; what is the probability that each result is larger than
the preceding one? Ans. 5/54
65. Show that the recurrence equation (25) for Catalan numbers is satisfied if W_k is
interpreted as the number of ways a convex polygon of k+1 sides can be
divided into triangles by k−2 nonintersecting diagonals.
66. A green die and a red die are thrown (together) n times. Show that the number
of times the value on the green die is greater than the value on the red die is a
binomial random variable. Find the parameters.
2
Conditional Probability
Next, suppose a second attempt is made to catch the fish, and the
number caught the second time is the random variable Y. Again, suppose
that the binomial distribution applies to Y, with one parameter equal to the
number of fish remaining after the first trial, 3 - X.
Some reflection on the random variables X and Y should show that Y
also has the possible values 0, 1, 2, 3, and yet there are certain combinations
of values which are impossible, for example, X=3, Y=3. In other words,
if X=0, then Y=0, 1, 2, 3:
P(Y=0 | X=0) = (1−π)³
P(Y=1 | X=0) = 3π(1−π)²
P(Y=2 | X=0) = 3π²(1−π)
P(Y=3 | X=0) = π³
(binomial with parameters 3 and π);
if X=1, then Y=0, 1, 2:
P(Y=0 | X=1) = (1−π)²
P(Y=1 | X=1) = 2π(1−π)
P(Y=2 | X=1) = π²
(binomial with parameters 2 and π);
if X=2, then Y=0, 1:
P(Y=0 | X=2) = 1−π
P(Y=1 | X=2) = π
(binomial with parameters 1 and π);
and if X=3, then Y=0:
P(Y=0 | X=3) = 1.
[It would also be possible to write down the various possibilities which have
probability zero, for example, P(Y=31 X=3)=O, etc.]
If the expression P(Y=y | X=x) is regarded as a function of the two
arguments x and y, it is clear that the normalization to unity applies to the
argument y only: summing on x is a meaningless operation.
These 10 probabilities form four separate probability distributions, one
conditional on each of the four possible values of X, but they do not
represent the individual probabilities of the 10 basic sample points which have
positive probability. For example, although the value Y=0 conditional on
X=3 has probability unity, it does not mean that the joint occurrence of
X=3 and Y=0 is mandatory. The probabilities of the basic sample points
are denoted by a comma,
P(X=x, Y=y),
in contrast with
P(X=x | Y=y).
There is a great difference between saying that two fish are caught the first
time and none the second time, and saying that two fish are caught the first
time, given that none are caught the second time.
How shall the probabilities of the sample points be calculated? First,
consider the various possibilities arranged in a rectangular tableau: in the
language of probability, a "bivariate distribution" or, in the language of
statistics, a "contingency table." Just as the sample points and their proba-
bilities are listed in the single-variable case, so, in the bivariate case, they are
given in a two-way table. For the fish-catching experiment, the table would
have 16 cells, six of them occupied by zeros, as in Table 2.1.
In thinking about how to fill in the remaining probability values in
Table 2.1, note first that the column totals, i.e., P(X=x), x = 0, 1, 2, 3, are
given by the original assumption as binomial values. This says only that the
probability of catching x fish the first time is the sum of the probabilities of
the constituent sample points, P(X=x, Y=y), summed on y.
Therefore the probability in the lower right cell must be P(X=3) = π³.
This makes sense, too, since the event "three fish on the first scoop" must
have the same probability as "three fish on the first scoop and no fish on the
second." Now consider the third column, corresponding to X=2; the total
probability in this column is 3π²(1−π) = P(X=2), and the proportions to
be shared between Y=0 (bottom cell) and Y=1 (second cell) are given by
the conditional probabilities P(Y=0 | X=2) and P(Y=1 | X=2). These have
been found to be 1−π and π, respectively. In order to divide 3π²(1−π)
into proportions 1−π and π, it is only necessary to multiply by these
quantities, giving P(Y=0, X=2) = 3π²(1−π)² for the bottom cell in this
column, and P(Y=1, X=2) = 3π³(1−π) for the second cell. These two
values have the required sum for total probability P(X=2) and the right
proportions for P(Y=0 | X=2) and P(Y=1 | X=2).
Thus the event "two fish caught on the first scoop and none on the
second scoop" has probability 3π²(1−π)². The same principle can be
applied to get the probabilities in the first two columns, and to fill out the
joint probability array as in Table 2.2.†
† In this book we take up and right as positive directions, exactly as in analytic geometry. Other
texts sometimes use down and right.
where τ = π(1−π).
That is, Y is also a binomial random variable with parameters 3 and
π(1−π). The unconditional distributions of X and Y are often written along
the margins of the bivariate table and are known in statistics as "marginal"
distributions.
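This marginal computation can be checked numerically. The sketch below (π chosen arbitrarily) builds the joint table from X ~ Binomial(3, π) and Y | X=x ~ Binomial(3−x, π), and compares the column sums for Y with Binomial(3, π(1−π)):

```python
# Build the 4x4 joint table of the fish-netting example:
# X ~ Binomial(3, pi), and given X = x, Y ~ Binomial(3 - x, pi).
# Verify that the marginal of Y is Binomial(3, tau) with tau = pi*(1 - pi).
from math import comb

pi = 0.35
def binom(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

joint = {(x, y): binom(3, x, pi) * binom(3 - x, y, pi)
         for x in range(4) for y in range(4 - x)}
marginal_Y = [sum(pr for (x, y), pr in joint.items() if y == j) for j in range(4)]

tau = pi * (1 - pi)
for j in range(4):
    print(j, round(marginal_Y[j], 10), round(binom(3, j, tau), 10))
```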
In addition to the bivariate and single-variate distributions, there are
also conditional distributions P(X=x | Y=y) corresponding to P(Y=y | X=x).
These are formed by normalizing to unity the row or column specified
by the condition imposed. When Y=2 comes after the vertical bar, it is an
instruction to consider row three and normalize to unity:
P(X=1 | Y=2) = π / (1 − π + π²).
P(E|F) = P(E∩F) / P(F).  (1)
For the special case of random variable values, the formula reads
P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y).  (2)
The motivation for these definitions should be understood from the fish-
netting example: On the left side we have an array distribution, and the
right side says that the probabilities in this array distribution are formed by
normalizing to unity (the denominator) the joint probabilities (the numera-
tor). Stated yet another way, the "conditional sample space" is F or Y=y,
and dividing by P(F) or P(Y=y) reflects this fact.
It is also clear that the single-variable probabilities should be defined in
terms of array sums of the bivariate probabilities:
Since the joint probabilities P(X=x, Y=y) are symmetric in the two
arguments, the basic definition can be written in the nicely symmetric forms
P(E|F)P(F) = P(F|E)P(E)  (3)
or
P(X=x | Y=y) P(Y=y) = P(Y=y | X=x) P(X=x).
As a basis for further work, the student should clearly understand why
conditional probability must be defined in this way, and memorize the
simple mnemonic formula
A|B · B = B|A · A,  (4)
from which virtually all more complicated formulas follow easily. It is also
useful to see the intuitive significance of these formulas.
Each of the distributions comes with its own system of parameters,
distribution functions, probability generating functions, and so forth. There
does not seem to be any simple and comprehensive set of notation to
account for it all, and rather than attempt to build up such a (cumbersome)
system, this book will define the various quantities of interest on an ad hoc
basis, often using p_xy to abbreviate P(X=x, Y=y). Thus p_x = Σ_y p_xy, etc.
A problem often involves the basic definition where two of the factors
are given, say P(E) and P(F|E), and a third is to be found, usually
P(E|F). Solutions clearly involve finding P(F) first. A little reflection will
show that this is possible, for the quantities given permit the calculation of
the bivariate probabilities P(E∩F), and so P(F) is obtained as a row sum.
P(G|A) = 1,  P(S|A) = 0,
P(G|B) = 0,  P(S|B) = 1,
P(G|C) = 1/3,  P(S|C) = 2/3,
P(G|D) = 1/2,  P(S|D) = 1/2.
There are many problems of this general tenor, which the student
should work until the basic procedure is familiar. In the following example,
the problem is an industrial one, but the situation, as far as it relates to
probability, is virtually identical.
P(Y=y | X=x) = P(Y=y, X=x) / P(X=x),
P(Y=y | X=x) = P(X=x | Y=y) P(Y=y) / P(X=x),
P(Y=y | X=x) = P(X=x | Y=y) P(Y=y) / Σ_y P(X=x | Y=y) P(Y=y).  (5)
variable. There is also a version of Bayes' theorem, with the same meaning
and the same proof, for events not necessarily involving random variables.
For example, in the gold/silver coin problem, Bayes' theorem would read
P(A|G) = P(G|A)P(A) / [P(G|A)P(A) + P(G|B)P(B) + P(G|C)P(C) + P(G|D)P(D)],
representing exactly the calculations needed to obtain the bivariate table
from the given probabilities, and then the answer from the bivariate table.
It may be surprising how often practical problems involve being given
P(EIF) and P(F), rather than some other probabilities; the existence of
Bayes' theorem reflects this fact. Although Bayes' theorem permits the
calculation of the "inverse" probabilities without writing out the entire
bivariate table, it is more sensible and orderly to do the latter, and the
actual calculations usually prove to be identical, as the derivation of the
theorem shows.
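Both routes can be sketched in code. The priors and the likelihoods P(G | box) below are hypothetical stand-ins, chosen only to show that the formula and the bivariate-table route give the same answer:

```python
# Bayes' theorem P(A|G) computed two ways: via the formula directly, and via
# building the full bivariate (joint) table first, as the text recommends.
# The priors and likelihoods below are hypothetical, just for illustration.
priors = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
p_G_given = {"A": 1.0, "B": 0.0, "C": 1/3, "D": 0.5}  # P(G | box)

# Via the formula: normalize P(G|box)P(box) over all boxes.
denom = sum(p_G_given[b] * priors[b] for b in priors)
p_A_given_G = p_G_given["A"] * priors["A"] / denom

# Via the joint table: P(box, G) entries, then condition on the G column.
joint = {b: p_G_given[b] * priors[b] for b in priors}
p_A_given_G_table = joint["A"] / sum(joint.values())

print(round(p_A_given_G, 6), round(p_A_given_G_table, 6))
```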
Sometimes the events which condition probabilities are more subtle
than obvious random variable values. In the two following examples-both
of which will assume importance later in this book- the nature of the
conditioning may not at first be clear.
Problem 1 (Stick Problem). There are two sticks, one a foot in length and the
other two feet in length. What is the probability distribution for the length of a
stick? If a stick is chosen at random, the result P(X=1) = 1/2, P(X=2) = 1/2 would be
obtained, where X = length of stick. On the other hand, by choosing at random a
point on a stick, the result P(X=1) = 1/3, P(X=2) = 2/3 would be correct. Actually the
difference in the two results comes from an implicit assumption about how the
experiment is performed. Some authors call the situation a paradox, although there
is certainly no reason to do so. Using the concepts of conditional probability, it
appears that the definition of X is not the same in the two cases.
Problem 2 (Family Problem). A family has two children, of which one is a boy.
What is the probability that the other is a boy? In this formulation, it appears that
the answer must be one-half, since the other child could equally well be boy or girl.
But, like the stick problem, there is a curiosity about the way the problem is
phrased. It refers exclusively to families with one boy and another child, not to all
families. If families with two children have equal probabilities of boys and girls,
then the sex distributions which are equally likely would be
BB, BG, GB, GG,
and the problem refers only to the first three kinds of families. Conditional on this,
the probability of another boy is one-third. In order to get the answer one-half, it
would be necessary to ask the question not in terms of families, but in terms of
boys: If a boy has one sibling, what is the probability that it is a boy? Then the two
middle cases are indistinguishable, and the answer is one-half.
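Both readings of the question can be checked by enumerating the four equally likely families (a sketch):

```python
# Enumerate equally likely two-child families to contrast the two questions in
# the Family Problem: conditioning on "at least one boy" versus sampling a boy.
from itertools import product

families = list(product("BG", repeat=2))  # BB, BG, GB, GG, equally likely

# Question 1: among families with at least one boy, P(both are boys).
with_boy = [f for f in families if "B" in f]
p_other_boy = sum(1 for f in with_boy if f == ("B", "B")) / len(with_boy)

# Question 2: pick a random *boy*; P(his sibling is a boy).
# Weight each family by its number of boys.
boys = [(f, i) for f in families for i, c in enumerate(f) if c == "B"]
p_sibling_boy = sum(1 for f, i in boys if f[1 - i] == "B") / len(boys)

print(p_other_boy, p_sibling_boy)  # one-third versus one-half
```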
2.3. Conditioning
and this is obvious in any case from the fact that P( X = x) is an array sum.
The validity of the formula depends, however, on the fact that all values of y
are represented, in other words, that the values are mutually exclusive and
exhaustive of the sample space. The same is true of any mutually exclusive
and exhaustive set of events, say E₁, E₂, ..., E_n: If P(E_i ∩ E_j) = 0, all i ≠ j,
then
+ P(Y=y | X=2)P(X=2).
E(Y) = Σ_{j=0}^∞ j P(Y=j)
     = Σ_{j=0}^∞ j Σ_{i=0}^∞ P(Y=j | X=i) P(X=i)
     = Σ_{i=0}^∞ Σ_{j=0}^∞ j P(Y=j | X=i) P(X=i)
     = Σ_{i=0}^∞ E(Y | X=i) P(X=i),  (8)
with similar expressions for other mutually exclusive and exhaustive events.
This last formula is often written
E(Y)=E(E(YIX)).
This means treating E( Y IX) as a random variable with an expectation.
Although the attitude towards E( Y IX) is hardly of importance, it would be
well to clarify it and the meaning of the formula by referring once again to
the fish-netting example. Treating E( Y I X= x) in this example as values of a
random variable, taking values 3π, 2π, π, and 0 with probabilities (1−π)³, 3π(1−π)²,
3π²(1−π), and π³, would lead to exactly the same calculation for E(Y) as
performed above.
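This check is easy to carry out directly (a sketch, with π arbitrary): the outer expectation of the conditional means (3−x)π must equal E(Y) computed from the joint table.

```python
# Check E(Y) = E(E(Y|X)) in the fish-netting example: given X = x,
# Y is Binomial(3 - x, pi), so E(Y | X = x) = (3 - x) * pi.
from math import comb

pi = 0.35
p_X = [comb(3, x) * pi**x * (1 - pi)**(3 - x) for x in range(4)]

# Outer expectation of the conditional means (3 - x) * pi ...
e_Y_conditional = sum((3 - x) * pi * p_X[x] for x in range(4))
# ... versus E(Y) computed from the joint table directly.
joint = {(x, y): p_X[x] * comb(3 - x, y) * pi**y * (1 - pi)**(3 - x - y)
         for x in range(4) for y in range(4 - x)}
e_Y_direct = sum(y * pr for (x, y), pr in joint.items())

print(round(e_Y_conditional, 10), round(e_Y_direct, 10))
```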
+ P(A₃ | A₂B₁) P(A₂B₁)
P(E∩F) = P(E)P(F),
P(E) = P(E|F),  (12)
P(F) = P(F|E).
may happen in any order (and with the same probability), the factor (~) is
supplied. 0
C(n+y−1, y) (1−π)^y π^n,  y = 0, 1, 2, ...,
lim_{n→∞} C(n, x) (λ/n)^x (1 − λ/n)^{n−x}
   = (λ^x / x!) lim_{n→∞} [n(n−1) ⋯ (n−x+1) / n^x] (1 − λ/n)^{n−x}
   = e^{−λ} λ^x / x!,
using
e^z = lim_{n→∞} (1 + z/n)^n.
Each of the five kinds of distributions which have so far been discussed
in connection with two random variables carries with it the entire structure
of moments, distribution functions, and generating functions defined in
Chapter 1. In thinking about the various quantities and functions involved
even when there are two random variables, let alone n random variables, it
appears that a very complex set of notation would be needed to describe
everything systematically. It is probably safe to say that there does not exist
any such fully comprehensive set, and if one were devised, the tangle of
symbols would obscure rather than reveal important relationships.
In this section, some of the more useful relationships are illustrated,
together with typical notation. In the remainder of the book, various
quantities and functions will be defined according to the need to clarify the
problem at hand, without attempting an overall system.
Means:
E(X | Y=y) = Σ_i i P(X=i | Y=y) = [Σ_i i P(X=i, Y=y)] / P(Y=y),
etc.
Variances:
With two random variables, three closely related quantities are often
used to measure the degree of association between the variables:
When
E(XY) = E(X)E(Y),
the variables are said to be uncorrelated. The student can verify that
independent variables are uncorrelated, and perhaps construct a counterex-
ample to show that the converse is not true.
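One standard counterexample (a sketch, not taken from the text): let X be uniform on {−1, 0, 1} and Y = X². Then E(XY) = E(X³) = 0 = E(X)E(Y), yet X and Y are clearly dependent.

```python
# Uncorrelated does not imply independent:
# X uniform on {-1, 0, 1} and Y = X^2. Then E(XY) = E(X^3) = 0 = E(X)E(Y),
# yet P(X=0, Y=0) != P(X=0) * P(Y=0).
vals = [-1, 0, 1]
p = 1 / 3

E_X = sum(x * p for x in vals)            # 0
E_Y = sum(x * x * p for x in vals)        # 2/3
E_XY = sum(x**3 * p for x in vals)        # 0
print(E_XY == E_X * E_Y)                  # uncorrelated

p_joint = p      # P(X = 0 and Y = 0): only the point x = 0
p_prod = p * p   # P(X = 0) * P(Y = 0) = (1/3) * (1/3)
print(p_joint != p_prod)                  # not independent
```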
for the single-variable marginals, it is obvious that φ(s, 1) = α(s) and φ(1, t)
= β(t). Also,
∂²φ(s, t) / ∂s ∂t |_{s=t=1} = E(XY).  (15)
Distribution Functions
Defining the bivariate distribution function by
G(x) = P(X < x),
H(x) = P(Y < x),
since the events (X < x) ∩ (Y < y) and (X ≥ x) ∩ (Y ≥ y) do not exhaust the
sample space.
Notation. Some authors prefer to keep the same letter for all distribu-
tion functions, indicating the random variable as a subscript, thus:
F_{X,Y}(x, y), F_X(x), F_Y(y), and so forth. This system is also often used for
other functions, such as the probability generating function and even the
basic exact probabilities, as noted in Section 1.4. As with most notational
issues, there are advantages and disadvantages to any system, and both
seem to sharpen as the number of variables increases. Fortunately, the
problems addressed in this book require only ad hoc notation.
for X) and values of Y for the vertical sums, so the probabilities for values
of Z are found in diagonal sums. The value Z=0 occurs only when both
X=0 and Y=0, i.e., the probability in the lower left corner, (1−π)⁶. To
Similarly,
and
then
P(X + Y = z) = Σ_{j=0} P(X=j, Y=z−j),  (16)
where the values in the summation depend on the possible values of X and
Y. However, even if X and Y have an infinite number of values, the sums
(being diagonals) are each still finite, although there are an infinite number
of such sums.
Once the significance of the diagonal sums as probabilities is appreci-
ated, the fact can be used in other ways, for example, in constructing the
bivariate table.
P(X=x, Y=y) = e^{−λ} λ^{x+y} C(x+y, x) 2^{−x−y} / (x+y)!
In this example, it is clear from the problem (or from finding the
marginals and multiplying) that the random variables are independent, so
that formula (16) can be written
P(X + Y = z) = Σ_{j=0}^{z} p_j q_{z−j}.  (17)
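Formula (17) can be checked with a convenient pair of independent variables (a sketch): if X is Binomial(2, 1/2) and Y is Binomial(3, 1/2), their convolution must be Binomial(5, 1/2).

```python
# Convolution formula: P(X + Y = z) = sum_j p_j * q_{z-j} for independent X, Y.
# Here X ~ Binomial(2, 0.5) and Y ~ Binomial(3, 0.5); the sum is Binomial(5, 0.5).
from math import comb

p = [comb(2, j) * 0.5**2 for j in range(3)]   # pmf of X
q = [comb(3, j) * 0.5**3 for j in range(4)]   # pmf of Y

conv = [sum(p[j] * q[z - j] for j in range(len(p)) if 0 <= z - j < len(q))
        for z in range(6)]
expected = [comb(5, z) * 0.5**5 for z in range(6)]
print(all(abs(a - b) < 1e-12 for a, b in zip(conv, expected)))
```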
Distribution: Convolution;
E(X + Y) = E(X) + E(Y),
var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).  (18)
Since the left side of the identity is positive for all values of a and b, so must
the right side be always positive. This implies that zeros of the quadratic
form must be complex, and therefore the discriminant must be negative:
e^{−(λ+μ)} (λ+μ)^z / z!
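This closure of the Poisson family under convolution can be checked numerically (a sketch; the values of λ, μ and the truncation point are arbitrary):

```python
# Convolution of two Poisson pmfs: Poisson(lam) * Poisson(mu) = Poisson(lam + mu),
# checked numerically on a truncated support.
import math

def poisson(lam, n):
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n)]

lam, mu, N = 1.2, 0.7, 30
p, q = poisson(lam, N), poisson(mu, N)
conv = [sum(p[j] * q[z - j] for j in range(z + 1)) for z in range(N)]
target = poisson(lam + mu, N)
print(max(abs(a - b) for a, b in zip(conv, target)) < 1e-12)
```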
In the asterisk notation, this fact could be written
( e^{−λ} λ^x / x! )^{2*} = e^{−2λ} (2λ)^x / x!,  x = 0, 1, 2, ...,
with the obvious extension to n*. With this system, the derivation of the
binomial distribution from the Bernoulli could be written
[ π^x (1−π)^{1−x} ]^{n*} = C(n, x) π^x (1−π)^{n−x},  x = 0, 1, ..., n,
and
P(X + Y < x) = Σ_{j=0}^{x−1} Σ_{i=0}^{x−j−1} p_i q_j
             = Σ_{k=1}^{x} Σ_{i=0}^{k−1} p_i q_{x−k}
             = Σ_{k=1}^{x} P(X < k) P(Y = x − k).
the diagonal sum. These distributions are conditional on the sum of the two
random variables being fixed.
For example, in the familiar fish-netting experiment, suppose the total
number of fish caught in both scoops is two. Then with a little calculation,
the new distribution is obtained:
P(X=0 | X+Y=2) = P(Y=2 | X+Y=2) = (1−π)² / (2−π)²,
P(X=1 | X+Y=2) = P(Y=1 | X+Y=2) = 2(1−π) / (2−π)²,
P(X=2 | X+Y=2) = P(Y=0 | X+Y=2) = 1 / (2−π)².
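These three values can be verified from the joint table (a sketch, with π arbitrary):

```python
# Diagonal (conditional-on-sum) distribution for the fish-netting example:
# check P(X = x | X + Y = 2) against the closed forms
# (1-pi)^2/(2-pi)^2, 2(1-pi)/(2-pi)^2, 1/(2-pi)^2 for x = 0, 1, 2.
from math import comb

pi = 0.35
def binom(n, k, p): return comb(n, k) * p**k * (1 - p)**(n - k)

joint = {(x, y): binom(3, x, pi) * binom(3 - x, y, pi)
         for x in range(4) for y in range(4 - x)}
diag = {x: joint[(x, 2 - x)] for x in range(3)}   # the X + Y = 2 diagonal
total = sum(diag.values())
cond = [diag[x] / total for x in range(3)]

closed = [(1 - pi)**2 / (2 - pi)**2, 2 * (1 - pi) / (2 - pi)**2, 1 / (2 - pi)**2]
print(all(abs(a - b) < 1e-12 for a, b in zip(cond, closed)))
```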
The student should verify this result and find the other diagonal distribu-
tions for this example, with the general result
p_xy = P(X=x, Y=y),
q_n(x) = P(X=x | X+Y=n),
satisfy
(19)
as follows:
= Σ_{k=0}^∞ Σ_{x=0}^{k} q_k(x) c_k s^x t^{k−x}
= Σ_{k=0}^∞ ψ_k(s/t) c_k t^k.
(22)
The value of bo can be found by setting s= 1 and then Eq. (22) becomes
α(s) = [ Σ_{k=0}^∞ ψ_k(s) a_k / q_k(k) ] / [ Σ_{k=0}^∞ a_k / q_k(k) ].  (23)
λ / μ = π / (1−π).  (25)
It is left as an exercise for the student to show that the converse is true.
Diagonal distributions for independent Poisson variables are binomial, with
parameter satisfying Eq. (25).
When the diagonal distributions are rectangular (and the random
variables are independent),
ψ_k(s) = (1 − s^{k+1}) / [(1+k)(1−s)]
† It is possible to prove that only this value can be obtained from the functional equation (24);
see, for example, ACZÉL, J. (1966), Lectures on Functional Equations and Their Applications,
Academic Press, New York, p. 67 (Theorem 2).
α(s) = [1 − s α(s)] / {[1 + E(X)] (1 − s)},
whose solution is the p.g.f. of a geometric distribution with parameter
p = E(X) / [1 + E(X)].
It must be emphasized that these results depend on the assumption of
the independence of X and Y; indeed the fish-netting example shows a
binomial diagonal distribution where the marginals are not Poisson.
The most important application of diagonal distributions occurs in
modeling "before-and-after" studies, especially in connection with the
occurrence of accidents.† Let X be the number of accidents before a supposed
improvement is made, and Y, the number of accidents afterwards. If the
improvement affected hundreds or thousands of road segments with greatly
differing usage characteristics, it might be desirable to "standardize" the
analysis by considering together all those segments for which X + Y is fixed.
Then the distributions of X and Y separately would enable the experimenter
to draw conclusions regarding the effectiveness of the supposed improvement.
Furthermore, this could be done without assuming that X and Y are
independent.
† This model has been used in practical analysis of accident data: see ERLANDER, S. (1971), A
review of some statistical models used in automobile insurance and in road accident studies,
Accident Analysis and Prevention, Vol. 3, No. 1, pp. 45-75, and especially the comprehensive
reference list.
CHUNG, KAI LAI (1974), Elementary Probability Theory with Stochastic Processes, Springer-
Verlag, New York.
FELLER, WILLIAM (1968), An Introduction to Probability Theory and Its Applications, Vol. 1,
third edition, John Wiley and Sons, New York.
GNEDENKO, B. V. (1968), The Theory of Probability, fourth edition, Chelsea, New York.
2.9. Problems
Unless otherwise specified, it is assumed in this section that the dice, coins,
selection of balls, etc., are unbiased, that is, that each possible result has the same
probability. The first 23 problems give practice in setting up a bivariate probability
table. The same problems can be used at the instructor's discretion to find
properties of these distributions: expected values and other moments, covariance
and correlation, generating functions. These can be obtained for marginals, condi-
tional distributions, diagonal distributions of both types, and also various probabil-
ity questions such as P(X<Y), P(XY odd), E(max(X, Y)), var(X | Y=y), and so
forth. Independence can be investigated. For extra variety, and simplicity, dice
problems can be interpreted as referring to tetrahedral dice, with sides 1, 2, 3, 4.
Some of the problems can be made easier by specifying a value of n, others more
difficult by replacing a given integer by n.
10. Two balls are selected from a box with balls labeled 1,2,3; X=number on first
ball selected, Y=number on second ball selected (no replacement, of course).
11. A red and a blue die are thrown, with X = score on the blue die, and Y = larger
score.
12. X, Y are as in the fish-netting experiment, but with the basic distribution
rectangular rather than binomial.
13. Given the fish-netting experiment with n fish and m scoops, show that the
expected number of fish caught decreases with each successive scoop.
14. Three dice are thrown together, the highest number X noted, and that die put to
one side. (If two or more show the same number, the throw is invalid, and the
experiment repeated.) The remaining dice are thrown until they show different
numbers, and the larger number is Y.
15. X has probability generating function (1/6)(2s+1)(s+1); Y conditional on X=x is
rectangular with values x, x+1, x+2.
16. Cubical dice are painted in an unorthodox manner: 1, 1, 1, 3, 3, 5 on the sides.
Two are thrown, with X = sum, Y = lesser value.
17. A coin is flipped repeatedly; X = number of flips needed to get the first head,
Y=number of flips needed to get the first tail.
18. Three red and two green balls are in a bag. They are taken out one by one until
both green balls have been chosen. X=number of balls before the first green
one, Y=number of additional balls before the second green one.
19. Let P(U=x) = (1−p)p^x, x = 0, 1, 2, ..., and let n be a positive integer. X =
max(U, n), Y = min(U, n).
20. A die is rolled; X=number on top, Y=number on side most nearly facing you.
(Note: The dice are painted so that the total of the numbers on opposite faces is
seven.)
21. Balls numbered 0, 1, 1,3,3,6 are placed in a bag and two are drawn; X=larger
number, Y=total.
22. A bag contains three white and two red balls. Two are drawn consecutively,
with X=number of white balls on first draw, Y=number of white balls on
second draw.
23. Σ_x P(X=x) s^x = exp(−λ + λs), P(Y=y | X=x) = (1+x)^{−1}, y = 0, 1, 2, ..., x.
24. Two (unequal) digits are taken from the set 1,2, ... ,9, with every pair having an
equal chance of being selected. (i) If the sum is odd, what is the probability that
one of the digits is 2? (ii) If one of the digits is 2, what is the probability that the
sum is odd? Ans. (i) 1/4, (ii) 5/8
25. There are N coins in a box, of which n are normal and N - n are double-headed.
A coin is selected at random and tossed r times, each time coming up heads.
What is the probability that it is normal? Ans. n/[n + (N−n)2^r]
26. In a certain factory, machine A produces p percent of the output, and x percent
of its production is defective. Machine B produces 100- p percent of the output,
and y percent of its production is defective. What is the probability that a
randomly selected defective item came from machine A?
27. Box A contains three green and two yellow balls; box B contains one yellow and
two green balls; box C contains one green and three yellow balls. (i) One box is
selected at random and a ball drawn at random from it. What is the probability
that the ball drawn is yellow? (ii) If a yellow ball is obtained, what is the
probability that it came from box C? Ans. (i) 89/180, (ii) 45/89
28. Die A has four red and two white faces, whereas die B has two red and four
white faces. A fair coin is flipped; if it falls heads, the game continues by
throwing die A alone; if it falls tails, die B is used. (i) Show that the probability
of red at any throw is 1/2. (ii) If the first two throws result in red, what is the
probability of red at the third throw? (iii) If red turns up at the first n throws,
what is the probability that die A is being used?
29. X is binomial with parameters n and 'IT; Y conditional on the value X=x is
binomial with parameters x and (1.
30. In searching for a penny, it is known that it is hidden in one of three places, and
it is equally likely to be in anyone of them. The probability of finding the
penny if it is in place x (x= 1,2,3) is Px. Suppose a search is made in place one,
and the penny is not found. What is the probability that it was there?
31. Twice as many women use the library as men, but of those using the library,
women check out an average of two books per visit and men, an average of
three. (i) Find the average number of books checked out by a library user. (ii) If
a person checks out three books, find the probability that that person is a man,
assuming that the number of books checked out (for either sex) is geometric
with positive integral values. Ans. (i) t, (ii) ~
32. Five boxes contain black/white balls as follows: Box A, 2/2; box B, 3/1; box
C, 3/2; box D, 0/1; box E, 1/2. If a ball is chosen from each box, what is the
expected number of white balls obtained? If a box is selected and then a ball
chosen, what is the probability that it is white? If a black ball is obtained, what
is the probability that it came from box A?
33. Consider the formula P(E|F)P(F) = P(F|E)P(E) for a two-by-two table: E,
E′, F, F′. Bayes' theorem shows that it is sufficient to be given P(E|F), P(F),
and P(E|F′) to determine the whole table. Find counterexamples to show that
it is not sufficient to have one factor from each side of the equation given.
34. Box A contains one red and one yellow ball; box B contains one yellow and two
red balls. A box is chosen and a yellow ball extracted; what is the probability
that it was from box A?
35. Suppose there are four boxes, labeled A, B, C, and D. A ball is chosen at
random from box A, which contains initially six balls labeled B, three labeled C,
and three labeled D. The letter drawn tells which box is to be used for the
second drawing. Box B contains five green and five yellow balls, box C contains
four green and six yellow balls, and box D contains two green and eight yellow
balls. Given that the second ball drawn is green, what is the probability that box
B is being used? Are the events "first ball C" and "second ball yellow"
independent?
36. A fair die is thrown until two consecutive results are the same. What is the
expected number of throws?
37. A fair coin is tossed until the same side appears twice in succession, so that
1/2^{n−1} is the probability of every result that requires n tosses. Let E be the
event that the experiment ends before the sixth toss, and let F be the event that
the experiment ends on an even number of tosses. (i) Find P( E) and P( F). (ii)
Show that E and F are independent. (iii) If G is the event that the experiment
ends before the fifth toss, are F and G independent?
38. In an investigation of animal behavior, rats have to choose between four similar
doors, one of which is "correct." If an incorrect choice is made, the rat is
returned to the starting point, and made to choose again, this continuing until
the correct response is made. The random variable X is the trial on which a
correct response is first made, with possible values 1, 2, 3, .... Find P(X=x)
and E(X) under the following hypotheses. (i) Each door is equally likely to be
chosen on each trial and all trials are mutually independent. (ii) At each trial,
the rat chooses with equal probability between doors which have not yet been
tried, no choice ever being repeated. (iii) The rat never chooses the same door on
two successive trials, but otherwise chooses at random with equal probabilities.
Ans. Expected values: (i) 4, (ii) 5/2, (iii) 13/4
39. Suppose X balls are distributed at random into n boxes, where X is Poisson
(over the non-negative integers) with parameter λ. Let Y be the number of
empty boxes. Show that Y is binomial with parameters n and e^{−λ/n}.
40. Two dice are thrown n times. Show that the number of throws in which the
number on the first die exceeds the number on the second die is binomially
distributed with parameters n and 5/12.
41. A pair of coins is thrown. (i) What is the distribution of the number of throws
needed for both coins to show heads? (ii) What is the distribution of the
number of throws needed for at least one coin to show a head?
42. Let X be the total showing on a single throw of n dice. Find E( X) and var( X)
as a function of n.
43. In a sequence of Bernoulli trials with p = P(S), let p_x be the probability that the
combination SF occurs for the first time on trials number x−1 and x. Find the
generating function, the mean, and the variance. Ans. Mean = [p(1−p)]^{−1}
44. Two players try alternately to obtain a success in a game where the probability
of success is p. Show that the probability that the first player wins is (2 - p) - 1.
Generalize to n players.
45. A game between two players, A and B, consists in them taking turns playing a
machine until one of them scores a success. The first to score a success is the
winner. Their probabilities of success in a single play are p for A and q for B.
Since B is the better player (q> p), he allows A to have the first tum. All plays
are independent. (i) Show that the game is fair if and only if q=pj(l-p). (ii)
Show that if A wins, the average number of plays he takes in which to win is
(p+q-pq)-'.
What is the value of this probability for x = 0? (ii) Show that U is geometrically
distributed with parameter 1 - (1 - p)². (iii) Show that

P(V - U = x) = [2p/(2 - p)](1 - p)^x,    x = 1, 2, 3, ....

What is the probability for x = 0?
(iv) Show that U and V - U are independent.
53. X and Y are geometric with parameter p. Let U = Y - X and V = min(X, Y). (i)
Show that

P(U = u, V = v) = P(X = v - u)P(Y = v) for u < 0,
P(U = u, V = v) = P(X = v)P(Y = u + v) for u ≥ 0.
54. X and Y have probability generating functions φ(s), ψ(s); show that P(X - Y =
x) is the coefficient of s^x in the expansion of φ(s)ψ(1/s) in powers of s, x = 0,
1, 2, ....
55. Let X be geometric over the non-negative integers. (i) Show that for integers
y ≤ x

P(X ≤ x | X ≥ y) = P(X ≤ x - y).
(ii) Show that the geometric distribution is the only distribution over the
non-negative integers with this property. This property is called the memoryless
property of the geometric distribution, and can be expressed by saying that
truncating the geometric distribution (omitting the first k probabilities and
normalizing the remainder to unity) yields the same distribution. The memoryless
property becomes important in Chapter 5.
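The memoryless property is easy to check numerically. The following sketch (the parameter p = 0.3 and the pair x = 9, y = 4 are arbitrary illustrative choices, not taken from the text) compares the two sides of the identity in part (i):

```python
# Check P(X <= x | X >= y) = P(X <= x - y) for the geometric
# distribution over the non-negative integers, P(X = k) = p(1-p)^k.
p = 0.3  # illustrative parameter, not from the text

def pmf(k):
    return p * (1 - p) ** k

def cdf(x):
    # P(X <= x)
    return sum(pmf(k) for k in range(x + 1))

def tail(y):
    # P(X >= y) = (1-p)^y
    return (1 - p) ** y

x, y = 9, 4
lhs = (cdf(x) - cdf(y - 1)) / tail(y)   # P(X <= x | X >= y)
rhs = cdf(x - y)                        # P(X <= x - y)
```

Any other p in (0, 1) and any y ≤ x would serve equally well.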
56. In a certain restaurant, 90% of the customers do not smoke and 10% do. Let
X=number of the customer who is the fifth smoker. Write down P(X=x).
57. On the student council there are four undergraduate men, six undergraduate
women, and six graduate men. How many graduate women must be appointed
to the council if sex and graduate status are to be independent?
58. Show how the derivation of the binomial and negative binomial distributions
can be simplified by using indicator random variables.
59. A coin is weighted so that the probability of heads is one-fourth. The coin is
tossed four times and X = length of the longest string of tails which occurs. Find
P(X = x). Ans. P(X = 3) = 27/128
60. A rodent is placed in a cage with three doors. The first door leads to food after
three minutes of travel. The second door returns the creature to his starting
point after 5 minutes of travel, while the third returns him to the starting point
after 7 minutes of travel. What elapsed time before reaching the food would be
consistent with the hypothesis that the rodent is choosing doors at random?
Ans. 15 minutes
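The answer comes from a first-step (conditioning) argument: if E is the expected time to food, then E = (1/3)(3) + (1/3)(5 + E) + (1/3)(7 + E). A sketch of the computation in exact arithmetic:

```python
# Solve E = (1/3)*3 + (1/3)*(5 + E) + (1/3)*(7 + E), i.e. E = 5 + (2/3)E.
from fractions import Fraction

third = Fraction(1, 3)
constant = third * 3 + third * 5 + third * 7   # terms not involving E
slope = 2 * third                              # coefficient of E
E = constant / (1 - slope)
```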
61. Referring to a joint distribution function F(x, y), show that

P(x_1 ≤ X < x_2, y_1 ≤ Y < y_2) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1).
62. A slot machine works by inserting a coin. If a player wins, the coin is returned
with another coin, otherwise the original coin is lost. The probability of winning
is arranged to be one-half, independently of previous plays, unless the previous
play was a win, in which case the probability of a win is p < 1/2. Show that if the
cost of maintaining the machine is c coins a day, then, in order to be profitable,
the owner must choose p so that it satisfies the inequality
p < (1 - 3c)/[2(1 - c)],    for c < 1.
63. The spores of a certain plant are arranged in sets of four in a linear chain (with
three links). When the spores are ejected from the plant, each link has a
probability β of breaking, independently for each link. For example, if all links
break, four groups of one spore each are obtained, whereas if no links break, a
single group of four spores results. Find the probability of a group containing X
spores, and show that E(X) = 1 + 3β.
64. Express var(X) and var(Y) in terms of the probability generating function
E(s^X t^Y).
65. For a certain species of animal, the probability of a litter of size X is Poisson
with values in the positive integers, i.e.,

P(X = x) = λ^x e^{-λ} / [x!(1 - e^{-λ})],    x = 1, 2, 3, ....
The probability of a male birth is p and that of a female birth, I - p. What is the
probability that a litter is "matable," that is, that it contains at least one male
and one female?
66. The number of people arriving at a library per hour is Poisson with parameter 5.
An arrival is equally likely to be a man or a woman. What is the conditional
probability that at most three men arrived, given that five women arrived?
Ans. (1/(120e^5)) Σ_{j=5}^{8} (5/2)^j/(j - 5)!
67. Show that the correlation coefficient must lie in the interval [-1, +1].
68. Verify the last formula in Section 2.7 for the geometric distribution.
69. X is Poisson with parameter λ; Y conditional on X = x is binomial with
parameters x and π. Show that Y is Poisson with parameter λπ.
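This "thinning" of a Poisson variable can be checked by simulation. The sketch below uses illustrative values λ = 4 and π = 0.3 (so λπ = 1.2, values not from the text) and checks that the mean and variance of Y agree, as they must for a Poisson distribution:

```python
# Simulate X ~ Poisson(lam); given X = x, draw Y ~ Binomial(x, pi_).
# Then Y should be Poisson(lam * pi_), so mean and variance are both 1.2.
import math
import random

random.seed(1)
lam, pi_ = 4.0, 0.3   # illustrative parameters

def poisson(rate):
    # Knuth's multiplication method, adequate for small rates
    limit, k, prod = math.exp(-rate), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

n = 200_000
ys = [sum(random.random() < pi_ for _ in range(poisson(lam)))
      for _ in range(n)]

mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
```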
70. Using the notation of Section 2.8, show that X and Y are equidistributed (same
probabilities with the same parameters) if
for all k,
that the expected winnings (and hence the "fair entry price") is infinite.
[Note: This is considered a "paradox" in the sense that an infinite price is to be
paid for a finite reward. It would be more appropriate to say, however, that the
"wrong" question is being asked. In a distribution with an infinite mean, a
person is asked to choose a finite E(X). Alternatively, one might remark that a
finite value for a random variable with an infinite expectation is hardly
paradoxical, since the expectation is a weighted sum.]
3
Markov Chains
Table 3.1. The instantaneous number of successes, X_n, vs. the accumulated number
of successes, Y_n

Trial:  S  S  S  F  S  S  F  F  F  S  F  F  F  F  S  F
X_n:    1  1  1  0  1  1  0  0  0  1  0  0  0  0  1  0
Y_n:    1  2  3  3  4  5  5  5  5  6  6  6  6  6  7  7
P(Y_n = x) = C(n, x) π^x (1 - π)^{n-x},    x = 0, 1, ..., n.
The experiment could be rephrased as follows: Consider a particle
which starts at the origin and moves one step to the right with probability π
and remains stationary with probability 1 - π. Each change of the system is
called a transition; in this chain only two types of transitions are possible:
from state x to state x + 1 (with probability π) and from state x to state x
(with probability 1 - π). Such a chain is called a random walk (on the
positive integers); it is a particular kind of random walk, in which steps to
the left or steps greater than one are impossible. In terms of conditional
probability, the random walk is defined by the equations

P(X_{n+1} = x + 1 | X_n = x) = π,    P(X_{n+1} = x | X_n = x) = 1 - π.
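A short simulation makes the connection with the binomial distribution concrete: after n transitions the particle's position is binomial with parameters n and π. (The values π = 0.4 and n = 10 below are illustrative only.)

```python
# Simulate the random walk that steps right with probability pi_ and
# stays put otherwise; compare the position after n_steps transitions
# with the binomial distribution with parameters n_steps and pi_.
import math
import random

random.seed(2)
pi_ = 0.4                      # illustrative parameter
n_steps, n_runs = 10, 100_000

counts = [0] * (n_steps + 1)
for _ in range(n_runs):
    x = 0
    for _ in range(n_steps):
        if random.random() < pi_:
            x += 1             # transition x -> x + 1 with probability pi_
    counts[x] += 1

empirical = [c / n_runs for c in counts]
binomial = [math.comb(n_steps, x) * pi_**x * (1 - pi_)**(n_steps - x)
            for x in range(n_steps + 1)]
max_err = max(abs(e - b) for e, b in zip(empirical, binomial))
```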
3.2. Definitions
A sequence of random variables X_0, X_1, X_2, ... forms a Markov chain if, for
every n and all states,

P(X_{n+1} = x_{n+1} | X_0 = x_0, X_1 = x_1, ..., X_n = x_n) = P(X_{n+1} = x_{n+1} | X_n = x_n).

This says simply that the (n + 1)st probability distribution conditional on all
preceding ones equals the (n + 1)st probability distribution conditional on
the nth, for n = 0, 1, 2, .... Note: This definition includes the case of an
independent sequence.
The definition relates each random variable in the sequence to the
preceding one, and so to complete the definition of the chain it is necessary
to specify a distribution for the zeroth random variable X_0. This will be
denoted by (a_0, a_1, a_2, ...), that is,

P(X_0 = x) = a_x,    x = 0, 1, 2, ....
When the initial state of the system is given (without any probabilistic
component), P(X_0 = x) = 1 for some particular x.
The transition probabilities are often abbreviated† p_{xy}, where

p_{xy} = P(X_{n+1} = y | X_n = x).

†Note that the letters x and y occur in typographically reverse order on the two sides of the
equation. Older textbooks often straightened this out by making p_{xy} the probability of a
transition from y to x.
P(X_n = 1 | X_{n-1} = 1) = β.    (2)
gambler's fortune after the nth game. It is assumed that he wins or loses one
unit each time he gambles, so that λ + μ = 1, and once his fortune is lost, the
process terminates, α = 0. Depending on the circumstances, the game may be
fair (λ = μ) or not, the gambler may stop after accumulating a prede-
termined fortune N (absorbing barrier at N) or he may continue indefinitely
(infinitely rich adversary). As a modeling problem, it is certainly reasonable
to use the Markov model, that is, to assume that his fortune after the nth
game depends only on his fortune after the (n - 1)st game and the proba-
bility of winning.†
Two kinds of limiting probability distributions are especially important:
(i) the equilibrium probabilities (denoted by π_x),

lim_{n→∞} P(X_n = x) = π_x,

which, being limiting values, may or may not exist, and (ii) the stationary
probabilities (denoted by v_x),

P(X_n = x) = P(X_{n+1} = x) = v_x,

which also may or may not exist.
To give an example,‡ consider the following problem: A gentleman
owns three suits, green, red, and blue. He wears them on successive days
according to the following scheme: If he wears the green suit one day, he is
equally likely to wear it or the red suit the following day; if he wears the red
suit one day, he never wears it the following day, but is equally likely to
wear either of the other two; finally, if he wears the blue suit one day, he
never wears the red suit the following day, but is equally likely to wear
either of the other two.
With stationary probabilities v_G, v_R, v_B, Eq. (6) of Section 2.3 shows
the way to the following equations:

v_G = (1/2)v_G + (1/2)v_R + (1/2)v_B,
v_R = (1/2)v_G,
v_B = (1/2)v_R + (1/2)v_B.
†The student may be able to think of circumstances in which this would not be a reasonable
assumption.
‡Whimsical examples are often used in introducing Markov chains because most realistic
examples involve substantial calculations. However, when the states of the system are labels
rather than numbers, some fundamental tinkering with the definitions of random variable or
Markov chain is needed for consistency.
P(X_1 = x) = Σ_{j=0}^{∞} P(X_0 = j) P(X_1 = x | X_0 = j)
           = Σ_{j=0}^{∞} a_j p_{jx}.
But the calculations leading to distributions for the other random variables
in the chain, P(X_n = x), can often become difficult algebraically.
To examine systematically the various distributions implicit in the
definition of a Markov chain, it is highly desirable to use the notation (and
properties) of matrices and vectors.
The student familiar with matrix theory will recognize in the calcula-
tions of Section 3.2 the elements of matrix transformation of vectors. The
formulation in terms of matrices makes the general results more compact
and assists in the proof of theorems. In the present section, a brief review
outline of the necessary portions of the theory is given, together with a
theorem important in Markov chain analysis. The matrices are assumed to
be square, but not necessarily finite.
AB = {a_{xy}}{b_{xy}} = { Σ_j a_{xj} b_{jy} },

where a_{xy} denotes the element in the xth row and yth column of A, and
{a_{xy}} is the matrix of which a_{xy} is a typical element.
Vectors are of two types, row vectors and column vectors; vectors are
not necessarily finite. A row vector is an ordered sequence of numbers
written in a row; a column vector is an ordered sequence of numbers written
in a column. The numbers are called components of the vector. Vectors are
equal only if both are of the same type and have equal components. A
probability vector has all components zero or positive, with the sum of the
components equal to unity. (Thus the rows of a stochastic matrix are
probability vectors.) Vectors are added by adding corresponding compo-
nents and then only if both are of the same type, with the same number of
components.
Vectors are multiplied by matrices only if the number of components of
the vector equals the number of rows (columns) of the matrix, and then by
means of the formula

Av = ( Σ_j a_{1j} v_j , Σ_j a_{2j} v_j , ... ),

written as a column vector.
Aw = ( M_0(1 - a_{1k}) + m_0 a_{1k}, M_0(1 - a_{2k}) + m_0 a_{2k}, ..., M_0(1 - a_{rk}) + m_0 a_{rk} ),

where k is the number of the element of w having the value m_0. Thus every
element of Aw is of the form

M_0(1 - a) + m_0 a = M_0 - a(M_0 - m_0),

where of course the value of a is different for each component, but always
with a ≥ β. Thus every component of Aw is ≤ M_0 - β(M_0 - m_0), and since
no component of Av can exceed a component of Aw, the largest such
component, M_1, must satisfy

M_1 ≤ M_0 - β(M_0 - m_0),

and this inequality is stronger than (i). Applying the same argument to the
vector -v gives the corresponding inequality

m_1 ≥ m_0 + β(M_0 - m_0),

which establishes (ii). The result (iii) is obtained by adding the two inequali-
ties above. □
Note that the proof breaks down in the case of an infinite matrix
simply because it could happen that no least element would exist. It is left
as an exercise for the student to discover why the condition is imposed that
the matrix have no zeros.
It is also interesting to note that the calculation of the probabilities
P( XI = x) given in Section 3.2 is exactly equivalent to the multiplication of
the transition matrix by the initial row vector.
P = [ 1-α    α        0        0       ...
      μ      1-λ-μ    λ        0       ...
      0      μ        1-λ-μ    λ       ...
      ...                                  ]
P = [ 1/2   1/2   0
      1/2   0     1/2
      1/2   0     1/2 ]
P = [ (1-π)²   2π(1-π)   π²        0        ...
      0        (1-π)²    2π(1-π)   π²       ...
      0        0         (1-π)²    2π(1-π)  ...
      ...                                       ]
In these examples, the initial vector would be determined by the nature
of the hypothesis. For Example 1, the random walker could be assumed to
start n units away from the origin:

a = (0, 0, ..., 0, 1, 0, 0, ...),

with the 1 as the nth component.
With the formulation as transition matrix and initial vector, the calcu-
lation of probabilities for the random variable X_1 can be seen as vector-by-
matrix multiplication: that is, aP is a row vector with xth component
P(X_1 = x). Thus, also, the stationary vector v satisfies the equation

vP = v.
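Both calculations can be sketched in a few lines. The matrix below is the colored-suit example; exact rational arithmetic keeps the stationary check clean:

```python
# Compute aP (the distribution of X_1) and verify vP = v for a candidate
# stationary vector, using the colored-suit transition matrix (G, R, B).
from fractions import Fraction

half = Fraction(1, 2)
P = [[half, half, 0],
     [half, 0, half],
     [half, 0, half]]

def vec_mat(a, P):
    # row vector times matrix
    return [sum(a[j] * P[j][x] for j in range(len(a)))
            for x in range(len(P[0]))]

a = [1, 0, 0]                  # start in the green suit
dist_X1 = vec_mat(a, P)        # distribution of X_1

v = [half, Fraction(1, 4), Fraction(1, 4)]
stationary = vec_mat(v, P) == v
```

Using Fraction means vP = v can be tested by exact equality rather than a floating-point tolerance.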
These equations are written out in full in Section 3.2 for the colored-suit
example, and the student should confirm that they correspond to the matrix
given above.
In general, the stationary probabilities, although simply represented,
are easy or difficult to compute depending on whether the equations
vP = v are easy or difficult to solve. The equilibrium probabilities, on the
other hand, present further difficulties, which will be discussed in the next
section.
The examples given in this section slide over the question of state
space, mainly because the non-negative integers are assumed to correspond
naturally to the rows and columns of the transition matrix. Where some
other state space is intended (as in the colored-suit example), it is, strictly
speaking, necessary to label the rows with the states to which they are
supposed to correspond. Thus a given stochastic matrix could apply to two
different Markov chains, just as in Chapter 1 the same set of probabilities
could apply to different random variables. In Example 3, the precise way of
writing the matrix should be
         G     R     B
  G  [  1/2   1/2   0   ]
  R  [  1/2   0     1/2 ]
  B  [  1/2   0     1/2 ]
because there could be defined another Markov chain with state space 1,2,
3 and transition matrix
         1     2     3
  1  [  1/2   1/2   0   ]
  2  [  1/2   0     1/2 ]
  3  [  1/2   0     1/2 ]
P(X_{n+2} = y | X_n = x) = Σ_j p_{xj} p_{jy},    (3)

which is exactly the formula for matrix multiplication. Let this probability
be denoted by p_{xy}^{(2)}. Then the matrix {p_{xy}^{(2)}} is just the square of the matrix
{p_{xy}}. This can be extended by the same argument to higher-order transi-
tion probabilities. If p_{xy}^{(n)} denotes the probability of a transition from state x
to state y in exactly n steps, then {p_{xy}^{(n)}} = P^n. Thus the matrix multiplication
formula
p_{xy}^{(m+n)} = Σ_j p_{xj}^{(m)} p_{jy}^{(n)}    (4)
can be interpreted for the Markov chain transition matrix as stating that the
probability of a transition from state x to state y (where x and y are
arbitrary) in m + n steps is the same as the probability of an m-step
transition to some intermediate state followed by an n-step transition to y.
In this context, the matrix multiplication equation is called the Chapman-
Kolmogorov equation for the Markov chain. This equation will be important
in the sequel, and it is important to realize that the proof of the Chapman-
Kolmogorov equation consists of nothing more than the identification of
the matrix multiplication formula with the probabilistic description of the
Markov chain.
If there is an integer n such that pn consists entirely of positive
(nonzero) elements, then the matrix and the Markov chain it represents are
called regular. In a Markov chain with a regular transition matrix, every
state must be accessible from every other state, and in the same number of
steps. It would be impossible in a regular Markov chain that there could
exist two states x and y with p_{xy}^{(n)} = 0 for all n. Also, regular chains could not
have states x and y accessible only in an even number of steps, because if pn
consists only of positive elements, so will higher powers of P. Finally,
regularity is incompatible with an infinite matrix such as the one char-
acterizing the random walk, since it is impossible that every transition can
occur in a finite number of steps, and n must be a finite integer.
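Regularity is easy to test mechanically for a finite matrix: raise P to successive powers and stop as soon as no zeros remain. The sketch below (the bound of 50 powers is an arbitrary cutoff) contrasts the colored-suit matrix, which is regular, with a two-state periodic matrix, which is not:

```python
# Test regularity: does some power of P have all entries positive?
def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(P, max_power=50):
    Q = P
    for _ in range(max_power):
        if all(x > 0 for row in Q for x in row):
            return True
        Q = mat_mult(Q, P)
    return False

suit = [[0.5, 0.5, 0.0],
        [0.5, 0.0, 0.5],
        [0.5, 0.0, 0.5]]
periodic = [[0.0, 1.0],
            [1.0, 0.0]]       # powers alternate, zeros never disappear

regular_suit = is_regular(suit)          # P^2 already has no zeros
regular_periodic = is_regular(periodic)
```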
Thus regularity is a rather stringent condition on the Markov chain,
and when it is satisfied, produces quite simple general results. The following
theorems state in essence that for finite regular chains, the equilibrium
probabilities exist, and are equal to the stationary probabilities.
n = 1, 2, 3, ...,
As n → ∞, Δ_n → 0, and P^n u_j, the jth column of P^n, approaches a column
vector with all components equal.
Case II: Zeros in P. Let N be such that P^N has no zeros, and let β_N be
the smallest (nonzero) element of P^N. For the matrices P^{kN}, which are powers
of P^N, Case I of the theorem holds, and therefore the nonincreasing
sequence Δ_n contains a subsequence Δ_{kN} which approaches zero. Thus Δ_n
approaches zero and the theorem is proven. □
Proof. Direct multiplication shows that aΠ = π, where Π is the limiting
matrix with all rows equal to π; hence aP^n → π. Furthermore, if u is a
stationary vector (such that uP = u), then, since uP^n → π and uP^n = u,
u = π, so that π is the unique stationary vector of the Markov chain. The
proof is completed. □
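The convergence asserted by these theorems can be watched directly: repeated multiplication by P drives any initial vector to the stationary vector. The sketch below uses the colored-suit matrix, whose stationary vector is (1/2, 1/4, 1/4):

```python
# Power iteration: a P^n approaches the stationary vector pi for a
# finite regular chain, whatever the initial probability vector a.
def step(a, P):
    return [sum(a[j] * P[j][x] for j in range(len(a)))
            for x in range(len(a))]

P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5]]
a = [0.0, 1.0, 0.0]          # arbitrary initial vector

for _ in range(60):
    a = step(a, P)

pi = [0.5, 0.25, 0.25]       # stationary vector of the suit example
err = max(abs(ai - pi_i) for ai, pi_i in zip(a, pi))
```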
The proofs given above apply only to finite regular chains, but the
theorems remain true if the condition of finiteness is omitted. Since the
proofs shed little light on probabilistic problems, they are omitted.
These results show that the long-range behavior of regular chains can
be determined relatively easily by calculation of the stationary vector. The
calculation itself may present difficulties, but these difficulties are algebraic
rather than probabilistic, and are not discussed in this volume.
When a Markov chain is not regular, the situation is somewhat more
complicated and can most easily be described after some preliminary
definitions.
from which it is not possible to infer a unique solution, even by adding the
normalizing condition v_0 + v_1 = 1.
In general, a reducible chain has a matrix P which can be decomposed
into two submatrices P_1 and P_2 thus:

P = [ P_1   0
      0     P_2 ].
The number of separate blocks can be greater than two. It is clear just by
looking at this matrix that the Markov chain really consists of two Markov
chains which do not relate to one another. Therefore it is sensible to
consider each separately, with some probabilistic scheme to determine
which chain is operative.
Sometimes, by renumbering the states, a reducible chain is a little more
difficult to spot. Consider the matrix
P = [ 0     1/2   0     1/2   0     0
      1/2   0     0     1/2   0     0
      0     0     0     0     1/2   1/2
      1/2   1/2   0     0     0     0
      0     0     1/2   0     0     1/2
      0     0     1/2   0     1/2   0   ]
By tracing the probabilities, the student will confirm that this corresponds
to the ball-throwing story above, with three boys and three girls.
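The closed sets can also be found mechanically by a reachability search on the positive entries of P. In the sketch below the six states are numbered from zero:

```python
# Find which states are reachable from a given state by following
# positive transition probabilities (a depth-first search).
P = [
    [0.0, 0.5, 0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0, 0.5, 0.0],
]

def reachable(P, start):
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

set_a = reachable(P, 0)   # one closed set of states
set_b = reachable(P, 2)   # the other closed set
```

The two sets do not communicate, which is exactly what makes the chain reducible.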
Reducible Markov chains can be exactly defined by introducing the
concept of a closed set of states.
From this point onwards, it is assumed that all chains are irreducible.
P = [ 0   1
      1   0 ].

Obviously this chain has stationary vector (1/2, 1/2), giving an example of a
Markov chain which is not regular, and yet which has a stationary vector.
Many periodic chains have this property. One of the most important such
chains is discussed in detail in Section 3.11.
Most of the results in the remainder of this chapter are given for
aperiodic (that is, not periodic) chains, because the main principles can be
illustrated on aperiodic chains.
The proof of this fact consists essentially in observing that each return to
the state is an independent event.
Recurrent states are further classified according to whether their mean
recurrence time is finite or infinite. Probability distributions with infinite
means were briefly introduced in Section 1.6; this is the first serious
encounter with such distributions. Suppose a chain starts in state x. Let Y
denote the number of steps before the chain is again (for the first time) in
state x. Then Y is a random variable defined over the positive integers. Let

f_x(j) = P(Y = j),    j = 1, 2, 3, ...,

and

μ(x) = Σ_{j=1}^{∞} j f_x(j).
Theorem 1. If two states x and yare accessible from one another, then
they are either both transient, both ergodic, or both null.
†The terms "ergodic" and "null," although not particularly intuitive, are standard.
the cumulative number of fish caught, states 0, 1, and 2 are transient, but
state 3 is ergodic and, in fact, absorbing.
Note that the theorem does not prevent a chain from containing more
than one kind of state, but it does prevent that for states which are mutually
accessible. For an irreducible chain, there can be at most one set of mutually
accessible states, although there may be some transient states as well.
When a Markov chain has only a finite number of states, the general
theory applies and the problem of determining stationary vectors is the
φ(s) = Σ_j s^j π_j.
The remaining equations give three kinds of terms, the first being
which is
If the four contributions to the equation are assembled, the terms involving
π_1 and π_2 cancel, and φ(s) can be expressed in terms of π_0 (and the
parameters) as follows:
φ(s) = π_0 [λ + μ - α + (α - λ)s - μ/s] / [λ + μ - λs - μ/s].    (6)

Setting s = 1, where φ(1) = 1, gives

π_0 = (μ - λ)/(μ - λ + α).    (7)
Equation (7), substituted back into the expression for φ(s), yields the
probability generating function explicitly in terms of the parameters of the
model. A careful inspection of the fractional representation of φ(s) reveals a
common factor (1 - 1/s) in the numerator and denominator. When s ≠ 1,
φ(s) = π_0 [μ + (α - λ)s] / (μ - λs).    (8)
or α < 0.
On the other hand, if both numerator and denominator are positive,
then the first inequality includes the second one and μ > λ, so that there is a
greater probability of a step to the left than of a step to the right.
The second kind of information available from a knowledge of φ(s) is
the values of the moments. Differentiating with respect to s and setting s = 1
gives
where ρs < 1. To put the denominator into this form, divide by -μ, so that
ρ = λ/μ,

A = (α - λ)/μ    and    B = 1.

Thus π_x, the coefficient of s^x, consists of two terms, the first being π_0 A ρ^{x-1}
and the second being π_0 B ρ^x. Making the necessary substitutions gives

π_x = [(μ - λ)/(μ - λ + α)] (α/μ) ρ^{x-1}.

This formula does not apply to π_0; the student should derive that expression
separately, finding en route why the calculation for the general case breaks
down.
The general random walk results can be used to obtain a number of
interesting and useful special cases, simply by assigning specific values to
the parameters. Two examples will be given, the first being a (discrete time)
queue, or waiting line, where the random variable represents the number in
the system (waiting or being served). Queues provide important models of
various kinds of stochastic processes and will occur with increasing frequency
(and increasing complexity) in the remainder of this volume. The present
model is a rather primitive one, because of the discrete time aspect.
Customers are supposed to arrive only at discrete intervals and to be
finished with service also at the same discrete intervals. The student might
think of a service facility which can admit new customers only "on the
hour" and can discharge customers also only "on the hour." The number of
customers in the system can be 0, 1, 2, ..., and the parameters are interpre-
ted as follows: λ = probability of a new customer in the system, μ =
probability of a service finishing, α = λ, with π_x being the long-range
frequencies of the number of customers in the queue. With this interpre-
tation, for example, π_0 = 1 - ρ gives the probability of an idle server.
The second interpretation, gambling with a fixed stake, was already
mentioned in Section 3.2 and will be further enlarged in Section 3.14. With
a merciless opponent and a probability of winning no greater than that of
losing, being bankrupt is clearly an absorbing state. For a less trivial result,
suppose that a generous opponent gives a bankrupt gambler a unit stake, so
that the origin is a reflecting barrier, but suppose that the odds are against
the gambler, so that λ, the probability of winning, is less than one-half. The
student can verify that the gambler's average fortune is

1/[2(1 - 2λ)].
It is also left as an exercise to show that for both the queueing model
and the gambling model, the equilibrium distribution, if it exists, is of a
modified geometric form, that is, the probabilities form a geometric series
after the first two terms.
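For the queueing interpretation (α = λ) these claims are easy to check numerically: truncating the chain at a high state and iterating aP repeatedly gives π_0 = 1 - ρ and a constant ratio ρ between successive probabilities. The parameter values below are illustrative only:

```python
# Discrete-time queue: the random walk with alpha = lam.  Power
# iteration on a truncated transition matrix approximates equilibrium.
lam, mu = 0.2, 0.5            # illustrative values, lam < mu
N = 40                        # truncation level; rho**N is negligible

P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][0], P[0][1] = 1 - lam, lam
for x in range(1, N):
    P[x][x - 1], P[x][x], P[x][x + 1] = mu, 1 - lam - mu, lam
P[N][N - 1], P[N][N] = mu, 1 - mu   # crude boundary at the truncation

a = [1.0] + [0.0] * N
for _ in range(2000):
    a = [sum(a[j] * P[j][x] for j in range(N + 1))
         for x in range(N + 1)]

rho = lam / mu
pi0_err = abs(a[0] - (1 - rho))      # pi_0 should be 1 - rho
ratio_err = abs(a[3] / a[2] - rho)   # geometric tail: ratio rho
```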
Finally, it will be a useful exercise for the student to classify the states
of the random walk and to see how the classification (transient/ergodic/null)
depends on the values of the parameters λ, μ, and α.
The stages of the calculation, represented by Eqs. (5), (6), (7), and (8),
are quite typical of calculations which will recur in the remainder of this
book, and it will be helpful for further understanding for the student to
rehearse the various steps until they begin to become intuitively clear.
possible from state x to any state x + y if there are y + 1 (allowing for the
one person leaving after service) arrivals during a service period.
The student will greatly improve his understanding of this model and
prepare for the more difficult models of Chapter 6 by verifying the
transition matrix for the system:
P = [ a_0   a_1   a_2   a_3   a_4   ...
      a_0   a_1   a_2   a_3   a_4   ...
      0     a_0   a_1   a_2   a_3   ...
      0     0     a_0   a_1   a_2   ...        (9)
      0     0     0     a_0   a_1   ...  ]
ζ = a_0 + Σ_{y=1}^{∞} a_y ζ_y,

where ζ_y denotes the probability that a queue in state y will ever become empty.
Since customers are served one at a time, the queue must pass through all
states between y and 0 if it is to return to zero. Furthermore, the arrival and
service pattern that would reduce the queue eventually from state y to state
y - 1 is exactly the same as the one that would reduce the queue from state
y - 1 to state y - 2, and so forth (for example, seven arrivals in one time unit
followed by eight time units with no arrivals). Thus
ζ_y = ζ_1^y,    and therefore    ζ = a_0 + Σ_{v=1}^{∞} a_v ζ_1^v.
Now ζ_1, the probability that a queue with one customer in service will ever
become empty, is exactly the same as the probability that an empty queue
will ever again become empty, simply because the identity of the first two
rows of the transition matrix shows that a jump from state zero to any other
state has the same probability as a jump from state one to that state.
For example, a jump from state zero to state seven represents seven
arrivals and has probability a_7, and a jump from state one to state seven
also represents seven arrivals, since the customer being served is discharged
during the time unit. This means that the probability of the queue in state
one becoming eventually empty is the same as that for the queue in state
zero, or ζ_1 = ζ. Therefore, with state zero transient, ζ < 1 satisfies

ζ = Σ_{j=0}^{∞} a_j ζ^j.

In the other case, where zero is recurrent, ζ = 1 clearly also satisfies this
equation. Thus in every case ζ is a root of the equation

s = α(s),

where

α(s) = Σ_{j=0}^{∞} a_j s^j.
Referring to Section 1.13, it is easy to see that (except for the special cases
given in that section, which will be discussed below) every state of the queue
is transient if and only if ζ < 1, which will happen if and only if the mean

λ = Σ_{j=1}^{∞} j a_j

of the arrival distribution is > 1; every state of the queue is recurrent if and
only if ζ = 1, which will happen if and only if λ ≤ 1.
This result is intuitively clear, since a mean arrival rate greater than one
means that arrivals occur more frequently, on the average, than service
terminations, so that the system is overloaded and must gradually increase
in size, whatever local fluctuations may be encountered.
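The root of s = α(s) can be found by simple iteration of s ← α(s) starting from zero, since the iterates increase toward the smallest root. The sketch below assumes Poisson(λ) arrivals, for which α(s) = e^{λ(s-1)}; the two values of λ are illustrative choices:

```python
# Smallest root of s = alpha(s) with alpha(s) = exp(lam*(s - 1)),
# found by fixed-point iteration from s = 0.
import math

def smallest_root(lam, iterations=2000):
    s = 0.0
    for _ in range(iterations):
        s = math.exp(lam * (s - 1.0))
    return s

z_heavy = smallest_root(2.0)   # overloaded queue: root strictly below 1
z_light = smallest_root(0.5)   # underloaded queue: the root is 1
residual = abs(math.exp(2.0 * (z_heavy - 1.0)) - z_heavy)
```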
Case I: a_0 = 1. There are never any arrivals to the system, state zero is
absorbing, and the chain is reducible.
π_0 = a_0 π_0 + a_0 π_1,
π_1 = a_1 π_0 + a_1 π_1 + a_0 π_2,
π_2 = a_2 π_0 + a_2 π_1 + a_1 π_2 + a_0 π_3,    (11)
. . . . . . . . .

with the number of terms increasing in each succeeding equation. Using the
technique illustrated in Section 3.9, the nth equation is multiplied by s^{n-1},
n = 1, 2, 3, ..., and the whole set summed, leading to the following result:
which leads to

φ(s) = (1 - s)π_0 / [1 - s/α(s)].    (12)
Letting s → 1 with the aid of L'Hôpital's rule permits evaluation (when
λ ≤ 1) of π_0 = 1 - λ, and thus φ(s) can be written explicitly in terms of the
arrival distribution probability generating function α(s):

φ(s) = α(s)(1 - λ)(1 - s) / [α(s) - s].    (13)
where there are k + 1 states, x = 0, 1, 2, ..., k. For example, with k = 3, the
transition matrix defining the chain would be

P = [ 0     1     0     0
      1/3   0     2/3   0
      0     2/3   0     1/3
      0     0     1     0   ]
Although this chain moves always from left to right or right to left, never
staying in place, there is a "central tendency," in that the further away the
chain is from the middle, the more likely it is to move towards the middle.
In the limiting cases, where x = 0 or x = k, there is a probability of one of a
step towards the middle, away from the reflecting boundaries.
It is the tendency towards central equilibrium that originally recom-
mended this matrix as a suitable model for heat exchange. In the physical
model, the states of the system are equated to the number of molecules in
one of two (heat-exchanging) containers, so that k is, for practical purposes,
far larger than three, and, in fact, the student interested in the physical
interpretation of the Ehrenfest model would do well, in the following
discussion, to think of k as, perhaps, 10 10, so that, for example, the
transition from state one to state zero would have a probability like 10 -10, a
very small number.
It is easy to see that the Ehrenfest chain is irreducible but periodic,
with period two. Nevertheless, like the trivial chain mentioned in Section
3.7, there is a stationary probability vector associated with the system. The
first steps in the analysis are only slightly different from those of Sections
3.9 and 3.10. The Ehrenfest chain matrix leads to the system of equations
π_k = (1 - (k - 1)/k) π_{k-1},

and, multiplying the equation for π_j by s^j and summing over j,

φ(s) = s[φ(s) - π_k s^k] - (s/k) Σ_{j=0}^{k-1} j π_j s^j + (1/(ks)) Σ_{j=1}^{k} j π_j s^j.
S j= J S j= J
The last two terms do not seem to fit into any simple form involving the
probability generating function itself, but a moment's reflection leads to the
suspicion that they are similar to mean (expected) values and might be
obtained by differentiation of φ(s). This indeed is the case: there is a
differential equation in φ(s). Writing out the (finite) series involved, the
student will be able to show that

(1 - s)φ(s) = [(1 - s²)/k] φ′(s)

and

φ(s) = [(1 + s)/2]^k,    so that    π_x = C(k, x)(1/2)^k,    x = 0, 1, ..., k,

the binomial distribution with parameters k and 1/2.
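The stationary distribution of the Ehrenfest chain can be verified directly against the transition probabilities, in exact arithmetic. The sketch below uses k = 3, matching the matrix displayed earlier; any k would do:

```python
# Check that the binomial distribution with parameters k and 1/2 is
# stationary for the Ehrenfest chain: sum_x pi_x P(x, y) = pi_y.
from fractions import Fraction
from math import comb

k = 3
pi = [Fraction(comb(k, x), 2**k) for x in range(k + 1)]

def P(x, y):
    # from state x: down with probability x/k, up with probability 1 - x/k
    if y == x - 1:
        return Fraction(x, k)
    if y == x + 1:
        return Fraction(k - x, k)
    return Fraction(0)

is_stationary = all(
    sum(pi[x] * P(x, y) for x in range(k + 1)) == pi[y]
    for y in range(k + 1)
)
```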
The Markov chains studied up to this point have all had a certain
simplicity: It has been possible to proceed from the model to an explicit
form for the transition matrix, and from the matrix to the stationary
distribution. These calculations have permitted an easy classification of
states with respect to parameter values.
There are, however, certain Markov chains of importance for which
even the first step-an explicit form for the matrix-is difficult, so that the
analysis is more roundabout. Branching chains are of this type.
The idea of a branching chain came originally from the study of the
extinction of surnames, i.e., whether or not one of the generations following a
given individual would consist entirely of married females, or contain no
children. Subsequently, several diverse applications of the theory have been
discovered, notably in genetics and nuclear physics, where individuals can,
with known probability, give rise to offspring, who in turn can give rise to
further offspring of succeeding generations.
is conditional upon an nth generation of x individuals. Let Z_1, Z_2, ..., Z_x be
the number of offspring of these x individuals, with, by hypothesis, the
distribution b_x for each of the x independent random variables. Then the
(n + 1)st generation, which has size Σ_{j=1}^{x} Z_j, will have distribution b^{x*}
and probability generating function [β(s)]^x. Thus

p_{xy} = P(X_{n+1} = y | X_n = x) = coefficient of s^y in the expansion of [β(s)]^x.    (15)
It is easy to check this result against the values computed above for the
transitions from state two and to see how one might write out the next row
systematically. However, unless β(s) is particularly simple, this does not
greatly facilitate the expression of p_{xy} explicitly in terms of the defining
probabilities of the chain, b_x. It is clear, nevertheless, that since all the states
communicate with the absorbing state zero, they must be transient, unless it
is impossible not to have at least one offspring, that is, unless b_0 = 0. Thus the interest in
branching chains lies not so much in the calculation of stationary distribu-
tions (since every state has limiting probability zero) as in computing the
probabilities for an nth generation of size x, and especially the probability
of a transition to state zero, i.e., the probability of extinction.
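The coefficient extraction in Eq. (15) is mechanical, and a row of the transition matrix can be generated by repeated polynomial multiplication. A minimal sketch in Python; the offspring distribution b used here is an arbitrary illustrative choice, not one taken from the text:

```python
def poly_mult(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(q):
            out[i + j] += a * c
    return out

def branching_row(b, x):
    """Row x of the branching-chain transition matrix:
    p_xy = coefficient of s**y in beta(s)**x  (Eq. 15)."""
    row = [1.0]                     # beta(s)**0 = 1
    for _ in range(x):
        row = poly_mult(row, b)
    return row

b = [0.25, 0.5, 0.25]               # assumed offspring distribution b_0, b_1, b_2
row2 = branching_row(b, 2)          # transition probabilities out of state two
```

Each row sums to one automatically, since β(1) = 1 implies [β(1)]^x = 1.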
Let the probability of x individuals in the nth generation be denoted by b_x^{(n)}, with probability generating function β_n(s) and mean B_n:

b_x^{(n)} = P(X_n = x),

b_x^{(n)} = Σ_{j=0}^∞ b_j^{(n−1)} {coefficient of s^x in [β(s)]^j},  (17)

β_n(s) = Σ_{j=0}^∞ b_j^{(n−1)} [β(s)]^j = β_{n−1}(β(s)).  (18)
Thus

Σ_{n=0}^∞ P(N = n)[α(s)]^n = β[α(s)].  (19)
In the notation of Section 3.12, let α_n be the probability that the nth generation is empty:

α_n = b_0^{(n)} = β_n(0).

It is obvious that α_1 = b_0, and the basic relationship β_n = β_{n−1}(β) shows that α_n = β(α_{n−1}). Since the probability generating function β is increasing, it follows that the α_n increase to a limit α_∞, which must satisfy α_∞ = β(α_∞); equations of this type were discussed in Section 1.13. Let the (possible) root less than one be denoted by σ. Since β(0) < β(σ) = σ, α_1 < σ. By induction, all α_n < σ, and therefore α_∞ = σ. Thus the value of α_∞ is given by the analysis of Section 1.13.
First, the special cases: (i) If b_0 = 1, α_∞ = 1; (ii) if b_1 = 1, α_∞ = 0; (iii) if 0 < b_0 = 1 − b_1, α_∞ = 1. In the general case, M ≤ 1 implies that α_∞ = 1, and M > 1 implies that α_∞ = σ < 1.

Summing Σ B_n shows that the mean size of the entire progeny is (1 − M)^{−1} when M < 1, and infinite when M ≥ 1.
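The iteration α_n = β(α_{n−1}) also gives a practical way of computing the extinction probability numerically. A sketch under an assumed offspring distribution with M > 1, for which the root of β(s) = s below one is b_0/b_2 = 1/3:

```python
def beta(s, b):
    """Offspring probability generating function beta(s) = sum b_x s**x."""
    return sum(bx * s**x for x, bx in enumerate(b))

def extinction_probability(b, n_iter=200):
    """Iterate alpha_n = beta(alpha_{n-1}) starting from alpha_1 = b[0]."""
    alpha = b[0]
    for _ in range(n_iter):
        alpha = beta(alpha, b)
    return alpha

# Assumed example: b_0 = 1/4, b_2 = 3/4, so M = 3/2 > 1 and the root of
# beta(s) = s below one is sigma = 1/3.
b = [0.25, 0.0, 0.75]
alpha_inf = extinction_probability(b)
```

Because β is increasing and the iterates stay below σ, the sequence converges monotonically from below, mirroring the argument in the text.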
3.14. The Gambler's Ruin

This Markov chain on the integers 0, 1, …, N was introduced in Section 3.2. It models a game in which a gambler can win or lose a unit sum at each stage of a game, with fixed probabilities of win or loss. Thus it is a special case of the random walk discussed in Section 3.2, with

P(X_n = 0 | X_{n−1} = 0) = 1,   P(X_n = N | X_{n−1} = N) = 1.

Since the two absorbing states can be reached from the intermediate states, it is clear that the states 1, 2, …, N − 1 are transient.
The problem is to find the probabilities for the eventual absorption into state 0, corresponding to a loss of the initial stake, and state N, which could correspond either to winning all the opponent's stake or to reaching a predetermined desired fortune. These probabilities, it is clear intuitively, depend on the gambler's initial stake k, where k = 1, 2, …, N − 1.

With λ the probability of winning a single stage (Section 3.2), let p(k) be the probability of eventual ruin:

p(k) = P(X_n = 0 for some n | X_0 = k).

Since state x can be reached only from state x − 1 (by winning) or from state x + 1 (by losing), the probabilities p(k) satisfy the difference equation

p(k) = λ p(k + 1) + (1 − λ) p(k − 1),

that is,

p(k + 1) − p(k) = C [p(k) − p(k − 1)],

where

C = (1 − λ)/λ.

With the boundary values p(0) = 1 and p(N) = 0, summing the differences gives

p(k) − 1 = Σ_{j=0}^{k−1} C^j [p(1) − 1],

p(1) = 1 − (1 − C)/(1 − C^N),

and, finally,

p(k) = {[(1 − λ)/λ]^k − [(1 − λ)/λ]^N} / {1 − [(1 − λ)/λ]^N}.  (21)
When λ = ½, so that C = 1, Eq. (21) is indeterminate, and the ruin probability is found instead to be

p(k) = 1 − k/N,  λ = ½.

Letting N → ∞ gives the probability of ruin against an infinitely rich opponent:

p(k) = 1,  if λ ≤ ½,  (22)

p(k) = [(1 − λ)/λ]^k,  if λ > ½.  (23)
Let p_n(k) denote the probability of ruin at exactly the nth stage, and define†

f_k(s) = Σ_{n=0}^∞ p_n(k) s^n.

In particular, for λ > ½, Eq. (23) gives

p(1) = (1 − λ)/λ.

†Note that this is not a probability generating function, inasmuch as Σ_n p_n(k) = p(k) ≠ 1.
Now suppose that the gambler begins with k > 1 units as his initial stake. In order to become ruined, the gambler must pass through the different states k − 1, k − 2, …, 1, 0, and each step has the same independent probability as the probability of ruin with one initial unit; thus

p(k) = [(1 − λ)/λ]^k.

It is left as an exercise for the student to discover where the important condition λ > ½ enters this formulation, giving Eqs. (22) and (23).
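Equation (21) can be checked against a direct sequential solution of the difference equation with the boundary values p(0) = 1, p(N) = 0. A sketch; the values of λ and N are arbitrary illustrative choices:

```python
def ruin_probability(lam, k, N):
    """Eventual ruin probability from initial stake k, Eq. (21);
    lam is the probability of winning a single stage."""
    if lam == 0.5:
        return 1.0 - k / N              # the fair-game case
    C = (1.0 - lam) / lam
    return (C**k - C**N) / (1.0 - C**N)

def ruin_by_recursion(lam, N):
    """Solve p(k+1) - p(k) = C*(p(k) - p(k-1)) with p(0) = 1, p(N) = 0."""
    C = (1.0 - lam) / lam
    # the differences p(k) - p(k-1) form a geometric sequence d1 * C**(k-1);
    # choose d1 so that p(N) = 1 + d1*(1 + C + ... + C**(N-1)) = 0
    d1 = -1.0 / sum(C**j for j in range(N))
    p = [1.0]
    for k in range(1, N + 1):
        p.append(p[-1] + d1 * C**(k - 1))
    return p

lam, N = 0.6, 10
p = ruin_by_recursion(lam, N)
```

The two routes agree for every initial stake k, which is just the summation argument of the text carried out numerically.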
3.15. First Passage

Let f(n) denote the probability that a chain starting in state x reaches state y for the first time on the nth step. Summing over the possible values of the first-passage time gives

p_xy^{(n)} = Σ_{k=1}^{n} f(k) p_yy^{(n−k)},  (24)

and, solving sequentially,

f(1) = p^{(1)},
f(2) = p^{(2)} − q^{(1)} f(1),
f(3) = p^{(3)} − q^{(1)} f(2) − q^{(2)} f(1),
…,

where the upper indices refer to the corresponding powers of the basic matrix. Thus, in order to calculate f(n) systematically, one needs to know two elements from all powers of P, namely, the elements p_xy^{(n)} = p^{(n)} and p_yy^{(n)} = q^{(n)}. When these are known, the calculation can be reduced to a single stage by using probability generating functions. Multiplying the jth equation above by s^j and summing gives an expression for the probability generating function φ(s) = Σ s^n f(n):

Σ_{j=1}^∞ p^{(j)} s^j = φ(s) [1 + Σ_{j=1}^∞ q^{(j)} s^j],

that is, writing p(s) = Σ_{j=1}^∞ p^{(j)} s^j and q(s) = Σ_{j=1}^∞ q^{(j)} s^j,

φ(s) = p(s)/[1 + q(s)].  (25)
3.16. An Example

Consider the four-state chain with transition matrix

      ( β   1−β   0    0  )
P =   ( 0    0    β   1−β )
      ( β   1−β   0    0  )
      ( 0    0    β   1−β )
Suppose first passage from state one to state two is required. Then

p(s) = (1 − β)s + β(1 − β)(s² + s³ + ⋯) = (1 − β)s(1 − s + sβ)/(1 − s),

q(s) = s²β(1 − β)/(1 − s),

φ(s) = (1 − β)s/(1 − sβ),

so that the first-passage probabilities are geometric:

f(n) = (1 − β)β^{n−1},  n = 1, 2, 3, ….
It will be noted that neither p(s) nor q(s) need pass through the point (1, 1). In fact, in this example, both are infinite at the value s = 1. However, φ(s) is a probability generating function and φ(1) = 1.
The calculation depends, of course, on the two states chosen, x = 1, y = 2. In this example, there are 16 different possible choices of the pair of states, including those of the form x = y. The first-passage distributions obtained are not all geometric (one other is) and are not all equal. The student should find a few of these generating functions as an exercise.
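The first-passage machinery can be verified numerically for this example: compute p^{(n)} and q^{(n)} from powers of P, recover f(n) recursively from Eq. (24), and compare with the geometric distribution (1 − β)β^{n−1}. A sketch with the illustrative value β = 0.3:

```python
def mat_mult(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

beta = 0.3                           # illustrative value
P = [[beta, 1 - beta, 0.0, 0.0],
     [0.0, 0.0, beta, 1 - beta],
     [beta, 1 - beta, 0.0, 0.0],
     [0.0, 0.0, beta, 1 - beta]]

# p^(n) = (P**n)[0][1] and q^(n) = (P**n)[1][1]  (states one and two)
N = 12
p_n, q_n = [None], [None]            # 1-indexed
Pn = P
for n in range(1, N + 1):
    p_n.append(Pn[0][1])
    q_n.append(Pn[1][1])
    Pn = mat_mult(Pn, P)

# Eq. (24): p^(n) = sum_{k=1}^{n} f(k) q^(n-k) with q^(0) = 1,
# solved sequentially for the first-passage probabilities f(n)
f = [None]
for n in range(1, N + 1):
    f.append(p_n[n] - sum(f[k] * q_n[n - k] for k in range(1, n)))
```

The recovered f(n) match (1 − β)β^{n−1} exactly, to rounding error, for every n computed.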
In the case x = y, the first passage (now a recurrence time, cf. Section 3.8) is from a state back to the same state, and the general formula is still valid with p(s) = q(s), as may be shown by recapitulating its proof. These first-passage probabilities are connected with the quantities f_x and f_x(n), defined in Section 3.8, in the following way: with f(n) = f_x(n),

f_x = Σ_{n=1}^∞ f(n).

For a transient state,

Σ_{n=1}^∞ f(n) < 1,

and the probability of never returning is

1 − Σ_{n=1}^∞ f(n).
†These are the usual, but somewhat misleading, names given to such distributions.
μ(x) = Σ_{j=1}^∞ j f(j) = φ′(1),

where the f and φ functions apply to state x. The generating function equation can be written in the form

φ(s)/[1 − φ(s)] = Σ_{j=1}^∞ s^j q^{(j)},

and a limit theorem for such power series† gives

lim_{n→∞} q^{(n)} = φ(1)/φ′(1),

which in this case becomes 1/μ(x). But the limit of the diagonal element in the matrix has already been shown to be π_x. This means that when q(s) has
†For the exact conditions on the theorem, an outline of the proof, and complete references, see Cox, D. R., and Miller, H. D. (1965), The Theory of Stochastic Processes, Methuen, London, pp. 140–141.
μ(x) = d/ds [ q(s)/(1 + q(s)) ]_{s=1} = lim_{s→1} q′(s)/[1 + q(s)]².  (26)
3.17. Problems†
1. Let X_n be a Markov chain with state space 0, 1, 2, with initial vector (a_0, a_1, a_2) and transition matrix {p_xy}. Find (i) P(X_0 = 0, X_1 = 1, X_2 = 1), (ii) P(X_1 = 1, X_2 = 1 | X_0 = 0), (iii) P(X_n = 1 | X_{n−2} = 0), (iv) P(X_2 = X_0).
2. A fair coin is tossed until three consecutive heads occur. Let X_n = x if at the nth trial the last tail occurred at the (n − x)th trial, x = 0, 1, …, n, i.e., X_n denotes the length of the string of heads ending at the nth trial. Write down the transition matrix.

Ans.
( 1/2  1/2   0    0  )
( 1/2   0   1/2   0  )
( 1/2   0    0   1/2 )
(  0    0    0    1  )
3. Two green balls and two yellow balls are placed in two boxes so that each box contains two balls. At each step one ball is selected at random from each box, and the two exchanged. Let X_0 denote the number of yellow balls initially in the first box. For n = 1, 2, 3, …, let X_n denote the number of yellow balls in the first box after n exchanges have taken place. (i) Find the transition matrix and the two-step transition matrix. (ii) Show that the limiting distribution of X_n is the same as the distribution of X_0 when the balls are initially placed at random. (iii) Find the one-step matrix beginning with N green and N yellow balls.
†Problems 17 and 27 are taken from Lindley (1965), with kind permission of the publishers.
4. There are two green balls in box A and three red balls in box B. At each step a ball is selected at random from each box and the two are exchanged. Let the state of the system at time n be the number of red balls in box A after the nth exchange. (i) Find the transition matrix. (ii) What is the probability that there are two red balls in box A after three steps? (iii) What is the long-run probability that there are two red balls in box A? Ans. (ii) 5/18, (iii) 3/10
5. A number X_0 is chosen at random from the integers 1, 2, 3, 4, 5. For n = 1, 2, 3, … a value of X_n is chosen at random from the integers 1, 2, …, X_{n−1}. Find the one-step and two-step transition matrices.
6. In a sequence of independent throws of a fair die, let X_n be the largest number appearing during the first n throws. Find the one-step and two-step transition matrices.
7. Four children throw a ball to one another. A child with the ball is equally likely
to throw it to each of the other three children. Find the one-step and two-step
transition matrices.
8. (A famous problem). A and B have agreed to play a series of games
until one of them has won five games. They are equally strong players and no
draws are allowed. Owing to circumstances beyond their control, they are
compelled to stop at a point where A has won four and B has won three games.
How should the stakes be divided? The present and possible future states of the
system are four: A has four games, B three; A has four and B also has four; A
has won; B has won. Write down the transition matrix and solve the problem.
9. Consider a two-state Markov chain with p_00 = λ, p_11 = μ. A new chain is constructed by calling the pair 01 state 0 and the pair 10 state 1, ignoring 00 and 11. Only nonoverlapping pairs are considered. Show that for the new chain

p̂_00 = p̂_11 = (1 + λ + μ)^{−1}.
10. In independent throws of a coin which has probability p of falling heads, let X_n be the number of heads in the first n throws. Find the one-step and two-step transition matrices.
11. Let X_n be a two-state Markov chain over 1, 2. Find (i) P(X_1 = 1 | X_0 = 1 and X_2 = 1) and (ii) P(X_1 ≠ X_2).
12. A particle moves on a circle through points 0, 1, 2, 3, 4 in clockwise order. At each step it has probability p of moving to the next point clockwise and probability 1 − p of moving to the next point counterclockwise. Find the transition matrix.
13. A rat is being trained to run a maze, and each attempt is considered to be a
success (S) or a failure (F). Suppose the sequence of trials forms a Markov chain with a given transition matrix,
and that on the first trial the rat is equally likely to succeed or fail. The rat is
considered to be trained if it achieves three consecutive successes. Find the
probability that the rat is not trained after 10 trials.
14. Test the following matrices for regularity: (i) …, (ii) …, (iii) …, (iv) …, (v) ….

15. A Markov chain has a given transition matrix. Find P^n.
16. A Markov chain has the transition matrix

(  p   1−p   0  )
(  0    p   1−p )
( 1−p   0    p  )

17. A Markov chain has the transition matrix

( 1/2  1/2   0    0  )
( 1/4  1/2  1/4   0  )
(  0   1/4  1/2  1/4 )
(  0    0   1/2  1/2 )

(i) Find the stationary vector. (ii) Write down the mean recurrence times.
Ans. (ii) (6, 3, 3, 6)
18. A production line produces three variants A, B, and C of a basic design. For a
"balanced" line, the following rules are observed: Two identical variants never
follow one another and each B must be followed by a C. (i) What is the
probability that after an A, the next A is separated by only one unit? (ii) Given
a production schedule which specifies the proportion of each variant required,
how could one find a rule which would yield this output in a balanced
production line? (iii) Suppose the demand was for 40% variant A, 10% variant
B, and 50% variant C. How could this be achieved?
19. Consider a Markov chain with state space (1, 2, 4), initial vector (…), and transition matrix

( 1/5  2/5  2/5 )
( 1/2  1/2   0  )
( 3/4   0   1/4 )

(i) Find P(X_0 = 1, X_1 = 1, X_3 = 1). (ii) Compute P^{(2)}.
20. A Markov chain with the non-negative integers as its state space has transition matrix

(  0    1     0     0     0   ⋯ )
( 1/2   0    1/2    0     0   ⋯ )
(  0   1/4    0    3/4    0   ⋯ )
(  0    0    1/8    0    7/8  ⋯ )
(  0    0     0   1/16    0   ⋯ )
(  ⋯ )

(i) Write out the transition probabilities, i.e., fill in the blanks in the following formula: P(X_{n+1} = ⋯ | X_n = ⋯) = ⋯. (ii) Let the stationary vector be (π_1, π_2, π_3, …). Write a difference equation giving π_x in terms of π_{x−1} and π_{x+1}. (iii) Express π_4 in terms of π_0. (iv) Find P(X_{n+2} = 2 | X_n = 2). (v) Let φ(s) = Σ π_j s^j. Show that φ(s) = P(s)φ(½s), where P(s) is a polynomial in s, and find P(s). (vi) Let μ be the mean value of the stationary vector, and let γ be the slope of φ(s) at the value s = 1. Prove that 2μ = 2γ + 1.
21. Find the stationary probability vector for each of the following transition matrices: (i) …, (ii) …, (iii) ….
22. Classify the states of the Markov chains with transition matrices (i) … and (ii) ….
23. Two boys and two girls are throwing a ball. Each boy throws the ball to the other boy with probability ½ and to each girl with probability ¼. Each girl throws the ball to each boy with probability ½ and never to the other girl. Find the long-range probability that each has the ball. Ans. (⅓, ⅓, ⅙, ⅙)
24. Let X_n, n = 0, 1, 2, …, be a two-state Markov chain (p_00 = λ, p_11 = μ), which starts in state 0. Find the probability that the first return to state 0 is at the nth step. Ans. The p.g.f. φ(s) is (1 − μs)^{−1}(λs + s² − λs² − μs²).
25. A Markov chain has the transition matrix

(  0    1   0 )
( 1−p   0   p )
(  0    1   0 )

(i) Find the n-step transition matrix. (ii) Describe the limiting behavior of the chain. (iii) Find the stationary vector and comment on its meaning.
Ans. (iii) (½(1 − p), ½, ½p)
26. A fair tetrahedral die with 1, 2, 3, 4 on its faces is thrown repeatedly, and X_n = x if x is the highest value obtained (on the bottom) in n throws. (i) Write the transition matrix for the Markov chain. (ii) For fixed x and y, find

φ(s) = Σ_j f_xy(j) s^j,

where f_xy(n) is the probability of first transition from x to y on the nth step. (iii) Consider φ(1) for y = 1, 2, 3, 4 and explain why φ(1) = 1 holds only for y = 4.
27. A certain kind of nuclear particle splits into 0, 1, or 2 particles with probabilities …, ½, and …, respectively, and then dies. The individual particles act independently of each other. Given a particle, let X_1, X_2, and X_3 denote the number of particles in the first, second, and third generations. Find (i) P(X_2 > 0), (ii) P(X_1 = 1 | X_2 = 1), (iii) P(X_3 = 0).
28. In an Ehrenfest chain, show that if the distribution of X_0 is (k choose x) 2^{−k}, so is the distribution of X_1. For an Ehrenfest chain with k = 3, find the distribution of X_1, X_2, and X_3 if X_0 is rectangular.
29. For a branching chain, calculate the probability of extinction for (i) b_0 = ⅓, b_2 = ⅔; (ii) b_0 = ¼, b_1 = ½, b_2 = ¼; (iii) b_0 = ⅙, b_1 = ½, b_2 = ⅓; (iv) b_0 = b_3 = ½.
Ans. (i) ½, (ii) 1, (iii) ½, (iv) −½ + ½·5^{1/2}
30. In a certain branching process, the probability of n offspring from one ancestor is geometric with parameter p. (i) Find the range of values for p which will make the process die out with probability one. (ii) For p outside this range, find the probability of extinction. (iii) If p is chosen so that the probability of the process never dying out is 0.999, what is the probability that an individual will have no offspring? (iv) Answer the first three parts with the modified assumption that the probability of no offspring is the sum of the first two geometric probabilities, the probability of one offspring is the sum of the next two geometric probabilities, and so forth.
31. Suppose each man has exactly three children, with equal probability that each child is a boy or a girl. Consider the branching chain in which X_n is the number of males in the nth generation. (i) Find the probability that the male line of a given man eventually becomes extinct. (ii) If a given man has two boys and a girl, find the probability that his male line eventually becomes extinct.
32. Consider a branching chain with initial population size N and probability generating function 1 − p + ps. Find the probability distribution of the step x at which the population becomes extinct.
33. At time zero, a blood culture starts with one red cell. At the end of one minute, the red cell dies and is replaced by one of the following combinations: two red cells with probability ¼, one red and one white cell with probability ⅔, two white cells with probability 1/12. Each red cell lives for one minute and gives birth to offspring in the same way; each white cell lives for one minute and dies without reproducing. Assume the cells behave independently. (i) At time n + 1 minutes after the culture began, what is the probability that no white cells have yet appeared? (ii) What is the probability that the entire culture dies out eventually?
Ans. (i) (¼)^{2^{n+1}−1}, (ii) ⅓
34. Bets of $1 each are made on the tosses of a fair coin, with the policy to stop
when the winnings reach $10 or the losses reach $20. Find (i) the probability of
losing, (ii) the expected loss, (iii) the expected number of bets.
35. A Markov transition matrix is said to be doubly stochastic if the columns sum to unity. Show that such a chain has a rectangular stationary vector if it is irreducible, aperiodic, and finite.
36. Consider a Markov chain with state space 1, 2, …, c + d, where c and d are positive integers. Starting from any one of the first c states, transition is equally likely to any one of the last d states; starting from any one of the last d states, transition is equally likely to any one of the first c states. (i) Show that the chain is irreducible. (ii) Find the stationary vector.
37. Let X_n be a branching process with probability generating function φ(s). Let Y_n denote the total number of individuals in the first n generations, i.e.,

Y_n = X_0 + X_1 + ⋯ + X_n,   n = 0, 1, 2, ….
38. Suppose all elements in column y of a Markov transition matrix are equal and nonzero, except that p_yy = 0. Show that the first-passage distribution from x to y is geometric.
39. In the gambler's ruin formulation of Section 3.14, let q(k) denote the expected duration of the game when the initial stake is k. (i) Show that q(k) satisfies

q(k) = λ q(k + 1) + (1 − λ) q(k − 1) + 1.

(ii) Show that

q(k) = k/(1 − 2λ) − [N/(1 − 2λ)] · {1 − [(1 − λ)/λ]^k} / {1 − [(1 − λ)/λ]^N}.
40. Solve sequentially the random walk equations with step probabilities p_{x−1,x}, p_{x,x}, p_{x+1,x} to obtain π_x in terms of π_0.
41. For the Markov chain with transition matrix

(  0   1/2  1/2 )
( 1/2   0   1/2 )
(  0   1/2  1/2 )

and states 1, 2, 3, show that the first-passage distribution f_12(n) = 2^{−n}. [Show that for odd n, p_12^{(n)} = (2^n + 1)/(2^n · 3), q^{(n)} = (2^n − 2)/(2^n · 3), and for even n, p_12^{(n)} = (2^n − 1)/(2^n · 3), q^{(n)} = (2^n + 2)/(2^n · 3).]
42. In the example of Section 3.16, find all of the first-passage probability generating functions.
43. Define a Markov chain based on the fish-netting experiment of Section 2.1,
classify the states, and write down the transition matrix.
44. Referring to Eq. (19), show that E(S_N) = E(X)E(N).
45. For the Markov chain defined in Problem 25, consider first passage from state 1 to state 3. Find p(s), q(s), and show that

φ(s) = ps² / [1 − (1 − p)s²],

and thus that the first-passage distribution is geometric over the even integers with parameter 1 − p.
4

Continuous Probability Distributions

4.1. Examples
There are many random variables whose range† consists of the positive real line (0, ∞) or a portion of the real line.

†Up to now no particular name has been attached to the set of values for which a random variable has a positive probability, and indeed there is no standard word in the literature for this important concept. Most authors skirt the question, as the first three chapters of this book have done, by saying that the random variable is "over" or "on" some set of values. Feller attributes his reluctance to use the correct mathematical term "range" (for values assumed by a function) to a statistical definition of "range" as the difference between the largest and smallest items in a sample. This does not seem to be a good enough reason, and from this point onwards, the aspect of a random variable as a function on the sample space will be emphasized by introducing the term "range" for the possible values of a random variable.
Example 13. A person approaches a postal counter; X = time needed to wait for service. Range: 0 and (0, ∞), the first value corresponding to those customers who find the counter free and need not wait.
In the above examples it might be asked why the range includes the
special value zero rather than simply assigning the probability required to
the zero in the continuous portion of the range. The answer, which will
become clearer in the course of the chapter, is that each number in a
continuous range must have probability zero. Indeed, one definition of a
continuous random variable X is the equation
P(X=x)=o forallx.
Consider Example 17. It might happen that half of all cars are stationary, with the remaining car speeds continuously distributed over the interval (0, ∞). The probability model for this situation would assign p_0 = ½, with the remaining probability distributed like a continuous random variable.
∫_x^y f(u) du = P(x < X < y).

∫_{−∞}^∞ f(u) du = ∫_a^b f(u) du = 1,

P(X < x) = P_x,  x = 1, 2, 3, …,  where P_x = Σ_{j=0}^{x−1} p_j
= [1/Γ(k)] ∫_0^{λx} e^{−v} v^{k−1} dv.
The integral is clearly related to the integral defining the gamma function
(Section 1.9) and is called an incomplete gamma function. The incomplete
gamma function will be discussed in detail in Section 4.5.
The student will note that the exponential distribution results from
setting k = 1 in the gamma distribution.
Many of the concepts defined in Chapters 1 and 2 can be extended to
the case of continuous random variables by substituting integration for
summation.
The events considered here† are real intervals (a, b), with probability

∫_a^b f(u) du.

mean = m = ∫_0^∞ x f(x) dx,

⟨g(X)⟩ = ∫_0^∞ g(x) f(x) dx,

and the conditional density functions are

f(x, y)/g(x)   and   f(x, y)/h(y),

where g and h are the marginal densities.
[P(X < x) − P(X < k)] / P(X > k) = [1 − e^{−λx} − (1 − e^{−λk})] / e^{−λk} = 1 − e^{−λ(x−k)},
and with a change of origin from 0 to k, this becomes the cumulative form
of the exponential distribution.
Calculations of integrals with respect to a continuous density can often
be greatly facilitated by using the fact that the given density is normalized.
Thus, since the gamma density integrates to one, the nth moment of the gamma distribution is

λ^k Γ(n + k) / [Γ(k) λ^{n+k}] = (k)_n λ^{−n},

where (k)_n = k(k + 1)⋯(k + n − 1).
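The moment formula is easy to confirm by direct numerical integration against the gamma density; for instance, with λ = 2 and k = 3 the second moment should be (k)_2 λ^{−2} = 3·4/4 = 3. A sketch (parameter values arbitrary):

```python
import math

def gamma_density(x, lam, k):
    return lam**k * x**(k - 1) * math.exp(-lam * x) / math.gamma(k)

def nth_moment(lam, k, n, upper=100.0, steps=200_000):
    """Midpoint-rule approximation of the nth moment of the gamma density."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**n * gamma_density(x, lam, k) * h
    return total

def rising_factorial(k, n):
    """(k)_n = k (k+1) ... (k+n-1)."""
    out = 1.0
    for j in range(n):
        out *= k + j
    return out

lam, k = 2.0, 3.0
m2 = nth_moment(lam, k, 2)
exact = rising_factorial(k, 2) / lam**2      # (k)_n * lam**(-n)
```

Taking n = 0 reproduces the normalization itself, which is exactly the fact the text exploits.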
In the study of discrete random variables, the change from the distribu-
tion of one random variable to another functionally related one was
basically substitution, with the probabilities unchanged. A little more care
must be taken in the continuous cases, although the result is rather similar.
Beginning with the basic relationship between the random variable and its probabilities,

P(X < x) = F(x),

the distribution function of Y = h(X) is

P(Y < x) = P(h(X) < x) = F(h^{−1}(x)).

In the continuous case, the density functions f_1(x) of Y and f_2(x) of X are thus given by

f_1(x) = (d/dx) F(h^{−1}(x)) = f_2(h^{−1}(x)) (d/dx) h^{−1}(x).  (1)
This simply says that the substitution must be made not only in the density function, but also in the differential element. Thus, for example, if X is exponential with parameter λ, the density of kX is not λe^{−λy/k}, but rather (λ/k)e^{−λy/k}, as normalization would show in any case. The extra factor 1/k can be thought of as coming from the substitution of dy/k for dx.
†The student with experience in analysis will see why these conditions are necessary. In virtually all cases of importance in this book, the relationship is linear.
J = u/(1 + v)²,

and the joint density function of U and V is found accordingly; then

H(x) = P(X + Y < x) = ∬_{u+v<x} f(u) g(v) du dv.
The upper limit is now the finite value x, since the density functions under
consideration are zero for negative arguments.
It will be noted that the convolution formula given is entirely analogous to the formula for discrete random variables as given in Section 2.6, and, like the earlier formula, the functions f and g are interchangeable. Also, for students interested in the theory of integration, it is clear that necessary conditions for differentiation under the integral sign are fulfilled in the present instance.
Actually working out a convolution can be tricky in the continuous case because of the range of the variable. Consider the following example. Let X be a random variable with a continuous rectangular distribution over (0, a),

f(x) = 1/a,  0 < x < a,

and let Y be exponentially distributed with parameter λ,

g(x) = λe^{−λx},  0 < x < ∞.

A careless application of the convolution formula gives

h(x) = (1/a)(1 − e^{−λx}),  0 < x < ∞,

which cannot be correct, since the right side does not even integrate to unity. The important fact which is omitted in this calculation is that the rectangular distribution vanishes outside the range (0, a), so that the integrand can also vanish for certain ranges of the variable. In fact, although it is true that X + Y has range (0, ∞), the evaluation of the convolution integral requires separate consideration of the ranges (0, a) and (a, ∞). In the first case, the upper limit x is correct, but if x > a, the upper limit must be replaced by a, since the rectangular distribution is zero for values larger than a. Thus the density for X + Y is

h(x) = (1/a)(1 − e^{−λx}),  x < a,
h(x) = (1/a)(e^{−λ(x−a)} − e^{−λx}),  x > a.
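This pitfall can be checked numerically. The sketch below assumes, for illustration, that X is rectangular on (0, a) and Y is exponential with parameter λ: the two-branch density integrates to one, while the one-branch formula applied over the whole range (0, ∞) does not.

```python
import math

def h_density(x, a, lam):
    """Density of X + Y: X rectangular on (0, a), Y exponential(lam)."""
    if x <= 0.0:
        return 0.0
    if x < a:
        return (1.0 - math.exp(-lam * x)) / a
    return (math.exp(-lam * (x - a)) - math.exp(-lam * x)) / a

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule integration."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

a, lam = 1.0, 2.0
total = integrate(lambda x: h_density(x, a, lam), 0.0, 40.0)
# the "careless" one-branch formula, wrongly extended over all of (0, infinity)
careless = integrate(lambda x: (1.0 - math.exp(-lam * x)) / a, 0.0, 40.0)
```

The first integral is essentially 1; the second grows without bound as the upper limit increases, confirming that the single-branch formula cannot be a density.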
[f(x)]^{n*} = f(x) * f(x) * ⋯ * f(x)  (n factors).

The incomplete gamma functions are denoted by

γ(n, x) = ∫_0^x e^{−v} v^{n−1} dv,   Γ(n, x) = ∫_x^∞ e^{−v} v^{n−1} dv,  (5)

so that

γ(n, x) + Γ(n, x) = Γ(n).
These functions are useful in many ways; in probability their chief roles
are as the distribution functions for two important distributions: the gamma
distribution and the Poisson distribution. This link between two prominent
distributions, one continuous and the other discrete, turns out to be espe-
cially significant in studying stochastic point processes, beginning in Section
5.3.
The Gamma Distribution (with parameters λ and k):

f(x) = λ^k x^{k−1} e^{−λx} / Γ(k),  x > 0.  (7)

The Poisson distribution function is

P(x) = Σ_{j=0}^{x−1} λ^j e^{−λ} / j!,  (8)
= 1 − γ(x, λ)/Γ(x) = Γ(x, λ)/Γ(x).  (9)

Σ_{j=x}^∞ e^{−λ} λ^j / j! = γ(x, λ)/Γ(x).  (10)
n=0,1,2, ... ,
this being the distribution function for the length of a line segment joining
two of the points which are separated by k - 1 other points. The density
function for this distance is thus a gamma distribution function with
parameters k and A; in the special case k= 1, it is exponential. Now consider
a segment of the line of length L which does not contain the origin, and let
Y denote the number of points lying on L. Then, for example, the event Y ≥ k occurs precisely when the kth point, counting from the left end of L, lies within the segment, so that

P(Y ≥ k) = γ(k, λL)/Γ(k)   [by Eq. (7)].
This shows that if the gaps between points are exponentially distributed with parameter λ, then the counts of points in length L are Poisson distributed with parameter λL. The argument, which will be generalized in discussions of renewal processes, can also be reversed, giving the Poisson-gamma relation. Naturally, the mean count in a unit interval is λ, whereas the mean gap is 1/λ.
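The duality is easy to observe in simulation: lay down points with independent exponential gaps and count how many fall in an interval of length L; the counts should have mean and variance both close to λL, as a Poisson distribution requires. A sketch (parameter values arbitrary):

```python
import random

random.seed(7)

def count_in_interval(lam, L):
    """Points of an exponential-gap sequence falling in (0, L]."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)   # exponential gap, mean 1/lam
        if t > L:
            return n
        n += 1

lam, L, trials = 1.5, 4.0, 20_000
counts = [count_in_interval(lam, L) for _ in range(trials)]
mean_count = sum(counts) / trials
var_count = sum((c - mean_count) ** 2 for c in counts) / trials
# for a Poisson count distribution, both should be close to lam*L = 6
```

The equality of the sample mean and variance is the standard quick diagnostic for Poisson counts.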
An important application of this relationship occurs when the line
represents a time axis and the points are events in time. If a distribution is
formed by counting events in time, such as cars passing a point per minute,
particles arriving at a particle counter per minute, mine disasters per year,
fire calls per day, or wars per century, then this distribution is Poisson if
and only if the continuous distribution formed by the time between cars,
particles, disasters, fire calls, or wars is independent and exponential. Such
events are sometimes called random, or Poisson. Because of the more
general use of the word "random" to mean "not deterministic," this volume
uses "Poisson" to indicate such a point sequence. It becomes important,
then, to distinguish clearly between the two meanings: A Poisson sequence
has a Poisson counting distribution.
Two other facts which have already been established shed additional light on Poisson events. The Poisson distribution counts points placed independently on a segment (Section 2.4); and if one point follows another with exponential distribution, then after a fixed period of time, the distribution of the gap to the next point is still exponential and with the same parameter (Section 4.2). This characterization is referred to by saying that the exponential distribution is memoryless.
f(x) = c x^{p−1}(1 − x)^{q−1},  0 < x < 1,  p > 0, q > 0.

It is shown in Section 1.9, Eq. (18), that the normalizing constant must be

c = Γ(p + q) / [Γ(p)Γ(q)].  (11)

With B_π denoting the same integral as the beta function B, taken only up to π, the ratio of the incomplete to the complete beta function gives a binomial tail:

1 − P(x) = B_π(x, n − x + 1) / B(x, n − x + 1).  (13)
The proof of Eq. (13) is not difficult, but it can be tedious. It consists of the repeated integration by parts of the definition of the incomplete beta function; the process ejects successive binomial terms. The first step, for example, yields

[(n − x)(n − x − 1) ⋯ 3·2·1 / x(x + 1) ⋯ (n − 1)] B_π(n, 1),

which, since B_π(n, 1) = π^n/n, easily turns into the nth binomial term, thus showing that the ratio of the incomplete to the complete beta functions is the tail P(X ≥ x) of the binomial distribution.
This beta-binomial relationship is almost exactly parallel to the Poisson-gamma relationship of Section 4.5. A probabilistic interpretation is as follows. Suppose numbers X_1, X_2, …, X_n are chosen independently, at random, on the unit interval. Let Y_1 ≤ Y_2 ≤ ⋯ ≤ Y_n be the same numbers designated in order of magnitude. Let X be the number of the X_j which lie in the interval (0, π), so that X is a binomial random variable with parameters n and π. Then the event X ≥ x is the same as the event Y_x ≤ π. The first inequality has probability equal to the left side of Eq. (13), and therefore the beta distribution on the right side gives the probability of the second inequality. (Y_j) is called in statistics an ordered sample, a concept of considerable importance.
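Both the identity (13) and its order-statistic interpretation can be checked numerically: compare the binomial tail with the incomplete-beta ratio computed by direct integration, and with the simulated frequency of the event Y_x ≤ π. A sketch (the values of n, x, and π are arbitrary):

```python
import math
import random

random.seed(1)

def binomial_tail(n, x, pi):
    """P(X >= x) for a binomial random variable with parameters n and pi."""
    return sum(math.comb(n, j) * pi**j * (1.0 - pi)**(n - j)
               for j in range(x, n + 1))

def incomplete_beta_ratio(x, n, pi, steps=200_000):
    """B_pi(x, n-x+1) / B(x, n-x+1) by midpoint integration."""
    h = pi / steps
    num = sum(((i + 0.5) * h)**(x - 1) * (1.0 - (i + 0.5) * h)**(n - x)
              for i in range(steps)) * h
    den = math.gamma(x) * math.gamma(n - x + 1) / math.gamma(n + 1)
    return num / den

n, x, pi = 10, 4, 0.35
tail = binomial_tail(n, x, pi)
ratio = incomplete_beta_ratio(x, n, pi)

# order statistics: Y_x <= pi iff at least x of the n uniforms fall below pi
trials = 20_000
hits = sum(sorted(random.random() for _ in range(n))[x - 1] <= pi
           for _ in range(trials))
freq = hits / trials
```

The integral ratio agrees with the binomial tail to the accuracy of the quadrature, and the simulated frequency agrees to sampling error.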
Suppose the Poisson parameter μ is itself a random variable. Then

P(X = x) = ∫_0^∞ [e^{−μt}(μt)^x / x!] f(μ) dμ,  (14)

where f(μ) is the density function for the random variable μ. Note that the range of μ is (0, ∞), so that it would be natural to choose a density function f(μ) defined over this range. One choice that seems logical, and that has in fact been widely used, is the gamma distribution, say, with parameters λ and k. Then, using the substitution technique given in Section 4.2 in connection with the gamma mean, the integral is evaluated as

P(X = x) = (x + k − 1 choose x) [λ/(λ + t)]^k [t/(λ + t)]^x,
a negative binomial distribution. This model, when used in accident analysis, is often referred to as an "accident proneness" model, with the parameter μ called the proneness. In the theory of probability, the Poisson distribution is said to be mixed with the gamma distribution to give the negative binomial. This use of the term "mixed" should not be confused with the same word as applied to a distribution which is partly discrete and partly continuous; in case of doubt, it is better to write "parameter mixed" in one case and "discrete and continuous mixed" in the other.
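The parameter-mixing computation can be confirmed numerically: integrating the Poisson probabilities against a gamma density reproduces the negative binomial term by term. A sketch (parameter values arbitrary):

```python
import math

def mixed_poisson(x, lam, k, t, upper=40.0, steps=100_000):
    """Integrate the Poisson probability e^{-mu t}(mu t)^x / x!
    against a gamma(lam, k) density for mu, as in Eq. (14)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        mu = (i + 0.5) * h
        poisson = math.exp(-mu * t) * (mu * t)**x / math.factorial(x)
        gamma_d = lam**k * mu**(k - 1) * math.exp(-lam * mu) / math.gamma(k)
        total += poisson * gamma_d * h
    return total

def negative_binomial(x, lam, k, t):
    """The closed-form result of the mixing integral."""
    return (math.comb(x + k - 1, x) * (lam / (lam + t))**k
            * (t / (lam + t))**x)

lam, k, t = 2.0, 3, 1.0
pairs = [(mixed_poisson(xx, lam, k, t), negative_binomial(xx, lam, k, t))
         for xx in range(5)]
```

Each numerically mixed probability matches the corresponding negative binomial probability to quadrature accuracy.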
There are many other examples of mixing which are of some importance. The beta distribution, being defined over the unit interval, is useful as a mixing distribution when the parameter in question is a probability, such as p in the geometric distribution or π in the binomial.
P(X = x) = B(x + p, 1 + q) / B(p, q),  x = 0, 1, 2, ….  (15)

P(X = x) = (n choose x) B(x + p, n + q − x) / B(p, q),  x = 0, 1, …, n.  (16)
The third assumption means that the functional value at points of discontinuity is associated with the lower branch of the curve, since F(x) = P(X < x).†
Special Cases. (a) If F(x) is continuous, and P( X <x)=F(x), then X is
a continuous random variable.
(b) If F( x) is a step function, and P( X < x) = F( x), then X is a discrete
random variable.
(c) If F(x) is continuous except at a countable number of points, where
F( x) has jump discontinuities, and if there is an interval in which F'( x)
exists and is nonzero, and P( X < x) = F( x), then X is a mixed (discrete and
continuous) random variable.
If F′(x) exists and is continuous for all x ≥ 0, then a familiar theorem from calculus guarantees that for all x > 0,

F(x) = ∫_0^x F′(t) dt.  (17)

But if X is a discrete or mixed random variable, F′(x) does not even exist at the points where F is discontinuous.‡
For a discrete distribution, let the jumps occur at the points a_x, and let the magnitude of the jump at a_x be p_x. Define the delta function Δ(x) by

Δ(x) = 0,  x ≤ 0,
Δ(x) = 1,  x > 0.

Thus Δ(x − t) is the distribution function for the causal distribution at the value t. Then, for any discrete distribution, the distribution function can be written
†As pointed out in Section 1.8, an equivalent theory can be developed on the basis of right continuity, that is, with F(x) = P(X ≤ x).
‡By using Lebesgue integration, a version of Eq. (17) can be obtained even if F′(t) is not defined at all points; however, the resulting theorem is not useful here because it holds only if F is continuous. Indeed F must satisfy a stronger condition called absolute continuity. See, for example, Royden, H. L. (1968), Real Analysis, Macmillan, New York.
F(x) = Σ_x p_x Δ(x − a_x).  (19)

For example, the causal distribution at the origin has

F(x) = 0,  x ≤ 0,
F(x) = 1,  x > 0.
Let Δ: a = x_0 < x_1 < ⋯ < x_n = b be a subdivision of [a, b], let

δ = norm(Δ) = max(x_{i+1} − x_i),

and let ξ_0, ξ_1, …, ξ_{n−1} be chosen so that x_i ≤ ξ_i ≤ x_{i+1}. Then, if the limit

lim_{δ→0} Σ_{i=0}^{n−1} g(ξ_i)[F(x_{i+1}) − F(x_i)]  (20)

exists independently of the choice of subdivisions and of the points ξ_i, it is called the Stieltjes integral of g with respect to F and is written

∫_a^b g(x) dF(x).  (21)
It will be noted that the Stieltjes integral becomes a Riemann integral when F(x) = x, according to the definition, and also that the notation chosen for Eq. (21) reflects this fact. Just as with the Riemann integral, one of the first tasks is to establish conditions under which the Stieltjes integral exists. Since the application to probability (Laplace transforms of distribution functions) is rather special, no discussion will be given of general necessary conditions, but only of those sufficient for this purpose.
Theorem 1. If, on the interval [a, b], g(x) is continuous and F(x) is monotonic nondecreasing, then the integral (21) exists.
M_i = max g(x), m_i = min g(x), x_i ≤ x ≤ x_{i+1}, i = 0, 1, ..., n−1.

The left and right terms of the inequality are called the lower and upper sums, respectively, of the subdivision Δ, and are written L(Δ) and U(Δ). If Δ and Δ′ are any two subdivisions of the interval, L(Δ) ≤ U(Δ′); the proof of this fact is left as an exercise for the student and is based on considering a subdivision Δ″ consisting of all points in either Δ or Δ′. Now define two numbers A and B by the equations

A = sup Σ_{i=0}^{n−1} m_i ΔF_i,

B = inf Σ_{i=0}^{n−1} M_i ΔF_i,
lim_{δ→0} [U(Δ) − L(Δ)] = 0,

g(x′) − g(x″) < ε/[F(b) − F(a)].

Let the norm δ be any value less than this δ′: δ < δ′, and take for x′ and x″ successively the values x_i, x_{i+1}, defining Δ. Then

U(Δ) − L(Δ) = Σ_{i=0}^{n−1} (M_i − m_i) ΔF_i
n~]
Note. This theorem gives the integration by parts formula familiar both in Riemann and in Stieltjes integration.

Σ_{k=1}^{n} F(x_k)[g(ξ_{k−1}) − g(ξ_k)] + F(b)g(b) − F(a)g(a).
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Improper integrals (i.e., those with upper limit infinity) are important and
are defined, as with Riemann integrals, as limits.
Definition 1
=e+e- I -2,
∫_a^b g(x) dF(x) = 0,

whenever F(x) is constant over the closed interval [a, b]. Suppose g(x) = 1 and consider the deterministic distribution at the origin with distribution function F(x). With left continuity, F(x) is not constant over the closed interval [0, b], and so this formula does not apply. But with right continuity, F(x) = 1 over the closed interval [0, b], and so the formula would require
that the total probability of the deterministic distribution would be zero,
rather than unity, as needed for probability theory. The only way out of the
difficulty would be to significantly complicate the formulation by substitut-
ing limits from the left in place of fixed lower limits of integration, i.e., to
write always
lim_{x↑a} ∫_x^b

in place of

∫_a^b.
It is easier and more pleasant to use left continuity from the beginning.
(24)

This means that for discrete random variables, the Laplace-Stieltjes transform is simply the probability generating function with s replaced by e^{−s}.
∫_0^∞ e^{−sx} dF(x) = s ∫_0^∞ e^{−sx} F(x) dx
  = Σ_{n=0}^∞ s ∫_n^{n+1} e^{−sx} F(x) dx
  = Σ_{n=0}^∞ s ∫_n^{n+1} e^{−sx} F(n+1) dx
A similar theorem could be given for general mixed discrete and continuous
distributions, but it will be more useful to consider certain special cases.
F(x) = a + (1 − a)(1 − e^{−λx}), x > 0,

φ(s) = ∫_0^∞ e^{−sx} dF(x)
  = a + (1 − a)s ∫_0^∞ e^{−sx}(1 − e^{−λx}) dx
  = a + (1 − a) − (1 − a)s ∫_0^∞ e^{−(λ+s)x} dx
  = 1 − (1 − a) s/(s + λ)
  = (as + λ)/(s + λ) = (λ + as)/(λ + s),
as before. Calculations of this nature can often be organized algebraically by using the function δ(x), which represents the causal distribution at the origin:

δ(x) = 1, x = 0,
which reduces to the earlier result when d = 0, k = 1. The student should, as an exercise, obtain the Laplace-Stieltjes transform by the distribution function formula of Theorem 1.
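The transform of the mixed distribution computed above, φ(s) = (λ + as)/(λ + s), can be checked numerically by adding the atom's contribution at the origin to an ordinary integral against the continuous part. A sketch (Python; the parameter values are arbitrary):

```python
import math

lam, a, s = 2.0, 0.3, 1.5   # illustrative parameter values

# F(x) = a + (1-a)(1 - e^{-lam x}), x > 0: an atom of mass a at the origin
# plus a continuous part with density (1-a)*lam*e^{-lam x}.
def phi_numeric(s, n=200000, upper=40.0):
    h = upper / n
    total = a                # atom at x = 0 contributes a * e^{-s*0} = a
    for i in range(n):
        x = (i + 0.5) * h    # midpoint rule for the continuous part
        total += math.exp(-s * x) * (1 - a) * lam * math.exp(-lam * x) * h
    return total

phi_closed = (lam + a * s) / (lam + s)   # (lam + a s)/(lam + s) from the text
```

The two values agree to within the quadrature error.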
†See, for example, BREMERMANN, HANS (1965), Distributions, Complex Variables and Fourier Transforms, Addison-Wesley, Reading, Massachusetts.
The transform α(s) is called the Laplace transform of g(x), and is denoted by ℒg(x); the transform β(s) is called the Laplace-Stieltjes transform of g(x) and, when g(x) is differentiable, is equal to ℒg′(x), which, as will be shown in the next section, is equal to sℒg(x) − g(0+). When g(x) is a continuous distribution function, g(0+) = 0, so that in this case the Laplace transform α(s) and the Laplace-Stieltjes transform β(s) are connected by the simple formula β(s) = sα(s). In the purely discrete case, it has been shown that the Laplace-Stieltjes transform reduces to the probability generating function with argument e^{−s}. For distributions which are mixed discrete and continuous, only the Laplace-Stieltjes transform is defined, although the delta-function format for its calculation is based on an analogy with the Laplace transform.
(25)

and

a > 0,  (31)

(32)

ℒg^{(n)}(x) = s^n ℒg(x) − s^{n−1}g(0+) − s^{n−2}g′(0+) − ⋯ − g^{(n−1)}(0+),  (33)
where the symbol g(0+) means that the differentiation is to be performed and the result evaluated as x → 0 from the right. This can be different from the functional value g(0), but need not be so. If, for example, g(x) = F(x), a distribution function, then F(0) = 0, while F(0+) would be P(X = 0) if there were a discrete component at the origin. Similarly,
(36)
The special cases n = 1 of the last four formulas will be of primary interest and, for reference, are given separately:

ℒg′(x) = sψ(s) − g(0+),  (37)

ℒ ∫_0^x g(u) du = ψ(s)/s,  (38)
The properties discussed thus far are relatively simple and can be
established quite easily from the definitions. There are, however, two
important categories of results which are more difficult: (i) limiting theo-
rems, and (ii) inversion theorems. In each case there is a substantial body of
purely theoretical analysis, establishing exactly the conditions which are
necessary and sufficient. The proofs given here apply only to those situa-
tions which are needed in the sequel, although more general theorems are
stated without proof.
Limit Theorems†

∫_0^∞ e^{−sx} g′(x) dx = sψ(s) − g(0+).  (41)

lim_{x→∞} ∫_0^x g′(u) du = lim_{x→∞} [g(x) − g(0)].
Equating this result to the right side of Eq. (41), it is clear that the quantity
g(O) cancels, since it is independent of x and s, and thus the theorem
follows. 0
lim_{s→0+} sφ(s) = 1.

lim_{s→0+} φ(s) = 1.

lim_{s→∞} φ(s) = F(0+).
lim_{x→0+} g(x)/x^n = lim_{s→∞} s^{n+1}ψ(s)/n!

and

lim_{x→∞} g(x)/x^n = lim_{s→0+} s^{n+1}ψ(s)/n!
With all limit theorems it is necessary to take care that the various limits exist, since the existence of one limit does not guarantee that of another. For example, with g(x) = sin x, φ(s) = (1 + s²)^{−1}, and lim_{s→0+} sφ(s) = 0, but lim_{x→∞} g(x) does not exist.
The student should investigate the various limiting formulas for a simple probability distribution, say, f(x) = λe^{−λx}, F(x) = 1 − e^{−λx}. This distribution could then be compared with the mixed discrete and continuous distribution used as the example in Section 4.10.
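For the exponential distribution this check is immediate: the Laplace-Stieltjes transform of F(x) = 1 − e^{−λx} is φ(s) = λ/(λ + s), and the two end-point limits can be examined numerically (a sketch; the value of λ and the probe values of s are arbitrary):

```python
lam = 1.5
phi = lambda s: lam / (lam + s)   # Laplace-Stieltjes transform of 1 - e^{-lam x}

small, large = 1e-8, 1e8
v0 = phi(small)    # near 1: total probability, the s -> 0+ limit
vinf = phi(large)  # near 0: F(0+) = P(X = 0) = 0, no atom at the origin
```

The s → ∞ limit recovering F(0+) would instead give the mass of the atom for the mixed distribution of Section 4.10.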
Laplace transforms were originally used, and are still primarily used, to
solve differential equations by replacing nth derivatives by nth powers
according to Eq. (33) of Section 4.11. In probability, however, the principal
use of Laplace transforms is to replace n-fold convolutions by nth powers
according to Eq. (27) of Section 4.11. In either situation it may be desirable,
once the problem has been solved in terms of the Laplace transform, to
obtain the Laplace inverse, e.g., in the case of probability, the distribution
function. Unfortunately, the best-known and most general Laplace inversion formula,

g(x) = (1/2πi) lim_{R→∞} ∫_{c−iR}^{c+iR} e^{sx} ψ(s) ds,
λμ = k, vλ² = k.

Define a limit so that the mean remains fixed and the variance approaches zero; this can be done by replacing the parameters λ and k by μ and k, so that the density function becomes

f(x) = (k/μ)^k x^{k−1} e^{−kx/μ}/Γ(k).  (42)
f(x) = lim_{k→∞} [(−1)^{k−1}/Γ(k)] (k/x)^k [d^{k−1}φ(s)/ds^{k−1}]_{s=k/x}  (43)
Substituting this result into the right side of Eq. (43) gives
lim_{k→∞} ∫_0^∞ [(k/x)^k/Γ(k)] e^{−ku/x} u^{k−1} f(u) du,  (44)
with the same argument as used in Section 3.12; when the X_j are discrete, the result given in that section follows from

The student will verify that when the X_j are exponential with parameter λ, and N is geometric with parameter p, then S_N is exponential with parameter λ(1 − p).

The distribution of the X_j is called the stopped distribution and that of N, the stopping distribution.
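The exponential-geometric assertion can be checked by simulation. In the sketch below (Python; the parameter values, seed, and sample size are arbitrary), N ≥ 1 is geometric with P(N = n) = (1 − p)p^{n−1}, and the sample mean of S_N is compared with the exponential mean 1/[λ(1 − p)]:

```python
import random
import statistics

random.seed(42)
lam, p = 2.0, 0.4   # illustrative gap rate and geometric parameter

def sample_SN():
    # N >= 1 geometric: keep adding exponential gaps while a p-coin succeeds
    s = random.expovariate(lam)
    while random.random() < p:
        s += random.expovariate(lam)
    return s

samples = [sample_SN() for _ in range(200000)]
mean = statistics.fmean(samples)
# S_N should be exponential with parameter lam*(1-p), hence mean:
expected_mean = 1 / (lam * (1 - p))
```

The agreement of higher moments, or of the empirical tail with e^{−λ(1−p)t}, can be checked in the same way.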
Just as in Section 3.12, the method of marks can be used to obtain the
nested probability generating function equation.
An important special case is that obtained when N is Poisson distrib-
uted. Then
(45)
4.14. Problemst
1. On a line segment XY, two points are chosen independently and at random. If the points are P and Q, what is the probability that the segments XP, PQ, and QY can form a triangle?
2. A point is chosen at random on the base of an equilateral triangle of side A. Find the density function for the random variable X, defined as the distance from the chosen point to the opposite vertex.
Ans. (2x/A)(4x² − 3A²)^{−1/2}, (√3/2)A < x < A.
3. Consider a random variable with range (0, 2) and with the density function Kx(2 − x). Find K and determine whether the distribution is symmetrical about the mean.
4. A point is chosen at random in the unit square. What is the probability that the point lies within the triangle formed by the y axis, the diagonal x = y, and the horizontal y = 1? What would the probability be if it were given that the point fell within the triangle formed by the coordinate axes and the diagonal x + y = 1?
5. A random variable with range (0, 1) has density function kx²(1 − x³). Find the value of k and the expectation of the variable. Ans. E(X) = 9/14.
6. A random variable with range (2, 5) has density function k(1 + x). Find (i) k, (ii) P(X > 3), (iii) E(X).
†Problems 13, 28, and 40 are taken from Lindley (1965) (see p. 132) with kind permission of the publisher.
searching in the second place, and if the oil is there, the probability of finding it is 1 − e^{−k(T−t)}. Given that the probability of oil being in the first place is p, and in the second place, 1 − p, what is the probability that oil will be found? How should the time be divided between the two places to maximize the probability of discovery?
12. A point P is chosen at random on the diameter of a semicircle of unit radius
and a perpendicular is drawn to meet the semicircle in Q. Find the expected
length of PQ. Another point P' is chosen at random on the circumference of the
semicircle and a perpendicular to the diameter is drawn through P' to meet the
diameter in Q'. Find the expected length of P'Q' and explain why the two
results do not agree.
13. A point X is chosen at random on the unit line segment PQ. What is the
expected area of the rectangle with sides PX and XQ? What is the probability
that the area is greater than one-half?
14. Suppose a random variable has a density function which is symmetric about the value x = c. Show that the density function of aX + b is symmetric about the value ac + b.
15. Suppose a random variable X has range (0, ∞) and a random variable Y has range (−X, +X), with joint density

f(x, y) = k(x² − y²)e^{−x}.

Find the value of k, the marginal and conditional densities, and the expectations of X and Y.
16. Suppose the range for the distribution of X and Y is the unit circle, with joint density

Find the value of k, the marginal and conditional densities, and the expectations of X and Y. Ans. (8/3π)(1 − x²)^{3/2}, −1 < x < 1.
17. Consider independent random variables X and Y with densities 6x(1 − x) and 1, respectively, each being defined over the range (0, 1). Find the densities of (i) X + Y, (ii) X − Y.
18. Let X and Y be independently negative exponential with the same parameter λ. Show that the random variables X + Y and X/Y are independent.
19. Suppose random variables X and Yeach have range (0,00), with joint density
function
e ye
-'/" l-r.
u² − 2Xu + Y = 0

has real, distinct roots? Note: you need not evaluate the integral in the answer.
22. Let X and Y be independent negative exponentially distributed random variables with the same parameter. Show that X/(X + Y) is rectangularly distributed over the unit interval.
23. Show that the ratio of two independent negative exponential random variables with the same parameter has density function (1 + x)^{−2}. What is the range?
24. Random variables X and Y have joint density function
F( t) =0, for 0,
†See RÅDE, LENNART (1972), On the use of generating functions and Laplace transforms in applied probability theory, Int. J. Math. Ed. Sci. Technol. 3, 25-33.
29. Show that if the Poisson probabilities are truncated by normalizing to unity those beyond the value N − 1, the resulting probabilities can be written

Γ(N)e^{−λ}λ^x / [x! γ(N, λ)], x = N, N + 1, N + 2, ....
30. In Eq. (14), show that the moment generating function of the parameter-mixed distribution can be written in terms of the Laplace transformation of f(x).
31. Show that the Laplace transformation of
is given by
32. Let X be beta distributed with parameters p and q. Find the density of 1/X − 1. Ans. x^{p−1}(1 + x)^{−p−q}/B(p, q), 0 < x < ∞.
33. Show that if X and Y are independent, gamma distributed random variables
with parameters A, k and A, K, respectively, then X/(X+ Y) is beta distributed
with parameters k and K.
34. Prove

(i) (d^k/dx^k)[e^x γ(n, x)] = (−1)^k (1 − n)_k e^x γ(n − k, x),

(ii) (d^k/dx^k)[e^x Γ(n, x)] = (−1)^k (1 − n)_k e^x Γ(n − k, x).
35. Prove
(i)
(ii)
36. Prove

37. Prove

(i) Σ_{j=1}^∞ γ(j, λ)/Γ(j) = λ,

(ii)
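Identity (i) of Problem 37 can be checked numerically through the relation γ(j, λ)/Γ(j) = P(N ≥ j) for a Poisson variable N with mean λ, so that the sum is a sum of tails, i.e., E(N) = λ. A sketch (the value of λ is arbitrary):

```python
import math

lam = 3.7

def poisson_tail(j, lam):
    # gamma(j, lam)/Gamma(j) = P(N >= j) = 1 - sum_{x < j} e^{-lam} lam^x / x!
    cdf = sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(j))
    return 1 - cdf

# terms beyond j = 60 are negligible for lam = 3.7
total = sum(poisson_tail(j, lam) for j in range(1, 60))
```

The truncation point 60 is chosen so that the neglected tail terms are below floating-point resolution.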
38. If X is a continuous random variable with distribution function F(x), show that E(X) = ∫_0^∞ [1 − F(x)] dx.
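This sum-of-tails formula is easy to verify numerically for the exponential distribution, where 1 − F(x) = e^{−λx} and E(X) = 1/λ (a sketch; the parameter, step count, and cutoff are arbitrary):

```python
import math

lam = 0.8

def tail_integral(n=400000, upper=60.0):
    # midpoint rule for the integral of 1 - F(x) = e^{-lam x} over (0, upper)
    h = upper / n
    return sum(math.exp(-lam * ((i + 0.5) * h)) * h for i in range(n))

val = tail_integral()          # should be close to the mean 1/lam
```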
B(a, b) = 2 ∫_0^{π/2} (sin u)^{2a−1} (cos u)^{2b−1} du.
40. The probability density of the velocity V of a molecule with mass m in a gas at absolute temperature T is

with range (0, ∞), where β = m/2kT, k is Boltzmann's constant, and A is chosen for normalization to unity. Find the mean and variance of V.
41. The chi-square distribution, useful in statistics, has density function

x^{(n/2)−1} e^{−x/2} / [2^{n/2} Γ(½n)].

Show that this is actually a gamma distribution, and find the values of the parameters.
42. Show that

(ii) ∫_0^{π/2} (sin u)^r du = Γ(½(r + 1)) π^{1/2} / [2Γ(½r + 1)], r > −1.
B(a, b) = ∫_0^∞ u^{b−1} du / (1 + u)^{a+b}.
45. Show that

∫_0^∞ [Γ(k, λx)/Γ(k)] dx = k/λ.
n = 0, 1, 2, ....

These will be called the gaps of the point process. When the point process is considered to be the transition points of a continuous time, stochastic process, the gaps are sometimes also called the sojourn times of the process.

It is clear that a point process can be specified either by giving a mechanism for determining the points τ_n or, alternatively, by giving a mechanism for determining the gaps σ_n. Sometimes it will be convenient to do one, and sometimes the other. Indeed the relationship between the two formulations will play an important role in the theory.
In simple cases, the process which specifies the gaps or points of
transition of a stochastic process is independent of the process which
specifies the transitions. This is by no means always the case. There are
situations in which the nature of each succeeding gap will depend on the
state of the system at the beginning (or even at the end) of that gap.
Notation
Let X(t) denote the state of the system at time t. Then the double
subscript probability with two time values
For a Markov process, the exponential parameter is not necessarily the same
for each gap. The unconditional distribution will also be denoted by the
Continuous Time Processes 185
Unless otherwise specified, it will be assumed that the state space of the
process is the non-negative integers 0, 1,2, ....
As in Chapter 3, the initial distribution will be denoted by the letter a:
f(x) = lim_{Δx→0} P(x < X < x + Δx | x < X)/Δx
  = lim_{Δx→0} [P(x < X < x + Δx)/Δx] · 1/P(x < X)
  = g(x)/[1 − G(x)],  (7)
or, alternatively,

f(x) = −(d/dx) log[1 − G(x)].  (8)
(9)
and hence that a renewal process has constant failure rate if and only if it is
a Poisson process. This property corresponds to the lack of aging property
of the negative exponential distribution. According to either interpretation,
a transition point is equally likely to occur at any instant, quite indepen-
dently of the length of time since the last transition point.
In a renewal process, if f( x) is an increasing function, the process is
said to have positive aging, if decreasing, negative aging. The student should
examine the relationship between f( x) and the parameters of the gamma
distribution which define an Erlang process (Problem 28). What does this
imply for the aging of a deterministic process?
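The failure rate f(x) = g(x)/[1 − G(x)] of Eq. (7) is easily tabulated. The sketch below (Python; the parameter λ = 2 and the sample points are arbitrary) contrasts the constant failure rate of the negative exponential with the increasing failure rate, i.e., positive aging, of a two-stage Erlang gap:

```python
import math

def hazard_expo(x, lam):
    # g(x)/[1 - G(x)] with g = lam e^{-lam x}, 1 - G = e^{-lam x}
    g = lam * math.exp(-lam * x)
    tail = math.exp(-lam * x)
    return g / tail                       # constant, equal to lam

def hazard_erlang2(x, lam):
    # gamma with k = 2: g = lam^2 x e^{-lam x}, 1 - G = (1 + lam x) e^{-lam x}
    g = lam * lam * x * math.exp(-lam * x)
    tail = (1 + lam * x) * math.exp(-lam * x)
    return g / tail                       # increasing toward lam

h1 = [hazard_expo(x, 2.0) for x in (0.5, 1.0, 3.0)]     # all equal to 2.0
h2 = [hazard_erlang2(x, 2.0) for x in (0.5, 1.0, 3.0)]  # strictly increasing
```

The Erlang failure rate λ²x/(1 + λx) rises from 0 toward λ, which is the positive-aging behavior asked about in Problem 28.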
When x=O, the argument needs to be modified. The student should fill in
the details to show that
(12)
Using the generating function technique, Eqs. (11) and (12) can be com-
bined in the following form:
where
00
φ(s, t) = exp[−(1 − s)λt],
then
Let
Using the same kind of argument as in Section 5.3, the differential equation in p_1(t) can be built up by conditioning on the state of the system at time t as follows:

(14)
system begins in state 1, (iii) the system has probability π of beginning in state 0 and probability 1 − π of beginning in state 1.

It will be noted that, independent of the initial state of the system, the probabilities evaluated as t → ∞ are proportional to the coefficients λ and μ. The meaning of this result will be discussed in more detail in Section 5.8.
it follows that
With this system of notation for the infinitesimal probabilities p_{xy}(Δt), which incorporates the Markov property, it is possible to compute, in the general case, the differential difference equation in much the same style as has already been done for the Poisson and the two-state processes. The basic idea is to divide the time period (0, t + Δt) into periods of length t and Δt and form the derivative:

p_{xy}(t + Δt) = Σ_k p_{xk}(t) p_{ky}(Δt)

p′_{xy}(0) = λ_{xy}.
It will be noted that λ_{xx} is not yet defined; now it will be taken to be p′_{xx}(0). The system of differential equations (18) is known as the forward differential equation system of the process. An analogous system of backward differential equations can be obtained by beginning with t and Δt interchanged.
In either case, the solutions of the differential difference equations are
obtainable only in certain special cases. Two of these have already been
investigated:
The Poisson Process
λ_{xy} = 0, y ≠ x, x + 1.
The Two-State Process
λ_{0,1} = λ,
The student should work out the details of these processes as special cases
of the Markov process.
x=I,2,3, ... ,
(19)
n=N,
The student should carry out the calculations leading to this conclusion,
and, by investigation of the negative binomial mean for this case, show that
the population mean is exponential with time.
5.6. Equilibrium
Events like the occurrence of a birth and a death in time Δt have probability o(Δt), and there is no change with probability
leading to
x= 1,2,3, ... ,
(20)
(21)
The necessary condition for the equilibrium is therefore that this sequence sums to 1 − p_0, i.e., that the series

p_0^{−1} − 1 = Σ_{j=0}^∞ (λ_0 λ_1 ⋯ λ_j)/(μ_1 μ_2 ⋯ μ_{j+1})  (22)

converges.
A special case of the birth-and-death process is the Markov queue: λ_x = λ, μ_x = μ. The random variable represents the number in the system (queueing and being served), while λ represents the rate of arrival to the system, and μ, the rate of service. Then, writing ρ = λ/μ,

p_0 = 1 − ρ, p_x = (1 − ρ)ρ^x,

a geometric distribution.

For a queue of this type, the variance is always larger than the mean and approaches infinity with saturation (ρ → 1) more rapidly. It is therefore quite sensible to "come back later" when the queue seems too long to be tolerable.
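The geometric equilibrium distribution and the claim about its variance can be checked directly; for the Markov queue the mean is ρ/(1 − ρ) and the variance ρ/(1 − ρ)². A sketch (ρ = 0.75 and the truncation at 2000 terms are arbitrary):

```python
rho = 0.75                                       # traffic intensity lam/mu < 1
p = [(1 - rho) * rho**x for x in range(2000)]    # p_x = (1 - rho) rho^x

total = sum(p)                                   # should be 1 (rho^2000 is negligible)
mean = sum(x * px for x, px in enumerate(p))     # rho/(1-rho) = 3
var = sum(x * x * px for x, px in enumerate(p)) - mean**2   # rho/(1-rho)^2 = 12
```

As ρ → 1 both moments diverge, the variance like (1 − ρ)^{−2}, i.e., faster than the mean.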
(23)

with the equilibrium solution p_0(∞) = 1. Thus the probability of extinction is 1, and the zero state is absorbing. Some further results on the linear birth-and-death process are given in Problems 36-38 of Section 5.15.
can be interpreted as the probability that there are no marked points in the interval (0, t] when each point is marked independently with probability 1 − s.
Poisson Process
To have no marked points in (0, t + Δt] requires no marked points in (0, t] and either no point in (t, t + Δt) or else a point which is not marked:
which leads directly to the differential equation for the Poisson process
(Section 5.3).
The method of marks can also be adapted to apply to Laplace transformations. Consider a renewal process with gap density g(x), and suppose the marks are points of a completely independent Poisson process with parameter s. Then, beginning at the beginning of a gap (or at any other point), the probability of no mark before time t is e^{−st}, i.e., the tail of the negative exponential distribution. Thus the Laplace transform

φ(s) = ∫_0^∞ e^{−sx} g(x) dx

represents the probability that no mark occurs during the gap.
which is equivalent to

ψ(s) = (1 − p)φ(s) + pφ(s)ψ(s).  (25)

Note in this derivation the key fact that if the point terminating the gap X is dropped, then the interval Y consists of independent intervals X and Y, the latter beginning with the dropped point. Thus
(27)
Several properties of this matrix shed light on the process. Since diagonal elements p_{xx}(t) have the value +1 at t = 0 and decline to zero at t = ∞ (by the property of negative-exponential sojourn), λ_{xx}, the slope at the origin, must always be negative. Similarly, since for nondiagonal elements p_{xy}(t) is monotonically increasing from the value zero at the origin, nondiagonal λ_{xy} are always positive. Furthermore, each row sum in the matrix must be zero, since it is the derivative of Σ_y p_{xy}(t) = 1.
Writing the Markov transition probabilities also in matrix form,
and
†ERDÉLYI, A., ed. (1954), McGraw-Hill, New York, p. 229, Eq. (4).
Then

P(T > Δt) = 1 − βΔt + o(Δt).

But P(T > t), the probability that the sojourn time exceeds t, is the probability that a process in state x does not change to another state in time t:

and therefore, given that the system is in state x, the negative exponential parameter is β = −λ_{xx}.
Neglecting the sojourn times, the process behaves like a discrete time Markov chain, with the transition matrix having zeros on the diagonal and −λ_{xy}/λ_{xx} elsewhere. The Poisson continuous time Markov infinitesimal matrix

−λ   λ   0   0   ⋯
 0  −λ   λ   0   ⋯
 0   0  −λ   λ   ⋯
 ⋮

thus corresponds to the chain whose transition matrix has ones on the superdiagonal and zeros elsewhere.
However, it is not true that every discrete time Markov chain can be regarded as a continuous time Markov process simply by introducing negative exponential sojourn times, since there would be no provision for the case where a chain goes from state x to state x. Introducing two negative exponential sojourns with parameter λ_{xx} would destroy the Markov property of the process, since the sum of the two sojourns would be gamma distributed and would represent the time between transitions.
In this correspondence there is an interesting difference between the stationary vector {π_x} of the chain and the equilibrium vector {p_x} of the process.
Which two-state Markov chain does this correspond to? Since the transition probabilities were proportional to λ and μ, the first guess might be the chain with matrix

which indeed does have the equilibrium probabilities given above. But it could be argued that the "right" discrete time analog would be a chain with matrix

0  1
1  0

since there is no probability of a transition to the same state in the continuous case, and for this chain the equilibrium probabilities are (½, ½).
The problem of which chain corresponds to the Markov two-state process is
in fact exactly the "stick problem" of Section 2.2, with the sticks being the
sojourn times. The model leading to equiprobable equilibrium corresponds
to "choosing a stick" (i.e., a state), while the other model is "length biased"
according to sojourn length. In this light, it is natural that the continuous
time model should agree with the length-biased discrete time model.
In Section 5.10, the relationship will be examined more systematically,
and a formula connecting the two distributions will be established.
It is worth noting that the two distributions agree in this instance when λ = μ, that is, when the point process is Poisson. This is true more generally and merely reflects that Poisson points are "at random," and so probabilities evaluated at these points are not length biased towards either state.
The notation of this section is consistent with that of Sections 5.2 and 5.3: Let N(t) represent the number of renewals in the interval (0, t], with gap density g(x), distribution function G(x), and mean gap γ. Thus N(t) is the random variable defining the counting distribution for the renewal process. The expected value of this random variable is a function of t and is called the renewal function:
Thus

because the convolution g^{x*}(u) is the distribution of τ_x = σ_0 + σ_1 + ⋯ + σ_{x−1}, a sum of x independent, equidistributed random variables. Since the mean can be expressed as the sum of tails (cf. Problem 37, Chapter 1),
β(s) = φ(s)/{s[1 − φ(s)]}  (32)

as the relation between the gap and the renewal function Laplace transforms. For some purposes, it is convenient to consider the derivative
of the renewal density is just sβ(s), a still simpler formula results:

α(s) = φ(s)/[1 − φ(s)].  (33)

m(t) = g(t) + ∫_0^t g(u) m(t − u) du,

φ(s) = α(s)/[1 + α(s)].  (34)
specified by the renewal function or the renewal density. This in turn shows
the importance of the renewal function.
There are two important theorems relating to N(t) and M( t), the
proofs of which are beyond the scope of this book, but which are intuitively
clear:
lim_{t→∞} M(t)/t = 1/γ.  (35a)

Theorem 2

lim_{t→∞} N(t)/t = 1/γ.  (35b)
It should be noted, however, that the first of these theorems follows from the (unproved) n = 1 generalization of Theorem 2, Section 4.11:

lim_{t→∞} m(t) = 1/γ,
Although the emphasis so far has been towards denying any paradox in
this situation, it might be good at this stage to explain why the word
paradox is sometimes used. Suppose the renewal process refers to light
bulbs. Then, looking at the light bulb which is now burning-if one is
justified in regarding "now" as an arbitrary point in time-is not equivalent
to looking at a typical light bulb. The bias is in the direction of especially
long-lived light bulbs, since "now" is more likely to occur during one of the
long lives than during one of the short ones, just as in the case of the stick
problem.
In this section the problem of a "time-typical" interval, that is, the gap
surrounding an arbitrary point, will be more precisely formulated and
solved. It should be understood, in the first place, that the solution must
involve some kind of limiting procedure. This is because an "arbitrary" point in time can only be chosen over a finite domain (rectangular distribution). It will therefore be necessary to choose a point t, solve the problem, and then let t → ∞.
Three random variables are of interest: X(t), the time back to the most recent renewal; Y(t), the time forward to the next renewal; and Z(t), the time from the most recent renewal to the next renewal. Although Z(t) = X(t) + Y(t), it should be noted that the distribution of Z(t) cannot be obtained as the convolution of the separate distributions, since X(t) and Y(t) are not independent. Also, the ranges of Y(t) and Z(t) are (0, ∞), while that of X(t) is (0, t).
Depending on the application, these random variables have been given
various names, as shown in Table 5.1. Using this terminology, in the simple
stick case of Section 2.2, it would be appropriate to say that (-r, 1) is the gap
distribution, and ( t, t) is the spread distribution.
g(x)[M(t) − M(t − x)], t > x,
  (36)
g(x)[1 + M(t)], t ≤ x.
C(x|t) = Σ_{r=0}^∞ P(τ_{r+1} − τ_r < x, τ_r < t < τ_{r+1})
  = Σ_{r=0}^∞ P(σ_r < x, t − σ_r < τ_r < t).

Since σ_r and τ_r are independent, differentiation with respect to x yields, for t > x,

c(x|t) = g(x) Σ_{r=0}^∞ ∫_{t−x}^{t} g^{r*}(u) du = g(x)[M(t) − M(t − x)],

and, for t ≤ x,

c(x|t) = g(x) Σ_{r=0}^∞ ∫_0^t g^{r*}(u) du = g(x)[1 + M(t)].
where, using the convention of Section 4.10 for mixed discrete and continuous distributions, δ(t − x) = 1 when t = x, and δ(t − x) = 0 otherwise.
When t falls in the interval (τ_r, τ_{r+1}), the value of X(t) is stochastic, namely, X(t) = t − τ_r. Therefore, omitting for the moment the first term,

A(x|t) = Σ_{r=1}^∞ P(X(t) < x, τ_r < t < τ_{r+1})
  = Σ_{r=1}^∞ P(τ_r > t − x, σ_r > t − τ_r)

a(x|t) = Σ_{r=1}^∞ g^{r*}(t − x)[1 − G(x)] = m(t − x)[1 − G(x)]
g(t + x) + ∫_0^t m(t − u) g(x + u) du.  (38)
B(x|t) = Σ_{r=0}^∞ P(Y(t) < x, τ_r < t < τ_{r+1})

  = ∫_0^∞ Σ_{r=0}^∞ P(t < σ_r + u < t + x) g^{r*}(u) du

  = ∫_0^∞ Σ_{r=0}^∞ ∫_{t−u}^{t+x−u} g(v) g^{r*}(u) dv du.
b(x|t) = ∫_0^∞ Σ_{r=0}^∞ g(t + x − u) g^{r*}(u) du.

Since g^{0*}(u) = δ(u), the first term becomes g(x + t) and the remaining terms can be written
(1/γ)[1 − G(x)],  (1/γ)[1 − G(x)],  xg(x)/γ,  (39)
Proof. These results follow directly from the first three theorems and the limits for m(t) and M(t) given in Section 5.9. The three limiting distributions will be denoted by a(x), b(x), and c(x). □
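The length-biased limiting spread distribution c(x) = xg(x)/γ can be observed by simulation: inspect a renewal process at a fixed late time and record the gap covering the inspection point. In the sketch below (Python; uniform gaps on (0, 1), so γ = 1/2 and E(X²) = 1/3, with seed, inspection time, and sample size arbitrary), the observed mean spread approaches E(X²)/γ = 2/3 rather than the typical gap mean 1/2:

```python
import random
import statistics

random.seed(7)

def spread_at(t, gap):
    # run the renewal process past t and return the length of the
    # gap that covers the inspection time t
    s = 0.0
    while True:
        x = gap()
        if s + x > t:
            return x
        s += x

gap = random.random            # uniform(0, 1) gaps: mean 1/2, E[X^2] = 1/3
obs = [spread_at(20.0, gap) for _ in range(50000)]
mean_spread = statistics.fmean(obs)   # near E[X^2]/E[X] = 2/3, not 1/2
```

The excess of the observed mean over 1/2 is exactly the "light bulb" bias described above.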
[1 − φ(s)]/(γs)  (40)
(i) M(t) = λt,
(ii) m(t) = λ,
(vi) a(x) = b(x) = g(x) = λe^{−λx},
(42)
(43)
†In this section, and in many which follow, Laplace transforms (of time) occur together with probability generating functions. As an aid to memory, the variable z will be used for the probability generating functions, and s for the Laplace transforms. Thus the variables s, t correspond through the Laplace transform, and the variables x, z correspond through the probability generating function. In functional form, x is written as a subscript, and the other three in parentheses. When z occurs with s or t, it is written before the comma, with the s or t variable after the comma. Naturally, this system does not apply in cases where Laplace transforms are represented in terms of probability generating functions, such as will occur in Chapter 6 (for example, Section 6.6). In these latter cases, the Laplace transform is not with respect to a time variable, but with respect to a continuous density function, and the letter s is retained for both the probability generating function and the Laplace transform.
[1 − φ(s)]/{s[1 − zφ(s)]},  (44)

M(t) = ℒ^{−1} φ(s)/{s[1 − φ(s)]}.  (45)
In the asynchronous case, the calculations are much the same, starting
with
(46)
and
leading to
π̄(z, t) = ℒ^{−1} { 1/s + [1 − φ(s)](z − 1)/(γs²[1 − zφ(s)]) },  (48)

M̄(t) = t/γ,  (49)
These formulas are selected from many which relate the synchronous
and asynchronous counting distributions to the gap distribution and to one
another. Others are given in the problem list for the chapter.
In the terminology of renewal theory, M̄(t) would be called the renewal function for an equilibrium renewal process, and M(t) the renewal function for an ordinary renewal process.
The emphasis placed in this section on two particular types of counting procedures should not obscure the fact that the general renewal counting distributions, based on an arbitrary initial gap, are also of importance. One such process will be discussed in connection with particle counters, in which the first gap has transform λ/(λ + s) and the subsequent gaps have transform λe^{−sΔ}/(λ + s). These two transforms do not satisfy the relationship between gap and excess given by Theorem 5 of Section 5.10, on which the definition of asynchronous counting is based.
φ(s) = [λ/(λ + s)]^k,

so that the Laplace transform of the renewal density m(t) is

α(s) = λ^k/[(λ + s)^k − λ^k].  (50)
Inversion of this transform is usually not easy, and for large values of k it is even necessary to use numerical methods. For k = 2, however, there is a simple solution:

α(s) = λ²/[s(s + 2λ)] = (λ/2)[1/s − 1/(s + 2λ)].

Expanding into partial fractions or using the theorem of Section 4.12, the student will be able to invert this Laplace transformation to obtain

m(t) = (λ/2)(1 − e^{−2λt}).  (51)

From this result, it is easy to see by integration that the renewal function for the two-stage Erlang process is

M(t) = λt/2 − (1 − e^{−2λt})/4.  (52)
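The closed form M(t) = λt/2 − (1 − e^{−2λt})/4 obtained from (50) with k = 2 can be compared with a direct simulation of the two-stage Erlang renewal process, counting renewals up to a fixed time (a sketch; λ, t, the seed, and the number of runs are arbitrary):

```python
import math
import random
import statistics

random.seed(1)
lam = 1.0

def count_renewals(t):
    # each gap is the sum of two independent exponentials (Erlang-2)
    n, s = 0, 0.0
    while True:
        s += random.expovariate(lam) + random.expovariate(lam)
        if s > t:
            return n
        n += 1

t = 3.0
M_sim = statistics.fmean(count_renewals(t) for _ in range(40000))
M_formula = lam * t / 2 - (1 - math.exp(-2 * lam * t)) / 4
```

For large t the formula is close to t/γ − 1/4 with γ = 2/λ, consistent with the limit theorems of Section 5.9.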
(53)
Let
Then
M(t) = Σ_{j=1}^∞ γ(kj, λt)/Γ(kj).  (56)
Q̄_x(t) = (1/γ) Σ_{j=(x−1)k}^{xk−1} ∫_0^t B_j = (1/k) Σ_{j=(x−1)k+1}^{xk} A_j,  (57)
from which the student can obtain p_x(t) and the other quantities characterizing the distribution.
(59)
and the counting distributions by an upper delta: p_x^Δ(t), p̄_x^Δ(t), P_x^Δ(t), P̄_x^Δ(t), Q_x^Δ(t), Q̄_x^Δ(t).
Synchronous Counting
By Section 4.11,

Q_x^Δ(t) = ℒ^{−1} e^{−sΔ(x+1)} φ^{x+1}(s)/s.  (60)

The Laplace inverse of φ^{x+1}(s)/s is simply Q_x(t), and powers of s have the effect of differentiating the inverse, since Q_x(0) = 0. Therefore
(62)
This is a rather curious distribution, in that the argument on the right side
becomes negative when t < Δ(x + 1). In fact, since Q_x^Δ(t) represents the
probability of more than x events beginning just after an event, such a count
will be impossible when t < Δ(x + 1). Therefore the result should be stated in
Continuous Time Processes 215
full form:

Q_x^Δ(t) = { Q_x(t − Δ(x + 1)),  t > Δ(x + 1),
           { 0,                  t ≤ Δ(x + 1).
Then the exact probabilities can be obtained from the values for the tails as
follows, where B_j^i denotes e^{−λ(t−iΔ)}[λ(t−iΔ)]^j/j!:

p_0^Δ(t) = { B_0^1,  t > Δ,
           { 1,      t ≤ Δ,

p_2^Δ(t) = { B_0^3 + B_1^3 + B_2^3 − B_0^2 − B_1^2,  t > 3Δ,
           { 1 − B_0^2 − B_1^2,                      2Δ < t ≤ 3Δ,
           { 0,                                      t ≤ 2Δ,

and, in general,

p_x^Δ(t) = { Σ_{j=0}^{x} B_j^{x+1} − Σ_{j=0}^{x−1} B_j^{x},  t > (x+1)Δ,
           { 1 − Σ_{j=0}^{x−1} B_j^{x},                      xΔ < t ≤ (x+1)Δ,
           { 0,                                              t ≤ xΔ.
Σ_{j=0}^∞ p_j(t) = 1 − p_∞(t)  (65)
would merely mean that for normalization to unity, the special value must
also be taken into account.
An important example of a divergent process is the birth process
treated in Section 5.5, where the coefficients λ_x are assumed to increase
more rapidly than the simple linear expression leading to the Yule process.
Let Z be the (finite or infinite) time required for an infinite number of
transitions. Then the probability of a divergent process is the probability
that Z is finite, that is, P(Z < ∞); in other words, the distribution function for
Z represents the probability that an infinite number of transitions have
taken place before time t. The density function for Z is
When the series converges, so will the infinite product, and this convergence
is the necessary condition that Pw(t) be nonzero.
There is a theorem of algebra† to the effect that a necessary and
sufficient condition for the convergence of the infinite product (66) is the
convergence of the series Σλ_j^{−1}. Therefore the process will be divergent when

Σ_{j=1}^∞ 1/λ_j < ∞.  (67)
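The criterion can be illustrated numerically. Taking λ_j = j² (so that Σ1/λ_j = π²/6 < ∞), the time Z to infinitely many transitions is a sum of independent exponential holding times with rates j², and its expectation is π²/6. The Python sketch below truncates the infinite sum at 500 terms — an assumption introduced only for the computation:

```python
import math
import random

rng = random.Random(3)

def explosion_time(n_terms, rng):
    """Z truncated to its first n_terms holding times; with rates j**2 the
    neglected tail has mean less than 1/n_terms."""
    return sum(rng.expovariate(j * j) for j in range(1, n_terms + 1))

reps, n_terms = 2000, 500
mean_z = sum(explosion_time(n_terms, rng) for _ in range(reps)) / reps
print(round(mean_z, 2), round(math.pi**2 / 6, 2))
```

The empirical mean explosion time should be close to π²/6 ≈ 1.64.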
β(s) = k_0/s + Σ_{j=1}^∞ k_j/(s + λ_j),  (68)
†See, for example, BROMWICH, THOMAS JOHN I'ANSON (1908), An Introduction to the Theory of
Infinite Series, Macmillan & Co., London. The proof of the theorem consists essentially of a
close examination of the logarithm of the infinite product.
k_0 = 1.
The problem of finding p",( t) thus depends on evaluating these limits, which
in some cases can be rather difficult.
Evaluating the limits term by term gives

k_n = (−1)^n Ω / [λ_n(λ_{n−1} − 1)(λ_{n−2} − 1) ⋯ (λ_1 − 1)],  (70)

where

Ω = Π_{j=1}^∞ (1 − λ_j)^{−1}.
†This rapidly converging series is given in ERDÉLYI, A. (1955), Higher Transcendental Functions, Vol. 3, McGraw-Hill, New York, p. 177.
important processes not treated in the present book are Markov renewal
processes (see Çinlar, Chapter 10 and Ross, Chapter 7), martingales (see
Karlin and Taylor, Chapter 6), and processes based on the normal distribution
(Gaussian, Wiener, etc.) [see Hoel, Port, and Stone (1972) and Cox and
Miller (1965)].
5.15. Problems†
1. Using the methods of Section 5.10, find the joint distribution of X(t) and Y(t),
namely, density function
Use this result to prove Theorems 2 and 3 by direct integration and Theorem I
by integration over the domain where x+y=constant.
2. Consider a pure birth process in which the infinitesimal probabilities of a birth
in Δt are λΔt + o(Δt) when there are an odd number alive at the beginning of
the interval, and μΔt + o(Δt) when there are an even number alive at the
beginning of the interval. Define
Obtain the differential equations for A(t) and B(t) and solve.
3. The lifetime of a piece of machinery defines a renewal process with the usual
notation: g, φ, etc. The machine is installed and working at t=0, and inspected
tProblem 6 is taken from Karlin and Taylor (1975), and Problems 9 and 10 are taken from
Ross (1972), with kind permission of the publishers.
Z = lim Z_n,

that is, the birthday age at which the machine breaks. (i) Show that Z_n is a
Markov chain. (ii) Find the transition matrix in terms of g, G, φ, etc. (iii) Show
that P(Z = x) = G(x + 1) − G(x).
4. In a Poisson process with parameter A, suppose that N events occur in time t.
Find the density function for X, the time to the nth event (n<N).
Ans. X/t is beta distributed.
5. Two machines are working, each with negative exponential lifetimes and param-
eter A. When one fails, a replacement is furnished, the time to replace being also
negative exponential with parameter /L. Let X(t) be the number of working
machines at time t, state space (0, 1, 2). (i) Find the infinitesimal transition
matrix. (ii) Find the forward equations. (iii) Solve the forward equations to find
the transition probabilities P_xy(t).
6. Consider a Yule process with parameter A and initial state N= I. Suppose the
probability of death of the original ancestor in time Δt is βΔt + o(Δt), given
that he is living at time t. Find the distribution of the number x of offspring
from the single ancestor and his descendants at the time of his death.
Ans. (β/λ)B(β/λ + 1, x + 1)
7. Suppose N identical balls are distributed into two boxes. A ball in box A (box
B) remains there for a negative exponential time with parameter λ (parameter μ) and
goes to the other box. The balls act independently. Let X(t) denote the number
of balls in box A at time t. Then X(t) is a birth-and-death process defined over
0, 1, ..., N. (i) Find the birth and death rates. (ii) Find P_xN(t). (iii) Find E(X(t)).
8. In a birth-and-death process, λ_x = (1 + x)^{−1} and μ_x = μ. Write down the forward equations.
(∫₀^T x f(x) dx + T[1 − F(T)])^{−1}
10. It might appear that the finiteness of m(t) would follow from the fact that N(t)
is finite. However, such reasoning is not valid, as the following example shows.
Let Y be a random variable with P(Y = 2^n) = (1/2)^n, n = 1, 2, 3, .... Then P(Y < ∞)
= 1, but E(Y) = ∞, as the student should show.
11. Consider two independent Poisson processes. Show that the distribution of the
number of events in one process which fall between two consecutive events in
the other process is geometric.
12. Consider a Poisson process with parameter A. Find the distribution of the
number of Poisson points which occur in an independent interval T which is
gamma distributed with parameters μ and k.
13. A machine breaks when N shocks have been received. If the shocks occur at
times which form a Poisson process with parameter A, find the density function
for the lifetime of the machine.
14. Let X(t) be a Markov process with state space (0,1) and transition probabilities
(i)  Q̄_x(t) = (1/γ) ∫₀ᵗ P_{x−1}(u) du,

(ii)  [π̄(z; t) − 1]/(z − 1) = (1/γ) ∫₀ᵗ π(z; u) du,

(iii)  dπ̄(z; t)/dt = (z − 1)π(z; t)/γ.
19. For the Erlang process with k = 2, obtain the renewal density function from the
renewal integral equation.
20. For the Erlang process with k=2, find the exact and asymptotic distributions of
spread, excess and deficit.
21. Show that the two examples of divergent birth processes given in Section 5.14
lead respectively to the following equations for the probability generating
function of the "finite" probabilities pAt), x=O, 1,2, ... :
(i) For λ_j = λj²,

∂φ(z, t)/∂t = −λ(1 − z)(z² ∂²φ/∂z² + z ∂φ/∂z).

(ii) For λ_j = λ^j,

∂φ(z, t)/∂t = −(1 − z)φ(λz, t).
22. Referring to Theorem 4 of Section 5.10, let the limiting moments of the
distribution of y(t) be
ν_n = ∫₀^∞ x^n [1 − G(x)]/γ dx,

and let the corresponding quantities for the gap distribution be μ_n. Show that

ν_n = μ_{n+1}/[(n + 1)μ_1].
23. Show that for displaced negative-exponential gaps (Section 5.13) the asymptotic
distribution of excess (or deficit) given in Section 5.10 has Laplace transform
[λ(λ + s) − λ²e^{−sΔ}] / [s(1 + λΔ)(λ + s)],
from

P_{xy}(t + Δt) = Σ_j P_{xj}(Δt) P_{jy}(t).
29. In thinning an Erlang process, find the limiting form of the thinned process as
the variance of the gap distribution approaches zero.
30. In the discussion of renewal theory, it has been assumed that the lifetimes are
finite. What modifications would need to be made if G(oo)< I?
that is, N is the maximum number of gaps that will fit into the counting period:
N = 0, 1, 2, .... Find the probability that there are x points in t, assuming that t is
equally likely to begin anywhere in a gap. Show that the probability generating
function of the counting distribution is
(z^N/Δ)(Δ − t + NΔ + zt − zNΔ).
37. By solving the equation of Problem 36, or directly from Eq. (23), show that the
nonequilibrium linear birth-and-death probabilities for A=/L are given by
p_x(t) = (λt)^{x−1} / (1 + λt)^{x+1},  x = 1, 2, 3, ....
38. Consider a linear birth-and-death process with a finite number of states n, so
that λ_x = (n − x)λ, μ_x = μx, 0 ≤ x ≤ n. Write down the basic differential equations of the process, and show that the equilibrium probabilities form a
binomial distribution.
6

The Theory of Queues
†The expression "stochastic service systems," due to John Riordan, expresses the idea of a
queue quite well.
‡David George Kendall, English mathematician, 1918–
226 Chapter 6
This section begins with the equations for the queue as a birth-and-death
process, specializing those obtained in Section 5.6:
φ(z, t) = Σ_{j=0}^∞ p_j(t) z^j
Table 6.1. Queueing notation (this system does not apply in Chapter 5)

Name                          Random    Density  Proba-    Distribution  Laplace    Generating  Mean  Variance
                              variable           bilities  function      transform  function
Interarrival time                       d(x)                             δ(s)                   1/λ
Service time                  U         b(x)               B(x)          β(s)                   1/μ   v
Number of departures in an
  interarrival period                            r_x                                η(s)
Number of arrivals in a
  service period              N                  q_x                                θ(s)
Number in the system          X                  p_x                                φ(s)
Number queueing               Y
Queueing time                 V         a(x)               A(x)          α(s)
Waiting time                  W         c(x)               C(x)          γ(s)
Residual service time         Ū                                          β̄(s)
Discrete busy period          K                  h_x                                κ(s)
Continuous busy period        L         f(x)                             σ(s)
Balking                       X′                 g_x       G_x
The Theory of Queues 229
for 0 ≤ z ≤ 1. Let the original number in the system be n, so that p_n(0) = 1,
φ(z, 0) = z^n. It is convenient to reduce the differential equation to an
algebraic equation by using the Laplace transform:
(3)
(4)
and
The first task is to evaluate π(s), and so represent the transform ψ(z, s)
in terms of s, z, and the parameters λ, μ, and n. To do this, the student
should note that the function is convergent for z < 1 and that ξ is the zero
less than one. Therefore ξ must also be a zero of the numerator, and thus
(5)
so that the explicit form for the transform of the probability generating
function is
(6)
It is not easy to invert this expression with respect to the variable s, and
expansion in terms of z is also rather complicated. However, the student
specializing in transforms and higher transcendental functions may be able
to show from this equation that each of the probabilities can be expressed as
a finite series of Bessel functions of imaginary argument.† Instead of
pursuing the formal development further, some of the important special
cases will be discussed in the next sections.
When λ > μ (i.e., ρ > 1), the number present in the system tends to
increase without limit, but does so in a stochastic manner, so that the
possibility exists for decrease from time to time. The states of the system are
all transient, in that the probability of a revisit tends to zero. Therefore the
probability of the system being in any state tends to zero, and this fact can
be inferred from the equations of Section 6.2.
It is not, however, useless to inquire about the evolution of the system.
Depending on how great the discrepancy between λ and μ, the time spent in
various states may vary considerably. For a given state x, define the
indicator random variable (Section 1.5) I( t):
I(t) = { 1,  when X(t) = x,
       { 0,  otherwise.
Then
†Details can be found in Cox, D. R., and MILLER, H. D. (1965), The Theory of Stochastic
Processes, Methuen, London, p. 194 ff.
is the expected total time spent in state x. Call this quantity T(x). Then
T(x) = ∫₀^∞ E(I(t)) dt.  (7)
But the expected value of I(t) is simply the probability P_x(t) of the queue
being in state x, so that
Rather than evaluating this integral for each x, it is more natural to evaluate
all of the integrals by finding
and this is the same as 1/;(z,O). Therefore an investigation of the expected
duration in the transient states begins with setting s = 0 in the formulas of
Section 6.2.
With s = 0, consider the equations for ξ and η. The radical becomes
|λ − μ|, and it is at this stage that the assumption ρ > 1 comes into the
picture, giving the values

ξ = 1/ρ,  η = 1.

Then

ψ(z, 0) = [ρ^{n+1}z^{n+1} − ρ^n z^{n+1} − 1 + z] / [λρ^{n−1}(ρz − 1)(1 − z)(ρ − 1)].  (8)
T(x) = 1 / [λρ^{n−x−1}(ρ − 1)],  x = 0, 1, ..., n,  (10)

and

T(x) = 1 / (λ − μ),  x = n + 1, n + 2, n + 3, ....

Note that if the queue begins empty (n = 0), the expected time spent in each
state is (λ − μ)^{−1}.
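A simulation makes the result concrete. With λ = 2, μ = 1 (so ρ = 2) and the queue started empty, the total time spent in each state should average 1/(λ − μ) = 1. The Python sketch below follows each sample path until it is far enough above the states of interest that a return is negligibly likely; the cutoff of 30 levels is an arbitrary computational choice:

```python
import random

rng = random.Random(4)
lam, mu = 2.0, 1.0      # rho = 2 > 1, all states transient

def time_in_states(max_state, rng):
    """Accumulate holding time in states 0..max_state-1 for one path
    started empty, stopping well above max_state."""
    t_in = [0.0] * max_state
    x = 0
    while x < max_state + 30:
        rate = lam + (mu if x > 0 else 0.0)
        if x < max_state:
            t_in[x] += rng.expovariate(rate)
        else:
            rng.expovariate(rate)            # advance the path; time not recorded
        x += 1 if rng.random() < lam / rate else -1
    return t_in

reps, max_state = 4000, 5
totals = [0.0] * max_state
for _ in range(reps):
    for i, v in enumerate(time_in_states(max_state, rng)):
        totals[i] += v
print([round(v / reps, 2) for v in totals])
```

Each entry of the printed list should be near 1/(λ − μ) = 1.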
From a practical point of view, the main interest is in the case ρ < 1,
since infinite queues are more important as a theoretical concept than as an
observed reality. In the remainder of this chapter, except where particularly
stated, it is assumed that ρ < 1.
When ρ < 1, it has been shown already that the equilibrium distribution
is geometric. It is not difficult to obtain this result from the formulation of
Section 6.2. The values of ξ and η are respectively 1 and 1/ρ. Therefore,
using Theorem 2 of Section 4.11,

lim_{t→∞} p_x(t) = (1 − ρ)ρ^x,  (11)

lim_{t→∞} φ(z, t) = (1 − ρ)/(1 − ρz).  (12)
In the second case there is also a service period, but it is preceded by an idle
period, that is, a time when the server is idle and the queue is empty. By the
memoryless property of the negative exponential, this period, ending in an
arrival, has density λe^{−λx}. Thus the total length between departures has
density

[λμ/(μ − λ)](e^{−λx} − e^{−μx}),  (13)

and this case has probability 1 − ρ, so that the contribution to the Laplace
transform is

(1 − ρ) · [λ/(λ + s)] · [μ/(μ + s)].  (14)
Adding together the two Laplace transform pieces gives the result A/( A+ s).
There are many queueing variants on the M_λ/M_μ/1 queue as formulated in Sections 6.2-6.4. An item of information of great importance is
the equilibrium distribution of number in the system, or, failing that, the
mean and variance of the number in the system. The calculations needed to
obtain the equilibrium distribution often follow rather closely the birth-and-
death formulation and hardly require consideration of probability generat-
ing functions, Laplace transforms, and similar mathematical ornaments. In
this section, an example is given of quick calculations leading to the
equilibrium probabilities for a slight extension of the M_λ/M_μ/1 queue,
namely, that where n servers are working. It is sensible to assume that the
customers waiting are assigned to the first free server; otherwise the system
is no more than n independent queues.
This means that the parameters in the birth-and-death scheme are A for
arrivals, and for departures,
μ_x = { μx,  0 ≤ x ≤ n,
      { μn,  x ≥ n.      (15)
Making the assumption of equilibrium, the left sides of these equations can
be set equal to zero. Letting p_x(∞) = p_x, that is, suppressing the time index
in equilibrium, gives
In the usual way, the value of Po can be obtained from the condition that the
distribution normalizes to unity:
(17)
†In this treatment, ρ = λ/μ as usual. However, some authors prefer to keep the symbol ρ to
denote the traffic intensity of the queue, i.e., the ratio of input to service, so that the
equilibrium condition remains ρ < 1. In the notation used here, the equilibrium condition is
ρ/n < 1, since there are n servers at work.
which yield

p_x = p_0 ρ^x/x!,  x = 0, 1, ..., n,

that is, the truncated Poisson distribution:

p_x = (ρ^x/x!) / [1 + ρ + ρ²/2! + ⋯ + ρⁿ/n!],  x = 0, 1, ..., n.  (19)

For the design of the switchboard or of the parking lot, the value p_n, the last
value, is the important one, since it gives the probability that, in equilibrium, the facility will be full and a loss will occur. This formula,

p_n = P(loss) = (ρⁿ/n!) / [1 + ρ + ρ²/2! + ⋯ + ρⁿ/n!],  (20)
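Formula (20) is easy to evaluate. For larger n the direct form involves large factorials; a standard recursion (not derived in the text, included here only as a numerical cross-check) gives the same values. A Python sketch:

```python
import math

def erlang_loss_direct(rho, n):
    """Equation (20): (rho**n / n!) divided by the truncated exponential sum."""
    return (rho**n / math.factorial(n)) / sum(rho**j / math.factorial(j)
                                              for j in range(n + 1))

def erlang_loss_recursive(rho, n):
    """B(0) = 1 and B(j) = rho*B(j-1)/(j + rho*B(j-1)) reproduce (20)
    without large factorials."""
    b = 1.0
    for j in range(1, n + 1):
        b = rho * b / (j + rho * b)
    return b

for rho, n in [(0.5, 1), (8.0, 10), (20.0, 30)]:
    print(rho, n, round(erlang_loss_direct(rho, n), 6),
          round(erlang_loss_recursive(rho, n), 6))
```

The two columns of computed loss probabilities should be identical to rounding.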
1/μ = −β′(0),

with variance

v = β″(0) − [β′(0)]².
for fixed u.
q_x = ∫₀^∞ [e^{−λu}(λu)^x / x!] b(u) du.  (21)
θ(s) = Σ_{j=0}^∞ q_j s^j
     = Σ_{j=0}^∞ s^j ∫₀^∞ [e^{−λu}(λu)^j / j!] b(u) du
     = ∫₀^∞ e^{−λu+λsu} b(u) du
     = β(λ − λs).  (22)
q_0  q_1  q_2  ⋯
q_0  q_1  q_2  ⋯
0    q_0  q_1  ⋯
0    0    q_0  ⋯
leading to
φ(s) = θ(s)(1 − s)(1 − ρ) / [θ(s) − s],  (23)
φ(s) = (1 − s)(1 − ρ)β[λ(1 − s)] / {β[λ(1 − s)] − s}.  (24)
The student can verify that this formula is consistent with the M/M/1
result, that is, if

β(s) = μ/(μ + s),

then

φ(s) = (1 − ρ)/(1 − ρs).
b(x) = δ(x − 1/μ),

so that β(s) = e^{−s/μ}. Therefore

φ(s) = (1 − s)(1 − ρ) / [1 − s e^{ρ(1−s)}].
Since

θ(1) = β(0) = 1,
θ′(1) = ρ,

then, differentiating,

E(X) = ρ + (ρ² + λ²v) / [2(1 − ρ)].  (28)

This well-known and useful formula† expresses the mean number in the
system in equilibrium in terms of the mean and variance of the service time.
It is easy to see from this formula that the minimum mean number in
the system is achieved by setting the variance equal to zero, assuming a
fixed service rate. This means that the M/D/1 queue yields less congestion
than any other M/G/1 system.
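The student can confirm this numerically. Successive queueing times satisfy the recursion V_{n+1} = max(0, V_n + U_n − T_{n+1}), where U_n is the nth service time and T_{n+1} the following interarrival gap — a relation not used explicitly in this chapter, but equivalent to the definitions. The Python sketch below (λ = 0.8, μ = 1, chosen arbitrarily) compares exponential and deterministic service:

```python
import random

rng = random.Random(5)
lam, mu, n_cust = 0.8, 1.0, 200000   # rho = 0.8

def mean_queueing_time(service, rng):
    """Average of V over n_cust customers via V <- max(0, V + U - T)."""
    v, total = 0.0, 0.0
    for _ in range(n_cust):
        total += v
        v = max(0.0, v + service(rng) - rng.expovariate(lam))
    return total / n_cust

mm1 = mean_queueing_time(lambda r: r.expovariate(mu), rng)   # M/M/1
md1 = mean_queueing_time(lambda r: 1.0 / mu, rng)            # M/D/1
print(round(mm1, 2), round(md1, 2))
```

With these parameters the theoretical mean queueing times are 4 for M/M/1 and 2 for M/D/1: removing the service-time variance halves the congestion.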
An Alternative Proof
For the M/G/1 queue, the service times form a renewal process when
service is taking place, and the arrivals occur at instants which have no
relationship to service. Therefore the distribution of the time from an arrival
†A slight variant of this form relates to the mean waiting time rather than the mean number in
the system; see Section 6.8.
expected residual service = ρ(v + μ^{−2}) / (2μ^{−1}),
The relationship between E(Y) and E(X), the total number in the system,
is not, as one might guess, E(X) = E(Y) + 1; this is true only when the server is
occupied. Otherwise E(Y) = E(X). Thus

E(Y) = E(X) − ρ,  (29)
E(X) = λE(W).†  (30)
†This formula is usually known as L = λW, where L and W are the respective expectations.
There is an extensive literature on the subject: see, for example, Operations Research 9,
383-387 (1961); 15, 1109-1116 (1967); 17, 915-917 (1969); 18, 172-174 (1970); 20, 1115-1126
(1972); 20, 1127-1136 (1972); 22, 417-421 (1974).
E(Y) = λE(V);  (31)
and
I_n(t) = 0 otherwise.
Then
X(t) = Σ_{j=1}^∞ I_j(t)
and
a(x) = Σ_{j=1}^∞ (μe^{−μx})^{j*} (1 − ρ)ρ^j,  (32)
where the equilibrium assumption means that μ > λ. The student can verify
that this normalizes to p. Taking into consideration the discrete component
at the origin of magnitude 1- p, the Laplace transform can be written
(33)
†The transforms are Laplace transforms of the densities and Laplace-Stieltjes transforms of
the distribution functions.
γ(s) = β(s)α(s) = μ(1 − ρ)/(μ − λ + s).  (34)
p_x = ∫₀^∞ [e^{−λu}(λu)^x / x!] c(u) du,

so that

φ(s) = γ[λ(1 − s)].  (35)
which yields the following expression for the waiting time transform in
terms of the service time transform and the probability generating function
for the number in the system:
α(s) = φ[(λ − s)/λ] / β(s).  (37)
α(s) = s(1 − ρ) / [λβ(s) − λ + s],  (38)

or

γ(s) = sβ(s)(1 − ρ) / [λβ(s) − λ + s].  (39)
It is, of course, easy to verify that these formulas give the correct answer in
the M/M/1 case.
A convenient form of a(s) can be obtained by considering the residual
service time when a customer enters the queue, that is, the time from his
entry, assuming there is service in process, until that service terminates. Let
the residual service time be denoted by Ū with Laplace transform β̄(s).
Then, noting Theorem 4, Section 5.10,
α(s) = (1 − ρ) / [1 − ρβ̄(s)] = (1 − ρ) Σ_{j=0}^∞ ρ^j β̄^j(s).  (41)
A(x, t) = P(V(t) < x),
†See the footnote in Section 5.11. Here the transform is with respect to x, not t.
Then, considering the graph of V(t) at times t and t + Δt, it is easy to see
that

A(x, t + Δt) = (1 − λΔt)A(x + Δt, t) + λΔt ∫₀^{x+Δt} A(x − u + Δt, t) b(u) du + o(Δt).
The first term corresponds to the situation in which there is no arrival, and
the second term to the situation where there is an arrival. Thus the first
element M of the Kendall symbol is assumed. The equation is not as yet
in the proper form for transformation into a differential equation, simply
because the ~t is attached to one variable on the left side and a different
variable on the right. This can be remedied, however, by use of the mean
value theorem with respect to the variable x:
value theorem with respect to the variable x:

A(x + Δt, t) = A(x, t) + (∂A/∂x)Δt + o(Δt).
Substituting this value, it is now easy to form the following partial differen-
tial integral equation for A:
(∂/∂t)A(x, t) = (∂/∂x)A(x, t) − λA(x, t) + λ ∫₀ˣ A(x − u, t) b(u) du.  (42)
In taking the Laplace transform of this equation with respect to the variable
x, it is necessary to remember [cf. Section 4.11, Eq. (37)] that in general
F(0+, t) ≠ 0. The transformation yields the equation
(∂/∂t)α(s, t) = sα(s, t) − sA(0+, t) − λα(s, t) + λα(s, t)β(s).  (43)
α(s) = sA(0+) / [λβ(s) − λ + s].  (44)
Thus in the M/G/1 case the probability distribution for the virtual
queueing time is the same as that obtained for the actual queueing time. The
reason is easy to see: For in an M/G/1 queue, arrivals, being "at random"
in accordance with a Poisson process, occur at instants which are "typical"
values of the virtual queueing time.
Proof. Let x = Σ_{j=0}^∞ b_j y^j. Clearly b_0 = 0, since y = x − x² + (1/2!)x³ − ⋯,
and if there were a constant term in the expansion of x, there would also be
one in the expansion of y: b_0 − b_0² + (1/2!)b_0³ − (1/3!)b_0⁴ + ⋯ = b_0 e^{−b_0} ≠ 0.
[Figure: the curve y = xe^{−x}, which rises to its maximum 1/e at x = 1.]
Therefore

x = Σ_{j=1}^∞ b_j y^j

and, differentiating with respect to x,

1 = Σ_{j=1}^∞ j b_j y^{j−1} (dy/dx).

If j ≠ r,

y^{j−r−1} (dy/dx) = [1/(j − r)] d(y^{j−r})/dx = [1/(j − r)] (d/dx)[x^{j−r}(A_0 + A_1 x + ⋯)],

while

y^{−j} = x^{−j} e^{jx} = x^{−j} Σ_{i=0}^∞ (jx)^i / i!.

Thus

j b_j = j^{j−1}/(j − 1)!,

so that b_j = j^{j−1}/j!.
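The coefficients b_j = j^{j−1}/j! are easily checked numerically: for any x with |xe^{−x}| < 1/e the series should reproduce x. A Python sketch (the choices x = 0.3 and 80 terms are arbitrary):

```python
import math

def x_from_y_series(y, n_terms=80):
    """Evaluate sum over j >= 1 of (j**(j-1)/j!) * y**j."""
    return sum(j**(j - 1) / math.factorial(j) * y**j for j in range(1, n_terms))

x = 0.3
y = x * math.exp(-x)
print(round(x_from_y_series(y), 10))  # recovers 0.3
```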
The discussion of queues has thus far dealt with two important random
variables: the number in the system and the waiting time. A third variable
of considerable practical importance is the duration of the busy period. This
variable, together with the duration of the idle period, characterizes the
operation of the system from the point of view of the server, rather than
from the point of view of the customer.
Two preliminary remarks are necessary. First, the idle period in an
M/G/1 queue is negative exponentially distributed with parameter λ,
simply because it is terminated by a (Poisson) arrival instant. Second, the
busy period can be defined to be either discrete or continuous. The discrete
busy period is defined to be the number served between two consecutive idle
periods: random variable K, probability hx (x= 1,2,3, ... ), probability
generating function K( s). The continuous busy period is defined as the time
needed to serve these K customers: random variable L, density f(x)
(0 < x < ∞), Laplace transform σ(s).
†Émile Borel, French mathematician, 1871-1956. The work on the busy period distribution was
published during the German occupation of France. Could wartime conditions have contributed a motive for studying queues?
since the last factor represents the probability of no Poisson arrivals during
n_k service periods. Note that K = 1 + n_1 + n_2 + ⋯ + n_k, so that the probability of this type of a busy period (that is, with n_1 following the first
customer, etc.) can be written

Q ρ^{K−1} e^{−ρK},

where Q is a function of n_1, n_2, ..., n_k only and does not depend on the
value of ρ.
In order to find h_x = P(K = x), it is necessary to sum all possible
expressions of this form, subject to the condition that K = 1 + Σn_j. Rather
than bother with such complicated algebra, it will be equivalent to write

ρe^{−ρ} = z.
Then
Σ_{j=1}^∞ h_j = 1 = (1/ρ) Σ_{j=1}^∞ Q_j z^j,
or
h_x = (x^{x−1}/x!) ρ^{x−1} e^{−ρx},  x = 1, 2, 3, ....  (48)
Σ_{j=1}^∞ h_j = 1 − h_∞,

so that ρ′ has the series expansion given in Section 6.10, with y = ρe^{−ρ}. Then

1 − h_∞ = (1/ρ) Σ_{j=1}^∞ Q_j z^j = ρ′/ρ,

so that

h_∞ = 1 − ρ′/ρ.
From this formula it is clear that when ρ = 1, h_∞ = 0, and the Borel
distribution holds.
The student should show that the mean of the Borel distribution is
(1 − ρ)^{−1} and that the variance is ρ(1 − ρ)^{−3}.
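Both facts, and the normalization of (48), can be verified by direct summation. A Python sketch with ρ = 0.6, an arbitrary value below one:

```python
import math

def borel_pmf(x, rho):
    """Equation (48): h_x = (x**(x-1)/x!) * rho**(x-1) * exp(-rho*x)."""
    return x**(x - 1) / math.factorial(x) * rho**(x - 1) * math.exp(-rho * x)

rho = 0.6
xs = range(1, 400)
probs = [borel_pmf(x, rho) for x in xs]
total = sum(probs)
mean = sum(x * p for x, p in zip(xs, probs))
var = sum(x * x * p for x, p in zip(xs, probs)) - mean**2
print(round(total, 6), round(mean, 4), round(var, 4))
```

The printed values should be 1, 1/(1 − ρ) = 2.5, and ρ(1 − ρ)^{−3} = 9.375.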
parameter ρ/(1 + ρ). The student should show this, either directly or by
substitution into the M/G/1 result of Section 6.6. Thus the probability of,
for example, n_2 arrivals during n_1 services will be of the negative binomial
form

( (n_1 + n_2 − 1) choose n_2 ) ρ^{n_2} / (1 + ρ)^{n_1+n_2}.  (52)
(53)
(54)
leading to
ρκ(s)e^{−ρκ(s)} = ρse^{−ρ},  (55)
(55)
The M/M/1 Queue

κ(s) = s[1 + ρ − ρκ(s)]^{−1},

κ(s) = (1/(2ρ)) {1 + ρ − [(1 + ρ)² − 4ρs]^{1/2}},  (56)
h_x = (1/x) ( (2x − 2) choose (x − 1) ) ρ^{x−1} / (1 + ρ)^{2x−1},  x = 1, 2, 3, ....  (57)
The mean discrete busy period for this queue can be shown to agree
with that for the M/M/1 and M/D/1 queues, either by differentiation or
by the following direct argument.
Consider an entire cycle from the instant when the system becomes
empty to the next instant when it becomes empty. This cycle consists of one
busy period and one idle period for the server. In this context "busy period"
means, of course, "continuous busy period." The time spent in equilibrium
in the two states, busy and idle, will be proportional to the probabilities that
the server is busy or idle, and these, from Section 6.6, are respectively ρ and
1 − ρ. Furthermore, the expected length of the idle period must be 1/λ,
since it is the mean of a negative exponentially distributed random variable
with parameter A. Thus the mean length of the continuous busy period Q
satisfies
Q / (1/λ) = ρ / (1 − ρ),

so that

Q = (1/μ) · 1/(1 − ρ).
Although closely connected with the discrete busy period, the continu-
ous busy period L has a less convenient distribution, even in the simplest
cases. For example, in the M/M/1 case, the distribution f(x) of L involves
Bessel functions. It is not difficult, however, to derive a functional equation
in the Laplace transform σ(s) of the continuous busy period distribution,
which is similar to Eq. (44) of Section 4.13.
The argument begins with conditioning on a fixed value U for the
service time of the first arriving customer, and then on a fixed value j for the
number of customers arriving during u. The distribution of u is b(x), and
that of j is Poisson with parameter λu. With these quantities fixed, the
length of the busy period equals the combined lengths of each of the j busy
periods induced by the j arrivals. This means that the busy period distribu-
tion, conditional on j and u, is the j-fold convolution of f(x) with itself.
or, unconditionally,

f(x) = Σ_{j=0}^∞ ∫₀^∞ [e^{−λu}(λu)^j / j!] f^{j*}(x − u) b(u) du.  (58)
This equation, like so many other similar ones, can best be unraveled
with the assistance of a Laplace transform. Since the transform of f^{j*} is σ^j,
the transform of f^{j*}(x − u) is e^{−us}σ^j(s). Therefore

σ(s) = Σ_{j=0}^∞ ∫₀^∞ [(λu)^j/j!] e^{−λu} e^{−us} σ^j(s) b(u) du
     = ∫₀^∞ exp[−us − uλ + uλσ(s)] b(u) du
     = β[s + λ − λσ(s)],  (59)
†However, students familiar with the Lagrange series [see WHITTAKER, E. T., and WATSON, G.
N. (1927), A Course of Modern Analysis, 4th ed., Cambridge University Press, pp. 132-133]
may be able to obtain the solution directly.
the transform of which yields a Bessel function for f(x), and in the M/D/1
case,

(61)

f(x) = (1/ρ) Σ_{j=1}^∞ (j^{j−1}/j!) ρ^j e^{−ρj} δ(x − j/μ).  (62)
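Equation (59) can be solved numerically by direct iteration, since for s > 0 the right side is a contraction near the relevant root. For the M/M/1 case the resulting quadratic has an explicit solution, used below as a check; the parameters λ = 0.7, μ = 1 are arbitrary:

```python
import math

lam, mu = 0.7, 1.0

def beta(s):
    return mu / (mu + s)     # negative exponential service

def sigma_fixed_point(s, iters=200):
    """Iterate sigma <- beta(s + lam - lam*sigma), starting from 0."""
    sig = 0.0
    for _ in range(iters):
        sig = beta(s + lam - lam * sig)
    return sig

def sigma_closed_form(s):
    """Smaller root of lam*sig**2 - (lam + mu + s)*sig + mu = 0."""
    a = lam + mu + s
    return (a - math.sqrt(a * a - 4 * lam * mu)) / (2 * lam)

for s in (0.5, 1.2):
    print(s, round(sigma_fixed_point(s), 8), round(sigma_closed_form(s), 8))
```

The iterated and closed-form values should agree to the printed precision.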
This section is concerned with two ways to generalize the basic busy
period distributions h x and f( x), a discrete generalization and a continuous
generalization. Each of the ways of generalizing has some interesting
applications, and each may be applied either to h_x or to f(x). Table 6.1 on
p. 228 will help to keep the terminology and notation clear.
Discrete generalization with parameter r, r= 1, 2, 3, .... It is assumed that
the busy period begins not with a single customer, but with an accumulation
of r customers in the system. Thus the basic busy periods are obtained by
setting r= 1.
Continuous generalization with parameter T, O<T< 00. It is assumed
that there is an initial period of duration T during which customers accu-
mulate and wait but are not served. Then the busy period begins with the
number of customers who have arrived during T as the value of r for the
discrete generalization. It is not correct, however, to assume that the basic
busy period distributions would be obtained by setting T=O (or any other
value), since the basic busy periods are started not after any particular
length of time, but rather when the first arrival occurs.
The discussion deals first with discrete busy period distributions (both
generalizations) and then with continuous busy period distributions (both
generalizations).
(63)
together with
The proof of the theorem consists in now showing that exactly r of each
group of x patterns is "admissible" (i.e., leads to a busy period of x
customers), assuming that the total number of arrivals is x - r.
The situation for x = 10, r= 3 is shown in Fig. 4, and the proof is keyed
to that figure. In order to obtain all the cycles, the original pattern is
assumed to be repeated once; then beginning with each of the first x service
periods is equivalent to considering each of the x cycles. It is convenient to
start with a queue of n (a large number) to illustrate all the cycles, since all
[Figure 4 appears here.]
Figure 4. Pattern of arrivals and departures from a queue. The upper row of numbers
represents the service times, the lower row, the number of arrivals during a service
time. Arrows indicate admissible patterns. Adapted from TANNER, J. C. (1961), "A
derivation of the Borel distribution," Biometrika 48, pp. 222-224.
that need be shown is whether or not a cycle ends with a smaller number in
the system than ever before.
It is clear that at the beginning of the (x+1)st service, the number in
the system must be n - r, since there have been x - r arrivals and x
completed services. Similarly, at the beginning of each service period in the
second half of the pattern, the number in the system is just r less than it was
x services earlier.
Therefore, since the number of steps down is exactly r during the first
half of the pattern, it is exactly r during the second half, and must on
exactly r occasions take on a value lower than any previous value at the
beginning of a service period. D
This concludes the proof of the theorem by Tanner's combinatorial
method. In the special cases M1MI I and MIDI I, proofs similar to Borel's
proof (Section 6.11) can be constructed, except for the evaluation of the
coefficients, which must then be done, as Borel did, by integration in the
complex plane, rather than by use of Section 6.10. These special results are
given in the following theorems; the proofs, specializing Theorem I, are left
as exercises for the student.
h_x^{r*} = (r/x) ( (2x − r − 1) choose (x − 1) ) ρ^{x−r} / (1 + ρ)^{2x−r},  x = r, r+1, r+2, ....  (65)
h_x(τ) = Σ_{j=1}^{x} [e^{−λτ}(λτ)^j / j!] h_x^{j*},  x = 1, 2, 3, ...,  (66)
x=1,2,3, ... ,
(67)
x= 1,2,3, ... ,
(68)
x= 1,2,3, ... ,
(69)
o<x<oo,
(72)
n=1
σ(s, τ) = e^{−λτ + λτσ(s)}.  (73)
Proof
Up to this point the emphasis has been on queues with Poisson arrival
processes. The reasons for this approach are two: (i) In the largest categories
of applications, the assumption provides a good model for reality. Arrivals
to a queue are often uncontrolled-they appear from no organized pool and
can be well approximated by a Poisson stream. (ii) By treating queues of
this type, it has been possible to illustrate a number of the basic techniques:
differential-difference equations, imbedded Markov chains, branching
processes, transforms and generating functions, and so forth. By the judi-
cious application of these techniques, it is possible to solve many other types
of queueing configurations suggested by practical problems.
In some respects, the queue with Kendall symbol GI/M/1† is an image
of M/G/1, but it has one striking difference: The equilibrium probabilities
for the imbedded Markov chain of the number of customers in the system
are geometric, independent of the input distribution. The interarrival as-
sumptions, in fact, determine only the value of the geometric parameter. In
order to show this, it is necessary to go through the usual steps of defining
and solving the imbedded chain.
In a GI/M/1 queue, the aging takes place between one arrival and
another, and so the regeneration points are instants when a customer
arrives. Suppose the interarrival density function is d(x) with Laplace
transform δ(s) and mean −δ′(0) = 1/λ. Analogously to the q_x of Section 6.6,
let r_x denote the probability of x service terminations during one interarrival
gap, assuming that the server is never free during this period. The expression
in italics is unnecessary in the M/G/1 definition of q_x, and
the lack of exact symmetry between the cases first appears here. Let the
probability generating function of the rx be 1/(5). Then it is easy to express
1/(s) in terms of c5(s):
r_x = \int_0^\infty \frac{(\mu t)^x}{x!}\, e^{-\mu t}\, d(t)\, dt, \qquad x = 0, 1, 2, \ldots,

so that

\eta(s) = \delta(\mu - \mu s). \qquad (74)
†Some authors use GI in place of G; this renewal process is also called "Palm input."
The student should check carefully to see where the italicized assumption is
required in this calculation.
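One case where everything in Eq. (74) is available in closed form is that of exponential interarrivals. The following sketch (with illustrative rates lam and mu, not taken from the text) computes the r_x by numerical quadrature and checks the relation eta(s) = delta(mu - mu*s) at a sample point:

```python
import math

# Numerical check of eq. (74), eta(s) = delta(mu - mu*s), for the special
# case of exponential (rate lam) interarrivals, where the Laplace
# transform delta has a closed form.  lam and mu are illustrative values.
lam, mu = 1.0, 1.5

def delta(s):
    # Laplace transform of d(t) = lam * exp(-lam * t)
    return lam / (lam + s)

def r_num(x, T=40.0, n=20000):
    # r_x = integral_0^inf (mu t)^x / x! * e^(-mu t) d(t) dt, trapezoid rule
    h = T / n
    fx = math.factorial(x)
    total = 0.0
    for i in range(n + 1):
        t = i * h
        f = (mu * t) ** x / fx * math.exp(-mu * t) * lam * math.exp(-lam * t)
        total += f * (0.5 if i in (0, n) else 1.0)
    return total * h

r = [r_num(x) for x in range(40)]

def eta(s):
    # probability generating function of the r_x
    return sum(s ** x * rx for x, rx in enumerate(r))

# closed form for this special case: r_x = lam * mu^x / (lam + mu)^(x+1)
assert abs(r[3] - lam * mu ** 3 / (lam + mu) ** 4) < 1e-6
# eq. (74)
assert abs(eta(0.5) - delta(mu - mu * 0.5)) < 1e-4
```

With exponential interarrivals the r_x come out geometric, which foreshadows the geometric equilibrium distribution derived below.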
In writing the transition matrix for the imbedded chain, every element
can be expressed as one of the r_x except for those in the first column. The
reason for this is simply that a transition from any value x to zero represents
a period (between one arrival and the next) which contains a certain time
when the queue is empty, and so violates the italicized condition. Neverthe-
less, these probabilities can be filled in simply by assuming normalization of
rows to unity. The student, taking into consideration these principles,
should show that the transition matrix can be written
\begin{pmatrix}
1 - r_0 & r_0 & 0 & 0 & \cdots \\
1 - r_0 - r_1 & r_1 & r_0 & 0 & \cdots \\
1 - r_0 - r_1 - r_2 & r_2 & r_1 & r_0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
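The structure of this matrix is easy to experiment with numerically. A minimal sketch, assuming a truncation to n states and using the geometric r_x that arise from exponential interarrivals (an illustrative choice; any nonnegative sequence summing to one will do):

```python
# Truncated imbedded-chain transition matrix for GI/M/1, built from a
# given sequence r_0, r_1, ... .
def transition_matrix(r, n):
    # state x = number in system just before an arrival; from state x the
    # chain moves to y = x + 1 - (number of completions), so
    # P[x][y] = r_{x+1-y} for y = 1, ..., x+1, and column 0 absorbs the
    # remaining mass (including, for the last row, truncation overflow).
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(1, min(x + 1, n - 1) + 1):
            P[x][y] = r[x + 1 - y]
        P[x][0] = 1.0 - sum(P[x][1:])
    return P

rho = 0.5                                            # illustrative lam/mu
r = [(rho / (1 + rho)) * (1 / (1 + rho)) ** x for x in range(100)]
P = transition_matrix(r, 6)

assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)  # rows sum to unity
assert P[0][1] == r[0] and P[3][1] == r[3]            # r_x pattern
```

The assertion that each row sums to unity is exactly the normalization argument used in the text to fill in the first column.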
Before writing down and solving for the equilibrium probabilities defined
by this matrix, it is first necessary to take a small excursion. Consider r_x as a
probability distribution. The mean value is easily calculated:

\sum_{x=0}^{\infty} x\,r_x = \eta'(1) = -\mu\,\delta'(0) = \frac{\mu}{\lambda} = \frac{1}{\rho}.

Thus, for the equilibrium condition \rho < 1, the mean is > 1, and the result of
Section 1.13 shows that the equation

\eta(s) = s

has a unique root \zeta with 0 < \zeta < 1. The equilibrium probabilities for the
imbedded chain are then the geometric probabilities† p_x = (1-\zeta)\zeta^x,
x = 0, 1, 2, \ldots, as may be verified by substitution into the equilibrium
equations: for x \ge 1,

p_x = \sum_{j=x-1}^{\infty} p_j\,r_{j-x+1}
    = (1-\zeta) \sum_{j=x-1}^{\infty} \zeta^j\,r_{j-x+1}
    = (1-\zeta) \sum_{j=0}^{\infty} \zeta^{j+x-1}\,r_j
    = (1-\zeta)\,\zeta^{x-1} \sum_{j=0}^{\infty} \zeta^j\,r_j
    = (1-\zeta)\,\zeta^{x-1}\,\eta(\zeta) = (1-\zeta)\,\zeta^x. \qquad \square
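This two-step argument — find the root of eta(s) = s in (0, 1), then verify the geometric form — can be reproduced numerically. A sketch using exponential interarrivals (illustrative rates), where the root should come out to zeta = rho:

```python
# Root of eta(s) = s by fixed-point iteration, then verification of the
# geometric equilibrium p_x = (1 - zeta) zeta^x.  Exponential
# interarrivals (rate lam) give eta in closed form; illustrative values.
lam, mu = 0.6, 1.0

def eta(s):
    # eta(s) = delta(mu - mu*s) with delta(s) = lam/(lam + s)
    return lam / (lam + mu - mu * s)

zeta = 0.5
for _ in range(200):
    zeta = eta(zeta)                # converges to the root in (0, 1)
assert abs(zeta - lam / mu) < 1e-12  # here the root is exactly rho

r = [(lam / (lam + mu)) * (mu / (lam + mu)) ** j for j in range(400)]
p = lambda x: (1 - zeta) * zeta ** x

# equilibrium equations for x >= 1: p_x = sum_{j >= x-1} p_j r_{j+1-x}
for x in range(1, 6):
    rhs = sum(p(j) * r[j + 1 - x] for j in range(x - 1, 300))
    assert abs(p(x) - rhs) < 1e-10
```

The fixed-point iteration converges because eta'(zeta) < 1 at the interior root, precisely the condition supplied by the mean-value argument above.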
Theorem 2. The asymptotic waiting time distribution is the following
mixed discrete and continuous distribution:
Proof. The proof is left as an exercise. [Note the difference between the
impulse function \delta(x) in this theorem and the similarly designated Laplace
transform \delta(s).] \square
†With considerably more difficulty, it is possible to prove this by "discovering" the geometric
probabilities, rather than by verifying them.
6.16. Balking
g_x = P(X' = x), \qquad x = 1, 2, 3, \ldots,

with solution expressed in terms of

c_x = \prod_{j=1}^{x-1} H_j

and

g_x = (1 - r)\,r^x,

so that
where†

H_x = 1, \qquad x = 0, 1, \ldots, M.

Therefore

c_x = 1, \qquad x = 0, 1, \ldots, M+1,

so that

p_0 = \frac{1-\rho}{1-\rho^{M+2}}

and

p_x = \rho^x\,\frac{1-\rho}{1-\rho^{M+2}}, \qquad x = 0, 1, \ldots, M+1.
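This truncated geometric distribution is easy to confirm against a direct solve of the birth-death balance equations. A sketch with illustrative rho and M:

```python
# Finite-room M/M/1 equilibrium: p_x = rho^x (1 - rho)/(1 - rho^(M+2)),
# checked against the balance recursion p_{x+1} = rho * p_x on states
# 0, 1, ..., M+1.  rho and M are illustrative values.
rho, M = 0.8, 3
N = M + 2                                   # number of states

p_formula = [rho ** x * (1 - rho) / (1 - rho ** (M + 2)) for x in range(N)]

p = [1.0]
for _ in range(N - 1):
    p.append(p[-1] * rho)                   # balance equation
Z = sum(p)
p = [v / Z for v in p]                      # normalize

assert abs(sum(p_formula) - 1.0) < 1e-12
assert all(abs(a - b) < 1e-12 for a, b in zip(p, p_formula))
```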
†It is customary to assign the number 1 to the highest priority class, 2 to the second highest,
etc. This can lead to curious locution: class r is "higher" than class r+1.
*JAISWAL, N. K. (1969), Priority Queues, Academic Press, New York; COHEN, J. W. (1969), The
Single Server Queue, American Elsevier, New York.
The present section contains a typical result, the mean queueing time
for the priority classes of an M / G/1 queue. The calculation of the exact
probabilities is not really difficult, but in many realistic queueing situations,
information on expected values is all that is needed, and this is usually
found by a direct argument, such as the one given here.
Consider a queue with priority classes 1, 2, \ldots, N, with arrival rates for
the various classes \lambda_1, \lambda_2, \ldots, \lambda_N, service time distributions b_1(x),
b_2(x), \ldots, b_N(x), mean values 1/\mu_1, 1/\mu_2, \ldots, 1/\mu_N, and queueing times
V_1, V_2, \ldots, V_N. The problem is to express E(V_k) in terms of the other
quantities. It is clear, first of all, that the queueing time is composed of three
parts for a head-of-the-line system, namely, (i) the residual service of the
customer already in service, if there is one, (ii) the time to service all
customers of equal or higher priority who are already in the queue, and (iii)
the time to service all customers of higher priority who arrive before the
customer of priority k can enter service.
Let the mean residual service time be denoted by n. Then, since there is
no contribution to E(V_k) when the service mechanism is idle, the residual
service conditional on the customer in service being of priority j is given by
(78)
The Theory of Queues 275
\frac{1}{\mu} = \frac{1}{\lambda}\sum_{j=1}^{N} \rho_j. \qquad (79)
Next, consider the second component of E(V_k), the service times of all
customers in queue of priorities 1, 2, \ldots, k. The expected number of customers
of priority j queueing, by Section 6.8, is \lambda_j E(V_j), and the expected service
time of such a customer is 1/\mu_j. Therefore the contribution of these
customers to E(V_k) is

\sum_{j=1}^{k} \rho_j\,E(V_j),
and so forth.
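Carrying the three components through to a solution yields the classical head-of-the-line mean-wait formula usually credited to Cobham, E(V_k) = W_0/[(1 - \sigma_{k-1})(1 - \sigma_k)] with \sigma_k = \rho_1 + \cdots + \rho_k and W_0 the mean residual service found on arrival. The sketch below takes that formula as given and evaluates it for illustrative class data, assuming exponential service in each class so that E(S_j^2) = 2/\mu_j^2:

```python
# Head-of-the-line priority M/G/1 mean queueing times via Cobham's
# formula.  All class data are illustrative; exponential service per
# class is assumed so second moments are available in closed form.
lams = [0.2, 0.3, 0.1]                      # class 1 = highest priority
mus = [2.0, 1.5, 1.0]
rhos = [l / m for l, m in zip(lams, mus)]
assert sum(rhos) < 1                        # stability

W0 = 0.5 * sum(l * 2.0 / m ** 2 for l, m in zip(lams, mus))

def EV(k):
    # mean queueing time of class k (1-based)
    return W0 / ((1 - sum(rhos[:k - 1])) * (1 - sum(rhos[:k])))

assert EV(1) < EV(2) < EV(3)                # lower priority waits longer
```

The monotonicity in k is the qualitative content of the direct argument above: every term contributing to E(V_k) grows as more classes take precedence.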
†WHITE, H., and CHRISTIE, L. S. (1958), "Queuing with preemptive priorities or with breakdown," Operations Research 6, 79-95.
Section 6.14. The distribution of \tau is denoted by b(x), so that the unconditional
transform of the queueing time can be written, for the case where the
server is occupied on arrival,

\psi(s) = \frac{\mu}{s}\bigl[1-\beta(s)\bigr], \qquad (81)

so that

\alpha^R(s) = 1 - \rho + \frac{\lambda\bigl[1-\beta(s+\lambda-\lambda\theta(s))\bigr]}{s+\lambda-\lambda\theta(s)}.

Using Eq. (59) of Section 6.13, the numerator can be simplified, giving the
form

\alpha^R(s) = 1 - \rho + \frac{\lambda\bigl[1-\theta(s)\bigr]}{s+\lambda-\lambda\theta(s)}. \qquad (83)
The student should show that for the M/M/1 case, this reduces to
where

u = \lambda + s - \lambda\theta(s),

so that

\theta(s) = \beta(u)

and

\frac{du}{ds} = 1 - \lambda\theta'(s).

Then

\frac{d\alpha^R(s)}{ds} = \frac{\lambda}{u^2}\,\frac{du}{ds}\bigl[\beta(u) - u\,\beta'(u) - 1\bigr].

This equation must be evaluated at the point s = 0. Since \theta'(0) is minus the
continuous busy period mean, given in Section 6.12 as

\frac{1}{\mu(1-\rho)},

it follows that
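The functional-equation machinery used in this differentiation can be checked numerically. A minimal sketch for M/M/1 (illustrative lam and mu), where beta(s) = mu/(mu + s): iterate theta(s) = beta(s + lam - lam*theta(s)) to a fixed point and confirm that -theta'(0) equals the busy-period mean 1/(mu(1 - rho)):

```python
# Busy-period functional equation for M/M/1, solved by fixed-point
# iteration, with a finite-difference check of theta'(0).
# lam and mu are illustrative values.
lam, mu = 0.5, 1.0
rho = lam / mu

def beta(s):
    return mu / (mu + s)

def theta(s):
    t = 0.5
    for _ in range(5000):
        t = beta(s + lam - lam * t)        # contraction for rho < 1
    return t

h = 1e-5
deriv = (theta(h) - theta(0.0)) / h        # one-sided difference
assert abs(theta(0.0) - 1.0) < 1e-9        # theta(0) = 1 when rho < 1
assert abs(-deriv - 1.0 / (mu * (1.0 - rho))) < 1e-2
```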
6.19. Problems†
1. Find the distribution of the number of arrivals in n service periods, both for the
M/M/1 and M/D/1 queues.
†Problem 17 is taken from Lindley (1965) (see p. 132), with kind permission of the publisher.
2. Consider a queue in which arrivals and departures are dependent on the number
in the system; specifically, \lambda and \mu are replaced by \lambda_x and \mu_x when there are
x in the system. Show that equilibrium exists only if the series

\sum_{x=1}^{\infty} \frac{\lambda_0\lambda_1\cdots\lambda_{x-1}}{\mu_1\mu_2\cdots\mu_x}

converges. Consider the special cases: (i) \lambda_x = \lambda/(x+1), \mu_x = \mu. This would
indicate a queue where new arrivals are discouraged by large numbers in the
system, rather similarly to balking. Show that the equilibrium distribution is
Poisson. (ii) \lambda_x = \lambda, \mu_x = x\mu for x < N, \mu_x = N\mu for x \ge N. This means that there
are N servers. Obtain the equilibrium distribution for the number in the system.
3. In Problem 2, let \lambda_x = \rho^x\lambda, x \ge 0, 0 < \rho < 1, and \mu_x = \mu. (i) Find the equilibrium
probability p_x of x customers in the system in terms of p_0. (ii) Evaluate p_0.
4. In an M/M/1 queue, there are only two waiting places available; when these
are filled, arrivals are permanently lost. The person in service does not occupy
one of the waiting places. Define the states of the system; obtain the equilibrium
equations for the state probabilities; find the probability generating function,
and by differentiation of this function obtain the mean value of the number in
the system.
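A numeric check for a problem of this type is straightforward; the sketch below (illustrative rho) solves the four-state birth-death balance equations for Problem 4 and recovers the mean from the probability generating function:

```python
# Problem 4 sketch: states 0..3 (one in service, at most two waiting),
# balance p_x = rho^x p_0, mean via the PGF.  rho is illustrative.
rho = 0.7
p = [rho ** x for x in range(4)]
Z = sum(p)
p = [v / Z for v in p]

mean = sum(x * v for x, v in enumerate(p))

def pgf(s):
    return sum(v * s ** x for x, v in enumerate(p))

h = 1e-6
# differentiating the PGF at s = 1 reproduces the mean
assert abs((pgf(1 + h) - pgf(1 - h)) / (2 * h) - mean) < 1e-6
assert abs(pgf(1.0) - 1.0) < 1e-12
```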
5. Consider an M/M/1 queue with

\lambda_x = \lambda, \quad x \le n; \qquad \lambda_x = 0, \quad x > n.

(i) Find explicitly the equilibrium probabilities for the number in the system in
terms of \lambda, \mu, and n. (ii) Show that the expected number in the system is
n/(1 + 1/\rho).
6. For an M/M/n queue, find (i) the expected number queueing, (ii) the probability that all servers are occupied, (iii) the probability that there will be someone
queueing, (iv) the mean queueing time for those who queue, (v) the mean
queueing time for all arrivals, and (vi) the probability that x servers are busy.
7. In an M/G/\infty queue, the service is empty at time zero. Show that the
probability of x departures in [0, t) is Poisson with parameter \lambda \int_0^t B(u)\,du.
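A Monte Carlo check of this standard M/G/\infty result is easy to set up. The sketch assumes exponential service (rate mu), so the Poisson parameter \lambda\int_0^t B(u)\,du is available in closed form; all parameter values are illustrative:

```python
import math
import random

# Monte Carlo check for Problem 7: departures from an M/G/infinity queue
# (empty at time 0) in [0, t) are Poisson with mean lam * int_0^t B(u) du.
# Exponential service is used so the integral is closed-form.
random.seed(1)
lam, mu, t = 2.0, 1.0, 3.0

def departures_by_t():
    count, clock = 0, 0.0
    while True:
        clock += random.expovariate(lam)          # next Poisson arrival
        if clock >= t:
            return count
        if clock + random.expovariate(mu) < t:    # served and gone by t?
            count += 1

n = 40_000
avg = sum(departures_by_t() for _ in range(n)) / n
expected = lam * (t - (1.0 - math.exp(-mu * t)) / mu)
assert abs(avg - expected) < 0.1
```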
p_0 = \tfrac{1}{2}(1-\rho),
11. Use the following outline to give an alternative proof of the probability
generating function result of Section 6.6. Let X_n be the number in the system at
the end of the nth service, and let N_n be the number of arrivals in the nth
service. Show that X_{n+1} = X_n - \Delta(X_n) + N_{n+1}, where \Delta(x) = 1, x > 0, and
\Delta(x) = 0, x \le 0. Form E\bigl(s^{X_{n+1}}\bigr) and, using the independence of X_n and N_n,
express this in terms of \phi_n(s) and \theta(s). Pass to the limit as n \to \infty.
12. Show that the equilibrium mean number in the M/M/n system is

\rho + \frac{n\rho\,p_n}{(n-\rho)^2}.
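This formula can be checked directly from the M/M/n state probabilities, p_x = p_0\,\rho^x/x! for x \le n and p_{n+j} = p_n(\rho/n)^j. A sketch with illustrative rho and n:

```python
import math

# Check of the Problem 12 formula for M/M/n with rho = lam/mu < n:
# sum the mean number in system from the explicit state probabilities
# and compare with rho + n*rho*p_n/(n - rho)^2.  Values are illustrative.
rho, n = 2.5, 4

p0 = 1.0 / (sum(rho ** x / math.factorial(x) for x in range(n))
            + rho ** n / math.factorial(n) * n / (n - rho))
p = [p0 * rho ** x / math.factorial(x) for x in range(n + 1)]
pn = p[n]

mean = sum(x * p[x] for x in range(n + 1))
for j in range(1, 2000):                    # geometric tail beyond state n
    mean += (n + j) * pn * (rho / n) ** j

assert abs(mean - (rho + n * rho * pn / (n - rho) ** 2)) < 1e-9
```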
13. Consider an M/G/1 system in which no queue is allowed to form. When the
server is occupied, incoming customers are lost. There are two counters for the
server, and when both are empty, customers are sent to counter A with
probability p and to counter B with probability 1-p. At each counter the
service is negative exponential with parameters 2\mu p and 2\mu(1-p), respectively.
(i) Solve for the equilibrium probability of an empty system. (ii) Find the
probability that counter A is occupied. (iii) Find the probability that both
counters are occupied.
\frac{e^{-\lambda x}\,b(x)}{\beta(\lambda)}
17. You are in a GI/M/1 queue. Show that the probability that you will have to
queue for more than three times as long as the person in front of you is \tfrac{1}{4}.
18. In a GI/M/\infty queue, let Z_n be the number of busy servers at the instant of the
nth arrival. Show that

P(Z_{n+1} = y \mid Z_n = x) = \binom{x+1}{y} \int_0^\infty e^{-y\mu u}\bigl(1-e^{-\mu u}\bigr)^{x+1-y}\, d(u)\, du.
19. In a D/M/1 queue with \rho > 1 and one person in service and no one queueing,
show that the probability that the server will ever again be idle is 1/\rho.
20. In a GI/D/1 queue, show that

\phi(s) = \frac{\delta(s)(1-s)}{\delta(s)-s}\,\phi(0).
21. Work out Section 6.15 for E_k arrivals as far as you can. For E_2, express the
mean E(X) in terms of \rho.
22. Consider a queue in which arrivals occur in a Poisson process of groups, with g_n
being the probability of a group of size n, n = 1, 2, 3, \ldots, and probability
generating function g(s) = \sum g_n s^n. (i) Show that the generating function for the
number of customers arriving in time t is given by \exp[\lambda t g(s) - \lambda t]. (ii) Show
26. Work Problem 25 with the following modified assumptions. The customer
enters the system only if the first server is free, and on completing the first
service, enters the second service only if that server is free. Customers not
entering either service are permanently lost to the system.
27. Consider an M 1MI I queue in which the server remains idle until the queue size
reaches n. Then the service of these n customers takes place, and also all other
arrivals during this period. When at last the queue is empty, and the server idle,
the process is repeated.
28. Consider an assembly line moving with uniform speed with items for service
spaced along it. A single server moves with the line while serving, and against it
with infinite velocity while transferring to the next item. The line has a barrier at
which service must be broken off and the server is absorbed. A server with
negative-exponential (A) service time starts service on an item when it is time T
from the barrier. The spacings between items form a renewal process with
distribution function B( x). Show that the probability generating function for
the number of items served before absorption satisfies the integral equation
Hint: Let Z(t) be the distance to the barrier at time t, so that Z(0) = T and
Z(t) = T + X(t) - t, where X(t) is the spacing between the first and last item
completed in (0, t). Then the time to absorption is

\inf\{t : T + X(t) - t \le 0\}.
where \delta_{xy} = 0, x \ne y, and \delta_{xx} = 1, and any symbol with a negative subscript is
interpreted as being equal to zero. Show that the mean number of nonpriority
customers in the system is
31. In an M/M/1 queue, suppose there are only a finite number n of possible
customers; any customer not in the system at time t has a constant probability
p\Delta t of joining the queue during (t, t+\Delta t). Find the equilibrium distribution for
the number in the system.
32. Consider a queue with Poisson (rate \lambda) arrivals, where service is instantaneous but only
available at service instants, which are separated by equidistributed intervals,
with distribution function H(x) and Laplace transform \eta(s). Suppose that the maximum
number that can be served in a service instant is k, and let Z_j denote the
number of customers in the system just before the jth service instant. (i) Show
that

Z_{j+1} = \max(Z_j - k,\, 0) + N_j,

where N_j is the number of arrivals in the interval between the jth and (j+1)th
service instants. (ii) Show that the probability generating function of N_j is
\eta\bigl(\lambda(1-s)\bigr), and find the equilibrium probability generating function for Z. (iii)
Discuss the applicability of the model to a queue of vehicles at a traffic light.
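The traffic-light interpretation of Problem 32 is easy to simulate. The sketch below takes the recursion as Z_{j+1} = max(Z_j - k, 0) + N_j (at most k served per instant), with Poisson arrivals and deterministic gaps of length tau between instants; all values are illustrative and stability needs lam*tau < k:

```python
import math
import random

# Simulation of the Problem 32 recursion as a fixed-cycle traffic light:
# at most k vehicles clear per green, Poisson(lam*tau) arrive per cycle.
random.seed(2)
lam, tau, k = 1.0, 1.5, 2

def poisson(mean):
    # Knuth's product-of-uniforms Poisson sampler
    L, prod, x = math.exp(-mean), 1.0, 0
    while True:
        prod *= random.random()
        if prod < L:
            return x
        x += 1

draws = [poisson(lam * tau) for _ in range(50_000)]
assert abs(sum(draws) / len(draws) - lam * tau) < 0.05   # sampler sanity

Z, total, steps = 0, 0, 200_000
for _ in range(steps):
    Z = max(Z - k, 0) + poisson(lam * tau)
    total += Z
assert 0.0 < total / steps < 50.0      # stable, finite time-average queue
```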
33. An M / M / I queue with feedback has the following defining property: When a
customer finishes service, departure occurs with probability p; with probability
I - p, the customer rejoins the end of the queue. (i) Find the equilibrium
probabilities for the number in the system. (ii) Find the probability that a
customer is served x times. (iii) Find the expected total time a customer spends
in the system.
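Part (ii) of Problem 33 has a simple structure worth noting: each completed service ends in departure with probability p, so the number of services a customer receives should be geometric, P(x services) = p(1-p)^{x-1}, with mean 1/p. A Monte Carlo sketch with an illustrative p:

```python
import random

# Monte Carlo check for Problem 33(ii): number of services per customer
# in the feedback queue is geometric with parameter p.
random.seed(3)
p = 0.4

def services():
    x = 1
    while random.random() >= p:     # rejoins the queue with prob. 1 - p
        x += 1
    return x

n = 100_000
draws = [services() for _ in range(n)]
assert abs(sum(draws) / n - 1.0 / p) < 0.05              # mean 1/p
assert abs(sum(1 for d in draws if d == 1) / n - p) < 0.01
```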
34. In an M/M/1 queue, the service rate is \mu only when there are fewer than three
customers in the system. When there are three or more, another server is
employed, so that the service rate is increased to 2\mu. Formulate and find the
equilibrium probabilities.
35. Consider a bulk service system. Whenever the server becomes free, he accepts
two customers from the queue into service simultaneously, or, if only one is in
the queue, he accepts that one. In either case, the service time for the group (of
size 1 or 2) is distributed as b(x) with mean 1/\mu. Let X_n be the number of
customers in the system after the nth service, with equilibrium probability
generating function \phi(s), and let N_n be the number of arrivals during the nth
service, with probability generating function \theta(s). Define \rho = \lambda/2\mu.
(i) Show that \theta(s) = \beta(\lambda - \lambda s). (ii) Find E(X) in terms of var(X) and P(X_n = 0).
(iii) Express \phi(s) in terms of \theta(s), P(X_n = 0), and P(X_n = 1). (iv) Express
P(X_n = 1) in terms of P(X_n = 0).
36. Consider an M/M/\infty queue with batch arrivals; the batch size is geometrically
distributed with parameter r. Formulate the number in the system as a
continuous time Markov process and find the infinitesimal matrix of the
process. Find the probability generating function for the equilibrium distribu-
tion of the process.
37. Show that the results of Section 6.5, Eqs. (16) and (17), can be obtained as a
special case of Section 5.6, Eqs. (21) and (22).