Introduction To Probability Theory-3
Introduction To Probability Theory-3
Port
Stone
Introduction
to Probability
Theory
'.
-:
. ...
'
~~~~~: ~-
. .
._.-
LEO BREIMAN
Statistics: Uncertainty
and Behavior
Introduction to
Probability
Theory
Paul G. Hoel
Sidney C. Port
Charles J. Stone
University of California, Los Angeles
BOSTON
COPYRIGHT
1971
0-39'5-04636-X
74-136173
General Preface
Preface
Table of Contents
Probability Spaces
1.1
2
6
1.2
1.3
1.4
1.5
Combinatorial Analysis
2.1
2.2
2.3
2.4
*2.5
*2.6
*2.7
*2.8
10
14
18
27
Ordered samples
Permutations
Combinations (unordered samples)
Partitions
Union of events
Matching problems
Occupancy problems
Number of empty boxes
27
30
31
34
38
40
43
44
49
3.1
3.2
3.3
3.4
50
57
3.5
3.6
Definitions
Computations with densities
Discrete random vectors
Independent random variables
3.4.1 The multinomial distribution
3.4.2 Poisson approximation to the binomial distribution
Infinite sequences of Bernoulli trials
Sums of independent random variables
4.2
Definition of expectation
Properties of expectation
ix
60
63
66
69
70
72
82
84
85
Contents
4.3
4.4
4.5
4.6
Moments
Variance of a sum
Correlation coefficient
Chebyshev's Inequality
92
96
99
100
109
110
112
115
117
123
124
124
126
128
131
5.1
5.2
5.3
*5.4
139
6.1
6.2
139
145
145
150
153
155
157
160
163
166
173
173
174
177
181
183
186
190
6.3
6.4.
*6.5
*6.6
*6.7
7.1
7.2
7.3
7.4
7.5
*8
197
197
200
Contents
xi
8.3
8.4
*9
205
209
216
9.1
9.2
9.3
9.4
9.5
216
219
225
228
230
Random walks
Simple random walks
Construction of a Poisson process
Distance to particles
Waiting times
Answers to Exercises
Table I
Index
239
252
255
Probability Spaces
P1obsbHity Spaces
the probability that a given newborn baby will live at least 70 years? Various
attempts have been made, none of them totally acceptable, to give alternative
interpretations to such probability statements.
For the mathematical theory of probability the interpretation of probabilities
is irrelevant, just as in geometry the interpretation of points, lines, and planes is
irrelevant. We will use the relative frequency interpretation of probabilities only as
an intuitive motivation for the definitions and theorems we will be developing
throughout the book.
1.1.
In this section we will discuss two simple examples of random phenomena in order to motivate the formal structure of the theory.
A box has s balls, labeled 1, 2, . .. , s but otherwise
identical. Consider the following experiment. The balls are mixed up well
in the box and a person reaches into the box and draws a ball. The
number of the ball is noted and the ball is returned to the box. The outcome of the experiment is the number on the ball selected. About this
experiment we can make no nontrivial prediction.
Example 1.
Suppose we repeat the above experiment n times. Let N,(k) denote the
number of times the ball labeled k was drawn during these n trials of the
experiment. Assume that we had, say, s = 3 balls and n = 20 trials.
The outcomes of these 20 trials could be described by listing the numbers
which appeared in the order they were observed. A typical result might be
l, 1,3, 2,
in which case
and
The relative frequencies (i.e., proportion of times) of the outcomes l, 2,
and 3 are then
Nzo(l) = .25,
20
N 20(2) = .40,
20
and
N zo(3)
= .35.
20
As the number of trials gets large we would expect the relative frequencies N,(1)/n, ... , N,(s)/n to settle down to some fixed numbers
PH p 2 , , Ps (which according to our intuition in this case should all
be 1/s).
By the relative frequency interpretation, the number p 1 would be called
the probability that the ith ball will be drawn when the experiment is
performed once (i = 1, 2, .. . , s).
1. 1.
+ ' + Ps =
P1obsbility Spaces
1a
1b
r-------------------~0
AUB
1c
Figure 1
To illustrate these concepts let A be the event "red ball selected" and
let B be the event "even-numbered ball selected." Then the union A u B
is the event that either a red ball or an even-numbered ball was selected.
The intersection A n B is the event "red even-numbered ball selected."
The event Ac occurs if a red ball was not selected.
We now would like to assign probabilities to events. Mathematically,
this just means that we associate to each set B a real number. A priori we
could do this in an arbitrary way. However, we are restricted if we want
these probabilities to reflect the experiment we are trying to model. How
should we make this assignment? We have already assigned each point
the number s- 1 Thus a one-point set {m} should be assigned the number
s- 1 Now from our discussion of the relative frequency of the event
"drawing a red ball," it seems that we should assign the event A the probability P(A) = r/s. More generally, if B is any event we will define P(B)
by P(B) = j/s if B has exactly j points. We then observe that
P(B) =
l:
p1"
CDic41B
where LQ)k e B Pt means that we sum the numbers Pt over those values of k
such that m~: e B. From our definition of P(B) it easily follows that the
following statements are true. We leave their verification to the reader.
Let 0 denote the empty set; then P(0) = 0 and P(O) = I. If A and B
are any two disjoint sets, i.e., A n B == 0 , then
P(A u B)
= P(A) + P(B).
1.1.
P(A)
P(B)
Probability Spaces
1.2.
Probability spaces
1.2.
Probability spaces
U A1 =
At u A 2 u u A,.
I= t
and
II
() A 1 = At n A 2 n n A11
I= t
U A,. E J1
ll=t
00
and
() A,. e Jl.
ll=t
U All = A
~t=t
u A2 u
() A,. = At n A 2 n
11=t
():'=t
A 11 are
Probability Spaces
P(A)
+ P(B).
(Qt
Ai) =
itt
P(A,).
Actually, again for mathematical reasons, we will in fact demand that this
additivity property hold for countable collections of disjoint events.
Definition 2 A probability measure P on a a-field of subsets .91
of a set n is a real-valued function having domain .91 satisfying the
following properties:
(i) P(Q) = 1.
(ii) P(A) > 0 for all A e d .
(iii) If A,., n = 1, 2, 3, .. . , are mutually disjoint sets in .91, then
p
1.2.
Probability spaces
One thing however is clear; whatever .91 and P are chosen to be, d must
contain all intervals, and P must assign probability (e-lto - e-At,) to the
interval [t0 , t 1 ] if we want the probability space we are constructing to
reflect the physical situation. The problem of constructing the space now
becomes the following purely mathematical one. Is there a a-field .91 that
contains all intervals as members and a probability measure P defined on
d that assigns the desired probability P(A) to the interval A? Questions
of this type are in the province of a branch of advanced mathematics
called measure theory and cannot be dealt with at the level of this book.
Results from measure theory show that the answer to this particular
question and others of a similar nature is yes, so that such constructions
are always possible.
We will not dwell on the construction of probability spaces in general.
The mathematical theory of probability begins with a~ abstract probability
space and develops the theory using the probability space as a basis of
operation. Aside from forming a foundation for precisely defining other
concepts in the theory, the probability space itself plays very little role in
the further development of the subject. Auxiliary quantities (especially
random variables, a concept taken up in Chapter 3) quickly become the
dominant theme of the theory and the probability space itself fades into
the background.
We will conclude our discussion of probability spaces by constructing
an important class of probability spaces, called uniform probability spaces.
Some of the oldest problems in probability involve the idea of picking a
point "at random" from a set S. Our intuitive ideas on this notion show
us that if A and Bare two subsets having the same "size" then the chance
of picking a point from A should be the same as from B. If S has only
finitely many points we can measure the "size" of a set by its cardinality.
Two sets are then of the same "size" if they have the same number of
points. It is quite easy to make a probability space corresponding to the
experiment of picking a point at random from a set S having a finite
numbers of points. We taken = Sand d to be all subsets of S, and
assign to the set A the probability P(A) = jfs if A is a set having exactly j
points. Such a probability space is called a symmetric probability space
because each one-point set carries the same probability s - 1 . We shall
return to the study of such spaces in Chapter 2.
Suppose now that S is the interval [a, b] on the real line where ~ oo <
a < b < + oo. It seems reasonable in this case to measure the "size" of a
subset A of [a, b] by its length. Two sets are then of the same size if they
have the same length. We will denote the length of a set A by lA I.
To construct a probability space for the experiment of "choosing a
point at random from s,~ we proceed in a manner similar to that used for
= S, and appeal to the results of
the isotope experiment. We take
ProbsbHity Spaces
10
measure theory that show that there is a a-field .91 of subsets of S, and a
probability measure P defined on .91 such that P(A) = IAI/ISI whenever A
is an interval.
More generally, let S be any subset of r-dimensional Euclidean space
having finite, nonzero r-dimensional volume. For a subset A of S denote
the volume of A by IAJ. There is then a a-field .91 of subsets of S that
contains all the subsets of S that have volume assigned to them as in
calculus, and a probability measure P defined on .91 such that P(A) =
IAI/ISI for any such set A . We will call any such probability space,
denoted by (S, .91, P), a uniform probability space.
1.3.
Properties of probabilities
!l n B =(Au
A~
n B = (A n B) u (Ac n B).
By setting B
+ P(Ac n B).
and recalling that P(!l) = 1, we conclude from (2) that
P(A~ = 1 - P(A).
P(B) = P(A n B)
(2)
=n
(3)
(4)
P(B) = P(A)
Since P(Ac n B)
(6)
P(Ac n B)
P(A)
if A c B.
if A c B.
~
1.3.
11
Properties of prQbabilities
and
(8)
To see that (7) holds, observe that w e (U,. ~ 1 A,.)c if and only if m A,.
for any n; that is, we A! for all n ;;;:: 1, or equivalently, wE
A~. To
establish (8) we apply (7) to {A~}, obtaining
n,.
Now
n,.
U,. A,. is the event that at least one of the events A,. occurs, while
A~
is the event that none of these events occur. In words, (9) asserts
that the probability that at least one of the events A 11 will occur is 1 minus
the probability that none of the events A,. will occur. The advantage of (9)
is that in some instances it is easier to compute P(n,. A~ than to compute
P(U,. A 11) . [Note that since the events A,. are not necessarily disjoint it is
not true that P(U,. A 11) = L11 P(A,.).] The use of (9) is nicely illustrated
by means of the following.
Suppose three perfectly balanced and identical coins are
tossed. Find the probability that at least one of them lands heads.
Example 3.
Coin 2
Coin 3
T
-
Our intuitive notions suggest that each of these eight outcomes should
have the probability 1/8. Let A 1 be the event that the first coin lands
heads, A 2 the event that the second coin lands heads, and A 3 the event
that the third coin lands heads. The problem asks us to compute
P(A 1 u A 2 u A3 ) . Now Af n A2 n Aj = {T, T, T} and thus
P(Af n
A2 n
A3) = 1/8;
12
Prob11bility $p11css
Our basic postulate (iii) on probability measures tells us that for disjoint sets A and B, P(A u B) = P(A) + P(B). If A and B are not
necessarily disjoint, then
P(A u B) = P(A)
(10)
+ P(B)
- P(A n B)
and consequently
P(A u B) ~ P(A.)
(11)
+ P(B).
To see that (10) is true observe that the sets A n Be, A. n B, and Ae n B
are mutually disjoint and their union is just A u B (see Figure 2). Thus
(12)
+ P(Ae n
B)
+ P(A n
B).
By (2), however,
P(A n B1
P(A) - P(A n B)
P(Ac n B)
and
By substituting these expressions into (12), we obtain (10).
Figure 2
Equations (10) and (11) extend to any finite number of sets. The
analogue of the exact formula (10) is a bit complicated and will be discussed in Chapter 2. Inequality (11), however, can easily be extended by
induction to yield
(13)
s L"
P(A,).
I= 1
S P(A 1 u u A.- 1)
+ P(A,.).
Hence if (13) holds for n - 1 sets, it holds for n sets. Since (13) clearly
holds for n = 1, the result is proved by induction.
1.3.
13
P1operties of p1obsbilities
So far we have used only the fact that a probability measure is finitely
additive. Our next result will use the countable additivity.
(14)
(i) If A 1 c: A 2 c: and A
= P(A).
lim P(A,.)
n-+co
n::.
UB
A,. =
1,
I= 1
and
co
U B1
I= 1
A=
Consequently,
II
P(A,.) = ~ P(B1)
I= 1
and
P(A)
co
~ P(B1).
i= 1
Now
CO
II
(15)
n-+co I= 1
l= 1
by the definition of the sum of an infinite series. It follows from (15) that
lim P(A,.)
n-+ co
II
lim ~ P(B1)
n-+co 1=1
CIO
L P(BI) = P(A),
I= 1
A'"=
U A~.
n=1
lim
n-+co
P(A~
= P(AC).
n:=
A,.. Then
14
Ptobsbility Spaces
..... 00
1 - lim P(A~)
n-+oo
1 -
P(A~ =
P(A),
Conditional probability
(17)
If P(A)
P(A)
rl
N"(if)
B)
N,(A
rl
B)/n
N"(il)jn
1.4.
Conditions/ p1obsbillty
16
r. B)/P(A).
As a first example of the use of (17) we will solve the problem posed at
the start of this section. Since the set Q has b + r points each of which
carries the probability (b + r)- 1 , we see that P(A) = r(b + r)- 1 and
P(A n B) = (b + r)- 1 Thus
I
P(B I A)=-.
r
tossed once.
(a) Find the conditional probability that both coins show a head given
that the first shows a head.
(b) Find the conditional probability that both are heads given that at
least one of them is a head.
To solve these problems we let the probability space n consist of the
four points HH, HT, TH, IT, each carrying probability 1/4. Let A be
the event that the first coin results in heads and let B be the event that the
second coin results in heads. To solve (a) we compute
P(A n B
I A) =
= 1/2.
I A u B) =
P(A n B)/P(A u B)
= (1/4)/(3/4) =
1/3.
In the above two examples the probability space was specified, and we
used (17) to compute various conditional probabilities. In many problems
however, we actually proceed in the opposite direction. We are given in
advance what we want some conditional probabilities to be, and we use this
information to compute the probabiiity measure on n. A typical example
of this situation is the following.
Suppose that the population of a certain city is 40% male
and 60% female. Suppose also that 50% of the males and 30% of the
females smoke. Find the probability that a smoker is male.
Example 6.
Let M denote the event that a person selected is a male and let F denote
the event that the person selected is a female. Also let S denote the event
that the person selected smokes and let N denote the event that he does not
smoke. The given information can be expressed in the form P(S I M) = .5,
16
Probability Spaces
= .6.
+ P(S n F).
Since
P(S n F) = P(F)P(S I F) = (.6)(.3) = .18,
we see that
P(S)
.20
+ .18 = .38.
Thus
20
= .38
~
P(M I S)
.53 .
The reader will notice that the probability space, as such, was never
explicitly mentioned. This problem and others of a similar type are solved
simply by using the given data and the rules of computing probabilities
given in Section 3 to compute the requested probabilities.
It is quite easy to construct a probability space for the above example.
Take the set Q to consist of the four points SM, SF, NM, and NF that are,
respectively, the unique points in the sets S n M , S r'l. F, N n M , and
N n F. The probabilities attached to these four points are not directly
specified, but are to be computed so that the events P(S I M), P(S I F),
P(M), and P(F) have the prescribed probabilities. We have already
found that P(S n M) = .20 and P(S n F) = .18. We leave it as an
exercise to compute the probabilities attached to the other t~o points.
The problem discussed in this example is a special case of the following
general situation. Suppose A1, A 2 , , A,. are n mutually disjoint events
with union n. Let B be an event such that P(B) > 0 and suppose P(B I At)
and P(At) are specified for 1 ~ k ~ n. What is P(A 1 I B)? To solve this
problem note that the A 11 are disjoint sets with union 0 and consequently
Thus
P(B) =
A:=l
But
P(B n Aa)
1.4.
Conditional probability
17
so we can write
(18)
This formula, called Bayes' rule, finds frequent application. One way
oflooking at the result in (18) is as follows. Suppose we think of the events
At as being the possible "causes" of the observable event B. Then P(A 1 I B)
is the probability that the event A, was the "cause" of B given that B
occurs. Bayes' rule also forms the basis of a statistical method called
Bayesian procedures that will be discussed in Volume II, Introduction to
Statistical Theory.
As an illustration of the use of Bayes' rule we consider the following
(somewhat classical) problem.
Exa~ple 6.
18
Probability Spsces
Example 7.
b black balls. A ball is drawn and its color noted. Then it together
with c > 0 balls of the same color as the drawn ball are added to the urn.
The procedure is repeated n - 1 additional times so that the total
number of drawings made from the urn is n.
Let R1, 1 ~ j < n, denote the event that thejth ball drawn is red and let
B1 , 1 ~ j < n, denote the event that the jth ball drawn is black. Of course,
for each j, R1 and B1 are disjoint. At the kth draw there are b + r +
(k - 1)c balls in the urn and we assume that the probability of drawing
any particular ball is (b + r + (k - 1)c)- 1 To compute P(R 1 (') R 2 ) we
write
Now
and thus
P(R. (') R2) = (
r ) ( r+c ) .
b+r b+r+ c
Similarly
and thus
P(R 2 ) = P(R 1 n R 2 )
P(B 1 r. R 2 )
--b + r
Consequently, P(R2 )
= P(R1). Since
P(B2) = 1 - P(R 2) =
,
r
in the exercises.
1.5.
Independence
1.5. Independence
19
P(A n B)
= P(A)P(B).
= 0 and is also symmetric in the letters
P(A)P(B).
P(A n B n C)
P(A)P(B)P(C).
We leave it as an exercise to show that if A, B, and Care mutually independent and P(A n B) =F 0, then P(C I A n B) = P(C).
P1obabUhy Spaces
20
1/2, 0
1/4}
1.5.
21
Independence
the x/s have the value I; for simplicity, say x 1 = x 2 = ; = X~c = I and
the other x,'s have the value 0. Then if A1 denotes the event that the ith
trial, I < i < n, is a success, we see that
{(1, 1, ... , 1, 0, .. . , 0)} = A 1 n n A~c n A~+ 1 n n A~.
n- k
According to our intuitive views, the events A1 , .. , A~c, Ak+ ~> .. . , A~ are
to be mutually independent and P(A 1) = p, 1 < i < n. Thus we should
assign P so that
P({(1, 1, .. . , 1, 0, ... , 0)}) = P(A 1 )
P(A~c)P(Ak+ 1)
P(A~)
p"'(1 - p)"- k.
Let us now compute the probability that exactly k of then trials result
in a success. Note carefully that this differs from the probability that k
specified trials result in successes and the other n - k trials result in
failures. Let B" denote the event that exactly k of the n trials are successes.
Since every choice of a specified sequence having k successes has probability
pk(l - p)"-k, the event Brc has probability P(B~c) = C(k, n)pk(l - p)"-k,
where C(k, n) is the number of sequences (x 1 , .. , x,.) in which exactly k
of the x,'s have value 1. The computation of C(k, n) is a simple combinatorial problem that will be solved in Section 2.4. There it will be
shown that
(20)
C(k n)
'
n!
k!(n- k)! '
0 < k < n.
= m(m
1) 1.
(Z)
(the binomial
(21)
ProbsbHity Spaces
22
P(B 1)
(~) (.1)0(.9)3 +
(n
(.1)1(.9)2
= (.9) 3 + 3(.1)(.9)2
= .972.
Exercises
Let (0, .!1/, P) be a probability space, where .r;1 is the a-field of all
subsets of n and P is a probability measure that assigns probability
p > 0 to each one-point set of 0.
(a) Show that 0 must have a finite number of points. Hint: show that
n can have no more than p- 1 points.
(b) Show that if n is the number of points in 0 then p must be n- 1
2 A model for a random spinner can be made by taking a uniform
probability space on the circumference of a circle of radius 1, so that the
probability that the pointer of the spinner lands in an arc of length s is
sf2n. Suppose the circle is divided into 37 zones numbered I, 2, ... , 37.
Compute the probability that the spinner stops in an even zone.
x+y= 1.
4 Let a point be picked at random in the disk of radius 1. Find the
probability that it lies in the angular sector from 0 to n/4 radians.
5 In Example 2 compute the following probabilities:
(a) No disintegration occurs before time 10.
(b) There is a disintegration before time 2 or a disintegration between
times 3 and 5.
6 A box contains 10 balls, numbered 1 through 10. A ball is drawn from
the box at random. Compute the probability that the number on the
ball was either 3, 4, or 5.
7 Suppose two dice are rolled once and that the 36 possible outcomes are
equally likely. Find the probability that the sum of the numbers on the
two faces is even.
23
Exercises
2/5, P(B)
= 2/5,
and
15 With the same box composition as in Exercise 14, find the probability
that all three of the removed balls will be black if it is known that at
least one of the removed balls is black.
16 Suppose a factory has two machines A and B that make 60% and 40%
of the total production, respectively. Of their output, machine A
produces 3% defective items, while machine B produces 5% defective
items. Find the probability that a given defective part was produced by
machine B.
17 Show by induction on n that the probability of selecting a red ball at
any trial n in Polya's scheme (Example 7) is r(b + r)- 1
18 A student is taking a multiple choice exam in which each question has
5 possible answers, exactly one of which is correct. H the student knows
the answer he selects the correct answer. Otherwise he selects one
answer at random from the 5 possible answers. Suppose that the
student knows the answer to 70% of the questions.
(a) What is the probability that on a given question the student gets
the correct answer?
24
Probebflity Spaces
(b) If the student gets the correct answer to a question, what is the
probability that he knows the answer?
19 Suppose a point is picked at random in the unit square. If it is known
that the point is in the rectangle bounded by y = 0, y = 1, x = 0, and
20
21
22
23
24
25
26
25
Exercises
(c) Calculate the probability that the first ball is white given that the
second ball is white.
It is known that
40% of the men and 60% of the women smoke cigarettes. What is the
probability that a student observed smoking a cigarette is a man?
probability that the second drawer has a silver coin given that the
first drawer has a gold coin.
31 In Polya's urn scheme (Example 7) given that the second ball was red,
33
34
35
36
37
38
Let A 1 be the event that the ith coin lands heads. Show that the events
A 1, A 2 , and A 3 are mutually independent.
Suppose the six faces of a die are equally likely to occur and that the
successive die rolls are independent. Construct a probability space for
the compound experiment of rolling the die three times.
Let A and B denote two independent events. Prove that A and Be,
Ac and B, and Ac and Be are also independent.
Let !l = {1, 2, 3, 4} and assume each point has probability 1/4. Set
A = {1, 2}, B = {1, 3}, C = {1, 4}. Show that the pairs of events
A and B, A and C, and B and C are independent.
Suppose A, B, and Care mutually independent events and P(A n B) :/:
0. Show that P(C I A n B) = P(C).
Experience shows that 20% of the people reserving tables at a certain
restaurant never show up. If the restaurant has 50 tables and takes
52 reservations, what is the probability that it will be able to accommodate everyone?
A circular target of unit radius is divided into four annular zones with
outer radii 1/4, 1j2, 3/4, and 1, respectively. Suppose 10 shots are fired
independently and at random into the target.
26
Probability Spaces
(a) Compute the probability that at most three shots land in the zone
bounded by the circles of radius 1/2 and radius 1.
(b) If 5 shots land inside the disk of radius 1/2, find the probability
that at least one is in the disk of radius 1/4.
39 A machine consists of 4 components linked in parallel, so that the
40
41
42
43
44
45
46
47
Combinatorial
Analysis
Recall from Section 1.2 that a symmetric probability space having s points is the
model used for choosing a point at random from a set S having s points. Henceforth when we speak of choosing a point at random from a finite set S, we shall
mean that the probability assigned to each one-point set is s- 1 , and hence the
probability assigned to a set A havingj points isjjs.
Let N(A) denote the number of points in A. Since P(A) = N(A)fs, the problem
of computing P(A) is equivalent to that of computing N(A). The procedure for
finding P(A) is to "counf' the number of points in A and divide by the total
number of points s. However, sometimes the procedure is reversed. If by some
means we know P(A), then we can find N(A) by the formula N(A) = sP(A).
This reverse procedure will be used several times in the sequel.
The computation of N(A) is easy if A has only a few points, for in that case we
can just enumerate all the points in A. But even if A has only a moderate number
of points, the method of direct enumeration becomes intractable, and so some
simple rules for counting are desirable. Our purpose in this chapter is to present a
nontechnical systematic discussion of techniques that are elementary and of
wide applicability. This subject tends to become difficult quite rapidly, so we shall
limit our treatment to those parts of most use in probability theory. The first four
sections in this chapter contain the essential material, while the last four sections
contain optional and somewhat more difficult material.
2.1 ..
Ordered samples
27
28
Combinatorial Analysis
The special case when the sets S, 1 S i < n, are the same set can be
approached from a different point of view. Suppose a box has s distinct
balls labeled 1, 2, ... , s. A ball is drawn from the box, its number noted
and the ball is returned to the box. The procedure is repeated n times.
Each of the n draws yields a number from 1 to s. The outcome of the n
draws can be recorded as an n-tuple (x1, x 2, ... , X11) , where x 1 is the
number on the 1st ball drawn, x 2 that on the 2nd, etc. In all, there ares"
possible n-tuples. This procedure is called sampling with replacement from
a population of s distinct objects. The outcome (x 1 , x 2 , , X 11) is called a
sample of size n drawn from a population of s objects with replacement.
We speak of random sampling with replacement if we assume that all of the
s" possible samples possess the same probability or, in traditional language,
are equally likely to occur.
A perfectly balanced coin is tossed n times. Find the
probability that there is at least one head.
Example 3.
U7=
P(A) = 1 - P(Ac)
=
1- p ( c01 A.r)
(n Ai)
= 1- p
i=l
and n;= 1 A~ occurs if and only if all of the n tosses yield tails. Thus
P(ni=t Ai) = 2- ", so P(A) = I - 2- ".
2.1.
Ordered samples
29
(1)
(s),.
-- =
s"
s(s- 1) ( s - n
1)
~--~--~------~
_ (1 _
s"
~)
(1 _
~) ... ( 1
_ n
1) .
5
and the fact that birth rates are not exactly uniform over the year.) Find
the probability p that no two people in a group of n people will have a
common birthday.
Example 4.
Combinatorial Analysis
30
lim
s-+ao
(~~~~ =
S
lim
s-+ao
(1 - !) .. (1 - n S
1
)
= 1.
Permutations
Suppose we have n distinct boxes and n distinct balls. The total number
of ways of distributing the n balls into the n boxes in such a manner that
each box has exactly one ball is n!. To say that these n balls are distributed
at random into the n boxes with one ball per box means that we assign
probability 1/n! to each of these possible ways. Suppose this is the case.
What is the probability that a specified ball, say ball i, is in a specified box,
say boxj? If ball i is in boxj, this leaves (n - 1) boxes and (n - I) balls
to be distributed into them so that exactly one ball is in each box. This
can be done in (n - 1)! ways, so the required probability is (n - 1) !/n! =
lfn.
Another way of looking at this result is as follows. If we have n distinct
objects and we randomly permute them among themselves, then the
probability that a specified object is in a specified position has probability
lfn. Indeed, here the positions can be identified with the boxes and the
objects with the balls.
The above considerations are easily extended from 1 to k > 1 objects.
If n objects are randomly permuted among themselves, the probability
that k specified objects are in k specified positions is (n - k)!/n!. We
leave the proof of this fact to the reader.
Problems involving random permutations take on a variety of forms
when stated as word problems. Here are two examples:
(a) A deck of cards labeled 1, 2, ... , n is shuftled, and the cards are
then dealt out one at a time. What is the probability that for some
specified i, the ith card dealt is the card labeled i?
2.3.
31
(b) Suppose 10 couples arrive at a party. The boys and girls are then
paired off at random. What is the probability that exactly k specified boys
end up with their own girls?
A more sophisticated problem involving random permutations is to find
the probability that there are exactly k "matches." To use our usual
picturesque example of distributing balls in boxes, the problem is to find
the probability that ball i is in box i for exactly k different values of i.
The problem of matchings can be solved in a variety of ways. We
postpone discussion of this problem until Section 2.6.
2.3.
(s),. = (s).
r!
Combinatorial Analysis
32
(s)
r
s!
(s),.
= r! =r!(s-r)!.
We point out here for future use that ( ~) is well defined for any real
number a and nonnegative integer r by
(3)
a)
(r
= (a),. =
1) (a - r
a(a -
r!
1) ,
r!
(-1t)
3
= ( -1r)( -~
tX -n
3!
=-
1t(7t
1)(1t
2)
3!
- 2)
= 0 if r >
= 0 if r is a negative integer.
a. We adopt
Then ( ~) is defined
1 < i 1 < i2 < < i,. < n. Indeed, each of the (n),. choices of r distinct
numbers from 1 to n has r! reorderings exactly one of which satisfies the
requirement. Thus the number of distinct choices of numbers satisfying
the requirement is the same as the number of distinct subsets of size r
that can be drawn from the set { 1, 2, ... , n}.
The mathematics department
consists of 25 full professors, 15 associate professors, and 35 assistant
professors. A committee of 6 is selected at random from the faculty of
the department. Find the probability that all the members of the committee are assistant professors.
Example 7.
Committee membership.
2.3.
33
i)
i)I(i).
value of .01; therefore the tenure staff (associate and full professors) need
not worry unduly about having no representation.
Consider a poker hand of five cards. Find the probability
of getting four of a kind (i.e., four cards of the same face value) assuming
the five cards are chosen at random.
Example 8.
(s;)
to- 4
The probability space in this case consists of n" equally likely points.
Let A be the event that only box 1 is empty. This can happen only if the
n balls are in the remaining n - I boxes in such a manner that no box is
empty. Thus, exactly one of these (n - 1) boxes must have two balls, and
the remaining (n - 2) boxes must have exactly one ball each. Let B1 be the
event that box j, j = 2, 3, ... , n, has two balls, box 1 has no balls, and
the remaining (n - 2) boxes have exactly one ball each. Then the B1 are
disjoint and A = Uj= 2 B1 To compute P(B1) observe that the two balls
put in box j can be chosen from the n balls in (;) ways. The (n - 2)
balls in the remaining (n - 2) boxes can be rearranged in (n - 2)! ways.
34
Combinatorial Analysis
Thus the number of distinct ways we can put two balls into boxj, no ball in
(~) (n
- 2)!.
(~) (n
- 2}!
P(B} = -'-''--n-,.- -
and consequently
(n -
P(A) =
1)
(n)2
n"
2.4.
(n - 2)!
(") (n - 1)!
= _,_2--'-----
n"
Partitions
(b ~ ')
in the population.
probability
E!~h of these
r balls
(~)
ways without
regard to order, and then - k black balls can be chosen from the b black
balls without regard to order in
balls could be paired with each choice of n - k black balls there are,
therefore, a total of
probability is
(~)
2.4.
35
PtJrtitions
The essence of this type of problem is that the population (in this case
the balls) is partitioned into two classes (red and black balls). A random
sample of a certain size is taken and we require the probability that the
sample will contain a specified number of items in each of the two classes.
In some problems of this type the two classes are not explicitly specified,
but they can be recognized when the language of the problem is analyzed.
A poker hand has five cards drawn from an ordinary deck
of 52 cards. Find the probability that the poker hand has exactly 2 kings.
Example 10.
deck there are 4 kings and 48 other cards. This partitions the cards into
two classes, kings and non-kings, having respectively 4 and 48 objects
each. The poker hand is a sample of size 5 drawn without replacement
and without order from the 52 cards. The problem thus is to find the
probability that the sample has 2 members of the first class and 3 members
of the second class. Hence the required probability is
(a) What is the probability that in a hand of 5 cards exactly 3 are clubs?
(b) What is the probability that in a hand of 5 cards exactly 3 are of the
sam~ suit?
To solve problem (a) we note that the conditions of the problem partition the deck of 52 cards into 2 classes. Class one is the "clubs" having
13 members, and class two is "other than clubs" having 39 members. The
5 cards constitute a sample of size 5 from the population of 52 cards, and
the problem demands that 3 of the 5 be from class one. Thus the required
probability is
p=
8.15
10- 2
To solve (b) let A 1 be the event that exactly 3 cards are clubs, A 2 the
event that exactly 3 cards are diamonds, A 3 the event that exactly 3 cards
are hearts, and A 4 the event that exactly 3 cards are spades. Then since
there are only 5 cards in the hand, the events A 1, A 2 , A 3 , A4 are mutually
36
Combinatorial Analysis
ff)
each of which is equally likely. Of these we must now compute the number
of ways in which we can have one pair and one triple. Consider the number
of ways we can choose a particular triple, say 3 aces, and a particular pair,
say 2 kings. The triple has 3 cards that are to be chosen without regard to
order from the four aces and this can be done in
(~) ways.
two cards to be drawn without regard to order from the four kings. This
can be done in
(~)
(~) (~)
Thus the
probability of getting a poker hand that has a triple of aces and a pair of
5
kings is
= p. Of course, this probability would be the
same for any specified pair and any specified triple. Now the face value
of the cards on the triple can be any of the possible 13, and the face value
of the cards in the pair can be any of the 12 remaining possible face values.
Since each of the 13 values for the triple can be associated with each of the
12 values for the pair, there are (13)(12) such choices. Each of these
choices constitutes a disjoint event having probability p, so the required
probability is
(13)(12)p = (13)(12)(4)(6) ~ 1.44
10-3.
(sff)
In a poker hand what is the probability of getting exactly
two pairs? Here, a hand such as (2, 2, 2, 2, x) does not count as two pairs
but as a 4-of-a-kind.
Example 13.
To solve the problem we note that if the hand has two pairs, then two of
the cards have the same face value x, two of the cards have the same face
value y # x, and the fifth card has a different face value from x or y.
Now there are 13 different face values. The face values of the two pairs
1
can be chosen from them in (
ways. The other card can be any one of
i)
11 face values. The two cards of value x can be chosen from the four of
2.4.
Partitions
that value in
37
The remaining
(i)
= 4 ways.
To solve this problem we can argue as follows. The effect of the first
sample is to partition the balls into two classes, viz., those n selected and
those r - n not selected: (We can imagine that then balls selected in the
first sample are painted red before being tossed back). The problem is
then of finding the probability that the sample of size m contains exactly
k balls from the first class, so the desired probability is
38
Combinatorial Analysis
In the committee problem discussed earlier, find the probability that the committee of 6 is composed of
2 full professors, 3 associate professors, and 1 assistant professor.
Example 16.
Committee problem.
2.5.
Union of events
Ui=
2.5.
39
Union of events
P(A 1 v A 2 )
P(A 1 )
P(A 2 )
P(A 1
r1
P(Ui'= 1 A 1)
A 2 ).
3. Let
Now
(4)
P(B)
= P(A 1 u
A 2 ) = P(A 1)
Since B n A 3 = (A 1 v A2 ) n A 3
that
(5) P(B () A3 )
= P(A 1 fl
A3 )
P(A 2 )
P(A 1 n A 2 ).
(A 1 n A3 ) v (A 2 n A 3 ), it follows
P(A 2 n A 3 )
P(A 1 n A 2
r1
A3 ) .
Substituting (4) and (5) into the expression for P(A 1 u A 2 u A 3 ), we see
that
P(A 1 u A 2 u A 3 ) = [P(A 1)
+ P(Az)
- [P(A 1
r1
A3 )
- P(A 1 n A 2 )]
+ P(A 2 n
A3 )
+ P(A 3 )
+ P(A 2 ) + P(A 3 )]
[P(A 1 n A 2 ) + P(A 1 n A 3 ) +
P(A 1 n A 2 n A 3)]
= [P(A 1 )
+ P(A 1 n
P(A 2 n A 3 )]
A 2 n A 3 ).
+ P(A2) + P(A3),
S 2 = P(A 1 n A2 )
and
S3
= P(A 1 n
P(A 1 n A3 )
P(A 2
()
A 3 ),
A 2 n A 3 ).
Then
(6)
There is a generalization of (6) that is valid for all positive integers n.
Let A 1 , . , A, be events. Define n numbers S,., 1 < r :::;;; n, by
S,. =
1:
1 ~it< ...
<i,.~ll
P(A 11
r1 r1
Then in particular
S 1 = P(A 1 )
11-1
S2
= I:
io;J
+ .-. + P(A,.),
II
L
j=i+l
P(A; n A),
A;J.
40
Combinatorial Analysis
and
S,.
P(A 1 n n A,.).
(7)
= S1
S2
+ + (- 1)"- 1S,..
The reader can easily check that this formula agrees with (6) if n = 3 and
with Equation (10) of Chapter 1 if n = 2. The proof of (7) proceeds by
induction, but is otherwise similar to that of (6). We will omit the details
of the proof.
The sum S 1 has n terms, the sum S 2 has
sum S,. has ( ~) terms. To see this, note that the rth sum is just the sum
of the numbers P(A 11 n n A,r) over aU the values of the indices
i 1 ,. i 2 , , i,. such that i1 < i 2 < < i,.. The indices take values between
1 and n. Thus the number of different values that these indices can take is
the same as the number of ways we can draw r distinct numbers from n
numbers without replacement and without regard to order.
2.6.
Matching problems*
We now may easily solve the problem of the number of matches. Let
A, denote the event that a match occurs at the ith position and let p,.
denote the probability that there are no matches. To compute 1 - p,. =
P(U1= 1 A,), we need to compute P(A 11 n A12 n n A,,.) where i 1 ,
i 2 , , i,. are r distinct numbers from {I, 2, ... , n}. But this probability
is just the probability of a match at positions i 1, i 2 , , i,., and we have
already found that the probability of this happening is (n - r) !/n!. Since
the rth sum
P(A 1
f (-1)'-
r=1
"
( -l}r-1
r=1
r!
=L
n!
(n - r)!
r!(n - r)!
n!
that is,
(8)
(1 -
p,J
=1-
2!
1
+ -I - ... + ( -1)"- .
3!
n!
2.6.
Matching p1oblems
41
Using (8) we see that the probability, Pm that there are no matches is
(9)
Pn
1 - 1
1
2!
1
3!
I)"
n!
+ - - - + ''' + -
" ( 1)k
~
k=O
k!
= L
Now the right-hand side of (9) is just the first n + 1 terms of the Taylor
expansion of e- 1 Therefore, we can approximate Pn by e- 1 and get
1 - e- 1 = .6321 ... as an approximation to (I - Pn) It turns out that
this approximation is remarkably good even for small values of n. In the
table below we compute the values of(I - Pn) for various values ofn.
I - Pn
.6667
.6250
.6333
.6320
We thus have the remarkable result that the probability of at least one
match among n randomly permuted objects is practically independent of n.
The problem of matches can be recast into a variety of different forms.
One of the most famous of these is the following.
Two equivalent decks of cards are well shuffled and matched against
each other. What is the probability of at least one match?
To solve the problem we need only observe that the first deck can be
used to determine positions (boxes). With no loss of generality then we can
assume the cards in the first deck are arranged in the order 1, 2, . . . , n.
The cards in the second deck (the balls) are then matched against the
positions determined by the first deck. A match occurs at position i if and
only if the ith card drawn from the second deck is card number i.
Now that we know how to compute the probability Pn of no matches, we
can easily find the probability fJn(r) that there are exactly r matches. To
solve the problem we first compute the probability that there are exactly
r matches and that these occur at the first r places. This can happen only
if there are no matches in the remaining (n - r) places. The probability
that there are no matches among j randomly permuted objects is p1
Hence j! p1 is the number of ways that j objects can be permuted among
themselves so that there are no matches. (Why?) Since there is only pne
way of having r matches at the first r positions, the number of ways we
can have exactly r matches at the first r positions and no matches at
the remaining (n - r) positions is (n - r)! Pn-r Thus the required
probability is
(n - r)!
a., =
n!
Pn-r
The probability that there are exactly r matches and that these occur at
any specified r positions is the same for all specifications, namely, a.,.
Combinatolial Analysis
42
To solve the problem that there are exactly r matches, all that is now
necessary is to realize that the events "exactly r matches occurring at
positions i., i 2 , , i," are disjoint events for the various choices of
(~) rx,..
(~)
(10)
()=
P,.r
n! rx,.
r I(
. n - )r I.
n!
(n- r)!p,._,
r!(n-r)!
n!
1 [
1-1+ -1 ++ ( l)n-r] .
r!
2!
(n - r)!
=-
fJ,.{r)
e-t
~ -
r!
n.1
__ Pn-r
..::....::.........:...__
(r - 1)! n
Hence
P(Ai I B,) =
p,._, ' ~
n(r-1).p,._,
!_
2. 7.
Occupancy ptoblems
2.7.
43
Occupancy problems
Combinatorial Analysis
44
that k specified balls are ink specified boxes is just r -k. In the language of
random sampling this says that if a sample of size n is drawn with replacement from a population of r objects, then the probability that the .i1 th,
j 2 th, . .. ,Ath elements in the sample are any k prescribed objects is ,-k.
Let A 1(i) be the event that the jth element in the sample is the ith object.
Then we have just said that for any choicej1 < j 2 < <A, 1 < k < n,
of elements in the sample (i.e., balls) and any choice it> i2, . .. , ik of objects
(i.e., boxes),
P(Ah(i 1) n Ah(i 2 ) n n A1"(ik)) = ,-k.
Since this is true for all k and all choices of j 1 , . . ,jh we see that for any
ih i 2 , , i,. the events A 1(i1), . , A,.(i,.) are mutually independent.
If we think of drawing a random sample of size n from a set of r distinct
objects as ann-fold repetition of the experiment of choosing one object at
random from that set of r distinct objects, then we see that the statement
taat the events A 1(i1 ), . . , A 11{i11) are independent says that the outcome of
one experiment has no influence on the outcome of the other experiments.
This, of course, is in good accord with our intuitive notion of random
sampling.
Suppose n balls are distributed at random into r boxes.
Find the probability that there are exactly k balls in the first r 1 boxes.
Example 17.
To solve the problem observe that the probability that a given ball is in
one of the first r 1 boxes is r 1fr. Think of the distribution of then balls as
an n-fold repetition of the experiment of placing a ball into one of the
r boxes. Consider the experiment a success if the ball is placed in one of the
first r 1 boxes, and otherwise call it a failure. Then from our results in
Section 1.5, we see that the probability that the first r1 boxes have exactly
k balls is
2.8.
2.8.
45
Similarly, if 1 < i 1 < i 2 < < i" < r, then the event A, 1 fl A, 2 n
fl A1k occurs if and only if all of the balls are in the remaining r - k
boxes. Consequently, P(A 11 fl fl A 1k) = (r - k)"/r" = (1 - k/r)".
We can now apply (7) to compute the probability of A 1 u u A,,
which is just the event that at least one box is empty. In this situation
S"
= (~) (I
P(A 1 u u Ar) =
(-
1)"- 1
( ') (
k=l
~)
".
r
1 -
p0(r, n)
= 1 - P(A 1 u u Ar)
.(
-1)j-l
= 1 -
J= t
(~)
1
(1 -
;r.
1)"
r
ex"( r, n) =
(14)
(r -
k)"p 0 (r - k, n)
r"
(1 - ;k)" p (r 0
k, n).
We may now easily compute the probabilities P~c(r, n). For each choice
of k distinct numbers i., i2 , , i~c from the set of numbers {1, 2, . . . , n},
the event {exactly k boxes i 1 , i 2 , , i" empty} has probability cx~c(r, n) and
these events are mutually disjoint. There are ( ~) such events and their
union is just the event {exactly k boxes empty}. Thus
P~c(r, n) = (~) ( 1 -
(15)
;r
p 0 (r - k, n).
P~c(r,
n). =
10
~) r-k
~ (-1)1 ( r j
k) (1- 1- +, - k)".
46
Combinstorisl Analysis
Exercises
47
(a) What is the probability that exactly one box is empty? Hint: use
the result of Example 9.
(b) Given that box 1 is empty, what is the probability that only one
box is empty?
(c) Given that only one box is empty, what is the probability that box 1
is empty?
11
12 Show that
1)"s
n( 1-
(s),.
~ - ~
1)"-t
.
s
( 1- -
13 A box has b black balls and r red balls. Balls are drawn from the
occurrmg:
(a) Royal flush ((10, J, Q, K, A) of the same suit);
(b) Straight flush (five cards of the same suit in a sequence);
(c) Four of a kind (face values of the form (x, x, x, x, y) where x andy
are distinct);
(d) Full house (face values of the form (x, x, x, y, y) where x andy are
distinct);
(e) Flush (five cards of the same suit);
(f) Straight (five cards in a sequence, regardless of suit);
(g) Three of a kind (face values of the form (x, x, x, y, z) where x, y,
and z are distinct);
(h) Two pairs (face values of the form (x, x, y, y, z) where x, y, and z
are distinct);
(i) One pair (face values of the form (w, w, x, y, z) where w, x, y, and z
are distinct).
48
Combin11toriBI An11/ysis
16
17
18
19
20
size 3 is selected. Find the probability that balls 1 and 6 are among
the three selected balls.
Cards are dealt from an ordinary deck of playing cards one at a time
until the first king appears. Find the probability that this occurs with
the nth card dealt.
Suppose in a population of r elements a random sample of size n is
taken. Find the probability that none of k prescribed elements is in the
sample if the method used is
(a) sampling without replacement;
(b) sampling with replacement.
Suppose a random sample .of size n is drawn from a population of r
objects without replacement. Find the probability that k given objects
are included in the sample.
Suppose n objects are permuted at random among themselves. Prove
that the probability that k specified objects occupy k specified positions
is (n - k)!fn!.
With reference to Example 14, show that
21 A
What is the probability that the bridge hands of north and south
together (a total of 26 cards) contain exactly 3 aces?
23 What is the probability that if 4 cards are drawn from a deck, 2 will be
black and 2 will be red?
22
24 Find the probability that a poker hand of 5 cards will contain no card
smaller than 7, given that it contains at least 1 card over 10, where aces
are treated as high cards.
25 If you hold 3 tickets to a lottery for which n tickets were sold and
5 prizes are to be given, what is the probability that you will win at
least 1 prize?
26 A box of 100 washers contains 5 defective ones. What is the probability that two washers selected at random (without replacement)
from the box are both good?
27
Discrete Random
Variables
Consider the experiment of tossing a coin three times where the probability of a
head on an individual toss is p. Suppose that for each toss that comes up heads we
win $1, but for each toss that comes up tails we lose $1. Clearly, a quantity of
interest in this situation is our total winnings. Let X denote this quantity. It is
clear that X can only be one of the values $3, $1, -$1, and -$3. We cannot with
certainty say which of these values X will be, since that value depends on the outcome of our random experiment. If for example the outcome is HHH, then X will
be $3; while for the outcome HTH, X will be $1. In the following table we list the
values of X (in dollars) corresponding to each of the eight possible outcomes.
(J)
P{ro}
X(w)
HHH
p3
HHT
p2(l - p)
HTH
p2(1 - p)
THH
p2(l - p)
HIT
-1
p(l - p)2
THT
-1
p(l - p)2
TTH
-1
p(l - p)2
TIT
-3
(1 - p)J
We can think of X as a real-valued function on the probability space corresponding to the experiment. For each w e 0, X(m) is then one of the values, 3, 1, -1, -3.
Consider, for example, the event {w: X(w) = 1}. This set contains the three points
w 2 , w 3 , and w 4 corresponding to the outcomes HHT, HTH, and THH, respectively.
The last column in the table gives the probabilities associated with the eight
possible outcomes of our experiment. From that table we see that the event
{w: X(w) = I} has probability 3p2 (1 - p). We usually abbreviate this by saying
49
50
Definitions
Let (!l, .91, P) be an arbitrary -probability space, and let X be a realvalued function on n taking only a finite or countably infinite number of
values x 1, x 2 , As in the example just given, we would certainly like to
be able to talk about the probability that X assumes the value x, for each i.
For this to be the case we need to know that for each i, {w e n: X (ro) = x 1}
is an event, i.e., is a member of d . If, as in the previous example, dis the
u-field of all subsets of n then this is certainly the case. For in that case,
no matter what xi might be, {ro: X(ro) = x 1} is a subset of!l and hence a
member of d, since .91 contains every possible subset of n. However, as
was indicated in Section 1.2, in general .91 does not consist of all subsets of
n, so we have no guarantee that {roE n: X(ro) = x,} is in .91. The only
reasonable way out is to explicitly assume that X is a function on n such
that this desired property holds. This leads us to the following.
3.1.
Definitions
f(- 3)
= .216,
= .432,
f( -1)
/(1)
.288,
f(3) = .064,
.288
.216
.064
-3
-2
- 1
Figure 1
Binomial distribution. Consider n independent repetitions of the simple success-failure experiment discussed in Section 1.5.
Let S,. denote the number of successes in the n trials. Then S,. is a random
variable that can only assume the values 0, 1, 2, . .. , n. In Chapter 1 we
showed that for the integer k, 0 < k ~ n,
Example 2.
P(S,. = k) =
(~) y(l -
p)"-";
f(x) =
{(~) r(l
0,
- p)"-",
x = 0, 1, 2, . . . , n,
elsewhere.
This density, which is among the most important densities that occur in
probability theory, is called the binomial density with parameters nand p.
The density from Example 1 is a binomial density with parameters n = 3
andp = .4.
One often refers to a random variable X having a binomial density by
saying that X has a binomial distribution (with parameters n and p if one
wants to be more precise). Similar phraseology is also used for other
random variables having a named density.
As expJained in Chapter 2, the binomial distribution arises in random
sampling with replacement. For random sampling without replacement
we have the following.
62
x = 0, 1, 2, ... , n.
P(X = x) =
(rl}x(r - rl)n ~x n!
x!(n - x)! (r),.
_(n)x (r1)x(r -
r1)n-x.
(r),.
f(x) =
0,
0, 1, 2, .. . , n,
elsewhere
or
x = 0, 1, 2, . .. , n,
elsewhere.
This density is called the hypergeometric density.
Here are a few more examples of random variables.
Constant random variable. Let c be a real number. Then
the function X defined by X(w) = c for all ro is a discrete random
variable, since the set {ro: X(ro) = c} is the entire set nand n is an event.
Clearly, P(X = c) = 1, so the density f of X is simply f(c) = 1 and
f(x) = 0, x :1: c. Such a random variable is called a constant random
variable. It is from this point of view that a numerical constant is considered a random variable.
Example 4 .
3. 1.
53
Definitions
= P(X =
f(O) = I - p,
and
f(I) = p,
x =F 0 or I.
f(x) = 0,
Figure 2
P(A)
e
:
1)
2]
-------------- -
n[
2 -
(;)
2i
----
n2
2(n - x)
f(x) =
n2
0,
1
,
x = 1, 2, ... , n,
elsewhere.
54
(i) f(x)
0,
X E
R.
Li f(x,)
= I.
Properties (i) and (ii) are immediate from the definition of the discrete
density function of X. To see that (iii) holds, observe that the events
{co: X(w) = xi} are mutually disjoint and their union is n. Thus
~ f(xi)
i
= ~ P(X = xi)
i
=P(
y {X = xi}) = P(O) = 1.
3 . 1.
55
Definitions
s- 1 ,
x = ( 0,
x = x1 , x 2 , ,
Xs,
elsewhere
Example 7.
{p(l
- pt,
0,
X = 0, 1, 2, ... ,
elsewhere
(1)
f(x)
p)",
0,
= 0, 1, 2, . . . ,
elsewhere.
To show that this .is a density we must verify that properties (i)-(iii)
hold. Here property (ii) is obviously true. That (i) holds may be seen as
follows. For x a nonnegative integer,
= (-a)~
x!
= (-a)( -a -
1) .. (-a - x
,.
X.
1)
56
= (- 1t(ClXCl +
1) ... (Cl
X -
1)
x!
= ( -lY (a + X
n~
x!
Thus
(2)
p(~a)(-lY(l- PY
pa(a + ~
-I)ct- py.
Since the right-hand side of (2) is clearly nonnegative we see that (i) holds.
To verify (iii), recall that the Taylor series of (1 - t) -cz for - I < t < I is
f (-Cl) (-t)x.
(1 - t)-cz =
(3)
x=O
p- =
(-a) (-lY(l -
x=O
p)x
f(x) =
p
{
cz
(a + x
X -
1) (l
)x
- p '
0,
= 0, 1, 2, ... ,
elsewhere.
For some purposes this form is more useful than that given in (1). Observe
that the geometric density with parameter p is a negative binomial density
with parameters a = I and p.
Let A. be a positive number.
Poisson density with parameter A. is defined as
Example 9.
Poisson densities.
The
= 0, 1, 2, . . . ,
elsewhere.
It is obvious that this function satisfies properties (i) and (ii) in the definition of a discrete density function. Property (iii) follows immediately from
the Taylor series expansion of the exponential function, namely,
00
A_X
e"=~-.
x=O X!
Many counting type random phenomena are known from experience
to be approximately Poisson distributed. Some examples of such phenom-
3.2.
57
{ro I X(ro) e A}
U {ro I X(ro) =
x,eA
x1},
where by Ux,eA we mean the union over all i such that xi eA. Usually
the event {ro: X(ro) e A} is abbreviated to {X e A}, and its probability is
denoted by P(X e A). If - oo ~ a < b < oo and A is an interval with
end points a and b, say A = (a, b], then we usually write P(a < X S b)
instead of P(X e (a, b]). Similar notation is used for the other intervals
with these endpoints.
An abbreviated notation is also used for conditional probabilities. Thus,
for example, if A and B are two subsets of R we write P(X E A I X e B)
for the conditional probability of the event {X e A} given the event
{X eB}.
Let/be the density of X. We can compute P(X e A) directly from the
density /by means of the formula
(6)
P(X e A)
x,eA
f(x 1),
where by Lx,e A we mean the sum over all i such that x 1 eA. This formula
follows immediately from (5) since the events {ro I X(ro) = x 1}, i =
1, 2, .. . , are disjoint. The right side of (6) is usually abbreviated as
LxeA f(x). In terms of this notation (6) becomes
(7)
P(X e A) =
f(x).
xeA
L
x~t
f(x),
-00
58
F(b) - F(a).
F(t)
= l:
f(x),
x= -co
where [t] denotes the greatest integer less than or equal to t (e.g.,
[2.6] = [2] = 2). We see that F is a nondecreasing function and that,
for any integ,e r x, F has a jump of magnitude f(x) at x and F is constant on
the interval [x, x + 1). Further properties of distribution functions wi1l be
obtained, from a more general viewpoint, in Chapter 5.
SetS = {1, 2, ... , 10} and let X be uniformly distributed
on S. Then f(x) = 1/10 for x = 1, 2, . . . , 10 and f(x) = 0, elsewhere.
The distribution function of X is given by F(t) = 0 fort < I, F(t) = I for
t > 10 and
Example 10.
[t]
[t]
l:
F(t) =
f(x) = -
10
x= l
1 <X< 10.
< X < 5)
or as
P(3 < X < 5)
Similarly P(3
P(3
= F(5)
- F(3)
5) is obtained as
< X< 5)
= 3/10
or as
P(3 < X ~ 5)
Figure 3
= P(2 <
X < 5)
P (X :5 t)
10
3.2.
59
f(x) = (
Thus F(t)
= 0 fort
r,1 - PY',
= 0, 1, 2, . .. ,
elsewhere.
< 0 and
[t]
t ~ 0.
x=O
1 - F(x - 1)
(1 - p)x.
Geometrically distributed random variables arise naturally in applications. Suppose we have a piece of equipment, such as an electrical fuse,
that neither deteriorates nor improves in the course of time but can fail
due to sporadic chance happenings that occur homogeneously in time. Let
the object be observed at fixed time periods such as hours or days, and let
X be the number oftime units up to and including the first failure, assuming
that the object is new at time 0. Clearly X is a discrete random variable
whose possible values are found among the integers 1, 2, 3, . . . . The
event {X = n} occurs if and only if the object first fails at the nth time
period. Our intuitive notion that the object neither deteriorates nor improves with time can be precisely formulated as follows. If we know that
the object has not failed by time n, i.e., the first failure is after time n so
X > n, then the probability that it does not fail until after time n + m, i.e.,
P(X > n + mIX > n), should be the same as the probability of starting
with an object which is new at time n and having it not fail until after time
n + m. The fact that the failure causes occur homogeneously in time can
be taken to mean that this probability depends only on the number of time
periods that elapse between n and n + m, namely m, but not on n. Thus
P(X > n) should satisfy the equation
(8)
P(X > n
+ m IX >
Since
P(X .> n
P(X > n)
m),
60
P(X > n
in our case since X can assume only values that are positive integers.
Therefore P(X > 0) = 1.
Set p = P(X = 1). Then P(X > 1) = 1 - p and from (9) we see that
P(X
1) - P(X > n)
(1 _ p)"-1 _ (1 _ p)"
= p( 1
_ p)"- 1.
Ifp = 0 then P(X = n) = 0 for all n = 1, 2, ... and thus P(X = + oo)
= 1, i.e., the object never fails. We exclude this case from consideration.
Likewise p = 1 is excluded because then P(X = 1) = 1, so the object
always fails.
Let Y = X - 1. Then Y assumes the values 0, 1, 2, . . . with probabilities P(Y = n) = p(l - p)". We see therefore that Yhas the geometric
distribution with parameter p.
As we have just shown, the random variable Y = X - 1 is geometrically
distributed. This example is typical in the sense that geometrically
distributed random variables usually arise in connection with the waiting
time for some event to occur. We shall discuss this in more detail after we
treat independent trials in Section 3.4.
3.3.
X 1((J))
= xh X 2(w) = x 2 , ,
X,((J))
= x,.
3.3.
61
X(ro)
f(x 1 ,
x,) = P(X1
= x 1 , . , X,
= x,)
or equivalently
xeK.
The probability that X belongs to the subset A of R' can be found by using
the analog of (7), namely,
P(X e A) =
f(x).
xe...t
0, x e K.
Lt f(x 1)
= 1.
82
(y {X= x,
= ~j P(X
=X,
Y =
Y))
= YJ) = L, P(X
=X,
y = y).
This last expression results from using the same notational convention
that was introduced for random variables in Section 3.2. Similarly,
P(Y = y) = P (
{ X= xi, Y = y})
= ~ P(X
= y) = 1: P(X = X ,
= x, y
= y).
fx(x)
~ f(x, y)
y
and
fr(Y) = ~ f(x, y).
(12)
Suppose two cards are drawn at random without replacement from a deck of 3 cards numbered l, 2, 3. Let X be the number on the
first card and let Y be the number on the second card. Then the joint
density f of X and Y is given by f(l, 2) = f(l , 3) = /(2, 1) = /(2, 3) =
/(3, 1) = /(3, 2) = 1/6 and f(x, y) = 0 elsewhere. The first marginal
density, that is, the density of X is given by
Example 12.
fx(I)
= f(l,
1}
+ /(1, 2) + /(1, 3)
3.4.
63
I/4
I/8
I/I6
I/16
l/16
1/16
1/4
1/8
Then /x{l) = L:=l /(1, y) = l/4 + I/8 + 1/16 + 1/16 = 1/2, and
fx(2) = 1 - fx(l) = 1/2 so X has the uniform distribution on 1, 2.
Similarly
/r(l) = 1/4
3.4.
y),
0, 1, y = 1, 2, .. . , 6,
elsewhere.
X =
= fx(x)fy(y).
64
(13)
The random variables are said to be dependent if they are not independent.
As in the case of the combined experiment of tossing a coin and rolling a
die, the notion of independent random variables forms a convenient way
to precisely formulate our intuitive notions that experiments are independent of each other.
Consider two independent discrete random variables having densities
fx andfy, respectively. Then for any two subsets A and B of R
P(X E A, y E B)
(14)
I:
fx.r(x, y)
xeA yeB
~ fx(x)fr(Y)
xeA )IEB
= [ .xeA
~ fx(x)]
=
[I: fr(Y)]
,eB
= X) =
+Y
P(Y ~ X).
Y.
:::::: z) for y
= 0, 1, . . . , z.
so by Example 11
P(min (X, Y) ~ z)
= (1
- py(I - p)z
= (1
_ p)2z.
3.4.
65
P(Y ~ X) =
P(X = x, Y ~ X)
x=O
00
= l:
P(X
.x=O
= x,
Y > x)
00
~ P(X = x)P(Y ~ x)
.x=O
00
l: p(l
x=O
- Pt(l - PY
00
= p ~ (1 - p)2%
x=O
P(X
z) =
P(X = x, X+ Y = z)
x=O
=
=
..
l: P(X =
x=O
L"
x, Y = z - x)
P(X = x)P(Y =
z- x)
x=O
%
l:
.x=O
= y IX + y =
z)
= P(Y =
y' X + y
P(X
P(X =
P(X
z - y, Y
P(X
+ Y
=zP(X
= z)
Y = z)
= y)
= z)
y)P(Y = y)
Y = z)
z+1
66
Consider some experiment (such as rolling a die) that has only a finite or
countably infinite number of possible outcomes. Then, as already explained, we can think of this experiment as that of observing the value of a
discrete random variable X. Suppose thC experiment is repeated n times.
The combined experiment can be described as that of observing the values
of the random variables Xh X 2 , , X,., where X, is the outcome of the
ith experiment. If the experiments are repeated under identical conditions,
presumably the chance mechanism remains the same, so we should require
that these n random variables all have the same density. The intuitive
notion that the repeated experiments have no influence on each other can
now be formulated by demanding that the random variables xh x2, ... '
X,. be mutually independent. Thus, in summary, n independent random
variables X 1 , , X,. having a common discrete density f can be used to
represent an n-fold independent repetition of an experiment having a finite
or countably infinite number of outcomes.
The simplest random experiments are those that have only two possible
outcomes, which we may label as success and failure. In tossing a coin, for
example, we may think of getting a head as a success, while in drawing a
card from a deck of r cards we may consider getting an ace as a success.
Suppose we make n independent repetitions of our simple experiment. We
can then describe the situation by letting Xh X 2 , , X,. ben independent
indicator random variables such that Xi = I or 0 according as the ith trial
of the experiment results in a success or failure. In the literature, trials that
can result in either success or failure are called Bernoulli trials, and the
above situation is described by saying we perform n Bernoulli trials with
common probability p = P(X1 = I) for success. In this context a random
variable that takes on the values I and 0 with probabilities p and 1 - p
respectively is said to have a Bernoulli density with parameter p.
The outcome of performing n Bernoulli trials can be given by the
random vector X = (X1 , X 2 , , X,.). The information conveyed in this
vector tells exactly which trials were a success and which were a failure.
Often, such precise information is not required, and all we want to know is
the number S,. of trials that yielded a success among the n trials. In
Example 2 we showed that S,. was binomially distributed with parameters
n and p. Observe that S,. = X 1 + + X,.. Any random variable Y
that is binomially distributed with these same parameters can be thought of
as the sum of n independent Bernoulli random variables Xh . .. , X,. each
having parameter p.
Let us now consider independent repetitions of an experiment that has a
finite number r ~ 2 of possible outcomes.
Consider an experiment, such as
rolling a die, that can result in only a finite number r of distinct possible
3.4.1.
3.4.
87
then
x1
= 0,
x2 = 3, and x3
= 2.
We will now compute the joint density of X 1, , X,. To this end let
xh x 2 , , x, be r nonnegative integers with sum x 1 + + x, = n. A
moment's thought shows that since the random variables Y1 , Y2 , , Y,.
are independent with a common density, every specified choice of x 1 of
them having value 1, x 2 of them having value 2, ... , Xr of them having
valuer, has the same probability, namely
. px,.
1 2
,
Pxpx2
Thus letting C(n; x 1,
see that
P(X 1
p;".
(:J
ways. The
(:J
C(n- x 1 ; x 2 ,
x,).
68
(17)
n!
= (x ')(
') ... (x,.')
x
1
Indeed, for r = 1 there is nothing to prove. Assume that (17) holds for
r - 1 boxes. Then from (16) we see that
C(n; x 1 ,
... ,
n!
(n - x )!
x,) = - - -- - ___,___ _ 1
(x1 !)(n- x 1)! (x2!)" (x,!)
n!
-=..:...___
as desired.
The joint density f of X1 ,
(18) f(xl, ... ' x,) =
. ,
-----
X, is therefore given by
n!
p"t ... p""
(x1 !) . .. (x,!) 1
.r '
x 1 integers ;;:: 0 such that x 1
(
+ + x, = n,
0, elsewhere.
p"t
.. . p.r-1
" "- (1 _ p _ .. _ p
)n-xt- -x,.-t
1
1
r-1
xk
<
n,
n!
3.4.
69
To see this, observe that in performing then trials we are now only interested in the k + 1 outcomes "1," "2," . . . , "k," and "not (I, 2, ... , k)."
Thus in essence we haven repeated trials of an experiment having k + 1
outcomes, with Xi being the number of times that the ith outcome occurs,
i = I, 2, ... , k. Equation (20) now follows from (19) with r - I = k.
There is
an important connection between the binomial distribution and the
Poisson distribution. Suppose, for example, that we perform n Bernoulli
trials with success probability p,. = ).jn at each trial. Then the probability
of having S,. = k successes in the n trials is given by
3.4.2.
P(S,. = k) =
=
Now as n -+ oo, (n)kfnk
Consequently,
(21)
.
Itm
-+
(~)
(p,.)k(l - p,.)"-k
).k (n)k
k! nk
(t - ~)" (t - ~)
n
1, (l - ).jn)"
(11)
k (p,.)k(l
rt-+CXl
-+
-k.
In the derivation of (2I) we had np,. = ).. Actually (21) holds whenever
np,. -+ ). as n -+ oo.
Equation (21) is used in applications to approximate the binomial
distribution by the Poisson distribution when the success probability p is
smalJ and n is large. This is done by approximating the binomial probability P(S,. = x) by means of f(x) where f is the Poisson density with
parameter ). = np. The approximation is quite good if np2 is small.
The following example illustrates the use of this technique.
A machine produces screws, 1% of which are defective.
Find the probability that in a box of 200 screws there are no defectives.
Example 15.
= e- 2 = .1353.
The fact that the Poisson distribution can arise as a limit of binomial
distributions has important theoretical consequences. It is one justification
for developing models based on Poisson processes, which will be discussed
in Chapter 9. The use of the Poisson approximation as a labor-saving
70
{X1
It follows that
P(W1 = n)
P(X1 = 0, . . . , X,._ 1
P(X1
0) P(X,._ 1
= 0, X,.
= 1)
= O)P(X,. =
= (1 - p)"-p.
Consequently
(22)
P(W1
1 = n) = p(1 - p)".
1)
3.5.
71
Let r > I be an integer and let T, denote the number of trials until the
rth success (so that the rth success occurs at trial T,.). Then T,. is a random
variable that can assume only the integer values r, r + 1, . ... The event
{T,. = n} occurs if and only if there is a success at the nth trial and during
the first n - I trials there are exactly r - 1 successes. Thus
{ T,
= n} =
{X1
+ .. +
=r-
XII_ 1
I} n {X 11
I}.
Since the two events on the right are independent and X 1 + + X11 _ 1
has a binomial distribution with parameters n - 1 and p, we see that for
n = r, r + 1, ...
P(T,
n)
= P(X 1 + . . +
X 11 -
= r - 1)P(X = 1)
1) ,_ 1(1_
- (~ =Dp'(l - p)~~-r.
Consequently
+ n - 1) p'(1 (23)
P(T, 1
n -
) 11 - r
- ( r-1 P
r = n) = (
11
r _
P)11
We see from Equations (4) and (23) that T, - r has the negative binomial
distribution with parameters a = r and p.
Let T0 = 0 and for any integer r > 1 let T,. be as above. Define
W1 = T 1 - T 1_ 1 , i = 1, 2, . . . . Then Wi is the number of trials after the
(i - 1)st success until the ith success. We will now show that for any integer
r > 1 the random variables W1 - I, W2 - 1, . .. , W,. - 1 are mutually
independent and have the same geometric density with parameter p.
To see this let n 1 , n 2 , . , n, be any r positive integers. Then the
event {W1 = n 1 , , W, = n,} occurs if and only if among the first
n1 + + n, trials all are failures except for trials
which are successes. Since the trials are mutually independent with
success probability p we see that
P(W1
= (1-
n (p{l -
p)"'-IJ.
1=1
72
distributed with parameters rand p. We have thus established the interesting and important fact that the distribution of the sum of r independent,
identically distributed geometric random variables with parameter p is
negative binomially distributed with parameters r and p.
Further properties of infinite sequence~ of independent Bernoulli trials
will be treated in the exercises.
3.6.
In this section we discuss methods for finding the distribution of the sum
of a finite number of independent discrete random variables. Let us start
by considering two such variables X and Y.
We assume, then, that X and Y are independent discrete random
variables. Let x 1 , x 2 , . denote the distinct possible values of X. For any
z, the event {X + Y = z} is the same as the event
U {X = x1, Y = z
i
= xi,
Y = z) =
- x 1} .
L P(X = x1,
i
Y = z - x 1)
L P(X = x1)P(Y = z
- x1)
Li fx(xi)fy(z
- x,).
L fx(x)fr(z
- x).
In other words
(24)
fx+r(z) =
(25)
fx+ r(z)
= L
fx(x)fr(z - x).
.x=O
3.6.
73
CIO
<l>x(t} = ~ P(X
x=O
x)r = ~ fx(x}r,
-l=:;;;t<l.
x=O
P(X
= x) =
(j p"(l -
p)"-x
we conclude that
(26)
<l>x(t) == (pt
+ 1 - p)".
and hence
<l>x(t) =
p)"r
.r=O
ao
= p ~
(_ex)
'
( -t(l -
x=O
p))".
p
)llo.
c)x(t) = (
1 - t(1 - p)
74
Poisson distribution.
with parameter A.. Then
Example 18.
A.xe-;.
P(X = x) = - -
x!
and hence
(llx(t)
e-;.
x=O
By settings
(A.t)x .
X!
we see that
(28)
(llx+r(t) =
fz(z)tz
z=O
00
= L
tz
z=O
fx(x)fr(z - x)
x=O
00
00
fx(x)tx
.x=O
z=x
00
00
I:
.x=O
fy(Z - x)tz-x
/x(x)r ~ fr(Y)t'
y= O
The conclusions of the next theorem can be proven most easily by the
"generating function technique/' which is based upon the fact that if
00
!:
.x=O
00
axr =
!:
b.xtx,
-1 < t <
.1,
.x=O
then we may equate the coefficients of tx in the two power series and
conclude that ax = hx, x = 0, 1, 2, . . . . This shows that if two nonnegative integer-valued random variables have the same probability
3.6.
76
generating function, they must have the same distribution. In other words,
the probability generating function of a nonnegative integer-valued
random variable uniqueJy determines the distribution of that random
variable.
Theorem 1 Let X1 ,
+
(pt +
= (pt
1 - p)"1 (pt
1 - p)"1 + .. . +"...
1 - p)""
P)"t + +,.-x
x=O
P(X 1
+ + X, = x)t"
+ +
x,~s
+ +
X,
= x) = ax
76
)CIIl (
1 - t(1 --. p)
(1 - tft - p)r
p
)CX,.
1 - t(1 - p)
+ +ex...
(31)
-1<t<l.
P(SN
= x) = :!:
P(SN
x, N
n)
~ P(Sn
x, N
n)
n)P(S,
n== 0
00
r~=O
00
= :!:
P(N
= xI N =
n).
r~=O
(32)
P(SN
x)
~ P(N
n=O
n)P(S,
x).
77
Exercises
4>sN(t)
= L
rP(SN
x=O
co
x=O
co
tx ~ P(N = n)P(S,. = x)
n=O
00
= x)
~ P(N
00
n=O
n) ~ rP(S,.
x)
x=O
co
~ P(N = n) 4>s"(t)
n=O
00
L P(N =
n)(4>x,(t))"
= 4>N(4>x,(t)).
rt=O
Exercises
x = 1, 2, ... , N,
elsewhere.
.l
.05
78
9
10
11
12
13
-I
-2
1/9
1/27
1/ 27
1/9
2/9
1/9
1/9
1/9
4/27
IY- XI.
79
Exercises
P(M = m
'
= min (X,
. P(X
m)P(Y
+ z),
Y).
z < 0,
z > 0.
Figure 4
If three shots are fired at random at the target, what is the probability
that exactly one shot lands in each zone?
21 Suppose 2r balls are distributed at random into r boxes. Let X 1 denote
the number of balls in box i.
(a) Find the joint density of Xtt ... , X,.
(b) Find the probability that each box contains exactly 2 balls.
22 Consider an experiment having three possible outcomes that occur with
probabilities p 1 , p 2 , and p 3 , respectively. Suppose n independent
80
repetitions of the experiment are made and let X 1 denote the number of
times the ith outcome occurs.
(a) What is the density of X 1 + X 2 ?
(b) Find P(X2 = y I X 1 + X 2 = z), y = 0, 1, ... , z.
23
of 100 fuses has at most 2 defective fuses if 3% of the fuses made are
defective.
25
26 Let Ti be the number of trials up to and including the ith success. Let
T, = x,).
T, = x,)
w,
= x,- x,-1).
Now use the fact that the random variables W1 - 1, . .. , W,. - 1 are
mutually independent random variables, each having a geometric
distribution with parameter p .
27
Let N" be the number of successes in the first n trials. Show that
P(T1 =
28
I N,.
1
= 1) = - '
x = 1, 2, .. . , n.
7;. = x, 1 N,. = r) =
(~) -t,
=X
IN,.= r) =
1) (nX)
r-k
.
(;)
X( k-1
81
Exercises
<j
~ r
31
32
33
34
35
36
X,
1j = Y I Nn = r)
= x,
= y, Z = z I X +
=x +y +
z)
Expectation of
Discrete Random
Variables
Let us consider playing a certain game of chance. In order to play the game, we
must pay a fee of a dollars. As a result of playing the game we receive X dollars,
where X is a random variable having possible values x 1 , x 2 , , x,. The question
is, should we play the game? If the game is to be played only once, then this
question is quite difficult. However, suppose we play the game a large number of
times. After n plays we would pay na dollars and receive X 1 + + X11 dollars.
If we assume that the successive plays of the game constitute independent repetitions of the same experiment (observing a value of X), then we can take the random
variables X 1 , X 2 , , X 11 as mutually independent and having the common density
f of X. Let N 11(x1) denote the number of games that yielded the value x, i.e., the
number of X,'s that assume the value xi. Then we can write
r
X 1 + + X 11
= 1=1
I: x 1Nix1).
X 1 +- - +- XII
---=. . :.: -_
n
t=l
x, [Nn{x1)] .
n
(1)
EX =
I= 1
xJ(x1),
lnt1oduction
83
EX = 0 P(X = 0)
I P(X = I) = p.
Since a random variable having a binomial density with parameters 1 and p is just
an indicator random variable, we see that we can find the probability of the event A
that X = 1 by computing the expectation of its indicator.
We now compute EX for any n ~ 1. In this case X assumes the values 0, 1, 2, . .. ,
n,and
EX =
(~) pi(1
j=O
- p)"- i ,
(J~)
- j!(n-j)!
jn!
1)!
( j - 1}![(n- 1)n(n -
--------~--~-------
U-
1)]!
=n(;=D
Thus
EX= n
)=1
(~
}
1
1
y(l -
p)"-1.
Ex
= np "~1
"- (n -. 1) P'(1 1
1=0
P)n-1-1
p)"-t-1
= [p + (I
so we see that
EX= np.
- p)]"-1
= I,
84
4.1.
Definition of expectation
(2)
I:
EX =
x1J(x1).
j=l
00
j=l
--
~~.e
-A.
~
'--
J!
A
-).
e
(J - 1)!
.
J=l
~ )._i
'-- -
}=0
00
~ . A -A 1-J}-:-e
-
Ex --
j!
,_-A. A.
= ~~.c::
e
Geometric distribution.
ution with parameter p. Find EX.
Example 3.
--
,
11..
Now
00
EX=
l: jp{l
}=0
= p(l
- p)i
00
- p)
l: j(1
- p)l-1
j=O
00
j=O
dp
L-
= -p(l - p)
(t - p)l.
00
l: (1
}=0
- p)l.
Using the formula for the sum of a geometric progression, we see that
(1)
1
-p(l- p) (- ).
p2
4.2.
Ptoperties of expectation
85
Consequently
EX= 1 - p .
p
We will next consider an example of a density that does not have finite
mean.
Letfbe the function defined on R by
Example 4.
= {x(x ~
f(x)
1) '
= 1, 2, . .. ,
elsewhere.
0,
The function/ obviously satisfies properties (i) and (ii) in the definition of
density functions given in Chapter 3. To see that f satisfies property (iii)
we note that
1
1
1
- --= - - -x(x + 1)
x
x + 1
and hence
00
00
f(x) =
x=l
= (1
[1
1]
- --
x=l
- 1/2)
+ (1 /2 - 1/3) + .. . =
1.
Thus (iii) holds andfis a density. Nowfdoes not have finite mean because
1
tX)
tX)
L lxlf(x) = x=1
LX+1
x=l
and it is well known that the harmonic series L:= 1 x- 1 does not converge.
4.2.
Propenies of expectation
~ ({J(x)f(x)
](
= L qJ(x1)f(x1) .
J
86
(4)
< oo
ll:
EZ
~ q>(x)f(x).
ll:
P(Z
= z1}
=
= z1}.
~ fx(x).
x e.AJ
Consequently,
~ lzJI/z(zj) = ~ lz11P(Z = z1)
j
L lzil
Since q>(x)
~ fx(x)
XEAJ
Lj lzilfz(zi) = ~ L
j
lq>(x)l/x(x).
li:EAJ
By their definition, the sets A 1 are disjoint for distinct values of j, and their
union is the set of all possible values of X. Therefore
This shows that Z has finite expectation if and only if (4) holds.
If Z does have finite expectation, then by repeating the above argument
with the absolute signs eliminated, we conclude that (5) holds.
I
= lxl .
Then
by Theorem I, lXI has finite expectation if and only if L x lxlf(x) < oo.
But, according to our definition of expectation, X has finite expectation if
and only if the same series converges. We see therefore that X has finite
expectation if and only if EIXI < oo .
We shall now use Theorem 1 to establish the following important
properties of expectation.
4.2.
87
Properties of expectstion
(ii)
If c is a constant,
cEX.
(iii) X
Y) = EX
EY.
L xfx(x) = cfx(c) = c.
JC
< oo,
L (cx)fx(x)
= c
L xfx(x) =
cEX.
JC
+y
+ L IYlf(x,
y)
)/
)/
= LX lxlfx(x) +
L IYl/r(Y) <
y
oo
Y) =
L
(x + y)f(x, y)
x,y
=~
xf(x, y) +
x,y
~ yf(x, y)
JC,)I
=EX+ EY.
= X- Y
= E(X-
Y)
= EZ
= 1:zfz(z).
..
88
Now the sum of nonnegative terms can equal zero only if all the individual
terms equal zero. Since fz(z 1) > 0 it must be that z 1 = 0. Thus the only
possible value of Z is 0 and consequently P(Z = 0) = 1.
Finally, (v) follows from (iv) and (ii) because -lXI ~ X :S lXI and
hence -.EIXI < EX < EIXl . This completes the proof of the theorem. I
It easily follows from (ii) and (iiii.) that if X 1 , . . . , Xn are any n random
variables having finite expectation, and c 1, , en are any n constants, then
(6)
E(c 1 X 1
+ + cnXn)
= c 1EX1
+ +
cnEXn.
L lxllf(x1)
< M
= lx11)
M)
L f(x1)
lx11<
M for
> 0,
1. Consequently
~ M,
IEXI ~ EIXI =
L lx1lf(xl) ~
M.
4.2.
Properties of expectation
89
(7)
Proof. Observe that since X and Y are independent, the joint density
of X and Y isfx(x)fy{y). Thus
~ lxylf(x, y) = ~ lxiiYifx(x)fr(Y)
x,y
X,)l
L (xy)fx(x}fr(Y)
= [~ xfx(x)]
[ ~ Yfr(Y)]
= (EXXEY).
The converse of this property does not hold; two random variables X
and Y may be such that E(XY) = (EX)(EY) even though X and Yare not
independent.
Let (X, Y) assume the values (1, 0), (0, 1), ( -1, 0), and
(0, -1) with equal probabilities. Then EX = EY = 0. Since XY = 0,
it follows that E(XY) = 0 and hence E(XY) = (EX)(EY). To see that
X and Yare not independent observe, for example, that P(X = O) =
P(Y = 0) = 1/2 whereas P(X = 0, Y = 0) = 0. Thus
Example 5.
P(X
= 0,
= 0)
90
ES,.
= E(X1 + +
=L
X,.)
EXi = np.
'"" 1
Hypergeometric distribution. Suppose we have a population of r objects, r1 of which are of type one and r - r1 of type two.
A sample of size n is drawn without replacement from this population. Let
S, denote the number of objects of type one that are obtained. Compute
Example 7.
ESII.
But sft
= x1 + .. . +
= P(X1 =
I)= r 1
r
= L"
EXi
r
= n _J
i= 1
(;')11-1 (1 - ;') ,
n = 1, 2, .. . .
p,- 1
:....
(1 - i/r)- 1
= r(r-
i}- 1 .
4.2.
91
Properties of expectation
Consequently,
(8)
ESk = 1
+ k-1
~ ( - ' - .)
i= 1
k-1 (
=~
i=o
r -
-.
r -
r(lr + r -
_1_
1
1) .
+ ... + __1_ _
r-k+
We point out for later use that it is clear from the construction of the X 1
that they are mutually independent random variables.
In the previous chapter we have seen that nonnegative integer-valued
random variables X play a prominent role. For these random variables
the following theorem can frequently be applied both to decide if X has
finite expectation and to compute the expectation of X.
(9)
x=t
OC)
~ xP(X
(10)
x=l
= x) = L
x=l
from which the theorem follows immediately. To this end we first write
the left side of (1 0) as
X
00
}: P(X = x) }: 1.
x=l
y=l
OC)
OC)
y=l x=y
y=l
Replacing the dummy variable y by the dummy variable x in the right side
of this equality, we obtain the right side of (10). This shows that (10)
holds, as desired.
I
For an elementary application of this theorem, suppose that X is a
geometrically distributed random variable having parameter p. Then
P(X > x) = (I - p)" and thus by the theorem
OC)
EX =
(1 - p)" = (1 - p)
x=1
92
4.3.
Moments
(11)
JC
and
(12)
E(X - J.L)'
= ~ (x
- tt)'f(x).
In view of (11) and (12), the rth moment and rth central moment are
determined by the density f, and it therefore makes perfectly good sense
to speak of them as the rth moment and rth central moment of this
density.
Suppose X has a moment of order r; then X has a moment of order k
for all k :;:;; r . To see this, observe that if lxl < 1, then
lxl > 1,
1.
so
4.3.
Moments
93
X,
R.
To see this, observe that if lxl < lyl, then lxl 1lyl'- 1 S: lyl 1lyl'- 1 =
IYI' S: lxl' + lyl'; while if lxl ~ lyl, then lxl1lyl'- 1 < lxl' s; lxl' + IYI'.
Thus (13) holds. Using (13) and the binomial theorem we now see that
j=O
S:
But
(;)
(lxl' + Iyl').
(~) =
J
j=O
because
2'
(1
1)'
(~)
j=O
2'
1j1r-j
(~).
j=O
Consequently
L
lx + Yl'i(x, y) ~ 2' ~ (lxl' + I yl')f(x, y)
x,y
x,y
= 2'E(IXI' + IYl')
= 2'"(EIXI' + EIYI') < oo.
Hence by Theorem I, (X+ Y)' has finite expectation.
(2X)(EX)
2(EX) 2
(EX) 2 ]
(EX) 2
94
In other words
(14)
E(X- a)2 = EX 2
2aEX
+ a2
+ (p - a)] 2
+ 2(X- p)(l1
- a)
+ (p
- a)2
Since E(X - 11) = 0, it follows that the cross-product term has zero
expectation and hence
(15)
+ (Jl
(Jl - a)2
- a)2
.x=O
fx(x)ro < oo
4.3.
95
Moments
for some t0 > 1. Then we can regard (l>x as being defined on - t0 < t < t 0
by
IX)
(l>x(t) = ~ fx(x)~,
x=O
(l>x(t) = ~ xfx(x)~- 1,
x=l
and
IX)
x=2
(l>x(1)
~ xfx(x)
EX
x=l
and
00
Thus the mean and variance of X can be obtained from (l>x by means of
the formulas
EX = (l>X(l)
and
Similar formulas, in terms of the higher derivatives of (l>x(t) at t = 1,
can be developed for the other moments of X.
We now illustrate the use of these formulas with the following examples.
Negative binomial distribution. Let X be a random
variable having a negative binomial distribution with parameters ex and p.
Find the mean and variance of X.
Example 9.
= (a + l)atft
Thus
~X(l) =
and
"'c ~ p).
p) 2
96
1- p
p2
e~ pr "e ~ p) - c~ pr
+
"z
Poisson distribution.
= A.e-4<r-t>
and
4>x(t)
Consequently 4>~(1)
= ..PeJ.<t-t>.
A. and 4>i(1)
.A.
EX= ;.,
which agrees with the answer found in Example 2, and
Var X = .A.l
+ A. -
A- 2 =
A..
This shows that if X has a Poisson distribution with parameter A., then the
mean and variance of X both equal A..
4.4.
Variance of a sum
Let X and Ybe two random variables each having finite second moment.
Then X + Y has finite second moment and hence finite variance. Now
Var (X
+ Y)
= E[(X
Y) - E(X
+ Y)] 2
Thus, unlike the mean, the variance of a sum of two random variables
is, in general, not the sum of the variances. The quantity
E[(X - EX)(Y- EY)]
is called the covariance of X and Y and written Cov (X, Y). Thus we have
the important formula
(16)
Var (X
+ Y)
= Var X
+ Var
+ 2 Cov (X,
Y).
4.4.
97
Variance of a sum
Now
(X- EX)(Y- EY) = XY- (Y)(EX)- X(EY)
(EX)(EY),
From this form, it is clear that Cov (X, Y) = 0 whenever X and Y are
independent. (Example 5 shows that the converse is false.) We see from
(16) that if X and Yare independent random variables having finite second
moments, then Var (X+ Y) = Var X+ VarY.
In particular if P(Y = c) = 1 for some constant c, then X and Yare
independent and the variance of Y equals zero ; consequently
Var (X+ c)
(18)
More generally, if Xh X2 ,
finite second moment, then
(19)
Var
= Var X+
. ,
and, in particular, if X 1 ,
(20)
Var
Var (
Var X.
Var (c)
:t: j=tt
t x,) t
=
I= 1
Var X 1
I= 1
Var (X1
+ +
X,.)
= n Var X 1 = nu2
= a 2 Var X.
Binomial distribution.
Var S,.
Now X~
hence
n Var X1
= X 1 because X1 is either 0 or 1.
Thus EX~
EX1 = p and
98
= EX1 = ' 1
r
= ( ';)
(': )
(I - ';) .
Next we must compute the covariances. Assume that l ::s; i < j < n.
Now X 1Xi = 0 unless both Xi and X1 are l, so
EXiXJ
= P(X1 =
1,X1 = 1)
= (':) (': ~ :) .
Thus
(';)2
- ( ':) ( ';
~ 11 -
',)
and hence
"it
i=1
J=i+t
't -
n(n - l) (')
r
2
r r(r - l)
"
n ( ';) ( 1 - ':) ( 1 - :
r (r - 1)
=~) .
It is interesting to compare the mean and variance for the hypergeometric distribution with those of the binomial distribution having the
same success probability p = (r 1 /r). Suppose we have a population of r
objects of which r 1 are of type one and r - r 1 are of type two. A random
99
sample of size n is drawn from the population. Let Y denote the number
of objects of type one in the sample.
If the sampling is done with replacement then Y is binomially distributed
with parameters nand p = (r 1 /r), and hence
and
On the other hand, if the sampling is done without replacement, then Y
has a hypergeometric distribution,
and
Var Y = n ( ', )
1 - ':) ( 1 - ;
=~) .
The mean is the same in the two cases, but in sampling without replacement the variance is smaller. Intuitively, the closer n is to r the more
deterministic Y becomes when we sample without replacement. Indeed, if
n = r then the variance is zero and P(Y = r 1) = 1. But if r is large
compared to n, so that (n/r) is close to zero, the ratio of the variances
obtained in sampling with and without replacements is close to one. This
is as it should be, since for fixed n and large r there is little difference
between sampling with replacement and sampling without replacement.
4.5.
Correlation coefficient
(22)
_ (X Y) _
p - p '
-
Cov (X, Y)
(Var X) (VarY)
-r==========~
.J
(23)
100
We now show that (23) always holds. From the above discussion we can
assume that P( Y = 0) < I and hence EY 2 > 0. The proof is based on a
simple but clever trick. Observe that for any real number A.
0 < E(X- lY) 2
A.2 EY 2
2A.EXY
+ EX 2
E(X - aY)2 = EX 2
2
-
[E(XY)]
EY 2
=OJ =
1.
that is,
[Cov (X, Y)r < (Var X)(Var Y).
Thus by the definition of p
lp(X, Y)l ~ 1.
We also see from Theorem 7 that IPI = 1 if and only if P(X = aY) = I
for some constant a.
The correlation coefficient is of limited use in probability theory. It
arises mainly in statistics and further discussion of it will be postponed to
Volume II.
4.6.
Chebyshev's inequality
4.6.
Chebyshev's inequality
101
having the two possible values 0 and t which it assumes with probabilities
P(Y = 0) = P(X < t) and P(Y = t) = P(X ~ t) respectively. Thus
EY
= tP(Y =
t)
0 P(Y
0)
= tP(Y =
t)
= tP(X ~
t).
or
P(X
(25)
t)
~EX.
t
Chebyshev's Inequality.
Jl and finite variance a
(26)
P(IX -
Ill
(]2
t) < 2 .
t
102
Var
(~n)
(Isnn - I > b)
P. -
b2
~2 .
nb
(28)
b) = o.
sn
n
Jl. l
b) = 0.
Whenever the X 1 have finite mean, the weak law holds. However, when
they also have finite variance, then (27) holds. This is a more precise
statement since it gives us an upper bound for P
of n. We now illustrate the use of (27) by applying it to binomially distributed random variables.
Let X 1 , X 2 , , Xn be n independent Bernoulli random variables assuming the value 1 with common probability p . Then Jl = p and u 2 =
p(l - p). Thus (27) shows that
(29)
(l ~n- PI ~ <5)
<
p(ln~ p) .
4.6.
1t3
Chebyshev's inequality
Since p(1 - p) < 1/4 ifO < p < 1 (because by the usual calculus methods
it can easily be shown that p(l - p) has its maximum value at p = 1/2),
it follows that regardless of what p may be,
(30)
(IS,.- PI >b) ~ _1 .
4nb2
Equation (29) is useful when we know the value of p, while (30) gives us a
bound on P (
~"
- p
1/2, (29) and (30) do not differ by much, but ifpis far from 1/2 the estimate
given by (29) may be much better. (Actually even the bounds given by
(29) are quite poor. We shall discuss another method in Chapter 7 that
yields much better estimates.)
Suppose b and e > 0 are given. We may use (29) or (30) to find a lower
bound on the number of trials needed to assure us that
Indeed, from (29) we see that this will be the case if p(1 - p)/nb 2 < e.
Solving for n we find that n > p(l - p)/eb2 If we use (30), then
n > (4eb 2 )- 1 trials will do. We stat,e again that these bounds on n given
by Chebyshev~s inequality are poor and that in fact a much smaller
number of trials may be sufficient.
As an illustration of the difference between these two estimates for n,
choose b = .1 and e = .01. Then b2 e = 10- 4 and from (30) we see that to
guarantee that
This should be compared with the exact value for this probability which
is .038.
104
Exercises
2x
f(x) =
N(N
{
0,
1),
= 1, 2, ... , N,
elsewhere.
X=
x=l
N(N + 1)
2
and
fx
= N(N
x=1
1X2N
6
= 4 and
E sin (rrX/2).
3 Let X be Poisson with parameter A. Compute the mean of (I
4
1).
p . Find
X)- 1
P(IX -
Yl 5:: M) = I
for some constant M. Show that if Y has finite expectation, then X has
finite expectation and lEX- EYI < M.
Let X be a geometrically distributed random variable and let M > 0
be an integer. Set Z = min (X, M). Compute the mean of Z.
Hint: Use Theorem 5.
Let X be a geometrically distributed random variable and let M > 0
be an integer. Set Y = max (X, M). Compute the mean of Y.
Hint: Compute P(Y < y) and then use Theorem 5.
Let X be uniformly distributed on {0, I, ... , N}. Find the mean and
variance of X by using the hint to Exercise 1.
Construct an example of a density that has a finite moment of order
r but has no higher finite moment. Hint : Consider the series
:E;'= 1 x-<r+l> and make this into a density.
10 Suppose
4
-1 < t < 1,
and
4>_i(t)
= EX(X-
1)tx- 2 ,
-1<t<l.
Exercises
105
(b) Use Theorem 4 to rederive the result that if X and Yare independent nonnegative integer-valued random variables, then
-l<t~l.
15 Let X1,
"
i.J (X" -
-2
X)
=
k=l
"
~(
i.J
k=l
(X" - X) 2 ) = (n - 1)u2
k=l
16
P(X,
1, X 1 = 1)
1
= -n(n -
1)
Var S".
19 Establish the following properties of covariance:
(a) Cov (X, Y) = Cov (Y, X);
(b) Cov
Ct. itt
a 1X1,
h1 Y1) =
~~ 1
106
21 Suppose X and Yare two random variables such that p(X, Y) = 1/2,
VarX = 1, and Var Y = 2. Compute Var (X- 2Y).
22 A box has 3 red balls and 2 black balls. A random sample of size 2 is
without replacement from the box. Let X be the number on the first
ball and let Y be the number on the second ball. Compute Cov (X, Y)
and p(X, Y).
24 Suppose an experiment having r possible outcomes 1, 2, ... , r that
-J
by carrying out the following steps. Let / 1 = 1 if the ith trial yields
outcome l, and let Ii = 0 otherwise. Similarly, let J 1 = 1 if the ith
trial yields outcome 2, and let Ji = 0 elsewhere. Then X = / 1 +
+ I,. and Y = J 1 + + J,.. Now show the following:
(a) E(l/i) ...: o:
(b) If i #: j, E(IiJi) = PtP2
(c) E(XY)
=E
(t fiJi) + E (t
1
i~i
IiJJ)
= n(n - l)Pt P2
(d) Cov (X, Y) = -nPtP2
25 Suppose a population of
Exercises
107
(d) Use (c) to compute p(X, Y). Compare with the corresponding
correlation coefficient in Exercise 24 withp 1 = r 1 fr andp2 = r2 /r.
26 Let X be a random variable having density f given by
1/18,
f(x) = { 16/18,
= 1, 3,
X=
2.
Show that there is a value of~ such that P(IX- Jll > ~) = Var X/~ 2 ,
so that in general the bound given by Chebyshev's inequality cannot be
improved.
27 A bolt manufacturer knows that
5%
~!.
A.
ability generating function ~x(t) = Etx is finite for all t and let x 0
be a positive number. By arguing as in the proof of Chebyshev's
inequality, verify the following inequalities:
0 < t
(b) P(X >
X 0)
< ~xx(t) ,
t
1;
t > I.
-inequalities:
;
;
( 2,t) < (2)A/2
(a) P X<
~ (~f
JYIX
I X ) . -_ {P(Y
= y I X = x),
0
'
if P(X = x) > 0,
elsewhere.
108
ditional expectation:
(a) fr(y)
~fx(x)fl'txCY I x);
JC
JC
34 Let N
E[YN
35 Let {X,.}, n
IN
n]
= EY,..
and
ES~ = u 2 EN
Var SN = u 2 EN
+ Jl2 EN2 ,
+ Jl2 Var N.
Continuous Random
Variables
I X(w) = x}) = 0,
-00
<
<
00,
that is, such that X takes on any specific value x with probability zero.
It is easy to think of examples of continuous random variables. As a first
illustration, consider a probabilistic model for the decay times of a finite number
of radioactive particles. Let T be the random variable denoting the time until the
first particle decays. Then T would be a continuous random variable, for the
probability is zero that the first decay occurs exactly at any specific time (e.g.,
T = 2.()()()() .. .. seconds). As a second illustration, consider the experiment of
choosing a point at random from a subset S of Euclidean n-space having finite
nonzero n-dimensional volume (recall the discussion of this in Chapter 1). Let X
be the random variable denoting the first coordinate of the point chosen. It is clear
that X will take on any specific value with probability zero. Suppose, for example,
that n = 2 and S is a disk in the plane centered at the origin and having unit
radius. Then the set of points inS having first coordinate zero is a line segment in
the plane. Any such line segment has area zero and hence probability zero.
Generally speaking, random variables denoting measurements of such physical
quantities as spatial coordinates, weight, time, temperature, and voltage are most
conveniently described as continuous random variables. Random variables which
count objects or events are clear examples of discrete random variables.
There are cases, however, in which either discrete or continuous formulations
could be appropriate. Thus, although we would normally consider measurement of
109
110
F(x) = P(X
x),
- 00
<X<
00.
= F(b)
- F(a),
a< b.
In order to verify (1), set A = {co I X(w) ~ a} and B = {w I X(w) < b}.
Then A ; Band, by the definition of a random variable, both A and B
are events. Hence {w I a < X < b} = B t1 Ac is an event and (1) is a
special case of the fact proven in Section 1.3 that if A ; B, then
P(B
t1
Ac)
P(B) - P(A).
5.1.
111
0 <X< R.
If x < 0, then P(X < x) = 0. If x > R, then P(X < x) = I. Thus the
distribution function F of the random variable X is given by
X< 0,
0 s X oe::: R,
X> R.
(2)
The graph ofF is given in Figure 1. It follows from Formulas (I) and (2)
that if 0 < a < b ~ R, then
P(a
< X < b)
= F(b)
- F(a)
b2- a2
R2
R
Figure 1
Since X takes on only positive values, P(X < x) = 0 for x < 0 and, in
particular, P(X < 0) = 0. For 0 < x < oo,
P(X
< x)
= P(X
= P(O <X~ x)
=1-
e-
lx
X~
0,
X> 0.
112
~~!:
I,
2 ~X.
*..----1
2
Figure 2
5.1.1.
5. 1.
113
n ,.= 0
-oo
+oo
U B,. = n.
and
11=0
11=0
lim P(B11)
= P(0) = 0
and
11-+-oo
11 ...
+ 00
= lim
F(- co)
lim P(B,)
F(n) =
11-+-oo
= 0
11-+-oo
= P(X =:;
x),
-00
< X<
00.
-00
<X<
00.
F(x+)
The proofs of (4) and (5) are similar to the proof of (iii). To prove (4),
for example, we need only show that F(x + 1/n) --+ P(X < x) as
n -+ + oo. This can be done by setting
B,
noting that n11 B, = {ro I X(ro) < x} and repeating the argument of(iii).
From (4) and (5) we see immediately that
(6)
F(x+) - F(x-)
P(X
x),
-oo<x<oo.
114
P(a
< X < b)
= P(a ~ X ~ b) = P(a
= F(b)
< X < b)
- F(a),
so that < and < can be used indiscriminately in this context. The
various properties of a distribution function are illustrated in Figure 3.
(Note that the random variable having this distribution function would be
neither discrete nor continuous.)
F(+co)-1
f(x-)
f(-oo)-0
X
Y-o
Figure 3
if X(w) < t0 ,
if X(w) > t 0
5.2.
115
Thus Y is the decay time, if this time is observed (i.e., is less than or equal
to t 0 ), and otherwise Y = t0 The distribution function Fr of Y is given by
Fr(Y) =
< 0,
0 < y < t0 ,
0,
1 - e-;..,,
1,
y >to.
F(x)
1 for all x,
(iv) F(x+)
= F(x) for
1,
all x.
J~oo f(x) dx
= 1.
F(x) =
-00
<X <
00,
116
If X is a continuous random variable having F as its distribution function, where F is given by (8), then f is also called the density of X . In the
sequel we will use "density function" to refer to either discrete density
functions or density functions with respect to integration. It should be
clear from the context which type of density function is under consideration. For example, the phrase "let X be a continuous random variable
having density f" necessarily implies that f is a density function with
respect to integration.
It follows from (I) and (8) that if X is a continuous random variable
having density f, then
(9)
a < b,
(10)
JA f(x) dx
Figure 4
(11)
f(x)
= F'(x),
0,
(12)
F'(x)
2x/R
0,
X< 0,
O<x<R,
X> R.
5.2.
117
We note that (8) does not definefuniquely since we can always change
the value of a function at a finite number of points without changing the
integral of the function over intervals. One typical way to define f is by
setting f(x) = F'(x) whenever F'(x) exists and f(x) = 0 otherwise. This
defines a density ofF provided that F is everywhere continuous and that F'
exists and is continuous at all but a finite number of points.
There are other ways to derive or verify formulas for the density of a
continuous distribution function F. Given a density function f we can
show that f is a density function of F by verifying that (8) holds. Alternatively, we can reverse this process and show that F can be written in the
form (8) for some nonnegative function f. Then f is necessarily a density
function of F. These methods, essentially equivalent to each other, are
usually more complicated than is differentiation. However, they are
rigorous and avoid special consideration of points wher,e F'(x) fails to
exist.
We will illustrate these methods in our first example of the following
subsection.
Let X be a continuous random
variable having density f. We will discuss methods for finding the density
of a random variable Y which is a function of X.
5.2.1 .
To solve this problem we first let F and G denote the respective distribution functions of X and Y. Then G(y) = 0 for y < 0. For y > 0
G(y)
P(Y < y)
= P(- ..}y
=
P(X 2 < y)
F(..}y) - F( - .Jy)
lr
2vy
=
Thus Y
(13)
lr (f(..}y)
2v y
+ f( -Jy)).
for y > 0,
for y
0.
118
Although (13) is valid in general, our derivation depended on differentiation, which may not be valid at all points. To give an elementary but
completely rigorous proof of (13), we can define g by the right side of (13)
and write for x > 0
% g(y) dy
f.%
- oo
1
, (J(JY)
2v y
obtain
I:w g(y) dy
+ f( -Jy)) dy.
(f(z)
+ /( -
z)) dz
I.r"
f(z) dz
-./~
g(y)
= 2)y
2Jy
R2
1
R2'
(14)
F(x)
0,
x <a,
(x - a)j(b - a),
a ::5: x ::5: b,
I,
X> b.
5.2.
119
G(y)
P(Y ~ y)
= P(log (1
- X) ~ - A.y)
Hence G'(y) = A.e-;.' for y > 0 and G'(y) = 0 for y < 0. The density of
Y is therefore given by .
y > 0,
(16)
g(y) = {A.e-;.',
0,
y ~ 0.
This density is called the exponential density with parameter A. and will be
discussed further in the next section.
The above example is a special case of problems that can be solved by
means of the following theorem.
g(y) =
/(,-'(y))
I~ "'-'(y)l
q>(l).
g(y) = /(x)
~~;I
'
cp(J)
and
120
F(rp-1(y)).
~ F(rp- 1(y))
dy
:y
= f(cp-(y))
rp-1(y).
Now
I!!_
(/) -1(y)l
dy
because rp- 1 is strictly increasing so that (17) holds. Suppose next that rp
is strictly decreasing on 1. Then <p- 1 is strictly decreasing on rp(J), and
for y e <p(/)
G(y) = P( Y =:;; y)
= P(<p(X)
=
< y)
= 1 - F(<p- 1(y)).
Thus
G '(y) = - F'(rp -1(y))
_!!_ cp -1(y)
dy
= f(rp-1(y))
(- :y cp-l(y)).
Now
_.!!._qJ-1(y) =
dy
l~rp-1(y)l
dy
5.2.
121
> 0,
y !5: 0.
(19)
ibi1 1 (Y -b a) '
g(y) =
-oo <
< oo.
= 2y,
g(y) = Rf(Ry)
and g(y)
0 < y < 1,
= 0 elsewhere.
J~> g(x) dx
< oo.
Let g(x)
where. Then
c =
= x(l
t x(l -
- x), 0 < x
x) dx =
122
c=
co
dx
-co
=arctanx
leo
7t
=--
-co
Consequently f = c- 1g is given by
+ x2) '
f(x) = n(l
-oo<x<oo.
This density is known as the Cauchy density. The corresponding distribution function is given by
1
F(x) = -
+ -1 arctan x,
-oo<x< ,oo.
7t
In solving this problem we will let E> be the random variable denoting
the angle chosen measured in radians. Now X = tan 0 and hence (see
Figure 5) for - oo < x < oo,
P(X
x)
= P(tan 0 s
=
x)
=~
(arctan x - ( -
~))
= -1 + -1 arctan x.
2
7t
Figure 5
5.2.
Densi~s
123
Symmetric densities.
variable.
Proof. We will prove this result for continuous random variables.
The proof for discrete random variables is similar. In our proof we will
use the fact that for any integrable function/
-oo < x < oo.
Suppose first that X has a symmetric density f Then
P(-X
= P(X >
x)
-x)
f~xj(y) dy
= J:<X) /(- y) dy
= J~<X) f(y) dy
= P(X ~ x),
- l/2[P(X
~ x)]
= P(X ~ x).
124
F( -x) =
=
J~: f(y) dy
LCX) I<- y) dy
= LCX) f(y) dy
=
and hence
(20)
F( -x) = 1 - F(x),
- OO <X<OO.
For this reason, when tables of such a distribution function are constructed, usually only nonnegative values of x are presented.
5.3.
5.3.1.
Normal densities.
There is no simple formula for the indefinite integral of e-"212 The easiest
way to evaluate c is by a very special trick in which we write c as a twodimensional integral and introduce polar coordinates. To be specific
c2
= f oo
e-:x2f2 dx
= 21t fooo
re-r2f2 dr
= - 21te-r2f21 ~
= 27t.
foo
- oo
- QO
e-y2f2 dy
125
f(x) =
(2n)-lf2e-xlf 2,
-00
<
<
OC>.
(21)
e-:xlf 2
(lO
dx =
~2n.
-QO
The density just derived is called the standard normal density and is
usually denoted by cp, so that
_
cp(x ) -
(22)
..j2n e
-x:Zj 2
,.
-OC>
< X<
00.
<I>(- x) = 1 - <l>(x),
(23)
n(y;
J,4,
u2) =
1
u~
v'
e-<11-l'):t/2
lip
u
(y - ~-'), -
00
<y <
00
y)
= P
(x ~ Y ~ JL)
~ (b ~
11) - <I>
(a ~ ll) .
126
= 0 and b =
= <D(l)
~( -1/2)
= <ll{l)
3. We
- (1 - <D{l/2))
= .8413 - .3085
= .5328.
Jl
= (a
bp,)
+ b(Jl + uX)
which is distributed as n(a + bJt, b 2 u 2 ).
a
bY= a
+ buX,
6.3.2.
(26)
Exponential densities.
f(x) =
{le-lx,
0,
X~
0,
X< 0.
5.3.
127
(27)
0,
X~
0,
X< 0.
P(X
>a + b)
= P(X
a> 0 and b
0.
P(X > a
+ b IX >
128
a > 0
b) = G(a)G(b),
and b > 0.
G(nc)
= (G(c))"
and
G(c) = (G(cfm))'".
We claim next that 0 < G(l) < 1. For if G(1) = 1, then G(n) =
(G(l))" = 1, which contradicts G( + oo) = 0. If G(l) = 0, then G(l/m) = 0
and by right-continuity, G(O) = 0, another contradiction.
Since 0 < G(1) < 1, we can write G(1) = e-:A where 0 < ). < oo. It
follows from (30) that if m is a positive integer, then G(lfm) = e-:AJm. A
second application of (30) yields that if m and n are positive integers, then
G(nfm) = e-;."1'". In other words G(y) = e-J.y holds for all positive
rational numbers y . By right-continuity it follows that G(y) = e-J.y for
all y > 0. This implies that F = 1 - G is the exponential distribution
function with parameter A..
I
Before defining gamma densities in general
we will first consider an example in which they arise naturally.
5.3.3.
Gamma densities.
Example 12.
n(O, u
).
1
r (/(../y)
2~y
+ /( -.J y)),
= 0 for y
y
0 and
> 0.
g(y) =
1
u../2ny
e-y/ 2t12'
y > 0.
5.3.
129
fJ yx-1e-y dy.
Jo
There is no simple formula for the last integral. Instead it is used to define
a function called the gamma function and denoted by r. Thus
c
= A_
-I r(a),
where
fo(() X
r(a) =
(32)
11
e-x
dx,
a> 0.
(33)
- A_ x -1 e -Ax,
r(x;a,A.) = { ~(ex)
X> 0,
X< 0.
'
We also record the following formula, which will prove to be useful:
(f. )
(34)
r(cx)
x c:r-1 e -).x d x=--.
).
The exponential densities are special cases of gamma densities. Specifically, the exponential density with parameter A. is the same as the gamma
density r(l, A.). The density given by (31) was also seen to be a gamma
density with parameters a = 1/2 and A. = If2u2 In other words, if X
has the normal density n(O, u 2 ), then X 2 has the gamma density r(I/2, l/2u2).
By equating (31) and (33) with a = I /2 and A. = 1/2(/ 2 we obtain the useful
fact that
r(l/2)
(35)
= .J;.
r(a
(36)
1) = ar(a),
a> 0.
r(a
1) =
fo(() xa.e-x dx
-xe-x~~ +
= ar(a).
fo(() cxx-le-x dx
130
(37)
(n - 1)!.
It also follows from (35), (36) and some simplifications that if n is an odd
positive integer, then
n) .J~n - 1)!
(
r 2 2._, (n ; 1)
(38)
rx l'"y'"-1e-A.>' dy =
Jo
(m -
1)!
-(A.y)'"-1e-A.)'IX
(m - 1)!
rx A.'"-1y'"-2e-A.>' dy
o Jo
J.o
(m - 2)!
(AX)"'- 1e- b
y-
(m - 2)!
(m - 1)!
'
A.'"y'"-1 e-A.y
- - dy
o (m - 1)!
(39)
= 1-
m-1
(A.x)"e- .b
Jc=o
k!
'
X> 0.
Figure 6.
5.4.
131
having respective densities r(ah A.) and r(a2, A.), then X + Y has the
gamma density r(a1 + a 2 , A.). This result will be proven in Chapter 6.
This and other properties of gamma densities make them very convenient
to work with. There are many applied situations when the density of a
random variable X is not known. It may be known that X is a positive
random variable whose density can reasonably well be approximated by a
gamma density with appropriate parameters. In such cases, solving a
problem involving X under the assumption that X has a gamma density
will provide an approximation or at least an insight into the true but
unknown situation.
5.4.
= ~~:~ =
1,
0 < y < 1,
132
= tan 0 =
~)
tan ( 1r Y -
has the Cauchy distribution. This is exactly what we would get by using
the result of the previous paragraph. According to Example 10, the
Cauchy distribution function is given by
1
F(x) = -
+ -1 arctan x,
-00
<X <
00,
1t
= -21 + -1 arctan x,
1t
has solution
x = F- 1(y) =tan
(1ty- ~).
(b : Jl) .
= .9.
We need to solve
5.4.
133
b - .Jl
<1>-1(.9)
(1
or
b = Jl
u<I>- 1 (.9).
+
In applied statistics the number b =
P(X
:s:
Jl
1.28u and
1.28u) = .9.
Jl
~ (;)
- <I> (- ;)
.9.
2<1> (;) - 1 = .9
and hence a = 011>- 1 (.95). From Table I we see that <1- 1(.95) = 1.645.
In other words,
P(Jl - l.645u
X < Jl
1.645u) = .9.
.675u) = .5
P(Jt - .675u
X ::; Jl
134
or equivalently,
P(IX -
PI
:S .675u)
.5.
This says that if X has the normal density n(Jl~ u 2 ), then X will differ from
11 by less than .675u with probability one-half and by more than .675u with
probability one-half. If we think of J1. as a true physical quantity and X as a
measurement of Jl, then IX- 111 represents the measurement error. For
this reason .675u is known as the probable error.
Exercises
space of radius R. Let X denote the distance of the point chosen from
the center of the ball. Find the distribution function of X.
4
Let a point be chosen uniformly over the interval [0, a]. Let X denote
the distance of the point chosen from the origin. Find the distribution
function of X.
a base of length I and height h from the base. Let X be defined as the
distance from the point chosen to the base. Find the distribution
function of X .
6 Consider an equilateral triangle whose sides each have length s. Let a
point be chosen uniformly from one side of the triangle. Let X denote
the distance of the point chosen from the opposite vertex. Find the
distribution function of X.
7 Let the point (u, v) be chosen uniformly from the square 0 < u :S 1,
0 :S v :S 1. Let X be the random variable that assigns to the point
(u, v) the number u + v. Find the distribution function of X.
8 Let F be the distribution function given by Formula (3).
Find a
Let X denote the decay time of some radioactive particle and assume
that the distribution function of X is given by Formula (3). Suppose A.
issuchthatP(X ~ .01) = 1/2. FindanumbertsuchthatP(X ~ t) =
.9.
10
Exercises
135
11
F(x) =
0,
X< 0,
0 <
3'
X
< 1,
~ X
< 2,
2'
1,
X> 2.
Find:
(a) P(l/2 < X < 3/2);
(b) P(l/2 ~ X < I);
(c) P(l /2 ~ X< I) ;
(d) P(l ~ X ~ 3/2) ;
(e) P(l < X < 2).
12 If the distribution function of
-00
<X<
00.
F(x) =
2 + 2(1xl + 1)'
f . Find a
= lXI.
136
= lf(X + 1).
f.
P(X < x) =
-oo
g(z) dz,
How
26
28 Let
erf(x)
r~
~; Jo e -yl dy,
= lXI.
Find the density of Y = eX.
Find
137
Exercises
37 Let
P(Jl
tPl(l
~ X ~ Jl
+ tPlCT)
= P2 - P1
decay times which are exponentially distributed with some parameter A..
If one half of the particles decay during the first second, how long will it
take for 75% of the particles to decay?
40 Let
f(t)
(t)
1 - F(t) - g '
0 < t <
00.
~t+
dt I T > t) = f(t) dt
1 - F(t)
(b) Show that for s > 0 and t > 0,
P(t
= g(t) dt.
= e-s: .. ,<u>
du.
(c) Show that the system improves with age (i.e., for fixed s the expressions in (b) increase with t) if g is a decreasing function, and
the system deteriorates with age if g is an increasing function.
(d) Show that
00
g(u) du =
00.
138
41 Let X have the gamma density r(ex,
where c
eX,
> 0.
42 Show that if ex
43
Let X
46 Find ~- 1 (t) fort = .1, .2, . .. , .9, and use these values to graph ~- .
1
47 Let
X have the normal density n(Jl, u2 ). Find the upper quartile for X
48 Let X have the Cauchy density. Find the upper quartile for X .
49 Let X have the normal density with parameters Jl and u 2 = .25. Find
a constant c such that
P(IX - Jll :s; c) = .9.
so Let X be an integer-valued random variable having distribution
function F, and let Y be uniformly distributed on (0, 1). Define the
integer-valued random variable Z in terms of Y by
=m
Jointly Distributed
Random Variables
In the first three sections of this chapter we will consider a pair of continuous
random variables X and Y and some of their properties. In the remaining four
sections we will consider extensions from two ton random variables X 1 , X 2 , . ,
X 11 The discussion of order statistics in Section 6.5 is optional and will not be
needed later on in the book. Section 6.6 is mainly a summary of results on sampling
distributions that are useful in statistics and are needed in Volume II. The material
covered in Section 6.7 will be used only in proving Theorem 1 of Chapter 9 and
Theorem 1 of Chapter 5 of Volume II.
6.1.
To see that F is well defined, note that since X and Yare random variables,
both {w I X(w) ::5: x} and {w I Y(w) < y} are events. Their intersection
{w I X(w) < x and Y(w) < y} is also an event, and its probability is
therefore well defined.
The joint distribution function can be used to calculate the probability
that the pair (X, Y) lies in a rectangle in the plane. Consider the rectangle
R
(1)
::5:;
b, c < Y < d)
= F(b, d) -
<
<
b, Y
< d)
= P(X
<
b, Y
< d) -
139
P(X
<
a, Y
< d)
140
Similarly
P(a < X < b, Y < c)
= F(b, c)
- F(a, c).
Thus
P(a < X ~ b, c < Y ~ d)
= P(a
< X
= (F(b, d) -
b, Y
Fr(Y) = P(Y ~ y)
and
are called the marginal distribution functions of X and Y. They are related
to the joint distribution function F by
Fx(x) = F(x, oo) =lim F(x,y)
y-+ 00
and
Fr(Y)
F(oo, y)
lim F(x, y) .
.x-+ 00
F(x, y) =
- 00
then f is called a joint density function (with respect to integration) for the
distribution function F or the pair of random variables X, Y. Unless
otherwise specified, throughout this chapter by density functions we shall
mean density functions with respect to integration rather than discrete
density functions.
IfF has density f, then Equation (I) can be rewritten in terms of j, to
give
(3)
d)
J.b
(J:
P((X, Y) e A)
Jf
f(x, y) dx dy
1.
6.1.
141
< x)
s:>
s:<Xl f(x,
fx(x) =
which satisfies
J:>
F x(x) =
y) dy
fx(u) du.
~ F(x,
ay
y) =
f"
- >
(j_ay f'
f(u, v) dv) du
- >
J:> f( u, y) du
and
(6)
02
ax oy
Under some further mild conditions we can justify these operations and
show that (6) holds at the continuity points of f. In specific cases instead
of checking that the steps leading to (6) are valid, it is usually simpler to
show that the function f obtained from (6) satisfies (2).
Let us illustrate the above definitions and formulas by
reconsidering Example 1 of Chapter 5. We recall that in that example, we
chose a point uniformly from a disk of radius R. Let points in the plane be
determined by their Cartesian coordinates (x, y). Then the disk can be
written as
Example 1.
{(x, y) I x2
y2 < R2}.
142
(7)
f(x, y) =
{n!2 '
0,
elsewhere.
Then for any subset A of the disk (say of the type considered in calculus),
P((X, Y) e A) =
JJf(x , y) dx dy
A
area of A
which agrees with our assumption of uniformity. The marginal density fx
is given by
l
2.../R2 _ X2
fx{x)
oo
f(x, y) dy
-oo
JJR2-x2
_
- - 2 dy =
. -JR2-x2 nR
nR
for - R < x < R and fx(x) = 0 elsewhere. The marginal density f r(Y)
is given by the same formula with x replaced by y.
The variables X and Y are called independent random variables if
whenever a < b and c 5: d, then
(8)
P(a < X < b, c < Y ~ d) = P(a < X < b)P(c < Y ~ d).
F(x, y)
Fx(x)Fy(y),
Conversely (9) implies that X and Y are independent. For if (9) holds,
then by (1) the left side of (8) is
F(b, d) - F(a, d) - F(b, c)
F(a, c)
= Fx(b)Fy(d) -
Fx(a)Fy(d) - Fx(b)Fy(c)
Fx(a)Fy(c)
Fy(c))
< Y <d).
and
{w
I X(ru) e B}
6. 1.
143
-00
is a joint density for X and Y. This follows from the definition of independence and the formula
.; '
)~" (
J'X\X'Jr
) =
4JR2
x2
J R2 -
y2
n2R4
'
which does not agree with the joint density of these variables at x = 0,
y = 0. Since (0, 0) is a continuity point of the functions defined by (7) and
(I 0), it follows that X and Yare dependent random variables. This agrees
with our intuitive notion of dependence since when X is close to R, Y must
be close to zero, so information about X gives us information about Y.
Density functions can also be defined directly, as we have seen in other
contexts. A two-dimensional (or bivariate) density function f is a nonnegative function on R 2 such that
= 1.
f(x, y)
= ft (x)f2(y),
If random variables X and Yhave thisf as their joint density, then X and Y
are independent and have marginal densitiesfx = / 1 andfr = / 2
As an illustration of (11), let / 1 and f 2 both be the standard normal
density n(O, 1). Then f is given by
144
!(x, y ) -
-.x'-/ 2
../2n e
-y'-{2
../ire e
or
(12)
- __!_ e -(.x'-+y'-)/ 2
/(x y) -
'
21t
-00
'
< x, y <
00.
The density given by (12) is called the standard bivariate normal density.
In our next example we will modify the right side of(l2) slightly to obtain a
joint density function that corresponds to the case where the two random
variables having normal marginal densities are dependent.
Let X and Y have the joint density function f given by
Example 2 .
!( x, y) -- ce -(.x'--.xy+yl)/ 2 ,
<
- 00
X,
y <
00,
where cis a positive constant that will be determined in the course of our
discussion. We first "complete the square" in the terms involving y and
rewrite! as
f(x, y) =
ce-[(y-.xf2)2+J.x2f41/ 2'
- 00
<
X,
<
00,
J~<Xl f(x,
fx(x) =
y) dy
oo
2
e-(y-.x/ l ) / 2
dy
- oo
= y -
ce-3x2f8
Jco
e-"'"12 du
= ../27t.
-oo
Consequently
fx(x) =
c../27te-3x
2
/
It is now clear that fx is the normal density n(O, cr2 ) with cr2 = 4/3 and
hence
or c
(13)
= .JJ/47t.
Consequently
f( x, y ) --
../3 e -(.xl-.xy+yl)/ 2 ,
4n
The above calculations now show that fx is the normal density n(O, 4/3).
In a similar fashion, we can show that fr is also n(O, 4/3). Since f(x, y) =F
fx(x)jy(y), it is clear that X and Yare dependent.
6.2.
145
6.2.
= {(x, y) I <p(x, y)
A:
:::;; z}.
Thus
Fz(z) = P(Z < z)
= P((X, Y) E A:)
= JJt(x, y) dx dy.
A~
JJt(x, y) dx dy
A~
=X +
= {(x, y) I x + y :S
Distribution of sums.
Az
Set Z
Y. Then
z}
Fz(z) = Jftcx, y) dx dy =
+y =
z as shown in
A~
Fz(z)
fx+y(z)
J:oo f(x, z -
x)
dx,
oo
148
Figu1e 1
In the main applications of (14), X and Y are independent and (14) can be
rewritten as
(IS)
fx+r(z) =
J~co fx(x)fy(z
- x) dx,
(16)
fx+r(z) =
J:
fx(x)fy(z - x) dx,
h(z) =
J~CX) f(x)g(z ~
x) dx,
J:
A_e-.txA.e-'-<.:r:-x) dx
= .A?e-A.%
J:
dx = .A?ze-.tz.
6.2.
147
We see that X
fx+r(z) = z,
1.
If 1 < z ~ 2 the integrand has value I on the set z - 1 < x < 1 and
zero otherwise. Thus by (16)
I < z < 2.
fx+r(z) = 2 - z,
fx+r(z) = 0,
00.
In summary
z,
fx+r(z)
2 - z,
0,
z < 1,
1<z~2,
elsewhere.
Figure 2
The graph off is given in Figure 2. One can also find the density of X + Y
by computing the area of the set
Az = {(x, y) I 0 < x < 1, 0 ~ y ~ 1 and
+y <
1 <z < 2
Figure 3
z}
148
r(a1
Proof.
+ a2, A.).
fx(x) =
and
X> 0,
'
;.1y1-1 e-;.,
fr(y) =
Thusfx+rCz)
r(ocl)
= 0 for z <
fx+r(z) =
r(a2)
'
> 0.
A_tzt +tz1e-).z i z
X
r(al)r(a2)
111
(z - x)111 -
dx.
= zu (with
z > 0,
(17)
where
(18)
c =
JA u~~~-1(1
- u)l-1 du
r(al)r(a2)
The constant c can be determined from the fact that fx + r integrates out
to 1. From (17) and the definition of gamma densities, it is clear that f x +r
must be the gamma density r(a 1 + cx 2 , A.) as claimed.
I
From (17) and the definition of the gamma density we also see that
c = 1jr(o: 1 + a 2 ). This together with (18) allows us to evaluate the
definite integral appearing in (18) in terms of the gamma function:
(19)
6.2.
149
The reason for this terminology is that the function of a 1 and a 2 defined by
B(<Xh
<X2) -
r(al)r(a2)
r(al + a2)
'
n(J.l1
Proof.
Jl2, uf
+ ui).
and
-oo < y < oo.
Thus by (15)
fx+r(z) =
Joo
1
21t0' 1 0' 2
exp
-00
[ --1 (x2 2 +
2
(z - x)
0' 1
2
)]
dx.
0' 2
Unfortunately an evaluation of this integral requires some messy computations (which are not important enough to master). One way of proceeding
is to first make the change of variable
.J0'21 + 0'22 x.
0'10'2
2nJaf + ai
Joo
-oo
exp
[-
-1 ( u 2 -
a2 ../af + ai
1
2uzu
.
+ 2z
0'2
)]
du.
150
we see that
e -z2/2(ahai)
.J2n.Juf + u~ '
as claimed.
6.2.2.
Distribution of quotients*.
Az = {(x, y) I yfx s; z}
is shown in Figure 4. If x < 0, then yfx < z if and only if y > xz. Thus
Az = {(x, y) I x < 0 and y > xz} u {(x, y) I x > 0 and y < xz}.
Consequently
Frrx<z) =
JJf(x, y) dx dy
...
r (L:
OJ
f(x, y)
6.2.
151
z<O
z>O
Figure 4
J (f~
00
+
=
J (J~oo
00
+
=
It follows from (21) that Y/X has the density fr1x given by
(22)
fr1x(z) =
J
00
(23)
/r,x(z) =
xfx(x)fy(xz) d~,
0 < z
<
00.
152
(24
.)
f(ocl)f(oc2) (z
0 < z <
z2- 1
l)IZI +2'
00
X> 0,
and
fr(Y)
;.2y2-1e-ly
r(oc2)
,
y > 0.
A_t +2z2-1
r(ocJ)r(oc2)
f(<X1
<X2)
(A.(z + 1))t+2
Jo
Since (24) defines a density function we see that for ocb oc2 > 0
= r(ocl)f(oc2) .
Jo
r(ocl
oc2)
The random variables are the same as those of Example 5. Thus again
X and Y 2 are independent, and each has the gamma density r(l/2, I /2a2 ).
Theorem 3 is now applicable and Y 2 IX 2 has the density /r21x2 given by
/r2;x2(z) = 0 for z < 0 and
2
r(l)
21
/r x (z)
z-l /l
f(l l2)f(ll2) (z + 1)
1
- ---1t(Z + l).J Z
0 < z <
00.
'
6.3.
153
Conditional densities
6.3.
Conditional densities
In order to motivate the definition of conditional densities of continuous random variables, we will first discuss discrete random variables.
Let X and Y be discrete random variables having joint density f If x is
a possible value of X, then
_
IX _ ) _ P(X = x, Y = y) _ f(x, y)
P(y -Y
x.
P(X = x)
fx(x)
The function fr 1x defined by
(25)
frlx(Y I x) =
f(x, y)
fx(x) '
fx(x)
0,
fx(x) = 0,
i= 0,
(26)
fnx<Y I x) =
fx(x) '
0,
elsewhere.
ib
frjx(Y I x) dy,
a< b.
P(a s; Y ~ b
IX =
x)
= lim P(a ~ Y ~ b
11~0
Ix - h < X
h).
154
If
L" f(u, y) dy
is continuous in u at u = x, the numerator of the last limit converges to
i"
f(x, y) dy
~ b IX
= x) =
J!f~x, y) dy,
fx(x)
f(x, y)
= fx(x)fYJx(Y I, x),
- 00
<
X,
y <
X,
y <
00.
f(x, y) = fx(x)fy(y),
- 00
<
00,
then
(31)
fr1x<Y I x) = fy(y),
Conversely if (31) holds, then it follows from (29) that (30) holds and X
and Y are independent. Thus (31) is a necessary and sufficient condition
for two random variables X and Yhaving a joint density to be independent.
Example 7.
(13), namely
Then as we saw in Example 2, X has the normal density n(O, 4/3). Thus for
-00 < x, y < 00
6.3.
Conditional densities
155
_ 1 -(y-x/2)1/2
- .J2n e
.
From the statement of the problem, we see that the marginal density of
Xis given by
for
0 < X < 1,
fx(x) = {
elsewhere.
~.
1/x
frrx(Y I x) = {O,
for
0 < y <X< 1,
elsewhere.
andfr(y)
= f~
-~
f(x, y) dx
Jyf. X! dx =
-logy,
= 0 elsewhere.
6.3.1.
(32)
Bayes rule.
"
JXfY(
y) -
f(x, y)
fr(Y) '
Since
f(x, y)
= fx(x)frrx<Y I x)
00.
166
and
/r(Y)
J~oo f(x, y) dx
(33)
fx(x)frlx(Y I x)
J~oofx(x)frlx(Y I x) dx
This formula is the continuous analog to the famous Bayes' rule discussed
in Chapter 1.
In Chapters 3 and 4 we considered random variables X and Y which
were both discrete. So far in Chapter 6 we have mainly considered random
variables X and Y which are both continuous. There are cases when one is
interested simultaneously in both discrete and continuous random variables. It should be clear to the reader how we could modify our discussion
to include this possibility. Some of the most interesting applications of
Formula (33) are of this type.
Suppose the number of automobile accidents a driver will
be involved in during a one-year period is a random variable Y having a
Poisson distribution with parameter A., where A. depends on the driver.
If we choose a driver at random from some population, we can let A. vary
and define a continuous random variable A having density fA The
conditional density of Y given A = A. is the Poisson density with parameter
A. given by
Example 9.
A.'e-.A.
Y!
{
IYII\(Y I A.) =
for
0,
= 0, 1, 2, .. . '
elsewhere.
/A(A.)A.'e-;.
f(A.,y) =
y!
{
0,
for
y = 0, 1, 2, . . . '
elsewhere.
fA(A.) =
In this case,
r(cx)
0,
for
A. > 0,
elsewhere.
6.4.
157
co
p~~ ).- 1
J.o
e- J.fl
T(rx)
= P
).>'e- J.
-dJ.
y!
y!r(a) o
r(rx + y)p
- y! r(a)(ft + l)ll+)'.
11
The value of the last integral was obtained by using Formula (34) of
Chapter 5. We leave it as an exercise for the reader to show that fr is the
negative binomial density with parameters rx and p = P/(1 + p). We also
have that for A. > 0 andy a nonnegative integer,
'"
(.A.
I y)
= f(A., y)
fy(y)
JAfY
r(a)y!r(rx
(p +
1)11+ y
y)p
l)ll+>',{ll+y-te-l(/1+1)
r(cx
y)
The concepts discussed so far in this chapter for two random variables X
and Y are readily extended to n random variables. In this section we
indicate briefly how this is done.
Let X 1, , X" ben random variables defined on a common probability
space. Their joint distribution function F is defined by
- oo < xh ... , x,. < oo.
The marginal distribution functions Fxm' m = 1, ... , n, are defined by
-00
<
X 111
<
00.
158
J::
Xl
(34)
F(xh . .. , x,.) =
-oo
<
<
X 1 , ... , X 11
00.
. ,
x,.)
0"
ox ... ox
1
F(x 1 ,
x,.)
II
X,.) e A) =
f ~ f f(x
1 , ... ,
x,.) dx 1 dx,..
In particular
(35)
and if am
dx,. = 1
4J
dx,..
4n
The random variable Xm has the marginal density fxm obtained by integrating/ over the remaining n - 1 variables. For example,
fxix2) =
f~
>
J:
f(x 1 , , x,.) dx 1 dx 3
dx,..
00
... ,
P(a,.
<
- 00
X 1, . . . , X 11
<
00.
The necessity is obvious, but the sufficiency part for n > 2 is tricky and
will not be proved here. If Fhas a density j, then X1 , . . , X,. are independent if and only iff can be chosen so that
- 00
<
X 1> , X 11
<
00.
'/,.(x,.),
- 00
<
X 1, ... , X 11
<
00.
6.4.
159
X1,
X,..
A.e- .t.x,.
fx...(x,.) = { O,
for
00,
Thusfis given by
for
Recall that the exponential density with parameter A. is the same as the
gamma density r(l, A.). Thus as a special case of this theorem we have the
following corollary: If X 1 , , X, a.re independent random variables, each
having an exponential density with parameter A., then X 1 + + X,. has
the gamma density r(n, A.).
+ .. + Jl
11
and
u2 = uf + .. + u;.
160
'"
Xh , X,.
I"
) ,
1x...... x"
6.5.
t; '
Order statistics
= 4.8,
= 4.3,
6.5.
Order statistics
161
We will now compute the distribution function of the kth order statistic
xk. Let -00 < X < 00. The probability that exactly j of the u,s lie in
( -oo, x] and (n - j) lie in (x, oo) is
(;) Fi(x)(l - F(x))"- 1,
Fx~c(x) = P(X"
=
~ x)
(~)
j=lc
F'(x)(1 - F(x))"-1,
- oo
-oo<x<oo.
The corresponding derivation for X" in general is slightly more complicated. From (37),
fx~c(x) =
n!
f(x)F 1 - 1 (x)(1 - F(x))"- 1
J=k (j - 1)! (n - j)!
,.-1
. n:
J=kJ!(n - J ~
L" .
J=k (J -
l:
1)!
I)! (n - J)!
n.
. f(x)Fi- 1(x)(l - F(x))"- 1
J=k+1 (J - 1)! (n - J)!
162
and by cancellation
f Xk(X )
( 38)
n!
= (k _ 1)! (n _ k)!
/( )F"X
1(
X1
F( ))"-lc
X
-ex:>
<
<
OC).
In order to find the density of the range R we will first find the joint
density of X 1 and X,. We assume that n > 2 (since R = 0 if n = 1). Let
x ~ y. Then
P(X. >
X,
<
u, < y)
(F(y) - F(x))",
and of course
P(X, :::;; y)
= F"(y).
Consequently
F x,,x"(x, y) = P(X 1 < x, X, :::;; y)
= P(X, <
02
fx,,x"(x, y) = ox oy F x,,xJx, y)
= n(n -
1)/(x)f(y)(F(y) - F(x))"- 2 ,
X< y.
0,
x > y.
By slightly modifying the argument used in Section 6.2.1 to find the density
of a sum, we find that the density of R = X, - X 1 is given by
fR(r)
= J~co fx,,x"(x, r +
x) dx.
In other words
fR(r) =
F(x))"- dx,
r>O
r
0,
< 0.
These formulas can all be evaluated simply when U 1, . , U, are independent and uniformly distributed in (0, 1). This is left as an exercise.
There is a uheuristic" way for deriving these formulas which is quite
helpful. We will illustrate it by rederiving the formula for fxk Let dx
denote a small positive number. Then we have the approximation
dx).
163
The most likely way for the event {x ~ X11: < x + dx} to occur is that
k - 1 of the U/s should lie in (- oo, x], one of the U/s should lie in
(x, x + dx], and n - k of the U/s should lie in (x + dx, oo) (see Figure
5). The derivation of the multinomial distribution given in Chapter 3 is
applicable and the probability that the indicated number of the U;'s will
lie in the appropriate intervals is
r_ (x) dx ~
JXk
n!
X(J~
~
aJ
j(u) du
n!
(k - 1)! (n - k)!
from which we get (38). We shall not attempt to make this method
ngorous.
k-1
1 ]
n-k
x+dx
Figure 5
6.6.
Sampling distributions
+ ... +X"
n
xl + + x"
u.Jn
has the standard normal density n(O, 1).
164
+ + x;
(]2
has the gamma density r(n/2, 1/2). This particular gamma density is very
important in statistics. There the corresponding random variable is said
to have a chi~square (X 2 ) distribution with n degrees of freedom, denoted
by X2 (n). By applying Theorem 5 we will obtain the following result about
x2 distributions.
Theorem 7 Let Y1 , . ~ Y, be independent random variables
such that Ym has the x2 (km) distribution. Then Y1 + + Y,. has .the
x2(k) distribution, where k = kl + . .. + k,.
Proof. By assumption, Ym has the gamma distribution r(km/2, 1/2).
Y2/k2
is known as the F distribution with k 1 and k 2 degrees of freedom, denoted
by F(kt. k 2 ).
Theorem 8 Let Y1 and Y2 be independent random variables
having distributions x2 (k 1) and x2(k 2 ). Then the random variable
-Ytfkt
-,
Y2/k2
which has the distribution F(k 1 , k 2 ), has the density f given by f(x) = 0
for x s; 0 and
> O.
'
6.6.
Sampling distributions
165
+ + X!)/m
+ + X~)/(n -
x2(m) and
(Xf
(X;+ 1
m)
has the F(m, n - m) distribution and the density given by (39), where
k 1 = m and k 2 = n - m. Tables ofF distributions are given in Volume
II.
The case m = 1 is especially important. The random variable
Xf
(Xi
+ + ~:)/(n
1)
has the F(1, n - 1) distribution. We can use this fact to find the distribution of
y =
x1
../(Xi + + x;)/(n - 1)
1
=
z > 0.
2yz
Since r(l/2)
(40)
= ..;;,,
M~=
I and
r[(k
v kn: r(k/2)
'
166
Y =
x~
.J(X~ + + x;)f(n - 1)
JY/k
has a t distribution with k degrees of freedom.
6.7.
Yi
I:
j=l
i = 1, . . . , n.
auX1,
[a,J]
[a~l
a,.l
la11
a 11
detA =
6. 7.
167
The constants b 11 can be obtained by solving for each i the system (41) ofn
equations in then unknowns bil~ . . ~ bi,. Alternatively~ the constants b11
are uniquely defined by requiring that the equations
II
Yt
= }=1
l: a,Jxi'
i = 1, ... , n,
have solutions
i = 1, . .. , n.
(42)
Yi
~ a,jxj,
i = 1, . . . , n,
j= 1
where the x's are defined in terms of the y's by (42) or as the unique
solution to the equations y 1 = 1:j= 1 a1ix1.
Yi
g/...X1 ,
X,.),
1, .. . , n.
i = 1, ... , n.
Suppose that these equations define the x's uniquely in terms of the y's,
that the partial derivatives oyJox1 exist and are continuous, and that the
Jacobian
iJyl
oy1
ox,.
iJxl
.
..
.
J(x1 , . , x,.) = .
.
oy,.
oy.,
ox,.
oxl
is everywhere nonzero. Then the random variables Y1 ,
tinuous and have a joint density given by
. . .
168
(45)
)I f(xh , x,.),
Xt, .. , X 11
where the x's are defined implicitly in terms of the y's by (44). This change
of variable formula can be extended still further by requiring that the
functions Ui be defined only on some open subsetS of lf' such that
P((Xh ... , X,.) e S)
1.
In the special case when Yi = I:j= 1 aiix1, we see that oy,jox1 = ail and
J(x 1 , , x,.) is just the constant det A = det [a 11]. So it is clear that
(45) reduces to (43) in the linear case.
Let X 1 , , X,. be independent random variables each
having an exponential density with parameter l . Define Yt> ... , Y,. by
Yi = X 1 + + X~> 1 < i < n. Find the joint density of Y1 , . , Y,..
The matrix [a11] is
Example 13.
1 0
1 1 0
0
1 1
f(x b
. . . ,
. ,
X,. is given by
_ {.A."e-A.<x + +x,.>,
0,
x,.) -
A."e-A.y,.
'
elsewhere.
Of course, one can apply the theorem in the reverse direction. Thus-If
Y~> . .. , Y,. have the joint density given by (47), and random variables
Xb ... , X,. are defined by X 1 = Y1 and X, = Yi - Y1_ 1 for 2 ~ i < n,
then the X's have the joint density f given by (46). In other words,
Xb ... , X,. are independent and each has an exponential distribution with
parameter A.. This result will be used in Chapter 9 in connection with
Poisson processes.
Exercises
111
Exercises
function f. Find the joint distribution function and the joint density
function of the random variables W = a + bX and Z = c + dY,
where b > 0 and d > 0. Show that if X and Y are independent, then
W and Z are independent.
2 Let X and Y be continuous random variables having joint distribution
(b)
5 Let X and Y have a joint density f that is uniform over the interior of
the triangle with vertices at (0, 0), (2, 0), and (1, 2). Find P(X < 1 and
y ~ 1).
6 Suppose the times it takes two students to solve a problem are indepen-
function of X and Y.
8 Letf(x, y)
c(y - x), 0
Y.
170
15
16
17
18
19
20
21
22
23
24
25
26
un.
Exercises
171
TI having a Beta density with parameters cx 1 and cx2 Find the conditional density of II given Y = y.
27 Let
28 Let
xl, x2, x3
xn
Xt
+ .. . + Xn- An
Bn
xlt x2, x3
31 Let
(0, X 1), and let X 3 be chosen uniformly on (0, X 2 ). Find the joint
density of xl' x2, x3 and the marginal density of x3.
32 Let U~> .. . , Un be independent random variables each uniformly
distributed over(O, 1). Let Xk, k = I, ... , n, and R be as in Section 6.5.
33 Let
as x2 (m) and x2 (n). Find the density of Z = Xj(X + Y). Hint: Use
the answer to Exercise 22.
36 Let X and Y be independent random variables each havjng the standard
normal density. Find the joint density of aX + bY and bX - a Y,
where a2 + b2 > 0. Use this to give another derivation of Theorem 2.
37 Let X and Y be independent random variables each having density f.
=X +
Y.
38 Let X and Y be independent random variables each having an exponential density with parameter A.. Find the conditional density of X
given Z = X+ Y = z. Hint : Use the result of Exercise 37.
39 Solve Exercise 38 if X and Yare uniformly distributed on (0, c).
172
> 0,
r < 0,
In the first four sections of this chapter we extend the definition and properties
of expectations to random variables which are not necessarily discrete. In Section
7.5 we discuss the Central Limit Theorem. This theorem, one of the most important
in probability theory, justifies the approximation of many distribution functions by
the appropriate normal distribution function.
7.1.
J::> lxlf(x) dx
< oo,
EX
iba x ( b-a
1 ) dx = ( 1 )
b-a
173
x21
b=
2a
_+ab
2
174
Example 2.
Example 3.
f(x) = n(l
-oo<x<oo .
+ x2) '
-~
lxl
1
n(l
X )
dx
= -2 f.~
0
1t
= -2
lim
1t c-> ~
ic
0
= -1 lim log (1
1t c ... ~
=
7.2.
dx
dx
+ x 2 ) lc
0
00 .
7.2.
175
1)
This random variable can also be defined in terms of the greatest integer
function [ ] as X, = e[X/e]. If e = Io-n for some nonnegative integer
n, then X,(w) can be obtained from X(ro) by writing X{ro) in decimal form
and dropping all digits n or more places beyond the decimal point. It
follows immediately from (I) that
X(ro) - e < X,(ro)
= I.
wen,
X(ro),
1))
I: lekiP(ek <
X < e(k
"
in which case
EX, = ~ ekP(ek ~X < e(k
1)).
"
and by Equation (5) of Chapter 5, P(X < x) = F(x-) holds for all x.
The following theorem, which we state without proof, will be used to give a
general definition of expectation.
176
This theorem and our preceding discussion suggest the following general
definition of expectation.
Definition 2 Let X be a random variable and let X,, B > 0, be
defined by (1). If X, has.finite expectation for some e > 0, we say that
X has finite expectation and define its expectation EX by
EX= lim
EX~..
e-+0
s:oo .J_
00
in which case
EZ =
Joo Joo
- CXl
- CXl
7.3.
7.3.
177
s:oo x"'f(x) dx
and
s:<X) (x
= - ,t
r(a)
(2)
= a(a
1) (a
m -
1)
,tm
u2
= EX 2- (EX)2 =
a(a ,t~ 1) -
(i) 2= ;2 .
EX= n/2 = n
1/2
and
Var X
= - n/2- 2 = 2n.
(1/2)
178
The random variable X - J.l has the normal density n(O, u 2 ), which is a
symmetric density. Thus E(X - Jl)"' = 0 form an odd positive integer.
In particular E(X - Jl) = 0, so we see that the parameter J1 in the normal
density n(Jl, u 2 ) is just the mean of the density. It now follows that all the
odd central moments of X equal zero. To compute the even central
moments we recall from Section 5.3.3 that Y = (X - p,) 2 has the gamma
density r(I /2, l/2a2 ). Since for m even E(X - J.l)"' = EY"'12, it follows
from Example 4 that
(m; 1)
(2~2) m/ 2 r (~)
- -------:::--(
_1
2a
)"''2
= u'"l 3 (m - 1).
E(X - J.l)"' =
m!
2"'' 2
(;)!
a"'.
f~oo f~oo (x
7.3.
179
I( x, y )
./J e-[(xl-xy+yl)/2]
=-
4n
-_ ..j"j
- e -3xlf8e-[(y-x/ 2)1/ 2].
4n
We saw in that example that X and Y each have the normal density
n(O, 4/3). Thus Jlx = Jly = 0 and Var X= Var Y = 4/3. From Equation (4) and the second expression for f, we have
f oo
2-Jhr -
..j]
Exy = -
foo
xe -(3x2/ 8) d x
00
- 00
...[i;c
Now
J
oo
- oo
foo (u + X)
1 e
- --=
-(ul/2)
2 J2n
-oo
du- -X ,
2
and hence
1
EXY =
(JJ) ...J2;c
J:oo x
1/2
1/2 . 4/3
foo
x2e-<3xlf8)
dx
-oo
n(x; 0, 4/3) dx
2/3.
EXY
Jvar X .Jvar Y
2/3
J4i3 ..j4/3
=-
X = min (U1 ,
U,.)
Y = max (U1 ,
.. ,
U,.).
and
Find the moments of X and Y and the correlation between X and Y.
These random variables were studied in Section 6.5 (where they were
denoted by X 1 and X,). Specializing the results of that section to U/s
which are uniformly distributed, we find that X and Y have a joint density
fgiven by
0 <X~ y ~ 1,
(5)
f(x, y) = {~~n
l)(y - x)"-2,
elsewhere.
180
Those readers who have skipped Section 6.5 can think of the present
problem as that of finding the moments of, and the correlation between
two random variables X and Y whose joint density is given by (5).
The mth moment of X is given by
-
1)
J:
x'" dx
= n(n -
1)
x'" dx Y - x
= n(n
EX'"
J:
(
)" _ 1
I.,=
n - 1
(y - x)"- 2 dy
1
y-x
= nr(m + l)r(n) =
EX'"
r(m
In particular, EX
that
1/(n
Var X
(EX 2 )
1)
1) and EX 2
(EX) 2
m! n!
(m + n)!
= 2/(n
(n
l)(n
1) (n
2). It follows
+ 2)
1)
= n(n -
1)
J:
y'" dy
J: (
y - x)"- 2 dx
1
1
= n
s:
n -
n
1)2 (n
x=O
y'"+n-1 dy
n
m+n
- - --
Thus EY = nf(n
1) and
Var y = n
2 -
)
1
= (n
2)
J J:
1
EXY = n(n -
1)
y dy
7.4.
Conditional expectation
181
Since
x(y - xt- 2 = y(y - x)"- 2
we find that
EXY = n(n - 1)
J:
(y - x)"-t,
J:
J J: (
2
y dy
(y -
x)"- dx
- n(n - 1)
y dy
y - x )"- dx
= n(n- 1)
Jo
J.
- n(n - 1) o y d y y -
=n
Ll
x)"( l) lx =y
n
x =o
1
- -n
Consequently
Cov (X, Y)
EX Y- EXEY
= -
- --- -- 2
n + 2
(n + 1)
(n
11)2 (n
2)
p = ---,=:-=======
../var X VarY
= (n
+ l)!(n + 2)/ (n +
l)~(n + 2)
n
7.4.
Conditional expectation
fr1xCY I x) = {
f(x, y)
fx(x) '
0,
elsewhere.
182
For each x such that 0 < fx(x) < oo the function fr1xCY I x), - oo <
y < oo, is a density function according to Definition 5 of Chapter 5. Thus
we can talk about the various moments of this density. Its mean is called
the conditional expectation of Y given X = x and is denoted by
E[Y I X= x] or E[Y I x]. Thus
J_
00
(6)
E[Y
I X = x] =
yf(y I x) dy
00
= s~oo yf(x, y) dy
fx(x)
E[ y I X
= X
J=
- .
L
1
fx(x)
n(n - 1)
= n( 1 -
(y - x)"- 2 dy
x )"- 1 ,
0 <X< 1,
I x)
(n - l)(y - x)"- 2
(1 - x)"- 1
0,
X
'
< y < 1,
elsewhere.
7.5.
183
J:oo yf(y I x) dy
1
1
= (n -
1)(1 - x) 1 -
"
= (n -
1)(1 - x) 1 -
"
= (n _ 1)(l _ x) 1 - ,
=
1)(1 - x)
(n -
n- 1
n
dy
+ x(y -
x)"n- 1
x)"- 2 ] dy
1
]
+x
+x
EY
J:oo E[Y I X
= x]fx(x) dx.
J~oo E[Y I X=
00
x]fx(x) dx =
J:oo dx J_ yf(x, y) dy
00
= EY.
J: (n -
: + x) n(1 - x)"- 1 dx
J:
=n
=
(1 - x)"- 1 dx -
1-
n+1
n+1
(1 - x)" dx
'
184
L ls,._,(y)l(x
- y)
or
S*
nJL
u.Jn
'
s:
P(S,.
k)
= (~)
pt(l - p)"-t.
7.5.
185
Theorem 3 Central Limit Theorem. Let X., X 2 , be independent, identically distributed random variables having mean Jl and
finite nonzero variance u 2 SetS,. = X 1 + + X,.. Then
(8)
lim P
,. .... oo
- 00
<
<
00 .
np.)"'
exists and equals the mth moment of the standard normal distribution.
At this stage it is more profitable to understand what the Central Limit
Theorem means and how it can be used in typical applications.
Let X 1 , X 2 , be independent random variables each
having a Poisson distribution with parameter A.. Then by results in
Chapter 4, J1 = u 2 = l and S, has a Poisson distribution with parameter
n.A.. The Central Limit Theorem implies that
Example 10.
lim P
,. .... oo
(s".J-n.A.nl ~
) = <l>(x),
-00
<X<
00.
One can extend the result of this example and show that if X, is a
random variable having a Poisson distribution with parameter A. = t, then
(9)
lim
, ... tO
P(X, - EX, ~ x)
.Jvar X,
Cl(x),
-00
<X<
00 .
186
p ( S,(f~nnJ.l
:$;
X) "' <>(x),
<X<
-00
00,
or equivalently
P(S, <
(10)
x) "' ~ (x"~=!')
= <b (x - ES,) '
-00
<X<
00 .
.../Var S,
(11)
~ (x :;n n),
.8
.6
.4
-
.2
10
Figure 1
Normal Approximation
15
20
7.5.
187
In solving this problem we let Xn denote the length of life of the nth
light bulb that is installed. We assume that X 1 , X 2 , are independent
random variables each having an exponential distribution with mean I 0 or
parameter A. = 1/10. Then Sn = X 1 + + Xn denotes the time when
the nth bulb burns out. We want to find P(S50 < 365). Now S 50 has
mean 50A.- 1 = 500 and variance 50A.- 2 = 5000. Thus by the normal
approximation formula (1 0)
P(S 50 < 365) "' <I>
e6~~OO)
(12)
fs.(X)
0:<
< X<
00.
Though the derivation of (12) is far from a proof, {12) is actually a good
approximation for n large (under the further mild restriction that, for some
n, Is.. is a bounded function).
As an example of this approximation let X 1 be exponentially distributed
with parameter A. = 1, so that (11) is applicable. Then (12) becomes
(13)
1
fs ..(X) ~ .j~
(/)
(x .j-n n) ,
Normal Approximation
.10
.05
10
Figure 2
15
20
188
the set
{x - a I xis a possible value of Xd
is one.
We exclude, for example, a random variable X1 such that P(X1 = 1) =
P(X1 = 3) = 1/2, for then the greatest common divisor of the indicated
set is 2. Under assumptions (i) and (ii), the approximation
(14)
x an integer,
(15)
fs"(x) =
(~) i~:(l
1
- p)"-x
(
x - np
x an integer.
fs.(x)
~ ~ (x + (~3:-
"Jl)
u.J~
'
x an integer.
7.5.
189
.3
.2
.1
10
Figure 3
"
When S,. is discrete and conditions (i) and (ii) hold, then (17) is usually
more accurate than is the original normal approximation Formula (10).
In Figure 4 we compare the approximations in Formulas (10) and (17)
when S,. has the binomial distribution with parameters n = 10 and
p = .3.
1.0
.8
.6
.4
.2
0
0
10
Figure 4
We will interpret the problem as implying that the number S,. of successes
in n attempts is binomially distributed with parameters n and p = .6.
190
Since P(S,. ~ x)
approximation
(18)
(~~ - np} ,
(13 - (1/2) -
x an integer.
= 5.J.24. Thus
15)
s.J.24
1 - <b( -1.02)
= <1>(1.02) = .846.
The Central Limit Theorem and the
corresponding normal approximation formulas can be regarded as refinements of the Weak Law of Large Numbers discussed in Chapter 4. We
recall that this law states that for large n, S,./n should be close to p. with
probability close to 1. The weak law itself, however, provides no information on how accurate such an estimate should be. As we saw in Chapter
4, Chebyshev's Inequality sheds some light on this question.
The normal approximation formula ( 10) is also useful in this context.
For c > 0
7.5.2.
Applications to sampling.
(I~" - P
I ~ c)
= P(S,. :S np. -
., "'c;~)
= 2 [1 _
nc)
+ P(S,.
np.
nc)
"'cj~)
C!> (c:~} J.
+1-
In other words
(19)
where
(20)
A sample of size n is to be taken to determine the percentage of the population planning to vote for the incumbent in an election.
Let X 1 = 1 if the kth person sampled plans to vote for the incumbent and
Xk = 0 otherwise. We assume that X., .. . , X,. are independent, identically
distributed random variables such that P(X1 = 1) = p and P(Xl = 0) =
1 - p. Then p. = p and u 2 = p(I - p). We will also assume that pis
Example 14.
.Jp(1 -
7.5.
191
= 900.
I~"-
PI ~
.025.
.5
so by (19)
p
(I~'! - PI ~
.025)
~ 2(1
~1.5))
= 2(.067) = .134.
Solution to (ii). We first choose lJ so that 2(1 - ~(/J)) = .01 or
cJ)(5) = .995. Inspection of Table I shows that 15 = 2.58. Solving (20) for
c we get
= <5u = (2.58)(.5) =
043
c
J~
,J9oo
IJ2u2
2
= (2.58)2(.25) = 2663.
(.t25) 2
192
~n- ~1 ~c)
Exercises
1 Let
and cx2 will Z have finite expectation? Find EZ when it exists. Hint:
See Theorem 3 of Chapter 6 and related discussion.
3 Let X have the normal density n(O, u 2 ). Find EIXI. Hint: Use the
of y =
rx.
marginal density fx
8
11
12 Find the mean and variance of the random variable Z from Exercise 17
of Chapter 6.
13
Find the mean and variance of the random variable Y from Exercise 28
of Chapter 6.
Exercises
193
15 Let X have the normal density n(O, u 2 ). Find the mean and variance of
X have the gamma density r((X, A.). For which real t does etX have
xr
J (1 00
F(x)) dx < oo
and then
EX =
fooo
(1 - F(x)) dx.
Pa ~ P1P2 -
.Jt -
pf
.Jt -
p~.
Hint: Write
XZ = [p 1 Y +(X- P1Y)][p2Y
+ (Z-
P2Y)],
194
26 Let X and Y have a density f that is uniform over the interior of the
triangle with vertices at (0, 0), (2, 0), and {1, 2). Find the conditional
expectation of Y given X.
27 Let X and Ybe independent random variables having respective gamma
densities r(a 1 , A.) and r(a2 , A.), and set Z = X + Y. Find the conditional expectation of X given Z.
28 Let n and Y be random variables as in Exercise 26 of Chapter 6.
Find the conditional expectation of II given Y.
29 Let X and Y be continuous random variables having a joint density.
Suppose that Y and ~(X) Y have finite expectation. Show that
Ecp(X)Y
= J~>
cp(x)E[Y I X
x]fx(x) dx.
VarY =
31 Let
x]fx(x) dx.
xh x2,...
uv n
,. ... >
Es: =
and
lim E (
,. ... oo
;-) = 3,
av n
(~) 2;~,.
33 Let X have the gamma density r(a, A.). Find the normal approximation
for P(X ~ x).
34
196
Moment Generating
Functions and
Characteristic
Functions
Some of the most important tools in probability theory are borrowed from other
branches of mathematics. In this chapter we discuss two such closely related tools.
We begin with moment generating functions and then treat characteristic functions.
The latter are somewhat more difficult to understand at an elementary level because
they require the use of complex numbers. It is worthwhile, however, to overcome
this obstacle, for a knowledge of the properties of characteristic functions will
enable us to prove both the Weak Law of Large Numbers and the Central Limit
Theorem (Section 8.4).
8.1.
IS
The domain of M x is all real numbers t such that etx has finite expectation.
Example 1.
Then
oo
u.J2n
-oo
-Ut
e-
foo
1 ety- (yl/2t1l) d y.
u.J2n
---
-oo
Now
Consequently
pt t~ltl/2
M .X\'t) =ee
fco
- oo
197
1
-[(y-t~lt)l/2t12] dt.
-_-e
u.J2n
198
Since the last integral represents the integral of the normal density
n(u2 t, u 2 ), its value is one and therefore
-oo < t < oo.
(1)
Example 2.
Then
J.ex>
= -A_-
r(oc)
11-1
e -(l-t)x d X
A11
r(cx)
= - -...0........:..r(oc) (l - t)
for - oo < t
(2)
(-A.
),
A. -
S: t
-oo<t<A..
Mx(t) =
e"'P(X
n).
11=0
<})x(t)
= L
t 11P(X
n).
11=0
(3)
1 - p)11
Consequently~
eA(t- 1
>.
8. 1.
199
. .. ,
distributed, then
(4)
In order to see why Mx(t) is called the moment generating function we
write
L -t"X"
.
n!
oo
Mx(t) = EetX = E
n=O
Suppose Mx(t) is finite on -t0 < t < t0 for some positive number t0
In this case one can show that in the last expression for Mx(t) it is permissible to interchange the order of expectation and summation. In other
words
(5)
Mx(t) =
oo
EX"
I: n =O n!
t"
for -t0 < t < t0 In particular, if Mx(t) is finite for all t, then (5) holds
for all t. The Taylor series for Mx(t) is
Mx(t) =
(6)
t ... d"
I .
I:
Mx(t)
rt=O n! dt"
t=O
00
EX"= -
(7)
t=O
Example 3.
Mx(t)
= e(l2t2f2 =
(u2t2)" _!_
n=O
n!
Thus the odd moments of X are all zero, and the even moments are given by
EX 211
u2"
-- = (2n)!
2"n!
200
or
EX 2n
= 0'2"(2n)! .
2"n!
and
it follows that
and
Characteristic functions
Ee''x.
8.2.
Characteristic functions
201
remains valid for all complex numbers z 1 and z 2 Letting z = it, where t
is a real number, we see that
e''
(it)"
n!
n=O
- (1
-
(1 -
lt -
it 3
it s
2 - 3! + 4! + 5! - ...
~~ + ~~ -
... ) +
(t - ~~ + ~~ - ... ) .
Since the two power series in the last expression are those of cos t and
sin t, it follows that
e1t = cos t
(8)
+ i sin t.
Using the fact that cos (- t) = cos t and sin (- t) = -sin t, we see that
e't + e-ft
and
cost= - - 2
e't
e-it
smt = - - 2i
le1tl
(cos2 t
sin 2 t) 1 ' 2
1.
If f(t) and g(t) are real-valued functions oft, then h(t) = f(t) + ig(t)
defines a complex-valued function of t. We can differentiate h(t) by
differentiating f(t) and g(t) separately; that is,
h'(t)
= f'(t) +
ig'(t),
J:
h(t) dt =
J.b f(t) dt + i
J:
g(t) dt,
provided that the indicated integrals involving/ and g exist. The formula
202
then
= E(X +
i Y)
= EX + iEY
IEZI
EIZI.
The formula
E(a 1 Z 1
+ a2 Z 2 ) = a1EZ1 +
a2 EZ2
-00
<
<
00 .
In particular, if X takes on the value zero with probability one, then its
characteristic function is identically equal to 1.
If X is a random variable and a and b are real constants, then
<l'a+bx(t) = Ee't(a+bX)
=
=
Eeiraeibtx
e''IJEeibr x,
8.2.
Characteristic functions
203
and hence
(10)
Example 5.
-00
< t<
00.
t ::1: 0
J
1
(/Ju(t) =
-1
"!2 du
11
eit"ll
2 it
-1
siin t
=-t
Then Xis uniformly distributed on (a, b), and by (10) fort '# 0
_
lt(a+b)/2
(/Jx( t ) - e
(b - a)t/2
Alternatively
q>x(t) =
f" e1'x l dx
Ja b- a
l
eitxlb
=b-aUa
it(b - a)
Then
q>x{t) =
fooo e1'x;.e-.tx dx
= ).
fooo e-<J.-ir)x dx
- A e-<J.-Ir)xlo.
A - it
oo
204
Thus
qJ:x(t)
..t
..t -
..
lt
Ee''<X+r> = Ee''xeltY
Ee''xEe''r
and hence
(11)
(/Jx+r(t) = (/Jx(t)qJy(t),
-00
< t <
00.
Formula (11) extends immediately to yield the fact that the characteristic
function of the sum of a finite number of independent random variables
is the product of the individual characteristic functions.
It can be shown that ({Jx(t) is a continuous function oft. Moreover, if X
has finite nth moment, then qJ<;>(t) exists, is continuous in t, and can be
calculated as
cp<">(t)
X
d" EeitX
dt"
E d/1 eitX
dt"
E(iX)"eitx.
In particular
qJ<;>(O) = i"EX".
(12)
(13)
(itX)"
n!
11=0
11=0
i"EX" t".
n!
Suppose that
Mx(t) =
Lco EX"
- t"
rt=O
n!
is finite on - t 0 < t < t 0 for some positive number t 0 Then (13) also
holds on -t0 < t < t 0
Example 7.
Find fPx(t).
From Chapter 7 we know that EX" = 0 for any odd positive integer n.
Furthermore, if n = 2k is an even integer, then
EX"
EXn
= u2k(2k)! .
2kk!
Therefore
8.3.
205
lf>x
- e e
and
lfJx(t)
Eettx,
lfJx(t) = Mx(it).
and hence
iut -a2t2f2
= e"'e
lf>x(t) = ~ e'i'Jx(j).
-co
One of the most useful properties of q>x(t) is that it can be used to calculate
fx(k). Specifically we have the "inversion formula"
(16)
fx(k) = _!_
27t
Jx
e-iktlfJx(t) dt.
-x
_!_
2n
fx
-x
e-ikt [
-co
e'i'JxU>] dt.
206
fxU)
_!_ f~r
21t
-<Xl
ei<i-ll;)t
dt.
-n
In order to complete the proof of (16) we must show that the last expression
equals fx(k). To do so it is enough to show that
_!_
(17)
27t
JK
ei<i-k)t
dt = { 1
-x
if j = k,
if j ::1: k.
Formula (17) is obvious whenj = k, for in that case e'<i-">' = I for all t.
If j :F k, then
7t
l(j- ll;)tlw
-1
e'<i-k)t dt = e
-K
21t - 7t
21ti(j - k)
- - - - - --21ti(j - k)
sin (j - k)n
7t(j -
= O,
k)
since sin m1t = 0 for all integers m. This completes the proof of (17) and
hence also of ( 16).
Let X 1 , X 2 , . , X, be independent, identically distributed
integer-valued random variables and set S,. = X 1 + + X,. Then
(/Jsn(t) = (q>x,(t))", and consequently by (16)
Example 8.
(18)
fs"(k) =
_!_
21t
fn
e-lkt(({Jx 1(t))"
dt.
-7r
Formula (18) is the basis of almost all methods of analyzing the behavior
of fs"(k) for large values of n and, in particular, the basis for the proof of
the "local" Central Limit Theorem discussed in Chapter 7.
There is also an analog of (16) for continuous random variables. Let
X be a random variable whose characteristic function q>x(t) is integrable,
that is,
I~
oo
(19)
Example 9.
8.3.
207
-flltl/2 _
Joo eir.x J1
-.xl/2fll d
e
x.
- oo
u 2n
-tl/2f12
foo
-ir.x
(1
;- e
-(12.xlf2
v 2n
- 00
or equivalently,
1-- e -r2/ 2f12
-u.J2n
foo
1
2n
= -
oo
e -it.xe -fll.xl/2 d x .
2n -
oo
If we integrate both sides of this equation over a < x < b and interchange the order of integration, we conclude that
P(a < X
c Y < b)
= _!_ J.b
2n
_!_
2n
(foo
-oo
e-lt.x(/Jx(t)e-c21212 dt) dx
)e-cltl/2
dt
or
(20)
P(a
~ X + cY ~
b)
= _!_ foo
21t
-oo
208
(21)
- 00
,.-.oo
<
<
00.
Then
(22)
lim FxJx)
,. ..... 00
= Fx(x)
8.4.
209
The Weak Law of Large Numbers and the Central Limit Theorem
P(a
X,.
+ cy ~
b) = __!__
21t
and
(24)
P(a ~ X
cY ~ b)
= -1
oo ( -ibt
-iat)
22
e
-. e
cpx(t)e-c ' ' 2 dt.
-oo
-lt
21t
lim P(a
X,.
cY < b)
= P(a
~ X
+ cY
~ b).
n-+oo
There are two more steps to the proof of the theorem. First one must
show (by letting a -+ - oo in (25)) that
(26)
lim P(X,.
cY < b) = P(X
cY < b).
n-+oo
-+
0 in (26)) that
= P(X
< b)
n-+oo
The Weak law of large Numbers and the Central limit Theorem
In this section we will use the Continuity Theorem to prove the two
important theorems in probability theory stated in the title to this section.
Both theorems were discussed without proof in earlier chapters. In order
to prove these. theorems we first need to study the asymptotic behavior of
log CfJxCt) near t = 0.
Let z be a complex number such that lz - 11 < 1. We can define
log z by means of the power series
log z
= (z
- 1) - (z - 1)2
(z - 1)3
elog z
z,
lz - 11 < 1,
dt
h(t)
210
dt
t =O
= <l'x(o)
(/Jx(O)
= ijl.
Consequently,
lim log (/Jx(t) - i}lt = O.
t-+0
t
(27)
Suppose now that X also has finite variance u 2 Then ({Jx(t) is twice
differentiable and by (12)
cp;(O) = -EX 2 = - (Jl 2
+ u 2 ).
, ... o
t2
- ijlt
(/Jx<t)
.
- - - lJl
(/Jx(t)
1m _;....--t-t-+0
2
= lim (/Jx(t)
- iJlq>x(t) .
2t
t-+0
-(JL2
u2) _ (i/l)2
In other words
(28)
8.4.
211
The We,ak Law of Large Numbers and the Central Limit Theorem
(29)
= 0.
S
n
-II- j l =
X 1 ++XII - jl
n
IS
e- ip.t((/)x ,(tfn))".
Let i be fixed. Then for n sufficiently large, t/n is close enough to zero so
that log (/Jx 1(tfn) is well defined and
(30)
t Ilffi
, .... oo
(/Jx 1 (0) =
log I = 0. If
t/n
But tfn -+ 0 as n -+ oo, so the last limit is 0 by (27). This completes the
proof of (31 ). It follows from (30) and (31) that the characteristic function
of
8
lim P ( " - Jl
, .... oo
n
-e)
= Fx.( -e) = 0
and
8
lim P ( " - Jl
w-+oo
e) =
Fx(e) = 1.
212
e) = o,
I
For the next theorem it is necessary to recall that c:I>(x) denotes the
standard normal distribution function given by
f"
c:I>(x) =
-oo
~ e-y2f2 dy,
-00
.J21t
< X<
00.
(S,.' a JnnJl. ~ x) =
<ll(x},
-OO
<X<
00.
Proof Set
S* _ S,. - nJl.
"
a.Jn
or
(33)
q>5 =(t)
- iJJ.(tfu.J~))] .
,. ... w
If t = 0, then both sides of (34) equal zero and (34) clearly holds. If
t =F 0 we can write the left side of (34) as
t 2 . log q>x.(tfu.J~) - iJJ.(t/uJ~)
2 1lffi
I
'
a ,.... oo
(tfuy n)2
213
Exercises
Thus (34) hQlds for all t. It follows from (33) and (34) that
-00
<
<
00 .
n-+OO
,.-co
= <l>(x),
Exercises
(b) Use this moment generating function to find a formula for EX2 n
(note that the odd moments of X are all zero).
6
7 Let X1 ,
E(X 1
+ .. +
Xn)3 = nEX~
+ n(n
3n(n- 1)EXfEX 1
- l)(n - 2)(EX 1) 3
It follows that
~
P(X ;;:: x)
min e-':xMx(t),
t:i!':O
<
<
00.
Jex>
e-ixr
n(l
-oo
t 2)
dt.
fco
-<X>
n(l
t 2)
dt.
Exercises
215
fx(x) = n(l
x2),
-00
<
<
00.
Show that CfJxCt) = e-lrl, - oo < t < oo. Hint: Interchange the role
of x and t in Exercise 19.
21
22
23
Eeit(XJ.-A.)Jv'"i =
e-'2 ' 2
A.-+oo
(b) What conclusion should follow from (a) by an appropriate modification of the Continuity Theorem?
Poisson Processes
Random walks
Consider a sequence of games. such that during the nth game a random
variable X,. is observed and any player playing the nth game receives the
amount X,. from the "house" (of course, if X,. < 0 the player actually pays
-X,. to the house).
Let us follow the progress of'a player starting out with initial capital x.
Let S,., n > 0, denote his capital after n games. Then S0 = x and
n > 1.
We will further assume that the Xk's have finite mean p. If a player plays
the first n games, his expected capital at the conclusion of the nth game is
(1)
ES,. = x
np..
9.1.
Random walks
217
(2)
:::5::
P(Xk
= 0)
< 1.
It is possible to prove that the random variable Tis finite (with probability
1) and, in fact, P(T > n) decreases exponentially as n -+ oo. This means
that for some positive constants M and c < 1,
P(T > n) < Me",
(3)
n = 0, 1, 2, .. ..
The proof of (3) is not difficult but will be omitted to allow room for
results of much greater interest. From (3) and Theorem 5 of Chapter 4, it
follows that ET and all higher moments ofT are finite.
x-4
a-0 ----------------------------
T- 10
Figure 1
If the player quits playing after the Tth game, his capital will be Sr
(see F igure 1). A famous identity due to Abraham Wald relates the expected capital when the player quits playing to the expected number of
times he plays the game. Specifically, Wald's identity asserts that
(4)
EST =
p.ET.
218
sT
oo
= x + }=1
~ x1 t<T~J}
~ x1
=x+
}=1
00
(5)
ST =
~ Xj(l - l{T<J}),
}=1
and hence
00
(6)
EST
=X+
E ~ Xj(l - l{T<j})
}= 1
It can be shown by using measure theory that the order of expectation and
summation can be reversed in (6). Thus
00
(7)
EST
E[Xj(l - l{T<J})].
j=1
E(Xj{l -
= p(l - P(T
= pP(T
l(T<Jl)
< j))
> j)..
EST
=X +
Jl
L P(T ;;:: j)
j=1
=X+ p.ET,
which completes the proof of Wald's identity.
If the X,.'s have mean Jl = 0 and finite variance a 2 , there is a second
identity due to Wald, namely
Var ST = u 2ET.
(8)
(9)
ST -
~ Xj(l -
l{T<J}).
)=1
and hence
00
(ST - x) =
J=l
00
= L1
j=
00
00
It= 1
9.2.
219
(10)
E(Sr - x)
00
= L L
j= 1 k= 1
We will now evaluate the individual terms in this double sum. Consider
first terms corresponding to values ofj and k such thatj < k. The random
variable
Xj(l - l{T<j})(l -
l{T<k})
ltT<JJ)(l - ltT<kJ)]EXk = 0.
Similarly the terms in the right side of (10) vanish when j > k. When
j = k we obtain
The random variable (1 - I{T<j}) depends only on x1, x2, .. . 'xj-1, and
hence is independent of X1. Since this random variable takes on only the
values 0 and I, we see that
Thus
E[X j(l - 1{T<Jl)2 ] = E[X j(l - l(T<Jl)]
= EXJE(l
- ltT<JJ)
= u 2P(T
< j))
> j).
E(Sr - x) = u
~ P(T
> j) = u 2 ET,
j=1
We will assume throughout this section that a < x < b, a < b, and
a, b, and x are integers. The two identities of the previous section are
most easily applied if it is known that
(11)
P(Sr = a or b)
P(Sr
= a) + P(Sr = b) = 1.
220
(12)
= P{X"
= P {x"
r = P{X"
= 1},
= - 1},
0}.
ESr = aP(Sr = a)
a(l - P(Sr
+ bP(Sr = b)
= b)) + bP(Sr =
b).
For simple random walks we can solve explicitly for P(Sr = a),
P(Sr = b), ESr, and ET. Consider first, the case p = q. Then 11 = 0 and
Wald's identity (4) becomes ESr = x. Thus by (13)
x = a(l - P(Sr = b))
+ bP(Sr = b).
(14)
b) =
x - a
b- a
and
P(Sr =a)=
(15)
b ~ x
b- a
Now
Var Sr
= ESf.
- (ESr) 2
= b2 P(Sr = b) + a 2 P(Sr
b2 (x - a) + a 2 (b - x)
-x2
b- a
(ax
bx - ab) - x 2
= (x - a)(b - x).
Thus if p = q,
(16)
= a) - x 2
ET
= (x -
a)(b - x) .
1- r
9.2.
If r
221
= 0 and p =
1/2,
(17)
ET = (x - a)(b - x).
The problem fits into our scheme with S" denoting the capital of the less
wealthy player after n bets if we choose p = q = 1/2, x = 5, a = 0, and
b = 15. The answer to the first part is given by
P(Sr
b)
5
- O
15 - 0
=!.
3
(5 - 0)(15 - 5)
50.
f(x) = pf(x
1)
+ qf(x -
1)
+ rf(x),
a< x <b.
= p. P(Sr = b I xl =
+ r. P(ST = b I xl
1)
+ q. P(Sr =
I xl =
-1)
= 0)
and
P(Sr = b
I xl =
i)
= f(x + i),
1, -1, 0.
(19)
(20)
f(x
Set c
= f(a +
and
f(b) = 1.
q, we see that
1)
f(x
= f(a +
a< x <b.
I) - f(x) = c
(;r-.
a<
< b.
222
f(x)
= f(x)
- f(a)
L -(f( y + 1) -
= ~-1
~ C
y=a
(
I
f(y))
9_) ' -
p
1 - (qfpyx-
=c
1 - (q/p)
'
a~ x ~b.
1 - (q jpy:-a
f(x) = 1 _ (q/p)b-a
(21)
= b)
= 1 - (qfpy:-a
1 _ (q/p)b-a '
a< x
~b.
( 22)
P(S
ET = (b - a) 1 -
(qfp)x - a _ x - a
P - q 1 - ( q I P)"-a
P- q,
a~
x ~b.
P(S
T
= .6.
1 - (.6/.4f' = .0151.
1 - (.6/.4) 1 '
9.2.
223
In order to find the expected gain to the wealthier player we first note that
the expected capital of the poorer player after they stop playing is
EST
15P(ST = 15)
15(.0151)
= .23.
Thus the expected gain to the wealthier player or the expected loss to the
other player is $5.00 - $.23 = $4.77, a good percentage of the poorer
player's initial capital. The expected number of bets is
ET = EST-
X=
-.2
J1
Let b
-4.77 ~ 24.
--+
P(S,
t-(!('
for q < p,
for q
p.
= So>
{~ - (~r-
for p < q,
for
> q.
Let S, denote the capital of the house (in multiples of $1000) after n
games. Then p = .51, q = .49, x = 100, and a = 0. By (25) the probability that the house will go broke is
I - P(S, > 0 for all n ;:: 0) =
(;r-
(::~ro
= .018.
Suppose p
< y < b.
224
y-a-1
y-a
1 - 2p
b-y-1
P ---'--
b-y
or
p
(26)
(
{a, b}
) _
p(b - a)
_
y, y - 1
(y- a)(b - y)
(27)
If x
::1=
P..t(Y, y) .
1 - P..t(Y, y)
y, then
(28)
G...(y, y)).
G A(x, y) =
p A(x, y)
1 - P"..t(y, y)
(5 5) {0, 15}
'
Pco,t5>(5, 5)
1 - p
(5 5)
{0,15}
85
= = 5.67 .
.15
'
9.3.
225
9.3.
Figure 2
226
Figure 3
'
N Bk --
nJ --
n'
111
(n 1 !) ( n~c !)
p1
l'k
"""
n!
IB I"J
=-nElL.
lSI" J=t n !
k
9.3.
227
" IB I"J
n
lSI"
n !
= n" I N = n) = -n!
Sl!_
1=1
Hence
(30) P(N
"
= n, N 8 , = n 1 , . . , N 8 = n") = P(N = n )n!- fl
k
IS I" J= 1
IBI"1
n1 !
Sl!_
A."JSI" e-AISI n!
IBii"J
n!
lSI" 1=1 n1 !
= A."e-liSI
J= 1
IBJI'J.
n1 !
lSI
IB11 + ... +
IB~cl
'
n" (A.IBJI)"J
n !
~--"-'.:.-e
J=1
-liB11.
P(N 8, = nh . . . , N 8k = n") =
n"
(A.IBJI)"J -AIB1l
e
.
i= 1
n1 !
228
(32)
n! k IB-1"1
~-.
IBI" i=t ni!
n) = -
Distance to particles
9.4.
Distnce to particles
229
from the origin to the star. Thus the amount of light received from a star
of distance r from the origin is K/r 2 for some positive constant K. The
amount of light received from the nearest (and hence apparently brightest)
star is Kf
The total amount of light is
n:.
J.
m)
tm-1e-t
(m - 1)!
dt.
( )
/,mr
d(c.A.)m
md-1 -cJ.rd
r
e
'
- 1)!
= (m
> 0.
f
f.
00
ED!,.
riJ,ir) dr
d(cl)m
r md+j-1 e -c).rd d r.
o (m - 1)!
oo
---
230
Waiting times
respectively. These facts are immediate from our definition of the Poisson
process and its translation into the time language.
As mentioned above, Dm is the time of the mth event. From our results
in Section 9.4 we know that Dm has the gamma distribution r(m, J..). In
particular, D 1 is exponentially distributed with parameter J... Recall
from Chapter 6 that the sum of m independent, identically distributed
exponential variables has the gamma distribution r(m, J..). Define
random variables W1 , W2 , , Wit,... as follows : W1 = D 1 , Wit =
Dlt - Dlt_ 1 , n > 2. Then clearly Dm = W1 + + Wm. The discussion
just given makes it plausible that wl, w2, ... , wm are independent
exponentially distributed random variables with the common parameter J...
This is in fact true and is a very interesting and useful property of the
Poisson process on [0, oo ). The random variable Wm is, of course, nothing
more than the time between the (m - I)st event and the mth, so that the
times W., W2 , are the waiting times between successive events in a
Poisson process.
9.5.
231
Waiting times
< t
11 ,
From Example 13 of Chapter 6, we see that the theorem is true if and only
if the random variables D 1 , , D, have joint density /,. This is true
for n = I, since D 1 is exponentially distributed with parameter A.
A rigorous proof in general is more complicated. Before giving this
proof we will first give a heuristic way of seeing this fact.
LetO = t 0 < t 1 < < t11 andchooseh > Oso small that t,~_ 1 + h < t.,
i = I, 2, ... , n. Then (see Figure 4)
(36)
+ h, 1 < i < n)
= 0, N{t 1 + h) - N(t 1 ) = 1, ... ,
N(t11 ) - N(t,._ 1
+ h) = 0, N (t,. + h)
- N {t11)
>
1)
= A.n-1h11-le-At"(1
0
_ e-Ah).
1
Figure 4
111)h"
0.
= lim
h
10
= A."e-M"
as desired.
-).h
e(h),
232
/,.(s1 ,
.. ,
s1 , , s,. - s 1).
F.(t 1 ,
t,.) =
J~
1
/
1 (s 1)F,._ 1(t 2 -
sH .. . , t,. - s 1 ) ds 1
Gn{t 1 ,
. ,
t,.)
= J~'
1 (s 1 )G,._ 1 (t2 -
s1 ,
tn - s 1 ) ds 1
,.
,.
n {N(t,>
~ i}
i= 1
t,.)
= P(N(tJ >
i, 1
J~'
9.5.
233
Waiting times
(A.tt
e -J.t =
(40)
k!
(k - 1)!
Indeed,
t
J.
5)]k-1
Ut
A.e-;., ""- -
e-.t(r-s) ds
(k - 1)!
(k - 1)!
(k -
P(N(t 1 )
=
k 1,
~ ..
(At)" e-J.t.
k!
k,)
s) = k 1
fr s"-1 ds =
1)! Jo
N(t,)
s)"- 1 ds
e-J.tA_k
(41)
J.t (t -
-J.nk
"'
P(N(t 1 ) = k 1 ,
N(t,)
k,)
=P (N(t1) = k1)
f.
, _ [A.(t
P(N(t1)
_ s)]"'-te-.t<r,-s) ds
Ae As ~~1~--=---------(k1 - 1)!
fi [A(t, -
}=2
1'=2
t,_l)]"r"J-e-..l<trtJ-> .
(kj - kj-1)!
J:'
A.e-..l5 P(N(t 1
s)
k1
= J~' A.e-.A.sP(N(t 1
,
=
r,
J.o
X
A.e- As
s) - N(t1 _ 1
1 -
}=2
s) = k 1
n P(N(t
1, . .. , N(t,. - s)
= k,.
- 1) d$
1)
-
s)
= k1
k 1 _ 1 ) ds
[A(t _ s)]"-1e-.t(r,-s)
"'--'---"1'---_....:_=---------
(kl - 1)!
, [A.(t _
n
j=2
J
ds.
(kj - kj-1)!
Comparing the right-hand side of (42) with that of (43), we see that (41)
holds. The desired equality (39) now follows from (41) by summing both
sides of (41) over all values of k 1 , . , k, such that k 1 < k 2 < < k,.
and k 1 > I, k 2 ~ 2, . .. , k, > n.
I
234
Exercises
1 Let
< Sr
~ b
+ d)
= x.
= I,
Suppose that
< a)
+ bP(Sr >
b)
+ d)P(Sr >
b).
soon as his net winnings reach $25 or his net losses reach $50. Suppose
the probabilities of his winning and losing each bet are both equal to
1/2.
(a) Find the probability that when he quits he will have lost $50.
(b) Find his expected loss.
(c) Find the expected number of bets he will make before quitting.
3 Suppose the gambler described in Exercise 2 is playing roulette and his
true probabilities of winning and losing each bet are 9/19 and 10/ 19
respectively. Solve (a), (b), and (c) of Exercise 2 using the true
probabilities.
4 A gambler makes a series of bets with probability p of winning and
Let Sn denote a simple random walk with p = q = 1/2 and let a < b.
Find P1a,b}(x, y) and G1a,blx, y) for a < x < b and a < y < b.
7 Let Sn denote a simple random walk withp = q = 1/2. Find P 101 (x, y)
Let Sn denote a simple random walk with 0 < q < p. Find P 0 (x, y)
and G0(x, y).
9 Let Sn denote a simple random walk with 0 < q < p. Find P<01( -1 , y)
and G101( -1 , y) for y < 0.
10
Exercises
235
P(Y = k) =
L P(N = n)P(Y =
n=O
IN =
n)
and
So
236
A.. Let
Y, denote the distance from t to the nearest particle to the left. Take
Y, = t if there are no particles to the left. Compute the distribution
function of Y,.
For Zt and Y, as in Exercises 17 and 18,
(a) show tha.t Yt and Zr are independent,
(b) compute the distribution of Z, + Y,.
Particles arrive at a counter according to a Poisson process with
parameter A.. Each particle gives rise to a pulse of unit duration. The
particle is counted by the counter if and only if it arrives when no pulses
are present. Find the probability that a particle is counted between time
t and timet + I. Assume t > I.
Consider a Poisson process on (0, oo) with parameter A. and let T be a
random variable independent of the process. Assume T has an
exponential distribution with parameter v. Let Nr denote the number
of particles in the interval [0, T]. Compute the discrete density of Nr.
Do Exercise 21 if T has the uniform distribution on [0, a], a > 0.
Consider two independent Poisson processes on [0, oo) having parameters A. 1 and A.2 respectively. What is the probability that the first
process has an event before the second process does?
Suppose n particles are distributed independently and uniformly on a
disk of radius r. Let D 1 denote the distance from the center of the disk
to the nearest particle. Compute the density of D 1
For D 1 as in Exercise 24 compute the moments of D 1 Hint: Obtain a
Beta integral by a change of variables.
Consider a Poisson process on R' having parameter A.. For a set A
having finite volume, let N ... denote the number of particles in A.
(a) Compute ENl .
(b) If A and Bare two sets having finite volume, compute E(N,.N8 ).
Let A1 , A 2 , , A 11 ben disjoint sets having finite volume, and similarly
let B 1 , B 2 , , B11 be n disjoint sets having finite volume. For real
numbers cx 1, , CX11 and P1, , p11 , set
19
20
21
22
23
24
25
26
27
11
f(x) = ~ cx11,.,(x)
i=l
and
II
g(x) = ~ P,ls,(x).
l= 1
(t
~~.,) (t
CXtN
= A.
p,N
B,)
A.
237
Exercises
28
= n) = (~) p~(l
- p,)"-"
(c) Now show that M(t) is Poisson distributed with parameter ltp,.
Answers
CHAPTER 1
2. 18/37.
3. 1/2.
6. 3/10.
9. 5/12.
4. 1/8.
8.
7. 1/2.
10. 5/8, 3/8.
12. 1/2.
3/10.
11. 4/5.
15. 5/29.
14. 2/5.
19. 0.
20. (a)
(r
(c)
(r
r(r - 1)
b)(r + b -
br
b)(r + b -
16. 10/19.
(b)
(r
1)
rb
b)(r + b -
1)
b(b - 1)
b)(r + b -
1)
(d)
(r
1)
26.
27. 14/23.
(c) 1/2.
28. 4/9.
29. 2/13.
31. (a) (r
+ c)/(b + r +
(c) 6/17.
30. 1/3.
+r+
c).
39..9976.
(b) 1 -
6 122
134 - 12
41. 75/91.
40. 2.
44. 1 - (1/10) 12
42. - - -4
45' (a)
(~f
(b)
ktO
46. 1 - (11/4)(3/4)7
239
Answers
240
47.
t=2
k=t
CHAPTER 2
3. 1/n.
5. (n - l)(r),.- 1 /r".
6. [('
1. 64.
~ N) (~))'-1
(:
~ ~) r +
b - n
12. 4.
e:)
Here q =
6q,
3
4 q.
c:)
e:) (~) (~).
(e) 4.
(h)
q,
(f) 10 4 5 q,
11 4q.
(i) 13
c (~)
2
)
3
52)-l
.
( 5
15.
sj c3) .
11.
18.
(~ =~)/(:).
4 q.
Answers
. 241
CHAPTER 3
1 _/(x) = {o1/10, x
o, 1, ... , 9,
elsewhere.
2. P(X
+ r = n) = pr (
- r
n -
)< -1)"-r(l -
(!) (n ~ k).
3. (a) P(X = k) =
p)"-r,
s k s
6,
n = r, r + 1, ... .
C~)
= k) = (;)
(b) P(X
4. 1/(2N +1
Gr or-"
n.
2).
5. (a) .3, (b) .3, (c) .55, (d) 2/9, (e) 5/11.
+ (1 _ p)to,
(1 _ p)' _ (1 _ p)u.
8. P(X
= k)
9. P(X
= k) = (k
= (2k - 1)/144,
IC~),
12.
k = 2, 3, .. .. 12.
- 1)
0
- {p(1 - p)",
1 . P( Y - x) - (1 _ p)M
= 0, 1, ... M -
x = M.
1,
11. (a)
(:)I(~)
s y) =
z) = ('
~-
y = n, n
z) I (~), z =
I, ... , r,
1, 2, ... , r - n
(b)
1
N + 1
= z) =
2
(N - z)
+ 1)
2z + 1
z
,
(N + 1) 2
(N
(c) P(IY -
XI = 0) =
P(lY - X I'
1
N
'!' 1 ,
z) = 2(N
= 0, ... , N,
+ 1-
(N
z = 0, .. . , N,
1)2
z)
z =
... '
+ 1.
242
Answers
16. (a)
(b)
P2
,
P2 - PtP2
P1P2
P2 - P1P2
Pt +
Pt +
17. (a) geometric with parameter p 1 + P2 - PtP2
(b)
18.
z = 0, 1~ 2~ . . ..
(a) g(x)
I:, h(y),
20. 5/72.
2
( r)! 2 ,
x 1 ! . .. x,.! r ,.
{2r)!
(b) 2"r2".
21. (a)
(z) (
P1
)z-y (
Pt
P2
P2
P1
),.
P2
24. {17/2)e- 3
26. p"(l - p):x,.-r.
23.
25. {a) 1 - {5/6)6 , {b) 4.
(53/8)e- S/ 2
(;
30. ~--~~------~~--~
(;)
z - 1
2 :s; z :s; N,
Jil'
31. P(X + Y =
z)
32.
<l>x{t)
1 (1 -
35
=
{
(z) (
y
).1
{x/2)!
0
).2
x! y! z!
37. (a)
elsewhere.
).1
)>'
).2
).1
3 6 (x + Y + z)! (
'
= 1.
)%-y (
).1
1-t
N+l
).:xf2e-A
33 fx(x)
elsewhere.
1
tN+ ) '
).1
).2
).2
):x (
).3
).1
).2
).2
)' (
).3
l1
).3
l2
eAp(t-O,
CHAPTER 4
1. {2N + 1)/3.
3. ;.-t(l -
e-A).
)z
).3
Answers
243
7. M + p-1(1 - p)M+1.
+ 2N)/12.
10. 2.
14. E(2X + 3Y) = 2EX + 3EY,
Var (2X + 3 Y) = 4 Var X + 9 Var Y.
16. (a)
(1 - ;1)",
(d) r ( 1 -
(b)
r[
(1 - 2)"
; ,
(c) r
~)"]
1 - (1 -
(1 - t),
~
r(r - 1)[ ( 1 -
~r
-(
1-
~)
2
"]
18. ~
2.
r(1 - i/r)
t= 1
20.
-C1z
22. -1.
= n(n -
Var X= n ('1 )
r
n( n- 1)
1)
'
1 2
'
r(r - 1)
(1 - 'r r - 1n ,
1
r1r2
) '
- n2
r(r - 1)
(d)
Var Y
'1'2
2
= n ('
(1 - ~)r r - 1
r -
n ;
26. 0
1_
= 1.
27. Chebyshev's inequality shows that a = 718 will suffice (see also the answer to
Exercise 46 of Chapter 7).
32. z/2.
CHAPTER 5
Fx(3).
0; F(x) = x/R 2 , 0:::;; x :::;; R 2 ; and F(x) = 1, x > R 2
= x/a, 0
> h.
= 0, x <
and F(x) = 1, x > s.
6.
F(x)
7.
+ 2x
-(l/2)x2 ,
244
Answers
8. m = A. -
Jo~e 2.
10- 3
10. F(x) = 0, x < 0; F(x) = xfa, 0 s x < a/2; and F(x) = 1, x ~ a/2.
11. (a) 7/12, (b) 1/3, (c) 1/6, (d) 5/12, (e) 1/2.
= (x + 10)/20, -5 s
x < 5;andF(x)
= 1,x ~
5.
= Oelsewhere.
26. fr(Y)
(y -
27. F(x) = 0, x < -1; F(x) = 1/2 + 1/n arcsin x, -1 s x s 1; F(x) = 1, x > 1.
/(x) = ljn../1 - xz, -l < x < 1, andf(x) == 0 elsewhere.
28. f(x) = A.lxle-Ax2 , - oo < x < oo .
29. X - a and a - X have the same distribution. F(a - x) = 1 - F (a + x) for all x.
30. <I(x)
= 1/2 +
2
31. fr(Y) =
_ e- 7212 (1\
a v'2n
32. f 1 (y)
ayV2n
33 . .6826.
34. (X - #)/a has the standard normal distribution.
35. fr(- 6) = .0030,
/ 1( - 5) = .0092,
fr(- 4) = .0279,
/ 1 (-2)
.1210,
/y(2) = .1210,
/ 1 (6) = .0030,
/ 1 ( - 1) = .1747,
[ 1 (0) = .1974,
/ 1 (3) = .0655,
/y(4) = .0279,
fr(Y) = 0 elsewhere.
/y(- 3) = .0655,
/ 1 (1)
= .1747,
/y{5) = .0092,
.244 (24.4%).
245
AnswtJts
40. (e)
( ) _- -2AYl-1 e -A>'l,
43 JirY
y > 0, and/r(Y)
r(a)
= v'y, y
44.
tp(y)
45.
<
46.
~- 1 (.1) = -1.282,
~- 1 (.4) = - .253,
~- 1 (.7) = .524,
<>- 1 (.2)
47.
p.
= 0 elsewhere.
;;:: 0.
+ .615G.
< 1.
~-1(.5)
= -.842,
0,
~- 1 (.9)
= 1.282.
49 . .82.
48. 1.
CHAPTER 6
1. Fw z(W, z)
(w -b a' z -d c) .
fw z(w, z) = _!_ I
bd
(w -b a' z -d c) .
F(v'w, v'z) - F(- v'w, v'z) - F(v'w, - v'z) + F(- v'w, - v'z)
1
_ (/(V;, v'~) + /( -v';, V~) + /(v';, -v;)
andfw,z(w, z) =
4V wz
+ f( -Vw, -v;))
for w, z > 0 and Fw.z(w, z) and.fw.z(w, z) equal zero elsewhere.
2. Fw.z<w, z)
3. (a) 3/4, (b) 5/12, (c) 3/4; these results are easily obtained by finding the areas of
the appropriate unit square.
4. 1 - e-t/2(12.
6. 1/3.
5. 3/8.
7. X is exponentially distributed with parameter l. Y has the gamma density r(2, .t).
Fx.r(x, y) = 1 - e-J.x - lxe-A)', 0 :S x :S y;
Fx.r(x, y) = 1 - e- 1Y(1 + Ay), 0 :S y < x; and Fx.r(x, y) = 0 elsewhere.
a > -1, (b) c = (a + 1)(a + 2),
(c) fx(x) =(a+ 2)(1 - x)+t, 0 < x < 1, andfx(x) = 0 elsewhere;
8. (a)
/y(y)
9.
=(a+ 2)y+t, 0
c = ../15/4n.
10. fr-x(z) =
:S y :S
1, and/y(Y) = 0 elsewhere.
X is distributed as
A1 A2
r (Z )
12 JX+Y
a +-2 Z + 1 ,
= -
2
/x+r(z) = 0 elsewhere.
0 :S z :S 1, fx + r(z)
= a+
- - 2 (2 2
= 0, z
z )41
+t
:S 0.
1 < z :S 2;
246
Answers
14.
fz(Z) =
15. (a 1
17. fR(r)
2- (1 - _z_),
b-a
b-a
= -
_!_ Joo
lbl -oo
1)/(a1 + a 2
= ' 2 e-r212
(1
2
,
0 < z
-00
< z <
00.
2).
= 0,
s 0.
(]
18. /xr(z)
oo
f(x, zfx)dx.
-oo lxl
0, z ~ 0.
= 0, z
0.
+n
- y.
27. fr(Y) = apf(y + p)+ 1 , y > 0, and /y(y) = 0, y s 0. The conditional density
of A given Y = y is the gamma density r(a + 1, p + y).
fi (
28 . fr(Y) = V2/n
- 3- y 2 e -y2; 262,
y 2:: 0, and r y)
0,
y < 0.
(]
31. /x 1.x2 .x3 (Xtt x2, x3) = 1fxtx2, 0 < x 3 < x 2 < x 1 < 1, and equals zero elsewhere.
/x3(x) = (loge x)2 /2. 0 < x < 1, and equals zero elsewhere.
1)(y - x)"- 2 0 < x s y < 1, and equals zero elsewhere;
(b) /R(r) = n(n - 1)(1 - r)r"- 2, 0 < r < 1, and zero elsewhere.
(c) Beta density with parameters k and n - k + 1.
= n(n -
34.
r(n/2),
+ b 2 ).
Answers
247
41 . fw,z(w, z) = ( z ) f ( z , wz ) .
w+1
w+1 w+1
CHAPTER 7
1. cxtf(cx 1 + rx 2 ).
2. Z will have finite expectation when cx 1 > 1 and cx2 > 0. In this case EZ
rx2 /(rx 1
1).
3. uJifn.
4. Xefe has a geometric distribution with parameter (1 - e- 1 e).
EXe
5.
6.
1/)..
+ cx2 +
m).
2) for cx 1 > 2.
12. EZ
15. (a) E I X
I=
17. EX'
= f(cx
19. EXk =
248
Answers
= 2(n -
1)/(n
+ 1)2 (n + 2).
21. p = 1/4.
27. E[X I Z
= z] = a 1 z/(a 1
28. E[ll I Y = y] = (a 1
33.
x, 1 ~ x ~ 2; and
+ + x;
{b) P(Xt
:$
36 . .9773.
38. .0415.
39. .0053.
40. (a) fx(x) ~
+ 1/2 - A.)/.J~) -
4>((x -
1/2 - A.)fv'l).
41. lfv'mr.
42. 1/v'me. Approximation (15) is not directly applicable because the greatest common
divisor of the set {x - 1 I x is a possible value of S 1 } is two rather than one.
43 . .133.
45. n ~ 6700.
44. .523.
46. 551.
CHAPTER I
1 . Mx(t) =
=F 0, and Mx(O)
= 1.
2. e11' M x(bt ).
4. (a) Mx(t) = [pf(l - e'(l - p))]", - oo < t < log (1 /(1 - p)).
5. (b) (2n)!
6. (a) dMx(t)
= npe'(pe'
dt
d2Mx(t)
dt 2
= npe'(pe' +
1 - p)"-
1 - p)"-t
10. el(e't-1).
'Px(t) =
.l.-+ 00
n(n - 1)p2e2'(pe'
<l>x(e1').
and
1 - p)"-2.
14.
P(X.l.v';.- A. ::;; x) =
~(x),
- oo <
x< oo .
Answers
249
CHAPTER 9
(c)
6. For x
850.
= y
p{ b}(y y) = 1 - - a,
~ .93,
b-a
- ---2(y - a)(b - y)
'
and
G
{a,b}
) = 2(y -
y, Y
a)(b - y) _
-a
For x < y
x-a
P<a,b}(x, y) = - y- a
and
_ 2(x - aXb - y)
G{a bl(X , Y)
'
b-a
For x > y
b-x
P{a,b}(x, y) = -b -
-y
and
_ 2(y - a)(b - x)
G{a b)(X, Y)
b- a
'
7. For x = y
P{o}(y, y) = 1 - 1/2y and G{o}(y, y) = 2y For x < y
P{o}(x, y) = x/y and G{o}(x, y) = 2x.
1.
For x > y
P{o}(x, y)
1 and G{o}(x, y)
2y.
8. For x = y
Pll'(y, y)
q - p and Gll'(y, y) =
q - P
p-q
For x < y
Pll'(x, y)
q).
For x > y
P (x, y)
0
9. P<o>(-1, -1)
(qfp)x-y.
p-q
= qandG<o>(-1, -1) =
!!.
p
For y < -1
P ( 1 y) {O} -
11. P =
;:z
'
P - q
q((qfp)Y -
11
and
00
G ( 1 y) {O} -
1
q(qfp)"
250
Answers
(nR 2p)"
k!
p)"-".
-JCR2p
P~c(r, n) =
J=O
17.
20. A.e-J..
21. IN/k) = vA."/0. + v)"+ 1 , k = 0, 1, 2, ... , and zero elsewhere.
22.
fNT(k) = -
[1 - e-la
A.a
(A.~)J]
i=O
, k
1!
2nx ( 1 - x
24. ID 1(X) = ~
72
1
)"- ,
1) jr (; +
1).
31.
(a) Pr = -1
i'
0
e-p<t-s> ds = 1 - e -pt .
Jl.t
Table I
Φ(z) = (1/√(2π)) ∫_{-∞}^{z} e^(-u²/2) du = P(Z ≤ z)

[Body of the table: Φ(z) to four decimal places for z from -3.9 to 3.9 in steps of .01. Rows are labeled by z to one decimal place and columns give the second decimal place, .00 through .09.]
Note 1: If a normal variable X is not "standard," its values must be "standardized": Z = (X - μ)/σ. That is, P(X ≤ x) = Φ((x - μ)/σ).
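Entries of Table I are easily reproduced from the error function in Python's standard library. The sketch below prints a few rows in the table's layout: the row label gives z to one decimal place, and the ten columns extend its magnitude by .00 through .09.

import math

def Phi(z):
    """The standard normal distribution function tabulated in Table I."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_row(label):
    """Ten entries of Table I for a row label such as -2.0 or 1.9."""
    sign = -1.0 if label < 0 else 1.0
    return " ".join("%.4f" % Phi(sign * (abs(label) + k / 100.0)) for k in range(10))

print(table_row(-2.0))   # 0.0228 0.0222 0.0217 ...
print(table_row(0.0))    # 0.5000 0.5040 0.5080 ...
print(table_row(1.9))    # 0.9713 0.9719 0.9726 ...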
Index
Binomial coefficients, 31
Binomial distribution, 51
application of Chebyshev's Inequality, 102
Bernoulli trials, 66
mean, 83, 89
moment generating function, 198
normal approximation, 188, 190
Poisson approximation, 69
probability generating function, 73
sums of binomial random variables, 75
variance, 97
Birthday problem, 29
Bivariate distribution, 143
normal, 172
standard normal, 144
Conditional probability, 14
involving random variables, 57
Constant random variable, 52
characteristic function, 202
Continuity Theorem, 208
Continuous random variable, 109, 113
Convolution, 146
Coupon problem, 46
Covariance, 96, 105, 176, 178
Decay, exponential, 5, 111
Deciles, 133
DeMoivre-Laplace Limit Theorem, 184
De Morgan's laws, 10
Density with respect to integration, 115
beta, 148
bivariate, 140, 143
chi-square (χ²), 164
conditional, 107, 153, 160
exponential, 119
F, 164
gamma, 129
joint, 140, 143, 157, 158
marginal, 141, 158
Maxwell, 171
normal, 125
Rayleigh, 170
symmetric, 123
t, 165
Discrete density function, 50, 54
Bernoulli, 66
binomial, 51
conditional, 107
geometric, 55
hypergeometric, 52
joint, 62
marginal, 62
multinomial, 68
negative binomial, 55
Poisson, 56
symmetric, 123
Discrete random variable, 50
Discrete random vector, 61
Distribution, 51
Distribution function, 110, 115
absolutely continuous, 115
Cauchy, 122
discrete random variable, 57-58
gamma, 130
geometric, 59
inverse, 131
joint, 139, 157
marginal, 140, 157
normal, 125
properties, 112
symmetric density, 124
transformations involving, 131
uniform, 118
F distribution, 164
Failure rate, 137
Field of sets, 7
sigma field, 7
Half-life, 133
Hypergeometric distribution, 52
mean, 90
variance, 98
Schwarz inequality, 99
Sigma field (σ-field) of subsets, 7
Simple random walk, 220
Standard bivariate normal distribution, 143-144
Standard deviation, 94, 176
t distribution, 165