Sol Manual
• This solution manual remains under construction. The current count is that 678 (out of 687)
problems have solutions. The unsolved problems are
If you volunteer a solution for one of those problems, we’ll be happy to include it . . . and, of
course, “your wildest dreams will come true.”
• Of course, the correctness of every single solution remains unconfirmed. If you find errors or
have suggestions or comments, please send email: ryates@winlab.rutgers.edu.
• If you need to make solution sets for your class, you might like the Solution Set Constructor
at the instructors site www.winlab.rutgers.edu/probsolns. If you need access, send email:
ryates@winlab.rutgers.edu.
• Matlab functions written as solutions to homework problems can be found in the archive
matsoln.zip (available to instructors) or in the directory matsoln. Other Matlab functions
used in the text or in these homework solutions can be found in the archive matcode.zip
or directory matcode. The .m files in matcode are available for download from the Wiley
website. Two other documents of interest are also available for download:
• A web-based solution set constructor for the second edition is available to instructors at
http://www.winlab.rutgers.edu/probsolns
• The next update of this solution manual is likely to occur in January, 2006.
Problem Solutions – Chapter 1
(Venn diagram of the events R, T, M, and O omitted.)
(b) Every pizza is either Regular (R), or Tuscan (T ). Hence R ∪ T = S so that R and T are
collectively exhaustive. Thus its also (trivially) true that R ∪ T ∪ M = S. That is, R, T and
M are also collectively exhaustive.
(c) From the Venn diagram, T and O are mutually exclusive. In words, this means that Tuscan
pizzas never have onions or pizzas with onions are never Tuscan. As an aside, “Tuscan” is
a fake pizza designation; one shouldn’t conclude that people from Tuscany actually dislike
onions.
(d) From the Venn diagram, M ∩ T and O are mutually exclusive. Thus Gerlanda’s doesn’t make
Tuscan pizza with mushrooms and onions.
(e) Yes. In terms of the Venn diagram, these pizzas are in the set (T ∪ M ∪ O)c .
(a) An outcome specifies whether the fax is high (h), medium (m), or low (l) speed, and whether
the fax has two (t) pages or four (f) pages. The sample space is
S = {ht, hf, mt, mf, lt, lf} . (1)
(b) The event that the fax is medium speed is A1 = {mt, mf }.
(c) The event that a fax has two pages is A2 = {ht, mt, lt}.
(d) The event that a fax is either high speed or low speed is A3 = {ht, hf, lt, lf }.
(e) Since A1 ∩ A2 = {mt} and is not empty, A1 , A2 , and A3 are not mutually exclusive.
(f) Since
A1 ∪ A2 ∪ A3 = {ht, hf, mt, mf, lt, lf } = S, (2)
the collection A1 , A2 , A3 is collectively exhaustive.
(d) Since ZF ∪ XA = {aaa, aaf, afa, aff, faf, fff} ≠ S, ZF and XA are not collectively exhaus-
tive.
D = {ffa, faf, aff, fff} . (5)
Problem 1.2.4 Solution
The sample space is
S = { 1/1, . . . , 1/31, 2/1, . . . , 2/29, 3/1, . . . , 3/31, 4/1, . . . , 4/30,
      5/1, . . . , 5/31, 6/1, . . . , 6/30, 7/1, . . . , 7/31, 8/1, . . . , 8/31,            (1)
      9/1, . . . , 9/30, 10/1, . . . , 10/31, 11/1, . . . , 11/30, 12/1, . . . , 12/31 } .
The event H that the birthday falls in July consists of the following 31 sample points.
H = {7/1, 7/2, . . . , 7/31} . (2)
2. If we need to check whether the first resistance exceeds the second resistance, an event space
is
B1 = {R1 > R2 } B2 = {R1 ≤ R2 } . (2)
3. If we need to check whether each resistance doesn’t fall below a minimum value (in this case
50 ohms for R1 and 100 ohms for R2 ), an event space is
C1 = {R1 < 50, R2 < 100} , C2 = {R1 < 50, R2 ≥ 100} , (3)
C3 = {R1 ≥ 50, R2 < 100} , C4 = {R1 ≥ 50, R2 ≥ 100} . (4)
4. If we want to check whether the resistors in parallel are within an acceptable range of 90 to
110 ohms, an event space is
D1 = {(1/R1 + 1/R2 )^{-1} < 90} , (5)
D2 = {90 ≤ (1/R1 + 1/R2 )^{-1} ≤ 110} , (6)
D3 = {(1/R1 + 1/R2 )^{-1} > 110} . (7)
Problem 1.3.1 Solution
The sample space of the experiment is
S = {LF, BF, LW, BW } . (1)
From the problem statement, we know that P [LF ] = 0.5, P [BF ] = 0.2 and P [BW ] = 0.2. This
implies P [LW ] = 1 − 0.5 − 0.2 − 0.2 = 0.1. The questions can be answered using Theorem 1.5.
The problem statement tells us that P [HF ] = 0.2, P [M W ] = 0.1 and P [F ] = 0.5. We can use
these facts to find the probabilities of the other outcomes. In particular,
P [F ] = P [HF ] + P [M F ] . (2)
This implies
P [M F ] = P [F ] − P [HF ] = 0.5 − 0.2 = 0.3. (3)
Also, since the probabilities must sum to 1,
P [HW ] = 1 − P [HF ] − P [M F ] − P [M W ] = 1 − 0.2 − 0.3 − 0.1 = 0.4. (4)
Now that we have found the probabilities of the outcomes, finding any other probability is easy.
(b) The probability that a cell phone is mobile and fast is P [M F ] = 0.3.
Problem 1.3.3 Solution
A reasonable probability model that is consistent with the notion of a shuffled deck is that each
card in the deck is equally likely to be the first card. Let Hi denote the event that the first card
drawn is the ith heart where the first heart is the ace, the second heart is the deuce and so on. In
that case, P [Hi ] = 1/52 for 1 ≤ i ≤ 13. The event H that the first card is a heart can be written
as the disjoint union
H = H1 ∪ H2 ∪ · · · ∪ H13 . (1)
Using Theorem 1.1, we have
P [H] = \sum_{i=1}^{13} P [Hi ] = 13/52. (2)
This is the answer you would expect since 13 out of 52 cards are hearts. The point to keep in
mind is that this is not just the common sense answer but is the result of a probability model for
a shuffled deck and the axioms of probability.
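As a quick numerical sanity check (our addition, not part of the original solution), the following Matlab fragment estimates P [H] by simulating the first card of many shuffled decks; the variable names are arbitrary.
% Estimate P[H]: cards 1 through 13 represent the thirteen hearts.
trials=100000;
firstcard=ceil(52*rand(trials,1));   % uniform over the 52 equally likely first cards
Phat=mean(firstcard<=13)             % should be close to 13/52=0.25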
Problem 1.4.2 Solution
(a) From the given probability distribution of billed minutes, M , the probability that a call is
billed for more than 3 minutes is
P [more than 3 minutes] = (1 − α)^3 = 0.57.
(b) The probability that a call will be billed for 9 minutes or less is
P [9 minutes or less] = \sum_{i=1}^{9} α(1 − α)^{i−1} = 1 − (1 − α)^9 = 1 − (0.57)^3 . (5)
To prove the union bound by induction, we first prove the theorem for the case of n = 2 events. In
this case, by Theorem 1.7(c),
P [A1 ∪ A2 ] = P [A1 ] + P [A2 ] − P [A1 ∩ A2 ] . (1)
By the first axiom of probability, P [A1 ∩ A2 ] ≥ 0. Thus,
P [A1 ∪ A2 ] ≤ P [A1 ] + P [A2 ] , (2)
which proves the union bound for the case n = 2. Now we make our induction hypothesis that the
union-bound holds for any collection of n − 1 subsets. In this case, given subsets A1 , . . . , An , we
define
A = A1 ∪ A2 ∪ · · · ∪ An−1 , B = An . (4)
By our induction hypothesis,
P [A1 ∪ · · · ∪ An ] = P [A ∪ B] (6)
≤ P [A] + P [B] (by the union bound for n = 2) (7)
= P [A1 ∪ · · · ∪ An−1 ] + P [An ] (8)
≤ P [A1 ] + · · · + P [An−1 ] + P [An ] (9)
(a) For convenience, let pi = P [F Hi ] and qi = P [V Hi ]. Using this shorthand, the six unknowns
p0 , p1 , p2 , q0 , q1 , q2 fill the table as
H0 H1 H2
F p0 p1 p2 . (1)
V q0 q1 q2
Other facts, such as q0 + q1 + q2 = 7/12, can be derived from these facts. Thus, we have
four equations and six unknowns, choosing p0 and p1 will specify the other unknowns. Un-
fortunately, arbitrary choices for either p0 or p1 will lead to negative values for the other
probabilities. In terms of p0 and p1 , the other unknowns are
0 ≤ p0 ≤ 1/3, (6)
0 ≤ p1 ≤ 1/3, (7)
1/12 ≤ p0 + p1 ≤ 5/12. (8)
Although there are an infinite number of solutions, three possible solutions are:
and
and
(b) In terms of the pi , qi notation, the new facts are p0 = 1/4 and q1 = 1/6. These extra facts
uniquely specify the probabilities. In this case,
The above “proof” used the property that for mutually exclusive sets A1 and A2 ,
The problem is that this property is a consequence of the three axioms, and thus must be proven.
For a proof that uses just the three axioms, let A1 be an arbitrary set and for n = 2, 3, . . ., let
An = φ. Since A1 = ∪_{i=1}^{∞} Ai , we can use Axiom 3 to write
P [A1 ] = P [∪_{i=1}^{∞} Ai ] = P [A1 ] + P [A2 ] + \sum_{i=3}^{∞} P [Ai ] . (3)
By subtracting P [A1 ] from both sides, the fact that A2 = φ permits us to write
P [φ] + \sum_{i=3}^{∞} P [Ai ] = 0. (4)
By Axiom 1, P [Ai ] ≥ 0 for all i. Thus, \sum_{i=3}^{∞} P [Ai ] ≥ 0. This implies P [φ] ≤ 0. Since Axiom 1
requires P [φ] ≥ 0, we must have P [φ] = 0.
Problem 1.4.8 Solution
Following the hint, we define the set of events {Ai | i = 1, 2, . . .} such that for i = 1, . . . , m, Ai = Bi and
for i > m, Ai = φ. By construction, ∪_{i=1}^{m} Bi = ∪_{i=1}^{∞} Ai . Axiom 3 then implies
P [∪_{i=1}^{m} Bi ] = P [∪_{i=1}^{∞} Ai ] = \sum_{i=1}^{∞} P [Ai ] . (1)
For i > m, P [Ai ] = P [φ] = 0, yielding the claim P [∪_{i=1}^{m} Bi ] = \sum_{i=1}^{m} P [Ai ] = \sum_{i=1}^{m} P [Bi ].
Note that the fact that P [φ] = 0 follows from Axioms 1 and 2. This problem is more challenging
if you just use Axiom 3. We start by observing
P [∪_{i=1}^{m} Bi ] = \sum_{i=1}^{m−1} P [Bi ] + \sum_{i=m}^{∞} P [Ai ] . (2)
Now, we use Axiom 3 again on the countably infinite sequence Am , Am+1 , . . . to write
\sum_{i=m}^{∞} P [Ai ] = P [Am ∪ Am+1 ∪ · · ·] = P [Bm ] . (3)
Thus, we have used just Axiom 3 to prove Theorem 1.4: P [∪_{i=1}^{m} Bi ] = \sum_{i=1}^{m} P [Bi ].
Thus, P [φ] = 0. Note that this proof uses only Theorem 1.4 which uses only Axiom 3.
(b) Using Theorem 1.4 with B1 = A and B2 = Ac , we have
P [S] = P [A ∪ Ac ] = P [A] + P [Ac ] . (7)
Since Axiom 2 says P [S] = 1, P [Ac ] = 1 − P [A]. This proof uses Axioms 2 and 3.
(c) By Theorem 1.2, we can write both A and B as unions of disjoint events:
A = (AB) ∪ (AB c ) B = (AB) ∪ (Ac B). (8)
Now we apply Theorem 1.4 to write
P [A] = P [AB] + P [AB c ] , P [B] = P [AB] + P [Ac B] . (9)
We can rewrite these facts as
P [AB c ] = P [A] − P [AB], P [Ac B] = P [B] − P [AB]. (10)
Note that so far we have used only Axiom 3. Finally, we observe that A ∪ B can be written
as the union of mutually exclusive events
A ∪ B = (AB) ∪ (AB c ) ∪ (Ac B). (11)
Once again, using Theorem 1.4, we have
P [A ∪ B] = P [AB] + P [AB c ] + P [Ac B] (12)
Substituting the results of Equation (10) into Equation (12) yields
P [A ∪ B] = P [AB] + P [A] − P [AB] + P [B] − P [AB] , (13)
which completes the proof. Note that this claim required only Axiom 3.
(d) Observe that since A ⊂ B, we can write B as the disjoint union B = A ∪ (Ac B). By
Theorem 1.4 (which uses Axiom 3),
P [B] = P [A] + P [Ac B] . (14)
By Axiom 1, P [Ac B] ≥ 0, which implies P [A] ≤ P [B]. This proof uses Axioms 1 and 3.
Problem 1.5.2 Solution
Let si denote the outcome that the roll is i. So, for 1 ≤ i ≤ 6, Ri = {si }. Similarly, Gj =
{sj+1 , . . . , s6 }.
(a) Since G1 = {s2 , s3 , s4 , s5 , s6 } and all outcomes have probability 1/6, P [G1 ] = 5/6. The event
R3 G1 = {s3 } and P [R3 G1 ] = 1/6 so that
P [R3 |G1 ] = P [R3 G1 ] / P [G1 ] = (1/6)/(5/6) = 1/5. (1)
(b) The conditional probability that 6 is rolled given that the roll is greater than 3 is
P [R6 |G3 ] = P [R6 G3 ] / P [G3 ] = P [s6 ] / P [s4 , s5 , s6 ] = (1/6)/(3/6) = 1/3. (2)
(c) The event E that the roll is even is E = {s2 , s4 , s6 } and has probability 3/6. The joint
probability of G3 and E is
P [G3 E] = P [s4 , s6 ] = 1/3. (3)
The conditional probability of G3 given E is
P [G3 |E] = P [G3 E] / P [E] = (1/3)/(1/2) = 2/3. (4)
(d) The conditional probability that the roll is even given that it’s greater than 3 is
P [E|G3 ] = P [EG3 ] / P [G3 ] = (1/3)/(1/2) = 2/3. (5)
Problem 1.5.5 Solution
The sample outcomes can be written ijk where the first card drawn is i, the second is j and the
third is k. The sample space is
S = {234, 243, 324, 342, 423, 432} . (1)
and each of the six outcomes has probability 1/6. The events E1 , E2 , E3 , O1 , O2 , O3 are
E1 = {234, 243, 423, 432} , O1 = {324, 342} , (2)
E2 = {243, 324, 342, 423} , O2 = {234, 432} , (3)
E3 = {234, 324, 342, 432} , O3 = {243, 423} . (4)
(a) The conditional probability the second card is even given that the first card is even is
P [E2 |E1 ] = P [E2 E1 ] / P [E1 ] = P [243, 423] / P [234, 243, 423, 432] = (2/6)/(4/6) = 1/2. (5)
(b) The conditional probability the first card is even given that the second card is even is
P [E1 |E2 ] = P [E1 E2 ] / P [E2 ] = P [243, 423] / P [243, 324, 342, 423] = (2/6)/(4/6) = 1/2. (6)
(c) The probability the first two cards are even given the third card is even is
P [E1 E2 |E3 ] = P [E1 E2 E3 ] / P [E3 ] = 0. (7)
(d) The conditional probability that the second card is even given that the first card is odd is
P [E2 |O1 ] = P [O1 E2 ] / P [O1 ] = P [O1 ] / P [O1 ] = 1. (8)
(e) The conditional probability the second card is odd given that the first card is odd is
P [O2 |O1 ] = P [O1 O2 ] / P [O1 ] = 0. (9)
(b) The conditional probability that a tick has HGE given that it has Lyme disease is
P [H|L] = P [LH] / P [L] = 0.0236/0.16 = 0.1475. (5)
• P [A] = 1 implying A = B = S.
• P [A] = 0 implying A = B = φ.
In the Venn diagram, assume the sample space has area 1 corresponding to probability 1. As
drawn, both A and B have area 1/4 so that P [A] = P [B] = 1/4. Moreover, the intersection AB
has area 1/16 and covers 1/4 of A and 1/4 of B. That is, A and B are independent since
P [AB] = P [A] P [B] . (1)
(c) Since C and D are independent,
P [C ∩ D] = P [C] P [D] = 15/64. (3)
The next few items are a little trickier. From Venn diagrams, we see
P [C ∩ Dc ] = P [C] − P [C ∩ D] = 5/8 − 15/64 = 25/64. (4)
It follows that
P [C ∪ Dc ] = P [C] + P [Dc ] − P [C ∩ Dc ] (5)
= 5/8 + (1 − 3/8) − 25/64 = 55/64. (6)
Using DeMorgan’s law, we have
P [C c ∩ Dc ] = P [(C ∪ D)c ] = 1 − P [C ∪ D] = 15/64. (7)
Problem 1.6.5 Solution
For a sample space S = {1, 2, 3, 4} with equiprobable outcomes, consider the events
Each event Ai has probability 1/2. Moreover, each pair of events is independent since
A plant has yellow seeds, that is event Y occurs, if a plant has at least one dominant y gene. Except
for the four outcomes with a pair of recessive g genes, the remaining 12 outcomes have yellow seeds.
From the above, we see that
P [Y ] = 12/16 = 3/4 (2)
and
P [R] = 12/16 = 3/4. (3)
To find the conditional probabilities P [R|Y ] and P [Y |R], we first must find P [RY ]. Note that
RY , the event that a plant has rounded yellow seeds, is the set of outcomes
RY = {rryy, rryg, rrgy, rwyy, rwyg, rwgy, wryy, wryg, wrgy} . (4)
Problem 1.6.7 Solution
(a) For any events A and B, we can write the law of total probability in the form of
P [A] = P [AB] + P [AB c ] . (1)
Since A and B are independent, P [AB] = P [A] P [B], which implies
P [AB c ] = P [A] − P [A] P [B] = P [A] (1 − P [B]) = P [A] P [B c ] , (2)
so A and B c are independent.
(b) Proving that Ac and B are independent is not really necessary. Since A and B are arbitrary
labels, it is really the same claim as in part (a). That is, simply reversing the labels of A and
B proves the claim. Alternatively, one can construct exactly the same proof as in part (a)
with the labels A and B reversed.
(c) To prove that Ac and B c are independent, we apply the result of part (a) to the sets A and
B c . Since we know from part (a) that A and B c are independent, part (b) says that Ac and
B c are independent.
In the Venn diagram, assume the sample space has area 1 corresponding to probability 1. As
drawn, A, B, and C each have area 1/2 and thus probability 1/2. Moreover, the three-way
intersection ABC has probability 1/8. Thus A, B, and C are mutually independent since
P [ABC] = P [A] P [B] P [C] . (1)
In the Venn diagram, assume the sample space has area 1 corresponding to probability 1. As
drawn, A, B, and C each have area 1/3 and thus probability 1/3. The three-way intersection
ABC has zero probability, implying A, B, and C are not mutually independent since
P [ABC] = 0 ≠ P [A] P [B] P [C] . (1)
However, AB, BC, and AC each has area 1/9. As a result, each pair of events is independent
since
P [AB] = P [A] P [B] , P [BC] = P [B] P [C] , P [AC] = P [A] P [C] . (2)
Problem 1.7.1 Solution
A sequential sample space for this experiment is
This implies
P [H1 |H2 ] = P [H1 H2 ] / P [H2 ] = (1/16)/(1/4) = 1/4. (2)
(b) The probability that the first flip is heads and the second flip is tails is P [H1 T2 ] = 3/16.
(Tree diagram omitted. Its leaves and probabilities are G1 G2 : 3/8, G1 R2 : 1/8, R1 G2 : 1/8,
R1 R2 : 3/8, obtained from P [G1 ] = P [R1 ] = 1/2, P [G2 |G1 ] = 3/4, P [R2 |G1 ] = 1/4,
P [G2 |R1 ] = 1/4, P [R2 |R1 ] = 3/4.)
The conditional probability that the first light was green given the second light was green is
P [G1 |G2 ] = P [G1 G2 ] / P [G2 ] = (3/8)/(1/2) = 3/4.
Finally, from the tree diagram, we can directly read that P [G2 |G1 ] = 3/4.
(Tree diagram omitted. Its leaves and probabilities are G1 G2 : 3/8, G1 B2 : 1/8, B1 G2 : 1/8,
B1 B2 : 3/8.)
The game goes into overtime if exactly one free throw is made. This event has probability
P [O] = P [G1 B2 ] + P [B1 G2 ] = 1/8 + 1/8 = 1/4. (1)
Problem 1.7.6 Solution
Let Ai and Di indicate whether the ith photodetector is acceptable or defective.
(a) We wish to find the probability P [E1 ] that exactly one photodetector is acceptable. From
the tree, we have
(b) The probability that both photodetectors are defective is P [D1 D2 ] = 6/25.
The probability of H1 is
Similarly,
Thus P [H1 H2 ] = P [H1 ]P [H2 ], implying H1 and H2 are not independent. This result should not
be surprising since if the first flip is heads, it is likely that coin B was picked first. In this case, the
second flip is less likely to be heads since it becomes more likely that the second coin flipped was
coin A.
Problem 1.7.8 Solution
(a) The primary difficulty in this problem is translating the words into the correct tree diagram.
The tree for this problem is shown below.
Problem 1.7.9 Solution
(a) We wish to know what the probability that we find no good photodiodes in n pairs of diodes.
Testing each pair of diodes is an independent trial such that with probability p, both diodes
of a pair are bad. From Problem 1.7.6, we can easily calculate p.
The probability of Zn , the event of zero acceptable diodes out of n pairs of diodes, is p^n
because on each test of a pair of diodes, both must be defective. That is,
P [Zn ] = \prod_{i=1}^{n} p = p^n = (6/25)^n . (2)
(b) Another way to phrase this question is to ask how many pairs must we test until P [Zn ] ≤ 0.01.
Since P [Zn ] = (6/25)^n , we require
(6/25)^n ≤ 0.01 ⇒ n ≥ ln 0.01 / ln(6/25) = 3.23. (3)
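As a check (our addition), one Matlab line reproduces this threshold and rounds it up to a whole number of pairs:
n=ceil(log(0.01)/log(6/25))    % yields n=4, so at least 4 pairs must be tested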
From the tree, P [C1 ] = p and P [C2 ] = (1 − p)p. Finally, a fish is caught on the nth cast if no fish
were caught on the previous n − 1 casts. Thus,
P [Cn ] = (1 − p)^{n−1} p.
Problem 1.8.3 Solution
(a) The experiment of picking two cards and recording them in the order in which they were
selected can be modeled by two sub-experiments. The first is to pick the first card and
record it, the second sub-experiment is to pick the second card without replacing the first
and recording it. For the first sub-experiment we can have any one of the possible 52 cards
for a total of 52 possibilities. The second experiment consists of all the cards minus the one
that was picked first(because we are sampling without replacement) for a total of 51 possible
outcomes. So the total number of outcomes is the product of the number of outcomes for
each sub-experiment.
52 · 51 = 2652 outcomes. (1)
(b) To have the same card but different suit we can make the following sub-experiments. First
we need to pick one of the 52 cards. Then we need to pick one of the 3 remaining cards that
are of the same type but different suit out of the remaining 51 cards. So the total number of
outcomes is
52 · 3 = 156 outcomes. (2)
(c) The probability that the two cards are of the same type but different suit is the number of
outcomes that are of the same type but different suit divided by the total number of outcomes
involved in picking two cards at random from a deck of 52 cards.
P [same type, different suit] = 156/2652 = 1/17. (3)
(d) Now we are not concerned with the ordering of the cards. So before, the outcomes (K♥, 8♦)
and (8♦, K♥) were distinct. Now, those two outcomes are not distinct and are only considered
to be the single outcome that a King of hearts and 8 of diamonds were selected. So every
pair of outcomes before collapses to a single outcome when we disregard ordering. So we can
redo parts (a) and (b) above by halving the corresponding values found in parts (a) and (b).
The probability however, does not change because both the numerator and the denominator
have been reduced by an equal factor of 2, which does not change their ratio.
3. Of the remaining 14 field players, choose 8 for the remaining field positions. There are
N3 = \binom{14}{8} ways to do this.
4. For the 9 batters (consisting of the 8 field players and the designated hitter), choose a batting
lineup. There are N4 = 9! ways to do this.
So the total number of different starting lineups when the DH is selected among the field players is
N = N1 N2 N3 N4 = (10)(15) \binom{14}{8} 9! = 163,459,296,000. (1)
Note that this overestimates the number of combinations the manager must really consider because
most field players can play only one or two positions. Although these constraints on the manager
reduce the number of possible lineups, it typically makes the manager’s job more difficult. As
for the counting, we note that our count did not need to specify the positions played by the field
players. Although this is an important consideration for the manager, it is not part of our counting
of different lineups. In fact, the 8 nonpitching field players are allowed to switch positions at any
time in the field. For example, the shortstop and second baseman could trade positions in the
middle of an inning. Although the DH can go play the field, there are some complicated rules
about this. Here is an excerpt from Major League Baseball Rule 6.10:
The Designated Hitter may be used defensively, continuing to bat in the same posi-
tion in the batting order, but the pitcher must then bat in the place of the substituted
defensive player, unless more than one substitution is made, and the manager then must
designate their spots in the batting order.
If you’re curious, you can find the complete rule on the web.
So the total number of different starting lineups when the DH is selected among the field
players is
N = N1 N2 N3 N4 = (10)(15) \binom{14}{8} 9! = 163,459,296,000. (1)
• The DH is a pitcher. In this case, there are 10 choices for the pitcher, 10 choices for the
DH among the pitchers (including the pitcher batting for himself), \binom{15}{8} choices for the field
players, and 9! ways of ordering the batters into a lineup. The number of possible lineups is
N = (10)(10) \binom{15}{8} 9! = 233,513,280,000. (2)
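Both counts are easy to verify numerically. The following Matlab lines (our own check, not part of the original solution) evaluate the two expressions:
Nfield=10*15*nchoosek(14,8)*factorial(9)    % 163,459,296,000 lineups (DH is a field player)
Npitch=10*10*nchoosek(15,8)*factorial(9)    % 233,513,280,000 lineups (DH is a pitcher)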
Problem 1.8.6 Solution
(a) We can find the number of valid starting lineups by noticing that the swingman presents
three situations: (1) the swingman plays guard, (2) the swingman plays forward, and (3) the
swingman doesn’t play. The first situation is when the swingman can be chosen to play the
guard position, and the second where the swingman can only be chosen to play the forward
position. Let Ni denote the number of lineups corresponding to case i. Then we can write
the total number of lineups as N1 + N2 + N3 . In the first situation, we have to choose 1 out
of 3 centers, 2 out of 4 forwards, and 1 out of 4 guards so that
N1 = \binom{3}{1} \binom{4}{2} \binom{4}{1} = 72. (1)
In the second case, we need to choose 1 out of 3 centers, 1 out of 4 forwards and 2 out of 4
guards, yielding
N2 = \binom{3}{1} \binom{4}{1} \binom{4}{2} = 72. (2)
Finally, with the swingman on the bench, we choose 1 out of 3 centers, 2 out of 4 forwards,
and 2 out of 4 guards. This implies
N3 = \binom{3}{1} \binom{4}{2} \binom{4}{2} = 108, (3)
so the total number of valid starting lineups is N1 + N2 + N3 = 72 + 72 + 108 = 252. A quick
numerical check of these counts appears below.
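The following Matlab lines (our addition) verify these counts:
N1=nchoosek(3,1)*nchoosek(4,2)*nchoosek(4,1);    % swingman at guard: 72
N2=nchoosek(3,1)*nchoosek(4,1)*nchoosek(4,2);    % swingman at forward: 72
N3=nchoosek(3,1)*nchoosek(4,2)*nchoosek(4,2);    % swingman on the bench: 108
N=N1+N2+N3                                       % 252 lineups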
n 9 11 14 17
k 0 1 2 3 (2)
p 0.0079 0.012 0.0105 0.0090
(a) Since the probability of a zero is 0.8, we can express the probability of the code word 00111
as two occurrences of a 0 and three occurrences of a 1. Therefore
P [00111] = (0.8)^2 (0.2)^3 = 0.00512. (1)
(b) The probability that a code word has exactly three 1’s is
P [three 1’s] = \binom{5}{3} (0.8)^2 (0.2)^3 = 0.0512. (2)
The probability of each of these events is less than 1 in 1000! Given that these events took place
in the relatively short fifty year history of the NBA, it should seem that these probabilities should
be much higher. What the model overlooks is that the sequence of 10 titles in 11 years started
when Bill Russell joined the Celtics. In the years with Russell (and a strong supporting cast) the
probability of a championship was much higher.
The probability that the number of green lights equals the number of red lights is
P [G = R] = P [G = 1, R = 1, Y = 3] + P [G = 2, R = 2, Y = 1] + P [G = 0, R = 0, Y = 5] (2)
= (5!/(1!1!3!)) (7/16)(7/16)(1/8)^3 + (5!/(2!2!1!)) (7/16)^2 (7/16)^2 (1/8) + (5!/(0!0!5!)) (1/8)^5 (3)
≈ 0.1449. (4)
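The multinomial sum is easy to evaluate numerically. The Matlab lines below (our own check, with the light probabilities 7/16, 7/16, 1/8 taken from the expression above) reproduce the 0.1449 figure:
pG=7/16; pR=7/16; pY=1/8;
P=factorial(5)/(factorial(1)*factorial(1)*factorial(3))*pG*pR*pY^3 ...
 +factorial(5)/(factorial(2)*factorial(2)*factorial(1))*pG^2*pR^2*pY ...
 +factorial(5)/(factorial(0)*factorial(0)*factorial(5))*pY^5        % approximately 0.1449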
(Tree diagram omitted. Its leaves and probabilities are W1 W2 : p(1 − p), W1 L2 W3 : p^3 ,
W1 L2 L3 : p^2 (1 − p), L1 W2 W3 : p(1 − p)^2 , L1 W2 L3 : (1 − p)^3 , L1 L2 : p(1 − p).)
The probability that the team with the home court advantage wins is
P [H] = P [W1 W2 ] + P [W1 L2 W3 ] + P [L1 W2 W3 ] (1)
= p(1 − p) + p^3 + p(1 − p)^2 . (2)
Note that P [H] ≤ p for 1/2 ≤ p ≤ 1. Since the team with the home court advantage would win
a 1 game playoff with probability p, the home court team is less likely to win a three game series
than a 1 game playoff!
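To see the inequality P [H] ≤ p numerically, one can tabulate both quantities over 1/2 ≤ p ≤ 1 (our own illustration, not part of the original solution):
p=0.5:0.1:1;
PH=p.*(1-p)+p.^3+p.*(1-p).^2;    % series win probability from Equation (2)
[p; PH]                          % in each column, P[H] never exceeds p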
(a) There are 3 group 1 kickers and 6 group 2 kickers. Using Gi to denote that a group i kicker
was chosen, we have
P [G1 ] = 1/3 P [G2 ] = 2/3. (1)
In addition, the problem statement tells us that
P [K|G1 ] = 1/2 P [K|G2 ] = 1/3. (2)
Combining these facts using the Law of Total Probability yields
P [K] = P [K|G1 ] P [G1 ] + P [K|G2 ] P [G2 ] (3)
= (1/2)(1/3) + (1/3)(2/3) = 7/18. (4)
(b) To solve this part, we need to identify the groups from which the first and second kicker were
chosen. Let ci indicate whether a kicker was chosen from group i and let Cij indicate that
the first kicker was chosen from group i and the second kicker from group j. The experiment
to choose the kickers is described by the sample tree:
Since a kicker from group 1 makes a kick with probability 1/2 while a kicker from group 2
makes a kick with probability 1/3,
P [K1 K2 |C11 ] = (1/2)2 P [K1 K2 |C12 ] = (1/2)(1/3) (5)
P [K1 K2 |C21 ] = (1/3)(1/2) P [K1 K2 |C22 ] = (1/3)2 (6)
By the law of total probability,
Note that 15/96 and (7/18)2 are close but not exactly the same. The reason K1 and K2 are
dependent is that if the first kicker is successful, then it is more likely that kicker is from
group 1. This makes it more likely that the second kicker is from group 2 and is thus more
likely to miss.
(c) Once a kicker is chosen, each of the 10 field goals is an independent trial. If the kicker is
from group 1, then the success probability is 1/2. If the kicker is from group 2, the success
probability is 1/3. Out of 10 kicks, there are 5 misses iff there are 5 successful kicks. Given
the type of kicker chosen, the probability of 5 misses is
P [M |G1 ] = \binom{10}{5} (1/2)^5 (1/2)^5 , P [M |G2 ] = \binom{10}{5} (1/3)^5 (2/3)^5 . (15)
(Diagram omitted: components W1 , W2 , W3 in series, that branch in parallel with W4 , and the
combination in series with the parallel pair W5 , W6 .)
To find the probability that the device works, we replace series devices 1, 2, and 3, and parallel
devices 5 and 6 each with a single device labeled with the probability that it works. In particular,
(Reduced diagram omitted: a device that works with probability (1 − q)^3 in parallel with one that
works with probability 1 − q, in series with a device that works with probability 1 − q^2 .)
The probability P [W ] that the two devices in parallel work is 1 minus the probability that neither
works:
P [W ] = 1 − q(1 − (1 − q)^3 ). (3)
Finally, for the device to work, both composite devices in series must work. Thus, the probability
the device works is
P [W ] = [1 − q(1 − (1 − q)^3 )][1 − q^2 ]. (4)
Note that if a 0 is transmitted, then 0 is sent five times and we call decoding a 0 a success.
You should convince yourself that this is a symmetric situation with the same deletion and error
probabilities. Introducing deletions reduces the probability of an error by roughly a factor of 20.
However, the probability of successful decoding is also reduced.
Problem 1.10.3 Solution
Note that each digit 0 through 9 is mapped to the 4 bit binary representation of the digit. That is,
0 corresponds to 0000, 1 to 0001, up to 9 which corresponds to 1001. Of course, the 4 bit binary
numbers corresponding to numbers 10 through 15 go unused, however this is unimportant to our
problem. The 10-digit number results in the transmission of 40 bits. For each bit, an independent
trial determines whether the bit was correct, a deletion, or an error. In Problem 1.10.2, we found
the probabilities of these events to be
P [C] = γ = 0.91854, P [D] = δ = 0.081, P [E] = ε = 0.00046. (1)
Since each of the 40 bit transmissions is an independent trial, the joint probability of c correct bits,
d deletions, and e erasures has the multinomial probability
P [C = c, D = d, E = e] = { (40!/(c!d!e!)) γ^c δ^d ε^e    c + d + e = 40; c, d, e ≥ 0,
                          { 0                             otherwise.                     (2)
(Diagram omitted: components W1 , W2 , W3 in series, that branch in parallel with W4 , and the
combination in series with the parallel pair W5 , W6 .)
By symmetry, note that the reliability of the system is the same whether we replace component 1,
component 2, or component 3. Similarly, the reliability is the same whether we replace component
5 or component 6. Thus we consider the following cases:
I Replace component 1. In this case,
P [W1 W2 W3 ] = (1 − q/2)(1 − q)^2 , P [W4 ] = 1 − q, P [W5 ∪ W6 ] = 1 − q^2 . (1)
This implies
P [W1 W2 W3 ∪ W4 ] = 1 − (1 − P [W1 W2 W3 ])(1 − P [W4 ]) = 1 − (q^2 /2)(5 − 4q + q^2 ). (2)
In this case, the probability the system works is
P [WI ] = P [W1 W2 W3 ∪ W4 ] P [W5 ∪ W6 ] = [1 − (q^2 /2)(5 − 4q + q^2 )](1 − q^2 ). (3)
III Replace component 5. In this case,
P [W1 W2 W3 ] = (1 − q)^3 , P [W4 ] = 1 − q, P [W5 ∪ W6 ] = 1 − q^2 /2. (7)
This implies
P [W1 W2 W3 ∪ W4 ] = 1 − (1 − P [W1 W2 W3 ])(1 − P [W4 ]) = (1 − q)[1 + q(1 − q)^2 ]. (8)
From these expressions, it's hard to tell which substitution creates the most reliable circuit. First,
we observe that P [WII ] > P [WI ] if and only if
1 − q/2 + (q/2)(1 − q)^3 > 1 − (q^2 /2)(5 − 4q + q^2 ). (11)
Some algebra will show that P [WII ] > P [WI ] if and only if q^2 < 2, which occurs for all nontrivial
(i.e., nonzero) values of q. Similar algebra will show that P [WII ] > P [WIII ] for all values of
0 ≤ q ≤ 1. Thus the best policy is to replace component 4.
Keep in mind that 50*rand(200,1) produces a 200 × 1 vector of random numbers, each in the
interval (0, 50). Applying the ceiling function converts these random numbers to random integers in
the set {1, 2, . . . , 50}. Finally, we add 50 to produce random numbers between 51 and 100.
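The code being described is not reproduced above; a minimal sketch consistent with the description (our reconstruction, with an arbitrary variable name) is
X=50+ceil(50*rand(200,1));    % 200 random integers, each uniform on {51,...,100}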
function [C,H]=twocoin(n);
C=ceil(2*rand(n,1));
P=1-(C/4);
H=(rand(n,1)< P);
The first line produces the n × 1 vector C such that C(i) indicates whether coin 1 or coin 2 is chosen
for trial i. Next, we generate the vector P such that P(i)=0.75 if C(i)=1; otherwise, if C(i)=2,
then P(i)=0.5. As a result, H(i) is the simulated result of a coin flip with heads, corresponding
to H(i)=1, occurring with probability P(i).
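For example, a short test run (our addition; H is random, so the observed fractions of heads only approximate their expected values) is
[C,H]=twocoin(1000);
mean(H(C==1))    % near 0.75, the heads probability of coin 1
mean(H(C==2))    % near 0.50, the heads probability of coin 2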
function C=bit100(n);
% n is the number of 100 bit packets sent
B=floor(2*rand(n,100));
P=0.03-0.02*B;
E=(rand(n,100)< P);
C=sum((sum(E,2)<=5));
First, B is an n × 100 matrix such that B(i,j) indicates whether bit j of packet i is zero or one.
Next, we generate the n×100 matrix P such that P(i,j)=0.03 if B(i,j)=0; otherwise, if B(i,j)=1,
then P(i,j)=0.01. As a result, E(i,j) is the simulated error indicator for bit j of packet i. That
is, E(i,j)=1 if bit j of packet i is in error; otherwise E(i,j)=0. Next we sum across each row of
E to obtain the number of errors in each packet. Finally, we count the number of packets with 5 or
fewer errors.
For n = 100 packets, the estimate of the packet success probability is inconclusive. Experimentation
will show that C=97, C=98, C=99 and C=100 correct packets are typical values that might be observed. By
increasing n, more consistent results are obtained. For example, repeated trials with n = 100, 000
packets typically produces around C = 98, 400 correct packets. Thus 0.984 is a reasonable estimate
for the probability of a packet being transmitted correctly.
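As a sanity check (our addition), the exact probability that a packet is decoded correctly can be computed directly: since 0 and 1 bits are equally likely, each bit is in error with marginal probability (0.03+0.01)/2=0.02, independently of the other bits, so the number of errors in a packet is binomial (100, 0.02).
p=0.02; Pok=0;
for k=0:5, Pok=Pok+nchoosek(100,k)*p^k*(1-p)^(100-k); end
Pok    % approximately 0.984, consistent with the simulation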
function N=reliable6(n,q);
% n is the number of 6 component devices
%N is the number of working devices
W=rand(n,6)>q;
D=(W(:,1)&W(:,2)&W(:,3))|W(:,4);
D=D&(W(:,5)|W(:,6));
N=sum(D);
The n×6 matrix W is a logical matrix such that W(i,j)=1 if component j of device i works properly.
Because W is a logical matrix, we can use the Matlab logical operators | and & to implement the
logic requirements for a working device. By applying these logical operators to the n × 1 columns
of W, we simulate the test of n circuits. Note that D(i)=1 if device i works. Otherwise, D(i)=0.
Lastly, we count the number N of working devices. The following code snippet produces ten sample
runs, where each sample run tests n=100 devices for q = 0.2.
>> for n=1:10, w(n)=reliable6(100,0.2); end
>> w
w =
82 87 87 92 91 85 85 83 90 89
>>
As we see, the number of working devices is typically around 85 out of 100. Solving Problem 1.10.1
will show that the probability the device works is actually 0.8663.
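The 0.8663 figure follows from Equation (4) of Problem 1.10.1; a one-line Matlab evaluation (our addition) is
q=0.2; Pw=(1-q*(1-(1-q)^3))*(1-q^2)    % 0.8663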
function n=countequal(x,y)
%Usage: n=countequal(x,y)
%n(j)= # elements of x = y(j)
[MX,MY]=ndgrid(x,y);
%each column of MX = x
%each row of MY = y
n=(sum((MX==MY),1))’;
The code for countequal is quite short (just two lines excluding comments) but needs some explanation.
The key is in the operation
[MX,MY]=ndgrid(x,y).
The Matlab built-in function ndgrid facilitates plotting a function g(x, y) as a surface over the
x, y plane. The x, y plane is represented by a grid of all pairs of points x(i), y(j). When x has n
elements, and y has m elements, ndgrid(x,y) creates a grid (an n × m array) of all possible pairs
[x(i) y(j)]. This grid is represented by two separate n × m matrices: MX and MY which indicate
the x and y values at each grid point. Mathematically, MX(i,j) = x(i) and MY(i,j) = y(j).
Next, C=(MX==MY) is an n × m array such that C(i,j)=1 if x(i)=y(j); otherwise C(i,j)=0. That
is, the jth column of C indicates which elements of x equal y(j). Lastly, we sum along
each column j to count the number of elements of x equal to y(j). That is, we sum along column j to
count the number of occurrences (in x) of y(j).
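A small usage example (our addition) illustrates the output:
x=[1 2 2 3]; y=[2 3 4];
n=countequal(x,y)    % returns [2;1;0]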
function N=ultrareliable6(n,q);
% n is the number of 6 component devices
%N is the number of working devices
for r=1:6,
W=rand(n,6)>q;
R=rand(n,1)>(q/2);
W(:,r)=R;
D=(W(:,1)&W(:,2)&W(:,3))|W(:,4);
D=D&(W(:,5)|W(:,6));
N(r)=sum(D);
end
The above code is based on the code for the solution of Problem 1.11.4. The n × 6 matrix W is a
logical matrix such that W(i,j)=1 if component j of device i works properly. Because W is a logical
matrix, we can use the Matlab logical operators | and & to implement the logic requirements for
a working device. By applying these logical operators to the n × 1 columns of W, we simulate the
test of n circuits. Note that D(i)=1 if device i works. Otherwise, D(i)=0. Note that in the code,
we first generate the matrix W such that each component has failure probability q. To simulate the
replacement of the jth device by the ultrareliable version, we replace the jth column of W by the
column vector R, in which a device has failure probability q/2. Lastly, for each column replacement,
we count the number N of working devices. A sample run for n = 100 trials and q = 0.2 yielded
these results:
>> ultrareliable6(100,0.2)
ans =
93 89 91 92 90 93
From the above, we see, for example, that replacing the third component with an ultrareliable
component resulted in 91 working devices. The results are fairly inconclusive in that replacing
devices 1, 2, or 3 should yield the same probability of device failure. If we experiment with
n = 10, 000 runs, the results are more definitive:
>> ultrareliable6(10000,0.2)
ans =
8738 8762 8806 9135 8800 8796
>> ultrareliable6(10000,0.2)
ans =
8771 8795 8806 9178 8886 8875
>>
In both cases, it is clear that replacing component 4 maximizes the device reliability. The somewhat
complicated solution of Problem 1.10.4 will confirm this observation.
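This observation agrees with the exact reliabilities. The Matlab lines below (our own check, built from the expressions derived in Problem 1.10.4 with q = 0.2) evaluate the probability that the device works for each replacement:
q=0.2;
PI  =(1-(q^2/2)*(5-4*q+q^2))*(1-q^2)    % replace component 1, 2 or 3: about 0.879
PII =(1-q/2+(q/2)*(1-q)^3)*(1-q^2)      % replace component 4: about 0.913
PIII=(1-q)*(1+q*(1-q)^2)*(1-q^2/2)      % replace component 5 or 6: about 0.884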
Problem Solutions – Chapter 2
(a) We wish to find the value of c that makes the PMF sum up to one.
PN (n) = { c(1/2)^n    n = 0, 1, 2
         { 0           otherwise         (1)
Therefore, \sum_{n=0}^{2} PN (n) = c + c/2 + c/4 = 1, implying c = 4/7.
(b) The probability that N ≤ 1 is
P [N ≤ 1] = P [N = 0] + P [N = 1] = 4/7 + 2/7 = 6/7 (2)
Hence c = 1/30.
(b) Let U = {u^2 | u = 1, 2, . . .} so that
P [V ∈ U ] = PV (1) + PV (4) = 1/30 + 4^2 /30 = 17/30 (2)
(c) The probability that V is even is
P [V is even] = PV (2) + PV (4) = 2^2 /30 + 4^2 /30 = 2/3 (3)
(d) The probability that V > 2 is
P [V > 2] = PV (3) + PV (4) = 3^2 /30 + 4^2 /30 = 5/6 (4)
Problem 2.2.4 Solution
Thus c = 8/7.
(b)
P [X = 4] = PX (4) = 8/(7 · 4) = 2/7 (2)
(c)
P [X < 4] = PX (2) = 8/(7 · 2) = 4/7 (3)
(d)
P [3 ≤ X ≤ 9] = PX (4) + PX (8) = 8/(7 · 4) + 8/(7 · 8) = 3/7 (4)
(Tree diagram omitted. Its branches are B (probability 1 − p) giving Y = 0; G then B giving
Y = 1; and G then G giving Y = 2.)
Problem 2.2.8 Solution
From the problem statement, a single is twice as likely as a double, which is twice as likely as a
triple, which is twice as likely as a home-run. If p is the probability of a home run, then
PB (4) = p PB (3) = 2p PB (2) = 4p PB (1) = 8p (1)
Since a hit of any kind occurs with probability of .300, p + 2p + 4p + 8p = 0.300 which implies
p = 0.02. Hence, the PMF of B is
PB (b) = { 0.70    b = 0
         { 0.16    b = 1
         { 0.08    b = 2
         { 0.04    b = 3           (2)
         { 0.02    b = 4
         { 0       otherwise
(a) In the setup of a mobile call, the phone will send the “SETUP” message up to six times.
Each time the setup message is sent, we have a Bernoulli trial with success probability p. Of
course, the phone stops trying as soon as there is a success. Using r to denote a successful
response, and n a non-response, the sample tree is
(b) We can write the PMF of K, the number of “SETUP” messages sent as
PK (k) = { (1 − p)^{k−1} p                        k = 1, 2, . . . , 5
         { (1 − p)^5 p + (1 − p)^6 = (1 − p)^5     k = 6                   (1)
         { 0                                       otherwise
Note that the expression for PK (6) is different because K = 6 if either there was a success or
a failure on the sixth attempt. In fact, K = 6 whenever there were failures on the first five
attempts which is why PK (6) simplifies to (1 − p)5 .
(c) Let B denote the event that a busy signal is given after six failed setup attempts. The
probability of six consecutive failures is P [B] = (1 − p)6 .
(d) To be sure that P [B] ≤ 0.02, we need p ≥ 1 − (0.02)^{1/6} = 0.479.
(a) If it is indeed true that Y , the number of yellow M&M’s in a package, is uniformly distributed
between 5 and 15, then the PMF of Y , is
PY (y) = { 1/11    y = 5, 6, 7, . . . , 15
         { 0       otherwise                  (1)
(b)
P [Y < 10] = PY (5) + PY (6) + · · · + PY (9) = 5/11 (2)
(c)
P [Y > 12] = PY (13) + PY (14) + PY (15) = 3/11 (3)
(d)
P [8 ≤ Y ≤ 12] = PY (8) + PY (9) + · · · + PY (12) = 5/11 (4)
(a) Each paging attempt is an independent Bernoulli trial with success probability p. The number
of times K that the pager receives a message is the number of successes in n Bernoulli trials
and has the binomial PMF
PK (k) = { \binom{n}{k} p^k (1 − p)^{n−k}    k = 0, 1, . . . , n
         { 0                                 otherwise              (1)
(b) Let R denote the event that the paging message was received at least once. The event R has
probability
P [R] = P [K > 0] = 1 − P [K = 0] = 1 − (1 − p)^n (2)
To ensure that P [R] ≥ 0.95 requires that n ≥ ln(0.05)/ ln(1 − p). For p = 0.8, we must have
n ≥ 1.86. Thus, n = 2 pages would be necessary.
(a) Let X be the number of times the frisbee is thrown until the dog catches it and runs away.
Each throw of the frisbee can be viewed as a Bernoulli trial in which a success occurs if the
dog catches the frisbee and runs away. Thus, the experiment ends on the first success and X
has the geometric PMF
PX (x) = { (1 − p)^{x−1} p    x = 1, 2, . . .
         { 0                  otherwise          (1)
(b) The child will throw the frisbee more than four times iff there are failures on the first 4
trials which has probability (1 − p)^4 . If p = 0.2, the probability of more than four throws is
(0.8)^4 = 0.4096.
Problem 2.3.5 Solution
Each paging attempt is a Bernoulli trial with success probability p where a success occurs if the
pager receives the paging message.
(a) The paging message is sent again and again until a success occurs. Hence the number of
paging messages is N = n if there are n − 1 paging failures followed by a paging success. That
is, N has the geometric PMF
PN (n) = { (1 − p)^{n−1} p    n = 1, 2, . . .
         { 0                  otherwise          (1)
(b) The probability that no more than three paging attempts are required is
P [N ≤ 3] = 1 − P [N > 3] = 1 − \sum_{n=4}^{∞} PN (n) = 1 − (1 − p)^3 (2)
This answer can be obtained without calculation since N > 3 if the first three paging attempts
fail and that event occurs with probability (1 − p)^3 . Hence, we must choose p to satisfy
1 − (1 − p)^3 ≥ 0.95 or (1 − p)^3 ≤ 0.05. This implies p ≥ 1 − (0.05)^{1/3} ≈ 0.6316.
P [B > 500,000] = 1 − \sum_{b=1}^{500,000} PB (b) (1)
= 1 − p \sum_{b=1}^{500,000} (1 − p)^{b−1} (2)
Math Fact B.4 implies that (1 − x) \sum_{b=1}^{500,000} x^{b−1} = 1 − x^{500,000} . Substituting x = 1 − p, we obtain:
P [B > 500,000] = (1 − p)^{500,000} . (3)
(a) From the definition of the Poisson PMF, the PMF of B, the number of buses in T minutes,
is
PB (b) = { (T /5)^b e^{−T /5} /b!    b = 0, 1, . . .
         { 0                         otherwise         (1)
(b) Choosing T = 2 minutes, the probability that three buses arrive in a two minute interval is
PB (3) = (2/5)^3 e^{−2/5} /3! ≈ 0.0072. (2)
(c) By choosing T = 10 minutes, the probability of zero buses arriving in a ten minute interval is
PB (0) = e^{−10/5} /0! = e^{−2} ≈ 0.135. (3)
(a) If each message is transmitted 8 times and the probability of a successful transmission is p,
then the PMF of N , the number of successful transmissions has the binomial PMF
PN (n) = { \binom{8}{n} p^n (1 − p)^{8−n}    n = 0, 1, . . . , 8
         { 0                                 otherwise              (1)
(b) The indicator random variable I equals zero if and only if N = 8. Hence,
P [I = 0] = P [N = 8] = 1 − P [I = 1] (2)
Problem 2.3.10 Solution
(a) We can view whether each caller knows the birthdate as a Bernoulli trial. As a result, L is
the number of trials needed for 6 successes. That is, L has a Pascal PMF with parameters
p = 0.75 and k = 6 as defined by Definition 2.8. In particular,
PL (l) = { \binom{l−1}{5} (0.75)^6 (0.25)^{l−6}    l = 6, 7, . . .
         { 0                                       otherwise           (1)
(c) The probability that the station will need nine or more calls to find a winner is
P [L ≥ 9] = 1 − P [L < 9] (3)
= 1 − PL (6) − PL (7) − PL (8) (4)
= 1 − (0.75)^6 [1 + 6(0.25) + 21(0.25)^2 ] ≈ 0.321 (5)
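A quick numerical check of this value (our addition), summing the Pascal PMF above over l = 6, 7, 8:
P=0;
for l=6:8, P=P+nchoosek(l-1,5)*0.75^6*0.25^(l-6); end
1-P    % approximately 0.321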
(a) Since each day is independent of any other day, P [W33 ] is just the probability that a winning
lottery ticket was bought. Similarly, P [L87 ] and P [N99 ] are just the probabilities that a
losing ticket was bought and that no ticket was bought on a single day, respectively. Therefore
(b) Suppose we say a success occurs on the kth trial if on day k we buy a ticket. Otherwise, a
failure occurs. The probability of success is simply 1/2. The random variable K is just the
number of trials until the first success and has the geometric PMF
PK (k) = { (1/2)(1/2)^{k−1} = (1/2)^k    k = 1, 2, . . .
         { 0                             otherwise          (2)
(c) The probability that you decide to buy a ticket and it is a losing ticket is (1−p)/2, independent
of any other day. If we view buying a losing ticket as a Bernoulli success, R, the number of
losing lottery tickets bought in m days, has the binomial PMF
PR (r) = { \binom{m}{r} [(1 − p)/2]^r [(1 + p)/2]^{m−r}    r = 0, 1, . . . , m
         { 0                                               otherwise              (3)
(d) Letting D be the day on which the j-th losing ticket is bought, we can find the probability
that D = d by noting that j − 1 losing tickets must have been purchased in the d − 1 previous
days. Therefore D has the Pascal PMF
j−1 [(1 − p)/2] [(1 + p)/2]
d−1 j d−j d = j, j + 1, . . .
PD (d) = (4)
0 otherwise
(a) Let Sn denote the event that the Sixers win the series in n games. Similarly, Cn is the event
that the Celtics win the series in n games. The Sixers win the series in 3 games if they win three straight,
which occurs with probability
P [S3 ] = (1/2)3 = 1/8 (1)
The Sixers win the series in 4 games if they win two out of the first three games and they
win the fourth game so that
P [S4 ] = \binom{3}{2} (1/2)^3 (1/2) = 3/16 (2)
The Sixers win the series in five games if they win two out of the first four games and then
win game five. Hence,
P [S5 ] = \binom{4}{2} (1/2)^4 (1/2) = 3/16 (3)
By symmetry, P [Cn ] = P [Sn ]. Further we observe that the series lasts n games if either the
Sixers or the Celtics win the series in n games. Thus,
P [N = n] = P [Sn ] + P [Cn ] = 2P [Sn ] . (4)
Consequently, the total number of games, N , played in a best of 5 series between the Celtics
and the Sixers can be described by the PMF
PN (n) = { 2(1/2)^3 = 1/4                 n = 3
         { 2 \binom{3}{2} (1/2)^4 = 3/8    n = 4
         { 2 \binom{4}{2} (1/2)^5 = 3/8    n = 5           (5)
         { 0                               otherwise
(b) For the total number of Celtic wins W , we note that if the Celtics get w < 3 wins, then the
Sixers won the series in 3 + w games. Also, the Celtics win 3 games if they win the series in
3,4, or 5 games. Mathematically,
P [W = w] = { P [S3+w ]                      w = 0, 1, 2
            { P [C3 ] + P [C4 ] + P [C5 ]    w = 3            (6)
Thus, the number of wins by the Celtics, W , has the PMF shown below.
PW (w) = { P [S3 ] = 1/8                  w = 0
         { P [S4 ] = 3/16                 w = 1
         { P [S5 ] = 3/16                 w = 2            (7)
         { 1/8 + 3/16 + 3/16 = 1/2        w = 3
         { 0                              otherwise
(c) The number of Celtic losses L equals the number of Sixers’ wins WS . This implies PL (l) =
PWS (l). Since either team is equally likely to win any game, by symmetry, PWS (w) = PW (w).
This implies PL (l) = PWS (l) = PW (l). The complete expression for the PMF of L is
PL (l) = PW (l) = { 1/8     l = 0
                 { 3/16    l = 1
                 { 3/16    l = 2            (8)
                 { 1/2     l = 3
                 { 0       otherwise
\sum_{k=0}^{n} PK (k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 − p)^{n−k} (1)
= \sum_{k=0}^{n} \binom{n}{k} (a/(a + b))^k (b/(a + b))^{n−k} (2)
= \sum_{k=0}^{n} \binom{n}{k} a^k b^{n−k} / (a + b)^n (3)
Since \sum_{k=0}^{n} PK (k) = 1, we see that
(a + b)^n = (a + b)^n \sum_{k=0}^{n} PK (k) = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n−k} (4)
(a) P [Y < 1] = 0
(b) P [Y ≤ 1] = 1/4
(e) P [Y = 1] = 1/4
(f) P [Y = 3] = 1/2
(g) From the staircase CDF of Problem 2.4.1, we see that Y is a discrete random variable. The
jumps in the CDF occur at at the values that Y can take on. The height of each jump equals
the probability of that value. The PMF of Y is
PY (y) = { 1/4    y = 1
         { 1/4    y = 2
         { 1/2    y = 3            (1)
         { 0      otherwise
(Plot of FX (x) omitted.)
FX (x) = { 0      x < −1
         { 0.2    −1 ≤ x < 0
         { 0.7    0 ≤ x < 1            (1)
         { 1      x ≥ 1
(a) Similar to the previous problem, the graph of the CDF is shown below. (Plot of FX (x) omitted.)
FX (x) = { 0      x < −3
         { 0.4    −3 ≤ x < 5
         { 0.8    5 ≤ x < 7            (1)
         { 1      x ≥ 7
Problem 2.4.4 Solution
Let q = 1 − p, so the PMF of the geometric (p) random variable K is
PK (k) = { pq^{k−1}    k = 1, 2, . . . ,
         { 0           otherwise.            (1)
FK (k) = \sum_{j=1}^{k} PK (j) = \sum_{j=1}^{k} pq^{j−1} = 1 − q^k . (2)
Since K is integer valued, FK (k) = FK (⌊k⌋) for all integer and non-integer values of k. (If this
point is not clear, you should review Example 2.24.) Thus, the complete expression for the CDF
of K is
FK (k) = { 0                     k < 1,
         { 1 − (1 − p)^{⌊k⌋}     k ≥ 1.            (3)
FN (n) = \sum_{i=0}^{n} PN (i) = \sum_{i=0}^{n} (1/3)^i (2/3) = 1 − (1/3)^{n+1} (2)
A complete expression for FN (n) must give a valid answer for every value of n, including non-integer
values. We can write the CDF using the floor function ⌊x⌋, which denotes the largest integer less
than or equal to x. The complete expression for the CDF is
FN (x) = { 0                        x < 0
         { 1 − (1/3)^{⌊x⌋+1}        0 ≤ x < 100            (3)
         { 1                        x ≥ 100
The corresponding CDF is
FB (b) = { 0       b < 0
         { 0.70    0 ≤ b < 1
         { 0.86    1 ≤ b < 2
         { 0.94    2 ≤ b < 3
         { 0.98    3 ≤ b < 4
         { 1       b ≥ 4
(Plot of FB (b) omitted.)
(Plots of FY (y) for p = 1/4, p = 1/2, and p = 3/4 omitted.)
FN (n) = { 0        n < 1
         { 1/2      1 ≤ n < 2
         { 3/4      2 ≤ n < 3
         { 7/8      3 ≤ n < 4            (3)
         { 15/16    4 ≤ n < 5
         { 31/32    5 ≤ n < 6
         { 1        n ≥ 6
(Plot of FN (n) omitted.)
Problem 2.5.1 Solution
For this problem, we just need to pay careful attention to the definitions of mode and median.
(a) The mode must satisfy PX (xmod ) ≥ PX (x) for all x. In the case of the uniform PMF, any
integer x between 1 and 100 is a mode of the random variable X. Hence, the set of all modes
is
Xmod = {1, 2, . . . , 100} (1)
(b) The median must satisfy P [X < xmed ] = P [X > xmed ]. Since P [X ≤ 50] = P [X ≥ 51] = 1/2,
any xmed satisfying 50 < xmed < 51 is a median. In fact, for any x satisfying 50 < x < 51,
P [X < x ] = P [X > x ] = 1/2. Thus, Xmed = {x | 50 < x < 51}.
(a) Since each call is either a voice or data call, the cost of one call can only take the two values
associated with the cost of each type of call. Therefore the PMF of X is
PX (x) = { 0.6    x = 20
         { 0.4    x = 30            (1)
         { 0      otherwise
(b) The expected cost, E[C], is simply the sum of the cost of each type of call multiplied by the
probability of such a call occurring.
Problem 2.5.4 Solution
From the solution to Problem 2.4.2, the PMF of X is
PX (x) = { 0.2    x = −1
         { 0.5    x = 0
         { 0.3    x = 1            (1)
         { 0      otherwise
The expected value of X is
E [X] = \sum_x xPX (x) = −1(0.2) + 0(0.5) + 1(0.3) = 0.1 (2)
Problem 2.5.8 Solution
The following experiments are based on a common model of packet transmissions in data networks.
In these networks, each data packet contains a cylic redundancy check (CRC) code that permits
the receiver to determine whether the packet was decoded correctly. In the following, we assume
that a packet is corrupted with probability ε = 0.001, independent of whether any other packet is
corrupted.
(a) Let X = 1 if a data packet is decoded correctly; otherwise X = 0. Random variable X is a
Bernoulli random variable with PMF
PX (x) = { 0.001    x = 0
         { 0.999    x = 1            (1)
         { 0        otherwise
The parameter ε = 0.001 is the probability a packet is corrupted. The expected value of X is
E [X] = 1 − ε = 0.999 (2)
(b) Let Y denote the number of packets received in error out of 100 packets transmitted. Y has
the binomial PMF
PY (y) = { \binom{100}{y} (0.001)^y (0.999)^{100−y}    y = 0, 1, . . . , 100
         { 0                                           otherwise                (3)
The expected value of Y is
E [Y ] = 100ε = 0.1 (4)
(c) Let L equal the number of packets that must be received to decode 5 packets in error. L has
the Pascal PMF
PL (l) = { \binom{l−1}{4} (0.001)^5 (0.999)^{l−5}    l = 5, 6, . . .
         { 0                                         otherwise           (5)
The expected value of L is
E [L] = 5/ε = 5/0.001 = 5000 (6)
(d) If packet arrivals obey a Poisson model with an average arrival rate of 1000 packets per
second, then the number N of packets that arrive in 5 seconds has the Poisson PMF
PN (n) = { 5000^n e^{−5000} /n!    n = 0, 1, . . .
         { 0                       otherwise           (7)
The expected value of N is E[N ] = 5000.
Problem 2.5.10 Solution
By the definition of the expected value,
E [Xn ] = \sum_{x=1}^{n} x \binom{n}{x} p^x (1 − p)^{n−x} (1)
= np \sum_{x=1}^{n} [(n − 1)!/((x − 1)!(n − 1 − (x − 1))!)] p^{x−1} (1 − p)^{n−1−(x−1)} (2)
The above sum is 1 because it is the sum of the PMF of a binomial random variable for n − 1 trials
over all possible values. Hence E [Xn ] = np.
At this point, the key step is to reverse the order of summation. You may need to make a sketch
of the feasible values for i and j to see how this reversal occurs. In this case,
\sum_{i=0}^{∞} P [X > i] = \sum_{j=1}^{∞} \sum_{i=0}^{j−1} PX (j) = \sum_{j=1}^{∞} jPX (j) = E [X] (2)
(b) From the PMF, it is straightforward to write down the CDF.
FU (u) = { 0      u < 1
         { 1/4    1 ≤ u < 4
         { 1/2    4 ≤ u < 9            (5)
         { 1      u ≥ 9
In particular,
Problem 2.6.3 Solution
From the solution to Problem 2.4.3, the PMF of X is
PX (x) = { 0.4    x = −3
         { 0.4    x = 5
         { 0.2    x = 7            (1)
         { 0      otherwise
This implies
PW (−7) = PX (7) = 0.2 PW (−5) = PX (5) = 0.4 PW (3) = PX (−3) = 0.4 (3)
Problem 2.6.5 Solution
(a) The source continues to transmit packets until one is received correctly. Hence, the total
number of times that a packet is transmitted is X = x if the first x − 1 transmissions were in
error. Therefore the PMF of X is
PX (x) = { q^{x−1} (1 − q)    x = 1, 2, . . .
         { 0                  otherwise           (1)
(b) The time required to send a packet is a millisecond and the time required to send an acknowl-
edgment back to the source takes another millisecond. Thus, if X transmissions of a packet
are needed to send the packet correctly, then the packet is correctly received after T = 2X − 1
milliseconds. Therefore, for an odd integer t > 0, T = t iff X = (t + 1)/2. Thus,
PT (t) = PX ((t + 1)/2) = { q^{(t−1)/2} (1 − q)    t = 1, 3, 5, . . .
                          { 0                      otherwise            (2)
Using Definition 2.14,
E [T ] = \sum_{t∈ST} tPT (t) = 0.3(75) + 0.3(90) + 0.3(105) + 0.1(120) = 93 (4)
Since each winning ticket grosses $1000, the revenue we collect over 50 years is R = 1000T dollars.
The expected revenue is
E [R] = 1000E [T ] = 18250 (2)
But buying a lottery ticket every day for 50 years, at $2.00 a pop isn’t cheap and will cost us a total
of 18250 · 2 = $36500. Our net profit is then Q = R − 36500 and the result of our loyal 50 year
patronage of the lottery system is a disappointing expected loss of
E [Q] = E [R] − 36500 = −18250. (3)
E [D] = \sum_{d=1}^{4} d · PD (d) = 1(0.2) + 2(0.4) + 3(0.3) + 4(0.1) = 2.3 (1)
(c) C can be expressed as a function of D in the following manner.
C(D) = { 90    D = 1
       { 70    D = 2
       { 40    D = 3            (3)
       { 40    D = 4
For this cellular billing plan, we are given no free minutes, but are charged half the flat fee. That
is, we are going to pay 15 dollars regardless and $1 for each minute we use the phone. Hence
C = 15 + M and for c ≥ 16, P [C = c] = P [M = c − 15]. Thus we can construct the PMF of the
cost C
PC (c) = { (1 − p)^{c−16} p    c = 16, 17, . . .
         { 0                   otherwise            (2)
Since C = 15 + M , the expected cost per month of the plan is
E [C] = 15 + E [M ] = 15 + 1/p. (3)
In Problem 2.7.5, we found that the expected cost of the plan was
E [C] = 20 + (1 − p)^{30} /(2p). (4)
In comparing the expected costs of the two plans, we see that the new plan is better (i.e. cheaper)
if
15 + 1/p ≤ 20 + (1 − p)^{30} /(2p). (5)
A simple plot will show that the new plan is better if p ≥ p0 ≈ 0.2.
E [rs (W )] = \sum_{w=0}^{1} PW (w) rs (w) (2)
= PW (0) (−10) + PW (1) (k − 10) (3)
= (1 − (1 − q)^{10} )(−10) + (1 − q)^{10} (k − 10) = (0.9)^{10} k − 10. (4)
E [ru (W )] = \sum_{w=0}^{1} PW (w) ru (w) (6)
= PW (0) (−30) + PW (1) (k − 30) (7)
= (1 − (1 − q/2)^{10} )(−30) + (1 − q/2)^{10} (k − 30) = (0.95)^{10} k − 30 (8)
To determine which implementation generates the most profit, we solve E[Ru ] ≥ E[Rs ], yielding
k ≥ 20/[(0.95)^{10} − (0.9)^{10} ] = 80.21. So for k < $80.21 using all standard devices results in greater
revenue, while for k > $80.21 more revenue will be generated by implementing all ultra-reliable
devices. That is, when the price commanded for a working circuit is sufficiently high, we should
build more-expensive higher-reliability circuits.
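A short Matlab comparison of the two expected profits (our own illustration; the expressions above correspond to q = 0.1) is
k=0:200;
ERs=0.9^10*k-10;               % expected profit per circuit, all standard devices
ERu=0.95^10*k-30;              % expected profit per circuit, all ultra-reliable devices
kstar=20/(0.95^10-0.9^10)      % break-even price, about $80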
If you have read ahead to Section 2.9 and learned about conditional expected values, you might prefer
the following solution. If not, you might want to come back and review this alternate approach after
reading Section 2.9.
Let W denote the event that a circuit works. The circuit works and generates revenue of k
dollars if all of its 10 constituent devices work. For each implementation, standard or ultra-reliable,
let R denote the profit on a device. We can express the expected profit as
E [R] = P [W ] E [R|W ] + P [W c ] E [R|W c ] .
Let’s first consider the case when only standard devices are used. In this case, a circuit works
with probability P [W ] = (1 − q)10 . The profit made on a working device is k − 10 dollars while a
nonworking circuit has a profit of -10 dollars. That is, E[R|W ] = k − 10 and E[R|W c ] = −10. Of
course, a negative profit is actually a loss. Using Rs to denote the profit using standard circuits,
the expected profit is
E [Rs ] = (1 − q)^{10} (k − 10) + (1 − (1 − q)^{10} )(−10) = (0.9)^{10} k − 10.
And for the ultra-reliable case, the circuit works with probability P [W ] = (1 − q/2)^{10} . The profit per
working circuit is E[R|W ] = k − 30 dollars while the profit for a nonworking circuit is E[R|W c ] =
−30 dollars. The expected profit is
E [Ru ] = (1 − q/2)^{10} (k − 30) + (1 − (1 − q/2)^{10} )(−30) = (0.95)^{10} k − 30.
Not surprisingly, we get the same answers for E[Ru ] and E[Rs ] as in the first solution by performing
essentially the same calculations. It should be apparent that the indicator random variable W in the
first solution indicates the occurrence of the conditioning event W in the second solution. That is,
indicators are a way to track conditioning events.
(a) There are \binom{46}{6} equally likely winning combinations so that
q = 1 / \binom{46}{6} = 1/9,366,819 ≈ 1.07 × 10^{−7} (1)
(b) Assuming each ticket is chosen randomly, each of the 2n − 1 other tickets is independently a
winner with probability q. The number of other winning tickets Kn has the binomial PMF
PKn (k) = { \binom{2n−1}{k} q^k (1 − q)^{2n−1−k}    k = 0, 1, . . . , 2n − 1
          { 0                                       otherwise                 (2)
(c) Since there are Kn + 1 winning tickets in all, the value of your winning ticket is Wn =
n/(Kn + 1) which has mean
E [Wn ] = n E [1/(Kn + 1)] (3)
Calculating the expected value
E [1/(Kn + 1)] = \sum_{k=0}^{2n−1} (1/(k + 1)) PKn (k) (4)
is fairly complicated. The trick is to express the sum in terms of the sum of a binomial
PMF.
E [1/(Kn + 1)] = \sum_{k=0}^{2n−1} (1/(k + 1)) [(2n − 1)!/(k!(2n − 1 − k)!)] q^k (1 − q)^{2n−1−k} (5)
= (1/(2n)) \sum_{k=0}^{2n−1} [(2n)!/((k + 1)!(2n − (k + 1))!)] q^k (1 − q)^{2n−(k+1)} (6)
Problem 2.7.9 Solution
(a) There are \binom{46}{6} equally likely winning combinations so that
q = 1 / \binom{46}{6} = 1/9,366,819 ≈ 1.07 × 10^{−7} (1)
(b) Assuming each ticket is chosen randomly, each of the 2n − 1 other tickets is independently a
winner with probability q. The number of other winning tickets Kn has the binomial PMF
PKn (k) = { \binom{2n−1}{k} q^k (1 − q)^{2n−1−k}    k = 0, 1, . . . , 2n − 1
          { 0                                       otherwise                 (2)
Since the pot has n + r dollars, the expected amount that you win on your ticket is
E [V ] = 0(1 − q) + qE [(n + r)/(Kn + 1)] = q(n + r)E [1/(Kn + 1)] (3)
Note that E[1/Kn + 1] was also evaluated in Problem 2.7.8. For completeness, we repeat
those steps here.
E [1/(Kn + 1)] = \sum_{k=0}^{2n−1} (1/(k + 1)) [(2n − 1)!/(k!(2n − 1 − k)!)] q^k (1 − q)^{2n−1−k} (4)
= (1/(2n)) \sum_{k=0}^{2n−1} [(2n)!/((k + 1)!(2n − (k + 1))!)] q^k (1 − q)^{2n−(k+1)} (5)
= (1/(2nq)) \sum_{j=1}^{2n} \binom{2n}{j} q^j (1 − q)^{2n−j} = A/(2nq), (6)
where the substitution j = k + 1 was used and A denotes the sum \sum_{j=1}^{2n} \binom{2n}{j} q^j (1 − q)^{2n−j} .
We observe that the above sum labeled A is the sum of a binomial PMF for 2n trials and
success probability q over all possible values except j = 0. Thus A = 1 − \binom{2n}{0} q^0 (1 − q)^{2n−0} ,
which implies
E [1/(Kn + 1)] = A/(2nq) = (1 − (1 − q)^{2n} )/(2nq) (8)
The expected value of your ticket is
E [V ] = q(n + r)(1 − (1 − q)^{2n} )/(2nq) = (1 − (1 − q)^{2n} )(n + r)/(2n). (9)
Thus a sufficiently large pot r will guarantee that E[V ] > 1. For example if n = 10^7 , (20 million
tickets sold) then
E [V ] = 0.44 (1 + r/10^7 ) (10)
If the carryover pot r is 30 million dollars, then E[V ] = 1.76. This suggests that buying a
one dollar ticket is a good idea. This is an unusual situation because normally a carryover
pot of 30 million dollars will result in far more than 20 million tickets being sold.
(c) So that we can use the results of the previous part, suppose there were 2n − 1 tickets sold
before you must make your decision. If you buy one of each possible ticket, you are guaranteed
to have one winning ticket. From the other 2n − 1 tickets, there will be Kn winners. The
total number of winning tickets will be Kn + 1. In the previous part we found that
E [1/(Kn + 1)] = (1 − (1 − q)^{2n} )/(2nq) (11)
Let R denote the expected return from buying one of each possible ticket. The pot had
r dollars beforehand. The 2n − 1 other tickets sold add n − 1/2 dollars to the pot.
Furthermore, you must buy 1/q tickets, adding 1/(2q) dollars to the pot. Since the cost of
the tickets is 1/q dollars, your expected profit is
E [R] = E [(r + n − 1/2 + 1/(2q))/(Kn + 1)] − 1/q (12)
= [(q(2r + 2n − 1) + 1)/(2q)] E [1/(Kn + 1)] − 1/q (13)
= [q(2r + 2n − 1) + 1](1 − (1 − q)^{2n} )/(4nq^2 ) − 1/q (14)
For fixed n, sufficiently large r will make E[R] > 0. On the other hand, for fixed r,
limn→∞ E[R] = −1/(2q). That is, as n approaches infinity, your expected loss will be quite
large.
Problem 2.8.2 Solution
From the solution to Problem 2.4.1, the PMF of Y is
    P_Y(y) = \begin{cases} 1/4 & y = 1 \\ 1/4 & y = 2 \\ 1/2 & y = 3 \\ 0 & \text{otherwise} \end{cases}   (1)
The variance of Y is
    Var[Y] = E[Y^2] - (E[Y])^2 = 23/4 - (9/4)^2 = 11/16   (4)
The variance of X is
    Var[X] = E[X^2] - (E[X])^2 = 0.5 - (0.1)^2 = 0.49   (4)
The expected value of X is
    E[X] = \sum_x x P_X(x) = -3(0.4) + 5(0.4) + 7(0.2) = 2.2   (2)
The variance of X is
    Var[X] = E[X^2] - (E[X])^2 = 23.4 - (2.2)^2 = 18.56   (4)
The expected value of X^2 is
    E[X^2] = \sum_{x=0}^{5} x^2 P_X(x)   (4)
           = 0^2\binom{5}{0}\frac{1}{2^5} + 1^2\binom{5}{1}\frac{1}{2^5} + 2^2\binom{5}{2}\frac{1}{2^5} + 3^2\binom{5}{3}\frac{1}{2^5} + 4^2\binom{5}{4}\frac{1}{2^5} + 5^2\binom{5}{5}\frac{1}{2^5}   (5)
           = [5 + 40 + 90 + 80 + 25]/2^5 = 240/32 = 15/2   (6)
The variance of X is
    Var[X] = E[X^2] - (E[X])^2 = 15/2 - 25/4 = 5/4   (7)
By taking the square root of the variance, the standard deviation of X is \sigma_X = \sqrt{5/4} \approx 1.12.
(b) The probability that X is within one standard deviation of its mean is
Problem 2.8.9 Solution
With our measure of jitter being σT , and the fact that T = 2X − 1, we can express the jitter as a
function of q by realizing that
    Var[T] = 4\,Var[X] = \frac{4q}{(1-q)^2}   (1)
Therefore, our maximum permitted jitter is
    \sigma_T = \frac{2\sqrt{q}}{1-q} = 2 \text{ msec}   (2)
Solving for q yields q^2 - 3q + 1 = 0. By solving this quadratic equation, we obtain
    q = \frac{3 \pm \sqrt{5}}{2} = 3/2 \pm \sqrt{5}/2   (3)
Since q must be a value between 0 and 1, we know that a value of q = 3/2 - \sqrt{5}/2 \approx 0.382 will ensure a jitter of at most 2 milliseconds.
Solving for the value of x̂ that makes the derivative de(x̂)/dx̂ equal to zero results in the value of
x̂ that minimizes e(x̂). Note that when we take the derivative with respect to x̂, both E[X 2 ] and
E[X] are simply constants.
    \frac{d}{d\hat{x}}\left( E[X^2] - 2\hat{x}E[X] + \hat{x}^2 \right) = -2E[X] + 2\hat{x} = 0   (3)
Hence we see that x̂ = E[X]. In the sense of mean squared error, the best guess for a random
variable is the mean value. In Chapter 9 this idea is extended to develop minimum mean squared
error estimation.
By factoring out \lambda^2 and substituting j = k - 2, we obtain
    E[K(K-1)] = \lambda^2 \sum_{j=0}^{\infty} \frac{\lambda^j e^{-\lambda}}{j!} = \lambda^2   (4)
The above sum equals 1 because it is the sum of a Poisson PMF over all possible values. Since E[K] = \lambda, the variance of K is
    Var[K] = E[K^2] - (E[K])^2   (5)
           = E[K(K-1)] + E[K] - (E[K])^2   (6)
           = \lambda^2 + \lambda - \lambda^2 = \lambda   (7)
where
    E[D^2] = \sum_{d=1}^{4} d^2 P_D(d) = 0.2 + 1.6 + 2.7 + 1.6 = 6.1   (2)
So finally we have
    \sigma_D = \sqrt{6.1 - 2.3^2} = \sqrt{0.81} = 0.9   (3)
The probability of the event B = {Y < 3} is P [B] = 1 − P [Y = 3] = 1/2. From Theorem 2.17, the
conditional PMF of Y given B is
    P_{Y|B}(y) = \begin{cases} P_Y(y)/P[B] & y \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 1/2 & y = 1 \\ 1/2 & y = 2 \\ 0 & \text{otherwise} \end{cases}   (2)
The conditional first and second moments of Y are
    E[Y|B] = \sum_y y P_{Y|B}(y) = 1(1/2) + 2(1/2) = 3/2   (3)
    E[Y^2|B] = \sum_y y^2 P_{Y|B}(y) = 1^2(1/2) + 2^2(1/2) = 5/2   (4)
Problem 2.9.2 Solution
From the solution to Problem 2.4.2, the PMF of X is
    P_X(x) = \begin{cases} 0.2 & x = -1 \\ 0.5 & x = 0 \\ 0.3 & x = 1 \\ 0 & \text{otherwise} \end{cases}   (1)
The event B = \{|X| > 0\} has probability P[B] = P[X \neq 0] = 0.5. From Theorem 2.17, the conditional PMF of X given B is
    P_{X|B}(x) = \begin{cases} P_X(x)/P[B] & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 0.4 & x = -1 \\ 0.6 & x = 1 \\ 0 & \text{otherwise} \end{cases}   (2)
The event B = \{X > 0\} has probability P[B] = P_X(5) + P_X(7) = 0.6. From Theorem 2.17, the conditional PMF of X given B is
    P_{X|B}(x) = \begin{cases} P_X(x)/P[B] & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 2/3 & x = 5 \\ 1/3 & x = 7 \\ 0 & \text{otherwise} \end{cases}   (2)
Problem 2.9.4 Solution
The event B = \{X \neq 0\} has probability P[B] = 1 - P[X = 0] = 15/16. The conditional PMF of X given B is
    P_{X|B}(x) = \begin{cases} P_X(x)/P[B] & x \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \binom{4}{x}\frac{1}{15} & x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}   (1)
Problem 2.9.6 Solution
(a) Consider each circuit test as a Bernoulli trial such that a failed circuit is called a success.
The number of trials until the first success (i.e. a failed circuit) has the geometric PMF
        P_N(n) = \begin{cases} (1-p)^{n-1} p & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}   (1)
    Note that (1-p)^{19} is just the probability that the first 19 circuits pass the test, which is what we would expect since there must be at least 20 tests if the first 19 circuits pass. The conditional PMF of N given B is
        P_{N|B}(n) = \begin{cases} P_N(n)/P[B] & n \in B \\ 0 & \text{otherwise} \end{cases} = \begin{cases} (1-p)^{n-20} p & n = 20, 21, \ldots \\ 0 & \text{otherwise} \end{cases}   (3)
        E[N|B] = \sum_n n P_{N|B}(n) = \sum_{n=20}^{\infty} n(1-p)^{n-20} p   (4)
    We see that in the above sum, we effectively have the expected value of J + 19 where J is a geometric random variable with parameter p. This is not surprising since N \geq 20 if and only if we observed 19 successful tests. After 19 successful tests, the number of additional tests needed to find the first failure is still a geometric random variable with mean 1/p.
P [M > 0] = 1 − P [M = 0] = 1 − q (2)
(b) The probability that we run a marathon on any particular day is the probability that M \geq 26.
        r = P[M \geq 26] = \sum_{m=26}^{\infty} q(1-q)^m = (1-q)^{26}   (3)
(c) We run a marathon on each day with probability equal to r, and we do not run a marathon
with probability 1 − r. Therefore in a year we have 365 tests of our jogging resolve, and thus
365 chances to run a marathon. So the PMF of the number of marathons run in a year, J,
can be expressed as
        P_J(j) = \begin{cases} \binom{365}{j} r^j (1-r)^{365-j} & j = 0, 1, \ldots, 365 \\ 0 & \text{otherwise} \end{cases}   (4)
(d) The random variable K is defined as the number of miles we run above that required for a
marathon, K = M − 26. Given the event, A, that we have run a marathon, we wish to know
how many miles in excess of 26 we in fact ran. So we want to know the conditional PMF
PK|A (k).
        P_{K|A}(k) = \frac{P[K = k, A]}{P[A]} = \frac{P[M = 26 + k]}{P[A]}   (5)
    Since P[A] = r, for k = 0, 1, \ldots,
        P_{K|A}(k) = \frac{(1-q)^{26+k} q}{(1-q)^{26}} = (1-q)^k q   (6)
    The complete expression for the conditional PMF of K is
        P_{K|A}(k) = \begin{cases} (1-q)^k q & k = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}   (7)
(a) The event that a fax was sent to machine A can be expressed mathematically as the event
that the number of pages X is an even number. Similarly, the event that a fax was sent
to B is the event that X is an odd number. Since SX = {1, 2, . . . , 8}, we define the set
A = {2, 4, 6, 8}. Using this definition for A, we have that the event that a fax is sent to A is
equivalent to the event X ∈ A. The event A has probability
The conditional first and second moments of X given A are
    E[X|A] = \sum_x x P_{X|A}(x) = 2(0.3) + 4(0.3) + 6(0.2) + 8(0.2) = 4.6   (4)
    E[X^2|A] = \sum_x x^2 P_{X|A}(x) = 4(0.3) + 16(0.3) + 36(0.2) + 64(0.2) = 26   (5)
(b) Let the event B' denote the event that the fax was sent to B and that the fax had no more than 6 pages. Hence, the event B' = \{1, 3, 5\} has probability
        P[B'] = P_X(1) + P_X(3) + P_X(5) = 0.4   (8)
    The conditional PMF of X given B' is
        P_{X|B'}(x) = \begin{cases} P_X(x)/P[B'] & x \in B' \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 3/8 & x = 1, 3 \\ 1/4 & x = 5 \\ 0 & \text{otherwise} \end{cases}   (9)
    Given the event B', the conditional first and second moments are
        E[X|B'] = \sum_x x P_{X|B'}(x) = 1(3/8) + 3(3/8) + 5(1/4) = 11/4   (10)
        E[X^2|B'] = \sum_x x^2 P_{X|B'}(x) = 1(3/8) + 9(3/8) + 25(1/4) = 10   (11)
In terms of Matlab, the efficient solution is to generate the vector of perfect squares x =
[0 1 4 9 16 ...] and then to pass that vector to the binomialpmf.m. In this case, the val-
ues of the binomial PMF are calculated only once. Here is the code:
function q=perfectbinomial(n,p);
i=0:floor(sqrt(n));
x=i.^2;
q=sum(binomialpmf(n,p,x));
For a binomial (100, 0.5) random variable X, the probability X is a perfect square is
>> perfectbinomial(100,0.5)
ans =
0.0811
Problem 2.10.2 Solution
The random variable X given in Example 2.29 is just a finite random variable. We can generate
random samples using the finiterv function. The code is
function x=faxlength8(m);
sx=1:8;
p=[0.15*ones(1,4) 0.1*ones(1,4)];
x=finiterv(sx,p,m);
function y=avgfax(m);
x=faxlength8(m);
yy=cumsum([10 9 8 7 6]);
yy=[yy 50 50 50];
y=sum(yy(x))/m;
Each time we perform the experiment of executing the function avgfax, we generate m random samples of X, and m corresponding samples of Y. The sum Y = \frac{1}{m}\sum_{i=1}^{m} Y_i is random. For m = 10,
four samples of Y are
>> [avgfax(10) avgfax(10) avgfax(10) avgfax(10)]
ans =
31.9000 31.2000 29.6000 34.1000
>>
For m = 100, the results are arguably more consistent:
>> [avgfax(100) avgfax(100) avgfax(100) avgfax(100)]
ans =
34.5300 33.3000 29.8100 33.6900
>>
Finally, for m = 1000, we obtain results reasonably close to E[Y ]:
>> [avgfax(1000) avgfax(1000) avgfax(1000) avgfax(1000)]
ans =
32.1740 31.8920 33.1890 32.8250
>>
In Chapter 7, we will develop techniques to show how Y converges to E[Y ] as m → ∞.
The problem asks us to find the smallest value of k such that P[X_n \leq k] \geq 0.75. That is, if the server caches the k most popular files, then with probability P[X_n \leq k] the request is for one of the k cached files. First, we might as well solve this problem for any probability p rather than just p = 0.75. Thus, in math terms, we are looking for
    k = \min\{k' \mid P[X_n \leq k'] \geq p\}.   (2)
What makes the Zipf distribution hard to analyze is that there is no closed form expression for
    c(n) = \left( \sum_{x=1}^{n} \frac{1}{x} \right)^{-1}.   (3)
Thus, we use Matlab to grind through the calculations. The following simple program generates
the Zipf distributions and returns the correct value of k.
function k=zipfcache(n,p);
%Usage: k=zipfcache(n,p);
%for the Zipf (n,alpha=1) distribution, returns the smallest k
%such that the first k items have total probability p
pmf=1./(1:n);
pmf=pmf/sum(pmf); %normalize to sum to 1
cdf=cumsum(pmf);
k=1+sum(cdf<=p);
The program zipfcache generalizes 0.75 to be the probability p. Although this program is sufficient, the problem asks us to find k for all values of n from 1 to 1000! One way to do this is to call zipfcache a thousand times to find k for each value of n. A better way is to use the properties of the Zipf PMF. In particular,
    P[X_n \leq k] = c(n) \sum_{x=1}^{k} \frac{1}{x} = \frac{c(n)}{c(k)}   (4)
Note that zipfcacheall uses a short Matlab program countless.m that is almost the same as
count.m introduced in Example 2.47. If n=countless(x,y), then n(i) is the number of elements
of x that are strictly less than y(i) while count returns the number of elements less than or equal
to y(i).
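The text does not reproduce countless.m here; the following is a minimal sketch of what such a function might look like, based only on the description above (it is not necessarily the version distributed with the text). It builds the all-at-once comparison array whose memory cost is discussed below.

function n=countless(x,y)
%Sketch (not the distributed version): for each y(i),
%n(i) = number of elements of x that are strictly less than y(i).
x=x(:); y=y(:);
%Compare every x against every y at once (length(x)-by-length(y) array)
n=sum(x*ones(1,length(y)) < ones(length(x),1)*y',1)';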
In any case, the commands
k=zipfcacheall(1000,0.75);
plot(1:1000,k);
are sufficient to produce this figure of k as a function of n:

[Figure: k as a function of n for 1 \leq n \leq 1000; k rises from 0 to roughly 200.]

We see in the figure that the number of files that must be cached grows slowly with the total number of files n.
Finally, we make one last observation. It is generally desirable for Matlab to execute operations in parallel. The program zipfcacheall generally will run faster than n calls to zipfcache. However, to do its counting all at once, countless generates an n × n array. When n is not too large, say n ≤ 1000, the resulting array with n^2 = 1,000,000 elements fits in memory. For much larger values of n, say n = 10^6 (as was proposed in the original printing of this edition of the text), countless will cause an "out of memory" error.
function diff=poissontest(alpha,m)
x=poissonrv(alpha,m);
xr=0:ceil(3*alpha);
pxsample=hist(x,xr)/m;
pxsample=pxsample(:);
%pxsample=(countequal(x,xr)/m);
px=poissonpmf(alpha,xr);
plot(xr,pxsample,xr,px);
diff=sum((pxsample-px).^2);
For m = 100, 1000, 10000, here are sample plots comparing the PMF and the relative frequency.
The plots show reasonable agreement for m = 10000 samples.
[Figure: Poisson PMF versus relative frequency, for m = 100, m = 1000, and m = 10,000 samples.]
For (n, p) = (10, 1), the binomial PMF has no randomness. For (n, p) = (100, 0.1), the approxima-
tion is reasonable:
[Figure: (a) n = 10, p = 1; (b) n = 100, p = 0.1.]
Finally, for (n, p) = (1000, 0.01), and (n, p) = (10000, 0.001), the approximation is very good:
[Figure: (a) n = 1000, p = 0.01; (b) n = 10000, p = 0.001.]
Subtracting 1 from each side and then multiplying through by −1 (which reverses the inequalities), we obtain
    (1-p)^{k^*-1} > 1 - R \geq (1-p)^{k^*}.   (3)
Next we take the logarithm of each side. Since logarithms are monotonic functions, we have
    (k^*-1)\ln(1-p) > \ln(1-R) \geq k^*\ln(1-p).   (4)
Since 0 < p < 1, we have that \ln(1-p) < 0. Thus dividing through by \ln(1-p) reverses the inequalities, yielding
    k^* - 1 < \frac{\ln(1-R)}{\ln(1-p)} \leq k^*.   (5)
Since k^* is an integer, it must be the smallest integer greater than or equal to \ln(1-R)/\ln(1-p). That is, following the last step of the random sample algorithm,
    K = k^* = \left\lceil \frac{\ln(1-R)}{\ln(1-p)} \right\rceil   (6)
function x=geometricrv(p,m)
%Usage: x=geometricrv(p,m)
% returns m samples of a geometric (p) rv
r=rand(m,1);
x=ceil(log(1-r)/log(1-p));
Problem 2.10.8 Solution
For the PC version of Matlab employed for this test, poissonpmf(n,n) reported Inf for n = n^* = 714. The problem with the poissonpmf function in Example 2.44 is that the cumulative product that calculates n^k/k! can overflow. Following the hint, we can write an alternate poissonpmf function as follows:
function pmf=poissonpmf(alpha,x)
%Poisson (alpha) rv X,
%out=vector pmf: pmf(i)=P[X=x(i)]
x=x(:);
if (alpha==0)
pmf=1.0*(x==0);
else
k=(1:ceil(max(x)))’;
logfacts =cumsum(log(k));
pb=exp([-alpha; ...
-alpha+ (k*log(alpha))-logfacts]);
okx=(x>=0).*(x==floor(x));
x=okx.*x;
pmf=okx.*pb(x+1);
end
%pmf(i)=0 for zero-prob x(i)
By summing logarithms, the intermediate terms are much less likely to overflow.
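As a usage note (ours, not from the text), calling the rewritten function at the point where the original overflowed should now return a finite value. By Stirling's approximation, P[X = \lambda] \approx 1/\sqrt{2\pi\lambda}, so for \lambda = 714 the result should be near 0.0149.

%Quick sanity check of the log-based poissonpmf (our own test):
p=poissonpmf(714,714);          %returned Inf with the original version
approx=1/sqrt(2*pi*714);        %Stirling estimate of P[X=714] for lambda=714
disp([p approx]);               %the two numbers should be close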
Problem Solutions – Chapter 3
(a)
P [X > 1/2] = 1 − P [X ≤ 1/2] = 1 − FX (1/2) = 1 − 3/4 = 1/4 (2)
(b) This is a little trickier than it should be. Being careful, we can write
Since the CDF of X is a continuous function, the probability that X takes on any specific
value is zero. This implies P [X = 3/4] = 0 and P [X = −1/2] = 0. (If this is not clear at this
point, it will become clear in Section 3.6.) Thus,
P [−1/2 ≤ X < 3/4] = P [−1/2 < X ≤ 3/4] = FX (3/4) − FX (−1/2) = 5/8 (4)
(c)
P [|X| ≤ 1/2] = P [−1/2 ≤ X ≤ 1/2] = P [X ≤ 1/2] − P [X < −1/2] (5)
Note that P [X ≤ 1/2] = FX (1/2) = 3/4. Since the probability that P [X = −1/2] = 0,
P [X < −1/2] = P [X ≤ 1/2]. Hence P [X < −1/2] = FX (−1/2) = 1/4. This implies
(a) For V to be a continuous random variable, FV (v) must be a continuous function. This occurs
if we choose c such that FV (v) doesn’t have a discontinuity at v = 7. We meet this requirement
if c(7 + 5)2 = 1. This implies c = 1/144.
(b)
P [V > 4] = 1 − P [V ≤ 4] = 1 − FV (4) = 1 − 81/144 = 63/144 (2)
(c)
P [−3 < V ≤ 0] = FV (0) − FV (−3) = 25/144 − 4/144 = 21/144 (3)
(d) Since 0 ≤ FV (v) ≤ 1 and since FV (v) is a nondecreasing function, it must be that −5 ≤ a ≤ 7.
In this range,
        P[V > a] = 1 - F_V(a) = 1 - (a+5)^2/144 = 2/3   (4)
    The unique solution in the range -5 \leq a \leq 7 is a = 4\sqrt{3} - 5 = 1.928.
(b)
P [−2 < W ≤ 2] = FW (2) − FW (−2) = 1/4 − 1/4 = 0. (3)
(c)
P [W > 0] = 1 − P [W ≤ 0] = 1 − FW (0) = 3/4 (4)
(d) By inspection of FW (w), we observe that P [W ≤ a] = FW (a) = 1/2 for a in the range
3 ≤ a ≤ 5. In this range,
FW (a) = 1/4 + 3(a − 3)/8 = 1/2 (5)
This implies a = 11/3.
(a) By definition, \lceil nx \rceil is the smallest integer that is greater than or equal to nx. This implies nx \leq \lceil nx \rceil \leq nx + 1.
(b) By part (a),
        \frac{nx}{n} \leq \frac{\lceil nx \rceil}{n} \leq \frac{nx + 1}{n}   (1)
    That is,
        x \leq \frac{\lceil nx \rceil}{n} \leq x + \frac{1}{n}   (2)
    This implies
        x \leq \lim_{n\to\infty} \frac{\lceil nx \rceil}{n} \leq \lim_{n\to\infty}\left( x + \frac{1}{n} \right) = x   (3)
(c) In the same way, \lfloor nx \rfloor is the largest integer that is less than or equal to nx. This implies nx - 1 \leq \lfloor nx \rfloor \leq nx. It follows that
        \frac{nx - 1}{n} \leq \frac{\lfloor nx \rfloor}{n} \leq \frac{nx}{n}   (4)
    That is,
        x - \frac{1}{n} \leq \frac{\lfloor nx \rfloor}{n} \leq x   (5)
    This implies
        \lim_{n\to\infty}\left( x - \frac{1}{n} \right) = x \leq \lim_{n\to\infty} \frac{\lfloor nx \rfloor}{n} \leq x   (6)
Problem 3.2.4 Solution
For x < 0, F_X(x) = 0. For x \geq 0,
    F_X(x) = \int_0^x f_X(y)\,dy   (1)
           = \int_0^x a^2 y e^{-a^2 y^2/2}\,dy   (2)
           = \left. -e^{-a^2 y^2/2} \right|_0^x = 1 - e^{-a^2 x^2/2}   (3)
For the PDF to be non-negative for x ∈ [0, 1], we must have ax + 2 − 2a/3 ≥ 0 for all x ∈ [0, 1].
This requirement can be written as
a(2/3 − x) ≤ 2 (0 ≤ x ≤ 1) (4)
For x = 2/3, the requirement holds for all a. However, the problem is tricky because we must
consider the cases 0 ≤ x < 2/3 and 2/3 < x ≤ 1 separately because of the sign change of the
inequality. When 0 ≤ x < 2/3, we have 2/3 − x > 0 and the requirement is most stringent at
x = 0 where we require 2a/3 ≤ 2 or a ≤ 3. When 2/3 < x ≤ 1, we can write the constraint as
a(x − 2/3) ≥ −2. In this case, the constraint is most stringent at x = 1, where we must have
a/3 ≥ −2 or a ≥ −6. Thus a complete expression for our requirements are
−6 ≤ a ≤ 3 b = 2 − 2a/3 (5)
As we see in the following plot, the shape of the PDF fX (x) varies greatly with the value of a.
[Figure: f_X(x) on 0 \leq x \leq 1 for a = -6, -3, 0, 3.]
and
    E[h(X)] = E[X^2] = Var[X] + (E[X])^2 = 4/3 + 1 = 7/3   (3)
(c) Finally
        E[Y] = E[h(X)] = E[X^2] = 7/3   (4)
        Var[Y] = E[X^4] - (E[X^2])^2 = \int_{-1}^{3} \frac{x^4}{4}\,dx - \frac{49}{9} = \frac{61}{5} - \frac{49}{9}   (5)
    E[X] = \frac{1+9}{2} = 5 \qquad Var[X] = \frac{(9-1)^2}{12} = \frac{16}{3}   (1)
(b) Define h(X) = 1/\sqrt{X}; then
        h(E[X]) = 1/\sqrt{5}   (2)
        E[h(X)] = \int_1^9 \frac{x^{-1/2}}{8}\,dx = 1/2   (3)
(c)
(a) To find E[X], we first find the PDF by differentiating the above CDF.
    f_X(x) = \begin{cases} 1/2 & 0 \leq x \leq 2 \\ 0 & \text{otherwise} \end{cases}   (2)
(b)
    E[X^2] = \int_0^2 \frac{x^2}{2}\,dx = 4/3   (4)
    Var[X] = E[X^2] - (E[X])^2 = 4/3 - 1 = 1/3   (5)
(a) We can find the expected value of Y by first differentiating the above CDF to find the PDF
    f_Y(y) = \begin{cases} 1/2 & -1 \leq y \leq 1 \\ 0 & \text{otherwise} \end{cases}   (2)
(b)
    E[Y^2] = \int_{-1}^{1} \frac{y^2}{2}\,dy = 1/3   (4)
    Var[Y] = E[Y^2] - (E[Y])^2 = 1/3 - 0 = 1/3   (5)
Problem 3.3.7 Solution
To find the moments, we first find the PDF of U by taking the derivative of FU (u). The CDF and
corresponding PDF are
    F_U(u) = \begin{cases} 0 & u < -5 \\ (u+5)/8 & -5 \leq u < -3 \\ 1/4 & -3 \leq u < 3 \\ 1/4 + 3(u-3)/8 & 3 \leq u < 5 \\ 1 & u \geq 5 \end{cases}
    \qquad
    f_U(u) = \begin{cases} 1/8 & -5 \leq u < -3 \\ 0 & -3 \leq u < 3 \\ 3/8 & 3 \leq u < 5 \\ 0 & \text{otherwise} \end{cases}   (1)
With the variable substitution y = x/\mu, we obtain
    E[X^n] = \alpha\mu^n \int_1^{\infty} y^{-(\alpha-n+1)}\,dy   (3)
We see that E[X^n] < \infty if and only if \alpha - n + 1 > 1, or, equivalently, n < \alpha. In this case,
    E[X^n] = \left. \frac{\alpha\mu^n}{-(\alpha-n+1)+1}\, y^{-(\alpha-n+1)+1} \right|_{y=1}^{y=\infty}   (4)
           = \left. \frac{-\alpha\mu^n}{\alpha-n}\, y^{-(\alpha-n)} \right|_{y=1}^{y=\infty} = \frac{\alpha\mu^n}{\alpha-n}   (5)
Problem 3.4.4 Solution
Since Y is an Erlang random variable with parameters λ = 2 and n = 2, we find in Appendix A
that
    f_Y(y) = \begin{cases} 4y e^{-2y} & y \geq 0 \\ 0 & \text{otherwise} \end{cases}   (1)
(a) Appendix A tells us that E[Y] = n/\lambda = 1.
(b) Appendix A also tells us that Var[Y] = n/\lambda^2 = 1/2.
(c) The probability that 1/2 \leq Y < 3/2 is
        P[1/2 \leq Y < 3/2] = \int_{1/2}^{3/2} f_Y(y)\,dy = \int_{1/2}^{3/2} 4y e^{-2y}\,dy   (2)
    This integral is easily completed using the integration by parts formula \int u\,dv = uv - \int v\,du with
        u = 2y \qquad dv = 2e^{-2y}\,dy
        du = 2\,dy \qquad v = -e^{-2y}
(b) For x < -5, F_X(x) = 0. For x > 5, F_X(x) = 1. For -5 \leq x \leq 5, the CDF is
        F_X(x) = \int_{-5}^{x} f_X(\tau)\,d\tau = \frac{x+5}{10}   (2)
    The complete expression for the CDF of X is
        F_X(x) = \begin{cases} 0 & x < -5 \\ (x+5)/10 & -5 \leq x \leq 5 \\ 1 & x > 5 \end{cases}   (3)
(d) The fifth moment of X is
        \int_{-5}^{5} \frac{x^5}{10}\,dx = \left. \frac{x^6}{60} \right|_{-5}^{5} = 0   (5)
a + b = 14 b−a=6 (2)
a=4 b = 10 (3)
(a)
    P[1 \leq X \leq 2] = \int_1^2 (1/2)e^{-x/2}\,dx = e^{-1/2} - e^{-1} = 0.2387   (2)
(c) X is an exponential random variable with parameter a = 1/2. By Theorem 3.8, the expected
value of X is E[X] = 1/a = 2.
Problem 3.4.8 Solution
Given the uniform PDF
    f_U(u) = \begin{cases} 1/(b-a) & a \leq u \leq b \\ 0 & \text{otherwise} \end{cases}   (1)
the mean of U can be found by integrating
    E[U] = \int_a^b \frac{u}{b-a}\,du = \frac{b^2 - a^2}{2(b-a)} = \frac{b+a}{2}   (2)
where we factored (b^2 - a^2) = (b-a)(b+a). The variance of U can also be found by finding E[U^2].
    E[U^2] = \int_a^b \frac{u^2}{b-a}\,du = \frac{b^3 - a^3}{3(b-a)}   (3)
Therefore the variance is
    Var[U] = \frac{b^3 - a^3}{3(b-a)} - \left( \frac{b+a}{2} \right)^2 = \frac{(b-a)^2}{12}   (4)
We will use CA (X) and CB (X) to denote the cost of a call under the two plans. From the problem
statement, we note that CA (X) = 10X so that E[CA (X)] = 10E[X] = 10τ . On the other hand
E [CB (X)] = 99 + 10E [(X − 20)|X > 20] P [X > 20] (6)
since given X ≥ 20, X −20 has a PDF identical to X by the memoryless property of the exponential
random variable. Thus,
E [CB (X)] = 99 + 10τ e−20/τ (8)
Some numeric comparisons show that E[CB (X)] ≤ E[CA (X)] if τ > 12.34 minutes. That is, the
flat price for the first 20 minutes is a good deal only if your average phone call is sufficiently long.
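As a numerical aside (ours, not part of the original solution), the crossover value \tau \approx 12.34 quoted above can be reproduced with a short Matlab search over candidate values of \tau:

%Hypothetical check (our own) of the claim E[CB] <= E[CA] for tau > 12.34:
tau=0.01:0.01:30;                 %candidate mean call durations (minutes)
ECA=10*tau;                       %plan A: 10 cents per minute
ECB=99+10*tau.*exp(-20./tau);     %plan B: flat 99 plus memoryless overage
crossover=min(tau(ECB<=ECA));     %smallest tau where plan B is no worse
disp(crossover)                   %should be approximately 12.34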
Problem 3.4.10 Solution
The integral I_1 is
    I_1 = \int_0^{\infty} \lambda e^{-\lambda x}\,dx = \left. -e^{-\lambda x} \right|_0^{\infty} = 1   (1)
The above marked integral equals 1 since it is the integral of an Erlang PDF with parameters \lambda and n + k over all possible values. Hence,
    E[X^k] = \frac{(n+k-1)!}{\lambda^k (n-1)!}   (3)
This implies that the first and second moments are
    E[X] = \frac{n!}{(n-1)!\,\lambda} = \frac{n}{\lambda} \qquad E[X^2] = \frac{(n+1)!}{\lambda^2 (n-1)!} = \frac{(n+1)n}{\lambda^2}   (4)
It follows that the variance of X is n/\lambda^2.
(a) By Definition 3.7, the CDF of the Erlang (n, \lambda) random variable X_n is
        F_{X_n}(x) = \int_{-\infty}^{x} f_{X_n}(t)\,dt = \int_0^x \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}\,dt.   (2)
(c) Now we do a proof by induction. For n = 1, the Erlang (n, \lambda) random variable X_1 is simply an exponential random variable. Hence for x \geq 0, F_{X_1}(x) = 1 - e^{-\lambda x}. Now we suppose the claim is true for F_{X_{n-1}}(x) so that
        F_{X_{n-1}}(x) = 1 - \sum_{k=0}^{n-2} \frac{(\lambda x)^k e^{-\lambda x}}{k!}.   (7)
By our induction hypothesis, E[X^{n-1}] = (n-1)!/\lambda^{n-1}, which implies
    E[X^n] = n!/\lambda^n   (5)
(a) Since f_X(x) \geq 0 and x \geq r over the entire integral, we can write
        \int_r^{\infty} x f_X(x)\,dx \geq \int_r^{\infty} r f_X(x)\,dx = r P[X > r]   (1)
Problem 3.5.2 Solution
The standard normal Gaussian random variable Z has mean µ = 0 and variance σ 2 = 1. Making
these substitutions in Definition 3.8 yields
    f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}   (1)
This implies Φ(10/σX ) = 0.55. Using Table 3.1 for the Gaussian CDF, we find that 10/σX = 0.15
or σX = 66.6.
implies that (10 − µT )/15 = 0 or µT = 10. Now we have a Gaussian T with mean 10 and standard
deviation 15. So we are prepared to answer the following problems.
    P[T > 32] = 1 - P[T \leq 32] = 1 - \Phi\!\left( \frac{32 - 10}{15} \right)   (3)
              = 1 - \Phi(1.45) = 1 - 0.926 = 0.074   (4)
    P[T < 0] = F_T(0) = \Phi\!\left( \frac{0 - 10}{15} \right)   (5)
             = \Phi(-2/3) = 1 - \Phi(2/3)   (6)
             = 1 - \Phi(0.67) = 1 - 0.749 = 0.251   (7)
    P[T > 60] = 1 - P[T \leq 60] = 1 - F_T(60)   (8)
              = 1 - \Phi\!\left( \frac{60 - 10}{15} \right) = 1 - \Phi(10/3)   (9)
              = Q(3.33) = 4.34 \times 10^{-4}   (10)
Problem 3.5.7 Solution
We are given that there are 100,000,000 men in the United States, that 23,000 of them are at least 7 feet tall, and that the heights of U.S. men are independent Gaussian random variables with mean 5' 10".
(a) Let H denote the height in inches of a U.S male. To find σX , we look at the fact that the
probability that P [H ≥ 84] is the number of men who are at least 7 feet tall divided by the
total number of men (the frequency interpretation of probability). Since we measure H in
inches, we have
        P[H \geq 84] = \frac{23{,}000}{100{,}000{,}000} = \Phi\!\left( \frac{70 - 84}{\sigma_X} \right) = 0.00023   (1)
(b) The probability that a randomly chosen man is at least 8 feet tall is
        P[H \geq 96] = Q\!\left( \frac{96 - 70}{4} \right) = Q(6.5)   (3)
    Unfortunately, Table 3.2 doesn't include Q(6.5), although it should be apparent that the probability is very small. In fact, Q(6.5) = 4.0 \times 10^{-11}.
(c) First we need to find the probability that a man is at least 7'6".
        P[H \geq 90] = Q\!\left( \frac{90 - 70}{4} \right) = Q(5) \approx 3 \times 10^{-7} = \beta   (4)
Although Table 3.2 stops at Q(4.99), if you’re curious, the exact value is Q(5) = 2.87 · 10−7 .
Now we can begin to find the probability that no man is at least 7’6”. This can be modeled
as 100,000,000 repetitions of a Bernoulli trial with parameter 1 − β. The probability that no
man is at least 7’6” is
(1 − β)100,000,000 = 9.4 × 10−14 (5)
(d) The expected value of N is just the number of trials multiplied by the probability that a man
is at least 7’6”.
E [N ] = 100,000,000 · β = 30 (6)
(a) Since Y is Gaussian (0, 1/\sqrt{2}), Y has variance 1/2 and
        f_Y(y) = \frac{1}{\sqrt{2\pi(1/2)}} e^{-y^2/[2(1/2)]} = \frac{1}{\sqrt{\pi}} e^{-y^2}.   (2)
    For y \geq 0, F_Y(y) = \int_{-\infty}^{y} f_Y(u)\,du = 1/2 + \int_0^y f_Y(u)\,du. Substituting f_Y(u) yields
        F_Y(y) = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_0^y e^{-u^2}\,du = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}(y).   (3)
(b) Since Y is Gaussian (0, 1/\sqrt{2}), Z = \sqrt{2}\,Y is Gaussian with expected value E[Z] = \sqrt{2}\,E[Y] = 0 and variance Var[Z] = 2\,Var[Y] = 1. Thus Z is Gaussian (0, 1) and
        \Phi(z) = F_Z(z) = P\!\left[ \sqrt{2}\,Y \leq z \right] = P\!\left[ Y \leq \frac{z}{\sqrt{2}} \right] = F_Y\!\left( \frac{z}{\sqrt{2}} \right) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left( \frac{z}{\sqrt{2}} \right)   (4)
(b) When we write I^2 as the product of integrals, we use y to denote the other variable of integration so that
        I^2 = \left( \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \right)\left( \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2}\,dy \right)   (3)
            = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy   (4)
Thus, from the problem statement, the BER is
    \bar{P}_e = E[P_e(Y)] = \int_{-\infty}^{\infty} Q(\sqrt{2y}) f_Y(y)\,dy = \int_0^{\infty} Q(\sqrt{2y})\,\frac{1}{\gamma} e^{-y/\gamma}\,dy   (2)
Like most integrals with exponential factors, it's a good idea to try integration by parts. Before doing so, we recall that if X is a Gaussian (0, 1) random variable with CDF F_X(x), then
Where FX (−1− ) denotes the limiting value of the CDF found by approaching −1 from the
left. Likewise, FX (−1+ ) is interpreted to be the value of the CDF found by approaching
−1 from the right. We notice that these two probabilities are the same and therefore the
probability that X is exactly −1 is zero.
(b)
P [X < 0] = FX 0− = 1/3 (3)
P [X ≤ 0] = FX (0) = 2/3 (4)
Here we see that there is a discrete jump at X = 0. Approached from the left the CDF yields
a value of 1/3 but approached from the right the value is 2/3. This means that there is a
non-zero probability that X = 0, in fact that probability is the difference of the two values.
(c)
P [0 < X ≤ 1] = FX (1) − FX 0+ = 1 − 2/3 = 1/3 (6)
P [0 ≤ X ≤ 1] = FX (1) − FX 0− = 1 − 1/3 = 2/3 (7)
The difference in the last two probabilities above is that the first was concerned with the
probability that X was strictly greater then 0, and the second with the probability that X
was greater than or equal to zero. Since the second event is a larger set (it includes the event X = 0), its probability should always be greater than or equal to the first probability.
The two differ by the probability that X = 0, and this difference is non-zero only when the
random variable exhibits a discrete jump in the CDF.
(b)
    P[X < 0] = F_X(0^-) = 1/2 \qquad P[X \leq 0] = F_X(0) = 1/2   (2)
(c)
(a) By taking the derivative of the CDF F_X(x) given in Problem 3.6.2, we obtain the PDF
        f_X(x) = \begin{cases} \frac{\delta(x+1)}{4} + \frac{1}{4} + \frac{\delta(x-1)}{4} & -1 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}   (1)
(b) The first moment of X is
        E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx   (2)
             = \left. \frac{x}{4} \right|_{x=-1} + \left. \frac{x^2}{8} \right|_{-1}^{1} + \left. \frac{x}{4} \right|_{x=1} = -1/4 + 0 + 1/4 = 0.   (3)
(a) Since the conversation time cannot be negative, we know that FW (w) = 0 for w < 0. The
conversation time W is zero iff either the phone is busy, no one answers, or if the conversation
time X of a completed call is zero. Let A be the event that the call is answered. Note that
the event Ac implies W = 0. For w ≥ 0,
        F_W(w) = P[A^c] + P[A] F_{W|A}(w) = (1/2) + (1/2)F_X(w)   (1)
    Thus the complete CDF of W is
        F_W(w) = \begin{cases} 0 & w < 0 \\ 1/2 + (1/2)F_X(w) & w \geq 0 \end{cases}   (2)
(b) By taking the derivative of F_W(w), the PDF of W is
        f_W(w) = \begin{cases} (1/2)\delta(w) + (1/2)f_X(w) & w \geq 0 \\ 0 & \text{otherwise} \end{cases}   (3)
Next, we keep in mind that since X must be nonnegative, fX (x) = 0 for x < 0. Hence,
The variance of W is
    Var[W] = E[W^2] - (E[W])^2 = E[X^2]/2 - (E[X]/2)^2   (7)
           = (1/2)\,Var[X] + (E[X])^2/4   (8)
Of course, for y < 60, P [D ≤ y|G] = 0. From the problem statement, if the throw is a foul, then
D = 0. This implies
P [D ≤ y|Gc ] = u(y) (3)
where u(·) denotes the unit step function. Since P [G] = 0.7, we can write
When we take the derivative, either expression for the CDF will yield the PDF. However, taking the derivative of the first expression is perhaps simpler:
    f_D(y) = \begin{cases} 0.3\,\delta(y) & y < 60 \\ 0.07\,e^{-(y-60)/10} & y \geq 60 \end{cases}   (7)
Taking the derivative of the second expression for the CDF is a little tricky because of the product of the exponential and the step function. However, applying the usual rule for the differentiation of a product does give the correct answer:
The middle term δ(y − 60)(1 − e−(y−60)/10 ) dropped out because at y = 60, e−(y−60)/10 = 1.
Likewise when the professor is more than 5 minutes late, the students leave and a 0 minute lecture
is observed. Since he is late 30% of the time and given that he is late, his arrival is uniformly
distributed between 0 and 10 minutes, the probability that there is no lecture is
The only other possible lecture durations are uniformly distributed between 75 and 80 minutes, because the students will not wait longer than 5 minutes, and that probability must add to a total of 1 - 0.7 - 0.15 = 0.15. So the PDF of T can be written as
    f_T(t) = \begin{cases} 0.15\,\delta(t) & t = 0 \\ 0.03 & 75 \leq t < 80 \\ 0.7\,\delta(t - 80) & t = 80 \\ 0 & \text{otherwise} \end{cases}   (3)
Since f_X(x) = 1 for 0 \leq x \leq 1, we see that for 0 \leq y < 1,
    P[X \leq \sqrt{y}] = \int_0^{\sqrt{y}} dx = \sqrt{y}   (2)
Problem 3.7.2 Solution
Since Y = \sqrt{X}, the fact that X is nonnegative and that we assume the square root is always positive implies F_Y(y) = 0 for y < 0. In addition, for y \geq 0, we can find the CDF of Y by writing
    F_Y(y) = P[Y \leq y] = P[\sqrt{X} \leq y] = P[X \leq y^2] = F_X(y^2)   (1)
Problem 3.7.4 Solution
From Problem 3.6.1, random variable X has CDF
    F_X(x) = \begin{cases} 0 & x < -1 \\ x/3 + 1/3 & -1 \leq x < 0 \\ x/3 + 2/3 & 0 \leq x < 1 \\ 1 & 1 \leq x \end{cases}   (1)
(a) We can find the CDF of Y , FY (y) by noting that Y can only take on two possible values, 0
and 100. And the probability that Y takes on these two values depends on the probability
that X < 0 and X ≥ 0, respectively. Therefore
        F_Y(y) = P[Y \leq y] = \begin{cases} 0 & y < 0 \\ P[X < 0] & 0 \leq y < 100 \\ 1 & y \geq 100 \end{cases}   (2)
The probabilities concerned with X can be found from the given CDF FX (x). This is the
general strategy for solving problems of this type: to express the CDF of Y in terms of the
CDF of X. Since P [X < 0] = FX (0− ) = 1/3, the CDF of Y is
        F_Y(y) = P[Y \leq y] = \begin{cases} 0 & y < 0 \\ 1/3 & 0 \leq y < 100 \\ 1 & y \geq 100 \end{cases}   (3)
(b) The CDF FY (y) has jumps of 1/3 at y = 0 and 2/3 at y = 100. The corresponding PDF of
Y is
fY (y) = δ(y)/3 + 2δ(y − 100)/3 (4)
[Figure: X = -\ln(1 - U) versus U for 0 \leq U \leq 1.]
(a) From the sketch, we observe that X will be nonnegative. Hence FX (x) = 0 for x < 0. Since
U has a uniform distribution on [0, 1], for 0 ≤ u ≤ 1, P [U ≤ u] = u. We use this fact to find
the CDF of X. For x ≥ 0,
FX (x) = P [− ln(1 − U ) ≤ x] = P 1 − U ≥ e−x = P U ≤ 1 − e−x (1)
102
For x ≥ 0, 0 ≤ 1 − e−x ≤ 1 and so
FX (x) = FU 1 − e−x = 1 − e−x (2)
Thus, X has an exponential PDF. In fact, since most computer languages provide uniform
[0, 1] random numbers, the procedure outlined in this problem provides a way to generate
exponential random variables from uniform random variables.
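A minimal Matlab sketch of this inverse-transform idea (our own illustration; the text's matcode archive provides an equivalent exponentialrv function):

%Generate m exponential (lambda=1) samples from uniform (0,1) samples
%via X = -ln(1-U), as derived above (illustrative sketch, ours).
m=10000;
u=rand(m,1);        %uniform (0,1) samples
x=-log(1-u);        %exponential (1) samples
mean(x)             %should be close to E[X]=1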
Therefore, for 0 ≤ y ≤ 1,
P [Y ≤ y] = P [g(X) ≤ y] = y 3 (3)
Thus, using g(X) = X 1/3 , we see that for 0 ≤ y ≤ 1,
P [g(X) ≤ y] = P X 1/3 ≤ y = P X ≤ y 3 = y 3 (4)
(a)
    P[L = 0.5] = P[|V| \geq 0.5] = P[V \geq 0.5] + P[V \leq -0.5]   (3)
               = 1 - F_V(0.5) + F_V(-0.5)   (4)
               = 1 - 1.5/2 + 0.5/2 = 1/2   (5)
Problem 3.7.9 Solution
The uniform (0, 2) random variable U has PDF and CDF
    f_U(u) = \begin{cases} 1/2 & 0 \leq u \leq 2 \\ 0 & \text{otherwise} \end{cases}
    \qquad
    F_U(u) = \begin{cases} 0 & u < 0 \\ u/2 & 0 \leq u < 2 \\ 1 & u > 2 \end{cases}   (1)
To find the CDF of the output of the clipper, W , we remember that W = U for 0 ≤ U ≤ 1
while W = 1 for 1 ≤ U ≤ 2. First, this implies W is nonnegative, i.e., FW (w) = 0 for w < 0.
Furthermore, for 0 ≤ w ≤ 1,
Lastly, we observe that it is always true that W ≤ 1. This implies FW (w) = 1 for w ≥ 1. Therefore
the CDF of W is
    F_W(w) = \begin{cases} 0 & w < 0 \\ w/2 & 0 \leq w < 1 \\ 1 & w \geq 1 \end{cases}   (4)
From the jump in the CDF at w = 1, we see that P[W = 1] = 1/2. The corresponding PDF can be found by taking the derivative and using the delta function to model the discontinuity.
    f_W(w) = \begin{cases} 1/2 + (1/2)\delta(w - 1) & 0 \leq w \leq 1 \\ 0 & \text{otherwise} \end{cases}   (5)
we follow the same procedure as in Problem 3.7.4. We attempt to express the CDF of Y in terms
of the CDF of X. We know that Y is never less than -10. We also know that Y = -10 when X \geq 0, and finally, that Y = 10 when X < 0. Therefore
    F_Y(y) = P[Y \leq y] = \begin{cases} 0 & y < -10 \\ P[X \geq 0] = 1 - F_X(0) & -10 \leq y < 10 \\ 1 & y \geq 10 \end{cases}   (2)
Problem 3.7.11 Solution
The PDF of U is
    f_U(u) = \begin{cases} 1/2 & -1 \leq u \leq 1 \\ 0 & \text{otherwise} \end{cases}   (1)
Since W \geq 0, we see that F_W(w) = 0 for w < 0. Next, we observe that the rectifier output W is a mixed random variable since
    P[W = 0] = P[U < 0] = \int_{-1}^{0} f_U(u)\,du = 1/2   (2)
Finally, U \leq 1 implies W \leq 1, which implies F_W(w) = 1 for w \geq 1. Hence, the complete expression for the CDF is
    F_W(w) = \begin{cases} 0 & w < 0 \\ (w+1)/2 & 0 \leq w \leq 1 \\ 1 & w > 1 \end{cases}   (5)
By taking the derivative of the CDF, we find the PDF of W; however, we must keep in mind that the discontinuity in the CDF at w = 0 yields a corresponding impulse in the PDF.
    f_W(w) = \begin{cases} (\delta(w) + 1)/2 & 0 \leq w \leq 1 \\ 0 & \text{otherwise} \end{cases}   (6)
Perhaps an easier way to find the expected value is to use Theorem 2.10. In this case,
    E[W] = \int_{-\infty}^{\infty} g(u) f_U(u)\,du = \int_0^1 u(1/2)\,du = 1/4   (8)
(b) Using Theorem 3.19, the PDF of Y = aX is
        f_Y(y) = \frac{1}{a} f_X(y/a) = \begin{cases} (\lambda/a) e^{-\lambda(y/a)} & y/a \geq 0 \\ 0 & \text{otherwise} \end{cases}   (3)
               = \begin{cases} (\lambda/a) e^{-(\lambda/a)y} & y \geq 0 \\ 0 & \text{otherwise} \end{cases}   (4)
    Hence Y is an exponential (\lambda/a) random variable.
(c) Using Theorem 3.19, the PDF of Y = aX is
        f_Y(y) = \frac{1}{a} f_X(y/a) = \begin{cases} \frac{\lambda^n (y/a)^{n-1} e^{-\lambda(y/a)}}{a\,(n-1)!} & y/a \geq 0 \\ 0 & \text{otherwise} \end{cases}   (5)
               = \begin{cases} \frac{(\lambda/a)^n y^{n-1} e^{-(\lambda/a)y}}{(n-1)!} & y \geq 0 \\ 0 & \text{otherwise} \end{cases}   (6)
    which is an Erlang (n, \lambda/a) PDF.
(d) If X is a Gaussian (\mu, \sigma) random variable, then Y = aX has PDF
        f_Y(y) = \frac{1}{a} f_X(y/a) = \frac{1}{a\sqrt{2\pi\sigma^2}} e^{-((y/a)-\mu)^2/2\sigma^2}   (7)
               = \frac{1}{\sqrt{2\pi a^2\sigma^2}} e^{-(y-a\mu)^2/2(a^2\sigma^2)}   (8)
    Thus Y is a Gaussian random variable with expected value E[Y] = a\mu and Var[Y] = a^2\sigma^2. That is, Y is a Gaussian (a\mu, a\sigma) random variable.
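A short Monte Carlo check of this scaling property (our own sketch; it assumes the gaussrv(mu,sigma2,m) helper used elsewhere in these solutions is on the path):

%Sketch (ours): verify that Y=aX is Gaussian(a*mu, a*sigma) when X is Gaussian(mu,sigma)
a=3; mu=2; sigma=1.5;
x=gaussrv(mu,sigma^2,100000);   %gaussrv takes the variance as its second argument
y=a*x;
[mean(y) a*mu; std(y) a*sigma]  %sample versus theoretical mean and standard deviation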
Problem 3.7.14 Solution
Since X = F^{-1}(U), it is desirable that the function F^{-1}(u) exist for all 0 \leq u \leq 1. However, for the continuous uniform random variable U, P[U = 0] = P[U = 1] = 0. Thus, it is a zero probability event that F^{-1}(U) will be evaluated at U = 0 or U = 1. As a result, it doesn't matter whether F^{-1}(u) exists at u = 0 or u = 1.
[Figure: Y as a function of X, for 0 \leq X \leq 3 and 0 \leq Y \leq 3.]
(b) Since Y ≥ 1/2, we can conclude that FY (y) = 0 for y < 1/2. Also, FY (1/2) = P [Y = 1/2] =
1/4. Similarly, for 1/2 < y ≤ 1,
Similarly, for the case when a < 0 we have
    F_Y(y) = P[Y \leq y] = P\!\left[ X \geq \frac{y-b}{a} \right] = 1 - F_X\!\left( \frac{y-b}{a} \right)   (3)
And by taking the derivative, we find that for negative a,
    f_Y(y) = -\frac{1}{a} f_X\!\left( \frac{y-b}{a} \right) \qquad a < 0   (4)
A valid expression for both positive and negative a is
    f_Y(y) = \frac{1}{|a|} f_X\!\left( \frac{y-b}{a} \right)   (5)
Therefore the assertion is proved.
(a) Given F_X(x) is a continuous function, there exists x_0 such that F_X(x_0) = u. For each value of u, the corresponding x_0 is unique. To see this, suppose there were also x_1 such that F_X(x_1) = u. Without loss of generality, we can assume x_1 > x_0 since otherwise we could exchange the points x_0 and x_1. Since F_X(x_0) = F_X(x_1) = u, the fact that F_X(x) is nondecreasing implies F_X(x) = u for all x \in [x_0, x_1], i.e., F_X(x) is flat over the interval [x_0, x_1], which contradicts the assumption that F_X(x) has no flat intervals. Thus, for any u \in (0, 1), there is a unique x_0 such that F_X(x_0) = u. Moreover, the same x_0 is the minimum of all x' such that F_X(x') \geq u. The uniqueness of x_0 such that F_X(x_0) = u permits us to define \tilde{F}(u) = x_0 = F_X^{-1}(u).
(b) In this part, we are given that F_X(x) has a jump discontinuity at x_0. That is, there exist u_0^- = F_X(x_0^-) and u_0^+ = F_X(x_0^+) with u_0^- < u_0^+. Consider any u in the interval [u_0^-, u_0^+]. Since F_X(x_0) = F_X(x_0^+) and F_X(x) is nondecreasing,
        F_X(x) \geq F_X(x_0) = u_0^+, \qquad x \geq x_0.   (1)
    Moreover,
        F_X(x) < F_X(x_0^-) = u_0^-, \qquad x < x_0.   (2)
    Thus for any u satisfying u_0^- \leq u \leq u_0^+, F_X(x) < u for x < x_0 and F_X(x) \geq u for x \geq x_0.
(c) We note that the first two parts of this problem were just designed to show the properties of \tilde{F}(u). First, we observe that
        P[\hat{X} \leq x] = P[\tilde{F}(U) \leq x] = P\!\left[ \min\{x' \mid F_X(x') \geq U\} \leq x \right].   (3)
    Note that P[A] = P[\hat{X} \leq x]. In addition, P[B] = P[U \leq F_X(x)] = F_X(x) since P[U \leq u] = u for any u \in [0, 1].
    We will show that the events A and B are the same. This fact implies
        P[\hat{X} \leq x] = P[A] = P[B] = P[U \leq F_X(x)] = F_X(x).   (6)
    All that remains is to show A and B are the same. As always, we need to show that A \subset B and that B \subset A.
    • To show A \subset B, suppose A is true and \min\{x' \mid F_X(x') \geq U\} \leq x. This implies there exists x_0 \leq x such that F_X(x_0) \geq U. Since x_0 \leq x, it follows from F_X(x) being nondecreasing that F_X(x_0) \leq F_X(x). We can thus conclude that U \leq F_X(x_0) \leq F_X(x), which is event B.
    • To show B \subset A, suppose B is true, i.e., U \leq F_X(x), and let L = \{x' \mid F_X(x') \geq U\}. We note x \in L. It follows that the minimum element satisfies \min\{x' \mid x' \in L\} \leq x. That is,
        \min\{x' \mid F_X(x') \geq U\} \leq x,   (9)
(a) The event B has probability
        P[B] = P[-3 \leq X \leq 3] = \int_{-3}^{3} \frac{1}{10}\,dx = \frac{3}{5}   (2)
(b) Given B, we see that X has a uniform PDF over [a, b] with a = −3 and b = 3. From
Theorem 3.6, the conditional expected value of X is E[X|B] = (a + b)/2 = 0.
(c) From Theorem 3.6, the conditional variance of X is Var[X|B] = (b − a)2 /12 = 3.
    = \frac{5 - 7e^{-2/5}}{1 - e^{-2/5}}   (8)
Problem 3.8.3 Solution
The conditioning event, the right side of the circle, is R = [0, 1/2]. Using the PDF in Example 3.5, we have
    P[R] = \int_0^{1/2} f_Y(y)\,dy = \int_0^{1/2} 3y^2\,dy = 1/8   (1)
We observe that w^2 f_W(w) is an even function. Hence
    E[W^2|C] = 2\int_0^{\infty} w^2 f_W(w)\,dw   (6)
             = \int_{-\infty}^{\infty} w^2 f_W(w)\,dw = E[W^2] = \sigma^2 = 16   (7)
Now we can calculate the conditional variance.
Var[T |T > 0.02] = E T 2 |T > 0.02 − (E [T |T > 0.02])2 (11)
= E (T + 0.02)2 − (E [T + 0.02])2 (12)
= Var[T + 0.02] (13)
= Var[T ] = 0.01 (14)
(b) If instead we learn that D \leq 70, we can calculate the conditional PDF by first calculating
        P[D \leq 70] = \int_0^{70} f_D(y)\,dy   (4)
                     = \int_0^{60} 0.3\,\delta(y)\,dy + \int_{60}^{70} 0.07\,e^{-(y-60)/10}\,dy   (5)
                     = 0.3 + \left. -0.7e^{-(y-60)/10} \right|_{60}^{70} = 1 - 0.7e^{-1}   (6)
(a) Given that a person is healthy, X is a Gaussian (\mu = 90, \sigma = 20) random variable. Thus,
        f_{X|H}(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2} = \frac{1}{20\sqrt{2\pi}} e^{-(x-90)^2/800}   (1)
(b) Given the event H, we use the conditional PDF f_{X|H}(x) to calculate the required probabilities
        P[T^+|H] = P[X \geq 140|H] = P[X - 90 \geq 50|H]   (2)
                 = P\!\left[ \frac{X-90}{20} \geq 2.5 \,\Big|\, H \right] = 1 - \Phi(2.5) = 0.006   (3)
    Similarly,
        P[T^-|H] = P[X \leq 110|H] = P[X - 90 \leq 20|H]   (4)
                 = P\!\left[ \frac{X-90}{20} \leq 1 \,\Big|\, H \right] = \Phi(1) = 0.841   (5)
Thus,
    P[H|T^-] = \frac{P[T^-|H]\,P[H]}{P[T^-|D]\,P[D] + P[T^-|H]\,P[H]}   (10)
             = \frac{0.841(0.9)}{0.106(0.1) + 0.841(0.9)} = 0.986   (11)
We say that a test is a failure if the result is T 0 . Thus, given the event H, each test has
conditional failure probability of q = 0.153, or success probability p = 1 − q = 0.847. Given
H, the number of trials N until a success is a geometric (p) random variable with PMF
        P_{N|H}(n) = \begin{cases} (1-p)^{n-1} p & n = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}   (13)
(a) The event Bi that Y = ∆/2 + i∆ occurs if and only if i∆ ≤ X < (i + 1)∆. In particular,
    since X has the uniform (-r/2, r/2) PDF
        f_X(x) = \begin{cases} 1/r & -r/2 \leq x < r/2 \\ 0 & \text{otherwise} \end{cases}   (1)
    we observe that
        P[B_i] = \int_{i\Delta}^{(i+1)\Delta} \frac{1}{r}\,dx = \frac{\Delta}{r}   (2)
    In addition, the conditional PDF of X given B_i is
        f_{X|B_i}(x) = \begin{cases} f_X(x)/P[B_i] & x \in B_i \\ 0 & \text{otherwise} \end{cases} = \begin{cases} 1/\Delta & i\Delta \leq x < (i+1)\Delta \\ 0 & \text{otherwise} \end{cases}   (3)
    It follows that given B_i, Z = X - Y = X - \Delta/2 - i\Delta, which is a uniform (-\Delta/2, \Delta/2) random variable. That is,
        f_{Z|B_i}(z) = \begin{cases} 1/\Delta & -\Delta/2 \leq z < \Delta/2 \\ 0 & \text{otherwise} \end{cases}   (4)
(b) We observe that f_{Z|B_i}(z) is the same for every i. Thus, we can write
        f_Z(z) = \sum_i P[B_i] f_{Z|B_i}(z) = f_{Z|B_0}(z) \sum_i P[B_i] = f_{Z|B_0}(z)   (5)
    Thus, Z is a uniform (-\Delta/2, \Delta/2) random variable. From the definition of a uniform (a, b) random variable, Z has mean and variance
        E[Z] = 0, \qquad Var[Z] = \frac{(\Delta/2 - (-\Delta/2))^2}{12} = \frac{\Delta^2}{12}.   (6)
Problem 3.9.1 Solution
Taking the derivative of the CDF F_Y(y) in Quiz 3.1, we obtain
    f_Y(y) = \begin{cases} 1/4 & 0 \leq y \leq 4 \\ 0 & \text{otherwise} \end{cases}   (1)
We see that Y is a uniform (0, 4) random variable. By Theorem 3.20, if X is a uniform (0, 1)
random variable, then Y = 4X is a uniform (0, 4) random variable. Using rand as Matlab’s
uniform (0, 1) random variable, the program quiz31rv is essentially a one line program:
function y=quiz31rv(m)
%Usage y=quiz31rv(m)
%Returns the vector y holding m
%samples of the uniform (0,4) random
%variable Y of Quiz 3.1
y=4*rand(m,1);
The commands
x=modemrv(10000); hist(x,100);
generate 10,000 samples of the modem receiver voltage and plot the relative frequencies using 100 bins. Here is an example plot:

[Figure: histogram of 10,000 samples of X, relative frequency versus x for -15 \leq x \leq 15.]

As expected, the result is qualitatively similar ("hills" around X = -5 and X = 5) to the sketch in Figure 3.3.
Problem 3.9.3 Solution
The code for Q̂(z) is the Matlab function
function p=qapprox(z);
%approximation to the Gaussian
% (0,1) complementary CDF Q(z)
t=1./(1.0+(0.231641888.*z(:)));
a=[0.127414796; -0.142248368; 0.7107068705; ...
-0.7265760135; 0.5307027145];
p=([t t.^2 t.^3 t.^4 t.^5]*a).*exp(-(z(:).^2)/2);
This code generates two plots of the relative error e(z) as a function of z:
z=0:0.02:6;
q=1.0-phi(z(:));
qhat=qapprox(z);
e=(q-qhat)./q;
plot(z,e); figure;
semilogy(z,abs(e));
[Figure: left, e(z) versus z on a linear scale; right, |e(z)| versus z on a log scale, for 0 \leq z \leq 6.]

The left side plot graphs e(z) versus z. It appears that e(z) = 0 for z \leq 3. In fact, e(z) is nonzero over that range, but the relative error is so small that it isn't visible in comparison to e(6) \approx -3.5 \times 10^{-3}. To see the error for small z, the right hand graph plots |e(z)| versus z in log scale where we observe very small relative errors on the order of 10^{-7}.
function k=georv(p,m);
lambda= -log(1-p);
k=ceil(exponentialrv(lambda,m));
To compare this technique with that used in geometricrv.m, we first examine the code for exponentialrv.m:
function x=exponentialrv(lambda,m)
x=-(1/lambda)*log(1-rand(m,1));
To analyze how m = 1 random sample is generated, let R = rand(1,1). In terms of mathematics, exponentialrv(lambda,1) generates the random variable
    X = -\frac{\ln(1-R)}{\lambda}   (1)
For \lambda = -\ln(1-p), we have that
    K = \lceil X \rceil = \left\lceil \frac{\ln(1-R)}{\ln(1-p)} \right\rceil   (2)
This is precisely the same function implemented by geometricrv.m. In short, the two methods for generating geometric (p) random samples are one and the same.
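A direct way to see the equivalence (our own sketch, not from the text) is to reset Matlab's uniform generator to the same state before each call, so both functions consume identical rand values; the older rand('state',...) seeding syntax is assumed here.

%Sketch (ours): georv and geometricrv should produce identical samples when
%driven by the same underlying uniform random numbers.
p=0.2; m=5;
rand('state',0); k1=georv(p,m);          %exponential-then-ceiling method above
rand('state',0); k2=geometricrv(p,m);    %method from the matsoln archive
disp([k1 k2])                            %the two columns should match exactly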
function w=iwcdf(u);
w=((u>=0).*(u <= 0.25).*(8*u-5))+...
((u > 0.25).*(u<=1).*((8*u+7)/3));
function exponentialtest(lambda,n)
delta=0.01;
x=exponentialrv(lambda,n);
xr=(0:delta:(5.0/lambda))’;
fxsample=(histc(x,xr)/(n*delta));
fx=exponentialpdf(lambda,xr);
plot(xr,fx,xr,fxsample);
generates n samples of an exponential λ random variable and plots the relative frequency
ni /(n∆) against the corresponding exponential PDF. Note that the histc function generates
a histogram using xr to define the edges of the bins. Two representative plots for n = 1,000
and n = 100,000 samples appear in the following figure:
[Figure: exponential PDF versus relative frequency, for exponentialtest(1,1000) (left) and exponentialtest(1,100000) (right).]

For n = 1,000, the jaggedness of the relative frequency occurs because \Delta is sufficiently small that the number of samples of X in each bin i\Delta < X \leq (i+1)\Delta is fairly small. For n = 100,000, the greater smoothness of the curve demonstrates how the relative frequency is becoming a better approximation to the actual PDF.
(b) Similar results hold for Gaussian random variables. The following code generates the same
comparison between the Gaussian PDF and the relative frequency of n samples.
function gausstest(mu,sigma2,n)
delta=0.01;
x=gaussrv(mu,sigma2,n);
xr=(0:delta:(mu+(3*sqrt(sigma2))))’;
fxsample=(histc(x,xr)/(n*delta));
fx=gausspdf(mu,sigma2,xr);
plot(xr,fx,xr,fxsample);
[Figure: Gaussian PDF versus relative frequency, for gausstest(3,1,1000) (left) and gausstest(3,1,100000) (right).]
function y=uquantize(r,b,x)
%uniform (-r/2,r/2) b bit quantizer
n=2^b;
delta=r/n;
x=min(x,(r-delta/2)/2);
x=max(x,-(r-delta/2)/2);
y=(delta/2)+delta*floor(x/delta);
Note that if |x| > r/2, then x is truncated so that the quantizer output has maximum amplitude.
Next, we generate Gaussian samples, quantize them and record the errors:
function stdev=quantizegauss(r,b,m)
x=gaussrv(0,1,m);
x=x((x<=r/2)&(x>=-r/2));
y=uquantize(r,b,x);
z=x-y;
hist(z,100);
stdev=sqrt(sum(z.^2)/length(z));
For a Gaussian random variable X, P [|X| > r/2] > 0 for any value of r. When we generate enough
Gaussian samples, we will always see some quantization errors due to the finite (−r/2, r/2) range.
To focus our attention on the effect of b bit quantization, quantizegauss.m eliminates Gaussian
samples outside the range (−r/2, r/2). Here are outputs of quantizegauss for b = 1, 2, 3 bits.
[Figure: histograms of the quantization error z = x - y for b = 1, 2, 3 bits.]

It is obvious that for b = 1 bit quantization, the error is decidedly not uniform. However, it appears that the error is uniform for b = 2 and b = 3. You can verify that a uniform error is a reasonable model for larger values of b.
    F_U(u) = \begin{cases} 0 & u < -5 \\ (u+5)/8 & -5 \leq u < -3 \\ 1/4 & -3 \leq u < 3 \\ 1/4 + 3(u-3)/8 & 3 \leq u < 5 \\ 1 & u \geq 5 \end{cases}   (1)
At x = 1/4, there are multiple values of u such that F_U(u) = 1/4. However, except for x = 1/4, the inverse F_U^{-1}(x) is well defined over 0 < x < 1. At x = 1/4, we can arbitrarily define a value for F_U^{-1}(1/4) because when we produce sample values of F_U^{-1}(X), the event X = 1/4 has probability zero. To generate the inverse CDF, given a value of x, 0 < x < 1, we have to find the value of u such that x = F_U(u). From the CDF we see that
    0 \leq x \leq \frac{1}{4} \quad \Rightarrow \quad x = \frac{u+5}{8}   (2)
    \frac{1}{4} < x \leq 1 \quad \Rightarrow \quad x = \frac{1}{4} + \frac{3}{8}(u-3)   (3)
These conditions can be inverted to express u as a function of x.
    u = F^{-1}(x) = \begin{cases} 8x - 5 & 0 \leq x \leq 1/4 \\ (8x+7)/3 & 1/4 < x \leq 1 \end{cases}   (5)
In particular, when X is a uniform (0, 1) random variable, U = F^{-1}(X) will generate samples of the random variable U. A Matlab program to implement this solution is now straightforward:
function u=urv(m)
%Usage: u=urv(m)
%Generates m samples of the random
%variable U defined in Problem 3.3.7
x=rand(m,1);
u=(x<=1/4).*(8*x-5);
u=u+(x>1/4).*(8*x+7)/3;
To see that this generates the correct output, we can generate a histogram of a million sample
values of U using the commands
u=urv(1000000); hist(u,100);
The output is shown in the following graph, alongside the corresponding PDF of U.

[Figure: histogram of 10^6 samples of U (bin counts on the order of 10^4) next to the PDF]
    f_U(u) = \begin{cases} 0 & u < -5 \\ 1/8 & -5 \leq u < -3 \\ 0 & -3 \leq u < 3 \\ 3/8 & 3 \leq u < 5 \\ 0 & u \geq 5 \end{cases}   (6)
Note that the scaling constant 10^4 on the histogram plot comes from the fact that the histogram was generated using 10^6 sample points and 100 bins. The width of each bin is \Delta = 10/100 = 0.1. Consider a bin of width \Delta centered at u_0. A sample value of U would fall in that bin with probability f_U(u_0)\Delta. Given that we generate m = 10^6 samples, we would expect about m f_U(u_0)\Delta = 10^5 f_U(u_0) samples in each bin. For -5 < u_0 < -3, we would expect to see about 1.25 \times 10^4 samples in each bin. For 3 < u_0 < 5, we would expect to see about 3.75 \times 10^4 samples in each bin. As can be seen, these conclusions are consistent with the histogram data.
Finally, we comment that if you generate histograms for a range of values of m, the number of samples, you will see that the histograms become more and more similar to a scaled version of the PDF. This gives the (false) impression that any bin centered on u_0 has a number of samples increasingly close to m f_U(u_0)\Delta. Because the histogram is always the same height, what is actually happening is that the vertical axis is effectively scaled by 1/m and the height of a histogram bar is proportional to the fraction of m samples that land in that bin. We will see in Chapter 7 that the fraction of samples in a bin does converge to the probability of a sample being in that bin as the number of samples m goes to infinity.
    F_X(x) = \begin{cases} 0 & x < -1 \\ (x+1)/4 & -1 \leq x < 1 \\ 1 & x \geq 1 \end{cases}   (1)
Following the procedure outlined in Problem 3.7.18, we define for 0 < u \leq 1,
    \tilde{F}(u) = \min\{x \mid F_X(x) \geq u\}.   (2)
We observe that if 0 < u < 1/4, then we can choose x so that F_X(x) = u. In this case, (x+1)/4 = u, or equivalently, x = 4u - 1. For 1/4 \leq u \leq 1, the minimum x that satisfies F_X(x) \geq u is x = 1. These facts imply
    \tilde{F}(u) = \begin{cases} 4u - 1 & 0 < u < 1/4 \\ 1 & 1/4 \leq u \leq 1 \end{cases}   (3)
It follows that if U is a uniform (0, 1) random variable, then F̃ (U ) has the same CDF as X. This
is trivial to implement in Matlab.
function x=quiz36rv(m)
%Usage x=quiz36rv(m)
%Returns the vector x holding m samples
%of the random variable X of Quiz 3.6
u=rand(m,1);
x=((4*u-1).*(u< 0.25))+(1.0*(u>=0.25));
Problem Solutions – Chapter 4
(a) The probability P [X ≤ 2, Y ≤ 3] can be found be evaluating the joint CDF FX,Y (x, y) at
x = 2 and y = 3. This yields
(b) To find the marginal CDF of X, F_X(x), we simply evaluate the joint CDF at y = \infty.
        F_X(x) = F_{X,Y}(x, \infty) = \begin{cases} 1 - e^{-x} & x \geq 0 \\ 0 & \text{otherwise} \end{cases}   (2)
(c) Likewise for the marginal CDF of Y, we evaluate the joint CDF at x = \infty.
        F_Y(y) = F_{X,Y}(\infty, y) = \begin{cases} 1 - e^{-y} & y \geq 0 \\ 0 & \text{otherwise} \end{cases}   (3)
(a) Because the probability that any random variable is less than −∞ is zero, we have
(b) The probability that any random variable is less than infinity is always one.
(d) Part (d) follows the same logic as that of part (a).
Problem 4.1.3 Solution
We wish to find P[x_1 \leq X \leq x_2 \text{ or } y_1 \leq Y \leq y_2]. We define events A = \{x_1 \leq X \leq x_2\} and B = \{y_1 \leq Y \leq y_2\} so that
P [A ∪ B] = P [A] + P [B] − P [AB] (1)
Keep in mind that the intersection of events A and B are all the outcomes such that both A and
B occur, specifically, AB = {x1 ≤ X ≤ x2 , y1 ≤ Y ≤ y2 }. It follows that
P [A ∪ B] = P [x1 ≤ X ≤ x2 ] + P [y1 ≤ Y ≤ y2 ]
− P [x1 ≤ X ≤ x2 , y1 ≤ Y ≤ y2 ] . (2)
By Theorem 4.5,
P [x1 ≤ X ≤ x2 , y1 ≤ Y ≤ y2 ]
= FX,Y (x2 , y2 ) − FX,Y (x2 , y1 ) − FX,Y (x1 , y2 ) + FX,Y (x1 , y1 ) . (3)
Expressed in terms of the marginal and joint CDFs,
P [A ∪ B] = FX (x2 ) − FX (x1 ) + FY (y2 ) − FY (y1 ) (4)
− FX,Y (x2 , y2 ) + FX,Y (x2 , y1 ) + FX,Y (x1 , y2 ) − FX,Y (x1 , y1 ) (5)
[Figure: three regions of the X, Y plane, labeled A, B, and C, bounded by x_1, x_2, y_1, and y_2.]
(b) In terms of the joint CDF FX,Y (x, y), we can write
However,
Thus, we have the contradiction that e−(x+y) ≤ 0 for all x, y ≥ 0. We can conclude that the given
function is not a valid CDF.
Problem 4.2.1 Solution
In this problem, it is helpful to label points with nonzero probability on the X, Y plane:
y P
X,Y (x, y)
4 6
3 •3c •6c •12c
2
1 •c •2c •4c
0 - x
0 1 2 3 4
Thus c = 1/28.
(d) There are two ways to solve this part. The direct way is to calculate
        P[Y = X] = \sum_{x=1,2,4} \sum_{y=x} P_{X,Y}(x, y) = \frac{1(1) + 2(0)}{28} = \frac{1}{28}   (5)
    The indirect way is to use the previous results and the observation that
(e)
        P[Y = 3] = \sum_{x=1,2,4} P_{X,Y}(x, 3) = \frac{(1)(3) + (2)(3) + (4)(3)}{28} = \frac{21}{28} = \frac{3}{4}   (7)
y
PX,Y (x, y) 6
•c 1 •c •3c
•2c •2c - x
1 2
•3c •c •c
?
(a) To find c, we sum the PMF over all possible values of X and Y . We choose c so the sum
equals one.
PX,Y (x, y) = c |x + y| = 6c + 2c + 6c = 14c (1)
x y x=−2,0,2 y=−1,0,1
Thus c = 1/14.
(b)
P [Y < X] = PX,Y (0, −1) + PX,Y (2, −1) + PX,Y (2, 0) + PX,Y (2, 1) (2)
= c + c + 2c + 3c = 7c = 1/2 (3)
(c)
P [Y > X] = PX,Y (−2, −1) + PX,Y (−2, 0) + PX,Y (−2, 1) + PX,Y (0, 1) (4)
= 3c + 2c + c + c = 7c = 1/2 (5)
(e)
[Tree diagram: the first component is r with probability p or a with probability 1-p, and likewise for the second component, giving outcomes rr, ra, ar, aa with probabilities p^2, p(1-p), p(1-p), (1-p)^2.]
Now we construct a table that maps the sample outcomes to values of X and Y .
outcome P [·] X Y
rr p 2 1 1
ra p(1 − p) 1 0 (1)
ar p(1 − p) 0 1
aa (1 − p)2 0 0
outcome X Y
hh 0 1
ht 1 0 (1)
th 1 1
tt 2 0
PX,Y (x, y) y = 0 y = 1
x=0 0 1/4
(2)
x=1 1/4 1/4
x=2 1/4 0
• Lowercase axis labels: For the lowercase labels, we observe that we are depicting the masses
associated with the joint PMF PX,Y (x, y) whose arguments are x and y. Since the PMF
function is defined in terms of x and y, the axis labels should be x and y.
• Uppercase axis labels: On the other hand, we are depicting the possible outcomes (labeled with
their respective probabilities) of the pair of random variables X and Y . The corresponding
axis labels should be X and Y just as in Figure 4.2. The fact that we have labeled the
possible outcomes by their probabilities is irrelevant. Further, since the expression for the
PMF PX,Y (x, y) given in the figure could just as well have been written PX,Y (·, ·), it is clear
that the lowercase x and y are not what matter.
Problem 4.2.6 Solution
As the problem statement indicates, Y = y < n if and only if
A: the first y tests are acceptable, and
B: test y + 1 is a rejection.
Thus P [Y = y] = P [AB]. Note that Y ≤ X since the number of acceptable tests before the first
failure cannot exceed the number of acceptable circuits. Moreover, given the occurrence of AB, the
event X = x < n occurs if and only if there are x − y acceptable circuits in the remaining n − y − 1
tests. Since events A, B and C depend on disjoint sets of tests, they are independent events. Thus,
for 0 ≤ y ≤ x < n,
The case y = x = n occurs when all n tests are acceptable and thus PX,Y (n, n) = pn .
A: The first x tests must be acceptable.
B: Test x + 1 must be a rejection since otherwise we would have x + 1 acceptable tests at the beginning.
Since the events A, B and C are independent, the joint PMF for x + k \leq r, x \geq 0 and k \geq 0 is
    P_{K,X}(k, x) = \underbrace{p^x}_{P[A]} \underbrace{(1-p)}_{P[B]} \underbrace{\binom{n-x-1}{k-1} (1-p)^{k-1} p^{n-x-1-(k-1)}}_{P[C]}   (1)
2
1 •c 2c
• •
4c
0 - x
0 1 2 3 4
(c) The second moments are
        E[X^2] = \sum_{x=1,2,4} x^2 P_X(x) = 1^2(4/28) + 2^2(8/28) + 4^2(16/28) = 73/7   (5)
        E[Y^2] = \sum_{y=1,3} y^2 P_Y(y) = 1^2(7/28) + 3^2(21/28) = 7   (6)
(c) Since X and Y both have zero mean, the variances are
        Var[X] = E[X^2] = \sum_{x=-2,0,2} x^2 P_X(x) = (-2)^2(6/14) + 2^2(6/14) = 24/7   (5)
        Var[Y] = E[Y^2] = \sum_{y=-1,0,1} y^2 P_Y(y) = (-1)^2(5/14) + 1^2(5/14) = 5/7   (6)
    The standard deviations are \sigma_X = \sqrt{24/7} and \sigma_Y = \sqrt{5/7}.
Problem 4.3.3 Solution
We recognize that the given joint PMF is written as the product of two marginal PMFs P_N(n) and P_K(k) where
    P_N(n) = \sum_{k=0}^{100} P_{N,K}(n, k) = \begin{cases} \frac{100^n e^{-100}}{n!} & n = 0, 1, \ldots \\ 0 & \text{otherwise} \end{cases}   (1)
    P_K(k) = \sum_{n=0}^{\infty} P_{N,K}(n, k) = \begin{cases} \binom{100}{k} p^k (1-p)^{100-k} & k = 0, 1, \ldots, 100 \\ 0 & \text{otherwise} \end{cases}   (2)
    P_N(n) = \sum_{k=1}^{n} P_{N,K}(n, k) = \sum_{k=1}^{n} \frac{(1-p)^{n-1} p}{n} = (1-p)^{n-1} p   (2)
The marginal PMF of K is found by summing P_{N,K}(n, k) over all possible N. Note that if K = k, then N \geq k. Thus,
    P_K(k) = \sum_{n=k}^{\infty} \frac{1}{n} (1-p)^{n-1} p   (3)
    P_N(n) = \sum_{k=0}^{n} P_{N,K}(n, k) = \sum_{k=0}^{n} \frac{100^n e^{-100}}{(n+1)!} = \frac{100^n e^{-100}}{n!}   (1)
(a) The joint PDF of X and Y is
        f_{X,Y}(x, y) = \begin{cases} c & x + y \leq 1,\ x, y \geq 0 \\ 0 & \text{otherwise} \end{cases}   (1)
    [Figure: the triangular region of nonzero probability bounded by the axes and the line x + y = 1.]
    To find the constant c we integrate over the region shown. This gives
        \int_0^1 \int_0^{1-x} c\,dy\,dx = \left. cx - \frac{cx^2}{2} \right|_0^1 = \frac{c}{2} = 1   (2)
    Therefore c = 2.
(b) To find P[X \leq Y] we integrate over the area indicated by the graph
        P[X \leq Y] = \int_0^{1/2} \int_x^{1-x} 2\,dy\,dx   (3)
                    = \int_0^{1/2} (2 - 4x)\,dx   (4)
                    = 1/2   (5)
(c) The probability P[X + Y \leq 1/2] can be seen in the figure. Here we can set up the following integrals
        P[X + Y \leq 1/2] = \int_0^{1/2} \int_0^{1/2-x} 2\,dy\,dx   (6)
                          = \int_0^{1/2} (1 - 2x)\,dx   (7)
                          = 1/2 - 1/4 = 1/4   (8)
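These area-type probabilities are easy to sanity-check by Monte Carlo (our own sketch, not part of the solution): draw points uniformly over the triangle by rejection and estimate the event probabilities.

%Sketch (ours): Monte Carlo check of P[X<=Y]=1/2 and P[X+Y<=1/2]=1/4 for the
%uniform joint PDF f(x,y)=2 on the triangle x,y>=0, x+y<=1.
m=100000;
x=rand(m,1); y=rand(m,1);
keep=(x+y<=1);                    %rejection sampling of the triangle
x=x(keep); y=y(keep);
p1=mean(x<=y)                     %should be near 1/2
p2=mean(x+y<=0.5)                 %should be near 1/4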
(a) To find the constant c, integrate f_{X,Y}(x, y) over all possible values of X and Y to get
        1 = \int_0^1 \int_0^1 cxy^2\,dx\,dy = c/6   (2)
    Therefore c = 6.
(b) The probability P[X \geq Y] is the integral of the joint PDF f_{X,Y}(x, y) over the indicated shaded region.
        P[X \geq Y] = \int_0^1 \int_0^x 6xy^2\,dy\,dx   (3)
                    = \int_0^1 2x^4\,dx   (4)
                    = 2/5   (5)
    Similarly, to find P[Y \leq X^2] we can integrate over the region shown in the figure.
        P[Y \leq X^2] = \int_0^1 \int_0^{x^2} 6xy^2\,dy\,dx   (6)
                      = 1/4   (7)
(c) Here we can choose to either integrate f_{X,Y}(x, y) over the lighter shaded region, which would require the evaluation of two integrals, or we can perform one integral over the darker region by recognizing
        P[\min(X, Y) \leq 1/2] = 1 - P[\min(X, Y) > 1/2]   (8)
                               = 1 - \int_{1/2}^1 \int_{1/2}^1 6xy^2\,dx\,dy   (9)
                               = 1 - \int_{1/2}^1 \frac{9y^2}{4}\,dy = \frac{11}{32}   (10)
(d) The probability P[\max(X, Y) \leq 3/4] can be found by integrating over the shaded region shown below.
        P[\max(X, Y) \leq 3/4] = P[X \leq 3/4, Y \leq 3/4]   (11)
                               = \int_0^{3/4} \int_0^{3/4} 6xy^2\,dx\,dy   (12)
                               = \left( \left. x^2 \right|_0^{3/4} \right)\left( \left. y^3 \right|_0^{3/4} \right)   (13)
                               = (3/4)^5 = 0.237   (14)
(a) The probability that X \geq Y is:
        P[X \geq Y] = \int_0^{\infty} \int_0^x 6e^{-(2x+3y)}\,dy\,dx   (2)
                    = \int_0^{\infty} 2e^{-2x}\left( \left. -e^{-3y} \right|_{y=0}^{y=x} \right) dx   (3)
                    = \int_0^{\infty} \left[ 2e^{-2x} - 2e^{-5x} \right] dx = 3/5   (4)
    The probability P[X + Y \leq 1] is found by integrating over the region where X + Y \leq 1
        P[X + Y \leq 1] = \int_0^1 \int_0^{1-x} 6e^{-(2x+3y)}\,dy\,dx   (5)
                        = \int_0^1 2e^{-2x}\left( \left. -e^{-3y} \right|_{y=0}^{y=1-x} \right) dx   (6)
                        = \int_0^1 2e^{-2x}\left[ 1 - e^{-3(1-x)} \right] dx   (7)
                        = \left. -e^{-2x} - 2e^{x-3} \right|_0^1   (8)
                        = 1 + 2e^{-3} - 3e^{-2}   (9)
(b) The event \min(X, Y) \geq 1 is the same as the event \{X \geq 1, Y \geq 1\}. Thus,
        P[\min(X, Y) \geq 1] = \int_1^{\infty} \int_1^{\infty} 6e^{-(2x+3y)}\,dy\,dx = e^{-(2+3)}   (10)
• x < 0 or y < 0
  In this case, the region of integration doesn't overlap the region of nonzero probability and
      F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv = 0   (1)
136
• 0 < y \leq x \leq 1
  In this case, the region where the integral has a nonzero contribution is
      F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv   (2)
                    = \int_0^y \int_v^x 8uv\,du\,dv   (3)
                    = \int_0^y 4(x^2 - v^2)v\,dv   (4)
                    = \left. 2x^2 v^2 - v^4 \right|_{v=0}^{v=y} = 2x^2 y^2 - y^4   (5)
• 0 < x \leq y and 0 \leq x \leq 1
      F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,dv\,du   (6)
                    = \int_0^x \int_0^u 8uv\,dv\,du   (7)
                    = \int_0^x 4u^3\,du = x^4   (8)
• 0 < y \leq 1 and x \geq 1
      F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,dv\,du   (9)
                    = \int_0^y \int_v^1 8uv\,du\,dv   (10)
                    = \int_0^y 4v(1 - v^2)\,dv   (11)
                    = 2y^2 - y^4   (12)
• x \geq 1 and y \geq 1
  In this case, the region of integration completely covers the region of nonzero probability and
      F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv   (13)
                    = 1   (14)
(a) The joint PDF (and the corresponding region of nonzero probability) are
        f_{X,Y}(x, y) = \begin{cases} 1/2 & -1 \leq x \leq y \leq 1 \\ 0 & \text{otherwise} \end{cases}   (1)
(b)
        P[X > 0] = \int_0^1 \int_x^1 \frac{1}{2}\,dy\,dx = \int_0^1 \frac{1-x}{2}\,dx = 1/4   (2)
    This result can be deduced by geometry. The shaded triangle of the X, Y plane corresponding to the event X > 0 is 1/4 of the total shaded area.
(d) From the marginal PDF f_X(x), the expected value of X is
        E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{1}{2} \int_{-1}^{1} x(1-x)\,dx   (5)
             = \left. \frac{x^2}{4} - \frac{x^3}{6} \right|_{-1}^{1} = -\frac{1}{3}.   (6)
        f_{X,Y}(x, y) = \begin{cases} 5x^2/2 & -1 \leq x \leq 1,\ 0 \leq y \leq x^2 \\ 0 & \text{otherwise} \end{cases}   (1)
We can find the appropriate marginal PDFs by integrating the joint PDF.
(a) The marginal PDF of X is
        f_X(x) = \int_0^{x^2} \frac{5x^2}{2}\,dy = \begin{cases} 5x^4/2 & -1 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}   (2)
Since |y| is symmetric about the origin, we can simplify the integral to
        f_X(x) = \frac{4|x|}{r^4} \int_0^{\sqrt{r^2 - x^2}} y\,dy = \left. \frac{2|x|}{r^4}\, y^2 \right|_0^{\sqrt{r^2 - x^2}} = \frac{2|x|(r^2 - x^2)}{r^4}   (3)
Note that for |x| > r, f_X(x) = 0. Hence the complete expression for the PDF of X is
        f_X(x) = \begin{cases} \frac{2|x|(r^2 - x^2)}{r^4} & -r \leq x \leq r \\ 0 & \text{otherwise} \end{cases}   (4)
(b) Note that the joint PDF is symmetric in x and y so that f_Y(y) = f_X(y).
(b) Note that the joint PDF is symmetric in x and y so that fY (y) = fX (y).
140
(a) The joint PDF of X and Y and the region of nonzero probability are
Y
1
cy 0 ≤ y ≤ x ≤ 1
fX,Y (x, y) = (1)
0 otherwise
X
1
(b) To find the value of the constant, c, we integrate the joint PDF over all x and y.
' ' ' 1' x ' $1
∞ ∞ 1
cx2 cx3 $$ c
fX,Y (x, y) dx dy = cy dy dx = dx = $ = . (2)
−∞ −∞ 0 0 0 2 6 0 6
Thus c = 6.
(c) We can find the CDF F_X(x) = P[X \le x] by integrating the joint PDF over the event X \le x.
For x < 0, F_X(x) = 0. For x > 1, F_X(x) = 1. For 0 \le x \le 1,
F_X(x) = \iint_{x' \le x} f_{X,Y}(x',y') \, dy'\, dx'   (3)
       = \int_0^x \int_0^{x'} 6y' \, dy'\, dx'   (4)
       = \int_0^x 3(x')^2 \, dx' = x^3.   (5)
The complete expression for the CDF of X is
F_X(x) = \begin{cases} 0 & x < 0 \\ x^3 & 0 \le x \le 1 \\ 1 & x \ge 1 \end{cases}   (6)
(d) Similarly, we find the CDF of Y by integrating f_{X,Y}(x,y) over the event Y \le y. For y < 0,
F_Y(y) = 0 and for y > 1, F_Y(y) = 1. For 0 \le y \le 1,
F_Y(y) = \iint_{y' \le y} f_{X,Y}(x',y') \, dy'\, dx'   (7)
       = \int_0^y \int_{y'}^1 6y' \, dx'\, dy'   (8)
       = \int_0^y 6y'(1-y') \, dy'   (9)
       = \left[ 3(y')^2 - 2(y')^3 \right]_0^y = 3y^2 - 2y^3.   (10)
The complete expression for the CDF of Y is
F_Y(y) = \begin{cases} 0 & y < 0 \\ 3y^2 - 2y^3 & 0 \le y \le 1 \\ 1 & y > 1 \end{cases}   (11)
(e) To find P[Y \le X/2], we integrate the joint PDF f_{X,Y}(x,y) over the region y \le x/2.
P[Y \le X/2] = \int_0^1 \int_0^{x/2} 6y \, dy\, dx   (12)
             = \int_0^1 \left. 3y^2 \right|_0^{x/2} dx   (13)
             = \int_0^1 \frac{3x^2}{4} \, dx = 1/4   (14)
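These calculus steps can also be verified numerically. A possible check with Matlab's integral2 is sketched below; the function handles are illustrative and are not part of the printed solution.

% Numerical cross-check of parts (b), (c) and (e).
f = @(x,y) 6*y .* (y <= x);                           % joint PDF, zero for y > x
check_c  = integral2(f, 0, 1, 0, 1);                  % should be close to 1
FX_half  = integral2(f, 0, 0.5, 0, 1);                % F_X(0.5) = 0.5^3 = 0.125
p_part_e = integral2(@(x,y) 6*y, 0, 1, 0, @(x) x/2);  % P[Y <= X/2] = 1/4
[check_c, FX_half, p_part_e]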
The figure shows the sample points of the joint PMF P_{X,Y}(x,y), each labeled with its probability and its value of W:

(x, y):        (1,1)  (2,1)  (4,1)  (1,3)  (2,3)  (4,3)
P_{X,Y}(x,y):  1/28   2/28   4/28   3/28   6/28   12/28
W:             0      1      3      -2     -1     1

(a) To find the PMF of W, we simply add the probabilities associated with each possible value of W:
Problem 4.6.2 Solution
In Problem 4.2.2, the joint PMF P_{X,Y}(x,y) is given in terms of the parameter c. For this problem, we first need to find c. Before doing so, it is convenient to label each possible X, Y point with the corresponding value of W = X + 2Y. (In the figure, the sample points carry probabilities c, 2c or 3c and are labeled with values of W ranging from -4 to 4.)
To find c, we sum the PMF over all possible values of X and Y. We choose c so the sum equals one.
\sum_x \sum_y P_{X,Y}(x,y) = \sum_{x=-2,0,2} \; \sum_{y=-1,0,1} c|x + y|   (1)
= 6c + 2c + 6c = 14c   (2)
(a) From the above graph, we can calculate the probability of each possible value of w.
Problem 4.6.4 Solution
Y
W>w The x, y pairs with nonzero probability are shown in the figure.
For w = 0, 1, . . . , 10, we observe that
w
P [W > w] = P [min(X, Y ) > w] (1)
= P [X > w, Y > w] (2)
= 0.01(10 − w) 2
(3)
X
w
To find the PMF of W , we observe that for w = 1, . . . , 10,
144
Problem 4.6.6 Solution
(a) The minimum value of W is W = 0, which occurs when X = 0 and Y = 0. The maximum
value of W is W = 1, which occurs when X = 1 or Y = 1. The range of W is SW =
{w|0 ≤ w ≤ 1}.
(a) Since the joint PDF fX,Y (x, y) is nonzero only for 0 ≤ y ≤ x ≤ 1, we observe that W =
Y − X ≤ 0 since Y ≤ X. In addition, the most negative value of W occurs when Y = 0 and
X = 1 and W = −1. Hence the range of W is SW = {w| − 1 ≤ w ≤ 0}.
(b) For w < −1, FW (w) = 0. For w > 0, FW (w) = 1. For −1 ≤ w ≤ 0, the CDF of W is
Y
FW (w) = P [Y − X ≤ w] (1)
1 ' 1 ' x+w
Y=X+w = 6y dy dx (2)
½ −w 0
' 1
= 3(x + w)2 dx (3)
X −w
-w 1 $1
= (x + w)3 $−w = (1 + w)3 (4)
145
Therefore, the complete CDF of W is
⎧
⎨ 0 w < −1
FW (w) = (1 + w) −1 ≤ w ≤ 0
3 (5)
⎩
1 w>0
1
2 0≤y≤x≤1
fX,Y (x, y) = (1)
0 otherwise
X
1
w
The complete expression for the CDF is
P[Y<wX] ⎧
⎨ 0 w<0
X
FW (w) = w 0≤w<1 (3)
1 ⎩
1 w≥1
By taking the derivative of the CDF, we find that the PDF of W is
1 0≤w<1
fW (w) = (4)
0 otherwise
We see that W has a uniform PDF over [0, 1]. Thus E[W ] = 1/2.
1
2 0≤y≤x≤1
fX,Y (x, y) = (1)
0 otherwise
X
1
146
(a) Since fX,Y (x, y) = 0 for y > x, we can conclude that Y ≤ X and that W = X/Y ≥ 1. Since
Y can be arbitrarily small but positive, W can be arbitrarily large. Hence the range of W is
SW = {w|w ≥ 1}.
(b) For w \ge 1, the CDF of W is
F_W(w) = P[X/Y \le w]   (2)
       = 1 - P[X/Y > w]   (3)
       = 1 - P[Y < X/w]   (4)
       = 1 - 1/w   (5)
Here we have used the fact that P[Y < X/w] equals 2 times the area of the corresponding triangle, since the PDF equals 2 over its support. The complete CDF is
F_W(w) = \begin{cases} 0 & w < 1 \\ 1 - 1/w & w \ge 1 \end{cases}   (6)
The PDF of W is found by differentiating the CDF.
f_W(w) = \frac{dF_W(w)}{dw} = \begin{cases} 1/w^2 & w \ge 1 \\ 0 & \text{otherwise} \end{cases}   (7)
To find the expected value E[W], we write
E[W] = \int_{-\infty}^{\infty} w f_W(w) \, dw = \int_1^{\infty} \frac{dw}{w}.   (8)
However, the integral diverges and E[W] is undefined.
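The divergence of E[W] is easy to see empirically. In the sketch below (purely illustrative; the seed and sample size are arbitrary), (X, Y) is sampled by drawing X = sqrt(U) and then Y uniform on [0, X]; the running sample mean of W = X/Y never settles down and grows roughly like the logarithm of the number of samples.

% Illustrative experiment: the sample mean of W = X/Y does not converge.
rng(3); n = 1e6;
x = sqrt(rand(n,1));                 % X has marginal PDF 2x on [0,1]
w = x ./ (x .* rand(n,1));           % given X, Y is uniform on [0,X]
runmean = cumsum(w) ./ (1:n)';
runmean([1e3 1e4 1e5 1e6])           % keeps growing instead of converging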
Problem 4.6.11 Solution
Following the hint, we observe that either Y ≥ X or X ≥ Y , or, equivalently, (Y /X) ≥ 1 or
(X/Y ) ≥ 1. Hence, W ≥ 1. To find the CDF FW (w), we know that FW (w) = 0 for w < 1. For
w ≥ 1, we solve
Y=wX
Y We note that in the middle of the above steps, nonnegativity of X and Y
a
a/w was essential. We can depict the given set {X/w ≤ Y ≤ wX} as the dark
region on the X, Y plane. Because the PDF is uniform over the square, it
Y=X/w
is easier to use geometry to calculate the probability. In particular, each
a/w of the lighter triangles that are not part of the region of interest has area
X a2 /2w.
a
This implies
a2 /2w + a2 /2w 1
P [X/w ≤ Y ≤ wX] = 1 − 2
=1− (1)
a w
The final expression for the CDF of W is
0 w<1
FW (w) = (2)
1 − 1/w w ≥ 1
By taking the derivative, we obtain the PDF
0 w<1
fW (w) = (3)
1/w2 w ≥ 1
148
(b) The correlation of X and Y is
rX,Y = xyPX,Y (x, y) (5)
x=1,2,4 y=1,3
1 · 1 · 1 1 · 3 · 3 2 · 1 · 2 2 · 3 · 6 4 · 1 · 4 4 · 3 · 12
= + + + + + (6)
28 28 28 28 28 28
= 210/28 = 15/2 (7)
Cov [X, Y ]
ρX,Y = = 0. (12)
Var[X] Var[Y ]
149
(a) Random variable W = 2XY has expected value
E 2XY = 2xy PX,Y (x, y) (3)
x=−2,0,2 y=−1,0,1
3 2 1 1 1
= 2−2(−1) + 2−2(0) + 2−2(1) + 20(−1) + 20(1) (4)
14 14 14 14 14
2(−1) 1 2(0) 2 2(1) 3
+2 +2 +2 (5)
14 14 14
= 61/28 (6)
150
since only terms in which both h and b are nonzero make a contribution. Using the marginal
PMFs, the expected values of X and Y are
1
E [H] = hPH (h) = −1(0.6) + 0(0.2) + 1(0.2) = −0.2 (5)
h=−1
E [B] = bPB (b) = 0(0.2) + 2(0.5) + 4(0.3) = 2.2 (6)
b=0,2,4
The covariance is
4
25 13 7 3
E [Y ] = yPY (y) = 1 + 2 + 3 + 4 = 7/4 (2)
48 48 48 48
y=1
4
1
E [X] = xPX (x) = (1 + 2 + 3 + 4) = 5/2 (3)
4
x=1
4
25 13 7 3
2
E Y = y 2 PY (y) = 12 + 22 + 32 + 42 = 47/12 (4)
48 48 48 48
y=1
2
4
1 2
E X = x2 PX (x) = 1 + 22 + 32 + 42 = 15/2 (5)
4
x=1
151
(c) To find the correlation, we evaluate the product XY over all values of X and Y . Specifically,
4
x
rX,Y = E [XY ] = xyPX,Y (x, y) (8)
x=1 y=1
1 2 3 4 4 6 8 9 12 16
= + + + + + + + + + (9)
4 8 12 16 8 12 16 12 16 16
=5 (10)
152
Problem 4.7.6 Solution
0 - x
0 1 2 3 4
Using the results from Problem 4.7.4, we have the following answers.
(a) The expected values are
E [W ] = E [Y ] = 7/4 E [V ] = E [X] = 5/2 (1)
153
Problem 4.7.8 Solution
The joint PDF of X and Y is
f_{X,Y}(x,y) = \begin{cases} (x+y)/3 & 0 \le x \le 1, \ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}   (1)
Before calculating moments, we first find the marginal PDFs of X and Y. For 0 \le x \le 1,
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy = \int_0^2 \frac{x+y}{3} \, dy = \left[ \frac{xy}{3} + \frac{y^2}{6} \right]_{y=0}^{y=2} = \frac{2x+2}{3}   (2)
For 0 \le y \le 2,
f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx = \int_0^1 \left( \frac{x}{3} + \frac{y}{3} \right) dx = \left[ \frac{x^2}{6} + \frac{xy}{3} \right]_{x=0}^{x=1} = \frac{2y+1}{6}   (3)
Complete expressions for the marginal PDFs are
f_X(x) = \begin{cases} \frac{2x+2}{3} & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\qquad
f_Y(y) = \begin{cases} \frac{2y+1}{6} & 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}   (4)
(a) The expected value of X is
' ∞ ' $1
1
2x + 2 2x3 x2 $$ 5
E [X] = xfX (x) dx = x dx = + $ = (5)
−∞ 0 3 9 3 0 9
The second moment of X is
' ∞ ' $1
2 2
1
2 2x +2 x4 2x3 $$ 7
E X = x fX (x) dx = x dx = + = (6)
−∞ 0 3 6 9 $0 18
The variance of X is Var[X] = E[X 2 ] − (E[X])2 = 7/18 − (5/9)2 = 13/162.
(b) The expected value of Y is
' ∞ ' $2
2
2y + 1 y2 y 3 $$ 11
E [Y ] = yfY (y) dy = y dy = + $ = (7)
−∞ 0 6 12 9 0 9
The second moment of Y is
' ∞ ' $2
2 2
2
2 2y +1 y3 y 4 $$ 16
E Y = y fY (y) dy = y dy = + $ = (8)
−∞ 0 6 18 12 0 9
The variance of Y is Var[Y ] = E[Y 2 ] − (E[Y ])2 = 23/81.
(c) The correlation of X and Y is
E[XY] = \iint x y f_{X,Y}(x,y) \, dx\, dy   (9)
      = \int_0^1 \int_0^2 x y \frac{x+y}{3} \, dy\, dx   (10)
      = \int_0^1 \left[ \frac{x^2 y^2}{6} + \frac{x y^3}{9} \right]_{y=0}^{y=2} dx   (11)
      = \int_0^1 \left( \frac{2x^2}{3} + \frac{8x}{9} \right) dx = \left[ \frac{2x^3}{9} + \frac{4x^2}{9} \right]_0^1 = \frac{2}{3}   (12)
The covariance is Cov[X,Y] = E[XY] - E[X]E[Y] = -1/81.
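If desired, these moments can be cross-checked numerically. The short Matlab sketch below (the function handles are illustrative, not from the text) reproduces E[X], E[Y], E[XY] and the covariance.

% Numerical cross-check of the moments of f(x,y) = (x+y)/3 on [0,1]x[0,2].
f   = @(x,y) (x + y)/3;
EX  = integral2(@(x,y) x.*f(x,y),    0, 1, 0, 2);     % 5/9
EY  = integral2(@(x,y) y.*f(x,y),    0, 1, 0, 2);     % 11/9
EXY = integral2(@(x,y) x.*y.*f(x,y), 0, 1, 0, 2);     % 2/3
covXY = EXY - EX*EY                                   % approximately -1/81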
(d) The expected value of X and Y is
155
Problem 4.7.10 Solution
The joint PDF of X and Y and the region of nonzero probability are
Y
1
5x2 /2 −1 ≤ x ≤ 1, 0 ≤ y ≤ x2
fX,Y (x, y) = (1)
0 otherwise
X
-1 1
1
2 0≤y≤x≤1
fX,Y (x, y) = (1)
0 otherwise
X
1
156
Before finding moments, it is helpful to first find the marginal PDFs. For 0 ≤ x ≤ 1,
' ∞ ' x
fX (x) = fX,Y (x, y) dy = 2 dy = 2x (2)
−∞ 0
Also, for y < 0 or y > 1, fY (y) = 0. Complete expressions for the marginal PDFs are
2x 0 ≤ x ≤ 1 2(1 − y) 0 ≤ y ≤ 1
fX (x) = fY (y) = (4)
0 otherwise 0 otherwise
The covariance is
Cov [X, Y ] = E [XY ] − E [X] E [Y ] = 1/36. (10)
157
Problem 4.7.12 Solution
Y Random variables X and Y have joint PDF
1/2 −1 ≤ x ≤ y ≤ 1
fX,Y (x, y) = (1)
X
0 otherwise
1
The region of possible pairs (x, y) is shown with the joint PDF. The rest of
-1
this problem is just calculus.
' ' ' $1
1 1
xy 1 1
x2 x4 $$
E [XY ] = dy dx = x(1 − x ) dx =
2
− $ =0 (2)
−1 x 2 4 −1 8 16 −1
' 1' 1
1 x y
E eX+Y = e e dy dx (3)
−1 x 2
' 1
1
= ex (e1 − ex ) dx (4)
2 −1
$
1 1+x 1 2x $$1 e2 e−2 1
= e − e $ = + − (5)
2 4 −1 4 4 2
158
The second moment of K is
E[K^2] = \sum_{n=1}^{\infty} \sum_{k=1}^{n} k^2 \frac{(1-p)^{n-1} p}{n} = \sum_{n=1}^{\infty} \frac{(1-p)^{n-1} p}{n} \sum_{k=1}^{n} k^2   (7)
Using the identity \sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6, we obtain
E[K^2] = \sum_{n=1}^{\infty} (1-p)^{n-1} p \, \frac{(n+1)(2n+1)}{6} = E\left[ \frac{(N+1)(2N+1)}{6} \right]   (8)
Applying the values of E[N] and E[N^2] found above, we find that
E[K^2] = \frac{E[N^2]}{3} + \frac{E[N]}{2} + \frac{1}{6} = \frac{2}{3p^2} + \frac{1}{6p} + \frac{1}{6}   (9)
Thus, we can calculate the variance of K.
Var[K] = E[K^2] - (E[K])^2 = \frac{5}{12p^2} - \frac{1}{3p} - \frac{1}{12}   (10)
To find the correlation of N and K,
E[NK] = \sum_{n=1}^{\infty} \sum_{k=1}^{n} n k \frac{(1-p)^{n-1} p}{n} = \sum_{n=1}^{\infty} (1-p)^{n-1} p \sum_{k=1}^{n} k   (11)
Since \sum_{k=1}^{n} k = n(n+1)/2,
E[NK] = \sum_{n=1}^{\infty} \frac{n(n+1)}{2} (1-p)^{n-1} p = E\left[ \frac{N(N+1)}{2} \right] = \frac{1}{p^2}   (12)
10
10
P [A] = P [X > 5, Y > 5] = 0.01 = 0.25 (1)
x=6 y=6
159
Problem 4.8.2 Solution
The event B occurs iff X ≤ 5 and Y ≤ 5 and has probability
5
5
P [B] = P [X ≤ 5, Y ≤ 5] = 0.01 = 0.25 (1)
x=1 y=1
So then
6e−(2x+3y)
1−3e−2 +2e−3
x + y ≤ 1, x ≥ 0, y ≥ 0
fX,Y |A (x, y) = (2)
0 otherwise
n
n
1
PN (n) = PN,K (n, k) = (1 − p)n−1 p = (1 − p)n−1 p (1)
n
k=1 k=1
The conditional PMF PN |B (n|b) could be found directly from PN (n) using Theorem 2.17. However,
we can also find it just by summing the conditional joint PMF.
n
(1 − p)n−10 p n = 10, 11, . . .
PN |B (n) = PN,K|B (n, k) = (5)
0 otherwise
k=1
160
From the conditional PMF P_{N|B}(n), we can calculate directly the conditional moments of N given B. Instead, however, we observe that given B, N' = N - 9 has a geometric PMF with mean 1/p. That is, for n = 1, 2, . . ., P_{N'|B}(n) = (1-p)^{n-1} p.
Note that further along in the problem we will need E[N^2|B], which we now calculate.
E[N^2|B] = Var[N|B] + (E[N|B])^2   (9)
         = \frac{2}{p^2} + \frac{17}{p} + 81   (10)
For the conditional moments of K, we work directly with the conditional PMF P_{N,K|B}(n,k).
E[K|B] = \sum_{n=10}^{\infty} \sum_{k=1}^{n} k \frac{(1-p)^{n-10} p}{n} = \sum_{n=10}^{\infty} \frac{(1-p)^{n-10} p}{n} \sum_{k=1}^{n} k   (11)
Since \sum_{k=1}^{n} k = n(n+1)/2,
E[K|B] = \sum_{n=10}^{\infty} \frac{n+1}{2} (1-p)^{n-10} p = \frac{1}{2} E[N+1|B] = \frac{1}{2p} + 5   (12)
Applying the values of E[N|B] and E[N^2|B] found above, we find that
E[K^2|B] = \frac{E[N^2|B]}{3} + \frac{E[N|B]}{2} + \frac{1}{6} = \frac{2}{3p^2} + \frac{37}{6p} + \frac{95}{3}   (16)
Thus, we can calculate the conditional variance of K.
Var[K|B] = E[K^2|B] - (E[K|B])^2 = \frac{5}{12p^2} + \frac{7}{6p} + \frac{20}{3}   (17)
To find the conditional correlation of N and K,
∞
n ∞
n
(1 − p)n−10 p
E [N K|B] = nk = (1 − p)n−1
p k (18)
n
n=10 k=1 n=10 k=1
n
Since k=1 k = n(n + 1)/2,
∞
n(n + 1) 1 1 9
E [N K|B] = (1 − p)n−10 p = E [N (N + 1)|B] = 2 + + 45 (19)
2 2 p p
n=10
From fX,Y |A (x, y), we find the conditional marginal PDF fX|A (x). For 0 ≤ x ≤ 1,
' ∞ ' 1 $y=1
y2 $ 1
fX|A (x) = fX,Y |A (x, y) dy = (x + y) dy = xy + $$ =x+ (7)
−∞ 0 2 y=0 2
The complete expression is
x + 1/2 0 ≤ x ≤ 1
fX|A (x) = (8)
0 otherwise
For 0 ≤ y ≤ 1, the conditional marginal PDF of Y is
' ∞ ' 1 $x=1
x2 $
fY |A (y) = fX,Y |A (x, y) dx = (x + y) dx = + xy $$ = y + 1/2 (9)
−∞ 0 2 x=0
The complete expression is
y + 1/2 0 ≤ y ≤ 1
fY |A (y) = (10)
0 otherwise
162
Problem 4.8.6 Solution
Random variables X and Y have joint PDF
(4x + 2y)/3 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
fX,Y (x, y) = (1)
0 otherwise
163
Problem 4.8.7 Solution
(b)
' ∞ '
1
120x2 80
19 (1 − y 3/2 ) 0 ≤ y ≤ 1/4
fY |A (y) = fX,Y |A (x, y) dx = 2 √
dx = (7)
−∞ y 19 0 otherwise
164
(e) The conditional mean of X given A is
' −1/2 ' 1/2 ' 1
30x3 120x5 30x3
E [X|A] = dx + dx + dx = 0 (12)
−1 19 −1/2 19 1/2 19
Given a trip lasts d days, the weight change is equally likely to be any value between −d and d
pounds. Thus,
1/(2d + 1) w = −d, −d + 1, . . . , d
PW |D (w|d) = (2)
0 otherwise
The joint PMF is simply
outcome P [·] W Y
hh p2 0 2
ht p(1 − p) 1 1 (1)
th p(1 − p) −1 1
tt (1 − p)2 0 0
In the following table, we write the joint PMF PW,Y (w, y) along with the marginal PMFs PY (y)
and PW (w).
PW,Y (w, y) w = −1 w=0 w=1 PY (y)
y=0 0 (1 − p)2 0 (1 − p)2
y=1 p(1 − p) 0 p(1 − p) 2p(1 − p) (2)
y=2 0 p2 0 p2
PW (w) p(1 − p) 1 − 2p + 2p2 p(1 − p)
Using the definition PW |Y (w|y) = PW,Y (w, y)/PY (y), we can find the conditional PMFs of W given
Y.
1 w=0 1/2 w = −1, 1
PW |Y (w|0) = PW |Y (w|1) = (3)
0 otherwise 0 otherwise
1 w=0
PW |Y (w|2) = (4)
0 otherwise
165
Similarly, the conditional PMFs of Y given W are
⎧ (1−p)2
⎪
⎨ 1−2p+2p2
y=0
1 y=1 p2
PY |W (y| − 1) = PY |W (y|0) = y=2 (5)
0 otherwise ⎪
⎩ 1−2p+2p2
0 otherwise
1 y=1
PY |W (y|1) = (6)
0 otherwise
(a) The conditional PDF f_{X|Y}(x|y) is defined for all y such that 0 \le y \le 1. For 0 \le y \le 1,
f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{x+y}{\int_0^1 (x+y)\, dx} = \begin{cases} \dfrac{x+y}{y+1/2} & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}   (2)
(b) The conditional PDF f_{Y|X}(y|x) is defined for all values of x in the interval [0, 1]. For 0 \le x \le 1,
f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{x+y}{\int_0^1 (x+y)\, dy} = \begin{cases} \dfrac{x+y}{x+1/2} & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}   (3)
1
2 0≤y≤x≤1
fX,Y (x, y) = (1)
0 otherwise
X
1
For 0 ≤ y ≤ 1,
' ∞ ' 1
fY (y) = fX,Y (x, y) dx = 2 dx = 2(1 − y) (2)
−∞ y
Also, for y < 0 or y > 1, fY (y) = 0. The complete expression for the marginal PDF is
2(1 − y) 0 ≤ y ≤ 1
fY (y) = (3)
0 otherwise
166
That is, since Y ≤ X ≤ 1, X is uniform over [y, 1] when Y = y. The conditional expectation of X
given Y = y can be calculated as
' ∞
E [X|Y = y] = xfX|Y (x|y) dx (5)
−∞
' $1
1
x x2 $$ 1+y
= dx = = (6)
y 1−y 2(1 − y) $y 2
In fact, since we know that the conditional PDF of X is uniform over [y, 1] when Y = y, it wasn’t
really necessary to perform the calculation.
1
2 0≤y≤x≤1
fX,Y (x, y) = (1)
0 otherwise
X
1
For 0 ≤ x ≤ 1, the marginal PDF for X satisfies
' ∞ ' x
fX (x) = fX,Y (x, y) dy = 2 dy = 2x (2)
−∞ 0
Note that fX (x) = 0 for x < 0 or x > 1. Hence the complete expression for the marginal PDF of
X is
2x 0 ≤ x ≤ 1
fX (x) = (3)
0 otherwise
The conditional PDF of Y given X = x is
fX,Y (x, y) 1/x 0 ≤ y ≤ x
fY |X (y|x) = = (4)
fX (x) 0 otherwise
Given X = x, Y has a uniform PDF over [0, x] and thus (has conditional expected value E[Y |X = x] =
∞
x/2. Another way to obtain this result is to calculate −∞ yfY |X (y|x) dy.
167
(a) First we observe that A takes on the values SA = {−1, 1} while B takes on values from
SB = {0, 1}. To construct a table describing PA,B (a, b) we build a table for all possible values
of pairs (A, B). The general form of the entries is
Now we fill in the entries using the conditional PMFs PB|A (b|a) and the marginal PMF PA (a).
This yields
1
E [B|A = 1] = bPB|A (b|1) = PB|A (1|1) = 1/2. (4)
b=0
(c) Before finding the conditional PMF PA|B (a|1), we first sum the columns of the joint PMF
table to find
4/9 b = 0
PB (b) = (5)
5/9 b = 1
The conditional PMF of A given B = 1 is
PA,B (a, 1) 2/5 a = −1
PA|B (a|1) = = (6)
PB (1) 3/5 a = 1
(d) Now that we have the conditional PMF PA|B (a|1), calculating conditional expectations is
easy.
E [A|B = 1] = aPA|B (a|1) = −1(2/5) + (3/5) = 1/5 (7)
a=−1,1
E A2 |B = 1 = a2 PA|B (a|1) = 2/5 + 3/5 = 1 (8)
a=−1,1
168
(e) To calculate the covariance, we need
E [A] = aPA (a) = −1(1/3) + 1(2/3) = 1/3 (10)
a=−1,1
1
E [B] = bPB (b) = 0(4/9) + 1(5/9) = 5/9 (11)
b=0
1
E [AB] = abPA,B (a, b) (12)
a=−1,1 b=0
1
E [B|A = −1] = bPB|A (b| − 1) = 0(1/3) + 1(2/3) = 2/3 (1)
b=0
1
E [B|A = 1] = bPB|A (b|1) = 0(1/2) + 1(1/2) = 1/2 (2)
b=0
Keep in mind that E[B|A] is a random variable that is a function of A. that is we can write
2/3 A = −1
E [B|A] = g(A) = (3)
1/2 A = 1
169
We can find the marginal PMF for N by summing over all possible K. For n ≥ 0,
n
100n e−100 100n e−100
PN (n) = = (2)
(n + 1)! n!
k=0
We see that N has a Poisson PMF with expected value 100. For n ≥ 0, the conditional PMF of K
given N = n is
PN,K (n, k) 1/(n + 1) k = 0, 1, . . . , n
PK|N (k|n) = = (3)
PN (n) 0 otherwise
That is, given N = n, K has a discrete uniform PMF over {0, 1, . . . , n}. Thus,
n
E [K|N = n] = k/(n + 1) = n/2 (4)
k=0
The other fact we know is that for each of the 100 pizzas sold, the pizza is sold before noon with
probability 1/2. Hence, N has the binomial PMF
100 n 100−n n = 0, 1, . . . , 100
PN (n) = n (1/2) (1/2) (2)
0 otherwise
1/2 −1 ≤ x ≤ y ≤ 1
X fX,Y (x, y) = (1)
0 otherwise
1
-1
170
(a) For −1 ≤ y ≤ 1, the marginal PDF of Y is
' ∞ '
1 y
fY (y) = fX,Y (x, y) dx = dx = (y + 1)/2 (2)
−∞ 2 −1
The complete expression for the marginal PDF of Y is
(y + 1)/2 −1 ≤ y ≤ 1
fY (y) = (3)
0 otherwise
(c) Given Y = y, the conditional PDF of X is uniform over [−1, y]. Hence the conditional
expected value is E[X|Y = y] = (y − 1)/2.
E [Y |X = x] = 0 (4)
171
A complete expression for the joint PMF of M and N is
(1 − p)n−2 p2 m = 1, 2, . . . , n − 1; n = m + 1, m + 2, . . .
PM,N (m, n) = (4)
0 otherwise
n−1
PN (n) = (1 − p)n−2 p2 = (n − 1)(1 − p)n−2 p2 , n = 2, 3, . . . (5)
m=1
Not surprisingly, if we view each voice call as a successful Bernoulli trial, M has a geometric PMF
since it is the number of trials up to and including the first success. Also, N has a Pascal PMF
since it is the number of trials required to see 2 successes. The conditional PMF’s are now easy to
find.
PM,N (m, n) (1 − p)n−m−1 p n = m + 1, m + 2, . . .
PN |M (n|m) = = (11)
PM (m) 0 otherwise
Given that call N = n was the second voice call, the first voice call is equally likely to occur in any
of the previous n − 1 calls.
(a) The number of buses, N , must be greater than zero. Also, the number of minutes that
pass cannot be less than the number of buses. Thus, P [N = n, T = t] > 0 for integers n, t
satisfying 1 ≤ n ≤ t.
172
(b) First, we find the joint PMF of N and T by carefully considering the possible sample paths.
In particular, PN,T (n, t) = P [ABC] = P [A]P [B]P [C] where the events A, B and C are
A = {n − 1 buses arrive in the first t − 1 minutes} (1)
B = {none of the first n − 1 buses are boarded} (2)
C = {at time t a bus arrives and is boarded} (3)
These events are independent since each trial to board a bus is independent of when the buses
arrive. These events have probabilities
P[A] = \binom{t-1}{n-1} p^{n-1} (1-p)^{t-1-(n-1)}   (4)
P[B] = (1-q)^{n-1}   (5)
P[C] = pq   (6)
Consequently, the joint PMF of N and T is
P_{N,T}(n,t) = \begin{cases} \binom{t-1}{n-1} p^{n-1} (1-p)^{t-n} (1-q)^{n-1} pq & n \ge 1, \ t \ge n \\ 0 & \text{otherwise} \end{cases}   (7)
(c) It is possible to find the marginal PMF’s by summing the joint PMF. However, it is much
easier to obtain the marginal PMFs by consideration of the experiment. Specifically, when a
bus arrives, it is boarded with probability q. Moreover, the experiment ends when a bus is
boarded. By viewing whether each arriving bus is boarded as an independent trial, N is the
number of trials until the first success. Thus, N has the geometric PMF
(1 − q)n−1 q n = 1, 2, . . .
PN (n) = (8)
0 otherwise
To find the PMF of T , suppose we regard each minute as an independent trial in which a
success occurs if a bus arrives and that bus is boarded. In this case, the success probability
is pq and T is the number of minutes up to and including the first success. The PMF of T is
also geometric.
(1 − pq)t−1 pq t = 1, 2, . . .
PT (t) = (9)
0 otherwise
(d) Once we have the marginal PMFs, the conditional PMFs are easy to find.
P_{N|T}(n|t) = \frac{P_{N,T}(n,t)}{P_T(t)} = \begin{cases} \binom{t-1}{n-1} \left( \dfrac{p(1-q)}{1-pq} \right)^{n-1} \left( \dfrac{1-p}{1-pq} \right)^{t-1-(n-1)} & n = 1, 2, \ldots, t \\ 0 & \text{otherwise} \end{cases}   (10)
That is, given you depart at time T = t, the number of buses that arrive during minutes 1, . . . , t-1 has a binomial PMF since in each minute a bus arrives with probability p. Similarly, the conditional PMF of T given N is
P_{T|N}(t|n) = \frac{P_{N,T}(n,t)}{P_N(n)} = \begin{cases} \binom{t-1}{n-1} p^n (1-p)^{t-n} & t = n, n+1, \ldots \\ 0 & \text{otherwise} \end{cases}   (11)
This result can be explained. Given that you board bus N = n, the time T when you leave
is the time for n buses to arrive. If we view each bus arrival as a success of an independent
trial, the time for n buses to arrive has the above Pascal PMF.
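The structure of this problem also lends itself to simulation. The sketch below is only a rough illustration; the values p = 0.4 and q = 0.3 and the trial count are assumptions made for this example, not parameters from the problem.

% Simulate the bus process and compare sample means of N and T with 1/q and 1/(pq).
p = 0.4; q = 0.3; trials = 1e5;
N = zeros(trials,1); T = zeros(trials,1);
for k = 1:trials
    n = 0; t = 0; boarded = false;
    while ~boarded
        t = t + 1;                   % another minute passes
        if rand < p                  % a bus arrives this minute
            n = n + 1;
            boarded = (rand < q);    % board it with probability q
        end
    end
    N(k) = n; T(k) = t;
end
[mean(N) 1/q; mean(T) 1/(p*q)]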
Problem 4.9.15 Solution
If you construct a tree describing what type of call (if any) that arrived in any 1 millisecond period,
it will be apparent that a fax call arrives with probability α = pqr or no fax arrives with probability
1 − α. That is, whether a fax message arrives each millisecond is a Bernoulli trial with success
probability α. Thus, the time required for the first success has the geometric PMF
(1 − α)t−1 α t = 1, 2, . . .
PT (t) = (1)
0 otherwise
Note that N is the number of trials required to observe 100 successes. Moreover, the number of
trials needed to observe 100 successes is N = T + N where N is the number of trials needed
to observe successes 2 through 100. Since N is just the number of trials needed to observe 99
successes, it has the Pascal (k = 99, p) PMF
n − 1 99
PN (n) = α (1 − α)n−99 . (2)
98
Since the trials needed to generate successes 2 though 100 are independent of the trials that yield
the first success, N and T are independent. Hence
PN |T (n|t) = PN |T (n − t|t) = PN (n − t) . (3)
Applying the PMF of N found above, we have
n − t − 1 99
PN |T (n|t) = α (1 − α)n−t−99 . (4)
98
Finally the joint PMF of N and T is
PN,T (n, t) = PN |T (n|t) PT (t) (5)
n−t−1 100
98 α (1 − α)n−100 t = 1, 2, . . . ; n = 99 + t, 100 + t, . . .
= (6)
0 otherwise
This solution can also be found a consideration of the sample sequence of Bernoulli trials in which
we either observe or do not observe a fax message.
To find the conditional PMF PT |N (t|n), we first must recognize that N is simply the number
of trials needed to observe 100 successes and thus has the Pascal PMF
n − 1 100
PN (n) = α (1 − α)n−100 (7)
99
Hence for any integer n ≥ 100, the conditional PMF is
⎧ n−t−1
PN,T (n, t) ⎨ ( n−198 )
t = 1, 2, . . . , n − 99
PT |N (t|n) = = ( 99 ) (8)
PN (n) ⎩ 0 otherwise.
174
The joint PMF of X and N can be expressed as the product of the marginal PMFs because we
know that X and Y are independent.
75 25
PX,Y (x, y) = (1/2)100 (2)
x y
(a) Normally, checking independence requires the marginal PMFs. However, in this problem, the
zeroes in the table of the joint PMF PX,Y (x, y) allows us to verify very quickly that X and
Y are dependent. In particular, PX (−1) = 1/4 and PY (1) = 14/48 but
PX,Y (−1, 1) = 0 = PX (−1) PY (1) (1)
(b) To fill in the tree diagram, we need the marginal PMF PX (x) and the conditional PMFs
PY |X (y|x). By summing the rows on the table for the joint PMF, we obtain
PX,Y (x, y) y = −1 y = 0 y = 1 PX (x)
x = −1 3/16 1/16 0 1/4
(2)
x=0 1/6 1/6 1/6 1/2
x=1 0 1/8 1/8 1/4
Now we use the conditional PMF PY |X (y|x) = PX,Y (x, y)/PX (x) to write
⎧
⎨ 3/4 y = −1
1/3 y = −1, 0, 1
PY |X (y| − 1) = 1/4 y = 0 PY |X (y|0) = (3)
⎩ 0 otherwise
0 otherwise
1/2 y = 0, 1
PY |X (y|1) = (4)
0 otherwise
Now we can use these probabilities to label the tree. In the tree, the first set of branches corresponds to X: X = -1 with probability P_X(-1) = 1/4, X = 0 with probability P_X(0) = 1/2, and X = 1 with probability P_X(1) = 1/4. From the X = -1 node, the branches Y = -1 and Y = 0 have probabilities P_{Y|X}(-1|-1) = 3/4 and P_{Y|X}(0|-1) = 1/4; from the X = 0 node, the branches Y = -1, Y = 0 and Y = 1 each have probability 1/3; and from the X = 1 node, the branches Y = 0 and Y = 1 each have probability 1/2.
Since PM |N (m|n) depends on the event N = n, we see that M and N are dependent.
Similarly, no matter how large X1 may be, the number of additional flips for the second heads
is the same experiment as the number of flips needed for the first occurrence of heads. That
is, PX2 (x) = PX1 (x). Moreover, the flips needed to generate the second occurrence of heads are
independent of the flips that yield the first heads. Hence, it should be apparent that X1 and X2
are independent and
(1 − p)x1 +x2 −2 p2 x1 = 1, 2, . . . ; x2 = 1, 2, . . .
PX1 ,X2 (x1 , x2 ) = PX1 (x1 ) PX2 (x2 ) = (2)
0 otherwise
However, if this independence is not obvious, it can be derived by examination of the sample path.
When x1 ≥ 1 and x2 ≥ 1, the event {X1 = x1 , X2 = x2 } occurs iff we observe the sample sequence
tt · · · t h tt
· · · t h
(3)
x1 − 1 times x2 − 1 times
The above sample sequence has probability (1−p)x1 −1 p(1−p)x2 −1 p which in fact equals PX1 ,X2 (x1 , x2 )
given earlier.
176
Problem 4.10.6 Solution
We will solve this problem when the probability of heads is p. For the fair coin, p = 1/2. The
number X1 of flips until the first heads and the number X2 of additional flips for the second heads
both have the geometric PMF
(1 − p)x−1 p x = 1, 2, . . .
PX1 (x) = PX2 (x) = (1)
0 otherwise
2(1 − p)
Var[Y ] = Var[X1 ] + Var[−X2 ] = Var[X1 ] + Var[X2 ] = (3)
p2
(a) To calculate P[X > Y], we use the joint PDF f_{X,Y}(x,y) = f_X(x) f_Y(y).
P[X > Y] = \iint_{x>y} f_X(x) f_Y(y) \, dx\, dy   (2)
         = \int_0^{\infty} \frac{1}{2} e^{-y/2} \int_y^{\infty} \frac{1}{3} e^{-x/3} \, dx\, dy   (3)
         = \int_0^{\infty} \frac{1}{2} e^{-y/2} e^{-y/3} \, dy   (4)
         = \int_0^{\infty} \frac{1}{2} e^{-(1/2+1/3)y} \, dy = \frac{1/2}{1/2 + 1/3} = \frac{3}{5}   (5)
(b) Since X and Y are exponential random variables with parameters λX = 1/3 and λY = 1/2,
Appendix A tells us that E[X] = 1/λX = 3 and E[Y ] = 1/λY = 2. Since X and Y are
independent, the correlation is E[XY ] = E[X]E[Y ] = 6.
177
(b) By Theorem 3.5(f), Var[−X2 ] = (−1)2 Var[X2 ] = Var[X2 ]. Since X1 and X2 are independent,
Theorem 4.27(a) says that
PW (w) = P [W = w] = P [X + Y = w] . (1)
Since X and Y are independent, PX,Y (k, w − k) = PX (k)PY (w − k). It follows that for any integer
w,
∞
PW (w) = PX (k) PY (w − k) . (3)
k=−∞
This table can be translated directly into the joint PMF of N and D.
178
(b) We find the marginal PMF PD (d) by summing the columns of the joint PMF. This yields
⎧
⎪
⎪ 0.3 d = 20,
⎨
0.4 d = 100,
PD (d) = (2)
⎪
⎪ 0.3 d = 300,
⎩
0 otherwise.
(c) To find the conditional PMF PD|N (d|2), we first need to find the probability of the condition-
ing event
PN (2) = PN,D (2, 20) + PN,D (2, 100) + PN,D (2, 300) = 0.4 (3)
The conditional PMF of N D given N = 2 is
⎧
⎪
⎪ 1/4 d = 20
PN,D (2, d) ⎨ 1/2 d = 100
PD|N (d|2) = = (4)
PN (2) ⎪
⎪ 1/4 d = 300
⎩
0 otherwise
(e) To check independence, we could calculate the marginal PMFs of N and D. In this case,
however, it is simpler to observe that PD (d) = PD|N (d|2). Hence N and D are dependent.
(f) In terms of N and D, the cost (in cents) of a fax is C = N D. The expected value of C is
E [C] = ndPN,D (n, d) (6)
n,d
Factory Q Factory R
small order 0.3 0.2
medium order 0.1 0.2
large order 0.1 0.1
179
(b) Before we find E[B], it will prove helpful to find the marginal PMFs PB (b) and PM (m). These
can be found from the row and column sums of the table of the joint PMF
(c) From the marginal PMF of B, we know that PB (2) = 0.3. The conditional PMF of M given
B = 2 is ⎧
PB,M (2, m) ⎨
1/3 m = 60
PM |B (m|2) = = 2/3 m = 180 (4)
PB (2) ⎩
0 otherwise
(e) From the marginal PMFs we calculated in the table of part (b), we can conclude that B and
M are not independent. since PB,M (1, 60) = PB (1)PM (m)60.
(f) In terms of M and B, the cost (in cents) of sending a shipment is C = BM . The expected
value of C is
E [C] = bmPB,M (b, m) (6)
b,m
(a) Since X1 and X2 are identically distributed they will share the same CDF FX (x).
⎧
' x x≤0
⎨ 0 2
FX (x) = fX x dx = x /4 0 ≤ x ≤ 2 (2)
⎩
0 1 x≥2
180
(b) Since X1 and X2 are independent, we can say that
1
P [X1 ≤ 1, X2 ≤ 1] = P [X1 ≤ 1] P [X2 ≤ 1] = FX1 (1) FX2 (1) = [FX (1)]2 = (3)
16
(d)
For the event A = {X > Y }, this problem asks us to calculate the conditional expectations E[X|A]
and E[Y |A]. We will do this using the conditional joint PDF fX,Y |A (x, y). Since X and Y are
independent, it is tempting to argue that the event X > Y does not alter the probability model
for X and Y . Unfortunately, this is not the case. When we learn that X > Y , it increases the
probability that X is large and Y is small. We will see this when we compare the conditional
expectations E[X|A] and E[Y |A] to E[X] and E[Y ].
(a) We can calculate the unconditional expectations, E[X] and E[Y], using the marginal PDFs f_X(x) and f_Y(y).
E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx = \int_0^1 2x^2 \, dx = 2/3   (2)
E[Y] = \int_{-\infty}^{\infty} y f_Y(y) \, dy = \int_0^1 3y^3 \, dy = 3/4   (3)
(b) First, we need to calculate the conditional joint PDF f_{X,Y|A}(x,y). The first step is to write down the joint PDF of X and Y:
f_{X,Y}(x,y) = f_X(x) f_Y(y) = \begin{cases} 6xy^2 & 0 \le x \le 1, \ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}   (4)
The event A has probability
P[A] = \iint_{x>y} f_{X,Y}(x,y) \, dy\, dx   (5)
     = \int_0^1 \int_0^x 6xy^2 \, dy\, dx   (6)
     = \int_0^1 2x^4 \, dx = 2/5   (7)
The conditional joint PDF of X and Y given A is
f_{X,Y|A}(x,y) = \begin{cases} \dfrac{f_{X,Y}(x,y)}{P[A]} & (x,y) \in A \\ 0 & \text{otherwise} \end{cases}   (8)
               = \begin{cases} 15xy^2 & 0 \le y \le x \le 1 \\ 0 & \text{otherwise} \end{cases}   (9)
The triangular region of nonzero probability is a signal that given A, X and Y are no longer independent. The conditional expected value of X given A is
E[X|A] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f_{X,Y|A}(x,y) \, dy\, dx   (10)
       = 15 \int_0^1 x^2 \int_0^x y^2 \, dy\, dx   (11)
       = 5 \int_0^1 x^5 \, dx = 5/6.   (12)
We see that E[X|A] > E[X] while E[Y |A] < E[Y ]. That is, learning X > Y gives us a clue
that X may be larger than usual while Y may be smaller than usual.
Hence, F_{X,Y}(x,y) = F_X(x) F_Y(y) implies that X and Y are independent.
If X and Y are independent, then by Definition 4.3,
F_{X,Y}(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u,v) \, dv\, du   (3)
             = \left( \int_{-\infty}^{x} f_X(u) \, du \right) \left( \int_{-\infty}^{y} f_Y(v) \, dv \right)   (4)
             = F_X(x) F_Y(y)   (5)
(a) To find if W and X are independent, we must be able to factor the joint density function
fX,W (x, w) into the product fX (x)fW (w) of marginal density functions. To verify this, we
must find the joint PDF of X and W . First we find the joint CDF.
FX,W (x, w) = P [X ≤ x, W ≤ w] (1)
= P [X ≤ x, Y − X ≤ w] = P [X ≤ x, Y ≤ X + w] (2)
Since Y \ge X, the CDF of W satisfies F_{X,W}(x,w) = P[X \le x, X \le Y \le X + w]. Thus, for x \ge 0 and w \ge 0,
F_{X,W}(x,w) = \int_0^x \int_{x'}^{x'+w} \lambda^2 e^{-\lambda y} \, dy\, dx'   (3)
             = \int_0^x \left[ -\lambda e^{-\lambda y} \right]_{x'}^{x'+w} dx'   (4)
             = \int_0^x \left( -\lambda e^{-\lambda(x'+w)} + \lambda e^{-\lambda x'} \right) dx'   (5)
             = \left[ e^{-\lambda(x'+w)} - e^{-\lambda x'} \right]_0^x   (6)
             = (1 - e^{-\lambda x})(1 - e^{-\lambda w})   (7)
We see that F_{X,W}(x,w) = F_X(x) F_W(w). Moreover, by applying Theorem 4.4,
f_{X,W}(x,w) = \frac{\partial^2 F_{X,W}(x,w)}{\partial x \, \partial w} = \lambda e^{-\lambda x} \cdot \lambda e^{-\lambda w} = f_X(x) f_W(w).   (8)
Since we have our desired factorization, W and X are independent.
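A numerical check of the factorization in (7) is straightforward. The following sketch uses integral2 with arbitrary test values of lambda, x and w; these values are illustrative assumptions, not part of the problem.

% Compare the double integral over {X <= x, X <= Y <= X + w} with F_X(x)F_W(w).
lambda = 2; x = 0.7; w = 0.4;
f = @(u,y) lambda^2 * exp(-lambda*y);              % joint PDF of (X,Y) for 0 <= u <= y
lhs = integral2(f, 0, x, @(u) u, @(u) u + w);      % P[X <= x, X <= Y <= X + w]
rhs = (1 - exp(-lambda*x)) * (1 - exp(-lambda*w));
[lhs rhs]                                          % should agree closely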
(b) Following the same procedure, we find the joint CDF of Y and W .
FW,Y (w, y) = P [W ≤ w, Y ≤ y] = P [Y − X ≤ w, Y ≤ y] (9)
= P [Y ≤ X + w, Y ≤ y] . (10)
The region of integration corresponding to the event {Y ≤ x + w, Y ≤ y} depends on whether
y < w or y ≥ w. Keep in mind that although W = Y − X ≤ Y , the dummy arguments y and
w of fW,Y (w, y) need not obey the same constraints. In any case, we must consider each case
separately.
For y > w, the integration is
' y−w ' u+w
Y
FW,Y (w, y) = λ2 e−λv dv du
{Y<y}Ç{Y<X+w} 0 u
' y ' y
y + λ2 e−λv dv du (11)
y−w u
w ' y−w
X =λ e−λu − e−λ(u+w) du
y-w y 0
' y
+λ e−λu − e−λy du (12)
y−w
It follows that
$y−w $y
$ $
FW,Y (w, y) = −e−λu + e−λ(u+w) $ + −e−λu − uλe−λy $ (13)
0 y−w
= 1 − e−λw − λwe−λy . (14)
For y ≤ w,
Y ' y ' y
FW,Y (w, y) = λ2 e−λv dv du (15)
{Y<y} '0
y
u
w = −λe−λy + λe−λu du (16)
y 0
$y
$
X = −λue−λy − e−λu $ (17)
0
= 1 − (1 + λy)e−λy (18)
The complete expression for the joint CDF is
F_{W,Y}(w,y) = \begin{cases} 1 - e^{-\lambda w} - \lambda w e^{-\lambda y} & 0 \le w \le y \\ 1 - (1 + \lambda y)e^{-\lambda y} & 0 \le y \le w \\ 0 & \text{otherwise} \end{cases}   (19)
Applying Theorem 4.4 yields
f_{W,Y}(w,y) = \frac{\partial^2 F_{W,Y}(w,y)}{\partial w\, \partial y} = \begin{cases} \lambda^2 e^{-\lambda y} & 0 \le w \le y \\ 0 & \text{otherwise} \end{cases}   (20)
The joint PDF fW,Y (w, y) doesn’t factor and thus W and Y are dependent.
184
Problem 4.10.17 Solution
We need to define the events A = {U ≤ u} and B = {V ≤ v}. In this case,
FU,V (u, v) = P [AB] = P [B] − P [Ac B] = P [V ≤ v] − P [U > u, V ≤ v] (1)
Note that U = min(X, Y ) > u if and only if X > u and Y > u. In the same way, since V =
max(X, Y ), V ≤ v if and only if X ≤ v and Y ≤ v. Thus
P[U > u, V \le v] = P[X > u, Y > u, X \le v, Y \le v]   (2)
                  = P[u < X \le v, u < Y \le v]   (3)
Thus, the joint CDF of U and V satisfies
F_{U,V}(u,v) = P[V \le v] - P[U > u, V \le v]   (4)
             = P[X \le v, Y \le v] - P[u < X \le v, u < Y \le v]   (5)
Since X and Y are independent random variables,
F_{U,V}(u,v) = P[X \le v] P[Y \le v] - P[u < X \le v] P[u < Y \le v]   (6)
             = F_X(v) F_Y(v) - (F_X(v) - F_X(u))(F_Y(v) - F_Y(u))   (7)
             = F_X(v) F_Y(u) + F_X(u) F_Y(v) - F_X(u) F_Y(u)   (8)
The joint PDF is
f_{U,V}(u,v) = \frac{\partial^2 F_{U,V}(u,v)}{\partial u\, \partial v}   (9)
             = \frac{\partial}{\partial u} \left[ f_X(v) F_Y(u) + F_X(u) f_Y(v) \right]   (10)
             = f_X(v) f_Y(u) + f_X(u) f_Y(v)   (11)
185
Problem 4.11.2 Solution
For the joint PDF
2 −4xy+4y 2 )
fX,Y (x, y) = ce−(2x , (1)
we proceed as in Problem 4.11.1 to find values for σY , σX , E[X], E[Y ] and ρ.
In the problem statement, we learn that E[Y |X] = X/2. Hence ρ = 1/2. From Definition 4.17, the
joint PDF is
1 2 2
fX,Y (x, y) = √ e−2(x −xy+y )/3 (3)
3π 2
186
Problem 4.11.4 Solution
The event B is the set of outcomes satisfying X^2 + Y^2 \le 2^2. Of course, the calculation of P[B] depends on the probability model for X and Y.
(a) In this instance, X and Y have the same PDF
f_X(x) = f_Y(x) = \begin{cases} 0.01 & -50 \le x \le 50 \\ 0 & \text{otherwise} \end{cases}   (1)
Since X and Y are independent, their joint PDF is
f_{X,Y}(x,y) = f_X(x) f_Y(y) = \begin{cases} 10^{-4} & -50 \le x \le 50, \ -50 \le y \le 50 \\ 0 & \text{otherwise} \end{cases}   (2)
Because X and Y have a uniform PDF over the bullseye area, P[B] is just the value of the joint PDF over the area times the area of the bullseye.
P[B] = P[X^2 + Y^2 \le 2^2] = 10^{-4} \cdot \pi 2^2 = 4\pi \times 10^{-4} \approx 0.0013   (3)
(b) In this case, the joint PDF of X and Y is inversely proportional to the area of the target.
f_{X,Y}(x,y) = \begin{cases} 1/[\pi 50^2] & x^2 + y^2 \le 50^2 \\ 0 & \text{otherwise} \end{cases}   (4)
The probability of a bullseye is
P[B] = P[X^2 + Y^2 \le 2^2] = \frac{\pi 2^2}{\pi 50^2} = \left( \frac{1}{25} \right)^2 \approx 0.0016.   (5)
(c) In this instance, X and Y have the identical Gaussian (0, \sigma) PDF with \sigma^2 = 100; i.e.,
f_X(x) = f_Y(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/2\sigma^2}   (6)
Since X and Y are independent, their joint PDF is
f_{X,Y}(x,y) = f_X(x) f_Y(y) = \frac{1}{2\pi\sigma^2} e^{-(x^2+y^2)/2\sigma^2}   (7)
To find P[B], we write
P[B] = P[X^2 + Y^2 \le 2^2] = \iint_{x^2+y^2 \le 2^2} f_{X,Y}(x,y) \, dx\, dy   (8)
     = \frac{1}{2\pi\sigma^2} \iint_{x^2+y^2 \le 2^2} e^{-(x^2+y^2)/2\sigma^2} \, dx\, dy   (9)
This integral is easy using polar coordinates. With the substitutions x^2 + y^2 = r^2 and dx\,dy = r\,dr\,d\theta,
P[B] = \frac{1}{2\pi\sigma^2} \int_0^{2\pi} \int_0^2 e^{-r^2/2\sigma^2} r \, dr\, d\theta   (10)
     = \frac{1}{\sigma^2} \int_0^2 r e^{-r^2/2\sigma^2} \, dr   (11)
     = \left. -e^{-r^2/2\sigma^2} \right|_0^2 = 1 - e^{-4/200} \approx 0.0198.   (12)
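All three cases can also be verified with a quick Monte Carlo experiment; the sketch below is illustrative only, and the sample size and seed are arbitrary choices.

% Estimate P[B] under the three models of the dart's impact point.
rng(0); n = 1e7;
xu = 100*rand(n,1) - 50;  yu = 100*rand(n,1) - 50;   % (a) uniform over the square
pa = mean(xu.^2 + yu.^2 <= 4);
r  = 50*sqrt(rand(n,1));                             % (b) radius for a uniform point on the disk
pb = mean(r <= 2);
xg = 10*randn(n,1);  yg = 10*randn(n,1);             % (c) Gaussian, sigma = 10
pc = mean(xg.^2 + yg.^2 <= 4);
[pa pb pc]          % compare with 4*pi*1e-4, 1/625 and 1 - exp(-0.02)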
Problem 4.11.5 Solution
Given that the temperature is high, then W is measured. Since \rho = 0, W and T are independent and
q = P[W > 10] = P\left[ \frac{W - 7}{2} > \frac{10 - 7}{2} \right] = 1 - \Phi(1.5) = 0.067.   (2)
The tree for this experiment branches first on T > 38 (probability p) versus T \le 38 (probability 1 - p) and then, when T > 38, on W > 10 (probability q) versus W \le 10 (probability 1 - q). Thus
P[I] = P[T > 38, W > 10] = P[T > 38] P[W > 10] = pq = 0.0107.   (3)
Plugging in the various parameters gives
√
E [W |T = t] = 7 + 2(t − 37) and Var [W |T = t] = 2 (9)
Using this conditional mean and variance, we obtain the conditional Gaussian PDF
1 √ 2
fW |T (w|t) = √ e−(w−(7+ 2(t−37))) /4 . (10)
4π
Given T = t, the conditional probability the person is declared ill is
Plugging these values into the equation for b, it follows that b = −2acρ, or, equivalently, ρ = −b/2ac.
This implies
1
d2 = 2 2 2 = (1 − ρ2 )a2 c2 = a2 c2 − b2 /4 (3)
4π σX σY (1 − ρ2 )
Since |ρ| ≤ 1,
we see that |b| ≤ 2ac. Further, for any choice of a, b and c that meets this constraint,
choosing d = a2 c2 − b2 /4 yields a valid PDF.
189
and y, we obtain
' ∞' ∞ ' ∞ ' ∞
1 −(x−µX )2 /2σX
2 1 2 2
fX,Y (x, y) dx dy = √ e √ e−(y−µ̃Y (x)) /2σ̃Y
dy dx (2)
−∞ −∞ −∞ σX 2π −∞ σ̃Y 2π
1
' ∞
1 2 2
= √ e−(x−µX ) /2σX
dx (3)
−∞ σX 2π
The marked integral equals 1 because for each value of x, it is the integral of a Gaussian PDF of
one variable over all possible values. In fact, it is the integral of the conditional PDF fY |X (y|x)
over all possible y. To complete the proof, we see that
' ∞' ∞ ' ∞
1 2 2
fX,Y (x, y) dx dy = √ e−(x−µX ) /2σX dx = 1 (4)
−∞ −∞ −∞ σX 2π
since the remaining integral is the integral of the marginal Gaussian PDF fX (x) over all possible
x.
Since Var[Y ] = E[Y 2 ] − (E[Y ])2 , we will find the moments of Y . The first moment is
For the second moment of Y , we follow the problem hint and use the iterated expectation
E Y 2 = E X12 X22 = E E X12 X22 |X2 = E X22 E X12 |X2 . (3)
It follows that
E X12 X22 = E X22 E X12 |X22 (8)
2
σ1 2 σ1
= E [µ1 + σ1 (1 − ρ )]X2 + 2ρµ1 (X2 − µ2 )X2 + ρ 2 (X2 − µ2 ) X2 .
2 2 2 2 2 2 2
(9)
σ2 σ2
190
Since E[X22 ] = σ22 + µ22 ,
E X12 X22 = µ21 + σ12 (1 − ρ2 ) (σ22 + µ22 )
σ1 σ2
+ 2ρµ1 E (X2 − µ2 )X22 + ρ2 12 E (X2 − µ2 )2 X22 . (10)
σ2 σ2
We observe that
E (X2 − µ2 )X22 = E (X2 − µ2 )(X2 − µ2 + µ2 )2 (11)
= E (X2 − µ2 ) (X2 − µ2 )2 + 2µ2 (X2 − µ2 ) + µ22 (12)
= E (X2 − µ2 )3 + 2µ2 E (X2 − µ2 )2 + µ2 E [(X2 − µ2 )] (13)
We recall that E[X2 − µ2 ] = 0 and that E[(X2 − µ2 )2 ] = σ22 . We now look ahead to Problem 6.3.4
to learn that
E (X2 − µ2 )3 = 0, E (X2 − µ2 )4 = 3σ24 . (14)
This implies
E (X2 − µ2 )X22 = 2µ2 σ22 . (15)
It follows that
E (X2 − µ2 )2 X22 = 3σ24 + µ22 σ22 . (20)
191
>> format rat;
>> imagepmf;
>> [SX(:) SY(:) PXY(:)]
ans =
800 400 1/5
1200 400 1/20
1600 400 0
800 800 1/20
1200 800 1/5
1600 800 1/10
800 1200 1/10
1200 1200 1/10
1600 1200 1/5
>>
Note that the command format rat wasn’t necessary; it just formats the output as rational num-
bers, i.e., ratios of integers, which you may or may not find esthetically pleasing.
function ex=finiteexp(sx,px);
%Usage: ex=finiteexp(sx,px)
%returns the expected value E[X]
%of finite random variable X described
%by samples sx and probabilities px
ex=sum((sx(:)).*(px(:)));
Note that finiteexp performs its calculations on the sample values sx and probabilities px using
the column vectors sx(:) and px(:). As a result, we can use the same finiteexp function when
the random variable is represented by grid variables. For example, we can calculate the correlation
r = E[XY ] as
r=finiteexp(SX.*SY,PXY)
function covxy=finitecov(SX,SY,PXY);
%Usage: cxy=finitecov(SX,SY,PXY)
%returns the covariance of
%finite random variables X and Y
%given by grids SX, SY, and PXY
ex=finiteexp(SX,PXY);
ey=finiteexp(SY,PXY);
R=finiteexp(SX.*SY,PXY);
covxy=R-ex*ey;
192
The following script calculates the desired quantities:
%imageavg.m >> imageavg
%Solution for Problem 4.12.2 ex =
imagepmf; %defines SX, SY, PXY 1180
ex=finiteexp(SX,PXY) ey =
ey=finiteexp(SY,PXY) 860
rxy=finiteexp(SX.*SY,PXY) rxy =
cxy=finitecov(SX,SY,PXY) 1064000
cxy =
49200
>>
The careful reader will observe that imageavg is inefficiently coded in that the correlation E[XY] is calculated twice, once directly and once inside of finitecov. For more complex problems, it would be worthwhile to avoid this duplication.
For functions like FX,Y(x, y) that have multiple cases, we calculate the function for each case and multiply by the corresponding boolean condition so as to have a zero contribution when that case doesn't apply. Using this technique, it's important to define the boundary conditions carefully to make sure that no point is included in two different boundary conditions.
The only catch is that for a given value of y, we need to calculate the binomial probability of x − y
successes in (n − y − 1) trials. We can do this using the function call
binomialpmf(n-y-1,p,x-y)
193
However, this function expects the argument n-y-1 to be a scalar. As a result, we must perform a
separate call to binomialpmf for each value of y.
An alternate solution is direct calculation of the PMF PX,Y (x, y) in Problem 4.2.6. In this
case, we calculate m! using the Matlab function gamma(m+1). Because, gamma(x) function will
calculate the gamma function for each element in a vector x, we can calculate the PMF without
any loops:
function [SX,SY,PXY]=circuits2(n,p);
%Usage: [SX,SY,PXY]=circuits2(n,p);
% (See Problem 4.12.4)
[SX,SY]=ndgrid(0:n,0:n);
PXY=0*SX;
PXY(find((SX==n) & (SY==n)))=p^n;
I=find((SY<=SX) &(SX<n));
PXY(I)=(gamma(n-SY(I))./(gamma(SX(I)-SY(I)+1)...
.*gamma(n-SX(I)))).*(p.^SX(I)).*((1-p).^(n-SX(I)));
Some experimentation with cputime will show that circuits2(n,p) runs much faster than circuits(n,p). As is typical, the for loop in circuits results in time wasted running the Matlab interpreter and in regenerating the binomial PMF in each cycle.
To finish the problem, we need to calculate the correlation coefficient
Cov [X, Y ]
ρX,Y = . (1)
σX σY
In fact, this is one of those problems where a general solution is better than a specific solution.
The general problem is that given a pair of finite random variables described by the grid variables
SX, SY and PMF PXY, we wish to calculate the correlation coefficient
This problem is solved in a few simple steps. First we write a function that calculates the
expected value of a finite random variable.
function ex=finiteexp(sx,px);
%Usage: ex=finiteexp(sx,px)
%returns the expected value E[X]
%of finite random variable X described
%by samples sx and probabilities px
ex=sum((sx(:)).*(px(:)));
Note that finiteexp performs its calculations on the sample values sx and probabilities px using
the column vectors sx(:) and px(:). As a result, we can use the same finiteexp function when
the random variable is represented by grid variables. We can build on finiteexp to calculate the
variance using finitevar:
function v=finitevar(sx,px);
%Usage: ex=finitevar(sx,px)
% returns the variance Var[X]
% of finite random variables X described by
% samples sx and probabilities px
ex2=finiteexp(sx.^2,px);
ex=finiteexp(sx,px);
v=ex2-(ex^2);
194
function rho=finitecoeff(SX,SY,PXY);
%Usage: rho=finitecoeff(SX,SY,PXY)
%Calculate the correlation coefficient rho of
%finite random variables X and Y
ex=finiteexp(SX,PXY); vx=finitevar(SX,PXY);
ey=finiteexp(SY,PXY); vy=finitevar(SY,PXY);
R=finiteexp(SX.*SY,PXY);
rho=(R-ex*ey)/sqrt(vx*vy);
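As a usage sketch (assuming the grid variables produced by imagepmf.m, as in the earlier examples; this snippet is not part of the original solution), the correlation coefficient can then be computed with a single call:

imagepmf;                            % defines the grids SX, SY and the PMF PXY
rho = finitecoeff(SX,SY,PXY)         % correlation coefficient of X and Y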
>> t2=cputime;w2=wrv2(1,1,1000000);t2=cputime-t2
t2 =
0.2500
>> t1=cputime;w1=wrv1(1,1,1000000);t1=cputime-t1
t1 =
0.7610
>>
We see in our simple experiments that wrv2 is faster by a rough factor of 3. (Note that repeating
such trials yielded qualitatively similar results.)
196
Problem Solutions – Chapter 5
(a) Since the four types of repairs are mutually exclusive choices and since 4 laptops are returned
for repair, the joint distribution of N1 , . . . , N4 is the multinomial PMF
4
PN1 ,...,N4 (n1 , . . . , n4 ) = pn1 pn2 pn3 pn4 (1)
n 1 , n2 , n3 , n4 1 2 3 4
8 n1 4 n2 2 n3 1 n4
n 1 !n2
4!
!n3 !n4 ! 15 15 15 15 n1 + · · · + n4 = 4; ni ≥ 0
=
0 otherwise
(2)
(b) Let L2 denote the event that exactly two laptops need LCD repairs. Thus P [L2 ] = PN1 (2).
Since each laptop requires an LCD repair with probability p1 = 8/15, the number of LCD
repairs, N1 , is a binomial (4, 8/15) random variable with PMF
4
PN1 (n1 ) = (8/15)n1 (7/15)4−n1 (3)
n1
(c) A repair is type (2) with probability p2 = 4/15. A repair is type (3) with probability p3 =
2/15; otherwise a repair is type “other” with probability po = 9/15. Define X as the number
of “other” repairs needed. The joint PMF of X, N2 , N3 is the multinomial PMF
n2
n3
x
4 4 2 9
PN2 ,N3 ,X (n2 , n3 , x) = (5)
n 2 , n3 , x 15 15 15
Similarly, since each repair is a motherboard repair with probability p2 = 4/15, the number
of motherboard repairs has binomial PMF
n2
4−n2
4 4 11
PN2 (n2 ) n2 = (9)
n2 15 15
197
Finally, the probability that more laptops require motherboard repairs than keyboard repairs
is
P [N2 > N3 ] = PN2 ,N3 (1, 0) + PN2 ,N3 (2, 0) + PN2 ,N3 (2, 1) + PN2 (3) + PN2 (4) (10)
where we use the fact that if N2 = 3 or N2 = 4, then we must have N2 > N3 . Inserting the
various probabilities, we obtain
P [N2 > N3 ] = PN2 ,N3 (1, 0) + PN2 ,N3 (2, 0) + PN2 ,N3 (2, 1) + PN2 (3) + PN2 (4) (11)
Since a computer has feature i with probability pi independent of whether any other feature is
on the computer, the number Ni of computers with feature i is independent of the number of
computers with any other features. That is, N1 , . . . , N4 are mutually independent and have joint
PMF
PN1 ,...,N4 (n1 , . . . , n4 ) = PN1 (n1 ) PN2 (n2 ) PN3 (n3 ) PN4 (n4 ) (2)
However, simplifying the above integral depends on the values of each xi . In particular,
fX1 ,...,Xn (y1 , . . . , yn ) = 1 if and only if 0 ≤ yi ≤ 1 for each i. Since FX1 ,...,Xn (x1 , . . . , xn ) = 0
if any xi < 0, we limit, for the moment, our attention to the case where xi ≥ 0 for all i. In
this case, some thought will show that we can write the limits in the following way:
' max(1,x1 ) ' min(1,xn )
FX1 ,...,Xn (x1 , . . . , xn ) = ··· dy1 · · · dyn (2)
0 0
= min(1, x1 ) min(1, x2 ) · · · min(1, xn ) (3)
198
(b) For n = 3,
1 − P min Xi ≤ 3/4 = P min Xi > 3/4 (5)
i i
However, just keep in mind that the inequalities 0 ≤ x and x ≤ 1 are vector inequalities that must
hold for every component xi .
Given fX (x) with c = 2/3 and a1 = a2 = a3 = 1 in Problem 5.2.2, find the marginal
PDF fX3 (x3 ).
1
The wrong problem statement appears in the first printing.
199
Filling in the parameters in Problem 5.2.2, we obtain the vector PDF
2
fX (x) = 3 (x1 + x2 + x3 ) 0 ≤ x1 , x2 , x3 ≤ 1 (1)
0 otherwise
Since PJi (j) > 0 only for integers j > 0, we have that PK (k) > 0 only for 0 < k1 < k2 < k3 ;
otherwise PK (k) = 0. Finally, for 0 < k1 < k2 < k3 ,
200
The complete expression is
p2 (1 − p)k2 −2 1 ≤ k1 < k2
PK1 ,K2 (k1 , k2 ) = (6)
0 otherwise
Next we find PK1 ,K3 (k1 , k3 ). For k1 ≥ 1 and k3 ≥ k1 + 2, we have
∞
3 −1
k
PK1 ,K3 (k1 , k3 ) = PK1 ,K2 ,K3 (k1 , k2 , k3 ) = p3 (1 − p)k3 −3 (7)
k2 =−∞ k2 =k1 +1
(b) Going back to first principles, we note that Kn is the number of trials up to and including
the nth success. Thus K1 is a geometric (p) random variable, K2 is an Pascal (2, p) random
variable, and K3 is an Pascal (3, p) random variable. We could write down the respective
marginal PMFs of K1 , K2 and K3 just by looking up the Pascal (n, p) PMF. Nevertheless, it
is instructive to derive these PMFs from the joint PMF PK1 ,K2 ,K3 (k1 , k2 , k3 ).
For k1 ≥ 1, we can find PK1 (k1 ) via
∞
∞
PK1 (k1 ) = PK1 ,K2 (k1 , k2 ) = p2 (1 − p)k2 −2 (13)
k2 =−∞ k2 =k1 +1
= p (1 − p)k1 −1 [1 + (1 − p) + (1 − p)2 + · · · ]
2
(14)
k1 −1
= p(1 − p) (15)
The complete expression for the PMF of K1 is the usual geometric PMF
p(1 − p)k1 −1 k1 = 1, 2, . . . ,
PK1 (k1 ) = (16)
0 otherwise.
Following the same procedure, the marginal PMF of K2 is
∞
2 −1
k
PK2 (k2 ) = PK1 ,K2 (k1 , k2 ) = p2 (1 − p)k2 −2 (17)
k1 =−∞ k1 =1
201
Since PK2 (k2 ) = 0 for k2 < 2, the complete PMF is the Pascal (2, p) PMF
k2 − 1 2
PK2 (k2 ) = p (1 − p)k2 −2 (19)
1
202
For 0 ≤ y1 ≤ 1, the marginal PDF of Y1 can be found from
' ∞ ' 1
fY1 (y1 ) = fY1 ,Y2 (y1 , y2 ) dy2 = 12(1 − y2 )2 dy2 = 4(1 − y1 )3 (10)
−∞ y1
Now we note that the following events are one in the same:
3 4
{N0 = n0 , N1 = n1 } = N0 = n0 , N1 = n1 , N̂ = 10000 − n0 − n1 (4)
203
(a) To evaluate the joint PMF of N1 and N2 , we define a new experiment with mutually exclusive
events: s1 , s2 and “other” Let N̂ denote the number of trial outcomes that are “other”. In
this case, a trial is in the “other” category with probability p̂ = 1 − p1 − p2 . The joint PMF
of N1 , N2 , and N̂ is
n!
PN1 ,N2 ,N̂ (n1 , n2 , n̂) = pn1 pn2 (1 − p1 − p2 )n̂ n1 + n2 + n̂ = n (2)
n1 !n2 !n̂! 1 2
Now we note that the following events are one in the same:
3 4
{N1 = n1 , N2 = n2 } = N1 = n1 , N2 = n2 , N̂ = n − n1 − n2 (3)
(b) We could find the PMF of Ti by summing the joint PMF PN1 ,...,Nr (n1 , . . . , nr ). However, it
is easier to start from first principles. Suppose we say a success occurs if the outcome of
the trial is in the set {s1 , s2 , . . . , si } and otherwise a failure occurs. In this case, the success
probability is qi = p1 + · · · + pi and Ti is the number of successes in n trials. Thus, Ti has the
binomial PMF n t
t qi (1 − qi )
n−t t = 0, 1, . . . , n
PTi (t) = (6)
0 otherwise
n!
PT1 ,T2 (t1 , t2 ) = pt1 pt2 −t1 (1 − p1 − p2 )n−t2 0 ≤ t 1 ≤ t2 ≤ n (10)
t1 !(t2 − t1 )!(n − t2 )! 1 2
(a) Note that Z is the number of three-page faxes. In principle, we can sum the joint PMF
PX,Y,Z (x, y, z) over all x, y to find PZ (z). However, it is better to realize that each fax has 3
pages with probability 1/6, independent of any other fax. Thus, Z has the binomial PMF
5 z 5−z z = 0, 1, . . . , 5
PZ (z) = z (1/6) (5/6) (1)
0 otherwise
(b) From the properties of the binomial distribution given in Appendix A, we know that E[Z] =
5(1/6).
204
(c) We want to find the conditional PMF of the number X of 1-page faxes and number Y of
2-page faxes given Z = 2 3-page faxes. Note that given Z = 2, X + Y = 3. Hence for
non-negative integers x, y satisfying x + y = 3,
5! x y 2
PX,Y,Z (x, y, 2) x!y!2! (1/3) (1/2) (1/6)
PX,Y |Z (x, y|2) = = 5 (2)
PZ (2) 2
2 (1/6) (5/6)
3
That is, given Z = 2, there are 3 faxes left, each of which independently could be a 1-page fax.
The conditonal PMF of the number of 1-page faxes is binomial where 2/5 is the conditional
probability that a fax has 1 page given that it either has 1 page or 2 pages. Moreover given
X = x and Z = 2 we must have Y = 3 − x.
(d) Given Z = 2, the conditional PMF of X is binomial for 3 trials and success probability 2/5.
The conditional expectation of X given Z = 2 is E[X|Z = 2] = 3(2/5) = 6/5.
(e) There are several ways to solve this problem. The most straightforward approach is to
realize that for integers 0 ≤ x ≤ 5 and 0 ≤ y ≤ 5, the event {X = x, Y = y} occurs iff
{X = x, Y = y, Z = 5 − (x + y)}. For the rest of this problem, we assume x and y are non-
negative integers so that
The above expression may seem unwieldy and it isn’t even clear that it will sum to 1. To
simplify the expression, we observe that
Using PZ (z) found in part (c), we can calculate PX,Y |Z (x, y|5 − x − y) for 0 ≤ x + y ≤ 5,
integer valued.
PX,Y,Z (x, y, 5 − x − y)
PX,Y |Z (x, y|5 − x + y) = (8)
PZ (5 − x − y)
x
y
x+y 1/3 1/2
= (9)
x 1/2 + 1/3 1/2 + 1/3
x
(x+y)−x
x+y 2 3
= (10)
x 5 5
In the above expression, it is wise to think of x + y as some fixed value. In that case, we see
that given x + y is a fixed value, X and Y have a joint PMF given by a binomial distribution
205
in x. This should not be surprising since it is just a generalization of the case when Z = 2.
That is, given that there were a fixed number of faxes that had either one or two pages, each
of those faxes is a one page fax with probability (1/3)/(1/2 + 1/3) and so the number of one
page faxes should have a binomial distribution, Moreover, given the number X of one page
faxes, the number Y of two page faxes is completely specified.
Finally, by rewriting PX,Y (x, y) given above, the complete expression for the joint PMF of X
and Y is
1 5−x−y 5 x+y x+y 2 x 3 y
5
5−x−y 6 6 x 5 5 x, y ≥ 0
PX,Y (x, y) = (11)
0 otherwise
{K1 = k1 , K2 = k2 , · · · , Kn = kn } (2)
Thus
206
(b) For j < n,
PK1 ,K2 ,...,Kj (k1 , k2 , . . . , kj ) = PKj (kj ) . (8)
Since Kj is just Kn with n = j, we have
j
p (1 − p)kj −j 1 ≤ k1 < k2 < · · · < kj ,
PKj (kj ) = (9)
0 otherwise.
(c) Rather than try to deduce PKi (ki ) from the joint PMF PKn (kn ), it is simpler to return to
first principles. In particular, Ki is the number of trials up to and including the ith success
and has the Pascal (i, p) PMF
ki − 1 i
PKi (ki ) = p (1 − p)ki −i . (10)
i−1
P_{N_1,N_2,N_3,N_4}(4, 4, 4, 4) = 0 \ne p_1^4 p_2^4 p_3^4 p_4^4 = P_{N_1}(4) P_{N_2}(4) P_{N_3}(4) P_{N_4}(4).   (1)
207
Thus,
1 0 ≤ x ≤ 1,
fX1 (x1 ) = (4)
0 otherwise.
Following similar steps, one can show that
1 0 ≤ x ≤ 1,
fX1 (x) = fX2 (x) = fX3 (x) = fX4 (x) = (5)
0 otherwise.
Thus
fX (x) = fX1 (x) fX2 (x) fX3 (x) fX4 (x) . (6)
We conclude that X1 , X2 , X3 and X4 are independent.
Thus,
e−x1 x1 ≥ 0,
fX1 (x1 ) = (5)
0 otherwise.
Following similar steps, one can show that
' ∞' ∞ −2x
2 2 x2 ≥ 0,
fX2 (x2 ) = fX (x) dx1 dx3 = (6)
0 0 0 otherwise.
' ∞' ∞ −3x
3 3 x3 ≥ 0,
fX3 (x3 ) = fX (x) dx1 dx2 = (7)
0 0 0 otherwise.
Thus
fX (x) = fX1 (x1 ) fX2 (x2 ) fX3 (x3 ) . (8)
We conclude that X1 , X2 , and X3 are independent.
f_{X_1,X_2,X_3}(10, 9, 8) = 0 \ne f_{X_1}(10) f_{X_2}(9) f_{X_3}(8).   (1)
It follows that X_1, X_2 and X_3 are dependent. Readers who find this quick answer dissatisfying are invited to confirm this conclusion by solving Problem 5.4.6 for the exact expressions for the marginal PDFs f_{X_1}(x_1), f_{X_2}(x_2), and f_{X_3}(x_3).
208
Problem 5.4.6 Solution
We find the marginal PDFs using Theorem 5.5. First we note that for x < 0, fXi (x) = 0. For
x1 ≥ 0,
' ∞ ' ∞
' ∞
−x3
fX1 (x1 ) = e dx3 dx2 = e−x2 dx2 = e−x1 (1)
x1 x2 x1
Lastly,
' x3 ' x3
' x3
−x3
fX3 (x3 ) = e dx2 dx1 = (x3 − x1 )e−x3 dx1 (3)
0 x1 0
$x1 =x3
1 $
2 −x3 $ 1
= − (x3 − x1 ) e $ = x23 e−x3 (4)
2 x1 =0 2
Since U_1, . . . , U_n are continuous, P[U_i = U_j] = 0 for all i \ne j. For the same reason, P[X_i = X_j] = 0 for i \ne j. Thus we need only to consider the case when x_1 < x_2 < \cdots < x_n.
To understand the claim, it is instructive to start with the n = 2 case. In this case, (X1 , X2 ) =
(x1 , x2 ) (with x1 < x2 ) if either (U1 , U2 ) = (x1 , x2 ) or (U1 , U2 ) = (x2 , x1 ). For infinitesimal ∆,
209
For the general case of n uniform random variables, we define π = π(1) . . . π(n) as a permu-
tation vector of the integers 1, 2, . . . , n and Π as the set of n! possible permutation vectors. In this
case, the event {X1 = x1 , X2 = x2 , . . . , Xn = xn } occurs if
Since there are n! permutations and fU1 ,...,Un (xπ(1) , . . . , xπ(n) ) = 1/T n for each permutation π, we
can conclude that
fX1 ,...,Xn (x1 , . . . , xn ) = n!/T n . (8)
Since the order statistics are necessarily ordered, fX1 ,...,Xn (x1 , . . . , xn ) = 0 unless x1 < · · · < xn .
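The n!/T^n density can also be seen in a small simulation. The sketch below is purely illustrative (n, T, the test point and the bin width are arbitrary choices): it estimates the joint density of the order statistics near one ordered point and compares it with n!/T^n.

% Monte Carlo estimate of the order-statistic density at a point.
rng(4); n = 3; T = 1; trials = 1e6; delta = 0.05;
X  = sort(T*rand(trials,n), 2);                    % order statistics, row by row
x0 = repmat([0.2 0.5 0.7], trials, 1);             % a point with x1 < x2 < x3
hits = all(abs(X - x0) < delta/2, 2);
[mean(hits)/delta^n, factorial(n)/T^n]             % both should be near 6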
For an arbitrary matrix A, the system of equations Ax = y − b may have no solutions (if the
columns of A do not span the vector space), multiple solutions (if the columns of A are linearly
dependent), or, when A is invertible, exactly one solution. In the invertible case,
PY (y) = P [AX = y − b] = P X = A−1 (y − b) = PX A−1 (y − b) . (2)
As an aside, we note that when Ax = y − b has multiple solutions, we would need to do some
bookkeeping to add up the probabilities PX (x) for all vectors x satisfying Ax = y − b. This can
get disagreeably complicated.
210
Problem 5.5.3 Solution
The response time Xi of the ith truck has PDF fXi (xi ) and CDF FXi (xi ) given by
1 −x/2
2 e x ≥ 0, 1 − e−x/2 x ≥ 0
fXi (xi ) = FXi (xi ) = FX (xi ) = (1)
0 otherwise, 0 otherwise.
Let R = max(X1 , X2 , . . . , X6 ) denote the maximum response time. From Theorem 5.7, R has PDF
(a) The probability that all six responses arrive within five seconds is
(b) This question is worded in a somewhat confusing way. The "expected response time" refers to E[X_i], the response time of an individual truck, rather than E[R]. If the expected response time of a truck is \tau, then each X_i has CDF
F_{X_i}(x) = F_X(x) = \begin{cases} 1 - e^{-x/\tau} & x \ge 0 \\ 0 & \text{otherwise.} \end{cases}   (4)
The goal of this problem is to find the maximum permissible value of \tau. When each truck has expected response time \tau, the CDF of R is
F_R(r) = (F_X(r))^6 = \begin{cases} (1 - e^{-r/\tau})^6 & r \ge 0, \\ 0 & \text{otherwise.} \end{cases}   (5)
We need to find \tau such that
P[R \le 3] = (1 - e^{-3/\tau})^6 = 0.9.   (6)
This implies
\tau = \frac{-3}{\ln\left(1 - (0.9)^{1/6}\right)} = 0.7406 \text{ s}.   (7)
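A one-line numerical check of (6) and (7) in Matlab (illustrative only):

tau = -3/log(1 - 0.9^(1/6))          % approximately 0.7406
(1 - exp(-3/tau))^6                  % evaluates to 0.9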
To find the probability that W ≤ 25, we will find the CDF FW (w) since this will also be
useful for part (c).
211
Since the Xi are iid,
10
FW (w) = 1 − P [Xi > w] = 1 − (1 − FXi (w))10 (6)
i=1
10
w − 35
=1− 1−Φ (7)
5
Thus,
(b) The finishing time of the last boat is L = max(X1 , . . . , X10 ). The probability that the last
boat finishes in more than 50 minutes is
Once again, since the Xi are iid Gaussian (35, 5) random variables,
10
P [L > 50] = 1 − P [Xi ≤ 50] = 1 − (FXi (50))10 (12)
i=1
= 1 − (Φ([50 − 35]/5))10 (13)
= 1 − (Φ(3)) 10
= 0.0134 (14)
(c) A boat will finish in negative time if and only iff the winning boat finishes in negative time,
which has probability
Unfortunately, the tables in the text have neither Φ(7) nor Q(7). However, those with access
to Matlab, or a programmable calculator, can find out that Q(7) = 1 − Φ(7) = 1.28 × 10−12 .
This implies that a boat finishes in negative time with probability
212
In terms of the conditional expectations given Ji , these facts can be written as
E [Ni |Ji ] = Ji E Ni2 |Ji = Ji + Ji2 (3)
This permits us to evaluate the moments of Ji−1 in terms of the moments of Ji . Specifically,
1 Ji 3Ji
E [Ji−1 |Ji ] = E [Ji |Ji ] + E [Ni |Ji ] = Ji + = (4)
2 2 2
This implies
3
E [Ji−1 ] = E [Ji ] (5)
2
We can use this the calculate E[Ji ] for all i. Since the jackpot starts at 1 million dollars, J6 = 106
and E[J6 ] = 106 . This implies
E [Ji ] = (3/2)6−i 106 (6)
Now we will find the second moment E[Ji2 ]. Since Ji−1
2 = J 2 + N J + N 2 /4, we have
i i i i
2 2 2
E Ji−1 |Ji = E Ji |Ji + E [Ni Ji |Ji ] + E Ni |Ji /4 (7)
= Ji2 + Ji E [Ni |Ji ] + (Ji + Ji2 )/4 (8)
= (3/2)2 Ji2 + Ji /4 (9)
By taking the expectation over Ji we have
2
E Ji−1 = (3/2)2 E Ji2 + E [Ji ] /4 (10)
This recursion allows us to calculate E[Ji²] for i = 6, 5, . . . , 0. Since J6 = 10^6, E[J6²] = 10^12. From the recursion, we obtain
E[J5²] = (3/2)² E[J6²] + E[J6]/4 = (3/2)² 10^12 + (1/4) 10^6  (11)
E[J4²] = (3/2)² E[J5²] + E[J5]/4 = (3/2)^4 10^12 + (1/4)[(3/2)² + (3/2)] 10^6  (12)
E[J3²] = (3/2)² E[J4²] + E[J4]/4 = (3/2)^6 10^12 + (1/4)[(3/2)^4 + (3/2)^3 + (3/2)^2] 10^6  (13)
The same recursion will also allow us to show that
E[J2²] = (3/2)^8 10^12 + (1/4)[(3/2)^6 + (3/2)^5 + (3/2)^4 + (3/2)^3] 10^6  (14)
E[J1²] = (3/2)^10 10^12 + (1/4)[(3/2)^8 + (3/2)^7 + (3/2)^6 + (3/2)^5 + (3/2)^4] 10^6  (15)
E[J0²] = (3/2)^12 10^12 + (1/4)[(3/2)^10 + (3/2)^9 + · · · + (3/2)^5] 10^6  (16)
Finally, day 0 is the same as any other day in that J = J0 + N0 /2 where N0 is a Poisson random
variable with mean J0 . By the same argument that we used to develop recursions for E[Ji ] and
E[Ji2 ], we can show
E[J] = (3/2)E[J0] = (3/2)^7 10^6 ≈ 17 × 10^6  (17)
and
E[J²] = (3/2)² E[J0²] + E[J0]/4  (18)
= (3/2)^14 10^12 + (1/4)[(3/2)^12 + (3/2)^11 + · · · + (3/2)^6] 10^6  (19)
= (3/2)^14 10^12 + (10^6/2)(3/2)^6[(3/2)^7 − 1]  (20)
Finally, the variance of J is
Var[J] = E[J²] − (E[J])² = (10^6/2)(3/2)^6[(3/2)^7 − 1]  (21)
Since the variance is hard to interpret, we note that the standard deviation of J is σJ ≈ 9572.
Although the expected jackpot grows rapidly, the standard deviation of the jackpot is fairly small.
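The two recursions are also easy to iterate numerically. The following sketch (ours, not from the text) reproduces E[J] ≈ 1.71 × 10^7 and σJ ≈ 9572:

EJ = 1e6; EJ2 = 1e12;             % E[J6] and E[J6^2]
for i=6:-1:0                      % seven steps: days 6,5,...,0 plus the final drawing
    EJ2 = (3/2)^2*EJ2 + EJ/4;     % second-moment recursion (uses the old EJ)
    EJ  = (3/2)*EJ;               % first-moment recursion
end
EJ                                % approx 1.7086e7
sigmaJ = sqrt(EJ2 - EJ^2)         % approx 9572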
Not surprisingly, since the Xi are identical, symmetry would suggest that Xn is as likely as any of
the other Xi to be the largest. Hence P [A] = 1/n should not be surprising.
(a) The covariance matrix of X = [X1 X2]' is
CX = [ Var[X1]  Cov[X1,X2] ;  Cov[X1,X2]  Var[X2] ] = [ 4  3 ;  3  9 ].  (1)
Problem 5.6.2 Solution
The mean value of a sum of random variables is always the sum of their individual means.
E[Y] = Σ_{i=1}^{n} E[Xi] = 0  (1)
The variance of any sum of random variables can be expressed in terms of the individual variances and covariances. Since E[Y] is zero, Var[Y] = E[Y²]. Thus,
Var[Y] = E[(Σ_{i=1}^{n} Xi)²] = E[Σ_{i=1}^{n} Σ_{j=1}^{n} Xi Xj] = Σ_{i=1}^{n} E[Xi²] + Σ_{i=1}^{n} Σ_{j≠i} E[Xi Xj]  (2)
fX (x) = fX1 (x1 ) fX2 (x2 ) fX3 (x3 ) fX4 (x4 ) (1)
It follows that for each i, E[Xi ] = 1/2, E[Xi2 ] = 1/3 and Var[Xi ] = 1/12. In addition, Xi and Xj
have correlation
E [Xi Xj ] = E [Xi ] E [Xj ] = 1/4. (3)
and covariance Cov[Xi, Xj] = 0 for i ≠ j since independent random variables always have zero covariance.
(b) The correlation matrix is
RX = E[XX'] = [ E[X1²]   E[X1X2]  E[X1X3]  E[X1X4] ;
                E[X2X1]  E[X2²]   E[X2X3]  E[X2X4] ;
                E[X3X1]  E[X3X2]  E[X3²]   E[X3X4] ;
                E[X4X1]  E[X4X2]  E[X4X3]  E[X4²]  ]  (5)
   = [ 1/3 1/4 1/4 1/4 ; 1/4 1/3 1/4 1/4 ; 1/4 1/4 1/3 1/4 ; 1/4 1/4 1/4 1/3 ]  (6)
For m = n,
RJ(m, m) = E[Jm²] = Var[Jm] + (E[Jm])² = (1 − p)/p² + 1/p² = (2 − p)/p².  (4)
Thus
RJ = (1/p²) [ 2−p  1  1 ;  1  2−p  1 ;  1  1  2−p ].  (5)
Because Jm and Jn are independent, the off-diagonal terms in the covariance matrix are CJ(m, n) = 0 for m ≠ n.
Since CJ (m, m) = Var[Jm ], we have that
CJ = ((1 − p)/p²) I = ((1 − p)/p²) [ 1 0 0 ; 0 1 0 ; 0 0 1 ].  (7)
Since the components of J are independent, it has the diagonal covariance matrix
CJ = [ Var[J1]  0  0 ;  0  Var[J2]  0 ;  0  0  Var[J3] ] = ((1 − p)/p²) I  (3)
CK = A CJ A'  (5)
   = ((1 − p)/p²) A I A'  (6)
   = ((1 − p)/p²) [ 1 0 0 ; 1 1 0 ; 1 1 1 ][ 1 1 1 ; 0 1 1 ; 0 0 1 ] = ((1 − p)/p²) [ 1 1 1 ; 1 2 2 ; 1 2 3 ]  (7)
(c) Given the expected value vector µK and the covariance matrix CK, we can use Theorem 5.12 to find the correlation matrix
RK = CK + µK µK'  (8)
   = ((1 − p)/p²)[ 1 1 1 ; 1 2 2 ; 1 2 3 ] + [ 1/p ; 2/p ; 3/p ][ 1/p  2/p  3/p ]  (9)
   = ((1 − p)/p²)[ 1 1 1 ; 1 2 2 ; 1 2 3 ] + (1/p²)[ 1 2 3 ; 2 4 6 ; 3 6 9 ]  (10)
   = (1/p²)[ 2−p  3−p  4−p ;  3−p  6−2p  8−2p ;  4−p  8−2p  12−3p ]  (11)
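For any particular p, these matrix identities can be checked numerically. A minimal sketch (ours), assuming p = 0.5 purely for illustration:

p = 0.5;
A = [1 0 0; 1 1 0; 1 1 1];             % K = A*J
CJ = ((1-p)/p^2)*eye(3);               % covariance matrix of J
muK = [1; 2; 3]/p;                     % expected value vector of K
CK = A*CJ*A'                           % equals ((1-p)/p^2)*[1 1 1; 1 2 2; 1 2 3]
RK = CK + muK*muK'                     % equals (1/p^2)*[2-p 3-p 4-p; 3-p 6-2p 8-2p; 4-p 8-2p 12-3p]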
Inspection will show that Y1 and Y4 are independent since fY1 ,Y4 (y1 , y4 ) = fY1 (y1 )fY4 (y4 ). Similarly,
Y2 and Y3 are independent since fY2,Y3(y2, y3) = fY2(y2)fY3(y3). This implies
We also need to calculate fY1 ,Y2 (y1 , y2 ), fY3 ,Y4 (y3 , y4 ), fY1 ,Y3 (y1 , y3 ) and fY2 ,Y4 (y2 , y4 ). To start, for
0 ≤ y1 ≤ y2 ≤ 1,
fY1,Y2(y1, y2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fY1,Y2,Y3,Y4(y1, y2, y3, y4) dy3 dy4  (12)
= ∫_0^1 ∫_0^{y4} 4 dy3 dy4 = ∫_0^1 4 y4 dy4 = 2.  (13)
Similarly, for 0 ≤ y3 ≤ y4 ≤ 1,
fY3,Y4(y3, y4) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fY1,Y2,Y3,Y4(y1, y2, y3, y4) dy1 dy2  (14)
= ∫_0^1 ∫_0^{y2} 4 dy1 dy2 = ∫_0^1 4 y2 dy2 = 2.  (15)
We observe that Y1 and Y3 are independent since fY1 ,Y3 (y1 , y3 ) = fY1 (y1 )fY3 (y3 ). It follows that
We observe that Y2 and Y4 are independent since fY2 ,Y4 (y2 , y4 ) = fY2 (y2 )fY4 (y4 ). It follows that
RY (2, 4) = E[Y2 Y4 ] = E[Y2 ]E[Y4 ] = 4/9. The above results give RY (i, j) for i ≤ j. Since RY is a
symmetric matrix,
RY = [ 1/6 1/4 1/9 2/9 ; 1/4 1/2 2/9 4/9 ; 1/9 2/9 1/6 1/4 ; 2/9 4/9 1/4 1/2 ].  (25)
Since µX = [1/3 2/3 1/3 2/3]', the covariance matrix is
CY = RY − µX µX'  (26)
   = [ 1/6 1/4 1/9 2/9 ; 1/4 1/2 2/9 4/9 ; 1/9 2/9 1/6 1/4 ; 2/9 4/9 1/4 1/2 ] − [ 1/3 ; 2/3 ; 1/3 ; 2/3 ][ 1/3  2/3  1/3  2/3 ]  (27)
   = [ 1/18 1/36 0 0 ; 1/36 1/18 0 0 ; 0 0 1/18 1/36 ; 0 0 1/36 1/18 ].  (28)
The off-diagonal zero blocks are a consequence of [Y1 Y2]' being independent of [Y3 Y4]'. Along the diagonal, the two identical sub-blocks occur because fY1,Y2(x, y) = fY3,Y4(x, y). In short, the matrix structure is the result of [Y1 Y2]' and [Y3 Y4]' being iid random vectors.
In this problem, the PDF is simple enough that we can compute E[Yin ] for arbitrary integers n ≥ 0.
E[Y1^n] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y1^n fY1,Y2(y1, y2) dy1 dy2 = ∫_0^1 ∫_0^{1−y2} 2 y1^n dy1 dy2.  (3)
Symmetry of the joint PDF fY1,Y2(y1, y2) implies that E[Y2^n] = E[Y1^n]. Thus, E[Y1] = E[Y2] = 1/3 and
E[Y] = µY = [1/3  1/3]'.  (5)
In addition,
RY(1, 1) = E[Y1²] = 1/6,    RY(2, 2) = E[Y2²] = 1/6.  (6)
To complete the correlation matrix, we find
RY(1, 2) = E[Y1 Y2] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y1 y2 fY1,Y2(y1, y2) dy1 dy2 = ∫_0^1 ∫_0^{1−y2} 2 y1 y2 dy1 dy2.  (7)
It follows that the covariance matrix CX is positive semi-definite if and only if the correlation
matrix RY is positive semi-definite. Thus, it is sufficient to show that every correlation matrix,
whether it is denoted RY or RX , is positive semi-definite.
To show a correlation matrix RX is positive semi-definite, we write
a' RX a = a' E[XX'] a = E[a' X X' a] = E[(a'X)(X'a)] = E[(a'X)²].  (2)
We note that W = a'X is a random variable. Since E[W²] ≥ 0 for any random variable W,
a' RX a = E[W²] ≥ 0.  (3)
RX = CX + µX µX'  (1)
   = [ 4 −2 1 ; −2 4 −2 ; 1 −2 4 ] + [ 4 ; 8 ; 6 ][ 4  8  6 ]  (2)
   = [ 4 −2 1 ; −2 4 −2 ; 1 −2 4 ] + [ 16 32 24 ; 32 64 48 ; 24 48 36 ] = [ 20 30 25 ; 30 68 46 ; 25 46 40 ]  (3)
(b) Let Y = [X1 X2]'. Since Y is a subset of the components of X, it is a Gaussian random vector with expected value vector
µY = [ E[X1]  E[X2] ]' = [ 4  8 ]'  (4)
and covariance matrix
CY = [ Var[X1]  Cov[X1,X2] ;  Cov[X1,X2]  Var[X2] ] = [ 4  −2 ;  −2  4 ]  (5)
We note that det(CY ) = 12 and that
CY⁻¹ = (1/12)[ 4  2 ;  2  4 ] = [ 1/3  1/6 ;  1/6  1/3 ].  (6)
This implies that
(y − µY)' CY⁻¹ (y − µY) = [ y1 − 4   y2 − 8 ][ 1/3  1/6 ;  1/6  1/3 ][ y1 − 4 ;  y2 − 8 ]  (7)
= [ y1 − 4   y2 − 8 ][ y1/3 + y2/6 − 8/3 ;  y1/6 + y2/3 − 10/3 ]  (8)
= y1²/3 + y1y2/3 − 16y1/3 − 20y2/3 + y2²/3 + 112/3  (9)
The PDF of Y is
fY(y) = (1/(2π√12)) e^(−(y−µY)' CY⁻¹ (y−µY)/2)  (10)
     = (1/√(48π²)) e^(−(y1² + y1y2 − 16y1 − 20y2 + y2² + 112)/6)  (11)
Since Y = [X1 X2]', the joint PDF of X1 and X2 is simply
fX1,X2(x1, x2) = fY1,Y2(x1, x2) = (1/√(48π²)) e^(−(x1² + x1x2 − 16x1 − 20x2 + x2² + 112)/6)  (12)
(c) We can observe directly from µX and CX that X1 is a Gaussian (4, 2) random variable. Thus,
P[X1 > 8] = P[(X1 − 4)/2 > (8 − 4)/2] = Q(2) = 0.0228  (13)
(a) The expected value of Y is
µY = A µX + b = [ 1  1/2  2/3 ;  1  −1/2  2/3 ][ 4 ; 8 ; 6 ] + [ −4 ; −4 ] = [ 8 ; 0 ].  (3)
CY = A CX A'  (4)
   = [ 1  1/2  2/3 ;  1  −1/2  2/3 ][ 4 −2 1 ; −2 4 −2 ; 1 −2 4 ][ 1  1 ;  1/2  −1/2 ;  2/3  2/3 ] = (1/9)[ 43  55 ;  55  103 ].  (5)
(d) From µY, we see that E[Y2] = 0. From the covariance matrix CY, we learn that Y2 has variance σ2² = CY(2, 2) = 103/9. Since Y2 is a Gaussian random variable,
P[−1 ≤ Y2 ≤ 1] = P[−1/σ2 ≤ Y2/σ2 ≤ 1/σ2]  (7)
= Φ(1/σ2) − Φ(−1/σ2)  (8)
= 2Φ(1/σ2) − 1  (9)
= 2Φ(3/√103) − 1 = 0.2325.  (10)
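The matrix algebra and the final probability can be verified with a few lines of base Matlab (our own check; Phi is built from erf rather than the text's phi.m):

A = [1 1/2 2/3; 1 -1/2 2/3]; b = [-4; -4];
muX = [4; 8; 6]; CX = [4 -2 1; -2 4 -2; 1 -2 4];
muY = A*muX + b                       % [8; 0]
CY = A*CX*A'                          % (1/9)*[43 55; 55 103]
Phi = @(x) 0.5*(1+erf(x/sqrt(2)));
2*Phi(1/sqrt(CY(2,2))) - 1            % P[-1 <= Y2 <= 1], approx 0.2325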
Var[Y] = CY = a' CX a.  (1)
Thus,
1/(2π[det(CX)]^(1/2)) = 1/(2πσ1σ2√(1 − ρ²)).  (3)
Using the 2 × 2 matrix inverse formula
[ a  b ;  c  d ]⁻¹ = (1/(ad − bc))[ d  −b ;  −c  a ],  (4)
we obtain
CX⁻¹ = (1/(σ1²σ2²(1 − ρ²)))[ σ2²  −ρσ1σ2 ;  −ρσ1σ2  σ1² ] = (1/(1 − ρ²))[ 1/σ1²  −ρ/(σ1σ2) ;  −ρ/(σ1σ2)  1/σ2² ].  (5)
Thus
−(1/2)(x − µX)' CX⁻¹ (x − µX)
 = − [ x1 − µ1   x2 − µ2 ][ 1/σ1²  −ρ/(σ1σ2) ;  −ρ/(σ1σ2)  1/σ2² ][ x1 − µ1 ;  x2 − µ2 ] / (2(1 − ρ²))  (6)
 = − [ x1 − µ1   x2 − µ2 ][ (x1 − µ1)/σ1² − ρ(x2 − µ2)/(σ1σ2) ;  (x2 − µ2)/σ2² − ρ(x1 − µ1)/(σ1σ2) ] / (2(1 − ρ²))  (7)
 = − [ (x1 − µ1)²/σ1² − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)²/σ2² ] / (2(1 − ρ²)).  (8)
(a) From Theorem 5.13, Y has covariance matrix
CY = Q CX Q'  (1)
   = [ cos θ  −sin θ ;  sin θ  cos θ ][ σ1²  0 ;  0  σ2² ][ cos θ  sin θ ;  −sin θ  cos θ ]  (2)
   = [ σ1² cos²θ + σ2² sin²θ   (σ1² − σ2²) sin θ cos θ ;  (σ1² − σ2²) sin θ cos θ   σ1² sin²θ + σ2² cos²θ ].  (3)
Since Y1 and Y2 are jointly Gaussian, they are independent if and only if Cov[Y1, Y2] = 0. Thus, Y1 and Y2 are independent for all θ if and only if σ1² = σ2². In this case, the joint PDF fX(x) is symmetric in x1 and x2. In terms of polar coordinates, the PDF fX(x) = fX1,X2(x1, x2) depends on r = √(x1² + x2²) but, for a given r, is constant for all φ = tan⁻¹(x2/x1). The transformation of X to Y is just a rotation of the coordinate system by θ, which preserves this circular symmetry.
(b) If σ2² > σ1², then Y1 and Y2 are independent if and only if sin θ cos θ = 0. This occurs in the following cases:
• θ = 0: Y1 = X1 and Y2 = X2
• θ = π/2: Y1 = −X2 and Y2 = X1
• θ = π: Y1 = −X1 and Y2 = −X2
• θ = −π/2: Y1 = X2 and Y2 = −X1
In all four cases, Y1 and Y2 are just relabeled versions, possibly with sign changes, of X1 and
X2 . In these cases, Y1 and Y2 are independent because X1 and X2 are independent. For
other values of θ, each Yi is a linear combination of both X1 and X2 . This mixing results in
correlation between Y1 and Y2 .
CX = Q CY Q'  (2)
   = (1/√2)[ 1  −1 ;  1  1 ][ 1+ρ  0 ;  0  1−ρ ](1/√2)[ 1  1 ;  −1  1 ]  (3)
   = (1/2)[ 1+ρ  −(1−ρ) ;  1+ρ  1−ρ ][ 1  1 ;  −1  1 ]  (4)
   = [ 1  ρ ;  ρ  1 ].  (5)
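A short numerical check of this de-whitening step (our own sketch, assuming ρ = 0.5 purely for illustration):

rho = 0.5;
Q = (1/sqrt(2))*[1 -1; 1 1];          % 45 degree rotation
CY = [1+rho 0; 0 1-rho];
CX = Q*CY*Q'                          % recovers [1 rho; rho 1]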
Problem 5.7.8 Solution
As given in the problem statement, we define the m-dimensional vector X, the n-dimensional vector Y, and W = [X ; Y]. Note that W has expected value
µW = E[W] = E[ X ; Y ] = [ E[X] ; E[Y] ] = [ µX ; µY ].  (1)
(a) If you are familiar with the Gram-Schmidt procedure, the argument is that applying Gram-
Schmidt to the rows of A yields m orthogonal row vectors. It is then possible to augment
those vectors with an additional n − m orthogonal vectors. Those orthogonal vectors would
be the rows of Ã.
An alternate argument is that since A has rank m, the nullspace of A, i.e., the set of all vectors y such that Ay = 0, has dimension n − m. We can choose any n − m linearly independent vectors y1, y2, . . . , yn−m in the nullspace of A. We then define à to have columns y1, y2, . . . , yn−m. It follows that AÃ = 0.
is a rank n matrix. To prove this fact, we will suppose there exists w such that Āw = 0,
and then show that w is a zero vector. Since A and à together have n linearly independent
rows, we can write the row vector w as a linear combination of the rows of A and Ã. That
is, for some v and ṽ,
w' = v'A + ṽ'Ã.  (3)
The condition Āw = 0 implies
[ A ;  ÃCX⁻¹ ](A'v + Ã'ṽ) = [ 0 ;  0 ].  (4)
This implies
AA'v + AÃ'ṽ = 0  (5)    and    ÃCX⁻¹A'v + ÃCX⁻¹Ã'ṽ = 0.  (6)
Since AÃ' = 0, Equation (5) implies that AA'v = 0. Since A is rank m, AA' is an m × m rank m matrix. It follows that v = 0. We can then conclude from Equation (6) that
ÃCX⁻¹Ã'ṽ = 0.  (7)
(c) We note that by Theorem 5.16, the Gaussian vector Ȳ = ĀX has covariance matrix
C̄ = Ā CX Ā'.  (8)
Since (CX⁻¹)' = CX⁻¹,
Ā' = [ A'   (ÃCX⁻¹)' ] = [ A'   CX⁻¹Ã' ].  (9)
Applying this result to Equation (8) yields
C̄ = [ A ;  ÃCX⁻¹ ] CX [ A'   CX⁻¹Ã' ] = [ ACX ;  Ã ][ A'   CX⁻¹Ã' ] = [ ACX A'   AÃ' ;  ÃA'   ÃCX⁻¹Ã' ].  (10)
Since ÃA' = 0,
C̄ = [ ACX A'   0 ;  0   ÃCX⁻¹Ã' ] = [ CY   0 ;  0   CŶ ].  (11)
We see that C̄ is a block diagonal covariance matrix. From the claim of Problem 5.7.8, we can conclude that Y and Ŷ are independent Gaussian random vectors.
and covariance matrix
CY = Var[Y] = A CX A'  (3)
   = [ 1/3  1/3  1/3 ][ 4 −2 1 ; −2 4 −2 ; 1 −2 4 ][ 1/3 ; 1/3 ; 1/3 ] = 2/3  (4)
Thus Y is a Gaussian (6, √(2/3)) random variable, implying
P[Y > 4] = P[(Y − 6)/√(2/3) > (4 − 6)/√(2/3)] = 1 − Φ(−√6) = Φ(√6) = 0.9928  (5)
(a) The covariance matrix CX has Var[Xi] = 25 for each diagonal entry. For i ≠ j, the i, jth entry of CX is
[CX]ij = ρ_{XiXj} √(Var[Xi] Var[Xj]) = (0.8)(25) = 20  (1)
The covariance matrix of X is a 10 × 10 matrix of the form
CX = [ 25  20  · · ·  20 ;  20  25  · · ·  20 ;  ⋮  ⋱  ⋱  20 ;  20  · · ·  20  25 ].  (2)
If you have read the solution to Quiz 5.8, you know that CT is a symmetric Toeplitz matrix and
that Matlab has a toeplitz function to generate Toeplitz matrices. Using the toeplitz function
to generate the covariance matrix, it is easy to use gaussvector to generate samples of the random
vector T. Here is the code for estimating P [A] using m samples.
function p=julytemp583(m);
c=36./(1+(0:30));
CT=toeplitz(c);
mu=80*ones(31,1);
T=gaussvector(mu,CT,m);
Y=sum(T)/31;
Tmin=min(T);
p=sum((Tmin>=72) & (Y <= 82))/m;

>> julytemp583(100000)
ans =
    0.0684
>> julytemp583(100000)
ans =
    0.0706
>> julytemp583(100000)
ans =
    0.0714
>> julytemp583(100000)
ans =
    0.0701
We see from repeated experiments with m = 100,000 trials that P [A] ≈ 0.07.
A program to estimate P[W ≤ 25] uses gaussvector to generate m sample vectors of race times X. In the program sailboats.m, X is a 10 × m matrix such that each column of X is a vector of race times. In addition, min(X) is a row vector indicating the fastest time in each race.
function p=sailboats(w,m)
%Usage: p=sailboats(f,m)
%In Problem 5.8.4, W is the
%winning time in a 10 boat race.
%We use m trials to estimate
%P[W<=w]
CX=(5*eye(10))+(20*ones(10,10));
mu=35*ones(10,1);
X=gaussvector(mu,CX,m);
W=min(X);
p=sum(W<=w)/m;

>> sailboats(25,10000)
ans =
    0.0827
>> sailboats(25,100000)
ans =
    0.0801
>> sailboats(25,100000)
ans =
    0.0803
>> sailboats(25,100000)
ans =
    0.0798
We see from repeated experiments with m = 100,000 trials that P [W ≤ 25] ≈ 0.08.
is different from that of any other Poisson random variable. Thus, we must generate the daily jackpots sequentially. Here is a simple program for this purpose.
function jackpot=lottery1(jstart,M,D)
%Usage: function j=lottery1(jstart,M,D)
%Perform M trials of the D day lottery
%of Problem 5.5.5 and initial jackpot jstart
jackpot=zeros(M,1);
for m=1:M,
disp(m);   % display progress (trial index)
jackpot(m)=jstart;
for d=1:D,
jackpot(m)=jackpot(m)+(0.5*poissonrv(jackpot(m),1));
end
end
The main problem with lottery1 is that it will run very slowly. Each call to poissonrv generates an entire Poisson PMF PX(x) for x = 0, 1, . . . , xmax where xmax ≥ 2 · 10^6. This is slow in several ways. First, we repeat the calculation of Σ_{j=1}^{xmax} log j with each call to poissonrv. Second, each call to poissonrv asks for a Poisson sample value with expected value α > 1 · 10^6. In these cases, for small values of x, PX(x) = α^x e^(−α)/x! is so small that it is less than the smallest nonzero number that Matlab can store!
To speed up the simulation, we have written a program bigpoissonrv which generates Poisson (α) samples for large α. The program makes an approximation that for a Poisson (α) random variable X, PX(x) ≈ 0 for |x − α| > 6√α. Since X has standard deviation √α, we are assuming that X cannot be more than six standard deviations away from its mean value. The error in this approximation is very small. In fact, for a Poisson (a) random variable, the program poissonsigma(a,k) calculates the error P[|X − a| > k√a]. Here is poissonsigma.m and some simple calculations:
function err=poissonsigma(a,k);
xmin=max(0,floor(a-k*sqrt(a)));
xmax=a+ceil(k*sqrt(a));
sx=xmin:xmax;
logfacts =cumsum([0,log(1:xmax)]);
%logfacts includes 0 in case xmin=0
%Now we extract needed values:
logfacts=logfacts(sx+1);
%pmf(i,:) is a Poisson a(i) PMF
% from xmin to xmax
pmf=exp(-a+ (log(a)*sx)-(logfacts));
err=1-sum(pmf);

>> poissonsigma(1,6)
ans =
   1.0249e-005
>> poissonsigma(10,6)
ans =
   2.5100e-007
>> poissonsigma(100,6)
ans =
   1.2620e-008
>> poissonsigma(1000,6)
ans =
   2.6777e-009
>> poissonsigma(10000,6)
ans =
   1.8081e-009
>> poissonsigma(100000,6)
ans =
  -1.6383e-010
The error reported by poissonsigma(a,k) should always be positive. In fact, we observe negative errors for very large a. For large α and x, numerical calculation of PX(x) = α^x e^(−α)/x! is tricky because we are taking ratios of very large numbers. In fact, for α = x = 1,000,000, Matlab calculation of α^x and x! will report infinity while e^(−α) will evaluate as zero. Our method
of calculating the Poisson (α) PMF is to use the fact that ln x! = Σ_{j=1}^{x} ln j to calculate
exp(ln PX(x)) = exp( x ln α − α − Σ_{j=1}^{x} ln j ).  (1)
This method works reasonably well except that the calculation of the logarithm has finite precision. The consequence is that the calculated sum over the PMF can vary from 1 by a very small amount, on the order of 10^−7 in our experiments. In our problem, the error is inconsequential; however, one should keep in mind that this may not be the case in other experiments using large Poisson random variables. In any case, we can conclude that within the accuracy of Matlab's simulated experiments, the approximations to be used by bigpoissonrv are not significant.
The other feature of bigpoissonrv is that for a vector alpha corresponding to expected values [α1 · · · αm]', bigpoissonrv returns a vector X such that X(i) is a Poisson (alpha(i)) sample.
The work of calculating the sum of logarithms is done only once for all calculated samples. The
result is a significant savings in cpu time as long as the values of alpha are reasonably close to each
other.
function x=bigpoissonrv(alpha)
%for vector alpha, returns a vector x such that
% x(i) is a Poisson (alpha(i)) rv
%set up Poisson CDF from xmin to xmax for each alpha(i)
alpha=alpha(:);
amin=min(alpha(:));
amax=max(alpha(:));
%Assume Poisson PMF is negligible +-6 sigma from the average
xmin=max(0,floor(amin-6*sqrt(amax)));
xmax=amax+ceil(6*sqrt(amax));%set max range
sx=xmin:xmax;
%Now we include the basic code of poissonpmf (but starting at xmin)
logfacts =cumsum([0,log(1:xmax)]); %include 0 in case xmin=0
logfacts=logfacts(sx+1); %extract needed values
%pmf(i,:) is a Poisson alpha(i) PMF from xmin to xmax
pmf=exp(-alpha*ones(size(sx))+ ...
(log(alpha)*sx)-(ones(size(alpha))*logfacts));
cdf=cumsum(pmf,2); %each row is a cdf
x=(xmin-1)+sum((rand(size(alpha))*ones(size(sx)))<=cdf,2);
Finally, given bigpoissonrv, we can write a short program lottery that simulates trials of the
jackpot experiment. Ideally, we would like to use lottery to perform m = 1,000 trials in a single
pass. In general, Matlab is more efficient when calculations are executed in parallel using vectors.
However, in bigpoissonrv, the matrix pmf will have m rows and at least 12√α = 12,000 columns.
For m more than several hundred, Matlab running on my laptop reported an “Out of Memory”
error. Thus, we wrote the program lottery to perform M trials at once and to repeat that N times.
The output is an M × N matrix where each i, j entry is a sample jackpot after seven days.
function jackpot=lottery(jstart,M,N,D)
%Usage: function j=lottery(jstart,M,N,D)
%Perform M trials of the D day lottery
%of Problem 5.5.5 and initial jackpot jstart
jackpot=zeros(M,N);
for n=1:N,
jackpot(:,n)=jstart*ones(M,1);
for d=1:D,
disp(d);
jackpot(:,n)=jackpot(:,n)+(0.5*bigpoissonrv(jackpot(:,n)));
end
end
[Figure: histogram of the simulated jackpot J (frequency versus J, with J on the order of 1.708 × 10^7).]
If you go back and solve Problem 5.5.5, you will see that the jackpot J has expected value E[J] = (3/2)^7 × 10^6 = 1.70859 × 10^7 dollars. Thus it is not surprising that the histogram is centered around a jackpot of 1.708 × 10^7 dollars. If we did more trials, and used more histogram bins, the histogram would appear to converge to the shape of a Gaussian PDF. This fact is explored in Chapter 6.
Problem Solutions – Chapter 6
(a) The PMF of N1 , the number of phone calls needed to obtain the correct answer, can be
determined by observing that if the correct answer is given on the nth call, then the previous
n − 1 calls must have given wrong answers so that
PN1(n) = { (3/4)^(n−1)(1/4), n = 1, 2, . . . ; 0 otherwise }  (1)
(b) N1 is a geometric random variable with parameter p = 1/4. In Theorem 2.5, the mean of a
geometric random variable is found to be 1/p. For our case, E[N1 ] = 4.
(c) Using the same logic as in part (a), we recognize that in order for the fourth correct answer to arrive on the n-th call, the previous n − 1 calls must have contained exactly 3 correct answers. This is described by a Pascal random variable:
PN4(n) = { C(n−1, 3)(3/4)^(n−4)(1/4)^4, n = 4, 5, . . . ; 0 otherwise }  (2)
(d) Using the hint given in the problem statement we can find the mean of N4 by summing up
the means of the 4 identically distributed geometric random variables each with mean 4. This
gives E[N4 ] = 4E[N1 ] = 16.
Problem 6.1.4 Solution
We can solve this problem using Theorem 6.2 which says that
Thus the variance of X is Var[X] = E[X 2 ] − (E[X])2 = 1/18. By symmetry, it should be apparent
that E[Y ] = E[X] = 1/3 and Var[Y ] = Var[X] = 1/18. To find the covariance, we first find the
correlation
E[XY] = ∫_0^1 ∫_0^{1−x} 2xy dy dx = ∫_0^1 x(1 − x)² dx = 1/12  (5)
The covariance is Cov[X, Y] = E[XY] − E[X]E[Y] = 1/12 − 1/9 = −1/36.
For this specific problem, it's arguable whether it would be easier to find Var[W] by first deriving the CDF and PDF of W. In particular, for 0 ≤ w ≤ 1,
FW(w) = P[X + Y ≤ w] = ∫_0^w ∫_0^{w−x} 2 dy dx = ∫_0^w 2(w − x) dx = w²  (8)
The variance of W is Var[W ] = E[W 2 ] − (E[W ])2 = 1/18. Not surprisingly, we get the same answer
both ways.
Since Yn has zero mean, the variance of Yn is
Var[Yn] = E[Yn²]  (2)
= (1/9) E[(Xn + Xn−1 + Xn−2)²]  (3)
= (1/9) E[Xn² + Xn−1² + Xn−2² + 2XnXn−1 + 2XnXn−2 + 2Xn−1Xn−2]  (4)
= (1/9)(1 + 1 + 1 + 2/4 + 0 + 2/4) = 4/9  (5)
[Figure: the line X + Y = w splits the support of fX,Y(x, y) into two integration regions.] Integrating fX,Y(x, y) = 2 over the two pieces of the region {x + y ≤ w} gives, for 1 ≤ w ≤ 2,
FW(w) = ∬ 2 dx dy + ∬ 2 dx dy  (3)
= 2w − 1 − w²/2  (4)
Putting all the parts together gives the CDF FW(w) and (by taking the derivative) the PDF fW(w).
FW(w) = { 0, w < 0;  w²/2, 0 ≤ w ≤ 1;  2w − 1 − w²/2, 1 ≤ w ≤ 2;  1, w > 2 }    fW(w) = { w, 0 ≤ w ≤ 1;  2 − w, 1 ≤ w ≤ 2;  0, otherwise }  (5)
Proceeding as in Problem 6.2.1, we must first find FW (w) by integrating over the square defined
by 0 ≤ x, y ≤ 1. Again we are forced to find FW (w) in parts as we did in Problem 6.2.1 resulting
in the following integrals for their appropriate regions. For 0 ≤ w ≤ 1,
FW(w) = ∫_0^w ∫_0^{w−x} dy dx = w²/2  (2)
For 1 ≤ w ≤ 2,
FW(w) = ∫_0^{w−1} ∫_0^1 dx dy + ∫_{w−1}^1 ∫_0^{w−y} dx dy = 2w − 1 − w²/2  (3)
The complete CDF FW (w) is shown below along with the corresponding PDF fW (w) = dFW (w)/dw.
FW(w) = { 0, w < 0;  w²/2, 0 ≤ w ≤ 1;  2w − 1 − w²/2, 1 ≤ w ≤ 2;  1, w > 2 }    fW(w) = { w, 0 ≤ w ≤ 1;  2 − w, 1 ≤ w ≤ 2;  0, otherwise }  (4)
When µ = λ, the previous derivation is invalid because of the denominator term λ − µ. For µ = λ,
we have
fW(w) = ∫_{−∞}^{∞} fX(x) fY(w − x) dx  (5)
= ∫_0^w λe^(−λx) λe^(−λ(w−x)) dx  (6)
= ∫_0^w λ² e^(−λw) dx  (7)
= { λ² w e^(−λw), w ≥ 0;  0 otherwise }  (8)
Note that when µ = λ, W is the sum of two iid exponential random variables and has a second
order Erlang PDF.
Problem 6.2.4 Solution
In this problem, X and Y have joint PDF
fX,Y(x, y) = { 8xy, 0 ≤ y ≤ x ≤ 1;  0 otherwise }  (1)
We can find the PDF of W using Theorem 6.4: fW(w) = ∫_{−∞}^{∞} fX,Y(x, w − x) dx. The only tricky
part remaining is to determine the limits of the integration. First, for w < 0, fW (w) = 0. The
two remaining cases are shown in the accompanying figure. The shaded area shows where the joint
PDF fX,Y (x, y) is nonzero. The diagonal lines depict y = w − x as a function of x. The intersection
of the diagonal line and the shaded area define our limits of integration.
[Figure: the support of fX,Y(x, y) and the lines y = w − x for the cases 0 < w < 1 and 1 < w < 2.]
For 0 ≤ w ≤ 1,
fW(w) = ∫_{w/2}^{w} 8x(w − x) dx  (2)
= [4wx² − 8x³/3]_{w/2}^{w} = 2w³/3  (3)
For 1 ≤ w ≤ 2,
fW(w) = ∫_{w/2}^{1} 8x(w − x) dx  (4)
= [4wx² − 8x³/3]_{w/2}^{1}  (5)
= 4w − 8/3 − 2w³/3  (6)
Since X + Y ≤ 2, fW (w) = 0 for w > 2. Hence the complete expression for the PDF of W is
fW(w) = { 2w³/3, 0 ≤ w ≤ 1;  4w − 8/3 − 2w³/3, 1 ≤ w ≤ 2;  0 otherwise }  (7)
Problem 6.2.6 Solution
The random variables K and J have PMFs
PJ(j) = { α^j e^(−α)/j!, j = 0, 1, 2, . . . ; 0 otherwise }    PK(k) = { β^k e^(−β)/k!, k = 0, 1, 2, . . . ; 0 otherwise }  (1)
P[N = n] = Σ_{k=0}^{n} PJ(n − k) PK(k)  (3)
= Σ_{k=0}^{n} [α^(n−k) e^(−α)/(n − k)!][β^k e^(−β)/k!]  (4)
= [(α + β)^n e^(−(α+β))/n!] Σ_{k=0}^{n} [n!/(k!(n − k)!)] (α/(α + β))^(n−k) (β/(α + β))^k  (5)
The sum over k equals 1 because it is the sum of a binomial PMF over all possible values.
The PMF of N is the Poisson PMF
PN(n) = { (α + β)^n e^(−(α+β))/n!, n = 0, 1, 2, . . . ; 0 otherwise }  (6)
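The convolution argument can be checked numerically by convolving two truncated Poisson PMFs (a sketch of ours; alpha = 3 and beta = 5 are arbitrary choices):

alpha = 3; beta = 5; n = 0:40;
PJ = exp(-alpha)*alpha.^n./factorial(n);     % Poisson(alpha) PMF
PK = exp(-beta)*beta.^n./factorial(n);       % Poisson(beta) PMF
PN = conv(PJ,PK);                            % PMF of N = J + K
Pdirect = exp(-(alpha+beta))*(alpha+beta).^n./factorial(n);
max(abs(PN(1:length(n))-Pdirect))            % essentially zero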
Problem 6.3.2 Solution
(a) By summing across the rows of the table, we see that J has PMF
PJ(j) = { 0.6, j = −2;  0.4, j = −1 }  (1)
(b) Summing down the columns of the table, we see that K has PMF
PK(k) = { 0.7, k = −1;  0.2, k = 0;  0.1, k = 1 }  (2)
(c) To find the PMF of M = J + K, it is easiest to annotate each entry in the table with the corresponding value of M:
(d) One way to solve this problem is to find the MGF φM(s) and then take four derivatives. Sometimes it's better to just work with the definition of E[M⁴]:
E[M⁴] = Σ_m PM(m) m⁴  (5)
= 0.42(−3)⁴ + 0.40(−2)⁴ + 0.14(−1)⁴ + 0.04(0)⁴ = 40.434  (6)
As best I can tell, the purpose of this problem is to check that you know when not to use the methods in this chapter.
Direct evaluation of the above expression at s = 0 yields 0/0 so we must apply l’Hôpital’s rule and
differentiate the numerator and denominator.
E[X] = lim_{s→0} [ b e^(bs) − a e^(as) + s(b² e^(bs) − a² e^(as)) − (b e^(bs) − a e^(as)) ] / (2(b − a)s)  (3)
= lim_{s→0} (b² e^(bs) − a² e^(as)) / (2(b − a)) = (b + a)/2  (4)
To find the second moment of X, we first find that the second derivative of φX(s) is
d²φX(s)/ds² = [ s²(b² e^(bs) − a² e^(as)) − 2s(b e^(bs) − a e^(as)) + 2(e^(bs) − e^(as)) ] / ((b − a)s³)  (5)
Substituting s = 0 will yield 0/0 so once again we apply l'Hôpital's rule and differentiate the numerator and denominator.
E[X²] = lim_{s→0} d²φX(s)/ds² = lim_{s→0} [ s²(b³ e^(bs) − a³ e^(as)) ] / (3(b − a)s²)  (6)
= (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3  (7)
In this case, it is probably simpler to find these moments without using the MGF.
To calculate the moments of Y , we define Y = X + µ so that Y is Gaussian (µ, σ). In this case
the second moment of Y is
E[Y²] = E[(X + µ)²] = E[X² + 2µX + µ²] = σ² + µ².  (5)
Similarly, the third moment of Y is
E[Y³] = E[(X + µ)³]  (6)
= E[X³ + 3µX² + 3µ²X + µ³] = 3µσ² + µ³.  (7)
Finally, the fourth moment of Y is
E[Y⁴] = E[(X + µ)⁴]  (8)
= E[X⁴ + 4µX³ + 6µ²X² + 4µ³X + µ⁴]  (9)
= 3σ⁴ + 6µ²σ² + µ⁴.  (10)
Problem 6.3.5 Solution
The PMF of K is
PK(k) = { 1/n, k = 1, 2, . . . , n;  0 otherwise }  (1)
The corresponding MGF of K is
φK(s) = E[e^(sK)] = (1/n)(e^s + e^(2s) + · · · + e^(ns))  (2)
= (e^s/n)(1 + e^s + e^(2s) + · · · + e^((n−1)s))  (3)
= e^s(e^(ns) − 1) / (n(e^s − 1))  (4)
We can evaluate the moments of K by taking derivatives of the MGF. Some algebra will show that
Evaluating d²φK(s)/ds² at s = 0 yields 0/0. Because (e^s − 1)³ appears in the denominator, we need to use l'Hôpital's rule three times to obtain our answer.
d²φK(s)/ds² |_{s=0} = lim_{s→0} [ n²(n + 3)³ e^((n+3)s) − (2n² + 2n − 1)(n + 2)³ e^((n+2)s) + (n + 1)⁵ e^((n+1)s) − 8e^(2s) − e^s ] / (6ne^s)  (9)
= [ n²(n + 3)³ − (2n² + 2n − 1)(n + 2)³ + (n + 1)⁵ − 9 ] / (6n)  (10)
= (2n + 1)(n + 1)/6  (11)
We can use these results to derive two well-known results. We observe that we can directly use the PMF PK(k) to calculate the moments
E[K] = (1/n) Σ_{k=1}^{n} k,    E[K²] = (1/n) Σ_{k=1}^{n} k²  (12)
Using the answers we found for E[K] and E[K²], we have the formulas
Σ_{k=1}^{n} k = n(n + 1)/2,    Σ_{k=1}^{n} k² = n(n + 1)(2n + 1)/6  (13)
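These closed forms are trivial to spot-check (our own two-line check):

n = 10;
[sum(1:n)  n*(n+1)/2]                 % both 55
[sum((1:n).^2)  n*(n+1)*(2*n+1)/6]    % both 385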
Problem 6.4.1 Solution
N is a binomial (n = 100, p = 0.4) random variable. M is a binomial (n = 50, p = 0.4) random
variable. Thus N is the sum of 100 independent Bernoulli (p = 0.4) and M is the sum of 50
independent Bernoulli (p = 0.4) random variables. Since M and N are independent, L = M + N
is the sum of 150 independent Bernoulli (p = 0.4) random variables. Hence L is a binomial
n = 150, p = 0.4) and has PMF
150
PL (l) = (0.4)l (0.6)150−l . (1)
l
(a) From Table 6.1, Y is an exponential (λ = 1) random variable. Example 6.5 derives the moments of an exponential (λ) random variable. For λ = 1, the moments of Y are
E[Y] = 1,    E[Y²] = 2,    E[Y³] = 3! = 6.  (1)
It follows that
E[W²] = Var[W] + (E[W])² = 5 + 25 = 30.  (4)
(c) Although we could just use the fact that the expectation of the sum equals the sum of the
expectations, the problem asks us to find the moments using φM (s). In this case,
E[M] = dφM(s)/ds |_{s=0} = n(1 − p + pe^s)^(n−1) pe^s |_{s=0} = np  (3)
The variance of M is
Var[M] = E[M²] − (E[M])² = np(1 − p) = n Var[K]  (7)
Hence,
Var[Xi] = E[Xi²] − (E[Xi])² = 2/3.  (6)
By Theorems 6.1 and 6.3, the mean and variance of Y are
E[Y] = nE[X] = n  (7)
Var[Y] = n Var[X] = 2n/3  (8)
Another, more complicated, way to find the mean and variance is to evaluate derivatives of φY(s) at s = 0.
Problem 6.4.5 Solution
PKi(k) = { 2^k e^(−2)/k!, k = 0, 1, 2, . . . ; 0 otherwise }  (1)
Let Ri = K1 + K2 + · · · + Ki.
(a) From Table 6.1, we find that the Poisson (α = 2) random variable K has MGF φK(s) = e^(2(e^s − 1)).
(b) Since the Ki are independent, the MGF of Ri is
φRi(s) = ∏_{n=1}^{i} φK(s) = [φK(s)]^i = e^(2i(e^s − 1))  (2)
(c) Since the MGF of Ri has the same form as that of a Poisson random variable with parameter α = 2i, we can conclude that Ri is in fact a Poisson random variable with parameter α = 2i. That is,
PRi(r) = { (2i)^r e^(−2i)/r!, r = 0, 1, 2, . . . ; 0 otherwise }  (3)
(d) Because Ri is a Poisson random variable with parameter α = 2i, the mean and variance of
Ri are then both 2i.
Y = X1 + X2 + · · · + X31 (1)
The random variables X1 , . . . , X31 are Gaussian and independent but not identically distributed.
However, since the sum of independent Gaussian random variables is Gaussian, we know that Y is
Gaussian. Hence, all we need to do is find the mean and variance of Y in order to specify the PDF
of Y . The mean of Y is
E[Y] = Σ_{i=1}^{31} E[Xi] = Σ_{i=1}^{31} (32 − i/4) = 32(31) − 31(32)/8 = 868 kW-hr  (2)
Problem 6.4.7 Solution
By Theorem 6.8, we know that φM (s) = [φK (s)]n .
(a) From Table 6.1, we see that the exponential random variable X has MGF
φX(s) = λ/(λ − s)  (1)
(b) Note that K is a geometric random variable identical to the geometric random variable X in
Table 6.1 with parameter p = 1 − q. From Table 6.1, we know that random variable K has
MGF
(1 − q)es
φK (s) = (2)
1 − qes
Since K is independent of each Xi , V = X1 + · · · + XK is a random sum of random variables.
From Theorem 6.12,
φV(s) = φK(ln φX(s)) = (1 − q)(λ/(λ − s)) / (1 − q(λ/(λ − s))) = (1 − q)λ / ((1 − q)λ − s)  (3)
We see that the MGF of V is that of an exponential random variable with parameter (1− q)λ.
The PDF of V is
(1 − q)λe−(1−q)λv v ≥ 0
fV (v) = (4)
0 otherwise
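A Monte Carlo sketch (ours; the parameter values q = 0.6 and λ = 2 are arbitrary) confirms that V behaves like an exponential ((1 − q)λ) random variable:

q = 0.6; lambda = 2; m = 1e5;
V = zeros(m,1);
for i = 1:m
    K = max(1,ceil(log(rand)/log(q)));       % geometric(1-q) sample on {1,2,...}
    V(i) = sum(-log(rand(K,1))/lambda);      % sum of K iid exponential(lambda) samples
end
[mean(V)  1/((1-q)*lambda)]                  % sample mean vs 1/((1-q)*lambda) = 1.25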
Problem 6.5.2 Solution
The number N of passes thrown has the Poisson PMF and MGF
PN(n) = { (30)^n e^(−30)/n!, n = 0, 1, . . . ; 0 otherwise }    φN(s) = e^(30(e^s − 1))  (1)
Let Xi = 1 if pass i is completed and Xi = 0 otherwise. The PMF and MGF of each Xi are
PXi(x) = { 1/3, x = 0;  2/3, x = 1;  0 otherwise }    φXi(s) = 1/3 + (2/3)e^s  (2)
The number of completed passes can be written as the random sum of random variables
K = X1 + · · · + XN (3)
By Theorem 6.12, φK(s) = φN(ln φXi(s)) = e^(30(φXi(s) − 1)) = e^(20(e^s − 1)).  (4)
We see that K has the MGF of a Poisson random variable with mean E[K] = 30(2/3) = 20, variance Var[K] = 20, and PMF
PK(k) = { (20)^k e^(−20)/k!, k = 0, 1, . . . ; 0 otherwise }  (5)
V = Y1 + · · · + YK  (1)
The PDF of V cannot be found in a simple form. However, we can use the MGF to calculate the
mean and variance. In particular,
E[V] = dφV(s)/ds |_{s=0} = e^(300s/(1−15s)) [300/(1 − 15s)²] |_{s=0} = 300  (5)
E[V²] = d²φV(s)/ds² |_{s=0}  (6)
= [ e^(300s/(1−15s)) (300/(1 − 15s)²)² + e^(300s/(1−15s)) (9000/(1 − 15s)³) ] |_{s=0} = 99,000  (7)
Thus, V has variance Var[V ] = E[V 2 ] − (E[V ])2 = 9, 000 and standard deviation σV ≈ 94.9.
A second way to calculate the mean and variance of V is to use Theorem 6.13 which says
V = X1 + · · · + XK (2)
From Appendix A,
φX(s) = (1 − q) + qe^s,    φK(s) = e^(r[e^s − 1])  (3)
By Theorem 6.12,
φV(s) = φK(ln φX(s)) = e^(r[φX(s) − 1]) = e^(rq(e^s − 1))  (4)
Hence, we see that V has the MGF of a Poisson random variable with mean E[V] = rq. The PMF of V is
PV(v) = { (rq)^v e^(−rq)/v!, v = 0, 1, 2, . . . ; 0 otherwise }  (5)
(a) We can view K as a shifted geometric random variable. To find the MGF, we start from first
principles with Definition 6.1:
φK(s) = Σ_{k=0}^{∞} e^(sk) p(1 − p)^k = p Σ_{k=0}^{∞} [(1 − p)e^s]^k = p / (1 − (1 − p)e^s)  (1)
(b) First, we need to recall that each Xi has MGF φX(s) = e^(s + s²/2). From Theorem 6.12, the MGF of R is
φR(s) = φK(ln φX(s)) = φK(s + s²/2) = p / (1 − (1 − p)e^(s + s²/2))  (2)
(c) To use Theorem 6.13, we first need to calculate the mean and variance of K:
E[K] = dφK(s)/ds |_{s=0} = p(1 − p)e^s / [1 − (1 − p)e^s]² |_{s=0} = (1 − p)/p  (3)
E[K²] = d²φK(s)/ds² |_{s=0} = p(1 − p) { [1 − (1 − p)e^s]e^s + 2(1 − p)e^(2s) } / [1 − (1 − p)e^s]³ |_{s=0}  (4)
= (1 − p)(2 − p)/p²  (5)
Hence, Var[K] = E[K²] − (E[K])² = (1 − p)/p². Finally, we can use Theorem 6.13 to write
Var[R] = E[K] Var[X] + (E[X])² Var[K] = (1 − p)/p + (1 − p)/p² = (1 − p²)/p²  (6)
A second way is to argue that symmetry implies E[Xi|X1 + · · · + Xn = k] = γ, the same for each i. In this case,
nγ = Σ_{i=1}^{n} E[Xi|X1 + · · · + Xn = k] = E[X1 + · · · + Xn|X1 + · · · + Xn = k] = k  (7)
E[U|K = k] = Σ_{i=1}^{k} E[Xi|X1 + · · · + Xn = k] = Σ_{i=1}^{k} k/n = k²/n  (8)
This says that the random variable E[U|K] = K²/n. Using iterated expectations, we have
E[U] = E[E[U|K]] = E[K²]/n  (9)
Since K is a binomial random variable, we know that E[K] = np and Var[K] = np(1 − p). Thus,
E[U] = (1/n)E[K²] = (1/n)[Var[K] + (E[K])²] = p(1 − p) + np²  (10)
On the other hand, V is just an ordinary random sum of independent random variables, and its mean is E[V] = E[X]E[M] = np².
(a) It is tempting to use Theorem 6.12 to find φY(s); however, this would be wrong since each Xi is not independent of N. In this problem, we must start from first principles using iterated expectations.
φY(s) = E[E[e^(s(X1+···+XN))|N]] = Σ_{n=1}^{∞} PN(n) E[e^(s(X1+···+Xn))|N = n]  (2)
Given N = n, we know that games 1 through n − 1 were either wins or ties and that game n was a loss. That is, given N = n, Xn = 0 and for i < n, Xi ≠ 0. Moreover, for i < n, Xi has the conditional PMF
PXi|N=n(x) = PXi|Xi≠0(x) = { 1/2, x = 1, 2;  0 otherwise }  (4)
Now we can find the MGF of Y.
φY(s) = Σ_{n=1}^{∞} PN(n) E[e^(sX1)|N = n] E[e^(sX2)|N = n] · · · E[e^(sXn)|N = n]  (7)
= Σ_{n=1}^{∞} PN(n) (e^s/2 + e^(2s)/2)^(n−1) = (1/(e^s/2 + e^(2s)/2)) Σ_{n=1}^{∞} PN(n) (e^s/2 + e^(2s)/2)^n  (8)
It follows that
φY(s) = (1/(e^s/2 + e^(2s)/2)) Σ_{n=1}^{∞} PN(n) e^(n ln[(e^s + e^(2s))/2]) = φN(ln[e^s/2 + e^(2s)/2]) / (e^s/2 + e^(2s)/2)  (9)
The tournament ends as soon as you lose a game. Since each game is a loss with probability 1/3 independent of any previous game, the number of games played has the geometric PMF and corresponding MGF
PN(n) = { (2/3)^(n−1)(1/3), n = 1, 2, . . . ; 0 otherwise }    φN(s) = (1/3)e^s / (1 − (2/3)e^s)  (10)
(b) To find the moments of Y, we evaluate the derivatives of the MGF φY(s). Since
we see that
E[Y] = dφY(s)/ds |_{s=0} = 3/(9(1/3)²) = 3  (13)
If you're curious, you may notice that E[Y] = 3 precisely equals E[N]E[Xi], the answer you would get if you mistakenly assumed that N and each Xi were independent. Although this may seem like a coincidence, it's actually the result of a theorem known as Wald's equality.
The second derivative of the MGF is
d²φY(s)/ds² = [ (1 − e^s/3 − e^(2s)/3)(e^s + 4e^(2s)) + 2(e^s + 2e^(2s))²/3 ] / [ 9(1 − e^s/3 − e^(2s)/3)³ ]  (14)
Problem 6.6.1 Solution
We know that the waiting time W is uniformly distributed on [0, 10] and therefore has the following PDF.
fW(w) = { 1/10, 0 ≤ w ≤ 10;  0 otherwise }  (1)
We also know that the total time is 3 milliseconds plus the waiting time, that is X = W + 3.
With these facts, we can answer the questions posed by the problem.
(a) Let X1, . . . , X120 denote the set of call durations (measured in minutes) during the month. From the problem statement, each Xi is an exponential (λ) random variable with E[Xi] = 1/λ = 2.5 min and Var[Xi] = 1/λ² = 6.25 min². The total number of minutes used during the month is Y = X1 + · · · + X120. By Theorem 6.1 and Theorem 6.3,
The subscriber's bill is 30 + 0.4(y − 300)⁺ where x⁺ = x if x ≥ 0 or x⁺ = 0 if x < 0. The subscriber's bill is exactly $36 if Y = 315. The probability the subscriber's bill exceeds $36 equals
P[Y > 315] = P[(Y − 300)/σY > (315 − 300)/σY] = Q(15/√750) = 0.2919.  (2)
(b) If the actual call duration is Xi, the subscriber is billed for Mi = ⌈Xi⌉ minutes. Because each Xi is an exponential (λ) random variable, Theorem 3.9 says that Mi is a geometric (p) random variable with p = 1 − e^(−λ) = 0.3297. Since Mi is geometric,
E[Mi] = 1/p = 3.033,    Var[Mi] = (1 − p)/p² = 6.167.  (3)
The number of billed minutes in the month is B = M1 + · · · + M120. Since M1, . . . , M120 are iid random variables,
Similar to part (a), the subscriber is billed $36 if B = 315 minutes. The probability the subscriber is billed more than $36 is
P[B > 315] = P[(B − 364)/√740.08 > (315 − 364)/√740.08] = Q(−1.8) = Φ(1.8) = 0.964.  (5)
Since we can perform the exact calculation, using a central limit theorem may seem silly; however, calculating n^n or n! is difficult for large n. Moreover, it's interesting to see how good the approximation is. In this case, the approximation is
P[Wn = n] = P[n ≤ Wn ≤ n] ≈ Φ((n + 0.5 − n)/√n) − Φ((n − 0.5 − n)/√n) = 2Φ(1/(2√n)) − 1  (3)
The comparison of the exact calculation and the approximation is given in the following table.
P[Wn = n]      n = 1    n = 4    n = 16   n = 64
exact          0.3679   0.1954   0.0992   0.0498      (4)
approximate    0.3829   0.1974   0.0995   0.0498
Problem 6.7.2 Solution
(a) Since the number of requests N has expected value E[N ] = 300 and variance Var[N ] = 300,
we need C to satisfy
P[N > C] = P[(N − 300)/√300 > (C − 300)/√300]  (1)
= 1 − Φ((C − 300)/√300) = 0.05.  (2)
From Table 3.1, we note that Φ(1.65) = 0.9505. Thus,
C = 300 + 1.65√300 = 328.6.  (3)
(c) This part of the problem could be stated more carefully. Re-examining Definition 2.10 for
the Poisson random variable and the accompanying discussion in Chapter 2, we observe that
the webserver has an arrival rate of λ = 300 hits/min, or equivalently λ = 5 hits/sec. Thus
in a one second interval, the number of requests N is a Poisson (α = 5) random variable.
However, since the server “capacity” in a one second interval is not precisely defined, we will
make the somewhat arbitrary definition that the server capacity is C = 328.6/60 = 5.477
packets/sec. With this somewhat arbitrary definition, the probability of overload in a one
second interval is
P[N > C] = 1 − P[N ≤ 5.477] = 1 − P[N ≤ 5].  (5)
Because the number of arrivals in the interval is small, it would be a mistake to use the
Central Limit Theorem to estimate this overload probability. However, the direct calculation
of the overload probability is not hard. For E[N ] = α = 5,
1 − P[N ≤ 5] = 1 − Σ_{n=0}^{5} PN(n) = 1 − e^(−α) Σ_{n=0}^{5} α^n/n! = 0.3840.  (6)
(d) Here we find the smallest C such that P [N ≤ C] ≥ 0.95. From the previous step, we know
that C > 5. Since N is a Poisson (α = 5) random variable, we need to find the smallest C
such that
P[N ≤ C] = Σ_{n=0}^{C} α^n e^(−α)/n! ≥ 0.95.  (7)
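Both the overload probability in part (c) and the search in part (d) take only a few lines of base Matlab (our own sketch, not part of the original solution):

alpha = 5; n = 0:30;
pmf = exp(-alpha)*alpha.^n./factorial(n);
1 - sum(pmf(1:6))                     % 1 - P[N <= 5], approx 0.3840
C = n(find(cumsum(pmf) >= 0.95,1))    % smallest C with P[N <= C] >= 0.95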
(e) If we use the Central Limit Theorem to estimate the overload probability in a one second interval, we would use the facts that E[N] = 5 and Var[N] = 5 to estimate the overload probability as
1 − P[N ≤ 5] = 1 − Φ((5 − 5)/√5) = 0.5  (8)
which overestimates the overload probability by roughly 30 percent. We recall from Chapter 2 that a Poisson random variable is the limiting case of the (n, p) binomial random variable when n is large and np = α. In general, for fixed p, the Poisson and binomial PMFs become closer as n increases. Since large n is also the case for which the central limit theorem applies, it is not surprising that the CLT approximation for the Poisson (α) CDF is better when α = np is large.
Comment: Perhaps a more interesting question is why the overload probability in a one-second
interval is so much higher than that in a one-minute interval? To answer this, consider a T -second
interval in which the number of requests NT is a Poisson (λT ) random variable while the server
capacity is cT hits. In the earlier problem parts, c = 5.477 hits/sec. We make the assumption that
the server system is reasonably well-engineered in that c > λ. (We will learn in Chapter 12 that to
assume otherwise means that the backlog of requests will grow without bound.) Further, assuming
T is fairly large, we use the CLT to estimate the probability of overload in a T -second interval as
P[NT ≥ cT] = P[(NT − λT)/√(λT) ≥ (cT − λT)/√(λT)] = Q(k√T),  (9)
where k = (c − λ)/√λ. As long as c > λ, the overload probability decreases with increasing T. In fact, the overload probability goes rapidly to zero as T becomes large. The reason is that the gap cT − λT between server capacity cT and the expected number of requests λT grows linearly in T while the standard deviation of the number of requests grows proportional to √T. However, one should add that the definition of a T-second overload is somewhat arbitrary. In fact, one can argue that as T becomes large, the requirement for no overloads simply becomes less stringent. In Chapter 12, we will learn techniques to analyze a system such as this webserver in terms of the average backlog of requests and the average delay in serving a request. These statistics won't depend on a particular time period T and perhaps better describe the system performance.
(a) The number of tests L needed to identify 500 acceptable circuits is a Pascal (k = 500, p = 0.8)
random variable, which has expected value E[L] = k/p = 625 tests.
(b) Let K denote the number of acceptable circuits in n = 600 tests. Since K is binomial
(n = 600, p = 0.8), E[K] = np = 480 and Var[K] = np(1 − p) = 96. Using the CLT, we
estimate the probability of finding at least 500 acceptable circuits as
P[K ≥ 500] = P[(K − 480)/√96 ≥ 20/√96] ≈ Q(20/√96) = 0.0206.  (1)
1.0-binomialcdf(600,0.8,499)
ans =
0.0215
(d) We need to find the smallest value of n such that the binomial (n, p) random variable K satisfies P[K ≥ 500] ≥ 0.9. Since E[K] = np and Var[K] = np(1 − p), the CLT approximation yields
P[K ≥ 500] = P[(K − np)/√(np(1 − p)) ≥ (500 − np)/√(np(1 − p))] ≈ 1 − Φ(z) = 0.90,  (2)
where z = (500 − np)/√(np(1 − p)). It follows that 1 − Φ(z) = Φ(−z) ≥ 0.9, implying z = −1.29. Since p = 0.8, we have that
np − 500 = 1.29√(np(1 − p)).  (3)
We can minimize e^(s²/2 − sc) by minimizing the exponent s²/2 − sc. By setting
d/ds (s²/2 − sc) = s − c = 0  (2)
we obtain s = c. At s = c, the upper bound is P[Z ≥ c] ≤ e^(−c²/2). The table below compares this
upper bound to the true probability. Note that for c = 1, 2 we use Table 3.1 and the fact that
Q(c) = 1 − Φ(c).
We see that in this case, the Chernoff bound typically overestimates the true probability by roughly
a factor of 10.
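The comparison is easy to reproduce (our own sketch; Q(c) is computed from erfc):

c = 1:5;
Q = 0.5*erfc(c/sqrt(2));              % exact tail probabilities Q(c)
bound = exp(-c.^2/2);                 % Chernoff upper bound
[c; Q; bound]                         % the bound exceeds Q(c), by roughly a factor of 10 for larger c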
Since Z is N [0, 1], we can apply the result of Problem 6.8.1 with c replaced by (c − µ)/σ. This
yields
P[X ≥ c] = P[Z ≥ (c − µ)/σ] ≤ e^(−(c−µ)²/2σ²)  (2)
Problem 6.8.3 Solution
From Appendix A, we know that the MGF of K is
φK(s) = e^(α(e^s − 1))  (1)

FW(w) = 1 − Σ_{k=0}^{2} (λw)^k e^(−λw)/k!  (1)
k=0
Since φWn (s) = (φX (s))n , applying the Chernoff bound to Wn yields
n
P [Wn ≥ nc] ≤ min e−snc φWn (s) = min e−sc φX (s) (2)
s≥0 s≥0
For y ≥ 0, y n is a nondecreasing function of y. This implies that the value of s that minimizes
e−sc φX (s) also minimizes (e−sc φX (s))n . Hence
n
P [Mn (X) ≥ c] = P [Wn ≥ nc] ≤ min e−sc φX (s) (3)
s≥0
256
Problem 6.9.1 Solution
Note that Wn is a binomial (10^n, 0.5) random variable. We need to calculate
A complication is that the event Wn < w is not the same as Wn ≤ w when w is an integer. In this
case, we observe that
Thus
P[Bn] = FWn(0.501 × 10^n) − FWn(⌈0.499 × 10^n⌉ − 1)  (4)
function pb=binomialcdftest(N);
pb=zeros(1,N);
for n=1:N,
w=[0.499 0.501]*10^n;
w(1)=ceil(w(1))-1;
pb(n)=diff(binomialcdf(10^n,0.5,w));
end
Unfortunately, on this user’s machine (a Windows XP laptop), the program fails for N = 4. The
problem, as noted earlier is that binomialcdf.m uses binomialpmf.m, which fails for a binomial
(10000, p) random variable. Of course, your mileage may vary. A slightly better solution is to use
the bignomialcdf.m function, which is identical to binomialcdf.m except it calls bignomialpmf.m
rather than binomialpmf.m. This enables calculations for larger values of n, although at some cost
in numerical accuracy. Here is the code:
function pb=bignomialcdftest(N);
pb=zeros(1,N);
for n=1:N,
w=[0.499 0.501]*10^n;
w(1)=ceil(w(1))-1;
pb(n)=diff(bignomialcdf(10^n,0.5,w));
end
Problem 6.9.2 Solution
The Erlang (n, λ = 1) random variable X has expected value E[X] = n/λ = n and variance Var[X] = n/λ² = n. The PDF of X as well as the PDF of a Gaussian random variable Y with the same expected value and variance are
fX(x) = { x^(n−1) e^(−x)/(n − 1)!, x ≥ 0;  0 otherwise }    fY(x) = (1/√(2πn)) e^(−(x−n)²/2n)  (1)
From the forms of the functions, it is not likely to be apparent that fX(x) and fY(x) are similar. The following program plots fX(x) and fY(x) for values of x within three standard deviations of the expected value n. Below are sample outputs of erlangclt(n) for n = 4, 20, 100.

function df=erlangclt(n);
r=3*sqrt(n);
x=(n-r):(2*r)/100:n+r;
fx=erlangpdf(n,1,x);
fy=gausspdf(n,sqrt(n),x);
plot(x,fx,x,fy);
df=fx-fy;
In the graphs we will see that as n increases, the Erlang PDF becomes increasingly similar to the Gaussian PDF of the same expected value and variance. This is not surprising since the Erlang (n, λ) random variable is the sum of n exponential random variables and the CLT says that the Erlang CDF should converge to a Gaussian CDF as n gets large.
[Figure: plots of fX(x) and fY(x) versus x for erlangclt(4), erlangclt(20), and erlangclt(100).]
On the other hand, the convergence should be viewed with some caution. For example, the
mode (the peak value) of the Erlang PDF occurs at x = n − 1 while the mode of the Gaussian PDF
is at x = n. This difference only appears to go away for n = 100 because the graph x-axis range
is expanding. More important, the two PDFs are quite different far away from the center of the
distribution. The Erlang PDF is always zero for x < 0 while the Gaussian PDF is always positive.
For large positive x, the two distributions do not have the same exponential decay. Thus it's not a
good idea to use the CLT to estimate probabilities of rare events such as {X > x} for extremely
large values of x.
function y=binomcltpmf(n,p)
x=-1:17;
xx=-1:0.05:17;
y=binomialpmf(n,p,x);
std=sqrt(n*p*(1-p));
clt=gausspdf(n*p,std,xx);
hold off;
pmfplot(x,y,’\it x’,’\it p_X(x) f_X(x)’);
hold on; plot(xx,clt); hold off;
[Figure: the PMF pX(x) and the matching Gaussian PDF fX(x) versus x for binomcltpmf(2,0.5), binomcltpmf(4,0.5), binomcltpmf(8,0.5), and binomcltpmf(16,0.5).]
To see why the values of the PDF and PMF are roughly the same, consider the Gaussian random
variable Y . For small ∆,
fY(x) ≈ [FY(x + ∆/2) − FY(x − ∆/2)]/∆.  (1)
For ∆ = 1, we obtain
fY(x) ≈ FY(x + 1/2) − FY(x − 1/2).  (2)
Since the Gaussian CDF is approximately the same as the CDF of the binomial (n, p) random variable X, we observe for an integer x that
fY(x) ≈ FX(x + 1/2) − FX(x − 1/2) = PX(x).  (3)
Although the equivalence in heights of the PMF and PDF is only an approximation, it can be
useful for checking the correctness of a result.
In particular, between its minimum and maximum values, the vector px2 must enumerate all integer values, including those which have zero probability. In addition, we write down sw=0:125 directly based on knowledge that the range enumerated by px1 and px2 corresponds to X1 + X2 having a minimum value of 0 and a maximum value of 125.

%convx1x2.m
sw=(0:125);
px1=[0,0.04*ones(1,25)];
px2=zeros(1,101);
px2(10*(1:10))=10*(1:10)/550;
pw=conv(px1,px2);
h=pmfplot(sw,pw,...
  '\itw','\itP_W(w)');
set(h,'LineWidth',0.25);
The resulting plot will be essentially identical to Figure 6.4. One final note, the command
set(h,’LineWidth’,0.25) is used to make the bars of the PMF thin enough to be resolved indi-
vidually.
[Figure: the PMF PW(w) versus w for W = X1 + X2.]
As an example, suppose X is a discrete uniform (0, 20) random variable and Y is an independent
discrete uniform (0, 80) random variable. The following program sum2unif will generate and plot
the PMF of W = X + Y .
%sum2unif.m
sx=0:20;px=ones(1,21)/21;
sy=0:80;py=ones(1,81)/81;
[pw,sw]=sumfinitepmf(px,sx,py,sy);
h=pmfplot(sw,pw,'\it w','\it P_W(w)');
set(h,'LineWidth',0.25);
[Figure: the PMF PW(w) versus w for W = X + Y, the sum of the two discrete uniform random variables.]
Problem Solutions – Chapter 7
Var[M9(X)] = σX²/9 = 25/9  (1)
(b)
(b)
Now the probability that M9 (X) > 7 can be approximated using the Central Limit Theorem
(CLT).
(a) Since X1 is a uniform random variable, it must have a uniform PDF over an interval [a, b].
From Appendix A, we can look up that µX = (a+b)/2 and that Var[X] = (b−a)2 /12. Hence,
given the mean and variance, we obtain the following equations for a and b.
Solving these equations yields a = 4 and b = 10 from which we can state the distribution of
X.
fX(x) = { 1/6, 4 ≤ x ≤ 10;  0 otherwise }  (2)
Var[M16(X)] = Var[X]/16 = 3/16  (3)
(c)
P[X1 ≥ 9] = ∫_9^∞ fX1(x) dx = ∫_9^{10} (1/6) dx = 1/6  (4)
(d) The variance of M16 (X) is much less than Var[X1 ]. Hence, the PDF of M16 (X) should
be much more concentrated about E[X] than the PDF of X1 . Thus we should expect
P [M16 (X) > 9] to be much less than P [X1 > 9].
Thus Var[Y] = 1/5 − (1/3)² = 4/45 and the sample mean Mn(Y) has standard error
en = √(4/(45n)).  (4)
(a) Since Yn = X2n−1 + (−X2n ), Theorem 6.1 says that the expected value of the difference is
(b) By Theorem 6.2, the variance of the difference between X2n−1 and X2n is
(c) Each Yn is the difference of two samples of X that are independent of the samples used by
any other Ym . Thus Y1 , Y2 , . . . is an iid random sequence. By Theorem 7.1, the mean and
variance of Mn (Y ) are
P[|W − E[W]| ≥ 200] ≤ Var[W]/200² ≤ 100²/200² = 0.25  (1)
The following table compares the upper bound and the true probability:
The Chebyshev bound gets increasingly weak as k goes up. As an example, for k = 4, the bound
exceeds the true probability by a factor of 1,000 while for k = 5 the bound exceeds the actual
probability by a factor of nearly 100,000.
the Yi are iid uniform (0, 1) random variables. By Theorem 3.20, we conclude that W = 30V is the sum of three iid uniform (0, 30) random variables.
To start, let V2 = Y1 + Y2. Since each Yi has a PDF shaped like a unit area pulse, the PDF of V2 is the triangular function
fV2(v) = { v, 0 ≤ v ≤ 1;  2 − v, 1 < v ≤ 2;  0 otherwise }  (1)
[Figure: the triangular PDF fV2(v) on 0 ≤ v ≤ 2.]
The PDF of V is the convolution integral
fV(v) = ∫_{−∞}^{∞} fV2(y) fY3(v − y) dy  (2)
= ∫_0^1 y fY3(v − y) dy + ∫_1^2 (2 − y) fY3(v − y) dy.  (3)
To compute the convolution, it is helpful to depict the three distinct cases. In each case, the square
“pulse” is fY3 (v − y) and the triangular pulse is fV2 (y).
[Figure: the square pulse fY3(v − y) overlapping the triangular pulse fV2(y) for the cases 0 ≤ v < 1, 1 ≤ v < 2, and 2 ≤ v < 3.]
From the graphs, we can compute the convolution for each case:
0 ≤ v < 1:   fV3(v) = ∫_0^v y dy = v²/2  (5)
1 ≤ v < 2:   fV3(v) = ∫_{v−1}^1 y dy + ∫_1^v (2 − y) dy = −v² + 3v − 3/2  (6)
2 ≤ v < 3:   fV3(v) = ∫_{v−1}^2 (2 − y) dy = (3 − v)²/2  (7)
To complete the problem, we use Theorem 3.20 to observe that W = 30V3 is the sum of three iid
uniform (0, 30) random variables. From Theorem 3.19,
fW(w) = (1/30) fV3(w/30) = { (w/30)²/60, 0 ≤ w < 30;  [−(w/30)² + 3(w/30) − 3/2]/30, 30 ≤ w < 60;  [3 − (w/30)]²/60, 60 ≤ w < 90;  0 otherwise. }  (8)
Finally, we can compute the exact probability
P[W ≥ 75] = (1/60) ∫_{75}^{90} [3 − (w/30)]² dw = −(3 − w/30)³/6 |_{75}^{90} = 1/48  (9)
For comparison, the Markov inequality indicated that P[W ≥ 75] ≤ 3/5 and the Chebyshev inequality showed that P[W ≥ 75] ≤ 1/4. As we see, both inequalities are quite weak in this case.
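A Monte Carlo sketch (ours, not part of the original solution) makes the comparison concrete:

m = 1e6;
W = 30*sum(rand(3,m));                % W = sum of three iid uniform (0,30) random variables
[mean(W >= 75)  1/48]                 % simulated probability vs exact 0.0208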
P[R ≥ 250] ≤ E[R]/250 = 54/125 = 0.432.  (2)
>> 1-pascalcdf(3,1/36,249)
ans =
0.0299
Thus the Markov and Chebyshev inequalities are valid bounds but not good estimates of
P [R ≥ 250].
P[µ − σ ≤ Y ≤ µ + σ] = P[−σ ≤ Y − µ ≤ σ]  (1)
= P[−1 ≤ (Y − µ)/σ ≤ 1]  (2)
= Φ(1) − Φ(−1) = 2Φ(1) − 1 = 0.6827.  (3)
Note that Y can be any Gaussian random variable, including, for example, Mn (X) when X is
Gaussian. When X is not Gaussian, the same claim holds to the extent that the central limit
theorem promises that Mn (X) is nearly Gaussian for large n.
Problem 7.3.2 Solution
It should seem obvious that the result is true since Var[R̂n] going to zero implies the probability that R̂n differs from E[R̂n] is going to zero. Similarly, the difference between E[R̂n] and r is also going to zero deterministically. Hence it ought to follow that R̂n is converging to r in probability. Here are the details:
We must show that lim_{n→∞} P[|R̂n − r| ≥ ε] = 0. First we note that R̂n being asymptotically unbiased implies that lim_{n→∞} E[R̂n] = r. Equivalently, given ε > 0, there exists n0 such that |E[R̂n] − r| ≤ ε²/2 for all n ≥ n0.
Second, we observe that
|R̂n − r|² = |(R̂n − E[R̂n]) + (E[R̂n] − r)|² ≤ |R̂n − E[R̂n]|² + |E[R̂n] − r|².  (1)
It follows that
lim_{n→∞} P[|R̂n − r|² ≥ ε²] ≤ lim_{n→∞} Var[R̂n]/(ε/√2)² = 0.  (6)
(a) Since the fact that the expectation of a sum equals the sum of the expectations also holds for vectors,
E[M(n)] = (1/n) Σ_{i=1}^{n} E[X(i)] = (1/n) Σ_{i=1}^{n} µX = µX.  (1)
(b) The jth component of M(n) is Mj(n) = (1/n) Σ_{i=1}^{n} Xj(i), which is just the sample mean of Xj. Defining Aj = {|Mj(n) − µj| ≥ c}, we observe that
P[max_{j=1,...,k} |Mj(n) − µj| ≥ c] = P[A1 ∪ A2 ∪ · · · ∪ Ak].  (2)
Since Σ_{j=1}^{k} σj² < ∞, lim_{n→∞} P[max_{j=1,...,k} |Mj(n) − µj| ≥ c] = 0.
Hence,
E[Yk] = (1/2) E[X_{2k}² − 2X_{2k}X_{2k−1} + X_{2k−1}²] = E[X²] − (E[X])² = Var[X]  (2)
Next we observe that Y1, Y2, . . . is an iid random sequence. If this independence is not obvious, consider that Y1 is a function of X1 and X2, Y2 is a function of X3 and X4, and so on. Since X1, X2, . . . is an iid sequence, Y1, Y2, . . . is an iid sequence. Hence, E[Mn(Y)] = E[Y] = Var[X], implying Mn(Y) is an unbiased estimator of Var[X]. We can use Theorem 7.5 to prove that Mn(Y) is consistent if we show that Var[Y] is finite. Since Var[Y] ≤ E[Y²], it is sufficient to prove that E[Y²] < ∞. Note that
Yk² = [X_{2k}⁴ − 4X_{2k}³X_{2k−1} + 6X_{2k}²X_{2k−1}² − 4X_{2k}X_{2k−1}³ + X_{2k−1}⁴]/4  (3)
Taking expectations yields
E[Yk²] = (1/2)E[X⁴] − 2E[X³]E[X] + (3/2)(E[X²])²  (4)
Hence, if the first four moments of X are finite, then Var[Y] ≤ E[Y²] < ∞. By Theorem 7.5, the sequence Mn(Y) is consistent.
(a) From Theorem 6.2, we have
Var[X1 + · · · + Xn] = Σ_{i=1}^{n} Var[Xi] + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Cov[Xi, Xj]  (1)
Note that Var[Xi] = σ² and for j > i, Cov[Xi, Xj] = σ² a^(j−i). This implies
Var[X1 + · · · + Xn] = nσ² + 2σ² Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} a^(j−i)  (2)
= nσ² + 2σ² Σ_{i=1}^{n−1} (a + a² + · · · + a^(n−i))  (3)
= nσ² + (2aσ²/(1 − a)) Σ_{i=1}^{n−1} (1 − a^(n−i))  (4)
With some more algebra, we obtain
Var[X1 + · · · + Xn] = nσ² + (2aσ²/(1 − a))(n − 1) − (2aσ²/(1 − a))(a + a² + · · · + a^(n−1))  (5)
= n(1 + a)σ²/(1 − a) − 2aσ²/(1 − a) − 2σ²(a/(1 − a))²(1 − a^(n−1))  (6)
(b) Since the expected value of a sum equals the sum of the expected values,
E[M(X1, . . . , Xn)] = (E[X1] + · · · + E[Xn])/n = µ  (8)
The variance of M(X1, . . . , Xn) is
Var[M(X1, . . . , Xn)] = Var[X1 + · · · + Xn]/n² ≤ σ²(1 + a)/(n(1 − a))  (9)
By the Chebyshev inequality,
P[|M(X1, . . . , Xn) − µ| ≥ c] ≤ Var[M(X1, . . . , Xn)]/c² ≤ σ²(1 + a)/(n(1 − a)c²)  (10)
(c) Taking the limit as n approaches infinity of the bound derived in part (b) yields
lim_{n→∞} P[|M(X1, . . . , Xn) − µ| ≥ c] ≤ lim_{n→∞} σ²(1 + a)/(n(1 − a)c²) = 0  (11)
Thus
lim_{n→∞} P[|M(X1, . . . , Xn) − µ| ≥ c] = 0  (12)
Problem 7.3.7 Solution
(a) Since the expectation of the sum equals the sum of the expectations,
E[R̂(n)] = (1/n) Σ_{m=1}^{n} E[X(m)X'(m)] = (1/n) Σ_{m=1}^{n} R = R.  (1)
(b) This proof follows the method used to solve Problem 7.3.4. The i, jth element of R̂(n) is R̂i,j(n) = (1/n) Σ_{m=1}^{n} Xi(m)Xj(m), which is just the sample mean of XiXj. Defining the event
Ai,j = { |R̂i,j(n) − E[XiXj]| ≥ c },  (2)
we observe that
P[max_{i,j} |R̂i,j(n) − E[XiXj]| ≥ c] = P[∪_{i,j} Ai,j].  (3)
By the result of Problem 4.11.8, Xi Xj , the product of jointly Gaussian random variables, has
finite variance. Thus
Σ_{i,j} Var[XiXj] = Σ_{i=1}^{k} Σ_{j=1}^{k} Var[XiXj] ≤ k² max_{i,j} Var[XiXj] < ∞.  (6)
It follows that
lim_{n→∞} P[max_{i,j} |R̂i,j(n) − E[XiXj]| ≥ c] ≤ lim_{n→∞} (k² max_{i,j} Var[XiXj])/(nc²) = 0  (7)
(c) Now we wish to find the value of n such that P[|Mn(X) − PX(1)| ≥ .03] ≤ .01. From the Chebyshev inequality, we write
0.1 = σX²/(n(.03)²).  (4)
Since σX² = 0.09, solving for n yields n = 100.
When we know only the mean and variance of Xi , our only real tool is the Chebyshev
inequality which says that
(b) If each Xi is a Gaussian, the sample mean, Mn (X) will also be Gaussian with mean and
variance
In this case,
P[74 ≤ Mn(X) ≤ 76] = Φ((76 − µ)/σ) − Φ((74 − µ)/σ)  (6)
= Φ(√n/15) − Φ(−√n/15)  (7)
= 2Φ(√n/15) − 1 = 0.99  (8)
Thus, n = 1,521.
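A one-line check of the Gaussian case (our own sketch, using erf for Φ):

Phi = @(x) 0.5*(1+erf(x/sqrt(2)));
2*Phi(sqrt(1521)/15) - 1              % approx 0.99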
Since even under the Gaussian assumption, the number of samples n is so large that even if the
Xi are not Gaussian, the sample mean may be approximated by a Gaussian. Hence, about 1500
samples probably is about right. However, in the absence of any information about the PDF of Xi
beyond the mean and variance, we cannot make any guarantees stronger than that given by the
Chebyshev inequality.
(b) Let X_{A,i} denote X_A on the ith trial. Since P̂_n(A) = Mn(X_A) = (1/n) Σ_{i=1}^{n} X_{A,i},
Var[P̂_n(A)] = (1/n^2) Σ_{i=1}^{n} Var[X_{A,i}] = P[A](1 − P[A])/n.   (2)
(c) Since P̂100 (A) = M100 (XA ), we can use Theorem 7.12(b) to write
P[ |P̂_100(A) − P[A]| < c ] ≥ 1 − Var[X_A]/(100c^2) = 1 − 0.16/(100c^2) = 1 − α.   (3)
For c = 0.1, α = 0.16/[100(0.1)2 ] = 0.16. Thus, with 100 samples, our confidence coefficient
is 1 − α = 0.84.
(d) In this case, the number of samples n is unknown. Once again, we use Theorem 7.12(b) to
write
P[ |P̂_n(A) − P[A]| < c ] ≥ 1 − Var[X_A]/(nc^2) = 1 − 0.16/(nc^2) = 1 − α.   (4)
For c = 0.1, we have confidence coefficient 1 − α = 0.95 if α = 0.16/[n(0.1)2 ] = 0.05, or
n = 320.
P[ |M_100(X) − p| < c ] ≥ 1 − p(1 − p)/(100c^2) = 1 − α.   (1)
For confidence coefficient 0.99, we require
p(1 − p)/(100c^2) ≤ 0.01   or   c ≥ √(p(1 − p)).   (2)
Since p is unknown, we must ensure that the constraint is met for every value of p. The worst case occurs at p = 1/2, which maximizes p(1 − p). In this case, c = √(1/4) = 1/2 is the smallest value of c for which we have a confidence coefficient of at least 0.99.
If M100(X) = 0.06, our interval estimate for p is
M100(X) − c < p < M100(X) + c,   that is,   −0.44 < p < 0.56.
Since p ≥ 0, M100(X) = 0.06 and c = 0.5 imply that our interval estimate is
0 ≤ p < 0.56.
Our interval estimate is not very tight because 100 samples is not very large for a confidence coefficient of 0.99.
Since P̂n (A) = Mn (XA ) and E[Mn (XA )] = P [A], we can use Theorem 7.12(b) to write
P[ |P̂_n(A) − P[A]| < 0.05 ] ≥ 1 − Var[X_A]/(n(0.05)^2).   (2)
Note that Var[XA ] = P [A](1 − P [A]) ≤ 0.25. Thus for confidence coefficient 0.9, we require that
1 − Var[X_A]/(n(0.05)^2) ≥ 1 − 0.25/(n(0.05)^2) ≥ 0.9.   (3)
This requirement is met if 0.25/(n(0.05)^2) ≤ 0.1, that is, if n ≥ 1,000.
(b) In this case, we meet the requirement by choosing c = 10−3 P [A] = 10−5 . This implies
P[ |P̂_n(A) − P[A]| ≥ c ] ≤ P[A](1 − P[A])/(nc^2) = 0.0099/(n × 10^{−10}) = 9.9 × 10^7/n   (4)
The confidence level 0.01 is met if 9.9 × 107 /n = 0.01 or n = 9.9 × 109 .
Thus
P[ |P̂_n(A) − P[E]| ≥ c ] ≤ Var[X_E]/(nc^2) ≤ 1/(4nc^2).   (3)
Problem 7.5.1 Solution
In this problem, we have to keep straight that the Poisson expected value α = 1 is a different
α than the confidence coefficient 1 − α. That said, we will try to avoid using α for the confidence
coefficient. Using X to denote the Poisson (α = 1) random variable, the trace of the sample mean
is the sequence M1 (X), M2 (X), . . . The confidence interval estimate of α has the form
Since Var[Mn (X)] = Var[X]/n = 1/n, the 0.9 confidence interval shrinks with increasing n. In
particular, c = cn will be a decreasing sequence. Using a Central Limit Theorem approximation, a
0.9 confidence implies
0.9 = P[ −c_n/√(1/n) ≤ (Mn(X) − α)/√(1/n) ≤ c_n/√(1/n) ]   (4)
= Φ(c_n√n) − Φ(−c_n√n) = 2Φ(c_n√n) − 1.   (5)
Equivalently, Φ(c_n√n) = 0.95 or c_n = 1.65/√n.
Thus, as a function of the number of samples n, we plot three functions: the sample mean
Mn(X), and the upper limit Mn(X) + 1.65/√n and lower limit Mn(X) − 1.65/√n of the 0.9
confidence interval. We use the Matlab function poissonmeanseq(n) to generate these sequences
for n sample values.
function M=poissonmeanseq(n);
x=poissonrv(1,n);
nn=(1:n)’;
M=cumsum(x)./nn;
r=(1.65)./sqrt(nn);
plot(nn,M,nn,M+r,nn,M-r);
[Two plots of Mn(X) versus n together with the upper and lower 0.9 confidence limits, produced by poissonmeanseq(60) and poissonmeanseq(600).]
Problem 7.5.2 Solution
For a Bernoulli (p = 1/2) random variable X, the sample mean Mn (X) is the fraction of successes
in n Bernoulli trials. That is, Mn (X) = Kn /n where Kn is a binomial (n, p = 1/2) random variable.
Thus the probability the sample mean is within one standard error of (p = 1/2) is
pn = P[ n/2 − √n/2 ≤ Kn ≤ n/2 + √n/2 ]   (1)
= P[ Kn ≤ n/2 + √n/2 ] − P[ Kn < n/2 − √n/2 ]   (2)
= F_{Kn}(n/2 + √n/2) − F_{Kn}(⌈(n − √n)/2⌉ − 1)   (3)
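As a quick sanity check, Kn has standard deviation √n/2 when p = 1/2, so a Central Limit Theorem approximation suggests pn ≈ 2Φ(1) − 1 ≈ 0.68 for large n, which is consistent with the graph below.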
Here is a Matlab function that graphs pn as a function of n for N steps alongside the output
graph for bernoullistderr(50).
function p=bernoullistderr(N);
p=zeros(1,N);
for n=1:N,
r=[ceil((n-sqrt(n))/2)-1; ...
(n+sqrt(n))/2];
p(n)=diff(binomialcdf(n,0.5,r));
end
plot(1:N,p);
ylabel('\it p_n');
xlabel('\it n');
[Plot of p_n versus n produced by bernoullistderr(50).]
We can conclude that
λ̂ = λ/Mn(Y)     λ̃ = λ/√(Vn(Y))   (3)
For λ ≠ 1, the estimators λ̂ and λ̃ are just scaled versions of the estimators for the case λ = 1.
Hence it is sufficient to consider only the λ = 1 case. The function z=lamest(n,m) returns the
estimation errors for m trials of each estimator where each trial uses n iid exponential (1) samples.
Finally, z is a 2 × m matrix such that column i of z records the estimation errors for trial i. If
λ̂i and λ̃i are the estimates for trial i, then z(1,i) is the error Ẑi = λ̂i − 1 while z(2,i) is the
error Z̃i = λ̃i − 1.
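The listing of lamest is not reproduced here; a minimal sketch of such a function (generating exponential (1) samples by the inverse-CDF method and using the sample mean and sample variance, which is how we read the estimator definitions above) might look like:
function z=lamest_sketch(n,m)
%z(1,:): errors of lambdahat = 1/Mn(Y);  z(2,:): errors of lambdatilde = 1/sqrt(Vn(Y))
z=zeros(2,m);
for i=1:m,
  y=-log(rand(n,1));            %n iid exponential(1) samples
  z(1,i)=1/mean(y)-1;
  z(2,i)=1/sqrt(var(y))-1;
end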
Now that we can simulate the errors generated by each estimator, we need to determine which
estimator is better. We start by using the commands
z=lamest(1000,1000);
plot(z(1,:),z(2,:),’bd’)
to perform 1,000 trials, each using 1,000 samples. The plot command generates a scatter plot of
the error pairs (Ẑi , Z̃i ) for each trial. Here is an example of the resulting scatter plot:
[Scatter plot of the error pairs: z(1,i) on the horizontal axis and z(2,i) on the vertical axis.]
In the scatter plot, each diamond marks an independent pair (Ẑ, Z̃) where Ẑ is plotted on the x-axis
and Z̃ is plotted on the y-axis. (Although it is outside the scope of this solution, it is interesting
to note that the errors Ẑ and Z̃ appear to be positively correlated.) From the plot, it may not be
obvious that one estimator is better than the other. However, by reading the axis ticks carefully,
one can observe that it appears that typical values for Ẑ are in the interval (−0.05, 0.05) while
typical values for Z̃ are in the interval (−0.1, 0.1). This suggests that Ẑ may be superior. To verify
this observation, we calculate the sample mean of the squared errors for each estimator:
M_m(Ẑ^2) = (1/m) Σ_{i=1}^{m} Ẑi^2     M_m(Z̃^2) = (1/m) Σ_{i=1}^{m} Z̃i^2   (4)
>> sum(z.^2,2)/1000
ans =
0.0010
0.0021
That is, M1,000 (Ẑ 2 ) = 0.0010 and M1,000 (Z̃ 2 ) = 0.0021. In fact, one can show (with a lot of work)
for large m that
M_m(Ẑ^2) ≈ 1/m     M_m(Z̃^2) ≈ 2/m   (5)
and that
lim_{m→∞} M_m(Z̃^2)/M_m(Ẑ^2) = 2.   (6)
In short, the mean squared error of the λ̃ estimator is twice that of the λ̂ estimator.
(In the original printing, 0.05 was 0.01 but that requirement demanded that n be so large that
most installations of Matlab would grind to a halt on the calculations.)
The Matlab program uses a matrix algebra identity that may (or may not) be familiar. For a
matrix
X = [x(1) x(2) ··· x(n)],   (3)
with columns x(i), we can write
XX' = Σ_{i=1}^{n} x(i)x'(i).   (4)
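A two-line numerical check of this identity (the matrix below is arbitrary and only for illustration):
X=randn(3,5);                        %columns x(1),...,x(5)
S=zeros(3,3);
for i=1:5, S=S+X(:,i)*X(:,i)'; end   %sum of outer products
disp(norm(X*X'-S));                  %essentially zero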
n=[10 100 1000 10000];
p=diagtest(n,2000);
The output is
p=
1.0000 1.0000 1.0000 0.0035
We see that p(n) goes from roughly 1 to almost 0 in going from n = 1,000 to n = 10,000. To
investigate this transition more carefully, we execute the commands
nn=1000:500:10000;
p=diagtest(nn,2000);
The output is shown in the following graph. We use a semilog plot to emphasize differences when
p(n) is close to zero.
[Semilog plot of p(n) versus n for n = 1,000 to 10,000.]
Beyond n = 1,000, the probability p(n) declines rapidly. The “bumpiness” of the graph for large n
occurs because the probability p(n) is small enough that out of 2,000 trials, the 0.05 threshold is
exceeded only a few times.
Note that if x has dimension greater than 10, then the value of n needed to ensure that p(n) is
small would increase.
• p > 0, or
• p = 0 but the Chebyshev inequality isn't a sufficiently powerful technique to verify this fact.
To resolve whether p = 0 (and the sample mean converges to the expected value) one can spend
time trying to prove either p = 0 or p > 0. At this point, we try some simulation experiments to
see if the experimental evidence points one way or the other.
As requested by the problem, we implement a Matlab function samplemeantest(n,a) to
simulate one hundred traces of the sample mean when E[X] = a. Each trace is a length n sequence
M1 (X), M2 (X), . . . , Mn (X).
function mx=samplemeantest(n,a);
u=rand(n,100);
x=a-2+(1./sqrt(1-u));
d=((1:n)')*ones(1,100);
mx=cumsum(x)./d;
plot(mx);
xlabel('\it n'); ylabel('\it M_n(X)');
axis([0 n a-1 a+1]);
The n × 100 matrix x consists of iid samples of X. Taking cumulative sums along each column of x, and dividing row i by i, each column of mx is a length n sample mean trace. We then plot the traces.
The following graph was generated by samplemeantest(1000,5):
[Plot of the 100 sample mean traces Mn(X) versus n produced by samplemeantest(1000,5).]
Frankly, it is difficult to draw strong conclusions from the graph. If the sample sequences Mn (X)
are converging to E[X], the convergence is fairly slow. Even after averaging 1,000 samples, typical
values for the sample mean appear to range from a − 0.5 to a + 0.5. There may also be outlier
sequences which are still off the charts since we truncated the y-axis range. On the other hand, the
sample mean sequences do not appear to be diverging (which is also possible since Var[X] = ∞.)
Note the above graph was generated using 10^5 sample values. Repeating the experiment with more
samples, say samplemeantest(10000,5), will yield a similarly inconclusive result. Even if your
version of Matlab can support the generation of 100 times as many samples, you won’t know for
sure whether the sample mean sequence always converges. On the other hand, the experiment is
probably enough that if you pursue the analysis, you should start by trying to prove that p = 0.
(This will make a fun problem for the third edition!)
Problem Solutions – Chapter 8
(b) Now we wish to develop a test of the form
Thus we need to find the value of c that makes the above probability true. This value will
tell us that if we observe more than c heads, then with significance level α = 0.01, we should
reject the hypothesis that the coin is fair. To find this value of c we look to evaluate the CDF
F_K(k) = Σ_{i=0}^{k} (100 choose i) (1/2)^100.   (7)
From Table 3.1, we have (c − 50)/5 = 2.35 or c = 61.75. Once again, we see that we reject
the hypothesis if we observe 62 or more heads.
(a) The lifetime of a chip is X, an exponential (λ) random variable with λ = (T /200)2 . The
probability p that a chip passes the one-day test is
(c) Raising T raises the failure rate λ = (T /200)2 and thus lowers m = 3.67/λ. In essence, raising
the temperature makes a “tougher” test and thus requires fewer chips to be tested for the
same significance level.
Problem 8.1.4 Solution
(a) The rejection region is R = {T > t0 }. The duration of a voice call has exponential PDF
f_T(t) = { (1/3)e^{−t/3},  t ≥ 0;   0,  otherwise. }   (1)
The significance level of the test is
α = P[T > t0] = ∫_{t0}^{∞} f_T(t) dt = e^{−t0/3}.   (2)
α = P [X > 0] = 1 − P [X = 0] = 1 − (1 − q0 )n . (1)
Comment: For α = 0.01, keep in mind that there is a one percent probability that a normal
factory will fail the test. That is, a test failure is quite unlikely if the factory is operating normally.
Problem 8.2.1 Solution
For the MAP test, we must choose acceptance regions A0 and A1 for the two hypotheses H0 and
H1 . From Theorem 8.2, the MAP rule is
n ∈ A0 if P_{N|H0}(n)/P_{N|H1}(n) ≥ P[H1]/P[H0];   n ∈ A1 otherwise.   (1)
Since PN |Hi (n) = λni e−λi /n!, the MAP rule becomes
n ∈ A0 if (λ0/λ1)^n e^{−(λ0−λ1)} ≥ P[H1]/P[H0];   n ∈ A1 otherwise.   (2)
Taking logarithms and assuming λ1 > λ0 yields the final form of the MAP rule
n ∈ A0 if n ≤ n* = (λ1 − λ0 + ln(P[H0]/P[H1]))/ln(λ1/λ0);   n ∈ A1 otherwise.   (3)
From the MAP rule, we can get the ML rule by setting the a priori probabilities to be equal. This
yields the ML rule
n ∈ A0 if n ≤ n* = (λ1 − λ0)/ln(λ1/λ0);   n ∈ A1 otherwise.   (4)
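For concreteness, here is a short calculation of the two thresholds for illustrative values λ0 = 2, λ1 = 4 and P[H0] = 0.8 (these numbers are examples, not values from the problem):
lambda0=2; lambda1=4; PH0=0.8; PH1=1-PH0;
nML =(lambda1-lambda0)/log(lambda1/lambda0)               %ML threshold, eq. (4)
nMAP=(lambda1-lambda0+log(PH0/PH1))/log(lambda1/lambda0)  %MAP threshold, eq. (3)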
(d) The ML rule is the same as the MAP rule when P [H0 ] = P [H1 ]. When P [H0 ] > P [H1 ], the
MAP rule (which minimizes the probability of an error) should enlarge the A0 acceptance
region. Thus we would expect tMAP > tM L .
(f) For a given threshold t0 , we learned in parts (a) and (b) that
The Matlab program rocvoicedataout graphs both receiver operating curves. The program
and the resulting ROC are shown here.
t=0:0.05:30;
PFA= exp(-t/3);
PMISS6= 1-exp(-t/6);
PMISS10=1-exp(-t/10);
plot(PFA,PMISS6,PFA,PMISS10);
legend('\mu_D=6','\mu_D=10');
xlabel('\itP_{\rmFA}');
ylabel('\itP_{\rmMISS}');
[ROC plot: P_MISS versus P_FA for µ_D = 6 and µ_D = 10.]
As one might expect, larger µD resulted in reduced PMISS for the same PFA .
where γ is the largest possible value such that Σ_{L(n)<γ} P_{N|H0}(n) ≤ α.
Given H0 , N is Poisson (a0 = 1,000) while given H1 , N is Poisson (a1 = 1,300). We can solve
for the acceptance set A0 by observing that n ∈ A0 if
P_{N|H0}(n)/P_{N|H1}(n) = (a0^n e^{−a0}/n!)/(a1^n e^{−a1}/n!) ≥ γ.   (2)
Since ln(a0/a1) < 0, dividing through reverses the inequality and shows that
n ∈ A0 if n ≤ n* = ((a0 − a1) + ln γ)/ln(a0/a1) = ((a1 − a0) − ln γ)/ln(a1/a0);   n ∈ A1 otherwise   (4)
However, we still need to determine the constant γ. In fact, it is easier to work with the threshold
n∗ directly. Note that L(n) < γ if and only if n > n∗ . Thus we choose the smallest n∗ such that
P[N > n*|H0] = Σ_{n>n*} P_{N|H0}(n) ≤ α = 10^{−6}.   (5)
To find n*, a reasonable approach would be to use a Central Limit Theorem approximation since, given H0, N is a Poisson (1,000) random variable, which has the same PMF as the sum of 1,000 independent Poisson (1) random variables. Given H0, N has expected value a0 and variance a0.
From the CLT,
P[N > n*|H0] = P[ (N − a0)/√a0 > (n* − a0)/√a0 | H0 ] ≈ Q((n* − a0)/√a0) ≤ 10^{−6}.   (6)
From Table 3.2, Q(4.75) = 1.02 × 10−6 and Q(4.76) < 10−6 , implying
n* = a0 + 4.76√a0 = 1150.5.   (7)
On the other hand, perhaps the CLT should be used with some caution since α = 10−6 implies we
are using the CLT approximation far from the center of the distribution. In fact, we can check our answer using the poissoncdf function:
>> nstar=[1150 1151 1152 1153 1154 1155];
>> (1.0-poissoncdf(1000,nstar))’
ans =
1.0e-005 *
0.1644 0.1420 0.1225 0.1056 0.0910 0.0783
>>
Thus we see that n* = 1154. Using this threshold, the miss probability is
Keep in mind that this is the smallest possible PMISS subject to the constraint that PFA ≤ 10−6 .
(a) Given H0 , X is Gaussian (0, 1). Given H1 , X is Gaussian (4, 1). From Theorem 8.2, the
MAP hypothesis test is
x ∈ A0 if f_{X|H0}(x)/f_{X|H1}(x) = e^{−x^2/2}/e^{−(x−4)^2/2} ≥ P[H1]/P[H0];   x ∈ A1 otherwise.   (1)
Since a target is present with probability P [H1 ] = 0.01, the MAP rule simplifies to
x ∈ A0 if x ≤ x_MAP = 2 − (1/4) ln(P[H1]/P[H0]) = 3.15;   x ∈ A1 otherwise.   (2)
The false alarm and miss probabilities are
(b) The cost of a false alarm is C10 = 1 unit while the cost of a miss is C01 = 10^4 units. From Theorem 8.3, we see that the Minimum Cost test is the same as the MAP test except that P[H0] is replaced by C10 P[H0] and P[H1] is replaced by C01 P[H1]. Thus, we see from the MAP test that the minimum cost test is
x ∈ A0 if x ≤ x_MC = 2 − (1/4) ln(C01 P[H1]/(C10 P[H0])) = 0.846;   x ∈ A1 otherwise.   (7)
The false alarm and miss probabilities are
Because the cost of a miss is so high, the minimum cost test greatly reduces the miss proba-
bility, resulting in a much lower average cost than the MAP test.
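To make this comparison concrete, the error probabilities and average costs of the two tests can be evaluated directly from the thresholds, using the text's Matlab utilities phi (the Gaussian CDF) and qfunction:
PH0=0.99; PH1=0.01; C10=1; C01=1e4;
xMAP=3.15; xMC=0.846;
PFA  =[qfunction(xMAP) qfunction(xMC)];   %P[X > x | H0], H0: Gaussian(0,1)
PMISS=[phi(xMAP-4)     phi(xMC-4)];       %P[X <= x | H1], H1: Gaussian(4,1)
cost =C10*PH0*PFA + C01*PH1*PMISS         %average cost of the [MAP, min-cost] tests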
Problem 8.2.6 Solution
Given H0, Mn(T) has expected value E[V] = 3 and variance Var[V]/n = 9/n. Given H1, Mn(T) has expected value E[D] = 6 and variance Var[D]/n = 36/n.
(a) Using a Central Limit Theorem approximation, the false alarm probability is
P_FA = P[Mn(T) > t0|H0] = P[ (Mn(T) − 3)/√(9/n) > (t0 − 3)/√(9/n) ] = Q(√n[t0/3 − 1]).   (1)
We will see shortly that using a CLT approximation for the likelihood functions is something
of a detour. Nevertheless, with a CLT approximation, the likelihood functions are
f_{Mn(T)|H0}(t) = √(n/(18π)) e^{−n(t−3)^2/18}     f_{Mn(T)|H1}(t) = √(n/(72π)) e^{−n(t−6)^2/72}   (4)
The negative root of the quadratic is the result of the Gaussian assumption which allows for
a nonzero probability that Mn (T ) will be negative. In this case, hypothesis H1 which has
higher variance becomes more likely. However, since Mn (T ) ≥ 0, we can ignore this root since
it is just an artifact of the CLT approximation.
In fact, the CLT approximation gives an incorrect answer. Note that Mn (T ) = Yn /n where
Yn is a sum of iid exponential random variables. Under hypothesis H0 , Yn is an Erlang
(n, λ0 = 1/3) random variable. Under hypothesis H1 , Yn is an Erlang (n, λ1 = 1/6) random
variable. Since Mn (T ) = Yn /n is a scaled version of Yn , Theorem 3.20 tells us that given
hypothesis Hi , Mn (T ) is an Erlang (n, nλi ) random variable. Thus Mn (T ) has likelihood
functions
f_{Mn(T)|Hi}(t) = { (nλi)^n t^{n−1} e^{−nλi t}/(n−1)!,  t ≥ 0;   0,  otherwise }   (8)
Using the Erlang likelihood functions, the ML rule becomes
t ∈ A0 if f_{Mn(T)|H0}(t)/f_{Mn(T)|H1}(t) = (λ0/λ1)^n e^{−n(λ0−λ1)t} ≥ 1;   t ∈ A1 otherwise.   (9)
t ∈ A0 if t ≤ t_ML = ln(λ0/λ1)/(λ0 − λ1) = 6 ln 2 = 4.159;   t ∈ A1 otherwise.   (10)
Since 6 ln 2 = 4.159, this rule is not the same as the rule derived using a CLT approximation.
Using the exact Erlang PDF, the ML rule does not depend on n. Moreover, even if n → ∞,
the exact Erlang-derived rule and the CLT approximation rule remain different. In fact, the
CLT-based rule is simply an approximation to the correct rule. This highlights that we should
first check whether a CLT approximation is necessary before we use it.
(d) In this part, we will use the exact Erlang PDFs to find the MAP decision rule. From Theorem 8.2, the
MAP rule is
t ∈ A0 if f_{Mn(T)|H0}(t)/f_{Mn(T)|H1}(t) = (λ0/λ1)^n e^{−n(λ0−λ1)t} ≥ P[H1]/P[H0];   t ∈ A1 otherwise.   (11)
Since P [H0 ] = 0.8 and P [H1 ] = 0.2, the MAP rule simplifies to
t ∈ A0 if t ≤ t_MAP = (ln(λ0/λ1) − (1/n) ln(P[H1]/P[H0]))/(λ0 − λ1) = 6 ln 2 + (6 ln 4)/n;   t ∈ A1 otherwise.   (12)
(e) Although we have seen it is incorrect to use a CLT approximation to derive the decision
rule, the CLT approximation used in parts (a) and (b) remains a good way to estimate the
false alarm and miss probabilities. However, given Hi , Mn (T ) is an Erlang (n, nλi ) random
variable. In particular, given H0 , Mn (T ) is an Erlang (n, n/3) random variable while given
H1 , Mn (T ) is an Erlang (n, n/6). Thus we can also use erlangcdf for an exact calculation
of the false alarm and miss probabilities. To summarize the results of parts (a) and (b), a
threshold t0 implies that
P_FA = P[Mn(T) > t0|H0] = 1-erlangcdf(n,n/3,t0) ≈ Q(√n[t0/3 − 1]),   (13)
P_MISS = P[Mn(T) ≤ t0|H1] = erlangcdf(n,n/6,t0) ≈ Φ(√n[t0/6 − 1]).   (14)
%voicedatroc.m
t0=1:0.1:8’;
n=9;
PFA9=1.0-erlangcdf(n,n/3,t0);
PFA9clt=1-phi(sqrt(n)*((t0/3)-1));
PM9=erlangcdf(n,n/6,t0);
PM9clt=phi(sqrt(n)*((t0/6)-1));
n=16;
PFA16=1.0-erlangcdf(n,n/3,t0);
PFA16clt=1.0-phi(sqrt(n)*((t0/3)-1));
PM16=erlangcdf(n,n/6,t0);
PM16clt=phi(sqrt(n)*((t0/6)-1));
plot(PFA9,PM9,PFA9clt,PM9clt,PFA16,PM16,PFA16clt,PM16clt);
axis([0 0.8 0 0.8]);
legend(’Erlang n=9’,’CLT n=9’,’Erlang n=16’,’CLT n=16’);
[ROC curves P_MISS versus P_FA: exact Erlang and CLT approximations for n = 9 and n = 16.]
Both the true curves and the CLT-based approximations are shown. The graph makes it clear that the CLT approximations are somewhat inaccurate. It is also apparent that the ROC for n = 16 is clearly better than for n = 9.
This rule simplifies to
k ∈ A0 if k ln[(p0/(1−p0))/(p1/(1−p1))] ≥ n ln[(1−p1)/(1−p0)];   k ∈ A1 otherwise.   (5)
To proceed further, we need to know if p0 < p1 or if p0 ≥ p1 . For t0 = 4.5,
p0 = e−1.5 = 0.2231 < e−0.75 = 0.4724 = p1 . (6)
In this case, the ML rule becomes
k ∈ A0 if k ≤ k_ML = n · ln[(1−p0)/(1−p1)] / ln[(p1/(1−p1))/(p0/(1−p0))] = (0.340)n;   k ∈ A1 otherwise.   (7)
function [PFA,PMISS]=binvoicedataroc(n);
t0=[3; 4.5];
p0=exp(-t0/3); p1=exp(-t0/6);
k0=(0:n)’;
PFA=zeros(n+1,2);
for j=1:2,
PFA(:,j) = 1.0-binomialcdf(n,p0(j),k0);
PM(:,j)=binomialcdf(n,p1(j),k0);
end
plot(PFA(:,1),PM(:,1),’-o’,PFA(:,2),PM(:,2),’-x’);
legend(’t_0=3’,’t_0=4.5’);
axis([0 0.8 0 0.8]);
xlabel(’\itP_{\rmFA}’);
ylabel(’\itP_{\rmMISS}’);
and here is the resulting ROC:
[ROC: P_MISS versus P_FA for thresholds t0 = 3 and t0 = 4.5.]
As we see, the test works better with threshold t0 = 4.5 than with t0 = 3.
The probability of a decoding error is minimized by the MAP rule. Since P [H0 ] = P [H1 ] = 1/2,
the MAP rule is
y ∈ A0 if f_{Y|H0}(y)/f_{Y|H1}(y) = e^{−y}/(ye^{−y}) ≥ P[H1]/P[H0] = 1;   y ∈ A1 otherwise.   (2)
y ∈ A0 if y ≤ 1;   y ∈ A1 otherwise.   (3)
When we observe K = k ∈ {0, 1, . . . , n}, plugging in the conditional PMF’s yields the rule
k ∈ A0 if (n choose k) q0^k (1−q0)^{n−k} > (n choose k) q1^k (1−q1)^{n−k};   k ∈ A1 otherwise.   (3)
Cancelling common factors, taking the logarithm of both sides, and rearranging yields
Note that q1 > q0 implies q1 /(1 − q1 ) > q0 /(1 − q0 ). Thus, we can rewrite our ML rule as
k ∈ A0 if k < k* = n · ln[(1−q0)/(1−q1)] / (ln[q1/q0] + ln[(1−q0)/(1−q1)]);   k ∈ A1 otherwise.   (6)
(b) Let k ∗ denote the threshold given in part (a). Using n = 500, q0 = 10−4 , and q1 = 10−2 , we
have
k* = 500 · ln[(1−10^{−4})/(1−10^{−2})] / (ln[10^{−2}/10^{−4}] + ln[(1−10^{−4})/(1−10^{−2})]) ≈ 1.078   (7)
Thus the ML rule is that if we observe K ≤ 1, then we choose hypothesis H0 ; otherwise, we
choose H1 . The false alarm probability is
(c) In the test of Example 8.8, the geometric random variable N , the number of tests needed
to find the first failure, was used. In this problem, the binomial random variable K, the
number of failures in 500 tests, was used. We will call these two procedures the geometric
and the binomial tests. Also, we will use P_FA^{(N)} and P_MISS^{(N)} to denote the false alarm and miss probabilities using the geometric test. We also use P_FA^{(K)} and P_MISS^{(K)} for the error probabilities of the binomial test. From Example 8.8, we have the following comparison:
When making comparisons between tests, we want to judge both the reliability of the test
as well as the cost of the testing procedure. With respect to the reliability, we see that the
conditional error probabilities appear to be comparable in that
P_FA^{(N)}/P_FA^{(K)} = 3.75   but   P_MISS^{(K)}/P_MISS^{(N)} = 4.57.   (17)
Roughly, the false alarm probability of the geometric test is about four times higher than
that of the binomial test. However, the miss probability of the binomial test is about four
times that of the geometric test. As for the cost of the test, it is reasonable to assume the
cost is proportional to the number of disk drives that are tested. For the geometric test of
Example 8.8, we test either until the first failure or until 46 drives pass the test. For the
binomial test, we test until either 2 drives fail or until 500 drives pass the test! You can,
if you wish, calculate the expected number of drives tested under each test method for each
hypothesis. However, it isn’t necessary in order to see that a lot more drives will be tested
using the binomial test. If we knew the a priori probabilities P [Hi ] and also the relative costs
of the two types of errors, then we could determine which test procedure was better. However,
without that information, it would not be unreasonable to conclude that the geometric test
offers performance roughly comparable to that of the binomial test but with a significant
reduction in the expected number of drives tested.
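For example, a rough Monte Carlo sketch of the average number of drives tested under each stopping rule (q is the drive failure probability under the hypothesis of interest; the 46 and 500 drive limits are those described above) is:
function [EN,EK]=drivecount_sketch(q,m)
%EN: avg drives tested by the geometric test (stop at first failure or 46 drives)
%EK: avg drives tested by the binomial test (stop at second failure or 500 drives)
N=zeros(1,m); K=zeros(1,m);
for i=1:m,
  f=(rand(1,500)<q);                       %simulated drive failures
  n1=find(f,1);            if isempty(n1), n1=inf; end
  n2=find(cumsum(f)==2,1); if isempty(n2), n2=inf; end
  N(i)=min(n1,46); K(i)=min(n2,500);
end
EN=mean(N); EK=mean(K);
For instance, [EN,EK]=drivecount_sketch(1e-2,10000) compares the two procedures when the failure probability is q1 = 10^{−2}.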
The decision rule that minimizes E[C′] is the same as the minimum cost test in Theorem 8.3 with the costs C01 and C10 replaced by the differential costs C01 − C11 and C10 − C00.
Since N is Gaussian with zero mean and variance σ_N^2, the conditional PDF of X given Hi is
f_{X|Hi}(x) = (1/√(2πσ_N^2)) e^{−(x−a(i−1))^2/(2σ_N^2)}.   (2)
x ∈ A0 if x ≤ −a/2. (5)
A0 = {x|x ≤ −a/2} A1 = {x| − a/2 < x ≤ a/2} A2 = {x|x > a/2} (8)
Graphically, the signal space is one dimensional and the acceptance regions are
A0 A1 A2
s0 s1 s2 X
-a 0 a
Just as in the QPSK system of Example 8.13, the additive Gaussian noise dictates that the ac-
ceptance region Ai is the set of observations x that are closer to si = (i − 1)a than any other
sj .
As in Example 8.13, we will assume N1 and N2 are iid zero mean Gaussian random variables with
variance σ 2 . Thus, given hypothesis Hijk , X1 and X2 are independent and the conditional joint
PDF of X1 and X2 is
f_{X1,X2|Hijk}(x1, x2) = f_{X1|Hijk}(x1) f_{X2|Hijk}(x2)   (2)
= (1/(2πσ^2)) e^{−(x1 − s_ijk^(1))^2/(2σ^2)} e^{−(x2 − s_ijk^(2))^2/(2σ^2)}   (3)
= (1/(2πσ^2)) e^{−[(x1 − s_ijk^(1))^2 + (x2 − s_ijk^(2))^2]/(2σ^2)}   (4)
In terms of the distance ‖x − s_ijk‖ between the vectors
x = [x1; x2],   s_ijk = [s_ijk^(1); s_ijk^(2)],   (5)
we can write
f_{X1,X2|Hijk}(x1, x2) = (1/(2πσ^2)) e^{−‖x − s_ijk‖^2/(2σ^2)}   (6)
Since all eight symbols s000 , . . . , s111 are equally likely, the MAP and ML rules are
x ∈ A_ijk if f_{X1,X2|Hijk}(x1, x2) ≥ f_{X1,X2|Hi'j'k'}(x1, x2) for all other H_{i'j'k'}.   (7)
This means that Aijk is the set of all vectors x that are closer to sijk than any other signal.
Graphically, to find the boundary between points closer to s_ijk than to s_{i'j'k'}, we draw the line segment connecting s_ijk and s_{i'j'k'}. The boundary is then the perpendicular bisector. The resulting
boundaries are shown in this figure:
X2
A110 A100
s110 s100
A010 A000
s010 s000
X1
A011 A001
s011 s001
A111 A101
s111 s101
A0 = {x|x ≤ −a/2} A1 = {x| − a/2 < x ≤ a/2} A2 = {x|x > a/2} (1)
To calculate the probability of decoding error, we first calculate the conditional error probabilities
Since the three hypotheses H0 , H1 , and H2 each have probability 1/3, the probability of error is
P[D_E] = Σ_{i=0}^{2} P[X ∉ Ai|Hi] P[Hi] = (4/3) Q(a/(2σ_N))   (6)
f_{X1,X2|Hi}(x1, x2) = f_{X1|Hi}(x1) f_{X2|Hi}(x2)   (1)
= (1/(2πσ^2)) e^{−(x1−si1)^2/(2σ^2)} e^{−(x2−si2)^2/(2σ^2)}   (2)
= (1/(2πσ^2)) e^{−[(x1−si1)^2 + (x2−si2)^2]/(2σ^2)}   (3)
From Definition 8.2 the acceptance regions Ai for the ML multiple hypothesis test must satisfy
(x1 , x2 ) ∈ Ai if fX1 ,X2 |Hi (x1 , x2 ) ≥ fX1 ,X2 |Hj (x1 , x2 ) for all j. (4)
(x1 , x2 ) ∈ Ai if (x1 − si1 )2 + (x2 − si2 )2 ≤ (x1 − sj1 )2 + (x2 − sj2 )2 for all j (5)
In terms of the vectors x and si , the acceptance regions are defined by the rule
x ∈ Ai if ‖x − si‖^2 ≤ ‖x − sj‖^2   (6)
Just as in the case of QPSK, the acceptance region Ai is the set of vectors x that are closest to si .
Assuming all four hypotheses are equally likely, the probability of an error decoding the third bit
is
P[B3] = (P[B3|H011] + P[B3|H001] + P[B3|H101] + P[B3|H111])/4   (3)
= (Q(1/σ_N) + Q(2/σ_N))/2   (4)
Problem 8.3.6 Solution
(a) Hypothesis Hi is that X = si +N, where N is a Gaussian random vector independent of which
signal was transmitted. Thus, given Hi , X is a Gaussian (si , σ 2 I) random vector. Since X is
two-dimensional,
f_{X|Hi}(x) = (1/(2πσ^2)) e^{−(1/2)(x−si)'(σ^2 I)^{−1}(x−si)} = (1/(2πσ^2)) e^{−‖x−si‖^2/(2σ^2)}.   (1)
Since the hypotheses Hi are equally likely, the MAP and ML rules are the same and achieve
the minimum probability of error. In this case, from the vector version of Theorem 8.8, the
MAP rule is
[Figure: the M-PSK signal constellation s0, s1, ..., s_{M−1} on a circle, with the acceptance regions A0, ..., A_{M−1} drawn as pie slices.]
Using the conditional PDFs f_{X|Hi}(x), the MAP rule becomes
x ∈ Am if ‖x − sm‖^2 ≤ ‖x − sj‖^2 for all j.   (3)
In terms of geometry, the interpretation is that all vectors x closer to sm than to any other signal sj are assigned to Am. In this problem, the signal constellation (i.e., the set of vectors si) is the set of vectors on the circle of radius √E. The acceptance regions are the “pie slices” around each signal vector.
(c) By symmetry, PERR is the same as the conditional probability of error 1−P [Ai |Hi ], no matter
which si is transmitted. Let B denote a circle of radius d at the origin and let Bi denote the
circle of radius d around si . Since B0 ⊂ A0 ,
By changing to polar coordinates,
P[N ∈ B] = (1/(2πσ^2)) ∫_0^d ∫_0^{2π} e^{−r^2/(2σ^2)} r dθ dr   (6)
= (1/σ^2) ∫_0^d r e^{−r^2/(2σ^2)} dr   (7)
= −e^{−r^2/(2σ^2)} |_0^d = 1 − e^{−d^2/(2σ^2)} = 1 − e^{−E sin^2(π/M)/(2σ^2)}   (8)
Thus
P_ERR = 1 − P[A0|H0] ≤ 1 − P[N ∈ B] = e^{−E sin^2(π/M)/(2σ^2)}.   (9)
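For example, the bound is easy to evaluate numerically as a function of the SNR γ = E/σ^2:
M=8; gamma=10.^((0:30)/10);            %SNR values, as in mpsktest.m below
bound=exp(-gamma*sin(pi/M)^2/2);       %upper bound on PERR from (9)
loglog(gamma,bound);
This can then be compared against the simulated error rates produced later by mpsksim.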
(a) In Problem 8.3.4, we found that in terms of the vectors x and si , the acceptance regions are
defined by the rule
Just as in the case of QPSK, the acceptance region Ai is the set of vectors x that are closest
to si . Graphically, these regions are easily found from the sketch of the signal constellation:
X2
s7 s3
s5 s2
s6 s4 s1 s0
X1
s9 s8 s12 s14
s10 s13
s11 s15
Given H1 , a correct decision is made if (X1 , X2 ) ∈ A1 . Given H1 , X1 = 1 + N1 and X2 =
1 + N2 . Thus,
(c) Surrounding each signal si is an acceptance region Ai that is no smaller than the acceptance
region A1 . That is,
This implies
P[C] = Σ_{i=0}^{15} P[C|Hi] P[Hi]   (11)
≥ Σ_{i=0}^{15} P[C|H1] P[Hi] = P[C|H1] Σ_{i=0}^{15} P[Hi] = P[C|H1]   (12)
(x1 , x2 ) ∈ Ai if pi fX1 ,X2 |Hi (x1 , x2 ) ≥ pj fX1 ,X2 |Hj (x1 , x2 ) for all j (1)
Expanding the squares and using the identity cos2 θ + sin2 θ = 1 yields the simplified rule
• (x1 , x2 ) ∈ Ai if for all j,
x1[cos θi − cos θj] + x2[sin θi − sin θj] ≥ (σ^2/√E) ln(pj/pi)   (4)
Note that the MAP rules define linear constraints in x1 and x2 . Since θi = π/4 + iπ/2, we use the
following table to enumerate the constraints:
         cos θi     sin θi
i = 0    1/√2       1/√2
i = 1   −1/√2       1/√2         (5)
i = 2   −1/√2      −1/√2
i = 3    1/√2      −1/√2
To be explicit, to determine whether (x1, x2) ∈ Ai, we need to check the MAP rule for each j ≠ i.
Thus, each Ai is defined by three constraints. Using the above table, the acceptance regions are
• (x1, x2) ∈ A0 if
x1 ≥ (σ^2/√(2E)) ln(p1/p0),   x2 ≥ (σ^2/√(2E)) ln(p3/p0),   x1 + x2 ≥ (σ^2/√(2E)) ln(p2/p0)   (6)
• (x1, x2) ∈ A1 if
x1 ≤ (σ^2/√(2E)) ln(p1/p0),   x2 ≥ (σ^2/√(2E)) ln(p2/p1),   −x1 + x2 ≥ (σ^2/√(2E)) ln(p3/p1)   (7)
• (x1, x2) ∈ A2 if
x1 ≤ (σ^2/√(2E)) ln(p2/p3),   x2 ≤ (σ^2/√(2E)) ln(p2/p1),   x1 + x2 ≤ (σ^2/√(2E)) ln(p2/p0)   (8)
• (x1, x2) ∈ A3 if
x1 ≥ (σ^2/√(2E)) ln(p2/p3),   x2 ≤ (σ^2/√(2E)) ln(p3/p0),   −x1 + x2 ≤ (σ^2/√(2E)) ln(p3/p1)   (9)
X2
A1 s1 s0 A0
X1
A2 s2 s3 A3
Note that the boundary between A1 and A3 defined by −x1 + x2 ≥ 0 plays no role because of the
high value of p0 .
The MAP rule is
y ∈ Am if e^{−‖y − SP^{1/2}xm‖^2/(2σ^2)} ≥ e^{−‖y − SP^{1/2}xj‖^2/(2σ^2)} for all j,   (5)
or equivalently,
y ∈ Am if ‖y − SP^{1/2}xm‖ ≤ ‖y − SP^{1/2}xj‖ for all j.   (6)
That is, we choose the vector x* = xm that minimizes the distance ‖y − SP^{1/2}xj‖ among all vectors xj ∈ Bk. Since this vector x* is a function of the observation y, this is described by the math notation
x*(y) = arg min_{x∈Bk} ‖y − SP^{1/2}x‖,   (7)
where arg min_x g(x) returns the argument x that minimizes g(x).
(c) To implement this detector, we must evaluate ‖y − SP^{1/2}x‖ for each x ∈ Bk. Since there are 2^k vectors in Bk, we have to evaluate 2^k hypotheses. Because the number of hypotheses grows exponentially with the number of users k, the maximum likelihood detector is considered to be computationally intractable for a large number of users k.
This rule simplifies to
It is useful to show these acceptance sets graphically. In this plot, the area around yi is the
acceptance set Ai and the dashed lines are the boundaries between the acceptance sets.
[Figure: acceptance sets A1, A2, A3, A4 around the points y1, y2, y3, y4 in the (Y1, Y2) plane.]
y1 = [1 + cos θ; sin θ]     y3 = [−1 + cos θ; sin θ]   (7)
y2 = [1 − cos θ; −sin θ]    y4 = [−1 − cos θ; −sin θ]   (8)
(Even though the components of Y are conditionally independent given Hi, the four integrals ∫_{Ai} f_{Y|Hi}(y) dy cannot be represented in a simple form. Moreover, they cannot even be represented by the Φ(·) function.) Note that the probability of a correct decision is the probability that the bits X1 and X2 transmitted by both users are detected correctly.
The probability of a bit error is still somewhat more complex. For example if X1 = 1, then
hypotheses H1 and H3 are equally likely. The detector guesses X̂1 = 1 if Y ∈ A1 ∪ A3 . Given
X1 = 1, the conditional probability of a correct decision on this bit is
P[X̂1 = 1|X1 = 1] = (1/2) P[Y ∈ A1 ∪ A3|H1] + (1/2) P[Y ∈ A1 ∪ A3|H3]   (10)
= (1/2) ∫_{A1∪A3} f_{Y|H1}(y) dy + (1/2) ∫_{A1∪A3} f_{Y|H3}(y) dy   (11)
By comparison, the decorrelator does something simpler. Since S is a square invertible matrix,
(S'S)^{−1}S' = S^{−1}(S')^{−1}S' = S^{−1} = (1/sin θ) [sin θ  −cos θ; 0  1]   (12)
[Figure: the same acceptance sets A1, ..., A4 around y1, ..., y4, now with the decorrelator decision boundaries.]
X̂1 = sgn(Ỹ1) = sgn(Y1 − (cos θ/sin θ) Y2)   (14)
X̂2 = sgn(Ỹ2) = sgn(Y2/sin θ) = sgn(Y2).   (15)
Because we chose a coordinate system such that S1 lies along the x-axis, the effect of the decorrela-
tor on the rule for bit X2 is particularly easy to understand. For bit X2 , we just check whether the
vector Y is in the upper half plane. Generally, the boundaries of the decorrelator decision regions are determined by straight lines, so they are easy to implement and the probability of error is easy to calculate. However, these regions are suboptimal in terms of probability of error.
P[X > x0|Hi] = Σ_{x=x0+1}^{20} ((1 − pi)/(1 − pi^20)) pi^{x−1} = ((1 − pi)/(1 − pi^20)) (pi^{x0} + ··· + pi^{19})   (2)
= (pi^{x0}(1 − pi)/(1 − pi^20)) (1 + pi + ··· + pi^{19−x0})   (3)
= pi^{x0}(1 − pi^{20−x0})/(1 − pi^20) = (pi^{x0} − pi^20)/(1 − pi^20)   (4)
We note that the above formula is also correct for x0 = 20. Using this formula, the false alarm and
miss probabilities are
P_FA = P[X > x0|H0] = (p0^{x0} − p0^{20})/(1 − p0^{20}),   (5)
P_MISS = 1 − P[X > x0|H1] = (1 − p1^{x0})/(1 − p1^{20})   (6)
The Matlab program rocdisc(p0,p1) returns the false alarm and miss probabilities and also
plots the ROC. Here is the program and the output for rocdisc(0.9,0.99):
function [PFA,PMISS]=rocdisc(p0,p1);
x=0:20;
PFA= (p0.^x-p0^(20))/(1-p0^(20));     %eq. (5)
PMISS= (1-p1.^x)/(1-p1^(20));         %eq. (6)
plot(PFA,PMISS);
xlabel('\itP_{\rmFA}'); ylabel('\itP_{\rmMISS}');
[ROC plot produced by rocdisc(0.9,0.99): P_MISS versus P_FA.]
From the receiver operating curve, we learn that we have a fairly lousy sensor. No matter how
we set the threshold x0 , either the false alarm probability or the miss probability (or both!) exceed
0.5.
It is straightforward to use Matlab to plot PERR as a function of p. The function bperr calculates
PERR for a vector p and a scalar signal to noise ratio snr corresponding to v/σ. A second program
bperrplot(snr) plots PERR as a function of p. Here are the programs
function perr=bperr(p,snr);
%Problem 8.4.2 Solution
r=log(p./(1-p))/(2*snr);
perr=(p.*(qfunction(r+snr)))+((1-p).*phi(r-snr));

function pe=bperrplot(snr);
p=0.02:0.02:0.98;
pe=bperr(p,snr);
plot(p,pe);
xlabel('\it p');
ylabel('\it P_{ERR}');
Here are three outputs of bperrplot for the requested SNR values.
[Three plots of P_ERR versus p, one for each of the requested SNR values.]
In all three cases, we see that P_ERR is maximum at p = 1/2. When p ≠ 1/2, the optimal (minimum probability of error) decision rule is able to exploit the one hypothesis having higher a priori probability than the other.
This gives the wrong impression that one should consider building a communication system with p ≠ 1/2. To see why, consider the most extreme case in which the error probability goes to zero as p → 0 or p → 1. However, in these extreme cases, no information is being communicated. When p = 0 or p = 1, the detector can simply guess the transmitted bit. In fact, there is no need to transmit a bit at all, but then it becomes impossible to transmit any information.
Finally, we note that v/σ is an SNR voltage ratio. For communication systems, it is common to
measure SNR as a power ratio. In particular, v/σ = 10 corresponds to an SNR of 10 log10(v^2/σ^2) = 20 dB.
Thus among {0.4, 0.5, · · · , 1.0}, it appears that T = 0.8 is best. Now we test values of T in the
neighborhood of 0.8:
>> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
>>[Pmin,Imin]=min(Pe);T(Imin)
ans =
0.78000000000000
This suggests that T = 0.78 is best among these values. However, inspection of the vector Pe shows
that all values are quite close. If we repeat this experiment a few times, we obtain:
>> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
>> [Pmin,Imin]=min(Pe);T(Imin)
ans =
0.78000000000000
>> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
>> [Pmin,Imin]=min(Pe);T(Imin)
ans =
0.80000000000000
>> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
>> [Pmin,Imin]=min(Pe);T(Imin)
ans =
0.76000000000000
>> T=[0.70:0.02:0.9];Pe=sqdistor(1.5,0.5,100000,T);
>> [Pmin,Imin]=min(Pe);T(Imin)
ans =
0.78000000000000
This suggests that the best value of T is in the neighborhood of 0.78. If someone were paying you
to find the best T , you would probably want to do more testing. The only useful lesson here is that
when you try to optimize parameters using simulation results, you should repeat your experiments
to get a sense of the variance of your results.
When we plot the functions f (x) = (x − 4)2 and g(x) = γ0 + 16 ln x, we see that there exist x1 and
x2 such that f (x1 ) = g(x1 ) and f (x2 ) = g(x2 ). In terms of x1 and x2 ,
A0 = [0, x1 ] ∪ [x2 , ∞), A1 = (x1 , x2 ). (5)
Using a Taylor series expansion of ln x around x = x0 = 4, we can show that
g(x) = γ0 + 16 ln x ≤ h(x) = γ0 + 16(ln 4 − 1) + 4x. (6)
Since h(x) is linear, we can use the quadratic formula to solve f(x) = h(x), yielding a solution x̄2 = 6 + √(4 + 16 ln 4 + γ0). One can show that x2 ≤ x̄2. In the example shown below, corresponding to γ = 1, x1 = 1.95, x2 = 9.5 and x̄2 = 6 + √20 = 10.47.
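The crossing points can also be found numerically, for example with fzero, using the value of γ0 implied by x̄2 = 6 + √20 in this example (namely γ0 = 16(1 − ln 4); this value is inferred, not stated in the text) and starting guesses read off the plot:
gamma0=16*(1-log(4));
fg=@(x) (x-4).^2-(gamma0+16*log(x));   %f(x)-g(x)
x1=fzero(fg,2), x2=fzero(fg,9)         %roots near 1.95 and 9.5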
[Plot of f(x), g(x), and h(x) for 0 ≤ x ≤ 12.]
[ROC: P_MISS versus P_FA.]
After all of this work, we see that the sensor is not particularly good in the sense that no matter how we choose the thresholds, we cannot reduce both the miss and false alarm probabilities below 30 percent.
However, we are only asked to find the probability of symbol error, but not the probability that
symbol i is decoded as symbol j at the receiver. Because of the symmetry of the signal constellation
and the acceptance regions, the probability of symbol error is the same no matter what symbol is
transmitted.
Thus it is simpler to assume that s0 is transmitted every time and check that the noise vector N is in the pie slice around s0. In fact, by translating s0 to the origin, we obtain the “pie slice” geometry shown in the figure. The lines marking the boundaries of the pie slice have slopes ± tan θ/2.
[Figure: the pie-slice region in the (N1, N2) plane, with vertex at (−E^{1/2}, 0) after translating s0 to the origin.]
Finally, since each Ni has variance σ 2 , we define the Gaussian (0, I) random vector Z = N/σ and
write our constraints as
[−tan(θ/2)  1; −tan(θ/2)  −1] [Z1; Z2] ≤ √γ tan(θ/2) [1; 1],   (3)
where γ = E/σ 2 is the signal to noise ratio of the system.
The Matlab “simulation” simply generates many pairs (Z1, Z2) and checks what fraction meets these constraints. The function mpsksim(M,snr,n) simulates the M-PSK system with SNR
snr for n bit transmissions. The script mpsktest graphs the symbol error probability for M =
8, 16, 32.
function Pe=mpsksim(M,snr,n);
%Problem 8.4.5 Solution:
%Pe=mpsksim(M,snr,n)
%n bit M-PSK simulation
t=tan(pi/M);
A =[-t 1; -t -1];
Z=randn(2,n);
PC=zeros(length(snr));
for k=1:length(snr),
B=(A*Z)<=t*sqrt(snr(k));
PC(k)=sum(min(B))/n;
end
Pe=1-PC;

%mpsktest.m;
snr=10.^((0:30)/10);
n=500000;
Pe8=mpsksim(8,snr,n);
Pe16=mpsksim(16,snr,n);
Pe32=mpsksim(32,snr,n);
loglog(snr,Pe8,snr,Pe16,snr,Pe32);
legend('M=8','M=16','M=32',3);
In mpsksim, each column of the matrix Z corresponds to a pair of noise variables (Z1, Z2). The code B=(A*Z)<=t*sqrt(snr(k)) checks whether each pair of noise variables is in the pie slice region. That is, B(1,j) and B(2,j) indicate if the jth pair meets the first and second constraints.
Since min(B) operates on each column of B, min(B) is a row vector indicating which pairs of noise
variables passed the test.
Here is the output of mpsktest:
[Log-log plot of the symbol error probability versus SNR for M = 8, 16, and 32.]
The curves for M = 8 and M = 16 end prematurely because for high SNR, the error rate is so low
that no errors are generated in 500,000 symbols. In this case, the measured Pe is zero and since
log 0 = −∞, the loglog function simply ignores the zero values.
The transmitted data vector x belongs to the set Bk of all binary ±1 vectors of length k. In
principle, we can enumerate the vectors in Bk as x0 , x1 , . . . , x2k −1 . Moreover, each possible data
vector xm represents a hypothesis. Since there are 2k possible data vectors, there are 2k acceptance
sets Am . The set Am is the set of all vectors y such that the decision rule is to guess X̂ = xm ,
Our normal procedure is to write a decision rule as “y ∈ Am if . . . ”; however, this problem has so many hypotheses that it is more straightforward to refer to a hypothesis X = xm by the function x̂(y) which returns the vector xm when y ∈ Am. In short, x̂(y) is our best guess as
to which vector x was transmitted when y is received.
Because each hypothesis has a priori probability 2^{−k}, the probability of error is minimized by
the maximum likelihood (ML) rule
Keep in mind that arg maxx g(x) returns the argument x that maximizes g(x). In any case, the
form of fY|X (y|x) implies that the ML rule should minimize the negative exponent of fY|X (y|x).
That is, the ML rule is
x̂(y) = arg min_{x∈Bk} ‖y − SP^{1/2}x‖   (3)
Since the term y'y is the same for every x, we can define the function
h(x) = −2y'SP^{1/2}x + (SP^{1/2}x)'(SP^{1/2}x).
In this case, the ML rule can be expressed as x̂(y) = arg min_{x∈Bk} h(x). We use Matlab to evaluate h(x) for each x ∈ Bk. Since for k = 10, Bk has 2^10 = 1024 vectors, it is desirable to make the calculation as easy as possible. To this end, we define w = SP^{1/2}x and we write, with some abuse of notation, h(·) as a function of w: h(w) = −2y'w + w'w.
Still, given y, we need to evaluate h(w) for each vector w. In Matlab, this will be convenient
because we can form the matrices X and W with columns consisting of all possible vectors x and
w. In Matlab, it is easy to calculate w w by operating on the matrix W without looping through
all columns w.
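For example, with y and W as defined above, all 2^k values of h(w) = −2y'w + w'w can be computed without a loop (this is exactly the approach taken inside cdmasim below):
WW=sum(W.*W);           %row vector of w'w for every column w of W
hW=-2*y'*W+WW;          %h(w) for every candidate w at once
[hmin,imin]=min(hW);    %imin indexes the minimizing column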
In terms of Matlab, we start by defining X=allbinaryseqs(n), which returns an n × 2^n matrix X, corresponding to X, such that the columns of X enumerate all possible binary ±1 sequences of length n. How allbinaryseqs works will be clear by generating the matrices A and P and reading the help for bitget.
function X=allbinaryseqs(n)
%See Problem 8.4.6
%X: n by 2^n matrix of all
%length n binary vectors
%Thanks to Jasvinder Singh
A=repmat([0:2^n-1],[n,1]);
P=repmat([1:n]',[1,2^n]);
X = bitget(A,P);
X=(2*X)-1;
Next, for a set of signal vectors (spreading sequences in CDMA parlance) given by the n × k
matrix S, err=cdmasim(S,P,m) simulates the transmission of a frame of m symbols through a k
user CDMA system with additive Gaussian noise. A “symbol” is just a vector x corresponding to
the k transmitted bits of the k users.
In addition, the function Pe=rcdma(n,k,snr,s,m) runs cdmasim for the pairs of values of users
k and SNR snr. Here is the pair of functions:
function err=cdmasim(S,P,m);
%err=cdmasim(P,S,m);
%S= n x k matrix of signals
%P= diag matrix of SNRs (power
% normalized by noise variance)
%See Problem 8.4.6
k=size(S,2); %number of users
n=size(S,1); %processing gain
X=allbinaryseqs(k);%all data
Phalf=sqrt(P);
W=S*Phalf*X;
WW=sum(W.*W);
err=0;
for j=1:m,
s=duniformrv(1,2^k,1);
y=S*Phalf*X(:,s)+randn(n,1);
[hmin,imin]=min(-2*y'*W+WW);
err=err+sum(X(:,s)~=X(:,imin));
end

function Pe=rcdma(n,k,snr,s,m);
%Pe=rcdma(n,k,snr,s,m);
%R-CDMA simulation:
% proc gain=n, users=k
% rand signal set/frame
% s frames, m symbols/frame
%See Problem 8.4.6 Solution
[K,SNR]=ndgrid(k,snr);
Pe=zeros(size(SNR));
for j=1:prod(size(SNR)),
p=SNR(j);k=K(j);
e=0;
for i=1:s,
S=randomsignals(n,k);
e=e+cdmasim(S,p*eye(k),m);
end
Pe(j)=e/(s*m*k);
% disp([p k e Pe(j)]);
end
In cdmasim, the kth diagonal element of P is the “power” pk of user k. Technically, we assume that the additive Gaussian noise variables have variance 1, and thus pk is actually the signal to noise ratio of user k. In addition, WW is a length 2^k row vector, with elements w'w for each possible
w. For each of the m random data symbols, represented by x (or X(:,s) in Matlab), cdmasim
calculates a received signal y (y). Finally, hmin is the minimum h(w) and imin is the index of
the column of W that minimizes h(w). Thus imin is also the index of the minimizing column of
X. Finally, cdmasim compares x̂(y) and the transmitted vector x bit by bit and counts the total
number of bit errors.
The function rcdma repeats cdmasim for s frames, with a random signal set for each frame.
Dividing the total number of bit errors over s frames by the total number of transmitted bits, we
find the bit error rate Pe . For an SNR of 4 dB and processing gain 16, the requested tests are
generated with the commands
>> n=16;
>> k=[2 4 8 16];
>> Pe=rcdma(n,k,snr,100,1000);
>>Pe
Pe =
0.0252 0.0272 0.0385 0.0788
>>
To answer part (b), the code for the matched filter (MF) detector is much simpler because
there is no need to test 2^k hypotheses for every transmitted symbol. Just as for the case of the
ML detector, we define a function err=mfcdmasim(S,P,m) that simulates the MF detector for m
symbols for a given set of signal vectors S. In mfcdmasim, there is no need for looping. The mth
transmitted symbol is represented by the mth column of X and the corresponding received signal
is given by the mth column of Y. The matched filter processing can be applied to all m columns
at once. A second function Pe=mfrcdma(n,k,snr,s,m) cycles through all combinations of users k
and SNR snr and calculates the bit error rate for each pair of values. Here are the functions:
function err=mfcdmasim(S,P,m);
%err=mfcdmasim(P,S,m);
%S= n x k matrix of signals
%P= diag matrix of SNRs
% SNR=power/var(noise)
%See Problem 8.4.6b
k=size(S,2); %no. of users
n=size(S,1); %proc. gain
Phalf=sqrt(P);
X=randombinaryseqs(k,m);
Y=S*Phalf*X+randn(n,m);
XR=sign(S'*Y);
err=sum(sum(XR ~= X));

function Pe=mfrcdma(n,k,snr,s,m);
%Pe=rcdma(n,k,snr,s,m);
%R-CDMA, MF detection
% proc gain=n, users=k
% rand signal set/frame
% s frames, m symbols/frame
%See Problem 8.4.6 Solution
[K,SNR]=ndgrid(k,snr);
Pe=zeros(size(SNR));
for j=1:prod(size(SNR)),
p=SNR(j);kt=K(j);
e=0;
for i=1:s,
S=randomsignals(n,kt);
e=e+mfcdmasim(S,p*eye(kt),m);
end
Pe(j)=e/(s*m*kt);
disp([snr kt e]);
end
Here is a run of mfrcdma.
>> pemf=mfrcdma(16,k,4,1000,1000);
4 2 73936
4 4 264234
4 8 908558
4 16 2871356
>> pemf’
ans =
0.0370 0.0661 0.1136 0.1795
>>
The following plot compares the maximum likelihood (ML) and matched filter (MF) detectors.
[Plot comparing the bit error rates of the ML and MF detectors as a function of the number of users k.]
As the ML detector offers the minimum probability of error, it should not be surprising that it has a lower bit error rate. Although the MF detector is worse, the reduction in detector complexity makes it attractive. In fact, in practical CDMA-based cellular phones, the processing gain ranges from roughly 128 to 512. In such cases, the complexity of the ML detector is prohibitive and thus only
matched filter detectors are used.
where S is an n × k matrix with ith column Si and P^{1/2} = diag[√p1, . . . , √pk] is a k × k diagonal matrix of received powers, and N is a Gaussian (0, σ^2 I) noise vector.
(a) When S has linearly independent columns, S'S is invertible. In this case, the decorrelating detector applies a transformation to Y to generate
Ỹ = (S'S)^{−1}S'Y = P^{1/2}X + Ñ,   (2)
where Ñ = (S'S)^{−1}S'N is still a Gaussian noise vector with expected value E[Ñ] = 0. Decorrelation separates the signals in that the ith component of Ỹ is
Ỹi = √pi Xi + Ñi.   (3)
This is the same as a single-user receiver output of the binary communication system of Example 8.6. The single-user decision rule X̂i = sgn(Ỹi) for the transmitted bit Xi has probability of error
Pe,i = P[Ỹi > 0|Xi = −1] = P[−√pi + Ñi > 0] = Q(√(pi/Var[Ñi])).   (4)
However, since Ñ = AN where A = (S'S)^{−1}S', Theorem 5.16 tells us that Ñ has covariance matrix C_Ñ = A C_N A'. We note that the general property that (B^{−1})' = (B')^{−1} implies that A' = S((S'S)')^{−1} = S(S'S)^{−1}. These facts imply
C_Ñ = (S'S)^{−1}S'(σ^2 I)S(S'S)^{−1} = σ^2(S'S)^{−1}.   (5)
Note that S'S is called the correlation matrix since its i, jth entry S_i'S_j is the correlation between the signal of user i and that of user j. Thus Var[Ñi] = σ^2(S'S)^{−1}_{ii} and the probability of bit error for user i is
Pe,i = Q(√(pi/Var[Ñi])) = Q(√(pi/(S'S)^{−1}_{ii})).   (6)
To find the probability of error for a randomly chosen bit, we average over the bits of all users and find that
Pe = (1/k) Σ_{i=1}^{k} Pe,i = (1/k) Σ_{i=1}^{k} Q(√(pi/(S'S)^{−1}_{ii})).   (7)
(b) When S S is not invertible, the detector flips a coin to decide each bit. In this case, Pe,i = 1/2
and thus Pe = 1/2.
(c) When S is chosen randomly, we need to average over all possible matrices S to find the average
probability of bit error. However, there are 2^{kn} possible matrices S and averaging over all
of them is too much work. Instead, we randomly generate m matrices S and estimate the
average Pe by averaging over these m matrices.
A function berdecorr uses this method to evaluate the decorrelator BER. The code has a
lot of lines because it evaluates the BER using m signal sets for each combination of users
k and SNRs snr. However, because the program generates signal sets and calculates the
BER associated with each, there is no need for the simulated transmission of bits. Thus the program runs quickly. Since there are only 2^n distinct columns for matrix S, it is quite possible to generate signal sets that are not linearly independent. In this case, berdecorr
assumes the “flip a coin” rule is used. Just to see whether this rule dominates the error
probability, we also display counts of how often S is rank deficient.
Here is the (somewhat tedious) code:
function Pe=berdecorr(n,k,snr,m);
%Problem 8.4.7 Solution: R-CDMA with decorrelation
%proc gain=n, users=k, average Pe for m signal sets
count=zeros(1,length(k)); %counts rank<k signal sets
Pe=zeros(length(k),length(snr)); snr=snr(:)’;
for mm=1:m,
for i=1:length(k),
S=randomsignals(n,k(i)); R=S’*S;
if (rank(R)<k(i))
count(i)=count(i)+1;
Pe(i,:)=Pe(i,:)+0.5*ones(1,length(snr));
else
G=diag(inv(R));
Pe(i,:)=Pe(i,:)+sum(qfunction(sqrt((1./G)*snr)))/k(i);
end
end
end
disp(’Rank deficiency count:’);disp(k);disp(count);
Pe=Pe/m;
Running berdecorr with processing gains n = 16 and n = 32 yields the following output:
As you might expect, the BER increases as the number of users increases. This occurs
because the decorrelator must suppress a large set of interferers. Also, in generating 10,000
signal matrices S for each value of k we see that rank deficiency is fairly uncommon, however
it occasionally occurs for processing gain n = 16, even if k = 4 or k = 8. Finally, here is a
plot of these same BER statistics for n = 16 and k ∈ {2, 4, 8, 16}. Just for comparison, on the
same graph is the BER for the matched filter detector and the maximum likelihood detector
found in Problem 8.4.6.
[Plot of the BER Pe versus the number of users k for the ML, decorrelator, and MF detectors with processing gain n = 16.]
We see from the graph that the decorrelator is better than the matched filter for a small
number of users. However, when the number of users k is large (relative to the processing
gain n), the decorrelator suffers because it must suppress all interfering users. Finally, we
note that these conclusions are specific to this scenario when all users have equal SNR. When
some users have very high SNR, the decorrelator is good for the low-SNR user because it
zeros out the interference from the high-SNR user.
In the next step we scale s by (1 + b1 ). If b1 = 1, then s is stretched to the outer ring. Finally, we
add a Gaussian noise vector N to generate the received signal X = s_{b1 b2 b3} + N.
In the solution to Problem 8.3.2, we found that the acceptance set for the hypothesis H_{b1 b2 b3} that s_{b1 b2 b3} is transmitted is the set of signal space points closest to s_{b1 b2 b3}. Graphically, these acceptance sets are given in the adjacent figure.
[Figure: the eight signal points s000, ..., s111 and their acceptance sets A000, ..., A111 in the (X1, X2) plane.]
These acceptance sets correspond to an inverse mapping of the received signal vector X to a bit vector guess b̂ = (b̂1, b̂2, b̂3) using the following rules:
• b̂2 = 1 if X1 < 0; otherwise b̂2 = 0.
• b̂3 = 1 if X2 < 0; otherwise b̂3 = 0.
• If |X1| + |X2| > 3√2/2, then b̂1 = 1; otherwise b̂1 = 0.
We implement these steps with the function [Pe,ber]=myqam(sigma,m) which simulates the
transmission of m symbols for each value of the vector sigma. Each column of B corresponds to a
bit vector b. Similarly, each column of S and X corresponds to a transmitted signal s and received
signal X. We calculate both the symbol decision errors that are made as well as the bit decision
errors. Finally, a script myqamplot.m plots the symbol error rate Pe and bit error rate ber as a
function of sigma. Here are the programs:
function [Pe,ber]=myqam(sigma,m);
Pe=zeros(size(sigma)); ber=Pe;
B=reshape(bernoullirv(0.5,3*m),3,m);
%S(1,:)=1-2*B(2,:);
%S(2,:)=1-2*B(3,:);
S=1-2*B([2; 3],:);
S=([1;1]*(1+B(1,:))).*S;
N=randn(2,m);
for i=1:length(sigma),
X=S+sigma(i)*N;
BR=zeros(size(B));
BR([2;3],:)=(X<0);
BR(1,:)=sum(abs(X))>(3/sqrt(2));
E=(BR~=B);
Pe(i)=sum(max(E))/m;
ber(i)=sum(sum(E))/(3*m);
end

%myqamplot.m
sig=10.^(0.2*(-8:0));
[Pe,ber]=myqam(sig,1e6);
loglog(sig,Pe,'-d', ...
sig,ber,'-s');
legend('SER','BER',4);
Note that we generate the bits, the transmitted signals, and the normalized noise only once. However, for each value of sigma, we rescale the additive noise and recalculate the received signal and the receiver bit decisions. The output of myqamplot is shown in this figure:
[Log-log plot of the symbol error rate (SER) and bit error rate (BER) versus sigma.]
Careful reading of the figure will show that the ratio of the symbol error rate to the bit error rate is
always very close to 3. This occurs because in the acceptance set for b1b2 b3 , the adjacent acceptance
sets correspond to a one bit difference. Since the usual type of symbol error occurs when the vector
X is in the adjacent set, a symbol error typically results in one bit being in error but two bits being
received correctly. Thus the bit error rate is roughly one third the symbol error rate.
(a) For the M -PSK communication system with additive Gaussian noise, Aj denoted the hypoth-
esis that signal sj was transmitted. The solution to Problem 8.3.6 derived the MAP decision
rule
[Figure: the M-PSK constellation s0, ..., s_{M−1} on a circle with pie-slice acceptance regions A0, ..., A_{M−1}.]
x ∈ Am if ‖x − sm‖^2 ≤ ‖x − sj‖^2 for all j.   (1)
In terms of geometry, the interpretation is that all vectors x closer to sm than to any other signal sj are assigned to Am. In this problem, the signal constellation (i.e., the set of vectors si) is the set of vectors on the circle of radius √E. The acceptance regions are the “pie slices” around each signal vector.
We observe that
Since all the signals are on the same circle, sj'sj is the same for all j. Also, x'x is the same for all j. Thus
min_j ‖x − sj‖^2 = min_j (−x'sj) = max_j x'sj.   (3)
Since x'sj = ‖x‖ ‖sj‖ cos φ, where φ is the angle between x and sj, maximizing x'sj is equivalent to minimizing the angle between x and sj.
(b) In Problem 8.4.5, we estimated the probability of symbol error without building a complete
simulation of the M -PSK system. In this problem, we need to build a more complete simula-
tion to determine the probabilities Pij . By symmetry, it is sufficient to transmit s0 repeatedly
and count how often the receiver guesses sj. This is done by the function p=mpskerr(M,snr,n).
function p=mpskerr(M,snr,n);
%Problem 8.4.5 Solution:
%Pe=mpsksim(M,snr,n)
%n bit M-PSK simulation
t=(2*pi/M)*(0:(M-1));
S=sqrt(snr)*[cos(t);sin(t)];
X=repmat(S(:,1),1,n)+randn(2,n);
[y,e]=max(S'*X);
p=countequal(e-1,(0:(M-1)))'/n;
Note that column i of S is the signal s_{i−1}. The kth column of X corresponds to X_k = s0 + N_k, the received signal for the kth transmission. Thus y(k) corresponds to max_j X_k's_j and e(k) reports the receiver decision for the kth transmission. The vector p calculates the relative frequency of each receiver decision.
The next step is to translate the vector [P00 P01 ··· P0,M−1] (corresponding to p in Matlab) into an entire matrix P with elements Pij. The symmetry of the phase rotation
dictates that each row of P should be a one element cyclic rotation of the previous row. More-
over, by symmetry we observe that P01 = P0,M −1 , P02 = P0,M −2 and so on. However, because
p is derived from a simulation experiment, it will exhibit this symmetry only approximately.
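The listing of mpskmatrix is not shown in this excerpt. One plausible implementation (an assumption, not the text's actual code) is to symmetrize p and then build the matrix of cyclic shifts:
function P=mpskmatrix_sketch(p)
%Build P with Pij = P0,(j-i) mod M from the row p = [P00 ... P0,M-1]
p=p(:)'; M=length(p);
p=(p+[p(1) fliplr(p(2:M))])/2;         %enforce P0j = P0,M-j
P=zeros(M,M);
for i=1:M,
  P(i,:)=circshift(p,[0 i-1]);         %row i is a cyclic shift of row 1
end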
(c) The next step is to determine the effect of the mapping of bits to transmission vectors sj .
The matrix D with i, jth element dij indicates the number of bit positions in which the bit string assigned to si differs from the bit string assigned to sj. In this case, the integers
provide a compact representation of this mapping. For example the binary mapping is
s0 s1 s2 s3 s4 s5 s6 s7
000 001 010 011 100 101 110 111
0 1 2 3 4 5 6 7
The Gray mapping is
s0 s1 s2 s3 s4 s5 s6 s7
000 001 011 010 110 111 101 100
0 1 3 2 6 7 5 4
Thus the binary mapping can be represented by a vector c1 = [0 1 ··· 7] while the Gray mapping is described by c2 = [0 1 3 2 6 7 5 4].
>> c1=0:7;
>>snr=[4 8 16 32 64];
>>Pb=mpskmap(c1,snr,1000000);
>> Pb
Pb =
0.7640 0.4878 0.2198 0.0529 0.0038
(e) Here is the performance of the Gray mapping:
Experimentally, we observe that the BER of the binary mapping is higher than the BER of the Gray mapping by a factor in the neighborhood of 1.5 to 1.7.
In fact, this approximate ratio can be derived by a quick and dirty analysis. For high SNR,
suppose that si is decoded as si+1 or si−1 with probability q = Pi,i+1 = Pi,i−1 and all
other types of errors are negligible. In this case, the BER formula based on this approximation
corresponds to summing the matrix D for the first off-diagonals and the corner elements. Here
are the calculations:
>> D=mpskdist(c1);
>> sum(diag(D,1))+sum(diag(D,-1))+D(1,8)+D(8,1)
ans =
28
>> DG=mpskdist(c2);
>> sum(diag(DG,1))+sum(diag(DG,-1))+DG(1,8)+DG(8,1)
ans =
16
The ratio of BERs is 28/16 = 1.75. Experimentally, we found at high SNR that the ratio of
BERs was 0.0038/0.0023 = 1.65, which seems to be in the right ballpark.
where dij(k) indicates whether the bit strings mapped to si and sj differ in bit position k. As in Problem 8.4.9, we describe the mapping by the vector of integers c. For example, the binary mapping is

s0   s1   s2   s3   s4   s5   s6   s7
000  001  010  011  100  101  110  111
0    1    2    3    4    5    6    7

The Gray mapping is

s0   s1   s2   s3   s4   s5   s6   s7
000  001  011  010  110  111  101  100
0    1    3    2    6    7    5    4

Thus the binary mapping can be represented by a vector c1 = [0 1 · · · 7] while the Gray mapping is described by c2 = [0 1 3 2 6 7 5 4].
function D=mpskdbit(c,k);
%See Problem 8.4.10: For mapping
%c, calculate BER of bit k
L=length(c);m=log2(L);
[C1,C2]=ndgrid(c,c);
B1=bitget(C1,k);
B2=bitget(C2,k);
D=(B1~=B2);

The function D=mpskdbit(c,k) translates the mapping vector c into the matrix D with entries dij that indicate whether bit k is in error when transmitted symbol si is decoded by the receiver as sj. The method is to generate grids C1 and C2 for the pairs of integers, identify bit k in each integer, and then check if the integers differ in bit k.
Thus, there is a matrix D associated with each bit position and we calculate the expected number
of bit errors associated with each bit position. For each bit, the rest of the solution is the same as
in Problem 8.4.9. We use the commands p=mpskerr(M,snr,n) and P=mpskmatrix(p) to calculate
the matrix P which holds an estimate of each probability Pij . Finally, using matrices P and D, we
treat BER(k) as a finite random variable that takes on value dij with probability Pij. The expected value of this finite random variable is the expected number of bit errors.
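The helper finiteexp(D,P) belongs to the text's Matlab library and is not reproduced here. For a finite random variable that takes the value D(i,j) with probability P(i,j), the expected value is just the probability-weighted sum, so a one-line sketch (an assumption about that library function, not a copy of it) is:

function ex=finiteexp_sketch(D,P)
% Sketch: expected value of a finite random variable taking value D(i,j)
% with probability P(i,j).
ex=sum(sum(D.*P));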
function Pb=mpskbitmap(c,snr,n);
%Problem 8.4.10: Calculate prob. of
%bit error for each bit position for
%an MPSK bit to symbol mapping c
M=length(c);m=log2(M);
p=mpskerr(M,snr,n);
P=mpskmatrix(p);
Pb=zeros(1,m);
for k=1:m,
    D=mpskdbit(c,k);
    Pb(k)=finiteexp(D,P)/M;
end

Given the integer mapping vector c, we estimate the BER of a mapping using just one more function, Pb=mpskbitmap(c,snr,n). First we calculate the matrix D with elements dij. Next, for a given value of snr, we use n transmissions to estimate the probabilities Pij. Last, we calculate the expected number of bit k errors per transmission.
For an SNR of 10dB, we evaluate the two mappings with the following commands:
>> c1=0:7;
>> mpskbitmap(c1,10,100000)
ans =
0.2247 0.1149 0.0577
We see that in the binary mapping, the 0.22 error rate of bit 1 is roughly double that of bit 2,
which is roughly double that of bit 3. For the Gray mapping, the error rate of bit 1 is cut in half
relative to the binary mapping. However, the bit error rates at each position are still not identical
since the error rate of bit 1 is still double that for bit 2 or bit 3. One might surmise that careful
study of the matrix D might lead one to prove for the Gray map that the error rate for bit 1 is
exactly double that for bits 2 and 3 . . . but that would be some other homework problem.
Problem Solutions – Chapter 9
(a) For 0 ≤ x ≤ 1,

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy = ∫_x^1 6(y − x) dy   (1)
      = [3y² − 6xy]_{y=x}^{y=1}   (2)
      = 3(1 − 2x + x²) = 3(1 − x)²   (3)
(c) First we calculate

P[X < 0.5] = ∫_0^{0.5} fX(x) dx = ∫_0^{0.5} 3(1 − x)² dx = [−(1 − x)³]_0^{0.5} = 7/8.   (6)
Problem 9.1.3 Solution
(a) For 0 ≤ x ≤ 1,

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy = ∫_x^1 2 dy = 2(1 − x).   (1)

The complete expression for the PDF of X is

fX(x) = { 2(1 − x)   0 ≤ x ≤ 1,
        { 0          otherwise.   (2)
(f) We already know that P [X > 1/2] = 1/4. However, this problem differs from the other
problems in this section because we will estimate Y based on the observation of X. In this
case, we need to calculate the conditional joint PDF
fX,Y|X>1/2(x, y) = fX,Y(x, y)/P[X > 1/2] for x > 1/2 (and 0 otherwise), which reduces to

fX,Y|X>1/2(x, y) = { 8   1/2 < x ≤ y ≤ 1,
                   { 0   otherwise.   (11)
(a) The conditional PDF of X given Y is found by dividing the joint PDF by the marginal with
respect to Y . For y < 0 or y > 1, fY (y) = 0. For 0 ≤ y ≤ 1,
fY(y) = ∫_0^y 6(y − x) dx = [6xy − 3x²]_0^y = 3y²   (2)
(c) First we must find the marginal PDF for X. For 0 ≤ x ≤ 1,
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy = ∫_x^1 6(y − x) dy = [3y² − 6xy]_{y=x}^{y=1}   (7)
      = 3(1 − 2x + x²) = 3(1 − x)²   (8)
(c) The MMSE estimator of X given Y is X̂M (Y ) = E[X|Y ] = Y /2. The mean squared error is
e*_{X,Y} = E[(X − X̂M(Y))²] = E[(X − Y/2)²] = E[X² − XY + Y²/4]   (4)

Of course, the integral must be evaluated.

e*_{X,Y} = ∫_0^1 ∫_0^y 2(x² − xy + y²/4) dx dy   (5)
         = ∫_0^1 [2x³/3 − x²y + xy²/2]_{x=0}^{x=y} dy   (6)
         = ∫_0^1 (y³/6) dy = 1/24   (7)
Another approach to finding the mean square error is to recognize that the MMSE estimator
is a linear estimator and thus must be the optimal linear estimator. Hence, the mean square
error of the optimal linear estimator given by Theorem 9.4 must equal e∗X,Y . That is, e∗X,Y =
Var[X](1 − ρ2X,Y ). However, calculation of the correlation coefficient ρX,Y is at least as much
work as direct calculation of e∗X,Y .
(b) No, the random variables X and Y are not independent since
This implies
From Theorem 9.4, the minimum mean square error of the optimum linear estimate is
e*_L = σ²_X (1 − ρ²_{X,Y}) = (2/3)(71/120) = 71/180.   (8)
(f) The conditional probability mass function is

PX|Y(x|−3) = PX,Y(x, −3)/PY(−3) = { (1/6)/(1/4) = 2/3    x = −1,
                                   { (1/12)/(1/4) = 1/3   x = 0,
                                   { 0                    otherwise.   (9)
(b) Because X and V are independent random variables, the variance of R is the sum of the
variance of V and the variance of X.
Var[R] = Var[V ] + Var[X] = 12 + 3 = 15. (3)
(e) The minimum mean square error in the estimate is
e∗ = Var[V ](1 − ρ2V,R ) = 12(1 − 12/15) = 12/5 (7)
(a) Random variables X and Y are independent since for all x and y,
Since each other pair of random variables has the same marginal PMFs as X and Y but a
different joint PMF, all of the other pairs of random variables must be dependent. Since X
and Y are independent, ρX,Y = 0. For the other pairs, we must compute the covariances.
(b) From Theorem 9.4, the least mean square linear estimator of U given V is

ÛL(V) = ρ_{U,V}(σ_U/σ_V)(V − E[V]) + E[U] = ρ_{U,V}V = −V   (11)
Similarly for the other pairs, all expected values are zero and the ratio of the standard
deviations is always 1. Hence,
The first and second moments of X are

E[X] = ∫_0^1 (x + 2x² − 3x³) dx = [x²/2 + 2x³/3 − 3x⁴/4]_0^1 = 5/12   (3)
E[X²] = ∫_0^1 (x² + 2x³ − 3x⁴) dx = [x³/3 + x⁴/2 − 3x⁵/5]_0^1 = 7/30   (4)
The correlation between X and Y is

E[XY] = 6 ∫_0^1 ∫_x^1 xy(y − x) dy dx = 1/5   (6)
(a) Given X = x, Y is uniform on [0, x]. Hence E[Y |X = x] = x/2. Thus the minimum mean
square estimate of Y given X is
(b) The minimum mean square estimate of X given Y can be found by finding the conditional
probability density function of X given Y . First we find the joint density function.
fX,Y(x, y) = fY|X(y|x) fX(x) = { λe^{−λx}   0 ≤ y ≤ x
                               { 0          otherwise   (4)

By dividing the joint density by the marginal density of Y we arrive at the conditional density of X given Y.

fX|Y(x|y) = fX,Y(x, y)/fY(y) = { λe^{−λ(x−y)}   x ≥ y
                               { 0              otherwise   (6)

Now we are in a position to find the minimum mean square estimate of X given Y. Given Y = y, the conditional expected value of X is

E[X|Y = y] = ∫_y^∞ λx e^{−λ(x−y)} dx   (7)
We observe that if U is an exponential random variable with parameter λ, then

E[X|Y = y] = E[U + y] = 1/λ + y   (9)

The minimum mean square error estimate of X given Y is

X̂M(Y) = E[X|Y] = 1/λ + Y   (10)
(c) Since the MMSE estimate of Y given X is the linear estimate ŶM (X) = X/2, the optimal
linear estimate of Y given X must also be the MMSE estimate. That is, ŶL (X) = X/2.
(d) Since the MMSE estimate of X given Y is the linear estimate X̂M (Y ) = Y + 1/λ, the optimal
linear estimate of X given Y must also be the MMSE estimate. That is, X̂L (Y ) = Y + 1/λ.
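As a hedged numerical check of the estimate X̂M(Y) = Y + 1/λ (not part of the original solution), the Matlab sketch below samples (X, Y) from the joint PDF fX,Y(x, y) = λe^{−λx}, 0 ≤ y ≤ x, given in (4) and estimates E[X | Y ≈ y0]; λ, y0 and the bin width are hypothetical choices.

% Sketch: Monte Carlo check that E[X|Y=y] is close to y + 1/lambda
lambda=1; n=1000000;
X=-(log(rand(n,1))+log(rand(n,1)))/lambda;  % X has the Erlang-2 marginal of (4)
Y=X.*rand(n,1);                             % given X=x, Y is uniform on [0,x]
y0=2; d=0.05;                               % estimate E[X | Y near y0]
I=abs(Y-y0)<d;
disp([mean(X(I)) y0+1/lambda]);             % the two numbers should be close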
(b) The MMSE estimate of X given R = r is E[X|R = r]. From the initial problem statement, we know that given R = r, X is exponential with expected value 1/r. That is, E[X|R = r] = 1/r. Another way of writing this statement is
Because E[X] doesn’t exist, the LMSE estimate of X given R doesn’t exist.
(d) Just as in part (c), because E[X] doesn’t exist, the LMSE estimate of R given X doesn’t
exist.
e* = E[X²] − (E[XY])²/E[Y²]   (3)
(c) We can write the LMSE estimator given in Theorem 9.4 in the form

x̂L(Y) = ρ_{X,Y}(σ_X/σ_Y)Y − b   (4)

where

b = ρ_{X,Y}(σ_X/σ_Y)E[Y] − E[X]   (5)

When b = 0, X̂(Y) is the LMSE estimate. Note that the typical way that b = 0 occurs is when E[X] = E[Y] = 0. However, it is possible that the right combination of expected values, variances, and correlation coefficient can also yield b = 0.
Problem 9.3.1 Solution
In this case, the joint PDF of X and R is
From Theorem 9.6, the MAP estimate of R given X = x is the value of r that maximizes
fX|R (x|r)fR (r). Since R has a uniform PDF over [0, 1000],
r̂MAP(x) = arg max_{r≥0} fX|R(x|r) fR(r) = arg max_{0≤r≤1000} fX|R(x|r)   (3)

Hence, the maximizing value of r is the same as for the ML estimate in Quiz 9.3 unless the maximizing r exceeds 1000 m. In this case, the maximizing value is r = 1000 m. From the solution to Quiz 9.3, the resulting estimator is

r̂MAP(x) = { 1000              x < −160
           { (0.1)10^{−x/40}   x ≥ −160   (4)
(a) The minimum mean square error estimate of N given R is the conditional expected value of
N given R = r. This is given directly in the problem statement as r.
(b) The maximum a posteriori estimate of N given R is simply the value of n that will maximize PN|R(n|r). That is,

n̂MAP(r) = arg max_{n≥0} PN|R(n|r) = arg max_{n≥0} (rT)^n e^{−rT}/n!   (4)

Usually, we set a derivative to zero to solve for the maximizing value. In this case, that technique doesn't work because n is discrete. Since e^{−rT} is a common factor in the maximization, we can define g(n) = (rT)^n/n! so that n̂MAP = arg max_n g(n). We observe that

g(n) = (rT/n) g(n − 1)   (5)

This implies that for n ≤ rT, g(n) ≥ g(n − 1). Hence the maximizing value of n is the largest n such that n ≤ rT. That is, n̂MAP = ⌊rT⌋.
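A quick check of the claim n̂MAP = ⌊rT⌋ (a sketch with hypothetical values of r and T, not part of the original solution):

% Sketch: verify that arg max_n (rT)^n exp(-rT)/n! equals floor(rT)
r=3.7; T=2;                          % hypothetical values, rT = 7.4
n=0:50;
g=(r*T).^n.*exp(-r*T)./factorial(n);
[gmax,imax]=max(g);
disp([n(imax) floor(r*T)]);          % both values are 7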
(c) The maximum likelihood estimate of N given R selects the value of n that maximizes
fR|N =n (r), the conditional PDF of R given N . When dealing with situations in which we
mix continuous and discrete random variables, it's often helpful to start from first principles. In this case,

fR|N(r|n) = PN|R(n|r) fR(r)/PN(n)   (9)

To find the value of n that maximizes fR|N(r|n), we need to find the denominator PN(n).

PN(n) = ∫_{−∞}^{∞} PN|R(n|r) fR(r) dr   (10)
      = ∫_0^∞ [(rT)^n e^{−rT}/n!] μe^{−μr} dr   (11)
      = [μT^n/(n!(μ + T))] ∫_0^∞ r^n (μ + T) e^{−(μ+T)r} dr   (12)
      = [μT^n/(n!(μ + T))] E[X^n]   (13)

where X is an exponential random variable with expected value 1/(μ + T). There are several ways to derive the nth moment of an exponential random variable including integration by parts. In Example 6.5, the MGF is used to show that E[X^n] = n!/(μ + T)^n. Hence, for n ≥ 0,

PN(n) = μT^n/(μ + T)^{n+1}   (14)
Problem 9.3.3 Solution
Both parts (a) and (b) rely on the conditional PDF of R given N = n. When dealing with situations in which we mix continuous and discrete random variables, it's often helpful to start from first principles.

fR|N(r|n) dr = P[r < R ≤ r + dr|N = n] = P[r < R ≤ r + dr, N = n]/P[N = n]   (1)
             = P[N = n|R = r] P[r < R ≤ r + dr]/P[N = n]   (2)

In terms of PDFs and PMFs, we have

fR|N(r|n) = PN|R(n|r) fR(r)/PN(n)   (3)

To find the value of n that maximizes fR|N(r|n), we need to find the denominator PN(n).

PN(n) = ∫_{−∞}^{∞} PN|R(n|r) fR(r) dr   (4)
      = ∫_0^∞ [(rT)^n e^{−rT}/n!] μe^{−μr} dr   (5)
      = [μT^n/(n!(μ + T))] ∫_0^∞ r^n (μ + T) e^{−(μ+T)r} dr = [μT^n/(n!(μ + T))] E[X^n]   (6)

where X is an exponential random variable with expected value 1/(μ + T). There are several ways to derive the nth moment of an exponential random variable including integration by parts. In Example 6.5, the MGF is used to show that E[X^n] = n!/(μ + T)^n. Hence, for n ≥ 0,

PN(n) = μT^n/(μ + T)^{n+1}   (7)

Finally, the conditional PDF of R given N is

fR|N(r|n) = PN|R(n|r) fR(r)/PN(n) = [(rT)^n e^{−rT}/n!] μe^{−μr} / [μT^n/(μ + T)^{n+1}] = (μ + T)^{n+1} r^n e^{−(μ+T)r}/n!   (8)
(a) The MMSE estimate of R given N = n is the conditional expected value E[R|N = n]. Given N = n, the conditional PDF of R is that of an Erlang random variable of order n + 1. From Appendix A, we find that E[R|N = n] = (n + 1)/(μ + T). The MMSE estimate of R given N is

R̂M(N) = E[R|N] = (N + 1)/(μ + T)   (9)
(b) The MAP estimate of R given N = n is the value of r that maximizes fR|N(r|n).

R̂MAP(n) = arg max_{r≥0} fR|N(r|n) = arg max_{r≥0} (μ + T)^{n+1} r^n e^{−(μ+T)r}/n!   (10)

By setting the derivative with respect to r to zero, we obtain the MAP estimate

R̂MAP(n) = n/(μ + T)   (11)
(c) The ML estimate of R given N = n is the value of r that maximizes PN|R(n|r). That is,

R̂ML(n) = arg max_{r≥0} (rT)^n e^{−rT}/n!   (12)

Setting the derivative with respect to r to zero yields

R̂ML(n) = n/T   (13)
Differentiating PQ|K(q|k) with respect to q and setting equal to zero yields

dPQ|K(q|k)/dq = (n choose k)[kq^{k−1}(1 − q)^{n−k} − (n − k)q^k(1 − q)^{n−k−1}] = 0   (3)

The maximizing value is q = k/n so that

Q̂ML(K) = K/n   (4)
(b) To find the PMF of K, we average over all q.

PK(k) = ∫_{−∞}^{∞} PK|Q(k|q) fQ(q) dq = ∫_0^1 (n choose k) q^k (1 − q)^{n−k} dq   (5)

We can evaluate this integral by expressing it in terms of the integral of a beta PDF. Since β(k + 1, n − k + 1) = (n+1)!/(k!(n−k)!), we can write

PK(k) = [1/(n + 1)] ∫_0^1 β(k + 1, n − k + 1) q^k (1 − q)^{n−k} dq = 1/(n + 1)   (6)

That is, K has the uniform PMF

PK(k) = { 1/(n + 1)   k = 0, 1, . . . , n
        { 0           otherwise   (7)
Hence, E[K] = n/2.
(c) The conditional PDF of Q given K is

fQ|K(q|k) = PK|Q(k|q) fQ(q)/PK(k) = { [(n+1)!/(k!(n−k)!)] q^k (1 − q)^{n−k}   0 ≤ q ≤ 1
                                     { 0                                      otherwise   (8)

That is, given K = k, Q has a beta (k + 1, n − k + 1) PDF.

(d) The MMSE estimate of Q given K = k is the conditional expectation E[Q|K = k]. From the beta PDF described in Appendix A, E[Q|K = k] = (k + 1)/(n + 2). The MMSE estimator is

Q̂M(K) = E[Q|K] = (K + 1)/(n + 2)   (9)
Problem 9.4.1 Solution
From the problem statement, we learn for vectors X = [X1 X2 X3]' and Y = [Y1 Y2]' that

E[X] = 0,   RX = [1 3/4 1/2; 3/4 1 3/4; 1/2 3/4 1],   Y = AX = [1 1 0; 0 1 1] X.   (1)
(a) Since E[Y] = AE[X] = 0, we can apply Theorem 9.7(a) which states that the minimum mean square error estimate of X1 is X̂1(Y) = â'Y where â = R_Y^{−1} R_{YX1}. The rest of the solution is just calculation. (We note that even in the case of a 3 × 3 matrix, it's convenient to use Matlab with format rat mode to perform the calculations and display the results as nice fractions.) From Theorem 5.13,

RY = A RX A' = [1 1 0; 0 1 1][1 3/4 1/2; 3/4 1 3/4; 1/2 3/4 1][1 0; 1 1; 0 1] = [7/2 3; 3 7/2].   (2)
Finally,

â = R_Y^{−1} R_{YX1} = [14/13 −12/13; −12/13 14/13][7/4; 5/4] = (1/26)[19; −7].   (4)

Thus the linear MMSE estimator of X1 given Y is

X̂1(Y) = â'Y = (19/26)Y1 − (7/26)Y2 = 0.7308Y1 − 0.2692Y2.   (5)
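Following the remark above about format rat, the calculation in (2)–(5) can be reproduced with a few lines of Matlab (a sketch; the matrices are simply those given in the problem statement):

% Sketch: reproduce the LMSE coefficients of part (a)
format rat
RX=[1 3/4 1/2; 3/4 1 3/4; 1/2 3/4 1];
A=[1 1 0; 0 1 1];
RY=A*RX*A'                  % displays [7/2 3; 3 7/2]
RYX1=A*RX(:,1)              % displays [7/4; 5/4]
ahat=inv(RY)*RYX1           % displays [19/26; -7/26]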
(b) By Theorem 9.7(c), the mean squared error of the optimal estimator is
(c) We can estimate random variable X1 based on the observation of random variable Y1 using
Theorem 9.4. Note that Theorem 9.4 is just a special case of Theorem 9.8 in which the
observation is a random vector. In any case, from Theorem 9.4, the optimum linear estimate
is X̂1 (Y1 ) = a∗ Y1 + b∗ where
a* = Cov[X1, Y1]/Var[Y1],   b* = μ_{X1} − a*μ_{Y1}.   (9)
These facts, along with RX and RY from part (a), imply
Cov[X1, Y1] = E[X1 Y1]   (11)
            = E[X1(X1 + X2)] = RX(1, 1) + RX(1, 2) = 7/4,   (12)
Var[Y1] = E[Y1²] = RY(1, 1) = 7/2   (13)

Thus

a* = Cov[X1, Y1]/Var[Y1] = (7/4)/(7/2) = 1/2,   b* = μ_{X1} − a*μ_{Y1} = 0.   (14)
Thus the optimum linear estimate of X1 given Y1 is

X̂1(Y1) = (1/2)Y1.   (15)

From Theorem 9.4(b), the mean square error of this estimator is

e*_L = σ²_{X1}(1 − ρ²_{X1,Y1})   (16)

Since X1 and Y1 have zero expected value, σ²_{X1} = RX(1, 1) = 1 and σ²_{Y1} = RY(1, 1) = 7/2. Also, since Cov[X1, Y1] = 7/4, we see that

ρ_{X1,Y1} = Cov[X1, Y1]/(σ_{X1}σ_{Y1}) = (7/4)/√(7/2) = √(7/8).   (17)

Thus e*_L = 1 − (√(7/8))² = 1/8. Note that 1/8 > 3/52. As we would expect, the estimate of X1 based on just Y1 has larger mean square error than the estimate based on both Y1 and Y2.
Once again, independence of W and X1 yields
This implies
R_{YX1} = A E[[X1²; X2X1; X3X1]] = A [RX(1, 1); RX(2, 1); RX(3, 1)] = [1 1 0; 0 1 1][1; 3/4; 1/2] = [7/4; 5/4].   (10)
(b) By Theorem 9.7(c), the mean squared error of the optimal estimator is
In Problem 9.4.1, we solved essentially the same problem but the observations Y were not subjected to the additive noise W. In comparing the estimators, we see that the additive noise perturbs the estimator somewhat but not dramatically because the correlation structure of
X and the mapping A from X to Y remains unchanged. On the other hand, in the noiseless
case, the resulting mean square error was about half as much, 3/52 = 0.0577 versus 0.1098.
(c) We can estimate random variable X1 based on the observation of random variable Y1 using
Theorem 9.4. Note that Theorem 9.4 is a special case of Theorem 9.8 in which the observation
is a random vector. In any case, from Theorem 9.4, the optimum linear estimate is X̂1 (Y1 ) =
a∗ Y1 + b∗ where
a* = Cov[X1, Y1]/Var[Y1],   b* = μ_{X1} − a*μ_{Y1}.   (16)

Since E[Xi] = μ_{Xi} = 0 and Y1 = X1 + X2 + W1, we see that
Thus

a* = Cov[X1, Y1]/Var[Y1] = (7/4)/3.6 = 35/72,   b* = μ_{X1} − a*μ_{Y1} = 0.   (21)

Thus the optimum linear estimate of X1 given Y1 is

X̂1(Y1) = (35/72)Y1.   (22)
From Theorem 9.4(b), the mean square error of this estimator is

e*_L = σ²_{X1}(1 − ρ²_{X1,Y1})   (23)

Since X1 and Y1 have zero expected value, σ²_{X1} = RX(1, 1) = 1 and σ²_{Y1} = RY(1, 1) = 3.6. Also, since Cov[X1, Y1] = 7/4, we see that

ρ_{X1,Y1} = Cov[X1, Y1]/(σ_{X1}σ_{Y1}) = (7/4)/√3.6 = √490/24.   (24)

Thus e*_L = 1 − (490/24²) = 0.1493. As we would expect, the estimate of X1 based on just Y1
has larger mean square error than the estimate based on both Y1 and Y2 .
CX = RX − μX μX'   (2)
   = [1 3/4 1/2; 3/4 1 3/4; 1/2 3/4 1] − [−0.1; 0; 0.1][−0.1 0 0.1] = [0.99 0.75 0.51; 0.75 1.0 0.75; 0.51 0.75 0.99],   (3)

and

Y = [Y1; Y2] = AX + W = [1 1 0; 0 1 1] X + W.   (4)
(a) Since E[X] is nonzero, we use Theorem 9.8 which states that the minimum mean square error estimate of X1 is X̂1(Y) = â'Y + b̂ where â = C_Y^{−1} C_{YX1} and b̂ = E[X1] − â'E[Y]. First we find CY. Since E[Y] = AE[X] + E[W] = AE[X],

CY = E[(Y − E[Y])(Y − E[Y])']   (5)
   = E[(A(X − E[X]) + W)(A(X − E[X]) + W)']   (6)
   = E[(A(X − E[X]) + W)((X − E[X])'A' + W')]   (7)
   = A E[(X − E[X])(X − E[X])'] A' + E[W(X − E[X])'] A' + A E[(X − E[X])W'] + E[WW']   (8)
Since X and W are independent, E[W(X − E[X])'] = 0 and E[(X − E[X])W'] = 0. This implies

CY = A CX A' + RW   (9)
   = [1 1 0; 0 1 1][0.99 0.75 0.51; 0.75 1.0 0.75; 0.51 0.75 0.99][1 0; 1 1; 0 1] + [0.1 0; 0 0.1] = [3.59 3.01; 3.01 3.59].   (10)
We note that although E[X] = 0.1, the estimator’s offset is only 0.0172. This is because the
change in E[X] is also included in the change in E[Y ].
(b) By Theorem 9.8(c), the mean square error of the optimal estimator is
(c) We can estimate random variable X1 based on the observation of random variable Y1 using
Theorem 9.4. Note that Theorem 9.4 is a special case of Theorem 9.8 in which the observation
is a random vector. In any case, from Theorem 9.4, the optimum linear estimate is X̂1 (Y1 ) =
a∗ Y1 + b∗ where
a* = Cov[X1, Y1]/Var[Y1],   b* = μ_{X1} − a*μ_{Y1}.   (22)
First we note that E[X1 ] = µX1 = −0.1. Second, since Y1 = X1 + X2 + W1 and E[W1 ] = 0,
we see that
Thus

a* = Cov[X1, Y1]/Var[Y1] = 1.74/3.59 = 0.4847,   b* = μ_{X1} − a*μ_{Y1} = −0.0515.   (30)

Thus the optimum linear estimate of X1 given Y1 is

X̂1(Y1) = 0.4847Y1 − 0.0515.   (31)

From Theorem 9.4(b), the mean square error of this estimator is

e*_L = σ²_{X1}(1 − ρ²_{X1,Y1})   (32)

Note that σ²_{X1} = CX(1, 1) = 0.99 and σ²_{Y1} = CY(1, 1) = 3.59. Also, since Cov[X1, Y1] = 1.74, we see that

ρ_{X1,Y1} = Cov[X1, Y1]/(σ_{X1}σ_{Y1}) = 1.74/√((0.99)(3.59)) = 0.923.   (33)
Thus e∗L = 0.99(1 − (0.923)2 ) = 0.1466. As we would expect, the estimate of X1 based on
just Y1 has larger mean square error than the estimate based on both Y1 and Y2 .
Problem 9.4.5 Solution
In this problem, we view Y = [X1 X2]' as the observation and X = X3 as the variable we wish to estimate. Since E[X] = 0, we can use Theorem 9.7(a) to find the minimum mean square error estimate X̂L(Y) = â'Y where â = R_Y^{−1} R_{YX}.
(c) In the previous part, we found that the optimal linear estimate of X3 based on the observation
of random variables X1 and X2 employed only X2 . Hence this same estimate, X̂3 = −0.8X2 ,
is the optimal linear estimate of X3 just using X2 . (This can be derived using Theorem 9.4,
if you wish to do more algebra.)
(d) Since the estimator is the same, the mean square error is still e∗L = 0.36.
RY â = RYX . (1)
Note that

RY = E[YY'] = [1        c       · · ·   c^{n−2}
               c        1       ⋱       ⋮
               ⋮        ⋱       ⋱       c
               c^{n−2}  · · ·   c       1],
RYX = E[[X1; X2; . . . ; X_{n−1}] Xn] = [c^{n−1}; c^{n−2}; . . . ; c].   (2)
We see that the last column of cRY equals RYX. Equivalently, if â = [0 · · · 0 c]', then RY â = RYX. It follows that the optimal linear estimator of Xn given Y is

X̂n(Y) = â'Y = cX_{n−1},   (3)

which completes the proof of the claim.
The mean square error of this estimate is

e*_L = E[(Xn − cX_{n−1})²]   (4)
     = RX(n, n) − cRX(n, n − 1) − cRX(n − 1, n) + c²RX(n − 1, n − 1)   (5)
     = 1 − 2c² + c² = 1 − c²   (6)
When c is close to 1, Xn−1 and Xn are highly correlated and the estimation error will be small.
Comment: We will see in Chapter 11 that correlation matrices with this structure arise frequently in the study of wide sense stationary random sequences. In fact, if you read ahead, you will find that the claim we just proved is essentially the same as that made in Theorem 11.10.
(a) In this case, we use the observation Y to estimate each Xi . Since E[Xi ] = 0,
E[Y] = Σ_{j=1}^k E[Xj] √pj Sj + E[N] = 0.   (1)

Thus, Theorem 9.7(a) tells us that the MMSE linear estimate of Xi is X̂i(Y) = â'Y where â = R_Y^{−1} R_{YXi}. First we note that
R_{YXi} = E[YXi] = E[(Σ_{j=1}^k Xj √pj Sj + N) Xi]   (2)

Since the Xj are iid with E[Xi²] = 1, E[Xj Xi] = 0 for j ≠ i, and Xi is independent of N, this reduces to R_{YXi} = √pi Si. Next, expanding RY = E[YY'] yields

RY = Σ_{j=1}^k Σ_{l=1}^k √(pj pl) E[Xj Xl] Sj S_l' + Σ_{j=1}^k √pj Sj E[Xj N'] + Σ_{l=1}^k √pl E[Xl N] S_l' + E[NN']   (5)

The two middle sums are zero because each Xj is zero mean and independent of N. Thus

RY = Σ_{j=1}^k pj Sj S_j' + σ² I   (6)
Now we use a linear algebra identity. For a matrix S with columns S1 , S2 , . . . , Sk , and a
diagonal matrix P = diag[p1 , p2 , . . . , pk ],
Σ_{j=1}^k pj Sj S_j' = SPS'.   (7)
(b) We observe that V = (SPS' + σ²I)^{−1} Y is a vector that does not depend on which bit Xi we want to estimate. Since X̂i = √pi S_i' V, we can form the vector of estimates

X̂ = [X̂1; . . . ; X̂k] = [√p1 S_1'V; . . . ; √pk S_k'V] = [√p1 S_1'; . . . ; √pk S_k'] V   (11)
   = P^{1/2} S' V   (12)
   = P^{1/2} S' (SPS' + σ²I)^{−1} Y   (13)
Although the plot lacks labels, there are three curves for the mean square error MSE(n) corre-
sponding to φ0 ∈ {0.1, 0.5, 0.9}. Keep in mind that MSE(n) is the MSE of the linear estimate of
X21 using random variables X1 , . . . , Xn .
If you run the commands, you'll find that φ0 = 0.1 yields the lowest mean square error while φ0 = 0.9 results in the highest mean square error. When φ0 = 0.1, random variables Xn for n = 10, 11, . . . , 20 are increasingly correlated with X21. The result is that the MSE starts to decline rapidly for n > 10. As φ0 increases, fewer observations Xn are correlated with X21. The result is that the MSE is simply worse as φ0 increases. For example, when φ0 = 0.9, even X20 has only
a small correlation with X21 . We only get a good estimate of X21 at time n = 21 when we observe
X21 + W21 .
n=0:20; n=0:20; n=0:20;
r1=cos(0.1*pi*n); r5=cos(0.5*pi*n); r9=cos(0.9*pi*n);
mse(r1); mse(r5); mse(r9);
All three cases report similar results for the mean square error (MSE). The reason is that in all
three cases, X1 and X21 are completely correlated; that is, ρX1 ,X21 = 1. As a result, X1 = X21 so
that at time n = 1, the observation is
Y1 = X1 + W1 = X21 + W1 . (1)
The MSE at time n = 1 is 0.1, corresponding to the variance of the additive noise. Subsequent
improvements in the estimates are the result of making other measurements of the form Yn =
Xn + Wn where Xn is highly correlated with X21 . The result is a form of averaging of the additive
noise, which effectively reduces its variance.
The choice of φ0 changes the values of n for which Xn and X21 are highly correlated. However,
in all three cases, there are several such values of n and the result in all cases is an improvement
in the MSE due to averaging the additive noise.
Note that RX(n) and RW(n) are the n × n upper-left submatrices of RX and RW . In addition,
R_{Y(n)X} = E[[X1 + W1; . . . ; Xn + Wn] X1] = [r0; . . . ; r_{n−1}].   (2)

Compared to the solution of Example 9.10, the only difference in the solution is in the reversal of the vector R_{Y(n)X}. The optimal filter based on the first n observations is â^{(n)} = R_{Y(n)}^{−1} R_{Y(n)X}, and the mean square error is

e*_L = Var[X1] − (â^{(n)})' R_{Y(n)X}.   (3)
The commands

ra=sinc(0.1*pi*(0:20));      rb=cos(0.5*pi*(0:20));
mse953(ra)                   mse953(rb)

produce plots of the MSE versus n (for n from 0 to 25) for cases (a) and (b) respectively; the plots themselves are omitted here.
In comparing the results of cases (a) and (b), we see that the mean square estimation error depends
strongly on the correlation structure given by r|i−j| . For case (a), Y1 is a noisy observation of X1
and is highly correlated with X1 . The MSE at n = 1 is just the variance of W1 . Additional samples
of Yn mostly help to average the additive noise. Also, samples Xn for n ≥ 10 have very little
correlation with X1 . Thus for n ≥ 10, the samples of Yn result in almost no improvement in the
estimate of X1 .
In case (b), Y1 = X1 + W1 , just as in case (a), is simply a noisy copy of X1 and the estimation
error is due to the variance of W1 . On the other hand, for case (b), X5 , X9 , X13 and X17 and X21
are completely correlated with X1 . Other samples also have significant correlation with X1 . As a
result, the MSE continues to go down with increasing n.
function S=randomsignals(n,k);
%S is an n by k matrix, columns are
%random unit length signal vectors
S=(rand(n,k)>0.5);
S=((2*S)-1.0)/sqrt(n);

The transmitted data vector x belongs to the set Bk of all binary ±1 vectors of length k. This short program generates k random signals, each of length n. Each random signal is a binary ±1 sequence normalized to length 1.
The evaluation of the LMSE detector is most similar to evaluation of the matched filter (MF)
detector in Problem 8.4.6. We define a function err=lmsecdmasim(S,P,m) that simulates the
LMSE detector for m symbols for a given set of signal vectors S. In lmsecdmasim, there is no
need for looping. The mth transmitted symbol is represented by the mth column of X and the
corresponding received signal is given by the mth column of Y. The matched filter processing can
be applied to all m columns at once. A second function Pe=lmsecdma(n,k,snr,s,m) cycles through all combinations of users k and SNR snr and calculates the bit error rate for each pair of values.
Here are the functions:
function e=lmsecdmasim(S,P,m);
%err=lmsecdmasim(P,S,m);
%S= n x k matrix of signals
%P= diag matrix of SNRs
% SNR=power/var(noise)
k=size(S,2); %no. of users
n=size(S,1); %proc. gain
P2=sqrt(P);
X=randombinaryseqs(k,m);
Y=S*P2*X+randn(n,m);
L=P2*S'*inv((S*P*S')+eye(n));
XR=sign(L*Y);
e=sum(sum(XR ~= X));

function Pe=lmsecdma(n,k,snr,s,m);
%Pe=lmsecdma(n,k,snr,s,m);
%RCDMA, LMSE detector, users=k
%proc gain=n, rand signals/frame
% s frames, m symbols/frame
%See Problem 9.5.4 Solution
[K,SNR]=ndgrid(k,snr);
Pe=zeros(size(SNR));
for j=1:prod(size(SNR)),
  p=SNR(j);kt=K(j); e=0;
  for i=1:s,
    S=randomsignals(n,kt);
    e=e+lmsecdmasim(S,p*eye(kt),m);
  end
  Pe(j)=e/(s*m*kt);
  disp([snr kt e]);
end
Here is a run of lmsecdma.
>> pelmse = lmsecdma(32,k,4,1000,1000);
4 2 48542
4 4 109203
4 8 278266
4 16 865358
4 32 3391488
>> pelmse’
ans =
0.0243 0.0273 0.0348 0.0541 0.1060
>>
For processing gain n = 32, the maximum likelihood detector is too complex for my version of
Matlab to run quickly. Instead we can compare the LMSE detector to the matched filter (MF)
detector of Problem 8.4.6 and the decorrelator of Problem 8.4.7 with the following script:
k=[2 4 8 16 32];
pemf = mfrcdma(32,k,4,1000,1000);
pedec=berdecorr(32,k,4,10000);
pelmse = lmsecdma(32,k,4,1000,1000);
plot(k,pemf,’-d’,k,pedec,’-s’,k,pelmse);
legend(’MF’,’DEC’,’LMSE’,2);
axis([0 32 0 0.5]);
The resulting plot shows the bit error rate versus the number of users k (from 0 to 32) for the three detectors, with curves labeled MF, DEC, and LMSE.
Compared to the matched filter and the decorrelator, the linear pre-processing of the LMSE detector offers an improvement in the bit error rate. Recall that for each bit Xi, the decorrelator zeroes out the interference from all users j ≠ i at the expense of enhancing the receiver noise. When the number of users is small, the decorrelator works well because the cost of suppressing other users is small. When the number of users equals the processing gain, the decorrelator works poorly because the noise is greatly enhanced. By comparison, the LMSE detector applies linear processing that results in an output that minimizes the mean square error between the output and the original transmitted bit. It works about as well as the decorrelator when the number of users is small. For a large number of users, it still works better than the matched filter detector.
Problem Solutions – Chapter 10
• In Example 10.3, the daily noontime temperature at Newark Airport is a discrete time, continuous value random process. However, if the temperature is recorded only in units of one degree, then the process would be discrete value.
• In Example 10.4, the number of active telephone calls is discrete time and discrete value.
• The dice rolling experiment of Example 10.5 yields a discrete time, discrete value random
process.
• The QPSK system of Example 10.6 is a continuous time and continuous value random process.
(The solution is a set of sketches of sample functions x(t, s1), x(t, s2), and x(t, s3), each plotted versus t for 0 ≤ t ≤ T and taking values between −1 and 1; the plots are omitted here.)
(Sketches of eight sample functions, each taking values between −1 and 1, plotted versus t for 0 ≤ t ≤ 3T, are omitted here.)
P[X(t) ≤ x] = P[t − W ≤ x] = P[W ≥ t − x]   (1)

We note that the CDF contains no discontinuities. Taking the derivative of the CDF FX(t)(x) with respect to x, we obtain the PDF

fX(t)(x) = { e^{x−t}   x < t
           { 0         otherwise   (4)
Problem 10.3.2 Solution
(b) To find the PMF of T1, we view each oscillator test as an independent trial. A success occurs on a trial with probability p if we find a one part in 10⁴ oscillator. The first one part in 10⁴ oscillator is found at time T1 = t if we observe failures on trials 1, . . . , t − 1 followed by a success on trial t. Hence, just as in Example 2.11, T1 has the geometric PMF

PT1(t) = { (1 − p)^{t−1} p   t = 1, 2, . . .
         { 0                 otherwise   (3)
A geometric random variable with success probability p has mean 1/p. This is derived in
Theorem 2.5. The expected time to find the first good oscillator is E[T1 ] = 1/p = 20 minutes.
(c) Since p = 0.05, the probability the first one part in 10⁴ oscillator is found in exactly 20 minutes is PT1(20) = (0.95)^{19}(0.05) = 0.0189.

(d) The time T5 required to find the 5th one part in 10⁴ oscillator is the number of trials needed for 5 successes. T5 is a Pascal random variable. If this is not clear, see Example 2.15 where the Pascal PMF is derived. When we are looking for 5 successes, the Pascal PMF is

PT5(t) = { (t−1 choose 4) p⁵ (1 − p)^{t−5}   t = 5, 6, . . .
         { 0                                 otherwise   (4)
Looking up the Pascal PMF in Appendix A, we find that E[T5 ] = 5/p = 100 minutes. The
following argument is a second derivation of the mean of T5. Once we find the first one part in 10⁴ oscillator, the number of additional trials needed to find the next one part in 10⁴ oscillator once again has a geometric PMF with mean 1/p since each independent trial is a success with probability p. Similarly, the time required to find 5 one part in 10⁴ oscillators is the sum of
five independent geometric random variables. That is,
T5 = K1 + K2 + K3 + K4 + K5 (5)
where each Ki is identically distributed to T1. Since the expectation of the sum equals the sum of the expectations,

E[T5] = E[K1] + E[K2] + · · · + E[K5] = 5E[T1] = 100 minutes.
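The second derivation above can also be checked with a short simulation (a sketch; p = 0.05 as in the problem):

% Sketch: estimate E[T5] as the sum of five independent geometric (p) times
p=0.05; trials=10000;
T5=zeros(trials,1);
for i=1:trials,
    T5(i)=sum(ceil(log(rand(5,1))/log(1-p)));  % five geometric (p) samples
end
disp(mean(T5));   % should be close to 5/p = 100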
Problem 10.3.3 Solution
Once we find the first one part in 10⁴ oscillator, the number of additional tests needed to find the next one part in 10⁴ oscillator once again has a geometric PMF with mean 1/p since each independent trial is a success with probability p. That is, T2 = T1 + T where T is independent and identically distributed to T1. Thus,

E[T2|T1 = 3] = E[T1|T1 = 3] + E[T|T1 = 3]   (1)
             = 3 + E[T] = 23 minutes.   (2)
Note that condition T ≤ t is needed to make sure that the pulse doesn't arrive after time t. The other condition T > t + ln x ensures that the pulse didn't arrive too early and already decay too much. We can express these facts in terms of the CDF of X(t).

FX(t)(x) = 1 − P[X(t) > x] = { 0                          x < 0
                              { 1 + FT(t + ln x) − FT(t)   0 ≤ x < 1
                              { 1                          x ≥ 1   (4)

We can take the derivative of the CDF to find the PDF. However, we need to keep in mind that the CDF has a jump discontinuity at x = 0. In particular, since ln 0 = −∞,

FX(t)(0) = 1 + FT(−∞) − FT(t) = 1 − FT(t)

Hence, when we take a derivative, we will see an impulse at x = 0. The PDF of X(t) is

fX(t)(x) = { (1 − FT(t))δ(x) + fT(t + ln x)/x   0 ≤ x < 1
           { 0                                  otherwise   (6)
Wn both use Xn−1 in their averaging, Wn−1 and Wn are dependent. We can verify this observation
by calculating the covariance of Wn−1 and Wn . First, we observe that for all n,
Thus,

Cov[Wn−1, Wn] = (900 + 916 + 900 + 900)/4 − 900 = 4   (5)

Since Cov[Wn−1, Wn] ≠ 0, Wn and Wn−1 must be dependent.
Since this argument is valid for all k including k = 1, we can conclude that Y1, Y2, . . . are identically distributed. Moreover, since the trials are independent, the number of failures between successes k − 1 and k is independent of the number of failures between any other pair of successive successes. Hence, Y1, Y2, . . . is an iid sequence.
Using the general expression for the PMF, we can write down the answer for each part.
Problem 10.5.2 Solution
Following the instructions given, we express each answer in terms of N(m) which has PMF

PN(m)(n) = { (6m)^n e^{−6m}/n!   n = 0, 1, 2, . . .
           { 0                   otherwise   (1)
(a) The probability of no queries in a one minute interval is PN(1)(0) = 6⁰e^{−6}/0! = 0.00248.
(b) The probability of exactly 6 queries arriving in a one minute interval is PN(1)(6) = 6⁶e^{−6}/6! = 0.161.
(c) The probability of exactly three queries arriving in a one-half minute interval is PN(0.5)(3) = 3³e^{−3}/3! = 0.224.
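These three probabilities are easy to confirm numerically (a sketch, evaluating the Poisson PMF directly):

% Sketch: confirm the three Poisson probabilities computed above
a=[6 6 3]; n=[0 6 3];
disp(a.^n.*exp(-a)./factorial(n));   % approximately 0.0025 0.1606 0.2240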
For t ≥ 2, the customers in service are precisely those customers that arrived in the interval (t−2, t].
The number of such customers has a Poisson PMF with mean λ[t − (t − 2)] = 2λ. The resulting
PMF of N (t) is
PN(t)(n) = { (2λ)^n e^{−2λ}/n!   n = 0, 1, 2, . . .
           { 0                   otherwise   (t ≥ 2)   (2)
Problem 10.5.6 Solution
The times T between queries are independent exponential random variables with PDF

fT(t) = { (1/8)e^{−t/8}   t ≥ 0
        { 0               otherwise   (1)
(b)

P[T ≥ 13|T ≥ 5] = P[T ≥ 13, T ≥ 5]/P[T ≥ 5]   (3)
                = P[T ≥ 13]/P[T ≥ 5] = e^{−13/8}/e^{−5/8} = e^{−1} ≈ 0.368   (4)
(c) Although the times between queries are independent exponential random variables, N(t) is not exactly a Poisson random process because the first query occurs at time t = 0. Recall that in a Poisson process, the first arrival occurs some time after t = 0. However, N(t) − 1 is a Poisson process of rate λ = 1/8. Hence, for n = 0, 1, 2, . . .,
Problem 10.5.8 Solution
When x < 0, e^{−x} > 1 so that P[Ui ≤ e^{−x}] = 1. When x ≥ 0, we have 0 < e^{−x} ≤ 1, implying P[Ui ≤ e^{−x}] = e^{−x}. Combining these facts, we have

P[Xi > x] = { 1        x < 0
            { e^{−x}   x ≥ 0   (2)
Σ_{i=1}^n ln Ui ≥ −t > Σ_{i=1}^{n+1} ln Ui   (5)
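The relation in (5) is the basis of a classical way to generate a Poisson random variable: keep multiplying independent uniform samples until the running product drops below e^{−t}. A hedged Matlab sketch (t is a hypothetical value):

% Sketch: generate one Poisson (t) sample by multiplying uniform samples
% until the product falls below exp(-t), i.e. until sum(log(Ui)) < -t.
t=5; n=0; produ=1;
while produ >= exp(-t),
    produ=produ*rand(1);
    n=n+1;
end
N=n-1;    % N is a Poisson (t) sample; E[N] = t = 5 here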
Problem 10.6.2 Solution
In an interval (t, t + ∆] with an infinitesimal ∆, let Ai denote the event of an arrival of the process
Ni (t). Also, let A = A1 ∪A2 denote the event of an arrival of either process. Since Ni (t) is a Poisson
process, the alternative model says that P [Ai ] = λi ∆. Also, since N1 (t)+N2 (t) is a Poisson process,
the proposed Poisson process model says
Lastly, the conditional probability of a type 1 arrival given an arrival of either type is
P[A1|A] = P[A1A]/P[A] = P[A1]/P[A] = λ1∆/((λ1 + λ2)∆) = λ1/(λ1 + λ2)   (2)
This solution is something of a cheat in that we have used the fact that the sum of Poisson processes
is a Poisson process without using the proposed model to derive this fact.
Now we can consider the special cases arising when t < 2. When 0 ≤ t < 1, every arrival is still in
service. Thus the number in service N (t) equals the number of arrivals and has the PMF
PN(t)(n) = { (λt)^n e^{−λt}/n!   n = 0, 1, 2, . . .
           { 0                   otherwise   (0 ≤ t ≤ 1)   (2)
When 1 ≤ t < 2, let M1 denote the number of customers in the interval (t − 1, t]. All M1 customers arriving in that interval will be in service at time t. The M2 customers arriving in the interval (0, t − 1] must each flip a coin to decide on a one minute or two minute service time. Only those customers choosing the two minute service time will be in service at time t. Since M2 has a Poisson PMF with mean λ(t − 1), the number M2 of those customers in the system at time t has a Poisson PMF with mean λ(t − 1)/2. Finally, the number of customers in service at time t has a Poisson PMF with expected value E[N(t)] = E[M1] + E[M2] = λ + λ(t − 1)/2. Hence, the PMF of N(t) becomes

PN(t)(n) = { (λ(t + 1)/2)^n e^{−λ(t+1)/2}/n!   n = 0, 1, 2, . . .
           { 0                                 otherwise   (1 ≤ t ≤ 2)   (3)
that each Si is in the interval (si, si + ∆] and that N = n. This joint event implies that there were zero arrivals in each interval (si + ∆, si+1]. That is, over the interval [0, T], the Poisson process has exactly one arrival in each interval (si, si + ∆] and zero arrivals in the time period [0, T] − ∪_{i=1}^n (si, si + ∆]. The collection of intervals in which there was no arrival had a total duration of T − n∆. Note that the probability of exactly one arrival in the interval (si, si + ∆] is λ∆e^{−λ∆} and the probability of zero arrivals in a period of duration T − n∆ is e^{−λ(T−n∆)}. In addition, the event of one arrival in each interval (si, si + ∆] and zero events in the period of length T − n∆ are independent events because they consider non-overlapping periods of the Poisson process. Thus,

P[s1 < S1 ≤ s1 + ∆, . . . , sn < Sn ≤ sn + ∆, N = n] = (λ∆e^{−λ∆})^n e^{−λ(T−n∆)}   (1)
                                                     = (λ∆)^n e^{−λT}   (2)
If it seems that the above argument had some “hand-waving,” we now do the derivation of
P [s1 < S1 ≤ s1 + ∆, . . . , sn < Sn ≤ sn + ∆|N = n] in somewhat excruciating detail. (Feel free to
skip the following if you were satisfied with the earlier explanation.)
360
For the interval (s, t], we use the shorthand notation 0(s,t) and 1(s,t) to denote the events of 0
arrivals and 1 arrival respectively. This notation permits us to write
The set of events 0(0,s1 ) , 0(sn +∆,T ) , and for i = 1, . . . , n−1, 0(si +∆,si+1 ) and 1(si ,si +∆) are independent
because each event depends on the Poisson process in a time interval that overlaps none of the
other time intervals. In addition, since the Poisson process has rate λ, P [0(s,t) ] = e−λ(t−s) and
P [1(si ,si +∆) ] = (λ∆)e−λ∆ . Thus,
W1 = X 1 (1)
W2 = X 1 + X 2 (2)
..
.
Wk = X 1 + X 2 + · · · + X k . (3)
361
In terms of matrices, W = AX where A is the lower triangular matrix

A = [1
     1  1
     ⋮      ⋱
     1  · · ·  · · ·  1].   (4)

Since E[W] = AE[X] = 0, it follows from Theorem 5.16 that

fW(w) = (1/|det(A)|) fX(A^{−1}w).   (5)
Since A is a lower triangular matrix, det(A) = 1, the product of its diagonal entries. In addition,
reflecting the fact that each Xn = Wn − Wn−1 ,
A^{−1} = [1
          −1   1
           0  −1   1
           ⋮    ⋱   ⋱   ⋱
           0  · · ·  0  −1   1]   and   A^{−1}W = [W1; W2 − W1; W3 − W2; . . . ; Wk − Wk−1].   (6)
Combining these facts with the observation that fX(x) = ∏_{n=1}^k fXn(xn), we can write

fW(w) = fX(A^{−1}w) = ∏_{n=1}^k fXn(wn − wn−1),   (7)
which completes the missing steps in the proof of Theorem 10.8.
Problem 10.8.3 Solution
In this problem, the daily temperature process results from
Cn = 16[1 − cos(2πn/365)] + 4Xn   (1)
where Xn is an iid random sequence of N [0, 1] random variables. The hardest part of this problem
is distinguishing between the process Cn and the covariance function CC [k].
(c) A model of this type may be able to capture the mean and variance of the daily temperature.
However, one reason this model is overly simple is because day to day temperatures are
uncorrelated. A more realistic model might incorporate the effects of “heat waves” or “cold
spells” through correlated daily temperatures.
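A one-year sample path of this model is easy to simulate (a Matlab sketch, using randn for the iid Gaussian sequence Xn):

% Sketch: simulate one year of the daily temperature model (1)
n=(1:365)';
C=16*(1-cos(2*pi*n/365))+4*randn(365,1);
plot(n,C); xlabel('day n'); ylabel('C_n');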
E[Cn] = E[C0]/2^n + 4 Σ_{i=1}^n E[Xi]/2^{n−i} = 0   (5)
Since C0, X1, X2, . . . are independent (and zero mean), E[C0 Xi] = 0. This implies

CC[m, k] = E[C0²]/2^{2m+k} + 16 Σ_{i=1}^m Σ_{j=1}^{m+k} E[Xi Xj]/(2^{m−i} 2^{m+k−j})   (7)

For i ≠ j, E[Xi Xj] = 0 so that only the i = j terms make any contribution to the double sum. However, at this point, we must consider the cases k ≥ 0 and k < 0 separately. Since each Xi has variance 1, the autocovariance for k ≥ 0 is

CC[m, k] = 1/2^{2m+k} + 16 Σ_{i=1}^m 1/2^{2m+k−2i}   (8)
         = 1/2^{2m+k} + (16/2^k) Σ_{i=1}^m (1/4)^{m−i}   (9)
         = 1/2^{2m+k} + (16/2^k) · (1 − (1/4)^m)/(3/4)   (10)

For k < 0, we can write

CC[m, k] = E[C0²]/2^{2m+k} + 16 Σ_{i=1}^m Σ_{j=1}^{m+k} E[Xi Xj]/(2^{m−i} 2^{m+k−j})   (11)
         = 1/2^{2m+k} + 16 Σ_{i=1}^{m+k} 1/2^{2m+k−2i}   (12)
         = 1/2^{2m+k} + (16/2^{−k}) Σ_{i=1}^{m+k} (1/4)^{m+k−i}   (13)
         = 1/2^{2m+k} + (16/2^{−k}) · (1 − (1/4)^{m+k})/(3/4)   (14)

A general expression that's valid for all m and k is

CC[m, k] = 1/2^{2m+k} + (16/2^{|k|}) · (1 − (1/4)^{min(m,m+k)})/(3/4)   (15)
(c) Since E[Ci ] = 0 for all i, our model has a mean daily temperature of zero degrees Celsius for
the entire year. This is not a reasonable model for a year.
(d) For the month of January, a mean temperature of zero degrees Celsius seems quite reasonable. We can calculate the variance of Cn by evaluating the covariance at n = m. This yields

Var[Cn] = 1/4^n + (16/4^n) · 4(4^n − 1)/3   (16)

Note that the variance is upper bounded by
Problem 10.8.5 Solution
This derivation of the Poisson process covariance is almost identical to the derivation of the Brown-
ian motion autocovariance since both rely on the use of independent increments. From the definition
of the Poisson process, we know that µN (t) = λt. When τ ≥ 0, we can write
By the definition of the Poisson process, N (t + τ ) − N (t) is the number of arrivals in the interval
[t, t + τ ) and is independent of N (t) for τ > 0. This implies
Note that since N(t) is a Poisson random variable, Var[N(t)] = λt. Hence

E[N²(t)] = Var[N(t)] + (E[N(t)])² = λt + (λt)²   (5)
Therefore, for τ ≥ 0,
If τ < 0, then we can interchange the labels t and t+τ in the above steps to show CN (t, τ ) = λ(t+τ ).
For arbitrary t and τ , we can combine these facts to write
Thus,
This implies
Problem 10.9.2 Solution
For an arbitrary set of samples Y (t1 ), . . . , Y (tk ), we observe that Y (tj ) = X(atj ). This implies
Thus,
fY (t1 +τ ),...,Y (tk +τ ) (y1 , . . . , yk ) = fX(at1 +aτ ),...,X(atk +aτ ) (y1 , . . . , yk ) (2)
We see that a time offset of τ for the Y(t) process corresponds to an offset of time τ' = aτ for the X(t) process. Since X(t) is a stationary process,
Problem 10.9.5 Solution
Given A = a, Y(t) = aX(t), which is a special case of Y(t) = aX(t) + b given in Theorem 10.10. Applying the result of Theorem 10.10 with b = 0 yields

fY(t1),...,Y(tn)|A(y1, . . . , yn|a) = (1/a^n) fX(t1),...,X(tn)(y1/a, . . . , yn/a)   (1)

Integrating over the PDF fA(a) yields

fY(t1),...,Y(tn)(y1, . . . , yn) = ∫_0^∞ fY(t1),...,Y(tn)|A(y1, . . . , yn|a) fA(a) da   (2)
                               = ∫_0^∞ (1/a^n) fX(t1),...,X(tn)(y1/a, . . . , yn/a) fA(a) da   (3)

This complicated expression can be used to find the joint PDF of Y(t1 + τ), . . . , Y(tn + τ):

fY(t1+τ),...,Y(tn+τ)(y1, . . . , yn) = ∫_0^∞ (1/a^n) fX(t1+τ),...,X(tn+τ)(y1/a, . . . , yn/a) fA(a) da   (4)

Since X(t) is a stationary process, the joint PDF of X(t1 + τ), . . . , X(tn + τ) is the same as the joint PDF of X(t1), . . . , X(tn). Thus

fY(t1+τ),...,Y(tn+τ)(y1, . . . , yn) = ∫_0^∞ (1/a^n) fX(t1+τ),...,X(tn+τ)(y1/a, . . . , yn/a) fA(a) da   (5)
                                    = ∫_0^∞ (1/a^n) fX(t1),...,X(tn)(y1/a, . . . , yn/a) fA(a) da   (6)
                                    = fY(t1),...,Y(tn)(y1, . . . , yn)   (7)

We can conclude that Y(t) is a stationary process.
In principle, we can calculate P [Aτ ] by integrating fX(t1 +τ ),...,X(tn +τ ) (x1 , . . . , xn ) over the region
corresponding to event Aτ . Since X(t) is a stationary process,
fX(t1 +τ ),...,X(tn +τ ) (x1 , . . . , xn ) = fX(t1 ),...,X(tn ) (x1 , . . . , xn ) (4)
This implies P [Aτ ] does not depend on τ . In particular,
FY (t1 +τ ),...,Y (tn +τ ) (y1 , . . . , yn ) = P [Aτ ] (5)
= P [g(X(t1 )) ≤ y1 , . . . , g(X(tn )) ≤ yn ] (6)
= FY (t1 ),...,Y (tn ) (y1 , . . . , yn ) (7)
Problem 10.10.1 Solution
The autocorrelation function RX (τ ) = δ(τ ) is mathematically valid in the sense that it meets the
conditions required in Theorem 10.12. That is,
RX (τ ) = δ(τ ) ≥ 0 (1)
RX (τ ) = δ(τ ) = δ(−τ ) = RX (−τ ) (2)
RX (τ ) ≤ RX (0) = δ(0) (3)
However, for a process X(t) with the autocorrelation RX (τ ) = δ(τ ), Definition 10.16 says that the
average power of the process is
E X 2 (t) = RX (0) = δ(0) = ∞ (4)
We see that neither E[Y (t)] nor RY (t, τ ) depend on t. Thus Y (t) is a wide sense stationary process.
Since X and Y are uncorrelated, E[XY ] = E[X]E[Y ] = 0. Thus, when we expand E[W (t)W (t + τ )]
and take the expectation, all of the XY cross terms will be zero. This implies
RW (t, τ ) = E X 2 cos 2πf0 t cos 2πf0 (t + τ ) + E Y 2 sin 2πf0 t sin 2πf0 (t + τ ) (4)
to write

RW(t, τ) = (σ²/2)[cos 2πf0τ + cos 2πf0(2t + τ)] + (σ²/2)[cos 2πf0τ − cos 2πf0(2t + τ)]   (8)
         = σ² cos 2πf0τ   (9)
Thus RW (t, τ ) = RW (τ ). Since
E [W (t)] = E [X] cos 2πf0 t + E [Y ] sin 2πf0 t = 0, (10)
we can conclude that W(t) is a wide sense stationary process. However, we note that if E[X²] ≠ E[Y²], then the cos 2πf0(2t + τ) terms in RW(t, τ) would not cancel and W(t) would not be wide sense stationary.
(a) In the problem statement, we are told that X(t) has average power equal to 1. By Defini-
tion 10.16, the average power of X(t) is E[X 2 (t)] = 1.
(b) Since Θ has a uniform PDF over [0, 2π],

fΘ(θ) = { 1/(2π)   0 ≤ θ ≤ 2π
        { 0        otherwise   (1)
The expected value of the random phase cosine is

E[cos(2πfc t + Θ)] = ∫_{−∞}^{∞} cos(2πfc t + θ) fΘ(θ) dθ   (2)
                   = ∫_0^{2π} cos(2πfc t + θ) (1/2π) dθ   (3)
                   = (1/2π) sin(2πfc t + θ)|_0^{2π}   (4)
                   = (1/2π)[sin(2πfc t + 2π) − sin(2πfc t)] = 0   (5)
(c) Since X(t) and Θ are independent,
E [Y (t)] = E [X(t) cos(2πfc t + Θ)] = E [X(t)] E [cos(2πfc t + Θ)] = 0 (6)
Note that the mean of Y (t) is zero no matter what the mean of X(t) since the random phase
cosine has zero mean.
(d) Independence of X(t) and Θ results in the average power of Y(t) being

E[Y²(t)] = E[X²(t) cos²(2πfc t + Θ)]   (7)
         = E[X²(t)] E[cos²(2πfc t + Θ)]   (8)
         = E[cos²(2πfc t + Θ)]   (9)

Note that we have used the fact from part (a) that X(t) has unity average power. To finish the problem, we use the trigonometric identity cos²φ = (1 + cos 2φ)/2. This yields

E[Y²(t)] = E[(1/2)(1 + cos(2π(2fc)t + Θ))] = 1/2   (10)

Note that E[cos(2π(2fc)t + Θ)] = 0 by the argument given in part (b) with 2fc replacing fc.
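The value E[Y²(t)] = 1/2 in (10) can be confirmed by simulation for any fixed t (a sketch; fc, t, and the ±1 model for X(t) are hypothetical choices consistent with unit average power):

% Sketch: Monte Carlo estimate of E[Y^2(t)] for Y(t)=X(t)cos(2*pi*fc*t+Theta)
fc=100; t=0.37; n=1000000;        % hypothetical values
X=sign(rand(n,1)-0.5);            % unit power samples of X(t)
Theta=2*pi*rand(n,1);             % uniform (0,2*pi) phase
Y=X.*cos(2*pi*fc*t+Theta);
disp(mean(Y.^2));                 % close to 0.5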
Problem 10.10.5 Solution
This proof simply parallels the proof of Theorem 10.12. For the first item, RX[0] = RX[m, 0] = E[Xm²]. Since Xm² ≥ 0, we must have E[Xm²] ≥ 0. For the second item, Definition 10.13 implies that

RX[k] = RX[m, k] = E[Xm Xm+k] = E[Xm+k Xm] = RX[m + k, −k]   (1)
Since Xm is wide sense stationary, RX [m + k, −k] = RX [−k]. The final item requires more effort.
First, we note that when Xm is wide sense stationary, Var[Xm ] = CX [0], a constant for all t.
Second, Theorem 4.17 says that
Note that CX [m, k] ≤ |CX [m, k]|, and thus it follows that
(This little step was unfortunately omitted from the proof of Theorem 10.12.) Now for any numbers a, b, and c, if a ≤ b and c ≥ 0, then (a + c)² ≤ (b + c)². Choosing a = CX[m, k], b = CX[0], and c = μ²_X yields

(CX[m, k] + μ²_X)² ≤ (CX[0] + μ²_X)²   (4)

In the above expression, the left side equals (RX[k])² while the right side is (RX[0])², which proves the third part of the theorem.
E[X̄m] = (1/(2m + 1)) Σ_{n=−m}^m E[Xn] = (1/(2m + 1)) Σ_{n=−m}^m μX = μX   (2)
To show consistency, it is sufficient to show that lim_{m→∞} Var[X̄m] = 0. First, we observe that X̄m − μX = (1/(2m + 1)) Σ_{n=−m}^m (Xn − μX). This implies

Var[X̄m] = E[((1/(2m + 1)) Σ_{n=−m}^m (Xn − μX))²]   (3)
         = (1/(2m + 1)²) E[(Σ_{n=−m}^m (Xn − μX))(Σ_{n'=−m}^m (Xn' − μX))]   (4)
         = (1/(2m + 1)²) Σ_{n=−m}^m Σ_{n'=−m}^m E[(Xn − μX)(Xn' − μX)]   (5)
         = (1/(2m + 1)²) Σ_{n=−m}^m Σ_{n'=−m}^m CX[n − n'].   (6)
We note that

Σ_{n'=−m}^m CX[n − n'] ≤ Σ_{n'=−m}^m |CX[n − n']|   (7)
                       ≤ Σ_{n'=−∞}^∞ |CX[n − n']| = Σ_{k=−∞}^∞ |CX(k)| < ∞.   (8)

Hence, letting K = Σ_{k=−∞}^∞ |CX(k)|, we have

Var[X̄m] ≤ (1/(2m + 1)²) Σ_{n=−m}^m K = K/(2m + 1).   (9)
In addition,
(b) To examine whether X(t) and W (t) are jointly wide sense stationary, we calculate
Since W (t) and X(t) are both wide sense stationary and since RW X (t, τ ) depends only on
the time difference τ , we can conclude from Definition 10.18 that W (t) and X(t) are jointly
wide sense stationary.
(a) Since E[X1 (t)] = E[X(t + a)] = µX and
RX1 (t, τ ) = E [X1 (t)X1 (t + τ )] (1)
= E [X(t + a)X(t + τ + a)] (2)
= RX (τ ), (3)
we have verified that X1 (t) is wide sense stationary. Now we calculate the cross correlation
RXX1 (t, τ ) = E [X(t)X1 (t + τ )] (4)
= E [X(t)X(t + τ + a)] (5)
= RX (τ + a). (6)
Since RXX1 (t, τ ) depends on the time difference τ but not on the absolute time t, we conclude
that X(t) and X1 (t) are jointly wide sense stationary.
(b) Since E[X2 (t)] = E[X(at)] = µX and
RX2 (t, τ ) = E [X2 (t)X2 (t + τ )] (7)
= E [X(at)X(a(t + τ ))] (8)
= E [X(at)X(at + aτ )] = RX (aτ ), (9)
we have verified that X2 (t) is wide sense stationary. Now we calculate the cross correlation
RXX2 (t, τ ) = E [X(t)X2 (t + τ )] (10)
= E [X(t)X(a(t + τ ))] (11)
= RX ((a − 1)t + τ ). (12)
Except for the trivial case when a = 1 and X2(t) = X(t), RXX2(t, τ) depends on both the absolute time t and the time difference τ. Thus we conclude that X(t) and X2(t) are not jointly wide sense stationary.
(c) We have already verified that RY (t, τ ) depends only on the time difference τ . Since E[Y (t)] =
E[X(t − t0 )] = µX , we have verified that Y (t) is wide sense stationary.
(d) Since X(t) and Y (t) are wide sense stationary and since we have shown that RXY (t, τ )
depends only on τ , we know that X(t) and Y (t) are jointly wide sense stationary.
Comment: This problem is badly designed since the conclusions don’t depend on the specific
RX (τ ) given in the problem text. (Sorry about that!)
Problem 10.12.1 Solution
Writing Y(t + τ) = ∫_0^{t+τ} N(v) dv permits us to write the autocorrelation of Y(t) as

RY(t, τ) = E[Y(t)Y(t + τ)] = E[∫_0^t ∫_0^{t+τ} N(u)N(v) dv du]   (1)
         = ∫_0^t ∫_0^{t+τ} E[N(u)N(v)] dv du   (2)
         = ∫_0^t ∫_0^{t+τ} αδ(u − v) dv du.   (3)
At this point, it matters whether τ ≥ 0 or if τ < 0. When τ ≥ 0, then v ranges from 0 to t + τ and
at some point in the integral over v we will have v = u. That is, when τ ≥ 0,
RY(t, τ) = ∫_0^t α du = αt.   (4)

When τ < 0, then we must reverse the order of integration. In this case, when the inner integral is over u, we will have u = v at some point so that

RY(t, τ) = ∫_0^{t+τ} ∫_0^t αδ(u − v) du dv = ∫_0^{t+τ} α dv = α(t + τ).   (5)
(c) The general form of the multivariate density for X(t1), X(t2) is

fX(t1),X(t2)(x1, x2) = [1/((2π)^{k/2}|C|^{1/2})] e^{−(1/2)(x−μX)'C^{−1}(x−μX)}   (3)

where k = 2, x = [x1 x2]' and μX = [μ1 μ2]'. Hence,

1/((2π)^{k/2}|C|^{1/2}) = 1/(2π √(σ1²σ2²(1 − ρ²))).   (4)
Furthermore, the exponent is

−(1/2)(x − μX)'C^{−1}(x − μX)
   = −(1/2)[x1 − μ1   x2 − μ2] (1/(1 − ρ²)) [1/σ1²   −ρ/(σ1σ2); −ρ/(σ1σ2)   1/σ2²] [x1 − μ1; x2 − μ2]   (5)
   = −[((x1 − μ1)/σ1)² − 2ρ(x1 − μ1)(x2 − μ2)/(σ1σ2) + ((x2 − μ2)/σ2)²] / (2(1 − ρ²))   (6)

Plugging in each piece into the joint PDF fX(t1),X(t2)(x1, x2) given above, we obtain the bivariate Gaussian PDF.
Since X is a Gaussian random vector and W = AX with A a rank n matrix, Theorem 5.16 implies
that W is a Gaussian random vector.
In noisycosine.m, we use a function subsample.m to obtain the discrete time sample functions.
In fact, subsample is hardly necessary since it’s such a simple one-line Matlab function:
function y=subsample(x,n)
%input x(1), x(2) ...
%output y(1)=x(1), y(2)=x(1+n), y(3)=x(2n+1)
y=x(1:n:length(x));
The following commands will simulate the switch for 600 minutes, producing the vector M of samples of M(t) each minute, the vector Mavg which is the sequence of time average estimates, and a plot of both versus t:

>> t=(1:600)';
>> M=simswitch(10,0.1,t);
>> Mavg=cumsum(M)./t;
>> plot(t,M,t,Mavg);

(The plot shows M(t) and the time average of M(t) over 0 ≤ t ≤ 600, with both settling near 100.)
From the figure, it appears that the time average is converging to a value in the neighborhood of 100.
In particular, because the switch is initially empty with M (0) = 0, it takes a few hundred minutes
for the time average to climb to something close to 100. Following the problem instructions, we
can write the following short program to examine ten simulation runs:
function Mavg=simswitchavg(T,k)
%Usage: Mavg=simswitchavg(T,k)
%simulate k runs of duration T of the
%telephone switch in Chapter 10
%and plot the time average of each run
t=(1:T)'; %the time average is computed over T minutes
%each column of Mavg is a time average sample run
Mavg=zeros(T,k);
for n=1:k,
M=simswitch(10,0.1,t);
Mavg(:,n)=cumsum(M)./t;
end
plot(t,Mavg);
From the graph, one can see that even after T = 600 minutes, each sample run produces a time
average M̄600 around 100. Note that in Chapter 12, we will be able to use Markov chains to prove that the
expected number of calls in the switch is in fact 100. However, note that even if T is large, M T is
still a random variable. From the above plot, one might guess that M 600 has a standard deviation
of perhaps σ = 2 or σ = 3. An exact calculation of the variance of M 600 is fairly difficult because
it is a sum of dependent random variables, each of which has a PDF that is in itself reasonably
difficult to calculate.
M̄(T) = (1/T) Σ_{t=1}^T M(t).   (1)
We form and plot the time average using the following commands, which will yield a plot vaguely similar to that shown below.

>> t=(1:600)';
>> M=simswitchd(100,1,t);
>> Mavg=cumsum(M)./t;
>> plot(t,Mavg);

(The plot shows the time average of M(t) staying between roughly 95 and 105 over 0 ≤ t ≤ 600.)
We used the word “vaguely” because at t = 1, the time average is simply the number of arrivals in
the first minute, which is a Poisson (α = 100) random variable which has not been averaged. Thus,
the left side of the graph will be random for each run. As expected, the time average appears to
be converging to 100.
To simplify our algebra, we assume for large n that 0.1λT is an integer. In this case, n = 1.1λT
and
P[Sn > T] ≈ Q(−0.1λT/√(1.1λT)) = Φ(√(λT/110))   (3)
Thus for large λT , P [K = 1] is very small. For example, if λT = 1,000, P [Sn > T ] ≈ Φ(3.01) =
0.9987. If λT = 10,000, P [Sn > T ] ≈ Φ(9.5).
Problem 10.13.5 Solution
Following the problem instructions, we can write the function newarrivals.m. For convenience,
here are newarrivals and poissonarrivals side by side.
function s=newarrivals(lam,T)
%Usage s=newarrivals(lam,T)
%Returns Poisson arrival times
%s=[s(1) ... s(n)] over [0,T]
n=poissonrv(lam*T,1);
s=sort(T*rand(n,1));

function s=poissonarrivals(lam,T)
%arrival times s=[s(1) ... s(n)]
% s(n)<= T < s(n+1)
n=ceil(1.1*lam*T);
s=cumsum(exponentialrv(lam,n));
while (s(length(s))< T),
  s_new=s(length(s))+ ...
      cumsum(exponentialrv(lam,n));
  s=[s; s_new];
end
s=s(s<=T);
Clearly the code for newarrivals is shorter, more readable, and perhaps, with the help of
Problem 10.6.4, more logical than poissonarrivals. Unfortunately this doesn’t mean the code
runs better. Here are some cputime comparisons:
>> t=cputime;s=poissonarrivals(1,100000);t=cputime-t
t =
0.1110
>> t=cputime;s=newarrivals(1,100000);t=cputime-t
t =
0.5310
>> t=cputime;poissonrv(100000,1);t=cputime-t
t =
0.5200
>>
Unfortunately, these results were highly repeatable. The function poissonarrivals generated 100,000 arrivals of a rate 1 Poisson process in roughly 0.1 seconds of cpu time. The same task took newarrivals about 0.5 seconds, or roughly 5 times as long! In the newarrivals code, the culprit is the way poissonrv generates a single Poisson random variable with expected value 100,000. In this case, poissonrv generates the first 200,000 terms of the Poisson PMF! This required calculation is so large that it dominates the work needed to generate 100,000 uniform random
numbers. In fact, this suggests that a more efficient way to generate a Poisson (α) random variable
N is to generate arrivals of a rate α Poisson process until the N th arrival is after time 1.
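That suggestion can be sketched as follows (an assumed implementation using exponential interarrival times, not code from the archive):

function N=poissonrv_sketch(alpha)
% Sketch: generate a Poisson (alpha) sample N by counting arrivals of a
% rate alpha Poisson process in [0,1]: keep adding exponential (alpha)
% interarrival times until the running sum exceeds 1.
N=0; s=-log(rand(1))/alpha;
while s<=1,
    N=N+1;
    s=s+(-log(rand(1))/alpha);
end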
In brownbarrier shown below, pb(1) tracks how often the process touches the left barrier at
−b while pb(2) tracks how often the right side barrier at b is reached. By symmetry, P [X(t) = b] =
P [X(t) = −b]. Thus if T is chosen very large, we should expect pb(1)=pb(2). The extent to which
this is not the case gives an indication of the extent to which we are merely estimating the barrier
probability. Here is the code and, for each T ∈ {10,000, 100,000, 1,000,000}, here are two sample runs:
The sample runs show that for α = 0.1 and b = 1 that the
Otherwise, the numerical simulations are not particularly instructive. Perhaps the most important
thing to understand is that the Brownian motion process with barriers is very different from the
ordinary Brownian motion process. Remember that for ordinary Brownian motion, the variance
of X(t) always increases linearly with t. For the process with barriers, X 2 (t) ≤ b2 and thus
Var[X(t)] ≤ b2 . In fact, for the process with barriers, the PDF of X(t) converges to a limit as t
becomes large. If you’re curious, you shouldn’t have much trouble digging in the library to find out
more.
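The listing of brownbarrier did not survive in this copy of the solution. A minimal sketch of such a simulation (an assumption about the implementation, not the author's original code) is:
function pb=brownbarrier(alpha,b,T)
%Sketch: Brownian motion with variance parameter alpha and
%barriers at -b and +b, simulated over T unit-time steps.
%pb(1) and pb(2) estimate how often the -b and +b barriers are hit.
x=0; pb=[0 0];
for t=1:T,
   x=x+sqrt(alpha)*randn(1,1);
   if x<=-b, x=-b; pb(1)=pb(1)+1; end
   if x>=b, x=b; pb(2)=pb(2)+1; end
end
pb=pb/T;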
time between arrivals to the switch. In this case, calls depart the switch faster than they arrive
which is impossible because each departing call was an arriving call at an earlier time. Similarly,
if 1/µ > 1/λ , then calls would be departing from the switch more slowly than they arrived. This
can happen to an overloaded switch; however, it’s impossible in this system because each arrival
departs after an exponential time. Thus the only possibility is that 1/µ = 1/λ. In the program
simswitchdepart.m, we plot a histogram of departure times for a switch with arrival rate λ against
the scaled exponential (λ) PDF λe−λx b where b is the histogram bin size. Here is the code:
function I=simswitchdepart(lambda,mu,T)
%Usage: I=simswitchdepart(lambda,mu,T)
%Poisson arrivals, rate lambda
%Exponential (mu) call duration
%Over time [0,T], returns I,
%the vector of inter-departure times
%M(i) = no. of calls at time t(i)
s=poissonarrivals(lambda,T);
y=s+exponentialrv(mu,length(s));
y=sort(y);
n=length(y);
I=y-[0; y(1:n-1)]; %interdeparture times
imax=max(I);b=ceil(n/100);
id=imax/b; x=id/2:id:imax;
pd=hist(I,x); pd=pd/sum(pd);
px=exponentialpdf(lambda,x)*id;
plot(x,px,x,pd);
xlabel(’\it x’);ylabel(’Probability’);
legend(’Exponential PDF’,’Relative Frequency’);
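For instance, the figure below can be reproduced with a command such as (the particular parameter values here are an arbitrary choice, not necessarily the author's):
>> I=simswitchdepart(10,1,1000);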
[Figure: the relative frequency of the inter-departure times plotted against the scaled exponential PDF for 0 ≤ x ≤ 1; the two curves nearly coincide.]
As seen in the figure, the match is quite good. Although this is not a carefully designed statistical
test of whether the inter-departure times are exponential random variables, it is enough evidence
that one may want to pursue whether such a result can be proven.
In fact, the switch in this problem is an example of an M/M/∞ queuing system for which
it has been shown that not only do the inter-departure times have an exponential distribution, but the
steady-state departure process is a Poisson process. For the curious reader, details can be found,
for example, in the text Discrete Stochastic Processes by Gallager.
Problem Solutions – Chapter 11
We see that RY (t, τ ) only depends on the time difference τ . Thus Y (t) is wide sense stationary.
Problem 11.2.1 Solution
= (1/9) Σ_{i=−1}^{1} Σ_{j=−1}^{1} RX[n + i − j]   (4)
= (1/9) (RX[n + 2] + 2RX[n + 1] + 3RX[n] + 2RX[n − 1] + RX[n − 2])   (5)
Substituting in RX [n] yields
RY[n] =  1/3   n = 0
         2/9   |n| = 1
         1/9   |n| = 2
         0     otherwise   (6)
RX[k] = RX(kTs) = 10 [sin(2000πkTs) + sin(1000πkTs)]/(2000πkTs)   (1)
       = 20 [sin(0.5πk) + sin(0.25πk)]/(πk)   (2)
       = 10 sinc(0.5k) + 5 sinc(0.25k)   (3)
(b) Theorem 11.5 also says that the output autocorrelation is
RW[n] = Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} hi hj RY[n + i − j]   (2)
      = Σ_{i=0}^{1} Σ_{j=0}^{1} RY[n + i − j]   (3)
      = RY[n − 1] + 2RY[n] + RY[n + 1].   (4)
For n = −3,
RW [−3] = RY [−4] + 2RY [−3] + RY [−2] = RY [−2] = 0.5 (5)
Following the same procedure, it's easy to show that RW[n] is nonzero only for |n| = 0, 1, 2, 3.
Specifically,
RW[n] =  0.5   |n| = 3
         3     |n| = 2
         7.5   |n| = 1
         10    n = 0
         0     otherwise   (6)
(c) The second moment of the output is E[Wn2 ] = RW [0] = 10. The variance of Wn is
Var[Wn] = E[Wn^2] − (E[Wn])^2 = 10 − 2^2 = 6   (7)
(d) This part doesn’t require any probability. It just checks your knowledge of linear systems
and convolution. There is a bit of confusion because hn is used to denote both the filter
that transforms Xn to Yn as well as the filter that transforms Yn to Wn . To avoid confusion,
we will use ĥn to denote the filter that transforms Xn to Yn . Using Equation (11.25) for
discrete-time convolution, we can write
Wn = Σ_{j=−∞}^{∞} hj Yn−j,   Yn−j = Σ_{i=−∞}^{∞} ĥi Xn−j−i.   (8)
For each j, we make the substitution k = i + j, and then reverse the order of summation to
obtain
Wn = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} hj ĥk−j Xn−k = Σ_{k=−∞}^{∞} (Σ_{j=−∞}^{∞} hj ĥk−j) Xn−k.   (10)
That is, the filter gn is the convolution of the filters ĥn and hn .
In the context of our particular problem, the filter ĥn that transforms Xn to Yn is given in
Example 11.5. The two filters are
ĥ = [ĥ0  ĥ1] = [1/2  1/2],   h = [h0  h1] = [1  1].   (12)
Keep in mind that hn = ĥn = 0 for n < 0 or n > 1. From Equation (11),
gk = ĥk + ĥk−1 =  1/2   k = 0
                   1     k = 1
                   1/2   k = 2
                   0     otherwise   (13)
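Since gn is just the discrete convolution of ĥn and hn, (13) is easy to verify numerically (this check is not part of the original solution):
>> g=conv([1/2 1/2],[1 1])
g =
    0.5000    1.0000    0.5000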
= Σ_{i=0}^{1} Σ_{j=0}^{1} hi hj RY[n + i − j]   (3)
For n = −3,
RV[−3] = 2RY[−3] − RY[−4] − RY[−2] = −RY[−2] = −0.5.   (5)
Following the same procedure, it's easy to show that RV[n] is nonzero only for |n| = 0, 1, 2, 3.
Specifically,
RV[n] =  −0.5   |n| = 3
         −1     |n| = 2
         0.5    |n| = 1
         2      n = 0
         0      otherwise   (6)
(c) Since E[Vn] = 0, the variance of the output is Var[Vn] = E[Vn^2] = RV[0] = 2.   (7)
(d) This part doesn’t require any probability. It just checks your knowledge of linear systems
and convolution. There is a bit of confusion because hn is used to denote both the filter that
transforms Xn to Yn as well as the filter that transforms Yn to Vn . To avoid confusion, we will
use ĥn to denote the filter that transforms Xn to Yn . Using Equation (11.25) for discrete-time
convolution, we can write
Vn = Σ_{j=−∞}^{∞} hj Yn−j,   Yn−j = Σ_{i=−∞}^{∞} ĥi Xn−j−i.   (8)
For the inner sum, we make the substitution k = i+j and then reverse the order of summation
to obtain
Vn = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} hj ĥk−j Xn−k = Σ_{k=−∞}^{∞} (Σ_{j=−∞}^{∞} hj ĥk−j) Xn−k.   (10)
That is, the filter fn is the convolution of the filters ĥn and hn . In the context of our particular
problem, the filter ĥn that transforms Xn to Yn is given in Example 11.5. The two filters are
ĥ = [ĥ0  ĥ1] = [1/2  1/2],   h = [h0  h1] = [1  −1].   (12)
Keep in mind that hn = ĥn = 0 for n < 0 or n > 1. From Equation (11),
fk = ĥk − ĥk−1 =  1/2    k = 0
                   0      k = 1
                   −1/2   k = 2
                   0      otherwise   (13)
Problem 11.2.6 Solution
The mean of Yn = (Xn + Yn−1 )/2 can be found by realizing that Yn is an infinite sum of the Xi ’s.
Yn = (1/2)Xn + (1/4)Xn−1 + (1/8)Xn−2 + · · ·   (1)
Since the Xi ’s are each of zero mean, the mean of Yn is also 0. The variance of Yn can be expressed
as
Var[Yn] = (1/4 + 1/16 + 1/64 + · · ·) Var[X] = Σ_{i=1}^{∞} (1/4)^i σ^2 = (1/(1 − 1/4) − 1) σ^2 = σ^2/3   (2)
Since E[Zi ] = 0, the mean function of the Xn process is
E[Xn] = c^n E[X0] + Σ_{i=0}^{n−1} c^(n−1−i) E[Zi] = c^n E[X0]   (7)
Thus, for Xn to be a zero mean process, we require that E[X0 ] = 0. The autocorrelation function
can be written as
RX[n, k] = E[Xn Xn+k] = E[(c^n X0 + Σ_{i=0}^{n−1} c^(n−1−i) Zi)(c^(n+k) X0 + Σ_{j=0}^{n+k−1} c^(n+k−1−j) Zj)]   (8)
RX[n, k] = c^(2n+k) Var[X0] + Σ_{i=0}^{n−1} c^(2(n−1)+k−2i) σ̄^2 = c^(2n+k) Var[X0] + σ̄^2 c^k (1 − c^(2n))/(1 − c^2)   (9)
= c^(2n+k) Var[X0] + c^(−k) Σ_{j=0}^{n+k−1} c^(2(n+k−1−j)) σ̄^2   (12)
= c^(2n+k) σ^2 + σ̄^2 c^(−k) (1 − c^(2(n+k)))/(1 − c^2)   (13)
= (σ̄^2/(1 − c^2)) c^(−k) + c^(2n+k) (σ^2 − σ̄^2/(1 − c^2))   (14)
σ̄ 2 = (1 − c2 )σ 2 (15)
By continuing the same procedure, we can conclude that
Yn = Σ_{j=0}^{n} a^(j+1) Xn−j + a^n Y0.   (4)
To calculate the autocorrelation RY [m, k], we consider first the case when k ≥ 0.
CY[m, k] = E[(Σ_{i=0}^{m} a^(m−i+1) Xi)(Σ_{j=0}^{m+k} a^(m+k−j+1) Xj)] = Σ_{i=0}^{m} Σ_{j=0}^{m+k} a^(m−i+1) a^(m+k−j+1) E[Xi Xj]   (7)
Problem 11.3.1 Solution
Since the process Xn has expected value E[Xn] = 0, we know that CX(k) = RX(k) = 2^(−|k|). Thus
X = [X1 X2 X3]' has covariance matrix
CX = [2^0 2^(−1) 2^(−2); 2^(−1) 2^0 2^(−1); 2^(−2) 2^(−1) 2^0] = [1 1/2 1/4; 1/2 1 1/2; 1/4 1/2 1].   (1)
If we are using Matlab for calculations, it is best to declare the problem solved at this point.
However, if you like algebra, we can write out the PDF in terms of the variables x1, x2 and x3. To
do so we find that the inverse covariance matrix is
CX^(−1) = [4/3 −2/3 0; −2/3 5/3 −2/3; 0 −2/3 4/3].   (3)
A little bit of algebra will show that det(CX) = 9/16.
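If you do want the algebra checked, a couple of Matlab commands will confirm the numbers above (a verification sketch, not part of the original solution):
>> CX=toeplitz([1 1/2 1/4]);
>> det(CX)      %should give 9/16 = 0.5625
>> inv(CX)      %should reproduce the inverse covariance matrix in (3)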
We note that the components of X are iid Gaussian (0, 1) random variables. Hence X has
covariance matrix CX = I, the identity matrix. Since Y3 = HX,
CY3 = H CX H' = H H' = [2 −2 1; −2 3 −2; 1 −2 3].   (3)
Some calculation (by hand or by Matlab) will show that det(CY3 ) = 3 and that
CY3^(−1) = (1/3) [5 4 1; 4 5 2; 1 2 2].   (4)
Some algebra will show that
y' CY3^(−1) y = (5y1^2 + 5y2^2 + 2y3^2 + 8y1y2 + 2y1y3 + 4y2y3)/3.   (5)
This implies Y3 has PDF
fY3(y) = (1/((2π)^(3/2) [det(CY3)]^(1/2))) exp(−(1/2) y' CY3^(−1) y)   (6)
       = (1/((2π)^(3/2) √3)) exp(−(5y1^2 + 5y2^2 + 2y3^2 + 8y1y2 + 2y1y3 + 4y2y3)/6).   (7)
(b) To find the PDF of Y2 = [Y1 Y2]', we start by observing that the covariance matrix of Y2
is just the upper left 2 × 2 submatrix of CY3. That is,
CY2 = [2 −2; −2 3]   and   CY2^(−1) = [3/2 1; 1 1].   (8)
Since det(CY2) = 2, it follows that
fY2(y) = (1/(2π [det(CY2)]^(1/2))) exp(−(1/2) y' CY2^(−1) y)   (9)
       = (1/(2π√2)) exp(−(3y1^2/2 + 2y1y2 + y2^2)/2).   (10)
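A quick numerical check of the determinants and inverses used above (not part of the original solution; it only uses the matrices already given):
>> CY3=[2 -2 1;-2 3 -2;1 -2 3];
>> [det(CY3) det(CY3(1:2,1:2))]   %should give 3 and 2
>> 3*inv(CY3)                     %should reproduce the matrix in (4)
>> inv(CY3(1:2,1:2))              %should reproduce the inverse in (8)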
Since Y = HX,
CY = H CX H' = [3/2 −3/8 9/16; −3/8 3/2 −3/8; 9/16 −3/8 3/2].   (4)
Some calculation (by hand or preferably by Matlab) will show that det(CY ) = 675/256 and that
CY^(−1) = (1/15) [12 2 −4; 2 11 2; −4 2 12].   (5)
Some algebra will show that
y' CY^(−1) y = (12y1^2 + 11y2^2 + 12y3^2 + 4y1y2 − 8y1y3 + 4y2y3)/15.   (6)
This implies Y has PDF
fY(y) = (1/((2π)^(3/2) [det(CY)]^(1/2))) exp(−(1/2) y' CY^(−1) y)   (7)
      = (16/((2π)^(3/2) 15√3)) exp(−(12y1^2 + 11y2^2 + 12y3^2 + 4y1y2 − 8y1y3 + 4y2y3)/30).   (8)
This solution is another demonstration of why the PDF of a Gaussian random vector should be
left in vector form.
Comment: We know from Theorem 11.5 that Yn is a stationary Gaussian process. As a result,
the random variables Y1 , Y2 and Y3 are identically distributed and CY is a symmetric Toeplitz
matrix. This might make one think that the PDF fY (y) should be symmetric in the variables y1 ,
y2 and y3 . However, because Y2 is in the middle of Y1 and Y3 , the information provided by Y1 and
Y3 about Y2 is different than the information Y1 and Y2 convey about Y3 . This fact appears as
asymmetry in fY (y).
Since Y = HX,
CY = H CX H' = [3/2 3/8 −9/16; 3/8 3/2 3/8; −9/16 3/8 3/2].   (4)
Some calculation (preferably by Matlab) will show that det(CY ) = 297/128 and that
CY^(−1) = [10/11 −1/3 14/33; −1/3 5/6 −1/3; 14/33 −1/3 10/11].   (5)
Some algebra will show that
y' CY^(−1) y = (10/11)y1^2 + (5/6)y2^2 + (10/11)y3^2 − (2/3)y1y2 + (28/33)y1y3 − (2/3)y2y3.   (6)
This implies Y has PDF
fY(y) = (1/((2π)^(3/2) [det(CY)]^(1/2))) exp(−(1/2) y' CY^(−1) y)   (7)
      = (8√2/((2π)^(3/2) 3√33)) exp(−(5/11)y1^2 − (5/12)y2^2 − (5/11)y3^2 + (1/3)y1y2 − (14/33)y1y3 + (1/3)y2y3).   (8)
This solution is yet another demonstration of why the PDF of a Gaussian random vector should
be left in vector form.
By expanding the square and taking the expectation, each cross-term is of the form E[Xi Xj ] =
RX [i − j], so that
e∗L = E[(−(1/7)Xn−1 + (6/7)Xn − Xn+1)^2]   (7)
    = (1/49)RX[0] − (12/49)RX[1] + (2/7)RX[2] + (36/49)RX[0] − (12/7)RX[1] + RX[0]   (8)
    = (86/49)RX[0] − (96/49)RX[1] + (2/7)RX[2] = 3/7.   (9)
This direct method is already tedious, even for a simple filter of order M = 2. A better way to
calculate the mean square error is to recall that Theorem 11.9 is just Theorem 9.7 expressed in
the terminology of filters. Expressing part (c) of Theorem 9.7 in terms of the linear prediction filter
h, the mean square error of the predictor is
e∗L = Var[Xn+1] − ←h' RXnXn+1   (10)
    = RX[0] − ←h' [RX[2]; RX[1]]   (11)
    = 1 − [−1/7  6/7] [1/2; 3/4] = 3/7.   (12)
For an arbitrary filter order M , Equation (10) is a much simpler way to compute the mean square
error.
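The same numbers drop out of a few lines of Matlab (a verification sketch, not part of the original solution):
RXn=[1 3/4; 3/4 1];        %matrix of RX[0], RX[1]
RXnXnp1=[1/2; 3/4];        %[RX[2]; RX[1]]
hrev=RXn\RXnXnp1           %the reversed filter, [-1/7; 6/7]
eL=1-hrev'*RXnXnp1         %the mean square error, 3/7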
where
RXn = [RX[0] RX[1]; RX[1] RX[0]] = [1.1 0.75; 0.75 1.1]   (2)
and
RXnXn+1 = E[[Xn−1; Xn] Xn+1] = [RX[2]; RX[1]] = [0.5; 0.75].   (3)
Thus the filter vector h satisfies
←h = [h1; h0] = [1.1 0.75; 0.75 1.1]^(−1) [0.5; 0.75] = [−0.0193; 0.6950].   (4)
Thus h = [0.6950  −0.0193] and the optimum linear predictor of Xn+1 given Xn and Xn−1 is
X̂n+1 = ←h' [Xn−1; Xn] = [−0.0193  0.6950] [Xn−1; Xn] = −0.0193Xn−1 + 0.6950Xn.   (5)
To find the mean square error of this predictor, we can calculate it directly as
e∗L = E[(X̂n+1 − Xn+1)^2] = E[(−0.0193Xn−1 + 0.6950Xn − Xn+1)^2].   (6)
We can expand the square and take the expectation term by term since each cross-term is of the
form E[Xi Xj ] = RX [i − j]. This approach is followed in the solution of Problem 11.4.1 and it is
quite tedious. A better way to calculate the mean square error is to recall that Theorem 11.9 is
just Theorem 9.7 expressed in the terminology of filters. Expressing part (c) of Theorem 9.7 in
terms of the linear prediction filter h, the mean square error of the predictor is
e∗L = Var[Xn+1] − ←h' RXnXn+1   (7)
    = RX[0] − ←h' [RX[2]; RX[1]]   (8)
    = 1.1 − [−0.0193  0.6950] [1/2; 3/4] = 0.5884.   (9)
Comment: It is instructive to compare this solution to the solution of Problem 11.4.1 where the
random process, denoted X̃n here to distinguish it from Xn in this problem, has autocorrelation
function
RX̃[k] = 1 − 0.25|k| for |k| ≤ 4, and 0 otherwise.   (10)
The difference is simply that RX̃ [0] = 1, rather than RX [0] = 1.1 as in this problem. This difference
corresponds to adding an iid noise sequence to X̃n to create Xn . That is,
Xn = X̃n + Zn (11)
where Zn is an iid additive noise sequence with autocorrelation function RZ [k] = 0.1δ[k] that is
independent of the Xn process. Thus Xn in this problem can be viewed as a noisy version of
X̃n in Problem 11.4.1. Because the X̃n process is less noisy, the optimal predictor filter of X̃n+1
given X̃n−1 and X̃n is h̃ = [6/7  −1/7] = [0.8571  −0.1429], which places more emphasis on the
current value X̃n in predicting the next value.
In addition, the mean squared error of the predictor of X̃n+1 is only 3/7 = 0.4285, which is less
than 0.5884. Not surprisingly, the noise in the Xn process reduces the performance of the predictor.
This implies
←h = [1+η^2  c; c  1+η^2]^(−1) [c; 1]   (3)
   = (1/((1+η^2)^2 − c^2)) [1+η^2  −c; −c  1+η^2] [c; 1]   (4)
   = (1/((1+η^2)^2 − c^2)) [cη^2; 1+η^2−c^2].   (5)
Note that we always find that e∗L < Var[Xn ] = 1 simply because the optimal estimator cannot be
worse than the blind estimator that ignores the observation Yn .
From Theorem 9.7, the mean square error of the filter output is
Equations (3) and (4) are general expressions for the mean square error of the optimal linear filter
that can be applied to any situation described by Theorem 11.11.
To apply this result to the problem at hand, we observe that RX[0] = c^0 = 1 and that
←h = (1/((1+η^2)^2 − c^2)) [cη^2; 1+η^2−c^2],   RXnXn = [RX[1]; RX[0]] = [c; 1].   (5)
This implies
e∗L = RX[0] − ←h' RXnXn   (6)
    = 1 − (1/((1+η^2)^2 − c^2)) [cη^2  1+η^2−c^2] [c; 1]   (7)
    = 1 − (c^2η^2 + 1 + η^2 − c^2)/((1+η^2)^2 − c^2)   (8)
    = η^2 (1+η^2−c^2)/((1+η^2)^2 − c^2).   (9)
The remaining question is what value of c minimizes the mean square error e∗L . The usual
approach is to set the derivative de∗L/dc to zero. This would yield the incorrect answer c = 0. In fact,
evaluating the second derivative at c = 0 shows that d^2e∗L/dc^2 evaluated at c = 0 is negative. Thus the mean square error e∗L
is maximum at c = 0. For a more careful analysis, we observe that e∗L = η^2 f(x) where
f(x) = (a − x)/(a^2 − x),   (10)
with x = c2 , and a = 1 + η 2 . In this case, minimizing f (x) is equivalent to minimizing the mean
square error. Note that for RX [k] to be a respectable autocorrelation function, we must have
|c| ≤ 1. Thus we consider only values of x in the interval 0 ≤ x ≤ 1. We observe that
df(x)/dx = −(a^2 − a)/(a^2 − x)^2   (11)
Since a > 1, the derivative is negative for 0 ≤ x ≤ 1. This implies the mean square error is
minimized by making x as large as possible, i.e., x = 1. Thus c = 1 minimizes the mean square
error. In fact c = 1 corresponds to the autocorrelation function RX [k] = 1 for all k. Since each Xn
has zero expected value, every pair of sample Xn and Xm has correlation coefficient
ρXn,Xm = Cov[Xn, Xm]/√(Var[Xn] Var[Xm]) = RX[n − m]/RX[0] = 1.   (12)
That is, c = 1 corresponds to a degenerate process in which every pair of samples Xn and Xm are
perfectly correlated. Physically, this corresponds to the case where the random process Xn
is generated by generating a sample of a random variable X and setting Xn = X for all n. The
observations are then of the form Yn = X + Zn. That is, each observation is just a noisy observation
of the random variable X. For c = 1, the optimal filter is
h = (1/(2 + η^2)) [1; 1].   (13)
square error linear estimator. In that theorem, Xn and Yn−1 play the roles of X and Y . That is,
our estimate X̂n of Xn is
X̂n = X̂L(Yn−1) = ρXn,Yn−1 (Var[Xn]/Var[Yn−1])^(1/2) (Yn−1 − E[Yn−1]) + E[Xn]   (1)
By recursive application of Xn = cXn−1 + Zn−1 , we obtain
Xn = a^n X0 + Σ_{j=1}^{n} a^(j−1) Zn−j   (2)
The expected value of Xn is E[Xn] = a^n E[X0] + Σ_{j=1}^{n} a^(j−1) E[Zn−j] = 0. The variance of Xn is
Var[Xn] = a^(2n) Var[X0] + Σ_{j=1}^{n} [a^(j−1)]^2 Var[Zn−j] = a^(2n) Var[X0] + σ^2 Σ_{j=1}^{n} [a^2]^(j−1)   (3)
where β^2 = η^2/(d^2 σ^2). From Theorem 9.4, the mean square estimation error at step n is
e∗L(n) = E[(Xn − X̂n)^2] = Var[Xn](1 − ρ^2 Xn,Yn−1) = σ^2 (1 + β^2)/(1 + β^2(1 − c^2)).   (11)
We see that the mean square estimation error e∗L(n) = e∗L, a constant for all n. In addition, e∗L is an
increasing function of β.
Problem 11.5.1 Solution
To use Table 11.1, we write RX(τ) in terms of the sinc function
sinc(x) = sin(πx)/(πx).   (1)
In terms of the sinc(·) function, we obtain
RX (τ ) = 10 sinc(2000τ ) + 5 sinc(1000τ ). (2)
From Table 11.1,
SX(f) = (10/2,000) rect(f/2,000) + (5/1,000) rect(f/1,000).   (3)
Here is a graph of the PSD.
[Figure: a plot of the PSD SX(f) for −1500 ≤ f ≤ 1500.]
At this point, we consider the cases α > 0 and α < 0 separately. For α > 0, the substitution
τ′ = ατ yields
SY(f) = (1/α) ∫_{−∞}^{∞} RX(τ′) e^(−j2π(f/α)τ′) dτ′ = SX(f/α)/α.   (3)
When α < 0, we start with Equation (2) and make the substitution τ′ = −ατ, yielding
SY(f) = (1/(−α)) ∫_{−∞}^{∞} RX(−τ′) e^(−j2π(f/(−α))τ′) dτ′.   (4)
Since RX(−τ′) = RX(τ′),
SY(f) = (1/(−α)) ∫_{−∞}^{∞} RX(τ′) e^(−j2π(f/(−α))τ′) dτ′ = (1/(−α)) SX(f/(−α)).   (5)
Since −α = |α| for α < 0, we can combine the α > 0 and α < 0 cases in the expression
SY(f) = (1/|α|) SX(f/|α|).   (6)
Problem 11.6.1 Solution
Since the random sequence Xn has autocorrelation function
We can find the PSD directly from Table 11.2 with (0.1)^|k| corresponding to a^|k|. The table yields
To complete the problem, we need to show that SXY (−f ) = [SXY (f )]∗ . First we note that since
RXY (τ ) is real valued, [RXY (τ )]∗ = RXY (τ ). This implies
[SXY(f)]* = ∫_{−∞}^{∞} [RXY(τ)]* [e^(−j2πfτ)]* dτ   (4)
          = ∫_{−∞}^{∞} RXY(τ) e^(−j2π(−f)τ) dτ   (5)
          = SXY(−f)   (6)
SX(f) = 2·10^4/((2πf)^2 + 10^4),   H(f) = 1/(a + j2πf)   (1)
By Theorem 11.16,
SY(f) = |H(f)|^2 SX(f) = 2·10^4/([(2πf)^2 + a^2][(2πf)^2 + 10^4])   (2)
Note that this method will work only if a ≠ 100. This same method was also used in
Example 11.22. The values of A and B can be found by
A = [2·10^4/((2πf)^2 + 10^4)] at f = ja/(2π), giving A = −2·10^4/(a^2 − 10^4);   B = [2·10^4/((2πf)^2 + a^2)] at f = j100/(2π), giving B = 2·10^4/(a^2 − 10^4).   (4)
This implies
SY(f) = [(−10^4/a)/(a^2 − 10^4)] · 2a/((2πf)^2 + a^2) + [100/(a^2 − 10^4)] · 200/((2πf)^2 + 10^4)   (5)
Since e−c|τ | and 2c/((2πf )2 + c2 ) are Fourier transform pairs for any constant c > 0, we see
that
RY(τ) = [(−10^4/a)/(a^2 − 10^4)] e^(−a|τ|) + [100/(a^2 − 10^4)] e^(−100|τ|)   (6)
(b) To find a = 1/(RC), we use the fact that
E[Y^2(t)] = 100 = RY(0) = (−10^4/a)/(a^2 − 10^4) + 100/(a^2 − 10^4)   (7)
Rearranging, we find that a must satisfy
Recall that a = 100 is not a valid solution because our expansion of SY(f) was not valid
for a = 100. Also, we require a > 0 in order to take the inverse transform of SY(f). Thus
a = −50 + √2501 ≈ 0.01 and RC ≈ 100.
(c) Since |H(f )| = 1 for f ∈ [−B, B], the average power of Y (t) is
E[Y^2(t)] = ∫_{−∞}^{∞} SY(f) df = ∫_{−B}^{B} df = 2B   (2)
(d) Since the white noise W (t) has zero mean, the mean value of the filter output is
Problem 11.8.3 Solution
Since SY (f ) = |H(f )|2 SX (f ), we first find
SY (f ) = (a21 + a22 )SX (f ) + a1 a2 SX (f ) e−j2πf (t2 −t1 ) + a1 a2 SX (f ) e−j2πf (t1 −t2 ) (4)
(b) From Table 11.1, the input has power spectral density
SX(f) = (1/2) e^(−πf^2/4)   (2)
The output power spectral density is
SY(f) = |H(f)|^2 SX(f) = (1/2) e^(−πf^2/4) for |f| ≤ 2, and 0 otherwise.   (3)
This integral cannot be expressed in closed form. However, we can express it in the form of
the integral of a standardized Gaussian PDF by making the substitution f = z√(2/π). With
this substitution,
E[Y^2(t)] = (1/√(2π)) ∫_{−√(2π)}^{√(2π)} e^(−z^2/2) dz   (5)
          = Φ(√(2π)) − Φ(−√(2π))   (6)
          = 2Φ(√(2π)) − 1 = 0.9876   (7)
The output power almost equals the input power because the filter bandwidth is sufficiently
wide to pass through nearly all of the power of the input.
Problem 11.8.5 Solution
By making the substitution, f = 50 tan θ, we have df = 50 sec2 θ dθ. Using the identity
1 + tan2 θ = sec2 θ, we have
E[Y^2(t)] = (100/(10^8 π^2)) ∫_{0}^{tan^(−1)(2)} dθ = tan^(−1)(2)/(10^6 π^2) = 1.12 × 10^(−7)   (9)
(a) From Table 11.1, we observe that
SX(f) = 8/(16 + (2πf)^2),   H(f) = 1/(7 + j2πf)   (2)
From Theorem 11.17,
SXY(f) = H(f) SX(f) = 8/([7 + j2πf][16 + (2πf)^2])   (3)
(b) To find the cross correlation, we need to find the inverse Fourier transform of SXY (f ). A
straightforward way to do this is to use a partial fraction expansion of SXY (f ). That is, by
defining s = j2πf , we observe that
8/((7 + s)(4 + s)(4 − s)) = (−8/33)/(7 + s) + (1/3)/(4 + s) + (1/11)/(4 − s)   (4)
Hence, we can write the cross spectral density as
SXY(f) = (−8/33)/(7 + j2πf) + (1/3)/(4 + j2πf) + (1/11)/(4 − j2πf)   (5)
Unfortunately, terms like 1/(a − j2πf) do not have an inverse transform. The solution is to
write SXY(f) in the following way:
SXY(f) = (−8/33)/(7 + j2πf) + (8/33)/(4 + j2πf) + (1/11)/(4 + j2πf) + (1/11)/(4 − j2πf)   (6)
       = (−8/33)/(7 + j2πf) + (8/33)/(4 + j2πf) + (8/11)/(16 + (2πf)^2)   (7)
(a) Since E[N (t)] = µN = 0, the expected value of the output is µY = µN H(0) = 0.
(d) Since N (t) is a Gaussian process, Theorem 11.3 says Y (t) is a Gaussian process. Thus the
random variable Y (t) is Gaussian with
E [Y (t)] = 0 Var[Y (t)] = E Y 2 (t) = 10−3 (5)
Thus the impulse response h(v) depends on t. That is, the filter response is linear but not time
invariant. Since Theorem 11.2 requires that h(t) be time invariant, this example does not violate
the theorem.
(a) Note that |H(f )| = 1. This implies SM̂ (f ) = SM (f ). Thus the average power of M̂ (t) is
' ∞ ' ∞
q̂ = SM̂ (f ) df = SM (f ) df = q (1)
−∞ −∞
To find the expected value of the random phase cosine, for an integer n ≠ 0, we evaluate
E[cos(2πfc t + nΘ)] = ∫_{0}^{2π} cos(2πfc t + nθ) (1/(2π)) dθ   (5)
= (1/(2nπ)) sin(2πfc t + nθ) evaluated from 0 to 2π   (6)
= (1/(2nπ)) (sin(2πfc t + 2nπ) − sin(2πfc t)) = 0   (7)
Similar steps will show that for any integer n ≠ 0, the random phase sine also has expected
value
E[sin(2πfc t + nΘ)] = 0   (8)
Using the trigonometric identity cos^2 φ = (1 + cos 2φ)/2, we can show
E[cos^2(2πfc t + Θ)] = E[(1/2)(1 + cos(2π(2fc)t + 2Θ))] = 1/2   (9)
Similarly,
E[sin^2(2πfc t + Θ)] = E[(1/2)(1 − cos(2π(2fc)t + 2Θ))] = 1/2   (10)
Since M(t) and M̂(t) are independent of Θ, the average power of the upper sideband signal
is
E[U^2(t)] = E[M^2(t)] E[cos^2(2πfc t + Θ)] + E[M̂^2(t)] E[sin^2(2πfc t + Θ)]   (12)
            − E[M(t)M̂(t)] E[2 cos(2πfc t + Θ) sin(2πfc t + Θ)]   (13)
= q/2 + q/2 + 0 = q   (14)
We see that for all τ ≠ 0, RV (t, t + τ ) = 0. Thus we need to find the expected value of
(d) Since E[V (t)] = 0 and since RV (t, τ ) = RV (τ ), we see that V (t) is a wide sense stationary
process. Since L(f ) is a linear time invariant filter, the filter output Y (t) is also a wide sense
stationary process.
(e) The filter input V(t) has power spectral density SV(f) = (1/2)10^(−15). The filter output has power
spectral density
SY(f) = |L(f)|^2 SV(f) = 10^(−15)/2 for |f| ≤ B, and 0 otherwise.   (14)
The average power of Y (t) is
E[Y^2(t)] = ∫_{−∞}^{∞} SY(f) df = ∫_{−B}^{B} (1/2)10^(−15) df = 10^(−15) B   (15)
Ĥ(f) = SX(f)/(SX(f) + SN(f))   (1)
It follows that the optimal filter is
Ĥ(f) = [(1/(2W)) rect(f/(2W))] / [(1/(2W)) rect(f/(2W)) + 10^(−5)] = (10^5/(10^5 + 2W)) rect(f/(2W)).   (3)
From Table 11.2, we see that the filter Ĥ(f) has impulse response
ĥ(τ) = (10^9/(2α)) e^(−α|τ|)   (4)
where α = √(1.025 × 10^9) = 3.20 × 10^4.
(b) From Equation (11.147), the minimum mean square error is
e∗L = ∫_{−∞}^{∞} [SX(f) SN(f)/(SX(f) + SN(f))] df = ∫_{−∞}^{∞} Ĥ(f) SN(f) df   (5)
    = 10^(−5) ∫_{−∞}^{∞} Ĥ(f) df   (6)
    = 10^(−5) ĥ(0) = 10^4/(2α) = 0.1562.   (7)
Problem 11.10.1 Solution
Although it is straightforward to calculate sample paths of Yn using the filter response Yn =
(1/2)Yn−1 + (1/2)Xn directly, the necessary loops make for a slow program. A solution using vectors and
matrices tends to run faster. From the filter response, we can write
matrices tends to run faster. From the filter response, we can write
Y1 = (1/2) X1   (1)
Y2 = (1/4) X1 + (1/2) X2   (2)
Y3 = (1/8) X1 + (1/4) X2 + (1/2) X3   (3)
⋮   (4)
Yn = (1/2^n) X1 + (1/2^(n−1)) X2 + · · · + (1/2) Xn   (5)
In vector notation, these equations become
[Y1; Y2; …; Yn] = [1/2 0 ··· 0; 1/4 1/2 ⋱ ⋮; ⋮ ⋱ ⋱ 0; 1/2^n ··· 1/4 1/2] [X1; X2; …; Xn],  i.e.,  Y = HX.   (6)
When X is a column of iid Gaussian (0, 1) random variables, the column vector Y = HX is a single
sample path of Y1 , . . . , Yn . When X is an n × m matrix of iid Gaussian (0, 1) random variables,
each column of Y = HX is a sample path of Y1 , . . . , Yn . In this case, let matrix entry Yi,j denote
a sample Yi of the jth sample path. The samples Yi,1 , Yi,2 , . . . , Yi,m are iid samples of Yi . We can
estimate the mean and variance of Yi using the sample mean Mn (Yi ) and sample variance Vm (Yi )
of Section 7.3. These estimates are
Mn(Yi) = (1/m) Σ_{j=1}^{m} Yi,j,   V(Yi) = (1/(m−1)) Σ_{j=1}^{m} (Yi,j − Mn(Yi))^2   (7)
function ymv=yfilter(m);
%ymv(i) is the mean and var (over m paths) of y(i),
%the filter output of 11.2.6 and 11.10.1
X=randn(500,m);
H=toeplitz([(0.5).^(1:500)],[0.5 zeros(1,499)]);
Y=H*X;
yav=sum(Y,2)/m;
yavmat=yav*ones(1,m);
yvar=sum((Y-yavmat).^2,2)/(m-1);
ymv=[yav yvar];
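The plot below was generated with commands along these lines (the exact plotting commands are an assumption, not the author's original):
>> ymv=yfilter(100);
>> plot(ymv);
>> legend('Sample mean','Sample variance');
>> xlabel('\it n');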
[Figure: the sample mean and sample variance of Yi versus n, for 0 ≤ n ≤ 500 and m = 100 sample paths.]
We see that each sample mean is small, on the order of 0.1. Note that E[Yi ] = 0. For m = 100
samples, the sample mean has variance 1/m = 0.01 and standard deviation 0.1. Thus it is to be
expected that we observe sample mean values around 0.1.
Also, it can be shown (in the solution to Problem 11.2.6 for example) that as i becomes large,
Var[Yi ] converges to 1/3. Thus our sample variance results are also not surprising.
Comment: Although within each sample path, Yi and Yi+1 are quite correlated, the sample
means of Yi and Yi+1 are not very correlated when a large number of sample paths are averaged.
Exact calculation of the covariance of the sample means of Yi and Yi+1 might be an interesting
exercise. The same observations apply to the sample variance as well.
Problem 11.10.2 Solution
This is just a Matlab question that has nothing to do with probability. In the Matlab oper-
ation R=fft(r,N), the shape of the output R is the same as the shape of the input r. If r is
a column vector, then R is a column vector. If r is a row vector, then R is a row vector. For
fftc to work the same way, the shape of n must be the same as the shape of R. The instruction
n=reshape(0:(N-1),size(R)) does this.
Problem 11.10.3 Solution
The program cospaths.m generates Gaussian sample paths with the desired autocorrelation func-
tion RX (k) = cos(0.04 ∗ pi ∗ k). Here is the code:
function x=cospaths(n,m);
%Generate m sample paths of length n of a
%Gaussian process with ACF R[k]=cos(0.04*pi*k)
k=0:n-1;
rx=cos(0.04*pi*k)’;
x=gaussvector(0,rx,m);
The program is simple because if the second input parameter to gaussvector is a length m vector
rx, then rx is assumed to be the first row of a symmetric Toeplitz covariance matrix. The commands
x=cospaths(100,10);plot(x) will produce a graph like this one:
2
1
Xn
−1
−2
0 10 20 30 40 50 60 70 80 90 100
n
We note that every sample path of the process is a Gaussian random sequence. However, it would
also appear from the graph that every sample path is a perfect sinusoid. This may seem strange if
you are used to seeing Gaussian processes simply as noisy processes or fluctuating Brownian motion
processes. However, in this case, the amplitude and phase of each sample path is random such
that over the ensemble of sinusoidal sample functions, each sample Xn is a Gaussian (0, 1) random
variable.
Finally, to confirm that each sample path is a perfect sinusoid, rather than just resembling
a sinusoid, we calculate the DFT of each sample path. The commands
>> x=cospaths(100,10);
>> X=fft(x);
>> stem((0:99)/100,abs(X));
[Figure: ten overlaid stem plots of |Xk| versus k/100 for 0 ≤ k ≤ 99.]
The above plot consists of ten overlaid 100-point DFT magnitude stem plots, one for each Gaussian
sample function. Each plot has exactly two nonzero components at frequencies k/100 = 0.02 and
(100 − k)/100 = 0.98 corresponding to each sample path sinusoid having frequency 0.02. Note that
the magnitude of each 0.02 frequency component depends on the magnitude of the corresponding
sinusoidal sample path.
The same discussion goes on to give an example where x = A\b is both faster and more accurate.
However, in this problem RX [k] = (−0.9)|k| has infinite duration. When we pass the truncated
representation rx of length m = 6 and request lmsepredictor(rx,M) for M ≥ 6, the result is that
rx is incorrectly padded with zeros. The resulting filter output will be the LMSE filter for the filter
response
RX[k] = (−0.9)^|k| for |k| ≤ 5, and 0 otherwise,   (1)
rather than the LMSE filter for the true autocorrelation function.
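To see the effect numerically, one could compare the filters produced from the truncated and from a longer representation of RX[k]; this is a sketch (it assumes the lmsepredictor function from the matcode archive):
>> rx6=(-0.9).^(0:5);      %truncated to 6 terms
>> rx20=(-0.9).^(0:19);    %a longer, more faithful representation
>> h6=lmsepredictor(rx6,10);
>> h20=lmsepredictor(rx20,10);
>> [h6 h20]                %the two filters differ noticeably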
To find the power spectral density SX (φ), we need to find the DTFT of sinc(φ0 k) Unfortunately,
this was omitted from Table 11.2 so we now take a detour and derive it here. As with any derivation
of the transform of a sinc function, we guess the answer and calculate the inverse transform. In
this case, suppose
SX(φ) = (1/φ0) rect(φ/φ0) = 1/φ0 for |φ| ≤ φ0/2, and 0 otherwise.   (2)
We find RX[k] from the inverse DTFT. For |φ0| ≤ 1,
RX[k] = ∫_{−1/2}^{1/2} SX(φ) e^(j2πφk) dφ = (1/φ0) ∫_{−φ0/2}^{φ0/2} e^(j2πφk) dφ = (1/φ0) (e^(jπφ0k) − e^(−jπφ0k))/(j2πk) = sinc(φ0k)   (3)
Now we apply this result to take the transform of RX [k] in Equation (1). This yields
SX(φ) = (10/0.5) rect(φ/0.5) + (5/0.25) rect(φ/0.25).   (4)
Ideally, an 2N + 1-point DFT would yield a sampled version of the DTFT SX (φ). However, the
truncation of the autocorrelation RX [k] to 201 points results in a difference. For N = 100, the DFT
will be a sampled version of the DTFT of RX [k] rect(k/(2N + 1)). Here is a Matlab program that
shows the difference when the autocorrelation is truncated to 2N + 1 terms.
function DFT=twosincsdft(N);
%Usage: SX=twosincsdft(N);
%Returns and plots the 2N+1
%point DFT of R(-N) ... R(0) ... R(N)
%for ACF R[k] in Problem 11.2.2
k=-N:N;
rx=10*sinc(0.5*k) + 5*sinc(0.25*k);
DFT=fftc(rx);
M=ceil(0.6*N);
phi=(0:M)/(2*N+1);
stem(phi,abs(DFT(1:(M+1))));
xlabel(’\it \phi’);
ylabel(’\it S_X(\phi)’);
[Figure: stem plot of SX(φ) versus φ for 0 ≤ φ ≤ 0.3.]
From the stem plot of the DFT, it is easy to see the deviations from the two rectangles that make
up the DTFT SX (φ). We see that the effects of windowing are particularly pronounced at the
break points.
Comment: In twosincsdft, DFT must be real-valued since it is the DFT of an autocorrelation
function. Hence the command stem(DFT) should be sufficient. However, due to numerical precision
issues, the actual DFT tends to have a tiny imaginary part, and hence we use the abs operator.
where RXn is given by Theorem 11.6 and RXn Xn+1 is given by Equation (11.66). In this problem,
RXnXn+1 = E[[Xn−M+1; ⋮; Xn] Xn+1] = [RX[M]; ⋮; RX[1]].   (2)
For M > 2, we use the Matlab function onesteppredictor(r,M) to perform the calculations.
function [h,e]=onesteppredictor(r,M);
%usage: h=onesteppredictor(r,M);
%input: r=[R_X(0) R_X(1) .. R_X(m-1)]
%assumes R_X(n)==0 for n >=m
%output=vector h for lmse predictor
% xx=h’[X(n),X(n-1),..,X(n-M+1)] for X(n+1)
m=length(r);
r=[r(:);zeros(M-m+1,1)];%append zeros if needed
RY=toeplitz(r(1:M));
RYX=r(M+1:-1:2);
h=flipud(RY\RYX);
e=r(1)-(flipud(h))’*RYX;
The code is pretty straightforward. Here are two examples just to show it works.
>> [h2,e2]=onesteppredictor(r,2)
h2 =
0.8571
-0.1429
e2 =
0.4286
>> [h4,e4]=onesteppredictor(r,4)
h4 =
0.8000
0.0000
-0.0000
-0.2000
e4 =
0.4000
The problem also requested that we calculate the mean square error as a function of the filter order
M . Here is a script and the resulting plot of the MSE.
%onestepmse.m
r=1-0.25*(0:3);
ee=[ ];
for M=2:10,
   [h,e]=onesteppredictor(r,M);
   ee=[ee,e];
end
plot(2:10,ee,'-d');
xlabel('\itM');
ylabel('\it MSE');
[Figure: the MSE of the one-step predictor versus the filter order M, for M = 2, . . . , 10.]
function h=kpredictor(r,M,k);
%usage: h=kpredictor(r,M,k);
%input: r=[R_X(0) R_X(1) .. R_X(m-1)]
%assumes R_X(n)==0 for n >=m
%output=vector a
% for lmse predictor xx=h’[X(n),X(n-1),..,X(n-N+1)] for X(n+k)
m=length(r);
r=[r(:);zeros(M-m+1,1)]; %appends zeros if needed
RY=toeplitz(r(1:M));
RYX=r(1+k:M+k);
h=flipud(RY\RYX);
Some sample paths for Xn and X̂n for the requested parameters are shown below. In each pair,
the one-step prediction X̂n is marked by dots.
[Figure: for each parameter pair, a sample path of Xn with the one-step predictions X̂n marked by dots, for 0 ≤ n ≤ 50:
(a) c = 0.9, d = 10    (d) c = 0.6, d = 10
(b) c = 0.9, d = 1     (e) c = 0.6, d = 1
(c) c = 0.9, d = 0.1   (f) c = 0.6, d = 0.1]
The mean square estimation error at step n was found to be
e∗L(n) = e∗L = σ^2 (d^2 + 1)/(d^2 + (1 − c^2))   (2)
We see that the mean square estimation error is e∗L (n) = e∗L , a constant for all n. In addition, e∗L
is a decreasing function of d. In graphs (a) through (c), we see that the predictor tracks Xn less
well as d decreases because decreasing d corresponds to decreasing the contribution of Xn−1 to the
measurement Yn−1 . Effectively, the impact of the measurement noise is increased. As d decreases,
the predictor places less emphasis on the measurement Yn and instead makes predictions closer to
E[X] = 0. That is, when d is small in graphs (c) and (f), the predictor stays close to zero. With
respect to c, the performance of the predictor is less easy to understand. In Equation (11), the
mean square error e∗L is the product of
Var[Xn] = σ^2/(1 − c^2),   1 − ρ^2 Xn,Yn−1 = (d^2 + 1)(1 − c^2)/(d^2 + (1 − c^2)).   (3)
As a function of increasing c2 , Var[Xn ] increases while 1 − ρ2Xn ,Yn−1 decreases. Overall, the mean
square error e∗L is an increasing function of c2 . However, Var[X] is the mean square error obtained
using a blind estimator that always predicts E[X] while 1 − ρ2Xn ,Yn−1 characterizes the extent to
which the optimal linear predictor is better than the blind predictor. When we compare graphs
(a)-(c) with c = 0.9 to graphs (d)-(f) with c = 0.6, we see greater variation in Xn for larger c but
in both cases, the predictor worked well when d was large.
Problem Solutions – Chapter 12
The only difference between this chain and an arbitrary 3 state chain is that transitions from 0,
the OFF state, to state 2, the mini-OFF state, are not allowed. From the problem statement, the
corresponding Markov chain is
P = [P00 P01 P02; P10 P11 P12; P20 P21 P22] = [0.999929 0.000071 0; 0.000100 0.899900 0.1; 0.000100 0.699900 0.3].   (1)
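As a numerical aside (not part of the original solution), the stationary probabilities of this chain can be computed with the dmcstatprob function from the matcode archive:
>> P=[0.999929 0.000071 0; 0.000100 0.899900 0.1; 0.000100 0.699900 0.3];
>> pv=dmcstatprob(P)   %stationary probability vector of the chain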
Problem 12.1.5 Solution
In this problem, it is helpful to go fact by fact to identify the information given.
• “. . . each read or write operation reads or writes an entire file and that files contain a geometric
number of sectors with mean 50.”
This statement says that the length L of a file has PMF
PL(l) = (1 − p)^(l−1) p for l = 1, 2, . . ., and 0 otherwise,   (1)
with p = 1/50 = 0.02. This says that when we write a sector, we will write another sector
with probability 49/50 = 0.98. In terms of our Markov chain, if we are in the write state, we
write another sector and stay in the write state with probability P22 = 0.98. This fact also
implies P20 + P21 = 0.02.
Also, since files that are read obey the same length distribution,
• “Further, suppose idle periods last for a geometric time with mean 500.”
This statement simply says that given the system is idle, it remains idle for another unit of
time with probability P00 = 499/500 = 0.998. This also says that P01 + P02 = 0.002.
• “After an idle period, the system is equally likely to read or write a file.”
Given that at time n, Xn = 0, this statement says that the conditional probability that
P[Xn+1 = 1 | Xn = 0, Xn+1 ≠ 0] = P01/(P01 + P02) = 0.5   (3)
Combined with the earlier fact that P01 + P02 = 0.002, we learn that
• “However, on completion of a write operation, a read operation follows with probability 0.6.”
Now we find that given that at time n, Xn = 2, the conditional probability that
P[Xn+1 = 1 | Xn = 2, Xn+1 ≠ 2] = P21/(P20 + P21) = 0.6   (7)
Combined with the earlier fact that P20 + P21 = 0.02, we learn that
The complete Markov chain (drawn as a state diagram in the original) has self-transition probabilities P00 = 0.998 and P11 = P22 = 0.98, with the remaining transition probabilities P01 = P02 = 0.001, P10 = 0.004, P12 = 0.016, P21 = 0.012, and P20 = 0.008.
The key step is in observing that the Markov property of Xn implies that Xmn summarizes the
past history of the Xn process. That is, given Xmn , Xm(n+1) is independent of Xmk for all k < n.
Finally, this implies that the state X̂n has one-step state transition probabilities equal to the
m-step transition probabilities for the Markov chain Xn . That is, P̂ = Pm .
This chain is a special case of the two-state chain in Example 12.5 and Example 12.6 with p = 0.8
and q = 0.9. You may wish to derive the eigenvalues and eigenvectors of P in order to diagonalize
and then find Pn . Or, you may wish just to refer to Example 12.6 which showed that the chain
has eigenvalues λ1 = 1 and λ2 = 1 − (p + q) = −0.7 and that the n-step transition matrix is
P^n = [P00(n) P01(n); P10(n) P11(n)] = (1/(p+q)) [q p; q p] + (λ2^n/(p+q)) [p −p; −q q]   (2)
    = (1/1.7) [0.9 0.8; 0.9 0.8] + ((−0.7)^n/1.7) [0.8 −0.8; −0.9 0.9].   (3)
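A quick Matlab comparison of the matrix power against the closed form in (3) (a verification sketch, not part of the original solution):
>> P=[0.2 0.8; 0.9 0.1];    %two-state chain with p=0.8, q=0.9
>> n=5;
>> P^n
>> (1/1.7)*[0.9 0.8; 0.9 0.8] + ((-0.7)^n/1.7)*[0.8 -0.8; -0.9 0.9]
The two results should agree for any n.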
Problem 12.2.2 Solution
From the given Markov chain, the state transition matrix is
P = [P00 P01 P02; P10 P11 P12; P20 P21 P22] = [0.5 0.5 0; 0.5 0.5 0; 0.25 0.25 0.5]   (1)
The way to find P^n is to make the decomposition P = SDS^(−1) where the columns of S are the
eigenvectors of P and D is a diagonal matrix containing the eigenvalues of P. The eigenvalues are
λ1 = 1,   λ2 = 0,   λ3 = 1/2.   (2)
The corresponding eigenvectors are
s1 = [1; 1; 1],   s2 = [−1; 1; 0],   s3 = [0; 0; 1].   (3)
The decomposition of P is
P = SDS^(−1) = [1 −1 0; 1 1 0; 1 0 1] [1 0 0; 0 0 0; 0 0 0.5] [0.5 0.5 0; −0.5 0.5 0; −0.5 −0.5 1].   (4)
Finally, P^n is
P^n = SD^nS^(−1) = [1 −1 0; 1 1 0; 1 0 1] [1 0 0; 0 0 0; 0 0 (0.5)^n] [0.5 0.5 0; −0.5 0.5 0; −0.5 −0.5 1]   (5)
    = [0.5 0.5 0; 0.5 0.5 0; 0.5−(0.5)^(n+1) 0.5−(0.5)^(n+1) (0.5)^n].   (6)
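The diagonalization is easy to reproduce numerically (a sketch, not part of the original solution; eig may order or scale the eigenvectors differently, but S*D^n/S still equals P^n):
>> P=[0.5 0.5 0; 0.5 0.5 0; 0.25 0.25 0.5];
>> [S,D]=eig(P);    %columns of S are eigenvectors, D holds the eigenvalues
>> n=10;
>> S*D^n/S          %should agree with P^n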
Problem 12.3.2 Solution
At time n − 1, let pi (n − 1) denote the state probabilities. By Theorem 12.4, the probability of
state k at time n is
pk(n) = Σ_{i=0}^{∞} pi(n − 1) Pik   (1)
Since Pik = q for every state i,
pk(n) = q Σ_{i=0}^{∞} pi(n − 1) = q.   (2)
Thus for any time n > 0, the probability of state k is q.
Problem 12.3.3 Solution
In this problem, the arrivals are the occurrences of packets in error. It would seem that N (t) cannot
be a renewal process because the interarrival times seem to depend on the previous interarrival
times. However, following a packet error, the sequence of packets that are correct (c) or in error
(e) up to and including the next error is given by the tree
Assuming that sending a packet takes one unit of time, the time X until the next packet error has
the PMF
PX(x) = 0.9 for x = 1, 0.001(0.99)^(x−2) for x = 2, 3, . . ., and 0 otherwise.   (1)
Thus, following an error, the time until the next error always has the same PMF. Moreover, this
time is independent of previous interarrival times since it depends only on the Bernoulli trials
following a packet error. It would appear that N (t) is a renewal process; however, there is one
additional complication. At time 0, we need to know the probability p of an error for the first
packet. If p = 0.9, then X1 , the time until the first error, has the same PMF as X above and
the process is a renewal process. If p ≠ 0.9, then the time until the first error is different from
subsequent renewal times. In this case, the process is a delayed renewal process.
Problem 12.4.1 Solution
The hardest part of this problem is that we are asked to find all ways of replacing a branch. The
primary problem with the Markov chain in Problem 12.1.1 is that state 2 is a transient state. We
can get rid of the transient behavior by making a nonzero branch probability P12 or P02 . The
possible ways to do this are:
• Replace P00 = 1/2 with P02 = 1/2
• Replace P01 = 1/2 with P02 = 1/2
• Replace P11 = 1/2 with P12 = 1/2
• Replace P10 = 1/2 with P12 = 1/2
Keep in mind that even if we make one of these replacements, there will be at least one self transition
probability, either P00 or P11 , that will be nonzero. This will guarantee that the resulting Markov
chain will be aperiodic.
Problem 12.4.2 Solution
The chain given in Example 12.11 has two communicating classes as well as the transient state 2. To
create a single communicating class, we need to add a transition that enters state 2. Yet, no matter
how we add such a transition, we will still have two communicating classes. A second transition
will be needed to create a single communicating class. Thus, we need to add two branches. There
are many possible pairs of branches. Some pairs of positive branch probabilities that create an
irreducible chain are
{P12 , P23 } {P50 , P02 } {P51 , P02 } (1)
Since the random variable Tij assumes that we start in state i, E[Tij |Gi ] = E[Tij ]. Thus E[Tjj |Gi ] ≥
E[Tij ]. In addition, P [Gi ] ≥ Pji (n) since there may be paths with more than n hops that take the
system from state j to i. These facts imply
Thus, state j is not positive recurrent, which is a contradiction. Hence, it must be that E[Tij ] < ∞.
Problem 12.5.2 Solution
From the Markov chain given in Problem 12.1.1, the state transition matrix is
P = [0.5 0.5 0; 0.5 0.5 0; 0.25 0.25 0.5]   (1)
We find the stationary probabilities π = [π0 π1 π2] by solving
π = πP,   Σ_{j=0}^{2} πj = 1.   (2)
Of course, one equation of π = π P will be redundant. The three independent equations are
From the second equation, we see that π2 = 0. This leaves the two equations:
Solving these two equations yields π0 = π1 = 0.5. The stationary probability vector is
π = [π0 π1 π2] = [0.5 0.5 0].   (8)
If you happened to solve Problem 12.2.2, you would have found that the n-step transition matrix
is
P^n = [0.5 0.5 0; 0.5 0.5 0; 0.5−(0.5)^(n+1) 0.5−(0.5)^(n+1) (0.5)^n].   (9)
From Theorem 12.21, we know that each row of the n-step transition matrix converges to π . In
this case,
lim_{n→∞} P^n = lim_{n→∞} [0.5 0.5 0; 0.5 0.5 0; 0.5−(0.5)^(n+1) 0.5−(0.5)^(n+1) (0.5)^n] = [0.5 0.5 0; 0.5 0.5 0; 0.5 0.5 0] = [π; π; π].   (10)
The self-transitions in state 0 and state 4 guarantee that the Markov chain is aperiodic. Since the
chain is also irreducible, we can find the stationary probabilities by solving π = π P; however, in
this problem it is simpler to apply Theorem 12.13. In particular, by partitioning the chain between
states i and i + 1, we obtain
πi p = πi+1 (1 − p). (1)
This implies πi+1 = απi where α = p/(1 − p). It follows that πi = α^i π0. Requiring the stationary
probabilities to sum to 1 yields
Σ_{i=0}^{4} πi = π0 (1 + α + α^2 + α^3 + α^4) = 1.   (2)
This implies
π0 = (1 − α)/(1 − α^5).   (3)
Thus, for i = 0, 1, . . . , 4,
πi = α^i (1 − α)/(1 − α^5) = (p/(1−p))^i (1 − p/(1−p))/(1 − (p/(1−p))^5).   (4)
The self-transitions in state 0 and state K guarantee that the Markov chain is aperiodic. Since the
chain is also irreducible, we can find the stationary probabilities by solving π = π P; however, in
this problem it is simpler to apply Theorem 12.13. In particular, by partitioning the chain between
states i and i + 1, we obtain
πi p = πi+1 (1 − p). (1)
This implies πi+1 = απi where α = p/(1 − p). It follows that πi = α^i π0. Requiring the stationary
probabilities to sum to 1 yields
Σ_{i=0}^{K} πi = π0 (1 + α + α^2 + · · · + α^K) = 1.   (2)
This implies
π0 = (1 − α)/(1 − α^(K+1)).   (3)
Thus, for i = 0, 1, . . . , K,
πi = α^i (1 − α)/(1 − α^(K+1)) = (p/(1−p))^i (1 − p/(1−p))/(1 − (p/(1−p))^(K+1)).   (4)
Problem 12.5.5 Solution
For this system, it’s hard to draw the entire Markov chain since from each state n there are six
branches, each with probability 1/6 to states n + 1, n + 2, . . . , n + 6. (Of course, if n + k > K − 1,
then the transition is to state n + k mod K.) Nevertheless, finding the stationary probabilities is
not very hard. In particular, the nth equation of π = π P yields
πn = (1/6)(πn−6 + πn−5 + πn−4 + πn−3 + πn−2 + πn−1).   (1)
Rather than try to solve these equations algebraically, it's easier to guess that the solution is
π = [1/K 1/K · · · 1/K].   (2)
We will assume the units of time are seconds. Thus, if a teller is busy one second, the teller will
become idle in th next second with probability p = 1/120. The Markov chain for this system is
1-p p2+(1-p)2 1-p
1-p p(1-p)
0 1 2
p(1-p) p
We can solve this chain very easily for the stationary probability vector π. In particular,
Problem 12.5.7 Solution
In this case, we will examine the system each minute. For each customer in service, we need to
keep track of how soon the customer will depart. For the state of the system, we will use (i, j), the
remaining service requirements of the two customers, To reduce the number of states, we will order
the requirements so that i ≤ j. For example, when two new customers start service each requiring
two minutes of service, the system state will be (2, 2). Since the system assumes there is always a
backlog of cars waiting to enter service, the set of states is
0 (0, 1) One teller is idle, the other teller has a customer requiring one more minute of service
1 (1, 1) Each teller has a customer requiring one more minute of service.
2 (1, 2) One teller has a customer requring one minute of service. The other teller has a customer
requiring two minutes of service.
Writing the stationary probability equations for states 0, 2, and 3 and adding the constraint
j πj = 1 yields the following equations:
π0 = π2 (1)
π2 = (1/2)π0 + (1/2)π1 (2)
π3 = (1/4)π0 + (1/4)π1 (3)
1 = π0 + π 1 + π 2 + π 3 (4)
Substituting π2 = π0 in the second equation yields π1 = π0 . Substituting that result in the third
equation yields π3 = π0 /2. Making sure the probabilities add up to 1 yields
π = π0 π1 π2 π3 = 2/7 2/7 2/7 1/7 . (5)
Both tellers are busy unless the system is in state 0. The stationary probability both tellers are
busy is 1 − π0 = 5/7.
Problem 12.6.1 Solution
Equivalently, we can prove that if Pii > 0 for some i, then the chain cannot be periodic. So, suppose
for state i, Pii > 0. Since Pii = Pii(1), we see that the largest d that divides n for all n such that
Pii(n) > 0 is d = 1. Hence, state i is aperiodic and thus the chain is aperiodic.
The converse, that Pii = 0 for all i implies the chain is periodic, is false. As a counterexample,
consider the simple chain (drawn on the right in the original) with states 0, 1, 2, Pii = 0 for each i,
and transition probability 1/2 between each pair of distinct states. Note that P00(2) > 0 and P00(3) > 0.
The largest d that divides both 2 and 3 is d = 1. Hence, state 0 is aperiodic. Since the chain has one
communicating class, the chain is also aperiodic.
[Markov chain diagram with states 0, 1, 2, . . . , K − 1, K and transition probabilities labeled P[N = 2 | N > 1], P[N = 3 | N > 2], . . . , P[N = K | N > K − 1], P[N = K + 1 | N > K].]
Solving π = π P yields
Next we apply the requirement that the stationary probabilities sum to 1. Since P[N ≤ K + 1] = 1,
we see for n ≥ K + 1 that P[N > n] = 0. Thus
1 = Σ_{n=0}^{K} πn = π0 Σ_{n=0}^{K} P[N > n] = π0 Σ_{n=0}^{∞} P[N > n].   (5)
From Problem 2.5.11, we recall that Σ_{n=0}^{∞} P[N > n] = E[N]. This implies π0 = 1/E[N] and that
πn = P[N > n]/E[N].   (6)
This is exactly the same stationary distribution found in Quiz 12.5! In this problem, we can view
the system state as describing the age of an object that is repeatedly replaced. In state 0, we start
with a new (zero age) object, and each unit of time, the object ages one unit of time. The random
variable N is the lifetime of the object. A transition to state 0 corresponds to the current object
expiring and being replaced by a new object.
In Quiz 12.5, the system state described a countdown timer for the residual life of an object. At
state 0, the system would transition to a state N = n corresponding to the lifetime of n for a new
object. This object expires and is replaced each time that state 0 is reached. This solution and the
solution to Quiz 12.5 show that the age and the residual life have the same stationary distribution.
That is, if we inspect an object at an arbitrary time in the distant future, the PMF of the age of
the object is the same as the PMF of the residual life.
[Diagram: the states are partitioned into the sets C0 = {0, 1} and C1 = {2, 3}.]
From each state i ∈ C0 , all transitions are to states j ∈ C1 . Similarly, from each state i ∈ C1 ,
only transitions to states j ∈ C0 are permitted. The sets {0, 1} and {2, 3} are each communicating
classes. However, each state has period 2.
• If j ∈ Cn(i0) and j ∈ Cn′(i0) with n′ > n, then n′ = n + kd for some integer k.
To prove this claim, we observe that irreducibility of the Markov chain implies there is a sequence
of m hops from state j to state i0. Since there is an n hop path from i0 to j and an m hop path
from j back to i0, n + m = kd for some integer k. Similarly, since there is an n′ hop path from i0
to j, n′ + m = k′d for some integer k′. Thus n′ − n = (k′ − k)d, which proves the claim.
Now we define
Cn = ∪_{k=0}^{∞} Cn+kd(i0),   n = 0, 1, . . . , d − 1.   (2)
Because the chain is irreducible, any state i belongs to some set Cn (i0 ), and thus any state i
belongs to at least one set Cn . By our earlier claim, each node i belongs to exactly one set Cn .
Hence, {C0, . . . , Cd−1} is a partition of the states of the Markov chain. By construction
of the set Cn(i0), there exist states i ∈ Cn(i0) and j ∈ Cn+1(i0) such that Pij > 0. Hence there
exist i ∈ Cn and j ∈ Cn+1 such that Pij > 0.
Now suppose there exist states i ∈ Cn and j ∈ Cn+m such that Pij > 0 and m > 1. In this
case, the sequence of n + kd hops from i0 to i followed by one hop to state j is an n + 1 + kd hop
path from i0 to j, implying j ∈ Cn+1. This contradicts the fact that j cannot belong to both Cn+1
and Cn+m. Hence no such transition from i ∈ Cn to j ∈ Cn+m is possible.
πi = (1 − α)αi , i = 0, 1, 2, . . . (1)
where α = p/(1 − p). Note that the stationary probabilities do not exist if α ≥ 1, or equivalently,
p ≥ 1/2.
where α = p(1 − q) and δ = q(1 − p). To find the stationary probabilities, we apply Theorem 12.13
by partitioning the state space between states S = {0, 1, . . . , i} and S = {i + 1, i + 2, . . .} as shown
in Figure 12.4. By Theorem 12.13, for state i > 0,
πi α = πi+1 δ. (1)
This implies πi+1 = (α/δ)πi . A cut between states 0 and 1 yields π1 = (p/δ)π0 . Combining these
results, we have for any state i > 0,
πi = (p/δ)(α/δ)^(i−1) π0.   (2)
Under the condition α < δ, it follows that
Σ_{i=0}^{∞} πi = π0 + Σ_{i=1}^{∞} (p/δ)(α/δ)^(i−1) π0 = π0 (1 + (p/δ)/(1 − α/δ))   (3)
since p < q implies α/δ < 1. Thus, applying Σ_i πi = 1 and noting δ − α = q − p, we have
q p p/(1 − p) i−1
π0 = , πi = , i = 1, 2, . . . (4)
q−p (1 − p)(1 − q) q/(1 − q)
Note that α < δ if and only if p < q, which is both sufficient and necessary for the Markov chain
to be positive recurrent.
0 1
2 ½
Problem 12.9.3 Solution
From each state i, there are transitions of rate qij = 1 to each of the other k − 1 states. Thus each
state i has departure rate νi = k − 1. Thus, the stationary probabilities satisfy
pj (k − 1) = pj j = 1, 2, . . . , k (1)
i
=j
The stationary probabilities satisfy p0 λ1 = p1 λ0 . Thus p1 = (λ1 /λ0 )p0 . Since p0 + p1 = 1, we have
that
p0 + (λ1 /λ0 )p0 = 1. (1)
This implies
λ0 λ1
p0 = , p1 = . (2)
λ0 + λ1 λ0 + λ1
It is also possible to solve this problem using a discrete time Markov chain. One way to do this is
to assume a very small time step ∆. In state 0, a transition to state 1 occurs with probability λ1 ∆;
otherwise the system stays in state 0 with probability 1 − λ1 ∆. Similarly, in state 1, a transition to
state 0 occurs with probability λ0 ∆; otherwise the system stays in state 1 with probability 1 − λ0 ∆.
Here is the Markov chain for this discrete time system:
0 1
Not surprisingly, the stationary probabilities for this discrete time system are
λ0 λ1
π0 = , π1 = . (3)
λ0 + λ1 λ0 + λ1
The parameter ρ = λ/µ is the normalized load. When c = 2, the blocking probability is
P[B] = (ρ^2/2)/(1 + ρ + ρ^2/2).   (2)
Setting P[B] = 0.1 yields the quadratic equation
ρ^2 − (2/9)ρ − 2/9 = 0.   (3)
The solutions to this quadratic are
ρ = (1 ± √19)/9.   (4)
The meaningful nonnegative solution is ρ = (1 + √19)/9 = 0.5954.
Note that although the load per server remains the same, doubling the number of circuits to 200
caused the blocking probability to go down by more than a factor of 10 (from 0.004 to 2.76 × 10−4 ).
This is a general property of the Erlang-B formula and is called trunking efficiency by telephone
system engineers. The basic principle is that it's more efficient to share resources among larger
groups.
The hard part of calculating P [B] is that most calculators, including MATLAB have trouble
calculating 200!. (In MATLAB, factorial is calculated using the gamma function. That is, 200! =
gamma(201).) To do these calculations, you need to observe that if qn = ρn /n!, then
qn = (ρ/n) qn−1.   (3)
A simple MATLAB program that uses this fact to calculate the Erlang-B formula for large values
of c is
function y=erlangbsimple(r,c)
%load is r=lambda/mu
%number of servers is c
p=1.0;
psum=1.0;
for k=1:c
p=p*r/k;
psum=psum+p;
end
y=p/psum;
Essentially the problems with the calculations of erlangbsimple.m are the same as those of cal-
culating the Poisson PMF. A better program for calculating the Erlang-B formula uses the im-
provements employed in poissonpmf to calculate the Poisson PMF for large values. Here is the
code:
function pb=erlangb(rho,c);
%Usage: pb=erlangb(rho,c)
%returns the Erlang-B blocking
%probability for sn M/M/c/c
%queue with load rho
pn=exp(-rho)*poissonpmf(rho,0:c);
pb=pn(c+1)/sum(pn);
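For example, the blocking probabilities quoted above can be reproduced with calls of the form (a usage sketch; rho and c are whatever load and number of circuits the problem specifies):
>> pb1=erlangb(rho,c);       %blocking probability with c circuits
>> pb2=erlangb(2*rho,2*c);   %same load per server, twice the circuits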
[Markov chain diagram with states 0, 1, . . . , c − 1, c and departure rates µ.]
pn = ρ^n p0,   n = 0, 1, . . . , c   (1)
Σ_{i=0}^{c} pi = p0 (1 + ρ + ρ^2 + · · · + ρ^c) = 1   (2)
This implies
p0 = (1 − ρ)/(1 − ρ^(c+1)).   (3)
The stationary probabilities are
pn = (1 − ρ)ρ^n/(1 − ρ^(c+1)),   n = 0, 1, . . . , c.   (4)
[Markov chain diagram with states 0, 1, . . . , c, c + 1, . . . and service rates µ, 2µ, . . . , cµ, cµ, . . .]
In the solution to Quiz 12.10, we found that the stationary probabilities for the queue satisfied
pn = p0 ρ^n/n! for n = 1, 2, . . . , c,   and   pn = p0 (ρ/c)^(n−c) ρ^c/c! for n = c + 1, c + 2, . . . ,   (1)
where ρ = λ/µ = λ. We must be sure that ρ is small enough that there exists p0 > 0 such that
Σ_{n=0}^{∞} pn = p0 (1 + Σ_{n=1}^{c} ρ^n/n! + (ρ^c/c!) Σ_{n=c+1}^{∞} (ρ/c)^(n−c)) = 1.   (2)
This requirement is met if and only if the infinite sum converges, which occurs if and only if
Σ_{n=c+1}^{∞} (ρ/c)^(n−c) = Σ_{j=1}^{∞} (ρ/c)^j < ∞.   (3)
That is, p0 > 0 if and only if ρ/c < 1, or λ < c. In short, if the arrival rate in cars per second is
less than the service rate (in cars per second) when all booths are busy, then the Markov chain has
a stationary distribution. Note that if ρ > c, then the Markov chain is no longer positive recurrent
and the backlog of cars will grow to infinity.
(a) In this case, we have two M/M/1 queues, each with an arrival rate of λ/2. By defining
ρ = λ/µ, each queue has a stationary distribution
pn = (1 − ρ/2) (ρ/2)n n = 0, 1, . . . (1)
Note that in this case, the expected number in queue i is
E[Ni] = Σ_{n=0}^{∞} n pn = (ρ/2)/(1 − ρ/2).   (2)
(b) The combined queue is an M/M/2/∞ queue. As in the solution to Quiz 12.10, the stationary
probabilities satisfy
pn = p0 ρ^n/n! for n = 1, 2,   and   pn = p0 ρ^(n−2) ρ^2/2 for n = 3, 4, . . .   (4)
The expected number in the system is E[N] = Σ_{n=1}^{∞} n pn. Some algebra will show that
E[N] = ρ/(1 − (ρ/2)^2).   (6)
We see that the average number in the combined queue is lower than in the system with individual
queues. The reason for this is that in the system with individual queues, there is a possibility that
one of the queues becomes empty while there is more than one person in the other queue.
Problem 12.10.6 Solution
The LCFS queue operates in a way that is quite different from the usual first come, first served
queue. However, under the assumptions of exponential service times and Poisson arrivals, customers
arrive at rate λ and depart at rate µ, no matter which service discipline is used. The Markov chain
for the LCFS queue is the same as the Markov chain for the M/M/1 first come, first served queue:
(Markov chain diagram: states 0, 1, 2, 3, . . . with arrival rate λ on each forward transition and departure rate µ on each backward transition.)
It would seem that the LCFS queue should be less efficient than the ordinary M/M/1 queue because
a new arrival causes us to discard the work done on the customer in service. This is not the case,
however, because the memoryless property of the exponential PDF implies that no matter how
much service had already been performed, the remaining service time remains identical to that of
a new customer.
(Markov chain diagram for the number of calls in progress: states 0, 1, 2, . . . , c − r, c − r + 1, . . . , c − 1, c.)
When the number of calls, n, is less than c − r, we admit either type of call and q_{n,n+1} = λ + h.
When n ≥ c − r, we block the new calls and we admit only handoff calls so that q_{n,n+1} = h. Since
the service times are exponential with an average time of 1 minute, the call departure rate in state
n is n calls per minute. Theorem 12.24 says that the stationary probabilities p_n satisfy
p_n = \begin{cases} \dfrac{\lambda+h}{n}\,p_{n-1} & n = 1, 2, \ldots, c-r \\[2mm] \dfrac{h}{n}\,p_{n-1} & n = c-r+1, c-r+2, \ldots, c \end{cases}   (1)
This implies
p_n = \begin{cases} p_0\,\dfrac{(\lambda+h)^n}{n!} & n = 1, 2, \ldots, c-r \\[2mm] p_0\,\dfrac{(\lambda+h)^{c-r}\,h^{\,n-(c-r)}}{n!} & n = c-r+1, c-r+2, \ldots, c \end{cases}   (2)
The requirement that the stationary probabilities sum to 1 yields
p_0\left[\sum_{n=0}^{c-r}\frac{(\lambda+h)^n}{n!} + \left(\frac{\lambda+h}{h}\right)^{c-r}\sum_{n=c-r+1}^{c}\frac{h^n}{n!}\right] = 1   (3)
Finally, a handoff call is dropped if and only if it arrives to find the system with c calls in progress. Thus the dropping probability is
P[H] = p_c = \frac{(\lambda+h)^{c-r}\,h^{r}/c!}{\displaystyle\sum_{n=0}^{c-r}\frac{(\lambda+h)^n}{n!} + \left(\frac{\lambda+h}{h}\right)^{c-r}\sum_{n=c-r+1}^{c}\frac{h^n}{n!}}   (4)
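For completeness, here is a short MATLAB sketch of the calculation in (4). The function handoffdrop is a hypothetical helper (it is not part of the matsoln archive) and simply assumes the admission rule described above, with new-call rate lam and handoff rate h:
function ph=handoffdrop(lam,h,c,r)
%sketch: probability that a handoff call is dropped
%birth rate lam+h for n<c-r, h for c-r<=n<c; death rate n in state n
up=[(lam+h)*ones(1,c-r) h*ones(1,r)];  % q(n,n+1) for n=0,...,c-1
pn=[1 cumprod(up./(1:c))];             % unnormalized stationary probabilities
pn=pn/sum(pn);                         % normalize
ph=pn(c+1);                            % P[H] = p_c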
Problem 12.11.1 Solution
Here is the Markov chain describing the free throws.
(Markov chain diagram: states −4, −3, . . . , −1, 0, 1, . . . , 3, 4; in state k the next free throw is successful with probability 0.5 + 0.1k.)
Note that state 4 corresponds to “4 or more consecutive successes” while state −4 corresponds to
“4 or more consecutive misses.” We denote the stationary probabilities by the vector
π = [π_{−4} π_{−3} π_{−2} π_{−1} π_0 π_1 π_2 π_3 π_4].   (1)
To solve the problem at hand, we divide the work into two functions: freethrowmat(n) returns
the n-step transition matrix, and freethrowp(n) calculates the probability of a success on free
throw n.
function Pn=freethrowmat(n);
P=[0.9 0 0 0 0 0.1 0 0 0;...
0.8 0 0 0 0 0.2 0 0 0;...
0 0.7 0 0 0 0.3 0 0 0;...
0 0 0.6 0 0 0.4 0 0 0;...
0 0 0 0.5 0 0.5 0 0 0;...
0 0 0 0.4 0 0 0.6 0 0;...
0 0 0 0.3 0 0 0 0.7 0;...
0 0 0 0.2 0 0 0 0 0.8;...
0 0 0 0.1 0 0 0 0 0.9];
Pn=P^n;

function ps=freethrowp(n);
PP=freethrowmat(n-1);
p0=[zeros(1,4) 1 zeros(1,4)];
ps=p0*PP*0.1*(1:9)';
In freethrowp.m, p0 is the initial state probability row vector π (0). Thus p0*PP is the state
probability row vector after n − 1 free throws. Finally, p0*PP*0.1*(1:9)' multiplies the state
probability vector by the conditional probability of a successful free throw given the current state.
The answer to our problem is simply
>> freethrowp(11)
ans =
0.5000
>>
In retrospect, the calculations are unnecessary! Because the system starts in state 0, symmetry
of the Markov chain dictates that states −k and k will have the same probability at every time
step. Because state −k has success probability 0.5 − 0.1k while state k has success probability
0.5 + 0.1k, the conditional success probability given the system is in state −k or k is 0.5. Averaged
over k = 1, 2, 3, 4, the average success probability is still 0.5.
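A quick numerical check of this symmetry argument, using freethrowmat above (not part of the original solution):
pp=[zeros(1,4) 1 zeros(1,4)]*freethrowmat(10);  % state probabilities after 10 free throws
max(abs(pp-fliplr(pp)))                         % symmetry of the chain: should be 0 up to rounding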
Comment: Perhaps finding the stationary distribution is more interesting. This is done fairly
easily:
>> p=dmcstatprob(freethrowmat(1));
>> p’
ans =
0.3123 0.0390 0.0558 0.0929 0 0.0929 0.0558 0.0390 0.3123
About sixty percent of the time the shooter has either made four or more consecutive free throws
or missed four or more free throws. On the other hand, one can argue that in a basketball game,
a shooter rarely gets to take more than a half dozen (or so) free throws, so perhaps the stationary
distribution isn’t all that interesting.
In states 2c, 3c and 4c, only the clerk is working. In states 2m, 3m and 4m, the manager is also
working. The state space {0, 1, 2c, 3c, 4c, 2m, 3m, 4m, 5, 6, . . .} is countably infinite. Finding the
state probabilities is a little bit complicated because there are enough states that we would like to
use Matlab; however, Matlab can only handle a finite state space. Fortunately, we can use
Matlab because the state space for states n ≥ 5 has a simple structure.
We observe for n ≥ 5 that the average rate of transitions from state n to state n + 1 must equal
the average rate of transitions from state n + 1 to state n, implying
p_n = \alpha^{n-5} p_5, \qquad n = 5, 6, \ldots ,   (2)
where α = λ/2 < 1. The requirement that the stationary probabilities sum to 1 implies
1 = p_0 + p_1 + \sum_{j=2}^{4} (p_{jc} + p_{jm}) + \sum_{n=5}^{\infty} p_n   (3)
  = p_0 + p_1 + \sum_{j=2}^{4} (p_{jc} + p_{jm}) + p_5 \sum_{n=5}^{\infty} \alpha^{n-5}   (4)
  = p_0 + p_1 + \sum_{j=2}^{4} (p_{jc} + p_{jm}) + \frac{p_5}{1-\alpha}   (5)
Using this relation together with the balance equations for the states with fewer than 5 customers,
we can solve for the stationary probabilities. In particular, we use Theorem 12.23 to write
\sum_i r_{ij} p_i = 0. This leads to a set of matrix equations for the state probability vector
p = [p_0 p_1 p_{2c} p_{3c} p_{4c} p_{2m} p_{3m} p_{4m} p_5]   (6)
where the first row just shows the correspondence of the state probabilities and the matrix columns.
For each state i, excepting state 5, the departure rate νi from that state equals the sum of entries
of the corresponding row of Q. To find the stationary probabilities, our normal procedure is to
use Theorem 12.23 and solve p R = 0 and p 1 = 1, where R is the same as Q except the zero
diagonal entries are replaced by −νi . The equation p 1 = 1 replaces one column of the set of matrix
equations. This is the approach of cmcstatprob.m.
In this problem, we follow almost the same procedure. We form the matrix R by replacing
the diagonal entries of Q. However, instead of replacing an arbitrary column with the equation
p 1 = 1, we replace the column corresponding to p5 with the equation
p_0 + p_1 + p_{2c} + p_{3c} + p_{4c} + p_{2m} + p_{3m} + p_{4m} + \frac{p_5}{1-\alpha} = 1.   (8)
That is, we solve
p R = [0 0 0 0 0 0 0 0 1].   (9)
where
R = \begin{bmatrix}
-\lambda & \lambda & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & -1-\lambda & \lambda & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & -1-\lambda & \lambda & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & -1-\lambda & \lambda & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & -1-\lambda & 0 & 0 & 0 & 1 \\
0 & 2 & 0 & 0 & 0 & -2-\lambda & \lambda & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 2 & -2-\lambda & \lambda & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & -2-\lambda & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & \frac{1}{1-\alpha}
\end{bmatrix}   (10)
Once the stationary probabilities are known, we can write
E[N] = \sum_{n=0}^{\infty} n p_n = \sum_{n=0}^{4} n p_n + \sum_{n=5}^{\infty} n p_5 \alpha^{n-5}   (12)
     = \sum_{n=0}^{4} n p_n + p_5 \sum_{k=0}^{\infty} (k+5)\alpha^{k}   (13)
     = \sum_{n=0}^{4} n p_n + \frac{5 p_5}{1-\alpha} + p_5 \sum_{k=0}^{\infty} k\alpha^{k}   (14)
     = \sum_{n=0}^{4} n p_n + p_5\left[\frac{5}{1-\alpha} + \frac{\alpha}{(1-\alpha)^2}\right]   (15)
     = \sum_{n=0}^{4} n p_n + p_5\,\frac{5-4\alpha}{(1-\alpha)^2}   (16)
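The closed form used in (16) for the tail sum is easy to verify numerically (a sketch, not part of the original solution; the value of α is arbitrary, subject to 0 < α < 1):
a=0.3; k=0:200;                       % truncate the infinite sum at k=200
[sum((k+5).*a.^k) (5-4*a)/(1-a)^2]    % the two values should agree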
Furthermore, the manager is working unless the system is in state 0, 1, 2c, 3c, or 4c. Thus the probability the manager is working is P[W] = 1 - (p_0 + p_1 + p_{2c} + p_{3c} + p_{4c}).
We implement these equations in the following program, followed by the corresponding output.
function [EN,PW]=clerks(lam);
Q=diag(lam*[1 1 1 1 0 1 1 1],1);
Q=Q+diag([1 1 1 1 0 2 2 2],-1);
Q(6,2)=2; Q(5,9)=lam;
R=Q-diag(sum(Q,2));
n=size(Q,1);
a=lam/2;
R(:,n)=[ones(1,n-1) 1/(1-a)]';
pv=([zeros(1,n-1) 1]*R^(-1));
EN=pv*[0;1;2;3;4;2;3;4; ...
    (5-4*a)/(1-a)^2];
PW=1-sum(pv(1:5));

>> [en05,pw05]=clerks(0.5)
en05 =
    0.8217
pw05 =
    0.0233
>> [en10,pw10]=clerks(1.0)
en10 =
    2.1111
pw10 =
    0.2222
>> [en15,pw15]=clerks(1.5)
en15 =
    4.5036
pw15 =
    0.5772
>>
We see that in going from an arrival rate of 0.5 customers per minute to 1.5 customers per
minute, the average number of customers goes from 0.82 to 4.5 customers. Similarly, the probability
the manager is working rises from 0.02 to 0.57.
• If 50 ≤ i ≤ 59, then there will be min(i, K) brake pads sold. At the end of the day, the
number of pads remaining is less than 60, and so 50 more pads are delivered overnight. Thus
the next state is j = 50 if K ≥ i pads are ordered, i pads are sold and 50 pads are delivered
overnight. On the other hand, if there are K < i orders, then the next state is j = i − K + 50.
In this case,
P_{ij} = \begin{cases} P[K \ge i] & j = 50, \\ P_K(50+i-j) & j = 51, 52, \ldots, 50+i. \end{cases}   (1)
• If 60 ≤ i ≤ 109, then pads are delivered overnight only when the end-of-day stock falls below 60, and there are several cases to consider:
– j = 50: If there are K ≥ i orders, then all i pads are sold, 50 pads are delivered overnight,
and the next state is j = 50. Thus P_{ij} = P[K ≥ i].
– 51 ≤ j ≤ 59: If 50 + i − j pads are sold, then j − 50 pads are left at the end of the day.
In this case, 50 pads are delivered overnight, and the next state is j with probability P_K(50 + i − j).
– 60 ≤ j ≤ i: If there are 50 + i − j pads ordered, then there will be i − (50 + i − j) = j − 50 pads at the end of the day. Since
60 ≤ j ≤ 109 and 10 ≤ j − 50 ≤ 59, there will be 50 pads delivered overnight and the next
state will be j. Thus P_K(50 + i − j) is one contribution to P_{ij}; the other contribution, P_K(i − j), corresponds to exactly i − j orders and no overnight delivery.
– For i < j ≤ 109, state j can be reached from state i only if there are 50 + i − j orders, leaving
i − (50 + i − j) = j − 50 pads in stock at the end of the day. This implies 50 pads are delivered
overnight and the next state is j. The probability of this event is P_K(50 + i − j).
Note that the “0 otherwise” rule comes into effect when 50 ≤ i ≤ 59 and j > 50 + i. To simplify
these rules, we observe that PK (k) = 0 for k < 0. This implies PK (50 + i − j) = 0 for j > 50 + i.
In addition, for j > i, PK (i − j) = 0. These facts imply that we can write the state transition
probabilities in the simpler form:
P_{ij} = \begin{cases}
P[K \ge i] & 50 \le i \le 109,\ j = 50, \\
P_K(50+i-j) & 50 \le i \le 59,\ 51 \le j \le 109, \\
P_K(50+i-j) & 60 \le i \le 109,\ 51 \le j \le 59, \\
P_K(i-j) + P_K(50+i-j) & 60 \le i \le 109,\ 60 \le j \le 109.
\end{cases}   (7)
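As a cross-check on (7) (a sketch, not from the original solution), one can build the transition matrix with a direct double loop; it should reproduce the matrix that brakepads.m below assembles with toeplitz, provided poissonpmf returns 0 for negative arguments as noted above:
alpha=50; s=50:109; m=length(s);
P2=zeros(m,m);
for a=1:m
  for b=1:m
    i=s(a); j=s(b);
    if j==50
      P2(a,b)=1-poissoncdf(alpha,i-1);                        % P[K >= i]
    elseif i<=59
      P2(a,b)=poissonpmf(alpha,50+i-j);                       % 50 <= i <= 59, 51 <= j
    elseif j<=59
      P2(a,b)=poissonpmf(alpha,50+i-j);                       % 60 <= i, 51 <= j <= 59
    else
      P2(a,b)=poissonpmf(alpha,i-j)+poissonpmf(alpha,50+i-j); % 60 <= i, 60 <= j
    end
  end
end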
In terms of Matlab, all we need to do is to encode the matrix P, calculate the stationary
probability vector π, and then calculate E[Y ], the expected number of pads sold on a typical day.
To calculate E[Y ], we use iterated expectation. The number of pads ordered is the Poisson random
variable K. We assume that on day n, Xn = i, and we calculate the conditional expectation
E[Y|X_n = i] = E[\min(K,i)] = \sum_{j=0}^{i-1} j P_K(j) + i\,P[K \ge i].   (11)
Since i ≥ 50, we can split the first sum at j = 48:
E[Y|X_n = i] = \sum_{j=0}^{48} j P_K(j) + \sum_{j=49}^{i-1} j P_K(j) + i\,P[K \ge i].   (12)
Finally, we assume that on a typical day n, the state of the system Xn is described by the stationary
probabilities P [Xn = i] = πi and we calculate
E[Y] = \sum_{i=50}^{109} E[Y|X_n = i]\,\pi_i.   (13)
function [pstat,ey]=brakepads(alpha);
s=(50:109)';
beta=1-poissoncdf(alpha,s-1);
grow=poissonpmf(alpha,50+(-1:-1:-59));
gcol=poissonpmf(alpha,50+(-1:58));
drow=poissonpmf(alpha,0:-1:-49);
dcol=poissonpmf(alpha,0:49);
P=[beta,toeplitz(gcol,grow)];
P(11:60,11:60)=P(11:60,11:60)...
   +toeplitz(dcol,drow);
pstat=dmcstatprob(P);
[I,J]=ndgrid(49:108,49:108);
G=J.*(I>=J);
EYX=(G*gcol)+(s.*beta);
pk=poissonpmf(alpha,0:48);
EYX=EYX+(0:48)*pk;
ey=(EYX')*pstat;

The first half of brakepads.m constructs P to calculate the stationary probabilities. The first column of P is just the vector
beta = [β_{50} · · · β_{109}]',   (14)
where β_i = P[K ≥ i]. The rest of P is easy to construct using the toeplitz function. We first build an asymmetric Toeplitz matrix with first row and first column
grow = [γ_{−1} γ_{−2} · · · γ_{−59}],   (15)
gcol = [γ_{−1} γ_0 · · · γ_{58}]',   (16)
where γ_k = P_K(50 + k).
Note that δk = PK (k) + γk . Thus, to construct the Toeplitz matrix in the lower right corner of
P, we simply add the Toeplitz matrix corresponding to the missing PK (k) term. The second half
of brakepads.m calculates E[Y ] using the iterated expectation. Note that
EYX = [E[Y|X_n = 50] · · · E[Y|X_n = 109]]'.   (17)
The somewhat convoluted code becomes clearer by noting the following correspondences:
E[Y|X_n = i] = \underbrace{\sum_{j=0}^{48} j P_K(j)}_{\texttt{(0:48)*pk}} + \underbrace{\sum_{j=49}^{i-1} j P_K(j)}_{\texttt{G*gcol}} + \underbrace{i\,P[K \ge i]}_{\texttt{s.*beta}}.   (18)
To find E[Y], we execute the commands:
To find E[Y ], we execute the commands:
>> [ps,ey]=brakepads(50);
>> ey
ey =
49.4154
>>
We see that although the store receives 50 orders for brake pads on average, the average number
sold is only 49.42 because once in a while the pads are out of stock. Some experimentation will show
that if the expected number of orders α is significantly less than 50, then the expected number of
brake pads sold each day is very close to α. On the other hand, if α ≫ 50, then each day the
store will run out of pads and will get a delivery of 50 pads each night. The expected number of
unfulfilled orders will be very close to α − 50.
Note that a new inventory policy in which the overnight delivery is more than 50 pads or the
threshold for getting a shipment is more than 60 will reduce the expected number of unfulfilled
orders. Whether such a change in policy is a good idea depends on factors such as the carrying
cost of inventory that are absent from our simple model.
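These limiting cases are easy to explore with brakepads.m (a sketch, not from the original solution; the particular values of α are arbitrary):
[p30,ey30]=brakepads(30);   % alpha well below 50: ey30 should be very close to 30
[p80,ey80]=brakepads(80);   % alpha well above 50: ey80 should be close to 50,
                            % leaving roughly 80-50=30 unfulfilled orders per day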
(Markov chain diagram: states 0, 1, 2, . . . , 7, 8, with arrival rate λ on each forward transition, departure rate 1 from state 1, and departure rate 2 from states 2 through 8.)
Since this is a birth death process, we could easily solve this problem using analysis. However,
as this problem is in the Matlab section of this chapter, we might as well construct a Matlab
solution:
function [p,en]=veryfast2(lambda);
c=2*[0,eye(1,8)]';
r=lambda*[0,eye(1,8)];
Q=toeplitz(c,r);
Q(2,1)=1;
p=cmcstatprob(Q);
en=(0:8)*p;

The code solves for the stationary distribution and the expected number of cars in the system for an arbitrary arrival rate λ.
Here is the output:
>> [p,en]=veryfast2(0.75);
>> p’
ans =
0.4546 0.3410 0.1279 0.0480 0.0180 0.0067 0.0025 0.0009 0.0004
>> en
en =
0.8709
>>
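Since the chain is a birth-death process, the analytic solution mentioned above provides a quick cross-check (a sketch, not part of the original solution):
lam=0.75;
p=[1 lam*(lam/2).^(0:7)];   % unnormalized: p_0=1 and p_n=lam*(lam/2)^(n-1) for n=1,...,8
p=p/sum(p);                 % should reproduce the stationary vector printed above
en=(0:8)*p'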