Probability Module 2
Introduction
The starting point for studying probability is the definition of four key terms: experiment, sample outcome,
sample space, and event. The latter three, all carryovers from classical set theory, give us a familiar
mathematical framework within which to work; the former is what provides the conceptual mechanism for
casting real-world phenomena into probabilistic terms.
In this chapter, we introduce the concept of the probability of an event and then show how probabilities can
be computed in certain situations. As a preliminary, however, we need to discuss the concept of the sample
space and the events of an experiment.
Learning Outcomes
At the end of this module, the students will be able to:
1. discuss the concept of the sample space and the events of an experiment;
2. define and compute the probability of an event;
3. discuss the axioms of probability;
4. compute probabilities of mutually exclusive events and independent events;
5. compute conditional probabilities using Bayes’ Rule.
1. SAMPLE SPACES AND EVENTS
By an experiment we will mean any procedure that (1) can be repeated, theoretically, an infinite
number of times; and (2) has a well-defined set of possible outcomes. Thus, rolling a pair of dice qualifies
as an experiment; so does measuring a hypertensive’s blood pressure or doing a spectrographic analysis
to determine the carbon content of moon rocks. Asking a would-be psychic to draw a picture of an image
presumably transmitted by another would-be psychic does not qualify as an experiment, because the set of
possible outcomes cannot be listed, characterized, or otherwise defined.
Each of the potential eventualities of an experiment is referred to as a sample outcome, 𝑠, and
their totality is called the sample space, 𝑆. To signify the membership of 𝑠 in 𝑆, we write 𝑠 ∈ 𝑆. Any
designated collection of sample outcomes, including individual outcomes, the entire sample space, and the
null set, constitutes an event. In other words, any subset 𝐸 of the sample space is known as an event. An
event 𝐸 is said to occur if the outcome of the experiment is one of the members of 𝐸.
Example 1.1.
1. If the outcome of an experiment consists of the determination of the sex of a newborn child, then
𝑆 = {𝑏, 𝑔}
where the outcome 𝑏 means that the child is a boy and 𝑔 that it is a girl.
2. If the outcome of an experiment is the order of finish in a race among the seven horses having post
positions 1, 2, 3, 4, 5, 6, and 7, then
S = {all 7! permutations of (1, 2, 3, 4, 5, 6, 7)}.
For instance, the outcome (2, 3, 1, 6, 5, 4, 7) means that the number 2 horse comes in first, the
number 3 horse in second, then the number 1 in third, and so on.
3. If the experiment consists of flipping two coins, then the sample space is
S = {(H, H), (H, T), (T, H), (T, T)}
where H represents a head and T represents a tail. Each sample outcome is an ordered pair (outcome on the first coin, outcome on the second coin).
4. If the experiment consists of flipping three coins, then the sample space consists of eight outcomes:
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
The sample outcomes that make up the event E: majority of coins show heads are those with at least two heads, so
E = {HHH, HHT, HTH, THH}
5. Imagine rolling two dice, where the first one is red and the second one is green. Then
S = {(i, j) | i, j = 1, 2, 3, 4, 5, 6}
The sample outcomes that make up the event E: sum of the faces is a 7 are
E = {(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)}
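Finite sample spaces like these can be enumerated directly by machine. The following Python sketch (an illustrative aside, using only the standard library) rebuilds the two-dice sample space of item 5 and picks out the event E: the sum of the faces is 7.

```python
from itertools import product

# Sample space for rolling a red die and a green die:
# all ordered pairs (i, j) with i, j in {1, ..., 6}.
S = set(product(range(1, 7), repeat=2))
assert len(S) == 36

# Event E: the sum of the faces is 7.
E = {(i, j) for (i, j) in S if i + j == 7}
print(sorted(E))   # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
```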
Definition 1.2. For any two events 𝐸 and 𝐹 of a sample space 𝑆, we define the following events.
i. The union of 𝐸 and 𝐹, denoted by 𝐸 ∪ 𝐹, is the event consisting of all outcomes that are either in
𝐸 or in 𝐹 or in both 𝐸 and 𝐹. The event 𝐸 ∪ 𝐹 will occur if either 𝐸 or 𝐹 occurs.
ii. The intersection of E and F, denoted by E ∩ F, is the event consisting of all outcomes that are in both E and F. The event E ∩ F will occur only if both E and F occur.
Example 1.3.
1. Consider Example 1.1 (3). If 𝐸 is the event that the first coin lands heads and 𝐹 is the event that
the second coin lands heads, then find 𝐸 ∪ 𝐹 and 𝐸 ∩ 𝐹.
E = {(H, H), (H, T)} and F = {(H, H), (T, H)}, so
E ∪ F = {(H, H), (H, T), (T, H)} and E ∩ F = {(H, H)}
2. Consider Example 1.1 (5). If 𝐸 is the event that the sum of the dice is 7 and 𝐹 is the event that the
sum of the dice is 6, then find 𝐸 ∪ 𝐹 and 𝐸 ∩ 𝐹.
𝐸 = {(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)}
𝐹 = {(1, 5), (5, 1), (2, 4), (4, 2), (3, 3)}
𝐸 ∪ 𝐹 = {(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3), (1, 5), (5, 1), (2, 4), (4, 2), (3, 3)}
E ∩ F = ∅; that is, E and F cannot both occur. Events with no outcomes in common, like E and F here, are said to be mutually exclusive.
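Unions and intersections of events correspond exactly to Python’s built-in set operations, so the computations in part 2 can be checked mechanically; a minimal sketch:

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))
E = {s for s in S if sum(s) == 7}   # sum of the dice is 7
F = {s for s in S if sum(s) == 6}   # sum of the dice is 6

print(len(E | F))   # 11 outcomes in the union E ∪ F
print(E & F)        # set(): E ∩ F is empty, so E and F are mutually exclusive
```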
Now, we define unions and intersections of more than two events in a similar manner. If E_1, E_2, … are events, then the union of these events, denoted by
⋃_{n=1}^∞ E_n,
is defined to be the event that consists of all outcomes that are in E_n for at least one value of n = 1, 2, …. Similarly, the intersection of the events E_n, denoted by
⋂_{n=1}^∞ E_n,
is defined to be the event consisting of those outcomes that are in all of the E_n, n = 1, 2, ….
Definition 1.5. For any event E, the complement of E, denoted by E^c, is the event consisting of all outcomes in the sample space S that are not in E. The event E^c will occur if and only if event E does not occur.
Example 1.6. Consider Example 1.1 (4). If E is the event consisting of the outcomes with at least one tail, find E^c.
E = {HHT, HTH, THH, TTH, THT, HTT, TTT}
E^c = {HHH}
2. AXIOMS OF PROBABILITY
Having introduced the concepts of “experiment” and “sample space”, we are now ready to pursue
in a formal way the all-important problem of assigning a probability to an experiment’s outcome – and, more
generally, to an event. Specifically, if 𝐸 is any event defined on a sample space 𝑆, the symbol 𝑃(𝐸) will
denote the probability of 𝐸, and we will refer to 𝑃 as the probability function. It is, in effect, a mapping
from a set (i.e. an event) to a number. The backdrop for our discussion will be the unions, intersections,
and complements of set theory; the starting point will be the axioms that were originally set forth by Andrei
Kolmogorov, the great Russian probabilist. These axioms are necessary and sufficient to define the way
any and all probabilities must behave.
If S has a finite number of members, Kolmogorov showed that as few as three axioms are necessary and sufficient for characterizing the probability function P:
Axiom 1: Let E be any event defined over S. Then P(E) ≥ 0.
Axiom 2: P(S) = 1.
Axiom 3: Let E and F be any two mutually exclusive events defined over S. Then
P(E ∪ F) = P(E) + P(F).
When S has an infinite number of members, a fourth axiom is needed:
Axiom 4: Let E_1, E_2, … be events defined over S with E_i ∩ E_j = ∅ for i ≠ j. Then
P(⋃_{i=1}^∞ E_i) = ∑_{i=1}^∞ P(E_i).
From these simple statements come the general rules for manipulating the probability function that
apply no matter what specific mathematical form the function may take in a particular context.
Some of the immediate consequences of Kolmogorov’s axioms are the results given in Theorems
2.1 through 2.6. Despite their simplicity, several of these properties prove to be immensely useful in solving
all sorts of problems.
Theorem 2.1. For any event E defined on S, P(E^c) = 1 − P(E).
Theorem 2.5. If E_1, E_2, …, E_n are mutually exclusive events defined on S, then
P(⋃_{i=1}^n E_i) = ∑_{i=1}^n P(E_i).
Proof.
We use mathematical induction.
i. WTS: The result is true for n = 1. This is immediate, since
P(⋃_{i=1}^1 E_i) = P(E_1) = ∑_{i=1}^1 P(E_i).
ii. Assume the result is true for n = k. Since the events ⋃_{i=1}^k E_i and E_{k+1} are mutually exclusive, Axiom 3 and the inductive hypothesis give
P(⋃_{i=1}^{k+1} E_i) = P((⋃_{i=1}^k E_i) ∪ E_{k+1})
= P(⋃_{i=1}^k E_i) + P(E_{k+1})
= ∑_{i=1}^k P(E_i) + P(E_{k+1})
= ∑_{i=1}^{k+1} P(E_i).
By induction, the result holds for every n.
Theorem 2.6. For any events 𝐸 and 𝐹, 𝑃(𝐸 ∪ 𝐹 ) = 𝑃(𝐸 ) + 𝑃(𝐹 ) − 𝑃(𝐸 ∩ 𝐹 ).
Proof.
Note that the following are true.
𝐸 = (𝐸 ∩ 𝐹 𝑐 ) ∪ (𝐸 ∩ 𝐹 ) and 𝐹 = (𝐹 ∩ 𝐸 𝑐 ) ∪ (𝐸 ∩ 𝐹 )
Since 𝐸 ∩ 𝐹 𝑐 and 𝐸 ∩ 𝐹 are mutually exclusive, it follows from Axiom 3 that
𝑃(𝐸 ) = 𝑃(𝐸 ∩ 𝐹 𝑐 ) + 𝑃 (𝐸 ∩ 𝐹 ).
Similarly,
𝑃(𝐹 ) = 𝑃 (𝐹 ∩ 𝐸 𝑐 ) + 𝑃(𝐸 ∩ 𝐹 ).
Adding the two preceding equations, we have
𝑃(𝐸 ) + 𝑃 (𝐹 ) = 𝑃(𝐸 ∩ 𝐹 𝑐 ) + 𝑃(𝐸 ∩ 𝐹 ) + 𝑃 (𝐹 ∩ 𝐸 𝑐 ) + 𝑃(𝐸 ∩ 𝐹 ).
We can verify that the sum of the first three terms on the right side of the equation equals P(E ∪ F). Thus,
P(E) + P(F) = P(E ∪ F) + P(E ∩ F).
Subtracting P(E ∩ F) from both sides of the equation gives
P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
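On a finite sample space with equally likely outcomes, Theorem 2.6 can be verified by direct counting. The sketch below does this for the two-dice space; the particular events A and B are arbitrary choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] <= 2}     # red die shows 1 or 2
B = {s for s in S if sum(s) == 7}   # faces sum to 7

# Theorem 2.6: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
```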
Example 2.7.
1. Let 𝐴 and 𝐵 be two events defined on a sample space 𝑆 such that 𝑃(𝐴) = 0.3, 𝑃 (𝐵) = 0.5,
and 𝑃(𝐴 ∪ 𝐵) = 0.7. Find:
(a) 𝑃 (𝐴 ∩ 𝐵)
By Theorem 2.6, 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃 (𝐵) − 𝑃(𝐴 ∩ 𝐵).
Thus, 𝑃(𝐴 ∩ 𝐵) = 𝑃 (𝐴) + 𝑃(𝐵) − 𝑃 (𝐴 ∪ 𝐵).
= 0.3 + 0.5 − 0.7
P(A ∩ B) = 0.1
(b) 𝑃 (𝐴𝑐 ∪ 𝐵𝑐 )
By De Morgan’s Law, 𝑃(𝐴𝑐 ∪ 𝐵𝑐 ) = 𝑃((𝐴 ∩ 𝐵)𝑐 ).
By Theorem 2.1, 𝑃((𝐴 ∩ 𝐵)𝑐 ) = 1 − 𝑃(𝐴 ∩ 𝐵).
= 1 − 0.1
P(A^c ∪ B^c) = 0.9
(c) 𝑃(𝐴𝑐 ∩ 𝐵)
Note that 𝑃(𝐴𝑐 ∩ 𝐵) = 𝑃 (𝐵) − 𝑃(𝐴 ∩ 𝐵).
= 0.5 − 0.1
P(A^c ∩ B) = 0.4
2. Mary is taking two books along on her holiday vacation. With probability 0.5, she will like the first
book; with probability 0.4, she will like the second book; and with probability 0.3, she will like both
books. What is the probability that she likes neither book?
Let 𝐴 and 𝐵 respectively denote the events of liking the first and the second books.
𝑃(𝐴) = 0.5
𝑃(𝐵) = 0.4
𝑃(𝐴 ∩ 𝐵) = 0.3
The event that she likes neither book means that she doesn’t like both books, hence the
probability of this event can be represented as 𝑃 (𝐴𝑐 ∩ 𝐵𝑐 ).
Applying De Morgan’s Law and Theorem 2.1, we have
𝑃(𝐴𝑐 ∩ 𝐵𝑐 ) = 𝑃((𝐴 ∪ 𝐵)𝑐 ) = 1 − 𝑃(𝐴 ∪ 𝐵).
To solve for P(A ∪ B), we use Theorem 2.6.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.5 + 0.4 − 0.3 = 0.6
Therefore, P(A^c ∩ B^c) = 1 − 0.6 = 0.4.
3. In a newly released martial arts film, the actress playing the lead role has a stunt double who handles all of the physically dangerous action scenes. According to the script, the actress appears in 40% of the film’s scenes, her double appears in 30%, and the two of them are together 5% of the time. What is the probability that in a given scene:
(a) only the stunt double appears?
Let A and D respectively denote the events that the actress and the double appear in a scene, so P(A) = 0.40, P(D) = 0.30, and P(A ∩ D) = 0.05.
P(only the double appears) = P(D) − P(A ∩ D) = 0.30 − 0.05 = 0.25
(b) neither the actress nor the double appears?
P(neither appears) = 1 − P(A ∪ D) = 1 − [0.40 + 0.30 − 0.05] = 0.35
4. Having endured (and survived) the mental trauma that comes from taking two years of chemistry, a
year of physics, and a year of biology, Billy decides to test the medical school waters and sends his
MCATs to two colleges, 𝑋 and 𝑌. Based on how his friends have fared, he estimates that his
probability of being accepted at 𝑋 is 0.7, and at 𝑌 is 0.4. He also suspects there is a 75% chance
that at least one of his applications will be rejected. What is the probability that he gets at least one
acceptance?
𝑃(𝑋) = 0.7
𝑃(𝑌) = 0.4
𝑃(𝑋 𝑐 ∪ 𝑌 𝑐 ) = 𝑃((𝑋 ∩ 𝑌)𝑐 ) = 0.75
The probability of the event that he gets at least one acceptance can be represented as
𝑃 (𝑋 ∪ 𝑌 ).
Applying Theorems 2.6 and 2.1, we have
P(X ∪ Y) = P(X) + P(Y) − P(X ∩ Y)
= P(X) + P(Y) − [1 − P((X ∩ Y)^c)]
= 0.7 + 0.4 − [1 − 0.75]
P(X ∪ Y) = 0.85
5. Show that
P(A ∩ B) ≥ 1 − P(A^c) − P(B^c)
for any two events A and B defined on a sample space S.
By Theorem 2.6, P(A ∩ B) = P(A) + P(B) − P(A ∪ B). Since P(A ∪ B) ≤ 1 and, by Theorem 2.1, P(A) = 1 − P(A^c) and P(B) = 1 − P(B^c), we get
P(A ∩ B) ≥ P(A) + P(B) − 1 = 1 − P(A^c) − P(B^c).
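The arithmetic in parts 1 and 4 of this example is easy to double-check with exact fractions; a quick illustrative sketch, not part of the formal development:

```python
from fractions import Fraction

# Part 1 givens: P(A) = 0.3, P(B) = 0.5, P(A ∪ B) = 0.7.
PA, PB, PAuB = Fraction(3, 10), Fraction(1, 2), Fraction(7, 10)
PAiB = PA + PB - PAuB     # Theorem 2.6, rearranged
print(PAiB)               # 1/10 -> P(A ∩ B)
print(1 - PAiB)           # 9/10 -> P(A^c ∪ B^c), by De Morgan and Theorem 2.1
print(PB - PAiB)          # 2/5  -> P(A^c ∩ B)

# Part 4: P(X ∪ Y) = P(X) + P(Y) − [1 − P((X ∩ Y)^c)]
print(Fraction(7, 10) + Fraction(2, 5) - (1 - Fraction(3, 4)))   # 17/20 = 0.85
```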
3. CONDITIONAL PROBABILITY
In Section 2, we calculated probabilities of certain events by manipulating other probabilities whose
values were given. For many real-world situations, though, the “given” in a probability problem goes beyond
simply knowing a set of other probabilities. Sometimes, we know for a fact that certain events have already
occurred, and those occurrences may have a bearing on the probability we are trying to find. In short, the
probability of an event 𝐴 may have to be “adjusted” if we know for certain that some related event 𝐵 has
already occurred. Any probability that is revised to take into account the (known) occurrence of other
events is said to be a conditional probability.
Consider a fair die being tossed, with A defined as the event “6 appears.” Clearly, P(A) = 1/6. But suppose that the die has already been tossed by someone who refuses to tell us whether or not A occurred but does enlighten us to the extent of confirming that B occurred, where B is the event “Even number appears.” What are the chances of A now? Here, common sense can help us: There are three equally likely even numbers making up the event B, one of which satisfies the event A, so the “updated” probability is 1/3.
Notice that the effect of additional information, such as the knowledge that 𝐵 has occurred, is to
revise the original sample space 𝑆 to a new set of outcomes 𝑆 ′ . In this example, the original 𝑆 contained
six outcomes while the conditional sample space contains three.
The symbol 𝑃 (𝐴|𝐵) – read “the probability of 𝐴 given 𝐵” – is used to denote a conditional
probability. Specifically, 𝑃(𝐴|𝐵) refers to the probability that 𝐴 will occur given that 𝐵 has already
occurred.
Definition 3.1. Let 𝐴 and 𝐵 be any two events defined on a sample space 𝑆 such that 𝑃(𝐵) > 0. The
conditional probability of 𝐴, assuming that 𝐵 has already occurred, is written 𝑃(𝐴|𝐵) and is given by
P(A|B) = P(A ∩ B) / P(B)
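With equally likely outcomes, Definition 3.1 reduces to counting within the reduced sample space B. A minimal sketch for the fair-die discussion above:

```python
from fractions import Fraction

die = range(1, 7)
B = [x for x in die if x % 2 == 0]    # "even number appears"
A_in_B = [x for x in B if x == 6]     # outcomes of A that lie inside B

# P(A | B) = P(A ∩ B) / P(B) = ratio of counts over the reduced space B
print(Fraction(len(A_in_B), len(B)))  # 1/3
```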
Example 3.2. A card is drawn from a poker deck. What is the probability that the card is a club, given that
the card is a king?
Let 𝐶 be the event that the card drawn is a club. Let 𝐾 be the event that the card is a king. The desired
probability is 𝑃 (𝐶 |𝐾 ) and by Definition 3.1, we have
P(C|K) = P(C ∩ K) / P(K) = (1/52) / (4/52) = 1/4
Example 3.3. Joe is 80% certain that his missing key is in one of the two pockets of his hanging jacket,
being 40% certain it is in the left-hand pocket and 40% certain it is in the right-hand pocket. If a search of
the left-hand pocket does not find the key, what is the conditional probability that it is in the other pocket?
Let 𝑅 be the event that the key is in the right-hand pocket and let 𝐿 be the event that it is in the left-hand
pocket. The desired probability is denoted by 𝑃(𝑅|𝐿𝑐 ) and by Definition 3.1, we have
P(R|L^c) = P(R ∩ L^c) / P(L^c)
Since 𝐿𝑐 represents the event that the key is not in the left-hand pocket, it follows that the key may be in
the right-hand pocket or neither of the two pockets. This implies that 𝑅 ⊂ 𝐿𝑐 . Hence,
P(R|L^c) = P(R) / P(L^c) = P(R) / [1 − P(L)] = 0.4 / (1 − 0.4) = 2/3
Example 3.4. Consider the set of families having two children. Assume that the 4 possible birth sequences
– (younger child is a boy, older child is a boy), (younger child is a boy, older child is a girl), and so on – are
equally likely. What is the probability that both children are boys given that at least one is a boy?
Let 𝐴 be the event that both children are boys and let 𝐵 be the event that at least one child is a boy. The
desired probability is 𝑃(𝐴|𝐵) and by Definition 3.1, we have
P(A|B) = P(A ∩ B) / P(B)
Since 𝐵 = {(𝑏, 𝑏), (𝑏, 𝑔), (𝑔, 𝑏)} and 𝐴 = {(𝑏, 𝑏)}, clearly 𝐴 ⊂ 𝐵 which implies that
P(A|B) = P(A) / P(B) = (1/4) / (3/4) = 1/3
Example 3.5. Two events 𝐴 and 𝐵 are defined such that (1) the probability that 𝐴 occurs but 𝐵 does not
occur is 0.2, (2) the probability that 𝐵 occurs but 𝐴 does not occur is 0.1, and (3) the probability that neither
occurs is 0.6. What is 𝑃(𝐴|𝐵)?
(1) 𝑃 (𝐴 ∩ 𝐵𝑐 ) = 0.2
(2) 𝑃 (𝐵 ∩ 𝐴𝑐 ) = 0.1
(3) 𝑃 ((𝐴 ∪ 𝐵)𝑐 ) = 0.6
By Theorems 2.6 and 2.1,
𝑃 (𝐴 ∪ 𝐵 ) = 𝑃 (𝐴 ) + 𝑃 (𝐵 ) − 𝑃 (𝐴 ∩ 𝐵 )
1 − 𝑃((𝐴 ∪ 𝐵)𝑐 ) = 𝑃(𝐴 ∩ 𝐵𝑐 ) + 𝑃(𝐴 ∩ 𝐵) + 𝑃(𝐵 ∩ 𝐴𝑐 ) + 𝑃(𝐴 ∩ 𝐵) − 𝑃(𝐴 ∩ 𝐵)
1 − 0.6 = 𝑃 (𝐴 ∩ 𝐵𝑐 ) + 𝑃 (𝐴 ∩ 𝐵) + 𝑃(𝐵 ∩ 𝐴𝑐 )
0.4 = 0.2 + 𝑃(𝐴 ∩ 𝐵) + 0.1
𝑃(𝐴 ∩ 𝐵) = 0.4 − 0.2 − 0.1 = 0.1
We know that
𝑃 (𝐵) = 𝑃 (𝐵 ∩ 𝐴𝑐 ) + 𝑃 (𝐴 ∩ 𝐵)
𝑃(𝐵) = 0.1 + 0.1 = 0.2
Hence, the desired probability is
P(A|B) = P(A ∩ B) / P(B) = 0.1 / 0.2 = 1/2
Example 3.6. Suppose that an urn contains 8 red balls and 4 white balls. We draw 2 balls from the urn
without replacement. If we assume that at each draw, each ball in the urn is equally likely to be chosen,
what is the probability that both balls drawn are red?
Let 𝑅1 and 𝑅2 respectively denote the events that the first and second balls drawn are red. The desired
probability here is
P(R1 ∩ R2) = ₈C₂ / ₁₂C₂ = 28/66 = 14/33
Alternatively, we can solve this probability by using Definition 3.1. Hence,
P(R1 ∩ R2) = P(R2|R1)P(R1) = (7/11)(8/12) = 14/33
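Both computations can also be corroborated by simulation. A Monte Carlo sketch (the trial count of 100,000 is an arbitrary choice):

```python
import random

urn = ["red"] * 8 + ["white"] * 4
trials = 100_000

# random.sample draws without replacement, mirroring the experiment.
hits = sum(random.sample(urn, 2) == ["red", "red"] for _ in range(trials))
print(hits / trials)   # should be close to 14/33 ≈ 0.4242
```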
We have seen that conditional probabilities can be useful in evaluating intersection probabilities; that is, P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A). The next result, which is sometimes referred to as the multiplication rule, extends this to higher-order intersections.
Theorem 3.7. (Multiplication Rule) Let A1, A2, …, An be events defined on S such that P(A1 ∩ A2 ∩ ⋯ ∩ A_{n−1}) > 0. Then
P(A1 ∩ A2 ∩ ⋯ ∩ An) = P(A1) ∙ P(A2|A1) ∙ P(A3|A1 ∩ A2) ⋯ P(An|A1 ∩ A2 ∩ ⋯ ∩ A_{n−1}).
Example 3.8. An urn contains five white chips, four black chips, and three red chips. Four chips are drawn
sequentially and without replacement. What is the probability of obtaining the sequence (white, red, white,
black)?
Let A be the event that the first chip drawn is white, B be the event that the second is red, C be the event that the third is white, and D be the event that the last is black.
Our desired probability is P(A ∩ B ∩ C ∩ D). By Theorem 3.7,
P(A ∩ B ∩ C ∩ D) = P(A) ∙ P(B|A) ∙ P(C|A ∩ B) ∙ P(D|A ∩ B ∩ C)
= (5/12)(3/11)(4/10)(4/9) = 240/11880 = 2/99
Example 3.9. An ordinary deck of 52 playing cards is randomly divided into 4 piles of 13 cards each.
Compute the probability that each pile has exactly 1 ace.
Let 𝐴1 be the event that the ace of spades is in any one of the piles, 𝐴2 be the event that the ace of
spades and ace of hearts are in different piles, 𝐴3 be the event that the aces of spades, hearts and
diamonds are all in different piles, and 𝐴4 be the event that all aces are in different piles.
Our desired probability is 𝑃(𝐴1 ∩ 𝐴2 ∩ 𝐴3 ∩ 𝐴4 ).
By Theorem 3.7, we have
𝑃(𝐴1 ∩ 𝐴2 ∩ 𝐴3 ∩ 𝐴4 ) = 𝑃(𝐴1 ) ∙ 𝑃(𝐴2 |𝐴1 ) ∙ 𝑃(𝐴3 |𝐴1 ∩ 𝐴2 ) ∙ 𝑃(𝐴4 |𝐴1 ∩ 𝐴2 ∩ 𝐴3 )
= 1 ∙ (1 − 12/51) ∙ (1 − 24/50) ∙ (1 − 36/49)
= 1 ∙ (39/51) ∙ (26/50) ∙ (13/49)
= 2197/20825 ≈ 0.105
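The answer 2197/20825 ≈ 0.1055 is small enough that a simulation is a worthwhile sanity check; a sketch (again, the trial count is arbitrary):

```python
import random

def one_ace_per_pile():
    deck = ["A"] * 4 + ["x"] * 48                       # 4 aces, 48 other cards
    random.shuffle(deck)
    piles = [deck[13 * k: 13 * (k + 1)] for k in range(4)]
    return all(pile.count("A") == 1 for pile in piles)

trials = 200_000
print(sum(one_ace_per_pile() for _ in range(trials)) / trials)   # ≈ 0.1055
```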
The next results are two very useful theorems that apply to partitioned sample spaces. By
definition, a set of events 𝐴1 , 𝐴2 , … , 𝐴𝑛 partitions a sample space 𝑆 if every outcome in 𝑆 belongs to one
and only one of the 𝐴𝑖 ’s. That is, the 𝐴𝑖 ’s are mutually exclusive and their union is 𝑆.
Let B denote any event defined on S. The first result, Theorem 3.10, gives a formula for the “unconditional” probability of B (in terms of the Ai’s). Then Theorem 3.13 calculates the set of conditional probabilities, P(Aj|B), j = 1, 2, …, n.
Theorem 3.10. Let A1, A2, …, An be a set of events defined over a sample space S such that S = ⋃_{i=1}^n Ai, Ai ∩ Aj = ∅ for i ≠ j, and P(Ai) > 0 for i = 1, 2, …, n. For any event B defined on S,
P(B) = ∑_{i=1}^n P(B|Ai)P(Ai).
Proof.
We can express 𝐵 as
𝐵 = (𝐵 ∩ 𝐴1 ) ∪ (𝐵 ∩ 𝐴2 ) ∪ ⋯ ∪ (𝐵 ∩ 𝐴𝑛 )
By Theorem 2.5, we have
𝑃(𝐵) = 𝑃(𝐵 ∩ 𝐴1 ) + 𝑃 (𝐵 ∩ 𝐴2 ) + ⋯ + 𝑃(𝐵 ∩ 𝐴𝑛 )
By Definition 3.1, we have
P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + ⋯ + P(B|An)P(An)
= ∑_{i=1}^n P(B|Ai)P(Ai)
as desired.
Example 3.11. Urn I contains two black chips and four white chips; urn II, three black and one white. A
chip is drawn at random from urn I and transferred to urn II. Then a chip is drawn from urn II. What is the
probability that the chip drawn from urn II is black?
Let B be the event that the chip drawn from urn II is black. Let A1 and A2 be the events that the chip transferred from urn I is black and white, respectively. Then P(A1) = 2/6, P(A2) = 4/6, and, since urn II holds five chips after the transfer, P(B|A1) = 4/5 and P(B|A2) = 3/5.
By Theorem 3.10, we have
P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) = (4/5)(2/6) + (3/5)(4/6) = 2/3
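A short simulation of the transfer-then-draw experiment agrees with the answer of 2/3 (illustrative sketch only):

```python
import random

def draw_black_after_transfer():
    urn1 = ["black"] * 2 + ["white"] * 4      # urn I
    urn2 = ["black"] * 3 + ["white"] * 1      # urn II
    urn2.append(random.choice(urn1))          # transfer a random chip from I to II
    return random.choice(urn2) == "black"     # then draw from urn II

trials = 100_000
print(sum(draw_black_after_transfer() for _ in range(trials)) / trials)  # ≈ 2/3
```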
Example 3.12. A standard poker deck is shuffled and the card on top is removed. What is the probability
that the second card is an ace?
Let 𝐵 be the event that the second card is an ace. Let 𝐴1 and 𝐴2 be the events that the top card was an
ace and not an ace, respectively.
By Theorem 3.10, we have
𝑃(𝐵) = 𝑃(𝐵|𝐴1 )𝑃(𝐴1 ) + 𝑃(𝐵|𝐴2 )𝑃(𝐴2 )
= (3/51)(4/52) + (4/51)(48/52) = 1/13
Now, the next result will be applied simply as Reverend Thomas Bayes originally intended, as a
formula for evaluating a certain kind of “inverse” probability. If we know 𝑃 (𝐵|𝐴𝑖 ) for all 𝑖, this theorem
enables us to compute conditional probabilities “in the other direction” – that is, we can deduce 𝑃(𝐴𝑗 |𝐵)
from the 𝑃(𝐵|𝐴𝑖 )’s.
Theorem 3.13. (Bayes’ Theorem) Let A1, A2, …, An be a set of events defined over a sample space S such that S = ⋃_{i=1}^n Ai, Ai ∩ Aj = ∅ for i ≠ j, and P(Ai) > 0 for i = 1, 2, …, n. For any event B defined on S, where P(B) > 0,
P(Aj|B) = P(B|Aj)P(Aj) / ∑_{i=1}^n P(B|Ai)P(Ai)
for any 1 ≤ j ≤ n.
Proof.
By Definition 3.1, for any 1 ≤ 𝑗 ≤ 𝑛, we have
P(Aj|B) = P(Aj ∩ B) / P(B) = P(B|Aj)P(Aj) / P(B)
By Theorem 3.10, we have
P(Aj|B) = P(B|Aj)P(Aj) / ∑_{i=1}^n P(B|Ai)P(Ai)
Example 3.14. A biased coin, twice as likely to come up heads as tails, is tossed once. If it shows heads, a
chip is drawn from urn I, which contains three white chips and four red chips; if it shows tails, a chip is
drawn from urn II, which contains six white chips and three red chips. Given that a white chip was drawn,
what is the probability that the coin came up tails?
Let 𝐵 be the event that a white chip was drawn. Let 𝐴1 and 𝐴2 respectively denote the events that the coin
came up heads (the chip came from urn I) and came up tails (the chip came from urn II). Our desired
probability is 𝑃 (𝐴2 |𝐵).
Applying Theorem 3.13, we have
P(A2|B) = P(B|A2)P(A2) / [P(B|A1)P(A1) + P(B|A2)P(A2)]
From the compositions of the two urns, P(B|A1) = 3/7 and P(B|A2) = 6/9. Since the coin is twice as likely to come up heads as tails, P(A1) = 2/3 and P(A2) = 1/3.
Hence,
P(A2|B) = (6/9)(1/3) / [(3/7)(2/3) + (6/9)(1/3)] = 7/16
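Bayes’ Theorem is mechanical once the priors P(Ai) and the likelihoods P(B|Ai) are tabulated. The helper function below is a hypothetical illustration (not a library routine); it reproduces the 7/16 just computed.

```python
from fractions import Fraction

def posterior(priors, likelihoods, j):
    """Theorem 3.13: P(A_j | B) from priors P(A_i) and likelihoods P(B | A_i)."""
    numerator = likelihoods[j] * priors[j]
    denominator = sum(l * p for l, p in zip(likelihoods, priors))
    return numerator / denominator

priors = [Fraction(2, 3), Fraction(1, 3)]       # heads (urn I), tails (urn II)
likelihoods = [Fraction(3, 7), Fraction(6, 9)]  # P(white chip | urn)
print(posterior(priors, likelihoods, 1))        # 7/16
```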
Example 3.15. During a power blackout, one hundred persons are arrested on suspicion of looting. Each is
given a polygraph test. From past experience it is known that the polygraph is 90% reliable when
administered to a guilty suspect and 98% reliable when given to someone who is innocent. Suppose that of
the one hundred persons taken into custody, only twelve were actually involved in any wrongdoing. What is
the probability that a given suspect is innocent given that the polygraph says he is guilty?
Let 𝐵 be the event that the polygraph says the suspect is guilty. Let 𝐴1 and 𝐴2 be the events that the
suspect is guilty and not guilty, respectively. Our desired probability is 𝑃(𝐴2 |𝐵).
Applying Theorem 3.13, we have
P(A2|B) = P(B|A2)P(A2) / [P(B|A1)P(A1) + P(B|A2)P(A2)]
Based on the given conditions, we know that P(B|A1) = 0.9 and P(B^c|A2) = 0.98. This implies that P(B|A2) = 0.02. Note that P(A1) = 12/100 and P(A2) = 88/100.
Therefore,
P(A2|B) = 0.02(0.88) / [0.9(0.12) + 0.02(0.88)] = 0.0176/0.1256 = 22/157 ≈ 0.14
4. INDEPENDENCE
Section 3 dealt with the problem of reevaluating the probability of a given event in light of the
additional information that some other event has already occurred. It often is the case, though, that the
probability of the given event remains unchanged, regardless of the outcome of the second event – that is,
P(A|B) = P(A) = P(A|B^c). Events sharing this property are said to be independent.
Definition 4.1. Two events A and B defined on S are said to be independent if P(A ∩ B) = P(A) ∙ P(B).
Example 4.2. Let 𝐴 be the event of drawing a king from a standard poker deck and 𝐵 be the event of
drawing a diamond. Then, by Definition 4.1, 𝐴 and 𝐵 are independent because the probability of their
intersection – drawing a king of diamonds – is equal to 𝑃 (𝐴) · 𝑃(𝐵):
P(A ∩ B) = 1/52 = (4/52) ∙ (13/52) = P(A) ∙ P(B)
Example 4.3. Two coins are flipped, and all 4 outcomes are assumed to be equally likely. If 𝐸 is the event
that the first coin lands on heads and 𝐹 the event that the second lands on tails, then 𝐸 and 𝐹 are
independent, since P(E) = P({HH, HT}) = 2/4, P(F) = P({HT, TT}) = 2/4, and
P(E ∩ F) = P({HT}) = 1/4 = (2/4) ∙ (2/4) = P(E) ∙ P(F)
Example 4.4. Suppose that 𝐴 and 𝐵 are independent events. Does it follow that 𝐴𝑐 and 𝐵𝑐 are also
independent?
We need to show that 𝑃(𝐴𝑐 ∩ 𝐵𝑐 ) = 𝑃(𝐴𝑐 )𝑃(𝐵𝑐 ).
𝑃(𝐴𝑐 ∩ 𝐵𝑐 ) = 𝑃[(𝐴 ∪ 𝐵)𝑐 ] (by De Morgan’s Law)
= 1 − 𝑃 (𝐴 ∪ 𝐵 ) (by Theorem 2.1)
= 1 − [𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)] (by Theorem 2.6)
= 1 − 𝑃 (𝐴 ) − 𝑃 (𝐵 ) + 𝑃 (𝐴 )𝑃 (𝐵 ) (by Definition 4.1)
= [1 − 𝑃(𝐴)][1 − 𝑃(𝐵)] (by factoring)
= 𝑃(𝐴𝑐 )𝑃(𝐵𝑐 ) (by Theorem 2.1)
Example 4.5. Suppose that 𝑃(𝐴 ∩ 𝐵) = 0.2, 𝑃(𝐴) = 0.6, and 𝑃(𝐵) = 0.5.
(a) Are A and B mutually exclusive?
Since P(A ∩ B) = 0.2 ≠ 0, the events A and B are not mutually exclusive.
(b) Are A and B independent?
Since P(A) ∙ P(B) = 0.6(0.5) = 0.3 ≠ 0.2 = P(A ∩ B), the events A and B are not independent.
Example 4.6. Myra and Carlos are summer interns working as proofreaders for a local newspaper. Based
on aptitude tests, Myra has a 50% chance of spotting a hyphenation error, while Carlos picks up on that
same kind of mistake 80% of the time. Suppose the copy they are proofing contains a hyphenation error.
What is the probability it goes undetected?
Let 𝐴 and 𝐵 be the events that Myra and Carlos, respectively, spot the hyphenation error. Our desired
probability is
𝑃(error is undetected) = 1 − 𝑃(error is detected)
= 1 − P(at least one of them detects the error)
= 1 − P(A ∪ B)
= 1 − [P(A) + P(B) − P(A ∩ B)]
= 1 − [P(A) + P(B) − P(A)P(B)]   (assuming Myra and Carlos spot errors independently)
= 1 − [0.5 + 0.8 − 0.5(0.8)]
= 0.1
Definition 4.7. Three events A, B, and C are said to be independent if
P(A ∩ B) = P(A) ∙ P(B), P(A ∩ C) = P(A) ∙ P(C), P(B ∩ C) = P(B) ∙ P(C), and
P(A ∩ B ∩ C) = P(A) ∙ P(B) ∙ P(C).
Of course, we may also extend this definition of independence to more than three events. More generally, the events A1, A2, …, An are said to be independent if for every subset A1′, A2′, …, Ar′, r ≤ n, of these events,
P(A1′ ∩ A2′ ∩ ⋯ ∩ Ar′) = P(A1′) ∙ P(A2′) ∙ ⋯ ∙ P(Ar′)
Example 4.8. Suppose that two fair dice (one red and one green) are rolled. Define the events:
A: a 1 or a 2 shows on the red die
A = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6)}
B: a 3, 4, or 5 shows on the green die
B = {(1,3), (2,3), (3,3), (4,3), (5,3), (6,3), (1,4), (2,4), (3,4), (4,4), (5,4), (6,4), (1,5), (2,5), (3,5), (4,5), (5,5), (6,5)}
C: the total of the two dice is 4, 11, or 12
C = {(1,3), (2,2), (3,1), (5,6), (6,5), (6,6)}
Note that P(A) = 12/36, P(B) = 18/36, and P(C) = 6/36. Since A ∩ B ∩ C = {(1,3)}, it follows that P(A ∩ B ∩ C) = 1/36. Moreover, P(A) ∙ P(B) ∙ P(C) = (1/3)(1/2)(1/6) = 1/36. Thus,
P(A ∩ B ∩ C) = P(A) ∙ P(B) ∙ P(C)
Notice the following:
P(A ∩ B) = 6/36 = 1/6 and P(A) ∙ P(B) = (12/36)(18/36) = 1/6
P(A ∩ C) = 2/36 = 1/18 and P(A) ∙ P(C) = (12/36)(6/36) = 1/18
P(B ∩ C) = 2/36 = 1/18 and P(B) ∙ P(C) = (18/36)(6/36) = 1/12
Since P(B ∩ C) ≠ P(B) ∙ P(C), it follows that A, B, and C are not independent.
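Because the sample space is small, every independence condition in this example can be checked exhaustively; a minimal enumeration sketch:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] in (1, 2)}          # 1 or 2 on the red die
B = {s for s in S if s[1] in (3, 4, 5)}       # 3, 4, or 5 on the green die
C = {s for s in S if sum(s) in (4, 11, 12)}   # total is 4, 11, or 12

print(prob(A & B & C) == prob(A) * prob(B) * prob(C))  # True: triple product holds
print(prob(B & C) == prob(B) * prob(C))                # False: pairwise check fails
```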
Example 4.9. An insurance company plans to assess its future liabilities by sampling the records of its
current policyholders. A pilot study has turned up three clients – one living in Alaska, one in Missouri, and
one in Vermont – whose estimated chances of surviving to the year 2021 are 0.7, 0.9, and 0.3,
respectively. What is the probability that by the end of 2020 the company will have had to pay death
benefits to exactly one of the three?
Let 𝐴1 , 𝐴2 , 𝐴3 be the events that the clients living in Alaska, Missouri and Vermont, respectively, survive
through 2021. Let 𝐸 be the event that exactly one of them dies. Then,
E = (A1^c ∩ A2 ∩ A3) ∪ (A1 ∩ A2^c ∩ A3) ∪ (A1 ∩ A2 ∩ A3^c)
Since the three events in the above union are mutually exclusive, our desired probability is
𝑃(𝐸 ) = 𝑃(𝐴𝑐1 ∩ 𝐴2 ∩ 𝐴3 ) + 𝑃(𝐴1 ∩ 𝐴𝑐2 ∩ 𝐴3 ) + 𝑃(𝐴1 ∩ 𝐴2 ∩ 𝐴𝑐3 )
We will assume in this problem that the three clients survive independently of one another, hence
P(E) = P(A1^c)P(A2)P(A3) + P(A1)P(A2^c)P(A3) + P(A1)P(A2)P(A3^c)
= 0.3(0.9)(0.3) + 0.7(0.1)(0.3) + 0.7(0.9)(0.7) = 0.543
We have already seen several examples where the event of interest was actually an intersection of
independent simpler events (in which case the probability of the intersection reduced to a product). There is
a special case of that basic scenario that deserves special mention because it applies to numerous real-
world situations. If the events making up the intersection all arise from the same physical circumstances
and assumptions (i.e., they represent repetitions of the same experiment), they are referred to as repeated
independent trials. The number of such trials may be finite or infinite.
Example 4.10. Suppose the string of Christmas tree lights you just bought has twenty-four bulbs wired in
series. If each bulb has a 99.9% chance of “working” the first time current is applied, what is the probability
that the string itself will not work?
Let 𝐴𝑖 be the event that the 𝑖th bulb fails, where 𝑖 = 1, 2, … , 24. Then
𝑃(string does not work) = 1 − 𝑃(string works)
= 1 − 𝑃(all bulbs work)
= 1 − P(A1^c ∩ A2^c ∩ ⋯ ∩ A24^c)
= 1 − (0.999)^24   (assuming the bulbs fail independently of one another)
= 0.0237
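The computation is a one-line consequence of repeated independent trials (again assuming, as above, that the bulbs fail independently):

```python
# P(string works) = P(all 24 bulbs work) = 0.999 ** 24 under independence.
p_bulb_works = 0.999
print(1 - p_bulb_works ** 24)   # ≈ 0.0237
```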
Exercises
2. Urn A contains 3 red and 3 black balls, while urn B contains 4 red and 6 black balls. If a ball is randomly selected from each urn, what is the probability that the balls will be the same color?
3. Let 𝐴 and 𝐵 be two events defined on 𝑆. If the probability that at least one of them occurs is 0.3,
and the probability that 𝐴 occurs but 𝐵 does not occur is 0.1, what is 𝑃(𝐵)?
4. A deck of cards is dealt out. What is the probability that the 14th card is an ace? What is the probability that the first ace occurs on the 14th card?
6. Find 𝑃(𝐴 ∩ 𝐵) if 𝑃(𝐴) = 0.2, 𝑃(𝐵) = 0.4, and 𝑃 (𝐴|𝐵 ) + 𝑃(𝐵|𝐴) = 0.75.
7. Show that if 𝑃(𝐴) > 0, then 𝑃 [(𝐴 ∩ 𝐵)|𝐴] ≥ 𝑃[(𝐴 ∩ 𝐵)|(𝐴 ∪ 𝐵)].