ch04 BasicProbability
Probability Concepts
CHAPTER 4
Basic Probability
1. Introduction
In Chapter 3, we computed summary population parameters and sample statistics such as mean and
standard deviation for datasets. Because we were not given information about whether one observation
was more “likely” than another, we gave them all equal weight. (This is sometimes called the “equally
likely” assumption.) For example, in the formula for the population mean µ of a numerical attribute,
taken over all N members of the population,
\[
\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i ,
\]
2. Overview of Probability
The mathematical theory of probability deals with patterns that occur in random events. Probability
is a well-established branch of mathematics that originated in 17th-century France: gamblers who had
to leave a game early wanted a “fair” way of dividing the pot. This meant working out each player’s
chance of winning and dividing the pot in proportion to those chances. Pascal and Fermat¹ had a long
written correspondence which led to the founding of probability theory. This branch of mathematics
studies games and other situations which involve chance or uncertainty. Today, probability theory finds
applications in every area of academic activity from finance and economics to physics, and in daily
experience from weather prediction to predicting the risks of new medical treatments.
Probability theory tries to describe random occurrences, e.g., measuring the height of a randomly chosen
person.
We attempt to describe how likely a given outcome or set of outcomes is when we observe a random
process or experiment. To say an experiment is random means (a) it is repeatable under identical
conditions; (b) the outcome of any particular trial can vary; (c) if we repeat the experiment many times,
we see some statistical regularity in the outcomes. For example:
• We roll a die² and the possible outcomes are 1, 2, 3, 4, 5 and 6, corresponding to the side that
turns up;
• We toss a coin, with possible outcomes H (heads) and T (tails);
• A bank gives out a loan, which is either repaid in full (R) or not repaid (N).
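The statistical regularity described in point (c) can be illustrated with a short simulation. The sketch below is not part of the original notes: it assumes Python's standard `random` module as the source of randomness, and the function name `relative_frequencies`, the seed and the numbers of trials are all arbitrary, illustrative choices.

```python
import random
from collections import Counter


def relative_frequencies(num_trials, seed=0):
    """Roll a simulated fair six-sided die num_trials times and
    return the relative frequency of each face."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(num_trials))
    return {face: counts[face] / num_trials for face in range(1, 7)}


# The outcome of any single trial varies, but over many trials each
# face turns up in roughly one sixth of the rolls.
for n in (60, 6_000, 60_000):
    freqs = relative_frequencies(n)
    print(n, {face: round(f, 3) for face, f in freqs.items()})
```

With only 60 rolls the frequencies can differ noticeably from 1/6; with 60,000 rolls they should all lie close to it.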
¹ Among other things (Pascal’s triangle, the programming language Pascal named after him, his lifelong ill-health, ...)
the philosopher and mathematician Blaise Pascal is famous for Pascal’s wager on why he should believe in God: Pascal
reasons that if he believes in God but God does not exist, then he has lost a finite amount (of time); but if he does not
believe in God, yet God exists, then he has lost an infinite reward (eternity in heaven); so purely because of the infinite
difference in consequences, he is better off believing in God. Fermat was a well-off official who contributed to many areas
of maths, especially number theory: his famous Fermat’s last theorem, conjectured in 1637 in the margin of his copy of
Diophantus’s book Arithmetica, opened up whole new areas of mathematics but was not proven until 1994 by
Andrew Wiles, whose proof was published in 1995, 358 years after the conjecture.
² The word die is singular, while the word dice is its plural.
60 MIS10090
The observation or experiment is often called a trial. Much of the time, we measure some numerical
attribute derived from the outcome (see random variables below), because we can then carry out numerical
calculations on it, such as finding the mean and standard deviation.
2.1. Terms in probability. There are some technical terms which you will need to understand to
progress further.
Definition 4.1 (Outcome). An outcome (or observation) is the result of observing a random process or
experiment, that is, the result of carrying out a trial, e.g., measuring a person’s height as 1.8 m.
Definition 4.2 (Sample space). The sample space is the set of all possible outcomes, e.g., the set of all
heights of people measured. It is often denoted by Ω or S. It is also known as the probability space.
The sample space depends on what we want to observe in the experiment, e.g., the experiment might
be on all students in the class, but S is different if we measure weights rather than heights.
In the examples earlier, the sample spaces are, respectively: {1, 2, 3, 4, 5, 6}, {H, T} and {R, N}.
Remark 4.3 (For your information: not examinable). The positive integers N = {1, 2, 3, 4, ...} are also
called the natural numbers or counting numbers. A set S is called countable if it can be put in one-one
correspondence with a subset of N, possibly a finite subset of N. That is, S is countable if you can write
S as a (possibly infinite) indexed list S = {s₁, s₂, s₃, ...}.
In mathematics, there are different sizes of infinity, and N is the smallest. A set is called uncountable if
it is not countable: this means it has “more” elements than N, i.e., is of a larger infinity, such as R, the
real numbers. R and intervals within R, e.g., [a, b] = {x ∈ R : a ≤ x ≤ b} where a < b, are the only
uncountable sets we look at.
From this comes the fundamental distinction between discrete and continuous that we will meet later in
the context of distributions and random variables. A sample space S is discrete if S is countable (which
includes the case of S being finite); and S is continuous if S is uncountable. Then a discrete distribution
or random variable (see later) is one defined on a countable S; while a continuous distribution or random
variable is one defined on an uncountable S.
Definition 4.4 (Event). An event is a defined set of outcomes (a subset of the sample space). A
single-element event (just one outcome) is sometimes called an elementary event.
Example 4.5. We might say the event “Giant person” is the set of all measurements of a person’s height
greater than 2.0 m, that is, the subset G = {height : height > 2.0 m}.
Or, if rolling a die, we might define “Even” as the event {2, 4, 6} where an even number is rolled. ♢
Definition 4.6 (Occurs). An event occurs if in our trial we observe one of the outcomes corresponding
to that event.
Example 4.7. If we measure a person’s height as 2.1 m, the event “Giant person” or G occurs.
If our die roll shows 4, then the event “Even” occurs. If the roll shows 2, then the event “Even” occurs.
But if the roll shows 5, then the event “Even” does not occur. ♢
Example 4.8. Two distinct six-sided dice are rolled and the numbers on their faces noted. Describe the
sample space. Define and describe the events
A = {the outcomes where the sum of the two faces is 6}
B = {the outcomes where both dice show the same number}.
♢
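One way to work through Example 4.8 is to enumerate the sample space by machine. The following is a sketch only: it assumes each outcome is recorded as an ordered pair (first die, second die), which is reasonable because the dice are distinct, and the variable names are illustrative.

```python
from itertools import product

faces = range(1, 7)

# Sample space: all ordered pairs (first die, second die).
sample_space = set(product(faces, repeat=2))

# Event A: the sum of the two faces is 6.
event_a = {(d1, d2) for (d1, d2) in sample_space if d1 + d2 == 6}

# Event B: both dice show the same number.
event_b = {(d1, d2) for (d1, d2) in sample_space if d1 == d2}

print(len(sample_space))  # 36 outcomes in total
print(sorted(event_a))    # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
print(len(event_b))       # 6: the doubles (1, 1) up to (6, 6)
print(event_a & event_b)  # {(3, 3)}
```

The last line shows that A and B share the outcome (3, 3), so both events can occur on the same roll.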
Definition 4.9 (Mutually exclusive). Two or more events are said to be mutually exclusive (or disjoint)
if at most one of them can occur when the experiment is performed, that is, if no two of them have
outcomes in common.
Important: in the next chapter we will meet the concept of independent events. Be aware from the
outset that mutually exclusive and independent are entirely different concepts!
Data Analysis for Decision Makers 61
Example 4.10. Suppose our experiment is “Select one card at random from a deck of cards”: then S
is the set of all 52 cards. Define Event A as “Queen of Diamonds is selected” and Event B as “Queen
of Clubs is selected”. Then Events A and B are mutually exclusive: at most one of the events A, B can
occur (maybe neither occurs). ♢
Definition 4.11 (Collectively exhaustive). Two or more events are said to be collectively exhaustive if
at least one of the events must occur. The union of these events covers the whole sample space.
Definition 4.12 (Partition). A collection of events is called a partition of the sample space if the events
are both collectively exhaustive and mutually exclusive, that is, exactly one of the events must occur.
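Definitions 4.9, 4.11 and 4.12 translate directly into set operations. The sketch below uses the die-rolling sample space from earlier; the helper names `mutually_exclusive`, `collectively_exhaustive` and `is_partition`, and the event `low`, are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def mutually_exclusive(events):
    """True if no two of the events share an outcome."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))


def collectively_exhaustive(events, sample_space):
    """True if the union of the events is the whole sample space."""
    return set().union(*events) == set(sample_space)


def is_partition(events, sample_space):
    """True if exactly one of the events must occur."""
    return mutually_exclusive(events) and collectively_exhaustive(events, sample_space)


S = {1, 2, 3, 4, 5, 6}
even, odd, low = {2, 4, 6}, {1, 3, 5}, {1, 2, 3}

print(is_partition([even, odd], S))             # True
print(mutually_exclusive([even, low]))          # False: both contain 2
print(collectively_exhaustive([even, low], S))  # False: 5 is in neither
```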
2.2. Displaying events as Venn diagrams. Since events are subsets of the sample space, one
way to display them is by using Venn diagrams. Each of the events is viewed as a set and combined
with the other events in the appropriate manner (union, intersection, etc.) to represent the problem.
Figure 4.1 shows mutually exclusive events A and B.
Figure 4.1. Two mutually exclusive sets (events) A and B within a sample space S
Venn diagrams are particularly useful when the events are not mutually exclusive, that is, there is some
overlap. See Figure 4.2.
What do we mean by the event A and B? We mean the event which occurs when both A occurs and B
occurs. It is called a joint event since A and B occur jointly, i.e., together. This means that the outcome
observed from the random process is an element of both A and B. In Example 4.13 on cards, the event
A and B occurs if the card drawn is a black ace (ace of spades or ace of clubs).
In set notation, it would be more correct to write A ∩ B for A and B, since A and B is the set A ∩ B in
a Venn diagram. We can use either, but you may find A and B or A & B more intuitive. See Figure 4.3.
Similarly, what do we mean by the event A or B? We mean the event which occurs when either or both
of events A or B occur. This means that the outcome observed from the random process is an element
of either A or B (or both). In Example 4.13 on cards, the event A or B occurs if the card drawn is
either a black card or an ace (or both).
In set notation, it would be more correct to write A ∪ B for A or B, since A or B is the set A ∪ B in a
Venn diagram. Again, we can use either, but you may find A or B more intuitive. See Figure 4.4.
Figure 4.2. Two sets (events) A and B within a sample space S. The joint event
A ∩ B = A and B is in green. The event A ∪ B = A or B is all of the red, blue and
green areas together.
Figure 4.3. Two sets (events) A and B within a sample space S, focussing on the joint
event A ∩ B = A and B = A & B.
Figure 4.4. Two sets (events) A and B within a sample space S, focussing on the event
A ∪ B = A or B.
2.3. Probability of an event occurring. We seek to describe how “likely” an event is, e.g., how
likely it is that we will find a “Giant person”.
Definition 4.14. We assign a number between 0 and 1 to an event E to describe the likelihood of E
occurring, with a larger number meaning “more likely”. This number is called the probability of the
event E, and is written as P(E).
These probabilities must obey certain rules, given below. But first, we will consider what probability
might mean: how should we interpret the word “probability”?
3. Interpretations of Probability
There are three major interpretations of the term “probability” of a given event:
4. Rules of Probability
• Given two events A and B, the probability of either or both of events A or B occurring is
calculated using the formula:
P(A or B) = P(A) + P(B) − P(A and B)
To get an intuition as to why this is so, look again at Figure 4.2: event A is the blue and green
areas, while event B is the red and green areas. Then we can get the area covered by A or B
as all of the red, blue and green areas together. This is the area of A together with the area
of B: but we must subtract the area of the joint event A ∩ B = A and B (green) to avoid
double-counting.
• (Consequence of the previous point.) Probabilities of mutually exclusive events add up: if events A₁
and A₂ are mutually exclusive, P(A₁ or A₂) = P(A₁) + P(A₂), since P(A₁ and A₂) = 0. See
Figure 4.1: there is no overlap, so no danger of double-counting.
• The probabilities of a set of mutually exclusive and collectively exhaustive events sum to 1.
  - This is because the union of the mutually exclusive and collectively exhaustive set of events
    is the whole sample space S.
  - In particular, the total probability over S equals 1: P(S) = 1 (sum the probabilities of all
    outcomes).
  - E.g., if the sample space is S = A₁ ∪ A₂ ∪ B with A₁, A₂ and B mutually exclusive
    (non-overlapping), then P(A₁) + P(A₂) + P(B) = 1.
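The addition rule can be checked by direct counting. The sketch below assumes a standard 52-card deck in which every card is equally likely, and takes A to be “black card” and B to be “ace”, the events used in the Venn diagram discussion; the helper `prob` and the way cards are encoded are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = set(product(ranks, suits))  # sample space S: 52 equally likely cards


def prob(event):
    """Probability of an event under the equally likely assumption."""
    return Fraction(len(event), len(deck))


black = {card for card in deck if card[1] in ("spades", "clubs")}  # event A
aces = {card for card in deck if card[0] == "A"}                   # event B

lhs = prob(black | aces)                             # P(A or B)
rhs = prob(black) + prob(aces) - prob(black & aces)  # P(A) + P(B) - P(A and B)
print(lhs, rhs)  # 7/13 7/13

# Mutually exclusive events: no overlap, so the probabilities simply add.
hearts = {card for card in deck if card[1] == "hearts"}
print(prob(black | hearts) == prob(black) + prob(hearts))  # True
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` rather than within a rounding tolerance.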
Example 4.15. Table 4.1 shows the numbers of employees with particular roles (rows) and genders
(columns) in a company of 1000 employees.
If we select an employee at random from this company (a sample space of size 1000), this table
gives us the probabilities.⁴
The second column represents the event “Employee is male” while the third column represents the event
“Employee is female”. For example, the probability that an employee chosen at random is female is
530/1000 = 0.53 = 53%. The gender events are mutually exclusive; each employee appears in only one
of the columns.
The rows show which department an employee works in. We can use the table to find the probability
that an employee works in sales. Let S here be the event that the employee works in the sales department
(not the sample space): then P(S) = 80/1000 = 0.08 = 8%. Again, these are mutually exclusive events: if an
employee is in the sales department, he/she is not in IT.
There are no other genders and no other departments for this company: all 1,000 employees are counted
somewhere so this listing of events is also collectively exhaustive.
A joint event such as “Female and works in Sales” is represented in the cell of the table where the
respective row (Sales) meets the respective column (Female). We see that P(Female and works in
Sales) = 20/1000 = 0.02 = 2%.
♢
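Using only the counts quoted in Example 4.15, the addition rule gives the probability that a randomly chosen employee is female or works in Sales (or both). This is a sketch: the variable names are illustrative, and the male proportion is derived on the example's stated assumption that the two gender columns are exhaustive.

```python
from fractions import Fraction

total = 1000           # employees in the company
female = 530           # column total quoted in the example
sales = 80             # row total quoted in the example
female_and_sales = 20  # cell where the Sales row meets the Female column

p_female = Fraction(female, total)
p_sales = Fraction(sales, total)
p_joint = Fraction(female_and_sales, total)

# Addition rule: subtract the joint event to avoid double-counting.
p_female_or_sales = p_female + p_sales - p_joint
print(float(p_female_or_sales))  # 0.59

# The gender events partition the sample space, so P(male) = 1 - P(female).
p_male = 1 - p_female
print(float(p_male))  # 0.47
```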
4.2. Complements.
Definition 4.16. The complement of an event A is the event Ā = S ∖ A comprising all the outcomes
that are not in A.
If A does not occur, Ā must occur, so we have P(Ā) = 1 − P(A). Thus A and Ā together form a partition
of the sample space.
Other notations you may see for the complement of A include ∼A, A′, or Aᶜ.
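The complement rule is often the quickest route when “not A” is easier to count than A. As a sketch (the event chosen here is an illustrative assumption, not taken from the notes), consider rolling two distinct dice and let A be the event “at least one six”.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

event_a = {roll for roll in sample_space if 6 in roll}  # at least one six
complement_a = sample_space - event_a                   # no six at all


def prob(event):
    """Probability of an event under the equally likely assumption."""
    return Fraction(len(event), len(sample_space))


print(prob(event_a))           # 11/36, by counting A directly
print(1 - prob(complement_a))  # 11/36, via the complement rule
```

The complement contains the 5 × 5 = 25 outcomes with no six, so the second route needs no case-by-case counting.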
5. Counting
There are various ways of working out the probability of an event E: as mentioned above, many of these
involve counting
(a) the number of outcomes in event E (“favourable outcomes”); and
(b) the total number of possible outcomes in the sample space;
then (assuming all outcomes are equally likely) taking the ratio of these as
\[
P(E) = \frac{\text{favourable}}{\text{total}} .
\]
But how do we count? It can depend on several things, such as
• whether the order in which we list items is important;
• whether an item is replaced after we have counted it, or not replaced.
⁴ This is also called a contingency table, or cross-tabulation table, and allows the sample space for a particular problem
to be viewed in a tabular format.
We now introduce some standard mathematical⁵ terms, which we will see used in different parts of the
course.
5.1. Multiplication Principle. Suppose I want to buy a new car. Imagine (this is not very
realistic; it is just an example) I can choose from 3 makes of car. Each make has 2 models and each model
can be provided in any one of 5 colours. How many choices do I have?
I can choose from 3 makes, for each of those I can choose from 2 models, for each of those I can choose
5 colours: that gives 3 × 2 × 5 = 30 choices overall.
For each independent choice, I multiply together the number of options available for that choice.
This works in general for any number of choices and is called the multiplication principle: if I have c₁
choices for variable x₁, c₂ choices for variable x₂, ..., cₘ choices for variable xₘ, then in total I have
c₁ × c₂ × ⋯ × cₘ
ways to assign values to all m variables x₁, ..., xₘ.
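The car example can be checked by listing every choice explicitly. This is a minimal sketch: the notes give only the number of options at each step, so the labels below are placeholders, and `itertools.product` and `math.prod` are used simply as convenient standard-library tools.

```python
from itertools import product
from math import prod

# Placeholder labels: only the number of options per choice comes from the text.
makes = ["make 1", "make 2", "make 3"]
models = ["model 1", "model 2"]
colours = ["colour 1", "colour 2", "colour 3", "colour 4", "colour 5"]

# Every (make, model, colour) triple is one overall choice.
all_choices = list(product(makes, models, colours))
print(len(all_choices))  # 30

# General form: multiply the numbers of options c_1, ..., c_m.
option_counts = [len(makes), len(models), len(colours)]
print(prod(option_counts))  # 30
```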
5.2. Factorials. The factorial of a positive integer (whole number) n is written as n! and means
the product of all the whole numbers from n down to 1:
n! = n × (n − 1) × (n − 2) × ⋯ × 3 × 2 × 1.
For example, “four factorial” is written as 4! and is 4 × 3 × 2 × 1 = 24.
We take 0! to be 1 by convention (it keeps things consistent).
This exclamation mark is a standard notation, so don’t use an exclamation mark after a number unless
you really mean the factorial.
Why use it: the factorial n! gives the number of ways in which n distinguishable objects can be ordered
in n distinguishable boxes. By “distinguishable”, we mean that if any two objects were swapped, the
outcome would be different.
Example 4.17. Suppose we have 4 people to be assigned to 4 different officer positions in a club. The
positions are: President, Vice-president, Secretary and Treasurer. How many different assignments of
officers are there?
Solution: We have n = 4 distinguishable objects to fill 4 positions. Thus, there are 4! different ways of
ordering the people among the positions, that is, 4 × 3 × 2 × 1 = 24 ways or assignments. ♢
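The count in Example 4.17 can be confirmed by generating every assignment. In this sketch the people's names are placeholders (the example does not name them), and each ordering of the people is matched against the fixed list of positions.

```python
from itertools import permutations
from math import factorial

people = ["Person 1", "Person 2", "Person 3", "Person 4"]  # placeholder names
positions = ["President", "Vice-president", "Secretary", "Treasurer"]

# Each ordering of the people corresponds to one assignment of positions.
assignments = [dict(zip(positions, order)) for order in permutations(people)]

print(len(assignments))  # 24
print(factorial(4))      # 24
print(assignments[0])    # one example assignment
```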
5.3. Combinations. A choice of k objects, without regard to order and without repetition, selected
from n distinct objects is called a combination of n objects taken k at a time.
In other words, a combination is a way of choosing a subset of k objects from a set of n objects, where
order does not matter (recall that the order in which we list elements in a set is unimportant).
The number of such combinations (the number of ways we can choose a subset of k items from the pool
of n items) is given by:
\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
\]
This is often read as “n choose k”. An alternative notation for $\binom{n}{k}$ is ${}^{n}C_{k}$.
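As a quick check of the formula, the sketch below lists every 2-element subset of a 5-element set and compares the count with the factorial formula and with Python's built-in `math.comb`. The function name `n_choose_k` and the example items are illustrative assumptions.

```python
from itertools import combinations
from math import comb, factorial


def n_choose_k(n, k):
    """Number of combinations from the factorial formula n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


items = ["a", "b", "c", "d", "e"]       # n = 5 distinct objects
subsets = list(combinations(items, 2))  # every choice of k = 2, order ignored

print(len(subsets))      # 10
print(n_choose_k(5, 2))  # 10
print(comb(5, 2))        # 10, using the standard library directly
```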
⁵ These terms are most used in the branch of mathematics called combinatorics, which studies counting, combining
and permuting of objects.
Example 4.18. You are going to draw 4 cards from a standard deck of 52 cards. How many different
4-card hands are possible?
Solution: This is a combination problem, because a hand of cards is a subset of cards where the order
does not matter. Therefore, n = 52 and k = 4. The number of possible 4-card hands is
\[
\binom{52}{4} = \frac{52!}{4!\,(52-4)!} = \frac{52!}{4!\,48!}
= \frac{52 \times 51 \times 50 \times 49}{4 \times 3 \times 2 \times 1} \quad \text{(cancel 48! above and below)}
\]
that is, 270,725 different 4-card hands. ♢
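The cancellation in Example 4.18 has a counting interpretation: the numerator counts ordered deals of 4 cards, and dividing by 4! removes the orderings of the same hand. The sketch below illustrates this with `math.perm` and `math.comb` from the Python standard library; the variable names are illustrative.

```python
from math import comb, factorial, perm

ordered_deals = perm(52, 4)        # 52 * 51 * 50 * 49 ordered ways to deal 4 cards
orderings_per_hand = factorial(4)  # each hand can be dealt in 4! different orders

hands = ordered_deals // orderings_per_hand
print(ordered_deals)         # 6497400
print(hands)                 # 270725
print(hands == comb(52, 4))  # True
```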
Exercise 4.19. A poker hand comprises 5 cards. How many different poker hands can be dealt from a
standard deck of 52 cards?