Prob Book
• LaTeX macro files are based on the CLP Calculus text by Joel Feldman, Andrew Rechnitzer and Elyse Yeager.
Contents

1 Course Outline
  1.1 Course Description
  1.2 Units of Instruction
  1.3 Reference Materials
  1.4 Evaluation
  1.5 Prerequisite Content
  1.6 Detailed Syllabus

2 Combinatorics
  2.1 Randomness & Probability
  2.2 Counting Rules
    2.2.1 Role of Counting Rules in Probability
    2.2.2 Basic Principles of Counting
    2.2.3 Permutation
    2.2.4 Multinomial Coefficients: Permutations with Indistinguishable Objects
    2.2.5 Circular Permutation
    2.2.6 Combinations
  2.3 Home Work
4 Discrete Distributions
  4.1 Random Variables
    4.1.1 Types of Random Variable
    4.1.2 Discrete Probability Distribution
    4.1.3 Cumulative Distribution Function (cdf)
  4.2 Expectation of a Random Variable
    4.2.1 Expected Values of Sums of Random Variables: Some Properties
  4.3 Variance
    4.3.1 Variance: Properties
    4.3.2 Standard Deviation
  4.4 Bernoulli Distribution
    4.4.1 Conditions for Bernoulli Variable
    4.4.2 Probability Mass Function (pmf)
    4.4.3 Bernoulli Distribution: Expectation & Variance
  4.5 Binomial Distribution
    4.5.1 Background Example
    4.5.2 Binomial Random Variable
    4.5.3 Conditions for Binomial Distribution
    4.5.4 Probability Mass Function (pmf)
    4.5.5 Shape of Binomial Distribution
    4.5.6 Binomial Distribution: Expectation & Variance
  4.6 Poisson Distribution
    4.6.1 Conditions for Poisson Variable
    4.6.2 Probability Mass Function (pmf)
    4.6.3 Poisson Distribution: Expectation and Variance
    4.6.4 Poisson Approximation to the Binomial Distribution
    4.6.5 Comparison of Binomial & Poisson Distribution
  4.7 Geometric Distribution
    4.7.1 Geometric Distribution Conditions
    4.7.2 Probability Mass Function (pmf)
    4.7.3 Geometric Distribution: Cumulative Distribution Function (cdf)
    4.7.4 Geometric Distribution: Expectation and Variance
  4.8 Negative Binomial Distribution
    4.8.1 Probability Mass Function (pmf)
    4.8.2 Negative Binomial Distribution: Expected Value and Variance
    4.8.3 Comparison of Binomial and Negative Binomial Models
  4.9 Hypergeometric Distribution
    4.9.1 Conditions for Hypergeometric Distribution
    4.9.2 Probability Mass Function (pmf)
    4.9.3 Hypergeometric Distribution: Expected Value and Variance
    4.9.4 Binomial Approximation to Hypergeometric Distribution
  4.10 Home Work

Index
Chapter 1
Course Outline
Since this course is on-campus, you should aim to spend time each day going through the course content for the unit covered in class, and then spend some time working on the assigned questions for that content.
The following are the divisions of the course for this delivery:
a. Combinatorics
e. Properties of Expectations
f. Joint Distributions
1.4 Evaluation
The evaluation for this course will be based on the following:
The quizzes are formative assessments, while the 2 exams are summative assessments. It is imperative that you keep up with the lecture content. Reviewing the notes and solving the assigned problems will help solidify your conceptual understanding. The quizzes & exams in this course will test not just your ability to perform procedural calculations but also your grasp of the concepts.
1.5 Prerequisite Content
The prerequisite material for this course is the content of MATH 101 Cal 1; the mathematics in this course makes use of the tools discussed there. You are reminded that it is your responsibility to review any material upon which this course builds.
1.6 Detailed Syllabus
The dates/times listed below are approximate and are subject to change; adjustments will be made as the semester progresses. Your schedule indicates that you have a 'Probability' class every week for the duration of the semester. There are two lectures, each of 75 minutes' duration, on Monday & Wednesday at 5:00PM, and tutorials that will be announced by the TA; the tutorials will most likely be used to work on the assessments.
Reading/Studying Schedule
Chapter 2
Combinatorics
AS YOU READ . . .
1. What is randomness?
2. What is probability?
3. Why review set theory?
4. What is the link of combinatorics to probability?
• Lack of pattern in events, e.g., many technological systems suffer from a number of significant uncertainties which may appear at all stages of design, execution and use
• Before the 17th century, classical mathematical theory failed for processes or experiments that involved uncertain or random outcomes.
Probability is one of the most important modern scientific tools that treats those aspects of
systems that have uncertainty, chance or haphazard features. Probability is a mathematical
term used for the likelihood that something will occur.
• What are the odds that Twitter’s Stock will plunge tomorrow?
• 27% of U.S. workers worry about being laid off, up from 15% in 2019
https://sciencing.com/examples-of-real-life-probability-12746354.html
• Medical Science: Probability helps to quantify the risk of death of a bladder cancer patient, or how likely it is for a COVID-19 patient to be hospitalized, etc.
• Actuarial Science is based on the risk of some event, i.e., it deals with lifetimes of humans to predict how long any given person is expected to live, based on other variables describing the particulars of his/her life. Though this expected life span is a poor prediction when applied to any given person, it works rather well when applied to many persons. It can help to decide the premium rates the insurance companies should charge for covering any given person.
§§ Some Definitions
In the classical approach to the theory of probability it is often assumed that the
experiments can be repeated arbitrarily (e.g. tossing a coin, testing a concrete cube)
and the result of each experiment can be unambiguously used to declare whether a
certain event occurred or did not occur (e.g., when tossing a coin, observing a ‘T’).
Such an experiment is called a Random Experiment.
The sample space S of a certain random experiment denotes all events which can be outcomes of the experiment under consideration. The sample space can be:
a. finite, e.g., tossing a coin 4 times, where the sample space consists of all $2^4$ 4-tuples of H and T; i.e.,
$$S = \{HHHH, HHHT, \ldots, TTTT\}$$
b. infinite, e.g., record the duration (in whole seconds) of the next telephone call; then the sample space is the countably infinite set
$$S = \{0, 1, 2, \ldots\}$$
Figure 2.1.1.
Each possible outcome of a sample space is called a sample point, and an event is
generally referred to as a subset of the sample space having one (simple) or more
(compound) sample points as its elements.
§§ Poker Hands
Figure 2.1.2.
A Poker Hand (see Figure 2.1.3) is a set of 5 cards chosen without replacement from
a pack of 52 cards.
Figure 2.1.3.
(2) Straight-Flush (excluding royal flush): A straight-flush consists of five cards with values in a row, all of the same suit. Ace may be considered as high or low, but not both. (e.g., A, 2, 3, 4, 5 is a straight, but Q, K, A, 2, 3 is not a straight.) The lowest value in the straight may be A, 2, 3, 4, 5, 6, 7, 8, or 9. (Note that a straight flush beginning with 10 is a royal flush, and we don't want to count those.) So there are 9 choices for the card values, and then $\binom{4}{1} = 4$ choices for the suit, giving a total of
$$9 \times 4 = 36$$
(3) Four-of-a-kind: A four-of-a-kind is four cards showing the same number plus any other card of any suit:
$$\#\text{ 4-of-a-kind} = \binom{13}{1} \times \binom{4}{4} \times \binom{12}{1} \times \binom{4}{1} = 13 \times 1 \times 48 = 624$$
(4) Full House: A full house is three cards showing the same number plus a pair:
$$\#\text{ Full House} = \binom{13}{1} \times \binom{4}{3} \times \binom{12}{1} \times \binom{4}{2} = 13 \times 4 \times 12 \times 6 = 3{,}744$$
(5) Flush: A flush consists of five cards, all of the same suit. There are $\binom{4}{1} = 4$ ways to choose the suit; then, given that there are 13 cards of that suit, there are $\binom{13}{5} = 1287$ ways to choose the hand, giving a total of $\binom{4}{1} \times \binom{13}{5} = 5{,}148$ flushes. But note that this includes the straight and royal flushes, which we don't want to include. Subtracting ($36 + 4 = 40$), we get a grand total of $5{,}148 - 40 = 5{,}108$.
(6) Straight (excluding straight-flush): A straight consists of five values in a row, not all of the same suit. The lowest value in the straight could be A, 2, 3, 4, 5, 6, 7, 8, 9 or 10, giving 10 choices for the card values. Then there are $4 \times 4 \times 4 \times 4 \times 4 = 4^5$ ways to choose the suits of the five cards, for a total of $10 \times 4^5 = 10{,}240$ choices. But this value also includes the straight flushes and royal flushes which we do not want to include. Subtracting the 40 straight and royal flushes, we get $10{,}240 - 40 = 10{,}200$.
(7) Three-of-a-kind: A three-of-a-kind is three cards showing the same number plus two cards that do not form a pair or create a four-of-a-kind:
$$\#\text{ 3-of-a-kind} = \binom{13}{1} \times \binom{4}{3} \times \binom{12}{2} \times \binom{4}{1} \times \binom{4}{1} = 54{,}912$$
(8) Two-pair: Two-pair is two cards showing the same numbers and another two cards showing the same numbers (but not all four numbers the same) plus one extra card (not the same as any of the other numbers):
$$\#\text{ 2-Pair} = \binom{13}{2} \times \binom{4}{2} \times \binom{4}{2} \times \binom{11}{1} \times \binom{4}{1} = 123{,}552$$
(9) One-pair: One-pair is two cards showing the same numbers and another three cards all showing different numbers:
$$\#\text{ 1-Pair} = \binom{13}{1} \times \binom{4}{2} \times \binom{12}{3} \times \binom{4}{1}^3 = 1{,}098{,}240$$
(10) High card: High card (also known as no pair, or simply nothing) is a hand that does not fall into any other category; all higher-ranked hands include a pair, a straight, or a flush. Because the numbers showing on the cards must be five different numbers, we have $\binom{13}{5}$ choices for the five numbers showing on the cards. Each of the cards may have any of four suits, i.e., $\binom{4}{1}^5$ choices. We then subtract the number of straights, flushes, straight flushes, and royal flushes. (Note that we have already avoided having any pairs or more of a kind.)
$$\#\text{ High Cards} = \binom{13}{5} \times \binom{4}{1}^5 - 10{,}200 - 5{,}108 - 36 - 4 = 1{,}302{,}540$$
Consider 2 fair dice being rolled. Let A be the event that a sum of 7 occurs. How likely is it that event A will occur?
Figure 2.2.1.
Therefore, to calculate the required probability, we count the number of ways event A can occur (6 of the 36 outcomes give a sum of 7) and the total number of possible outcomes in the sample space S in Figure (2.2.1), so $P(A) = 6/36 = 1/6$.
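As a quick check, the 36-outcome sample space of Figure 2.2.1 can be enumerated directly. An illustrative Python sketch (assuming nothing beyond the standard library):

```python
from itertools import product

# Enumerate the 36 equally likely outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))
A = [s for s in S if sum(s) == 7]          # event: the sum is 7
print(len(A), len(S), len(A) / len(S))     # 6 36 0.1666...
```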
Many problems in probability theory require that we count the number of ways that
a particular event can occur. Systematic methods for counting the number of favor-
able outcomes of an experiment fall under the subject area called Combinatorics.
We will study several combinatorial techniques for counting large finite sets without
actually listing their elements. Combinatorial techniques are helpful for counting
the size of events that are important in probability theory. When selecting elements
of a set, the number of possible outcomes depends on the conditions under which
the selection has taken place.
If r experiments that are to be performed are such that the first one may result in any of $n_1$ possible outcomes; and if there are $n_2$ possible outcomes of the second experiment; and if, for each of the possible outcomes of the first two experiments, there are $n_3$ possible outcomes of the third experiment; and so on, then there is a total of $n_1 \times n_2 \times \cdots \times n_r$ possible outcomes of the r experiments. Consider the 3 tosses of a fair coin with 2 possible outcomes for each toss; see Figure 2.1.1. There are a total of $2 \times 2 \times 2 = 8$ possible outcomes.
Example 2.2.3
1. How many numbers are there between 99 and 1000 having no repeated digits?
2. How many numbers are there between 99 and 1000 having at least one of their digits 7?
3. In Figure 2.2.2 there are four bus routes between A and B, and three bus routes between B and C. In how many ways can a man travel round-trip by bus from A to C via B? If he does not want to use a bus route more than once, in how many ways can he make the round trip?
Figure 2.2.2.
Bus Routes
Solution:
1. Numbers between 99 and 1000 means from 100-999. This can be considered a 3-step process, where every step can be done in a number of ways that does not depend on previous choices:
(a) Choose the first digit; the 1st digit can be any of the choices between 1-9, therefore 9 choices for this stage.
(b) Choose the second digit; the 2nd digit can be chosen from 0-9 excluding the choice in the previous stage, therefore 9 choices.
(c) Choose the third digit; the 3rd digit can be any of the choices between 0-9 excluding the digits in the first 2 stages, i.e., 8 choices.
So there are $9 \times 9 \times 8 = 648$ possible numbers between 99 and 1000 with no repeated digits.
2. The sample space of the numbers between 99 and 1000 without any restriction consists of a total of $9 \times 10 \times 10 = 900$ possible numbers. The condition of having at least one digit equal to 7 can be handled by splitting the sample space into 2 complementary parts:
(a) Numbers between 99 and 1000 with none of the digits equal to 7: $8 \times 9 \times 9 = 648$
(b) Numbers between 99 and 1000 with at least one digit equal to 7: $900 - 648 = 252$
3. Outbound there are $4 \times 3 = 12$ route choices. Since he does not want to use a bus route more than once, the return trip offers $2 \times 3 = 6$ choices. Therefore there are $12 \times 6 = 72$ possible round trips with the condition that a route is not used more than once.
Example 2.2.3
2.2.3 §§ Permutation
Definition 2.2.4 (Permutation Rule).
$$n! = n(n-1)(n-2)\cdots 1, \qquad 0! \equiv 1$$
https://www.youtube.com/watch?v=RbugCeR-njk
b. Part of a Set of Objects Arrangement: The total number of permutations of a set A of n elements taken k at a time (an ordered listing of a subset of A of size k, chosen without replacement) is given by
$${}^{n}P_{k} = \frac{n!}{(n-k)!} = n(n-1)(n-2)\cdots(n-k+1).$$
If Ali and Sara (a couple) and Babar and Hina (another couple) and Soban and Muzna (an-
other couple) sit in a row of chairs as in Figure 2.2.3,
1. How many different seating arrangements are there?
2. How many ways they can be seated so that each of the 3 couples sit together?
3. Find also the number of seating arrangements in which all the ladies sit together.
4. In how many different ways can the 3 women be seated together on the left, and then
the 3 men together on the right?
Figure 2.2.3.
Solution:
The solutions are as follows
1. $6!$ ways for seating 6 persons (no restriction).
2. $3! \times 2 \times 2 \times 2 = 48$ ways. The 3 couples sitting together gives 3 objects to arrange, but there are 2 possible orders for each couple sitting together.
3. $3! \times 4! = 144$ ways for the 3 ladies to sit together.
4. $3! \times 3! = 36$ ways for the 3 women to be seated together on the left and the 3 men together on the right.
Example 2.2.5
2.2.4 §§ Multinomial Coefficients: Permutations with Indistinguishable Objects
$$\binom{n}{n_1, n_2, \ldots, n_k}$$
is called a multinomial coefficient.
Example 2.2.6
A bridge deal distributes a standard 52-card deck into 4 hands of 13 cards each. How many different bridge deals are there?
Solution:
$$\frac{n!}{n_1!\,n_2!\,n_3!\,n_4!} = \frac{52!}{13!\,13!\,13!\,13!}$$
Example 2.2.6
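This multinomial coefficient can be evaluated exactly with integer arithmetic. An illustrative Python sketch (not part of the original text):

```python
from math import factorial

# Number of ways to deal 52 cards into 4 ordered hands of 13 each:
# the multinomial coefficient 52! / (13! 13! 13! 13!).
deals = factorial(52) // factorial(13) ** 4
print(deals)   # about 5.36e28 distinct deals
```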
2.2.5 §§ Circular Permutation
• How can you arrange seating 3 friends A, B and C around a round table?
Figure 2.2.4.
Circular Permutation.
If we arrange these 3 persons around a round table as shown in Circular Arrangement 1 in Figure 2.2.4, we notice that the apparently different arrangements are not actually different: they are all the same. The same is true for Circular Arrangement 2. If you move clockwise around the table in Figure 2.2.4, starting with A, you will always get A-B-C. Important points to ponder are:
• If the clockwise and counter-clockwise orders CAN be distinguished, then the total number of circular permutations of n elements taken all together is $(n-1)!$. The number is $(n-1)!$ instead of the usual factorial $n!$ since all cyclic rotations of an arrangement are equivalent: the circle can be rotated. The point is that in a circular permutation one element is fixed and the remaining elements are arranged relative to it.
• If the clockwise and counter-clockwise orders CANNOT be distinguished, then the total number of circular permutations of n elements taken all together is $(n-1)!/2$.
1. In how many ways can the 3 couples be seated around a round table if each couple sits together?
2. If Ali and Soban insist on sitting beside each other, how many arrangements are possible now to seat them around the table?
Figure 2.2.5.
Solution:
The solutions are as follows
1. $(3-1)!$ ways for seating the 3 couples (condition: couples sit together, so each couple is taken as a single object), but there are 2 possible orders for each couple sitting together. $\therefore (3-1)! \times 2 \times 2 \times 2 = 16$ ways.
Example 2.2.7
2.2.6 §§ Combinations
§§ Pascal’s Triangle
There is a connection between the total number of subsets of a set of n elements and the binomial coefficients:
$$\sum_{k=0}^{n} \binom{n}{k} = 2^n.$$
In Figure 2.2.6, the sum of the binomial coefficients in each row is equal to $2^n$, the cardinality of the power set.
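An illustrative Python check of this row-sum identity (not part of the original text):

```python
from math import comb

# Each row n of Pascal's triangle sums to 2**n (the size of the power set).
for n in range(6):
    row = [comb(n, k) for k in range(n + 1)]
    print(n, row, sum(row), 2 ** n)
```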
Figure 2.2.6.
Pascal’s Triangle.
$$\binom{n+r-1}{r} = \binom{6+3-1}{3} = \binom{8}{3} = 56$$
Example 2.2.9
Figure 2.2.7.
Example 2.2.10
1. A store has to hire two cashiers. Five people are interviewed for the jobs. How many
different ways can the hiring decisions be made?
2. Suppose there were 15 business people at a meeting. At the end of the meeting, each
person at the meeting shook hands with every other person. How many handshakes
were there?
3. A poker hand is a set of 5 cards chosen without replacement from a deck of 52 playing
cards. In how many ways can you get a hand with 3 red cards and 2 black cards?
4. There are 3 copies of Harry Potter and the Philosopher’s Stone, 4 copies of The Lost
Symbol, 5 copies of The Secret of the Unicorn. In how many ways can you arrange
these books on a shelf?
Solution:
1. $\binom{5}{2} = 10$
2. As each person at the meeting shook hands with every other person, and the order of the handshakes between people does not matter, there is a total of $\binom{15}{2} = 105$ handshakes.
3. $\binom{26}{3} \times \binom{26}{2}$
4. There are a total of 12 books, therefore $12!$ ways to arrange them if all were distinct. These 12 books can be categorized into 3 distinct sets; however, the 3 copies of Harry Potter are not distinct, the 4 copies of The Lost Symbol are not distinct, and likewise the 5 copies of The Secret of the Unicorn are not distinct. Therefore a multinomial coefficient is used to find the number of arrangements here, i.e.,
$$\frac{12!}{3!\,4!\,5!} = 27{,}720 \text{ ways}$$
Example 2.2.10
2.3 Home Work
2. How many ways are there to seat 10 people, consisting of 5 couples, in a row of seats
if:
3. A box contains 30 balls, of which 10 are red and the other 20 blue. Suppose you take
out 8 balls from this box without replacement. How many possible ways are there to
have 3 red and 5 blue balls in this sample?
4. How many ways can eight people (including Mandy and Cindy) line up for a bus, if
Mandy and Cindy refuse to stand together?
5. How many integers, greater than 999 but not greater than 4000, can be formed with
the digits 0, 1, 2, 3 and 4, if repetition of digits is allowed?
6. In the laboratory analysis of samples from a chemical process, five samples from the
process are analyzed daily. In addition, a control sample is analyzed two times each
day to check the calibration of the laboratory instruments.
(a). How many different sequences of process and control samples are possible each
day? Assume that the five process samples are considered identical and that the
two control samples are considered identical.
(b). How many different sequences of process and control samples are possible if we
consider the five process samples to be different and the two control samples to
be identical?
(c). For the same situation as part (b), how many sequences are possible if the first
test of each day must be a control sample?
7. (a). How many three-digit phone prefixes that are used to represent a particular geographic area (such as an area code) can be created from the digits 0 through 9?
(b). As in part (a), how many three-digit phone prefixes are possible that do not start
with 0 or 1, but contain 0 or 1 as the middle digit?
(c). How many three-digit phone prefixes are possible in which no digit appears more
than once in each prefix?
§§ Answers:
1. (a) $26^{10}$
   (b) $26 \times 25 \times \cdots \times 17$
   (c) $2 \times 26 \times 10 \times 25 \times 9 \times 24 \times 8 \times 23 \times 7 \times 22 \times 6$
2. (a) $10!$
   (b) $10 \times 1 \times 8 \times 1 \times 6 \times 1 \times 4 \times 1 \times 2 \times 1$
3. $\binom{10}{3} \times \binom{20}{5}$
4. 30240
5. 376
Chapter 3
Basic Concepts & Laws of Probability
AS YOU READ . . .
5. What is Bayes' theorem, and how is it useful in obtaining a data-based, updated probability?
For each random experiment, there is an associated random variable, which represents the outcome of any particular experiment.
A sample space is any set that lists all possible outcomes (or responses) of some unknown experiment or situation. A sample space is generally denoted by the capital letter S; e.g., when predicting tomorrow's weather, the sample space is $S = \{\text{Rain}, \text{Cloudy}, \text{Sunny}\}$.
Each subset of a sample space is defined to be an event. When some experiment is performed, an event either will or will not occur. For the weather forecast example, the subsets $\{\text{rain}\}$, $\{\text{cloudy}\}$, $\{\text{rain}, \text{cloudy}\}$, $\{\text{rain}, \text{sunny}\}$, $\{\text{rain}, \text{cloudy}, \text{sunny}\}$, . . ., and even the empty set $\emptyset = \{\}$, are all examples of subsets of S that could be events.
A null or empty event is one that cannot happen, denoted by $\emptyset$, such as getting a sum of 14 on 2 rolls of a fair die.
All outcomes in the sample space have an equal chance to occur; e.g., the coin-toss outcomes H & T are equally likely events. In rolling a balanced die, each of the outcomes $\{1, 2, \ldots, 6\}$ is equally likely.
Take the example of tossing 3 coins, with the sample space S also visualized in Figure 2.1.1:
$$S = \{TTT, TTH, THT, THH, HTT, HTH, HHT, HHH\}$$
Let X denote the number of heads in this example, let A be the event that at least 2 heads appear, and let B be the event that at most 2 heads appear. Write down A & B.
b. $P(S) = 1$
c. $P(\emptyset) = 0$
$$P(A) = P(\text{Sum is even}) = \frac{18}{36}$$
$$P(B) = P(\text{Sum} > 6) = \frac{21}{36}$$
Example 3.1.7
Let A: Sum is even when 2 fair dice are rolled. Then you might have to find the probability that A does not occur, i.e., $P(A^c)$:
$$P(A^c) = 1 - P(A) = 1 - \frac{18}{36} = \frac{18}{36}$$
Figure 3.1.1.
Complement of an Event, $A^c$.
Solution:
In Definition 3.1.5, the events corresponding to at least and at most 2 heads were specified. Let X be the number of heads that appear in 3 tosses; then the possible values of X are $\{0, 1, 2, 3\}$.
$$\therefore P(X \geq 2) = 4/8 = 1/2$$
Alternatively, by the complement rule,
$$P(X \geq 2) = 1 - P(X < 2) = 1 - 4/8 = 1/2$$
Similarly $P(X \leq 2) = 7/8$; alternatively, we can also use the complement rule to find the required probability:
$$P(X \leq 2) = 1 - P(X > 2) = 1 - 1/8 = 7/8$$
Remember that in using the complement rule of probability, you partition the sample space into mutually exclusive events.
Example 3.1.9
1. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• In two rolls of a fair die, you might be interested in the probability that the sum is either even or greater than 6. Let A be the event that the sum is even, and B the event that the sum is greater than 6. Then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{18}{36} + \frac{21}{36} - \frac{9}{36} = \frac{5}{6}$$
2. Extension
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$
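Both sides of the addition law can be checked by brute-force enumeration for the two-dice events A and B above. An illustrative Python sketch (not part of the original text):

```python
from itertools import product

# The 36 equally likely outcomes of two fair dice.
S = list(product(range(1, 7), repeat=2))
A = {s for s in S if sum(s) % 2 == 0}   # sum is even
B = {s for s in S if sum(s) > 6}        # sum exceeds 6

lhs = len(A | B) / 36
rhs = (len(A) + len(B) - len(A & B)) / 36   # inclusion-exclusion
print(lhs, rhs)   # both equal 30/36 = 5/6
```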
Figure 3.1.2.
Union of 2 Events, $A \cup B$.
Figure 3.1.3.
Union of 3 Events.
If $A \subset B$, then
$$P(B) = P(A) + P(A^c \cap B)$$
so $P(A) \leq P(B)$, which is called the monotonicity of probability.
$$B = A \cup (A^c \cap B) = \{8, 8, 8, 8, 8, 10, 10, 10, 12\} \cup \{2, 4, 4, 4, 6, 6, 6, 6, 6\} = \{2, 4, 4, 4, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 10, 10, 10, 12\}$$
B is thus the event that an even sum appears when 2 dice are rolled.
Example 3.1.12
Figure 3.1.4.
Subset
• If there are N outcomes in the sample space and each outcome is equally likely, then the probability of each outcome is $\frac{1}{N}$; e.g., the probability of getting a Red with the spinner in Figure 3.1.5 is 1/8, as each of the $N = 8$ outcomes in the spinner is equally likely.
• If there are N outcomes in the sample space, each outcome is equally likely, and A is an event with n outcomes, then $P(A) = \frac{n}{N}$; e.g., the probability of getting a Yellow with the spinner in Figure 3.1.5 is 3/8.
Figure 3.1.5.
A Spinner.
Example 3.1.14
Ellie will take 2 books on vacation. She will like the first with probability $\frac{1}{2}$ and the second with probability $\frac{2}{5}$. She will like both books with probability $\frac{3}{10}$. What is the probability that she likes at least one of them? Find the probability that she dislikes both.
Solution:
$$P(\text{1st}) = \frac{1}{2}; \quad P(\text{2nd}) = \frac{2}{5}; \quad P(\text{Both}) = \frac{3}{10}$$
$$P(\text{likes at least 1 of them}) = P(\text{1st}) + P(\text{2nd}) - P(\text{Both}) = \frac{1}{2} + \frac{2}{5} - \frac{3}{10} = \frac{6}{10}$$
$$P(\text{Dislikes both}) = 1 - P(\text{likes at least one}) = 1 - \frac{6}{10} = \frac{4}{10}$$
Example 3.1.14
§§ Odds
Definition 3.1.15 (Odds).
Odds represent the likelihood that an event will occur. The odds in favor are the ratio of the number of ways that an outcome can occur to the number of ways it cannot occur. If the odds in favor of A are r : s, then
$$P(A) = \frac{r/s}{(r/s) + 1} = \frac{r}{r+s}$$
e.g., when you roll a fair die the odds of getting a '6' are 1 to 5.
• Convert from odds to probability: $\therefore P(6) = \frac{1}{1+5}$
• Convert from a probability to odds: e.g., if the probability is 1/6, then the odds are '1 : 5'.
https://www.theweek.co.uk/99357/us-election-2020-polls-who-will-win-trump-biden
Example 3.1.16
1. A study was designed to compare two energy drink commercials. Each participant was
shown the commercials, A and B, in random order and asked to select the better one.
There were 100 women and 140 men who participated in the study. Commercial A was
selected by 45 women and by 80 men. Find the odds of selecting Commercial A for the
men. Do the same for the women.
2. People with type O negative blood are universal donors. That is, any patient can
receive a transfusion of O negative blood. Only 7% of the American population have
O negative blood. If 10 people appear at random to give blood, what is the probability
that at least 1 of them is a universal donor?
3. Birthday Paradox: Two people enter a room and their birthdays (ignoring years) are
recorded.
(a.) What is the probability that the two people have a specific pair of birthdates?
(b.) What is the probability that the two people have different birthdates?
Solution:
2. Let X be the number of people with O-negative blood in a group of 10 people. We want the probability that at least 1 of them has O-negative blood, i.e., $P(X \geq 1)$. The probability that a single randomly selected person has O-negative blood is 0.07; by the complement rule, $1 - 0.07 = 0.93$ is the probability of not having O-negative blood.
$$P(X \geq 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - (0.93)^{10} = 0.516 \quad \because \text{donors are independent}$$
Example 3.1.16
$$P(\text{at least two people have the same birthdate}) = 1 - P(\text{none have the same birthdate}) = 1 - \frac{365 \times 364 \times 363}{365^3} = 1 - \frac{365 \times 364 \times \cdots \times (365 - 3 + 1)}{365^3} = 0.0082$$
The birthday problem is also shown in Figure 3.1.6. For sharing a birthday, a single pair has a fixed probability of 0.0027 of matching; that's low for just one pair. However, as the number of people increases, the number of pairs grows rapidly, and so does the probability of a match.
Example 3.1.17
Figure 3.1.6.
Probability of at least one shared birthday versus the number of people (0 to 100).
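The curve in Figure 3.1.6 can be reproduced with a short computation. A minimal illustrative Python sketch (the function name `p_shared` is ours, not from the text):

```python
# P(at least one shared birthday) among n people, assuming 365 equally
# likely birthdays; this reproduces the curve in Figure 3.1.6.
def p_shared(n: int) -> float:
    p_none = 1.0
    for i in range(n):
        p_none *= (365 - i) / 365
    return 1 - p_none

for n in (3, 23, 25, 50):
    print(n, round(p_shared(n), 4))   # 3 -> 0.0082, 25 -> 0.5687, ...
```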
3.2 Conditional Probability
Flip a coin 3 times (see Figure 2.1.1). What is the probability that the first coin comes up heads? Suppose that some additional information, that exactly two of the three coins came up heads, becomes available.
2. How do probabilities change when we know that some event B has occurred?
b. $P(S|B) = 1$ (given that $P(B) \neq 0$)
Figure 3.2.1.
Conditional Probability.
Example 3.2.2
A recent survey in US asked 100 people if they thought women in the armed forces should
be permitted to participate in combat. The results of the survey of Males & Females cross-
classified by their responses are given in the Table below:
Male (M) Female (F) Total
Yes 32 8 40
No 18 42 60
Total 50 50 100
Find the probability that a randomly selected respondent
1. was female who answered ’yes’.
2. who said ’no’ was a male.
Solution:
1. $P(F|\text{Yes}) = \dfrac{8/100}{40/100} = \dfrac{8}{40}$
2. $P(M|\text{No}) = \dfrac{18/100}{60/100} = \dfrac{18}{60} = \dfrac{3}{10}$
Example 3.2.2
3.3 Multiplication Rule
Example 3.3.1
When a company receives an order, there is a probability of 0.42 that its value is over $1000.
If an order is valued at over $1000, then there is a probability of 0.63 that the customer will
pay with a credit card. What is the probability that the next order will be valued at over
$1000 but will not be paid with a credit card?
Solution:
$$P(\text{Over 1k}) = 0.42, \qquad P(C|\text{Over 1k}) = 0.63, \qquad P(C' \cap \text{Over 1k}) = ?$$
$$P(C'|\text{Over 1k}) = \frac{P(C' \cap \text{Over 1k})}{P(\text{Over 1k})}$$
$$1 - 0.63 = \frac{P(C' \cap \text{Over 1k})}{0.42}$$
$$P(C' \cap \text{Over 1k}) = 0.42 \times (1 - 0.63) = 0.1554$$
Example 3.3.1
How do we compute the joint probability of A and B when we are given the probability of A and the conditional probability of B given A, etc.?
$$P(A \cap B) = P(B|A) \cdot P(A)$$
$$P(A \cap B) = P(A|B) \cdot P(B)$$
When the outcome or occurrence of the first event affects the outcome or occurrence
of the second event in such a way that the probability is changed, the events are
said to be dependent.
§§ Independence
$$S = \{1H, 2H, 3H, 4H, 5H, 6H, 1T, 2T, 3T, 4T, 5T, 6T\}$$
2. What is the probability that the die comes up 5, conditional on knowing that the coin came up tails? i.e., $P(\text{Die shows 5}|\text{tail}) = 1/6$
In this example $P(\text{Die shows 5}|\text{tail}) = P(\text{Die shows 5}) = 1/6$; such events are independent.
Does the occurrence of one event affect the probability of the occurrence of the other? Two events A and B are independent if the fact that A occurs does not affect the probability of B occurring. By definition,
$$P(A|B) = P(A)$$
$$P(B|A) = P(B)$$
$$P(A \cap B) = P(A) \cdot P(B)$$
Example 3.3.5
A Harris poll found that 46% of Americans say they suffer great stress at least once a week. If three people are selected at random, find the probability that all three will say that they suffer great stress at least once a week.
Solution:
P(stress at least once a week) = 0.46. As the 3 selected people are independent,
$$\therefore P(\text{all three suffer stress at least once a week}) = 0.46 \times 0.46 \times 0.46 = 0.097$$
Example 3.3.5
$$P(B|A) = \frac{50}{850}$$
Also, the probability that both parts are defective is
$$P(A \cap B) = P(B|A) \cdot P(A) = \frac{50}{850} \cdot \frac{50}{850} = 0.0035$$
• Independence means that the probability of one event does not affect the probability of the other, i.e., $P(A|B) = P(A)$
$$P(A \cap B) = 0 \implies P(A|B) = 0 \text{ or } P(B) = 0$$
Example 3.3.9
Four of the light bulbs in a box of ten bulbs are burnt out or otherwise defective. If two bulbs are selected at random without replacement (see Figure 3.3.1) and tested, what is the probability that
Solution:
As the bulbs are selected without replacement, therefore the selection is of dependent events
and the Multiplication Law for dependent events is used here.
Example 3.3.9
Figure 3.3.1.
Generally there are two rules with Tree diagrams that you should keep in mind while
computing probabilities
1. When you are traveling along a branch you multiply the probabilities, i.e., use Multi-
plication Law of Probability.
2. When you go from branch to branch you add, i.e., either of the branches, so use Addition
Law of Probability.
$$P(A \cap B) = P(A|B) \cdot P(B)$$
$$P(A \cap B) = P(B|A) \cdot P(A)$$
$$P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B)$$
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2|A_1) \cdot P(A_3|A_1 \cap A_2) \cdots P(A_n|A_1 \cap A_2 \cap A_3 \cdots \cap A_{n-1})$$
Students find it difficult to decide which Probability law to use for a certain scenario. Use
of Figure 3.3.2 while solving each problem will be helpful in making the correct choice.
Figure 3.3.2.
3.4 Law of Total Probability
$$A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_k), \quad \text{so that} \quad P(A) = \sum_{i=1}^{k} P(A|B_i)\,P(B_i)$$
Figure 3.4.1.
A binary channel carries each message as one of two signals, 0 and 1. We assume that, for a given binary channel, 40% of the time a 1 is transmitted; the probability that a transmitted 0 is correctly received is 0.90, and the probability that a transmitted 1 is correctly received is 0.95. Determine the probability of a 1 being received.
Solution:
Use a tree diagram as in Figure 3.4.2. Here we are given the simple probabilities
$$P(0) = 0.6; \qquad P(1) = 0.4$$
and some conditional probabilities
$$P(0|0) = 0.90; \qquad P(1|1) = 0.95$$
We need to find the probability of a 1 being received. Using the tree diagram in Figure 3.4.2 and the Law of Total Probability,
$$P(\text{1 received}) = P(0)\,P(1|0) + P(1)\,P(1|1) = 0.6 \times (1 - 0.90) + 0.4 \times 0.95 = 0.06 + 0.38 = 0.44$$
Example 3.4.2
Figure 3.4.2.
Binary Signal: tree diagram with first-stage branches $P(0) = 0.60$ and $P(1) = 0.40$, and second-stage branches $P(0|0) = 0.90$, $P(1|0) = 1 - 0.90$, $P(1|1) = 0.95$, $P(0|1) = 1 - 0.95$, leading to the joint probabilities $P(0 \cap 0)$, $P(0 \cap 1)$, $P(1 \cap 0)$, $P(1 \cap 1)$.
3.5 Bayes' Theorem
• With known $P(A|B_i)$, move in the 'reverse' direction in the tree diagram and use $P(A|B_i)$ to find $P(B_i|A)$, called the 'posterior probability':
$$P(B_i|A) = \frac{P(B_i) \cdot P(A|B_i)}{P(A)} = \frac{P(B_i) \cdot P(A|B_i)}{\sum_{j=1}^{k} P(A|B_j) \cdot P(B_j)}$$
Figure 3.5.1.
Tree diagram for Bayes' theorem: first-stage branches $B_1$, $B_2$ with probabilities $P(B_1)$, $P(B_2)$; second-stage branches with conditional probabilities $P(A|B_i)$ and $P(\bar{A}|B_i)$, leading to the joint probabilities $P(A \cap B_i)$ and $P(\bar{A} \cap B_i)$.
Solution:
P(one was transmitted | one was received) asks us to move in the reverse direction in Figure 3.4.2:
$$P(\text{1 transmitted}|\text{1 received}) = \frac{P(\text{1 received}|\text{1 transmitted}) \cdot P(\text{1 transmitted})}{P(\text{1 received})} = \frac{P(1 \cap 1)}{P(0 \cap 1) + P(1 \cap 1)} = \frac{0.4 \times 0.95}{0.6 \times (1 - 0.90) + 0.4 \times 0.95} = \frac{0.38}{0.44} = 0.863$$
There is an 86.3% chance that signal one was transmitted when signal one was received.
Example 3.5.2
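The same posterior can be computed mechanically. A minimal illustrative Python sketch of Bayes' theorem for this channel (variable names are ours, not from the text):

```python
# Posterior P(1 transmitted | 1 received) for the binary channel of
# Examples 3.4.2 and 3.5.2, via the law of total probability and Bayes.
p1, p0 = 0.4, 0.6
p_1r_given_1t = 0.95          # correct reception of a transmitted 1
p_1r_given_0t = 1 - 0.90      # erroneous reception of a transmitted 0

p_1r = p1 * p_1r_given_1t + p0 * p_1r_given_0t   # law of total probability
posterior = p1 * p_1r_given_1t / p_1r             # Bayes' theorem
print(p_1r, posterior)   # 0.44 0.8636...
```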
• The Royal Statistical Society later issued a statement expressing concern at the 'misuse of statistics in the courts', arguing that there was no statistical basis for Meadow's claim.
• Clark was wrongly convicted in November 1999. The convictions were upheld on appeal in October 2000, but overturned in a second appeal in January 2003, after it emerged that the prosecution forensic pathologist who examined both babies had failed to disclose microbiological reports suggesting that the 2nd of her sons had died of natural causes.
• Clark’s experience caused her to develop serious psychiatric problems and she died in
her home in March 2007 from alcohol poisoning.
§§ Applications
Standard applications of the multiplication formula, the law of total probability, and Bayes' theorem occur with two-stage systems. The response for such systems can be thought of as occurring in two steps or stages.
• Typically, we are given the probabilities for the first stage and the conditional proba-
bilities for the second stage.
• The multiplication formula is then used to calculate joint probabilities for what happens
at both stages;
• Law of Total Probability: used to compute the probabilities for what happens at the
second stage;
• Bayes’ Theorem: used to calculate the conditional probabilities for the first stage, given
what has occurred at the second stage
3.6 Home Work
1. The WW Insurance Company found that 53% of the residents of a city had homeowner’s
insurance with its company. Of these clients, 27% also had automobile insurance with
the company. If a resident is selected at random, find the probability that the resident
has both homeowner’s and automobile insurance.
2. If there are 25 people in a room, what is the probability that at least two of them share
the same birthday?
3. You have a blood test for a rare disease that occurs by chance in 1 person in 100,000. If you have the disease, the test will report that you do with probability 0.95 (and that you do not with probability 0.05). If you do not have the disease, the test will report a false positive with probability 0.001. If the test says you do have the disease, what is the probability that you actually have the disease? Interpret the results.
4. You go to see the doctor about an ingrown toenail. The doctor selects you at random
to have a blood test for swine flu, which is currently suspected to affect 1 in 10,000
people in Australia. The test is 99% accurate, in the sense that the probability of a
false positive is 1%. The probability of a false negative is zero. You test positive. What
is the new probability that you have swine flu? Interpret the results
5. Suppose that 65 percent of a discount chain’s employees are women and 33 percent of
the discount chain’s employees having a management position are women. If 25 percent
of the discount chain’s employees have a management position, what percentage of the
discount chain’s female employees have a management position?
6. A company administers an “aptitude test for managers” to aid in selecting new management trainees. Prior experience suggests that 60 percent of all applicants for management trainee positions would be successful if they were hired. Furthermore, past experience with the aptitude test indicates that 85 percent of applicants who turn out to be successful managers pass the test and 90 percent of applicants who do not turn out to be successful managers fail the test. (a) If an applicant passes the “aptitude test for managers,” what is the probability that the applicant will succeed in a management position? (b) Based on your answer to part (a), do you think that the “aptitude test for managers” is a valuable way to screen applicants for management trainee positions? Explain.
7. Three data entry specialists enter requisitions into a computer. Specialist 1 processes 30
percent of the requisitions, specialist 2 processes 45 percent, and specialist 3 processes
25 percent. The proportions of incorrectly entered requisitions by data entry specialists
1, 2, and 3 are .03, .05, and .02, respectively. Suppose that a random requisition is
found to have been incorrectly entered. What is the probability that it was processed
by data entry specialist 1? By data entry specialist 2? By data entry specialist 3?
§§ Answers
1. 0.1431. Multiplication Law of Probability for Dependent Events.
2. $P(\text{None}) = \dfrac{365 \times 364 \times \cdots \times 341}{365^{25}} = 0.4313$; $P(\text{At least 2 share}) = 1 - 0.4313 = 0.5687$
3. 0.0094. There is only a 0.94% chance that you do have the disease; in other words, the test result is most likely a false positive.
4. 0.0099. There is only a 0.99% chance that you do have swine flu; in other words, the test result is most likely a false positive.
5. 0.1269
6. 0.927; Yes
Chapter 4
Discrete Distributions
AS YOU READ . . .
1. What is a Random Variable, and what are the different types of Random Variable?
5. What are the different Discrete Probability Models, i.e., the Bernoulli, Binomial, Poisson, Geometric, Negative Binomial and Hypergeometric Distributions?
4.1 Random Variables
1. A random variable is a function from the sample space S to the real numbers, i.e., X is a rule which assigns a number $X(s)$ to each outcome $s \in S$.
Example 4.1.2
1. Toss a coin. The sample space is $S = \{T, H\}$. Let X be the number of heads from the result of a coin toss; then $X \in \{0, 1\}$ and
$$P(X = x) = \begin{cases} 1/2 & \text{for } x = 0, 1; \\ 0 & \text{otherwise.} \end{cases}$$
2. Let X denote the sum of the numbers on the upper faces that might appear when 2 fair dice are rolled (see Figure 2.2.1). Then $X \in \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$.
3. If we roll three dice, there are $6^3 = 216$ possible outcomes. Let the random variable X be the sum of the three dice. In this case, $X \in \{3, 4, \ldots, 17, 18\}$.
Example 4.1.2
a. The number, x, of the next three customers entering a store who will make a
purchase. Here x could be 0, 1, 2, or 3.
b. The number, x, of four patients taking a new antibiotic who experience gastroin-
testinal distress as a side effect. Here x could be 0, 1, 2, 3, or 4.
c. The number of defective parts produced in manufacturing. Here x could be 0, 1,
2, or 3.
d. The number of people getting flu in winter. Here x could be 0, 1, 2, or 3.
Figure 4.1.1.
The possible values 0, 1, 2, 3 shown on the number line.
2. Continuous Random Variable: assumes any values in an interval (either finite or infi-
nite) of real numbers for its range, e.g.,
Figure 4.1.2.
Heights of Students in cm
P( X = x ) = p( x )
Because p( x ) is a function that assigns probabilities to each value x of the random
variable X, it is sometimes called the probability function for X.
1. $0 \leq P(X = x) \leq 1, \ \forall x \in S$.
2. $\sum_{x} P(X = x) = 1$, where the summation is over all values of x with nonzero probability.
For a discrete random variable, its probability mass function (pmf) defines all that we need to know about the random variable. A pmf for a discrete random variable is defined (with positive probabilities) only for a finite or countably infinite set of possible values, typically integers. Toss a fair coin three times and let X denote the number of heads observed. The probability distribution of X is shown graphically in Figure 4.1.3.
Figure 4.1.3.
pmf of the number of heads in three tosses of a fair coin: $P(X = x)$ takes the values 1/8, 3/8, 3/8, 1/8 at $x = 0, 1, 2, 3$.
If we roll 2 dice, let X be the sum that appears on the upper faces. The probability
distribution of X is shown graphically in Figure 4.1.4.
Figure 4.1.4.
Example 4.1.4
A company has five warehouses, only two of which have a particular product in stock. A
salesperson calls the five warehouses in random order until a warehouse with the product
is reached. Let the random variable Y be the number of calls made by the salesperson.
Calculate the probability mass function.
Solution:
Let X be the event that a particular warehouse has the product in stock, with $P(X) = p = 2/5$. Let Y be the number of calls made by the salesperson to find a warehouse with the product; he calls the warehouses one by one until he finds a warehouse with the required product.

Y    P(Y)                      F(Y)
1    2/5                       2/5
2    3/5 × 2/4 = 3/10          2/5 + 3/10 = 7/10
3    3/5 × 1/2 × 2/3 = 1/5     2/5 + 3/10 + 1/5 = 9/10
4    3/5 × 1/2 × 1/3 = 1/10    2/5 + 3/10 + 1/5 + 1/10 = 1
Example 4.1.4
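This pmf can be confirmed by enumerating all equally likely calling orders. An illustrative Python sketch (not part of the original text):

```python
from itertools import permutations
from fractions import Fraction

# Warehouses: 2 have stock (1), 3 do not (0). Y = calls until a 1 is reached.
stock = [1, 1, 0, 0, 0]
orders = list(permutations(stock))   # all equally likely calling orders
counts = {}
for order in orders:
    y = order.index(1) + 1           # position of the first in-stock warehouse
    counts[y] = counts.get(y, 0) + 1

for y in sorted(counts):
    print(y, Fraction(counts[y], len(orders)))   # 2/5, 3/10, 1/5, 1/10
```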
Example 4.1.6
Toss a fair coin three times and let X denote the number of heads observed. Find the corresponding cumulative distribution function (cdf).
$$P(X = x) = \begin{cases} 1/8 & \text{for } x = 0; \\ 3/8 & \text{for } x = 1; \\ 3/8 & \text{for } x = 2; \\ 1/8 & \text{for } x = 3; \\ 0 & \text{otherwise.} \end{cases}$$
Solution:
$$F(x) = \begin{cases} 0 & x < 0; \\ 1/8 & 0 \leq x < 1; \\ 4/8 & 1 \leq x < 2; \\ 7/8 & 2 \leq x < 3; \\ 1 & x \geq 3. \end{cases}$$
Figure 4.1.5.
The cdf $F(x)$: a step function rising through 1/8, 4/8, 7/8 to 1, with jumps at $x = 0, 1, 2, 3$.
Pay attention to the jump sizes in the step function. What do you conclude?
Example 4.1.6
Example 4.1.7
In Example 4.1.4, calculate the cumulative distribution function of the number of calls made
by the salesperson.
Solution:
Let Y be the number of calls made by the salesperson to find a warehouse with the product; he calls the warehouses one by one until he finds the warehouse with the required product. The cdf of Y at y is the probability that he makes at most y calls:
$$F(y) = \begin{cases} 0 & y < 1; \\ 2/5 & 1 \leq y < 2; \\ 7/10 & 2 \leq y < 3; \\ 9/10 & 3 \leq y < 4; \\ 1 & y \geq 4. \end{cases}$$
Example 4.1.7
§§ Getting pmf from cdf
It is sometimes useful to be able to provide cumulative probabilities, and such probabilities can be used to find the probability mass function of a random variable. Therefore, using cumulative probabilities is an alternate method of describing the probability distribution of a random variable. If the range of a discrete random variable X consists of the values $x_1 < x_2 < \cdots < x_n$, then
$$p(x_1) = F(x_1), \qquad p(x_i) = F(x_i) - F(x_{i-1}); \quad i = 2, 3, \ldots, n$$
Example 4.1.8
X is a discrete random variable with cdf:
$$F(x) = \begin{cases} 0.00 & x < -3; \\ 0.03 & -3 \leq x < 1; \\ 0.20 & 1 \leq x < 2.5; \\ 0.76 & 2.5 \leq x < 7; \\ 1.00 & x \geq 7. \end{cases}$$
Write down the pmf from the above cdf in an appropriate form. Given that X is positive, what is the probability that it will be at least 2?
Solution:
The probability mass function at each point is the change in the cumulative distribution function at that point. Therefore,
$$P(X = x) = \begin{cases} 0.03 & \text{for } x = -3; \\ 0.17 & \text{for } x = 1; \\ 0.56 & \text{for } x = 2.5; \\ 0.24 & \text{for } x = 7; \\ 0 & \text{otherwise.} \end{cases}$$
For the conditional probability, the positive values of X are 1, 2.5 and 7, so
$$P(X \geq 2 \mid X > 0) = \frac{P(X \geq 2)}{P(X > 0)} = \frac{0.56 + 0.24}{0.17 + 0.56 + 0.24} = \frac{0.80}{0.97} \approx 0.825$$
Example 4.1.8
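The differencing rule above is easy to mechanize. A minimal illustrative Python sketch for this example (not part of the original text):

```python
# pmf recovered from the cdf of Example 4.1.8 by differencing:
# p(x_1) = F(x_1), p(x_i) = F(x_i) - F(x_{i-1}).
xs = [-3, 1, 2.5, 7]
F  = [0.03, 0.20, 0.76, 1.00]

pmf = {xs[0]: F[0]}
for i in range(1, len(xs)):
    pmf[xs[i]] = round(F[i] - F[i - 1], 10)   # round away float noise
print(pmf)   # {-3: 0.03, 1: 0.17, 2.5: 0.56, 7: 0.24}
```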
1. $\lim_{x \to -\infty} F(x) = 0$
2. $\lim_{x \to +\infty} F(x) = 1$
Figure 4.2.1.
4.2 Expectation of a Random Variable
The probability mass function provides complete information about the probabilistic properties of a random variable. One of the most basic summary measures is the expectation or mean of a random variable, $E(X)$. It is the average value of the random variable X and reveals one of the most important characteristics of its distribution: its center. It is a probabilistic term that describes the likely outcome of a scenario. The concept of the expected value of a random variable parallels the notion of a weighted average: the possible values of the random variable are weighted by their probabilities, as specified in the following definition:
$$E(X) = \sum_{x} x \, p(x)$$
1. E( X ) is also called the 1st moment of the random variable X about zero. The
first moment of X is synonymously called the mean, expectation, or average
value of X.
Imagine placing the masses p( xi ) at the points xi on a beam; the balance point of the
beam is the expected value of X. Consequently, it describes the ’center’ of the distribution
of X in a manner similar to the balance point of a loading.
Figure 4.2.2.
Example 4.2.2
1. Toss three fair coins and let X denote the number of heads observed. Find the expected number of heads.
$$P(X = x) = \begin{cases} 1/8 & \text{for } x = 0; \\ 3/8 & \text{for } x = 1; \\ 3/8 & \text{for } x = 2; \\ 1/8 & \text{for } x = 3; \\ 0 & \text{otherwise.} \end{cases}$$
2. Consider a game that costs $1 to play. The probability of losing is 0.7. The probability
of winning $50 is 0.1, and the probability of winning $35 is 0.2. Would you expect to
win or lose if you play this game?
Solution:
1. $E(X) = 0 \times \frac{1}{8} + 1 \times \frac{3}{8} + 2 \times \frac{3}{8} + 3 \times \frac{1}{8} = \frac{12}{8} = 1.5$ heads.
2.
Gain (X)    P(X)
-1          0.7
(50-1)      0.1
(35-1)      0.2
$$E(X) = (-1)(0.7) + (49)(0.1) + (34)(0.2) = 11$$
In the long run, you are expected to gain $11 if you play this game.
Example 4.2.2
$$E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n)$$
$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) \tag{4.2.1}$$
$$= nE(X) \quad \text{for identical distributions} \tag{4.2.2}$$
$$E(aX + b) = aE(X) + b$$
A median of X is any point that divides the mass of the distribution into two equal parts; that is, $x_0$ is a median of X if
$$P(X \leq x_0) = \frac{1}{2}$$
The mean of X may not exist, but there exists at least one median.
Random variables that are coded as 1 when an event occurs or 0 when the event does not occur are called indicator random variables. In other words, $I_A$ maps all outcomes in the set A to 1 and all outcomes outside A to 0. Roll a die, and let A be the event that a 6 appears. Then
$$I_A(x) = \begin{cases} 1 & \text{if } x \in A; \\ 0 & \text{otherwise.} \end{cases}$$
Example 4.2.6
Four students order noodles at a certain local restaurant. Their orders are placed indepen-
dently. Each student is known to prefer Japanese pan noodles 40% of the time. How many
of them do we expect to order Japanese pan noodles?
Solution:
Let X denote the number of students that order Japanese pan noodles altogether, and let $X_1, X_2, X_3, X_4$ be the indicator random variables for whether each of the 4 students chooses Japanese pan noodles:
$$P(X_i = x_i) = \begin{cases} 0.6 & \text{for } x_i = 0; \\ 0.4 & \text{for } x_i = 1; \end{cases}$$
$$E(X_i) = 1 \times 0.4 + 0 \times 0.6 = 0.4$$
By linearity of expectation, $E(X) = E(X_1) + E(X_2) + E(X_3) + E(X_4) = 4 \times 0.4 = 1.6$ students.
Example 4.2.6
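A quick Monte Carlo check of this expectation (illustrative Python; the seed and trial count are our arbitrary choices, not from the text):

```python
import random

# X is the sum of four independent indicator variables, each equal to 1
# with probability p = 0.4; by linearity E(X) = 4 * 0.4 = 1.6.
random.seed(1)
p, n_students, trials = 0.4, 4, 100_000
total = sum(sum(random.random() < p for _ in range(n_students))
            for _ in range(trials))
print(total / trials)   # close to 1.6
```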
4.3 Variance
Figure 4.3.1.
The variance of a random variable X is a measure of how spread out its possible
values are. The variance of X is the 2nd central moment, commonly denoted by σ2
or Var ( X ). It is the most commonly used measure of dispersion of a distribution
about its mean. Large values of σ2 imply a large spread in the distribution of X
about its mean. Conversely, small values imply a sharp concentration of the mass of
distribution in the neighborhood of the mean as shown in Figure 5.2.4. For discrete
random variable
$$\text{Var}(X) = E(X - \mu)^2 = E(X^2) - [E(X)]^2,$$
where $E(X^2)$ is the 2nd moment of the random variable X about zero. Variance is the average value of the squared deviation of X from its mean $\mu$. If X has units of meters, e.g., the variance has units of meters squared.
For any random variable X, the variance of X is nonnegative, i.e.,
$$\text{Var}(X) = E(X - \mu)^2 \geq 0$$
3. $\text{Var}[X + c] = \text{Var}[X]$.
4. $\text{Var}(X) \geq 0$
6. The variance operator is not linear, but it is straightforward to determine the variance of a linear function of a random variable. For any random variable X and any constants a and b, let $Y = aX + b$; then Y is also a random variable and
$$\text{Var}(Y) = \text{Var}(aX + b) = a^2\,\text{Var}(X)$$
The standard deviation $\sigma$ is the square root of the variance. It can be interpreted as the typical distance of the values of X from the mean.
$$\sigma = \sqrt{\text{Var}(X)}$$
$$SD(Y) = SD(aX + b) = \sqrt{\text{Var}(aX + b)} = \sqrt{a^2\,\text{Var}(X)} = |a|\,\sigma$$
Example 4.3.3
(a) Ali and his brother both like chocolate chip cookies best. They have a jar of cookies
with 5 chocolate chip cookies, 3 oatmeal cookies, and 4 peanut butter cookies.
They are each allowed to have 3 cookies. To be fair, they agree to randomly
select their cookies without peeking, and they each must keep the cookies that
they select. What is the variance of the number of chocolate chip cookies that Ali
gets?
(b) A student was at work at the county amphitheater, and was given the task of
cleaning 1500 seats. To make the job more interesting, his boss hid a golden
ticket somewhere in the seats. The ticket is equally likely to be in any of the
seats. Let X be the number of seats cleaned until the ticket is found. Calculate
the variance of X.
Solution:
(a) Let X denote the number of chocolate chip cookies that Ali selects. As he is
allowed to have 3 cookies, therefore X = 0, 1, 2, 3
X P( X
)
X ¨ P( X ) X 2 ¨ P( X )
5 7 12
0 / = 7/44 0 0
03 3
5 7 12
1 / = 21/44 21/44 21/44
12 3
5 7 12
2 / = 7/22 14/22 28/22
21 3
5 7 12
3 / = 1/22 3/22 9/22
3 0 3
Total 5/4 95/44
$$E(X) = \sum_{j=1}^{n} x_j P(X = x_j) = 5/4$$
$$E(X^2) = \sum_{j=1}^{n} x_j^2 P(X = x_j) = 95/44$$
$$\text{Var}(X) = E(X^2) - [E(X)]^2 = 95/44 - (5/4)^2 = 0.6$$
(b) Let X be the number of seats cleaned until the ticket is found. As there are 1500 seats, the probability of finding the ticket behind any given seat is $p = 1/1500$. The student starts cleaning the seats and moves to clean the next seat only if he has not found the ticket in the previous seat.

X       P(X)
1       1/1500
2       (1 − 1/1500) × 1/1499 = 1/1500
3       (1 − 1/1500) × (1 − 1/1499) × 1/1498 = 1/1500
...     ...
1500    1/1500

$$E(X) = \frac{1}{1500}(1 + 2 + \cdots + 1500) = \frac{1}{1500} \cdot \frac{1500(1500 + 1)}{2} = 750.5$$
$$\text{Var}(X) = E(X^2) - [E(X)]^2 = 750750.167 - 750.5^2 = 187499.92$$
Example 4.3.3
Example 4.3.5
Four cards are labeled $1, $2, $3, and $6. A player pays $4, selects two cards without re-
placement at random, and then receives the sum of the winnings indicated on the two cards.
Will the player win or lose money in the long run? What is the variance of the winning?
Solution:
Let X be the sum of the 2 cards that he selects without replacement. The probability of each value is $P(X = x) = 1/\binom{4}{2} = 1/6$.

X = Sum    Y = X − 4     P(X)    Y · P(X)
(1,2)=3    3 − 4 = −1    1/6     −1/6
(1,3)=4    4 − 4 = 0     1/6     0
(1,6)=7    7 − 4 = 3     1/6     3/6
(2,3)=5    5 − 4 = 1     1/6     1/6
(2,6)=8    8 − 4 = 4     1/6     4/6
(3,6)=9    9 − 4 = 5     1/6     5/6
Total                    1       12/6 = 2
Table 4.1
As the expected winning is $E(Y) = \sum Y \cdot P(X) = \$2$ (see Table 4.1), the player will win money (\$2) in the long run.
The expected winning could also be calculated using the expected value of the linear combination $Y = X - 4$:
$$E(Y) = E(X - 4) = E(X) - 4 = 36/6 - 4 = 2$$
The variance of the winning can be calculated using the properties of variance:
$$\text{Var}(X - 4) = \text{Var}(X) = E(X^2) - [E(X)]^2 = 122/3 - 6^2 = 4.67$$
4.4 Bernoulli Distribution
2. This random variable can only take two possible values, usually 0 and 1.
$$P(X = x) = p^x (1 - p)^{1-x}$$
Also written as
$$P(X = x) = \begin{cases} 1 - p & \text{for } x = 0 \\ p & \text{for } x = 1 \end{cases}$$
$$E(X) = \sum_{i=0}^{1} x_i\, p^{x_i} (1 - p)^{1 - x_i} = p$$
$$\therefore \text{Var}(X) = E(X^2) - [E(X)]^2 = p(1 - p)$$
Example 4.4.2
Thirty-eight percent of the songs on a student’s music player are rock songs. A student
chooses a song at random, with all songs equally likely to be chosen. Let X indicate whether
the selected song is a rock song. Find the expected number and variance of X.
Solution:
Let X be the indicator random variable with X = 1 if the selected song is a rock song and X = 0 otherwise. The probability of a rock song is p = 0.38.
$$P(X = x) = \begin{cases} 1 - 0.38 & \text{for } x = 0 \\ 0.38 & \text{for } x = 1 \end{cases}$$
E(X) = p = 0.38

Var(X) = p(1 − p)
       = 0.38 × (1 − 0.38)
       = 0.2356
Example 4.4.2
Binomial Experiment
A random variable X ~ Bin(n, p), where n and p are the parameters of the Binomial distribution. The pmf for the Binomial distribution is:

P(X = x) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, . . . , n
Example 4.5.3
A particular concentration of a chemical found in polluted water has been found to be lethal to 20% of the fish that are exposed to the concentration for 24 hours. Ten fish are placed in a tank containing this concentration of the chemical in water. Find the probability that (a) at least 8 of the fish die, and (b) at most 6 die.
Solution:
n = 10; p = 0.20. Let X be the number of fish that die; then X ~ Bin(10, 0.2).
(a)
P(X ≥ 8) = P(X = 8) + P(X = 9) + P(X = 10)
         = C(10,8) 0.2^8 (1 − 0.2)^2 + C(10,9) 0.2^9 (1 − 0.2)^1 + C(10,10) 0.2^10 (1 − 0.2)^0
         = 0.000078
(b)
P(X ≤ 6) = 1 − P(X > 6)
         = 1 − [P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)]
         = 1 − [C(10,7) 0.2^7 (0.8)^3 + C(10,8) 0.2^8 (0.8)^2 + C(10,9) 0.2^9 (0.8)^1 + C(10,10) 0.2^10 (0.8)^0]
         = 0.999136
Example 4.5.3
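The binomial tail sums above are easy to check with a short Python sketch (standard library only; the helper name is illustrative):

    from math import comb

    def binom_pmf(n, p, x):
        # P(X = x) for X ~ Bin(n, p)
        return comb(n, x) * p**x * (1 - p)**(n - x)

    n, p = 10, 0.2   # Example 4.5.3: number of fish that die
    print(sum(binom_pmf(n, p, x) for x in range(8, 11)))      # P(X >= 8) ≈ 0.000078
    print(1 - sum(binom_pmf(n, p, x) for x in range(7, 11)))  # P(X <= 6) ≈ 0.999136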
Example 4.5.4
An airline estimates that 5% of the people making reservations on a certain flight will not
show up. Consequently, their policy is to sell 84 tickets for a flight that can only hold 80
passengers. What is the probability that there will be a seat available for every passenger
that shows up?
Solution:
P(No show) = 0.05; ∴ P(show) = 0.95; n = 84. Let X be the number of ticketed passengers who show up; then X ~ Bin(84, 0.95).

P(X ≤ 80) = 1 − P(X > 80)
          = 1 − [P(X = 81) + P(X = 82) + P(X = 83) + P(X = 84)]
          = 0.6103

There is a 61.03% chance of a seat being available for everyone who shows up.
Example 4.5.4
Example 4.5.5
A hospital receives 1/5 of its COVID-19 vaccine shipments from Moderna and the remainder
of its shipments from Pfizer. Each shipment contains a very large number of vaccine vials.
For Moderna shipments, 10% of the vials are ineffective, while for Pfizer, 2% of the vials are
ineffective. The hospital tests 30 randomly selected vials from a shipment and finds that one
vial is ineffective. What is the probability that this shipment came from Moderna?
Solution:
Let M be the event that the shipment is from Moderna, P the event that it is from Pfizer, and I the event that exactly one of the 30 tested vials is ineffective.
We are given P(M) = 1/5; P(P) = 1 − 1/5 = 4/5; n = 30. Let X be the number of ineffective vials in the sample of size 30.
1. P(a vial is ineffective | M) = 0.10; P(X = 1 | M) = C(30,1) 0.10^1 (1 − 0.10)^29 = 0.141
2. P(a vial is ineffective | P) = 0.02; P(X = 1 | P) = C(30,1) 0.02^1 (1 − 0.02)^29 = 0.334
P(M|I) asks for the updated probability that the shipment came from Moderna, given the test result.

P(M|I) = P(I|M) P(M) / P(I)
       = P(I ∩ M) / [P(I ∩ M) + P(I ∩ P)]
       = (1/5 × 0.141) / (1/5 × 0.141 + 4/5 × 0.334)
       = 0.0954
There is a 9.54% chance that this shipment came from Moderna.
Example 4.5.5
Example 4.5.6
The probability of a student passing an exam is 0.2. How many students must take the exam to make the probability 0.99 that at least one student will pass the exam?
Solution:
p = 0.2; n = ?. Let X be the number of students who pass. "Any number of students will pass" means that at least 1 student will pass the exam.
P(X ≥ 1) = 1 − P(X < 1)
0.99 = 1 − C(n,0) 0.2^0 (1 − 0.2)^(n−0)
0.99 = 1 − 0.8^n
0.8^n = 1 − 0.99
n = log(0.01)/log(0.8)
  = 20.6377
Therefore n ≈ 21. So 21 students must take the exam so that the probability that at least one passes is 0.99.
Example 4.5.6
2. for p > 0.5 the distribution will exhibit NEGATIVE SKEWNESS (see Figure 4.5.2).
Figure 4.5.1. [Binomial pmf, n = 15, p = 0.2: the distribution is positively skewed.]
Figure 4.5.2. [Binomial pmf, n = 15, p = 0.8: the distribution is negatively skewed.]
Figure 4.5.3. [Binomial pmf, n = 15, p = 0.5: the distribution is symmetric.]
Figure 4.5.4. [Binomial pmf, n = 40, p = 0.2: with larger n the distribution becomes more symmetric.]
E(X) = Σ_{x=0}^{n} x C(n,x) p^x (1 − p)^(n−x) = np

Var(X) = E(X²) − [E(X)]²
       = np(1 − p)
Example 4.5.7
A company is considering drilling four oil wells. The probability of success for each well is
0.40, independent of the results for any other well. The cost of each well is $200,000. Each
well that is successful will be worth $600,000. What is the expected gain?
Solution:
Let X be the number of successful wells, i.e., X = 0, 1, . . . , 4; n = 4; p = 0.4. X is a binomial random variable. The cost of each well is a fixed constant of $200,000, so the total cost of 4 wells is a fixed constant, b = $800,000. The worth of each successful well is a fixed constant, a = $600,000. Let Y be the gain from 4 wells. Then Y = aX − b.
E(X) = np = 4 × 0.4 = 1.6

E(Y) = E(aX − b)
     = aE(X) − b
     = 600,000 × 1.6 − 800,000
     = $160,000
4. A Poisson random variable can take on any nonnegative integer value, i.e., X = 0, 1, 2, . . .. In contrast, the Binomial distribution always has a finite upper limit, i.e., X = 0, 1, 2, . . . , n.
Figure 4.6.1. [Poisson pmf for various values of λ.]
P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, . . .

1. Here X is the number of events that occur during the specified 1 unit of time.
2. λ, the average rate of events that occur during the specified 1 unit of time, space, volume, etc., is the parameter of the Poisson distribution.
The pm f for Poisson distribution for various values of λ is shown in Figure 4.6.1.
E(X) = Σ_{x=0}^{∞} x e^(−λ) λ^x / x! = λ

Var(X) = E(X²) − [E(X)]²
       = λ
The Poisson random variable is special in the sense that the mean and the variance are equal.
Example 4.6.2
1. The number of typing errors made by a typist has a Poisson distribution with an average
of four errors per page. If more than four errors appear on a given page, the typist
must retype the whole page. What is the probability that a randomly selected page
needs to be retyped?
2. The number of meteors found by a radar system in any 30-second interval under speci-
fied conditions averages 1.81. Assume the meteors appear randomly and independently.
What is the probability that at least one meteor is found in a one-minute interval?
Solution:
1. λ = 4/page. Let X be the number of typing errors made. We need to calculate the
probability of retyping a randomly selected page, i.e., P( X ą 4)
P(X > 4) = 1 − P(X ≤ 4)
         = 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)]
         = 1 − [e^(−4) 4^0/0! + e^(−4) 4^1/1! + e^(−4) 4^2/2! + e^(−4) 4^3/3! + e^(−4) 4^4/4!]
         = 0.3711
2. For a one-minute interval, the average is λt = 1.81 × 2 = 3.62 meteors.

P(X ≥ 1) = 1 − P(X < 1)
         = 1 − P(X = 0)
         = 1 − e^(−1.81×2) (1.81 × 2)^0 / 0!
         = 0.9732
Remember that to calculate probabilities of 'at least' or 'greater than' events for the Poisson distribution, you will always have to use the Complement Rule of Probability, since X has no finite upper limit.
Example 4.6.2
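Both Poisson calculations can be reproduced with a few lines of Python (a sketch using only the standard library):

    from math import exp, factorial

    def poisson_pmf(lam, x):
        # P(X = x) for X ~ Poisson(lam)
        return exp(-lam) * lam**x / factorial(x)

    # Typist: lambda = 4 errors per page; retype if more than 4 errors.
    print(1 - sum(poisson_pmf(4, x) for x in range(5)))   # ≈ 0.3711
    # Meteors: 1.81 per 30 seconds, so lambda = 3.62 per minute.
    print(1 - poisson_pmf(3.62, 0))                       # ≈ 0.9732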
A rule of thumb: when n is large, p is small, and np < 7, we can use the Poisson approximation to the Binomial distribution to find approximate probabilities.
Example 4.6.4
5% of the tools produced by a certain process are defective. Find the probability that in a
sample of 40 tools chosen at random, exactly three will be defective. Calculate a) using the
binomial distribution, and b) using the Poisson distribution as an approximation.
Solution:
p = 0.05; n = 40; np = 2; λ ≈ 2

a) Binomial (exact): P(X = 3) = C(40,3) 0.05³ (0.95)^37 = 0.1851
b) Poisson (approximate): P(X = 3) ≈ e^(−2) 2³/3! = 0.1804
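A two-line comparison makes the quality of the approximation concrete (a Python sketch; the values match those quoted above):

    from math import comb, exp, factorial

    n, p, lam = 40, 0.05, 2.0
    exact = comb(n, 3) * p**3 * (1 - p)**(n - 3)   # binomial: ≈ 0.1851
    approx = exp(-lam) * lam**3 / factorial(3)     # Poisson:  ≈ 0.1804
    print(exact, approx)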
1. What is the probability you will have to make 5 attempts to make a successful call?
Example 4.7.1
Toss a coin until you get a ’H’
1. P(H on 1st toss) = 1/2
2. P(T on 1st, H on 2nd toss) = (1/2) · (1/2)
3. P(T on 1st 2 tosses, H on 3rd toss) = (1/2)² · (1/2)
& so on until we get 1st ’H’.
The probabilities of the number of tosses until 1st ’H’ are displayed in Figure 4.7.1.
Example 4.7.1
Figure 4.7.1. [Geometric pmf P(x) of the number of tosses until the 1st 'H' (p = 1/2): the probabilities 1/2, 1/4, 1/8, . . . decrease geometrically.]
The terms in this pmf form a geometric sequence, as in Figure 4.7.1, which is why the distribution is called the Geometric Distribution. In general,

P(X = n) = (1 − p)^(n−1) p

Some references define the Geometric distribution as the number of failures before you get the 1st success, i.e., number of failures = number of trials − 1.
Example 4.7.3
A driver is eagerly eyeing a precious parking space some distance down the street. There are five cars in front of the driver, each of which has probability 0.2 of taking the space. What is the probability that the car immediately ahead will enter the parking space?
Solution:
p = 0.2. With five cars in front, the space is taken by the car immediately ahead only if the first four cars pass it up and the fifth takes it, i.e.,
P(X = 5) = (1 − 0.2)^4 (0.2) = 0.0819
Example 4.7.3
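The geometric pmf is simple enough to compute directly; this short Python sketch verifies the parking-space probability and the cdf identity P(X ≤ k) = 1 − (1 − p)^k:

    def geom_pmf(p, n):
        # P(X = n): 1st success occurs on trial n, X ~ Geometric(p)
        return (1 - p)**(n - 1) * p

    print(geom_pmf(0.2, 5))                                    # ≈ 0.0819
    print(sum(geom_pmf(0.2, n) for n in range(1, 6)),          # cdf at 5 ...
          1 - 0.8**5)                                          # ... equals 1 - q^5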
Figure 4.7.2. [cdf F(x) of the number of tosses until the 1st 'H' (p = 1/2): a step function rising from 0.5 toward 1.]
2. Its complement is the event that there are no successes in any of the first k attempts, i.e., X > k, which has probability q^k. You will only need more than k attempts if the 1st k attempts all resulted in failure.

∴ P(X ≤ k) = 1 − P(X > k) = 1 − q^k, where q = 1 − p.

The cdf for the coin toss until the 1st H is shown in Figure 4.7.2. It is a step function which satisfies the properties of a cdf.
Example 4.7.4
Assume that the probability of a specimen failing during a given experiment is 0.1. What
is the probability that it will take more than three specimens to have one surviving the
experiment?
Solution:
Let X be the number of specimens tested until one survives. The probability of surviving is p = 1 − 0.1 = 0.9. We are interested in P(X > 3).
We can calculate the required probability using 2 approaches, as explained below:
1. X > 3 in the Geometric setting means that the 1st 3 specimens tested all failed to survive the experiment, so P(X > 3) = (1 − 0.9)³ = 0.1³ = 0.001.
2. Let Y be the number of specimens surviving (successes) in a fixed number of trials n = 3; then we need the probability that none of the 1st 3 specimens survived in this binomial experiment. That is, P(X > 3) = P(Y = 0) = C(3,0) 0.9^0 0.1^3 = 0.001.
Example 4.7.4
The geometric distribution with parameter p has an expected value and a variance of

E(X) = Σ_{i=1}^{∞} i (1 − p)^(i−1) p = 1/p

Var(X) = E(X²) − [E(X)]²
       = (1 − p)/p²
The expected number of trials required to obtain the 1st success is 1/p. The fact that E(X) is the reciprocal of p is intuitively appealing, since it says that small values of p = P(A) require many repetitions in order to have the event A occur.
The geometric distribution has the memoryless (forgetfulness) property. The distribution of the remaining wait is exactly the same regardless of the past, i.e.,

P(X > n + k | X > k) = P(X > n + k ∩ X > k) / P(X > k)
                     = P(X > n + k) / P(X > k)
                     = (1 − p)^(n+k) / (1 − p)^k
                     = (1 − p)^(n+k−k)
                     = (1 − p)^n
                     = P(X > n)

P(X > n) = (1 − p)^n is the probability that it takes more than n trials to get the 1st success, i.e., that all of the first n trials resulted in failure. Use of this property simplifies conditional probability problems!
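A quick numeric check of the memoryless property (a sketch; the values of p, n and k are arbitrary illustrative choices):

    # P(X > n + k | X > k) should equal P(X > n) for a Geometric(p) variable.
    p, n, k = 0.2, 3, 5
    tail = lambda m: (1 - p)**m      # P(X > m) = q^m
    print(tail(n + k) / tail(k))     # 0.512
    print(tail(n))                   # 0.512 -- the same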
Example 4.7.6
The Super Breakfast Challenge (SBC) is known to be very difficult to consume. Only 10%
of people are able to eat all of the SBC.
1. How many people are needed, on average, until the first successful customer?
3. Given that the first 4 are unsuccessful, what is the probability at least 8 are needed?
Solution:
Let X be the number of people required until the first successful customer; X ~ Geometric(p = 0.1).
1. E(X) = 1/p = 1/0.1 = 10
3. By the memoryless property, P(X ≥ 8 | X > 4) = P(X > 7 | X > 4) = P(X > 3) = (1 − 0.1)³ = 0.729.
Example 4.7.6
§§ A Case Study
A coach wants to put together an intramural basketball team, from people living in a large
dorm. She estimates that 12% of people in the dorm like to play basketball. She goes door to
door to ask people if they would be interested in playing on the team. What is the probability
that she needs to
2. talk to 20 people, in order to find 5 people who will join the team?
3. How many dorm residents does she expect to interview before finding 5 people to create
the team?
Example 4.8.1
Figure 4.8.1. [Negative binomial pmf for r = 2, 5, and 10 (p fixed): as r increases, the distribution shifts to the right.]
where r and p are the 2 parameters of the negative binomial distribution. The number of successes is r ≥ 1, and the probability of success p is fixed from trial to trial.
The negative binomial distribution is also known as the Pascal distribution.
1. A curbside parking facility has a capacity for 3 cars. Determine the probability that it
will be full within 10 minutes. It is estimated that 6 cars will pass this parking space
within the time span and, on average, 80% of all cars will want to park there.
2. A public relations intern realizes that she forgot to assemble the consumer panel her
boss asked her to do. She panics and decides to randomly ask (independent) people if
they will work on the panel for an hour. Since she is willing to pay them for their work,
she believes she will have a 75% chance of people agreeing to work with her. Find the
probability that she will need to interview at least 10 people to find 5 willing to work
on the panel?
Solution:
1. The desired probability is simply the probability that the number of cars until the third
success (taking the parking space) is less than or equal to 6, i.e., we need to compute
the cd f . Let X be the number of cars to the third success, then X has a negative
binomial distribution with r = 3 and p = 0.8.
P(X ≤ 6) = P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)
         = C(3−1, 3−1) 0.8³ (1 − 0.8)^(3−3)
         + C(4−1, 3−1) 0.8³ (1 − 0.8)^(4−3)
         + C(5−1, 3−1) 0.8³ (1 − 0.8)^(5−3)
         + C(6−1, 3−1) 0.8³ (1 − 0.8)^(6−3)
         = 0.983
2. The desired probability is simply the probability that the number of people to ask to
get the fifth success is at least 10, i.e., P( X ě 10). If X is this number, it has a negative
binomial distribution with r = 5 and p = 0.75.
P(X ≥ 10) = 1 − P(X < 10)
          = 1 − [P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9)]
          = 1 − [C(5−1, 5−1) 0.75^5 (1 − 0.75)^(5−5) + C(6−1, 5−1) 0.75^5 (1 − 0.75)^(6−5)
               + C(7−1, 5−1) 0.75^5 (1 − 0.75)^(7−5) + C(8−1, 5−1) 0.75^5 (1 − 0.75)^(8−5)
               + C(9−1, 5−1) 0.75^5 (1 − 0.75)^(9−5)]
          = 0.0489
Alternate Method: This problem can be solved using the intuitive idea that she will only need to interview 10 or more people if fewer than 5 of the first 9 people interviewed are willing to work for her. So n = 9. Let Y be the number of people willing
to work.
P(X ≥ 10) = P(Y < 5)
          = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4)
          = C(9,0) 0.75^0 (1 − 0.75)^9 + C(9,1) 0.75^1 (1 − 0.75)^8
          + C(9,2) 0.75^2 (1 − 0.75)^7 + C(9,3) 0.75^3 (1 − 0.75)^6
          + C(9,4) 0.75^4 (1 − 0.75)^5
          = 0.0489
Example 4.8.3
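Both parts of Example 4.8.3 can be verified with a short negative binomial sketch in Python:

    from math import comb

    def nbinom_pmf(r, p, x):
        # P(X = x): the r-th success occurs on trial x
        return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

    # Parking facility: r = 3, p = 0.8
    print(sum(nbinom_pmf(3, 0.8, x) for x in range(3, 7)))        # P(X <= 6) ≈ 0.983
    # Consumer panel: r = 5, p = 0.75
    print(1 - sum(nbinom_pmf(5, 0.75, x) for x in range(5, 10)))  # P(X >= 10) ≈ 0.0489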
• Let Y have a binomial distribution with parameters n and p. (That is, Y = number of successes in n Bernoulli trials with P(success) = p.)
(a) P(X ≤ n) = P(Y ≥ r)
(b) P(X > n) = P(Y < r)
1. The negative binomial gets its name because it reverses the roles in the binomial: the binomial fixes the number of trials n and counts successes, while the negative binomial fixes the number of successes r and counts trials.
2. The Geometric distribution is the special case of the negative binomial distribution with r = 1, i.e., waiting for the 1st success.
Figure 4.9.1. [An urn containing balls of two colors, sampled without replacement.]
1. Draw a ball from the urn, note its color, and do not replace it in the urn.
2. Draw a 2nd ball from the urn without replacement, and note its color.
The hypergeometric pmf for this case study is shown in Figure 4.9.2.
Figure 4.9.2. [Hypergeometric pmf P(x) for the urn case study.]
X ~ hypergeom(N, M, n)

P(X = x) = C(M, x) C(N−M, n−x) / C(N, n), where x = 0, 1, . . . , min(M, n)   (4.9.1)
Parameters
Example 4.9.2
In a group of 25 factory workers, 20 are low-risk and 5 are high-risk. Two of the 25 factory
workers are randomly selected without replacement. Calculate the probability that exactly
one of the two selected factory workers is low-risk.
Solution:
N = 25; M = 5; n = 2
P(X = 1) = C(20,1) C(5,1) / C(25,2)
         = 0.3333
Example 4.9.2
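The hypergeometric pmf of equation (4.9.1) translates directly into code (a minimal sketch; the helper name is illustrative):

    from math import comb

    def hypergeom_pmf(N, M, n, x):
        # P(X = x): x marked items in a sample of n, drawn without
        # replacement from N items of which M are marked.
        return comb(M, x) * comb(N - M, n - x) / comb(N, n)

    # Example 4.9.2: exactly one low-risk worker among the two selected
    print(hypergeom_pmf(25, 20, 2, 1))   # ≈ 0.3333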
The factor (N − n)/(N − 1) is called the finite population correction. For a fixed sample size n, as N → ∞ it is clear that the correction goes to 1, i.e., for infinite populations the hypergeometric distribution can be approximated by the Binomial.
Example 4.9.3
A college student is running late for his class. He has 12 folders on his desk, 4 of which in-
clude assignments due today. Without taking time to look, he accidentally grabs just 3 folders
from his stack. When he gets to class, he counts how many of them contain his homework
assignments. What is the probability at least 2 of the 3 folders contain his assignments?
Solution:
N = 12; M = 4; n = 3
P(X ≥ 2) = P(X = 2) + P(X = 3)
         = C(4,2)C(8,1)/C(12,3) + C(4,3)C(8,0)/C(12,3)
         = 0.2363
Example 4.9.3
Rule of Thumb: For very large population size N, if the sample size n is at most 5% of the
population size and sampling is without replacement, then the experiment may be analyzed
as if it were a binomial experiment. The probability of success p in this case is approximated
as M/N « p.
Example 4.9.4
A nationwide survey of 17,000 college seniors by the University of Michigan revealed that
almost 70% disapprove of daily smoking. If 18 of these seniors are selected at random and
asked their opinion, what is the probability that more than 9 but fewer than 14 disapprove
of smoking daily?
Solution:
N = 17,000; p = 0.70; n = 18; n/N = 18/17000 ≈ 0.001. As n ≤ 0.05N, we can effectively use the binomial approximation to the hypergeometric.
Example 4.9.4
In general, it is a bit difficult to decide the appropriate distribution in a particular scenario.
Students should practice problems that will provide them with some skills for making correct
decisions. Figure 4.9.3 might be useful in making a correct choice.
Figure 4.9.3. [A decision chart for choosing the appropriate discrete distribution.]
2. The demand for a particular type of pump at an isolated mine is random and indepen-
dent with an average demand of 2.8 pumps in a week (7 days). Further supplies are
ordered each Tuesday morning and arrive on the weekly plane on Friday morning. Last
Tuesday morning only one pump was in stock, so the storesman ordered six more to
come on Friday morning. Find the probability that stock will be exhausted and there
will be unsatisfied demand for at least one pump by Friday morning.
3. A salesperson has found that the probability of a sale on a single contact is approxi-
mately .03. If the salesperson contacts 100 prospects, what is the approximate proba-
bility of making at least one sale?
4. Used watch batteries are tested one at a time until a good battery is found. Let X
denote the number of batteries that need to be tested in order to find the first good
one. Find the expected value of X, given that P( X ą 3) = 0.5
5. A research study is concerned with the side effects of a new drug. The drug is given
to patients, one at a time, until two patients develop side effects. If the probability of
getting a side effect from the drug is 1/6, what is the probability that eight patients
are needed?
6. When drawing cards with replacement and re-shuffling, you bet someone that you can
draw an Ace within k draws. You want your chance of winning this bet to be at least
52%. What is the minimum value of k needed? What is the probability that you will
need at least ten draws to get 4 Aces?
§§ Answers
1. $840
2. 0.3374
3. 0.9524
4. « 5
5. « 0.0651
6. « 10; 0.0236
7. 0.9517
Chapter 5
Continuous Distributions
AS YOU READ . . .
2. What is Continuous Uniform Distribution and what are its parameters? In which
scenario can we use it to model probabilities?
3. What is Normal Distribution and its parameters? Why is Normal Distribution widely
applicable in practical life?
4. What is the Exponential Distribution and its parameters? How can we use it to model
chances of waiting times?
§§ A Case Study
Consider daily rainfall in Karachi in July. Theoretically, using measuring equipment with perfect accuracy, the amount of rainfall could take on any value, e.g., between 0 and 5 inches. Let X represent the amount of rainfall in inches. We might want to calculate probabilities such as:
such as:
1. the amount of rainfall in Karachi in July this year would be less than 5 inches, i.e.,
P( X ă 5) or
2. the amount of rainfall in Karachi in July this year would be between 2-inches to 4-
inches, i.e., P(2 ď X ď 4).
The amount of rainfall X, being a continuous random variable, can take any value in an interval of real numbers. This could be an infinite interval such as (−∞, ∞). You can usually state the beginning and end points, but there are infinitely many possible values within that range, e.g., 2 ≤ X ≤ 4 (see Figure 5.1.1).
Figure 5.1.1. [A density curve f(x); the probability P(2 ≤ X ≤ 4) is an area under the curve.]
Figure 5.1.2. [A density curve f(x) concentrated on a narrow interval (roughly 0.18 to 0.22).]
The S&P % returns displayed in Figure 5.2.1 show a real-life application of a continuous probability distribution in finance.
Figure 5.2.1. [S&P % returns.]
1. A pdf for a continuous random variable is defined for all real numbers in the range of the random variable.
2. More specifically, the area under the pdf curve between points a and b is the same as the probability that the random variable will have a value between a and b (see Figure 5.2.2).

P(a ≤ X ≤ b) = ∫_a^b f(x) dx
Figure 5.2.2. [P(a ≤ X ≤ b) as the shaded area under the pdf between a and b.]
§§ Properties of pdf
For a continuous random variable X, a probability density function (pdf), denoted f(x), is a function such that:
1. Non-Negativity: f(x) ≥ 0, for all x
2. Unity: ∫_{−∞}^{∞} f(x) dx = 1
3. The probability that a continuous random variable X takes any specific value a is always 0, i.e., P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0.
Continuous case: f(x) ≥ 0, ∀x;  ∫_{−∞}^{∞} f(x) dx = 1
Discrete case:   0 ≤ P(X = x) ≤ 1, ∀x ∈ S;  Σ_i P(x_i) = 1
Example 5.2.2
Let a continuous random variable X have density function

f(x) = { A(1 − x²)   −1 < x < 1,
       { 0           elsewhere

1. Find the value of A.
2. Find the probability that X will be greater than 1/2 but less than 3/4.
3. Find P(X ≥ 1/4).
Solution:
1. To find A we require

∫_{−∞}^{∞} f(x) dx = 1
∫_{−1}^{1} A(1 − x²) dx = A(x − x³/3) |_{−1}^{1}
                        = A(1 − 1/3 − (−1 + 1/3))
                        = A(4/3)

∴ A = 3/4
2.
P(1/2 ≤ X ≤ 3/4) = ∫_{1/2}^{3/4} (3/4)(1 − x²) dx
                 = (3/4)(x − x³/3) |_{1/2}^{3/4}
                 = 29/256
3.
P(X ≥ 1/4) = ∫_{1/4}^{1} (3/4)(1 − x²) dx
           = (3/4)(x − x³/3) |_{1/4}^{1}
           = 81/256
Example 5.2.2
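Probabilities like these can also be approximated numerically when no antiderivative is handy. The sketch below implements composite Simpson's rule (one of several standard quadrature choices) and checks all three answers:

    def simpson(f, a, b, n=1000):
        # Composite Simpson's rule on [a, b] with n (even) subintervals.
        h = (b - a) / n
        s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h)
                              for i in range(1, n))
        return s * h / 3

    f = lambda x: 0.75 * (1 - x**2)   # pdf from Example 5.2.2
    print(simpson(f, -1, 1))          # total area: 1
    print(simpson(f, 0.5, 0.75))      # ≈ 29/256 ≈ 0.1133
    print(simpson(f, 0.25, 1))        # ≈ 81/256 ≈ 0.3164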
Example 5.2.3
The probability density function of the time to failure of an electronic component in a copier
(in hours) is

f(x) = { 0                 x < 0,
       { (1/2) e^(−0.5x)   for x ≥ 0

a. Determine the probability that a component fails in the interval from 1 to 2 hours.
b. At what time do we expect 50% of the components to have failed, i.e., what is the median of the distribution?
Solution:
a.
P(1 ≤ X ≤ 2) = ∫_1^2 (1/2) e^(−0.5x) dx
             = −e^(−0.5x) |_1^2
             = 0.2386
b. For the median of the distribution, we need to find the value of x that divides the distribution into two halves, i.e., P(0 ≤ X ≤ x) = 0.5.

P(0 ≤ X ≤ x) = ∫_0^x (1/2) e^(−0.5t) dt
             = −e^(−0.5t) |_0^x
             = 1 − e^(−0.5x)

Setting 1 − e^(−0.5x) = 0.5 and solving for x, we get x = 2 ln 2 = 1.3863. Therefore, after about 1.39 hours we expect 50% of the components to have failed.
Example 5.2.3
We often need to compute the probability that the random variable X will be less than or equal to a, i.e., P(X ≤ a), known as the cdf.

Continuous Case
F(a) = P(X ≤ a) = ∫_{−∞}^{a} f(x) dx

Discrete Case
F(a) = P(X ≤ a) = Σ_{x ≤ a} P(x)
§§ pdf from cdf
Discrete Case (pmf from cdf): the pmf is the jump size in the step function. The size of the jump at any x_i can be written as

P_X(x_i) = F(x_i) − F(x_{i−1})

Continuous Case (pdf from cdf):

f(x) = d/dx F(x).

P(X > a) = 1 − P(X ≤ a) = 1 − F(a)
P(a ≤ X ≤ b) = F(b) − F(a)
Example 5.2.5
If X is a continuous random variable with cd f given by
F(x) = { 0                x < 0,
       { 1 − e^(−0.5x)    for x ≥ 0

Find the pdf of X.
Solution:

f(x) = d/dx F(x) = { (1/2) e^(−0.5x)   x ≥ 0
                   { 0                 elsewhere

∴ f(x) = { 0                 x < 0,
         { (1/2) e^(−0.5x)   for x ≥ 0
Example 5.2.5
5.2.3 §§ Expectation

§§ Continuous Case

E(X) = ∫_{−∞}^{∞} x f(x) dx
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

§§ Discrete Case

E(X) = Σ_j x_j P(x_j)
E[g(X)] = Σ_j g(x_j) P(x_j)
Figure 5.2.3.
5.2.4 §§ Variance
Var(X) = E[(X − µ)²]

§§ Continuous Case

Var(X) = E[(X − µ)²]
       = ∫_{−∞}^{∞} (x − µ)² f(x) dx
       = E(X²) − [E(X)]²,  where E(X²) = ∫_{−∞}^{∞} x² f(x) dx

Graphically, Var(X) is the spread of the values of the random variable around its mean, as shown in Figure 5.2.4.

§§ Discrete Case

Var(X) = E[(X − µ)²]
       = E(X²) − [E(X)]²,  where E(X²) = Σ_j x_j² P(x_j)
Figure 5.2.4. [Densities with the same mean but different spread (variance).]
Example 5.2.8
Let a continuous random variable X have density function

f(x) = { A(1 − x²)   −1 < x < 1,
       { 0           elsewhere
Example 5.2.8
Example 5.2.9
The probability density function of the weight of packages delivered by a post office is

f(x) = { 70/(69x²)   1 ≤ x ≤ 70,
       { 0           elsewhere
1. If the cost is $2.50 per pound, what is the mean shipping cost of a package?
2. Find the Variance of the distribution of the shipping cost.
Solution:
Let X be the weight of the package. Shipping cost per pound is $2.50. The total cost can be
defined as Y = 2.5X.
1.
E(X) = ∫_{−∞}^{∞} x f(x) dx
     = ∫_1^{70} x · 70/(69x²) dx
     = (70/69) ln 70
     = 4.31

The mean shipping cost is therefore E(Y) = E(2.5X) = 2.5 × 4.31 = $10.78.
2. First, E(X²) = ∫_1^{70} x² · 70/(69x²) dx = (70/69)(70 − 1) = 70, so

Var(X) = E(X²) − [E(X)]²
       = 70 − (4.31)²
       = 51.42

Var(Y) = Var(2.5X)
       = 2.5² Var(X)
       = 321.3994
Example 5.2.9
1. Find the cdf
2. Find E(X)
3. Find Var(X)
Solution:
1. For 0 ≤ a ≤ 1, we have
F(a) = ∫_0^a (3/4) dx = 3a/4

For 1 ≤ a ≤ 3, we have
F(a) = ∫_0^1 (3/4) dx + ∫_1^a 0 dx = 3/4

For 3 ≤ a ≤ 4, we have
F(a) = ∫_0^1 (3/4) dx + ∫_1^3 0 dx + ∫_3^a (1/4) dx = 3/4 + 0 + (1/4)(a − 3)

For 4 ≤ a, we have
F(a) = ∫_0^1 (3/4) dx + ∫_1^3 0 dx + ∫_3^4 (1/4) dx + ∫_4^a 0 dx = 3/4 + 1/4 = 1
2.
E(X) = ∫_0^1 x (3/4) dx + ∫_3^4 x (1/4) dx = 3/8 + 7/8 = 1.25

3.
E(X²) = ∫_{−∞}^{∞} x² f(x) dx
      = ∫_0^1 x² (3/4) dx + ∫_1^3 x² · 0 dx + ∫_3^4 x² (1/4) dx
      = 3.33

Var(X) = E(X²) − [E(X)]²
       = 3.33 − (1.25)²
       = 1.77
Example 5.3.1
Figure 5.3.1. [The piecewise pdf of Example 5.3.1: f(x) = 3/4 on (0, 1), 0 on (1, 3), and 1/4 on (3, 4).]
Figure 5.3.2. [The corresponding cdf F(x): piecewise linear, rising from 0 to 3/4 on (0, 1), flat on (1, 3), and rising to 1 on (3, 4).]
Uniform random variables are one of the most elementary continuous random variables.
§§ A Case Study
The total time to process a passport application by the state department is between 3 and 7 weeks. The interest might be to find out the expected time for processing an application. If my passport needs renewal, what is the probability that my application will be processed in 5 weeks or less? Let X be the processing time; it is important to note that X is equally likely to fall anywhere in this interval of 3 to 7 weeks, i.e., X has a constant density on this interval.
Figure 5.4.1. [Uniform density of the passport processing time over 3 to 7 weeks.]
Figure 5.4.2 shows the Uniform density over the interval (a, b). The random variable X, uniformly distributed on (a, b), is equally likely to fall anywhere in this interval.
Figure 5.4.2. [Uniform density: f(x) = 1/(b − a) on the interval (a, b).]
The distribution is also called a Rectangular Distribution. U(0,1) is the most commonly
used uniform distribution.
F(x) = P(X ≤ x) = ∫_a^x 1/(b − a) dt
     = { 0                  if x ≤ a
       { (x − a)/(b − a)    a < x < b
       { 1                  if x ≥ b
Figure 5.4.3. [Uniform cdf: F(x) rises linearly from 0 at a to 1 at b.]
E(X) = (a + b)/2

Var(X) = E(X²) − [E(X)]²
       = (b − a)²/12.
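These formulas are easy to package as small helpers; the sketch below (function names are illustrative) also reproduces parts of the bus-stop example that follows:

    def unif_mean(a, b):
        return (a + b) / 2            # E(X) for X ~ U(a, b)

    def unif_var(a, b):
        return (b - a)**2 / 12        # Var(X) for X ~ U(a, b)

    def unif_prob(a, b, lo, hi):
        # P(lo <= X <= hi), assuming a <= lo <= hi <= b
        return (hi - lo) / (b - a)

    print(unif_prob(0, 10, 0, 2))     # 0.2 -- arrival in the first 2 minutes
    print(unif_mean(0, 10))           # 5   -- expected waiting time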
Example 5.4.3
Suppose a bus always arrives at a particular stop between 8:00 AM and 8:10 AM. The density
is shown in Figure 5.4.4.
a. Find the probability that the bus will arrive tomorrow between 8:00 AM and 8:02 AM?
b. What is the expected time of the bus arrival?
c. Eighty percent of the time, the waiting time of a customer for the bus must fall below
what value?
d. If the bus did not arrive in the 1st 5 minutes, what is the probability that it will arrive
in the last 2 minutes?
Figure 5.4.4. [Uniform density f(x) = 1/10 on (0, 10) minutes.]
Solution:
Let the random variable X be the waiting time in minutes.
a. The probability that the bus will arrive tomorrow between 8:00 AM and 8:02 AM is P(X ≤ 2).
P(X ≤ 2) = ∫_0^2 1/(b − a) dx
         = ∫_0^2 1/(10 − 0) dx
         = 2/10
There is a 20% chance that the bus will arrive tomorrow between 8:00 AM and 8:02
AM. It is also clear that, owing to uniformity in the distribution, the solution can be
found simply by taking the ratio of the length from 0 to 2 to the total length of the
distribution interval.
b.
E(X) = (a + b)/2 = 10/2 = 5

i.e., the expected arrival time is 8:05 AM.
c.
P(X ≤ k) = 0.80
∫_0^k 1/(10 − 0) dx = k/10 = 0.80

Solving for k, we get k = 8. Therefore, 80% of the time the waiting time of a customer for the bus falls below 8 minutes, i.e., the bus arrives by 8:08 AM.
d. Here, the condition that the bus did not arrive in the 1st 5 minutes is given.

P(X > 8 | X > 5) = P(X > 8 ∩ X > 5) / P(X > 5)
                 = P(X > 8) / P(X > 5)
                 = (2/10) / (5/10)
                 = 2/5
Example 5.4.3
§§ A Case Study
Smartphone batteries have an average lifetime of 1 year with a 1-month margin of error. You buy a new phone: what is the chance that your phone battery does not work past 1 month? Or that it lasts at least 11 months? Such a random variable is expected to have a central value around which most of the observations cluster; a bell-shaped (approximately normal) distribution, also called a Gaussian distribution, named after the German mathematician Carl Friedrich Gauss. Due to the significance of his work, his portrait and the normal pdf along with the normal curve were displayed on the former German 10-Mark banknote.
Figure 5.5.1.
f(x; µ, σ) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)),  −∞ < x < ∞,
where
a. µ (the mean) is the location parameter and σ (the standard deviation) is the scale parameter. µ is exactly the first moment, and the variance σ² is the second central moment of the random variable.
F_X(x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^(−(t−µ)²/(2σ²)) dt
Figure 5.5.2 shows the effect of the location parameter µ on the pd f and the Figure 5.5.3
shows the impact of the location parameter µ on the cd f of the simulated normal distribu-
tions. The scale parameter σ = 1 is kept constant in the 3 distributions simulated. It can
be observed that the pd f of the Gaussian moves left or right depending on the value of the
mean µ, i.e., the change in the value of µ shifts the location of the curves.
Figure 5.5.2. [Normal pdfs with σ = 1 and µ = 0, 1, −1: changing µ shifts the curve left or right.]
Figure 5.5.3. [Normal cdfs with σ = 1 and µ = 0, 1, −1.]
The impact of the scale parameter σ on the pdf is shown in Figure 5.5.4, and Figure 5.5.5 shows the corresponding impact of σ on the cdf of the normal distributions. The normal distributions presented in these figures were simulated with a constant mean µ = 0 but different variances. Changing the standard deviation either tightens or spreads out the width of the distribution along the X-axis; larger standard deviations produce wider distributions. The change in σ scales the distribution.
Figure 5.5.4. [Normal pdfs with µ = 0 and σ = 1, 2, 0.5: changing σ changes the spread.]
Figure 5.5.5. [Normal cdfs with µ = 0 and σ = 1, 2, 0.5.]
Figure 5.5.6. [The normal curve with inflection points at µ − σ and µ + σ marked in red.]
a. It is a symmetric, bell-shaped distribution with total area under the curve equal to 1. This property is useful for solving practical application problems.
b. The mean, median and mode are all equal and located at the center of the distribution.
d. The inflection points are located at µ ´ σ and µ + σ as shown by the red points in the
curve in Figure 5.5.6. (An inflection point is a point on the curve where the sign of the
curvature changes.)
e. The curve lies entirely above the horizontal axis, and the x-axis is an asymptote in both horizontal directions.
f. The area between the curve and the horizontal axis is exactly 1. Note that this is the
area of a region that is infinitely wide, since the curve never actually touches the x-axis.
§§ Background
A normal distribution with a mean of µ and standard deviation of σ, i.e., X ~ N(µ, σ), has pdf:

f(x; µ, σ) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)),  −∞ < x < ∞
To find the probability that a normal random variable X lies in the interval from a to b, we need to find the area under the normal curve between the points a and b (see Figure 5.1.1). However, there are infinitely many normal distributions, one for each different mean and standard deviation (e.g., see Figure 5.5.2). A separate table of areas for each of these curves is obviously impractical. Instead, we use a standardization procedure that allows us to use the same table for all normal distributions.
All normally distributed variables X can be transformed into the standard normal variable Z:

z = (x − µ)/σ  ⇒  X = µ + σZ

A z-value tells how many standard deviations above or below the mean a certain value of X is.
The cdf of the standard Gaussian can be determined by integrating the pdf:

F_Z(z) = Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^(−t²/2) dt
Figure 5.5.7. [The standard normal cdf Φ(z): an S-shaped curve rising from 0 to 1.]
The cumulative distribution function is shown in Figure 5.5.7 and is often referred to as an 'S-shaped' curve. Notice that Φ(0) = 0.5 because the standard normal distribution is symmetric about z = 0, and that the cumulative distribution function Φ(z) approaches 1 as z tends to ∞ and approaches 0 as z tends to −∞. The symmetry of the standard normal distribution about 0 implies that if the random variable Z has a standard normal distribution, then

1 − Φ(z) = P(Z ≥ z) = P(Z ≤ −z) = Φ(−z)
§§ Z-Table
1. The table for the cumulative distribution of the standard normal variable is shown in Figure 5.5.8. The entries inside the table give the area under the standard normal curve for any value of z from 0 to 3.49 or so.
2. The table gives values for non-negative z. For negative values of z, the area can be obtained from the symmetry property of the curve.
3. Before using the table, remember to convert the normal random variable X to Z as z = (x − µ)/σ.
4. Convert Z back to X as X = µ + σZ.
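In place of the printed table, Φ(z) can be evaluated with the error function from Python's standard library, since Φ(z) = (1 + erf(z/√2))/2 (a minimal sketch):

    from math import erf, sqrt

    def Phi(z):
        # standard normal cdf
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(Phi(1.96))       # ≈ 0.9750, matching the Z-table
    print(1 - Phi(1.96))   # P(Z >= 1.96) ≈ 0.0250
    print(Phi(-1.96))      # symmetry: also ≈ 0.0250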
Figure 5.5.8. [The standard normal (Z) table of cumulative probabilities.]
Example 5.5.4
In each of the following cases, evaluate the required probabilities.
a. P(Z ≤ 1.96)
b. P(Z ≤ −1.96)
c. P(Z ≥ 1.96)
d. P(Z ≥ −1.96)
Solution:
Using the standard normal distribution table together with the symmetry and complement properties:
a. P(Z ≤ 1.96) = 0.975; this is the shaded area under the curve in Figure 5.5.9.
b. P(Z ≤ −1.96) = P(Z ≥ 1.96) = 1 − P(Z ≤ 1.96) = 1 − 0.975 = 0.025, by symmetry (Figure 5.5.10).
c. P(Z ≥ 1.96) = 1 − P(Z ≤ 1.96) = 1 − 0.975 = 0.025, using the Complement Law of Probability; this is the shaded area under the curve in Figure 5.5.11.
d. P(Z ≥ −1.96) = P(Z ≤ 1.96) = 0.975, using the symmetry property; this is the shaded area under the curve in Figure 5.5.12.
• Convert x into z as
P((4 − 10)/4 ≤ Z ≤ (16 − 10)/4) = P(−1.5 ≤ Z ≤ 1.5)
Example 5.5.4
Figure 5.5.9. [Shaded area: P(Z ≤ 1.96).]
Figure 5.5.10. [Shaded area: P(Z ≤ −1.96).]
Figure 5.5.11. [Shaded area: P(Z ≥ 1.96).]
Figure 5.5.12. [Shaded area: P(Z ≥ −1.96).]
Figure 5.5.13. [Shaded area: P(−1.96 ≤ Z ≤ 1.96).]
Example 5.5.5
The achievement scores for a college entrance examination are normally distributed with
mean 75 and standard deviation 10. What percentage of the students will score:
1. above 90?
2. below 70?
1. above 90?
P(X > 90) = P((X − µ)/σ > (90 − 75)/10)
          = P(Z > 1.5)
          = 1 − 0.9332
          = 0.0668

2. below 70?
P(X < 70) = P((X − µ)/σ < (70 − 75)/10)
          = P(Z < −0.5)
          = 0.3085
Example 5.5.5
sample of ten items, what is the probability that exactly two will have strength more than
100?
Solution:
Let X be the strength with σ = 4.2.
P(X < 100) = 0.95
P((X − µ)/σ < (100 − µ)/4.2) = 0.95

Now we need to find the z value corresponding to a cumulative probability of 0.95, i.e., the 95th percentile. Looking inside the table we see that this z value is 1.645, i.e., P(Z ≤ 1.645) = 0.95.

z = (x − µ)/σ
1.645 = (100 − µ)/4.2
µ = 100 − 1.645 × 4.2
  = 93.091

For the sample of ten items, each item independently has strength more than 100 with probability 0.05, so if Y is the number of such items, then Y ~ Bin(10, 0.05) and
P(Y = 2) = C(10,2) 0.05² (0.95)^8 = 0.0746.
Example 5.5.6
1. approximately 68% of observations fall within 1 standard deviation of the mean, i.e., µ ± σ. The probability that X is within one standard deviation of its mean µ is 0.68.
2. approximately 95% of observations fall within 2 standard deviations of the mean, i.e., µ ± 2σ. The probability that X is within two standard deviations of its mean µ is 0.95.
3. approximately 99.7% of observations fall within 3 standard deviations of the mean, i.e., µ ± 3σ. The probability that X is within three standard deviations of its mean µ is 0.997.
Figure 5.5.14. [The 68-95-99.7 (Empirical) Rule on the normal curve.]
Example 5.5.7
What’s a normal pulse rate? That depends on a variety of factors. Pulse rates between 60
and 100 beats per minute are considered normal for children over 10 and adults. Suppose that
these pulse rates are approximately normally distributed with a mean of 72 and a standard
deviation of 12.
1. What proportion of adults will have pulse rates between 60 and 84?
3. 2.5% of the adults will have their pulse rates exceeding x. Find x?
Solution:
Let X be the pulse rate, which has a Normal distribution with µ = 72 and σ = 12. Convert x into z and use the Normal Distribution Table to find the required probabilities.
1. P(60 ≤ X ≤ 84) = P((60 − 72)/12 ≤ Z ≤ (84 − 72)/12) = P(−1 ≤ Z ≤ 1) = 0.6826.
3. P(Z > z) = 0.025 gives z = 1.96, so x = µ + σz = 72 + 1.96 × 12 = 95.52 beats per minute.
Example 5.5.7
The moment generating function is a powerful tool that makes many probabilistic computations very efficient.
M_X(t) = E(e^(tX)),

provided that the expectation exists for t in some neighborhood of 0. We call M_X(t) the moment generating function because all of the moments of X can be obtained by successively differentiating M_X(t) and then evaluating the result at t = 0.

Let X be a Normal random variable; its moment generating function M_X(t) is:

M_X(t) = E(e^(tX))
       = ∫_{−∞}^{∞} e^(tx) (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)) dx

Let z = (x − µ)/σ ⇒ x = µ + σz. Then

M_X(t) = ∫_{−∞}^{∞} e^(tσz + tµ) (1/(σ√(2π))) e^(−z²/2) σ dz
       = e^(tµ) (1/√(2π)) ∫_{−∞}^{∞} e^(tσz − z²/2) dz
       = e^(tµ + σ²t²/2),

where the last step follows by completing the square, tσz − z²/2 = −(z − tσ)²/2 + σ²t²/2, so that the remaining integral is that of a normal density and equals √(2π).
Definition 5.5.9.

Σ_{i=1}^{n} X_i ~ N(nµ, nσ²)  ⇒  Z = (Σ_{i=1}^{n} X_i − nµ) / √(nσ²) ~ N(0, 1)
Example 5.5.10
The weight of each of the eight individuals is approximately normally distributed with a mean
equal to 150 pounds and a standard deviation of 35 pounds each. What is the probability
that the total weight of eight people who occupy an elevator exceeds 1300 pounds?
Solution:
Let X be the weight of a single individual, which is Normal with µ = 150 and σ = 35. As individual weights are independent and Normal, so is their total: Σ_{i=1}^{8} X_i ~ N(8(150), 8(35)²).
Convert Σ_{i=1}^{8} X_i into z and use the Normal Distribution Table to find the required probabilities.
P(Σ_{i=1}^{8} X_i > 1300) = P((Σ X_i − nµ)/√(nσ²) > (1300 − 8(150))/√(8(35)²))
                          = P(Z > 1.01)
                          = 1 − 0.8438
                          = 0.1562

There is a 15.62% chance that the total weight of eight individuals in the elevator would exceed 1300 pounds.
Example 5.5.10
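The elevator calculation in code, reusing the Φ helper from the Z-table sketch above:

    from math import erf, sqrt

    def Phi(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    n, mu, sigma = 8, 150, 35
    z = (1300 - n * mu) / sqrt(n * sigma**2)
    print(1 - Phi(z))    # P(total weight > 1300) ≈ 0.156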
§§ Background
Waiting is painful. What is the expected time until an air conditioning system fails as shown
in Figure 5.6.1? When a mother is waiting for her three children to call her, what is the
probability that the first call will arrive within the next 5 minutes?
Figure 5.6.1.
In such cases, let X be the time between successive occurrences. Clearly, X is a continuous
random variable whose range consists of the non-negative real numbers. It is expected that
most calls, times or distances will be short and a few will be long. So the density should be
large near x = 0 and decreasing as x increases.
§§ Poisson Process
The number of events that occur in a window of time or region in space
1. Events occur randomly, but with a long-term average rate of λ per unit time, e.g., λ = 10 per hour or λ = 24 × 10 per day.
2. The events are rare enough that in a very short time interval, there is a negligible
chance of more than one event.
4. Exponential distribution provides a description of the length of time between two con-
secutive events
5. The important point is we know the average time between events but they are randomly
spaced (stochastic). Let X be the wait time until the first call at a Customer Centre
from any start point in this setting.
6. If the wait time for a call is at least t minutes, then how many calls occurred in the
first t minutes?
The time gap between successive events from a Poisson process (with mean number of events λ > 0 per unit interval) is an exponential random variable with rate parameter λ.

F(x) = P(X ≤ x) = ∫_0^x λ e^(−λt) dt
     = { 1 − e^(−λx),  if x ≥ 0,
       { 0,            otherwise.
Figure 5.6.2. [Exponential cdf F(t), rising from 0 toward 1.]
f(x) = d/dx F(x)
     = { λ e^(−λx),  if x ≥ 0,
       { 0,          otherwise.

The exponential distribution has only 1 parameter, λ, which is the average rate, i.e., the number of events per time period.
Figure 5.6.3. [Exponential pdf f(t), decaying from its value at t = 0.]
Some examples of the pdfs and cdfs of Exponential random variables with various values of λ are given in Figure 5.6.4 and Figure 5.6.5, respectively.
Figure 5.6.4. [Exponential pdfs f(t) for λ = 0.05, 1, 2, 4.]
Figure 5.6.5. [Exponential cdfs F(t) for λ = 0.05, 1, 2, 4.]
These examples show that no matter what the λ parameter is, the density starts at λ when x = 0 and then quickly moves closer to 0 as x → ∞. The cdf starts at 0 but quickly climbs close to 1 as x → ∞. For larger λ, the pdf and cdf curves are steeper, i.e., when λ is large, the pdf f_X(x) decays rapidly but the cdf F_X(x) shows a rapid increase.

a. E(X) = 1/λ
b. Var(X) = 1/λ²

That is, for the exponential distribution, the mean and standard deviation are equal (both 1/λ).
Example 5.6.4 (Arrival Time of Factory Workers)
The arrival times of workers at a factory first-aid room satisfy a Poisson process with an
average of 1.8 per hour.
1. What is the expectation of the time between two arrivals at the first-aid room?
2. What is the probability that there is at least 1 hour between two arrivals at the first-aid
room?
3. What is the distribution of the number of workers visiting the first-aid room during a
4-hour period?
4. What is the probability that at least four workers visit the first-aid room during a
4-hour period?
Solution:
Let X be the time between 2 arrivals; then X ~ exp(λ) with λ = 1.8.
1. E(X) = 1/1.8 = 0.5556 hours.
2. P(X > 1) = e^(−1.8×1) = 0.1653.
3. The number of workers Y visiting the first-aid room during a 4-hour period is Poisson with parameter λt = 1.8 × 4 = 7.2.
4.
P(Y ≥ 4) = 1 − P(Y < 4)
         = 1 − P(Y = 0) − P(Y = 1) − P(Y = 2) − P(Y = 3)
         = 1 − e^(−7.2) 7.2^0/0! − e^(−7.2) 7.2^1/1! − e^(−7.2) 7.2^2/2! − e^(−7.2) 7.2^3/3!
         = 0.9281
Example 5.6.4
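All four answers follow from a few lines of Python (standard library only):

    from math import exp, factorial

    lam = 1.8                 # arrivals per hour
    print(1 / lam)            # 1. mean gap ≈ 0.5556 hours
    print(exp(-lam * 1))      # 2. P(gap > 1 hour) ≈ 0.1653
    mu = lam * 4              # 3. Poisson mean over 4 hours: 7.2
    print(1 - sum(exp(-mu) * mu**k / factorial(k) for k in range(4)))  # 4. ≈ 0.9281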
P(T > t1 + t2 | T > t1) = P(T > t2), for all t1, t2 ≥ 0

• From the point of view of waiting time, the memoryless property means that it does not matter how long you have waited so far. If you have waited for at least t1 time, the distribution of the waiting time (from time t1) for a further t2 is the same as when you started at time zero.

A memoryless wait for a bus would mean that the probability that a bus arrives in the next minute is the same whether you just got to the station or you have been sitting there for twenty minutes already.
Here we need conditional probability, which in the current scenario can be computed by using the Memoryless Property.
Example 5.6.6
F(x) = { 0             x ≤ −1,
       { (x + 1)/4     −1 < x ≤ 1
       { x/2           1 < x ≤ 2
       { 1             x > 2
2. Ninety identical electrical circuits are monitored at an extreme temperature to see how
long they last before failing. The 50th failure occurs after 263 minutes. If the failure
times are modeled with an exponential distribution,
a. when would you predict that the 80th failure will occur?
b. At what time will only 5% of the circuits fail?
3. In a study of the bone disease osteoporosis, heights of 351 elderly women were measured.
Suppose that their heights follow a normal distribution with µ = 160cm, but unknown
σ. Suppose that 2.27% of those women are taller than 170 cm, what is the standard
deviation? For a random sample of ten women, what is the probability that exactly
two will be shorter than 155cm?
5. The operator of a pumping station has observed that demand for water during early
afternoon hours has an approximately exponential distribution with mean 100 cfs (cubic
feet per second). Find the probability that the
a. demand will exceed 200 cfs during the early afternoon on a randomly selected day.
b. demand will exceed 200 cfs on a given day, given that previous demand was at
least 150 cfs?
What water-pumping capacity should the station maintain during early afternoons so
that the probability that demand will exceed capacity on a randomly selected day is
only .01?
6. Five students are waiting to talk to the TA when office hours begin. The TA talks
to the students one at a time, starting with the first student and ending with the
fifth student, with no breaks between students. Suppose that the time taken by the
7. A weather forecaster predicts that the May rainfall in a local area will be between three
and six inches but has no idea where within the interval the amount will be. Let X be
the amount of May rainfall in the local area. What is the probability that May rainfall
will be at least four inches? At most five inches? Explicitly specify the distribution
involved and the parameters from the scenario.
8. A student waits for a bus. Let X be the number of hours that the student waits.
Assume that the waiting time is Exponential with average 20 minutes.
a. What is the probability that the student waits more than 30 minutes?
b. What is the probability that the student waits more than 45 minutes (total), given
that she has already waited for 20 minutes?
c. Given that someone waits less than 45 minutes, what is the probability that they
waited less than 20 minutes?
d. What is the standard deviation of the student’s waiting time?
§§ Answers
1. f(x) = { 1/4    −1 < x ≤ 1
          { 1/2    1 < x ≤ 2
          { 0      otherwise

3/4; 37/48
2. 732.4; 16.6537
3. σ = 5; 0.2844
4. µ = 6.953 ounce
6. 0.132
7. 0.667
8. a. 0.2231
b. 0.2865
c. 0.7066
d. 20 minutes
Chapter 6
Limit Theorems
AS YOU READ . . .
Laws of Large Numbers: For large n, the average of a large number of i.i.d. (independent and identically distributed) random variables converges to the expected value.
Central Limit Theorems: Determining conditions under which the sum of a large num-
ber of random variables has an approximately normal distribution.
quantifying the fact that a random variable is ‘relatively close’ to its expected value ‘most
of the time’. It gives bounds that quantify both ‘how close’ and ‘how much of the time’ the
random variable is to its expected value.
Compare this inequality with the Empirical Rule, which also gives bounds for probabilities for k = 1, 2, 3. The difference between the two is that Chebyshev's Inequality is applicable when the distribution is not known. Figure 6.1.1 shows 1 − 1/k² as the shaded area that falls between µ ± kσ.
Figure 6.1.1. [Chebyshev's Inequality: a relative frequency of at least 1 − 1/k² lies between µ − kσ and µ + kσ.]
a. For k = 2, at least 1 − 1/2² = 0.75 of observations fall within k = 2 standard deviations of the mean.
b. For k = 3, at least 1 − 1/3² ≈ 0.89 of observations fall within k = 3 standard deviations of the mean.
Example 6.1.2
1. The number of customers per day (Y) at a sales counter, has been observed for a long
period of time and found to have mean 20 and standard deviation 2. The probability
distribution of Y is not known. What can be said about the probability that, tomorrow
Y will be greater than 16 but less than 24?
2. A mail-order computer business has six telephone lines. Let X denote the number of
lines in use at a specified time. Compute µ and σ for the distribution below. Using
k = 2, 3, what does Chebyshev Inequality suggest about the upper bound relative to
the corresponding probability? Interpret.
x      0     1     2     3     4     5     6
p(x)   0.10  0.15  0.20  0.25  0.20  0.06  0.04
Solution:
1. µ = 20; σ = 2. The interval (16, 24) is µ ± 2σ, so with k = 2, Chebyshev's Inequality gives P(16 < Y < 24) ≥ 1 − 1/2² = 0.75.
2. µ = 2.64; σ = 1.53961.
Example 6.1.2
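Chebyshev's bound can also be checked empirically by simulation (a sketch; the pmf is the one from part 2, and 100,000 draws is an arbitrary choice):

    import random

    xs = [0, 1, 2, 3, 4, 5, 6]
    ps = [0.10, 0.15, 0.20, 0.25, 0.20, 0.06, 0.04]
    mu = sum(x * p for x, p in zip(xs, ps))                       # 2.64
    sd = (sum(x * x * p for x, p in zip(xs, ps)) - mu**2) ** 0.5  # ≈ 1.5396

    k = 2
    sample = random.choices(xs, weights=ps, k=100_000)
    inside = sum(abs(x - mu) < k * sd for x in sample) / len(sample)
    print(inside, ">=", 1 - 1 / k**2)   # observed fraction vs the bound 0.75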
1. the electricity consumption in a city at any given time that is the sum of the demands
of a large number of individual consumers
In these examples, the interest is to model the sum of either demands or quantity of water
as a sum of individual contributions or the measurement error as the sum of unobservable
small errors. What will be the distribution of the sum in these examples?
Let X be a random variable with finite mean µ and finite variance σ². Suppose you repeatedly draw independent samples of size n from the distribution of X. Then as n → ∞, the distribution of the sample total Σ_{i=1}^{n} X_i = X_1 + X_2 + ⋯ + X_n becomes approximately normal, i.e.,

Σ_{i=1}^{n} X_i ≈ N(nµ, nσ²), while Z = (Σ_{i=1}^{n} X_i − nµ)/√(nσ²) ≈ N(0, 1)

√(nσ²) is called the standard error of the total.

In other words,

lim_{n→∞} P((Σ_{i=1}^{n} X_i − nµ)/(√n σ) ≤ a) = ∫_{−∞}^{a} e^(−z²/2)/√(2π) dz = F(a)
This theorem basically says that sums of n independent random variables (of any type) are distributed similarly to a Normal random variable when n is large. The CLT is more effective when n is larger. The next examples show some applications of the CLT.
Example 6.2.2
2. Consider the volumes of soda remaining in 100 cans of soda that are nearly empty. Let
X1 , . . . , X100 , denote the volumes (in ounces) of cans one through one hundred, respec-
tively. Suppose that the volumes X j are independent, and that each X j is Uniformly
distributed between 0 and 2. Find the probability that the 100 cans of soda contain
less than 90 ounces of soda in total.
Solution:
1.
P(175 ≤ T ≤ 190) = P((175 − 50(4))/√(50(1.5²)) < (ΣX_i − nµ)/√(nσ²) < (190 − 50(4))/√(50(1.5²)))
                 = F(−0.94) − F(−2.36)
                 = 0.1645

In other words, there is a 16.45% chance that the total amount of impurity in 50 batches is between 175 and 190 g.
2.
P(T ≤ 90) = P((ΣX_i − nµ)/√(nσ²) ≤ (90 − 100(1))/√(100(1/3)))
          = P(Z ≤ −1.73)
          = 1 − 0.9582
          = 0.0418

In other words, the probability that the total volume of soda in 100 cans of soda is less than 90 ounces is approximately 4.18%.
Example 6.2.2
Suppose random samples of size n are drawn from a population repeatedly and the sample mean x̄ is computed each time. Figure 6.2.1 shows the main idea of the Central Limit Theorem using a hypothetical example. How do all the sample means generated this way behave as this process continues indefinitely?
Figure 6.2.1. [Repeated sampling from a population: each sample of size n yields a sample mean x̄.]
6.2.3 §§ Simulations
Repeatedly sampling from a population using a specific sampling plan, we can assess the
performance of the resulting sample means.
Population data were generated from a Normal population with µ = 60; σ = 1. Figure 6.2.2
displays the distribution of 10,000 such data points simulated from a N (µ = 60; σ = 1).
Figure 6.2.2. [Histogram of the simulated population of heights, centered at µ = 60.]
A random sample of size n = 30 was drawn from the population data simulated and
sample mean x̄ was computed. This procedure was repeated 100,000 times. The distribution
of 100,000 sample means x̄ is displayed in Figure 6.2.3.
Figure 6.2.3. [Histogram of 100,000 sample means x̄ (n = 30), centered at µ = 60 and much narrower than the population.]
Both distributions are centered at µ = 60. The variability of the 2 distributions needs special attention: the distribution of 100,000 sample means is much narrower, showing much less variability.
Population data were generated from a Uniform population with U (0, 1). Figure 6.2.4 dis-
plays the 10,000 such data points simulated from a U (0, 1).
Figure 6.2.4. [Histogram of 10,000 data points simulated from U(0, 1): roughly flat.]
A random sample of size n = 30 was drawn from the population data simulated &
sample mean x̄ was computed. This procedure was repeated 100,000 times. The distribution
of 100,000 sample means x̄ is displayed in Figure 6.2.5.
Figure 6.2.5. [Histogram of 100,000 sample means x̄ (n = 30) from U(0, 1): approximately normal, centered at 0.5.]
Both distributions are centered at µ = 0.5. This shows that regardless of the parent population, the distribution of the sample means is approximately normal with a mean of µ = 0.5. Again the same phenomenon is observed in the variability of the 2 distributions: the distribution of 100,000 sample means is much narrower, showing much less variability.
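The simulation just described takes only a few lines (a sketch of one possible implementation, using only the standard library):

    import random
    import statistics

    random.seed(1)
    means = [statistics.fmean(random.random() for _ in range(30))
             for _ in range(100_000)]
    print(statistics.fmean(means))   # ≈ 0.5, the U(0, 1) population mean
    print(statistics.stdev(means))   # ≈ sqrt(1/12)/sqrt(30) ≈ 0.0527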
Definition 6.2.3 (Central Limit Theorem (CLT): Sample Mean X̄).

The Central Limit Theorem basically says that for non-normal data, the distribution of the sample means has an approximate normal distribution, no matter what the distribution of the original data looks like, as long as the sample size is large enough (usually at least 30) and all samples have the same size.

If X_1, X_2, . . . , X_n is a random sample of size n taken from a population (either finite or infinite) with mean µ and finite variance σ², and if x̄ is the sample mean, then

X̄ ~ N(µ, σ²/n) as n → ∞, and Z = (x̄ − µ)/(σ/√n) ~ N(0, 1),

where σ/√n is called the standard error of the mean.
The central limit theorem tells us that for a population with any distribution,
• the distribution of the sample means approaches a normal distribution as the sample
size increases.
• the mean of the sample means equals the mean of the original population, whatever the sample size.
• the distribution of the sample means becomes narrower as the sample size increases,
showing that the standard deviation of the sample means becomes smaller.
Example 6.2.4
A coffee dispensing machine is supposed to dispense a mean of 7.00 fluid ounces of coffee per
cup with a standard deviation of 0.25 fluid ounces. The distribution approximates a normal
distribution. What is the probability that, when 12 cups are dispensed, their mean volume
is more than 7.15 fluid ounces?
Solution:
Let X be the amount of coffee dispensed, X „ N (µ = 7; σ = 0.25); n = 12
X̄ ´ µ 7.15 ´ 7
P( x̄ ą 7.15) = P ? ą ?
σ/ n 0.25/ 12
= P( Z ą 2.08)
= 1 ´ P( Z ă 2.08)
= 1 ´ 0.9811665
= 0.01883
In other words, there is 1.88% chance that the average amount of coffee dispensed exceeds
7.15 ounces.
Example 6.2.4
Example 6.2.5
The fracture strength of tempered glass averages 14 (measured in thousands of pounds per
square inch) and has standard deviation of 2.
a. What is the probability that the average fracture strength of 100 randomly selected
pieces of this glass exceeds 14.5?
b. Find an interval that includes, with probability 0.95, the average fracture strength of
100 randomly selected pieces of this glass.
Solution:
166
Limit Theorems 6.2 Central Limit Theorem (CLT)
a.
X̄ ´ µ 14.5 ´ 14
P( x̄ ą 14.5) = P ? ą ?
σ/ n 2/ 100
= P( Z ą 2.5)
= 1 ´ P( Z ă 2.5)
= 1 ´ 0.9938
= 0.0062
There is 0.6% chance that the average fracture strength of 100 randomly selected pieces
of this glass exceeds 14.5.
b. The central 95% means the area of 5% is divided equally in the 2 tails of the normal
curve. Therefore, P( Z ď z2 ) = 0.95 + 0.05/2 = 0.975 gives the cd f corresponding to
z2 . Looking inside the Normal Distribution table, we find the corresponding z-value as
1.96.
P(z1 ă Z ă z2 ) = 0.95
P(´1.96 ă Z ă 1.96) = 0.95
X̄ ´ µ
Z= ?
σ/ n
X̄2 ´ 14
1.96 = ?
2/ 100
X̄2 = 1.96 ˆ 1/5 + 14
= 14.392
X̄1 = ´1.96 ˆ 1/5 + 14
= 13.608
P(13.608 ă x̄ ă 14.392) = 0.95
There is 95% chance that the average fracture strength of 100 randomly selected pieces
of this glass lies in the interval 13.608 - 14.392.
Example 6.2.5
X ´ np
while Z = a « N (0, 1),
np(1 ´ p)
167
Limit Theorems 6.2 Central Limit Theorem (CLT)
i.e., the binomial distribution approaches to normal for large n. This phenomenon is shown
in simualted data distribution from a binomial with n = 40; p = 0.2 in Figure 6.2.6. The
binomial distribution has become evidently symmetrical.
Figure 6.2.6.
n=40, p=0.2
0.15
0.10
0.05
0.00
0 3 6 9 12 16 20 24 28 32 36 40
168
Limit Theorems 6.2 Central Limit Theorem (CLT)
To use the normal distribution to approximate the probability of obtaining exactly 12 (i.e.,
P( X = 12)), we would find the area under the normal curve from X = 11.5 to X = 12.5, the
lower and upper boundaries of 12, (see Figure 6.2.7). The small correction of 0.5 is used to
allow for the fact of using normal distribution to approximate binomial probabilities.
Figure 6.2.7.
0.20
0.15
0.10
0.05
0.00
0 2 4 6 8 10 13 16 19 22 25
Continuity Correction
§§ Continuity Correction
To find the binomial probabilities in the left hand side of the expressions, calculate the
approximation using normal distribution as shown in the right hand side of expressions given
169
Limit Theorems 6.2 Central Limit Theorem (CLT)
below .
Binomial « Normal
1 1
P( X = k) « P(k ´ ď X ď k + )
2 2
1 1
P( a ď X ď b) « P( a ´ ď X ď b + )
2 2
1
P( X ą k) « P( X ą k + )
2
1
P( X ě k) « P( X ą k ´ )
2
1
P( X ă k) « P( X ă k ´ )
2
1
P( X ď k) « P( X ă k + )
2
Caution: Continuity correction is only used for applications of the Central Limit Theorem
to discrete random variables. Continuity correction is not needed when applying the Central
Limit Theorem to sums of continuous random variables.
Example 6.2.6
At a certain local restaurant, students are known to prefer Japanese pan noodles 40% of the
time. Consider 2000 randomly chosen students, what is the probability that at most 840 of
the students eat Japanese pan noodles there?
Solution:
p = 0.4; n = 2000; np = 2000(0.4) = 800; n(1 ´ p) = 2000(1 ´ 0.4) = 1200; np(1 ´ p) =
480. As both np & n(1 ´ p) ă 5, so normal approximation to binomial is appropriate here.
P( X ď 840) = P X ď 840 + 0.5
840.5 ´ 1200
= P( Z ď ? )
480
= P( Z ď 1.85)
= 0.9677
Probability that at most 840 students eat Japanese pan noodles is 0.9677.
Example 6.2.6
A Poisson random variable with large parameter λ will be distributed like a Normal random
variable.
170
Limit Theorems 6.3 Home Work
2. The service times for customers coming through a checkout counter in a retail store are
independent random variables with mean 1.5 minutes and variance 1.0. Approximate
the probability that 100 customers can be served in less than 2 hours of total service
time.
3. The quality of computer disks is measured by the number of missing pulses. Brand
X is such that 80% of the disks have no missing pulses. If 100 disks of brand X are
inspected, what is the probability that 15 or more contain missing pulses?
4. Consider the lengths of calls handled by Zahir in a call center. The calls are indepen-
dent Exponential random variables, and each call lasts, on average, 1/3 of an hour.
On a particular day, Zahir records the lengths of 24 consecutive calls. What is the
approximate probability that the average of these 24 calls exceeds 1/4 of an hour?
5. At an auction, exactly 282 people place requests for an item. The bids are placed
‘blindly,’ which means that they are placed independently, without knowledge of the
actions of any other bidders. Assume that each bid (measured in dollars) is a continuous
random variable with a mean of $14.9 and a standard deviation of $2.54. Find the
probability that the sum of all the bids exceeds $4150.
6. A machine is shut down for repairs if a random sample of 100 items selected from the
daily output of the machine reveals at least 15% defectives. (Assume that the daily
output is a large number of items.) If on a given day the machine is producing only
10% defective items, what is the probability that it will be shut down?
7. An electronics company manufactures resistors that have a mean resistance of 100 ohms
and a standard deviation of 10 ohms. The distribution of resistance is normal. Find the
probability that a random sample of n = 25 resistors will have an average resistance
less than 95 ohms.
8. PVC pipe is manufactured with a mean diameter of 1.01 inch and a standard deviation
of 0.003 inch. Find the probability that a random sample of n = 9 sections of pipe will
have a sample mean diameter greater than 1.009 inch and less than 1.012 inch.
§§ Answers
1. 0.0021
2. 0.0013
3. 0.9162
171
Limit Theorems 6.3 Home Work
4. 0.8888
5. 0.8869
6. 0.0668
7. 0.0062
8. 0.8186
172
Chapter 7
Joint Distributions
AS YOU READ . . .
1. What is a Joint Distribution?
2. How do you model the joint chance behavior of more than one random variable?
3. What are marginal distributions?
4. What is convolution? How is it useful to find the distribution of sums of independent
random variables?
173
Joint Distributions 7.2 Joint Distributions: Discrete case
What is the probability that X takes on a particular value x, and Y takes on a particular
value y? i.e., what is P( X = x, Y = y)?
The entries in the cells of the Table 7.1 show the joint probabilities associated with X and
Y.
x/y 1 2 3 4 5 6 PX ( x )
1 1/36 1/36 1/36 1/36 1/36 1/36 1/6
2 1/36 1/36 1/36 1/36 1/36 1/36 1/6
3 1/36 1/36 1/36 1/36 1/36 1/36 1/6
4 1/36 1/36 1/36 1/36 1/36 1/36 1/6
5 1/36 1/36 1/36 1/36 1/36 1/36 1/6
6 1/36 1/36 1/36 1/36 1/36 1/36 1/6
PY (y) 1/6 1/6 1/6 1/6 1/6 1/6
Table 7.1
Example 7.1.1
The joint probability mass function of a pair of discrete random variables X and Y
is:
PX,Y ( x, y) = P( X = x & Y = y)
Properties:
1. 0 ď p X,Y ( x, y) ď 1 @ x, y
ÿÿ
2. PX,Y ( x, y) = 1
x y
The joint probability mass function for the roll of 2 dice is shown in Figure 7.2.1. A
nonzero probability is assigned to a point ( x, y) in the plane if and only if x = 1, 2, . . . , 6
and y = 1, 2, . . . , 6. Thus, exactly 36 points in the plane are assigned nonzero probabilities
of 1/36. Further, the probabilities are assigned in such a way that the sum of the nonzero
probabilities is equal to 1.
174
Joint Distributions 7.2 Joint Distributions: Discrete case
Figure 7.2.1.
P(X,Y)
6 6
5 5
4 4
3 3
Die 2 Die 1
2 2
1 1
If we are given a joint probability distribution for X and Y, we can obtain the
individual probability distribution for X or for Y.
2. The probability mass function of Y alone, called the marginal probability mass
function of Y, is defined by:
ÿ
PY (y) = p(y) = P( x, y)
x
175
Joint Distributions 7.2 Joint Distributions: Discrete case
FX,Y ( a, b) = P( X ď a & Y ď b)
The joint cd f for 2 dice rolls is given in Table 7.2 below. The Table entries can be filled in
by cumulating the probabilities in Table 7.1 from the lower end to a certain value of X and
Y
Fxy 1 2 3 4 5 6
1 1/36 2/36 . . . .
2 2/36 4/36 . . . .
3 3/36 6/36 . . . .
4 . . . . . .
5 . . . . . .
6 . . . . . 36/36
Table 7.2
Example 7.2.3
176
Joint Distributions 7.2 Joint Distributions: Discrete case
Figure 7.2.2.
F(X,Y)
6 6
5 5
4 4
3 3
Die 2 Die 1
2 2
1 1
The joint Cumulative Distribution Function for roll of 2 dice is shown in Figure 7.2.2.
A nonzero cumulative probability is assigned to a point ( x, y) in the plane if and only if
x = 1, 2, . . . , 6 and y = 1, 2, . . . , 6. These cumulative probabilities are increasing functions of
x, y and approach to maximum value of 1.
177
Joint Distributions 7.2 Joint Distributions: Discrete case
FX,Y ( a, b) = P( X ď a & Y ď b)
1. 0 ď FX,Y ( a, b) ď 1 @ a, b
2. lim FXY ( a, b) = 0
aÑ´8,bÑ´8
3. lim FXY ( a, b) = 1
aÑ8,bÑ8
178
Joint Distributions 7.2 Joint Distributions: Discrete case
Example 7.2.5
x/y 1 2 3 4 5 6 PX ( x )
1 1/36 1/36 1/36 1/36 1/36 1/36 1/6
2 1/36 1/36 1/36 1/36 1/36 1/36 1/6
3 1/36 1/36 1/36 1/36 1/36 1/36 1/6
4 1/36 1/36 1/36 1/36 1/36 1/36 1/6
5 1/36 1/36 1/36 1/36 1/36 1/36 1/6
6 1/36 1/36 1/36 1/36 1/36 1/36 1/6
PY (y) 1/6 1/6 1/6 1/6 1/6 1/6
179
Joint Distributions 7.2 Joint Distributions: Discrete case
Table 7.3
The condition for independence, i.e., p X,Y ( x, y) = p X ( x ) ¨ pY (y) @ x& y does not hold
true, (see Table 7.3). So X and Y are not independent. There is association between Gender
and CHD.
Example 7.2.8
180
Joint Distributions 7.3 Joint Distributions: Continuous Case
When X and Y are continuous random variables, the joint density function f ( x, y)
describes the likelihood that the pair ( X, Y ) belongs to the neighborhood of the
point ( x, y). The joint pd f of X, Y „ U (0, 1) is visualized as a surface lying above
the xy plane (see Figure 7.3.1).
Properties:
f X,Y ( x, y) ě 0
2.
ż8 ż8
f ( x, y)dx ¨ dy = 1
´8 ´8
3. The joint density can be integrated to get probabilities, i.e., if A and B are
sets of real numbers, then
ż ż
PXPA,YPB = f ( x, y)dx ¨ dy
B A
181
Joint Distributions 7.3 Joint Distributions: Continuous Case
Figure 7.3.1.
f(x,y)
0.8 0.8
0.6 0.6
y 0.4 0.4 x
0.2 0.2
If X and Y are continuous random variables with joint probability density function
f XY ( x, y), then the marginal density functions for X can be retrieved by integrating
over all y1 s: ż
f X (x) = f ( x, y)dy.
y
Similarly, the marginal density functions for Y can be retrieved by integrating over
all x1 s: ż
f Y (y) = f ( x, y)dx.
x
Example 7.3.3
Consider the joint pd f for X and Y:
12 2
f ( x, y) = ( x + xy) for 0 ď X ď 1; 0 ď Y ď 1
7
= 0 elsewhere
182
Joint Distributions 7.3 Joint Distributions: Continuous Case
ż1
12 2
fY (y) = ( x + xy)dx
0 7
1
12 3
= ( x /3 + x y/2)
2
7 0
1
= (4 + 6y)
7
Example 7.3.3
183
Joint Distributions 7.3 Joint Distributions: Continuous Case
Let X, Y be jointly continuous random variables with joint density f (X,Y ) ( x, y), then
the joint cumulative distribution function F ( a, b) is defined as:
żb ża
P( X ď a and Y ď b) = F ( a, b) = f ( x, y)dx.dy
´8 ´8
Properties:
1. 0 ď FX,Y ( a, b) ď 1 @ a, b
2. lim FXY ( a, b) = 0
a,bÑ´8
3. lim FXY ( a, b) = 1
a,bÑ8
B2
f ( x, y) = F ( x, y)
Bx ¨ By
Figure 7.3.2 shows the joint cd f of X, Y „ U (0, 1). The probability F ( x, y) corresponds
to the volume under f ( x, y) = 1, which is shaded. F ( x, y) is an increasing function of X, Y
that is also evident from the shaded part.
184
Joint Distributions 7.3 Joint Distributions: Continuous Case
Figure 7.3.2.
F(x,y)
0.8 0.8
0.6 0.6
y 0.4 0.4 x
0.2 0.2
Solution:
1.
B2 1
f ( x, y) = xy( x + y); for 0 ď x ď 2 & 0 ď y ď 2
Bx ¨ By 16
1 B 2
= ( x + 2xy); for 0 ď x ď 2 & 0 ď y ď 2
16 Bx
1
= (2x + 2y); for 0 ď x ď 2 & 0 ď y ď 2
16
1
= ( x + y); for 0 ď x ď 2 & 0 ď y ď 2
8
185
Joint Distributions 7.3 Joint Distributions: Continuous Case
1
f ( x, y) = ( x + y) for 0 ď X ď 2; 0 ď Y ď 2
8
= 0 elsewhere
2.
ż2
1
f X (x) = ( x + y)dy
0 8
1
= ( x + 1)
4
ż2
1
fY (y) = ( x + y)dx
0 8
1
= ( y + 1)
4
1
f (y) = (y + 1) for 0 ď Y ď 2
4
= 0 elsewhere
Example 7.3.5
Example 7.3.6
Consider the joint pd f for X and Y:
12 2
f ( x, y) = ( x + xy) for 0 ď X ď 1; 0 ď Y ď 1
7
= 0 elsewhere
Find the joint cd f of X and Y
Solution:
żyżx
12 2
FX,Y ( x, y) = ( x + xy)dxdy
0 0 7
x
12 y 3
ż
= ( x /3 + x y/2) dy
2
7 0
y0
12
= ( x3 y/3 + x2 y2 /4)
7 0
1 2
= x y(4x + 3y)
7
186
Joint Distributions 7.3 Joint Distributions: Continuous Case
Example 7.3.6
Let X, Y be jointly continuous random variables with joint density f (X,Y ) ( x, y) and
marginal densities f X ( x ), f Y (y). We say that X and Y are independent if
Example 7.3.8
For the Example 7.3.3
12 2
f ( x, y) = ( x + xy) for 0 ď X ď 1; 0 ď Y ď 1
7
= 0 elsewhere
12 2 6
f X (x) = x + x
7 7
1
f Y (y) = (4 + 6y)
7
187
Joint Distributions 7.4 Convolution
7.4 IJ Convolution
Engineers face a typical problem to determine the pd f of the sum of two random variables X
and Y, i.e., X + Y. This is a common problem as they have to evaluate the average of many
random variables, e.g., the sample mean of a collection of data points.
Example 7.4.1
Let S be the sum that appears on the roll of 2 dice, i.e., S = X1 + X2 ;
P(S = 2) = 1/36
P(S = 3) = P( X1 = 1, X2 = 2) + P( X1 = 2, X2 = 1)
= P ( X1 = 1 ) P ( X2 = 2 ) + P ( X1 = 2 ) P ( X2 = 1 )
= 2/36
Example 7.4.1
Figure 7.4.1.
188
Joint Distributions 7.4 Convolution
Suppose that X and Y are independent, integer valued random variables having
probability mass functions Px and Py , then Z = X + Y is also an integer-valued
random variable with probability mass function. Using the Law of Total Probability,
and independence
PX +Y (z) = P( X + Y = z)
ÿ
= P( X = k, Y = z ´ k). Law of Total Probability
k
ÿ
= P( X = k ) ¨ P(Y = z ´ k). Independence
k
ÿ
= PX (k ) PY (z ´ k )
k
189
Joint Distributions 7.4 Convolution
Figure 7.4.2.
Rolling of 3 Dice
Example 7.4.4
Let X1 and X2 be the outcomes from the 2 dice roll, and let S2 = X1 + X2 be the sum of
these outcomes & S3 = X1 + X2 + X3 the sample space for 3 dice rolls. The distribution
for S3 would then be the convolution of the distribution for S2 with the distribution for X3 .
Find the distribution of S3 = 7.
Solution:
P ( S3 = 7 ) = P ( S2 = 6 ) P ( X3 = 1 )
+ P ( S2 = 5 ) P ( X3 = 2 )
+ P ( S2 = 4 ) P ( X3 = 3 )
+ P ( S2 = 3 ) P ( X3 = 4 )
+ P ( S2 = 2 ) P ( X3 = 5 )
= 5/36 + 1/6
+ 4/36 + 1/6
+ 3/36 + 1/6
+ 2/36 + 1/6
+ 1/36 + 1/6
= 15/216
190
Joint Distributions 7.4 Convolution
Example 7.4.4
n
ÿ
1. Let Xi „ Bernoulli ( p); i = 1, . . . , n. Then the distribution of Xi „ Bin(n; p).
i
3. Let X „ Poi (λ) and Y „ Poi (µ) are independent. Then the distribution of X + Y, i.e.,
X + Y „ Poi (λ + µ).
Example 7.4.5
Suppose that X and Y are independent, continuous random variables having proba-
bility density functions f x and f y . Then the density of their sum is the convolution
of their densities, i.e., let sum Z = X + Y is a continuous random variable with
density
ż8
f X +Y ( z ) = f X (z ´ y) f Y (y)dy
´8
ż8
= f X ( x ) f Y (z ´ x )dx
´8
191
Joint Distributions 7.4 Convolution
ż8
f Z (z) = f X ( x ) f Y (z ´ x )dx
´8
ż8
Finding f X ( x ) f Y (z ´ x )dx comes to the same as finding the area of set: t( x, y)(0, 1) ˆ
´8
(0, 1)|x + y ď zu. As f X ( x ) = 1 if 0 ď x ď 1 and 0 otherwise
ż1
f Z (z) = f Y (z ´ x )dx
0
żz
1. When 0 ă z ă 1, the limits run from x = 0 to x = z, so f Z (z) = 1 ¨ dx = z.
0
ż1
2. When 1 ă z ă 2, the limits run from x = z ´ 1 to x = 1, so f Z (z) = 1 ¨ dx = 2 ´ z.
z´1
$
& z if 0 ď z ď 1,
f Z (z) = 2 ´ z if 1 ă z ď 2
0 otherwise
%
Convolution of two rectangle functions will give you a triangle function. (See the pdf of Z in
Figure 7.4.4).
Example 7.4.7
192
Joint Distributions 7.4 Convolution
Figure 7.4.3.
193
Joint Distributions 7.4 Convolution
Figure 7.4.4.
1.0
0.8
0.6
f(z)
0.4
0.2
0.0
n
ÿ n
ÿ n
ÿ
Xi „ N µi , σi2
i =1 i =1 i =1
Example 7.4.8
194
Joint Distributions 7.4 Convolution
f X +Y ( z ) = f ( X + Y = z )
ż8
= f X ( x ) f Y (z ´ x )dx
´8
żz
= λe´λx λe´λ(z´x) dx
0
= λ2 e´λz z
This density is called the Gamma(2, λ) density. The convolution of n = 2 i.i.d.2 Exponential
distributions results in the Gamma(n = 2, λ) density.
Example 7.4.9
195
Joint Distributions 7.5 Home Work
3. Consider the following joint pm f , f (0, 0) = 1/12; f (1, 0) = 5/12; f (0, 1) = f (1, 1) =
3/12; f ( x, y) = 0 for all other values. Find the marginal distributions of X and Y
respectively.
4. Suppose that a radioactive particle is randomly located in a square with sides of unit
length. Let X and Y denote the coordinates of the particle’s location. A reasonable
model for the relative frequency histogram for X and Y is the bivariate analogue of the
univariate uniform density function:
f ( x, y) = 1 for 0 ď X ď 1; 0 ď Y ď 1
= 0 elsewhere
5. Suppose that two continuous random variables X and Y have a joint probability density
function
196
Joint Distributions 7.5 Home Work
a. Find A.
b. Construct the marginal probability density functions f X ( x ) and f Y (y).
c. Are the random variables X and Y independent?
6. A certain process for producing an industrial chemical yeilds a product that contains
two main types of impurities. Let X denote the proportion of impurities of Type I and
Y denote the proportion of impurities of Type II. Suppose that the joint density of X
and Y can be modelled as,
§§ Answers
1. 256/512; 494/512
4. 0.08; 0.10
1 2(3 ´ x ) y
5. A = ´ ; f X (x) = ; for ´ 2 ď x ď 3; f Y (y) = ; for 4 ď y ď 6; The
125 25 10
random variables X and Y are independent.
6. 0.225
197
Joint Distributions 7.5 Home Work
198
Chapter 8
Properties of Expectation
AS YOU READ . . .
199
Properties of Expectation
8.2 Jointly Distributed Variables: Expectation for Continuous Case
If X and Y are jointly distributed continuous random variables, then initially cal-
culate the marginal pd f of X defined as:
ż
f X ( x ) = f XY ( x, y)dy
y
§§ Expectation: Properties
P( a ď X ď b) = 1, then
a ď E[ X ] ď b
A fundamental property of the expectation operator is that it is linear. If X and Y are jointly
distributed random variables and a, b are real numbers, then
E[ aX + bY ] = aE[ X ] + bE[Y ]
200
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable
Example 8.3.2
Example 8.3.3 Expectation and Variance of X
Let X1 , . . . , Xn be i.i.d.1 random variables having distribution function F and expected value
201
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable
n
2
ÿ Xi
µ and variance σ . Let X =
n
i =1
ÿ
n
Xi
E( X ) = E
n
i =1
ÿ n
1
= E Xi
n
i =1
n
1ÿ
= E Xi
n
i =1
1
= nµ
n
=µ
ÿ
n
Xi
Var ( X ) = Var
n
i =1
2 ÿ n
1
= ¨ Var Xi
n
i =1
2 ÿ n
1
= ¨ Var Xi
n
i =1
2
1
= ¨ nσ2
n
σ2
=
n
The same results were also displayed in the Central Limit Theorem (see Figure 6.2.3 and
Figure 6.2.5). The reason for the much smaller variability is now mathematically evident.
The variance of the distribution of the sample means x̄ is scaled down by a factor of size n,
the sample size.
Example 8.3.3
202
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable
Suppose that X and Y are random variables with joint probability mass function
PXY and marginal probability mass functions PX and PY . Then E[ X + Y ] is given
by
ÿÿ
E[ X + Y ] = ( x + y) P( x, y)
x y
= E( x ) + E(y)
203
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable
Another result is also evident that as X and Y are indicator random variables, so their
expectation is equal to their respective marginal probability at X = 1 or at Y = 1.
Example 8.3.5
Suppose that X and Y are random variables and g( x, y) and h( x, y) are some func-
tions of the two variables, then:
E g( X, Y ) ˘ h( X, Y ) = E g( X, Y ) ˘ E h( X, Y )
E g( X, Y ) + h( X, Y ) = E g( X, Y ) + E h( X, Y )
Solution:
Let g( X, Y ) = X + Y and h( X, Y ) = X ´ Y
From Example 8.3.5, E g( X, Y ) = 274/512
204
Properties of Expectation 8.3 Some Function of Jointly Distributed Random Variable
ÿÿ
E h( X, Y ) = h( x, y) P( x, y)
y x
ÿÿ
= ( x ´ y) ¨ P( x, y)
y x
= (0 ´ 0) ˆ 240/512 + (0 ´ 1) ˆ 254/512
+ (1 ´ 0) ˆ 16/512 + (1 ´ 1) ˆ 2/512
= 0 ´ 254/512 + 16/512 + 0
= ´238/512
E g( X, Y ) + h( X, Y ) = E ( X + Y ) + ( X ´ Y )
= E 2X
= 2E X
= 2 ˆ 18/512
= 18/256
E g( X, Y ) + E h( X, Y ) = E ( X + Y ) + E ( X ´ Y )
= 274/512 + (´238/512)
= 18/256
6 E g( X, Y ) + h( X, Y ) = E g( X, Y ) + E h( X, Y )
Example 8.3.7
If two random variables are independent, then the expectation of the product factors
into a product of expectations, i.e.,
E g ( X ) h (Y ) = E g ( X ) ¨ E h (Y )
In particular,
E( XY ) = E( X ) ¨ E(Y )
205
Properties of Expectation 8.4 Conditional Distribution
1. PX|y ( x ) ě 0
ÿ
2. PX|y ( x ) = 1
x
3. PX|y ( x ) = P( X = x|Y = y)
206
Properties of Expectation 8.4 Conditional Distribution
207
Properties of Expectation 8.4 Conditional Distribution
(c).
ÿ
E X|Y = 1 = xP( X|Y = 1)
x
= 0 ˆ 127/128 + 1 ˆ 1/128
= 1/128
Example 8.4.3
1. f X|y ( x ) ě 0
ż
2. f X|y ( x|y)dx = 1
x
3. f X|y ( x ) = f ( X = x|Y = y)
208
Properties of Expectation 8.4 Conditional Distribution
Example 8.4.6
The joint cd f of X and Y
1
F ( x, y) = xy( x + y); for 0 ď x ď 2 & 0 ď y ď 2
16
Find the conditional expectation E(Y|X ).
Solution:
The joint pd f , marginal pd f of X and Y were computed in Example 7.3.5.
#
1
( x + y) for 0 ď X ď 2; 0 ď Y ď 2
f ( x, y) = 8
0 elsewhere
#
1
( x + 1) for 0 ď X ď 2
f (x) = 4
0 elsewhere
#
1
(y + 1) for 0 ď Y ď 2
f (y) = 4
0 elsewhere
1
8 ( x + y)
f (Y|X ) = 1
4 ( x + 1)
x+y
= 1/2
x+1
ż2
x+y
E(Y|X ) = 1/2 y dy
0 x+1
2
1
= ( xy /2 + y /3)
2 3
2( x + 1) 0
x + 4/3
=
x+1
The conditional mean E(Y|X ) is a function of X.
Example 8.4.6
209
Properties of Expectation 8.4 Conditional Distribution
Suppose that X and Y are random variables and g( x, y) is a function of two variables,
then the conditional expectation of g( X|Y ) is:
• Discrete Case: ÿ
E g( X )|Y = y = g( x ) PX|Y ( x|y)
x
• Continuous Case:
ż
E g( X )|Y = y = g( x ) f X|Y ( x|y)dx
x
=E X
210
Properties of Expectation 8.4 Conditional Distribution
As E[ X|Y ] is a random variable, its expectation can be computed using the law of iterated
expectations as:
ÿ
E E[ X|Y ] = E[ X|Y ] P(Y = y)
y
= 1/16 ˆ 1/2 + 1/128 ˆ 1/2
= 9/256
= E( X )
Example 8.4.9
We need to find the conditional expectation of average income given degree status.
E E(Y|X ) = E Y|X = 0 P( X = 0) + E Y|X = 1 P( X = 1)
= 100000 ˆ 0.6 + 200000 ˆ 0.4
= 60, 000 + 80, 000
= 140, 000
211
Properties of Expectation 8.5 Covariance
8.5 IJ Covariance
The covariance matrix of a random vector captures the interaction between the components
of the vector. The diagonal entries contain the variance of each variable and the covariances
between the different variables are placed in the off diagonals.
Example 8.5.1 (Iris Data Set: Covariance Matrix)
The covariance matrix between Petal Length and Petal Width in Fisher’s Iris dataset is given
below:
" #
σx2 σxy
Σ=
σxy σy2
0.68 0.29
=
0.29 0.18
Figure 8.5.1 shows the scatter plot matrix for Fisher’s Iris dataset. Each scatter plot shows
the relationship between a pair of variables Petal Length, Petal Width, Sepal Length and
Sepal Width.
Figure 8.5.1.
Sepal.Length
5.5
4.5
●
●
●
7 2.0 2.5 3.0 3.5 4.0
●
●
● ● ●●
● ●●
● ●● ●
●●● ●
● ● ●●● ● ● ●●
● ●● ●
● ●●
●●
●
● ●
●● ●●● ●
● ●●
●● ●●● ● ●● ●●● ●●●● ●● ●●
Sepal.Width
● ●● ●●●●● ● ●
●●● ●●●●● ● ● ●
● ● ● ● ●●
● ●● ● ●
● ● ●●● ● ●
● ●
● ● ● ●
●●
●
● ●
●● ● ● ●
● ● ● ●
● ●● ●
● ● ● ●● ●
6
● ●●● ●● ● ● ●
●
● ●
● ●● ● ● ●●●●●●
● ●
● ●● ●● ● ●●● ●
●●● ●●
●● ●
●● ●● ●●●● ● ● ●● ●●
●●●
5
● ● ● ●●
● ●
●
●●● ●●●● ●
● ●
●●
●●
●●
●●
● ●●●● ● ●●● ●● ●● ●● ●●●
● ● ●
●●● ● ● ●●●
●●
●● ● ●● ●● ●
● ●
●●●
● ●
Petal.Length
4
● ●●● ●● ●
●● ●
● ● ● ● ●
●● ●●
● ●
3
2
● ● ● ●
● ● ● ●●●●
●● ● ●
●●●
●●● ●●
●●
●●●●● ● ●● ●● ●●
● ● ●
● ●●●●
●●● ●● ● ●●
● ●
●●●●
●● ●● ●
● ● ● ● ● ● ●
1
0.5 1.0 1.5 2.0 2.5
● ● ● ● ● ● ●●
● ● ● ● ● ● ● ●
● ● ●●● ● ● ●●● ● ●●●● ●●● ●
●● ● ● ● ● ●● ●
● ●●● ● ● ● ●● ● ●●●●● ●
●● ● ●● ● ● ● ● ● ●●●● ●●
● ●● ● ● ●● ●●● ●
●●●●●●● ● ●● ● ●●●●●● ●●● ●●●● ●
● ● ● ● ● ●
● ● ● ● ● ●● ●● ● ●
● ● ●● ●●●● ● ● ● ● ●●●●● ● ●●●●●●
● ● ●●● ● ●●●●●●● ● ●●●● ●
● ●●
●●● ●●●● ●
● ●● ●
● ● ●●●●
●●
●●● ●
●
● ●●●●●●●
●●●● ●
●●
Petal.Width
●● ● ●● ● ● ●●● ●● ●●● ●●
● ● ●
● ● ●
●● ● ● ● ●●● ● ●●●●●
●● ● ●● ● ● ● ●● ● ●●●●
● ●●●●●●●●●● ● ●●●●
●●●●●●●● ● ● ●●●
●●
●●
●●●●●
● ●● ● ●● ● ● ● ●●
4.5 5.5 6.5 7.5 2.0 2.5 3.0 3.5 4.0 1 2 3 4 5 6 7 0.5 1.0 1.5 2.0 2.5
Example 8.5.1
212
Properties of Expectation 8.5 Covariance
Figure 8.5.2.
https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/
Example 8.5.2
213
Properties of Expectation 8.5 Covariance
Example 8.5.4
Let X and Y be two independent Bernoulli random variables with parameter p = 1/2. Con-
sider the random variables
U = X+Y
V = X´Y
214
Properties of Expectation 8.6 Correlation
8.5.1 §§ Properties
σXY = E X ´ E( X ) . Y ´ E(Y )
= E XY ´ E X .E Y
E XY = 2/512 was computed in Example 8.3.2, while E X = 18/512 and E Y =
256/512 in Example 8.3.5.
σXY = E XY ´ E X ¨ E Y
= 2/512 ´ 18/512 ˆ 256/512
= ´7/512
8.6 IJ Correlation
§§ Background
The covariance does not take into account the magnitude of the variances of the random
variables involved. Correlation quantifies the strength of the linear relationship between a
pair of random variables.
215
Properties of Expectation 8.6 Correlation
Figure 8.6.1.
Cov( X, Y )
ρ( X, Y ) = a
Var ( X ).Var (Y )
Note: The variance is only zero when a random variable is constant. So, as long as
X and Y are not constant, then the correlation between them is well-defined.
´1 ď ρ( X, Y ) ď 1
216
Properties of Expectation 8.6 Correlation
The scatter plots below illustrate the cases of strong and weak (positive or negative)
correlation. Figure 8.6.2 shows the strong positive linear association between duration of
the eruption and waiting time between eruptions for the Old Faithful geyser in Yellowstone
National Park, Wyoming, USA.
Figure 8.6.2.
●
●
● ●
●
●
90
● ●● ●●●
● ● ●
● ● ● ● ●●
● ●
● ●● ● ●
● ● ● ●●●
●● ●● ●●● ●● ●
● ● ● ● ● ●●● ● ●
● ● ●●
●●
● ●● ● ● ●
● ● ● ●● ● ●● ● ●● ●
80
● ● ● ● ●● ●
● ● ● ● ●● ● ● ●●
● ●
● ● ● ●● ● ● ●● ●● ●●
●● ●● ● ● ● ● ● ●
●● ●● ● ● ● ● ●
● ● ●
● ● ●
●●
● ● ● ● ●●
● ● ● ●● ●
●
waiting
● ● ● ● ●
70
● ● ●●
● ●
●
●
● ●
● ● ●
● ● ● ●
● ● ●
● ● ● ●
60
● ● ● ●● ●
● ●● ● ● ●●
● ●● ●
● ●
●● ●
● ●● ● ● ●
●
●●●●● ● ●
●● ●● ● ●
● ● ● ● ●
●●●● ● ●
50
● ● ● ●
●●● ● ●
● ● ●
● ● ●
●●
● ●
●● ●
●
eruptions
217
Properties of Expectation 8.6 Correlation
Figure 8.6.3 shows the relationship between weight of patient after study period (lbs)
(Postwt) and weight of patient before study period (lbs) (Prewt) for young female anorexia
patients. There seems to be no strong linear relationship between Postwt and Prewt.
Figure 8.6.3.
●
100
● ●
●
●
●
●
●
95
● ●
●
●
●
●
●
● ●
●
● ●
90
●
●
Postwt
● ●
● ● ●
●
● ●
85
●
● ●
●
● ● ●
● ● ●
● ● ●●
● ● ● ●
80
● ●
● ●
●
● ●
●●
● ●●
●
● ●
75
● ●
●●
●
●
●
70 75 80 85 90 95
Prewt
218
Properties of Expectation 8.6 Correlation
Figure 8.6.4 shows the relationship between Miles/gallon (mpg) and Displacement (disp)
(cu.in.) in cars dataset. There seems to be a negative relationship between mpg and disp.
Figure 8.6.4.
●
●
●
400
● ●
● ●
●
300
●●
disp
● ● ●
●
200
● ●
●
● ●
●
●
● ●
●
100
●
● ● ●
●
10 15 20 25 30
mpg
219
Properties of Expectation 8.6 Correlation
Cov( X, Y )
ρ( X, Y ) = a
Var ( X ) ¨ Var (Y )
2
Var ( X ) = E X ´ ( E X )2
Var (Y ) = E Y 2 ´ ( E Y )2
ÿ 2
E X2 = x P( x, y)
x
= 0 ˆ 240/512 + 02 ˆ 254/512
2
+ 12 ˆ 16/512 + 12 ˆ 2/512
= 18/512
Var ( X ) = 18/512 ´ (18/512)2
= 0.0339
2 ÿ 2
E Y = y P( x, y)
y
= 02 ˆ 240/512 + 12 ˆ 254/512
+ 02 ˆ 16/512 + 12 ˆ 2/512
= 256/512
= 1/2
Var (Y ) = 1/2 ´ (1/2)2
= 0.25
Cov( X, Y )
ρ( X, Y ) = a
Var ( X ) ¨ Var (Y )
´7/512
=a
0.0339 ˆ 0.25)
= ´0.1485
Example 8.6.3
Find a simplified expression for correlation between 10X, Y + 4
Solution:
220
Properties of Expectation 8.6 Correlation
Cov(10X, Y + 4) = 10Cov( X, Y )
Var (10X ) = 100Var ( X )
Var (Y + 4) = Var (Y )
Cov(10X, Y + 4)
ρ( X, Y ) = a
Var (10X ) ¨ Var (Y + 4)
10Cov( X, Y )
=
10SD ( X ) ¨ SD (Y )
Cov( X, Y )
=
SD ( X ) ¨ SD (Y )
=ρ
Example 8.6.3
Example 8.6.4
Let X be a Uniform random variable on the interval [0, 1], and let Y = X 2 . Find the corre-
lation between X and Y .
Solution:
Cov( X, Y )
ρ( X, Y ) = a
Var ( X ) ¨ Var (Y )
2
Var ( X ) = E X ´ ( E X )2
Var (Y ) = E Y 2 ´ ( E Y )2
"
1 if 0 ď x ď 1;
f X (x) =
0 otherwise
221
Properties of Expectation 8.6 Correlation
We see that
ż1
E X = ( x )1dx
0
= 1/2
ż1
E X = ( x2 )1dx
2
0
= 1/3
Var ( X ) = 1/3 ´ (1/2)2
= 1/12
E Y = E X2
= 1/3
2
E Y = E X4
ż1
= ( x4 )1dx
0
= 1/5
Var (Y ) = 1/5 ´ (1/3)2
= 4/45
Cov( X, Y ) = E XY ´ E( X ) E(Y )
= E X 3 ´ E ( X ) E (Y )
ż1
3
E X = ( x3 )1dx
0
= 1/4
Cov( X, Y ) = E X 3 ´ E( X ) E(Y )
= 1/4 ´ 1/2(1/3)
= 1/12
1/12
ρ( X, Y ) = ?
1/12 ˆ 4/45
= 0.968
222
Properties of Expectation 8.7 Home Work
223
Properties of Expectation 8.7 Home Work
§§ Answers
1. -0.16; -0.3563
2. -1/12; -0.3535
3. 0.4048; -0.056
4. 0.01875; 0.397
5. -0.0986; -0.5958
1
6.
1 ´ x2
224
Bibliography
[1] Chan, Stanley H., Introduction to Probability for Data Science, Michigan Publishing,
2021.
[2] Ward, M. D., and Gundlach, E. Introduction to Probability, Freeman Company, 2016.
[3] DeCoursey W. J., Statistics and Probability for Engineering Applications With Mi-
crosoft Excel, Newnes, 2003.
[4] Devore J. L., Probability and Statistics for Engineering & Sciences, Brooks/Cole, 2012.
[5] Forsyth D., Probability and Statistics for Computer Science, Springer, 2018.
[6] Hayter A., Probability and Statistics for Engineers & Scientists, Brooks/Cole, 2012.
[7] Montgomery, Douglas C., and Runger George C., Applied Statistics and Probability for
Engineers, John Wiley & Sons, Inc, 2011.
[8] Mendenhall W., Beaver R.J., and Beaver, B. M., Introduction to Probability and Statis-
tics, 14th Edition , Brooks/Cole, 2013.
[10] Rice J. A., Mathematical Statistics and Data Analysis, 3rd Edition, 2007
[13] Ross S., Introduction to Probability and Statistics For Engineers And Scientists , 3rd
Edition, 2004
225
BIBLIOGRAPHY BIBLIOGRAPHY
[15] Triola E., M., Elementary Statistics, Pearson Education, New York 2005.
[16] Walpole R. E., Myers, R. H., Myers, S. L. and Ye, K.,Probability and Statistics for
Engineers & Scientists, Brooks/Cole, 2012.
226
Index
227
INDEX INDEX
Normal approximation
Binomial, 167
Odds, 37
pdf, 108
Poker Hand, 11
Probability
Equally-likely events, 36
Probability Laws
Complement Rule, 29
Inclusion-exclusion principle, 32
Law-of-Total Probability, 47
Multiplication Law, 41
Dependent Events, 42, 45
Independent Events, 42
Probability Mass Function (pmf), 58
Random Variable, 56
Continuous, 57
Discrete, 57
Randomness, 7
Sample Space, 10
Tree Diagram, 10
Variance
Continuous, 115
Discrete, 69
228