

Mathematics 230

Probability, 2022 Version

Probability Math 230 2022


Course Pack
Shaheena Bashir & Isaac Mulolani

This document was typeset on Wednesday 31st August, 2022.


§§ Legal stuff
• Copyright © 2021 Shaheena Bashir.

• LaTeX macro files are based on the CLP Calculus text by Joel Feldman, Andrew Rechnitzer and Elyse Yeager.

• This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can view a copy of the license at
http://creativecommons.org/licenses/by-nc-sa/4.0/.

• Links to the source files can be found at the text webpage

Contents

1 Course Outline 1
1.1 Course Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Units of Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Reference Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Prerequisite Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.6 Detailed Syllabus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Combinatorics 7
2.1 Randomness & Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Counting Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Role of Counting Rules in Probability . . . . . . . . . . . . . . . . . 14
2.2.2 Basic Principles of Counting . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.3 Permutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.4 Multinomial Coefficients: Permutations with Indistinguishable Objects 18
2.2.5 Circular Permutation . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.6 Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Home Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Basic Concepts & Laws of Probability 27


3.1 Some Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1 Types of Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.2 Axioms of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.3 Inclusion-exclusion principle . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Multiplication Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4 Law of Total Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5 Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.6 Home Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52


4 Discrete Distributions 55
4.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1.1 Types of Random Variable . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1.2 Discrete Probability Distribution . . . . . . . . . . . . . . . . . . . . 58
4.1.3 Cumulative Distribution Function (cd f ) . . . . . . . . . . . . . . . . 60
4.2 Expectation of a Random Variable . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.1 Expected Values of Sums of Random Variable: Some Properties . . . 66
4.3 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.3.1 Variance: Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3.2 Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4 Bernoulli Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.4.1 Conditions for Bernoulli Variable . . . . . . . . . . . . . . . . . . . . 73
4.4.2 Probability Mass Function (pm f ) . . . . . . . . . . . . . . . . . . . . 74
4.4.3 Bernoulli Distribution: Expectation & Variance . . . . . . . . . . . . 74
4.5 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.5.1 Background Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.5.2 Binomial Random Variable . . . . . . . . . . . . . . . . . . . . . . . 75
4.5.3 Conditions for Binomial Distribution . . . . . . . . . . . . . . . . . . 76
4.5.4 Probability Mass Function (pm f ) . . . . . . . . . . . . . . . . . . . . 76
4.5.5 Shape of Binomial Distribution . . . . . . . . . . . . . . . . . . . . . 79
4.5.6 Binomial Distribution: Expectation & Variance . . . . . . . . . . . . 81
4.6 Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6.1 Conditions for Poisson Variable . . . . . . . . . . . . . . . . . . . . . 82
4.6.2 Probability Mass Function (pm f ) . . . . . . . . . . . . . . . . . . . . 84
4.6.3 Poisson Distribution: Expectation and Variance . . . . . . . . . . . . 84
4.6.4 Poisson Approximation to the Binomial Distribution . . . . . . . . . 85
4.6.5 Comparison of Binomial & Poisson Distribution . . . . . . . . . . . . 86
4.7 Geometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.7.1 Geometric Distribution Conditions . . . . . . . . . . . . . . . . . . . 87
4.7.2 Probability Mass Function (pm f ) . . . . . . . . . . . . . . . . . . . . 89
4.7.3 Geometric Distribution: Cumulative Distribution Function cd f . . . 90
4.7.4 Geometric Distribution: Expectation and Variance . . . . . . . . . . 91
4.8 Negative Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.8.1 Probability Mass Function (pm f ) . . . . . . . . . . . . . . . . . . . . 93
4.8.2 Negative Binomial Distribution: Expected Value and Variance . . . . 95
4.8.3 Comparison of Binomial and Negative Binomial Models . . . . . . . 97
4.9 Hypergeometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.9.1 Conditions for Hypergeometric Distribution . . . . . . . . . . . . . . 99
4.9.2 Probability Mass Function (pm f ) . . . . . . . . . . . . . . . . . . . . 100
4.9.3 Hypergeometric Distribution: Expected Value and Variance . . . . . 100
4.9.4 Binomial Approximation to Hypergeometric Distribution . . . . . . . 101
4.10 Home Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

5 Continuous Distributions 105


5.1 Continuous Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2 Continuous Probability Distribution . . . . . . . . . . . . . . . . . . . . . . 107
5.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107


5.2.2 Cumulative Distribution Function (cd f ) . . . . . . . . . . . . . . . . 112


5.2.3 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2.4 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.3 Piecewise Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.4 Continuous Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . 121
5.4.1 Probability Density Function . . . . . . . . . . . . . . . . . . . . . . 122
5.4.2 Cumulative Distribution Function (cd f ) . . . . . . . . . . . . . . . . 123
5.4.3 Uniform Distribution: Expectation and Variance . . . . . . . . . . . 124
5.5 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.5.1 Probability Density Function (pd f ) . . . . . . . . . . . . . . . . . . . 127
5.5.2 Effect of Mean and Variance . . . . . . . . . . . . . . . . . . . . . . . 128
5.5.3 Properties of Normal Distribution . . . . . . . . . . . . . . . . . . . . 130
5.5.4 Standard Normal Distribution . . . . . . . . . . . . . . . . . . . . . . 131
5.5.5 Finding Probabilities Using Table . . . . . . . . . . . . . . . . . . . . 135
5.5.6 Finding Probabilities and Percentiles . . . . . . . . . . . . . . . . . . 139
5.5.7 Empirical Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.5.8 Normal Distribution: Moment Generating Function . . . . . . . . . . 142
5.5.9 Sums of Independent Normal Random Variables . . . . . . . . . . . . 144
5.6 Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.6.1 Link between Poisson and Exponential Distribution . . . . . . . . . . 146
5.6.2 Exponential Distribution: (cd f ) . . . . . . . . . . . . . . . . . . . . . 147
5.6.3 Exponential Distribution: (pd f ) . . . . . . . . . . . . . . . . . . . . . 148
5.6.4 Exponential Distribution: Expectation and Variance . . . . . . . . . 150
5.7 Home Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

6 Limit Theorems 155


6.1 Limit Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.1.1 Chebyshev Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.2 Central Limit Theorem (CLT) . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.2.1 Sample Total (CLT) . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.2.2 Sample Mean (CLT) . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
6.2.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
6.2.4 Normal Approximation to Binomial . . . . . . . . . . . . . . . . . . . 167
6.3 Home Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

7 Joint Distributions 173


7.1 Bivariate Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.2 Joint Distributions: Discrete case . . . . . . . . . . . . . . . . . . . . . . . . 174
7.2.1 Joint Cumulative Distribution Function (cd f ) . . . . . . . . . . . . . 176
7.2.2 Independent Random Variables . . . . . . . . . . . . . . . . . . . . . 179
7.3 Joint Distributions: Continuous Case . . . . . . . . . . . . . . . . . . . . . . 181
7.3.1 Joint Cumulative Distribution Function (cd f ) . . . . . . . . . . . . . 184
7.3.2 Independent Random Variables . . . . . . . . . . . . . . . . . . . . . 187
7.4 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7.4.1 Convolution: Discrete Case . . . . . . . . . . . . . . . . . . . . . . . 189
7.4.2 Convolution: Continuous Case . . . . . . . . . . . . . . . . . . . . . 191
7.5 Home Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196


8 Properties of Expectation 199


8.1 Jointly Distributed Variables: Expectation for Discrete Case . . . . . . . . . 199
8.2 Jointly Distributed Variables: Expectation for Continuous Case . . . . . . . 200
8.3 Some Function of Jointly Distributed Random Variable . . . . . . . . . . . . 201
8.3.1 Expectation of Sums of Jointly Distributed Random Variables . . . . 203
8.3.2 Expectation of Sums of Functions of Jointly Distributed Random Vari-
ables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.4 Conditional Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.4.1 Conditional Distributions: Discrete Case . . . . . . . . . . . . . . . . 206
8.4.2 Conditional Expectation: Discrete Case . . . . . . . . . . . . . . . . 206
8.4.3 Conditional Distribution: Continuous Case . . . . . . . . . . . . . . . 208
8.4.4 Conditional Expectation: Continuous Case . . . . . . . . . . . . . . . 209
8.4.5 Properties of Conditional Expectation . . . . . . . . . . . . . . . . . 210
8.5 Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8.5.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8.6 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8.7 Home Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

Index 226

Chapter 1

Course Outline

Since this course is taught on campus, you should aim to spend time each day going through the course content for the unit covered in class, and then spend some time working on the assigned questions for that content.

1.1 IJ Course Description


This course begins with a review of combinatorics and its uses in probability; a review of set theory supports this unit. This is followed by an introduction to the basic concepts of probability, the laws of probability, Bayes’ theorem, etc., including a case study on the use of Bayes’ theorem in practical life. The third unit covers discrete distributions. The fourth unit goes through continuous random variables with some simple applications. The fifth unit covers joint distributions, covariance & correlation. The last unit in the course covers some properties of expectations.

1.2 IJ Units of Instruction


The following units will be covered in this course:

a. Combinatorics

b. Basic Concepts of Probability & Laws

c. Discrete Random Variables & Discrete Distributions

d. Continuous Random Variables & Continuous Distributions

e. Properties of Expectations

f. Joint Distributions

The following are the divisions of the course for this delivery:


Course Instructor Assignments


Instructor    Week        Topic
Shaheena      week 1-2    Combinatorics
Shaheena      week 3      Axioms of Probability
Shaheena      week 4      Conditional Probability
Shaheena      week 5-8    Discrete Random Variables
Shaheena      week 9-11   Continuous Random Variables
Shaheena      week 12     Limit Theorems
Shaheena      week 13     Jointly Distributed Random Variables
Shaheena      week 14     Properties of Expectation

1.3 IJ Reference Materials


A number of reference materials will be used for this course; they are listed below. Note that there are also books in the library in these areas that you can use as additional resources.

1. Sheldon Ross, A First Course in Probability, 9th Edition, 2012.

2. Sheldon Ross, Introduction to Probability Models, 10th Edition, 2010.

3. John E. Freund, Modern Elementary Statistics, Twelfth Edition, Prentice-Hall, 2007.

4. TeXstudio 2.10.8 website: http://texstudio.sourceforge.net/.

1.4 IJ Evaluation
The evaluation for this course will be based on the following:

Best 5 out of 7 Quizzes 35%


Mid-Term Exam 35%
Final Exam 30%

The quizzes serve as formative assessments, while the 2 exams are summative assessments. It is imperative that you keep up with the lecture content.
Reviewing the notes and solving assigned problems will help solidify your conceptual un-
derstanding. The quizzes & exams in this course will test not just your ability to perform
procedural calculations but also your grasp of the concepts.


1.5 IJ Prerequisite Content

The prerequisite for this course is MATH 101 Cal 1. The mathematics in this course makes use of the tools discussed in MATH 101 Cal 1. You are reminded that it is your responsibility to review any material upon which this course builds.

The following is a list of topics you may need to review:

Prerequisite Content for Review

1. Set Theory; DeMorgan’s Laws; Venn Diagram

2. A clear understanding of the cards in a standard deck of 52 playing cards

3. Basic understanding of differentiation & integration

1.6 IJ Detailed Syllabus

The dates/times listed below are approximate and are subject to change. Adjustments will be made as the semester progresses. Your schedule indicates that you have a ’Probability’ class every week for the duration of the semester. There are two lectures, each of 75 minutes’ duration, on Monday & Wednesday at 5:00PM, and tutorials at times that will be announced by the TA; the tutorials will most likely be used to work on the assessments.


Navigation of content during the Semester


WEEK   DATES          SECTION
1      September 5    Set Theory, Notes, Combinatorics
2      September 12   Combinatorics Cont’d, Examples, problems, exercises
3      September 19   Basic Concepts of Probability, Examples, problems, exercises
4      September 26   Conditional Probability & Bayes’ Theorem, examples, problems, exercises
5      October 3      Random Variables, Discrete Random Variables, Expectation & Variance, Examples, problems, exercises
6      October 10     Discrete Distributions; Bernoulli & Binomial Distributions
7      October 17     Poisson Distributions
8      October 24     Geometric, Negative Binomial & Hypergeometric Distributions
9      October 31     Continuous Random Variable; Probability Density Function (pdf); Cumulative Distribution Function (cdf)
10     November 07    Uniform & Normal Random Variables
11     November 14    Exponential Random Variables
12     November 21    Central Limit Theorem and applications
13     November 28    Jointly Distributed Random Variables
14     December 05    Properties of Expectation


Reading/Studying Schedule

Week           Topics                                                          Approximate Length

September 5    Introduction to Randomness; Combinatorics                       2 weeks
September 19   Axioms of Probability                                           1 week
September 26   Conditional Probability & Independence                          1 week
October 3      Discrete Random Variables, pmf, cdf, expectation, variance      1 week
October 10     Discrete Distributions examples, problems, exercises            3 weeks
October 31     Continuous Random Variables; pdf, cdf, expectation, variance    1 week
November 07    Uniform, Normal & Exponential Distributions examples, problems  2 weeks
November 21    Limit Theorems examples, problems, exercises                    1 week
November 28    Jointly Distributed Random Variables                            1 week
December 05    Properties of Expectations, Covariance & Correlations           1 week

Chapter 2

Combinatorics

AS YOU READ . . .

1. What is randomness?
2. What is probability?
3. Why review set theory?
4. What is the link of combinatorics to probability?

2.1 IJ Randomness & Probability


§§ Randomness
Randomness is the erratic behaviour of objects, physical phenomena, etc. A few definitions of randomness are given below.

Definition 2.1.1 (Randomness).

• Lack of pattern in events, e.g., many technological systems suffer from a number of significant uncertainties which may appear at all stages of design, execution and use

• Statistical uncertainties due to limited availability of data; a random sample of 50 electrical components taken instead of a batch produced during a certain time period

• Lack of knowledge of the behaviour of elements in real conditions

• The Cambridge Dictionary of Statistics defines random as governed by chance; not completely determined by other factors.


Some real life examples of Randomness are

1. Everyday chance behavior of markets

2. Daily weather forecast

3. Results of Exit Polls

4. Real life events like accidents

IJ Why Study Probability: Background


• Classical mathematical theory successfully describes the world as a series of fixed and
real observable events, e.g., Ohm’s Law and Newton’s 2nd Law, etc.

• Before the 17th century, classical mathematical theory failed in processes or experi-
ments that involved uncertain or random outcomes.

• Calculation of odds in gambling games was an initial motivation for mathematicians, and later scientific analysis of mortality tables within the medical profession led to the development of the subject area of Probability.

Probability is one of the most important modern scientific tools that treats those aspects of
systems that have uncertainty, chance or haphazard features. Probability is a mathematical
term used for the likelihood that something will occur.

§§ Why Study Probability: Real Life


• Reliability: Many consumer products like consumer electronics, automobiles, etc., use reliability theory in product design to reduce the risk of failure

• What is the chance of next wave of COVID-19?

• What are the odds that Twitter’s Stock will plunge tomorrow?

• 27% of U.S. workers worry about being laid off, up from 15% in 2019

https://sciencing.com/examples-of-real-life-probability-12746354.html

§§ Why Study Probability: Sciences


• Engineering: Probability is used in areas ranging from quality control and quality assurance to communication theory in electrical engineering

• Medical Science: Probability helps to quantify the risk of death of bladder cancer patients, or how likely it is for a COVID-19 patient to be hospitalized, etc.


• Actuarial Science is based on the risk of some event, i.e., it deals with lifetimes of humans to predict how long any given person is expected to live, based on other variables describing the particulars of his/her life. Though this expected life span is a poor prediction when applied to any given person, it works rather well when applied to many persons. It can help to decide the premium rates the insurance companies should charge for covering any given person.

• To answer a research question (involving a sample to draw a conclusion about some larger population), e.g., ‘how likely is it . . . ?’, ‘what are the chances of . . . ?’, we need to understand probability, its rules, and probabilistic models.

IJ Link Between Set Theory & Probability Terminology

Math Word        Stats Word     Description                            Stats Notation

Element          outcome        one possible thing amongst many        E_1, E_2, . . .
Universal Set    sample space   everything under consideration         S or Ω
Subset           event          collection of elements or outcomes     A, B, A ⊂ B
Set notation                                                           { }
Empty set                       set with nothing in it                 ∅
Complement                      not in set A, but in universal set     A^c, Ā, A′
Union            or             either in A or B, or both              A ∪ B
Intersection     and            in A and B                             A ∩ B
Set subtraction                 in A but not in B                      A − B or A\B = A ∩ B^c
Conditions       given          restriction to sample space            A | B

§§ Some Definitions

Definition 2.1.2 (Random Experiment).

In the classical approach to the theory of probability it is often assumed that the
experiments can be repeated arbitrarily (e.g. tossing a coin, testing a concrete cube)
and the result of each experiment can be unambiguously used to declare whether a
certain event occurred or did not occur (e.g., when tossing a coin, observing a ‘T’).
Such an experiment is called a Random Experiment.


Definition 2.1.3 (sample space S).

The sample space S of a certain random experiment denotes all outcomes of the experiment under consideration. The sample space can be:

a. finite, e.g., (tossing a coin 4 times), then the sample space consists of all 2^4 four-tuples of H and T, i.e.,

S = {HHHH, HHHT, . . . , TTTT}

b. infinite, e.g., record the duration (in seconds) of the next telephone call; then the sample space is the set of all possible durations,

S = {0, 1, 2, . . .}

Definition 2.1.4 (Tree Diagram).

A tree diagram is a graphical representation of sample spaces. It is a picture that branches for each option in a choice. Tree diagrams are made up of nodes that represent events, and branches that connect nodes to outcomes. Consider the 3 tosses of a fair coin, with 2 possible outcomes for each toss, as in Figure 2.1.1.

Figure 2.1.1.

Tree Diagram for 3 Flips of a Fair Coin


Definition 2.1.5 (Events).

Each possible outcome of a sample space is called a sample point, and an event is
generally referred to as a subset of the sample space having one (simple) or more
(compound) sample points as its elements.

Random Experiment        Sample Space                 Event: Example

4 coin tosses            all 4-tuples of H and T      the outcome contains no consecutive H’s, e.g., HTTH
Length of phone calls    S = {0, 1, 2, . . .}         a call between 20 and 30 minutes
§§ Poker Hands

The standard deck of 52 playing cards is displayed in Figure 2.1.2.

Figure 2.1.2.

Standard Deck of 52 Playing Cards

Definition 2.1.6 (Poker Hands).

A Poker Hand (see Figure 2.1.3) is a set of 5 cards chosen without replacement from
a pack of 52 cards.


Figure 2.1.3.

Poker Hand Rankings

Poker hand rankings from strongest to weakest are given below:


(1) Royal Flush: A royal flush is a hand consisting of a 10, J, Q, K, and an A, all of the same suit. Since the values are fixed, we only need to choose the suit. There are \binom{4}{1} possible suits. After that, the other four cards are completely determined. Thus, there are 4 possible royal flushes:

# royal flushes = \binom{4}{1} = 4

(2) Straight-Flush (excluding royal flush): A straight-flush consists of five cards with values in a row, all of the same suit. Ace may be considered as high or low, but not both. (e.g., A, 2, 3, 4, 5 is a straight, but Q, K, A, 2, 3 is not a straight.) The lowest value in the straight may be A, 2, 3, 4, 5, 6, 7, 8, or 9. (Note that a straight flush beginning with 10 is a royal flush, and we don’t want to count those.) So there are 9 choices for the card values, and then \binom{4}{1} = 4 choices for the suit, giving a total of

9 × 4 = 36

(3) Four-of-a-kind: A four-of-a-kind is four cards showing the same number plus any other card of any suit:

# 4-of-a-kind = \binom{13}{1} × \binom{4}{4} × \binom{12}{1} × \binom{4}{1} = 13 × 1 × 48 = 624

(4) Full House: A full house is three cards showing the same number plus a pair:

# full houses = \binom{13}{1} × \binom{4}{3} × \binom{12}{1} × \binom{4}{2} = 13 × 4 × 12 × 6 = 3,744

(5) Flush: A flush consists of five cards, all of the same suit. There are \binom{4}{1} = 4 ways to choose the suit; then, given that there are 13 cards of that suit, there are \binom{13}{5} = 1287 ways to choose the hand, giving a total of \binom{4}{1} × \binom{13}{5} = 5,148 flushes. But note that this includes the straight and royal flushes, which we don’t want to include. Subtracting (36 + 4 = 40), we get a grand total of 5,148 − 40 = 5,108.

(6) Straight (excluding straight-flush): A straight consists of five values in a row, not all of the same suit. The lowest value in the straight could be A, 2, 3, 4, 5, 6, 7, 8, 9 or 10, giving 10 choices for the card values. Then there are 4 × 4 × 4 × 4 × 4 = 4^5 ways to choose the suits of the five cards, for a total of 10 × 4^5 = 10,240 choices. But this value also includes the straight flushes and royal flushes, which we do not want to include. Subtracting the 40 straight and royal flushes, we get 10,240 − 40 = 10,200.

(7) Three-of-a-kind: A three-of-a-kind is three cards showing the same number plus two cards that do not form a pair or create a four-of-a-kind:

# 3-of-a-kind = \binom{13}{1} × \binom{4}{3} × \binom{12}{2} × \binom{4}{1} × \binom{4}{1} = 54,912

(8) Two-pair: Two-pair is two cards showing the same numbers and another two cards showing the same numbers (but not all four numbers the same) plus one extra card (not the same as any of the other numbers):

# 2-pair = \binom{13}{2} × \binom{4}{2} × \binom{4}{2} × \binom{11}{1} × \binom{4}{1} = 123,552

(9) One-pair: One-pair is two cards showing the same numbers and another three cards all showing different numbers:

# 1-pair = \binom{13}{1} × \binom{4}{2} × \binom{12}{3} × \binom{4}{1}^3 = 1,098,240

(10) High card: High card (also known as no pair, or simply nothing) is a hand that does not fall into any other category, so we must avoid all higher-ranking hands; every higher-ranked hand includes a pair, a straight, or a flush. Because the numbers showing on the cards must be five different numbers, we have \binom{13}{5} choices for the five numbers showing on the cards. Each of the cards may have any of four suits, i.e., \binom{4}{1} × \binom{4}{1} × \binom{4}{1} × \binom{4}{1} × \binom{4}{1} = 4^5 choices. We then subtract the number of straights, flushes, straight flushes, and royal flushes. (Note that we avoided having any pairs or more of a kind.)

# high cards = \binom{13}{5} × 4^5 − 10,200 − 5,108 − 36 − 4 = 1,302,540
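These counts can be double-checked numerically. The short Python sketch below is an illustration only (it assumes Python 3.8+ for math.comb); it recomputes each category and confirms that the ten categories together account for all \binom{52}{5} = 2,598,960 possible poker hands.

# Numerical check of the poker-hand counts above (illustrative sketch).
from math import comb

royal_flush    = comb(4, 1)                                                         # 4
straight_flush = 9 * comb(4, 1)                                                     # 36
four_kind      = comb(13, 1) * comb(4, 4) * comb(12, 1) * comb(4, 1)                # 624
full_house     = comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2)                # 3,744
flush          = comb(4, 1) * comb(13, 5) - 40                                      # 5,108
straight       = 10 * 4**5 - 40                                                     # 10,200
three_kind     = comb(13, 1) * comb(4, 3) * comb(12, 2) * comb(4, 1) * comb(4, 1)   # 54,912
two_pair       = comb(13, 2) * comb(4, 2) * comb(4, 2) * comb(11, 1) * comb(4, 1)   # 123,552
one_pair       = comb(13, 1) * comb(4, 2) * comb(12, 3) * comb(4, 1)**3             # 1,098,240
high_card      = comb(13, 5) * 4**5 - 10_200 - 5_108 - 36 - 4                       # 1,302,540

total = (royal_flush + straight_flush + four_kind + full_house + flush
         + straight + three_kind + two_pair + one_pair + high_card)
print(total == comb(52, 5))   # True: the ten categories cover all 2,598,960 hands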


2.2 IJ Counting Rules

2.2.1 §§ Role of Counting Rules in Probability

Consider rolling 2 fair dice. Let A be the event that a sum of 7 occurs. How likely is it that event A will occur?

Figure 2.2.1.

Sample Space for rolls of 2 Fair Dice.

§§ Classical Definition of Probability

P(A) = Number Of Ways Event A Can Occur / Total Number Of Possible Outcomes
     = m/n
     = 6/36

Therefore, to calculate the required probability, we count the number of ways event A can
occur and the total number of possible outcomes in the sample space S in Figure (2.2.1).
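As an illustrative check (a sketch, not part of the original text), the same probability can be obtained by brute-force enumeration of the 36 equally likely outcomes, here in Python:

# Enumerate the sample space of two fair dice and count the outcomes with sum 7.
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))    # 36 ordered pairs
favourable = [pair for pair in sample_space if sum(pair) == 7]
print(len(favourable), len(sample_space))               # 6 36
print(len(favourable) / len(sample_space))              # 0.1666... = 6/36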


Definition 2.2.1 (Combinatorics: Objectives).

Many problems in probability theory require that we count the number of ways that
a particular event can occur. Systematic methods for counting the number of favor-
able outcomes of an experiment fall under the subject area called Combinatorics.
We will study several combinatorial techniques for counting large finite sets without
actually listing their elements. Combinatorial techniques are helpful for counting
the size of events that are important in probability theory. When selecting elements
of a set, the number of possible outcomes depends on the conditions under which
the selection has taken place.

2.2.2 §§ Basic Principles of Counting


There are different rules to count the number of possible outcomes.

Definition 2.2.2 (Generalized Multiplication Rule).

If r experiments that are to be performed are such that the first one may result in any of n_1 possible outcomes; and if there are n_2 possible outcomes of the second experiment; and if, for each of the possible outcomes of the first two experiments, there are n_3 possible outcomes of the third experiment; and so on, then there is a total of n_1 × n_2 × · · · × n_r possible outcomes of the r experiments. Consider the 3 tosses of a fair coin with 2 possible outcomes for each toss; see Figure 2.1.1. There are a total of 2 × 2 × 2 = 8 possible outcomes.

Example 2.2.3

1. How many numbers between 99 and 1000 have no repeated digits?

2. How many numbers are there between 99 and 1000 having at least one digit equal to 7?

3. In Figure 2.2.2 there are four bus routes between A and B and three bus routes between B and C. A man travels by bus from A to C via B and back. If he does not want to use a bus route more than once, in how many ways can he make the round trip?


Figure 2.2.2.

Bus Routes

Solution:
1. Numbers between 99 and 1000 means the numbers from 100-999. Choosing such a number with no repeated digits can be considered a 3-step process, where every step can be done in a number of ways that does not depend on the previous choices:

(a) Choose the first digit; the 1st digit can be any of the choices between 1-9, therefore 9 choices for this stage.
(b) Choose the second digit; the 2nd digit can be chosen from 0-9 excluding the choice in the previous stage, therefore 9 choices.
(c) Choose the third digit; the 3rd digit can be any of the choices between 0-9 excluding the digits in the first 2 stages, i.e., 8 choices.

So there are 9 × 9 × 8 = 648 possible numbers between 99 and 1000 with no repeated digits.

2. The sample space of the numbers between 99 and 1000 without any restriction consists of a total of 9 × 10 × 10 = 900 possible numbers. The condition of having at least one digit equal to 7 can be handled by splitting the sample space into 2 parts that are complements of each other:

(a) Numbers between 99 and 1000 in which none of the digits is 7: 8 × 9 × 9 = 648.
(b) Numbers between 99 and 1000 with at least one digit equal to 7: 900 − 648 = 252.

3. The condition is that he does not want to use a bus route more than once:

(a) For the trip from A to C there are 4 × 3 = 12 choices.
(b) For the trip from C to A there are 2 × 3 = 6 choices (excluding the routes chosen on the trip from A to C).

Therefore, there are 12 × 6 = 72 possible routes for a round trip with the condition that a route is not used more than once.

Example 2.2.3
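The counting arguments in parts 1 and 2 can be verified by brute force; the following Python sketch (an illustration only, not part of the original text) simply lists the numbers from 100 to 999 and counts the ones satisfying each condition.

# Brute-force check of parts 1 and 2 of Example 2.2.3.
no_repeats = [n for n in range(100, 1000) if len(set(str(n))) == 3]
with_a_seven = [n for n in range(100, 1000) if '7' in str(n)]
print(len(no_repeats))     # 648
print(len(with_a_seven))   # 252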


2.2.3 §§ Permutation
Definition 2.2.4 (Permutation Rule).

Permutations are ordered arrangements of all or a part of a set of objects. Permutations are fundamentally of 2 types:

1. Permutation with Repetition:

a. All Objects Arrangement: If all n objects are to be arranged from a set consisting of n objects, then there are n possibilities for the first choice, n possibilities for the second choice, and so on. Therefore, there are n × n × . . . × n = n^n possible ways for this, e.g.,
How many five digit numbers can be formed with the digits: 1, 2, 3, 4, 5?
Solution:
5^5 five digit numbers can be formed.

b. Part of a Set of Objects Arrangement: The number of permutations of n objects, taken r at a time, when repetition of objects is allowed, is n^r, e.g.,
How many three digit numbers can be formed with the digits: 1, 2, 3, 4, 5?
Solution:
5^3 three digit numbers can be formed.

2. Permutation without Repetition:

a. All Objects Arrangement: The total number of permutations of a set A of all n elements is given by

n! = n(n − 1)(n − 2) . . . 1,   with 0! ≡ 1

https://www.youtube.com/watch?v=RbugCeR-njk

b. Part of a Set of Objects Arrangement: The total number of permutations of a set A of n elements taken k at a time is an ordered listing of a subset of A of size k without replacement, and is given by

nPk = n!/(n − k)! = n(n − 1)(n − 2) · · · (n − k + 1).

Hint: there are n! permutations of all n objects, but (n − k) elements are not being picked, ∴ divide by (n − k)! to avoid duplication.

Example 2.2.5 (Seating Arrangement)


If Ali and Sara (a couple) and Babar and Hina (another couple) and Soban and Muzna (another couple) sit in a row of chairs as in Figure 2.2.3,
1. How many different seating arrangements are there?
2. How many ways they can be seated so that each of the 3 couples sit together?
3. Find also the number of ways of their seating if all the ladies sit together.
4. In how many different ways can the 3 women be seated together on the left, and then
the 3 men together on the right?

Figure 2.2.3.

Seating Plan for 6 people

Solution:
The solutions are as follows
1. 6! ways for seating 6 persons (no restriction)
2. 3! × 2 × 2 × 2 = 48 ways. Treating each couple sitting together as a single object gives 3 objects to arrange, but there are 2 possible ways for each couple to sit together.
3. 3! × 4! = 144 ways that the 3 ladies sit together.
4. 3! × 3! = 36 ways that the 3 women are seated together on the left and the 3 men together on the right.

Example 2.2.5
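For small problems like this one, the factorial formulas can be checked by listing the arrangements explicitly. The Python sketch below is illustrative only; the names follow the example above.

# Count all 6! seatings and those in which every couple sits together.
from itertools import permutations

people  = ['Ali', 'Sara', 'Babar', 'Hina', 'Soban', 'Muzna']
couples = [('Ali', 'Sara'), ('Babar', 'Hina'), ('Soban', 'Muzna')]

def couples_together(row):
    # Every couple occupies adjacent chairs in this row.
    return all(abs(row.index(a) - row.index(b)) == 1 for a, b in couples)

rows = list(permutations(people))
print(len(rows))                                # 720 = 6!
print(sum(couples_together(r) for r in rows))   # 48 = 3! * 2^3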

2.2.4 §§ Multinomial Coefficients: Permutations with Indistinguishable Objects


How many 8-letter sequences are possible with 3 A’s, 2 B’s, and 3 C’s?
• Certain items are distinct; others are not
• The number of distinguishable permutations of n objects, of which n_1 objects are identical, another n_2 objects are identical, another n_3 objects are identical, and so on up to n_k identical objects, is

\binom{n}{n_1, n_2, \ldots, n_k} = n!/(n_1! n_2! n_3! · · · n_k!),   with n_1 + n_2 + n_3 + · · · + n_k = n

\binom{n}{n_1, n_2, \ldots, n_k} is called the multinomial coefficient.

Example 2.2.6
A bridge hand (4 hands of 13 cards each) is dealt from a standard 52 card deck. How many
different bridge hands are there?
Solution:
n!/(n_1! n_2! n_3! n_4!) = 52!/(13! 13! 13! 13!)

Example 2.2.6
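A quick numerical evaluation of this multinomial coefficient (a Python sketch, for illustration only):

# Multinomial coefficient for dealing 4 bridge hands of 13 cards each.
from math import factorial

n_hands = factorial(52) // (factorial(13) ** 4)
print(n_hands)   # roughly 5.36 * 10**28 different deals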

2.2.5 §§ Circular Permutation


• Permutation in a circle is called circular permutation.

• In how many ways can you arrange the seating of 3 friends A, B and C around a round table?

Figure 2.2.4.

Circular Permutation.

If we arrange these 3 persons around a round table as shown in Circular Arrangement 1 in Figure 2.2.4, we notice that the apparently different arrangements are not actually different but are all the same. The same is true for Circular Arrangement 2. If you move clockwise around the table in Figure 2.2.4, starting with A, you will always get A-B-C. Important points to ponder are:

• If the clockwise and counter-clockwise orders CAN be distinguished, then the total number of circular permutations of n elements taken all together is (n − 1)!. The number is (n − 1)! instead of the usual factorial n! since all cyclic permutations of the objects are equivalent because the circle can be rotated. The point is that in a circular permutation one element is fixed and the remaining elements are arranged relative to it.

• If the clockwise and counter-clockwise orders CANNOT be distinguished, then the total number of circular permutations of n elements taken all together is (n − 1)!/2.

Example 2.2.7 (Seating Arrangement)


How many ways Ali and Sara (a couple) and Babar and Hina (another couple) and Soban
and Muzna (another couple) can be seated around a circular table as in Figure 2.2.5 so that

1. couples sit together?

2. If Ali and Soban insist on sitting besides each other, how many arrangements are
possible now to seat them around the table?

Figure 2.2.5.

Round Table with 6 spots.

Solution:
The solutions are as follows

1. (3 − 1)! ways for seating 3 couples (condition: couples sit together, so each couple is taken as a single object), but there are 2 possible ways for each couple to sit together; ∴ (3 − 1)! × 2 × 2 × 2 = 16 ways.

2. (5 − 1)! × 2 = 48 ways. As Ali and Soban sitting together is 1 condition, there are a total of 5 objects to arrange, but there are 2 possible ways for Ali and Soban to sit beside each other.

Example 2.2.7
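The rule that n distinguishable objects have (n − 1)! circular arrangements can be checked by brute force for a small n; the Python sketch below (illustrative only, not from the text) collapses seatings that differ only by a rotation.

# Brute-force check of the circular-permutation count (n - 1)! for n = 4.
from itertools import permutations

def canonical(seating):
    # Rotate so person 0 sits "first"; all rotations then share one key.
    i = seating.index(0)
    return seating[i:] + seating[:i]

n = 4
distinct = {canonical(p) for p in permutations(range(n))}
print(len(distinct))   # 6 = (4 - 1)!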


2.2.6 §§ Combinations

Definition 2.2.8 (Combinations).

Combinations are unordered selections of all or a part of a set of objects. Combinations are fundamentally of 2 types:

1. Combination without Repetition:

• A sample of k elements is to be chosen without replacement from a set of n elements. The number of different samples of size k that can be selected from n is equal to

\binom{n}{k} = n!/(k!(n − k)!), where k ≤ n

\binom{n}{k} is called the binomial coefficient.

• Nice & symmetrical formula:

\binom{n}{n − k} = n!/(k!(n − k)!) = \binom{n}{k}.

2. Combination with Repetition: A sample of k elements is to be chosen with replacement from a set of n elements. The number of different samples of size k that can be selected from n is equal to

\binom{n + k − 1}{k} = (n + k − 1)!/(k!(n − 1)!)

§§ Pascal’s Triangle

There is a connection between the total number of subsets of a set of n elements and the binomial coefficients: ∑_{k=0}^{n} \binom{n}{k} = 2^n. In Figure 2.2.6, the sum of the binomial coefficients in each row is equal to 2^n, the cardinality of the power-set.


Figure 2.2.6.

Pascal’s Triangle.

Example 2.2.9 (Combinations with Repetition)


If I roll 3 identical dice one time each, how many possible unique results can I get? We have
6 options on each die (n = 6), and there are 3 dice (r = 3), but since the dice are identical,
a result of ‘1, 2, 3’ would be the same as ‘2, 3, 1’ or ‘3, 1, 2’. The number of unique results
will be

   
\binom{n + r − 1}{r} = \binom{6 + 3 − 1}{3} = 56

For details on this concept, click on this link: https://math.libretexts.org/Courses/Monroe_Community_College/MATH_220_Discrete_Math/7

Example 2.2.9
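The same count can be reproduced directly; the Python sketch below (illustrative only) lists the unordered outcomes of 3 identical dice and compares the count with the formula.

# Unordered outcomes of rolling 3 identical dice.
from itertools import combinations_with_replacement
from math import comb

outcomes = list(combinations_with_replacement(range(1, 7), 3))
print(len(outcomes))        # 56
print(comb(6 + 3 - 1, 3))   # 56, the formula C(n + r - 1, r)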


§§ Relationship Between Permutations & Combinations

Figure 2.2.7.

Relationship between Permutation & Combination.

Figure 2.2.7 shows the connection between Permutations and Combinations.


3P2 = 3!/(3 − 2)! = 6;   3C2 = 3!/(2!(3 − 2)!) = 3

Example 2.2.10

1. A store has to hire two cashiers. Five people are interviewed for the jobs. How many
different ways can the hiring decisions be made?
2. Suppose there were 15 business people at a meeting. At the end of the meeting, each
person at the meeting shook hands with every other person. How many handshakes
were there?
3. A poker hand is a set of 5 cards chosen without replacement from a deck of 52 playing
cards. In how many ways can you get a hand with 3 red cards and 2 black cards?
4. There are 3 copies of Harry Potter and the Philosopher’s Stone, 4 copies of The Lost
Symbol, 5 copies of The Secret of the Unicorn. In how many ways can you arrange
these books on a shelf?
Solution:
 
1. \binom{5}{2} = 10

2. As each person at the meeting shook hands with every other person, and the order of the handshakes between people does not matter, there is a total of \binom{15}{2} = 105 handshakes.

3. \binom{26}{3} × \binom{26}{2}

4. There are a total of 12 books, therefore 12! ways to arrange those. These 12 books can be categorized into 3 distinct sets:

(a) 3 copies of Harry Potter
(b) 4 copies of The Lost Symbol
(c) 5 copies of The Secret of the Unicorn

However, the 3 copies of Harry Potter are not distinct; the 4 copies of The Lost Symbol are not distinct; and likewise the 5 copies of The Secret of the Unicorn are not distinct. Therefore a multinomial coefficient is used to find the number of arrangements here, i.e.,

12!/(3! × 4! × 5!) = 27,720 ways

Example 2.2.10
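The four answers can be evaluated numerically; the following Python sketch (illustrative only, assuming Python 3.8+ for math.comb) reproduces them.

# Numerical versions of the four answers in Example 2.2.10.
from math import comb, factorial

print(comb(5, 2))                          # 10 ways to hire 2 of 5 cashiers
print(comb(15, 2))                         # 105 handshakes
print(comb(26, 3) * comb(26, 2))           # 845000 hands with 3 red and 2 black cards
print(factorial(12) // (factorial(3) * factorial(4) * factorial(5)))   # 27720 arrangements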


2.3 IJ Home Work


1. Find the number of possible 10 character passwords when:

(a) all the 10 characters have to be letters


(b) all letters must be distinct
(c) letters and digits must alternate and be distinct

2. How many ways are there to seat 10 people, consisting of 5 couples, in a row of seats
if:

(a) seats are assigned at random


(b) all couples are to get adjacent seats?

3. A box contains 30 balls, of which 10 are red and the other 20 blue. Suppose you take
out 8 balls from this box without replacement. How many possible ways are there to
have 3 red and 5 blue balls in this sample?

4. How many ways can eight people (including Mandy and Cindy) line up for a bus, if
Mandy and Cindy refuse to stand together?

5. How many integers, greater than 999 but not greater than 4000, can be formed with
the digits 0, 1, 2, 3 and 4, if repetition of digits is allowed?

6. In the laboratory analysis of samples from a chemical process, five samples from the
process are analyzed daily. In addition, a control sample is analyzed two times each
day to check the calibration of the laboratory instruments.

(a). How many different sequences of process and control samples are possible each
day? Assume that the five process samples are considered identical and that the
two control samples are considered identical.
(b). How many different sequences of process and control samples are possible if we
consider the five process samples to be different and the two control samples to
be identical?
(c). For the same situation as part (b), how many sequences are possible if the first
test of each day must be a control sample?

7. Consider the design of a communication system.

(a). How many three-digit phone prefixes that are used to represent a particular ge-
ographic area (such as an area code) can be created from the digits 0 through
9?
(b). As in part (a), how many three-digit phone prefixes are possible that do not start
with 0 or 1, but contain 0 or 1 as the middle digit?
(c). How many three-digit phone prefixes are possible in which no digit appears more
than once in each prefix?


§§ Answers:
1. (a) 26^10
   (b) 26 × 25 × · · · × 17
   (c) 2 × 26 × 10 × 25 × 9 × 24 × 8 × 23 × 7 × 22 × 6

2. (a) 10!
   (b) 10 × 1 × 8 × 1 × 6 × 1 × 4 × 1 × 2 × 1

3. \binom{10}{3} × \binom{20}{5}
4. 30240

5. 376

6. (a) 21 (b) 2520 (c) 720

7. (a) 1000 (b) 160 (c) 720

Chapter 3

Basic Concepts & Laws of Probability

AS YOU READ . . .

1. What are the basic concepts of probability?

2. What is the inclusion-exclusion principle?

3. What are independent and dependent events?

4. What is conditional probability?

5. What is Bayes’ theorem, and how is it useful in getting a data-based updated probability?

3.1 IJ Some Definitions


Definition 3.1.1 (Sample Space & Events).

For each random experiment, there is an associated random variable, which represents the outcome of any particular experiment.
A sample space is any set that lists all possible outcomes (or, responses) of some unknown experiment or situation. A sample space is generally denoted by the capital letter S, e.g., when predicting tomorrow’s weather, the sample space is S = {Rain, Cloudy, Sunny}.
Each subset of a sample space is defined to be an event. When some experiment is performed, an event either will or will not occur; for the weather forecast example, the subsets {rain}, {cloudy}, {rain, cloudy}, {rain, sunny}, {rain, cloudy, sunny}, . . ., and even the empty set ∅ = {}, are all examples of subsets of S that could be events.


3.1.1 §§ Types of Events

Definition 3.1.2 (Null Event).

A null or empty event is one that cannot happen, denoted by ∅, such as getting a sum of 14 on 2 rolls of a fair die.

Definition 3.1.3 (Mutually Exclusive Events).

If for 2 events A & B, A ∩ B = ∅, then A and B are said to be mutually exclusive or disjoint, e.g., in rolling a die, let A be the event that even numbers appear and B the event that odd numbers appear. Then A and B are mutually exclusive events.

Definition 3.1.4 (Equally Likely Events).

All outcomes in the sample space have an equal chance to occur, e.g., coin toss outcomes ‘H & T’ are equally likely events. In rolling a balanced die, each of the outcomes {1, 2, . . . , 6} is equally likely.

Definition 3.1.5 (At least & At Most Type of Events).

Take an example of tossing 3 coins & the sample space S, also visualized in Figure 2.1.1:

S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

Let X denote the number of heads in this example, let A be the event that at least 2 heads appear & B be the event that at most 2 heads appear. Write down A & B.

1. At least 2 heads appear, i.e., the number of heads is 2 or more in this example:

A = {X ≥ 2} = {THH, HTH, HHT, HHH}

2. At most 2 heads appear, i.e., the number of heads is 2 or less in this example:

B = {X ≤ 2} = {TTT, TTH, THT, THH, HTT, HTH, HHT}


3.1.2 §§ Axioms of Probability

Definition 3.1.6 (Axioms of Probability).

For a random experiment with sample space S, the probability of an event A is denoted as:

P(A) = Number Of Ways Event A Can Occur / Total Number Of Possible Outcomes

Certain rules govern the assignment of numeric values (probability) to chance events and outcomes. P(A) must satisfy the following conditions.

a. For every event A in the sample space, 0 ≤ P(A) ≤ 1

b. P(S) = 1

c. P(∅) = 0

d. If A_1, A_2, . . . is a collection of disjoint events, then

P(A_1 ∪ A_2 ∪ · · ·) = ∑_{i=1}^{∞} P(A_i)

Example 3.1.7 (Roll of 2 Dice Cont’d)


Consider the sample space S from the 2 rolls of a balanced die as in Figure 2.2.1.
Let A: the sum is even; then the probability of the event A is

P(A) = P(Sum is even) = 18/36

Let B: the sum is greater than 6; then the probability of the event B is

P(B) = P(Sum > 6) = 21/36

Example 3.1.7

Definition 3.1.8 (Complement Rule).

Let A: the sum is even when 2 fair dice are rolled. Then you might have to find the probability that A does not occur, i.e., P(A^c):

P(A^c) = 1 − P(A) = 1 − 18/36 = 18/36


Figure 3.1.1.


Complement of an Event.

Example 3.1.9 (Example: 3 Coin Toss Cont’d)


Take an example of tossing 3 coins & the sample space S as shown in Figure 2.1.1.

S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

Find the probability that:

1. at least 2 heads appear,

2. at most 2 heads appear,

Solution:
In Definition 3.1.5, the events corresponding to at least and at most 2 heads were specified.
Let X be the number of heads that appear in 3 tosses; then the possible values are X = {0, 1, 2, 3}.

1. At least 2 heads appear; then

{X ≥ 2} = {THH, HTH, HHT, HHH}

∴ P(X ≥ 2) = 4/8 = 1/2

We can also use the complement rule to find the required probability:

P(X ≥ 2) = 1 − P(X < 2) = 1 − 4/8 = 1/2

2. At most 2 heads appear; then

{X ≤ 2} = {TTT, TTH, THT, THH, HTT, HTH, HHT}

P(X ≤ 2) = 7/8

Alternatively, we can also use the complement rule to find the required probability:

P(X ≤ 2) = 1 − P(X > 2) = 1 − 1/8 = 7/8

Remember that when using the complement rule of probability, you partition the sample space into mutually exclusive events.

Example 3.1.9
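The two probabilities can be checked by listing all 2^3 = 8 outcomes; the Python sketch below (illustrative only) does this with itertools.

# Enumerate 3 coin tosses and check P(X >= 2) and P(X <= 2).
from itertools import product

S = list(product('HT', repeat=3))
at_least_2 = [s for s in S if s.count('H') >= 2]
at_most_2  = [s for s in S if s.count('H') <= 2]
print(len(at_least_2) / len(S))   # 0.5   = 4/8
print(len(at_most_2) / len(S))    # 0.875 = 7/8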


3.1.3 §§ Inclusion-exclusion principle

Definition 3.1.10 (Addition Law of Probability).

Also called the inclusion-exclusion principle.

1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

• In two rolls of a fair die, you might be interested in the probability that the sum is either even or greater than 6. Let A be the event that the sum is even, and B the event that the sum is greater than 6. Then

P(A ∪ B) = 18/36 + 21/36 − 9/36 = 30/36 = 5/6

The probability that the sum is neither even nor greater than 6 is then

P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − 5/6 = 1/6

2. Extension:

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
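The numbers in the dice example above can be verified by enumeration; the Python sketch below (illustrative only, not from the text) computes P(A), P(B), P(A ∩ B) and checks the addition law directly.

# Verify the inclusion-exclusion computation for two fair dice.
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))
A = {s for s in S if sum(s) % 2 == 0}    # sum is even
B = {s for s in S if sum(s) > 6}         # sum is greater than 6

def p(E):
    return Fraction(len(E), len(S))

print(p(A | B), p(A) + p(B) - p(A & B))   # 5/6 5/6
print(1 - p(A | B))                       # 1/6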


Figure 3.1.2.

Union of 2 Events


Figure 3.1.3.

Union of 3 Events

Definition 3.1.11 (Some Simple Propositions).

If A ⊂ B, then

P(B) = P(A) + P(A^c ∩ B)

and P(A) ≤ P(B), which is called monotonicity of probability.

Example 3.1.12 (Roll of 2 Dice Cont’d)


Let A: the sum is even and greater than 6, and let A ⊂ B. You are also given that A^c ∩ B is the event that the sum is even but less than 7. Find B from the given information.
Solution:

A = {8, 8, 8, 8, 8, 10, 10, 10, 12},   P(A) = 9/36

A^c ∩ B = {2, 4, 4, 4, 6, 6, 6, 6, 6},   P(A^c ∩ B) = 9/36

P(B) = P(A) + P(A^c ∩ B) = 9/36 + 9/36 = 18/36

As A ⊂ B, then

B = A ∪ (A^c ∩ B)
  = {8, 8, 8, 8, 8, 10, 10, 10, 12} ∪ {2, 4, 4, 4, 6, 6, 6, 6, 6}
  = {2, 4, 4, 4, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 10, 10, 10, 12}

B is thus the event that an even sum appears when 2 dice are rolled.
Example 3.1.12

Figure 3.1.4.

Subset


Definition 3.1.13 (Probability of Equally likely Events).

• If there are N outcomes in the sample space and each outcome is equally likely, then the probability of each outcome is 1/N, e.g., the probability of getting a Red with the spinner in Figure 3.1.5 is 1/8, as each of the N = 8 outcomes on the spinner is equally likely.

• If there are N outcomes in the sample space and each outcome is equally likely, and A is an event with n outcomes, then P(A) = n/N, e.g., the probability of getting a Yellow with the spinner in Figure 3.1.5 is 3/8.

Figure 3.1.5.

A Spinner.

Example 3.1.14


Ellie will take 2 books on vacation. She will like the first with probability 1/2 and the second with probability 2/5. She will like both of the books with probability 3/10. What is the probability that she likes at least one of them? Find the probability that she dislikes both.
Solution:

P(1st) = 1/2;   P(2nd) = 2/5;   P(Both) = 3/10

P(likes at least 1 of them) = P(1st) + P(2nd) − P(Both) = 1/2 + 2/5 − 3/10 = 6/10

P(Dislikes both) = 1 − P(likes at least 1 of them) = 1 − 6/10 = 4/10

Example 3.1.14

§§ Odds
Definition 3.1.15 (Odds).

Odds represent the likelihood that the event will occur. The odds in favor are the ratio of the number of ways that an outcome can occur compared to how many ways it cannot occur, i.e.,

Odds in favor = Number of successes (r) : Number of failures (s)

P(A) = (r/s)/((r/s) + 1) = r/(r + s)

e.g., when you roll a fair die the odds of getting a ‘6’ are 1 to 5.

• Convert from odds to probability: ∴ P(6) = 1/(1 + 5)

• Convert from a probability to odds: e.g., if the probability is 1/6, then the odds are ‘1 : 5’

https://www.theweek.co.uk/99357/us-election-2020-polls-who-will-win-trump-biden

Example 3.1.16

1. A study was designed to compare two energy drink commercials. Each participant was
shown the commercials, A and B, in random order and asked to select the better one.
There were 100 women and 140 men who participated in the study. Commercial A was
selected by 45 women and by 80 men. Find the odds of selecting Commercial A for the
men. Do the same for the women.


2. People with type O negative blood are universal donors. That is, any patient can
receive a transfusion of O negative blood. Only 7% of the American population have
O negative blood. If 10 people appear at random to give blood, what is the probability
that at least 1 of them is a universal donor?

3. Birthday Paradox: Two people enter a room and their birthdays (ignoring years) are
recorded.

(a.) What is the probability that the two people have a specific pair of birthdates?
(b.) What is the probability that the two people have different birthdates?

Solution:

1. Odds for Commercial A (Women) = 45:55 = 9:11;
   Odds for Commercial A (Men) = 80:60 = 4:3

2. Let X be the number of people with O negative blood in a group of 10 people. We want the probability that at least 1 of them has O negative blood, i.e., P(X ≥ 1). The probability that a single randomly selected person has O negative blood is 0.07; using the complement rule, 1 − 0.07 = 0.93 is the probability of not having O negative blood.

P(X ≥ 1) = 1 − P(X < 1)
         = 1 − P(X = 0)
         = 1 − (0.93)^10    ∵ donors are independent
         = 0.516

3. (a.) P(two people have a specific pair of birthdates) = 1/365 = 0.0027
   (b.) P(two people have different birthdates) = 1 − 1/365 = 364/365 = 0.9972

Example 3.1.16
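Part 2 above is easy to reproduce numerically; the short Python sketch below (illustrative only) applies the complement rule together with independence of the donors.

# Probability that at least one of 10 random donors is O negative.
p_universal = 0.07
p_at_least_one = 1 - (1 - p_universal) ** 10
print(round(p_at_least_one, 3))   # 0.516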

Example 3.1.17 (Birthday Problem Cont’d)


Three people enter a room and their birthdays (ignoring years) are recorded. What is the
probability that there is a pair of people who have the same birthday?
Solution:
The probability that a pair of people share the same birthday can be calculated by using the complement rule as below:

P(at least two people have the same birthdate) = 1 − P(none have the same birthdate)
= 1 − (365 × 364 × 363)/365^3
= 1 − (365 × 364 × (365 − 3 + 1))/365^3
= 0.0082

The birthday problem is also shown in Figure 3.1.6. For sharing a birthday, a single pair has a fixed probability of 0.0027 of matching. That’s low for just one pair. However, as the number of people increases, the probability of a match rises rapidly.
Example 3.1.17

Figure 3.1.6.

Probability of a Shared Birthday (probability on the vertical axis versus number of people on the horizontal axis)
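The curve in Figure 3.1.6 comes from the same complement-rule calculation applied to n people; the Python sketch below (illustrative only, assuming 365 equally likely birthdays and ignoring leap years) computes the exact probability for a few values of n.

# Exact probability that at least two of n people share a birthday.
def p_shared_birthday(n):
    p_all_different = 1.0
    for k in range(n):
        p_all_different *= (365 - k) / 365
    return 1 - p_all_different

for n in (2, 3, 23, 50):
    print(n, round(p_shared_birthday(n), 4))   # 0.0027, 0.0082, 0.5073, 0.9704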

3.2 IJ Conditional Probability


§§ Background
The relationship between multiple events that occur is important, e.g., draw 2 cards from a deck of 52 playing cards & find the probability of getting 2 Aces. The key question to think
about is, ‘Does the first event influence the outcome of the next event?’ When we know
(or assume) something about a random phenomenon in advance, it allows us to essentially
shrink the sample space to a smaller set of possible outcomes also called a reduced sample
space. This fundamentally alters the probabilities.


Definition 3.2.1 (Conditional Probability).

Flip a coin 3 times (see Figure 2.1.1). What is the probability that the first coin comes up heads? Suppose that some additional information, that exactly two of the three coins came up heads, becomes available.

S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

1. P(first coin heads | two coins heads) = · · ·

2. How do probabilities change when we know that some event B has occurred?

3. The additional information has changed our available information, so probabilities should also change.

4. Conditional probability P(A|B) asks: out of all outcomes in B, what proportion of them are also in A? See Figure 3.2.1.

P(A|B) ≡ P(A ∩ B)/P(B),   given that P(B) ≠ 0

Conditional probabilities satisfy all three axioms of probability:

a. For every event A in the sample space, 0 ≤ P(A|B) ≤ 1

b. P(S|B) = 1

c. When A_1 & A_2 are mutually exclusive, then

P(A_1 ∪ A_2 | B) = P(A_1|B) + P(A_2|B)

(given that P(B) ≠ 0)

Figure 3.2.1.

Conditional Probability.


§§ Contingency Table: Conditional Probability as Relative Frequency

Example 3.2.2
A recent survey in the US asked 100 people if they thought women in the armed forces should
be permitted to participate in combat. The results of the survey of Males & Females cross-
classified by their responses are given in the Table below:
Male (M) Female (F) Total
Yes 32 8 40
No 18 42 60
Total 50 50 100
Find the probability that a randomly selected respondent
1. was female who answered ’yes’.
2. who said ’no’ was a male.
Solution:
1. P(F|Yes) = (8/100)/(40/100) = 8/40

2. P(M|No) = (18/100)/(60/100) = 18/60 = 3/10
Example 3.2.2
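The same conditional probabilities can be read off the table programmatically; the Python sketch below (illustrative only) stores the four cell counts and forms the relative frequencies.

# Conditional probabilities from the survey table as relative frequencies.
counts = {('Yes', 'M'): 32, ('Yes', 'F'): 8, ('No', 'M'): 18, ('No', 'F'): 42}
total = sum(counts.values())

p_F_and_yes = counts[('Yes', 'F')] / total
p_yes       = (counts[('Yes', 'M')] + counts[('Yes', 'F')]) / total
print(p_F_and_yes / p_yes)   # 0.2 = 8/40, i.e. P(F | Yes)

p_M_and_no = counts[('No', 'M')] / total
p_no       = (counts[('No', 'M')] + counts[('No', 'F')]) / total
print(p_M_and_no / p_no)     # 0.3 = 18/60, i.e. P(M | No)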

3.3 IJ Multiplication Rule

Example 3.3.1
When a company receives an order, there is a probability of 0.42 that its value is over $1000.
If an order is valued at over $1000, then there is a probability of 0.63 that the customer will
pay with a credit card. What is the probability that the next order will be valued at over
$1000 but will not be paid with a credit card?
Solution:
P(Over 1k) = 0.42
P(C | Over 1k) = 0.63
P(C′ ∩ Over 1k) = ?

P(C′ | Over 1k) = P(C′ ∩ Over 1k)/P(Over 1k)

1 − 0.63 = P(C′ ∩ Over 1k)/0.42

P(C′ ∩ Over 1k) = 0.42 × (1 − 0.63) = 0.1554


Example 3.3.1

Definition 3.3.2 (Multiplication Rule for Dependent Events).

How do we compute the joint probability of A and B when we are given the probability
of A and the conditional probability of B given A (or vice versa)?

P( A X B) = P( B|A) ¨ P( A)
P( A X B) = P( A|B) ¨ P( B)

When the outcome or occurrence of the first event affects the outcome or occurrence
of the second event in such a way that the probability is changed, the events are
said to be dependent.

§§ Independence

Definition 3.3.3 (Independent Events).

Roll one fair die and flip one fair coin.

S = t1H, 2H, 3H, 4H, 5H, 6H, 1T, 2T, 3T, 4T, 5T, 6Tu

1. P(Die shows 5) = 2/12 = 1/6

2. What is the probability that the die comes up 5, conditional on knowing that
the coin came up tails?, i.e., P(Die shows 5|tail) = 1/6

In this example P(Die shows 5|tail) = P(Die shows 5) = 1/6, such events are
independent.

Definition 3.3.4 (Multiplication Rule for Independent Events).

The question is whether or not the occurrence of one event affects the probability of the occurrence
of the other. Two events A and B are independent if the fact that A occurs does
not affect the probability of B occurring. By definition

P( A|B) = P( A)
P( B|A) = P( B)
P( A X B) = P( A) ¨ P( B)


Example 3.3.5

A Harris poll found that 46% of Americans say they suffer great stress at least once a
week. If three people are selected at random, find the probability that all three will say that
they suffer stress at least once a week.
Solution:
P(Stress at least once a week) = 0.46. As the 3 selected people are independent,

P(all three suffer stress at least once a week) = 0.46 ˆ 0.46 ˆ 0.46
= 0.097

Example 3.3.5

Example 3.3.6 (Sampling with Replacement)


Suppose a day’s production of 850 manufactured parts contains 50 parts that do not meet
customer requirements. Suppose two parts are selected from the batch, but the first part is
replaced before the second part is selected. What is the probability that the second part is
defective (denoted as B) given that the first part is defective (denoted as A)?
Solution:
The probability needed can be expressed as P( B|A). Because the first part is replaced prior
to selecting the second part, the batch still contains 850 parts, of which 50 are defective.
Therefore, the probability of B does not depend on whether or not the first part was defective.
That is,

P( B|A) = 50/850
Also, the probability that both parts are defective is

50 50
P( A X B) = P( B|A) ¨ P( A) = ¨ = 0.0035
850 850

Example 3.3.6

Definition 3.3.7 (Independent & Disjoint Events).

Don’t confuse independence and mutually exclusive.

• independence means that probability of one event does not affect the proba-
bility of the other, i.e., P( A|B) = P( A)

• mutually exclusive events can’t occur together, i.e.,

P( A X B) = 0 ùñ P( A|B) = 0 or P( B) = 0


Definition 3.3.8 (Dependent Events).

When the outcome or occurrence of the first event affects the outcome or occurrence
of the second event in such a way that the probability is changed, the events are
said to be dependent.

Example 3.3.9
Four of the light bulbs in a box of ten bulbs are burnt out or otherwise defective. If two
bulbs are selected at random without replacement; (see Figure 3.3.1) and tested, what is the
probability that

(a). exactly two defective bulbs are found?

(b). exactly one defective bulb is found?

Solution:
As the bulbs are selected without replacement, therefore the selection is of dependent events
and the Multiplication Law for dependent events is used here.

(a). P(exactly two defective bulbs) = P( D1 ) ˆ P( D2 |D1 ) = 4/10 ˆ 3/9 = 12/90

(b). P(exactly one defective bulb) = P( D1 ) ˆ P( G2 |D1 ) + P( G1 ) ˆ P( D2 |G1 ) = 4/10 ˆ 6/9 + 6/10 ˆ 4/9 = 48/90

Example 3.3.9
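A small simulation is one way to sanity-check these tree-diagram answers (12/90 « 0.133 and 48/90 « 0.533). The Python sketch below simply repeats the two-bulb draw many times, using the standard random module:

    import random

    box = ["D"] * 4 + ["G"] * 6          # 4 defective, 6 good bulbs
    trials = 100_000
    two_def = one_def = 0
    for _ in range(trials):
        draw = random.sample(box, 2)     # selection without replacement
        d = draw.count("D")
        two_def += (d == 2)
        one_def += (d == 1)
    print(two_def / trials, one_def / trials)   # close to 12/90 and 48/90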


Figure 3.3.1.

Tree Diagram for Bulbs Selection.

Generally there are two rules with Tree diagrams that you should keep in mind while
computing probabilities

1. When you are traveling along a branch you multiply the probabilities, i.e., use Multi-
plication Law of Probability.

2. When you go from branch to branch you add, i.e., either of the branches, so use Addition
Law of Probability.

Definition 3.3.10 (Multiplication Rule: Dependent Events).

P( A X B) = P( A|B) ¨ P( B)
P( A X B) = P( B|A) ¨ P( A)
P( A X B X C ) = P(C|A X B) P( A) P( B|A)
P( A1 X A2 X A3 ¨ ¨ ¨ X An ) = P( A1 ) ¨ P( A2 |A1 ) ¨ P( A3 |A1 X A2 )
¨ ¨ ¨ P( An |A1 X A2 X A3 ¨ ¨ ¨ X An´1 )

Students find it difficult to decide which Probability law to use for a certain scenario. Use
of Figure 3.3.2 while solving each problem will be helpful in making the correct choice.


Figure 3.3.2.

Some Tips for Selection of Probability Laws: A Flowchart


3.4 IJ Law of Total Probability


Definition 3.4.1 (Law of Total Probability).

Sometimes a problem gives several different conditional probabilities and/or intersection
probabilities involving an event A, but never gives the probability of the event A
itself. Let B1 , B2 , . . . , Bk be a partition of the sample space S so that the Bi are disjoint
events with S = B1 Y B2 Y ¨ ¨ ¨ Y Bk as in Figure 3.4.1. Events resulting from such
a partition of the sample space into disjoint events are called mutually exclusive
and collectively exhaustive events. Such a partition divides any set A into disjoint
pieces as

A = ( A X B1 ) Y ( A X B2 ) Y ¨ ¨ ¨ Y ( A X Bk )

As event A is the union of the mutually exclusive events A X Bi , using the Addition
Law of Probability for disjoint events,

P( A) = P( A X B1 ) + P( A X B2 ) + ¨ ¨ ¨ + P( A X Bk )
= P( A|B1 ) P( B1 ) + P( A|B2 ) P( B2 ) + ¨ ¨ ¨ + P( A|Bk ) P( Bk )

Figure 3.4.1.

Partitioning of the sample space S into several mutually exclusive subsets.

Example 3.4.2 (Binary Signal)


A simple binary communication channel carries messages by using only two signals, say 0


and 1. We assume that, for a given binary channel, 40% of the time a 1 is transmitted;
the probability that a transmitted 0 is correctly received is 0.90, and the probability that a
transmitted 1 is correctly received is 0.95. Determine the probability of a 1 being received.
Solution:
Use a Tree diagram as in Figure 3.4.2. Here we are given different simple probabilities
P(0) = 0.6; P(1) = 0.4
and some conditional probabilities
P(0|0) = 0.90; P(1|1) = 0.95
We need to find the probability of 1 being received. Using the tree diagram in Figure
3.4.2

P(one being received) = P(0 X 1) + P(1 X 1)


= 0.6 ˆ (1 ´ 0.90) + 0.4 ˆ 0.95
= 0.44

Example 3.4.2
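The law of total probability here is just a weighted sum over the branches of the tree diagram. A Python sketch of the same calculation (the dictionaries below encode the probabilities given in the example):

    p_transmit = {0: 0.60, 1: 0.40}          # first-stage probabilities
    p_receive_1 = {0: 1 - 0.90, 1: 0.95}     # P(1 received | digit transmitted)

    p_one_received = sum(p_transmit[d] * p_receive_1[d] for d in (0, 1))
    print(p_one_received)                    # 0.44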

Figure 3.4.2.

Tree diagram for the binary signal channel: the first stage branches on the transmitted digit,
with P(0) = 0.60 and P(1) = 0.40; the second stage branches on the received digit, with
P(0|0) = 0.90, P(1|0) = 1 ´ 0.90, P(1|1) = 0.95 and P(0|1) = 1 ´ 0.95, leading to the joint
probabilities P(0 X 0), P(0 X 1), P(1 X 0) and P(1 X 1).

Binary Signal


3.5 IJ Bayes’ Theorem


Definition 3.5.1 (Bayes’ Theorem).

• When an event A occurs, it is natural to investigate which of the events Bi caused A.

• With known P( A|Bi ), move in the 'reverse' direction in the tree diagram and
use P( A|Bi ) to find P( Bi |A), called the 'Posterior Probability':

P( Bi |A) = P( Bi ) ¨ P( A|Bi ) / P( A)
= P( Bi ) ¨ P( A|Bi ) / [ P( A|B1 ) P( B1 ) + ¨ ¨ ¨ + P( A|Bk ) P( Bk ) ]

The prior (marginal) probability of an event Bi , i.e., P( Bi ) is revised after event A


has been considered to yield a posterior probability P( Bi |A). (See Figure 3.5.1.)

Figure 3.5.1.

Tree diagram for Bayes' Theorem: the first stage branches on B1 and B2 with probabilities
P( B1 ) and P( B2 ); the second stage branches on A and its complement Ā with the conditional
probabilities P( A|Bi ) and P( Ā|Bi ), leading to the joint probabilities P( A X B1 ), P( Ā X B1 ),
P( A X B2 ) and P( Ā X B2 ).

Example 3.5.2 (Binary Signal Cont’d)


In the Example 3.4.2 given a 1 is received, find the probability that 1 was transmitted.


Solution:
P(one was transmitted|one being received) is asking to move in the reverse direction in the
Figure 3.4.2.
P(one was transmitted|one being received)
= P(one received|one transmitted) ¨ P(one transmitted) / P(one being received)
= P(1 X 1) / [ P(0 X 1) + P(1 X 1) ]
= 0.4 ˆ 0.95 / [ 0.6 ˆ (1 ´ 0.90) + 0.4 ˆ 0.95 ]
= 0.38/0.44
= 0.863
There is 86.3% chance that signal one was transmitted when signal one was received.
Example 3.5.2

Example 3.5.3 (Bayes’ Theorem: Detection of Rare Events)


The reliability of a particular skin test for tuberculosis (TB) is as follows:
If the subject has TB, the test comes back positive 98% of the time. If the subject does not
have TB, the test comes back negative 99% of the time. In a large population 2 in every
10,000 people have TB. A person, who was randomly selected from this large population, has
a test that comes back positive. What is the probability they actually have TB?
Solution:
P(TB|+ test Result) is asking to get an updated probability of having TB.
P(TB|+test Result) = P(+test Result|TB) ¨ P( TB) / P(+test Result)
= P( TB X +test Result) / [ P( TB X +test Result) + P( TBc X +test Result) ]
= 0.0002 ˆ 0.98 / [ 0.0002 ˆ 0.98 + 0.9998 ˆ 0.01 ]
= 0.000196/0.010194
= 0.019227
There is only a 0.019 probability of having TB when the test result is positive, which means
that the test result is most likely a false positive.
Example 3.5.3
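The same posterior can be computed in a few lines; a Python sketch with the prevalence and test accuracies given above:

    p_tb = 0.0002                  # prevalence: 2 in 10,000
    p_pos_given_tb = 0.98          # test sensitivity
    p_pos_given_no_tb = 1 - 0.99   # false-positive rate

    p_pos = p_tb * p_pos_given_tb + (1 - p_tb) * p_pos_given_no_tb
    print(p_tb * p_pos_given_tb / p_pos)   # about 0.0192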

A Case Study: Modern British Legal History


• Sally Clark was arrested for the murder of her two infant sons in 1998.
• The prosecution case relied on flawed statistical evidence presented by paediatrician
Professor Roy Meadow, who testified that the chance of two children from an affluent
family suffering SIDS was 1/8500 ˆ 1/8500, i.e., about 1 in 73 million.


• The Royal Statistical Society later issued a statement expressing concern at the 'misuse
of statistics in the courts', arguing that there was no statistical basis for Meadow's
claim.

• Clark was wrongly convicted in November 1999. The convictions were upheld on appeal
in October 2000, but overturned in a second appeal in January 2003, after it emerged
that the prosecution forensic pathologist who examined both babies, had failed to
disclose microbiological reports that suggested the 2nd of her sons had died of natural
causes.

• Clark’s experience caused her to develop serious psychiatric problems and she died in
her home in March 2007 from alcohol poisoning.

§§ Applications
Standard applications of the multiplication formula, the law of total probability, and Bayes'
theorem occur with two-stage systems. The response for such systems can be thought of as
occurring in two steps or stages.

• Typically, we are given the probabilities for the first stage and the conditional proba-
bilities for the second stage.

• The multiplication formula is then used to calculate joint probabilities for what happens
at both stages;

• Law of Total Probability: used to compute the probabilities for what happens at the
second stage;

• Bayes’ Theorem: used to calculate the conditional probabilities for the first stage, given
what has occurred at the second stage


3.6 IJ Home Work


Identify the Probability law used from the scenario and calculate the required probability.

1. The WW Insurance Company found that 53% of the residents of a city had homeowner’s
insurance with its company. Of these clients, 27% also had automobile insurance with
the company. If a resident is selected at random, find the probability that the resident
has both homeowner’s and automobile insurance.

2. If there are 25 people in a room, what is the probability that at least two of them share
the same birthday?

3. You have a blood test for a rare disease that occurs by chance in 1 person in 100,000.
If you have the disease, the test will report that you do with probability 0.95 (and that
you do not with probability 0.05). If you do not have the disease, the test will report
a false positive with probability 0.001. If the test says you do have the disease, what
is the probability that you actually have the disease? Interpret the results.

4. You go to see the doctor about an ingrown toenail. The doctor selects you at random
to have a blood test for swine flu, which is currently suspected to affect 1 in 10,000
people in Australia. The test is 99% accurate, in the sense that the probability of a
false positive is 1%. The probability of a false negative is zero. You test positive. What
is the new probability that you have swine flu? Interpret the results

5. Suppose that 65 percent of a discount chain’s employees are women and 33 percent of
the discount chain’s employees having a management position are women. If 25 percent
of the discount chain’s employees have a management position, what percentage of the
discount chain’s female employees have a management position?

6. A company administers an “aptitude test for managers” to aid in selecting new man-
agement trainees. Prior experience suggests that 60 percent of all applicants for man-
agement trainee positions would be successful if they were hired. Furthermore, past
experience with the aptitude test indicates that 85 percent of applicants who turn out
to be successful managers pass the test and 90 percent of applicants who do not turn
out to be successful managers fail the test. a If an applicant passes the “aptitude test
for managers,” what is the probability that the applicant will succeed in a management
position? b Based on your answer to part a, do you think that the “aptitude test for
managers” is a valuable way to screen applicants for management trainee positions?
Explain.

7. Three data entry specialists enter requisitions into a computer. Specialist 1 processes 30
percent of the requisitions, specialist 2 processes 45 percent, and specialist 3 processes
25 percent. The proportions of incorrectly entered requisitions by data entry specialists
1, 2, and 3 are .03, .05, and .02, respectively. Suppose that a random requisition is
found to have been incorrectly entered. What is the probability that it was processed
by data entry specialist 1? By data entry specialist 2? By data entry specialist 3?


§§ Answers
1. 0.1431. Multiplication Law of Probability for Dependent Events.
2. P( None) = (365 ˆ 364 ˆ ¨ ¨ ¨ ˆ 341)/365^25 = 0.4313; P(At least 2 share) = 1 ´ 0.4313 = 0.5687
3. 0.0094. There is only 0.94% chance that you do have the disease, or in other words the
test result is most likely false positive.

4. 0.0099. There is only a 0.99% chance that you do have swine flu, or in other words
the test result is most likely a false positive.

5. 0.1269

6. 0.927; Yes

7. 0.247; 0.616; 0.137; Bayes’ Law

Chapter 4

Discrete Distributions

AS YOU READ . . .

1. What is a Random Variable and what are different types of Random Variable?

2. What is a Discrete Random Variable?

3. What is Probability Mass Function (pmf), Cumulative Distribution Function (cdf)?

4. What is Expected Value and Variance?

5. What are the different Discrete Probability Models, i.e., Bernoulli, Binomial, Poisson, Ge-
ometric, Negative Binomial and Hypergeometric Distributions?

6. How are these models used to quantify uncertainty in practical life?

4.1 IJ Random Variables


§§ Background
We often summarize the outcome from a random experiment by a simple number. In many
of the examples of random experiments that we have considered, the sample space has been
a description of possible outcomes. In some cases, descriptions of outcomes are sufficient, but
in other cases, it is useful to associate a number with each outcome in the sample space. The
idea is to summarize the outcome from a random experiment by a simple number, e.g., closing
price of Twitter Stock on NYSE. Before an experiment is carried out, its outcome is uncertain.
It follows that, because a random variable assigns a number to each experimental outcome,
a random variable can be thought of as representing an uncertain numerical outcome that
is not known in advance. For this reason, the variable that associates a number with the
outcome of a random experiment is referred to as a random variable, e.g., Twitter Stock
plunged by 7% the day after banning Trump


Definition 4.1.1 (Random Variable).

1. A random variable is a function from the sample space S to the real numbers,
i.e., X is a rule which assigns a number X (s) to each outcome s P S

2. A random variable is denoted by an uppercase letter such as X, Y, Z. After


an experiment is conducted, the measured value of the random variable is
denoted by a lowercase letter such as x, y, z.

Example 4.1.2

1. Toss a coin. The sample space is S = tT, Hu. Let X be the number of heads from the
result of a coin toss, then X = t0, 1u and

P( X = x ) = 1/2 for x = 0, 1; 0 otherwise.

2. Let X denote the sum of the numbers on the upper faces that might appear when 2
fair dice are rolled, (see Figure 2.2.1). Then X = t2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12u.

3. If we roll three dice, there are 63 = 216 possible outcomes. Let the random variable X
be the sum of the three dice. In this case, X = t3, 4, . . . , 17, 18u.

Example 4.1.2


4.1.1 §§ Types of Random Variable


Consider the difference in how we measure the outcome of 3 tosses of a fair coin, e.g., our
interest might be in the number of heads (see Figure 4.1.1) versus the heights of the students
in cm, (see Figure 4.1.2).
1. Discrete Random Variable: The possible values of such variable can be listed in either
a finite or an infinite list, e.g.,

a. The number, x, of the next three customers entering a store who will make a
purchase. Here x could be 0, 1, 2, or 3.
b. The number, x, of four patients taking a new antibiotic who experience gastroin-
testinal distress as a side effect. Here x could be 0, 1, 2, 3, or 4.
c. The number of defective parts produced in manufacturing. Here x could be 0, 1,
2, or 3.
d. The number of people getting flu in winter. Here x could be 0, 1, 2, or 3.

Figure 4.1.1.

Number of Heads in 3 Tosses of a Fair Coin: the possible values 0, 1, 2, 3 shown on a number line.

2. Continuous Random Variable: assumes any values in an interval (either finite or infi-
nite) of real numbers for its range, e.g.,

a. The height of students,


b. The weight of new born babies,
c. The temperature (in degrees Celcius) during a day.


d. The life length of some electric device


e. The time (in minutes) that a customer in a store must wait to receive a credit
card authorization.
f. The interest rate (in percent) charged for mortgage loans at a bank.

Figure 4.1.2.

Heights of Students in cm: a continuous scale running from about 150 to 190 cm.

4.1.2 §§ Discrete Probability Distribution


Definition 4.1.3 (Probability Mass function (pm f )).

The probability distribution of a random variable X is a description of the prob-


abilities associated with the possible values of X. For a discrete random variable,
the distribution is often specified by just a list of the possible values along with
the probability of each, or in some cases, it is convenient to express the probability
in terms of a formula, a table, or a graph that provides p( x ) = P( X = x ) for all x.
Probability Mass Function or pmf is written as:

P( X = x ) = p( x )
Because p( x ) is a function that assigns probabilities to each value x of the random
variable X, it is sometimes called the probability function for X.


§§ Probability Mass Function (pm f ): Properties


For any discrete probability distribution, the following must be true:

1. 0 ď P( X = x ) ď 1, @ x P S .
2. The values P( X = x ) sum to 1, where the summation is over all values of x with nonzero probability.

For a discrete random variable, its probability mass function (pm f ) defines all that we need
to know about the random variable. A pm f for a discrete random variable is defined (with
positive probabilities) only for a finite or countably infinite set of possible values - typically
integers. Toss a fair coin three times and let X denote the number of heads observed. The
probability distribution of X is shown graphically in Figure 4.1.3.

Figure 4.1.3.

A Probability Mass Function: P( X = x ) for the number of heads in 3 tosses of a fair coin,
with probabilities 1/8, 3/8, 3/8, 1/8 at x = 0, 1, 2, 3.

If we roll 2 dice, let X be the sum that appears on the upper faces. The probability
distribution of X is shown graphically in Figure 4.1.4.


Figure 4.1.4.

Example 4.1.4
A company has five warehouses, only two of which have a particular product in stock. A
salesperson calls the five warehouses in random order until a warehouse with the product
is reached. Let the random variable Y be the number of calls made by the salesperson.
Calculate the probability mass function.
Solution:
Let X be the event that a particular warehouse has the product in stock, then P( X ) = p = 2/5. Let Y be the
number of calls made by the salesperson needed to find a warehouse with the product. He
calls the warehouses one by one until he finds the warehouse with required product.

Y P (Y ) F (Y )
1 2/5 2/5
2 3/5 ˆ 2/4 = 3/10 2/5 + 3/10 = 7/10
3 3/5 ˆ 1/2 ˆ 2/3 = 1/5 2/5 + 3/10 + 1/5 = 9/10
4 3/5 ˆ 1/2 ˆ 1/3 = 1/10 2/5 + 3/10 + 1/5 + 1/10 = 1

Example 4.1.4

4.1.3 §§ Cumulative Distribution Function (cd f )


§§ Background
We calculated the probability that at most 2 heads turn up when we toss a coin 3 times, i.e.,
P( X ď 2). The probability that the random variable X does not exceed x (i.e., will be less
than or equal to x) is denoted as:

F ( x ) = P( X ď x ) = the sum of P( X = t) over all values t ď x


Definition 4.1.5 (Cumulative Distribution Function (cd f )).

F ( x ) = P( X ď x ) is called the Cumulative Distribution Function cd f . For discrete


random variables the cumulative distribution function will always be a step function
with jumps at each value of x that has probability greater than 0.

Example 4.1.6
Toss a fair coin three times and let X denote the number of heads observed. Find the corre-
sponding cumulative distribution function cd f . The pm f is

P( X = x ) = 1/8 for x = 0; 3/8 for x = 1; 3/8 for x = 2; 1/8 for x = 3; 0 otherwise.

Solution:

F ( X = x ) = 0 for x ă 0; 1/8 for 0 ď x ă 1; 4/8 for 1 ď x ă 2; 7/8 for 2 ď x ă 3; 1 for x ě 3.

The plot for the cd f is given in Figure 4.1.5.


Figure 4.1.5.

A Cumulative Distribution Function corresponding to Figure 4.1.3: F ( x ) is a step function
of the number of heads in 3 tosses of a fair coin, rising from 0 to 1 with jumps of size 1/8,
3/8, 3/8 and 1/8 at x = 0, 1, 2, 3.

Pay attention to the jump sizes in the step function. What do you conclude?
Example 4.1.6

Example 4.1.7
In Example 4.1.4, calculate the cumulative distribution function of the number of calls made
by the salesperson.
Solution:
Let Y be the number of calls made by the salesperson needed to find a warehouse with the
product. He calls the warehouses one by one until he finds the warehouse with the required
product. The cd f of Y at y is the probability that he makes at most y calls:

F (Y = y ) = 0 for y ă 1; 2/5 for 1 ď y ă 2; 7/10 for 2 ď y ă 3; 9/10 for 3 ď y ă 4; 1 for y ě 4.


Example 4.1.7

§§ Getting pm f from cd f
It is sometimes useful to be able to provide cumulative probabilities such as P( X ď x ), and
such probabilities can be used to find the probability mass function of a random variable. There-
fore, using cumulative probabilities is an alternate method of describing the probability
distribution of a random variable. If the range of a discrete random variable X consists of
the values x1 ă x2 ă ¨ ¨ ¨ ă xn then

p ( x1 ) = F ( x1 )
p( xi ) = F ( xi ) ´ F ( xi´1 ); i = 2, 3, . . . n

Example 4.1.8
X is a discrete random variable with cd f as:
F ( X = x ) = 0.00 for x ă ´3; 0.03 for ´3 ď x ă 1; 0.20 for 1 ď x ă 2.5; 0.76 for 2.5 ď x ă 7; 1.00 for x ě 7.

Write down the pm f from the above cd f in appropriate form. Given that X is positive, what
is the probability that it will be at least 2?
Solution:
The probability mass function at each point is the change in the cumulative distribution
function at the point. Therefore,
P( X = x ) = 0.03 for x = ´3; 0.17 for x = 1; 0.56 for x = 2.5; 0.24 for x = 7; 0 otherwise.

Given that X is positive, what is the probability that it will be at least 2?


P( X ě 2|X ą 0) = P( X ě 2 X X ą 0)/P( X ą 0)
= 0.80/(1 ´ 0.03)
= 0.8247

Example 4.1.8
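Recovering the pm f from a cd f is just successive differencing of the jump values. A Python sketch for the cd f in this example (jump points and cd f values copied from above):

    # cdf values at the jump points x1 < x2 < ... < xn
    xs = [-3, 1, 2.5, 7]
    F = [0.03, 0.20, 0.76, 1.00]

    pmf = {xs[0]: F[0]}
    for i in range(1, len(xs)):
        pmf[xs[i]] = F[i] - F[i - 1]
    print(pmf)   # {-3: 0.03, 1: 0.17, 2.5: 0.56, 7: 0.24} up to rounding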


§§ Cumulative Distribution Function cdf: Properties


The cumulative distribution function satisfies the following properties:

1. lim F ( x ) = 0 as x Ñ ´8

2. lim F ( x ) = 1 as x Ñ +8

3. For a ď b, F ( a) ď F (b), i.e., F is a non-decreasing function

4. F ( x ) is a right continuous function, i.e., lim F ( xn ) = F ( x ) as xn Ñ x from the right, for all x P R.

4.2 IJ Expectation of a Random Variable


Toss a coin 50 times and count the number of times heads might turn up. How many heads
do you expect?

Figure 4.2.1.

The probability mass function provides complete information about the probabilistic
properties of a random variable. One of the most basic summary measures is the expec-
tation or mean of a random variable, i.e., E( X ). It is the average value of random variable X
and certainly reveals one of the most important characteristics of its distribution, i.e., center.
It is a probabilistic term that describes the likely outcome of a scenario. The concept of the
expected value of a random variable parallels the notion of a weighted average. The possible
values of the random variable are weighted by their probabilities, as specified in the following
definition:


Definition 4.2.1 (Expected value of a Discrete Random Variable).

If X is a discrete random variable that assumes values x1 , x2 , . . . , xn along with
corresponding probabilities P( x1 ), P( x2 ), . . . , P( xn ), then the expected value of X is
defined as:

E( X ) = µ = x1 P( X = x1 ) + x2 P( X = x2 ) + ¨ ¨ ¨ + xn P( X = xn )

1. E( X ) is also called the 1st moment of the random variable X about zero. The
first moment of X is synonymously called the mean, expectation, or average
value of X.

2. Let g( X ) be a real-valued function of the random variable X; then

E[ g( X )] = g( x1 ) P( X = x1 ) + g( x2 ) P( X = x2 ) + ¨ ¨ ¨ + g( xn ) P( X = xn )

E[ g( X )] is also called a generalized moment. Let g( X ) = X^n , n = 1, 2, . . .; the
expectation E( X^n ), when it exists, is called the nth moment of X.

If X takes on a countably infinite number of values x1 , x2 , . . . , then

E( X ) = x1 P( X = x1 ) + x2 P( X = x2 ) + ¨ ¨ ¨ (an infinite sum)

Imagine placing the masses p( xi ) at the points xi on a beam; the balance point of the
beam is the expected value of X. Consequently, it describes the ’center’ of the distribution
of X in a manner similar to the balance point of a loading.

Figure 4.2.2.

Example 4.2.2

1. Toss three fair coins and let X denote the number of heads observed. Find the expected


number of heads.

P( X = x ) = 1/8 for x = 0; 3/8 for x = 1; 3/8 for x = 2; 1/8 for x = 3; 0 otherwise.

2. Consider a game that costs $1 to play. The probability of losing is 0.7. The probability
of winning $50 is 0.1, and the probability of winning $35 is 0.2. Would you expect to
win or lose if you play this game?

Solution:

1.

E( X ) = 0 ˆ 1/8 + 1 ˆ 3/8 + 2 ˆ 3/8 + 3 ˆ 1/8


= 12/8
= 1.5

2. Let X be the gain when you play the game

Gain ( X ) P( X )
-1 0.7
(50-1) 0.1
(35-1) 0.2

E( X ) = ´1 ˆ 0.7 + (50 ´ 1) ˆ 0.1 + (35 ´ 1) ˆ 0.2


= 11

In the long run, you are expected to gain $11 if you play this game.

Example 4.2.2
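The weighted-average definition of E( X ) translates directly into code. A Python sketch for the game in part 2 (the gains already have the $1 cost of playing subtracted):

    gains = [-1, 50 - 1, 35 - 1]        # net gain for lose, win $50, win $35
    probs = [0.7, 0.1, 0.2]

    expected_gain = sum(g * p for g, p in zip(gains, probs))
    print(expected_gain)                # 11.0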

4.2.1 §§ Expected Values of Sums of Random Variable: Some Properties


A fundamental property of the expectation operator is that it is linear.

• For any constant c,


E[cX ] = cE[ X ].

• If X1 , X2 , . . . , Xn are discrete random variables with finite expected values, and a1 , a2 , . . . , an


are constant numbers, then

E ( a 1 X1 + a 2 X2 + ¨ ¨ ¨ + a n X n ) = a 1 E ( X1 ) + a 2 E ( X2 ) + ¨ ¨ ¨ + a n E ( X n )

This holds even when the Xj 's are dependent.


• If X1 , X2 , . . . , Xn are identically distributed discrete random variables, then

E( X1 + X2 + ¨ ¨ ¨ + Xn ) = E( X1 ) + E( X2 ) + ¨ ¨ ¨ + E( Xn )          (4.2.1)
= nE( X ) for identical distributions          (4.2.2)

This holds whether the Xj 's are independent or dependent.

• For any random variable X and constants a & b, if Y = aX + b, then

E( aX + b) = aE( X ) + b

Definition 4.2.3 (Independent and Identically Distributed (i.i.d.)).

A collection of random variables X1 , . . . , X N is called independent and identically
distributed (i.i.d.) if all X1 , . . . , X N are independent and all X1 , . . . , X N have the
same distribution, i.e., PX1 ( x ) = ¨ ¨ ¨ = PXN ( x ).

Definition 4.2.4 (Median of a Random Variables).

A median of X is any point that divides the mass of the distribution into two equal
parts; that is, x0 is a median of X if

P( X ď x0 ) = 1/2

The mean of X may not exist, but there exists at least one median.

Definition 4.2.5 (Indicator Random Variables: Bernoulli Variable).

Random variables that are coded as 1 when an event occurs or 0 when the event
does not occur are called indicator random variables. In other words, I A maps all
outcomes in the set A to 1 and all outcomes outside A to 0. Roll a die. Let A be
the event that a 6 appears. Then

I A ( x ) = 1 if x P A; 0 otherwise.

If X is an indicator random variable for event A (6 appears), then E( X ) = P( A).

Example 4.2.6
Four students order noodles at a certain local restaurant. Their orders are placed indepen-
dently. Each student is known to prefer Japanese pan noodles 40% of the time. How many
of them do we expect to order Japanese pan noodles?


Solution:

Let X denote the number of students that order Japanese pan noodles altogether. Let
X1 , X2 , X3 , X4 be the indicator random variables representing the 4 students if they make a
choice of Japanese pan noodles or not

"
0 for xi = 0;
P ( Xi ) =
0.4 for xi = 1;

E( Xi ) = 1 ˆ 0.4 + 0 ˆ 0.6
= 0.4

Then the number of students that order Japanese pan noodles X = X1 + X2 + X3 + X4 . As


X1 , X2 , . . . , X4 are identically distributed discrete random variables, then expected value of
their sums is equal to the sum of the respective expectations, see Equation 4.2.1. Because
each has the same expected value, E( X ) = 4(0.4) = 1.6.

Example 4.2.6

4.3 IJ Variance

Figure 4.3.1.


Definition 4.3.1 (Variance).

The variance of a random variable X is a measure of how spread out its possible
values are. The variance of X is the 2nd central moment, commonly denoted by σ2
or Var ( X ). It is the most commonly used measure of dispersion of a distribution
about its mean. Large values of σ2 imply a large spread in the distribution of X
about its mean. Conversely, small values imply a sharp concentration of the mass of
distribution in the neighborhood of the mean as shown in Figure 5.2.4. For discrete
random variable

Var ( X ) = E( X ´ µ)2
= E( X 2 ) ´ [ E( X )]2 ,

where E( X 2 ) is the 2nd moment of the random variable X about zero. Variance is
the average value of the squared deviation of X from its mean µ. If X has units of
meters, e.g., the variance has units of meters squared.
For any random variable X, the variance of X is nonnegative, i.e.,

Var ( X ) = E( X ´ µ)2 ě 0

4.3.1 §§ Variance: Properties


1. If c is a constant, then Var (c) = 0

2. Var [cX ] = c2 Var [ X ]

3. Var [ X + c] = Var [ X ].

4. Var ( X ) ě 0

5. If X1 , . . . , Xn are independent random variables, and a1 , . . . , an are constants, then

Var ( a1 X1 + ¨ ¨ ¨ + an Xn ) = a21 Var ( X1 ) + ¨ ¨ ¨ + a2n Var ( Xn )

6. The variance operator is not linear, but it is straightforward to determine the variance
of a linear function of a random variable. For any random variable X and any constants
a and b, let Y = aX + b, then Y is also a random variable and

Var (Y ) = Var ( aX + b) = a2 Var ( X )

4.3.2 §§ Standard Deviation


Definition 4.3.2 (Standard Deviation).

The standard deviation σ of a random variable X is the square root of its variance. It can
be interpreted as a typical distance of the values of X from the mean.

σ = [ Var ( X ) ]^(1/2)

Standard deviation has the same units as X.

SD (Y ) = SD ( aX + b) = [ Var ( aX + b) ]^(1/2) = [ a^2 Var ( X ) ]^(1/2) = |a| ¨ SD ( X )

Example 4.3.3

(a) Ali and his brother both like chocolate chip cookies best. They have a jar of cookies
with 5 chocolate chip cookies, 3 oatmeal cookies, and 4 peanut butter cookies.
They are each allowed to have 3 cookies. To be fair, they agree to randomly
select their cookies without peeking, and they each must keep the cookies that
they select. What is the variance of the number of chocolate chip cookies that Ali
gets?
(b) A student was at work at the county amphitheater, and was given the task of
cleaning 1500 seats. To make the job more interesting, his boss hid a golden
ticket somewhere in the seats. The ticket is equally likely to be in any of the
seats. Let X be the number of seats cleaned until the ticket is found. Calculate
the variance of X.

Solution:

(a) Let X denote the number of chocolate chip cookies that Ali selects. As he is
allowed to have 3 cookies, therefore X = 0, 1, 2, 3

X        P( X )                                             X ¨ P( X )    X^2 ¨ P( X )
0        (5 choose 0)(7 choose 3)/(12 choose 3) = 7/44      0             0
1        (5 choose 1)(7 choose 2)/(12 choose 3) = 21/44     21/44         21/44
2        (5 choose 2)(7 choose 1)/(12 choose 3) = 7/22      14/22         28/22
3        (5 choose 3)(7 choose 0)/(12 choose 3) = 1/22      3/22          9/22
Total                                                       5/4           95/44


E( X ) = ř x j P( X = x j ) = 5/4

E( X 2 ) = ř x j ^2 P( X = x j ) = 95/44

Var ( X ) = E( X 2 ) ´ [ E( X )]2 = 95/44 ´ (5/4)^2 = 0.6

(b) Let X be the number of seats cleaned until the ticket is found. As there are 1500
seats, so the probability of finding the ticket is p = 1/1500. The student will
start cleaning the seats & move to clean the next seat only if he does not find the
ticket in the previous seat.
X        P( X )
1        1/1500
2        (1 ´ 1/1500) ˆ 1/1499 = 1/1500
3        (1 ´ 1/1500) ˆ (1 ´ 1/1499) ˆ 1/1498 = 1/1500
...      ...
1500     1/1500

E( X ) = 1/1500 ˆ (1 + 2 + ¨ ¨ ¨ + 1500)
= 1/1500 ˆ 1500(1500 + 1)/2
= 750.5

E( X 2 ) = 1/1500 ˆ (12 + 22 + ¨ ¨ ¨ + 15002 )


= 1/1500 ˆ 1500(1500 + 1)(2 ˆ 1500 + 1)/6
= 750750.1667

Var ( X ) = E( X 2 ) ´ [ E( X )]2
= 750750.167 ´ 750.5^2
= 187499.92

Example 4.3.3
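Part (a) can be double-checked with the two-moment recipe Var( X ) = E( X 2 ) ´ [ E( X )]2 . A Python sketch that rebuilds the pm f from the counting argument in the table above (math.comb is the binomial coefficient):

    from math import comb

    # pmf of the number of chocolate chip cookies Ali gets: 3 cookies drawn from 5 CC + 7 other
    pmf = {x: comb(5, x) * comb(7, 3 - x) / comb(12, 3) for x in range(4)}

    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x**2 * p for x, p in pmf.items())
    print(second_moment - mean**2)    # about 0.60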

Example 4.3.4 (Examples of Applications of Variance)


1. Uncertain effect of a change in the State Bank’s monetary policy on economy,


2. Variation in ocean temperatures at a single location indicates something about how
heat moves from place to place in the ocean.
3. Why different patches of the same forest have different plants on them.
Example 4.3.4

Example 4.3.5
Four cards are labeled $1, $2, $3, and $6. A player pays $4, selects two cards without re-
placement at random, and then receives the sum of the winnings indicated on the two cards.
Will the player win or lose money in the long run? What is the variance of the winning?
Solution:
Let X be the sum of the 2 cards that he selects without replacement. The probability of each
possible pair is P( X = x ) = 1/(4 choose 2) = 1/6.
2

X=Sum Y = X´4 P( X ) Y ¨ P( X )
(1,2)=3 3 ´ 4 = ´1 1/6 ´1/6
(1,3)=4 4´4 = 0 1/6 0
(1,6)=7 7´4 = 3 1/6 3/6
(2,3)=5 5´4 = 1 1/6 1/6
(2,6)=8 8´4 = 4 1/6 4/6
(3,6)=9 9´4 = 5 1/6 5/6
Total 1 12/6 = 2

Table 4.1
As the expected winning is E(Y ) = ř Y ¨ P( X ) = $2 (see Table 4.1), the player will win
money ($2 per play) in the long run.
Expected winning could also be calculated using expected value of the linear combination,
Y = X ´ 4.

E (Y ) = E ( X ´ 4 )
= E( X ) ´ 4
= 36/6 ´ 4
=2
Variance of the winning can be calculated using property of variance as given below:
Var ( X ´ 4) = Var ( X )
Var ( X ) = E( X 2 ) ´ [ E( X )]2
= 122/3 ´ 62
= 4.67


So the variance of the winnings is 4.67 (in squared dollars).


Example 4.3.5

IJ Important Discrete Probability Distribution


A probability distribution is a representation of random variables and the associated probabil-
ities of different outcomes. A probability distribution is characterized by a probability mass
function pm f for discrete, or by probability density function pd f for continuous random
variables respectively. Distributions relate probabilities to outcomes of random variables.
Distributions may be considered ’models’ for describing variables from populations. We will
introduce several well-known distributions along with properties of the random variables, i.e.,
expected value, variance, etc.

IJ Warning About Notation!


It is traditional to write X ∼ F to indicate that X has distribution F. This is unfortunate
notation since the symbol ∼ is also used to denote an approximation. Read X ∼ F as ’X has
distribution F’ not as ’X is approximately F’ .

4.4 IJ Bernoulli Distribution

4.4.1 §§ Conditions for Bernoulli Variable


1. A Bernoulli random variable models random experiments that have two possible out-
comes, sometimes referred to as ’success’ and ’failure.’ Bernoulli random variable is
called an indicator variable

2. This random variable can only take two possible values, usually 0 and 1.

(a) Toss a coin. The outcome is ether heads ( X = 1) or tails ( X = 0).

(b) Taking a pass-fail exam; either pass ( X = 1) or fail ( X = 0).

(c) A newborn child is either male ( X = 1) or female ( X = 0).

3. The probabilities of success p and of failure q = 1 ´ p are positive such that p + q = 1


4.4.2 §§ Probability Mass Function (pm f )

Definition 4.4.1 (Bernoulli Distribution (pm f )).

A random variable X „ Bernoulli ( p), where p, the probability of success, is the
only parameter for this distribution. The pm f for the Bernoulli distribution is:

P( X = x ) = p^x (1 ´ p)^(1´x) , x = 0, 1

Also written as P( X = 0) = 1 ´ p and P( X = 1) = p.

In applications, Bernoulli random variables often occur as indicators; see 4.2.5.

4.4.3 §§ Bernoulli Distribution: Expectation & Variance

E( X ) = 0 ˆ (1 ´ p) + 1 ˆ p = p

The second moment is

E( X 2 ) = 0^2 ˆ (1 ´ p) + 1^2 ˆ p = p

Therefore

Var ( X ) = E( X 2 ) ´ [ E( X )]2 = p (1 ´ p )

Example 4.4.2
Thirty-eight percent of the songs on a student’s music player are rock songs. A student
chooses a song at random, with all songs equally likely to be chosen. Let X indicate whether
the selected song is a rock song. Find the expected number and variance of X.
Solution:
Let X be the indicator random variable with X = 1 if the selected song is a rock song, X = 0
otherwise. The probability of a rock song is p = 0.38.

P( X = 0) = 1 ´ 0.38 = 0.62 and P( X = 1) = 0.38


E( X ) = p
= 0.38
Var ( X ) = p(1 ´ p)
= 0.38 ˆ (1 ´ 0.38)
= 0.2356
Example 4.4.2

4.5 IJ Binomial Distribution


4.5.1 §§ Background Example
The possible outcomes in 3 tosses of a fair coin are:
S = tTTT, TTH, THT, THH, HTT, HTH, HHT, HHHu
We want to calculate the probability of getting 1 head in 3 tosses, e.g.,
P( TTH ) = (1/2)^2 ˆ (1/2) = 1/8

P(1 H in 3 tosses) = 1/8 + 1/8 + 1/8 = 3/8
What will be the probability of obtaining 1 H in 50 tosses? Do we write down the sample
space for the experiment for 50 tosses? It will be very tedious.

4.5.2 §§ Binomial Random Variable


In 3 flips of a fair coin example given above, since each flip is a random variable (Bernoulli),
the sum is also a random variable. This new random variable is the binomial random variable.
a. Count the number of successes X that occur in n independent Bernoulli trials; then X
is said to be a binomial random variable with parameters (n, p).
b. Bernoulli random variable is just a binomial random variable with parameters (1, p)
c. If X1 , X2 , . . . , Xn are chosen independently and each has the Bernoulli( p) distribution,
and Y = X1 + ¨ ¨ ¨ + Xn , then Y will have the Binomial (n, p) distribution.

§§ Practical Life Examples of Binomial Distribution


a. Each item from manufacturing production line can be either defective or non-defective.
b. Chances of profit and loss in Stock Market.
c. Proportion of students who pass the exam.
d. Proportion of people who have recovered from COVID-19.
e. Proportion of voters who favored Biden in the US election.


4.5.3 §§ Conditions for Binomial Distribution


Definition 4.5.1.

Binomial Experiment

1. A random experiment with fixed number of trials, i.e., n.

2. Trials are independent & identical.

3. Each trial results in one of 2 possible outcomes, i.e., ”success”, or a ”failure”.

4. The probabilities of success p and of failure (q = 1 ´ p) are constant across


trials.

4.5.4 §§ Probability Mass Function (pm f )


Definition 4.5.2 (Binomial Distribution (pm f )).

A random variable X „ Bin (n, p), where n and p are the parameters of the
Binomial distribution. The pm f for the Binomial distribution is:

P( X = x ) = (n choose x) p^x (1 ´ p)^(n´x)

1. n is the fixed number of trials.

2. x is the number of successes in n trials, i.e., x = 0, 1, . . . , n.

3. p is the probability of success on any given trial.


 
4. (n choose x) is the binomial coefficient.
5. n & p are the parameters of the binomial distribution.

Example 4.5.3
A particular concentration of a chemical found in polluted water has been found to be lethal
to 20% of the fish that are exposed to the concentration for 24 hours. Ten fish are placed in
a tank containing this concentration of the chemical in water.

(a). Find the probability that at least 8 survive.

(b). Find the probability that at most 6 survive.

Solution:
n = 10. The chemical is lethal to 20% of exposed fish, so the probability that a fish survives
is p = 1 ´ 0.20 = 0.80. Let X be the number of fish (out of the 10) that survive.

(a).

P( X ě 8) = P( X = 8) + P( X = 9) + P( X = 10)
= (10 choose 8) 0.8^8 (1 ´ 0.8)^2 + (10 choose 9) 0.8^9 (1 ´ 0.8)^1 + (10 choose 10) 0.8^10 (1 ´ 0.8)^0
= 0.3020 + 0.2684 + 0.1074
= 0.6778

(b).

P( X ď 6) = 1 ´ P( X ą 6)
= 1 ´ [ P( X = 7) + P( X = 8) + P( X = 9) + P( X = 10) ]
= 1 ´ [ 0.2013 + 0.3020 + 0.2684 + 0.1074 ]
= 0.1209

Example 4.5.3
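A quick numerical check of part (a) (a Python sketch; p = 0.8 is the survival probability used above, and math.comb gives the binomial coefficient):

    from math import comb

    def binom_pmf(n, p, x):
        # P(X = x) for X ~ Bin(n, p)
        return comb(n, x) * p**x * (1 - p)**(n - x)

    p_at_least_8 = sum(binom_pmf(10, 0.8, x) for x in (8, 9, 10))
    print(p_at_least_8)    # about 0.678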

Example 4.5.4
An airline estimates that 5% of the people making reservations on a certain flight will not
show up. Consequently, their policy is to sell 84 tickets for a flight that can only hold 80
passengers. What is the probability that there will be a seat available for every passenger
that shows up?
Solution:
P(No show) = 0.05; therefore P(Show) = 0.95; n = 84. Let X be the number of ticket holders who show up.

P( X ď 80) = 1 ´ P( X ą 80)
= 1 ´ [ P( X = 81) + P( X = 82) + P( X = 83) + P( X = 84) ]
= 0.6103

There will be a 61.03% chance of a seat being available for everyone who shows up.
Example 4.5.4

Example 4.5.5
A hospital receives 1/5 of its COVID-19 vaccine shipments from Moderna and the remainder
of its shipments from Pfizer. Each shipment contains a very large number of vaccine vials.
For Moderna shipments, 10% of the vials are ineffective, while for Pfizer, 2% of the vials are


ineffective. The hospital tests 30 randomly selected vials from a shipment and finds that one
vial is ineffective. What is the probability that this shipment came from Moderna?
Solution:
Let M be the event that shipment is from Moderna, P be the event that shipment is from
Pfizer, while I be the event that shipment is ineffective.
We are given P( M ) = 1/5; P( P) = 1 ´ 1/5 = 4/5; n = 30. Let X be the number of
ineffective vials in the sample of size 30.
 
1. P( I|M) = 0.10; P( X = 1|M ) = (30 choose 1) 0.10^1 (1 ´ 0.10)^29 = 0.141

2. P( I|P) = 0.02; P( X = 1|P) = (30 choose 1) 0.02^1 (1 ´ 0.02)^29 = 0.334
P( M|X = 1) asks for an updated probability that the shipment came from Moderna.

P( M|X = 1) = P( X = 1|M ) P( M ) / P( X = 1)
= P( X = 1 X M ) / [ P( X = 1 X M ) + P( X = 1 X P) ]
= (1/5 ˆ 0.141) / (1/5 ˆ 0.141 + 4/5 ˆ 0.334)
= 0.0954

There is a 9.54% chance that this shipment came from Moderna.
Example 4.5.5

Example 4.5.6
The probability of a student passing an exam is 0.2. How many students must take the exam
to make the probability 0.99 that any number of students will pass the exam?
Solution:
p = 0.2; n =?. Let X be the number of students who pass. Any number of students will
pass, means that at least 1 student will pass the exam.
P( X ě 1) = 1 ´ P( X ă 1)
0.99 = 1 ´ (n choose 0) 0.2^0 (1 ´ 0.2)^(n´0)
0.99 = 1 ´ 0.8^n
0.8^n = 1 ´ 0.99 = 0.01
n = log(0.01)/log(0.8)
= 20.6377
Therefore n « 21. So 21 students must take the exam so that the probability that at least
one of them passes is 0.99.
Example 4.5.6


4.5.5 §§ Shape of Binomial Distribution

Binomial distribution is unimodal.

1. If p ă 0.5 the distribution will exhibit POSITIVE SKEWNESS, as shown in Figure


4.5.1

2. for p ą 0.5 the distribution will exhibit NEGATIVE SKEWNESS, (see, Figure 4.5.2).

3. if p = 0.5 the distribution will be SYMMETRIC, (see, Figure 4.5.3).

4. if n Ñ 8 binomial distribution becomes symmetric & bell-shaped, (see, Figure 4.5.4).


This is an important result that is central to the Central Limit Theorem.

Figure 4.5.1.

Binomial pm f with n = 15, p = 0.2 (number of successes on the horizontal axis, probability on the vertical axis); the distribution is positively skewed.


Figure 4.5.2.

Binomial pm f with n = 15, p = 0.8; the distribution is negatively skewed.

Figure 4.5.3.

Binomial pm f with n = 15, p = 0.5; the distribution is symmetric.


Figure 4.5.4.

Binomial pm f with n = 40, p = 0.2; with the larger n the distribution is nearly symmetric and bell-shaped.

4.5.6 §§ Binomial Distribution: Expectation & Variance


The expectation and variance of a binomial random variable are summarized below:

E( X ) = ř x (n choose x) p^x (1 ´ p)^(n´x) = np, summing over x = 0, 1, . . . , n

Var ( X ) = E( X 2 ) ´ [ E( X )]2 = np(1 ´ p)

Example 4.5.7
A company is considering drilling four oil wells. The probability of success for each well is
0.40, independent of the results for any other well. The cost of each well is $200,000. Each
well that is successful will be worth $600,000. What is the expected gain?
Solution:
Let X be the number of successful wells, i.e., X = 0, 1, . . . , 4. n = 4; p = 0.4. X is a binomial
random variable. The cost is a fixed constant of $200,000. So the total cost of 4 wells is a
fixed constant, i.e., b = $800, 000. The worth of each successful well is a fixed constant of
a = $600, 000. Let Y be the gain from 4 wells. Then Y = aX ´ b


E( X ) = np
= 4 ˆ 0.4
= 1.6
E(Y ) = E( aX ´ b)
= aE( X ) ´ b
= 600, 000 ˆ 1.6 ´ 800, 000
= 160000

The expected gain from drilling 4 oil wells is $160,000.


Example 4.5.7

4.6 IJ Poisson Distribution


Many experimental situations occur in which we observe the counts of events within a set
unit of time, area, volume, length, etc.

§§ Background: A Case Study


The number of typing errors made by a typist has a Poisson distribution with an average of
four errors per page. If more than four errors appear on a given page, the typist must retype
the whole page. What is the probability that a randomly selected page needs to be retyped?

§§ Practical Life Examples of Poisson Distribution


The Poisson distribution is used as a model for counts of rare events, i.e., events which occur
infrequently in time, space, volume or any other dimension.

1. The number of airplanes that come into an airport in 2 hours.

2. The number of phone calls received by a telephone operator in a 10-minute period.

3. The number of typos per page made by a secretary.

4. The number of customers arriving in 1/2 an hour at a shop.

5. The number of defects in a certain sized carpet.

4.6.1 §§ Conditions for Poisson Variable


A random variable X has a Poisson distribution if the following conditions hold

1. X counts the number of events within a specified time or space, etc.

2. The events occur independently of each other.


3. Any 2 events can not happen exactly at the same time.

4. A Poisson random variable can take on any positive integer value, i.e., X = 0, 1, 2, . . ..
In contrast, the Binomial distribution always has a finite upper limit, i.e., X =
0, 1, 2, . . . , n.

Figure 4.6.1.


4.6.2 §§ Probability Mass Function (pm f )


Definition 4.6.1 (Poisson Distribution (pm f )).

A random variable X „ Poisson(λ), where λ is the only parameter of the Poisson
distribution. The pm f for the Poisson distribution is:

P( X = x ) = e^(´λ) λ^x / x! , x = 0, 1, 2, . . .
1. Here X is the number of events that occur during the specified 1 unit of time

2. λ, the average rate of events that occur during the specified 1 unit of time,
space, volume, etc is the parameter of the Poisson Distribution.

The pm f for Poisson distribution for various values of λ is shown in Figure 4.6.1.

4.6.3 §§ Poisson Distribution: Expectation and Variance

E( X ) = ř x e^(´λ) λ^x / x! = λ

Var ( X ) = E( X 2 ) ´ [ E( X )]2 = λ

The Poisson random variable is special in the sense that the mean and the variance are equal.

§§ Units in Poisson Probability


It is important to use consistent units in the calculation of probabilities, means, and variances
involving Poisson random variables, e.g., if there are 25 imperfections on average in 100 meters
of optical cable, then the
1. average number of imperfections in 10 meters of optical cable is 2.5, and the

2. average number of imperfections in 1000 meters of optical cable is 250.

Example 4.6.2

1. The number of typing errors made by a typist has a Poisson distribution with an average
of four errors per page. If more than four errors appear on a given page, the typist
must retype the whole page. What is the probability that a randomly selected page
needs to be retyped?


2. The number of meteors found by a radar system in any 30-second interval under speci-
fied conditions averages 1.81. Assume the meteors appear randomly and independently.
What is the probability that at least one meteor is found in a one-minute interval?

Solution:

1. λ = 4/page. Let X be the number of typing errors made. We need to calculate the
probability of retyping a randomly selected page, i.e., P( X ą 4)

P( X ą 4) = 1 ´ P( X ď 4)
= 1 ´ [ P( X = 0) + P( X = 1) + P( X = 2) + P( X = 3) + P( X = 4) ]
= 1 ´ [ e^(´4) 4^0/0! + e^(´4) 4^1/1! + e^(´4) 4^2/2! + e^(´4) 4^3/3! + e^(´4) 4^4/4! ]
= 0.3711

2. λ = 1.81/30-Second. Let X be the number of meteors. We need to calculate probability


of at least 1 meteor in one minute, i.e., P( X ě 1). Here time specified is one minute,
while unit of λ is per 30-second. We need to make sure that both are in the same units.

P( X ě 1) = 1 ´ P( X ă 1)
= 1 ´ P( X = 0)
= 1 ´ e^(´1.81ˆ2) (1.81 ˆ 2)^0 / 0!
= 0.9732

Remember that for calculation of probabilities of ’at least type’ or ’greater than type’ events
for Poisson distribution, you will always have to use ’Complement Rule of Probability’.
Example 4.6.2
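A Python sketch of the retyping calculation in part 1, using the Poisson pm f directly (exp and factorial come from the standard math module):

    from math import exp, factorial

    def poisson_pmf(lam, x):
        # P(X = x) for X ~ Poisson(lam)
        return exp(-lam) * lam**x / factorial(x)

    p_retype = 1 - sum(poisson_pmf(4, x) for x in range(5))
    print(p_retype)    # about 0.371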

4.6.4 §§ Poisson Approximation to the Binomial Distribution


Definition 4.6.3.

The Poisson distribution is used as an approximation to the binomial distribution


when the parameter n is large (i.e., n Ñ 8) while p is small (p Ñ 0):

e^(´λ) λ^x / x! « (n choose x) p^x (1 ´ p)^(n´x) ,

where λ « np. Poisson can be regarded as a limiting distribution of a binomial
random variable.


A rule of thumb: when n is large and np ă 7, we can use the Poisson approximation to the
binomial distribution to find approximate probabilities.
Example 4.6.4
5% of the tools produced by a certain process are defective. Find the probability that in a
sample of 40 tools chosen at random, exactly three will be defective. Calculate a) using the
binomial distribution, and b) using the Poisson distribution as an approximation.
Solution:
p = 0.05; n = 40; np = 2; λ « 2

(a) Binomial Distribution:

P( X = 3) = (40 choose 3) 0.05^3 (1 ´ 0.05)^(40´3) = 0.1851

(b) Poisson Distribution:

P( X = 3) = e^(´2) 2^3 / 3! = 0.180447

As np ă 7, the Poisson approximation to the binomial is quite accurate.


Example 4.6.4
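The two answers can be compared side by side with a short Python sketch (λ = np = 2, standard-library math only):

    from math import comb, exp, factorial

    n, p, x = 40, 0.05, 3
    lam = n * p

    binomial = comb(n, x) * p**x * (1 - p)**(n - x)
    poisson = exp(-lam) * lam**x / factorial(x)
    print(binomial, poisson)    # about 0.1851 and 0.1804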

4.6.5 §§ Comparison of Binomial & Poisson Distribution


Binomial: a fixed number of trials (n); a fixed probability of success ( p); random variable X = number of successes; possible values 0 ď X ď n; parameters n, p.

Poisson: a large number of trials; a very small probability of success ( p); random variable X = number of successes within a specified time, space, etc.; possible values X ě 0; parameter λ.

4.7 IJ Geometric Distribution


In some applications, we are interested in trying a binary experiment until we succeed.

§§ Background: A Case Study


How can we use probability to solve problems involving the expected number of trials before
we get the 1st success?
At a ’busy time’, a telephone exchange is very near capacity, so callers have difficulty
placing their calls. It may be of interest to know the number of attempts necessary in order
to make a connection. Let p = 0.05 be the probability of a connection during a busy time.

1. What is the probability you will have to make 5 attempts to make a successful call?

2. How many attempts do you expect to make for a successful call?


4.7.1 §§ Geometric Distribution Conditions

1. Instead of a pre-planned number of trials, we keep conducting Bernoulli trials, until we


finally get 1st success.

2. Other than that, the trials are independent and

3. Each trial can result in either a success (S) or a failure (F)

4. The probability of success p is constant for each trial.

Example 4.7.1
Toss a coin until you get a ’H’

1. P(H on 1st toss) = 1/2

2. P(T on 1st, H on 2nd toss) = (1/2) ¨ (1/2)

3. P(T on 1st 2 tosses, H on 3rd toss) = (1/2)^2 ¨ (1/2)

& so on until we get the 1st 'H'.

The probabilities of the number of tosses until 1st ’H’ are displayed in Figure 4.7.1.
Example 4.7.1


Figure 4.7.1.

Geometric Distribution: pm f for coin toss until 1st head. The horizontal axis is x, the number of tosses up to and including the 1st head; the probabilities P( x ) = (1/2)^x decrease geometrically from 1/2.


4.7.2 §§ Probability Mass Function (pm f )

Definition 4.7.2 (Geometric Distribution (pm f )).

The geometric distribution is also constructed from independent Bernoulli trials,


but from an infinite sequence. On each trial, a success occurs with probability
p, and X is the total number of trials up to and including the first success, i.e.,
x = 1, 2, 3, . . .. From the independence of the trials, this occurs with probability
No of Trials ( X )     P( X )
1                      p
2                      (1 ´ p) p
3                      (1 ´ p)^2 p
...                    ...

The terms in this pm f form a geometric sequence as in Figure 4.7.1, that is why
the distribution is called Geometric Distribution. In general,

P( X = n) = (1 ´ p)^(n´1) p

Some references define the Geometric distribution as the number of failures before the
1st success, i.e., the number of failures = the number of trials ´ 1.

Example 4.7.3
A driver is eagerly eyeing a precious parking space some distance down the street. There are
five cars in front of the driver, each of which has a probability 0.2 of taking the space. What
is the probability that the car immediately ahead of the driver will enter the parking space?
Solution:
p = 0.2. There are five cars in front, and the car immediately ahead of the driver takes the
space only if the four cars before it pass it up, so the required probability is P( X = 5):

P( X = 5) = (1 ´ 0.2)^4 ˆ 0.2 = 0.082

Example 4.7.3


Figure 4.7.2.

Geometric Distribution: cd f for coin toss until 1st head. The horizontal axis is x, the number of tosses up to and including the 1st head; F ( x ) rises from 1/2 towards 1 as a step function.

4.7.3 §§ Geometric Distribution: Cumulative Distribution Function cd f


There is a useful closed-form formula for the cumulative distribution function (cd f ), i.e.,
P( X ď k ). The sample space S can be decomposed into 2 mutually exclusive events:

1. The event X ď k denotes that the first success occurs within the first k attempts.

2. Its complement is the event X ą k that there are no successes in any of the first k
attempts, which has probability q^k . You will only need more than k attempts if the 1st
k attempts all resulted in failure.

Therefore P( X ď k ) = 1 ´ P( X ą k ) = 1 ´ q^k , where q = 1 ´ p.
The cd f for coin toss until 1st H is shown in Figure 4.7.2. The cd f is a step function
which follows the properties of cd f .
Example 4.7.4
Assume that the probability of a specimen failing during a given experiment is 0.1. What
is the probability that it will take more than three specimens to have one surviving the
experiment?


Solution:
Let X be the number of specimens. Probability of surviving p = 1 ´ 0.1 = 0.9. We are
interested in P( X ą 3) to get one specimen surviving the experiment.
We can calculate the required probability using 2 approaches as explained below:

1. X > 3 in the Geometric setting means that none of the 1st 3 specimens tested survived the
   experiment.

2. Let Y be the number of specimens surviving, i.e., successes. We can fix n = 3 & find the
   probability that none of the specimens survived among the 1st 3 tested in the binomial
   experiment. That is, P(X > 3) = P(Y = 0).

1. Geometric Distribution:

   P(X > 3) = 1 − P(X ≤ 3)
            = 1 − (1 − 0.1³)
            = 0.001

2. Binomial Distribution:

   P(X > 3) = P(Y = 0)
            = C(3, 0) × 0.9⁰ × (1 − 0.9)^(3−0)
            = 0.001

Example 4.7.4
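Both routes to the answer can be verified in a few lines; scipy.stats is assumed here purely for illustration. geom.sf gives the survival function P(X > k), and binom.pmf gives the Binomial probability used in the second approach.

    # Check of Example 4.7.4 with success probability p = 0.9
    from scipy.stats import geom, binom

    p = 0.9
    print(geom.sf(3, p))       # P(X > 3) = 0.001
    print(binom.pmf(0, 3, p))  # P(Y = 0) = 0.001, the Binomial route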

4.7.4 §§ Geometric Distribution: Expectation and Variance

The geometric distribution with parameter p has an expected value and a variance of

E(X) = Σ_{i=1}^{∞} i (1 − p)^(i−1) p = 1/p

Var(X) = E(X²) − [E(X)]²
       = (1 − p)/p²

The expected number of trials required to obtain the 1st success is 1/p. The fact that E(X) is the
reciprocal of p is intuitively appealing, since it says that small values of p = P( A) require
many repetitions in order to have an event A occur.


Definition 4.7.5 (Geometric Distribution: Memoryless Property).

The geometric distribution has the memoryless (forgetfulness) property. The dis-
tribution would be exactly the same regardless of the past, i.e.,

P(X > n + k | X > k) = P(X > n + k ∩ X > k) / P(X > k)
                     = P(X > n + k) / P(X > k)
                     = (1 − p)^(n+k) / (1 − p)^k
                     = (1 − p)^(n+k−k)
                     = (1 − p)^n
                     = P(X > n)

P(X > n) = (1 − p)^n is the probability that it takes more than n trials to get the 1st
success, i.e., all of the first n trials resulted in failure. Use of this property
simplifies conditional probability problems!

Example 4.7.6
The Super Breakfast Challenge (SBC) is known to be very difficult to consume. Only 10%
of people are able to eat all of the SBC.

1. How many people are needed, on average, until the first successful customer?

2. What is the variance of the number of people needed?

3. Given that the first 4 are unsuccessful, what is the probability at least 8 are needed?

Solution:
Let X be the number of people required until the first successful customer

1. E( X ) = 1/p = 1/0.1 = 10

2. Var ( X ) = (1 ´ p)/p2 = (1 ´ 0.1)/0.12 = 90

3. P(X ≥ 8 | X > 4) = P(X > 7 | X > 4) = P(X > 3) = (1 − 0.1)³ = 0.729. Remember
   that we converted X ≥ 8 to X > 7 in order to use the memoryless property.

Example 4.7.6
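As a sketch (again assuming scipy.stats, which is not part of the text), the mean, variance and the memoryless step of this example can be confirmed as follows.

    from scipy.stats import geom

    p = 0.1
    print(geom.mean(p))                   # E(X) = 1/p = 10
    print(geom.var(p))                    # Var(X) = (1 - p)/p^2 = 90
    # Memoryless property: P(X >= 8 | X > 4) = P(X > 7)/P(X > 4) = P(X > 3)
    print(geom.sf(7, p) / geom.sf(4, p))  # 0.729
    print(geom.sf(3, p))                  # 0.729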


4.8 IJ Negative Binomial Distribution

§§ A Case Study

A coach wants to put together an intramural basketball team, from people living in a large
dorm. She estimates that 12% of people in the dorm like to play basketball. She goes door to
door to ask people if they would be interested in playing on the team. What is the probability
that she needs to

1. interview 20 dorm residents to find 1 willing to play?

2. talk to 20 people, in order to find 5 people who will join the team?

3. How many dorm residents does she expect to interview before finding 5 people to create
the team?

4.8.1 §§ Probability Mass Function (pm f )

Example 4.8.1 (A Coin Toss Scenario)


We fix a positive integer r ≥ 1, and toss the coin until the rth head appears. Figure 4.8.1
shows the pm f for different values of r.

Example 4.8.1


Figure 4.8.1.

[Plot of prob against x: Number of Tosses to get r Heads, for r = 2, 5, 10]

Negative Binomial Distribution pm f for different values of number of heads r


Definition 4.8.2 (Negative Binomial Distribution (pm f )).

The geometric distribution can be generalized to situations in which the quantity


of interest is the number of trials X required up to and including the rth success. In
this case, the appropriate distribution is the negative binomial distribution, which
has a probability mass function given by
 
P(X = x) = C(x−1, r−1) p^r (1 − p)^(x−r),   where x = r, r + 1, . . . ,

where r and p are the 2 parameters of the negative binomial distribution: the
number of successes r ≥ 1 & the probability of success p, which is fixed from trial to trial.
The negative binomial distribution is also known as the Pascal distribution.

1. Negative Binomial distribution is a more general version of Geometric probability dis-


tribution. That is we conduct i.i.d.1 Bernoulli trials until rth success appears, where
r is specified in advance. For r = 1 the negative binomial distribution converts to
Geometric Distribution.
2. The experiment consists of X number of repeated trials to produce r successes in such
experiment
3. The Negative Binomial can also be defined in terms of the number of failures until the
rth success.

4.8.2 §§ Negative Binomial Distribution: Expected Value and Variance


E(X) = Σ_{x=r}^{∞} x C(x−1, r−1) p^r (1 − p)^(x−r) = r/p

Var(X) = E(X²) − [E(X)]²
       = r(1 − p)/p²
i.e., the expected value and variance of the number of trials that it takes to get r successes.
When r = 1, then mean and variance transform to the mean and variance of the Geometric
Distribution.
Example 4.8.3

1. A curbside parking facility has a capacity for 3 cars. Determine the probability that it
will be full within 10 minutes. It is estimated that 6 cars will pass this parking space
within the time span and, on average, 80% of all cars will want to park there.

1 independent & identically distributed


2. A public relations intern realizes that she forgot to assemble the consumer panel her
boss asked her to do. She panics and decides to randomly ask (independent) people if
they will work on the panel for an hour. Since she is willing to pay them for their work,
she believes she will have a 75% chance of people agreeing to work with her. Find the
probability that she will need to interview at least 10 people to find 5 willing to work
on the panel?

Solution:

1. The desired probability is simply the probability that the number of cars until the third
success (taking the parking space) is less than or equal to 6, i.e., we need to compute
the cd f . Let X be the number of cars to the third success, then X has a negative
binomial distribution with r = 3 and p = 0.8.

P(X ≤ 6) = P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)
         = C(3−1, 3−1) 0.8³ (1 − 0.8)^(3−3)
         + C(4−1, 3−1) 0.8³ (1 − 0.8)^(4−3)
         + C(5−1, 3−1) 0.8³ (1 − 0.8)^(5−3)
         + C(6−1, 3−1) 0.8³ (1 − 0.8)^(6−3)
         = 0.983

2. The desired probability is simply the probability that the number of people to ask to
get the fifth success is at least 10, i.e., P( X ě 10). If X is this number, it has a negative
binomial distribution with r = 5 and p = 0.75.

P(X ≥ 10) = 1 − P(X < 10)
          = 1 − [P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9)]
          = 1 − [ C(5−1, 5−1) 0.75⁵ (1 − 0.75)^(5−5)
                + C(6−1, 5−1) 0.75⁵ (1 − 0.75)^(6−5)
                + C(7−1, 5−1) 0.75⁵ (1 − 0.75)^(7−5)
                + C(8−1, 5−1) 0.75⁵ (1 − 0.75)^(8−5)
                + C(9−1, 5−1) 0.75⁵ (1 − 0.75)^(9−5) ]
          = 0.0489

Alternate Method: This problem can be solved based on the intuitive concept that she
will only need to interview 10 or more people if she fails to get the required number of
people willing to work for her, i.e., less than 5 people are willing to work from the 9
people she would have interviewed. So n = 9. Let Y be the number of people willing


to work.

P(X ≥ 10) = P(Y < 5)
          = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4)
          = C(9, 0) 0.75⁰ (1 − 0.75)^(9−0) + C(9, 1) 0.75¹ (1 − 0.75)^(9−1)
          + C(9, 2) 0.75² (1 − 0.75)^(9−2) + C(9, 3) 0.75³ (1 − 0.75)^(9−3)
          + C(9, 4) 0.75⁴ (1 − 0.75)^(9−4)
          = 0.0489

Example 4.8.3
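Both parts of this example can be checked numerically. The sketch below uses scipy.stats.nbinom (an assumed tool, not part of the course pack); note that scipy counts the number of failures before the rth success, so the number of trials X used in the text corresponds to failures + r.

    from scipy.stats import nbinom

    # Part 1: r = 3, p = 0.8; P(X <= 6) = P(failures <= 6 - 3)
    print(nbinom.cdf(6 - 3, 3, 0.8))    # ~0.983

    # Part 2: r = 5, p = 0.75; P(X >= 10) = P(failures >= 10 - 5)
    print(nbinom.sf(9 - 5, 5, 0.75))    # ~0.0489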

4.8.3 §§ Comparison of Binomial and Negative Binomial Models


§§ Relationship between the Binomial and Negative Binomial Models
• Let X have a Negative Binomial distribution with parameters r and p. (That is, X =
number of Bernoulli trials required to obtain r successes with P(success) = p.)

• Let Y have a binomial distribution with parameters n and p. (That is, Y = number of
successes in n Bernoulli trials with P(success) = p.)

Then the following relationships hold:

( a ) P ( X ď n ) = P (Y ě r )
( b ) P ( X ą n ) = P (Y ă r )

Binomial                                     Negative Binomial
Fixed number of trials (n)                   Fixed number of successes (r)
Fixed probability of success (p)             Fixed probability of success (p)
Random variable: X = number of successes     Random variable: X = number of trials until the rth success
Possible values: 0 ≤ X ≤ n                   Possible values: r ≤ X

1. The negative binomial gets its name as it does the opposite of what the binomial does.

2. Geometric distribution is a special case of the negative binomial distribution when you
get the 1st success.


4.9 IJ Hypergeometric Distribution

§§ A Case Study: Dependent Bernoulli Trials

Figure 4.9.1.

[An urn containing 20 balls: 10 black and 10 red]

1. Draw a ball from the urn, note the color of the ball and don’t replace it back in the
urn.

P(1st black ball) = 10/20

2. Draw a 2nd ball without replacement from the urn, note the color of the ball,

P(1st 2 balls both black) = 10/20 × 9/19

& so on until the n = 10th ball is drawn.

3. Find the probability that 10 black balls are obtained?

The hypergeometric distribution pm f for this case study is shown in Figure 4.9.2.


Figure 4.9.2.

[Plot of P(x) against x: # of Black Balls from a bag with 10 R & 10 B Balls, for hypergeom(20, 10, 10)]

4.9.1 §§ Conditions for Hypergeometric Distribution


1. Useful in experiments where n elements are picked at random without replacement
from a small finite population of size N

2. The population of interest is dichotomized: Success (S) & Failure (F)

3. It describes the probability of x successes in n draws without replacement from a finite


population of size N containing exactly M successes.


                  Success     Failure                  Total
Samples Drawn     x           n − x                    n
Not drawn         M − x       (N − M) − (n − x)        N − n
Total             M           N − M                    N

4.9.2 §§ Probability Mass Function (pm f )


Definition 4.9.1 (Hypergeometric Distribution (pm f )).

X ∼ hypergeom(N, M, n)

P(X = x) = C(M, x) C(N−M, n−x) / C(N, n),   where x = 0, 1, . . . , min(M, n)     (4.9.1)

Parameters

1. Sample Size: n where (1 ď n ď N )

2. Population Size: N where ( N ě 1)

3. Number of Successes: M where ( M ě 1)

Example 4.9.2
In a group of 25 factory workers, 20 are low-risk and 5 are high-risk. Two of the 25 factory
workers are randomly selected without replacement. Calculate the probability that exactly
one of the two selected factory workers is low-risk.
Solution:
N = 25; M = 5; n = 2

P(X = 1) = C(20, 1) C(5, 1) / C(25, 2)
         = 0.3333

Example 4.9.2
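A one-line numerical check (using scipy.stats.hypergeom as an illustrative assumption): scipy's argument order is (population size N, number of successes M in the population, sample size n).

    from scipy.stats import hypergeom

    print(hypergeom.pmf(1, 25, 5, 2))   # P(X = 1) = 0.3333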

4.9.3 §§ Hypergeometric Distribution: Expected Value and Variance


 
E(X) = n (M/N)

Var(X) = n (M/N) ((N − M)/N) ((N − n)/(N − 1))


The factor (N − n)/(N − 1) is called the finite population correction. For a fixed sample size n, as N → ∞
it is clear that the correction goes to 1, i.e., for infinite populations the hypergeometric
distribution can be approximated by the Binomial.
Example 4.9.3
A college student is running late for his class. He has 12 folders on his desk, 4 of which in-
clude assignments due today. Without taking time to look, he accidentally grabs just 3 folders
from his stack. When he gets to class, he counts how many of them contain his homework
assignments. What is the probability at least 2 of the 3 folders contain his assignments?
Solution:
N = 12; M = 4; n = 3

P(X ≥ 2) = P(X = 2) + P(X = 3)
         = C(4, 2) C(8, 1) / C(12, 3) + C(4, 3) C(8, 0) / C(12, 3)
         = 0.2363

Example 4.9.3

4.9.4 §§ Binomial Approximation to Hypergeometric Distribution


Let X have a hypergeometric distribution with pm f as defined in equation 4.9.1. Let
p = M/N, q = 1 − p. Then as N → ∞, we have

P(X = x) = C(M, x) C(N−M, n−x) / C(N, n) ≈ C(n, x) p^x (1 − p)^(n−x)

Rule of Thumb: For very large population size N, if the sample size n is at most 5% of the
population size and sampling is without replacement, then the experiment may be analyzed
as if it were a binomial experiment. The probability of success p in this case is approximated
as M/N « p.
Example 4.9.4
A nationwide survey of 17,000 college seniors by the University of Michigan revealed that
almost 70% disapprove of daily smoking. If 18 of these seniors are selected at random and
asked their opinion, what is the probability that more than 9 but fewer than 14 disapprove
of smoking daily?
Solution:
N = 17,000; p = 0.70; n = 18; n/N = 18/17000 ≈ 0.001. As n ≤ 0.05N, we
can effectively use the binomial approximation to the hypergeometric.


P(9 < X < 14) = P(X = 10) + P(X = 11) + P(X = 12) + P(X = 13)
              = C(18, 10) 0.7¹⁰ (1 − 0.7)^(18−10)
              + C(18, 11) 0.7¹¹ (1 − 0.7)^(18−11)
              + C(18, 12) 0.7¹² (1 − 0.7)^(18−12)
              + C(18, 13) 0.7¹³ (1 − 0.7)^(18−13)
              = 0.6077

Example 4.9.4
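The quality of this approximation can also be seen directly in a short sketch (scipy.stats assumed; the exact number of disapproving seniors M is taken to be roughly 0.70 × 17,000, which the survey only states approximately).

    from scipy.stats import hypergeom, binom

    N, n, p = 17000, 18, 0.70
    M = round(p * N)                    # assumed number of disapproving seniors, ~11,900

    exact  = sum(hypergeom.pmf(k, N, M, n) for k in range(10, 14))
    approx = sum(binom.pmf(k, n, p) for k in range(10, 14))
    print(exact, approx)                # both approximately 0.6077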
In general, it is a bit difficult to decide the appropriate distribution in a particular scenario.
Students should practice problems that will provide them with some skills for making correct
decisions. Figure 4.9.3 might be useful in making a correct choice.

Figure 4.9.3.


4.10 IJ Home Work


In each problem given below, also write down the name & the parameters of a distribution
applicable if any.
1. Ten motors are packaged for sale in a certain warehouse. The motors sell for $100 each,
but a double-your-money-back guarantee is in effect for any defectives the purchaser
may receive. Find the expected net gain for the seller if the probability of any one
motor being defective is .08. (Assume that the quality of any one motor is independent
of the others.)

2. The demand for a particular type of pump at an isolated mine is random and indepen-
dent with an average demand of 2.8 pumps in a week (7 days). Further supplies are
ordered each Tuesday morning and arrive on the weekly plane on Friday morning. Last
Tuesday morning only one pump was in stock, so the storesman ordered six more to
come on Friday morning. Find the probability that stock will be exhausted and there
will be unsatisfied demand for at least one pump by Friday morning.

3. A salesperson has found that the probability of a sale on a single contact is approxi-
mately .03. If the salesperson contacts 100 prospects, what is the approximate proba-
bility of making at least one sale?

4. Used watch batteries are tested one at a time until a good battery is found. Let X
denote the number of batteries that need to be tested in order to find the first good
one. Find the expected value of X, given that P( X ą 3) = 0.5

5. A research study is concerned with the side effects of a new drug. The drug is given
to patients, one at a time, until two patients develop side effects. If the probability of
getting a side effect from the drug is 1/6, what is the probability that eight patients
are needed?

6. When drawing cards with replacement and re-shuffling, you bet someone that you can
draw an Ace within k draws. You want your chance of winning this bet to be at least
52%. What is the minimum value of k needed? What is the probability that you will
need at least ten draws to get 4 Aces?

7. A company is interested in evaluating its current inspection procedure for shipments


of 50 identical items. The procedure is to take a sample of 5 and pass the shipment
if no more than 2 are found to be defective. What proportion of shipments with 20%
defectives will be accepted?

§§ Answers
1. $840

2. 0.3374

3. 0.9524

4. « 5


5. « 0.0651

6. « 10; 0.0236

7. 0.9517

Chapter 5

Continuous Distributions

AS YOU READ . . .

1. What is a continuous random variable?

2. What is the continuous Uniform distribution and what are its parameters? In which
   scenarios can we use it to model probabilities?

3. What is the Normal distribution and what are its parameters? Why is the Normal
   distribution so widely applicable in practical life?

4. What is the Exponential distribution and what are its parameters? How can we use it to
   model the chances of waiting times?

5.1 IJ Continuous Random Variable


A random variable X is continuous if its possible values comprise a single interval on the
number line (for some A < B, any number x between A and B is a possible value).

§§ A Case Study
Consider daily rainfall in Karachi in July. Theoretically, using measuring equipment with
perfect accuracy, the amount of rainfall could take on any value e.g., between 0 and 5 inches.
Let X represent the amount of rainfall in inches. We might want to calculate probabilities
such as:

1. the amount of rainfall in Karachi in July this year would be less than 5 inches, i.e.,
P( X ă 5) or

2. the amount of rainfall in Karachi in July this year would be between 2-inches to 4-
inches, i.e., P(2 ď X ď 4).


The amount of rainfall X, being a continuous random variable, can take any value in an interval
of real numbers. This could even be an infinite interval such as (−∞, ∞). You can usually state
the beginning and end points, but there are infinitely many possible values
within that range, e.g., 2 ≤ X ≤ 4 (see Figure 5.1.1).

Figure 5.1.1.

[Density curve f(x) of the rainfall amount over the range 0 to 6 inches]

Distribution of Amount of Rainfall.

§§ Continuous Random Variable: Applications


Below are some real life applications of continuous random variables in different fields.
1. Medical Trials: the time until a patient experiences a relapse.

2. Sports: the length of a javelin throw.

3. Ecology: the lifetime of a tree.

4. Manufacturing: the diameter of a ball bearing.

5. Computing: the amount of time a Help Line customer spends on hold.

6. Physics: the time until a uranium atom decays.

7. Oceanography: the temperature of ocean water at a specified latitude, longitude and


depth.


Figure 5.1.2.

[Density curve f(x) with the area between x = 0.18 and x = 0.22 shaded]

5.2 IJ Continuous Probability Distribution


5.2.1 §§ Background
For continuous random variables, since there is an infinite number of possible values, we
describe the probability distribution with a smooth curve. Probability density function (pdf)
is an analogue of the probability mass function (pmf) for discrete random variable.
1. How likely is it that X falls between 0.18 and 0.22?
2. We can answer this question by considering the area under the curve between 0.18 and
0.22, as shown by shaded area in Figure 5.1.2
3. The overriding concept here for a continuous random variable is that AREA=PROBABILITY.
4. More specifically, the area under the pdf curve between points a and b is the same as
the probability that the random variable will have a value between a and b.
5. If f ( x ) is a known function, then we can answer this question by integration, i.e.,
P(0.18 ≤ X ≤ 0.22) = ∫_{0.18}^{0.22} f(x) dx


The S&P % returns displayed in Figure 5.2.1 show a real-life application of a continuous
probability distribution in finance.

Figure 5.2.1.

Definition 5.2.1 (Probability Density Function: pd f ).

1. A pd f for a continuous random variable is defined for all real numbers in the
range of the random variable.

2. More specifically, the area under the pd f curve between points a and b is the
same as the probability that the random variable will have a value between a
and b, (see Figure 5.2.2).

P(a ≤ X ≤ b) = ∫_a^b f(x) dx


Figure 5.2.2.

§§ Properties of pdf
For a continuous random variable X, a probability density function pdf denoted as f ( x ) is a
function such that:
1. Non-Negativity: f(x) ≥ 0, for all x

2. Unity: ∫_{−∞}^{∞} f(x) dx = 1

3. The probability that a continuous random variable X takes any specific value a is always
0, i.e., P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0.

4. This has a very useful consequence in the continuous case:

P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)

§§ Comparison of Properties of pdf with Properties of pm f

Continuous Random Variable             Discrete Random Variable
pd f : f(x)                            pm f : P(X)
f(x) ≥ 0, for all x                    0 ≤ P(X = x) ≤ 1, for all x ∈ S
∫_{−∞}^{∞} f(x) dx = 1                 Σ_i P(x_i) = 1

Example 5.2.2
Let a continuous random variable X have density function

f(x) = { A(1 − x²)   −1 < x < 1,
       { 0           elsewhere

1. Find the value of A for which f ( x ) would be a valid density function.

2. Find the probability that X will be more than 1/2 but lesser than 3/4.

3. Find the probability that X will be greater than 1/4.

4. Find the cd f , i.e., F ( X ).

Solution:

1. To find A we require
   ∫_{−∞}^{∞} f(x) dx = 1
   ∫_{−1}^{1} A(1 − x²) dx = A(x − x³/3) |_{−1}^{1}
                           = A(1 − 1/3 − (−1 + 1/3))
                           = A(2 − 2/3) = 1
   ∴ A = 3/4

2.
   P(1/2 ≤ X ≤ 3/4) = ∫_{1/2}^{3/4} (3/4)(1 − x²) dx
                    = (3/4)(x − x³/3) |_{1/2}^{3/4}
                    = 29/256

3.
   P(X ≥ 1/4) = ∫_{1/4}^{1} (3/4)(1 − x²) dx
              = (3/4)(x − x³/3) |_{1/4}^{1}
              = 81/256


4. An expression for F(x) is:

   F(x) = { 0,                    x < −1,
          { (2 + 3x − x³)/4,      −1 < x < 1,
          { 1,                    x ≥ 1

Example 5.2.2
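The integrals above are easy to confirm numerically; the sketch below uses scipy.integrate.quad, which is our own choice of tool rather than part of the text.

    from scipy.integrate import quad

    A = 3 / 4
    f = lambda x: A * (1 - x**2)

    print(quad(f, -1, 1)[0])        # 1.0, so A = 3/4 gives a valid density
    print(quad(f, 0.5, 0.75)[0])    # 0.11328... = 29/256
    print(quad(f, 0.25, 1)[0])      # 0.31640... = 81/256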

Example 5.2.3
The probability density function of the time to failure of an electronic component in a copier
(in hours) is

f(x) = { 0                  x < 0,
       { (1/2) e^(−0.5x)    for x ≥ 0
a. Determine the probability that a component fails in the interval from 1 to 2 hours.
b. At what time do we expect 50% of the components to have failed, i.e., median of the
distribution
Solution:
a.
   P(1 ≤ X ≤ 2) = ∫_1^2 (1/2) e^(−0.5x) dx
                = −e^(−0.5x) |_1^2
                = e^(−0.5) − e^(−1)
                = 0.2386
b. For the median of the distribution, we need to find the value of X that divides the
distribution into two halves, e.g., P(0 ď X ď x ) = 0.5.
   P(0 ≤ X ≤ x) = 0.5

   P(0 ≤ X ≤ x) = ∫_0^x (1/2) e^(−0.5t) dt
                = −e^(−0.5t) |_0^x
                = 1 − e^(−0.5x)

   ∴ 0.5 = 1 − e^(−0.5x)

Solving for x, we get x = 2 ln 2 ≈ 1.3863. Therefore, after about 1.39 hours we expect 50% of the
components to have failed.
Example 5.2.3
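Since the given density is exactly an exponential density with rate 0.5 (a fact used here only for checking), the answers can be confirmed with scipy.stats.expon, which parameterizes the distribution by scale = 1/rate; this is an illustrative sketch, not part of the course material.

    from math import log
    from scipy.stats import expon

    X = expon(scale=1 / 0.5)            # f(x) = 0.5 exp(-0.5 x), x >= 0
    print(X.cdf(2) - X.cdf(1))          # a. P(1 <= X <= 2) ~ 0.2386
    print(X.ppf(0.5), 2 * log(2))       # b. median = 1.3863 hours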


5.2.2 §§ Cumulative Distribution Function (cd f )

Definition 5.2.4 (Cumulative Distribution Function (cd f )).

We often need to compute the probability that the random variable X will be less
than or equal to a, i.e., P(X ≤ a), known as the cd f .

Continuous Case

F(a) = P(X ≤ a) = ∫_{−∞}^{a} f(x) dx

Discrete Case

F(a) = P(X ≤ a) = Σ_{x ≤ a} P(x)

§§ pd f from cd f
Discrete Case (pm f from cd f ): pm f was the jump size in the step function. The size of the
jump at any x can be written as

P_X(x_i) = F(x_i) − F(x_{i−1})

Continuous Case (pd f from cd f ): The density pd f is the derivative of the cd f

f(x) = (d/dx) F(x).

This holds at every x at which the derivative of F(x), denoted by F′(x), exists.

§§ Use of cd f to find the probabilities


In the continuous case, it is very useful to use the cd f to find probabilities using the formulas:

P(X > a) = 1 − P(X ≤ a) = 1 − F(a)

P(a ≤ X ≤ b) = F(b) − F(a)


Example 5.2.5
If X is a continuous random variable with cd f given by
F(x) = { 0                  x < 0,
       { 1 − e^(−0.5x)      for x ≥ 0

Find the pd f of X.
Solution:

f(x) = (d/dx) F(x) = { (1/2) e^(−0.5x)   x ≥ 0
                     { 0                 elsewhere

∴ f(x) = { 0                  x < 0,
         { (1/2) e^(−0.5x)    for x ≥ 0

Example 5.2.5

5.2.3 §§ Expectation

Definition 5.2.6 (Expectation).

Mathematically, the expected value E( X ) is defined as:

§§ Continuous Case

E(X) = ∫_{−∞}^{∞} x f(x) dx

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx

Graphically, E(X) is the point where the distribution balances, as shown in Figure
5.2.3.

§§ Discrete Case

E(X) = Σ_j x_j P(x_j)

E(g(X)) = Σ_j g(x_j) P(x_j)


Figure 5.2.3.

5.2.4 §§ Variance


Definition 5.2.7 (Variance σ2 ).

Mathematically, the variance of the random variable X denoted as σ2 or Var ( X ) is


defined as:

Var(X) = E(X − µ)²

§§ Continuous Case

Var(X) = E(X − µ)²
       = ∫_{−∞}^{∞} (x − µ)² f(x) dx
       = E(X²) − [E(X)]²

E(X²) = ∫_{−∞}^{∞} x² f(x) dx

Graphically, Var(X) is the spread of the values of the random variable around its
mean, as shown in Figure 5.2.4.

§§ Discrete Case

Var(X) = E(X − µ)²
       = E(X²) − [E(X)]²

E(X²) = Σ_j x_j² P(x_j)


Figure 5.2.4.

Example 5.2.8
Let a continuous random variable X have density function

f(x) = { A(1 − x²)   −1 < x < 1,
       { 0           elsewhere

Find the expected value and the variance of the distribution of X.


Solution:
The value of A was computed as 3/4 in Example 5.2.2.
E(X) = ∫_{−∞}^{∞} x f(x) dx
     = ∫_{−1}^{1} (3/4) x (1 − x²) dx
     = 0

E(X²) = ∫_{−∞}^{∞} x² f(x) dx
      = ∫_{−1}^{1} (3/4) x² (1 − x²) dx
      = 0.2

Var(X) = E(X²) − [E(X)]²
       = 0.2 − (0)²
       = 0.2


Example 5.2.8

Example 5.2.9
The probability density function of the weight of packages delivered by a post office is
f(x) = { 70/(69x²)   1 ≤ x ≤ 70,
       { 0           elsewhere
1. If the cost is $2.50 per pound, what is the mean shipping cost of a package?
2. Find the Variance of the distribution of the shipping cost.
Solution:
Let X be the weight of the package. Shipping cost per pound is $2.50. The total cost can be
defined as Y = 2.5X.
1.
   E(X) = ∫_{−∞}^{∞} x f(x) dx
        = ∫_1^{70} x · 70/(69x²) dx
        = 4.31

   The total cost is Y = 2.5X. Therefore, the mean shipping cost is

   E(Y) = 2.5 × 4.31
        = 10.775

   E(X²) = ∫_{−∞}^{∞} x² f(x) dx
         = ∫_1^{70} x² · 70/(69x²) dx
         = 70

2.
   Var(X) = E(X²) − [E(X)]²
          = 70 − (4.31)²
          = 51.42

   Var(Y) = Var(2.5X)
          = 2.5² Var(X)
          = 321.3994

Example 5.2.9
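The two moments can be confirmed by numerical integration (scipy.integrate assumed, illustrative only).

    from scipy.integrate import quad

    f = lambda x: 70 / (69 * x**2)                 # pdf of the package weight on [1, 70]

    EX  = quad(lambda x: x * f(x), 1, 70)[0]       # ~4.31
    EX2 = quad(lambda x: x**2 * f(x), 1, 70)[0]    # 70
    var_X = EX2 - EX**2                            # ~51.42
    print(2.5 * EX, 2.5**2 * var_X)                # E(Y) ~ 10.78 and Var(Y) ~ 321.4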


5.3 IJ Piecewise Distributions


Some distributions are not necessarily continuous, but they are continuous over particular
intervals. These types of distributions are known as Piecewise distributions, (see Figure
5.3.1).
Example 5.3.1
The pd f of a random variable X is given below and shown in Figure 5.3.1.
f(x) = { 3/4   0 ≤ x ≤ 1
       { 1/4   3 ≤ x ≤ 4
       { 0     otherwise

1. Find the cd f

2. Find E( x )

3. Find Var ( x )

Solution:

1. We integrate to find the cd f in five disjoint regions:


For a < 0, we have
   F(a) = ∫_{−∞}^{a} f(x) dx = 0

For 0 ≤ a ≤ 1, we have
   F(a) = ∫_0^a f(x) dx = ∫_0^a (3/4) dx = (3/4)a

For 1 ≤ a ≤ 3, we have
   F(a) = ∫_0^1 (3/4) dx + ∫_1^a 0 dx = 3/4

For 3 ≤ a ≤ 4, we have
   F(a) = ∫_0^1 (3/4) dx + ∫_1^3 0 dx + ∫_3^a (1/4) dx = 3/4 + 0 + (1/4)(a − 3)

For 4 ≤ a, we have
   F(a) = ∫_0^1 (3/4) dx + ∫_1^3 0 dx + ∫_3^4 (1/4) dx + ∫_4^a 0 dx = 3/4 + 1/4 = 1


Remember cd f by definition is cumulative probability from lower limit.


Thus, the cd f is:

   F(x) = { 0,                      for x < 0;
          { (3/4)x,                 for 0 ≤ x ≤ 1;
          { 3/4,                    for 1 ≤ x ≤ 3;
          { 3/4 + (1/4)(x − 3),     for 3 ≤ x ≤ 4;
          { 1,                      for x > 4

The cd f for this distribution is shown in Figure 5.3.2.


2.
   E(X) = ∫_{−∞}^{∞} x f(x) dx
        = ∫_0^1 x (3/4) dx + ∫_1^3 x · 0 dx + ∫_3^4 x (1/4) dx
        = 1.25

   E(X²) = ∫_{−∞}^{∞} x² f(x) dx
         = ∫_0^1 x² (3/4) dx + ∫_1^3 x² · 0 dx + ∫_3^4 x² (1/4) dx
         = 3.33

3.
   Var(X) = E(X²) − [E(X)]²
          = 1.77

Example 5.3.1
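A numerical check of the piecewise example (scipy assumed; the points argument simply tells the integrator where the density changes its formula).

    from scipy.integrate import quad

    def f(x):
        if 0 <= x <= 1:
            return 0.75
        if 3 <= x <= 4:
            return 0.25
        return 0.0

    EX  = quad(lambda x: x * f(x), 0, 4, points=[1, 3])[0]      # 1.25
    EX2 = quad(lambda x: x**2 * f(x), 0, 4, points=[1, 3])[0]   # ~3.33
    print(EX, EX2 - EX**2)                                      # mean 1.25, variance ~1.77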


Figure 5.3.1.

[Plot of the piecewise density f(x): 3/4 on [0, 1], 0 on (1, 3), 1/4 on [3, 4]]

pd f for Piecewise Distribution


Figure 5.3.2.

[Plot of F(x) increasing from 0 to 1 over the range 0 to 6]

cd f for Piecewise Distribution in Figure 5.3.1.

5.4 IJ Continuous Uniform Distribution

Uniform random variables are one of the most elementary continuous random variables.

§§ A Case Study

The total time to process a passport application by the state department is between 3 and 7
weeks. The interest might be in the expected time for processing an application. If
my passport needs renewal, what is the probability that my application will be processed in
5 weeks or less? Let X be the processing time; it is important to note that X is equally likely
to fall anywhere in the interval of 3-7 weeks, i.e., X has a constant density on this interval.


Figure 5.4.1.

5.4.1 §§ Probability Density Function


Definition 5.4.1 (Uniform Distribution (pd f )).

Let X be a continuous uniform random variable. The pd f of X „ U ( a, b) in the


interval a ď X ď b is defined as
f(x) = { 1/(b − a),   a ≤ x ≤ b
       { 0,           otherwise

Uniform distribution has 2 parameters a and b.

Figure 5.4.2 shows the Uniform density over the interval a and b. The random variable
X uniformly distributed on ( a, b), is equally likely to fall anywhere in this interval.


Figure 5.4.2.

[Uniform density: f(x) = 1/(b − a), constant on the interval from a to b]

The distribution is also called a Rectangular Distribution. U(0,1) is the most commonly
used uniform distribution.

5.4.2 §§ Cumulative Distribution Function (cd f )

Definition 5.4.2 (Uniform Distribution: (cd f )).

By definition, the cd f F(x) is the probability that X is at most x, i.e.,

F(x) = P(X ≤ x) = ∫_a^x 1/(b − a) dt = { 0,                  if x ≤ a
                                        { (x − a)/(b − a),    a < x < b
                                        { 1,                  if x ≥ b

Graphically, the Uniform cd f is displayed in Figure 5.4.3.


Figure 5.4.3.

[Uniform cd f : F(x) rises linearly from 0 at a to 1 at b]

It can be seen that the cd f grows linearly and saturates at 1.

5.4.3 §§ Uniform Distribution: Expectation and Variance


E(X) = ∫_{−∞}^{∞} x f(x) dx
     = ∫_a^b x · 1/(b − a) dx
     = (a + b)/2

The result should be intuitive because it says that the mean is the midpoint of the pd f .

Var(X) = E(X²) − [E(X)]²
       = (b − a)²/12.


Example 5.4.3
Suppose a bus always arrives at a particular stop between 8:00 AM and 8:10 AM. The density
is shown in Figure 5.4.4.
a. Find the probability that the bus will arrive tomorrow between 8:00 AM and 8:02 AM?
b. What is the expected time of the bus arrival?
c. Eighty percent of the time, the waiting time of a customer for the bus must fall below
what value?
d. If the bus did not arrive in the 1st 5 minutes, what is the probability that it will arrive
in the last 2 minutes?

Figure 5.4.4.

[Uniform density f(x) = 1/10 on the interval from 0 to 10 minutes]

Solution:
Let the random variable X be the waiting time on minutes scale.
a. Probability that the bus will arrive tomorrow between 8:00 AM and 8:02 AM, i.e.,
   P(X ≤ 2).

   P(X ≤ 2) = ∫_0^2 1/(b − a) dx
            = ∫_0^2 1/(10 − 0) dx
            = 2/10


There is a 20% chance that the bus will arrive tomorrow between 8:00 AM and 8:02
AM. It is also clear that, owing to uniformity in the distribution, the solution can be
found simply by taking the ratio of the length from 0 to 2 to the total length of the
distribution interval.

b.
   E(X) = (a + b)/2
        = 10/2
        = 5

   i.e., the bus is expected to arrive at 8:05 AM.

c. We need to find the 80th percentile, i.e.,

   P(X ≤ k) = 0.80
   ∫_0^k 1/(10 − 0) dx = k/10 = 0.80

   Solving for k, we get k = 8. Therefore, 80% of the time, the waiting time of a
   customer for the bus falls below 8 minutes, i.e., the bus arrives by 8:08 AM.

d. Here, the condition that the bus did not arrive in the 1st 5 minutes is given.

   P(X > 8 | X > 5) = P(X > 8 ∩ X > 5) / P(X > 5)
                    = P(X > 8) / P(X > 5)
                    = (2/10) / (5/10)
                    = 2/5

Example 5.4.3
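Each part of the bus example can be reproduced with scipy.stats.uniform (an assumed tool); loc is the left endpoint a and scale is the width b − a.

    from scipy.stats import uniform

    X = uniform(loc=0, scale=10)        # waiting time in minutes after 8:00 AM
    print(X.cdf(2))                     # a. P(X <= 2) = 0.2
    print(X.mean())                     # b. E(X) = 5 minutes
    print(X.ppf(0.8))                   # c. 80th percentile = 8 minutes
    print(X.sf(8) / X.sf(5))            # d. P(X > 8 | X > 5) = 0.4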

5.5 IJ Normal Distribution


Normal distribution is the most important of all continuous probability distributions and is
used extensively as the basis for many statistical inference methods. Its importance stems
from the fact that it is a natural probability distribution for directly modeling error distribu-
tions and many other naturally occurring phenomena. In addition, by virtue of the central
limit theorem, which is discussed in Chapter 6.2, the normal distribution provides a useful,
simple, and accurate approximation to the distribution of general sample averages.


§§ A Case Study
Smartphone batteries have an average lifetime of 1 year with a 1-month margin of error.
You buy a new phone; what is the chance that your phone battery does not work past 1
month? Or that it lasts at least 11 months? Such a random variable is expected to have a central
value around which most of the observations cluster, giving a bell-shaped (approximately normal)
distribution, also called a Gaussian distribution after the German mathematician Carl Gauss. Due to
the significance of his work, his picture and the normal pd f along with the normal curve were
displayed on German currency.

Figure 5.5.1.

5.5.1 §§ Probability Density Function (pd f )

Definition 5.5.1 (Normal Distribution (pd f )).

Let X be a Gaussian random variable, i.e., X „ N (µ, σ2 ). The probability density


function ( pd f ) of X is:

f(x; µ, σ) = 1/(σ√(2π)) · e^(−(x−µ)²/(2σ²)),   −∞ < x < ∞,
where

a. µ (the mean) is the location and σ (the standard deviation) is the scale pa-
rameter. µ is exactly the first moment and the variance σ2 is second central
moment of the random variable.

b. The constants π = 3.141593 and e = 2.71828.


§§ Cumulative Distribution Function cd f

F_X(x) = ∫_{−∞}^{x} 1/(σ√(2π)) · e^(−(t−µ)²/(2σ²)) dt

There is no closed-form solution for this cd f .

5.5.2 §§ Effect of Mean and Variance

Figure 5.5.2 shows the effect of the location parameter µ on the pd f and the Figure 5.5.3
shows the impact of the location parameter µ on the cd f of the simulated normal distribu-
tions. The scale parameter σ = 1 is kept constant in the 3 distributions simulated. It can
be observed that the pd f of the Gaussian moves left or right depending on the value of the
mean µ, i.e., the change in the value of µ shifts the location of the curves.

Figure 5.5.2.

[Normal pd f curves for µ = 0, 1, −1, each with σ = 1]


Figure 5.5.3.

[Normal cd f curves for µ = 0, 1, −1, each with σ = 1]

The impact of the scale parameter σ on the pd f is shown in Figure 5.5.4, and Figure
5.5.5 shows the corresponding impact of σ on the cd f of the normal distributions. The normal distribu-
tions presented in these figures were simulated with a constant mean µ = 0, but different
variances. Changing the standard deviation either tightens or spreads out the width of the
distribution along the X-axis. Larger standard deviations produce wider distributions. The
change in σ scales the distribution.


Figure 5.5.4.

[Normal pd f curves for σ = 1, 2, 0.5, each with µ = 0]

Figure 5.5.5.

[Normal cd f curves for σ = 1, 2, 0.5, each with µ = 0]

5.5.3 §§ Properties of Normal Distribution


Figure 5.5.6.

a. It is a symmetric, bell shaped distribution with total area under the curve being equal
to 1. This property is useful to solve practical application problems.

b. The mean, median and mode are all equal and located at the center of the distribution.

c. Maximum occurs at µ, 50% area lies to either side of the mean µ.

d. The inflection points are located at µ ´ σ and µ + σ as shown by the red points in the
curve in Figure 5.5.6. (An inflection point is a point on the curve where the sign of the
curvature changes.)

e. This curve lies entirely above the horizontal axis, and the x-axis is an asymptote in both
horizontal directions.

f. The area between the curve and the horizontal axis is exactly 1. Note that this is the
area of a region that is infinitely wide, since the curve never actually touches the x-axis.

5.5.4 §§ Standard Normal Distribution

§§ Background
A normal distribution with a mean of µ and standard deviation of σ, i.e., X „ N (µ, σ) has
pd f as:
f(x; µ, σ) = 1/(σ√(2π)) · e^(−(x−µ)²/(2σ²)),   −∞ < x < ∞
σ 2π

To find the probability that a normal random variable x lies in the interval from a to b, we
need to find the area under the normal curve between the points a and b (see Figure 5.1.1).
However, there are an infinitely large number of normal distributions-one for each different
mean and standard deviation, (e.g., see Figure 5.5.2). A separate table of areas for each
of these curves is obviously impractical. Instead, we use a standardization procedure that
allows us to use the same table for all normal distributions.


Definition 5.5.2 (Standard Normal Distribution).

A normal distribution with a mean of µ = 0 and standard deviation of σ = 1 is


called a Standard Normal Distribution. The standard normal random variable is
denoted as z „ N (0, 1). The pd f of z is:
f(z) = e^(−z²/2) / √(2π),   −∞ < z < ∞

That is, Z „ N (0, 1) is a Gaussian with µ = 0 and σ2 = 1.

§§ Standardizing a Normal Random Variable

All normally distributed variables X can be transformed into the standard normal variable
Z.

z = (x − µ)/σ   ⟹   X = µ + σz

A z´value tells how many standard deviations above or below the mean a certain value of
X is.

§§ CDF of the Standard Normal Distribution

The CDF of the standard Gaussian can be determined by integrating the PDF.

Definition 5.5.3 (Standard Normal Distribution: cdf).

The CDF of the standard Gaussian is defined as the Φ() function

F_Z(z) = Φ(z) = ∫_{−∞}^{z} 1/√(2π) · e^(−t²/2) dt


Figure 5.5.7.

[Plot of Φ(z) against z, rising from 0 to 1]

CDF of Standard Normal Distribution.

The cumulative distribution function is shown in Figure 5.5.7 and is often referred to as
an ’S-shaped’ curve. Notice that Φ(0) = 0.5 because the standard normal distribution is
symmetric about z = 0, and that the cumulative distribution function Φ(z) approaches 1
as z tends to 8 and approaches 0 as z tends to ´8. The symmetry of the standard normal
distribution about 0 implies that if the random variable Z has a standard normal distribution,
then
1 − Φ(z) = P(Z ≥ z) = P(Z ≤ −z) = Φ(−z)

§§ Z-Table
1. The table for the cumulative distribution of the standard normal variable is shown in
Figure 5.5.8. The entries inside the table give the area under the standard normal
curve for any value of z from 0 Ñ 3.49 or so.

2. The table gives values for non-negative z. For negative values of z, the area can be
obtained from the symmetry property of the curve.


3. Before using the table, remember to convert the normal random variable X to Z as:

z = (x − µ)/σ

4. Convert Z back to X as X = µ + σZ

Figure 5.5.8.

[Standard normal cumulative probability table (Z-table)]

5.5.5 §§ Finding Probabilities Using Table

Example 5.5.4
In each of the following cases, evaluate the required probabilities.

1. Probability to the left of z value

a. P( Z ď 1.96)?
b. P( Z ď ´1.96)?

2. Probability to the right of z value

a. P( Z ě 1.96)?
b. P( Z ě ´1.96)?

3. Probability between 2 z values, i.e., P(´1.96 ď Z ď 1.96)?

4. Probability between 2 x values; X „ N (µ = 10, σ = 4), i.e., P(4 ď X ď 16)?

Solution:

1. a. If z ě 0 and we want P( Z ď z), we just directly look up P( Z ď z) in the table.


For instance,
P( Z ď 1.96) = F (1.96) = 0.9750
P( Z ď 1.96) is the shaded area under the curve in Figure 5.5.9.
b. If z ă 0 and we want P( Z ă ´z), we use 2 properties of the normal curve, i.e.,
total area under the curve is 1, symmetry of the normal curve. For instance

P( Z ď ´1.96) = P( Z ě 1.96),

due to symmetry of the curve.

P( Z ě 1.96) = 1 ´ P( Z ď 1.96),

using the Complement Law of Probability. Using the standard normal distribution
table,

P( Z ď ´1.96) = F (´1.96) = P( Z ě 1.96) = 1 ´ 0.975 = 0.025.

P( Z ď ´1.96) is the shaded area under the curve in Figure 5.5.10.

2. Probability to the right of z value

a.
P( Z ě 1.96) = 1 ´ P( Z ď 1.96),
using the Complement Law of Probability. Using the table P( Z ě 1.96) = 1 ´
0.975 = 0.025. P( Z ě 1.96) is the shaded area under the curve in Figure 5.5.11.


b.
P( Z ě ´1.96) = P( Z ď 1.96),
using the symmetry property. Using the table P( Z ě ´1.96) = 0.975 that is the
shaded area under the curve in Figure 5.5.12.

3. Probability between 2 z values, i.e.,

P(´1.96 ď Z ď 1.96) = F (1.96) ´ F (´1.96) = 0.975 ´ 0.025 = 0.950

is the shaded area under the curve in Figure 5.5.13.

4. Probability between 2 x values, i.e., P(4 ď X ď 16) if X „ N (µ = 10, σ = 4)

• Convert x into z as

  P(4 ≤ X ≤ 16) = P((4 − 10)/4 ≤ Z ≤ (16 − 10)/4) = P(−1.5 ≤ Z ≤ 1.5)

  P(−1.5 ≤ Z ≤ 1.5) = F(1.5) − F(−1.5)
                    = 0.9332 − (1 − 0.9332)
                    = 0.8664

• ∴ P(4 ≤ X ≤ 16) = 0.8664

Example 5.5.4
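Instead of the printed table, the same probabilities can be read off scipy.stats.norm (illustrative sketch; cdf is Φ and sf is 1 − Φ).

    from scipy.stats import norm

    print(norm.cdf(1.96))                            # 1a. 0.9750
    print(norm.cdf(-1.96))                           # 1b. 0.0250
    print(norm.sf(1.96))                             # 2a. 0.0250
    print(norm.cdf(1.96) - norm.cdf(-1.96))          # 3.  0.9500
    print(norm.cdf(16, 10, 4) - norm.cdf(4, 10, 4))  # 4.  P(4 <= X <= 16) = 0.8664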

Figure 5.5.9.

[Standard normal curve with the area to the left of z = 1.96 shaded]

P(Z ≤ 1.96)


Figure 5.5.10.

[Standard normal curve with the area to the left of z = −1.96 shaded]

P(Z ≤ −1.96)

Figure 5.5.11.

[Standard normal curve with the area to the right of z = 1.96 shaded]

P(Z ≥ 1.96)


Figure 5.5.12.

[Standard normal curve with the area to the right of z = −1.96 shaded]

P(Z ≥ −1.96).

Figure 5.5.13.

[Standard normal curve with the area between z = −1.96 and z = 1.96 shaded]

P(−1.96 ≤ Z ≤ 1.96).

Example 5.5.5
The achievement scores for a college entrance examination are normally distributed with
mean 75 and standard deviation 10. What percentage of the students will score:
1. above 90?

2. below 70?


3. between 80 and 90?


Solution:
Let X be the achievement score with µ = 75 and σ = 10. Convert x into z and use the
Normal Distribution Table to find out the required probabilities.
1. above 90?

   P(X > 90) = P((X − µ)/σ > (90 − 75)/10)
             = P(Z > 1.5)
             = 1 − 0.9332
             = 0.0668

2. below 70?

   P(X < 70) = P((X − µ)/σ < (70 − 75)/10)
             = P(Z < −0.5)
             = 0.3085

3. between 80 and 90?

   P(80 < X < 90) = P((80 − 75)/10 < (X − µ)/σ < (90 − 75)/10)
                  = P(0.5 < Z < 1.5)
                  = F(1.5) − F(0.5)
                  = 0.9332 − 0.6914
                  = 0.2417

Example 5.5.5

5.5.6 §§ Finding Probabilities and Percentiles


In the previous problems, we knew the x´value and we wanted to find the probability
P( X ď x ) or P( X ě x ). What if you know the probability or percentile (e.g., ‘top quarter’ or
‘bottom tenth’ or ‘middle 50%’), but you don’t know the cut-off x´value that will give you
this probability? We will call this situation a ‘backward’ Normal problem because you solve
for the x´value in a procedure that is backward from what you did in the previous types of
problems in this chapter.
Example 5.5.6
Manufactured items have a strength that has a normal distribution with a standard deviation
of 4.2. The mean strength can be altered by the operator. At what value should the mean
strength be set so that exactly 95% of the items have a strength less than 100? For a random


sample of ten items, what is the probability that exactly two will have strength more than
100?
Solution:
Let X be the strength with σ = 4.2.

P(X < 100) = 0.95
P((X − µ)/σ < (100 − µ)/4.2) = 0.95

Now we need to find the z value corresponding to a probability of 0.95, i.e., the 95th percentile.
Looking inside the table we see that the z value is 1.645, i.e., P(Z ≤ 1.645) = 0.95.

z = (X − µ)/σ
1.645 = (100 − µ)/4.2
µ = 100 − 1.645 × 4.2
  = 93.091

Therefore the mean strength µ should be set at 93.091.


Let Y be the number of items in a random sample of size 10 that will have strength less
than 100. To find the probability that exactly two will have strength more than 100 in a
random sample of 10 items, we need to use the Binomial distribution. We are given the
probability of having strength less than 100 is 0.95, so P( X ą 100) = 1 ´ 0.95 = 0.05 =
p; n = 10. The items are independent in the strength.
 
P(Y = 2) = C(10, 2) 0.05² (1 − 0.05)^(10−2)
         = 0.075

Example 5.5.6
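A short sketch of the 'backward' step with scipy (assumed): norm.ppf inverts the cd f , and the second part is an ordinary Binomial computation.

    from scipy.stats import norm, binom

    sigma = 4.2
    mu = 100 - norm.ppf(0.95) * sigma     # ~93.09, the required mean strength
    print(mu)

    p = 0.05                              # P(strength > 100) for each item
    print(binom.pmf(2, 10, p))            # ~0.0746, i.e. about 0.075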

5.5.7 §§ Empirical Rule


§§ The 68-95-99.7 Rule for the Normal Curve
It is helpful to know the probability that X is within one or two or three standard deviations
of its expected value, µ. The Empirical Rule also called as 68-95-99.7 Rule for the Normal
Curve is shown in Figure 5.5.14. Basically the rule states that

1. approximately 68% of observations fall within 1 standard deviation of the mean, i.e.,
µ ˘ σ. The probability that X is within one standard deviation of its mean µ is 0.68


2. approximately 95% of observations fall within 2 standard deviation of the mean, i.e.,
µ ˘ 2σ. The probability that X is within two standard deviation of its mean µ is 0.95

3. approximately 99.7% of observations fall within 3 standard deviation of the mean, i.e.,
µ ˘ 3σ. The probability that X is within three standard deviation of its mean µ is
0.997

Figure 5.5.14.

Example 5.5.7
What’s a normal pulse rate? That depends on a variety of factors. Pulse rates between 60
and 100 beats per minute are considered normal for children over 10 and adults. Suppose that
these pulse rates are approximately normally distributed with a mean of 72 and a standard
deviation of 12.

1. What proportion of adults will have pulse rates between 60 and 84?

2. 16% of the adults have pulse rate below what value?

3. 2.5% of the adults will have their pulse rates exceeding x. Find x?

Solution:
Let X be the pulse rate that has Normal Distribution with µ = 72 and σ = 12. Convert x
into z and use the Normal Distribution Table to find out the required probabilities.

1. P(60 ă X ă 84) = P(72 ´ 12 ă X ă 72 + 12) = P(µ ´ σ ă X ă µ + σ ) is 68% due to


empirical rule. Therefore 68% of children over 10 and adults have pulse rates between
60-84.

2. P( X ă x ) = 0.16, i.e., we need to find out the 16th Percentile of X. As 68% of


observations fall between µ ˘ σ, means that 32% fall outside µ ˘ σ or 16% fall below
µ ´ σ = 72 ´ 12 = 60. Therefore 16% of children over 10 and adults have pulse rates
below 60.


3. P( X ą x ) = 0.025, i.e., we need to find out the 97.5th Percentile of X. As 95% of


observations fall between µ ˘ 2σ, this means that 5% fall outside µ ˘ 2σ or 2.5% fall
above µ + 2σ = 72 + 2(12) = 96. Therefore 2.5% of children over 10 and adults have
pulse rates above 96.

Example 5.5.7

5.5.8 §§ Normal Distribution: Moment Generating Function

Moment generating functions are an exciting tool that makes probabilistic computations very
efficient.


Definition 5.5.8 (Moment Generating Function Mg f ).

The moment generating function of a random variable X, denoted by Mx (t); is


defined as:

M_X(t) = E(e^{tX})

provided that the expectation exists for t in some neighborhood of 0. We call Mx (t)
the moment generating function because all of the moments of X can be obtained
by successively differentiating this function Mx (t) and then evaluating the result at
t = 0.
Let X be a Normal random variable. Its moment generating function M_X(t) is:

M_X(t) = E(e^{tX})
       = ∫_{−∞}^{∞} e^{tx} · 1/(σ√(2π)) · e^(−(x−µ)²/(2σ²)) dx

Let z = (x − µ)/σ ⟹ x = µ + σz, dx = σ dz. Then

M_X(t) = ∫_{−∞}^{∞} e^{tσz + tµ} · 1/(σ√(2π)) · e^(−z²/2) σ dz
       = e^{tµ} ∫_{−∞}^{∞} 1/√(2π) · e^{tσz − z²/2} dz
       = e^{tµ + σ²t²/2}

Differentiating twice and evaluating at t = 0 yields the first two moments:

(∂/∂t) M_X(t) |_{t=0} = (µ + σ²t) e^{tµ + σ²t²/2} |_{t=0}
                      = µ = E(X)

(∂²/∂t²) M_X(t) |_{t=0} = [ σ² e^{tµ + σ²t²/2} + (µ + σ²t)² e^{tµ + σ²t²/2} ] |_{t=0}
                        = σ² + µ²
                        = E(X²)


5.5.9 §§ Sums of Independent Normal Random Variables


§§ Background
What is the total amount of food eaten by all individuals in PDC at LUMS on a given day?
In a given month? In a given year? How can we scale our results to compare the average,
variance, and standard deviation of these food totals to have an estimate of the total amount
of food required to cater for the needs of individuals?

Definition 5.5.9.

1. If X_1, X_2, . . . , X_n are independent Normal random variables, with means
   µ_1, µ_2, . . . , µ_n, respectively, and variances σ_1², σ_2², . . . , σ_n², respectively, then
   Σ_{i=1}^{n} X_i = X_1 + X_2 + · · · + X_n is normal with mean Σ_{i=1}^{n} µ_i and
   variance Σ_{i=1}^{n} σ_i², i.e.,

   Σ_{i=1}^{n} X_i ∼ N( Σ µ_i , Σ σ_i² )   ⟹   Z = ( Σ X_i − Σ µ_i ) / √( Σ σ_i² ) ∼ N(0, 1)

2. If X_1, X_2, . . . , X_n are independent Normal random variables, each having
   common mean µ and common variance σ², then Σ_{i=1}^{n} X_i = X_1 + X_2 + · · · + X_n
   is normal with mean Σ µ_i = nµ and variance Σ σ_i² = nσ², i.e.,

   Σ_{i=1}^{n} X_i ∼ N(nµ, nσ²)   ⟹   Z = ( Σ X_i − nµ ) / √(nσ²) ∼ N(0, 1)

Example 5.5.10
The weight of each of the eight individuals is approximately normally distributed with a mean
equal to 150 pounds and a standard deviation of 35 pounds each. What is the probability
that the total weight of eight people who occupy an elevator exceeds 1300 pounds?
Solution:
Let X_i be the weight of a single individual, which is Normal with µ = 150 and σ = 35. As the
individual weights are independent and Normal, the total weight Σ_{i=1}^{8} X_i ∼ N(8(150), 8(35)²).
Convert Σ_{i=1}^{8} X_i into z and use the Normal Distribution Table to find the required probability.

P( Σ_{i=1}^{8} X_i > 1300 ) = P( ( Σ X_i − nµ )/√(nσ²) > (1300 − 8(150))/√(8(35)²) )
                            = P(Z > 1.01)
                            = 1 − 0.8438
                            = 0.1562

There is a 15.62% chance that the total weight of eight individuals in the elevator would
exceed 1300 pounds.
Example 5.5.10
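The same answer follows from a two-line sketch (scipy assumed): the total of the eight independent N(150, 35²) weights is N(8 × 150, 8 × 35²).

    from math import sqrt
    from scipy.stats import norm

    total = norm(loc=8 * 150, scale=sqrt(8) * 35)
    print(total.sf(1300))                 # P(total > 1300) ~ 0.156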

5.6 IJ Exponential Distribution


The density curve of any normal pdf is bell-shaped and thus symmetric. In many practical
situations, the variable of interest to the experimenter might not be symmetric but have a
skewed distribution. The exponential distribution plays an important role in describing a
large class of phenomena, particularly in the area of reliability theory.

§§ Background
Waiting is painful. What is the expected time until an air conditioning system fails as shown
in Figure 5.6.1? When a mother is waiting for her three children to call her, what is the
probability that the first call will arrive within the next 5 minutes?

Figure 5.6.1.


How to model the time to events X, e.g.,

1. Customer service: time on hold on a help line

2. Medicine: remaining years of life for a cancer patient

3. Ecology: dispersal distance of a seed

4. Seismology: Time elapsed before an earthquake occurs in a given region

In such cases, let X be the time between successive occurrences. Clearly, X is a continuous
random variable whose range consists of the non-negative real numbers. It is expected that
most calls, times or distances will be short and a few will be long. So the density should be
large near x = 0 and decreasing as x increases.

5.6.1 §§ Link between Poisson and Exponential Distribution

§§ Poisson Process
The number of events that occur in a window of time or region in space

1. Events occur randomly, but with a long-term average rate of λ per unit time, e.g.,
   λ = 10 per hour or λ = 24 × 10 per day.

2. The events are rare enough that in a very short time interval, there is a negligible
chance of more than one event.

3. Poisson distribution provides an appropriate description of the number of events per


time interval

4. Exponential distribution provides a description of the length of time between two con-
secutive events

5. The important point is we know the average time between events but they are randomly
spaced (stochastic). Let X be the wait time until the first call at a Customer Centre
from any start point in this setting.

6. If the wait time for a call is at least t minutes, then how many calls occurred in the
first t minutes?


Definition 5.6.1 (Exponential Random Variable).

• Let X be the time elapsed after a Poisson event.

• Let Y be the number of events in a time interval [0, t ), i.e., Y „ Poisson(λt)

P(X > t) = P(no event occurred in the time interval [0, t))
         = P(Y = 0)
         = e^(−λt) (λt)⁰ / 0!
         = e^(−λt)

P(X ≤ t) = 1 − P(X > t)
         = 1 − e^(−λt)

The time gap between successive events from a Poisson process (with mean number
of events λ ą 0 per unit interval) is an exponential random variable with rate
parameter λ.

5.6.2 §§ Exponential Distribution: (cd f )

Definition 5.6.2 (Exponential Distribution: (cd f )).

The cd f is shown in Figure 5.6.2.

F(x) = P(X ≤ x)
     = ∫_0^x λ e^(−λt) dt
     = { 1 − e^(−λx),   if x ≥ 0,
       { 0,             otherwise.


Figure 5.6.2.

[Exponential cd f F(t) with λ = 1/30, for t from 0 to 300]

5.6.3 §§ Exponential Distribution: (pd f )

Definition 5.6.3 (Exponential Distribution: (pd f )).

A random variable X ∼ Exp(λ) if its density is given by

f(x) = (d/dx) F(x)
     = { λ e^(−λx),   if x ≥ 0,
       { 0,           otherwise.

Exponential distribution has only 1 parameter λ which is the average rate, i.e., the
number of events per time period.

The pd f is shown in Figure 5.6.3.


Figure 5.6.3.

[Exponential density f(t) with λ = 1/30, for t from 0 to 200]

Some examples of the densities pd f s and cd f s for Exponential random variables with
various values of λ are given in Figure 5.6.4 and Figure 5.6.5 respectively.

Figure 5.6.4.

[Exponential pd f curves f(t) against Time for λ = 0.05, 1, 2, 4]


Figure 5.6.5.

[Exponential cd f curves F(t) against Time for λ = 0.05, 1, 2, 4]

These examples show that no matter what the λ parameter is, the density starts at λ
when x = 0 and then quickly moves closer to 0 as x → ∞. The cd f starts at 0 but quickly
climbs close to 1 as x → ∞. For larger λ, the pd f and cd f curves are steeper, i.e., when λ
is large, the pd f f_X(x) decays rapidly but the cd f F_X(x) shows a rapid increase.

5.6.4 §§ Exponential Distribution: Expectation and Variance


If X „ Exp(λ), then

a. E( X ) = 1/λ is the expected time between successive occurrences

b. Var(X) = 1/λ²

That is, for exponential distribution, mean and standard deviation are equal.
Example 5.6.4 (Arrival Time of Factory Workers)
The arrival times of workers at a factory first-aid room satisfy a Poisson process with an
average of 1.8 per hour.

1. What is the expectation of the time between two arrivals at the first-aid room?

2. What is the probability that there is at least 1 hour between two arrivals at the first-aid
room?

3. What is the distribution of the number of workers visiting the first-aid room during a
4-hour period?

4. What is the probability that at least four workers visit the first-aid room during a
4-hour period?


Solution:
Let X be the time between 2 arrivals, then X „ exp(λ), with λ = 1.8

1. E( X ) = 1/1.8 = 0.5556.

2. P(X ≥ 1) = 1 − P(X < 1) = 1 − (1 − e^(−1.8×1)) = e^(−1.8) = 0.1653

3. Number of workers Y visiting the first-aid room during a 4-hour period is Poisson with
parameter λt = 1.8 ˆ 4 = 7.2

4.

P(Y ≥ 4) = 1 − P(Y < 4)
         = 1 − P(Y = 0) − P(Y = 1) − P(Y = 2) − P(Y = 3)
         = 1 − e^(−7.2)·7.2⁰/0! − e^(−7.2)·7.2¹/1! − e^(−7.2)·7.2²/2! − e^(−7.2)·7.2³/3!
         = 0.9281

Example 5.6.4
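The exponential and Poisson answers above can be checked together in one sketch (scipy assumed); expon is parameterized by scale = 1/λ and poisson.sf(3, 7.2) gives P(Y ≥ 4).

    from scipy.stats import expon, poisson

    lam = 1.8
    X = expon(scale=1 / lam)          # time between arrivals, in hours
    print(X.mean())                   # 1. E(X) ~ 0.556 hours
    print(X.sf(1))                    # 2. P(X >= 1) = exp(-1.8) ~ 0.165
    print(poisson.sf(3, lam * 4))     # 4. P(Y >= 4) ~ 0.928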

Definition 5.6.5 (Exponential Distribution: Memoryless Property).

P(T > t1 + t2 | T > t1) = P(T > t2),   for all t1, t2 ≥ 0

• From the point of view of waiting time, the memoryless property means that
it does not matter how long you have waited so far. If you have waited for
at least t1 time, the distribution of waiting time (from time t1 ) until t2 is the
same as when you started at time zero.

• The likelihood of an event is completely independent of the past history (mem-


oryless). Both the exponential density and the geometric distribution share
this ’memoryless’ property

A memoryless wait for a bus would mean that the probability that a bus arrives in the next minute is the same whether you just got to the station or you have already been sitting there for twenty minutes.

Example 5.6.6 (Arrival Time of Factory Workers Cont’d)


What is the probability that there is at least 1 hour between two arrivals at the first-aid room,
if no worker arrived in the 1st half an hour?
Solution:


Here we need conditional probability which in the current scenario can be computed by using
the Memoryless Property

P( X ą 1|X ą 0.5) = P( X ą 0.5)


= 1 ´ P( X ă 0.5)
= 1 ´ (1 ´ e´1.8ˆ0.5 )
= e´0.9
= 0.4066

Example 5.6.6

Summary: Relationship between Poisson and Exponential


Poisson Distribution                               Exponential Distribution
X: number of successes in unit time                X: time between successive successes
discrete random variable                           continuous random variable
λ: mean number of successes in unit time           β = 1/λ: expected time between successive successes


5.7 IJ Home Work


1. For the cdf given below

       F(x) = 0            for x ≤ −1,
            = (x + 1)/4    for −1 < x ≤ 1,
            = x/2          for 1 < x ≤ 2,
            = 1            for x > 2.

a. Find the pdf


b. Find E( x )
c. Find Var ( x )

2. Ninety identical electrical circuits are monitored at an extreme temperature to see how
long they last before failing. The 50th failure occurs after 263 minutes. If the failure
times are modeled with an exponential distribution,

a. when would you predict that the 80th failure will occur?
b. At what time will only 5% of the circuits fail?

3. In a study of the bone disease osteoporosis, heights of 351 elderly women were measured.
Suppose that their heights follow a normal distribution with µ = 160cm, but unknown
σ. Suppose that 2.27% of those women are taller than 170 cm, what is the standard
deviation? For a random sample of ten women, what is the probability that exactly
two will be shorter than 155cm?

4. A soft-drink machine can be regulated so that it discharges an average of µ ounces per


cup. If the ounces of fill are normally distributed with standard deviation 0.3 ounce,
give the setting for µ so that 8-ounce cups will overflow only 1% of the time.

5. The operator of a pumping station has observed that demand for water during early
afternoon hours has an approximately exponential distribution with mean 100 cfs (cubic
feet per second). Find the probability that the

a. demand will exceed 200 cfs during the early afternoon on a randomly selected day.
b. demand will exceed 200 cfs on a given day, given that previous demand was at
least 150 cfs?

What water-pumping capacity should the station maintain during early afternoons so
that the probability that demand will exceed capacity on a randomly selected day is
only .01?

6. Five students are waiting to talk to the TA when office hours begin. The TA talks
to the students one at a time, starting with the first student and ending with the
fifth student, with no breaks between students. Suppose that the time taken by the


TA to talk to a student has a normal distribution with a mean of 8 minutes and a


standard deviation of 2 minutes, and suppose that the times taken by the students are
independent of each other. What is the probability that the total time taken by the
TA to talk to all five students is longer than 45 minutes?

7. A weather forecaster predicts that the May rainfall in a local area will be between three
and six inches but has no idea where within the interval the amount will be. Let X be
the amount of May rainfall in the local area. What is the probability that May rainfall
will be at least four inches? At most five inches? Explicitly specify the distribution
involved and the parameters from the scenario.

8. A student waits for a bus. Let X be the number of hours that the student waits.
Assume that the waiting time is Exponential with average 20 minutes.

a. What is the probability that the student waits more than 30 minutes?
b. What is the probability that the student waits more than 45 minutes (total), given
that she has already waited for 20 minutes?
c. Given that someone waits less than 45 minutes, what is the probability that they
waited less than 20 minutes?
d. What is the standard deviation of the student’s waiting time?

§§ Answers
1. f(x) = 1/4 for −1 < x ≤ 1;  1/2 for 1 < x ≤ 2;  0 otherwise.
   E(X) = 3/4;  Var(X) = 37/48

2. 732.4; 16.6537

3. σ = 5; 0.2844

4. µ = 6.953 ounce

5. 0.1353; 0.6065; 460.52 cfs

6. 0.132

7. 0.667

8. a. 0.2231
b. 0.2865
c. 0.7066
d. 20 minutes

Chapter 6

Limit Theorems

AS YOU READ . . .

1. What is the law of large numbers?


2. What is Chebyshev’s Inequality?
3. What are limit theorems?
4. What are the conditions under which the sum of large number of variables or general
sample averages is approximately normally distributed?

6.1 IJ Limit Theorems


These are very important results in Probability Theory, with several different variations.

Laws of Large Numbers: The average of a large number of i.i.d. (independent and identically distributed) random variables converges to the expected value.
Central Limit Theorems: Determining conditions under which the sum of a large number of random variables has an approximately normal distribution.

6.1.1 §§ Chebyshev Inequality


§§ Background
If we know the probability distribution of a random variable X (either the pdf in the continuous case or the pmf in the discrete case), we may then compute E(X) and Var(X), if these exist. However, when the probability distribution of X is unknown, we cannot compute quantities such as P(|X − E(X)| ≤ C). Chebyshev's Inequality provides a bound on such probabilities when both E(X) and Var(X) are known. It is a way of


quantifying the fact that a random variable is ‘relatively close’ to its expected value ‘most
of the time’. It gives bounds that quantify both ‘how close’ and ‘how much of the time’ the
random variable is to its expected value.

Definition 6.1.1 (Chebyshev’s Inequality).

§§ Bounding Probabilities using Expectations & Variance


Let X be a random variable with mean µ and standard deviation σ. Then, for any k ≥ 1,

    P(|X − µ| ≤ kσ) ≥ 1 − 1/k²;

that is, at least 1 − 1/k² of observations fall within µ ± kσ.

Compare this inequality with the Empirical Rule, which also gives probability bounds for k = 1, 2, 3. The difference is that Chebyshev's Inequality is applicable even when the distribution is not known. Figure 6.1.1 shows 1 − 1/k² as the shaded area that falls between µ ± kσ.


Figure 6.1.1.

Chebyshev Inequality: relative frequency curve with the shaded area 1 − 1/k² lying between µ − kσ and µ + kσ.

a. For k = 2, at least 1 − 1/2² = 0.75 of observations fall within 2 standard deviations of the mean.

b. For k = 3, at least 1 − 1/3² ≈ 0.89 of observations fall within 3 standard deviations of the mean.

Example 6.1.2

1. The number of customers per day (Y) at a sales counter, has been observed for a long
period of time and found to have mean 20 and standard deviation 2. The probability
distribution of Y is not known. What can be said about the probability that, tomorrow
Y will be greater than 16 but less than 24?


2. A mail-order computer business has six telephone lines. Let X denote the number of
lines in use at a specified time. Compute µ and σ for the distribution below. Using
k = 2, 3, what does Chebyshev Inequality suggest about the upper bound relative to
the corresponding probability? Interpret.

x 0 1 2 3 4 5 6
p( x ) 0.10 0.15 0.20 0.25 0.20 0.06 0.04

Solution:

1. µ = 20; σ = 2

P(16 ă Y ă 24) = P(20 ´ 2 ˆ 2 ă Y ă 20 + 2 ˆ 2)


= P(µ ´ 2σ ă Y ă µ + 2σ)

∴ k = 2, and the inequality suggests that at least 3/4 of the observations fall between 16 and 24.

2. µ = 2.64; σ = 1.53961

a. For k = 2, at least 75% of observations fall within µ ± 2σ, i.e., between −0.43922 and 5.71922.

b. For k = 3, at least 89% of observations fall within µ ± 3σ, i.e., between −1.97883 and 7.25883.

Example 6.1.2
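The mean, standard deviation and Chebyshev bounds for part 2 can be checked with a short script; this is a sketch assuming NumPy is available (it is not part of the course text).

    # Chebyshev bounds for the telephone-line pmf of Example 6.1.2 (NumPy assumed)
    import numpy as np

    x = np.arange(7)
    p = np.array([0.10, 0.15, 0.20, 0.25, 0.20, 0.06, 0.04])
    mu = np.sum(x * p)                          # 2.64
    sigma = np.sqrt(np.sum((x - mu)**2 * p))    # ≈ 1.5396
    for k in (2, 3):
        bound = 1 - 1/k**2                      # Chebyshev lower bound
        exact = p[(x >= mu - k*sigma) & (x <= mu + k*sigma)].sum()
        print(k, bound, exact)                  # the exact probability is at least the bound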

6.2 IJ Central Limit Theorem (CLT)


§§ Background
How to model the chance behavior for:

1. the electricity consumption in a city at any given time that is the sum of the demands
of a large number of individual consumers

2. the quantity of water in a reservoir may be thought of as representing the sum of a


very large number of individual contributions.

3. the error of measurement in a physical experiment is composed of many unobservable


small errors which may be considered additive.

In these examples, the interest is to model the sum of either demands or quantity of water
as a sum of individual contributions or the measurement error as the sum of unobservable
small errors. What will be the distribution of the sum in these examples?


6.2.1 §§ Sample Total (CLT)


This is a truly powerful concept that is used throughout the sciences and beyond. The Central
Limit Theorem, often called the CLT, is amazing because the random variables X1 , X2 , X3 , . . .
need not be Normal.

Definition 6.2.1 (Central Limit Theorem (CLT): Sample Total).

Let X be a random variable with finite mean µ and finite variance σ². Suppose you repeatedly draw independent samples of size n from the distribution of X. Then, as n → ∞, the distribution of the sample total Σᵢ₌₁ⁿ Xᵢ = X₁ + X₂ + ⋯ + Xₙ becomes approximately normal, i.e.,

    Σᵢ₌₁ⁿ Xᵢ ≈ N(nµ, nσ²),   while   Z = (Σᵢ₌₁ⁿ Xᵢ − nµ)/√(nσ²) ≈ N(0, 1).

√(nσ²) is called the standard error of the total. In other words,

    lim_{n→∞} P( (Σᵢ₌₁ⁿ Xᵢ − nµ)/√(nσ²) ≤ a ) = ∫_{−∞}^{a} e^(−z²/2)/√(2π) dz = F(a),

where F denotes the standard normal cdf.

This theorem basically says that sums of n independent random variables (from any distribution) are distributed approximately like a Normal random variable when n is large. The approximation is more accurate when n is larger. The next examples show some applications of the CLT.
Example 6.2.2

1. When a batch of a certain chemical product is prepared, the amount of a particular


impurity in the batch is a random variable with mean value 4.0 g and standard deviation
1.5 g. If 50 batches are independently prepared, what is the (approximate) probability
that the total amount of impurity is between 175 and 190 g?

2. Consider the volumes of soda remaining in 100 cans of soda that are nearly empty. Let
X1 , . . . , X100 , denote the volumes (in ounces) of cans one through one hundred, respec-
tively. Suppose that the volumes X j are independent, and that each X j is Uniformly
distributed between 0 and 2. Find the probability that the 100 cans of soda contain
less than 90 ounces of soda in total.

Solution:


1. Let X be the amount of impurity in a batch, with mean µ = 4 and standard deviation σ = 1.5 (the distribution of X is not specified). For n = 50 independently prepared batches, the total amount of impurity is T = Σᵢ₌₁⁵⁰ Xᵢ. By the CLT, T ≈ N(nµ = 50(4) = 200, nσ² = 50(1.5²) = 112.5).

       P(175 ≤ T ≤ 190) = P( (175 − 200)/√112.5 < (T − nµ)/√(nσ²) < (190 − 200)/√112.5 )
                        = F(−0.94) − F(−2.36)
                        = 0.1645

   In other words, there is a 16.45% chance that the total amount of impurity in 50 batches is between 175 and 190 g.

2. The volumes Xⱼ ~ U(0, 2); ∴ µ = (0 + 2)/2 = 1 and σ² = (2 − 0)²/12 = 1/3. For n = 100 independent cans of soda, the total amount of remaining soda is T = Σⱼ₌₁¹⁰⁰ Xⱼ. Then T ≈ N(nµ = 100(1) = 100, nσ² = 100(1/3)).

       P(T ≤ 90) = P( (T − nµ)/√(nσ²) < (90 − 100)/√(100/3) )
                 = P(Z ≤ −1.73)
                 = 1 − 0.9582
                 = 0.0418

   In other words, the probability that the total volume of soda in 100 cans is less than 90 ounces is approximately 4.18%.

Example 6.2.2
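Both probabilities above come from a single normal cdf evaluation; a minimal sketch with SciPy (assumed available):

    # CLT approximations of Example 6.2.2 (SciPy assumed)
    from scipy.stats import norm
    import math

    # 1. total impurity in 50 batches: approximately N(200, 50 * 1.5**2)
    mu_T, sd_T = 50 * 4.0, math.sqrt(50 * 1.5**2)
    p1 = norm.cdf(190, mu_T, sd_T) - norm.cdf(175, mu_T, sd_T)   # ≈ 0.164

    # 2. total soda in 100 cans of U(0, 2): approximately N(100, 100/3)
    p2 = norm.cdf(90, 100, math.sqrt(100/3))                     # ≈ 0.042
    print(p1, p2)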

6.2.2 §§ Sample Mean (CLT)

Suppose random samples of size n are repeatedly drawn from a population and the sample mean x̄ is computed for each sample. Figure 6.2.1 shows the main idea of the Central Limit Theorem using a hypothetical example. How do all the sample means generated this way behave as this process continues indefinitely?


Figure 6.2.1.

6.2.3 §§ Simulations

Repeatedly sampling from a population using a specific sampling plan, we can assess the
performance of the resulting sample means.

§§ Simulation Study 1: Normal Population

Population data were generated from a Normal population with µ = 60; σ = 1. Figure 6.2.2
displays the distribution of 10,000 such data points simulated from a N (µ = 60; σ = 1).


Figure 6.2.2.

Population of Heights: histogram of the 10,000 simulated data points (parent distribution: Normal, centered at µ = 60).

A random sample of size n = 30 was drawn from the population data simulated and
sample mean x̄ was computed. This procedure was repeated 100,000 times. The distribution
of 100,000 sample means x̄ is displayed in Figure 6.2.3.


Figure 6.2.3.

Histogram of the 100,000 sample means x̄ (distribution of the sample means, centered at µ = 60).

Both distributions are centered at µ = 60. The variability of the two distributions deserves special attention: the distribution of the 100,000 sample means is much narrower, showing much less variability.

§§ Simulation Study 2: Uniform Population

Population data were generated from a Uniform population with U (0, 1). Figure 6.2.4 dis-
plays the 10,000 such data points simulated from a U (0, 1).


Figure 6.2.4.

Population Distribution: histogram of the 10,000 simulated data points (parent distribution: U(0,1)).

A random sample of size n = 30 was drawn from the population data simulated &
sample mean x̄ was computed. This procedure was repeated 100,000 times. The distribution
of 100,000 sample means x̄ is displayed in Figure 6.2.5.


Figure 6.2.5.

Histogram of the 100,000 sample means x̄ from the U(0,1) population (distribution of the sample means, centered at µ = 0.5).

Both distributions are centered at µ = 0.5. This shows that, regardless of the parent population, the distribution of the sample means is approximately normal and centered at the population mean (here µ = 0.5). Again, the same phenomenon is observed in the variability of the two distributions: the distribution of the 100,000 sample means is much narrower, showing much less variability.
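A simulation in the spirit of Simulation Study 2 can be written in a few lines; this sketch (NumPy assumed, arbitrary seed) draws 100,000 samples of size n = 30 from U(0,1) and summarizes the resulting sample means.

    # Sampling distribution of the mean for a U(0,1) parent (NumPy assumed)
    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 30, 100_000
    xbars = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    print(xbars.mean())    # close to the population mean 0.5
    print(xbars.std())     # close to sqrt(1/12)/sqrt(30) ≈ 0.0527, the standard error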
Definition 6.2.3 (Central Limit Theorem (CLT): Sample Mean X̄).

The Central Limit Theorem basically says that for non-normal data, the distribution
of the sample means has an approximate normal distribution, no matter what the
distribution of the original data looks like, as long as the sample size is large enough
(usually at least 30) and all samples have the same size.
If X1 , X2 , . . . , Xn is a random sample of size n taken from a population (either finite
or infinite) with mean µ and finite variance σ2 , and if x̄ is the sample mean 2

    X̄ ~ N(µ, σ²/n) as n → ∞;    Z = (X̄ − µ)/(σ/√n) ~ N(0, 1),

where σ/√n is called the standard error of the mean.

2 Statistics For Dummies, 2nd Edition, Deborah J. Rumsey


The central limit theorem tells us that for a population with any distribution,

• the distribution of the sample means approaches a normal distribution as the sample
size increases.

• the mean of the sample means is the same as the mean of the original population as
the sample size increases.

• the distribution of the sample means becomes narrower as the sample size increases,
showing that the standard deviation of the sample means becomes smaller.

Example 6.2.4
A coffee dispensing machine is supposed to dispense a mean of 7.00 fluid ounces of coffee per
cup with a standard deviation of 0.25 fluid ounces. The distribution approximates a normal
distribution. What is the probability that, when 12 cups are dispensed, their mean volume
is more than 7.15 fluid ounces?
Solution:
Let X be the amount of coffee dispensed, X „ N (µ = 7; σ = 0.25); n = 12
 
    P(X̄ > 7.15) = P( (X̄ − µ)/(σ/√n) > (7.15 − 7)/(0.25/√12) )
                = P(Z > 2.08)
                = 1 − P(Z < 2.08)
                = 1 − 0.9811665
                = 0.01883

In other words, there is 1.88% chance that the average amount of coffee dispensed exceeds
7.15 ounces.
Example 6.2.4
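A quick numerical check of this example; a sketch assuming SciPy is available:

    # P(mean of 12 cups > 7.15) for Example 6.2.4 (SciPy assumed)
    from scipy.stats import norm
    import math

    se = 0.25 / math.sqrt(12)                 # standard error of the mean
    print(norm.sf(7.15, loc=7.0, scale=se))   # ≈ 0.0188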

Example 6.2.5
The fracture strength of tempered glass averages 14 (measured in thousands of pounds per
square inch) and has standard deviation of 2.

a. What is the probability that the average fracture strength of 100 randomly selected
pieces of this glass exceeds 14.5?

b. Find an interval that includes, with probability 0.95, the average fracture strength of
100 randomly selected pieces of this glass.

Solution:


a.
 
    P(X̄ > 14.5) = P( (X̄ − µ)/(σ/√n) > (14.5 − 14)/(2/√100) )
                = P(Z > 2.5)
                = 1 − P(Z < 2.5)
                = 1 − 0.9938
                = 0.0062

There is 0.6% chance that the average fracture strength of 100 randomly selected pieces
of this glass exceeds 14.5.

b. The central 95% means the area of 5% is divided equally in the 2 tails of the normal
curve. Therefore, P( Z ď z2 ) = 0.95 + 0.05/2 = 0.975 gives the cd f corresponding to
z2 . Looking inside the Normal Distribution table, we find the corresponding z-value as
1.96.

    P(z₁ < Z < z₂) = 0.95
    P(−1.96 < Z < 1.96) = 0.95
    Z = (X̄ − µ)/(σ/√n)
     1.96 = (x̄₂ − 14)/(2/√100)  ⟹  x̄₂ =  1.96 × 1/5 + 14 = 14.392
    −1.96 = (x̄₁ − 14)/(2/√100)  ⟹  x̄₁ = −1.96 × 1/5 + 14 = 13.608
    P(13.608 < X̄ < 14.392) = 0.95

There is 95% chance that the average fracture strength of 100 randomly selected pieces
of this glass lies in the interval 13.608 - 14.392.

Example 6.2.5

6.2.4 §§ Normal Approximation to Binomial


If X ~ bin(n, p), then E(X) = np and Var(X) = np(1 − p).
If the binomial probability histogram is not too skewed, then as n → ∞,

    X ≈ N(np, np(1 − p)),   while   Z = (X − np)/√(np(1 − p)) ≈ N(0, 1),


i.e., the binomial distribution approaches the normal distribution for large n. This phenomenon is shown in the simulated data distribution from a binomial with n = 40, p = 0.2 in Figure 6.2.6. The binomial distribution has become evidently symmetrical.

Figure 6.2.6.

Binomial distribution for n = 40 and p = 0.2: probability histogram over 0–40, nearly symmetric.

1. A general rule: The normal approximation is reasonable as long as both np ě 5 and


n (1 ´ p ) ě 5
2. The normal approximation accuracy improves as n Ñ 8.
3. Caveat: Suppose X = 12, then P( X = 12) ą 0 for a discrete random variable, but
when approximating it with normal, P( X = 12) = 0. Therefore, some care must be
taken with the endpoints of the intervals involved.


To use the normal distribution to approximate the probability of obtaining exactly 12 (i.e.,
P( X = 12)), we would find the area under the normal curve from X = 11.5 to X = 12.5, the
lower and upper boundaries of 12 (see Figure 6.2.7). The small correction of 0.5 allows for the fact that a continuous (normal) distribution is being used to approximate a discrete (binomial) distribution.

Figure 6.2.7.
Continuity correction: the binomial probability P(X = 12) is approximated by the area under the normal curve between 11.5 and 12.5.

§§ Continuity Correction

To approximate the binomial probabilities on the left-hand side, compute the corresponding normal probabilities on the right-hand side:

    Binomial         ≈  Normal
    P(X = k)         ≈  P(k − 1/2 ≤ X ≤ k + 1/2)
    P(a ≤ X ≤ b)     ≈  P(a − 1/2 ≤ X ≤ b + 1/2)
    P(X > k)         ≈  P(X > k + 1/2)
    P(X ≥ k)         ≈  P(X > k − 1/2)
    P(X < k)         ≈  P(X < k − 1/2)
    P(X ≤ k)         ≈  P(X < k + 1/2)

Caution: Continuity correction is only used for applications of the Central Limit Theorem
to discrete random variables. Continuity correction is not needed when applying the Central
Limit Theorem to sums of continuous random variables.
Example 6.2.6
At a certain local restaurant, students are known to prefer Japanese pan noodles 40% of the
time. Consider 2000 randomly chosen students, what is the probability that at most 840 of
the students eat Japanese pan noodles there?
Solution:
p = 0.4; n = 2000; np = 2000(0.4) = 800; n(1 − p) = 2000(0.6) = 1200; np(1 − p) = 480. As both np ≥ 5 and n(1 − p) ≥ 5, the normal approximation to the binomial is appropriate here.

    P(X ≤ 840) ≈ P(X ≤ 840 + 0.5)
              = P(Z ≤ (840.5 − 800)/√480)
              = P(Z ≤ 1.85)
              = 0.9677

Probability that at most 840 students eat Japanese pan noodles is 0.9677.
Example 6.2.6
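The quality of the approximation in this example can be checked against the exact binomial probability; a minimal sketch with SciPy (assumed available):

    # Exact binomial vs. normal approximation with continuity correction (SciPy assumed)
    from scipy.stats import binom, norm
    import math

    n, p = 2000, 0.4
    exact = binom.cdf(840, n, p)
    approx = norm.cdf(840 + 0.5, loc=n*p, scale=math.sqrt(n*p*(1 - p)))
    print(exact, approx)    # both are close to 0.968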
A Poisson random variable with a large parameter λ is distributed approximately like a Normal random variable (with mean and variance both equal to λ).


6.3 IJ Home Work


1. Ali is pursuing a major in computer science. He notices that a memory chip containing
2¹² = 4096 bits is full of data that seems to have been generated, bit-by-bit, at random,
with 0’s and 1’s equally likely, and the bits are stored independently. If each bit is
equally likely to be a 0 or 1, estimate the probability that there are actually 2140 or
more 1’s stored in the memory chip?

2. The service times for customers coming through a checkout counter in a retail store are
independent random variables with mean 1.5 minutes and variance 1.0. Approximate
the probability that 100 customers can be served in less than 2 hours of total service
time.

3. The quality of computer disks is measured by the number of missing pulses. Brand
X is such that 80% of the disks have no missing pulses. If 100 disks of brand X are
inspected, what is the probability that 15 or more contain missing pulses?

4. Consider the lengths of calls handled by Zahir in a call center. The calls are indepen-
dent Exponential random variables, and each call lasts, on average, 1/3 of an hour.
On a particular day, Zahir records the lengths of 24 consecutive calls. What is the
approximate probability that the average of these 24 calls exceeds 1/4 of an hour?

5. At an auction, exactly 282 people place requests for an item. The bids are placed
‘blindly,’ which means that they are placed independently, without knowledge of the
actions of any other bidders. Assume that each bid (measured in dollars) is a continuous
random variable with a mean of $14.9 and a standard deviation of $2.54. Find the
probability that the sum of all the bids exceeds $4150.

6. A machine is shut down for repairs if a random sample of 100 items selected from the
daily output of the machine reveals at least 15% defectives. (Assume that the daily
output is a large number of items.) If on a given day the machine is producing only
10% defective items, what is the probability that it will be shut down?

7. An electronics company manufactures resistors that have a mean resistance of 100 ohms
and a standard deviation of 10 ohms. The distribution of resistance is normal. Find the
probability that a random sample of n = 25 resistors will have an average resistance
less than 95 ohms.

8. PVC pipe is manufactured with a mean diameter of 1.01 inch and a standard deviation
of 0.003 inch. Find the probability that a random sample of n = 9 sections of pipe will
have a sample mean diameter greater than 1.009 inch and less than 1.012 inch.

§§ Answers
1. 0.0021

2. 0.0013

3. 0.9162


4. 0.8888

5. 0.8869

6. 0.0668

7. 0.0062

8. 0.8186

Chapter 7

Joint Distributions

AS YOU READ . . .
1. What is a Joint Distribution?
2. How do you model the joint chance behavior of more than one random variable?
3. What are marginal distributions?
4. What is convolution? How is it useful to find the distribution of sums of independent
random variables?

7.1 IJ Bivariate Distributions


A problem involving many random variables and the quantification of the simultaneous
chance behavior of the random variables falls under the area of joint distributions. The joint
distribution involving 2 random variables is called a bivariate distribution.

§§ Real Life Examples


In science and in real life, we are often interested in two (or more) random variables at
the same time, e.g., the interest might be in the joint distribution of the values of various
physiological variables in medical studies, e.g.,
1. Stress Index and Blood Pressure
2. Heights of parents and of offsprings
3. Frequency of exercise and the rate of heart disease in adults
4. Level of air pollution and rate of respiratory illness in Lahore
5. Census: Studying several variables, such as income, age, and gender, etc., provide
detailed information on the society where the census is performed.
In general, if X and Y are two random variables, a joint probability distribution defines their
simultaneous chance behavior.


Example 7.1.1 (Rolling of 2 Dice)

1. Let X = the outcome on the 1st die D1 = t1, 2, 3, 4, 5, 6u


2. Let Y = the outcome on the 2nd die D2 = t1, 2, 3, 4, 5, 6u

What is the probability that X takes on a particular value x, and Y takes on a particular
value y? i.e., what is P( X = x, Y = y)?
The entries in the cells of the Table 7.1 show the joint probabilities associated with X and
Y.
x/y 1 2 3 4 5 6 PX ( x )
1 1/36 1/36 1/36 1/36 1/36 1/36 1/6
2 1/36 1/36 1/36 1/36 1/36 1/36 1/6
3 1/36 1/36 1/36 1/36 1/36 1/36 1/6
4 1/36 1/36 1/36 1/36 1/36 1/36 1/6
5 1/36 1/36 1/36 1/36 1/36 1/36 1/6
6 1/36 1/36 1/36 1/36 1/36 1/36 1/6
PY (y) 1/6 1/6 1/6 1/6 1/6 1/6

Table 7.1

Example 7.1.1

7.2 IJ Joint Distributions: Discrete case


Definition 7.2.1 (Joint Probability Mass Function).

The joint probability mass function of a pair of discrete random variables X and Y
is:
PX,Y ( x, y) = P( X = x & Y = y)
Properties:

1. 0 ≤ P_X,Y(x, y) ≤ 1 for all x, y

2. Σ_x Σ_y P_X,Y(x, y) = 1

The joint probability mass function for the roll of 2 dice is shown in Figure 7.2.1. A
nonzero probability is assigned to a point ( x, y) in the plane if and only if x = 1, 2, . . . , 6
and y = 1, 2, . . . , 6. Thus, exactly 36 points in the plane are assigned nonzero probabilities
of 1/36. Further, the probabilities are assigned in such a way that the sum of the nonzero
probabilities is equal to 1.


Figure 7.2.1.

Joint probability mass function for the two-dice roll: a spike of height 1/36 at each point (x, y), x = 1, …, 6 and y = 1, …, 6 (axes: Die 1, Die 2, P(X,Y)).

Definition 7.2.2 (Marginal Distributions).

If we are given a joint probability distribution for X and Y, we can obtain the
individual probability distribution for X or for Y.

1. The probability mass function of X alone, called the marginal probability


mass function of X, is defined by:
    P_X(x) = p(x) = Σ_y P(x, y)

2. The probability mass function of Y alone, called the marginal probability mass
function of Y, is defined by:
    P_Y(y) = p(y) = Σ_x P(x, y)

The term marginal, as applied to the univariate probability functions of X and Y,


has intuitive meaning. To find PX ( x ), we sum p( x, y) over all values of y and hence
accumulate the probabilities on the x axis (or margin).
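The dice example makes this concrete: summing the 6 × 6 joint table over its rows or columns reproduces the marginal pmfs. A small sketch (NumPy assumed, not part of the course text):

    # Joint pmf of two dice and its marginals (NumPy assumed)
    import numpy as np

    joint = np.full((6, 6), 1/36)    # P(X = x, Y = y) for x, y = 1, ..., 6
    px = joint.sum(axis=1)           # marginal pmf of X: six entries of 1/6
    py = joint.sum(axis=0)           # marginal pmf of Y: six entries of 1/6
    print(px, py, joint.sum())       # joint probabilities sum to 1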


7.2.1 §§ Joint Cumulative Distribution Function (cd f )

Example 7.2.3 (Rolling of 2 Dice Cont’d)

FX,Y ( a, b) = P( X ď a & Y ď b)

The joint cd f for 2 dice rolls is given in Table 7.2 below. The Table entries can be filled in
by cumulating the probabilities in Table 7.1 from the lower end to a certain value of X and
Y

Fxy 1 2 3 4 5 6
1 1/36 2/36 . . . .
2 2/36 4/36 . . . .
3 3/36 6/36 . . . .
4 . . . . . .
5 . . . . . .
6 . . . . . 36/36

Table 7.2

Example 7.2.3


Figure 7.2.2.

Joint cumulative distribution function for the two-dice roll: F(x, y) increases in steps from 1/36 up to 1 over x, y = 1, …, 6 (axes: Die 1, Die 2, F(X,Y)).

The joint Cumulative Distribution Function for roll of 2 dice is shown in Figure 7.2.2.
A nonzero cumulative probability is assigned to a point ( x, y) in the plane if and only if
x = 1, 2, . . . , 6 and y = 1, 2, . . . , 6. These cumulative probabilities are increasing functions of x and y and approach the maximum value of 1.


Definition 7.2.4 (Joint Cumulative Distribution Function: Discrete Case).

The joint cd f of a pair of discrete random variables X and Y is

FX,Y ( a, b) = P( X ď a & Y ď b)

The joint cd f satisfies the following properties:

1. 0 ≤ F_X,Y(a, b) ≤ 1 for all a, b

2. lim_{a→−∞, b→−∞} F_X,Y(a, b) = 0

3. lim_{a→∞, b→∞} F_X,Y(a, b) = 1

4. If X and Y are independent, then F_X,Y(a, b) = F_X(a) · F_Y(b)

5. a < b and c < d  ⟹  F_X,Y(a, c) ≤ F_X,Y(b, d)

Example 7.2.5 (Association of Gender and CHD)


The main purpose of some studies is to see how a set of data is distributed across a small set
of categories or classes. The tabular arrangement of 2 categorical variables into classes along
with corresponding counts is called a contingency table. An association study of CHD1 with
2 possible outcomes: Present coded as ( X ) = 1 and Absent coded as ( X ) = 0, with gender
coded as (Y ) = 1 for males and (Y ) = 0 for females. The joint frequency distribution of X
and Y is given in the following contingency table:

CHD Absent CHD Present Total


( X = 0) ( X = 1)
Female (Y = 0) 977 23 1000
Male (Y = 1) 948 52 1000
Total 1925 75 2000

1. Find FX,Y (1, 0)


2. Find FX,Y (0, 1)
Solution:
Before starting to use the contingency table, convert it into joint pm f by dividing cell fre-
quencies by grand total of 2000. Remember that your Table should be arranged with values
of X, Y along with corresponding joint probabilities in ascending order for X and Y.
1. FX,Y (1, 0) = P( X ď 1, Y ď 0) = P( X = 0, Y = 0) + P( X = 1, Y = 0) = 977/2000 +
23/2000 = 1/2
2. FX,Y (0, 1) = P( X ď 0, Y ď 1) = P( X = 0, Y = 0) + P( X = 0, Y = 1) = 977/2000 +
948/2000 = 1925/2000

1 CHD: Coronary heart disease


Example 7.2.5
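The two cdf values can be computed directly from the contingency table; a minimal sketch (NumPy assumed):

    # Joint cdf values of Example 7.2.5 from the contingency table (NumPy assumed)
    import numpy as np

    # rows: y = 0 (Female), 1 (Male); columns: x = 0 (CHD absent), 1 (CHD present)
    joint = np.array([[977, 23],
                      [948, 52]]) / 2000
    F_1_0 = joint[0, :].sum()     # F(1, 0) = P(X <= 1, Y <= 0) = 0.5
    F_0_1 = joint[:, 0].sum()     # F(0, 1) = P(X <= 0, Y <= 1) = 0.9625
    print(F_1_0, F_0_1)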

7.2.2 §§ Independent Random Variables

Example 7.2.6 (Rolling of 2 Dice Cont’d)

x/y 1 2 3 4 5 6 PX ( x )
1 1/36 1/36 1/36 1/36 1/36 1/36 1/6
2 1/36 1/36 1/36 1/36 1/36 1/36 1/6
3 1/36 1/36 1/36 1/36 1/36 1/36 1/6
4 1/36 1/36 1/36 1/36 1/36 1/36 1/6
5 1/36 1/36 1/36 1/36 1/36 1/36 1/6
6 1/36 1/36 1/36 1/36 1/36 1/36 1/6
PY (y) 1/6 1/6 1/6 1/6 1/6 1/6

PX,Y ( x, y) = 1/36 = P( X = x ) ¨ P(Y = y) = 1/6 ˆ 1/6


This holds true for all x & y.
Example 7.2.6

Definition 7.2.7 (Independent Random Variables).

Two discrete random variables are independent if either

1. p X,Y ( x, y) = p X ( x ) ¨ pY (y) @ x & y. Recall that two events A and B are


independent if and only if P( A X B) = P[ A] ¨ P[ B].

2. FX,Y ( x, y) = FX ( x ) ¨ FY (y) @ x & y


3. (a) p_X|Y(x|y) = p_X(x), where p_X|Y(x|y) = p_X,Y(x, y) / p_Y(y)
   (b) p_Y|X(y|x) = p_Y(y), where p_Y|X(y|x) = p_X,Y(x, y) / p_X(x)

Example 7.2.8 (Association of Gender and CHD Cont’d)


The joint probability distribution of X & Y is given in Table:

Are CHD and Gender independent random variables?


Solution:


CHD Absent CHD Present


( X = 0) ( X = 1)
Female (Y = 0) 977/2000 23/2000
Male (Y = 1) 948/2000 52/2000

CHD Absent CHD Present P (Y )


( X = 0) ( X = 1)
Female 1925/2000 ˆ 1/2 ‰ 977/2000 75/2000 ˆ 1/2 ‰ 23/2000 1/2
(Y = 0 )
Male 1925/2000 ˆ 1/2 ‰ 948/2000 75/2000 ˆ 1/2 ‰ 52/2000 1/2
(Y = 1 )
P( X ) 1925/2000 75/2000

Table 7.3

For independence p X,Y ( x, y) = p X ( x ) ¨ pY (y) @ x& y

The condition for independence, i.e., p X,Y ( x, y) = p X ( x ) ¨ pY (y) @ x& y does not hold
true, (see Table 7.3). So X and Y are not independent. There is association between Gender
and CHD.

Example 7.2.8


7.3 IJ Joint Distributions: Continuous Case

Definition 7.3.1 (Joint Probability Density Function (pd f )).

When X and Y are continuous random variables, the joint density function f ( x, y)
describes the likelihood that the pair ( X, Y ) belongs to the neighborhood of the
point ( x, y). The joint pd f of X, Y „ U (0, 1) is visualized as a surface lying above
the xy plane (see Figure 7.3.1).
Properties:

1. The joint density is always nonnegative, i.e., f_X,Y(x, y) ≥ 0.

2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

3. The joint density can be integrated to get probabilities, i.e., if A and B are sets of real numbers, then

       P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y) dx dy


Figure 7.3.1.

Joint probability density function of X, Y ~ U(0, 1): the flat surface f(x, y) = 1 over the unit square (axes: x, y, f(x,y)).

Definition 7.3.2 (Marginal Probability Density Function).

If X and Y are continuous random variables with joint probability density function
f XY ( x, y), then the marginal density functions for X can be retrieved by integrating
over all y's:

    f_X(x) = ∫_y f(x, y) dy.

Similarly, the marginal density function for Y can be retrieved by integrating over all x's:

    f_Y(y) = ∫_x f(x, y) dx.

Example 7.3.3
Consider the joint pd f for X and Y:
    f(x, y) = (12/7)(x² + xy)   for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
            = 0                 elsewhere


1. Find the marginal pd f of X and Y


2. Find P( X ą Y ).
Solution:
    f_X(x) = ∫₀¹ (12/7)(x² + xy) dy
           = (12/7)(x²y + xy²/2) |₀¹
           = (12/7)x² + (6/7)x

The marginal pdf of X is:

    f(x) = (12/7)x² + (6/7)x   for 0 ≤ x ≤ 1
         = 0                   elsewhere

    f_Y(y) = ∫₀¹ (12/7)(x² + xy) dx
           = (12/7)(x³/3 + x²y/2) |₀¹
           = (1/7)(4 + 6y)

The marginal pdf of Y is:

    f(y) = (1/7)(4 + 6y)   for 0 ≤ y ≤ 1
         = 0               elsewhere

    P(X > Y) = ∫₀¹ ∫_y¹ (12/7)(x² + xy) dx dy
             = ∫₀¹ (12/7)(x³/3 + x²y/2) |_y¹ dy
             = ∫₀¹ [ 12/21 + (12/14)y − (12/21)y³ − (12/14)y³ ] dy
             = 9/14

Example 7.3.3
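The marginals and P(X > Y) can also be obtained symbolically; this is a sketch assuming SymPy is available (not part of the course text).

    # Symbolic check of Example 7.3.3 (SymPy assumed)
    import sympy as sp

    x, y = sp.symbols('x y', nonnegative=True)
    f = sp.Rational(12, 7) * (x**2 + x*y)

    fX = sp.integrate(f, (y, 0, 1))                    # 12*x**2/7 + 6*x/7
    fY = sp.integrate(f, (x, 0, 1))                    # 6*y/7 + 4/7
    p_x_gt_y = sp.integrate(f, (x, y, 1), (y, 0, 1))   # 9/14
    print(fX, fY, p_x_gt_y)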


7.3.1 §§ Joint Cumulative Distribution Function (cd f )

Definition 7.3.4 (Joint Cumulative Distribution Function (cd f ): Continuous Case).

Let X, Y be jointly continuous random variables with joint density f (X,Y ) ( x, y), then
the joint cumulative distribution function F ( a, b) is defined as:
    P(X ≤ a and Y ≤ b) = F(a, b) = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy

Properties:

1. 0 ≤ F_X,Y(a, b) ≤ 1 for all a, b

2. lim_{a,b→−∞} F_X,Y(a, b) = 0

3. lim_{a,b→∞} F_X,Y(a, b) = 1

4. a < b and c < d  ⟹  F_X,Y(a, c) ≤ F_X,Y(b, d)

We can obtain joint pd f from joint cd f by:

    f(x, y) = ∂²F(x, y) / (∂x ∂y)

wherever the derivative is defined.

Figure 7.3.2 shows the joint cd f of X, Y „ U (0, 1). The probability F ( x, y) corresponds
to the volume under f ( x, y) = 1, which is shaded. F ( x, y) is an increasing function of X, Y
that is also evident from the shaded part.


Figure 7.3.2.

Joint cumulative distribution function of X, Y ~ U(0, 1): the surface F(x, y) rising from 0 toward 1 over the unit square (axes: x, y, F(x,y)).

Example 7.3.5 (Joint pd f from Joint cd f )


The joint cd f of X and Y
    F(x, y) = (1/16) xy(x + y),   for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2
1. Find the joint pd f f ( x, y)
2. Find the marginal pd f ’s of X and Y respectively.

Solution:
1.
       f(x, y) = ∂²/(∂x ∂y) [ (1/16) xy(x + y) ]
               = (1/16) ∂/∂x (x² + 2xy)
               = (1/16)(2x + 2y)
               = (1/8)(x + y),   for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2

       f(x, y) = (1/8)(x + y)   for 0 ≤ x ≤ 2, 0 ≤ y ≤ 2
               = 0              elsewhere

2.
       f_X(x) = ∫₀² (1/8)(x + y) dy = (1/4)(x + 1)

       f_Y(y) = ∫₀² (1/8)(x + y) dx = (1/4)(y + 1)

   The marginal pdf of X is:

       f(x) = (1/4)(x + 1)   for 0 ≤ x ≤ 2
            = 0              elsewhere

   The marginal pdf of Y is:

       f(y) = (1/4)(y + 1)   for 0 ≤ y ≤ 2
            = 0              elsewhere

Example 7.3.5

Example 7.3.6
Consider the joint pd f for X and Y:
    f(x, y) = (12/7)(x² + xy)   for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
            = 0                 elsewhere
Find the joint cd f of X and Y
Solution:
    F_X,Y(x, y) = ∫₀^y ∫₀^x (12/7)(u² + uv) du dv
                = (12/7) ∫₀^y ( x³/3 + x²v/2 ) dv
                = (12/7) ( x³y/3 + x²y²/4 )
                = (1/7) x²y (4x + 3y)

where u and v are dummy variables of integration.


Example 7.3.6

7.3.2 §§ Independent Random Variables

Definition 7.3.7 (Independent Random Variables).

Let X, Y be jointly continuous random variables with joint density f (X,Y ) ( x, y) and
marginal densities f X ( x ), f Y (y). We say that X and Y are independent if

1. f_X,Y(x, y) = f_X(x) · f_Y(y) for all x and y

2. F_X,Y(x, y) = F_X(x) · F_Y(y) for all x and y

3. (a) f_X|Y(x|y) = f_X(x), where f_X|Y(x|y) = f_X,Y(x, y) / f_Y(y)
   (b) f_Y|X(y|x) = f_Y(y), where f_Y|X(y|x) = f_X,Y(x, y) / f_X(x)
If we know the joint density of X and Y, then we can use the definition to see if
they are independent.

Example 7.3.8
For the Example 7.3.3

    f(x, y) = (12/7)(x² + xy)   for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
            = 0                 elsewhere

Are X and Y independent?


Solution:

    f_X(x) = (12/7)x² + (6/7)x

    f_Y(y) = (1/7)(4 + 6y)

As f ( x, y) ‰ f X ( x ) ˆ f Y (y), so X and Y are not independent.


Example 7.3.8


7.4 IJ Convolution
A typical problem engineers face is to determine the pdf of the sum of two random variables X and Y, i.e., of X + Y. This is a common problem because one often has to evaluate the average of many random variables, e.g., the sample mean of a collection of data points.

IJ Distribution of Sums of Independent Random Variables

Example 7.4.1
Let S be the sum that appears on the roll of 2 dice, i.e., S = X1 + X2 ;

P(S = 2) = 1/36
P(S = 3) = P( X1 = 1, X2 = 2) + P( X1 = 2, X2 = 1)
= P ( X1 = 1 ) P ( X2 = 2 ) + P ( X1 = 2 ) P ( X2 = 1 )
= 2/36

Example 7.4.1

Figure 7.4.1.


Definition 7.4.2 (Convolution).

1. We have independent random variables X and Y with known distributions.

2. It is often important to be able to calculate the distribution of Z = X + Y


from the distributions of X and Y when X and Y are independent.

In probability theory, convolution is a mathematical operation that allows the


derivation of the distribution of a sum of two random variables from the distri-
butions of the two summands.

7.4.1 §§ Convolution: Discrete Case

Definition 7.4.3 (Convolution: Discrete Case).

Suppose that X and Y are independent, integer valued random variables having
probability mass functions Px and Py , then Z = X + Y is also an integer-valued
random variable with probability mass function. Using the Law of Total Probability,
and independence

    P_X+Y(z) = P(X + Y = z)
             = Σ_k P(X = k, Y = z − k)        (Law of Total Probability)
             = Σ_k P(X = k) · P(Y = z − k)    (Independence)
             = Σ_k P_X(k) P_Y(z − k)


Figure 7.4.2.

Rolling of 3 Dice

Example 7.4.4
Let X₁ and X₂ be the outcomes from the two dice rolls, let S₂ = X₁ + X₂ be the sum of these outcomes, and let S₃ = X₁ + X₂ + X₃ be the sum for three dice rolls. The distribution of S₃ is then the convolution of the distribution of S₂ with the distribution of X₃. Find P(S₃ = 7).
Solution:

    P(S₃ = 7) = P(S₂ = 6) P(X₃ = 1)
              + P(S₂ = 5) P(X₃ = 2)
              + P(S₂ = 4) P(X₃ = 3)
              + P(S₂ = 3) P(X₃ = 4)
              + P(S₂ = 2) P(X₃ = 5)
              = 5/36 × 1/6 + 4/36 × 1/6 + 3/36 × 1/6 + 2/36 × 1/6 + 1/36 × 1/6
              = 15/216


Example 7.4.4
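Discrete convolution is exactly what numpy.convolve computes, so the pmf of the sum of several dice can be built up one die at a time; a quick check of P(S₃ = 7) = 15/216 (NumPy assumed):

    # pmf of the sum of three dice by repeated convolution (NumPy assumed)
    import numpy as np

    die = np.full(6, 1/6)          # pmf of one die on the values 1, ..., 6
    s2 = np.convolve(die, die)     # pmf of X1 + X2 on the values 2, ..., 12
    s3 = np.convolve(s2, die)      # pmf of X1 + X2 + X3 on the values 3, ..., 18
    print(s3[7 - 3])               # P(S3 = 7) = 15/216 ≈ 0.0694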

Example 7.4.5 (Sum of common distributions)

1. Let Xᵢ ~ Bernoulli(p), i = 1, …, n, be independent. Then Σᵢ₌₁ⁿ Xᵢ ~ Bin(n, p).

2. Let X ~ Bin(n₁, p) and Y ~ Bin(n₂, p) be independent. Then X + Y ~ Bin(n₁ + n₂, p).

3. Let X ~ Poi(λ) and Y ~ Poi(µ) be independent. Then X + Y ~ Poi(λ + µ).

4. Let Xᵢ, i = 1, …, r, be independent random variables having the geometric distribution with parameter p, i.e., P[Xᵢ = k] = p(1 − p)^(k−1) for k = 1, 2, … (k: number of trials until the 1st success). Then Σᵢ₌₁ʳ Xᵢ is Negative Binomial with parameters p and the fixed number of successes r.

Example 7.4.5

7.4.2 §§ Convolution: Continuous Case


Definition 7.4.6 (Convolution: Continuous Case).

Suppose that X and Y are independent, continuous random variables having probability density functions f_X and f_Y. Then the density of their sum is the convolution of their densities, i.e., the sum Z = X + Y is a continuous random variable with density

    f_X+Y(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy
             = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx

Example 7.4.7 (Sums of Independent Uniform Random Variables)


Let X and Y be independent uniform random variables each on [0, 1].
"
1 if 0 ď x, y ď 1;
f X ( x ) = fY (y) =
0 otherwise


Let Z = X + Y. The minimum possible value of Z is zero (when x = 0 and y = 0), the mid-interval value is z = 1 (when x + y = 1), and the maximum possible value is two (when x = 1 and y = 1). Thus the sum Z takes values only in the interval (0, 2), since P(Z < 0) = 0 and P(Z > 2) = 0.

    f_Z(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx

Note that the cdf F_Z(z) equals the area of the set {(x, y) ∈ (0,1) × (0,1) : x + y ≤ z}. As f_X(x) = 1 if 0 ≤ x ≤ 1 and 0 otherwise,

    f_Z(z) = ∫₀¹ f_Y(z − x) dx

Plotting the region defined by the limits 0 < x < 1 and 0 < z − x < 1 in the (x, z) plane, we get the integrand as shown in Figure 7.4.3. The limits of integration on x depend on the value of z: the integrand is 1 when 0 < x < 1 and 0 < z − x < 1 (i.e., z − 1 < x < z), and zero otherwise. There are three cases (as in Figure 7.4.3):

1. When 0 < z < 1, the limits run from x = 0 to x = z, so f_Z(z) = ∫₀^z 1 dx = z.

2. When 1 < z < 2, the limits run from x = z − 1 to x = 1, so f_Z(z) = ∫_{z−1}^{1} 1 dx = 2 − z.

3. When z < 0 or z > 2, the integrand is zero, so f_Z(z) = 0.

       f_Z(z) = z       if 0 ≤ z ≤ 1,
              = 2 − z   if 1 < z ≤ 2,
              = 0       otherwise.

Convolution of two rectangle functions will give you a triangle function. (See the pdf of Z in
Figure 7.4.4).
Example 7.4.7
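The triangular shape can also be seen by simulation; a short sketch (NumPy assumed, arbitrary seed):

    # Simulating Z = X + Y for independent U(0,1) variables (NumPy assumed)
    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)
    # empirical checks against the triangular density derived above:
    print((z <= 1).mean())      # P(Z <= 1) = 1/2
    print((z <= 0.5).mean())    # P(Z <= 0.5) = 0.5**2 / 2 = 0.125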


Figure 7.4.3.


Figure 7.4.4.

Convolution of two independent Uniform densities (Triangular Distribution): the density f(z) rises linearly from 0 at z = 0 to 1 at z = 1 and falls back to 0 at z = 2.

Example 7.4.8 (Sums of Independent Normal Distribution)


If X₁, X₂, …, Xₙ are independent Normal random variables with expected values µ₁, µ₂, …, µₙ and variances σ₁², σ₂², …, σₙ², respectively, then Σᵢ₌₁ⁿ Xᵢ = X₁ + X₂ + ⋯ + Xₙ is also a Normal random variable with expected value Σᵢ₌₁ⁿ µᵢ = µ₁ + µ₂ + ⋯ + µₙ and variance Σᵢ₌₁ⁿ σᵢ² = σ₁² + σ₂² + ⋯ + σₙ², i.e.,

    Σᵢ₌₁ⁿ Xᵢ ~ N( Σᵢ₌₁ⁿ µᵢ , Σᵢ₌₁ⁿ σᵢ² )

Example 7.4.8


Example 7.4.9 (Sums of Independent Exponential Distribution)


Let X and Y be independent exponential random variables, each having parameter λ. Let
Z = X + Y, the density of their sums Z is

    f_X+Y(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx
             = ∫₀^z λe^(−λx) · λe^(−λ(z−x)) dx
             = λ² z e^(−λz)

This density is called the Gamma(2, λ) density. The convolution of n = 2 i.i.d.2 Exponential
distributions results in the Gamma(n = 2, λ) density.
Example 7.4.9

2 independent and identically distributed


7.5 IJ Home Work


1. An association study of color blindness ( X ) with 2 possible outcomes: Yes=1 for color
blind and No=0 for not color blinded with gender (Y ) coded as 1 for females and 0 for
males. The joint frequency distribution of X and Y is given in Table:

Color blinded/Gender Male=0 Female=1 Total


Yes=1 16 2 18
No=0 240 254 494
Total 256 256 512

(a) Find FX,Y (1, 0)


(b) Find FX,Y (0, 1)

2. Let ( X, Y ) have the joint pmf given in Table below:

              X=1     X=2     X=3
       Y=1    0       0.3     0.1
       Y=2    0       0.2     0.2
       Y=3    0.1     0.1     0

(a) Find P( X ) and P(Y )


(b) Find P( X = Y ); & P( X ą Y ),
(c) Find P(Y = 2|X = 2)

3. Consider the following joint pm f , f (0, 0) = 1/12; f (1, 0) = 5/12; f (0, 1) = f (1, 1) =
3/12; f ( x, y) = 0 for all other values. Find the marginal distributions of X and Y
respectively.

4. Suppose that a radioactive particle is randomly located in a square with sides of unit
length. Let X and Y denote the coordinates of the particle’s location. A reasonable
model for the relative frequency histogram for X and Y is the bivariate analogue of the
univariate uniform density function:

f ( x, y) = 1 for 0 ď X ď 1; 0 ď Y ď 1
= 0 elsewhere

a. Verify that f ( x; y) is a valid pd f .


b. Find F (0.2, 0.4).
c. Find P(0.1 ď X ď 0.3; 0 ď Y ď 0.5)

5. Suppose that two continuous random variables X and Y have a joint probability density
function

f X,Y ( x, y) = A( x ´ 3)y for ´ 2 ď X ď 3 & 4 ď Y ď 6


= 0 otherwise


a. Find A.
b. Construct the marginal probability density functions f X ( x ) and f Y (y).
c. Are the random variables X and Y independent?

6. A certain process for producing an industrial chemical yields a product that contains
two main types of impurities. Let X denote the proportion of impurities of Type I and
Y denote the proportion of impurities of Type II. Suppose that the joint density of X
and Y can be modelled as,

f X,Y ( x, y) = 2(1 ´ x ) for 0 ď X ď 1 & 0 ď Y ď 1


= 0 otherwise

Compute P(0 ď X ď 0.5; 0.4 ď Y ď 0.7).

§§ Answers
1. 256/512; 494/512

2. p X (1) = 0.1, p X (2) = 0.6; p X (3) = 0.3


pY (1) = 0.4, pY (2) = 0.4; & pY (3) = 0.2
P(X = Y) = 0.2; P(X > Y) = 0.6; P(Y = 2|X = 2) = 1/3

3. f X (0) = 1/3, f X (1) = 2/3


f Y (0) = 1/2, f Y (1) = 1/2.

4. 0.08; 0.10
5. A = −1/125; f_X(x) = 2(3 − x)/25 for −2 ≤ x ≤ 3; f_Y(y) = y/10 for 4 ≤ y ≤ 6. The random variables X and Y are independent.

6. 0.225

Chapter 8

Properties of Expectation

AS YOU READ . . .

1. What is expectation for jointly distributed random variables?

2. What are the conditional distributions and the respective expectations?

3. What are the Covariance and Correlation measures?

8.1 IJ Jointly Distributed Variables: Expectation for Discrete Case

Definition 8.1.1 (Expectation: Discrete Case).

If X and Y are jointly distributed discrete random variables, to calculate E( X ) or


Var ( X ) (i.e. values related to only 1 of the 2 variables), first calculate the marginal
distribution PX ( x ) from the joint distribution. The marginal pm f of X, PX ( x ) is
defined as:

    P_X(x) = Σ_y P_XY(x, y)

The mean of X is still given by:

    E[X] = Σ_x x · P_X(x)
         = Σ_x x · Σ_y P_XY(x, y)


8.2 IJ Jointly Distributed Variables: Expectation for Continuous Case

Definition 8.2.1 (Expectation: Continuous Case).

If X and Y are jointly distributed continuous random variables, then initially calculate the marginal pdf of X, defined as:

    f_X(x) = ∫_y f_XY(x, y) dy

The mean of X is then given by:

    E[X] = ∫_{−∞}^{∞} x · f_X(x) dx
         = ∫_{−∞}^{∞} x ∫_{−∞}^{∞} f_XY(x, y) dy dx

§§ Expectation: Properties

Since E[X] is a weighted average of the possible values of X, if a ≤ X ≤ b then its expected value must also lie between a and b. That is, if

    P(a ≤ X ≤ b) = 1,  then  a ≤ E[X] ≤ b.

A fundamental property of the expectation operator is that it is linear. If X and Y are jointly
distributed random variables and a, b are real numbers, then

E[ aX + bY ] = aE[ X ] + bE[Y ]


8.3 IJ Some Function of Jointly Distributed Random Variable


Definition 8.3.1 (Expectation: Some Function of Random Variables).

Suppose that X and Y are jointly distributed random variables and g( x, y) is a


function of two variables.

1. For discrete X and Y:

       E[g(X, Y)] = Σ_y Σ_x g(x, y) P(x, y)

2. For continuous X and Y:

       E[g(X, Y)] = ∫_y ∫_x g(x, y) f(x, y) dx dy

Example 8.3.2 (Association between Gender and Color-blindness)


The following table shows the joint pm f of Color-blindness that is coded as X = 0 for No,
X = 1 for Yes and Gender coded as Y = 0 for Males, Y = 1 for Females:
X=Color blinded/Y=Gender Male=0 Female=1 Total
No=0 240/512 254/512 494/512
Yes=1 16/512 2/512 18/512
Total 256/512 256/512 1
 
Let g( X, Y ) = XY. Find E g( X, Y )
Solution:

    E[g(X, Y)] = Σ_y Σ_x g(x, y) P(x, y)
               = (0 × 0) × 240/512 + (0 × 1) × 254/512 + (1 × 0) × 16/512 + (1 × 1) × 2/512
               = 0 + 0 + 0 + 2/512
               = 2/512

Example 8.3.2


Example 8.3.3 (Expectation and Variance of X̄)
Let X1 , . . . , Xn be i.i.d.1 random variables having distribution function F and expected value

1 independent identically distributed


µ and variance σ². Let X̄ = Σᵢ₌₁ⁿ Xᵢ / n. Then

    E(X̄) = E( Σᵢ₌₁ⁿ Xᵢ / n )
         = (1/n) E( Σᵢ₌₁ⁿ Xᵢ )
         = (1/n) Σᵢ₌₁ⁿ E(Xᵢ)
         = (1/n) · nµ
         = µ

    Var(X̄) = Var( Σᵢ₌₁ⁿ Xᵢ / n )
           = (1/n)² · Var( Σᵢ₌₁ⁿ Xᵢ )
           = (1/n)² · Σᵢ₌₁ⁿ Var(Xᵢ)     (by independence)
           = (1/n)² · nσ²
           = σ²/n

The same results were also displayed in the Central Limit Theorem (see Figure 6.2.3 and
Figure 6.2.5). The reason for the much smaller variability is now mathematically evident.
The variance of the distribution of the sample means x̄ is scaled down by a factor of size n,
the sample size.

Example 8.3.3


8.3.1 §§ Expectation of Sums of Jointly Distributed Random Variables


Definition 8.3.4 (Expectation of Sums of Jointly Distributed Random Variables).

Suppose that X and Y are random variables with joint probability mass function
PXY and marginal probability mass functions PX and PY . Then E[ X + Y ] is given
by
    E[X + Y] = Σ_x Σ_y (x + y) P(x, y)
             = E(X) + E(Y)

Example 8.3.5 (Association between Gender and Color-blindness Cont’d)


For Example 8.3.2, find E[X + Y] and show that E[X + Y] = E(X) + E(Y).
Solution:
Let g(X, Y) = X + Y.

    E[g(X, Y)] = Σ_y Σ_x g(x, y) P(x, y)
               = Σ_y Σ_x (x + y) · P(x, y)
               = (0 + 0) × 240/512 + (0 + 1) × 254/512 + (1 + 0) × 16/512 + (1 + 1) × 2/512
               = 0 + 254/512 + 16/512 + 2(2/512)
               = 274/512

    E[X] = Σ_x Σ_y x P(x, y)
         = 0 × 240/512 + 0 × 254/512 + 1 × 16/512 + 1 × 2/512
         = 18/512

    E[Y] = Σ_y Σ_x y P(x, y)
         = 0 × 240/512 + 1 × 254/512 + 0 × 16/512 + 1 × 2/512
         = 256/512

    E[X + Y] = 274/512 = 18/512 + 256/512 = E(X) + E(Y)

    ∴ E[X + Y] = E(X) + E(Y)


Another result is also evident: since X and Y are indicator random variables, their expectations equal their respective marginal probabilities at X = 1 and Y = 1, i.e., E(X) = P(X = 1) and E(Y) = P(Y = 1).

Example 8.3.5

8.3.2 §§ Expectation of Sums of Functions of Jointly Distributed Random Variables

Definition 8.3.6 (Expectation of Sums of Functions of Random Variables).

Suppose that X and Y are random variables and g( x, y) and h( x, y) are some func-
tions of the two variables, then:
     
    E[g(X, Y) ± h(X, Y)] = E[g(X, Y)] ± E[h(X, Y)]

Example 8.3.7 (Association between Gender and Color-blindness Cont’d)


For Example 8.3.2, let g( X, Y ) = X + Y and h( X, Y ) = X ´ Y find E[ X ´ Y ] and show that

     
    E[g(X, Y) + h(X, Y)] = E[g(X, Y)] + E[h(X, Y)]

Solution:

Let g( X, Y ) = X + Y and h( X, Y ) = X ´ Y
 
From Example 8.3.5, E g( X, Y ) = 274/512


    E[h(X, Y)] = Σ_y Σ_x h(x, y) P(x, y)
               = Σ_y Σ_x (x − y) · P(x, y)
               = (0 − 0) × 240/512 + (0 − 1) × 254/512 + (1 − 0) × 16/512 + (1 − 1) × 2/512
               = 0 − 254/512 + 16/512 + 0
               = −238/512

    E[g(X, Y) + h(X, Y)] = E[(X + Y) + (X − Y)]
                         = E[2X]
                         = 2 E[X]
                         = 2 × 18/512
                         = 18/256

    E[g(X, Y)] + E[h(X, Y)] = E[X + Y] + E[X − Y]
                            = 274/512 + (−238/512)
                            = 18/256

    ∴ E[g(X, Y) + h(X, Y)] = E[g(X, Y)] + E[h(X, Y)]

Example 8.3.7

Definition 8.3.8 (Independent Random Variables).

Two discrete random variables are independent if:

PX,Y ( x, y) = PX ( x ) ¨ PY (y) @ x& y

If two random variables are independent, then the expectation of the product factors
into a product of expectations, i.e.,
     
    E[g(X) h(Y)] = E[g(X)] · E[h(Y)]

In particular,
E( XY ) = E( X ) ¨ E(Y )


8.4 IJ Conditional Distribution


For jointly distributed random variables X and Y, we can define their conditional distribu-
tions, which quantify the probability of X = x given Y = y.

8.4.1 §§ Conditional Distributions: Discrete Case


Definition 8.4.1 (Discrete Case: Conditional Distribution).

The conditional probability mass function of a discrete random variable X, given


the value of the other random variable Y, for all values of y such that pY (y) ą 0 is

    P_X|Y(x|y) = P(X = x | Y = y)
               = p_X,Y(x, y) / p_Y(y)

§§ Discrete Case: Conditional Distribution Properties


A conditional probability distribution PX|y ( x ) has the following properties:
For discrete random variables ( X, Y )

1. P_X|y(x) ≥ 0

2. Σ_x P_X|y(x) = 1

3. PX|y ( x ) = P( X = x|Y = y)

8.4.2 §§ Conditional Expectation: Discrete Case


Definition 8.4.2 (Discrete Case: Conditional Expectation).

The expectation of a function of two random variables conditioned on one of them


taking a certain value can be computed using the conditional pm f or conditional
pd f . The conditional mean of Y given X = x, denoted as E(Y|X ) is:
    E(Y|x) = Σ_y y P(Y = y | X = x)

Example 8.4.3 (Association between Gender and Color-blindness Cont’d)


For the Example 8.3.2


(a). Find the conditional distribution of X given Y = 0 and X given Y = 1.


(b). Find the E( X|Y = 0)
(c). Find the E( X|Y = 1)
Solution:
(a).
    P(X|Y = 0) = P(X, Y = 0) / P(Y = 0)

    P(X = 0|Y = 0) = (240/512) / (256/512) = 15/16
    P(X = 1|Y = 0) = (16/512) / (256/512) = 1/16

The conditional distribution of X|Y = 0 is

    P(X = 0|Y = 0) = 15/16;   P(X = 1|Y = 0) = 1/16.

The conditional probability distribution P(X|Y = 0) is a probability mass function as it satisfies the properties of a pmf.
    P(X|Y = 1) = P(X, Y = 1) / P(Y = 1)

    P(X = 0|Y = 1) = (254/512) / (256/512) = 127/128
    P(X = 1|Y = 1) = (2/512) / (256/512) = 1/128

Similarly, the conditional probability distribution P( X|Y = 1) is also a probability


mass function.
(b).

    E[X|Y = 0] = Σ_x x P(X = x|Y = 0)
               = 0 × 15/16 + 1 × 1/16
               = 1/16


(c).

    E[X|Y = 1] = Σ_x x P(X = x|Y = 1)
               = 0 × 127/128 + 1 × 1/128
               = 1/128

Conditional expectations such as E[ X|Y = 0] or E[ X|Y = 1] are numbers that depend on y.


So E[ X|Y ] is a function of y. E( X|Y ) is a random variable that is a function of Y. So the
pm f of E( X|Y ) is:

E( X|Y ) 1/16 1/128 Sum


p (Y ) 1/2 1/2 1

Example 8.4.3
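The conditional pmfs, the conditional expectations, and the iterated expectation can all be read off the joint table with a few array operations; a sketch assuming NumPy:

    # Conditional expectations for Example 8.4.3 (NumPy assumed)
    import numpy as np

    # rows: x = 0 (not colour blind), 1 (colour blind); columns: y = 0 (male), 1 (female)
    joint = np.array([[240, 254],
                      [16,   2]]) / 512
    pY = joint.sum(axis=0)                             # marginal of Y: [1/2, 1/2]
    cond_X_given_Y = joint / pY                        # column y holds P(X | Y = y)
    E_X_given_Y = np.array([0, 1]) @ cond_X_given_Y    # [1/16, 1/128]
    print(E_X_given_Y, (E_X_given_Y * pY).sum())       # iterated expectation = E(X) = 9/256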

8.4.3 §§ Conditional Distribution: Continuous Case

Definition 8.4.4 (Continuous Case: Conditional Distribution).

The conditional probability density function of a continuous random variable X,


given the value of the other random variable Y = y for all values of y such that
f (Y ) ą 0, is

    f_X|Y(x|y) = f(X = x | Y = y)
               = f_X,Y(x, y) / f_Y(y)

§§ Continuous Case: Conditional Distribution Properties


A conditional probability distribution f X|y ( x ) has the following properties:
For continuous random variables ( X, Y )

1. f_X|y(x) ≥ 0

2. ∫_x f_X|y(x|y) dx = 1

3. f X|y ( x ) = f ( X = x|Y = y)


8.4.4 §§ Conditional Expectation: Continuous Case


Definition 8.4.5 (Continuous Case: Conditional Expectation).

The conditional mean of Y given X = x, denoted as E(Y|X ), is


    E(Y|X = x) = ∫ y f(Y = y|X = x) dy

The conditional mean reduces distributions to single summary measure.

Example 8.4.6
The joint cd f of X and Y
    F(x, y) = (1/16) xy(x + y),   for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2.
Find the conditional expectation E(Y|X ).
Solution:
The joint pdf and the marginal pdfs of X and Y were computed in Example 7.3.5:

    f(x, y) = (1/8)(x + y)   for 0 ≤ x ≤ 2, 0 ≤ y ≤ 2;   0 elsewhere
    f(x)    = (1/4)(x + 1)   for 0 ≤ x ≤ 2;              0 elsewhere
    f(y)    = (1/4)(y + 1)   for 0 ≤ y ≤ 2;              0 elsewhere

    f(Y|X) = [ (1/8)(x + y) ] / [ (1/4)(x + 1) ]
           = (1/2) (x + y)/(x + 1)

    E(Y|X) = ∫₀² y · (1/2) (x + y)/(x + 1) dy
           = 1/(2(x + 1)) · ( xy²/2 + y³/3 ) |₀²
           = (x + 4/3)/(x + 1)
The conditional mean E(Y|X ) is a function of X.
Example 8.4.6
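The same conditional expectation can be verified symbolically; a sketch assuming SymPy is available:

    # Symbolic check of Example 8.4.6 (SymPy assumed)
    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    f_xy = (x + y) / 8                          # joint pdf on [0,2] x [0,2]
    f_x = sp.integrate(f_xy, (y, 0, 2))         # marginal pdf of X: (x + 1)/4
    cond = sp.simplify(f_xy / f_x)              # f(y | x)
    E_Y_given_X = sp.simplify(sp.integrate(y * cond, (y, 0, 2)))
    print(E_Y_given_X)                          # equivalent to (x + 4/3)/(x + 1)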


8.4.5 §§ Properties of Conditional Expectation


Definition 8.4.7 (Conditional Expectation of Some Function of Random Variable).

Suppose that X and Y are random variables and g( x, y) is a function of two variables,
then the conditional expectation of g( X|Y ) is:

• Discrete Case:

      E[g(X)|Y = y] = Σ_x g(x) P_X|Y(x|y)

• Continuous Case:

      E[g(X)|Y = y] = ∫_x g(x) f_X|Y(x|y) dx

Definition 8.4.8 (Properties of Conditional Expectation).

a. If X1 , X2 , . . . , Xi , . . . , Xk and Y are random variables, then


    E[ Σᵢ Xᵢ | Y = y ] = Σᵢ E[ Xᵢ | Y = y ]

b. Law of Iterated Expectations (Total Expectation Theorem): Iterated expectation is a useful tool to compute the expected value of a quantity that depends on several random variables. As we have seen previously in Example 8.4.3, E(X|Y) is a random variable that is a function of Y, so its expectation can be calculated as E[E[X|Y]]. The idea is that the expected value can be obtained as the expectation of the conditional expectation:

       E[E[X|Y]] = Σ_y E[X|Y = y] · P(Y = y)    (Discrete case)
       E[E[X|Y]] = ∫_y E[X|Y = y] · f_Y(y) dy   (Continuous case)

   and in both cases E[E[X|Y]] = E[X].

c. If X and Y are independent random variables, then

        E[X|Y = y] = E[X]
        E[Y|X = x] = E[Y]


Example 8.4.9 (Association between Gender and Color-blindness Cont’d)


 
Find E[E[X|Y]] for this dataset.
Solution:
The pm f of E[ X|Y ] was evaluated in Example 8.4.3.

        E(X|Y)     1/16     1/128     Sum
        p(y)       1/2      1/2       1

As E[X|Y] is a random variable, its expectation can be computed using the law of iterated
expectations as:

        E[E[X|Y]] = Σ_y E[X|Y = y] P(Y = y)
                  = 1/16 × 1/2 + 1/128 × 1/2
                  = 9/256
                  = E(X)

Example 8.4.9
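The law of iterated expectations can also be illustrated by simulation. The sketch below
(NumPy assumed) draws a large sample from the joint pmf of Example 8.4.3 and compares the
direct sample mean of X with the weighted average of the conditional sample means of X
given Y; both are close to E(X) = 9/256 ≈ 0.035.

    import numpy as np

    rng = np.random.default_rng(0)
    support = np.array([(0, 0), (1, 0), (0, 1), (1, 1)])
    probs = np.array([240, 16, 254, 2]) / 512

    idx = rng.choice(len(support), size=200_000, p=probs)
    x, y = support[idx, 0], support[idx, 1]

    print(x.mean())                                    # direct estimate of E[X]

    # E[E[X|Y]]: conditional means weighted by the empirical pmf of Y
    cond_means = np.array([x[y == 0].mean(), x[y == 1].mean()])
    weights = np.array([(y == 0).mean(), (y == 1).mean()])
    print(np.average(cond_means, weights=weights))     # agrees with x.mean()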

Example 8.4.10 (Wage Distribution)


Suppose that in a firm of 100 employees, 40 have a University degree. Let Y be the monthly
income of a randomly selected employee. The average income of University degree employees
is twice the average income of the non-University degree employees. If the average income
of the non-University degree employees is Rs. 100,000, what is the expected monthly income
E(Y)? Identify the expectation property/law used, if any.
Solution:
Let X be degree status, with X = 0 for a non-University degree employee and X = 1 for a
University degree employee. Since 40 of the 100 employees hold a degree,

        P(X = x) = 0.6 for x = 0;
                   0.4 for x = 1.

We need the overall expected income, which follows from the conditional expectations of
income given degree status:

        E(Y) = E[E(Y|X)] = E[Y|X = 0] P(X = 0) + E[Y|X = 1] P(X = 1)
             = 100,000 × 0.6 + 200,000 × 0.4
             = 60,000 + 80,000
             = 140,000

This uses the law of iterated expectations.


Example 8.4.10
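The same computation can be wrapped in a small helper. The function name below
(iterated_expectation) is just an illustrative choice, not something defined elsewhere in
this text; it simply applies E(Y) = Σ_x E[Y|X = x] P(X = x).

    def iterated_expectation(cond_means, probs):
        # E[Y] = sum over x of E[Y | X = x] * P(X = x)
        return sum(m * p for m, p in zip(cond_means, probs))

    # Wage example: E[Y|X = 0] = 100000, E[Y|X = 1] = 200000,
    # with P(X = 0) = 0.6 and P(X = 1) = 0.4
    print(iterated_expectation([100_000, 200_000], [0.6, 0.4]))   # 140000.0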


8.5 IJ Covariance
The covariance matrix of a random vector captures the interaction between the components
of the vector. The diagonal entries contain the variances of the individual variables,
and the covariances between the different variables occupy the off-diagonal entries.
Example 8.5.1 (Iris Data Set: Covariance Matrix)
The covariance matrix between Petal Length and Petal Width in Fisher's Iris dataset is given
below:

        Σ = [ σ_x²   σ_xy ]  =  [ 0.68  0.29 ]
            [ σ_xy   σ_y² ]     [ 0.29  0.18 ]
Figure 8.5.1 shows the scatter plot matrix for Fisher’s Iris dataset. Each scatter plot shows
the relationship between a pair of variables Petal Length, Petal Width, Sepal Length and
Sepal Width.

Figure 8.5.1.

[Scatterplot matrix: pairwise scatter plots of Sepal.Length, Sepal.Width, Petal.Length
and Petal.Width.]

Scatterplot Matrix for Iris Dataset

Example 8.5.1
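A covariance matrix like the one above can be computed directly from data. The sketch
below assumes scikit-learn is installed (used here only to load the Iris measurements)
and uses np.cov; the exact numbers obtained depend on the data and units used, so they
need not reproduce the matrix quoted above.

    import numpy as np
    from sklearn.datasets import load_iris   # assumed available; used only for the data

    iris = load_iris()
    petal = iris.data[:, 2:4]                # petal length and petal width (cm)

    # 2 x 2 sample covariance matrix: variances on the diagonal,
    # the covariance between the two variables off the diagonal
    print(np.cov(petal, rowvar=False))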


Example 8.5.2 (Covariance: Shape of the Data)


The covariance matrix defines the shape of the data. Diagonal spread is captured by the
covariance, while axis-aligned spread is captured by the variance (see Figure 8.5.2).

Figure 8.5.2.

Link Between Shape of the Data and the Covariance Matrix:

https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/

Example 8.5.2


Definition 8.5.3 (Covariance).

The covariance is a measure of the joint variability between two random variables X
and Y, denoted σ_XY:

        σ_XY = E[ (X − E(X)) · (Y − E(Y)) ]
             = E[XY] − E[X] · E[Y]

If X and Y are independent, then E(XY) = E(X) · E(Y), so

        σ_XY = E[XY] − E[X] · E[Y]
             = E[X] · E[Y] − E[X] · E[Y]
             = 0

However, the converse is not true: zero covariance does not imply independence.

Example 8.5.4
Let X and Y be two independent Bernoulli random variables with parameter p = 1/2. Con-
sider the random variables
        U = X + Y
        V = X − Y

Calculate the covariance between U and V.


Solution:
Note that
        P_U(0) = P(X = 0, Y = 0) = 1/4
        P_V(0) = P(X = 1, Y = 1) + P(X = 0, Y = 0) = 1/2
        P_{U,V}(0, 0) = P(X = 0, Y = 0) = 1/4

Since P_{U,V}(0, 0) ≠ P_U(0) P_V(0), U and V are not independent. However, they are
uncorrelated, as

        σ_UV = E[UV] − E[U] · E[V]
             = E[(X + Y)(X − Y)] − E[X + Y] · E[X − Y]
             = E(X²) − E(Y²) − E(X)² + E(Y)²
             = 0
The final equality holds because X and Y have the same distribution.
Example 8.5.4
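A simulation makes the distinction between "uncorrelated" and "independent" concrete. The
sketch below (NumPy assumed) generates many independent Bernoulli(1/2) pairs, forms
U = X + Y and V = X − Y, and shows that the sample covariance is near zero even though,
for example, V can never be 0 when U = 1.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500_000
    x = rng.integers(0, 2, size=n)        # Bernoulli(1/2)
    y = rng.integers(0, 2, size=n)        # independent Bernoulli(1/2)
    u, v = x + y, x - y

    print(np.cov(u, v)[0, 1])             # close to 0: U and V are uncorrelated

    # ...but not independent: P(V = 0 | U = 1) = 0 while P(V = 0) = 1/2
    print((v[u == 1] == 0).mean(), (v == 0).mean())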


8.5.1 §§ Properties

Definition 8.5.5 (Covariance: Properties).

        Cov(X, Y) = Cov(Y, X)                              (Symmetry)
        Cov(X, X) = Var(X) ≥ 0
        Cov(aX, Y) = a Cov(X, Y)
        Cov( Σ_i X_i , Σ_j Y_j ) = Σ_i Σ_j Cov(X_i, Y_j)

Example 8.5.6 (Association between Gender and Color-blindness Cont’d)


Find the covariance σXY between Color-blindness X and Gender Y in Example 8.3.2.
Solution:

  
σ_XY = E[ (X − E(X)) · (Y − E(Y)) ]
     = E[XY] − E[X] · E[Y]

E[XY] = 2/512 was computed in Example 8.3.2, while E[X] = 18/512 and E[Y] = 256/512
were computed in Example 8.3.5. Therefore

σ_XY = E[XY] − E[X] · E[Y]
     = 2/512 − 18/512 × 256/512
     = −7/512

There is a weak negative association between Gender and Color-blindness.


Example 8.5.6

8.6 IJ Correlation

§§ Background
The covariance does not take into account the magnitude of the variances of the random
variables involved. Correlation quantifies the strength of the linear relationship between a
pair of random variables.



Definition 8.6.1 (Correlation: Strength of Linear Relationship).

The correlation is a measure of mutual relationship between two random variables


X and Y, denoted by ρ( X, Y ). Correlation is a scaled version of covariance and
is obtained by normalizing the covariance using the standard deviations of both
variables. Mathematically, ρ( X, Y ) is defined as:

        ρ(X, Y) = Cov(X, Y) / √( Var(X) · Var(Y) )

Note: The variance is only zero when a random variable is constant. So, as long as
X and Y are not constant, the correlation between them is well-defined.

        −1 ≤ ρ(X, Y) ≤ 1

• A value of ρ(X, Y) ≈ 1 indicates a high degree of positive linear relationship
  between X and Y,

• ρ(X, Y) ≈ 0 means a weak linear relationship,

• A value of ρ(X, Y) ≈ −1 indicates a high degree of negative linear relationship
  between X and Y.


The scatter plots below illustrate the cases of strong and weak (positive or negative)
correlation. Figure 8.6.2 shows the strong positive linear association between duration of
the eruption and waiting time between eruptions for the Old Faithful geyser in Yellowstone
National Park, Wyoming, USA.

Figure 8.6.2.

[Scatter plot of waiting time (waiting) against eruption duration (eruptions).]

Scatterplot for Old Faithful Geyser Dataset; r = 0.90


Figure 8.6.3 shows the relationship between the weight of each patient after the study
period (Postwt, in lbs) and the weight before the study period (Prewt, in lbs) for young
female anorexia patients. There appears to be no strong linear relationship between
Postwt and Prewt.

Figure 8.6.3.

[Scatter plot of Postwt against Prewt.]

Scatterplot between Prewt and Postwt in Anorexia Dataset; r = 0.33.


Figure 8.6.4 shows the relationship between Miles/gallon (mpg) and Displacement (disp,
cu. in.) in the cars dataset. There appears to be a negative relationship between mpg
and disp.

Figure 8.6.4.

[Scatter plot of displacement (disp) against miles per gallon (mpg).]

Scatterplot between mpg and disp in cars dataset; r = −0.85


Example 8.6.2 (Association between Gender and Color-blindness Cont’d)


Find the correlation ρ XY between Color-blindness X and Gender Y in Example 8.3.2.
Solution:

        ρ(X, Y) = Cov(X, Y) / √( Var(X) · Var(Y) )

        Var(X) = E[X²] − (E[X])²
        Var(Y) = E[Y²] − (E[Y])²

        E[X²] = Σ x² P(x, y)
              = 0² × 240/512 + 0² × 254/512 + 1² × 16/512 + 1² × 2/512
              = 18/512

        Var(X) = 18/512 − (18/512)²
               = 0.0339

        E[Y²] = Σ y² P(x, y)
              = 0² × 240/512 + 1² × 254/512 + 0² × 16/512 + 1² × 2/512
              = 256/512
              = 1/2

        Var(Y) = 1/2 − (1/2)²
               = 0.25

From Example 8.5.6, Cov(X, Y) = −7/512. Therefore

        ρ(X, Y) = Cov(X, Y) / √( Var(X) · Var(Y) )
                = (−7/512) / √(0.0339 × 0.25)
                = −0.1485

There is a weak negative correlation between Gender and Color-blindness.


Example 8.6.2
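The whole chain of calculations (marginals, moments, covariance, correlation) can be
carried out on the joint pmf table in a few lines. The sketch below (NumPy assumed)
reproduces Cov(X, Y) = −7/512 and ρ(X, Y) ≈ −0.1485.

    import numpy as np

    # Joint pmf; rows index x = 0, 1 and columns index y = 0, 1
    joint = np.array([[240, 254],
                      [ 16,   2]]) / 512
    x_vals = y_vals = np.array([0, 1])

    p_x, p_y = joint.sum(axis=1), joint.sum(axis=0)
    e_x, e_y = x_vals @ p_x, y_vals @ p_y
    e_xy = x_vals @ joint @ y_vals

    cov = e_xy - e_x * e_y
    var_x = (x_vals**2) @ p_x - e_x**2
    var_y = (y_vals**2) @ p_y - e_y**2
    print(cov, cov / np.sqrt(var_x * var_y))   # -0.01367 (= -7/512), -0.1485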

Example 8.6.3
Find a simplified expression for the correlation between 10X and Y + 4.
Solution:


Certain properties of covariance and variance are used here:

        Cov(10X, Y + 4) = 10 Cov(X, Y)
        Var(10X) = 100 Var(X)
        Var(Y + 4) = Var(Y)

        ρ(10X, Y + 4) = Cov(10X, Y + 4) / √( Var(10X) · Var(Y + 4) )
                      = 10 Cov(X, Y) / (10 SD(X) · SD(Y))
                      = Cov(X, Y) / (SD(X) · SD(Y))
                      = ρ(X, Y)

Example 8.6.3
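This invariance of the correlation under positive scaling and shifts is easy to confirm
empirically. The sketch below (NumPy assumed) builds an arbitrary correlated pair and
checks that the sample correlation of (10X, Y + 4) matches that of (X, Y).

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=100_000)
    y = 0.6 * x + rng.normal(size=100_000)     # any correlated pair will do

    print(np.corrcoef(x, y)[0, 1])
    print(np.corrcoef(10 * x, y + 4)[0, 1])    # same value, up to rounding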

Example 8.6.4
Let X be a Uniform random variable on the interval [0, 1], and let Y = X². Find the
correlation between X and Y.

Solution:

        ρ(X, Y) = Cov(X, Y) / √( Var(X) · Var(Y) )

        Var(X) = E[X²] − (E[X])²
        Var(Y) = E[Y²] − (E[Y])²

We know that X ~ Uniform(0, 1), so the pdf is:

        f_X(x) = 1   if 0 ≤ x ≤ 1;
                 0   otherwise


We see that

        E[X] = ∫_0^1 x · 1 dx = 1/2

        E[X²] = ∫_0^1 x² · 1 dx = 1/3

        Var(X) = 1/3 − (1/2)² = 1/12

        E[Y] = E[X²] = 1/3

        E[Y²] = E[X⁴] = ∫_0^1 x⁴ · 1 dx = 1/5

        Var(Y) = 1/5 − (1/3)² = 4/45

        Cov(X, Y) = E[XY] − E(X) E(Y) = E[X³] − E(X) E(Y)

        E[X³] = ∫_0^1 x³ · 1 dx = 1/4

        Cov(X, Y) = 1/4 − (1/2)(1/3) = 1/12

        ρ(X, Y) = (1/12) / √(1/12 × 4/45)
                = 0.968

There is a strong linear relationship between X and Y = X².


Example 8.6.4
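A Monte Carlo estimate gives the same answer. The sketch below (NumPy assumed) samples X
uniformly on [0, 1] and computes the sample correlation between X and X²; the result is
close to the exact value 0.968.

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 1, size=1_000_000)
    print(np.corrcoef(x, x**2)[0, 1])   # ≈ 0.968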


8.7 IJ Home Work


1. Let ( X, Y ) have the joint pm f given in Table:
X=1 X=2 X=3
Y=1 0.3 0.1
Y=2 0.2 0.2
Y=3 0.1 0.1
• Find the covariance σXY
• Find the correlation ρ XY . Interpret the results
2. Consider the following joint pmf: f(0, 0) = 1/12; f(1, 0) = 5/12; f(0, 1) = f(1, 1) = 3/12;
f(x, y) = 0 for all other values.
• Find the covariance σXY
• Find the correlation ρ XY . Interpret the results
3. Let (X, Y) have the joint pdf given below:

        f(x, y) = (12/7)(x² + xy)   for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1
                = 0                 otherwise

• Find the covariance σXY


• Find the correlation ρ XY . Interpret the results
4. Let X and Y be continuous random variables with joint pdf

        f(x, y) = 3x   for 0 ≤ y ≤ x ≤ 1
                = 0    otherwise

   Find Cov(X, Y) and Cor(X, Y).
5. Let X and Y correspond to the horizontal and vertical coordinates in the triangle with
corners at (2, 0), (0, 2), and the origin. Let
        f(x, y) = (15/28)(xy² + y)   for (x, y) inside the triangle
                = 0                  otherwise

• Find the covariance σXY


• Find the correlation ρ XY . Interpret the results
6. Let X and Y be two continuous random variables with joint probability density function:
        f_{X,Y}(x, y) = 3/2   for 0 ≤ x ≤ 1 & x² ≤ y ≤ 1
                      = 0     otherwise

   Compute the conditional probability density function of Y given X = x.


§§ Answers
1. -0.16; -0.3563

2. -1/12; -0.3535

3. -0.0034; -0.056

4. 0.01875; 0.397

5. -0.0986; -0.5958
6. 1/(1 − x²)

Bibliography

[1] Chan, Stanley H., Introduction to Probability for Data Science, Michigan Publishing,
2021.

[2] Ward, M. D., and Gundlach, E. Introduction to Probability, Freeman Company, 2016.

[3] DeCoursey W. J., Statistics and Probability for Engineering Applications With Mi-
crosoft Excel, Newnes, 2003.

[4] Devore J. L., Probability and Statistics for Engineering & Sciences, Brooks/Cole, 2012.

[5] Forsyth D., Probability and Statistics for Computer Science, Springer, 2018.

[6] Hayter A., Probability and Statistics for Engineers & Scientists, Brooks/Cole, 2012.

[7] Montgomery, Douglas C., and Runger George C., Applied Statistics and Probability for
Engineers, John Wiley & Sons, Inc, 2011.

[8] Mendenhall W., Beaver R.J., and Beaver, B. M., Introduction to Probability and Statis-
tics, 14th Edition , Brooks/Cole, 2013.

[9] Meyer P.L., Introductory Probability And Statistical Applications, Addison-Wesley, 1970.

[10] Rice J. A., Mathematical Statistics and Data Analysis, 3rd Edition, 2007

[11] Ross S., A First Course in Probability, 9th Edition, 2012

[12] Ross S., Introduction to Probability Models, 10th Edition, 2010

[13] Ross S., Introduction to Probability and Statistics For Engineers And Scientists , 3rd
Edition, 2004

[14] Taboga, Marco. Conditional expectation, Lectures on probability theory and
mathematical statistics. Kindle Direct Publishing, 2021. Online appendix.
https://www.statlect.com/fundamentals-of-probability/conditional-expectation.


[15] Triola E., M., Elementary Statistics, Pearson Education, New York 2005.

[16] Walpole R. E., Myers, R. H., Myers, S. L. and Ye, K., Probability and Statistics for
Engineers & Scientists, Brooks/Cole, 2012.

Index

Axioms, 29
Bayes' Theorem, 49
Birthday Paradox, 38
Central Limit Theorem (CLT)
    Sample Mean, 160
    Sample Total, 159
Chebyshev Inequality, 156
Conditional Probability, 39
Continuity Correction, 169
Convolution, 189
    Continuous Case, 191
    Discrete Case, 189
Correlation, 216
Counting Rules, 14
    Combinations, 21
    Multinomial Coefficient, 18
    Multiplication Rule, 15
    Permutation, 17
Covariance, 214
Cumulative Distribution Function (cdf), 60
Distribution
    Bernoulli, 74
    Binomial, 75, 76
    Exponential, 145
    Geometric, 89
    Hypergeometric, 100
    Negative Binomial, 93
    Normal, 126
    Poisson, 82, 84
    Uniform, 121
Events, 11, 28
    At least Events, 28
    Dependent Events, 44
    Disjoint Events, 28
    Equally Likely Events, 28
    Independent, 42
    Null Events, 28
Expectation, 64
    Continuous, 113
    Discrete, 65
i.i.d., 67
Indicator Random Variable, 67
Joint Distribution, 174
    Conditional
        expectation properties, 210
    Continuous
        Conditional, 208
        Expectation, 200
    Discrete
        Conditional, 206
        Expectation, 199
    Independent, 205
    Marginal pdf, 182
    Marginal pmf, 175
    pdf, 181
    pmf, 174
Median, 67
Memoryless Property
    Exponential Distribution, 151
    Geometric Distribution, 92
Moment, 65


Moment Generating Function, 142

Normal approximation
Binomial, 167

Odds, 37

pdf, 108
Poker Hand, 11
Probability
Equally-likely events, 36
Probability Laws
Complement Rule, 29
Inclusion-exclusion principle, 32
Law-of-Total Probability, 47
Multiplication Law, 41
Dependent Events, 42, 45
Independent Events, 42
Probability Mass Function (pmf), 58

Random Variable, 56
Continuous, 57
Discrete, 57
Randomness, 7

Sample Space, 10

Tree Diagram, 10

Variance
Continuous, 115
Discrete, 69
