
Probability Theory – STAT311

University of Tabuk - Faculty of Science


Department of Statistics

PROBABILITY THEORY

STAT311

Lecture notes: Dr. Elfarazdag Mahjoub

Dr. Amin Haleeb


This course will cover the following topics:

- Random variables & probability distributions for both discrete and
  continuous cases
- Expectation and variance
- Probability distributions for discrete random variables: binomial,
  geometric, and Poisson
- Probability distributions for continuous random variables: the uniform
  and exponential distributions
- The moment and moment generating function:
  - The moments about the origin
  - The moments about the mean
  - Moments from the moment-generating function
  - The MGF of the binomial distribution
  - The MGF of the geometric distribution
  - The MGF of the Poisson distribution
  - The MGF of common continuous distributions: the continuous uniform
    and exponential distributions
- Joint probability distribution function - discrete case:
  - Joint probability mass function
  - Marginal probability distribution function
  - Conditional distribution
  - Statistical independence
  - Mathematical expectation
  - Covariance of random variables
  - Correlation coefficient
- Joint probability distribution function - continuous case:
  - Joint probability density function
  - Marginal probability distribution function
  - Conditional distribution
  - Statistical independence
  - Mathematical expectation
  - Covariance of random variables
  - Correlation coefficient
PART ONE

RANDOM VARIABLES & PROBABILITY DISTRIBUTIONS


RANDOM VARIABLES:

A random variable is a function that assigns numbers to the basic experimental outcomes.
Let’s consider the coin example and let X(tail) = 0 and X(head) = 1 in which case the
variable X is defined as the number of heads occurring.
Generally speaking, random variables are either discrete or continuous. It is important to
distinguish between discrete and continuous random variables because different
mathematical techniques are utilized depending on which type is involved.

A random variable is said to be discrete if it can assume only a finite number of values. A
random variable is said to be continuous if it can assume any value in some interval or set
of intervals. In the continuous case, the set of possible outcomes is always infinite.
Therefore, it is not possible to list the sample space by individual values in any form. The
way to distinguish between discrete and continuous is to ask whether the values of the
random variable can be counted. The outcomes of discrete random variables can be counted
(e.g., the number of heads in 10 coin tosses, or the number of defective items in a
batch of 300 units). The outcomes of continuous random variables are measured rather
than counted (e.g., the weight of an individual).

DISCRETE PROBABILITY DISTRIBUTION - 𝐏(𝐱)


The probability distribution of a discrete random variable X is a list or table of the distinct
numerical values of X and the probabilities associated with those values. The probability
distribution is usually given in tabular form or in the form of an equation
PROPERTIES:
i.  0 ≤ p(x) ≤ 1; each probability must lie between 0 and 1.
ii. ∑ p(x) = 1, where the sum runs over all values of x; the probabilities must add to 1.
EXAMPLE: Consider the random variable x = number of car sales per day with the
corresponding probabilities (probability distribution).

x     P(x)
0     0.45
1     0.25
2     0.20
3     0.10
      ∑ p(x) = 1

We can see that the sum of all the probabilities is ∑ p(x) = 0.45 + 0.25 + 0.20 + 0.10 = 1.
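Both properties are easy to check mechanically. Below is a minimal Python sketch using the car-sales table above; the dictionary encoding and variable names are illustrative, not part of the course material.

```python
# Car-sales PMF from the table above.
pmf = {0: 0.45, 1: 0.25, 2: 0.20, 3: 0.10}

# Property (i): every probability lies in [0, 1].
assert all(0 <= p <= 1 for p in pmf.values())

# Property (ii): the probabilities sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# Probabilities of events are sums over values, e.g. P(X >= 2):
print(pmf[2] + pmf[3])  # 0.30
```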


EXAMPLE:

Consider the experiment of tossing a pair of coins. The sample space is
{HH, HT, TH, TT}; it has 4 sample points because each coin has 2 faces: 2² = 4.
If we toss a coin three times, there are 2³ = 8 outcomes, and the sample space (all
possible outcomes) is
{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Exercise: set up the sample space for a single toss of a pair of dice.

Let x be the number of heads. Find the probability distribution of x if we toss a coin
three times.

x     Sample points        P(x)
0     TTT                  1/8
1     HTT, THT, TTH        3/8
2     HHT, HTH, THH        3/8
3     HHH                  1/8
Total                      1

X is a random variable and p(x) is the probability of x, so this table is the probability
distribution of x, since it satisfies the above criteria:

p(x) ≥ 0
∑ p(x) = 1

ANOTHER EXAMPLE:
P(x) = x/10, x = 1, 2, 3, 4 is a probability distribution since:

P(1) = 0.1, P(2) = 0.2, P(3) = 0.3, and P(4) = 0.4. So both conditions are satisfied.


A THIRD EXAMPLE:
It is known from census data that for a particular income group that 10% of
households have no children, 25% have one child, 50% have two children, 10% have
three children, and 5% have four children. If X represents the number of children
per household for this income group, then the probability distribution of X is given
in the following table:

𝑥 0 1 2 3 4 ∑ 𝑃(𝑋)

𝑃(𝑋) 0.10 0.25 0.50 0.10 0.05 1

Again both conditions are satisfied

- The event X ≥ 2 is the event that a household in this income group has at least
two children and means that X = 2, or X = 3, or X = 4. The probability that
X ≥ 2 is given by:
P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) = .50 + .10 + .05 = .65

- The event X ≤ 1 is the event that a household in this income group has at most
one child and is equivalent to X = 0, or X = 1. The probability that X ≤ 1 is given
by: P(X ≤ 1) = P(X = 0) + P(X = 1) = .10 + .25 = .35

- The event 1 ≤ X ≤ 3 is the event that a household has between one and three
children inclusive and is equivalent to X = 1, or X = 2, or X = 3. The probability
that 1 ≤ X ≤ 3 is given by:
P(1 ≤ X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = .25 + .50 + .10 = .85
CUMULATIVE DISTRIBUTION FUNCTIONS FOR DISCRETE RANDOM VARIABLES
The cumulative distribution or the cumulative probability distribution of a random
variable is P(X ≤ x). It is obtained in a way similar to finding the cumulative relative
frequency distribution for samples.
EXAMPLE: For the following probability distribution, calculate the probabilities below:

x     P(x)
1     0.10
2     0.44
3     0.30
4     0.16
      ∑ p(x) = 1

i. P(X ≤ 1) = 0.10
ii. P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.1 + 0.44 = 0.54
iii. P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.1 + 0.44 + 0.3 = 0.84
iv. P(X ≤ 4) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 0.1 + 0.44 +
0.3 + 0.16 = 1

EXPECTED VALUE, VARIANCE AND STANDARD DEVIATION OF A DISCRETE RANDOM VARIABLE

In working with discrete random variables it is helpful to know certain values that aid in
describing the distribution. The most commonly used values are those that identify the
physical center and the dispersion (the way the values are spread around the center). Given
a discrete random variable x with probability function P(x), the expected value of x (also
called the mean of the random variable x and denoted E[X] or μ) is defined as the
weighted average of the values x may assume, where the weights are the corresponding
probabilities, that is:

𝝁 = 𝑬(𝒙) = ∑ 𝒙 𝑷(𝒙)

EXAMPLE:
Determine the expected number of broken tools per day for the probability distribution
given in Table1:
TABLE 1: DISCRETE PROBABILITY DISTRIBUTION FOR BROKEN TOOLS
Number broken per day (x)   P(x)         x P(x)
0                           .23          0
1                           .50          .50
2                           .15          .30
3                           .08          .24
4                           .04          .16
Total                       ∑ P(x) = 1   ∑ x P(x) = 1.2

Hence the mean or expected number of broken tools per day is 1.20. This value can be
interpreted as the long run average of broken tools per day. Obviously, this value cannot
occur on any day since the average is not an integer value. However, in interpreting this
value over, say, 50 days, the factory can expect to have (50)(1.2) = 60 broken tools. This
result does not imply that exactly 60 tools will be broken over the 50 day period, but it
does provide an estimate of the replacement tools that will be needed.

VARIANCE AND STANDARD DEVIATION

The general formula for computing the variance is:

σ² = E(x²) − (E(x))² = E(x²) − μ²
   = ∑ x²P(x) − [∑ x P(x)]²

The variance for the tool breakage example is computed in Table 2:

TABLE 2: CALCULATION OF VARIANCE FOR A DISCRETE RANDOM VARIABLE
Number broken per day (x)   P(x)   x²   x P(x)   x²P(x)
0                           .23    0    0        0
1                           .50    1    0.50     0.50
2                           .15    4    0.30     0.60
3                           .08    9    0.24     0.72
4                           .04    16   0.16     0.64
Total                       1.0         1.20     2.46

Therefore, the variance σ² is:

σ² = ∑ x²P(x) − [∑ x P(x)]² = 2.46 − (1.2)² = 1.02

Hence the variance of daily tool breakage is σ² = 1.02 breakdowns squared per day.
Even though the variance shows dispersion or variation around the mean, it is based on squared
values.
The standard deviation is the generally preferred measure of dispersion because the result is
in the same terms (not squared) as the original random variable (x). It is denoted σ and
is computed by taking the square root of the variance. For our example: σ = √1.02 ≈ 1.01
breakdowns per day. A small standard deviation relative to the mean can be interpreted
as the values of the random variable being closely bunched around the expected value (mean).
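The same computation in a short Python sketch (assuming the tool-breakage table above; the encoding is illustrative):

```python
# Tool-breakage PMF from Table 1 / Table 2.
pmf = {0: 0.23, 1: 0.50, 2: 0.15, 3: 0.08, 4: 0.04}

mean = sum(x * p for x, p in pmf.items())     # E(X)   = 1.20
ex2 = sum(x**2 * p for x, p in pmf.items())   # E(X^2) = 2.46
variance = ex2 - mean**2                      # 1.02
std_dev = variance ** 0.5                     # ~1.01
print(mean, variance, round(std_dev, 2))
```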

PROBABILITY DISTRIBUTIONS FOR DISCRETE RANDOM VARIABLES: BINOMIAL, GEOMETRIC, AND POISSON
BINOMIAL DISTRIBUTION
There are many different discrete probability distributions; we begin with the binomial
distribution.
The simplest random variable is one that has one value. However, we would have little
interest in such a random variable. If the random variable can assume one of two possible
values, it could be used to describe an experiment that can be classified as resulting in either
“success” or “failure”. Let’s assume that the random variable assigns the value of 1 to
success and value of 0 to failure with probability of p to success and probability of 1-p to
failure. This type of discrete variable is generally known as a Bernoulli random variable.
If a random experiment consists of making n independent trials from an infinite population
where the probability of success 𝑝 is constant from trial to trial, the probability of number
of successes 𝑃(𝑥) is given by the binomial distribution. The general forms of the probability
function for the family of binomial distributions given by:
P(x) = P(X = x) = (n choose x) p^x q^(n−x) = [n!/(x!(n−x)!)] p^x q^(n−x),  x = 0, 1, 2, …, n
PROPERTIES OF A BINOMIAL PROCESS ARE LISTED BELOW:

1. There are two possible outcomes for each trial. Outcomes could be yes or
no, success or failure, defective or non-defective, heads or tails and so on.
2. The probability of an outcome remains constant from trial to trial. For
example the probability of success or failure on any trial remains the same
regardless of the number of trials. If the probability of success is .30 it will
remain .30 on each trial regardless of the number of successes on previous
trials.
3. Related to number 2, outcomes of the trials are independent. In other
words, if a success occurred on a previous trial, it does not affect the
probability of success on the next trial.

4. The number of trials is a discrete integer. For example, the number of
   trials can be 10 but not 10.3.

Then as we stated above, the discrete random variable 𝑋 = the number of successes in 𝑛
trials has a Binomial (n, p) distribution for which the probability distribution function is
given by:

P(X = x) = (n choose x) p^x q^(n−x) = [n!/(x!(n−x)!)] p^x q^(n−x),  x = 0, 1, 2, …, n

WHERE:

𝑛 is the number of trials


𝑝 = the probability of success in a single trial
𝑞 = 1 – 𝑝 = the probability of failure in a single trial
𝑥 is the number of observed successes.
and (n choose x) = n!/(x!(n−x)!) is the number of combinations,
n! = n(n−1)(n−2) ⋯ 1, e.g. 4! = 4 × 3 × 2 × 1 = 24.
By definition, 0! = 1.
EXAMPLE:

i.  (5 choose 2) = 5!/(2!(5−2)!) = 5!/(2!3!) = (5 × 4 × 3!)/(2 × 1 × 3!) = 10
ii. (8 choose 6) = 8!/(6!(8−6)!) = 8!/(6!2!) = (8 × 7 × 6!)/(6! × 2 × 1) = 28

Example: Consider the experiment of tossing a coin twice. Let 𝑥 be the number of
heads(𝐻). Find the probability of getting:
i. 0 heads
ii. 1 head
iii. Two heads
SOLUTION: In this example:

n = 2,  p = 1/2,  q = 1 − p = 1/2

i.   P(X = 0) = (2 choose 0)(1/2)⁰(1/2)² = [2!/(0!2!)](1/4) = 1/4
ii.  P(X = 1) = (2 choose 1)(1/2)¹(1/2)¹ = [2!/(1!1!)](1/4) = 2 × 1/4 = 1/2
iii. P(X = 2) = (2 choose 2)(1/2)²(1/2)⁰ = [2!/(2!0!)](1/4) = 1/4
THE MEAN AND VARIANCE OF BINOMIAL DISTRIBUTION:

The mean and variance for a Binomial(n, p) random variable are:

i.  The mean: E(x) = μ = ∑ x p(x) = np
ii. The variance: σ² = np(1 − p) = npq

For the previous example, find the mean and variance of the number of heads.

(i)  The mean: E(x) = np = 2 × 1/2 = 1
(ii) The variance: σ² = npq = 2 × 1/2 × 1/2 = 1/2

EXAMPLE:

Suppose that the probability that a man in a certain country has high blood pressure is
0.15. If we randomly select six men in this country:

a. Find the probability distribution function for the number of men out of 6 with high
blood pressure.
b. Find the probability that there are 4 men with high blood pressure.
c. Find the probability that all 6 men have high blood pressure.
d. Find the probability that none of the 6 men has high blood pressure.
e. What is the probability that more than two men will have high blood pressure?
f. Find the expected number and variance of men with high blood pressure.
f. Find the expected number and variance of high blood pressure

SOLUTION:

Let x =the number of men out of 6 with high blood pressure. Then x has a binomial
distribution.
Success = the man has high blood pressure (probability p).

Failure = the man does not have high blood pressure (probability q).

Probability of success: p = 0.15, and hence probability of failure: q = 1 − p = 0.85.

Number of trials: n = 6.

Then X has a binomial distribution: X ~ Bin(6, 0.15).


a. The probability distribution function is:
P(X = x) = (6 choose x)(0.15)^x(0.85)^(6−x),  x = 0, 1, 2, …, 6

b. The probability that 4 men will have high blood pressure:
P(X = 4) = (6 choose 4)(0.15)⁴(0.85)² = (15)(0.15)⁴(0.85)² = 0.00549

c. The probability that all 6 men have high blood pressure:
P(X = 6) = (6 choose 6)(0.15)⁶(0.85)⁰ = (0.15)⁶ = 0.00001

d. The probability that none of the 6 men has high blood pressure:
P(X = 0) = (6 choose 0)(0.15)⁰(0.85)⁶ = (0.85)⁶ = 0.37715
e. The probability that more than two men will have high blood pressure:

P(X > 2) = 1 − P(X ≤ 2) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]

         = 1 − [0.37715 + 0.39933 + 0.17618] = 1 − 0.95266 = 0.04734
THE EXPECTED VALUE AND VARIANCE:

i. The mean E(x) = np = 6 × 0.15 = 0.9

ii. The variance σ2 = npq = 6 × 0.15 × 0.85 = 0.765
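A Python sketch of the binomial computations in this example (standard library only; the helper function name is ours):

```python
from math import comb

n, p = 6, 0.15  # Binomial(6, 0.15), as in the example

def binom_pmf(x):
    """P(X = x) = C(n, x) p^x q^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(4))                              # ~0.00549
print(binom_pmf(6))                              # ~0.00001
print(binom_pmf(0))                              # ~0.37715
print(1 - sum(binom_pmf(x) for x in range(3)))   # P(X > 2) ~0.04734
print(n * p, n * p * (1 - p))                    # mean 0.9, variance 0.765
```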

H.W.:

XYZ Manufacturing Company produces a product that has a known defective


rate of 5%. In a day, 10 items are randomly selected and checked to see if they
are defective. From a sample of 10 units selected, what is the probability of
finding 2 units or less to be defective?

GEOMETRIC DISTRIBUTION
PROPERTIES:
1. An experiment consists of repeating trials until first success.
2. Each trial has two possible outcomes;
(a) A success with probability p


(b) A failure with probability 𝑞 = 1 − 𝑝.


3. Repeated trials are independent.
X = number of trials to first success.
The probability mass function (PMF) is given by:
P(X = x) = (1 − p)^(x−1) p,  x = 1, 2, …
In this case, we say that X follows a geometric distribution.
The cumulative distribution function (CDF) is given by:
P(X ≤ x) = 1 − (1 − p)^x
EXAMPLE 1
Products produced by a machine have a 3% defective rate.
i.  What is the probability that the first defective occurs in the fifth item inspected?
ii. What is the probability that the first defective occurs in the first five inspections?
Solution
i.  P(X = 5) = P(1st 4 non-defective) P(5th defective)
    P(X = 5) = (0.97)⁴ × 0.03 = 0.027
ii. What is the probability that the first defective occurs in the first five inspections?
    P(X ≤ 5) = 1 − P(first 5 non-defective)
             = 1 − (0.97)⁵ = 0.141
EXAMPLE 2
A representative from the National Football League's Marketing Division randomly
selects people on a random street in Kansas City, Kansas until he finds a person who
attended the last home football game. Let p, the probability that he succeeds in
finding such a person (marketing), equal 0.20. And, let X denote the number of
people he selects until he finds his first success. What is the probability that the
marketing representative must select 4 people before he finds one who attended the
last home football game?
SOLUTION:

To find the desired probability, we need to find P(X = 4), which can be determined
readily using the p.m.f. of a geometric random variable with p = 0.20, 1−p = 0.80,
and x = 4:

P(X = 4) = (1 − 0.20)^(4−1)(0.20)
P(X = 4) = (0.80)³ × 0.20
         = 0.1024
There is about a 10% chance that the marketing representative would have to select
4 people before he would find one who attended the last home football game.
EXPECTATION OF A GEOMETRIC RANDOM VARIABLE:

The expectation of a geometric random variable with parameter p, where q = 1 − p, is:

E(X) = μ = 1/p

The variance of a geometric random variable is:

σ² = Var(X) = (1 − p)/p²
FOR THE PREVIOUS EXAMPLE:
How many people should we expect (that is, what is the average number) the
marketing representative needs to select before he finds one who attended the last
home football game? And, while we're at it, what is the variance?
SOLUTION: The average number is:

E(X) = μ = 1/p = 1/0.20 = 5

That is, we should expect the marketing representative to have to select 5 people
before he finds one who attended the last football game. Of course, on any given try,
it may take 1 person or it may take 10, but 5 is the average number.
The variance is 20, as determined by:

σ² = Var(X) = (1 − 0.20)/(0.20)² = 20
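A short Python sketch of the geometric computations above (the helper is illustrative):

```python
p = 0.20  # Geometric(0.20), as in the marketing example

def geom_pmf(x):
    """P(X = x): first success on trial x."""
    return (1 - p) ** (x - 1) * p

print(geom_pmf(4))        # 0.1024
print(1 / p)              # mean: 5.0
print((1 - p) / p**2)     # variance: 20.0
```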

POISSON DISTRIBUTION
The Poisson distribution describes the numbers of occurrences, over some defined
interval, of independent random events that are uniformly distributed over that
interval.

The probability that X will occur (the probability distribution function) is given
by:

P(X = x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, …

Where:

- e ≈ 2.71828
- 𝑋 ∶ Representing the number of occurrences in a continuous interval.
- λ is the expected (average) number of occurrences of the random variable in
this interval.
EXAMPLES OF POISSON DISTRIBUTION:

- The number of telephone calls within a certain time interval


- The number of patients in a waiting room in an hour.
- The number of car accidents on the highway
- The number of serious injuries in a particular factory in a year.
- The number of times a three year old child has an ear infection in a year.

PROPERTIES OF POISSON EXPERIMENT:

- The probability of an occurrence is the same for any two intervals of equal
  length; the expected number of occurrences in an interval is proportional
  to the length of the interval.
- The occurrence or nonoccurrence in any interval is independent of the
occurrence or nonoccurrence in any other interval.
- The probability of two or more occurrences in a very small interval is close
to 0

THE MEAN AND VARIANCE OF POISSON PROBABILITY DISTRIBUTION:


A random variable X has the Poisson probability distribution 𝑓(𝑥) with parameter
λ, then:

𝐸(𝑥) = 𝜇 = 𝜆 (The expected number of occurrence), and the variance

σ² = λ

EXAMPLE: Suppose we are interested in the number of snake bite cases seen in a
particular hospital in a year. Assume that the average number of snake bite cases at
the hospital in a year is 6.

1. What is the probability that in a randomly chosen year, the
   number of snake bite cases will be 7?
2. What is the probability that the number of cases will be less
   than 2 in 6 months?
3. What is the probability that the number of cases will be 13 in 2
   years?
4. What is the expected number of snake bites in a year? What is the
   variance of snake bites in a year?
SOLUTION
Let 𝑋 be the number of snake bite cases seen at this hospital in a year.

Then 𝑋~ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛 (6)

FIRST NOTE THE FOLLOWING

• The average number of snake bite cases at the hospital in a year: λ = 6
• The average number of snake bite cases at the hospital in 6 months = the
  average number in (1/2) year: (1/2)λ = 3
• The average number of snake bite cases at the hospital in 2 years: 2λ = 12

1. The probability that the number of snake bites will be 7 in a year (λ = 6):

P(X = x) = e^(−λ) λ^x / x!

P(X = 7) = e^(−6) 6⁷ / 7! = 0.138

2. The probability that the number of cases will be less than 2 in 6 months (λ = 3):

P(X < 2) = P(X = 0) + P(X = 1)

P(X < 2) = e^(−3) 3⁰/0! + e^(−3) 3¹/1! = 0.0498 + 0.149 = 0.199

3. The probability that the number of cases will be 13 in 2 years (λ = 12):

P(X = 13) = e^(−12) 12¹³ / 13! = 0.106

4. The expected number of snake bites in a year:

E(x) = μ = λ = 6, and the variance of snake bites in a year is:

σ² = λ = 6
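The Poisson probabilities above can be checked with a few lines of Python (a sketch; the helper is illustrative):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

# lam scales with the length of the interval:
print(poisson_pmf(7, 6))                      # ~0.138 (one year, lam = 6)
print(poisson_pmf(0, 3) + poisson_pmf(1, 3))  # ~0.199 (6 months, lam = 3)
print(poisson_pmf(13, 12))                    # ~0.106 (two years, lam = 12)
```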

EXAMPLE: Suppose the average number of car accidents on the highway in one day
is 4. What is the probability of no car accident in one day? What is the probability
of 1 car accident in two days?

SOLUTION:
It is sensible to use a Poisson random variable to represent the number of car
accidents on the highway. Let X represent the number of car accidents on the
highway in one day. Then,

P(X = i) = e^(−4) 4^i / i!,  i = 0, 1, 2, …
And, 𝐸(𝑥) = 𝜆 = 4

Then,

P(no car accident in one day) = P(X = 0) = e^(−4) 4⁰ / 0! = e^(−4) = 0.0183

Since the average number of car accidents in one day is 4, the average number
of car accidents in two days should be 8. Let Y represent the number of car accidents
in two days. Then,

E(Y) = λ = 8

Therefore,

P(one car accident in two days) = P(Y = 1) = e^(−8) 8¹ / 1! = 8e^(−8) ≈ 0.0027
EXAMPLE:
Suppose the average number of calls received in one minute is 2. What is the probability
of 5 calls in 5 minutes?

SOLUTION:
Since the average number of calls in one minute is 2, the average number of calls in
5 minutes is 10. Let X represent the number of calls in 5 minutes; then X ~ Poisson(10),
i.e. λ = 10.

P(5 calls in 5 minutes) = P(X = 5) = e^(−10) 10⁵ / 5! = 0.0378

(For comparison, P(X = 10) = e^(−10) 10¹⁰ / 10! = 0.1251.)

ASSIGNMENT 1
1. The distribution of the number of children per household for households receiving Aid
to Dependent Children (ADC) in a large eastern city is as follows: Five percent of the
ADC households have one child, 35% have 2 children, 30% have 3 children, 20% have
4 children, and 10% have 5 children. Construct the probability distribution and find the
mean and the variance of the number of children per ADC household in this city.


……………………………………………………………………………………………
…………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
………………………………………………………………………
2. Approximately 12% of the U.S. population is composed of African-Americans.
Assuming that the same percentage is true for telephone ownership, what is the
probability that, when 25 phone numbers are selected at random for a small survey,
5 of the numbers belong to an African-American family? Use the binomial distribution
to solve the problem.

……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………………………
…………………………………………………………………………………

CONTINUOUS PROBABILITY DISTRIBUTION

CONTINUOUS PROBABILITY DISTRIBUTION:

A continuous random variable is a random variable capable of assuming all the values in
an interval or several intervals of real numbers. Because of the uncountable number of
possible values, it is not possible to list all the values and their probabilities for a continuous
random variable in a table as is true with a discrete random variable. The probability
distribution for a continuous random variable is represented as the area under a curve called
the probability density function, abbreviated pdf. A pdf is characterized by the following
two basic properties: the graph of the pdf is never below the x axis, and the total area under
the pdf always equals 1.

For a continuous random variable, what we really want to know is the probability of the value falling within a certain interval.

We represent a continuous probability distribution with a probability density function
(pdf) such as the following:

f(x) = 1/10,  0 ≤ x ≤ 10

This function defines a uniform distribution over the interval [0,10]. Every value in the
range from 0 to 10 can occur (and not just 0, 1, 2, etc., but all the fractional values in
between). We cannot interpret f(x) as the probability of the value x, because there are
infinitely many possible values of x, so the probabilities would add up to more than 1. And
that would clearly be wrong anyway, because the chance of (say) x = 2 is not 1/10.
What 𝑓(𝑥) does do for us is allow us to find the probability of intervals. We do this by
looking at the area underneath the curve defined by 𝑓(𝑥). [Draw graph of this function: a
horizontal line at 𝑓(𝑥) = 1/10, going from 𝑥 = 0 𝑡𝑜 𝑥 = 10. ] Note that the total area
underneath this function is 1. This makes sense, because all probabilities must add up to 1,
and no value can fall outside the interval [0,10]. Note also that the area under the curve for
any interval with a length of one, such as [0,1] or [1,2] or [3.5,4.5] is equal to 1/10.
The probability density function of 𝑥, 𝑓(𝑥) 𝑜𝑟 𝑝𝑑𝑓(𝑥), supplies the probability
density (y) for each possible value of x.

In general, for a continuous variable, the probability of 𝑥 falling between 𝑎 and 𝑏


is:
P(a ≤ x ≤ b) = ∫[a to b] f(x) dx

PROPERTIES OF PROBABILITY DENSITY FUNCTION (𝐩𝐝𝐟):

(i)  f(x) ≥ 0
(ii) ∫[−∞ to ∞] f(x) dx = 1
EXAMPLE:

Given the probability density function (pdf):

f(x) = cx²,  0 < x < 3
f(x) = 0,    otherwise

(a) Find c.

(b) Calculate P(1 < x ≤ 2).

SOLUTION:

(a) ∫[−∞ to ∞] f(x) dx = 1 ⟹ ∫[0 to 3] Cx² dx = 1 ⟹ C[x³/3 from 0 to 3] = 9C = 1 ⟹ C = 1/9

For f(x) to be a pdf, C must equal 1/9.

(b) P(1 < x ≤ 2) = ∫[1 to 2] f(x) dx = ∫[1 to 2] (1/9)x² dx = (1/9)[x³/3 from 1 to 2]
                 = (1/9)[(8 − 1)/3] = 7/27
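A numeric cross-check of parts (a) and (b), assuming scipy is available (a sketch):

```python
from scipy.integrate import quad

f = lambda x: x**2 / 9       # the pdf with C = 1/9

total, _ = quad(f, 0, 3)     # should be 1 (the pdf integrates to 1)
prob, _ = quad(f, 1, 2)      # P(1 < X <= 2) = 7/27 ~ 0.2593
print(total, prob)
```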

THE MEAN OR EXPECTED VALUE AND VARIANCE FOR A CONTINUOUS VARIABLE:

(i) The mean μ or E(x) can be calculated as follows:

μ = E(x) = ∫[−∞ to ∞] x f(x) dx

(ii) The variance σ² can be calculated as follows:

σ² = E(x²) − (E(x))² = ∫[−∞ to ∞] x² f(x) dx − [∫[−∞ to ∞] x f(x) dx]²

EXAMPLE:

Calculate (i) the mean μ or E(x), and (ii) the variance σ² for the following
probability density function (pdf):

f(x) = (1/2)x,  0 < x < 2
f(x) = 0,       otherwise

SOLUTION:

(i) μ = E(x) = ∫[−∞ to ∞] x f(x) dx = ∫[0 to 2] x (1/2)x dx
           = (1/2)[x³/3 from 0 to 2] = (1/2)(8/3) = 4/3

(ii) The variance σ² = E(x²) − (E(x))²

E(x²) = ∫[−∞ to ∞] x² f(x) dx = ∫[0 to 2] x² (1/2)x dx = (1/2)∫[0 to 2] x³ dx
      = (1/2)[x⁴/4 from 0 to 2] = (1/2)(16/4) = 2

(E(x))² = (4/3)² = 16/9

Therefore, the variance σ² = E(x²) − (E(x))² = 2 − 16/9 = 2/9

THE UNIFORM AND EXPONENTIAL DISTRIBUTIONS

THE UNIFORM DISTRIBUTION

The (continuous) uniform distribution is one of the simplest probability distributions
in statistics. It is a continuous distribution: it takes values within a specified
range, e.g. between 0 and 1.

The probability density function for a uniform distribution taking values in the range
a to b is:

f(x) = 1/(b − a),  a ≤ x ≤ b
f(x) = 0,          otherwise

∫[−∞ to ∞] f(x) dx = ∫[a to b] 1/(b − a) dx = 1

EXAMPLE: You arrive into a building and are about to take an elevator to your floor.
Once you call the elevator, it will take between 0 and 40 seconds to arrive to you.
We will assume that the elevator arrives uniformly between 0 and 40 seconds after
you press the button. In this case a = 0 and b = 40.
CALCULATING PROBABILITIES

Remember, from any continuous probability density function we can calculate


probabilities by using integration.
P(c ≤ x ≤ d) = ∫[c to d] f(x) dx = ∫[c to d] 1/(b − a) dx = (d − c)/(b − a)

In our example, to calculate the probability that the elevator takes less than 15 seconds
to arrive, we set d = 15 and c = 0. The probability is (15 − 0)/(40 − 0) = 15/40 = 0.375.

EXPECTED VALUE

The expected value of a uniform distribution is:


E(x) = ∫[a to b] x f(x) dx = ∫[a to b] x/(b − a) dx = (b + a)/2

In our example, the expected value is (40 + 0)/2 = 20 seconds.

THE VARIANCE

The variance of a uniform distribution is:

Var(X) = E(X²) − [E(X)]²

       = ∫[a to b] x²/(b − a) dx − ((b + a)/2)² = (b − a)²/12

In our example, the variance is (40 − 0)²/12 = 400/3.

EXAMPLE 2:

Suppose in a quiz there are 30 participants. A question is given to all 30 participants
and the time allowed to answer it is 25 seconds.

i.   Find the probability that a participant responds within 6 seconds.
ii.  Find the expected response time.
iii. Find the variance.

SOLUTION:
Interval of the probability distribution = [0 seconds, 25 seconds]
Probability density = 1/(25 − 0) = 1/25
The interval of the successful event = [0 seconds, 6 seconds]

i.   P(x < 6) = the probability ratio = 6/25

ii.  E(x) = (b + a)/2 = (25 + 0)/2 = 12.5

iii. Var(x) = (b − a)²/12 = (25 − 0)²/12 = 625/12 = 52.08

THE EXPONENTIAL DISTRIBUTION

For a positive real number λ, the probability density function of an exponentially
distributed random variable is given by:

f(x) = λe^(−λx),  x ≥ 0
f(x) = 0,          otherwise

To check that the above function is a legitimate probability density function, we need
to check that its integral over its support is 1:

∫[−∞ to ∞] f(x) dx = ∫[0 to ∞] λe^(−λx) dx

= (λ/(−λ)) [e^(−λx) from 0 to ∞] = −[0 − 1] = 1
CUMULATIVE DISTRIBUTION FUNCTION
The cumulative distribution function is the accumulated probability of all outcomes
up to a certain value x = t. For the exponential distribution, the cumulative
distribution function F(t) is given by:

F(t) = ∫[0 to t] λe^(−λx) dx = [−e^(−λx) from 0 to t] = −e^(−λt) + 1

     = 1 − e^(−λt)
EXPECTED VALUE
To find the expected value, we multiply the probability density function by x and
integrate over all possible values (the support):

E(X) = ∫[0 to ∞] x λe^(−λx) dx = 1/λ

VARIANCE AND STANDARD DEVIATION

The variance of the exponential distribution is given by:

Var(X) = σ² = 1/λ²

The standard deviation of the distribution is:

σ = 1/λ


EXAMPLE – Let X denote the time between detections of a particle with a Geiger
counter and assume that X has an exponential distribution with E(X) = 1.4 minutes.
What is the probability that we detect a particle within 30 seconds of starting the
counter?

SOLUTION – Since the Random Variable (X) denoting the time between successive
detection of particles is exponentially distributed, the Expected Value is given by
1
E(X) =
λ
1 1
= 1.4 then λ =
λ 1.4
To find the probability of detecting the particle within 30 seconds of the start of the
experiment, we need to use the cumulative density function discussed above. We
convert the given 30 seconds in minutes since we have our rate parameter in terms
of minutes.

F(t) = 1 − e^(−λt)

F(0.5) = 1 − e^(−0.5/1.4)

F(0.5) = 0.30
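The same CDF evaluation as a tiny Python sketch:

```python
from math import exp

lam = 1 / 1.4     # rate per minute, since E(X) = 1/lam = 1.4 minutes
t = 0.5           # 30 seconds, converted to minutes

print(1 - exp(-lam * t))   # F(0.5) ~ 0.30
```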

ASSIGNMENT 2
1. If the monthly expenditure of a certain family on food (in 1000 S.R.) has the following
probability density function (pdf):

f(x) = cx(10 − x),  0 ≤ x ≤ 10
f(x) = 0,           otherwise

(a) Find C.
(b) Calculate 𝑃(5 ≤ 𝑥 ≤ 8)
(c) If we have 600 households (families), what’s the expected number of family
whose expenditure is less than or equal to 3 thousand S.R. per month
(d) Calculate: (i) the mean 𝜇 or 𝐸(𝑥), and (ii) the variance 𝜎2 for monthly
expenditure.

……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
…………………………………………………………………

2. The weights of 10-pound bags of potatoes packaged by Idaho Farms Inc. are
uniformly distributed between 9.75 pounds and 10.75 pounds. Calculate the mean and
the standard deviation weight per bag

……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
…………………………………………………………………


THE MOMENT AND MOMENT GENERATING FUNCTION

THE MOMENTS:

THE MOMENTS ABOUT THE ORIGIN:

Let X be a random variable with probability distribution f(x). The rth moment
about the origin of X is given by:

μ′_r = E(Xʳ) = ∑ over all x of xʳ f(x), if X is discrete
μ′_r = E(Xʳ) = ∫[−∞ to ∞] xʳ f(x) dx, if X is continuous

if the expectation exists.

As a special case:

μ′₁ = E(X) = μ_X, the mean of X.

THE MOMENTS ABOUT THE MEAN:

Let X be a random variable with probability distribution f(x). The rth central
moment of X about μ is defined as:

μ_r = E(X − μ)ʳ = ∑ over all x of (x − μ)ʳ f(x), if X is discrete
μ_r = E(X − μ)ʳ = ∫[−∞ to ∞] (x − μ)ʳ f(x) dx, if X is continuous

AS SPECIAL CASES:

- μ₁ = 0
- μ₂ = E(X − μ)² = σ²_X is the variance of X, and μ₂ = μ′₂ − (μ′₁)².

EXAMPLE - DISCRETE CASE


If X is a discrete random variable having the following probability distribution,
calculate: ( a) the first four moments about the origin and (b) about the mean.
X      0      1      2      3
P(X)   0.05   0.20   0.45   0.30

SOLUTION:
(a) The first four moments about the origin, using the formula μ′_r = E(Xʳ) = ∑ xʳ P(x):

x   x²   x³   x⁴   P(x)   x P(x)   x² P(x)   x³ P(x)   x⁴ P(x)
0   0    0    0    0.05   0        0         0         0
1   1    1    1    0.20   0.20     0.20      0.20      0.20
2   4    8    16   0.45   0.90     1.80      3.60      7.20
3   9    27   81   0.30   0.90     2.70      8.10      24.3
∑                  1      2        4.70      11.90     31.70

Therefore, the first four moments about the origin are:
μ′₁ = E(x) = ∑ x P(x) = 2
μ′₂ = E(x²) = ∑ x² P(x) = 4.7
μ′₃ = E(x³) = ∑ x³ P(x) = 11.9
μ′₄ = E(x⁴) = ∑ x⁴ P(x) = 31.7

(b) Using E(x) = μ = 2 and the formula μ_r = ∑ (x − μ)ʳ P(x), we get the following table:

(x − 2)   (x − 2)²   (x − 2)³   (x − 2)⁴   (x − 2)P(x)   (x − 2)²P(x)   (x − 2)³P(x)   (x − 2)⁴P(x)
−2        4          −8         16         −0.10         0.20           −0.40          0.80
−1        1          −1         1          −0.20         0.20           −0.20          0.20
0         0          0          0          0             0              0              0
1         1          1          1          0.30          0.30           0.30           0.30
Totals                                     0             0.70           −0.30          1.30


Therefore, the first four moments about the mean are:


𝜇1 = 𝐸(𝑥 − 2) = ∑(𝑥 − 2)𝑃(𝑥) = 0

𝜇2 = 𝐸(𝑥 − 2)2 = ∑(𝑥 − 2)2𝑃(𝑥) = 0.70

𝜇3 = 𝐸(𝑥 − 2)3 = ∑(𝑥 − 2)3𝑃(𝑥) = −0.30

𝜇4 = 𝐸(𝑥 − 2)4 = ∑(𝑥 − 2)4𝑃(𝑥) = 1.30
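Both sets of moments can be reproduced with a short Python sketch (the list encodings are illustrative):

```python
xs = [0, 1, 2, 3]
ps = [0.05, 0.20, 0.45, 0.30]

# Moments about the origin: mu'_r = sum of x^r P(x).
raw = [sum(x**r * p for x, p in zip(xs, ps)) for r in (1, 2, 3, 4)]
print(raw)        # [2.0, 4.7, 11.9, 31.7]

# Moments about the mean: mu_r = sum of (x - mu)^r P(x), with mu = 2.
mu = raw[0]
central = [sum((x - mu)**r * p for x, p in zip(xs, ps)) for r in (1, 2, 3, 4)]
print(central)    # [0.0, 0.7, -0.3, 1.3]
```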

EXAMPLE - CONTINUOUS CASE

If X is a continuous random variable having the following probability density


function
f(x) = 2x,  0 < x < 1
f(x) = 0,   otherwise
FIND;

(a) The first three moments about the origin


(b) The first and second moments about the mean of X

SOLUTION:
(a) The first three moments about the origin, using μ′_r = ∫ xʳ f(x) dx:

μ′₁ = E(x) = ∫[0 to 1] x f(x) dx = 2∫[0 to 1] x² dx = [2x³/3 from 0 to 1] = 2/3

μ′₂ = E(x²) = ∫[0 to 1] x² f(x) dx = 2∫[0 to 1] x³ dx = [x⁴/2 from 0 to 1] = 1/2

μ′₃ = E(x³) = ∫[0 to 1] x³ f(x) dx = 2∫[0 to 1] x⁴ dx = [2x⁵/5 from 0 to 1] = 2/5

(b) Using μ = 2/3, we get:
The first moment about the mean: μ₁ = E(X − 2/3) = 0

The second moment about the mean:

μ₂ = E(X − 2/3)² = ∫[0 to 1] (x − 2/3)² f(x) dx
   = 2∫[0 to 1] x(x − 2/3)² dx
   = 2∫[0 to 1] (x³ − 4x²/3 + 4x/9) dx
   = 2[x⁴/4 − 4x³/9 + 4x²/18 from 0 to 1]
   = 2(1/4 − 4/9 + 4/18) = 1/18

NOTICE THAT:

μ₂ = μ′₂ − (μ′₁)² = 1/2 − (2/3)² = 1/18

THE MOMENT-GENERATING FUNCTION:

Let X be a random variable with probability distribution f(x). The moment-
generating function of X is given by E(e^(tX)) and is denoted by M_X(t). Hence:

M_X(t) = E(e^(tX)) = ∑ over all x of e^(tx) f(x), if X is discrete
M_X(t) = E(e^(tX)) = ∫[−∞ to ∞] e^(tx) f(x) dx, if X is continuous

Moment-generating functions will exist only if the sum or integral in the above
definition converges. If a moment-generating function of a random variable X does
exist, it can be used to generate all the moments of that variable.

MOMENTS FROM THE MOMENT-GENERATING FUNCTION:

Let X be a random variable with moment-generating function M_X(t). Then:

dʳM_X(t)/dtʳ evaluated at t = 0 equals μ′_r

Therefore,

dM_X(t)/dt evaluated at t = 0 equals μ′₁ = μ

d²M_X(t)/dt² evaluated at t = 0 equals μ′₂

In addition:
σ² = μ′₂ − (μ′₁)²


ASSIGNMENT 3
1. If Y is a discrete random variable having the following probability distribution,
calculate: (a) the first four moments about the origin and (b) the first and second
moment about the mean.
𝒀 0 1 2 3
𝑷(𝒀) 0.05 0.35 0.20 0.40

……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………….…………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
…………………………………………………….

2. If X is a continuous random variable having the following probability density


function
f(x) = 4x(9 − x²)/81,  0 < x < 3
f(x) = 0,              otherwise
FIND;

(a) The first three moments about the origin


(b) The first and second moments about the mean of X
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
………………………………………………..

THE MGF OF THE BINOMIAL DISTRIBUTION

Suppose X has a Binomial(n, p) distribution. The function f(x) is:

f(x) = (n choose x) p^x q^(n−x), if x = 0, 1, 2, …, n
f(x) = 0, otherwise

Then the moment generating function of the binomial random variable X is:

M_X(t) = ∑[x=0 to n] e^(tx) (n choose x) p^x q^(n−x)

       = ∑[x=0 to n] (n choose x) (pe^t)^x q^(n−x)

Recognizing this last sum as the binomial expansion of (pe^t + q)^n, we obtain:

M_X(t) = (pe^t + q)^n

MEAN AND VARIANCE

Use the mgf to verify that:

1- The mean μ = np.
2- The variance σ² = npq.

Solution

The mean:
E(x) = dM_X(t)/dt = n(pe^t + q)^(n−1) pe^t

Setting t = 0, we get:

μ′₁ = np

The variance:

Differentiating a second time yields μ′₂:

d²M_X(t)/dt² = [n(n − 1)(pe^t + q)^(n−2)(pe^t)²] + [n(pe^t + q)^(n−1) pe^t]

Hence

E(X²) = μ′₂ = M″_X(0) = n(n − 1)p² + np

Therefore,

μ = μ′₁ = np

Var(X) = E(X²) − [E(X)]²

σ² = μ′₂ − (μ′₁)²

   = [n(n − 1)p² + np] − [np]²

   = n²p² − np² + np − n²p²

   = np(1 − p) = npq
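The differentiation can be checked symbolically with sympy (a sketch, assuming sympy is installed):

```python
import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
q = 1 - p
M = (p * sp.exp(t) + q) ** n          # binomial mgf

m1 = sp.diff(M, t).subs(t, 0)         # first raw moment E(X)
m2 = sp.diff(M, t, 2).subs(t, 0)      # second raw moment E(X^2)
print(sp.simplify(m1))                # n*p
print(sp.factor(m2 - m1**2))          # -n*p*(p - 1), i.e. n*p*(1 - p)
```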

THE MGF OF THE POISSON DISTRIBUTION

Suppose X has a Poisson(λ) distribution. The function f(x) is:

f(x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, …

Then the moment generating function of the Poisson random variable X with mean λ is:

M_X(t) = E(e^(tX)) = ∑[x=0 to ∞] e^(tx) f(x)

       = ∑[x=0 to ∞] e^(tx) e^(−λ) λ^x / x!

       = e^(−λ) ∑[x=0 to ∞] (λe^t)^x / x!

       = e^(−λ) e^(λe^t)

       = e^(λ(e^t − 1))

since e^(λe^t) = 1 + λe^t/1! + (λe^t)²/2! + ⋯ + (λe^t)^x/x! + ⋯ = ∑[x=0 to ∞] (λe^t)^x / x!

MEAN AND VARIANCE

Use the mgf to verify that:

1- The mean μ = λ
2- The variance σ² = λ

SOLUTION:
THE MEAN:
E(x) = dM_X(t)/dt = λe^t e^(λ(e^t − 1))

Setting t = 0, we get:
E(x) = M′_X(0) = λ

THE VARIANCE:

Differentiating a second time yields μ′₂:

d²M_X(t)/dt² = (λe^t)² e^(λ(e^t − 1)) + λe^t e^(λ(e^t − 1))

HENCE

E(X²) = μ′₂ = M″_X(0) = λ² + λ
Therefore,
Var(X) = E(X²) − [E(X)]²
       = λ² + λ − λ² = λ

THE MGF OF THE GEOMETRIC DISTRIBUTION
Suppose X has a geometric distribution. The function f(x) is:
P(X = x) = (1 − p)^(x−1) p,  x = 1, 2, …
Then the moment generating function of the geometric random variable X is:

M_X(t) = E(e^(tX)) = ∑[x=1 to ∞] e^(tx) p(x)

       = ∑[x=1 to ∞] e^(tx) q^(x−1) p = pe^t ∑[x=1 to ∞] e^(t(x−1)) q^(x−1)

       = pe^t ∑[x=1 to ∞] (qe^t)^(x−1) = pe^t · 1/(1 − qe^t)

       = pe^t/(1 − qe^t), where q = 1 − p,

since ∑[x=1 to ∞] (qe^t)^(x−1) is a geometric series with sum 1/(1 − qe^t) (for qe^t < 1).

From this generating function we can find the moments. For instance, E(x) = M′_X(0).
The derivative is:

M′_X(t) = [(1 − (1 − p)e^t) pe^t − pe^t(−(1 − p)e^t)] / (1 − (1 − p)e^t)²

M′_X(t) = [(1 − qe^t) pe^t − pe^t(−qe^t)] / (1 − qe^t)² = pe^t / (1 − qe^t)²

Setting t = 0 gives E(x) = 1/p:

E(x) = M′_X(0) = p/(1 − q)² = p/p² = 1/p

H.W. Show that the variance is (1 − p)/p².

Distribution        Probability Mass Function        M_X(t)                    E[X] (mean)   Var(X) (variance)
Binomial Bin(n,p)   (n choose x) p^x (1 − p)^(n−x)   (pe^t + (1 − p))^n        np            np(1 − p)
Poisson P(λ)        e^(−λ) λ^x / x!                  e^(λ(e^t − 1))            λ             λ
Geometric G(p)      (1 − p)^(x−1) p                  pe^t/(1 − (1 − p)e^t)     1/p           (1 − p)/p²

THE MGF OF COMMON CONTINUOUS DISTRIBUTIONS

CONTINUOUS UNIFORM DISTRIBUTION
The probability density function for a uniform distribution taking values in the range
a to b is:

f(x) = 1/(b − a),  a ≤ x ≤ b
f(x) = 0,          otherwise

Computation of the mgf. Let X be a continuous random variable that follows a
uniform distribution U(a, b). The mgf of X is given by:

M_X(t) = ∫[−∞ to ∞] e^(tx) f(x) dx

       = ∫[a to b] e^(tx) (1/(b − a)) dx

       = (1/(b − a)) ∫[a to b] e^(tx) dx

       = (1/(b − a)) [e^(tx)/t from x = a to x = b]

       = (e^(tb) − e^(ta)) / (t(b − a))

The above equality holds for t ≠ 0. We note that M_X(0) = 1.

THE EXPONENTIAL DISTRIBUTION

For a positive real number λ, the probability density function of an exponentially
distributed random variable is given by:

f(x) = λe^(−λx),  x ≥ 0
f(x) = 0,          otherwise

Computation of the mgf. Let X be a continuous random variable that follows an
exponential distribution E(λ). The mgf of X is given by:

M_X(t) = ∫[−∞ to ∞] e^(tx) f(x) dx

       = ∫[0 to ∞] e^(tx) λe^(−λx) dx

       = λ ∫[0 to ∞] e^(−x(λ−t)) dx

       = λ [e^(−x(λ−t)) / (−(λ − t)) from x = 0 to x = ∞]

       = (λ/(−(λ − t)))(0 − 1)

       = λ/(λ − t)

When integrating the exponential, we must be aware that

lim as x → ∞ of e^(−x(λ−t)) = 0

if and only if λ − t > 0. Therefore the derived formula holds if and only if t < λ.

The mean of an exponential random variable X with parameter λ:

Use the mgf to verify that E(X) = 1/λ.

SOLUTION
We start with the derivatives. To get the first moment (the mean), we take the
derivative with respect to the dummy variable t and plug in t = 0 after differentiating:

E(x) = dM_X(t)/dt evaluated at t = 0 = M′_X(0)

M_X(t) = λ/(λ − t)
Then
M′_X(t) = λ/(λ − t)²
M′_X(0) = λ/(λ − 0)² = λ/λ² = 1/λ

42
Probability Theory – STAT311

THE VARIANCE
Now the variance; we already have the first moment, but we need the
second moment E(X²) as well. We differentiate again and plug in t = 0 to find
the second moment:

M″_X(t) = 2λ/(λ − t)³
M″_X(0) = 2λ/(λ − 0)³ = 2/λ²

Var(X) = E(X²) − [E(X)]²
Var(X) = 2/λ² − (1/λ)² = 1/λ²
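A symbolic cross-check with sympy (a sketch):

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = lam / (lam - t)                   # exponential mgf, valid for t < lam

m1 = sp.diff(M, t).subs(t, 0)         # E(X) = 1/lam
m2 = sp.diff(M, t, 2).subs(t, 0)      # E(X^2) = 2/lam**2
print(m1, sp.simplify(m2 - m1**2))    # variance: 1/lam**2
```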

SUMMARY OF COMMON CONTINUOUS DISTRIBUTION PDF, MGF, E(X), AND VAR(X)

Distribution        PDF                      M_X(t)                          E(X)        Var(X)
Uniform U(a,b)      1/(b − a), a ≤ x ≤ b     (e^(tb) − e^(ta))/(t(b − a))    (a + b)/2   (b − a)²/12
Exponential E(λ)    λe^(−λx), x ≥ 0          λ/(λ − t), t < λ                1/λ         1/λ²

PART TWO

JOINT PROBABILITY DISTRIBUTION FUNCTION

JOINT PROBABILITY DISTRIBUTION FUNCTION - DISCRETE CASE

CONTENTS:
1- Joint probability mass function
2- Marginal probability distribution function
3- Conditional distribution
4- Statistical independence
5- Mathematical expectation
6- Covariance of random variables
7- Correlation coefficient

JOINT PROBABILITY MASS FUNCTION

DEFINITION:

If X and Y are two discrete random variables, then p(x, y) = P(X = x, Y = y) is called
the joint probability mass function (j.p.m.f) of X and Y, and p(x, y) has the
following properties:

1- 0 ≤ p(x, y) ≤ 1 for all x and y
2- ∑_x ∑_y p(x, y) = 1

THE MARGINAL DISTRIBUTION:

DEFINITION:

If X and Y are jointly discrete random variables with the j.p.m.f p(x, y), then g(x)
and h(y) are called the marginal probability mass functions of X and Y, respectively,
which can be calculated as follows:

1- g(x) = ∑ over all y of p(x, y)
2- h(y) = ∑ over all x of p(x, y)

The term marginal is used here because, in the discrete case, the values of g(x) and
h(y) are just the marginal totals of the respective columns and rows when the values
of p(x, y) are displayed in a rectangular table.

CONDITIONAL DISTRIBUTION

DEFINITION:

If X and Y are jointly distributed random variables, discrete or continuous, with the j.p.f
p(x, y), and g(x) and h(y) are the marginal probability distributions of X and Y respectively,
then the conditional distribution of the random variable Y given that X = x is:

p(y|x) = p(x, y)/g(x),  g(x) > 0

Similarly, the conditional distribution of the random variable X given that Y = y is:

p(x|y) = p(x, y)/h(y),  h(y) > 0

MATHEMATICAL EXPECTATION OF TWO RANDOM VARIABLES

DEFINITION:

Let X and Y be random variables with joint probability distribution p(x, y). The
expected value (mean) of the random variable g(X, Y) = XY is:

E(XY) = ∑_x ∑_y xy p(x, y)

Let X and Y be random variables with joint probability distribution p(x, y). The
expected value (mean) of the random variable X is:

E(X) = ∑_x ∑_y x p(x, y) = ∑ over all x of x g(x),

where g(x) is the marginal distribution of X. Therefore, in calculating E(X) over a
two-dimensional space, one may use either the joint probability distribution of X and
Y or the marginal distribution of X.

Similarly, let X and Y be random variables with joint probability distribution p(x, y).
The expected value (mean) of the random variable Y is:

E(Y) = ∑_x ∑_y y p(x, y) = ∑ over all y of y h(y),

where h(y) is the marginal distribution of Y.

EXAMPLE 1:

Let X and Y be jointly discrete random variables with the following j.p.m.f p(x, y):

        x = 0    x = 1    x = 2    Sum
y = 0   3/28     9/28     3/28     15/28
y = 1   6/28     6/28     0        12/28
y = 2   1/28     0        0        1/28
Sum     10/28    15/28    3/28     1

a. Find P(X < 2, Y ≤ 1).
b. Find the marginal PMFs of X and Y.
c. Find P(Y = 1 | X = 0) and P(X = 1 | Y = 1).
d. Find the expected value of X and the expected value of Y.
e. Find the expected value of g(X, Y) = XY, i.e. E(XY).

SOLUTION

a. Find P(X < 2, Y ≤ 1):

P[(X, Y) ∈ A] = P(X < 2, Y ≤ 1)
             = p(0,0) + p(0,1) + p(1,0) + p(1,1)
             = 3/28 + 6/28 + 9/28 + 6/28 = 24/28

H.W. Find P[(X, Y) ∈ A] = P(X + Y ≤ 1).

b. Find the marginal PMFs of X and Y.

For the random variable X, we see that:

g(0) = p(0,0) + p(0,1) + p(0,2) = 3/28 + 6/28 + 1/28 = 10/28
g(1) = p(1,0) + p(1,1) + p(1,2) = 9/28 + 6/28 + 0 = 15/28
g(2) = p(2,0) + p(2,1) + p(2,2) = 3/28 + 0 + 0 = 3/28

X       0        1        2
g(x)    10/28    15/28    3/28

For the random variable Y, we see that:

h(0) = p(0,0) + p(1,0) + p(2,0) = 3/28 + 9/28 + 3/28 = 15/28
h(1) = p(0,1) + p(1,1) + p(2,1) = 6/28 + 6/28 + 0 = 12/28
h(2) = p(0,2) + p(1,2) + p(2,2) = 1/28 + 0 + 0 = 1/28

Y       0        1        2
h(y)    15/28    12/28    1/28

c. Find P(Y = 1 | X = 0) and P(X = 1 | Y = 1):

P(Y = 1 | X = 0) = p(0,1)/g(0) = (6/28)/(10/28) = 6/10 = 3/5

P(X = 1 | Y = 1) = p(1,1)/h(1) = (6/28)/(12/28) = 6/12 = 1/2

d. Find the expected value of X and the expected value of Y:

Expected value of X: E(X) = μ_X = ∑ x g(x)
= (0)(10/28) + (1)(15/28) + (2)(3/28) = 21/28 = 3/4

Expected value of Y: E(Y) = μ_Y = ∑ y h(y)
= (0)(15/28) + (1)(12/28) + (2)(1/28) = 14/28 = 1/2

e. Find the expected value of g(X, Y) = XY:

E(XY) = ∑_x ∑_y xy p(x, y)

= (0)(0)p(0,0) + (0)(1)p(0,1) + (0)(2)p(0,2) + (1)(0)p(1,0) + (1)(1)p(1,1)
  + (1)(2)p(1,2) + (2)(0)p(2,0) + (2)(1)p(2,1) + (2)(2)p(2,2)

= (1)(1)(6/28) = 6/28 = 3/14

EXPECTATION OF LINEAR COMBINATION

The expected value of the sum or difference of two or more functions of the random
variables X and Y is the sum or difference of the expected values of the functions.
That is,
E[g(X, Y) ± h(X, Y)] = E[g(X, Y)] ± E[h(X, Y)].
STATISTICAL INDEPENDENCE

DEFINITION:
Let X and Y be two random variables discrete or continuous, with the j.p.f f(x,y),
and marginal probability distributions g(x) and h(y) respectively. The random
variables X and Y are said to be statistically independent if and only if:
p(x,y) = g(x)h(y)
for all (x,y) within their ranges
EXAMPLE2:
If X and Y are jointly discrete random variables, with the following j.p.m.f p(x,y):
        x = 2   x = 3   x = 4
y = 1   0.06    0.15    0.09
y = 2   0.14    0.35    0.21

1- Find the marginal probability function of X g(X).


2- Find the marginal probability function of Y h(y)
3- Are the two random variables X and Y independent?


SOLUTION:

1. The marginal probability function of X:

X      2     3     4
g(x)   0.2   0.5   0.3

2.The marginal probability function of Y:

Y      1     2
h(y)   0.3   0.7

3. Are the two random variables X and Y independent? If X and Y are
independent, then p(x, y) = g(x)h(y).
For example:
p(2,1) = 0.06
g(2)h(1) = 0.2×0.3 = 0.06

p(4,2) = 0.21
g(4)h(2) = 0.3×0.7= 0.21

p(2,2) = 0.14
g(2)h(2) = 0.2 × 0.7 = 0.14

p(4,1) = 0.09
g(4)h(1) = 0.3×0.3= 0.09
Hence, X and Y are statistically independent.

COVARIANCE OF RANDOM VARIABLES

DEFINITION

Let X and Y be a random variables with joint probability distribution p(x,y), the
covariance of X and Y which denoted by cov(X,Y) or σXY is :

σ_XY = E[(X − μ_X)(Y − μ_Y)] = ∑_x ∑_y (x − μ_X)(y − μ_Y) p(x, y)

The covariance between two random variables is a measurement of the nature of


the association between the two variables.

NOTE

The alternative and preferred formula for σ_XY is:

The covariance of two random variables X and Y with means μ_X and μ_Y,
respectively, is given by:

σ_XY = E(XY) − μ_X μ_Y

EXAMPLE 3: From Example 1, find the covariance of the two random variables X and Y.

SOLUTION

Since

E(XY) = 3/14,  μ_X = 3/4,  μ_Y = 1/2,

σ_XY = E(XY) − μ_X μ_Y

     = 3/14 − (3/4)(1/2)

     = −9/56

LINEAR COMBINATION

Let X and Y be random variables with joint probability function p(x, y), and let a and b
be constants. Then

var(aX ± bY) = σ²(aX ± bY) = a²σ²_X + b²σ²_Y ± 2ab σ_XY

If X and Y are independent random variables, then

var(aX ± bY) = σ²(aX ± bY) = a²σ²_X + b²σ²_Y

CORRELATION COEFFICIENT

DEFINITION

Let X and Y be random variables with covariance σ_XY and standard deviations σ_X
and σ_Y respectively. The correlation coefficient of X and Y is:

ρ_XY = σ_XY / (σ_X σ_Y)

The correlation coefficient satisfies the inequality −1 ≤ ρ_XY ≤ 1.

It assumes a value of zero when σ_XY = 0.

Where there is an exact linear dependency, say Y = a + bX, then ρ_XY = 1 if b > 0 and
ρ_XY = −1 if b < 0.

EXAMPLE 4:

Let X and Y be jointly discrete random variables with the following j.p.m.f p(x, y)
(the same table as Example 1):

        x = 0    x = 1    x = 2    Sum
y = 0   3/28     9/28     3/28     15/28
y = 1   6/28     6/28     0        12/28
y = 2   1/28     0        0        1/28
Sum     10/28    15/28    3/28     1

Find the correlation coefficient ρ_XY.

SOLUTION

ρ_XY = σ_XY / (σ_X σ_Y)

We know that

μ_X = 3/4,  μ_Y = 1/2,  σ_XY = −9/56

Now we need to compute σ_X and σ_Y:

Var(X) = E(X²) − μ²_X = ∑ x² g(x) − μ²_X
       = [0² × 10/28 + 1² × 15/28 + 2² × 3/28] − (21/28)²
       = 27/28 − (21/28)²
       = 315/784 = 0.402

σ_X = √0.402 = 0.634

Var(Y) = E(Y²) − μ²_Y = ∑ y² h(y) − μ²_Y
       = [0² × 15/28 + 1² × 12/28 + 2² × 1/28] − (14/28)²
       = 4/7 − (1/2)²
       = 9/28 = 0.321

σ_Y = √0.321 = 0.567

Since
σ_X = 0.634,  σ_Y = 0.567,  σ_XY = −9/56 = −0.161,

ρ_XY = −0.161 / (0.634 × 0.567) = −0.161/0.359 = −0.448


ASSIGNMENT 4
1. Let X denote the number of times a certain numerical control machine will
malfunction (1, 2, or 3 times) on any given day. Let Y denote the number of times a
technician is called on an emergency call. Their joint probability distribution is given
as

X
𝒇(𝒙, 𝒚) 1 2 3
1 0.05 0.05 0.10
Y 3 0.05 0.10 0.35
5 0.00 0.20 0.10

(a) Evaluate the marginal distribution of X.


(b) Evaluate the marginal distribution of Y .
(c) Find P (Y = 3 | X = 2)

…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
………………………………………………………………………………………..

2. Suppose that X and Y have the following joint probability distribution:

X
𝒇(𝒙, 𝒚) 2 4
1 0.10 0.15
Y 3 0.20 0.30
5 0.10 0.15
Find:

a. the covariance of X and Y
b. P[(X, Y) ∈ A], where A is the region given by {(x, y) | x + y ≤ 5}.

…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………..

JOINT PROBABILITY DISTRIBUTION FUNCTION - CONTINUOUS CASE

CONTENTS:
1- Joint probability density function
2- Marginal probability distribution function
3- Conditional distribution
4- Statistical independence
5- Mathematical expectation
6- Covariance of random variables
7- Correlation coefficient

JOINT PROBABILITY DENSITY FUNCTION


DEFINITION:
If X and Y are two continuous random variables, then f(x, y) is called the joint
probability density function (j.p.d.f) of X and Y if f(x, y) has the following
properties:

1- f(x, y) ≥ 0 for all (x, y)
2- ∫∫ over the whole plane of f(x, y) dx dy = 1
EXAMPLE 1:
A candy company distributes boxes of chocolates with a mixture of creams,
toffees, and nuts coated in both light and dark chocolate. For a randomly selected
box, let X and Y, respectively, be the proportions of the light and dark chocolates
that are creams, and suppose that the joint density function is

f(x, y) = (2/5)(2x + 3y),  0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0,                otherwise

1- Verify that f(x, y) is a joint probability density function.
2- Find P[(X, Y) ∈ A], where A = {(x, y) | 0 < x < 1/2, 1/4 < y < 1/2}.

SOLUTION
1. ∫[−∞ to ∞]∫[−∞ to ∞] f(x, y) dx dy = ∫[0 to 1]∫[0 to 1] (2/5)(2x + 3y) dx dy

= ∫[0 to 1] (2x²/5 + 6xy/5 from x = 0 to x = 1) dy

= ∫[0 to 1] (2/5 + 6y/5) dy

= (2y/5 + 3y²/5 from y = 0 to y = 1)

= 2/5 + 3/5 = 1
2. To calculate the probability, we use

P[(X, Y) ∈ A] = P(0 < X < 1/2, 1/4 < Y < 1/2)

= ∫[1/4 to 1/2]∫[0 to 1/2] (2/5)(2x + 3y) dx dy

= ∫[1/4 to 1/2] (2x²/5 + 6xy/5 from x = 0 to x = 1/2) dy

= ∫[1/4 to 1/2] (1/10 + 3y/5) dy

= (y/10 + 3y²/10 from y = 1/4 to y = 1/2)

= (1/10)[(1/2 + 3/4) − (1/4 + 3/16)]

= 13/160 = 0.081

2- MARGINAL PROBABILITY DISTRIBUTION FUNCTION

DEFINITION:
If X and Y are jointly continuous random variables with the j.p.d.f f(x, y), then
g(x) and h(y) are called the marginal probability density functions of X and Y,
respectively, which can be calculated as follows:

1- g(x) = ∫[−∞ to ∞] f(x, y) dy

2- h(y) = ∫[−∞ to ∞] f(x, y) dx

EXAMPLE 2:
For the candy-company joint density function of Example 1,

f(x, y) = (2/5)(2x + 3y),  0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0,                otherwise

1- Find g(x): the marginal probability density function of X.
2- Find h(y): the marginal probability density function of Y.

1- g(x) = ∫[−∞ to ∞] f(x, y) dy

g(x) = ∫[0 to 1] (2/5)(2x + 3y) dy

     = (4xy/5 + 6y²/10 from y = 0 to y = 1)

     = 4x/5 + 3/5

     = (4x + 3)/5

g(x) = (4x + 3)/5,  0 ≤ x ≤ 1
g(x) = 0,           otherwise


2- h(y) = ∫[−∞ to ∞] f(x, y) dx

h(y) = ∫[0 to 1] (2/5)(2x + 3y) dx

     = (2x²/5 + 6xy/5 from x = 0 to x = 1)

     = 2/5 + 6y/5

     = 2(1 + 3y)/5

h(y) = 2(1 + 3y)/5,  0 ≤ y ≤ 1
h(y) = 0,            otherwise

3- CONDITIONAL DISTRIBUTION
DEFINITION:
If X and Y are jointly distributed random variables, discrete or continuous, with the j.p.f
f(x, y), and g(x) and h(y) are the marginal probability distributions of X and Y respectively,
then the conditional distribution of the random variable Y given that X = x is:

f(y|x) = f(x, y)/g(x),  g(x) > 0

Similarly, the conditional distribution of the random variable X given that Y = y is:

f(x|y) = f(x, y)/h(y),  h(y) > 0

EXAMPLE 3:

If X and Y are jointly continuous random variables with the following j.p.d.f f(x, y):

f(x, y) = 10xy²,  0 < x < y < 1
f(x, y) = 0,      otherwise

1- Find g(x): the marginal probability density function of X.
2- Find h(y): the marginal probability density function of Y.
3- Find P(Y > 1/2 | X = 0.25).
SOLUTION:

1- g(x) = ∫[−∞ to ∞] f(x, y) dy

g(x) = ∫[x to 1] 10xy² dy

     = (10/3)xy³ from y = x to y = 1

     = (10/3)x(1 − x³)

g(x) = (10/3)x(1 − x³),  0 < x < 1
g(x) = 0,                otherwise

2- h(y) = ∫[−∞ to ∞] f(x, y) dx

h(y) = ∫[0 to y] 10xy² dx

     = 5x²y² from x = 0 to x = y

     = 5y⁴

h(y) = 5y⁴,  0 < y < 1
h(y) = 0,    otherwise

3- f(y|x) = f(x, y)/g(x)

         = 10xy² / [(10/3)x(1 − x³)]

         = 3y²/(1 − x³)

Therefore,

P(Y > 0.5 | X = 0.25) = ∫[0.5 to 1] f(y|x = 0.25) dy

= ∫[0.5 to 1] 3y²/(1 − 0.25³) dy

= ∫[0.5 to 1] 3y²/0.984 dy

= y³/0.984 from y = 0.5 to y = 1

= (1 − 0.125)/0.984 = 0.89

4-STATISTICAL INDEPENDENCE

Definition:

Let X and Y be two random variables discrete or continuous, with the j.p.f f (x,y),
and marginal probability distributions g(x) and h(y) respectively. The random
variables X and Y are said to be statistically independent if and only if:

f (x,y) = g(x)h(y)

for all (x,y) within their ranges.

EXAMPLE 4:

If X and Y are jointly continuous random variables with the following j.p.d.f f(x, y):

f(x, y) = x(1 + 3y²)/4,  0 < x < 2, 0 < y < 1
f(x, y) = 0,              otherwise

1- Find g(x): the marginal probability density function of X.
2- Find h(y): the marginal probability density function of Y.
3- Are the two random variables X and Y independent?
SOLUTION

1- Find g(x): the marginal probability density function of X.

g(x) = ∫[0 to 1] x(1 + 3y²)/4 dy

     = (xy/4 + xy³/4 from y = 0 to y = 1)

     = x/4 + x/4 = x/2

g(x) = x/2,  0 < x < 2
g(x) = 0,    otherwise

2- Find h(y): the marginal probability density function of Y.

h(y) = ∫[0 to 2] x(1 + 3y²)/4 dx

     = (x²/8 + 3x²y²/8 from x = 0 to x = 2)

     = 1/2 + 3y²/2

h(y) = (1 + 3y²)/2,  0 < y < 1
h(y) = 0,            otherwise
3- Are the two random variables X and Y independent? If X and Y are
independent, then f(x, y) = g(x)h(y):

g(x)h(y) = (x/2) · (1 + 3y²)/2

         = x(1 + 3y²)/4

         = f(x, y)

Hence, X and Y are statistically independent.

5- MATHEMATICAL EXPECTATION

DEFINITION:
Let X and Y be random variables with joint probability distribution f(x, y). The
expected value (mean) of the random variable g(X, Y), denoted by μ_g(X,Y), is:

E[g(X, Y)] = ∫[−∞ to ∞]∫[−∞ to ∞] g(x, y) f(x, y) dx dy, if X and Y are continuous.

EXAMPLE 5:

If X and Y are jointly continuous random variables with the following j.p.d.f f(x, y):

f(x, y) = x(1 + 3y²)/4,  0 < x < 2, 0 < y < 1
f(x, y) = 0,              otherwise

Find E(Y/X).

E(Y/X) = ∫[0 to 1]∫[0 to 2] (y/x) · x(1 + 3y²)/4 dx dy

= ∫[0 to 1]∫[0 to 2] y(1 + 3y²)/4 dx dy

= ∫[0 to 1] (xy(1 + 3y²)/4 from x = 0 to x = 2) dy

= ∫[0 to 1] y(1 + 3y²)/2 dy

= (1/2)(y²/2 + 3y⁴/4) from y = 0 to y = 1

= (y²/4 + 3y⁴/8) from y = 0 to y = 1

= 5/8
EXAMPLE 6:

For the same j.p.d.f f(x, y):

f(x, y) = x(1 + 3y²)/4,  0 < x < 2, 0 < y < 1
f(x, y) = 0,              otherwise

Find
1- E(XY)
2- E(X)
3- E(Y)
4- Illustrate that E(XY) = E(X)E(Y).

SOLUTION

1- E(XY) = ∫[0 to 1]∫[0 to 2] xy · x(1 + 3y²)/4 dx dy

= ∫[0 to 1]∫[0 to 2] x²y(1 + 3y²)/4 dx dy

= ∫[0 to 1] (x³y(1 + 3y²)/12 from x = 0 to x = 2) dy

= ∫[0 to 1] 2y(1 + 3y²)/3 dy

= ∫[0 to 1] (2y + 6y³)/3 dy

= (1/3)(y² + (3/2)y⁴) from y = 0 to y = 1

= 1/3 + 1/2

= 5/6

2- E(X) = ∫[0 to 1]∫[0 to 2] x · x(1 + 3y²)/4 dx dy

= ∫[0 to 1]∫[0 to 2] x²(1 + 3y²)/4 dx dy

= ∫[0 to 1] (x³(1 + 3y²)/12 from x = 0 to x = 2) dy

= ∫[0 to 1] 2(1 + 3y²)/3 dy

= ∫[0 to 1] (2 + 6y²)/3 dy

= (2y + 2y³)/3 from y = 0 to y = 1

= 4/3

3- E(Y) = ∫[0 to 1]∫[0 to 2] y · x(1 + 3y²)/4 dx dy

= ∫[0 to 1]∫[0 to 2] xy(1 + 3y²)/4 dx dy

= ∫[0 to 1] (x²y(1 + 3y²)/8 from x = 0 to x = 2) dy

= ∫[0 to 1] (y + 3y³)/2 dy

= (1/2)((1/2)y² + (3/4)y⁴) from y = 0 to y = 1

= 5/8

HENCE,

E(X)E(Y) = (4/3)(5/8) = 5/6 = E(XY)

6- COVARIANCE OF RANDOM VARIABLES

DEFINITION
Let X and Y be random variables with joint probability distribution f(x, y). The
covariance of X and Y, denoted by cov(X, Y) or σ_XY, is:

σ_XY = E[(X − μ_X)(Y − μ_Y)] = ∫[−∞ to ∞]∫[−∞ to ∞] (x − μ_X)(y − μ_Y) f(x, y) dx dy

 The covariance between two random variables is a measurement of the nature


of the association between the two.
 If large values of X often result in large values of Y or small values of X result
in small values of Y, positive X −µX will often result in positive Y −µY and
negative X −µX will often result in negative Y −µY Thus, the product (X
−µX)(Y −µY) will tend to be positive.
 On the other hand, if large X values often result in small Y values, the product
(X −µX)(Y −µY) will tend to be negative.
 Thus the sign of the covariance indicates whether the relationship between two
dependent random variables is positive or negative
 When X and Y are statistically independent, it can be shown that the covariance
is zero.
 Note that the covariance only describes the linear relationship between two
random variables.
 Therefore, if the covariance between X and Y is zero, X and Y may have a
nonlinear relationship, which means that they are not necessarily independent.

DEFINITION
The alternative and preferred formula for σXY is:

The covariance of two random variables X and Y with means µX and µY


respectively, is given by:

σXY = E(XY) - µ𝑋µ𝑌

EXAMPLE 7:
If X and Y are jointly continuous random variables with the following j.p.d.f f(x, y):

f(x, y) = 8xy,  0 < y < x < 1
f(x, y) = 0,    otherwise

Find the covariance of X and Y.

SOLUTION
We first compute the marginal density functions g(x) and h(y):

g(x) = ∫[−∞ to ∞] f(x, y) dy = ∫[0 to x] 8xy dy

     = 4xy² from y = 0 to y = x

     = 4x³

g(x) = 4x³,  0 < x < 1
g(x) = 0,    otherwise

h(y) = ∫[−∞ to ∞] f(x, y) dx = ∫[y to 1] 8xy dx

     = 4x²y from x = y to x = 1

     = 4y(1 − y²)

h(y) = 4y(1 − y²),  0 < y < 1
h(y) = 0,           otherwise

We next compute μ_X and μ_Y from the marginal distribution functions g(x) and h(y),
respectively:

μ_X = ∫[−∞ to ∞] x g(x) dx = ∫[0 to 1] 4x⁴ dx = (4/5)x⁵ from 0 to 1 = 4/5

μ_Y = ∫[−∞ to ∞] y h(y) dy = ∫[0 to 1] 4y²(1 − y²) dy = (4/3)y³ − (4/5)y⁵ from 0 to 1 = 8/15

E(XY) = ∫[0 to 1]∫[y to 1] xy · 8xy dx dy = ∫[0 to 1]∫[y to 1] 8x²y² dx dy

= ∫[0 to 1] ((8/3)x³y² from x = y to x = 1) dy

= ∫[0 to 1] (8/3)(y² − y⁵) dy

= (8/9)y³ − (8/18)y⁶ from y = 0 to y = 1

= 8/9 − 4/9

= 4/9

Since

E(XY) = 4/9,  μ_X = 4/5,  μ_Y = 8/15,

σ_XY = E(XY) − μ_X μ_Y

     = 4/9 − (4/5)(8/15)

     = 4/225

     = 0.018
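A numeric sketch of this covariance with scipy, integrating over the triangular region 0 < y < x < 1:

```python
from scipy.integrate import dblquad

f = lambda y, x: 8 * x * y     # joint pdf on 0 < y < x < 1

# x runs from 0 to 1; for each x, y runs from 0 to x.
EX, _ = dblquad(lambda y, x: x * f(y, x), 0, 1, 0, lambda x: x)       # 4/5
EY, _ = dblquad(lambda y, x: y * f(y, x), 0, 1, 0, lambda x: x)       # 8/15
EXY, _ = dblquad(lambda y, x: x * y * f(y, x), 0, 1, 0, lambda x: x)  # 4/9
print(EXY - EX * EY)           # covariance = 4/225 ~ 0.0178
```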


ASSIGNMENT 5
1- Let X and Y be two jointly continuous random variables with joint PDF

f(x, y) = 10x²y,  0 ≤ x ≤ y ≤ 1
f(x, y) = 0,      otherwise

a) Find g(x) : the marginal probability density function of X.


b) Find h(y): the marginal probability density function of Y.
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………….

……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
…………………………………………………………………..

2- Let X and Y be two jointly continuous random variables with joint PDF

f(x, y) = 10x²y,  0 ≤ x ≤ y ≤ 1
f(x, y) = 0,      otherwise

Illustrate that E(XY) = E(X)E(Y)

……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………….…………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
……………………………………………………………………………………………
…………………………………………
