PROBABILITY THEORY LECTURE NOTES
STAT311

PART ONE
RANDOM VARIABLES:
A random variable is a function that assigns numbers to the basic experimental outcomes.
Let’s consider the coin example and let X(tail) = 0 and X(head) = 1 in which case the
variable X is defined as the number of heads occurring.
Generally speaking, random variables are either discrete or continuous. It is important to distinguish between discrete and continuous random variables because different mathematical techniques are utilized depending on which type is involved.
A random variable is said to be discrete if it can assume only a finite or countably infinite number of values. A random variable is said to be continuous if it can assume any value in some interval or set of intervals. In the continuous case, the set of possible outcomes is always infinite. Therefore, it is not possible to list the sample space by individual values in any form. The way to distinguish between discrete and continuous is to ask whether the values of the random variable can be counted. The outcomes of discrete random variables can be counted (e.g., the number of heads in 10 coin tosses, or the number of defective items in a batch of 300 units). The outcomes of continuous random variables are measured rather than counted (e.g., the weight of an individual).
A discrete probability distribution can be displayed as a table of the values x and their probabilities P(x), for example:

x     P(x)
0     0.45
1     0.25
2     0.20
3     0.10

$$\sum_{x=0}^{3} P(x) = 1$$
EXAMPLE:
Let X be the number of heads. Find the probability distribution of X if we toss a coin three times.
Each of the 8 equally likely outcomes is a sequence of three heads/tails, so:

x     0     1     2     3
P(x)  1/8   3/8   3/8   1/8

This is a valid probability distribution since $p(x) \geq 0$ for every x and $\sum_{x=0}^{3} p(x) = 1$.
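The same distribution can be checked by brute-force enumeration. Below is a minimal Python sketch (the variable names are ours, not part of the notes) that lists the eight outcomes and tallies the number of heads:

```python
from itertools import product
from collections import Counter

# Enumerate the 8 equally likely outcomes of three tosses and count heads.
outcomes = list(product("HT", repeat=3))
heads = Counter(seq.count("H") for seq in outcomes)
pmf = {x: n / len(outcomes) for x, n in sorted(heads.items())}

print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
# Both defining properties hold: p(x) >= 0 and the probabilities sum to 1.
assert all(p >= 0 for p in pmf.values()) and abs(sum(pmf.values()) - 1) < 1e-12
```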
ANOTHER EXAMPLE:
$$P(x) = \frac{x}{10}, \quad x = 1, 2, 3, 4$$
is a probability distribution since:

x     1     2     3     4
P(x)  1/10  2/10  3/10  4/10

$$\sum P(x) = \frac{1}{10} + \frac{2}{10} + \frac{3}{10} + \frac{4}{10} = 1$$
EXAMPLE: Let X be the number of children in a household in a certain income group, with P(0) = .10, P(1) = .25, P(2) = .50, P(3) = .10, and P(4) = .05.
- The event X ≥ 2 is the event that a household in this income group has at least two children and means that X = 2, or X = 3, or X = 4. The probability that X ≥ 2 is given by:
P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) = .50 + .10 + .05 = .65
- The event X ≤ 1 is the event that a household in this income group has at most one child and is equivalent to X = 0, or X = 1. The probability that X ≤ 1 is given by:
P(X ≤ 1) = P(X = 0) + P(X = 1) = .10 + .25 = .35
- The event 1 ≤ X ≤ 3 is the event that a household has between one and three children inclusive and is equivalent to X = 1, or X = 2, or X = 3. The probability that 1 ≤ X ≤ 3 is given by:
P(1 ≤ X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = .25 + .50 + .10 = .85
CUMULATIVE DISTRIBUTION FUNCTIONS FOR DISCRETE RANDOM VARIABLES
The cumulative distribution function (or cumulative probability distribution) of a random variable X is F(x) = P(X ≤ x). It is obtained in a way similar to finding the cumulative relative frequency distribution for samples.
EXAMPLE: For the following probability distribution, calculate the below
probabilities:
x     P(x)
1     0.10
2     0.44
3     0.30
4     0.16

$$\sum_{x=1}^{4} p(x) = 1$$
i. P(X ≤ 1) = 0.10
ii. P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.1 + 0.44 = 0.54
iii. P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.1 + 0.44 + 0.3 = 0.84
iv. P(X ≤ 4) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 0.1 + 0.44 + 0.3 + 0.16 = 1
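The cumulative probabilities above are just a running sum of the pmf. A minimal Python sketch (names are illustrative, not from the notes):

```python
# Accumulate F(x) = P(X <= x) from the pmf of the example above.
pmf = {1: 0.10, 2: 0.44, 3: 0.30, 4: 0.16}

cdf, running = {}, 0.0
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = round(running, 2)

print(cdf)  # {1: 0.1, 2: 0.54, 3: 0.84, 4: 1.0}
```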
In working with discrete random variables it is helpful to know certain values that aid in describing the distribution. The most commonly used values are those that identify the physical center and the dispersion (the way the values are spread around the center). Given a discrete random variable X with probability function P(x), the expected value of X (also called the mean of the random variable X, and denoted E[X]) is defined as the weighted average of the values X may assume, where the weights are the corresponding probabilities. That is:

$$\mu = E(X) = \sum_{x} x\, P(x)$$
EXAMPLE:
Determine the expected number of broken tools per day for the probability distribution given in Table 1:
TABLE 1: DISCRETE PROBABILITY DISTRIBUTION FOR BROKEN TOOLS
Number broken per day (x)    P(x)          x P(x)
0                            .23           0
1                            .50           .50
2                            .15           .30
3                            .08           .24
4                            .04           .16
Total                        ∑P(x) = 1     ∑x P(x) = 1.2
Hence the mean or expected number of broken tools per day is 1.20. This value can be interpreted as the long-run average of broken tools per day. Obviously, this value cannot occur on any day, since the average is not an integer value. However, in interpreting this value over, say, 50 days, the factory can expect to have (50)(1.2) = 60 broken tools. This result does not imply that exactly 60 tools will be broken over the 50-day period, but it does provide an estimate of the replacement tools that will be needed.
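The weighted average is a one-line computation. A minimal Python sketch (our own names) using the Table 1 values:

```python
# Expected value as the P(x)-weighted average of x (Table 1 values).
pmf = {0: 0.23, 1: 0.50, 2: 0.15, 3: 0.08, 4: 0.04}

mean = sum(x * p for x, p in pmf.items())
print(mean)        # 1.2 broken tools per day
print(50 * mean)   # about 60 replacement tools over 50 days
```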
PROBABILITY DISTRIBUTIONS FOR DISCRETE RANDOM VARIABLES: BINOMIAL, GEOMETRIC, AND POISSON
BINOMIAL DISTRIBUTION
There are many different discrete probability distributions; here we will be concerned with three of them: the binomial, the geometric, and the Poisson.
The simplest random variable is one that has one value. However, we would have little interest in such a random variable. If the random variable can assume one of two possible values, it could be used to describe an experiment that can be classified as resulting in either "success" or "failure". Let's assume that the random variable assigns the value 1 to success and the value 0 to failure, with probability p of success and probability 1 − p of failure. This type of discrete variable is generally known as a Bernoulli random variable.
If a random experiment consists of making n independent trials from an infinite population where the probability of success p is constant from trial to trial, the probability of the number of successes P(x) is given by the binomial distribution. The general form of the probability function for the family of binomial distributions is given by:

$$P(x) = P(X = x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!(n-x)!}\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
PROPERTIES OF A BINOMIAL PROCESS ARE LISTED BELOW:
1. There are two possible outcomes for each trial. Outcomes could be yes or
no, success or failure, defective or non-defective, heads or tails and so on.
2. The probability of an outcome remains constant from trial to trial. For
example the probability of success or failure on any trial remains the same
regardless of the number of trials. If the probability of success is .30 it will
remain .30 on each trial regardless of the number of successes on previous
trials.
3. Related to number 2, outcomes of the trials are independent. In other
words, if a success occurred on a previous trial, it does not affect the
probability of success on the next trial.
4. The number of trials is a discrete integer. For example, the number of trials can be 10 but not 10.3.
Then as we stated above, the discrete random variable 𝑋 = the number of successes in 𝑛
trials has a Binomial (n, p) distribution for which the probability distribution function is
given by:
$$P(X = x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!(n-x)!}\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
WHERE q = 1 − p and $\binom{n}{x}$ denotes the number of combinations of n items taken x at a time; for example:

i. $\binom{5}{2} = \frac{5!}{2!(5-2)!} = \frac{5!}{2!\,3!} = \frac{5 \times 4 \times 3!}{2 \times 1 \times 3!} = 10$

ii. $\binom{8}{6} = \frac{8!}{6!(8-6)!} = \frac{8!}{6!\,2!} = \frac{8 \times 7 \times 6!}{6! \times 2 \times 1} = 28$
EXAMPLE: Consider the experiment of tossing a coin twice. Let X be the number of heads (H). Find the probability of getting:
i. 0 heads
ii. 1 head
iii. 2 heads
SOLUTION: In this example:
n = 2, p = 1/2, q = 1 − p = 1/2

i. $P(X = 0) = \binom{2}{0} \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^2 = \frac{2!}{0!\,2!} \cdot \frac{1}{4} = \frac{1}{4}$

ii. $P(X = 1) = \binom{2}{1} \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^1 = \frac{2!}{1!\,1!} \cdot \frac{1}{4} = 2 \times \frac{1}{4} = \frac{1}{2}$

iii. $P(X = 2) = \binom{2}{2} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^0 = \frac{2!}{2!\,0!} \cdot \frac{1}{4} = \frac{1}{4}$
THE MEAN AND VARIANCE OF THE BINOMIAL DISTRIBUTION:
$$\mu = E(X) = np, \qquad \sigma^2 = npq, \quad \text{where } q = 1 - p$$
EXAMPLE:
Suppose that the probability that a man in a certain country has high blood pressure is 0.15. If we randomly select six men in this country:
a. Find the probability distribution function for the number of men out of 6 with high blood pressure.
b. Find the probability that there are 4 men with high blood pressure.
c. Find the probability that all 6 men have high blood pressure.
d. Find the probability that none of the 6 men has high blood pressure.
e. What is the probability that more than two men will have high blood pressure?
f. Find the expected number and the variance of the number of men with high blood pressure.
SOLUTION:
Let X = the number of men out of 6 with high blood pressure. Then X has a binomial distribution with n = 6, where success means the man has high blood pressure, so p = 0.15 and q = 0.85.
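Under this setup, parts (b) through (f) can be checked numerically. A minimal Python sketch (the helper name pmf is ours) using math.comb for the binomial coefficient:

```python
from math import comb

n, p = 6, 0.15
q = 1 - p

def pmf(x):
    # Binomial probability P(X = x) = C(n, x) p**x q**(n - x)
    return comb(n, x) * p**x * q**(n - x)

print(pmf(4))                              # (b) P(X = 4)  ~ 0.0055
print(pmf(6))                              # (c) P(X = 6)  ~ 0.0000114
print(pmf(0))                              # (d) P(X = 0)  ~ 0.3771
print(1 - sum(pmf(x) for x in range(3)))   # (e) P(X > 2)  ~ 0.0473
print(n * p, n * p * q)                    # (f) mean 0.9, variance 0.765
```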
H.W.: Complete parts (a) to (f) of the example above.
GEOMETRIC DISTRIBUTION
PROPERTIES:
1. An experiment consists of repeating trials until the first success.
2. Each trial has two possible outcomes:
(a) A success with probability p
(b) A failure with probability q = 1 − p
3. Repeated trials are independent.

If X = the number of trials needed to obtain the first success, then
$$P(X = x) = (1 - p)^{x-1} p, \quad x = 1, 2, 3, \ldots$$
EXAMPLE 1
Products produced by a machine have a 3% defective rate.
i. What is the probability that the first defective occurs in the fifth item inspected?
ii. What is the probability that the first defective occurs within the first five inspections?
SOLUTION
i. P(X = 5) = P(first 4 non-defective) × P(5th defective)
P(X = 5) = (0.97)^4 × 0.03 ≈ 0.027
ii. P(X ≤ 5) = 1 − P(first 5 non-defective)
= 1 − (0.97)^5 ≈ 0.141
EXAMPLE 2
A representative from the National Football League's Marketing Division randomly selects people on a random street in Kansas City, Kansas until he finds a person who attended the last home football game. Let p, the probability that he succeeds in finding such a person, equal 0.20. And let X denote the number of people he selects until he finds his first success. What is the probability that the marketing representative must select 4 people before he finds one who attended the last home football game?
SOLUTION:
To find the desired probability, we need to find P(X = 4), which can be determined
readily using the p.m.f. of a geometric random variable with p = 0.20, 1−p = 0.80,
and x = 4:
$$P(X = x) = (1 - p)^{x-1} p \;\Rightarrow\; P(X = 4) = (0.80)^3 \times 0.20 = 0.1024$$
There is about a 10% chance that the marketing representative would have to select
4 people before he would find one who attended the last home football game.
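Both geometric examples reduce to the same pmf. A minimal Python sketch (the helper name geom_pmf is ours):

```python
# Geometric probabilities P(X = x) = (1 - p)**(x - 1) * p for the examples above.
def geom_pmf(x, p):
    return (1 - p) ** (x - 1) * p

print(geom_pmf(5, 0.03))          # first defective on the 5th item, ~0.027
print(1 - (1 - 0.03) ** 5)        # a defective within the first 5,  ~0.141
print(geom_pmf(4, 0.20))          # NFL example, 0.1024
```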
EXPECTATION OF A GEOMETRIC RANDOM VARIABLE:
$$E(X) = \frac{1}{p}, \qquad Var(X) = \frac{q}{p^2},$$
where q = 1 − p.
POISSON DISTRIBUTION
The Poisson distribution describes the numbers of occurrences, over some defined
interval, of independent random events that are uniformly distributed over that
interval.
The probability distribution function of X is given by:

$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

Where:
- e ≈ 2.71828
- X represents the number of occurrences in a continuous interval.
- λ is the expected (average) number of occurrences of the random variable in this interval.
PROPERTIES OF THE POISSON DISTRIBUTION:
- The probability of an occurrence is the same for any two intervals of equal length. The expected number of occurrences in an interval is proportional to the length of that interval.
- The occurrence or nonoccurrence in any interval is independent of the occurrence or nonoccurrence in any other interval.
- The probability of two or more occurrences in a very small interval is close to 0.
THE MEAN AND VARIANCE OF THE POISSON DISTRIBUTION:
$$\mu = E(X) = \lambda, \qquad \sigma^2 = \lambda$$
EXAMPLE: Suppose we are interested in the number of snake bite cases seen at a particular hospital in a year. Assume that the average number of snake bite cases at the hospital in a year is 6.
1- The probability that there will be 7 cases in a year (λ = 6):
$$P(X = 7) = \frac{e^{-6}\, 6^7}{7!} = 0.138$$
2- The probability that the number of cases will be less than 2 in 6 months (λ = 3):
$$P(X < 2) = \frac{e^{-3}\, 3^0}{0!} + \frac{e^{-3}\, 3^1}{1!} = 0.0498 + 0.149 = 0.199$$
3- The probability that there will be 13 cases in two years (λ = 12):
$$P(X = 13) = \frac{e^{-12}\, 12^{13}}{13!} = 0.106$$
4- The variance of the number of cases in a year:
$$\sigma^2 = \lambda = 6$$
EXAMPLE: Suppose the average number of car accidents on the highway in one day is 4. What is the probability of no car accident in one day? What is the probability of 1 car accident in two days?
SOLUTION:
It is sensible to use a Poisson random variable to represent the number of car accidents on the highway. Let X represent the number of car accidents on the highway in one day. Then,
$$P(X = i) = \frac{e^{-4}\, 4^i}{i!}, \quad i = 0, 1, 2, \ldots$$
And E(X) = λ = 4. Then,
$$P(\text{no car accident in one day}) = P(X = 0) = \frac{e^{-4}\, 4^0}{0!} = e^{-4} = 0.0183$$
Since the average number of car accidents in one day is 4, the average number of car accidents in two days should be 8. Let Y represent the number of car accidents in two days, so E(Y) = λ = 8. Therefore,
$$P(\text{one car accident in two days}) = P(Y = 1) = \frac{e^{-8}\, 8^1}{1!} = 8e^{-8} \approx 0.0027$$
EXAMPLE:
Suppose the average number of incoming calls in one minute is 2. What is the probability of 10 calls in 5 minutes? What is the probability of 5 calls in 5 minutes?
SOLUTION:
Since the average number of calls in one minute is 2, the average number of calls in 5 minutes is λ = 10. Let X represent the number of calls in 5 minutes. Then,
$$P(10 \text{ calls in 5 minutes}) = P(X = 10) = \frac{e^{-10}\, 10^{10}}{10!} = 0.1251$$
$$P(5 \text{ calls in 5 minutes}) = P(X = 5) = \frac{e^{-10}\, 10^5}{5!} = 0.0378$$
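All of the Poisson examples above can be evaluated with one helper. A minimal Python sketch (the name pois_pmf is ours), using only the standard library:

```python
from math import exp, factorial

def pois_pmf(x, lam):
    # Poisson probability P(X = x) = e**(-lam) * lam**x / x!
    return exp(-lam) * lam**x / factorial(x)

print(pois_pmf(7, 6))                    # snake bites: 7 cases in a year, ~0.138
print(pois_pmf(0, 3) + pois_pmf(1, 3))   # fewer than 2 cases in 6 months, ~0.199
print(pois_pmf(0, 4))                    # no accident in one day, ~0.0183
print(pois_pmf(5, 10))                   # 5 calls in 5 minutes, ~0.0378
```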
ASSIGNMENT 1
1. The distribution of the number of children per household for households receiving Aid to Dependent Children (ADC) in a large eastern city is as follows: 5% of the ADC households have 1 child, 35% have 2 children, 30% have 3 children, 20% have 4 children, and 10% have 5 children. Construct the probability distribution and find the mean and the variance of the number of children per ADC household in this city.
2. Approximately 12% of the U.S. population is composed of African-Americans. Assuming that the same percentage holds for telephone ownership, what is the probability that, when 25 phone numbers are selected at random for a small survey, 5 of the numbers belong to an African-American family? Use the binomial distribution to solve the problem.
CONTINUOUS PROBABILITY DISTRIBUTION
A continuous random variable is a random variable capable of assuming all the values in an interval or several intervals of real numbers. Because of the uncountable number of possible values, it is not possible to list all the values and their probabilities for a continuous random variable in a table, as is true for a discrete random variable. The probability distribution for a continuous random variable is represented as the area under a curve called the probability density function, abbreviated pdf. A pdf is characterized by the following two basic properties: the graph of the pdf is never below the x axis, and the total area under the pdf always equals 1.
For a continuous random variable, the probability of any single exact value is 0; what we really want to know is the probability of the value falling within a certain interval.
EXAMPLE: Consider the function

$$f(x) = \frac{1}{10}, \quad 0 \leq x \leq 10$$
This function defines a uniform distribution over the interval [0,10]. Every value in the
range from 0 to 10 can occur (and not just 0, 1, 2, etc., but all the fractional values in
between). We cannot interpret f(x) as the probability of the value x, because there are more than 10 possible values of x, so the probabilities would add up to more than 1. And that would clearly be wrong anyway, because the chance of (say) x = 2 is not 1/10.
What 𝑓(𝑥) does do for us is allow us to find the probability of intervals. We do this by
looking at the area underneath the curve defined by 𝑓(𝑥). [Draw graph of this function: a
horizontal line at 𝑓(𝑥) = 1/10, going from 𝑥 = 0 𝑡𝑜 𝑥 = 10. ] Note that the total area
underneath this function is 1. This makes sense, because all probabilities must add up to 1,
and no value can fall outside the interval [0,10]. Note also that the area under the curve for
any interval with a length of one, such as [0,1] or [1,2] or [3.5,4.5] is equal to 1/10.
The probability density function of x, f(x) or pdf(x), supplies the probability density for each possible value of x:

$$P(a \leq x \leq b) = \int_a^b f(x)\,dx$$

A pdf satisfies:
(i) $f(x) \geq 0$
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
EXAMPLE:
$$f(x) = \begin{cases} Cx^2 & 0 \leq x \leq 3 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find C.
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \;\Rightarrow\; \int_0^3 Cx^2\,dx = 1 \;\Rightarrow\; C\left[\frac{x^3}{3}\right]_0^3 = 9C = 1 \;\Rightarrow\; C = \frac{1}{9}$$

(b) Find P(1 < x ≤ 2).
$$P(1 < x \leq 2) = \int_1^2 f(x)\,dx = \int_1^2 \frac{1}{9} x^2\,dx = \frac{1}{9}\left[\frac{x^3}{3}\right]_1^2 = \frac{1}{9}\left[\frac{8-1}{3}\right] = \frac{7}{27}$$

(c) Find the mean.
$$\mu = E(x) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^3 \frac{1}{9} x^3\,dx = \frac{1}{9}\left[\frac{x^4}{4}\right]_0^3 = \frac{81}{36} = \frac{9}{4}$$
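These integrals can be verified symbolically. A minimal Python sketch, assuming the sympy library is available (variable names are ours):

```python
import sympy as sp

x, C = sp.symbols("x C", positive=True)
f = C * x**2                                            # pdf on 0 <= x <= 3

# The total area under the pdf must equal 1.
c_val = sp.solve(sp.integrate(f, (x, 0, 3)) - 1, C)[0]  # 1/9
f = f.subs(C, c_val)

print(sp.integrate(f, (x, 1, 2)))        # P(1 < X <= 2) = 7/27
print(sp.integrate(x * f, (x, 0, 3)))    # mean = 9/4
```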
EXAMPLE:
Calculate: (i) the mean μ or E(x), and (ii) the variance σ² for the following probability density function (pdf):

$$f(x) = \begin{cases} \frac{1}{2} x & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

SOLUTION:
(i) $$\mu = E(x) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^2 x \left[\frac{1}{2} x\right] dx = \frac{1}{2}\left[\frac{x^3}{3}\right]_0^2 = \frac{1}{2} \cdot \frac{8}{3} = \frac{4}{3}$$

(ii) $$E(x^2) = \int_0^2 x^2 \left[\frac{1}{2} x\right] dx = \frac{1}{2}\left[\frac{x^4}{4}\right]_0^2 = 2, \qquad \sigma^2 = E(x^2) - \mu^2 = 2 - \left(\frac{4}{3}\right)^2 = \frac{2}{9}$$
The probability density function for a uniform distribution taking values in the range a to b is:

$$f(x) = \begin{cases} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$$

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b \frac{1}{b-a}\,dx = 1$$
EXAMPLE: You arrive at a building and are about to take an elevator to your floor. Once you call the elevator, it will take between 0 and 40 seconds to arrive. We will assume that the elevator arrival time is uniformly distributed between 0 and 40 seconds after you press the button. In this case a = 0 and b = 40.
CALCULATING PROBABILITIES
For a uniform random variable, $P(c \leq X \leq d) = \frac{d-c}{b-a}$. In our example, to calculate the probability that the elevator takes less than 15 seconds to arrive, we set d = 15 and c = 0. The probability is (15 − 0)/(40 − 0) = 15/40 = 0.375.

EXPECTED VALUE
$$E(X) = \frac{a+b}{2}$$

THE VARIANCE
$$Var(X) = \frac{(b-a)^2}{12}$$
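A minimal Python sketch of the elevator example (names are ours), applying the three uniform formulas above:

```python
# Uniform(a, b): interval probability and moments for the elevator example.
a, b = 0, 40

prob = (15 - 0) / (b - a)       # P(X < 15) = 0.375
mean = (a + b) / 2              # 20 seconds
var = (b - a) ** 2 / 12         # ~133.33 seconds squared

print(prob, mean, var)
```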
THE EXPONENTIAL DISTRIBUTION
A continuous random variable X follows an exponential distribution with rate λ > 0 if its pdf is

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

To check that the above function is a legitimate probability density function, we need to check that its integral over its support is 1:

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} \lambda e^{-\lambda x}\,dx = \frac{\lambda}{-\lambda}\left[e^{-\lambda x}\right]_0^{\infty} = -[0 - 1] = 1$$
CUMULATIVE DISTRIBUTION FUNCTION
As we know, the cumulative distribution function is the accumulated probability of all values up to a certain point x = t. For the exponential distribution, the cumulative distribution function F(t) is given by

$$F(t) = \int_0^t \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^t = -e^{-\lambda t} + 1 = 1 - e^{-\lambda t}$$

EXPECTED VALUE
To find the expected value, we simply multiply the probability density function by x and integrate over all possible values (the support):

$$E(X) = \int_0^{\infty} x\, \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}$$
EXAMPLE – Let X denote the time between detections of a particle with a Geiger
counter and assume that X has an exponential distribution with E(X) = 1.4 minutes.
What is the probability that we detect a particle within 30 seconds of starting the
counter?
SOLUTION – Since the random variable X denoting the time between successive detections of particles is exponentially distributed, the expected value is given by

$$E(X) = \frac{1}{\lambda} = 1.4 \quad\Rightarrow\quad \lambda = \frac{1}{1.4}$$

To find the probability of detecting the particle within 30 seconds of the start of the experiment, we need to use the cumulative distribution function discussed above. We convert the given 30 seconds to minutes, since we have our rate parameter in terms of minutes.

$$F(t) = 1 - e^{-\lambda t}$$
$$F(0.5) = 1 - e^{-0.5/1.4} = 0.30$$
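The same number can be checked directly. A minimal Python sketch (the helper name expo_cdf is ours):

```python
from math import exp

lam = 1 / 1.4                    # rate, since E(X) = 1/lambda = 1.4 minutes

def expo_cdf(t, lam):
    # F(t) = 1 - e**(-lambda * t)
    return 1 - exp(-lam * t)

print(expo_cdf(0.5, lam))        # P(X <= 0.5 minutes) ~ 0.30
```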
ASSIGNMENT 2
1. The monthly expenditure on food for a certain family (in 1000 S.R.) has the following probability density function (pdf):

$$f(x) = \begin{cases} cx(10 - x) & 0 \leq x \leq 10 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find c.
(b) Calculate P(5 ≤ x ≤ 8).
(c) If we have 600 households (families), what is the expected number of families whose expenditure is less than or equal to 3 thousand S.R. per month?
(d) Calculate: (i) the mean μ or E(x), and (ii) the variance σ² for the monthly expenditure.
2. The weights of 10-pound bags of potatoes packaged by Idaho Farms Inc. are uniformly distributed between 9.75 pounds and 10.75 pounds. Calculate the mean and the standard deviation of the weight per bag.
THE MOMENTS:
Let X be a random variable with a probability distribution f(x). The rth moment about the origin of X is given by:

$$\mu'_r = E(X^r) = \begin{cases} \displaystyle\sum_{\text{all } x} x^r f(x), & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} x^r f(x)\,dx, & \text{if } X \text{ is continuous} \end{cases}$$

if the expectation exists.
As a special case:
$\mu'_1 = E(X) = \mu_X$, the mean of X.
Let X be a random variable with a probability distribution f(x). The rth central moment of X about μ is defined as:

$$\mu_r = E\left[(X - \mu)^r\right] = \begin{cases} \displaystyle\sum_{\text{all } x} (x - \mu)^r f(x), & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx, & \text{if } X \text{ is continuous} \end{cases}$$
AS SPECIAL CASES:
- $\mu_1 = 0$
- $\mu_2 = E(X - \mu)^2 = \sigma_X^2$ is the variance of X, and $\mu_2 = \mu'_2 - (\mu'_1)^2$.
EXAMPLE: If X is a discrete random variable with the probability distribution P(0) = 0.05, P(1) = 0.20, P(2) = 0.45, P(3) = 0.30, find (a) the first four moments about the origin and (b) the first and second moments about the mean.

SOLUTION:
(a) The first four moments about the origin, by using the formula $\mu'_r = E(X^r) = \sum_{\text{all } x} x^r P(x)$:

x   x²   x³   x⁴   P(x)   x P(x)   x² P(x)   x³ P(x)   x⁴ P(x)
0   0    0    0    0.05   0        0         0         0
1   1    1    1    0.20   0.20     0.20      0.20      0.20
2   4    8    16   0.45   0.90     1.80      3.60      7.20
3   9    27   81   0.30   0.90     2.70      8.10      24.3
∑                  1      2        4.70      11.90     31.70

Therefore, the first four moments about the origin are:
$\mu'_1 = E(X) = \sum x P(x) = 2$, $\mu'_2 = 4.70$, $\mu'_3 = 11.90$, $\mu'_4 = 31.70$

(b) The moments about the mean, using μ = 2:

x   x−μ   (x−μ)²   (x−μ)³   (x−μ)⁴   P(x)
0   −2    4        −8       16       0.05
1   −1    1        −1       1        0.20
2   0     0        0        0        0.45
3   1     1        1        1        0.30

$\mu_1 = E(X - \mu) = 0$
$\mu_2 = E(X - \mu)^2 = 4(0.05) + 1(0.20) + 0 + 1(0.30) = 0.70 = \mu'_2 - (\mu'_1)^2 = 4.70 - 2^2$
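The moment tables above can be reproduced in a few lines. A minimal Python sketch (the helper name raw_moment is ours):

```python
# Raw and central moments of the discrete distribution above.
pmf = {0: 0.05, 1: 0.20, 2: 0.45, 3: 0.30}

def raw_moment(r):
    return sum(x**r * p for x, p in pmf.items())

mu = raw_moment(1)
print([raw_moment(r) for r in (1, 2, 3, 4)])          # [2.0, 4.7, 11.9, 31.7]
print(sum((x - mu)**2 * p for x, p in pmf.items()))   # mu_2 = 0.7 = 4.7 - 2**2
```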
EXAMPLE: Find (a) the first four moments about the origin and (b) the first and second moments about the mean for the pdf f(x) = 2x, 0 < x < 1 (and 0 otherwise).

SOLUTION:
(a) The first four moments about the origin, by using the formula $\mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx$:

$$\mu'_1 = E(x) = \int_0^1 x f(x)\,dx = 2\int_0^1 x^2\,dx = \left[\frac{2x^3}{3}\right]_0^1 = \frac{2}{3}$$

$$\mu'_2 = E(x^2) = \int_0^1 x^2 f(x)\,dx = 2\int_0^1 x^3\,dx = \left[\frac{x^4}{2}\right]_0^1 = \frac{1}{2}$$

$$\mu'_3 = E(x^3) = \int_0^1 x^3 f(x)\,dx = 2\int_0^1 x^4\,dx = \left[\frac{2x^5}{5}\right]_0^1 = \frac{2}{5}$$

$$\mu'_4 = E(x^4) = \int_0^1 x^4 f(x)\,dx = 2\int_0^1 x^5\,dx = \left[\frac{x^6}{3}\right]_0^1 = \frac{1}{3}$$

(b) Using $\mu = \frac{2}{3}$, we get:
The first moment about the mean: $\mu_1 = E\left(X - \frac{2}{3}\right) = 0$
NOTICE THAT:
$$\mu_2 = \mu'_2 - (\mu'_1)^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}$$
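The same integrals can be checked symbolically. A minimal Python sketch, assuming the sympy library is available (names are ours):

```python
import sympy as sp

x = sp.symbols("x", positive=True)
f = 2 * x                                   # pdf on (0, 1)

m1 = sp.integrate(x * f, (x, 0, 1))         # mu'_1 = 2/3
m2 = sp.integrate(x**2 * f, (x, 0, 1))      # mu'_2 = 1/2

print(m1, m2, m2 - m1**2)                   # 2/3, 1/2, and mu_2 = 1/18
```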
MOMENT-GENERATING FUNCTIONS
The moment-generating function (mgf) of a random variable X is defined as

$$M_X(t) = E\left(e^{tX}\right) = \begin{cases} \displaystyle\sum_{\text{all } x} e^{tx} f(x), & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if } X \text{ is continuous} \end{cases}$$

Moment-generating functions will exist only if the sum or integral of the above definition converges. If a moment-generating function of a random variable X does exist, it can be used to generate all the moments of that variable:

$$\left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0} = \mu'_r, \qquad \text{for example } \left.\frac{d^2 M_X(t)}{dt^2}\right|_{t=0} = \mu'_2$$

In addition:
$$\sigma^2 = \mu'_2 - (\mu'_1)^2$$
ASSIGNMENT 3
1. If Y is a discrete random variable having the following probability distribution, calculate: (a) the first four moments about the origin and (b) the first and second moments about the mean.

Y       0      1      2      3
P(Y)    0.05   0.35   0.20   0.40
THE MGF OF THE BINOMIAL DISTRIBUTION
Suppose X has a Binomial(n, p) distribution. Then the moment generating function of the binomial random variable X is:

$$M_X(t) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x q^{n-x}$$

Recognizing this last sum as the binomial expansion of $(pe^t + q)^n$, we obtain

$$M_X(t) = (pe^t + q)^n$$

Use the mgf to show that:
1- The mean μ = np.
2- The variance σ² = npq.
SOLUTION
The mean:
$$E(X) = \frac{dM_X(t)}{dt} = n(pe^t + q)^{n-1} pe^t$$
Setting t = 0 (and using p + q = 1), we get:
$$\mu'_1 = np$$
The variance:
$$\frac{d^2 M_X(t)}{dt^2} = \left[n(n-1)(pe^t + q)^{n-2}(pe^t)^2\right] + \left[n(pe^t + q)^{n-1} pe^t\right]$$
Hence, setting t = 0:
$$\mu'_2 = n(n-1)p^2 + np$$
Therefore,
$$\mu = \mu'_1 = np$$
$$\sigma^2 = \mu'_2 - (\mu'_1)^2 = n(n-1)p^2 + np - (np)^2 = np(1 - p) = npq$$
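The differentiation above can be verified symbolically. A minimal Python sketch, assuming the sympy library is available (names are ours):

```python
import sympy as sp

t, n, p = sp.symbols("t n p", positive=True)
M = (p * sp.exp(t) + (1 - p)) ** n          # binomial mgf with q = 1 - p

m1 = sp.diff(M, t).subs(t, 0)               # first raw moment
m2 = sp.diff(M, t, 2).subs(t, 0)            # second raw moment

print(sp.simplify(m1))                      # n*p
print(sp.expand(m2 - m1**2))                # n*p - n*p**2, i.e. np(1 - p) = npq
```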
THE MGF OF THE POISSON DISTRIBUTION
Suppose X has a Poisson distribution with mean λ, so that

$$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

Then the moment generating function of the Poisson random variable X is:

$$M_X(t) = \sum_{x=0}^{\infty} e^{tx} \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$

since

$$e^{\lambda e^t} = 1 + \frac{\lambda e^t}{1!} + \frac{(\lambda e^t)^2}{2!} + \cdots + \frac{(\lambda e^t)^x}{x!} + \cdots = \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!}$$

Use the mgf to show that:
1- The mean μ = λ.
2- The variance σ² = λ.
SOLUTION:
The mean:
$$E(X) = \frac{dM_X(t)}{dt} = \lambda e^t e^{\lambda(e^t - 1)}$$
Setting t = 0, we get:
$$E(X) = M_X'(0) = \lambda$$
The variance:
$$\frac{d^2 M_X(t)}{dt^2} = (\lambda e^t)^2 e^{\lambda(e^t - 1)} + \lambda e^t e^{\lambda(e^t - 1)}$$
Hence
$$E(X^2) = \mu'_2 = M_X''(0) = \lambda^2 + \lambda$$
Therefore,
$$Var(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$
THE MGF OF THE GEOMETRIC DISTRIBUTION
Suppose X has a geometric distribution. The function f(x) is

$$P(X = x) = (1 - p)^{x-1} p, \quad x = 1, 2, \ldots$$

Then the moment generating function of the geometric random variable X is:

$$M_X(t) = \sum_{x=1}^{\infty} e^{tx} (1 - p)^{x-1} p = pe^t \sum_{x=1}^{\infty} (qe^t)^{x-1} = \frac{pe^t}{1 - qe^t}, \quad \text{where } q = 1 - p,$$

since $\sum_{x=1}^{\infty} (qe^t)^{x-1}$ is a geometric progression whose sum is $\frac{1}{1 - qe^t}$ (for $qe^t < 1$).

From this generating function, we can find the moments. For instance, E(X) = M_X'(0). The derivative is

$$M_X'(t) = \frac{(1 - qe^t)\, pe^t - pe^t(-qe^t)}{(1 - qe^t)^2} = \frac{pe^t}{(1 - qe^t)^2}$$

so that

$$E(X) = M_X'(0) = \frac{p}{(1 - q)^2} = \frac{p}{p^2} = \frac{1}{p}$$
THE MGF OF THE UNIFORM DISTRIBUTION
Suppose X is uniformly distributed on [a, b], so f(x) = 1/(b − a) for a ≤ x ≤ b. Then

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_a^b e^{tx} \left(\frac{1}{b-a}\right) dx = \left(\frac{1}{b-a}\right) \int_a^b e^{tx}\,dx = \left(\frac{1}{b-a}\right) \left.\frac{e^{tx}}{t}\right|_{x=a}^{x=b} = \frac{e^{tb} - e^{ta}}{t(b-a)}$$
THE MGF OF THE EXPONENTIAL DISTRIBUTION
Let X be a continuous random variable that follows an exponential distribution E(λ), with pdf

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

The mgf of X is given by:

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$$
$$= \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{-x(\lambda - t)}\,dx = \lambda \left.\left(\frac{e^{-x(\lambda - t)}}{-(\lambda - t)}\right)\right|_{x=0}^{x=\infty} = \left(\frac{\lambda}{-(\lambda - t)}\right)(0 - 1) = \frac{\lambda}{\lambda - t}$$

Here $\lim_{x \to \infty} e^{-x(\lambda - t)} = 0$ if and only if λ − t > 0. Therefore the derived formula holds if and only if t < λ.
SOLUTION
So let us start with the derivatives. Say that we want the first moment, or the mean. We will just take the derivative with respect to the dummy variable t and plug in 0 for t after taking the derivative:

$$E(X) = \left.\frac{dM_X(t)}{dt}\right|_{t=0} = M_X'(0)$$

$$M_X(t) = \frac{\lambda}{\lambda - t} \quad\Rightarrow\quad M_X'(t) = \frac{\lambda}{(\lambda - t)^2}$$

$$M_X'(0) = \frac{\lambda}{(\lambda - 0)^2} = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}$$
THE VARIANCE
Let's do the variance now; we already have the first moment, but we need the second moment E(X²) as well. We can just differentiate again and plug in t = 0 to find the second moment:

$$M_X''(t) = \frac{2\lambda}{(\lambda - t)^3}, \qquad M_X''(0) = \frac{2\lambda}{(\lambda - 0)^3} = \frac{2}{\lambda^2}$$

$$Var(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$
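These derivatives can also be checked symbolically. A minimal Python sketch, assuming the sympy library is available (names are ours):

```python
import sympy as sp

t, lam = sp.symbols("t lamda", positive=True)
M = lam / (lam - t)                  # exponential mgf, valid for t < lambda

m1 = sp.diff(M, t).subs(t, 0)        # 1/lambda
m2 = sp.diff(M, t, 2).subs(t, 0)     # 2/lambda**2

print(m1, m2, m2 - m1**2)            # variance = 1/lambda**2
```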
PART TWO
JOINT PROBABILITY DISTRIBUTION FUNCTION
If X and Y are two discrete random variables, then p(x, y) = P(X = x, Y = y) is called the joint probability mass function (j.p.m.f.) of X and Y, and p(x, y) has the following properties:
1- 0 ≤ p(x, y) ≤ 1 for all x and y
2- $\sum_{\forall x} \sum_{\forall y} p(x, y) = 1$

If X and Y are jointly discrete random variables with j.p.m.f. p(x, y), then g(x) and h(y) are called the marginal probability mass functions of X and Y, respectively, which can be calculated as follows:

$$g(x) = \sum_{y} p(x, y), \qquad h(y) = \sum_{x} p(x, y)$$
CONDITIONAL DISTRIBUTION
DEFINITION:
If X and Y are jointly distributed random variables (discrete or continuous) with joint probability function p(x, y), and g(x) and h(y) are the marginal probability distributions of X and Y respectively, then the conditional distribution of the random variable Y given that X = x is:

$$p(Y|X) = \frac{p(x, y)}{g(x)}, \qquad g(x) > 0$$

Similarly, the conditional distribution of the random variable X given that Y = y is:

$$p(X|Y) = \frac{p(x, y)}{h(y)}, \qquad h(y) > 0$$

DEFINITION:
Let X and Y be random variables with joint probability distribution p(x, y). The expected value (mean) of a function g(X, Y) of the random variables, denoted by $\mu_{g(X,Y)}$, is:

$$E[g(X, Y)] = \sum_{x} \sum_{y} g(x, y)\, p(x, y)$$

In particular, $E(XY) = \sum_{x} \sum_{y} xy\, p(x, y)$.

Let X and Y be random variables with joint probability distribution p(x, y). The expected value (mean) of the random variable X is:

$$E(X) = \sum_{x} \sum_{y} x\, p(x, y) = \sum_{\text{all } x} x\, g(x)$$

Similarly, the expected value (mean) of the random variable Y is:

$$E(Y) = \sum_{x} \sum_{y} y\, p(x, y) = \sum_{\text{all } y} y\, h(y)$$
EXAMPLE 1:
Let X and Y be jointly discrete random variables with the following j.p.m.f. p(x, y). Find P(X ≤ 1, Y ≤ 1).

            X = 0    X = 1    X = 2    Sum
Y = 0       3/28     9/28     3/28     15/28
Y = 1       6/28     6/28     0        12/28
Y = 2       1/28     0        0        1/28
Sum         10/28    15/28    3/28     1

SOLUTION
$$P(X \leq 1, Y \leq 1) = \frac{3}{28} + \frac{6}{28} + \frac{9}{28} + \frac{6}{28} = \frac{24}{28}$$

H.W. Find P[(X, Y) ∈ A] = P(X + Y ≤ 1)
For the random variable X:
g(0) = p(0, 0) + p(0, 1) + p(0, 2) = 3/28 + 6/28 + 1/28 = 10/28
g(1) = p(1, 0) + p(1, 1) + p(1, 2) = 9/28 + 6/28 + 0 = 15/28
g(2) = p(2, 0) + p(2, 1) + p(2, 2) = 3/28 + 0 + 0 = 3/28

X       0       1       2
g(x)    10/28   15/28   3/28

For the random variable Y, we see that:
h(0) = p(0, 0) + p(1, 0) + p(2, 0) = 3/28 + 9/28 + 3/28 = 15/28
h(1) = p(0, 1) + p(1, 1) + p(2, 1) = 6/28 + 6/28 + 0 = 12/28
h(2) = p(0, 2) + p(1, 2) + p(2, 2) = 1/28 + 0 + 0 = 1/28

Y       0       1       2
h(y)    15/28   12/28   1/28
$$P(Y = 1 | X = 0) = \frac{p(0, 1)}{g(0)} = \frac{6/28}{10/28} = \frac{3}{5}$$

$$P(X = 1 | Y = 1) = \frac{p(1, 1)}{h(1)} = \frac{6/28}{12/28} = \frac{1}{2}$$
The expected value of the sum or difference of two or more functions of the random
variables X and Y is the sum or difference of the expected values of the functions.
That is,
E[g(X, Y) ± h(X, Y)] = E[g(X, Y)] ± E[h(X, Y)].
STATISTICAL INDEPENDENCE
DEFINITION:
Let X and Y be two random variables, discrete or continuous, with joint probability function p(x, y) and marginal probability distributions g(x) and h(y) respectively. The random variables X and Y are said to be statistically independent if and only if:
p(x, y) = g(x)h(y)
for all (x, y) within their ranges.
EXAMPLE 2:
X and Y are jointly discrete random variables with the following j.p.m.f. p(x, y). Show that X and Y are statistically independent.

            X = 2    X = 3    X = 4
Y = 1       0.06     0.15     0.09
Y = 2       0.14     0.35     0.21
SOLUTION:
The marginal distributions are:

X       2      3      4
g(x)    0.2    0.5    0.3

Y       1      2
h(y)    0.3    0.7

Checking p(x, y) = g(x)h(y) cell by cell, for example:
p(4, 2) = 0.21 and g(4)h(2) = 0.3 × 0.7 = 0.21
p(2, 2) = 0.14 and g(2)h(2) = 0.2 × 0.7 = 0.14
p(4, 1) = 0.09 and g(4)h(1) = 0.3 × 0.3 = 0.09
The same holds for the remaining cells. Hence, X and Y are statistically independent.
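All six cells can be checked at once. A minimal Python sketch (dictionary layout is ours) that recomputes the marginals and tests every cell:

```python
# Verify p(x, y) = g(x) * h(y) for every cell of the joint table above.
p = {(2, 1): 0.06, (3, 1): 0.15, (4, 1): 0.09,
     (2, 2): 0.14, (3, 2): 0.35, (4, 2): 0.21}

g = {x: sum(v for (xx, _), v in p.items() if xx == x) for x in (2, 3, 4)}
h = {y: sum(v for (_, yy), v in p.items() if yy == y) for y in (1, 2)}

print(g, h)  # marginals, approximately {2: 0.2, 3: 0.5, 4: 0.3} and {1: 0.3, 2: 0.7}
print(all(abs(p[x, y] - g[x] * h[y]) < 1e-12 for (x, y) in p))  # True
```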
DEFINITION
Let X and Y be random variables with joint probability distribution p(x, y). The covariance of X and Y, denoted by cov(X, Y) or $\sigma_{XY}$, is:

$$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$$

NOTE
The alternative and preferred computational formula is:

$$\sigma_{XY} = E(XY) - \mu_X \mu_Y$$
EXAMPLE 3: From Example 1, find the covariance of the two random variables X and Y.
SOLUTION
Since
$$E(XY) = \frac{3}{14}, \qquad \mu_X = \frac{3}{4}, \qquad \mu_Y = \frac{1}{2},$$
we get
$$\sigma_{XY} = E(XY) - \mu_X \mu_Y = \frac{3}{14} - \frac{3}{4} \cdot \frac{1}{2} = \frac{3}{14} - \frac{3}{8} = -\frac{9}{56}$$
LINEAR COMBINATION
Let X and Y be random variables with joint probability function p(x, y), and let a and b be constants. Then

$$\sigma^2_{aX \pm bY} = Var(aX \pm bY) = a^2 \sigma^2_X + b^2 \sigma^2_Y \pm 2ab\, \sigma_{XY}$$
CORRELATION COEFFICIENT
DEFINITION
Let X and Y be random variables with covariance $\sigma_{XY}$ and standard deviations $\sigma_X$ and $\sigma_Y$ respectively. The correlation coefficient of X and Y is:

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

The correlation coefficient always satisfies $-1 \leq \rho_{XY} \leq 1$.
EXAMPLE 3 (continued):
Using the same joint distribution, find the correlation coefficient of X and Y.

            X = 0    X = 1    X = 2    Sum
Y = 0       3/28     9/28     3/28     15/28
Y = 1       6/28     6/28     0        12/28
Y = 2       1/28     0        0        1/28
Sum         10/28    15/28    3/28     1
SOLUTION
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

We know that
$$\mu_X = \frac{3}{4}, \qquad \mu_Y = \frac{1}{2}, \qquad \sigma_{XY} = -\frac{9}{56}$$

$$Var(X) = E(X^2) - \mu_X^2 = \left[0^2 \times \frac{10}{28} + 1^2 \times \frac{15}{28} + 2^2 \times \frac{3}{28}\right] - \left(\frac{21}{28}\right)^2 = \frac{27}{28} - \left(\frac{21}{28}\right)^2 = \frac{315}{784} = 0.964 - 0.563 = 0.401$$

$$\sigma_X = \sqrt{0.401} = 0.634$$

$$Var(Y) = E(Y^2) - \mu_Y^2 = \sum_{\text{all } y} y^2 h(y) - \mu_Y^2 = \left[0^2 \times \frac{15}{28} + 1^2 \times \frac{12}{28} + 2^2 \times \frac{1}{28}\right] - \left(\frac{14}{28}\right)^2 = \frac{4}{7} - \left(\frac{1}{2}\right)^2 = \frac{9}{28} = 0.3214$$

$$\sigma_Y = \sqrt{0.3214} = 0.5669$$

Since $\sigma_X = 0.634$, $\sigma_Y = 0.5669$, and $\sigma_{XY} = -0.161$, therefore

$$\rho_{XY} = \frac{-0.161}{(0.634)(0.5669)} = \frac{-0.161}{0.3593} = -0.447$$
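Exact fractions avoid the rounding above. A minimal Python sketch (the table layout is ours) using the standard fractions module:

```python
from fractions import Fraction as F

# Joint table of Example 3 (entries are counts over 28).
p = {(0, 0): 3, (1, 0): 9, (2, 0): 3,
     (0, 1): 6, (1, 1): 6, (2, 1): 0,
     (0, 2): 1, (1, 2): 0, (2, 2): 0}
p = {k: F(v, 28) for k, v in p.items()}

EX = sum(x * v for (x, y), v in p.items())                 # 3/4
EY = sum(y * v for (x, y), v in p.items())                 # 1/2
EXY = sum(x * y * v for (x, y), v in p.items())            # 3/14
VX = sum(x * x * v for (x, y), v in p.items()) - EX**2     # 315/784
VY = sum(y * y * v for (x, y), v in p.items()) - EY**2     # 9/28
cov = EXY - EX * EY                                        # -9/56

rho = float(cov) / (float(VX) ** 0.5 * float(VY) ** 0.5)
print(cov, rho)                                            # -9/56, about -0.447
```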
ASSIGNMENT 4
1. Let X denote the number of times a certain numerical control machine will malfunction: 1, 2, or 3 times on any given day. Let Y denote the number of times a technician is called on an emergency call. Their joint probability distribution is given as
f(x, y)        X = 1    X = 2    X = 3
Y = 1          0.05     0.05     0.10
Y = 3          0.05     0.10     0.35
Y = 5          0.00     0.20     0.10
2. Let X and Y have the following joint probability distribution:

f(x, y)        X = 2    X = 4
Y = 1          0.10     0.15
Y = 3          0.20     0.30
Y = 5          0.10     0.15

Find:
a. the covariance of X and Y
b. P[(X, Y) ∈ A], where A is the region given by {(x, y) | x + y ≤ 5}.
If X and Y are two continuous random variables, then f(x, y) is called the joint probability density function (j.p.d.f.) of X and Y if it has the following properties:
1- f(x, y) ≥ 0 for all (x, y)
2- $\iint_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
EXAMPLE1:
A candy company distributes boxes of chocolates with a mixture of creams,
toffees, and nuts coated in both light and dark chocolate. For a randomly selected
box, let X and Y, respectively, be the proportions of the light and dark chocolates
that are creams and suppose that the joint density function is
$$f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y) & \text{if } 0 \leq x \leq 1,\ 0 \leq y \leq 1 \\ 0 & \text{otherwise} \end{cases}$$

1- Verify that f(x, y) is a joint probability density function.
2- Find P[(X, Y) ∈ A] where A = {(x, y) | 0 < x < 1/2, 1/4 < y < 1/2}.
SOLUTION
1.
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_0^1 \int_0^1 \frac{2}{5}(2x + 3y)\,dx\,dy = \int_0^1 \left.\left(\frac{2x^2}{5} + \frac{6xy}{5}\right)\right|_{x=0}^{x=1} dy = \int_0^1 \left(\frac{2}{5} + \frac{6y}{5}\right) dy = \left.\left(\frac{2y}{5} + \frac{3y^2}{5}\right)\right|_{y=0}^{y=1} = \frac{2}{5} + \frac{3}{5} = 1$$
2. To calculate the probability, we use

$$P[(X, Y) \in A] = P\left(0 < X < \tfrac{1}{2},\ \tfrac{1}{4} < Y < \tfrac{1}{2}\right) = \int_{1/4}^{1/2} \int_0^{1/2} \frac{2}{5}(2x + 3y)\,dx\,dy$$

$$= \int_{1/4}^{1/2} \left.\left(\frac{2x^2}{5} + \frac{6xy}{5}\right)\right|_{x=0}^{x=1/2} dy = \int_{1/4}^{1/2} \left(\frac{1}{10} + \frac{3y}{5}\right) dy = \left.\left(\frac{y}{10} + \frac{3y^2}{10}\right)\right|_{y=1/4}^{y=1/2}$$

$$= \frac{1}{10}\left[\left(\frac{1}{2} + \frac{3}{4}\right) - \left(\frac{1}{4} + \frac{3}{16}\right)\right] = \frac{13}{160} = 0.081$$
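Both integrals can be verified symbolically. A minimal Python sketch, assuming the sympy library is available (names are ours):

```python
import sympy as sp

x, y = sp.symbols("x y", nonnegative=True)
f = sp.Rational(2, 5) * (2 * x + 3 * y)           # joint pdf on the unit square

print(sp.integrate(f, (x, 0, 1), (y, 0, 1)))      # total probability: 1
print(sp.integrate(f, (x, 0, sp.Rational(1, 2)),
                   (y, sp.Rational(1, 4), sp.Rational(1, 2))))   # 13/160
```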
If X and Y are jointly continuous random variables with j.p.d.f. f(x, y), the marginal density functions of X and Y are:
1- $g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$
2- $h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$
EXAMPLE2:
A candy company distributes boxes of chocolates with a mixture of creams,
toffees, and nuts coated in both light and dark chocolate. For a randomly selected
box, let X and Y, respectively, be the proportions of the light and dark chocolates
that are creams and suppose that the joint density function is
$$f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y) & \text{if } 0 \leq x \leq 1,\ 0 \leq y \leq 1 \\ 0 & \text{otherwise} \end{cases}$$

1- $$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 \frac{2}{5}(2x + 3y)\,dy = \left.\left(\frac{4xy}{5} + \frac{3y^2}{5}\right)\right|_{y=0}^{y=1} = \frac{4x + 3}{5}$$

$$g(x) = \begin{cases} \frac{4x + 3}{5} & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$$

2- $$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^1 \frac{2}{5}(2x + 3y)\,dx = \left.\left(\frac{2x^2}{5} + \frac{6xy}{5}\right)\right|_{x=0}^{x=1} = \frac{2}{5} + \frac{6y}{5} = \frac{2(1 + 3y)}{5}$$

$$h(y) = \begin{cases} \frac{2(1 + 3y)}{5} & \text{if } 0 \leq y \leq 1 \\ 0 & \text{otherwise} \end{cases}$$
3- CONDITIONAL DISTRIBUTION
DEFINITION:
If X and Y are jointly distributed random variables (discrete or continuous) with joint probability function f(x, y), and g(x) and h(y) are the marginal probability distributions of X and Y respectively, then the conditional distribution of the random variable Y given that X = x is:

$$f(Y|X) = \frac{f(x, y)}{g(x)}, \qquad g(x) > 0$$

Similarly, the conditional distribution of the random variable X given that Y = y is:

$$f(X|Y) = \frac{f(x, y)}{h(y)}, \qquad h(y) > 0$$
EXAMPLE 3:
X and Y are jointly continuous random variables with the following j.p.d.f. f(x, y):

$$f(x, y) = \begin{cases} 10xy^2 & \text{if } 0 < x < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Find g(x), h(y), f(y|x), and P(Y > 1/2 | X = 1/4).

$$g(x) = \int_x^1 10xy^2\,dy = \left.\frac{10}{3}xy^3\right|_{y=x}^{y=1} = \frac{10}{3}x(1 - x^3)$$

$$g(x) = \begin{cases} \frac{10}{3}x(1 - x^3) & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

$$h(y) = \int_0^y 10xy^2\,dx = \left.5x^2y^2\right|_{x=0}^{x=y} = 5y^4$$

$$h(y) = \begin{cases} 5y^4 & \text{if } 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

$$f(y|x) = \frac{f(x, y)}{g(x)} = \frac{10xy^2}{\frac{10}{3}x(1 - x^3)} = \frac{3y^2}{1 - x^3}$$

Therefore, with x = 1/4 (so 1 − x³ = 0.984),

$$P\left(Y > \tfrac{1}{2} \,\middle|\, X = \tfrac{1}{4}\right) = \int_{1/2}^1 \frac{3y^2}{0.984}\,dy = \left.\frac{y^3}{0.984}\right|_{y=1/2}^{y=1} = \frac{1 - 0.125}{0.984} = 0.89$$
4-STATISTICAL INDEPENDENCE
Definition:
Let X and Y be two random variables discrete or continuous, with the j.p.f f (x,y),
and marginal probability distributions g(x) and h(y) respectively. The random
variables X and Y are said to be statistically independent if and only if:
f (x,y) = g(x)h(y)
EXAMPLE 4:
X and Y are jointly continuous random variables with the following j.p.d.f. f(x, y):

$$f(x, y) = \begin{cases} \frac{x(1 + 3y^2)}{4} & \text{if } 0 < x < 2,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Find the marginal densities g(x) and h(y), and determine whether X and Y are independent.
1- $$g(x) = \int_0^1 \frac{x(1 + 3y^2)}{4}\,dy = \left.\left(\frac{xy}{4} + \frac{xy^3}{4}\right)\right|_{y=0}^{y=1} = \frac{x}{2}$$

$$g(x) = \begin{cases} \frac{x}{2} & \text{if } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$
2- $$h(y) = \int_0^2 \frac{x(1 + 3y^2)}{4}\,dx = \left.\left(\frac{x^2}{8} + \frac{3x^2y^2}{8}\right)\right|_{x=0}^{x=2} = \frac{1}{2} + \frac{3y^2}{2} = \frac{1 + 3y^2}{2}$$

$$h(y) = \begin{cases} \frac{1 + 3y^2}{2} & \text{if } 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
3- Are the two random variables X and Y independent? If X and Y are independent, then f(x, y) = g(x)h(y). Here

$$g(x)h(y) = \frac{x}{2} \cdot \frac{1 + 3y^2}{2} = \frac{x(1 + 3y^2)}{4} = f(x, y),$$

so X and Y are independent.
5- MATHEMATICAL EXPECTATION
DEFINITION:
Let X and Y be random variables with joint probability density f(x, y). The expected value (mean) of a function g(X, Y) of the random variables, denoted by $\mu_{g(X,Y)}$, is:

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f(x, y)\,dx\,dy$$
EXAMPLE5:
If X and Y are jointly continuous random variables, with the following j.p.d.f
f (x,y):
$$f(x, y) = \begin{cases} \frac{x(1 + 3y^2)}{4} & \text{if } 0 < x < 2,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Find $E\left(\frac{Y}{X}\right)$.

$$E\left(\frac{Y}{X}\right) = \int_0^1 \int_0^2 \frac{y}{x} \cdot \frac{x(1 + 3y^2)}{4}\,dx\,dy = \int_0^1 \int_0^2 \frac{y(1 + 3y^2)}{4}\,dx\,dy = \int_0^1 \left.\frac{xy(1 + 3y^2)}{4}\right|_{x=0}^{x=2} dy$$

$$= \int_0^1 \frac{y(1 + 3y^2)}{2}\,dy = \left.\left(\frac{y^2}{4} + \frac{3y^4}{8}\right)\right|_{y=0}^{y=1} = \frac{1}{4} + \frac{3}{8} = \frac{5}{8}$$
EXAMPLE 5 (continued):
If X and Y are jointly continuous random variables with the following j.p.d.f. f(x, y):

$$f(x, y) = \begin{cases} \frac{x(1 + 3y^2)}{4} & \text{if } 0 < x < 2,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

Find
1- E(XY)
2- E(X)
3- E(Y)
4- Illustrate that E(XY) = E(X)E(Y)
SOLUTION
1- $$E(XY) = \int_0^1 \int_0^2 xy \cdot \frac{x(1 + 3y^2)}{4}\,dx\,dy = \int_0^1 \int_0^2 \frac{yx^2(1 + 3y^2)}{4}\,dx\,dy = \int_0^1 \left.\frac{yx^3(1 + 3y^2)}{12}\right|_{x=0}^{x=2} dy$$

$$= \int_0^1 \frac{2y(1 + 3y^2)}{3}\,dy = \int_0^1 \frac{2y + 6y^3}{3}\,dy = \frac{1}{3}\left.\left(y^2 + \frac{3}{2}y^4\right)\right|_{y=0}^{y=1} = \frac{1}{3} + \frac{1}{2} = \frac{5}{6}$$
2- $$E(X) = \int_0^1 \int_0^2 x \cdot \frac{x(1 + 3y^2)}{4}\,dx\,dy = \int_0^1 \int_0^2 \frac{x^2(1 + 3y^2)}{4}\,dx\,dy = \int_0^1 \left.\frac{x^3(1 + 3y^2)}{12}\right|_{x=0}^{x=2} dy$$

$$= \int_0^1 \frac{2(1 + 3y^2)}{3}\,dy = \int_0^1 \frac{2 + 6y^2}{3}\,dy = \left.\frac{2y + 2y^3}{3}\right|_{y=0}^{y=1} = \frac{4}{3}$$
3- $$E(Y) = \int_0^1 \int_0^2 y \cdot \frac{x(1 + 3y^2)}{4}\,dx\,dy = \int_0^1 \int_0^2 \frac{xy(1 + 3y^2)}{4}\,dx\,dy = \int_0^1 \left.\frac{x^2y(1 + 3y^2)}{8}\right|_{x=0}^{x=2} dy$$

$$= \int_0^1 \frac{y + 3y^3}{2}\,dy = \left.\frac{\frac{1}{2}y^2 + \frac{3}{4}y^4}{2}\right|_{y=0}^{y=1} = \frac{5}{8}$$
HENCE,
$$E(X)E(Y) = \frac{4}{3} \times \frac{5}{8} = \frac{5}{6} = E(XY)$$
6- COVARIANCE
COVARIANCE OF RANDOM VARIABLES
DEFINITION
Let X and Y be random variables with joint probability distribution f(x, y). The covariance of X and Y, denoted by cov(X, Y) or $\sigma_{XY}$, is:

$$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$$

DEFINITION
The alternative and preferred formula for $\sigma_{XY}$ is:

$$\sigma_{XY} = E(XY) - \mu_X \mu_Y$$

EXAMPLE 2:
X and Y are jointly continuous random variables with the following j.p.d.f. f(x, y):

$$f(x, y) = \begin{cases} 8xy & \text{if } 0 < y < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

Find the covariance of X and Y.
SOLUTION
We first compute the marginal density functions g(x) and h(y):

$$g(x) = \int_0^x 8xy\,dy = \left.4xy^2\right|_{y=0}^{y=x} = 4x^3$$

$$g(x) = \begin{cases} 4x^3 & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

$$h(y) = \int_y^1 8xy\,dx = \left.4yx^2\right|_{x=y}^{x=1} = 4y(1 - y^2)$$

$$h(y) = \begin{cases} 4y(1 - y^2) & \text{if } 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

We need to compute $\mu_X$ and $\mu_Y$ from the marginal density functions g(x) and h(y) respectively:

$$\mu_X = \int_{-\infty}^{\infty} x\, g(x)\,dx = \int_0^1 4x^4\,dx = \left.\frac{4}{5}x^5\right|_0^1 = \frac{4}{5}$$

$$\mu_Y = \int_{-\infty}^{\infty} y\, h(y)\,dy = \int_0^1 4y^2(1 - y^2)\,dy = 4\left(\frac{1}{3} - \frac{1}{5}\right) = \frac{8}{15}$$
$$E(XY) = \int_0^1 \int_y^1 8x^2y^2\,dx\,dy = \int_0^1 \left.\frac{8}{3}x^3y^2\right|_{x=y}^{x=1} dy = \int_0^1 \frac{8}{3}\left(y^2 - y^5\right) dy = \left.\left(\frac{8}{9}y^3 - \frac{8}{18}y^6\right)\right|_{y=0}^{y=1} = \frac{4}{9}$$

Since
$$E(XY) = \frac{4}{9}, \qquad \mu_X = \frac{4}{5}, \qquad \mu_Y = \frac{8}{15},$$
therefore
$$\sigma_{XY} = E(XY) - \mu_X \mu_Y = \frac{4}{9} - \frac{4}{5} \cdot \frac{8}{15} = \frac{4}{225} = 0.018$$
ASSIGNMENT 5
1- Let X and Y be two jointly continuous random variables with joint PDF
$$f(x, y) = \begin{cases} 10x^2y & \text{if } 0 \leq y \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$$
2- Let X and Y be two jointly continuous random variables with joint PDF
$$f(x, y) = \begin{cases} 10x^2y & \text{if } 0 \leq y \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$$