
Mathematical Expectation


Lecture 4

Mathematical expectation
(Of a random variable)

CH2010 Engineering Statistics AY2023 v6 PL 1


Intended learning outcomes
• Concepts:
• Expectation/mean of a random variable
• Expectation/mean of functions of random variables
• Variance of a random variable
• Covariance of two random variables
• Skills:
• Calculate mathematical expectation/mean of a random variable or functions of
random variables, given their (joint) probability distribution functions
• Calculate the variance of a random variable or functions of random variables, given
their probability distribution functions.
• Calculate the covariance and correlation coefficients of two random variables, given
their joint probability distribution function.



4.0 Revision
Assume we have a process generating one integer at a time, with value
between 1 and 10 at random.
The outcome is a random variable, as discussed in Lecture 3.

X ∈ {1, 2, 3, …, 10}, with probability distribution f(x).

The probability of the system producing each outcome is collectively described by the probability distribution (function) of the random variable, as discussed in Lecture 3.
4.1 Mean of a random variable
If we record three specific outcomes, e.g. 3, 4 & 9, we are doing random sampling.
In Lecture 1, we learnt to calculate the sample mean:
(3 + 4 + 9) / 3 = 5.333
and the sample variance:
[(3 − 5.333)² + (4 − 5.333)² + (9 − 5.333)²] / (3 − 1) = 10.33
Today, we look at the mean and the variance of the random variable itself, which equal the sample mean and sample variance when the sample size is infinitely large, i.e. the population mean and population variance.
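This convergence can be illustrated with a short simulation; a minimal sketch in Python (the uniform process and the seed are illustrative assumptions):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Draw n integers uniformly from 1..10 and return their average."""
    draws = [random.randint(1, 10) for _ in range(n)]
    return sum(draws) / n

# The population mean of a uniform distribution on 1..10 is 5.5;
# the sample mean approaches it as the sample size grows.
print(sample_mean(10))       # noisy for a small sample
print(sample_mean(100_000))  # close to 5.5
```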

Example 4.1
A process produces a random variable that is an integer between 1 and 10. The probability distribution function f(x) can be expressed as:

f(x) = 0.1,  x = 1, 2, 3, …, 10
f(x) = 0,   elsewhere

If we sample X an infinite number of times, the sample mean becomes the mean of the random variable X, or the mean of the probability distribution of X, written as μX or simply as μ.



Example 4.1 (continued)
In statistics, this is also referred to as the expected value of the random variable X,
denoted E(X).
In this Example:
μ = E(X) = 0.1×1 + 0.1×2 + 0.1×3 + 0.1×4 + 0.1×5 + 0.1×6 + 0.1×7 + 0.1×8 + 0.1×9 + 0.1×10 = 5.5
N.B. the term “expected value” can be confusing, as its literal meaning suggests a prediction, even though it is not a prediction of any specific outcome.
In this course, “expected value” is just another name for “population mean”.
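The calculation in Example 4.1 can be reproduced directly from the distribution; a minimal sketch using exact fractions:

```python
from fractions import Fraction

# f(x) = 0.1 for x = 1, 2, ..., 10, and 0 elsewhere (Example 4.1)
f = {x: Fraction(1, 10) for x in range(1, 11)}

# mu = E(X) = sum over x of x * f(x)
mu = sum(x * p for x, p in f.items())
print(mu)  # 11/2, i.e. 5.5
```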



Definition 4.1
Let X be a random variable with probability distribution f(x). The mean, or expected value, of X is
μ = E(X) = Σ x f(x) (summing over all x) if X is discrete, and
μ = E(X) = ∫ x f(x) dx if X is continuous.

N.B. The sample mean, x̄, in Lecture 1 is calculated using sample data. It is a measure of the centre of the data.
The expected value, μ, is calculated using the distribution function. It is a measure of the centre of the probability distribution function.
Example 4.2
A lot of 7 components, 4 good and 3 defective, is examined by a quality inspector, who takes a sample of 3 at random. Find the expected value of the number of good components in the sample.
Solution
Following Definition 4.1, we need f(x) in order to compute E(X). Let the random variable X represent the number of good components in the sample. The probability distribution of X can be calculated by counting the sample points for each value of x (viz. 0, 1, 2 or 3):

f(x) = C(4, x) × C(3, 3 − x) / C(7, 3),  x = 0, 1, 2, 3
Example 4.2 (continued)
Simple calculations yield: f(0) = 1/35, f(1) = 12/35, f(2) = 18/35 and f(3) = 4/35.
Following Definition 4.1, we can calculate:

μ = E(X) = (0)(1/35) + (1)(12/35) + (2)(18/35) + (3)(4/35) = 12/7 ≈ 1.7

Thus, if a sample of 3 components is selected at random over and over again (an infinite number of times) from a lot of 4 good and 3 defective components, with the sampled components returned to the lot after each draw, the average number of good components in the sample would be 1.7.
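Definition 4.1 applied to the hypergeometric f(x) above can be sketched in a few lines, using exact fractions:

```python
from fractions import Fraction
from math import comb

# f(x) = C(4, x) * C(3, 3 - x) / C(7, 3) for x = 0, 1, 2, 3 (Example 4.2)
f = {x: Fraction(comb(4, x) * comb(3, 3 - x), comb(7, 3)) for x in range(4)}

# The probabilities sum to 1, and mu = E(X) = sum of x * f(x)
mu = sum(x * p for x, p in f.items())
print(mu)  # 12/7, approximately 1.714
```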


4.2 Mean of a function of random variables

Let X be a continuous random variable. If we know f(x), the probability density function of X, we can find its mean, μ = E(X) = ∫ x f(x) dx.

Now let g(X) be a function of X. If X is a random variable, then so is g(X).
So what about the mean of g(X)?


Theorem 4.1
Let X be a random variable with probability distribution f(x), and let g(X) be a function of X. The mean, or expected value, of g(X) is
E[g(X)] = Σ g(x) f(x) (summing over all x) if X is discrete, and
E[g(X)] = ∫ g(x) f(x) dx if X is continuous.

N.B. E[g(X)] is not, in general, the same as g[E(X)].


Example 4.3
Let X be a random variable with density function:

f(x) = x²/3,  −1 < x < 2
f(x) = 0,   elsewhere.

Find the expected value of g(X) = 4X + 3.
Solution
By Theorem 4.1, we have:

E(4X + 3) = ∫ from −1 to 2 of (4x + 3)(x²/3) dx = (1/3)[x⁴ + x³] evaluated from −1 to 2 = 8
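The integral in Example 4.3 can be checked numerically; a minimal sketch using a midpoint rule (the step count is an arbitrary choice):

```python
def expect_g(g, f, a, b, n=200_000):
    """Approximate E[g(X)] = integral of g(x) f(x) dx over (a, b) by midpoints."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) * f(a + (i + 0.5) * h) for i in range(n)) * h

# f(x) = x^2 / 3 on (-1, 2); g(x) = 4x + 3
E = expect_g(lambda x: 4 * x + 3, lambda x: x * x / 3, -1.0, 2.0)
print(round(E, 4))  # 8.0
```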
Now, we extend the definition to functions of two random variables, i.e. g(X, Y), where X
and Y have a joint probability function f(x, y).
Definition 4.2
Let X and Y be random variables with joint probability distribution f(x, y). The mean, or expected value, of g(X, Y) is
E[g(X, Y)] = Σ Σ g(x, y) f(x, y) (summing over all x and y) if X and Y are discrete, and
E[g(X, Y)] = ∫∫ g(x, y) f(x, y) dx dy if X and Y are continuous.


Example 4.4
Recall the joint PDF from Example 3.5 in Lecture 3.
We can calculate, for example, the expected value of g(X, Y) = XY using Definition 4.2.



Example 4.4 (continued)
Solution:

E(XY) = (0)(0)f(0,0) + (0)(1)f(0,1) + (1)(0)f(1,0) + (1)(1)f(1,1) + (2)(0)f(2,0) + (0)(2)f(0,2)

= f(1,1) = 3/14
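The same sum can be sketched in code. The joint table below is reconstructed to be consistent with the marginals and the value E(XY) = 3/14 quoted in these slides (an assumption, since Example 3.5 itself is not reproduced here):

```python
from fractions import Fraction as F

# Joint pmf f(x, y), reconstructed from the quoted marginals (assumption)
f = {(0, 0): F(3, 28), (0, 1): F(6, 28), (0, 2): F(1, 28),
     (1, 0): F(9, 28), (1, 1): F(6, 28), (2, 0): F(3, 28)}

# E(XY) = sum over (x, y) of x * y * f(x, y); only the (1, 1) term is non-zero
E_XY = sum(x * y * p for (x, y), p in f.items())
print(E_XY)  # 3/14
```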



4.3 Variance of random variables
In Lecture 1, we briefly touched upon the concept of sample variance, which is a
measure of variability/spread of the sample.
When applied to random variables, variance refers to the variability/spread of the probability distribution of the random variable, i.e. the population variance.
This is particularly useful in distinguishing between two distributions having the
same mean/expected value.



Distributions with equal means and unequal
variances



Definition 4.3
The variance of the random variable X, denoted Var(X), σX², or simply σ², is calculated by applying Theorem 4.1 to g(X) = (X − μ)²:

σ² = E[(X − μ)²]

(X − μ) is called the error of the random variable.


Example 4.5
Compare the variances of the two probability distributions below:

A:
x     1    2    3
f(x)  0.3  0.4  0.3

B:
x     0    1    2    3    4
f(x)  0.2  0.1  0.3  0.3  0.1
Solution
To find the variances, we first need the means:

μA = E(X) = (1)(0.3) + (2)(0.4) + (3)(0.3) = 2.0

μB = E(X) = (0)(0.2) + (1)(0.1) + (2)(0.3) + (3)(0.3) + (4)(0.1) = 2.0
Example 4.5 (continued)
Then, calculate the variances according to Definition 4.3.
σA² = Σ (x − 2)² f(x), summing over x = 1 to 3
= (1 − 2)²(0.3) + (2 − 2)²(0.4) + (3 − 2)²(0.3) = 0.6

σB² = Σ (x − 2)² f(x), summing over x = 0 to 4
= (0 − 2)²(0.2) + (1 − 2)²(0.1) + (2 − 2)²(0.3) + (3 − 2)²(0.3) + (4 − 2)²(0.1) = 1.6

∴ σB² > σA²
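The two variances can be checked with a few lines; a minimal sketch:

```python
# Distributions A and B from Example 4.5 (both have mean 2.0)
A = {1: 0.3, 2: 0.4, 3: 0.3}
B = {0: 0.2, 1: 0.1, 2: 0.3, 3: 0.3, 4: 0.1}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

print(variance(A))  # approximately 0.6
print(variance(B))  # approximately 1.6
```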



Useful identity 4.1



Useful identity 4.2

σ² = E(X²) − μ²


Summary

Discrete random variables:
μ = E(X) = Σ x f(x)
σ² = E[(X − μ)²] = Σ (x − μ)² f(x)

Continuous random variables:
μ = E(X) = ∫ x f(x) dx
σ² = E[(X − μ)²] = ∫ (x − μ)² f(x) dx




4.4 Covariance

E[(X − μX)(Y − μY)] is the covariance of X and Y, denoted σXY or Cov(X, Y).

N.B.
Variance = mean of the squared error of one random variable
Covariance = mean of the product of errors of two random variables
Variance is a measure of variability/spread of one random variable
Covariance is a measure of the correlation between the variabilities of two random variables



Definition 4.4
Let X and Y be random variables with joint probability distribution f(x, y). The covariance of X and Y is
σXY = E[(X − μX)(Y − μY)] = Σ Σ (x − μX)(y − μY) f(x, y) for discrete X and Y, and
σXY = ∫∫ (x − μX)(y − μY) f(x, y) dx dy for continuous X and Y.


Significance of covariance
Positive covariance: if X is large, it is more likely that Y is also large. The two are positively correlated.

Negative covariance: if X is large, it is more likely that Y is small. The two are negatively correlated.

If X and Y are statistically independent, the covariance is zero. However, the converse is not true: zero covariance does not imply independence.



Useful identity 4.3

σXY = E(XY) − μX μY

This is true for both discrete and continuous random variables.



Example 4.6
Let us again recall the joint probability distribution in Example 3.5.
In Example 3.10, we showed that the random variables X and Y are not statistically independent, i.e. they are somewhat correlated. This correlation can be described by Cov(X, Y).



Example 4.6 (continued)
Recall from Example 4.4 that E(XY) = 3/14. Now

μX = Σ x g(x) = (0)(5/14) + (1)(15/28) + (2)(3/28) = 3/4 (summing over x = 0, 1, 2)

μY = Σ y h(y) = (0)(15/28) + (1)(3/7) + (2)(1/28) = 1/2 (summing over y = 0, 1, 2)



Example 4.6 (continued)
Using Useful Identity 4.3, we can therefore calculate:

σXY = E(XY) − μX μY = 3/14 − (3/4)(1/2) = −9/56

The negative sign of σXY implies X and Y are negatively correlated.

A high X is likely to come with a low Y, and a high Y with a low X; that is, X and Y are negatively correlated.



Correlation coefficient
Definition 4.4, in the form

σXY = E(XY) − μX μY,

suggests that the magnitude of the covariance depends on the magnitudes of X and Y.
In science and engineering, we often prefer a dimensionless quantity. The dimensionless version of the covariance is called the correlation coefficient.



Definition 4.5
Let X and Y be random variables with covariance σXY and standard deviations σX and σY. The correlation coefficient of X and Y is

ρXY = σXY / (σX σY)

where −1 ≤ ρXY ≤ 1.

When ρXY = 0, there is no linear dependency between X and Y.

If the two variables are linearly dependent, e.g. Y = aX + b, then ρXY = 1 if a > 0 and ρXY = −1 if a < 0.
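The ρXY = ±1 claim for linear dependence can be spot-checked numerically; a sketch using the uniform X of Example 4.1, with illustrative coefficients a = −2, b = 7:

```python
# X uniform on 1..10 with f(x) = 0.1; Y = a*X + b with a < 0 (illustrative)
xs = list(range(1, 11))
p = 0.1
a, b = -2.0, 7.0

mu_X = sum(x * p for x in xs)
mu_Y = a * mu_X + b
var_X = sum((x - mu_X) ** 2 * p for x in xs)
var_Y = a * a * var_X                        # Var(aX + b) = a^2 Var(X)
cov = sum((x - mu_X) * (a * x + b - mu_Y) * p for x in xs)

rho = cov / (var_X * var_Y) ** 0.5
print(round(rho, 6))  # -1.0, as expected for a < 0
```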



Example 4.7
Continue from Example 4.6, let’s find the correlation coefficient between
X and Y.

E(X²) = (0²)(5/14) + (1²)(15/28) + (2²)(3/28) = 27/28

E(Y²) = (0²)(15/28) + (1²)(3/7) + (2²)(1/28) = 4/7



Example 4.7 (continued)
Using Useful Identity 4.2, we can calculate the variances of X and Y:

σX² = E(X²) − μX² = 27/28 − (3/4)² = 45/112,  so σX = √(45/112)

σY² = E(Y²) − μY² = 4/7 − (1/2)² = 9/28,  so σY = √(9/28)

By Definition 4.5,

ρXY = σXY / (σX σY) = (−9/56) / [√(45/112) × √(9/28)] = −1/√5 ≈ −0.447
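Examples 4.6 and 4.7 can be reproduced end to end; the joint table is reconstructed to match the marginals quoted in Example 4.6 (an assumption, since Example 3.5 is not reproduced here):

```python
from fractions import Fraction as F
from math import sqrt

# Joint pmf f(x, y), reconstructed from the quoted marginals (assumption)
f = {(0, 0): F(3, 28), (0, 1): F(6, 28), (0, 2): F(1, 28),
     (1, 0): F(9, 28), (1, 1): F(6, 28), (2, 0): F(3, 28)}

mu_X = sum(x * p for (x, y), p in f.items())                    # 3/4
mu_Y = sum(y * p for (x, y), p in f.items())                    # 1/2
cov = sum(x * y * p for (x, y), p in f.items()) - mu_X * mu_Y   # -9/56
var_X = sum(x * x * p for (x, y), p in f.items()) - mu_X ** 2   # 45/112
var_Y = sum(y * y * p for (x, y), p in f.items()) - mu_Y ** 2   # 9/28

rho = float(cov) / sqrt(float(var_X) * float(var_Y))
print(rho)  # approximately -0.4472, i.e. -1/sqrt(5)
```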
4.5 Means and variances of linear combinations of random
variables

The means and variances of linear functions of random variables can be calculated conveniently from the means and variances of the individual random variables, using various shortcut formulae.

N.B. Linear means that in the function, the combined power of all variables = 1. For
example:
g1(x) = 3x − 1 is linear
g2(x) = x² + 1 is non-linear
g3(x) = log(x) + 2 is non-linear

h1(x, y) = 2x + 6y − 10 is linear
h2(x, y) = 4xy is non-linear
h3(x, y) = 5xy is non-linear



Some useful formulae

E(aX) = aE(X)

E(X + b) = E(X) + b

E(aX + b) = aE(X) + b

Var(aX) = a2Var(X)

Var(X + b) = Var(X)

Cov(aX,bY) = abCov(X,Y)
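These identities can be spot-checked against distribution A from Example 4.5 (the constants a and b are arbitrary choices for the check):

```python
# Distribution A from Example 4.5; a and b are arbitrary constants
A = {1: 0.3, 2: 0.4, 3: 0.3}
a, b = 3.0, -1.0

E_X = sum(x * p for x, p in A.items())
Var_X = sum((x - E_X) ** 2 * p for x, p in A.items())

# Transform pointwise, then compare with the shortcut formulae
E_aXb = sum((a * x + b) * p for x, p in A.items())
Var_aX = sum((a * x - a * E_X) ** 2 * p for x, p in A.items())

print(abs(E_aXb - (a * E_X + b)) < 1e-9)   # True: E(aX + b) = aE(X) + b
print(abs(Var_aX - a * a * Var_X) < 1e-9)  # True: Var(aX) = a^2 Var(X)
```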
Summary

For discrete random variables

For continuous random variables

