Mathematical Expectation
Figure 1: Discrete vs. continuous r.v.s. Left: The CDF of a discrete r.v. has
jumps at each point in the support. Right: The CDF of a continuous r.v.
increases smoothly.
Theorem 2.
LOTUS, continuous: If X is a continuous r.v. with PDF f and g is
a function from R to R, then
\[ E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x) \, dx \]
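As a rough numerical illustration of LOTUS, the sketch below approximates the integral of g(x)f(x) with a midpoint rule. The helper name `lotus_expectation`, the choice g(x) = x^2, and the Unif(0, 1) PDF are assumptions made for this example only.

```python
def lotus_expectation(g, pdf, lo, hi, n=100_000):
    """Approximate E(g(X)) = integral of g(x) * pdf(x) over [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += g(x) * pdf(x)
    return total * h


def unif01_pdf(x):
    """PDF of Unif(0, 1); only evaluated inside (0, 1) here."""
    return 1.0


def square(x):
    return x * x


# Assumed example: X ~ Unif(0, 1) and g(x) = x^2, so E(g(X)) should be near 1/3.
second_moment = lotus_expectation(square, unif01_pdf, 0.0, 1.0)
print(second_moment)
```

Note that the PDF of g(X) itself is never needed: only g and the PDF of X enter the computation.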
Uniform distribution
• Uniform distribution: A continuous r.v. U is said to have the
Uniform distribution on the interval (a, b) if its PDF is:
\[ f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a < x < b, \\ 0 & \text{otherwise.} \end{cases} \]
For a general Unif(a, b) distribution, the PDF is constant on (a, b), and
the CDF is ramp-shaped, increasing linearly from 0 to 1 as x ranges from
a to b.
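The constant PDF and ramp-shaped CDF can be written down directly. This is a minimal sketch; the function names `unif_pdf` and `unif_cdf` are hypothetical, and the CDF formula (x − a)/(b − a) on (a, b) follows from integrating the constant PDF.

```python
def unif_pdf(x, a, b):
    """PDF of Unif(a, b): constant 1/(b - a) on (a, b), zero elsewhere."""
    return 1.0 / (b - a) if a < x < b else 0.0


def unif_cdf(x, a, b):
    """CDF of Unif(a, b): 0 left of a, a linear ramp on (a, b), 1 right of b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Example with a = 2, b = 6: the PDF is 1/4 on (2, 6) and the CDF at the midpoint is 1/2.
print(unif_pdf(3.0, 2.0, 6.0), unif_cdf(4.0, 2.0, 6.0))
```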
• Location-scale transformation: Let X be an r.v. and
Y = σX + µ, where σ and µ are constants with σ > 0. Then
we say that Y has been obtained as a location-scale
transformation of X. Here µ controls how the location is
changed and σ controls how the scale is changed.
• Uniformity is preserved under location-scale transformations:
if X ∼ Unif(a, b) and Y = cX + d, where c and d are constants
with c > 0, then Y is a linear function of X and
Y ∼ Unif(ca + d, cb + d).
• In studying Uniform distributions, a useful strategy is to start
with an r.v. that has the simplest Uniform distribution, figure
things out in the friendly simple case, and then use a
location-scale transformation to handle the general case.
• The location-scale strategy says to start with U ∼ Unif(0, 1).
\[ E(U) = \int_0^1 x \, dx = \frac{1}{2} \]
\[ E(U^2) = \int_0^1 x^2 \, dx = \frac{1}{3} \]
\[ \mathrm{Var}(U) = E(U^2) - (E(U))^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \]
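The arithmetic for Unif(0, 1) can be checked exactly with rational numbers. The sketch below uses the fact that the integral of x^k over [0, 1] equals 1/(k + 1); the helper name `unif01_moment` is hypothetical.

```python
from fractions import Fraction


def unif01_moment(k):
    """E(U^k) for U ~ Unif(0, 1): the integral of x^k over [0, 1] is 1/(k + 1)."""
    return Fraction(1, k + 1)


mean_u = unif01_moment(1)
second_moment_u = unif01_moment(2)
var_u = second_moment_u - mean_u ** 2  # Var(U) = E(U^2) - (E(U))^2

print(mean_u, second_moment_u, var_u)  # 1/2 1/3 1/12
```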
• First we change the support from an interval of length 1 to an interval of
length b − a, so we multiply U by the scaling factor b − a to obtain a
Unif(0, b − a) r.v. Then we shift everything until the left endpoint of the
support is at a. Thus, if U ∼ Unif(0, 1), the random variable
\[ \tilde{U} = (b - a)U + a \]
is distributed Unif(a, b).
• By linearity of expectation,
\[ E(\tilde{U}) = E(a + (b-a)U) = a + (b-a)E(U) = a + \frac{b-a}{2} = \frac{a+b}{2}. \]
By the fact that additive constants don’t affect the variance while
multiplicative constants come out squared,
\[ \mathrm{Var}(\tilde{U}) = \mathrm{Var}(a + (b-a)U) = \mathrm{Var}((b-a)U) = (b-a)^2 \mathrm{Var}(U) = \frac{(b-a)^2}{12}. \]
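A quick simulation can make the location-scale construction concrete. The sketch below draws U from Unif(0, 1), forms (b − a)U + a, and compares the sample mean and variance with (a + b)/2 and (b − a)^2/12. The endpoints a = 2, b = 6, the seed, and the sample size are arbitrary choices for illustration.

```python
import random
import statistics


def sample_unif(a, b, n, seed=0):
    """Draw n values of (b - a) * U + a, where U ~ Unif(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [(b - a) * rng.random() + a for _ in range(n)]


a, b = 2.0, 6.0
draws = sample_unif(a, b, 100_000)

sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)

print("mean:", sample_mean, "theory:", (a + b) / 2)
print("variance:", sample_var, "theory:", (b - a) ** 2 / 12)
```

The sample values should agree with the formulas only approximately, with the gap shrinking as the number of draws grows.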
Normal distribution
• (Standard Normal distribution). A continuous r.v. Z is said to
have the standard Normal distribution if its PDF ϕ is given by
\[ \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty. \]
• We write this as Z ∼ N(0, 1) since, as we will show, Z has
mean 0 and variance 1.
• The constant 1/√(2π) in front of the PDF is needed to make the
PDF integrate to 1. Such constants are called normalizing
constants because they normalize the total area under the
PDF to 1.
• The standard Normal CDF Φ is the accumulated area under
the PDF:
\[ \Phi(z) = \int_{-\infty}^{z} \phi(t) \, dt = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt. \]
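Φ has no elementary closed form, but it can be evaluated through the error function using the standard identity Φ(z) = (1 + erf(z/√2))/2. The sketch below compares that with a direct midpoint-rule integration of the PDF. The function names, the lower cutoff of −10 standing in for −∞, and the grid size are assumptions of this example.

```python
import math


def std_normal_pdf(z):
    """Standard Normal PDF: exp(-z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard Normal CDF via the error function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_by_integration(z, lower=-10.0, n=20_000):
    """Midpoint-rule approximation of the area under the PDF from `lower` to z."""
    h = (z - lower) / n
    return h * sum(std_normal_pdf(lower + (i + 0.5) * h) for i in range(n))


# The two columns should agree to several decimal places.
for z in (-1.0, 0.0, 1.96):
    print(z, std_normal_cdf(z), cdf_by_integration(z))
```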
Figure 3: Standard Normal PDF ϕ (left) and CDF Φ (right).
Important symmetry properties
1. Symmetry of PDF: ϕ satisfies ϕ(z) = ϕ(−z), i.e., ϕ is an even function.
2. Symmetry of tail areas: For example, the area under the PDF curve to
the left of −2, which is P(Z ≤ −2) = Φ(−2) by definition, equals the area
to the right of 2, which is P(Z ≥ 2) = 1 − Φ(2). In general, we have
Φ(z) = 1 − Φ(−z)
for all z. This can be seen visually by looking at the PDF curve, and
mathematically by substituting u = −t below, using the symmetry of ϕ
and the fact that PDFs integrate to 1:
\[ \Phi(-z) = \int_{-\infty}^{-z} \phi(t) \, dt = \int_{z}^{\infty} \phi(u) \, du = 1 - \int_{-\infty}^{z} \phi(u) \, du = 1 - \Phi(z). \]
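Both symmetry properties can be spot-checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library as the standard Normal; the test points are arbitrary.

```python
from statistics import NormalDist

Z = NormalDist()  # mean 0, standard deviation 1

points = (0.5, 1.0, 2.0)

# Property 1: the PDF takes the same value at z and -z.
pdf_gap = max(abs(Z.pdf(z) - Z.pdf(-z)) for z in points)

# Property 2: the left tail area Phi(-z) matches the right tail area 1 - Phi(z).
tail_gap = max(abs(Z.cdf(-z) - (1.0 - Z.cdf(z))) for z in points)

print("largest PDF gap:", pdf_gap)
print("largest tail-area gap:", tail_gap)
```

Both gaps should be zero up to floating-point rounding.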
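For X = µ + σZ with Z ∼ N(0, 1) and σ > 0, written X ∼ N(µ, σ²),
the mean and variance are
\[ E(\mu + \sigma Z) = E(\mu) + \sigma E(Z) = \mu, \]
\[ \mathrm{Var}(\mu + \sigma Z) = \mathrm{Var}(\sigma Z) = \sigma^2 \mathrm{Var}(Z) = \sigma^2. \]
Standardizing X recovers the standard Normal:
\[ \frac{X - \mu}{\sigma} \sim N(0, 1). \]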
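The mean, variance, and standardization formulas can be illustrated by simulation. In this sketch, µ = 3, σ = 2, the seed, and the sample size are arbitrary illustrative choices; `random.gauss` supplies the standard Normal draws.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility
mu, sigma = 3.0, 2.0

z_draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # Z ~ N(0, 1)
x_draws = [mu + sigma * z for z in z_draws]              # X = mu + sigma * Z
standardized = [(x - mu) / sigma for x in x_draws]       # (X - mu) / sigma

mean_x = statistics.fmean(x_draws)
var_x = statistics.pvariance(x_draws)
mean_std = statistics.fmean(standardized)
var_std = statistics.pvariance(standardized)

print("X:", mean_x, var_x, "theory:", mu, sigma ** 2)
print("standardized:", mean_std, var_std, "theory:", 0.0, 1.0)
```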
Theorem 3.
Normal CDF and PDF: Let X ∼ N(µ, σ²). Then the CDF of X is
\[ F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right), \]
and the PDF of X is
\[ f(x) = \phi\left(\frac{x - \mu}{\sigma}\right) \frac{1}{\sigma}. \]
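• Proof: For the CDF, we start from the definition F(x) = P(X ≤ x),
standardize, and use the CDF of the standard Normal:
\[ F(x) = P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right). \]
Then we differentiate to get the PDF, remembering to apply the chain rule:
\[ f(x) = \frac{d}{dx} \Phi\left(\frac{x - \mu}{\sigma}\right) = \phi\left(\frac{x - \mu}{\sigma}\right) \frac{1}{\sigma}. \]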
We can also write out the PDF as
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right). \]
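The formulas of Theorem 3 translate directly into code: standardize, then call the standard Normal CDF or PDF. The sketch below compares the result with `statistics.NormalDist(mu, sigma)` from the standard library. The function names and the parameter values µ = 1, σ = 2.5 are assumptions of this example.

```python
import math
from statistics import NormalDist


def std_normal_pdf(z):
    """Standard Normal PDF."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard Normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """F(x) = Phi((x - mu) / sigma)."""
    return std_normal_cdf((x - mu) / sigma)


def normal_pdf(x, mu, sigma):
    """f(x) = phi((x - mu) / sigma) / sigma; the 1/sigma comes from the chain rule."""
    return std_normal_pdf((x - mu) / sigma) / sigma


mu, sigma = 1.0, 2.5
reference = NormalDist(mu, sigma)
for x in (-2.0, 1.0, 4.0):
    print(x, normal_cdf(x, mu, sigma), reference.cdf(x),
          normal_pdf(x, mu, sigma), reference.pdf(x))
```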
Exponential distribution
\[ \mathrm{Var}(X) = E(X^2) - (EX)^2 = 1 \]
• For Y = X/λ ∼ Expo(λ) we then have
\[ E(Y) = \frac{1}{\lambda} E(X) = \frac{1}{\lambda}, \]
\[ \mathrm{Var}(Y) = \frac{1}{\lambda^2} \mathrm{Var}(X) = \frac{1}{\lambda^2}, \]
so the mean and variance of the Expo(λ) distribution are 1/λ and 1/λ²,
respectively.
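The scaling argument can be checked by simulation. The sketch below assumes X ∼ Expo(1), which is what the scaling Y = X/λ ∼ Expo(λ) implies, draws X with `random.expovariate(1.0)`, and compares the sample mean and variance of Y with 1/λ and 1/λ². The value λ = 4, the seed, and the sample size are arbitrary.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed for reproducibility
lam = 4.0

x_draws = [rng.expovariate(1.0) for _ in range(100_000)]  # X ~ Expo(1), by assumption
y_draws = [x / lam for x in x_draws]                      # Y = X / lambda

mean_y = statistics.fmean(y_draws)
var_y = statistics.pvariance(y_draws)

print("mean:", mean_y, "theory:", 1.0 / lam)
print("variance:", var_y, "theory:", 1.0 / lam ** 2)
```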