Mathematical Expectation


Lecture 5

Continuous random variables


▶ So far we have been working with discrete random variables, whose possible values can be written down as a list. In this lecture we will discuss continuous r.v.s, which can take on any real value in an interval (possibly of infinite length, such as (0, ∞) or the entire real line).

Figure 1: Discrete vs. continuous r.v.s. Left: The CDF of a discrete r.v. has
jumps at each point in the support. Right: The CDF of a continuous r.v.
increases smoothly.

▶ Continuous r.v.: A random variable X is continuous if its CDF F_X(x) is a continuous function of x.
▶ Probability density function: For a continuous r.v. X with CDF F, the probability density function (PDF) of X is the derivative f of the CDF, given by f(x) = F′(x). The support of X, and of its distribution, is the set of all x where f(x) > 0.
▶ An important way in which continuous r.v.s differ from discrete r.v.s is that for a continuous r.v. X, P(X = x) = 0 for all x. This is because P(X = x) is the height of a jump in the CDF at x, but the CDF of X has no jumps!
▶ The PDF is analogous to the PMF in many ways, but there is a key difference: for a PDF f, the quantity f(x) is not a probability, and in fact it is possible to have f(x) > 1 for some values of x. To obtain a probability, we need to integrate the PDF.
▶ PDF to CDF: Let X be a continuous r.v. with PDF f. Then the CDF of X is given by

F(x) = ∫_{−∞}^{x} f(t) dt.
▶ The above result is analogous to how we obtained the value of a discrete CDF at x by summing the PMF over all values less than or equal to x; here we integrate the PDF over all values up to x, so the CDF is the accumulated area under the PDF. Since we can freely convert between the PDF and the CDF using the inverse operations of integration and differentiation, both the PDF and the CDF carry complete information about the distribution of a continuous r.v.
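The PDF-to-CDF direction is easy to check numerically. A minimal sketch, using a hypothetical PDF f(x) = 2x on (0, 1) (not from the lecture), whose CDF is F(x) = x²:

```python
def pdf(x):
    # Hypothetical PDF f(x) = 2x on (0, 1); its CDF is F(x) = x^2.
    return 2 * x if 0 < x < 1 else 0.0

def cdf_from_pdf(x, steps=10_000):
    # F(x) = integral of f(t) dt from -infinity to x; the support
    # starts at 0, so integrating from 0 suffices (midpoint rule).
    h = x / steps
    return sum(pdf((i + 0.5) * h) * h for i in range(steps))

print(cdf_from_pdf(0.5))  # ≈ 0.25 = F(0.5)
```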
▶ Since the PDF determines the distribution, we should be able to use it to find the probability of X falling into an interval (a, b). A handy fact is that we can include or exclude the endpoints as we wish without altering the probability, since the endpoints have probability 0:

P(a < X < b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a ≤ X ≤ b),

P(a < X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(x) dx.

▶ Therefore, to find the probability of X falling in the interval (a, b] (or (a, b), [a, b), or [a, b]) using the PDF, we simply integrate the PDF from a to b. In general, for an arbitrary region A ⊆ R,

P(X ∈ A) = ∫_{A} f(x) dx.
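As a sketch of the interval formula, take a hypothetical density f(x) = 2x on (0, 1), so F(x) = x²; all four interval probabilities reduce to F(b) − F(a):

```python
def F(x):
    # CDF of the hypothetical PDF f(x) = 2x on (0, 1): F(x) = x^2.
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x * x

def prob_interval(a, b):
    # P(a < X <= b) = F(b) - F(a); since endpoints have probability 0,
    # the same value works for (a, b), [a, b), and [a, b].
    return F(b) - F(a)

print(prob_interval(0.2, 0.5))  # ≈ 0.25 - 0.04 = 0.21
```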
Theorem 1.
Valid PDFs: The PDF f of a continuous r.v. must satisfy the following two criteria:
▶ Nonnegative: f(x) ≥ 0;
▶ Integrates to 1: ∫_{−∞}^{∞} f(x) dx = 1.
▶ Proof: The first criterion is true because probability is nonnegative; if f(x₀) were negative, then we could integrate over a tiny region around x₀ and get a negative probability. Alternatively, note that the PDF at x₀ is the slope of the CDF at x₀, so f(x₀) < 0 would imply that the CDF is decreasing at x₀, which is not allowed. The second criterion is true since ∫_{−∞}^{∞} f(x) dx is the probability of X falling somewhere on the real line, which is 1.
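Both criteria can be checked numerically for a candidate density on a bounded support; a sketch (the two candidate densities below are hypothetical examples):

```python
import math

def is_valid_pdf(f, lo, hi, steps=100_000):
    # Check nonnegativity and that the total area is (approximately) 1
    # on a bounded support [lo, hi], using the midpoint rule.
    h = (hi - lo) / steps
    area = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        if f(x) < 0:
            return False  # violates the nonnegativity criterion
        area += f(x) * h
    return math.isclose(area, 1.0, rel_tol=1e-4)

print(is_valid_pdf(lambda x: 3 * x ** 2, 0, 1))   # valid PDF on (0, 1)
print(is_valid_pdf(lambda x: 2 * x - 0.5, 0, 1))  # negative near 0: invalid
```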
▶ Expectation of a continuous r.v.: The expected value (also called the expectation or mean) of a continuous r.v. X with PDF f is

E(X) = ∫_{−∞}^{∞} x f(x) dx.
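The expectation integral can likewise be approximated numerically; a sketch with the hypothetical PDF f(x) = 2x on (0, 1), for which E(X) = ∫ x · 2x dx = 2/3:

```python
def expectation(f, lo, hi, steps=100_000):
    # E(X) = integral of x * f(x) dx over the support [lo, hi],
    # approximated with the midpoint rule.
    h = (hi - lo) / steps
    return sum((lo + (i + 0.5) * h) * f(lo + (i + 0.5) * h) * h
               for i in range(steps))

print(expectation(lambda x: 2 * x, 0, 1))  # ≈ 2/3
```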

Theorem 2.
LOTUS, continuous: If X is a continuous r.v. with PDF f and g is a function from R to R, then

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.
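LOTUS says we can average g(x) against the PDF of X directly, without finding the distribution of g(X). A numerical sketch with the hypothetical f(x) = 2x on (0, 1) and g(x) = x², so E(g(X)) = ∫ x² · 2x dx = 1/2:

```python
def lotus(g, f, lo, hi, steps=100_000):
    # E(g(X)) = integral of g(x) * f(x) dx -- integrate against the
    # PDF of X itself; no need for the distribution of g(X).
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) * f(lo + (i + 0.5) * h) * h
               for i in range(steps))

print(lotus(lambda x: x ** 2, lambda x: 2 * x, 0, 1))  # ≈ 0.5
```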
Uniform distribution
▶ Uniform distribution: A continuous r.v. U is said to have the Uniform distribution on the interval (a, b) if its PDF is

f(x) = 1/(b − a) for a < x < b, and f(x) = 0 otherwise.

We denote this by U ∼ Unif(a, b).
▶ This is a valid PDF because the area under the curve is just the area of a rectangle with width b − a and height 1/(b − a).
▶ CDF:

F(x) = 0 for x ≤ a,  F(x) = (x − a)/(b − a) for a < x < b,  F(x) = 1 for x ≥ b.
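The Unif(a, b) PDF and CDF translate directly into code; a small sketch:

```python
def unif_pdf(x, a, b):
    # Constant 1/(b - a) on (a, b), zero elsewhere.
    return 1 / (b - a) if a < x < b else 0.0

def unif_cdf(x, a, b):
    # Ramp-shaped: 0 below a, linear on (a, b), 1 above b.
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

print(unif_pdf(3, 2, 6))  # 1/(6 - 2) = 0.25
print(unif_cdf(3, 2, 6))  # (3 - 2)/(6 - 2) = 0.25
```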

Uniform distribution (contd.)
The Uniform distribution that we will most frequently use is the
Unif (0, 1) distribution, also called the standard Uniform. The Unif (0, 1)
PDF and CDF are particularly simple: f (x) = 1 and F (x) = x for
0 < x < 1.

Figure 2: Unif(0, 1) PDF and CDF.

For a general Unif (a, b) distribution, the PDF is constant on (a, b), and
the CDF is ramp-shaped, increasing linearly from 0 to 1 as x ranges from
a to b.
▶ Location-scale transformation: Let X be an r.v. and Y = σX + µ, where σ and µ are constants with σ > 0. Then we say that Y has been obtained as a location-scale transformation of X. Here µ controls how the location is changed and σ controls how the scale is changed.
▶ In a location-scale transformation, starting with X ∼ Unif(a, b) and transforming it to Y = cX + d where c and d are constants with c > 0, Y is a linear function of X and Uniformity is preserved: Y ∼ Unif(ca + d, cb + d).
▶ In studying Uniform distributions, a useful strategy is to start with an r.v. that has the simplest Uniform distribution, figure things out in the friendly simple case, and then use a location-scale transformation to handle the general case.
▶ The location-scale strategy says to start with U ∼ Unif(0, 1). Then

E(U) = ∫_{0}^{1} x dx = 1/2,

E(U²) = ∫_{0}^{1} x² dx = 1/3,

Var(U) = E(U²) − (E(U))² = 1/3 − 1/4 = 1/12.
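These moments can be sanity-checked by simulation; a sketch using Python's random module with a fixed seed for reproducibility:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
n = 100_000
samples = [random.random() for _ in range(n)]  # draws from Unif(0, 1)

mean = sum(samples) / n
var = sum((u - mean) ** 2 for u in samples) / n

print(mean)  # close to E(U) = 1/2
print(var)   # close to Var(U) = 1/12 ≈ 0.0833
```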
▶ First we change the support from an interval of length 1 to an interval of length b − a, so we multiply U by the scaling factor b − a to obtain a Unif(0, b − a) r.v. Then we shift everything until the left endpoint of the support is at a. Thus, if U ∼ Unif(0, 1), the random variable

Ũ = (b − a)U + a

is distributed Unif(a, b).
▶ By linearity of expectation,

E(Ũ) = E(a + (b − a)U) = a + (b − a)E(U) = a + (b − a)/2 = (a + b)/2.

By the fact that additive constants don't affect the variance while multiplicative constants come out squared,

Var(Ũ) = Var(a + (b − a)U) = Var((b − a)U) = (b − a)² Var(U) = (b − a)²/12.
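A simulation sketch of this location-scale result, turning Unif(0, 1) draws into Unif(a, b) draws and checking the mean and variance formulas (a = 2, b = 6 are arbitrary choices):

```python
import random

random.seed(7)  # fixed seed for reproducibility
a, b = 2.0, 6.0
n = 100_000
# If U ~ Unif(0, 1), then (b - a)*U + a ~ Unif(a, b).
ys = [(b - a) * random.random() + a for _ in range(n)]

mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n

print(mean)  # close to (a + b)/2 = 4
print(var)   # close to (b - a)^2 / 12 = 16/12 ≈ 1.33
```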
Normal distribution
▶ (Standard Normal distribution). A continuous r.v. Z is said to have the standard Normal distribution if its PDF ϕ is given by

ϕ(z) = (1/√(2π)) e^{−z²/2},  −∞ < z < ∞.

▶ We write this as Z ∼ N(0, 1) since, as we will show, Z has mean 0 and variance 1.
▶ The constant 1/√(2π) in front of the PDF is needed to make the PDF integrate to 1. Such constants are called normalizing constants because they normalize the total area under the PDF to 1.
▶ The standard Normal CDF Φ is the accumulated area under the PDF:

Φ(z) = ∫_{−∞}^{z} ϕ(t) dt = ∫_{−∞}^{z} (1/√(2π)) e^{−t²/2} dt.

Figure 3: Standard Normal PDF ϕ (left) and CDF Φ (right).
Important symmetry properties
1. Symmetry of PDF: ϕ satisfies ϕ(z) = ϕ(−z).
2. Symmetry of tail areas: For example, the area under the PDF curve to the left of −2, which is P(Z ≤ −2) = Φ(−2) by definition, equals the area to the right of 2, which is P(Z ≥ 2) = 1 − Φ(2). In general, we have

Φ(z) = 1 − Φ(−z)

for all z. This can be seen visually by looking at the PDF curve, and mathematically by substituting u = −t below and using the fact that PDFs integrate to 1:

Φ(−z) = ∫_{−∞}^{−z} ϕ(t) dt = ∫_{z}^{∞} ϕ(u) du = 1 − ∫_{−∞}^{z} ϕ(u) du = 1 − Φ(z).
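The tail-area symmetry Φ(z) = 1 − Φ(−z) can be checked with Python's statistics.NormalDist; a sketch:

```python
from statistics import NormalDist

Phi = NormalDist(0, 1).cdf  # standard Normal CDF

for z in (-2.0, -0.5, 0.0, 1.3, 3.0):
    # Symmetry of tail areas: Phi(z) = 1 - Phi(-z)
    assert abs(Phi(z) - (1 - Phi(-z))) < 1e-12

print(Phi(-2.0), 1 - Phi(2.0))  # the two tail areas agree
```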

3. Symmetry of Z and −Z: If Z ∼ N(0, 1), then −Z ∼ N(0, 1) as well. To see this, note that the CDF of −Z is

P(−Z ≤ z) = P(Z ≥ −z) = 1 − Φ(−z),

but that is Φ(z) according to what we just argued. So −Z has CDF Φ, i.e., −Z ∼ N(0, 1).

▶ The general Normal distribution has two parameters, denoted µ and σ², which correspond to the mean and variance (so the standard Normal is the special case where µ = 0 and σ² = 1). Starting with a standard Normal r.v. Z ∼ N(0, 1), we can get a Normal r.v. with any mean and variance by a location-scale transformation (shifting and scaling).
▶ Normal distribution: If Z ∼ N(0, 1), then X = µ + σZ is said to have the Normal distribution with mean µ and variance σ². We denote this by X ∼ N(µ, σ²). Indeed,

E(µ + σZ) = E(µ) + σE(Z) = µ,

Var(µ + σZ) = Var(σZ) = σ² Var(Z) = σ².

▶ For X ∼ N(µ, σ²), the standardized version of X is

(X − µ)/σ ∼ N(0, 1).
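Standardization can be checked directly with statistics.NormalDist from the Python standard library; a sketch with arbitrary example parameters µ = 10, σ = 2:

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0      # arbitrary example parameters
X = NormalDist(mu, sigma)  # X ~ N(mu, sigma^2)
Z = NormalDist(0, 1)       # standard Normal

x = 13.0
# P(X <= x) = P((X - mu)/sigma <= (x - mu)/sigma) = Phi((x - mu)/sigma)
print(X.cdf(x))
print(Z.cdf((x - mu) / sigma))  # same value via standardization
```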
Theorem 3.
Normal CDF and PDF: Let X ∼ N(µ, σ²). Then the CDF of X is

F(x) = Φ((x − µ)/σ)

and the PDF of X is

f(x) = (1/σ) ϕ((x − µ)/σ).

▶ Proof: For the CDF, we start from the definition F(x) = P(X ≤ x), standardize, and use the CDF of the standard Normal:

F(x) = P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = Φ((x − µ)/σ).

Then we differentiate to get the PDF, remembering to apply the chain rule:

f(x) = (d/dx) Φ((x − µ)/σ) = (1/σ) ϕ((x − µ)/σ).

We can also write out the PDF as

f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}.
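The PDF identity f(x) = (1/σ)ϕ((x − µ)/σ) can be verified numerically against statistics.NormalDist; a sketch with arbitrary example parameters µ = 5, σ = 3:

```python
from statistics import NormalDist
import math

def phi(z):
    # Standard Normal PDF: (1/sqrt(2*pi)) * exp(-z^2 / 2)
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

mu, sigma = 5.0, 3.0  # arbitrary example parameters
x = 7.0

lhs = NormalDist(mu, sigma).pdf(x)   # N(mu, sigma^2) PDF at x
rhs = phi((x - mu) / sigma) / sigma  # (1/sigma) * phi((x - mu)/sigma)
print(abs(lhs - rhs))  # ≈ 0
```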
Exponential distribution

▶ The Exponential distribution is the continuous counterpart to the Geometric distribution. Recall that a Geometric random variable counts the number of failures before the first success in a sequence of Bernoulli trials. (Exercise 45 in the book explores the sense in which the Geometric converges to the Exponential, in the limit where the Bernoulli trials are performed faster and faster but with smaller and smaller success probabilities.)
▶ The story of the Exponential distribution is analogous, but we are now waiting for a success in continuous time, where successes arrive at a rate of λ successes per unit of time. The average number of successes in a time interval of length t is λt, though the actual number of successes varies randomly. An Exponential random variable represents the waiting time until the first arrival of a success.
Exponential distribution (contd.)

▶ (Exponential distribution). A continuous r.v. X is said to have the Exponential distribution with parameter λ if its PDF is

f(x) = λe^{−λx},  x > 0, λ > 0.

We denote this by X ∼ Expo(λ). The corresponding CDF is

F(x) = 1 − e^{−λx},  x > 0.
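The Expo(λ) PDF and CDF in code, with a finite-difference check that the derivative of the CDF recovers the PDF; a sketch:

```python
import math

def expo_pdf(x, lam):
    # Expo(lambda) PDF: lambda * exp(-lambda * x) for x > 0.
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def expo_cdf(x, lam):
    # Expo(lambda) CDF: 1 - exp(-lambda * x) for x > 0.
    return 1 - math.exp(-lam * x) if x > 0 else 0.0

# f(x) = F'(x): central finite difference at an arbitrary point.
lam, x, h = 2.0, 1.0, 1e-6
deriv = (expo_cdf(x + h, lam) - expo_cdf(x - h, lam)) / (2 * h)
print(abs(deriv - expo_pdf(x, lam)))  # ≈ 0
```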


Exponential distribution (contd.)

Figure 4: Expo(1) PDF and CDF.


Exponential distribution (contd.)
▶ We can use scaling to get from the simple Expo(1) to the general Expo(λ): if X ∼ Expo(1), then

Y = X/λ ∼ Expo(λ),

since

P(Y ≤ y) = P(X/λ ≤ y) = P(X ≤ λy) = 1 − e^{−λy},  y > 0.

Conversely, if Y ∼ Expo(λ), then λY ∼ Expo(1).
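This scaling relation is also how one can simulate Expo(λ): draw X ∼ Expo(1) and divide by λ. A simulation sketch (random.expovariate could draw Expo(λ) directly; here we scale Expo(1) draws by hand to mirror the derivation):

```python
import random

random.seed(0)  # fixed seed for reproducibility
lam = 2.5       # arbitrary rate parameter
n = 100_000

# If X ~ Expo(1), then Y = X / lam ~ Expo(lam).
ys = [random.expovariate(1.0) / lam for _ in range(n)]

mean = sum(ys) / n
print(mean)  # close to E(Y) = 1/lam = 0.4
```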
▶ Using standard integration by parts calculations,

E(X) = ∫_{0}^{∞} x e^{−x} dx = 1,  E(X²) = ∫_{0}^{∞} x² e^{−x} dx = 2,

Var(X) = E(X²) − (E(X))² = 2 − 1 = 1.
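These Expo(1) moments can be confirmed by numerical integration instead of integration by parts; a sketch truncating the integral at an upper limit where e^{−x} is negligible:

```python
import math

def expo1_moment(k, steps=200_000, upper=50.0):
    # Approximate E(X^k) = integral of x^k * e^{-x} dx for X ~ Expo(1);
    # the tail beyond `upper` is negligible (e^{-50} is ~2e-22).
    h = upper / steps
    return sum(((i + 0.5) * h) ** k * math.exp(-(i + 0.5) * h) * h
               for i in range(steps))

m1, m2 = expo1_moment(1), expo1_moment(2)
print(m1, m2, m2 - m1 ** 2)  # ≈ 1, 2, and Var(X) = 1
```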
▶ For Y = X/λ ∼ Expo(λ) we then have

E(Y) = (1/λ)E(X) = 1/λ,

Var(Y) = (1/λ²)Var(X) = 1/λ²,

so the mean and variance of the Expo(λ) distribution are 1/λ and 1/λ², respectively.
