Lecture04 Continuous Random Variables Ver1
Ping Yu
The cumulative distribution function (cdf) of a continuous r.v. $X$ is defined as
$$F(x) = P(X \le x).$$
- This definition is the same as in the discrete r.v. case, but there $F(x)$ is a step function and so is not differentiable.
- This definition of the cdf implies $P(a < X \le b) = F(b) - F(a)$; recall that the probability of a single value is zero for a continuous r.v., so whether $a$ and $b$ are included in the interval or not does not affect the result.
The counterpart of the pmf for a continuous r.v. is the probability density function (pdf), which is defined as
$$f(x) = \frac{d}{dx}F(x). \quad \text{[figure here]}$$
- Since $F(x)$ is nondecreasing, $f(x) \ge 0$. We denote the area where $f(x) > 0$ as $S$, called the support of $X$.¹
- $P(a \le X \le b) = \int_a^b f(x)\,dx$, $\int_{-\infty}^{\infty} f(x)\,dx = \int_S f(x)\,dx = 1$, and $F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{x_m}^{x} f(t)\,dt$, where $x_m = \inf(S)$. [figure here]
¹ Usually, $S$ is defined as the closure of this area, but we will not distinguish this difference in this lecture.
Ping Yu (HKU) Continuous Random Variables 4 / 35
Continuous Random Variables
$\int_S f(x)\,dx = 1$ and $F(x) = \int_{x_m}^{x} f(t)\,dt$
Assume the gasoline sales at a gasoline station are equally likely from 0 to 1,000 gallons during a day; then the gasoline sales follow a uniform (probability) distribution:
$$F(x) = \begin{cases} 0, & \text{if } x < 0, \\ 0.001x, & \text{if } 0 \le x \le 1000, \\ 1, & \text{if } x > 1000, \end{cases}$$
whose pdf is
$$f(x) = \begin{cases} 0.001, & \text{if } 0 \le x \le 1000, \\ 0, & \text{otherwise} \end{cases} = 0.001 \cdot 1(0 \le x \le 1000),$$
where $1(\cdot)$ is the indicator function, which equals 1 when the statement in the parentheses is true and 0 otherwise. [figure here]
In general, the uniform distribution on $(a, b)$ has the pdf
$$f(x) = \frac{1}{b - a} 1(a \le x \le b),$$
and the cdf
$$F(x) = \frac{x - a}{b - a} \text{ for } x \in [a, b].$$
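The uniform pdf and cdf can be coded directly from these formulas. A minimal sketch (the function names are my own, not from the lecture), checked against the gasoline example with $(a, b) = (0, 1000)$:

```python
def uniform_pdf(x, a, b):
    """f(x) = 1/(b - a) * 1(a <= x <= b)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """F(x) = (x - a)/(b - a) on [a, b]; 0 below a, 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Gasoline example: sales uniform on (0, 1000) gallons.
print(uniform_pdf(500, 0, 1000))   # 0.001
print(uniform_cdf(500, 0, 1000))   # 0.5 = P(sales <= 500)
```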
Continuous Random Variables
[figure here: CDF (left) and PDF (right) of the uniform distribution on (0, 1000)]
² When $i = 0$, $a + i\Delta = a$; and when $i = \frac{b-a}{\Delta} - 1$, $a + (i + 1)\Delta = a + \frac{b-a}{\Delta}\Delta = b$.
Mean
The mean (or expected value, or expectation) of a continuous r.v. can be defined through an approximation by a discrete r.v. as in the previous slide:
$$\mu_X := E[X] \approx \sum_{i=0}^{(b-a)/\Delta - 1} \left(a + \left(i + \tfrac{1}{2}\right)\Delta\right) P\left(a + i\Delta < X \le a + (i+1)\Delta\right)$$
$$\approx \sum_{i=0}^{(b-a)/\Delta - 1} \left(a + \left(i + \tfrac{1}{2}\right)\Delta\right) f\!\left(a + \left(i + \tfrac{1}{2}\right)\Delta\right)\Delta \xrightarrow{\Delta \to 0} \int_a^b x f(x)\,dx.$$
- In the limit, $\sum \to \int$, $a + \left(i + \tfrac{1}{2}\right)\Delta \to x$, and $\Delta \to dx$.
The mean is the center of gravity of a pole (a, b ) with density at x being f (x ).
In general, the mean of any function of $X$, $g(X)$, is
$$E[g(X)] = \int_S g(x) f(x)\,dx.$$
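The discretization above can be checked numerically: the midpoint sum converges to $\int_a^b x f(x)\,dx$ as $\Delta \to 0$. A minimal sketch (names my own), using the uniform pdf on $(0, 1000)$, whose mean is exactly 500:

```python
def approx_mean(f, a, b, n):
    """Approximate E[X] = ∫ x f(x) dx by a midpoint sum with n cells of width Δ."""
    delta = (b - a) / n
    total = 0.0
    for i in range(n):
        mid = a + (i + 0.5) * delta   # a + (i + 1/2)Δ
        total += mid * f(mid) * delta  # (a + (i + 1/2)Δ) f(a + (i + 1/2)Δ) Δ
    return total

f = lambda x: 0.001  # uniform pdf on (0, 1000)
print(approx_mean(f, 0, 1000, 1000))  # ≈ 500, the center of the interval
```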
Variance
The pdf of the normal distribution is
$$f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \text{ for } -\infty < x < \infty, \qquad (1)$$
where $\mu \in \mathbb{R}$, $\sigma^2 \in (0, \infty)$, $e$ is Euler's number, and $\pi = 3.14159\ldots$ is Archimedes' constant (the ratio of a circle's circumference to its diameter).
- Since the normal distribution depends only on $\mu$ and $\sigma^2$, we denote a r.v. $X$ with pdf (1) as $X \sim N(\mu, \sigma^2)$.
The cdf of the normal distribution, $F(x|\mu, \sigma^2) = \int_{-\infty}^{x} f(t|\mu, \sigma^2)\,dt$, does not have an analytic form (i.e., a closed-form expression), but computing probabilities based on the normal distribution is straightforward nowadays.
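Although the cdf has no closed form, it can be expressed through the error function, $F(x|\mu, \sigma^2) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)$, which standard libraries implement. A minimal sketch:

```python
import math

def normal_cdf(x, mu=0.0, sigma2=1.0):
    """Normal cdf via the error function: F(x) = ½(1 + erf((x - μ)/(σ√2)))."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * sigma2)))

print(normal_cdf(0.0))    # 0.5: half the mass lies below the mean
print(normal_cdf(1.96))   # ≈ 0.975 for the standard normal
```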
This distribution has many applications in business and economics; e.g., the dimensions of parts, the heights and weights of human beings, test scores, stock prices, etc. all roughly follow normal distributions.
As will be discussed in Lecture 5, the distribution of the sample mean converges to a normal distribution as the sample size gets large.
³ He is also referred to as the "prince of mathematics", known for many things, e.g., least squares.
The Normal Distribution
For $X \sim N(\mu, \sigma^2)$: [figure here]
Figure: $f(x|\mu, \sigma^2)$ with Different $\mu$'s and $\sigma^2$'s: (a) same $\sigma^2$, different $\mu$'s; (b) same $\mu = 5$, different $\sigma^2$'s.
The tail of $f(x|\mu, \sigma^2)$ is approximately $e^{-x^2}$, which shrinks to zero very quickly.
The (population) kurtosis is often used to measure the heaviness of a distribution's tail [why?]:
$$\mathrm{Kurt}_X = \frac{E\left[(X - \mu_X)^4\right]}{\sigma_X^4} =: \frac{\mu_4}{\sigma^4},$$
where $\mu_4$ is the fourth central moment.
- It can be shown that the kurtosis of $N(\mu, \sigma^2)$ is 3, which is chosen as a benchmark: if a distribution's kurtosis is larger than 3, it is called heavy-tailed, and if less than 3, light-tailed. Heavy-tailed phenomena seem more frequent than light-tailed ones since the tail of a normal distribution is already very thin.
The sample kurtosis is
$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4}.$$
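The sample kurtosis is a one-line computation from this formula. A minimal sketch (the function name and the toy data are my own); the data below concentrate mass at the center with two far-out values, which pushes the statistic above the normal benchmark of 3:

```python
def sample_kurtosis(xs):
    """kurtosis = Σ(x_i - x̄)^4 / (n s^4), with s² the sample variance (n - 1 divisor)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    m4 = sum((x - xbar) ** 4 for x in xs)
    return m4 / (n * s2 ** 2)

# Eight central values and two extreme outliers: heavy tails relative to normal.
print(sample_kurtosis([-10] + [0] * 8 + [10]))  # ≈ 4.05 > 3
```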
Figure: Normal Density Function with Symmetric Upper and Lower Values
Since the normal distribution is the most widely used, we often need a way to check whether the data at hand are approximately normally distributed.
Normal probability plots provide an easy way to achieve this goal; Lecture 8 will
provide a more rigorous test.
If the data are indeed from a normal distribution, then the plot will be a straight
line. [figure here]
- The vertical axis has a transformed cumulative normal scale.
- Two dotted lines provide an interval within which data points from a normal distribution would fall in most cases.
If the data are not from a normal distribution, then the plot will deviate from a
straight line. [figure here]
Figure: Normal Probability Plot for Data Simulated from a Normal Distribution
Figure: Normal Probability Plot for Data Simulated from a Uniform Distribution
Figure: Galton Board (or quincunx, or bean machine): $X = \sum_{i=1}^{n} X_i$, where $X_i \overset{iid}{\sim}$ Bernoulli(0.5) on $\{-1, 1\}$.
⁵ He was exiled to England due to religious persecution, and became a friend of Newton there. To make a living, he became a private tutor of mathematics.
⁶ He was referred to as the French Newton. His students include Poisson and Napoleon.
Normal Distribution Approximation for Binomial Distribution
Because normal distributions are easier to handle, this approximation can simplify the analysis of some problems. For example, suppose $X$ is the number of customers after $n$ people browsed a store's website, and based on past experience, the probability of visiting the store after browsing is $p$; the manager wants to predict the probability that the number of customers falls in an interval, say, $[a, b]$.
- From the normal approximation,
$$P(a \le X \le b) = P\!\left(\frac{a - np}{\sqrt{np(1-p)}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{b - np}{\sqrt{np(1-p)}}\right) \approx \Phi\!\left(\frac{b - np}{\sqrt{np(1-p)}}\right) - \Phi\!\left(\frac{a - np}{\sqrt{np(1-p)}}\right).$$
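This approximation can be checked against the exact binomial sum. A minimal sketch (the hypothetical values $n = 400$, $p = 0.1$, and interval $[30, 50]$ are my own, chosen so $n$ is large enough for the approximation to be good):

```python
import math

def phi(z):
    """Standard normal cdf Φ(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_prob_normal_approx(a, b, n, p):
    """P(a <= X <= b) ≈ Φ((b - np)/√(np(1-p))) - Φ((a - np)/√(np(1-p)))."""
    sd = math.sqrt(n * p * (1 - p))
    return phi((b - n * p) / sd) - phi((a - n * p) / sd)

def binom_prob_exact(a, b, n, p):
    """Exact binomial sum: Σ_{k=a}^{b} C(n, k) p^k (1-p)^(n-k)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(a, b + 1))

n, p = 400, 0.1   # mean np = 40, sd = 6
print(binom_prob_exact(30, 50, n, p))          # exact probability
print(binom_prob_normal_approx(30, 50, n, p))  # close for large n
```

A continuity correction (using $a - 0.5$ and $b + 0.5$) would tighten the approximation further, though the slide's formula omits it.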
The normal approximation can also be applied to the proportion (or percentage) r.v., $P = X/n$:
$$P \overset{approx.}{\sim} N\!\left(\frac{np}{n}, \frac{np(1-p)}{n^2}\right) \approx N\!\left(P, \frac{P(1-P)}{n}\right),$$
where the last approximation comes from substituting $P$ for $p$, i.e., $p$ is estimated rather than known a priori.
The Exponential Distribution
$F(t|\lambda)$ can be used to model waiting times, i.e., the probability that an arrival will occur during an interval of time $t$ ($T$ is the waiting time before the first arrival), so it is particularly useful for waiting-line, or queuing, problems.
The exponential distribution is closely related to the Poisson distribution: if $T_1, \ldots, T_n \overset{iid}{\sim}$ Exponential$(\lambda)$, then
$$\max\left\{n \,\middle|\, \sum_{i=1}^{n} T_i \le 1\right\} \sim \text{Poisson}(\lambda); \text{ [proof not required]}$$
Memorylessness:
$$P(T > s + t \mid T > s) = \frac{P(T > s + t \cap T > s)}{P(T > s)} = \frac{P(T > s + t)}{P(T > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(T > t).$$
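The chain of equalities can be verified numerically from the exponential survival function $P(T > x) = e^{-\lambda x}$. A minimal sketch (the rate and times below are arbitrary choices of mine):

```python
import math

lam, s, t = 0.5, 2.0, 3.0             # arbitrary rate λ and times s, t

surv = lambda x: math.exp(-lam * x)   # P(T > x) = e^{-λx}

lhs = surv(s + t) / surv(s)  # P(T > s + t | T > s)
rhs = surv(t)                # P(T > t)
print(lhs, rhs)              # equal: having already waited s carries no information
```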
The counterparts of the joint and marginal probability distributions for multivariate discrete r.v.'s are the joint pdf
$$f(x_1, \ldots, x_K) = \frac{\partial^K}{\partial x_1 \cdots \partial x_K} F(x_1, \ldots, x_K)$$
and the marginal pdf
$$f(x_i) = \frac{d}{dx_i} F(x_i) = \int \cdots \int f(x_1, \ldots, x_K)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_K.$$
- The independence of $X_1, \ldots, X_K$ can be equivalently defined as $f(x_1, \ldots, x_K) = f(x_1) \cdots f(x_K)$ for all $x_1, \ldots, x_K$.
Jointly Distributed Continuous Random Variables
The conditional pdf of $Y$ given $X = x$ is
$$f(y|x) = \frac{f(x, y)}{f(x)}.$$
If $W = \sum_{i=1}^{K} a_i X_i$, then
$$\mu_W = E[W] = \sum_{i=1}^{K} a_i \mu_i,$$
and
$$\sigma_W^2 = \mathrm{Var}(W) = \sum_{i=1}^{K} a_i^2 \sigma_i^2 + 2 \sum_{i=1}^{K-1} \sum_{j>i} a_i a_j \sigma_{ij},$$
which reduces to $\sum_{i=1}^{K} \sigma_i^2$ if $\sigma_{ij} = 0$ for all $i \ne j$ and $a_i = 1$ for all $i$.
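The variance formula is just the quadratic form $\sum_i \sum_j a_i a_j \sigma_{ij}$ split into diagonal and off-diagonal parts, which is easy to sanity-check numerically. A minimal sketch (the weights and covariance matrix below are hypothetical, not from the lecture):

```python
# Hypothetical weights a_i and symmetric covariance matrix (σ_ij).
a = [1.0, 2.0, -1.0]
cov = [[4.0, 1.0, 0.5],
       [1.0, 9.0, -2.0],
       [0.5, -2.0, 1.0]]

K = len(a)
# Full quadratic form: Var(W) = Σ_i Σ_j a_i a_j σ_ij.
var_full = sum(a[i] * a[j] * cov[i][j] for i in range(K) for j in range(K))
# Split form from the slide: Σ a_i² σ_i² + 2 Σ_{i<j} a_i a_j σ_ij.
var_split = (sum(a[i] ** 2 * cov[i][i] for i in range(K))
             + 2 * sum(a[i] * a[j] * cov[i][j]
                       for i in range(K) for j in range(i + 1, K)))
print(var_full, var_split)  # identical by symmetry of σ_ij
```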
Actually, all the results on mean and variance for discrete r.v.’s apply to continuous
r.v.’s.
The covariance and correlation between two continuous r.v.'s $(X, Y)$ are similarly defined as
$$\mathrm{Cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y,$$
$$\mathrm{Corr}(X, Y) = \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$$