CLASSICAL PROBABILITY 2008

2. MODES OF CONVERGENCE AND INEQUALITIES


JOHN MORIARTY

In many interesting and important situations, the object of interest is influenced by many random factors. If we can construct a probability model for the individual factors, then the limit theorems of classical probability may give useful statistical information about their cumulative effect. Examples are:
- stock prices are affected by many individual trades and pieces of information;
- the flow of traffic on a motorway, or of a crowd through a stadium, is the result of many individual decisions.
We will see that in such situations, recognisable patterns can occur. In other words, we see convergence to some limiting object. The limiting object may be, among other things:
- a number (the law of large numbers);
- a random variable (the central limit theorem).
The simulations given in the lecture illustrate these two possibilities. Other limiting objects are possible, but we do not study them in this course.
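For concreteness, here is a minimal Python sketch of the kind of simulation shown in the lecture (not the lecture's own code; the Exponential(1) distribution is an arbitrary choice): running sample means settle down to a number, while standardised sums have an approximately standard normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Law of large numbers: one path of running sample means settles down to a number (here 1)
n = 10_000
x = rng.exponential(scale=1.0, size=n)                  # E(X_i) = 1
running_means = np.cumsum(x) / np.arange(1, n + 1)
print("running mean at n = 10, 100, 1000, 10000:", running_means[[9, 99, 999, 9999]])

# Central limit theorem: standardised sums over many repetitions look standard normal
reps, m = 2_000, 1_000
sums = rng.exponential(scale=1.0, size=(reps, m)).sum(axis=1)
z = (sums - m * 1.0) / (1.0 * np.sqrt(m))               # mean 1, standard deviation 1
print("standardised sums: mean ~", round(float(z.mean()), 3), "std ~", round(float(z.std()), 3))
print("P(Z <= 0) estimate (should be near 1/2):", (z <= 0).mean())
```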
Depending on the purpose of our probability model, we may have different types of convergence in mind. For example, we may wish to know:
- Does this convergence always happen?
- If not, what is the probability that it does not?
- What is the distribution of the limiting object?
- How big is the average error between the actual and limiting objects?
and even these questions are still imprecise. It should be clear that we need a range of definitions of convergence to random objects.
We have seen above that, in the statements of the law of large numbers and the central limit theorem, the limiting objects are of different
types. Appropriately, they use different notions or modes of convergence of a sequence of random variables. Below four different modes
of convergence are defined, and certain relationships between them are
proven.
First, however, we state and prove some useful inequalities.

1. Inequalities
Question. Suppose you make a sequence of 10 investments, and I
offer to either:
(1) take your average gain, then square it; or
(2) square each of your gains, then average them,
and pay you the result. Which should you choose?
Theorem 1.1 (Jensen's inequality). Let $X$ be a random variable with $E(X) < \infty$, and let $f : \mathbb{R} \to \mathbb{R}$ be a convex function. Then
$$f(E(X)) \le E(f(X)).$$
Remark. Recall that $f : \mathbb{R} \to \mathbb{R}$ is convex if for any $x_0 \in \mathbb{R}$ there exists a $\lambda \in \mathbb{R}$ such that
$$f(x) \ge \lambda(x - x_0) + f(x_0)$$
for all $x \in \mathbb{R}$. If $f$ is twice differentiable, then $f$ is convex if and only if $f'' \ge 0$. Examples of convex functions: $x \mapsto x$, $x^2$, $e^x$, $|x|$.
Proof (Jensen's inequality). Let $f$ be convex, and let $\lambda \in \mathbb{R}$ be such that
$$f(x) \ge \lambda(x - E(X)) + f(E(X))$$
for all $x$. Then
$$E(f(X)) \ge E\big(\lambda(X - E(X)) + f(E(X))\big) = f(E(X)). \qquad \Box$$
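To connect this with the investment question above: taking $f(x) = x^2$ (convex) in Jensen's inequality gives $(E(X))^2 \le E(X^2)$, so option (2), squaring first and then averaging, never pays less on average. A quick numerical check (the ten gains below are invented purely for illustration):

```python
import numpy as np

gains = np.array([1.2, -0.5, 0.3, 2.0, -1.1, 0.8, 0.0, 1.5, -0.2, 0.6])  # hypothetical gains

option_1 = np.mean(gains) ** 2      # take the average gain, then square it
option_2 = np.mean(gains ** 2)      # square each gain, then average them

# Jensen with the convex function f(x) = x^2: (E X)^2 <= E(X^2), so option 2 never pays less
print(f"option (1) pays {option_1:.4f}, option (2) pays {option_2:.4f}")
assert option_2 >= option_1
```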

Question. Suppose we are designing a flood defence. Let X be
the (random) level of high tide. We have historical data giving an
estimate for E(X), but no information on the distribution of X. How
high should the flood defence be to ensure just 5 percent chance of
flooding?
Theorem 1.2. Let $X$ be a random variable, and let $f : \mathbb{R} \to [0, \infty)$. Then
$$P(f(X) \ge a) \le \frac{E(f(X))}{a}$$
for all $a > 0$.
Proof. Let $A = \{f(X) \ge a\}$. Then
$$f(X) \ge a 1_A,$$
where
$$1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$
is the indicator function of $A$. Taking expectations gives
$$E(f(X)) \ge E(a 1_A) = a P(A) = a P(f(X) \ge a),$$
which finishes the proof. $\Box$


Choosing $f(x) = |x|$ and $f(x) = x^2$ in Theorem 1.2 gives Markov's inequality
$$P(|X| \ge a) \le \frac{E|X|}{a}$$
and Chebyshev's inequality
$$P(|X| \ge a) \le \frac{E(X^2)}{a^2},$$
respectively.
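Returning to the flood defence question: for a nonnegative tide level $X$, Markov's inequality gives $P(X \ge a) \le E(X)/a$, so building to height $a = 20\,E(X)$ guarantees at most a 5 percent chance of flooding, whatever the distribution of $X$. The sketch below is only an illustration; the mean of 2 metres and the Gamma tide model are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

mean_tide = 2.0                  # suppose historical data give E(X) = 2.0 metres
height = 20 * mean_tide          # Markov: P(X >= a) <= E(X)/a, so a = 20*E(X) gives a bound of 0.05
print(f"defence height from Markov's bound: {height} m")

# The true flood probability for any particular nonnegative X with this mean is at most 0.05,
# and usually far smaller.  Example distribution: Gamma(shape=4, scale=0.5), which has mean 2.
tides = rng.gamma(shape=4.0, scale=0.5, size=1_000_000)
print("simulated E(X):", tides.mean())
print("simulated P(X >= height):", (tides >= height).mean())
```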
Question. Suppose that $X$ is a random variable with finite mean and variance. Using all the results proved so far, which inequality gives more information about $P(|X| \ge a)$? Is it
(1) Markov's,
(2) Chebyshev's, or
(3) neither?
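For intuition: Chebyshev's bound $E(X^2)/a^2$ is smaller than Markov's bound $E|X|/a$ exactly when $a > E(X^2)/E|X|$, so neither dominates for all $a$. A small numerical comparison (the Exponential(1) example is just one convenient choice):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)   # E|X| = 1 and E(X^2) = 2 for Exponential(1)

for a in [0.5, 1.0, 2.0, 4.0, 8.0]:
    markov = np.mean(np.abs(x)) / a              # E|X| / a
    chebyshev = np.mean(x ** 2) / a ** 2         # E(X^2) / a^2
    true_tail = np.mean(np.abs(x) >= a)
    print(f"a={a:4.1f}  Markov={markov:7.4f}  Chebyshev={chebyshev:7.4f}  true={true_tail:7.4f}")
```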
Question. Suppose that we have two random variables $X$ and $Y$, and want some information on the average of the size of their product, $|XY|$. What information about $X$ and $Y$ might we need?
Recall that the $n$th moment of a random variable $X$ is defined to be $E(X^n)$.
Theorem 1.3 (Hölder's inequality). Assume $p > 1$ and $q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Let $X$ and $Y$ be two random variables. Then
$$E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.$$
Proof. If $E|X|^p = 0$, then $P(X = 0) = 1$ (you can use Theorem 1.2), so the inequality clearly holds (and also if $E|Y|^q = 0$). Thus we may assume that $E|X|^p > 0$ and $E|Y|^q > 0$.
Note that the function $g$ defined by
$$g(t) = \frac{t^p}{p} + \frac{t^{-q}}{q}, \qquad t > 0,$$
satisfies $g(t) \ge 1$ (you can examine the derivative for $t$ around the point $1$). Inserting
$$t = \left( \frac{|X|}{(E|X|^p)^{1/p}} \right)^{1/q} \left( \frac{|Y|}{(E|Y|^q)^{1/q}} \right)^{-1/p}$$
gives
$$1 \le g(t) = \frac{1}{p} \left( \frac{|X|}{(E|X|^p)^{1/p}} \right)^{p/q} \frac{(E|Y|^q)^{1/q}}{|Y|} + \frac{1}{q} \frac{(E|X|^p)^{1/p}}{|X|} \left( \frac{|Y|}{(E|Y|^q)^{1/q}} \right)^{q/p}$$
for $\omega$ such that $X(\omega) Y(\omega) \ne 0$. Consequently,
$$\frac{|XY|}{(E|X|^p)^{1/p} (E|Y|^q)^{1/q}} \le \frac{1}{p} \frac{|X|^{1 + p/q}}{(E|X|^p)^{1/q + 1/p}} + \frac{1}{q} \frac{|Y|^{1 + q/p}}{(E|Y|^q)^{1/p + 1/q}} = \frac{1}{p} \frac{|X|^p}{E|X|^p} + \frac{1}{q} \frac{|Y|^q}{E|Y|^q}$$
(this inequality also holds if $X(\omega) Y(\omega) = 0$). Taking expectations of both sides gives $\frac{1}{p} + \frac{1}{q} = 1$ on the right-hand side, and therefore
$$E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}. \qquad \Box$$
Using $p = q = 2$ in Hölder's inequality gives the Cauchy-Schwarz inequality.
Consequence 1.4 (Cauchy-Schwarz inequality).
$$(E|XY|)^2 \le (E|X|^2)(E|Y|^2).$$
Question. Let $X$, $Y$ be random variables with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively, all finite. What can you now say about the average size of their product?
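One possible answer, via Cauchy-Schwarz: $E|XY| \le \sqrt{E(X^2)\,E(Y^2)} = \sqrt{(\sigma_1^2 + \mu_1^2)(\sigma_2^2 + \mu_2^2)}$, so the means and variances alone already bound the average size of the product. A quick numerical check of this bound (the normal distributions and parameter values below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu1, sigma1 = 1.0, 2.0
mu2, sigma2 = -0.5, 1.5

x = rng.normal(mu1, sigma1, size=1_000_000)
y = rng.normal(mu2, sigma2, size=1_000_000)      # X and Y happen to be independent here

lhs = np.mean(np.abs(x * y))                                   # E|XY|
rhs = np.sqrt((sigma1**2 + mu1**2) * (sigma2**2 + mu2**2))     # sqrt(E(X^2) E(Y^2))
print(f"E|XY| ~ {lhs:.4f}  <=  bound {rhs:.4f}")
```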
Our final inequality gives a similar estimate, in terms of the moments
of X and Y , of the average of the size of their sum, |X + Y |.
Theorem 1.5 (Minkowski's inequality). Let $p \ge 1$. Then
$$(E|X + Y|^p)^{1/p} \le (E|X|^p)^{1/p} + (E|Y|^p)^{1/p}.$$
Proof. If $p = 1$, the inequality follows directly from the triangle inequality. Thus we assume that $p > 1$, and we let $q > 1$ be such that $\frac{1}{p} + \frac{1}{q} = 1$. Let $Z = |X + Y|$. Then
$$E Z^p = E(Z Z^{p-1}) \le E(|X| Z^{p-1}) + E(|Y| Z^{p-1})$$
$$\le (E|X|^p)^{1/p} (E Z^{q(p-1)})^{1/q} + (E|Y|^p)^{1/p} (E Z^{q(p-1)})^{1/q}$$
$$= \left( (E|X|^p)^{1/p} + (E|Y|^p)^{1/p} \right) (E Z^{q(p-1)})^{1/q},$$
where in the first inequality we used $|X + Y| \le |X| + |Y|$, and in the second inequality we used Hölder's inequality. Since $q(p - 1) = p$ we have
$$(E Z^p)^{1 - 1/q} \le (E|X|^p)^{1/p} + (E|Y|^p)^{1/p},$$
which finishes the proof since $1 - 1/q = 1/p$. $\Box$
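A numerical sanity check of Minkowski's inequality, i.e. the triangle inequality for the quantities $(E|X|^p)^{1/p}$ (the choice $p = 3$ and the two distributions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3
x = rng.normal(0.0, 1.0, size=1_000_000)
y = rng.exponential(scale=1.0, size=1_000_000)

def lp_norm(z, p):
    """Monte Carlo estimate of (E|Z|^p)^(1/p)."""
    return np.mean(np.abs(z) ** p) ** (1 / p)

print(f"(E|X+Y|^p)^(1/p)              ~ {lp_norm(x + y, p):.4f}")
print(f"(E|X|^p)^(1/p)+(E|Y|^p)^(1/p) ~ {lp_norm(x, p) + lp_norm(y, p):.4f}")
```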


2. Different modes of convergence


In simulations we saw convergence of random objects to limiting
objects. Those limiting objects were either random or deterministic
(which is a special case of random!). We also discussed different possible
criteria for convergence to a random object. In order to do calculations
or prove theorems, we of course need precise technical definitions of
these different modes of convergence.
Let $X_1, X_2, X_3, \ldots$ be a sequence of random variables defined on the same probability space $(\Omega, \mathcal{F}, P)$, and let $X$ be a random variable on this probability space. We want to make precise the statement
$$X_n \to X \quad \text{as } n \to \infty.$$
More precisely, we will consider four different modes of convergence.
Definition 2.1. We say that
- $X_n$ converges to $X$ almost surely if
$$\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}$$
is an event with probability 1;
- $X_n$ converges to $X$ in $r$-th mean if $E|X_n|^r < \infty$ for all $n$ and
$$E|X_n - X|^r \to 0$$
as $n \to \infty$;
- $X_n$ converges to $X$ in probability if for all $\epsilon > 0$ we have
$$P(|X_n - X| > \epsilon) \to 0$$
as $n \to \infty$;
- $X_n$ converges to $X$ in distribution if $F_n(x) \to F(x)$ as $n \to \infty$ for all $x$ at which $F$ is continuous (here $F_n$ and $F$ denote the distribution functions of $X_n$ and $X$).
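The following sketch illustrates the third definition numerically: for the toy example $X_n = X + Z_n/n$ with standard normal noise $Z_n$ (not an example from the notes), the probability $P(|X_n - X| > \epsilon)$ is estimated by Monte Carlo and is seen to decrease towards 0.

```python
import numpy as np

rng = np.random.default_rng(5)
reps, eps = 100_000, 0.1

x = rng.uniform(size=reps)                      # the limiting random variable X
for n in [1, 5, 25, 125, 625]:
    x_n = x + rng.normal(size=reps) / n         # X_n = X + Z_n / n
    prob = np.mean(np.abs(x_n - x) > eps)       # Monte Carlo estimate of P(|X_n - X| > eps)
    print(f"n = {n:3d}   P(|X_n - X| > {eps}) ~ {prob:.4f}")
```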
Question. Match each of the following four statements to the corresponding mode of convergence. As $n \to \infty$,
(1) for any $0 < a < b < 1$, the probability that $X_n$ lies between the values $a$ and $b$ tends to $(b - a)$;
(2) $X_n$ always tends to $Y/2$;
(3) the probability that $X_n$ is more than distance $1/n^2$ from $e^Z$ is $1/n$;
(4) $X_n$ has mean value 1 and the variance of $X_n$ is $e^{-n}$.
An obvious question is whether we really need so many modes of
convergence. The answer is yes, although the situation does simplify a
little: using inequalities proved in this chapter we can establish some
relationships between the modes.
It turns out that these four different modes of convergence are distinct, i.e. no two modes of convergence are equivalent. We show below
that the following set of implications between them holds:


Theorem 2.2. Let $X_n$, $n = 1, 2, 3, \ldots$ and $X$ be random variables on some probability space. We then have the implications
$$X_n \to X \text{ almost surely} \implies X_n \to X \text{ in probability} \implies X_n \to X \text{ in distribution}$$
and
$$X_n \to X \text{ in } r\text{-th mean for some } r \ge 1 \implies X_n \to X \text{ in probability}.$$
Also,
$$X_n \to X \text{ in } r\text{-th mean} \implies X_n \to X \text{ in } s\text{-th mean}$$
for $r > s \ge 1$.
Theorem 2.2 follows from the lemmas below.
Question. Can you convince yourself that the next lemma is true
(without a formal proof)?
Lemma 2.3. Convergence in probability implies convergence in distribution.
Proof. Assume $X_n \to X$ in probability. If $\epsilon > 0$, then
$$F_n(x) = P(X_n \le x) = P(X_n \le x \text{ and } X \le x + \epsilon) + P(X_n \le x \text{ and } X > x + \epsilon) \le F(x + \epsilon) + P(|X - X_n| > \epsilon).$$
Similarly,
$$F(x - \epsilon) = P(X \le x - \epsilon \text{ and } X_n \le x) + P(X \le x - \epsilon \text{ and } X_n > x) \le F_n(x) + P(|X_n - X| > \epsilon).$$
Thus we have
$$F(x - \epsilon) - P(|X_n - X| > \epsilon) \le F_n(x) \le F(x + \epsilon) + P(|X_n - X| > \epsilon).$$
Technical point: we now want to let $n \to \infty$ so that the $P(\cdot)$ terms disappear; however, we don't yet know whether $F_n(x)$ has a limit. So we write
$$F(x - \epsilon) \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F(x + \epsilon).$$
If $x$ is a point of continuity of $F$, then letting $\epsilon \downarrow 0$ shows that the $\liminf$ and $\limsup$ are equal, so $F_n(x)$ does have a limit and
$$\lim_{n \to \infty} F_n(x) = F(x),$$
which finishes the proof. $\Box$

Lemma 2.4. (i) Convergence in $r$-th mean implies convergence in $s$-th mean for $1 \le s < r$.
(ii) Convergence in mean ($r = 1$) implies convergence in probability.
Proof. (i) Use Jensen's inequality to show that
$$(E|Z|^s)^{1/s} \le (E|Z|^r)^{1/r},$$
see Problem 3 on Exercise Sheet 2. Using this inequality with $Z = X_n - X$ gives
$$(E|X_n - X|^s)^{1/s} \le (E|X_n - X|^r)^{1/r}.$$
It follows that convergence in $r$-th mean implies convergence in $s$-th mean.
(ii) Markov's inequality gives
$$P(|X_n - X| \ge \epsilon) \le \frac{E|X_n - X|}{\epsilon},$$
which shows that convergence in mean implies convergence in probability. $\Box$

Question. Could you have also seen that part ii above was true
before seeing the proof? If so, is your argument different to the formal
proof?
Lemma 2.5. $X_n \to X$ almost surely if and only if
$$P\left( \sup_{k \ge n} |X_k - X| \ge \epsilon \right) \to 0$$
as $n \to \infty$ for all $\epsilon > 0$.


Proof. Let $A_k(\epsilon) = \{|X_k - X| \ge \epsilon\}$, and let
$$A(\epsilon) = \{A_k(\epsilon) \text{ i.o.}\}.$$
We claim that
(1) $\qquad P(A(\epsilon)) = 0 \text{ for all } \epsilon > 0 \iff P(X_n \to X) = 1.$
To see this, first note that
$$X_k(\omega) \to X(\omega)$$
implies $\omega \notin A(\epsilon)$ for all $\epsilon > 0$. Thus
$$P(X_n \to X) = 1 \implies P(A(\epsilon)^C) = 1 \implies P(A(\epsilon)) = 0 \text{ for all } \epsilon > 0.$$
Moreover,
$$P\left( \{X_n \to X\}^C \right) = P\left( \bigcup_{\epsilon > 0} A(\epsilon) \right) = P\left( \bigcup_{m=1}^{\infty} A(1/m) \right) \le \sum_{m=1}^{\infty} P(A(1/m)),$$
which proves the other implication, and thus finishes the proof of (1).
Now, let
$$B_n(\epsilon) = \bigcup_{k=n}^{\infty} A_k(\epsilon) = \left\{ \sup_{k \ge n} |X_k - X| \ge \epsilon \right\}.$$
Then
$$B_1(\epsilon) \supseteq B_2(\epsilon) \supseteq B_3(\epsilon) \supseteq \ldots \supseteq A(\epsilon)$$
with
$$\lim_{n \to \infty} B_n(\epsilon) = A(\epsilon).$$
Thus
$$P(A(\epsilon)) = 0 \iff P\left( \lim_{n \to \infty} B_n(\epsilon) \right) = \lim_{n \to \infty} P(B_n(\epsilon)) = 0$$
(the second last equality is justified by Problem 2 on Exercise Sheet 1). From (1) we conclude that
$$P(X_n \to X) = 1 \iff \lim_{n \to \infty} P(B_n(\epsilon)) = 0 \text{ for all } \epsilon > 0 \iff \lim_{n \to \infty} P\left( \sup_{k \ge n} |X_k - X| \ge \epsilon \right) = 0 \text{ for all } \epsilon > 0,$$
which finishes the proof. $\Box$

Lemma 2.5 has two important consequences:


Consequence 2.6. Convergence almost surely implies convergence in
probability.
Proof. Let $\epsilon > 0$. Then
$$P(|X_n - X| > \epsilon) \le P\left( \sup_{k \ge n} |X_k - X| \ge \epsilon \right) \to 0$$
as $n \to \infty$ if $X_n \to X$ almost surely. $\Box$

Consequence 2.7. If
$$\sum_{k=1}^{\infty} P(|X_k - X| > \epsilon) < \infty$$
for all $\epsilon > 0$, then $X_k \to X$ almost surely.
Proof. If
$$\sum_{k=1}^{\infty} P(|X_k - X| > \epsilon) < \infty$$
for all $\epsilon > 0$, then (since $P(|X_k - X| \ge \epsilon) \le P(|X_k - X| > \epsilon/2)$, the same sum with $\ge \epsilon$ in place of $> \epsilon$ is also finite)
$$P\left( \sup_{k \ge n} |X_k - X| \ge \epsilon \right) \le \sum_{k=n}^{\infty} P(|X_k - X| \ge \epsilon) \to 0$$
as $n \to \infty$. Thus $X_k \to X$ almost surely follows from Lemma 2.5. $\Box$
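As an illustration of how Consequence 2.7 might be used (a toy example, not from the notes): if $P(|X_k - X| > \epsilon) = 1/k^2$ for every $\epsilon > 0$, the series converges and we get almost sure convergence, whereas tail probabilities of order $1/k$ give a divergent series and the criterion is silent. The sketch simply compares the two partial sums.

```python
import numpy as np

k = np.arange(1, 100_001)

# Consequence 2.7 needs sum_k P(|X_k - X| > eps) < infinity.
partial_sum_summable = np.cumsum(1.0 / k**2)   # e.g. P(|X_k - X| > eps) = 1/k^2: converges
partial_sum_divergent = np.cumsum(1.0 / k)     # e.g. P(|X_k - X| > eps) = 1/k: diverges

print("sum of 1/k^2 over 1e5 terms:", partial_sum_summable[-1])   # approaches pi^2/6 ~ 1.645
print("sum of 1/k   over 1e5 terms:", partial_sum_divergent[-1])  # still growing like log k
```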


All implications in Theorem 2.2 are strict; that is, their reverse implications do not hold. For example, there exists a sequence of random variables $\{X_n\}_{n=1}^{\infty}$ such that $X_n \to X$ in probability but not almost surely (the strictness of the implications in Theorem 2.2 follows from the problems on Example Sheets 2 and 3). However, under some extra assumptions we have the following converses to the implications in Theorem 2.2.
Theorem 2.8. Let $X_n$, $n = 1, 2, \ldots$ and $X$ be random variables.
(i) $X_n \to C$ in distribution, where $C$ is a constant, implies $X_n \to C$ in probability.
(ii) $X_n \to X$ in probability and $P(|X_n| \le M) = 1$ for all $n$, for some constant $M$, imply $X_n \to X$ in $r$-th mean for $r \ge 1$.
(iii) $\sum_{n=1}^{\infty} P(|X_n - X| > \epsilon) < \infty$ for all $\epsilon > 0$ implies $X_n \to X$ almost surely.
(iv) $X_n \to X$ in probability implies the existence of a subsequence $\{n_k\}_{k=1}^{\infty}$ with $\lim_{k \to \infty} n_k = \infty$ such that $X_{n_k} \to X$ as $k \to \infty$ almost surely.
Proof. (i) is Problem 1 on Exercise Sheet 4, and (iii) is Consequence 2.7 above.
To prove (ii), we first claim that
(2) $\qquad P(|X| \le M) = 1.$
To see this, let $\epsilon > 0$. Then
$$P(|X| \ge M + \epsilon) = P(|X - X_n + X_n| \ge M + \epsilon) \le P(|X - X_n| + |X_n| \ge M + \epsilon) \le P(|X - X_n| \ge M - |X_n| + \epsilon) \le P(|X - X_n| \ge \epsilon) \to 0$$
since $X_n \to X$ in probability (in the last inequality we used $|X_n| \le M$ almost surely). Letting $\epsilon \downarrow 0$ and using the continuity property (for increasing families of events) proves the claim (2). Now, for $\epsilon > 0$ let
$$A_n = \{|X_n - X| > \epsilon\}.$$
Then
$$|X_n - X|^r \le \epsilon^r 1_{A_n^C} + (2M)^r 1_{A_n},$$
so
$$E|X_n - X|^r \le E\left( \epsilon^r 1_{A_n^C} + (2M)^r 1_{A_n} \right) \le \epsilon^r + (2M)^r P(A_n) = \epsilon^r + (2M)^r P(|X_n - X| > \epsilon) \to \epsilon^r$$
as $n \to \infty$ if $X_n \to X$ in probability. Since $\epsilon > 0$ is arbitrary, it follows that $X_n \to X$ in $r$-th mean.


To prove (iv), pick an increasing sequence $n_k$ such that
$$P(|X_{n_k} - X| > 1/k) \le 1/k^2$$
(this can be done since $X_n \to X$ in probability). Then, if $\epsilon > 0$, we have
$$\sum_{k=1}^{\infty} P(|X_{n_k} - X| > \epsilon) \le \sum_{k \le 1/\epsilon} P(|X_{n_k} - X| > \epsilon) + \sum_{k > 1/\epsilon} P(|X_{n_k} - X| > 1/k) \le \sum_{k \le 1/\epsilon} P(|X_{n_k} - X| > \epsilon) + \sum_{k=1}^{\infty} 1/k^2 < \infty,$$
where the first inequality uses the fact that $1/k < \epsilon$ when $k > 1/\epsilon$. Consequently, $X_{n_k} \to X$ almost surely as $k \to \infty$ according to (iii) above. $\Box$
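To see part (iv) in action, consider the toy construction (not taken from the notes) $X = 0$ and $X_n$ independent Bernoulli$(1/n)$ variables, a standard example of convergence in probability without almost sure convergence. Since $P(|X_n| > \epsilon) = 1/n$ for $0 < \epsilon < 1$, choosing $n_k = k^2$ gives $P(|X_{n_k} - X| > 1/k) \le 1/k^2$, so the summability criterion of (iii) applies along the subsequence. The sketch estimates how often a path still deviates late on, for the full sequence versus the subsequence.

```python
import numpy as np

rng = np.random.default_rng(6)
reps, N = 5_000, 10_000

n_all = np.arange(1, N + 1)
n_sub = np.arange(1, int(np.sqrt(N)) + 1) ** 2            # subsequence n_k = k^2

def freq_of_late_deviation(indices):
    """Fraction of simulated paths on which X_n = 1 for some n in the given index set,
    where the X_n are independent Bernoulli(1/n) and X = 0 (so X_n = 1 means
    |X_n - X| > eps for any eps < 1)."""
    hits = rng.random((reps, len(indices))) < 1.0 / indices
    return np.mean(hits.any(axis=1))

print("deviation somewhere in n = 5000..10000:   ",
      freq_of_late_deviation(n_all[n_all >= 5000]))       # stays large: no a.s. convergence
print("deviation somewhere in n_k = k^2 >= 5000: ",
      freq_of_late_deviation(n_sub[n_sub >= 5000]))       # small: the subsequence settles down
```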

