Notes 2
and even these questions are still imprecise. It should be clear that
we need a range of definitions of convergence to random objects.
We have seen above that, in the statements of the law of large numbers and the central limit theorem, the limiting objects are of different
types. Appropriately, they use different notions or modes of convergence of a sequence of random variables. Below four different modes
of convergence are defined, and certain relationships between them are
proven.
First, however, we state and prove some useful inequalities.
1. Inequalities
Question. Suppose you make a sequence of 10 investments, and I
offer to either:
(1) take your average gain, then square it; or
(2) square each of your gains, then average them,
and pay you the result. Which should you choose?
Theorem 1.1. (Jensen's inequality) Let $X$ be a random variable with $E(X) < \infty$, and let $f : \mathbb{R} \to \mathbb{R}$ be a convex function. Then
$$f(E(X)) \le E(f(X)).$$
Remark. Recall that $f : \mathbb{R} \to \mathbb{R}$ is convex if for any $x_0 \in \mathbb{R}$ there exists $a \in \mathbb{R}$ such that
$$f(x) \ge a(x - x_0) + f(x_0)$$
for all $x \in \mathbb{R}$. If $f$ is twice differentiable, then $f$ is convex if and only if $f'' \ge 0$. Examples of convex functions: $x \mapsto x$, $x^2$, $e^x$, $|x|$.
Proof. (Jensen's inequality) Let $f$ be convex, and let $a \in \mathbb{R}$ be such that
$$f(x) \ge a(x - E(X)) + f(E(X))$$
for all $x$. Then
$$E(f(X)) \ge E\big(a(X - E(X)) + f(E(X))\big) = f(E(X)).$$
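For example, taking $f(x) = x^2$ in Theorem 1.1 gives
$$(E(X))^2 \le E(X^2).$$
Thinking of $X$ as a gain picked uniformly at random from your ten gains, this says that squaring the average of the gains can never exceed the average of the squared gains, so option (2) pays at least as much as option (1).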
Question. Suppose we are designing a flood defence. Let X be
the (random) level of high tide. We have historical data giving an
estimate for E(X), but no information on the distribution of X. How
high should the flood defence be to ensure at most a 5 percent chance of
flooding?
Theorem 1.2. Let $X$ be a random variable, and let $f : \mathbb{R} \to [0, \infty)$. Then
$$P(f(X) \ge a) \le \frac{E(f(X))}{a}$$
for all $a > 0$.
Proof. Let $A = \{f(X) \ge a\}$. Then
$$f(X) \ge a \mathbf{1}_A,$$
where
$$\mathbf{1}_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$
is the indicator function of $A$. Taking expectations gives
$$E(f(X)) \ge a E(\mathbf{1}_A) = a P(f(X) \ge a),$$
which proves the theorem.
Taking $f(x) = |x|$ and $f(x) = x^2$ in Theorem 1.2 (in the latter case applying the inequality with $a^2$ in place of $a$) gives Markov's and Chebyshev's inequalities,
$$P(|X| \ge a) \le \frac{E|X|}{a} \quad \text{and} \quad P(|X| \ge a) \le \frac{E(X^2)}{a^2},$$
respectively.
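As a rough illustration of the flood defence question, with a purely hypothetical estimate: suppose the historical data gave $E(X) = 2$ metres. Since the tide level satisfies $X \ge 0$, Markov's inequality gives
$$P(X \ge h) \le \frac{E(X)}{h} = \frac{2}{h},$$
so a defence of height $h = 40$ metres guarantees at most a 5 percent chance of flooding, whatever the (unknown) distribution of $X$.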
Question. Suppose that $X$ is a random variable with finite mean and variance. Using all the results proved so far, which inequality gives more information about $P(|X| \ge a)$? Is it
(1) Markov's,
(2) Chebyshev's, or
(3) neither?
Question. Suppose that we have two random variables X and Y ,
and want some information on the average of the size of their product,
|XY |. What information about X and Y might we need?
Recall that the $n$th moment of a random variable $X$ is defined to be $E(X^n)$.
Theorem 1.3. (Hölder's inequality) Assume $p > 1$ and $q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Let $X$ and $Y$ be two random variables. Then
$$E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.$$
Proof. If $E|X|^p = 0$, then $P(X = 0) = 1$ (you can use Theorem 1.2), so the inequality clearly holds (and also if $E|Y|^q = 0$). Thus we may assume that $E|X|^p > 0$ and $E|Y|^q > 0$.
Note that the function $g$ defined by
$$g(t) = \frac{t^p}{p} + \frac{t^{-q}}{q}, \qquad t > 0,$$
satisfies $g(t) \ge 1$ (you can examine the derivative for $t$ around the point 1). Inserting
$$t = \left( \frac{|X|}{(E|X|^p)^{1/p}} \right)^{1/q} \left( \frac{(E|Y|^q)^{1/q}}{|Y|} \right)^{1/p}$$
gives
$$1 \le g(t) = \frac{1}{p} \left( \frac{|X|}{(E|X|^p)^{1/p}} \right)^{p/q} \frac{(E|Y|^q)^{1/q}}{|Y|} + \frac{1}{q} \left( \frac{|Y|}{(E|Y|^q)^{1/q}} \right)^{q/p} \frac{(E|X|^p)^{1/p}}{|X|},$$
and multiplying both sides by $\frac{|X||Y|}{(E|X|^p)^{1/p}(E|Y|^q)^{1/q}}$ and using $\frac{p}{q} + 1 = p$ and $\frac{q}{p} + 1 = q$ yields
$$\frac{|XY|}{(E|X|^p)^{1/p}(E|Y|^q)^{1/q}} \le \frac{1}{p} \frac{|X|^p}{E|X|^p} + \frac{1}{q} \frac{|Y|^q}{E|Y|^q}$$
(this inequality also holds if $X(\omega)Y(\omega) = 0$). Taking expectations of both sides gives
$$E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.$$
Using $p = q = 2$ in Hölder's inequality gives the Cauchy-Schwarz inequality.
Consequence 1.4. (Cauchy-Schwarz inequality)
$$(E|XY|)^2 \le (E|X|^2)(E|Y|^2).$$
Question. Let $X$, $Y$ be random variables with means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$ respectively, all finite. What can you now say about the average size of their product?
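One possible answer, using only the Cauchy-Schwarz inequality: since $E(X^2) = \sigma_1^2 + \mu_1^2$ and $E(Y^2) = \sigma_2^2 + \mu_2^2$, Consequence 1.4 gives
$$E|XY| \le \sqrt{(\sigma_1^2 + \mu_1^2)(\sigma_2^2 + \mu_2^2)}.$$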
Our final inequality gives a similar estimate, in terms of the moments
of X and Y , of the average of the size of their sum, |X + Y |.
Theorem 1.5. (Minkowski's inequality) Let $p \ge 1$. Then
$$(E|X + Y|^p)^{1/p} \le (E|X|^p)^{1/p} + (E|Y|^p)^{1/p}.$$
Proof. If $p = 1$, the inequality follows directly from the triangle inequality. Thus we assume that $p > 1$, and we let $q > 1$ be such that $\frac{1}{p} + \frac{1}{q} = 1$. Let $Z = |X + Y|$. Then
$$E Z^p = E(Z Z^{p-1}) \le E(|X| Z^{p-1}) + E(|Y| Z^{p-1}) \le (E|X|^p)^{1/p} (E Z^{q(p-1)})^{1/q} + (E|Y|^p)^{1/p} (E Z^{q(p-1)})^{1/q} = \left( (E|X|^p)^{1/p} + (E|Y|^p)^{1/p} \right) (E Z^{q(p-1)})^{1/q},$$
where in the first inequality we used $|X + Y| \le |X| + |Y|$, and in the second inequality we used Hölder's inequality. Since $q(p - 1) = p$ we have
$$(E Z^p)^{1 - 1/q} \le (E|X|^p)^{1/p} + (E|Y|^p)^{1/p},$$
which finishes the proof since $1 - 1/q = 1/p$.
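For example, taking $p = 2$ and applying Theorem 1.5 to the centred variables $X - E(X)$ and $Y - E(Y)$ gives
$$\sqrt{\mathrm{Var}(X + Y)} \le \sqrt{\mathrm{Var}(X)} + \sqrt{\mathrm{Var}(Y)},$$
that is, the standard deviation of a sum is at most the sum of the standard deviations.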
$X_n \to X$ in probability
$X_n \to X$ almost surely
Question. Could you have also seen that part ii above was true
before seeing the proof? If so, is your argument different to the formal
proof?
Lemma 2.5. $X_n \to X$ almost surely if and only if
$$P\Big(\sup_{k \ge n} |X_k - X| > \varepsilon\Big) \to 0 \quad \text{as } n \to \infty, \text{ for every } \varepsilon > 0.$$
Proof. For $\varepsilon > 0$, let $A(\varepsilon)$ denote the event that $|X_k - X| > \varepsilon$ for infinitely many $k$. We first show that
$$X_n \to X \text{ almost surely} \iff P(A(\varepsilon)) = 0 \text{ for every } \varepsilon > 0. \qquad (1)$$
If $X_n \to X$ almost surely then, for every $\varepsilon > 0$, $A(\varepsilon) \subseteq \{X_n \to X\}^{C}$ and so $P(A(\varepsilon)) = 0$. Conversely, if $P(A(\varepsilon)) = 0$ for every $\varepsilon > 0$ then
$$P\big(\{X_n \to X\}^{C}\big) = P\Big(\bigcup_{\varepsilon > 0} A(\varepsilon)\Big) = P\Big(\bigcup_{m=1}^{\infty} A(1/m)\Big) \le \sum_{m=1}^{\infty} P(A(1/m)) = 0,$$
which proves the other implication, and thus finishes the proof of (1).
Now, let
$$B_n(\varepsilon) = \bigcup_{k=n}^{\infty} \{|X_k - X| > \varepsilon\}.$$
Then
$$B_1(\varepsilon) \supseteq B_2(\varepsilon) \supseteq B_3(\varepsilon) \supseteq \dots \supseteq A(\varepsilon)$$
with
$$\lim_{n \to \infty} B_n(\varepsilon) = A(\varepsilon).$$
Thus
$$P(A(\varepsilon)) = 0 \iff P\Big(\lim_{n \to \infty} B_n(\varepsilon)\Big) = \lim_{n \to \infty} P(B_n(\varepsilon)) = 0,$$
and since $B_n(\varepsilon) = \big\{\sup_{k \ge n} |X_k - X| > \varepsilon\big\}$, it follows from (1) that
$$P\Big(\sup_{k \ge n} |X_k - X| > \varepsilon\Big) \to 0$$
as $n \to \infty$, for every $\varepsilon > 0$, if and only if $X_n \to X$ almost surely.
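As an illustration of how the lemma separates the two modes (a standard example, assuming independence): let $X_1, X_2, \dots$ be independent with $P(X_k = 1) = 1/k$ and $P(X_k = 0) = 1 - 1/k$, and let $X = 0$. Then $P(|X_k - X| > \varepsilon) = 1/k \to 0$ for every $\varepsilon \in (0, 1)$, so $X_k \to 0$ in probability; but for every $n$ and every $\varepsilon \in (0, 1)$,
$$P\Big(\sup_{k \ge n} |X_k| > \varepsilon\Big) = 1 - \prod_{k=n}^{\infty} \Big(1 - \frac{1}{k}\Big) = 1,$$
so by Lemma 2.5 the convergence is not almost sure.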
Consequence 2.7. If
$$\sum_{k=1}^{\infty} P(|X_k - X| > \varepsilon) < \infty \quad \text{for every } \varepsilon > 0,$$
then
$$P\Big(\sup_{k \ge n} |X_k - X| > \varepsilon\Big) \le \sum_{k=n}^{\infty} P(|X_k - X| > \varepsilon) \to 0$$
as $n \to \infty$, so that, by Lemma 2.5, $X_n \to X$ almost surely.
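For instance (a hypothetical illustration of Consequence 2.7): if $P(X_k = 1) = 1/k^2$ and $P(X_k = 0) = 1 - 1/k^2$, and $X = 0$, then for every $\varepsilon \in (0, 1)$,
$$\sum_{k=1}^{\infty} P(|X_k - X| > \varepsilon) = \sum_{k=1}^{\infty} \frac{1}{k^2} < \infty$$
(and every term is zero when $\varepsilon \ge 1$), so $X_k \to 0$ almost surely; note that no independence assumption is needed here.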
All implications in Theorem 2.2 are strict; that is, their reverse implications do not hold. For example, there exists a sequence of random variables $\{X_n\}_{n=1}^{\infty}$ such that $X_n \to X$ in probability but not almost surely (the strictness of the implications in Theorem 2.2 follows from the problems in Example Sheets 2 and 3). However, under some extra assumptions we have the following converses to the implications in Theorem 2.2.
Theorem 2.8. Let $X_n$, $n = 1, 2, \dots$, and $X$ be random variables.
(i) $X_n \to C$ in distribution, where $C$ is a constant, implies $X_n \to C$ in probability.
(ii) $X_n \to X$ in probability and $P(|X_n| \le M) = 1$ for some constant $M$ imply $X_n \to X$ in $r$th mean for $r \ge 1$.
(iii) $\sum_{n=1}^{\infty} P(|X_n - X| > \varepsilon) < \infty$ for all $\varepsilon > 0$ $\implies$ $X_n \to X$ a.s.
Proof of (ii). We first show that $P(|X| \le M) = 1$: for every $\varepsilon > 0$,
$$P(|X| \ge M + \varepsilon) = P(|X - X_n + X_n| \ge M + \varepsilon) \le P(|X - X_n| + |X_n| \ge M + \varepsilon) \le P(|X - X_n| \ge M - |X_n| + \varepsilon) \le P(|X - X_n| \ge \varepsilon) \to 0$$
as $n \to \infty$.
$$\sum_{k=1}^{\infty} P(|X_{n_k} - X| > \varepsilon) \le \sum_{k \le 1/\varepsilon} P(|X_{n_k} - X| > \varepsilon) + \sum_{k=1}^{\infty} \frac{1}{k^2} < \infty.$$