Convergence of Random Variables
One of the most important parts of probability theory concerns the behavior of sequences of random variables. This part of probability is often called large sample theory or limit theory or asymptotic theory. This material is extremely important for statistical inference. The basic question is this: what can we say about the limiting behavior of a sequence of random variables $X_1, X_2, X_3, \ldots$? Since statistics is all about gathering data, we will naturally be interested in what happens as we gather more and more data, hence our interest in this question.
Recall that in calculus, we say that a sequence of real numbers $x_n$ converges to a limit $x$ if, for every $\varepsilon > 0$, $|x_n - x| < \varepsilon$ for all large $n$. In probability, convergence is more subtle. Going back to calculus for a moment, suppose that $x_n = x$ for all $n$. Then, trivially, $\lim_n x_n = x$. Consider a probabilistic version of this example. Suppose that $X_1, X_2, \ldots$ are a sequence of random variables which are independent and suppose each has a $N(0,1)$ distribution. Since these all have the same distribution, we are tempted to say that $X_n$ converges to $Z \sim N(0,1)$. But this can't quite be right since $P(X_n = Z) = 0$ for all $n$.
Here is another example. Consider $X_1, X_2, \ldots$ where $X_n \sim N(0, 1/n)$. Intuitively, $X_n$ is very concentrated around 0 for large $n$. But $P(X_n = 0) = 0$ for all $n$. The next section develops appropriate methods of discussing convergence of random variables.
$X_n$ converges to $X$ in quadratic mean, written $X_n \xrightarrow{qm} X$, if
$$E(X_n - X)^2 \to 0$$
as $n \to \infty$.
$X_n$ converges to $X$ in probability, written $X_n \xrightarrow{P} X$, if, for every $\varepsilon > 0$,
$$P(|X_n - X| > \varepsilon) \to 0$$
as $n \to \infty$.
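The $N(0,1/n)$ example above is easy to probe by simulation. A minimal sketch (the function name and parameter choices are ours, not the text's):

```python
import random

random.seed(0)

def prob_exceeds(n, eps=0.1, trials=20_000):
    """Monte Carlo estimate of P(|Xn| > eps) for Xn ~ N(0, 1/n)."""
    sd = (1 / n) ** 0.5
    hits = sum(abs(random.gauss(0, sd)) > eps for _ in range(trials))
    return hits / trials

# The estimated probability shrinks toward 0 as n grows, so Xn converges
# to 0 in probability even though P(Xn = 0) = 0 for every n.
for n in (1, 10, 100, 1000):
    print(n, prob_exceeds(n))
```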
Let $F_n$ denote the cdf of $X_n$ and let $F$ denote the cdf of $X$. $X_n$ converges to $X$ in distribution, written $X_n \rightsquigarrow X$, if
$$\lim_{n \to \infty} F_n(t) = F(t)$$
at all $t$ for which $F$ is continuous.
Also,
$$F(x - \varepsilon) = P(X \le x - \varepsilon) = P(X \le x - \varepsilon, X_n \le x) + P(X \le x - \varepsilon, X_n > x) \le F_n(x) + P(|X_n - X| > \varepsilon).$$
Hence,
$$F(x - \varepsilon) - P(|X_n - X| > \varepsilon) \le F_n(x) \le F(x + \varepsilon) + P(|X_n - X| > \varepsilon).$$
This holds for all $\varepsilon > 0$. Take the limit as $\varepsilon \to 0$ and use the fact that $F$ is continuous at $x$ to conclude that $\lim_{n \to \infty} F_n(x) = F(x)$.
Proof of (c). Fix $\varepsilon > 0$. Then,
The law of large numbers says that the mean of a large sample is close to the mean of the distribution. For example, the proportion of heads of a large number of tosses is expected to be close to 1/2. We now make this more precise.
Let $X_1, X_2, \ldots$ be an iid sample and let $\mu = E(X_1)$ and $\sigma^2 = \mathrm{Var}(X_1)$. The sample mean is defined as $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Recall these two important facts: $E(\overline{X}_n) = \mu$ and $\mathrm{Var}(\overline{X}_n) = \sigma^2/n$.
PROOF. Assume that $\sigma < \infty$. This is not necessary but it simplifies the proof. Using Chebyshev's inequality,
$$P\left(|\overline{X}_n - \mu| > \varepsilon\right) \le \frac{\mathrm{Var}(\overline{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2},$$
which tends to 0 as $n \to \infty$.
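The Chebyshev bound in the proof can be compared against a simulation. A sketch (our own illustration, using Uniform(0,1) draws, for which $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
import random

random.seed(0)

def tail_prob(n, eps=0.05, trials=2_000):
    """Estimate P(|sample mean - mu| > eps) for n Uniform(0,1) draws."""
    hits = 0
    for _ in range(trials):
        xbar = sum(random.random() for _ in range(n)) / n
        hits += abs(xbar - 0.5) > eps
    return hits / trials

for n in (10, 100, 1000):
    bound = (1 / 12) / (n * 0.05 ** 2)  # Chebyshev: sigma^2 / (n * eps^2)
    print(n, tail_prob(n), min(bound, 1.0))
```

The empirical tail probability stays below the (often loose) Chebyshev bound and tends to 0, exactly as the proof predicts.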
There is a stronger theorem in the appendix called the strong law of large
numbers.
5.4. The Central Limit Theorem
In this section we shall show that the sum (or average) of random variables has a distribution which is approximately Normal. Suppose that $X_1, \ldots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. The central limit theorem (CLT) says that $\overline{X}_n = \frac{1}{n}\sum_i X_i$ has a distribution which is approximately Normal with mean $\mu$ and variance $\sigma^2/n$. This is remarkable since nothing is assumed about the distribution of the $X_i$, except the existence of the mean and variance.
where
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx$$
is the cdf of a standard normal.
The proof is in the appendix. The central limit theorem says that the
distribution of Zn can be approximated by a N (0, 1) distribution. In other
words:
probability statements about $Z_n$ can be approximated using a Normal distribution. It's the probability statements that we are approximating, not the random variable itself.
There are several ways to denote the fact that the distribution of $Z_n$ can be approximated by a normal. They all mean the same thing. Here they are:
$$Z_n \approx N(0, 1)$$
$$\overline{X}_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$\overline{X}_n - \mu \approx N\left(0, \frac{\sigma^2}{n}\right)$$
$$\sqrt{n}(\overline{X}_n - \mu) \approx N(0, \sigma^2)$$
$$\frac{\sqrt{n}(\overline{X}_n - \mu)}{\sigma} \approx N(0, 1).$$
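These approximations are easy to check by simulation. A sketch (our own; Exp(1) data has $\mu = \sigma = 1$ and is heavily skewed, so nothing here is Normal to begin with):

```python
import math
import random

random.seed(0)

def phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def zn_cdf(z, n=30, trials=10_000):
    """Estimate P(Zn <= z) for Zn = sqrt(n)(xbar - mu)/sigma, Exp(1) data."""
    mu, sigma = 1.0, 1.0
    hits = 0
    for _ in range(trials):
        xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
        hits += math.sqrt(n) * (xbar - mu) / sigma <= z
    return hits / trials

# The estimated cdf of Zn tracks the standard normal cdf even at n = 30.
for z in (-1.0, 0.0, 1.0):
    print(z, round(zn_cdf(z), 3), round(phi(z), 3))
```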
The central limit theorem tells us that $Z_n = \sqrt{n}(\overline{X}_n - \mu)/\sigma$ is approximately $N(0,1)$. This is interesting but there is a practical problem: we don't always know $\sigma$. We can estimate $\sigma^2$ from $X_1, \ldots, X_n$ by
$$S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \overline{X}_n)^2.$$
This raises the following question: if we replace $\sigma$ with $S_n$, is the central limit theorem still true? The answer is yes.
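This claim can be probed numerically. A sketch (our own construction, using Exp(1) data with $\mu = 1$ and the plug-in estimate $S_n$ in place of $\sigma$):

```python
import math
import random

random.seed(0)

def coverage(n=50, trials=10_000):
    """Fraction of Exp(1) samples with |sqrt(n)(xbar - mu)/Sn| <= 1.96."""
    mu = 1.0
    hits = 0
    for _ in range(trials):
        xs = [random.expovariate(1.0) for _ in range(n)]
        xbar = sum(xs) / n
        sn = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
        hits += abs(math.sqrt(n) * (xbar - xbar * 0 - mu) / sn) <= 1.96
    return hits / trials

print(coverage())  # should be close to the N(0,1) value 0.95
```

If replacing $\sigma$ with $S_n$ broke the approximation, this fraction would drift away from 0.95; in practice it stays close, with a small deficit caused by the skewness of the exponential at moderate $n$.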
The accuracy of the normal approximation is quantified by the Berry-Esséen inequality:
$$\sup_z |P(Z_n \le z) - \Phi(z)| \le \frac{33}{4} \frac{E|X_1|^3}{\sqrt{n}\,\sigma^3}.$$
Often, but not always, convergence properties are preserved under transformations.
Generally, it is not the case that $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow Y$ implies that $X_n + Y_n \rightsquigarrow X + Y$. However, it does hold if one of the limits is constant.
THEOREM 5.5.2 (Slutzky's Theorem). If $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$, then $X_n + Y_n \rightsquigarrow X + c$.
THEOREM 5.5.3.
(a) If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$, then $X_n Y_n \xrightarrow{P} XY$.
(b) If $X_n \rightsquigarrow X$ and $Y_n \xrightarrow{P} c$, then $X_n Y_n \rightsquigarrow cX$.
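A small simulation in the spirit of Slutzky's theorem (our own construction: $X_n$ is exactly $N(0,1)$ and $Y_n$ is the mean of $n$ Uniform(0,1) draws, so $Y_n$ converges to 1/2 in probability; then $X_n + Y_n$ should be approximately $N(1/2, 1)$):

```python
import random

random.seed(0)

def sample_sum(n, trials=10_000):
    """Draw Xn + Yn where Xn ~ N(0,1) and Yn is a mean of n uniforms."""
    out = []
    for _ in range(trials):
        xn = random.gauss(0, 1)
        yn = sum(random.random() for _ in range(n)) / n
        out.append(xn + yn)
    return out

vals = sample_sum(n=200)
mean = sum(vals) / len(vals)
var = sum((v - mean) ** 2 for v in vals) / len(vals)
print(round(mean, 2), round(var, 2))  # near 0.5 and 1, matching N(1/2, 1)
```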
$$P\left(\lim_{n \to \infty} X_n = c\right) = 1.$$
We say that $X_n$ converges in $L_1$ to $X$, written $X_n \xrightarrow{L_1} X$, if
$$E|X_n - X| \to 0$$
as $n \to \infty$.
The following relationships hold in addition to those in 4.2.1.
Appendix A5.2. The Strong Law of Large Numbers
The weak law of large numbers says that $\overline{X}_n$ converges to $E(X_1)$ in probability. The strong law asserts that this is also true almost surely.
which is the mgf of a N(0,1). The result follows from the previous Theorem.
In the last step we used the following fact from calculus:
FACT: If $a_n \to a$ then
$$\left(1 + \frac{a_n}{n}\right)^n \to e^a.$$
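A quick numeric check of this calculus fact (our own; $a_n = a + 1/n$ is just one convenient sequence converging to $a$):

```python
import math

a = -0.5  # any fixed limit works; this choice resembles the exponent in the N(0,1) mgf
for n in (10, 100, 10_000):
    an = a + 1 / n                              # a_n -> a
    print(n, (1 + an / n) ** n, math.exp(a))    # the two columns agree as n grows
```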