4 Convergence and Simulation

Types of Convergence and Simulation Illustration

1 / 50
Recap and Recheck

In the last lecture, we saw that if X1, . . . , Xn are i.i.d. N(0, 1), then

X̄ ∼ N(0, 1/n) and (n − 1)S² ∼ χ²_{n−1}

For many (seemingly simple) statistics, it is difficult to describe the p.m.f or p.d.f exactly.

Example
If X1, . . . , X100 are i.i.d. U([−1, 1]), what can we say about the distribution of X̄, or the distribution of X1² + · · · + X100²?

For questions that we don't know how to answer exactly, we'll try to answer them approximately.

2 / 50
Sample Mean of i.i.d Uniform
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform

X = uniform(loc=-1, scale=2)  # U([-1, 1]); scipy's uniform covers [loc, loc + scale]

n = 10000                # number of sample means to generate
X_bar = np.array([])     # array to store the sample means
for i in range(n):
    x = X.rvs(100)                    # generate 100 random numbers from U([-1, 1])
    x_bar = np.mean(x)                # compute one sample mean
    X_bar = np.append(X_bar, x_bar)   # add the sample mean to the storage array

# plot a histogram of the sample means to see the shape of the distribution of X_bar
plt.hist(X_bar, bins='auto')
plt.show()

3 / 50
Histogram of the Generated Sample Means from U([−1, 1])

4 / 50
Asymptotic analysis

Oftentimes, a very good approximate answer emerges when n is large (in other words, when you have many samples). We call results that rely on this type of approximation asymptotic. If we can just simulate, why do asymptotic analysis?
1. Better understanding of the behavior. (Understanding the assumptions: what if the Xi are not uniform? What if I don't really know the distribution of the Xi? Understanding the scaling: what if n = 1000 instead of 100? What if n = 1,000,000?)
2. Faster to get an answer.

5 / 50
In this session
we study, and illustrate with simulation, some types of convergence of a sequence of random variables in probability theory:
▶ in distribution
▶ in probability
▶ almost surely
These are related to two important limit theorems, whose limits are useful in statistical inference with a sample taken from a population with mean µ and variance σ²:
▶ Law of large numbers (LLN):

X̄n = (X1 + · · · + Xn)/n → µ

in probability and almost surely
▶ Central limit theorem (CLT):

(X̄n − µ)/(σ/√n) → N(0, 1)

in distribution
6 / 50
Sequence of Random Variables

Types of Convergence

Law of Large Numbers

Central Limit Theorem

Properties of convergence

7 / 50
Here, we would like to discuss what we precisely mean by a
sequence of random variables.
▶ Remember that, in any probability model, we have a sample space Ω and a probability measure P. A random variable X is a mapping that assigns a real number to each possible outcome in Ω.
▶ When we have a sequence of random variables X1, X2, X3, . . . , it is also useful to remember that there is an underlying sample space Ω.

8 / 50
Example 1
Consider the following random experiment: a fair coin is tossed once. Here, the sample space has only two elements, Ω = {H, T}. We define a sequence of random variables X0, X1, X2, X3, . . . on this sample space as follows:

Xn = 1/(n + 1) if the coin turns heads, and Xn = 0 if the coin turns tails.

For each possible tossing result, the sequence of values of (Xn)n≥0 converges to 0. We say that (Xn)n≥0 converges surely to 0.

9 / 50
Example 2
Consider the following random experiment: a fair coin is tossed forever. Here, the sample space consists of all possible infinite sequences of heads and tails. We define a sequence of random variables X1, X2, X3, . . . on this sample space as follows:

Xn = 1 if the n-th toss turns heads, and Xn = 0 if the n-th toss turns tails.
10 / 50
Example 2 (cont)

For each possible outcome, the sequence of values of (Xn)n≥1 fluctuates between 0 and 1 and has no limit. But the distribution of Xn is Bernoulli with parameter p = 0.5 for all n, so at any time n we know that Xn equals 1 with probability 0.5.
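A quick way to see this is to simulate one outcome of the experiment, i.e. one (truncated) sequence of tosses. A minimal sketch; the truncation at 20 tosses is an arbitrary choice:

import numpy as np

rng = np.random.default_rng(0)     # fixed seed for reproducibility

# One outcome of the experiment: the first 20 tosses of the infinite sequence.
# X_n = 1 if toss n turns heads, 0 if tails; the values keep jumping between 0 and 1.
X = rng.integers(0, 2, size=20)
print(X)   # e.g. [1 0 1 ...]: no limit, but each X_n is Bernoulli(0.5)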

11 / 50
Sequence of Random Variables

Types of Convergence
Convergence in Distribution
Convergence in Probability
Almost Sure Convergence

Law of Large Numbers

Central Limit Theorem

Properties of convergence

12 / 50
Convergence in Distribution

Definition (Convergence in Distribution)


A sequence of random variables (Xn)n≥1 converges in distribution to a random variable X, denoted by Xn −d→ X, if

lim_{n→∞} FXn(x) = FX(x)

for every x at which FX is continuous, where FXn and FX are the cumulative distribution functions (c.d.f.) of Xn and X, respectively.

13 / 50
Example 2 - Revisit

(Xn)n≥1 converges in distribution to the Bernoulli distribution with parameter p = 0.5:

Xn −d→ Ber(0.5)

14 / 50
Example 3
Consider a sequence of random variables (Xn)n≥1 with c.d.f.

FXn(x) = 1 − (1 − 1/n)^(nx) if x > 0, and FXn(x) = 0 otherwise.

Then for all x > 0,

lim_{n→∞} FXn(x) = lim_{n→∞} [1 − (1 − 1/n)^(nx)] = 1 − e^(−x)

and for all x ≤ 0,

lim_{n→∞} FXn(x) = lim_{n→∞} 0 = 0

That is,

lim_{n→∞} FXn(x) = FX(x), where FX(x) = 1 − e^(−x) for x > 0 and 0 otherwise,

with X ∼ E(1), i.e. X has the exponential distribution with rate 1. In other words, Xn −d→ X, or (Xn)n≥1 converges in distribution to
E(1)
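This convergence can be visualized directly. A minimal plotting sketch (assuming matplotlib, as in the earlier simulation slide); the values of n are arbitrary choices:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.01, 5, 200)
for n in [2, 5, 20, 100]:
    F_n = 1 - (1 - 1/n) ** (n * x)          # c.d.f. of X_n for x > 0
    plt.plot(x, F_n, label=f"n = {n}")
plt.plot(x, 1 - np.exp(-x), "k--", label="E(1) limit")   # limiting c.d.f.
plt.legend(); plt.xlabel("x"); plt.ylabel("c.d.f.")
plt.show()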
15 / 50
Exercise

Suppose that Xn is a geometric random variable with parameter 1/n. Its probability mass function is given by

P(Xn = k) = (1/n)(1 − 1/n)^(k−1) for k ≥ 1

Prove that Xn/n −d→ E(1)

17 / 50
Convergence in Probability

A sequence of random variables (Xn) converges in probability to a random variable X, denoted by Xn −P→ X, if for any ϵ > 0

lim_{n→∞} P(|Xn − X| > ϵ) = 0

The concept of convergence in probability is used very often in statistics. For example, an estimator is called consistent if it converges in probability to the parameter being estimated.

18 / 50
Example 4

Suppose Xn ∼ E(n); then Xn −P→ 0. That is, the sequence X1, X2, . . . converges in probability to the zero random variable X.
Solution
For any ϵ > 0, we have

P(|Xn − 0| > ϵ) = P(Xn > ϵ) = ∫_ϵ^∞ n e^(−nx) dx = e^(−nϵ)

So

lim_{n→∞} P(|Xn − 0| > ϵ) = lim_{n→∞} e^(−nϵ) = 0

So Xn −P→ 0
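A Monte Carlo check of this is straightforward; a minimal sketch (ϵ = 0.1 and the sample size are arbitrary choices):

import numpy as np
from scipy.stats import expon

eps = 0.1
for n in [1, 10, 50, 100]:
    # E(n) has rate n, i.e. scale 1/n in scipy's parametrization
    samples = expon(scale=1/n).rvs(size=100000, random_state=0)
    print(n, np.mean(samples > eps), np.exp(-n * eps))   # empirical vs exact e^(-n*eps)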

19 / 50
Exercise

Consider a sequence of random variables X1, X2, . . . such that

Xn = 1 with probability 1/n, and Xn = 0 with probability 1 − 1/n.

Show that Xn −P→ 0
→0

21 / 50
Some Useful Concentration Inequalities

Markov’s Inequality
If X is a non-negative RV and a > 0 then

P(X ≥ a) ≤ E(X)/a

Chebyshev’s Inequality
If X has E(X) = µ and Var(X) = σ² then

P(|X − µ| ≥ k) ≤ σ²/k²
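As a sanity check, each bound can be compared with a Monte Carlo estimate; a small sketch using an E(1) variable (an arbitrary choice, with E(X) = 1):

import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=1.0, size=100000)   # E(1): E(X) = 1, Var(X) = 1
for a in [1, 2, 5]:
    # Markov: P(X >= a) <= E(X)/a; here the true value is e^(-a)
    print(a, np.mean(X >= a), 1/a)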

22 / 50
Proof of Markov’s Inequality.
Let Y = a if X ≥ a, and Y = 0 otherwise. Then Y ≤ X. So

E(Y) ≤ E(X) ⇔ aP(X ≥ a) ≤ E(X) ⇒ P(X ≥ a) ≤ E(X)/a

Proof of Chebyshev’s Inequality.
Apply Markov’s inequality to (X − µ)²:

P[(X − µ)² ≥ k²] ≤ E[(X − µ)²]/k² = Var(X)/k²

It is clear that P[(X − µ)² ≥ k²] = P(|X − µ| ≥ k). So

P(|X − µ| ≥ k) ≤ Var(X)/k²
23 / 50
Example 5
Let Yn be a random variable with E(Yn) = 1/n and Var(Yn) = σ²/n, where σ > 0 is a constant. Prove that Yn −P→ 0
Solution
For any ϵ > 0, apply Markov’s inequality to Yn² (exactly as in the proof of Chebyshev’s inequality):

0 ≤ P(|Yn − 0| > ϵ) ≤ P(Yn² ≥ ϵ²) ≤ E(Yn²)/ϵ² = (Var(Yn) + [E(Yn)]²)/ϵ² = σ²/(nϵ²) + 1/(n²ϵ²)

It is clear that lim_{n→∞} [σ²/(nϵ²) + 1/(n²ϵ²)] = 0 = lim_{n→∞} 0. By the squeeze theorem,

lim_{n→∞} P(|Yn − 0| > ϵ) = 0 ⇒ Yn −P→ 0

24 / 50
Exercise

Let Yn be a random variable with E(Yn) = 1/n and Var(Yn) = σ²/n, where σ > 0 is a constant. For a given random variable X, consider

Xn = X + Yn

Prove that Xn −P→ X

25 / 50
Almost Sure Convergence
Definition
A sequence of random variables X1, X2, . . . , Xn, . . . converges almost surely to a random variable X if

P({ω ∈ Ω : lim_{n→∞} Xn(ω) = X(ω)}) = 1

Example
Revisiting Example 1, consider the random variable X defined by

X(H) = 0, X(T) = 0

We have

lim_{n→∞} Xn(H) = X(H) = 0 and lim_{n→∞} Xn(T) = X(T) = 0

So

{ω ∈ Ω : lim_{n→∞} Xn(ω) = X(ω)} = {H, T} = Ω
⇒ P({ω ∈ Ω : lim_{n→∞} Xn(ω) = X(ω)}) = P(Ω) = 1

26 / 50
Exercise

Consider the following random experiment: a fair coin is tossed once. Here, the sample space has only two elements, Ω = {H, T}. We define a sequence of random variables X1, X2, X3, . . . on this sample space as follows:

Xn = n/(n + 1) if the coin turns H, and Xn = (−1)^n if the coin turns T.

1. For each of the possible outcomes (H or T), determine whether the resulting sequence of real numbers converges or not.
2. Find P({ω ∈ Ω : lim_{n→∞} Xn(ω) = 1}).
3. Does the sequence X1, X2, X3, . . . converge almost surely to 1?

27 / 50
Sufficient Condition for Almost Sure Convergence

Theorem
Consider a sequence of random variables X1, X2, . . . . If

Σ_{n=1}^{∞} P(|Xn − X| > ϵ) < ∞

for all ϵ > 0, then Xn −a.s.→ X.

28 / 50
Example 6
Consider a sequence X1, X2, . . . such that

Xn = 1/n with probability 0.5, and Xn = −1/n with probability 0.5.

Show that Xn −a.s.→ 0
Solution
For ϵ > 0, we have

|Xn − 0| > ϵ ⇔ 1/n > ϵ ⇔ n < 1/ϵ

Hence

Σ_{n=1}^{∞} P(|Xn − 0| > ϵ) = Σ_{n < 1/ϵ} 1 ≤ ⌊1/ϵ⌋ < ∞

So Xn −a.s.→ 0
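Sample paths show the same thing: every realized sequence is squeezed between −1/n and 1/n, so it converges to 0. A minimal plotting sketch (five paths of length 100, both arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n = np.arange(1, 101)
for _ in range(5):                        # five independent sample paths
    signs = rng.choice([-1, 1], size=n.size)
    plt.plot(n, signs / n, alpha=0.6)     # X_n = ±1/n is squeezed to 0
plt.xlabel("n"); plt.ylabel("X_n")
plt.show()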
29 / 50
Exercise

Let X1, X2, X3, . . . be a sequence of i.i.d. Uniform(0, 1) random variables. Define the sequence Yn as

Yn = min(X1, X2, . . . , Xn)

Prove that Yn −a.s.→ 0

30 / 50
Sequence of Random Variables

Types of Convergence

Law of Large Numbers

Central Limit Theorem

Properties of convergence

31 / 50
Weak Law of Large Numbers

Theorem
Let X1, . . . , Xn be independent and identically distributed RVs (they have the same distribution) with E(Xi) = µ and Var(Xi) = σ² < ∞. Then for any ϵ > 0,

P(|(X1 + · · · + Xn)/n − µ| > ϵ) → 0 as n → ∞

That is, the sample mean X̄n = (X1 + · · · + Xn)/n −P→ µ.

Proof.
We have E(X̄n) = µ and Var(X̄n) = σ²/n. Apply Chebyshev’s inequality:

P(|X̄n − µ| > ϵ) ≤ σ²/(nϵ²) → 0 as n → ∞
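A running-mean simulation illustrates the WLLN; a minimal sketch reusing the U([−1, 1]) setup from the earlier slides (so µ = 0):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=10000)                 # i.i.d. U([-1, 1]), mu = 0
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
plt.plot(running_mean)                             # settles toward mu = 0
plt.axhline(0, color="k", linestyle="--")
plt.xlabel("n"); plt.ylabel("sample mean")
plt.show()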

32 / 50
Strong Law of Large Numbers

Theorem
Let X1, . . . , Xn be independent and identically distributed RVs (they have the same distribution) with E(Xi) = µ and Var(Xi) = σ² < ∞. Then

X̄n = (X1 + · · · + Xn)/n −a.s.→ µ

that is, P(lim_{n→∞} X̄n = µ) = 1.

33 / 50
Sequence of Random Variables

Types of Convergence

Law of Large Numbers

Central Limit Theorem

Properties of convergence

34 / 50
Central Limit Theorem

If X1, X2, . . . , Xn, . . . are independent and identically distributed (i.i.d.) (that is, they have the same distribution) with mean µ and finite variance σ², then

(X̄n − µ)/(σ/√n) −d→ N(0, 1)

where X̄n = (X1 + · · · + Xn)/n = Sn/n is the sample mean.

35 / 50
Example 7 - CLT 1

Toss a fair coin forever and consider the variable Xn defined as follows:

Xn = 1 if the n-th toss turns heads, and Xn = 0 if the n-th toss turns tails.

Then the Xn’s are independent and have the same distribution, P(Xn = 1) = 0.5, P(Xn = 0) = 0.5, with mean µ = 0.5 and variance σ² = 0.25.
Think of Xn = 1 as a ”success”, so that Sn = X1 + · · · + Xn is the number of successes in n trials, which has binomial distribution Bin(n, 0.5).

36 / 50
As n grows, the distribution of Sn gets closer and closer to the bell shape of the normal distribution, and so does the sample mean X̄n.
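A sketch of the corresponding plots (the standardization and the values of n are arbitrary illustration choices):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, norm

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, [10, 50, 300]):
    s = binom(n, 0.5).rvs(size=50000, random_state=0)
    z = (s - 0.5 * n) / np.sqrt(0.25 * n)    # standardize: (S_n - n*mu) / (sigma*sqrt(n))
    ax.hist(z, bins=30, density=True)
    grid = np.linspace(-4, 4, 200)
    ax.plot(grid, norm.pdf(grid), "k")       # N(0, 1) density for comparison
    ax.set_title(f"n = {n}")
plt.show()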

37 / 50
Example 8 - CLT 2

The underlying distribution is E(0.5). A histogram of 50000 values of Yn, each a standardized sample mean with sample size n = 300, shows that the distribution of Yn is quite close to the standard normal distribution.

38 / 50
Proof Idea of CLT

Prove that the moment generating function of (X̄n − µ)/(σ/√n),

M_{(X̄n−µ)/(σ/√n)}(s),

converges to the moment generating function of Z ∼ N(0, 1) as n → ∞:

M_Z(s) = e^(s²/2)

39 / 50
Outline proof for µ = 0
▶ M_{aX+b}(s) = e^(bs) M_X(as)
▶ M_{X1+···+Xn}(s) = (M_{X1}(s))^n
▶ M_{X̄n}(s) = (M_{X1}(s/n))^n
▶ M_{X̄n/(σ/√n)}(s) = M_{X̄n}(s√n/σ) = M_{X1}(s/(σ√n))^n
▶ Second order Taylor series expansion (here E(X1) = 0 and E(X1²) = σ²):

M_{X1}(s) = 1 + E(X1)s + (E(X1²)/2!)s² + o(s²) = 1 + (σ²/2)s² + o(s²)

Hence

M_{X̄n/(σ/√n)}(s) = (1 + (σ²/2)(s/(σ√n))² + o(1/n))^n = (1 + s²/(2n) + o(1/n))^n → e^(s²/2)

as n → ∞
40 / 50
For µ ̸= 0

▶ Yi = Xi − µ has mean 0 and variance σ², so

M_{Ȳn/(σ/√n)}(s) → e^(s²/2) as n → ∞

▶ X̄n = Ȳn + µ, and then

(X̄n − µ)/(σ/√n) = Ȳn/(σ/√n)

so

M_{(X̄n−µ)/(σ/√n)}(s) = M_{Ȳn/(σ/√n)}(s) → e^(s²/2) as n → ∞

41 / 50
Simulation to illustrate CLT

Since the CLT seems almost magical, running simulations that verify its implications is one good way to build intuition. We now perform the following simulation:
▶ Choose an arbitrary distribution F for the underlying observations X.
▶ Generate independent draws of Yn = (X̄n − µ)/(σ/√n).
▶ Use these draws to compute some measure of their distribution, such as a histogram.
▶ Compare the latter to N(0, 1).
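A minimal sketch of this recipe, using the E(0.5) setup of Example 8 (an exponential with rate 0.5 has µ = σ = 2; the number of repetitions follows the slide's 50000):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import expon, norm

mu, sigma, n, reps = 2.0, 2.0, 300, 50000     # E(0.5): mean = sd = 1/0.5 = 2
X = expon(scale=2.0)                           # rate 0.5 <=> scale 2 in scipy

samples = X.rvs(size=(reps, n), random_state=0)
Y = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample means

plt.hist(Y, bins='auto', density=True)
grid = np.linspace(-4, 4, 200)
plt.plot(grid, norm.pdf(grid), 'k')            # compare with the N(0, 1) density
plt.show()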

42 / 50
Delta Method
If Yn has a limiting Normal distribution then the delta method
allows us to find the limiting distribution of g(Yn ) where g is any
smooth function.
Theorem (Delta Method)
Suppose that

(Yn − µ)/(σ/√n) −d→ N(0, 1)

and that g is a differentiable function such that g′(µ) ≠ 0. Then

(g(Yn) − g(µ))/(g′(µ)σ/√n) −d→ N(0, 1)

In other words,

if Yn ∼ N(µ, σ²/n) then g(Yn) ∼ N(g(µ), (g′(µ)σ)²/n)

43 / 50
Example 9

Let X1, . . . , Xn be i.i.d. with mean µ and finite variance σ². By the CLT,

(X̄n − µ)/(σ/√n) ∼ N(0, 1) where X̄n = (X1 + · · · + Xn)/n

Let Wn = e^(X̄n) = g(X̄n) with g(x) = e^x. Since g′(x) = e^x, the Delta method implies that

Wn −d→ N(g(µ), (g′(µ)σ)²/n) ≡ N(e^µ, e^(2µ)σ²/n)
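A simulation sketch to check this, with i.i.d. E(1) observations (an arbitrary choice, so µ = σ = 1) and n = 300; the delta method predicts Wn ≈ N(e, e²/300):

import numpy as np
from scipy.stats import expon

mu, sigma, n, reps = 1.0, 1.0, 300, 20000
samples = expon(scale=1.0).rvs(size=(reps, n), random_state=0)   # i.i.d. E(1)
W = np.exp(samples.mean(axis=1))                                  # W_n = exp(sample mean)

# Delta method prediction: W_n ≈ N(e^mu, e^(2*mu) * sigma^2 / n)
print(W.mean(), np.exp(mu))                                # both ≈ e ≈ 2.718
print(W.std(), np.sqrt(np.exp(2 * mu) * sigma**2 / n))     # both ≈ 0.157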

44 / 50
Sequence of Random Variables

Types of Convergence

Law of Large Numbers

Central Limit Theorem

Properties of convergence

45 / 50
Properties of Convergence

Let Xn, X, Yn, Y be random variables and let g be a continuous function.
1. If Xn −a.s.→ X then Xn −P→ X
2. If Xn −P→ X then Xn −d→ X
3. If Xn −d→ c for some real number c, then Xn −P→ c
4. If Xn −P→ X then g(Xn) −P→ g(X)
5. If Xn −d→ X then g(Xn) −d→ g(X)
6. If Xn −P→ X and Yn −P→ Y then Xn + Yn −P→ X + Y
7. If Xn −P→ X and Yn −P→ Y then XnYn −P→ XY
8. If Xn −d→ X and Yn −d→ c then Xn + Yn −d→ X + c
9. If Xn −d→ X and Yn −d→ c then XnYn −d→ cX

46 / 50
Example
Let X1, . . . , Xn be i.i.d. with mean µ and finite variance σ². Denote

X̄n = (X1 + · · · + Xn)/n

Then by the weak LLN,

(X1² + · · · + Xn²)/n −P→ E(X²)

and

X̄n −P→ E(X)

It implies that

[X̄n]² −P→ [E(X)]²

and

(1/n) Σ_{i=1}^{n} Xi² − [X̄n]² −P→ E(X²) − [E(X)]² = Var(X)
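A quick numerical check of this consistency; a sketch using E(1) observations (an arbitrary choice, with Var(X) = 1):

import numpy as np

rng = np.random.default_rng(4)
for n in [100, 10000, 1000000]:
    x = rng.exponential(scale=1.0, size=n)         # i.i.d. E(1): Var(X) = 1
    v_hat = np.mean(x**2) - np.mean(x)**2          # plug-in variance estimator
    print(n, v_hat)                                # approaches Var(X) = 1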

47 / 50
Homework

Rice Exercise 1 page 188


Rice Exercise 2 page 188
Rice Exercise 5 page 188
Rice Exercise 6 page 188
Use simulation to illustrate the results in the above exercises.

48 / 50
