
HSP 511: Economics Lab

Lecture 2
Indian Institute of Technology Delhi

Contents
Estimation of CDF
Asymptotic Normality

In the previous lecture, we were interested only in the mean and variance of the
distribution, since the mean is the best prediction of a random variable in terms of
mean squared error. Now suppose we have an i.i.d. sample from some population and we
are interested in the full distribution of this random variable. What can we say
about the CDF of the random variable from an i.i.d. sample drawn from this distribution?

※ Estimation of CDF
The CDF of any random variable can be written as

F(x) = Pr(Xi ≤ x) = E(1{Xi ≤ x}),

where

1{Xi ≤ x} = 1 if Xi ≤ x, and 0 if Xi > x.

Based on this, we can easily construct a plug-in estimator of the CDF, called the
empirical distribution function. The empirical distribution function F̂ is the CDF
that puts mass 1/n at each data point Xi. Formally,

F̂(x) = (1/n) Σ_{i=1}^n 1{Xi ≤ x} = (number of observations ≤ x) / n.
### Estimating CDF at points x = .01, .02, .03, ...

import numpy as np
import matplotlib.pyplot as plt

# Draw 100 Uniform(0,1) observations; their true CDF is F(x) = x.
e = np.random.rand(100, 1)

# Grid of evaluation points x = 0.00, 0.01, ..., 0.99.
F = np.arange(0, 1, 0.01)

# Empirical CDF at each grid point: fraction of draws <= x
# (broadcasting e (100,1) against the grid (1,100) gives a 100x100 indicator matrix).
Fn = np.mean(e <= F.reshape(1, 100), axis=0)

plt.plot(Fn)  # empirical CDF
plt.plot(F)   # true uniform CDF
plt.show()

[Figure: empirical CDF F̂ (step-like curve) plotted against the true uniform CDF F over the grid of evaluation points.]

Theorem 1.1
At any fixed value of x,

E(F̂(x)) = F(x)  and  Var(F̂(x)) = F(x)(1 − F(x))/n.

Thus,

MSE = F(x)(1 − F(x))/n → 0,

and hence F̂(x) → F(x) in probability.

Proof. We have

F̂(x) = (1/n) Σ_{i=1}^n 1{Xi ≤ x}.

Thus,

E(F̂(x)) = n⁻¹ Σ_{i=1}^n E(1{Xi ≤ x})
         = n⁻¹ Σ_{i=1}^n P(Xi ≤ x)
         = n⁻¹ Σ_{i=1}^n F(x)
         = F(x).

For the variance,

E(F̂(x)²) = n⁻² E( Σ_{i=1}^n 1{Xi ≤ x} )²
          = n⁻² E( Σ_{i=1}^n 1{Xi ≤ x}² + Σ_{i=1}^n Σ_{j≠i} 1{Xi ≤ x} 1{Xj ≤ x} )
          = n⁻² ( Σ_{i=1}^n E(1{Xi ≤ x}²) + Σ_{i=1}^n Σ_{j≠i} E(1{Xi ≤ x} 1{Xj ≤ x}) )
          = n⁻² ( Σ_{i=1}^n E(1{Xi ≤ x}) + Σ_{i=1}^n Σ_{j≠i} E(1{Xi ≤ x}) E(1{Xj ≤ x}) )
          = n⁻² ( Σ_{i=1}^n P(Xi ≤ x) + Σ_{i=1}^n Σ_{j≠i} P(Xi ≤ x) P(Xj ≤ x) )
          = n⁻² ( Σ_{i=1}^n F(x) + Σ_{i=1}^n Σ_{j≠i} F(x)² )
          = n⁻² ( nF(x) + (n² − n)F(x)² )
          = n⁻¹ ( F(x) + (n − 1)F(x)² ),

where the fourth line uses 1{Xi ≤ x}² = 1{Xi ≤ x} and the independence of Xi and Xj. Therefore,

Var(F̂(x)) = E(F̂(x)²) − E(F̂(x))² = F(x)/n + (1 − 1/n)F(x)² − F(x)² = F(x)(1 − F(x))/n.

Finally,

MSE(x) = E(F̂(x) − F(x))²
       = (E(F̂(x)) − F(x))² + Var(F̂(x))
       = Var(F̂(x)) = F(x)(1 − F(x))/n → 0.
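The moment calculations above can be checked by simulation. The sketch below uses Uniform(0,1) data; the sample size, number of replications, evaluation point, and seed are illustrative choices rather than anything from the notes.

```python
import numpy as np

# Monte Carlo check of Theorem 1.1 at a fixed point (illustrative settings).
# For X ~ Uniform(0,1) and x = 0.3: F(x) = 0.3, so the theorem predicts
# E(F_hat(x)) = 0.3 and Var(F_hat(x)) = 0.3 * 0.7 / 100 = 0.0021.
rng = np.random.default_rng(0)
n, reps, x = 100, 20_000, 0.3

# Each row is one i.i.d. sample of size n; F_hat(x) is the share of draws <= x.
Fhat = np.mean(rng.random((reps, n)) <= x, axis=1)

print('mean of F_hat(x):', Fhat.mean())  # should be close to 0.3
print('var of F_hat(x):', Fhat.var())    # should be close to 0.0021
```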


※ Asymptotic Normality
Our estimator is a simple sample average of 1{Xi ≤ x},

F̂(x) = (1/n) Σ_{i=1}^n 1{Xi ≤ x}.

For a fixed x, the central limit theorem gives

√n (F̂(x) − F(x)) ⇝ N(0, F(x)(1 − F(x))).
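This convergence can also be checked by simulation. A minimal sketch (the sample size, evaluation point, and seed are illustrative assumptions): standardizing F̂(x) should give a statistic with unit standard deviation whose absolute value falls below 1.96 roughly 95% of the time.

```python
import numpy as np

# Sketch of the CLT for F_hat(x) with Uniform(0,1) data (illustrative settings).
rng = np.random.default_rng(1)
n, reps, x = 200, 20_000, 0.5
F_x = x  # for Uniform(0,1), F(x) = x

Fhat = np.mean(rng.random((reps, n)) <= x, axis=1)
Z = np.sqrt(n) * (Fhat - F_x) / np.sqrt(F_x * (1 - F_x))  # standardized statistic

print('std of Z:', Z.std())                           # should be close to 1
print('P(|Z| <= 1.96):', np.mean(np.abs(Z) <= 1.96))  # roughly 0.95
```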

We can build a confidence interval at any point x from this distribution by inverting
a test. Suppose we want to test

H0 : F(x) = F0(x)  vs  H1 : F(x) ≠ F0(x).

Consider the following test statistic:

T(X) = √n (F̂(x) − F0(x)) / √( F̂(x)(1 − F̂(x)) ).

It is easy to see that under the null,

T(X) ⇝ N(0, 1),

and under the alternative, with F1 denoting the true CDF,

T(X) = √n (F̂(x) − F0(x)) / √( F̂(x)(1 − F̂(x)) )
     = √n (F̂(x) − F1(x)) / √( F̂(x)(1 − F̂(x)) ) + √n (F1(x) − F0(x)) / √( F̂(x)(1 − F̂(x)) ),

where the first term converges to N(0, 1) and the second diverges to a large positive or negative value.

We can use the test above to find our confidence interval as

C(X, x) = { F(x) : | √n (F̂(x) − F(x)) / √( F̂(x)(1 − F̂(x)) ) | ≤ z_{α/2} }
        = { F(x) : F̂(x) − z_{α/2} √( F̂(x)(1 − F̂(x)) ) / √n ≤ F(x) ≤ F̂(x) + z_{α/2} √( F̂(x)(1 − F̂(x)) ) / √n }.
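The pointwise interval can be checked for coverage at a single x. Below is a minimal sketch with Uniform(0,1) data; the sample size, evaluation point, and seed are illustrative, and coverage is only approximate at finite n.

```python
import numpy as np

# Coverage of the pointwise 95% interval at x = 0.5 (illustrative settings).
rng = np.random.default_rng(2)
n, reps, x, z = 100, 5_000, 0.5, 1.96

Fhat = np.mean(rng.random((reps, n)) <= x, axis=1)
half = z * np.sqrt(Fhat * (1 - Fhat) / n)          # half-width of the interval
covered = (Fhat - half <= x) & (x <= Fhat + half)  # true F(x) = x here

print('pointwise coverage:', covered.mean())  # roughly 0.95
```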

Theorem 2.1: Dvoretzky-Kiefer-Wolfowitz (DKW) inequality

Let X1, . . . , Xn be i.i.d. from F. Then, for any ϵ > 0,

P( sup_x |F̂(x) − F(x)| > ϵ ) ≤ 2e^(−2nϵ²).

4
Lecture Notes Asymptotic Normality

From the DKW inequality, we can construct a confidence set for the full CDF.
A 1 − α nonparametric confidence band for F is (L(x), U(x)), where

L(x) = max{F̂(x) − ϵn, 0},
U(x) = min{F̂(x) + ϵn, 1},
ϵn = √( (1/(2n)) log(2/α) ).
### Confidence interval for full CDF function

# DKW band with alpha = 0.05 and n = 100.
epsilon = np.sqrt(np.log(2 / 0.05) / (2 * 100))

L_n = np.maximum(Fn - epsilon, 0)  # lower band, clipped at 0
U_n = np.minimum(Fn + epsilon, 1)  # upper band, clipped at 1

plt.plot(Fn)   # empirical CDF
plt.plot(L_n)  # lower band
plt.plot(U_n)  # upper band
plt.plot(F)    # true uniform CDF
plt.show()

[Figure: empirical CDF F̂ with the DKW lower and upper confidence bands and the true uniform CDF F.]

### Check confidence interval coverage for full CDF

# Simulate 1000 samples; H[i] counts grid points where the band misses F.
H = np.zeros(1000)

for i in range(1000):
    e = np.random.rand(100, 1)
    Fn = np.mean(e <= F.reshape(1, 100), axis=0)
    H[i] = np.sum(np.abs(Fn - F) > epsilon)  # grid points outside the band

# A sample is covered if the band contains F at every grid point.
print('The coverage is: {}'.format(1 - np.mean(H > 0)))

## The coverage is: 0.961
