Advanced Econometric Methods I: Lecture Notes On Bootstrap
We first discuss the empirical bootstrap for the sample mean, and then generalize the analysis
to linear regression. We then discuss algorithmic details and extensions.
1 Motivation
• Summary: The bootstrap is a simulation method for computing standard errors and
distributions of statistics of interest, which employs an estimated dgp (data generating
process) for generating artificial (bootstrap) samples and computing the (bootstrap)
draws of the statistic. Empirical or nonparametric bootstrap relies on nonparametric
estimates of the dgp, whereas parametric bootstrap relies on parametric estimates of
the dgp.
The bootstrap is a method for estimating the distribution of an estimator or test statistic
by resampling one's data. It amounts to treating the data as if they were the population for
the purpose of evaluating the distribution of interest. Under mild regularity conditions, the
bootstrap yields an approximation to the distribution of an estimator or test statistic that
is at least as accurate as the approximation obtained from first-order asymptotic theory.
Thus, the bootstrap provides a way to substitute computation for mathematical analysis if
calculating the asymptotic distribution of an estimator or statistic is difficult.
While the asymptotic distribution of the OLS estimator was quite simple to derive and
estimate, there are many statistics for which the asymptotic distribution is very hard to
estimate. Prominent examples include the distribution of complicated nonlinear functions
of parameters (e.g. impulse responses in vector autoregressive models, for which the delta
method is tedious) and the distribution of parameter estimates in complicated nonlinear models.
In fact, the bootstrap is often more accurate in finite samples than first-order asymptotic
approximations but does not entail the algebraic complexity of higher-order expansions.
Thus, it can provide a practical method for improving upon first-order approximations.
Such improvements are called asymptotic refinements. One use of the bootstrap's ability to
provide asymptotic refinements is bias reduction. It is not unusual for an asymptotically
unbiased estimator to have a large finite-sample bias; see, for example, the discussion in
Hayashi regarding the estimation of Ŝ. This bias may cause the estimator's finite-sample mean
square error to greatly exceed the mean-square error implied by its asymptotic distribution.
The bootstrap can be used to reduce the estimator's finite-sample bias and, thereby, its
finite-sample mean-square error. The bootstrap's ability to provide asymptotic refinements is
also important in hypothesis testing. First-order asymptotic theory often gives poor approx-
imations to the distributions of test statistics with the sample sizes available in applications.
As a result, the nominal probability that a test based on an asymptotic critical value rejects a
true null hypothesis can be very different from the true rejection probability. The bootstrap
often provides a tractable way to reduce or eliminate finite-sample errors in the rejection
probabilities of statistical tests.
The problem of obtaining critical values for test statistics is closely related to that of
obtaining confidence intervals. Accordingly, the bootstrap can also be used to obtain confi-
dence intervals with reduced errors in coverage probabilities. That is, the difference between
the true and nominal coverage probabilities is often lower when the bootstrap is used than
when first-order asymptotic approximations are used to obtain a confidence interval.
The purpose of these lecture notes is to explain and illustrate the usefulness of the boot-
strap in the context of simple examples. The presentation is informal and expository. Its aim is
to provide an intuitive understanding of how the bootstrap works and a feeling for its prac-
tical value in econometrics. The discussion in these notes does not provide a mathematically
detailed or rigorous treatment of the theory of the bootstrap. Such treatments are available
in the journal articles that are cited later in these notes.
The remainder is organized as follows. To illustrate the main idea we discuss the boot-
strap for the sample mean based on iid data. Then we extend these arguments to the linear
regression model (for iid data) and provide the relevant algorithms for practical implemen-
tation. Finally, we discuss some extensions.
2 Bootstrap for the sample mean
• In what follows we mostly focus on the empirical (or non-parametric) bootstrap; the
parametric bootstrap is discussed at the end.
Almost everything about the empirical bootstrap can be understood by studying the
bootstrap for the sample mean statistic
\[
\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i .
\]
We first show how one can explore the behavior of ȳ using simulation. Assume for simplicity
that {yi , i = 1, . . . , n} is a random sample with fixed distribution function F and with
the second moments bounded from above and variance bounded away from zero. This
characterizes the dgp sufficiently for understanding the standard behavior of its sample
means.
In the illustrations given below, we shall use the standard exponential distribution as F and the
sample size of n = 100. However, there is nothing special about this distribution and we
could have used other distributions with bounded second moments and non-zero variance to
illustrate our points.
Since we know the true dgp in this running example, we can in principle compute the
exact finite-sample distribution of the sample mean. However, setting aside special cases suitable
for textbook problems, the exact distribution of ȳ is not analytically tractable. Instead we
proceed by simulating the finite-sample distribution of ȳ. Our simulation will produce
the exact distribution, modulo simulation error, which we take as negligible. In Figure 1
we see the resulting finite sample distribution as well as the standard deviation (standard
error) for this distribution. The distribution is represented by a histogram computed over
S = 1000 simulated samples. The standard error here is computed over the simulated draws
of ȳ, namely
\[
\sqrt{\frac{1}{S}\sum_{s=1}^{S}\Big(\bar y_s - \frac{1}{S}\sum_{j=1}^{S}\bar y_j\Big)^2 } ,
\]
where ȳs is the sample mean of the sth simulated sample. This standard error is a numerical
approximation to the standard deviation of ȳ
\[
\sqrt{\mathrm{Var}(\bar y)} = \sqrt{\frac{1}{n}\,\mathrm{Var}(y_1)} ,
\]
which in the case of the standard exponential and n = 100 equals √(1/100) = 0.1. You see that the
standard error in the top panel of Figure 1 is very close.
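For concreteness, a minimal simulation sketch (in Python with numpy; the code is illustrative and not part of the formal development) that reproduces the numbers behind the top panel of Figure 1 is:

```python
import numpy as np

rng = np.random.default_rng(0)
n, S = 100, 1000                       # sample size and number of simulated samples

# draw S independent samples of size n from the known dgp (standard exponential)
ybar = rng.standard_exponential((S, n)).mean(axis=1)

print(ybar.mean())                     # close to E(y_1) = 1
print(ybar.std())                      # close to sqrt(Var(y_1)/n) = 0.1
```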
Although the exact distribution of ȳ is not available without knowledge of F , we know
that it is approximately normal by the central limit theorem. Thus, since E(ȳ) = E(y1 ), we
have that
\[
\sqrt{n}\,(\bar y - E(y_1)) \xrightarrow{\ d\ } N\big(0, \mathrm{Var}(y_1)\big) .
\]
Next we consider the empirical bootstrap. Now we want to understand the behavior
of the sample mean ȳ from an unknown dgp F with characteristics as above. Since we
do not know F , we cannot simulate from it. The main idea of the bootstrap is to replace
the unknown true dgp F with a good estimate F̂ . Empirical bootstrap uses the empirical
distribution F̂ , which assigns point-masses of 1/n to each of the data points {y1 , . . . , yn }.
¹The notation ∼a indicates that the distributions on the left and right are asymptotically equivalent. This
allows us to think about the large sample behavior of statistics without having to take the limit.
[Figure 1 here: histograms of the sample mean. Top panel titled "Using the true law F"; reported SD = 0.10231.]
Figure 1: True and bootstrap distributions of the mean of a standard exponential random
sample, with sample size equal to 100. Both distributions are approximately normal by the
central limit theorem, but centered at different points: the true distribution is centered at
the true mean and the bootstrap distribution is centered at the empirical mean.
In other words, a draw from F̂ is a discrete random variable that takes on the values {y1 , . . . , yn } with equal
probability 1/n. We proceed as above by simulating i.i.d. samples (bootstrap samples)
\[
\{y_i^*\}_{i=1}^{n} ,
\]
where the ∗ indicates that the variables are not simulated from the true distribution function
but instead from F̂ , the empirical distribution function. Notice that this is equivalent to
sampling from the original data randomly with replacement.2 Each bootstrap sample gives
us a bootstrap draw of the sample mean
\[
\bar y^* = \frac{1}{n}\sum_{i=1}^{n} y_i^* .
\]
We repeat this procedure many times (S = 1000) to construct many bootstrap samples and
hence many draws of this statistic.
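In code, the bootstrap loop is just as simple. The following sketch (again Python/numpy, purely illustrative) computes S = 1000 bootstrap draws of the sample mean from a single observed sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n, S = 100, 1000

y = rng.standard_exponential(n)        # the single observed sample {y_1, ..., y_n}
ybar = y.mean()

# empirical bootstrap: resample the data with replacement and recompute the mean
ybar_star = np.array([rng.choice(y, size=n, replace=True).mean() for _ in range(S)])

print(ybar_star.mean())                # close to the sample mean ybar
print(ybar_star.std())                 # bootstrap standard error of ybar
```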
In Figure 1 we see the finite sample distribution of ȳ ∗ (and standard error) for this
distribution. An important point is that this bootstrap distribution is computed conditional
on one draw of data {y1 , . . . , yn }, which you should think of as the original data sample.
We note that not only the standard deviations of the bootstrap draws ȳ ∗ and actual
draws ȳ look very similar, but also the overall distribution of bootstrap draws ȳ ∗ and the
actual draws ȳ looks very similar. This is not a coincidence.
To explain this we must keep in mind that yi∗ ∼ F̂ means that yi∗ |y1 , . . . , yn follows a
discrete distribution with possible outcomes y1 , . . . , yn that each occur with probability 1/n.
²On a computer you can implement this as follows. Randomly draw a variable u∗ ∼ U (0, 1) and compute
z = ⌊n u∗⌋ + 1, where ⌊·⌋ takes the integer part of the argument. Now set y_i∗ = y_z.
This implies that the mean of the bootstrap distribution of ȳ ∗ is
\[
E(\bar y^* \mid y_1,\ldots,y_n) = \frac{1}{n}\sum_{i=1}^{n} E(y_i^* \mid y_1,\ldots,y_n)
 = \frac{1}{n}\sum_{i=1}^{n}\Big(\sum_{j=1}^{n} y_j\, P(y_i^* = y_j)\Big)
 = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n} y_j = \bar y ,
\]
and the standard deviation of the bootstrap distribution is
\[
\sqrt{\mathrm{Var}(\bar y^* \mid y_1,\ldots,y_n)} = \sqrt{\frac{1}{n^2}\sum_{i=1}^{n}(y_i - \bar y)^2} ,
\]
which is simply the square root of the empirical variance divided by n. The latter follows as
\[
\mathrm{Var}(\bar y^* \mid y_1,\ldots,y_n) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(y_i^* \mid y_1,\ldots,y_n)
 = \frac{1}{n^2}\sum_{i=1}^{n}\Big(\sum_{j=1}^{n}(y_j-\bar y)^2\, P(y_i^* = y_j)\Big)
 = \frac{1}{n^2}\sum_{j=1}^{n}(y_j - \bar y)^2 .
\]
Since, by the law of large numbers,
\[
\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2 \xrightarrow{\ p\ } \mathrm{Var}(y_1) ,
\]
we have that the ratio of the bootstrap standard error to the actual standard error converges in
probability to 1. Thus, the similarity of the computed standard errors was not a coincidence.
Of course, we did not need the bootstrap to compute the standard errors of a sample mean,
but we will need it for less tractable cases.
We can approximate the exact distribution of ȳ ∗ , conditional on the data, by simulation.
Moreover, it is also approximately normal in large samples. By the central limit theorem
and the law of large numbers,
\[
\bar y^* \mid y_1,\ldots,y_n \ \overset{a}{\sim}\ N\Big(\bar y,\ \frac{1}{n^2}\sum_{i=1}^{n}(y_i-\bar y)^2\Big)
 \ \overset{a}{\sim}\ N\big(\bar y,\ \mathrm{Var}(y_1)/n\big) ,
\]
and equivalently
\[
\sqrt{n}\,(\bar y^* - \bar y)\mid y_1,\ldots,y_n \ \overset{a}{\sim}\ N\Big(0,\ \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar y)^2\Big)
 \ \overset{a}{\sim}\ N\big(0,\ \mathrm{Var}(y_1)\big) .
\]
Comparing this with the results for the original sample, we see that the approximate distributions of
√n(ȳ ∗ − ȳ)|y1 , . . . , yn and √n(ȳ − E(y1 )), namely N(0, (1/n)Σᵢ(yi − ȳ)²) and N(0, Var(y1 )),
are indeed close.
We summarize the discussion of the empirical bootstrap of the sample mean in the following
table.

                      Actual world                               Bootstrap world
  dgp                 F (unknown)                                F̂ (empirical distribution)
  sample              y1 , . . . , yn drawn iid from F           y1∗ , . . . , yn∗ drawn iid from F̂
  statistic           ȳ                                          ȳ ∗
  approximate law     √n(ȳ − E(y1 )) ∼a N (0, Var(y1 ))          √n(ȳ ∗ − ȳ)|y1 , . . . , yn ∼a N (0, (1/n)Σᵢ(yi − ȳ)²)
Thus, what we see in Figure 1 is not a coincidence: we conclude that the empirical
bootstrap ’works’ or ’is valid’ for the case of the sample mean. We formalize these statements
further below. It is clear that the reasoning about the approximate distributions for the
sample and the bootstrap sample extends to vector-valued yi ’s of fixed dimension.
It is also clear that the central limit theorem and approximate normality play a crucial
role in the above argument. The argument will generalize to a very large class of estimators
that are approximately linear and normal.
3 Bootstrap for the linear regression model
• In this section we consider the same assumptions as in Hayashi Chapter 2, but with
the modification that we additionally assume that {(yi , xi )} is an iid sequence, which
strengthens assumptions 2.2 and 2.5.
Under these assumptions the OLS estimator b satisfies
\[
\sqrt{n}\,(b - \beta) \xrightarrow{\ d\ } \Sigma_{xx}^{-1}\times N(0, S) = N\big(0, \mathrm{Avar}(b)\big) ,
\qquad \mathrm{Avar}(b) = \Sigma_{xx}^{-1} S \Sigma_{xx}^{-1} .
\]
We now want to construct a bootstrap draw b∗ using the bootstrap method that would allow
us to mimic the behavior of √n(b − β), namely
\[
\sqrt{n}\,(b^* - b)\,\big|\,(y, X) \xrightarrow{\ d\ } N\big(0, \mathrm{Avar}(b)\big) .
\]
Formally, we require that
\[
\sup_{A\in\mathcal{A}} \Big| P\big(\sqrt{n}\,(b-\beta)\in A\big) - P\big(N(0,\mathrm{Avar}(b))\in A\big) \Big| \to 0
\]
and
\[
\sup_{A\in\mathcal{A}} \Big| P\big(\sqrt{n}\,(b^*-b)\in A \,\big|\, y, X\big) - P\big(N(0,\mathrm{Avar}(b))\in A\big) \Big| \to 0 .
\]
From the triangle inequality, these two requirements are sufficient for the validity of the
bootstrap in the sense that
\[
\sup_{A\in\mathcal{A}} \Big| P\big(\sqrt{n}\,(b^*-b)\in A \,\big|\, y, X\big) - P\big(\sqrt{n}\,(b-\beta)\in A\big) \Big| \to 0 .
\]
The definition in terms of the two requirements above, however, emphasizes the link with approximate normality, which is
key to demonstrating that the empirical bootstrap works. In particular, to prove that a
bootstrap is valid, an econometrician would typically proceed by proving the two statements
separately.
We first discuss a quick way of bootstrapping the linear regression model which is based on
the following approximation.
\[
\begin{aligned}
\sqrt{n}\,(b - \beta) &= \Big(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\Big)^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i \\
&= \Big[\Big(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\Big)^{-1} - \Sigma_{xx}^{-1}\Big]\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i
  + \Sigma_{xx}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i \\
&= \Sigma_{xx}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g_i + o_p(1) ,
\end{aligned}
\]
where gi = xi εi and op (1) denotes a term that converges to zero in probability. The op (1) term
arises because (1/n)Σᵢ xi xi' →p Σxx and (1/√n)Σᵢ xi εi →d N (0, S), implying that the first term
in the second line converges to zero by Lemma 2.4(b) in Hayashi. Thus √n(b − β) is approximately
equal to the fixed matrix Σxx⁻¹ times √n times the sample mean of the gi 's.
Thus we could simply bootstrap the gi ’s. Indeed, let {g1∗ , . . . , gn∗ } denote the empirical
bootstrap draws from the sample {gi }, and define the bootstrap draw b∗ via the relation:
\[
\sqrt{n}\,(b^* - b) = \Sigma_{xx}^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(g_i^* - \bar g\big) ,
\]
where $\bar g = \frac{1}{n}\sum_{i=1}^{n} g_i$.
By the central limit theorem, the law of large numbers, and the smoothness of the Gaussian law,
we have the following properties:
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(g_i^* - \bar g\big)\,\Big|\,g \ \overset{a}{\sim}\ N(0, \hat S)\ \overset{a}{\sim}\ N(0, S) ,
\]
where $\hat S = \frac{1}{n}\sum_{i=1}^{n}(g_i - \bar g)(g_i - \bar g)'$ and $S = E(g_i g_i')$. These findings follow similarly as for
the sample mean case discussed above (keep in mind that I am assuming {(yi , xi )} to be an
iid sample).
This reasoning implies that our quick bootstrap is valid:
\[
\sqrt{n}\,(b^* - b)\,\big|\, y, X \ \overset{a}{\sim}\ \Sigma_{xx}^{-1}\times N(0, S) .
\]
In practice we need to replace Σxx and the gi 's with consistent estimators $S_{xx} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i'$
and $\hat g_i = x_i (y_i - x_i' b)$ such that
\[
S_{xx} - \Sigma_{xx} \xrightarrow{\ p\ } 0 \qquad\text{and}\qquad \frac{1}{n}\sum_{i=1}^{n}\|\hat g_i - g_i\|^2 \xrightarrow{\ p\ } 0 ,
\]
and define the feasible bootstrap draw via
\[
\sqrt{n}\,(b^* - b) = S_{xx}^{-1}\times \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(\hat g_i^* - \bar{\hat g}\big) ,
\]
where $\bar{\hat g} = \frac{1}{n}\sum_{i=1}^{n}\hat g_i$. Note that here we are bootstrapping the estimated errors xi ei . We
then have the following result.
Theorem 1 (Validity of the Quick Bootstrap for Linear Regression). Under regularity con-
ditions, the quick bootstrap method is valid. That is, the quick bootstrap method approxi-
mately implements the normal distributional approximation for the OLS estimator. More-
over, the bootstrap variance estimator $\hat V = n\,E\big((b^* - b)(b^* - b)' \mid y, X\big)$ is consistent, namely
$\hat V - \mathrm{Avar}(b) \xrightarrow{\ p\ } 0$.
Notice that in the theorem we define $\hat V = n\,E((b^* - b)(b^* - b)' \mid y, X)$ as the estimator.
This may seem odd, but keep in mind that the expectation is with respect to the bootstrap
law F̂ , which we can approximate arbitrarily accurately by drawing samples, as we show in the
section on algorithms below.
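To fix ideas, here is a minimal sketch of the quick bootstrap (Python/numpy, with simulated data; the variable names and data-generating choices are ours, purely for illustration). It resamples the estimated scores ĝi = xi ei and uses the bootstrap draws to estimate Avar(b)/n:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, S = 100, 3, 1000

# illustrative data from a linear model with non-normal errors
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.25]) + (rng.standard_exponential(n) - 1.0)

b = np.linalg.solve(X.T @ X, X.T @ y)          # OLS estimate
e = y - X @ b                                  # residuals
g_hat = X * e[:, None]                         # rows are g_hat_i = x_i e_i
Sxx_inv = np.linalg.inv(X.T @ X / n)
g_bar = g_hat.mean(axis=0)                     # zero (up to rounding) by the normal equations

b_star = np.empty((S, k))
for s in range(S):
    idx = rng.integers(0, n, size=n)           # resample the estimated scores with replacement
    b_star[s] = b + Sxx_inv @ (g_hat[idx].mean(axis=0) - g_bar)

V_over_n = (b_star - b).T @ (b_star - b) / S   # bootstrap estimate of Avar(b)/n
print(np.sqrt(np.diag(V_over_n)))              # bootstrap standard errors
```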
Instead of bootstrapping {xi ei } we could also bootstrap the entire procedure for obtaining
OLS estimates. That is, we sample pairs $\{(y_i^*, x_i^*)\}_{i=1}^{n}$ with replacement
from the observed sample {(yi , xi )}. Importantly, the pairs (yi∗ , xi∗ ) are sampled as pairs;
therefore this is sometimes referred to as the pairs bootstrap.
Based on the bootstrap sample we may compute
\[
b^* = \Big(\frac{1}{n}\sum_{i=1}^{n} x_i^* x_i^{*\prime}\Big)^{-1} \frac{1}{n}\sum_{i=1}^{n} x_i^* y_i^* .
\]
We have that
\[
b^* \,\big|\,(y, X) \xrightarrow{\ p\ } E\big(x_i^* x_i^{*\prime}\mid y, X\big)^{-1} E\big(x_i^* y_i^*\mid y, X\big)
\]
and
\[
E\big(x_i^* x_i^{*\prime}\mid y, X\big)^{-1} E\big(x_i^* y_i^*\mid y, X\big)
 = \Big(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\Big)^{-1} \frac{1}{n}\sum_{i=1}^{n} x_i y_i = b ,
\]
which again follows from similar arguments as used for the sample mean case.
This implies that the bootstrap coefficient b∗ is the OLS estimate in the linear projection
model (or more clearly put the bootstrap dgp) given by
\[
y_i^* = x_i^{*\prime} b + e_i^* , \qquad E\big(x_i^* e_i^*\mid y, X\big) = 0 ,
\]
where the moment restriction E(xi∗ ei∗ |y, X) = 0 follows by construction as
\[
E\big(x_i^* e_i^*\mid y, X\big) = E\big(x_i^*(y_i^* - x_i^{*\prime} b)\mid y, X\big)
 = E\big(x_i^* y_i^*\mid y, X\big) - E\big(x_i^* x_i^{*\prime}\mid y, X\big) E\big(x_i^* x_i^{*\prime}\mid y, X\big)^{-1} E\big(x_i^* y_i^*\mid y, X\big) = 0 .
\]
You should think about the bootstrap dgp as the model for (yi∗ , xi∗ ) that you obtain by
sampling (yi∗ , xi∗ ) from the empirical distribution of the sample. Note that the key difference is
that it replaces the unknown parameter β by the OLS estimate b. This is what is meant by
replacing the unknown data generating process by an estimate.
Now consider
\[
\sqrt{n}\,(b^* - b) = \Big(\frac{1}{n}\sum_{i=1}^{n} x_i^* x_i^{*\prime}\Big)^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i^* e_i^* .
\]
By the (conditional) law of large numbers,
\[
\frac{1}{n}\sum_{i=1}^{n} x_i^* x_i^{*\prime}\,\Big|\,(y, X) \xrightarrow{\ p\ } \Sigma_{xx} ,
\]
and by the (conditional) central limit theorem,
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i^* e_i^*\,\Big|\,(y, X) \ \overset{a}{\sim}\ N(0, S) ,
\]
so that
\[
\sqrt{n}\,(b^* - b)\,\big|\,(y, X) \ \overset{a}{\sim}\ \Sigma_{xx}^{-1}\times N(0, S) .
\]
Theorem 2 (Validity of the Slow Bootstrap for Linear Regression). Under regularity condi-
tions, the slow (pairs) bootstrap method is valid. That is, the slow bootstrap method approximately
implements the normal distributional approximation for the OLS estimator. Moreover, the
bootstrap variance estimator $\hat V = n\,E\big((b^* - b)(b^* - b)' \mid y, X\big)$ is consistent, namely that
$\hat V - \mathrm{Avar}(b) \xrightarrow{\ p\ } 0$.
Formal proofs of Theorems 1 and 2 follow exactly the same steps, but stating the conver-
gence results formally requires a little more care. For instance, the statement
$\frac{1}{n}\sum_{i=1}^{n} x_i^* x_i^{*\prime}\,\big|\,(y, X) \xrightarrow{p} \Sigma_{xx}$ is true, but the mode of convergence is slightly different, as the bootstrap mean has
a conditional distribution (given the data), so we are actually using a conditional law of large
numbers, which you can think about as a law of large numbers under the bootstrap law F̂ .
You do not need to worry about these issues for this lecture. You can find an accessible
treatment in Chapter 10 of Hansen's textbook.
4 Algorithms
A basic use of the bootstrap is the estimation of standard errors and the construction of confi-
dence intervals. The following algorithms give the details.
Computing Bootstrap Standard Errors:
(i) Obtain many bootstrap draws b∗s of the estimator b, where the index s = 1, . . . , S
enumerates the bootstrap draws.
(ii) Compute the bootstrap variance estimate
\[
\hat V / n = \frac{1}{S}\sum_{s=1}^{S} (b_s^* - b)(b_s^* - b)' .
\]
(iii) Report $(\hat V/n)_{jj}^{1/2}$ as the standard error for bj , for j = 1, . . . , K.ᵃ

ᵃNote that we defined V̂ as the variance estimate for √n(b∗ − b); the 1/n comes from this.
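A compact implementation of this algorithm using the pairs (slow) bootstrap might look as follows (Python/numpy sketch; the function name and defaults are ours, purely illustrative):

```python
import numpy as np

def pairs_bootstrap(y, X, S=1000, seed=0):
    """Bootstrap draws and standard errors for OLS via the pairs bootstrap."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)           # OLS on the original sample
    b_star = np.empty((S, k))
    for s in range(S):
        idx = rng.integers(0, n, size=n)            # step (i): resample pairs (y_i, x_i)
        Xs, ys = X[idx], y[idx]
        b_star[s] = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    V_over_n = (b_star - b).T @ (b_star - b) / S    # step (ii): V_hat / n
    se = np.sqrt(np.diag(V_over_n))                 # step (iii): standard errors
    return b, b_star, se
```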
Computing Bootstrap Confidence Intervals:
(i) Obtain many bootstrap draws b∗s of the estimator b, where the index s = 1, . . . , S
enumerates the bootstrap draws.
(ii) Compute the bootstrap variance estimate
\[
\hat V / n = \frac{1}{S}\sum_{s=1}^{S} (b_s^* - b)(b_s^* - b)' .
\]
(iii) Report either the normal-approximation (1 − α) confidence interval
\[
\big[\, b_j - z_{\alpha/2}\, (\hat V/n)_{jj}^{1/2},\ \ b_j + z_{\alpha/2}\, (\hat V/n)_{jj}^{1/2} \,\big] ,
\]
where z_{α/2} is the (1 − α/2) quantile of the standard normal distribution, or the percentile interval
\[
\big[\, b^*_{j,(\alpha/2)},\ \ b^*_{j,(1-\alpha/2)} \,\big] ,
\]
where b∗j,(α) is the α empirical quantile of the jth coefficient of the sample of
bootstrap draws.
To clarify the percentile confidence interval, suppose that S = 1000 and α = 0.05. Then we
would use b∗j,(25) and b∗j,(975) ; in words, b∗j,(25) is the 25th smallest b∗j,s for s = 1, . . . , S and b∗j,(975)
is the 975th smallest b∗j,s for s = 1, . . . , S.
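Given the bootstrap draws (for instance from the pairs_bootstrap sketch above), both intervals can be computed as follows (illustrative Python; scipy is assumed to be available for the normal quantile):

```python
import numpy as np
from scipy.stats import norm

def bootstrap_confidence_intervals(b, b_star, alpha=0.05):
    """Normal-approximation and percentile intervals from bootstrap draws b_star."""
    se = np.sqrt(np.mean((b_star - b) ** 2, axis=0))      # (V_hat/n)_{jj}^{1/2}
    z = norm.ppf(1 - alpha / 2)                           # z_{alpha/2}
    normal_ci = np.column_stack([b - z * se, b + z * se])
    percentile_ci = np.column_stack([
        np.quantile(b_star, alpha / 2, axis=0),           # b*_{j,(alpha/2)}
        np.quantile(b_star, 1 - alpha / 2, axis=0),       # b*_{j,(1-alpha/2)}
    ])
    return normal_ci, percentile_ci
```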
5 Extensions
An alternative to the empirical bootstrap is the parametric bootstrap. Here one posits a
parametric model for the dgp, for instance the exponential distribution exp(λ) in our sample
mean example, estimates the parameter λ from the data, and then generates the bootstrap
samples from the fitted distribution exp(λ̂) instead of from the empirical distribution F̂ .
Now clearly this will work if exp(λ) is indeed the true distribution, but if the true dis-
tribution is different the parametric bootstrap will not be valid in general. Therefore the
empirical (non-parametric) bootstrap is preferable in most empirical work.
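For contrast, a minimal sketch of the parametric bootstrap for the mean of an exponential sample (illustrative Python; it presumes the exponential family is correctly specified):

```python
import numpy as np

rng = np.random.default_rng(0)
n, S = 100, 1000

y = rng.standard_exponential(n)          # observed sample
lam_hat = 1.0 / y.mean()                 # MLE of the exponential rate parameter

# parametric bootstrap: simulate new samples from the *fitted* exp(lam_hat) distribution
ybar_star = rng.exponential(scale=1.0 / lam_hat, size=(S, n)).mean(axis=1)
print(ybar_star.std())                   # parametric bootstrap standard error of the mean
```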
Another important extension concerns nonlinear functions of the regression coefficients. Sup-
pose the parameter of interest is θ = a(β), where a(·) is a continuously differentiable function.
We obtain a natural estimator of θ by using the plug-in principle, namely we plug in the
estimator b instead of β:
\[
\hat\theta = a(b) .
\]
Next we can think of bootstrapping the estimator θ̂. A natural way to define the bootstrap
draw θ̂ ∗ is to apply the transformation a(·) to the bootstrap draw of the OLS estimator b∗ ,
that is
θ̂ ∗ = a(b∗ )
This works. Bootstrapping smooth functionals of vector means provides a valid distributional
approximation, namely that
\[
\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{\ d\ } N\big(0, \mathrm{Avar}(\hat\theta)\big)
\qquad\text{and}\qquad
\sqrt{n}\,(\hat\theta^* - \hat\theta)\,\big|\,(y, X) \xrightarrow{\ d\ } N\big(0, \mathrm{Avar}(\hat\theta)\big) ,
\]
where
\[
\mathrm{Avar}(\hat\theta) = A(\beta)\,\mathrm{Avar}(b)\,A(\beta)' , \qquad A(\beta) = \frac{\partial a(\beta)}{\partial \beta'} .
\]
The approximate distribution of √n(θ̂ − θ) is obtained by the delta method; see Hayashi,
page 93. Similar arguments can be used to derive the approximate distribution of √n(θ̂ ∗ −
θ̂)|(y, X).
The main practical benefit of this result is that standard errors and confidence intervals
for complicated nonlinear functions can be computed in a simple way from the bootstrap
sample of θ̂s∗ , for s = 1, . . . , S. In particular, the algorithms of the previous section apply
and no analytical computation of the derivative is required.
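As an illustration, suppose the parameter of interest is the ratio of two slope coefficients (the choice of a(·) here is ours, purely for illustration). Given the bootstrap draws b∗s , e.g. from the pairs_bootstrap sketch above, the bootstrap standard error and percentile interval for θ̂ are obtained without any derivatives:

```python
import numpy as np

def a(coef):
    # illustrative nonlinear function: ratio of the 2nd and 3rd coefficients
    return coef[..., 1] / coef[..., 2]

def bootstrap_nonlinear(b, b_star, alpha=0.05):
    theta_hat = a(b)
    theta_star = a(b_star)                         # apply a(.) to every bootstrap draw
    se = np.sqrt(np.mean((theta_star - theta_hat) ** 2))
    ci = (np.quantile(theta_star, alpha / 2),
          np.quantile(theta_star, 1 - alpha / 2))
    return theta_hat, se, ci
```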
So far we only discussed the bootstrap for iid data. When there is dependence in the data
the implementation of the bootstrap changes as sampling the pairs (yi , xi ) independently is
no longer appropriate. This would ignore the dependence in the sample and lead to incorrect
standard errors and confidence sets.
The idea is to divide data in blocks, where dependence is preserved within blocks, and
then bootstrap the blocks, treating them as independent units of observations. Here we
provide a brief description of the construction of the bootstrap samples. We refer to Horowitz
(2001) for assumptions and theoretical results.
Let us assume that we want to draw a bootstrap sample from a stationary series {y1 , . . . , yn }
where the dependence is not too strong. The blocks of data can be overlapping or
non-overlapping. We focus on the non-overlapping case. Let $y_i^j = (y_i, y_{i+1}, \ldots, y_j)$ for
i < j denote a block, and let s = j − i + 1 be its size. We assume for simplicity that n = sd for some
integer d. We construct the bootstrap sample {y1∗ , . . . , yn∗ } by stacking d blocks randomly
drawn with replacement from $\{y_1^{s}, y_{s+1}^{2s}, \ldots, y_{(d-1)s+1}^{ds}\}$. The block size s should be chosen
such that s → ∞ and s = O(n^{1/3}) as n → ∞.
With this construction of the bootstrap sample in hand we can proceed as for the iid
case above.
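A minimal sketch of constructing one non-overlapping block bootstrap sample (illustrative Python; the helper name is ours, and we simply drop the last few observations if n is not an exact multiple of the block size):

```python
import numpy as np

def block_bootstrap_sample(y, block_size, rng):
    """One non-overlapping block bootstrap sample of the series y."""
    n = len(y)
    d = n // block_size                        # number of blocks (n = s * d up to truncation)
    blocks = y[: d * block_size].reshape(d, block_size)
    idx = rng.integers(0, d, size=d)           # draw d block indices with replacement
    return blocks[idx].reshape(-1)             # stack the resampled blocks

# usage: y_star = block_bootstrap_sample(y, block_size=5, rng=np.random.default_rng(0))
```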
The bootstrap method was introduced by Bradley Efron. Pioneering work in the development
of asymptotic theory for the bootstrap includes Bickel & Freedman (1981) and Giné & Zinn
(1990). Hall (1992) studied the higher-order properties of the bootstrap. For applications to
econometrics, including the GMM setting that we discuss in Hayashi Chapter 3, see Horowitz's
(2001) chapter in the Handbook of Econometrics.
• Peter J. Bickel and David A. Freedman. Some asymptotic theory for the bootstrap.
Ann. Statist., 9(6):1196-1217, 1981.
• B. Efron. Bootstrap methods: another look at the jackknife. Ann. Statist., 7(1):1-26,
1979.
• Evarist Giné and Joel Zinn. Bootstrapping general empirical measures. Ann. Probab.,
18(2):851-869, 1990.
• Peter Hall. The bootstrap and Edgeworth expansion. Springer Series in Statistics.
Springer-Verlag, New York, 1992.
• Peter Hall and Joel L. Horowitz. Bootstrap critical values for tests based on generalized-
method-of-moments estimators. Econometrica, 64(4):891-916, 1996.
• Joel L. Horowitz. The bootstrap. In James J. Heckman and Edward Leamer, editors,
Handbook of Econometrics. Volume 5. Elsevier: North-Holland, 2001.