
Advanced Econometric Methods I:

Lecture notes on Bootstrap

October 31, 2020

We first discuss the empirical bootstrap for the sample mean, and then generalize the analysis
to linear regression. We then discuss algorithmic details and extensions.

1 Motivation

• Summary: The bootstrap is a simulation method for computing standard errors and
distributions of statistics of interest, which employs an estimated dgp (data generating
process) for generating artificial (bootstrap) samples and computing the (bootstrap)
draws of the statistic. The empirical (or nonparametric) bootstrap relies on nonparametric
estimates of the dgp, whereas the parametric bootstrap relies on parametric estimates of
the dgp.

The bootstrap is a method for estimating the distribution of an estimator or test statistic
by resampling one's data. It amounts to treating the data as if they were the population for
the purpose of evaluating the distribution of interest. Under mild regularity conditions, the
bootstrap yields an approximation to the distribution of an estimator or test statistic that
is at least as accurate as the approximation obtained from first-order asymptotic theory.
Thus, the bootstrap provides a way to substitute computation for mathematical analysis if
calculating the asymptotic distribution of an estimator or statistic is difficult.
While the asymptotic distribution of the OLS estimator was quite simple to derive and estimate, there are many statistics for which the asymptotic distribution is very hard to estimate. Prominent examples include the distribution of complicated nonlinear functions of parameters (e.g. impulse responses in vector autoregressive models, where applying the delta method is no fun) and the distribution of parameter estimates in complicated nonlinear models.
In fact, the bootstrap is often more accurate in finite samples than first-order asymptotic approximations, but it does not entail the algebraic complexity of higher-order expansions. Thus, it can provide a practical method for improving upon first-order approximations. Such improvements are called asymptotic refinements. One use of the bootstrap's ability to provide asymptotic refinements is bias reduction. It is not unusual for an asymptotically unbiased estimator to have a large finite-sample bias; see for example the discussion in Hayashi regarding the estimation of Ŝ. This bias may cause the estimator's finite-sample mean-square error to greatly exceed the mean-square error implied by its asymptotic distribution. The bootstrap can be used to reduce the estimator's finite-sample bias and, thereby, its finite-sample mean-square error. The bootstrap's ability to provide asymptotic refinements is also important in hypothesis testing. First-order asymptotic theory often gives poor approximations to the distributions of test statistics with the sample sizes available in applications. As a result, the nominal probability that a test based on an asymptotic critical value rejects a true null hypothesis can be very different from the true rejection probability. The bootstrap often provides a tractable way to reduce or eliminate finite-sample errors in the rejection probabilities of statistical tests.
The problem of obtaining critical values for test statistics is closely related to that of obtaining confidence intervals. Accordingly, the bootstrap can also be used to obtain confidence intervals with reduced errors in coverage probabilities. That is, the difference between the true and nominal coverage probabilities is often lower when the bootstrap is used than when first-order asymptotic approximations are used to obtain a confidence interval.
The purpose of these lecture notes is to explain and illustrate the usefulness of the bootstrap in the context of simple examples. The presentation is informal and expository. Its aim is to provide an intuitive understanding of how the bootstrap works and a feeling for its practical value in econometrics. The discussion in these notes does not provide a mathematically detailed or rigorous treatment of the theory of the bootstrap. Such treatments are available in the journal articles that are cited later in these notes.
The remainder is organized as follows. To illustrate the main idea we discuss the bootstrap for the sample mean based on iid data. Then we extend these arguments to the linear regression model (for iid data) and provide the relevant algorithms for practical implementation. Finally, we discuss some extensions.

2 Bootstrapping for the sample mean

• In what follows we mostly focus on the empirical (or non-parametric) bootstrap; the
parametric bootstrap is discussed at the end.

Almost everything about the empirical bootstrap can be understood by studying the bootstrap for the sample mean statistic

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i .$$

We first show how one can explore the behavior of ȳ using simulation. Assume for simplicity that $\{y_i, i = 1, \dots, n\}$ is a random sample with fixed distribution function F, with second moment bounded from above and variance bounded away from zero. This characterizes the dgp sufficiently for understanding the standard behavior of its sample means.
In the illustrations given below, we use the standard exponential distribution as F and a sample size of n = 100. However, there is nothing special about this distribution, and we could have used other distributions with bounded second moments and non-zero variance to illustrate our points.
Since we know the true dgp in this running example, we can in principle compute the exact finite-sample distribution of the sample mean. However, setting aside special cases suitable for textbook problems, the exact distribution of ȳ is not analytically tractable. Instead we proceed by simulating the finite-sample distribution of ȳ. Our simulation will produce the exact distribution, modulo numerical error, which we take as negligible. The top panel of Figure 1 shows the resulting finite-sample distribution as well as its standard deviation (standard error). The distribution is represented by a histogram computed over S = 1000 simulated samples. The standard error here is computed over the simulated draws of ȳ, namely

$$\sqrt{\frac{1}{S}\sum_{s=1}^{S}\Big(\bar{y}_s - \frac{1}{S}\sum_{j=1}^{S}\bar{y}_j\Big)^2},$$

where $\bar{y}_s$ is the sample mean of the $s$th simulated sample. This standard error is a numerical approximation to the standard deviation of ȳ,

$$\sqrt{\operatorname{Var}(\bar{y})} = \sqrt{\frac{1}{n}\operatorname{Var}(y_1)},$$

which in the case of the standard exponential (for which $\operatorname{Var}(y_1) = 1$) and n = 100 is $\sqrt{1/100} = 0.1$. The standard error in the top panel of Figure 1 (0.10231) is indeed very close.
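A minimal sketch of this Monte Carlo experiment, assuming NumPy is available; the function name `simulate_mean_distribution` and the seed are illustrative choices, not part of the notes. Because the random draws differ, the resulting standard error should be close to, but not identical with, the value reported in Figure 1.

```python
import numpy as np

def simulate_mean_distribution(n=100, S=1000, seed=0):
    """Draw S samples of size n from the standard exponential (the known F of
    the running example) and return the S simulated sample means."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale=1.0, size=(S, n))  # row s is the s-th sample
    return samples.mean(axis=1)

ybar_draws = simulate_mean_distribution()
# Standard error over the simulated draws, dividing by S as in the formula above
se_sim = np.sqrt(np.mean((ybar_draws - ybar_draws.mean()) ** 2))
se_exact = np.sqrt(1.0 / 100)  # sqrt(Var(y_1) / n) with Var(y_1) = 1
print(f"simulated SE = {se_sim:.4f}, exact SD = {se_exact:.4f}")
```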
Although the exact distribution of ȳ is not available without knowledge of the true F, we know that it is approximately normal by the central limit theorem. Thus, since $E(\bar{y}) = E(y_1)$, we have that

$$\sqrt{n}\,(\bar{y} - E(y_1)) \xrightarrow{d} N(0, \operatorname{Var}(y_1)),$$

or, stated equivalently,¹

$$\bar{y} \overset{a}{\sim} N\big(E(y_1), \operatorname{Var}(y_1)/n\big).$$

Next we consider the empirical bootstrap. Now we want to understand the behavior of the sample mean ȳ from an unknown dgp F with characteristics as above. Since we do not know F, we cannot simulate from it. The main idea of the bootstrap is to replace the unknown true dgp F with a good estimate F̂. The empirical bootstrap uses the empirical distribution F̂, which assigns point masses of 1/n to each of the data points $\{y_1, \dots, y_n\}$.

¹ The notation $\overset{a}{\sim}$ means that the left- and right-hand sides have the same asymptotic distribution. This allows us to think about the large-sample behavior of statistics without having to take the limit.
[Figure 1. Top panel: "Using the true law F_0" (SD = 0.10231). Bottom panel: "Using the bootstrap law F̂" (SD = 0.096677). Histograms of the simulated sample means; axis ticks omitted.]

Figure 1: True and bootstrap distributions of the mean of a standard exponential random sample, with the sample size equal to 100. Both distributions are approximately normal by the central limit theorem, but centered at different points: the true distribution is centered at the true mean and the bootstrap distribution is centered at the empirical mean.
In other words, a draw from F̂ is a discrete (multinomial) random variable that takes on the values $\{y_1, \dots, y_n\}$ with equal probability 1/n. We proceed as above by simulating i.i.d. samples (bootstrap samples)

$$\{y_i^*\}_{i=1}^{n},$$

where the $*$ indicates that the variables are not simulated from the true distribution function but instead from F̂, the empirical distribution function. Notice that this is equivalent to sampling randomly with replacement from the original data.² Each bootstrap sample gives us a bootstrap draw of the sample mean

$$\bar{y}^* = \frac{1}{n}\sum_{i=1}^{n} y_i^* .$$

We repeat this procedure many times (S = 1000) to construct many bootstrap samples and hence many draws of this statistic.
The bottom panel of Figure 1 shows the finite-sample distribution of ȳ* together with its standard error. An important point is that this bootstrap distribution is computed conditional on one draw of data $\{y_1, \dots, y_n\}$, which you should think of as the original data sample.
We note that not only do the standard deviations of the bootstrap draws ȳ* and the actual draws ȳ look very similar, but the overall distributions of the bootstrap draws ȳ* and the actual draws ȳ also look very similar. This is not a coincidence.
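The resampling step can be sketched as follows, assuming NumPy; `bootstrap_means` is a hypothetical helper name. It draws indices from a uniform variable u*, in the spirit of footnote 2, using the 0-based index z = ⌊n u*⌋, which is equivalent to calling `rng.integers(0, n, size=...)`. It then compares the simulated bootstrap standard error with the closed-form conditional standard deviation of ȳ* derived below.

```python
import numpy as np

def bootstrap_means(y, S=1000, seed=1):
    """Empirical bootstrap of the sample mean: draw S samples of size n from
    F-hat (mass 1/n on each observed y_i) and return their means."""
    rng = np.random.default_rng(seed)
    n = len(y)
    u = rng.uniform(size=(S, n))                         # u* ~ U(0, 1), one per draw
    z = np.minimum(np.floor(n * u).astype(int), n - 1)   # 0-based index, uniform on {0, ..., n-1}
    return y[z].mean(axis=1)                             # row s is the s-th bootstrap sample

rng = np.random.default_rng(0)
y = rng.exponential(size=100)        # the single "original" data set
ybar_star = bootstrap_means(y)
se_boot = ybar_star.std()            # divides by S (ddof=0)
# Closed form: sqrt((1/n^2) * sum_i (y_i - ybar)^2)
se_analytic = np.sqrt(np.sum((y - y.mean()) ** 2)) / len(y)
print(f"bootstrap SE = {se_boot:.4f}, closed-form value = {se_analytic:.4f}")
```

The `np.minimum` guard only protects against a floating-point edge case; in practice `rng.integers` is the more idiomatic way to draw the indices.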
To explain this we must keep in mind that $y_i^* \sim \hat{F}$ means that $y_i^* \mid y_1, \dots, y_n$ follows a discrete distribution with possible outcomes $y_1, \dots, y_n$ that each occur with probability 1/n.

² On a computer you can implement this as follows. Randomly draw a variable $u^* \sim U(0,1)$ and compute $z = \lfloor n u^* \rfloor + 1$, where $\lfloor \cdot \rfloor$ takes the integer part of the argument, so that z is uniformly distributed on {1, ..., n}. Now set $y_i^* = y_z$.
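This implies that the mean of the bootstrap distribution of ȳ* is

$$\begin{aligned}
E(\bar{y}^* \mid y_1, \dots, y_n) &= \frac{1}{n}\sum_{i=1}^{n} E(y_i^* \mid y_1, \dots, y_n) \\
&= \frac{1}{n}\sum_{i=1}^{n}\Big(\sum_{j=1}^{n} y_j\, P(y_i^* = y_j)\Big) \\
&= \frac{1}{n}\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n} y_j = \bar{y}.
\end{aligned}$$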

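Similarly, the standard deviation of the bootstrap distribution of ȳ* is

$$\sqrt{\operatorname{Var}(\bar{y}^* \mid y_1, \dots, y_n)} = \sqrt{\frac{1}{n^2}\sum_{i=1}^{n}(y_i - \bar{y})^2},$$

which is simply the square root of the empirical variance divided by n. The latter follows, using the conditional independence of the $y_i^*$, as

$$\begin{aligned}
\operatorname{Var}(\bar{y}^* \mid y_1, \dots, y_n) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(y_i^* \mid y_1, \dots, y_n) \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\Big(\sum_{j=1}^{n}(y_j - \bar{y})^2\, P(y_i^* = y_j)\Big) \\
&= \frac{1}{n^2}\sum_{j=1}^{n}(y_j - \bar{y})^2 .
\end{aligned}$$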
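Because F̂ is discrete, these two formulas can also be checked exactly for a tiny data set by enumerating all n^n equally likely ordered bootstrap samples. The following sketch does this for an illustrative three-point sample; the data values and the helper name `exact_bootstrap_moments` are made up for the example.

```python
import itertools
import numpy as np

def exact_bootstrap_moments(y):
    """Enumerate all n**n equally likely ordered bootstrap samples and return the
    exact conditional mean and variance of ybar* given the data."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    means = np.array([y[list(idx)].mean()
                      for idx in itertools.product(range(n), repeat=n)])
    return means.mean(), means.var()   # var() divides by n**n: the exact variance

y = np.array([1.0, 2.0, 6.0])          # illustrative data, ybar = 3
m, v = exact_bootstrap_moments(y)
closed_form_var = np.sum((y - y.mean()) ** 2) / len(y) ** 2   # (1/n^2) sum (y_i - ybar)^2 = 14/9
print(np.isclose(m, y.mean()), np.isclose(v, closed_form_var))  # True True
```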

By the law of large numbers and some simple calculations,

$$\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 \xrightarrow{p} \operatorname{Var}(y_1),$$

so that the ratio of the bootstrap standard error and the actual standard error converges in probability to 1. Thus, the similarity of the computed standard errors was not a coincidence.
Of course, we did not need the bootstrap to compute the standard errors of a sample mean,
but we will need it for less tractable cases.
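We can approximate the exact distribution of ȳ*, conditional on the data, by simulation. Moreover, it is also approximately normal in large samples. By the central limit theorem and the law of large numbers,

$$\begin{aligned}
\bar{y}^* \mid y_1, \dots, y_n &\overset{a}{\sim} N\Big(\bar{y}, \frac{1}{n^2}\sum_{i=1}^{n}(y_i - \bar{y})^2\Big) \overset{a}{\sim} N\big(\bar{y}, \operatorname{Var}(y_1)/n\big), \\
\sqrt{n}\,(\bar{y}^* - \bar{y}) \mid y_1, \dots, y_n &\overset{a}{\sim} N\Big(0, \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\Big) \overset{a}{\sim} N\big(0, \operatorname{Var}(y_1)\big).
\end{aligned}$$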

Thus, we find that

(i) The approximate distributions of $\sqrt{n}(\bar{y}^* - \bar{y}) \mid y_1, \dots, y_n$ and $\sqrt{n}(\bar{y} - E(y_1))$, namely $N\big(0, \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\big)$ and $N(0, \operatorname{Var}(y_1))$, are indeed close.

(ii) This means that their finite-sample distributions are close as well, at least in large samples.

We summarize the discussion of the empirical bootstrap of the sample mean in the following table.

world     | dgp | sample                  | statistic | approximate distribution
real      | F   | $y_1, \dots, y_n$       | ȳ         | $\sqrt{n}(\bar{y} - E(y_1)) \overset{a}{\sim} N(0, \operatorname{Var}(y_1))$
bootstrap | F̂   | $y_1^*, \dots, y_n^*$   | ȳ*        | $\sqrt{n}(\bar{y}^* - \bar{y}) \mid y_1, \dots, y_n \overset{a}{\sim} N\big(0, \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\big)$

Thus, what we see in Figure 1 is not a coincidence: we conclude that the empirical
bootstrap ’works’ or ’is valid’ for the case of the sample mean. We formalize these statements
further below. It is clear that the reasoning about the approximate distributions for the
sample and the bootstrap sample extends to vector-valued yi ’s of fixed dimension.
It is also clear that the central limit theorem and approximate normality play a crucial
role in the above argument. The argument will generalize to a very large class of estimators
that are approximately linear and normal.

3 Bootstrap for the linear regression model

• In this section we maintain the same assumptions as in Hayashi Chapter 2, with
the modification that we additionally assume that {(y_i, x_i)} is an iid sequence, which
strengthens Assumptions 2.2 and 2.5.

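In Chapter 2 of Hayashi we established that

$$\sqrt{n}\,(b - \beta) \xrightarrow{d} \Sigma_{xx}^{-1} \times N(0, S) = N(0, \operatorname{Avar}(b)), \qquad \operatorname{Avar}(b) = \Sigma_{xx}^{-1} S\, \Sigma_{xx}^{-1}.$$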

We now want to construct a bootstrap draw b* using the bootstrap method that would allow us to mimic the behavior of $\sqrt{n}(b - \beta)$, namely

$$\sqrt{n}\,(b^* - b) \mid (y, X) \xrightarrow{d} N(0, \operatorname{Avar}(b)).$$

Formally, we require the following.

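Definition 1 (Validity of Bootstrap). A bootstrap method producing $b^* \in \mathbb{R}^k$ for some true parameter $\beta \in \mathbb{R}^k$ conditional on data (y, X) is valid if the following two conditions hold:

$$\sup_{A \in \mathcal{A}} \big| P(\sqrt{n}(b - \beta) \in A) - P(N(0, \operatorname{Avar}(b)) \in A) \big| \to 0$$

and

$$\sup_{A \in \mathcal{A}} \big| P(\sqrt{n}(b^* - b) \in A \mid y, X) - P(N(0, \operatorname{Avar}(b)) \in A) \big| \xrightarrow{p} 0,$$

where $\mathcal{A}$ is the class of all convex sets in $\mathbb{R}^k$. (The second condition involves a conditional probability, which is random through the data; hence the convergence is in probability.)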

Given the first condition, the triangle inequality shows that a sufficient condition for the validity of the bootstrap is

$$\sup_{A \in \mathcal{A}} \big| P(\sqrt{n}(b^* - b) \in A \mid y, X) - P(\sqrt{n}(b - \beta) \in A) \big| \xrightarrow{p} 0 .$$

The previous definition, however, emphasizes the link with approximate normality, which is key to demonstrating that the empirical bootstrap works. In particular, to prove that a bootstrap is valid, an econometrician would typically proceed by proving both statements in the definition separately.

3.1 A quick bootstrap for linear regression

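We first discuss a quick way of bootstrapping the linear regression model, which is based on the following approximation. Writing $g_i = x_i \varepsilon_i$,

$$\begin{aligned}
\sqrt{n}\,(b - \beta) &= \Big(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\Big)^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i \\
&= \bigg[\Big(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\Big)^{-1} - \Sigma_{xx}^{-1}\bigg] \frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i + \Sigma_{xx}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i \\
&= \Sigma_{xx}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g_i + o_p(1),
\end{aligned}$$

where $o_p(1)$ denotes a term that converges to zero in probability. It arises because $\frac{1}{n}\sum_{i=1}^{n} x_i x_i' \xrightarrow{p} \Sigma_{xx}$ and $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i \xrightarrow{d} N(0, S)$, which implies that the first term in the second line converges to zero in probability by Lemma 2.4(b) in Hayashi. Thus $\sqrt{n}(b - \beta)$ is approximately equal to $\sqrt{n}$ times the sample mean of the $g_i$'s, premultiplied by the fixed matrix $\Sigma_{xx}^{-1}$.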
Thus we could simply bootstrap the $g_i$'s. Indeed, let $\{g_1^*, \dots, g_n^*\}$ denote the empirical bootstrap draws from the sample $\{g_i\}$, and define the bootstrap draw b* via the relation

$$\sqrt{n}\,(b^* - b) = \Sigma_{xx}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(g_i^* - \bar{g}),$$

where $\bar{g} = \frac{1}{n}\sum_{i=1}^{n} g_i$.
By the central limit theorem, the law of large numbers, and smoothness of the Gaussian law, we have the following properties:

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(g_i^* - \bar{g}) \,\Big|\, g \;\overset{a}{\sim}\; N(0, \hat{S}) \;\overset{a}{\sim}\; N(0, S),$$

where $\hat{S} = \frac{1}{n}\sum_{i=1}^{n}(g_i - \bar{g})(g_i - \bar{g})'$ and $S = E(g_i g_i')$. These findings follow similarly as for the sample mean case discussed above (keep in mind that I am assuming {(y_i, x_i)} to be an iid sample).
This reasoning implies that our quick bootstrap is valid:

$$\sqrt{n}\,(b^* - b) \mid y, X \overset{a}{\sim} \Sigma_{xx}^{-1} \times N(0, S).$$

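In practice we need to replace $\Sigma_{xx}$ and the $g_i$'s with consistent estimators $S_{xx} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i'$ and $\hat{g}_i = x_i(y_i - x_i' b)$ such that

$$S_{xx} - \Sigma_{xx} \xrightarrow{p} 0 \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n}\lVert \hat{g}_i - g_i \rVert^2 \xrightarrow{p} 0,$$

and then define the bootstrap draws via

$$\sqrt{n}\,(b^* - b) = S_{xx}^{-1} \times \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\hat{g}_i^* - \bar{\hat{g}}),$$

where $\bar{\hat{g}} = \frac{1}{n}\sum_{i=1}^{n}\hat{g}_i$. Note that here we are bootstrapping the products $x_i e_i$ of the regressors and the OLS residuals $e_i = y_i - x_i' b$. (In fact $\bar{\hat{g}} = 0$, since the OLS normal equations give $\sum_{i=1}^{n} x_i e_i = 0$.) We can then arrive at the following conclusion.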
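A minimal NumPy sketch of the quick bootstrap, assuming X stores the regressors (including a constant) as rows $x_i'$. The helper name `quick_bootstrap`, the simulated heteroskedastic design, and the seeds are illustrative. Because $\bar{\hat g} = 0$ and the bootstrap mean of the $\hat g_i^*$ has conditional variance $\hat S/n$ with $\hat S = \frac{1}{n}\sum_i e_i^2 x_i x_i'$, the bootstrap variance of b* here coincides, up to simulation error, with the heteroskedasticity-robust sandwich $S_{xx}^{-1}\hat{S}S_{xx}^{-1}/n$, which the example uses as a check.

```python
import numpy as np

def quick_bootstrap(y, X, S=1000, seed=0):
    """Quick bootstrap: resample ghat_i = x_i e_i and map the centered bootstrap
    mean through S_xx^{-1}. Returns b and an (S, K) array of draws b*_s."""
    rng = np.random.default_rng(seed)
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)    # OLS estimate
    e = y - X @ b                            # OLS residuals
    g_hat = X * e[:, None]                   # row i is ghat_i' = (x_i e_i)'
    g_bar = g_hat.mean(axis=0)               # zero up to rounding (normal equations)
    Sxx_inv = np.linalg.inv(X.T @ X / n)
    idx = rng.integers(0, n, size=(S, n))    # empirical bootstrap indices
    # b* - b = S_xx^{-1} (1/n) sum_i (ghat*_i - gbar): the display above divided by sqrt(n)
    centered = g_hat[idx].mean(axis=1) - g_bar   # (S, K)
    b_star = b + centered @ Sxx_inv.T
    return b, b_star

# Illustrative heteroskedastic design
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1.0 + np.abs(x))

b, b_star = quick_bootstrap(y, X)
se_boot = np.sqrt(np.mean((b_star - b) ** 2, axis=0))   # sqrt of diag of V-hat / n

# Check against the robust sandwich S_xx^{-1} S_hat S_xx^{-1} / n
e = y - X @ b
Sxx_inv = np.linalg.inv(X.T @ X / n)
S_hat = (X * e[:, None] ** 2).T @ X / n
se_robust = np.sqrt(np.diag(Sxx_inv @ S_hat @ Sxx_inv) / n)
print(se_boot, se_robust)
```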

Theorem 1 (Validity of Quick Bootstrap for Linear Regression). Under regularity conditions, the quick bootstrap method is valid. That is, the quick bootstrap method approximately implements the normal distributional approximation for the OLS estimator. Moreover, the bootstrap variance estimator $\hat{V} = E\big(n(b^* - b)(b^* - b)' \mid y, X\big)$ is consistent, namely $\hat{V} - \operatorname{Avar}(b) \xrightarrow{p} 0$.

Notice that in the theorem we define $\hat{V} = E\big(n(b^* - b)(b^* - b)' \mid y, X\big)$ as the estimator. This may seem odd, but keep in mind that the expectation is with respect to the bootstrap law F̂, which we can approximate arbitrarily accurately by drawing many bootstrap samples, as we show in the section on algorithms below.

3.2 A slow bootstrap method for linear regression

Instead of bootstrapping $\{x_i e_i\}$ we could also bootstrap the entire procedure for obtaining OLS estimates. That is, we sample with replacement

$$\{(y_1^*, x_1^*), \dots, (y_n^*, x_n^*)\}$$

from the observed sample $\{(y_i, x_i)\}$. Importantly, the pairs $(y_i^*, x_i^*)$ are sampled as pairs; therefore this is sometimes referred to as the pairs bootstrap.
Based on the bootstrap sample we may compute

$$b^* = \Big(\frac{1}{n}\sum_{i=1}^{n} x_i^* x_i^{*\prime}\Big)^{-1}\frac{1}{n}\sum_{i=1}^{n} x_i^* y_i^* .$$

Note that from Hayashi Section 2.9 we have that

$$b^* \mid (y, X) \xrightarrow{p} E(x_i^* x_i^{*\prime} \mid y, X)^{-1} E(x_i^* y_i^* \mid y, X)$$

and

$$E(x_i^* x_i^{*\prime} \mid y, X)^{-1} E(x_i^* y_i^* \mid y, X) = \Big(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\Big)^{-1}\frac{1}{n}\sum_{i=1}^{n} x_i y_i = b,$$

which again follows from similar arguments as used for the sample mean case.
This implies that the bootstrap coefficient b* is the OLS estimate in the linear projection model (or, more clearly put, the bootstrap dgp) given by

$$y_i^* = x_i^{*\prime} b + e_i^*, \qquad E(x_i^* e_i^* \mid y, X) = 0,$$

where the moment restriction $E(x_i^* e_i^* \mid y, X) = 0$ follows by construction, as

$$\begin{aligned}
E(x_i^* e_i^* \mid y, X) &= E\big(x_i^*(y_i^* - x_i^{*\prime} b) \mid y, X\big) \\
&= E(x_i^* y_i^* \mid y, X) - E(x_i^* x_i^{*\prime} \mid y, X)\,E(x_i^* x_i^{*\prime} \mid y, X)^{-1} E(x_i^* y_i^* \mid y, X) = 0 .
\end{aligned}$$

You should think of the bootstrap dgp as the model for $(y_i^*, x_i^*)$ that you obtain by sampling $(y_i^*, x_i^*)$ from the empirical distribution of the sample. The key difference from the true dgp is that it replaces the unknown parameter β by the OLS estimate b. This is what is meant by replacing the unknown data generating process by an estimate.
Now consider

$$\sqrt{n}\,(b^* - b) = \Big(\frac{1}{n}\sum_{i=1}^{n} x_i^* x_i^{*\prime}\Big)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i^* e_i^*,$$

which follows by plugging in the bootstrap dgp. Now we have

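$$\frac{1}{n}\sum_{i=1}^{n} x_i^* x_i^{*\prime} \,\Big|\, (y, X) \xrightarrow{p} \Sigma_{xx}$$

and

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i^* e_i^* \,\Big|\, (y, X) \overset{a}{\sim} N(0, S).$$

Combining, we have that

$$\sqrt{n}\,(b^* - b) \mid (y, X) \overset{a}{\sim} \Sigma_{xx}^{-1} \times N(0, S).$$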

Theorem 2 (Validity of Slow Bootstrap for Linear Regression). Under regularity conditions, the slow bootstrap method is valid. That is, the slow bootstrap method approximately implements the normal distributional approximation for the OLS estimator. Moreover, the bootstrap variance estimator $\hat{V} = E\big(n(b^* - b)(b^* - b)' \mid y, X\big)$ is consistent, namely $\hat{V} - \operatorname{Avar}(b) \xrightarrow{p} 0$.

Formal proofs of Theorems 1 and 2 follow exactly the same steps, but stating the convergence results formally requires a little more care. For instance, the statement $\frac{1}{n}\sum_{i=1}^{n} x_i^* x_i^{*\prime} \mid (y, X) \xrightarrow{p} \Sigma_{xx}$ is true, but the mode of convergence is slightly different, as the bootstrap mean has a conditional distribution (given the data). We are therefore actually using a conditional law of large numbers, which you can think of as a law of large numbers under the bootstrap law F̂. You do not need to worry about these issues for this lecture. You can find an accessible treatment in Chapter 10 of Hansen's textbook.
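A corresponding sketch of the slow (pairs) bootstrap, again assuming NumPy and an illustrative heteroskedastic design; `pairs_bootstrap` is a hypothetical helper. Each draw resamples rows $(y_i, x_i)$ jointly and reruns OLS. Unlike the quick bootstrap, its bootstrap variance equals the robust sandwich only approximately, so the comparison at the end should agree only roughly.

```python
import numpy as np

def pairs_bootstrap(y, X, S=1000, seed=0):
    """Pairs bootstrap: resample (y_i, x_i) jointly and recompute OLS each time.
    Returns b and an (S, K) array of draws b*_s."""
    rng = np.random.default_rng(seed)
    n, K = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    b_star = np.empty((S, K))
    for s in range(S):
        idx = rng.integers(0, n, size=n)   # n row indices drawn with replacement
        b_star[s] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return b, b_star

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1.0 + np.abs(x))

b, b_star = pairs_bootstrap(y, X)
se_pairs = np.sqrt(np.mean((b_star - b) ** 2, axis=0))

# Robust sandwich S_xx^{-1} S_hat S_xx^{-1} / n for comparison
e = y - X @ b
Sxx_inv = np.linalg.inv(X.T @ X / n)
S_hat = (X * e[:, None] ** 2).T @ X / n
se_robust = np.sqrt(np.diag(Sxx_inv @ S_hat @ Sxx_inv) / n)
print(se_pairs, se_robust)
```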

4 Algorithms

A basic use of the bootstrap is to estimate standard errors and construct confidence intervals. The following algorithms give the details.

Computing Bootstrap Standard Errors:

(i) Obtain many bootstrap draws $b_s^*$ of the estimator b, where the index s = 1, ..., S enumerates the bootstrap draws.

(ii) Compute the bootstrap variance estimator

$$\hat{V}/n = \frac{1}{S}\sum_{s=1}^{S}(b_s^* - b)(b_s^* - b)'.$$

(iii) Report $(\hat{V}/n)_{jj}^{1/2}$ as the standard error for $b_j$, for j = 1, ..., K.ᵃ

ᵃ Note that we defined $\hat{V}$ as the variance estimate for $\sqrt{n}(b^* - b)$; the 1/n comes from this.
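Steps (ii) and (iii) can be sketched in NumPy as follows, assuming the draws from step (i) are stacked in an (S, K) array; `bootstrap_se` is a hypothetical name, and the small array of draws in the usage example is made up so that the result can be checked by hand.

```python
import numpy as np

def bootstrap_se(b, b_star):
    """Step (ii): V/n = (1/S) sum_s (b*_s - b)(b*_s - b)'.
    Step (iii): standard errors are the square roots of its diagonal."""
    dev = b_star - b                        # (S, K), centered at the original estimate b
    V_over_n = dev.T @ dev / len(b_star)    # (K, K)
    return np.sqrt(np.diag(V_over_n)), V_over_n

# Toy check with S = 4 hand-made draws around b = (0, 0)
b = np.array([0.0, 0.0])
b_star = np.array([[1.0, 2.0], [-1.0, -2.0], [1.0, -2.0], [-1.0, 2.0]])
se, V_over_n = bootstrap_se(b, b_star)
# se is (1.0, 2.0); V_over_n is diag(1, 4) with zero off-diagonal entries
```

Centering at b rather than at the mean of the draws follows step (ii) as written; with many draws the two choices give nearly identical results.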

Computing Bootstrap Confidence Intervals:

(i) Obtain many bootstrap draws $b_s^*$ of the estimator b, where the index s = 1, ..., S enumerates the bootstrap draws.

(ii) Compute the bootstrap variance estimator

$$\hat{V}/n = \frac{1}{S}\sum_{s=1}^{S}(b_s^* - b)(b_s^* - b)'.$$

(iii) Report the normal-approximating confidence interval

$$\big[\, b_j - z_{\alpha/2}(\hat{V}/n)_{jj}^{1/2},\; b_j + z_{\alpha/2}(\hat{V}/n)_{jj}^{1/2} \,\big],$$

where $z_{\alpha/2}$ is the upper α/2 critical value of the standard normal distribution (e.g. $z_{0.025} \approx 1.96$ for α = 0.05).

(iv) Or report the percentile confidence interval

$$\big[\, b^*_{j,(\alpha/2)},\; b^*_{j,(1-\alpha/2)} \,\big],$$

where $b^*_{j,(\alpha)}$ is the α empirical quantile of the jth coefficient in the sample of bootstrap draws.

To clarify the percentile confidence interval, suppose that S = 1000 and α = 0.05. Then we would use $b^*_{j,(25)}$ and $b^*_{j,(975)}$, where the subscript now indexes order statistics: $b^*_{j,(25)}$ is the 25th smallest of the $b^*_{j,s}$ for s = 1, ..., S, and $b^*_{j,(975)}$ is the 975th smallest.
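Both intervals can be sketched as follows, assuming NumPy and Python's standard-library `statistics.NormalDist` for the normal critical value; `bootstrap_cis` is a hypothetical helper. The percentile interval is taken from order statistics as in the S = 1000 example above; `np.quantile` would give a slightly different, interpolated version. The usage example uses artificial draws 1, 2, ..., 1000 so that the percentile interval can be read off directly.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_cis(b, b_star, alpha=0.05):
    """Normal-approximating and percentile intervals; row j is the interval for b_j."""
    S = len(b_star)
    se = np.sqrt(np.mean((b_star - b) ** 2, axis=0))   # sqrt of diag of V/n
    z = NormalDist().inv_cdf(1 - alpha / 2)            # z_{alpha/2}, about 1.96 for alpha = 0.05
    normal_ci = np.column_stack([b - z * se, b + z * se])
    # Order statistics: the (S*alpha/2)-th and (S*(1-alpha/2))-th smallest draws (1-based)
    srt = np.sort(b_star, axis=0)
    lo = srt[int(round(S * alpha / 2)) - 1]
    hi = srt[int(round(S * (1 - alpha / 2))) - 1]
    percentile_ci = np.column_stack([lo, hi])
    return normal_ci, percentile_ci

b = np.array([500.5])
b_star = np.arange(1.0, 1001.0).reshape(-1, 1)   # artificial draws 1, ..., 1000
normal_ci, percentile_ci = bootstrap_cis(b, b_star)
# percentile_ci is [[25.0, 975.0]]: the 25th and 975th smallest draws
```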

5 Extensions

5.1 Parametric bootstrap

An alternative to the empirical (non-parametric) bootstrap is the parametric bootstrap. In some settings researchers may have confidence that the true distribution is of the parametric form F(θ), where θ is a finite-dimensional vector of parameters; e.g. F(θ) may correspond to the normal distribution function with parameters θ = (µ, σ²). However, the researcher does not know the parameter vector θ.
In this case the bootstrap can be implemented by exploiting the knowledge of F and
replacing the unknown parameters θ by consistent estimates.
Recall the example for the sample mean. Suppose now that the researcher knows that the data are exponentially distributed, $y_i \sim \exp(\theta)$, where θ denotes the mean, but the parameter θ is unknown. One can construct the bootstrap dgp by replacing θ by $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and then sampling $y_i^* \sim \exp(\hat{\theta})$.
This clearly works if $\exp(\theta)$ is indeed the true distribution, but if the true distribution is different, the parametric bootstrap will not be valid in general. Therefore the empirical (non-parametric) bootstrap is preferable in most empirical work.
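The following NumPy sketch illustrates both the parametric bootstrap for the sample mean and the warning above. It assumes the exponential is parametrized by its mean θ (which matches NumPy's `scale` argument); the helper name and the uniform data-generating process used to break the exponential assumption are illustrative choices. With uniform data on [0, 2] (mean 1, variance 1/3), the parametric bootstrap imposes the exponential variance θ̂², roughly three times too large, while the empirical bootstrap does not.

```python
import numpy as np

def parametric_bootstrap_means(y, S=1000, seed=0):
    """Parametric bootstrap assuming y_i ~ exponential with mean theta:
    estimate theta by ybar, then simulate samples from exp(theta_hat)."""
    rng = np.random.default_rng(seed)
    theta_hat = y.mean()
    y_star = rng.exponential(scale=theta_hat, size=(S, len(y)))  # scale = mean
    return y_star.mean(axis=1)

rng = np.random.default_rng(3)
n = 100
y = rng.uniform(0.0, 2.0, size=n)          # NOT exponential: mean 1, variance 1/3
se_true = np.sqrt((1.0 / 3.0) / n)         # true SD of ybar under this dgp

se_param = parametric_bootstrap_means(y).std()
idx = rng.integers(0, n, size=(1000, n))   # empirical bootstrap for comparison
se_empirical = y[idx].mean(axis=1).std()
print(f"true {se_true:.4f}, parametric {se_param:.4f}, empirical {se_empirical:.4f}")
```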

5.2 Delta method for bootstrap

Here we are interested in some smooth nonlinear transformation θ = a(β) of β. The function a(·) is assumed to be differentiable with continuous first derivatives, as for the nonlinear Wald test in Hayashi. We have that β can be consistently estimated by OLS, such that

$$\sqrt{n}\,(b - \beta) \xrightarrow{d} N(0, \operatorname{Avar}(b)).$$

We obtain a natural estimator of θ by using the plug-in principle, namely we plug in the estimator b instead of β:

$$\hat{\theta} = a(b).$$
Next we can think of bootstrapping the estimator θ̂. A natural way to define the bootstrap draw θ̂* is to apply the transformation a(·) to the bootstrap draw of the OLS estimator b*, that is

$$\hat{\theta}^* = a(b^*).$$

This works. Bootstrapping smooth functionals of vector means provides a valid distributional approximation, namely

$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N(0, \operatorname{Avar}(\hat{\theta})), \qquad \sqrt{n}\,(\hat{\theta}^* - \hat{\theta}) \mid (y, X) \xrightarrow{d} N(0, \operatorname{Avar}(\hat{\theta})),$$

where

$$\operatorname{Avar}(\hat{\theta}) = A(\beta)\operatorname{Avar}(b)A(\beta)', \qquad A(\beta) = \frac{\partial a(\beta)}{\partial \beta'}.$$

The approximate distribution of $\sqrt{n}(\hat{\theta} - \theta)$ is obtained by the delta method; see Hayashi page 93. Similar arguments can be used to derive the approximate distribution of $\sqrt{n}(\hat{\theta}^* - \hat{\theta}) \mid (y, X)$.
The main practical benefit of this result is that standard errors and confidence intervals
for complicated nonlinear functions can be computed in a simple way from the bootstrap
sample of θ̂s∗ , for s = 1, . . . , S. In particular, the algorithms of the previous section apply
and no analytical computation of the derivative is required.
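A NumPy sketch of this, combining the pairs bootstrap with a hypothetical transformation a(β) = β₁/β₂ (intercept over slope in a simple regression; in the 0-based code this is `beta[0] / beta[1]`). The function, design, and seeds are illustrative. The bootstrap standard error of θ̂* requires no derivative; the analytical delta-method standard error, computed with A(b) and the heteroskedasticity-robust estimate of Avar(b), is included only as a check.

```python
import numpy as np

def a(beta):
    """Hypothetical smooth transformation: ratio of intercept to slope."""
    return beta[0] / beta[1]

rng = np.random.default_rng(7)
n, S = 400, 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
theta_hat = a(b)

# Pairs bootstrap draws b*_s, then theta*_s = a(b*_s): no derivative needed
theta_star = np.empty(S)
for s in range(S):
    idx = rng.integers(0, n, size=n)
    theta_star[s] = a(np.linalg.lstsq(X[idx], y[idx], rcond=None)[0])
se_boot = np.sqrt(np.mean((theta_star - theta_hat) ** 2))

# Analytical delta-method SE for comparison: A = da/dbeta' evaluated at b
e = y - X @ b
Sxx_inv = np.linalg.inv(X.T @ X / n)
S_hat = (X * e[:, None] ** 2).T @ X / n
avar_b = Sxx_inv @ S_hat @ Sxx_inv
A = np.array([1.0 / b[1], -b[0] / b[1] ** 2])
se_delta = np.sqrt(A @ avar_b @ A / n)
print(se_boot, se_delta)
```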

6 Bootstrap for dependent data

So far we have only discussed the bootstrap for iid data. When there is dependence in the data, the implementation of the bootstrap changes, as sampling the pairs $(y_i, x_i)$ independently is no longer appropriate. This would ignore the dependence in the sample and lead to incorrect standard errors and confidence sets.
The idea is to divide the data into blocks, where dependence is preserved within blocks, and then bootstrap the blocks, treating them as independent units of observation. Here we provide a brief description of the construction of the bootstrap samples. We refer to Horowitz (2001) for assumptions and theoretical results.
Let's assume that we want to draw a bootstrap sample from a stationary series $\{y_1, \dots, y_n\}$ whose dependence is not too strong.³ The blocks of data can be overlapping or non-overlapping. We focus on the non-overlapping case. Let $y_i^j = (y_i, y_{i+1}, \dots, y_j)$ for i < j, so that s = j − i + 1 is the block size. We assume for simplicity that n = sd for some integer d. We construct the bootstrap sample $\{y_1^*, \dots, y_n^*\}$ by stacking d blocks randomly drawn with replacement from $\{y_1^{s}, y_{s+1}^{2s}, \dots, y_{(d-1)s+1}^{ds}\}$. The block size s should be chosen such that s → ∞ and $s = O(n^{1/3})$ as n → ∞.
With this construction of the bootstrap sample in hand we can proceed as for the iid
case above.
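A minimal sketch of the non-overlapping block bootstrap, assuming NumPy and n = sd as in the text; the helper name, the AR(1) example series, and the choice s = 10 (about n^{1/3} for n = 1000) are illustrative assumptions. With positively autocorrelated data, the block bootstrap standard error of the mean should be noticeably larger than the iid bootstrap value, which ignores the dependence.

```python
import numpy as np

def nonoverlapping_block_bootstrap(y, block_size, rng):
    """One bootstrap sample: split y into d = n / block_size consecutive
    non-overlapping blocks and stack d blocks drawn with replacement."""
    n = len(y)
    if n % block_size != 0:
        raise ValueError("this sketch assumes n = s * d for an integer d")
    d = n // block_size
    blocks = y.reshape(d, block_size)      # row k holds the (k+1)-th block of the series
    chosen = rng.integers(0, d, size=d)    # d block indices drawn with replacement
    return blocks[chosen].reshape(n)

# Illustrative stationary AR(1) series with positive autocorrelation
rng = np.random.default_rng(11)
n, rho = 1000, 0.5
eps = rng.normal(size=n)
y = np.empty(n)
y[0] = eps[0] / np.sqrt(1 - rho ** 2)      # start in the stationary distribution
for t in range(1, n):
    y[t] = rho * y[t - 1] + eps[t]

s = 10                                      # roughly n ** (1/3) for n = 1000
means = np.array([nonoverlapping_block_bootstrap(y, s, rng).mean() for _ in range(1000)])
se_block = means.std()
se_iid = np.sqrt(np.sum((y - y.mean()) ** 2)) / n   # iid bootstrap SD of ybar*
print(se_block, se_iid)
```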

7 Notes and references

The bootstrap method was introduced by Bradley Efron. Pioneering work in the development of asymptotic theory for the bootstrap includes Bickel & Freedman (1981) and Giné & Zinn (1990). Hall (1992) studied the higher-order properties of the bootstrap. For applications to econometrics, including GMM, which we discuss in Hayashi Chapter 3, see Horowitz's (2001) chapter in the Handbook of Econometrics.

• Peter J. Bickel and David A. Freedman. Some asymptotic theory for the bootstrap. Ann. Statist., 9(6):1196–1217, 1981.

• B. Efron. Bootstrap methods: another look at the jackknife. Ann. Statist., 7(1):1–26, 1979.

• Evarist Giné and Joel Zinn. Bootstrapping general empirical measures. Ann. Probab., 18(2):851–869, 1990.

• F. Götze and H. R. Künsch. Second-order correctness of the blockwise bootstrap for stationary observations. Ann. Statist., 24(5):1914–1933, 1996.

• Peter Hall. The bootstrap and Edgeworth expansion. Springer Series in Statistics. Springer-Verlag, New York, 1992.

• Peter Hall and Joel L. Horowitz. Bootstrap critical values for tests based on generalized-method-of-moments estimators. Econometrica, 64(4):891–916, 1996.

• Joel L. Horowitz. The bootstrap. In James J. Heckman and Edward Leamer, editors, Handbook of Econometrics. Volume 5. Elsevier: North-Holland, 2001.

³ The formal assumptions are stated in terms of the mixing coefficients of the series, which we do not cover.
