Bootstrap Report
F.W. Scholz
University of Washington
Edited version of a technical report of the same title, issued as bcstech-93-051, Boeing Computer Services, Research and Technology.
Abstract
This report reviews several bootstrap methods with special emphasis on small
sample properties. Only those bootstrap methods are covered which promise
wide applicability. The small sample properties can be investigated ana-
lytically only in parametric bootstrap applications. Thus there is a strong
emphasis on the latter although the bootstrap methods can be applied non-
parametrically as well. The disappointing confidence coverage behavior of several computationally less expensive parametric bootstrap methods should raise equal or even greater concern about the corresponding nonparametric bootstrap versions. The computationally more expensive double bootstrap methods hold great promise in the parametric case and may provide enough assurance for the nonparametric case.
Contents
1 The General Bootstrap Idea 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Setup and Objective . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Bootstrap Samples and Bootstrap Distribution . . . . . . . . . 7
3 Variance Estimation 16
3.1 Jackknife Variance Estimation . . . . . . . . . . . . . . . . . . 16
3.2 Substitution Variance Estimation . . . . . . . . . . . . . . . . 17
3.3 Bootstrap Variance Estimation . . . . . . . . . . . . . . . . . 17
5 Double Bootstrap Confidence Bounds 47
5.1 Prepivot Bootstrap Methods . . . . . . . . . . . . . . . . . . . 48
5.1.1 The Root Concept . . . . . . . . . . . . . . . . . . . . 48
5.1.2 Confidence Sets From Exact Pivots . . . . . . . . . . . 48
5.1.3 Confidence Sets From Bootstrapped Roots . . . . . . . 50
5.1.4 The Iteration or Prepivoting Principle . . . . . . . . . 51
5.1.5 Calibrated Confidence Coefficients . . . . . . . . . . . . 52
5.1.6 An Analytical Example . . . . . . . . . . . . . . . . . . 53
5.1.7 Prepivoting by Simulation . . . . . . . . . . . . . . . . 55
5.1.8 Concluding Remarks . . . . . . . . . . . . . . . . . . . 57
5.2 The Automatic Double Bootstrap . . . . . . . . . . . . . . . . 58
5.2.1 Exact Confidence Bounds for Tame Pivots . . . . . . . 58
5.2.2 The General Pivot Case . . . . . . . . . . . . . . . . . 62
5.2.3 The Prepivoting Connection . . . . . . . . . . . . . . . 66
5.2.4 Sensitivity to Choice of Estimates . . . . . . . . . . . . 68
5.2.5 Approximate Pivots and Iteration . . . . . . . . . . . . 70
5.3 A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.3.1 Efron’s Percentile Method . . . . . . . . . . . . . . . . 77
5.3.2 Hall’s Percentile Method . . . . . . . . . . . . . . . . . 78
5.3.3 Bias Corrected Percentile Method . . . . . . . . . . . . 82
5.3.4 Percentile-t and Double Bootstrap Methods . . . . . . 89
5.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
1 The General Bootstrap Idea
1.1 Introduction
The bootstrap method was introduced by Efron in 1979. Since then it has
evolved considerably. Efron's paper initiated a large body of hard theoretical
research (much of it of asymptotic or large sample character), and the bootstrap
has found wide acceptance as a data analysis tool. Part of the latter is due
to its considerable intuitive appeal, which is in contrast to the often deep
mathematical intricacies underlying much of statistical analysis methodol-
ogy. The basic bootstrap method is easily grasped by practitioners and by
consumers of statistics.
The popularity of the bootstrap was boosted early on by the very readable
Scientific American article by Diaconis and Efron (1983). Having chosen the
catchy name "bootstrap" certainly has not hurt its popularity. In Germany
the bootstrap method is called "die Münchhausen Methode," named after
Baron von Münchhausen, a fictional character in many fantastic stories. In
one of these he is supposed to have saved his life by pulling himself out of a
swamp by his own hair. The first reference to "die Münchhausen Methode"
can be traced to the German translation of the Diaconis and Efron article,
which appeared in Spektrum der Wissenschaft in the same year. There the
translator recast the above episode into the following image: pull yourself
out of the statistical swamp by your own mathematical hair.
Hall (1992) on page 2 of his extensive monograph on the bootstrap expresses
these contrasting thoughts concerning the “bootstrap” name:
Much of the bootstrap’s strength and acceptance also lies in its versatility. It
can handle a very wide spectrum of data analysis situations with equal ease.
In fact, it facilitates data analyses that heretofore were simply impossible
because the obstacles in the mathematical analysis were just too forbidding.
This gives us the freedom to model the data more accurately and obtain ap-
proximate answers to the right questions instead of right answers to often the
wrong questions. This freedom is bought at the cost of massive simulations
of resampled data sets followed by corresponding data analyses for each such
data set. The variation of results obtained in these alternate data analyses
should provide some insight into the accuracy and uncertainty of the data
analysis carried out on the original data.
This approach has become feasible only because of the concurrent advances
in computing. However, certain offshoots of the bootstrap, such as iterated
bootstrap methods, can still strain current computing capabilities and effi-
cient computing strategies are needed.
As stated above, the bootstrap has evolved considerably and there is no
longer a single preferred method, but a wide spectrum of separate methods,
all with their own strengths and weaknesses. All of these methods share the
same basic bootstrap idea but differ in how it is implemented.
There are two major streams, namely the parametric bootstrap and the non-
parametric bootstrap, but even they can be viewed in a unified fashion. The
primary focus of this report is on parametric bootstrap methods, although
the definitions for the various bootstrap methods are general enough to be
applicable for the parametric and nonparametric case. The main reason for
this focus is that in certain parametric examples one can examine analyti-
cally the small sample properties of the various bootstrap methods. Such an
analysis is not possible for the nonparametric bootstrap.
data set as generic as possible we wish to emphasize the wide applicability
of the bootstrap methods.
Not knowing P is usually expressed by stating that P is one of many possible
probability mechanisms, i.e., we say that P is a member of a family P of
probability models that could have generated X.
In the course of this report we will repeatedly use specific examples for prob-
ability models and for ease of reference we will list most of them here.
The first example is of nonparametric character, because the parameter F
that indexes the various PF ∈ P cannot be fit into some finite dimensional
space. Also, we deal here with a pure random sample, i.e., with i.i.d. random
variables.
The second, third, and fourth examples are of a parametric nature, since there is
a one-to-one correspondence between F and θ = (µ, σ) in Example 2, between
F and θ = (α, β, σ) in Example 3, and between F and θ = (µ1 , µ2 , σ1 , σ2 , ρ)
in Example 4. We could as well have indexed the possible probability mech-
anisms by θ, i.e., write Pθ , with θ varying over some appropriate subset
Θ ⊂ R2 , Θ ⊂ R3 , or Θ ⊂ R5 , respectively. In Example 3 the data are inde-
pendent but not identically distributed, since the mean of Yi changes linearly
with ti .
Of course, we could identify θ with F also in the first example and write
P = {Pθ : θ ∈ Θ}, with Θ = F being of infinite dimensionality in that
case. Because of this we will use the same notation describing any family P,
namely
P = {Pθ : θ ∈ Θ}
and the whole distinction between nonparametric and parametric probability
models recedes into the background, where it is governed by the character of the
indexing set Θ.
Many statistical analyses concern themselves with estimating θ, i.e., with
estimating the probability mechanism that generated the data. We will as-
sume that we are always able to find such estimates, and we denote a generic
estimate of θ by $\hat\theta = \hat\theta(\mathbf X)$, where the emphasized dependence on $\mathbf X$ should
make clear that any reasonable estimation procedure should be based on the
data at hand. Similarly, if we want to emphasize an estimate of P we write
$\hat P = P_{\hat\theta}$. Finding any estimate at all can at times be a tall order, but that
difficulty is not addressed here.
In Example 1 we may estimate θ = F by the empirical distribution function
of the data, i.e., by
$$\hat F(x) = \frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i)$$
and we write $\hat\theta = \hat F$. Here $I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ if $x \notin A$. Thus
$\hat F(x)$ is that fraction of the sample which does not exceed $x$. $\hat F(x)$ can also be
viewed as the cumulative distribution function of a probability distribution
which places probability mass $1/n$ at each of the $X_i$. If some of the $X_i$
coincide, then that common value will receive the appropriate multiple of the mass $1/n$.
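As a small illustrative sketch (not part of the original report), the empirical distribution function can be computed directly from a sample; the data and seed below are hypothetical:

import numpy as np

def ecdf(sample):
    """Return a function x -> fraction of the sample that does not exceed x."""
    sorted_sample = np.sort(sample)
    n = len(sorted_sample)
    # searchsorted with side="right" counts observations <= x, i.e. n * Fhat(x);
    # tied observations automatically receive multiple mass 1/n.
    return lambda x: np.searchsorted(sorted_sample, x, side="right") / n

rng = np.random.default_rng(0)
x_obs = rng.normal(loc=10.0, scale=2.0, size=25)   # hypothetical sample
F_hat = ecdf(x_obs)
print(F_hat(10.0), F_hat(np.array([8.0, 10.0, 12.0])))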
Often one is content with estimating a particular functional $\psi = \psi(P)$ of $P$.
This will be the situation on which we will focus from now on. A natural
estimate of ψ is then given by $\hat\psi = \psi(\hat P) = \psi(P_{\hat\theta})$.

In Example 1 one may be interested in estimating the mean of the sampled
distribution F. Then
$$\psi = \psi(P_F) = \int_{-\infty}^{\infty} x\, dF(x)$$
and we obtain
$$\hat\psi = \psi\left(P_{\hat F}\right) = \int_{-\infty}^{\infty} x\, d\hat F(x) = \frac{1}{n}\sum_{i=1}^n X_i = \bar X\,,$$
The scatter of these estimates $\hat\psi_1,\ldots,\hat\psi_B$ would be a reflection of the sampling
uncertainty in our original estimate $\hat\psi$. As $B \to \infty$, the distribution of
the $\hat\psi_1,\ldots,\hat\psi_B$ represents the sampling distribution of $\hat\psi$, i.e., we would then
be in a position to evaluate probabilities such as
$$Q_A(\theta) = P_\theta\left(\hat\psi \in A\right)$$
for all appropriate sets A. This follows from the law of large numbers (LLN),
namely
$$\hat Q_A(\theta) = \frac{1}{B}\sum_{i=1}^B I_A(\hat\psi_i) \longrightarrow Q_A(\theta)$$
as $B \to \infty$. This convergence is "in probability" or "almost surely" and we
will not dwell on it further. Since computing power is cheap, we can let B be
quite large and thus get a fairly accurate approximation of $Q_A(\theta)$ by using
$\hat Q_A(\theta)$.

Knowledge of this sampling distribution could then be used to set error limits
on our estimate $\hat\psi$. For example, we could, by trial and error, find $\Delta_1$ and
$\Delta_2$ such that
$$.95 = P_\theta\left(\Delta_1 \le \hat\psi \le \Delta_2\right),$$
i.e., 95% of the time we would expect $\hat\psi$ to fall between $\Delta_1$ and $\Delta_2$. This
still does not express how far $\hat\psi$ is from the true ψ. This can only be judged
if we relate the position of the $\Delta_i$ to that of ψ, i.e., write $\delta_1 = \psi - \Delta_1$ and
$\delta_2 = \Delta_2 - \psi$ and thus
$$.95 = P_\theta\left(\hat\psi - \delta_2 \le \psi \le \hat\psi + \delta_1\right).$$
For each $\mathbf X^\star_i$ obtain the corresponding estimate $\hat\theta^\star_i$ and evaluate $\hat\psi^\star_i = \psi(\hat\theta^\star_i)$.
The bootstrap idea is founded in the hope that the scatter of these $\hat\psi^\star_1,\ldots,\hat\psi^\star_B$
should serve as a reasonable proxy for the scatter of $\hat\psi_1,\ldots,\hat\psi_B$, which we
cannot observe. If we let $B \to \infty$, we would by the LLN be able to evaluate
$$Q_A(\hat\theta) = P_{\hat\theta}\left(\hat\psi^\star \in A\right)$$
for all appropriate sets A. This evaluation can be done to any desired degree
of accuracy by choosing B large enough in our simulations, since
$$\hat Q_A(\hat\theta) = \frac{1}{B}\sum_{i=1}^B I_A(\hat\psi^\star_i) \longrightarrow Q_A(\hat\theta) \quad\text{as } B \to \infty\,.$$
If
$$P_{\hat\theta} \longrightarrow P_\theta \quad\text{as } \hat\theta \longrightarrow \theta\,,$$
in a sense to be left unspecified here, one can then say that the bootstrap
distribution of $\hat\psi^\star$ is a good approximation to the sampling distribution of $\hat\psi$,
i.e.,
$$P_{\hat\theta}\left(\hat\psi^\star \in A\right) \approx P_\theta\left(\hat\psi \in A\right).$$
Research has focused on making this statement more precise by resorting
to limit theory. In particular, research has studied the conditions under
which this approximation is reasonable and, through sophisticated higher order
asymptotic analysis, has tried to reach conclusions that are meaningful
even for moderately small samples. Our main concern in later sections will
be to examine the qualitative behavior of the various bootstrap methods in
small samples.
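To make these mechanics concrete, here is a minimal parametric bootstrap sketch in Python (not from the report). It assumes the normal model of Example 2 with $\psi(\theta) = \sigma^2$; the sample, B, and the set A are illustrative choices:

import numpy as np

rng = np.random.default_rng(1)

def estimate_theta(x):
    # maximum likelihood estimates (mu_hat, sigma_hat) for a normal model
    return x.mean(), x.std(ddof=0)

# observed data (hypothetical) and the functional of interest psi = sigma^2
x = rng.normal(loc=5.0, scale=3.0, size=20)
mu_hat, sigma_hat = estimate_theta(x)
psi_hat = sigma_hat**2

# parametric bootstrap: draw B data sets from P_theta_hat and re-estimate psi
B = 10_000
psi_star = np.empty(B)
for i in range(B):
    x_star = rng.normal(loc=mu_hat, scale=sigma_hat, size=len(x))
    psi_star[i] = estimate_theta(x_star)[1]**2

# the empirical distribution of psi_star approximates the sampling
# distribution of psi_hat; e.g. an approximation of Q_A(theta_hat) for A = [2, 10]
print(np.mean((psi_star >= 2.0) & (psi_star <= 10.0)))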
tribution is not centered on the unknown value ψ(θ), but is off by the bias
amount b(θ).
If we know the functional form of the bias term $b(\theta)$, then the following "bias
reduced" estimate
$$\hat\psi_{br1} = \psi(\hat\theta) - b(\hat\theta)$$
suggests itself. The subscript 1 indicates that this could be just the first
in a sequence of bias reduction iterations, i.e., what we do with $\hat\psi$ for bias
reduction we could repeat on $\hat\psi_{br1}$ and so on; see Section 2.3.

Such a correction will typically reduce the bias of the original estimate $\psi(\hat\theta)$,
but will usually not eliminate it completely, unless of course $b(\hat\theta)$ is itself an
unbiased estimate of $b(\theta)$.
Note that such bias correction often entails more variability in the bias corrected
estimate, due to the additional variability of the subtracted bias correction
term $b(\hat\theta)$. However, it is not clear how the mean squared error of the
estimate will be affected by such a bias reduction, since
$$MSE_\theta(\hat\psi) = E_\theta\left(\hat\psi - \psi\right)^2 = \mathrm{var}_\theta\,\hat\psi + b^2(\theta)\,.$$
The reduction in bias may well be more than offset by the increase in the
variance. In fact, one has the following expression for the difference of the
mean squared errors of $\hat\psi_{br1}$ and $\hat\psi$:
$$E_\theta\left(\hat\psi_{br1} - \psi\right)^2 - E_\theta\left(\hat\psi - \psi\right)^2 = E_\theta\, b(\hat\theta)^2 - 2E_\theta\, b(\hat\theta)\left(\hat\psi - \psi\right).$$
For the maximum likelihood estimate $\hat\sigma^2$ of $\sigma^2$ in Example 2 one has
$$E_\theta\,\hat\sigma^2 = \sigma^2 - \frac{\sigma^2}{n}\,,$$
i.e., the bias is $b(\theta) = -\sigma^2/n$. The bias reduced version is
$$\hat\sigma^2_{br1} = \hat\sigma^2 + \frac{\hat\sigma^2}{n}\,.$$
Here one finds
$$\mathrm{var}_\theta\,\hat\sigma^2_{br1} = \left(\frac{n+1}{n}\right)^2 \mathrm{var}_\theta\,\hat\sigma^2 > \mathrm{var}_\theta\,\hat\sigma^2$$
and
$$MSE(\hat\sigma^2) = E_\theta\left(\hat\sigma^2 - \sigma^2\right)^2 = \frac{\sigma^4}{n^4}\left(2n^3 - n^2\right)\,,$$
$$MSE(\hat\sigma^2_{br1}) = E_\theta\left(\hat\sigma^2_{br1} - \sigma^2\right)^2 = \frac{\sigma^4}{n^4}\left[2(n+1)^2(n-1) + 1\right]$$
and thus
$$MSE(\hat\sigma^2) < MSE(\hat\sigma^2_{br1}) \quad\text{for } n > 1\,,$$
since $2(n+1)^2(n-1) + 1 - (2n^3 - n^2) = 3n^2 - 2n - 1 = (3n+1)(n-1) > 0$ for $n > 1$.

In the second example we estimate $\psi = \mu^2$ by $\hat\psi = \bar X^2$, which has bias $b(\theta) = \sigma^2/n$, so that the bias reduced version is
$$\hat\psi_{br1} = \bar X^2 - \frac{\hat\sigma^2}{n}\,.$$
Here we find
$$\mathrm{var}_\theta\left(\bar X^2 - \hat\sigma^2/n\right) = \mathrm{var}_\theta\left(\bar X^2\right) + \mathrm{var}_\theta\left(\hat\sigma^2/n\right) > \mathrm{var}_\theta\left(\bar X^2\right)$$
and
$$MSE(\hat\psi) = 4\,\frac{\mu^2\sigma^2}{n} + 3\,\frac{\sigma^4}{n^2}\,,$$
$$MSE(\hat\psi_{br1}) = 4\,\frac{\mu^2\sigma^2}{n} + \frac{\sigma^4}{n^2}\left(2 + \frac{2n-1}{n^2}\right)$$
and thus clearly $MSE(\hat\psi_{br1}) < MSE(\hat\psi)$ for $n > 1$, since
$$3 - \left(2 + \frac{2n-1}{n^2}\right) = \frac{n^2 - 2n + 1}{n^2} = \frac{(n-1)^2}{n^2} > 0 \quad\text{for } n > 1\,.$$
2.2 Bootstrap Bias Reduction

In many problems the functional form of the bias term $b(\theta)$ is not known. It
turns out that the bootstrap provides us with just the above bias correction
without requiring any knowledge of the function $b(\theta)$. Generating a bootstrap
sample of estimates $\hat\psi^\star_1,\ldots,\hat\psi^\star_B$ from $P_{\hat\theta}$ we can form their average
$$\bar\psi^\star_B = \frac{1}{B}\sum_{i=1}^B \hat\psi^\star_i\,.$$
By the LLN
$$\bar\psi^\star_B \longrightarrow E_{\hat\theta}\,\hat\psi^\star = \psi(\hat\theta) + b(\hat\theta) \quad\text{as } B\to\infty\,,$$
so that $\bar\psi^\star_B - \psi(\hat\theta)$ serves as an estimate of $b(\hat\theta)$ and $2\psi(\hat\theta) - \bar\psi^\star_B$ as a bias
corrected estimate of ψ. For large enough B the latter will be indistinguishable
from $\hat\psi_{br1}$, for all practical purposes.

In general $\hat\psi_{br1} = \psi_{br1}(\hat\theta)$ will itself still be biased, namely
$$E_\theta\,\psi_{br1}(\hat\theta) = \psi(\theta) + b_1(\theta)\,,$$
and thus we can interpret $b_1(\theta)$ as the bias of $-b(\hat\theta)$ for estimating $-b(\theta)$.
The second order bias reduced estimate thus becomes
$$\hat\psi_{br2} = \psi_{br2}(\hat\theta) = \psi_{br1}(\hat\theta) - b_1(\hat\theta)
= \psi(\hat\theta) - b(\hat\theta) - \left[b(\hat\theta) - E_{\hat\theta}\,b(\hat\theta^\star)\right]
= \psi(\hat\theta) - 2b(\hat\theta) + E_{\hat\theta}\,b(\hat\theta^\star)\,,$$
where the $\hat\theta^\star$ inside the expectation indicates that its distribution is governed
by $\hat\theta$, the subscript on the expectation. Since $\psi_{br2}(\hat\theta)$ is a function of $\hat\theta$, we
can keep on iterating this scheme and even go to the limit with the iterations.

In the two examples of Section 2.1 the respective limits of these iterations
result ultimately in unbiased estimates of $\sigma^2$ and $\mu^2$, respectively. In the case
of the variance estimate the ith iterate gives
$$\hat\sigma^2_{bri} = \hat\sigma^2\left(\frac{1}{n^i} + \frac{1}{n^{i-1}} + \cdots + 1\right)
= \hat\sigma^2\,\frac{1 - 1/n^{i+1}}{1 - 1/n}
\longrightarrow \frac{n}{n-1}\,\hat\sigma^2 = s^2 \quad\text{as } i\to\infty\,,$$
where $s^2$ is the usual unbiased estimate of $\sigma^2$. In the case of estimating $\mu^2$
the ith iterate gives
$$\hat\psi_{bri} = \bar X^2 - \hat\sigma^2\left(\frac{1}{n} + \cdots + \frac{1}{n^i}\right)
= \bar X^2 - \frac{n-1}{n^2}\,\frac{1 - 1/n^i}{1 - 1/n}\,s^2
\longrightarrow \bar X^2 - \frac{s^2}{n} \quad\text{as } i\to\infty\,,$$
the latter being the conventional unbiased estimate of µ2 . In both exam-
ples the resulting limiting unbiased estimate is UMVU, i.e., has uniformly
minimum variance among all unbiased estimates of the respective target.
According to Hall (1992, p. 32) it is not always clear that these bias reduc-
tion iterations should converge to something. He does not give examples.
Presumably one may be able to get such examples from situations, in which
unbiased estimates do not exist. Since the analysis for such examples is com-
plicated and often involves estimates with infinite expectations, we will not
pursue this issue further.
do this only for the case of one iteration since even that can stretch the
simulation capacity of most computers.
Suppose we have generated the ith bootstrap data set $\mathbf X^\star_i$ and from it we have
obtained $\hat\theta^\star_i$. Then we can spawn a second generation or iterated bootstrap
sample $\mathbf X^{\star\star}_{i1},\ldots,\mathbf X^{\star\star}_{iC}$ from $P_{\hat\theta^\star_i}$. Each such iterated bootstrap sample then
results in corresponding estimates
$$\hat\theta^{\star\star}_{i1},\ldots,\hat\theta^{\star\star}_{iC}$$
and thus
$$\hat\psi^{\star\star}_{i1},\ldots,\hat\psi^{\star\star}_{iC}\,,\quad\text{with } \hat\psi^{\star\star}_{ij} = \psi\left(\hat\theta^{\star\star}_{ij}\right).$$
From the LLN we have that
$$\frac{1}{C}\sum_{j=1}^C \hat\psi^{\star\star}_{ij} \longrightarrow E_{\hat\theta^\star_i}\,\psi(\hat\theta^{\star\star}_i) = \psi(\hat\theta^\star_i) + b(\hat\theta^\star_i) \quad\text{as } C\to\infty\,.$$
Here $\hat\theta^{\star\star}_i$ inside the expectation varies randomly as governed by $P_{\hat\theta^\star_i}$, while $\hat\theta^\star_i$
is held fixed, just as $\hat\theta^\star$ would vary randomly as governed by $P_{\hat\theta}$, while $\hat\theta$ is
held fixed, and just as $\hat\theta$ would vary randomly as governed by $P_\theta$, while the
true θ is held fixed.

By the LLN and glossing over double limit issues we have that
$$\hat A_{BC} = \frac{1}{B}\sum_{i=1}^B \frac{1}{C}\sum_{j=1}^C \hat\psi^{\star\star}_{ij}
\approx \frac{1}{B}\sum_{i=1}^B \left[\psi(\hat\theta^\star_i) + b(\hat\theta^\star_i)\right]
\longrightarrow E_{\hat\theta}\left[\psi(\hat\theta^\star) + b(\hat\theta^\star)\right]$$
and hence
$$\hat\psi^\star_{br2} = 3\psi(\hat\theta) - 3\bar\psi^\star_B + \hat A_{BC}
\approx 3\psi(\hat\theta) - 3\left[\psi(\hat\theta) + b(\hat\theta)\right] + \psi(\hat\theta) + b(\hat\theta) + E_{\hat\theta}\,b(\hat\theta^\star)
= \psi(\hat\theta) - 2b(\hat\theta) + E_{\hat\theta}\,b(\hat\theta^\star) = \hat\psi_{br2}\,.$$
3 Variance Estimation
Suppose $\mathbf X \sim P_\theta$ and we are given an estimate $\hat\psi = \hat\psi(\mathbf X)$ of the real valued
functional $\psi = \psi(\theta)$. We are interested in obtaining an estimate of the variance
$\sigma^2_{\hat\psi}(\theta)$ of $\hat\psi$. Such variance estimates are useful in assessing the quality

3.1 Jackknife Variance Estimation

$$\hat\sigma^2_{\hat\psi,J} = \frac{n-1}{n}\sum_{i=1}^n \left(\hat\psi_{(-i)} - \hat\psi_{(\cdot)}\right)^2.$$
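The leave-one-out quantities are not defined in the excerpt above; assuming the standard jackknife convention that $\hat\psi_{(-i)}$ is the estimate recomputed with the ith observation deleted and $\hat\psi_{(\cdot)}$ is their average, a minimal sketch reads:

import numpy as np

def jackknife_variance(x, statistic):
    """Jackknife variance estimate for statistic(x), x a 1-d sample."""
    n = len(x)
    # psi_hat_(-i): statistic recomputed with the i-th observation deleted
    leave_one_out = np.array([statistic(np.delete(x, i)) for i in range(n)])
    psi_dot = leave_one_out.mean()                 # psi_hat_(.)
    return (n - 1) / n * np.sum((leave_one_out - psi_dot) ** 2)

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=30)            # hypothetical sample
# for the sample mean the jackknife variance reproduces s^2/n
print(jackknife_variance(x, np.mean), x.var(ddof=1) / len(x))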
3.2 Substitution Variance Estimation

Another general variance estimation procedure is based on the following
substitution idea. Knowing the functional form of $\sigma^2_{\hat\psi}(\theta)$ (as a function of θ), it
would be very natural to estimate $\sigma^2_{\hat\psi}(\theta)$ by simply replacing the unknown
parameter θ by $\hat\theta$, namely to use as variance estimate
$$\hat\sigma^2_{\hat\psi} = \sigma^2_{\hat\psi}(\hat\theta)\,.$$

This sample variance is an unbiased estimate of $\sigma^2_{\hat\psi}(\hat\theta)$ and its accuracy can
distribution function of the sample, estimating θ = F. From analytical
considerations we know that
$$\sigma^2_{\bar X}(F) = \frac{\sigma^2(F)}{n}\,,$$
where $\sigma(F)$ is the standard deviation of F. The substitution principle would
estimate $\sigma^2(F)/n$ by $\sigma^2(\hat F)/n$, where
$$\sigma^2\left(\hat F\right) = \frac{1}{n}\sum_{i=1}^n \left(X_i - \bar X\right)^2.$$
This $\sigma^2(\hat F)$ is the variance of $\hat F$, which places probability $1/n$ on each of
the $X_i$, whence the computational formula. Instead of using the analytical
form of $\sigma^2_{\bar X}(F)$ and substitution, the bootstrap variance estimation method
generates B samples, of size n each, from $\hat F$ and computes the B sample
averages $\bar X^\star_1,\ldots,\bar X^\star_B$ of these samples. For large B the sample variance
$$\hat\sigma^2_{\bar X,B} = \frac{1}{B-1}\sum_{i=1}^B\left(\bar X^\star_i - \bar{\bar X}^\star\right)^2\,,\quad\text{where}\quad \bar{\bar X}^\star = \frac{1}{B}\sum_{i=1}^B \bar X^\star_i\,,$$
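A minimal sketch of this simulation alternative (sample, seed, and B are illustrative); for the mean it can be compared directly against the substitution estimate $\sigma^2(\hat F)/n$:

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=25)      # hypothetical observed sample

B = 5000
# nonparametric bootstrap: resample n values from F_hat (i.e. from x, with
# replacement) B times and record the B sample means
xbar_star = np.array([rng.choice(x, size=len(x), replace=True).mean()
                      for _ in range(B)])

sigma2_boot = xbar_star.var(ddof=1)    # bootstrap variance estimate of X-bar
sigma2_subst = x.var(ddof=0) / len(x)  # substitution estimate sigma^2(F_hat)/n
print(sigma2_boot, sigma2_subst)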
4 Bootstrap Confidence Bounds
There are many methods for constructing bootstrap confidence bounds. We
will not describe them all in detail. The reason for this is that we wish to
emphasize the basic simplicity of the bootstrap method and its generality of
applicability. Thus we will shy away from any bootstrap modifications which
take advantage of analytical devices that are very problem specific and limit
the generic applicability of the method.
We will start by introducing Efron’s original percentile method, followed by
its bias corrected version. The accelerated bias corrected percentile method is
not covered as it seems too complicated for general application. It makes use
of a certain analytical adjustment, namely the acceleration constant, which is
not easily determined from the bootstrap distribution. It is not entirely clear
to us whether the method is even well defined in general multiparameter sit-
uations not involving maximum likelihood estimates. These three percentile
methods are equivariant under monotone transformations on the parameter
to be estimated.
Next we will discuss what Hall calls the percentile method and the Student-t
percentile method. Finally, we discuss several double bootstrap methods,
namely Beran’s prepivoting, Loh’s calibrated bootstrap, and the automatic
double bootstrap. These, but especially the last one, appear to be most
promising as far as coverage accuracy in small samples is concerned. How-
ever, they also are computationally most intensive. As we go along, we
illustrate the methods with specific examples. In a case study we will further
illustrate the relative merits of all these methods for small sample sizes in the
context of estimating a normal quantile and connect the findings with the
approximation rate results given in the literature. All of these investigations
concentrate on parametric bootstrap methods, but the definitions are gen-
eral enough to allow them to be used in the nonparametric context as well.
However, in nonparametric settings it typically is not feasible to investigate
the small sample coverage properties of the various bootstrap methods, other
than by small sample asymptotic methods or by doubly or triply nested sim-
ulation loops, the latter being prohibitive. We found that the small sample
asymptotics are not very representative of the actual small sample behavior
in the parametric case. Thus the small sample asymptotic results in the
nonparametric case are of questionable value in really small samples.
Throughout our treatment of confidence intervals, whether by simple boot-
strap or by double bootstrap methods, it is often convenient to assume that
the distribution functions $F_\theta$ of the estimates $\hat\psi$ are generally continuous and
strictly increasing on their support $\{x : 0 < F_\theta(x) < 1\}$. These assumptions
allow us to use the probability integral transform result, which states that
$U = F_\theta(\hat\psi) \sim U(0,1)$, and quantities like $F^{-1}_\theta(p)$ are well defined without
complications.
complications. Making this blanket assumption here saves us from repeating
it over and over. In some situations it may well be possible to maintain
greater validity by arguing more carefully, but that would entail inessential
technicalities and distract from getting the basic bootstrap ideas across. It
will be up to the reader to perform the necessary detail work, if such gener-
ality is desired. If we wish to deviate from the above tacit assumption, we
will do so explicitly.
This method was introduced by Efron (1981). Hall (1992) refers to this
also as the “other percentile method,” since he reserves the name “per-
centile method” for another method. In Hall’s scheme of viewing the boot-
strap Efron’s method does not fit in well and he advances various arguments
against this “other percentile method.” However, he admits that the “other
percentile method” performs quite well in the double bootstrap approach.
We seem to have found the reason for this as the section on the automatic
double bootstrap will make clear. For this reason we prefer not to use the
abject term “other percentile method” but instead call it “Efron’s percentile
method.” However, we will usually refer to the percentile method in this
section and only make the distinction when confusion with Hall’s percentile
method is to be avoided. We will first give the method in full generality,
present one simple example illustrating what the method does for us, show
its transformation equivariance and then provide some justification in the
single parameter case.
an appropriately chosen high value of the ordered bootstrap sample
$$\hat\psi^\star_{(1)} \le \ldots \le \hat\psi^\star_{(B)}$$
might serve well as upper confidence bound for ψ. This has some intuitive
appeal, but before completely subscribing to this intuition the reader should
wait until reading the section on Hall's percentile method.

To make the above definition more precise we appeal to the LLN. For
sufficiently large B we can treat the empirical distribution of the bootstrap
sample of estimates
$$\hat G_B(t) = \frac{1}{B}\sum_{i=1}^B I_{[\hat\psi^\star_i \le t]}$$
as a good approximation to the distribution function $G_{\hat\theta}(t)$ of $\hat\psi^\star$, where
$$G_{\hat\theta}(t) = P_{\hat\theta}\left(\hat\psi^\star \le t\right).$$
Solving
$$G_{\hat\theta}(t) = 1-\alpha \quad\text{for}\quad t = \hat\psi_U(1-\alpha) = G^{-1}_{\hat\theta}(1-\alpha)$$
we will consider $\hat\psi_U(1-\alpha)$ as a nominal $100(1-\alpha)\%$ upper confidence bound
for ψ. For large B this upper bound can, for practical purposes, also be
obtained by taking $\hat G^{-1}_B(1-\alpha)$ instead of $G^{-1}_{\hat\theta}(1-\alpha)$. This substitution
amounts to computing $m = (1-\alpha)B$ and taking the mth of the sorted
bootstrap values, $\hat\psi^\star_{(1)} \le \ldots \le \hat\psi^\star_{(B)}$, namely $\hat\psi^\star_{(m)}$, as our upper bound. If
$m = (1-\alpha)B$ is not an integer, one may have to resort to an interpolation
scheme for the two bracketing order statistics $\hat\psi^\star_{(k)}$ and $\hat\psi^\star_{(k+1)}$, where k is the
largest integer $\le m$. In that case define
$$\hat\psi^\star_{(m)} = \hat\psi^\star_{(k)} + (m-k)\left(\hat\psi^\star_{(k+1)} - \hat\psi^\star_{(k)}\right).$$
When B is sufficiently large, this bootstrap sample order statistic $\hat\psi^\star_{(m)}$ is a
good approximation of $G^{-1}_{\hat\theta}(1-\alpha)$. Similarly, one defines
Together these two bounds comprise a nominal 100(1 − 2α)%, equal tailed
confidence interval for ψ. These are the bounds according to Efron’s per-
centile method. The qualifier “nominal” indicates that the actual coverage
probabilities of these bounds may be different from the intended or nominal
confidence level.
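A minimal sketch of Efron's percentile bounds computed from a bootstrap sample of estimates, including the interpolation scheme described above; the stand-in bootstrap sample is purely illustrative:

import numpy as np

def efron_percentile_bounds(psi_star, alpha=0.05):
    """Equal tailed Efron percentile interval from a bootstrap sample of
    estimates psi_star, with interpolation when m is not an integer."""
    z = np.sort(np.asarray(psi_star))
    B = len(z)

    def order_stat(m):
        # psi*_(m) with linear interpolation between bracketing order statistics
        m = min(max(m, 1.0), float(B))
        k = int(np.floor(m))
        lo = z[k - 1]
        hi = z[min(k, B - 1)]
        return lo + (m - k) * (hi - lo)

    return order_stat(alpha * B), order_stat((1 - alpha) * B)

rng = np.random.default_rng(5)
psi_star = rng.chisquare(df=9, size=2000) / 10.0   # stand-in bootstrap sample
print(efron_percentile_bounds(psi_star, alpha=0.05))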
The above construction of upper bound, lower bound, and equal tailed inter-
val shows that generally one only needs to know how to construct an upper
bound. At times we will thus only discuss upper or lower bounds.
In situations where we deal with independent, identically distributed data
samples, i.e., $\mathbf X = (X_1,\ldots,X_n)$ with $X_1,\ldots,X_n$ i.i.d. $\sim F_\theta$, one can show
under some regularity conditions that for large sample size n the coverage
error is proportional to $1/\sqrt n$ for the upper and lower bounds, respectively.
Due to fortuitous error cancellation the coverage error is proportional to $1/n$
for the equal tailed confidence interval. What this may really mean in small
samples will later be illustrated in some concrete examples.
we know that the empirical distribution function of this sample is a good
approximation of
$$G_{\hat\mu}(t) = P_{\hat\mu}\left(\bar X^\star \le t\right) = \Phi\left(\frac{t - \hat\mu}{\sigma_0/\sqrt n}\right),$$
where the latter equation describes the analytical fact that $\bar X^\star \sim N(\hat\mu, \sigma_0^2/n)$
when $\bar X^\star$ is the sample mean of $X^\star_1,\ldots,X^\star_n$ i.i.d. $\sim N(\hat\mu,\sigma_0^2)$. The bootstrap
method does not know this analytical fact. We only refer to it to see what
the bootstrap percentile method generates. The percentile method takes the
$(1-\alpha)$-percentile of the bootstrap sample of estimates as upper bound. For
large B this percentile is an excellent approximation to $G^{-1}_{\hat\mu}(1-\alpha)$, namely
the $(1-\alpha)$-percentile of the $N(\hat\mu,\sigma_0^2/n)$ population, or
$$G^{-1}_{\hat\mu}(1-\alpha) = \hat\mu + \Phi^{-1}(1-\alpha)\,\frac{\sigma_0}{\sqrt n} = \hat\mu_U(1-\alpha)\,.$$
4.1.3 Transformation Equivariance

The property of transformation equivariance is defined as follows. If we have
a "method" for constructing confidence bounds for ψ and if $g(\psi) = \tau$ is a
strictly increasing transformation of ψ, then we could try to obtain upper
confidence bounds for $\tau = \tau(\theta)$ by two methods. On the one hand we can
obtain an upper bound $\hat\psi_U$ for ψ and treat $g(\hat\psi_U)$ as upper bound for τ with
the same coverage probability, since
$$1-\alpha = P\left(\hat\psi_U \ge \psi\right) = P\left(g(\hat\psi_U) \ge \tau\right).$$
On the other hand we can apply the percentile method directly to the estimate
of τ, which is based on
$$\tau(\theta) = g(\psi(\theta))$$
and thus on
$$\hat\tau^\star = \tau(\hat\theta^\star) = g\left(\psi(\hat\theta^\star)\right) = g\left(\hat\psi^\star\right).$$
This in turn implies
$$H_{\hat\theta}(t) = P_{\hat\theta}\left(\hat\tau^\star \le t\right) = P_{\hat\theta}\left(g(\hat\psi^\star) \le t\right) = P_{\hat\theta}\left(\hat\psi^\star \le g^{-1}(t)\right) = G_{\hat\theta}\left(g^{-1}(t)\right)$$
and thus
$$H^{-1}_{\hat\theta}(p) = g\left(G^{-1}_{\hat\theta}(p)\right).$$
The percentile method applied to $\hat\tau$ yields as upper bound
$$\hat\tau_U = H^{-1}_{\hat\theta}(1-\alpha) = g\left(G^{-1}_{\hat\theta}(1-\alpha)\right) = g\left(\hat\psi_U\right),$$
i.e., we have the desired transformation equivariance relation between $\hat\tau_U$ and
$\hat\psi_U$.
4.1.4 A Justification in the Single Parameter Case

In this subsection we will describe conditions under which the percentile
method gives confidence bounds with exact coverage probabilities. In
fact, it is shown that the percentile method agrees with the classical bounds
in such situations.

Let $\hat\theta = \hat\theta(\mathbf X)$ be an estimate of θ and let $\hat\psi = \psi(\hat\theta)$ be the estimate of
ψ, the real valued parameter of interest. Consider the situation in which
the distribution of $\hat\psi$ depends only on ψ and not on any other nuisance
parameters, although these may be present in the model. Thus we essentially
deal with a single parameter problem. Suppose we want to get confidence
bounds for $\psi = \psi(\theta)$. Then $\hat\psi$ has distribution function
$$G_\psi(t) = P_\psi\left(\hat\psi \le t\right).$$
Here we write $P_\psi$ instead of $P_\theta$ because of the assumption made concerning
the distribution of $\hat\psi$. In order to keep matters simple we will assume that
$G_\psi(t)$ is continuous in both t and ψ and that $G_\psi(t)$ is decreasing in ψ for fixed t. The
latter monotonicity assumption is appropriate for reasonable estimates, i.e.,
for responsive estimates that tend to increase as the target ψ increases.

Using the probability integral transform we have that $U = G_\psi(\hat\psi)$ is
distributed uniformly over [0, 1]. Thus
$$1-\alpha = P_\psi\left(G_\psi(\hat\psi) \ge \alpha\right) = P_\psi\left(\psi \le \hat\psi_{[1-\alpha]}\right),$$
where $\hat\psi_{[1-\alpha]}$ denotes the solution in ψ of $G_\psi(\hat\psi) = \alpha$.
possible to transform estimates in this fashion so that the resulting distribution
is approximately standard normal, i.e., Z above would be a standard
normal random variable. The consequence of this transformation assumption
is that the percentile method will yield the same upper bound $\hat\psi_{[1-\alpha]}$,
and it does so without knowing g, τ or H. Only their existence is assumed
in the above transformation.

Under the above assumption we find
$$G_\psi(t) = P\left(\hat\psi \le t\right) = P\left(\tau\{g(\hat\psi) - g(\psi)\} \le \tau\{g(t) - g(\psi)\}\right) = H\left(\tau\{g(t) - g(\psi)\}\right) \tag{1}$$
and thus
$$\hat\psi_{[1-\alpha]} = g^{-1}\left(g(\hat\psi) - H^{-1}(\alpha)/\tau\right) = g^{-1}\left(g(\hat\psi) + H^{-1}(1-\alpha)/\tau\right),$$
where the last equality results from the symmetry of H. From Equation (1)
we obtain further
$$1-\alpha = G_\psi\left(G^{-1}_\psi(1-\alpha)\right) = H\left(\tau\left\{g\left(G^{-1}_\psi(1-\alpha)\right) - g(\psi)\right\}\right)$$
and thus
$$G^{-1}_\psi(1-\alpha) = g^{-1}\left(g(\psi) + H^{-1}(1-\alpha)/\tau\right)$$
and replacing ψ by $\hat\psi$ we have
$$G^{-1}_{\hat\psi}(1-\alpha) = g^{-1}\left(g(\hat\psi) + H^{-1}(1-\alpha)/\tau\right) = \hat\psi_{[1-\alpha]}\,.$$
This means that we can obtain the upper confidence bound $\hat\psi_{[1-\alpha]}$ simply
by simulating the cumulative distribution function $G_{\hat\psi}(t)$ and then solving
$G_{\hat\psi}(t) = 1-\alpha$ for $t = \hat\psi_{[1-\alpha]}$, i.e., generate a large bootstrap sample of
estimates $\hat\psi^\star_1,\ldots,\hat\psi^\star_B$ and for $m = (1-\alpha)B$ take $\hat\psi^\star_{(m)}$, the mth ordered value
of $\hat\psi^\star_{(1)} \le \ldots \le \hat\psi^\star_{(B)}$, as a good approximation to
$$G^{-1}_{\hat\psi}(1-\alpha) = \hat\psi_{[1-\alpha]}\,.$$
4.2 Bias Corrected Percentile Bootstrap

If an estimate $\hat\psi$ satisfies
$$G_\theta(\psi) = P_\theta\left(\hat\psi \le \psi\right) = .5$$
it is called median unbiased. For the bootstrap distribution $G_{\hat\theta}$ this entails
$G_{\hat\theta}(\hat\psi) = .5$. In order to correct for the bias in estimates that are not median
unbiased Efron proposed to compute the following estimated bias correction
$$x_0 = \Phi^{-1}\left(G_{\hat\theta}(\hat\psi)\right)$$
and to take
$$\hat\psi_{U\,bc} = G^{-1}_{\hat\theta}\left(\Phi(2x_0 + z_{1-\alpha})\right)$$
as the nominal $100(1-\alpha)\%$ upper confidence bound, while
$$\hat\psi_{L\,bc} = G^{-1}_{\hat\theta}\left(\Phi(2x_0 + z_{\alpha})\right)$$
is the corresponding lower bound, i.e., $1-\alpha$ is replaced by α as we go from
upper bound to lower bound. Together these two bounds form an equal
tailed confidence interval for ψ, with nominal level $100(1-2\alpha)\%$. Note that
these bounds revert to the Efron percentile bounds when $x_0 = 0$, i.e., when
$G_{\hat\theta}(\hat\psi) = .5$.
For the following it is useful to have the distribution function of $\hat\psi = \hat\sigma^2$
explicitly, namely
$$G_\theta(x) = P_\theta\left(\hat\sigma^2 \le x\right) = \chi_{n-1}\left(nx/\sigma^2\right),$$
where $\chi_{n-1}$ denotes the distribution function of the chi-square distribution with
$n-1$ degrees of freedom. Its inverse is
$$G^{-1}_\theta(p) = \frac{\sigma^2}{n}\,\chi^{-1}_{n-1}(p)\,,$$
where $\chi^{-1}_{n-1}(p)$ is the inverse of $\chi_{n-1}$.
Rather than simulating a bootstrap sample of estimates $\hat\psi^\star_1,\ldots,\hat\psi^\star_B$, we
pretend that B is very large, say $B = \infty$, so that we actually have knowledge
of the exact bootstrap distribution
$$G_{\hat\theta}(x) = P_{\hat\theta}\left(\hat\sigma^{2\star} \le x\right) = \chi_{n-1}\left(nx/\hat\sigma^2\right).$$
This allows us to write down the bias corrected bootstrap confidence bounds
in compact mathematical notation and analyze their coverage properties
without resorting to simulations. However, keep in mind that this is not necessary
in order to get the bounds. They can always be obtained from the bootstrap
sample, as outlined in Section 4.2.1.

The upper confidence bound for $\psi = \sigma^2$ obtained by the bias corrected
percentile method can be expressed as
$$\hat\psi_{U\,bc} = G^{-1}_{\hat\theta}\left(\Phi\left(2\Phi^{-1}\left(G_{\hat\theta}(\hat\psi)\right) + z_{1-\alpha}\right)\right)
= G^{-1}_{\hat\theta}\left(\Phi\left(2\Phi^{-1}\left(\chi_{n-1}(n)\right) + z_{1-\alpha}\right)\right)
= \frac{\hat\sigma^2}{n}\,\chi^{-1}_{n-1}\left(\Phi\left(2\Phi^{-1}\left(\chi_{n-1}(n)\right) + z_{1-\alpha}\right)\right).$$
In comparison, the ordinary Efron percentile upper bound can be expressed
as
$$\hat\psi_U = G^{-1}_{\hat\theta}(1-\alpha) = \frac{\hat\sigma^2}{n}\,\chi^{-1}_{n-1}(1-\alpha)\,.$$
The actual coverage probabilities of both bounds are given by the following
formulas:
$$P_\theta\left(\hat\psi_U \ge \psi\right) = P_\theta\left(\chi^{-1}_{n-1}(1-\alpha)\,\hat\sigma^2/n \ge \sigma^2\right) = P\left(V \ge n^2/\chi^{-1}_{n-1}(1-\alpha)\right),$$
where $V = n\hat\sigma^2/\sigma^2$ has the $\chi_{n-1}$ distribution,
[Figure 1: Actual − Nominal Coverage Probability of 95% Upper & Lower Bounds and Asymptotes; coverage error plotted against $1/\sqrt n$.]
[Figure 2: Actual − Nominal Coverage Probability of 90% Confidence Intervals and Asymptotes; coverage error plotted against $1/n$.]
and
$$P_\theta\left(\hat\psi_{U\,bc} \ge \psi\right)
= P_\theta\left(\chi^{-1}_{n-1}\left(\Phi\left(2\Phi^{-1}(\chi_{n-1}(n)) + z_{1-\alpha}\right)\right)\hat\sigma^2/n \ge \sigma^2\right)$$
$$= P\left(V \ge n^2\big/\chi^{-1}_{n-1}\left(\Phi\left(2\Phi^{-1}(\chi_{n-1}(n)) + z_{1-\alpha}\right)\right)\right)$$
$$= 1 - \chi_{n-1}\left(n^2\big/\chi^{-1}_{n-1}\left(\Phi\left(2\Phi^{-1}(\chi_{n-1}(n)) + z_{1-\alpha}\right)\right)\right).$$
The coverage probabilities for the corresponding lower bounds are the complement
of the above probabilities with $1-\alpha$ replaced by α.
Figure 1 shows the coverage error (actual − nominal coverage rate) of nominally
95% upper and lower confidence bounds for $\sigma^2$, plotted against the
theoretical rate $1/\sqrt n$, for sample sizes $n = 2,\ldots,20, 30, 40, 50, 100, 200, 500,
1000, 2000$. The asymptotes are estimated by drawing lines through (0, 0)
and the points corresponding to $n = 2000$. Note the symmetry of the asymptotes
around the zero line, confirming the error cancellation of order $1/\sqrt n$.
However, the sample size has to be fairly large, say n ≥ 30, before the asymp-
totes reasonably approximate the coverage error. The coverage error of the
upper bounds is negative and quite substantial for moderate n, whereas that
of the lower bounds is positive and small even for moderate n. Through-
out, the coverage error of the bias corrected percentile method appears to be
smaller than that of the Efron percentile method by a factor of at least two.
Figure 2 shows the coverage error (actual − nominal coverage rate) of the
corresponding nominally 90% confidence intervals for σ 2 plotted against the
theoretical rate of 1/n. The approximation to the asymptotes is good for
much smaller n here. Again the bias corrected version is better by a factor
of at least two and for large n by a factor of three.
Thus we have
$$y_0 = \Phi^{-1}\left(G_{\hat\theta}\left(g^{-1}\left(g(\hat\psi)\right)\right)\right) = \Phi^{-1}\left(G_{\hat\theta}(\hat\psi)\right) = x_0$$
and with $H^{-1}_{\hat\theta}(\cdot) = g\left(G^{-1}_{\hat\theta}(\cdot)\right)$ we can write
$$\hat\tau_{U\,bc} = g\left(G^{-1}_{\hat\theta}\left(\Phi(2x_0 + z_{1-\alpha})\right)\right) = g\left(\hat\psi_{U\,bc}\right)\,,$$

$$G_{\hat\psi_{[1-\alpha]}}(\hat\psi) = \alpha\,.$$

$$\tau\{g(\hat\psi) - g(\psi)\} + x_0 \sim Z \quad\text{or}\quad g(\hat\psi) \sim g(\psi) - x_0/\tau + Z/\tau\,,$$
of estimates, transformed in the above fashion, are well approximated by a
standard normal distribution. Given the above transformation assumption
it is shown below that the bias corrected percentile upper bound for ψ agrees
again with ψb[1−α] . A priori knowledge of g and τ is not required, they only
need to exist. The bias correction constant x0 , which figures explicitly in
the definition of the bias corrected percentile method, is already defined in
terms of the accessible bootstrap distribution Gθb(·). The remainder of this
subsection proves the above claim. The argument is somewhat convoluted
and may be skipped.
First we have
$$G_\psi(t) = P_\psi\left(\hat\psi \le t\right)
= P_\psi\left(\tau[g(\hat\psi) - g(\psi)] + x_0 \le \tau[g(t) - g(\psi)] + x_0\right)
= \Phi\left(x_0 + \tau[g(t) - g(\psi)]\right),$$
where the last step uses the above transformation assumption; in particular
$\Phi^{-1}\left(G_\psi(\psi)\right) = x_0$, agreeing with the original definition of the bias correction.
The exact upper bound $\hat\psi_{[1-\alpha]}$ is found by solving
$$G_\psi(\hat\psi) = \alpha\,,$$
i.e.,
$$z_\alpha = \Phi^{-1}(\alpha) = x_0 + \tau[g(\hat\psi) - g(\psi)]$$
or
$$g(\hat\psi) - g(\psi) = -(x_0 - z_\alpha)/\tau = -(x_0 + z_{1-\alpha})/\tau$$
and finally
$$\hat\psi_{[1-\alpha]} = \psi = g^{-1}\left(g(\hat\psi) + \frac{1}{\tau}(x_0 + z_{1-\alpha})\right). \tag{3}$$
On the other hand, using again the relation $G_\psi(t) = \Phi\left(x_0 + \tau[g(t) - g(\psi)]\right)$
(in the second identity below), we have
$$\Phi(2x_0 + z_{1-\alpha}) = G_\psi\left(G^{-1}_\psi\left(\Phi(2x_0 + z_{1-\alpha})\right)\right)
= \Phi\left(x_0 + \tau\left[g\left(G^{-1}_\psi\left[\Phi(2x_0 + z_{1-\alpha})\right]\right) - g(\psi)\right]\right).$$
Equating the arguments of Φ on both sides we have
$$x_0 + z_{1-\alpha} = \tau\left[g\left(G^{-1}_\psi\left[\Phi(2x_0 + z_{1-\alpha})\right]\right) - g(\psi)\right]$$
or
$$\frac{1}{\tau}(x_0 + z_{1-\alpha}) + g(\psi) = g\left(G^{-1}_\psi\left[\Phi(2x_0 + z_{1-\alpha})\right]\right)$$
and
$$g^{-1}\left(\frac{1}{\tau}(x_0 + z_{1-\alpha}) + g(\psi)\right) = G^{-1}_\psi\left[\Phi(2x_0 + z_{1-\alpha})\right].$$
Replacing ψ by $\hat\psi$ on both sides and recalling Equation (3) we obtain
$$\hat\psi_{[1-\alpha]} = G^{-1}_{\hat\psi}\left[\Phi(2x_0 + z_{1-\alpha})\right],$$
i.e., the bias corrected percentile upper bound coincides with the exact upper
bound $\hat\psi_{[1-\alpha]}$.
Hall (1992) calls this method simply the percentile method, whereas he refers
to Efron’s percentile method as “the other percentile method.” Using the
terms “Efron’s percentile method” and “Hall’s percentile method” we pro-
pose to remove any value judgment and eliminate confusion. It is not clear
who first initiated Hall’s percentile method, although Efron (1979) already
discussed bootstrapping the distribution of ψb − ψ, but not in the context of
confidence bounds. The method fits well within the general framework that
Hall (1992) has built for understanding bootstrap methods. We will first give
a direct definition of Hall’s percentile method together with its motivation,
illustrate it with an example and relate it to Efron’s percentile method. The
method is generally not transformation equivariant.
$$H_\theta(x) = P_\theta\left(\hat\psi - \psi \le x\right).$$
This can be done by simulating a bootstrap sample $\hat\psi^\star_1,\ldots,\hat\psi^\star_B$ and forming
$$\hat\psi^\star_1 - \hat\psi,\ \ldots,\ \hat\psi^\star_B - \hat\psi\,.$$
Here $\hat\psi$ is held fixed within the probability statement $P_{\hat\theta}(\cdots)$ and the term
$\hat\psi^\star = \psi(\hat\theta(\mathbf X^\star))$ is random, with $\mathbf X^\star$ generated from the probability model $P_{\hat\theta}$.
The bootstrap method here consists of treating $H_{\hat\theta}(x)$ as a good approximation
to $H_\theta(x)$, the latter being unknown since it usually depends on the
unknown parameter θ. Of course, $\hat H_B(x)$ will serve as our bootstrap approximation
to $H_{\hat\theta}(x)$ and thus to $H_\theta(x)$. The accuracy of the first approximation
($\hat H_B(x) \approx H_{\hat\theta}(x)$) can be controlled by the bootstrap sample size B, but the
accuracy of $H_{\hat\theta}(x) \approx H_\theta(x)$ depends on the accuracy of $\hat\theta$ as estimate of the
unknown θ. The latter accuracy is usually affected by the sample size, which
often is governed by other considerations beyond the control of the analyst.

Hall's percentile method gives the $100(1-\alpha)\%$ upper confidence bound for
ψ as
$$\hat\psi_{HU} = \hat\psi - H^{-1}_{\hat\theta}(\alpha)\,,$$
and similarly the $100(1-\alpha)\%$ lower confidence bound as
$$\hat\psi_{HL} = \hat\psi - H^{-1}_{\hat\theta}(1-\alpha)\,.$$
The remainder of the discussion will focus on upper bounds, since the discussion
for lower bounds would be entirely parallel.

The above upper confidence bound is motivated by the exact $100(1-\alpha)\%$
upper bound
$$\hat U = \hat\psi - H^{-1}_\theta(\alpha)\,,$$
since
$$P_\theta(\hat U > \psi) = P_\theta\left(\hat\psi - H^{-1}_\theta(\alpha) > \psi\right)
= 1 - P_\theta\left(\hat\psi - \psi \le H^{-1}_\theta(\alpha)\right) = 1 - H_\theta\left(H^{-1}_\theta(\alpha)\right) = 1 - \alpha\,.$$
However, $\hat U$ is not a true confidence bound, since it typically depends on
the unknown θ through $H^{-1}_\theta(\alpha)$. The bootstrap step consists in sidestepping
this problem by approximating $H^{-1}_\theta(\alpha)$ by $H^{-1}_{\hat\theta}(\alpha)$. For large enough B,
we can obtain $H^{-1}_{\hat\theta}(\alpha)$ to any accuracy directly from the bootstrap sample
of the $D_i = \hat\psi^\star_i - \hat\psi$. Simply order the $D_i$, i.e., find the order statistics
$D_{(1)} \le D_{(2)} \le \ldots \le D_{(B)}$ and, for $\ell = B\alpha$, take the $\ell$th value $D_{(\ell)}$ as
approximation of $H^{-1}_{\hat\theta}(\alpha)$. If $\ell$ is not an integer, interpolate between the
appropriate bracketing values $D_{(k)}$ and $D_{(k+1)}$. Note that it is not required
that we know the analytical form of $H_\theta$. All we need to know is how to
create new bootstrap samples $\mathbf X^\star_i$ from $P_{\hat\theta}$ and thus estimates $\hat\psi^\star_i$ and finally
$D_i = \hat\psi^\star_i - \hat\psi$.

In the exceptional case where $H^{-1}_\theta(\alpha)$ is independent of θ, we have $H^{-1}_{\hat\theta}(\alpha) =
H^{-1}_\theta(\alpha) = H^{-1}(\alpha)$ and then the resulting confidence bounds have indeed
exact coverage probabilities, if we allow $B \to \infty$.
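A minimal sketch of Hall's percentile upper bound from the differences $D_i$ (illustrative inputs; np.quantile interpolates between bracketing order statistics, which differs only in minor detail from the interpolation convention above):

import numpy as np

def hall_upper_bound(psi_star, psi_hat, alpha=0.05):
    """Hall's percentile upper bound: psi_hat - H^{-1}_theta_hat(alpha)."""
    # D_i = psi*_i - psi_hat; their alpha-quantile approximates H^{-1}_theta_hat(alpha)
    d = np.asarray(psi_star) - psi_hat
    return psi_hat - np.quantile(d, alpha)

rng = np.random.default_rng(7)
psi_star = rng.chisquare(df=9, size=2000) / 10.0   # stand-in bootstrap sample
print(hall_upper_bound(psi_star, psi_hat=0.9, alpha=0.05))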
The basic idea behind this method is to form some kind of pivot, i.e., a
function of the data and the parameter of interest, which has a distribution
independent of θ. This would be successful if indeed Hθ did not depend on
θ. The distribution of ψb will typically depend on θ, but it is hoped that
it depends on θ only through ψ = ψ(θ). Further, it is hoped that this
dependence is of a special form, namely that the distribution of ψb depends
on ψ only as a location parameter, so that the distribution of ψb − ψ is free
of any unknown parameters.
Treating ψ as a location parameter is often justifiable on asymptotic grounds,
i.e., for large samples, but may be very misplaced in small samples. In small
samples there is really no compelling reason for focussing on the location
pivot ψb − ψ as a general paradigm. For example, in the normal variance ex-
ample discussed earlier and revisited below it would be much more sensible to
consider the scale pivot σb 2 /σ 2 instead of the location pivot σb 2 −σ 2 . Similarly,
when dealing with a random sample from the bivariate normal population
of Example 4, parametrized by θ = (µ1 , µ2 , σ1 , σ2 , ρ) and with the correla-
tion coefficient ρ = ψ(θ) as the parameter of interest, it would make little
sense, except in very large samples, to treat ρ as a location parameter for the
maximum likelihood estimate $\hat\rho$.
The focus on $\hat\psi - \psi$ as the proper pivot for Hall's percentile method is mainly
justified on asymptotic grounds. The reason for this is that most theoretical
bootstrap research has focused on the large sample aspects of the various
bootstrap methods.
For other pivots one would have to make appropriate modifications in Hall's
percentile method. This is presented quite generally in Beran (1987) and we
will illustrate it here with the scale pivot $\hat\psi/\psi$, where it is assumed that the
parameter ψ is positive. Suppose the distribution function of $\hat\psi/\psi$ is $H_\theta(x)$;
then
$$1-\alpha = P_\theta\left(\hat\psi/\psi > H^{-1}_\theta(\alpha)\right) = P_\theta\left(\psi < \hat\psi/H^{-1}_\theta(\alpha)\right)$$
and replacing the unknown $H^{-1}_\theta(\alpha)$ by $H^{-1}_{\hat\theta}(\alpha)$ gives us the Beran/Hall
percentile method upper bound for ψ, namely
$$\hat\psi_{HU} = \hat\psi\big/H^{-1}_{\hat\theta}(\alpha)\,.$$
From now on, when no further qualifiers are given, it is assumed that a lo-
cation pivot was chosen in Hall’s percentile method. This simplifies matters,
especially since it is not always easy to see what kind of pivot would be most
appropriate in any given situation, the above normal correlation example
being a case in point. Since large sample considerations give some support
to location pivots, this default is quite natural.
Again we should remind ourselves that this analytic form of $H^{-1}_{\hat\theta}(\alpha)$ is not
required in order to compute the upper bound via the Hall percentile method.
However, it facilitates the analysis of the coverage rates of the method in this
example. This coverage rate can be expressed as
$$P_\theta\left(\hat\sigma^2_{HU} \ge \sigma^2\right)
= P_\theta\left(\hat\sigma^2\left(2 - \frac{\chi^{-1}_{n-1}(\alpha)}{n}\right) \ge \sigma^2\right)
= P\left(V \ge \frac{n^2}{2n - \chi^{-1}_{n-1}(\alpha)}\right)
= 1 - \chi_{n-1}\left(\frac{n^2}{2n - \chi^{-1}_{n-1}(\alpha)}\right).$$

$$G_\theta(\psi + x) = 1 - G_\theta(\psi - x) \quad\text{for all } x.$$
Solving
$$1 - \alpha = H_{\hat\theta}(x) = G_{\hat\theta}(x + \hat\psi)$$
[Figure 1a: Actual − Nominal Coverage Probability of 95% Upper & Lower Bounds.]
Hall's percentile upper bound is
where the second equality results from the assumed symmetry of $H_{\hat\theta}$. Making
use of the dual representation of the above $x_{1-\alpha}$ we find
$$\hat\psi_{HU} = \hat\psi + G^{-1}_{\hat\theta}(1-\alpha) - \hat\psi = G^{-1}_{\hat\theta}(1-\alpha)\,,$$
of the bootstrap distribution of $\bar X^\star$. This distribution is $N(\hat\mu, \hat\sigma^2/n)$ and its
$(1-\alpha)$-quantile is
$$\hat\mu_{zU}(1-\alpha) = \bar X + z_{1-\alpha}\,\frac{\hat\sigma}{\sqrt n} \quad\text{with } z_{1-\alpha} = \Phi^{-1}(1-\alpha)\,.$$
Hence Efron's percentile method results in the same bound as in Section 3.1.2,
with the only difference being that the previously assumed known $\sigma_0$ is
replaced by the estimate $\hat\sigma$. The multiplier $z_{1-\alpha}$ remains unchanged. Compare
this with the classical upper confidence bound given by
$$\hat\mu_{tU}(1-\alpha) = \bar X + t_{n-1}(1-\alpha)\,\frac{\hat\sigma}{\sqrt n}\sqrt{\frac{n}{n-1}} = \bar X + t_{n-1}(1-\alpha)\,\frac{s}{\sqrt n}\,,$$

$$\hat\mu^\star - \hat\mu = \bar X^\star - \bar X$$

$$\hat\mu_{HU}(1-\alpha) = \hat\mu - H^{-1}_{\hat\theta}(\alpha) = \bar X - z_\alpha\,\frac{\hat\sigma}{\sqrt n} = \bar X + z_{1-\alpha}\,\frac{\hat\sigma}{\sqrt n} = \hat\mu_{zU}(1-\alpha)\,.$$
the above formula indicates that the percentile methods act as though $\hat\sigma$
is equal to the true (unknown) standard deviation σ, in which case the use
of the z-factor would be most appropriate. Since $\hat\sigma$ varies around σ from
sample to sample, this sampling variation needs to be accounted for in setting
confidence bounds.

The percentile-t method carries the pivoting step of Hall's percentile method
(of bootstrapping $\bar X - \mu$) one step further by considering a Studentized pivot
$$T = \frac{\bar X - \mu}{\hat\sigma}\,.$$
If we knew the distribution function $K_\theta(x)$ of T we could obtain a $(1-\alpha)$-level
upper confidence bound for µ as
$$\bar X - K^{-1}_\theta(\alpha)\,\hat\sigma\,,$$
since
$$1-\alpha = P\left(\frac{\bar X - \mu}{\hat\sigma} \ge K^{-1}_\theta(\alpha)\right) = P\left(\bar X - K^{-1}_\theta(\alpha)\,\hat\sigma \ge \mu\right).$$
The subscript θ on $K^{-1}_\theta(\alpha)$ allows for the possibility that the distribution
of T may still depend on θ. In this particular example K is independent of
θ and thus $\bar X - K^{-1}(\alpha)\,\hat\sigma$ is an exact $(1-\alpha)$-level upper confidence bound
for µ. To obtain $K^{-1}(\alpha)$ we can either appeal to tables of the Student-t
distribution, because for this particular example we know that
$$K^{-1}(\alpha) = t_{n-1}(\alpha)\big/\sqrt{n-1} = -t_{n-1}(1-\alpha)\big/\sqrt{n-1}\,,$$
or, in a more generic approach, we can simulate the distribution K of T by
generating samples from $N(\mu,\sigma^2)$ for any $\theta = (\mu,\sigma)$, since in this example
K is not sensitive to the choice of θ. However, for reasons to be explained in
the next section, we may as well simulate independent samples $\mathbf X^\star_1,\ldots,\mathbf X^\star_B$
from $N(\hat\mu,\hat\sigma^2)$ and generate $T^\star_1,\ldots,T^\star_B$ with $T^\star_i = (\bar X^\star_i - \bar X)/\hat\sigma^\star_i$ computed
from the ith bootstrap sample $\mathbf X^\star_i$. For very large B this simulation process
will approximate the bootstrap distribution $\hat K = K$ of
$$T^\star = \frac{\bar X^\star - \bar X}{\hat\sigma^\star}\,.$$
The percentile-t method constructs the $(1-\alpha)$-level upper confidence bound
as
$$\hat\mu_{tU} = \bar X - \hat K^{-1}(\alpha)\,\hat\sigma\,.$$
For $\ell = \alpha B$ we can consider the $\ell$th ordered value of $T^\star_{(1)} \le \ldots \le T^\star_{(B)}$, namely
$T^\star_{(\ell)}$, as an excellent approximation to $\hat K^{-1}(\alpha)$. When $\alpha B$ is not an integer
one does the usual interpolation of the appropriate adjacent ordered values
$T^\star_{(k)}$ and $T^\star_{(k+1)}$.
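A minimal sketch of the percentile-t upper bound for the normal mean example, assuming maximum likelihood estimates and illustrative values for the sample and B:

import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(5.0, 3.0, size=12)            # hypothetical observed sample
n = len(x)
mu_hat, sigma_hat = x.mean(), x.std(ddof=0)  # MLEs as in the example above

B, alpha = 4000, 0.05
t_star = np.empty(B)
for i in range(B):
    xs = rng.normal(mu_hat, sigma_hat, size=n)         # parametric resample
    t_star[i] = (xs.mean() - mu_hat) / xs.std(ddof=0)  # T_i* = (Xbar*_i - Xbar)/sigma*_i

# K_hat^{-1}(alpha) from the T* values, then the percentile-t bound
k_inv_alpha = np.quantile(t_star, alpha)
mu_tU = mu_hat - k_inv_alpha * sigma_hat
print(mu_tU)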
By bootstrapping the distribution of the Studentized ratio T we hope that
we capture to a large extent the sampling variability of the scale estimate
used in the denominator of T. That this may not be completely successful is
reflected in the possibility that the distribution $K_\theta$ of T may still depend on
θ.
The above discussion gives rise to a small excursion, which is not an integral
part of the percentile-t method, but represents a rough substitute for it. Since
$\bar X^\star_{(m)} \approx \hat\mu_{zU}(1-\alpha)$, Efron (1982) considered the following t-factor patch to
the Efron percentile method, namely
$$\bar X + \frac{t_{n-1}(1-\alpha)}{z_{1-\alpha}}\sqrt{\frac{n}{n-1}}\left(\bar X^\star_{(m)} - \bar X\right),$$
with $m = (1-\alpha)B$. This patched version of the Efron percentile method
upper bound is approximately equal to the above $\hat\mu_{tU}(1-\alpha)$, as is seen from
$$\bar X^\star_{(m)} \approx \hat\mu_{zU}(1-\alpha) = \bar X + z_{1-\alpha}\,\frac{\hat\sigma}{\sqrt n}$$
$$\Rightarrow\quad \bar X^\star_{(m)} - \bar X \approx z_{1-\alpha}\,\frac{\hat\sigma}{\sqrt n}$$
$$\Rightarrow\quad \frac{t_{n-1}(1-\alpha)}{z_{1-\alpha}}\sqrt{\frac{n}{n-1}}\left(\bar X^\star_{(m)} - \bar X\right) \approx t_{n-1}(1-\alpha)\,\frac{\hat\sigma}{\sqrt n}\sqrt{\frac{n}{n-1}}$$
and thus
$$\bar X + \frac{t_{n-1}(1-\alpha)}{z_{1-\alpha}}\sqrt{\frac{n}{n-1}}\left(\bar X^\star_{(m)} - \bar X\right) \approx \hat\mu_{tU}(1-\alpha)\,.$$
This idea of patching the Efron percentile method can be applied to other
situations as well, especially when estimates are approximately normal. The
effect is to widen the bounds in order to roughly protect the coverage confi-
dence. In this particular example the patch works perfectly in that it results
in the classical bound. The patch is easily applied, provided we have a reason-
able idea of the degrees of freedom to use in the t-factor correction. However,
Efron (1982) warns against its indiscriminate use. Note also that in apply-
ing the patch we lose the transformation equivariance of Efron’s percentile
method.
4.4.2 General Definition

Suppose $\mathbf X \sim P_\theta$ and we are interested in confidence bounds for the real
valued functional $\psi = \psi(\theta)$. We also have available the estimate $\hat\theta$ of θ and
estimate ψ by $\hat\psi = \psi(\hat\theta)$. Furthermore, it is assumed that we have some scale
estimate $\hat\sigma_{\hat\psi}$, so that we can define the Studentized pivot
$$T = \frac{\hat\psi - \psi}{\hat\sigma_{\hat\psi}}\,.$$
In order for T to be a pivot in the strict sense, its distribution would have to
be independent of any unknown parameters. This is not assumed here, but
if this distribution $K_\theta$ depends on θ, it is hoped that it does so only weakly.
The $(1-\alpha)$-level percentile-t upper bound for ψ is defined as $\hat\psi_{tU} = \hat\psi - \hat K^{-1}(\alpha)\,\hat\sigma_{\hat\psi}$,
where $\hat K$ denotes the simulated bootstrap distribution of
$$T^\star = \frac{\hat\psi^\star - \hat\psi}{\hat\sigma^\star_{\hat\psi}}\,.$$
This is done by simulating samples $\mathbf X^\star_1,\ldots,\mathbf X^\star_B$ from $P_{\hat\theta}$, generating $T^\star_1,\ldots,T^\star_B$,
with
$$T^\star_i = \frac{\hat\psi^\star_i - \hat\psi}{\hat\sigma^\star_{\hat\psi i}}$$
computed from the ith bootstrap sample $\mathbf X^\star_i$. For $\ell = \alpha B$ take the $\ell$th ordered
value $T^\star_{(\ell)}$ of the order statistics
$$T^\star_{(1)} \le \ldots \le T^\star_{(B)}$$
of the true, but unknown value of θ, and thus $K^{-1}_{\hat\theta}(\alpha)$ is likely to be more
relevant than taking any value of θ in $K^{-1}_\theta(\alpha)$ and solely appealing to the
insensitivity of $K_\theta$ with respect to θ.
The above definition of percentile-t bounds is for upper bounds, but by
switching from α to 1 − α we are covering 1 − α lower bounds as well.
Combining (1 − α)-level upper and lower bounds we obtain (1 − 2α)-level
confidence intervals for ψ.
5 Double Bootstrap Confidence Bounds
This section introduces two closely related double bootstrap methods for
constructing confidence bounds. Single bootstrapping amounts to generat-
ing B bootstrap samples, where B is quite large, typically B = 1, 000, and
computing estimates for each such bootstrap sample. In double bootstrap-
ping each of these B bootstrap samples spawns itself a set of A second order
bootstrap samples. Thus, all in all, A · B samples will be generated with
the attending data analyses to compute estimates, typically A · B + B of
them. If A = B = 1000 this amounts to 1, 001, 000 such analyses and is thus
computationally very intensive. This is a high computational price to pay,
especially when the computation of the estimate $\hat\theta(\mathbf X)$ is costly to begin with. If
that cost grows with the sample size of $\mathbf X$, one may want to limit its use
to analyses involving small sample sizes, but that is the area where coverage
improvement makes most sense anyway. Before these methods will be used
routinely, progress will need to be made in computational efficiency. We hope
that some time soon clever algorithms will be found that reduce the effort of
A · B simulations to k · B, where k is of the order of ten. Such a reduction
would make these double bootstrap methods definitely the preferred choice
as a general tool for constructing confidence bounds.
It appears that methods based on double bootstrap approaches are most suc-
cessful in maintaining the intended coverage rates for the resulting confidence
bounds. A first application of the double bootstrap method to confidence
bounds surfaced in the last section when discussing the possibility of boot-
strap scale estimates to be used in the bootstrapped Studentized pivots of
the percentile-t method. Here we first discuss Beran’s (1987) method, which
is based on the concept of a root (a generalization of the pivot concept) and
the prepivoting idea. The latter invokes an estimated probability integral
transform in order to obtain improved pivots, which then are bootstrapped.
It is shown that Beran’s method is equivalent to Loh’s (1987) calibration
of confidence coefficients. This calibration uses the bootstrap method to
estimate the coverage error with the aim of correcting for it. The second it-
erated bootstrap method, proposed by Scholz (1992), automatically finds the
proper natural pivot when such pivots exist. This yields confidence bounds
with essentially exact coverage whenever these are possible.
5.1 Prepivot Bootstrap Methods
This subsection introduces the concept of a root, motivates the use of roots
by showing how confidence bounds are derived from special types of roots,
namely from exact pivots. Then single bootstrap confidence bounds, based
on roots, are introduced and seen to be a simple extension of Hall’s percentile
method. These confidence sets are based on an estimated probability integral
transform. This transform can be iterated, which suggests the prepivoting
step. The effect of this procedure is examined analytically in a special exam-
ple, where it results in exact coverage. Since such an analysis is not always
feasible, it is then shown how to accomplish the same by an iterated bootstrap
simulation procedure. This is concluded with remarks about the improved large
sample properties of the prepivot methods and with some critical comments.
then $C(\mathbf X, 1-\alpha)$ can be considered a $(1-\alpha)$-level confidence set for ψ. This
results from

where $\hat\psi = \psi(\hat\theta)$. Note that we have replaced all appearances of θ by $\hat\theta$, i.e.,
in the distribution $P_{\hat\theta}$ generating the bootstrap samples $\mathbf X^\star_i$ and in $\hat\psi = \psi(\hat\theta)$.
For large B this bootstrap sample of roots will give an accurate description
of $F_{\hat\theta}(\cdot)$, namely
$$\frac{1}{B}\sum_{i=1}^B I_{[R(\mathbf X^\star_i,\hat\psi)\le x]} \longrightarrow F_{\hat\theta}(x) \quad\text{as } B\to\infty\,.$$
By sorting the bootstrap sample of roots we can, by the usual process, get a
good approximation to the quantile $r_{1-\alpha}(\hat\theta)$, which is defined by
$$F_{\hat\theta}\left(r_{1-\alpha}(\hat\theta)\right) = 1-\alpha \quad\text{or}\quad r_{1-\alpha}(\hat\theta) = F^{-1}_{\hat\theta}(1-\alpha)\,.$$
The second representation of CB shows that the construction of the confi-
dence set appeals to the probability integral transform. Namely, for con-
tinuous Fθ the random variable U = Fθ (R) has a uniform distribution on
the interval (0, 1) and then $P(U \le 1-\alpha) = 1-\alpha$. Unfortunately, we can
only use the estimated probability integral transform $\hat U = F_{\hat\theta}(R)$, and $\hat U$ is
no longer distributed uniformly on (0, 1). In addition, its distribution will
usually still depend on θ. However, the distribution of $\hat U$ should approximate
that of U(0, 1).

The above method for bootstrap confidence sets is nothing but Hall's percentile
method, provided we take as root the location root
$$R(\mathbf X, \psi) = \hat\psi - \psi = \hat\psi(\mathbf X) - \psi\,.$$
Thus the above bootstrap confidence sets based on roots represent an extension
of Hall's percentile method to other than location roots.
as another root and in applying the above bootstrap confidence set process
with $R_1(\mathbf X,\psi)$ as root. Note that $R_1(\mathbf X,\psi)$ depends on $\mathbf X$ in two ways, once
through $\hat\theta = \hat\theta(\mathbf X)$ in $F_{\hat\theta}$ and once through $\mathbf X$ in $R(\mathbf X,\psi)$. We denote the
distribution function of $R_1$ by $F_{1\theta}$. It is worthwhile to point out again the
double dependence of $F_{1\theta}$ on θ, namely through $P_\theta$ and $\psi(\theta)$ in
$$F_{1\theta}(x) = P_\theta\left(R_1\left(\mathbf X, \psi(\theta)\right) \le x\right).$$
The formal bootstrap procedure consists in estimating $F_{1\theta}(x)$ by $F_{1\hat\theta}(x)$, i.e.,
by replacing θ with $\hat\theta$. When the functional form of $F_{1\theta}$ is not known, one
resorts again to simulation, as will be explained later.
Denoting the $(1-\alpha)$-quantile of $F_{1\hat\theta}(x)$ by
$$r_{1,1-\alpha}(\hat\theta) = F^{-1}_{1\hat\theta}(1-\alpha)$$
with nominal confidence level $1-\alpha$. Again, the second form of the confidence
set $C_{1B}(\mathbf X, 1-\alpha)$ shows the appeal to the estimated probability integral
transform, since $F_{1\theta}\left(R_1(\mathbf X,\psi)\right)$ is exactly U(0, 1), provided $F_{1\theta}$ is continuous.
Actually we are dealing with a repeated estimated probability integral transform,
since $R_1$ already represented such a transform. It is hoped that this
repeated transform
$$R_2(\mathbf X,\psi) = F_{1\hat\theta}\left(R_1(\mathbf X,\psi)\right) = F_{1\hat\theta}\left(F_{\hat\theta}\left(R(\mathbf X,\psi)\right)\right)$$

to choose the original nominal confidence level in the definition of $C_B$, now
denoted by $1-\alpha_1$, such that the estimated exact coverage of $C_B(\mathbf X, 1-\alpha_1)$
becomes $1-\alpha$, i.e.,
$$F_{1\hat\theta}(1-\alpha_1) = 1-\alpha \quad\text{or}\quad 1-\alpha_1 = F^{-1}_{1\hat\theta}(1-\alpha)\,.$$
$$\psi = \psi(\theta) = \psi(\mu,\sigma) = \mu$$
$$R(\mathbf X,\mu) = \hat\mu - \mu = \bar X - \mu\,.$$
Analytically $F_\theta$ is found to be
$$F_\theta(x) = P_\theta\left(\bar X - \mu \le x\right) = \Phi\left(\frac{\sqrt n\, x}{\sigma}\right).$$
This leads to
$$R_1(\mathbf X,\mu) = F_{\hat\theta}\left(R(\mathbf X,\mu)\right) = \Phi\left(\frac{\sqrt n\, R(\mathbf X,\mu)}{\hat\sigma}\right) = \Phi\left(\frac{\sqrt n\,(\bar X - \mu)}{\hat\sigma}\right).$$
Since
$$T_{n-1} = \frac{\sqrt n\,(\bar X - \mu)}{\hat\sigma\sqrt{n/(n-1)}} = \frac{\sqrt n\,(\bar X - \mu)}{S} \sim G_{n-1}\,,$$
where $G_{n-1}$ represents the Student-t distribution function with $n-1$ degrees
of freedom, we find that
$$F_{1\theta}(x) = P_\theta\left(R_1(\mathbf X,\mu) \le x\right)
= P\left(\Phi\left(\sqrt{n/(n-1)}\, T_{n-1}\right) \le x\right)
= G_{n-1}\left(\sqrt{(n-1)/n}\,\Phi^{-1}(x)\right),$$
leading to the classical lower confidence bound for µ, with exact coverage
1 − α. Of course, the above derivation appears rather convoluted in view of
the usual straightforward derivation of the classical bounds. This convoluted
process is not an intrinsic part of the prepivoting method and results only
from the analytical tracking of the prepivoting method. When prepivoting is
done by simulation, see Section 5.1.7, the derivation of the confidence bounds
is conceptually more straightforward, and all the work is in the simulation
effort.
A similar convoluted exercise, still in the context of Example 2 and using the
prepivot method with the location root $R(\mathbf X,\sigma^2) = \hat\sigma^2 - \sigma^2$, leads to the lower
bound $n\hat\sigma^2/\chi^2_{n-1}(1-\alpha)$. This coincides with the classical lower confidence
bound for $\sigma^2$, with exact coverage $1-\alpha$. Here $\chi^2_{n-1}(1-\alpha)$ is the $(1-\alpha)$-quantile
of the chi-square distribution with $n-1$ degrees of freedom. Matters
would have been even better had we used the scale pivot $R(\mathbf X,\sigma^2) = \hat\sigma^2/\sigma^2$
instead. In that case the simple bootstrap confidence set $C_B(\mathbf X, 1-\alpha)$
would immediately lead to the classical bounds and bootstrap iteration would
not be necessary. This particular example shows that a good choice of root
can definitely improve matters.
It turns out that the above examples can be generalized and in doing so the
demonstration of the exact coverage property becomes greatly simplified.
However, the derivation of the confidence bounds themselves may still be
complicated.
The exact coverage in both the above examples is just a special case of the
following general result. In our generic setup let us further assume that
$$R_1(\mathbf X,\psi) = F_{\hat\theta}\left(R(\mathbf X,\psi)\right)$$
is an exact pivot with continuous distribution function $F_1$, which is independent
of θ. This pivot assumption is satisfied in both our previous normal
examples and it is the reason behind the exact coverage there as well as in
this general case. Namely,
$$C_{1B}(\mathbf X, 1-\alpha) = \left\{\psi : F_1\left(F_{\hat\theta}\left(R(\mathbf X,\psi)\right)\right) \le 1-\alpha\right\}$$
has exact coverage since
$$U = F_1\left(F_{\hat\theta}\left(R(\mathbf X,\psi)\right)\right) \sim U(0,1)\,.$$
55
Prepivoting by simulation proceeds by generating bootstrap samples X*_1, ..., X*_B from P_θ̂ and computing the roots R_1(X*_i, ψ(θ̂)), where we postpone for the moment the discussion of how to compute each such root. Note that θ̂ has taken the place of θ in ψ(θ̂) and in P_θ̂, which generated the bootstrap samples.
By the LLN we have that
$$\frac{1}{B}\sum_{i=1}^B I_{\left[R_1(X_i^\star,\,\psi(\hat\theta)) \le x\right]} \longrightarrow F_{1\hat\theta}(x) \quad\text{as } B \to \infty.$$
As for the computation of each R_1(X*_i, ψ(θ̂)), we will need to employ a second level of bootstrap sampling. Recall that R_1(X, ψ) = F_{θ̂(X)}(R(X, ψ)) and thus
$$R_1(X_i^\star, \psi(\hat\theta)) = F_{\hat\theta_i^\star}\left(R(X_i^\star, \psi(\hat\theta))\right),$$
where θ̂*_i = θ̂(X*_i), with X*_i generated by P_θ̂.
For any θ̂*_i generate a second level bootstrap sample
$$X_{ij}^{\star\star}, \quad j = 1, \ldots, A,$$
from P_{θ̂*_i}, and thus, by the LLN,
$$\hat R_{1i} = \frac{1}{A}\sum_{j=1}^A I_{\left[R(X_{ij}^{\star\star},\,\psi(\hat\theta_i^\star)) \le R(X_i^\star,\,\psi(\hat\theta))\right]} \longrightarrow F_{\hat\theta_i^\star}\left(R(X_i^\star, \psi(\hat\theta))\right) = R_1(X_i^\star, \psi(\hat\theta)) \quad\text{as } A \to \infty.$$
Thus, for large A we can consider R̂_{1i} a good approximation to R_1(X*_i, ψ(θ̂)).
In the same vein we can, for large B and A, consider
$$\hat R_2(x) = \frac{1}{B}\sum_{i=1}^B I_{[\hat R_{1i} \le x]} \approx \frac{1}{B}\sum_{i=1}^B I_{\left[R_1(X_i^\star,\,\psi(\hat\theta)) \le x\right]}$$
as a good approximation of F_{1θ̂}(x). In particular, by sorting the R̂_{1i} we can obtain their (1 − α)-quantile γ̂ = r̂_1(1 − α) by the usual method and treat it as a good approximation for γ = F_{1θ̂}^{-1}(1 − α). Sorting the first level bootstrap sample
$$R(X_1^\star, \hat\psi), \ldots, R(X_B^\star, \hat\psi)$$
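A compact sketch of this nested simulation, specialized to the normal-mean example above, is given below. It is only an illustration: the data and the choices of B and A are assumptions, and the final inversion step simply applies the definition of C_{1B} with the estimated quantile γ̂.

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=10)              # observed sample (illustrative)
n, alpha, B, A = len(x), 0.05, 2000, 2000
mu_hat, sig_hat = x.mean(), x.std()            # MLEs of (mu, sigma)

r1 = np.empty(B)                               # prepivoted roots R_hat_1i
fr = np.empty(B)                               # first level roots R(X_i*, mu_hat)
for i in range(B):
    xs = rng.normal(mu_hat, sig_hat, size=n)   # first level sample from P_theta_hat
    mu_s, sig_s = xs.mean(), xs.std()
    fr[i] = mu_s - mu_hat
    xss = rng.normal(mu_s, sig_s, size=(A, n)) # second level samples from P_theta_hat_i*
    r1[i] = np.mean(xss.mean(axis=1) - mu_s <= fr[i])

gamma_hat = np.quantile(r1, 1 - alpha)         # estimates F_{1 theta_hat}^{-1}(1 - alpha)
# C_1B = { mu : R(X, mu) <= F_theta_hat^{-1}(gamma_hat) }; approximate the quantile
# of F_theta_hat by the gamma_hat-quantile of the first level roots
mu_lower = mu_hat - np.quantile(fr, gamma_hat)
print(mu_lower)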
When ψ = ρ is the bivariate normal correlation in Example 4, for instance, one can hardly treat ρ as a location or scale parameter, and either of the above two roots is inappropriate. There is of course a natural pivot for ρ, but it is very complicated and difficult to compute. The next section presents a modification of the double bootstrap method which gets around the need to choose a root by automatically generating a canonical root as part of the process.
(ii) R(ψ̂, ψ) is increasing in ψ̂ for fixed ψ.
Note that these assumptions do not preclude the presence of nuisance param-
eters. However, the role of such nuisance parameters is masked in that they
neither appear in the pivot nor influence its distribution F . As such, these
parameters are not really a nuisance. The following two examples satisfy the
above assumptions and in both cases nuisance parameters are present in the
model.
In the first example we revisit Example 4. Here we are interested in confidence bounds for the correlation coefficient ψ = ψ(θ) = ρ. Fortuitously, the distribution function H_ρ(r) of the maximum likelihood estimate ρ̂ is continuous, depends only on the parameter ρ, and is monotone decreasing in ρ for fixed r (see Lehmann 1986, p. 340). Further,
$$R(\hat\rho, \rho) = H_\rho(\hat\rho) \sim U(0,1)$$
is a pivot. Thus (i) and (ii) are satisfied. This example has been examined extensively in the literature, and Hall (1992) calls it the “smoking gun” of bootstrap methods, i.e., any good bootstrap method had better perform reasonably well on this example. The percentile-t method, for example, fails spectacularly here, mainly because Studentizing does not produce a pivot in this case. This question was raised by Reid (1981) in the discussion of Efron (1981).
In the second example we revisit Example 2. Here we are interested in
confidence bounds on ψ = ψ(θ) = σ 2 . Using again maximum likelihood
estimates we have that
$$R(\hat\psi, \psi) = \frac{\hat\psi}{\psi} = \frac{\hat\sigma^2}{\sigma^2}$$
is a pivot and satisfies (i) and (ii).
If we know the pivot distribution function F and the functional form of R, we can construct exact confidence bounds for ψ as follows. From
$$P_\theta\left(R(\hat\psi, \psi) \le F^{-1}(1-\alpha)\right) = 1-\alpha$$
we obtain ψ̂_L by solving R(ψ̂, ψ) = F^{-1}(1 − α) for ψ. Hence we have in ψ̂_L an exact 100(1 − α)% lower confidence bound for ψ. The dependence of ψ̂_L on F and R is apparent.
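As a small illustration (not from the report; the data are made up), the construction above applied to the σ² example gives the classical bound, since the pivot σ̂²/σ² is distributed as χ²_{n−1}/n:

import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 6.0, 4.8, 5.9, 5.2, 4.4, 6.3])   # illustrative data
n, alpha = len(x), 0.05
sig2_hat = x.var()                          # MLE of sigma^2 (divides by n)
# F is the c.d.f. of chi^2_{n-1}/n, so F^{-1}(1-alpha) = chi2_{n-1}(1-alpha)/n;
# solving sig2_hat / sigma^2 = F^{-1}(1-alpha) for sigma^2 gives
sig2_L = n * sig2_hat / stats.chi2.ppf(1 - alpha, df=n - 1)
print(sig2_L)                               # classical lower bound for sigma^2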
It turns out that it is possible in principle to get the same exact confidence
bound without knowing F or R, as long as they exist. This is done at the
expense of performing the double bootstrap. Here exactness holds provided
both bootstrap simulation sample sizes tend to infinity.
The procedure is as follows. First obtain a bootstrap sample of estimates
ψ̂*_1, ..., ψ̂*_B by the usual process from P_θ̂. By the LLN we have
$$\hat G_B(y \,|\, \hat\theta) = \frac{1}{B}\sum_{i=1}^B I_{[\hat\psi_i^\star \le y]} \longrightarrow P_{\hat\theta}\left(\hat\psi^\star \le y\right) \quad\text{as } B \to \infty.$$
Using this empirical distribution function Ĝ_B(y | θ̂) we are able to approximate P_θ̂(ψ̂* ≤ y) to any accuracy by just taking B large enough. With the understanding of this approximation we will thus use Ĝ_B(y | θ̂) and P_θ̂(ψ̂* ≤ y) interchangeably.
From monotonicity property (ii) we then have
$$P_{\hat\theta}\left(\hat\psi^\star \le y\right) = P_{\hat\theta}\left(R(\hat\psi^\star, \hat\psi) \le R(y, \hat\psi)\right) = F\left(R(y, \hat\psi)\right).$$
Next, given a value θ̂*_i and ψ̂*_i = ψ(θ̂*_i), we obtain a second level bootstrap sample of estimates
$$\hat\psi_{i1}^{\star\star}, \ldots, \hat\psi_{iA}^{\star\star} \quad\text{from } P_{\hat\theta_i^\star}.$$
From each such second level sample we compute Ĝ_{1A}(ψ̂ | θ̂*_i) = (1/A) Σ_j I[ψ̂**_ij ≤ ψ̂], i = 1, ..., B, and regard these values as an equivalent proxy for
$$F\left(R(\hat\psi, \hat\psi_1^\star)\right), \ldots, F\left(R(\hat\psi, \hat\psi_B^\star)\right).$$
Sorting these values we find the (1 − α)-quantile by the usual process. The corresponding ψ̂* = ψ̂*_i = ψ̂*_L approximately solves
$$F\left(R(\hat\psi, \hat\psi^\star)\right) \approx 1 - \alpha.$$
This value ψ̂*_L is approximately the same as our previous ψ̂_L, provided A and B are sufficiently large.
The above procedure can be reduced to the following: find that value θ̂* and ψ̂* = ψ(θ̂*) for which
$$P_{\hat\theta^\star}\left(\hat\psi^{\star\star} \le \hat\psi\right) = F\left(R(\hat\psi, \hat\psi^\star)\right) \approx 1 - \alpha.$$
This is then iterated by trying new values of ψ̂*, i.e., ψ̂*_1, ψ̂*_2, .... Since F(R(ψ̂, ψ̂*)) is decreasing in ψ̂*, one should be able to employ efficient root finding algorithms for solving
$$F\left(R(\hat\psi, \hat\psi^\star)\right) = 1 - \alpha,$$
i.e., use far fewer than the originally indicated AB bootstrap samples. It seems reasonable that kA samples will be sufficient, with A ≈ 1000 and k ≈ 10 to 20.
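The following sketch illustrates this root-finding view for the σ² example, where the answer is known to be the classical bound. Since the pivot σ̂²/σ² does not involve µ, the sketch keeps µ fixed at X̄ while searching over ψ* = σ*²; the data, the second-level size A, the bracketing interval, and the use of common random numbers are all assumptions of the illustration, not prescriptions of the report.

import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(3)
x = rng.normal(10.0, 3.0, size=12)              # observed sample (illustrative)
n, alpha, A = len(x), 0.05, 20_000
psi_hat = x.var()                               # MLE of sigma^2 (divides by n)

z = rng.standard_normal((A, n))                 # common random numbers, reused below
def coverage_at(psi_star):
    """Monte Carlo estimate of P_{theta*}(psi_hat** <= psi_hat)."""
    xss = x.mean() + np.sqrt(psi_star) * z      # second level samples from N(x-bar, psi*)
    return np.mean(xss.var(axis=1) <= psi_hat)

# solve coverage_at(psi*) = 1 - alpha; the solution is the lower bound psi_L*
psi_L = optimize.brentq(lambda p: coverage_at(p) - (1 - alpha),
                        1e-6 * psi_hat, 50.0 * psi_hat)

# classical lower bound n * sigma_hat^2 / chi2_{n-1}(1 - alpha), for comparison
print(psi_L, n * psi_hat / stats.chi2.ppf(1 - alpha, df=n - 1))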
Note that in this procedure we only need to evaluate Ĝ_{1A}(ψ̂ | θ̂*_i), which in turn only requires that we know how to evaluate the estimates ψ̂, ψ̂*, or ψ̂**. No knowledge of the pivot function R or its distribution function F is required. For the previously discussed bivariate normal correlation example there exists a tame pivot. Therefore we can either obtain the exact confidence bound through the above bootstrap process, via massive simulation of computationally simple evaluations of ρ̂, or instead use the computationally difficult analytical process of evaluating the distribution function H_ρ(x) of ρ̂ and solving
$$H_\rho(\hat\rho) = \alpha$$
Motivated by the probability integral transform result that D_{ψ,η}(ψ̂) ∼ U(0,1) for continuous D_{ψ,η}, where D_{ψ,η}(y) = P_{ψ,η}(ψ̂ ≤ y) denotes the distribution function of ψ̂, we make the following general pivot assumption:

(V) D_{ψ,η̂}(ψ̂) is a pivot, i.e., has a distribution function H which does not depend on unknown parameters, and D_{ψ,η̂}(ψ̂) is decreasing in ψ for fixed ψ̂ and η̂.

In the tame pivot case of the previous section we have, for example, D_{ψ,η}(y) = F(R(y, ψ)) and
$$D_{\psi,\hat\eta}(\hat\psi) = F\left(R(\hat\psi, \psi)\right) \sim U(0,1).$$
We can think of θ as reparametrized in terms of ψ and σ, and again we use maximum likelihood estimates ψ̂ and σ̂ for ψ and σ. We have that
$$\frac{\hat\psi - \psi}{\hat\sigma} \quad\text{and}\quad \frac{\hat\psi - \psi}{\sigma}$$
are both pivots, with respective c.d.f.'s G_1 and G_2, and
$$D_\theta(y) = P_\theta\left(\hat\psi \le y\right) = G_2\left(\frac{y - \psi}{\sigma}\right).$$
Thus
$$D_{\psi,\hat\sigma}(\hat\psi) = G_2\left(\frac{\hat\psi - \psi}{\hat\sigma}\right) \sim G_2\left(G_1^{-1}(U)\right),$$
where U ∼ U(0,1).
This example generalizes easily. Assume that there is a function R(ψ̂, ψ, η) which is a pivot, i.e., has distribution function G_2, and is decreasing in ψ and increasing in ψ̂. Suppose further that R(ψ̂, ψ, η̂) is also a pivot, with distribution function G_1. Then again our general pivot assumption (V) is satisfied. This follows from
$$D_{\psi,\eta}(y) = P_{\psi,\eta}\left(\hat\psi \le y\right) = P_{\psi,\eta}\left(R(\hat\psi, \psi, \eta) \le R(y, \psi, \eta)\right) = G_2\left(R(y, \psi, \eta)\right)$$
and thus
$$D_{\psi,\hat\eta}(\hat\psi) = G_2\left(R(\hat\psi, \psi, \hat\eta)\right) = G_2\left(G_1^{-1}(U)\right).$$
Let τ = g(ψ), with g strictly increasing, and let τ̂ = g(ψ̂) be the corresponding estimate of τ. Then the above procedure applied to τ̂, with θ = (ψ, η) reparametrized to ϑ = (τ, η), yields τ̂_L = g(ψ̂_L).
This is seen as follows. Denote the reparametrized probability model by P̃_{τ,η}, which is equivalent to P_{g^{-1}(τ),η}. The distribution function of τ̂ is
$$\tilde D_{\tau,\eta}(y) = \tilde P_{\tau,\eta}(\hat\tau \le y) = \tilde P_{\tau,\eta}\left(g(\hat\psi) \le y\right) = \tilde P_{\tau,\eta}\left(\hat\psi \le g^{-1}(y)\right) = P_{g^{-1}(\tau),\eta}\left(\hat\psi \le g^{-1}(y)\right) = D_{g^{-1}(\tau),\eta}\left(g^{-1}(y)\right),$$
so that
$$\tilde D_{\tau,\hat\eta}(\hat\tau) = D_{g^{-1}(\tau),\hat\eta}\left(g^{-1}(g(\hat\psi))\right) = D_{g^{-1}(\tau),\hat\eta}(\hat\psi),$$
and the defining equation for the lower bound becomes
$$1 - \alpha = H\left(\tilde D_{\tau,\hat\eta}(\hat\tau)\right) = H\left(D_{g^{-1}(\tau),\hat\eta}(\hat\psi)\right).$$
$$(\hat\psi_{ij}^{\star\star}, \hat\eta_{ij}^{\star\star}), \quad j = 1, \ldots, A, \; i = 1, \ldots, B.$$
By the LLN, as A → ∞, we have
$$\hat D_{iA} = \frac{1}{A}\sum_{j=1}^A I_{\left[\hat\psi_{ij}^{\star\star} \le \hat\psi_i^\star\right]} \longrightarrow P_{\psi_0,\hat\eta_i^\star}\left(\hat\psi_i^{\star\star} \le \hat\psi_i^\star\right) = D_{\psi_0,\hat\eta_i^\star}\left(\hat\psi_i^\star\right) \sim H.$$
The latter distributional assertion derives from the pivot assumption (V) and from the fact that (ψ̂*_i, η̂*_i) arises from P_{ψ_0,η_0}. Again appealing to the LLN we have
$$\frac{1}{B}\sum_{i=1}^B I_{\left[D_{\psi_0,\hat\eta_i^\star}(\hat\psi_i^\star) \le y\right]} \longrightarrow H(y) \quad\text{as } B \to \infty,$$
For large A, B, N this solution is practically identical with the exact lower confidence bound ψ̂_L. If this latter process takes k iterations we will have performed AB + kN bootstrap samples. This is by no means efficient, and it is hoped that future work will make the computational aspects of this approach more practical.
5.2.3 The Prepivoting Connection
We now examine the connection to Beran's prepivoting approach and, by equivalence, also to Loh's calibrated confidence sets (see Section 5.1.5). Suppose we have a specified root function R(ψ̂, ψ) = R(ψ̂(X), ψ) with distribution function F_{ψ,η}(x). This is somewhat more special than Beran's general root concept R(X, ψ). Suppose now that the following assumption holds:

(V*) F_{ψ,η̂}(R(ψ̂, ψ)) is a pivot, and F_{ψ,η̂}(R(ψ̂, ψ)) is decreasing in ψ for fixed ψ̂ and η̂.

Then the general pivot assumption (V) is satisfied, since
$$D_{\psi,\hat\eta}(\hat\psi) = F_{\psi,\hat\eta}\left(R(\hat\psi, \psi)\right)$$
is a pivot by assumption.
When F does not depend on ψ, i.e., when the root function is successful in eliminating ψ from the distribution of R, one can replace F_{ψ,η̂}(R(ψ̂, ψ)) by
$$F_{\hat\psi,\hat\eta}\left(R(\hat\psi, \psi)\right).$$
Using the latter
as root, Beran’s prepivoting will lead to exact confidence bounds, since the
distribution of R depends only on the nuisance parameter η = σ.
In contrast, consider Example 4 with ψ = ρ. If we take the root R(ρ̂, ρ) = ρ̂ − ρ, then
$$F_\rho(x) = P_\rho(\hat\rho - \rho \le x) = H_\rho(x + \rho),$$
with H_ρ denoting the c.d.f. of ρ̂. Here the assumption (V*) is satisfied, since
$$F_\rho(\hat\rho - \rho) = H_\rho(\hat\rho)$$
is a pivot. However,
$$F_{\hat\rho}(\hat\rho - \rho) = H_{\hat\rho}(\hat\rho - \rho + \hat\rho)$$
appears not to be a pivot, although we have not verified this. This difference is mostly due to the badly chosen root. If we had taken as root R(ρ̂, ρ) = H_ρ(ρ̂), then the distinction would not arise. In fact, in that case R itself is already a pivot. However, this particular root function is not trivial, and that points out the other difference between Beran's prepivoting and the automatic double bootstrap. In the latter method no knowledge of an “appropriate” root function is required.
As a complementary example consider Example 2 with the root R = √n(s² − σ²), for the purpose of constructing confidence bounds for σ². Let χ_f denote the c.d.f. of a chi-square distribution with f degrees of freedom. Then
$$F_{\mu,\sigma^2}(x) = P_{\mu,\sigma^2}\left(\sqrt{n}(s^2 - \sigma^2) \le x\right) = \chi_{n-1}\left((n-1)\left(1 + \frac{x}{\sqrt{n}\,\sigma^2}\right)\right).$$
Clearly
$$F_{\hat\mu,\sigma^2}(R) = \chi_{n-1}\left((n-1)\left(1 + \frac{\sqrt{n}(s^2 - \sigma^2)}{\sqrt{n}\,\sigma^2}\right)\right) = \chi_{n-1}\left(\frac{(n-1)s^2}{\sigma^2}\right) \sim U(0,1)$$
is a pivot, which will lead to the classical lower bound for σ². On the other hand, the iterated root
$$R_{1,n}(\sigma^2) = F_{\hat\mu,s^2}(R) = \chi_{n-1}\left((n-1)\left(1 + \frac{\sqrt{n}(s^2 - \sigma^2)}{\sqrt{n}\,s^2}\right)\right) = \chi_{n-1}\left((n-1)\left(2 - \frac{\sigma^2}{s^2}\right)\right)$$
is a pivot as well, with distribution function
$$F_{1,n}(x) = \chi_{n-1}\left((n-1)\left(2 - \frac{\chi_{n-1}^{-1}(x)}{n-1}\right)^{-1}\right) \quad\text{for } 0 < x \le \chi_{n-1}(2(n-1)),$$
and F_{1,n}(0) = χ_{n−1}((n − 1)/2), F_{1,n}(x) = 0 for x < 0 and F_{1,n}(x) = 1 for x ≥ χ_{n−1}(2(n − 1)). For γ ≥ χ_{n−1}((n − 1)/2) the set
$$B_{1,n} = \left\{\sigma^2 : F_{1,n}(R_{1,n}) \le \gamma\right\} = \left[\,(n-1)s^2/\chi_{n-1}^{-1}(\gamma),\ \infty\right)$$
yields the classical lower confidence bound, but for γ < χ_{n−1}((n − 1)/2) the set B_{1,n} is empty. This quirk was overlooked in Beran's (1987) treatment of this example. For large n the latter case hardly occurs, unless we deal with small γ's, i.e., with upper confidence bounds.
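A quick numerical look (illustrative n values, using scipy) at the threshold χ_{n−1}((n − 1)/2), below which B_{1,n} is empty, supports the last remark:

from scipy import stats

# smallest usable gamma for B_{1,n}: chi_{n-1}((n-1)/2)
for n in (5, 10, 20, 50):
    print(n, stats.chi2.cdf((n - 1) / 2, df=n - 1))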
Consider now Example 2 with ψ = µ + z_p σ and the estimates ψ̂ = X̄ + ks and σ̂ = rs, for some known constants k and r > 0. In question here is the sensitivity of the resulting automatic double bootstrap lower bound ψ̂_L with respect to k and r. This issue is similar to, but not the same as, that of transformation equivariance.
It turns out that ψ̂_L does not depend on k or r, i.e., the result is always the same, namely the classical lower confidence bound for ψ. For example, it does not matter whether we estimate σ by the m.l.e. or by s. More remarkable is the fact that we could have started with the very biased starting estimate ψ̂ = X̄, corresponding to k = 0, with the same final lower confidence bound. It is possible that there is a general theorem hidden behind this that would more cleanly dispose of the following convoluted argument for this result. This argument fills the remainder of this section and may be skipped.
Recalling ψ = µ + z_p σ, one easily derives
$$D_{\psi,\sigma}(x) = P_{\psi,\sigma}\left(\hat\psi \le x\right) = P_{\psi,\sigma}\left(\bar X + ks \le x\right) = P_{\psi,\sigma}\left(\frac{\sqrt{n}(\bar X - \mu)}{\sigma} + \frac{\sqrt{n}(\mu - x)}{\sigma} \le -ks\sqrt{n}/\sigma\right)$$
$$= G_{n-1,\,\sqrt{n}(\mu - x)/\sigma}\left(-k\sqrt{n}\right) = G_{n-1,\,\sqrt{n}(\psi - x)/\sigma - z_p\sqrt{n}}\left(-k\sqrt{n}\right), \qquad (5)$$
where G_{f,δ}(x) denotes the noncentral Student-t distribution function with f degrees of freedom and noncentrality parameter δ.
Next note that
$$D_{\psi,\hat\sigma}(\hat\psi) = G_{n-1,\,\sqrt{n}(\psi - \hat\psi)/\hat\sigma - z_p\sqrt{n}}\left(-k\sqrt{n}\right) = G_{n-1,\,-z_p\sqrt{n} - V/r}\left(-k\sqrt{n}\right),$$
where
$$V = \frac{\sqrt{n}(\hat\psi - \psi)}{s} = \frac{\sqrt{n}(\bar X - \mu - z_p\sigma)}{s} + k\sqrt{n} = k\sqrt{n} + T_{n-1,\,-z_p\sqrt{n}}$$
and T_{f,δ} is a random variable with distribution function G_{f,δ}(x). The distribution function H of D_{ψ,σ̂}(ψ̂) can be expressed more or less explicitly as
$$H(y) = P\left(D_{\psi,\hat\sigma}(\hat\psi) \le y\right) = P\left(-z_p\sqrt{n} - V/r \ge \delta(n-1, -k\sqrt{n}, y)\right),$$
where δ_y = δ(n − 1, −k√n, y) solves
$$G_{n-1,\,\delta_y}\left(-k\sqrt{n}\right) = y.$$
Using the above representation for V we have
$$H(y) = P\left(T_{n-1,\,-z_p\sqrt{n}} \le -rz_p\sqrt{n} - r\delta_y - k\sqrt{n}\right) = G_{n-1,\,-z_p\sqrt{n}}\left(-\sqrt{n}(rz_p + k) - r\delta(n-1, -k\sqrt{n}, y)\right).$$
Solving H(y_{1−α}) = 1 − α for y_{1−α} = H^{-1}(1 − α) we get
$$t_{n-1,\,-z_p\sqrt{n},\,1-\alpha} = -\sqrt{n}(rz_p + k) - r\delta(n-1, -k\sqrt{n}, y_{1-\alpha})$$
or
$$-\left(\sqrt{n}(rz_p + k) + t_{n-1,\,-z_p\sqrt{n},\,1-\alpha}\right)/r = \delta(n-1, -k\sqrt{n}, y_{1-\alpha}),$$
where t_{f,δ,1−α} is the (1 − α)-quantile of G_{f,δ}(x). Using the defining equation for δ_y we get
$$H^{-1}(1-\alpha) = G_{n-1,\,-\left(\sqrt{n}(rz_p + k) + t_{n-1,-z_p\sqrt{n},1-\alpha}\right)/r}\left(-k\sqrt{n}\right).$$
Solving
$$H^{-1}(1-\alpha) = D_{\psi,\hat\sigma}(\hat\psi),$$
i.e.,
$$G_{n-1,\,-\left(\sqrt{n}(rz_p + k) + t_{n-1,-z_p\sqrt{n},1-\alpha}\right)/r}\left(-k\sqrt{n}\right) = G_{n-1,\,\sqrt{n}(\psi - \hat\psi)/\hat\sigma - z_p\sqrt{n}}\left(-k\sqrt{n}\right),$$
for ψ gives
$$\hat\psi_L = \psi = \hat\psi - ks - \frac{s}{\sqrt{n}}\,t_{n-1,\,-z_p\sqrt{n},\,1-\alpha} = \bar X - \frac{s}{\sqrt{n}}\,t_{n-1,\,-z_p\sqrt{n},\,1-\alpha},$$
which is the classical lower confidence bound and does not involve k or r.
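The following sketch (illustrative data; the (k, r) pairs and the bracketing interval are arbitrary choices of the illustration) checks this insensitivity numerically: it computes H^{-1}(1 − α) from the closed form just derived and solves (5) for ψ, for two different (k, r) pairs, comparing both answers with the classical bound X̄ − s t_{n−1,−z_p√n,1−α}/√n.

import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(4)
x = rng.normal(100.0, 15.0, size=8)            # illustrative sample
n, p, alpha = len(x), 0.10, 0.05
xbar, s = x.mean(), x.std(ddof=1)
zp = stats.norm.ppf(p)
sqn = np.sqrt(n)

def psi_L(k, r):
    psi_hat, sig_hat = xbar + k * s, r * s
    t_q = stats.nct.ppf(1 - alpha, df=n - 1, nc=-zp * sqn)
    # H^{-1}(1 - alpha) from the closed form above
    target = stats.nct.cdf(-k * sqn, df=n - 1,
                           nc=-(sqn * (r * zp + k) + t_q) / r)
    # solve D_{psi, sig_hat}(psi_hat) = H^{-1}(1 - alpha) for psi, using (5)
    d = lambda psi: stats.nct.cdf(-k * sqn, df=n - 1,
                                  nc=sqn * (psi - psi_hat) / sig_hat - zp * sqn) - target
    return optimize.brentq(d, xbar - 20 * s, xbar + 20 * s)

classical = xbar - s * stats.nct.ppf(1 - alpha, df=n - 1, nc=-zp * sqn) / sqn
print(psi_L(zp, 1.0), psi_L(0.0, 0.5), classical)   # all three should agree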
“D_{ψ,η̂}(ψ̂) is approximately distribution free”
holds in a neighborhood of the true unknown parameter θ. Since presumably θ̂ is our best guess at θ, we may as well start our search for H^{-1}(1 − α) as close as possible to θ, namely with θ_0 = (ψ_0, η_0) = θ̂, in order to take greatest advantage of the closeness of the used approximation. To emphasize this we write
$$H_{\hat\theta}\left(D_{\psi,\hat\eta}(\hat\psi)\right) = 1 - \alpha$$
as the equation that needs to be solved for ψ to obtain the 100(1 − α)% lower bound ψ̂_L for ψ. Of course, the left side of this equation will typically no longer have a uniform distribution on (0, 1). Following Beran (1987) one could iterate this procedure further. If
$$H_{\hat\theta}\left(D_{\psi,\hat\eta}(\hat\psi)\right) \sim H_{2,\theta},$$
with H_{2,θ̂}(H_θ̂(D_{ψ,η̂}(ψ̂))) hopefully more uniform than H_θ̂(D_{ψ,η̂}(ψ̂)), one could then try for an adjusted lower bound by solving
$$H_{2,\hat\theta}\left(H_{\hat\theta}\left(D_{\psi,\hat\eta}(\hat\psi)\right)\right) = 1 - \alpha$$
for ψ = ψ̂_{2,L}. This process can be further iterated in obvious fashion, but whether this will be useful in small sample situations is questionable. What would such an iteration converge to in the specific situation to be examined next?
As illustration of the application of our method to an approximate pivot
situation we will consider the Behrens-Fisher problem, which was examined
by Beran (1988) in a testing context from an asymptotic rate perspective.
Let X_1, ..., X_m and Y_1, ..., Y_n be independent random samples from respective N(µ, σ_1²) and N(ν, σ_2²) populations.
for ψ = µ−ν. Since we do not assume σ1 = σ2 we are faced with the classical
Behrens-Fisher problem.
We will examine how the automatic double bootstrap or pivot method attacks
this problem. We can reparametrize the above model in terms of (ψ, η), where
µ = ψ + ν and η = (ν, σ_1, σ_2). As natural estimate of ψ we take ψ̂ = X̄ − Ȳ, and as estimate for η we take η̂ = (Ȳ, s_1, s_2), where s_i² is the usual unbiased estimate of σ_i². The distribution function of ψ̂ is
$$D_{\psi,\eta}(x) = P_{\psi,\eta}\left(\bar X - \bar Y \le x\right) = \Phi\left(\frac{x - \psi}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}}\right).$$
The distribution function H_ρ of
$$D_{\psi,\hat\eta}(\hat\psi) = \Phi\left(\frac{\hat\psi - \psi}{\sqrt{s_1^2/m + s_2^2/n}}\right)$$
depends on the unknown parameters only through
$$\rho = \rho(\sigma_1^2, \sigma_2^2) = \frac{n\sigma_1^2}{n\sigma_1^2 + m\sigma_2^2}.$$
The same is true for the distribution function G_ρ of the Studentized statistic
$$T = \frac{\hat\psi - \psi}{\sqrt{s_1^2/m + s_2^2/n}}.$$
Classical approximate solutions approximate G_ρ and in the process replace the unknown ρ by ρ̂ = ρ(s_1², s_2²). This is done for example in Welch's solution (Welch (1947) and Aspin (1949)), where G_ρ is approximated by a Student t-distribution function F_f(t) with f = f(ρ) degrees of freedom, with
$$f(\rho) = \left(\frac{\rho^2}{m-1} + \frac{(1-\rho)^2}{n-1}\right)^{-1}.$$
There is of course the possibility that the two approximation errors in Welch's solution cancel each other out to some extent.
The second phase of the pivot or automatic double bootstrap method stipulates that we solve
$$1 - \alpha = H_{\hat\rho}\left(D_{\psi,\hat\eta}(\hat\psi)\right) = G_{\hat\rho}\left(\frac{\hat\psi - \psi}{\sqrt{s_1^2/m + s_2^2/n}}\right)$$
for ψ = ψ̂_L, which yields the following 100(1 − α)% lower bound for ψ:
$$\hat\psi_L = \hat\psi - G_{\hat\rho}^{-1}(1-\alpha)\,\sqrt{s_1^2/m + s_2^2/n}.$$
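In practice G_ρ̂ can be approximated by simulation, since it is the distribution of T under normal samples whose variance ratio matches ρ̂. The sketch below (illustrative samples and simulation size; σ_i set to s_i, which fixes the ratio at ρ̂) computes ψ̂_L this way and also the corresponding Welch bound for comparison; as noted in the following paragraph, this bootstrap bound coincides with the one obtained by bootstrapping the Studentized statistic T.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(10.0, 2.0, size=6)       # X-sample (illustrative)
y = rng.normal(8.0, 5.0, size=9)        # Y-sample (illustrative)
m, n, alpha, nsim = len(x), len(y), 0.05, 100_000

s1, s2 = x.std(ddof=1), y.std(ddof=1)
psi_hat = x.mean() - y.mean()
se = np.sqrt(s1**2 / m + s2**2 / n)

# simulate T under sigma_1 = s1, sigma_2 = s2 (only the ratio rho_hat matters)
xs = rng.normal(0.0, s1, size=(nsim, m))
ys = rng.normal(0.0, s2, size=(nsim, n))
t_star = (xs.mean(axis=1) - ys.mean(axis=1)) / np.sqrt(
    xs.var(axis=1, ddof=1) / m + ys.var(axis=1, ddof=1) / n)

psi_L_boot = psi_hat - np.quantile(t_star, 1 - alpha) * se

# Welch's solution for comparison
rho_hat = (s1**2 / m) / (s1**2 / m + s2**2 / n)
f = 1.0 / (rho_hat**2 / (m - 1) + (1 - rho_hat)**2 / (n - 1))
psi_L_welch = psi_hat - stats.t.ppf(1 - alpha, df=f) * se
print(psi_L_boot, psi_L_welch)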
Beran (1988) arrives at exactly the same bound (although in a testing context) by simple bootstrapping. However, he started out with the Studentized test statistic T, which thus is one step ahead in the game. It is possible to analyze the true coverage probabilities for ψ̂_L and for the corresponding Welch bound ψ̂_WL, although the evaluation of the analytical formulae for these coverage probabilities requires substantial numerical effort.
These analytical formulae are derived by using a well known conditioning device; see Fleiss (1971) for an account of the details. The formula for the exact coverage probability of ψ̂_L is as follows:
$$K_\rho(1-\alpha) = P_\rho\left(\hat\psi_L \le \psi\right) = \int_0^1 b(w)\,F_g\left(G_{\hat\rho(w)}^{-1}(1-\alpha)\,\sqrt{g\,a_1(\rho)\,w + g\,a_2(\rho)\,(1-w)}\right)\,dw$$
with
$$g = m + n - 2, \qquad a_1(\rho) = \frac{\rho}{m-1}, \qquad a_2(\rho) = \frac{1-\rho}{n-1}.$$
Here
$$b(w) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,w^{\alpha-1}(1-w)^{\beta-1}\,I_{[0,1]}(w)$$
is the beta density with α = (m − 1)/2 and β = (n − 1)/2 (not to be confused with the α in the confidence level),
$$\hat\rho(w) = \frac{w\rho(n-1)}{w\rho(n-1) + (1-w)(1-\rho)(m-1)},$$
and G_ρ^{-1}(p) is the inverse of
$$G_\rho(x) = P_\rho(T \le x) = \int_0^1 b(u)\,F_g\left(x\sqrt{g\,a_1(\rho)\,u + g\,a_2(\rho)\,(1-u)}\right)\,du.$$
The corresponding formula for the exact coverage of ψ̂_WL is
$$W_\rho(1-\alpha) = P_\rho\left(\hat\psi_{WL} \le \psi\right) = \int_0^1 b(w)\,F_g\left(F_{f(\hat\rho(w))}^{-1}(1-\alpha)\,\sqrt{g\,a_1(\rho)\,w + g\,a_2(\rho)\,(1-w)}\right)\,dw.$$
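The two formulas can be evaluated numerically, for instance as in the following sketch (scipy quadrature and root finding; m, n, ρ, and α are illustrative choices and the numerical tolerances are left at their defaults):

import numpy as np
from scipy import stats, integrate, optimize

def coverages(m, n, rho, alpha=0.05):
    g = m + n - 2
    a1, a2 = rho / (m - 1), (1 - rho) / (n - 1)
    beta = stats.beta(a=(m - 1) / 2, b=(n - 1) / 2)

    def G(x, r):                 # G_r(x): c.d.f. of T at variance ratio r
        c1, c2 = r / (m - 1), (1 - r) / (n - 1)
        f = lambda u: beta.pdf(u) * stats.t.cdf(
            x * np.sqrt(g * (c1 * u + c2 * (1 - u))), df=g)
        return integrate.quad(f, 0, 1)[0]

    def G_inv(p, r):             # numerical inverse of G_r
        return optimize.brentq(lambda x: G(x, r) - p, -100, 100)

    def rho_w(w):                # rho_hat(w) as defined above
        return w * rho * (n - 1) / (w * rho * (n - 1) + (1 - w) * (1 - rho) * (m - 1))

    def welch_df(r):
        return 1.0 / (r**2 / (m - 1) + (1 - r)**2 / (n - 1))

    def K_term(w):               # integrand of K_rho(1 - alpha)
        q = G_inv(1 - alpha, rho_w(w))
        return beta.pdf(w) * stats.t.cdf(q * np.sqrt(g * (a1 * w + a2 * (1 - w))), df=g)

    def W_term(w):               # integrand of W_rho(1 - alpha)
        q = stats.t.ppf(1 - alpha, df=welch_df(rho_w(w)))
        return beta.pdf(w) * stats.t.cdf(q * np.sqrt(g * (a1 * w + a2 * (1 - w))), df=g)

    return integrate.quad(K_term, 0, 1)[0], integrate.quad(W_term, 0, 1)[0]

print(coverages(m=3, n=3, rho=0.25))     # compare with the nominal 0.95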
When ρ = 0 or ρ = 1, and for any (m, n), one finds that the coverage probabilities are exactly equal to the nominal values 1 − α, i.e., K_ρ(1 − α) = W_ρ(1 − α) = 1 − α. This is seen most directly from the fact that in these cases T ∼ F_{n−1} and T ∼ F_{m−1}, respectively.
Figure 3 displays the exact coverage probabilities K_ρ(.95) and W_ρ(.95) for
equal sample sizes m = n = 2, 3, 5 as a function of ρ ∈ [0, .5]. The full graph
is symmetric around ρ = .5 for m = n. It is seen that both procedures
are highly accurate even for small samples. Mostly the double bootstrap
based bounds are slightly more accurate than Welch’s method. However,
for ρ near zero or one there is a reversal. Note how fast the curve reversal
smoothes out as the sample sizes increase. Figure 4 shows the rate at which
the maximum coverage error for both procedures tends to zero for m = n =
2, . . . , 10, 15, 20, 30, 40, 50. It confirms the rate results given by Beran (1988).
The approximate asymptotes are the lines going through (0, 0) and the last
point, corresponding to m = n = 50. It seems plausible that the true
asymptotes actually coincide.
It may be of interest to find out what effect a further bootstrap iteration
would have on the exact coverage rate. The formulas for these coverage rates
are analogous to the previous ones, with G_{ρ̂(w)}^{-1}(1 − α) and F_{f(ρ̂(w))}^{-1}(1 − α)
replaced by appropriate iterated inverses, adding considerably to the com-
plexity of numerical calculations. We conjecture that such an iteration will
increase the number of oscillations in the coverage curve. This may then ex-
plain why further iterations may lead to highly irregular coverage behavior.
[Figure 3: Coverage Probabilities of 95% Lower Bounds in the Behrens-Fisher Problem. Coverage probability versus ρ for m = n = 2, 3, 5; double bootstrap and Welch approximated d.f.]
[Figure 4: Maximum Coverage Error of 95% Lower Bounds in the Behrens-Fisher Problem. |coverage error| versus (m + n)^{-1}; double bootstrap, Welch approximated d.f., and approximate asymptotes.]
5.3 A Case Study
In this section we examine the small sample performance of various bootstrap
methods in the context of Example 2. In particular, we examine the situation
of obtaining confidence bounds for a normal percentile ψ = µ + zp σ. Using
the notation introduced in Section 5.2.4 we take θ̂ = (ψ̂, σ̂) as estimate of θ = (ψ, η) = (ψ, σ), with
$$\hat\psi = \bar X + ks \quad\text{and}\quad \hat\sigma = rs$$
for some known constants k and r > 0. We will make repeated use of the following expression for the bootstrap distribution of ψ̂*:
$$P_{\hat\psi,\hat\sigma}\left(\hat\psi^\star \le x\right) = D_{\hat\psi,\hat\sigma}(x) = G_{n-1,\,\sqrt{n}(\hat\psi - x)/\hat\sigma - z_p\sqrt{n}}\left(-k\sqrt{n}\right). \qquad (6)$$
with k″ = k − rz_p − rδ_{1−α}(k, n)/√n, where δ_y(k, n) = δ(n − 1, −k√n, y) is as defined in Section 5.2.4.
From (5) the actual coverage probabilities of these bounds are obtained as
$$P_{\psi,\sigma}\left(\hat\psi_{EL} \le \psi\right) = G_{n-1,\,-z_p\sqrt{n}}\left(r\delta_\alpha(k, n) + rz_p\sqrt{n} - k\sqrt{n}\right),$$
$$P_{\psi,\sigma}\left(\hat\psi_{EU} \ge \psi\right) = 1 - G_{n-1,\,-z_p\sqrt{n}}\left(r\delta_{1-\alpha}(k, n) + rz_p\sqrt{n} - k\sqrt{n}\right)$$
and
$$P_{\psi,\sigma}\left(\hat\psi_{EL} \le \psi \le \hat\psi_{EU}\right) = G_{n-1,\,-z_p\sqrt{n}}\left(r\delta_\alpha(k, n) + rz_p\sqrt{n} - k\sqrt{n}\right) - G_{n-1,\,-z_p\sqrt{n}}\left(r\delta_{1-\alpha}(k, n) + rz_p\sqrt{n} - k\sqrt{n}\right).$$
Figure 5 shows the behavior of the coverage error of the 95% lower bound (with r = 1 and k = z_{.10}) for ψ = µ + z_{.10}σ against the theoretical rate 1/√n. The actual size of the error is quite large even for large n. Also, the 1/√n asymptote is approximated well only for moderately large n, say n ≥ 20. Figure 6 shows the corresponding result for the upper bound. Note that the size of the error is substantially smaller there. Finally, Figure 7 shows the coverage error of the 95% equal tailed confidence interval for ψ against the theoretical rate of 1/n. The asymptote is reasonably approximated for much smaller n here.
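For reference, the coverage-error points of Figure 5 can be reproduced directly from the first formula above; the following sketch does so for a few illustrative n (scipy; δ_y(k, n) is obtained by numerical root finding):

import numpy as np
from scipy import stats, optimize

p, alpha, r = 0.10, 0.05, 1.0
zp = stats.norm.ppf(p)
k = zp                                   # psi_hat = X-bar + z_.10 s

def delta(y, k, n):
    """delta_y(k, n): solves G_{n-1, delta}(-k sqrt(n)) = y."""
    sqn = np.sqrt(n)
    return optimize.brentq(
        lambda d: stats.nct.cdf(-k * sqn, df=n - 1, nc=d) - y, -50, 50)

for n in (5, 10, 20, 50):
    sqn = np.sqrt(n)
    cover = stats.nct.cdf(r * delta(alpha, k, n) + r * zp * sqn - k * sqn,
                          df=n - 1, nc=-zp * sqn)
    print(n, cover - 0.95)               # true minus nominal coverage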
Hall's percentile method takes
$$\hat\psi_{HL} = \hat\psi - x_{1-\alpha}^\star$$
as 100(1 − α)% level lower bound for ψ. Here x*_{1−α} is the (1 − α)-quantile of the bootstrap distribution of ψ̂* − ψ̂. The corresponding 100(1 − α)% level upper bound is
$$\hat\psi_{HU} = \hat\psi - x_\alpha^\star,$$
and jointly these two bounds serve as a 100(1 − 2α)% confidence interval for ψ.
From Equation (6) we obtain
$$P_{\hat\psi,\hat\sigma}\left(\hat\psi^\star - \hat\psi \le x\right) = G_{n-1,\,-x\sqrt{n}/\hat\sigma - z_p\sqrt{n}}\left(-k\sqrt{n}\right).$$
Thus we have
$$x_{1-\alpha}^\star = -\hat\sigma\left(\delta_{1-\alpha}(k, n)/\sqrt{n} + z_p\right)$$
[Figure 5: Coverage Error of Lower Confidence Bounds Using Efron's Percentile Method with X̄ + z_p s Estimating ψ = µ + z_p σ in a Normal Population, p = .1 and Confidence γ = .95; true minus nominal coverage probability against 1/√n for n = 2, ..., 50.]
[Figure 6: Coverage Error of Upper Confidence Bounds Using Efron's Percentile Method with X̄ + z_p s Estimating ψ = µ + z_p σ in a Normal Population, p = .1 and Confidence γ = .95; true minus nominal coverage probability against 1/√n for n = 2, ..., 50.]
[Figure 7: Coverage Error of Confidence Intervals Using Efron's Percentile Method with X̄ + z_p s Estimating ψ = µ + z_p σ in a Normal Population, p = .1 and Confidence γ = .95; true minus nominal coverage probability against 1/n.]
and thus
$$\hat\psi_{HL} = \hat\psi + \hat\sigma\left(\delta_{1-\alpha}(k, n)/\sqrt{n} + z_p\right) = \bar X + s\left(k + rz_p + r\delta_{1-\alpha}(k, n)/\sqrt{n}\right) = \bar X + k's$$
with k′ = k + rz_p + rδ_{1−α}(k, n)/√n.
From Equation (5) the actual coverage probability of ψ̂_HL is
$$P_{\psi,\sigma}\left(\hat\psi_{HL} \le \psi\right) = G_{n-1,\,-z_p\sqrt{n}}\left(-k\sqrt{n} - rz_p\sqrt{n} - r\delta_{1-\alpha}(k, n)\right).$$
Figures 8-10 show the qualitative behavior of the coverage error of these bounds when using k = z_{.10}, r = 1, and γ = .95. The error is moderately improved over that of Efron's percentile method, but again sample sizes need to be quite large before the theoretical asymptotic behavior takes hold. A clearer comparison between Hall's and Efron's percentile methods can be seen in Figures 11 and 12.
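A companion sketch to the one given for Efron's method evaluates this coverage formula for Hall's lower bound at the same illustrative settings:

import numpy as np
from scipy import stats, optimize

p, alpha, r = 0.10, 0.05, 1.0
zp = stats.norm.ppf(p)
k = zp

def delta(y, k, n):                      # solves G_{n-1, delta}(-k sqrt(n)) = y
    return optimize.brentq(
        lambda d: stats.nct.cdf(-k * np.sqrt(n), df=n - 1, nc=d) - y, -50, 50)

for n in (5, 10, 20, 50):
    sqn = np.sqrt(n)
    arg = -k * sqn - r * zp * sqn - r * delta(1 - alpha, k, n)
    cover = stats.nct.cdf(arg, df=n - 1, nc=-zp * sqn)
    print(n, cover - 0.95)               # true minus nominal coverage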
The bias corrected percentile method takes as bounds
$$\hat\psi_{bcL} = D_{\hat\psi,\hat\sigma}^{-1}\left(\Phi(2u_0 + z_\alpha)\right)$$
and
$$\hat\psi_{bcU} = D_{\hat\psi,\hat\sigma}^{-1}\left(\Phi(2u_0 + z_{1-\alpha})\right),$$
where
$$u_0 = \Phi^{-1}\left(D_{\hat\psi,\hat\sigma}(\hat\psi)\right) = \Phi^{-1}\left(G_{n-1,\,-z_p\sqrt{n}}(-k\sqrt{n})\right)$$
[Figure 8: Coverage Error of Lower Confidence Bounds Using Hall's Percentile Method with X̄ + z_p s Estimating ψ = µ + z_p σ in a Normal Population, p = .1 and Confidence γ = .95; true minus nominal coverage probability for n = 2, ..., 50.]
[Figure 9: Coverage Error of Upper Confidence Bounds Using Hall's Percentile Method with X̄ + z_p s Estimating ψ = µ + z_p σ in a Normal Population, p = .1 and Confidence γ = .95; true minus nominal coverage probability for n = 2, ..., 50.]
[Figure 10: Coverage Error of Confidence Intervals Using Hall's Percentile Method with X̄ + z_p s Estimating ψ = µ + z_p σ in a Normal Population, p = .1 and Confidence γ = .95; true minus nominal coverage probability for n = 2, ..., 50.]
and Φ denotes the standard normal distribution function. When u_0 = 0 these bounds reduce to Efron's percentile bounds. Since x = ψ̂_bcU solves
$$D_{\hat\psi,\hat\sigma}(x) = G_{n-1,\,\sqrt{n}(\hat\psi - x)/\hat\sigma - z_p\sqrt{n}}\left(-k\sqrt{n}\right) = \Phi(2u_0 + z_{1-\alpha}) = \gamma(1-\alpha),$$
we obtain, just as for the percentile bounds, ψ̂_bcU = X̄ + k_U s and ψ̂_bcL = X̄ + k_L s, with
$$\gamma(\alpha) = \Phi(2u_0 + z_\alpha), \qquad k_U = k - rz_p - r\delta_{\gamma(1-\alpha)}(k, n)/\sqrt{n} \qquad\text{and}\qquad k_L = k - rz_p - r\delta_{\gamma(\alpha)}(k, n)/\sqrt{n}.$$
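For completeness, a small sketch (illustrative data; r = 1, k = z_{.10}) computes the bias corrected lower bound directly from (6), without going through the δ notation:

import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(6)
x = rng.normal(50.0, 5.0, size=10)       # illustrative sample
n, p, alpha, r = len(x), 0.10, 0.05, 1.0
zp = stats.norm.ppf(p)
k = zp
s = x.std(ddof=1)
psi_hat, sig_hat = x.mean() + k * s, r * s
sqn = np.sqrt(n)

def D(y):                                # bootstrap c.d.f. of psi_hat*, Equation (6)
    return stats.nct.cdf(-k * sqn, df=n - 1,
                         nc=sqn * (psi_hat - y) / sig_hat - zp * sqn)

u0 = stats.norm.ppf(D(psi_hat))          # bias correction constant
target = stats.norm.cdf(2 * u0 + stats.norm.ppf(alpha))
# invert D at the corrected level gamma(alpha) to get psi_bcL
psi_bcL = optimize.brentq(lambda y: D(y) - target,
                          psi_hat - 20 * s, psi_hat + 20 * s)
print(psi_bcL)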
[Figure 11: Coverage Error of Confidence Bounds Comparing Percentile Methods and Bias Correction with X̄ + z_p s Estimating ψ = µ + z_p σ in a Normal Population, p = .1 and Confidence γ = .95; upper and lower bound coverage errors shown separately.]
[Figure 12: Coverage Error of Confidence Intervals Comparing Percentile Methods and Bias Correction with X̄ + z_p s Estimating ψ = µ + z_p σ in a Normal Population, p = .1 and Confidence γ = .95.]
5.3.4 Percentile-t and Double Bootstrap Methods
Using T = (ψ̂ − ψ)/σ̂ as the Studentized ratio in the percentile-t method will
result in the classical confidence bounds and thus there will be no coverage
error. This arises because T is an exact pivot.
If we take R = ψb − ψ as a root in Beran’s prepivoting method, we again
arrive at the same classical confidence bounds. This was already pointed out
in Section 5.2.3 and is due to the fact that the distribution of R only depends
on the nuisance parameter σ.
The automatic double bootstrap also arrives at the classical confidence bounds
as was already examined in Section 5.2.4. Thus the automatic double boot-
strap succeeds here without having to make a choice of scale estimate for
Studentization or of a root for prepivoting.
5.4 References
Aspin, A.A. (1949). “Tables for use in comparisons whose accuracy involves
two variances, separately estimated (with an Appendix by B.L. Welch).”
Biometrika 36, 290-296.
Bain, L.J. (1987). Statistical Analysis of Reliability and Life-Testing Models,
Theory and Methods. Marcel Dekker, Inc., New York.
Beran, R. (1987). “Prepivoting to reduce level error of confidence sets.”
Biometrika 74, 457-468.
Beran, R. (1988). “Prepivoting test statistics: A bootstrap view of asymp-
totic refinements.” J. Amer. Statist. Assoc. 83, 687-697.
Diaconis, P. and Efron, B. (1983a). “Computer intensive methods in statis-
tics.” Sci. Amer. 248, 116-130.
Diaconis, P. and Efron, B. (1983b). “Statistik per Computer: der Münchhausen-
Trick.” Spektrum der Wissenschaft, Juli 1983, 56-71. German transla-
tion of Diaconis, P. and Efron, B. (1983a), introducing the German term
Münchhausen for bootstrap.
DiCiccio, T.J. and Romano, J.P. (1988). “A review of bootstrap confidence
intervals.” (With discussion) J. Roy. Statist. Soc. Ser. B 50, 338-354.
Efron, B. (1979). “Bootstrap methods: Another look at the jackknife.” Ann.
Statist. 7, 1-26.
Efron, B. (1981). “Nonparametric standard errors and confidence intervals.”
(With discussion) Canad. J. Statist. 9, 139-172.
Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans.
SIAM, Philadelphia.
Efron, B. (1987). “Better bootstrap confidence intervals.” (With discussion)
J. Amer. Statist. Assoc. 82, 171-200.
Fleiss, J.L. (1971). “On the distribution of a linear combination of indepen-
dent chi squares.” J. Amer. Statist. Assoc. 66, 142-144.
Hall, P. (1988a). “Theoretical comparison of bootstrap confidence intervals.”
(With discussion) Ann. Statist. 16, 927-985.
Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag,
New York.
Lehmann, E.L. (1986). Testing Statistical Hypotheses, Second Edition, John
Wiley & Sons, New York.
Loh, W. (1987). “Calibrating confidence coefficients.” J. Amer. Statist.
Assoc. 82, 155-162.
Reid, N. (1981). Discussion of Efron (1981).
Scholz, F.-W. (1994). “On exactness of the parametric double bootstrap.” Statistica Sinica 4, 477-492.
Welch, B.L. (1947). “The generalization of ‘Student's’ problem when several different population variances are involved.” Biometrika 34, 28-35.