CHAPTER 2 ESTIMATION
Here we discuss the bootstrap and jackknife techniques in the following two subsections:
2.1.5.1 Bootstrap Technique
A random sample x = (x₁, x₂, ⋯, xₙ) from an unknown probability distribution F has
been observed, and we wish to estimate a parameter of interest θ = t(F) on the basis of
x. For this purpose, we calculate an estimate θ̂ = s(x) from x. [Note that s(x) may be
the plug-in estimate t(F̂), but doesn't have to be.] How accurate is θ̂?
The bootstrap was published by Bradley Efron in 1979, inspired by earlier work on
the jackknife. The basic idea of bootstrapping is that inference about a population from
sample data (sample → population) can be modelled by "resampling" the sample data
and performing inference about the sample from the resampled data (resampled → sample).
As the population is unknown, the true error of a sample statistic against its population
value is unknown. In bootstrap resamples, the 'population' is in fact the sample, and
this is known; hence the quality of inference about the 'true' sample from resampled data
(resampled → sample) is measurable.
Empirical Distribution Function
Let X₁, X₂, ⋯, Xₙ be independent, identically distributed real random variables with the
common cumulative distribution function F(t). Then the empirical distribution function
is defined as follows:

F̂ₙ(t) = (number of elements in the sample ≤ t) / n = (1/n) Σ_{i=1}^{n} 1_{Xᵢ ≤ t},

where 1_A is the indicator of the event A.
For a fixed t, the indicator 1_{Xᵢ ≤ t} is a Bernoulli
random variable with parameter p = F(t); hence nF̂ₙ(t) is a binomial random variable
with mean nF(t) and variance nF(t)[1 − F(t)]. This implies that F̂ₙ(t) is an unbiased
estimator of F(t).

STAT-4201: Statistical Inference III, Prof. M. Rahman

Figure 2.2: Empirical distribution function of 100 rolls of the die.
However, in some textbooks the definition is given with a strict inequality, F̂ₙ(t) = (1/n) Σ_{i=1}^{n} 1_{Xᵢ < t}.
Example: Consider a random sample of 100 rolls of a die. The outcomes 1, 2, 3, 4, 5, 6
occurred 13, 19, 10, 17, 14, 27 times, respectively, so the empirical distribution is F̂ =
(.13, .19, .10, .17, .14, .27).
Outcomes are: 63246 665362262315166415366414256655362
66141561633222522414566622461222515354214665646
436414544232146.
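As a small sketch, the empirical distribution above can be computed directly from the outcome counts; this minimal example uses only the counts reported for the 100 rolls:

```python
import numpy as np

# Outcome counts for faces 1..6 from the 100 die rolls above
counts = np.array([13, 19, 10, 17, 14, 27])
n = counts.sum()  # 100 rolls in total

# Empirical probability of each face: F_hat = (.13, .19, .10, .17, .14, .27)
f_hat = counts / n

# Empirical distribution function evaluated at t = 1, ..., 6:
# F_n(t) = (number of outcomes <= t) / n, i.e. the running sum of f_hat
F_n = np.cumsum(f_hat)
```

Since F̂ₙ is a step function, evaluating it at the largest observed value must give 1, which the cumulative sum reflects.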
Plug-in Principle
The plug-in principle is the method of estimating functionals of a population distribution
by evaluating the same functionals at the empirical distribution based on a sample.
For example, to estimate the population mean, this method uses the sample mean;
to estimate the population median, it uses the sample median; to estimate the popula-
tion regression line, it uses the sample regression line. The best-known application of the
plug-in principle is the bootstrap method.
Definition: The plug-in principle is a simple method of estimating parameters from
samples. The plug-in estimate of a parameter θ = t(F) is defined to be

θ̂ = t(F̂).

In other words, we estimate the function θ = t(F) of the probability distribution F by
the same function of the empirical distribution F̂.
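The plug-in idea can be made concrete in a few lines. Evaluating the variance functional at F̂ (which puts mass 1/n on each observation) gives the 1/n version of the variance rather than the usual unbiased 1/(n − 1) version; the data values below are made up for illustration:

```python
import numpy as np

x = np.array([4.2, 3.1, 5.0, 4.8, 3.9, 4.4])  # hypothetical sample
n = len(x)

# Plug-in estimate of the mean: the mean functional evaluated at F_hat,
# which puts mass 1/n on each observation -> the sample mean
mean_plugin = np.sum(x * (1.0 / n))

# Plug-in estimate of the variance: the variance of F_hat itself,
# i.e. dividing by n (np.var's default), not by n - 1
var_plugin = np.var(x)            # t(F_hat)
var_unbiased = np.var(x, ddof=1)  # the usual s^2; NOT a plug-in estimate

assert np.isclose(mean_plugin, np.mean(x))
assert np.isclose(var_plugin, var_unbiased * (n - 1) / n)
```

The two variance estimates differ only by the factor (n − 1)/n, which is why the plug-in estimate is still consistent even though it is slightly biased.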
Bootstrap Sample (Definition):
Bootstrap methods depend on the notion of a bootstrap sample. Let F̂ be the empirical
distribution, putting probability 1/n on each of the observed values xᵢ, i = 1, 2, ⋯, n.
A bootstrap sample is defined to be a random sample of size n drawn from F̂, say
x* = (x₁*, x₂*, ⋯, xₙ*):

F̂ → (x₁*, x₂*, ⋯, xₙ*).   (2.4)

The star notation indicates that x* is not the actual data set x, but rather a randomized,
or resampled, version of x.
There is another way to say (2.4): the bootstrap data points x₁*, x₂*, ⋯, xₙ* are a random
sample of size n drawn with replacement from the population of n objects x₁, x₂, ⋯, xₙ.
Corresponding to a bootstrap data set x* is a bootstrap replication of θ̂,

θ̂* = s(x*).

The quantity s(x*) is the result of applying the same function s(·) to x* as was applied
to x. For example, if s(x) is the sample mean x̄, then s(x*) is the mean of the bootstrap
data set, x̄* = Σ_{i=1}^{n} xᵢ*/n.
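Drawing a bootstrap sample and its replication is a one-line operation once we sample with replacement; a minimal sketch, with made-up data and s(·) taken to be the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the resample is reproducible

x = np.array([2.7, 1.9, 3.3, 4.1, 2.2, 3.8, 2.9, 3.5])  # observed data x
n = len(x)

# A bootstrap sample x*: n values drawn from x WITH replacement,
# equivalently an i.i.d. sample of size n from F_hat
x_star = rng.choice(x, size=n, replace=True)

# The bootstrap replication: apply the same statistic s(.) to x*
theta_star = x_star.mean()  # here s(x) is the sample mean

# Every entry of x* comes from the original data set
assert all(v in x for v in x_star)
```

Each call to `rng.choice` produces a new x* and hence a new replication θ̂*; repeating this is the basis of the standard-error algorithm below.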
Bootstrap Standard Errors:
The bootstrap algorithm for estimating standard errors can be described as follows:

(i) Select B independent bootstrap samples x*¹, x*², ⋯, x*ᴮ, each consisting of n data
values drawn with replacement from x.

(ii) Evaluate the bootstrap replication corresponding to each bootstrap sample,

θ̂*(b) = s(x*ᵇ),  b = 1, 2, ⋯, B.

(iii) Estimate the standard error se_F(θ̂) by the sample standard deviation of the B
replications:

ŝe_B = { Σ_{b=1}^{B} [θ̂*(b) − θ̂*(·)]² / (B − 1) }^{1/2},

where θ̂*(·) = (1/B) Σ_{b=1}^{B} θ̂*(b).
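The three steps translate directly into code. This is a sketch with hypothetical data, taking s(·) to be the sample mean so the result can be checked against the familiar formula for the standard error of a mean:

```python
import numpy as np

def bootstrap_se(x, stat, B=200, seed=0):
    """Estimate se(stat) by the sample standard deviation of B
    bootstrap replications (steps (i)-(iii) above)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # (i) B bootstrap samples; (ii) one replication per sample
    reps = np.array([stat(rng.choice(x, size=n, replace=True))
                     for _ in range(B)])
    # (iii) sample standard deviation of the replications (divisor B - 1)
    return reps.std(ddof=1)

x = np.array([3.2, 1.8, 2.6, 4.4, 3.9, 2.1, 3.3, 2.8, 4.0, 3.1])
se_boot = bootstrap_se(x, np.mean, B=500)

# For the mean, the bootstrap SE should be close to the plug-in
# formula: population-style std of x divided by sqrt(n)
se_plugin = x.std(ddof=0) / np.sqrt(len(x))
```

The same `bootstrap_se` call works for any statistic (median, correlation, …) simply by passing a different `stat`, which is the whole appeal of the algorithm.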
Types of Bootstrap Scheme
(1) Case resampling
(i) Estimating the distribution of sample mean
(ii) Regression
(2) Bayesian bootstrap
(3) Smooth bootstrap
(4) Parametric bootstrap
Figure 2.3: The bootstrap algorithm for estimating the standard error of a statistic
θ̂ = s(x): the empirical distribution F̂ generates bootstrap samples x*¹, ⋯, x*ᴮ of size n,
each yielding a replication θ̂*(b) = s(x*ᵇ); each bootstrap sample is an independent
random sample of size n from F̂. The number of bootstrap replications B for estimating
a standard error is usually between 25 and 200. As B → ∞, ŝe_B approaches the plug-in
estimate of se_F(θ̂).
(5) Resampling residuals
(6) Gaussian process regression bootstrap
(7) Wild bootstrap
(8) Block bootstrap
(i) Time series: Simple block bootstrap
(ii) Time series: Moving block bootstrap
(iii) Cluster data: block bootstrap
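As one illustration of scheme (4), the parametric bootstrap resamples from a fitted model rather than from the observed values themselves. A minimal sketch, assuming a normal model and made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)

x = np.array([5.1, 4.7, 5.6, 4.9, 5.3, 5.8, 4.5, 5.2])  # hypothetical data
n = len(x)

# Fit the assumed parametric model N(mu, sigma^2) by maximum likelihood
mu_hat, sigma_hat = x.mean(), x.std(ddof=0)

# Parametric bootstrap: each bootstrap sample is drawn from the *fitted*
# normal distribution, not with replacement from the observed values
B = 1000
reps = np.array([rng.normal(mu_hat, sigma_hat, size=n).mean()
                 for _ in range(B)])

se_param = reps.std(ddof=1)  # parametric bootstrap SE of the mean
```

Compared with case resampling, this scheme smooths the resampling distribution but is only trustworthy insofar as the assumed model fits the data.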
Advantages: A great advantage of the bootstrap is its simplicity. It is a straightforward
way to derive estimates of standard errors and confidence intervals for complex estimators
of complex parameters of the distribution, such as percentile points, proportions, odds
ratios, and correlation coefficients. The bootstrap is also an appropriate way to control and
check the stability of results. Although for most problems it is impossible to know the
true confidence interval, bootstrap intervals are asymptotically more accurate than the
standard intervals obtained using the sample variance and assumptions of normality.
Disadvantages: Although bootstrapping is (under some conditions) asymptotically con-
sistent, it does not provide general finite-sample guarantees. The apparent simplicity may
conceal the fact that important assumptions are being made when undertaking the boot-
strap analysis (e.g. independence of samples), where these would be more formally stated
in other approaches.