CHAPTER 2 ESTIMATION
Here we discuss the bootstrap and jackknife techniques in the following two subsections:
2.1.5.1 Bootstrap Technique
A random sample x = (x₁, x₂, ⋯, xₙ) from an unknown probability distribution F has
been observed, and we wish to estimate a parameter of interest θ = t(F) on the basis of
x. For this purpose, we calculate an estimate θ̂ = s(x) from x. [Note that s(x) may be
the plug-in estimate t(F̂), but doesn't have to be.] How accurate is θ̂?
The bootstrap was published by Bradley Efron in 1979, inspired by earlier work on
the jackknife. The basic idea of bootstrapping is that inference about a population from
sample data (sample → population) can be modelled by "resampling" the sample data
and performing inference about the sample from the resampled data (resampled → sample).
As the population is unknown, the true error of a sample statistic against its population
value is unknown. In bootstrap resamples, the 'population' is in fact the sample, and
this is known; hence the quality of inference about the 'true' sample from resampled data
(resampled → sample) is measurable.
Empirical Distribution Function
Let X₁, X₂, ⋯, Xₙ be independent, identically distributed real random variables with the
common cumulative distribution function F(t). Then the empirical distribution function
is defined as follows:

F̂ₙ(t) = (number of elements in the sample ≤ t) / n = (1/n) Σ_{i=1}^{n} 1_{Xᵢ ≤ t},

where 1_A is the indicator of the event A.
For a fixed t, the indicator 1_{Xᵢ ≤ t} is a Bernoulli
random variable with parameter p = F(t); hence nF̂ₙ(t) is a binomial random variable
with mean nF(t) and variance nF(t)[1 − F(t)]. This implies that F̂ₙ(t) is an unbiased
estimator of F(t).

STAT-4201: Statistical Inference III, Prof. M. Rahman

Figure 2.2: Empirical distribution function of 100 rolls of the die.
However, in some textbooks the definition is given with a strict inequality, F̂ₙ(t) = (1/n) Σ_{i=1}^{n} 1_{Xᵢ < t}.
Example: Consider a random sample of 100 rolls of a die. The outcomes 1, 2, 3, 4, 5, 6
occurred 13, 19, 10, 17, 14, 27 times, respectively, so the empirical distribution is F̂ =
(.13, .19, .10, .17, .14, .27).
Outcomes are: 63246 665362262315166415366414256655362
66141561633222522414566622461222515354214665646
436414544232146.
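As a small sketch, the empirical distribution above can be computed directly from the outcome counts; this minimal example uses only the counts reported for the 100 rolls:

```python
import numpy as np

# Outcome counts for faces 1..6 from the 100 die rolls above
counts = np.array([13, 19, 10, 17, 14, 27])
n = counts.sum()  # 100 rolls in total

# Empirical probability of each face: F_hat = (.13, .19, .10, .17, .14, .27)
f_hat = counts / n

# Empirical distribution function evaluated at t = 1, ..., 6:
# F_n(t) = (number of outcomes <= t) / n, i.e. the running sum of f_hat
F_n = np.cumsum(f_hat)
```

Since F̂ₙ is a step function, evaluating it at the largest observed value must give 1, which the cumulative sum reflects.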
Plug-in Principle
The plug-in principle is the method of estimating functionals of a population distribution
by evaluating the same functionals at the empirical distribution based on a sample.
For example, to estimate the population mean, this method uses the sample mean;
to estimate the population median, it uses the sample median; to estimate the popula-
tion regression line, it uses the sample regression line. The best-known application of the
plug-in principle is the bootstrap method.
Definition: The plug-in principle is a simple method of estimating parameters from
samples. The plug-in estimate of a parameter θ = t(F) is defined to be

θ̂ = t(F̂).

In other words, we estimate the function θ = t(F) of the probability distribution F by
the same function of the empirical distribution F̂.
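The plug-in idea can be made concrete in a few lines. Evaluating the variance functional at F̂ (which puts mass 1/n on each observation) gives the 1/n version of the variance rather than the usual unbiased 1/(n − 1) version; the data values below are made up for illustration:

```python
import numpy as np

x = np.array([4.2, 3.1, 5.0, 4.8, 3.9, 4.4])  # hypothetical sample
n = len(x)

# Plug-in estimate of the mean: the mean functional evaluated at F_hat,
# which puts mass 1/n on each observation -> the sample mean
mean_plugin = np.sum(x * (1.0 / n))

# Plug-in estimate of the variance: the variance of F_hat itself,
# i.e. dividing by n (np.var's default), not by n - 1
var_plugin = np.var(x)            # t(F_hat)
var_unbiased = np.var(x, ddof=1)  # the usual s^2; NOT a plug-in estimate

assert np.isclose(mean_plugin, np.mean(x))
assert np.isclose(var_plugin, var_unbiased * (n - 1) / n)
```

The two variance estimates differ only by the factor (n − 1)/n, which is why the plug-in estimate is still consistent even though it is slightly biased.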
Bootstrap Sample (Definition):
Bootstrap methods depend on the notion of a bootstrap sample. Let F̂ be the empirical
distribution, putting probability 1/n on each of the observed values xᵢ, i = 1, 2, ⋯, n.
A bootstrap sample is defined to be a random sample of size n drawn from F̂, say
x* = (x₁*, x₂*, ⋯, xₙ*):

F̂ → (x₁*, x₂*, ⋯, xₙ*).   (2.4)

The star notation indicates that x* is not the actual data set x, but rather a randomized,
or resampled, version of x.
There is another way to say (2.4): the bootstrap data points x₁*, x₂*, ⋯, xₙ* are a random
sample of size n drawn with replacement from the population of n objects x₁, x₂, ⋯, xₙ.
Corresponding to a bootstrap data set x* is a bootstrap replication of θ̂,

θ̂* = s(x*).

The quantity s(x*) is the result of applying the same function s(·) to x* as was applied
to x. For example, if s(x) is the sample mean x̄, then s(x*) is the mean of the bootstrap
data set, x̄* = Σ_{i=1}^{n} xᵢ*/n.
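Drawing a bootstrap sample and its replication is a one-line operation once we sample with replacement; a minimal sketch, with made-up data and s(·) taken to be the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the resample is reproducible

x = np.array([2.7, 1.9, 3.3, 4.1, 2.2, 3.8, 2.9, 3.5])  # observed data x
n = len(x)

# A bootstrap sample x*: n values drawn from x WITH replacement,
# equivalently an i.i.d. sample of size n from F_hat
x_star = rng.choice(x, size=n, replace=True)

# The bootstrap replication: apply the same statistic s(.) to x*
theta_star = x_star.mean()  # here s(x) is the sample mean

# Every entry of x* comes from the original data set
assert all(v in x for v in x_star)
```

Each call to `rng.choice` produces a new x* and hence a new replication θ̂*; repeating this is the basis of the standard-error algorithm below.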
Bootstrap Standard Errors:
The bootstrap algorithm for estimating standard errors can be described as follows:

(i) Select B independent bootstrap samples x*¹, x*², ⋯, x*ᴮ, each consisting of n data
values drawn with replacement from x.

(ii) Evaluate the bootstrap replication corresponding to each bootstrap sample,

θ̂*(b) = s(x*ᵇ),  b = 1, 2, ⋯, B.

(iii) Estimate the standard error se_F(θ̂) by the sample standard deviation of the B
replications:

ŝe_B = { Σ_{b=1}^{B} [θ̂*(b) − θ̂*(·)]² / (B − 1) }^{1/2},

where θ̂*(·) = (1/B) Σ_{b=1}^{B} θ̂*(b).
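The three steps translate directly into code. This is a sketch with hypothetical data, taking s(·) to be the sample mean so the result can be checked against the familiar formula for the standard error of a mean:

```python
import numpy as np

def bootstrap_se(x, stat, B=200, seed=0):
    """Estimate se(stat) by the sample standard deviation of B
    bootstrap replications (steps (i)-(iii) above)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # (i) B bootstrap samples; (ii) one replication per sample
    reps = np.array([stat(rng.choice(x, size=n, replace=True))
                     for _ in range(B)])
    # (iii) sample standard deviation of the replications (divisor B - 1)
    return reps.std(ddof=1)

x = np.array([3.2, 1.8, 2.6, 4.4, 3.9, 2.1, 3.3, 2.8, 4.0, 3.1])
se_boot = bootstrap_se(x, np.mean, B=500)

# For the mean, the bootstrap SE should be close to the plug-in
# formula: population-style std of x divided by sqrt(n)
se_plugin = x.std(ddof=0) / np.sqrt(len(x))
```

The same `bootstrap_se` call works for any statistic (median, correlation, …) simply by passing a different `stat`, which is the whole appeal of the algorithm.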
Types of Bootstrap Scheme
(1) Case resampling
(i) Estimating the distribution of sample mean
(ii) Regression
(2) Bayesian bootstrap
(3) Smooth bootstrap
(4) Parametric bootstrap
Figure 2.3: The bootstrap algorithm for estimating the standard error of a statistic
θ̂ = s(x): the empirical distribution F̂ generates bootstrap samples x*¹, ⋯, x*ᴮ of size n,
each yielding a replication θ̂*(b) = s(x*ᵇ); each bootstrap sample is an independent
random sample of size n from F̂. The number of bootstrap replications B for estimating
a standard error is usually between 25 and 200. As B → ∞, ŝe_B approaches the plug-in
estimate of se_F(θ̂).
(5) Resampling residuals
(6) Gaussian process regression bootstrap
(7) Wild bootstrap
(8) Block bootstrap
(i) Time series: Simple block bootstrap
(ii) Time series: Moving block bootstrap
(iii) Cluster data: block bootstrap
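As one illustration of scheme (4), the parametric bootstrap resamples from a fitted model rather than from the observed values themselves. A minimal sketch, assuming a normal model and made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)

x = np.array([5.1, 4.7, 5.6, 4.9, 5.3, 5.8, 4.5, 5.2])  # hypothetical data
n = len(x)

# Fit the assumed parametric model N(mu, sigma^2) by maximum likelihood
mu_hat, sigma_hat = x.mean(), x.std(ddof=0)

# Parametric bootstrap: each bootstrap sample is drawn from the *fitted*
# normal distribution, not with replacement from the observed values
B = 1000
reps = np.array([rng.normal(mu_hat, sigma_hat, size=n).mean()
                 for _ in range(B)])

se_param = reps.std(ddof=1)  # parametric bootstrap SE of the mean
```

Compared with case resampling, this scheme smooths the resampling distribution but is only trustworthy insofar as the assumed model fits the data.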
Advantages: A great advantage of the bootstrap is its simplicity. It is a straightforward
way to derive estimates of standard errors and confidence intervals for complex estimators
of complex parameters of the distribution, such as percentile points, proportions, odds
ratios, and correlation coefficients. The bootstrap is also an appropriate way to control and
check the stability of results. Although for most problems it is impossible to know the
true confidence interval, bootstrap intervals are asymptotically more accurate than the
standard intervals obtained using the sample variance and assumptions of normality.
Disadvantages: Although bootstrapping is (under some conditions) asymptotically con-
sistent, it does not provide general finite-sample guarantees. The apparent simplicity may
conceal the fact that important assumptions are being made when undertaking the boot-
strap analysis (e.g. independence of samples), where these would be more formally stated
in other approaches.