Interval Estimation: Part 1
(Module 3)
Statistics (MAST20005) & Elements of Statistics (MAST90058)
Semester 2, 2018
Contents
1 The need to quantify uncertainty
2 Standard error
3 Confidence intervals
3.1 Introduction
3.2 Definition
3.3 Important distributions
3.4 Pivots
3.5 Common scenarios
We have learnt how to do basic inference, using point estimates. What’s next?
1 The need to quantify uncertainty
How useful are point estimates?
Example: surveying Melbourne residents as part of a disability study. The results will be used to set a budget for
disability support.
Estimate from survey: 5% of residents are disabled
What can we conclude?
Estimate from a second survey: 2% of residents are disabled
What can we now conclude?
What other information would be useful to know?
2 Standard error
Report sd(Θ̂)? Usually we can't compute it exactly, since it depends on unknown parameters. Instead, estimate sd(Θ̂)!
Standard error
The standard error of an estimate is the estimated standard deviation of the estimator.
Notation:
• Parameter: θ
• Estimator: Θ̂
• Estimate: θ̂
• Standard deviation of the estimator: sd(Θ̂)
• Standard error of the estimate: se(θ̂)
Note: some people also refer to the standard deviation of the estimator as the standard error. This is potentially
confusing, so it is best avoided.
More info:
• First survey: 5% ± 4%
• Second survey: 2% ± 0.1%
What would we now conclude?
What result should we use for setting the disability support budget?
3 Confidence intervals
3.1 Introduction
Interval estimates
Example
Random sample (iid): X1 , . . . , Xn ∼ N(µ, 1). Then √n(X̄ − µ) ∼ N(0, 1), so
Pr(−1.96 < √n(X̄ − µ) < 1.96) = 0.95
[Figure: standard normal pdf, with the central region between −1.96 and 1.96 shaded.]
Rearranging gives:
Pr(X̄ − 1.96/√n < µ < X̄ + 1.96/√n) = 0.95
This says that the interval (X̄ − 1.96/√n, X̄ + 1.96/√n) has probability 0.95 of containing the parameter µ.
We use this as an interval estimator.
The resulting interval estimate, (x̄ − 1.96/√n, x̄ + 1.96/√n), is called a 95% confidence interval for µ.
Example
• Can write it more formally as a bivariate normal distribution:
(L, U)ᵀ ∼ N₂((µ − 1.96/√n, µ + 1.96/√n)ᵀ, Σ), where every entry of the 2 × 2 covariance matrix Σ is 1/n
Interpretation
• This interval estimator is a random interval and is calculable from our sample. The parameter is fixed and
unknown.
• Before the sample is taken, the probability the random interval contains µ is 95%.
• After the sample is taken, we have a realised interval. It no longer has a probabilistic interpretation; it either
contains µ or it doesn’t.
• This makes the interpretation somewhat tricky. We argue simply that it would be unlucky if our interval did
not contain µ.
• In this example, the interval happens to be of the form, est ± error. This will be the case for many of the
confidence intervals we derive.
Random sample (iid): X1 , . . . , Xn ∼ N(µ, σ 2 ), and assume that we know the value of σ 2 .
The sampling distribution of the sample mean is X̄ ∼ N(µ, σ²/n). Let Φ⁻¹(1 − α/2) = c, so we can write:
Pr(−c < (X̄ − µ)/(σ/√n) < c) = 1 − α
or, equivalently,
Pr(µ − c·σ/√n < X̄ < µ + c·σ/√n) = 1 − α
Rearranging gives:
Pr(X̄ − c·σ/√n < µ < X̄ + c·σ/√n) = 1 − α
The following random interval contains µ with probability 1 − α:
(X̄ − c·σ/√n, X̄ + c·σ/√n)
Observe x̄ and construct the interval. This gives a 100 · (1 − α)% confidence interval for the population mean µ.
Worked example
Suppose X ∼ N(µ, 36²) represents the lifetime of a light bulb, in hours. Test 27 bulbs, observe x̄ = 1478.
Let c = Φ⁻¹(0.975) = 1.96. A 95% confidence interval for µ is:
x̄ ± c·σ/√n = 1478 ± 1.96 × 36/√27 = [1464, 1492]
In other words, we have good evidence that the mean lifetime for a light bulb is approximately 1,460–1,490 hours.
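This calculation is easy to reproduce in R. The following is a minimal sketch of the computation above (the variable names are ours, not from the notes):

# 95% CI for a normal mean when sigma is known: xbar +/- z * sigma / sqrt(n)
n     <- 27    # number of bulbs tested
xbar  <- 1478  # observed mean lifetime (hours)
sigma <- 36    # known population standard deviation
z <- qnorm(0.975)                      # 0.975 quantile of N(0, 1), approximately 1.96
xbar + c(-1, 1) * z * sigma / sqrt(n)  # approximately [1464, 1492]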
[Figure: standard normal pdf, with the central region between −1.645 and 1.645 shaded; the shaded probability is 90%.]
3.2 Definition
Definitions
• An interval estimate is a pair of statistics defining an interval that aims to convey an estimate (of a parameter)
with uncertainty.
• A confidence interval is an interval estimate constructed such that the corresponding interval estimator has a
specified probability, known as the confidence level, of containing the true value of the parameter being estimated.
• We often use the abbreviation CI for ‘confidence interval’.
General technique for deriving a CI
• Start with an estimator, T , whose sampling distribution is known
• Write the central probability interval based on its sampling distribution, Pr(a < T < b) = 1 − α
• Rearrange the inequalities to isolate the parameter, then substitute the observed value of the statistic
Take a random sample of size n from an exponential distribution with rate parameter λ.
1. Derive an exact 95% confidence interval for λ.
2. Suppose your sample is of size 9 and has sample mean 3.93.
(a) What is your 95% confidence interval for λ?
(b) What is your 95% confidence interval for the population mean?
3. Repeat the above using the CLT approximation (rather than an exact interval).
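One way to check your answers is numerically in R. The sketch below uses the standard fact that 2λΣXᵢ ∼ χ²(2n) for an iid exponential sample, which provides the exact pivot; the last line uses the CLT approximation instead.

# Exact CI for an exponential rate, via the pivot 2 * lambda * sum(X) ~ chi-squared(2n)
n    <- 9
xbar <- 3.93
total <- n * xbar                        # observed sum of the sample
ci_lambda <- qchisq(c(0.025, 0.975), df = 2 * n) / (2 * total)  # exact 95% CI for lambda
ci_mean   <- rev(1 / ci_lambda)          # exact 95% CI for the mean, 1/lambda
# CLT approximation: Xbar is approximately N(mu, mu^2 / n), with mu estimated by xbar
ci_mean_clt <- xbar + c(-1, 1) * qnorm(0.975) * xbar / sqrt(n)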
Recap
• A point estimate is a single number that is our ‘best guess’ at the true parameter value. In other words, it is
meant to be the ‘most plausible’ value for the parameter, given the data.
• However, this doesn’t allow us to adequately express our uncertainty about this estimate.
• An interval estimate aims to provide a range of values that are plausible based on the observed data. This
allows us to more adequately express our uncertainty about the estimate, by giving an indication of the various
plausible alternative true values.
• The most common type of interval estimate is a confidence interval.
Width of CIs
Interpreting CIs
• Narrower width usually indicates stronger/greater evidence about the plausible true values for the parameter
being estimated
• Very wide CI ⇒ usually cannot conclude much other than that we have insufficient data
• Moderately wide CI ⇒ conclusions often depend on the location of the interval
• Narrow CI ⇒ more confident about the possible true values, often can be more conclusive
• What constitutes ‘wide’ or ‘narrow’, and how conclusive/useful the CI actually is, will depend on the context of
the study question
3.3 Important distributions
Chi-squared distribution
• Also written as χ²-distribution
• Single parameter: k > 0, known as the degrees of freedom
• Notation: T ∼ χ²ₖ or T ∼ χ²(k)
• The pdf is:
f(t) = t^(k/2 − 1) e^(−t/2) / (2^(k/2) Γ(k/2)), t ≥ 0
• When sampling from a normal distribution, the scaled sample variance follows a χ²-distribution:
(n − 1)S²/σ² ∼ χ²(n − 1)
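A quick simulation can make this result concrete. The following R sketch (with arbitrary illustrative values of n, µ and σ) compares simulated quantiles of (n − 1)S²/σ² with the corresponding χ²(n − 1) quantiles:

# Simulate (n - 1) * S^2 / sigma^2 for normal samples; compare with chi-squared(n - 1)
set.seed(1)
n <- 10; mu <- 5; sigma <- 2            # arbitrary illustrative values
stat <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (n - 1) * var(x) / sigma^2
})
quantile(stat, c(0.05, 0.5, 0.95))      # empirical quantiles
qchisq(c(0.05, 0.5, 0.95), df = n - 1)  # theoretical quantiles; should closely match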
Student’s t-distribution
• Also known simply as the t-distribution
• Single parameter: k > 0, the degrees of freedom (same as for χ²)
• Notation: T ∼ tk or T ∼ t(k)
• The pdf is:
f(t) = [Γ((k + 1)/2) / (√(kπ) Γ(k/2))] (1 + t²/k)^(−(k+1)/2), −∞ < t < ∞
• Moments: E(T) = 0, if k > 1; var(T) = k/(k − 2), if k > 2
• The t-distribution is similar to a standard normal but with heavier tails
• As k → ∞, tk → N(0, 1)
• If Z ∼ N(0, 1) and U ∼ χ²(r) are independent, then
T = Z/√(U/r) ∼ t(r)
• This arises when considering the sampling distributions of statistics from a normal distribution, in particular:
T = [(X̄ − µ)/(σ/√n)] / √[((n − 1)S²/σ²)/(n − 1)] = (X̄ − µ)/(S/√n) ∼ t(n − 1)
F-distribution
• Also known as the Fisher-Snedecor distribution
• Parameters: m, n > 0, the degrees of freedom (same as before)
• Notation: W ∼ Fm,n or W ∼ F(m, n)
• If U ∼ χ²(m) and V ∼ χ²(n) are independent then
F = (U/m)/(V/n) ∼ F(m, n)
3.4 Pivots
Pivots
Recall our general technique that starts with a probability interval using a statistic with a known sampling distribution: Pr(a < T < b) = 1 − α.
The easiest way to make this technique work is by finding a function of the data and the parameters, Q(X1 , . . . , Xn ; θ),
whose distribution does not depend on the parameters. In other words, it is a random variable that has the same
distribution regardless of the value of θ.
The quantity Q(X1 , . . . , Xn ; θ) is called a pivot or a pivotal quantity.
Remarks about pivots
• The value of the pivot can depend on the parameters, but its distribution cannot.
• Since pivots are a function of the parameters as well as the data, they are usually not statistics.
• If a pivot is also a statistic, then it is called an ancillary statistic.
Examples of pivots
• We have already seen the following result for sampling from a normal distribution with known variance:
Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1).
3.5 Common scenarios
Normal distribution:
• Inference for a single mean
– Known σ
– Unknown σ
• Comparison of two means
– Known σ
– Unknown σ
– Paired samples
• Inference for a single variance
• Comparison of two variances
Proportions:
• Inference for a single proportion
• Comparison of two proportions
Normal, single mean, known σ
Random sample (iid): X1 , . . . , Xn ∼ N(µ, σ²), and assume that we know the value of σ.
We’ve seen this scenario already in previous examples.
Use the pivot:
Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Normal, single mean, unknown σ
Use the pivot:
T = (X̄ − µ)/(S/√n) ∼ t(n − 1)
Let c be the 1 − α/2 quantile of t(n − 1), so that Pr(−c < T < c) = 1 − α. Rearranging gives:
Pr(X̄ − c·S/√n < µ < X̄ + c·S/√n) = 1 − α
X ∼ N(µ, σ²) is the amount of butterfat produced by a cow. Examining n = 20 cows results in x̄ = 507.5 and
s = 89.75. Let c be the 0.95 quantile of t(19), which gives c = 1.729. Therefore, a 90% confidence interval for µ is:
507.50 ± 1.729 × 89.75/√20 = [472.80, 542.20]
> butterfat
[1] 481 537 513 583 453 510 570 500 457 555 618 327
[13] 350 643 499 421 505 637 599 392
> t.test(butterfat, conf.level = 0.90)
data: butterfat
t = 25.2879, df = 19, p-value = 4.311e-16
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
472.7982 542.2018
sample estimates:
mean of x
507.5
> sd(butterfat)
[1] 89.75082
[Figure: normal QQ plot of the butterfat data (sample quantiles against theoretical quantiles); the points lie reasonably close to a straight line.]
Remarks
• CIs based on a t-distribution (or a normal distribution) are of the form:
est ± c × se
for an appropriate quantile, c, which depends on the sample size (n) and the confidence level (1 − α).
• The t-distribution is appropriate if the sample is from a normally distributed population.
• Can check using a QQ plot (in this example, looks adequate).
• If not normal but n is large, can construct approximate CIs using the normal distribution (as we did in a previous
example). This is usually okay if the distribution is continuous, symmetric and unimodal (i.e. has a single ‘mode’,
or maximum value).
• If not normal and n small, distribution-free methods can be used. We will cover these later in the semester.
Normal, two means, known σ
Suppose we have two populations, with means µX and µY , and want to know how much they differ.
Random samples (iid) from each population: X1 , . . . , Xn ∼ N(µX , σ²X) and Y1 , . . . , Ym ∼ N(µY , σ²Y)
The two samples must be independent of each other.
Assume σ²X and σ²Y are known. Then we have the following pivot (why?):
(X̄ − Ȳ − (µX − µY)) / √(σ²X/n + σ²Y/m) ∼ N(0, 1)
Normal, two means, unknown σ, large samples
What if we don’t know σ²X and σ²Y?
If n and m are large, we can just replace σX and σY by estimates, e.g. the sample standard deviations SX and SY .
Rationale: these will be good estimates when the sample size is large.
The (approximate) pivot is then:
(X̄ − Ȳ − (µX − µY)) / √(S²X/n + S²Y/m) ≈ N(0, 1)
Normal, two means, unknown common σ, small samples
What if n and m are small? If we can assume the two populations have a common variance, σ²X = σ²Y = σ², we can use the pivot:
T = (X̄ − Ȳ − (µX − µY)) / (SP √(1/n + 1/m)) ∼ t(n + m − 2)
where
SP = √[((n − 1)S²X + (m − 1)S²Y) / (n + m − 2)]
is the pooled estimate of the common variance.
Note that the unknown σ has disappeared (cancelled out), therefore making T a pivot (why?).
We can now find the quantile c so that
Pr(−c < T < c) = 1 − α
and rearranging as usual gives a 100 · (1 − α)% confidence interval for µX − µY :
x̄ − ȳ ± c · sP √(1/n + 1/m)
where
sP = √[((n − 1)s²X + (m − 1)s²Y) / (n + m − 2)]
Example (normal, two means, unknown common variance)
Two independent groups of students take the same test. Assume the scores are normally distributed and have a
common unknown population variance.
We have sample sizes n = 9 and m = 15, and get the following summary statistics: x̄ = 81.31, ȳ = 78.61, s²x = 60.76,
s²y = 48.24.
The pivot has 9 + 15 − 2 = 22 degrees of freedom. Using the 0.975 quantile of t(22), which is 2.074, the 95% confidence
interval is:
81.31 − 78.61 ± 2.074 × √[(8 × 60.76 + 14 × 48.24)/22] × √(1/9 + 1/15) = [−3.65, 9.05]
[Figure: t density with 22 df; the central region between −2.074 and 2.074 is shaded, with probability 95%.]
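The interval can be reproduced in R directly from the summary statistics; a minimal sketch:

# Pooled two-sample t interval for mu_X - mu_Y, from summary statistics
n <- 9; m <- 15
xbar <- 81.31; ybar <- 78.61
s2x  <- 60.76; s2y  <- 48.24
sp <- sqrt(((n - 1) * s2x + (m - 1) * s2y) / (n + m - 2))  # pooled SD estimate
cc <- qt(0.975, df = n + m - 2)                            # approximately 2.074
(xbar - ybar) + c(-1, 1) * cc * sp * sqrt(1/n + 1/m)       # approximately [-3.65, 9.05]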
Normal, two means, unknown and unequal σ
What if the sample sizes are small and we are pretty sure that σ²X ≠ σ²Y? Then we can use Welch’s approximation:
W = (X̄ − Ȳ − (µX − µY)) / √(S²X/n + S²Y/m) ≈ t(r)
where the degrees of freedom r is estimated from the data (R does this automatically; it is typically not a whole number).
We measure the force required to pull wires apart for two types of wire, X and Y . We take 20 measurements for each
wire.
1 2 3 4 5 6 7 8 9 10
X 28.8 24.4 30.1 25.6 26.4 23.9 22.1 22.5 27.6 28.1
Y 14.1 12.2 14.0 14.6 8.5 12.6 13.7 14.8 14.1 13.2
11 12 13 14 15 16 17 18 19 20
X 20.8 27.7 24.4 25.1 24.6 26.3 28.2 22.2 26.3 24.4
Y 12.1 11.4 10.1 14.2 13.6 13.1 11.9 14.8 11.1 13.5
[Figure: side-by-side box plots of the X and Y measurements, showing clearly different centres and possibly different spreads.]
Welch (the default in R):
> t.test(X, Y, conf.level = 0.95)
t = 18.8003
df = 33.086
95% CI: 11.23214 13.95786
Pooled variance:
> t.test(X, Y,
+ conf.level = 0.95,
+ var.equal = TRUE)
t = 18.8003
df = 38
95% CI: 11.23879 13.95121
Remarks
• From the box plots: the population means look very different, and possibly also the spreads
• The Welch approximate t-distribution is appropriate, so a 95% confidence interval is 11.23–13.96
• If we assumed equal variances, the confidence interval becomes slightly narrower, 11.24–13.95
• Not a big difference!
Example (normal, paired samples)
The reaction times (in seconds) to a red or green light for 8 people are given in the following table. Find a 95% CI for the mean
difference in reaction time.
Red (X) Green (Y ) D =X −Y
1 0.30 0.24 0.06
2 0.43 0.27 0.16
3 0.23 0.36 −0.13
4 0.32 0.41 −0.09
5 0.41 0.38 0.03
6 0.58 0.38 0.20
7 0.53 0.51 0.02
8 0.46 0.61 −0.15
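The calculation for this example is not shown above. A sketch of the standard approach in R, which reduces the paired problem to a one-sample t interval on the differences:

# Paired-sample CI: a one-sample t interval applied to the differences D = X - Y
red   <- c(0.30, 0.43, 0.23, 0.32, 0.41, 0.58, 0.53, 0.46)  # reaction times to red
green <- c(0.24, 0.27, 0.36, 0.41, 0.38, 0.38, 0.51, 0.61)  # reaction times to green
t.test(red, green, paired = TRUE, conf.level = 0.95)$conf.int
# Equivalent manual computation:
d <- red - green
mean(d) + c(-1, 1) * qt(0.975, df = length(d) - 1) * sd(d) / sqrt(length(d))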
Normal, single variance
Random sample (iid): X1 , . . . , Xn ∼ N(µ, σ²). Use the pivot (n − 1)S²/σ² ∼ χ²(n − 1).
Now we need the α/2 and 1 − α/2 quantiles of χ²(n − 1). Call these a and b. In other words,
Pr(a < (n − 1)S²/σ² < b) = 1 − α
Rearranging gives
1 − α = Pr(a/((n − 1)S²) < 1/σ² < b/((n − 1)S²))
      = Pr((n − 1)S²/b < σ² < (n − 1)S²/a)
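In R, this interval is just two χ² quantiles away; a sketch with hypothetical inputs:

# CI for a normal variance: ((n - 1) s^2 / b, (n - 1) s^2 / a)
n  <- 13     # hypothetical sample size
s2 <- 10.7   # hypothetical sample variance
alpha <- 0.05
a <- qchisq(alpha / 2, df = n - 1)      # lower chi-squared quantile
b <- qchisq(1 - alpha / 2, df = n - 1)  # upper chi-squared quantile
(n - 1) * s2 * c(1 / b, 1 / a)          # 95% CI for sigma^2
sqrt((n - 1) * s2 * c(1 / b, 1 / a))    # corresponding CI for sigma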
Normal, two variances
Now we wish to compare the variances of two normally distributed populations. Random samples (iid) from each
population: X1 , . . . , Xn ∼ N(µX , σ²X) and Y1 , . . . , Ym ∼ N(µY , σ²Y)
We will compute a confidence interval for σ²X/σ²Y. Start by defining:
(S²Y/σ²Y) / (S²X/σ²X) = [((m − 1)S²Y/σ²Y) / (m − 1)] / [((n − 1)S²X/σ²X) / (n − 1)]
This is the ratio of independent χ² random variables divided by their degrees of freedom and hence has an F(m − 1, n − 1)
distribution. This doesn’t depend on the parameters and is thus a pivot.
We now need the α/2 and 1 − α/2 quantiles of F(m − 1, n − 1). Call these c and d. In other words,
1 − α = Pr(c < (S²Y/σ²Y)/(S²X/σ²X) < d) = Pr(c · S²X/S²Y < σ²X/σ²Y < d · S²X/S²Y)
Rearranging gives the 100 · (1 − α)% confidence interval for σ²X/σ²Y as
(c · s²x/s²y, d · s²x/s²y)
Continuing from the previous example, n = 13 and 12s²x = 128.41. A sample of m = 9 seeds from a second strain
gave 8s²y = 36.72.
The 0.01 and 0.99 quantiles of F(8, 12) are 0.176 and 4.50.
Then a 98% confidence interval for σ²X/σ²Y is
(0.176 × (128.41/12)/(36.72/8), 4.50 × (128.41/12)/(36.72/8)) = [0.41, 10.49]
Single proportion
• Observe n Bernoulli trials with unknown probability p of success
• We want a confidence interval for p
• Recall that the sample proportion of successes, p̂ = X̄ = X/n (where X is the total number of successes), is the
maximum likelihood estimator for p and is unbiased for p
• The central limit theorem shows for large n,
(X − np)/√(np(1 − p)) = (p̂ − p)/√(p(1 − p)/n) ≈ N(0, 1)
• Rearranging the corresponding probability statement as usual and estimating p by p̂ gives the approximate
100 · (1 − α)% confidence interval as
p̂ ± c √(p̂(1 − p̂)/n)
Example (single proportion)
• In the Newspoll of 3rd April 2017, 36% of 1,708 voters sampled said they would vote for the Government first
if an election were held on that day. What is a 95% confidence interval for the population proportion of voters
who would vote for the Government first?
• The sample proportion has an approximate normal distribution since the sample size is large, so the required
confidence interval is:
0.36 ± 1.96 √(0.36 × 0.64/1708) = [0.337, 0.383]
• It might be nice to round to the nearest percentage for this example. This gives us the final interval: 34%–38%
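A one-line check in R:

# Approximate 95% CI for a single proportion (normal approximation)
phat <- 0.36; n <- 1708
phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)  # approximately [0.337, 0.383]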
Two proportions
• We now wish to compare proportions between two different samples: Y1 ∼ Bi(n1 , p1 ), Y2 ∼ Bi(n2 , p2 )
• Use the approximate pivot
(p̂1 − p̂2 − (p1 − p2)) / √(p1(1 − p1)/n1 + p2(1 − p2)/n2) ≈ N(0, 1)
Example (two proportions)
Two detergents: the first was successful in 63 out of 91 trials, the second in 42 out of 79.
Summary statistics: p̂1 = 0.692, p̂2 = 0.532
90% confidence interval for the difference in proportions is:
0.692 − 0.532 ± 1.645 × √(0.692 × 0.308/91 + 0.532 × 0.468/79) = [0.038, 0.282]
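And a minimal R sketch reproducing this interval:

# Approximate 90% CI for the difference of two proportions
p1 <- 63 / 91; n1 <- 91
p2 <- 42 / 79; n2 <- 79
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # estimated standard error
(p1 - p2) + c(-1, 1) * qnorm(0.95) * se              # approximately [0.038, 0.282]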