Tutorial Confidence Interval
Instructor: Prof. Jize ZHANG
Tutors: Wenjun JIANG
(wjiangbb)
Confidence intervals
A plausible range of values for the population parameter is called a confidence interval.
A confidence interval consists of a lower limit and an upper limit and comes with a level of
confidence: the long-run proportion of intervals constructed this way that contain the true
value of the parameter.
General formula (Population mean)
Assumptions:
• Population is normally distributed
• Population standard deviation σ is known
• If population is not normal, use large sample (n > 30)
→ A “95% confidence interval” for estimating the population mean is $\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$
Example 1
A sample of 400 circuits from a large normal population has a mean resistance of 2.20
ohms. We know from past testing that the population standard deviation is 0.35 ohms.
Determine a 95% confidence interval for the true mean resistance of the population.
https://poe.com/s/WRuXCG5Yss5j8eCmPYlc
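The numbers in Example 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; `z_interval` is a hypothetical helper name, and it assumes σ is known and the z-based formula applies.

```python
import math
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a population mean when sigma is known."""
    # z_{alpha/2}: upper-tail critical value of the standard normal (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin

# Example 1: n = 400 circuits, sample mean 2.20 ohms, sigma = 0.35 ohms
lower, upper = z_interval(mean=2.20, sigma=0.35, n=400)
print(f"95% CI: ({lower:.3f}, {upper:.3f}) ohms")
```

With these inputs the margin is about 1.96 × 0.35 / 20 ≈ 0.034, so the interval is roughly (2.166, 2.234) ohms.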
Confidence intervals
• σ known: solved
• σ unknown: what can we do?
Student t-distribution (𝜎 unknown)
Assumptions:
• Population standard deviation σ is unknown (we can substitute the sample standard
deviation, s)
• Population is normally distributed
• If population is not normal, use large sample (n > 30)
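Then, a $(1-\alpha)100\%$ confidence interval for the mean $\mu$ is:
$\bar{x} \pm t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}}$ (compare with $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ when σ is known)
The term after ± is the margin of error.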
Example 2 (cont.)
Suppose the annual maximum stream flow of a given river has been observed for 10
years yielding the following statistics:
Sample mean = $\bar{x} = 10000$ cfs
Sample variance = $s^2 = 9 \times 10^6$ cfs$^2$
a) Establish the two-sided 90% confidence interval on the mean annual maximum
stream flow. Assume a normal population.
https://poe.com/s/EKIlXkDsOQLs3zCBzVrv
What do we know?
• Sample mean ➔ quantitative
• $n = 10$ ➔ t-distribution ➔ d.o.f. = 9
• $\bar{x} = 10^4$ cfs and $s^2 = 9 \times 10^6$ cfs$^2$
• 90% confidence ➔ $t_{0.05,9} = 1.833$
Solution:
90% CI = $\bar{x} \pm 1.833\,\dfrac{s}{\sqrt{n}}$
= $10000 \pm 1.833 \times 948.683$
= $10000 \pm 1738.936$
= (8261 cfs, 11739 cfs)
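The arithmetic of part (a) can be reproduced as follows. This is a minimal sketch, not the course's own code: the critical value 1.833 is taken from the t-table as quoted in the solution rather than computed, and the variable names are illustrative.

```python
import math

x_bar = 10_000.0        # sample mean, cfs
s2 = 9e6                # sample variance, cfs^2
n = 10                  # years of observation
t_table_value = 1.833   # t_{0.05, 9}, read from a t-table (90% two-sided, 9 d.o.f.)

std_err = math.sqrt(s2 / n)        # s / sqrt(n), about 948.683
margin = t_table_value * std_err   # margin of error
lower, upper = x_bar - margin, x_bar + margin
print(f"90% CI: ({lower:.0f} cfs, {upper:.0f} cfs)")
```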
Example 2 (cont.)
b) If it is desired to estimate the mean annual maximum stream flow to within
± 1000 cfs with 90% confidence, how many additional years of observation will be
required? Assume the sample (not the true value) variance based on the new set of
data will be approximately $9 \times 10^6$ cfs$^2$.
https://poe.com/s/9YCYT7PCHBcKK1DsiJQj Oooops!
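Solution:
We need the smallest $n$ with $t_{0.05,\,n-1}\,\dfrac{s}{\sqrt{n}} \le 1000$, i.e. $\dfrac{t_{0.05,\,n-1}}{\sqrt{n}} \le \dfrac{1}{3}$ for $s = 3000$ cfs.
Refer to the t-table: $n = 26$ gives $1.708/\sqrt{26} \approx 0.335$ (too large), while $n = 27$ gives
$1.706/\sqrt{27} \approx 0.328$. We see that a sample size of 27 will do, hence an additional
(27 − 10) = 17 years of observation will be required.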
Coding
https://colab.research.google.com/drive/1JvV5jXdIhFdqPjsi2c8bQrbzibqX11_x?usp=sharing
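The table search in part (b) can also be done numerically. The sketch below is not the linked notebook; it is a standard-library illustration under stated assumptions. The helper names `t_pdf`, `t_cdf`, `t_critical` and `required_sample_size` are hypothetical, the t CDF is approximated with Simpson's rule, and the critical value is found by bisection. In practice a library routine such as `scipy.stats.t.ppf` would normally replace these helpers.

```python
import math

def t_pdf(x, dof):
    """Density of Student's t-distribution with `dof` degrees of freedom."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))
    return math.exp(log_c - (dof + 1) / 2 * math.log1p(x * x / dof))

def t_cdf(x, dof, steps=400):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, dof) + t_pdf(x, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 + total * h / 3

def t_critical(upper_tail, dof):
    """Value t such that P(T > t) = upper_tail, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, dof) > upper_tail:
            lo = mid   # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2

def required_sample_size(s, margin, alpha):
    """Smallest n with t_{alpha/2, n-1} * s / sqrt(n) <= margin."""
    n = 2
    while t_critical(alpha / 2, n - 1) * s / math.sqrt(n) > margin:
        n += 1
    return n

s = math.sqrt(9e6)                       # assumed sample standard deviation, cfs
n_total = required_sample_size(s, margin=1000.0, alpha=0.10)
additional = n_total - 10                # 10 years are already on record
print(n_total, additional)
```

The search is iterative because the critical value itself depends on n through the degrees of freedom, which is why the slide refers to a table instead of a closed-form expression.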
Confidence intervals
• σ known: solved
• σ unknown: solved
General formula (Population proportion)
Upper and lower confidence limits for the population proportion are calculated with the
formula
$\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
where $\hat{p}$ is the sample proportion, normal with $\mu_{\hat{p}} = p$
→ A “95% confidence interval” for estimating the proportion is $\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Determining sample size
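For a population mean: $n = \dfrac{(z_{\alpha/2})^2\,\sigma^2}{E^2}$
For a population proportion: $n = \dfrac{(z_{\alpha/2})^2\,p(1-p)}{E^2}$
where $E$ is the desired margin of error.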
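Both sample-size formulas come from setting the margin of error of the z-based interval equal to the target $E$ and solving for $n$. A short derivation, assuming the z-based intervals for the mean (σ known) and for the proportion:

```latex
\begin{aligned}
E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  \;&\Longrightarrow\; \sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E}
  \;\Longrightarrow\; n = \frac{(z_{\alpha/2})^{2}\,\sigma^{2}}{E^{2}}, \\[4pt]
E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}
  \;&\Longrightarrow\; E^{2} = \frac{(z_{\alpha/2})^{2}\,p(1-p)}{n}
  \;\Longrightarrow\; n = \frac{(z_{\alpha/2})^{2}\,p(1-p)}{E^{2}}.
\end{aligned}
```

Since $n$ must be a whole number, the result is rounded up.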
Example 3
a) A national survey of 900 women golfers was conducted to learn how women golfers
view their treatment at golf courses in United States. The survey found that 396 of
the women golfers were satisfied with the availability of tee times. Suppose one
wants to develop a 95% confidence interval estimate for the proportion of the
population of women golfers satisfied with the availability of tee times.
https://poe.com/s/rBy47vWyeRMRoLQI0TAY
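Part (a) applies the proportion formula directly. A minimal sketch using only the standard library is given below; `proportion_interval` is a hypothetical helper name, and the normal approximation for the sample proportion is assumed.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Sample proportion and its two-sided z-based confidence interval."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

# Example 3a: 396 of 900 women golfers were satisfied
p_hat, lower, upper = proportion_interval(successes=396, n=900)
print(f"p_hat = {p_hat:.2f}, 95% CI: ({lower:.4f}, {upper:.4f})")
```

Here the sample proportion is 396/900 = 0.44 and the margin of error is about 0.0324, giving an interval of roughly (0.4076, 0.4724).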
Example 3 (cont.)
b) Suppose the survey director wants to estimate the population proportion with a
margin of error of 0.025 at 95% confidence. How large a sample size is needed to
meet the required precision? (A previous sample of similar units yielded 0.44 for the
sample proportion.)
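Part (b) can be worked with the sample-size formula for a proportion, using the earlier sample proportion 0.44 as the planning value for $p$. The sketch below is an illustration under that assumption, not a solution taken from the slides; `sample_size_for_proportion` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p_planning, margin, confidence=0.95):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = z ** 2 * p_planning * (1 - p_planning) / margin ** 2
    return math.ceil(n_exact)   # round up: n must be a whole number

# Example 3b: planning value 0.44, margin of error 0.025, 95% confidence
n_required = sample_size_for_proportion(p_planning=0.44, margin=0.025)
print(n_required)
```

The unrounded value is about 1514.5, so rounding up gives a required sample size of 1515.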
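Supplementary examples to help understand
Solution:
95% CI = $\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}} = 3.2 \pm 1.96 \times \dfrac{1.74}{\sqrt{50}} = (2.7, 3.7)$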
Judge whether the following conclusions are correct based on the calculations.
Supplementary examples to help understand (cont.)
• True or False and explain: We are 95% confident that the average number of
exclusive relationships college students in this sample have been in is between 2.7
and 3.7.
False. The confidence interval $\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$ definitely (100%) contains the
sample mean $\bar{x}$, not just with probability 95%.
• True or False and explain: 95% of college students have been in 2.7 to 3.7 exclusive
relationships.
False. The confidence interval is for covering the population mean $\mu$, not for
covering 95% of the entire population. If 95% of college students had been in
2.7 to 3.7 exclusive relationships, the standard deviation would not be as large as
1.74.
• True or False and explain: There is 0.95 probability that the true mean number of
exclusive relationships of college students falls in the interval (2.7, 3.7).
• True or False and explain: The interval (2.7, 3.7) has probability of 0.95 of enclosing
the true mean number of exclusive relationships of college students.
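Both are False. The population mean $\mu$ is a fixed number, not random, and (2.7, 3.7) is
an already computed interval. $\mu$ is either in the interval (2.7, 3.7) or not; no
probability is involved once the interval is fixed. The 95% describes the procedure: about
95% of intervals built this way from repeated samples would contain $\mu$.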
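The difference between "this interval" and "the procedure" can be seen in a small simulation. The sketch below is illustrative only: the true mean `mu = 3.0` is a made-up value chosen for the simulation (the real population mean is unknown), the population is assumed normal with σ = 1.74, and `coverage_rate` is a hypothetical helper name.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, trials, confidence=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the fixed true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample of size n and build its interval
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=3.0, sigma=1.74, n=50, trials=2000)
print(f"Share of intervals containing mu: {rate:.3f}")
```

Each individual interval either contains `mu` or does not; only the long-run share of intervals that do so is close to 0.95.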