Chapter 3
Estimation is the process of estimating the value of a parameter from information obtained from a sample.
Inferential statistics techniques have various assumptions that must be met before valid conclusions can be
made.
✓ One common assumption is that the samples must be randomly selected.
✓ Another common assumption is that either the sample size must be at least 30 (n ≥ 30), or the population must be
normally or approximately normally distributed if the sample size is less than 30.
AN EXAMPLE ON HOW TO DEVELOP A SAMPLING DISTRIBUTION
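Population distribution:
X        Frequency   P(X)
18       1           0.25 (= 1/4)
20       1           0.25
22       1           0.25
24       1           0.25
Total    4           1.00
Mean of one possible sample of size 2, (18, 20): $\bar{x} = \frac{18 + 20}{2} = 19$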
✓ Different samples of the same size from the same population will yield
different sample means
Sampling distribution of the sample mean:
X̄        Frequency   P(X̄)
18       1           0.0625 (= 1/16)
19       2           0.1250
20       3           0.1875
21       4           0.2500
22       3           0.1875
23       2           0.1250
24       1           0.0625
Total    16          1.0000
Mean of the sample means: $\mu_{\bar{x}} = \frac{18+19+19+20+20+20+21+21+21+21+22+22+22+23+23+24}{16} = \frac{336}{16} = 21$
COMPARING THE POPULATION DISTRIBUTION TO THE
SAMPLE MEANS DISTRIBUTION
[Figure: population distribution and sampling distribution of the sample mean]
Properties of the distribution of sample means
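Standard deviation of the sample means (standard error): $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} = 1.58$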
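The sampling distribution above can be checked by enumerating every possible sample. The sketch below assumes samples of size n = 2 drawn with replacement from the population {18, 20, 22, 24}, which is what produces the 16 equally likely samples in the table; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]
n = 2  # assumed sample size (sampling with replacement)

# Every ordered sample of size n: 4 * 4 = 16 samples
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Frequency table of the sample means
distribution = Counter(sample_means)
for value in sorted(distribution):
    freq = distribution[value]
    print(value, freq, freq / len(samples))

mu = mean(population)             # population mean
sigma = pstdev(population)        # population standard deviation
mu_xbar = mean(sample_means)      # mean of the sample means
sigma_xbar = pstdev(sample_means) # standard deviation of the sample means

print(mu, mu_xbar)
print(round(sigma, 3), round(sigma_xbar, 3), round(sigma / n ** 0.5, 3))
```

The mean of the sample means equals the population mean, and their standard deviation equals σ/√n.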
CENTRAL LIMIT THEOREM (CLT)
For any population X with expected value μ and standard deviation σ, the sampling distribution of the sample mean X̄ will be
approximately normal if the sample size n is sufficiently large. As a rule of thumb, a sample is considered
large if n ≥ 30.
It is important to remember 2 things when you use the CLT:
✓ When the original population is normally distributed, the distribution of the sample means will be normally
distributed for any sample size n.
✓ When the original population is not normally distributed, the distribution of the sample means will be
approximately normally distributed for a sample size of 30 or more.
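A small simulation can illustrate the CLT. The sketch below is an assumption-laden illustration, not part of the original notes: it uses an exponential population with rate 1 (so μ = 1 and σ = 1, strongly skewed), a sample size of 30, and an arbitrary 2000 repeated samples with a fixed random seed.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

n = 30              # sample size (rule-of-thumb threshold)
num_samples = 2000  # number of repeated samples (arbitrary choice)

# Exponential population with rate 1: mu = 1, sigma = 1, right-skewed
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)  # should be close to mu = 1
sd_of_means = stdev(sample_means)   # should be close to sigma / sqrt(n)
expected_se = 1.0 / n ** 0.5

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_se, 3))
```

Even though the population is far from normal, the sample means cluster around μ with a spread close to σ/√n.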
Estimator: A statistic that is used to estimate a population parameter.
Estimate: Any particular value of the estimator.
3 PROPERTIES OF THE BEST ESTIMATOR
TWO TYPES OF ESTIMATION
❑ Example: Suppose a college president wishes to estimate the average age of students attending classes this semester. The
president could select a random sample of 100 students and find the average age of these students, say, 22.3 years. From
the sample mean, the president could infer that the average age of all students is 22.3 years. This type of estimate is called
a Point Estimate.
❑ The following table indicates the best estimator for each parameter:
CONTENTS
3.0 Introduction to Sampling Distribution of Sample Mean
3.1 Introduction to Estimation
EXAMPLE 1
The total weekly exercise time is recorded for a sample of 8 career women. The resulting observations are 10.2, 9.3,
11.9, 9.2, 8.3, 11.2, 10.4 and 9.5. What are the point estimates of the mean and standard deviation of exercise
time?
Solution:
$\bar{x} = \frac{\sum x}{n} = \frac{10.2+9.3+11.9+9.2+8.3+11.2+10.4+9.5}{8} = \frac{80}{8} = 10$
$s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]} = \sqrt{\frac{1}{8-1}\left[809.52 - \frac{(80)^2}{8}\right]} = 1.1662$
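The point estimates in Example 1 can be verified with Python's standard `statistics` module. This is a minimal sketch; `stdev` uses the n − 1 divisor, which matches the sample standard deviation formula in the solution.

```python
from statistics import mean, stdev

# Weekly exercise times of the 8 career women in Example 1
times = [10.2, 9.3, 11.9, 9.2, 8.3, 11.2, 10.4, 9.5]

x_bar = mean(times)  # point estimate of the population mean
s = stdev(times)     # point estimate of the population std. deviation (n - 1 divisor)

print(round(x_bar, 4), round(s, 4))
```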
3.1.2 INTERVAL ESTIMATION
How good is a point estimate? There is no way of knowing how close a particular point
estimate is to the population mean.
This creates a lack of confidence in the accuracy of point estimates. Therefore,
statisticians prefer another type of estimate, called an Interval Estimate.
An interval estimate of a parameter is an interval or a range of values used to estimate the parameter.
A sample statistic never estimates a population parameter perfectly; there will
always be some error.
Example: An interval estimate for the average age of all students might be 21.9 < μ < 22.7,
or 22.3 ± 0.4 years.
A point estimate for the average age of all students is 22.3 years.
[Figure: confidence interval (interval estimate) from 21.9 to 22.7, of the form x̄ ± E]
An interval estimate is also known as a Confidence Interval. We can write the confidence interval for a
parameter θ as
P(a < θ < b) = 1 – α
Notation : a = lower confidence limit (LCL)
b = upper confidence limit (UCL)
1 – α = the confidence coefficient
(1 – α) 100% = the confidence level
α = significance level
The confidence level refers to the percentage of confidence intervals, constructed from repeated samples, that we expect to contain the population parameter.
If you construct a 95% confidence interval, the confidence coefficient is 0.95 and the
confidence level is 95%.
Three confidence levels are commonly used: 90%, 95% and 99%. If you want to be more confident, such as
99% confident, you must make the interval wider so that it is more likely to contain the true
population mean. The most commonly used confidence level is 95%.
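To see why a higher confidence level gives a wider interval, the sketch below computes the standard normal critical value z for the three common confidence levels. It assumes a z-based interval and uses `NormalDist` from the Python standard library; the helper name `z_critical` is illustrative only.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value z_{alpha/2} of the standard normal distribution."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

critical_values = {cl: z_critical(cl) for cl in (0.90, 0.95, 0.99)}
for cl, z in critical_values.items():
    # The interval half-width is z times the standard error, so it grows with z
    print(f"{cl:.0%} confidence level: z = {z:.3f}")
```

For the same data, the 99% interval is wider than the 95% interval, which is wider than the 90% interval.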
Example:
An interval estimate for the average age of all students at the 95% confidence level is 21.9 < μ < 22.7, or 22.3 ± 0.4 years.
Interpretation: We are 95% confident that the population mean lies between 21.9 and 22.7.
3.2
CONFIDENCE INTERVAL FOR A POPULATION MEAN
x̄ ± E
3.2.1
POPULATION VARIANCE (σ²) OR STD. DEVIATION (σ) IS KNOWN
The (1 – α) 100% confidence interval for the population mean is
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Example 2: The average lifetime of a product from a sample of 30 items is found to be 48 months. It is estimated that the
standard deviation of the population is 3 months. Find the 95% confidence interval for the average lifetime of the product
and interpret the interval.
Solution (95% confidence interval): CL = 95%
α = 1 - 0.95 = 0.05
α/2 = 0.025
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 48 \pm 1.96\left(\frac{3}{\sqrt{30}}\right) = 48 \pm 1.96(0.5477) = 48 \pm 1.0735$
46.9265 < μ < 49.0735
Interpretation: We are 95% confident that the average lifetime (months) of the product lies between 46.9265 and 49.0735.
Solution (90% confidence interval): CL = 90%
α = 1 - 0.90 = 0.10
α/2 = 0.05
47.0991 < μ < 48.9009
Interpretation: We are 90% confident that the average lifetime (months) of the product lies between 47.0991 and 48.9009.
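The two intervals in Example 2 can be reproduced with a short function. This is a sketch that assumes the z-based interval x̄ ± z·σ/√n and takes the critical value from `NormalDist` rather than from a printed table, so the last decimal may differ slightly from hand calculations; the function name `z_interval` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence_level):
    """Confidence interval for a mean when the population std. deviation is known."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)             # margin of error
    return x_bar - margin, x_bar + margin

# Example 2: n = 30, sample mean 48 months, sigma = 3 months
ci95 = z_interval(48, 3, 30, 0.95)
ci90 = z_interval(48, 3, 30, 0.90)

print([round(v, 4) for v in ci95])
print([round(v, 4) for v in ci90])
```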
Exercise 1
A researcher claimed that the height of men in a population is normally distributed with a mean of 69
inches and a standard deviation of 2.5 inches. A sample of 100 men drawn randomly from the population had an
average height of 68.5 inches. Construct a 98% confidence interval for the population mean. Interpret the
interval.
Solution:
3.2.2
POPULATION VARIANCE (σ²) OR STD. DEVIATION (σ) IS UNKNOWN
(LARGE SAMPLE SIZE)
The (1 – α) 100% confidence interval for the population mean is
$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$
Example 3: The time taken (in seconds) to connect to the internet via a dial-in service for a sample of 35 nights gave a
mean of 26.46 and a standard deviation of 10.81. Find a 98% confidence interval on the mean time required to
access the internet during the night.
Solution: CL = 98%
α = 1 - 0.98 = 0.02
α/2 = 0.01
22.2093 < μ < 30.7107
Interpretation: We are 98% confident that the mean time (seconds) required to access the internet during the night lies between 22.2093 and 30.7107.
Exercise 2
The table below shows summary statistics for the height (in metres) of a random
sample of 50 female high school students.
i. Calculate a 95% confidence interval for the mean height of female students. Interpret the result.
Solution:
PAST YEAR QUESTION (JAN’18 – QUESTION 3)
There was a claim that the price of ikan kembung sold in a certain market was different from the average RM17
per kg. A study was conducted to investigate the changing price per kg (in RM) of ikan kembung. Fifty stalls were
selected at random and the results obtained are as follows:
Example 4: The breaking strengths of 11 bundles of wool fibres have a sample mean of 436.5 and a sample
standard deviation of 11.90. Assume the breaking strengths of the population are normally distributed. Construct a 90%
confidence interval for the mean breaking strength of wool fibres.
Solution: CL = 90%
α = 1 - 0.90 = 0.10
α/2 = 0.05
n = 11
429.9986 < μ < 443.0014
Interpretation: We are 90% confident that the mean breaking strength of wool fibres lies between 429.9986 and 443.0014.
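The interval reported in Example 4 is consistent with a t-based interval, x̄ ± t·s/√n with df = n − 1 = 10. The sketch below makes that assumption explicit and takes the critical value t(0.05, 10) = 1.812 from a standard t table as an input, since the Python standard library has no t quantile function; the function name is illustrative only.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for a mean using a t critical value from a table."""
    margin = t_crit * s / sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin

# Assumed t-table value for alpha/2 = 0.05 and df = 11 - 1 = 10
T_CRIT = 1.812

# Example 4: n = 11, sample mean 436.5, sample std. deviation 11.90
lower, upper = t_interval(436.5, 11.90, 11, T_CRIT)
print(round(lower, 4), round(upper, 4))
```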
PAST YEAR QUESTION (JUNE’19 – QUESTION 5)
A statistics lecturer intends to investigate whether there is sufficient evidence to conclude that the average score
differs from the expected average score of 74. A random sample of 15 students was selected and analysed
using SPSS. The results obtained are as follows.
(2 marks)
CONTENTS
3.3 Confidence Interval for the Difference between Two Population Means
3.3.1 Independent Sample - Population Variances (σ₁², σ₂²) or Population Std. Deviations (σ₁, σ₂) are known
3.3.2 Independent Sample - Population Variances (σ₁², σ₂²) or Population Std. Deviations (σ₁, σ₂) are unknown (Large
& Small Sample Size)
3.3.3 Dependent Sample
3.3
CONFIDENCE INTERVAL FOR THE DIFFERENCE BETWEEN TWO
POPULATION MEANS
INTRODUCTION
Example:
I. The average lifetimes of 2 different brands of bus tires might be compared to see whether there is any
difference in tread wear.
II. 2 different brands of fertilizer might be tested to see whether one is better than the other for growing
plants.
INDEPENDENT SAMPLE
❖ Suppose we want to estimate the difference between the mean salaries of all male and all female
executives. To do so, we draw two samples, one from the population of male executives and
another from the population of female executives. These two samples are independent because
they are drawn from two different populations, and the samples have no effect on each other.
DEPENDENT SAMPLE
❖ Suppose we want to estimate the difference between the mean weights of all participants before
and after a weight loss program. To accomplish this, suppose we take a sample of 30 participants
and measure their weights before and after the completion of this program. Note that these two
samples include the same 30 participants, i.e. the data come from the same participants before and after the program.
This is an example of two dependent samples. Such samples are also called paired or matched
samples.
INDEPENDENT
SAMPLE
3.3.1
VARIANCES (σ₁², σ₂²) OR STD. DEVIATIONS (σ₁, σ₂) ARE KNOWN
❖ Assumption: i. The populations are normally distributed
ii. For both small and large sample size
iii. Population variances σ₁² & σ₂² are known
(b) Construct a 97% confidence interval for the difference between the 2019 mean salaries of all full-
time state officers in these two states.
3.3.2
VARIANCES (σ₁², σ₂²) OR STD. DEVIATIONS (σ₁, σ₂) ARE UNKNOWN
(Large Samples – Both n₁ & n₂ ≥ 30)
❖ Assumption: i. The populations are normally distributed
ii. Population variances σ₁² & σ₂² are unknown
< 30
HOW TO DETERMINE THE ASSUMPTION OF EQUALITY OF
VARIANCES?
HOW TO DETERMINE THE P-VALUE FOR LEVENE’S TEST?
C.I FOR (μ₁ − μ₂) WHEN σ₁² = σ₂²
EXAMPLE 7
An insurance company wants to know if the average speed at which men drive cars is greater than that of women
drivers. The company took a random sample of 26 cars driven by men on a highway and found the mean speed to
be 72 miles per hour with a standard deviation of 2.2 miles per hour. Another sample of 16 cars driven by women
on the same highway gave a mean speed of 68 miles per hour with standard deviation of 2.5 miles per hour.
Assume that the speeds at which all men and all women drive cars on this highway are both normally distributed
with the same population standard deviation. (assume equal variances)
Construct a 98% confidence interval for the difference between the mean speeds of cars driven by all men and all
women on this highway.
Solution:
Sample
Men : n₁ = 26, x̄₁ = 72, s₁ = 2.2
Women: n₂ = 16, x̄₂ = 68, s₂ = 2.5
CL = 98% = 0.98
α = 1 - 0.98 = 0.02
α/2 = 0.01
From t table, df = n₁ + n₂ − 2 = 40
t(0.01, 40) = 2.423
$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{(26-1)(2.2)^2 + (16-1)(2.5)^2}{26+16-2}} = \sqrt{\frac{121+93.75}{40}} = \sqrt{\frac{214.75}{40}} = 2.3171$
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\left(s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\right) = (72-68) \pm 2.423\left(2.3171\sqrt{\frac{1}{26}+\frac{1}{16}}\right) = 4 \pm 2.423(0.7362) = 4 \pm 1.7839$
2.2161 < μ₁ − μ₂ < 5.7839
Interpretation: We are 98% confident that the difference between the two population means lies between 2.2161 and 5.7839.
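The pooled-variance calculation in Example 7 can be checked with the sketch below. It assumes equal population variances, as stated in the example, and takes the critical value t(0.01, 40) = 2.423 from the solution as an input; the function name `pooled_interval` is illustrative only.

```python
from math import sqrt

def pooled_interval(n1, x_bar1, s1, n2, x_bar2, s2, t_crit):
    """CI for mu1 - mu2 with a pooled standard deviation (equal variances assumed)."""
    # Pooled standard deviation s_p
    sp = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    # Margin of error: t * s_p * sqrt(1/n1 + 1/n2)
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = x_bar1 - x_bar2
    return sp, diff - margin, diff + margin

# Example 7: men (n=26, mean 72, s=2.2), women (n=16, mean 68, s=2.5)
sp, lower, upper = pooled_interval(26, 72, 2.2, 16, 68, 2.5, t_crit=2.423)
print(round(sp, 4), round(lower, 4), round(upper, 4))
```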
EXAMPLE 8
The manufacturer of a small battery-powered tape recorder decides to include four alkaline batteries with its
product. Two battery suppliers are being considered; each has its own brand (brand 1 and brand 2). The
supervising inspector of incoming quality wants to know if the average lifetimes of two brands are the same. A
sample experiment is conducted: each of ten batteries (five of each brand) is connected to a test device that places
a small drain on the battery power and records the battery lifetimes. The following results (in hours) are obtained:
EXAMPLE 9
a) Based on the p-value in the Levene’s Test, test the equality of variances in this study. Use α = 0.05
b) State the 95% confidence interval on the differences between the average lifetimes of the two brands.
c) Based on the confidence interval, can we conclude that the average lifetimes of the two brands are equal?
Solution:
a) p-value = 0.459, α = 0.05. Since the p-value is greater than α, the assumption of equal variances is not rejected.
b) 5.6823 < μ₁ − μ₂ < 19.5177
c) No, the average lifetimes of the two brands are not equal, because the interval does not include the value of 0.
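The two decisions in this solution follow simple rules: compare the Levene's test p-value with α, then check whether the confidence interval contains 0. The sketch below encodes those rules with the values reported in the solution; the helper names are hypothetical and not taken from SPSS or any library.

```python
def equal_variances_assumed(p_value, alpha=0.05):
    """Levene's test decision: do not reject equal variances when p-value > alpha."""
    return p_value > alpha

def interval_contains_zero(lower, upper):
    """If 0 lies inside the interval, a zero difference in means is plausible."""
    return lower <= 0 <= upper

# Values reported in the solution
equal_var = equal_variances_assumed(0.459, alpha=0.05)
means_could_be_equal = interval_contains_zero(5.6823, 19.5177)

print(equal_var)             # equal variances assumed?
print(means_could_be_equal)  # could the two means be equal?
```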
C.I FOR (μ₁ − μ₂) WHEN σ₁² ≠ σ₂²
EXAMPLE 10
A set of facilitation tools to help with data analysis for problem solving is being developed by a group of
statisticians at UiTM. In order to test effectiveness of these tools, a group of research officers were asked to
analyze and produce a built-in report for a set of data on the computer. Twelve equally capable research officers
were randomly selected and six were randomly assigned a standard procedure to complete the task. The other six
were asked to do the task using the developed facilitation tools. The response measured was the time to
completion (in minutes). The output of statistical analysis is shown in the following tables.
p-value = 0.003
α = 0.05
Since the p-value is less than α, equal variances are not assumed.
Yes, the average completion times of the two procedures differ, because the interval does not include the value
of 0.
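Degrees of freedom when the variances are not assumed equal:
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1} + \frac{\left(s_2^2/n_2\right)^2}{n_2-1}}$
$= \frac{\left(\frac{5.891^2}{6} + \frac{4.087^2}{6}\right)^2}{\frac{\left(5.891^2/6\right)^2}{6-1} + \frac{\left(4.087^2/6\right)^2}{6-1}}$
$= \frac{\left(\frac{34.704}{6} + \frac{16.704}{6}\right)^2}{\frac{\left(34.704/6\right)^2}{5} + \frac{\left(16.704/6\right)^2}{5}}$
$= \frac{73.411}{\frac{33.455}{5} + \frac{7.751}{5}} = 8.908$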
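The degrees-of-freedom calculation above can be reproduced with a few lines of Python. This sketch uses the sample standard deviations 5.891 and 4.087 with n₁ = n₂ = 6 from the working; the function name `unequal_variance_df` is illustrative only.

```python
def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom for a two-sample interval when variances are not assumed equal."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    numerator = (v1 + v2) ** 2
    denominator = v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)
    return numerator / denominator

# Example 10: s1 = 5.891, s2 = 4.087, six research officers per group
df = unequal_variance_df(5.891, 6, 4.087, 6)
print(round(df, 3))
```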
DEPENDENT
SAMPLE
3.3.3
DEPENDENT SAMPLE (Matched or Paired Samples)
EXAMPLE 11
A random sample of 9 local banks shows their deposits (in billions of dollars) 3 years ago and their deposits (in
billions of dollars) today. At α = 0.05, construct a confidence interval for the mean difference in deposits. Assume the variable is normally distributed.
Bank          1      2      3      4      5      6      7      8      9
3 years ago   11.42  8.41   3.98   7.37   2.28   1.10   1.00   0.90   1.35
Today         16.69  9.44   6.53   5.58   2.92   1.88   1.78   1.50   1.22
Solution: 1st step :
3 years ago, x₁   Today, x₂   dᵢ = x₁ᵢ – x₂ᵢ            dᵢ²
11.42             16.69       11.42 – 16.69 = -5.27     (-5.27)² = 27.773
8.41              9.44        8.41 – 9.44 = -1.03       (-1.03)² = 1.061
3.98              6.53        3.98 – 6.53 = -2.55       (-2.55)² = 6.503
7.37              5.58        7.37 – 5.58 = 1.79        (1.79)² = 3.204
2.28              2.92        2.28 – 2.92 = -0.64       (-0.64)² = 0.410
1.10              1.88        1.10 – 1.88 = -0.78       (-0.78)² = 0.608
1.00              1.78        1.00 – 1.78 = -0.78       (-0.78)² = 0.608
0.90              1.50        0.90 – 1.50 = -0.60       (-0.60)² = 0.360
1.35              1.22        1.35 – 1.22 = 0.13        (0.13)² = 0.017
Total                         Σd = -9.73                Σd² = 40.544
$\bar{d} = \frac{\sum d}{n} = \frac{-9.73}{9} = -1.081$
$s_d = \sqrt{\frac{n\sum d^2 - \left(\sum d\right)^2}{n(n-1)}} = \sqrt{\frac{9(40.544) - (-9.73)^2}{9(9-1)}} = 1.937$
α = 0.05
α/2 = 0.025
From t table, df = n – 1 = 9 – 1 = 8
t(0.025, 8) = 2.306
$\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}} = -1.081 \pm 2.306\left(\frac{1.937}{\sqrt{9}}\right) = -1.081 \pm 2.306(0.646) = -1.081 \pm 1.490$
-2.571 < μ_d < 0.409
Interpretation: We are 95% confident that the difference between the two population means lies between -2.571 and 0.409.
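The paired-sample interval in Example 11 can be checked with the sketch below. It takes the critical value t(0.025, 8) = 2.306 from the solution as an input and keeps full precision in the intermediate steps, so the limits may differ from the hand calculation in the third decimal place; the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# Deposits (in billions of dollars) for the 9 banks in Example 11
three_years_ago = [11.42, 8.41, 3.98, 7.37, 2.28, 1.10, 1.00, 0.90, 1.35]
today = [16.69, 9.44, 6.53, 5.58, 2.92, 1.88, 1.78, 1.50, 1.22]

# Paired differences d_i = x1_i - x2_i
d = [x1 - x2 for x1, x2 in zip(three_years_ago, today)]

n = len(d)
d_bar = mean(d)   # mean of the differences
s_d = stdev(d)    # sample std. deviation of the differences (n - 1 divisor)
t_crit = 2.306    # t-table value for alpha/2 = 0.025, df = n - 1 = 8

margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(round(d_bar, 3), round(s_d, 3))
print(round(lower, 3), round(upper, 3))
```

Because the interval contains 0, the data do not rule out a zero mean difference at this confidence level.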
Exercise 4
The manufacturer of a gasoline additive claimed that the use of this additive increases gasoline mileage. A random
sample of six cars was selected and these cars were driven for one week without the gasoline additive and then for
one week with the gasoline additive. The following table gives the miles per gallon for these cars without and with
the gasoline additive.
Construct a 95% confidence interval for the difference in mean mileage per gallon for cars without and with the
gasoline additive.