
Chapter 3


CHAPTER 3 : ESTIMATION

Lecturer: Dr Ruzanita Mat Rani


Prepared BY : HAZIYAH BINTI MD JASMIN
CONTENTS
3.0 Introduction to Sampling Distribution of Sample Mean
3.1 Introduction to Estimation

3.1.1 Point Estimation


3.1.2 Interval Estimation

➢ One aspect of inferential statistics is estimation.

➢ Estimation is the process of estimating the value of a parameter from information obtained from a sample.

➢ A sample statistic that is used to estimate a population parameter is called an estimator.


3.0
INTRODUCTION TO SAMPLING DISTRIBUTION OF SAMPLE MEAN
➢ A sampling distribution is the distribution of all possible values of a sample statistic for a given sample size selected from a population.

Sample statistic – A summary measure calculated from sample data.

Population parameter – A numerical measure calculated from population data.

Sampling distribution – The probability distribution of a sample statistic.

Population distribution – The probability distribution of the population data.
➢ For example, a teacher wishes to estimate the students' mean CGPA at a school. If the teacher takes many different samples of 30 students, she will generally compute a different mean for each sample. The distribution of all the mean CGPA values she might obtain from samples of 30 students is the sampling distribution of the sample mean.

➢ Inferential statistics techniques have various assumptions that must be met before valid conclusions can be made.
✓ One common assumption is that the samples must be randomly selected.
✓ Another common assumption is that either the sample size is at least 30 (n ≥ 30), or the population is normally or approximately normally distributed if the sample size is less than 30.
AN EXAMPLE ON HOW TO DEVELOP A SAMPLING DISTRIBUTION
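Population distribution:

X        Frequency   P(X)
18       1           0.25 (= 1/4)
20       1           0.25
22       1           0.25
24       1           0.25
Total    4           1.00

Example of one sample of size 2: the sample (18, 20) has mean $\bar{x} = \frac{18 + 20}{2} = 19$.

✓ Different samples of the same size from the same population will yield different sample means.

Sampling distribution of the sample mean:

$\bar{X}$   Frequency   P($\bar{X}$)
18       1           0.0625 (= 1/16)
19       2           0.1250
20       3           0.1875
21       4           0.2500
22       3           0.1875
23       2           0.1250
24       1           0.0625
Total    16          1.0000

The sampling distribution is approximately normally distributed.

Mean of the sample means:
$\frac{18+19+19+20+20+20+21+21+21+21+22+22+22+23+23+24}{16} = \frac{336}{16} = 21$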
COMPARING THE POPULATION DISTRIBUTION TO THE
SAMPLE MEANS DISTRIBUTION
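[Figure: population distribution (left panel) and sampling distribution of the sample mean (right panel)]
Properties of the distribution of sample means

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} = 1.58$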
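The table above can be reproduced by listing every possible sample. The sketch below assumes samples of size 2 drawn with replacement from the four population values (which gives the 16 samples in the table); the variable names are illustrative only.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [18, 20, 22, 24]
n = 2  # assumed sample size, drawn with replacement (4 x 4 = 16 samples)

# Mean of every possible ordered sample of size n
sample_means = [mean(sample) for sample in product(population, repeat=n)]

# Frequency table of the sample means (the sampling distribution)
freq = Counter(sample_means)
for value in sorted(freq):
    print(value, freq[value], freq[value] / len(sample_means))

mu = mean(population)               # population mean
sigma = pstdev(population)          # population standard deviation
mean_of_means = mean(sample_means)  # centre of the sampling distribution
sd_of_means = pstdev(sample_means)  # spread of the sampling distribution

print(mu, mean_of_means)
print(round(sigma / sqrt(n), 2), round(sd_of_means, 2))
```

The mean of the sample means matches the population mean, and the standard deviation of the sample means matches $\sigma/\sqrt{n}$.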
CENTRAL LIMIT THEOREM (CLT)
➢ For any population X with expected value $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{X}$ will be approximately normal if the sample size n is sufficiently large. As a rule of thumb, a sample is considered large, and the normal approximation is justified, when n ≥ 30.

➢ It is important to remember 2 things when you use the CLT:

➢ When the original population is normally distributed, the distribution of the sample means will be normally distributed for any sample size n.

➢ When the original population is not normally distributed, the distribution of the sample means will be approximately normally distributed for a sample size of 30 or more.
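A small simulation can illustrate the second point. This is only a sketch: the exponential population (mean 1, standard deviation 1), the seed and the number of repetitions are assumptions chosen for illustration, not part of the notes.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30       # sample size at the rule-of-thumb threshold
reps = 2000  # number of samples drawn

# Skewed (non-normal) population: exponential with mean 1 and std. deviation 1
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

centre = mean(sample_means)   # should be close to the population mean, 1
spread = stdev(sample_means)  # should be close to sigma / sqrt(n)

print(round(centre, 3), round(spread, 3), round(1 / sqrt(n), 3))
```

Even though the population is strongly skewed, the sample means cluster around the population mean with a spread close to $\sigma/\sqrt{n}$.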
Symbols that you need to know:
3.1
INTRODUCTION TO ESTIMATION
➢ One aspect of inferential statistics is ESTIMATION, which is the process of estimating the value of a parameter from information obtained from a sample.

Parameter: Any statistical measure (such as the mean, mode or standard deviation) computed from population data is known as a PARAMETER.
Statistic: Any statistical measure computed from sample data is known as a STATISTIC.

Estimator: A statistic that is used to estimate a population parameter.
Estimate: Any particular value of the estimator.
3 PROPERTIES OF THE BEST ESTIMATOR
TWO TYPES OF ESTIMATION

Types of
estimation

Point Estimate Interval Estimate


Estimation procedure

Step 1: Select a sample.

Step 2: Collect the required information from the members of the sample.

Step 3: Calculate the value(s) of the sample statistic(s).

Step 4: Assign value(s) to the corresponding population parameter(s).
3.1.1 POINT ESTIMATION
❑ The value of a sample statistic that is used to estimate a population parameter is called a point estimate.

❑ Example: Suppose a college president wishes to estimate the average age of students attending classes this semester. The president could select a random sample of 100 students and find the average age of these students, say, 22.3 years. From the sample mean, the president could infer that the average age of all students is 22.3 years. This type of estimate is called a point estimate.

❑ The following table indicates the best estimator for each parameter:
EXAMPLE 1
A sample of 8 career women is selected and the total time each spends exercising in a week is recorded. The resulting observations are 10.2, 9.3, 11.9, 9.2, 8.3, 11.2, 10.4 and 9.5. What are the point estimates of the mean and standard deviation of exercise time?

Solution:
$\bar{x} = \frac{\sum x}{n} = \frac{10.2+9.3+11.9+9.2+8.3+11.2+10.4+9.5}{8} = \frac{80}{8} = 10$

$s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]} = \sqrt{\frac{1}{8-1}\left[809.52 - \frac{(80)^2}{8}\right]} = 1.1662$
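The same point estimates can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative, and `statistics.stdev` uses the n − 1 divisor, as in the formula above.

```python
from statistics import mean, stdev

# Weekly exercise times of the 8 career women in Example 1
times = [10.2, 9.3, 11.9, 9.2, 8.3, 11.2, 10.4, 9.5]

x_bar = mean(times)  # point estimate of the population mean
s = stdev(times)     # point estimate of the population std. deviation (n - 1 divisor)

print(round(x_bar, 4), round(s, 4))
```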
3.1.2 INTERVAL ESTIMATION

➢ How good is a point estimate? The answer is that there is no way of knowing how close a particular point estimate is to the population mean.

➢ This creates a lack of confidence about the accuracy of point estimates. Therefore, statisticians prefer another type of estimate, called an interval estimate.

➢ An interval estimate of a parameter is an interval or a range of values used to estimate the parameter. An estimate of a population parameter based on a sample statistic is never going to be perfect; there will always be error.

➢ We can express that error, or uncertainty, using an interval estimate:

➢ Example: An interval estimate for the average age of all students might be $21.9 < \mu < 22.7$, or 22.3 ± 0.4 years.
Example:

➢ A point estimate for the average age of all students is 22.3 years.

➢ An interval estimate for the average age of all students might be $21.9 < \mu < 22.7$, or 22.3 ± 0.4 years.

[Figure: the confidence interval (interval estimate) runs from 21.9 to 22.7. It is written 22.3 ± 0.4, that is $\bar{x} \pm E$, where E = 0.4 is the margin of error.]
➢ An interval estimate is also known as a confidence interval. We can write the confidence interval for a parameter $\theta$ as
$P(a < \theta < b) = 1 - \alpha$
Notation: a = lower confidence limit (LCL)
b = upper confidence limit (UCL)
$1 - \alpha$ = the confidence coefficient
$(1 - \alpha)100\%$ = the confidence level
$\alpha$ = significance level
➢ The confidence level refers to the percentage of confidence intervals, constructed in the same way from repeated samples, that we expect to contain the population parameter.
➢ If you construct a 95% confidence interval, the confidence coefficient is 0.95 and the confidence level is 95%.
➢ Three confidence levels are commonly used: 90%, 95% and 99%. If you want to be more confident, such as 99% confident, you must make the interval wider so that it is more likely to contain the true population mean. The most commonly used confidence level is 95%.
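The link between the confidence level, α and the critical value read from the Z table can be sketched as follows. The function name `z_critical` is hypothetical; `NormalDist().inv_cdf` from the standard library stands in for the printed Z table.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence_level  # significance level
    # Upper-tail area alpha/2 corresponds to cumulative probability 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(1 - level, 2), round(z_critical(level), 4))
```

A higher confidence level gives a larger critical value, which is why the 99% interval is wider than the 90% interval.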
[Figure: the interval estimate from 21.9 to 22.7, shown as a confidence interval at the 95% confidence level]
Interpretation: We are 95% confident that the population mean lies between 21.9 and 22.7.
3.2
CONFIDENCE INTERVAL FOR A POPULATION MEAN
$\bar{x} \pm E$, where E is the margin of error
3.2.1
POPULATION VARIANCE (𝜎𝜎 2 ) OR STD. DEVIATION (𝜎𝜎) IS KNOWN
➢ The $(1 - \alpha)100\%$ confidence interval for the population mean is
$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

➢ Assumptions: i. The sample is a random sample

ii. Either n ≥ 30, or the population is normally distributed when n < 30

iii. The variance or standard deviation of the population is known

➢ Example 2: The average lifetime of a product from a sample of 30 items is found to be 48 months. The standard deviation of the population is known to be 3 months. Find the 95% confidence interval for the average lifetime of the product and interpret the interval.
Solution: CL = 95%
$\alpha = 1 - 0.95 = 0.05$
$\frac{\alpha}{2} = 0.025$

$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 48 \pm 1.96\,\frac{3}{\sqrt{30}}$

$= 48 \pm 1.96\,(0.5477)$

$= 48 - 1.0735 < \mu < 48 + 1.0735$

$= 46.9265 < \mu < 49.0735$

Interpretation:

We are 95% confident that the average lifetime (in months) of the product lies between 46.9265 and 49.0735.
Repeat the problem in Example 2, this time finding the 90% confidence interval for the average lifetime. Interpret the interval obtained.

Solution: CL = 90%
$\alpha = 1 - 0.90 = 0.10$
$\frac{\alpha}{2} = 0.05$

$48 \pm 1.6449\,\frac{3}{\sqrt{30}} = 48 \pm 1.6449\,(0.5477) = 48 \pm 0.9009$

$47.0991 < \mu < 48.9009$

Interpretation:
We are 90% confident that the average lifetime (in months) of the product lies between 47.0991 and 48.9009.
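Both intervals for the product lifetime can be reproduced with a short function. This is a sketch under the assumptions of section 3.2.1 (σ known); the name `z_interval` is hypothetical and the critical value comes from `NormalDist().inv_cdf` rather than the printed Z table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence_level):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)                               # margin of error E
    return x_bar - margin, x_bar + margin

# Product lifetime: n = 30, sample mean 48 months, sigma = 3 months
for level in (0.95, 0.90):
    low, high = z_interval(48, 3, 30, level)
    print(level, round(low, 4), round(high, 4))
```

The 90% interval is narrower than the 95% interval because its critical value is smaller.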
Exercise 1
A researcher claimed that the distribution of height of men in a population is normally distributed with mean of 69
inches and a standard deviation of 2.5 inches. A sample of 100 men drawn randomly from the population had an
average height of 68.5 inches. Construct a 98% confidence interval for the population mean. Interpret the
interval.

Solution:
3.2.2
POPULATION VARIANCE (𝜎𝜎 2 ) OR STD. DEVIATION (𝜎𝜎) IS UNKNOWN
(LARGE SAMPLE SIZE)
➢ The $(1 - \alpha)100\%$ confidence interval for the population mean is
$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$

➢ Assumptions: i. The sample is a random sample
ii. The population is normally distributed
iii. The sample size is large (n ≥ 30)
iv. The variance or standard deviation of the population is unknown

➢ Example 3: The time taken (in seconds) to connect to the internet via a dial-in service for a sample of 35 nights gave a mean of 26.46 and a standard deviation of 10.81. Find a 98% confidence interval for the mean time required to access the internet during the night.

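Solution: CL = 98%
$\alpha = 1 - 0.98 = 0.02$
$\frac{\alpha}{2} = 0.01$

$26.46 \pm 2.3263\,\frac{10.81}{\sqrt{35}} = 26.46 \pm 2.3263\,(1.8272) = 26.46 \pm 4.2507$

$22.2093 < \mu < 30.7107$

Interpretation:
We are 98% confident that the mean time (in seconds) required to access the internet during the night lies between 22.2093 and 30.7107.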
Exercise 2
The table below shows summary statistics for the height (in metres) of a random sample of 50 female high school students.

i. Calculate a 95% confidence interval for the mean height of female students. Interpret the result.

ii. Determine the value of A.

Solution:
PAST YEAR QUESTION (JAN’18 – QUESTION 3)
There was a claim that the price of ikan kembung sold in a certain market was different from the average RM17
per kg. A study was conducted to investigate the changing price per kg (in RM) of ikan kembung. Fifty stalls were
selected at random and the results obtained are as follows:

a) Show that the standard error of the mean is 0.1181.


(3 marks)
b) Construct a 99% confidence interval for the mean price of ikan kembung.
(4 marks)
c) Based on the confidence interval in b), does the average price per kg (in RM) of ikan kembung in the market
differ from RM17? Give a reason to support your answer.
(2 marks)
3.2.2
POPULATION VARIANCE (𝜎𝜎 2 ) OR STD. DEVIATION (𝜎𝜎) IS UNKNOWN
(SMALL SAMPLE SIZE)
➢ The $(1 - \alpha)100\%$ confidence interval for the population mean is
$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$

➢ Assumptions: i. The sample is a random sample
ii. The population is normally distributed
iii. The sample size is small (n < 30)
iv. The variance or standard deviation of the population is unknown

➢ Example 4: The breaking strengths of 11 bundles of wool fibres have a sample mean of 436.5 and a sample standard deviation of 11.90. Assume the breaking strengths of the population are normally distributed. Construct a 90% confidence interval for the mean breaking strength of wool fibres.

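Solution: CL = 90%
$\alpha = 1 - 0.90 = 0.10$
$\frac{\alpha}{2} = 0.05$
n = 11, df = n − 1 = 10
From the t table, $t_{0.05,\,10} = 1.812$

$436.5 \pm 1.812\,\frac{11.90}{\sqrt{11}} = 436.5 \pm 1.812\,(3.5880) = 436.5 \pm 6.5014$

$429.9986 < \mu < 443.0014$

Interpretation:
We are 90% confident that the mean breaking strength of wool fibres lies between 429.9986 and 443.0014.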
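The small-sample calculation can be sketched in Python. Python's standard library has no t distribution, so the critical value is passed in as read from a t table; the function name `t_interval` and the value 1.812 for $t_{0.05,\,10}$ are assumptions of this sketch.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean with sigma unknown and a small sample.

    t_crit is t_{alpha/2, n-1}, looked up in a t table by the caller.
    """
    margin = t_crit * s / sqrt(n)  # margin of error E
    return x_bar - margin, x_bar + margin

# Example 4: n = 11, sample mean 436.5, s = 11.90, 90% level, df = 10
low, high = t_interval(436.5, 11.90, 11, t_crit=1.812)
print(round(low, 4), round(high, 4))
```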
PAST YEAR QUESTION (JUNE’19 – QUESTION 5)
A statistics lecturer intends to investigate whether there is sufficient evidence to conclude that the average score was different from the expected average score of 74. A random sample of 15 students was selected and analysed using SPSS. The results obtained are as follows.

a) Identify the statistical test used for this study.


(1 mark)
b) Show that the standard error of the mean is 3.5002.
(2 marks)
c) Construct a 95% confidence interval for the average score.
(4 marks)
d) Based on the confidence interval in c), does the average score differ from 74? Give a reason to support your
answer.

(2 marks)
CONTENTS
3.3 Confidence Interval for the Difference between Two Population Means
3.3.1 Independent Samples - Population Variances ($\sigma_1^2, \sigma_2^2$) or Population Std. Deviations ($\sigma_1, \sigma_2$) are known
3.3.2 Independent Samples - Population Variances ($\sigma_1^2, \sigma_2^2$) or Population Std. Deviations ($\sigma_1, \sigma_2$) are unknown (Large & Small Sample Size)
3.3.3 Dependent Sample
3.3
CONFIDENCE INTERVAL FOR THE DIFFERENCE BETWEEN TWO
POPULATION MEANS
INTRODUCTION
Example:
I. The average lifetimes of 2 different brands of bus tires might be compared to see whether there is any
difference in tread wear.
II. 2 different brands of fertilizer might be tested to see whether one is better than the other for growing
plants.

Let $\mu_1$ = mean of the 1st population

$\mu_2$ = mean of the 2nd population

➢ We want to find the C.I. for $(\mu_1 - \mu_2)$.

➢ We use the sample statistic (its estimator) $(\bar{x}_1 - \bar{x}_2)$ to construct the C.I.
➢ There are 2 different types of interval estimation for the difference between two means, based on INDEPENDENT and DEPENDENT samples.
DIFFERENCE BETWEEN INDEPENDENT & DEPENDENT SAMPLES
Two samples drawn from two populations are independent if the selection of one sample from one
population does not affect the selection of the second sample from the second population. Otherwise,
the samples are dependent.

 INDEPENDENT SAMPLE
❖ Suppose we want to estimate the difference between the mean salaries of all male and all female
executives. To do so, we draw two samples, one from the population of male executives and
another from the population of female executives. These two samples are independent because
they are drawn from two different populations, and the samples have no effect on each other.

 DEPENDENT SAMPLE
❖ Suppose we want to estimate the difference between the mean weights of all participants before and after a weight loss program. To accomplish this, suppose we take a sample of 30 participants and measure their weights before and after the completion of this program. Note that these two samples include the same 30 participants, i.e. each participant contributes a before and an after measurement. This is an example of two dependent samples. Such samples are also called paired or matched samples.
INDEPENDENT
SAMPLE
3.3.1
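VARIANCES ($\sigma_1^2, \sigma_2^2$) OR STD. DEVIATIONS ($\sigma_1, \sigma_2$) ARE KNOWN
❖ Assumptions: i. The populations are normally distributed
ii. Applies to both small and large sample sizes
iii. Population variances $\sigma_1^2$ and $\sigma_2^2$ are known

❖ The $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$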
EXAMPLE 5
An experiment was conducted in which two types of engines, A and B, were compared. Gas mileage in miles per gallon was measured. 75 experiments were conducted using engine type A and 50 experiments were done for engine type B. The gasoline used and other conditions were held constant. The average gas mileage for engine A was 42 miles per gallon and the average for engine B was 36 miles per gallon. Find a 96% confidence interval for $\mu_A - \mu_B$, where $\mu_A$ and $\mu_B$ are the population mean gas mileages for engine A and engine B, respectively. Assume that the population standard deviations are 8 and 6 for engines A and B, respectively.
Solution:
Sample
Engine A: $n_1 = 75$, $\bar{x}_1 = 42$
Engine B: $n_2 = 50$, $\bar{x}_2 = 36$
Population
Engine A: $\sigma_1 = 8$
Engine B: $\sigma_2 = 6$

CL = 96% = 0.96
$\alpha = 1 - 0.96 = 0.04$
$\frac{\alpha}{2} = 0.02$
From the Z table, $z_{0.02} = 2.0537$

$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
$= (42 - 36) \pm 2.0537\sqrt{\frac{8^2}{75} + \frac{6^2}{50}}$
$= 6 \pm 2.0537\,(1.2543)$
$= 6 \pm 2.5760$
$3.4240 < \mu_1 - \mu_2 < 8.5760$

Interpretation:
We are 96% confident that the difference between the two population means lies between 3.4240 miles per gallon and 8.5760 miles per gallon.
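The engine comparison can be checked with a short function. This is a sketch under the assumptions of section 3.3.1 (independent samples, known population standard deviations); the name `two_sample_z_interval` is hypothetical. Because the critical value is computed rather than read from a four-decimal table, the limits may differ from the hand calculation in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_interval(x1, x2, sd1, sd2, n1, n2, confidence_level):
    """Confidence interval for mu1 - mu2 from two independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # z_{alpha/2}
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of x1_bar - x2_bar
    diff = x1 - x2                        # point estimate of mu1 - mu2
    return diff - z * se, diff + z * se

# Example 5: engine A (n = 75, mean 42, sigma 8) vs engine B (n = 50, mean 36, sigma 6)
low, high = two_sample_z_interval(42, 36, 8, 6, 75, 50, 0.96)
print(round(low, 3), round(high, 3))
```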
Exercise 1
According to the latest survey, the average monthly salary of full-time state officers was RM5312.60 in
State A and RM4680.00 in State B in 2019. Suppose that these mean salaries are based on random
samples of 50 full-time state officers from State A and 40 full-time state officers from State B. It is
known from the past study, the population standard deviations of the 2019 salaries of all full-time state
officers in these two states were RM900.00 and RM850.00, respectively.

(a) What is the point estimate of μ1 – μ2?

(b) Construct a 97% confidence interval for the difference between the 2019 mean salaries of all full-
time state officers in these two states.
3.3.2
VARIANCES ($\sigma_1^2, \sigma_2^2$) OR STD. DEVIATIONS ($\sigma_1, \sigma_2$) ARE UNKNOWN
(Large Samples – Both $n_1$ & $n_2$ ≥ 30)
❖ Assumptions: i. The populations are normally distributed
ii. Population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown
iii. Both samples are large ($n_1$ & $n_2$ ≥ 30)

❖ The $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

EXAMPLE 6
A researcher wants to determine whether there is a significant difference in the Body Mass Index (BMI) between males and females. A survey was conducted on 80 patients at Tawakal Health Centre. The collected data were analyzed using SPSS. The partial output is shown in the following table. Find a 95% confidence interval for the mean difference in BMI between males and females.

Group statistics
       Gender   N    Mean      Std. Deviation   Std. Error Mean
BMI    Male     40   27.0375   4.41911          0.69872
       Female   40   24.0175   4.00031          0.63251

Solution:
Sample
Male: $n_1 = 40$, $\bar{x}_1 = 27.0375$, $s_1 = 4.41911$
Female: $n_2 = 40$, $\bar{x}_2 = 24.0175$, $s_2 = 4.00031$

CL = 95% = 0.95
$\alpha = 1 - 0.95 = 0.05$
$\frac{\alpha}{2} = 0.025$
From the Z table, $z_{0.025} = 1.9600$

$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$= (27.0375 - 24.0175) \pm 1.9600\sqrt{\frac{4.41911^2}{40} + \frac{4.00031^2}{40}}$
$= 3.02 \pm 1.9600\,(0.9425)$
$= 3.02 \pm 1.8473$
$1.1727 < \mu_1 - \mu_2 < 4.8673$

Interpretation:
We are 95% confident that the difference between the two population means lies between 1.1727 and 4.8673.
MIND MAP FOR C.I FOR ($\mu_1 - \mu_2$) USING Z-SCORE
3.3.2
VARIANCES ($\sigma_1^2, \sigma_2^2$) OR STD. DEVIATIONS ($\sigma_1, \sigma_2$) ARE UNKNOWN
(Small Samples – $n_1$ < 30 or $n_2$ < 30)

HOW TO DETERMINE THE ASSUMPTION OF EQUALITY OF VARIANCES?
HOW TO DETERMINE THE P-VALUE FOR LEVENE’S TEST?

[Figure: SPSS output with the p-value of Levene's test highlighted]
C.I FOR ($\mu_1 - \mu_2$) WHEN $\sigma_1^2 = \sigma_2^2$
EXAMPLE 7
An insurance company wants to know if the average speed at which men drive cars is greater than that of women
drivers. The company took a random sample of 26 cars driven by men on a highway and found the mean speed to
be 72 miles per hour with a standard deviation of 2.2 miles per hour. Another sample of 16 cars driven by women
on the same highway gave a mean speed of 68 miles per hour with standard deviation of 2.5 miles per hour.
Assume that the speeds at which all men and all women drive cars on this highway are both normally distributed
with the same population standard deviation. (assume equal variances)
Construct a 98% confidence interval for the difference between the mean speeds of cars driven by all men and all
women on this highway.
Solution:
Sample
Men: $n_1 = 26$, $\bar{x}_1 = 72$, $s_1 = 2.2$
Women: $n_2 = 16$, $\bar{x}_2 = 68$, $s_2 = 2.5$

CL = 98% = 0.98
$\alpha = 1 - 0.98 = 0.02$
$\frac{\alpha}{2} = 0.01$
From the t table, df = $n_1 + n_2 - 2 = 40$
$t_{0.01,\,40} = 2.423$

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
$= \sqrt{\frac{(26-1)\,2.2^2 + (16-1)\,2.5^2}{26+16-2}}$
$= \sqrt{\frac{121 + 93.75}{40}}$
$= \sqrt{\frac{214.75}{40}} = 2.3171$

$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\left(s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$
$= (72 - 68) \pm 2.423\left(2.3171 \times \sqrt{\frac{1}{26} + \frac{1}{16}}\right)$
$= 4 \pm 2.423\,(0.7362)$
$= 4 \pm 1.7839$
$2.2161 < \mu_1 - \mu_2 < 5.7839$

Interpretation:
We are 98% confident that the difference between the two population means lies between 2.2161 and 5.7839.
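The pooled-variance calculation can be sketched as below. The function name `pooled_t_interval` is hypothetical, and the critical value 2.423 is taken from the t table as in the solution, since the standard library has no t distribution.

```python
from math import sqrt

def pooled_t_interval(x1, x2, s1, s2, n1, n2, t_crit):
    """Interval for mu1 - mu2 assuming equal population variances.

    t_crit is t_{alpha/2, n1 + n2 - 2}, looked up in a t table by the caller.
    Returns the pooled standard deviation and the interval limits.
    """
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return sp, (diff - margin, diff + margin)

# Example 7: men (n = 26, mean 72, s = 2.2) vs women (n = 16, mean 68, s = 2.5)
sp, (low, high) = pooled_t_interval(72, 68, 2.2, 2.5, 26, 16, t_crit=2.423)
print(round(sp, 4), round(low, 4), round(high, 4))
```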
EXAMPLE 8
The manufacturer of a small battery-powered tape recorder decides to include four alkaline batteries with its product. Two battery suppliers are being considered; each has its own brand (brand 1 and brand 2). The supervising inspector of incoming quality wants to know if the average lifetimes of the two brands are the same. A sample experiment is conducted: each of ten batteries (five of each brand) is connected to a test device that places a small drain on the battery power and records the battery lifetime. The following results (in hours) are obtained:
EXAMPLE 9
a) Based on the p-value in the Levene’s Test, test the equality of variances in this study. Use α = 0.05
b) State the 95% confidence interval on the differences between the average lifetimes of the two brands.
c) Based on the confidence interval, can we conclude that the average lifetimes of the two brands are equal?
Solution:

a) $H_0: \sigma_1^2 = \sigma_2^2$ (equal variances assumed)

$H_1: \sigma_1^2 \neq \sigma_2^2$ (equal variances not assumed)

p-value = 0.459
α = 0.05

Since p-value (0.459) > α (0.05), fail to reject $H_0$.

Failing to reject $H_0$ means $\sigma_1^2 = \sigma_2^2$ (equal variances assumed).

∴ Equal variances assumed

b) State the 95% confidence interval for the difference between the average lifetimes of the two brands.
Sample
Brand 1: $n_1 = 5$, $\bar{x}_1 = 44.20$, $s_1 = 5.263$
Brand 2: $n_2 = 5$, $\bar{x}_2 = 31.60$, $s_2 = 4.159$

CL = 95% = 0.95
$\alpha = 1 - 0.95 = 0.05$
$\frac{\alpha}{2} = 0.025$
From the t table, df = $n_1 + n_2 - 2 = 8$
$t_{0.025,\,8} = 2.306$

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
$= \sqrt{\frac{(5-1)\,5.263^2 + (5-1)\,4.159^2}{5+5-2}}$
$= \sqrt{\frac{179.9858}{8}}$
$= 4.7432$

$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\left(s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$
$= (44.20 - 31.60) \pm 2.306\left(4.7432 \times \sqrt{\frac{1}{5} + \frac{1}{5}}\right)$
$= 12.6 \pm 6.9177$
$5.6823 < \mu_1 - \mu_2 < 19.5177$

Interpretation:
We are 95% confident that the difference between the two population means lies between 5.6823 and 19.5177.

c) Based on the confidence interval, can we conclude that the average lifetimes of the two brands are equal?
$5.6823 < \mu_1 - \mu_2 < 19.5177$
No, the average lifetimes of the two brands are not equal, because the interval does not include the value 0.
C.I FOR ($\mu_1 - \mu_2$) WHEN $\sigma_1^2 \neq \sigma_2^2$
EXAMPLE 10
A set of facilitation tools to help with data analysis for problem solving is being developed by a group of
statisticians at UiTM. In order to test effectiveness of these tools, a group of research officers were asked to
analyze and produce a built-in report for a set of data on the computer. Twelve equally capable research officers
were randomly selected and six were randomly assigned a standard procedure to complete the task. The other six
were asked to do the task using the developed facilitation tools. The response measured was the time to
completion (in minutes). The output of statistical analysis is shown in the following tables.

Group 1 (Standard procedure) 61 69 68 74 58 63


Group 2 (Facilitation tool) 32 42 40 34 38 33
a) Based on the p-value in Levene's Test, test the equality of variances in this study. Use α = 0.05.
b) State the 95% confidence interval to estimate the difference between the average completion times for the two procedures.
c) Based on the confidence interval, can we conclude that the average completion times for the two procedures differ?
d) Show that the degrees of freedom for unequal variances is 8.908.
a) Based on the p-value in Levene's Test, test the equality of variances in this study. Use α = 0.05.

$H_0: \sigma_1^2 = \sigma_2^2$ (equal variances assumed)

$H_1: \sigma_1^2 \neq \sigma_2^2$ (equal variances not assumed)

p-value = 0.003
α = 0.05

Since p-value (0.003) < α (0.05), reject $H_0$.

Rejecting $H_0$ means $\sigma_1^2 \neq \sigma_2^2$ (equal variances not assumed).

∴ Equal variances not assumed


b) State the 95% confidence interval to estimate the difference between the average completion times for the two procedures.
Sample
Group 1: $n_1 = 6$, $\bar{x}_1 = 65.5$, $s_1 = 5.891$
Group 2: $n_2 = 6$, $\bar{x}_2 = 36.5$, $s_2 = 4.087$

CL = 95% = 0.95
$\alpha = 1 - 0.95 = 0.05$
$\frac{\alpha}{2} = 0.025$

$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\frac{5.891^2}{6} + \frac{4.087^2}{6}\right)^2}{\frac{\left(5.891^2/6\right)^2}{6-1} + \frac{\left(4.087^2/6\right)^2}{6-1}} = \frac{\left(\frac{34.704}{6} + \frac{16.704}{6}\right)^2}{\frac{(34.704/6)^2}{5} + \frac{(16.704/6)^2}{5}} = \frac{73.411}{\frac{33.455}{5} + \frac{7.751}{5}} = 8.908 \approx 9$

From the t table, df = 9
$t_{0.025,\,9} = 2.262$

$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$= (65.5 - 36.5) \pm 2.262\sqrt{\frac{5.891^2}{6} + \frac{4.087^2}{6}}$
$= 29 \pm 2.262\,(2.927)$
$= 29 \pm 6.621$
$22.379 < \mu_1 - \mu_2 < 35.621$

Interpretation:
We are 95% confident that the difference between the two population means lies between 22.379 and 35.621.
c) Based on the confidence interval, can we conclude that the average completion times for the two procedures differ?
$22.379 < \mu_1 - \mu_2 < 35.621$

Yes, the average completion times of the two procedures differ, because the interval does not include the value 0.

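d) Show that the degrees of freedom for unequal variances is 8.908.

$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\frac{5.891^2}{6} + \frac{4.087^2}{6}\right)^2}{\frac{\left(5.891^2/6\right)^2}{6-1} + \frac{\left(4.087^2/6\right)^2}{6-1}} = \frac{\left(\frac{34.704}{6} + \frac{16.704}{6}\right)^2}{\frac{(34.704/6)^2}{5} + \frac{(16.704/6)^2}{5}} = \frac{73.411}{\frac{33.455}{5} + \frac{7.751}{5}} = 8.908$

df = 8.908, so the degrees of freedom for unequal variances is 8.908.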
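Example 10 can also be worked directly from the raw completion times. This is a sketch only: the function name `welch_df` is hypothetical, and the critical value 2.262 ($t_{0.025,\,9}$) is taken from the t table as in the solution. Because the sample variances are not rounded here, the results may differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import mean, variance

group1 = [61, 69, 68, 74, 58, 63]  # standard procedure (minutes)
group2 = [32, 42, 40, 34, 38, 33]  # facilitation tool (minutes)

def welch_df(v1, n1, v2, n2):
    """Degrees of freedom when equal variances are not assumed."""
    a, b = v1 / n1, v2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

n1, n2 = len(group1), len(group2)
v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1 divisor)

df = welch_df(v1, n1, v2, n2)
t_crit = 2.262                    # table value for df rounded to 9 (assumed input)
se = sqrt(v1 / n1 + v2 / n2)      # standard error of the difference in means
diff = mean(group1) - mean(group2)
low, high = diff - t_crit * se, diff + t_crit * se

print(round(df, 3), round(low, 3), round(high, 3))
```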


Exercise 2
A consumer association wanted to estimate the difference in the mean amounts of caffeine in two brands of
coffee. The agency took a sample of 15 packets (200 grams packet) of Brand I coffee that showed the mean
amount of caffeine in these packets to be 80 milligrams per packet (of 200 grams) with a standard deviation of 5
milligrams. Another sample of 12 packets (200 grams packet) of Brand II coffee gave a mean amount of caffeine
equal to 77 milligrams per packet (of 200 grams) with a standard deviation of 6 milligrams. Construct a 98%
confidence interval for the difference between the mean amounts of caffeine per packet of these two brands of
coffee. Assume that the two populations are normally distributed and that the standard deviations of the two
populations are equal.
Exercise 3
Refer to Exercise 2. Construct a 98% confidence interval for the difference between the mean amounts
of caffeine per packet of these two brands. Assume that two populations are normally distributed and
that the standard deviations of the two populations are not equal.
DEPENDENT
SAMPLE
3.3.3
DEPENDENT SAMPLE (Matched or Paired Samples)
EXAMPLE 11
A random sample of 9 local banks shows their deposits (in billions of dollars) 3 years ago and their deposits (in billions of dollars) today. At α = 0.05, construct a confidence interval for the mean difference in deposits. Assume the variable is normally distributed.

Bank          1       2      3      4      5      6      7      8      9
3 years ago   11.42   8.41   3.98   7.37   2.28   1.10   1.00   0.90   1.35
Today         16.69   9.44   6.53   5.58   2.92   1.88   1.78   1.50   1.22

Solution: 1st step:

3 years ago, x1   Today, x2   d = x1 − x2              d²
11.42             16.69       11.42 − 16.69 = −5.27    (−5.27)² = 27.773
8.41              9.44        8.41 − 9.44 = −1.03      (−1.03)² = 1.061
3.98              6.53        3.98 − 6.53 = −2.55      (−2.55)² = 6.503
7.37              5.58        7.37 − 5.58 = 1.79       (1.79)² = 3.204
2.28              2.92        2.28 − 2.92 = −0.64      (−0.64)² = 0.410
1.10              1.88        1.10 − 1.88 = −0.78      (−0.78)² = 0.608
1.00              1.78        1.00 − 1.78 = −0.78      (−0.78)² = 0.608
0.90              1.50        0.90 − 1.50 = −0.60      (−0.60)² = 0.360
1.35              1.22        1.35 − 1.22 = 0.13       (0.13)² = 0.017

$\sum d = -9.73$, $\sum d^2 = 40.544$
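EXAMPLE 11
$\bar{d} = \frac{\sum d}{n} = \frac{-9.73}{9} = -1.081$

$s_d = \sqrt{\frac{n\sum d^2 - \left(\sum d\right)^2}{n(n-1)}} = \sqrt{\frac{9(40.544) - (-9.73)^2}{9(9-1)}} = 1.937$

α = 0.05
$\frac{\alpha}{2} = 0.025$
From the t table, df = n − 1 = 9 − 1 = 8
$t_{0.025,\,8} = 2.306$

$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}} = -1.081 \pm 2.306\,\frac{1.937}{\sqrt{9}}$
$= -1.081 \pm 2.306\,(0.646)$
$= -1.081 \pm 1.490$
$-2.571 < \mu_d < 0.409$

Interpretation:
We are 95% confident that the mean difference between the two populations lies between −2.571 and 0.409.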
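The paired calculation can be reproduced from the bank data. This is a sketch only: the variable names are illustrative and the critical value 2.306 ($t_{0.025,\,8}$) is taken from the t table as in the solution. Intermediate values are not rounded here, so the limits may differ slightly from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import mean, stdev

three_years_ago = [11.42, 8.41, 3.98, 7.37, 2.28, 1.10, 1.00, 0.90, 1.35]
today = [16.69, 9.44, 6.53, 5.58, 2.92, 1.88, 1.78, 1.50, 1.22]

# Paired differences d = x1 - x2 for each bank
d = [x1 - x2 for x1, x2 in zip(three_years_ago, today)]

n = len(d)
d_bar = mean(d)   # mean of the differences
s_d = stdev(d)    # standard deviation of the differences (n - 1 divisor)
t_crit = 2.306    # t_{0.025, 8} from the t table (assumed input)

margin = t_crit * s_d / sqrt(n)
low, high = d_bar - margin, d_bar + margin

print(round(d_bar, 3), round(s_d, 3), round(low, 3), round(high, 3))
```

The interval contains 0, so at this confidence level the data do not show a difference between deposits 3 years ago and today.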
Exercise 4
The manufacturer of a gasoline additive claimed that the use of this additive increases gasoline mileage. A random
sample of six cars was selected and these cars were driven for one week without the gasoline additive and then for
one week with the gasoline additive. The following table gives the miles per gallon for these cars without and with
the gasoline additive.

Construct a 95% confidence interval for the difference in mean mileage per gallon for cars without and with the
gasoline additive.

Solution:

Without, x1   With, x2   d = x1 − x2   d²
24.6          26.3
28.3          31.7
18.9          18.2
23.7          25.3
15.4          18.3
29.5          30.9
PAST YEAR QUESTION (JAN’18 – QUESTION 4)
A researcher claims that after playing a certain type of interactive game, the memory capability of a group of
autistic children had improved. In order to test her hypothesis, a sample of 20 autistic children was selected and
the ability to memorize items out of 10 before and after playing the interactive game was recorded. The data were
analysed using SPSS and have the following outputs:

a) Find the values of D and E. (2 marks)


b) State the 95% confidence interval for the mean difference and explain. (3 marks)
PAST YEAR QUESTION (JUNE’19 – QUESTION 6)
One indicator of physical fitness is resting pulse rate. Ten men volunteered to test the exercise device as advertised
on television by using it three times a week for 20 minutes. Their pulse rate (beats per minute) were measured
before and after six week of the test. The data were recorded and analysed using SPSS. The results are shown
below.

a) Find the value of G.


(1 mark)
c) State the 95% confidence interval for the mean difference before and after six weeks of the test.
(1 mark)
