
MATHunit 3


Unit 3:

Test of Hypotheses for the Difference between Two Population Means and ANOVA

Lesson Objectives: At the end of this unit, you are expected to:

1. Formulate the appropriate null and alternative hypotheses on a population mean.
2. Identify the appropriate form of the test statistic (t-test/ANOVA).
3. Identify the appropriate rejection region for a given level of significance.
4. Compute the test-statistic value.
5. Draw a conclusion about the population mean based on the test-statistic value and the rejection region.
6. Solve problems involving tests of hypotheses on the population mean.

This unit deals with the t-test for the difference between two population means and with ANOVA.
Observe the figure below. When do we use the t-test, and when do we use ANOVA?
The figure is taken from https://www.iuj.ac.jp/faculty/kucc625/method/anova.html

We follow the same steps and decision rules in hypothesis testing.

Test of Hypothesis for the Difference between Two Population Means

A. T-test for Independent Samples:


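If σ1 and σ2 are unknown, regardless of the sample size (the pooled form below treats the two population variances as equal):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Where: $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$ ; with: df = n1 + n2 − 2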

Two-tailed test: 𝐻𝑜 : 𝜇1 = 𝜇2 or 𝜇1 − 𝜇2 = 0
𝐻𝑎 : 𝜇1 ≠ 𝜇2 or 𝜇1 − 𝜇2 ≠ 0
One- tailed test: 𝐻𝑜 : 𝜇1 = 𝜇2 or 𝜇1 − 𝜇2 = 0 or 𝐻𝑜 : 𝜇1 ≥ 𝜇2 or 𝜇1 ≤ 𝜇2
𝐻𝑎 : 𝜇1 < 𝜇2 or 𝜇1 > 𝜇2

➢ The Independent Samples t-test compares the means of two independent groups in
order to determine whether there is statistical evidence that the associated
population means are significantly different.

B. T-test for Dependent Samples (Paired Differences: Paired t-test / Related or Matched Samples)

Dependent samples arise when a single sample is measured under two separate conditions, or when the subjects are matched in pairs. To test a hypothesis concerning the mean difference for two dependent samples, use the formula below:

$$t = \frac{\bar{d}}{S_d / \sqrt{n}}$$ ; with df = n − 1

Where: $$S_d = \sqrt{\frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n-1)}}$$ ; $$\bar{d} = \frac{\sum d_i}{n}$$

d̄ = the mean of the differences between the n matched pairs of measures
Sd = the standard deviation of the differences
n = total number of matched pairs

𝑯𝒐 : 𝝁𝒅 = 𝟎 → 𝑯𝒂 : 𝝁𝒅 ≠ 𝟎 ; 𝑯𝒐 : 𝝁𝒅 ≥ 𝟎 → 𝑯𝒂 : 𝝁𝒅 < 𝟎

𝑯𝒐 : 𝝁𝒅 ≤ 𝟎 → 𝑯𝒂 : 𝝁𝒅 > 𝟎

Examples for the t-test: Independent Samples

Example 1:
A group of high school students was exposed to two different teaching methods, and
students’ ratings in the tests were obtained as follows:
Method A: 77, 85, 92, 87, 88, 84, 80, 82, 89, 79
Method B: 75, 93, 83, 90, 78, 83, 79, 91
Use 0.05 level of significance to test that there is no significant difference between the
mean ratings.

1 Let: μ1 = mean rating under Method A, μ2 = mean rating under Method B

Ho: μ1 = μ2 : There is no significant difference between the mean ratings. (claim)
Ha: μ1 ≠ μ2 : There is a significant difference between the mean ratings.
2 T-test for independent samples Type of test: two-tailed
3 CV = ±2.120 since df = 10 + 8 − 2 = 16 and α = 0.05
4 Method A: x̄1 = 84.3, s1² = 23.12, n1 = 10; Method B: x̄2 = 84, s2² = 44.29, n2 = 8
sp² = 32.38

$$t = \frac{84.3 - 84}{\sqrt{32.38\left(\frac{1}{10} + \frac{1}{8}\right)}} = 0.11$$

Remember, you can use your calculator in getting the mean and sample standard deviation taught in Modules 1 and 2.

5 Since |𝑡𝑐𝑜𝑚𝑝 | = 0.11 < |𝑡𝑐𝑟𝑖𝑡 | = 2.120, then Do Not Reject the 𝐻0

6 At 0.05 level of significance, there is no sufficient evidence to reject the claim.

Or: At 0.05 level of significance, the data do not show a significant difference between
the mean ratings.
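The manual computation in Example 1 can also be checked with a short script. The following is a minimal sketch using only the Python standard library; the function name `pooled_t` is an illustrative choice, not part of the module, and the critical value 2.120 is simply copied from the t-table for df = 16 at α = 0.05 (two-tailed).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t, df, pooled variance) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = variance(sample1)  # sample variance, divisor n - 1
    s2_sq = variance(sample2)
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, df, sp_sq


# Data from Example 1
method_a = [77, 85, 92, 87, 88, 84, 80, 82, 89, 79]
method_b = [75, 93, 83, 90, 78, 83, 79, 91]

t_comp, df, sp_sq = pooled_t(method_a, method_b)

# Assumption: critical value read from the t-table, not computed here.
t_crit = 2.120
reject_h0 = abs(t_comp) > t_crit  # two-tailed rule

print(f"sp^2 = {sp_sq:.2f}, df = {df}, t = {t_comp:.2f}, reject H0: {reject_h0}")
```

The script follows the same order as the manual solution: sample variances first, then the pooled variance, then t, and finally the comparison with the critical value.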

Using Excel to compute the value of t:

Step 1: Encode the data in Excel.
Step 2: Click on the “Data” tab.

Step 3: a) Click on “Data Analysis”
b) find “t-Test: Two-Sample Assuming Equal Variances”
c) click “OK”
Note: If there is no Data Analysis button under the Data tab, follow these steps to add it:
1. Click the File tab, click Options, and then click the Add-Ins category.
2. In the Manage box, select Excel Add-ins and then click Go.
3. In the Add-Ins box, check the Analysis ToolPak check box, and then click OK.
• If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to
locate it.
• If you are prompted that the Analysis ToolPak is not currently installed on
your computer, click Yes to install it.

Step 4: a) input the data of Method A in Variable 1 Range by highlighting cells A1 to A11
b) input the data of Method B in Variable 2 Range by highlighting cells B1 to B9
c) check the “Labels” box because the ranges include the labels
(Method A in A1 and Method B in B1)
d) make sure the “Alpha” is correct. Alpha 0.05 is the default, and you can change it
depending on the given level of significance.
e) click on the box beside Output Range, then click any cell (in the example below, I
clicked on cell D2).
f) click “OK”
After you click OK, the output will appear starting at cell D2. As you can see, the values are the same as
what we computed manually above.

Example 2:

Ten female and ten male soldiers participated in a shooting competition. Their scores were
recorded as follows:
Female 67 24 57 55 63 54 56 68 33 43
Male 70 38 58 58 56 67 56 75 42 38

Do the data indicate that the male soldiers are better in shooting competitions than the
female soldiers? Use 1% level of significance.

1 Let: μ1 = mean score of the female soldiers, μ2 = mean score of the male soldiers

Ho: μ1 = μ2 or Ho: μ1 ≥ μ2
Ha: μ1 < μ2 (claim)

2 T-test for independent samples Type of test: left-tailed

3 CV = −2.552 since df = 10 + 10 − 2 = 18 and α = 0.01
4 Female: x̄1 = 52, s1² = 209.11, n1 = 10; Male: x̄2 = 55.8, s2² = 169.96, n2 = 10
sp² = 189.53

$$t = \frac{52 - 55.8}{\sqrt{189.53\left(\frac{1}{10} + \frac{1}{10}\right)}} = -0.617$$

5 Since |t_comp| = 0.617 < |t_crit| = 2.552, then Do Not Reject the H0
6 At 0.01 level of significance, there is no sufficient evidence to show that male soldiers
are better in shooting competitions than the female soldiers.
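For a one-tailed test, the direction of the rejection region matters, not only the size of t. The sketch below recomputes Example 2 with the Python standard library and applies the left-tailed rule (reject H0 only when t falls below the negative critical value). The variable names are illustrative, and the critical value −2.552 is an assumption taken from a standard t-table for df = 18 at α = 0.01 (one-tailed).

```python
from math import sqrt
from statistics import mean, variance

# Data from Example 2
female = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
male = [70, 38, 58, 58, 56, 67, 56, 75, 42, 38]

n1, n2 = len(female), len(male)
df = n1 + n2 - 2

# Pooled variance from the two sample variances (divisor n - 1 each)
sp_sq = ((n1 - 1) * variance(female) + (n2 - 1) * variance(male)) / df

# mu1 = female, mu2 = male, so Ha: mu1 < mu2 puts the rejection region on the left
t_comp = (mean(female) - mean(male)) / sqrt(sp_sq * (1 / n1 + 1 / n2))

# Assumption: tabulated one-tailed critical value, not computed here.
t_crit = -2.552
reject_h0 = t_comp < t_crit  # left-tailed rule

print(f"sp^2 = {sp_sq:.2f}, t = {t_comp:.3f}, reject H0: {reject_h0}")
```

Because t is negative but well above the critical value, it does not fall in the rejection region, which agrees with the manual decision.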

Excel Output:

➢ As you can see, the values are the same even when you compute it manually.

Examples for the t-test: Dependent Samples

Example 3

Charlie and Kate administered their researcher-made test to measure male high school
students' mathematical ability on some solid figures. They randomly took 10 male pupils of
San Louis laboratory High School. Their scores in each solid figure are shown below:
Students 1 2 3 4 5 6 7 8 9 10
Cube 9 7 6 6 4 3 5 4 5 8
Cylinder 5 6 8 6 5 6 5 7 3 5
Is there a difference in the students' mathematical ability (mean scores) on the above-mentioned solid figures? Assume that the test scores are approximately normally distributed.
Use 0.01 level of significance.

Solution: First, get the difference of each pair of scores (d) and the mean difference.
Students 1 2 3 4 5 6 7 8 9 10
Cube 9 7 6 6 4 3 5 4 5 8
Cylinder 5 6 8 6 5 6 5 7 3 5
d 4 1 -2 0 -1 -3 0 -3 2 3
d² 16 1 4 0 1 9 0 9 4 9
∑d = 1, d̄ = 0.1, ∑d² = 53
1 𝐻𝑜 : 𝜇𝑑 = 0
𝐻𝑎 : 𝜇𝑑 ≠ 0 claim

2 T-test for dependent samples Type of tailed: two-tailed


3 CV= ±3.250 since 𝑑𝑓 = 10 − 1 = 9 𝛼 = 0.01
4
$$S_d = \sqrt{\frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n-1)}} = \sqrt{\frac{10(53) - (1)^2}{10(10-1)}} = 2.4244$$

$$t = \frac{\bar{d}}{S_d / \sqrt{n}} = \frac{0.1}{2.4244 / \sqrt{10}} = 0.1304$$

5 Since |t_comp| = 0.1304 < |t_crit| = 3.250, then Do Not Reject the H0

6 At 0.01 level of significance, there is no sufficient evidence to show that there is a
significant difference in the mathematical ability on solid figures of the students.
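The paired computation in Example 3 can be reproduced step by step. This is a minimal sketch in plain Python that follows the computational formula for Sd used above; the variable names are illustrative, and the critical value 3.250 is copied from the t-table for df = 9 at α = 0.01 (two-tailed).

```python
from math import sqrt

# Data from Example 3
cube = [9, 7, 6, 6, 4, 3, 5, 4, 5, 8]
cylinder = [5, 6, 8, 6, 5, 6, 5, 7, 3, 5]

# Difference for each student: Cube score minus Cylinder score
d = [c - y for c, y in zip(cube, cylinder)]
n = len(d)

sum_d = sum(d)                       # sum of d
sum_d_sq = sum(v * v for v in d)     # sum of d squared
d_bar = sum_d / n                    # mean difference

# Standard deviation of the differences (computational formula)
s_d = sqrt((n * sum_d_sq - sum_d ** 2) / (n * (n - 1)))

t_comp = d_bar / (s_d / sqrt(n))

# Assumption: critical value read from the t-table, not computed here.
t_crit = 3.250
reject_h0 = abs(t_comp) > t_crit     # two-tailed rule

print(f"sum d = {sum_d}, sum d^2 = {sum_d_sq}, Sd = {s_d:.4f}, t = {t_comp:.4f}")
print("reject H0:", reject_h0)
```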

Excel Presentation:

The process is the same as for the t-test for independent samples. The difference is that you
will choose “t-Test: Paired Two Sample for Means”.
The result should look like the photo below:

To do: Compare the Excel output with the result we computed manually.

Example 4
Twenty college freshmen were divided into 10 pairs, each member of the pair having
approximately the same IQ. One of each pair was selected at random and assigned to a Math
section using programmed materials only. The other member of each pair was assigned to a
section in which the professor lectured. At the end of the semester, each group was given
the same examination and the results were recorded. Assuming that the populations are
normally distributed, test at 0.01 significance level that there is no difference in the two
learning procedures.

Pair Programmed Materials Lecture d d²
1 75 80
2 89 84
3 60 52
4 75 77
5 68 80
6 90 85
7 79 85
8 72 71
9 83 91
10 78 80
Σ

1 Ho: μd = 0 There is no significant difference in the two learning procedures. (Claim)
Ha: μd ≠ 0 The two learning procedures differ significantly.

2 T-test for dependent samples Type of tailed: two-tailed


3 CV= ±3.250 since 𝑑𝑓 = 10 − 1 = 9 𝛼 = 0.01
4 Make sure you have solved for ∑d and ∑d².

$$S_d = \sqrt{\frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n-1)}} = \sqrt{\frac{10(392) - (-16)^2}{10(10-1)}} = 6.3805$$

$$t = \frac{\bar{d}}{S_d / \sqrt{n}} = \frac{-1.6}{6.3805 / \sqrt{10}} = -0.793$$

5 Since |t_comp| = 0.793 < |t_crit| = 3.250, then We Fail to Reject the H0

6 At 0.01 level of significance, there is no sufficient evidence to show that there is a
significant difference in the two learning procedures.
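The d and d² columns of the Example 4 table can be filled in with a few lines of code. The sketch below, using only the Python standard library, prints those columns (taking d as Programmed Materials minus Lecture, which matches ∑d = −16 in the solution) and also checks that the computational formula for Sd gives the same value as the ordinary sample standard deviation of the differences. All variable names are illustrative.

```python
from math import isclose, sqrt
from statistics import mean, stdev

# Data from Example 4
programmed = [75, 89, 60, 75, 68, 90, 79, 72, 83, 78]
lecture = [80, 84, 52, 77, 80, 85, 85, 71, 91, 80]

# d = Programmed Materials minus Lecture, and its square, for each pair
d = [p - q for p, q in zip(programmed, lecture)]
d_sq = [v ** 2 for v in d]

print("Pair    d   d^2")
for pair, (di, di_sq) in enumerate(zip(d, d_sq), start=1):
    print(f"{pair:>4} {di:>4} {di_sq:>5}")
print(f" Sum {sum(d):>4} {sum(d_sq):>5}")

n = len(d)

# Sd from the computational formula and from the sample standard deviation
s_d_formula = sqrt((n * sum(d_sq) - sum(d) ** 2) / (n * (n - 1)))
s_d_direct = stdev(d)
same_value = isclose(s_d_formula, s_d_direct)

t_comp = mean(d) / (s_d_direct / sqrt(n))
print(f"Sd = {s_d_formula:.4f}, t = {t_comp:.3f}, formulas agree: {same_value}")
```

The agreement between the two ways of getting Sd is why a calculator's sample standard deviation key can be applied directly to the list of differences.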

From excel output:


For the next topic, we will be dealing with three or more population means. There are some
more ANOVA topics, but we will focus on ANOVA: Single Factor (One-way ANOVA).

Test of Hypothesis for the Difference between Two or More Population Means

ANALYSIS OF VARIANCE (ANOVA)- One Way

Analysis of Variance is used to test hypotheses about three or more population means
rather than population variances. The F-test, named after R.A. Fisher, is used to test the
significance of the differences among the population means.
- F-test: the key statistic in ANOVA

One-way ANOVA: this design is called a completely randomized design, here with equal
sample sizes. The term “one-way” refers to the condition that only one factor (attribute) is
being studied in the experiment. (This is also called single-factor ANOVA.)

Assumptions underlying the use of the ANOVA


1. The individuals in the various subgroups should be selected on the basis of
random sampling from normally distributed populations.
2. The variances of the subgroups should be homogeneous
(σ1² = σ2² = σ3² = ⋯ = σk²)
3. The samples that constitute the groups should be independent.

The purpose of ANOVA, as the term implies, is to establish the variations (or sources of
differences) between groups and within groups. In comparing the groups, there are three
possible sources of variation, these are:
1. Variation between groups (column means or treatments).
2. Variation within groups (experimental error).
3. Total variation among the values of all groups.

➢ The populations being studied are referred to as the treatments. The term
treatment is used generally to refer to various classifications such as different
schools, fertilizers, varieties of plants, soil types, age groups, methods,
machines, courses, or different locations. The variation among samples
involves the differences among the treatment means, while variation within
samples is due to random error.
𝐻𝑜 : 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4 = 𝜇5
𝐻𝑎 : 𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 𝑡𝑤𝑜 𝑜𝑓 𝑡ℎ𝑒 𝑚𝑒𝑎𝑛𝑠 𝑎𝑟𝑒 𝑛𝑜𝑡 𝑒𝑞𝑢𝑎𝑙

➢ Basically, the steps are the same as what we have learned above/ from the
previous lessons.
➢ We will not focus on the manual computation of the F-test. What we will do
is the computation in Excel. However, I will show you one example of manual
computation; the rest will be in Excel.

Note: You can see the F-distribution table (Critical Values) at the end of this
Module (Appendix).

Example 1. The following are the IQ’s of a random sample of students from 3 large schools.
At 5% level of significance, test if there is a significant difference among the three groups.
School 1 School 2 School 3
101 93 104
107 106 96
106 95 103
98 96 108
115 100 93

1 Ho: μ1 = μ2 = μ3 There is no significant difference among the population means of
the three groups.
Ha: At least two of the population means differ. (Claim)

2 F-test/ ANOVA Single Factor


3 CV= 3.885 𝛼 = 0.05

Degrees of freedom (df):


𝐷𝑓𝑏 = 𝑘 − 1 = 3 − 1 = 2
𝐷𝑓𝑤 = 𝑁 − 𝑘 = 15 − 3 = 12

Where: k=number of groups


N=total sample size

Note: please see the appendix for the 𝑓𝑐𝑟𝑖𝑡 table


4 ∑x_T = sum of all the scores across all the groups
∑x_T² = sum of the squared scores of all the groups
N = total sample size
n = sample size of each group
Steps:
1.) SST (total sum of squares)
$$SS_T = \sum x_T^2 - \frac{\left(\sum x_T\right)^2}{N}$$
$$SS_T = 154795 - \frac{(1521)^2}{15} = 565.6$$

2.) SSB (between sum of squares)
$$SS_b = \left[\frac{\left(\sum x_1\right)^2}{n_1} + \frac{\left(\sum x_2\right)^2}{n_2} + \frac{\left(\sum x_3\right)^2}{n_3} + \cdots + \frac{\left(\sum x_k\right)^2}{n_k}\right] - \frac{\left(\sum x_T\right)^2}{N}$$
$$SS_b = \left[\frac{(527)^2}{5} + \frac{(490)^2}{5} + \frac{(504)^2}{5}\right] - \frac{(1521)^2}{15} = 139.6$$

3.) SSW (within sum of squares)
$$SS_w = SS_T - SS_b = 565.6 - 139.6 = 426$$

4.) Mean Squares (MS)
$$MS_b = \frac{SS_b}{df_b} = \frac{139.6}{2} = 69.8$$
$$MS_w = \frac{SS_w}{df_w} = \frac{426}{12} = 35.5$$

5.) Solve for F
$$F = \frac{MS_b}{MS_w} = \frac{69.8}{35.5} = 1.97$$

5 Since |𝐹𝑐𝑜𝑚𝑝 | = 1.97 < |𝐹𝑐𝑟𝑖𝑡 | = 3.89, then We Fail to Reject the 𝐻0
6 At 0.05 level of significance, there is not enough evidence to show that there is a
significant difference among the three groups.
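The five manual steps for the school IQ example translate directly into code. The following is a minimal sketch in plain Python that uses the same raw-score formulas (SST, SSB, SSW, the mean squares, then F); the function name `one_way_anova` and the dictionary keys are illustrative choices, and the critical value 3.885 is copied from the F-table for df = (2, 12) at α = 0.05.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries using the raw-score formulas."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)            # N
    k = len(groups)                      # number of groups

    correction = sum(all_values) ** 2 / n_total   # (sum of x_T)^2 / N
    ss_total = sum(x * x for x in all_values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "SST": ss_total, "SSB": ss_between, "SSW": ss_within,
        "df_b": df_between, "df_w": df_within,
        "MSB": ms_between, "MSW": ms_within,
        "F": ms_between / ms_within,
    }


# Data from Example 1 (IQs from three schools)
schools = [
    [101, 107, 106, 98, 115],
    [93, 106, 95, 96, 100],
    [104, 96, 103, 108, 93],
]

table = one_way_anova(schools)

# Assumption: critical value read from the F-table, not computed here.
f_crit = 3.885
reject_h0 = table["F"] > f_crit

for name, value in table.items():
    print(f"{name}: {value:.2f}")
print("reject H0:", reject_h0)
```

The same function accepts groups of unequal size, since each group total is divided by its own group size.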

Excel Process:
The excel output should be like the picture below:

Where: SS= sum of squares


df= degrees of freedom
MS= mean square
Example 2
The table below gives the output for 6 years of an experimental farm that used each of 4
fertilizers. Assume that the outputs with each fertilizer are normally distributed with equal
variance. Test the hypothesis that the population means are the same at the 5% level of
significance.

Yield (cavans)
Year Fertilizer 1 Fertilizer 2 Fertilizer 3 Fertilizer 4
1 49 46 55 50
2 57 51 61 58
3 56 58 52 57
4 52 61 60 65
5 47 50 48 61
6 59 48 57 53

Solution:
1 Ho: μ1 = μ2 = μ3 = μ4 There is no significant difference among the population means.
(claim)

Ha: At least two of the population means differ.

2 F-test/ ANOVA Single Factor


3 CV= 3.098 𝛼 = 0.05

Degrees of freedom (df):


𝐷𝑓𝑏 = 𝑘 − 1 = 4 − 1 = 3
𝐷𝑓𝑤 = 𝑁 − 𝑘 = 24 − 4 = 20

Where: k=number of groups


N=total sample size
4 Excel Output
Wherein the 𝑓𝑐𝑜𝑚𝑝 = 1.086

5 Since |𝐹𝑐𝑜𝑚𝑝 | = 1.086 < |𝐹𝑐𝑟𝑖𝑡 | = 3.098, then We Fail to Reject the 𝐻0

6 At 0.05 level of significance, there is not enough evidence to show that there is a
significant difference among the four fertilizers.
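The F value reported from Excel for the fertilizer data can be cross-checked without Excel. The sketch below uses the Python standard library and the deviation (definition-based) form of the sums of squares, with between-group variation measured from the grand mean and within-group variation measured from each group mean; this is a different route from the raw-score formulas of Example 1 but yields the same F. The names are illustrative, and the critical value 3.098 is copied from the F-table for df = (3, 20) at α = 0.05.

```python
from statistics import mean

# Data from Example 2 (yield in cavans over 6 years)
yields = {
    "Fertilizer 1": [49, 57, 56, 52, 47, 59],
    "Fertilizer 2": [46, 51, 58, 61, 50, 48],
    "Fertilizer 3": [55, 61, 52, 60, 48, 57],
    "Fertilizer 4": [50, 58, 57, 65, 61, 53],
}

all_values = [x for group in yields.values() for x in group]
grand_mean = mean(all_values)
k = len(yields)           # number of treatments
n_total = len(all_values) # N

# Between groups: squared distance of each group mean from the grand mean,
# weighted by the group size
ss_between = sum(
    len(group) * (mean(group) - grand_mean) ** 2 for group in yields.values()
)

# Within groups: squared distance of each value from its own group mean
ss_within = sum(
    (x - mean(group)) ** 2 for group in yields.values() for x in group
)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_comp = ms_between / ms_within

# Assumption: critical value read from the F-table, not computed here.
f_crit = 3.098
reject_h0 = f_comp > f_crit

print(f"SSB = {ss_between:.3f}, SSW = {ss_within:.3f}, F = {f_comp:.3f}")
print("reject H0:", reject_h0)
```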

Now it’s your turn to answer some exercises. Refer to the end of Module 5 for the answer
key.

1. What is the difference between t-test and ANOVA, if any?


2. In an analysis of variance, what do treatments represent?
3. What kind of t-test (Independent/ Dependent) will be applied in the following
situations?
a. Sample 1: Resting heart rates of 35 individuals before drinking coffee.
Sample 2: Resting heart rates of the same individuals after drinking coffee.
_______________________
b. Sample 1: Test scores for 35 statistics students.
Sample 2: Test scores for 42 biology students who do not study statistics.
_______________________
For number 4: Conduct Hypothesis Testing
4. Ten subjects were chosen for an experiment. They were asked to perform a certain
physical activity. The number of heart beats per minute, before and after the
experiment are recorded as follows:
Subject No. 1 2 3 4 5 6 7 8 9 10
Before 60 67 72 71 68 72 71 69 75 68
After 92 79 72 80 72 76 73 81 80 76
Is there sufficient evidence to indicate that the experimental condition increases the
number of heart beats per minute? Test at ɑ= 0.05.

5. Complete the one-way (single factor) ANOVA table and answer the questions below
the table.
Source of variation       df    Sum of squares    Mean squares    Computed F
Between samples           2                                       F_comp =
Within samples (error)          57.87
Total                     11    285.00
a. How many treatments are there? ________________________
b. What is the sample size? _______________________
c. What is the critical value if 𝛼 = 0.01? _________________
d. State the null and the alternative hypothesis.
𝐻𝑜 :_______________________________________________________________________
𝐻𝑎 :_______________________________________________________________________

e. What is your conclusion regarding the null hypothesis?

As we all know, you are done with your Research 1 subject last semester and
currently you are taking another research subject this second semester. As part of
your evaluation, write down your research title (last semester/ this semester) and
your specific research questions. Identify what statistical tool you need to answer
your research questions.
