STA 342-TESTING HYPOTHESIS-6-The Chi-Square Test
http://ecampus.mmust.ac.ke
TOPIC 6: THE CHI-SQUARE ($\chi^2$) TEST
Introduction
In this topic, we will perform the Chi-Square ($\chi^2$) test, which is applied to both parametric and
non-parametric distributions.
Objectives
1. State and test a hypothesis on the population variance when the mean is given and when the mean is not
given ($\chi^2$ test).
Learning Activities
Activity 1:
Students to take note of the exercises provided within the text and at the end of the topic
Topic Resources
Students to take note of the reference text books provided in the course outline
INTRODUCTION
So far, we have been testing hypotheses on population parameters, such as the population mean $\mu$.
These tests are referred to as Parametric tests as they are centered around population parameters.
However, not all real-life situations require parametric testing, since some populations do not have
well-defined distributions (distribution-free populations); hence non-parametric tests become useful.
Some common non-parametric tests include the Mann-Whitney U test, the Kruskal-Wallis test, the Wilcoxon
signed-rank test, the Kolmogorov-Smirnov test, the Chi-Square test, etc.
In this topic, we shall limit ourselves to the Chi-Square test because it is unique, as it applies to both
parametric and non-parametric situations.
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables from a normal distribution
with mean $\mu$ and variance $\sigma^2$, i.e. $X_i \sim N(\mu, \sigma^2)$. To test hypotheses on the variance, we need to review the following.
Recall that:
a) Knowledge of population variability is important, e.g.
i) An educator will want to choose a teaching strategy/method with low variability in the
students' achievement level,
ii) Tyres with low variability in wear may be preferred to more durable tyres with greater
variability in useful lifetime, simply because it’s cheaper to replace an entire set of tyres
periodically for each car,
c) If the population variance $\sigma^2$ is unknown, Student's t-distribution is used to make inferences about the
population mean. Thus, a Chi-square distribution assigns a role to $S^2$ that is similar to the role
Student's t-distribution assigns to $\bar{x}$.
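From
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right),$$
we have
$$(n-1)S^2 = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2.$$
Dividing by $\sigma^2$ on both sides, then:
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{\sigma^2} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{\sigma^2} \sim \chi^2_{n-1} \quad \ldots\ldots (*)$$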
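f) If $\mu$ is known:
From
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2 = \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\right),$$
we have
$$n s^2 = \sum_{i=1}^{n}\left(x_i - \mu\right)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2.$$
Dividing by $\sigma^2$ on both sides, then:
$$\frac{n s^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(x_i - \mu\right)^2}{\sigma^2} = \frac{\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2}{\sigma^2} \sim \chi^2_{n} \quad \ldots\ldots (**)$$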
Note:
i) Take note of the use of the capital letter $S^2$ and the small letter $s^2$ (to distinguish between the two
equations * and ** above)
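The two sums of squares behind equations * and ** can be checked numerically. The sketch below is illustrative only: the sample `x` and the known mean `mu` are hypothetical values, not data from this topic. It compares each direct sum of squared deviations with its shortcut form, $\sum x_i^2 - n\bar{x}^2$ for $(n-1)S^2$ and $\sum x_i^2 - 2\mu\sum x_i + n\mu^2$ for $ns^2$.

```python
# Hypothetical sample and assumed known mean (illustrative values only).
x = [4.1, 5.3, 3.8, 6.0, 5.2]
mu = 5.0

n = len(x)
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
x_bar = sum_x / n

# (n - 1) S^2: deviations taken about the sample mean (mu unknown).
ss_about_xbar = sum((v - x_bar) ** 2 for v in x)
ss_about_xbar_shortcut = sum_x2 - n * x_bar ** 2

# n s^2: deviations taken about the known mean mu.
ss_about_mu = sum((v - mu) ** 2 for v in x)
ss_about_mu_shortcut = sum_x2 - 2 * mu * sum_x + n * mu ** 2

print("(n-1)S^2:", ss_about_xbar, ss_about_xbar_shortcut)
print("n s^2   :", ss_about_mu, ss_about_mu_shortcut)
```

Each pair of printed values should agree up to floating-point rounding. Dividing either sum of squares by a hypothesised variance gives the corresponding test statistic.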
Activity 4.3:
Repeat Activity 4.2 for $n = 16$ and $\alpha = 0.01$.
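a) To test $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 > \sigma_0^2$ or
$H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 < \sigma_0^2$ or
$H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 \neq \sigma_0^2$, where $\sigma_0^2$ is some specified variance.
b) Under $H_0$, the test statistic is:
i) $\chi^2 = \dfrac{n s^2}{\sigma_0^2} \sim \chi^2_{n}$, where $\chi^2_{\alpha,\,n}$ establishes the critical value (if $\mu$ is known)
ii) $\chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$, where $\chi^2_{\alpha,\,n-1}$ establishes the critical value (if $\mu$ is unknown)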
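ii) $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 < \sigma_0^2$; critical value $\chi^2_{1-\alpha,\,n}$ (lower-tailed test)
iii) $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 \neq \sigma_0^2$; critical values $C_1 = \chi^2_{1-\alpha/2,\,n}$ and $C_2 = \chi^2_{\alpha/2,\,n}$ (two-tailed test)
where $n s^2 = \sum_{i=1}^{n}\left(x_i - \mu\right)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2$, and $\chi^2_{\alpha,\,n}$, $\chi^2_{1-\alpha,\,n}$,
$\chi^2_{\alpha/2,\,n}$ and $\chi^2_{1-\alpha/2,\,n}$ are tabulated in the chi-square ($\chi^2$) tables.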
N/B: The decision rule is such that we reject $H_0$ iff the calculated $\chi^2$ value is greater than the tabulated
$\chi^2$-critical value for an upper-tailed test, and iff the calculated $\chi^2$ value is less than the tabulated
$\chi^2$-critical value for a lower-tailed test.
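The decision rule can be written as a small helper. This is a minimal sketch, not part of the course material: the function name `reject_h0` and its parameters are hypothetical, and the two-tailed branch assumes the usual convention of rejecting $H_0$ when the calculated value falls below the lower critical value $C_1$ or above the upper critical value $C_2$. Critical values still have to be read from the chi-square tables; the numbers in the usage lines are made up for illustration.

```python
def reject_h0(chi2_calc, tail, lower_crit=None, upper_crit=None):
    """Return True if H0 is rejected for the given tail of the test.

    tail is "upper", "lower" or "two"; critical values come from tables.
    """
    if tail == "upper":
        return chi2_calc > upper_crit
    if tail == "lower":
        return chi2_calc < lower_crit
    if tail == "two":
        # Assumed convention: reject outside the interval [C1, C2].
        return chi2_calc < lower_crit or chi2_calc > upper_crit
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


# Illustrative (made-up) numbers, not taken from any example in the text.
print(reject_h0(12.0, "upper", upper_crit=9.0))               # True
print(reject_h0(3.0, "lower", lower_crit=2.0))                # False
print(reject_h0(5.0, "two", lower_crit=1.5, upper_crit=9.0))  # False
```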
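ii) $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 < \sigma_0^2$; critical value $\chi^2_{1-\alpha,\,n-1}$ (lower-tailed test)
iii) $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 \neq \sigma_0^2$; critical values $C_1 = \chi^2_{1-\alpha/2,\,n-1}$ and $C_2 = \chi^2_{\alpha/2,\,n-1}$ (two-tailed test)
The test statistic is
$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2},$$
which is compared to $\chi^2_{\alpha,\,n-1}$ or $\chi^2_{1-\alpha,\,n-1}$ (for a one-tailed test) or to $\chi^2_{\alpha/2,\,n-1}$ and
$\chi^2_{1-\alpha/2,\,n-1}$ (for a two-tailed test),
where $(n-1)S^2 = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$, and $\chi^2_{\alpha,\,n-1}$, $\chi^2_{1-\alpha,\,n-1}$,
$\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are tabulated in the chi-square ($\chi^2$) tables.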
Example 6.1:
The weight of a new package of juice is known to have a mean of 40 grams. A sample of $n = 7$ packages
yielded the weights 37, 38, 38, 39, 40, 41 and 42 grams respectively. Assuming the weights come from a
normal population, test the hypothesis $H_0: \sigma = 4$ against $H_1: \sigma < 4$, where $\sigma$ is the standard deviation.
Solution:
$H_0: \sigma^2 = 16$ against $H_1: \sigma^2 < 16$.
Data summary: $n = 7$; $\mu = 40$; $\sum_{i=1}^{n} x_i = 275$; $\sum_{i=1}^{n} x_i^2 = 10823$; $\bar{x} = \frac{275}{7} = 39.28571429$;
$$n s^2 = \sum_{i=1}^{n}\left(x_i - \mu\right)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2 = 10823 - 2(40)(275) + 7\left(40^2\right) = 23$$
Therefore,
$$\chi^2 = \frac{n s^2}{\sigma_0^2} = \frac{23}{16} = 1.4375$$
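The arithmetic of Example 6.1 can be reproduced with a short script. This is a sketch that only recomputes the data summary and the test statistic for the known-mean case; the variable names are illustrative, and the critical value must still be looked up in the chi-square tables.

```python
# Data from Example 6.1: package weights in grams, known mean mu = 40.
weights = [37, 38, 38, 39, 40, 41, 42]
mu = 40
sigma0_sq = 16  # hypothesised variance (sigma_0 = 4)

n = len(weights)
sum_x = sum(weights)
sum_x2 = sum(w * w for w in weights)

# n s^2 = sum of squared deviations about the known mean.
ns2 = sum_x2 - 2 * mu * sum_x + n * mu ** 2

# Test statistic for the case where the mean is known.
chi2_calc = ns2 / sigma0_sq

print(n, sum_x, sum_x2)  # 7 275 10823
print(ns2, chi2_calc)    # 23 1.4375
```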
Example 6.2:
A particular postmaster has determined that the current procedure of separate lines yields a standard
deviation in waiting times on December mornings of $\sigma_0 = 10$ minutes per customer. She wishes to
implement the single-line policy on a trial basis to see if a reduction in waiting-time variability is achieved.
A sample of 30 customers was monitored and their waiting times were determined, which gave a sample
standard deviation of $s = 5$ minutes. Test at the 1% ($\alpha = 0.01$) level.
Solution:
$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(30-1)\,5^2}{10^2} = 7.25$$
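As a quick check of the calculation in Example 6.2, the statistic for the unknown-mean case can be computed directly from the sample size and the two standard deviations. This sketch uses illustrative variable names and does not look up the critical value.

```python
# Values from Example 6.2.
n = 30         # customers monitored
s = 5          # sample standard deviation (minutes)
sigma0 = 10    # hypothesised standard deviation (minutes)

# Test statistic when the population mean is unknown: (n - 1) S^2 / sigma_0^2.
chi2_calc = (n - 1) * s ** 2 / sigma0 ** 2
df = n - 1

print(chi2_calc, df)  # 7.25 29
```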
Exercise 6.1:
Repeat Example 6.1 while assuming that the population mean $\mu = 40$ was unknown.
$H_0: \sigma^2 = 16$ against $H_1: \sigma^2 < 16$.
$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} = \; ???$$
Exercise 6.3:
Note: Alternatively, we can use the p-value approach.
Exercise 6.4:
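a) Refer to Example 6.1 and use the p-value approach to test $H_0: \sigma^2 = 16$ against $H_1: \sigma^2 < 16$ at
$\alpha = 0.05$, where the mean is known
Hint: $p\text{-value} = P\left(\chi^2 < \chi^2_{1-\alpha,\,n}\right)$
b) Refer to Example 6.1 and use the p-value approach to test $H_0: \sigma^2 = 16$ against $H_1: \sigma^2 < 16$ at
$\alpha = 0.01$, where the mean is unknown
Hint: $p\text{-value} = P\left(\chi^2 < \chi^2_{1-\alpha,\,n-1}\right)$
c) Refer to Example 6.2 and use the p-value approach to test $H_0: \sigma^2 = 100$ versus $H_1: \sigma^2 < 100$ at
the 1% ($\alpha = 0.01$) level, where the mean is unknown
Hint: $p\text{-value} = P\left(\chi^2 < \chi^2_{1-\alpha,\,n-1}\right)$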
Exercise 6.5:
Refer to Example 6.2 and use it to test $H_0: \sigma^2 = 100$ versus $H_1: \sigma^2 \neq 100$ (i.e. a two-tailed test).
Exercise 6.6:
A manufacturer of car batteries guarantees that his batteries will last, on the average, 3 years with a standard
deviation of 1 year. If 5 of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5 and 4.2 years, is the
manufacturer still convinced of his standard deviation of 1 year? Test at (i) $\alpha = 0.01$ and (ii) $\alpha = 0.05$.
REMARK 1:
or $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$
REMARK 2:
A Chi-square distribution $\chi^2_{\alpha,\,v}$ table gives values up to $n = 30$. For large $n$ (i.e. $n > 30$), we use the
normal approximation to the Chi-square distribution, with:
2 v n 1
2
Exercise 6.7:
For each of the following hypotheses testing situations, state whether we reject or fail to reject H 0
Exercise 6.8:
For each of the following hypotheses testing situations, state whether we reject or fail to reject H 0
Example 6.3
A study was carried out to investigate whether there is a relationship between students’ performance in an
end of stage examination and the category of school attended. The data collected is summarized in the table
below:
SCHOOL CATEGORY          GRADES               TOTAL
                     A      B      C
X                    18     12     20
Y                    26     12     32
TOTAL
Solution:
Note that this is a $2 \times 3$ contingency table, i.e. 2 rows and 3 columns.
We wish to test
$H_0$: grades are independent of school versus $H_1$: grades are NOT independent of school
Steps:
1. Find the column totals, row totals and overall total.
SCHOOL CATEGORY          GRADES               TOTAL
                     A      B      C
X                    18     12     20         50
Y                    26     12     32         70
TOTAL                44     24     52         120
2. Find the expected frequencies ($E_{ij}$) by taking $E_{ij} = \dfrac{\text{Row total} \times \text{Column total}}{\text{Overall total}}$, e.g.
i. the expected frequency for observation 18 is $E_{11} = \dfrac{\text{Row 1 total} \times \text{Column 1 total}}{\text{Overall total}} = \dfrac{50 \times 44}{120} = 18.33333$
ii. the expected frequency for observation 12 is $E_{12} = \dfrac{\text{Row 1 total} \times \text{Column 2 total}}{\text{Overall total}} = \dfrac{50 \times 24}{120} = 10$
Compute the test statistic $\chi^2 = \sum_{i} \dfrac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed frequency / experimental value and $E_i$ is the expected
frequency / theoretical value.
          $O_i$     $E_i$        $(O_i - E_i)^2 / E_i$
X         18      18.33333     0.006060606
          12      10           0.4
          20      21.6666      0.1282051282
Y         26      25.6666      0.004329004329
          12      14           0.2857142857
          32      30.3333      0.09157509158
TOTAL                          0.9158841158
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} = 0.9158841158$$
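5. Find the number of degrees of freedom, i.e. (total number of rows minus one) $\times$ (total number of
columns minus one).
This implies that $(h-1)(k-1) = (2-1)(3-1) = 1 \times 2 = 2$
6. Find the tabulated chi-square value.
$\chi^2_{\alpha,\,v} = \chi^2_{0.05,\,2} = 5.99$
Since the calculated value $0.9159$ is less than the tabulated value $5.99$, we fail to reject $H_0$.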
The Decision: There is NO statistical evidence to suggest a connection between the school and the
grades
Or
Performance and school are independent.
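The steps of Example 6.3 can be sketched in a few lines of code. The script below is an illustration with hypothetical variable names: it computes the totals, the expected frequencies, the test statistic and the degrees of freedom from the observed table, and compares the result with the tabulated value 5.99 quoted in the solution rather than computing a critical value itself.

```python
# Observed frequencies from Example 6.3 (rows: schools X, Y; columns: grades A, B, C).
observed = [
    [18, 12, 20],
    [26, 12, 32],
]

# Step 1: row totals, column totals and overall total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Step 2: expected frequency = row total * column total / overall total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over all cells.
chi2_calc = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical = 5.99  # tabulated chi-square value for alpha = 0.05, df = 2 (from the text)
reject = chi2_calc > critical

print(row_totals, col_totals, grand_total)
print(round(chi2_calc, 6), df, reject)
```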
Exercise 6.9
A sample of 200 people with a particular disease was selected. Out of these, 100 were given a drug and the
others were not given any drug. The results are as follows:
                 Number of people
STATUS           Drug     No Drug
Cured            65       55
Not cured        35       45
Exercise 6.10
A certain drug is said to be effective in curing colds. In an experiment on 500 people, half were given the
drug and half were given sugar pills. The patients' reactions to the treatment are recorded in the table below.
On the basis of this information, can it be concluded that there is a significant difference in the effect of the
drug and the sugar pills? Test at $\alpha = 5\%$.
Here we will consider the goodness of fit test for a normal distribution.
Example 6.4
The height in cm gained by a plant in its first year of planting is denoted by a random variable X. The value
of X is measured for a random sample of 86 plants and the results obtained are summarized in the table
below:
a) Assuming $X$ is modelled by a $N(50, 15^2)$ distribution, calculate the expected frequencies for each of the five
classes.
b) Carry out a goodness of fit test at $\alpha = 5\%$ to test the hypothesis that $X$ can be modelled as in (a)
above.
Solution:
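Probabilities and expected frequencies ($E = \text{Probability} \times 86$):
$P(X < 35) = P\left(Z < \frac{35 - 50}{15}\right) = P(Z < -1) = 1 - 0.8413 = 0.1587$, so $E = 13.7$
$P(35 < X < 45) = P(-1 < Z < -0.333) = 0.8413 - 0.6304 = 0.2109$, so $E = 18.1$
$\sum E = 86$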
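The two class probabilities shown above can be cross-checked with Python's standard library instead of normal tables. This is a sketch under the model $N(50, 15^2)$ stated in part (a); the variable names are illustrative, and small differences from the tabulated figures are expected because the tables round $Z$ and the probabilities.

```python
from statistics import NormalDist

# Model from part (a): X ~ N(50, 15^2); 86 plants in the sample.
model = NormalDist(mu=50, sigma=15)
n_plants = 86

# Class probabilities from the normal cumulative distribution function.
p_below_35 = model.cdf(35)
p_35_to_45 = model.cdf(45) - model.cdf(35)

# Expected frequency = probability * sample size.
e_below_35 = n_plants * p_below_35
e_35_to_45 = n_plants * p_35_to_45

print(round(p_below_35, 4), round(e_below_35, 1))
print(round(p_35_to_45, 4), round(e_35_to_45, 1))
```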
b) We wish to test
$H_0: X \sim N(50, 15^2)$ versus
$H_1$: $X$ is not distributed this way
$O$      $E$       $(O - E)^2 / E$
10     13.7     0.999…
18     18.1     0.0005…
28     22.4     1.4…
18     18.1     0.0005…
12     13.7     0.210…
$$\sum \frac{(O - E)^2}{E} = 2.611$$
$$\chi^2_{0.05}(4) = 9.488$$
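The goodness of fit statistic can be recomputed from the observed and expected frequencies in the table. The sketch below uses illustrative variable names and takes the tabulated value 9.488 from the text. It assumes the degrees of freedom equal the number of classes minus one, which matches the 4 degrees of freedom used above because no parameter is estimated from the data.

```python
# Observed and expected frequencies for the five classes in Example 6.4.
observed = [10, 18, 28, 18, 12]
expected = [13.7, 18.1, 22.4, 18.1, 13.7]

# Contribution of each class: (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_calc = sum(contributions)

# Assumption: df = number of classes - 1 (mean and variance are specified, not estimated).
df = len(observed) - 1

critical = 9.488  # tabulated chi-square value at 5% with 4 degrees of freedom (from the text)
reject = chi2_calc > critical

print([round(c, 4) for c in contributions])
print(round(chi2_calc, 3), df, reject)
```

Since the calculated value is below the tabulated value, this comparison does not reject $H_0$ at the 5% level.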
Exercise 6.11
A weaving mill sells lengths of cloth with a nominal length of 70m. A customer measured 100 lengths and
obtained the following frequency distribution.
Use a $\chi^2$ test at $\alpha = 5\%$ to show that a normal distribution is not an adequate model for this data.