Chi Square (KI Square) Test
Chi square test
A chi square test is an inferential statistic
Analyzes proportions/frequency data
Uses proportions/distributions from a sample to
test hypotheses about proportions/distributions
in the population
1. Determine Appropriate Test
Chi Square is used when both variables are
measured on a nominal scale.
It can be applied to interval or ratio data that have
been categorized into a small number of groups.
It assumes that the observations are randomly
sampled from the population.
All observations are independent (an individual
can appear only once in a table and there are no
overlapping categories).
It does not make any assumptions about the
shape of the distribution nor about the
homogeneity of variances.
χ² test
For testing significance of patterns in
qualitative data
Test statistic is based on counts that
represent the number of items that fall in
each category
The test statistic measures the agreement
between actual counts and the counts expected
under the null hypothesis
χ² Distribution
The chi-square distribution can be used to
see whether or not observed counts
agree with expected counts. Let
O = observed count and
E = expected count
$\chi^2 = \sum \frac{(O - E)^2}{E}$
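The statistic sums (O - E)^2 / E over all categories, which can be sketched in a few lines of Python. The function name `chi_square_statistic` and the die-roll counts below are illustrative assumptions, not part of the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 60 rolls of a die (made up for illustration).
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6  # a fair die would give 60 / 6 = 10 rolls per face

print(chi_square_statistic(observed, expected))
```

A value near zero means the observed counts sit close to the expected ones; larger values mean poorer agreement.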
The chi square test can only be used on data that
has the following characteristics:
The data must be in the form
of frequencies
The frequency data must have a
precise numerical value and must be
organised into categories or groups.
The total number of observations must be
greater than 20 for the chi square test to
give a reliable result.
The expected frequency in any one cell
of the table must be greater than 5.
The total frequency should be large,
that is, N greater than 50.
The chi square test has several applications.
Some important applications of this test are
as follows:
Test of goodness of fit
Test of independence of attributes
Two chi square tests
Goodness of fit
One variable
Determines how well the sample proportions
match a pre-specified distribution
Independence
Two variables
Determines whether there is a relationship
between two variables
Steps in hypothesis testing
1. State the hypotheses
null
alternative
2. Select an alpha level and determine the critical value
3. Compute the test statistic
4. Make a decision
Test for goodness of fit
Forms of the null hypothesis
1. No preference
There is no difference in proportions among the categories
Participants do not prefer one category over another
Example: Pepsi: 50%, Coke 50%
2. No difference from a comparison population
There is no difference between the sample distribution and a
known (population) distribution
Calculating the test statistic
Observed frequencies
the number of individuals from the sample who
are classified in a particular category
O
Expected frequencies
the number of individuals from the sample who
are expected to be classified in a particular
category
E
Calculating the test statistic
Consider the example of a coin flip
Null hypothesis H0: the coin is unbiased
Alternative hypothesis H1: the coin is biased
              Heads   Tails
Percentages    50%     50%
Proportions    .5      .5
Calculating the test statistic
N = 50 (sample size)
Expected frequency = proportion x N
Under H0 (coin unbiased): E = .5 x 50 = 25
Expected      Heads   Tails
Proportions    .5      .5
Frequencies    25      25
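As a minimal sketch, the expected frequencies can be obtained by multiplying each null-hypothesis proportion by the sample size. The helper name `expected_frequencies` is an assumption introduced here for illustration.

```python
def expected_frequencies(proportions, n):
    """Expected count per category: null-hypothesis proportion times sample size."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return {category: p * n for category, p in proportions.items()}


# Coin example from the slides: H0 says heads and tails are equally likely.
expected = expected_frequencies({"Heads": 0.5, "Tails": 0.5}, n=50)
print(expected)  # {'Heads': 25.0, 'Tails': 25.0}
```

The same helper covers the "no difference from a comparison population" form of the null hypothesis: pass the known population proportions instead of equal ones.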
Calculating the test statistic
N = 50
The following are the numbers of heads and tails
observed in tossing a coin 50 times.
              Heads   Tails
Observed       35      15
Calculating the test statistic
              Heads   Tails
Observed       35      15
Expected       25      25
Steps
1. find the difference between O and E for
each category
2. square the difference
3. divide the squared difference by E
4. sum the values from all categories
                 Heads   Tails
Observed (O)       35      15
Expected (E)       25      25
O - E              10     -10
(O - E)^2         100     100
(O - E)^2 / E       4       4
$\chi^2 = \sum \frac{(O - E)^2}{E} = 4 + 4 = 8$
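The four steps can be traced in code for the coin example. This is only a sketch; the variable names are chosen here for illustration, and each squared difference is divided by the expected count E.

```python
categories = ["Heads", "Tails"]
observed = [35, 15]
expected = [25, 25]

rows = []
for name, o, e in zip(categories, observed, expected):
    diff = o - e                # step 1: difference between O and E
    squared = diff ** 2         # step 2: square the difference
    contribution = squared / e  # step 3: divide by the expected count E
    rows.append((name, diff, squared, contribution))

chi_square = sum(row[3] for row in rows)  # step 4: sum over all categories

for name, diff, squared, contribution in rows:
    print(f"{name}: O-E={diff}, (O-E)^2={squared}, (O-E)^2/E={contribution}")
print("chi-square =", chi_square)
```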
Test for goodness of fit
The greater the number of categories, the
greater the likelihood of a large observed chi
square value
Degrees of freedom (df)
The number of values that are free to vary
df = C - 1
C = the number of categories
Chi square distribution
Critical values for chi square distribution
[Figure: chi square distribution curve with the upper 5% of the area lying beyond χ² = 3.84]
Critical value (df = 1, α = .05) = 3.84
Goodness of fit
Make a decision
Critical value = 3.84 with df = 1 and α = .05.
Observed chi square = 8.0
8.0 > 3.84
The observed chi square is greater than the critical value
We reject the null hypothesis
Conclude that the observed frequencies differ from the expected frequencies
The coin came up heads more often than an unbiased coin would be expected to
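The decision rule can be sketched as a simple comparison against the critical value. The p-value helper below is an addition not found in the slides: it relies on the fact that, for df = 1 only, the upper-tail probability of the chi square distribution equals erfc(sqrt(x / 2)). The function names `decide` and `p_value_df1` are assumptions for illustration.

```python
import math


def decide(chi_square, critical_value):
    """Reject H0 when the observed statistic exceeds the critical value."""
    return "reject H0" if chi_square > critical_value else "fail to reject H0"


def p_value_df1(chi_square):
    """Upper-tail probability for a chi square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


observed_chi_square = 8.0  # coin example: 2 categories, so df = 2 - 1 = 1
critical_value = 3.84      # from the table, df = 1 and alpha = .05

decision = decide(observed_chi_square, critical_value)
p = p_value_df1(observed_chi_square)
print(decision, "p =", round(p, 4))
```

Both routes lead to the same decision here: the statistic exceeds 3.84, and the p-value is well below .05.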
2 x 2 contingency tables
                  Var B
Var A       b1     b2     total
a1
a2
total
H0: The two variables are independent
Ha: The two variables are associated
Chi-squared test for independence of Attributes
                   Result
Operator     def    Not def.   total
A            100      900      1000
B             60      440       500
total        160     1340      1500
Total number of items=1500
Total number of defective items=160
Overall defective rate =160/1500=0.1067
Now, apply this rate to the number of items
produced by each operator.
Expected defective from Operator A
= 1000 * 0.1067 = 106.7
(expected not defective=1000-106.7=893.3)
Expected defective from Operator B
= 500 * 0.1067 = 53.3
(expected not defective=500-53.3=446.7)
Alternatively, for a contingency table the
expected frequency can be calculated as follows:
Expected frequency = (row total x column total) / grand total
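The row total x column total / grand total rule can be sketched for the operator table as follows. The function name `expected_counts` is a hypothetical helper, and the table is entered as rows for operators A and B with columns for defective and not defective.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


# Observed counts: rows are operators A and B, columns are def / not def.
observed = [[100, 900], [60, 440]]
expected = expected_counts(observed)

for label, row in zip(["A", "B"], expected):
    print(label, [round(value, 1) for value in row])
```

Rounded to one decimal place, this gives the same figures as applying the overall defective rate to each operator's output.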
χ² test
                   Expected
Operator     def    Not def.   total
A           106.7    893.3     1000
B            53.3    446.7      500
total       160     1340       1500
               Observed   Expected
A, def            100      106.7
B, def             60       53.3
A, not def.       900      893.3
B, not def.       440      446.7
$\chi^2 = \sum \frac{(O - E)^2}{E}$
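The slides stop at the formula for this example, so the following is only a sketch of how the arithmetic could be carried through, using the observed counts and the rounded expected counts listed for the two operators. The comparison reuses the critical value 3.84 (df = 1, α = .05) quoted for the coin example, since a 2 x 2 table also has (2 - 1)(2 - 1) = 1 degree of freedom.

```python
# (observed, expected) pairs for the operator table, expected values rounded
# to one decimal place as on the slide.
pairs = [
    (100, 106.7),  # operator A, defective
    (60, 53.3),    # operator B, defective
    (900, 893.3),  # operator A, not defective
    (440, 446.7),  # operator B, not defective
]

chi_square = sum((o - e) ** 2 / e for o, e in pairs)
df = (2 - 1) * (2 - 1)  # (rows - 1) x (columns - 1)
critical_value = 3.84   # df = 1, alpha = .05

print("chi-square =", round(chi_square, 2))
print("df =", df)
print("reject H0:", chi_square > critical_value)
```

Under these assumptions the statistic comes out at roughly 1.4, below 3.84, so the data would not lead to rejecting the hypothesis that defect rate and operator are independent.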
χ² test
r x c contingency tables
        SA     A    NO     D    SD
Gr1     12    18     4     8    12
Gr2     48    22    10     8    10
Gr3     10     4    12    10    12
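The same procedure extends to a table with any number of rows and columns. The sketch below applies it to the 3 x 5 table above, computing each expected count as row total x column total / grand total; the variable names are chosen here for illustration.

```python
table = {
    "Gr1": [12, 18, 4, 8, 12],
    "Gr2": [48, 22, 10, 8, 10],
    "Gr3": [10, 4, 12, 10, 12],
}

rows = list(table.values())
row_totals = [sum(row) for row in rows]
column_totals = [sum(column) for column in zip(*rows)]
grand_total = sum(row_totals)

chi_square = 0.0
for row, row_total in zip(rows, row_totals):
    for observed, column_total in zip(row, column_totals):
        expected = row_total * column_total / grand_total
        chi_square += (observed - expected) ** 2 / expected

# Degrees of freedom: (number of rows - 1) x (number of columns - 1).
df = (len(rows) - 1) * (len(column_totals) - 1)

print("grand total =", grand_total)
print("chi-square =", round(chi_square, 2))
print("df =", df)
```

The resulting statistic would be compared with the critical value for df = 8 at the chosen alpha level.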
χ² test
use when you have categorical data
measure the difference between actual counts
and expected counts
test the independence of two variables
Assumptions:
data set is a random sample
you have at least 5 counts in each category
degrees of freedom =
(categories in var1 - 1) x (categories in var2 - 1)