
Chi Square Test


Chi-Square Test

A chi-square test is a test of statistical significance for categorical variables. We typically use it to determine whether the observed frequencies of a given event differ significantly from the expected frequencies.
Testing Method

• The chi-square test is a method for testing whether two categorical variables (variables whose categories have no natural order and cannot be ranked as high or low) are related in a given population.

• Q: Can you find the mean value of colour (red, blue, green, yellow)? Yes/No
1- Define the Hypotheses

• H0: The two categorical variables are independent. [No relation]

• H1: The two categorical variables are dependent. [Related]
Assumptions of the Chi-Square Test

• The χ² test assumes that the data for the study were obtained through random selection, i.e. the observations are randomly drawn from the population.

• The categories are mutually exclusive, i.e. each subject fits in only one category. For example, a person counted among those who lunched in your restaurant on Monday cannot also be counted in the Tuesday category.

• The data should be in the form of frequencies or counts for each category, not percentages.

• The data should not consist of paired samples or groups; in other words, the observations should be independent of each other.
Example:

A researcher was interested in the relationship between the placement of students in the art department of a reputed university and their C.G.P.A.

Here the independent variable is C.G.P.A, with the categories 9-10, 8-9, 7-8, 6-7, and below 6.
Question:

The statistical question here is whether the observed frequencies of placed students are equally distributed across the C.G.P.A categories (so that our theoretical frequency distribution contains the same number of students in each C.G.P.A category).
Contingency table
Calculate Chi Square Value.

Chi-square measures how much the observed counts O deviate from the expected counts E.
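In symbols, this is the standard chi-square statistic, where the sum runs over every category i; Steps 1 to 4 compute it by hand:

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```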
Step 1: Subtract each expected frequency from the related observed frequency. For example, for the C.G.P.A category 9-10, it will be 30 - 20 = 10. Apply the same operation to all the categories.

Step 2: Square each value obtained in Step 1, i.e. (O - E)². For example, for the C.G.P.A category 9-10, the value obtained in Step 1 is 10, which becomes 100 on squaring. Apply the same operation to all the categories.

Step 3: Divide each value obtained in Step 2 by the related expected frequency, i.e. (O - E)²/E. For example, for the C.G.P.A category 9-10, the value obtained in Step 2 is 100; dividing it by the related expected frequency of 20 gives 5. Apply the same operation to all the categories.

Step 4: Add all the values obtained in Step 3 to get the chi-square value. In this case, the chi-square value comes out to be 32.5.
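Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration, not the author's worksheet: only the 9-10 category (O = 30, E = 20) appears in the text, so the other observed counts are assumed values chosen so that they total 100 students and reproduce the 32.5 above. The expected count of 20 per category follows from the equal-distribution question (100 students spread over 5 categories).

```python
# Goodness-of-fit chi-square statistic, computed step by step.
# ASSUMPTION: only the 9-10 count (30) comes from the text; the other
# observed counts are hypothetical values that sum to 100 students.
categories = ["9-10", "8-9", "7-8", "6-7", "below 6"]
observed = [30, 35, 20, 10, 5]
expected = [20] * len(categories)  # equal distribution: 100 students / 5 categories

def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    total = 0.0
    for o, e in zip(observed, expected):
        diff = o - e              # Step 1: O - E
        squared = diff ** 2       # Step 2: (O - E)^2
        total += squared / e      # Step 3: (O - E)^2 / E; Step 4: running sum
    return total

chi_sq = chi_square_statistic(observed, expected)
print(chi_sq)  # 32.5
```

Keeping each step as a separate line mirrors the manual procedure; in practice the loop body collapses to a single `sum(...)` expression.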

Step 5: Once we have calculated the chi-square value, the next task is to compare it with the critical chi-square value, which we can look up in a chi-square table against the degrees of freedom (number of categories - 1) and the level of significance:
Chi-Square Table
Finally.

• In this case, the degrees of freedom are 5 - 1 = 4, so the critical value at the 5% level of significance is 9.49.

• Our obtained value of 32.5 is much larger than the critical value of 9.49. Therefore, we can say that the observed frequencies are significantly different from the expected frequencies. In other words, C.G.P.A is related to the number of placements that occur in the department.
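The table lookup and comparison in Step 5 can also be reproduced in code. The sketch uses hypothetical helpers built on the closed-form chi-square tail probability, which holds only for an even number of degrees of freedom (such as the 4 used here), and finds the critical value by bisection. In practice a library routine such as SciPy's `scipy.stats.chi2.ppf(0.95, 4)` returns the same cut-off.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

def chi2_critical_value(alpha, df):
    """Find c with P(X > c) = alpha by bisection (the tail probability falls as c grows)."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:   # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

df = 5 - 1                      # number of categories - 1
critical = chi2_critical_value(0.05, df)
chi_sq = 32.5                   # statistic from Step 4
print(round(critical, 2))       # 9.49
print("Reject H0" if chi_sq > critical else "Fail to reject H0")  # Reject H0
```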
Chi-Square Distribution
In machine learning, the chi-square test helps with feature selection by testing the relationship between features.
Degrees of freedom:

• In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

• In statistics, the degrees of freedom (DF) indicate the number of independent values that can vary in an analysis without breaking any constraints. It is an essential idea that appears in many contexts throughout statistics, including hypothesis tests, probability distributions, and regression analysis.
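As a small illustration of "free to vary": in the C.G.P.A example the five observed counts must add up to the total number of students, so once four counts are chosen the fifth is fixed. The hypothetical helpers below encode the usual rules, k - 1 for a goodness-of-fit test with k categories (as in Step 5) and (r - 1)(c - 1) for a test of independence on an r × c contingency table (the standard rule, which the slides do not state explicitly).

```python
def df_goodness_of_fit(num_categories):
    """k categories, one constraint (counts sum to the total) -> k - 1 free values."""
    return num_categories - 1

def df_independence(num_rows, num_cols):
    """r x c table with fixed row and column totals -> (r - 1)(c - 1) free cells."""
    return (num_rows - 1) * (num_cols - 1)

# With a total of 100 students, four counts determine the fifth.
# ASSUMPTION: the four counts below are illustrative, not taken from the slides.
total = 100
free_counts = [30, 35, 20, 10]
last_count = total - sum(free_counts)

print(df_goodness_of_fit(5))      # 4
print(df_independence(2, 3))      # 2
print(last_count)                 # 5
```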
Feature Selection in Machine Learning

• Feature selection is an important problem in machine learning, where we have many candidate features and must select the best ones to build the model. The chi-square test helps solve this problem by testing the relationship between each feature and the response.

• Consider a scenario where we need to determine the relationship between an independent categorical feature (predictor) and a dependent categorical feature (response). In feature selection, we aim to select the features that are most strongly dependent on the response.
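A minimal sketch of this idea in plain Python, assuming each feature and the response are categorical columns of equal length: the hypothetical `chi_square_score` builds the contingency table of a feature against the response, derives the expected counts under independence, and returns the chi-square statistic, so features can be ranked from most to least dependent on the response. The toy data are invented for illustration. Libraries offer related tools (for example scikit-learn's `SelectKBest` with `chi2`), but scikit-learn's `chi2` scores non-negative count or one-hot features rather than raw category labels.

```python
from collections import Counter

def chi_square_score(feature, response):
    """Chi-square statistic of a categorical feature against a categorical response."""
    n = len(feature)
    observed = Counter(zip(feature, response))   # cell counts O
    row_totals = Counter(feature)
    col_totals = Counter(response)
    score = 0.0
    for f, f_total in row_totals.items():
        for r, r_total in col_totals.items():
            expected = f_total * r_total / n     # E = row total * column total / N
            score += (observed[(f, r)] - expected) ** 2 / expected
    return score

# ASSUMPTION: toy data invented for illustration.
response = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
features = {
    "matches_response": ["a", "a", "b", "b", "a", "b", "a", "b"],  # perfectly dependent
    "unrelated":        ["x", "y", "x", "y", "y", "x", "x", "y"],  # no association
}

ranking = sorted(features, key=lambda name: chi_square_score(features[name], response),
                 reverse=True)
for name in ranking:
    print(name, round(chi_square_score(features[name], response), 2))
```

A higher score means stronger evidence of dependence, so the top-ranked features are the candidates to keep.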
2- Get Observed Frequencies [Given Data Set]
3- Calculate Expected Frequencies.
4- Calculate Chi Square
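Steps 2 to 4 can be sketched for a two-way table of counts laid out as in a spreadsheet. The gender × food-preference counts and the food labels are hypothetical, since the slides' data set is not reproduced here. Each expected frequency is computed as (row total × column total) / grand total, the standard rule for a test of independence.

```python
# Steps 2-4 for a test of independence on a two-way table of counts.
# ASSUMPTION: hypothetical counts; rows are genders, columns are food types.
genders = ["male", "female"]
foods = ["food A", "food B", "food C"]   # hypothetical category labels
observed = [[20, 30, 10],                # Step 2: observed frequencies
            [30, 15, 15]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Step 3: expected frequency = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Step 4: chi-square = sum over all cells of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
dof = (len(genders) - 1) * (len(foods) - 1)

print(expected)     # [[25.0, 22.5, 12.5], [25.0, 22.5, 12.5]]
print(chi_sq, dof)  # 8.0 2
```

With 2 degrees of freedom the 5% critical value is about 5.99, so with these made-up counts a statistic of 8.0 would lead to rejecting H0.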
Q: Investigate whether gender and type of food preference are related.

• H0: Gender and type of food preference are independent. [Not related]

• H1: Gender and type of food preference are dependent. [Related]
Let us see calculation using Excel.
• If the chi-square value < critical value (CV), fail to reject H0; otherwise, reject H0.
P VALUE APPROACH

• Excel formula: CHISQ.TEST(actual_range, expected_range), which returns the p-value.

• If the p-value is less than alpha (0.05), reject H0.

• If the p-value is greater than alpha (0.05), fail to reject H0.
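For readers working outside Excel, the p-value approach can be sketched in Python. `chisq_test_p_value` is a hypothetical analogue of `CHISQ.TEST(actual_range, expected_range)`: it sums (O - E)²/E over the two tables, uses (r - 1)(c - 1) degrees of freedom for a table with at least two rows and two columns, and converts the statistic to a p-value with the closed-form chi-square tail, which is only valid for an even number of degrees of freedom. For other cases a library routine such as `scipy.stats.chi2.sf(statistic, df)` is the safer choice. The counts are a hypothetical 2 × 3 gender × food-preference table, not the slides' data.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

def chisq_test_p_value(actual, expected):
    """Hypothetical analogue of Excel's CHISQ.TEST for an r x c table (r, c >= 2)."""
    statistic = sum((o - e) ** 2 / e
                    for o_row, e_row in zip(actual, expected)
                    for o, e in zip(o_row, e_row))
    df = (len(actual) - 1) * (len(actual[0]) - 1)
    return chi2_sf_even_df(statistic, df)

# ASSUMPTION: hypothetical counts (rows: male, female; columns: three food types).
actual = [[20, 30, 10],
          [30, 15, 15]]
expected = [[25.0, 22.5, 12.5],
            [25.0, 22.5, 12.5]]

alpha = 0.05
p_value = chisq_test_p_value(actual, expected)
print(round(p_value, 4))   # 0.0183
print("Reject H0" if p_value < alpha else "Fail to reject H0")  # Reject H0
```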
