
Chi Square


Chi-Square Test

The chi-square test is an important and popular hypothesis test that is classified as a non-parametric test.

This test was first introduced by Karl Pearson in the year 1900.

The Chi-square test is a hypothesis test used to determine whether there is a relationship between two categorical
variables.

What are categorical variables?

Examples of categorical variables are:

• a person's gender

• preferred newspaper

• frequency of television viewing

• highest level of education

So whenever you want to test whether there is a relationship between two categorical variables, you use a chi-square test.
Prerequisite

• Categorical variables

• Distribution free method

• Random sample

• All expected frequencies must be greater than 5
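
The last prerequisite can be checked before running the test. The following is a minimal sketch, assuming the data are available as a table of observed counts; the helper names and the counts are invented for illustration and are not part of the original slides.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_cells(table, threshold=5):
    """List (row, column, expected) for cells whose expected count is not above the threshold."""
    expected = expected_frequencies(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e <= threshold
    ]


# Hypothetical 2x2 table of observed counts (made-up numbers).
table = [[3, 7], [12, 28]]
flagged = small_expected_cells(table)
print(flagged)  # cells that violate the "greater than 5" rule of thumb
```

If the list is not empty, the sample is usually considered too small for the chi-square approximation in those cells.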


Definition

The chi-square test is a hypothesis test used for categorical variables with nominal
or ordinal measurement scale.

The chi-square test checks whether the frequencies occurring in the sample differ
significantly from the frequencies one would expect.

Thus, the observed frequencies are compared with the expected frequencies and
their deviations are examined.
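
The comparison of observed and expected frequencies is usually summarized with Pearson's statistic, the sum of (observed − expected)² / expected over all categories. A minimal sketch with made-up counts follows; the function name is an assumption chosen for illustration.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (invented for illustration).
observed = [10, 20, 30]
expected = [20, 20, 20]
statistic = chi_square_statistic(observed, expected)
print(statistic)  # 10.0
```

The larger the statistic, the further the sample deviates from what was expected.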
Null hypothesis and alternative hypothesis

Null hypothesis: There is no relationship between gender and highest educational attainment.

Alternative hypothesis: There is a relationship between gender and highest educational attainment.

Tip: On DATAtab you can calculate the chi-square test online. Simply visit the
Chi-Square Test Calculator.
H0: In the population, the two categorical variables are independent.

H1: In the population, the two categorical variables are dependent.

Note!

There are several ways to phrase these hypotheses:

Instead of using the words "independent" and "dependent" one could say "there is no relationship between
the two categorical variables" versus "there is a relationship between the two categorical variables."

Or "there is no association between the two categorical variables" versus "there is an association between
the two variables."

The important part is that the null hypothesis refers to the two categorical variables not being related
while the alternative is trying to show that they are related.


Applications of the Chi-Square Test
The Chi-square test has various applications. It can be used to answer the following questions:

1) Independence test

Are two categorical variables independent of each other? For example, does gender have an impact on whether a
person has a Netflix subscription or not?

2) Distribution test

Do the observed frequencies of a categorical variable match the expected frequencies? One question could be: is one of
the three video streaming services Netflix, Amazon, and Disney subscribed to more often than average?

3) Homogeneity test

Are two or more samples from the same population? One question could be whether the subscription frequencies of
the three video streaming services Netflix, Amazon and Disney differ in different age groups.

Example
A study investigates whether a new drug reduces the incidence of a certain disease. Two groups of
patients are considered: one receiving the new drug and the other receiving a placebo.

Null Hypothesis (H0): There is no difference in disease incidence between the new drug and the placebo
groups.
Alternative Hypothesis (H1): There is a difference in disease incidence between the new drug and the
placebo groups.

Result: If the p-value is less than the significance level (usually 0.05), the null hypothesis is
rejected, indicating a significant difference in disease incidence between the two groups.

Conclusion: In this hypothetical example, the calculated p-value is 0.026, which is less than
0.05. Therefore, we reject the null hypothesis and conclude that disease incidence differs
significantly between the drug and placebo groups. Whether the drug reduces the incidence must be
read from the observed proportions, because the test itself is not directional.
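
The slides give no counts for this example, so the following sketch uses invented numbers and will not reproduce the p-value of 0.026. It assumes a 2x2 table without continuity correction and uses the fact that, for one degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x / 2)). The function name is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = drug, placebo; columns = diseased, not diseased.
counts = [[15, 85], [30, 70]]
statistic, p_value = chi_square_2x2(counts)
alpha = 0.05
reject_null = p_value < alpha
print(round(statistic, 3), reject_null)
```

With these made-up counts the null hypothesis would be rejected at the 0.05 level.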

Chi-Square Test of Independence
• Used when two categorical variables are to be tested for independence.
• The aim is to analyze whether the characteristic values of the first variable are influenced by the
characteristic values of the second variable and vice versa.

• For example: does gender have an influence on whether a person has a Netflix subscription or not? For the
two variables gender (male, female) and has Netflix subscription (yes, no), it is tested whether they are
independent. If this is not the case, there is a relationship between the characteristics.

• The research question that can be answered with the Chi-square test is: Are the characteristics of gender
and ownership of a Netflix subscription independent of each other?

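• If two variables are independent, the expected frequency of each cell is obtained from the row and column
totals: $E_{ij} = \frac{R_i \cdot C_j}{n}$, where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$,
and $n$ is the sample size.

Calculate chi-squared
The chi-squared statistic sums, over all cells, the squared deviation between the observed frequency $O_{ij}$
and the expected frequency $E_{ij}$, divided by the expected frequency:
$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$.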
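
As a rough sketch of the calculation for the gender and Netflix example: under independence each expected cell count is row total times column total divided by the sample size, and the statistic has (rows − 1) × (columns − 1) degrees of freedom. The counts and the function name below are invented for illustration.

```python
def independence_test_statistic(table):
    """Return expected counts, chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Hypothetical survey: rows = male, female; columns = subscription yes, no.
table = [[40, 60], [50, 50]]
expected, statistic, df = independence_test_statistic(table)
print(expected, round(statistic, 3), df)
```

For one degree of freedom the critical value at the 0.05 level is about 3.84, so a statistic below that value would not lead to rejecting independence.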
Chi-square distribution test
• If a variable has two or more values, the differences in the frequencies of the
individual values can be examined.

• The Chi-square distribution test, or Goodness-of-fit test, checks whether the frequencies of the
individual characteristic values in the sample correspond to the frequencies of a defined
distribution. In most cases, this defined distribution corresponds to that of the population. In
this case, it is tested whether the sample comes from the respective population.

• For market researchers it could be of interest whether there is a difference in the market penetration
of the three video streaming services Netflix, Amazon and Disney between Berlin and the whole of
Germany. The expected frequency is then the distribution of streaming services throughout
Germany and the observed frequency results from a survey in Berlin. In the following tables the
fictitious results are shown:
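
The tables are not reproduced here. As a stand-in, the following sketch runs a goodness-of-fit test on invented numbers: the Germany-wide shares and the Berlin counts are assumptions for illustration only. With three categories there are two degrees of freedom, for which the upper tail of the chi-square distribution has the closed form exp(−x / 2).

```python
import math


def goodness_of_fit(observed, expected_shares):
    """Chi-square goodness-of-fit: compare observed counts with expected shares."""
    n = sum(observed)
    expected = [n * share for share in expected_shares]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, statistic, df


# Hypothetical shares in Germany for Netflix, Amazon, Disney (made-up values).
germany_shares = [0.5, 0.3, 0.2]
# Hypothetical survey counts in Berlin for the same three services.
berlin_counts = [60, 25, 15]

expected, statistic, df = goodness_of_fit(berlin_counts, germany_shares)
# For df = 2 only: P(X > x) = exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(round(statistic, 3), df, round(p_value, 3))
```

With these made-up numbers the p-value is above 0.05, so the Berlin sample would not differ significantly from the assumed national distribution.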

Chi-square homogeneity test
• The Chi-square homogeneity test can be used to check whether two or more samples come
from the same population.

• One question could be whether the subscription frequencies of the three video streaming services
Netflix, Amazon and Disney differ in different age groups.

• As a fictitious example, a survey is made in three age groups with the following result:
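
The survey table is not reproduced here. The sketch below uses invented counts for three age groups and three services; the computation is the same as for the independence test, only the sampling design differs. For an even number of degrees of freedom the upper tail of the chi-square distribution has a closed form, which the hypothetical helper below uses.

```python
import math


def chi_square_from_table(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def chi_square_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square distribution with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Hypothetical counts: rows = three age groups; columns = Netflix, Amazon, Disney.
table = [
    [50, 30, 20],
    [40, 40, 20],
    [30, 50, 20],
]
statistic, df = chi_square_from_table(table)
p_value = chi_square_upper_tail_even_df(statistic, df)
print(round(statistic, 3), df, round(p_value, 4))
```

With these made-up counts the p-value is just below 0.05, which would suggest that the subscription frequencies differ between the age groups.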
Effect size in the Chi-square test
Choose Test
Imagine you are a researcher interested in sex
differences in students' attitudes toward
homosexuals. Specifically, you want to test the
idea that women are more accepting of
homosexuals and thus have more positive
attitudes toward homosexuals than do men.
You collect data from 10 men and 10 women
using a scale that measures homophobia. Your
data looks like this (note: larger scores equal
more positive attitudes).

A researcher believes that recall of verbal material differs with the level of
processing. He divided his subjects into three groups. In the low processing group,
participants read each word and were instructed to count the number of letters in
the word. In the medium processing group, participants were asked to read each
word and think of a word that rhymed. In the high processing group, participants
were asked to read each word and try to memorize it for later recall. Each group was
allowed to read the list of 30 words three times, then they were asked to recall as
many of the words on the list as possible. If the researcher wants to know whether
the three groups have different amounts of recall, what type of statistical test should
be used?

A psychologist wants to investigate the relationship between stress and mental health
during the first year of college. The researcher developed scales that measure 1) the
frequency of stressful events, 2) the perceived importance of these events, 3) the
desirability of such events, and 4) the impact of these events on the student. The
researcher had 150 first year college students fill out this questionnaire as well as a
Symptom Checklist, which is designed to assess the presence or absence of
psychological disorders. What statistical procedure would help this researcher
discover if psychological disorders can be predicted from the different aspects of
stress that have been measured?

A psychologist is interested in the relationship between job satisfaction and stress.
Within a large corporation, the psychologist asked a random sample of workers two
questions. The first question asked workers to rate their overall satisfaction with
their job on a scale from 1 to 50. The second question asked the workers to rate
their stress level during the past week on a scale from 1 to 50. What type of
statistical test best assesses the relationship between job satisfaction and level of
stress?

An institutional researcher at a large university wants to compare the mathematical
abilities of its male and female students. The researcher selects 100 men and 100
women at random from each of the four classes, and administers a standardized
mathematics test. The men average 500 on the test, with a standard deviation of
120. The women average 450 on the test, with a standard deviation of 110. What
test would be appropriate to determine whether the difference between men and
women is real or due to chance variation?

A psychologist is interested in attitudes towards the disabled. She believes that
contact with someone who is disabled might have an effect on people's attitudes.
To test her hypothesis, she measured attitudes toward the disabled both before and
after contact with an individual in a wheelchair.
What type of statistical test should she use to determine if contact with a
disabled person changes people's attitudes toward the disabled?

A psychologist is interested in comparing the demographics of grand jury members
to demographics of the population to see if grand jury panels are really
representative of the population. The first variable she examines is age. The
percentage of people over 65 in the population is 25%, but out of 200 people
empaneled for grand jury trials, 76 (38%) were aged 65 or more. She wants to know
if the proportion of people over 65 on grand juries is significantly different from the
proportion in the population. What test should she use?
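
The slides do not state an answer. One possible approach, assuming a chi-square goodness-of-fit test is intended, is sketched below with the numbers given in the scenario (a one-sample z-test for a proportion would be an equivalent choice here). The variable names are illustrative.

```python
import math

n = 200                            # people empaneled for grand juries
observed = [76, n - 76]            # aged 65 or more, under 65
population_shares = [0.25, 0.75]   # shares in the population

# Expected counts if grand juries mirrored the population.
expected = [n * share for share in population_shares]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two categories give 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))
print(round(statistic, 3), p_value < 0.05)
```

Under this approach the statistic is about 18.03, far above the critical value of about 3.84 for one degree of freedom at the 0.05 level.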

A psychologist is interested in whether or not handedness is related to gender.
Specifically she wants to know if the percentage of left-handed men in the population
is different from the percentage of left-handed women. She collected data on
handedness for 200 men and 200 women. What type of statistical test would be
appropriate?

A researcher is interested in the relationship between socio-economic status (SES)
and domestic violence. She believes that there is a greater incidence of domestic
violence in households below the poverty line. To assess this relationship, she
surveys a random sample of American adults and asks two questions. First,
participants are asked to estimate their household income.
Later in the survey, participants are asked if there has been any physical violence
between members of their household (Yes or No). How could the researcher
determine whether there is a relationship between SES and incidence of domestic
violence?
