
Parametric vs Non-Parametric Tests: Chi-Square Test


Parametric v/s Non-Parametric Methods
Chi-Square Test
Parametric v/s Non-Parametric Test

• Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn. This is often the assumption that the population data are normally distributed.

1. Normality: data in each group should be normally distributed
2. Independence: data in each group should be sampled randomly and independently
3. No outliers: there should be no extreme outliers in the data
4. Equal variance: data in each group should have approximately equal variance
Parametric v/s Non-Parametric Test

• A non-parametric test (sometimes referred to as a distribution-free test) does not assume anything about the underlying distribution, for example that the data come from a normal distribution.

• Because non-parametric tests are “distribution-free”, they can be used for non-normal variables. This means the assumption of a normal population is not necessary.

The chi-square tests introduced here are designed for data measured on the nominal scale. For this type of measurement, data are classified into categories that have no natural order. Examples include gender of Congressional representatives, state of birth of students, or brand of peanut butter purchased. For such data we introduce a new test statistic: the chi-square statistic.

The key differences between parametric and non-parametric tests are summarized below by property.

Property                   | Parametric           | Non-parametric
Assumptions                | Yes                  | No
Central tendency value     | Mean value           | Median value
Correlation                | Pearson              | Spearman
Probabilistic distribution | Normal               | Arbitrary
Population knowledge       | Requires             | Does not require
Used for                   | Interval data        | Nominal data
Applicability              | Variables            | Attributes & Variables
Examples                   | z-test, t-test, etc. | Kruskal-Wallis, Mann-Whitney


[Figure: flowchart for choosing among parametric tests]
Non-parametric tests don’t make as many assumptions about the data.

Test                           | Predictor variable            | Outcome variable                                      | Use in place of…
Spearman’s r                   | Quantitative                  | Quantitative                                          | Pearson’s r
Chi square test of independence | Categorical                  | Categorical                                           | Pearson’s r
Sign test                      | Categorical                   | Quantitative                                          | One-sample t-test
Kruskal–Wallis H               | Categorical; 3 or more groups | Quantitative                                          | ANOVA
ANOSIM                         | Categorical; 3 or more groups | Quantitative; 2 or more outcome variables             | MANOVA
Wilcoxon Rank-Sum test         | Categorical; 2 groups         | Quantitative; groups come from different populations  | Independent t-test
Wilcoxon Signed-rank test      | Categorical; 2 groups         | Quantitative; groups come from the same population    | Paired t-test
Goodness-of-Fit Test Example
Bubba’s Fish and Pasta is a chain of restaurants along the Gulf Coast of Florida. Bubba is considering adding steak to the menu. Before doing so, he hires a research firm to conduct a survey to find out what patrons’ favorite meal is when eating out. Here are the results of the survey of 120 adults.

We have to determine whether the dishes are equally popular or whether there is a difference. Is it reasonable to conclude there is no preference among the four entrées?

Is the difference in the number of times each entrée is selected due to chance, or should we conclude that the entrées are not equally preferred?
Goodness-of-Fit Test Example Continued

Step 1: State the null and the alternate hypothesis.
H0: There is no difference in the proportion of adults selecting each entrée
H1: There is a difference in the proportion of adults selecting each entrée
Step 2: Select the level of significance. We select .05.
Step 3: Select the test statistic. We’ll use chi-square, χ².
Step 4: Formulate the decision rule. With four categories there are 4 - 1 = 3 degrees of freedom, so reject H0 if χ² > 7.815.
Goodness-of-Fit Test Example Concluded

Step 5: Take the sample and make a decision. The computed χ² is 2.200, which is not greater than 7.815, so do not reject H0.

Step 6: Interpret. The data do not suggest that the preferences among the four entrées are different.
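
The six steps above can be sketched in a few lines of Python. The survey table is not reproduced in the text, so the entrée names and observed counts below are assumptions chosen for illustration (they sum to 120 and happen to give the reported statistic of 2.200). The critical value 7.815 is taken from Step 4 rather than computed.

```python
# Chi-square goodness-of-fit test with equal expected frequencies.
# ASSUMPTION: entrée names and observed counts are illustrative only;
# the original survey table is not available in the text.
observed = {"Chicken": 32, "Fish": 24, "Meat": 35, "Pasta": 29}

n = sum(observed.values())   # 120 adults surveyed
k = len(observed)            # 4 entrées
expected = n / k             # 30 per entrée if all are equally popular

# Sum of (fo - fe)^2 / fe over all categories
chi_square = sum((fo - expected) ** 2 / expected for fo in observed.values())

df = k - 1                   # degrees of freedom
critical_value = 7.815       # from Step 4 (alpha = .05, df = 3)
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these assumed counts the statistic falls well below the critical value, which matches the decision in Step 5.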
Chi-Square Characteristics
• The characteristics of the chi-square distribution are
  • The value of chi-square is never negative
  • There is a family of chi-square distributions
  • The chi-square distribution is positively skewed
  • As the degrees of freedom increase, the distribution approaches a normal distribution
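
As a rough illustration of the last two characteristics, the sketch below uses the standard closed-form moments of a chi-square distribution with df degrees of freedom (mean df, variance 2·df, skewness √(8/df)). The function name `chi_square_summary` is made up for this example. The skewness is always positive and shrinks toward zero as df grows, which is consistent with the distribution approaching a normal shape.

```python
import math

def chi_square_summary(df):
    """Return (mean, variance, skewness) of a chi-square distribution
    with df degrees of freedom, using the standard closed forms."""
    return df, 2 * df, math.sqrt(8 / df)

# One distribution per df value: a "family" of chi-square distributions
for df in (1, 3, 10, 50, 200):
    mean, variance, skewness = chi_square_summary(df)
    print(f"df={df:>3}  mean={mean:>3}  variance={variance:>3}  skewness={skewness:.3f}")
```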
Hypothesis Test of Unequal Expected Frequencies Example

The American Hospital Administration Association (AHAA) reports the number of times senior citizens are admitted to a hospital during a one-year period: 40% are not admitted, 30% are admitted once, 20% are admitted twice, and 10% are admitted three or more times.

A survey of 150 residents of Bartow Estates, a community devoted to active seniors located in central Florida, revealed that 55 residents were not admitted, 50 were admitted once, 32 were admitted twice, and the rest were admitted three or more times. Can we conclude the survey at Bartow Estates is consistent with the information reported by the AHAA?
Hypothesis Test of Unequal Expected Frequencies Example Continued
Step 1: State the null and alternate hypothesis.
H0: There is no difference between local and national experience for hospital admissions
H1: There is a difference between local and national experience for hospital admissions
Step 2: Select the level of significance. We select .05.
Step 3: Select the test statistic. We’ll use χ².
Step 4: Formulate the decision rule. With four categories there are 3 degrees of freedom, so reject H0 if χ² > 7.815.
Step 5: Calculate the test statistic and make a decision. Since 1.3723 < 7.815, do not reject H0.
Step 6: Interpret. There is no evidence of a difference between the local and the national experience for hospital admissions.
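
The calculation in Step 5 can be reproduced directly from the figures in the example. This is a minimal sketch: the helper name `chi_square_gof` is an assumption of this example, and the critical value is copied from Step 4 rather than computed. The unrounded result is about 1.3722; the 1.3723 reported above comes from summing terms rounded to four decimals.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

n = 150                                        # residents surveyed at Bartow Estates
proportions = [0.40, 0.30, 0.20, 0.10]         # national proportions reported by the AHAA
observed = [55, 50, 32, n - (55 + 50 + 32)]    # "the rest" works out to 13

# Expected frequencies are unequal: 60, 45, 30, 15
expected = [p * n for p in proportions]

chi_square = chi_square_gof(observed, expected)
critical_value = 7.815                         # from Step 4 (alpha = .05, df = 3)

print(f"expected = {[round(fe, 1) for fe in expected]}")
print(f"chi-square = {chi_square:.4f}, reject H0: {chi_square > critical_value}")
```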
Limitations of Chi-Square
• If there is an unusually small expected frequency in a cell, chi-square might result in an erroneous conclusion
  • A very small number in the denominator can make the quotient quite large
  • For only two cells, the expected frequency fe should be at least 5 in each cell
  • For more than two cells, chi-square should not be used if more than 20% of the cells have an expected frequency fe that is less than 5
The issue can be resolved by combining categories if it is logical to do so. In this example, we combine the three vice president categories, which satisfies the 20% policy.
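
The rule of thumb above can be expressed as a small check. This is a sketch only: the function name `expected_frequencies_ok` and the expected frequencies are hypothetical, since the table for the vice president example is not reproduced in the text.

```python
def expected_frequencies_ok(expected, minimum=5, max_share=0.20):
    """Apply the rule of thumb for chi-square:
    - two cells or fewer: every expected frequency must be at least `minimum`
    - more than two cells: at most `max_share` of the cells may fall below `minimum`
    """
    small_cells = sum(1 for fe in expected if fe < minimum)
    if len(expected) <= 2:
        return small_cells == 0
    return small_cells / len(expected) <= max_share

# HYPOTHETICAL expected frequencies: three of six cells are below 5 (50%)
before = [50, 30, 12, 4, 2, 2]
# Combining the last three categories gives one cell with fe = 8
after = before[:3] + [sum(before[3:])]

print(expected_frequencies_ok(before))
print(expected_frequencies_ok(after))
```

Combining categories reduces the number of cells, so the degrees of freedom should be based on the number of categories after combining.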
Goodness-of-Fit Test
• A goodness-of-fit test can be used to determine whether a sample of observations is from a normal population
1. Calculate the mean and standard deviation of the sample data
2. Group the data into a frequency distribution
3. Convert the class limits to z values and find the standard normal probability for each class
4. For each class, find the expected frequency by multiplying the probability of a value falling in that class by the total number of observations
5. Calculate the chi-square goodness-of-fit statistic based on the observed and expected class frequencies
6. Because the sample mean and the sample standard deviation are estimated from the sample data, the degrees of freedom are k - 3, where k is the number of classes
Hypothesis Test that a Distribution is Normal Example
We investigate whether the profit data of Applewood Auto Group follows the normal distribution. In chapter 3, we found the mean profit was $1,843.17 and the standard deviation was $643.63.

Now, calculate z values to find the probability (area) for each of the eight classes. This probability multiplied by the total, 180, gives the expected frequency for each class.

\[ z = \frac{x - \bar{x}}{s} = \frac{\$200 - \$1{,}843.17}{\$643.63} = -2.55 \qquad z = \frac{x - \bar{x}}{s} = \frac{\$600 - \$1{,}843.17}{\$643.63} = -1.93 \]

P(x < $200) = P(z < -2.55) = .5000 - .4946 = .0054
P($200 < x < $600) = P(-2.55 < z < -1.93) = .0268 - .0054 = .0214
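
The two probabilities above, and the expected frequencies they imply, can be checked with the standard normal CDF written in terms of `math.erf`. This is a sketch under one stated assumption: z values are rounded to two decimals to mimic a printed z table, so the results agree with the slide figures only to about four decimals.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd, n = 1843.17, 643.63, 180    # sample mean, standard deviation, total observations

# z values for the class limits $200 and $600, rounded as in a z table
z_200 = round((200 - mean) / sd, 2)   # -2.55
z_600 = round((600 - mean) / sd, 2)   # -1.93

p_under_200 = normal_cdf(z_200)                     # about .0054
p_200_600 = normal_cdf(z_600) - normal_cdf(z_200)   # about .0214

# Expected frequency = class probability * total number of observations
fe_under_200 = n * p_under_200
fe_200_600 = n * p_200_600

print(f"P(x < 200) = {p_under_200:.4f}, fe = {fe_under_200:.2f}")
print(f"P(200 < x < 600) = {p_200_600:.4f}, fe = {fe_200_600:.2f}")
print(f"combined fe for the two lowest classes = {fe_under_200 + fe_200_600:.2f}")
```

Both expected frequencies are below 5, which is why the two lowest classes are combined before the chi-square statistic is computed.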
Hypothesis Test that a Distribution is Normal Example Continued
Now, combine the classes that have fe < 5.

Once that is done, we can calculate the chi-square statistic.

Hypothesis Test that a Distribution is Normal Example Concluded
Step 1: State the null and alternate hypothesis.
H0: The population of profits follows the normal distribution.
H1: The population of profits does not follow the normal distribution.
Step 2: Select the level of significance. We select .05.
Step 3: Select the test statistic. We’ll use chi-square, χ².
Step 4: Formulate the decision rule. Reject H0 if χ² > 11.070.
Step 5: Make a decision. χ² = 5.220, so we do not reject H0.

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(8 - 4.82)^2}{4.82} + \cdots + \frac{(4 - 6.46)^2}{6.46} = 5.220 \]

Step 6: Interpret. We conclude the evidence does not suggest the distribution of profits is other than normal.
Contingency Table
• We can use a contingency table to test whether two traits or characteristics are related
• The expected frequency for each cell is determined as follows: fe = (row total)(column total) / (grand total)
• The degrees of freedom = (Rows - 1)(Columns - 1)
• Example
  • Ford Motor Company operates the Dearborn plant with 3 shifts per day, 5 days a week. Vehicles are classified as to quality level (acceptable, unacceptable) and shift (day, afternoon, night). Is there a difference in the quality level on the three shifts?
Contingency Table Example
Rainbow Chemical, Inc. employs hourly and salaried employees. The vice president of human resources surveyed 380 employees about their satisfaction level with the current health care benefits program. The employees were then classified according to pay type, salary or hourly. Is it reasonable to conclude that pay type and level of satisfaction with the health care benefits are related?

Step 1: State the null and alternate hypothesis.
H0: There is no relationship between level of satisfaction and pay type
H1: There is a relationship between level of satisfaction and pay type
Step 2: Select the level of significance. We select .05.
Step 3: Select the test statistic. We’ll use chi-square, χ².
Contingency Table Example Continued
Step 4: Formulate the decision rule. Reject H0 if χ² > 5.991, the critical value at the .05 level for 2 degrees of freedom.

Step 5: Make a decision. χ² is 2.506, which is less than 5.991, so do not reject H0.

Step 6: Interpret. The sample data do not provide evidence that pay type and satisfaction level with health care benefits are related.
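
A contingency-table test can be sketched as follows. Each expected frequency is (row total × column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1). The Rainbow Chemical survey table is not reproduced in the text, so the function name `chi_square_independence` and the 2 × 3 counts below are hypothetical and will not give the 2.506 reported above.

```python
def chi_square_independence(table):
    """Return (chi_square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected frequency if the two variables are unrelated
            fe = row_totals[i] * col_totals[j] / grand_total
            chi_square += (fo - fe) ** 2 / fe

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df

# HYPOTHETICAL counts: rows = pay type, columns = three satisfaction levels
table = [
    [20, 15, 15],
    [30, 10, 10],
]

chi_square, df = chi_square_independence(table)
critical_value = 5.991   # alpha = .05, df = 2, as in Step 4
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {chi_square > critical_value}")
```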
Question: The Federal Correction Agency: Does a male released from
federal prison make a different adjustment to civilian life if he returns to
his hometown or if he goes elsewhere to live? To put it another way, is
there a relationship between adjustment to civilian life and place of
residence after release from prison? Use the .01 significance level.
Two variables: 1. adjustment to civilian life 2. place of residence.
Note that both variables are measured on the nominal scale.

TABLE Adjustment to Civilian Life and Place of Residence
