Non Parametric Test
J P Singh
Biostatistics
Introduction
In inferential statistics there are two types of test:
o Parametric test
o Non-parametric test
Parametric tests assume that the nature (distribution) of the parent
population is known.
When the nature of the population from which the samples are drawn is
not known, parametric tests cannot be relied upon.
In such situations non-parametric tests are used.
These tests are also called distribution-free tests.
Most of the non-parametric tests are applicable to data measured in an
ordinal or nominal scale.
Non-parametric tests are defined as those statistical tests of
hypotheses in which no population parameter is involved and which are
based on order statistics.
Difference Between
Parametric and Non-parametric Test
Parametric Test
1. Tests whose models specify certain conditions about the parameters
of the population from which the sample has been drawn.
2. Tests are designed to test statistical hypotheses about one or more
parameters of the population.
3. Tests are used for data measured on an interval or ratio scale.
4. Tests are more powerful (when their assumptions are met).
Non-parametric Test
1. Tests whose models do not specify any condition about the parameters
of the population from which the sample has been drawn.
2. Tests are designed only to test statistical hypotheses that do not
involve any parameter of the population.
3. Tests are used for data measured on a nominal or ordinal scale.
4. Tests are less powerful than parametric tests.
Chi-square Distribution
Characteristics
Every χ2 distribution extends indefinitely to
the right from 0.
Every χ2 distribution has only one (right)
tail.
As df increases, the χ2 curves become more bell
shaped and approach the normal curve in
appearance (but remember that a chi-square
curve starts at 0, not at −∞).
Chi-square Distribution
Introduction
A chi square (χ2 ) distribution is a probability
distribution.
The chi-square distribution is useful for making
statistical inferences about categorical data with
two or more categories.
The chi-square (χ2) test is the most popular
hypothesis testing method for discrete data.
It is a non-parametric test of statistical
significance for bivariate tabular analysis with a
contingency table.
Introduction Cont…
Chi square helps us analyze data that come in the form of
counts, such as 25 females and 12 males enrolled in a
clinical study.
The most common application for chi square is to
determine whether or not a significant difference exists
between the observed counts of cases falling into each
category and the expected counts based on the null
hypothesis.
Chi-square statistics are widely used in reports of clinical
trials, observational studies, case-control studies,
laboratory studies, etc.
It is used most frequently to test the statistical
significance of results reported in bivariate tables.
Situations for chi square test
A. Test of association between two events in binomial samples
Two events can often be studied for their association, such as
cholesterol and coronary disease,
nutrition and intelligence,
smoking and cancer,
treatment and outcome of disease,
alcohol and gastric ulcer,
vaccination and immunity,
weight and diabetes,
blood pressure and heart disease, and so on.
Situations for chi square test
B. Test of association between two events in
multinomial samples
It can be applied to find an association or relationship
between two discrete attributes when there are more than
two classes or groups, such as
the state of nutrition (less than 60%, 61 – 80%, 81 – 100%)
and intelligence quotient (percentage),
the number of cigarettes smoked per day (up to 10,
11 to 20, 21 to 30, and more than 30) and the incidence
of lung cancer,
the parity of the mother (1st, 2nd, 3rd, 4th, 5th and
above) and the weight of the newborn, and so on.
Requirement for Chi square test
• The frequencies used must be absolute, not relative, i.e. the data
must be in the form of counts in each of a set of categories.
• The total number of observations should exceed 50.
• The expected frequency in any one cell should not normally be
less than 5. If it is less than 5, that class is pooled with the
preceding or succeeding class so that the pooled expected frequency
is 5 or more.
• In such cases, the degrees of freedom must be adjusted
accordingly.
• All the observations must be independent of each other, i.e. one
observation must not influence another observation.
Application of Chi-square Test
Chi square = $\chi^2 = \sum \frac{(O - E)^2}{E}$
Steps
• Formulate the hypothesis
• Null hypothesis (H0): There is no significant difference between the
observed values and the corresponding expected values.
• Alternative hypothesis (H1): The difference is significant.
• Test statistics
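Let O1, O2, O3, …, On be the set of observed frequencies
and E1, E2, E3, …, En the corresponding set of expected
frequencies; then the chi-square statistic is
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \dots + \frac{(O_n - E_n)^2}{E_n}$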
Testing of goodness of fit Cont…
• Degree of freedom
Degree of freedom = n-1
• Level of significance
Choose the level of significance (α): 5%, 1%, or as given
• Tabulated value
Select the tabulated value of χ2 at the α level of significance for (n-1) d. f.
• Decision making
Compare the calculated value of χ2 with the tabulated value at (n-1) d. f.
and the given level of significance (α)
• Conclusion
If the calculated value of χ2 > the tabulated value of χ2 at the chosen
level of significance, then the null hypothesis is rejected; otherwise H0
is not rejected.
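The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `chi_square_gof` and the die-roll counts are hypothetical and not part of the lecture, and the tabulated value is passed in by hand (11.07 is the 5% value for 5 d. f.) rather than looked up.

```python
def chi_square_gof(observed, expected, critical_value):
    """Return (chi2, df, reject_h0) for a goodness-of-fit test.

    observed and expected are lists of absolute frequencies;
    critical_value is the tabulated chi-square value at (n - 1) d.f.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Test statistic: sum of (O - E)^2 / E over all classes
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    # Decision: reject H0 when the calculated value exceeds the tabulated value
    return chi2, df, chi2 > critical_value


# Hypothetical data: 60 rolls of a die; H0 says each face is equally likely.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
chi2, df, reject = chi_square_gof(observed, expected, critical_value=11.07)
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject}")
```

Here no parameter is estimated from the data, so the degrees of freedom are simply the number of classes minus one.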
Example
In a community survey, 300 families with 5
children each revealed the following distribution:
No. of Boys:      5    4    3    2     1    0
No. of Girls:     0    1    2    3     4    5
No. of Families:  12   48   86   100   44   10
Is the result consistent with the hypothesis that
the male and female births are equally probable?
(Or fit a binomial distribution and test the
goodness of fit).
Solution
Null hypothesis (H0): male and female births are equally probable,
i.e. the data come from a binomial distribution with p = ½.
Alternative hypothesis (H1): male and female births are not equally
probable, i.e. the data do not come from a binomial distribution
with p = ½.
Let X, the number of boys in a family, follow a binomial distribution
in which the birth of a boy is counted as a success.
Then p = P(boy) = ½ and q = P(girl) = 1 - ½ = ½.
N = 300, n = 5, and x = 0, 1, 2, 3, 4, 5
Expected frequency (E) = N P(X = x)
P(X = x) = probability of x male births
$P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$
$E = 300 \cdot {}^{5}C_{x} \left(\tfrac{1}{2}\right)^{x} \left(\tfrac{1}{2}\right)^{5-x}$
Solution Cont…
Calculation of expected frequency
No. of boys (x)   P(X = x) = 5Cx (½)^x (½)^(5-x)    E = 300 × P(X = x)
0                 5C0 (½)^0 (½)^5 = 0.03125         300 × 0.03125 = 9.375
1                 5C1 (½)^1 (½)^4 = 0.15625         300 × 0.15625 = 46.875
2                 5C2 (½)^2 (½)^3 = 0.3125          300 × 0.3125 = 93.75
3                 5C3 (½)^3 (½)^2 = 0.3125          300 × 0.3125 = 93.75
4                 5C4 (½)^4 (½)^1 = 0.15625         300 × 0.15625 = 46.875
5                 5C5 (½)^5 (½)^0 = 0.03125         300 × 0.03125 = 9.375
Solution Cont…
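Calculation of χ2
No. of boys   Observed frequency (O)   E = 300 P(X = x)   O – E    (O – E)^2   (O – E)^2 / E
0             10                       9.375              0.625    0.391       0.042
1             44                       46.875             -2.875   8.266       0.176
2             100                      93.75              6.25     39.063      0.417
3             86                       93.75              -7.75    60.063      0.641
4             48                       46.875             1.125    1.266       0.027
5             12                       9.375              2.625    6.891       0.735
Total         300                      300                                     2.04
Solution Cont…
Here, test statistic, $\chi^2 = \sum \frac{(O - E)^2}{E} = 2.04$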
The degree of freedom = n – 1 = 6 – 1 = 5
Level of significance, α = 5% = 0.05
Critical value: the tabulated value at 5 d. f. at 5%
level of significance = 11.07
Decision: The calculated value of chi square is less
than the tabulated value, so the null hypothesis is
not rejected.
Conclusion: Since we do not reject the null hypothesis,
the data are consistent with a binomial distribution with p = ½.
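The arithmetic of this example can be checked with a short Python sketch. The variable names are illustrative assumptions; the counts, p = ½ and the tabulated value 11.07 are taken from the example above.

```python
from math import comb

families = 300   # N
children = 5     # n
p_boy = 0.5      # p under H0

# Observed number of families, keyed by number of boys
observed = {0: 10, 1: 44, 2: 100, 3: 86, 4: 48, 5: 12}

# Expected frequency E = N * nCx * p^x * q^(n-x)
expected = {
    x: families * comb(children, x) * p_boy**x * (1 - p_boy)**(children - x)
    for x in observed
}

chi2 = sum((observed[x] - expected[x]) ** 2 / expected[x] for x in observed)
df = len(observed) - 1   # 6 classes, p fixed by H0, so 5 d.f.
critical = 11.07         # tabulated value at 5 d.f., 5% level

for x in sorted(observed):
    print(x, observed[x], expected[x])
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 > critical}")
```

Because p = ½ is specified by the hypothesis and not estimated from the data, no extra degree of freedom is lost.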
Example
A. S. Parks gives the following distribution of the number of
males in litters of piglets.
No. of males 0 1 2 3 4 5 6 7 8
Frequency 1 8 37 81 162 77 20 5 1
Calculate the theoretical frequencies on the
hypothesis that the probability of a piglet being
male is 1/2. Group together the first two and the last
two frequencies; how many degrees of freedom
remain? What is the value of χ2? Comment upon
the closeness of fit.
Solution
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Solution Cont…
Calculation of χ2
No. of males     Observed frequency (O)   Expected frequency (E)   O – E     (O – E)^2   (O – E)^2 / E
0                1                        1.53
1                8                        12.24
0 – 1 (pooled)   9                        13.77                    -4.77     22.753      1.652
2                37                       42.84                    -5.84     34.106      0.796
3                81                       85.68                    -4.68     21.902      0.256
4                162                      107.1                    54.9      3014.01     28.142
5                77                       85.68                    -8.68     75.342      0.879
6                20                       42.84                    -22.84    521.666     12.177
7                5                        12.24
8                1                        1.53
7 – 8 (pooled)   6                        13.77                    -7.77     60.373      4.384
Total            ∑O = 392                 ∑E = 392                                       48.286
Solution Cont…
Degrees of freedom: pooling the first two and the last two of the
9 classes leaves 7 classes, so d. f. = 7 – 1 = 6
Level of significance: alpha = 5% = 0.05
Critical value: the tabulated value of chi square
at 5% level of significance for 6 d. f. is 12.592
Decision: Since the calculated value of chi square is
greater than the tabulated value, the null hypothesis
is rejected.
Conclusion: The data do not fit a binomial distribution with
p = ½; male and female piglet births are not equally probable.
Example-1
Records taken of the number of male and
female births in 800 families having four
children are given below:
No. of female births: 4 3 2 1 0
No. of male births: 0 1 2 3 4
No. of families: 32 178 290 236 64
Test whether the data are consistent with the
hypothesis that the binomial law holds and the
chance of a male birth is equal to that of a female
birth.
Example-2
Fit the Poisson distribution and test the
goodness of fit for the following data with
respect to number of RBC.
RBC per cell: 0 1 2 3 4 5
No. of cells: 142 156 69 27 5 1
B. Testing of Association between two
categorical variables
Another important application of the chi-square test is to test
the null hypothesis that two criteria of classification, when
applied to the same set of entities, are independent. For
example,
a physician is interested in determining whether there is a
relationship between diabetic cholesterol and diabetic
complications,
a dentist is interested in the association between dental
caries and the amount of sweets consumed,
and there are many other examples where the association or
independence between attributes is tested.
Testing of Association between two
categorical variables
$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} = \frac{R \times C}{N}$
Inoculated 10 90 100
O       E       (O – E)^2 / E
10      18      3.556
90      82      0.780
26      18      3.556
74      82      0.780
Total           8.672
Alternative formula
This method can be used only for a 2 × 2 contingency table.
a      b      a+b
c      d      c+d
a+c    b+d    N = a+b+c+d
$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{N (ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$
Example
Apply the chi-square test to assess the effectiveness of the
vaccine. The following results were obtained:
Vaccine Results
Died Survived Total
Given 2 10 12
Not given 6 6 12
Total 8 16 24
Solution
Null hypothesis: Vaccine is not effective in
controlling disease.
Alternative hypothesis: Vaccine is effective in
controlling disease.
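Here an expected frequency is less than 5 (for example 12 × 8 / 24 = 4),
so we should apply Yates' correction with the formula
$\chi^2 = \frac{N \left( |ad - bc| - \frac{N}{2} \right)^2}{(a+b)(c+d)(a+c)(b+d)}$
Solution Cont…
Here a = 2, b = 10, c = 6, d = 6, N = 24
Then, $\chi^2 = \frac{24 \left( |2 \times 6 - 10 \times 6| - 12 \right)^2}{12 \times 12 \times 8 \times 16} = \frac{24 \times 36^2}{18432}$
= 1.6875 ≈ 1.69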
Solution Cont…
Level of significance = 5% = 0.05
Degree of freedom = (2-1) (2-1) = 1
Critical value: The tabulated value of chi square at 1
degree of freedom at 5% level of significance = 3.84.
Decision: Here the calculated value of chi square is
less than the tabulated value, so we do not reject the null
hypothesis.
Conclusion: The data do not provide evidence that the vaccine
is effective in controlling the disease.
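The 2 × 2 shortcut calculation, with and without Yates' correction, can be sketched in Python as follows. The function name `chi_square_2x2` is a hypothetical helper introduced here for illustration; the cell counts are those of the vaccine example.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table [[a, b], [c, d]] via the shortcut formula.

    With yates=True, N/2 is subtracted from |ad - bc| before squaring
    (never going below zero), which is Yates' continuity correction.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom


# Vaccine example: given (2 died, 10 survived), not given (6 died, 6 survived)
chi2_plain = chi_square_2x2(2, 10, 6, 6)
chi2_yates = chi_square_2x2(2, 10, 6, 6, yates=True)
print(f"uncorrected: {chi2_plain:.4f}, Yates-corrected: {chi2_yates:.4f}")
print("reject H0 at 5% (1 d.f.):", chi2_yates > 3.84)
```

The correction always reduces the statistic, which makes the test more conservative when expected frequencies are small.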
Example
Sternal surgical site infection (SSI) after coronary artery
bypass graft surgery is a complication that increases patient
morbidity and costs for patients, payers, and the health care
system. Researchers performed a study that examined two
types of preoperative skin preparation before performing
open heart surgery. These two preparations used aqueous
iodine and insoluble iodine with the following results.
Comparison of Aqueous and Insoluble Preps
Prep Group          Infected   Not Infected
Aqueous iodine      14         94
Insoluble iodine    4          97
Do these data provide sufficient evidence at the α = 5% level
to justify the conclusion that the type of skin preparation
and infection are related?
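One way to approach a question of this kind is to compute the expected frequency of each cell from the row and column totals and then sum (O − E)²/E. The sketch below does this for a general r × c table; the function name `chi_square_independence` is an assumption made for illustration, no continuity correction is applied, and 3.84 is the tabulated 5% value for 1 d. f.

```python
def chi_square_independence(table):
    """Return (chi2, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count = (row total x column total) / grand total
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: aqueous iodine, insoluble iodine; columns: infected, not infected
chi2, df = chi_square_independence([[14, 94], [4, 97]])
print(f"chi2 = {chi2:.3f}, df = {df}, exceeds 3.84: {chi2 > 3.84}")
```

All four expected frequencies in this table are above 5, so the uncorrected statistic is a reasonable choice here.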
Fisher Exact Test
Sometimes we have data that can be
summarized in a 2 x 2 contingency table, but
these data are derived from very small samples.
The chi-square test is not an appropriate
method of analysis if the minimum expected
frequency requirements are not met.
For example, if n is less than 50 and two or more
of the expected frequencies are less than 5, the
chi-square test should be avoided.
Fisher Exact Test Cont…
The Fisher exact test is a test that may be
used when the size requirements of the
chi-square test are not met.
It is called exact because, if desired, it
permits us to calculate the exact
probability of obtaining the observed
results or results that are more extreme.
Assumptions
1. The samples are random and independent.
2. The measurement scale is nominal or ordinal but of
dichotomous nature.
3. Each observation can be categorized as one of two
mutually exclusive types.
A 2 x 2 Contingency Table for the Fisher Exact Test
Attribute B
Attribute A Total
B1 B2
A1 a b a+b
A2 c d c+d
Total a+c b+d a+b+c+d=n
If more than 20% of the expected cell frequencies are < 5, use Fisher's exact test.
Cont….
Yes 2 7 9
No 8 2 10
Total 10 9 19
Solution
Solution (cont…)
H0: The proportion of patients in both medical condition
characteristics X and Y are equal.
H1: The proportion of patients in both medical condition
characteristics X and Y are not equal.
pk = (9!*10!*9!*10!) / (19!*2!*7!*8!*2!) = 0.01754
pk-1 = (9!*10!*9!*10!) / (19!*1!*8!*9!*1!) = 0.00097
pk-2 = (9!*10!*9!*10!) / (19!*0!*9!*10!*0!) = 0.00001
P-value: P = 0.01754 + 0.00097 + 0.00001
= 0.01852
Level of significance = 5% = 0.05
Decision: As P < α, the null hypothesis is rejected.
Conclusion: The proportion of patients in both medical condition
characteristics X and Y are not equal.
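The probabilities in this solution can be reproduced with a minimal Python sketch. The helper names are hypothetical, and the sketch assumes, as the solution does, that "more extreme" tables are obtained by lowering the first cell one step at a time while keeping all marginal totals fixed.

```python
from math import comb


def table_probability(a, b, c, d):
    """Exact (hypergeometric) probability of one 2 x 2 table with fixed margins.

    Equivalent to (a+b)!(c+d)!(a+c)!(b+d)! / (n! a! b! c! d!).
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_one_sided(a, b, c, d):
    """Sum the probabilities of the observed table and of tables with smaller a."""
    total = 0.0
    while a >= 0 and d >= 0:
        total += table_probability(a, b, c, d)
        # Move to the next more extreme table; the margins stay the same
        a, b, c, d = a - 1, b + 1, c + 1, d - 1
    return total


# Observed table: Yes (2, 7), No (8, 2)
p_observed = table_probability(2, 7, 8, 2)
p_value = fisher_one_sided(2, 7, 8, 2)
print(f"p(observed) = {p_observed:.5f}, P = {p_value:.5f}")
print("reject H0 at 5%:", p_value < 0.05)
```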
Example
Suppose we carry out a clinical trial and randomly
allocate 6 patients to treatment A and 6 to treatment B.
The outcome is as follows:
Treatment type   Survived   Died   Total
A                3          3      6
B                5          1      6
Total            8          4      12
Test the hypothesis that there is no association between
treatment and survival at 5% level of significance.
McNemar's test
When categorical data are paired and the
outcome of interest is a proportion, the
McNemar test is used to evaluate hypotheses
about the data.
It is sometimes called the McNemar chi-square
test because the test statistic has a chi-square
distribution.
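Summarizing the Data
                     After treatment
Before treatment     + (-)     - (+)
+                    a         b
-                    c         d
Test statistic: $\chi^2 = \frac{(b - c)^2}{b + c}$
or, with continuity correction,
$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$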
• Degree of freedom = (c-1)(r-1) = 1
• If the calculated chi square is greater than or
equal to the tabulated value of chi square at the
alpha level of significance for 1 d.f., the null
hypothesis is rejected.
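A minimal Python sketch of the McNemar statistic is given below, assuming the common textbook form that uses only the discordant cells b and c of the paired table. The function name `mcnemar_chi2` and the counts b = 15, c = 5 are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def mcnemar_chi2(b, c, continuity=True):
    """McNemar chi-square from the two discordant cell counts b and c.

    With continuity=True the statistic is (|b - c| - 1)^2 / (b + c);
    otherwise it is (b - c)^2 / (b + c). Both are referred to 1 d.f.
    """
    if b + c == 0:
        raise ValueError("the test is undefined when there are no discordant pairs")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


# Hypothetical discordant counts (not taken from the lecture)
chi2_corrected = mcnemar_chi2(15, 5)
chi2_plain = mcnemar_chi2(15, 5, continuity=False)
print(f"corrected: {chi2_corrected:.2f}, uncorrected: {chi2_plain:.2f}")
print("reject H0 at 5% (1 d.f.):", chi2_corrected >= 3.84)
```

The concordant cells a and d do not enter the statistic, because pairs with the same outcome carry no information about a change.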
Example (cont…)
Breast cancer patients receiving mastectomy followed
by chemotherapy were matched to each other on age
and cancer stage. By random assignment, one patient in
each matched pair received chemo perioperatively and
for an additional 6 months, while the other patient in
each matched pair received chemo perioperatively only.
                          Periop. only
Periop. + 6 Months        Survived 5 years    Died within 5 years
Survived 5 years          512                 17
Time to peak (hr), A: 2.5 3 1.25 1.75 3.5 2.5 1.75 2.25 3.5 2.5 2 3.5
17 0 1 10/10 10/10 0
Solution Cont…
Here n1 = n2 = 10
The test statistics is
$D_{n_1,n_2} = \max_x \left| F_{n_1}(x) - F_{n_2}(x) \right| = 6/10 = 0.6$
Critical value: Value of D for n1 = n2 = 10 at 5% level of
significance for two tail test is 0.294.
Decision: The calculated value is greater than the tabulated
value, so the null hypothesis is rejected.
Conclusion: Since the null hypothesis is rejected, we can
conclude that the distributions of lifetimes of
electric bulbs of the two brands are different.
LARGE SAMPLE SIZE
n1 + n2 > 40 and both samples are larger than 20, i.e. n1 > 20
and n2 > 20 (whether n1 = n2 or n1 ≠ n2).
For two large samples of equal or unequal size,
the K-S test statistic for a two-tailed test is given by
$D_{n_1,n_2} = \max_x \left| F_{n_1}(x) - F_{n_2}(x) \right|$
But for a one-tailed test, the K-S test
statistic is calculated using chi-square, which
can be derived as $\chi^2 = 4 (D_{n_1,n_2})^2 \frac{n_1 n_2}{n_1 + n_2}$,
which follows a chi-square distribution with 2 degrees of
freedom.
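The two-sample K-S statistic and its chi-square approximation can be sketched in Python as follows. This assumes the usual large-sample form χ2 = 4D² n1n2/(n1 + n2); the function names and the two tiny samples are hypothetical. The samples are far too small for the large-sample approximation and are used only so that the arithmetic can be checked by hand.

```python
def ks_two_sample_d(sample1, sample2):
    """Maximum absolute difference between two empirical cumulative distributions."""
    n1, n2 = len(sample1), len(sample2)
    d_max = 0.0
    # The difference can only change at observed values, so check each of them
    for x in sorted(set(sample1) | set(sample2)):
        f1 = sum(v <= x for v in sample1) / n1
        f2 = sum(v <= x for v in sample2) / n2
        d_max = max(d_max, abs(f1 - f2))
    return d_max


def ks_chi2_approx(d, n1, n2):
    """One-tailed large-sample approximation, referred to chi-square with 2 d.f."""
    return 4 * d ** 2 * n1 * n2 / (n1 + n2)


# Hypothetical samples (illustration only)
sample_a = [1, 2, 3, 4, 5]
sample_b = [3, 4, 5, 6, 7]
d = ks_two_sample_d(sample_a, sample_b)
chi2 = ks_chi2_approx(d, len(sample_a), len(sample_b))
print(f"D = {d:.2f}, chi2 = {chi2:.2f} (2 d.f.)")
```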