Non-parametric Test
J P Singh
Biostatistics
Introduction
• In inferential statistics there are two types of test:
o Parametric tests
o Non-parametric tests
• Parametric tests assume that the nature of the parent population, in particular the form of its distribution, is known.
• When the nature of the population from which the samples are drawn is not known, parametric tests cannot be relied upon.
• In such situations non-parametric tests are used.
• These tests are also called distribution-free tests.
• Most non-parametric tests are applicable to data measured on an ordinal or nominal scale.
Non-parametric tests are defined as those statistical tests of hypothesis in which no population parameter is involved and which are based on order statistics.
Difference Between Parametric and Non-parametric Tests
Parametric Test
1. Tests whose models specify certain conditions about the parameters of the population from which the sample has been drawn.
2. Tests are designed to test statistical hypotheses about one or more parameters of the population.
3. Tests are used for data measured on an interval or ratio scale.
4. Tests are more powerful when their assumptions hold.
Non-parametric Test
1. Tests whose models do not specify any condition about the parameters of the population from which the sample has been drawn.
2. Tests are designed only to test statistical hypotheses that do not involve any parameter of the population.
3. Tests are used for data measured on a nominal or ordinal scale.
4. Tests are less powerful than parametric tests.
Chi-square Distribution
Characteristics
• Every χ² distribution extends indefinitely to the right from 0.
• Every χ² distribution has only one (right) tail.
• As df increases, the χ² curves become more bell shaped and approach the normal curve in appearance (but remember that a chi-square curve starts at 0, not at −∞).
Chi-square Distribution
Introduction
• A chi-square (χ²) distribution is a probability distribution.
• The chi-square test is useful for making statistical inferences about categorical data with two or more categories.
• The chi-square (χ²) test is the most popular hypothesis testing method for discrete data.
• It is a non-parametric test of statistical significance for bivariate tabular analysis with a contingency table.
Introduction Cont…
• Chi-square helps us analyze data that come in the form of counts, such as 25 females and 12 males enrolled in a clinical study.
• The most common application of chi-square is to determine whether a significant difference exists between the observed counts of cases falling into each category and the expected counts based on the null hypothesis.
• The chi-square statistic is widely used in reports of clinical trials, observational studies, case-control studies, laboratory studies, etc.
• It is used most frequently to test the statistical significance of results reported in bivariate tables.
Situations for chi square test
A. Test of association between two events in binomial samples
Two events can often be studied for their association, such as
• cholesterol and coronary disease,
• nutrition and intelligence,
• smoking and cancer,
• treatment and outcome of disease,
• alcohol and gastric ulcer,
• vaccination and immunity,
• weight and diabetes,
• blood pressure and heart disease, and so on.
Situations for chi square test
B. Test of association between two events in multinomial samples
It can be applied to find an association or relationship between two discrete attributes when there are more than two classes or groups, such as
• state of nutrition (less than 60%, 61–80%, 81–100%) and intelligence quotient (percentage),
• number of cigarettes smoked per day (10 or fewer, 11 to 20, 21 to 30, more than 30) and the incidence of lung cancer,
• parity of the mother (1st, 2nd, 3rd, 4th, 5th and above) and the weight of the newborn, and so on.
Requirement for Chi square test
• The frequencies used must be absolute, not relative, i.e. the data must be in the form of frequencies counted in each of a set of categories.
• The total number of observations should exceed 50.
• The expected frequency in any one cell should not normally be less than 5. If it is less than 5, the cell is pooled with the preceding or succeeding cell so that the pooled expected frequency is 5 or more.
• In such cases, the degrees of freedom must be adjusted accordingly.
• All the observations must be independent of each other, i.e. one observation must not influence another.
Application of Chi-square Test
• Testing the Goodness of Fit
• Testing of Independence Between Attributes
• Testing of Homogeneity
Chi Square test of Goodness of Fit
• The chi-square test is applied as a test of "goodness of fit".
• It indicates how close the observed frequencies are to the expected frequencies, i.e. it tests whether the difference between the theoretical and observed values is due to chance alone or to some other factor.
• Chi-square is used to test the fit of an observed frequency distribution to a theoretical distribution such as the Binomial, Poisson or Normal.
• The test determines whether an observed frequency distribution differs from the theoretical distribution by chance or because the sample is drawn from a different population.
Definition
A statistic which measures the discrepancy between n observed frequencies O1, O2, ..., On and the corresponding expected frequencies E1, E2, ..., En:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
The sampling distribution of the chi-square statistic is known as the chi-square distribution. As with t distributions, there is a different χ² distribution for each value of the degrees of freedom, but all of them share the characteristics listed above under "Chi-square Distribution".
A. Testing of goodness of fit
Steps
• Formulate the hypothesis
• Null hypothesis (H0): There is no significant difference between the observed values and the corresponding expected values.
• Alternative hypothesis (H1): The difference is significant.
• Test statistic
Let O1, O2, O3, ..., On be the set of observed frequencies and E1, E2, E3, ..., En the corresponding set of expected frequencies; then the chi-square statistic is
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \cdots + \frac{(O_n - E_n)^2}{E_n}$$
Testing of goodness of fit Cont…
• Degrees of freedom
Degrees of freedom = n − 1
• Level of significance
Choose a level of significance of 5% or 1%, or use the level given
• Tabulated value
Select the tabulated value of χ² at the α level of significance for (n − 1) d.f.
• Decision making
Compare the calculated value of χ² with the tabulated value for (n − 1) d.f. at the given level of significance (α)
• Conclusion
If the calculated value of χ² > the tabulated value of χ² at the chosen level of significance, the null hypothesis is rejected; otherwise H0 is accepted.
Example
A community survey of 300 families with 5 children each revealed the following distribution:
No. of Boys:       5    4    3    2     1    0
No. of Girls:      0    1    2    3     4    5
No. of Families:   12   48   86   100   44   10
Is the result consistent with the hypothesis that male and female births are equally probable? (Or: fit a binomial distribution and test the goodness of fit.)
Solution
Null hypothesis (H0): male and female births are equally likely, i.e. the data come from a binomial distribution with p = ½.
Alternative hypothesis (H1): male and female births are not equally likely, i.e. the data do not come from this binomial distribution.
Let X, the number of boys in a family, follow a binomial distribution in which the birth of a boy counts as a success.
Then p = P(boy) = ½ and q = P(girl) = 1 − ½ = ½.
N = 300, n = 5, and x = 0, 1, 2, 3, 4, 5
Expected frequency E = N · P(X = x), where P(X = x) is the probability of x male births:
$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \qquad E = 300 \binom{5}{x} \left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^{5-x}$$
Solution Cont…
Calculation of expected frequency
No. of boys (x)   P(X = x) = 5Cx (½)^x (½)^(5-x)    E = 300 × P(X = x)
0                 5C0 (½)^0 (½)^5 = 0.03125         300 × 0.03125 = 9.375
1                 5C1 (½)^1 (½)^4 = 0.15625         300 × 0.15625 = 46.875
2                 5C2 (½)^2 (½)^3 = 0.3125          300 × 0.3125 = 93.75
3                 5C3 (½)^3 (½)^2 = 0.3125          300 × 0.3125 = 93.75
4                 5C4 (½)^4 (½)^1 = 0.15625         300 × 0.15625 = 46.875
5                 5C5 (½)^5 (½)^0 = 0.03125         300 × 0.03125 = 9.375
Solution Cont…
Calculation of χ²
No. of boys   Observed frequency (O)   E = 300 P(X = x)   O − E    (O − E)²   (O − E)²/E
0             10                       9.375              0.63     0.39       0.04
1             44                       46.875             -2.88    8.27       0.18
2             100                      93.75              6.25     39.1       0.42
3             86                       93.75              -7.75    60.1       0.64
4             48                       46.875             1.13     1.27       0.03
5             12                       9.375              2.63     6.89       0.74
Total         ∑O = 300                 ∑E = 300                               ∑(O − E)²/E = 2.04
Solution Cont…
Here, the test statistic is χ² = ∑(O − E)²/E = 2.04
Degrees of freedom = n − 1 = 6 − 1 = 5
Level of significance, α = 5% = 0.05
Critical value: the tabulated value for 5 d.f. at the 5% level of significance = 11.07
Decision: Since the calculated value of chi-square is less than the tabulated value, the null hypothesis is accepted.
Conclusion: Since we accept the null hypothesis, the data are consistent with the binomial distribution with p = ½.
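The calculation in this example can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library; the function names `binomial_expected` and `chi_square_statistic` are made up for this illustration and do not come from the slides.

```python
from math import comb

def binomial_expected(total, n_trials, p):
    """Expected frequency of x successes (x = 0..n_trials) under Binomial(n_trials, p)."""
    q = 1.0 - p
    return [total * comb(n_trials, x) * p**x * q**(n_trials - x)
            for x in range(n_trials + 1)]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Number of families with 0, 1, ..., 5 boys (from the example above)
observed = [10, 44, 100, 86, 48, 12]
expected = binomial_expected(total=sum(observed), n_trials=5, p=0.5)
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1          # p is specified by H0, so only 1 d.f. is lost

print(expected)                 # expected frequencies for x = 0..5
print(round(chi2, 2), df)       # compare chi2 with the tabulated value for df d.f.
```

The computed statistic is then compared with the tabulated value (11.07 for 5 d.f. at the 5% level) exactly as in the decision step.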
Example
A. S. Parks gives the distribution of the number of males in litters of piglets.
No. of males   0   1   2    3    4     5    6    7   8
Frequency      1   8   37   81   162   77   20   5   1
Calculate the theoretical frequencies on the hypothesis that the probability of a piglet being male is 1/2. Group together the first two and the last two frequencies; how many degrees of freedom remain? What is the value of χ²? Comment upon the closeness of fit.
Solution
Let X be the number of male piglets in a litter, a random variable which follows the binomial distribution.
Total number of observations (N) = 392
Number of trials (n) = 8
Probability of success (p) = ½
Probability of failure (q) = 1 − p = 1 − ½ = ½
$$P(X = x) = \binom{n}{x} p^x q^{n-x}$$
$$E = 392 \binom{8}{x} \left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^{8-x} = 392 \binom{8}{x} \left(\tfrac{1}{2}\right)^8 = \frac{392}{256} \binom{8}{x} \approx 1.53 \binom{8}{x}$$
Solution Cont…
The expected frequencies for x = 0, 1, 2, 3, 4, 5, 6, 7 and 8 successes are
f(0) = 1.53 × 8C0 = 1.53      f(1) = 1.53 × 8C1 = 12.24
f(2) = 1.53 × 8C2 = 42.84     f(3) = 1.53 × 8C3 = 85.68
f(4) = 1.53 × 8C4 = 107.1     f(5) = 1.53 × 8C5 = 85.68
f(6) = 1.53 × 8C6 = 42.84     f(7) = 1.53 × 8C7 = 12.24
f(8) = 1.53 × 8C8 = 1.53
Solution Cont…
Null hypothesis (H0): the proportions of male and female piglet births are equal.
Alt. hypothesis (H1): the proportions of male and female piglet births are not equal.
Test statistic: Under H0 the chi-square statistic is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Solution Cont…
Calculation of χ²
No. of males   Observed frequency (O)   Expected frequency (E)   O − E     (O − E)²   (O − E)²/E
0 and 1        1 + 8 = 9                1.53 + 12.24 = 13.77     -4.77     22.753     1.652
2              37                       42.84                    -5.84     34.106     0.796
3              81                       85.68                    -4.68     21.902     0.256
4              162                      107.1                    54.9      3014.01    28.142
5              77                       85.68                    -8.68     75.342     0.879
6              20                       42.84                    -22.84    521.666    12.177
7 and 8        5 + 1 = 6                12.24 + 1.53 = 13.77     -7.77     60.373     4.384
Total          ∑O = 392                 ∑E ≈ 392                                      ∑(O − E)²/E = 48.286
Solution Cont…
Degrees of freedom: after pooling there are 9 − 2 = 7 classes, so d.f. = 7 − 1 = 6
Level of significance: α = 5% = 0.05
Critical value: the tabulated value of chi-square at the 5% level of significance for 6 d.f. is 12.592
Decision: Since the calculated value of chi-square is greater than the tabulated value, the null hypothesis is not accepted.
Conclusion: The fit to a binomial distribution with p = ½ is poor, so the proportions of male and female piglet births cannot be taken as equal.
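The pooling of the end classes can also be sketched in Python. This is a minimal illustration, not the slides' own method of calculation: the helper `pool_ends` is a hypothetical name, the observed counts are taken from the problem statement (20 litters with six males), and the expected frequencies are left unrounded, so the result can differ somewhat from a hand calculation with rounded values.

```python
from math import comb

# Litters with 0, 1, ..., 8 males, as listed in the problem statement
observed = [1, 8, 37, 81, 162, 77, 20, 5, 1]
N, n, p = sum(observed), 8, 0.5

# Binomial expected frequencies E = N * nCx * p^x * q^(n-x)
expected = [N * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def pool_ends(freqs):
    """Merge the first two and the last two classes, as the exercise asks."""
    return [freqs[0] + freqs[1]] + list(freqs[2:-2]) + [freqs[-2] + freqs[-1]]

obs_pooled = pool_ends(observed)
exp_pooled = pool_ends(expected)

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1        # 7 classes remain after pooling

print(obs_pooled, [round(e, 2) for e in exp_pooled])
print(round(chi2, 2), df)       # compare with the tabulated value for df d.f.
```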
Example-1
Records of the number of male and female births in 800 families having four children are given below:
No. of female births:   4    3     2     1     0
No. of male births:     0    1     2     3     4
No. of families:        32   178   290   236   64
Test whether the data are consistent with the hypothesis that the binomial law holds and that the chance of a male birth is equal to that of a female birth.
Example-2
Fit the Poisson distribution and test the
goodness of fit for the following data with
respect to number of RBC.
RBC per cell: 0 1 2 3 4 5
No. of cells: 142 156 69 27 5 1
B. Testing of Association between two categorical variables
Another important application of the chi-square test is to test the null hypothesis that two criteria of classification, when applied to the same set of entities, are independent. For example,
• a physician is interested in determining whether there is a relationship between cholesterol levels in diabetic patients and diabetic complications,
• a dentist is interested in the association between dental caries and the amount of sweets consumed.
There are many such examples where the association or independence between attributes is tested.
Testing of Association between two categorical variables
The classifications according to the two criteria are placed in a two-way table, with the categories of one criterion in the rows and those of the other in the columns. Such a table is called a contingency table.
Steps
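Null hypothesis (H0): The two attributes are independent, i.e. there is no association between the two attributes.
Alternative hypothesis (H1): The two attributes are dependent.
Statistical test
First, the expected frequencies are computed under the null hypothesis. For any cell, the expected frequency is
$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} = \frac{R_i \times C_j}{N}$$
where R_i is the total of the row and C_j the total of the column containing the cell, and N is the grand total.
Degrees of freedom = (r − 1)(c − 1), where r = no. of rows and c = no. of columns.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The other steps are the same as above.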

Example
From the following data, test whether inoculation is effective in preventing tuberculosis.
Group            Attacked   Not attacked   Total
Inoculated       10         90             100
Not inoculated   26         74             100
Total            36         164            200

Solution
Null hypothesis: Inoculation is not effective in preventing tuberculosis.
Alternative hypothesis: Inoculation is effective in preventing tuberculosis.
Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
For O = 10, E = (100 × 36)/200 = 18; for O = 90, E = (100 × 164)/200 = 82
For O = 26, E = (100 × 36)/200 = 18; for O = 74, E = (100 × 164)/200 = 82

Solution Cont…
O       E     (O − E)²/E
10      18    3.556
90      82    0.780
26      18    3.556
74      82    0.780
Total         8.672
Level of significance = 5% = 0.05
Degrees of freedom = (2 − 1)(2 − 1) = 1
The tabulated value of chi-square for 1 degree of freedom at the 5% level of significance = 3.84.
Decision: Here the calculated value (8.672) is greater than the tabulated value, so we do not accept the null hypothesis.
Conclusion: Inoculation is effective in preventing tuberculosis.
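As an illustration, the expected frequencies and the chi-square statistic for any r × c contingency table can be computed from the marginal totals. The sketch below is plain Python; `chi_square_independence` is a name chosen here for illustration, not something defined in the slides.

```python
def chi_square_independence(table):
    """Chi-square test of independence for an r x c table of observed counts.

    Returns the statistic, the degrees of freedom and the expected frequencies,
    where E = (row total x column total) / grand total for each cell.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            exp_row.append(e)
            chi2 += (o - e) ** 2 / e
        expected.append(exp_row)

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Inoculation example: rows = inoculated / not inoculated,
# columns = attacked / not attacked
table = [[10, 90], [26, 74]]
chi2, df, expected = chi_square_independence(table)
print(expected)             # expected frequencies per cell
print(round(chi2, 3), df)   # compare chi2 with the tabulated value 3.84 for 1 d.f.
```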
Example
In 1992, the U.S. Public Health Service and the Centers for Disease Control and Prevention recommended that all women of childbearing age consume 400 µg (0.4 mg) of folic acid daily to reduce the risk of having a pregnancy that is affected by a neural tube defect such as spina bifida or anencephaly. In a study, 693 pregnant women called a teratology information service about their use of folic acid supplementation. The researchers wished to determine whether preconceptional use of folic acid and race are independent.
Race of Pregnant Caller and Use of Folic Acid
        Preconceptional Use of Folic Acid
Race    Yes   No    Total
White   260   299   559
Black   15    41    56
Other   7     14    21
Total   282   354   636
Solution
Null hypothesis: Race and preconceptional
use of folic acid are independent.
Alternative hypothesis: The two variables are
not independent.
………………………………………………………………………
……………………………………………………………………..
Alternative formula
This method can be used only for a 2 × 2 contingency table.
a       b       a + b
c       d       c + d
a + c   b + d   N = a + b + c + d
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$
Decision: If the calculated value of χ² > the tabulated value for (r − 1)(c − 1) d.f. at the α level of significance, we reject the null hypothesis; otherwise we accept H0.
Example
In an air pollution study, a random sample of 200 households was selected from each of two communities. A respondent in each household was asked whether or not anyone in the household was bothered by air pollution. The responses were as follows:
            Any member bothered by air pollution
Community   Yes   No    Total
I           43    157   200
II          81    119   200
Total       124   276   400
Can the researcher conclude that the two communities are homogeneous with respect to the variable of interest?
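For a 2 × 2 table the shortcut formula χ² = N(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)] gives the statistic directly from the four cell counts. A minimal Python sketch, applied to the air pollution table above (the function name `chi_square_2x2` is only illustrative):

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Community I: 43 yes, 157 no; Community II: 81 yes, 119 no
chi2 = chi_square_2x2(43, 157, 81, 119)
print(round(chi2, 3))   # compare with the tabulated value 3.84 for 1 d.f. at the 5% level
```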
Yates Continuity Correction
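Yates proposed a continuity correction for 2 × 2 contingency tables, applied when the expected frequency of any cell is small (less than 5).
$$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E} = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a + b)(c + d)(a + c)(b + d)}$$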
Example
Apply the chi-square test to assess the effectiveness of the vaccine. The following results were obtained:
Vaccine     Died   Survived   Total
Given       2      10         12
Not given   6      6          12
Total       8      16         24
Solution
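Null hypothesis: The vaccine is not effective in controlling the disease.
Alternative hypothesis: The vaccine is effective in controlling the disease.
Here some expected frequencies are less than 5 (for the "Died" cells, E = 12 × 8 / 24 = 4), so we should apply Yates' correction with the formula
$$\chi^2 = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a + b)(c + d)(a + c)(b + d)}$$
Solution Cont…
Here a = 2, b = 10, c = 6, d = 6, N = 24
Then,
$$\chi^2 = \frac{24\left(|2 \times 6 - 10 \times 6| - 12\right)^2}{12 \times 12 \times 8 \times 16} = \frac{24 \times 36^2}{18432} \approx 1.69$$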
Solution Cont…
Level of significance = 5% = 0.05
Degree of freedom = (2-1) (2-1) = 1
Critical value: The tabulated value of chi-square for 1 degree of freedom at the 5% level of significance = 3.84.
Decision: Since the calculated value of chi-square is less than the tabulated value, we do not reject the null hypothesis.
Conclusion: The data do not show that the vaccine is effective in controlling the disease.
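The Yates-corrected statistic for a 2 × 2 table, N(|ad − bc| − N/2)² / [(a + b)(c + d)(a + c)(b + d)], can be checked with a short Python sketch. The function name `chi_square_yates` is an illustrative choice, not part of the slides.

```python
def chi_square_yates(a, b, c, d):
    """Chi-square with Yates continuity correction for a 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Vaccine example: given (2 died, 10 survived), not given (6 died, 6 survived)
chi2 = chi_square_yates(2, 10, 6, 6)
print(chi2)   # compare with the tabulated value 3.84 for 1 d.f. at the 5% level
```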
Example
Sternal surgical site infection (SSI) after coronary artery
bypass graft surgery is a complication that increases patient
morbidity and costs for patients, payers, and the health care
system. Researchers performed a study that examined two
types of preoperative skin preparation before performing
open heart surgery. These two preparations used aqueous
iodine and insoluble iodine with the following results.
Comparison of Aqueous and Insoluble Preps
Prep Group         Infected   Not Infected
Aqueous iodine     14         94
Insoluble iodine   4          97
Do these data provide sufficient evidence at the α = 5% level
to justify the conclusion that the type of skin preparation
and infection are related?
Fisher Exact Test
• Sometimes we have data that can be summarized in a 2 × 2 contingency table, but these data are derived from very small samples.
• The chi-square test is not an appropriate method of analysis if the minimum expected frequency requirements are not met.
• For example, if n is less than 50 and two or more of the expected frequencies are less than 5, the chi-square test should be avoided.
Fisher Exact Test Cont…
The Fisher exact test may be used when the size requirements of the chi-square test are not met.
It is called exact because, if desired, it permits us to calculate the exact probability of obtaining the observed results or results that are more extreme.
Assumptions
1. The samples are random and independent.
2. The measurement scale is nominal or ordinal but of
dichotomous nature.
3. Each observation can be categorized as one of two
mutually exclusive types.
A 2 × 2 Contingency Table for the Fisher Exact Test
              Attribute B
Attribute A   B1      B2      Total
A1            a       b       a + b
A2            c       d       c + d
Total         a + c   b + d   a + b + c + d = n
If more than 20% of the expected cell frequencies are less than 5, use Fisher's exact test.
Cont….
Null hypothesis (H0): P1 = P2, i.e. there is no significant difference between the proportions in the two samples.
Alternative hypothesis (H1): P1 ≠ P2, i.e. there is a significant difference between the proportions in the two samples.
Cont….
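Test statistic: the exact probability of a table with cell frequencies a, b, c, d and fixed marginal totals is
$$p = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{n!\,a!\,b!\,c!\,d!}$$
P-value: the sum of the probabilities of the observed table and of all more extreme tables with the same marginal totals.
Decision: Accept the null hypothesis if P > α; otherwise reject it.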
Example
Nineteen patients with a medical condition are assessed as to whether they do or do not possess two other characteristics, X and Y. We want a directional test of whether those showing (not showing) characteristic X also tend to show (not show) characteristic Y.
        Y
X       No   Yes   Total
Yes     2    7     9
No      8    2     10
Total   10   9     19
Solution
H0: Characteristics X and Y are not associated, i.e. the proportion of patients showing Y is the same whether or not they show X.
H1: Characteristics X and Y are associated, i.e. the proportion of patients showing Y is not the same for those who show X and those who do not.
pk = (9! × 10! × 10! × 9!) / (19! × 2! × 7! × 8! × 2!) = 0.01754
pk−1 = (9! × 10! × 10! × 9!) / (19! × 1! × 8! × 9! × 1!) = 0.00097
pk−2 = (9! × 10! × 10! × 9!) / (19! × 0! × 9! × 10! × 0!) = 0.00001
P-value: P = 0.01754 + 0.00097 + 0.00001 = 0.01852
Level of significance = 5% = 0.05
Decision: As P < α, the null hypothesis is not accepted.
Conclusion: Characteristics X and Y are associated; the proportion of patients showing Y differs between those who show X and those who do not.
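The table probabilities in this example can be reproduced with binomial coefficients, since (a+b)!(c+d)!(a+c)!(b+d)! / (n! a! b! c! d!) equals C(a+b, a) · C(c+d, c) / C(n, a+c). The Python sketch below is illustrative only: the function names are hypothetical, and the rule for "more extreme" tables (lowering cells a and d one step at a time while keeping the margins fixed) is an assumption that matches the direction used in this example.

```python
from math import comb

def table_probability(a, b, c, d):
    """Exact (hypergeometric) probability of the 2 x 2 table [[a, b], [c, d]]
    given its marginal totals."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

def fisher_one_sided(a, b, c, d):
    """Sum the probabilities of the observed table and of the more extreme
    tables obtained by reducing cells a and d (margins stay fixed)."""
    p = 0.0
    while a >= 0 and d >= 0:
        p += table_probability(a, b, c, d)
        a, b, c, d = a - 1, b + 1, c + 1, d - 1
    return p

# Observed table: X yes (2, 7), X no (8, 2)
p_observed = table_probability(2, 7, 8, 2)
p_value = fisher_one_sided(2, 7, 8, 2)
print(round(p_observed, 5), round(p_value, 5))   # compare p_value with alpha = 0.05
```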
Example
Suppose we carry out a clinical trial and randomly
allocate 6 patients to treatment A and 6 to treatment B.
The outcome is as follows:
Treatment type   Survived   Died   Total
A                3          3      6
B                5          1      6
Total            8          4      12
Test the hypothesis that there is no association between
treatment and survival at 5% level of significance.
McNemar's test
• When categorical data are paired and the outcome of interest is a proportion, the McNemar test is used to evaluate hypotheses about the data.
• It is sometimes called the McNemar chi-square test because the test statistic has a chi-square distribution.
Summarizing the Data
• Like the chi-square test, the data need to be arranged in a contingency table before calculating the McNemar statistic.
• The table will always be 2 × 2, but the cell frequencies are numbers of 'pairs', not numbers of individuals.
• Tables can be set up for the following kinds of data:
* Case-control paired data
* Twins paired data: one exposed and one unexposed
* Before-after paired data
Pair-Matched Data for Case-Control Study: outcome is exposure to some risk factor
            Control
Case        Exposed (Unexposed)   Unexposed (Exposed)
Exposed     a                     b
Unexposed   c                     d
The counts in the table for a case-control study are numbers of pairs, not numbers of individuals.
Paired Data for Before-After Counts
                   After treatment
Before treatment   + (-)   - (+)
+                  a       b
-                  c       d
The counts in the table for a before-after study are numbers of pairs of observations, which here are also numbers of individuals.
Null hypotheses for Paired Data
• The null hypothesis for case-control pair-matched data is that the proportion of subjects exposed to the risk factor is equal for cases and controls.
• The null hypothesis for twin paired data is that the proportions with the event are equal for exposed and unexposed twins.
• The null hypothesis for before-after data is that the proportion of subjects with the characteristic (or event) is the same before and after treatment.
Steps
• Each of these paired-data null hypotheses can be written in terms of the discordant cells:
H0: Pb = Pc
H1: Pb ≠ Pc
• Cells 'b' and 'c' are called the discordant cells because they represent pairs with a difference.
• Cells 'a' and 'd' are the concordant cells. These cells do not contribute any information about a difference between pairs or over time, so they are not used to calculate the test statistic.
Steps (cont…)
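• The McNemar test statistic is:
$$\chi^2 = \frac{(b - c)^2}{b + c}$$
or, with a continuity correction,
$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$$
• Degrees of freedom = (c − 1)(r − 1) = 1
• If the calculated chi-square is greater than or equal to the tabulated value of chi-square at the α level of significance for 1 d.f., the null hypothesis is not accepted.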
Example (cont…)
Breast cancer patients receiving mastectomy followed
by chemotherapy were matched to each other on age
and cancer stage. By random assignment, one patient in
each matched pair received chemo perioperatively and
for an additional 6 months, while the other patient in
each matched pair received chemo perioperatively only.
                      Periop. only
Periop. + 6 Months    Survived 5 years   Died within 5 years
Survived 5 years      512                17
Died within 5 years   5                  90

Solution
…………………………………………………………………..
…………………………………………………………………..
Conclusion: The data provide evidence that
an extra 6 months of chemotherapy results
in a different survival rate compared to
treatment with perioperative chemo alone.
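As a rough illustration of how the statistic for this table could be computed, the Python sketch below uses the standard McNemar statistic (b − c)² / (b + c) and its continuity-corrected form (|b − c| − 1)² / (b + c), where b and c are the discordant cells (17 and 5 here). The function name `mcnemar_statistic` is hypothetical.

```python
def mcnemar_statistic(b, c, corrected=False):
    """McNemar chi-square from the discordant cell counts b and c.

    With corrected=True the continuity-corrected form (|b - c| - 1)^2 / (b + c) is used.
    """
    diff = abs(b - c)
    if corrected:
        diff -= 1
    return diff ** 2 / (b + c)

# Discordant pairs in the chemotherapy example
b, c = 17, 5
chi2_plain = mcnemar_statistic(b, c)
chi2_corrected = mcnemar_statistic(b, c, corrected=True)
print(round(chi2_plain, 3), chi2_corrected)   # both are compared with 3.84 (1 d.f., 5% level)
```

Both versions exceed the tabulated value 3.84, which is consistent with the conclusion stated above.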
Example
Consider independent bivariate observations (Xi, Yi), i = 1, 2, 3, ..., N, whose frequencies of (+) and (−) responses are classified into a 2 × 2 contingency table as shown below.
                    After meditation
Before meditation   Non-smoker (-)   Smoker (+)   Total
Smoker (+)          a                b            a + b
Non-smoker (-)      c                d            c + d
Total               a + c            b + d        N
Responses that changed from (+) to (−) and from (−) to (+) are recorded in cells 'a' and 'd' respectively. Responses with no change are recorded in cells 'b' and 'c'.
The researcher wishes to draw inferences about the effect of meditation on smoking habits from the observations.
a = 12, b = 5, c = 5, d = 8
Does meditation reduce the smoking habit?
Mann–Whitney U test
It is also called the Mann–Whitney–Wilcoxon (MWW) test, the Wilcoxon rank-sum test, or the Wilcoxon–Mann–Whitney test.
Steps
• Small sample case, i.e. n1 ≤ 10 and n2 ≤ 10 (so that n1 + n2 ≤ 20)
• Combine the two samples and rank all n = n1 + n2 observations, either from smallest to largest or from largest to smallest.
• If two or more observations are equal, assign each of the tied observations the average of their ranks.
Steps
• $U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$, where n1 is the sample size for sample 1, and R1 is the sum of the ranks in sample 1.
• $U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$, where n2 is the sample size for sample 2, and R2 is the sum of the ranks in sample 2.
• So that U1 + U2 = n1n2
Steps
• Test statistic: the smaller of U1 and U2 is the value used when consulting significance tables, i.e. Ucal = minimum (U1, U2).
• Critical value: the tabulated value of U at the α level of significance for sample sizes (n1, n2).
• Decision: If Ucal ≤ Utab, we reject the null hypothesis; otherwise we accept it.
Example
A new drug was evaluated for antihypertensive effect against a placebo. The table shows the diastolic BP of the two groups after oral administration of the drug and the placebo. We want to find out whether the drug is effective or not.
Group A (Placebo)   78 87 100 99 93 86 89 98
Group B (Drug)      75 88 91 72 79 81 86 84 79
Solution
Given, n1 = 8 and n2 = 9
Hypothesis
Null hypothesis (H0): f(A) = f(B), i.e. the new drug has no effect on blood pressure.
Alternative hypothesis (H1): f(A) ≠ f(B), i.e. the new drug has an effect on blood pressure.
Ranking of BP values after combining the two groups:
Ascending order   72   75   78   79    79    81   84   86    86    87   88   89   91   93   98   99   100
Rank              1    2    3    4     5     6    7    8     9     10   11   12   13   14   15   16   17
Assigned rank     1    2    3    4.5   4.5   6    7    8.5   8.5   10   11   12   13   14   15   16   17
Solution (cont…)
Separate the ranks of the two groups and sum the ranks for each group.
Group A (Placebo)   Rank of A (R1)   Group B (Drug)   Rank of B (R2)
78                  3                75               2
87                  10               88               11
100                 17               91               13
99                  16               72               1
93                  14               79               4.5
86                  8.5              81               6
89                  12               86               8.5
98                  15               84               7
                                     79               4.5
n1 = 8              ∑R1 = 95.5       n2 = 9           ∑R2 = 57.5
Solution (cont…)
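The U values are calculated as
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = 8 \times 9 + \frac{8 \times 9}{2} - 95.5 = 12.5$$
and
$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2 = 8 \times 9 + \frac{9 \times 10}{2} - 57.5 = 59.5$$
So Ucal = minimum (U1, U2) = min (12.5, 59.5) = 12.5
Critical value: the tabulated value of U at the 5% level for sample sizes (8, 9) is 15.
Decision: Since Ucal < Utab, we cannot accept the null hypothesis.
Conclusion: It is concluded that the new drug is effective in reducing blood pressure.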
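The ranking and the two U values can be reproduced with a short Python sketch. It ranks the pooled observations from smallest to largest, gives tied values the average of their ranks, and uses U = n1·n2 + n(n + 1)/2 − R for each sample. The function names `average_ranks` and `mann_whitney_u` are chosen here for illustration.

```python
def average_ranks(values):
    """Map each distinct value to its rank (smallest = 1); ties get the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    return rank_of

def mann_whitney_u(sample1, sample2):
    """Return U1, U2 and the rank sums R1, R2 for two independent samples."""
    rank_of = average_ranks(list(sample1) + list(sample2))
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank_of[v] for v in sample1)
    r2 = sum(rank_of[v] for v in sample2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, r1, r2

placebo = [78, 87, 100, 99, 93, 86, 89, 98]
drug = [75, 88, 91, 72, 79, 81, 86, 84, 79]
u1, u2, r1, r2 = mann_whitney_u(placebo, drug)
u_cal = min(u1, u2)
print(r1, r2, u1, u2, u_cal)   # u_cal is compared with the tabulated U for (8, 9)
```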
Example
Two independent random samples drawn
from two populations are as follows:
Sample I    4 6 9 10 12 14 15
Sample II   1 2 3 4 7 11
Apply the Mann-Whitney U test with α = 0.05 to test whether the two samples have come from the same population or from populations with equal means.
Example
The following are the scores of certain randomly selected Health Science students in the mid-term (MT) and final examination (FE):
MT   55 57 72 90 57 74
FE   80 76 63 58 56 37 75
Set up the null and alternative hypotheses and test the null hypothesis using the Mann-Whitney U test.
SIGN TEST
• The sign test is one of the simplest non-parametric tests; it applies to the median.
• Its name comes from the fact that it is based on the plus (+) and minus (−) signs of observations rather than on their numerical values in a sample.
• This test is applied for testing hypotheses concerning
– The specified value of the median for one population
– The median of the differences between paired data from two dependent samples
ONE SAMPLE SIGN TEST
Small Sample Test (n ≤ 25)
• Let a random sample x1, x2, ..., xn of small size be drawn from a population with an unknown median Md. Also, P(X = Md) = 0.
• Null hypothesis (H0): Md = M0, where M0 is a specified value of the population median.
• Or: H0: the expected number of plus (+) signs = the expected number of minus (−) signs, i.e. P(X > M0) = P(X < M0)
• Alternative hypothesis (H1): Md ≠ M0 (two-tailed test)
• or Md > M0 (right-tailed test)
• or Md < M0 (left-tailed test)
Contd…
• Test statistic: the test statistic for testing the null hypothesis is k = the number of occurrences of the less frequent sign, obtained as follows, where d = X − M0 for the one-sample test (M0 being the hypothesized median) and d = Y − X for paired data:
– Assign a plus (+) sign to the difference d if d > 0
– Assign a minus (−) sign to the difference d if d < 0
– Assign zero (0) to the difference d if d = 0
• Count the numbers of plus (+) and minus (−) signs, denoted by n(+) and n(−) respectively.
• When d = 0 occurs, it is said to be a tie; the number of ties is denoted by "t".
• The effective sample size is then ne = n − t, i.e. ne = n(+) + n(−)
Contd…
• The test statistic is k = number of occurrences of the less frequent sign.
• k = minimum {n(+), n(−)}
• Critical region: from the binomial table we obtain the p-value p0 = P(K ≤ k), the probability that a binomial variate K with parameters ne and p = ½ is less than or equal to k.
• Decision:
(i) For a one-tailed test, if p0 ≤ α we reject the null hypothesis, and if p0 > α we accept the null hypothesis.
(ii) For a two-tailed test, if 2p0 ≤ α we reject the null hypothesis, and if 2p0 > α we accept the null hypothesis at the pre-assigned level of significance.
Example
Q1. The data of percentage of broken groundnut pods
recorded from an experiment are given below:
9.8, 10.4, 11.5, 10.4, 8.5, 8.0, 10.5, 7.5, 8.8, 9.2
Test the hypothesis that the percentage of broken
pods has the median 8. Use sign test.
Solution:
• Null hypothesis (H0): Md = 8, i.e. the percentage of broken pods has the median 8.
• Alternative hypothesis (H1): Md ≠ 8, i.e. the percentage of broken pods does not have the median 8.
• Test statistic:
Observation   9.8   10.4   11.5   10.4   8.5   8.0   10.5   7.5   8.8   9.2
Sign          +     +      +      +      +     0     +      -     +     +
Here the total number of plus signs, n(+) = 8,
the total number of minus signs, n(−) = 1,
and the number of ties, t = 1.
The effective sample size ne = n − t = 10 − 1 = 9
Solution: Contd…
• Therefore the test statistic, under H0, is
• k = minimum {n(+), n(−)} = minimum (8, 1) = 1
• Critical value: the p-value p0, the probability that the total number of minus (−) signs is less than or equal to 1, obtained from the binomial table with ne = 9 and p = ½, is 0.02.
For the two-tailed test, the p-value is 2p0 = 2 × 0.02 = 0.04
• Decision: Since 2p0 = 0.04 is less than α = 0.05, we reject the null hypothesis.
• Conclusion: Hence we conclude that the median percentage of broken pods is not 8.
Example
Q2. Ten students were given a test in statistics. They were then given one month's special coaching, and another similar test was held. The marks obtained by the 10 students before and after the special coaching are given below:
S.No.             1    2    3    4    5    6    7    8    9    10
Before Coaching   12   15   10   13   18   10   8    17   9    7
After Coaching    12   17   12   12   14   12   16   16   18   12
Using the sign test, test whether the special coaching is effective or not at the α = 0.05 level.
Solution:
Let Xi and Yi (i = 1, 2, ..., 10) be the marks obtained by the ten students before and after the coaching respectively. Let MX and MY be the median marks before and after the coaching.
Null hypothesis (H0): MX = MY, the special coaching is not effective.
Alternative hypothesis (H1): MY > MX, the special coaching is effective.
k = total number of occurrences of the less frequent sign, which is obtained as follows.
Solution: Contd…
S.No.       1    2    3    4    5    6    7    8    9    10
X           12   15   10   13   18   10   8    17   9    7
Y           12   17   12   12   14   12   16   16   18   12
D = Y − X   0    2    2    -1   -4   2    8    -1   9    5
Sign        0    +    +    -    -    +    +    -    +    +
Here, n = 10, n(+) = 6, n(−) = 3, t = 1
Therefore, the effective sample size ne = n − t = 10 − 1 = 9
i.e. ne = n(+) + n(−) = 6 + 3 = 9
Therefore the test statistic is
k = total number of occurrences of the less frequent sign
= minimum {n(+), n(−)} = minimum {6, 3} = 3
Critical value: for the effective sample size ne = 9, the p-value p0, the probability that the total number of minus (−) signs is less than or equal to 3, obtained from the binomial table, is 0.254.
Decision: Since p0 = 0.254 is much greater than α = 0.05, we accept the null hypothesis at the 5% level of significance.
Conclusion: Hence we conclude that the special coaching is not effective.
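Both sign-test examples can be checked with a small Python sketch. It counts the signs of the differences, drops ties, and obtains p0 = P(K ≤ k) for a binomial variate K with parameters ne and ½ directly instead of reading it from a table. The function name `sign_test` is an illustrative choice.

```python
from math import comb

def sign_test(differences):
    """Return (k, n_eff, p0) for the sign test.

    k     = count of the less frequent sign
    n_eff = effective sample size after dropping zero differences (ties)
    p0    = P(K <= k) for K ~ Binomial(n_eff, 1/2), the one-tailed p-value
    """
    n_plus = sum(1 for d in differences if d > 0)
    n_minus = sum(1 for d in differences if d < 0)
    n_eff = n_plus + n_minus
    k = min(n_plus, n_minus)
    p0 = sum(comb(n_eff, i) for i in range(k + 1)) / 2 ** n_eff
    return k, n_eff, p0

# Q1: one-sample test of median = 8 for the broken-pod percentages
pods = [9.8, 10.4, 11.5, 10.4, 8.5, 8.0, 10.5, 7.5, 8.8, 9.2]
k1, n1, p1 = sign_test([x - 8 for x in pods])
print(k1, n1, round(p1, 3), round(2 * p1, 3))   # two-tailed: compare 2*p0 with alpha

# Q2: paired test, d = after - before coaching
before = [12, 15, 10, 13, 18, 10, 8, 17, 9, 7]
after = [12, 17, 12, 12, 14, 12, 16, 16, 18, 12]
k2, n2, p2 = sign_test([y - x for x, y in zip(before, after)])
print(k2, n2, round(p2, 3))                     # one-tailed: compare p0 with alpha
```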
Wilcoxon Matched Pairs Test (Signed Rank Test)
• The test is an alternative to the paired t-test.
• It is used when the assumption of normality or of equality of variances is not met.
• This test can be applied when
– the effectiveness of some treatment or activity is to be checked, or
– the difference between two treatments applied to two matched (paired) samples is to be assessed.
Procedure for Wilcoxon test (small sample, i.e. n < 25)
1. State the hypothesis:
Null hypothesis (H0): f(X) = f(Y), i.e. the X's and Y's have come from similar populations.
Alternative hypothesis (H1): f(X) ≠ f(Y), i.e. the X's and Y's have not come from similar populations.
2. Obtain the differences d = X − Y.
3. Rank the absolute values of these differences (i.e., disregard signs).
4. If any value occurs more than once, take the average of the tied ranks as the assigned rank of each such value.
5. Attach to each rank the sign (plus or minus) of the corresponding difference.
6. Obtain the sum of the ranks with the +ve sign and denote it by R+. Similarly, obtain the sum of the ranks with the −ve sign and denote it by R−.
7. The Wilcoxon T statistic is the smaller of R+ and R−, i.e. Tcal = min (R+, R−)
8. Take the tabulated value from the Wilcoxon table.
9. Decision: If Tcal ≤ Ttab at the α level of significance, the null hypothesis is rejected; otherwise it is accepted.
Example
To evaluate the effectiveness of a new
antidiabetic drug, blood glucose levels (mmol/l)
were measured before (X) and after (Y) oral
administration of the drug to 10 diabetic
subjects in a clinical trial. Results are given in
the table below. We are interested in finding whether the
drug is effective in reducing blood glucose level.
X 18 15 26 16 12 11 21 19 9 20
Y 6 21 14 14 10 15 11 13 3 10
Solution
Null hypothesis(H0): f(X) = f(Y), the drug is not effective in reducing blood
glucose level.
Alternative hypothesis(H1): f(X) ≠ f(Y), the drug is effective in reducing blood
glucose level.
X     Y     d = X – Y    Rank of |d| (R)    R+      R-
18    6     12           9.5                9.5
15    21    -6           5                          5
26    14    12           9.5                9.5
16    14    2            1.5                1.5
12    10    2            1.5                1.5
11    15    -4           3                          3
21    11    10           7.5                7.5
19    13    6            5                  5
9     3     6            5                  5
20    10    10           7.5                7.5
Sum                                         47      8
Solution (cont…)
The test statistic is given by
Tcal = minimum (R+, R-) = min(47, 8) = 8
Critical value: The tabulated value of T at the 5% level of
significance for n = 10 is 8.
Decision: Since Tcal = 8 ≤ Ttab = 8, the null hypothesis is
rejected.
Conclusion: We conclude that the drug is effective
in reducing blood glucose level.
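The ranking steps of the Wilcoxon procedure can be sketched in Python as a check on the worked table. This is an illustrative sketch, not the author's method: the helper names midranks and wilcoxon_t are hypothetical, and dropping zero differences before ranking is an assumption (a common convention that the slides do not state).

```python
def midranks(values):
    """Rank values from smallest to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_t(x, y):
    """Return (R+, R-, T) for paired samples, with d = X - Y."""
    d = [a - b for a, b in zip(x, y) if a != b]   # assumption: zeros dropped
    ranks = midranks([abs(v) for v in d])
    r_plus = sum(r for v, r in zip(d, ranks) if v > 0)
    r_minus = sum(r for v, r in zip(d, ranks) if v < 0)
    return r_plus, r_minus, min(r_plus, r_minus)

x = [18, 15, 26, 16, 12, 11, 21, 19, 9, 20]
y = [6, 21, 14, 14, 10, 15, 11, 13, 3, 10]
r_plus, r_minus, t_cal = wilcoxon_t(x, y)
print(r_plus, r_minus, t_cal)
```

For the blood glucose data this gives R+ = 47, R- = 8 and Tcal = 8, matching the table. A useful sanity check is that R+ + R- equals n(n + 1)/2 = 55 for n = 10 non-zero differences.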
Example
A bioavailability study was conducted at BIRDEM to
compare the time to peak plasma level for two oral
formulations of Aceclofenac. The times to peak
concentration of the two formulations in the plasma
were measured after administration of both
drugs to each of 12 persons on two different
occasions. Results are given below:
Time to peak (hr)
Subject 1 2 3 4 5 6 7 8 9 10 11 12
A 2.5 3 1.25 1.75 3.5 2.5 1.75 2.25 3.5 2.5 2 3.5
B 3.5 4 2.5 2 3.5 4 1.5 2.5 3 3 3.5 4

Kolmogorov-Smirnov Test
(Normality Test)
• This is the nonparametric analog to the two-sample t-test with unequal
variances.
• It is often used when the data have not met either the assumption of
normality or the assumption of equal variances.
• Assumptions:
Variable at least ordinal
Two samples are independent
Both simple random samples
Identical distributions
(In probability theory and statistics, a sequence or other collection of random
variables is independent and identically distributed (i.i.d.) if each
random variable has the same probability distribution as the others and
all are mutually independent.)
Kolmogorov-Smirnov Test
(Normality Test) Cont…
• This test has poor statistical efficiency.
• Many nonparametric statistics are based on ranks and
therefore measure differences in location.
• The K-S examines a single maximum difference between two
distributions.
• If a statistical difference is found between the distributions of
X and Y, the test provides no insight as to what caused the
difference.
• The difference could be due to differences in location (mean),
variation (standard deviation), presence of outliers, type of
skewness, type of kurtosis, number of modes, and so on.
Procedure
• The null and alternative hypotheses for the K-S test
relate to the equality of the two distribution
functions [usually noted as F(X) or F(Y)].
• Thus, the typical two-tailed hypothesis becomes:
H0: F(X) = F(Y), the two independent random samples
have been drawn from the same or identical population.
H1: F(X) ≠ F(Y), the two independent random samples
have not been drawn from the same or identical population.
Procedure Cont…
• Find Xmin and Xmax for the 2 samples and lay out a column of class
categories.
• List the cumulative frequencies of the two samples in respective
columns.
• Determine the relative cumulative frequencies by dividing by the sample sizes.
• Determine the absolute differences (d) between the relative cumulative
frequencies.
• Identify the largest d, which becomes Dmax.
• Multiply Dmax by n1n2 (calculated test value).
• Compare Dmaxn1n2 with the critical value in the table.
• The critical value Dn1,n2,α is obtained from the Kolmogorov-Smirnov
table.
• Decision: If the calculated value is less than or equal to the tabulated value,
then we accept the null hypothesis; otherwise we reject it.
SMALL SAMPLE CASE
 n1 + n2 ≤ 40 and both are at most 20, i.e. n1 = n2 ≤ 20
 n1 and n2 are two independent random samples drawn from two
populations with continuous distribution function F(x) and F(y)
respectively.
 Hypothesis:
 H0: F(x) = F(y), The two independent random samples have been
drawn from the same or identical population. OR there is reasonable
agreement between the two sample distribution function
 H1: F(x) ≠ F(y), The two independent random samples have not been
drawn from the same or identical population. OR there is not
reasonable agreement between the two sample distribution function.
 H1: F(x) > F(y), for one tail test
 H1: F(x) < F(y), for one tail test
SMALL SAMPLE CASE cont…
 Dmaxn1n2 = Maximum of |F(X) − F(Y)| for two tail test.
 D+ = Maximum of [F(X) − F(Y)] for one tail test.
 D- = Maximum of [F(Y) − F(X)] for one tail test.
 Where F(X) = K1/n1 and F(Y) = K2/n2.
 K1 = the observed cumulative frequency of X.
 K2 = the observed cumulative frequency of Y.
Example
The lifetimes (in 100 hours) of electric bulbs used in
O.T., randomly selected from two different brands,
are given below:
Brand X 12 14 11 16 14 12 14 16 12 12
Brand Y 11 17 11 11 11 14 11 12 11 11
Test whether the two distributions of the length
of life of electric bulbs of the two brands are the same or
different. Use the K-S test with α = 0.05.
Solution
 Null hypothesis (H0): F(x) = F(y), The two distributions of
lifetimes of electric bulbs of the two brands are the same.
 Alt. hypothesis (H1): F(x) ≠ F(y), The two distributions of
lifetimes of electric bulbs of the two brands are different.
Combined ordered life    Frequency of X    Frequency of Y    F(X)     F(Y)     |F(X) − F(Y)|
11                       1                 7                 1/10     7/10     6/10
12                       4                 1                 5/10     8/10     3/10
14                       3                 1                 8/10     9/10     1/10
16                       2                 0                 10/10    9/10     1/10
17                       0                 1                 10/10    10/10    0
Solution Cont…
Here n1 = n2 = 10
The test statistic is
Dmaxn1n2 = Maximum of |F(X) − F(Y)| = 6/10 = 0.6
Critical value: Value of D for n1 = n2 = 10 at 5% level of
significance for two tail test is 0.294.
Decision: Since the calculated value is greater than the tabulated
value, the null hypothesis is not accepted.
Conclusion: Since the null hypothesis is rejected, we can
conclude that the two distributions of lifetimes of
electric bulbs of the two brands are different.
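The cumulative-frequency table and the value of Dmax can be reproduced with a short Python sketch. It only computes the statistic (the largest absolute gap between the two sample distribution functions); the critical value still has to be taken from a table. The function name ks_two_sample_d is hypothetical.

```python
from fractions import Fraction

def ks_two_sample_d(x, y):
    """Largest absolute difference between the two empirical
    cumulative distribution functions F(X) = K1/n1 and F(Y) = K2/n2."""
    n1, n2 = len(x), len(y)
    d_max = Fraction(0)
    for v in sorted(set(x) | set(y)):              # combined ordered values
        fx = Fraction(sum(a <= v for a in x), n1)  # K1 / n1
        fy = Fraction(sum(b <= v for b in y), n2)  # K2 / n2
        d_max = max(d_max, abs(fx - fy))
    return d_max

brand_x = [12, 14, 11, 16, 14, 12, 14, 16, 12, 12]
brand_y = [11, 17, 11, 11, 11, 14, 11, 12, 11, 11]
d = ks_two_sample_d(brand_x, brand_y)
print(d, float(d))
```

Exact fractions are used so that the result can be compared with the table entries without rounding; for the bulb data the largest gap is 6/10 (printed in lowest terms as 3/5), i.e. 0.6, attained at a life of 11.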
LARGE SAMPLE SIZE
 n1 + n2 > 40 and both are greater than 20, i.e. n1 > 20 and
n2 > 20 (n1 and n2 may be unequal).
 For two large samples of equal or unequal size,
the K-S test statistic for the two tail test is given by
Dmaxn1n2 = Maximum of |F(X) − F(Y)|
 But for the one tailed test situation, the K-S test
statistic is calculated by using chi-square, which
can be derived as χ2 = 4 (Dn1,n2)2 n1n2 / (n1 + n2),
which follows a chi-square distribution with 2 degrees of
freedom.
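As a rough illustration of the large-sample one-tailed case, the sketch below assumes the commonly stated form of the approximation, χ2 = 4 D² n1n2 / (n1 + n2) with 2 degrees of freedom. For 2 degrees of freedom the upper-tail probability of chi-square is exactly exp(−χ2/2), so no table is needed. The function name and the input values (D = 0.3, n1 = n2 = 50) are hypothetical and are not taken from the slides.

```python
from math import exp

def ks_large_sample_chi2(d, n1, n2):
    """Chi-square approximation for the one-tailed two-sample K-S test.

    Assumes chi2 = 4 * D^2 * n1 * n2 / (n1 + n2) with 2 degrees of freedom.
    """
    chi2 = 4 * d ** 2 * n1 * n2 / (n1 + n2)
    p_value = exp(-chi2 / 2)      # upper-tail area of chi-square, df = 2
    return chi2, p_value

# Hypothetical inputs chosen only to illustrate the arithmetic
chi2, p_value = ks_large_sample_chi2(0.3, 50, 50)
print(round(chi2, 3), round(p_value, 4))
```

The calculated χ2 would then be compared with the tabulated chi-square value for 2 degrees of freedom at the chosen level of significance, or equivalently the p-value compared with α.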
