Module 5c Nonparametric Tests
Learning Objectives:
At the end of the discussion, the student should be able to:
• Define nonparametric tests and explain when they may be desirable.
• Use the One-Sample Runs Test.
• Use the Wilcoxon Signed-Rank Test for dependent samples.
• Use the Wilcoxon Rank Sum Test for two independent samples.
Classifications of Tests of Hypothesis: Parametric and Nonparametric
Parametric Tests
• Parametric hypothesis tests require the estimation of one or more
unknown parameters (e.g., population mean or variance).
• Often, unrealistic assumptions are made about the normality of
the underlying population.
• Large sample sizes are often required to invoke the Central Limit
Theorem.
Nonparametric Tests
• In contrast, nonparametric or distribution-free tests
– usually focus on the sign or rank of the data rather than the exact numerical value;
– do not specify the shape of the parent population;
– can often be used with smaller samples;
– can be used for ordinal data;
– work their magic without reference to specific parameters (or measures of the population).
• If the population is known and fully described by its parameters (i.e., µ and σ), then we use parametric tests (e.g., t-test, z-test, F-test, ANOVA, Pearson correlation coefficient).
• If there is no knowledge about the population parameters (i.e., µ or σ), but it is still required to test a hypothesis about the population, then we use nonparametric tests (e.g., sign test, Wilcoxon signed-rank test, Wilcoxon rank sum test, chi-square test, Kruskal-Wallis test, Spearman rank correlation).
https://www.slideshare.net/saiprakash6/distinguish-between-parametric-vs-nonparametric-test1
http://www.mayo.edu/mayo-edu-docs/center-for-translational-science-activities-documents/berd-5-6.pdf
| Parametric | Nonparametric |
|---|---|
| The test statistic is based on a specified distribution (e.g., normal, t, or F). | The test statistic does not depend on an assumed population distribution. |
| Parametric tests require the estimation of one or more unknown parameters (population parameters). | Nonparametric tests assume no knowledge of the population distribution, except that it is continuous. |
| Parametric tests are applicable only to variables. | Nonparametric tests are applicable to both variables and attributes. |
| No parametric test exists for nominal-scale data. | Nonparametric tests exist for both nominal- and ordinal-scale data. |
| A parametric test is more powerful when the data are approximately normal. | A nonparametric test is more efficient when the data depart seriously from normality. |

https://www.slideshare.net/saiprakash6/distinguish-between-parametric-vs-nonparametric-test1; http://www.mayo.edu/mayo-edu-docs/center-for-translational-science-activities-documents/berd-5-6.pdf; Statistical analysis with software applications by McGraw Hill Education; Probability and statistics by Walpole
Parametric Tests and Analogous Nonparametric Tests Based on Purpose
Table 1 on the next slide contains the names of several familiar
statistical procedures and categorizes each one as parametric or
nonparametric. All of the parametric procedures listed in the table rely
on an assumption of approximate normality.
https://www.mayo.edu/research/documents/parametric-and-nonparametric-demystifying-the-terms/doc-20408960
Table 1. Parametric procedures and their nonparametric counterparts

| Parametric Procedure (Normal Theory Based Tests) | Nonparametric Procedure (Corresponding Nonparametric Test) | Purpose of Test (Analysis Type) | Example |
|---|---|---|---|
| t test for independent samples | Mann-Whitney U test; Wilcoxon rank sum test | Compares two distinct/independent samples | Test whether the median income of the residents in barangay A is greater than the median income of the residents in barangay B. |
| Paired t test | Paired-sample sign test; Wilcoxon signed-rank test | Examines a set of differences (i.e., compares two quantitative measurements taken from the same individual) | Determine whether there is a significant change in the standard deduction of single taxpayers after the new tax law was implemented, from 2017 to 2019. |
| Pearson correlation coefficient | Spearman rank correlation coefficient | Assesses the linear association between two variables (i.e., estimates the degree of association between two quantitative variables) | A market analyst would like to see the relationship between the quality of a commodity and its market price. |
| One-way analysis of variance | Kruskal-Wallis analysis of variance by ranks | Compares three or more distinct/independent groups | Test the hypothesis that the numbers of defective items produced by three brands of machines, A, B, and C, are equal. |
Source: Applied Statistics in Business and Economics by Doane and Seward p. 694
Advantages and Disadvantages of
Nonparametric Tests
• If parametric and nonparametric tests are both applicable to the same data set, we should use the more efficient parametric technique rather than its nonparametric counterpart.
• Since we do not always have quantitative measurements and the assumption of normality is not always justified, nonparametric (distribution-free) methods complement the customary parametric tests.
• With nonparametric tests, the data analyst has more ammunition to accommodate a wider variety of experimental situations.
Statistical analysis with software applications by McGraw Hill Education; Probability and statistics by Walpole
Advantages of Nonparametric Tests
1. They can be used to test population parameters when the variable is not
normally distributed.
2. They can be used when the data are nominal or ordinal.
3. They can be used to test hypotheses that do not involve population parameters.
4. They can be used with small samples, since they do not require the assumption of normality.
5. In most cases, the computations are easier than those for the parametric
methods.
6. Their interpretation is often more direct than the interpretation of parametric
tests.
7. Nonparametric tests are simple and easy to understand.
8. They do not involve complicated sampling theory.
9. Few assumptions are made about the population (often only that it is continuous).
Disadvantages of Nonparametric Tests
1. They are less sensitive than their parametric counterparts when the
assumptions of the parametric methods are met.
2. Larger differences are needed before the null hypothesis can be rejected.
3. They do not utilize all the information provided by the sample.
4. They require special tables for small samples.
5. They are less efficient than their parametric counterparts when the assumptions
of the parametric methods are met (normality).
6. Larger sample sizes are needed to overcome the loss of information.
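7. When applied to interval or ratio data, they reduce the data to signs or ranks (a nominal or ordinal scale), so the magnitudes of the values are not used.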
Ranking of Data
There are many applications in business where data are reported not as values on a continuum but on an ordinal scale, so assigning ranks to the values is necessary to analyze the data. Distribution-free methods therefore allow the data analyst to analyze ranks rather than the actual data values, which makes nonparametric tests very appealing and intuitive.
For example, suppose an HR officer would like to determine the degree of relationship between the performance ranks obtained by ten trainees during the first and second evaluation periods. Assuming a nonparametric test is applicable, it could be used to determine whether the two rank evaluations agree.
Statistical analysis with software applications by McGraw Hill Education; Probability and statistics by Walpole
Thus, since nonparametric tests can be applied to ordinal data, it is important for the analyst to be proficient in ranking data sets.
Example 1:
The following values are ranked from highest to lowest:
Student A B C D E
Grade 87 65 78 93 85
Rank 2 5 4 1 3
Example 2:
The following values are ranked from lowest to highest, using the average rank for repeated values:
Contestant A B C D E
Score 3 1 3 4 6
Rank 2.5 1 2.5 4 5
Example 3:
The following values are ranked from highest to lowest, using the average rank for repeated values:
Student A B C D E F G H I
Math score 6 3 8 6 9 9 11 2 9
Rank 6.5 8 5 6.5 3 3 1 9 3
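The ranking rule used in Examples 1 to 3 (sort the values, number the positions, and give tied values the average of the positions they occupy) can be sketched in Python as below. This is a minimal illustration, not a prescribed procedure; the function name `average_ranks` and its `descending` flag are our own choices.

```python
def average_ranks(values, descending=False):
    """Rank values, giving tied values the average of the positions they occupy.

    descending=True gives rank 1 to the largest value (Examples 1 and 3);
    descending=False gives rank 1 to the smallest value (Example 2).
    """
    # Indices of the values, listed in ranking order
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the block of values tied with the one at `pos`
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # 0-based positions pos..end correspond to ranks pos+1..end+1
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


# Example 3: math scores ranked from highest to lowest
math_scores = [6, 3, 8, 6, 9, 9, 11, 2, 9]
print(average_ranks(math_scores, descending=True))
# [6.5, 8.0, 5.0, 6.5, 3.0, 3.0, 1.0, 9.0, 3.0]
```

Calling `average_ranks([3, 1, 3, 4, 6])` in the same way reproduces the ascending ranks of Example 2.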
Example 4: It may also be necessary to compute the sum of ranks by sign, R+ and R-.

| Call Center Trainee | Rank During 1st Evaluation | Rank During 2nd Evaluation | Rank 1 - Rank 2 (D) | /D/ | Rank of /D/ | Average Rank of /D/ | Signed Average Rank |
|---|---|---|---|---|---|---|---|
| A | 8 | 7 | 1 | 1 | 1 | 1 | 1 |
| B | 2 | 5 | -3 | 3 | 6 | 8 | -8 |
| C | 7 | 10 | -3 | 3 | 7 | 8 | -8 |
| D | 1 | 4 | -3 | 3 | 8 | 8 | -8 |
| E | 4 | 2 | 2 | 2 | 2 | 3.5 | 3.5 |
| F | 9 | 6 | 3 | 3 | 9 | 8 | 8 |
| G | 3 | 1 | 2 | 2 | 3 | 3.5 | 3.5 |
| H | 6 | 9 | -3 | 3 | 10 | 8 | -8 |
| I | 10 | 8 | 2 | 2 | 4 | 3.5 | 3.5 |
| J | 5 | 3 | 2 | 2 | 5 | 3.5 | 3.5 |

R+ = 23 (sum of positive ranks)
R- = 32 (sum of negative ranks)
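As a check on Example 4, the sketch below recomputes R+ and R- from the two evaluation ranks. It follows the table's conventions (D = Rank 1 - Rank 2, average ranks for tied /D/ values); the helper names are hypothetical, and a zero difference, if one occurred, would be added to neither sum.

```python
def average_ranks(values):
    """Ascending ranks (smallest = 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1  # average of ranks pos+1..end+1
        pos = end + 1
    return ranks


def signed_rank_sums(first, second):
    """Return (R+, R-) for the differences D = first - second."""
    d = [a - b for a, b in zip(first, second)]
    r = average_ranks([abs(x) for x in d])  # rank /D/
    r_plus = sum(rank for rank, x in zip(r, d) if x > 0)
    r_minus = sum(rank for rank, x in zip(r, d) if x < 0)
    return r_plus, r_minus


# Example 4: trainees A..J
rank_1st = [8, 2, 7, 1, 4, 9, 3, 6, 10, 5]
rank_2nd = [7, 5, 10, 4, 2, 6, 1, 9, 8, 3]
print(signed_rank_sums(rank_1st, rank_2nd))  # (23.0, 32.0)
```

A quick consistency check: R+ + R- = 55 = n(n + 1)/2 for n = 10, as it must when no difference is zero.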
In this module, three nonparametric tests will be presented:
• The One-Sample Runs Test
• Wilcoxon Signed-Rank Test
• Wilcoxon Rank Sum Test
One-Sample Runs Test
The one-sample runs test is also called the Wald-Wolfowitz test after its inventors, Abraham Wald (1902-1950) and his student Jacob Wolfowitz.
• The purpose of the one-sample runs test is to detect nonrandomness.
• A nonrandom pattern suggests that the observations are not
independent.
• Here, we investigate whether each observation in a sequence is
independent of its predecessor (or the appearance of one is not
dependent on the appearance of another).
Figure: Random (independent) vs. nonrandom (nonindependent) patterns
Statistical analysis with software applications by McGraw Hill Education
Runs Test
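This test determines whether a sequence of binary events (two possible outcomes) follows a random pattern. A run is a series of consecutive identical outcomes, and the test is based on R, the number of runs in the sequence. A nonrandom sequence suggests nonindependent observations.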
The hypotheses are
Ho: Events follow a random pattern.
H1: Events do not follow a random pattern.
To test the hypothesis of randomness, we first count the number of outcomes of
each type:
n1 : number of outcomes of the first type
n2 : number of outcomes of the second type
n = total sample size = n1 + n2
Wald-Wolfowitz Runs Test
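• When n1 ≥ 10 and n2 ≥ 10 (large-sample situation), the number of runs R may be assumed to be normally distributed with mean µR and standard deviation σR:

$$\mu_R = \frac{2 n_1 n_2}{n} + 1, \qquad \sigma_R = \sqrt{\frac{2 n_1 n_2 \,(2 n_1 n_2 - n)}{n^2 (n - 1)}}$$

• The test statistic is

$$z_{calc} = \frac{R - \mu_R}{\sigma_R}$$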
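A minimal Python sketch of the large-sample runs test: it counts the runs R, computes µR = 2n1n2/n + 1 and σR = sqrt(2n1n2(2n1n2 - n)/(n²(n - 1))) from the two outcome counts, and returns zcalc = (R - µR)/σR. The function names are hypothetical, and the sequence in the usage line is made up for illustration.

```python
import math


def count_runs(seq):
    """Count runs: maximal blocks of consecutive identical outcomes."""
    if len(seq) == 0:
        return 0
    # A new run starts every time the outcome changes
    return 1 + sum(1 for prev, cur in zip(seq, seq[1:]) if cur != prev)


def runs_test_z(seq):
    """Large-sample one-sample runs test for a sequence with two outcome types.

    Returns (R, mu_R, sigma_R, z_calc).
    """
    kinds = sorted(set(seq))
    if len(kinds) != 2:
        raise ValueError("the runs test needs exactly two outcome types")
    n1 = sum(1 for s in seq if s == kinds[0])  # outcomes of the first type
    n2 = len(seq) - n1                          # outcomes of the second type
    n = n1 + n2
    r = count_runs(seq)
    mu_r = 2 * n1 * n2 / n + 1
    sigma_r = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    return r, mu_r, sigma_r, (r - mu_r) / sigma_r


# Hypothetical sequence: 12 A's followed by 12 B's (only 2 runs)
r, mu_r, sigma_r, z = runs_test_z("A" * 12 + "B" * 12)
print(r, mu_r, round(z, 2))  # 2 13.0 -4.59
```

Too few runs (clustering) gives a large negative z; a strictly alternating sequence such as "AB" * 12 gives too many runs and an equally large positive z, which is why the test is two-tailed.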
Summary of Procedures in Conducting a One-Sample Runs Test
STEP 1 State the hypotheses and identify the claim.
STEP 4 Find the critical value using the table of areas under the normal curve (z-table).
For a given level of significance α, find the critical value zα/2 for a two-tailed test.
Because either too many runs or too few runs would indicate nonrandomness, we choose a two-tailed test.
STEP 5 Make the decision. Reject the hypothesis of a random pattern if zcalc < −zα/2 or if zcalc > +zα/2.
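The decision rule in STEPs 4 and 5 can be expressed with Python's standard-library `statistics.NormalDist`, which returns the critical value zα/2 directly instead of reading it from the z-table. This is a minimal sketch; `two_tailed_decision` is a hypothetical helper name, and the two calls use the zcalc values and significance levels of the runs-test examples in this module.

```python
from statistics import NormalDist


def two_tailed_decision(z_calc, alpha):
    """Return (z_{alpha/2}, reject_H0) for a two-tailed z-test."""
    # Upper critical value: the point with area alpha/2 to its right
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = z_calc < -z_crit or z_calc > z_crit
    return z_crit, reject


print(two_tailed_decision(-3.90, 0.01))  # critical value about 2.576; reject H0
print(two_tailed_decision(-0.13, 0.05))  # critical value about 1.960; do not reject H0
```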
Because the actual number of runs (R = 8) is far less than the expected number (µR = 17.5), the sample suggests that the null hypothesis, "defects follow a random sequence", may be false. To verify this observation, compute the test statistic zcalc = (R − µR)/σR, where σR is the standard deviation of the number of runs.
Step 4. Find the critical value. Because either too many runs or too few runs would indicate nonrandomness, we choose a two-tailed test. Thus, the critical value z0.005 for a two-tailed test at α = 0.01 is ±2.576 (refer to the z-table).
Step 5. Make the decision. Reject the null hypothesis since the test statistic zcalc = −3.90 is less than the critical value −2.576.
Because the actual number of runs (R = 14) is very close to the expected number of runs (µR = 14.33), the sample gives no reason to doubt the null hypothesis, "Outcomes A follow a random sequence". To verify this observation, compute the test statistic zcalc = (R − µR)/σR, where σR is the standard deviation of the number of runs.
Step 4. Find the critical value. Because either too many runs or too few runs would indicate nonrandomness, we choose a two-tailed test. Thus, the critical value z0.025 for a two-tailed test at α = 0.05 is ±1.96.
Step 5. Make the decision. Do not reject the null hypothesis since the test statistic zcalc = −0.13 lies between the critical values −1.96 and +1.96.
Wilcoxon Signed-Rank Test
• When evaluating the difference between paired observations, test the median difference Md (paired sample 1 − paired sample 2) against a benchmark of zero:
• Calculate the difference between the paired observations.
• Rank the differences from smallest to largest by absolute value.
• Add the ranks of the positive differences to obtain the rank sum W.
Step 2. Find the critical value. For a two-tailed test at α = 0.05, the critical value is ±1.96 (from the z-table).
Step 3. Compute the test statistic following the procedure below.
• Calculate the difference between each data value X and the benchmark median Mo (X - Mo); here Mo = 5.6.
• Get the absolute value of the difference, /X - Mo/.
• Rank (or average rank) /X - Mo/.
• Separate the positive ranks R+ and the negative ranks R- according to the sign of the difference X - Mo.
• Get the sum of R+.
• Note that negative ranks are shown but not used in the analysis.

| Machine | X | X - 5.6 | /X - 5.6/ | Rank | R+ | R- |
|---|---|---|---|---|---|---|
| 1 | 5.88 | 0.28 | 0.28 | 4 | 4 | |
| 2 | 5.91 | 0.31 | 0.31 | 6 | 6 | |
| 3 | 6.11 | 0.51 | 0.51 | 7 | 7 | |
| 4 | 5.55 | -0.05 | 0.05 | 1.5 | | 1.5 |
| 5 | 5.55 | -0.05 | 0.05 | 1.5 | | 1.5 |
| 6 | 6.53 | 0.93 | 0.93 | 13 | 13 | |
| 7 | 5.31 | -0.29 | 0.29 | 5 | | 5 |
| 8 | 5.04 | -0.56 | 0.56 | 8 | | 8 |
| 9 | 4.99 | -0.61 | 0.61 | 9.5 | | 9.5 |
| 10 | 5.34 | -0.26 | 0.26 | 3 | | 3 |
| 11 | 4.99 | -0.61 | 0.61 | 9.5 | | 9.5 |
| 12 | 4.84 | -0.76 | 0.76 | 11 | | 11 |
| 13 | 7.21 | 1.61 | 1.61 | 15 | 15 | |
| 14 | 4.75 | -0.85 | 0.85 | 12 | | 12 |
| 15 | 4.30 | -1.30 | 1.30 | 14 | | 14 |
| 16 | 7.77 | 2.17 | 2.17 | 17 | 17 | |
| 17 | 12.28 | 6.68 | 6.68 | 20 | 20 | |
| 18 | 3.74 | -1.86 | 1.86 | 16 | | 16 |
| 19 | 8.58 | 2.98 | 2.98 | 18 | 18 | |
| 20 | 11.99 | 6.39 | 6.39 | 19 | 19 | |
| Sum | 126.65 | | | 210 | 119 | 91 |
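Calculation for test statistic, where W is the sum of all positive ranks and n is the number of nonzero differences:

$$z_{calc} = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$

Here, W = 119 and n = 20.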
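The following sketch reproduces the machine example: it ranks /X - 5.6/, sums the positive ranks to get W, and applies the large-sample z formula with mean n(n + 1)/4 and standard deviation sqrt(n(n + 1)(2n + 1)/24). It is a minimal illustration under two assumptions of ours: zero differences are discarded before ranking (a common convention; the machine data contain none), and differences are rounded to two decimals, as in the table, to remove floating-point noise. The function names are hypothetical, and the printed z is computed by the sketch rather than quoted from the source.

```python
import math


def average_ranks(values):
    """Ascending ranks (smallest = 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def wilcoxon_signed_rank_z(diffs):
    """Large-sample Wilcoxon signed-rank test: returns (W, z_calc).

    Zero differences are discarded before ranking (assumed convention).
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    ranks = average_ranks([abs(x) for x in d])
    w = sum(r for r, x in zip(ranks, d) if x > 0)  # sum of positive ranks
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w, (w - mean_w) / sd_w


# Machine data from the table above; benchmark median Mo = 5.6
x = [5.88, 5.91, 6.11, 5.55, 5.55, 6.53, 5.31, 5.04, 4.99, 5.34,
     4.99, 4.84, 7.21, 4.75, 4.30, 7.77, 12.28, 3.74, 8.58, 11.99]
w, z = wilcoxon_signed_rank_z([round(v - 5.6, 2) for v in x])
print(w, round(z, 2))  # 119.0 0.52
```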
Step 2. Find the critical value. For a one-tailed test at α = 0.10, the critical value is z0.10 = 1.28 (from the z-table): use +1.28 for a right-tailed test or −1.28 for a left-tailed test.
Step 3. Compute the test statistic following the procedure below.
• Calculate the difference d between the paired observations.
• Get the absolute value of the difference, /d/.
• Rank (or average rank) the absolute differences.
• Separate the positive ranks R+ and the negative ranks R- according to the corresponding sign of the difference.

| Week | Old | New | d | /d/ | Rank | R+ | R- |
|---|---|---|---|---|---|---|---|
| 1 | 16 | 14 | 2 | 2 | 5.5 | 5.5 | |
| 2 | 28 | 12 | 16 | 16 | 22 | 22 | |
| 3 | 22 | 10 | 12 | 12 | 20 | 20 | |
| 4 | 14 | 10 | 4 | 4 | 8.5 | 8.5 | |
| 5 | 26 | 13 | 13 | 13 | 21 | 21 | |
| 6 | 17 | 12 | 5 | 5 | 11 | 11 | |
| 7 | 16 | 16 | 0 | 0 | 1 | 1 | |
| 8 | 19 | 10 | 9 | 9 | 18 | 18 | |
| 9 | 17 | 11 | 6 | 6 | 14 | 14 | |
| 10 | 15 | 18 | -3 | 3 | 7 | | -7 |
| 11 | 20 | 16 | 4 | 4 | 8.5 | 8.5 | |
| 12 | 18 | 12 | 6 | 6 | 14 | 14 | |
| 13 | 18 | 13 | 5 | 5 | 11 | 11 | |
| 14 | 18 | 12 | 6 | 6 | 14 | 14 | |
| 15 | 16 | 17 | -1 | 1 | 3 | | -3 |
| 16 | 14 | 19 | -5 | 5 | 11 | | -11 |
Wilcoxon Rank Sum Test for Two Independent Samples
• The Wilcoxon rank sum test (also known as the Mann-Whitney test) is named after statisticians Frank Wilcoxon (1892-1965), Henry B. Mann (1905-2000), and D. Ransom Whitney (1915-2007).
• It compares two populations whose distributions are assumed to be the same except for a shift in location.
• It is used to test the difference between the medians of two populations that are clearly nonnormal, when the samples are independent (no pairing of observations).
• It is analogous to the t-test for two independent sample means. Because the two distributions are assumed to differ only in location, it requires independent samples from populations with equal variances.
• It does not assume normality.
The test of the hypothesis can be either one-tailed or two-tailed.
• If testing whether two population medians differ, the test is two-tailed.
• If testing whether one median is greater than (or less than) the other, the test is one-tailed.
Note:
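• This module illustrates only the large-sample version (n1 ≥ 10, n2 ≥ 10) of this test, so we can use a z-test.
• The test statistic is the difference in mean ranks divided by its standard error:

$$z_{calc} = \frac{\bar{T}_1 - \bar{T}_2}{(n_1 + n_2)\sqrt{\frac{n_1 + n_2 + 1}{12\, n_1 n_2}}}$$

where $\bar{T}_1$ and $\bar{T}_2$ are the mean ranks of samples 1 and 2 after ranking the combined samples.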
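The statistic described in the note (difference in mean ranks divided by its standard error, (n1 + n2)·sqrt((n1 + n2 + 1)/(12·n1·n2))) can be sketched as follows. This is a minimal illustration assuming both samples have at least 10 values; the function names are hypothetical, and no correction for ties is applied, so with many tied values the result is approximate.

```python
import math


def average_ranks(values):
    """Ascending ranks (smallest = 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def rank_sum_z(sample1, sample2):
    """Large-sample Wilcoxon rank sum z: difference in mean ranks / standard error.

    Returns (mean rank of sample 1, mean rank of sample 2, z_calc).
    No correction for ties is applied.
    """
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    # Rank the combined samples, then split the ranks back by group
    ranks = average_ranks(list(sample1) + list(sample2))
    t1 = sum(ranks[:n1]) / n1
    t2 = sum(ranks[n1:]) / n2
    se = n * math.sqrt((n + 1) / (12 * n1 * n2))
    return t1, t2, (t1 - t2) / se


# Hypothetical samples: every value in `low` is below every value in `high`
low = list(range(1, 11))
high = list(range(11, 21))
t1, t2, z = rank_sum_z(low, high)
print(t1, t2, round(z, 2))  # 5.5 15.5 -3.78
```

Algebraically, this mean-rank form gives the same z as the rank-sum form (T1 - n1(n + 1)/2)/sqrt(n1·n2·(n + 1)/12), where T1 is the rank sum of sample 1 and n = n1 + n2.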
For large samples (n1 ≥ 10, n2 ≥ 10), use a z-test. The test statistic can be computed using either of two methods:
❖ Performing the Test using Method A
STEP 2 Find the critical value. For large samples (n1 ≥ 10, n2 ≥ 10), use a z-test.
STEP 4 Make the decision. For a given α, reject the null hypothesis if zcalc < −zα/2 or if zcalc > +zα/2.
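| Day | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product name | A | B | A | A | B | B | B | A | A | A | B | B | A | A |
| Quantity produced | 17 | 18 | 20 | 16 | 20 | 18 | 16 | 21 | 15 | 19 | 15 | 18 | 17 | 20 |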
Step 1. The hypotheses are:
Step 2. Find the critical value. For a two-tailed test at α = 0.05, the critical value is ±1.96 (from the z-table).
Step 3. Compute the test statistic following the procedure below.
• Sort the combined samples from lowest to highest.
• Assign a rank to each value, using the average of the ranks for tied values, as shown in the table.
• Separate the ranks into two groups according to classification, in this case Product A and Product B, as shown in the table on the next slide.

| Product Name | Quantity Produced | Rank |
|---|---|---|
| B | 14 | 1 |
| A | 15 | 3 |
| A | 15 | 3 |
| B | 15 | 3 |
| A | 16 | 6 |
| A | 16 | 6 |
| B | 16 | 6 |
| A | 17 | 10.5 |
| B | 17 | 10.5 |
| B | 17 | 10.5 |
| B | 17 | 10.5 |
| A | 17 | 10.5 |
| A | 17 | 10.5 |
| A | 18 | 16 |
| A | 18 | 16 |
| B | 18 | 16 |
| B | 18 | 16 |
Step 4. Make the decision. Do not reject the null hypothesis since the test statistic zcalc = 0.1382 lies between the critical values −1.96 and +1.96.
Step 5. Summarize the results. There is not enough evidence to reject the claim that the median production of Product A equals that of Product B.
References:
• Statistical analysis with software applications by McGraw Hill Education
• Applied Statistics in Business and Economics by Doane and Seward, p. 694
• Probability and statistics by Walpole, R., Myers, R., and Myers, S.
• http://www.mayo.edu/mayo-edu-docs/center-for-translational-science-activities-documents/berd-5-6.pdf
• https://www.slideshare.net/saiprakash6/distinguish-between-parametric-vs-nonparametric-test1