See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/322677728

INTRODUCTION TO NONPARAMETRIC STATISTICAL METHODS

Book · January 2018

3 authors:

Christian Akrong Hesse, University of Professional Studies
Ezekiel Nortey, University of Ghana, College of Basic and Applied Sciences
John Benjamin Ofosu, Methodist University College Ghana

All content following this page was uploaded by Christian Akrong Hesse on 24 January 2018.



INTRODUCTION TO
NONPARAMETRIC
STATISTICAL METHODS

C. A. HESSE, BSc, MPhil, PhD.


Senior Lecturer of Statistics,
Methodist University College Ghana.

J. B. OFOSU, BSc, PhD, FSS.


Professor of Statistics,
Methodist University College Ghana.

E. N. NORTEY, BSc, MPhil, PhD.


Senior Lecturer of Statistics,
University of Ghana.

Copyright © 2017
Akrong Publications Ltd.

All rights reserved

No part of this publication may be reproduced, in part or in whole, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without prior permission of the publisher.

Published and Printed by


AKRONG PUBLICATIONS LIMITED
P. O. BOX M. 31
ACCRA, GHANA

(0244 648 757, 0264 648 757)

ISBN: 978–9988–2–6059–0

Published, 2017
akrongh@yahoo.com.

PREFACE
A statistical method is called non-parametric if it makes no assumption on the population
distribution or sample size. This is in contrast with most parametric methods in elementary
statistics that assume that the data set used is quantitative, the population has a normal
distribution and the sample size is sufficiently large. In general, conclusions drawn from non-
parametric methods are not as powerful as the parametric ones. However, as non-parametric
methods make fewer assumptions, they are more flexible, more robust, and applicable to non-
quantitative data.
This book is designed to help students acquire the basic skills needed for solving real-life
problems where the data meet only minimal assumptions, to strengthen their reading list, and to
provide them with a “one-stop shop” textbook on nonparametric methods.

Our Approach
This book is an introduction to basic ideas and techniques of nonparametric statistical methods
and is intended to prepare students of the sciences as well as the humanities, for a better
understanding of some underlying explanations of real life situations. Researchers will find
the text useful since it provides a step-by-step presentation of procedures, use of more practical
data sets, and new problems from real-life situations. The book continues to emphasize the
importance of nonparametric methods as a significant branch of modern statistics and equips
readers with the conceptual and technical skills necessary to select and apply the appropriate
procedures for any given situation.
Written by leading statisticians, Introduction to Nonparametric Statistical Methods
provides readers with crucial nonparametric techniques in a variety of settings, emphasizing
the assumptions underlying the methods. The book provides an extensive array of examples
that clearly illustrate how to use nonparametric approaches for handling one- or two-sample
location and dispersion problems, dichotomous data, one-way analysis of variance, rank tests,
goodness-of-fit tests and tests of randomness.
A wide range of topics is covered in this text although the treatment is limited to the
elementary level. There are solved, partly solved and unsolved assignments with every section,
to make the student or reader familiar with the methods introduced.
C. A. Hesse
J. B. Ofosu
E. N. Nortey
July, 2017

CONTENTS
1. Preliminaries
1.1 Introduction
1.2 Parametric and nonparametric methods
1.3 Parametric versus nonparametric methods
1.4 Classes of nonparametric methods
1.5 When to use nonparametric procedures
1.6 Advantages of nonparametric statistics
1.7 Disadvantages of nonparametric tests
1.8 The scope of this book
1.9 Format and organization

2. One-Sample Nonparametric Methods
2.1 Introduction
2.2 The one-sample sign test
2.2.1 Assumptions
2.2.2 Hypotheses
2.2.3 Large sample approximation
2.2.4 Confidence interval for the median based on the sign test
2.3 The Wilcoxon signed-ranks test
2.3.1 Assumptions
2.3.2 Hypotheses
2.3.3 Test statistic
2.3.4 Carrying out the Wilcoxon signed-ranks test
2.3.5 Large sample approximation
2.3.6 Confidence interval for the median based on the Wilcoxon signed-ranks test
2.4 The binomial test
2.4.1 Assumptions
2.4.2 Hypotheses
2.4.3 Large sample approximation
2.4.4 Large sample confidence interval for p
2.5 The one-sample runs test for randomness

3. Procedures That Utilize Data from Two Independent Samples
3.1 Introduction
3.2 The median test
3.2.1 Assumptions
3.2.2 Hypotheses
3.2.3 Large sample approximation
3.3 The Mann-Whitney (Wilcoxon rank-sum) test
3.3.1 Assumptions
3.3.2 Hypotheses
3.3.3 Large sample approximation
3.3.4 Confidence interval for difference between two population medians
3.4 The Wald-Wolfowitz two-sample runs test
3.5 The two-sample runs test for randomness

4. Procedures Using Data from Two Related Samples
4.1 Introduction
4.2 The sign test for two related samples
4.2.1 Introduction
4.2.2 Assumptions
4.2.3 Test procedure
4.2.4 Hypotheses
4.2.5 Confidence interval for the differences of the medians of two populations, based on the sign test
4.3 Wilcoxon matched-pairs signed-ranks test
4.3.1 Introduction
4.3.2 Assumptions
4.3.3 Test procedure
4.3.4 Hypotheses
4.3.5 Large sample approximation
4.3.7 Confidence interval for the median of population differences between pairs of measurements based on the matched-pair signed-ranks Wilcoxon test
4.4 A test for two related samples when the data consist of frequencies (The McNemar test)

5. Chi-Square Test of Homogeneity and Independence
5.1 Introduction
5.2 The chi-square test of homogeneity
5.3 The chi-square test of independence

6. Procedures Using Data from Three or More Independent Samples
6.1 Introduction
6.2 Extension of the median test
6.3 The Kruskal-Wallis one-way analysis of variance by ranks
6.4 The Jonckheere-Terpstra test for ordered alternatives

7. Procedures Using Data from Three or More Related Samples
7.1 Introduction
7.2 Data from a randomized complete block design
7.3 Friedman two-way analysis of variance by ranks
7.4 Page’s test for ordered alternatives

8. Goodness-of-Fit Tests
8.1 Introduction
8.2 The chi-square goodness-of-fit test
8.3 Kolmogorov-Smirnov goodness-of-fit test
8.3.1 The Kolmogorov–Smirnov goodness-of-fit test for a single sample
8.3.2 The Kolmogorov–Smirnov two-sample test

9. Rank Correlation
9.1 Introduction
9.2 Spearman’s rank correlation coefficient
9.3 Kendall’s rank correlation coefficient

Answers to Exercises

Appendix

Chapter One
Preliminaries

1.1 Introduction
Typical introductory courses in hypothesis testing and confidence intervals examine
primarily parametric statistical procedures. A main feature of these statistical procedures is the
assumption that we are working with random samples from normal populations. These
procedures are known as parametric methods because they are based on a particular
parametric family of distributions – in this case, the normal. For example, given a set of
independent observations from a normal distribution, we often want to infer something about
the unknown parameters. Here the t-test is usually used to determine whether the
hypothesized value $\mu_0$ for the population mean should be rejected. More usefully, we
may construct a confidence interval for the ‘true’ population mean.
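For reference, the t statistic and the confidence interval mentioned here take the following standard forms. The symbols $\bar{x}$ (sample mean), $s$ (sample standard deviation), $n$ (sample size) and $t_{\alpha/2,\,n-1}$ (upper $\alpha/2$ point of the t distribution with $n-1$ degrees of freedom) are notation assumed for this sketch; they are not defined in the text above.

\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad
\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\]

Both expressions rely on the assumption that the observations are independent and come from a normal population.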
Parametric inference is sometimes inappropriate or even impossible. To assume that
samples come from any specified family of distributions may be unreasonable. For example,
we may not have examination marks for each candidate but know only the numbers of
candidates who obtained the ordered grades A, B+, B, B–, C+, C, D and F. Given these grade
distributions for two different courses, we may want to know if they indicate a difference in
performance between the two courses. In this case it is inappropriate to use the traditional
(parametric) method of analysis.
In this book we describe procedures called nonparametric and distribution-free methods.
Nonparametric methods provide an alternative series of statistical methods that require no or
very limited assumptions to be made about the data. These methods are most often used to
analyse data which do not meet the distributional requirements of parametric methods. In
particular, skewed data are frequently analysed by non-parametric methods, although data
transformation can sometimes make the data suitable for parametric analyses. These
procedures have considerable appeal. One of their advantages is that the data need not be
quantitative but can be categorical (such as yes or no) or rank data.
Generally, if both parametric and nonparametric methods are applicable to a particular
problem, we should use the more efficient parametric method.

1.2 Parametric and nonparametric methods


The word statistics has several meanings. It is used to describe a collection of data and also to
designate operations that may be performed with primary data. The scientific discipline called
statistical inference uses observed data – in this context called a sample – to make inference
about a larger observable collection of data called a population. We associate distributions
with populations. For example, if the random variable which describes a population is
$N(\mu, \sigma^2)$, then we say that the population is $N(\mu, \sigma^2)$.
Parametric methods are often those for which we know that the population is normal, or
we can approximate using a normal distribution after we invoke the central limit theorem.
Ultimately the classification of a method as parametric depends upon the assumptions that are
made about a population. A few parametric methods include the testing of a statistical
hypothesis about a population mean under two different conditions:
1. when sampling is from a normally distributed population with known variance,
2. when sampling is from a normally distributed population with unknown variance.
Nonparametric methods, however, are not based on such distributional assumptions and
thus do not require a population’s distribution to be described by specific parameters.

1.3 Parametric versus nonparametric methods


The analysis of data often begins by considering the appropriateness of the normal distribution
as a model for describing the distribution of the population. If this distribution is reasonable,
or if the normal approximation is deemed adequate, then the analysis will be carried out using
normal-theory methods. If the normal distribution is not appropriate, it is common to consider
the possibility of a transformation of the data. For instance, a simple transformation of the
form $Y = \log(X)$ may yield data that are normally distributed, so that normal-theory methods
may be applied to the transformed data.
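As a rough illustration of the transformation idea, the sketch below compares the sample skewness of a small right-skewed data set before and after taking logarithms. The data values and the helper name `skewness` are made up for illustration and do not come from the text.

```python
import math


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (illustrative only).
x = [1.2, 1.9, 2.4, 3.1, 4.0, 5.5, 7.9, 12.6, 25.0, 61.0]
y = [math.log(v) for v in x]  # the transformation Y = log(X)

skew_raw = skewness(x)
skew_log = skewness(y)
print(f"skewness of X:      {skew_raw:.2f}")
print(f"skewness of log(X): {skew_log:.2f}")
```

A skewness closer to zero after the transformation suggests a more symmetric shape; it does not by itself establish that the transformed data are normal.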
If neither of these approaches seems reasonable, there are two ways to proceed. It may be
possible to identify the type of distribution that is appropriate – say, exponential – and then
use the methods that specifically apply to that distribution. However, there may not be
sufficient data to ascertain the form of the distribution, or the data may come from a
distribution for which methods are not readily available. In such situations one hopes not to
make untenable assumptions, and this is where nonparametric methods come into play.

Nonparametric methods require minimal assumptions about the form of the distribution
of the population. For instance, it might be assumed that the data are from a population that
has a continuous distribution, but no other assumptions are made. Or it might be assumed that
the population distribution depends on location and scale parameters, but the functional form
of the distribution, whether normal or whatever, is not specified. By contrast, parametric
methods require that the form of the population distribution be completely specified except for
a finite number of parameters. For instance, the familiar one-sample t-test for means assumes
that observations are selected from a population that has a normal distribution, and the only
values not known are the population mean and standard deviation. The simplicity of
nonparametric methods, the widespread availability of such methods in statistical packages,
and the desirable statistical properties of such methods make them attractive additions to the
data analyst’s tool kit.

1.4 Classes of nonparametric methods


Nonparametric methods may be classified according to their function, such as two-sample
tests, tests for trends, and so on. This is generally how this book is organized. However,
methods may also be classified according to the statistical ideas upon which they are based.
Here, we consider the ideas that underlie the methods discussed in this book.
The typical introductory course in statistics examines primarily parametric statistical
procedures. Recall that these procedures include tests based on the Student’s t-distribution,
analysis of variance, correlation analysis and regression analysis. A characteristic of these
procedures is the fact that the appropriateness of their use for the purpose of inference depends
on certain assumptions. Inferential procedures in analysis of variance, for example, assume
that samples have been drawn from normally distributed populations with equal variances.
Since populations do not always meet the assumptions underlying parametric tests, we
frequently need inferential procedures whose validity does not depend on rigid assumptions.
Nonparametric statistical procedures fill this need in many instances, since they are valid under
very general assumptions. As we shall discuss more fully later, nonparametric procedures also
satisfy other needs of the researcher.
By convention, two types of statistical procedures are treated as nonparametric:
(1) truly nonparametric procedures and (2) distribution-free procedures. Strictly speaking,
nonparametric procedures are not concerned with population parameters. For example, in this
book we shall discuss tests for randomness where we are concerned with some characteristic
other than the value of a population parameter. The validity of distribution-free procedures
does not depend on the functional form of the population from which the sample has been
drawn. It is customary to refer to both types of procedure as nonparametric. Kendall
and Sundrum (1953) discussed the differences between the terms nonparametric and
distribution-free.

1.5 When to use nonparametric procedures


The following are some situations in which the use of a nonparametric procedure is
appropriate.
1. The hypothesis to be tested does not involve a population parameter.
2. The data have been measured on a scale weaker than that required for the parametric
procedure that would otherwise be employed. For example, the data may consist of count
data or rank data, thereby precluding the use of some otherwise appropriate parametric
procedure.
3. The assumptions necessary for the valid use of a parametric procedure are not met. In
many instances, the design of a research project may suggest a certain parametric
procedure. Examination of the data, however, may reveal that one or more assumptions
underlying the test are grossly violated. In that case, a nonparametric procedure is
frequently the only alternative.
4. Results are needed in a hurry and calculations must be done by hand.
The literature in nonparametric statistics is extensive. A bibliography by Savage (1962)
contained some 3 000 entries. An up-to-date bibliography would undoubtedly contain many
times that number.

1.6 Advantages of nonparametric statistics


The following are some of the advantages of the available nonparametric statistical procedures.

1. Make fewer assumptions.


Nonparametric statistical procedures generally do not need rigid
parametric assumptions with regard to the populations from which the data are taken.

2. Wider scope.
Since fewer assumptions are made about the data being studied,
nonparametric statistics are usually wider in scope than parametric statistics, which
assume a specific distribution.

3. Need not involve population parameters.


Parametric tests involve specific probability distributions (e.g., the normal distribution)
and the tests involve estimation of the key parameters of that distribution (e.g., the mean
or difference in means) from the sample data. However, nonparametric tests need not
involve population parameters.

4. The chance of their being improperly used is small.


Since most nonparametric procedures depend on a minimum set of assumptions, the
chance of their being improperly used is small.

5. Applicable even when data is measured on a weak measurement scale.


For interval or ratio data, you may use a parametric test depending on the shape of the
distribution. Nonparametric tests can be performed even when you are working with data
that are nominal or ordinal.

6. Easy to understand.
Researchers with minimum preparation in Mathematics and Statistics usually find
nonparametric procedures easy to understand.

7. Computations can quickly and easily be performed.


Nonparametric tests can usually be performed quickly and easily without automated
instruments (calculators and computers). They are designed for small data sets,
including counts, classifications and ratings.

1.7 Disadvantages of nonparametric tests


Nonparametric procedures are not without disadvantages. The following are some of the more
important disadvantages.

1. May waste information.


The researcher may waste information when parametric procedures are more appropriate
to use. If the assumptions of the parametric methods can be met, it is generally more
efficient to use them.

2. Difficult to compute by hand for large samples.


For large sample sizes, data manipulations tend to become more laborious, unless
computer software is available.

3. Tables not widely available.


Often special tables of critical values are needed for the test statistic, and these values
cannot always be generated by computer software. On the other hand, the critical values
for the parametric tests are readily available and generally easy to incorporate in computer
programs.

1.8 The scope of this book


The emphasis in this book is on the application of nonparametric statistical methods. Wherever
available, the examples and exercises use real data, gleaned primarily from the results of
research published in various journals. We hope that the use of real situations and real data
will make the book more interesting to you. We have included problems on the wide variety
of statistical techniques described. We have also included a wide variety of statistical
techniques. The techniques we discuss are those most likely to prove helpful to the researcher
and most likely to appear in the research literature. In this text we have covered not only
hypothesis testing, but interval estimation as well.

1.9 Format and organization


In presenting these statistical procedures, we have adopted a format designed to make it easy
for you to use the book. Each hypothesis-testing procedure is broken down into four
components: (1) assumptions, (2) hypotheses, (3) test statistic, and (4) decision rule.

Thus, for a given test, you can quickly determine the assumptions on which the test is
based, the hypotheses that are appropriate, how to compute the test statistic, and how to
determine whether to reject the null hypothesis. First, we discuss these topics in general, and
then we use an example to illustrate the application of the test.
Where appropriate for a given test, we discuss ties, the large-sample approximation, and
the power efficiency. For each procedure, we cite references that you may consult if you are
interested in learning more about the procedure or in further pursuing a related topic. Finally,
we provide exercises for each procedure. These exercises serve two purposes: they illustrate
appropriate uses of a test, and they give you a chance to determine whether you have mastered
the computational techniques, and learnt how to set the hypotheses and use the applicable
decision rule.
In the remaining chapters, we cite two types of reference: those that are cited in the body
of the text and refer you to the statistical literature, and those that are cited in the examples and
exercises and refer you to the research literature.

References
Armitage, P. (1971). Statistical Methods in Medical Research. Oxford and Edinburgh:
Blackwell Scientific Publications.
Colton, T. (1974). Statistics in Medicine. Boston: Little, Brown.
Dunn, O. J. (1964). Basic Statistics: A Primer for the Biomedical Sciences. New York:
Wiley.
Kendall, M. G. and Sundrum, R. M. (1953). Distribution-Free Methods and Order Properties. Rev.
Int. Statist. Inst. 21, 124–134.
Remington, R. D. and Schork, M. A. (1970). Statistics with Applications to the Biological and
Health Sciences. Englewood Cliffs, N.J.: Prentice-Hall.
Savage, I. R. (1962). Bibliography on Nonparametric Statistics. Harvard University Press.

Chapter Two
One-Sample Nonparametric Methods

2.1 Introduction
In classical parametric tests (which assume that the population from which the sample data
have been drawn is normally distributed), the parameter of interest is the population mean. In
this chapter, we shall be concerned with the nonparametric analogues of the one-sample z and t
tests. These are nonparametric procedures (which utilize data consisting of a single set of
observations) that are appropriate when the location parameter is the median, rather than the
mean.
Several nonparametric procedures are available for making inferences about the median.
Two of the nonparametric tests which are useful in situations where the conditions for
the parametric z and t tests are not met, are the one-sample sign test and the Wilcoxon
signed-ranks test.
Recall that the median of a set of data is defined as the middle value when data are
arranged in order of magnitude. For continuous distributions, we define the median as the
point $\theta$ for which the probability that a value selected at random from the distribution is less
than $\theta$, and the probability that a value selected at random from the distribution is greater than
$\theta$, are both equal to 1/2. When the population from which the sample has been drawn is
symmetric, any conclusions about the median are applicable to the mean, since in symmetrical
distributions the mean and the median coincide.
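The following sketch illustrates the two points made above on small samples: the median splits the data so that no more than half of the values lie on either side, and the mean and median agree when the data are symmetric but not when they are skewed. The two data sets and the helper name `share_below_and_above` are invented for illustration.

```python
from statistics import mean, median

# Hypothetical samples (illustrative only).
symmetric = [2, 4, 6, 8, 10]
skewed = [1, 2, 3, 4, 50]


def share_below_and_above(values, point):
    """Fractions of the values strictly below and strictly above `point`."""
    n = len(values)
    below = sum(v < point for v in values) / n
    above = sum(v > point for v in values) / n
    return below, above


for name, sample in [("symmetric", symmetric), ("skewed", skewed)]:
    m = median(sample)
    below, above = share_below_and_above(sample, m)
    print(name, "median =", m, "mean =", mean(sample),
          "share below =", below, "share above =", above)
```

In a finite sample the two shares are at most 1/2 rather than exactly 1/2, because the median itself may be one of the observed values.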
In this chapter, we shall also discuss procedures for making inferences concerning the
population proportion and testing for randomness and the presence of trend.
Wherever possible, we shall observe the following format in presenting the hypothesis-
testing procedures.

1. Assumptions
We list the assumptions necessary for the validity of the test, and describe the data on
which the calculations are based.
2. Parameter of interest
From the problem context, we identify the parameter of interest.

3. Hypotheses
We state the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

4. Test statistic
We write down a formula or direction for computing the relevant test statistic. When we
give a formula, we describe the methodology for evaluating it.

5. Significance level
We choose a significance level $\alpha$.

6. Decision rule
We determine the critical region. The Appendix gives appropriate tables for the distribution
of the test statistic. From these tables, we can determine the critical values of the test statistic
corresponding to the chosen .

7. Value of the test statistic


We compute the value of the test statistic from the sample data.

8. Decision
If the computed value of the test statistic is as extreme as or more extreme than a critical
value, we reject H 0 and conclude that H1 is true. If we cannot reject H 0 , we conclude that
there is not enough information to warrant its falsity.

2.2 The one-sample sign test


The sign test is perhaps the oldest of all nonparametric procedures. Let $X_1, X_2, ..., X_n$ be an
observed random sample of size $n$ from a population with median $\theta$. The sign test utilizes only
the signs of the differences between the observed values $X_i$ and the hypothesized median $\theta_0$.
Thus, the data are converted into a series of plus (+) and minus (–) signs.

2.2.1 Assumptions
1. The sample available for analysis is a random sample of independent measurements from
a population with an unknown median $\theta$.
2. The variable of interest is measured on at least an ordinal scale.
3. The variable of interest is continuous.


2.2.2 Hypotheses
The hypothesis to be tested concerns the value of the population median. To test the hypothesis
$$H_0: \theta = \theta_0,$$
where $\theta_0$ is a specified median value, against a corresponding one-sided or two-sided
alternative, we use the Sign Test. The test statistic $S$ depends on the alternative hypothesis,
$H_1$.

(a) One-sided test


For a one-sided test, the alternative hypothesis is either $H_1: \theta < \theta_0$ or $H_1: \theta > \theta_0$.
(i) If we wish to test
$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta < \theta_0,$$
then the test statistic is defined by
$$S = N^+,$$
where $N^+$ = number of observations $X_i$ greater than $\theta_0$
= number of + signs when the differences $X_i - \theta_0$ are computed,
$i = 1, 2, ..., n$.
If the alternative hypothesis is true, then we should expect the differences $X_i - \theta_0$ to
yield significantly fewer positive (+) signs than negative (−) signs. Thus, a sufficiently small
number of (+) signs leads to the rejection of $H_0$. When $H_0$ is true, we expect the
number of (−) signs to be about equal to that of the (+) signs, since
$$P(X_i > \theta_0) = P(X_i < \theta_0) = \tfrac{1}{2}.$$
Thus, when $H_0$ is true, $S$ has the binomial distribution with parameters $n$ and $\tfrac{1}{2}$.
That is,
$$S \sim b\left(n, \tfrac{1}{2}\right).$$
Decision rule
The p-value of the test is defined by
$$p = P(S \le s_o \mid H_0 \text{ is true}),$$
where $s_o$ is the observed value of the test statistic $S$. We reject $H_0$ at significance
level $\alpha$ if $p \le \alpha$.


(ii) If we wish to test
$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta > \theta_0,$$
then the test statistic is
$$S = N^-,$$
where $N^-$ = number of observations $X_i$ less than $\theta_0$
= number of – signs when the differences $X_i - \theta_0$ are computed,
$i = 1, 2, ..., n$.
If the alternative hypothesis is true, then we should expect the differences $X_i - \theta_0$ to yield fewer
negative (−) signs than would be expected if the null hypothesis were true. As before,
when $H_0$ is true, $S$ has the binomial distribution with parameters $n$ and $\tfrac{1}{2}$. That is,
$$S \sim b\left(n, \tfrac{1}{2}\right).$$
Decision rule
The p-value of the test is defined by
$$p = P(S \le s_o \mid H_0 \text{ is true}),$$
where $s_o$ is the observed value of $S = N^-$. We reject $H_0$ at significance level $\alpha$ if
$p \le \alpha$.

(b) Two-sided test


If we wish to test
$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta \ne \theta_0,$$
then the test statistic is defined by
$$S = \min(N^-, N^+),$$
where $N^-$ is the number of – signs and $N^+$ is the number of + signs when the differences
$X_i - \theta_0$ are computed.
We should reject the null hypothesis if we have too few negative (–) signs or too few
positive (+) signs. When $H_0$ is true, $S$ has the binomial distribution with parameters $n$
and $\tfrac{1}{2}$.


Decision rule
The p-value of the test is defined by
$$p = 2P(S \le s_o \mid H_0 \text{ is true}),$$
where $s_o$ is the observed value of the test statistic $S$. We reject $H_0$ at significance level
$\alpha$ if $p \le \alpha$.

Problem with zero differences

• We assume that the variable of interest is continuous. Therefore, in theory, no zero
differences should occur when we compute $x_i - \theta_0$.
• In practice, however, zero differences do occur. The usual procedure is to discard
observations leading to zero differences and reduce $n$ accordingly. In that case the
hypothesis may be re-stated in probability terms. For example, a two-sided case will
have its null hypothesis as
$$P(X > \theta_0) = P(X < \theta_0) = 0.5.$$

Example 2.1
Appearance transit times for 11 patients with significantly occluded right coronary arteries are
given below:

Subject               1     2     3     4     5     6     7     8     9     10    11
Transit time (sec)    1.80  3.30  5.65  2.25  2.50  3.50  2.25  3.10  2.70  2.70  3.00

Can we conclude, at the 0.05 level of significance, that the median appearance transit time in
the population from which the data were drawn, is different from 3.50 seconds?

Solution
The parameter of interest is $\theta$, the median appearance transit time in the population. We wish
to test the hypothesis
$$H_0: \theta = 3.50 \quad \text{against} \quad H_1: \theta \ne 3.50,$$
at the $\alpha = 0.05$ level of significance. Since this is a two-sided test, the test statistic is
$$S = \min(N^-, N^+),$$
where $N^-$ is the number of observations less than 3.50 and $N^+$ is the number of observations
greater than 3.50. When $H_0$ is true, $S \sim b(10, 0.5)$.


Note: We discard one observation which has the same value as the hypothesized median,
leaving us with a usable sample size of 10.
Let $s_o$ be the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance
when $p \le 0.05$, where
$$p = 2P(S \le s_o \mid 10, 0.5).$$

No.                   1     2     3     4     5     6     7     8     9     10
X_i                   1.80  3.30  5.65  2.25  2.50  2.25  3.10  2.70  2.70  3.00
Sign of X_i − 3.50    –     –     +     –     –     –     –     –     –     –

From the above table, $N^- = 9$ and $N^+ = 1$. The observed value of the test statistic is therefore
given by
$$s_o = \min(9, 1) = 1.$$
Since this is a two-sided test, the p-value of the test is given by
$$p = 2P(S \le 1 \mid 10, 0.5) = 2(0.0107) = 0.0214.$$
Since the p-value of the test, 0.0214, is less than 0.05, we reject $H_0$ at the 0.05 level of
significance and conclude that the population median is not 3.50.

Example 2.2
The following data are IQs of arrested drug abusers who are aged 16 years or older. Is there
any evidence that the median IQ of drug abusers in the population is greater than 107?
Use $\alpha = 0.05$.
99 100 90 94 135 108 107 111 119 104 127 109 117 105 125

Solution
The parameter of interest is $\theta$, the median IQ of drug abusers in the population. We wish to
test the hypothesis
$$H_0: \theta = 107 \quad \text{against} \quad H_1: \theta > 107,$$
at the $\alpha = 0.05$ level of significance. The test statistic is
$$S = N^-,$$
where $N^-$ is the number of observations less than 107. When $H_0$ is true, $S \sim b(14, 0.5)$.


Note: We discard one observation which has the same value as the hypothesized median,
leaving us with a usable sample size of 14.
Let $s_o$ be the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance
when $p \le \alpha = 0.05$, where the p-value of the test is given by
$$p = P(S \le s_o \mid 14, 0.5).$$
The following table gives the signs of $X_i - 107$.

No.                  1    2    3    4    5    6    7    8    9    10   11   12   13   14
X_i                  99   100  90   94   135  108  111  119  104  127  109  117  105  125
Sign of X_i − 107    –    –    –    –    +    +    +    +    –    +    +    +    –    +

Here, $N^- = 6$ and $N^+ = 8$. Since the test statistic is $S = N^-$, its observed value is
$s_o = 6$.
Since this is a one-sided test, the p-value of the test is given by
$$p = P(S \le 6 \mid 14, 0.5) = 0.3953.$$
Since the p-value of the test, 0.3953, is greater than 0.05, we fail to reject $H_0$ at the 0.05 level
of significance. Hence, there is not enough evidence to conclude that the median IQ of the
subjects in the population is greater than 107.
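
The exact sign test described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `binom_cdf` and `sign_test` and the `alternative` labels are our own assumptions, not notation from the text. Applied to the data of Examples 2.1 and 2.2, it should agree with the p-values read from the binomial table, up to table rounding.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """Return P(S <= k) for S ~ b(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def sign_test(data, theta0, alternative="two-sided"):
    """Exact one-sample sign test (hypothetical helper).

    Returns (usable n, observed S, p-value). Observations equal to
    theta0 are discarded, as in the text.
    """
    n_plus = sum(x > theta0 for x in data)    # number of + signs
    n_minus = sum(x < theta0 for x in data)   # number of - signs
    n = n_plus + n_minus
    if alternative == "less":        # H1: median < theta0, S = N+
        s = n_plus
        p = binom_cdf(s, n)
    elif alternative == "greater":   # H1: median > theta0, S = N-
        s = n_minus
        p = binom_cdf(s, n)
    elif alternative == "two-sided": # S = min(N-, N+), p doubled
        s = min(n_plus, n_minus)
        p = min(1.0, 2 * binom_cdf(s, n))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return n, s, p

# Example 2.1 (two-sided, hypothesized median 3.50)
transit = [1.80, 3.30, 5.65, 2.25, 2.50, 3.50, 2.25, 3.10, 2.70, 2.70, 3.00]
print(sign_test(transit, 3.50))

# Example 2.2 (one-sided, H1: median > 107)
iq = [99, 100, 90, 94, 135, 108, 107, 111, 119, 104, 127, 109, 117, 105, 125]
print(sign_test(iq, 107, alternative="greater"))
```

The p-value is computed directly from the binomial distribution rather than read from a table, so small differences in the last decimal place are to be expected.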

2.2.3 Large sample approximation


If the sample size is larger than 15, we can use the normal approximation to the binomial
distribution with a continuity correction. Thus, if $n$ is large and $S \sim b\left(n, \tfrac{1}{2}\right)$, then it can be
shown that $S$ is approximately normally distributed with mean $np$ and variance $np(1 - p)$.
That is, $S$ is approximately $N(np, np(1 - p))$. Thus, for the sign test, when $p = \tfrac{1}{2}$ and $n > 15$, we can use the
test statistic
$$Z = \frac{S - \tfrac{1}{2}n}{\sqrt{n\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)}} = \frac{S - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}}. \qquad (2.1)$$

When $H_0$ is true and $n > 15$, $Z$ is approximately $N(0, 1)$. For the large sample
approximation, it is common to use a continuity correction, by replacing $S$ by $S + \tfrac{1}{2}$ in the
definition of $Z$. Equation (2.1) then becomes
$$Z = \frac{\left(S + \tfrac{1}{2}\right) - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}}. \qquad (2.2)$$


Example 2.3
The following data give the ages, in years, of a random sample of 20 students from Besease
Senior High School. It is believed that the median age of students in this school is smaller than
22 years. Based on these data, is there sufficient evidence to conclude that the median age of
students from Besease Senior High School is smaller than 22 years?
9 13 16 16 16 17 18 19 19 19
19 20 20 21 21 23 24 25 25 27

Solution
The parameter of interest is $\theta$, the median age of students from Besease Senior High School.
We are interested in testing the null hypothesis
$$H_0: \theta = 22 \quad \text{against} \quad H_1: \theta < 22.$$
The test statistic is
$$S = N^+,$$
where $N^+$ = number of observations $X_i$ greater than 22
= number of + signs when the differences $X_i - 22$ are computed, $i = 1, 2, ..., 20$.
When $H_0$ is true, $S \sim b\left(20, \tfrac{1}{2}\right)$. Since $n > 15$, we use the normal approximation to the
binomial distribution with a continuity correction. The test statistic then becomes
$$Z = \frac{(S + 0.5) - 0.5(20)}{0.5\sqrt{20}}.$$
When $H_0$ is true, $Z$ is approximately $N(0, 1)$. Let $z_o$ denote the observed value of the test statistic $Z$. We
reject $H_0$ at the 0.05 level of significance when $z_o < -z_\alpha = -z_{0.05} = -1.645$. The following table
gives the signs of $X_i - 22$.

X_i     9    13   16   16   16   17   18   19   19   19
Sign    –    –    –    –    –    –    –    –    –    –
X_i     19   20   20   21   21   23   24   25   25   27
Sign    –    –    –    –    –    +    +    +    +    +

From the above table, $N^+ = 5$. Thus, the observed value of the statistic $S$ is 5. This gives
$$z_o = \frac{(5 + 0.5) - 0.5(20)}{0.5\sqrt{20}} = -2.0125.$$


Since −2.0125 is less than −1.645, we reject $H_0$ at the 0.05 level of significance and conclude
that the median age of students of Besease Senior High School is less than 22 years.
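
As a rough check of the arithmetic in Example 2.3, the continuity-corrected statistic of equation (2.2) can be evaluated as follows. The helper names `sign_test_z` and `normal_cdf` are assumptions made for this sketch; the lower-tail probability is obtained from the error function rather than from a normal table.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sign_test_z(s, n):
    """Continuity-corrected Z for the sign test, S ~ b(n, 1/2) under H0."""
    return ((s + 0.5) - 0.5 * n) / (0.5 * sqrt(n))

# Example 2.3: n = 20 usable observations, S = N+ = 5
z_obs = sign_test_z(5, 20)
p_lower = normal_cdf(z_obs)   # approximate P(S <= 5) under H0
print(round(z_obs, 4), round(p_lower, 4))
```

Comparing `z_obs` with −1.645, or equivalently `p_lower` with 0.05, leads to the same decision.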

2.2.4 Confidence interval for the median based on the sign test
• The $100(1 - \alpha)\%$ confidence interval for $\theta$ consists of those values of $\theta_0$ for which we
would not reject a two-sided null hypothesis $H_0: \theta = \theta_0$ at the $\alpha$ level of significance.
• We designate the lower limit of our confidence interval by $\theta_L$ and the upper limit by $\theta_U$.
• We determine the largest number of positive or negative signs (i.e. the value $s$) such that
$$P(S \le s \mid n, 0.5) \le \alpha/2.$$
• When the data values are arranged in order of magnitude, the $(s + 1)$th observation
is $\theta_L$. To find $\theta_U$, the upper limit of the confidence interval, we count the ordered sample
values backwards from the largest. The $(s + 1)$th observation from the largest value locates
$\theta_U$, i.e. $\theta_U$ is the $(n - s)$th value.

Example 2.4
Construct a 95% confidence interval for the median of the population from which the
following sample data have been drawn, using the sign test.
0.07 0.69 1.74 1.90 1.99 2.41 3.07 3.08
3.10 3.57 3.71 4.01 8.11 8.23 9.10 10.16

Solution
• The point estimate of the population median is the sample median, which is the mean of
the two middle values in the ordered array. Thus,
the sample median $= \frac{3.08 + 3.10}{2} = 3.09$.
• To find $\theta_L$, we consult a table of the binomial distribution and find that
$P(S \le 3 \mid 16, 0.5) = 0.0105$ and $P(S \le 4 \mid 16, 0.5) = 0.0383$.
• Thus, we cannot obtain an exact 95% confidence interval for the median,
since 100[1 – 2(0.0105)] = 97.9, which is larger than 95, and 100[1 – 2(0.0383)] = 92.34,
which is smaller than 95.
• This method of constructing confidence intervals for the median does not usually yield
intervals with exactly the usual coefficients of 0.90, 0.95 and 0.99.

• In practice, we choose between a wider interval with higher confidence and a narrower
interval with lower confidence.
• Suppose we choose $s = 4$; then $s + 1 = 5$. Therefore the 5th value in the ordered array is
$\theta_L$ and the 12th (i.e. 16 – 4) value in the ordered array is $\theta_U$.
• Thus $\theta_L = 1.99$ and $\theta_U = 4.01$.
• The confidence coefficient is therefore 100[1 – 2(0.0383)] = 92.34. We say that we are
92.34% confident that the population median is between 1.99 and 4.01.
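
The construction in Example 2.4 can be sketched as follows. The function name `sign_ci` is a hypothetical helper, and the tail probability is computed exactly rather than read from a binomial table, so the confidence coefficient may differ from the tabulated figure in the last decimal place.

```python
from math import comb

def sign_ci(data, s):
    """Sign-test confidence interval for the median (illustrative sketch).

    Returns (lower limit, upper limit, confidence coefficient), where the
    limits are the (s+1)th smallest and (s+1)th largest observations and
    the coefficient is 1 - 2 P(S <= s | n, 0.5).
    """
    xs = sorted(data)
    n = len(xs)
    tail = sum(comb(n, j) for j in range(s + 1)) / 2**n  # P(S <= s | n, 0.5)
    lower = xs[s]            # (s+1)th smallest value
    upper = xs[n - s - 1]    # (n-s)th smallest value
    return lower, upper, 1 - 2 * tail

sample = [0.07, 0.69, 1.74, 1.90, 1.99, 2.41, 3.07, 3.08,
          3.10, 3.57, 3.71, 4.01, 8.11, 8.23, 9.10, 10.16]
print(sign_ci(sample, 4))   # narrower interval, lower confidence
print(sign_ci(sample, 3))   # wider interval, higher confidence
```

Calling the function with both candidate values of s shows the trade-off between the width of the interval and the confidence coefficient.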

Large Sample Approximation


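We find $k$ such that
$$P(S \le k \mid n, 0.5) = \frac{\alpha}{2},$$
and hence, using the normal approximation,
$$P\left(\frac{S - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}} \le \frac{k - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}}\right) = \frac{\alpha}{2}
\;\Rightarrow\; P\left(Z \le \frac{k - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}}\right) = \frac{\alpha}{2}
\;\Rightarrow\; P\left(Z \le z_{1 - \alpha/2}\right) = \frac{\alpha}{2},$$
where $Z$ is $N(0, 1)$ and
$$z_{1 - \alpha/2} = -z_{\alpha/2} = \frac{k - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}}.$$
Making $k$ the subject of the above equation, we obtain
$$k = \tfrac{1}{2}n - \tfrac{1}{2}z_{\alpha/2}\sqrt{n} = \tfrac{1}{2}\left(n - z_{\alpha/2}\sqrt{n}\right).$$
Approximately, $k = s + 1$. If the resulting value is not an integer, we use the closest integer.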

Example 2.5
Refer to Example 2.3. Construct a 95% confidence interval for $\theta$.

Solution
Here, $n = 20$ and $z_{\alpha/2} = z_{0.025} = 1.96$. Thus,
$$k = s + 1 = \tfrac{1}{2}\left(20 - 1.96\sqrt{20}\right) = 5.6 \approx 6, \quad \text{and } s = 5.$$


Therefore the 6th observation in the ordered array is $\theta_L$ and the $(20 - 5)$th = 15th observation
in the ordered array is $\theta_U$. Thus, $\theta_L = 17$ and $\theta_U = 21$. Hence the 95% confidence interval
for $\theta$ is $17 \le \theta \le 21$.
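
The large-sample value of k used in Example 2.5 can be computed as in the sketch below. The function name `sign_ci_k` is an assumption made for illustration, and the normal quantile is taken from Python's `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def sign_ci_k(n, conf=0.95):
    """Return (k, k rounded to the closest integer) for the large-sample
    sign-test confidence interval, using k = (n - z * sqrt(n)) / 2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper alpha/2 point
    k = 0.5 * (n - z * sqrt(n))
    return k, round(k)

print(sign_ci_k(20))   # sample size of Example 2.3
```

The rounded k gives the position, counted from each end of the ordered sample, of the observations that serve as confidence limits.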

2.3 The Wilcoxon signed-ranks test


As we have seen, the sign test utilizes only the signs of the differences between observed
values and the hypothesized median. For testing $H_0: \theta = \theta_0$, there is another procedure that
uses the magnitude of the differences when these are available. The Wilcoxon signed-ranks
procedure makes use of additional information to rank the differences between the sample
measurements and the hypothesized median. The Wilcoxon signed-ranks test uses more
information than the sign test, making it a more powerful test when the sampled population is
symmetric. However, the sign test is preferred when the sampled population is not symmetric.

2.3.1 Assumptions
1. The sample available for analysis is a random sample of size $n$ from a population with an
unknown median $\theta$.
2. The variable of interest is measured on a continuous scale.
3. The sampled population is symmetric.
4. The scale of measurement is at least interval.
5. The observations are independent.

2.3.2 Hypotheses
The parameter of interest is $\theta$, the population median. To test the hypothesis
$$H_0: \theta = \theta_0,$$
where $\theta_0$ is the hypothesized median, against a corresponding one-sided or two-sided
alternative, we can also use the Wilcoxon signed-ranks test.

2.3.3 Test statistic


To obtain the test statistic, we use the following procedure.
1. Subtract the hypothesized median $\theta_0$ from each observation $X_i$, that is, for each
observation $X_i$, find
$$D_i = X_i - \theta_0, \quad i = 1, 2, ..., n.$$


2. If any observation $X_i$ is equal to the hypothesized median, $\theta_0$, eliminate it from the
calculations and reduce the sample size accordingly.
3. Rank the absolute differences $|D_i|$ from the smallest to the largest, that is, without regard to
the signs of the $D_i$. If two or more $|D_i|$ are tied, assign each tied value the mean of the rank
positions of the tied differences.
4. Assign to each rank the sign of the difference to which it corresponds.
5. Obtain the sum of the ranks with positive signs; call it $W^+$. Obtain the sum of the ranks
with negative signs; call it $W^-$.
6. Note that:
$$W^+ = \frac{n(n + 1)}{2} - W^-.$$
7. For a given sample, we do not expect $W^+$ to be equal to $W^-$.
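
The ranking procedure above can be sketched in Python as follows. The function name `signed_rank_sums` is our own, and the data used for illustration are the IQ scores of Example 2.2 with hypothesized median 107; this is a minimal sketch rather than a full implementation of the test.

```python
def signed_rank_sums(data, theta0):
    """Return (W+, W-) for the Wilcoxon signed-ranks procedure.

    Zero differences are dropped and tied absolute differences
    receive the mean of the rank positions they occupy.
    """
    diffs = [x - theta0 for x in data if x != theta0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1      # mean of positions i+1, ..., j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

iq = [99, 100, 90, 94, 135, 108, 107, 111, 119, 104, 127, 109, 117, 105, 125]
w_plus, w_minus = signed_rank_sums(iq, 107)
print(w_plus, w_minus)   # the two sums add up to n(n + 1)/2 with n = 14
```

Step 6 of the procedure provides a convenient check: the two sums returned should always add up to n(n + 1)/2 for the reduced sample size n.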

2.3.4 Carrying out the Wilcoxon signed ranks test


When the null hypothesis,
$$H_0: \theta = \theta_0,$$
is true, we do not expect a great difference between $W^+$ and $W^-$. Consequently, a sufficiently
small value of $W^+$ or a sufficiently small value of $W^-$ causes us to reject $H_0$.
(a) One-sided test: To test
$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta < \theta_0$$
at the α level of significance.
Test statistic
A sufficiently small value of $W^+$ leads to the rejection of the null hypothesis $H_0$. The test
statistic therefore is
$$W = W^+.$$
Decision rule
We reject $H_0$ at significance level $\alpha$ if the observed $W$ value, $w_o$, is less than or equal to
the tabulated $W$ value for $n$ and a preselected value of $\alpha$.
(b) One-sided test: To test
$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta > \theta_0$$


at the α level of significance.

Test statistic
For a sufficiently small $W^-$ value, we reject $H_0$. The test statistic therefore is
$$W = W^-,$$
since a small value causes us to reject the null hypothesis.

Decision rule
We reject $H_0$ at significance level $\alpha$ if the observed $W$ value, $w_o$, is less than or equal to
the tabulated $W$ value for $n$ and a preselected value of $\alpha$.

(c) For a two-sided test, we test
$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta \ne \theta_0$$
at the α level of significance.

Test statistic
The test statistic is
$$W = \min(W^-, W^+),$$
since a small value of either $W^+$ or $W^-$ causes us to reject the null hypothesis.

Decision rule
We reject $H_0$ at significance level $\alpha$ if the observed $W$ value, $w_o$, is less than or equal to
the tabulated $W$ value for $n$ and a preselected value of $\alpha/2$.

The distribution of W
1. The smallest value $W$ can take is zero (0) and the largest value that $W$ can take is the sum
of the integers from 1 to $n$, that is, $\frac{n(n + 1)}{2}$. $W$ is therefore a discrete random variable
whose support ranges between 0 and $\frac{n(n + 1)}{2}$.
2. It can be shown that the probability mass function of the discrete random variable $W$ is
given by
$$P(W = w) = f(w) = \frac{c(w)}{2^n}, \quad 0 \le w \le \frac{n(n + 1)}{2},$$
where $c(w)$ = the number of possible ways to assign a + sign or a − sign to the first $n$ integers
so that the sum of the ranks with + signs (or – signs) is equal to $w$.
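
The counts c(w) can be obtained by a simple recursion over the ranks 1, 2, ..., n, as in the sketch below. The function names `signed_rank_counts` and `critical_value` are assumptions made for this illustration, and the sketch ignores ties. Taking the critical value to be the largest w with P(W ≤ w) not exceeding the chosen tail probability, it should reproduce tabulated values such as $w_{12, 0.05} = 17$ and $w_{14, 0.025} = 21$.

```python
def signed_rank_counts(n):
    """c[w] = number of subsets of the ranks {1, ..., n} that sum to w."""
    total = n * (n + 1) // 2
    c = [0] * (total + 1)
    c[0] = 1                       # the empty set of ranks sums to 0
    for r in range(1, n + 1):      # decide whether rank r gets the sign counted
        for w in range(total, r - 1, -1):
            c[w] += c[w - r]
    return c

def critical_value(n, alpha):
    """Largest w with P(W <= w) <= alpha under H0, or None if there is none."""
    c = signed_rank_counts(n)
    cumulative, crit = 0, None
    for w, count in enumerate(c):
        cumulative += count
        if cumulative / 2**n <= alpha:
            crit = w
        else:
            break
    return crit

print(signed_rank_counts(3))       # counts for w = 0, 1, ..., 6
print(critical_value(12, 0.05), critical_value(14, 0.025))
```

Dividing each count by $2^n$ gives the probability mass function f(w) stated above.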


Example 2.6
The following are the systolic blood pressures (mmHg) of 13 patients undergoing a drug
therapy for hypertension:
183 178 152 157 194 163 144 114 179 150 118 158 165
Can we conclude on the basis of these data that the median systolic blood pressure is less than
165 mmHg? Take α = 0.05.

Solution
The parameter of interest is $\theta$, the median systolic blood pressure of the population. We
wish to test the hypothesis
$$H_0: \theta = 165 \quad \text{against} \quad H_1: \theta < 165$$
at the $\alpha = 0.05$ level of significance. Using the Wilcoxon signed-ranks test, the test statistic is
$$W = W^+,$$
where $W^+$ is the sum of the ranks with positive signs. One observation is equal to the
hypothesized median and is discarded, leaving a usable sample size of 12.
We reject $H_0$ at the 0.05 level of significance if $w_o \le w_{12, 0.05} = 17$, where $w_o$
is the observed value of the test statistic.

Table 2.1: Computation of test statistic

No.    X_i    D_i = X_i − 165    Ranks (–)    Ranks (+)
1      114    -51                12
2      118    -47                11
3      144    -21                9
4      150    -15                7
5      152    -13                4.5
6      157    -8                 3
7      158    -7                 2
8      163    -2                 1
9      178    13                              4.5
10     179    14                              6
11     183    18                              8
12     194    29                              10
Total                            49.5         28.5

From Table 2.1, $W^- = 49.5$ and $W^+ = 28.5$. The value of the test statistic is
therefore $w_o = 28.5$. Since 28.5 > 17, we fail to reject $H_0$. We conclude that there is not
enough evidence that the median systolic blood pressure of the subjects in the population is
less than 165 mmHg.

Example 2.7
Refer to Example 2.2. Use the Wilcoxon signed-ranks test to determine if there is any evidence
that the median IQ of drug abusers in the population is different from 107. Use $\alpha = 0.05$.


Solution
Let $\theta$ denote the median IQ of drug abusers who are aged 16 years or older. We wish to
test the hypothesis
$$H_0: \theta = 107 \quad \text{against} \quad H_1: \theta \ne 107$$
at the $\alpha = 0.05$ level of significance. The test statistic is
$$W = \min(W^-, W^+),$$
where $W^-$ and $W^+$ are the sums of the ranks with negative and positive signs, respectively.
We reject $H_0$ at the 0.05 level of significance if $w_o \le w_{14, 0.025} = 21$, where
$w_o$ is the observed value of the test statistic.

Table 2.2: Computation of test statistic

No.    IQ     D_i    Ranks (–)    Ranks (+)
1      90     -17    11
2      94     -13    10
3      99     -8     7
4      100    -7     6
5      104    -3     4
6      105    -2     2.5
7      108    1                   1
8      109    2                   2.5
9      111    4                   5
10     117    10                  8
11     119    12                  9
12     125    18                  12
13     127    20                  13
14     135    28                  14
Total                40.5         64.5

From Table 2.2, $W^- = 40.5$ and $W^+ = 64.5$. The value of the test statistic is
$w_o = 40.5$.
Since 40.5 > 21, we fail to reject $H_0$. We
conclude that the median IQ of the subjects in the population may be 107.

2.3.5 Large sample approximation


Theorem 2.1
When the null hypothesis is true,
$$E(W) = \frac{n(n + 1)}{4} \quad \text{and} \quad V(W) = \frac{n(n + 1)(2n + 1)}{24}.$$
Proof
When $H_0$ is true, $W$ can be defined as $W = \sum_{i=1}^{n} W_i$, where
$$W_i = \begin{cases} 0 & \text{with probability } \tfrac{1}{2}, \\ i & \text{with probability } \tfrac{1}{2}. \end{cases}$$
Thus,
$$E(W) = \sum_{i=1}^{n} E(W_i) = \sum_{i=1}^{n} \left[0\left(\tfrac{1}{2}\right) + i\left(\tfrac{1}{2}\right)\right] = \tfrac{1}{2}\sum_{i=1}^{n} i = \tfrac{1}{2} \cdot \frac{n(n + 1)}{2} = \frac{n(n + 1)}{4}.$$
Since $W_1, W_2, ..., W_n$ are independent,
$$V(W) = \sum_{i=1}^{n} V(W_i),$$
where
$$V(W_i) = E(W_i^2) - \left[E(W_i)\right]^2 = \left[0^2\left(\tfrac{1}{2}\right) + i^2\left(\tfrac{1}{2}\right)\right] - \left(\tfrac{i}{2}\right)^2 = \tfrac{1}{2}i^2 - \tfrac{1}{4}i^2 = \tfrac{1}{4}i^2.$$
Hence,
$$V(W) = \sum_{i=1}^{n} \tfrac{1}{4}i^2 = \tfrac{1}{4}\sum_{i=1}^{n} i^2 = \tfrac{1}{4} \cdot \frac{n(n + 1)(2n + 1)}{6} = \frac{n(n + 1)(2n + 1)}{24}.$$
i 1 i 1

Theorem 2.2
When the null hypothesis is true, for large $n$,
$$Z = \frac{W - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}}$$
follows an approximate standard normal distribution $N(0, 1)$.

Proof
If $W$ is a random variable with mean $\frac{n(n + 1)}{4}$ and variance $\frac{n(n + 1)(2n + 1)}{24}$, then by the central
limit theorem,
$$Z = \frac{W - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}}$$
is approximately $N(0, 1)$.

Adjustment for Ties

• We can incorporate an adjustment for ties among nonzero differences in the large sample
approximation in the following way.
• Let $t$ be the number of absolute differences tied for a particular nonzero rank. Then the
correction factor is
$$\frac{\sum t^3 - \sum t}{48},$$
where the sums are taken over all groups of tied differences.
• We subtract this quantity from the expression under the square root sign in the
denominator.
• Thus the adjusted statistic for a large sample approximation is
$$Z = \frac{W - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24} - \frac{\sum t^3 - \sum t}{48}}}.$$
• We illustrate the calculation of an adjustment for ties in the following data:

Table 2.3: Computation of correction factor

Observation    Rank    t     t^3
3              1.5     2     8
3              1.5
4              3
6              5       3     27
6              5
6              5
8              7.5     2     8
8              7.5
9              10.5    4     64
9              10.5
9              10.5
9              10.5
Total                  11    107

$$\frac{\sum t^3 - \sum t}{48} = \frac{107 - 11}{48} = 2.$$

Example 2.8
The following data show the life span, in years, of a random sample of 21 recorded deaths in
a certain country. It has been known in the past years that the median life span in the country
is 50 years. Can we conclude from these data that the median life span in the country has
improved? Use α = 0.05


39 42 42 47 47 53 59 59 59 60 62
65 66 68 69 70 72 75 75 85 90

Solution
Let $\theta$ denote the median life span in the population. We wish to test the hypothesis
$$H_0: \theta = 50 \quad \text{against} \quad H_1: \theta > 50.$$
The test statistic is
$$Z = \frac{W - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24} - \frac{\sum t^3 - \sum t}{48}}},$$
where $W = W^-$ is the sum of the ranks with negative signs. When $H_0$ is true, $Z$ is
approximately $N(0, 1)$. We reject $H_0$ at the 0.05 level of significance if
$z < -z_{0.05} = -1.645$, where $z$ is the computed value of $Z$.

Table 2.4: Computation of test statistic

No.    X_i    D_i    |D_i|    Rank    t    t^3
1      39     -11    11       10
2      42     -8     8        4.5     2    8
3      42     -8     8        4.5
4      47     -3     3        2       3    27
5      47     -3     3        2
6      53     3      3        2
7      59     9      9        7       3    27
8      59     9      9        7
9      59     9      9        7
10     60     10     10       9
11     62     12     12       11
12     65     15     15       12
13     66     16     16       13
14     68     18     18       14
15     69     19     19       15
16     70     20     20       16
17     72     22     22       17
18     75     25     25       18.5    2    8
19     75     25     25       18.5
20     85     35     35       20
21     90     40     40       21
Total                                 10   70

From Table 2.4, $\sum t = 10$, $\sum t^3 = 70$, $W^- = 23$ and $W^+ = 208$.
The value of the test statistic is
$$z = \frac{23 - \frac{21(22)}{4}}{\sqrt{\frac{21(22)(43)}{24} - \frac{70 - 10}{48}}} = -3.2175.$$
Since –3.2175 < –1.645, we reject $H_0$ at the 0.05
level of significance. We therefore conclude that
the median life span in the country has improved
significantly.
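
The tie-adjusted large-sample statistic can be sketched as follows, using the life-span data of Example 2.8. The function name `wilcoxon_z` and its `use` argument are assumptions made for this illustration; the sketch returns the chosen rank sum together with the adjusted Z.

```python
from collections import Counter
from math import sqrt

def wilcoxon_z(data, theta0, use="minus"):
    """Large-sample Wilcoxon signed-ranks statistic with adjustment for ties.

    Returns (W, Z), where W is W- if use == "minus" and W+ otherwise.
    """
    diffs = [x - theta0 for x in data if x != theta0]
    n = len(diffs)
    abs_sorted = sorted(abs(d) for d in diffs)
    # mean rank for each distinct absolute difference
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs_sorted[j + 1] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    w = w_minus if use == "minus" else w_plus
    # sizes t of the groups of tied absolute differences
    ties = [t for t in Counter(abs_sorted).values() if t > 1]
    correction = (sum(t**3 for t in ties) - sum(ties)) / 48
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - correction
    return w, (w - mean) / sqrt(variance)

life_spans = [39, 42, 42, 47, 47, 53, 59, 59, 59, 60, 62,
              65, 66, 68, 69, 70, 72, 75, 75, 85, 90]
w_minus, z = wilcoxon_z(life_spans, 50)
print(w_minus, round(z, 4))
```

For the alternative that the median exceeds 50, the computed Z is compared with −1.645 at the 0.05 level, as in the decision rule of the example.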

2.3.6 Confidence Interval for the Median, based on the Wilcoxon Signed-Ranks Test
Arithmetic Procedure
Step 1: Find the means, $u_{ij}$, of all possible pairs of observations $x_i$ and $x_j$ from the sample
observations $x_1, x_2, ..., x_n$, that is,
$$u_{ij} = \frac{x_i + x_j}{2}, \quad 1 \le i \le j \le n.$$
There are $\frac{n(n - 1)}{2} + n = \frac{n(n + 1)}{2}$ such averages, distributed symmetrically about the median.
Step 2: Arrange the $u_{ij}$ in increasing order of magnitude.
Step 3: The median of the $u_{ij}$'s is a point estimate of the population median.
Step 4: Find, from the Wilcoxon signed-ranks test table, $t = w_{n, p}$ corresponding to the
sample size $n$ and the appropriate value of $p$ as determined by the desired confidence
level. When the confidence coefficient is $(1 - \alpha)$, then $p = \alpha/2$. If the exact value
of $p$ cannot be found in the Wilcoxon signed-ranks test table, we choose the closest
neighbouring value.
Step 5: The end points of the confidence interval are the $k$th smallest and $k$th largest values of
the $u_{ij}$'s, where $k = t + 1$ and $t$ is the value in the column labelled T corresponding
to $n$ and the value of $p$ selected (see Wayne, 1978).

Example 2.9
Determine the 95% confidence interval for the population median by the Wilcoxon Signed-
ranks procedure using the following data:
26 25 29 41 29 32 38 40 28 29

Solution
All the 55 possible pairs of means from the observations are given in Table 2.5.

Table 2.5: All possible pairs of means from the observations

       25     26     28     29     29     29     32     38     40     41
25     25.0
26     25.5   26.0
28     26.5   27.0   28.0
29     27.0   27.5   28.5   29.0
29     27.0   27.5   28.5   29.0   29.0
29     27.0   27.5   28.5   29.0   29.0   29.0
32     28.5   29.0   30.0   30.5   30.5   30.5   32.0
38     31.5   32.0   33.0   33.5   33.5   33.5   35.0   38.0
40     32.5   33.0   34.0   34.5   34.5   34.5   36.0   39.0   40.0
41     33.0   33.5   34.5   35.0   35.0   35.0   36.5   39.5   40.5   41.0

Thus, a point estimate of the population median $\theta$ is the 28th observation of the ordered data
in Table 2.5. This is 31.5. From the Wilcoxon signed-ranks test table, $t = w_{10, 0.025} = 8$. Thus,


k = t + 1 = 9. Therefore the 9th observation in the ordered array in Table 2.5 is the lower limit
$\theta_L$ and the 9th observation from the largest value locates the upper limit $\theta_U$. Thus,
$\theta_L = 27.5$ and $\theta_U = 35.0$. Therefore the 95% confidence interval for $\theta$ is $27.5 \le \theta \le 35.0$.
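
The arithmetic procedure of Steps 1 to 5 can be sketched as follows. The function name `walsh_ci` is an assumption made for illustration; the data are the ten values that head Table 2.5, and the tabulated value t = 8 is supplied by hand rather than looked up.

```python
from statistics import median

def walsh_ci(data, t):
    """Point estimate and confidence limits for the median, based on the
    pairwise means u_ij = (x_i + x_j) / 2 with i <= j.

    t is the tabulated Wilcoxon value w_{n, p}; the limits are the kth
    smallest and kth largest pairwise means, with k = t + 1.
    """
    xs = sorted(data)
    n = len(xs)
    u = sorted((xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n))
    k = t + 1
    return median(u), u[k - 1], u[-k]

values = [25, 26, 28, 29, 29, 29, 32, 38, 40, 41]
estimate, lower, upper = walsh_ci(values, 8)
print(estimate, lower, upper)
```

With n = 10 there are 55 pairwise means, so the point estimate returned is the 28th value of the ordered array.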

Large Sample Approximation


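With samples larger than 30, we cannot use the Wilcoxon signed-ranks table to determine k.
A large sample approximation of k is however given by (see Wayne, 1978)
$$k \approx \frac{n(n + 1)}{4} - z_{\alpha/2}\sqrt{\frac{n(n + 1)(2n + 1)}{24}},$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution.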

Exercise 2(a)
1. The median age of the onset of diabetes is thought to be 45 years. The ages at onset of a
random sample of 16 people with diabetes are:
26.2 30.5 35.5 38.0 39.8 40.3 45.0 45.6
45.9 46.8 48.9 51.4 52.4 55.6 60.9 65.4
Perform the
(a) sign test, (b) Wilcoxon signed-ranks test,
to determine if there is any evidence to conclude that the median age of the onset of
diabetes differs significantly from 45 years. Take α = 0.05.
2. Recent studies of the private practices of physicians who saw no Medicaid patients
suggested that the median length of each patient visit was 22 minutes. It is believed that
the median visit length in practices with a large Medicaid load is shorter than 22 minutes.
A random sample of 20 visits in practices with a large Medicaid load yielded, in order,
the following visit lengths:
9.4 13.4 15.6 16.2 16.4 16.8 18.1 18.7 18.9 19.1
19.3 20.1 20.4 21.6 21.9 23.4 23.5 24.8 24.9 26.8
(a) Use the large sample approximation of the sign test to determine if there is
sufficient evidence to conclude, at the 1% level of significance, that the average visit
length in practices with a large Medicaid load is shorter than 22 minutes?
(b) Based on the sign test, construct a 95% confidence interval for the median visit length
in practices with a large Medicaid load.
3. The following are the blood glucose levels of 12 patients who attend St. Thomas Hospital:


86 100 120 90 101 98 109 108 93 107 99 110


Perform the Wilcoxon signed ranks test to determine if we can conclude on the basis of
these data that the average glucose level in the population is greater than 96 mg/dl? Take
α = 0.05.
4. From a random sample of 14 students from Accra Catholic Senior High School, the body
masses of 9 students were found to be less than 38 kg whilst those of 4 students exceeded
38 kg with the remaining students recording exactly 38 kg. Can we conclude, based on a
sign test, that the average body mass of students from the school is less than 38 kg?
5. In a sample of 25 adolescents who served as the subjects in an immunologic study, one
variable of interest was the diameter of skin test reaction to an antigen. The sample
observations, in mm erythema, were as follows:
16.0 17.0 18.0 19.0 20.0 21.0 22.0 22.0 22.0 23.0 24.0 26.0 27.0
28.0 29.0 30.0 30.0 31.0 32.0 33.0 34.0 35.0 36.0 36.0 37.0

Use the large sample approximation of the Wilcoxon signed ranks test to determine if
we can conclude from these data that the population average is less than 30 mm.
Take α = 0.05.
6. Barrett (1991) reported data on eight cases of umbilical cord prolapse. The maternal ages
were 25, 28, 17, 26, 27, 18, 25, and 30.
(a) Perform the Wilcoxon signed ranks test to determine if there is enough evidence,
based on the data, that the average age of the population from which the sample may
be presumed to have been drawn is greater than 20 years. Take α = 0.01.
(b) Based on the Wilcoxon signed ranks test, construct a 99% confidence interval for the
population median.
7. Out of a random sample of 100 recorded deaths in a certain country during the past year,
68 of them were more than 65 years whilst the remaining 32 were below 65 years. Perform
a sign test to determine if we can conclude that the average life span in the country is
greater than 65 years. Use α = 0.05.
8. Recent studies of the private practices of physicians who saw no Medicaid patients
suggested that the median length of each patient visit was 22 minutes. It is believed that
the median visit length in practices with a large Medicaid load is shorter than 22 minutes.
A random sample of 20 visits in practices with a large Medicaid load yielded, in order, the
following visit lengths:


9.4 13.4 15.6 16.2 16.4 16.8 18.1 18.7 18.9 19.1
19.3 20.1 20.4 21.6 21.9 23.4 23.5 24.8 24.9 26.8
Based on the large sample approximation of the sign test, is there sufficient evidence to
conclude that the average visit length in practices with a large Medicaid load is shorter
than 22 minutes?
9. To determine whether the median life span of a certain species of animal is greater than 5
years, a random sample of 25 observations was taken. The life spans, in years, are as
follows:
11.3 5.8 3.1 4.1 7.3 4.4 1.4 2.5 6.6 7.6 24.9 30.1 2.9
5.5 7.2 3.2 3.9 7.2 20.1 3.1 6.1 4.9 19.4 4.2 6.3
At the 0.05 level of significance, use the large sample approximation of the sign test to
determine if the average life span is greater than 5 years.
10. A physician states that the median number of times he sees each of his patients during the
year is five. In order to evaluate the validity of this statement, he randomly selects ten of
his patients and determines the number of office visits each of them made during the past
year. He obtains the following values for the ten patients in his sample: 9, 10, 8, 4, 8, 3,
0, 10, 15, 9. Do the data support his contention that the median number of times he sees a
patient is five?
11. Moore and Ogletree (1973) investigated the readiness of pupils at the beginning of the
first grade. They compared scores on a readiness test of pupils who had attended a Head
Start program for a full year with the scores of those who had not. The readiness test scores
of 10 pupils who did not attend a Head Start program are as follows: 33, 19, 40, 35, 51,
41, 27, 55, 39, 21. Can we conclude, based on the Wilcoxon signed ranks test, that the
median score of the population represented by this sample is less than 45.3? Take
α = 0.05.
12. Abu-Ayyash (1972) found that the median education of heads of households living in
mobile homes in a certain area was 11.6 years. Suppose that a similar survey conducted
in another area revealed the educational levels of heads of households as shown in the
following data.
13 6 6 12 12 10 9 11 14 8 7 16 15 8 7
Based on the sign test, can we conclude that the median educational level of the
population represented by this sample is less than 11.6 years? Take α = 0.05.
13. Lenzer et al. (1973) reported the endurance scores of animals during a 48-hour session of
discrimination responding. The median score for animals with electrodes implanted in
the hypothalamus was 97.5. Suppose that the experiment was duplicated in another
laboratory, except that electrodes were implanted in the forebrain in 12 animals. Assume
that the investigators observed the endurance scores shown in the following table.
93.6 89.1 97.7 84.4 97.8 94.5 88.3 97.5 83.7 94.6 85.5 82.6
Use the one-sample sign test to see whether the investigators may conclude at the 0.05
level of significance that the median endurance score of animals with electrodes implanted
in the forebrain is less than 97.5.
14. Iwamoto (1971) found that the mean weight of a sample of a particular species of adult
female monkey from a certain locality was 8.41 kg. Suppose that a sample of adult females
of the same species from another locality yielded the weights as shown in the following
table. By using the one-sample sign test, can we conclude, at the 0.05 level of significance,
that the median weight of the population from which this second sample was drawn is
greater than 8.41 kg?
8.30 9.50 9.60 8.75 8.40 9.10 9.25 9.80 10.05 8.15 10.00 9.60 9.80 9.20 9.30

2.4 The binomial test


Inferences concerning proportions are required in many areas. The population proportion is a
parameter of frequent interest in research and decision-making activities. The politician is
interested in knowing what proportion of voters will vote for him in the next election. All
manufacturing firms are concerned about the proportion of defective items when a shipment
is made. A market analyst may wish to know the proportion of families in a certain area who
have central air conditioning. A sociologist may want to know the proportion of heads of
household in a certain area who are women. Many questions of interest to the health worker
relate to the population proportion. What proportion of patients who receive a particular
treatment recover? What proportion of a population has a certain disease?
When it is impossible or impractical to survey the total population, researchers base
decisions regarding population proportions on inferences made by analyzing samples drawn
from the population. As usual, inference may take the form of interval estimation or hypothesis
testing.
Sometimes, we want to draw inferences concerning the total number, the proportion or
percentage of units in the population that possess some characteristic or attribute or fall into
some defined class. A random sample of size n is drawn from a population. Suppose we wish
to estimate the proportion, p, of units in the population that belong to some defined class.
Testing hypotheses about population proportions is carried out in much the same way as
for the median when the assumptions necessary for the test are satisfied.

2.4.1 Assumptions
1. The data consist of a sample of the outcomes of n repetitions of some process. Each
outcome consists of either a ‘success’ or a ‘failure’. The proportion of the sample having
a characteristic of interest, $\hat{p} = S/n$, is an estimate of the population proportion p, where S
is the number of successes (the total number of sampling units with a particular
characteristic of interest).
2. The n trials are independent.
3. The probability of a success, p, remains constant from trial to trial.

2.4.2 Hypotheses
One-sided and two-sided tests may be made, depending on the question being asked. In other
words, we can test $H_0: p = p_0$ against one of the alternatives $p < p_0$, $p > p_0$ or $p \ne p_0$.

(a) One-sided test


$H_0: p = p_0$ against
$H_1: p < p_0$.
Test statistic
Since we are interested in the number of successes S, our test statistic is S. When $H_0$ is
true, S has the binomial distribution with parameters n and $p_0$. That is, $S \sim b(n, p_0)$.

Decision rule
Sufficiently small values of S lead to the rejection of $H_0$. Let $s_o$ denote the observed
value of S. We reject $H_0$ at the α level of significance if the p-value of the test $\le \alpha$,
where
$\text{p-value} = P(S \le s_o \mid n, p_0)$.

(b) One-sided test


$H_0: p = p_0$ against
$H_1: p > p_0$.
Test statistic
The test statistic is again S. When $H_0$ is true, $S \sim b(n, p_0)$.

Decision rule
For sufficiently large values of S, we reject $H_0$. Thus, we reject $H_0$ at the α level of
significance if the p-value of the test $= P(S \ge s_o \mid n, p_0) \le \alpha$, where $s_o$ is the observed value
of S.

(c) Two-sided test


Here, we test
$H_0: p = p_0$ against
$H_1: p \ne p_0$.
Test statistic
The test statistic is again S. When $H_0$ is true, $S \sim b(n, p_0)$.

Decision rule
For sufficiently large or sufficiently small values of S, we reject $H_0$. The hypothesized
proportion is $p_0$ whilst the observed sample proportion is $\hat{p} = s_o/n$, where $s_o$ is the observed
value of S. The p-value of the test is defined by

$$
\text{p-value} =
\begin{cases}
2P(S \le s_o \mid n, p_0), & \text{if } \hat{p} < p_0, \\
2P(S \ge s_o \mid n, p_0), & \text{if } \hat{p} > p_0.
\end{cases}
$$

We reject $H_0$ at the α level of significance if the p-value of the test $\le \alpha$.
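
The three p-value definitions in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `binom_pmf` and `binomial_test_p_value` are hypothetical, and two details are assumptions not stated in the text, namely that the two-sided p-value is capped at 1 and that it is taken to be 1 when the sample proportion equals $p_0$ exactly.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(S = k) when S ~ b(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_p_value(s_obs, n, p0, alternative):
    """Exact p-value of the binomial test.

    alternative: "less" (H1: p < p0), "greater" (H1: p > p0)
    or "two-sided" (H1: p != p0).
    """
    lower_tail = sum(binom_pmf(k, n, p0) for k in range(s_obs + 1))     # P(S <= s_obs)
    upper_tail = sum(binom_pmf(k, n, p0) for k in range(s_obs, n + 1))  # P(S >= s_obs)
    if alternative == "less":
        return lower_tail
    if alternative == "greater":
        return upper_tail
    if alternative == "two-sided":
        p_hat = s_obs / n
        if p_hat == p0:
            return 1.0  # assumption: this case is not covered by the definition
        tail = lower_tail if p_hat < p0 else upper_tail
        return min(1.0, 2 * tail)  # assumption: cap the doubled tail at 1
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Hypothetical data: 3 successes in 10 trials, p0 = 0.5.
print(binomial_test_p_value(3, 10, 0.5, "less"))       # 0.171875
print(binomial_test_p_value(3, 10, 0.5, "two-sided"))  # 0.34375
```

In each case $H_0$ is rejected when the returned value does not exceed the chosen α.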

Example 2.10
In a survey of injection drug users in a large city, Coates et al. (1991) found that 2 out of 12
were HIV positive. We wish to know if we can conclude, at the 10% level of significance, that
fewer than 40% of the injection drug users in the sampled population are HIV positive.

Solution
The parameter of interest is p, the proportion of injection drug users in the sampled population
who are HIV positive. We wish to test


$H_0: p = 0.4$ against $H_1: p < 0.4$

at significance level α = 0.1. The test statistic is S, the number of injection drug users in the
sample who are HIV positive. When $H_0$ is true, S has the binomial distribution with
parameters n = 12 and p = 0.4. Thus,
$S \sim b(12, 0.4)$.
Let $s_o$ denote the observed value of the test statistic. We reject $H_0$ at the 0.1 level of
significance if the p-value $\le 0.1$, where the p-value $= P(S \le s_o \mid 12, 0.4)$. Given $s_o = 2$,
$\text{p-value} = P(S \le 2 \mid 12, 0.4) = 0.0834$.
Since the p-value, 0.0834 < 0.1, we reject $H_0$ at the 10% level of significance and conclude
that fewer than 40% of the injection drug users in the sampled population are HIV positive.
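
The lower-tail probability used in Example 2.10 can be checked directly from the binomial probabilities. The following is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from math import comb

n, p0, s_obs, alpha = 12, 0.4, 2, 0.10

# P(S <= 2 | n = 12, p0 = 0.4): add the binomial probabilities for k = 0, 1, 2.
p_value = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(s_obs + 1))

print(round(p_value, 4))  # 0.0834
print(p_value <= alpha)   # True, so H0 is rejected at the 10% level
```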

Example 2.11
A researcher found anterior sub-capsular vacuoles in the eyes of 6 out of 15 diabetic patients.
Using the binomial test, can we conclude that the population proportion with the condition of
interest is greater than 0.2? Use α = 0.05.

Solution
The parameter of interest is p , the proportion of diabetic patients in the population with
anterior sub-capsular vacuoles in the eyes. We wish to test
$H_0: p = 0.2$ against $H_1: p > 0.2$.
The test statistic is S, the number of diabetic patients in the sample with anterior sub-capsular
vacuoles in the eyes. When $H_0$ is true,
$S \sim b(15, 0.2)$.
Let $s_o$ denote the observed value of the test statistic. We reject $H_0$ at the 0.05 level of
significance if the p-value of the test $\le 0.05$, where p-value $= P(S \ge s_o \mid 15, 0.2)$. Given
$s_o = 6$,
$\text{p-value} = P(S \ge 6 \mid 15, 0.2) = 1 - P(S \le 5 \mid 15, 0.2) = 1 - 0.9389 = 0.0611$.
Since the p-value 0.0611 > 0.05, we fail to reject $H_0$ at the 0.05 level of significance. The data
do not provide sufficient evidence that the population proportion p is greater than 0.2.
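
Because S is discrete, the upper-tail probability satisfies $P(S \ge 6) = 1 - P(S \le 5)$, not $1 - P(S \le 6)$. The sketch below, which uses only the Python standard library and illustrative variable names, computes both cumulative probabilities for $S \sim b(15, 0.2)$ so that the distinction is visible.

```python
from math import comb

n, p0, s_obs, alpha = 15, 0.2, 6, 0.05


def pmf(k):
    """P(S = k) when S ~ b(15, 0.2)."""
    return comb(n, k) * p0**k * (1 - p0) ** (n - k)


cdf_5 = sum(pmf(k) for k in range(s_obs))      # P(S <= 5)
cdf_6 = sum(pmf(k) for k in range(s_obs + 1))  # P(S <= 6)
upper_tail = 1 - cdf_5                         # P(S >= 6)

print(round(cdf_5, 4))       # 0.9389
print(round(cdf_6, 4))       # 0.9819
print(round(upper_tail, 4))  # 0.0611
print(upper_tail <= alpha)   # False
```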


2.4.3 Large sample approximation


1. If S is a binomial random variable with parameters n and $p_0$, then the expectation and
variance of S are given by
$E(S) = np_0$ and $V(S) = np_0(1 - p_0)$.
2. Thus, when the null hypothesis is true, and n is large,
$$Z = \frac{S - np_0}{\sqrt{np_0(1 - p_0)}}$$
follows an approximate standard normal distribution, N(0, 1).
3. The normal approximation to the binomial distribution is good if $np_0 > 5$ and
$n(1 - p_0) > 5$.
4. Note that the sign test discussed earlier is a special case of the binomial test, in which
$p_0 = 0.5$.

Example 2.12
A commonly prescribed drug for relieving nervous tension is believed to be only 60%
effective. Experimental results with a new drug administered to a random sample of 100 adults
who were suffering from nervous tension show that 70 received relief. Is this sufficient
evidence to conclude that the new drug is superior to the one commonly prescribed? Use
α = 0.05.

Solution
The parameter of interest is p, the proportion of adults in the population who received relief
from nervous tension. We wish to test
$H_0: p = 0.6$ against $H_1: p > 0.6$
at the α = 0.05 level of significance. The test statistic is
$$Z = \frac{S - np_0}{\sqrt{np_0(1 - p_0)}}.$$
Given n = 100 and $p_0 = 0.6$, both $np_0$ and $n(1 - p_0)$ are greater than 5 and so Z is
approximately N(0, 1) when $H_0$ is true. We reject $H_0$ if z, the computed value of Z, is greater than
$z_{0.95} = 1.645$. Now, S = 70 and
$$z = \frac{70 - 100 \times 0.6}{\sqrt{100 \times 0.6 \times 0.4}} = 2.0412.$$
Since 2.0412 > 1.645, we reject H0 at the 0.05 level of significance. We conclude that the new
drug is superior to the one commonly prescribed.
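
The arithmetic of Example 2.12 can be reproduced with the following sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the helper name `binomial_z` is hypothetical, and the one-sided p-value is an extra quantity not reported in the example.

```python
from math import sqrt
from statistics import NormalDist


def binomial_z(s, n, p0):
    """Large-sample test statistic Z = (S - n*p0) / sqrt(n*p0*(1 - p0))."""
    return (s - n * p0) / sqrt(n * p0 * (1 - p0))


n, p0, s, alpha = 100, 0.6, 70, 0.05

# Check the rule of thumb for the normal approximation.
assert n * p0 > 5 and n * (1 - p0) > 5

z = binomial_z(s, n, p0)
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper 5% point of N(0, 1)
p_value = 1 - NormalDist().cdf(z)         # one-sided (upper-tail) p-value

print(round(z, 4))       # 2.0412
print(round(z_crit, 3))  # 1.645
print(z > z_crit)        # True, so H0 is rejected
```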


2.4.4 Large sample confidence interval for p


If $\hat{p}$ is the proportion of observations in a random sample of size n that belongs to a class of
interest, then an approximate 100(1 – α)% confidence interval for the proportion p of the
population that belongs to this class is (see Ofosu & Hesse, 2011)
$$\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
where $\hat{p} = s/n$ is the proportion of the sample with the characteristic of interest.

Example 2.13
In a certain university, the proportion of students who have diabetes mellitus is p. Of the 500
students selected at random from the university, 6 had diabetes mellitus.
(a) Find a point estimate of p. (b) Construct a 90% confidence interval for p.

Solution
(a) A point estimate of p is given by $\hat{p} = \frac{6}{500} = 0.012$.
(b) $n\hat{p} = 6$ and $n(1 - \hat{p}) = 494$. Both $n\hat{p}$ and $n(1 - \hat{p})$ are of sufficient magnitude to justify the
use of the formula for constructing a confidence interval for p. To construct a 90%
confidence interval, we put 1 – α = 0.90. This gives α = 0.10. From the standard normal
table, we find that $z_{1-\alpha/2} = z_{0.95} = 1.645$. Hence a 90% confidence interval for p is
$$0.012 - 1.645\sqrt{\frac{0.012 \times 0.988}{500}} < p < 0.012 + 1.645\sqrt{\frac{0.012 \times 0.988}{500}},$$
which simplifies to 0.004 < p < 0.020.
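
The interval in Example 2.13 can be recomputed with a short sketch. It assumes Python 3.8 or later for `statistics.NormalDist`, and the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(s, n, conf_level):
    """Approximate large-sample confidence interval for a proportion p."""
    p_hat = s / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # z_{1 - alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_ci(6, 500, 0.90)
print(round(lower, 3), round(upper, 3))  # 0.004 0.02
```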

Exercise 2(b)
1. A researcher found that 66% of a sample of 14 infants had completed the hepatitis B
vaccine series. Can we conclude on the basis of these data that, in the sampled population,
more than 60% have completed the series? Use α = 0.01.
2. A health survey of 12 male inmates 50 years of age and older residing in a state’s
correctional facilities was made. They found that 22% of the respondents reported a history
of venereal disease. On the basis of these findings, can we conclude that in the sampled
population, more than 15% have a history of venereal disease? Use α = 0.05.


3. The fraction of defective integrated circuits produced in a photolithography process is
being studied. A random sample of 300 circuits is tested, revealing 13 defectives. Use the
data to test $H_0: p = 0.05$ against $H_1: p \ne 0.05$. Use α = 0.05.
4. A commonly prescribed drug for relieving nervous tension is believed to be only 70%
effective. Experimental results with a new drug administered to a random sample of 10
adults who were suffering from nervous tension show that 8 received relief. Is this
sufficient evidence to conclude that the new drug is superior to the one commonly
prescribed? Use α = 0.05.
5. Suppose that, in the past, 40% of all adults favoured capital punishment. Do we have reason
to believe that the proportion of adults favouring capital punishment today has increased
if, in a random sample of 15 adults, 8 favour capital punishment? Use α = 0.05.

2.5 The one-sample runs test for randomness


In many situations we want to know whether we can conclude that a set of observations
constitutes a random sample from an infinite population. Tests for randomness are of major
importance because the assumption of randomness underlies statistical inference (see Ofosu
& Hesse, 2011). In addition, tests for randomness are important for time series analysis. The
runs test procedure is used to examine whether or not a sequence of sample values is random.
Consider, for example, the following sequence of sample values:
21 23 24 27 30 28 27 26 25 23 22 21
Each observation is denoted by a ‘+’ sign if it is more than the previous observation and by a
‘–’ sign if it is less than the previous observation, as shown in the following table.

Value  21  23  24  27  30  28  27  26  25  23  22  21
Sign        +   +   +   +   –   –   –   –   –   –   –
Run         1   1   1   1   2   2   2   2   2   2   2

A run is a sequence of signs of the same kind bounded by signs of the other kind. In this case, we
doubt the sequence’s randomness, since there are only two runs.
If the order of occurrence were

Value  25  22  27  23  27  28  21  26  23  30  21  24
Sign        –   +   –   +   +   –   +   –   +   –   +
Run         1   2   3   4   4   5   6   7   8   9  10

we would doubt the sequence’s randomness because there are too many runs (10 in this
instance).
Too few runs indicate that the sequence is not random (it shows persistence), whilst too many
runs also indicate that the sequence is not random (it zigzags). Let us now consider the
one-sample runs test. This procedure helps us to decide whether a sequence of sample values is the
result of a random process.
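
Counting runs up and down is mechanical, as the following sketch shows for the two illustrative sequences above. The function name `count_runs_up_down` is hypothetical; equal neighbouring values are simply skipped, which does not matter here because neither sequence contains ties.

```python
def count_runs_up_down(values):
    """Return the list of '+'/'-' signs and the number of runs they form."""
    signs = []
    for previous, current in zip(values, values[1:]):
        if current > previous:
            signs.append("+")
        elif current < previous:
            signs.append("-")
        # equal neighbours produce no sign

    runs = 0
    for i, sign in enumerate(signs):
        # a new run starts at the first sign and at every change of sign
        if i == 0 or sign != signs[i - 1]:
            runs += 1
    return signs, runs


trend = [21, 23, 24, 27, 30, 28, 27, 26, 25, 23, 22, 21]
zigzag = [25, 22, 27, 23, 27, 28, 21, 26, 23, 30, 21, 24]

print(count_runs_up_down(trend)[1])   # 2
print(count_runs_up_down(zigzag)[1])  # 10
```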

Assumptions
The data available for analysis consist of a sequence of sample values, recorded in the order
of their occurrence.

Hypotheses
We wish to test
$H_0$: The sequence of sample values is random, against
$H_1$: The sequence of sample values is not random.

Test Statistic
The test statistic is R, the total number of runs.

Decision Rule
Since the alternative hypothesis does not specify a direction, a two-sided test is appropriate. The
critical values, $r_c(\text{lower})$ and $r_c(\text{upper})$, for the test are obtained from Table A.5, in the Appendix, for a given sample
size n and at a desired level of significance α. If $r_c(\text{lower}) \le r \le r_c(\text{upper})$, we fail to reject $H_0$.
Otherwise we reject $H_0$.

Tied Values
If an observation is equal to its preceding observation, denote it by zero. While counting the
number of runs, ignore it and reduce the value of n accordingly.

Large Sample Sizes


If n > 25, then the test statistic can be approximated by
$$Z = \frac{R - \mu_R}{\sqrt{\operatorname{Var}(R)}},$$
which is N(0, 1) when $H_0$ is true, where $\mu_R = E(R) = \frac{2n - 1}{3}$ and $\operatorname{Var}(R) = \frac{16n - 29}{90}$. We
reject $H_0$ at the level of significance α if $|z| > z_{1-\alpha/2}$, where z is the computed value of Z.
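
As a hedged illustration of the large-sample formula, the sketch below standardizes an observed number of runs using the mean $(2n - 1)/3$ and variance $(16n - 29)/90$. The function name `runs_z` and the data (18 runs in a sequence with n = 40) are hypothetical; `statistics.NormalDist` requires Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist


def runs_z(r, n):
    """Standardized number of runs up and down for a sequence of effective size n."""
    mean_r = (2 * n - 1) / 3
    var_r = (16 * n - 29) / 90
    return (r - mean_r) / sqrt(var_r)


# Hypothetical data: r = 18 runs observed with n = 40.
alpha = 0.05
z = runs_z(18, 40)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}

print(round(z, 2))      # -3.2
print(abs(z) > z_crit)  # True: too few runs for a random sequence
```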

Example 2.15
The following are the blood glucose levels of 12 patients who attend St. Thomas Hospital.
Test, at the 0.05 level of significance, whether the sequence is random.

86 99 98 90 109 101 100 110 110 93 108 120

Solution
We wish to test
$H_0$: The sequence is random, against
$H_1$: The sequence is not random.
The test statistic is
R  the number of runs.
We reject $H_0$ at the 0.05 level of significance if $r < r_c(\text{lower})$ or $r > r_c(\text{upper})$, where r is
the observed value of R and $r_c$ is the critical value. It can be seen that:

Value  86  99  98  90 109 101 100 110 110  93 108 120
Sign        +   –   –   +   –   –   +   0   –   +   +

Here n = 11 and the number of runs r = 7. From the table of critical values for the runs up and
down test, $r_c(\text{lower}) = 4$ and $r_c(\text{upper}) = 10$ (see Table A.5, in the Appendix).
Note: Since two consecutive observations are the same, that is 110, we use n = 11 instead
of n = 12.
Since $4 \le r \le 10$, we fail to reject $H_0$ at the 0.05 level of significance and therefore conclude
that the data are consistent with a random sequence.
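
The signs, the effective sample size and the number of runs in Example 2.15 can be reproduced as follows. This is a sketch with illustrative variable names; the critical values 4 and 10 are taken from the example (Table A.5) rather than computed.

```python
glucose = [86, 99, 98, 90, 109, 101, 100, 110, 110, 93, 108, 120]

# Sign of each observation relative to the previous one; '0' marks a tie.
signs = []
for previous, current in zip(glucose, glucose[1:]):
    if current > previous:
        signs.append("+")
    elif current < previous:
        signs.append("-")
    else:
        signs.append("0")

non_tied = [s for s in signs if s != "0"]  # ties are ignored when counting runs
n = len(glucose) - signs.count("0")        # reduce n by the number of ties
r = 1 + sum(1 for a, b in zip(non_tied, non_tied[1:]) if a != b)

r_lower, r_upper = 4, 10                   # critical values quoted in the example

print("".join(signs))           # +--+--+0-++
print(n, r)                     # 11 7
print(r_lower <= r <= r_upper)  # True, so H0 is not rejected
```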

Exercise 2(c)
1. The following data show the average daily temperatures recorded at Accra, Ghana, for 15
consecutive days during June 2017.


Day          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Temperature 28  27  26  27  28  29  29  27  26  25  28  24  25  26  28

Test, at the 0.05 level of significance, whether the pattern of temperatures is random.
2. The following data show the inflation rate in Ghana from 2006 to 2017. Test, at the 0.05
level of significance, whether the pattern of yearly inflation is random.

Year  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017
Rate  11.7  10.7  16.5  13.1   6.7   7.7   7.1  11.7  15.5  17.2  17.5  12.0

References
Abu-Ayyash, A. Y. (1972). The mobile home: A neglected phenomenon in geographic
research. Geog. Bull., 5, 28-30.
Barrett, J. M. (1991). Funic reduction for the management of umbilical cord prolapse.
American Journal of Obstetrics and Gynaecology, 165, 654-657.
Coates, R., Millson, M., & Myers, T. (1991). The benefits of HIV antibody testing of saliva in
field research. Canadian Journal of Public Health, 82, 397-398.
Iwamoto, M. (1971). Morphological studies of Macaca fuscata: VI, Somatometry. Primates,
12, 151-174.
Lenzer, I. I., & White, C. A. (1973). Statistical effects in continuous reinforcement
and successive sensory discrimination situations. Physiolog. Psychol., 1, 77-82.
Moore, R. C., & Ogletree, E. J. (1973). A comparison of the readiness and intelligence of first
grade children with and without a full year of Head Start training. Education, 93, 266-270.
Ofosu, J. B., & Hesse, C. A. (2011). Elementary Statistical Methods. EPP Books Services,
Accra.
Wayne, W. D. (1978). Applied nonparametric statistics. Houghton Mifflin Company, London.
