Introduction To Non Parametric Statistical Methods Research Gate
net/publication/322677728
All content following this page was uploaded by Christian Akrong Hesse on 24 January 2018.
INTRODUCTION TO
NONPARAMETRIC
STATISTICAL METHODS
Copyright © 2017
Akrong Publications Ltd.
ISBN: 978–9988–2–6059–0
Published, 2017
akrongh@yahoo.com.
PREFACE
A statistical method is called nonparametric if it makes no assumption about the form of the
population distribution or about the sample size. This is in contrast with most parametric
methods in elementary statistics, which assume that the data are quantitative, that the
population has a normal distribution, and that the sample size is sufficiently large. In general,
when the parametric assumptions hold, conclusions drawn from nonparametric methods are
less powerful than those drawn from parametric ones. However, because nonparametric
methods make fewer assumptions, they are more flexible, more robust, and applicable to
non-quantitative data.
This book is designed, first, to help students acquire the basic skills needed for solving
real-life problems where the data meet only minimal assumptions and, secondly, to strengthen
their reading list by providing a “one-stop shop” textbook on nonparametric methods.
Our Approach
This book is an introduction to basic ideas and techniques of nonparametric statistical methods
and is intended to prepare students of the sciences as well as the humanities, for a better
understanding of some underlying explanations of real life situations. Researchers will find
the text useful since it provides a step-by-step presentation of procedures, use of more practical
data sets, and new problems from real-life situations. The book continues to emphasize the
importance of nonparametric methods as a significant branch of modern statistics and equips
readers with the conceptual and technical skills necessary to select and apply the appropriate
procedures for any given situation.
Written by leading statisticians, Introduction to Nonparametric Statistical Methods
provides readers with crucial nonparametric techniques in a variety of settings, emphasizing
the assumptions underlying the methods. The book provides an extensive array of examples
that clearly illustrate how to use nonparametric approaches for handling one- or two-sample
location and dispersion problems, dichotomous data, one-way analysis of variance, rank tests,
goodness-of-fit tests and tests of randomness.
A wide range of topics is covered in this text although the treatment is limited to the
elementary level. There are solved, partly solved and unsolved assignments with every section,
to make the student or reader familiar with the methods introduced.
C. A. Hesse
J. B. Ofosu
E. N. Nortey
July, 2017
CONTENTS
1. Preliminaries
1.1 Introduction
1.2 Parametric and nonparametric methods
1.3 Parametric versus nonparametric methods
1.4 Classes of nonparametric methods
1.5 When to use nonparametric procedures
1.6 Advantages of nonparametric statistics
1.7 Disadvantages of nonparametric tests
1.8 The scope of this book
1.9 Format and organization
6. Procedures Using Data from Three or More Independent Samples
6.1 Introduction
6.2 Extension of the median test
6.3 The Kruskal-Wallis one-way analysis of variance by ranks
6.4 The Jonckheere-Terpstra test for ordered alternatives
7. Procedures Using Data from Three or More Related Samples
7.1 Introduction
7.2 Data from a randomized complete block design
7.3 Friedman two-way analysis of variance by ranks
7.4 Page’s test for ordered alternatives
Chapter One
Preliminaries
1.1 Introduction
The typical introductory courses in hypothesis testing and confidence intervals examine
primarily parametric statistical procedures. A main feature of these statistical procedures is the
assumption that we are working with random samples from normal populations. These
procedures are known as parametric methods because they are based on a particular
parametric family of distributions – in this case, the normal. For example, given a set of
independent observations from a normal distribution, we often want to infer something about
the unknown parameters. Here the t-test is usually used to determine whether the
hypothesized value $\mu_0$ for the population mean should be rejected. More usefully, we
may construct a confidence interval for the ‘true’ population mean.
Parametric inference is sometimes inappropriate or even impossible. To assume that
samples come from any specified family of distributions may be unreasonable. For example,
we may not have examination marks for each candidate but know only the numbers of
candidates who obtained the ordered grades A, B+, B, B–, C+, C, D and F. Given these grade
distributions for two different courses, we may want to know if they indicate a difference in
performance between the two courses. In this case it is inappropriate to use the traditional
(parametric) method of analysis.
In this book we describe procedures called nonparametric and distribution-free methods.
Nonparametric methods provide an alternative series of statistical methods that require no or
very limited assumptions to be made about the data. These methods are most often used to
analyse data which do not meet the distributional requirements of parametric methods. In
particular, skewed data are frequently analysed by non-parametric methods, although data
transformation can sometimes make the data suitable for parametric analyses. These
procedures have considerable appeal. One of their advantages is that the data need not be
quantitative but can be categorical (such as yes or no) or rank data.
Generally, if both parametric and nonparametric methods are applicable to a particular
problem, we should use the more efficient parametric method.
Nonparametric methods require minimal assumptions about the form of the distribution
of the population. For instance, it might be assumed that the data are from a population that
has a continuous distribution, but no other assumptions are made. Or it might be assumed that
the population distribution depends on location and scale parameters, but the functional form
of the distribution, whether normal or otherwise, is not specified. By contrast, parametric
methods require that the form of the population distribution be completely specified except for
a finite number of parameters. For instance, the familiar one-sample t-test for means assumes
that observations are selected from a population that has a normal distribution, and the only
values not known are the population mean and standard deviation. The simplicity of
nonparametric methods, the widespread availability of such methods in statistical packages,
and the desirable statistical properties of such methods make them attractive additions to the
data analyst’s tool kit.
Strictly speaking, nonparametric procedures are not concerned with population parameters. For example, in this
book we shall discuss tests for randomness where we are concerned with some characteristic
other than the value of a population parameter. The validity of distribution-free procedures
does not depend on the functional form of the population from which the sample has been
drawn. It is customary to refer to both types of procedure as nonparametric. Kendall
and Sundrum (1953) discussed the differences between the terms nonparametric and
distribution-free.
2. Wider scope.
Since there are fewer assumptions that are made about the sample being studied,
nonparametric statistics are usually wider in scope as compared to parametric statistics that
actually assume a distribution.
6. Easy to understand.
Researchers with minimum preparation in Mathematics and Statistics usually find
nonparametric procedures easy to understand.
Thus, for a given test, you can quickly determine the assumptions on which the test is
based, the hypotheses that are appropriate, how to compute the test statistic, and how to
determine whether to reject the null hypothesis. First, we discuss these topics in general, and
then we use an example to illustrate the application of the test.
Where appropriate for a given test, we discuss ties, the large-sample approximation, and
the power efficiency. For each procedure, we cite references that you may consult if you are
interested in learning more about the procedure or in further pursuing a related topic. Finally,
we provide exercises for each procedure. These exercises serve two purposes: they illustrate
appropriate uses of a test, and they give you a chance to determine whether you have mastered
the computational techniques and learnt how to set up the hypotheses and use the applicable
decision rule.
In the remaining chapters, we cite two types of reference: those that are cited in the body
of the text and refer you to the statistical literature, and those that are cited in the examples and
exercises and refer you to the research literature.
References
Armitage, P. (1971). Statistical Methods in Medical Research, Oxford and Edinburgh:
Blackwell Scientific Publications.
Colton, T. (1974). Statistics in Medicine, Boston: Little Brown.
Dunn, O. J. (1964). Basic Statistics: A Primer for the Biomedical Sciences, New York:
Wiley.
Kendall, M. G. and Sundrum, R. M. (1953). Distribution-Free Methods and Order Properties. Rev.
Int. Statist. Inst. 21, 124–134.
Remington, R. D. and Schork, M. A. (1970). Statistics with Applications to the Biological and
Health Sciences, Englewood Cliffs, N.J.: Prentice-Hall.
Savage, I. R. (1962). Bibliography on Nonparametric Statistics. Harvard University Press.
Chapter Two
One-Sample Nonparametric Methods
2.1 Introduction
In classical parametric tests (which assume that the population from which the sample data
have been drawn is normally distributed), the parameter of interest is the population mean. In
this chapter, we shall be concerned with the nonparametric analog of the one-sample z and t
tests. These are nonparametric procedures (which utilize data consisting of a single set of
observations) that are appropriate when the location parameter is the median, rather than the
mean.
Several nonparametric procedures are available for making inferences about the median.
Two of the nonparametric tests which are useful in situations where the conditions for
the parametric z and t tests are not met, are the one-sample sign test and the Wilcoxon
signed-ranks test.
Recall that the median of a set of data is defined as the middle value when the data are
arranged in order of magnitude. For continuous distributions, we define the median as the
point $\theta$ for which the probability that a value selected at random from the distribution is less
than $\theta$, and the probability that a value selected at random from the distribution is greater than
$\theta$, are both equal to $1/2$. When the population from which the sample has been drawn is
symmetric, any conclusions about the median are applicable to the mean, since in symmetrical
distributions the mean and the median coincide.
In this chapter, we shall also discuss procedures for making inferences concerning the
population proportion and testing for randomness and the presence of trend.
Wherever possible, we shall observe the following format in presenting the hypothesis-
testing procedures.
1. Assumptions
We list the assumptions necessary for the validity of the test, and describe the data on
which the calculations are based.
2. Parameter of interest
From the problem context, we identify the parameter of interest.
3. Hypotheses
We state the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
4. Test statistic
We write down a formula or direction for computing the relevant test statistic. When we
give a formula, we describe the methodology for evaluating it.
5. Significance level
We choose a significance level $\alpha$.
6. Decision rule
We determine the critical region. The Appendix gives appropriate tables for the distribution
of the test statistic. From these tables, we can determine the critical values of the test statistic
corresponding to the chosen $\alpha$.
8. Decision
If the computed value of the test statistic is as extreme as or more extreme than a critical
value, we reject $H_0$ and conclude that $H_1$ is true. If we cannot reject $H_0$, we conclude that
there is not enough evidence to declare it false.
2.2.1 Assumptions
1. The sample available for analysis is a random sample of independent measurements from
a population with an unknown median $\theta$.
2. The variable of interest is measured on at least an ordinal scale.
3. The variable of interest is continuous.
2.2.2 Hypotheses
The hypothesis to be tested concerns the value of the population median. To test the hypothesis
$$H_0: \theta = \theta_0,$$
where $\theta_0$ is a specified median value, against a corresponding one-sided or two-sided
alternative, we use the sign test. The test statistic $S$ depends on the alternative hypothesis,
$H_1$.
Decision Rule
The p-value of the test is defined by
$$p = 2P(S \le s_o \mid H_0 \text{ is true}),$$
where $s_o$ is the observed value of the test statistic $S$. We reject $H_0$ at significance level $\alpha$
if $p \le \alpha$.
Example 2.1
Appearance transit times for 11 patients with significantly occluded right coronary arteries are
given below:
Subject                 1     2     3     4     5     6     7     8     9     10    11
Transit time (in sec)   1.80  3.30  5.65  2.25  2.50  3.50  2.25  3.10  2.70  2.70  3.00
Can we conclude, at the 0.05 level of significance, that the median appearance transit time in
the population from which the data were drawn, is different from 3.50 seconds?
Solution
The parameter of interest is $\theta$, the median appearance transit time in the population. We wish
to test the hypothesis
$$H_0: \theta = 3.50 \quad \text{against} \quad H_1: \theta \ne 3.50$$
at the 0.05 level of significance. Since this is a two-sided test, the test statistic is
$$S = \min(N^-, N^+),$$
where $N^-$ is the number of observations less than 3.50 and $N^+$ is the number of observations
greater than 3.50. When $H_0$ is true, $S \sim b(10, 0.5)$.
Note: We discard the one observation which has the same value as the hypothesized median,
leaving us with a usable sample size of 10.
Let $s_o$ be the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance
when $p \le 0.05$, where
$$p = 2P(S \le s_o \mid 10, 0.5).$$
$i$                     1     2     3     4     5     6     7     8     9     10
$X_i$                   1.80  3.30  5.65  2.25  2.50  2.25  3.10  2.70  2.70  3.00
Sign of $X_i - 3.50$    –     –     +     –     –     –     –     –     –     –
From the above table, $N^- = 9$ and $N^+ = 1$. The observed value of the test statistic is therefore
given by
$$s_o = \min(9, 1) = 1.$$
Since this is a two-sided test, the p-value of the test is given by
$$p = 2P(S \le 1 \mid 10, 0.5) = 2(0.0107) = 0.0214.$$
Since the p-value of the test, 0.0214, is less than 0.05, we reject H 0 at the 0.05 level of
significance and conclude that the population median is not 3.50.
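The calculation in Example 2.1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `binom_cdf` and `sign_test` and the `alternative` argument are illustrative choices, and observations equal to the hypothesized median are discarded as in the Note above.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """Return P(S <= k) for S ~ b(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test(data, theta0, alternative="two-sided"):
    """Exact sign test for a hypothesized median theta0 (illustrative helper).

    alternative: "two-sided", "greater" (H1: median > theta0)
    or "less" (H1: median < theta0).
    Returns (n_minus, n_plus, s_obs, p_value).
    """
    n_minus = sum(1 for x in data if x < theta0)
    n_plus = sum(1 for x in data if x > theta0)
    n = n_minus + n_plus  # observations equal to theta0 are discarded
    if alternative == "two-sided":
        s_obs = min(n_minus, n_plus)
        p_value = min(1.0, 2 * binom_cdf(s_obs, n))
    elif alternative == "greater":
        s_obs = n_minus  # few values below theta0 support H1
        p_value = binom_cdf(s_obs, n)
    elif alternative == "less":
        s_obs = n_plus  # few values above theta0 support H1
        p_value = binom_cdf(s_obs, n)
    else:
        raise ValueError("unknown alternative")
    return n_minus, n_plus, s_obs, p_value


# Data of Example 2.1
transit = [1.80, 3.30, 5.65, 2.25, 2.50, 3.50, 2.25, 3.10, 2.70, 2.70, 3.00]
result_21 = sign_test(transit, 3.50)
print(result_21)
```

With these data the sketch gives 9 minus signs, 1 plus sign and a p-value of about 0.0215, which agrees with the value 0.0214 above up to rounding of the tabulated probability.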
Example 2.2
The following data are IQs of arrested drug abusers who are aged 16 years or older. Is there
any evidence that the median IQ of drug abusers in the population is greater than 107?
Use α = 0.05.
99 100 90 94 135 108 107 111 119 104 127 109 117 105 125
Solution
The parameter of interest is $\theta$, the median IQ of drug abusers in the population. We wish to
test the hypothesis
$$H_0: \theta = 107 \quad \text{against} \quad H_1: \theta > 107$$
at the 0.05 level of significance. The test statistic is
$$S = N^-,$$
where $N^-$ is the number of observations less than 107. When $H_0$ is true, $S \sim b(14, 0.5)$.
Note: We discard the one observation which has the same value as the hypothesized median,
leaving us with a usable sample size of 14.
Let $s_o$ be the observed value of the test statistic. We reject $H_0$ at the 0.05 level of significance
when $p \le 0.05$, where the p-value of the test is given by
$$p = P(S \le s_o \mid 14, 0.5).$$
The following table gives the signs of $X_i - 107$.
$i$      1    2    3    4    5    6    7    8    9    10   11   12   13   14
$X_i$    99   100  90   94   135  108  111  119  104  127  109  117  105  125
Sign     –    –    –    –    +    +    +    +    –    +    +    +    –    +
Here, $N^- = 6$ and $N^+ = 8$. Since the test statistic is $S = N^-$, its observed value is
$$s_o = 6.$$
Since this is a one-sided test, the p-value of the test is given by
$$p = P(S \le 6 \mid 14, 0.5) = 0.3953.$$
Since the p-value of the test, 0.3953, is greater than 0.05, we fail to reject H 0 at the 0.05 level
of significance. Hence, there is not enough evidence to conclude that the median IQ of the
subjects in the population is greater than 107.
When $H_0$ is true and $n > 15$, $Z$ is approximately $N(0, 1)$. For the large sample
approximation, it is common to use a continuity correction, by replacing $S$ by $S + \frac{1}{2}$ in the
definition of $Z$. Equation (2.1) then becomes
$$Z = \frac{S + \frac{1}{2} - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}}. \qquad (2.2)$$
Example 2.3
The following data give the ages, in years, of a random sample of 20 students from Besease
Senior High School. It is believed that the median age of students in this school is smaller than
22 years. Based on these data, is there sufficient evidence to conclude that the median age of
students from Besease Senior High School is smaller than 22 years?
9 13 16 16 16 17 18 19 19 19
19 20 20 21 21 23 24 25 25 27
Solution
The parameter of interest is $\theta$, the median age of students from Besease Senior High School.
We are interested in testing the null hypothesis
$$H_0: \theta = 22 \quad \text{against} \quad H_1: \theta < 22.$$
The test statistic is
$$S = N^+,$$
where $N^+$ = number of observations $X_i$ greater than 22
= number of + signs when the differences $X_i - 22$ are computed, $i = 1, 2, \ldots, 20$.
When $H_0$ is true, $S \sim b(20, \frac{1}{2})$. Since $n > 15$, we use the normal approximation to the
binomial distribution with a continuity correction. The test statistic then becomes
$$Z = \frac{S + 0.5 - 0.5(20)}{0.5\sqrt{20}}.$$
When $H_0$ is true, $Z$ is approximately $N(0, 1)$. Let $z_o$ denote the observed value of the test
statistic $Z$. We reject $H_0$ at the 0.05 level of significance when
$z_o \le -z_{\alpha} = -z_{0.05} = -1.645$. The following table gives the signs of $X_i - 22$.
9 13 16 16 16 17 18 19 19 19
– – – – – – – – – –
19 20 20 21 21 23 24 25 25 27
– – – – – + + + + +
From the above table, $N^+ = 5$. Thus, the observed value of the statistic $S$ is 5. This gives
$$z_o = \frac{5 + 0.5 - 0.5(20)}{0.5\sqrt{20}} = -2.0125.$$
Since $-2.0125$ is less than $-1.645$, we reject $H_0$ at the 0.05 level of significance and conclude
that the median age of students of Besease Senior High School is less than 22 years.
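The large-sample calculation of Example 2.3 can be reproduced with a few lines of code. This is a sketch under the assumptions of the example (continuity-corrected statistic of equation (2.2), lower-tailed test); the helper names `normal_cdf` and `sign_test_z` are illustrative and do not come from the text.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def sign_test_z(s, n):
    """Continuity-corrected statistic (S + 1/2 - n/2) / (sqrt(n) / 2)."""
    return (s + 0.5 - 0.5 * n) / (0.5 * sqrt(n))


# Data of Example 2.3
ages = [9, 13, 16, 16, 16, 17, 18, 19, 19, 19,
        19, 20, 20, 21, 21, 23, 24, 25, 25, 27]
theta0 = 22

n_plus = sum(1 for x in ages if x > theta0)
n_minus = sum(1 for x in ages if x < theta0)
n = n_plus + n_minus  # values equal to theta0 would be discarded

z_obs = sign_test_z(n_plus, n)
p_lower = normal_cdf(z_obs)  # lower-tail p-value for H1: median < 22

print(n_plus, round(z_obs, 4), round(p_lower, 4))
```

Comparing `z_obs` with $-1.645$, or equivalently `p_lower` with 0.05, leads to the same decision as in the worked solution.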
2.2.4 Confidence interval for the median based on the sign test
The $100(1 - \alpha)\%$ confidence interval for $\theta$ consists of those values of $\theta_0$ for which we
would not reject a two-sided null hypothesis $H_0: \theta = \theta_0$ at the $\alpha$ level of significance.
We designate the lower limit of our confidence interval by $\theta_L$ and the upper limit by $\theta_U$.
We determine the largest number of positive or negative signs (i.e. the value $s$) such that
$$P(S \le s \mid n, 0.5) \le \alpha/2.$$
When the data values are arranged in order of magnitude, the $(s + 1)$th observation
is $\theta_L$. To find $\theta_U$, the upper limit of the confidence interval, we count the ordered sample
values backwards from the largest. The $(s + 1)$th observation from the largest value locates
$\theta_U$, i.e. $\theta_U$ is the $(n - s)$th value.
Example 2.4
Construct a 95% confidence interval for the median of the population from which the
following sample data have been drawn, using the sign test.
0.07 0.69 1.74 1.90 1.99 2.41 3.07 3.08
3.10 3.57 3.71 4.01 8.11 8.23 9.10 10.16
Solution
The point estimate of the population median is the sample median, which is the mean of
the two middle values in the ordered array. Thus,
$$\text{the sample median} = \frac{3.08 + 3.10}{2} = 3.09.$$
To find $\theta_L$, we consult a table of the binomial distribution and find that
$$P(S \le 3 \mid 16, 0.5) = 0.0105 \quad \text{and} \quad P(S \le 4 \mid 16, 0.5) = 0.0383.$$
Thus, we cannot obtain an exact 95% confidence interval for the median,
since 100[1 – 2(0.0105)] = 97.9, which is larger than 95, and 100[1 – 2(0.0383)] = 92.34,
which is smaller than 95.
This method of constructing confidence intervals for the median does not usually yield
intervals with exactly the usual coefficients of 0.90, 0.95, and 0.99.
In practice, we choose between the wider interval with higher confidence and the narrower
interval with lower confidence.
Suppose we choose $s = 4$; then $s + 1 = 5$. Therefore the 5th value in the ordered array is
$\theta_L$ and the 12th (i.e. 16 – 4) value in the ordered array is $\theta_U$.
Thus $\theta_L = 1.99$ and $\theta_U = 4.01$.
The confidence coefficient is therefore 100[1 – 2(0.0383)] = 92.34. We say that we are
92.34% confident that the population median is between 1.99 and 4.01.
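The interval of Example 2.4 and its confidence coefficient can also be obtained directly from the binomial distribution. The sketch below is illustrative only; the function name `sign_ci` is an assumption, and the coefficient is computed exactly rather than read from a table, so it may differ from the tabulated figures in the last decimal place.

```python
from math import comb


def sign_ci(data, s):
    """Sign-test interval for the median (illustrative helper).

    Returns the (s+1)th smallest and (s+1)th largest observations together
    with the confidence coefficient 1 - 2 P(S <= s | n, 0.5).
    """
    x = sorted(data)
    n = len(x)
    tail = sum(comb(n, i) for i in range(s + 1)) / 2**n
    lower = x[s]          # (s+1)th smallest value (0-based index s)
    upper = x[n - s - 1]  # (n-s)th value, i.e. (s+1)th from the largest
    return lower, upper, 1 - 2 * tail


# Data of Example 2.4
sample = [0.07, 0.69, 1.74, 1.90, 1.99, 2.41, 3.07, 3.08,
          3.10, 3.57, 3.71, 4.01, 8.11, 8.23, 9.10, 10.16]

for s in (3, 4):
    lower, upper, coeff = sign_ci(sample, s)
    print(s, lower, upper, round(100 * coeff, 2))
```

Running the loop for $s = 3$ and $s = 4$ shows the trade-off described above between a wider interval with higher confidence and a narrower interval with lower confidence.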
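$$P\left(Z \le \frac{k - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}}\right) = \frac{\alpha}{2}, \qquad P\left(Z \le -z_{\alpha/2}\right) = \frac{\alpha}{2},$$
where $Z$ is $N(0, 1)$ and
$$-z_{\alpha/2} = \frac{k - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}}.$$
Making $k$ the subject of the above equation, we obtain
$$k = \tfrac{1}{2}n - \tfrac{1}{2}z_{\alpha/2}\sqrt{n} = \tfrac{1}{2}\left(n - z_{\alpha/2}\sqrt{n}\right).$$
Approximately, $k = s + 1$. If the resulting value is not an integer, we use the closest integer.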
Example 2.5
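Refer to Example 2.3. Construct a 95% confidence interval for $\theta$.
Solution
Here, $n = 20$ and $z_{\alpha/2} = z_{0.025} = 1.96$, so
$$k = s + 1 = \tfrac{1}{2}\left(20 - 1.96\sqrt{20}\right) = 5.6 \approx 6 \quad \text{and} \quad s = 5.$$
Therefore the 6th observation in the ordered array is $\theta_L$ and the $(20 - 5)$th = 15th observation
in the ordered array is $\theta_U$. Thus, $\theta_L = 17$ and $\theta_U = 21$. Hence the 95% confidence interval
for $\theta$ is $17 \le \theta \le 21$.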
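The large-sample rule for $k$ can be sketched in code as follows. The function name `approx_k` is illustrative, the value 1.96 is the normal quantile used in Example 2.5, and rounding to the closest integer follows the rule stated before the example.

```python
from math import sqrt


def approx_k(n, z):
    """Large-sample position k = (n - z * sqrt(n)) / 2, before rounding."""
    return 0.5 * (n - z * sqrt(n))


# Data of Example 2.3, as used in Example 2.5
ages = sorted([9, 13, 16, 16, 16, 17, 18, 19, 19, 19,
               19, 20, 20, 21, 21, 23, 24, 25, 25, 27])
n = len(ages)

k_raw = approx_k(n, 1.96)
k = round(k_raw)             # closest integer

theta_lower = ages[k - 1]    # kth smallest observation
theta_upper = ages[n - k]    # kth largest observation, i.e. the (n - k + 1)th value

print(round(k_raw, 2), k, theta_lower, theta_upper)
```

Indexing the sorted list directly avoids counting positions by hand, which is where slips most easily occur in this kind of calculation.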
2.3.1 Assumptions
1. The sample available for analysis is a random sample of size $n$ from a population with an
unknown median $\theta$.
2. The variable of interest is measured on a continuous scale.
3. The sampled population is symmetric.
4. The scale of measurement is at least interval.
5. The observations are independent.
2.3.2 Hypotheses
The parameter of interest is $\theta$, the population median. To test the hypothesis
$$H_0: \theta = \theta_0,$$
where $\theta_0$ is the hypothesized median, against a corresponding one-sided or two-sided
alternative, we can also use the Wilcoxon signed-ranks test.
Test statistic
For a sufficiently small W value, we reject H 0 . The test statistic therefore is
W W ,
since a small value causes us to reject the null hypothesis.
Decision rule
We reject $H_0$ at significance level $\alpha$ if the observed $W$ value, $w_o$, is less than or equal to
the tabulated $W$ value for $n$ and a preselected value of $\alpha$.
Test statistic
The test statistic is
$$W = \min(W^-, W^+),$$
since a small value of either $W^-$ or $W^+$ causes us to reject the null hypothesis.
Decision rule
We reject $H_0$ at significance level $\alpha$ if the observed $W$ value, $w_o$, is less than or equal to
the tabulated $W$ value for $n$ and a preselected value of $\alpha/2$.
The distribution of W
1. The smallest value $W$ can take is zero (0) and the largest value that $W$ can take is the sum
of the integers from 1 to $n$: that is, $\frac{n(n+1)}{2}$. $W$ is therefore a discrete random variable
whose support ranges between 0 and $\frac{n(n+1)}{2}$.
2. It can be shown that the probability mass function of the discrete random variable $W$ is
given by
$$P(W = w) = f(w) = \frac{c(w)}{2^n}, \qquad 0 \le w \le \frac{n(n+1)}{2},$$
where $c(w)$ = the number of possible ways to assign a + sign or a − sign to the first $n$ integers
so that the sum of the ranks with + signs (or – signs) is equal to $w$.
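The counts $c(w)$ can be computed without listing all $2^n$ sign assignments. The following sketch uses a standard subset-sum recursion; it is an illustration rather than the method used to build the tables in the Appendix, and the function names are assumptions. It also assumes that there are no ties, so that the ranks are exactly the integers 1 to $n$.

```python
def signed_rank_counts(n):
    """Return the list c with c[w] = number of subsets of {1, ..., n} summing to w."""
    max_w = n * (n + 1) // 2
    c = [0] * (max_w + 1)
    c[0] = 1  # the empty set: no rank receives a + sign
    for i in range(1, n + 1):
        # Going downwards ensures that rank i is used at most once.
        for w in range(max_w, i - 1, -1):
            c[w] += c[w - i]
    return c


def signed_rank_cdf(w, n):
    """Return P(W <= w) under H0 for sample size n (no ties assumed)."""
    c = signed_rank_counts(n)
    return sum(c[: w + 1]) / 2**n


counts_3 = signed_rank_counts(3)
print(counts_3)
print(round(signed_rank_cdf(8, 10), 4))
```

For $n = 3$ the eight sign assignments give the sums 0, 1, 2, 3, 3, 4, 5 and 6, which is what `counts_3` records.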
Example 2.6
The following are the systolic blood pressures (mmHg) of 13 patients undergoing a drug
therapy for hypertension:
183 178 152 157 194 163 144 114 179 150 118 158 165
Can we conclude on the basis of these data that the median systolic blood pressure is less than
165 mmHg? Take α = 0.05.
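As a starting point for Example 2.6, the rank sums can be computed as sketched below. The sketch assumes the usual definitions, which are not restated in this excerpt: $W^+$ and $W^-$ are the sums of the ranks of $|X_i - \theta_0|$ belonging to positive and negative differences respectively, zero differences are discarded, and tied absolute differences receive the average of the ranks they occupy. The function name `signed_rank_sums` is illustrative.

```python
def signed_rank_sums(data, theta0):
    """Return (w_plus, w_minus, n) for the differences data - theta0.

    Assumptions: zero differences are dropped and tied absolute
    differences receive the average of the ranks they span.
    """
    diffs = [x - theta0 for x in data if x != theta0]
    abs_sorted = sorted(abs(d) for d in diffs)

    def midrank(a):
        first = abs_sorted.index(a) + 1   # first 1-based position of the value a
        count = abs_sorted.count(a)       # number of tied values
        return first + (count - 1) / 2

    w_plus = sum(midrank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(midrank(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus, len(diffs)


# Data of Example 2.6
pressure = [183, 178, 152, 157, 194, 163, 144, 114, 179, 150, 118, 158, 165]
w_plus, w_minus, n_used = signed_rank_sums(pressure, 165)
print(w_plus, w_minus, n_used)
```

A useful check is that `w_plus + w_minus` equals $n(n+1)/2$ for the `n_used` non-zero differences.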
Example 2.7
Refer to Example 2.2. Use the Wilcoxon signed-ranks test to determine if there is any evidence
that the median IQ of drug abusers in the population is different from 107. Use α = 0.05.
Proof
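When $H_0$ is true, $W$ can be defined as $W = \sum_{i=1}^{n} W_i$, where the $W_i$ are independent and
$$W_i = \begin{cases} 0 & \text{with probability } \frac{1}{2}, \\ i & \text{with probability } \frac{1}{2}. \end{cases}$$
Thus,
$$E(W) = \sum_{i=1}^{n} E(W_i) = \sum_{i=1}^{n} \left(0 \cdot \tfrac{1}{2} + i \cdot \tfrac{1}{2}\right) = \tfrac{1}{2} \sum_{i=1}^{n} i = \tfrac{1}{2} \cdot \frac{n(n+1)}{2} = \frac{n(n+1)}{4}.$$
$$V(W_i) = E(W_i^2) - \left[E(W_i)\right]^2 = 0^2 \cdot \tfrac{1}{2} + i^2 \cdot \tfrac{1}{2} - \left(\frac{i}{2}\right)^2 = \tfrac{1}{2} i^2 - \tfrac{1}{4} i^2 = \tfrac{1}{4} i^2.$$
Since the $W_i$ are independent,
$$V(W) = \sum_{i=1}^{n} \tfrac{1}{4} i^2 = \tfrac{1}{4} \sum_{i=1}^{n} i^2 = \tfrac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}.$$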
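The mean and variance obtained in the proof can be verified by brute force for small $n$. The sketch below enumerates all $2^n$ equally likely sign assignments and uses exact fractions; the function name `exact_moments` is illustrative.

```python
from fractions import Fraction
from itertools import product


def exact_moments(n):
    """Return (mean, variance) of W under H0 by enumerating all 2**n sign patterns."""
    total = Fraction(0)
    total_sq = Fraction(0)
    for signs in product((0, 1), repeat=n):
        # W is the sum of the ranks that receive a + sign (coded as 1).
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        total += w
        total_sq += w * w
    mean = total / 2**n
    variance = total_sq / 2**n - mean * mean
    return mean, variance


for n in range(1, 7):
    mean, variance = exact_moments(n)
    print(n, mean, variance)
```

Each printed pair can be compared with $n(n+1)/4$ and $n(n+1)(2n+1)/24$.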
Theorem 2.2
When the null hypothesis is true, for large n:
Proof
If $W$ is a random variable with mean $\frac{n(n+1)}{4}$ and variance $\frac{n(n+1)(2n+1)}{24}$, then by the central
limit theorem,
$$Z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
is approximately $N(0, 1)$.
$$\frac{\sum t^3 - \sum t}{48}.$$
We can subtract this quantity from the expression in the denominator under the square root
sign.
Thus the adjusted statistic for a large sample approximation is
$$Z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{\sum t^3 - \sum t}{48}}}.$$
We illustrate the calculation of an adjustment for ties in the following data:
$$\frac{\sum t^3 - \sum t}{48} = \frac{107 - 11}{48} = 2.$$
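The adjustment for ties is easy to compute once the sizes of the tied groups are known. In the sketch below, $t$ is taken to be the number of observations in each tied group, the function names are illustrative, and the group sizes 2, 2, 3 and 4 are hypothetical: they are chosen only because they give $\sum t^3 = 107$ and $\sum t = 11$, the sums that appear in the illustration above.

```python
from math import sqrt


def tie_correction(tie_sizes):
    """Return sum(t**3 - t) / 48 over the tied groups."""
    return sum(t**3 - t for t in tie_sizes) / 48


def signed_rank_z(w, n, tie_sizes=()):
    """Large-sample statistic for W with the variance reduced by the tie correction."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_correction(tie_sizes)
    return (w - mean) / sqrt(variance)


# Hypothetical tied-group sizes reproducing the sums 107 and 11
groups = [2, 2, 3, 4]
print(sum(t**3 for t in groups), sum(groups), tie_correction(groups))
```

Because the correction is subtracted from the variance, the adjusted $|Z|$ is never smaller than the unadjusted one for the same $W$.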
Example 2.8
The following data show the life spans, in years, of a random sample of 21 recorded deaths in
a certain country. In past years, the median life span in the country was known to be
50 years. Can we conclude from these data that the median life span in the country has
increased? Use α = 0.05.
39 42 42 47 47 53 59 59 59 60 62
65 66 68 69 70 72 75 75 85 90
2.3.6 Confidence Interval for the Median, based on the Wilcoxon Signed-Ranks Test
Arithmetic Procedure
Step 1: Find the means, $u_{ij}$, of all possible pairs of observations $x_i$ and $x_j$ from the sample
observations $x_1, x_2, \ldots, x_n$, that is,
$$u_{ij} = \frac{x_i + x_j}{2}, \qquad 1 \le i \le j \le n.$$
There are $\frac{n(n-1)}{2} + n$ such averages, distributed symmetrically about the median.
Step 2: Arrange the $u_{ij}$ in increasing order of magnitude.
Step 3: The median of the $u_{ij}$'s is a point estimate of the population median.
Step 4: Find, from the Wilcoxon signed-ranks test table, $t = w_{n, p}$ corresponding to the
sample size $n$ and the appropriate value of $p$ as determined by the desired confidence
level. When the confidence coefficient is $1 - \alpha$, then $p = \alpha/2$. If the exact value
of $p$ cannot be found in the Wilcoxon signed-ranks test table, we choose the closest
neighbouring value.
Step 5: The end points of the confidence interval are the $k$th smallest and $k$th largest values of
the $u_{ij}$'s, where $k = t + 1$ and $t$ is the value in the column labelled T corresponding
to $n$ and the value of $p$ selected (see Wayne, 1978).
Example 2.9
Determine the 95% confidence interval for the population median by the Wilcoxon Signed-
ranks procedure using the following data:
26 25 29 41 29 32 38 40 28 29
Solution
The means of all 55 possible pairs of observations are given in Table 2.5.
Table 2.5: Means of all possible pairs of observations
      25    26    28    29    29    29    32    38    40    41
25    25.0
26    25.5  26.0
28    26.5  27.0  28.0
29    27.0  27.5  28.5  29.0
29    27.0  27.5  28.5  29.0  29.0
29    27.0  27.5  28.5  29.0  29.0  29.0
32    28.5  29.0  30.0  30.5  30.5  30.5  32.0
38    31.5  32.0  33.0  33.5  33.5  33.5  35.0  38.0
40    32.5  33.0  34.0  34.5  34.5  34.5  36.0  39.0  40.0
41    33.0  33.5  34.5  35.0  35.0  35.0  36.5  39.5  40.5  41.0
Thus, a point estimate of the population median is the 28th observation of the ordered data
in Table 2.5. This is 31.5. From the Wilcoxon signed-ranks test table, $t = w_{10, 0.025} = 8$. Thus,
k = t + 1 = 9. Therefore the 9th observation in the ordered array in Table 2.5 is the lower limit
$\theta_L$ and the 9th observation from the largest value locates the upper limit $\theta_U$. Thus,
$\theta_L = 27.5$ and $\theta_U = 35.0$. Therefore the 95% confidence interval for $\theta$ is $27.5 \le \theta \le 35.0$.
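Steps 1 to 5 of the arithmetic procedure can be sketched in code as follows. The function names are illustrative, the sample is the set of ten observations that head the rows and columns of Table 2.5, and the tabulated value $t = 8$ is taken from the solution rather than computed.

```python
from itertools import combinations_with_replacement
from statistics import median


def pairwise_means(data):
    """Steps 1 and 2: sorted means u_ij = (x_i + x_j) / 2 for all i <= j."""
    return sorted((a + b) / 2 for a, b in combinations_with_replacement(data, 2))


def signed_rank_ci(data, t):
    """Steps 3 to 5: point estimate and the kth smallest and kth largest u_ij, k = t + 1."""
    u = pairwise_means(data)
    k = t + 1
    return median(u), u[k - 1], u[-k]


# Observations heading Table 2.5
sample = [25, 26, 28, 29, 29, 29, 32, 38, 40, 41]
u = pairwise_means(sample)
estimate, theta_lower, theta_upper = signed_rank_ci(sample, t=8)

print(len(u), estimate, theta_lower, theta_upper)
```

Sorting the 55 means in code replaces the manual search through the triangular table for the 9th and 28th ordered values.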
Exercise 2(a)
1. The median age of the onset of diabetes is thought to be 45 years. The ages at onset of a
random sample of 16 people with diabetes are:
26.2 30.5 35.5 38.0 39.8 40.3 45.0 45.6
45.9 46.8 48.9 51.4 52.4 55.6 60.9 65.4
Perform the
(a) sign test, (b) Wilcoxon signed-ranks test,
to determine if there is any evidence to conclude that the median age of the onset of
diabetes differs significantly from 45 years. Take α = 0.05.
2. Recent studies of the private practices of physicians who saw no Medicaid patients
suggested that the median length of each patient visit was 22 minutes. It is believed that
the median visit length in practices with a large Medicaid load is shorter than 22 minutes.
A random sample of 20 visits in practices with a large Medicaid load yielded, in order,
the following visit lengths:
9.4 13.4 15.6 16.2 16.4 16.8 18.1 18.7 18.9 19.1
19.3 20.1 20.4 21.6 21.9 23.4 23.5 24.8 24.9 26.8
(a) Use the large sample approximation of the sign test to determine if there is
sufficient evidence to conclude, at the 1% level of significance, that the median visit
length in practices with a large Medicaid load is shorter than 22 minutes.
(b) Based on the sign test, construct a 95% confidence interval for the median visit length
in practices with a large Medicaid load.
3. The following are the blood glucose levels of 12 patients who attend St. Thomas Hospital:
Use the large sample approximation of the Wilcoxon signed ranks test to determine if
we can conclude from these data that the population average is less than 30 mm.
Take α = 0.05.
6. Barrett (1991) reported data on eight cases of umbilical cord prolapse. The maternal ages
were 25, 28, 17, 26, 27, 18, 25, and 30.
(a) Perform the Wilcoxon signed ranks test to determine if there is enough evidence,
based on the data, that the average age of the population from which the sample may
be presumed to have been drawn is greater than 20 years. Take α = 0.01.
(b) Based on the Wilcoxon signed ranks test, construct a 99% confidence interval for the
population median.
7. Out of a random sample of 100 recorded deaths in a certain country during the past year,
68 were of people older than 65 years whilst the remaining 32 were of people younger than
65 years. Perform a sign test to determine if we can conclude that the median life span in
the country is greater than 65 years. Use α = 0.05.
8. Recent studies of the private practices of physicians who saw no Medicaid patients
suggested that the median length of each patient visit was 22 minutes. It is believed that
the median visit length in practices with a large Medicaid load is shorter than 22 minutes.
A random sample of 20 visits in practices with a large Medicaid load yielded, in order, the
following visit lengths:
9.4 13.4 15.6 16.2 16.4 16.8 18.1 18.7 18.9 19.1
19.3 20.1 20.4 21.6 21.9 23.4 23.5 24.8 24.9 26.8
Based on the large sample approximation of the sign test, is there sufficient evidence to
conclude that the median visit length in practices with a large Medicaid load is shorter
than 22 minutes?
9. To determine whether the median life span of a certain species of animal is greater than 5
years, a random sample of 25 animals was observed. Their life spans, in years, are as
follows:
11.3 5.8 3.1 4.1 7.3 4.4 1.4 2.5 6.6 7.6 24.9 30.1 2.9
5.5 7.2 3.2 3.9 7.2 20.1 3.1 6.1 4.9 19.4 4.2 6.3
At the 0.05 level of significance, use the large sample approximation of the sign test to
determine if the median life span is greater than 5 years.
10. A physician states that the median number of times he sees each of his patients during the
year is five. In order to evaluate the validity of this statement, he randomly selects ten of
his patients and determines the number of office visits each of them made during the past
year. He obtains the following values for the ten patients in his sample: 9, 10, 8, 4, 8, 3,
0, 10, 15, 9. Do the data support his contention that the median number of times he sees a
patient is five?
11. Moore and Ogletree (1973) investigated the readiness of pupils at the beginning of the
first grade. They compared scores on a readiness test of pupils who had attended a head
start program for a full year with the scores of those who had not. The readiness test scores
of 10 pupils who did not attend a Head Start program are as follows: 33, 19, 40, 35, 51,
41, 27, 55, 39, 21. Can we conclude, based on the Wilcoxon signed ranks test, that the
median score of the population represented by this sample is less than 45.3? Take
α = 0.05.
12. Abu-Ayyash (1972) found that the median education of heads of households living in
mobile homes in a certain area was 11.6 years. Suppose that a similar survey conducted
in another area revealed the educational levels of heads of households as shown in the
following data.
13 6 6 12 12 10 9 11 14 8 7 16 15 8 7
Based on the sign test, can we conclude that the median educational level of the
population represented by this sample is less than 11.6 years? Take α = 0.05.
13. Lenzer et al. (1973) reported the endurance score of animals during a 48-hour session of
discrimination responding. The median score for an animal with electrodes implanted in
the hypothalamus was 97.5. Suppose that the experiment was duplicated in another
laboratory, except that electrodes were implanted in the forebrain in 12 animals. Assume
that investigators observed the endurance scores shown in the following table.
93.6 89.1 97.7 84.4 97.8 94.5 88.3 97.5 83.7 94.6 85.5 82.6
Use the one-sample sign test to see whether the investigators may conclude at the 0.05
level of significance that the median endurance score of animals with electrodes implanted
in the forebrain is less than 97.5.
14. Iwamoto (1971) found that the mean weight of a sample of a particular species of adult
female monkey from a certain locality was 8.41 kg. Suppose that a sample of adult females
of the same species from another locality yielded the weights as shown in the following
table. By using the one-sample sign test, can we conclude, at the 0.05 level of significance,
that the median weight of the population from which this second sample was drawn is
greater than 8.41 kg?
8.30 9.50 9.60 8.75 8.40 9.10 9.25 9.80 10.05 8.15 10.00 9.60 9.80 9.20 9.30
to estimate the proportion, p , of units in the population that belong to some definite class in
the population.
Testing hypotheses about population proportions is carried out in much the same way as
for median when the assumptions necessary for the test are satisfied.
2.4.1 Assumptions
1. The data consist of a sample of the outcomes of n repetitions of some process. Each
outcome consists of either a ‘success’ or a ‘failure’. The proportion of the sample having
a characteristic of interest is $\hat{p} = S/n$, an estimate of the population proportion p, where S
is the number of successes (the total number of sampling units with a particular
characteristic of interest).
2. The n trials are independent.
3. The probability of a success p, remains constant from trial to trial.
2.4.2 Hypotheses
One-sided and two-sided tests may be made, depending on the question being asked. In other
words, we can test $H_0: p = p_0$ against one of the alternatives $p < p_0$, $p > p_0$ or $p \neq p_0$.
Decision rule
Sufficiently small values of S lead to the rejection of $H_0$. Let $s_o$ denote the observed
value of S. We reject $H_0$ at the α level of significance if the p-value of the test $\le \alpha$,
where
$$\text{p-value} = P(S \le s_o \mid n, p_0).$$
$H_1: p > p_0$.
Test statistic
The test statistic therefore is S. When $H_0$ is true, $S \sim b(n, p_0)$.
Decision rule
For sufficiently large values of S, we reject $H_0$. Thus, we reject $H_0$ at the α level of
significance if the p-value of the test, $P(S \ge s_o \mid n, p_0)$, is at most α, where $s_o$ is the observed value
of S.
Decision rule
For sufficiently large or sufficiently small values of S, we reject $H_0$. The hypothesized
proportion is $p_0$ whilst the observed sample proportion is $\hat{p} = s_o/n$, where $s_o$ is the observed
value of S. The p-value of the test is defined by
$$\text{p-value} = \begin{cases} 2P(S \le s_o \mid n, p_0), & \text{if } \hat{p} < p_0, \\ 2P(S \ge s_o \mid n, p_0), & \text{if } \hat{p} > p_0. \end{cases}$$
We reject $H_0$ at the α level of significance if the p-value of the test $\le \alpha$.
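The three decision rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `alternative` labels are hypothetical, and capping the doubled tail at 1 and returning 1 when the sample proportion equals p0 are conveniences that the text does not specify.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(S = k) when S ~ b(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_p_value(s_obs, n, p0, alternative):
    """Exact p-value of the binomial test for H0: p = p0.

    alternative is one of "less" (H1: p < p0), "greater" (H1: p > p0)
    or "two-sided" (H1: p != p0). These labels are illustrative only.
    """
    lower = sum(binom_pmf(k, n, p0) for k in range(0, s_obs + 1))  # P(S <= s_obs)
    upper = sum(binom_pmf(k, n, p0) for k in range(s_obs, n + 1))  # P(S >= s_obs)
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if alternative == "two-sided":
        p_hat = s_obs / n
        if p_hat < p0:
            return min(1.0, 2 * lower)  # cap at 1: an assumption, not in the text
        if p_hat > p0:
            return min(1.0, 2 * upper)
        return 1.0  # p_hat == p0: no evidence against H0 (assumption)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Illustrative numbers only (not from the text): 2 successes in 10 trials, p0 = 0.5.
print(binomial_test_p_value(2, 10, 0.5, "less"))       # 0.0546875
print(binomial_test_p_value(2, 10, 0.5, "two-sided"))  # 0.109375
```

In each case H0 is rejected when the returned p-value does not exceed the chosen α.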
Example 2.10
In a survey of injection drug users in a large city, Coates et al. (1991) found that 2 out of 12
were HIV positive. We wish to know if we can conclude, at the 10% level of significance, that
fewer than 40% of the injection drug users in the sampled population are HIV positive.
Solution
The parameter of interest is p, the proportion of injection drug users in the sampled population
who are HIV positive. We wish to test
Example 2.11
A researcher found anterior sub-capsular vacuoles in the eyes of 6 out of 15 diabetic patients.
Using the binomial test, can we conclude that the population proportion with the condition of
interest is greater than 0.2? Use α = 0.05.
Solution
The parameter of interest is p, the proportion of diabetic patients in the population with
anterior sub-capsular vacuoles in the eyes. We wish to test
$H_0: p = 0.2$ against $H_1: p > 0.2$.
The test statistic is S, the number of diabetic patients in the sample with anterior sub-capsular
vacuoles in the eyes. When $H_0$ is true,
$S \sim b(15, 0.2)$.
Let $s_o$ denote the observed value of the test statistic. We reject $H_0$ at the 0.05 level of
significance if the p-value of the test $\le 0.05$, where p-value $= P(S \ge s_o \mid 15, 0.2)$. Given
$s_o = 6$,
$$\text{p-value} = P(S \ge 6 \mid 15, 0.2) = 1 - P(S \le 5 \mid 15, 0.2) = 1 - 0.9389 = 0.0611.$$
Since the p-value 0.0611 > 0.05, we fail to reject $H_0$ at the 0.05 level of significance. The data do not
provide sufficient evidence to conclude that the population proportion p is greater than 0.2.
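As a numerical check on the upper-tail probability in Example 2.11, the sketch below sums the b(15, 0.2) probabilities directly. It relies on the identity P(S ≥ 6) = 1 − P(S ≤ 5), so the cumulative probability needed is the one at 5. The variable names are illustrative.

```python
from math import comb

n, p0, s_obs = 15, 0.2, 6  # data of Example 2.11


def pmf(k):
    """P(S = k) when S ~ b(15, 0.2)."""
    return comb(n, k) * p0**k * (1 - p0) ** (n - k)


cdf_below = sum(pmf(k) for k in range(0, s_obs))  # P(S <= 5)
p_value = 1 - cdf_below                           # P(S >= 6)

print(f"{cdf_below:.4f} {p_value:.4f}")  # 0.9389 0.0611
```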
Example 2.12
A commonly prescribed drug for relieving nervous tension is believed to be only 60%
effective. Experimental results with a new drug administered to a random sample of 100 adults
who were suffering from nervous tension show that 70 received relief. Is this sufficient
evidence to conclude that the new drug is superior to the one commonly prescribed? Use
α = 0.05.
Solution
The parameter of interest is p, the proportion of adults in the population who received relief
from nervous tension. We wish to test
$H_0: p = 0.6$ against $H_1: p > 0.6$
at the α = 0.05 level of significance. The test statistic is
$$Z = \frac{S - np_0}{\sqrt{np_0(1 - p_0)}}.$$
Given $n = 100$ and $p_0 = 0.6$, both $np_0$ and $n(1 - p_0)$ are greater than 5 and so Z is
approximately N(0, 1) when $H_0$ is true. We reject $H_0$ if z, the computed value of Z, is greater than
$z_{0.95} = 1.645$. Now, S = 70 and
$$z = \frac{70 - 100 \times 0.6}{\sqrt{100 \times 0.6 \times 0.4}} = 2.0412.$$
Since 2.0412 > 1.645, we reject H0 at the 0.05 level of significance. We conclude that the new
drug is superior to the one commonly prescribed.
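The calculation in Example 2.12 can be reproduced with a short sketch. The function name is hypothetical, and the upper-tail p-value (obtained from the standard normal distribution through `math.erfc`) is an addition that the example itself does not report.

```python
from math import erfc, sqrt


def proportion_z_test(s_obs, n, p0):
    """Large-sample test of H0: p = p0 against H1: p > p0.

    Returns the z statistic and the upper-tail p-value P(Z >= z).
    """
    z = (s_obs - n * p0) / sqrt(n * p0 * (1 - p0))
    p_upper = 0.5 * erfc(z / sqrt(2))  # 1 - Phi(z) for a standard normal Z
    return z, p_upper


z, p_upper = proportion_z_test(70, 100, 0.6)  # data of Example 2.12
print(f"z = {z:.4f}")                         # z = 2.0412
print("reject H0" if z > 1.645 else "do not reject H0")
```

Comparing z with 1.645 and comparing the upper-tail p-value with 0.05 lead to the same decision.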
Example 2.13
In a certain university, the proportion of students who have diabetes mellitus is p. Of the 500
students selected at random from the university, 6 had diabetes mellitus.
(a) Find a point estimate of p. (b) Construct a 90% confidence interval for p.
Solution
(a) A point estimate of p is given by $\hat{p} = \frac{6}{500} = 0.012$.
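(b) $n\hat{p} = 6$ and $n(1 - \hat{p}) = 494$. Both $n\hat{p}$ and $n(1 - \hat{p})$ are of sufficient magnitude to justify the
use of the formula for constructing a confidence interval for p. To construct a 90%
confidence interval, we put $1 - \alpha = 0.90$. This gives α = 0.10. From the standard normal
table, we find that $z_{1 - \alpha/2} = z_{0.95} = 1.645$. Hence a 90% confidence interval for p is
$$\hat{p} \pm z_{1 - \alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.012 \pm 1.645\sqrt{\frac{0.012 \times 0.988}{500}} = 0.012 \pm 0.008,$$
that is, approximately (0.004, 0.020).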
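The interval in Example 2.13 can be computed with a minimal sketch, assuming the usual large-sample (Wald) formula $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. The function name is hypothetical, and the normal quantile is taken from Python's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(s_obs, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = s_obs / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 when confidence = 0.90
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = wald_interval(6, 500, 0.90)  # data of Example 2.13
print(f"{low:.4f} {high:.4f}")           # 0.0040 0.0200
```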
Exercise 2(b)
1. A researcher found that 66% of a sample of 14 infants had completed the hepatitis B
vaccine series. Can we conclude on the basis of these data that, in the sampled population,
more than 60% have completed the series? Use α = 0.01.
2. A health survey of 12 male inmates 50 years of age and older residing in a state’s
correctional facilities was made. The survey found that 22% of the respondents reported a history
of venereal disease. On the basis of these findings, can we conclude that in the sampled
population, more than 15% have a history of venereal disease? Use α = 0.05.
A run is a sequence of signs of the same kind bounded by signs of the other kind. In this case, we
doubt the sequence’s randomness, since there are only two runs.
If the order of occurrence were
Value  25  22  27  23  27  28  21  26  23  30  21  24
Sign        –   +   –   +   +   –   +   –   +   –   +
Run         1   2   3   4   4   5   6   7   8   9  10
we would doubt the sequence’s randomness because there are too many runs (10 in this
instance).
Too few runs indicate that the sequence is not random (has persistency) whilst too many
runs also indicate that the sequence is not random (is zigzag). Let us now consider the one
sample runs test. This procedure helps us to decide whether a sequence of sample values is the
result of a random process.
Assumptions
The data available for analysis consist of a sequence of sample values, recorded in the order
of their occurrence.
Hypotheses
We wish to test
H 0 : The sequence of sample values is random, against
H1: The sequence of sample values is not random.
Test Statistic
The test statistic is R, the total number of runs.
Decision Rule
Since the alternative hypothesis does not specify a direction, a two-sided test is appropriate. The
critical values, $r_{c(\text{lower})}$ and $r_{c(\text{upper})}$, for the test are obtained from Table A.5, in the Appendix, for a given sample
size n and at a desired level of significance α. If $r_{c(\text{lower})} \le r \le r_{c(\text{upper})}$, do not reject $H_0$.
Otherwise reject $H_0$.
Tied Values
If an observation is equal to its preceding observation, denote it by zero. While counting the
number of runs, ignore it and reduce the value of n accordingly.
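Counting runs up and down, including the rule for tied values, can be sketched as follows. The function names are hypothetical. The critical values are not computed here: they are assumed to be read from Table A.5 and passed in by the user.

```python
def runs_up_and_down(values):
    """Return (effective n, number of runs) for a sequence in order of occurrence.

    Each observation is compared with the one before it: '+' for an increase,
    '-' for a decrease. A tie with the preceding observation is ignored and
    reduces the effective sample size by one.
    """
    signs = []
    for previous, current in zip(values, values[1:]):
        if current > previous:
            signs.append("+")
        elif current < previous:
            signs.append("-")
        # current == previous: tie, skipped
    ties = (len(values) - 1) - len(signs)
    n_effective = len(values) - ties

    runs = 0
    last_sign = None
    for sign in signs:
        if sign != last_sign:  # a change of sign starts a new run
            runs += 1
            last_sign = sign
    return n_effective, runs


def runs_decision(r, r_lower, r_upper):
    """Two-sided decision, given critical values read from a table."""
    return "do not reject H0" if r_lower <= r <= r_upper else "reject H0"


# The 12-value sequence shown earlier in this section.
sequence = [25, 22, 27, 23, 27, 28, 21, 26, 23, 30, 21, 24]
print(runs_up_and_down(sequence))  # (12, 10)
```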
Example 2.15
The following are the blood glucose levels of 12 patients who attend St. Thomas Hospital:
Test, at the 0.05 level of significance, whether the sequence is random.
Solution
We wish to test
H 0 : The sequence is random, against
H1: The sequence is not random.
The test statistic is
R = the number of runs.
We reject $H_0$ at the 0.05 level of significance if $r < r_{c(\text{lower})}$ or $r > r_{c(\text{upper})}$, where r is
the observed value of R and $r_c$ denotes a critical value. It can be seen that:
Here n = 11 and the number of runs r = 7. From the table of critical values for the runs up and
down test, $r_{c(\text{lower})} = 4$ and $r_{c(\text{upper})} = 10$ (see Table A.5, in the Appendix).
Note: Since two consecutive observations are the same, that is 110, we use n = 11 instead
of n = 12.
Since $4 \le r \le 10$, we fail to reject $H_0$ at the 0.05 level of significance and conclude
that there is no evidence against the randomness of the sequence.
Exercise 2(c)
1. The following data show the average daily temperatures recorded at Accra, Ghana, for 15
consecutive days during June 2017.
Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Temperature 28 27 26 27 28 29 29 27 26 25 28 24 25 26 28
Test, at the 0.05 level of significance, if we can conclude that the pattern of temperature
is random?
2. The following data show the inflation rate in Ghana from 2006 to 2017. Test, at the 0.05
level of significance, whether we can conclude that the pattern of yearly inflation is random.
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
11.7 10.7 16.5 13.1 6.7 7.7 7.1 11.7 15.5 17.2 17.5 12.0
References
Abu-Ayyash, A. Y. (1972). The mobile home: A neglected phenomenon in geographic
research. Geog. Bull., 5, 28-30.
Barrett, J. M. (1991). Funic reduction for the management of umbilical cord prolapse.
American Journal of Obstetrics and Gynaecology, 165, 654-657.
Coates, R., Millson, M., & Myers, T. (1991). The benefits of HIV antibody testing of saliva in
field research. Canadian Journal of Public Health, 82, 397-398.
Iwamoto, M. (1971). Morphological studies of Macaca fuscata: VI, Somatometry. Primates,
12, 151-174.
Lenzer, I. I., & White, C. A. (1973). Statistical effects in continuous reinforcement
and successive sensory discrimination situations. Physiolog. Psychol., 1, 77-82.
Moore, R. C., & Ogletree, E. J. (1973). A comparison of the readiness and intelligence of first
grade children with and without a full year of Head Start training. Education, 93, 266-270.
Ofosu, J. B., & Hesse, C. A. (2011). Elementary Statistical Methods. EPP Books Services,
Accra.
Wayne, W. D. (1978). Applied Nonparametric Statistics. Houghton Mifflin Company, London.