The rightsholder has graciously given you the freedom to download all pages of this
book. No additional commercial or other uses have been granted.
Lectures on Biostatistics
D. COLQUHOUN
Lectures on Biostatistics
An Introduction to Statistics with
Applications in Biology and Medicine
Digitized by Google
Oxford University Press, Ely House, London W.1
GLASGOW NEW YORK TORONTO MELBOURNE WELLINGTON
CAPE TOWN IBADAN NAIROBI DAR ES SALAAM LUSAKA ADDIS ABABA
DELHI BOMBAY CALCUTTA MADRAS KARACHI LAHORE DACCA
KUALA LUMPUR SINGAPORE HONG KONG TOKYO
Preface
'In statistics, just like in the industry of consumer goods, there are producers
and consumers. The goods are statistical methods. These come in various kinds
and 'brands' and in great and often confusing variety. For the consumer, the
applier of statistical methods, the choice between alternative methods is often
difficult and too often depends on personal and irrational factors.
The advice of producers cannot always be trusted implicitly. They are apt--it
is only natural--to praise their own wares. The advice of consumers--based
on experience and personal impressions--cannot be trusted either. It is well
known among applied statisticians that in many fields of applied science, e.g.
in industry, experience, especially 'experience of a lifetime', compares unfavourably
with objective scientific research: tradition and aversion from innovation
are usually strong impediments for the introduction of new methods, even if
these are better than the old ones. This also holds for statistics.'
(J. HEMELRIJK 1961)
DURING the preparation of the courses for final year students, mostly
of pharmacology, in Edinburgh and London on which this book is
based, I have often been struck by the extent to which most textbooks,
on the flimsiest of evidence, will dismiss the substitution of assumptions
for real knowledge as unimportant if it happens to be mathematically
convenient to do so. Very few books seem to be frank about, or perhaps
even aware of, how little the experimenter actually knows about the
distribution of errors of his observations, and about facts that are
assumed to be known for the purposes of making statistical calculations.
Considering that the purpose of statistics is supposed to be to help in
the making of inferences about nature, many texts seem, to the
experimenter, to take a surprisingly deductive approach (if assump-
tions a, b, and c were true then we could infer such and such). It is also
noticeable that in the statistical literature, as opposed to elementary
textbooks, a vast number of methods have been proposed, but remark-
ably few have been assessed to see how they behave under the condi-
tions (small samples, unknown distribution of errors, etc.) in which
they are likely to be used.
These considerations, which are discussed at greater length in the
text, have helped to determine the content and emphasis of the methods
in this book. Where possible, methods have been advocated that
involve a minimum of untested assumptions. These methods, which
occur mostly in Chapters 7-11, have the secondary advantage that
they are much easier to understand at the present level than the
methods, such as Student's t test and the chi-squared test (also described
and exemplified in this book) which have, until recently, been the main
sources of misery for students doing first courses in statistics.
In Chapter 12 and also in § 2.7 an attempt has been made to deal
with non-linear problems, as well as with the conventional linear ones.
Statistics is heavily dominated by linear models of one sort and another,
mainly for reasons of mathematical convenience. But the majority of
physical relationships to be tested in practical science are not straight
lines, and not linear in the wider sense described in § 12.7, and attempts
to make them straight may lead to unforeseen hazards (see § 12.8),
so it is unrealistic not to discuss them, even in an elementary book.
In Chapters 13 and 14, calibration curves and assays are discussed.
The step by step description of parallel line assays is intended to
bridge the gap between elementary books and standard works such as
that of Finney (1964).
In Chapter 5 and Appendix 2 some aspects of random ('stochastic')
processes are discussed. These are of rapidly increasing importance to
the practical scientist, but the textbooks on the subject have always
seemed to me to be among the most incomprehensible of all statistics
books, partly, perhaps, because there are not really any elementary
ones. Again I have tried to bridge the gap.
The basic ideas are described in Chapters 1-4. They may be boring,
but the ideas in them are referred to constantly in the later chapters
when these ideas are applied to real problems, so the reader is earnestly
advised to study them.
There is still much disagreement about the fundamental principles
of inference, but most statisticians, presented with the problems
described, would arrive at answers similar to those presented here,
even if they justified them differently, so I have felt free to choose the
justifications that make the most sense to the experimenter.
I have been greatly influenced by the writing of Professor Donald
Mainland. His Elementary medical statistics (1963), which is much more
concerned with statistical thinking than statistical arithmetic, should
be read not only by every medical practitioner, but by everyone who
has to interpret observations of any sort. If the influence of Professor
Mainland's wisdom were visible in this book, despite my greater
concern with methods, I should be very happy.
I am very grateful to many statisticians who have patiently put up
with my pestering over the last few years. If I may, invidiously,
distinguish two in particular they would be Professor Mervyn Stone
who read most of the typescript and Dr. A. G. Hawkes who helped
particularly with stochastic processes. I have also been greatly helped
by Professor D. R. Cox, Mr. I. D. Hill, Professor D. V. Lindley, and
Mr. N. W. Please, as well as many others. Needless to say, none of
these people has any responsibilities for errors of judgment or fact that I
have doubtless persisted with, in spite of their best efforts. I am also
very grateful to Professor C. R. Oakley for permission to quote ex-
tensively from his paper on the purity-in-heart index in § 7.8.
University College London D. C.
April 1970
Contents
6. CAN YOUR RESULTS BE BELIEVED? TESTS OF
SIGNIFICANCE AND THE ANALYSIS OF VARIANCE 86
6.1. The interpretation of tests of significance 86
6.2. Which sort of test should be used, parametric or nonparametric? 96
6.3. Randomization tests 99
6.4. Types of sample and types of measurement 99
10.3. The randomization test for paired observations 157
10.4. The Wilcoxon signed ranks test for two related samples 160
10.5. A data selection problem arising in small samples 165
10.6. The paired t test 167
10.7. When will related samples (pairing) be an advantage? 169
13.5. Confidence limits for the ratio of two normally distributed
variables: derivation of Fieller's theorem 293
13.6. The theory of parallel line assays. Confidence limits for the
potency ratio and the optimum design of assays 297
13.7. The theory of parallel line assays. Testing for non-validity 300
13.8. The theory of symmetrical parallel line assays. Use of ortho-
gonal contrasts to test for non-validity 302
13.9. The theory of symmetrical parallel line assays. Use of con-
trasts in the analysis of variance 306
13.10. The theory of symmetrical parallel line assays. Simplified
calculation of the potency ratio and its confidence limits 308
13.11. A numerical example of a symmetrical (2+2) dose parallel
line assay 311
13.12. A numerical example of a symmetrical (3+3) dose parallel
line assay 319
13.13. A numerical example of an unsymmetrical (3+2) dose parallel
line assay 327
13.14. A numerical example of the standard curve (or calibration
curve). Error of a value of x read off from the line 332
13.15. The (k+1) dose assay and rapid routine assays 340
A2.3. The connection between the lifetimes of indi-
vidual adrenaline molecules and the observed
breakdown rate and half-life of adrenaline 379
A2.4. A stochastic view of the adsorption of molecules
from solution 380
A2.5. The relation between the lifetime of individual
radioisotope molecules and the interval be-
tween disintegrations 386
A2.6. Why the waiting time until the next event does
not depend on when the timing is started for
a Poisson process 388
A2.7. Length-biased sampling. Why the average length
of the interval in which an arbitrary moment of
time falls is twice the average length of all
intervals for a Poisson process 389
TABLES
Table A1. Nonparametric confidence limits for the median 396-7
Table A2. Confidence limits for the parameter of a binomial
distribution (i.e. the population proportion of
'successes') 398-401
Table A3. The Wilcoxon test for two independent samples 402-4
Table A4. The Wilcoxon signed ranks test for two related
samples 405
Table A5. The Kruskal-Wallis one-way analysis of variance
on ranks (independent samples) 406-8
Table A6. The Friedman two-way analysis of variance on
ranks for randomized block experiments 409
Table A7. Table of the critical range (difference between rank
sums for any two treatments) for comparing all
pairs in the Kruskal-Wallis nonparametric one way
analysis of variance 410
Table A8. Table of the critical range (difference between rank
sums for any two treatments) for comparing all
pairs in the Friedman nonparametric two way
analysis of variance 411
Table A9. Rankits (expected normal order statistics) 412-13
REFERENCES 416
INDEX 419
Index of symbols
Reference is to the main section in which the symbol is used, explained, or defined.
C  total in 2 × 2 table (§ 8.2)
Cov(x, y)  population covariance of x and y (§ 2.6)
cov(x, y)  sample estimate of Cov(x, y)
d  difference between 2 observations (§§ 10.1, 10.3)
d̄  mean of d values (§ 10.6)
d  difference between Y and y (§ 12.2)
d  Dunnett's (1964) statistic (§ 11.9)
D  ratio of each dose to the next lowest dose (§ 13.2)
e  base of natural logs, 2·71828… (§ 2.7)
exp(x)  same as e^x (§ 2.7)
e  error part of the mathematical model for an observation (§ 11.2)
E1, E2  events (§ 2.4)
E(x)  expectation (long run mean) of x (§ A1.1)
ED50  effective dose in 50 per cent of subjects (§ 14.3)
f(x)  any function of x (§ 2.1)
f(x)  probability density function of x (§ 4.1)
f_l(x)  probability density function for length-biased samples (§ A2.7)
f  number of degrees of freedom (§§ 8.5, 11.3)
f  frequency of observing a particular value (§ 3.3)
F(x)  distribution function. Probability of making an observation of x or
less (§ 4.1)
F_l(x)  length-biased distribution function (§ A2.7)
F  variance ratio (§ 11.3)
g  index of significance of b (§ 13.5)
g(x)  any function of x (§§ 2.1, A1.1)
H  Kruskal-Wallis statistic (§ 11.5)
m  an estimate of the mean, μ (§ 2.5)
m  population mean number of events in unit time (space, etc.) for the
Poisson distribution (§§ 3.5, 5.1, A1.2)
m  sample estimate of the population mean number of events (§ 3.6)
m  population median (§ 7.3)
m  number of new observations (§§ 7.4, 12.4)
m  estimate of a ratio, a/b (§ 7.6)
M  log_r(potency ratio) (§ 13.3)
n, N  number of observations, an integer (§ 2.1)
n  number of binomial 'trials' (§ 3.2)
n(t)  number of intervals ≥ t (§ 5.1)
n_j  number of observations in jth group (§ 11.4)
n  number of replicate responses to each dose (§§ 13.2, 13.3)
n_s, n_u  number of responses for standard, unknown (§§ 13.2, 13.3)
N  total number of adsorption sites (§ A2.4)
N(t)  number of sites occupied, or nuclei not disintegrated, at time t
(§§ A2.4, A2.5)
NED  normal equivalent deviation (§ 14.3)
o(Δt)  any quantity that becomes negligible compared with Δt for small
time-intervals (§ A2.2)
O  observed numbers or frequencies (§§ 3.6, 8.4, 14.4)
P  cumulative proportion (§ 14.2)
P(E)  probability (true or estimated) of the event in brackets (§§ 2.2, 2.4)
P(E1|E2)  conditional probability of E1 given that E2 has happened (§ 2.4)
P  result of significance test (§ 6.1), confidence level (§§ 7.2, 7.9)
p  probability of a 'success' at each trial in binomial situation
(§§ 3.2, 3.5)
p_U, p_L  high and low confidence limits for p (§ 7.7)
p(t)  probability that a site is occupied at time t (§ A2.4)
r  number of 'successes' (§§ 3.1, 3.2), number of observed 'successes' (§ 7.7)
r  rank used for finding confidence limits for the median (§ 7.3)
k  number of groups (treatments) (§§ 11.4, 11.5)
p  sample estimate of the binomial p (§ 3.6)
r  Pearson correlation coefficient (§ 12.9)
r_s  Spearman rank correlation coefficient (§ 12.9)
r  base of logarithms for assays (§§ 13.2, 13.3)
R  sum of ranks (§§ 9.3, 11.3, 11.5)
R  potency ratio (§§ 13.1, 13.3)
s(.)  sample standard deviation of the variable in brackets. An estimate
of σ(.)
s²(.)  sample variance of the variable in brackets. Square of s(.).
An estimate of var(.) (§§ 2.1, 2.6)
s_max²  largest of a set of sample variances (§ 11.2)
S  Friedman's statistic (§ 11.7)
S  Scheffé's statistic (§ 11.9)
S  sum of squared deviations to be minimized (§ 12.2)
S  standard preparation (§ 13.1)
t  Student's statistic (§ 4.4)
t  time, time interval between events, lifetime (Chapter 5, Appendix 2)
t  time considered as a random variable, t denoting a particular value
of t (§ A2.5)
Δt  a time interval (§§ 5.1, A2.2)
T  total of n observations (§ 2.1)
T  population (true) mean lifetime (§§ 5.1, A1.1, A2.2)
T  sample estimate of the mean lifetime
T  sum of the positive or the negative ranks, whichever is the
smaller (§ 10.4)
u  standard normal (Gaussian) deviate (§ 4.3)
U  unknown preparation (§ 13.1)
v_11, etc.  variance multipliers (§ 13.5)
val{.) population variance of variable in brackets. Same as a2{.).
(§§ 2.6, 2.7)
var{.) Sample estimate of va.{.). Same as r{.) (§§ 2.6,2.7)
..y population (true) maximum value of y (§ 12.8)
V a sample estimate of..y (§ 12.8)
V least squares estimate of ..y (§ 12.8)
to weight (§ 2.5)
Z any variable (§§ 2.1, 2.7, 4.1)
II :z: considered as a random variable, :z: denoting a particular value of
II (§§ 4.1, Al.1)
II geometric mean of:e values (§ 2.5)
Z independent variable in curve fitting problems (§§ 12.1, 12.2)
Z log II (Chapter 13)
zo' :z:. observed frequency (integer) and expected frequency (§ 8.5)
th., etc. means of observations (§ 2.1)
y observed value of dependent variable in curve-fitting problems
(§§ 12.1, 12.2)
y value of dependent variable read oft' fitted curve (§ 12.2)
doses ofstandard and unknown (§ 18.1)
doses giving equal responses (§ 13.3)
Greek symbols
α (alpha)  probability of an error of the first kind (§ 6.1)
α  population value of y when x = x̄, estimated by a (§§ 12.2, 12.7)
α  population value of numerator of ratio (§ 13.5)
α  orthogonal coefficient (§ 13.8)
β (beta)  probability of an error of the second kind (§ 6.1)
β  population value of slope, estimated by b (§§ 12.2, 12.7)
β_i  block effect for ith block in model observation (§ 11.2)
β  population value for denominator of ratio (§ 13.5)
Δ (delta)  change in, interval in, value of following variable (§§ 5.1,
A2.2)
λ (lambda)  population (true) mean number of events in unit time
(§§ 5.1, A2.2)
λ  measure of probability of catabolism, disintegration, adsorption in
a short time-interval (§§ A2.3-A2.6)
μ (mu)  population mean (§§ 2.5, 4.2, 12.2, A1.1)
μ  population (true) value of ratio (§ 13.5)
μ  measure of probability of desorption (§ A2.4)
π (pi)  3·141593…
Π (capital pi)  multiply the following (§ 2.1)
σ²(.) (sigma)  population (true) variance of variable in brackets. Same as
var(.) (§§ 2.1, 2.6, 2.7, A1.2)
Σ (sigma)  add up the following (§ 2.1)
τ (tau)  treatment effect for jth treatment in model observation
(§ 11.2)
χ² (chi)  chi-squared statistic with f degrees of freedom (§ 8.5)
χ_r²  rank statistic distributed approximately as χ² (§ 11.7)
ω (omega)  interblock standard deviation (§ 11.2)
1. Is the statistical way of thinking
worth bothering about?
'I wish to propose for the reader's favourable consideration a doctrine which
may, I fear, appear wildly paradoxical and subversive. The doctrine in question
is this: that it is undesirable to believe a proposition when there is no ground
whatever for supposing it true. I must, of course, admit that if such an opinion
became common it would completely transform our social life and our political
system: since both are at present faultless, this must weigh against it. I am also
aware (what is more serious) that it would tend to diminish the incomes of
clairvoyants, bookmakers, bishops and others who live on the irrational hopes of
those who have done nothing to deserve good fortune here or hereafter. In spite
of these grave arguments, I maintain that a case can be made out for my paradox,
and I shall try to set it forth.'
BERTRAND RUSSELL, 1935
(On the Value of Scepticism)
to measure abstract quantities such as pain, intelligence, or purity in
heart (§ 7.8). As Mainland (1964) points out, most of us find arithmetic
easier than thinking. A particular effort has therefore been made to
explain the rational basis of as many methods as possible. This has
been made much easier by starting with the randomization approach
to significance testing (Chapters 6-11), because this approach is easy to
understand, before going on to tests like Student's t test. The numerical
examples have been made as self-contained as possible for the benefit
of those who are not interested in the rational basis.
Although it is difficult to achieve these aims without a certain
amount of arithmetic, all the mathematical ideas needed will have
been learned by the age of 15. The only difficulty may be the occa-
sional use of longer formulae than the reader may have encountered
previously, but for the vast majority of what follows you do not need
to be able to do anything but add up and multiply. Adding up is so
frequent that a special notation for it is described in detail in § 2.1.
You may find this very dull and boring until familiarity has revealed its
beauty and power, but do not on any account miss out this section.
In a few sections some elementary calculus is used, though anything at
all daunting has been confined to the appendices. These parts can be
omitted without affecting understanding of most of the book. If you
know no calculus at all, and there are far more important reasons for
no biologist being in this position than the ability to understand the
method of least squares, try Silvanus P. Thompson's Calculus made Easy
(1965).
A list of the uses and scope of statistical methods in laboratory and
clinical experimentation is necessarily arbitrary and personal. Here is
mine.
(1) Statistical prudence (Lancelot Hogben's phrase) encourages the
design of experiments in a way that allows conclusions to be drawn from
them. Some of the ideas, such as the central importance of randomiza-
tion (see §§ 2.3, 6.3, and Chapters 8-11) are far from intuitively obvious
to most people at first.
(2) Some processes are inherently probabilistic in nature. There is no
alternative to a statistical approach in these cases (see Chapter 5 and
Appendix 2).
(3) Statistical methods allow an estimate (usually optimistic, see
§ 7.2) of the uncertainty of the conclusions drawn from inexact observa-
tions. When results are assessed by hopeful intuition it is not uncommon
for more to be inferred from them than they really imply. For example,
Schor and Karten (1966) found that, in no less than 72 per cent of a
sample of 149 articles selected from 10 highly regarded medical journals,
conclusions were drawn that were not justified by the results presented.
The most common single error was to make a general inference from
results that could quite easily have arisen by chance.
(4) Statistical methods can only cope with random errors and in
real experiments systematic errors (bias) may be quite as important as
random ones. No amount of statistics will reveal whether the pipette
used throughout an experiment was wrongly calibrated. Tippett (1944)
put it thus: 'I prefer to regard a set of experimental results as a biased
sample from a population, the extent of the bias varying from one kind
of experiment and method of observation to another, from one experi-
menter to another, and, for any one experimenter, from time to time.'
It is for this reason, and because the assumptions made in statistical
analysis are not likely to be exactly true, that Mainland (1964) em-
phasizes that the great value of statistical analysis, and in particular
of the confidence limits discussed in Chapter 7, is that 'they provide
a kind of minimum estimate of error, because they show how little a
particular sample would tell us about its population, even if it were a
strictly random sample.'
(5) Even if the observations were unbiased, the method of calculating
the results from them may introduce bias, as discussed in §§ 2.6 and
12.8 and Appendix 1. For example, some of the methods used by
biochemists to calculate the Michaelis constant from observations of the
initial velocity of enzymic reactions give a biased result even from
unbiased observations (see § 12.8). This is essentially a statistical
phenomenon. It would not happen if the observations were exact.
(6) The important point to realize is that by their nature statistical
methods can never prove anything. The answer always comes out as
a probability. And exactly the same applies to the assessment of
results by intuition, except that the probability is not calculated but
guessed.
concentration will, in general, be different at every attempt. An
unknown true value such as the unknown true concentration of the
drug is called a parameter. The mean value from all the assays gives an
estimate of this parameter. An approximate experimental estimate
(the mean in this example) of a parameter is called a statistic. It is
calculated from a sample of observations from the population of all
possible observations.
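The parameter/statistic distinction can be shown in a short simulation; the true concentration, the error scatter, and the sample size below are all invented for illustration.

```python
import random

random.seed(42)

TRUE_CONC = 5.0  # the parameter: the unknown true concentration (hypothetical units)

# Ten assay results: the parameter value plus random experimental error
sample = [TRUE_CONC + random.gauss(0, 0.3) for _ in range(10)]

# The sample mean is a statistic: an estimate of the parameter, calculated
# from a sample out of the population of all possible observations
estimate = sum(sample) / len(sample)
print(round(estimate, 2))
```

Re-running with a different seed gives a different estimate, which is exactly the point: the statistic varies from sample to sample, while the parameter does not.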
In the example just discussed the individual assay results differed
from the parameter value only because of experimental error. However,
there is another slightly different situation, one that is particularly
common in the biological sciences. For example, if identical doses of
a drug are given to a series of people and in each case the fall in blood
sugar level is measured then, as before, each observation will be differ-
ent. But in this case it is likely that most of the difference is real.
Different individuals really do have different falls in blood sugar level,
and the scatter of the results will result largely from this fact and only
to a minor extent from experimental errors in the determination of the
blood sugar level. The average fall of blood sugar level may still be of
interest if, for example, it is wished to compare the effects of two
different hypoglycaemic drugs. But in this case, unlike the first, the
parameter of which this average is an estimate, the true fall in blood
sugar level, is no longer a physical reality, whereas the true concentra-
tion was. Nevertheless, it is still perfectly all right to use this average as
an estimate of a parameter (the value that the mean fall in blood
sugar level would approach if the sample size were increased indefin-
itely) that is used simply to define the distribution (see §§ 3.1 and
4.1) of the observations. Whereas in the first case the average of all
the assays was the only thing of interest, the individual values being
unimportant, in the second case it is the individual values that are of
importance, and the average of these values is only of interest in so far
as it can be used, in conjunction with their scatter, to make predictions
about individuals.
In short, there are two problems, the older one of estimating a true
value by imperfect methods, and the now common problem of measur-
ing effects that are really variable (e.g. in different people) by relatively
very accurate methods. Both these problems can be treated by the
same statistical methods, but the interpretation of the results may be
different for each.
With few exceptions, scientific methods were applied in medicine and
biology only in the nineteenth century and in education and the social
sciences only very recently. It is necessary to distinguish two sorts of
scientific method often called the ob8ervational method and the
experimental method. Claude Bernard wrote: 'we give the name observer
to the man who applies methods of investigation, whether simple or
complex, to the study of phenomena which he does not vary and which
he therefore gathers as nature offers them. We give the name experi-
menter to the man who applies methods of investigation, whether
simple or complex, so as to make natural phenomena vary.' In more
modern terms Mainland (1964) writes: 'the distinctive feature of an
experiment, in the strict sense, is that the investigator, wishing to
compare the effects of two or more factors (independent variables)
assigns them himself to the individuals (e.g. human beings, animals or
batches of a chemical substance) that comprise his test material.'
For example, the type and dose of a drug, or the temperature of an
enzyme system, are independent variable8.
The observational method, or survey method as Mainland calls it,
usually leads to a correlation; for example, a correlation between
smoking habits and death from lung cancer, or between educational
attainment and type of school. But the correlation, however perfect
it may be, does not give any information at all about causation,† such
as whether smoking causes lung cancer. The method lends itself only
too easily to the confusion of sequence with consequence. 'It is the
post hoc, ergo propter hoc of the doctors, into which we may very easily
let ourselves be led' (Claude Bernard).
This very important distinction is discussed further in §§ 12.7 and
12.9. Probably the most useful precaution against the wrong interpreta-
tion of correlations is to imagine the experiment that might in principle
be carried out to decide the issue. It can then be seen that bias in the
results is controlled by the randomization process inherent in experi-
ments. If all that is known is that pupils from type A schools do better
than those from type B schools it could well have nothing to do with
the type of school but merely, for example, be that children of educated
parents go to type A school and those of uneducated parents to type B
schools. If proper experimental methods were applied in the situations
mentioned above the first step would be to divide the population (or a
random sample from it) by a random process, into two groups. One
group would be instructed to smoke (or to go to a particular sort of
school), the other group would be instructed not to smoke (or go to
† It is not even necessarily true that zero correlation rules out causation, because
lack of correlation does not necessarily imply independence (see § 12.9).
a different sort of school). The difficulty in the medical and social
sciences is usually that an experiment may be considered unethical.
Since it can hardly be assumed a priori that there is an equal chance of
smoking having good or bad effects on health, it is not possible to
instruct a randomly selected group of people to smoke, though it
may be possible to leave one randomly selected group to its own
devices (thus allowing smoking by some of the group) and to prevent
the other group smoking.
The situation is not always like this, however. There is genuine
doubt about the relative merits of different sorts of sohool, and, very
often, about different sorts of therapy, so in these cases it is not merely
ethical to do a proper experiment, but it would be unethioal, though
not unusual, not to do the experiment.
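The first step of such an experiment, dividing a random sample into two groups by a random process, can be sketched in a few lines; the subject labels and group sizes here are hypothetical.

```python
import random

random.seed(1)

# Twenty hypothetical subjects
subjects = [f"subject_{i}" for i in range(20)]

# Randomization: shuffle the subjects, then split them into two equal groups,
# so that assignment has nothing to do with any property of the subjects
shuffled = subjects[:]
random.shuffle(shuffled)
group_1 = shuffled[:10]  # e.g. assigned to one sort of school (or treatment)
group_2 = shuffled[10:]  # e.g. assigned to the other

print(len(group_1), len(group_2))
```

Because membership of each group is decided by the shuffle alone, any systematic difference between the groups (educated parents, say) can arise only by chance, which is what makes the later statistical comparison valid.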
the hypothesis that state schools provide a better education than
private schools.
The use to which natural scientists wanted to put probability theory
was, it seems, of a quite different kind from that for which the theory
was designed. All that probability theory would answer were questions
such as: Given certain premises about the thorough shuffling of the pack
and the honesty of the players, what is the probability of drawing four
consecutive aces? This is a statement of the probability of making
some observations, given an hypothesis (that the cards are well shuffled,
and the players honest), a deductive statement of direct probability.
What was needed was a statement of the probability of the hypothesis,
given some observations--an inductive statement of inverse probability.
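The direct probability in the card example can be computed exactly: drawing without replacement from a well-shuffled pack of 52, the chance of an ace changes at each draw.

```python
from fractions import Fraction

# P(first four cards drawn are all aces), drawing without replacement
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)

print(p)         # 1/270725
print(float(p))  # about 3.7e-06
```

This is the deductive direction: premises about the pack and the players in, a probability of the observations out. Nothing in the calculation says anything about how probable the premises themselves are.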
An answer to the problem was provided by the Rev. Thomas Bayes
in his Essay towards solving a Problem in the Doctrine of Chances, published
in 1763, two years after his death. Bayes' theorem states:
posterior probability of a hypothesis = constant × likelihood
of hypothesis × prior probability of the hypothesis   (1.3.1)
In this equation prior (or a priori) probability means the probability
of the hypothesis being true before making the observations under
consideration, the posterior (or a posteriori) probability is the probability
after making the observations, and the likelihood of the hypothesis is
defined as the probability of making the given observations if the
hypothesis under consideration were in fact true. This technical
definition of likelihood will be encountered again later.
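As a minimal numerical sketch of eq. (1.3.1), take two mutually exclusive hypotheses H1 and H2 with equal prior probabilities; the likelihood values are invented for illustration. The 'constant' is fixed by requiring the posterior probabilities to add up to 1.

```python
priors      = {'H1': 0.5, 'H2': 0.5}   # equal priors, assumed for illustration
likelihoods = {'H1': 0.8, 'H2': 0.2}   # P(observations | hypothesis), invented values

# Eq. (1.3.1): posterior = constant x likelihood x prior
unnormalized = {h: likelihoods[h] * priors[h] for h in priors}

# The constant normalizes the posteriors so that they sum to 1
# over the mutually exclusive hypotheses
total = sum(unnormalized.values())
posteriors = {h: v / total for h, v in unnormalized.items()}

print(posteriors)   # {'H1': 0.8, 'H2': 0.2}
```

Note that with equal priors the posteriors are simply the normalized likelihoods, so the hypothesis of maximum posterior probability is also the hypothesis of maximum likelihood.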
The wrangle about the interpretation of Bayes' theorem continues
to the present day. Is 'the probability of an hypothesis being true' a
meaningful idea? The great mathematician Laplace assumed that if
nothing were known of the merits of rival hypotheses then their prior
probabilities should be considered equal ('the equipartition of ignor-
ance'). Later it was suggested that Bayes' theorem was not really
applicable except in a small proportion of cases in which valid prior
probabilities were known. This view is still probably the most common,
but there is now a strong school of thought that believes the only
sound method of inference is Bayesian. An uncontroversial use of
Bayes' theorem, in medical diagnosis, is mentioned in § 2.4.
Fortunately, in most, though not all, cases the practical results are
the same whatever viewpoint is adopted. If the prior probabilities of
several mutually exclusive hypotheses are known or assumed to be
equal then the hypothesis with the maximum posterior probability
will also be that with the maximum likelihood. In fact a popular
procedure is to ignore the prior probability altogether and to select the
hypothesis with the maximum likelihood. This procedure avoids
altogether the making of statements of inverse probability that many
people think to be invalid, but loses something in interpretability.
The probability considered is the probability of the observations calcu-
lated assuming the hypothesis in question to be true--a statement of
direct probability.
It has been argued strongly by Karl Popper that scientific inference
is a wholly deductive process. A hypothesis is framed by inspired guess-
work. Its consequences are deduced, and then tested experimentally. This
is certainly just how things should be done. But, as A. J. Ayer points
out, the experiment is only useful if it is supposed that it will give the
same result when it is repeated, and the argument leading to this
supposition is the sort of inductive inference with which much of
statistics is concerned.
2. Fundamental operations and
definitions
Functional notation
IF the value of one variable, say y, depends on the value of another,
say x, then y is said to be a function of x. For example, the response to a
drug is a function of the dose. The usual algebraic way of saying this is
y = f(x), where f denotes the function. This equation is read 'y equals a
function of x'. If it is required to distinguish different functions of the
same variable then different symbols are chosen to represent each
function. For example, y₁ = g(x), y₂ = φ(x). If the function f were the
square root, g were the logarithm, and φ the tangent then the above
equations could be written in a less abstract form as y = √x, y₁ = log x,
and y₂ = tan x. This notation can be extended to several variables.
If the value of y depends on the value of two different variables, x₁
and x₂ say, this could be denoted y = f(x₁, x₂). An example of such a
function is y = x₁² + x₂.
Needless to say, the symbols f, g, and h do not stand for numbers
and, for example, it is very important to distinguish y = f(x) from
'y equals f times x'. In the present case f, g, and h stand for operations
carried out on the argument x in just the same way as the symbol
'+' stands for the operation of addition of the quantities on each side
of the plus sign, or the symbol d/dx stands for 'find the differential
coefficient with respect to x'.
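The idea that f, g, and h are operations rather than numbers can be made concrete in a language where functions are values; the following sketch (in Python, with names chosen to match the text) is illustrative only, not from the book.

```python
import math

# f, g, and h denote operations, not numbers: here the square root,
# the common logarithm, and the tangent, as in the text's example.
f = math.sqrt
g = math.log10
h = math.tan

x = 4.0
y = f(x)     # y = f(x) = sqrt(4) = 2.0, not 'f times x'
y1 = g(x)    # y1 = g(x) = log 4
y2 = h(x)    # y2 = h(x) = tan 4

# A function of two variables, y = x1**2 + x2:
def f2(x1, x2):
    return x1 ** 2 + x2

print(y, f2(3, 1))  # 2.0 10
```

Assigning `f = math.sqrt` makes the operator/number distinction visible: `f` can be passed around and applied, but `f * x` is meaningless.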
In the following pages this operational notation is used frequently.
For example, s(x) will stand for 'the estimated standard deviation of
x' (not 's times x').† The square of the standard deviation is called the
† See § 2.6 for the definitions. Although it is commonly used, this is not really a
consistent use of the notation. The sample standard deviation, s(x), is not a function of
a single variable x, but of the whole set of x values making up the sample. And in the
case of the population standard deviation, σ(x), σ is really an operator on the probability
distribution of x (see Appendix 1).
variance. The variance of x is thus [s(x)]², which is usually written
s²(x). The situation may look even more confusing if it is wished to
denote the estimated standard deviation of a quantity like x₁ − x₂, i.e.
a measure of the scatter of the values of the quantity x₁ − x₂. Using the
notation given above this number would be written s(x₁ − x₂), but this
is not the same as s(x₁) − s(x₂); s is an operator, not a number. To add
to the difficulties it is quite common for s(x) or f(x) to be abbreviated
to s and f, the argument, x, being understood. So in this case s and f
do stand for numbers; the numbers s(x) and f(x). Brackets rather than
parentheses are sometimes used to make the notation clearer so the
standard deviation of x₁ − x₂ is written s[x₁ − x₂].
Two important operators are those used to denote the formation of
sums and products, viz. Σ and Π (Greek capitals, sigma and pi). For
example, Σx means find the sum of all the values of x, and Πx means
find the product of all the values of x. These operations occur often
and are discussed in more detail below.
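In a program the Σ and Π operations correspond directly to built-in sum and product functions; a minimal sketch (the function names are Python's, not the book's):

```python
import math

x = [2, 3, 5]
print(sum(x))        # Σx: 2 + 3 + 5 = 10
print(math.prod(x))  # Πx: 2 × 3 × 5 = 30
```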
Factorial notation
Another operation that will occur in the following pages is written
n!, which is read as 'factorial n'. When n is an integer this has the
value n(n−1)(n−2) ... 1. For example, 4! = 4×3×2×1 = 24. A
more general definition (the gamma function) is valid also for non-
integers and occurs often in more advanced work than is dealt with
here. In the light of this more general definition, as well as for reasons of
convenience that will be apparent later, 0! (factorial zero) is defined as
having the value 1.
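Both the factorial and its generalization, the gamma function, are available in most standard libraries; a small check (Python's math module is assumed):

```python
import math

print(math.factorial(4))   # 4! = 4 × 3 × 2 × 1 = 24
print(math.factorial(0))   # 0! = 1 by definition
# The gamma function satisfies Γ(n + 1) = n! for integer n,
# and is defined for non-integers as well.
print(math.gamma(5))       # Γ(5) = 4! = 24.0
```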
blood pressure. There are n observations so in general an observation is
yᵢ where i = 1, 2, ..., n. If n = 5, for example, then the five observations
are symbolized y₁, y₂, y₃, y₄, y₅. Note that the subscript i has not neces-
sarily got any particular experimental significance. It is merely a method
for counting, or labelling, the individual observations.
The observations can be laid out in a table thus:

Observation number (i):  1    2    3    ...    n
Observation (yᵢ):        y₁   y₂   y₃   ...    yₙ

The sum of all n observations is written

sum = Σᵢ₌₁ⁿ yᵢ or, more briefly, Σyᵢ.

This expression symbolizes the number

sum = y₁ + y₂ + y₃ + ... + yₙ.

Similarly,

Σᵢ₌₃⁵ yᵢ stands for the number y₃ + y₄ + y₅.

Thus the arithmetic mean of n values of y is

ȳ = Σᵢ₌₁ⁿ yᵢ / n.
Two subscripts, say i and j, are now needed, one for keeping count of
the observations and one for the animals. i takes the values 1, 2, 3, ...,
n; and j takes the values 1, 2, 3, ..., k. The ith observation on the jth
animal is thus represented by the symbol yᵢⱼ. In more general terms,
yᵢⱼ stands for the value of y in the ith row and jth column of a table
(or two-dimensional array, or matrix) such as that shown above.
For example, a table with 3 columns and 4 rows could be written
in this way. The total of the observations in the jth column is

T.ⱼ = Σᵢ₌₁ⁿ yᵢⱼ = y₁ⱼ + y₂ⱼ + ... + yₙⱼ,    (2.1.1)

and the mean of the jth column is

ȳ.ⱼ = Σᵢ₌₁ⁿ yᵢⱼ / n = T.ⱼ/n.    (2.1.2)
Again notice that after summing over the values of i (i.e. adding
up the numbers in a specified column) the answer does not involve i,
but does still involve the specified column number j. The symbol i, the
subscript operated on, is replaced by a dot in the symbols T.ⱼ and ȳ.ⱼ.
In an exactly similar way the total for the ith row, Tᵢ., is written

Tᵢ. = Σⱼ₌₁ᵏ yᵢⱼ = yᵢ₁ + yᵢ₂ + yᵢ₃ + ... + yᵢₖ.    (2.1.3)
For example, for the second row T₂. = y₂₁ + y₂₂ + ... + y₂ₖ (= 11 in
the example). The mean value for the ith row is

ȳᵢ. = Σⱼ₌₁ᵏ yᵢⱼ / k = Tᵢ./k.    (2.1.4)
Using the numbers in the 4×3 table above, the totals and means
are found in the same way. The grand total of all the observations in
the table illustrates the meaning of a double summation sign. The
grand total (G, say†) could be written as the sum of the row totals

G = Σᵢ₌₁ⁿ Tᵢ. = Σᵢ₌₁ⁿ ( Σⱼ₌₁ᵏ yᵢⱼ ).
Equally, the grand total could be written as the sum of the column
totals

G = Σⱼ₌₁ᵏ T.ⱼ
† It would be more consistent with earlier notation to replace both suffixes by dots
and call the grand total T.., but the symbol G is often used instead.
which, inserting the definition of T.ⱼ from (2.1.1), becomes

G = Σⱼ₌₁ᵏ ( Σᵢ₌₁ⁿ yᵢⱼ ).

Since the grand total is the same whichever order the additions are
carried out in, the parentheses are superfluous and the operation is
usually symbolized

G = Σᵢ₌₁ⁿ Σⱼ₌₁ᵏ yᵢⱼ or Σⱼ₌₁ᵏ Σᵢ₌₁ⁿ yᵢⱼ or simply ΣΣy.
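The row totals Tᵢ., the column totals T.ⱼ, and the two ways of forming the grand total G can be checked on a small hypothetical 4 × 3 table (the numbers below are made up, not the book's):

```python
# A made-up 4-row x 3-column table of observations y[i][j]
y = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [0, 1, 2]]
n, k = 4, 3

row_totals = [sum(y[i][j] for j in range(k)) for i in range(n)]  # T_i.
col_totals = [sum(y[i][j] for i in range(n)) for j in range(k)]  # T_.j

# The grand total is the same whichever way the additions are grouped.
print(sum(row_totals), sum(col_totals))  # 48 48
```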
What to do if you get stuck
If it is ever unclear how to manipulate the summation operator
simply write out the sum term by term and apply the ordinary rules of
algebra. For example, if k denotes a constant then

Σᵢ₌₁ⁿ kxᵢ = k Σᵢ₌₁ⁿ xᵢ    (2.1.5)

because the left-hand side, written out in full, is kx₁ + kx₂ + ... + kxₙ
= k(x₁ + x₂ + ... + xₙ), which is the right-hand side. Thus if k is the same
for every x it can be 'taken outside the summation sign'. However
Σkᵢxᵢ, in which each x is multiplied by a different constant, is k₁x₁
+ k₂x₂ + ... + kₙxₙ, which cannot be further simplified.
It follows from what has been said that if the quantities to be added
do not contain the subscript then the summation becomes a simple
multiplication. If all the xᵢ = 1 in (2.1.5) then

Σᵢ₌₁ⁿ k = k + k + ... + k = nk    (2.1.6)

and furthermore, if k = 1,

Σᵢ₌₁ⁿ 1 = n.    (2.1.7)
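These summation rules can be confirmed term by term on any small set of numbers; a sketch:

```python
x = [2, 4, 6]
k = 10
n = len(x)

assert sum(k * xi for xi in x) == k * sum(x)  # Σkx = kΣx, as in (2.1.5)
assert sum(k for _ in x) == n * k             # Σk = nk, as in (2.1.6)
assert sum(1 for _ in x) == n                 # Σ1 = n
print("all three identities hold")
```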
2.2. Probability
The only rigorous definition of probability is a set of axioms defining
its properties, but the following discussion will be limited to the less
rigorous level that is usual among experimenters. For practical pur-
poses the probability of an event is a number between zero (implying
impossibility) and one (implying certainty). Although statisticians
differ in the way they define and interpret probability, there is complete
agreement about the rules of probability described in § 2.4. In most of
this book probability will be interpreted as a proportion or relative
frequency. An excellent discussion of the subject can be found in
Lindley (1965, Chapter 1).
The simplest way of defining probability is as a proportion, viz.
'the ratio of the number of favourable cases to the total number of
equiprobable cases'. This may be thought unsatisfactory because the
concept to be defined is introduced as part of its own definition by the
word 'equiprobable', though a non-numerical ordering of likeliness
more primitive than probability would be sufficient to define 'equally
likely', and hence 'random'. Nevertheless when the reference set of
'the total number of equiprobable cases' is finite this description is
used and accepted in practice. For example if 55 per cent of the popula-
tion of college students were male it would be asserted that the prob-
ability of a single individual chosen from this finite population being
male is 0·55, provided that the probability of being chosen was the
same for all individuals, i.e. provided that the choice was made at
random.
When the reference population is infinite the ratio just discussed
cannot be used. In this case the frequency definition of probability is
more useful. This identifies the probability P of an event as the limiting
value of the relative frequency of the event in a random sequence of
trials when the number of trials becomes very large (tends towards
infinity). For example, if an unbiased coin is tossed ten times it would
not be expected that there would be exactly five heads. If it were tossed
100 times the proportion of heads would be expected to be rather
closer to 0·5 and as the number of tosses was extended indefinitely the
proportion of heads would be expected to converge on exactly 0·5.
This type of definition seems reasonable, and is often invoked in
practice, but again it is by no means satisfactory as a complete,
objective definition. A random sequence cannot be proved to converge
in the mathematical sense (and in fact any outcome of tossing a true
coin a million times is poBBible), but it can be shown to converge in a
statistical sense.
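The tendency of the proportion of heads to settle near 0·5 can be watched in a simulation; the seed and the numbers of tosses below are arbitrary choices, and any one run is only a single random sequence:

```python
import random

random.seed(2)  # arbitrary seed, chosen only to make the run repeatable
for n in (10, 100, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    # the proportion tends to wander less from 0.5 as n grows
    print(n, heads / n)
```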
Degrees of belief
It can be argued persuasively (e.g. Lindley (1965, p. 29)) that it is
valid and sometimes necessary to use a subjective definition of prob-
ability as a numerical measure of one's degree of belief or strength of
conviction in a hypothesis ('personal probability'). This is required in
many applications of Bayes' theorem, which is mentioned in §§ 1.3 and
2.4 (see also § 6.1, para. (7)). However the application of Bayes' theorem
to medical diagnosis (§ 2.4) does not involve subjective probabilities,
but only frequencies.
2.3. Randomization and random sampling
The selection of random samples from the population under study
is the basis of the design of experiments, yet is an extraordinarily
difficult job. Any sort of statistical analysis (and any sort of intuitive
analysis) of observations depends on random selection and allocation
having been properly done. The very fundamental place of randomiza-
tion is particularly obvious in the randomization significance tests
described in Chapters 8-11.
It should never be out of mind that all calculations (and all intuitive
assessments) belong to an entirely imaginary world of perfect random
selection, unbiased measurement, and often many other ideal properties
(see § 11.2). The assumption that the real world resembles this imagin-
ary one is an extrapolation outside the scope of statistics or mathe-
matics. As mentioned in Chapter 1 it is safer to assume that samples
have some unknown bias.
For example, an anti-diabetic drug should ideally be tested on a
random sample of all diabetics in the world, or perhaps of all dia-
betics in the world with a specified form and severity of the disease,
or all diabetics in countries where the drug is available. In fact, what is
likely to be available are the diabetic patients of one, or a few,
hospitals in one country. Selection should be done strictly at random
(see below) from this restricted population, but extension of inferences
from this population to a larger one is bound to be biased to an un-
known extent.
It is, however, quite easy, having obtained a sample, to divide it
strictly randomly (see below) into several groups (e.g. groups to receive
new drug, old drug, and control dummy drug). This is, nevertheless,
very often not done properly. The hospital numbers of the patients will
not do, and neither will their order of appearance at a clinic. It is very
important to realize that 'random' is not the same thing as 'haphazard'.
If two treatments are to be compared on a group of patients it is not
good enough for the experimenter, or even a neutral person, to allocate
a patient haphazardly to a treatment group. It has been shown re-
peatedly that any method involving human decisions is non-random.
For all practical purposes the following interpretation of randomness,
given by R. A. Fisher (1951, p. 11), should be taken as a fundamental
principle of experimentation: ' . . . not determined arbitrarily by
human choice, but by the actual manipulation of the physical apparatus
used in games of chance, cards, dice, roulettes, etc., or, more ex-
peditiously, from a published collection of random sampling numbers
purporting to give the actual results of such manipulation.'
Published random sampling numbers are, in practice, the only
reliable method. Samples selected in this way (see below) will be re-
ferred to as selected strictly at random. Superb discussions of the crucial
importance of, and the pitfalls involved in random sampling have been
given by Fisher (1951, especially Chapters 2 and 3) and by Mainland
(1963, especially Chapters 1-7). Every experimenter should have read
these. They cannot be improved upon here.
is the largest size of random permutation available). Suppose that 15
subjects are to be divided into groups of size n₁, n₂, and n₃. First number
the subjects 0 to 14 in any convenient way. Then obtain a random
permutation of 15 by taking the first random permutation of 20
from the tables and deleting the numbers 15 to 19. (This permutation in
the table should then be crossed out so that it is not used again; use
each once only.) Then allocate the first n₁ of the subjects to the first
group, the next n₂ to the second group, and the remainder to the third
group. For example, if the random permutation of 15 turned out to be
1, 6, 8, 5, 10, 12, 11, 9, 2, 0, 3, 14, 7, 4, 13 (the first permutation from
Fisher and Yates (1963, p. 142)) and the 15 subjects were to be divided
randomly into groups of 5, 4, and 6 subjects then subjects 1, 6, 8, 5, and
10 would go in the first group, 12, 11, 9, and 2 in the second group, and
the rest in the third group.
For larger numbers of subjects the tables of random digits must be
used. For example, to divide 24 subjects into 4 groups of 6 the procedure
is as follows. First number the subjects in any convenient way with the
numbers 00 to 23. Take the digits in the table in groups of two. The
table then gives the integers from 00 to 99 in random order. One
procedure would be to delete all numbers from 24 to 99, but it is more
economical to delete only 96, 97, 98, and 99 (i.e. those equal to or larger
than 96, which is the biggest multiple of 24 that is not larger than 100).
Now the remaining numbers are a random sequence of the integers
from 00 to 95. From each number between 24 and 47 subtract 24; from
each number between 48 and 71 subtract 48; and from each number
between 72 and 95 subtract 72 (or, in other words, divide every number
in the sequence by 24 and write down the remainder). For example, if
the number in the table is 94 then write down 22; or in place of 55
write down 07. (The numbers from 96 to 99 must, of course, be omitted
because their presence would give the numbers 00 to 03 a larger chance
than the others of occurring.) Some numbers may appear several times
but repetitions are ignored. If the final sequence were 21, 04, 07, 13, 02,
02, 04, 09, 00, 23, 14, 13, 11, etc., then subjects 21, 04, 07, 13, 02, 09 are
allocated to the first group, subjects 00, 23, 14, 11, etc. are allocated to
the second group, and so on.
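The same remainder method can be mimicked with a pseudo-random number generator standing in for the printed table of random digits; the function name and seed below are invented for illustration:

```python
import random

random.seed(7)  # stands in for choosing a starting point in the printed table

def allocate(n_subjects=24, n_groups=4):
    group_size = n_subjects // n_groups
    limit = (100 // n_subjects) * n_subjects  # 96; discard 96-99
    order, seen = [], set()
    while len(order) < n_subjects:
        pair = random.randrange(100)   # a pair of random digits, 00-99
        if pair >= limit:
            continue                   # omit 96-99 so remainders stay equiprobable
        subject = pair % n_subjects    # 'write down the remainder'
        if subject not in seen:        # repetitions are ignored
            seen.add(subject)
            order.append(subject)
    return [order[g * group_size:(g + 1) * group_size] for g in range(n_groups)]

groups = allocate()
print(groups)  # four groups of six, every subject appearing exactly once
```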
The method is simpler for the random block experiments described
in §§ 11.6 and 11.7. Blocks are never likely to contain more than 20
treatments so the order in which the treatments occur in each block is
taken from a random permutation found from the tables of random
permutations as above. For example, if there are four treatments in
each block, number them 0 to 3, and for each block obtain a random
permutation of the numbers 0 to 3, by deleting 4 to 9 from the tabulated
random permutations of 10, crossing out each permutation from the
table as it is used.
The selection of a Latin square at random is more complicated
(see § 11.8).
The simple addition rule holds because the events considered are
mutually exclusive. In the last case only they are also exhaustive. An
example of the use of the full equation (2.4.1) is given below.
or, equivalently,
P[London student] = P[London student | smoker].
where k is a proportionality constant. If the set of hypotheses con-
sidered is exhaustive (one of them must be true), and the hypotheses are
mutually exclusive (not more than one can be true), the addition
rule states that the probability of (hypothesis 1) or (hypothesis 2) or
. . . (which must be equal to one, because one or another of the
hypotheses is true) is given by the total of the individual probabilities.
This allows the proportionality constant in (2.4.7) to be found. Thus

Σ P[Bⱼ|Y] = k Σ (P[Y|Bⱼ].P[Bⱼ]) = 1, the sums being over all j, and therefore

k = 1 / Σ (P[Y|Bⱼ].P[Bⱼ]).    (2.4.8)
to have given some good results (see, for example, Bailey (1967,
Chapter 11)).
A numerical example
The simplest (to the point of naivety) example of the above argu-
ment is the case when only one disease and one symptom is considered.
The example is modified from Wallis and Roberts (1956).
Suppose that a diagnostic test for cancer has a probability of 0·96
of being positive when the patient does have cancer. If S stands for the
event that the test is positive and S̄ for the event that it is negative
(the data), and if D stands for the event that the patient has cancer,
and D̄ for the event that he has not (the two hypotheses) then in
symbols P[S|D] = 0·96 (the likelihood of D if S observed). Because
the test is either positive or not, a slight extension of (2.4.3) gives
P[S̄|D] = 1 − P[S|D] = 0·04 (the likelihood of D if S̄ is observed). The
proportion of patients with cancer giving a negative test (false nega-
tives) is 4 per cent. Suppose also that 95 per cent of patients without
cancer give a negative test, P[S̄|D̄] = 0·95. Similarly P[S|D̄] = 1 − 0·95
= 0·05, i.e. 5 per cent of patients without cancer give a positive test
(false positives). As diagnostic tests go, these proportions of false
results are not outrageous. But now consider what happens if the test
is applied to a population of patients of whom 1 in 200 (0·5 per cent)
suffer from cancer, i.e. P[D] = 0·005 (the prior probability of D) and
P[D̄] = 1 − 0·005 = 0·995 (from (2.4.3) again). What is the probability
that a patient reacting positively to the test actually has cancer?
In symbols this is P[D|S], the posterior probability of D after observing
S, and from (2.4.7) or (2.4.9), and (2.4.8) it is, using the probabilities
assumed above,

P[D|S] = P[S|D].P[D] / (P[S|D].P[D] + P[S|D̄].P[D̄])
       = (0·96 × 0·005) / ((0·96 × 0·005) + (0·05 × 0·995))
       = 0·0048 / (0·0048 + 0·04975)
       = 0·0048 / 0·05455
       = 0·0880.    (2.4.11)
In other words, only 8·80 per cent of positive reactors actually have
cancer, and 100 − 8·80 = 91·2 per cent do not have cancer. Not such
a good performance. It remains true that 96 per cent of those with
cancer are detected by the test, but a great many more without cancer
also give positive tests.
It is easy to see how this arises without the formality of Bayes'
theorem. Suppose that 100 000 patients are tested. On average 500
(= 100 000 × 0·005) will have cancer and 99 500 will not have cancer.
Of the 500 with cancer, 500 × 0·96 = 480 will give positive reactions on
average. Of the 99 500 without cancer, 99 500 × 0·05 = 4975 will give
positive reactions on average (a much smaller proportion, but a much
larger number than for the patients with cancer). Of the total number
of positive reactors, 480 + 4975 = 5455, the number with cancer is
480 and the proportion with cancer is 480/5455 = 0·0880 as above.
If these numbers are divided by the total number of patients, 100 000,
they are seen to coincide with the probabilities calculated by Bayes'
theorem in (2.4.11).
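The arithmetic of (2.4.11) is easy to reproduce; a direct transcription (the variable names are mine, not the book's):

```python
p_pos_given_cancer = 0.96       # P[S|D]
p_pos_given_no_cancer = 0.05    # P[S|D-bar]
p_cancer = 0.005                # P[D], the prior probability

numerator = p_pos_given_cancer * p_cancer                         # 0.0048
denominator = numerator + p_pos_given_no_cancer * (1 - p_cancer)  # 0.05455
posterior = numerator / denominator                               # P[D|S]
print(round(posterior, 4))  # 0.088
```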
2.5. Averages
If a number of replicate observations is made of a variable quantity
it is commonly found that the observations tend to cluster round some
central value. Some sort of average of the observations is taken as an
estimate of the true or population value (see § 1.2) of the quantity that
is being measured. Some of the possible sorts of average will be defined
now. It can be seen that there is no logical reason for the automatic
use of the ordinary unweighted arithmetic mean, (2.5.2). If the distri-
bution of the observations is not symmetrical it may be quite inappro-
priate, and nonparametric methods usually use the median (see §§
4.5, 6.2, and 7.3 and Chapters 9, 10, and 14).
the values of x in the population from which the sample was taken and
will be denoted μ (see § A1.1 for a more rigorous definition).
The weight of an observation. The weight, wᵢ, associated with the ith observa-
tion, xᵢ, is a measure of the relative importance of the observation in the final
result. Usually the weight is taken as the reciprocal of the variance (see § 2.6 and
(2.7.12)), so the observations with the smallest scatter are given the greatest
weight. If the observations are uncorrelated, this procedure gives the best
estimate of the population mean, i.e. an unbiased estimate with minimum variance
(maximum precision). (See §§ 12.1 and 12.7, Appendix 1, and Brownlee (1965,
p. 95).) From (2.7.8) it is seen that halving the variance is equivalent to doubling
the number of observations. Both double the weight. See § 13.4 for an example.
Weights may also be arbitrarily decided degrees of relative importance. For
example, if it is decided in examination marking that the mark for an essay
paper should have twice the importance of the marks for the practical and oral
examinations, a weighted average mark could be found by assigning the essay
mark (say 70 per cent) a weight of 2 and the practical and oral marks (say
30 and 40 per cent) a weight of one each. Thus

x̄ = ((2×70) + (1×30) + (1×40)) / (2+1+1) = 52·5.

If the weights had been chosen as 1, 0·5, and 0·5 the result would have been
exactly the same.
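The examination-mark example can be checked directly, including the point that only the relative sizes of the weights matter; a sketch:

```python
marks = [70, 30, 40]

def weighted_mean(marks, weights):
    # Σwx / Σw: each mark contributes in proportion to its weight
    return sum(w * x for w, x in zip(weights, marks)) / sum(weights)

print(weighted_mean(marks, [2, 1, 1]))      # 52.5
print(weighted_mean(marks, [1, 0.5, 0.5]))  # 52.5: same relative weights
```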
The geometric mean of the N observations is

(Πᵢ₌₁ᴺ xᵢ)^(1/N).    (2.5.3)
It will now be shown that this is the sort of mean found when the
arithmetic mean of the logarithms of the observations is calculated
(as, for example, in § 12.6), and the antilog of the result found.
Call the original observations zᵢ, and their logarithms xᵢ, so

xᵢ = log zᵢ.

Arithmetic mean of log z = x̄ = Σxᵢ/N = Σ(log zᵢ)/N = log{(Πzᵢ)^(1/N)}.
The median
The population (or true) median is the value of the variable such that
half the values in the population fall below it and half above it (i.e.
it is the value bisecting the area under the distribution curve, see
Chapters 3 and 4). It is not necessarily the same as the population mean
(see § 4.5). The population median is estimated by the
The mode
The sample mode is the most frequently observed value of a variable.
The population mode is the value corresponding to the peak of the
population distribution curve (Chapters 3 and 4). It may be different
from both the mean and the median (see § 4.5).
The least squares estimate
This section anticipates the discussion of estimation in Chapter 12 and
§ A1.3. The arithmetic mean of a sample is said to be a least squares estimate
(see Chapter 12) because it is the value that best represents the sample in the
sense that the sum of the squares of the deviations of the observations from the
arithmetic mean, Σ(xᵢ − x̄)², is smaller than the sum of the squares of the deviations
from any other value. This can be shown without using calculus as follows.
Suppose, as above, that the sample consists of N observations, x₁, x₂, ..., x_N. It
is required to find a value of m that makes Σ(xᵢ − m)² as small as possible. This
follows immediately from the algebraic identity

Σ(xᵢ − m)² = Σ(xᵢ − x̄)² + N(x̄ − m)².    (2.5.6)

Whatever the value of m, this is clearly smallest when m is the arithmetic mean,
x̄, because this makes the second term, N(x̄ − m)², zero, and it cannot be made
smaller than zero. Using the example following (2.5.2), the minimum sum of
squares is

Σ(xᵢ − x̄)² = (70 − 46·7)² + (30 − 46·7)² + (40 − 46·7)² = 866·7,

and a few trials will show that inserting any value other than 46·7 makes the sum
of squares larger than 866·7.
The intermediate steps in establishing (2.5.6) are easy. By definition of the
arithmetic mean Nx̄ = Σxᵢ, so the right-hand side of (2.5.6) can be written,
expanding the squares and using (2.1.6), as

Σ(xᵢ − x̄)² + N(x̄ − m)² = Σxᵢ² − 2x̄Σxᵢ + Nx̄² + Nx̄² − 2Nx̄m + Nm²
                        = Σxᵢ² − 2mΣxᵢ + Nm²
                        = Σ(xᵢ − m)²,

as stated in (2.5.6).
Using calculus the same result can be reached more elegantly. The usual way of
finding a minimum in calculus is to differentiate and equate the result to zero
(see Thompson (1965, p. 78)). This process is described in detail, and illustrated,
in Chapter 12. In this case Σ(xᵢ − m)² is to be minimized with respect to m.
Differentiating Σ(xᵢ − m)² = Σxᵢ² − 2mΣxᵢ + Nm² (the xᵢ are
constants for a given sample) and equating the result to zero gives

−2Σxᵢ + 2Nm = 0,    (2.5.7)

and m = Σxᵢ/N = x̄

as found above.
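The identity (2.5.6), and the fact that the minimum is at the arithmetic mean, can be checked numerically on the 70, 30, 40 example; a sketch:

```python
x = [70, 30, 40]
N = len(x)
xbar = sum(x) / N  # 46.7 (more exactly 46.667)

def ss(m):
    """Sum of squared deviations of the observations from m."""
    return sum((xi - m) ** 2 for xi in x)

print(round(ss(xbar), 1))  # 866.7
for m in (40, 46.7, 50):
    # the identity (2.5.6): sum of squares about m exceeds the minimum
    # by exactly N*(xbar - m)**2
    assert abs(ss(m) - (ss(xbar) + N * (xbar - m) ** 2)) < 1e-9
    assert ss(m) >= ss(xbar)  # no m does better than the mean
print("minimum at the mean confirmed")
```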
2.6. Measures of the variability of observations
When replicate observations are made of a variable quantity the
scatter of the observations, or the extent to which they differ from
each other, may be large or may be small. It is useful to have some
quantitative measure of this scatter. See Chapter 7 for a discussion
of the way this should be done. As there are many sorts of average
(or 'measures of location'), so there are many measures of scatter.
Again separate symbols will be used to distinguish the estimates of
quantities calculated from (more or less small) samples of observations
from the true values of these quantities, which could only be found if
the whole population of possible observations were available.
The range
The difference between the largest and smallest observations is the
simplest measure of scatter but it will not be used in this book.
If, however, the deviations are all taken as positive their sum (or mean)
is a measure of scatter.
variance will be denoted var(x) or σ²(x). The estimates are calculated as

s(x) = √[Σ(xᵢ − x̄)² / (N − 1)],    (2.6.2)

σ(x) = √[Σ(xᵢ − μ)² / N].    (2.6.3)
† If the 'obvious' quantity, N, were used as the denominator in (2.6.2) the estimate
of σ² would be biased even if the observations themselves were perfectly free of bias
(systematic errors). This sort of bias results only from the way the observations are
treated (another example occurs in § 12.8). Notice also that this implies that the mean
of a very large number of values of Σ(x − x̄)²/N would tend towards too small a value,
viz. var(x) × (N − 1)/N, as the number of values, each calculated from a small sample,
increases; whereas the same formula applied to a single very large sample would tend
towards var(x) itself as the size of the sample (N) increases. These results are proved in
§ A1.3. It should be mentioned that unbiasedness is not the only criterion of a good
statistic and other criteria give different divisors, for example N or N + 1.
xᵢ        xᵢ − x̄      (xᵢ − x̄)²
5         +2           4
1         −2           4
2         −1           1
4         +1           1
Totals 12  0           10

Thus x̄ = 12/4 = 3, Σ(xᵢ − x̄)² = 10 and, from (2.6.2), s(x) = √(10/3) = 1·83.
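The same calculation takes a few lines; a sketch of (2.6.2):

```python
x = [5, 1, 2, 4]
N = len(x)
xbar = sum(x) / N                       # 12/4 = 3.0
ss = sum((xi - xbar) ** 2 for xi in x)  # Σ(x - x̄)² = 10.0
s = (ss / (N - 1)) ** 0.5               # s(x) = √(10/3)
print(xbar, ss, round(s, 2))  # 3.0 10.0 1.83
```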
The coefficient of variation
This is simply the standard deviation expressed as a proportion (or
percentage) of the mean (as long as the mean is not zero of course):

C(x) = s(x)/x̄ (sample value),

𝒞(x) = σ(x)/μ = √[var(x)]/μ (population value).    (2.6.4)
Now, since Σxᵢ = Nx̄, this becomes Σxᵢ² − Nx̄², and thus

Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/N.    (2.6.5)
The covariance
This quantity is a measure of the extent to which two variables are
correlated. Uncorrelated events are defined as those that have zero
covariance, and statistically independent events are necessarily un-
correlated (though uncorrelated events may not be independent; see
§ 12.9).
The true, or population, covariance of x with y will be denoted
Cov(x, y), and the estimate of this quantity from a sample of observa-
tions is

cov(x, y) = Σ(x − x̄)(y − ȳ) / (N − 1).    (2.6.6)
The numerator is called the sum of products. That the value of this
expression will depend on the extent to which y increases with x is
clear from Fig. 2.6.1 in which, for example, y might represent body
weight and x calorie intake. Each point represents one pair of observa-
tions.
If the graph is divided into quadrants drawn through the point
(x̄, ȳ) it can be seen that any point in the top right or in the bottom left
quadrant will contribute a positive term (x − x̄)(y − ȳ) to the sum of
products, whereas any point in the other two quadrants will contribute
a negative value of (x − x̄)(y − ȳ). Therefore the points shown in Fig.
2.6.2(a) would have a large positive covariance, the points in
Fig. 2.6.2(b) would have a large negative covariance, and those in
Fig. 2.6.2(c) would have near zero covariance.
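The sign behaviour just described can be seen by computing (2.6.6) for points that rise together and for points that move in opposite directions (the numbers are made up):

```python
def cov(x, y):
    """Sample covariance, (2.6.6): sum of products over N - 1."""
    N = len(x)
    xbar, ybar = sum(x) / N, sum(y) / N
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (N - 1)

x = [1, 2, 3, 4, 5]
print(cov(x, [2, 4, 6, 8, 10]))  # y rises with x: positive (5.0)
print(cov(x, [10, 8, 6, 4, 2]))  # y falls as x rises: negative (-5.0)
```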
The working formula for the sum of products
A more convenient expression for the sum of products can be found
in a way exactly analogous to that used for the sum of squares (2.6.5).
It is

Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/N.
[Fig. 2.6.1. A scatter diagram divided into quadrants through the point (x̄, ȳ).
In the top right quadrant (x − x̄) and (y − ȳ) are both positive; in the bottom
left quadrant both are negative; in the other two quadrants the two deviations
have opposite signs, so (x − x̄)(y − ȳ) is negative.]
standard deviation of x, or 'sample standard deviation of the observa-
tions'. This term is unnecessary and sample standard deviation of the
mean is a preferable name for s(x̄).
The sample standard deviation of the observations, s(x), is of
interest if one wishes to estimate the scatter of the individual
observations themselves.
FIG. 2.7.1. (a) Distribution of observations (x values) in the population.
The area under the curve between two x values is the probability of an
observation falling between those values, so the total area is 1·0
(see Chapter 4 for details). The results of this chapter are valid for any
distribution (though the standard deviation has a simple interpretation
only for some distributions). The mean value of x is 4·0 and the standard
deviation, σ(x), is 1·0. (b) Distribution
of x̄ values. x̄ is the mean of a sample of four x values from the population repre-
sented in (a). The area under this curve must be 1·0, like the distribution in (a).
To keep the area the same, the distribution must be taller, because it is narrower
(i.e. the x̄ values have less scatter than the x values). The ordinate and abscissa
are drawn on the same scale in (a) and (b). The mean value of x̄ is 4·0 and its
standard deviation, σ(x̄) (the 'standard error of the mean'), is 0·5.
on the average, because it is an estimate of σ(x̄) (the population
'standard error'), which is smaller than σ(x).
The standard deviation of the mean may sometimes be measured, for
special purposes, by making measurements of the sample mean (see
Chapter 11, for example). It is the purpose of this section to show that
the result of making repeated observations of the sample mean (or of
any other function calculated from the sample of observations), for
the purpose of measuring their scatter, can be predicted indirectly.
If the scatter of the means of four observations were required it could
be found by observing many such means and calculating the variance
of the resulting figures from (2.6.2), but alternatively it could be
predicted using (2.7.8) (below) even if there were only four observations
altogether, giving only a single mean. An illustration follows.
TABLE 2.7.1
Three random samples, each with N = 4 observations, from a population with mean μ = 4·00 and standard deviation σ(x) = 1·00

 x values:
   Sample 1    Sample 2    Sample 3
     3·99        5·88        3·79
     3·88        2·45        3·25
     3·89        2·21        4·07
     6·36        5·96        3·21
That is dealt with in Chapter 11.) The observations were all selected randomly from a population known (because it was synthetic, not experimentally observed) to have mean μ = 4·00 and standard deviation σ(x) = 1·00, as shown in Fig. 2.7.1(a).
The sample means, x̄, of the three samples, 4·40, 4·12, and 3·58, are all estimates of μ = 4·00. The grand sample mean, 4·04, is also an estimate of μ = 4·00 (see Appendix 1). The standard deviations, s(x), of each of the three samples, 1·33, 2·08, and 0·420, are all estimates of the population standard deviation, σ(x) = 1·00 (a better estimate could be found by averaging, or pooling, these three estimates as described in § 11.4). The population standard deviation of the mean can be estimated directly by calculating the standard deviation of the sample of three means (4·40, 4·12, 3·58) using (2.6.2). This gives

    s(x̄) = √{[(4·40−4·04)² + (4·12−4·04)² + (3·58−4·04)²]/(3−1)} = 0·417.
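The two routes to the standard error of the mean can be checked with a few lines of stdlib Python (a sketch, not from the book; the values are those of Table 2.7.1 as printed above):

```python
import math
import statistics

# Direct estimate: standard deviation of the three observed sample means,
# calculated with eqn (2.6.2)
means = [4.40, 4.12, 3.58]
s_direct = statistics.stdev(means)

# Indirect estimate from a single sample of N = 4 observations,
# using eqn (2.7.9): s(mean) = s(x)/sqrt(N)
sample2 = [5.88, 2.45, 2.21, 5.96]
s_indirect = statistics.stdev(sample2) / math.sqrt(len(sample2))

print(round(s_direct, 3))    # 0.417
print(round(s_indirect, 3))  # 1.038
```

The two estimates need not agree closely: each is itself subject to sampling error, and with only three means the direct estimate is very rough.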
A reference list
The problem dealt with throughout this section has been that of predicting what the scatter of the values of various functions of the observations (such as the mean) would be if repeated samples were taken and the value of the function calculated from each. The aim is to predict this, given a single sample containing any number of observations that happen to be available (not fewer than two, of course).
The relationships listed below will be referred to frequently later on. The derivations should really be carried out using the definition of the population variance (§ A1.2 and Brownlee (1965, p. 57)), but it will, for now, be sufficient to use the sample variance (2.6.2). The results, however, are given properly in terms of population variances. The notation was defined in §§ 2.1 and 2.6.
Variance of the sum or difference of two variables. Given the variance of the values of a variable x, and of another variable y, what is the predicted variance of the figures found by adding an x value to a y value? From (2.6.2),

    var(x+y) = Σ_{j=1}^{N} [(x_j+y_j) − (x̄+ȳ)]²/(N−1),

which, on multiplying out the square, leads (when x and y are independent) to

    var(x±y) = var(x) + var(y).  (2.7.3)
The sum of N variables has variance

    var(Σx_i) = Σ var(x_i) = N var(x),  (2.7.4)

the second form being appropriate (cf. (2.1.6)) if all the x_i have the same variance, var(x).
The variance of the mean, x̄ = Σx_i/N, of N observations then follows:

    var(x̄) = var(Σx_i/N) = (1/N²) var(Σx_i)   (from (2.7.5))
           = N var(x)/N²   (from (2.7.4)),

and therefore the variance of the mean is

    var(x̄) = var(x)/N   (2.7.8)

and the standard deviation of the mean (the standard error, see discussion above) is

    σ(x̄) = √[var(x̄)] = σ(x)/√N.   (2.7.9)
Notice that var(x), being an average (like x̄), will be more or less the same whatever size sample is used to estimate it (though a larger sample will give a more precise value), whereas var(x̄) becomes smaller as the number of observations averaged increases, as expected from the discussion at the beginning of this section and from (2.7.8).
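Eqn (2.7.3) is easy to verify by simulation. The following sketch (plain Python; the seed and the two standard deviations are arbitrary choices) checks that the variance of both the sum and the difference of independent variables is close to var(x)+var(y):

```python
import random
import statistics

random.seed(42)
N = 200_000
x = [random.gauss(0, 2) for _ in range(N)]   # var(x) = 4
y = [random.gauss(0, 3) for _ in range(N)]   # var(y) = 9

v_sum = statistics.variance(a + b for a, b in zip(x, y))
v_diff = statistics.variance(a - b for a, b in zip(x, y))

# Both should come out close to var(x) + var(y) = 13, eqn (2.7.3)
print(round(v_sum, 1), round(v_diff, 1))
```

Note that the variances add for the difference as well as for the sum; subtracting an uncertain background never reduces the uncertainty (cf. § 3.7).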
The variance of the weighted arithmetic mean. The variance of the weighted mean, defined in (2.5.1), follows from (2.7.5) and (2.7.10):

    var(x̄_w) = var(Σw_i x_i/Σw_i) = Σ[w_i² var(x_i)]/(Σw_i)².

Now if w_i = 1/var(x_i), as discussed in § 2.5, then Σ[w_i² var(x_i)] = Σw_i, so

    var(x̄_w) = 1/Σw_i.
The approximate variance of a function of several variables. If f(x₁, x₂, ..., xₙ) is a function of n variables then, approximately,

    var(f) ≃ Σ (∂f/∂x_i)² var(x_i),  (2.7.13)

if the variances are reasonably small relative to the means, so that the function can be represented approximately by a straight line in the range over which each x varies. The derivatives should be evaluated at the true mean values of the x variables. If the x variables are correlated then terms involving their covariances must be added as shown below. For discussion and the derivation see, for example, Lindley (1965, p. 184), Brownlee (1965, p. 144), and Kendall and Stuart (1963, p. 281).
If f is a linear function then (2.7.13) reduces to (2.7.10), which is exact. If f is not linear then the result is only approximate and, furthermore, f will not have the same distribution of errors as the x variables; so if, for example, the x values were normally distributed (see § 4.2), f would not be normally distributed, and its variance, even if it were exact, could not be interpreted in any simple way.
The variance of log_e x. If the true mean value of x is μ, then using the version of (2.7.13) for a single variable gives

    var(log_e x) ≃ (d log_e x/dx)² var(x) = var(x)/μ² = C²(x).  (2.7.14)

Therefore the standard deviation of log_e x is approximately equal to the coefficient of variation of x, C(x), defined by (2.6.4). If the standard deviation of x increases in proportion to the true mean value of x, so that the coefficient of variation of x is constant, the standard deviation of log_e x will be approximately constant (cf. §§ 11.2 and 12.2).
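The practical content of (2.7.14), that the scatter of log_e x is roughly the coefficient of variation of x, can be checked numerically (a sketch; the mean, coefficient of variation, and seed are arbitrary choices):

```python
import math
import random
import statistics

random.seed(1)
mu, cv = 50.0, 0.1     # population mean and coefficient of variation of x
xs = [random.gauss(mu, mu * cv) for _ in range(100_000)]

# Standard deviation of log_e(x) should be close to cv, eqn (2.7.14)
sd_log = statistics.stdev(math.log(v) for v in xs)
print(round(sd_log, 3))   # close to cv = 0.1
```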
The variance of the product of two variables, x₁x₂. In this case an exact result can be derived for the variance of values of x₁x₂, given the variances of x₁ and of x₂. Suppose that x₁ and x₂ are independent of each other, and have population means μ₁ and μ₂ respectively. Then†

    var(x₁x₂) = var(x₁)var(x₂) + μ₂² var(x₁) + μ₁² var(x₂).

If this result is divided through by (μ₁μ₂)², it can be expressed in terms of coefficients of variation, defined in (2.6.4), as

    C²(x₁x₂) = C²(x₁)C²(x₂) + C²(x₁) + C²(x₂).  (2.7.15)

It is interesting to compare this with the result of applying the approximate formula, (2.7.13), viz.

    var(x₁x₂) ≃ [∂(x₁x₂)/∂x₁]² var(x₁) + [∂(x₁x₂)/∂x₂]² var(x₂)
             = μ₂² var(x₁) + μ₁² var(x₂);

or, again dividing through by (μ₁μ₂)² to get the result in terms of coefficients of variation,

    C²(x₁x₂) ≃ C²(x₁) + C²(x₂).
By comparison with (2.7.15) it appears that use of the approximate formula involves neglecting the term C²(x₁)C²(x₂). The approximation involved can be illustrated by two numerical examples.
First, suppose that both x₁ and x₂ have coefficients of variation of 50 per cent, i.e. C(x₁) = 0·5, C(x₂) = 0·5. In this case (2.7.15) gives

    C(x₁x₂) = √[(0·5²×0·5²)+0·5²+0·5²] = √(0·0625+0·25+0·25) = 0·750,

i.e. 75·0 per cent. The approximate form gives

    C(x₁x₂) ≃ √(0·5²+0·5²) = 0·707,

i.e. 70·7 per cent. Secondly, consider more accurate observations, say 100 C(x₁) = 5 per cent and 100 C(x₂) = 5 per cent. Similar calculations show that (2.7.15) gives 100 C(x₁x₂) = 7·075 per cent, whereas the approximate form gives 100 C(x₁x₂) ≃ 7·071 per cent. The more accurate the observations, the better the approximate version will be.
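Both numerical examples follow directly from (2.7.15) and its approximation; a minimal sketch:

```python
import math

def cv_product_exact(c1, c2):
    # eqn (2.7.15): C²(x1·x2) = C²(x1)C²(x2) + C²(x1) + C²(x2)
    return math.sqrt(c1**2 * c2**2 + c1**2 + c2**2)

def cv_product_approx(c1, c2):
    # approximate form: neglects the cross term C²(x1)C²(x2)
    return math.sqrt(c1**2 + c2**2)

print(round(cv_product_exact(0.5, 0.5), 3))           # 0.75
print(round(cv_product_approx(0.5, 0.5), 3))          # 0.707
print(round(100 * cv_product_exact(0.05, 0.05), 3))   # 7.075
print(round(100 * cv_product_approx(0.05, 0.05), 3))  # 7.071
```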
The variance of the ratio of two variables, x₁/x₂. Using (2.7.13) gives, in terms of coefficients of variation, the approximate result

    C²(x₁/x₂) ≃ C²(x₁) + C²(x₂).  (2.7.16)

An exact treatment for the ratio of two normally distributed variables is given in § 13.6 and exemplified in §§ 13.11-13.15.
The variance of a function of two correlated variables. If f is a function of x₁ and x₂, and the variables are correlated, then, approximately,

    var(f) ≃ (∂f/∂x₁)² var(x₁) + (∂f/∂x₂)² var(x₂) + 2(∂f/∂x₁)(∂f/∂x₂) cov(x₁, x₂).  (2.7.17)

This relationship is referred to in § 13.5. For a linear function this reduces to the two-variable case of (2.7.11). The n-variable extension of (2.7.17) involves all possible pairs of x variables in the same way as (2.7.11).
Sum of a variable number of random variables. Let S denote the sum of a randomly variable number, m, of random variables,

    S = Σ_{i=1}^{m} x_i.
If the x values in the sum are a random sample (of variable size) from the population of x values, then it is shown in § A1.4 that the square of the coefficient of variation of S is

    C²(S) = C²(x)/μ_m + C²(m),  (2.7.19)

where μ_m is the population mean value of m (the size of the sample). This result is illustrated on p. 58.
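A simulation sketch of (2.7.19) (not from the book; m is taken binomial and the x values Gaussian, with arbitrary parameters and seed):

```python
import math
import random
import statistics

random.seed(7)
N_TRIALS, P_REL = 20, 0.3        # arbitrary binomial parameters for m
MU_X, SD_X = 10.0, 2.0           # arbitrary mean and sd of the x values

totals = []
for _ in range(50_000):
    m = sum(random.random() < P_REL for _ in range(N_TRIALS))
    totals.append(sum(random.gauss(MU_X, SD_X) for _ in range(m)))

cv_sim = statistics.stdev(totals) / statistics.mean(totals)

# Prediction from (2.7.19): C²(S) = C²(x)/mu_m + C²(m)
mu_m = N_TRIALS * P_REL                      # = 6.0
var_m = N_TRIALS * P_REL * (1 - P_REL)       # binomial variance = 4.2
cv_pred = math.sqrt((SD_X / MU_X) ** 2 / mu_m + var_m / mu_m ** 2)
print(round(cv_sim, 2), round(cv_pred, 2))
```

Most of the scatter of S here comes from the variability of m, not from the scatter of the individual x values; this is the point exploited in the quantal-release example of § 3.6.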
3. Theoretical distributions: binomial
and Poisson
possible to discover something approaching the true proportion of samples containing r drinkers, r being any specified number between 0 and 25. These figures are called the probability distribution of the proportion of whisky drinkers in a sample of 25, and since this proportion is a discontinuous variable (the number of drinkers per sample must be a whole number) the distribution is described as discontinuous. The distribution is usually plotted as a block histogram as shown in Fig. 3.4.1 (p. 52), the block representing, say, 6 drinkers extending from 5·5 to 6·5 along the abscissa.
The second example, concerning the estimation of the true concentration of a drug solution, leads to the idea of a continuous probability distribution. If many estimates were made of the same concentration it would be expected that the estimates would not be identical. By analogy with the discontinuous case just discussed it should be possible, if a large enough number of estimates were made, to find the proportion of estimates having any given value. However, since the concentration is a continuous variable the problem is more difficult because the proportion of estimates having exactly any given value (e.g. exactly 12 μg/ml, that is 12·00000000... μg/ml) will obviously in principle be indefinitely small (in fact experimental difficulties will mean that the answer can only be given to, say, three significant figures so that in practice the concentration estimate will be a discontinuous variable). The way in which this difficulty is overcome is discussed in § 4.1.
test a new drug because it does not include a control or placebo group. Suitable experimental designs are discussed in Chapters 8-11.
Suppose that n trials are made of a new drug. In this case 'one trial of an event' is one administration of the drug to a patient. After each trial it is recorded whether the patient's condition is apparently better (outcome B) or apparently worse (outcome W). It is assumed for the moment that the method of measurement is sensitive enough to rule out the possibility of no change being observed.
The derivation of the binomial distribution specifies that the probability of obtaining a success shall be the same at every trial. What exactly does this mean? If the n trials were all conducted on the same patient this would imply that the patient's reaction to the drug must not change with time, and the condition of independence of trials implies that the result of a trial must not be affected by the results of previous trials. The result would be an estimate of the probability of the drug producing an improvement in the single patient tested. Under these conditions the proportion of successes in repeated sets of n trials should follow the binomial distribution.
At first sight it might be thought, because it is doubtless true that the probability of a success, outcome B, will differ from patient to patient, that if the n trials were conducted on n different patients, the proportion of successes in repeated sets of n trials would not follow the binomial distribution. This would quite probably be so if, for example, each set of n patients was selected in a different part of the country. However, if the sets of n patients were selected strictly at random (see § 2.3) from a large population of patients, then the proportion of patients in the population who will show outcome B (i.e. the probability, given random sampling, of outcome B) would not change between the selection of one patient and the next, or between the selection of one sample of n patients and the next. Therefore the conditions of constant probability and independence would be met in spite of the fact that patients differ in their reactions to drugs. Notice the critical importance of strictly random selection of samples, already emphasized in § 2.3.
From the rules of probability discussed in § 2.4 it is easy to find the probability of any specified result (number of successes out of n trials) if 𝒫(B), the true (population) proportion of cases in which the patient improves, is known. This is a deductive, rather than inductive, procedure. A true probability is given and the probability of a particular result calculated. The reverse process, the inference of the population
proportion from a sample, is discussed later in § 7.7 and exemplified by the assay of purity in heart by the Oakley method, described in § 7.8.
Two different drugs will be considered. Suppose that drug X is completely inactive, but that nevertheless 50 per cent of patients, in the long run, improve when treated with it, i.e.

    𝒫(B) = 0·5 and 𝒫(W) = 1−𝒫(B) = 0·5.  (3.2.1)

Suppose that drug Y is effective, and that the proportion of patients improving in the long run is increased to 90 per cent. Thus

    𝒫(B) = 0·9 and 𝒫(W) = 1−𝒫(B) = 0·1.  (3.2.2)

In both cases, because the outcomes B and W are mutually exclusive, the special case of the addition rule (2.4.2) gives 𝒫(B or W) = 𝒫(B)+𝒫(W), and because B and W are exhaustive (one or other outcome is certain to occur) 𝒫(B or W) = 1.
If two trials are made, r = 0, 1, or 2 successes might be observed. The possible outcomes of the two trials are shown in Table 3.2.1, and from these the probabilities, P(r), of observing r successes (r = 0, 1, or 2) are calculated using the multiplication rule, (2.4.6), and the addition rule, (2.4.2).

TABLE 3.2.1

                                            P(r) when        P(r) when
 r   1st trial  2nd trial  Probability     𝒫(B) = 0·5 (X)   𝒫(B) = 0·9 (Y)
 0      W          W       𝒫(W)×𝒫(W)        0·25             0·01
 1      W          B       𝒫(W)×𝒫(B)        0·25 } 0·50      0·09 } 0·18
 1      B          W       𝒫(B)×𝒫(W)        0·25 }           0·09 }
 2      B          B       𝒫(B)×𝒫(B)        0·25             0·81
It will be seen that ΣP(r) = 1 (the sum running from r = 0 to r = 2) in each case, by the addition rule, because it is certain that r will take one of the values between 0 and n.
It is also clear from the table that the calculations are affected by
the number of ways in which a given result can occur. One success out of two can occur in two ways, either at the first trial or at the second, so the probability of one success out of two trials, if the order in which they occur is immaterial, is 0·5 when 𝒫(B) = 0·5 (drug X), and 0·18 when 𝒫(B) = 0·9 (drug Y). This follows from the addition rule, being the probability of either (B at first trial and W at second) or (W at first trial and B at second).
The mean number (expectation, see Appendix 1) of successes out of n trials is n𝒫, i.e. 1 success out of 2 trials when 𝒫(B) = 0·5 (drug X),
FIG. 3.2.1. Binomial distribution of r, the number of successes out of n = 2 trials of an event, the probability of success at each trial being 𝒫 = 0·5. (Ordinate: probability of r successes, P(r).)
FIG. 3.2.2. As in Fig. 3.2.1, but 𝒫 = 0·9.
and 1·8 successes out of two trials when 𝒫(B) = 0·9 (drug Y). The results in the table are plotted in Figs. 3.2.1 and 3.2.2.
performed. In this case there are three possible orders in which one success may occur in three trials, and the same number for two successes in three trials. Check the figures in the table to make sure you have got the idea.
These distributions are plotted in Figs. 3.2.3 and 3.2.4.
FIG. 3.2.3. Binomial distribution of r, the number of successes out of n = 3 trials of an event, the probability of success at each trial being 𝒫 = 0·5.
FIG. 3.2.4. As in Fig. 3.2.3, but 𝒫 = 0·9.
TABLE 3.2.2

                                       P(r) when      P(r) when
 r   1st     2nd     3rd              𝒫(B) = 0·5     𝒫(B) = 0·9
     trial   trial   trial            (Drug X)       (Drug Y)
 0    W       W       W                0·125          0·001
 1    W       W       B                0·125 }        0·009 }
 1    W       B       W                0·125 } 0·375  0·009 } 0·027
 1    B       W       W                0·125 }        0·009 }
 2    B       B       W                0·125 }        0·081 }
 2    B       W       B                0·125 } 0·375  0·081 } 0·243
 2    W       B       B                0·125 }        0·081 }
 3    B       B       B                0·125          0·729
TABLE 3.3.1

The sum of the products, 0·1196, gives, by the addition rule, the probability of obtaining either (0 successes with both drugs) or (1 success with both) or (2 successes with both) or (3 successes with both). Thus in 11·96 per cent of experiments in the long run, treatment X will appear to be equi-effective with treatment Y, though in fact the latter is considerably better.
Furthermore, in some experiments X will actually produce a better result than
Y. By enumerating the ways in which this can happen, and applying the addition and multiplication rules, the probability of this outcome is seen to be

    (0·375×0·001) + 0·375(0·027+0·001) + 0·125(0·243+0·027+0·001) = 0·04475.

For example, the second term is the probability of obtaining both (2 successes with X) and (either 0 or 1 successes with Y). The treatments will be placed in the wrong order of effectiveness in 4·475 per cent of trials in the long run.
The result of these calculations is the prediction that in the long run X will appear to be as good as, or even better than, Y in 11·96+4·475 = 16·4 per cent of experiments. It would thus be quite likely that a good new treatment would remain undetected if an experiment were conducted with samples as small as those in this illustration. The hazards of small samples are dealt with further in § 7.7 and in § 7.8, which describes the use of the binomial for the assay of purity in heart.
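The 4·475 per cent figure can be checked by enumerating every pair of outcomes, rather than collecting terms by hand (a stdlib-Python sketch):

```python
from math import comb

def binom(r, n, p):
    # binomial probability of r successes in n trials, eqn (3.4.3)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Drug X: P(B) = 0.5 (inactive); drug Y: P(B) = 0.9; n = 3 patients each
p_wrong_order = sum(binom(rx, 3, 0.5) * binom(ry, 3, 0.9)
                    for rx in range(4) for ry in range(4) if rx > ry)
print(round(p_wrong_order, 5))   # 0.04475
```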
3.4. The general expression for the binomial distribution and for its mean and variance
The probability, P(r), of observing r successes out of n trials when the probability of a success is 𝒫, and the probability of a failure is therefore 1−𝒫 (from (2.4.3)), can be inferred by generalization of the deductions in § 3.2. It is

    𝒫ʳ(1−𝒫)ⁿ⁻ʳ  (3.4.1)

if the order in which the successes occur is specified. Commonly the order is of no interest, and therefore, by the addition rule, this must be multiplied by the number of ways in which r successes can occur in n trials,† namely

    n!/[r!(n−r)!],  (3.4.2)

giving

    P(r) = {n!/[r!(n−r)!]} 𝒫ʳ(1−𝒫)ⁿ⁻ʳ.  (3.4.3)
The proof that the sum of these probabilities, for all possible values of r from 0 to n, is 1 follows from the fact that (3.4.3) is a term in the
† This quantity is often denoted by the symbol (n over r), or by ⁿC_r. It is the number of possible ways of dividing n objects into two groups containing r and n−r objects ('successes' and 'failures' in the present case). The n objects can be arranged in n! different orders (permutations), and in each case the first r can be selected for one group, the remaining n−r for the other. However, the r! permutations of the objects within the first group, and the (n−r)! permutations within the second group, all result in the same division into two groups, hence the denominator of (3.4.2).
expansion of (𝒬+𝒫)ⁿ, where 𝒬 = 1−𝒫, by the binomial theorem. Thus

    Σ_{r=0}^{n} P(r) = (𝒬+𝒫)ⁿ = 1ⁿ = 1.

Example 1. If n = 3 and 𝒫 = 0·5, then the probability of one success (r = 1) out of three trials is, using (3.4.3),

    P(1) = [3!/(1! 2!)] 0·5¹ 0·5² = 3×0·125 = 0·375.
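Eqn (3.4.3) translates directly into code; this sketch reproduces Example 1 and checks that the probabilities sum to 1 (math.comb supplies (3.4.2)):

```python
from math import comb

def binom_pmf(r, n, p):
    # eqn (3.4.3): number of orderings times probability of one ordering
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

print(binom_pmf(1, 3, 0.5))                          # 0.375
print(sum(binom_pmf(r, 3, 0.5) for r in range(4)))   # 1.0
```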
typical of the population, each of these values of r being equiprobable; see Table 3.2.1). The variance of r could now be estimated using (2.6.3). N = 4 is used in the denominator, not N−1, because the population mean, μ, not the sample mean, is being used (see § 2.6).
The agreement is only exact because the sample happened to be perfectly representative of the population. If the calculations are based on small samples the estimate of variance obtained from (3.4.4) will agree approximately, but not exactly, with the estimate from (2.6.2). A similar situation arises in the case of the Poisson distribution and a numerical example is given in § 3.7.
Results are often expressed as the proportion (r/n), rather than the number (r), of successes out of n trials. The variance of the proportion of successes follows directly from the rule (2.7.5) for the effect of multiplying a variable (r in this case) by a constant (1/n in this case). Thus, from (3.4.4),

    var(r/n) = var(r)/n² = 𝒫(1−𝒫)/n.  (3.4.5)
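Eqn (3.4.5) can be checked by simulating repeated sets of n trials (a sketch; 𝒫, n, the number of repeats, and the seed are arbitrary choices):

```python
import random
import statistics

random.seed(3)
P, n = 0.3, 50
props = [sum(random.random() < P for _ in range(n)) / n
         for _ in range(20_000)]

# Observed variance of the proportion of successes; should be close to
# P(1-P)/n = 0.0042, eqn (3.4.5)
v = statistics.variance(props)
print(round(v, 4))
```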
intervals of time or space are independent. This derivation is given in
§ A2.2 (Chapter 5 should be read first). The independence of time
intervals is part of the definition of random events (see Chapter 5 and
Appendix 2).
Secondly, the Poisson distribution can be derived from the binomial
distribution (§ 3.4). In the examples cited the number of 'successes'
(e.g. disintegrations per second) can be counted, but it does not obviously
make sense to talk about the 'number of trials of the event'. Consider
an interval of time Δt seconds long (or an interval of space) divided into n small intervals. If the true (or population, or long-term) average number of events in Δt seconds is called m, then the probability of one event occurring ('success') in a small interval of length Δt/n is 𝒫 = m/n.† Because of the independence of time intervals the n intervals are like n independent trials with a constant probability 𝒫 = m/n of success at each trial, just like n tosses of a coin. These properties of independence and constancy define (plausibly enough) what is meant by 'random'. If n is finite, the number of successes in n trials is therefore given by the binomial distribution, (3.4.3), with 𝒫 = m/n. In order to consider very short time intervals let n→∞ (and thus 𝒫→0) in eqn (3.4.3), so that m = n𝒫 remains fixed. The result is (3.5.1), a limiting form of the binomial distribution in which neither n nor 𝒫, but only m, appears. The derivation is discussed by Feller (1957, p. 146), Mood and Graybill (1963, p. 70), and Lindley (1965, p. 73). It is easy to follow if it is remembered that as n→∞, lim(1−m/n)ⁿ = e⁻ᵐ. See Thompson (1965, Chapter 14) if it is not remembered.
The distribution gives the true probability of observing r events per
unit of time (or space) as
    P(r) = (mʳ/r!) e⁻ᵐ,  (3.5.1)
where m is the true mean number of events per unit of time or space.
(It is shown in Appendix 1, (A1.1.7), that m is the population mean
value of r.) This is a discontinuous distribution because r must be an
† You may object that m could be bigger than n, giving a probability bigger than 1! But the argument only applies to very short intervals, so that m < n and the chance of more than one event occurring in a short interval (length Δt/n) is negligible. For example, if Δt = 1 hour (3600 s) and m = 36 events/h, then if n = 3600 it follows that 𝒫 = 36/3600 = 0·01. On average, 99 out of 100 1-s intervals contain no event ('failure'), 1 in 100 contains 1 event ('success'), and a negligible proportion contains more than one event. The 'negligible proportion' is dealt with more rigorously in Appendix 2. It becomes zero if the intervals are made infinitely short, which is why we let n→∞ in the derivation.
integer. It has the basic property of all probability distributions that it should be certain (P = 1) that one or other of the possible outcomes (r = 0, r = 1, ...) will be observed. From the addition rule (2.4.2), this means that P[r = 0 or r = 1 or ... ∞] is the sum of the separate probabilities, i.e., from (3.5.1),

    Σ_{r=0}^{∞} P(r) = e⁻ᵐ Σ_{r=0}^{∞} mʳ/r! = e⁻ᵐ(1 + m + m²/2! + ...) = e⁻ᵐ eᵐ = 1.  (3.5.2)
(See Thompson (1965, p. 118) if you do not recognize the expansion of
eᵐ.)

    r̄ = Σfr/Σf = 530/80 = 6·625.
The Poisson distribution (3.5.1) gives the probability of a square containing r cells as P(r) = e⁻ᵐmʳ/r!, where m, the mean number of cells per square, is estimated by r̄. For example, the probability of a square containing 3 cells is predicted to be

    P(3) = (6·625)³ e^(−6·625)/3! ≈ 0·064.
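Eqn (3.5.1) in code, reproducing the prediction for r = 3 and checking that the probabilities sum to 1 (a sketch using only the standard library):

```python
from math import exp, factorial

def poisson_pmf(r, m):
    # eqn (3.5.1): probability of r events when the true mean is m
    return m ** r * exp(-m) / factorial(r)

print(round(poisson_pmf(3, 6.625), 3))                          # 0.064
print(round(sum(poisson_pmf(r, 6.625) for r in range(50)), 6))  # 1.0
```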
TABLE 3.6.1

 r           obs. freq. (f)   calc. freq.    fr
 0                 0               0
 1                 0               1
 2                                 2
 3                                 5
 4                 5               9          20
 5                                11
 6                10              11          60
 7                20              12         140
 8                17              10         136
 9                 6               7          54
 10                8               6          80
 11, 12
 or more
 Totals           80              80         530
Bacterial dilutions
If samples of a dilute suspension of bacteria are subcultured into several replicate tubes then bacterial growth will result in those tubes in which the added sample contained one or more viable bacteria. The proportion of tubes showing growth is therefore an estimate of the probability that a sample contains one or more organisms, P(r≥1). If the bacteria in the sample suspensions were randomly and independently
distributed throughout the suspending medium the number of bacteria in unit volume of solution (r) would follow the Poisson distribution; this enables an estimate of the mean number of cells per sample (m) to be made from the observed proportion of subcultures showing growth (P(r≥1) = p, say).
From (3.5.1) the probability of the sample being sterile (r = 0) is P(0) = e⁻ᵐ and therefore, by (2.4.3) (cf. (3.5.2)),

    p = P(r≥1) = 1−P(0) = 1−e⁻ᵐ.

By solving this for m the mean number of viable organisms per sample is estimated to be

    m = −log_e(1−p)

(remember log_e eˣ = x). For example, if 40 per cent of cultures are non-sterile, p is 0·4 and m = −log_e(1−0·4) = 0·51 organisms per sample. The error of this estimate depends on the number of subcultures on which the estimate of p is based and is usually quite large.
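The worked estimate is a one-liner (math.log is the natural logarithm, log_e):

```python
import math

p = 0.4                       # observed fraction of non-sterile tubes
m = -math.log(1 - p)          # estimated mean viable organisms per sample
print(round(m, 2))            # 0.51
```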
TABLE 3.6.2
Comparison of observed and Poisson distributions of the number of quanta of acetylcholine released per stimulus (Katz, 1966; based on Boyd and Martin, 1956)

 r (number      Observed     Poisson
 of quanta)     frequency    frequency
 0                 19           18
 1                 44           44
 2                 52           55
 3                 40           36
 4                 24           25
 5                 11           12
 6                  5            5
 7
 8
 9
potential can be represented (when s is small enough for the individual contributions to be additive) by

    s = Σ_{i=1}^{m} e_i,  (3.6.1)

which is the sum of a variable number (m) of random variables (the sizes, e, of the individual quanta). It is stated in (2.7.19), and proved in § A1.4, that if the miniature end-plate potentials are independent of each other (which is probably so), and if the end-plate potential, s, is produced by a random sample (of variable size, m) from the population of single quanta (which is less certain), then the square of the coefficient of variation of the end-plate potential size is given by

    C²(s) = C²(e)/m̄ + C²(m),  (3.6.2)

where C(e) and C(m) are the population coefficients of variation of e and m defined in (2.6.4). This result does not depend on assuming any particular distribution for either e or m (see § A1.4).
Suppose, for example, that m is binomially distributed, which might be expected if the nerve impulse caused there to be a constant probability 𝒫 of releasing each of a population of N quanta, so the true mean number of quanta released is m̄ = N𝒫, as in § 3.4, and, on average, a proportion 𝒫 of the population is released. According to (3.4.4), var(m) = N𝒫(1−𝒫) = m̄(1−𝒫) and therefore C²(m) = var(m)/m̄² = (1−𝒫)/m̄. Substituting this into (3.6.2) gives

    C²(s) = [C²(e) + 1 − 𝒫]/m̄,  (3.6.3)

so

    m̄ = [C²(e) + 1 − 𝒫]/C²(s).  (3.6.4)

When the release probability 𝒫 is small, so that m is nearly Poisson-distributed, this becomes

    m̄ = [C²(e) + 1]/C²(s).  (3.6.5)

This, and the other results in this section, are discussed in the review by Martin (1966). An estimate of m̄ is obtained by substituting the experimental estimates of C(e) and C(s) into (3.6.5).
Equations (3.6.4) and (3.6.5) do not entirely account for the experimental observations, and it was pointed out by del Castillo and Katz (see Martin 1966) that if we drop the rather unreasonable assumption that all the quanta have the same probability of release, then C²(m) will be less than the binomial value (1−𝒫)/m̄, which in turn is less than the Poisson value, 1/m̄. It can be shown (e.g. Kendall and Stuart (1963, p. 127)) that if each quantum has a different probability (𝒫_i) of release, and if these probabilities are constant from one nerve impulse to the next, then C²(m) = [1−𝒫̄−var(𝒫)/𝒫̄]/m̄, where 𝒫̄ is the mean probability of a quantum being released (i.e. Σ𝒫_i/N) and var(𝒫) is the
variance of the N values of 𝒫_i (zero in the binomial case, when all the 𝒫_i are identical). If this is substituted in (3.6.2), solving for m̄ gives

    m̄ = [C²(e) + 1 − 𝒫̄ − var(𝒫)/𝒫̄]/C²(s),  (3.6.6)

which is smaller than is given by either (3.6.4) or (3.6.5). In the case where N is very large, (3.6.6), like (3.6.4), tends to the Poisson form (3.6.5) despite the variability of 𝒫_i.
As an example consider the observations discussed above. The observed value for the response to one quantum was ē = 0·4 mV with standard deviation 0·086 mV, i.e. coefficient of variation C(e) = 0·086/0·4 = 0·215. The observed mean end-plate potential was s̄ = 0·933 mV with a standard deviation of 0·634 mV (this value is taken for the purposes of illustration, the original figure not being available), and hence C(s) = 0·634/0·933 = 0·680. If m were Poisson-distributed its mean value could be estimated from (3.6.5) as

    m̄ = (0·215² + 1)/0·680² = 1·046/0·462 = 2·26,

which agrees quite well with the estimate (viz. 2·4) from the proportion of failures, which also assumes a Poisson distribution, and the direct estimate 0·933/0·4 = 2·33, which does not.
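The Poisson estimate in this example is reproduced by the following sketch (the two coefficients of variation are the observed values quoted above):

```python
cv_e = 0.215        # coefficient of variation of a single quantum, C(e)
cv_s = 0.680        # coefficient of variation of the end-plate potential, C(s)

# eqn (3.6.5): mean number of quanta per stimulus if m is Poisson
m_poisson = (cv_e ** 2 + 1) / cv_s ** 2
print(round(m_poisson, 2))   # 2.26
```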
over a period of 5 min, of a radioactive sample. The decay over the period of the experiment is assumed to be negligible. The x values are

    10536  10636  10398  10393  10586
    10381  10479  10401  10262  10403

The total number of counts is Σx = 104475 counts/50 min, the mean count is x̄ = Σx/N = 10447·5 counts/5 min, and the count-rate is 10447·5/5 = 2089·5 counts/min. What is the uncertainty in the count-rate? Its variance can be calculated in two ways.
If there had only been one 5-min count, say the first one, its variance would have been estimated as 10536, a similar figure.
However, what is really wanted is the variance of the count-rate per minute, determined from 50 min of counting in the experiment, not the variance of a 1-min count. The count-rate is Σx/50 counts/min. In general, from (2.7.5), var(ax) = a²var(x), where a is a constant (1/50 in this case), therefore

    var(Σx/50) = var(Σx)/50² = 104475/2500 = 41·79.
total number of counts. If it is known that the count-rate has a Poisson distribution (as it will have if the counter is functioning correctly) its uncertainty can be estimated without having to do replicate observations.

Observed variance

In this particular case there are replicate counts, so the variance of an observation (a 5-min count) can be estimated in the usual way using (2.6.2),

    var(x) = Σ(x−x̄)²/(N−1) = 111938/9 = 12437·5.
And the variance of the mean count-rate per minute will be, from (2.7.5),

    var(x̄/5) = var(x̄)/5² = (12437·5/10)/25 = 1243·75/25 = 49·75.
By using the scatter of replicate counts, the standard deviation of
the count-rate (2089·5 counts/min) is therefore estimated to be
v'49·75 = 7·05 counts/min. This estimate, which has not involved any
assumption about the distribution of the observations, agrees well with
the estimate (6·46 counts/min) calculated assuming that the count-rate
was Poisson-distributed. This suggests that the assumption was not
far wrong. With either estimate the coefficient of variation of the
count-rate, by (2.6.4), comes out to about 0·3 per cent.
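Both variance estimates for the count-rate can be reproduced from the ten replicate counts listed above (a sketch; statistics.variance implements eqn (2.6.2)):

```python
import statistics

counts = [10536, 10636, 10398, 10393, 10586,
          10381, 10479, 10401, 10262, 10403]   # replicate 5-min counts

total = sum(counts)                  # 104475 counts in 50 min
rate = total / 50                    # 2089.5 counts/min

# Poisson route: var(total counts) = total counts, so
var_rate_poisson = total / 50 ** 2   # 41.79

# Observed-scatter route, eqns (2.6.2), (2.7.8), and (2.7.5)
var_mean = statistics.variance(counts) / len(counts)
var_rate_observed = var_mean / 5 ** 2            # close to 49.75

print(rate, round(var_rate_poisson, 2), round(var_rate_observed, 2))
```

The close agreement of the two variance estimates is the check, mentioned in the text, that the counter really is behaving in a Poisson fashion.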
counts were recorded in 10 min. The net count-rate is thus 2089·5−2000 = 89·5 counts/min.
By arguments similar to those above, the estimated variance of the background count-rate is var(count/10) = var(count)/10² = count/10² = 20000/100 = 200.
The estimated variance of the net count-rate (sample minus background) is required. Because the counts are independent this is, by (2.7.3), the sum of the variances of the two count-rates:

    var(sample−background) = var(sample)+var(background) = 49·75+200 = 249·75,
4. Theoretical distributions. The
Gaussian (or normal) and other
continuous distributions
are more classes the number (or probability) of observations falling in a particular class will be reduced. The blocks will also be drawn narrower, though it will usually be convenient to keep them about the
FIG. 4.1.1. Histogram of the muscle tension 'observations' (abscissa: muscle tension in grams; ordinate: probability).
FIG. 4.1.2. Same 'observations' as Fig. 4.1.1 but grouped into narrower classes (0·1 g), showing the shape of the distribution more clearly. Total height of all blocks = 1·0 as before.
same height, as shown. This suggests that it might be more convenient to represent the probability of an observation falling in a particular class by the area of the block rather than its height. If the width of the block (class interval) is constant then the area of the block is proportional to its height, so in ordinary histograms the area is in fact
proportional to the probability. When the class width is reduced, the
reduction in width of the blocks will reduce their area, and hence the
probability they represent, without having to reduce their height. An
example of a histogram with unequal block widths occurs in §§ 14.1–14.3.
Fig. 14.2.3(a) shows it drawn with height representing probability
and Fig. 14.2.3(b) shows the preferable representation of the histogram
with area representing probability.
Using the convention that the probability is represented by the area
of the blocks rather than their height, the condition that the sum of all
the probabilities must be 1·0 is expressed by defining the total area of
the blocks as 1·0 (see (3.5.2) for example).
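The area convention is easy to check numerically. The sketch below is not from the book (the 'observations' are simulated stand-ins for the muscle-tension data): it bins data with height = fraction/(class width), so each block's area equals the fraction of observations in its class and the total area comes out as 1·0 whatever class width is chosen.

```python
import math, random

random.seed(0)
# Simulated stand-ins for the 'muscle tension' observations.
obs = [random.gauss(1.8, 0.5) for _ in range(500)]

def density_histogram(data, width):
    # Height of each block = (fraction of observations in class) / width,
    # so block area = height * width = fraction, and total area = 1.
    counts = {}
    for x in data:
        b = width * math.floor(x / width)
        counts[b] = counts.get(b, 0) + 1
    n = len(data)
    return {b: c / (n * width) for b, c in counts.items()}

areas = {}
for w in (0.5, 0.1):
    hist = density_histogram(obs, w)
    areas[w] = sum(height * w for height in hist.values())
# areas[0.5] and areas[0.1] both equal 1.0
```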
(Fig. 4.1.3. The probability density function, f(x), of a continuous
distribution of muscle tension; the areas under the curve up to specified
values of x represent probabilities, as described in the text.)
If the class intervals are made narrower and narrower, the probability
attached to each becomes smaller. The probability of observing a
muscle tension (x) between 1·999 and 2·001 g is small, and a very
large number of observations would be necessary to use intervals as
narrow as this. When the class interval becomes infinitesimally narrow
then the probability represented by each block (i.e. the probability
that x will lie within the interval dx) is also infinitesimal, say dP, and
the graph becomes continuous instead of being made up of finite
blocks, as shown in Fig. 4.1.3. It now represents an infinite population
and it can never be observed exactly. It is a mathematical idealization.
The area of a block, i.e. the probability of x falling in the interval of
width dx (between x and x+dx), must now be written

dP = f(x)dx, (4.1.1)

where the function f(x) is the ordinate of the curve shown in Fig. 4.1.3
(i.e. the height of the block), and x is the continuous variable (e.g.
blood pressure or muscle tension) the distribution of which is being
defined (see, e.g., Thompson, 1966, if the notation of (4.1.1) is not
understood). The function f(x) is known as the probability density
function (or simply density) of x. A value of this function is called the
probability density of a particular value of x. It is not the probability
of that value of x, but merely a function that defines a curve such that
the area under the curve represents probability. For example, the
uniformly shaded area in Fig. 4.1.3, as a proportion of the whole
area under the curve, is the probability that a value of x will lie
between two specified values, x₁ and x₂. The summation of the in-
finitesimal blocks of which this area is made up is handled mathe-
matically by integration so this area can be written as the integrated
form of (4.1.1), namely the integral of f(x)dx between x₁ and x₂.
The probability of observing a value of x equal to or less than x₁ is,
similarly, the area under the curve below x₁. This area is said, in
statistical jargon, to be the lower tail of the distribution. It can be
called P, or F(x₁), and is vertically shaded in Fig. 4.1.3. It depends,
of course, on the value of x₁ chosen, i.e. it is a function of x₁.
A more satisfactory way of writing the same thing is to use a special
symbol, say X, to distinguish x considered as a random variable, from
a particular value of the random variable, denoted simply x. The
probability of observing a value of the variable (e.g. muscle tension)
equal to or less than some specified value x (e.g. 2·0 g) as in (4.1.3),
is written in this notation as†

F(x) = P[X ≤ x]. (4.1.4)
Fig. 4.1.4. Distribution function, F(x), for the distribution shown in Fig.
4.1.3. The probability of observing a value of x or less is plotted against x. The
area between x₁ and x₂ in Fig. 4.1.3 is F(x₂)−F(x₁) = 0·988−0·894 = 0·094, the
probability of an observation falling between x₁ and x₂.
a specified very large value (e.g. 100 kg). Differentiating (4.1.4) shows
that the distribution function is related to the probability density, as
suggested by (4.1.1), thus

dF(x)/dx = f(x). (4.1.5)
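Relation (4.1.5) can be verified numerically for the Gaussian case: a central-difference approximation to dF(x)/dx reproduces f(x). A sketch (not from the book) using only the standard library, with F(x) written in terms of the error function:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # The density f(x) of eqn (4.2.1).
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # The distribution function F(x), via the error function.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

h = 1e-5
errors = []
for x in (-1.0, 0.0, 2.0):
    dF_dx = (normal_cdf(x + h) - normal_cdf(x - h)) / (2 * h)  # slope of F at x
    errors.append(abs(dF_dx - normal_pdf(x)))
# every entry of errors is tiny: dF/dx = f(x), eqn (4.1.5)
```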
f(x) = (1/σ√(2π)) exp[−(x−μ)²/2σ²], (4.2.1)

where π has its usual value, and μ and σ are constants. The factor
1/σ√(2π) is a constant such that the total area under the curve
(from x = −∞ to x = +∞) is 1·0. The notation exp(z) is used to
stand for e^z when the exponent, z, is a long expression that would be
inconvenient to write as a superscript. If f(x) is plotted against x the
graph comes out as shown in Fig. 4.2.1.
It is a symmetrical bell-shaped curve asymptotic to, i.e. never quite
reaching, the x-axis. Being continuous it represents an infinite popula-
tion (see § 4.1). The constant μ is the population mean and also the
population median and mode because the distribution is symmetrical and
unimodal; see §§ 2.5 and 4.5. The constant σ measures the width† of
the curve as shown in Fig. 4.2.1, i.e. it is a measure of the scatter of the
values of x, and is the population standard deviation of x. An estimate
of σ could be made from a sample of observations, taken from the
population represented by Fig. 4.2.1, using (2.6.2). The distribution is
completely defined by the two parameters μ and σ.
Fig. 4.2.1. Gaussian ('normal') distribution. 4·6 per cent of the observations
in the population are more than two population standard deviations from the
mean (the shaded area is 4·6 per cent of the total area). The value 4·6 does not
apply to samples or, in general, to distributions other than the Gaussian (see
§§ 4.4 and 4.5).
population represented in this way [i.e. by a Gaussian curve] is suffi-
ciently accurately specified for the purpose of the inquiry', or 'Many of
the frequency functions applicable to observed distributions do have a
normal form'. Such remarks are, at least as far as most laboratory
investigations are concerned, just wishful thinking. Anyone with
experience of doing experiments must know that it is rare for the
distribution of the observations to be investigated. The number of
observations from a single population needed to get an idea of the form
of the distribution is quite large, a hundred or two at least, so this is
not surprising. In the vast majority of cases the form of the distribution
is simply not known; and, in an even more overwhelming majority of
cases there is no substantial evidence regarding whether or not the
Gaussian curve is a sufficiently good approximation for the purposes of
the inquiry. It is simply not known how often the assumption of
normality is seriously misleading. See § 4.6 for tests of normality.
That most eminent amateur statistician, W. S. Gosset ('Student', see
§ 4.4), wrote, in a letter dated June 1929 to R. A. Fisher, the great
mathematical statistician, '. . . although when you think about it you
agree that "exactness" or even appropriate use depends on normality,
in practice you don't consider the question at all when you apply your
tables to your examples: not one word.'
For these reasons some methods have been developed that do not
rely on the assumption of normality. They are discussed in § 6.2. How-
ever, many problems can still be tackled only by methods that involve
the normality assumption, and when such a problem is encountered
there is a strong temptation to forget that it is not known how nearly
true the assumption is. A possible reason for using the Gaussian method
in the absence of evidence one way or the other about the form of the
distribution, is that an important use of statistical methods is to
prevent the experimenter from making a fool of himself (see Chapters
1, 6, and 7). It would be a rash experimenter who presented results that
would not pass a Gaussian test, unless the distribution was definitely
known to be not Gaussian.
It is commonly said that if the distribution of a variable is not
normal, the variable may be transformed to make the distribution
normal (for example, by taking the logarithms of the observations, see
§ 4.5). As pointed out above, there are hardly ever enough observations
to find out whether the distribution is normal or not, so this approach
can rarely be used. Transformations are discussed again in §§ 4.6, 11.2
(p. 176) and § 12.2 (p. 221).
Various other reasons are often given for using Gaussian methods.
One is that some Gaussian methods have been shown to be fairly immune
to some sorts of deviations from normality, if the samples are not too
small. Many methods involve the estimation of means and there is an
ingenious bit of mathematics known as the central limit theorem that
states that the distribution of the means of samples of observations will
tend more and more nearly to the Gaussian form as the sample size
increases whatever (almost) the form of the distribution of the observa-
tions themselves (even if it is skew or discontinuous). These remarks
suggest that when one is dealing with reasonably large samples,
Gaussian methods may be used as an approximation. The snag is that
it is impossible to say, in any particular case, what is a 'reasonable'
number of observations, or how approximate the approximation will
be.
Further discussion of the assumptions made in statistical calculations
will be found particularly in §§ 6.2 and 11.2.
† Tables of Student's t (see § 4.4) give, on the line for infinite degrees of freedom, the
area below −u plus the area above +u, i.e. the area in both tails of the distribution of u.
same whatever the values of the mean and standard deviation. For
example it is found that:

(1) 68·3 per cent of the area under the curve lies within one standard
deviation on either side of the mean. That is, in the long run, 68·3
per cent of random observations from a Gaussian population would
be found to differ from the population mean by not more than one
population standard deviation.

(2) 95·4 per cent of the area lies within two standard deviations (or
95·0 per cent within ±1·96σ). The 4·6 per cent of the area outside
±2σ is shaded in Fig. 4.2.1.

(3) 99·7 per cent of the area lies within three standard deviations.
Only 0·3 per cent of random observations from a Gaussian population
are expected to differ from the mean by more than three standard
deviations.
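These three areas follow directly from the Gaussian distribution function, which can be written with the standard library's error function. A quick check (an illustration, not from the book):

```python
import math

def normal_cdf(u):
    # Distribution function of the standard normal distribution.
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def area_within(k):
    # Area under the Gaussian curve within k standard deviations of the mean.
    return normal_cdf(k) - normal_cdf(-k)

# area_within(1)    -> 68.3 per cent
# area_within(1.96) -> 95.0 per cent
# area_within(2)    -> 95.4 per cent (so 4.6 per cent lies outside)
# area_within(3)    -> 99.7 per cent
```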
u = (x−μ)/σ. (4.3.1)
† See Appendix 1. The standard form of a distribution is defined there.
In terms of the standard normal distribution the areas (obtainable
from the tables referred to above) become
(1) 68·3 per cent of the area lies between u = −1 and u = +1,
(2) 95 per cent of the area lies between u = -1·96 and u = + 1·96,
(3) 99·7 per cent of the area lies between u = −3 and u = +3.
In order to convert an observation x into a value of u = (x−μ)/σ, it
is, of course, necessary to know the values of μ and σ. In real life the
values of μ and σ will not generally be known; only more or less accurate
estimates of them, viz. x̄ and s, will be available. If the normal distribu-
tion is to be used for induction as well as deduction this fact must be
allowed for, and the method of doing this is discussed in the next
section.
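The conversion, and the distinction between the true parameters and their estimates, looks like this in code (a sketch with made-up numbers, not from the book):

```python
import statistics

def standardize(x, mu, sigma):
    # u = (x - mu) / sigma, eqn (4.3.1): deduction, parameters known.
    return (x - mu) / sigma

u = standardize(3.0, mu=2.0, sigma=0.5)   # hypothetical mu and sigma; u = 2.0

# In real life mu and sigma are unknown; only estimates are available:
sample = [1.7, 2.1, 2.4, 1.9, 2.3]        # hypothetical observations
x_bar = statistics.mean(sample)           # estimate of mu
s = statistics.stdev(sample)              # estimate of sigma
u_hat = standardize(2.5, x_bar, s)        # uses estimates, not the true parameters
```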
the methods used for dealing with large samples would need modification
if the results were to be applicable to the small samples he had to
work with in the laboratory.

Gosset spent a year (1906–7) away from the brewery, mostly working
in the Biometric Laboratory of University College London with
Karl Pearson, and in 1908 published a paper on the distribution of t.
As an example, suppose that the normally distributed variable of
interest is x̄, the mean of a sample of 4 observations selected randomly
from a population of normally distributed values of x with population
mean μ and population standard deviation σ(x). The population
standard deviation of x̄ (or 'standard error', see § 2.7) will be σ(x̄)
(Fig. 4.4.1. The standard Gaussian distribution of u compared with the more
widely spread distribution of t for samples of 4, plotted over the range −4 to +4;
the shaded areas in the tails correspond to the 5 per cent points.)
= σ(x)/√4 (by (2.7.9)) and the population mean of x̄ will be μ, the
same as for x. (See Appendix 1, (A1.2.3).) Therefore if a very large
number of samples of 4 were taken, and if for each u = (x̄−μ)/σ(x̄)
(from the definition (4.3.1)) were calculated, it would be found that in
the long run 95 per cent of the values of u would lie between u = −1·96
and u = +1·96, as discussed in § 4.3 and illustrated in Fig. 4.4.1.
However, if σ(x) were not known, an estimate of it, s(x), could be
calculated from each sample of 4 observations using (2.6.2) as in the
example in § 2.7, and from each sample s(x̄) = s(x)/√4 obtained by
(2.7.9). For each sample t = (x̄−μ)/s(x̄) (from the definition (4.4.1))
could now be calculated. The values of x̄ would be the same as those
used for calculating u, but the value of s(x̄) would differ from sample
to sample, whereas the same population value, σ(x), would be used in
calculating every value of u. The extra variability introduced by
variability of s(x̄) from sample to sample means that t varies over a
wider range than u, and it can be found from the tables referred to
below that it would be expected that, in the long run, 95 per cent of
the values of t would lie between −3·182 and +3·182, as illustrated in
Fig. 4.4.1.
Notice that both the distributions in Fig. 4.4.1 are based on observa-
tions from the normal distribution with population standard deviation
σ. The distribution of t, unlike that of u, is not normal, though it is
based on the assumption that x is normally distributed.

Although the definition of t (4.4.1) takes account of the uncertainty
of the estimate of σ(x), it still involves knowledge of the true mean μ
and it might be thought at first that this is a big disadvantage. It will
be found when tests of significance and confidence limits are discussed
that, on the contrary, everything necessary can be done by giving μ a
hypothetical value.
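The widening of the distribution from u to t can be seen by simulation. The sketch below (an illustration, not from the book) draws many samples of 4 from a normal population and checks that about 95 per cent of u values fall within ±1·96 while the matching t limits are the wider ±3·182:

```python
import math, random, statistics

random.seed(1)
mu, sigma, n = 0.0, 1.0, 4
trials = 20000
inside_u = inside_t = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    u = (xbar - mu) / (sigma / math.sqrt(n))                      # true sigma known
    t = (xbar - mu) / (statistics.stdev(sample) / math.sqrt(n))   # sigma estimated by s
    inside_u += abs(u) <= 1.96
    inside_t += abs(t) <= 3.182
# inside_u/trials and inside_t/trials both come out near 0.95:
# t needs the wider limits because s varies from sample to sample.
```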
The Biometrika Tables of Pearson and Hartley (1966, Table 12,
p. 146, 'Percentage points of the t distribution') give the same sort of
table. The number of degrees of freedom is denoted ν and the probability
2Q, Q being the shaded area in one tail of Fig. 4.4.1.
4.5. Skew distributions and the lognormal distribution
In § 4.2 it was stressed that the normal distribution is a mathe-
matical convenience that cannot be supposed to represent real life
adequately, and that it is very rare in experimental work for the
distribution of observations to be known. In those cases where the
distribution has been investigated it has often been found to be non-
normal. Distributions may be more flat-topped or more sharp-topped
than the normal distribution, and they may be unsymmetrical. Un-
symmetrical distributions may have positive skew as in Fig. 4.5.1
(an even more extreme case is the exponential distribution, Fig. 5.1.2),
or negative skew, as in the mirror image of Fig. 4.5.1.
Fig. 4.5.1. The lognormal distribution; a positively skewed probability
distribution. The mean value of x is greater than the median, and the mode is
less than the median. The 50 per cent of the area that lies (by definition) below
the median is shaded. For the lognormal distribution, in general, mode = antilog₁₀
(μ−2·3026σ²) (= 5·81 in this example), median = antilog₁₀ μ (= 10·0 in this
example), mean = antilog₁₀ (μ+1·1513σ²) (= 13·1 in this example), where
μ and σ² are the mean and variance of the (normal) distribution of log₁₀x shown in
Fig. 4.5.2. Reproduced from Documenta Geigy scientific tables, 6th edn, by per-
mission of J. R. Geigy S.A., Basle, Switzerland.
mean is greater than the population median which is in turn larger
than the population mode. There is no particular reason to prefer the
mean to the median or mode as a measure of the 'average' value of the
variable in a case like this. A reason for preferring the median is men-
tioned below (see also Chapter 14). The distribution of personal incomes
has a positive skew so the most frequent income (the mode) is less than
the mean income, and more people earn less than the mean income than
earn more than the mean income, because incomes above the mean are,
on the whole, further from the mean than incomes below it, i.e. more
than 50 per cent of the area under the curve is below the mean, as
shown by the shading in Fig. 4.5.1.
It is usually recommended that non-normal distributions be con-
verted to normal distributions by transforming the scale of x (see
§§ 4.2, 11.2, and 12.2). This should be done when possible, but in
most experimental investigations there is not enough information to
allow the correct transformation to be ascertained. In Chapter 14
an example is given of a variable (individual effective dose of drug)
with a positively skewed distribution (Fig. 14.2.1). In this particular
example the logarithm of the variable is found to be approximately
normally distributed (Fig. 14.2.3). In general, a variable is said to
follow the lognormal distribution, which looks like Fig. 4.5.1, if the
logarithm of the variable is normally distributed, as in Fig. 4.5.2.
In Chapter 14 the median value of the variable (rather than the
mean) is estimated. The median is unchanged by transformation, i.e.
the population median of the (lognormal) distribution of x is the
antilog of the population median (= mean = mode) of the (normal)
distribution of log x, whereas the population mode of x is smaller, and
the population mean of x greater than this quantity (cf. (2.5.4)). For
example, in Fig. 4.5.2 the median = mean = mode of the distribution
of log₁₀x is 1·0, and the median of the distribution of x in Fig. 4.5.1
is antilog₁₀ 1 = 10, whereas the mode is less than 10 and the mean larger
than 10.
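The ordering mode < median < mean, and the invariance of the median under the log transformation, can be sketched from the formulas quoted under Fig. 4.5.1 (the σ value below is an arbitrary illustration, not the Geigy figure):

```python
import math

def lognormal_summary(mu10, sigma10):
    # x is lognormal when log10(x) ~ Normal(mu10, sigma10**2).
    # ln(10) = 2.3026 and ln(10)/2 = 1.1513, the constants in the caption.
    var = sigma10 ** 2
    mode = 10 ** (mu10 - math.log(10) * var)        # antilog10(mu - 2.3026 * var)
    median = 10 ** mu10                             # antilog of the median of log10(x)
    mean = 10 ** (mu10 + math.log(10) * var / 2)    # antilog10(mu + 1.1513 * var)
    return mode, median, mean

mode, median, mean = lognormal_summary(1.0, 0.3)
# mode < median < mean, the positive-skew ordering described in the text,
# with median = antilog10(1.0) = 10 exactly
```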
Because of the rarity of knowledge about the distribution of observa-
tions in real life these theoretical distributions will not be discussed
further here, but they occur often in theoretical work and good accounts
of them will be found in Bliss (1967, Chapters 5-7) and Kendall and
Stuart (1963, p. 168; 1966, p. 93).
5. Random processes. The exponential
distribution and the waiting time
paradox
p[interval > t] = e^(−λt) (for t ≥ 0). (5.1.1)
time greater than, or at a time equal to or less than, t. Because these
cannot both happen it follows from the addition rule, (2.4.3), that

p[interval > t]+p[interval ≤ t] = 1

and thus

p[interval ≤ t] = F(t) = 1−e^(−λt) (for t ≥ 0). (5.1.2)

(The distribution function, F, was defined in (4.1.4).)
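Equation (5.1.2) is easy to evaluate, and the two landmark values marked on Fig. 5.1.1 drop out directly (a sketch, not from the book):

```python
import math

def exp_cdf(t, lam):
    # F(t) = 1 - exp(-lam * t) for t >= 0, eqn (5.1.2).
    return 1 - math.exp(-lam * t) if t >= 0 else 0.0

lam = 1.0                          # mean interval between events = 1/lam
f_at_mean = exp_cdf(1 / lam, lam)  # F at t = mean interval: 1 - 1/e, about 0.632
median = math.log(2) / lam         # F = 0.5 at t = ln(2)/lam, about 0.693 of the mean
```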
(Fig. 5.1.1. The cumulative form of the exponential distribution: the
probability that the duration of an interval is equal to or less than t,
plotted against the duration of the interval as a multiple of the mean
interval. F(t) = 0·632 at t equal to the mean interval, and F(t) = 0·5 at
t = 0·693 times the mean interval.)
which the next event occurs. And, in particular, it is the distribution of
the time interval between successive events (see § 5.2 and Appendix 2).
Because the intervals can be of any length this is a continuous distribu-
tion, unlike the Poisson, and it has probability density (see § 4.1), using
(5.1.2) and (4.1.5),

f(t) = dF(t)/dt = d(1−e^(−λt))/dt = λe^(−λt) (for t ≥ 0), (5.1.3)
     = 0 (for t < 0).
Fig. 5.1.1 is the cumulative form, F(t) (see (4.1.4)), of the exponential
distribution (cf. Fig. 4.1.4, which is the cumulative form of the normal
distribution in Fig. 4.1.3). To obtain Fig. 5.1.1 from Fig. 5.1.2 notice
that the probability of observing an interval ≤ t is given by the area
under the distribution curve (Fig. 5.1.2) below t, i.e. between 0 and t
(see § 4.1). This, using (4.1.4), is
The subtle flaw in the argument for a waiting time of ½τ lies in the
implicit assumption that the interval in which an arbitrarily selected
time falls is a random selection from all intervals. In fact, longer
intervals have a better chance of covering the selected time than
shorter ones, and it can be shown that the average length of the interval
in which an arbitrarily selected time falls is not the same as the average
length of all intervals, τ, but is actually 2τ (see § A2.7). Since the
selected time may fall anywhere in this interval, the average waiting
time is half of 2τ, i.e. it is τ, the average length of all intervals, as
originally supposed. The paradox is resolved. In the bus example this
means that a person arriving at the bus stop at an arbitrary time would,
on average, arrive in a 20-min interval. On average, the previous bus
would have passed 10 min before his arrival (as long as this was not
too near the time when buses started running) and, on average, it
would be another 10 min until the next bus.

These assertions, which surprise most people at first, are discussed
(with examples of biological importance), and proved, in Appendix 2.
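The argument can be checked by simulation. The sketch below (an illustration, not from the book) generates a random (Poisson) process of 'buses' with a mean interval of 10 min, picks arrival times at random, and records both the length of the interval covering each arrival and the wait until the next bus:

```python
import bisect, random

random.seed(2)
tau = 10.0                               # mean interval between buses, as in the example
events, t = [0.0], 0.0
while t < 1e6:
    t += random.expovariate(1 / tau)     # exponential intervals: a random process
    events.append(t)

covering, waits = [], []
for _ in range(20000):
    arrive = random.uniform(0.0, events[-1] - 1.0)   # arbitrary arrival time
    i = bisect.bisect_right(events, arrive)          # index of the next bus
    covering.append(events[i] - events[i - 1])       # interval covering the arrival
    waits.append(events[i] - arrive)                 # waiting time for the next bus

avg_cover = sum(covering) / len(covering)   # close to 2*tau = 20 min, not tau
avg_wait = sum(waits) / len(waits)          # close to tau = 10 min, not tau/2
```

Long intervals are over-represented among those sampled by an arbitrary arrival, which is exactly the flaw identified above.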
6. Can your results be believed?
Tests of significance and the analysis
of variance
'. . . before anything was known of Lydgate's skill, the judgements on it had
naturally been divided, depending on a sense of likelihood, situated perhaps in
the pit of the stomach, or in the pineal gland, and differing in its verdicts, but not
less valuable as a guide in the total deficit of evidence.'
GEORGE ELIOT
(Middlemarch, Chap. 45)
(2) Assumptions
Assumptions about, for example, the distribution of errors, must
always be made before a significance test can be done. Sometimes
some of the assumptions are tested but usually none of them are
(see §§ 4.2 and 11.2). This means that the uncertainty indicated by the
test can be taken as only a minimum value (see §§ 1.1 and 7.2). The
assumptions of tests involving the Gaussian (normal) distribution are
discussed in §§ 11.2 and 12.2. Other assumptions are discussed when
the methods are described.
Some tests (nonparametric tests), which make fewer assumptions than
those based on a specified, for example normal, distribution (parametric
tests such as the t test and analysis of variance), are described in the
following sections. Their relative merits are discussed in § 6.2. Note,
however, that whatever test is used, it remains true that if the test
indicates that there is no evidence that, for example, an experimental
group differs from a control group then the experimenter cannot
reasonably suppose, on the basis of the experiment, that a real difference
exists.
this figure is described as the result of a one-tail significance test. Its
interpretation is discussed in (4) below. It is the figure that would be
used to test the null hypothesis against the alternative hypothesis that
the true difference is positive. When the alternative hypothesis is that
the true difference is positive, the result of a one-tail test for the
difference between two means always has the following form.
Fig. 6.1.1. Basis of significance tests. (The figure shows the distribution of
differences between means, negative differences to the left and positive
differences to the right, with the hypothetical population difference and the
observed difference marked.) See text for explanation.
If the alternative to the null hypothesis is the hypothesis that the true
difference between means is, say, positive, this implies that however
large a negative difference was observed it would be attributed to
chance rather than a true (population) negative difference (or at least
that it would be considered of no interest if real).

Suppose now that it cannot be specified beforehand whether the true
difference between means is positive, zero, or negative. In the example
above there would be a probability of 0·04 of seeing a difference at least
as large as the positive difference observed in the experiment if the null
hypothesis were true. But there would also be a probability of 0·04 (the
horizontally shaded area) of seeing a deviation from the null hypothesis
at least as extreme as that actually observed but in the opposite direc-
tion. The total probability of observing a deviation from the null
hypothesis (in either direction) at least as extreme as that actually
observed would be P = 0·04+0·04 = 0·08 if the null hypothesis were
true. This is the appropriate probability because, if it were resolved
to reject the null hypothesis as false every time an experiment gave a
difference between means as large as, or larger than, that observed in
this experiment, then, if the null hypothesis were actually true it
would be rejected (wrongly) not in 4 per cent of repeated experiments,
but in 8 per cent. This is because negative observed differences in the
lower tail of Fig. 6.1.1, which would also lead to wrong rejection of the
null hypothesis, would be just as common, in the long run, as positive
differences. The probability is chosen so as to control the frequency of
this sort of error. This is discussed in more detail in subsection (6)
below.
The value P = 0·08 is described as the result of a two-tail test of
significance. Its interpretation is discussed in subsection (4) below. The
value of P is usually† twice that for a one-tail test. The result of a
two-tail test always has the following form.
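For a symmetrical null distribution, doubling the one-tail probability is all there is to it. A sketch using the standard normal as the null distribution (the cut-off value 1·75 is chosen to reproduce the 0·04/0·08 example, and is not taken from the book):

```python
import math

def normal_cdf(u):
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def tail_probabilities(u_obs):
    # One-tail P: area above the observed positive deviation.
    # Two-tail P: that area plus the mirror-image area below -u_obs.
    one_tail = 1 - normal_cdf(u_obs)
    two_tail = one_tail + normal_cdf(-u_obs)
    return one_tail, two_tail

one, two = tail_probabilities(1.75)
# one comes out near 0.04 and two near 0.08:
# the two-tail P is twice the one-tail P for a symmetrical distribution.
```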
† In the case of the normal distribution (§ 4.2), or any other distribution that is
symmetrical, whether continuous or discontinuous, for example the binomial distribution
with 𝒫 = 0·5 (§§ 3.2 and 3.4) or Student's distribution t (§ 4.4), one could say here '. . . a
deviation from it, in either direction, as large as, or larger than, that observed in the
Notice that P is not the probability that the null hypothesis is true
but the probability that certain observations would be made if it were.
Perhaps the best popular interpretation of P is that it is the 'proba-
bility of the results occurring by chance'. Although this is inaccurate
and vague, and should therefore be avoided, it is not too misleading.
experiment . . .' In general, this simpler statement is not possible, however. Two
other cases must be considered. (1) The sampling distribution (e.g. Fig. 6.1.1) is con-
tinuous but unsymmetrical (see § 4.5). In this case different sized positive and negative
deviations will be needed to cut off equal areas in the upper and lower tails (respectively)
of the distribution. It is the extremeness (i.e. rarity) of the deviation measured by the
area it cuts off in the tail of the distribution (rather than its size) that matters. The
two-tail probability is still twice the one-tail probability, however. (2) The sampling
distribution is both unsymmetrical and discontinuous (as often happens in the very
important sort of tests known as randomization tests, see §§ 8.2, 9.2, 9.3, and 10.2–
10.4). A greater difficulty arises in this case because the most extreme observations in the
opposite tail of the distribution (that not containing the observation) will not generally
cut off an area exactly the same as that cut off by the observation in its own tail so P
for the two-tail test cannot be exactly twice that for the one-tail test. There is no
definite rule about what to do in this case. Most commonly a deviation is chosen in the
opposite direction to that observed that cuts off an area in the opposite tail not
greater than the value found in the one-tail test, so the two-tail P is not greater than
twice the one-tail P. However, it may be decided to choose a deviation that cuts off
an area in the opposite tail that is as near as possible to that of the one-tail test. This
is exemplified at the end of § 8.2 where the deviations from the null hypothetical
value are stated, to show exactly what has been done. With small unequal samples
the most extreme possible observation in the opposite tail may cut off an area far greater
than that in the one-tail test. This problem is discussed in § 8.2.
entirely matters for personal judgement. The calculations throw no
light whatsoever on these problems. It is often found in the biomedical
literature that P = 0·05 is taken as evidence for a 'significant differ-
ence'. However, 1 in 20 is not a level of odds at which most people would
want to stake their reputations as experimenters and, if there is no
other evidence, it would be wiser to demand a much smaller value
before choosing explanation (c).
A twofold change in the value of P given by a test should make
little difference to the inference made in practice. For example,
P = 0·03 and P = 0·06 mean much the same sort of thing, although
one is below and the other above the conventional 'significance level'
of 0·05. They both suggest that the null hypothesis may not be true
without being small enough for this conclusion to be reached with any
great confidence.
In any case, as mentioned above, no single test is ever enough.
To quote Fisher (1951) again: 'In relation to the test of significance, we
may say that a phenomenon is experimentally demonstrable when we
know how to conduct an experiment which will rarely fail to give us a
statistically significant result'.
apparent. Although this may seem gloomy, it is only common sense.
To show that two population means are exactly identical, the whole
population, usually infinite, is obviously needed.
An example. The supposition that a large P value constitutes evidence
in favour of the null hypothesis is, perhaps, one of the most frequent
abuses of 'significance' tests. A nice example appears in a paper just
received. The essence of it is as follows. Differences between membrane
potentials before and after applying three drugs were measured.
The mean differences (d̄) are shown in Table 6.1.1.
TABLE 6.1.1

d stands for the difference between the membrane potentials (millivolts) in the
presence and absence of the specified drug. The mean of n such differences
is d̄, and the observed standard deviation of d is s(d). The standard deviation
of the mean difference is s(d̄) = s(d)/√n and values of Student's t are calculated
as in § 10.6.

                 d̄     s(d)    n    s(d̄)    t    P (approx)
Noradrenaline   2·7   10·1    40   1·60   1·7    0·1
Adrenaline      3·4   12·2    80   1·36   2·5    <0·02
Isoprenaline    3·9   10·8    60   1·39   2·8    <0·01
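The t values in Table 6.1.1 follow from d̄, s(d), and n alone. A sketch (not from the paper being criticized) that reproduces the s(d̄) and t columns; the P column needs tables of Student's t, so it is not computed here:

```python
import math

def paired_t(d_bar, s_d, n):
    # s(d_bar) = s(d)/sqrt(n); t = d_bar / s(d_bar), as in Table 6.1.1.
    se = s_d / math.sqrt(n)
    return se, d_bar / se

results = {name: paired_t(d_bar, s_d, n)
           for name, d_bar, s_d, n in [("Noradrenaline", 2.7, 10.1, 40),
                                       ("Adrenaline",    3.4, 12.2, 80),
                                       ("Isoprenaline",  3.9, 10.8, 60)]}
# Noradrenaline: se near 1.60, t near 1.7; Adrenaline: 1.36, 2.5;
# Isoprenaline: 1.39, 2.8, matching the table.
```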
Table 6.1.1, but despite this the authors would presumably have come
to the opposite conclusion. This is clearly absurd. But if the original
experiment with n = 40 differences had been interpreted as 'no evidence
for a real effect of noradrenaline' or 'effect, if any, masked by experi-
mental error' there would have been no trouble. It is reasonable that
the larger experiment should be capable of detecting differences that
escape detection in the smaller experiments.
These ideas can be formalized by considering the power of a signi-
ficance test, which is defined as the probability that the test will reject the
null hypothesis (e.g. that two population means are equal), this proba-
bility being considered as a function of the true difference between the
means. For example, if the null hypothesis was always rejected when-
ever a test gave P ≤ 0·05 then, if the null hypothesis really were true, it
would be rejected (wrongly) in 5 per cent of trials, as explained in
subsection (3) above (see subsection (7), below). The wrong rejection
of a correct hypothesis is called an error of the first kind, and, in this
case, the probability (α) of an error of the first kind would be α = 0·05.
If in fact there was a difference between true population means,
and this real difference was, for example, equal in size to the true
standard deviation of the difference between means (see §§ 2.7 and
9.4) (i.e. the difference, although real, is similar in size to the experi-
mental errors), then it can be shown that a two-tail normal deviate
test† would reject the null hypothesis (this time correctly) in 17 per
cent of experiments. However, if the null hypothesis was accepted as
true every time it was not rejected then it would be wrongly accepted
in 83 per cent of experiments. The wrong acceptance of a false hypothesis
is called an error of the second kind, and, in this case, the probability
(β) of this sort of error is β = 0·83.
The power curve for a two-tailed normal deviate test for the difference between two means is shown in Fig. 6.1.2 and compared with the power curve for the (non-existent) ideal test that would always accept true hypotheses and reject false ones. The power of even the best tests to detect real differences that are similar in size to the experimental error is quite small.
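The power figures quoted above can be checked numerically. The following sketch (not from the book; the function names and the fixed α = 0·05 are illustrative assumptions) evaluates the power of the two-tail normal deviate test when the true difference between means is delta standard deviations of the difference:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_tail(delta):
    """Probability that a two-tail normal deviate test with alpha = 0.05
    rejects the null hypothesis, when the true difference between the
    means is delta standard deviations of the difference (the abscissa
    of Fig. 6.1.2).  Reject whenever the observed deviate exceeds 1.96
    in absolute value."""
    z = 1.959964  # two-tail 5 per cent point of the normal distribution
    return norm_cdf(-z - delta) + (1.0 - norm_cdf(z - delta))
```

power_two_tail(0.0) gives 0·05 (the wrong rejections that occur when the null hypothesis is true) and power_two_tail(1.0) gives about 0·17, so β = 1 − 0·17 = 0·83, as in the text.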
Tests of significance and the analysis of variance § 6.1
FIG. 6.1.2. In both figures the abscissa gives the difference between the population means (expressed as a multiple of the standard deviation of the difference between means: see § 9.4). (a) The power curve for a two-tail normal deviate test for difference between two means (see text) when α = 0·05, i.e. the null hypothesis is rejected whenever P < 0·05, so if it were actually true it would be wrongly rejected in 5 per cent of repeated experiments. If the null hypothesis were false, i.e. there is a difference between the population means (in this example, a difference equal in size to one standard deviation of the difference between means: see § 9.4), the null hypothesis would be rejected (correctly) in 17 per cent of experiments and not rejected (wrongly) in β = 83 per cent of experiments. (b) Power curve for the (non-existent) ideal test that always rejects a hypothesis (population means equal) when it is false, and never rejects it when it is true.
because, although there is much informed opinion, there is rather little consensus. A personal view follows.
The first point concerns the role of the null hypothesis and the role of prior knowledge, i.e. knowledge available before the experiment was done. It is widely advocated nowadays (particularly by Bayesians, see §§ 1.3 and 2.4) that prior information should be used in making statistical decisions. There is no doubt that this is desirable. All relevant information should be taken into account in the search for truth, and in some fields there are reasonable ways of doing this. But in this book the view is taken that attention must be restricted to the information that can be provided by the experiment itself. This is forced on us because, in the sort of small-scale laboratory or clinical experiment with which we are mostly concerned, no one has yet devised a way that is acceptable to the scientist, as opposed to the mathematician, of putting prior information in a quantitative form.
Now it has been mentioned already that in most real experiments it is unrealistic to suppose that the null hypothesis† could ever be true, that two treatments could be exactly equi-effective. So is it reasonable to construct an experiment to test a null hypothesis? The answer is that it is a perfectly reasonable way of approaching our aim of preventing the experimenter from making a fool of himself if, as recommended above, we say only that 'the experiment provides evidence against the null hypothesis' (if P is small enough), or that 'the experiment does not provide evidence against the null hypothesis' (if P is large enough). The fact that there may be prior evidence, not from the experiment, against the null hypothesis does not make it unreasonable to say that the experiment itself provides no evidence against it, in those cases where the observations in the experiment (or more extreme ones) would not have been unusual in the (admittedly improbable) event that the null hypothesis was exactly true.
And, because it has been stressed that if there is no evidence against the null hypothesis it does not imply that the null hypothesis is true, the inference from a large P value does not contradict the prior ideas about the null hypothesis. We may still be convinced on prior grounds that there is a real difference of some sort, but as it is apparently not large enough, relative to the experimental error and method of analysis, to be detected in the experiment, we have no idea of its size or direction. So the prior knowledge is of no practical importance.
Another point concerns the discussion of power. It has been recommended that the result of a significance test should be given as a value of P. It would be silly to reject the null hypothesis automatically whenever P fell below an arbitrary level (0·05 say). Each case must be judged on its merits. So what is the justification for discussing, in subsections (3) and (6) above, what would happen 'if the null hypothesis were always rejected when P ≤ 0·05'? As usual, the aim is to prevent the experimenter making a fool of himself. Suppose, in a particular case, that a significance test gave P = 0·007, and the experimenter decided that, all things considered, this should be interpreted as meaning that the experiment provided evidence against the null hypothesis; then it is certainly of interest to the experimenter to know what would be the consequences of acting consistently in this way, in a series of imaginary repetitions of the experiment in question. This does not in any way imply that, given a different experiment, under different circumstances, the experimenter should behave in the same way, i.e. use P = 0·007 as a critical level.
† This remark applies to point hypotheses, i.e. those stating that means, populations, etc., are identical. All the null hypotheses used in this book are of this sort.
6.2. Which sort of test should be used, parametric or nonparametric?
Parametric tests, such as the t test and the analysis of variance, are
those based on an assumed form of distribution, usually the normal
distribution, for the population from which the experimental samples
are drawn. Nonparametric tests are those that, although they involve
some assumptions, do not assume a particular distribution. A discussion
of the relative 'advantages' of the tests is ludicrous. If the distribution
is known (not assumed, but known; see § 4.6 for tests of normality),
then use the appropriate parametric test. Otherwise do not. Neverthe-
less the following observations are relevant.
well be more powerful, so this cannot really be considered an advantage. In any case, even when the assumptions of parametric methods are fulfilled the nonparametric methods are often only slightly less powerful. In fact the randomization tests described in §§ 9.2 and 10.3 are as powerful as parametric tests even when the assumptions of the latter are true, at least for large samples.
There is a considerable volume of knowledge about the asymptotic relative efficiencies of various tests. These results refer to infinite sample sizes and are therefore of no interest to the experimenter. There is less knowledge about the relative efficiencies of tests in small samples. In any case, it is always necessary to specify, among other things, the distribution of the observations before the relative efficiencies of tests can be deduced; and because it is part of the problem that nothing is known about this distribution, even the results for small samples are not of much practical help. Of the alternative tests to be described, each can, for certain sorts of distribution, be more efficient than the others.
There is, however, one rather distressing consequence of lack of knowledge of the distribution of error, which is, of course, not abolished by assuming the distribution known when it is not.
As an example of the problem, consider the comparison of the effects of two treatments, A and B. The experimenter will be very pleased if a large and consistent difference between the effects of A and B is observed, and will feel, reasonably, that not many observations are necessary. But it turns out that with very small samples it is impossible to find evidence against the hypothesis that A and B are equi-effective, however large, and however consistent, the difference observed between their effects, unless something is known about the distributions of the observations. Suppose, for the sake of argument, that the experimenter is prepared to accept P = 1/20 (two tail) as small enough to constitute evidence against the hypothesis of equi-effectiveness (see § 6.1). If the experiment is conducted on two independent samples, each sample must contain at least 4 observations (for all the nonparametric tests described in Chapter 9, q.v., the minimum possible two-tail P value with samples of 3 and 4 would be 2 × 3!4!/7! = 1/17·5, however large and consistent the difference between the samples). Similarly, if the observations are paired, at least 6 pairs of observations are needed; with 5 pairs of observations the nonparametric methods described in Chapter 10, q.v., can never give a two-tail P less than 2 × (½)⁵ = 1/16. (See also the discussion in §§ 10.5 and 11.9.)
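These minimum P values come directly from counting equally probable outcomes, and can be sketched as follows (an illustration, not from the book; the function names are invented):

```python
from math import comb

def min_two_tail_p_unpaired(n1, n2):
    """Smallest two-tail P attainable by a rank test on two independent
    samples: twice the probability of the single most extreme of the
    comb(n1 + n2, n1) equally probable allocations."""
    return 2 / comb(n1 + n2, n1)

def min_two_tail_p_paired(n):
    """Smallest two-tail P for n paired observations: all n differences
    in the same direction."""
    return 2 * 0.5 ** n
```

min_two_tail_p_unpaired(3, 4) gives 2/35 = 1/17·5 and min_two_tail_p_paired(5) gives 1/16, both larger than 1/20, while samples of 4 and 4, or 6 pairs, bring the minimum below 1/20.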
In contrast, the parametric methods can give a very low P with the smallest samples if the difference between A and B is sufficiently large and consistent. Nevertheless, these facts mean that it is a disadvantage not to know the distribution of the observations. They do not constitute a disadvantage of nonparametric tests. The problem is less acute with samples larger than the minimum sizes mentioned.
In view of these remarks it may be wondered why parametric tests are used at all when there are nonparametric alternatives. In fact they are still widely used even now. This is partly because of familiarity. The t test and analysis of variance were in use for many years before most nonparametric methods were developed. It probably also results from the sacrifice of relevance to the real world for the sake of mathematical elegance. Methods based on the assumption of a normal distribution have been developed to cover a wide range of problems within a single, admittedly elegant, mathematical framework.
It is not uncommon for those who are dubious about the assumptions necessary for parametric tests to be told something along the lines of 'experience has shown that the t test (for example) will not mislead us'. Unfortunately, as Mainland (1963) has pointed out, this is just wishful thinking. There is no knowledge at all of the number of times people have been misled by using the t test when they would not have been misled by a nonparametric test (see §§ 4.2 and 4.6).
A plausible reason for using tests based on the normal distribution is that some of them have been shown to be fairly insensitive to some sorts of deviations from the assumptions on which they are based if the samples are reasonably big. The tests are said to be fairly robust. But this knowledge can usually be used only by intuition. One is never sure how large is large enough for the purposes in hand. When the nature and extent of deviations from the assumptions is unknown, the amount of error resulting from assuming them true is also unknown. It is much simpler to avoid as many as possible of the assumptions.
In spite of what has just been said, parametric methods are discussed in the following chapters, even when nonparametric methods exist. This is necessary as an approach to the more complex experimental designs, curve-fitting problems, and biological assay for which there are still hardly any nonparametric methods available, so parametric tests or nothing must be used. Whichever test is used, it should be interpreted as suggested in §§ 1.1, 1.2, 6.1, and 7.2, the uncertainty indicated by the test being taken as the minimum uncertainty that it is reasonable to feel.
be arranged (ranked) in order of magnitude (as, for example, with arbitrary scores such as those used for subjective measurements of the intensity of pain) then the rank methods described in §§ 9.3, 10.4, 10.5, 11.5, 11.7, and 11.9 should be used. For quantitative numerical measurements the methods described in the remaining sections of Chapters 9-11 are appropriate.
Methods for dealing with a single sample are discussed in Chapter 7 and those for more than two samples in Chapter 11.
7. One sample of observations. The
calculation and interpretation of
confidence limits
only a more or less inaccurate estimate of the population value (see
§ 4.4).
As usual it must be emphasized that the distribution is hardly ever
known, so it will usually be preferable to use the nonparametric
confidence intervals for the median (§ 7.3), which do not assume a
normal distribution.
No sort of confidence interval, nonparametric or otherwise, can make allowance for samples not having been taken in a strictly random fashion (see §§ 1.1 and 2.3), or for systematic (non-random) errors. For example, if a measuring instrument were wrongly calibrated so that every reading was 20 per cent below its correct value, this error would not be detectable and would not be allowed for by any sort of confidence limits.
Therefore, in the words of Mainland (1967a), confidence limits 'provide a kind of minimum estimate of error, because they show how little a particular sample would tell us about its population, even if it were a strictly random sample'. It seems then that estimates cannot be trusted very far. To quote Mainland (1967b) again,
'Any hesitation that I may have had about questioning error estimates in biology disappeared when I recently learned more about error estimates in that sanctuary of scientific precision, physics.
'One of the most disturbing things about scientific work is the failure of an investigator to confirm results reported by an earlier worker. For example in the period 1895 to 1961, some 15 observations were reported on the magnitude of the astronomical unit (the mean distance from the earth to the sun). You will find these summarized in a table . . . which lists the value obtained by each worker and his estimates of plus or minus limits for the error of the estimate. It is both entertaining and shocking to note that, in every case, a worker's estimate is outside the limits set by his immediate predecessor. Clearly there is an unresolved problem here, namely, that experimenters are apparently unable to arrive at realistic estimates of experimental errors in their work" (Youden 1968).
'If we add to the problems of the physicist the variability of biological and human material, and the nonrandomness of our samples from it, we may well marvel at the confidence with which "confidence intervals" are presented.'
Confidence limits purport to predict from the results of one experi-
ment what will happen when the experiment is repeated under the
same (as nearly as possible) conditions (see § 7.9). But the experimentalist
will not need much persuading that the only way to find out what will
happen is actually to repeat the experiment and see. And on the few
occasions when this has been done in the biological field the results
have been no more encouraging than those just quoted. For example,
Dews and Berkson (1954) found that the internal estimates of error
calculated in individual biological assays were mostly considerably lower than the true error found by actual repetition of the assay. As Dews and Berkson point out, if the assays were performed at different times or in different laboratories it would probably be said that there were 'inter-time' or 'inter-laboratory' differences; and if there were no such 'obvious' reasons for the internal error estimates being too low, then probably 'the animals would be stigmatized as "heterogeneous", with more than a hint that there had been too little incestuous activity among them'. The moral is once again that confidence limits, or other estimates of error calculated from the internal evidence of an experiment, must be interpreted as lower bounds for the real error.
Nevertheless, on the grounds that a minimum estimate of error is better than none at all, examples follow. Their interpretation is discussed further in § 7.9.
The reasoning behind the construction of Table A1 is roughly as follows (see Nair (1940) and Mood and Graybill (1963, p. 407)). Let m denote the population (true) median. By definition of the median (§ 2.5) the probability is 1/2 that an observation selected at random from the population, which is assumed to follow any continuous distribution, will be less than m. The probability that i observations out of n fall below m follows directly from the binomial distribution (3.4.3) with 𝒫 = ½, i.e.

    [n!/(i!(n−i)!)] (½)ⁿ.        (7.3.1)

To find from this the probability that the rth ranked observation, y(r), in a sample of n observations, will be greater than the population median, note that this will be the case if the sample contains either i = 0 or 1 or . . . or (r−1) observations below the median, so, by using the addition rule (2.4.2),

    P[y(r) > m] = Σ_{i=0}^{i=r−1} [n!/(i!(n−i)!)] (½)ⁿ.        (7.3.2)

If a 95 per cent confidence limit is required, r is now chosen so as to make this expression as near as possible to 0·025 (2·5 per cent). In the above example this means taking r = 2, giving

    P[y(2) > m] = Σ_{i=0}^{i=1} [9!/(i!(9−i)!)] (½)⁹ = 1/512 + 9/512 = 0·0195,

i.e. it is unlikely that y(2) will be above the population median. Because of the symmetry of the binomial distribution when 𝒫 = ½ (§ 3.4) this is also the probability that y(8) < m; it is equally unlikely that y(8) is less than the population median. Thus, in general, (7.3.2) also
y(2) ≤ m ≤ y(8), and the probability of this must be 1 − 0·039 = 0·961,

    P[y(r) ≤ m ≤ y(n−r+1)] = 1 − 2 Σ_{i=0}^{i=r−1} [n!/(i!(n−i)!)] (½)ⁿ.        (7.3.3)
† The assumption of normality could be tested as in § 4.6 if there were more observations, but with one sample of 9 no useful test can be made.
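Eqn (7.3.3) makes it easy to find, for any sample size, the ranks that give at least a required coverage probability. A sketch (not from the book; the function name is invented, and exact binomial tails are used rather than Table A1):

```python
from math import comb

def median_conf_ranks(n, target=0.95):
    """For a sample of n observations, find the largest rank r such
    that the interval from y(r) to y(n-r+1) covers the population
    median with probability at least `target`, using eqn (7.3.3).
    Returns (r, exact coverage probability)."""
    best = None
    for r in range(1, n // 2 + 1):
        # P(y(r) > m), eqn (7.3.2): i = 0, 1, ..., r-1 observations below m
        tail = sum(comb(n, i) for i in range(r)) * 0.5 ** n
        coverage = 1 - 2 * tail          # eqn (7.3.3)
        if coverage >= target:
            best = (r, coverage)         # larger r gives a narrower interval
    return best
```

For n = 9 this returns r = 2 with coverage 0·961, i.e. the interval from y(2) to y(8), as above.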
or not be the population in which the investigator is really interested) is likely to lie.
The limits must be based on Student's t distribution (§ 4.4) because only the estimated standard deviation is available. Reference to tables (see § 4.4) shows that, in the long run, 95 per cent of values of t (with 8 d.f.) will fall between t = −2·306 and t = +2·306. The definition of t (eqn. (4.4.1)) is (x−μ)/s(x) where x is normally distributed. In the present example the (assumed) normally distributed variable of interest is the sample mean, ȳ, so t is defined as (ȳ−μ)/s(ȳ).
It follows that in 95 per cent of experiments t = (ȳ−μ)/s(ȳ) is expected to lie between −2·306 and +2·306, i.e.

    P[−2·306 ≤ (ȳ−μ)/s(ȳ) ≤ +2·306] = 0·95,
    ∴ P[−2·306 s(ȳ) ≤ (ȳ−μ) ≤ +2·306 s(ȳ)] = 0·95,
    ∴ P[ȳ − 2·306 s(ȳ) ≤ μ ≤ ȳ + 2·306 s(ȳ)] = 0·95.†        (7.4.1)
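The calculation in eqn (7.4.1) can be sketched as follows (not from the book; the function name is invented, and the critical value of t is supplied from tables):

```python
from math import sqrt

def mean_conf_limits(sample, t_crit):
    """Gaussian confidence limits for the population mean, eqn (7.4.1):
    ybar +/- t * s(ybar), with t_crit read from tables of Student's t
    for n - 1 degrees of freedom (e.g. 2.306 for 95 per cent limits
    with 8 d.f.)."""
    n = len(sample)
    ybar = sum(sample) / n
    est_var = sum((y - ybar) ** 2 for y in sample) / (n - 1)  # estimated variance of y
    sem = sqrt(est_var / n)                                   # s(ybar) = s(y)/sqrt(n)
    return ybar - t_crit * sem, ybar + t_crit * sem
```

The limits are symmetrical about ȳ, and widen as t_crit (i.e. the required confidence level) or the estimated scatter increases.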
per cent (0·005) of the area under the curve for the distribution of t with 8 d.f. lies below −3·355 and another 0·5 per cent above +3·355, and 99 per cent lies between these figures. The 99 per cent Gaussian confidence limits are then ȳ ± 3·355 s(ȳ), i.e. 128·5 to 149·9 ml/min.
be normally distributed with var(log m) = var(log a)+var(log b) from (2.7.3) (given independence). Thus confidence limits for log m could be calculated as in § 7.4, log m ± t√[var(log m)], and the antilogarithms found. See § 14.1 for a discussion of this procedure.
When a and b are normally distributed the exact solution is not difficult. But, because it looks more complicated at first sight, it will be postponed until § 13.5 (see also § 14.1), nearer to the numerical examples of its use in §§ 13.11-13.15.
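The approximate log-transform route just described can be sketched as follows (not from the book; the function name and arguments are invented, natural logs are used, and the variances supplied must therefore be variances of natural logs):

```python
from math import exp, log, sqrt

def ratio_conf_limits(a, var_log_a, b, var_log_b, t_crit):
    """Approximate limits for a ratio m = a/b via the log
    transformation: var(log m) = var(log a) + var(log b) when a and b
    are independent, then take antilogarithms of the limits on log m."""
    log_m = log(a) - log(b)
    s_log_m = sqrt(var_log_a + var_log_b)
    return exp(log_m - t_crit * s_log_m), exp(log_m + t_crit * s_log_m)
```

Because the limits are symmetrical on the log scale, they are asymmetrical about a/b itself, as is appropriate for a ratio.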
s(ȳ), and so 1·96 must be replaced by the appropriate value of Student's t (for example 2·306 for P = 0·95 limits in the example in § 7.4; see also § 4.4). When this is done μ_L and μ_U are the limits previously found using (7.4.1).
(Figure: panels (a) and (b); each marks the observed mean, the sample mean ȳ, s(ȳ), and 1·96 s(ȳ) cutting off 2·5 per cent of the area.)
FIG. 7.6.1. One way of looking at confidence limits. See text.
when given the drug is r/n (as in § 3.4), i.e. 7/8 = 0·875 or 87·5 per cent. What is the error of this estimate? Would it be unreasonable, for example, to suppose that the population contained only 50 per cent of 'improvers'? The answer can be found without any calculation at all using Table A2, which is based on the following reasoning.
The approach described in § 7.6 can be used to find confidence limits for the population value of 𝒫. For concreteness suppose that 95 per cent (or P = 0·95) confidence limits are required for the population value of 𝒫 when r_obs 'successes' have been observed out of n trials. The highest reasonable value of 𝒫, 𝒫_U say, will be taken as the value that, if it were the true value, would make the observation of r_obs or fewer successes a rare event (an event occurring in only 2·5 per cent of repeated sets of n trials). Now the probability of r successes, P(r), is given by (3.4.3), and r ≤ r_obs if r = 0 or 1 or . . . or r_obs, so, using (3.4.3) and the addition rule, (2.4.2), it is required that

    P[r ≤ r_obs] = Σ_{r=0}^{r=r_obs} [n!/(r!(n−r)!)] 𝒫_U^r (1−𝒫_U)^(n−r) = 0·025.        (7.7.1)

The only unknown in this equation is 𝒫_U, the upper confidence limit for the population proportion, so it can be solved for 𝒫_U. There is no simple way of rearranging the equation to get 𝒫_U however, so tables are provided (Table A2) giving the solution. Similarly, the lowest reasonable value, 𝒫_L, for the population 𝒫 (the lower confidence limit for 𝒫) is taken as the value that, if it were the true value, would make the observation of r_obs successes or more (i.e. r = r_obs or r_obs+1 or . . . or n) a rare event. Thus 𝒫_L is found by solving

    P[r ≥ r_obs] = Σ_{r=r_obs}^{r=n} [n!/(r!(n−r)!)] 𝒫_L^r (1−𝒫_L)^(n−r) = 0·025.        (7.7.2)
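Although (7.7.1) cannot be rearranged, it is easy to solve numerically, which is effectively what Table A2 tabulates. A sketch (not from the book; the function names are invented):

```python
from math import comb

def binom_tail_le(n, r_obs, p):
    """P[r <= r_obs] for a binomial with parameters n and p (the sum
    appearing in eqn (7.7.1))."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r)
               for r in range(r_obs + 1))

def upper_conf_limit(n, r_obs, tail=0.025):
    """Solve eqn (7.7.1) for the upper limit by bisection: the tail
    probability falls as p rises, so halve the bracket repeatedly."""
    lo, hi = r_obs / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if binom_tail_le(n, r_obs, mid) > tail:
            lo = mid     # tail still too large: the limit is higher
        else:
            hi = mid
    return (lo + hi) / 2.0
```

upper_conf_limit(8, 7) gives 0·9968, agreeing with the value of 𝒫_U quoted below; the lower limit follows in the same way from eqn (7.7.2).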
of samples would contain 7 or fewer improvers. Thus, if the drug were tested on an infinite sample (rather than only 8) it would not be surprising (see § 7.9 for a more precise interpretation) to find any proportion of patients improving between 𝒫_L = 0·4735 and 𝒫_U = 0·9968. The observation is compatible with any hypothetical population 𝒫 that lies between the confidence limits (see § 9.4) so the observation of 7 improving out of 8 cannot be considered incompatible with a true improvement rate of 50 per cent (𝒫 = 0·5) at the P = 0·95 level of significance. For greater certainty the P = 0·99 confidence limits would be found from the tables. They are, of course, even wider, 36·85 to 99·94 per cent. A sample of 8 gives surprisingly little information about the population it was drawn from, even when all the assumptions of randomness and simple sampling (see § 3.2) are fulfilled.
The comparison of two observed binomial proportions is a different problem. It is discussed in Chapter 8.
The main variables are the virgin he-goat and the maiden pure in
heart. Virginity may for the present be regarded as an absolute char-
acter, but purity in heart no doubt varies from person to person.
Oakley therefore supposed that it might be possible to estimate the
purity in heart index (PHI) of a maiden by observing how many of a
group of he-goats are converted into young men. The original experi-
menters were clearly guilty of a grave scientific error in using only one
he-goat.
We shall assume, as Oakley did, that the conversion of he-goats into young men is an all-or-nothing process; either complete conversion or nothing occurs. Oakley supposed, on this basis, that a comparison could be made between, on one hand, the percentage of he-goats converted by maidens of various degrees of purity in heart, and, on the other hand, the sort of pharmacological experiment that involves the measurement of the percentage of individuals showing a specified effect in response to various doses of a drug. In conformity with the common pharmacological practice he supposed that a plot of percentage he-goat conversion against log purity in heart index (log PHI) would have the sigmoid form shown in Fig. 14.2.4. As explained in Chapter 14, this implies that the log PHI required to convert individual he-goats is a normally distributed variable. Furthermore it means that infinite purity in heart is required to produce a population he-goat conversion rate (HGCR) of 100 per cent.
Although there is a lack of experimental evidence on this point,
the present author feels that the assumption of a normal distribution
is, as so often happens, without foundation (see § 4.2). The implication
of the normality assumption, that there exist he-goats so resistant to
conversion that infinite purity in heart is needed to affect them, has
not been (and cannot be) experimentally verified. Furthermore the
very idea of infinite purity in heart seems likely to cause despondency
in most people, and should therefore be avoided until such time as its
necessity may be demonstrated experimentally. Oakley's treatment of
the problem requires, in addition, that PHI be treated as an independent
variable (in the regression sense, see Chapter 12), which raises problems
because there is no known method of measuring PHI other than he-goat
conversion.
In the light of these remarks it appears to the present author desirable that the purity in heart index should be redefined simply as the population percentage of he-goats converted.† This simple operational definition means that the PHI of all maidens will fall between 0 and 100, and confidence limits for the true PHI can be found easily from
economic views such young men might have, and it might well be that
their behaviour would bring scientific experiment into disrepute. This
is, however, a problem for necromancers rather than statisticians.'
7.9. Interpretation of confidence limits
The logical basis and interpretation of confidence limits are, even now,
a matter of controversy. However, few people would contest the
statement that if P = 0·95 (say) limits were calculated according to
the above rules in each of a large number of experiments then, in the
(Fig. 7.9.1: the abscissa gives the experiment number, 1 to 20; the confidence interval calculated from each experiment is plotted against it.)
long run, 95 per cent of the intervals so calculated would include the population mean (§ 7.4) or median (§ 7.3), μ, if the assumptions made in the calculation were true. The limits must be regarded as optimistic, as explained in § 7.2.
In any particular experiment a single confidence interval is calculated which obviously either does or does not include μ. It might therefore be thought that it could be said a priori that the probability that the interval includes μ is either 0 or 1, but not some intermediate value. However, in a series of identically conducted experiments, somewhat different values of the sample median or mean, and of the sample
scatter, for example of s(ȳ), will, in general, be found in every experiment. The confidence limits will therefore be different from experiment to experiment. The prediction is that, in the long run, 95 per cent (19 out of 20) of such limits will include μ, as illustrated in Fig. 7.9.1. It is not predicted that in 95 per cent of experiments the true mean will fall within the particular set of limits calculated in the one actual experiment.
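This long-run prediction can be illustrated by simulation (a sketch, not from the book; the function name, the normal population, and the particular figures are invented for illustration):

```python
import random

def coverage_of_t_limits(n_experiments=2000, n=9, mu=50.0, sigma=4.0, seed=1):
    """Simulate repeated experiments on samples of n normal
    observations and count how often the interval
    ybar +/- 2.306 s(ybar) (the 95 per cent limits for 8 d.f., as in
    section 7.4) includes the true mean mu."""
    rng = random.Random(seed)
    t_crit = 2.306
    hits = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        est_var = sum((y - ybar) ** 2 for y in sample) / (n - 1)
        sem = (est_var / n) ** 0.5
        if ybar - t_crit * sem <= mu <= ybar + t_crit * sem:
            hits += 1
    return hits / n_experiments
```

In a run of 2000 simulated experiments the proportion of intervals containing μ comes out close to 0·95, as predicted, because here, unlike in real life, the assumptions of the calculation are exactly true.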
Thus, if one were willing to consider that the actual experiment was a random sample from the population of experiments that might have been done, i.e. that 'nature has done the shuffling', one could go further and say that there was a 95 per cent chance of having done an experiment in which the calculated limits include the true mean, μ.
Another interpretation of confidence intervals will be mentioned later during the discussion of significance tests.
8. Classification measurements
It must be manifest that, were this true, the population of the world would be at a standstill. In truth the rate of birth is slightly in excess of that of death. I would suggest that in the next edition of your poem you have it read:
Strictly speaking this is not correct. The actual figure is a decimal so long that I cannot get it in the line, but I believe 1 1/16 will be sufficiently accurate for poetry.
I am etc.'
Letter said to have been written to Tennyson by Charles Babbage after reading 'The vision of sin' (Mathematical Gazette, 1927, p. 270)
8.2. Two independent samples. The randomization method and
the Fisher test
Randomization tests were introduced in § 6.3. As an example of the
result of classification measurements (see § 6.4), consider the clinical
comparison of two drugs, X and Y, on seven patients. It is funda-
mental to any analysis that the allocation of drug X to four of the
patients, and of Y to the other three be done in a strictly random way
using random number tables (see §§ 2.3 and 8.3). It is noted, by a
suitable blind method, whether each patient is improved (I) or not
improved (N). The result is shown in Table 8.2.1 (b).
TABLE 8.2.1
Possible results of the trial. Result (b) was actually observed

           I  N  Total     I  N  Total     I  N  Total     I  N  Total
Drug X     4  0    4       3  1    4       2  2    4       1  3    4
Drug Y     0  3    3       1  2    3       2  1    3       3  0    3
Total      4  3    7       4  3    7       4  3    7       4  3    7
              (a)             (b)             (c)             (d)
With drug X 75 per cent improve (3 out of 4), and with drug Y only 33 1/3 per cent improve. Would this result be likely to occur if X and Y were really equi-effective? If the drugs are equi-effective then it follows that whether an improvement is seen or not cannot depend on which drug is given. In other words, each of the patients would have given the same result even if he had been given the other drug, so the observed difference in 'percentage improved' would be merely a result of the particular way the random numbers in the table came up when the drugs were being allocated to patients.† For example, for the experiment in Table 8.2.1 the null hypothesis postulates that of the 7 patients, 4 would improve and 3 would not, quite independently of which drug was given. If this were so, would it be reasonable to suppose that the random numbers came up so as to put 3 of the 4 improvers, but only 1 of the 3 non-improvers, in the drug X group (as observed, Table 8.2.1 (b))? Or would an allocation giving a result that appeared to
† Of course, if a subject who received treatment X during the trial were given an equi-effective treatment Y at a later time, the response on the second occasion would not be exactly the same as during the trial. But it is being postulated that if X and Y are equi-effective then if one of them is given to a given subject at a given moment in time, the response would have been exactly the same if the other had been given to the same subject at the same moment.
favour drug X by as much as (or more than) this, be such a rare happen-
ing as to make one BUBpect the premise of equi-effectiveness ,
Now if the selection was really random, every possible allocation of
drugs to patients should have been equally probable. It is therefore
simply a matter of counting permutations (possible allocations) to
find out whether it is improbable that a random allocation will come
up that will give such a large difference between X and Y groups as
that observed (or a larger difference). Notice that attention is restricted
to the actual 7 patients tested without reference to a larger population
(see also § 8.4). Of the 7 patients, 4 improved and 3 did not.
Three ways of arriving at the answer will be described.
(a) Physical randomization. On four cards write 'improved' and on
three write 'not improved'. Then rearrange the cards in random order
using random number tables (or, less reliably, shuffle them), mimicking
exactly the method used in the actual experiment. Call the top four
cards drug X and the bottom three drug Y, and note whether or not
the difference between drugs resulting from this allocation of drugs to
patients is as large as, or larger than, that in the experiment. Repeat
this say, 1000 times and count the proportion of randomizations that
result in a difference between drugs as large as or larger than that in the
experiment. This proportion is P, the result of the (one-tail) significance
test. If it is small it means that the observed result is unlikely to have
arisen solely because of the random allocation that happened to come
up in the real experiment, so the premise (null hypothesis) that the
drugs are equi-effective may have to be abandoned (see § 6.2). This
method would be tedious by hand, though not on a computer, but
fortunately there are easier ways of reaching the same results. The
two-tail test is discussed below.
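The card procedure just described is easily imitated on a computer. The sketch below is not from the book; the seed and the number of repetitions are arbitrary choices for illustration.

```python
import random

# Simulate the card method of (a): four 'improved' cards, three 'not
# improved' cards, shuffled repeatedly; the top four go to drug X.
outcomes = ['improved'] * 4 + ['not improved'] * 3

random.seed(1)                  # arbitrary seed, for reproducibility only
n_trials = 100_000
count = 0
for _ in range(n_trials):
    random.shuffle(outcomes)    # mimic shuffling the seven cards
    drug_x = outcomes[:4]       # top four cards are called drug X
    # Result at least as favourable to X as the real experiment,
    # i.e. 3 or more improvers in the X group?
    if drug_x.count('improved') >= 3:
        count += 1

p_one_tail = count / n_trials
print(round(p_one_tail, 3))     # should be close to 13/35 = 0.371
```

The estimate converges on the exact value, 13/35, found by counting permutations in method (b).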
(b) Counting permutations. As each possible allocation of drugs to
patients is equally probable, if the randomization was properly done,
the results of the procedure just described can be predicted in much
the same way that the results of coin tossing were predicted in § 3.2.
If the seven patients are distinguished by numbers, the four who
improve can be numbered 1, 2, 3, and 4, and those who do not can be
numbered 5, 6, and 7. According to the null hypothesis each patient
would have given the same response whichever drug had been given.
How many ways can the 7 be divided into groups of 3 and 4? The
answer is given, by (3.4.2), as 7!/(4!3!) = 35 ways. It is not necessary
to write out both groups since once the number improved has
been found in one group (say the smaller group, drug Y, for convenience),
TABLE 8.2.2
Enumeration of all 35 possible ways of selecting a group of 3 patients
from 7 to be given drug Y. Patients 1, 2, 3, and 4 improved and patients
5, 6, and 7 did not. Number of subjects improving with Y = b (see
Table 8.2.3(a))

b = 0:  (5,6,7).
        1 way, giving Table 8.2.1(a).  P = 1/35 = 0·029.

b = 1:  (1,5,6) (1,5,7) (1,6,7) (2,5,6) (2,5,7) (2,6,7)
        (3,5,6) (3,5,7) (3,6,7) (4,5,6) (4,5,7) (4,6,7).
        12 ways, all giving Table 8.2.1(b).  P = 12/35 = 0·343.

b = 2:  (1,2,5) (1,2,6) (1,2,7) (1,3,5) (1,3,6) (1,3,7)
        (1,4,5) (1,4,6) (1,4,7) (2,3,5) (2,3,6) (2,3,7)
        (2,4,5) (2,4,6) (2,4,7) (3,4,5) (3,4,6) (3,4,7).
        18 ways, all giving Table 8.2.1(c).  P = 18/35 = 0·514.

b = 3:  (1,2,3) (1,2,4) (1,3,4) (2,3,4).
        4 ways, all giving Table 8.2.1(d).  P = 4/35 = 0·114.
the number improved in the other group follows from the fact that the
total number improved is necessarily 3. All 35 ways in which the drug Y
group could have been constituted are listed systematically in Table
8.2.2. If the randomization was done properly each way should have
had an equal chance of being used in the experiment. Notice that
proper randomization in conducting the experiment is crucial for the
analysis of the results. It is seen that 12 out of the 35 result in one
improved, two not improved in the drug Y group, as was actually
observed. Furthermore, 1 out of 35 shows an even more extreme
result, no patient at all improving in the drug Y group, as shown in
Table 8.2.1(a).
Thus P = 12/35 + 1/35 = 0·343 + 0·029 = 0·372 for a one-tail test†
(see § 6.1). This is the probability (the long-run proportion of repeated
experiments) that a random allocation of drugs to patients would
be picked that would give the results in Table 8.2.1(a) or 8.2.1(b), i.e.
that would give results in which X would appear as superior to Y as in
the actual experiment (Table 8.2.1(b)), or even more superior (Table
8.2.1(a)), if X and Y were, in fact, equi-effective. This probability is
not low enough to suggest that X is really better than Y. Usually a
two-tail test will be more appropriate than this one-tail test, and this is
discussed below.
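Since only 35 allocations are possible, the counting can also be done by brute-force enumeration. A short sketch (the variable names are illustrative, not from the book):

```python
from itertools import combinations
from fractions import Fraction

# Enumerate all 35 equally probable drug-Y groups of 3 patients from 7.
# Under the null hypothesis patients 1-4 improve and 5-7 do not,
# whichever drug they get.
improvers = {1, 2, 3, 4}
ways = list(combinations(range(1, 8), 3))   # possible drug-Y groups
assert len(ways) == 35

# b = number of improvers who landed in the drug Y group
b_values = [len(improvers.intersection(w)) for w in ways]

# One-tail P: allocations at least as favourable to X as that observed,
# i.e. b <= 1
p = Fraction(sum(1 for b in b_values if b <= 1), len(ways))
print(p)        # 13/35, i.e. 0.343 + 0.029 = 0.372
```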
Using the results in Table 8.2.2, the sampling distribution under the
null hypothesis, which was assumed in constructing Table 8.2.2, is
plotted in Fig. 8.2.1. This is the form of Fig. 6.1.1 that it is appropriate
to consider when using the randomization approach. The variable on the
abscissa is the number of patients improved on drug Y, i.e. b in the
notation of Table 8.2.3(a). Given this figure the rest of the table can
be filled in, using the marginal totals, so each value of b corresponds to a
particular difference in percentage improvement between drugs X and
Y. Fig. 8.2.1 is described as the randomization (or permutation) distribu-
tion of b, and hence of the difference between samples, given the null
hypothesis. The result of a one-tail test of significance when the
experimentally observed value is b = 1 (Table 8.2.1(b)), is the shaded
area (as explained in § 6.1), i.e. P = 0·372 as calculated above.
The two-tail test. Suppose now that the result in Table 8.2.1(a)
had been found in the experiment (b = 0). A one-tail test would give
P = 1/35 = 0·029, and this is low enough for the premise of equi-
effectiveness of the drugs to be suspect if it is known beforehand that
Y cannot possibly be better than X (the opposite result to that observed).
As this is usually not known a two-tail test is needed (see § 6.1). How-
ever, the most extreme result in favour of drug Y (b = 3 as in Table
† This is a one-tail test of the null hypothesis that X and Y are equi-effective, when
the alternative hypothesis is that X is better than Y. If the alternative to the null
hypothesis had been that Y was better than X (the alternative hypothesis must, of
course, be chosen before the experiment) then the one-tail P would have been 12/35
+18/35+4/35 = 0·971, the probability of a result as favourable to Y as that observed,
or more favourable, when X and Y are really equi-effective.
8.2.1(d)) is seen to have P = 0·114. It is therefore impossible that a
verdict in favour of drug Y could have been obtained with these
patients. If the drugs were really equi-effective then, if the hypothesis
of equi-effectiveness were rejected every time b = 0 or b = 3 (the
two most extreme results), it would be (wrongly) rejected in 2·9+11·4
= 14·3 per cent of trials, far too high a level for the probability of an
error of the first kind (see § 6.1, para. 7). A two-tail test is therefore
not possible with such a small sample. This difficulty, which can only
occur with very small samples (it does not happen in the next example),
has been discussed in a footnote in § 6.1 (p. 89).
FIG. 8.2.1. Randomization distribution of b (the number of patients improv-
ing on drug Y), when X and Y are equi-effective (i.e. null hypothesis true).
TABLE 8.2.3
              success   failure   total          success   failure   total
treatment X      a        A−a       A                8         7        15
treatment Y      b        B−b       B                1        11        12
total            C         D        N                9        18        27
                    (a)                                  (b)
Digitized by Google
122 § S.2
then Fisher has shown that the proportion of permutations giving rise
to the table is

    P = A!B!C!D! / (N!a!(A−a)!b!(B−b)!).        (8.2.1)

For example, for Table 8.2.1(b), P = 4!3!4!3!/(7!3!1!1!2!) = 12/35
= 0·343 as already found. With larger figures (8.2.1) is most con-
veniently evaluated using tables of logarithms of factorials (e.g. Fisher
and Yates, 1963).
In fact no calculation at all is necessary as tables have been published
(Finney, Latscha, Bennett, and Hsu, 1963) for testing any 2 × 2 table
with A and B, or C and D, both not more than 40. Unfortunately, to
keep the tables a reasonable size it is not possible to find the exact P
value for all 2 × 2 tables, but it is given for those 2 × 2 tables with
marginal totals up to 30 for which P ≤ 0·05 (one tail). The published
tables are for B ≤ A and b < a only, to avoid duplication. If the
table to be tested does not comply with this, rows and/or columns
must be interchanged until it does. As an example, the table in Table
8.2.3(b), which is from the introduction to the tables of Finney et al.
(1963), is tested using the appropriate part of their table, which has been
reproduced in Table 8.2.4.
TABLE 8.2.4
Exact test for the 2 × 2 table (extract from the tables of Finney et al. (1963))

[Extract giving, for each value of a, against each nominal one-tail
probability level (0·05, 0·025, 0·01, 0·005), the largest value of b that is
just significant (in bold type), with the exact one-tail P in smaller type.]
probability a figure in bold type which is the largest value of b that is just
'significant in a one-tail test at the 5 per cent (or 2·5, 1, or 0·5 per cent)
level', i.e. for which the one-tail† P ≤ 0·05 (or 0·025, 0·01, or 0·005).
The exact value of P is given in smaller type. It is the nearest value,
given that b must be a whole number, that is not greater than the
nominal value. In this example the one-tail P corresponding to the
observed b = 1 is 0·018. This is the sum of the P values calculated
from (8.2.1) for the observed table (a = 8, b = 1, P = 0·017), and the
only possible more extreme one with the same marginal totals (a = 9,
b = 0, P = 0·001). To find the two-tail P value (see § 6.1 and above)
consider the distribution of b analogous to Fig. 8.2.1. In this case b
can vary from 0 to 9 and if the null hypothesis were true it would be
4 on the average (see § 8.6). The one-tail P found is the tail of the
distribution for b ≤ 1. It is required to cut off an area as near as
possible to this in the other tail of the distribution (b > 4), as in Fig. 6.1.1.
No value of b cuts off exactly P = 0·018 but b = 7 cuts off an area of
P = 0·019 that is near enough (see footnote, § 6.1, p. 89). This is the
sum of the probabilities of b = 7 and all the more extreme (b = 8 and
b = 9) results. It can be found from the tables of Finney et al. by the
method described in their introduction. The table has a = 2, b = 7,
A − a = 13, B − b = 5 so columns are interchanged, as mentioned
above, and the table entered with 13 and 5 rather than 2 and 7, as
marked in Table 8.2.4. Therefore if it were resolved to reject the null
hypothesis whenever b ≤ 1 (as observed) or when b ≥ 7 (opposite tail)
then, if the null hypothesis was in fact true, the probability that it
would be rejected (wrongly), an error of the first kind, would be
P = 0·018+0·019 = 0·037. This result for the two-tail test is small
enough to make one question the null hypothesis, i.e. to suspect a
real difference between the treatments (see § 6.1).
In practice, if the samples are not too small, it would be adequate,
and much simpler, to double the one-tail P from the table to get the
required two-tail P.
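The whole two-tail procedure for Table 8.2.3(b) can be sketched numerically. Under the null hypothesis, with all marginal totals fixed, b follows the hypergeometric distribution, which can be written down directly:

```python
from math import comb
from fractions import Fraction

# Table 8.2.3(b): A = 15 on treatment X, B = 12 on Y, C = 9 successes
# in all. Distribute the 9 successes at random over the 27 subjects:
# b (successes on Y) is then hypergeometric.
A, B, C = 15, 12, 9
prob = {b: Fraction(comb(B, b) * comb(A, C - b), comb(A + B, C))
        for b in range(0, C + 1)}

p_lower = sum(prob[b] for b in (0, 1))        # observed tail, b <= 1
p_upper = sum(prob[b] for b in (7, 8, 9))     # nearest opposite tail, b >= 7
print(float(p_lower), float(p_upper))         # about 0.018 and 0.019
print(float(p_lower + p_upper))               # about 0.037, the two-tail P
```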
treatment X is given only to men and treatment Y only to women.
Yet the logical basis of significance tests will be destroyed if the
experimenter rejects randomizations producing results he does not
like. Often this will be preferable to the alternative of doing an experi-
ment that is, on scientific grounds, silly. But it should be realized that
the choice must be made.
There is a way round the problem if randomization tests are used.
If it is decided beforehand that any randomization that produces two
samples differing by more than a specified extent in sex composition,
or weight, age, prognosis, or any other criterion, is unacceptable then,
if such a randomization comes up when the experiment is being
designed, it can be legitimately rejected if, in the analysis of the results,
those of the possible randomizations that differ excessively, according to
the previously specified criteria, are also rejected, in exactly the same
way as when the real experiment was done. So, in the case of Table
8.2.2, the number of possible allocations of drugs to the 7 patients
could be reduced to less than 35. This can only be done when using
the method of physical randomization, or a computer simulation
of this process, or writing out the permutations as in Table 8.2.2.
The shorter methods using calculation (e.g. from (8.2.1)), or published
tables (e.g. for the Fisher exact test, § 8.2, or the Wilcoxon tests,
§§ 9.3 and 10.4), cannot be modified to allow for rejection of randomiza-
tions.
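As an illustration only (the sex labels below are invented, not from the book), the pre-specified rejection rule might be applied to the enumeration of Table 8.2.2 like this:

```python
from itertools import combinations

# Hypothetical restriction: suppose patients 1-3 were men (invented
# labels). Rule, fixed before the experiment: reject any allocation that
# puts all three men, or no men, in the drug Y group.
men = {1, 2, 3}

def acceptable(y_group):
    n_men = len(men.intersection(y_group))
    return 0 < n_men < 3

all_ways = list(combinations(range(1, 8), 3))
kept = [w for w in all_ways if acceptable(w)]
print(len(all_ways), len(kept))   # 35 possible allocations, 30 kept

# The P value would then be the proportion of the *kept* allocations
# giving a result at least as extreme as that observed.
```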
8.4. Two independent samples. Use of the normal approximation
Although the reasoning in § 8.2 is perfectly logical, and although
there is a great deal to be said for restricting attention to the observa-
tions actually made since it is usually impossible to ensure that any
further observations will come from the same population (see §§ 1.1
and 7.2), the exact test has nevertheless given rise to some controversy
among statisticians. It is possible to look at the problem differently.
If, in the example in Table 8.2.1, the 7 patients were thought of as
being selected from a larger population of patients then another sample
of 7 would not, in general, contain 4 who improved and 3 who did not.
This is considered explicitly in the approximate method described in
this section. However there is reason to suppose that the exact test
of § 8.2 is best, even for 2 X 2 tables in which the marginal totals are
not fixed (Kendall and Stuart 1961, p. 554).
Consider Table 8.2.3 again but this time imagine two infinite popula-
tions (e.g. X-treated and Y-treated) with true probabilities of success
(e.g. improved) ℘1 and ℘2 respectively. From the first population a
sample of A individuals is drawn at random and is observed to contain a
successes (e.g. improved patients). Similarly b successes out of B are
observed in the sample from the second population. The experimental
estimates of ℘1 and ℘2 are, as in § 3.4, p1 = a/A and p2 = b/B, the
observed proportions of successes in the samples from the two popula-
tions. In repeated trials a and b should vary as predicted by the binomial
distribution (see § 3.4). If the null hypothesis is true, so that ℘1 = ℘2
= ℘ say, the variance of the difference between the observed propor-
tions, x = p1 − p2, follows from the binomial variances of p1 and p2 as

    var(x) = ℘(1−℘)(1/A + 1/B).        (8.4.1)

The true value, ℘, is, of course, unknown, and it must be estimated
from the experimental results. No allowance is made for this, which is
another reason why the method is only approximate. The natural
estimate of ℘, under the null hypothesis, is to pool the two samples
and divide the total number of successes by the total number of trials
(e.g. total number improved by total number of patients), i.e.
p = (a+b)/(A+B). Thus, taking x = (p1−p2) as the normal variable,
with, according to the null hypothesis, μ = 0, an approximate normal
deviate (see § 4.3) can be calculated, using (4.3.1) and (8.4.2). This
value of u can then be referred to tables of the standard normal distri-
bution (see § 4.3).

    u = (x − μ)/σ(x) = (p1 − p2)/√[p(1−p)(1/A + 1/B)].        (8.4.3)

For the results in Table 8.2.3(b) this gives u = (8/15 − 1/12)/
√[(1/3)(2/3)(1/15 + 1/12)] = 2·4648.† With the correction for con-
tinuity the numerator is reduced, giving

    u = [(p1 − 0·5/A) − (p2 + 0·5/B)] / √[p(1−p)(1/A + 1/B)],        (8.4.4)

where p1 > p2. Using the results in Table 8.2.3(b) again gives
u = 2·054. Again using Table I of the Biometrika tables it is found
that 4·0 per cent of the total area of the standard normal distribution
lies outside u = ±2·054 as shown in Fig. 8.4.1 (cf. Fig. 6.1.1). In other
words, in repeated experiments it would be expected, if the null
hypothesis were true, that in 2·0 per cent of experiments u would be
less than −2·054,

† This table actually gives the area below u = +2·4648, i.e. 1 − 0·007 = 0·993.
See § 4.3 for details.
and in 2·0 per cent u would be greater than +2·054. This is a two-tail
test (see § 6.1).
The result of the test. The probability of observing a difference (positive or
negative) in success rate between the sample from population 1 (X-treated) and
that from population 2 (Y-treated) as large as, or larger than, the observed
sample difference, if there were no real difference between the treatments
(populations), would be approximately 0·04, a 1 in 25 chance.
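The arithmetic of (8.4.3) and (8.4.4) for Table 8.2.3(b) can be checked as follows (a sketch):

```python
from math import sqrt

# Table 8.2.3(b): a = 8 successes out of A = 15 on X; b = 1 out of B = 12
# on Y.
a, A = 8, 15
b, B = 1, 12
p1, p2 = a / A, b / B
p = (a + b) / (A + B)                       # pooled estimate under the null
se = sqrt(p * (1 - p) * (1 / A + 1 / B))

u = (p1 - p2) / se                          # eqn (8.4.3): 2.4648
u_corr = ((p1 - 0.5 / A) - (p2 + 0.5 / B)) / se   # eqn (8.4.4): 2.054
print(round(u, 4), round(u_corr, 3))
```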
[Fig. 8.4.1. The standard normal distribution (probability density against u), with the areas outside u = ±2·054 (4·0 per cent of the total area) shaded.]
distribution with f degrees of freedom, denoted χ²(f). As suggested by the definition,
the scatter seen in estimates of the population variance, calculated from repeated
samples from a normally distributed population, follows the χ² distribution. In
fact χ² = fs²/σ², where s² is an estimate of σ² based on f degrees of freedom. The
consequent use of χ² for testing hypotheses about the true variance of such a
population is described, for example, by Brownlee (1965, p. 282).
In the special case of f = 1 d.f., one has χ²(1) = u², the square of a
single standard normal variate. Tables of the distribution of chi-
squared with one degree of freedom can therefore be used, by squaring
the values of u found in § 8.4, as an approximate test for the 2 × 2
table. In practice χ²(1) is not usually calculated by the method given
for the calculation of u, but by another method which, although it
does not look related at first sight, gives exactly the same answer, as
will be seen. The conventional method of calculation to be described
has the advantage that it can easily be extended to larger tables of
classification measurements than 2 × 2. An example is given below.
The form in which χ² is most commonly encountered is that appro-
priate for testing (approximately) goodness of fit, and tables of
classification measurements (contingency tables). If x_o is an observed
frequency and x_e is the expected value† of the frequency on some
hypothesis, then it can be shown that the quantity

    χ² = Σ (x_o − x_e)²/x_e        (8.5.1)

† This need not be a whole number (see Table 8.6(b) for example). It is a predicted
long-run average frequency. The individual frequencies must, of course, be integers.
= 4. The original table of observations, and the table of values expected
on the null hypothesis, are thus:

                       observed                expected
Population 1        8     7    15           5    10    15
Population 2        1    11    12           4     8    12
Total               9    18    27           9    18    27

The summation in (8.5.1) is over all of the cells of the table. The
differences (x_o − x_e) are 8−5 = 3, 1−4 = −3, 7−10 = −3, and
11−8 = 3. Thus, from (8.5.1),

    χ² = 3²/5 + (−3)²/4 + (−3)²/10 + 3²/8 = 6·075.
true, a value of χ²(1) as large as 4·219 or larger would be found in
4·0 per cent of repeated experiments in the long run. This casts a
certain amount of suspicion on the null hypothesis as explained in
§ 8.4.
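The conventional calculation can be sketched for the 2 × 2 table of this section, both without the correction for continuity (giving 6·075 as above) and with it (giving u² = 2·054² = 4·219):

```python
# Observed 2x2 table of § 8.5: 8, 7 / 1, 11.
obs = [[8, 7], [1, 11]]
row = [sum(r) for r in obs]                    # 15, 12
col = [sum(c) for c in zip(*obs)]              # 9, 18
N = sum(row)                                   # 27
exp = [[r * c / N for c in col] for r in row]  # 5, 10 / 4, 8

chi2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
           for i in range(2) for j in range(2))
# Continuity correction: shrink each |deviation| by 0.5 before squaring.
chi2_corr = sum((abs(obs[i][j] - exp[i][j]) - 0.5) ** 2 / exp[i][j]
                for i in range(2) for j in range(2))
print(round(chi2, 3), round(chi2_corr, 3))   # 6.075 and 4.219 (= 2.054**2)
```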
It should be noticed that the probability found using χ² is that
appropriate for a two-tail test of significance (as shown in § 8.4,
FIG. 8.5.1. The distribution of chi-squared. The observed value, 4·219, for
chi-squared with one degree of freedom (see text) would be exceeded in only 4
per cent of repeated experiments in the long run if the null hypothesis were true.
The distribution for 4 degrees of freedom is also shown. (See Chapter 4 for
explanation of probability density.)

Fig. 8.4.1, cf. § 6.1) in spite of the fact that only one tail of the χ²
distribution is considered in Fig. 8.5.1. This is because χ² involves the
squares of deviations, so deviations from the expected values in either
direction increase χ² in the same direction.
Use of chi-squared for testing association in tables of classification measure-
ments larger than 2 × 2
If the results of treatments X and Y had been classified in more than
two ways, for example success, no change, or failure, the experiment
shown in Table 8.2.3(b) might have turned out as in Table 8.5.1(a).

TABLE 8.5.1
                            no
               success    change    failure    total
Treatment X       8          3         4         15
Treatment Y       1          5         6         12
Total             9          8        10         27
(a) observed

                            no
               success    change    failure    total
Treatment X       5        4·44      5·56        15
Treatment Y       4        3·56      4·44        12
Total             9        8         10          27
(b) expected
Note that no correction for continuity is used for tables larger than
2 × 2. χ² has two degrees of freedom since only two cells can be filled
in Table 8.5.1(b), the rest then follow from the marginal totals. Consult-
ing a table of the χ² distribution (e.g. Fisher and Yates 1963, Table
IV) shows that a value of χ² (with 2 d.f.) equal to or larger than 6·086
would occur in slightly less than 5 per cent of trials in the long run, if
the null hypothesis were true; i.e. for a two-tail test 0·025 < P < 0·05.
This is small enough to cast some suspicion on the null hypothesis.
Independence of classifications (e.g. of treatment type and success
rate) is tested in larger tables in an exactly analogous way, χ² being the
sum of rk terms, and having (r−1)(k−1) degrees of freedom, for a
table with r rows and k columns.
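The same calculation extends directly to an r × k table; a sketch for Table 8.5.1(a), with no continuity correction:

```python
# Observed 2x3 table of Table 8.5.1(a).
obs = [[8, 3, 4], [1, 5, 6]]
row = [sum(r) for r in obs]
col = [sum(c) for c in zip(*obs)]
N = sum(row)
exp = [[r * c / N for c in col] for r in row]  # Table 8.5.1(b)

chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(obs, exp) for o, e in zip(orow, erow))
df = (len(obs) - 1) * (len(obs[0]) - 1)
print(round(chi2, 3), df)    # 6.086 with 2 degrees of freedom
```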
in eqn (8.5.1). The expected numbers, on the null hypothesis, are
found much as above. The example is more complicated than in the
case of tossing a die, because different numbers of students will be
studying each subject and this must obviously be allowed for. The total
number of smokers divided by the total number of students in the
college gives the proportion of smokers that would be expected in each
class if the null hypothesis were true, so multiplying this proportion
by the number of people in each class (the number of physics students
in the college, etc.) gives the expected frequencies, x_e, for each class.
The value calculated from (8.5.1) can be referred to tables of the chi-
squared distribution with k−1 degrees of freedom, as before.
cells are Poisson distributed as x_e gives, using (8.5.1) with the results
in Table 3.6.1,

    χ² = (4−8)²/8 + (5−9)²/9 + … + (3−5)²/5 + (0−5)²/5 = 14·7.
12) have been assigned to the XY sequence, the other half to the
YX sequence. These results can be arranged in a 2 × 2 table, 8.7.3(a),
consisting of two independent samples. A randomization (exact) test
or χ² approximation applied to this table will test the null hypothesis

TABLE 8.7.1
                           not
              improved   improved
                 (I)        (N)
Drug X           12          0        12
Drug Y            5          7        12
                 17          7        24        8.7.1(a)

                         Drug Y
                       I        N
Drug X   I             5        7        12
         N             0        0         0
                       5        7        12        8.7.1(b)
TABLE 8.7.2
                                 XY          YX
                              sequence    sequence
In both periods                   3           2         5
In period (1)
  not in (2)                      3           0         3
In period (2)
  not in (1)                      0           4         4
In neither period                 0           0         0
                                  6           6        12
that the proportion improving in the first period is the same whether
X or Y was given in the first period, i.e. that the drugs are equi-effective.
The test has been described in detail in § 8.2 (the table is the same as
Table 8.2.1(a)), where it was found that P (one tail) = 0·029 but that
the sample is too small for a satisfactory two-tail test, which is what is
needed. In real life a larger sample would have to be used.
The subjects showing the same response in both periods give no
information about the difference between drugs but they do give
information about whether the order of administration matters.
Table 8.7.3(b) can be used to test (with the exact test or chi squared)
the null hypothesis that the proportion of patients giving the same
result in both periods does not depend on whether X or Y was given
first. Clearly there is no evidence against this null hypothesis. If, for
TABLE 8.7.3
                     Improved in   Improved in
                     (1) not (2)   (2) not (1)
X in period (1)           3             0           3
X in period (2)           0             4           4
                          3             4           7        8.7.3(a)

                        Same        Different
                      response      response
X in period (1)           3             3           6
X in period (2)           2             4           6
                          5             7          12        8.7.3(b)
9. Numerical and rank measurements.
Two independent samples
as in § 8.2, but, instead of each being classified as improved or not
improved, a numerical measurement is made on each. For example,
the reduction of blood glucose concentration (mg/100 ml) following
treatment might be measured. Suppose the results were as in Table
9.2.1.
The numbering of the patients is arbitrary but notice that if a positive
response is counted as 'improved' and a negative one as 'not improved',
Table 9.2.1 is the same as Table 8.2.1(b) so, if the size of the improve-
ment is ignored, the results can be analysed exactly as in § 8.2.
However, with such a small sample it is easy to do the randomiza-
tion test on the measurements themselves. The argument is as in
TABLE 9.2.1
Responses (fall in blood glucose, mg/100 ml) to two drugs. The ranks of
the responses are given for use in § 9.3

         Drug A                           Drug B
Patient   Response             Patient   Response
number   (mg/100 ml)   Rank    number   (mg/100 ml)   Rank
   1         10          5        5          5          4
   2         15          6        6         −3          2
   3         20          7        7         −5          1
   4         −2          3
Total        43         21     Total        −3          7

Grand total of responses (all 7 patients) = 40
§ 8.2. See p. 117 for details. If the drugs were really equi-effective (the
null hypothesis) each patient would have shown the same response
whichever drug had been given, so the apparent difference between
drugs would depend solely on which patients happened to be selected
for the A group and which for the B group, i.e. on how the random
numbers happened to come up in the selection of 4 out of the 7 for drug
A. Again, as in § 8.2, the seven measurements could be written on
cards from which 4 are selected at random (just as in the real experi-
ment) and called A, the other 3 being B. The difference between the
mean for A and the mean for B is noted and the process repeated many
times. There is actually no need to calculate the difference between
means each time. It is sufficient to look at the total response for drug
B (taking the smaller group for convenience) because once this is
known the total for A follows (the total of all 7 being always 40), and
so the difference between means also follows. If the experimentally
observed total response for B (-3 in the example), or a more extreme
(i.e. smaller in this example) total, arises very rarely in the repeated
randomizations it will be preferred to suppose that the difference
between samples is caused by a real difference between drugs and the
null hypothesis will be rejected, just as in § 8.2.
TABLE 9.2.2
Enumeration of all 35 possible ways of selecting a group of 3 patients from
7 to be given drug B. The response for each patient is given in Table
9.2.1. The total ranks for drug B are given for use in § 9.3

Patients         Rank  |  Patients         Rank
given B   Total  total |  given B   Total  total
1, 2, 3     45    18   |  2, 3, 7     30    14
1, 2, 4     23    14   |  2, 4, 5     18    13
1, 2, 5     30    15   |  2, 4, 6     10    11
1, 2, 6     22    13   |  2, 4, 7      8    10
1, 2, 7     20    12   |  2, 5, 6     17    12
1, 3, 4     28    15   |  2, 5, 7     15    11
1, 3, 5     35    16   |  2, 6, 7      7     9
1, 3, 6     27    14   |  3, 4, 5     23    14
1, 3, 7     25    13   |  3, 4, 6     15    12
1, 4, 5     13    12   |  3, 4, 7     13    11
1, 4, 6      5    10   |  3, 5, 6     22    13
1, 4, 7      3     9   |  3, 5, 7     20    12
1, 5, 6     12    11   |  3, 6, 7     12    10
1, 5, 7     10    10   |  4, 5, 6      0     9
1, 6, 7      2     8   |  4, 5, 7     −2     8
2, 3, 4     33    16   |  4, 6, 7    −10     6
2, 3, 5     40    17   |  5, 6, 7     −3     7
2, 3, 6     32    15   |
number improved, the total response is calculated. For example, if
patients 1, 4, and 6 had been allocated to drug B the total response
would have been 10+(−2)+(−3) = 5 mg/100 ml. The results from
Table 9.2.2 are collected in Table 9.2.3, which shows the randomization
distribution (on the null hypothesis) of the total response to drug B.
This is exactly analogous to Fig. 8.2.1. The observed total (−3) and
smaller totals (the only smaller one is −10) are seen to occur 2/35
(= 0·057) times, if the null hypothesis is true, and this is therefore the
one-tail P. For a two-tail test (see § 6.1) an equal area can be cut off
in the other tail (total for B ≥ 40), so the result of the two-tail test is
P = 4/35 = 0·114. This is not small enough to cast much suspicion on
the truth of the null hypothesis, but it is somewhat different from the
P = 0·372 (one tail) found in the analysis of Table 8.2.1(b), to which,
as mentioned above, Table 9.2.1 reduces if the sizes of the improve-
ments are ignored. In § 8.2 a one-tail P = 0·372 was found and a
two-tail test was not possible. The reason for the difference is that in
the results in Table 9.2.1 the 'improvements' on drug A are much
greater in size than the (negative) 'non-improvements' on drug B.
The two-tail test can be done since in § 8.2 all 35 randomizations
yielded only 4 different possible results (Table 8.2.1) for the trial, but
with numerical measurements the 35 randomizations have yielded
27 possible results, listed in Table 9.2.3, so it is possible to cut off
equal areas in each tail (cf. § 6.1). Notice that if patient 3 had been
in the B group and patient 5 in the A group (this leaves Tables 9.2.2
and 9.2.3 unchanged) the observed total for group B would have been
20+(−3)+(−5) = 12 and it is seen from Table 9.2.3 that a total
≤ 12 occurs in a proportion 13/35 = 0·372 of cases. This one-tail P
(when a large improvement, patient 3, is seen with drug B) is as large
as that found in § 8.2.
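The randomization test on the measurements themselves can be sketched by enumeration, just as in § 8.2:

```python
from itertools import combinations
from fractions import Fraction

# Responses from Table 9.2.1; patients 1-4 received A, 5-7 received B.
response = {1: 10, 2: 15, 3: 20, 4: -2, 5: 5, 6: -3, 7: -5}

# Total response for every possible drug-B group of 3 patients.
totals = [sum(response[i] for i in grp)
          for grp in combinations(response, 3)]
observed = response[5] + response[6] + response[7]   # -3

p_low = Fraction(sum(1 for t in totals if t <= observed), 35)  # 2/35
p_high = Fraction(sum(1 for t in totals if t >= 40), 35)       # 2/35
print(p_low + p_high)     # 4/35 = 0.114, the two-tail P
```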
With larger samples there are too many permutations to enumerate
easily. For two samples of 10 there are (by (3.4.2)) 20!/(10!10!)
= 184756 ways of selecting 10 samples from 20 individuals. However it
is not difficult for a computer to test a large sample of these possible
allocations by simulating the physical randomization (random assort-
ment of cards) mentioned at the beginning of this section, and of
§ 8.2. Programs for doing this do not seem to be widely available at
the moment but will doubtless become more common. This method has
the advantage that it can allow for the rejection of a random arrange-
ment that the experimenter finds unacceptable (e.g. all men in one
sample) as explained in § 8.3. The results in Table 9.2.4 are observations
TABLE 9.2.3
Randomization distribution of total response (mg/100 ml) of a group of
3 patients given drug B (according to the null hypothesis that A and B
are equi-effective). Constructed from Table 9.2.2

Total for
 drug B       Frequency
(mg/100 ml)
   −10            1
    −3            1
    −2            1
     0            1
     2            1
     3            1
     5            1
     7            1
     8            1
    10            2
    12            2
    13            2
    15            2
    17            1
    18            1
    20            2
    22            2
    23            2
    25            1
    27            1
    28            1
    30            2
    32            1
    33            1
    35            1
    40            1
    45            1
  Total          35
† In this paper the names of the drugs were mistakenly given as (−)-hyoscyamine
and (+)-hyoscyamine. When someone pointed this out Student commented in a letter
to R. A. Fisher, dated 7 January 1935, 'That blighter is of course perfectly right and
of course it doesn't really matter two straws . . .'
randomization test of the sort just described could be done as follows.
(In the original experiment the samples were not in fact independent
but related. The appropriate methods of analysis will be discussed in
Chapter 10.) A random sample of 12000 from the 184756 possible
permutations was inspected on a computer and the resulting randomiza-
tion distribution of the total response to drug A is plotted in Fig. 9.2.1
TABLE 9.2.4
Response in hours extra sleep (compared with controls) induced
by (−)-hyoscyamine (A) and (−)-hyoscine (B).
From Cushny and Peebles (1905)

  Drug A      Drug B
   +0·7        +1·9
   −1·6        +0·8
   −0·2        +1·1
   −1·2        +0·1
   −0·1        −0·1
   +3·4        +4·4
   +3·7        +5·5
   +0·8        +1·6
    0·0        +4·6
   +2·0        +3·4

ΣyA = 7·5    ΣyB = 23·3
nA = 10      nB = 10
ȳA = 0·75    ȳB = 2·33
(cf. the distribution in Table 9.2.3 found for a very small experiment).
Of the 12000 permutations 488 gave a total response to drug A of less
than 7·5, the observed total (Table 9.2.4), so the result of a one-tail
randomization test is P = 488/12000 = 0·04067. With samples of this
size there are so many possible totals that the distribution in Fig. 9.2.1
is almost continuous, so it will be possible to cut off a virtually equal
area in the opposite (upper) tail of the distribution. Therefore the
result of a two-tail test can be taken as P = 2 × 0·04067 = 0·0813. This
is not low enough for the null hypothesis of equi-effectiveness of the
drugs to be rejected with safety because the observed results would not
be unlikely if the null hypothesis were true. The distribution in Fig.
9.2.1, unlike that in Table 9.2.3, looks quite like a normal (Gaussian)
distribution, and it will be found that the t test gives a similar result
to that just found.
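A Monte Carlo run of the kind described above is easy to sketch (this is not the original program; the seed and number of samples are arbitrary, so the estimate will differ a little from 0·04067, and ties at the observed total are counted here):

```python
import random

# Pool the 20 responses of Table 9.2.4 and repeatedly split them at
# random into two groups of 10, simulating the card method.
drug_a = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug_b = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]
pooled = drug_a + drug_b
observed = sum(drug_a)                      # 7.5 hours

random.seed(2)                              # arbitrary, for reproducibility
n = 20_000
count = sum(sum(random.sample(pooled, 10)) <= observed for _ in range(n))
p_one_tail = count / n
print(round(p_one_tail, 3))   # close to the 0.04 found from 12000 samples
```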
[Fig. 9.2.1. Randomization distribution of the total response to drug A
(hours), constructed from 12000 random permutations. The observed
value, 7·5 hours, is marked.]
measurements, the loss of information involved in converting them to
ranks is surprisingly small.
Assumptions. The null hypothesis is that both samples of observations
come from the same population. If this is rejected, then, if it is wished
to infer that the samples come from populations with different medians,
or means, it must be assumed that the populations are the same in all
other respects, for example that they have the same variance.
[FIG. 9.3.1. Distribution of the total of ranks for drug B (the sum of 3 ranks from 7) if the null hypothesis were true. The observed value, 7, is marked on the abscissa.]
The frequency of each rank total in Table 9.2.2 is plotted in Fig. 9.3.1,
which shows the randomization distribution of the total rank for drug
B (given the null hypothesis). This is exactly analogous to the distribu-
tions of total response shown in Table 9.2.3 and Fig. 9.2.1, but the
distributions of total response depend on the particular numerical
values of the observations, whereas the distribution of the rank sum
(given the null hypothesis) shown in Fig. 9.3.1 is the same for any
experiment with samples of 3 and 4 observations. The values of the
rank sum cutting off 2·5 per cent of the area in each tail can therefore
be tabulated (Table A3, see below).
The observed total rank for drug B was 7, and from Fig. 9.3.1, or
Table 9.2.2, it can be seen that there are two ways of getting a total
rank of 7 or less, so the result of a one-tail test is P = 2/35 = 0·057.
An equal probability, 2/35, can be taken in the other tail (total rank of
17 or more), so the result of a two-tail test is P = 4/35 = 0·114. This
is the probability that a random selection of 3 patients from the 7
would result in the potency of drug B (relative to A) appearing to be
as small as was actually observed (total rank = 7), or smaller
(total rank < 7), or in an equally extreme result in the opposite direction,
if A and B were actually equi-effective. Since such an extreme apparent
difference between drugs would occur in 11·4 per cent of experiments
in the long run, this experiment might easily have been one of the
11·4 per cent, so there is no reason to suppose the drugs really differ
(see § 6.1). In this case, but not in general, the result is exactly the
same as found in § 9.2.
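The complete enumeration behind these figures can be written out directly; a minimal sketch, assuming (as in the experiment) untied ranks 1 to 7 split into groups of 3 and 4:

```python
from itertools import combinations

# All C(7,3) = 35 equally likely ways of choosing which 3 of the
# ranks 1..7 belong to drug B under the null hypothesis
totals = [sum(c) for c in combinations(range(1, 8), 3)]
observed = 7
n_as_extreme = sum(1 for t in totals if t <= observed)  # totals of 7 or less
p_one_tail = n_as_extreme / len(totals)
print(n_as_extreme, len(totals), 2 * p_one_tail)  # 2 of 35; two-tail 4/35
```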
A check can be applied to the rank sums, based on the fact that the
mean of the first N integers, 1, 2, 3, ..., N, is (N+1)/2, so therefore

sum of the first N integers = N(N+1)/2.    (9.3.1)

In this case 7(7+1)/2 = 28, and this agrees with the sum of all ranks
(Table 9.2.1), which is 21+7 = 28.
The distribution of rank totals in Fig. 9.3.1 is symmetrical, and this
will be so as long as there are no ties. The result of a two-tail test will
therefore be exactly twice that for a one-tail test (see § 6.1).
because the table refers to the randomization distribution of the integers
1, 2, 3, 4, 5, 6, ..., 20, not the actual figures used, i.e. 1, 2, 3, 4½, 4½, 6,
etc. Such evidence as there is suggests that a moderate number of ties does
not cause serious error.
The rank sum for drug A is 1+2+3+4½+6+8+9½+14+15½+17
= 80½, and for drug B it is 129½. The sum of these, 80½ + 129½ = 210,
TABLE 9.3.1
The observations from Table 9.2.4 ranked in ascending order

Drug    Observation (hours)    Rank
A       −1·6                   1
A       −1·2                   2
A       −0·2                   3
B       −0·1                   4½ } tied: (4+5)/2 = 4½
A       −0·1                   4½
A        0·0                   6
B       +0·1                   7
A       +0·7                   8
B       +0·8                   9½ } tied: (9+10)/2 = 9½
A       +0·8                   9½
B       +1·1                   11
B       +1·6                   12
B       +1·9                   13
A       +2·0                   14
B       +3·4                   15½ } tied: (15+16)/2 = 15½
A       +3·4                   15½
A       +3·7                   17
B       +4·4                   18
B       +4·6                   19
B       +5·5                   20
                       Total   210
P (two tail) can be found (approximately) from Table A3, in which
n₁ and n₂ are the sample sizes (n₁ ≤ n₂). For each pair of sample sizes
two figures are given. If the rank sum for sample 1 (that with n₁
observations) is equal to or less than the smaller tabulated figure, or
if it is equal to or greater than the larger figure, then
P (two tail) is not greater than the figure at the head of the column.
In this case n₁ = n₂ = 10, and the tabulated figures are
82 and 128 for P = 0·1, and 78 and 132 for P = 0·05. The observed rank
sum of 80½ is less than 82 but greater than 78, so P lies between 0·1 and
0·05. This means that if the null hypothesis of equi-effectiveness were
true then the probability of observing a rank sum of 80½ or less would
be under 0·05, and the probability of observing a rank sum equally
extreme in the other direction would also be under 0·05, so the total
two-tail P (see § 6.1) is under 0·1. This result is similar to that found in
§ 9.2 using the more powerful randomization test on the observations
themselves: there is, at most, only weak evidence for a
difference between the drugs.
and the rarity of the result judged from tables of the standard normal
distribution.
For example, the results in Table 9.3.1 gave n₁ = 10, N = 20,
R₁ = 80·5. Thus, from (9.3.2)–(9.3.4),

u = [80·5 − 10(20+1)/2] / √[10 × 10(20+1)/12] = −1·85.
This value is found from tables (see § 4.3) to cut off an area P = 0·032
in the lower tail of the standard normal distribution. The result of a
two-tail test (see § 6.1) is therefore this area, plus the equal area above
u = +1·85, i.e. P = 2 × 0·032 = 0·064, in good agreement, even for
samples of 10, with the exact result from Table A3. The two-tail result
can be found directly by referring the value u = 1·85 to a table of
Student's t with infinite degrees of freedom (when t becomes the
same as u, see § 4.4).
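A sketch of this normal approximation, using the quantities just quoted (the mean n₁(N+1)/2 and variance n₁n₂(N+1)/12 of the rank sum under the null hypothesis):

```python
import math
from statistics import NormalDist

# Normal approximation to the rank-sum distribution, with the figures
# of Table 9.3.1: n1 ranks out of N, rank sum R1 for drug A
n1, n2 = 10, 10
N = n1 + n2
R1 = 80.5

mean_R = n1 * (N + 1) / 2                  # 105
sd_R = math.sqrt(n1 * n2 * (N + 1) / 12)   # sqrt(175)
u = (R1 - mean_R) / sd_R
p_two_tail = 2 * NormalDist().cdf(u)       # u < 0, so cdf gives the tail area
print(round(u, 2), round(p_two_tail, 3))   # -1.85 and 0.064
```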
object is to estimate the standard deviation of the difference, s[ȳA − ȳB],†
so that it can be predicted (see example in § 2.7) how much scatter
would be seen in (ȳA − ȳB) if it were determined many times (this
prediction is likely to be optimistic, see § 7.2).
Σ(y − ȳ)² = Σy² − (Σy)²/n
          = 34·43 − (7·5)²/10 = 28·805
† Note that this means the estimated standard deviation, s, of the random variable
(ȳA − ȳB). It is the functional notation described in § 2.1. It does not mean s times
(ȳA − ȳB).
(5) Using (2.7.3) the variance of the difference between two such
means (assuming them to be independent, see also § 10.7) is

s²[ȳA − ȳB] = s²[ȳA] + s²[ȳB] = 0·3605 + 0·3605 = 0·7210.

The standard deviation of the difference between means is therefore
√0·7210 = 0·8491 hours = s[ȳA − ȳB], with 18 degrees of freedom.
(6) The definition of t, given in (4.4.1), is (x − μ)/s(x), where x is
normally distributed and s(x) is its estimated standard deviation. In
this case the normally distributed variable of interest is the difference
between mean responses, (ȳA − ȳB). It is required to test the null
hypothesis that the drugs are equi-effective, i.e. that the population
value of the difference between means is zero, μ = 0, and therefore
μ = 0 is used in the expression for t because, as usual, it is required to
find out what would happen if the null hypothesis were true. Inserting
these quantities gives, on the null hypothesis,

t = [(ȳA − ȳB) − 0]/s[ȳA − ȳB] = (2·33 − 0·75)/0·8491 = 1·861.
t = [(ȳA − ȳB) − μ] / √{ [Σ(yA − ȳA)² + Σ(yB − ȳB)²] / (nA + nB − 2) × (1/nA + 1/nB) }    (9.4.1)
the t distribution (with nA + nB − 2 degrees of freedom) in order to
find P.
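Equation (9.4.1) applied to the data of Table 9.2.4 can be condensed into a few lines (a hypothetical helper function reproducing the working above):

```python
import math

# Pooled two-sample t statistic, eqn (9.4.1), for the data of Table 9.2.4
ya = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
yb = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

def pooled_t(a, b):
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((y - ma) ** 2 for y in a)   # sum of squared deviations, 28.805
    ssb = sum((y - mb) ** 2 for y in b)
    pooled_var = (ssa + ssb) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))  # s[ybar_A - ybar_B]
    return (mb - ma) / se_diff, na + nb - 2

t, df = pooled_t(ya, yb)
print(round(t, 3), df)   # 1.861 with 18 degrees of freedom
```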
10. Numerical and rank measurements.
Two related samples
† Whether A or B is given first should be decided randomly for each patient. See
§§ 8.4 and 2.3.
TABLE 10.1.1
The results from Table 9.2.4 presented in a way showing
how the experiment was really done

Patient                       Difference        Total
(block)    yA      yB       d = (yB − yA)    (yB + yA)
1          +0·7    +1·9        +1·2           +2·6
2          −1·6    +0·8        +2·4           −0·8
3          −0·2    +1·1        +1·3           +0·9
4          −1·2    +0·1        +1·3           −1·1
5          −0·1    −0·1         0·0           −0·2
6          +3·4    +4·4        +1·0           +7·8
7          +3·7    +5·5        +1·8           +9·2
8          +0·8    +1·6        +0·8           +2·4
9           0·0    +4·6        +4·6           +4·6
10         +2·0    +3·4        +1·4           +5·4
If the probability of a positive difference is 1/2 (null hypothesis) then
the probability of observing 9 positive differences in 9 'trials of the
event' (just like 9 heads out of 9 tosses of an unbiased coin) is given by
the binomial distribution (3.4.3) as (1/2)⁹ = 1/512 ≈ 0·002. For a two-
tail test of significance (see § 6.1) equally extreme deviations in the
opposite direction (i.e. 9 negative signs out of 9) must be taken into
account, and for this P ≈ 0·002 also, so the result of a two-tail sign test
is P ≈ 0·004. This is substantially lower than the values obtained in
Chapter 9 (when it was not taken into account that the samples were
related) and suggests rejection of the null hypothesis, because results
deviating from it by as much as was actually observed would be rare if it
were true.
Example (2). If there had been one negative difference (however
small) and 9 positive ones, then the one-tail P (see § 6.1) would be the
probability of observing 9 or more positive signs out of 10. This would
be the situation if it were decided to count the zero difference in Table
10.1.1 as negative, to be on the safe side. From the binomial distribu-
tion, (3.4.3), the probability of observing 9 positive differences out of 10
is

P(9) = [10!/(9! 1!)] (0·5)⁹(0·5)¹ = 10(½)¹⁰ = 0·00977,

and similarly P(10) = (½)¹⁰ = 0·00098, so the one-tail P is
P(9) + P(10) = 11/1024 = 0·0107, and the two-tail P is 0·0214.
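The binomial arithmetic of the two examples can be sketched with a small, hypothetical helper:

```python
from math import comb

def sign_test_two_tail(r_obs, n):
    """Two-tail sign test: twice the probability of observing r_obs or
    fewer of the rarer sign out of n, when each sign has probability 1/2."""
    return 2 * sum(comb(n, r) for r in range(r_obs + 1)) * 0.5 ** n

print(sign_test_two_tail(0, 9))    # 2/512 ~ 0.004, as in Example (1)
print(sign_test_two_tail(1, 10))   # 22/1024 ~ 0.0215, as in Example (2)
```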
The general result
Generalizing the argument shows that if r_obs differences out of n are
observed to be negative (or positive, if there are fewer positive signs
than negative), then the result of a two-tail test of the hypothesis that
the population median difference is zero is

P = 2 Σ(r = 0 to r_obs) n! / [r! (n − r)!] (½)ⁿ.    (10.2.1)
† This situation shows the difficulties that can be introduced by ties. There is no reason
to exclude the zero difference when finding confidence limits for the median, but the
results will only agree exactly with the sign test (from which the zero was omitted) if
this is done. The best answer is probably to be on the safe side. This usually means
counting the zero difference as though it had the sign least conducive to rejection of the
null hypothesis. In the example discussed this means pretending that patient 5 actually
gave a negative difference. Example (2) shows that P = 0·0214 in this case.
Table A1, are the next-to-smallest and next-to-largest observations,
i.e. +0·8 and +2·4, which just fail to include zero. This agrees with the
exact two-tail result, P = 0·0214 (= 1 − 0·9786), found by direct
calculation above.

Method (2). The same result is obtained if Table A1 is entered with
r = r_obs + 1. This is obvious if (10.2.1) is compared with (7.3.3).
Consideration of a few examples shows that if the limits are taken as the
(r_obs + 1)th observation from each end of the ranked observations, the
limits will just fail to include zero. For example, in Table 10.1.1, as
just discussed, r = r_obs + 1 = 1 gives P = 1 − 0·996 = 0·004. Likewise
in the second example above, r_obs = 1 negative sign out of n = 10.
Entering Table A1 with n = 10 and r = r_obs + 1 = 2 gives the result of
the two-tail significance test as P = 1 − 0·9786 = 0·0214, exactly as
found from first principles above.
Method (3). As might be expected, the same result can be obtained by
finding confidence limits, ℘_L and ℘_U, for the population proportion
(℘) of positive (or negative) differences and seeing whether these limits
include ℘ = 0·5 or not. The method has been described in § 7.7 and
the result can be obtained, as explained there, from Table A2. It will be
left to the reader to improve his moral fibre by showing (by comparing
(7.7.1), (7.7.2), and (7.3.2)) that if the upper confidence limit for
the population median difference, found above, just fails to include
zero, then it will be found that the upper confidence limit for ℘, ℘_U, is
equal to or less than 0·5. Similarly, if the lower confidence limit for the
population median just fails to include zero then it will be found that
℘_L ≥ 0·5.
For example, in Table 10.1.1, r_obs = 0 out of n = 9 differences were
negative, so 100r/n = 0 per cent negative differences were observed.
Entering Table A2 with r = 0 and n = 9 shows that 99 per cent
confidence limits for the population proportion of negative differences
are ℘_L = 0 and ℘_U = 0·445. These limits do not include 0·5 (as
expected) and this implies that for a two-tail significance test P
< 0·01 (i.e. 1 − 0·99), as found above.
In the second example above (r_obs = 1 negative difference out of
n = 10), consulting Table A2 with r = 1, n = 10, gives 95 per cent
confidence limits for the population proportion (℘) of negative differ-
ences as 0·0025 and 0·445, which do not include 0·5. This is as expected
from the fact that the 97·86 per cent (which is as near to 95 per cent
as it is possible to get, see § 7.3) confidence limits for the population
median difference, +0·8 to +2·4 found above, just fail to include
zero. The 99 per cent confidence limits for ℘ are 0·0005 to 0·5443,
which do include the null hypothetical value, ℘ = 0·5, as expected.
These results imply that the result of a two-tail sign test is 0·01 < P
< 0·05. The exact result is 0·0214, found above.
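The limits quoted from Table A2 can be checked numerically. The sketch below assumes the exact (Clopper-Pearson type) definition of the limits described in § 7.7 and finds them by bisection on the binomial tail probabilities; it is an illustration, not the method by which Table A2 was computed:

```python
from math import comb

def binom_cdf(r, n, p):
    """P(X <= r) for a binomial(n, p) variable."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))

def exact_limits(r, n, conf=0.95, tol=1e-6):
    """Exact confidence limits for a binomial proportion, by bisection."""
    alpha = (1 - conf) / 2
    def solve(condition):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if condition(mid):
                lo = mid
            else:
                hi = mid
        return lo
    lower = 0.0 if r == 0 else solve(lambda p: 1 - binom_cdf(r - 1, n, p) < alpha)
    upper = 1.0 if r == n else solve(lambda p: binom_cdf(r, n, p) >= alpha)
    return lower, upper

print(exact_limits(1, 10, 0.95))   # about (0.0025, 0.445), as in the text
print(exact_limits(1, 10, 0.99))   # about (0.0005, 0.544)
```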
6). In fact it can be shown that the same result can be obtained by
inspecting the sum of only the positive (or of only the negative) d
values resulting from random allocation of signs to the differences, so it
is not necessary to find the mean each time† (similar situations arose
in §§ 8.2 and 9.2).
Assumptions. Putting the matter a bit more rigorously, it can be
seen that the hypothesis that an observation (d value) is equally
likely to be positive or negative, whatever its magnitude, implies that
the distribution of d values is symmetrical (see § 4.5), with a mean of
zero. The null hypothesis is therefore that the distribution of d values
is symmetrical about zero, and this will be true either if the yA and yB
values have identical distributions (not necessarily symmetrical), or
if the yA and yB values both have symmetrical distributions
(not necessarily identical) with the same mean. This makes it clear
that if the null hypothesis is rejected, then, if it is wished to infer from
this that the distributions of yA and yB have different population means,
it must be assumed either that their distributions both have the same
shape (i.e. are identical apart from the mean), or that they are both
symmetrical.
Note that when the analysis is done by enumerating possible alloca-
tions it is assumed that each is equiprobable, i.e. that an allocation
was picked at random for the experiment, the design of which is therefore
inextricably linked with its analysis (see § 2.3).
If there are n differences (10 in Table 10.1.1) then there are 2ⁿ
possible ways of allocating signs to them (because one difference can
be + or −, two can be ++, +−, −+, or −−, and each time another
is added the number of possibilities doubles). All of these combinations
could be enumerated as in Table 8.2.2 and Fig. 8.2.1, and Tables 9.2.2
and 9.3.1. This is done, using ranks, in § 10.4. In the present example,
however, only the most extreme ones are needed.
Example (1). In the results in Table 10.1.1 there are 9 positive
differences out of 9 (the zero difference, even if included, would have no
effect because the total is the same whatever sign is attached to it).
The number of ways in which signs can be allocated is 2⁹ = 512. The
observed allocation is the most extreme (no other can give a mean of
1·58 or larger), so the chance that it will come up is 1/512. For a two-tail
test (taking into account the other most extreme possibility, all signs
† As before this is because the total of all the differences, taken regardless of sign,
is the same (15·8 in the example) for all randomizations, so specifying the sum of the
negative differences also specifies the mean difference.
negative, see § 6.1), the P value is therefore 2/512 ≈ 0·004. In this
most extreme case (though no other) the result is the same as given by
the sign test (§ 10.2). Consider, for example, what would have happened
if patient 5 had given a negative difference instead of zero. The result
of the randomization test will, unlike that of the sign test, depend on
how large the negative difference is.
Example (2). Suppose that patient 5 had given d = −0·9, the other
patients being as in Table 10.1.1. There are now 2¹⁰ = 1024 possible
ways of allocating signs to the n = 10 differences. How many of these
give a total for the negative differences (see above) equal to or less than
0·9? Apart from the observed allocation, only two: that in which
patient 8 is negative but 5 is positive, giving a sum of negative differ-
ences of 0·8, and that in which all differences are positive, giving a sum
of negative differences of zero. The probability of observing, on the
null hypothesis, a sum of negative differences as extreme as, or more
extreme than, 0·9 is thus 3/1024. For a two-tail test (see § 6.1), therefore,
P = 6/1024 = 0·0059† (see next example for the detailed interpreta-
tion).
Example (3). If, however, patient 5 had had d = −2·0, the mean
difference, d̄, would have been 13·8/10 = 1·38. In this case a sum of
negative differences equal to or less than 2 could arise in ten different
ways, as well as that observed, so P (one tail) = 11/1024 and P
(two tail) = 22/1024 = 0·0215.† The 11 possible ways are (a) all
differences positive (sum = 0), (b) one difference negative (patient
8, 6, 1, 3, 4, 10, 7, or 5) giving a sum of 0·8, 1·0, 1·2, 1·3, 1·3, 1·4, 1·8, or
2·0, depending on which patient has the negative difference, (c) two
differences negative, patients 6 and 8 giving a sum of negative differ-
ences of 1·0+0·8 = 1·8, or patients 1 and 8 giving a sum of 1·2+0·8
= 2·0.
This result means that if the null hypothesis were true then the
probability would be only 0·0215 that the random numbers would
come up, during the allocation of the treatments, in such a way as to
give a sum of negative differences of 2·0 or less (i.e. a mean difference
between B and A of 1·38 or more), or results equally extreme in the
other direction (A giving larger responses than B). This probability
is small enough to make one suspect the null hypothesis (see § 6.1).
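The exhaustive enumeration used in Examples (2) and (3) can be sketched as follows, here for Example (3). The list holds the magnitudes of the ten differences (patient 5's taken as 2·0, negative in the observed allocation); the order of the list is immaterial:

```python
from itertools import product

# Magnitudes of the ten differences; the observed allocation has a
# sum of negative differences of 2.0 (the hypothetical patient 5)
d_abs = [1.2, 2.4, 1.3, 1.3, 2.0, 1.0, 1.8, 0.8, 4.6, 1.4]
observed_neg_sum = 2.0

count = 0
for signs in product((1, -1), repeat=len(d_abs)):   # all 2**10 allocations
    neg_sum = sum(x for s, x in zip(signs, d_abs) if s == -1)
    if neg_sum <= observed_neg_sum:
        count += 1

p_two_tail = 2 * count / 2 ** len(d_abs)
print(count, p_two_tail)   # 11 allocations; P (two tail) = 22/1024
```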
10.4. The Wilcoxon signed-ranks test for two related samples
This test works on much the same principle as the randomization
test in § 10.3 except that ranks are used, and this allows tables to be
constructed, making the test very easy to do. The relation between the
methods of §§ 9.2 and 9.3 for independent samples was very similar.
However, the signed-ranks test, unlike the sign test (§ 10.2) or the
rank test for two independent samples (§ 9.3), will not work with
observations that themselves have the nature of ranks rather than
quantitative numerical measurements. The measurements must be
such that the values of the differences between the members of each
pair can properly be ranked. This would certainly not be possible if
the observations were ranks. If the observations were arbitrary scores
(e.g. for intensity of pain, or from a psychological test) they would
be suitable for this test if it could be said, for example, that a pair
difference of 80-70 = 10 corresponded, in some meaningful way, to a
smaller effect than a pair difference of 25−10 = 15. Siegel (1956a, b)
discusses the sorts of measurement that will do, but if you are in
doubt use the sign test, and keep Wilcoxon for quantitative numerical
measurements. Sections 9.2, 9.3, 10.3 and Chapter 6, should be read
before this section. The precise nature of the assumptions and null
hypothesis have been discussed already in § 10.3.
The method of ranking is to arrange all the differences in ascending
order regardless of sign, rank them 1 to n, and then attach the sign of the
difference to the rank. Zero differences are omitted altogether. Differ-
ences equal in absolute value are allotted mean ranks as shown in
examples (2) and (3) below (and in § 9.2). To use Table A4 find T, which
is either the sum of the positive ranks or the sum of the negative ranks,
whichever sum is smaller. Consulting Table A4 with the appropriate
n and T gives the two-tail P at the head of the column. Examples are
given below. Of course for simple cases the analysis can be done
directly on the ranks as in § 10.3.
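The ranking rules just described can be expressed as a small, hypothetical function; it is applied here to the differences of Table 10.1.1 with patient 5's difference taken as −0·9, as in § 10.3, Example (2):

```python
def wilcoxon_T(d):
    """Wilcoxon signed-rank statistic T: rank |d| (zeros dropped, ties
    given mean ranks), then T = smaller of the two signed rank sums."""
    d = [x for x in d if x != 0]          # zero differences are omitted
    ordered = sorted(d, key=abs)          # ascending order regardless of sign
    ranks = []
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1                        # find the extent of a tied group
        ranks += [(i + 1 + j) / 2] * (j - i)   # mean of ranks i+1 .. j
        i = j
    pos = sum(r for x, r in zip(ordered, ranks) if x > 0)
    neg = sum(r for x, r in zip(ordered, ranks) if x < 0)
    return min(pos, neg)

d = [1.2, 2.4, 1.3, 1.3, -0.9, 1.0, 1.8, 0.8, 4.6, 1.4]
print(wilcoxon_T(d))   # the single negative difference has rank 2, so T = 2
```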
is n(n+1)/2 = 4(4+1)/2 = 10, which checks (3+7 = 10). Thus T = 3,
the smaller of the rank sums. Table A4 indicates that it is not possible
to find evidence against the null hypothesis with a sample as small as
4 differences. This is because there are only 2ⁿ = 2⁴ = 16 different
ways in which the results could have turned out (i.e. ways of allocating
signs to the differences, see § 10.3), when the null hypothesis is true.
TABLE 10.4.1
The 16 possible ways in which a trial on four pairs of subjects could turn
out if treatments A and B were equi-effective, so the sign of each difference
is decided by whether the randomization process allocates A or B to the
member of the pair giving the larger response. For example, on the second
line the smallest difference is negative and all the rest are positive, giving
sum of negative ranks = 1

Sign attached to rank       Sum of        Sum of
1    2    3    4            pos. ranks    neg. ranks    T
+    +    +    +            10            0             0
−    +    +    +            9             1             1
+    −    +    +            8             2             2
+    +    −    +            7             3             3
+    +    +    −            6             4             4
−    −    +    +            7             3             3
−    +    −    +            6             4             4
−    +    +    −            5             5             5
+    −    −    +            5             5             5
+    −    +    −            4             6             4
+    +    −    −            3             7             3
+    −    −    −            1             9             1
−    +    −    −            2             8             2
−    −    +    −            3             7             3
−    −    −    +            4             6             4
−    −    −    −            0             10            0
Therefore, even the most extreme result, all differences positive, would
appear, in the long run, in 1/16 of repeated random allocations of
treatments to members of the pairs. Similarly 4 negative differences
out of 4 would be seen in 1/16 of experiments. The result of a two-tail
test cannot, therefore, be less than P = 2/16 = 0·125 with a sample of
four differences, however large the differences (see, however, §§ 6.1 and
10.5 for further comments). With a small sample like this, it is easy to
illustrate the principle of the method. More realistic examples are
given below.
The 2⁴ = 16 possible ways of allocating signs to the four differences
(i.e. the possible ways in which A and B could have been allocated to
members of a pair, see § 10.3 for a full discussion of this process), are
listed systematically in Table 10.4.1, together with the sums of positive
and negative ranks, and the value of T, corresponding to each allocation.
In Table 10.4.2, the frequencies of these quantities are listed from the
In Table 10.4.2, the frequencies of these quantities are listed from the
TABLE 10.4.2
The relative Jreque:ru;ie8 of observing various vcUU68 of i.e. the distrilntJioM
of, the rank sum, and T, with n = 4 :pairs of obBervatioM when the null
hypothe8iB is true. Ocmstruded, from Table 10.4.1
0 1 1 2
1 1 1 2
2 1 1 2
3 2 2 4
4 2 2 4
5 2 2 2
6 2 2
7 2 2
8 1 1
9 1 1
10 1 1
Total 16 16 16
results in Table 10.4.1, and in Fig. 10.4.1 the distribution of the sum
of positive ranks is plotted (that for negative ranks is identical).
(These are the paired-sample analogues of the rank distributions worked
out for two independent samples in Table 9.2.2 and Fig. 9.3.1.)
Now the observed sum of positive ranks was 3, and the probability
of observing a sum of 3 or less is seen from Table 10.4.2 or Fig. 10.4.1, to
be 5/16. The probability of an equally large deviation from the null
hypothesis in the other direction (sum of positive ranks ≥ 7) is also
5/16. (The distribution is symmetrical, like that in Fig. 9.3.1, unless
there are ties, so the result of a two-tail test is twice that for a one-tail
test. See § 6.1.) The result of a two-tail significance test is therefore
P = 10/16 = 0·625, so there is no evidence against the null hypothesis,
because results deviating from it by as much as, or more than, the
observed amount would be common if it were true. In other words, if
the null hypothesis were true it would be rejected (wrongly) in 62·5
per cent of repeated experiments in the long run, if it were rejected
whenever the sum of positive ranks was 3 or less, or when it was equally
extreme in the other direction (7 or greater). A value of T equal to or
less than the observed value (3) is seen, from Table 10.4.2, to occur in
4+2+2+2 = 10 of the 16 possible random allocations. The probability
(on the null hypothesis) of observing T ≤ 3 is therefore P = 10/16
= 0·625, as before.
FIG. 10.4.1. Distribution of the sum of positive ranks when the null hypothesis
is true, for the Wilcoxon signed-ranks test with four pairs of observations. The
distribution is identical for the sum of negative ranks. Plotted from Table 10.4.2.
This is, as usual, because if the null hypothesis were true, deviations
from it (in either direction) as large as, or larger than, those observed
in this experiment would occur only rarely (P = 0·004), because the
random numbers happened to come up so that all the subjects giving
big responses were given the same treatment.
Example (2). Suppose, however, as in § 10.3, Example (2), that patient
5 had given d = −0·9 instead of zero. When the observations are ranked
regardless of sign the result is as follows:

d      0·8   −0·9   1·0   1·2   1·3   1·3   1·4   1·8   2·4   4·6
rank   1     −2     3     4     5½    5½    7     8     9     10
† This is found by constructing a sum of 8 or less from the integers from 1 to n (= 10),
as was done in calculating Table A4. More properly it should be done with the figures 1, 2, 3,
4½, 4½, 6, 7, 8, 9, 10, and with these there are only 24 ways of getting a sum of 8 or
less.
In this case most of the observed differences are negative. Is the
population mean difference different from zero? The sum of the
negative ranks is 2+5+7+8+9+10+11+12 = 64, and the sum of
the positive ranks is 1+3+4+6 = 14, so T = 14, the smaller rank
sum. An arithmetical check is provided by (9.3.1), which gives n(n+1)/2
= 12(12+1)/2 = 78, and, correctly, 64+14 = 78. Table A4 shows that
when n = 12, a value of T = 14 corresponds to P = 0·05. Only
marginal evidence against the null hypothesis (see § 6.1).
u = [T − n(n+1)/4] / √[n(n+1)(2n+1)/24]    (10.4.2)
In this case the normal approximation gives the same value as the exact
result, P = 0·05, found above from Table A4.
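Assuming the usual form of the normal approximation for T (mean n(n+1)/4 and variance n(n+1)(2n+1)/24 under the null hypothesis), the figures just quoted can be checked:

```python
import math
from statistics import NormalDist

# Normal approximation for the Wilcoxon signed-rank statistic T,
# with n = 12 differences and T = 14 as in the example above
n, T = 12, 14
mean_T = n * (n + 1) / 4                           # 39
sd_T = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # sqrt(162.5)
u = (T - mean_T) / sd_T
p_two_tail = 2 * NormalDist().cdf(u)               # u is negative
print(round(u, 2), round(p_two_tail, 2))           # -1.96 and 0.05
```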
TABLE 10.5.1
Treatment
Block A B Difference
The experiment was designed exactly like that in Table 10.1.1. All the differences
are positive so the three nonparametric tests described in §§ 10.2–10.4 all give
P = 2/2⁴ = 1/8 for a two-tail test. In general, for n differences all with the
same sign, the result would be 2/2ⁿ.
It has been stated that the design of an experiment dictates the form its an-
alysis must take. Selection of particular features after the results have been seen
(data-snooping) can make significance tests very misleading. Methods of dealing
with the sort of data-snooping† problem that arise when comparing more than
two treatments are discussed in § 11.9. Nevertheless, it seems unreasonable to
ignore the fact that in these results, the observations are completely separated,
i.e. the smallest response to A is bigger than the largest response to B, a feature
of the results that has not been taken into account by the paired tests. (In
general, the statistician is not saying that experimenters should not look too
closely at the results of their experiments, but that proper allowance should be
made for selection of particular features.) This feature means that if the results
could be analysed by the two nonparametric methods designed for independent
samples (described in §§ 9.2 and 9.3), both methods would give the probability
of complete separation of the results, if the treatments were actually equi-
effective, as P = 2·n! n!/(2n)! = 2(4! 4!)/8! = 1/35 (two-tail), a far more 'signi-
ficant' result! The naive interpretation of this is that it would have been better
not to do a paired experiment. This is quite wrong. It has been shown by Stone
(1969) that the probability (given the null hypothesis of equi-effectiveness of
A and B) of complete separation of the two groups as observed, would be 1/35
even if there were no differences between the blocks, and even less than 1/35 if
there were such differences. This is not the same as the P = 1/8 found using the
paired tests because it is the probability of a different event. If the null hypo-
thesis were true, then, in the long run, 1 in 8 of repeated experiments would be
expected to show 4 differences all with the same sign out of 4, but only 1 in
35, or fewer, would have no overlap between groups as in this case.
It remains to be decided what should be done when faced with observations such as

† This is statisticians' jargon. 'Data selection' might be better.
those in Table 10.5.1. The snag is, of course, that any single specified arrange-
ment of the results is improbable. If the treatments were equi-effective (and
there were no differences between blocks) any of the 8!/(4! 4!) = 70 possible
arrangements of the eight figures into two groups of 4 would have the same
probability, P = 1/70, of occurring. It is only because the particular arrange-
ment, with no overlap between A and B, corresponds to a preconceived idea,
that it is thought unusual, and what constitutes 'correspondence with a pre-
conceived idea' may be arguable. The problem is an old one:
'... when Dr Beattie observed, as something remarkable which had happened to him,
that he had chanced to see both No. 1 and No. 1000, of the hackney-coaches, the first
and the last; "Why, Sir, (said Johnson,) there is an equal chance for one's seeing those
two numbers as any other two". He was clearly right; yet the seeing of the two extremes,
each of which is in some degree more conspicuous than the rest, could not but strike
one in a stronger manner than the sight of any other two numbers'
(Boswell's Life of Johnson)
The only safe general rule that can be offered at the moment is to analyse the
experiment as a paired experiment if it was designed in that way. In other words
take P = 1/8 in the present case; not much evidence against the null hypothesis.
The problem is, however, a complicated one that is still not fully solved.†
s²[d] = Σ(d − d̄)²/(n − 1) = [38·58 − (15·8)²/10]/(10 − 1) = 1·513
† In some cases, such as this one, when the smallest P value (given, in this case, by
the Wilcoxon two-sample test) is smaller than the P value that the other tests under
consideration can ever reach, however large the difference between the treatments,
Stone (1969) has argued that it is proper to quote the smaller P, i.e. P ≤ 1/35 for the
results in Table 10.5.1.
Stone's method also introduces another factor of 1/2, i.e. he takes P ≤ 1/70, but this
feature has not yet come into wide use.
And the variance of the mean difference is, by (2.7.8),

s²[d̄] = s²[d]/n = 1·513/10 = 0·1513 with 9 degrees of freedom.
This should be carefully distinguished from the variance of the
difference between means found in § 9.4, which was larger (0·7210) and
had more degrees of freedom (18). The disappearance of 9 degrees of
freedom will be explained when the results are looked at from the point
of view of the analysis of variance in § 11.6.
The standard deviation of the mean difference is estimated as

s[d̄] = √0·1513 = 0·3890

and the null hypothesis is that the population (true) mean difference,
μ, is zero. Thus, from (4.4.1),

t = (d̄ − μ)/s[d̄] = d̄/s[d̄] = d̄ / √{[Σd² − (Σd)²/n]/[n(n − 1)]}.    (10.6.1)
In this example

t = 1·58/0·3890 = 4·062.
Referring to a table of the t distribution† with n − 1 = 9 degrees of
freedom shows that a value of t (of either sign) as large as, or larger
than, 4·062 would be seen in less than 0·5 per cent of trials if the null
hypothesis that the population (true) mean difference μ = 0 were
true, and if the assumptions of normality, etc. were true, i.e. P (two
tail) < 0·005. This strongly suggests that the null hypothesis is not in
fact true and that there is a real difference between the means.
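The whole calculation of this section can be condensed into a few lines (a sketch using the differences d = yB − yA of Table 10.1.1):

```python
import math

# Paired t test, eqn (10.6.1), on the differences of Table 10.1.1
d = [1.2, 2.4, 1.3, 1.3, 0.0, 1.0, 1.8, 0.8, 4.6, 1.4]
n = len(d)
d_bar = sum(d) / n                                   # 1.58
var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)   # s2[d], about 1.513
t = d_bar / math.sqrt(var_d / n)                     # null hypothesis mu = 0
print(round(d_bar, 2), round(t, 3))                  # 1.58 and 4.062, 9 d.f.
```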
This result is rather different from that found in § 9.4 and the other
sections in Chapter 9, when the same results were analysed as though
the drugs had been tested on independent samples, and the reasons for
this are discussed in §§ 10.7, 11.2, 11.4, and 11.6. It is in reasonable
agreement with the other analyses in this chapter, but it cannot be
assumed that the t test will always give similar results to the more
assumption-free methods.
As in § 9.4, the same conclusion could be reached by calculating
confidence limits for the mean difference. The 99·5 per cent confidence
† For example, Fisher and Yates (1963), Table 3, or Pearson and Hartley (1966),
Table 12. Only the latter has P = 0·005 values. See § 4.4 for details.
limits for μ would be found not to include zero, the null hypothetical
value, but the 99·9 per cent confidence limits do include zero.
10.7. When will related samples (pairing) be an advantage?
In Chapter 9 the results in Table 9.2.4 and § 10.1 were analysed by
several different methods and in no case was good evidence found
against the hypothesis that the two drugs were equi-effective. The
methods all assumed that the measurements had been made on two
independent samples of ten subjects each. In § 10.1 it was revealed
that in fact the measurements were paired and the same results were
reanalysed making proper allowance for this in §§ 10.2-10.6. It was then
found that the evidence for a difference in effectiveness of the drugs
was actually quite strong. Why is this? On commonsense grounds the
difference between responses to A and B is likely to be more consistent
if both responses are measured on the same subject (or on members of
a matched pair), than if they are measured on two different patients.
This can be made a bit less intuitive if the correlation between the
two responses from each pair is considered. The correlation coefficient,
r (which is discussed in § 12.9, q.v.), is a standardized measure of the
extent to which one value (say yA) tends to be large if the other (yB) is
large (in so far as the tendency is linear, see § 12.9). It is closely related
to the covariance (see § 2.6), the sample estimate of the correlation
coefficient being r = cov[yA, yB]/(s[yA]·s[yB]).
Now in § 9.4 the variance of the difference between two means was
found as s²[ȳA − ȳB] = s²[ȳA] + s²[ȳB] using (2.7.3), which assumes that
the two means are not correlated (which will be so if the samples are
independent). When the samples are not independent the full equation
(2.7.2) must be used, viz.

s²[d̄] = s²[ȳB − ȳA] = s²[ȳA] + s²[ȳB] − 2 cov[ȳA, ȳB].†
Using the correlation coefficient (r) this can be written

s²[d̄] = s²[ȳA] + s²[ȳB] − 2r·s[ȳA]·s[ȳB].   (10.7.1)
(This expression should ideally contain the population correlation
coefficient. If an experiment is carried out on properly selected in-
dependent samples, this will be zero, so the method given in § 9.4, which
ignores r, is correct even if the sample correlation coefficient is not
exactly zero.)
These relationships show that if there is a positive correlation between
† There are equal numbers in each sample so ȳA − ȳB = d̄, the mean difference.
the two responses of a pair (r > 0) the variability of the difference
between means will be reduced (by subtraction of the last term in
(10.7.1)), as intuitively expected. In the present example r = +0·8,
and s²[ȳA − ȳB] is reduced from 0·7210 when the correlation is ignored
(§§ 9.4 and 11.4), to 0·1513 when it is not (§§ 10.6 and 11.6). The
correct value is 0·1513, of course.
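The relation (10.7.1) can be verified numerically. The sketch below uses the paired responses as transcribed from Table 10.1.1, and computes the variance of the mean difference both from (10.7.1) and directly from the differences; the two agree exactly, and r comes out at about +0·8 as quoted above:

```python
from math import sqrt

yA = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
yB = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]
n = len(yA)

def mean(v): return sum(v) / len(v)
def var(v):  return sum((x - mean(v)) ** 2 for x in v) / (len(v) - 1)

cov = sum((a - mean(yA)) * (b - mean(yB)) for a, b in zip(yA, yB)) / (n - 1)
r = cov / sqrt(var(yA) * var(yB))      # sample correlation, about +0.8

# variance of the mean difference two ways: from eqn (10.7.1) ...
var_dbar_eqn = var(yA) / n + var(yB) / n - 2 * cov / n
# ... and directly from the differences d = yB - yA
d = [b - a for a, b in zip(yA, yB)]
var_dbar_direct = var(d) / n           # 0.1513, as in section 10.6
```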
Although correlation between observations is a useful way of looking
at the problem of designing experiments so as to get the greatest
possible precision, this approach does not extend easily to more than
two groups and it does not make clear the exact assumptions involved
in the test. The only proper approach to the problem is to make clear the
exact mathematical model that is being assumed to describe the
observations, and this is discussed in § 11.2.
11. The analysis of variance. How to
deal with two or more samples

'... when I come to "Evidently" I know that it means two hours hard work at
least before I can see why.'
W. S. GOSSET ('Student'), in a letter dated June 1922, to R. A. FISHER
assumption (among others, see § 11.2) that experimental errors follow a
normal (Gaussian) distribution (see § 4.2), but the nonparametric
methods of §§ 11.5 and 11.7 should be used in preference to the Gaussian
ones usually (see § 6.2). Unfortunately, nonparametric methods are not
available (or at least not, at the moment, practicable) for analysis of
the more complex and ingenious experimental designs (see § 11.8 and
later chapters) that have been developed in the context of normal
distribution theory. For this reason alone most of the chapters following
this one will be based on the assumption of a normal distribution (see
§§ 4.2 and 11.2).
When comparing two groups, the difference between their means or
medians was used to measure the discrepancy between the groups.
With more than two it is not obvious what measure to use and because
of this it will be useful to describe the normal theory tests (in which a
suitable measure is developed) before the nonparametric methods.
This does not mean that the former are to be preferred. Tests of
normality are discussed in § 4.6.
The additive model
These assumptions can be summarized by saying that the ith
observation in the jth group (e.g. jth drug in § 9.4) can be represented
as y_ij = μ_j + e_ij (i = 1, 2,..., n_j, where n_j is the number of observations
in the jth group; reread § 2.1 if the meaning of the subscripts is not
clear). In this expression the μ_j (constants) are the population mean
responses for the jth group, and e_ij, a random variable, is the error
of the individual observation, i.e. the difference between the individual
observation and the population mean. It is assumed that the e_ij are
independent of each other and normally distributed with a population
mean of zero (so in the long run† the mean y_ij = μ_j) and standard
deviation σ. Usually the population mean for the jth group, μ_j, is written
as μ + τ_j where μ is a constant common to all groups and τ_j is a con-
stant (the treatment effect) characteristic of the jth group (treatment).
The model is therefore usually written

y_ij = μ + τ_j + e_ij.   (11.2.1)
The paired t test (§§ 10.6 and 11.6), and its k sample analogue
(§ 11.6), need a more elaborate model. The model must allow for the
possibility that, as well as there being systematic differences between
samples (groups, treatments), there may also be systematic differences
between the patients in § 9.4, i.e. between blocks in general (see § 11.6).
The analyses in § 11.6 assume that the observation on the jth sample
(group, treatment) in the ith block can be represented as

y_ij = μ + β_i + τ_j + e_ij   (11.2.2)

where μ is a constant common to all observations, β_i is a constant
characteristic of the ith block, τ_j is a constant, as above, characteristic
of the jth sample (treatment), and e_ij is the error, a random variable,
values of which are independent of each other and are normally distri-
buted with a mean of zero (so the long run average value of y_ij is
μ + β_i + τ_j), and standard deviation σ. This model is a good deal more
restrictive than (11.2.1) and its implications are worth looking at.
Notice that the components are supposed to be additive. In the case
of the example in § 10.6, this means that the differences between the
responses of a pair (block in general) to drug A and drug B are supposed
to be the same (apart from random errors) on patients who are very
sensitive to the drugs (large β_i) as on patients who tend to give smaller
† In the notation of Appendix 1, E(e) = 0 so E(y) = E(μ_j) + E(e) = μ_j from (11.2.1).
And E(y) = E(μ) + E(β_i) + E(τ_j) + E(e) = μ + β_i + τ_j from (11.2.2).
responses (small β_i). Likewise the difference in response between
any two patients who receive the same treatment, and are therefore in
different pairs or blocks, is supposed to be the same whether they
receive a treatment (e.g. drug) producing a large observation (large τ_j)
or a treatment producing a small observation (small τ_j). These remarks
apply to differences between responses. It will not do if drug A always
gives a response larger than that to drug B by a constant percentage,
for example.
Consider the first two pairs of observations in Table 10.1.1. In
the notation just defined (see also § 2.1) they are y_11 = +0·7, y_12
= +1·9, y_21 = −1·6, and y_22 = +0·8. The first difference is assumed,
from (11.2.2), to be

y_12 − y_11 = (μ + β_1 + τ_2 + e_12) − (μ + β_1 + τ_1 + e_11)
= (τ_2 − τ_1) + (e_12 − e_11) = +1·2.

That is to say, apart from experimental
error, it measures only the difference between the two treatments
(drugs), viz. τ_2 − τ_1, whatever the value of β_1. Similarly, the second
difference is y_22 − y_21 = (τ_2 − τ_1) + (e_22 − e_21) = 2·4, which is also an
estimate of exactly the same quantity, τ_2 − τ_1, whatever the value of β_2,
i.e. whatever the sensitivity of the patient to the drugs.
Looking at the difference in response to drug A (treatment 1) between
patients (blocks) 1 and 2 shows

y_11 − y_21 = (μ + β_1 + τ_1 + e_11) − (μ + β_2 + τ_1 + e_21)
= (β_1 − β_2) + (e_11 − e_21) = +0·7 − (−1·6) = +2·3,

and similarly for drug B, y_12 − y_22 = (β_1 − β_2) + (e_12 − e_22) = 1·9 − 0·8 = 1·1.
Apart from experimental error, both estimate only the difference
between the patients, which is assumed to be the same whether the
treatment is effective or not.
The best estimate, from the experimental results, of τ_2 − τ_1 will be
the mean difference, d̄ = 1·58 hours. If the treatment effect is not the
same in all blocks then block × treatment interactions are said to be
present, and a more complex model is needed (see below, § 11.6 and,
for example, Brownlee (1965, Chapters 10, 14, and 15)).
This additive model is completely arbitrary and quite restrictive.
There is no reason at all why any real observations should be repre-
sented by it. It is used because it is mathematically convenient. In the
case of paired observations the additivity of the treatment effect can
easily be checked graphically because the pair differences should be
a measure of τ_2 − τ_1, as above, and should be unaffected by whether
the pair (patient in Table 10.1.1) is giving a high or a low average
response. Therefore a plot of the difference, d = yA − yB, against the
total, yA + yB, or equivalently, the mean, for each pair should be, apart
from random experimental errors, a straight horizontal line. This
plot, for the results in Table 10.1.1, is shown in Fig. 11.2.1. No system-
atic deviation from a horizontal line is detectable with the available
results but there are not enough observations to provide a good test
of the additive model. For methods of checking additivity in more
complex experiments see, for example, Bliss (1967, pp. 323-41).
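The plot of Fig. 11.2.1 can be summarized numerically by the least-squares slope of the pair differences on the pair totals; under the additive model the slope should be near zero. A minimal sketch, with the pairs transcribed from Table 10.1.1:

```python
# Pair difference d = yB - yA against pair total yA + yB:
# under the additive model the fitted slope should be near zero.
yA = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
yB = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

d = [b - a for a, b in zip(yA, yB)]
tot = [a + b for a, b in zip(yA, yB)]
n = len(d)

# ordinary least-squares slope of d on tot
sxy = sum(x * y for x, y in zip(tot, d)) - sum(tot) * sum(d) / n
sxx = sum(x * x for x in tot) - sum(tot) ** 2 / n
slope = sxy / sxx    # about 0.06: no marked trend in this small sample
```

With so few pairs this is only a rough check, as the text says; a formal test of additivity needs more observations.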
"
3
::
~ ®
2
®
®
® ® ®
®
®
0
-2 -I
.0 2 5 6 7 8 9
3
" Y.+YA
10
FIG. 11.2.1. Test of additive model for two way analysis of variance with
two samples (i.e. paired t test). Pair differences are plotted against pair sums
(or pair means).
Homogeneity of errors
In the models for the Gaussian analysis of variance all random errors
are pooled into the single quantity, represented by e in (11.2.1) and
(11.2.2), which is supposed to be normally distributed with a mean of
zero and a variance of σ². In other words, if observations could be
made repeatedly using a given treatment (e.g. drug) and block (e.g.
patient) the scatter of the results would be the same whatever the
size of the observation and whatever treatment was applied. This means
that the scatter of the observations must be the same for every group
(sample, treatment) for experiments with independent samples,
represented by (11.2.1).
To test whether the variance estimates calculated from each group
can reasonably all be taken to be estimates of the same population
value, a quick test is to calculate the ratio of the largest variance to
the smallest one, s²max/s²min. This test assumes that the k samples are
independent and that the variation within each follows a normal
distribution. Under these conditions the distribution of s²max/s²min is
known (when k = 2 it is the same as the variance ratio, see § 11.3). For
the results in Table 11.4.1 it is seen that k = 4, s²max/s²min = 158·14/
34·29 = 4·61, and each group variance is based on 7−1 = 6 degrees of
freedom. Reference to tables (e.g. Biometrika tables of Pearson and
Hartley (1966, pp. 63-7 and Table 31)) shows that a value of s²max/
s²min of 10·4 or larger would be expected to occur in 5 per cent of
repeated experiments if the 4 independent samples of 7 observations
were all from a single normally distributed population, and therefore
all had the same population variance (the null hypothesis). Thus
P > 0·05 and there are no grounds for believing that the population
variance is not the same for all groups, though the test is not very
sensitive. The tables only deal with the case of k groups of equal size.
If the sizes are not too unequal the average number of degrees of
freedom can be used to get an approximate result.
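A sketch of the quick test, using the four group variances quoted in Table 11.4.1:

```python
# Largest/smallest group variance for the four groups of Table 11.4.1,
# using the variances as quoted in the table
variances = [102.95, 158.14, 34.29, 109.24]
ratio = max(variances) / min(variances)
# about 4.6, well below the tabulated 5 per cent point (10.4)
# for k = 4 groups, each variance on 6 degrees of freedom
```

The tabulated 5 per cent point itself still has to be looked up (e.g. in the Biometrika tables cited above).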
Use of transformations
If the original observations do not satisfy the assumptions, some
function of them may do so, although you will be lucky if you have
enough observations to find out which function. Aspects of this problem
are discussed in Bliss (1967, pp. 323-41), and in §§ 4.2, 4.5, 4.6, and 12.2.
For example, suppose the observations (1) were known to be log-
normally distributed (see § 4.5), and (2) were represented by a multi-
plicative model (e.g. one treatment always giving say 50 per cent greater
response than another, rather than a fixed increment in response), and
(3) had standard deviations that were not constant, but which were
proportional to the treatment mean (i.e. each treatment had the same
coefficient of variation, eqn (2.6.4)). In this case the logarithms of the
observations would be normally distributed with constant standard
deviation, and would be represented by an additive model. The
constancy of the standard deviation follows from eqn (2.7.14). Therefore
the logarithm of each observation would be taken before doing any
calculations.
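The effect of the transformation on point (3) can be illustrated with two hypothetical groups constructed to have the same coefficient of variation (the second is simply five times the first; the figures are invented for illustration):

```python
from math import log
from statistics import stdev

# Two hypothetical treatment groups with the same coefficient of
# variation: the second is five times the first, so its standard
# deviation is five times larger (SD proportional to the mean).
g1 = [10.0, 12.0, 8.0, 11.0, 9.0]
g2 = [5 * x for x in g1]

sd_ratio_raw = stdev(g2) / stdev(g1)    # 5: unequal scatter
sd_ratio_log = stdev([log(x) for x in g2]) / stdev([log(x) for x in g1])
# after taking logs: log(5x) = log 5 + log x, and an additive shift
# leaves the standard deviation unchanged, so the ratio becomes 1
```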
If the standard deviation for each treatment group is plotted against
the mean as in Fig. 11.2.2 the line should be roughly horizontal.
This can be tested rapidly using s²max/s²min as above, given normality.
If it is a straight line passing through the origin then the coefficient
of variation is constant and the logs of the observations will have
approximately constant standard deviation as just described. If the
line is straight, but does not pass through the origin, as shown in Fig.
11.2.2, then y_0 = a/b (where a is the intercept and b the slope of the
to an assertion that the scale adopted will give the required additivity
etc.' A good discussion is given by Bliss (1967, pp. 323-41).
11.3. The distribution of the variance ratio F

This section describes the generalization of Student's t distribution
(§ 4.4) that is necessary to extend the two-sample normal distribution-
based tests (see §§ 9.4 and 10.6) to more than two (k in general) samples
of observations with normally distributed errors (see §§ 11.4 and 11.6).
The t tests of §§ 9.4 and 10.6 will be seen to be special cases of the
more general methods.

In the t test for two samples the
discrepancy between the samples was measured by the difference
between the sample means, and if this was large enough, compared
with experimental errors, the null hypothesis that the two population
means were equal, i.e. that both samples were samples from the same
parent population, with variance σ², was rejected. If it is required to
test whether more than two samples are all from the same population
(which must therefore have a single mean and variance), some measure of the
discrepancy between more than two means is needed in place of the
difference that was used for two means. The scatter of a normal distribution
is measured by its
standard deviation, and it turns out that the standard deviation of the
k sample means is a suitable generalization of the difference between
two means.
The sensibleness of this is made apparent when it is realized that the
difference between two figures is their standard deviation, apart from a
constant factor. Consider two observations, y_1 and y_2. What is their
variance? The sum of squared deviations from their mean, ȳ = (y_1 + y_2)/2, is

Σ(y − ȳ)² = (y_1 − ȳ)² + (y_2 − ȳ)² = (y_1² + y_2² − 2y_1y_2)/2 = (y_1 − y_2)²/2,

so the estimated variance of the two figures (based on 1 degree of freedom) is

s² = (y_1 − y_2)²/2.   (11.3.1)

The standard deviation of the two figures is the square root of this.
ȳ_1 and ȳ_2 are assumed to be independent, and both are assumed to
have the same variance, of which an estimate s²[ȳ] based on f degrees of
freedom is available (calculated, for example, from the variability
within groups as in § 9.4 if ȳ_1 and ȳ_2 were group means). The estimated
variance of ȳ_1 − ȳ_2 is therefore, by (2.7.3), s²[ȳ_1 − ȳ_2] = s²[ȳ_1] + s²[ȳ_2]
= 2s²[ȳ], so s[ȳ_1 − ȳ_2] = √(2s²[ȳ]). Using these values gives

t = (ȳ_1 − ȳ_2)/s[ȳ_1 − ȳ_2] = (ȳ_1 − ȳ_2)/√(2s²[ȳ])
  = s_m[ȳ]/s[ȳ], by (11.3.1),   (11.3.2)

where s_m[ȳ] = √[(ȳ_1 − ȳ_2)²/2] is the standard deviation of the two means.
Squaring this gives

t² = s_m²[ȳ]/s²[ȳ],   (11.3.3)

a ratio of two estimates of the variance of means when the null
hypothesis is true.
Digitized by Google
§ 11.3 How to deal with two or more 8amplu lSI
Imagine repeated samples drawn from a single population of normally
distributed observations. From a sample of f_1 + 1 observations the
sample variance, s_1², is calculated as an estimate of the population
variance (with f_1 degrees of freedom). Another independent sample
of f_2 + 1 observations is drawn from the same population and its sample
variance, s_2², is also calculated. The ratio, F, of these two estimates
of the population variance is calculated. If this process were repeated
very many times the variability of the population of F values so
FIG. 11.3.1. Distribution of the variance ratio when there are 4 degrees of
freedom for the estimate of σ² in the numerator and 6 degrees of freedom for
that in the denominator. In 10 per cent of repeated experiments in the long run,
the ratio of two such estimates of the same variance (null hypothesis true) will be
3·18 or larger. The mode of the distribution is less than 1, and the mean is greater
than 1.
generated could be found. Its form when f_1 = 4
and f_2 = 6 is shown in Fig. 11.3.1. Reference to the tables shows that
in 10 per cent of repeated experiments F will be 3·18 or larger, as
illustrated. The distribution has a different shape for different numbers
of degrees of freedom, but it is always positively skewed so mode and
mean differ (see § 4.5). Since numerator and denominator are estimates
of the same quantity, values of F would be expected to be around one.
As in the case of χ² (see § 8.5), deviations from the null hypothesis in
either direction tend to increase the value of F (because squaring
makes all deviations positive), so the area in one tail of the F distribu-
tion, as in Fig. 11.3.1, is appropriate for a two-tail test (see § 6.1) of
significance in the analysis of variance. This can be seen clearly in the
case of the t test. In § 9.4 it was found that the probability that t with
18 degrees of freedom will
be either less than −2·101 or greater than +2·101 is 0·05. Either of
these possibilities implies that t² ≥ 2·101², i.e. F(1,18) ≥ 4·41.
Reference to the tables of F with f_1 = 1 and f_2 = 18 shows that
F = 4·41 cuts off 5 per cent of the area in the upper tail of the distribu-
tion, the same result as the two-tail t test.
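The correspondence can be illustrated by simulation. The sketch below draws repeated pairs of samples of 10 from one normal population (so the null hypothesis is true) and counts how often |t| ≥ 2·101, which is exactly the same event as t² = F(1,18) ≥ 4·41:

```python
import random
from math import sqrt

random.seed(0)

def two_sample_t():
    # two samples of 10 from the same normal population: null
    # hypothesis true, so t has Student's distribution with 18 d.f.
    a = [random.gauss(0, 1) for _ in range(10)]
    b = [random.gauss(0, 1) for _ in range(10)]
    ma, mb = sum(a) / 10, sum(b) / 10
    ssd = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    s2 = ssd / 18                       # pooled error variance
    return (ma - mb) / sqrt(2 * s2 / 10)

trials = 20000
ts = [two_sample_t() for _ in range(trials)]
p_two_tail = sum(abs(t) >= 2.101 for t in ts) / trials     # about 0.05
p_one_tail_F = sum(t * t >= 2.101 ** 2 for t in ts) / trials  # the same
```

The two proportions agree because the events are identical; both settle near 0·05 as the number of trials grows.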
the blood sugar level (mg/100 ml) of 28 rabbits shown in Table 11.4.1.
As usual, the rabbits are supposed to be randomly drawn from a
population of rabbits, and divided into four groups in a strictly random
way (see § 2.3). One of the four treatments (e.g. drug, type of diet, or
environment) is assigned to each group. Is there any evidence that the
treatments affect the blood sugar level? Or, in other words, do the
TABLE 11.4.1

Blood sugar level, (mg/100 ml) − 100, in four groups of seven rabbits.
See § 2.1 for explanation of notation. The figures in parentheses are the
ranks and rank sums for use in § 11.5

                        Treatment (j)
             1            2            3            4
         17 (10½)     37 (25)      35 (22½)      9 (5)
         16 (9)       36 (24)      22 (15)       8 (4)
         28 (18)      21 (13½)     35 (22½)     17 (10½)
          4 (3)       13 (7)       38 (26)      18 (12)
         21 (13½)     45 (28)      31 (19)       1 (2)
          0 (1)       23 (16½)     34 (20½)     34 (20½)
         23 (16½)     13 (7)       40 (27)      13 (7)

Total T.j      109 (71½)    188 (121)    235 (152½)   100 (61)
Mean ȳ.j        15·571       26·857       33·571       14·286
Variance s²_j   102·95       158·14        34·29       109·24

Grand total G = Σ Σ y_ij = 632;  grand mean ȳ.. = 632/28 = 22·5714
four mean levels differ by more than reasonably could be expected if the
null hypothesis that all 28 observations were randomly selected from
the same population (so the population means are identical) were true?
The assumptions concerning normality, homogeneity of variances,
and the model involved in the following analysis have been discussed in
§ 11.2 which should be read before this section. Although the largest
group variance in Table 11.4.1 is 4·6 times the smallest, this ratio is not
large enough to provide evidence against homogeneity of the variances,
as shown in § 11.2. Tests of normality are discussed in § 4.6.
The nonparametric method described in § 11.5 should generally be
preferred to the methods of this section (see § 6.2).
The following discussion applies to any results consisting of k
independent groups, the number of observations in the jth group being
n_j (the groups do not have to be of equal size). In this case k = 4 and
all n_j = 7. The observation on the ith rabbit and jth treatment is
denoted y_ij. The total and mean responses for the jth treatment are
T.j and ȳ.j. See § 2.1 for explanation of the notation.
The arithmetic has been simplified by subtracting 100 from every
observation. This does not alter the variances calculated in the analysis
(see (2.7.7)), or the differences between means.
The problem is to find whether the means (ȳ.j) differ by more than
could be expected if the null hypothesis were true. There are four
sample means and, as discussed in § 11.3, the extent to which they differ
from each other (their scatter) can be measured by calculating the
variance of the four figures.
The mean of the four figures is, in this case, the grand mean of all
the observations (this is true only when the number of observations is
the same in each group). The sum of squared deviations (SSD) is

Σ_{j=1}^{4} (ȳ.j − ȳ..)² = (15·571 − 22·571)² + (26·857 − 22·571)²
+ (33·571 − 22·571)² + (14·286 − 22·571)²
= 257·02.
Is this figure larger than would be expected 'by chance', i.e. if the
null hypothesis were true? It would be zero if all treatments had
resulted in exactly the same blood sugar level, but of course this
result would not be expected in an experiment, even if the null hypo-
thesis were true. However, the result that would be expected can be
predicted, because the null hypothesis is that all the observations come
from a single Gaussian population. If the true mean and variance of
this hypothetical population were μ and σ² then group means, which are
means of 7 observations, would be predicted (from (2.7.8)) to have
variance σ²/7. If another estimate of σ², independent of differences
between treatments, were obtainable then it would be possible to see
whether this prediction was fulfilled. How this is done will be seen
shortly.
With greater generality, suppose that k groups of n. observations are
to be compared; in the example k = 4 and n. = 7. If all the
observations were from a single population with variance σ² then the
variance of each group mean should be σ²/n., so the scatter of the k means
about the grand mean should provide an estimate of σ²/n. calculated
from differences between groups. Multiplying through by n., it therefore
follows that

n. Σ_{j=1}^{k} (ȳ.j − ȳ..)²/(k − 1)   (11.4.1)
summing the squares of deviations (SSD) of the individual observations
in a group from the mean for that group. Thus, for the jth group, the
SSD is Σ_{i=1}^{n_j} (y_ij − ȳ.j)², as in (2.1.8), and pooling over all k groups gives

Σ_{j=1}^{k} Σ_{i=1}^{n_j} (y_ij − ȳ.j)²/(N − k)   (11.4.2)

where N = Σ_{j=1}^{k} n_j is the total number of observations. This is the
required estimate of σ² calculated from differences within groups. In
the present example its value is 2427·714/(28−4) = 101·155. An easy
method of calculating the numerator is given below.
Furthermore, if all N observations were from the same population,
σ² could be estimated from the sum of squared deviations (SSD) of all
28 observations in this example from their grand mean.
Thus, using (2.6.5),

Total SSD = Σ Σ (y_ij − ȳ..)² = Σ Σ y_ij² − (Σ Σ y_ij)²/N   (11.4.3)
= (17² + 16² + ... + 34² + 13²) − (632)²/28.
on the null hypothesis, improbably large. It is shown below that the
sums of squares in the numerators of (11.4.1) and (11.4.2) add up to
the total sum of squares (11.4.3). Furthermore, the number of degrees
of freedom for between groups (11.4.1) comparisons, k−1, and that for
within groups (11.4.2) comparisons, N−k, add up to the total number
of degrees of freedom, N−1.

TABLE 11.4.2

Source of        d.f.   Sum of squares              Mean square        Variance ratio
variation
Between groups   k−1    n. Σ_j (ȳ.j − ȳ..)²         SSD/d.f. = MS_b    F = MS_b/MS_e
Within groups    N−k    Σ_j Σ_i (y_ij − ȳ.j)²       SSD/d.f. = MS_e
Total            N−1    Σ_j Σ_i (y_ij − ȳ..)²
TABLE 11.4.3

Analysis of the rabbit blood sugar level observations in Table 11.4.1

Source of      d.f.         Sum of     Mean square              Variance ratio           P
variation                   squares
Between        4−1 = 3      1799·143   1799·143/3 = 599·71      599·71/101·155 = 5·93    0·005-0·001
treatments
Within         28−4 = 24    2427·714   2427·714/24 = 101·155
treatments
Total          28−1 = 27    4226·857
If the null hypothesis were true 599·71 and 101·16 would both be
estimates of the same variance (σ²). Whether this is plausible or not is
found by referring their ratio, 5·93, to tables of the distribution of
F (see § 11.3) with f_1 = 3 and f_2 = 24 degrees of freedom. This
shows (see Fig. 11.3.1) that a value of F(3,24) = 5·93 would be exceeded
in only about 0·5 per cent of experiments in the long run (if the assump-
tions made are satisfied). Therefore, unless it is preferred to believe
that a 1 in 200 chance has come off, the premise that both 599·71 and
101·16 are estimates of the same variance will be rejected, and this
implies that all the observations cannot have come from the same
population, i.e. that the treatments really differ in their effect on blood
sugar level (see § 6.1).
Notice that this does not say anything about which treatments
differ from which others: whether all treatments differ, or whether
three are the same and one different, for example. The answering of this
question raises some problems, and a method of doing it is discussed in
§ 11.9. It is not correct to do t tests on all possible pairs of groups.
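The whole of Table 11.4.3 can be reproduced in a few lines. The observations below are as transcribed from Table 11.4.1; the totals, sums of squares, and F that they give agree with the figures quoted in the text, which is some check on the transcription:

```python
# One-way analysis of variance for the four groups of Table 11.4.1
groups = [
    [17, 16, 28, 4, 21, 0, 23],    # treatment 1
    [37, 36, 21, 13, 45, 23, 13],  # treatment 2
    [35, 22, 35, 38, 31, 34, 40],  # treatment 3
    [9, 8, 17, 18, 1, 34, 13],     # treatment 4
]
N = sum(len(g) for g in groups)    # 28
G = sum(sum(g) for g in groups)    # grand total, 632
k = len(groups)

ss_total = sum(x * x for g in groups for x in g) - G ** 2 / N
ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N
ss_within = ss_total - ss_between

# F = (between-groups MS)/(within-groups MS) = 599.71/101.155
F = (ss_between / (k - 1)) / (ss_within / (N - k))
```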
(b) the second term: again substituting the definition of ȳ.j shows
that the second term is 2ȳ.. Σ n_j ȳ.j = 2ȳ.. Σ T.j = 2ȳ.. G = 2G²/N,
because Σ T.j = sum of group totals = grand total, G, and ȳ.. = G/N;
(c) the third term: because the sum of the group numbers Σ n_j = N,
the total number of observations, and ȳ.. = G/N, the third term is
ȳ..² Σ n_j = Nȳ..² = G²/N.
Substitution of these three results in (11.4.4) gives

Σ_j n_j (ȳ.j − ȳ..)² = Σ_j T.j²/n_j − G²/N = 1799·143.

Σ_{i=1}^{n_j} (y_ij − ȳ..)² = Σ_{i=1}^{n_j} [(y_ij − ȳ.j) + (ȳ.j − ȳ..)]²
= Σ_{i=1}^{n_j} [(y_ij − ȳ.j)² + 2(y_ij − ȳ.j)(ȳ.j − ȳ..) + (ȳ.j − ȳ..)²].

The last step in this derivation follows from (2.6.1), which shows that Σ_i (y_ij − ȳ.j)
= 0, so the central term disappears.
Total SSD = SSD between groups + SSD within groups.

This is a purely algebraic result and must hold for any set of numbers, but
unless the observations can really be represented by the postulated model
(see § 11.2) the components will not have the simple interpretation implied in the
'source of variation' column of Table 11.4.2.
The t test on the results of Cushny and Peebles written as an analysis of
variance

The calculations in § 9.4 can usefully be written as an analysis of
variance on the lines just described, with k = 2 independent groups.
The results and necessary totals are given in Table 10.1.1. Again refer
to § 2.1 if in doubt about the notation.
The first step is usually to calculate G²/N as it appears several times
in the calculations. This quantity is often called the correction factor
for the mean, because, from (2.6.5), it corrects Σy² to Σ(y − ȳ)².
From Table 10.1.1:

(a) correction factor G²/N = (30·8)²/20 = 47·4320;   (11.4.8)

(b) total sum of squares, from (2.5.6) (cf. (11.4.3)),

Σ_{j=1}^{2} Σ_{i=1}^{10} (y_ij − ȳ..)² = Σ Σ y_ij² − G²/N
= 0·7² + 1·6² + ... + 3·4² − 47·432
= 77·3680;   (11.4.9)
(c) the sum of squares between columns (i.e. between drugs A and B),
calculated from the working formula (11.4.5), is

Σ_{j=1}^{2} n_j (ȳ.j − ȳ..)² = T.1²/n_1 + T.2²/n_2 − G²/N
= 7·5²/10 + 23·3²/10 − 47·432
= 12·4820;   (11.4.10)

and, as above, when divided by its number of degrees of freedom
this would give an estimate of σ² if the null hypothesis (that all
observations are from a single population with variance σ²)
were true;
(d) the sum of squares within groups can now be found by difference,
as in (11.4.6),

Σ_{j=1}^{2} Σ_{i=1}^{10} (y_ij − ȳ.j)² = 77·3680 − 12·4820 = 64·8860.
TABLE 11.4.4

Source of variation       d.f.   Sum of squares   MS        F       P
Between drugs              1     12·4820          12·4820   3·463   0·1-0·05
Error (or within drugs)   18     64·8860           3·6048
Total                     19     77·3680
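The same analysis can be sketched in Python, using the two columns of Table 10.1.1 as transcribed earlier; F comes out at about 3·46, the square of the two-sample t of § 9.4:

```python
# Unpaired (k = 2 groups) analysis of variance for Table 10.1.1
yA = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
yB = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]
N = 20
G = sum(yA) + sum(yB)                                  # 30.8

cf = G ** 2 / N                                        # 47.432, (11.4.8)
ss_total = sum(x * x for x in yA + yB) - cf            # 77.368, (11.4.9)
ss_drugs = sum(yA) ** 2 / 10 + sum(yB) ** 2 / 10 - cf  # 12.482, (11.4.10)
ss_error = ss_total - ss_drugs                         # 64.886

F = (ss_drugs / 1) / (ss_error / 18)                   # about 3.46
```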
cannot be prepared to facilitate the test, which will therefore be tedious
(though simple) to calculate without an electronic computer. As in
§ 9.3, this can be overcome, with only a small loss in sensitivity, by
replacing the original observations by their ranks, giving the Kruskal-
Wallis method described below.
The principle of the randomization method is exactly as in §§ 9.2,
9.3, and 8.2 so the arguments will not all be repeated. If all four treat-
ments were equi-effective then the observed differences between group
means in Table 11.4.1, for example, must be due solely to the way the
random numbers come up in the process of random allocation of a
treatment to each rabbit. Whether such large (or larger) differences
between treatment means are likely to have arisen in this way is
again found by finding the differences between the treatment means
that result from all possible ways of dividing the 28 observed figures
into 4 groups of 7. On the assumption that, when doing the experiment,
all ways were equiprobable, i.e. that the treatments were allocated
strictly at random, the value of P is once again simply the proportion
of possible allocations that give rise to discrepancies between treatment
means as large as (or larger than) the observed differences. In § 9.2 the
discrepancy between two means was measured by the difference
between them. As explained in §§ 11.3 and 11.4, when there are
more than two (say k) means, an appropriate thing to do is to measure
their discrepancy by the variance of the k figures, i.e. by calculating,
for each possible allocation, the 'between treatments sum of squares' as
described in § 11.4.
An approximation to the answer could be obtained by card shuffling,
as in § 8.2. The 28 observations from Table 11.4.1 would be written on
cards. The cards would then be shuffled, dealt into four groups of
seven, and the 'between treatments sum of squares' calculated. This
would be repeated until a reasonable estimate was obtained of the
proportion (P) of shufflings giving a sum of squares equal to or larger
than the value observed in the experiment. In fact, just as in § 9.2
it was found to be sufficient to calculate the total response for the
smaller group, so, in this case, it is sufficient to calculate Σ T.j²/n_j for
each possible allocation, because once this is known the between
treatments sum of squares, or between-treatments F ratio, follows from
the fact that the total sum of squares is the same for every possible
allocation.
By a slight extension of (3.4.3), the number of possible allocations of
N objects into k groups of size n_1, n_2, ..., n_k (Σ n_j = N) is N!/(n_1! n_2! ... n_k!).
For the results in Table 11.4.1 there are therefore 28!/(7!)⁴ =
472 518 347 558 400 possible allocations. This is too many to
enumerate by hand (doing one every 5 minutes it would take about 20
thousand million normal working years), though it is easy to select a
sufficiently large random sample of them with a computer, if one is available.
The recommended procedure when a computer is not to hand
for any number of independent samples is, just as in § 9.3, to replace the
observations by ranks, allowing tables to be constructed. This is known
as the Kruskal-Wallis method.
The k sample randomization analysis of variance

This is simply an extension to more than two (k, say) groups of
the Wilcoxon two-sample test (see above, § 9.3). As before, the null
hypothesis is that all N observations come from the same population,
and if this is rejected the conclusion will be that the populations differ.
If it is wished to conclude that the population medians differ then it must
be assumed that the underlying distributions (before ranking) for all
groups are the same, apart from their location.
[Table: scores and ranks for three treatment groups A, B, and C, of sizes 4, 3,
and 4 (N = 11); the individual scores are illegible in the scan, but the rank
sums are 23, 6, and 37.] Thus

    H = [12/(11(11+1))] (23²/4 + 6²/3 + 37²/4) − 3(11+1)
      = 8·227.
= 28(28+1)/2 = 406 and, correctly, R1+R2+R3+R4 = 406. Thus,
from (11.5.1),

    H = [12/(28(28+1))] (61²/7 + 71·5²/7 + 121²/7 + 152·5²/7) − 3(28+1) = 11·66.
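The arithmetic of (11.5.1) is easily checked. The fragment below (a Python sketch using the four rank sums quoted in the text) reproduces H = 11·66:

```python
def kruskal_wallis_h(rank_sums, group_sizes):
    # H of (11.5.1): [12 / (N(N+1))] * sum(Rj^2 / nj) - 3(N+1)
    n_total = sum(group_sizes)
    s = sum(r * r / n for r, n in zip(rank_sums, group_sizes))
    return 12.0 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)

# Rank sums for the k = 4 treatments (n = 7 each) based on Table 11.4.1:
h = kruskal_wallis_h([61, 71.5, 121, 152.5], [7, 7, 7, 7])  # about 11.66
```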
randomized complete block experiments when the observations are
described by the single additive model with normally distributed error
(11.2.2) described in § 11.2, which should be read before this section.
The analysis in § 10.6, Student's paired t test, was an example of
a randomized block experiment with k = 2 treatments and 2 units
(periods of time) in each block (patient). This test will now be reform-
ulated as an analysis of variance.
    Σ(i=1 to n) Ti.²/k − G²/N = 58·0780.     (11.6.1)
† The expected values of the mean squares (see § 11.2, p. 173 and § 11.4, p. 186) are
derived by Brownlee (1965, Chapter 14). Often the mixed model, in which treatments
are fixed effects and blocks are random effects, is appropriate (loc. cit., p. 498).
In this, the group (row, block) totals, Ti., are squared and divided by
the number of observations per total just as in (11.4.5). See §§ 2.1 and
11.4 if clarification of the notation is needed. Since there are n = 10 groups
(rows, blocks) this sum of squares has 10−1 = 9 degrees of freedom.
The values of G²/N (47·4320), and of the sum of squares between
drugs (treatments, columns) (12·4820) and the total sum of squares
(77·3680), are found exactly as in (11.4.8)-(11.4.10). The results are
assembled in Table 11.6.1. The residual or error sum of squares is again
TABLE 11.6.1
The paired t test of § 10.5 treated as an analysis of variance. The mean
squares are found by dividing the sum of squares by their d.f. The F ratios
are the ratio of each mean square to the error mean square

Source of variation          d.f.   Sum of squares   Mean square      F           P
Between drugs (treatments)     1       12·4820         12·4820      16·50   0·001 to 0·005
Between patients (blocks)      9       58·0780          6·453        8·53   0·001 to 0·005
Error                          9        6·8080          0·7564
Total                         19       77·3680
198 The analysis of variance § 11.6
would be exceeded in 0·5 per cent, and F(1,9) = 22·86 in 0·1 per cent
of trials in the long run, if the null hypothesis were true. The observed
F falls between these figures so 0·001 < P < 0·005, just as in § 10.5.
As pointed out in § 11.3 (and exemplified in § 11.4), F with 1 d.f.
for the numerator is just the square of a value of t, so √[F(1,9)] = √(16·5009)
= 4·062 = t(9), a value of t with 9 d.f., exactly the same value as
was found in the paired t test (§ 10.6). Furthermore, the error variance
from Table 11.6.1 is 0·7564 with 9 d.f. The error variance of the differ-
ence between two observations should therefore, by (2.7.3), be 0·7564
+0·7564 = 1·513, which is exactly the figure estimated directly in
§ 10.6.
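These numerical relations are easy to verify (a minimal sketch in Python, using only the figures just quoted):

```python
import math

f_ratio = 16.5009              # F(1,9) for between-drugs, Table 11.6.1
t_value = math.sqrt(f_ratio)   # F with 1 d.f. in the numerator is t squared
error_ms = 0.7564              # error mean square with 9 d.f.
var_difference = error_ms + error_ms   # variance of a difference, by (2.7.3)
# t_value is about 4.062 = t(9); var_difference is about 1.513
```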
If there were no real differences between blocks (patients) then
6·453 would be an estimate of the same σ² as the error mean square
0·756. Referring this ratio (8·531) to tables (see § 11.3) of the distribu-
tion of the F ratio (with f1 = 9 d.f. for the numerator and f2 = 9 d.f.
for the denominator) shows that the probability of an F ratio at least
as large as 8·531 would be between 0·001 and 0·005, if the null hypo-
thesis were true.
This analysis, and that in § 11.4, show clearly why there are 18 d.f. in the
unpaired t test (§ 9.4) but only 9 d.f. in the paired t test (§ 10.6). In the
latter, 9 d.f. were used up by comparisons between patients (blocks).
There is quite strong evidence (given the assumptions in § 11.2)
that there are real differences between the treatments (drugs), as
concluded in § 10.5. This is because an F ratio, i.e. difference between
treatments relative to experimental error, as large as, or larger than,
that observed (16·50) would be rare if there were no real (population)
difference between the treatments (see §§ 6.1 and 11.3). Similarly,
there is evidence of differences between blocks (patients).
TABLE 11.6.2
Weal diameters using four antibody preparations in guinea pigs

                     Antibody preparation (treatment)
Guinea pig
(block)          A       B       C       D     Totals (Ti.)
   1            41      61      62      43        207
   2            48      68      62      48        226
   3            53      70      66      53        242
   4            56      72      70      52        250
Total (T.j)    198     271     260     196      G = 925
Mean          49·5    67·75    65·0    49·0
preparations, and between different animals, for the same reasons as in
the previous example.
TABLE 11.6.3

Source of variation          d.f.   Sum of squares   Mean square      F        P
Between antibody preps.
  (treatments)                 3      1188·6875       396·229      107·85   <0·001
Between guinea pigs
  (blocks)                     3       270·6875        90·229       24·56   <0·001
Error                          9        33·0625         3·674
Total                         15      1492·4375
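Table 11.6.3 can be reproduced directly from the observations of Table 11.6.2 (as reconstructed from the scan). The sketch below (Python) computes the sums of squares in the manner of (11.4.5)-(11.4.10):

```python
def block_anova(data):
    """Sums of squares for a randomized block layout:
    rows of `data` are blocks, columns are treatments."""
    n_blocks, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data)
    cf = grand * grand / (n_blocks * k)              # correction term G^2/N
    ss_total = sum(y * y for row in data for y in row) - cf
    ss_blocks = sum(sum(row) ** 2 for row in data) / k - cf
    col_totals = [sum(row[j] for row in data) for j in range(k)]
    ss_treatments = sum(t * t for t in col_totals) / n_blocks - cf
    ss_error = ss_total - ss_treatments - ss_blocks
    return ss_treatments, ss_blocks, ss_error, ss_total

# Weal diameters of Table 11.6.2 (blocks = guinea pigs, columns = preps A-D)
weals = [[41, 61, 62, 43],
         [48, 68, 62, 48],
         [53, 70, 66, 53],
         [56, 72, 70, 52]]
ss = block_anova(weals)   # (1188.6875, 270.6875, 33.0625, 1492.4375)
```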
blocks there are (k!)ⁿ ways in which the randomization could come
out (an extension to k treatments of the 2ⁿ ways found in § 10.3).
If the randomization was done properly these would all be equi-
probable, and if the F ratio for 'between treatments' is calculated for all
of these ways (cf. § 11.5), the proportion of cases in which F is equal to
or larger than the observed value is the required P, as in § 10.3. As in
§ 11.5 it will give the same result if the sum of squared treatment
totals, rather than F, is calculated for each arrangement.
As in previous cases an approximation to this result could be obtained
by writing down the observations on cards. The cards for each block
would be separately shuffled and dealt. The first card in each block
would be labelled treatment A, the second treatment B, and so on.
If this process were repeated many times, an estimate of the proportion
(P) of cases giving 'between treatments' F ratios as large as, or larger
than, the observed ratio could be found. If this proportion was small
it would indicate that it was improbable that a random allocation
would give rise to the observed result, if the observation was not
dependent on the treatment given. In other words, an allocation that
happened to put into the same treatment group subjects that were, despite
any treatment, going to give a large observation, would be unlikely to
turn up.
identical with the sign test. Compare this with the Wilcoxon signed-
ranks test (§ 10.4), in which proper numerical measurements were
necessary because differences had to be formed between members of a
pair before the differences could be ranked.
Suppose, as in § 11.6, that k treatments are compared, in random
order, in n blocks. The method is to rank the k observations in each
block from 1 to k, if the observations are not already ranks. The rank
totals, Rj (see §§ 2.1, 11.4, and 11.5 for notation), are then found for
each treatment. If there were no difference between treatments these
totals would be approximately the same for all treatments. The sum
of the ranks (integers 1 to k) in each block should be k(k+1)/2, by
(9.3.1), and because there are n blocks the sum of the rank sums should
be

    Σ(j=1 to k) Rj = nk(k+1)/2.     (11.7.1)
and find P from tables of the chi-squared distribution (e.g. Fisher and
Yates (1963, Table IV) or Pearson and Hartley (1966, Table 8)) with
k−1 degrees of freedom.
As an example, consider the results in Table 11.6.2, with k = 4
treatments and n = 4 blocks. If the observations in each block (row)
are ranked in ascending order from 1 to 4, the results are as shown in
Table 11.7.1. Ties are given average ranks as in Table 9.3.1. This is an
approximation but it is not thought to affect the result seriously if the
number of ties is not too large.
Applying the check (11.7.1) shows that ΣRj should be 4×4×(4+1)/2
= 40, as found in Table 11.7.1. Now calculate, from (11.7.2),

    S = 6²+15²+13²+6² − (40)²/4 = 66·00.
TABLE 11.7.1
The observations within each block in Table 11.6.2 reduced to ranks

                 Antibody preparation (treatment)
Guinea pig
(block)           A       B       C       D
   1              1       3       4       2
   2             1·5      4       3      1·5
   3             1·5      4       3      1·5
   4              2       4       3       1
Rank sum (Rj)  R1 = 6  R2 = 15  R3 = 13  R4 = 6    ΣRj = 40
treatments do not all have the same effect says nothing about which
ones differ from which others. It would not be correct to perform sign
tests on all possible pairs of treatment groups in order to find out, for
example, whether treatment B differs from treatment D. A method of
answering this question is given in § 11.9.
11.8. The Latin square and more complex designs for experiments
There is a vast literature describing ingenious designs for experiments
but the analysis of almost all of these depends on the assumption of a
normal distribution of errors and on elaborations of the models des-
cribed in § 11.2. If the experiments are large there is, in some cases,
some evidence that the methods will not be very sensitive to the
assumptions. As the assumptions are rarely checkable with the amount
of data available it may be as well to treat these more complex designs
with caution (see comments below about use of small Latin squares).
Certainly if they are used the advice of a critical professional statistician
should be sought about the exact nature of the assumptions being
made, and the interpretation of the results in the light of the mathe-
matical model (see § 11.2).
To emphasize the point it should be sufficient to quote Kendall and
Stuart (1966, p. 139): 'The fact that the evidence for the validity of
normal theory tests in randomized Latin squares is flimsy, together
with the even greater paucity of such evidence for most other, more
complicated, experiment designs, leads one to doubt the prevailing
serene assumption that randomization theory will always approximate
normal theory.'
but with another additive component characteristic of each column
(injection site), is supposed to represent the real observations, then a
sum of squares (see §§ 11.2, 11.3, 11.4, and 11.6) can be found from the
observed scatter of the column totals (the corresponding mean square
would, as usual, estimate σ² if the null hypothesis were true), and used
TABLE 11.8.1
The Latin square design

          Column                              Injection site
Row     1   2   3   4        Guinea pig     1    2    3    4    Total
 1      A   B   C   D            1         41   61   62   43     207
 2      C   A   D   B            2         62   48   48   68     226
 3      D   C   B   A            3         53   66   70   53     242
 4      B   D   A   C            4         72   52   56   70     250
TABLE 11.8.2
Analysis of variance for the Latin square

Source of variation           d.f.   Sums of squares   Mean square      F        P
Between antibody prepara-
  tions (treatments)            3       1188·6875        396·23      129·4   <0·001
Between guinea pigs (rows)      3        270·6875         90·23       29·5   <0·001
Between sites (columns)         3         14·6875          4·90        1·6   >0·2
Error                           6         18·3750          3·06
Total                          15       1492·4375
with 3 degrees of freedom (because there are 4 columns). The sums of
squares for differences between treatments and between guinea pigs,
and the total sum of squares, are exactly as in § 11.6. When these
results are filled into Table 11.8.2, the error sum of squares and degrees
of freedom can be found by difference, and the rest of the table com-
pleted as in § 11.6. Referring the variance ratio, F{3,6) = 1·6, to
tables (see § 11.3) shows that there is no evidence for a population
difference between injection sites (P > 0·2).
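The between-sites sum of squares in Table 11.8.2 can be checked from the observations arranged by injection site in Table 11.8.1 (a Python sketch; the data are as reconstructed from the scan):

```python
# Observations by row (guinea pig) and column (injection site), Table 11.8.1
obs = [[41, 61, 62, 43],
       [62, 48, 48, 68],
       [53, 66, 70, 53],
       [72, 52, 56, 70]]

grand = sum(map(sum, obs))              # G = 925
cf = grand * grand / 16.0               # correction term G^2/N
site_totals = [sum(row[j] for row in obs) for j in range(4)]
ss_sites = sum(t * t for t in site_totals) / 4.0 - cf   # 14.6875, with 3 d.f.
```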
treatments. Sometimes it may not be possible to test every treatment
on every block as when, for example, four treatments are to be com-
pared, but each patient (= block) is only available for long enough
to receive two. It is sometimes still possible to eliminate differences
between blocks even when each block does not contain every treatment.
Catalogues of designs are given by Fisher and Yates (1963, pp. 25,
91-3) and Cochran and Cox (1957).
A nonparametric analysis of incomplete block experiments has been
given by Durbin (1951).
Examples of the use of balanced incomplete block designs for bio-
logical assays (see § 13.1) have been given by, for example, Bliss
(1947) and Finney (1964). General formulas for the simplest analysis
of biological assays based on balanced incomplete blocks are given by
Colquhoun (1963).
11.9. Data snooping. The problem of multiple comparisons
In all forms of analysis of variance discussed, it has been seen that
all that can be inferred is whether or not it is plausible that all of the
k treatments (or blocks, etc.) are really identical. If there are more
than two treatments the question of which ones differ from which
others is not answered. The obvious answer is never to bother with the
analysis of variance but to test all possible pairs of treatments by the
two sample methods of Chapters 9 and 10. However, it must be re-
membered that it is expected that the null hypothesis will sometimes
be rejected even when it is true (see § 6.1), so if a large number of
tests are done some will give the wrong answer. In particular, if several
treatments are tested and the results inspected for possible differences
between means, and the likely looking pairs tested ('data selection',
or as statisticians often call it 'data snooping'), the P value obtained
will be quite wrong.
This is made obvious by considering an extreme example. Imagine
that sets of, say, 100 samples are drawn repeatedly from the same
population (i.e. null hypothesis true), and each time the sample out of
the set of 100 with largest mean is tested, using a two-sample test,
against the sample with the smallest mean. With 100 samples the
largest mean is likely to be so different from the smallest that the
null hypothesis (that they come from the same population) would be
rejected (wrongly) almost every time the experiment was repeated,
not only in 1 or 5 per cent (according to what value of P is chosen as
low enough to reject the null hypothesis) of repeated experiments as it
should be (see § 6.1). If the particular treatments to be compared are
not chosen before the results are seen, allowance must be made for data
snooping. There are various approaches.
One way is to compare all possible pairs of treatments. This is
probably the most generally useful, and methods of doing it for both
nonparametric and Gaussian analysis of variance are described below.
Another case arises when one of the treatments is a control and it is
required to test the difference between each of the other treatment
means and the control mean. In the Gaussian analysis of variance
this is done by finding confidence intervals for the jth difference
as difference ± d s √(1/nc+1/nj), where nc is the number of control
observations, nj the number of observations on the jth treatment, s is
the square root of the error mean square from the analysis of variance,
and d is a quantity (analogous to Student's t) tabulated by Dunnett
(1964). Tables for doing the same sort of thing in the nonparametric
analyses of variance are given by Wilcoxon and Wilcox (1964).
A third possibility is to ask whether the largest of the treatment
means differs from the others. Nonparametric tables are given by
McDonald and Thompson (1967).
The critical range method for testing all possible pairs in the Kruskal-
Wallis nonparametric one way analysis of variance (§ 11.5)
Using this method, which is due to Wilcoxon, all possible pairs of
treatments can be compared validly using Table A7, though the table
only deals with equal sample sizes. The procedure is very simple.
Just calculate the difference between the rank sums for any pair of
groups that is of interest. If this difference is equal to (or larger than)
the critical range given in Table A7, the P value is equal to (or less
than) the value given in the table. For small samples exact probabilities
are given in the table (they cannot be made exactly the same as the
approximate P values at the head of the column because of the dis-
continuous nature of the problem, as in § 7.3 for example). For larger
samples use the approximate P value at the head of the column.
The first example of the Kruskal-Wallis analysis given in § 11.5
cannot be used to illustrate the method because it has unequal groups.
The second example in § 11.5, based on the (parenthesized) ranks in
Table 11.4.1, will be used. In this example there were k = 4 treatments
and n = 7 replicates, and evidence was found in § 11.5 that the treat-
ments were not equi-effective. Consulting Table A7 shows that a
difference between two rank sums (selected from four) of 79·1 or larger
would occur in about 5 per cent of random allocations of the 28 subjects
to 4 groups (i.e. in about 5 per cent of repeated experiments if the
null hypothesis were true), that is to say P ≃ 0·05 for a difference of
79·1. Similarly P ≃ 0·01 for a difference of 95·8.
The simplest way of writing down the differences between all six
possible pairs of rank sums is to construct a table of differences, with
the rank sums from § 11.5 (or Table 11.4.1), as in Table 11.9.1. The
treatments have been arranged in ascending order so the largest
differences occur together in the bottom left-hand corner of the table.
TABLE 11.9.1

Treatment       4       1       2       3
Rank sum       61     71·5     121    152·5
4    61
1   71·5      10·5
2   121       60·0    49·5
3   152·5     91·5*   81·0*   31·5
The differences marked with an asterisk in Table 11.9.1 are larger than
79·1 but less than 95·8. So P is somewhere between 0·01 and 0·05 for
these differences, suggesting (see § 6.1) that there is a real difference
between treatments 3 and 1, and between treatments 3 and 4. All
other differences are less than 79·1 so there is little evidence (P > 0·05)
for any other treatment differences.
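The whole table of pairwise differences can be generated mechanically (a Python sketch using the rank sums of Table 11.9.1 and the critical ranges quoted from Table A7):

```python
from itertools import combinations

rank_sums = {"trt 4": 61.0, "trt 1": 71.5, "trt 2": 121.0, "trt 3": 152.5}
crit_05, crit_01 = 79.1, 95.8   # critical ranges from Table A7 (k = 4, n = 7)

differences = {
    (a, b): abs(rank_sums[a] - rank_sums[b])
    for a, b in combinations(rank_sums, 2)
}
significant = {pair: d for pair, d in differences.items() if d >= crit_05}
# Only (trt 4, trt 3) = 91.5 and (trt 1, trt 3) = 81.0 reach the 5 per cent range
```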
The critical range method for testing all possible pairs in the Friedman
nonparametric two way analysis of variance (§ 11.7)
This method, also due to Wilcoxon, allows valid comparison of any
pair of treatments in the Friedman method (§ 11.7), using Table A8
in much the same way as just described for the one way analysis.
The results in Table 11.7.1 will be used to illustrate the method.
There are k = 4 treatments and n = 4 blocks (replicates), so reference
to Table A8 shows that a difference (between any two treatment rank
sums selected from the four) as large as, or larger, than 11 would be
expected in only 0·5 per cent of repeated experiments if the null
hypothesis (see § 11.7) were true, i.e. if ranks were allocated randomly
within blocks. Similarly a difference of 10 would correspond to P
= 0·026.
A table of all possible pair differences between the rank sums from
Table 11.7.1 can be constructed as above, in Table 11.9.2.
All the six differences are less than 10, i.e. none reaches even the P = 0·026
level of significance. Despite the evidence (in § 11.7) that the four
treatments are not equi-effective, it is not, in this case, possible to
detect with any certainty which treatments differ from which others.
This is not so surprising looking at the ranks in Table 11.7.1, but
looking at the original figures in Table 11.6.2 suggests strongly that

TABLE 11.9.2

Treatment      A     D     C     B
Rank sum       6     6    13    15
A    6
D    6        0
C   13        7     7
B   15        9     9     2
    L = Σ(j=1 to k) aj ȳj     (11.9.1)

    var(L) = s² Σ(j=1 to k) aj²/nj     (11.9.2)

where s² is the variance of y (the error mean square from the analysis
of variance) and nj is the number of observations in the jth treatment
mean, ȳj. The method is to construct confidence limits (see Chapter 7)
for the population (true) mean value of L as

    L ± S√[var(L)]     (11.9.3)

where S = √[(k−1)F], and F is the value of the variance ratio (see
§ 11.3) for the required probability. For the numerator F has (k−1)
degrees of freedom, and for the denominator the number of degrees of
freedom associated with s². If the confidence limits include any hypo-
thetical value of L the observations cannot be considered incompatible
with this value, as explained in § 9.4.
11.4.1. To do this, take a1 = −1, a2 = +1, a3 = +1, and a4 = −1, so
L = (ȳ2+ȳ3) − (ȳ1+ȳ4). The true (population) value of this will be
zero if the hypothesis to be tested is true. The sample value is L
= −15·67+26·86+33·67−14·29 = 30·57. From (11.9.2), var(L) =
101·15 ((−1)²/7+1²/7+1²/7+(−1)²/7) = 57·80. S = 3·763 exactly as
above, so the 99 per cent (P = 0·99) confidence limits for the population
value of L are 30·57 ± 3·763√(57·80) = 30·57 ± 28·61, i.e. +1·96
to +59·18. The limits do not include zero so the null hypothesis that
the true (population) value of L is zero would be rejected if P < 0·01
were considered sufficiently small (see § 6.1). The same could be said
of any difference (between the sum of any two means and the sum of
the other two) that exceeded 28·61.
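The arithmetic of this example can be retraced in code (a Python sketch; the error mean square of 101·15 with 24 d.f. and the 1 per cent point F(3,24) = 4·72 are back-calculated here from the figures quoted in the text, so treat them as assumptions):

```python
import math

means = [15.67, 26.86, 33.67, 14.29]   # treatment means from Table 11.4.1
coeffs = [-1, +1, +1, -1]              # the a_j, chosen so that sum(a_j) = 0
n_per_group = 7
error_ms = 101.15                      # s^2 (assumed; see lead-in)
f_value = 4.72                         # F(3,24) for P = 0.01 (assumed)

L = sum(a * m for a, m in zip(coeffs, means))                 # 30.57
var_L = error_ms * sum(a * a / n_per_group for a in coeffs)   # 57.80
S = math.sqrt((len(means) - 1) * f_value)                     # about 3.763
half_width = S * math.sqrt(var_L)                             # about 28.6
limits = (L - half_width, L + half_width)                     # roughly +2 to +59
```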
Example 3. The method can be used, at least as an approximation,
for randomized block experiments also. For the results in Table 11.6.2,
s² = 3·674 with 9 d.f. (from Table 11.6.3). To test ȳ2 against ȳ1 take
a1 = −1, a2 = +1, a3 = 0, a4 = 0, as in Example 1. There are
n = 4 replicates, so var(L) = 3·674((−1)²/4+1²/4+0²/4+0²/4) = 1·837,
from (11.9.2). And this value will be the same for the difference
between any two means. There are k = 4 treatments so values of
F(3,9) are required. From the Biometrika tables (see § 11.3) the
P = 0·25 value is 1·63 and the P = 0·001 value is 13·90. Thus, S =
√(3×1·63) = 2·211 and S√[var(L)] = 2·211√(1·837) = 2·996 for
P = 0·25. And for P = 0·001, S = √(3×13·90) = 6·457, so S√[var
(L)] = 8·75. The differences between the six possible pairs of means
from Table 11.6.2 are, tabulating as above, shown in Table 11.9.3.
TABLE 11.9.3

Treatment     D       A       C       B
Mean        49·0    49·5    65·0    67·75
D  49·0
A  49·5     0·5
C  65·0    16·0*   15·5*
B  67·75   18·75*  18·25*   2·75
treatments A and D, but that no difference can be detected between
A and D, or between B and C. Compare this result with rank analysis
of the same observations (§§ 11.7 and 11.9, above). Remember that a
normal (Gaussian) distribution has been assumed throughout these
calculations, despite the fact that no evidence was presented that the
assumption was justified.
12. Fitting curves. The relationship
between two variables
12.1. The nature of the problem
In most of the examples so far, measurements of only one variable
have been involved (e.g. blood sugar level or change in duration of
sleep). However, experiments are often concerned with the relationship
between two (or more) variables; for example, dose of drug and response,
concentration and optical density, time and extent of chemical reaction,
or school and university examination results. The last of these examples
is rather different from the others, and suggests the first sort of
distinction that must be made.
In many cases one variable can be measured accurately, and its value
chosen by the experimenter; for example, the time at which a measurement
is made, or the dose of a drug. This sort of variable is called the independent
variable (notice that independent in this context has a different meaning
from that encountered in §§ 2.4 and 2.7). The other variable, called the
dependent variable, is subject to experimental error, and its value
depends on the value chosen for the independent variable. For example,
the response to a drug is a dependent variable which is related to the inde-
pendent variable, dose (as long as the dose can be measured with negligible error).
The expression 'fitting a curve to the observed points' means the
process of finding estimates of the parameters of the fitted equation
that result in a calculated curve which fits the observations 'best' (in a
sense to be specified below and in § 12.8). For example, if a straight line
is to be fitted, the 'best' estimates of its arbitrary parameters (the slope
and intercept) are wanted. The method of fitting the straight line is
discussed in detail in §§ 12.2 to 12.6 because it is the simplest problem.
But very often, especially if one has an idea of the physical mechanism
underlying the observations, the observations will not be represented
by a straight line and a more complex sort of curve must be fitted.
This situation is discussed in §§ 12.7 and 12.8. Often some way of
transforming the observations to a straight line is adopted, but this
may have considerable hazards, as explained in §§ 12.2 and 12.8.
It is, however, usually (see § 13.14) not justified to fit anything but a
straight line if the deviations from the fitted line are no greater than
could reasonably be expected by chance (i.e. than could be expected if
the true line were straight). In general it is usually reasonable to use
the simplest relationship consistent with the observations. By simplest
is meant the equation containing the smallest number of arbitrary
parameters (e.g. slope), the values of which have to be estimated from
the observations. This is an application of 'Occam's razor' (one version
of which states 'It is vain to do with more what can be done with fewer':
William of Occam, early fourteenth century). The reason for doing
this is not that the simplest relationship is likely to be the true one,
but rather because the simplest relationship is the easiest to refute
should it be wrong. (The opposite would, of course, be true if the
parameters were not arbitrary, and estimated from the observations,
but were specified numerically by the theory.)
mean that the true relationship can be inferred to be straight (see
§ 13.14 for an example of practical importance).
The best fitting curve (see § 12.8) is usually found using the method of
least squares. This means that the curve is chosen that minimizes the
'badness of fit' as measured by the sum of the squares of the deviations
of the observations (y) from the calculated values (Y) on the fitted
curve. In other words, the values of the parameters in the regression
equation must be adjusted so as to minimize this sum of squares. In
the case of the straight line and some simple curves such as the parabola
the best estimates of the parameters can be calculated directly (see
§§ 12.2 and 12.7). The principle of least squares can be applied to
any sort of curve but for non-linear problems (see § 12.8; but note that
fitting some sorts of curve is a linear problem in the statistical sense,
as explained in § 12.7) it may not have the optimum properties that it
can be shown to have for linear problems (those of providing unbiased
estimates of the parameters with minimum variance, see § 12.8, and
Kendall and Stuart (1961, p. 75)). For linear problems these optimum
properties are, surprisingly, not dependent on any assumption about
the distribution of the observations, but the construction of confidence
limits and all the analyses of variance depend on the assumption that
the errors of the observations follow the Gaussian (normal) distribution,
so all regression methods must (unfortunately) be classed as parametric
methods (see § 6.2). Tests for normality are discussed in § 4.6.
Thus the best fitting straight line is the one that minimizes the sum of
squared deviations

    S = Σ(j=1 to N) dj² = Σ(yj−Yj)²,     (12.2.1)

where yj is the observed, and Yj the calculated, value of the dependent
variable corresponding to xj. The resulting line is called the regression
line of y on x. If the deviations of points from the fitted line were not
measured vertically as in (12.2.1), but, say, horizontally, the least
squares line would be different from that found in the way just described
FIG. 12.2.1. The dependent variable y, plotted against the independent
variable x. Definition of yj, xj, Yj, and dj for discussion of curve fitting.
(it would be called the regression line of x on y), but this would not be
the correct approach when the experimental errors are supposed to
affeot y only.
The general equation for a straight line can be written Y = a′+bx,
where a′ is the intercept (i.e. the value of Y when x = 0). It will be
more convenient (for reasons explained in § 12.7) to write this in a
slightly different form, viz.

    Y = a+b(x−x̄)     (12.2.2)

where b is the slope, and a is the value of Y when x = x̄ (so that
a−bx̄, which is a constant, is the same as a′). The left-hand side is
written as capital Y to emphasize that the evaluation of the equation
gives the calculated value of the dependent variable, which will, in
general, differ from the observations (y) at the same value of x, unless
the observation happens to lie exactly on the calculated line.
The true (population) regression equation, assuming the line to be
really straight, can be written, for any specified value of x,

    μ = population value of Y = α+β(x−x̄)     (12.2.3)

where α and β are the true parameters, of which the statistics a and b
are estimates made from a sample of observations. Because the in-
dependent variable, x, is assumed to be measured with negligible error
there is no distinction between the observed and true values of x.
The problem is now to find the least squares estimates of α and β
from the observations. This will be done algebraically for the moment.
In § 12.7 the geometrical meaning of the algebra is explained. First
substitute the calculated value of Y at the jth value of x, which, from
(12.2.2), is Yj = a+b(xj−x̄), into (12.2.1) giving

    S = Σ(j=1 to N) [yj−a−b(xj−x̄)]².     (12.2.4)
It is shown below (see (12.2.10)) how the least squares estimates can be
derived without using calculus at all. Thus, to find the least squares
value of a, differentiate (12.2.5) treating b as a constant

    ∂S/∂a = 2Na − 2Σ(j=1 to N) yj = 0,

therefore Na = Σyj, so

    a = Σy/N = ȳ.     (12.2.6)
FIG. 12.2.2. The sum of squared deviations (S) plotted against various
possible values of a using eqn. (12.2.5). The data (x and y values) are those in
Table 12.7.1 and b was held constant at 3·00 (cf. Fig. 12.7.3). The minimum is
at a = 8·00, where the slope of the curve, ∂S/∂a, is zero. The graph is discussed
in detail in § 12.7.

Similarly, differentiating with respect to b gives the least squares estimate
of the slope,

    b = Σ yj(xj−x̄) / Σ(xj−x̄)²     (12.2.7)

or

    b = Σ(yj−ȳ)(xj−x̄) / Σ(xj−x̄)².     (12.2.8)
Although it is not immediately obvious, the numerators of these two
expressions for b are identical, as shown by (12.2.9) below.
Using (2.6.2) and (2.6.6) shows that the estimated slope, (12.2.8), can
be written b = cov(x, y)/var(x). It was shown in § 2.6 that Σ(yj−ȳ)(xj−x̄),
and hence the covariance, measures the extent to which y tends to
increase when x is increased.

    Σ(j=1 to N) (yj−ȳ)(xj−x̄) = Σ[yjxj − ȳxj − yjx̄ + ȳx̄]
                             = Σ[yj(xj−x̄) − ȳ(xj−x̄)]
                             = Σyj(xj−x̄) − ȳΣ(xj−x̄)
                             = Σ(j=1 to N) yj(xj−x̄)     (12.2.9)

because Σ(xj−x̄) = 0.
The argument is exactly analogous to that already used for the arithmetic
mean in § 2.6 (p. 27). The sum of squares to be minimized, (12.2.4), can be
written

S = Σ[yj−a−b(xj−x̄)]² = Σ[yj−ȳ−b̂(xj−x̄)]² + N(a−ȳ)² + (b−b̂)²Σ(xj−x̄)²
                                                            (12.2.10)

where a and b denote possible estimates of α and β, and b̂ denotes, as in § 12.7,
the least squares estimate of β given by expression (12.2.8). Inspection of this
shows that the values of a and b that minimize S are a = ȳ and b = b̂, because
the last two terms, being squares, cannot be negative, and they take their smallest
possible value, zero, at these values. It can be checked by multiplying out
the right side, in the same sort of way as shown in detail in § 2.6.
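The estimates (12.2.6) and (12.2.8) translate directly into code (a Python sketch; the check data are invented, not taken from the book):

```python
def fit_line(x, y):
    """Least squares fit of Y = a + b(x - xbar):
    a = mean of y, by (12.2.6); b uses the numerator sum y_j (x_j - xbar),
    shown equal to sum (y_j - ybar)(x_j - xbar) in (12.2.9)."""
    n = len(x)
    xbar = sum(x) / n
    a = sum(y) / n
    b = (sum(yj * (xj - xbar) for xj, yj in zip(x, y))
         / sum((xj - xbar) ** 2 for xj in x))
    return a, b, xbar

# Invented check: points lying exactly on y = 2 + 3x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]
a, b, xbar = fit_line(xs, ys)   # a = 6.5 (= ybar), b = 3.0, xbar = 1.5
```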
Assumptions made in the least squares fitting and analysis of straight
lines
(1) The standard deviation of y was assumed to be a constant. That
is to say, it was assumed that the observations have the same scatter at all
points on the line, so that equal weight can be attached to all observations
when fitting the line. Observations fulfilling this condition are called
homoscedastic. Quite often the condition is not fulfilled, as illustrated in
Fig. 12.2.3. For instance, it is quite commonly found in practice that
there is a tendency for the smaller observations to have less scatter,
in such a way that the relative scatter (e.g. the coefficient of variation,
gle
§ 12.2 The relationship between two variables 221
(2.6.4)) is more nearly constant than the absolute scatter (e.g. the standard deviation). If this is the case the observations (which are said to show heteroscedasticity) should not be given equal weight, and this makes the calculations more complicated (cf. Chapter 14).
(2) The population (true) relation between y and x has been assumed to be a straight line. In § 12.6 it will be shown how it can be judged whether deviations from linearity can reasonably be ascribed to experimental error.
FIG. 12.2.3. (a) A homoscedastic curve-fitting problem (idealized). (b) An example of heteroscedastic observations.
The use of transformations

This discussion is closely related to that in § 11.2, in which a method of choosing a transformation to equalize variances was described. Transformation (e.g. logarithm, square root, reciprocal) may be used to make results conform with the above assumptions. For example, if the observations are described by an exponential relationship, y = y₀e^(−kx), then taking natural logarithms gives log y = log y₀ − kx. The regression of log y on x should therefore be a straight line with intercept = log y₀ and slope = −k. An example is worked out in § 12.6. Notice, however, that if y were homoscedastic and normally distributed then log y would be neither, so it may not be possible to
satisfy all the assumptions simultaneously (see §§ 11.2, 12.8, and Bartlett (1947)). Tests for normality are discussed in § 4.6.
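The linearizing effect of the log transformation can be sketched as a short calculation. Python is used purely for illustration; the data are invented, error-free values from y = y₀e^(−kx) with y₀ = 100 and k = 0·1, so the regression of log y on x recovers slope −k and intercept log y₀ exactly.

```python
# Sketch: data following y = y0*exp(-k*x) become a straight line,
# log(y) = log(y0) - k*x, after taking natural logarithms.
import math

y0, k = 100.0, 0.1
xs = [0.0, 5.0, 10.0, 15.0, 20.0]
ys = [y0 * math.exp(-k * x) for x in xs]

# Regress log(y) on x by least squares:
# slope = sum((y'-ybar')(x-xbar)) / sum((x-xbar)^2)
logy = [math.log(y) for y in ys]
xbar = sum(xs) / len(xs)
lbar = sum(logy) / len(logy)
slope = (sum((l - lbar) * (x - xbar) for l, x in zip(logy, xs))
         / sum((x - xbar) ** 2 for x in xs))
intercept = lbar - slope * xbar   # value of log(y) at x = 0, i.e. log(y0)

# With error-free data the line is exact: slope = -k, intercept = log(y0).
```

With real (scattered) observations the same arithmetic gives estimates rather than exact values, and the cautions in the text about the distribution of log y still apply.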
It is important to distinguish between the effects of transformations of the dependent variable, y, on the one hand, and of the independent variable, x, on the other. Transformations of x are often used to make a line straight (e.g. the response, y, is often plotted against the log of the dose, x, in pharmacology). This merely alters the spacing at which points are plotted along the abscissa in Fig. 12.2.3, but cannot have any effect on the homoscedasticity or distribution of errors of the observations, y. Transformations of y, on the other hand, affect these as well as linearity.
12.3. Measurement of the error in linear regression
Consider the straight line fitted to the results in Fig. 12.3.1. As before, y stands for the observation at a particular value of x, and Y
FIG. 12.3.1. Definition of terms used in curve fitting. The values of the dependent variable (y) are plotted on the ordinate and the independent variable (x) on the abscissa (see §§ 12.1 and 12.2). The five observed values (○), y₁ to y₅, have been plotted against the corresponding x values, x₁ to x₅, and a straight line fitted to them. The nature of the terms (y − Y), (Y − ȳ), and (y − ȳ), occurring in eqns. (12.3.2) and (12.3.3), is illustrated for the fourth x value.
for the predicted value of the dependent variable (i.e. that calculated from the estimated line) at a particular x. The equation for the estimated line, Y = a + b(x − x̄), can be written, using (12.2.6),

Y = ȳ + b(x − x̄).   (12.3.1)
from which it can be seen that the line must go through the point (x̄, ȳ), because Y = ȳ when x = x̄ (i.e. when x − x̄ = 0).
This section is concerned only with errors in y, because x has been assumed to be measured without error (§§ 12.1, 12.2). The total deviation of the observed point from the mean, in Fig. 12.3.1, can be divided into two parts: (y − Y) = deviation of the observed value from the line, and (Y − ȳ) = deviation of the predicted value on the line from the mean of all observations. This can be written

(y − ȳ) = (y − Y) + (Y − ȳ).   (12.3.2)

Squaring and summing over all observations gives (the cross-product term vanishing, as proved below)

Σ(y − ȳ)² = Σ(y − Y)² + Σ(Y − ȳ)²,   (12.3.3)
in which the first term on the right-hand side measures the extent to which the observations deviate from the line and is called the SSD for deviations from linearity. It is this that is minimized in finding least squares estimates (see § 12.2). The second term on the right-hand side measures the amount of the total variability of y from ȳ that is accounted for by the linear relation between y and x, and is called the SSD due to linear regression. That (12.3.3) is merely an algebraic identity following from (12.3.2) will now be proved.
The central term in the penultimate equation is zero because

2Σ(y − Y)(Y − ȳ)
 = 2Σ[y − ȳ − b(x − x̄)][ȳ + b(x − x̄) − ȳ]   (from (12.3.1))
 = 2Σ[yb(x − x̄) − ȳb(x − x̄) − b²(x − x̄)²]
 = 2bΣy(x − x̄) − 2b²Σ(x − x̄)²   (from (2.6.1))
 = 2bΣy(x − x̄) − 2bΣy(x − x̄) = 0   (from (12.2.7)). Q.E.D.
(2) A working formula for the sum of squares due to linear regression

As usual, it is inconvenient to calculate the individual deviations (Y − ȳ), and a more convenient working formula is used. As before, the summations are over all N observations.

Σ(Y − ȳ)² = Σ[ȳ + b(x − x̄) − ȳ]²   (from (12.3.1))
 = Σ[b(x − x̄)]² = b²Σ(x − x̄)²   (from (2.1.5)).

Substituting (12.2.8) for the slope, b, gives the alternative forms:

SSD due to linear regression = b²Σ(x − x̄)²  or  [Σ(y − ȳ)(x − x̄)]²/Σ(x − x̄)².   (12.3.4)
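The equivalence of the forms in (12.3.4) can be verified numerically. The sketch below (Python, for illustration only) uses the data of Table 12.7.1, which appear later in this chapter, and computes the SSD due to linear regression three ways, confirming that they agree.

```python
# Numerical check of eqn (12.3.4): the SSD due to linear regression equals
# b^2 * sum((x-xbar)^2), equals [sum((y-ybar)(x-xbar))]^2 / sum((x-xbar)^2),
# and equals sum((Y-ybar)^2) where Y is the fitted value.
xs = [-2, -1, 0, 1, 2, 3, 4]          # x values of Table 12.7.1
ys = [1, 4, 6, 9, 11, 10, 15]         # y values of Table 12.7.1
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

sxx = sum((x - xbar) ** 2 for x in xs)                      # sum of squares for x
sxy = sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys))  # sum of products
b = sxy / sxx                                               # slope, eqn (12.2.8)

ssd_reg_1 = b ** 2 * sxx        # first form of (12.3.4)
ssd_reg_2 = sxy ** 2 / sxx      # second (working) form of (12.3.4)
fitted = [ybar + b * (x - xbar) for x in xs]                # Y values, eqn (12.3.1)
ssd_reg_3 = sum((Y - ybar) ** 2 for Y in fitted)            # direct definition
```

All three agree because b = sxy/sxx, so b²·sxx = sxy²/sxx identically.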
of a particular optical density. This sort of problem is probably the
most important in practice but its solution is rather more complicated
than for (a) and (b). Its solution will be given in § 13.14.
In case (b) confidence limits are required for a value of Y calculated from the fitted line at a chosen value of x (see § 12.2); these will now be found. For the meaning of confidence limits see §§ 7.2 and 7.9.

The variance of the slope, b

By (12.2.7) the slope can be written

b = Σy_j(x_j − x̄)/Σ(x_j − x̄)²,   (12.4.1)

which shows that b is a linear function of the observations, y_j, with coefficients c_j = (x_j − x̄)/Σ(x_j − x̄)². Its variance therefore follows
directly from (2.7.10) and is var[y]·Σc_j². Now c_j² = (x_j − x̄)²/[Σ(x_j − x̄)²]², so Σc_j² = Σ(x_j − x̄)²/[Σ(x_j − x̄)²]² = 1/Σ(x_j − x̄)², and therefore, cancelling,

var[b] = var[y]/Σ(x_j − x̄)².   (12.4.2)
The variance of a

By (12.2.6), a = ȳ, so var[a] = var[ȳ] = var[y]/N, by (2.7.8).
Confidence limits for the true (population) straight line

The value of Y estimated from the line, Y = a + b(x − x̄), is a linear function of the observations because, as above, both a and b are. It will therefore be normally distributed when the observations are. The population mean value of Y at any given value of x is μ (see (12.2.3)), so the error of a value of Y is Y − μ, which has a population mean value† of μ − μ = 0 and variance var[Y] (because μ is a constant). The variance of Y is

var[Y] = var[ȳ + b(x − x̄)]   (by (12.3.1))
 = var[ȳ] + var[b(x − x̄)]   (by (2.7.3))
 = var[ȳ] + (x − x̄)²·var[b]   (by (2.7.5))
 = var[y]/N + (x − x̄)²·var[y]/Σ(x_j − x̄)²   (by (2.7.8) and (12.4.2)).   (12.4.4)
Notice that the use of (2.7.3) assumes that ȳ and b are uncorrelated (i.e. in repeated experiments there will be no tendency for ȳ to be large in experiments in which b is). This has not been proved here, but a similar relationship is discussed in greater detail in §§ 13.8 and 13.10. See also § 12.7.

Gaussian confidence limits for μ, the population value of Y, at the x value used in calculating Y, are therefore

Y ± t√var[Y].   (12.4.5)
Several points about (12.4.4) are worth noticing. First, although the term Σ(x_j − x̄)² is a constant (depending on the particular values of x chosen) for a given experiment, the term (x − x̄)² is not. The presence of this latter term shows that the variance of Y, unlike that of y, is dependent on the value of x. The variance of Y will be at a minimum when x = x̄, because at this point the second term, which can never be negative, disappears, leaving var[Y] = var[ȳ] (as expected, because Y = ȳ when x = x̄; see § 12.3). It can be seen that the variance of Y (and hence the width of the confidence limits) increases as x departs in either direction from x̄, because the deviation (x − x̄) is squared and therefore always positive.
The common sense of these results is discussed further when they
are illustrated numerically in §§ 12.5 and 13.14 (and plotted in Figs.
12.5.1 and 13.14.1).
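The behaviour of (12.4.4) can also be illustrated numerically. The sketch below (Python, for illustration) uses the x values of Table 12.5.1 and the error variance 46·393 that appears in § 12.5; it shows that var[Y] is smallest at x = x̄ and rises symmetrically on either side.

```python
# Illustration of eqn (12.4.4):
#   var[Y] = var[y] * (1/N + (x - xbar)^2 / sum((xj - xbar)^2)).
# The variance of the fitted value is smallest at x = xbar and grows as x
# moves away from xbar in either direction.
xs = [160, 165, 169, 175, 180, 188]     # x values of Table 12.5.1
N = len(xs)
xbar = sum(xs) / N
sxx = sum((x - xbar) ** 2 for x in xs)
var_y = 46.393                          # error variance used in § 12.5

def var_Y(x):
    return var_y * (1.0 / N + (x - xbar) ** 2 / sxx)

at_mean = var_Y(xbar)        # = var_y / N, the minimum
below = var_Y(xbar - 20)     # larger, 20 units below xbar
above = var_Y(xbar + 20)     # larger, and equal to 'below' by symmetry
```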
Confidence limits for new observations
As expected (and as in § 7.4) this reduces to (12.4.5) when m is very large, so that ȳ_m, the mean of the m new observations, becomes the same as μ. The prediction is that if repeated experiments are conducted, and in each experiment the limits calculated, then in 95 per cent of experiments (or any other chosen proportion, depending on the value chosen for t) the mean of m new observations will fall within the limits. The limits, and ȳ_m, will of course vary from experiment to experiment (see § 7.9). This prediction is, as usual, likely to be optimistic (see § 7.2). The use of this method is illustrated in § 13.14 (and plotted in Fig. 13.14.1).
12.5. Fitting a straight line with one observation at each x value

The results in Table 12.5.1 show a single observation on the dependent variable, y, at each value of the independent variable, x. For example, y might be the plasma concentration of a drug at a precisely measured time x after administration. The common sense supposition that the times have not been chosen sensibly will be confirmed by the analysis. The assumptions necessary for the analysis have been discussed in §§ 11.2, 12.1, and 12.2, and the meaning of confidence limits has been discussed in §§ 7.2 and 7.9. These should be read before this section.
TABLE 12.5.1

x      y
160    59
165    54
169    64
175    67
180    85
188    78

Totals 1037  407
The least squares estimate of the slope, β, is, by (12.2.8),

b = Σ(y − ȳ)(x − x̄)/Σ(x − x̄)².
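The estimates quoted in this section can be reproduced from Table 12.5.1 with a few lines of arithmetic, sketched here in Python for illustration.

```python
# Least squares fit to the data of Table 12.5.1, reproducing the estimates
# quoted in the text: b ≈ 0.9715, a = ybar ≈ 67.833, xbar ≈ 172.833.
xs = [160, 165, 169, 175, 180, 188]
ys = [59, 54, 64, 67, 85, 78]
N = len(xs)
xbar = sum(xs) / N
ybar = sum(ys) / N

sxx = sum((x - xbar) ** 2 for x in xs)                      # ≈ 526.833
sxy = sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys))
b = sxy / sxx          # slope estimate, eqn (12.2.8)
a = ybar               # intercept estimate at x = xbar, eqn (12.2.6)

def Y(x):              # fitted line, eqn (12.3.1)
    return a + b * (x - xbar)
```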
of stating the null hypothesis implies that the population mean of the observations is always μ (whatever the x value), i.e. it implies that β = 0, the way in which the null hypothesis was put above. The probability that the ratio of two independent estimates of the same variance will be 10·72 (as observed, Table 12.5.2), or larger, is 0·02 to 0·05 (see §§ 11.3 and 11.4), i.e. 10·72 would be exceeded in something
FIG. 12.5.1. Observed points from Table 12.5.1.
-- Least squares estimate of the straight line (eqn. (12.5.1)).
- - - 95 per cent confidence limits for Y, i.e. for the fitted line.
× Particular values of confidence limits calculated in the text.
between 2 and 5 per cent of experiments in the long run (the limitations of the tables of F, see § 11.3, prevent P being found to any greater accuracy).

In this analysis there is no good estimate of the experimental error, because only one observation was made at each value of x. This analysis
should be compared with that in § 12.6, in which replication of the observations gives a proper estimate of σ². The best that can be done in this case is to assume that the line is straight, in which case the mean square for deviations from linearity, 46·393, will be an estimate of

TABLE 12.5.2

Source of variation        d.f.  SSD      MS       F      P
Linear regression          1     497·26   497·26   10·72  0·02–0·05
Deviations from linearity  4     185·57   46·393
Total                      5     682·83
the error variance (see § 12.6). Following this procedure shows that a value of b differing from zero by as much as or more than that observed (0·9715) would be expected in between 2 and 5 per cent of repeated experiments if β were zero. This suggests, though not very conclusively, that y really does increase with x (see § 6.1).
The Gaussian confidence limits for the population value of Y at x = 200 are thus, by (12.4.5), 94·23 ± 2·776√(72·72), i.e. from Y = 70·56 to Y = 117·90; these are plotted in Fig. 12.5.1 at x = 200.

(b) x = 172·833 = x̄. At this point (x − x̄) = 0 so, from (12.5.1), Y = 67·833 = ȳ. From (12.4.4), var[Y] = var[y](1/N) = 46·393/6 = 7·73, and the confidence limits for the population value of Y are, by (12.4.5), 67·833 ± 2·776√(7·73), i.e. from 60·11 to 75·55.

(c) x = 0. At this point, the intercept on the y axis, Y = 67·833 + 0·9715(0 − 172·833) = −100·1. This is, of course, a considerable extrapolation beyond the range of the experimental results. From (12.4.4),

var[Y] = 46·393(1/6 + (0 − 172·833)²/526·833) = 2638,

which is far larger than when x is nearer x̄. The confidence limits are −100·1 ± 2·776√(2638), i.e. from −243 to +42·5.
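The confidence-limit arithmetic of these cases can be reproduced as follows (a sketch in Python; the error variance 46·393, its 4 degrees of freedom, and t = 2·776 are the values used in the text).

```python
# Reproducing the confidence-limit arithmetic of this section, eqns (12.4.4)
# and (12.4.5), with the data of Table 12.5.1.
import math

xs = [160, 165, 169, 175, 180, 188]
N = len(xs)
xbar = sum(xs) / N                       # ≈ 172.833
sxx = sum((x - xbar) ** 2 for x in xs)   # ≈ 526.833
var_y = 46.393                           # error variance, 4 d.f.
t = 2.776                                # Student's t for 4 d.f., P = 0.05
a, b = 67.833, 0.9715                    # estimates quoted in the text

def limits(x):
    Y = a + b * (x - xbar)
    varY = var_y * (1.0 / N + (x - xbar) ** 2 / sxx)   # eqn (12.4.4)
    half = t * math.sqrt(varY)
    return Y - half, Y + half

lo200, hi200 = limits(200)   # text: 70.56 to 117.90
lo0, hi0 = limits(0)         # text: about -243 to +42.5
```

The limits at x = 0 are vastly wider than those at x = 200, which is the numerical face of the warning about extrapolation.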
The confidence limits are much wider at the ends than at the central portion of the curve, which illustrates the grave uncertainty involved in extrapolation beyond the observations. Moreover, it must be remembered that these confidence limits assume that the population (true) line is really straight in the region of extrapolation. There is, of course, no reason (from the evidence of this experiment) to assume this. In fact, with only one observation at each x value, linearity could not be tested even within the range of the observations. The uncertainty in the extrapolated intercept, Y = −100·1, at x = 0 is therefore really even greater than indicated by the very wide confidence limits, which extend from −243 to +42·5 (even apart from the further uncertainties discussed in § 7.2). The intercept does not differ 'significantly' from zero (or even from +40 or −240) 'at the P = 0·05 level'.
Testing a hypothetical value with the t test

As in § 9.4, the confidence limits can be interpreted as a t test, and this will make it clear that the (rather undesirable) expression 'not significant at the P = 0·05 level' means that the result of the test is P > 0·05. For example, to test the hypothesis that the population value of the intercept is μ = +40, calculate, from (4.4.1),

t = (Y − μ)/√var[Y] = (−100·1 − 40)/√(2638) = −2·728

with 4 degrees of freedom. Referring t = 2·728 to a table (see § 4.4) of Student's t distribution shows P > 0·05 (two tail; see § 6.1).
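The t test just described amounts to the following arithmetic (a sketch; the values of Y and var[Y] at x = 0 are those computed above).

```python
# t test of the hypothetical intercept value mu = +40, as in the text:
# t = (Y - mu)/sqrt(var[Y]) with Y = -100.1 and var[Y] = 2638 at x = 0.
import math

Y_at_0 = -100.1
var_Y_at_0 = 2638.0
mu = 40.0
t = (Y_at_0 - mu) / math.sqrt(var_Y_at_0)   # text value: -2.728

# |t| is less than the critical value 2.776 (4 d.f., P = 0.05), so the
# hypothetical value +40 is not rejected 'at the P = 0.05 level'.
not_significant = abs(t) < 2.776
```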
The curvature of the confidence limits for the population line is only common sense, because there is uncertainty in the value of a, i.e. in the vertical position of the line, as well as in b, its slope. If lines with the steepest and shallowest reasonable slopes (confidence limits for β) are drawn for the various reasonable values of a, the area of uncertainty will have the outline shown by the broken lines in Fig. 12.5.1. Another numerical example (with unequal numbers of observations at each point) is worked out in § 13.14.
Confidence limits for the slope. Identity of the analysis of variance with a t test

In § 12.4 it was mentioned that the slope will be normally distributed if the observations are, with variance given by (12.4.2). In this example b = 0·9715 and var[b] = 46·393/526·833 = 0·08806. The 95 per cent confidence limits, using t = 2·776 as above, are thus, by (12.4.3), 0·9715 ± 2·776√(0·08806), i.e. from 0·15 to 1·80. These limits do not include zero, indicating that b 'differs significantly from zero at the P = 0·05 level'.

As above, and as in § 9.4, this can be put as a t test. The Gaussian (normal) variable of interest is b, and the hypothesis is that its population value (β) is zero so, by (4.4.1),

t = (b − β)/√var[b] = (0·9715 − 0)/√(0·08806) = 3·274
incidentally, makes it clear why this component in the analysis of variance should have one degree of freedom.
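The slope limits, and the identity t² = F for a component with one degree of freedom, can be checked numerically with the values quoted in this section (a sketch in Python).

```python
# Confidence limits for the slope, and the identity t^2 = F for a regression
# component with one degree of freedom, using the values quoted in the text.
import math

b = 0.9715
var_b = 0.08806          # = 46.393/526.833, eqn (12.4.2)
t_crit = 2.776           # Student's t, 4 d.f., P = 0.05

half = t_crit * math.sqrt(var_b)
lo, hi = b - half, b + half         # text: about 0.15 to 1.80

t = b / math.sqrt(var_b)            # test of beta = 0; text: 3.274
F = t ** 2                          # equals the variance ratio of Table 12.5.2
```

The squared t statistic reproduces the F ratio (10·72) of the analysis of variance, which is why the regression component has one degree of freedom.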
TABLE 12.6.1

Values of adrenaline (epinephrine) concentration, y (µg/ml)

Time, x (min)
6   18   30   42   54   Total
blocks, between observers) component in the analysis of variance. The additional factor, compared with the one-way analysis in § 11.4, is
FIG. 12.6.1. Observed mean adrenaline concentration (ȳ) plotted against time (x). Data of Bain and Batty (1956) from Table 12.6.1.
that part of the differences 'between treatments' (i.e. between the mean concentrations at the five different times) can be accounted for by a linear change of concentration with time (see § 12.3).
Calculating the analysis of variance of y

The first part is exactly as in § 11.4 (where more details will be found).

(1) Correction factor, G²/N = (137·2)²/15 = 1254·923.

(2) Total sum of squares, from (2.6.5) (cf. (11.4.3), and refer to § 2.1 if you are confused by the notation),

ΣΣ(y_ij − ȳ)² = ΣΣy_ij² − G²/N
 = 30·0² + 8·9² + ... + 2·2² + 1·0² − 1254·923
 = 1612·037.

(3) Sum of squares (SSD) between columns (i.e. between the concentrations at different times), by (11.4.5), is

SSD between times = 87·1²/3 + 27·7²/3 + ... + 2·4²/3 − 1254·923 = 1605·937.
This SSD can be split into two components, just as in § 12.5. In this
case the calculations could be made easier by transforming the
independent variable (x), as shown at the end of this section. But,
for generality, the full calculation will be given first.
(a) Sum of squares due to linear regression. This is found from (12.3.4) as

SSD = [Σ(y − ȳ)(x − x̄)]²/Σ(x − x̄)²,

the summations being over all N observations. It is easy to make a mistake at this stage by supposing that there are only five x values, when in fact there are N = 15 values. This will be avoided if the N = 15 pairs of observations are written out in full, rather than in the condensed form shown in Table 12.6.1. This is shown in Table 12.6.2.
TABLE 12.6.2

x    y
6    30·0
6    28·6
6    28·5
18   8·9
…
54   0·8
54   0·6
54   1·0
Firstly find the sum of products using (2.6.7) and Table 12.6.2:

Σ(y − ȳ)(x − x̄) = Σxy − (Σx)(Σy)/N = −2286·00.   (12.6.1)
Secondly, find the sum of squares for x. From (2.6.5),

Σ(x − x̄)² = Σx² − (Σx)²/N = 6² + 6² + ... + 54² + 54² − 450²/15
 = 3(6² + 18² + ... + 54²) − 450²/15
 = 4320·000.   (12.6.2)

From (12.3.4) the SSD due to linear regression now follows:

SSD = (−2286·00)²/4320·000 = 1209·68.
(b) SSD for deviations from linearity. As in § 12.4 this is most easily found by difference (cf. (12.3.3)):

SSD due to deviations from linearity
 = SSD between x values − SSD due to linear regression   (12.6.3)
 = 1605·937 − 1209·68
 = 396·26.
(4) SSD for error. This is simply the within-groups SSD of § 11.4. The experimental error is assessed from the scatter of the replicate observations at each x value. It has N − k = 15 − 5 = 10 degrees of freedom
TABLE 12.6.3

Gaussian analysis of variance of y

Source                           d.f.      SSD       MS        F       P
Linear regression                1         1209·68   1209·68   1983    <0·001
Deviations from linearity        k−2 = 3   396·26    132·09    216·5   <0·001
Between x values (times)         k−1 = 4   1605·937  401·48    658·2   <0·001
Error (within x values (times))  N−k = 10  6·100     0·6100
Total                            N−1 = 14  1612·037
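The arithmetic of this partition can be verified from the summary quantities computed in the text (sum of products −2286·00, Σ(x − x̄)² = 4320·000, SSD between times 1605·937, error SSD 6·100 with 10 d.f.); a sketch in Python:

```python
# Arithmetic of the partition in Table 12.6.3, from quantities computed
# in the text.
sxy = -2286.00              # sum of products, eqn (12.6.1)
sxx = 4320.000              # sum of squares for x, eqn (12.6.2)
ssd_between_times = 1605.937
ssd_error = 6.100

ssd_regression = sxy ** 2 / sxx                       # eqn (12.3.4): 1209.68
ssd_deviations = ssd_between_times - ssd_regression   # eqn (12.6.3): 396.26

ms_error = ssd_error / 10                             # error MS, 10 d.f.
F_deviations = (ssd_deviations / 3) / ms_error        # k-2 = 3 d.f.; about 216.5
```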
resembling Table 12.5.2 (except that the number of different x values, k,
is no longer the same as the total number of observations, N).
The table is completed as described in Chapter 11 and § 12.5. Each
mean square would be an estimate of σ² if the null hypothesis that all 15 observations came from a single normal population (with variance σ²) were true. The ratio of each mean square to the error mean square is
referred to tables of the F ratio (see § 11.3), to see whether it is larger
than could be expected by chance. Although a considerable part of the
differences between the mean adrenaline concentrations at different
times ('between times') is accounted for by a linear relationship between
concentration (y) and time (x), the remainder ('deviations from linearity')
is still much larger than could be reasonably expected if the true line
were straight. P < 0·001, i.e. deviations from linearity as large as
those observed, or larger, would occur in far fewer than 1 in 1000
repeated experiments if the true line were straight, and if the assumptions about normal distributions, etc. (see § 12.2), made in the calculations are sufficiently nearly true.
There are now two possibilities. Either a curve can be fitted directly
to the observations (see §§ 12.7 and 12.8), or a transformation can be
sought that converts the graph to a straight line. The latter approach is
now described.
If the concentration of adrenaline decays exponentially with time, then

y = y₀e^(−kx),   (12.6.4)

where y₀ is the concentration at x = 0 and k† is the rate constant.

† The symbol k has already been used for the number of treatments (times), but there should be no risk of confusion between its two meanings.
(Remember the log is the power to which the base must be raised to give the argument, so log_e e^(−kx) = −kx.) Therefore there is a straight line relation between log y and x, with slope −k and intercept log y₀. The half-life of adrenaline is related to the rate constant in a simple way. Putting y = y₀/2 in (12.6.5) gives the half-life as

x₀.₅ = log_e 2/k = 0·69315/k.   (12.6.6)

In terms of common logarithms (base 10) the relation becomes

log₁₀ y = log₁₀ y₀ − (k/2·3026)x.   (12.6.7)
TABLE 12.6.4

Values of log₁₀ y found from Table 12.6.1

Time, x (min)
6   18   30   42   54   Total
observations the analysis will, as previously emphasized (see §§ 4.2, 6.2, and 7.2), be in error to some unknown extent. If y were known to be normally distributed the methods of § 12.8 would be preferred to that now described.
To see whether the straight line defined by (12.6.7) fits the observa-
tions, the logarithms of the observations are tabulated in Table 12.6.4.
TABLE 12.6.5

Gaussian analysis of variance of log₁₀ y

Source   d.f.   SSD   MS   F   P
FIG. 12.6.2. Same data as Fig. 12.6.1, but the mean value of the log₁₀ adrenaline concentration (from Table 12.6.4) is plotted against time. The line is that found by the method of least squares, eqn (12.6.12).
The mean log concentrations are plotted against time in Fig. 12.6.2.
The graph looks much straighter than Fig. 12.6.1. The analysis of
variance of the log concentrations in Table 12.6.4 is now calculated in
exactly the same way as the calculation of the analysis of variance of
the concentrations themselves (Table 12.6.1). The result is Table 12.6.5. Compare Table 12.6.3.

The results in Table 12.6.5 show that almost all the variation of the log concentrations between times is accounted for by a straight line relationship between log y and time, and the evidence against the null hypothesis that the true slope of this line, β, is zero is very strong. Deviations from linearity as large as, or larger than, those observed would be expected to occur in between 10 and 20 per cent of repeated experiments if the true population line were straight (if the assumptions made are correct). There is, therefore, no compelling reason to believe that the relation between log y and time is non-linear, i.e. the experiment provides no evidence that eqn. (12.6.7), and hence also eqn. (12.6.4), fit the observations inadequately. In other words there is no reason to believe that the concentration of adrenaline does not decay exponentially.
Having established that it is reasonable to fit a straight line to the
log concentrations, the next step is to estimate the parameters (slope
and intercept) of the line.
Fitting the straight line

If the log observations are denoted y', i.e.

y' = log₁₀ y,   (12.6.8)

the equation to be fitted (12.6.7) can be written as

log₁₀ Y = Y' = a + b(x − x̄),   (12.6.9)

which has the same form as in previous examples in this chapter. Using (12.2.6) the estimate of a is

a = ȳ' = 9·9159/15 = 0·66106.

To estimate the slope, the sum of products is first found as described in the analysis of the untransformed concentrations (eqns. (12.6.1) and (2.6.7)):

Σ(y' − ȳ')(x − x̄) = Σxy' − (Σx)(Σy')/N
 = (6×4·3883) + (18×2·8859) + ... + (54×(−0·3178)) − (450)(9·9159)/15
 = −135·446.   (12.6.10)
This is negative because y' decreases with x (see § 2.6). The sum of squares for x is found as in eqn. (12.6.2), and is 4320·000 as before. The slope is therefore estimated, by (12.2.8), to be

b = Σ(y' − ȳ')(x − x̄)/Σ(x − x̄)² = −135·446/4320·000 = −0·03135.   (12.6.11)
these values of b are now found as above (x₀.₅ = 0·69315/(−2·3026b)). Because 2·3026 and 0·69315 are constants, not variables, no additional error enters in the conversion of b to x₀.₅. The 95 per cent Gaussian confidence limits for the true half-life are thus 8·930 to 10·38 min. As usual these limits can be interpreted as in § 7.9 only if all the assumptions discussed in §§ 7.2, 11.2, and 12.2 are fulfilled. And, as usual, the limits are likely to be optimistic (see § 7.2).
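The conversion of the fitted log₁₀ slope to a rate constant and half-life, via eqn (12.6.6), is a two-line calculation (sketched in Python with the slope from eqn (12.6.11)).

```python
# Conversion of the fitted log10 slope to a rate constant and half-life.
# b is the slope of the regression of log10(y) on x, eqn (12.6.11).
import math

b = -0.03135                 # slope on the log10 scale
k = -b * math.log(10)        # rate constant: k = -2.3026 * b, per minute
half_life = math.log(2) / k  # x_0.5 = 0.69315/k, eqn (12.6.6)
```

The point estimate (about 9·60 min) lies, as it must, inside the 95 per cent limits of 8·930 to 10·38 min quoted above.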
Finding least squares solutions. The geometrical meaning of the algebra

In § 12.2 the least squares estimates, â and b̂, of the parameters, α and β, of the straight line (12.2.3) were found algebraically. (In this section, and in § 12.8, the symbols â and b̂ will be used to distinguish least squares estimates from other possible estimates of the parameters.) It will be convenient to illustrate the approach to more complicated curves by first going into the case of the straight line in greater detail.
The intention is to find the values of the parameter estimates that make the sum of the squares of the deviations of the observations (y) from the calculated values (Y), S = Σ(y − Y)² (eqns. (12.2.1) and (12.2.5)), as small as possible. Notice that during the estimation procedure the experimental observations are treated as constants (the particular observations made) and various possible values of the parameters are considered. The conventional way of finding a minimum, as in § 12.2, is to differentiate and equate to zero. How this works was illustrated in Fig. 12.2.2, in which S was plotted against various possible values for a (b being held constant). The slope of this graph (i.e. ∂S/∂a) is zero at the minimum, and the corresponding value of a is taken as the least squares estimate of α. The curly ∂, indicating partial differentiation, means that b is treated as a constant when differentiating (12.2.5) to obtain (12.2.6). This means that b is given a fixed value which is inserted, along with the experimental observations (from Table 12.7.1), into (12.2.5) so that S can be calculated for various values of a, giving the curve plotted in Fig. 12.2.2. It may occur to you to ask whether the value at which b is held constant makes any difference to the estimate of α. In fact it does not, because the expression found for ∂S/∂a did not involve b, and similarly the expression for ∂S/∂b did not involve a. The geometrical meaning of this will be made clear using the data in Table 12.7.1.
Fitting a straight line in the form
Y = a+b(x-i) (12.7.1)
TABLE 12.7.1

x    y
−2   1
−1   4
0    6
1    9
2    11
3    10
4    15

Total 7    56
Mean  1·0  8·0
FIG. 12.7.1. A straight line (eqn. (12.7.2)) fitted by the method of least squares to the data in Table 12.7.1.
greater than could be reasonably expected if the population slope (β) were zero.

The least squares estimates given in (12.7.2) are â = ȳ = 8·000 and b̂ = 2·107, calculated from eqns. (12.2.6) and (12.2.7). If the values of x and y from Table 12.7.1 are inserted in the expression for the sum of

TABLE 12.7.2

Source                     d.f.  SS       MS       F     P
Linear regression          1     124·321  124·321  80·9  <0·001
Deviations from linearity  5     7·679    1·536
Total                      6     132·000
FIG. 12.7.2. Contour map of the sum of squared deviations, S (on an axis perpendicular to the paper), plotted against various values of a and b using eqn. (12.2.5) and the data in Table 12.7.1 (plotted in Fig. 12.7.1). The contours for a straight line fitted in the form Y = a + b(x − x̄) always have their axes parallel to the a and b axes. Values of S are marked on the contours. The minimum value of S lies at the least squares estimates, â = 8·000 and b̂ = 2·107. Sections across the valley, along the lines shown, are plotted in Fig. 12.7.3. The lowest point on each line (minima in Fig. 12.7.3) is marked ×.
where α' is the intercept, β the slope, and μ the population value of y. Inserting the estimates of the parameters gives

Y = â' + b̂x   (12.7.3)

and, because (12.7.2) can be written as

Y = 5·893 + 2·107x,   (12.7.4)
FIG. 12.7.3. Sections across the valley along the lines indicated in Fig. 12.7.2. The slope of the line, ∂S/∂a, is zero when S is at a minimum, as shown in Fig. 12.2.2. The value of S at the bottom of the valley is 7·679 as shown, and as found in Table 12.7.2.
to the straight line in the form of (12.7.3), in just the same way as it was applied in § 12.2 to the straight line in the form of (12.7.1). Denoting the observations y, and the values calculated from (12.7.3) as Y, as in § 12.2, gives the sum of squared deviations, which is to be minimized, as

S = Σ(y − Y)² = Σ(y − a' − bx)²
 = Σ(y² + a'² + b²x² − 2a'y − 2bxy + 2a'bx)
 = Σy² + Na'² + b²Σx² − 2a'Σy − 2bΣyx + 2a'bΣx.   (12.7.6)
This is analogous to (12.2.5), but notice that this time the last term is not zero. As in § 12.2, S is differentiated with respect to a', treating b as a constant, giving

∂S/∂a' = 2Na' − 2Σy + 2bΣx,   (12.7.7)

and equating this to zero to find the value of a' for which S is a minimum (see Fig. 12.7.5) gives

Na' + bΣx = Σy.   (12.7.8)

The value of a' for which S is a minimum is no longer independent of b, as shown by the presence of b in (12.7.8), the solution of which will depend on the value of b chosen.

Differentiating (12.7.6) with respect to b, holding a' constant, gives

∂S/∂b = 2bΣx² − 2Σyx + 2a'Σx,   (12.7.9)
The contours are still elliptical, but their axes are no longer parallel with the coordinates of the graph. When sections are made across the valley at the values of b shown in Fig. 12.7.4, the results are as shown in Fig. 12.7.5.
FIG. 12.7.4. Contour maps of S (values marked on the contours) for the same data as Fig. 12.7.2, but with the straight line fitted in the form Y = a' + bx. Sections across the valley, along the lines shown, are plotted in Fig. 12.7.5. The lowest point along each line (minima in Fig. 12.7.5) is marked ×.
The value of a' for which S is a minimum is seen to depend on the value at which b was held constant when making the section across the valley, as expected from (12.7.8). Of course, the slope of the curves in Fig.
12.7.5, ∂S/∂a', is zero at the minimum of each curve. But the only point at which ∂S/∂b is simultaneously zero is at the bottommost point of the valley in Fig. 12.7.4 (hence the simultaneous equations). For example, on the curve for b = 3·4 in Fig. 12.7.5, S is at a minimum (i.e. ∂S/∂a' = 0) at the point a' = 4·6. Inspection of Fig. 12.7.4 makes it clear that if a section is made across the valley (at 90° to the sections
FIG. 12.7.5. Sections across the valley, along the lines shown in Fig. 12.7.4, when a straight line is fitted in the form Y = a' + bx. The value of S at the bottom of the valley is 7·679 as before.
in Fig. 12.7.5) at a' = 4·6, giving a plot of S against b (with slope = ∂S/∂b), the minimum will not be at b = 3·4. That is to say, at the point a' = 4·6, b = 3·4, ∂S/∂a' is zero but ∂S/∂b is not.
It is now clear that the effect of writing the straight line in the form Y = a + b(x − x̄) is to make the estimates â and b̂ independent of each other, so two simple independent equations (derived in § 12.2) can be used for their estimation. If the line is written in the form Y = a' + bx, then the estimates are no longer independent, but must be found by solving simultaneous equations.
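The simultaneous normal equations (12.7.8) and (12.7.10) can be solved directly for the data of Table 12.7.1; the sketch below (Python, for illustration) reproduces eqn (12.7.4) and the minimum S = 7·679 found in Table 12.7.2.

```python
# Solving the simultaneous normal equations for the line in the form
# Y = a' + bx, with the data of Table 12.7.1:
#   N*a'  + b*Sx  = Sy      (12.7.8)
#   a'*Sx + b*Sxx = Sxy     (12.7.10)
xs = [-2, -1, 0, 1, 2, 3, 4]
ys = [1, 4, 6, 9, 11, 10, 15]
N = len(xs)
Sx = sum(xs)
Sy = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the 2x2 system by elimination.
b = (N * Sxy - Sx * Sy) / (N * Sxx - Sx * Sx)
a_prime = (Sy - b * Sx) / N

# Minimum sum of squared deviations, as in Table 12.7.2.
S_min = sum((y - a_prime - b * x) ** 2 for x, y in zip(xs, ys))
```

The solution, a' = 5·893 and b = 2·107, is the bottommost point of the valley in Fig. 12.7.4, where ∂S/∂a' and ∂S/∂b are simultaneously zero.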
What does linear mean?

The term linear, as usually used by statisticians, embraces more than the simple straight line. It includes any relationship of the form

Y = a + bx₁ + cx₂ + dx₃ + ...,   (12.7.11)

where x₁, x₂, x₃, ... are independent variables (see § 12.1; examples are given below), and a, b, c, d, ... are estimates of parameters. This relationship includes, as a special case, the straight line (Y = a + bx),
which has already been disc1188ed at length. Equation (12.7.11) is des-
cribed as a mull'ple linear regression equation (the 'linear' bit is,
sad to say, often omitted). As well &8 describing straight line
relationships for several variables (Xl' X" • • • ), (12.7.11) also includes,
for example, the parabola (or 8econcl degree polynomial, or quadratic),
Y = a+bx+cT, &8 the special case in which X, is the square of Xl'
(As disc1188ed in § 12.1, an 'independent variable' in the regression
sense is simply one the value of which can be fixed precisely by the
experimenter; it does not matter that in this case Xl and X, are not
independent in the sense of II 2.4 and 2.7 since their covariance is not
zero. All that is required is that the values of Xl' X" • • • be known
precisely.) The parabola is not a straight line of course, but it is linear
,n the 8ense tAal Y is a linear function (p. 39) 0/ the parameter8 i/ the X
tHJluea are regarded as wnatanta (they are fixed when the experiment is
designed). This is the sense in which 'linear' is usually used by the
statisticians. It turns out that for (12.7.11) in general (and therefore
for the parabola), the estimates of the parameters are linear functions
of the observations. This has already been shown in the case of the
straight line for which 8 = g, and for which 6 h&8 also been shown
(eqn. (12.4.1» to be a linear function of the observations. This means
that the parameter estimates will be normally distributed if the
observations are, and the standard deviations of the estimates can be
found using (2.7.11). Also, if the parameter estimates are normally
distributed, it is a simple matter to interpret their standard deviations
in terms of significance tests or confidence limits. Furthermore, linear
problems (including polynomials) give rise to linear simultaneous
equations (like (12.7.8) and (12.7.10» which are relatively easy to
solve (cf. § 12.8). They can be handled by the very elegant branch of
mathematics known &8 matrix algebra, or linear algebra (see, for example,
Searle (1966), if you want to know more about this). It is doubtless
partly the aesthetic pleasure to be found in deriving analytical solutions
in terms of matrix algebra that has accounted for the statistical litera-
ture being heavily dominated by empirical linear models with no
physical basis, and, much more dangerous, the widespread availability
of computer programs for fitting such models by people who do not
always understand their limitations (some of which are mentioned
below).
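The point that a parabola is 'linear in the parameters' can be made concrete with a small numerical sketch (Python with numpy; the data below are invented, noise-free values chosen so that the true coefficients are known in advance). The design matrix treats x and x² as two known 'independent variables', exactly as in (12.7.11) with x2 = x1², and the linear normal equations are solved directly:

```python
import numpy as np

# Fit the parabola Y = a + b*x + c*x**2 by ordinary least squares.
# Made-up, noise-free data generated from a = 2, b = 3, c = 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x + 1.0 * x**2

# Design matrix: columns for the constant, x, and x**2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Solve the linear normal equations (X'X) p = X'y.  Each element of p
# is a linear function of the observations y, which is what makes the
# problem 'linear' despite the curve not being a straight line.
p = np.linalg.solve(X.T @ X, X.T @ y)
a, b, c = p
print(a, b, c)   # close to (2, 3, 1)
```

Because the data here are exact, the estimates recover the generating coefficients; with noisy observations the same two lines of algebra give the least squares estimates.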
Polynomial curves
It does not change the nature of the problem if some x values in
(12.7.11) are powers of the others. Thus the general polynomial regression
equation

Y = a + bx + cx² + dx³ + ...   (12.7.12)
required for the prediction-see § 13.14. In this case a polynomial
curve might be useful if the observed line was not straight.
working days lost through illness per 1000 of population in various
areas of a large city. Call this number y. It is thought that this may
depend on the number of doctors per 1000 population (x1) in the
area and the level of prosperity (say mean income, x2) of the area.
Values of y, x1, and x2 are found by observations on a number of areas
and an equation of the form of (12.7.13) is fitted to the results. Even
supposing (and it is not a very plausible assumption) that such complex
results can be described adequately by a linear relationship, and that
the other assumptions (§ 12.2) are fulfilled, the result of such an
exercise is very difficult to interpret. Suppose it were found that areas
with more doctors (x1) had fewer working days lost through illness (y).
(If (12.7.13) were to fit the observations this would have to be true
whatever the prosperity of the area.) This would imply that the co-
efficient b must be negative. Suppose it were also found that areas with
high incomes had few working days lost through illness (whatever
number of doctors were present in the area), so the coefficient c is also
negative. Inserting the values of a, b, and c found from the data into
(12.7.13) gives the required multiple regression equation. If x1 in this
equation is increased y will decrease (because b is negative). If x2 is
increased y will decrease (because c is negative). It might therefore be
inferred (and often is) that if more doctors were induced to go to an
area (increasing x1), the number of working days lost (y) would decrease.
This inference implies that it is believed that the presence of a large
number of doctors is the cause of the low number of working days lost,
and the data provide no evidence for this at all. Whatever happens in
the equation, it is clear that in real life one still has no idea whatsoever
what will happen if doctors go to an area. The number of working days
lost might indeed decrease, but it might equally well increase. For
example, it might be that doctors are attracted to areas of the city
which are near to the large teaching hospitals, and that these areas also
tend to be more prosperous. It is quite likely, then, that most people in
these areas will do office jobs which do not involve much health hazard,
and this might be the real cause of the small number of working days
lost in such areas. Conversely, less prosperous areas, away from teaching
hospitals, where many people work at industrial jobs with a high
health hazard (and where, therefore, many working days are lost
through illness), attract fewer doctors. If the occupational health hazard
were the really important factor then inducing more doctors to go to
an area, far from decreasing the number of working days lost
according to the naive interpretation of the regression equation, might
actually increase the number lost, because the occupational health
hazards would be unchanged, and the larger number of doctors might
increase the proportion of cases of occupational disease that were
diagnosed. Similarly, it cannot be predicted what effect a change in the
prosperity of an area would have on the number of working days lost.
The regression equation describes (at most) only the observations; it
says nothing at all about what would happen if the situation were changed.
where x1 is defined to have the value 1 for all responses to treatment
1 (j = 1) and 0 for all responses to treatment 2 (j = 2), and x2 is 1 for
responses to treatment 2, and 0 for responses to treatment 1. Inserting
these values, (12.7.14) reduces to yi1 = μ + τ1 + ei1 for treatment 1,
and to yi2 = μ + τ2 + ei2 for treatment 2, exactly as in § 11.2. If the
estimates of τ1 and τ2 from the data are called b and c, and the estimate of μ
is called a, the estimated value for the ith response to the jth treatment
becomes Y = a + bx1 + cx2, identical with (12.7.13). The estimation of
treatment effects (values of τ) is the same problem as the estimation of
the regression coefficients. An intermediate level discussion of this
approach will be found in the first (1960) edition of Brownlee's (1966)
book.
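This equivalence can be checked numerically (a Python sketch with numpy; the responses are made-up numbers, and lstsq is used because the indicator design, with x1 + x2 equal to the constant column, is deliberately overparameterized):

```python
import numpy as np

# Treatment effects as regression coefficients: y_ij = mu + tau_j + e_ij
# rewritten as Y = a + b*x1 + c*x2 with indicator ('dummy') variables.
y  = np.array([10.0, 12.0, 14.0, 20.0, 22.0, 24.0])  # invented responses
x1 = np.array([1, 1, 1, 0, 0, 0], dtype=float)       # 1 for treatment 1
x2 = 1.0 - x1                                        # 1 for treatment 2

X = np.column_stack([np.ones_like(y), x1, x2])
# The columns are linearly dependent, so the individual coefficients are
# not unique; lstsq returns one least squares solution.  The *fitted
# values*, however, are unique and equal the treatment means.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
print(fitted[0], fitted[3])   # mean of treatment 1, mean of treatment 2
```

The fitted value for every observation in a treatment group is the group mean, exactly as the analysis of § 11.2 would give.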
problem of estimating the error of these estimates from the experimental
results is important but complicated, and it will not be considered here
(see Draper and Smith 1966). Oliver (1970) has given formulas for
calculating the asymptotic variances of V and K from the scatter of the
observations, s(y). If there were several observations (y values) at each
x then s(y) would be estimated from the scatter of these values 'within
FIG. 12.8.1. Fitting the Michaelis–Menten hyperbola. Ordinate: y; abscissa: substrate concentration, x.
o 'Observed' values from Table 12.8.1.
---- True (population) hyperbola (known only because the 'observations' were computer-simulated, not real; see discussion on p. 268). The population standard deviation is σ(y) = 1·0 at all x values, and a bar on the graph indicates two population standard deviations.
-·- Least squares estimate of the population line found from the 'observed' values.
- - - Lineweaver–Burk (LB, or double reciprocal plot) estimate of the population line found from the same 'observations'.
The true values, 𝒱 and 𝒦, and their values estimated by the two methods (from Table 12.8.4) are marked on the graph.
x values', but in the following example, where there is only one observa-
tion at each x, the best that can be done is to assume the population
curve follows (12.8.1), in which case the sum of squares of deviations
from the fitted curve, Smin, will provide an estimate of σ²(y). This is exactly
like the situation for a straight line discussed in § 12.6. The formulas
involve the population values 𝒱 and 𝒦, for which the experimental
values V and K must be substituted. No allowance is made for the
uncertainty resulting from the use of sample values V, K, and s(y) in
place of population values 𝒱, 𝒦, and σ(y), so the formulas are to some
extent optimistic. Using them is just like using the normal deviate u
instead of Student's t (see §§ 4.3 and 4.4).
S = Σ(y − Y)² = Σ(y − Vx/(K + x))²   (12.8.2)
TABLE 12.8.1
Results of an enzyme kinetic experiment. The population (true) velocities
are also given. They are known only because the 'experiment' was not
real, but was simulated on a computer, as discussed later in this section.
FIG. 12.8.2(a). Fitting the Michaelis–Menten hyperbola. Contour map of the
sum of squared deviations, S (on an axis perpendicular to the paper), against
various values of K and V. This figure is analogous to Figs. 12.7.2 and 12.7.4,
which referred to the fitting of a straight line. The values of S, calculated from
eqn. (12.8.2) using the observations in Table 12.8.1, are marked on the contours.
(a) This covers the (physically important) positive values of V and K. The
minimum value of S, 4·323, at the bottom of the valley corresponds to the least
squares estimates V̂ = 31·45 and K̂ = 15·89.
FIG. 12.8.2(b). This shows the contour map in the region of negative
(physically impossible) K values. There is seen to be a subminimum at V = 3·244
and K = −3·793, but this corresponds to S = 764·3, a far worse fit than the lowest
minimum, S = 4·323.
The contours marked 1040 are actually for S = Σy² = 1040·4726. For values
of S equal to or greater than this, the contour lines behave curiously.
There are, in fact, several solutions to the simultaneous 'normal
equations' in this case.† For example, there is another pit at the
point V = 3·244 and K = −3·793, shown in Fig. 12.8.2(b). Although
these values correspond to a minimum in S, the minimum is merely a
hollow in the mountain side. The value of S at this minimum, 764·3, is far
greater than the value of S at the least of all the minimums, 4·323, as
shown at the bottom of the valley in Fig. 12.8.2(a). If there are several
minimums, that with the smallest S, i.e. the best fitting curve, corres-
ponds to the least squares estimates. In this case (though not necessarily
in all problems) all of the subminimums correspond to negative values
of K that are physically impossible and can therefore be ruled out.
There are many methods of finding the least squares solutions (see,
for example, Draper and Smith (1966, Chapter 10), Wilde (1964)).
In almost all non-linear problems the solution involves successive
approximations (iteration). The procedure is to make a guess at the
solution and then to apply a method for correcting the guess to bring
it nearer to the correct solution. The method is applied repeatedly
until further corrections make no important difference. The final
solution should, of course, be independent of the initial guess. Geo-
metrically, the initial guess corresponds to some point on Fig. 12.8.2
(say V = 10, K = 2 for example). The mathematical procedure is
intended to proceed by steps down the valley until it reaches (sufficiently
nearly) the bottom, which corresponds to V = V̂ and K = K̂. One
method, which sounds intuitively pleasing, is to follow the direction of
steepest descent (which is perpendicular to the contours) from the initial
guess point to the minimum. However, inspection of Fig. 12.8.2 shows
that the direction of steepest descent often points nowhere near the
minimum. Furthermore, if the search for the minimum is started in the
precipitous terrain shown in Fig. 12.8.2(b), or if this region is reached
at some time during the search, the direction of steepest descent may be
completely misleading. Although this and other sophisticated methods
(see, e.g. Draper and Smith (1966, Chapter 10)) have had much success,
many people now favour simpler search methods which seem to be
rather more robust (see Wilde 1964). One such method which has proved
useful for curve fitting (Hooke and Jeeves 1961; Wilde 1964; Colquhoun
1968, 1969) will now be described.
† There is also, in general, the possibility of a saddle point or mountain pass, when a
minimum in the plot of S against one parameter coincides with a maximum in the plot
of S against the other. Such a point also satisfies the normal equations because both
derivatives are zero.
Patternsearch minimization
In Table 12.8.2 a computer program (an Algol 60 procedure) is

TABLE 12.8.2
min := function(bp); eval := 1;
GOON: for i := 1 step 1 until k do np[i] := bp[i];
TRY: explore;
if fail = k then
begin for i := 1 step 1 until k do
    if abs(step[i]) >= critstep[i] then goto CONT;
    goto EXIT;
CONT: for i := 1 step 1 until k do step[i] := redfact[i] × step[i];
    goto TRY
end;
if min < minstore then
begin for i := 1 step 1 until k do move[i] := np[i] − bp[i] end;
EXIT:
for i := 1 step 1 until k do if abs(move[i]) > eps then goto PATTERNING
given that can be used to minimize any function, i.e. that will find
the values of the k variables (in the present example k = 2 variables,
viz. K and V) required to make the function (in the present example,
S given by (12.8.2)) a minimum. The procedure was written by Bell
on the basis of the work of Hooke and Jeeves (1961). The procedure
starts from the initial guess (basepoint) by trying steps (of specified
size) in each variable to see whether the function is reduced. The
size of the reduction is not taken into account. When a successful
pattern of moves has been found it is repeated, the step size increasing
while the moves are successful (i.e. while they reduce the function
value). When the function cannot be decreased any further the step
size is reduced (by a specified factor) and a further exploration carried
out. When the steps fall below a specified size the search terminates on
the assumption that a minimum has been found. Further details are
given by Wilde (1964).
Of course, if the surface has several pits, patternsearch will locate only
one of them, which one depending on the initial guess, step sizes,
etc.
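The explore-and-pattern-move cycle just described can be sketched in Python. This is a minimal reimplementation of the Hooke and Jeeves idea, not Bell's Algol procedure, and the Michaelis–Menten 'data' are invented noise-free values generated from V = 30, K = 15, so the true minimum (S = 0) is known in advance:

```python
def explore(f, point, fval, step):
    # exploratory moves: try +/- step in each variable, keeping improvements
    point = list(point)
    for i in range(len(point)):
        for delta in (step, -step):
            trial = list(point)
            trial[i] += delta
            ftrial = f(trial)
            if ftrial < fval:
                point, fval = trial, ftrial
                break
    return point, fval

def pattern_search(f, start, step=1.0, reduction=0.2, critstep=1e-6):
    base, fbase = list(start), f(start)
    while step > critstep:
        new, fnew = explore(f, base, fbase, step)
        if fnew < fbase:
            # pattern moves: repeat the successful direction, so the
            # effective step grows while the moves keep succeeding
            while True:
                pattern = [n + (n - b) for n, b in zip(new, base)]
                base, fbase = new, fnew
                cand, fcand = explore(f, pattern, f(pattern), step)
                if fcand < fbase:
                    new, fnew = cand, fcand
                else:
                    break
        else:
            step *= reduction      # exploration failed: reduce step size
    return base, fbase

# Noise-free Michaelis-Menten 'observations' from V = 30, K = 15 (made up).
xs = [2.5, 5.0, 10.0, 20.0, 40.0]
ys = [30.0 * x / (15.0 + x) for x in xs]

def S(p):
    V, K = p
    if V < 0 or K < 0:             # keep out of the impossible region
        return 1e30
    return sum((y - V * x / (K + x)) ** 2 for x, y in zip(xs, ys))

(Vhat, Khat), Smin = pattern_search(S, [2.0, 50.0])
print(Vhat, Khat, Smin)
```

Because the data are exact, the search should end close to V = 30, K = 15 with S near zero; with real, noisy data the same routine finds the least squares estimates.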
A typical procedure for calculating values of the function is shown in
Table 12.8.3. It calculates the sum of squared deviations (eqn. (12.8.2))
for fitting the Michaelis–Menten equation. It incorporates a simple
device for preventing the search venturing into the craggy (and physically
impossible) region of negative V and K values.
When the patternsearch program was used for fitting the Michaelis–
Menten curve to the results in Table 12.8.1 a minimum of S = 4·32299
was found at V̂ = 31·45004 and K̂ = 15·89267 after 215 evaluations of
S (from Table 12.8.3) with various trial values of V and K. In this case
the initial guesses, bp in Table 12.8.2, were set to V = 2·0, K = 50·0,
TABLE 12.8.3
An Algol 60 procedure for calculating the function to be minimized for
fitting the Michaelis–Menten equation. The arrays containing the n
observations, y[1:n], and the n substrate concentrations, x[1:n], are
declared and read in before calling patternsearch. If the Boolean variable
constrained is set to true the search is restricted to non-negative values of
V and K
real procedure function(P); real array P;
begin integer j; real S, K, V, Ycalc;
    S := 0;
    if constrained then for j := 1, 2 do if P[j] < 0 then P[j] := 0;
    V := P[1]; K := P[2];
    for j := 1 step 1 until n do
    begin Ycalc := V × x[j]/(K + x[j]);
        S := S + (y[j] − Ycalc) ↑ 2
    end;
    function := S
end of function;
step sizes were 1·0 for both V and K, reduction factor was 0·2 for both
V and K, and critstep was 10⁻³ for both V and K. Patternfactor was
2·0. In another run, the same except that patternfactor was set to
1·0, virtually the same point was reached (S = 4·32299 at V̂ = 31·45019
and K̂ = 15·89286) after 228 evaluations of S, not quite as fast. If
the initial guesses were V = 1·0, K = 2·0 then again virtually the same
minimum (S = 4·32299 at V̂ = 31·45018 and K̂ = 15·89283) was
reached after 191 trial evaluations of S. On the other hand if the initial
guesses are V = 2·5, K = −3·8 and the step sizes 0·01 then the program
locates the subminimum (S = 764·299 at V = 3·2443 and K =
−3·793) shown in Fig. 12.8.2(b), if not constrained.
set of simultaneous equations (linear or non-linear). If the n equations
are denoted fi(x1,...,xn) = 0 (i = 1,...,n) then the values of x correspond-
ing to the minimum value of Σfi² (which will be zero if the equations
have an exact solution) are the required solutions.
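For instance, the pair of equations x + y = 3 and xy = 2 can be solved by minimizing the sum of squared left-hand sides. The crude grid-refinement minimizer below is only for illustration (any of the search methods discussed above would serve):

```python
# Solve x + y = 3 and x*y = 2 by least squares minimization of
# g = f1^2 + f2^2, which is zero at an exact solution.
def g(x, y):
    f1 = x + y - 3.0          # residual of the first equation
    f2 = x * y - 2.0          # residual of the second equation
    return f1 * f1 + f2 * f2

# crude successive grid refinement around the current best point
best, span = (0.0, 0.0), 4.0
for _ in range(40):
    bx, by = best
    candidates = [(bx + i * span / 10.0, by + j * span / 10.0)
                  for i in range(-10, 11) for j in range(-10, 11)]
    best = min(candidates, key=lambda p: g(*p))
    span *= 0.5               # halve the window and search again
print(best, g(*best))         # near (1, 2) or (2, 1), with g nearly 0
```

The minimum value of g found is essentially zero, confirming that the equations have an exact solution; a non-zero minimum would indicate that only a least squares compromise exists.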
reciprocal plot (or Lineweaver–Burk plot). This depends on rearrange-
ment of (12.8.1) into the form

1/y = 1/V + (K/V)(1/x).   (12.8.3)
FIG. 12.8.3. Double reciprocal (or Lineweaver–Burk) plot (1/y against
1/x) for the 'observations' in Table 12.8.1. See also Table 12.8.4.
o 'Observations'.
---- Straight line fitted (see text) to 'observations'.
Intercept = 100/V = 100/22·58.
Slope = K/V = 8·16/22·58.
···· True line corresponding to population mean velocities in
Table 12.8.1 (i.e. 𝒱 = 30, 𝒦 = 15, see Table 12.8.4).
Intercept = 100/𝒱 = 3·33.
Slope = 𝒦/𝒱 = 0·5.
y = V − K(y/x)   (12.8.4)
TABLE 12.8.4
                                            V       K
True population value                     30·00   15·00
Least squares estimate                    31·45   15·89
Lineweaver–Burk estimate (eqn. (12.8.3))  22·58    8·16
y against y/x estimate (eqn. (12.8.4))    25·76   10·18
and known to have a standard deviation σ(y) = 1·0 at every concentra-
tion (i.e. the 'observations' were homoscedastic, see Fig. 12.2.3). The
'observations' were generated using computer methods. The observa-
tions are thus known to be unbiased (their population means, μ, are
FIG. 12.8.4. Linearized plot of y against y/x for the 'observations' in Table 12.8.1. The intercept of the true line is 𝒱 = 30; the intercept of the fitted line is V = 25·76.
known to lie exactly on the calculated curve in Fig. 12.8.1) and, unlike
what happens in any real experiment, their distribution and population
means and standard deviations are known. Seven hundred and fifty
such 'experiments' were performed, and from each 'experiment'
estimates of V and K were calculated by five methods (three of which
have been mentioned above). The resulting 750 estimates of V and K
were grouped to form histograms. The distributions so obtained of the
estimates of V are shown for three methods of estimation in Fig. 12.8.5.
FIG. 12.8.5. Histograms of the 750 estimates of V found by three methods: y against y/x, 1/y against 1/x (Lineweaver–Burk), and least squares. The true value, 𝒱 = 30, is marked on each histogram; the ordinates are frequencies.
The distributions of estimates of K are similar, which is expected in
the light of the finding that the estimates of V and K are highly corre-
lated, i.e. experiments that yield an estimate of V that is too high tend
to give an estimate of K that is too high also, whichever method
of estimation is used. Inspection of Fig. 12.8.5 shows that in this
particular case (the μ values shown in Table 12.8.1 and Fig. 12.8.1, with
normally distributed homoscedastic observations) the method of least
squares is in fact the best of the three methods. The LS estimates are
more closely grouped round the population value (𝒱 = 30·0) than the
estimates found by the other methods (i.e. they have the smallest
variance), and the average value of the LS estimates (viz. 30·4) is close
to the population value (i.e. they have little bias).
By comparison the Lineweaver–Burk method is clearly terrible,
the scatter of estimates being very much greater (near infinite estimates
will be obtained when the plot in Fig. 12.8.3 goes nearly through the
origin, giving 1/V ≃ 0, and these distort the average estimate so much
that no realistic estimate of the bias is possible).
The plot of y against y/x falls in between these extremes. In spite of
breaking the rules for fitting straight lines by having error in the
quantity (y/x) plotted along the abscissa, the estimates are obviously
much less variable than those found by the Lineweaver–Burk method
(their standard deviation is only about 28 per cent greater than that of
the LS estimates in this case). The estimates from the y vs. y/x plot
are, however, clearly consistently too low: they have a negative bias.
The average of all 750 estimates is 28·0, well below the population
value of 30·0, and about 73 per cent of estimates are too low (i.e. below
30·0). This bias is purely a property of the method of estimation. In these
simulated experiments the observations themselves were known to be
completely unbiased (a similar situation was seen in the case of the
standard deviation, see § 2.6 and Appendix 1). In real life there would
in addition be some unknown amount of bias in the observations themselves
(see §§ 1.2 and 7.2).
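A simulation in the spirit of these 750 'experiments' can be run in a few lines of Python. The substrate concentrations below are illustrative, not those of Table 12.8.1; the true curve is y = 30x/(15 + x) with σ(y) = 1, and only the Lineweaver–Burk and y against y/x estimates of V are compared:

```python
import random
import statistics

# 750 simulated 'experiments' with an assumed (illustrative) design.
random.seed(42)
V_TRUE, K_TRUE = 30.0, 15.0
xs = [2.5, 5.0, 10.0, 20.0, 40.0]

def fit_line(u, v):
    # ordinary least squares fit of v on u: returns (intercept, slope)
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    slope = suv / suu
    return vbar - slope * ubar, slope

v_lb, v_eadie = [], []
for _ in range(750):
    ys = [V_TRUE * x / (K_TRUE + x) + random.gauss(0.0, 1.0) for x in xs]
    # Lineweaver-Burk: 1/y = 1/V + (K/V)(1/x), so V = 1/intercept
    icept, slope = fit_line([1.0 / x for x in xs], [1.0 / y for y in ys])
    v_lb.append(1.0 / icept)
    # y against y/x: y = V - K(y/x), so V = intercept
    icept, slope = fit_line([y / x for x, y in zip(xs, ys)], ys)
    v_eadie.append(icept)

print(statistics.mean(v_eadie))   # the average sits below the true 30
print(statistics.pstdev(v_lb), statistics.pstdev(v_eadie))
```

With this design the y against y/x estimates show the negative bias described above, and the Lineweaver–Burk estimates show far greater scatter, reproducing the qualitative behaviour of Fig. 12.8.5.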
If, as is usually the case, experiments are repeated several times,
bias would be considered a more serious problem than large variance.
This is because the variance of an estimate can always be reduced by
doing a large enough number of experiments, whereas bias remains
however many experiments are averaged, and there is no way of
detecting the presence of bias from the results of repeated experiments.
These results are only valid for the particular conditions under which
they were obtained. In fact different results are obtained if the errors
especially in social and behavioral sciences, when often it is not
possible, or thought not to be possible, to do proper experiments (see
Chapter 1), two (or more) variables are measured, neither (or none)
of which can be fixed by the experimenter, or assigned by him to
particular individuals. Results of this sort are far more difficult to
interpret, and therefore far less satisfactory, than the results of proper
experiments as discussed in Chapter 1, but they are sometimes un-
avoidable.
Examples of the sort of questions usually treated by correlation
methods are (a) do people with good scores in school exams also have
FIG. 12.9.1. Scatter diagrams (a)–(h) illustrating Spearman (rs) and Pearson (r) correlation coefficients calculated from particular sets of observations (values such as rs = 0·60, r = 0·79, r = 0·45, rs = +0·09, and r = −0·01 are marked on the panels).
high scores in university exams? (b) are people who smoke a lot of
cigarettes more likely to die of lung cancer than those who smoke few?
(c) do parts of the country that have a large number of doctors per
1000 of population have more or fewer working days lost because of
illness than less well supplied areas? and so on. In each of these cases
there are two sets of figures (e.g. school and university exam scores for a
number of people) which can be plotted on a graph or scatter diagram
like those in Fig. 12.9.1. The tendency of one variable to increase (or
decrease) as the other variable increases can be measured by a correlation
coefficient. There are many different sorts of correlation coefficient, of
which two will be described briefly. For detailed descriptions of correla-
tion methods see, for example, Guilford (1954).
If a correlation is observed between two variables (A and B say),
and if it is large enough for it to be unlikely that it arose by chance,
then it can be concluded that
either (1) A causes B,
or (2) B causes A,
or (3) some other factor, directly or indirectly, causes both A and B,
or (4) an unlikely event has happened and a large correlation has
arisen by chance from an uncorrelated population (see § 6.1).
correlation and a value of −1 corresponds to perfect negative correla-
tion (y decreasing as x increases). However, what is meant by 'perfect
correlation' is not the same for different coefficients (see Fig. 12.9.1).
In the case of the Spearman coefficient it means that the ranking of
individuals is the same for both criteria. As an example take the N = 6
pairs of observations shown in Table 12.5.1. These were analysed by
regression methods in § 12.5. They are reproduced in Table 12.9.1,
in which the ranks of the x and of the y values are given, and also
di = difference between ranks for the ith pair of observations. In this
case one variable might be a measure of the rarity of doctors in the ith
area, and the other variable a measure of the number of working days
lost through illness in that area.
TABLE 12.9.1
pair                  rank    rank
no. (i)   xi    yi    of xi   of yi    di    di²
1         160   59      1       2      −1     1
2         165   54      2       1      +1     1
3         169   64      3       3       0     0
4         175   67      4       4       0     0
5         180   85      5       6      −1     1
6         188   78      6       5      +1     1
The Spearman rank correlation coefficient is

rs = 1 − 6Σd²/[N(N² − 1)]   (12.9.1)

where Σd² is the sum of the squares of the differences in rank for each
pair of observations (as shown in Table 12.9.1) and N = number of
pairs. From Table 12.9.1, N = 6 and Σd² = 4, so

rs = 1 − (6×4)/(6(36 − 1)) = 0·886.
This is a less than perfect positive correlation, as expected. If the ranks
for y had been exactly the same as those for x, all the differences,
di, would have been zero, so it is obvious from (12.9.1) that rs would
have been +1. If the ranks for y had been in exactly the opposite order
to the ranks for x then rs would have been −1. And that is about all
that can be said. In no sense does a correlation coefficient (of any sort)
of 0·886 mean '88·6 per cent perfect correlation', and clearly rs does
not measure the slope of the line when the observations (or the ranks)
are plotted against each other as shown in Fig. 12.9.1, as rs can only
vary between +1 and −1. Some examples of the Spearman and
Pearson (see below) correlation coefficients calculated from particular
sets of observations are shown in Fig. 12.9.1, to give an idea of their
properties. It is obvious from this figure that far more information is to
be gained from plotting the graph than from calculating a correlation
coefficient.
Ties. Small numbers of ties can be given average ranks as in Chapters
8–10. For a description of the corrections necessary when there are
many ties see, for example, Siegel (1956).
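The calculations for Table 12.9.1 can be checked directly; the short Python sketch below applies eqn (12.9.1) for the Spearman coefficient and the Pearson formula (eqn (12.9.3), discussed below) to the same six pairs:

```python
# Data from Table 12.9.1 (the six areas)
xs = [160, 165, 169, 175, 180, 188]
ys = [59, 54, 64, 67, 85, 78]
n = len(xs)

def ranks(values):
    # rank 1 = smallest value; there are no ties in these data
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

# Spearman: rs = 1 - 6*sum(d^2)/[N(N^2 - 1)], eqn (12.9.1)
d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
rs = 1 - 6 * d2 / (n * (n * n - 1))

# Pearson: r = sum of products / sqrt(product of sums of squares)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
r = sxy / (sxx * syy) ** 0.5

print(round(rs, 3), round(r, 3))   # 0.886 0.853
```

Both values agree with the worked results in the text, and the small difference between them illustrates that the two coefficients measure different things.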
An approximate significance test for an observed rs is to calculate

t = rs√[(N − 2)/(1 − rs²)]   (12.9.2)
and refer the value of t found to tables (described in § 4.4) of Student's
t distribution with N − 2 degrees of freedom. Equivalently, when N > 8,
rs can be referred to tables (e.g. Fisher and Yates, 1963, Table VII) of

TABLE 12.9.2
Critical values of rs. If the observed rs (taken as positive) is equal to or
larger than the tabulated value then P(two tail) is not more than the specified
value. Reproduced from Mainland (1963), by permission of author and
publisher.
The Pearson correlation coefficient is defined as

r = cov(y,x)/√[var(y)·var(x)] = Σ(x − x̄)(y − ȳ)/√[Σ(x − x̄)²·Σ(y − ȳ)²].   (12.9.3)
The second form follows from the definitions of variance and covariance
((2.6.2) and (2.6.6)). It was shown in § 2.6 that the covariance measures
the extent to which y increases as x increases. Pearson's r will be 1
† It is actually assumed that x and y follow a bivariate normal distribution (see, for
example, Mood and Graybill (1963), p. 198).
(or −1) only if the points lie exactly on a straight line as shown in
Fig. 12.9.1. The relationship between x and y may be perfectly pre-
dictable and yet have a low correlation coefficient if the relation is not
a straight line, as illustrated in Fig. 12.9.1 (c), (d), and (g). The informa-
tion to be gained from r is therefore limited.
Using the results in Table 12.5.1 and Table 12.9.1 as an example
once again, r can be estimated easily because the sums of squares and
products have already been calculated in § 12.5. Inserting their
values in (12.9.3) gives

r = 511·833/√(526·833 × 682·833) = 0·853

and, using (12.9.2),

t = 0·853√[(6 − 2)/(1 − 0·853²)] = 3·27.
13. Assays and calibration curves
'It is true that certain words and certain ceremonies will suffice to destroy
a flock of sheep, provided that arsenic is added as well.'
VOLTAIRE 1771
(Questions sur l'Encyclopédie: 'Enchantement')
those who, like me, prefer to have the argument laid out in words of one
syllable.
The experimental designs according to which the various concentra-
tions of standard and unknown substance can be tested are discussed
at the end of this section.
All the methods to be discussed involve the assumption, which may
be tested, that the relationship between the measurement (y, e.g.
response) and the concentration (x) is a straight line. Some transforma-
tion of either the dependent variable, y, or the independent variable, x,
may be used to make the line straight. The effects of such transforma-
tions are discussed in § 12.2. In biological assay the transformed response
is called the response metameter (i.e. the measure of response used for
calculations) and the transformed concentration or dose is called the
dose metameter. Of course the response metameter may be the response
itself, when, as is often the case, no transformation is used.
Furthermore, all the methods to be discussed assume that the
standard and unknown behave as though they were identical, apart
from the concentration of the substance being assayed. Such assays are
called analytical dilution assays. When this condition is not fulfilled
the assay is called a comparative assay. Comparative assays occur
when, for example, the concentration of one protein is estimated
using a different protein as the standard, or when the potency of a
new drug relative to a different standard drug is wanted. (Relative
potency means the ratio of the concentrations or doses required to
produce the same response.) One difficulty with comparative assays is
that the estimate of relative concentration or potency may not be a
constant, i.e. independent of the response level chosen for the comparison,
so when a log dose scale is used the lines will not be parallel (see below).
Calibration curves
Chemical assays are often done by constructing a calibration curve,
plotting response metameter (e.g. optical density) against concentration
of standard. The concentration corresponding to the optical density
(or whatever) of the unknown solution is then read off from the calibra-
tion curve. This sort of assay is discussed in § 13.14.
For example the tension developed by a muscle, or the fall in blood
pressure, is measured in response to various concentrations of the
standard and unknown preparations. Assays based on continuous
responses are discussed in this chapter. Sometimes, however, the
proportion of individuals, out of a group of n individuals, that produce
a fixed response is measured. For example 10 animals might be given a
dose of drug and the number dying within 2 hours counted. This
response is a discontinuous variable: it can only take the values
0, 1, 2, ..., 10. The method of dealing with such responses is con-
sidered in Chapter 14, together with the closely related direct assays in
which the dose required to produce a fixed response is measured.
One of the assumptions involved in fitting a straight line by the
methods of Chapter 12, discussed in § 12.2, is the assumption that the
response metameter has the same scatter at each x value, i.e. is homo-
scedastic (see Fig. 12.2.3). This is usually assumed to be fulfilled for
assays based on continuous responses (it should be tested as described
in § 11.2). In the case of discontinuous (quantal) responses there is
reason (see Chapter 14) to believe that the homoscedasticity assumption
will not be fulfilled, and this makes the calculations more complicated.
In a slope ratio assay it is assumed that the response should be the
same (zero, or control level) when the dose of either the standard or
unknown preparation is zero. The straight line for the standard can be
written yS = a + bS zS, where bS is the slope, zS the dose (amount) of
standard, and a the response to zero dose (zS = 0); similarly for the
unknown, yU = a + bU zU, the response to zero dose being a, as for the
standard. When yS = yU it follows that a + bS zS = a + bU zU, so the
potency ratio, from (13.1.1), is R = zS/zU = bU/bS, the ratio of the
slopes of the lines, as illustrated in Fig. 13.1.1(a). An assay in which
the abscissa is the dose or amount of substance is therefore called a
slope ratio assay (cf. § 13.14). This sort of assay is described in detail
by Finney (1964).
Fig. 13.1.1. (a) Slope ratio assay. Response metameter plotted against
dose or amount (z); the standard line (slope bS) and the unknown line
meet at zero dose. (b) Parallel line assay. Response metameter plotted
against log dose, xU = log zU and xS = log zS. See text for discussion.
must also be a constant. This will be so whether or not the lines are
straight (the argument has not involved the assumption that they are),
but when they are straight it implies that they will be parallel. Assays
in which the abscissa is on a logarithmic scale are therefore called
parallel line assays. The reason for using a logarithmic dose scale is to
produce a straight line. Parallelism is a consequence of using the log-
arithmic scale (see § 12.2 also). Another consequence of using the
logarithmic dose scale is that the ratio between doses is usually kept
constant so that the interval between the log doses will be constant.
The spacing of the doses is, of course, a consequence of using a log-
arithmic scale, and not a reason for using it, as is sometimes implied.
Furthermore, the range covered by the doses has nothing to do with
the scale chosen. A wide range can be accommodated just as easily on an
arithmetic scale as on a logarithmic scale.
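The fact that a constant potency ratio appears as a constant horizontal shift on a log dose plot, whatever the shape of the curve, is easy to verify numerically. A minimal sketch; the hyperbolic 'response curve' below is invented.

```python
# If the unknown is R times as potent as the standard, its curve is the
# standard curve with all doses scaled by R. On a log-dose axis this is a
# constant horizontal shift of log10(R), regardless of the curve's shape.
import math

def response_std(dose):             # any monotonic curve will do
    return dose / (dose + 1.0)

R = 4.0                             # unknown is 4 times as potent
def response_unk(dose):
    return response_std(R * dose)   # same curve, doses scaled by R

for dose_u in (0.5, 2.0, 8.0):
    dose_s = R * dose_u             # standard dose giving the same response
    assert abs(response_std(dose_s) - response_unk(dose_u)) < 1e-12
    shift = math.log10(dose_s) - math.log10(dose_u)
    print(round(shift, 6))          # constant shift log10(R) at every level
```

The shift is the same at every response level, which is exactly why the horizontal distance between the two log dose-response curves estimates log R.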
A similar situation arises in pharmacological studies when the log
dose-response curve is plotted in the presence and absence of a drug
antagonist. The parallelism of the lines can be tested as described in
the following sections. If they are parallel the potency ratio can be
estimated. In this context the potency ratio is the ratio of the doses of
drug required to produce the same response in the presence and absence
of antagonist, and is called the dose ratio.
The rest of this chapter, except for § 13.14, will deal with parallel
line assays with continuous responses. Sections 13.2-13.10 deal with
the theory, and numerical examples are worked out in §§ 13.11-13.16.
hypothesis that the slope of the response-log dose curve is zero is
tested. Obviously the assay is invalid unless it can be rejected. Possible
reasons for an increase in dose not causing an increase in response are
(a) insensitive test object; (b) doses not sufficiently widely spaced; or
(c) responses all supramaximal.
(2) For difference between standard and unknown preparations, i.e.
is the average response to the standard different from that to the
TABLE 13.1.1
Number of doses of: Std (kS), Unknown (kU)
tested. If this hypothesis is rejected the assay must be considered in-
valid. In an analytical dilution assay the most probable cause of non-
parallelism is that one of the preparations is off the linear part of the
log dose-response curve. This is shown in Fig. 13.1.2.
Fig. 13.1.2. Apparent deviations from parallelism can result when some
doses are not on the straight part of the dose-response curve, as shown in (b),
even when the horizontal distance between the two curves is constant.
○ Observations.
- - - Straight line fitted to observations.
—— True response-log dose curve.
For example in a (3+3) dose assay there are 6 different solutions,
each of which is to be tested several (say n) times. The 6n tests may be
done in a completely random fashion as described in § 11.4. If each dose
is tested on a separate animal this means allocating the 6n doses to
6n animals strictly at random (see §§ 2.3 and 11.4). Often all observa-
tions are made on the same individual (e.g. the same spectrophotometer
or the same animal). In this case the order in which the 6n tests are
done must be strictly random (see § 2.3), and, in addition, the size of a
response must not be influenced by the size of previous responses (see
discussion of single subject assays below).
If, for example, all 6n responses could not be obtained on the same
animal, it might be possible to obtain 6 responses from each of n
animals, the animals being blocks as described in § 11.6. Examples of
assays based on randomized block designs (see § 11.6) are given in
§§ 13.11 and 13.12. A second source of error could be eliminated by
using a 6 × 6 Latin square design (this would force one to use n = 6
replicate observations on each of the 6 treatments). However it is
safer to avoid small Latin squares (see § 11.8).
If the natural blocks were not large enough to accommodate all the
treatments (for example, if the animals survived long enough to receive
only 2 of the 6 treatments), the balanced incomplete block design could
be used. References to examples are given in § 11.8 (p. 207).
The analysis of assays based on all of these designs is done using
Gaussian methods. Many untested assumptions are made in the analysis
and the results must therefore be treated with caution, as described in
§§ 4.2, 4.6, 7.2, 11.2, and 12.2. In particular, the estimate of the error
of the result is likely to be too small (see § 7.2).
(1969). If the doses have to be well separated in time to prevent inter-
action it may not be possible to give all the treatments to one subject,
so an incomplete block design may have to be used (see § 11.8 and, for
example, Colquhoun (1963)). The problem is discussed by Finney
(1964, p. 291).

13.2. The theory of parallel line assays. The response and dose
metameters
If the low doses of standard and unknown preparations are zLS and
zLU then, by definition of D, the high doses will be

    zHS = D zLS and zHU = D zLU.    (13.2.2)

The most convenient base for the logarithms is √D. This looks most
improbable at first sight, but the reason why it is so will now be shown.
Taking the logarithms to the base √D of the doses (remembering that
log√D D = 2 whatever the value of D, because the log is defined as the
power to which the base must be raised to give the argument) gives,
from (13.2.1) and (13.2.2),

    xLS = log√D zLS    (13.2.3)
    xHS = log√D zHS = log√D (D zLS)
        = log√D D + log√D zLS
        = 2 + xLS.    (13.2.4)

Similarly, for the unknown, xLU = log√D zLU, xHU = 2 + xLU.
The mean value of the log dose for the standard preparation, if the
high and low doses are given an equal number of times (n), will be,
using (13.2.4),

    x̄S = (n xLS + n xHS)/2n = xLS + 1,

and similarly for the unknown.    (13.2.5)

Combining these results with (13.2.4) gives

    (xHS − x̄S) = +1,
    (xLS − x̄S) = −1,

and similarly    (13.2.6)

    (xHU − x̄U) = +1,
    (xLU − x̄U) = −1.

Using logs to the base √D has made (x − x̄) take the value +1 for
the high doses (of both standard and unknown), and −1 for the low
doses. This means that (x − x̄)² = 1 for every dose; and since there are
½N doses of standard and ½N doses of unknown, it follows from (2.1.7)
that

    Σ(xS − x̄S)² = Σ(xU − x̄U)² = ½N,    (13.2.7)

where the summations are over all ½N doses of standard (or unknown).
Thus the total sum of squares for x, pooling standard and unknown, is

    Σ_{S,U} Σ(x − x̄)² = Σ(xS − x̄S)² + Σ(xU − x̄U)² = N,    (13.2.8)
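The effect of the √D base can be checked numerically. A minimal sketch; the doses below are invented, and D is the high/low dose ratio.

```python
# Numerical check of (13.2.3)-(13.2.8): with logs taken to base sqrt(D),
# the low and high log doses deviate from their mean by exactly -1 and +1,
# so each dose contributes exactly 1 to the sum of squares of x.
import math

D = 1.5
zLS, zHS = 2.0, 2.0 * D          # low and high doses of standard
zLU, zHU = 0.04, 0.04 * D        # low and high doses of unknown

def log_base(z, base):
    return math.log(z) / math.log(base)

base = math.sqrt(D)
xLS, xHS = log_base(zLS, base), log_base(zHS, base)
xLU, xHU = log_base(zLU, base), log_base(zHU, base)

xbarS = (xLS + xHS) / 2
xbarU = (xLU + xHU) / 2
print(round(xHS - xbarS, 6), round(xLS - xbarS, 6))
# With n responses at each of the 4 dose levels (N = 4n in all), the pooled
# sum of (x - xbar)^2 is therefore N, as in (13.2.8).
```

The same check works for any D and any pair of low doses, which is the point of the change of base.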
where the symbol Σ_{S,U} means 'add the value of the following for the
standard to its value for the unknown' (as shown in the central expres-
sion of (13.2.8)). The sums of squares are greatly simplified by using
logs to the base √D.
For a symmetrical (3+3) assay, with doses zS1, zS2 = D zS1, and
zS3 = D² zS1, the convenient base is D. Taking logs to the base D gives

    xS1 = logD zS1,
    xS2 = logD zS2 = logD (D zS1) = logD D + logD zS1 = 1 + xS1,    (13.2.10)
    xS3 = logD zS3 = logD (D² zS1) = 2 + xS1.

The mean standard dose, if each dose level is given the same number
of times (n), will be, using (13.2.10),

    x̄S = xS1 + 1, and similarly for the unknown.    (13.2.11)

Combining this with (13.2.10) gives, for the standard,

    (xS1 − x̄S) = −1,
    (xS2 − x̄S) = 0,    (13.2.12)
    (xS3 − x̄S) = +1,

and similarly (xU − x̄U) = −1, 0, +1 for low, middle, and high doses of
unknown.
Because the assay is symmetrical (see §§ 13.1 and 13.8.1) each dose
is given the same number of times, n. The total number of observations
is N = 6n and the number of standards is nS = 3n = ½N, and of
unknowns nU = 3n = ½N. Now (x − x̄)² = +1 for all high and low
doses, and 0 for all middle doses, so

    Σ(xS − x̄S)² = Σ(xU − x̄U)² = ⅓N,    (13.2.13)
the summations being over all ½N doses (n low, n middle, and n high)
of standard or unknown. The total sum of squares for x, pooling stand-
ard and unknown, is

    Σ_{S,U} Σ(x − x̄)² = Σ(xS − x̄S)² + Σ(xU − x̄U)² = ⅔N.    (13.2.14)
When the response is the same for standard and unknown, yS = yU, so
these can be equated giving

    ȳS + b(x′S − x̄S) = ȳU + b(x′U − x̄U),

where x′S and x′U are the log doses giving equal responses, as above.
Rearranging this to give M = x′S − x′U, from (13.3.2), gives the result

    M = log R = x′S − x′U = (x̄S − x̄U) + (ȳU − ȳS)/b.    (13.3.4)
Therefore, multiplying (13.3.4) by the conversion factor, log10 r, gives

    log10 R = [(x̄S − x̄U) + (ȳU − ȳS)/b] log10 r.    (13.3.7)

The average slope, b, is the weighted mean of the individual slopes,

    b = (wS bS + wU bU)/(wS + wU),    (13.4.1)

where the weights are the reciprocals of the variances of the slopes
(see § 2.5). The estimated variances of the individual slopes, by (12.4.2),
are

    var[bS] = s²[y]/Σ(xS − x̄S)², var[bU] = s²[y]/Σ(xU − x̄U)²,    (13.4.2)

where s²[y] is, as usual, the estimated error variance of the observations
(the error mean square from the analysis of variance).
Now in general the variance of the weighted mean ȳ = Σwy/Σw
will be given by (2.7.12) as

    var[ȳ] = 1/Σw.    (13.4.3)

Taking wS = 1/var[bS] and wU = 1/var[bU] from (13.4.2), and insert-
ing the estimate of the slope from (12.2.7) gives

    wS bS = [Σ(xS − x̄S)²/s²[y]] × [ΣyS(xS − x̄S)/Σ(xS − x̄S)²]
          = ΣyS(xS − x̄S)/s²[y]    (13.4.4)

and similarly for the unknown. Inserting these results in (13.4.1) gives the
weighted average slope

    b = [ΣyS(xS − x̄S) + ΣyU(xU − x̄U)] / [Σ(xS − x̄S)² + Σ(xU − x̄U)²]
      = Σ_{S,U}Σy(x − x̄) / Σ_{S,U}Σ(x − x̄)²,    (13.4.5)

where the symbol Σ_{S,U} means, as before, 'add the value of the following
quantity for the standard to its value for the unknown'. In other
words, the average slope is simply (pooled sum of products for S and U)/
(pooled sum of squares of x).
For symmetrical assays it was shown in § 13.2 that Σ(x − x̄)² is the
same for standard and unknown so, from (13.4.2), the weights are
equal and the two slopes (bS and bU) are simply averaged.
From (13.4.2) and (13.4.3) it follows that the variance of the average
slope is, in general, estimated as

    var[b] = 1/Σw = s²[y]/[Σ(xS − x̄S)² + Σ(xU − x̄U)²]
           = s²[y]/Σ_{S,U}Σ(x − x̄)²    (13.4.6)

(compare this with (12.4.2)). It is, of course, being assumed that the
variance of the observations, s²[y], is the same for standard and un-
known as well as for each dose level (see §§ 11.2, 12.2, and 13.1).
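The pooled-slope rule of (13.4.5) — pooled sum of products over pooled sum of squares — can be sketched in a few lines. The data below are invented for illustration.

```python
# Weighted average slope, eqn (13.4.5): pooled sum of products divided by
# pooled sum of squares of x, over standard (S) and unknown (U).
import numpy as np

xS = np.array([1.0, 2.0, 3.0]); yS = np.array([3.1, 5.0, 7.2])
xU = np.array([1.0, 2.0, 3.0]); yU = np.array([4.0, 6.1, 8.0])

def sp(x, y):  # sum of products about the means
    return np.sum((x - x.mean()) * (y - y.mean()))

def ss(x):     # sum of squares about the mean
    return np.sum((x - x.mean()) ** 2)

b = (sp(xS, yS) + sp(xU, yU)) / (ss(xS) + ss(xU))   # eqn (13.4.5)
# var[b] would be s2 / (ss(xS) + ss(xU)), as in (13.4.6), with s2 the
# error mean square from the analysis of variance.
print(round(b, 4))
```

Because the two x arrays here have equal sums of squares, this is the same as simply averaging the two individual slopes, as the text notes for symmetrical assays.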
distributed, their ratio is not. Therefore, as discussed in § 7.5, the
methods discussed so far cannot provide confidence limits for the
ratio. A solution of the problem will now be described.
The simplest application of the result is to find the confidence limits
for the ratio (= m, say) of two means (see § 14.1), the problem dis-
cussed in § 7.5. It is shown below that if g (eqn (13.5.8)) is very small
compared with one, so (1 − g) ≃ 1, the result of using Fieller's theorem
is the same as the approximate result, m ± t√[var(m)], where var(m)
is given, approximately, by (2.7.16).
The theorem is needed to find confidence limits for the value of
the independent variable (x) necessary to produce a given value of the
dependent variable (y), as discussed in § 12.4. A numerical example of
this 'calibration curve problem' is given in § 13.14. The confidence
limits for a potency ratio are also found using Fieller's theorem.
Before considering a ratio, the argument of § 7.4 leading to con-
fidence limits for a single Gaussian variable, y, will be repeated in a
rather more helpful form. If y is normally distributed with population
mean μ and estimated variance s² then y − μ is normal with population
mean μ − μ = 0 and variance s², so, as in § 4.4, t = (y − μ)/s. As in
§ 7.4 the 100α per cent confidence limits for the value of μ are based
on Student's t distribution (§ 4.4), which implies

    P[−ts ≤ (y − μ) ≤ +ts] = α    (13.5.1)

or, in other words (see § 11.3, p. 182),

    P[(y − μ)² ≤ t²s²] = α.    (13.5.2)

The deviation (y − μ) will border on significance when it is equal to
−ts or +ts, i.e. when

    (y − μ)² = t²s².
normally distributed variables (with population means α and β). The
variances of a and b must be specified and this will be done using a
new notation. This notation is based on the fact that not only the
variances but also the covariances (in analysis of variance problems
that are linear in the general sense discussed in § 12.7) can be expressed
as multiples of the error variance of the observations, s²[y] (as usual
this is the error mean square from the analysis of variance). For ex-
ample, the variance of a mean, ȳ, is, by (2.7.8), 1/n times the error
variance. Similarly the variance of a slope, b, is, by (12.4.2), 1/Σ(x − x̄)²
times the error variance. If these multiplying factors are symbolized v
then one can define

    var[a] = v11 s²,
    var[b] = v22 s²,    (13.5.3)
    cov[a, b] = v12 s²,

where s² is written for s²[y]. The subscripts distinguishing the variance
multipliers, v, are arbitrary (cf. § 2.1), but the notation used emerges
naturally from a more advanced treatment, and is that used in Finney
(1964), who discusses Fieller's theorem and two of its extensions. For
example, if a was a mean, ȳ, then v11 = 1/n, as above.
Since a and b are normally distributed and μ is a constant, the
variable (a − μb) is a linear function of normal variables, and is therefore
normally distributed. The population mean of (a − μb) will be α − μβ = 0
and its estimated variance will be, using (13.5.3), (2.7.2), (2.7.5), and
(2.7.6),

    var[a − μb] = s²(v11 − 2μ v12 + μ² v22).    (13.5.5)

And, again by analogy, the 100α per cent confidence limits for μ are
found by solving for μ the equation

    (a − μb)² = t²s²(v11 − 2μ v12 + μ² v22).    (13.5.6)
This is again a quadratic equation in μ and when solved for μ by the
usual formula (see above) the two solutions are the required confidence
limits for μ. They are

    [1/(1 − g)] [ m − g v12/v22 ± (ts/b)√{v11 − 2m v12 + m² v22 − g(v11 − v12²/v22)} ]    (13.5.7)

where

    g = t²s²v22/b².    (13.5.8)

If a and b are uncorrelated (v12 = 0) this simplifies to

    [m/(1 − g)] ± [ts/(b(1 − g))]√[(1 − g)v11 + m²v22].    (13.5.9)

When g is negligible compared with one, the limits reduce to
m ± t√[var(m)] with

    var(m) ≃ (s²/b²)(v11 − 2m v12 + m² v22).    (13.5.11)
If a and b are uncorrelated (v12 = 0), as well as g ≪ 1, then the confidence
limits for μ can again be found as m ± t√[var(m)], the approximate
expression for var(m), (13.5.11), simplifying even further to

    var(m) ≃ m²(var[a]/a² + var[b]/b²)
          = (s²/b²)(v11 + m²v22).    (13.5.12)
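Fieller's theorem as stated in (13.5.7) and (13.5.8) can be sketched as a small function. The helper `fieller_limits` and all the numbers passed to it are hypothetical; the variance multipliers are as defined in (13.5.3).

```python
# A sketch of Fieller's theorem, eqn (13.5.7): confidence limits for the
# ratio mu = alpha/beta estimated by m = a/b. Assumes g < 1.
import math

def fieller_limits(a, b, s2, v11, v12, v22, t):
    """Return (lower, upper) confidence limits for a/b, eqn (13.5.7)."""
    m = a / b
    g = t**2 * s2 * v22 / b**2          # eqn (13.5.8)
    disc = v11 - 2*m*v12 + m**2*v22 - g*(v11 - v12**2 / v22)
    half = (t * math.sqrt(s2) / b) * math.sqrt(disc)
    centre = m - g * v12 / v22
    return ((centre - half) / (1 - g), (centre + half) / (1 - g))

# With v12 = 0 this is eqn (13.5.9); when g is negligible the limits
# collapse to m +/- t*sqrt(var(m)) with var(m) from (13.5.11).
lo, hi = fieller_limits(a=10.0, b=5.0, s2=1.0, v11=0.5, v12=0.0, v22=0.02, t=2.1)
print(round(lo, 3), round(hi, 3))
```

Note that the limits are not symmetric about m = a/b unless g is negligible, which is exactly the point of using the full theorem.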
13.6. The theory of parallel line assays. Confidence limits for the
potency ratio and the optimum design of assays
This discussion applies to any parallel line assay. The simplifications
possible in the case of symmetrical assays (see § 13.1) are given later in
§ 13.10, and numerical examples in § 13.11 onwards.
The logarithm of the potency ratio (R) is M = log R, as in § 13.3. It
will be convenient to rearrange the formula for the potency ratio,
(13.3.4), to give

    M = (x̄S − x̄U) + (ȳU − ȳS)/b.    (13.6.1)

The term (x̄S − x̄U) has zero variance because x is supposed to be meas-
ured with negligible error (see §§ 12.1, 12.2, and 12.4), and so can be
treated as a constant. The approach is therefore to find confidence
limits for the population value of (ȳU − ȳS)/b and then add the constant
(x̄S − x̄U) to the results. Now if the observations are normally distributed
then so are (ȳU − ȳS) and (as explained in § 12.4) the average slope, b.
The right-hand side of (13.6.1) is therefore the ratio of two normally
distributed variables, and confidence limits for it can be found using
Fieller's theorem (§ 13.5).
The variance multipliers defined in (13.5.3) are required first.
From (2.7.3) and (2.7.8) it follows that var[ȳU − ȳS] = s²[y]/nU + s²[y]/nS
and therefore

    v11 = (1/nU + 1/nS),    (13.6.2)
where nU and nS are the numbers of responses to the unknown and
standard preparations. The variance of the average slope, b, in the
denominator, is, from (13.4.6), var[b] = s²[y]/Σ_{S,U}Σ(x − x̄)², so

    v22 = 1/Σ_{S,U}Σ(x − x̄)².    (13.6.3)

Substituting these into (13.5.9) gives the confidence limits for the
population value of (ȳU − ȳS)/b as

    (ȳU − ȳS)/b/(1 − g) ± [ts/(b(1 − g))]√[(1 − g)(1/nU + 1/nS) + (ȳU − ȳS)²/(b²Σ_{S,U}Σ(x − x̄)²)]    (13.6.4)

where

    g = t²s²/(b²Σ_{S,U}Σ(x − x̄)²).    (13.6.5)

From (13.6.1) it follows that (13.6.4) gives the confidence limits for
M − (x̄S − x̄U), so the confidence limits for the log potency ratio, M, are
(x̄S − x̄U) + [13.6.4].
To find the confidence limits for the potency ratio itself the anti-
logarithms of these limits are required. Now, as discussed in § 13.3, the
calculations are often carried out not with logarithms to base 10 of the
dose, but with some other convenient base, say r. In this case M
= logr R, and, as explained in § 13.3, it is necessary to multiply by
log10 r to convert to logarithms to base 10, before looking up the anti-
logarithms. The confidence limits for the true value of log10 R are thus

    (13.6.6)
Simplification of the calculation for good assays
If the slope of the log dose-response line, b, is large compared with
the experimental error then g will be small (see § 13.5), so (1 − g) ≃ 1.
Inserting this into (13.6.6), together with the definition of M = logr R
from (13.3.4), gives the confidence limits for log10 R as approximately
    (13.6.7)
    (13.6.8)
(4) (1/nU + 1/nS) should be small. That is, as many responses as
possible should be obtained. For a fixed total number of responses
(1/nU + 1/nS) is at a minimum when nU = nS, so a symmetrical
design (see § 13.1) is preferable.
(5) (ȳU − ȳS) should be small because it occurs after the ± sign
(see § 13.6). That is, the size of the responses to standard and unknown
should be as similar as possible. The assay will be more precise
if a good guess at its result is made beforehand.
(6) ΣΣ(x − x̄)² should be large. That is, the doses should be as far
apart as possible, making (x − x̄) large; but the responses must,
of course, remain on the straight part of the response-log dose
curve.
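Point (6) can be illustrated with a tiny sketch of how g in (13.6.5) falls as the pooled sum of squares of x grows; all the values below are invented.

```python
# How the dose spacing affects g = t^2 s^2 / (b^2 * SS), eqn (13.6.5):
# a larger pooled sum of squares of x (wider or more numerous doses)
# shrinks g, and hence narrows the Fieller confidence limits.
t, s2, b = 2.262, 4.0, 2.5   # invented: t value, error variance, slope

gs = []
for ss_pooled in (4.0, 16.0, 64.0):   # pooled sum of squares of x, S and U
    g = t**2 * s2 / (b**2 * ss_pooled)
    gs.append(g)
    print(ss_pooled, round(g, 4))     # g falls as the spacing widens
```

A g approaching 1 (as for the narrowest spacing here) means the slope is barely distinguishable from its own error, and the assay is nearly useless.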
only one of these ways is of interest. Each component must be un-
correlated with all others and this will be demonstrated, in the case of
symmetrical assays, in § 13.8. Three components, each with one degree
of freedom, can always be separated: (a) linear regression, (b) deviations
from parallelism, and (c) difference between standard and unknown
responses, as described in § 13.1. If there are more than 3 degrees of
freedom (i.e. more than 4 'treatments') the remainder can be lumped
together as 'deviations from linearity' (cf. § 12.6), as in Table 13.7.1,
or further subdivided as in §§ 13.10 and 13.12. The analysis thus has
the appearance of Table 13.7.1 if there are k dose levels ('treatments') and
N responses altogether.
TABLE 13.7.1

    Source of variation              Degrees of freedom    Sum of squares
    Linear regression                1                     A
    Deviations from parallelism      1                     B
    Between standard and unknown     1                     C
    Deviations from linearity        k−4                   D−(A+B+C)
    ...
    Total                            N−1
The bottom part of the analysis would look like Table 11.6.1 or Table
11.8.2 if a randomized block or Latin square design (respectively) were
used.
(1) Linear regression. To test whether the population value of the
slope differs from zero, the appropriate sum of squares (SSD) is, from
(13.4.5), by analogy with (12.3.4),

    SSD for linear regression =
    [ΣyS(xS − x̄S) + ΣyU(xU − x̄U)]² / [Σ(xS − x̄S)² + Σ(xU − x̄U)²].    (13.7.1)

(2) Deviations from parallelism. To test whether the lines are parallel
it seems reasonable to calculate the difference between (a) the total
sum of squares for linear regression for lines fitted separately to
standard and unknown (from (12.3.4)), and (b) the sum of squares for
linear regression when the slopes are averaged (i.e. (13.7.1)), because
this difference will be zero if the lines are parallel. Thus
SSD for deviations from parallelism = (a) − (b).    (13.7.2)

D = ratio between each dose and the one below it, the same for all
doses, and for standard and unknown (see also §§ 13.1 and 13.2
and Fig. 13.8.1), so the doses are equally spaced (by log D) on
the logarithmic scale.
As usual a hypothesis is formulated. Then the probability that observa-
tions would be made, deviating from the hypothesis by as much as, or
more than, the experimental results do, if the hypothesis were in fact
true, is calculated (cf. § 6.1).
(1) Linear regression. From Fig. 13.8.1 it is clear that if the null
hypothesis (that the true value, β, of the average slope, see § 13.4, is
zero) were true then, in the long run, the responses to the high doses

Fig. 13.8.1. Response metameter plotted against log dose (x) for the
symmetrical (2+2) dose assay, showing the mean responses at the low
and high doses of standard (LS, HS) and unknown (LU, HU), the
separate slopes bS and bU, and the average slope b.

would equal the responses to the low doses. Departure from the null
hypothesis is therefore measured by a linear regression contrast, L1,
defined as

    L1 = −ΣyLS + ΣyHS − ΣyLU + ΣyHU.    (13.8.2)

The observed value of L1 will, of course, differ somewhat from zero
even if the null hypothesis were true, and it is shown in § 13.9 how to
judge whether L1 is large enough for rejection of the null hypothesis.
(2) Deviations from parallelism. From Fig. 13.8.1 it is clear that if
the null hypothesis (that the population lines are parallel, βS = βU, see
§ 13.4) were true then, in the long run, ȳHU − ȳLU = ȳHS − ȳLS. Therefore
deviations from parallelism are measured, as above, by a deviations
from parallelism contrast, L′1, defined as

    L′1 = ΣyLS − ΣyHS − ΣyLU + ΣyHU.    (13.8.3)

Again the population value of L′1 will be zero if the null hypothesis is
true.
(3) Between standard and unknown preparations. If the null hypo-
thesis that the population mean response to standard is the same as
that for unknown were true then, in the long run, ȳLS + ȳHS = ȳLU + ȳHU.
Departure from the null hypothesis is therefore measured by the
between S and U (or between preparations) contrast, LP, defined as

    LP = −ΣyLS − ΣyHS + ΣyLU + ΣyHU,    (13.8.4)

which will have a population mean of zero if the null hypothesis is
true.
These contrasts are used for calculation of the analysis of variance
and potency ratio, as described below and in §§ 13.9 and 13.10.
The subdivision of a sum of squares (cf. § 13.7) using contrasts
is quite a general process described, for example, by Mather (1951) and
Brownlee (1965, p. 517). The set of contrasts used must satisfy two
conditions.
(1) The sum of the coefficients of the contrast must be zero. In
Table 13.8.1 the coefficients (which will be denoted a_i) of the response
totals for the contrasts defined in (13.8.2), (13.8.3), and (13.8.4) are
summarized. In each case Σa_i = 0, as required. This means that the
population mean value of the contrast will be zero when the null
hypothesis is true.†
(2) Each contrast must be independent of every other. A set of
mutually independent contrasts is described as a set of orthogonal
contrasts. It is easily shown (e.g. Brownlee (1965, p. 518)) that two
contrasts will be uncorrelated (and therefore, because a normal distri-
bution is assumed, independent, see § 2.4) when the sum of products
of corresponding coefficients for the two contrasts is zero. All results

† In the language of Appendix 1, E[L] = E[Σa_j T_j], where T_j is the total of the n
responses to the jth treatment (dose). If all the observations were from a single popula-
tion, all E[T_j] = nμ, where E[y] = μ. Thus E[L] = nμΣa_j = 0 if Σa_j = 0.
necessary for the proof have been given in § 2.7. It is shown in the
lower part of Table 13.8.1 that this condition is fulfilled for all three
possible pairs of contrasts.

TABLE 13.8.1
The upper part summarizes the coefficients (a_i) of the response totals for
the validity tests for the symmetrical (2+2) dose assay. The lower part
demonstrates the orthogonality (i.e. independence) of the contrasts

    Response totals:              ΣyLS  ΣyHS  ΣyLU  ΣyHU   Σa_i  Σa_i²
    Linear regression, L1          −1    +1    −1    +1     0     4
    Parallelism, L′1               +1    −1    −1    +1     0     4
    Preparations (S and U), LP     −1    −1    +1    +1     0     4

    a1 × a′1                       −1    −1    +1    +1     0
    a1 × aP                        +1    −1    −1    +1     0
    a′1 × aP                       −1    +1    −1    +1     0
Therefore a deviations from linearity contrast, L2, measuring departure
from the hypothesis of straightness, can be defined as

    L2 = ΣyS1 − 2ΣyS2 + ΣyS3 + ΣyU1 − 2ΣyU2 + ΣyU3,    (13.8.5)

and this will be zero in the long run if the average line is straight. The
fifth contrast is dictated by the conditions mentioned above. It is
called L′2, and inspection of the coefficients in Table 13.8.2 shows that

TABLE 13.8.2
Coefficients (a_i) of the response totals for the orthogonal contrasts in a
symmetrical (3+3) dose assay

    Response totals:  ΣyS1  ΣyS2  ΣyS3  ΣyU1  ΣyU2  ΣyU3   Σa_i  Σa_i²
    L1                 −1     0    +1    −1     0    +1     0     4
    L′1                +1     0    −1    −1     0    +1     0     4
    LP                 −1    −1    −1    +1    +1    +1     0     6
    L2                 +1    −2    +1    +1    −2    +1     0    12
    L′2                −1    +2    −1    +1    −2    +1     0    12

It can easily be checked that, as in Table 13.8.1, the sum of the products of the
coefficients of corresponding totals is zero for all possible pairs of contrasts, so
all pairs of contrasts are orthogonal.
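The two conditions can be verified mechanically for the coefficients of Table 13.8.2; a minimal sketch:

```python
# Check of the conditions for the (3+3) contrasts of Table 13.8.2:
# each row of coefficients sums to zero, and the dot product of every
# pair of rows is zero (i.e. all pairs of contrasts are orthogonal).
import itertools
import numpy as np

contrasts = {
    "L1":  [-1,  0,  1, -1,  0,  1],   # linear regression
    "L1'": [ 1,  0, -1, -1,  0,  1],   # deviations from parallelism
    "LP":  [-1, -1, -1,  1,  1,  1],   # between preparations
    "L2":  [ 1, -2,  1,  1, -2,  1],   # deviations from linearity
    "L2'": [-1,  2, -1,  1, -2,  1],
}

for name, c in contrasts.items():
    assert sum(c) == 0, name              # condition (1): coefficients sum to 0

for (n1, c1), (n2, c2) in itertools.combinations(contrasts.items(), 2):
    assert np.dot(c1, c2) == 0, (n1, n2)  # condition (2): orthogonality

print("all pairs orthogonal")
```

The same loop applied to the (2+2) coefficients of Table 13.8.1 reproduces the lower part of that table.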
to stand for the total of the n responses to the jth dose level, then the
contrasts defined in § 13.8 all have the form

    L = Σa_j T_j.    (13.9.1)

Since var[T_j] = ns²[y], the variance of L is var[L] = nΣa_j² s²[y], and the
corresponding sum of squares, with one degree of freedom, is

    SSD = L²/(nΣa_j²),    (13.9.4)

and this expression also gives the sum of squares required for the
analysis of variance, because each sum of squares has one degree of
freedom (see §§ 13.7, 13.8, 13.11, and 13.12), so the sum of squares is
the same as the mean square.
It is not difficult to show (try it) that, when the appropriate base is
used for the logarithms giving (13.2.6), (13.2.7), (13.2.12), and (13.2.13),
the sums of squares for testing validity given by the general formulas
(13.7.1), (13.7.2), and (13.7.3) are the same as those given by (13.9.4),
using the definitions of the contrasts in § 13.8. The demonstrations
follow the lines used in the next section.
13.10. The theory of symmetrical parallel line assays. Simplified
calculation of the potency ratio and its confidence limits
The general results in §§ 13.3, 13.4, and 13.6 can be simplified when
the appropriate dose metameter is used (see § 13.2). The notation, and
the definition of symmetrical, are given in (13.8.1). Numerical examples
are given in §§ 13.11 and 13.12. Try to suspend your belief that this is
a very complicated sort of simplification until you have compared the
calculations for symmetrical assays (in §§ 13.11 and 13.12) with those
for an unsymmetrical assay (§ 13.13).
(13.10.1)
(13.10.2)
Furthermore, from (13.2.5),

    x̄S − x̄U = xLS − xLU = logr zLS − logr zLU = logr(zLS/zLU)    (13.10.4)

and from (13.3.6)

    (13.10.5)

    R = (zLS/zLU) antilog10[(LP/L1) log10 D].    (13.10.6)
where, from (13.6.5), (13.10.2), and (13.10.8),

    g = Ns²t²/L1².    (13.10.10)

    (13.10.11)
    (13.10.12)

and, from the general definition of the slope b, (13.4.5), using (13.2.12),
(13.2.14), and the definition of L1 in Table 13.8.2,

    b = [1/Σ_{S,U}Σ(x − x̄)²][(xS1 − x̄S)ΣyS1 + (xS2 − x̄S)ΣyS2 + (xS3 − x̄S)ΣyS3 +
        (xU1 − x̄U)ΣyU1 + (xU2 − x̄U)ΣyU2 + (xU3 − x̄U)ΣyU3]
      = L1/Σ_{S,U}Σ(x − x̄)².    (13.10.13)
Furthermore, from (13.2.11),

    (x̄S − x̄U) = (xS1 − xU1) = logr zS1 − logr zU1 = logr(zS1/zU1)    (13.10.16)

    (13.10.17)

    (zS1/zU1) antilog10{[4LP/(3L1(1 − g)) ± (4ts/(3L1(1 − g)))√{N(1 − g) + (2N/3)(LP/L1)²}] log10 D},    (13.10.18)

where

    (13.10.19)
    (13.10.20)
by stimulation of the phrenic nerve) of the isolated rat diaphragm.
The four doses (or 'treatments') were allotted arbitrarily to the numbers
0, 1, 2, 3 as described in § 2.3:

    Dose 0 = LU = 0·28 ml of unknown solution,
    Dose 1 = HU = 0·32 ml of unknown solution,
    Dose 2 = HS = 16·0 μg of (+)-tubocurarine,
    Dose 3 = LS = 14·0 μg of (+)-tubocurarine.

The doses were given in sequence to the same tissue (see § 13.1, p. 286),
the blocks, in this case, corresponding to periods of time. The analysis

TABLE 13.11.1
Responses for the (+)-tubocurarine assay. The doses were given in
random order within each block (time period), as described in the text,
not in the order shown
              LS    HS    LU    HU    Totals
    Block 1   43    62    41    61    207
    Block 2   48    62    48    68    226
    Block 3   53    66    53    70    242
    Block 4   52    70    56    72    250
The assumptions of the analysis (normal distribution of errors, equal
scatter in all groups, independence of each response from pre-
vious responses, additivity, etc.) have been discussed in §§ 11.2 and
13.1, p. 279. The analysis is the same as that for randomized block
experiments (§ 11.6), with the addition that the between treatment
sum of squares can be split into components as described in § 13.7.
Because this assay is symmetrical (see (13.8.1)) the arithmetic can be
simplified using the results in §§ 13.8, 13.9, and 13.10. Remember that

Fig. 13.11.1. Results of symmetrical (2+2) dose assay from Table 13.11.1.
Response metameter plotted against log10 dose.
○ Observed mean responses.
—— Least squares lines constrained to be parallel (i.e. with
mean slope, see §§ 13.4, 13.10 and calculations at end of
this section).
Notice break on abscissa. The question of the units for the potency ratio,
50·61, is discussed later in this section.

the assumptions discussed in §§ 4.2, 7.2, 11.2, and 12.2 have not been
tested, so the results are more uncertain than they appear.
The sum of squared deviations (SSD) for linear regression is found
using eqn (13.9.4):

    SSD = L1²/(nΣa²) = 137²/(4 × 4) = 1173·0625.
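The contrast L1 = 137 used here, and the other two validity contrasts, can be checked directly from the column totals of Table 13.11.1. A minimal sketch (the error mean square, 3·674, is quoted from the analysis of variance of this section):

```python
# Contrast arithmetic of eqn (13.9.4) applied to the column totals of
# Table 13.11.1 (n = 4 responses per dose level).
totals = {"LS": 196, "HS": 260, "LU": 198, "HU": 271}

n = 4
sum_a2 = 4  # sum of squared contrast coefficients (Table 13.8.1)

# Coefficients from Table 13.8.1, applied to the totals LS, HS, LU, HU:
L1  = -totals["LS"] + totals["HS"] - totals["LU"] + totals["HU"]  # regression
L1p =  totals["LS"] - totals["HS"] - totals["LU"] + totals["HU"]  # parallelism
LP  = -totals["LS"] - totals["HS"] + totals["LU"] + totals["HU"]  # preparations

for name, L in (("regression", L1), ("parallelism", L1p), ("preparations", LP)):
    print(name, L, L * L / (n * sum_a2))  # SSD = L^2/(n * sum of a^2)

error_ms = 3.674  # error mean square (9 d.f.) from the analysis of variance
print(round((L1 * L1 / (n * sum_a2)) / error_ms, 1))  # F for regression, ~319.3
```

This gives L1 = 137 (SSD 1173·0625), L′1 = 9, and LP = 13, and reproduces the variance ratio of 319·3 discussed next.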
looked up in tables of the distribution of the variance ratio, as described
in § 11.3, it is found that a value of F(1,9) as large as, or larger than,
319·3 would be very rarely (P ≪ 0·001) observed if both 1173·0625 and
3·674 were estimates of the same variance (σ²), i.e. if there were in
fact no tendency for the high doses to give larger responses than low
doses (see §§ 13.8 and 13.9). It is therefore preferred to reject the null
hypothesis in favour of the alternative hypothesis that response does
TABLE 13.11.2
Analysis of variance of responses for symmetrical (2+2) dose assay of
(+)-tubocurarine. The lower part of the analysis is identical with Table
11.6.3, which was calculated using the same figures

    Source of variation    d.f.    SSD    MS    F    P
    ...
    Total                  15      1492·4375
change with dose (i.e. that β, the population value of b, is not zero, cf.
§ 12.5). The logical reason for this preference was discussed in § 6.1.
Proceeding similarly for the other variance ratios shows that devia-
tions from parallelism such as those observed would be quite common
if the true (population) lines were parallel. The same (or larger) devia-
tions from parallelism would be expected in more than 20 per cent of
repeated experiments if the population lines had the same slope
(βS = βU). There is therefore no evidence against the hypothesis of
parallelism.
Similarly there is little evidence that the average responses are
different for standard and unknown. Of course it is most unlikely that
they are exactly equipotent, but differences as large as, or larger than,
those observed would not be very uncommon if they were (see p. 93).
There appears to be a real difference between blocks. Differences as
large as, or larger than, those observed would be expected in less than
1 in 1000 experiments if the population block means were equal; cf.
§ 11.6. Inspection of the results reveals a tendency for the responses to
get larger with time, and the analysis suggests that this cannot be
attributed to experimental error. The arrangement in blocks has
therefore helped to decrease the experimental error.
All these inferences depend on the assumptions of §§ 4.2, 7.2, 11.2,
and 12.2 being sufficiently nearly true. If they were, the conclusion
would be that there is no evidence that the assay is invalid, so it is
not unreasonable to carry on and calculate the potency ratio and its
confidence limits.
D is the ratio between high and low doses (see (13.8.1)), i.e.
D = 0·32/0·28 = 16·0/14·0 = 1·14286, and the contrasts LP and L1
have already been calculated. In § 13.3 and later sections it was
assumed that all doses were expressed in the same units. This means
that zLS/zLU, and hence R, is a dimensionless ratio. In this case the
dose of standard was given in μg, and that of unknown in ml, so
zLS/zLU = 14·0 μg/0·28 ml = 50·0 μg/ml. If these units are used zLS/
zLU, and hence R, will have the units μg/ml, suggesting that, if these
units are used, R is actually the potency (concentration in μg/ml) of the
unknown, rather than a potency ratio. It can easily be seen that this
is so by converting standard and unknown to the same units. For
example the doses of standard could be assumed to be 16·0 ml and
14·0 ml of a 1·0 μg/ml standard solution of (+)-tubocurarine (the fact
that they are more likely, in reality, to have been 0·16 ml and 0·14 ml
of a 100 μg/ml solution does not alter the dose given). This would give
zLS/zLU = 14·0 ml/0·28 ml = 50 (a dimensionless ratio). The potency
ratio would therefore be 50·61, as above, also a dimensionless ratio.
The concentration of the unknown is, from the definition of the
potency ratio (13.3.1), R × concentration of standard = 50·61 × 1·0 μg/
ml = 50·61 μg/ml, as found above.
Confidence limits for the potency ratio

The simplified form of Fieller's theorem appropriate to this assay is eqn. (13.10.9), which gives confidence limits for the population value of the potency ratio as
The fact that g is considerably less than one implies that the slope, b, is much larger than its standard deviation (as inferred from the large variance ratio for linear regression in Table 13.11.2). This means that it is safe to use an approximate equation, based on (2.7.16), for the variance of the log potency ratio (as discussed in §§ 13.5, 13.6, and 13.10, and illustrated below). However, it is very little trouble to use the full equation. Substituting the above quantities into the equation for limits gives
50 antilog₁₀[(0·09489/0·984 ± (1·917 × 2·262/(137·0 × 0·984))√{(16 × 0·984)+16(0·09489)²}) × 0·05529]
= 50 antilog₁₀[−0·001757 and +0·01242]†
= 49·80 µg/ml and 51·45 µg/ml.
† If necessary, see p. 325 for a footnote describing how to find the antilog of a negative number.
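As a numerical check, the substitution above can be written out in a few lines. This is only a sketch of the arithmetic as printed (the constants are those quoted in the text, with the grouping taken from the printed display), not a restatement of eqn. (13.10.9) itself:

```python
from math import sqrt

# Quantities quoted in the text for the 2+2 assay of § 13.11
ratio_LP_LL = 0.09489    # L_P / L_L
one_minus_g = 0.984      # (1 - g)
t = 2.262                # Student's t, 9 d.f., P = 0.95
L_L = 137.0              # linear regression contrast
N = 16                   # total number of observations

centre = (ratio_LP_LL / one_minus_g) * 0.05529
half = (1.917 * t / (L_L * one_minus_g)) \
       * sqrt(N * one_minus_g + N * ratio_LP_LL ** 2) * 0.05529

lower = 50 * 10 ** (centre - half)   # about 49.8 µg/ml
upper = 50 * 10 ** (centre + half)   # about 51.45 µg/ml
```

Evaluating this reproduces the printed limits of −0·001757 and +0·01242 inside the antilog.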
Approximate confidence limits

Because g is much less than 1 the approximate formula for the limits, eqn. (13.10.11), can be used (see §§ 13.5, 13.6, and 13.10). Substituting the quantities already calculated into (13.10.11) gives the estimated standard deviation of log₁₀R as

s[log₁₀R] ≈ (0·05529/137·0)√[3·674 × 16(1+(0·09489)²)] = 0·003108.
Summary of the results

There is no evidence that the assay is invalid, and the estimated potency of the unknown tubocurarine solution is 50·61 µg/ml, with 95 per cent Gaussian confidence limits 49·80 µg/ml to 51·45 µg/ml. These conclusions are based on the assumptions discussed in §§ 4.2, 7.2, 11.2, and 12.2. The confidence limits are, as usual, likely to be too narrow (see § 7.2). Notice that the confidence limits for R are not equally spaced on each side of R, unlike the limits encountered in Chapter 7. In fact even the limits for log R are not equally spaced on each side of log R unless g is small (see §§ 13.5, 13.6, and 13.10).
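Because g is small here, the approximate limits are essentially symmetrical about log R. A minimal sketch of that shortcut, using the estimate and the standard deviation quoted above (t = 2·262 for 9 d.f.):

```python
from math import log10

R = 50.61          # estimated potency (µg/ml), from the text
s_logR = 0.003108  # approximate SD of log10(R), from eqn. (13.10.11)
t = 2.262          # Student's t, 9 d.f., P = 0.95

log_limits = (log10(R) - t * s_logR, log10(R) + t * s_logR)
limits = tuple(10 ** x for x in log_limits)  # about 49.8 and 51.4 µg/ml
```

The answer agrees with the full Fieller calculation to within rounding, as expected when g is small.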
dose occurs in the denominator of the slope, b must be divided by log₁₀√D, i.e. by ½log₁₀D = ½log₁₀(0·32/0·28) = 0·0290. The required slope is therefore b′ = 8·5625/0·0290 = 295·3. The dose-response curves have the eqns. (13.3.3),

YS = ȳS+b′(xS−x̄S),
YU = ȳU+b′(xU−x̄U),

where x is now being used to stand for log₁₀(dose), the abscissa of Fig. 13.11.1. The response means are, from Table 13.11.1, ȳS = (196+260)/8 = 57·0 and ȳU = (198+271)/8 = 58·625. The dose means have not been needed explicitly because of the simplifications resulting from the choice of dose metameter. For the standard, log₁₀16·0 = 1·2041 and log₁₀14·0 = 1·1461, so x̄S = (4 × 1·2041+4 × 1·1461)/8 = 1·1751 (each dose occurs four times, remember). Similarly log₁₀zHU = log₁₀0·32 = −0·4949 and log₁₀zLU = log₁₀0·28 = −0·5528, so x̄U = (4 × (−0·4949)+4 × (−0·5528))/8 = −0·5238.
Substituting these results gives the lines plotted in Fig. 13.11.1 as

YS = 57·0+295·3(xS−1·1751),    YU = 58·625+295·3(xU+0·5238).
§ 13.12

TABLE 13.12.1
Responses of the isolated ileum. The doses were given in random order (see text) in each block (time period), not in the order shown in the table.

[Fig. 13.12.1: responses to standard and unknown plotted against log dose; the abscissa shows log₁₀ dose (0·602–1·505), log₂ dose (2–5), and dose (4–32, logarithmic scale); the ordinate runs from 10 to 30.] The analysis indicates that these straight lines may well not fit the observations adequately. The abscissa shows three equivalent ways of plotting the log dose. Note that the ordinate does not start at zero.
and for unknown. The ratio between each dose and the one below it is D = 2 throughout. The first stage is to perform an analysis of variance on the responses to test the assay for non-validity. As for all assays, this is a Gaussian analysis of variance, and the assumptions that must be made have been discussed in §§ 4.2, 7.2, 11.2, and 12.2, which should be read. Uncertainty about the assumptions means, as usual, that the results are more uncertain than they appear.
LL = −97·0+197·5−72·0+174·5 = 203·0.
The corresponding sum of squares for linear regression is found, using (13.9.4), to be

SSD = LL²/nΣd² = 203·0²/(5 × 4) = 2060·45.
LL′ = 97·0−197·5−72·0+174·5 = 2·0.

The corresponding sum of squares is

SSD = LL′²/nΣd² = 2·0²/(5 × 4) = 0·20.
(c) Between standard and unknown preparations. The contrast, from Table 13.8.2, is

LP = −97·0−133·0−197·5+72·0+131·0+174·5 = −50·0.

The sum of squares, from (13.9.4) (using Σd² = 6 from Table 13.8.2), is

SSD = LP²/nΣd² = 50·0²/(5 × 6) = 83·33.
(f) Check on arithmetical accuracy. The total of the five sums of squares just calculated is

2060·45+0·20+83·33+2·82+32·27 = 2179·07,

agreeing, as it should, with the sum of squares between doses which was calculated independently above.
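The contrast arithmetic can be sketched compactly. The dose totals below are the six figures appearing in the contrasts above (n = 5 responses per total), and the contrast coefficients are those implied by the printed sums:

```python
# Dose totals (low, middle, high) for each preparation
std = [97.0, 133.0, 197.5]
unk = [72.0, 131.0, 174.5]
n = 5

# Linear regression: coefficients (-1, 0, 1) within each preparation
L_L = (std[2] - std[0]) + (unk[2] - unk[0])   # 203.0
SS_linear = L_L ** 2 / (n * 4)                # sum of squared coefficients = 4

# Deviations from parallelism: (1, 0, -1) for standard, (-1, 0, 1) for unknown
L_Lp = (std[0] - std[2]) + (unk[2] - unk[0])  # 2.0
SS_parallel = L_Lp ** 2 / (n * 4)

# Between preparations: (-1, -1, -1, 1, 1, 1)
L_P = sum(unk) - sum(std)                     # -50.0
SS_prep = L_P ** 2 / (n * 6)                  # sum of squared coefficients = 6
```

The three sums of squares are components of the between-doses sum of squares, which is what the arithmetical check above verifies.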
All these results are now assembled in an analysis of variance table, Table 13.12.2, which is completed as usual (cf. §§ 11.6 and 13.7). Divide each sum of squares by its number of degrees of freedom to find

TABLE 13.12.2
The P value marked † is found from the reciprocal F = 5·838/0·2; see text.

the mean squares. Then divide each mean square by the error mean square to find the variance ratios. The value of P is found from tables of the distribution of the variance ratio as described in § 11.3. As usual P is the probability of seeing a variance ratio equal to or greater than the observed value if the null hypothesis (that all 30 observations were randomly selected from a single population) were true.
Interpretation of the analysis of variance

The interpretation of analyses of variance has been discussed in §§ 6.1, 11.3, and 11.6 and in the preceding example, § 13.11. As usual it is conditional on the assumptions being sufficiently nearly true, and must be regarded as optimistic (see §§ 7.2, 11.2, and 12.2). There is no
evidence for differences between blocks, so little or nothing was gained, and some degrees of freedom were lost, by using the block arrangement in this particular case (cf. § 13.11). The average slope of the dose-response curves, shown in Fig. 13.12.1, is clearly not likely to be zero because, if it were, a value of F ≥ 362·9 would be exceedingly rare. The question of parallelism is interesting, especially as the standard and unknown were not identical substances. The variance ratio, F(1,20) = 0·2/5·838 = 0·034, is very small so there is no hint of deviations from parallelism. To find the P value for F < 1 the method described in § 11.3 can be used. Looking up F(20,1) = 5·838/0·2 = 29·2 in tables of the variance ratio gives the probability of observing an F value of 29·2 or larger as something between 0·1 and 0·2. Therefore the probability of observing F(1,20) ≤ 0·034 is 0·1–0·2: not so rare that the lines must be considered as more nearly parallel than would be expected on the basis of the observed experimental error. Another way of stating the result is that in 80–90 per cent of repeated experiments the F value for deviations from parallelism would be predicted to be greater than 0·034 if the population lines were parallel.
Though neither the standard nor the unknown observations lie on straight lines, as seen in Fig. 13.12.1, the analysis of variance gives no hint of deviations from linearity. This is because the average of the two lines (to which the analysis refers) is very nearly straight. The observations lie on lines that curve in opposite directions so the curvatures cancel when the slopes are averaged. In fact an F value corresponding to a difference in curvature as large as, or larger than, the observed one would be expected to occur, as a result of experimental error, in rather less than 5 per cent of repeated experiments. This cannot be explained further without doing more experiments. There could be a real difference in curvature as a result of the impurities in the unknown solution. On intuitive pharmacological grounds this does not seem very likely, so perhaps there is no real difference in curvature and a rarish (rather less than 1 in 20) chance has come off (see § 6.1). More experiments would be needed to tell.
If the possibility of a real difference in curvature were not considered to invalidate the assay, the potency ratio and its confidence limits would be calculated as follows.
§ 13.11 does not arise. The least squares estimate of the potency ratio, from (13.10.17), is

R = (zS1/zU1) antilog₁₀[(4LP/3LL)log₁₀D]
= (4/8) antilog₁₀[(4 × (−50·0))/(3 × 203) × log₁₀2].
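Evaluating the expression just shown (a sketch of the substitution; the numerical result, about 0·398, is implied by the limits found below):

```python
from math import log10

L_P, L_L, D = -50.0, 203.0, 2.0
z_ratio = 4 / 8   # ratio of the lowest doses, z_S1/z_U1

# Least squares estimate of the potency ratio, from (13.10.17)
R = z_ratio * 10 ** ((4 * L_P / (3 * L_L)) * log10(D))   # about 0.398
```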
Thus g = 2 × 30 × 5·838 × 2·086²/(3 × 203²) = 0·01233,
and (1−g) = 0·9877.
As in the last example, g is small so the approximate formula for the limits could be used, but before doing this the full equation given above will be used to make sure that the approximation is adequate. Substituting the above quantities into the general formula gives
0·5 antilog₁₀[((4 × (−0·2463))/(3 × 0·9877) ± (4 × 2·416 × 2·086/(3 × 203 × 0·9877))√{(30 × 0·9877)+(4 × 30/3)(−0·2463)²}) × 0·3010].

The estimated standard deviation of log₁₀R from the approximate formula is

s[log₁₀R] ≈ (4 × 0·3010/(3 × 203))√[5·838 × 30(1+(2/3)(−0·2463)²)] = 0·02669.
Taking antilogs† gives the confidence limits as 0·350 and 0·453, similar to the values found from the full equation.
Plotting the results

The slope of the response–log dose lines, from (13.10.13), is b = 203/20 = 10·15. This is the slope using x = log_D(dose) (see § 13.2). It must be divided by log₁₀D = 0·3010, giving b′ = 33·72, the slope of the response against log₁₀(dose) lines, which are plotted in Fig. 13.12.1. The full argument is similar to that for the 2+2 dose assay given in detail in § 13.11.
§ 13.13
used, will be illustrated using the results shown in Table 13.13.1 and plotted in Fig. 13.13.1. The figures are not from a real experiment; in real life a symmetrical design would be preferred. The 15 doses
should be allocated strictly at random (see § 2.3) so a one way analysis of variance (see § 11.4) is appropriate (given the assumptions described in § 11.2).
TABLE 13.13.1
Results of a 3+2 dose assay

            Standard doses          Unknown doses
Dose        1·0     3·0     10·0    1·0     4·0
n           3       4       3       2       3       Σn = 15
Mean        10·1    18·2    28·0    13·2    24·7
Total       30·3    72·8    84·0    26·4    74·1

Total (standard) = 187·1; total (unknown) = 100·5; grand total = 287·6.
For the preparations, the totals in Table 13.13.1 give the figures shown above. Now these results can be used to find the components of the sum of squares between doses as described in § 13.9:

SSD = 187·1²/10+100·5²/5−5514·2507 = 6·4403.
TABLE 13.13.2

Source of variation    d.f.    SSD    MS    F    P
b = (26·89672+8·30898)/(1·50126+0·43503) = 18·18.

The slopes of lines fitted separately to standard and unknown would be bS = 26·89672/1·50126 = 17·92, and bU = 8·30898/0·43503 = 19·10.
The lines plotted in Fig. 13.13.1 are therefore, from (13.3.3),

YS = 18·71+18·18(xS−0·4908),
YU = 20·10+18·18(xU−0·3613).
This calculation, but not the preceding ones, has been made rather simpler than in §§ 13.11 and 13.12, because there is no simplifying transformation to bother about.
The potency ratio

From (13.3.7), the potency ratio is estimated to be (because x = log₁₀ dose)

R = antilog₁₀[(0·49084−0·36126)+(20·10−18·71)/18·18]
= antilog₁₀(0·2060) = 1·607.
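The whole estimate can be reproduced from the table entries. A sketch (dose and total values from Table 13.13.1 as reconstructed above; b = 18·18 is the common slope found earlier):

```python
from math import log10

# Table 13.13.1: doses, numbers of observations, and group totals
std_doses, std_n, std_tot = [1.0, 3.0, 10.0], [3, 4, 3], [30.3, 72.8, 84.0]
unk_doses, unk_n, unk_tot = [1.0, 4.0], [2, 3], [26.4, 74.1]
b = 18.18  # common slope

xbar_s = sum(n * log10(d) for n, d in zip(std_n, std_doses)) / sum(std_n)
xbar_u = sum(n * log10(d) for n, d in zip(unk_n, unk_doses)) / sum(unk_n)
ybar_s = sum(std_tot) / sum(std_n)   # 18.71
ybar_u = sum(unk_tot) / sum(unk_n)   # 20.10

M = (xbar_s - xbar_u) + (ybar_u - ybar_s) / b   # log10 of the potency ratio
R = 10 ** M                                      # about 1.607
```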
s²[y] = 0·2680 (the error mean square with 10 d.f. from Table 13.13.2),
s = √(0·2680) = 0·5177,
(ȳU−ȳS)/b = (20·10−18·71)/18·18 = 0·076458,
ΣΣ(x−x̄)² (summed over standard and unknown) = 1·50126+0·43503 = 1·93629,
t = 2·228 for P = 0·95 limits and 10 d.f. (from tables of Student's t; see §§ 4.4 and 7.4),
so (1−g) = 0·9979.
Logs to the base 10 have been used, so the conversion factor log₁₀10 = 1. The 95 per cent confidence limits for the population value of R are therefore, from the general formula (13.6.6),

antilog₁₀[(0·49084−0·36126)+0·076458/0·9979 ±
the approximate formula, in its general form, can be used. In this case M = log₁₀R so, using (13.6.8),

var[log₁₀R] ≈ (0·268/18·18²)(1/5+1/10+0·076458²/1·93629) = 2·4570 × 10⁻⁴.
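A quick check of this substitution, with the quantities listed above:

```python
b = 18.18
error_ms = 0.268        # error mean square, 10 d.f.
diff_over_b = 0.076458  # (ybar_U - ybar_S)/b, from above
ssx = 1.93629           # sum of (x - xbar)^2 over both preparations

# Approximate variance of log10(R), following the form of (13.6.8)
var_logR = (error_ms / b ** 2) * (1/5 + 1/10 + diff_over_b ** 2 / ssx)
```

The result agrees with the printed value of 2·4570 × 10⁻⁴ to within rounding.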
§ 13.14

TABLE 13.14.1

x    Observations (y)    Total    n    Mean
results of the sort that are obtained when measurements are made from a standard calibration curve. This method is often used for chemical assays. For example x could be concentration of solute, and y the optical density of the solution measured in a spectrophotometer. In this example x can be any independent variable (see § 12.1), or any transformation of the measured variable, as long as y (the dependent
FIG. 13.14.1. The standard calibration curve plotted from the results in Table 13.14.1.
○ Observed mean responses to standard.
—— Fitted least squares straight line (see text).
–·–·– 95 per cent Gaussian confidence limits for the population (true) line, i.e. for the population value of y at any given x value (see text).
– – – 95 per cent Gaussian confidence limits for the mean of two new observations on y at any given x value (see text).
The graphical meaning of the confidence limits for the value of x corresponding to the value of y observed for the unknown (ȳU = 8·3) is illustrated.
Frequently the standard curve is determined first and it is assumed, as in this section, that it has stayed constant during the subsequent period in which measurements are made on the unknowns. This requires separate verification, and it would obviously be better if standards and unknowns were given in random order or in random blocks. If this is done the unknowns can be incorporated in the analysis of variance as described in § 13.15, the effect of this being to reduce the risk of bias and to improve slightly the estimate of error by taking into account the scatter of replicate observations on the unknown. It will of course be an assumption that the scatter of responses is the same for all of the standards and for the unknowns, in addition to the other assumptions of the Gaussian analysis of variance which have been described in §§ 11.2 and 12.2.
correction factor = 60·0²/10 = 360·0.
Although the x values are equally spaced, the simplifying transformation described at the end of § 12.6 cannot be used, because the number of observations is not the same at each x value.
(a) Sum of squares due to linear regression. First calculate

ΣxS = (1 × 2)+(2 × 3)+(3 × 2)+(4 × 3) = 26·0

and x̄S = 26·0/10 = 2·60.
The sum of products, from (2.6.7) (see § 12.6), is

ΣyS(xS−x̄S) = (1 × 4·0)+(2 × 15·0)+(3 × 14·0)+(4 × 27·0)−(26·0 × 60·0)/10 = 28·00,

and similarly

Σ(xS−x̄S)² = 80·0−26·0²/10 = 12·40.
TABLE 13.14.2

Source    d.f.    SSD    MS    F    P
Total     9       65·6200
true line is not straight. However this analysis does not distinguish between systematic and unsystematic deviations from linearity. Looking at Fig. 13.14.1 suggests that the deviations in this case, though no larger than would be expected on the basis of experimental error, are of a systematic sort. The line appears to be flattening out. Now physical considerations, and past experience, suggest that this is just the sort of nonlinearity that would be expected in a plot of, say, optical density against concentration. In a case like this it would be rather rash to fit a straight line, in spite of the fact that there are no grounds for rejecting the null hypothesis that the true (population) line is straight. This is a good example of the practical importance of the logical fact explained in § 6.1, that if there are no good grounds for rejecting a hypothesis this does not mean that there are good grounds for accepting it. In a small experiment, such as this, with substantial experimental errors, it is more than likely that deviations from linearity that are real, and large enough to be of practical importance, would not be detected with any certainty. The verdict is not proven (see § 6.1). For purposes of illustration, a straight line will now be fitted, though the foregoing remarks suggest that a polynomial (see above) would be safer. The least squares estimates of the parameters (see § 12.2) are thus, from (12.2.6),
aS = ȳS = ΣyS/nS = 60·0/10 = 6·00

and, from (12.2.8),

bS = ΣyS(xS−x̄S)/Σ(xS−x̄S)² = 28·00/12·40 = 2·2581.    (13.14.2)
The estimate of xU (e.g. concentration) corresponding to the mean observation ȳU (e.g. optical density) on the unknown is therefore, from (13.14.1) and (13.14.2),

xU = x̄S+(ȳU−ȳS)/bS
= 2·60+(8·30−6·00)/2·2581
= 3·619,    (13.14.3)

as shown graphically in Fig. 13.14.1.
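The fit and the inverse reading can be sketched in a few lines (values from the calculation above):

```python
# Least squares calibration line (13.14.2) and the inverse estimate (13.14.3)
a_s = 60.0 / 10          # mean response of the standards
b_s = 28.00 / 12.40      # slope, about 2.2581
xbar_s = 2.60
ybar_u = 8.30            # mean of the two unknown responses

x_u = xbar_s + (ybar_u - a_s) / b_s   # estimated concentration, about 3.619
```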
(I-g) = 0·9744.
The 95 per cent confidence limits for the true value of xU therefore follow from (13.5.9) (by adding x̄S to the confidence limits for (ȳU−ȳS)/b; cf. § 13.6) and are
and, from (12.4.4),

var(Y) = 0·2700(1/10+(1·0−2·60)²/12·40) = 0·082742.

The 95 per cent Gaussian confidence limits for the population value of Y at x = 1·0 are therefore, from (12.4.6), 2·387 ± 2·447√0·082742 = 1·683 and 3·091.
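The same substitution in sketch form (the t value of 2·447 is the one quoted in the text):

```python
from math import sqrt

b_s = 28.00 / 12.40
Y = 6.00 + b_s * (1.0 - 2.60)   # fitted value at x = 1.0, about 2.387
var_Y = 0.2700 * (1/10 + (1.0 - 2.60) ** 2 / 12.40)
t = 2.447                       # Student's t, value quoted in the text

limits = (Y - t * sqrt(var_Y), Y + t * sqrt(var_Y))  # about 1.683 and 3.091
```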
These limits are seen to be wider than the limits for the population value of Y, as would be expected when the uncertainty in the new observations is taken into account. They are also less strongly curved.
The mean of the two observations on the unknown in Table 13.14.1 was ȳU = 8·3, and the corresponding value of xU read off from the line was 3·619, as calculated above and as shown in Fig. 13.14.1. The 95 per cent confidence limits for xU at y = 8·3 were found above to be 3·173 to 4·118. It can be seen in Fig. 13.14.1 that these are the points where the line for y = 8·3 intersects the confidence limits just calculated (the limits for the mean of two new observations at a given x). The limits found from Fieller's theorem (13.5.9) are, in general, the same as those found graphically via (12.4.6).
Routine assays

The (2+2) or (3+3) dose assays should be preferred for accurate assays. The (kS+1) dose assay probably occurs most frequently in the form of the (2+1) dose assay, in which the unknown is interpolated between 2 standards. This is the fastest method and is often used when large numbers of unknowns have to be assayed. It is rare in practice for the doses to be arranged randomly, or in random blocks of 3 doses (kS+1 doses in general). Even worse, standard and unknowns are often given alternately, so each standard is used to interpolate both the unknown immediately before it and the unknown immediately after it.
This introduces correlation between duplicate estimates, making the estimation of error difficult. Quite often the samples to be assayed will come from an experiment in which replicate samples were obtained, and several assays will be done on each of the replicate samples. In this case a reasonable compromise between speed and statistical purity is to do (2+1) dose assays with alternate standard and unknown, and to interpolate each unknown response between the standard responses (one high and one low) on each side of it. The replicate assays on each sample are then simply averaged. An estimate of error can then be obtained from the scatter of the average assay figures for replicate samples rather than doing the calculations described below. The treatments should have been applied in random order (see § 2.3) in the original experiment and the samples should be assayed in random order. If the ratio between the high and low standard doses is small (say less than 2) it will usually be sufficiently accurate to interpolate
linearly (rather than logarithmically) between the standards. See
Colquhoun and Tattersall (1969) for further discussion.
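The interpolation step for a (2+1) dose assay can be sketched as follows. The dose and response numbers here are hypothetical, chosen only to illustrate the two rules (logarithmic interpolation, and the linear shortcut when the dose ratio is small):

```python
from math import log10

def interp_log(d_lo, d_hi, y_lo, y_hi, y_u):
    """Logarithmic interpolation of the unknown between two standards."""
    frac = (y_u - y_lo) / (y_hi - y_lo)
    return 10 ** (log10(d_lo) + frac * (log10(d_hi) - log10(d_lo)))

def interp_linear(d_lo, d_hi, y_lo, y_hi, y_u):
    """Linear shortcut, adequate when d_hi/d_lo is small (say less than 2)."""
    frac = (y_u - y_lo) / (y_hi - y_lo)
    return d_lo + frac * (d_hi - d_lo)

# Hypothetical example: standards of 2 and 4 units give responses 10 and 20,
# and the unknown gives 14
dose_log = interp_log(2.0, 4.0, 10.0, 20.0, 14.0)     # about 2.64
dose_lin = interp_linear(2.0, 4.0, 10.0, 20.0, 14.0)  # 2.8
```

With a dose ratio of 2, the two rules already differ noticeably; for smaller ratios the discrepancy shrinks, which is the basis of the shortcut mentioned above.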
[Fig. 13.15.1: the fitted line of Fig. 13.14.1 with log₁₀R = 1·619 marked as the horizontal distance between x = 2 and x = 3·619 on the log₁₀ dose (x) axis.]

FIG. 13.15.1. If x in § 13.14 (Table 13.14.1) were log dose, then the results in Table 13.14.1 could be treated as a 4+1 dose parallel line assay, as illustrated, as an alternative to the treatment as a standard curve problem which was worked out in § 13.14. The observations and fitted line are as in Fig. 13.14.1 with the addition that the dose of unknown required to produce the unknown responses has been specified.
responses yU = 8·1 and 8·5, so ȳU = 8·3 as in Table 13.14.1. This assay is plotted in Fig. 13.15.1. Using the general formula for the log potency ratio, (13.3.7), gives, using (13.14.3),

log₁₀R = x̄S−x̄U+(ȳU−ȳS)/b
= 3·619−2·00 = 1·619.
When this is entered in Table 13.15.1 the sum of squares for deviations from linearity can be found by difference. It is seen to be identical with that in Table 13.14.2, as expected. The error variance in Table 13.15.1 is 0·2429, less than the figure of 0·2700 from Table 13.14.2. Inclusion of the unknown responses has
slightly reduced the estimate of error because they are in relatively good agreement. The interpretation is the same as in § 13.14. The confidence limits for the log potency ratio can be found from the general parallel line assay formula (13.6.6). The calculation is, with

TABLE 13.15.1

Source    d.f.    SSD    MS    F    P
14. The individual effective dose,
direct assays, all-or-nothing
responses and the probit
transformation
by sample estimates, the average, z̄, of the observed IEDs. The question immediately arises as to what sort of average should be used.

If the IEDs were normally distributed there are theoretical reasons (see §§ 2.5, 4.5, and 7.1) for preferring to calculate the arithmetic mean IED for each preparation (standard and unknown). In this case the estimated potency ratio would be R = z̄S/z̄U. Because the IED has been supposed to be a normally distributed variable, this is the ratio of two normally distributed variables. A pooled estimate of the variance var[z] could be found from the scatter within groups (as in § 9.4). The confidence limits for R could then be found from Fieller's theorem, eqn. (13.5.9), with v11 = 1/nS and v22 = 1/nU, where nS and nU are the numbers of observations in each group. (Because each IED is supposed to be independent of the others, v12 = 0.)
However, if the IEDs are lognormally distributed (see § 4.5) then the problem is simpler. Tests of normality are discussed in § 4.6. If the log IED is denoted x = log z then it follows that the estimated log potency ratio will be

M = log R = x̄S−x̄U.    (14.1.2)

The variance of this will, because the estimates of IED have been assumed to be independent, be
var(M) = var(x̄S)+var(x̄U) = var(xS)/nS+var(xU)/nU,    (14.1.3)

from (2.7.3) and (2.7.8). It is necessary, as in § 9.4, to assume that the scatter of the measurements (x values) is the same in both groups, so a pooled estimate of var(x) is calculated from the scatter of the logs of the observations within groups as in § 9.4, and used as the best estimate of both var(xS) and var(xU). The confidence limits for the log potency ratio are then M ± t√{var(M)} as in § 7.4. Taking antilogarithms of these, and of (14.1.2), gives the estimates of R and its confidence limits. A numerical example is given by Burn, Finney, and Goodwin (1950, pp. 44–8).
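The procedure just described can be sketched for two hypothetical groups of IEDs (the data are invented for illustration; the t value of 2·776 is the standard tabulated value for 4 degrees of freedom and P = 0·95):

```python
from math import log10
from statistics import mean

# Hypothetical IEDs (dose units) for standard and unknown preparations
z_s = [0.8, 1.0, 1.25]
z_u = [1.6, 2.0, 2.5]

x_s = [log10(z) for z in z_s]
x_u = [log10(z) for z in z_u]

M = mean(x_s) - mean(x_u)   # log potency ratio, eqn. (14.1.2)

def ss(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

# Pooled estimate of var(x) from the within-group scatter (as in § 9.4)
var_x = (ss(x_s) + ss(x_u)) / (len(x_s) + len(x_u) - 2)
var_M = var_x / len(x_s) + var_x / len(x_u)   # eqn. (14.1.3)

t = 2.776  # Student's t, 4 d.f., P = 0.95
R = 10 ** M
limits = (10 ** (M - t * var_M ** 0.5), 10 ** (M + t * var_M ** 0.5))
```

Here the unknown needs twice the dose of the standard, so R is about 0·5, with limits roughly 0·30 to 0·83 for so small an experiment.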
14.2. The relation between the individual effective dose and all-or-nothing (quantal) responses

In the sort of experiment described in § 14.1 the individual effective dose (IED) just sufficient to produce a given effect is measured directly on each individual. For example, the amount of digitalis solution needed to produce cardiac arrest can be measured on each of a group of animals by giving it as a slow intravenous infusion and observing the volume administered at the point when the heart stops. The results given in Table 14.2.1 are an idealized version of experimental measurements of 100 individual lethal doses (z) of cocaine cited by J. W. Trevan (1927). The results have been grouped so that a histogram can be plotted from them, and the percentage of individual effective doses falling in each dose interval is denoted f. The logarithms (x) of the doses are also given (1 has been added to each of the values to make them all positive).
From the results in Table 14.2.1 the mean individual effective dose is the total of the fz values divided by the total number of observations†:

z̄ = Σfz/Σf = 51·475/100 ≈ 0·515 mg.    (14.2.1)

The median effective dose (dose for p = 50 per cent) (interpolated from Fig. 14.2.2) ≈ 0·49 mg.
The modal effective dose (interpolated from Fig. 14.2.1) ≈ 0·44 mg.
† This mean is calculated from the grouped results, each IED being assumed to have the central value of the group in which it falls. If the original ungrouped observations were available, the mean of these would be preferred. If it is accepted that z is lognormal (see below) then the mean can also be estimated using the equation on p. 78 with μ = 1̄·707 (= −0·293) and σ = 0·104 from Fig. 14.2.6. This gives antilog₁₀(1̄·707+1·1513 × 0·104²) = 0·524 mg.
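The footnote's lognormal estimate can be checked directly; the constant 1·1513 is ln 10/2 (a sketch of the arithmetic only):

```python
from math import log

mu = 0.707 - 1   # bar-1.707, the mean of log10(IED)
sigma = 0.104    # SD of log10(IED), from Fig. 14.2.6

# Mean of a lognormal variable: antilog10(mu + (ln 10 / 2) * sigma^2)
mean_ied = 10 ** (mu + (log(10) / 2) * sigma ** 2)   # about 0.524 mg
```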
A histogram of the distribution of the individual effective doses is plotted in Fig. 14.2.1 and the estimated mean, median, and modal IEDs (see § 2.5) plotted on it. The distribution looks positively skewed and therefore, as expected, mean > median > mode (see § 4.5).
TABLE 14.2.1
Frequency f = percentage of animals responding in each dose interval. Cumulative frequency p = total percentage of animals responding to a dose equal to or less than the upper limit of each dose interval. Probits (see § 14.3) were obtained from Fisher and Yates' tables (1963, Table IX, p. 69). The p values are found as the cumulative sum of the observed f values. For example 54 = 38+15+1.

Dose interval (mg of cocaine)    Mid-point (z)    log dose interval +1 (x)    f    p    fz    Probit of p
Σf = 100    Σfz = 51·475
the curve in Fig. 14.2.1 below z (cf. Fig. 5.1.2 and its cumulative form, Fig. 5.1.1, and Figs. 4.1.3 and 4.1.4).

The relation between the IED and quantal responses can now be illustrated. When a quantal response is obtained the IED itself is not measured. A fixed dose is given to a group of subjects and the number, r, showing the chosen response is observed. The proportion of subjects responding in the group is, of course, discontinuous, and if the same

[Fig. 14.2.1: histogram of f against dose, with the mode ≈ 0·44 mg marked; ordinate 0–60.]
[Fig. 14.2.2: cumulative percentage p plotted against dose z (mg), 0·1–1·0; the mode (maximum slope) ≈ 0·44 mg and the median ≈ 0·49 mg are marked.]

FIG. 14.2.2. Results from Table 14.2.1. The histogram is plotted using the cumulative frequency, p, against dose z. The blocks, each of height f, from Fig. 14.2.1, have been put above each other so that the total height is p. The sigmoid curve has been drawn by eye through the top right-hand corner of each block (see text) as an estimate of the true (continuous) cumulative distribution (i.e. the distribution function, see § 4.1) of individual effective doses, i.e. the ordinate is the percentage of animals with an individual effective dose equal to or less than z.
[Fig. 14.2.3 (see footnote below): distributions of log IED, with SD = 0·104.]
So if the quantal responses (values of r/n) were plotted against the dose, an unsymmetrical sigmoid dose-response curve like the continuous line in Fig. 14.2.2 would be expected.

Thus when quantal responses are measured the dose is fixed by the experimenter and the number (or proportion) of subjects responding is the variable measured. On the other hand, in direct assays the dose is not fixed but is the variable quantity measured by the experimenter. The subjects responding in the quantal experiment are the subjects in the group with an IED equal to or less than the fixed dose given. No information is obtained about the IEDs of single animals so Fig. 14.2.1 cannot be plotted directly (though it can of course be obtained by plotting the slope of the quantal dose-response curve, Fig. 14.2.2, against dose, i.e. by differentiation of Fig. 14.2.2 (this was shown in (4.1.5)).
The cumulative curve in Fig. 14.2.2 is analogous to an ordinary dose-response curve, for example the tension (a continuous variable) developed by a smooth muscle preparation in response to various doses of histamine. Because it is easier to handle a straight line than a curve, it is usual to look for ways of converting dose-response curves to straight lines. A method of doing this that often works in the case of
histogram and the Gaussian curve equal, but the two are still not comparable because area represents frequency for the continuous curve (see § 4.1), but not for the histogram.

(b) The histogram in this figure has been constructed so that the area of each block represents frequency, f, or, more precisely, the proportion f/Σf = f/100 in this example. The area is the height (h say) times the width of the log dose interval (w say). For example, the first and last blocks each represent a frequency of 1 per cent (see Table 14.2.1) so the first and last blocks are of equal height in Figs. 14.2.1 and 14.2.3(a). However, in Fig. 14.2.3(b) they have equal areas (each has 1 per cent of the total area), and therefore unequal heights. By definition, proportion = f/100 = hw = area. For example, for the first block w = 0·477−0·301 = 0·176, so the height (probability density) is h = f/100w = 1/17·6 = 0·05682, as plotted. For the last block w = 1·0−0·954 = 0·046, so h = f/100w = 1/4·6 = 0·2174, as plotted.

The area convention shown in Fig. 14.2.3(b) is the preferable one, because it shows the shape of the distribution correctly when the widths of the groups are not equal (though only at the expense of making it not obvious when frequencies are equal, because it is more difficult to judge relative areas than relative heights by eye). The continuous curve is a Gaussian curve with the same mean and standard deviation as in Fig. 14.2.3(a), and it can now be compared directly with the histogram because both have been plotted using the same (area) convention (see § 4.1), and both have a total area of 1·0. The Gaussian curve is seen to fit the observations reasonably well.
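The height arithmetic of the footnote can be sketched as:

```python
# Equal-area blocks of Fig. 14.2.3(b): proportion = h * w, so for a block of
# frequency f per cent and log-dose width w the height is h = f/(100 * w)
def density_height(f_percent, width):
    return f_percent / (100 * width)

h_first = density_height(1, 0.477 - 0.301)   # about 0.05682
h_last = density_height(1, 1.0 - 0.954)      # about 0.2174
```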
[Fig. 14.2.4: cumulative percentage of animals responding plotted against log dose (1+log₁₀z), 0·3–1·0; the mean = median = modal IED = antilog₁₀ 1̄·707 = 0·509 mg is marked at 50 per cent.]
[Fig. 14.2.5: probit of p (2·5–7·5, left ordinate) plotted against dose z (mg); the corresponding percentage scale (0·2–99·4) is shown on the right.]

FIG. 14.2.5. Results from Table 14.2.1. Plot of the probit of p against the dose (z). The corresponding percentage scale is shown on the right for comparison. The non-linearity indicates that IED values are not normally distributed. A smooth curve has been drawn through the points by eye and the median IED (p = 50 per cent, probit[p] = 5) is estimated to be 0·49 mg, as was also found by interpolation in Fig. 14.2.2 (cf. Fig. 14.2.6, which gives a slightly different estimate).
FIG. 14.2.6. Results from Table 14.2.1. Plot of the probit of p against log
dose (x = log z). The graph is reasonably straight, indicating that log IED
values are approximately normally distributed (i.e. IED values are approximately
lognormal, see § 4.5). The reciprocal of the slope (1/9·60 = 0·104) estimates the
standard deviation of the normal distribution of log IED values, and the dose
corresponding to p = 50 per cent (probit[p] = 5), i.e. antilog 1̄·707 = 0·509 mg,
estimates the median (= mean = mode) of the distribution of log IED values.
The distribution plotted with this mean and standard deviation is shown in
Fig. 14.2.3. The estimate of the median effective dose from this plot, 0·509 mg,
is different from that obtained from Figs. 14.2.2 and 14.2.5 (0·49 mg). This is
because a straight line has been drawn in this figure, using all the points, and
the dose corresponding to probit[p] = 5 has been interpolated from the straight
line even though it does not go exactly through the points. This would be the
best procedure if the true line were in fact straight (i.e. if the population of log
IED values were in fact Gaussian). In Figs. 14.2.2 and 14.2.5, curves were
drawn by eye to go exactly through all the points, so effectively only the observa-
tions on each side of probit[p] = 5 were being used for interpolation of the
median, whereas when a straight line (or other specified function) is fitted, all the
observations are taken into account. In a real quantal experiment the straight
dose, the result, shown in Fig. 14.2.4, is not a straight line but a
symmetrical sigmoid curve. (In fact similar results are often observed
with continuous responses also.)
A way of converting the results to a straight line is suggested by
Fig. 14.2.3, in which f rather than p is plotted against log dose. The
histogram has become roughly symmetrical compared with the skewed
distribution of IEDs seen in Fig. 14.2.1. The continuous line in Fig.
14.2.3 is a calculated normal (Gaussian) distribution with a mean
and standard deviation estimated as described below and illustrated
in Fig. 14.2.6. The calculated normal distribution is seen to fit the
observed histogram quite well, suggesting that the logarithms of the
IEDs (values of x = log z) are normally distributed, i.e. that the
IEDs (values of z) are lognormally distributed (see § 4.5). Any curve
can be linearized if the mathematical formula describing it is known.
The sigmoid curve in Fig. 14.2.4, the cumulative form of the distribu-
tion in Fig. 14.2.3, is a cumulative normal distribution. This was
illustrated in Fig. 4.1.4, which shows the cumulative form, p = F(x),
of the normal distribution in Fig. 4.1.3. If the abscissa in Fig. 4.1.4 is
some measure of the effective dose then the ordinate of the cumulative
normal distribution is
p = F(x) = area under normal curve below x
= proportion of animals for which IED ≤ x, (14.3.1)
i.e. exactly what is plotted as the ordinate in Figs. 14.2.2 and 14.2.4.
The formula for the integral normal curve shown in Fig. 4.1.4,
from (4.1.4) and (4.2.1), is
p = F(x) = ∫ from -∞ to x of [1/σ√(2π)] exp[-(x-μ)²/2σ²] dx. (14.3.2)
line would be fitted to the points in this figure using the iterative method dis-
cussed in § 14.4. In this example, the quantal data have been generated, for
illustrative purposes, using actual IED measurements rather than by giving
fixed doses to groups of animals, so the best that can be done is to fit an un-
weighted straight line (shown) as described in § 12.6.
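The cumulative normal curve of eqn (14.3.2) can be evaluated numerically through the error function; a minimal sketch (the standard identity F(x) = ½[1 + erf((x-μ)/σ√2)] is assumed):

```python
import math

def cum_normal(x, mu, sigma):
    """p = F(x): area under the normal curve below x, as in eqn (14.3.2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# At the mean, half the animals have IED <= x; one standard deviation
# below the mean gives about 16 per cent, the example used in § 14.3.
print(round(cum_normal(0.0, 0.0, 1.0), 3))    # 0.5
print(round(cum_normal(-1.0, 0.0, 1.0), 3))   # 0.159
```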
would be read off as shown in Fig. 14.3.1. This value of the abscissa
would then be plotted against the dose (or some transformation of it,
such as the logarithm of the dose), as shown in Fig. 14.3.2.
The abscissa of the standard normal curve is, as described in § 4.3,
u = (x-μ)/σ, where σ is the standard deviation of x (i.e. of the log
IED in the present case). So in effect, instead of plotting p against x,
the value of u corresponding to p (which is called the normal equivalent
deviation or NED) is plotted against x. But because the relation between
u and x,
u = (x-μ)/σ = (1/σ)x - (μ/σ), (14.3.3)
has the form of the general equation for a straight line u = bx+a, the
plot of NED against x will be a straight line with slope 1/σ and intercept
(-μ/σ) if, and only if, the values of x are normally distributed. This
is because the NED corresponding to each observed p was read from a
normal distribution curve.
The values of u are negative for p < 50 per cent response and so,
to avoid the inconvenience of handling negative values, 5·0 is added to
all values of the NED and the result is called the probit corresponding
to p, or probit[p]. Tables of the probit transformation are given, for
example, by Fisher and Yates (1963, Table IX, p. 68). From Fig.
14.3.1, it is seen that p = 50 per cent response corresponds to u = NED
= 0, i.e. probit[50 per cent] = 5. Thus
probit[p] ≡ u+5 ≡ NED+5 = 5+(x-μ)/σ (14.3.4)
so the plot of probit[p] against x will be a straight line (if x is Gaussian)
with slope 1/σ (as above) and intercept (5-μ/σ). Here, as above, σ is the
standard deviation of the distribution of x, i.e. of the log IED in the
present case. It is therefore a measure of the heterogeneity of the
subjects (see § 14.4 also).
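The probit transformation of eqn (14.3.4) is easily computed from the inverse of the standard normal distribution function; a sketch using the Python standard library:

```python
from statistics import NormalDist

def probit(p):
    """probit[p] = NED + 5, eqn (14.3.4): the NED is the standard normal
    deviate u below which a fraction p of the area lies."""
    return NormalDist().inv_cdf(p) + 5.0

print(round(probit(0.50), 1))   # 5.0
print(round(probit(0.16), 2))   # 4.01, close to the probit of 4 quoted for 16%
```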
From Fig. 14.3.1, it can be seen that the NED of a 16 per cent
response (i.e. 16 per cent of individuals affected) is -1 or, in other
words, the probit of a 16 per cent response is +4. This follows from the
fact (see § 4.3) that about 68 per cent of the area under a normal
distribution curve is within ±σ (i.e. within ±1 on a standard normal
FIG. 14.3.1. Standard Gaussian (normal) distribution (see Chapter 4).
Sixteen per cent of individuals responding corresponds to a value of u of -1
(the NED), i.e. to a probit of 4.
FIG. 14.3.2. If the dose (or transformed dose, e.g. log dose) x = 3 caused
16 per cent of individuals to respond, the probit of 16 per cent, i.e. 4·0, from
Fig. 14.3.1, would be plotted against x = 3. See complete plots in Figs. 14.2.5 and
14.2.6.
curve), and of the remaining 32 per cent of the area, 16 per cent is
below u = -1 and 16 per cent is above +1.
In Fig. 14.2.5 the probit of the percentage response is plotted against
the dose z. The curve is not straight, implying that individual effective
doses do not follow the Gaussian distribution in the animal population.
This has already been inferred by inspection of the distribution shown
in Fig. 14.2.1, which is clearly skew. However, in the usual quantal
response experiment the distribution in Fig. 14.2.1 is not itself observed.
The directly observed results are of the form shown in Fig. 14.2.2; and
it is not immediately obvious from Fig. 14.2.2 that individual effective
doses are not normally distributed. When the probit of the percentage
response is plotted against log dose, in Fig. 14.2.6, the line is seen to be
approximately straight, showing that (in this particular instance) the
logarithms of the individual effective doses are approximately normally
distributed (cf. Fig. 14.2.3).
The use of this line is discussed in the next section.
units, from Fig. 14.2.6) is an estimate of σ, the standard deviation of
the distribution of the log IED, and this value of σ was used to plot
the distribution in Fig. 14.2.3. This standard deviation is a measure of
the variability of the individuals, i.e. of the extent to which they do not
all have the same individual effective dose.
Assays of the sort described in Chapter 13 can also be done using
quantal responses, and if a log dose scale is used they will be parallel
line assays (see § 13.1).
In all the applications discussed the problem arises of how to fit the
'best' line to the observed points. Methods for doing this have been
described in Chapters 12 and 13 but they all assume that the scatter of
the observations is the same at every value of x, i.e. that the results
are homoscedastic (see §§ 12.2 and 13.1). This is not the case for probit
plots (see § 14.6 for an exception) and this complicates the process of
curve fitting. Numerical examples of the methods are given by Burn,
Finney, and Goodwin (1950, p. 114), and Finney (1964, Chapters
17-21).
The reason for the heteroscedasticity is not difficult to see. The
number of individuals (r) responding, out of a randomly selected (notice
that random selection is, as usual, essential for the analysis) group of
n, should follow the binomial distribution (§§ 3.2-3.4), and the variance
of the proportion responding, p = r/n, would be estimated from
(3.4.5) to be var[p] = p(1-p)/n. Because the line is to be fitted to the
plot of probit[p] (= y, say) against the dose metameter, it is the variance of
y = probit[p] that is of interest. From (2.7.13) it is seen that var[y]
≃ var[p].(dy/dp)² = (dy/dp)².p(1-p)/n. Now the standard normal
curve in Fig. 14.3.1 can be written (by (4.1.1)) as dp = f dy, and thus
dy/dp = 1/f, where f is the ordinate of the standard normal curve
(the probability density, see § 4.1 and (4.2.1); f was used with a different
meaning in §§ 14.2 and 14.3). This result follows, slightly more rigorously,
from (14.3.1) and (4.1.5). Therefore var[y] ≃ p(1-p)/nf², and this is
not a constant but varies with p. The probit plot is therefore hetero-
scedastic and each probit (y value) must be given a weight 1/var[y]
≃ nf²/p(1-p) when fitting the dose response lines (cf. §§ 2.5 and 13.4);
it is this that gives rise to the complications. When a line is fitted it will
lead to a better estimate of the y corresponding to each x, and hence to
better estimates of the weights and hence to a better fitting line. The
calculation is therefore iterative.
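The iterative reweighting just described can be sketched in code. This is a deliberately simplified illustration, not the full probit analysis of Finney (1964): the empirical probits are held fixed and only the weights nf²/p(1-p) are recomputed from the current fitted line, and the data are hypothetical:

```python
from statistics import NormalDist

ND = NormalDist()

def wls(x, y, w):
    """Weighted least-squares intercept and slope of y on x."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x)))
    return yb - b * xb, b

def fit_probit_line(x, r, n, iterations=5):
    """Fit probit[p] = a + b*x, recomputing w = n*f^2/(p(1-p)) from the
    fitted line on each cycle, as described in the text."""
    y = [ND.inv_cdf(ri / ni) + 5.0 for ri, ni in zip(r, n)]  # empirical probits
    a, b = wls(x, y, [1.0] * len(x))          # unweighted starting line
    for _ in range(iterations):
        w = []
        for xi, ni in zip(x, n):
            u = a + b * xi - 5.0              # NED from current fitted line
            p, f = ND.cdf(u), ND.pdf(u)
            w.append(ni * f * f / (p * (1.0 - p)))
        a, b = wls(x, y, w)
    return a, b

# hypothetical quantal data: responders r out of n at each log dose x
x = [0.3, 0.5, 0.7, 0.9]
r = [2, 8, 16, 19]
n = [20, 20, 20, 20]
a, b = fit_probit_line(x, r, n)
log_ed50 = (5.0 - a) / b                      # log dose at probit 5
print(round(log_ed50, 2))
```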
It is because of the existence of this theoretical estimate (cf. § 3.7)
of var[y] that the deviations from linearity of Fig. 14.2.6 can be tested
even though there is only one observation (y value) at each x value
(cf. §§ 12.5 and 12.6).
If the weight is plotted against p (Fisher and Yates (1963, p. 71)
give a table of f²/p(1-p)), it is found to have a maximum value when
p = 0·5, i.e. a 50 per cent response rate. This is the reason why the
ED50 is calculated as a measure of effectiveness. It is the quantity that
can be determined most precisely.
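That the weight function peaks at p = 0·5 can be checked directly (a sketch; the particular p values tried are arbitrary):

```python
from statistics import NormalDist

ND = NormalDist()

def weight(p):
    """Weight per subject, f^2/(p(1-p)), with f the ordinate of the
    standard normal curve at the NED corresponding to p."""
    f = ND.pdf(ND.inv_cdf(p))
    return f * f / (p * (1.0 - p))

ps = [0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95]
print(max(ps, key=weight))   # 0.5
```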
sample size (n) would have to be much bigger. And this is not the
only problem. Working with small proportions means working with
the tails of the distribution where assumptions about its form are least
reliable. For example, the straight line in Fig. 14.2.6 might be extrapolated,
a very hazardous process, as was shown in § 12.5 (Fig. 12.5.1).
are many other curves that closely resemble the sigmoid cumulative
normal curve in Figs. 4.1.4 and 14.2.4. One example is the logistic
curve defined† by
p = 1/(1+e^-(a+bx)). (14.6.1)
This is plotted in Fig. 14.6.1, curve (b), and is seen to be very like the
cumulative normal curve. If the relation between p and x were repre-
sented by (14.6.1) then it could be linearized by plotting logit[p]
FIG. 14.6.1. Curve (a): plot of p against z from eqn (14.6.3). When b = 1
this curve is part of a hyperbola.
Curve (b): plot of p against x from eqn (14.6.1). This curve is the same as
curve (a) with z plotted on a logarithmic scale (three equivalent ways of plotting
x = log z are shown). It is a logistic curve and can be linearized by plotting
logit[p] against x.
The particular values used to plot the graphs were K = 100, b = 1.
(instead of probit) against x, where logit[p] is defined as log_e{p/(1-p)}.
This follows from (14.6.1), which implies
p = y/y_max = 1/(1+e^-(a+bx)) = 1/(1+Kz^-b) = z^b/(K+z^b), (14.6.3)
writing x = log_e z and K = e^-a, so that logit[p] = a+bx.
Summarizing these arguments, if the response y, plotted against
dose or concentration z, follows (14.6.3) (which in the special case
b = 1 is the hyperbola plotted in Fig. 14.6.1, curve (a), and in Fig.
12.8.1), then the response plotted against log concentration, x = log z,
will be a sigmoid logistic curve defined by (14.6.1) and plotted in Fig.
14.6.1, curve (b). And logit[y/y_max] plotted against x will be a straight
line with intercept a = -log K, and slope b. Quite empirically,
equations like (14.6.3) are often found to represent dose-response
curves in pharmacology reasonably well (the extent to which this
justifies physical models is discussed by Rang and Colquhoun
(1973)), so plots of response against log dose are sigmoid like Fig.
14.6.1, curve (b). The central portion of this sigmoid curve is sufficiently
nearly straight to be not grossly incompatible with the assumption,
made in most of Chapter 13, that response is linearly related to log
dose.
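The linearization just summarized can be checked numerically. In this sketch, points are generated exactly from eqn (14.6.3) with the illustrative values K = 100, b = 1 used for Fig. 14.6.1, and natural logarithms are used throughout so that the intercept is -log_e K:

```python
import math

K, b = 100.0, 1.0
intercepts = []
for z in [1.0, 3.0, 10.0, 30.0, 100.0, 300.0]:
    p = z**b / (K + z**b)                 # eqn (14.6.3)
    logit = math.log(p / (1.0 - p))       # logit[p] = log_e{p/(1-p)}
    intercepts.append(logit - b * math.log(z))   # should equal -log_e K
print([round(i, 5) for i in intercepts])  # six copies of -4.60517
```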
It is worth noticing that the sigmoid plot of y against x in Figs.
14.2.4 or 4.1.4 (the cumulative normal curve, linearized by plotting
probit[p] against x) looks very like the sigmoid plot of y against x in
Fig. 14.6.1, curve (b) (the logistic curve, linearized by plotting logit
[p] against x). However, if x is log z, then the corresponding plots of y
against z (e.g. response against dose, rather than log dose) are quite
distinct. The corresponding plots are, respectively, that in Fig. 14.2.2
(the cumulative lognormal distribution, see § 4.5), which has an
obvious 'foot', i.e. it flattens off at low z values; and the hyperbola in
Fig. 14.6.1, curve (a), which rises straight from the origin with no
trace of a 'foot' or 'threshold'. This distinction is effectively concealed
when a logarithmic scale is used for the abscissa (e.g. dose).
In order to use the logit transformation for continuously variable
responses it is necessary to have an estimate of the maximum response,
y_max. This introduces statistical complications (see, for example,
Finney (1964, pp. 69-70)). A simple solution is not to bother with
linearizing transformations except as a convenient method for pre-
liminary assessment and display of results, but to estimate the para-
meters y_max, K, and b directly by the method of least squares as des-
cribed in § 12.8.
Appendix 1
Expectation, variance, and non-experimental bias
which is nP (= 3 × 0·9 = 2·7), the population mean value of r (number
of successes in three trials), as mentioned in § 3.4. Notice that in this case
the mean value is never actually observed. All observations must be integers.
Several properties follow directly from the definition of expectation.
For example, for a linear function, where a and b are constants,
E[a+bx] = E[a]+E[bx] = a+bE[x] (A1.1.3)
and also, more generally,
E[x+y] = E[x]+E[y]. (A1.1.4)
For the Poisson distribution (3.5.1) a similar calculation shows that the
population mean number of events is
E[r] = m. (A1.1.7)
Mean of the normal distribution
Using (A1.1.2) the statement that the parameter μ in (4.2.1) can be inter-
preted as the mean of the normal distribution can be justified. From (A1.1.2),
E[x] = ∫xf(x) dx = μ∫f(x) dx + ∫(x-μ)f(x) dx = μ+0 = μ (A1.1.8)
because the first integral is the area under the whole distribution curve,
i.e. 1. The second integral is zero because, using (4.2.1) for the density, f(x),
of the normal distribution and putting y = (x-μ)² so that dy = 2(x-μ) dx,
it becomes
[1/σ√(2π)] ∫(x-μ)e^-(x-μ)²/2σ² dx = [1/2σ√(2π)] ∫e^-y/2σ² dy
= -[σ/√(2π)][e^-(x-μ)²/2σ²] evaluated from -∞ to +∞ = 0. (A1.1.9)
The lower limit can be taken as 0 rather than -∞ because, from (A1.1.10),
f(x), and hence the integral, is 0 for x < 0. This can be evaluated using inte-
gration by parts. (See, for example, Thompson (1965, p. 188), or Massey and
Kestelman (1964, pp. 332 and 402).) Putting u = x, so du = dx, and dv
= λe^-λx dx so v = ∫λe^-λx dx = [-e^-λx], gives
E[x] = ∫ from 0 to ∞ of xλe^-λx dx = [-xe^-λx] from 0 to ∞ + ∫ from 0 to ∞ of e^-λx dx. (A1.1.11)
To evaluate this notice that xe^-λx → 0 as x → ∞; see, for example, Massey
and Kestelman (1964, p. 122), so E[x] = 0+λ⁻¹ = λ⁻¹, the mean interval.
The area under the distribution curve up to
any value x, i.e. the probability that an interval is equal to or less than x
is, from (5.1.4),
F(x) = 1-e^-λx (A1.1.12)
so the proportion of all intervals that are shorter than the mean interval,
putting x = λ⁻¹ in (A1.1.12), is
F(λ⁻¹) = 1-e⁻¹ = 0·6321, (A1.1.13)
i.e. 63·21 per cent of the area under the distribution in Fig. 5.1.2 lies below
the mean (1·0 in Fig. 5.1.2).
The median (see § 2.5, p. 26) length of the intervals between random
events is the length such that 50 per cent of intervals are longer, and 50 per
cent shorter than it, i.e. it is the value of x bisecting the area under the
distribution curve. If the population median value of x is denoted x_m then,
from (A1.1.12),
1-e^-λx_m = 0·5,
i.e. x_m = λ⁻¹ log_e 2 = 0·69315λ⁻¹. (A1.1.14)
This is shown on Fig. 5.1.2. As expected for a positively skewed distribu-
tion (see § 4.5), the population median is less than (in fact 69·315 per cent of)
the population mean, λ⁻¹. The mode of the distribution is even lower, at
x = 0, as seen from Fig. 5.1.2.
The variance of an exponentially distributed variable, from (A1.2.2), is
var(x) = λ⁻². (A1.1.15)
For details see, e.g., Brownlee (1965, p. 59).
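The results (A1.1.13) and (A1.1.14) are easy to confirm by simulation; a sketch in which the rate λ = 2·5 and the sample size are arbitrary illustrative choices:

```python
import math
import random

# Exponentially distributed intervals with rate lam: about 63.21 per cent
# should fall below the mean 1/lam, and the median should be log_e(2)/lam.
random.seed(1)
lam = 2.5
xs = sorted(random.expovariate(lam) for _ in range(200_000))
below_mean = sum(xi < 1.0 / lam for xi in xs) / len(xs)
median = xs[len(xs) // 2]
print(round(below_mean, 2))      # ~0.63
print(round(median * lam, 2))    # ~0.69 (= log_e 2)
```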
The expectation of a function of x
The expectation, or long-run mean, of the value of any function of x, say
g(x), can be found without first finding the probability density of g(x).
The derivation is given, for example, by Brownlee (1965, p. 55). The results,
analogous to (A1.1.1) and (A1.1.2), are
E[g(x)] = Σg(x)P(x) for discontinuous distributions, (A1.1.16)
E[g(x)] = ∫ from -∞ to +∞ of g(x)f(x) dx for continuous distributions. (A1.1.17)
The standardized form of any random variable, x, can be defined as X say,
where
X = (x-E[x])/√var(x) (A1.2.3)
(see, for example, the standard normal distribution, § 4.3). X must always
have a population mean of zero, and a population variance of one because
E[Σ(x-μ)²/N] = E[Σ(x-μ)²]/N = ΣE[(x-μ)²]/N = N var(x)/N = var(x). (A1.3.3)
However, if μ is replaced by its (unbiased) sample estimate, x̄, an unbiased
estimate of var(x) is no longer obtained (as discussed in § 2.6). If s² =
Σ(x-x̄)²/N then
Ns² = Σ from i=1 to N of (x-x̄)² = Σ[(x-μ)-(x̄-μ)]²
because 2(x̄-μ).Σ(x-μ) = 2(x̄-μ)(Σx-Nμ) = 2(x̄-μ)(Nx̄-Nμ) =
2N(x̄-μ)². Thus, using (A1.1.3), (A1.1.4), (A1.2.1), and (2.7.8),
NE[s²] = E[Σ(x-μ)²-N(x̄-μ)²]
= ΣE[(x-μ)²]-NE[(x̄-μ)²]
= NE[(x-μ)²]-NE[(x̄-μ)²]
= N var(x)-N var(x̄) = N var(x)-N var(x)/N = (N-1)var(x),
so E[s²] = (N-1)var(x)/N, not var(x).
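The bias just derived is easy to see in a simulation; a sketch with arbitrary illustrative choices of sample size and number of trials, drawing from a normal population with var(x) = 1:

```python
import random

# With divisor N the long-run mean of s^2 = sum((x - xbar)^2)/N is
# (N-1)/N times var(x), not var(x) itself.
random.seed(2)
N, trials = 5, 100_000
mean_s2 = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]   # var(x) = 1
    xbar = sum(xs) / N
    mean_s2 += sum((xi - xbar) ** 2 for xi in xs) / N
mean_s2 /= trials
print(round(mean_s2, 2))   # close to (N-1)/N = 0.8
```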
We shall deal only with the case where the s values are independent, and each
S value is made up of a random sample (of variable size) from the population of
s values. It is assumed in (A1.4.3) that the s values all have the same mean and
variance, for example, that they are from a single population.
The probability that S_m is equal to or less than a specified value S, i.e. the dis-
tribution function of the sum (see § 4.1), can be written, looking at each possible
value of m separately, as
P[S_m ≤ S] = P[(m = 1 and S₁ ≤ S) or (m = 2 and S₂ ≤ S) or ...]. (A1.4.4)
The events in parentheses are mutually exclusive so, using the addition rule
(2.4.2), this becomes a sum (over all possible m values), viz.
Σ_m P[m = m and S_m ≤ S]. (A1.4.5)
Now, using the multiplication rule in its general form (2.4.4), P[m = m and
S_m ≤ S] can be written in terms of conditional probabilities as
P[S_m ≤ S | m = m].P[m = m], and so (A1.4.5) becomes
Σ_m P[S_m ≤ S | m = m].P[m = m] (A1.4.6)
which can be written
F(S) = Σ_m F(S|m).P[m = m]. (A1.4.7)
The corresponding relation for probability densities follows by differentiating
(A1.4.7) (cf. (4.1.5)), f(S) = Σ_m f(S|m).P[m = m], and hence, for any function
g(S),
E[g(S)] = ∫g(S)f(S) dS
= ∫g(S) Σ_m f(S|m).P[m = m] dS
= Σ_m {∫g(S)f(S|m) dS}.P[m = m] (A1.4.9)
= Σ_m {E[g(S_m)|m = m]}.P[m = m]. (A1.4.10)
The last step follows because the term in curly brackets in (A1.4.9) is simply the
expectation of g(S) when m has a fixed value. The value of this
expectation of course depends on (is a function of) the value of m, so (A1.4.10) has
the form Σ_m (function of m).P[m = m], just like (A1.1.16). This means that the
term in curly brackets is being averaged over all m values and (A1.4.10) can
therefore be written
E[g(S)] = E_m{E_S[g(S_m)|m = m]} (A1.4.11)
which describes in symbols the two-stage averaging process mentioned at the
beginning of the section. The result is much more general than it appears from
this derivation. If x and y are any two random variables, continuous or dis-
continuous, then, as in (A1.4.11), we have
E[g(x,y)] = E_y{E_x[g(x,y)|y]}. (A1.4.12)
The mean value of the sum follows directly if the function
g(S_m) is simply identified with S_m. Averaging the sum for a fixed value of m,
using the definitions in (A1.4.1)-(A1.4.3), gives the term in curly brackets in
(A1.4.11) as
E[S_m|m = m] = mE[s] (A1.4.13)
i.e. the average value of the total of a fixed number, m, of values of s is m times
the average value of s, fairly obviously. Actually, this step is not quite as obvious
as it looks. Written out in full we have
Using the coefficients of variation defined in (2.6.4), i.e.
C(s) = √var(s)/E[s] and C(S_m) = √var(S_m)/E[S_m],
we get, using (A1.4.21) and (A1.4.16),
(A1.4.22)
idea can also be expressed by saying, at the risk of being anthropomorphic,
that the process has 'no memory' and therefore is unaffected by what has
happened in the past, or that the process 'does not age' (see also Cox
(1962, pp. 3-5 and 29)).
Examples of Poisson processes discussed in Chapters 3 and 5 were the
disintegration of radioactive atoms at random intervals and the random
occurrence of miniature end plate potentials (MEPP). Other examples are
(1) the random length of time that a molecule remains adsorbed on a mem-
brane before being desorbed (e.g. an atropine molecule on its cellular receptor
site, see § A2.4), and (2) the random length of time that elapses before a
drug molecule is broken down in the experiment described in § 12.6.
The lifetime of a molecule on its adsorption site (or of a drug molecule
in solution, or of a radioactive atom) is a random variable with the same
properties as the random intervals between MEPP (see § 5.1). In the case
of the adsorbed molecule, this implies that the complex formed between
molecule and adsorption site does not age, and the probability of the complex
breaking up in the next 5 seconds, say, is a constant and does not depend
on how long the molecule has already been adsorbed, just as the probability
of a penny coming down heads was supposed to be constant at each throw,
regardless of how many heads have already occurred, when discussing the
binomial distribution in Chapter 3. Consequently the Poisson distribution
ca.n be derived from the binomial as explained in §3.5. Another derivation
is given in § A2.2 below.
The arrival of buses would not, in general, be a Poisson process, although
it often seems pretty haphazard. The waiting time problem for randomly
arriving buses, discussed in § 5.2, is typical of the sort of result that is
usually surprising and puzzling to people who have not got used to the
properties of random processes. I certainly found it surprising and puzzling
until recently, and so I hope the reader will find the results presented below
as enlightening as I did.
For further reading on the subject see, for example, Cox (1962), Feller
(1957, 1966), Bailey (1964, 1967), Cox and Lewis (1966), and Brownlee
(1965, p. 190).
time intervals, Δt, are considered then the probability of one event in the
interval between t and t+Δt should be written λΔt+o(Δt) (see (A2.2.9)).
Furthermore, the probability of more than one event occurring in the interval
Δt becomes negligible when the interval is very short, and so it is also
written o(Δt), as shown in (A2.2.11).
The symbol o(Δt), which is useful when discussing stochastic processes,
stands for any quantity that becomes negligible relative to the
interval length Δt when Δt is made small (it does not stand for a single
quantity, and may appear in the same expression standing for
more than one such quantity). More precisely, any quantity is written
o(Δt) if it obeys the definition
lim as Δt → 0 of o(Δt)/Δt = 0. (A2.2.1)
No events will have occurred by time t+Δt if none has occurred up to
time t and none occurs between t and t+Δt, so
P(0, t+Δt) = P(0, t).(1-λΔt-o(Δt)) (A2.2.2)
and, rearranging and letting Δt → 0 as in (A2.2.1), this becomes
dP(0, t)/dt = -λP(0, t). (A2.2.3)
This is found using the condition that P(0, 0) = e⁰ = 1 (i.e. it is certain
that zero events will occur in zero time). The solution is easily checked by
differentiating (A2.2.4), giving (A2.2.3) back again, thus dP(0, t)/dt =
de^-λt/dt = -λP(0, t). Equation (A2.2.4) is just the probability of zero
events occurring in time t given by the Poisson distribution (3.5.1), if λ
is interpreted as the average number of events in unit time (see §§ 3.5 and
5.1 and eqn (A1.1.7)), so m = λt is the mean number of events in time t.
To find the Poisson distribution when r > 0 notice that r events will
occur between 0 and t+Δt if either
[(r events occur between 0 and t) and (zero events occur between t and t+Δt)]
or
[(r-1 events occur between 0 and t)
and (one event occurs between t and t+Δt)].
The probabilities of the four events in brackets have been defined as P(r, t),
(1-λΔt-o(Δt)), P(r-1, t), and λΔt+o(Δt) respectively. Therefore, using
the addition rule (2.4.2) and the multiplication rule for independent events
(2.4.6), the probability of r events occurring between 0 and t+Δt becomes,
when Δt → 0,
dP(r, t)/dt = -λP(r, t)+λP(r-1, t). (A2.2.7)
which is the Poisson distribution defined in (3.5.1) (see also § 5.1), because it
has been shown in (A2.2.4) that (A2.2.8) does actually hold for r = 0 as well.
This solution is easily checked by differentiating (A2.2.8), giving
dP(r, t)/dt = d/dt[((λt)^r/r!)e^-λt]
= λ((λt)^(r-1)/(r-1)!)e^-λt - λ((λt)^r/r!)e^-λt
= λP(r-1, t)-λP(r, t).
Thus (A2.2.8) is a solution of (A2.2.7).
Why the remainder terms can be neglected
Having derived the Poisson distribution, the remainder terms, which were
written o(Δt) above, can be written explicitly, so it can be seen that they do in
fact become negligible relative to Δt when Δt → 0, as stated in (A2.2.1).
The probability of r = 1 event occurring in the interval Δt is found by putting
r = 1 in (A2.2.8). The exponential is then expanded in series (as in (3.5.2)) giving
P(1, Δt) = λΔt e^-λΔt = λΔt(1-λΔt+(λΔt)²/2!-...) = λΔt+o(Δt) (A2.2.9)
as stated at the beginning of this section. All the terms but the first on the
penultimate line can be written as o(Δt) because they obey the definition (A2.2.1),
thus
lim as Δt → 0 of (-λ²Δt²+...)/Δt = 0 (A2.2.10)
because every term is zero when Δt becomes zero.
The probability that more than one event (r > 1) occurs in Δt is, from (A2.2.8),
((λΔt)^r/r!)e^-λΔt
and for all r > 1 this can also be written o(Δt). For example, for r = 2 we have,
using the definition (A2.2.1),
lim as Δt → 0 of [((λΔt)²/2!)e^-λΔt]/Δt = lim as Δt → 0 of (λ²Δt/2!)e^-λΔt = 0 (A2.2.11)
as stated at the beginning of this section.
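The derivation above can be checked by simulation: if an event occurs with probability λΔt in each of many short intervals Δt, the count in time t should follow (A2.2.8). A sketch, with λ, t, Δt, and the number of trials all arbitrary illustrative choices:

```python
import math
import random

random.seed(3)
lam, t, dt, trials = 2.0, 1.0, 0.01, 20_000
steps = int(t / dt)
counts = [sum(random.random() < lam * dt for _ in range(steps))
          for _ in range(trials)]
m = lam * t                                   # mean number of events in time t
for r_events in range(4):
    poisson = (m ** r_events / math.factorial(r_events)) * math.exp(-m)
    observed = counts.count(r_events) / trials
    print(r_events, round(poisson, 3), round(observed, 3))
```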
A2.3. The connection between the lifetimes of individual
adrenaline molecules and the observed breakdown rate
and half-life of adrenaline
In the experiment analysed in § 12.6 it was found that when adrenaline
was incubated with liver slices in vitro the concentration of adrenaline
fell exponentially or, to be more precise, there was no evidence that the
relationship was not exponential. The estimated rate constant was
k = 0·07219 min⁻¹ (from (12.6.14)), i.e. the estimated time constant was
1/k = 13·85 min (from (12.6.16)), and the estimated half-life, from (12.6.6),
was 0·69315/k = 9·602 min. The arguments in this section apply equally to
the disintegration of radioisotopes since the number of radioactive nuclei is
observed to fall with an exponential time course. One then considers the
lifetimes of individual unstable nuclei.
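The quoted estimates are consistent with one another, as a two-line check shows (time constant = 1/k, half-life = log_e 2/k):

```python
import math

k = 0.07219                          # min^-1, rate constant from (12.6.14)
print(round(1.0 / k, 2))             # 13.85 (min), the time constant
print(round(math.log(2.0) / k, 3))   # 9.602 (min), the half-life 0.69315/k
```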
Focus attention on single adrenaline molecules. Suppose that they are
perfectly stable until, at zero time, the adrenaline solution is added to the
liver preparation that contains enzymes catalysing its catabolism. Suppose
that after the addition of enzymes at t = 0, there is a constant probability,†
λΔt+o(Δt) say, that any individual adrenaline molecule will be catabolized
in any short interval of time Δt. As before, λ is a constant (it does not
vary with time) that characterizes the rate of catabolism. The probability
that the molecule will not be catabolized, from (2.4.3), is therefore 1-λΔt
-o(Δt). The argument is now exactly like that in § A2.2. Denote as P(t)
the probability that the molecule is still intact at time t. The molecule will
still be intact at time t+Δt if
(it is still intact at time t) and (it is not catabolized between t and t+Δt).
If these events are independent, then the multiplication rule of probability,
(2.4.6), implies
P(t+Δt) = P(t).[1-λΔt-o(Δt)]. (A2.3.1)
This is like eqn (A2.2.2). Rearranging gives
[P(t+Δt)-P(t)]/Δt = -λP(t)-P(t).o(Δt)/Δt
and, using (A2.2.1) just as in § A2.2, when Δt → 0 this becomes dP(t)/dt
= -λP(t) (see, for example, Massey and Kestelman (1964, p. 59)). The
solution (using the condition that P(0) = 1, i.e. it is certain that the molecule
is still intact at zero time) is, as in § A2.2,
P(t) = e^-λt. (A2.3.2)
Now in a large population of molecules the probability that a molecule
will be still intact at time t can be identified with the proportion of molecules
† See § A2.2. A fuller explanation of the nature of the term o(Δt), which becomes
negligible for short enough time intervals, is given in § A2.2.
that are still intact at time t, i.e. y/y₀ where y is the concentration of adrena-
line at time t, and y₀ is the initial concentration. Equation (A2.3.2) is now
seen to be identical with the observed exponential decline of concentration
(eqn (12.6.4)) if the rate constant, k, is identified with λ.
Furthermore, the probability that a molecule is still intact at time t,
given by (A2.3.2), can be identified, just as in § 5.1, with the probability
that a molecule has a lifetime greater than t (if it did not it would not still
be intact). The probability that the lifetime is equal to or less than t is
therefore, from the addition rule (2.4.3), and (A2.3.2),
1-P(t) = 1-e^-λt = F(t), (A2.3.3)
which is exactly like (5.1.4) (the distribution function, F, was defined
in (4.1.4)). This is consistent with (see § 5.1) the hypothesis that lifetimes
of individual adrenaline molecules are random variables following the
exponential distribution (see Figs. 5.1.1 and 5.1.2), with probability density
(from (4.1.5) and (A2.3.3))
f(t) = dF(t)/dt = λe^-λt (t ≥ 0). (A2.3.4)
for the interaction of drug molecules with cell receptor sites and, as such,
it is discussed by Rang and Colquhoun (1973).
Consider a single site. The probability that a site is occupied by an ad-
sorbed molecule at time t will be denoted P₁(t), and the probability that the
site is empty at time t will be denoted P₀(t). Thus, from (2.4.3),
P₀(t) = 1-P₁(t). (A2.4.1)
The probability that an empty site will become occupied would be ex-
pected to be proportional to the rate at which solute molecules are bombard-
ing the surface, i.e. to the concentration, c say, of the solute (assumed
constant). The probability that an empty site will become occupied during
the short interval of time Δt, between t and t+Δt, will therefore be written
λcΔt where λ, as in §§ A2.2 and A2.3, is a constant (i.e. does not change
with time). The probability that an occupied site becomes empty during the
interval Δt will not depend on the concentration of solute, and so will be
written μΔt, where μ is another constant. The probability that an occupied
site does not become empty during Δt is therefore, from (2.4.3), 1-μΔt.†
Now a site will be occupied at time t+Δt if either [(site was empty at time t)
and (site is occupied during interval between t and t+Δt)] or [(site was
occupied at time t) and (site does not become empty between t and t+Δt)].
Now the probabilities of the four events in parentheses have been defined as
P₀(t), λcΔt, P₁(t), and (1-μΔt) respectively. So, by application of the
addition rule (2.4.2), and the multiplication rule (2.4.6) (assuming, as in
§§ A2.2 and A2.3, that the events happening in the non-overlapping intervals
of time, from 0 to t and from t to t+Δt, are independent), it follows that the
probability that a site will be occupied at time t+Δt will be
P₁(t+Δt) = P₀(t).λcΔt+P₁(t)(1-μΔt)+o(Δt), (A2.4.2)
where o(Δt) is a remainder term that includes the probability of several
transitions between occupied and empty states during Δt. As in §§ A2.2
and A2.3, o(Δt) becomes negligible when Δt is made very small. Rearranging
(A2.4.2), and letting Δt → 0 as before, gives
dP₁(t)/dt = λcP₀(t)-μP₁(t) = λc{1-P₁(t)}-μP₁(t). (A2.4.3)
If P₁(t), the probability that an individual site is occupied at time t, is
interpreted as the proportion of a large population of sites that is occupied
at time t, then (A2.4.3) is exactly the same as the equation arrived at by a
conventional deterministic approach through the law of mass action, if
λ and μ are identified with the mass action adsorption and desorption rate
constants. Thus λ/μ is the law of mass action affinity constant for the solute
molecule-site reaction. The derivation and solution of (A2.4.3), and its
experimental verification in pharmacology, are discussed by Rang and
Colquhoun (1973).
The length of time for which an adsorption site is occupied; its distribution and mean

If the surface, with some sites occupied, is transferred at t = 0 to a solute-free
medium (c = 0), no empty site can become occupied again, and (A2.4.3) becomes

dP₁(t)/dt = −μP₁(t).    (A2.4.4)

This equation has already been encountered in §§ A2.2 and A2.3. Integration
gives the probability that a site will be occupied, at time t after transfer to
solute-free medium, as

P₁(t) = P₁(0)e^(−μt),    (A2.4.5)

where P₁(0) is the probability that a site will be occupied at the moment of
transfer (t = 0). In other words, the proportion of sites occupied, and there-
fore the amount of solute adsorbed, would be expected to fall exponentially
with rate constant μ. Such exponential desorption has, in some cases, been
observed experimentally.

Now if the total number of adsorption sites is N_tot, then the number of
sites occupied at time t will be N(t) = N_tot P₁(t), and the number occupied
at t = 0 will be N(0) = N_tot P₁(0). The proportion of initially occupied sites
that are still occupied after time t will, from (A2.4.5), be

N(t)/N(0) = P₁(t)/P₁(0) = e^(−μt),    (A2.4.6)

and this will also be the probability that an individual site, that was occupied
at t = 0, will still be occupied after time t.

A site will only be occupied after time t if the length of time for which the molecule
remains adsorbed (its lifetime) is greater than t, so (A2.4.6) is the probability

† i.e. has been continuously occupied between 0 and t.
that the lifetime of an adsorbed molecule is longer than t. Analogous situations
were met in §§ 5.1 and A2.3. The probability that the lifetime of an adsorbed
molecule is t or less is therefore, from (2.4.3),

P(0 ≤ lifetime ≤ t) = F(t) = 1 − e^(−μt).    (A2.4.7)

This is exactly like (5.1.4) and (A2.3.3), and is consistent with (see § 5.1) the
hypothesis, implied by the physical model of identical and independent
adsorption sites, that the lifetime of individual adsorbed molecules is an
exponentially distributed variable, with probability density as before, from
(4.1.5),

f(t) = dF(t)/dt = μe^(−μt).    (A2.4.8)
the bus stop during a long interval than a short one. In fact, the mean life-
time (from the moment of adsorption to the moment of desorption) of
molecules present at an arbitrary moment (such as the moment when the
surface with its adsorbed molecules is transferred to solute-free medium) is
exactly twice the mean lifetime of all molecules, i.e. it is 2μ⁻¹, so the average
residual waiting time until desorption is μ⁻¹ as stated, and the mean length
of time that a molecule has already been adsorbed at the arbitrary moment
is also μ⁻¹ (cf. § 5.2). These statements are further discussed and proved in
§§ A2.6 and A2.7.
By a similar argument, the probability that a site that was empty at t = 0
has remained empty up to time t satisfies

dP₀(t)/dt = −λcP₀(t),    (A2.4.9)

which has exactly the same form as (A2.4.4). Using the same arguments as
above, it follows that the length of time for which a site remains empty is
an exponentially distributed random variable with a mean length of
(λc)⁻¹. The mean length is inversely proportional to the concentration of
solute (c). As above, this is the lifetime measured either from an arbitrary
moment, or from the time when the site was last vacated by a desorbing
molecule.
Adsorption at equilibrium

After a long time (t → ∞) equilibrium will be reached, i.e. the rate at
which molecules are desorbed will be the same as the rate at which they are
adsorbed. Therefore, the proportion of sites occupied, P₁, will be constant,
i.e. dP₁/dt = 0. Equation (A2.4.3) gives

λc(1 − P₁) − μP₁ = 0,

from which it follows that, at equilibrium,

P₁ = λc/(λc + μ) = Kc/(Kc + 1)    (A2.4.10)

if K = λ/μ, the law of mass action affinity constant. This equation is the
hyperbola discussed in §§ 12.8 and 14.6. Now it has been shown that the mean
length of time for which an individual site is occupied is μ⁻¹, and the mean
length of time for which it is empty is (λc)⁻¹. These values hold whether or
not equilibrium has been reached.† After transferring a membrane with
empty sites to a solution containing a constant concentration, c, of solute,
the empty sites will have to wait, on average, (λc)⁻¹ seconds before they
become occupied, so equilibration will take time; see Rang and Colquhoun
(1973). Using these values, it follows that

(mean occupied time)/(mean empty time) = μ⁻¹/(λc)⁻¹ = λc/μ.    (A2.4.11)

The equilibrium occupancy, (A2.4.10), can therefore be written

P₁ = 1/[1 + (mean empty time)/(mean occupied time)].    (A2.4.12)

For example, if the probability that a site is occupied is P₁ = 0·5 (this will
be independent of time at equilibrium), i.e. 50 per cent of sites, on the average,
are occupied at any moment of time, it follows from (A2.4.12) that mean empty
time = mean occupied time, i.e. any given site is occupied for 50 per cent of the
time. This state is attained, at equilibrium, when (λc)⁻¹ = μ⁻¹, i.e. when the
concentration c = μ/λ = 1/K (directly from (A2.4.10)).
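These equilibrium results can be checked by simulating a single site that alternates between empty and occupied states, with exponentially distributed sojourn times of mean (λc)⁻¹ and μ⁻¹ respectively. The fraction of time spent occupied should approach λc/(λc + μ), as in (A2.4.10). A minimal Python sketch (the values of λ, c, and μ are arbitrary illustrative choices):

```python
import random

random.seed(2)

lam, c, mu = 1.0, 2.0, 4.0   # hypothetical λ, concentration c, and μ
rate_on = lam * c            # rate of leaving the empty state, λc
T = 200_000.0                # total length of simulated time

t = 0.0
occupied = False
time_occupied = 0.0
while t < T:
    # Sojourn times are exponential: mean 1/μ when occupied, 1/(λc) when empty.
    stay = random.expovariate(mu if occupied else rate_on)
    if occupied:
        time_occupied += stay
    t += stay
    occupied = not occupied

p1 = time_occupied / t
print(f"fraction of time occupied: {p1:.4f}")
print(f"theoretical λc/(λc + μ):   {rate_on / (rate_on + mu):.4f}")
```

Note that, as the footnote to this section observes, the time average over one site only has a frequency interpretation over periods long compared with the equilibration time.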
A2.5. The relation between the lifetime of individual radioisotope molecules and the interval between disintegrations

The examples of the intervals between miniature end plate potentials
(MEPP) (discussed in § 5.1) and between bus arrivals (in § 5.2) were straight-
forward in that there was in each case a single continuous stream of events.
In the case of radioisotope disintegration (§§ 3.5-3.7), catabolism of adrena-
line (§ A2.3), or adsorption of solute molecules (§ A2.4) the situation is
not quite the same. For each isotopic atom there is one event, disintegration.
Nevertheless intervals between disintegrations have the same sort of
properties as intervals between buses, as have the lifetimes of isotope
atoms (or adrenaline molecules, or solute-site complexes).

The mean lifetime of individual molecules, measured from any arbitrary
time (see §§ A2.3, A2.6, and A2.7), may be thousands of years. For
example the half-life (the median lifetime) of carbon-14 molecules
is 5760 years, so the mean lifetime of a molecule is λ⁻¹ = 5760/0·69315
= 8310 years (from (A1.1.11) and (A1.1.14)), i.e., multiplying by the number
of seconds in a Gregorian year, 8310 × 3·155695 × 10⁷ = 2·6224 × 10¹¹ s.
This is obviously independent of the amount of ¹⁴C present.
However, in § 3.7 the Poisson distribution considered was that of the
number of disintegrations per second (it will be supposed, for the sake of
example, that the isotope involved was ¹⁴C). Because this variable is Poisson
distributed with mean, as in § 3.7, 2089·5 disintegrations per second, the
mean number of events in t = 1 second (assuming that the counter detects
all disintegrations), it follows from the arguments in § 5.1 that the intervals
between disintegrations are exponentially distributed with mean interval
(λ′)⁻¹ = 1/2089·5 = 0·000478583 second (this obviously depends on the
amount of ¹⁴C present). Compare this with the lifetimes of individual
molecules, which are also exponentially distributed, with mean lifetime
λ⁻¹ = 8310 years. These two exponential distributions are, as expected,
closely related. This will now be shown.

† (Footnote to § A2.4.) If the movements of individual molecules could be observed,
the length of time for which a site was occupied could be measured, but this would
obviously have to be observed over a long time, relative to μ⁻¹, even if many sites,
rather than just one, were observed. It can be shown that the time constant for equilibra-
tion of the sites is (λc + μ)⁻¹ (see, for example, Rang and Colquhoun (1973)), so in fact
the average can only be given a frequency interpretation over a period that is long
relative to the time taken to reach equilibrium.
The probability that any individual ¹⁴C atom disintegrates in an interval
of time of length Δt, from the arguments in §§ 5.1 and A2.3, must be λΔt.†
Suppose that at time t a sample of ¹⁴C contains N(t) undisintegrated ¹⁴C
atoms. Define as an 'event' the disintegration of any of these atoms, i.e.
if the atoms could be numbered, the disintegration of either atom number
1 or atom number 2 or . . . or atom number N(t). The probability of this
event occurring in an interval of time of length Δt is, from the addition rule
(2.4.1),

λΔt + λΔt + . . . + λΔt − o(Δt) = N(t)λΔt − o(Δt),    (A2.5.1)

where o(Δt) is a remainder term (see (A2.2.1)) that includes all the prob-
abilities of more than one disintegration occurring during Δt, which will be
negligible when Δt is made very small. The argument now follows exactly
the same lines as in § A2.3. Define the probability that no event occurs up
to time t as P(t). No event will occur up to time t + Δt if (no event occurs up
to t) and (no event occurs between t and t + Δt), and the probability of this is,
from the multiplication rule (2.4.6),

P(t + Δt) = P(t)[1 − N(t)λΔt + o(Δt)].    (A2.5.2)

Rearranging this and allowing Δt → 0 gives, as in (A2.2.2) and (A2.3.1),
dP(t)/dt = −N(t)λP(t). Now if the length of time considered is short enough
for the decay of the radioisotope to be negligible (as assumed in § 3.7) then
N(t) can be treated as a constant. It follows that the solution for P(t),
using the condition that P(0) = 1 (i.e. it is certain that no events will
occur in zero time), will be, as before,

P(t) = e^(−N(t)λt),    (A2.5.3)

just as (A2.3.2). This probability, that no disintegration will occur up to
time t, can be identified with the probability that the interval between
disintegrations is longer than t. Using the same arguments as in § A2.3 it
follows that the interval between disintegrations is an exponentially dis-
tributed variable with a mean length, defined above as (λ′)⁻¹, of (N(t)λ)⁻¹,
and the mean number of disintegrations per second is therefore

λ′ = N(t)λ,    (A2.5.4)
which decreases, as expected, as the total number of isotope molecules,
N(t), decreases. The intervals will, of course, only be exponentially distributed,
and the disintegration rate will only be Poisson distributed, over time
intervals short enough for N(t) to be substantially constant. Using (A2.5.4)
and the figures given above for the example in § 3.7 shows that the number
of ¹⁴C atoms present at the time the sample was counted must have been

N(t) = λ′/λ = 2089·5 (atoms s⁻¹) × 2·6224 × 10¹¹ (s)
     = 5·4795 × 10¹⁴ atoms.

Therefore the weight of ¹⁴C was

5·4795 × 10¹⁴/6·023 × 10²³ = 9·098 × 10⁻¹⁰ gramme molecules, or
9·098 × 10⁻¹⁰ × 14 = 12·74 × 10⁻⁹ g.
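The relation λ′ = N(t)λ of (A2.5.4) can also be illustrated by simulation: give each of N atoms an exponentially distributed lifetime with a small rate λ, and look at the spacings between the earliest disintegrations, over which N(t) is still substantially constant. Their mean should be close to (Nλ)⁻¹. A Python sketch (the values of λ and N are arbitrary):

```python
import random

random.seed(3)

lam = 1e-4    # hypothetical decay rate per atom (arbitrary units)
N = 50_000    # atoms present at the start

# Disintegration times are the (sorted) exponential lifetimes of the atoms.
times = sorted(random.expovariate(lam) for _ in range(N))

# Look only at the first 500 disintegrations, during which N(t) is still
# close to N, so the disintegration rate λ' = N(t)λ is effectively Nλ.
k = 500
gaps = [b - a for a, b in zip([0.0] + times[:k], times[:k + 1])]
mean_gap = sum(gaps) / len(gaps)
print(f"mean interval between disintegrations: {mean_gap:.4f}")
print(f"theoretical 1/(N*λ):                   {1 / (N * lam):.4f}")
```

Over longer stretches, during which N(t) falls appreciably, the intervals would no longer have a common exponential distribution, as the text states.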
A more careful look at the nature of the o(Δt) terms in processes like the
catabolism of adrenaline, the decay of radioisotopes, and the adsorption of
molecules

The basic Poisson process consists of a continuous stream of events, such as the
occurrence of miniature end plate potentials (see § 5.1) or the random arrivals of
buses at a bus stop. It was shown in § A2.2 that in this sort of process the prob-
ability of one event occurring in a finite time interval Δt can be written as λΔt
+ o(Δt). Obviously this probability cannot be written simply as λΔt because this
would become indefinitely large, if long enough time intervals were considered,
whereas all probabilities must be less than 1.

In processes like the catabolism of adrenaline, the decay of radioisotopes, or
the adsorption of molecules, the situation is not quite the same. Each adrenaline
molecule can only be destroyed once, so one cannot consider the probability of it
being destroyed r times during Δt as in § A2.2. Nevertheless it clearly will not
do to say that the probability of catabolism (decay, adsorption, etc.) during
Δt is λΔt, because, as above, this can be greater than 1. Suppose that this prob-
ability can be written λΔt + o(Δt). The argument in the first part of this section
can now be made more rigorous.

The catabolism (decay, etc.) of different atoms during a finite time Δt are not
mutually exclusive events, so the simple addition rule cannot be used. Instead,
the binomial theorem should be used. In the language of §§ 3.2-3.4, let a 'trial'
be the observing of a molecule during the time Δt, and let a 'success' be the
occurrence of catabolism (decay, etc.) during this period. If, as above, there are N
molecules present altogether, then the probability that one of them will be
catabolized (decay, etc.) during Δt can be identified with the probability of
r = 1 success occurring in N trials, and this is given by the binomial distribution,
(3.4.3), as N𝒫(1 − 𝒫)^(N−1), where it has been supposed that the probability of
success at each trial can be written 𝒫 = λΔt + o(Δt). This probability is the same
at every trial, as discussed in § 3.5. Substituting it for 𝒫 in N𝒫(1 − 𝒫)^(N−1), and
expanding the resulting expression (use the binomial expansion on (1 − 𝒫)^(N−1)),
it is found that the required probability of one of the N molecules being cata-
bolized during Δt can indeed be written

N𝒫(1 − 𝒫)^(N−1) = NλΔt + o(Δt)    (A2.5.5)

as asserted, on the basis of a simplified argument, in (A2.5.1).

This argument can now be turned upside down, starting from the experi-
mental observations and working backwards. The decay of radioisotopes, and,
in some circumstances at least, the catabolism of molecules, and the desorption
of adsorbed molecules, are observed to follow an exponential time course. In each
case the implication is that the probability that a molecule is still intact at time
t, 1 − F(t), is e^(−λt). This is consistent, as described in earlier sections, with the
physical model that specifies that the lifetime of individual molecules is an
exponentially distributed variable with mean λ⁻¹. In the case of radioisotope
decay, this can be confirmed experimentally by the observation that the number
of disintegrations in unit time is Poisson distributed (over times during which N
is substantially constant). Now, if the number of molecules catabolized, etc.
during Δt is Poisson distributed, and the mean number of events during Δt is
λ′Δt as above, then the probability that one molecule of the N present will be
catabolized, etc. during Δt is given by the Poisson distribution, (3.5.1), with
r = 1 and m = λ′Δt, i.e. it is λ′Δt e^(−λ′Δt). Substituting λ′ = Nλ from (A2.5.4), and
expanding the exponential term exactly as in (A2.2.9), gives the probability of
one of the N molecules being catabolized, etc. during Δt as

NλΔt + o(Δt),

just as in (A2.5.5) and (A2.5.1). Now, according to the argument above, this can
be equated with N𝒫(1 − 𝒫)^(N−1), where 𝒫 is the probability of any individual
molecule being catabolized during Δt. The only two solutions of this equation for
𝒫 are 𝒫 = λΔt or 𝒫 = λΔt + o(Δt). The former will not do, as explained above,
so the probability must be written λΔt + o(Δt), as asserted in §§ A2.3-A2.5.
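The expansion (A2.5.5) can be checked numerically: with 𝒫 = λΔt, the binomial probability of exactly one success, N𝒫(1 − 𝒫)^(N−1), divided by Δt, tends to Nλ as Δt → 0 (note that NλΔt, not merely λΔt, must become small). A Python sketch (the values of λ and N are arbitrary):

```python
# With p = λΔt, the probability of exactly one success out of N,
# N*p*(1-p)**(N-1), divided by Δt, should tend to N*λ as Δt shrinks
# (N*λ*Δt itself must become small, not merely λ*Δt).
lam, N = 0.5, 1000   # hypothetical rate λ and number of molecules

for dt in (1e-4, 1e-5, 1e-6):
    p = lam * dt
    prob_one = N * p * (1 - p) ** (N - 1)
    print(f"dt = {dt:g}: probability/dt = {prob_one / dt:.4f} (N*lam = {N * lam})")
```

The ratio rises towards Nλ = 500 as Δt decreases, the deficit being the o(Δt) term of (A2.5.5).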
A2.6. Why the waiting time until the next event does not depend on when the timing is started, for a Poisson process

The assertion that the waiting time does not depend on when timing was
started has been made repeatedly in Chapter 5 and this appendix. For
example, the mean waiting time until a molecule is desorbed does not depend
on the arbitrary time when the timing is started, and will be the same,
λ⁻¹, as if the timing were started from the moment the molecule was adsorbed.

Suppose that the interval from one event to the next is exponentially
distributed with mean λ⁻¹. It will be convenient, as at the end of § 4.1,
to use T to stand for time measured from the last event, considered as a
random variable, and t, t₀, etc., to stand for particular values of T. Suppose
that a time t₀ is known to have elapsed from the last event. Given this fact,
what is the probability that the time from t₀ until the next event (the
residual lifetime) is less than any specified time t, i.e. what is the probability
that T < t₀ + t (event E₁ say) given that T > t₀ (event E₂ say)? In symbols
this is P(E₁|E₂), i.e., from the definition of conditional probability (2.4.4),

P(T < t₀ + t | T > t₀) = P(T < t₀ + t and T > t₀)/P(T > t₀).    (A2.6.1)

Now the event that (T < t₀ + t and T > t₀) is the same as the event
t₀ < T < t₀ + t and, because the intervals between events are being supposed
to follow the exponential distribution (5.1.3), f(t) = λe^(−λt), with mean interval
between events = λ⁻¹, the probability of this is, as in (4.1.2),

P(t₀ < T < t₀ + t) = ∫ from t₀ to t₀+t of λe^(−λτ) dτ = e^(−λt₀)(1 − e^(−λt)).    (A2.6.2)

The denominator of (A2.6.1) is (cf. (5.1.4))

P(T > t₀) = e^(−λt₀).    (A2.6.3)

Substituting (A2.6.2) and (A2.6.3) into (A2.6.1) gives the required conditional
distribution function (cf. (4.1.4)) for the residual lifetime, t (measured from
t₀ to the next event), as

P(T < t₀ + t | T > t₀) = 1 − e^(−λt),    (A2.6.4)

which is identical with the distribution function ((5.1.4) or (A2.3.3)) for
the intervals between events (measured from the last event to the next event).
Differentiating, as in (A2.3.4), gives the probability density for the residual
lifetime, t, as f(t) = λe^(−λt), the exponential distribution with mean λ⁻¹
(from (A1.1.11)), exactly the same as the distribution of intervals between
events. The common-sense reason for this curious result has been discussed
in words in §§ 5.2 and A2.4, and is proved in § A2.7.
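This memoryless property is easy to verify by simulation: among exponentially distributed intervals that have already lasted longer than t₀, the residual part beyond t₀ should again be exponentially distributed with the same mean λ⁻¹. A Python sketch (λ = 1 and t₀ = 2 are arbitrary illustrative values):

```python
import random
import statistics

random.seed(5)

lam = 1.0   # hypothetical rate; the mean interval is 1/λ
t0 = 2.0    # arbitrary time already elapsed since the last event

intervals = [random.expovariate(lam) for _ in range(200_000)]

# Among intervals that have already lasted longer than t0, the residual
# part beyond t0 should again have mean 1/λ.
residuals = [t - t0 for t in intervals if t > t0]
print(f"{len(residuals)} intervals survived past t0")
print(f"mean residual lifetime: {statistics.mean(residuals):.4f} (theory {1 / lam})")
```

The surviving fraction is close to e^(−λt₀), and the mean residual lifetime is close to λ⁻¹, not to anything depending on t₀.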
§ A2.7
of being picked. Sampling of this sort is described as length-biased sampling (see,
for example, Cox 1962, p. 65).

The specifying of the arbitrary moment of time† constitutes the choice of
an interval (the interval in which the moment falls) from the population of
intervals between events. Imagine that intervals are repeatedly chosen in
this way. What will their average length be? First, the distribution of their
length must be found. The fraction of the total time that is occupied by an
interval of length tᵢ is

tᵢ / Σ(all i) tᵢ.    (A2.7.1)

If these fractions are added up for all intervals that are longer than some
specified length t, the result is the proportion of time occupied by intervals
longer than t:

Σ(tᵢ > t) tᵢ / Σ(all i) tᵢ = 1 − F₁(t).    (A2.7.4)

† i.e. a point chosen at random with the uniform (or rectangular) distribution over
the interval (0, Σ tᵢ).
probability of choosing an interval of length tᵢ is

constant × tᵢ = tᵢ / Σ(all i) tᵢ.    (A2.7.5)

It follows, using the addition rule, (2.4.2), that the probability of choosing
an interval longer than t is found by adding these probabilities for all inter-
vals longer than t, giving
1 − F₁(t) = [∫ from t to ∞ of τf(τ) dτ] / [∫ from 0 to ∞ of τf(τ) dτ].    (A2.7.9)
[Fig. A2.7.1. Abscissa: λt, the duration of the interval as a multiple of the mean of all intervals.]
For the exponential distribution of intervals in the population, which is
what we are interested in, substitute the definition of this distribution,
f(t) = λe^(−λt), in (A2.7.9). The integral in the denominator of (A2.7.9) has
already been shown in (A1.1.11) to be λ⁻¹. The numerator of (A2.7.9),
integrating by parts exactly as in (A1.1.11), is (t + λ⁻¹)e^(−λt), so that

1 − F₁(t) = (1 + λt)e^(−λt)

as the proportion of intervals longer than t, when the intervals are chosen
by length-biased sampling. Compare this with the proportion of intervals
longer than t in the whole population which, from (5.1.4) or (A1.1.12), is
1 − F(t) = e^(−λt). The cumulative distributions are plotted in Fig. A2.7.2.
[Fig. A2.7.2. The cumulative distributions; the ordinate runs from 0·3 to 1·0, with the value 0·632 marked. Abscissa: λt, the duration of the interval as a multiple of the mean of all intervals.]
The mean length of an interval chosen by length-biased sampling now
follows from (A1.1.2), and is

E(t) = ∫ from 0 to ∞ of t f₁(t) dt = ∫ from 0 to ∞ of λ²t²e^(−λt) dt.

To solve this, integrate by parts (see, for example, Massey and Kestelman
(1964, pp. 332, 402)), as in (A1.1.11). Put u = t², so du = 2t dt, and put
dv = λ²e^(−λt) dt, so v = ∫λ²e^(−λt) dt = −λe^(−λt). Thus

E(t) = ∫ from 0 to ∞ of u dv = [uv] − ∫ v du
     = [−t²λe^(−λt)] from 0 to ∞ − ∫ from 0 to ∞ of (−λe^(−λt))(2t dt)
     = 0 + 2 ∫ from 0 to ∞ of tλe^(−λt) dt
     = 2λ⁻¹,    (A2.7.15)

i.e. twice the mean (λ⁻¹) of all intervals, as stated. In the evaluation of the
first term on the second line of (A2.7.15) notice that t²e^(−λt) → 0 as t → ∞;
see, for example, Massey and Kestelman (1964, p. 122). The integral on the
third line of (A2.7.15) is simply the mean of the exponential distribution,
shown in (A1.1.11) to be λ⁻¹.
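The factor of two can be demonstrated by simulation: generate a long stream of events with exponentially distributed spacings, choose arbitrary moments uniformly in time, and record the length of the interval in which each moment falls. The mean of the intervals chosen in this length-biased way should be close to 2λ⁻¹, twice the mean of all intervals. A Python sketch:

```python
import bisect
import random
import statistics

random.seed(6)

lam = 1.0
# A long stream of events with exponentially distributed spacings (mean 1/λ).
gaps = [random.expovariate(lam) for _ in range(200_000)]
events = []
t = 0.0
for g in gaps:
    t += g
    events.append(t)

# Choose arbitrary moments uniformly in time and record the length of the
# interval each moment falls in: this is length-biased sampling.
chosen = []
for _ in range(20_000):
    u = random.uniform(events[0], events[-1])
    i = min(bisect.bisect_right(events, u), len(events) - 1)
    chosen.append(events[i] - events[i - 1])

print(f"mean of all intervals:           {statistics.mean(gaps):.4f}")
print(f"mean of length-biased intervals: {statistics.mean(chosen):.4f} (theory 2/λ)")
```

Long intervals occupy more of the time axis and so are picked more often, which is the whole content of the length-biased sampling argument above.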
Tables
TABLE A1
Nonparametric confidence limits for the median
See §§ 7.3 and 10.2. Rank the n observations and take the rth from each end as the
limits. With samples smaller than n = 6, 95 per cent limits cannot be found,
but the P value for the limits formed by the largest and smallest (r = 1) observa-
tions are given (Nair 1940).
31 10 17-oe 8 11-_
2 It 50-0 32 10 98·00 9 11-30
3 It 76·0 33 11 96·60 I II-M
87·6 Sf 11 17·61 10 11-10
"6 It
It 91·76 36 12 96·90 10 11-40
66 25 98·44 23 99·08
67 26 915·02 23 99·32
88 26 98·18 23 99·150
89 28 97·08 24 99·24
70 27 915·88 24 99·44
TABLE A2
Confidence limits for the parameter of a binomial distribution, i.e. the
population proportion of 'successes'
See §§ 7.7, 7.8, 10.2, and 3.2-3.4. If r 'successes' are observed in a sample of
n 'trials', confidence limits for 100𝒫 (cf. (7.7.1) and (7.7.2)), the percentage
of 'successes' in the population (§ 3.2) from which the
sample was drawn, can be found from the table. Reproduced from Documenta
Geigy Scientific Tables, 6th edn, by permission of J. R. Geigy S.A., Basle, Switzerland.
The Geigy tables give limits for all n up to 1000.
n = 6
-,---,- --~'-----
0·00
1 18·87
2 88·88
8 60·00
, &e·87
" 88·88
:i 100-00
0·00
14·29
28·67
42·86
67-14
71·43 -1st
86·71
7 100'00
n = 8
kk 0·00 0'00- ,Y'94
1 12·60 O'S2- tsts'66
2 26·00 S'l~ 66'09
8 S7'6O 8'62- 76'61
, 60·00 U'i'7D- tkk'SO
n 82·60 Y4"~ tl2'48
n 76·00 13"91- kkt'81
7 87'60 47'S6- 99'88
8 100·00 88·011--100·00
95% limits    99% limits    95% limits    99% limits
r 100r/n 100𝒫L 100𝒫U 100𝒫L 100𝒫U    r 100r/n 100𝒫L 100𝒫U 100𝒫L 100𝒫U
n = 13    n = 17 (continued)
o 0'00 0'00- 2',71 0'00- 38'47 5 29'41 10,81- 55'116 8'117- 63·10
1 7-611 0'1~ 88'03 0·04- "'110 6 35'211 14,21- 61'67 10'14- 68,'6
2 16'38 1·\12- '5,'5 0·83- 6HO
,
8
6
23'08
30'77
38,'8
5'04- 53·81
II'~ 61,'3
13·88- 68"2
2'78- 62·08
5'71- 611-13
11,'2- 75"6
7 'H8
8 47'08
II 62''''
18''''
2\1'118-
27'81-
67'08
72-111
77'02
18'71- 78·"
17'84- 78'07
21'112- 8\1·86
10 58'82 32'\12- 81'56 26'56- 86·\111
6 '6-15 111·22- 7',87 18·83- 81-13 11 114'71 38·83- 85'711 81'6'- 811·86
7 53'85 25-13- 80'78 18·87- 88'17 12 70'611 "'04- 811·611 86'110- 118·03
8 61·114 31'58- 88'14 2',54- 110·58 13 76,'7 60'10- 98'111 '2'88- 115'7'
II 611'23 88'57- 110·111 30'87- "'·211 l' 82·85 56'57- 116'20 48'\1&- 117'111
10 76·112 ,6-1 ~ ""116 37·114- 117·22 15 88·\14 63'56- 118'114 55'87- 118·37
11 84062 114'56- 118'08 '5·110- 118'17 16 ""12 71'81- 118'85 68'70- 118'117
12 112'81 63·117- 118·81 55·10- 118'116 17 100·00 8O"~100'00 78·22-100-00
18 100·00 75'2~100'00 66,53-100.00
.-18
ft - I t
0 0-00 0'00- 18'58 0·00- lUi'60
o 0'00 0·00- 23'16 0·00- 81'61 1 6'56 0'1'" 27'211 0·03- 14'68
1 7-1, C)-I8-88'87 0'04- 411'40
,
2
8
1t·211
21-48
23·57
1·78-
"86-
411·81
60·80
8'8~ 58·10
0'76-
2'67-
6'26-
51-l18
58'112
85'711
,
2
8
5
11-11
16'67
\12'\12
27-78
1·88-
8·68-
.'1-
14'71
'1'411
'7'114
II'~ 68'68
0'5~ 411'17
1'117- 48''''
,-00- 114·112
6·6'- eo'56
6 36·71 12·76- 114'86 8'86- 72·01 8 88'88 18'84- l1li001 11·61- 66'79
8 '2·86 17·86- 71-1' 12'87- 77'66 7 88'811 17·80- M·lUi 12·8'- 70068
7
8
II
60·00
57-1'
114·211
23·04- 78'116
28'86- 8\1·14
35-1... 87·\14
17'24-
22·84-
27'~
82'76
87'88
111·14
8
II
10
"."
60-00
56·56
21'53- 611·\14
28002- 78'118
80-76- 78"7
16"~ 76-26
20-47- 711·68
\14'7... 88'61
10 71"8 '1'110- 111·81 14'21- ""7' 11 61'11 35'7ft- 81'70 211·82- 87018
11 78·57 ,".\10- 115·14 41·08- 117"8 12 66'87 40.... 86'66 14·21- 110,'"
12 85·71 5N~ 118'22 48'77- 011·\14 13 72'\12 '8'62- 110'81 811"ft- 98'46
18 112'86 66'13- 118'82 67-60- 011'116 It 77'78 52'86- 118·l1li 411-08- Il6-oO
It 00'00 78'8'-100·00 68"~100'00 15 88'88 58'58- 116·41 61-16- 98-08
16 88·811 66'\10- 98·8\1 67'88- 118"1
ft - 15 17 ",." 72'71- 118'86 66'87- l1li·97
18 100'00 81-47-100'00 76-60-100-00
o 0·00 0·00- 21'80 0'00- 211'78
1 8'87 0'17- 81'116 0'03- 40'18 • -19
2 18·88 0,71- 68'68
,
8
6
20'00
28'87
88·88
1·86-
"83-
40-'8
68-011
7'~ 65-10
11'82- 81·8\1
2'8~ 68'06
"88- 82-78
8·01- 68'82
0
1
0·00
6'28
0-00-
0013-
17'66
28·08
0'00-
0'03-
\14·14
8B-11
2 10'68
6
7
8
40·00
'8'87
58·88
18·84- 87'71
21'27- 78·41
28'6~ 78·78
11·70- 74'311
15'87- 7"""
20'51- 8H8
,
8
6
16'711
21'06
16'81
1·80-
8·88-
8·or.-
II-lft-
88·1'
811'68
411·67
61'20
0·66-
1'86-
8·78-
6-17-
40'87
46'81
62·71
58'18
II eo·OO 82'2~ 88'66 25·81- 88'80 6 81'58 12'58- 56'55
10 66·87 88·88- 88'18 81-18-111'118 8'116- 68'\111
7 88·'" 1.2~ 81'114 12'07- 68-011
11 78·88 "'110- 112·21 87·27- 115'12 8 '2·11 2O.2ft- 66·60 IH~ 72·eo
12 110-00 61'111- 115·87 '8·116- 117·81 II '7'87 \I4.,ft- 71-1' 111'1~ 76·'"
18 86·87 511·6'- 118·14 61·87- 118'211 10 62'68 28'86- 76-66 28'16- 80·81
It 118·88 68-06- 118·88 611·8'- 011·117 11 67·811 88·60- 711'75 2No- ""61
16 100·00 78·\10-100-00 70·24-100·00 12 63'16 88·86- 88'71 81-111- 87-118
18 68'411 '8·4ft- 8N2 88,71- 111006
.-18 It 73'68 68'80- 110'85 '1'82- 118'88
15 78'116 114,'3- 118'95 '7'2~ 116·\12
o 0·00 0·00- 20·511 0'00- 23'111 16 ""21 eo"2- 116'8\1 68'18- 118·1t
1 8·25 0'18- 30'23 0·08- 88·1' 17 811-47 66·86- 118'70 69'63- l1li'"
2 12·60 1·56- 88·85 0·87- '8'28
,
3
6
18·75
25·00
31·25
"06- '5·85
7-27- 52·88
11·0\1- 58'66
2·23-
'·5ft-
704ft-
58'"
611·111
85·85
18
111
""7'
100·00
78,117- l1li'87
8I·Br.-l00·OO
66'8~ l1li'117
76'86-100'00
95% limits    99% limits    95% limits    99% limits
r 100r/n 100𝒫L 100𝒫U 100𝒫L 100𝒫U    r 100r/n 100𝒫L 100𝒫U 100𝒫L 100𝒫U
It - 10 CooaUnued) It - III CooaUnued)
18 88'" 011·86- 117066 08·711- 118_ 11 87'118 10'011- 67-7' 10·00- 08·10
111'81 7',87- 110-06 OS'1I6- 110·611 11 '1·88 18·6&- OHIO 111'18- 00·88
16 110'16 SO'8&- 110'110 7',71- 110·118 18 "·88 ZO'66- N·81 11'111- 011·"
10 100'00 80·77-100·00 81·6&-100-00 l' "·18 n'66- ON7 "'011- 7...8
16 61-71 81·68- 70'66 17'67- 76'81
" -17 10 66-17 86'011- 78·61\ 8O-M- 78'011
0-00 0-00- 11'77 0'00- 17·81 17 68'01 88·114- 70·" 88'0-.. SO'77
0 18 01-07 d·1&- 711-81 SO·SO- 88·M
1 8·70 0'08- 18'117 0·. . .·" 111 06·61 66'07- BI-OO .o-oe- 86'SO
I 7,'1 0,111- 1"111 0'811- 8O-ot
, 10 OS'1I7 '11-17- M·7I '8"11- 88·16
.
8 11-11 1·86- n'18 1·111- 86-o? 21 71"1 61'7&- 87·17 41·01- IIO·SO
U'81 HII- 88'78 I'eo- 811·78 22 76'80 I\O.,&- 811'70 60-117- III·"
6 18·1\1 11·80- SS-08 ,.lIS- "·11 23 711'81 00'18- 111-01 6,",&- 114·86
8 11'11 8'81- d'lO 11'10- ts·18 82·70 N·lIS- 114'11; 68·ta- 110'08
7 16·118 11'11- "·18 8·17- 61·111 16 80'11 OS'M- l1li-11 Ol·eo- 117·68
8 211·08 18'76- 60·18 10,,1- 1\0·08
II 88·88 10·62- 1\8·110 11'sa- 611'76 10 811'00 71'06- 117'81 07-0&- 118·110
10 87-ot II1'to- 67·08 16'88- 08'18 27 118'10 77'18- 110·16 71·77- IIO'N
18,07- 00'l1li 18 110'66 BI'U- 110·111 77iN- 110'118
11 .0'7' 11'811- 01'10
11
18
U
"."
"'16
61'86
16·t&- N·1I7
18'117- OS-06
81'116- 71'88
111-88- 011·118
23·81- 78-1'
10·8&- 711-111
n 100-00 88-0&-100-00
It -10
88·80-100-00
..
11 "77'78 67'74- 1I1'SS 61·72- 118·110 0 10·00 7-71- 88'67
22 81'ts 01'91- 118'70 66·811- 116-77 6·66- "'IS
18 86'111 00,17- 116·81 00·17- 117·.0 7 1S·88 11·118- d·18 7·111- '7'110
88'811 70'M- 117'06 N'1I8- 118·71 8 10'07 11·18- 66'l1li II..... 61·1\0
16 111·611 76·71- IIO-GII 011·116- 110·01 II 80'00 1"78- '"•.0 11"&- 66-01
III 76·M- 110'118 10 sa'88 17'111- 61'81 18,07- 68'34
I18'SO 81·08- 110'111 11 80·07 111·118- 60·1' lOiN- 01-67
17 1011-00 87-1IB-l00'00 BI'I8-100-OO 11 .a-oo 11'00- 69-.0 18·1\0- N'70
,,- IS 18 "·18 16'.01'67 11-07- 07-78
U "'07 IS'M- 06·07 18-78- 70-07
0 0'00 0'00- 111·8' 0-00- 17'" 16 60'00 81'80- OS'70 1O·t&- 78'61
1 8'67 0'08- 18·86 0·.1S·01l 10 68'sa M·as- 71_ n·sa- 70·17
I 0·88- 88·60 0'88- n'l1 17 1\0'07 87·ta- 7"U 81'17- 78'"
,
8
6
N'
10'71
U'III
17'80
2'27- 18·18
4-08- 81·07
11-0&- SO'811
1·16- sa'lIO
2-61- 88·1\8
'-07- d'SO
18
111
10
00-00
OS·88
00-07
.o-eo- 77·M
'8'8&- SO-o?
'Nil- BI·71
86·80-
88·ta-
'1'00-
81'60
88·118
8O·as
..
0 11"8 8'80- .0·116 6'8&- ""87 21 70-00 6O.eo- 86'17 ,,-l1li- 88'68
7 16'00 10'1111- "·87 7'8&- 60·70 II 78'sa U'11- 87·71 "·.... IIO·n
8 18·57 18·22- "'117 10·. U"II 18 70·07 67'72- IIO-o? 61-01- lII·n
II 81'14 16'88- 62'86 11·a... 68·08 SO-OO 01·ta- 111·111 61\.72- 114·61\
10 86·71 18'Ot- 66'" U'77- 01-66 16 88·sa 06'18- 114·80 611·eo- 118·11
11 811·n 11·1\0- 611"2 17-88- N'IIO 10 80·07 011'18- 118·.. OS·OO- 117·07
11 d'811 ...,&- 01'81 10-0&- OS'U 27 110'00 78"7- 117'811 07'117- I18'M
18
U
16
"."
60-00
1\8'67
17'61- 00'18
80-06- 011'86
88'87- 71,'"
11'82- 71-10
16'72- 7"1S
1S'74- 7N8
18
III
80
118·88
118'07
100·00
77'118- 110'18
BI·7&- 110'111
88·ta-l00-OO
7I.eo- 110'06
77-78- 110-118
88'81-100-00
18 6N' 8N8- 76'U 81'8&- 711'118
17
18
111
00'71
N·n
07'80
.0.1\8- 78·60
"'07- 81·80
'7'06- M'11
86'10- BI'1I7
88"6- 86'1S
'1·91- 87'OS 0 ...- ,.,.
,,.,,-
,·tHI
" -1000C~)
,." ,...- ,..,
10 71·ta 61-88- 811·78 '6·61- 811'118 1
"1'
,." ,·OJ- ,." "'1-
"10- "'4
21
12
18
76·00
78·67
81·U
66-18- 811·81
611·06- 111·70
OS'11- 118·114
'"·14- 111'14
68-18- 114'14
67-10- 116'118
2
8 ,." "11-
,6 ,." '41.- ',8'1
141.
,·u
',01- 1·09
'·''1-1·111
I' 86·71 07'sa- 116'117 01'67-1170411 0'60 0-1&- 1'17 o-oe- 1'811
16 811·211 71-77- 117-78 00'01- 118'76 II 0'00 0·22- 1'81 0'14-1'U
10 111·80 70·1\0- 110'12 70'811- 110·01 7 0'70 0'18-1·" 0'18- 1·011
27 118"8 81'06- 110-91 70'81- 110·118 8 o-SO O'M- 1·68 o-U- 1·88
IS 100'00 87'00-100-00 81'7&-100·00 II 0-110 o·u- 1·71 0·111- 1·117
10 1-00 O-t&- I'M 0·86- I'll
11 1'10 0·66- 1'117 0"1- 1·16
" - n 12 1'10 0-0-.. 2·011 O·t&- 1·88
0 0'00 0-00- n'M 0-00- 10'70 18 1'80 0'011- 2·11 O'M- 1·61
1 B"6 o-oe- 17'70 0-.11·118 U 1'.0 0·77-1'34 o·eo- 1·06
2 0'110 0'86- 11·77 O·B&- 18·18 16 1'60 o·at- 1-'7 0·07- 1'78
,
B
6
10'M
18'711
2'111- 17-86
8'118- 81·00
6'86- 86·77
I'ZO- BI·1I8
2·&1- 87·.0
10
17
18
1'00
1'70
I'SO
0'91- 2'611
0'l1li- 1·71
1·07- I'M
0-74- 1'111
0'81- B-ot
0'88- B'17
17'" 8·91- '1-67
0 10'011 7'l1li- 811'72 6'06- '6'U 111 1'110 H6- 1'118 0'116- 8'80
7 "'14 10'80- ta·u N&- ,",as 10 1-00 1·.... B-08 I-G1- 8'd
8 17-611 11'78- '1'" lI'Ot- 61'110
II 81-08 16'18- 60-88 11·86- 1\0'61
10 h·" 17'114- U'88 u·zo- 611'111
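Exact limits of the kind tabulated in Table A2 (often called Clopper-Pearson limits) can be computed directly: the lower limit is the value of 𝒫 for which observing r or more successes would have probability α/2, and the upper limit that for which observing r or fewer would have probability α/2. Each can be found by bisection on the binomial tail probability. A Python sketch (the function names are illustrative, not from the table's source):

```python
from math import comb


def binom_tail_ge(n, r, p):
    """P(R >= r) for a binomial(n, p) variable."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r, n + 1))


def exact_limits(r, n, conf=0.95, tol=1e-8):
    """Exact (Clopper-Pearson) limits for the population proportion,
    found by bisection on the binomial tail probabilities."""
    alpha = (1 - conf) / 2

    def solve(too_small):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if too_small(mid) else (lo, mid)
        return (lo + hi) / 2

    # Lower limit: the p at which P(R >= r) has risen to alpha.
    lower = 0.0 if r == 0 else solve(lambda p: binom_tail_ge(n, r, p) < alpha)
    # Upper limit: the p at which P(R <= r) has fallen to alpha,
    # i.e. P(R >= r + 1) has risen to 1 - alpha.
    upper = 1.0 if r == n else solve(lambda p: binom_tail_ge(n, r + 1, p) < 1 - alpha)
    return lower, upper


# e.g. r = 2 'successes' observed in n = 8 'trials'
lo, hi = exact_limits(2, 8)
print(f"95% limits for 100p: {100 * lo:.2f} to {100 * hi:.2f}")
```

For r = 2, n = 8 the 95 per cent limits come out near 3·2 and 65·1 per cent, of the same kind as the tabulated entries.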
TABLE A3
The Wilcoxon test for two independent samples
See § 9.3. The sample sizes are n₁ and n₂. If the sample sizes are not equal, n₁
is taken as the smaller. If the rank sum for sample 1 (that with n₁ observations)
is equal to or less than the smaller tabulated value, or equal to or greater than the
larger tabulated value, then P (two tail) is equal to or less than the figure at the
head of the column. If the null hypothesis were true, P would be the probability
of observing a rank sum equal to or greater than the larger figure, or equal to or
less than the smaller. If one or both samples contain more than 20 observations,
use the method described at the end of § 9.3. M. I. Sutcliffe's table reproduced
from Mainland (1963) by permission of the author and publisher.
P (approx.)    P (approx.)
n₁ n₂ 0·10 0·05 0·01    n₁ n₂ 0·10 0·05 0·01
--- ---
I. 17. 1{;'fI; 180 H6; 190 19 2M 293 l{;8; 308
16 116; 181 110; 187 99; 198 20 197; 293 188; 302 172; 318
[Table A4 (critical values of T for the Wilcoxon signed-ranks test, n = 4 to 25): entries not reliably legible in this scan.]
TABLE A5
The Kruskal-Wallis one way analysis of variance on ranks (independent
samples)
See § 11.5. For each value of H the table gives the value of P (the
probability of observing a value of H equal to or greater than the tabulated value
if the null hypothesis is true, found from the randomization distribution of rank
sums). The table deals only with k = 3 groups, the numbers of observations in
each group (n1, n2, n3) being given. For larger samples or more groups use the
method described at the end of § 11.5. From Kruskal and Wallis (1952, J. Amer.
statist. Ass. 47, 614; 48, 910) with permission of the author and publisher.
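For samples outside the range of the table, the test can be sketched with modern software (scipy is an assumption of this note, not part of the original text): `stats.kruskal` computes H and an approximate P from the chi-squared distribution, which is the large-sample method the caption refers to.

```python
# Kruskal-Wallis H for k = 3 independent groups (scipy assumed; the data
# are illustrative, not from the book). With no ties, H follows the
# textbook formula and P is the chi-squared tail area with k - 1 df.
from scipy import stats

group1 = [6.8, 7.1, 9.4]
group2 = [5.2, 8.3, 8.8]
group3 = [4.1, 4.4, 6.0]

H, p = stats.kruskal(group1, group2, group3)
print(H, p)
```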
[Table A5 entries (H and exact P for each set of sample sizes n1, n2, n3): largely illegible in this scan.]
TABLE A6
The Friedman two way analysis of variance on ranks for randomized block experiments
See § 11.7. For each value of S the table gives the exact value of P (the probability of observing a value of S equal to or greater
than the tabulated value if the null hypothesis is true, found from the randomization distribution of rank sums). Approximate P
values are given at the head of the column. If the number of treatments, k, or the number of observations per treatment (= number
of blocks), n, is too large for this table, use the method described at the end of § 11.7. From Friedman, M. (1937, J. Amer. Statist. Ass.
32, 688), by permission of the author and publisher.
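The large-sample method mentioned in the caption is available directly in modern software; as a sketch (scipy is an assumption of this note, not part of the original text), each argument to `friedmanchisquare` is one treatment, with one observation per block:

```python
# Friedman test with k = 3 treatments and n = 4 blocks (scipy assumed;
# illustrative data). Observations are ranked within each block, and the
# statistic is referred to chi-squared with k - 1 df.
from scipy import stats

treatment1 = [7.0, 9.9, 8.5, 5.1]
treatment2 = [5.3, 5.7, 4.7, 3.5]
treatment3 = [4.9, 7.6, 5.5, 2.8]

statistic, p = stats.friedmanchisquare(treatment1, treatment2, treatment3)
print(statistic, p)
```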
[Table A6 entries (S and exact P for k = 3, 4, 5 treatments, by number of blocks n, at P ≈ 0.05, 0.01, 0.001): largely illegible in this scan.]
TABLE A7
Table of the critical range (difference between rank sums for any two
treatments) for comparing all pairs in the Kruskal-Wallis nonparametric
one way analysis of variance (see §§ 11.5 and 11.9)
Values for which an exact P is given are abridged from the tables of McDonald
and Thompson (1967), the remaining values are abridged from Wilcoxon and
Wilcox (1964). Reproduced by permission of the authors and publishers. †Not
attainable. Number of treatments (samples) = k. Number of observations (repli-
cates) per treatment = n.
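The quantities the critical range is applied to can be sketched as follows (Python with scipy is an assumption of this note, and the function name and data are illustrative, not from the book): rank all observations together, form the rank sum of each treatment, and compare every pairwise absolute difference of rank sums with the tabulated critical range.

```python
# Rank sums per treatment and all pairwise differences, the inputs to the
# Table A7 procedure (illustrative sketch; scipy assumed).
from itertools import combinations
from scipy.stats import rankdata

def pairwise_rank_sum_differences(groups):
    """Rank sums of the groups and all pairwise absolute differences."""
    pooled = [x for g in groups for x in g]
    ranks = rankdata(pooled)                      # mid-ranks for ties
    sums, start = [], 0
    for g in groups:
        sums.append(ranks[start:start + len(g)].sum())
        start += len(g)
    diffs = {(i, j): abs(sums[i] - sums[j])
             for i, j in combinations(range(len(groups)), 2)}
    return sums, diffs

groups = [[6.8, 7.1, 9.4], [5.2, 8.3, 8.8], [4.1, 4.4, 6.0]]
rank_sums, diffs = pairwise_rank_sum_differences(groups)
# a pair whose difference reaches the tabulated critical range differs at that P
```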
[Table A7 entries (critical range, and exact P where attainable, for each k and n at P (approximate) = 0.01 and 0.05): largely illegible in this scan.]
TABLE A8
Table of the critical range (difference between rank sums for any two
treatments) for comparing all pairs in the Friedman nonparametric two
way analysis of variance (see §§ 11.7 and 11.9)
Values for which an exact P is given are abridged from McDonald and
Thompson (1967), the remaining values are abridged from Wilcoxon and Wilcox
(1964). Reproduction by permission of the authors and publishers. †Not attain-
able. Number of treatments = k. Number of replicates (= number of blocks) = n.
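Here the ranking is done within each block before summing, so the rank sums compared with the critical range are per-treatment totals of within-block ranks. A sketch (Python with scipy is an assumption of this note; the data are illustrative):

```python
# Per-treatment rank sums for a randomized block layout, the inputs to
# the Table A8 procedure (illustrative sketch; scipy assumed).
from itertools import combinations
from scipy.stats import rankdata

blocks = [[7.0, 5.3, 4.9],   # each row is one block, ranked separately
          [9.9, 5.7, 7.6],
          [8.5, 4.7, 5.5],
          [5.1, 3.5, 2.8]]

k = len(blocks[0])
rank_sums = [0.0] * k
for block in blocks:
    for j, r in enumerate(rankdata(block)):
        rank_sums[j] += r

diffs = {(i, j): abs(rank_sums[i] - rank_sums[j])
         for i, j in combinations(range(k), 2)}
# a pair whose difference reaches the tabulated critical range differs at that P
```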
[Table A8 entries (critical range, and exact P where attainable, for each k and n at P (approximate) = 0.01 and 0.05): largely illegible in this scan.]
[Tables A9 to A11 (including 'Table A11 (Continued)'): content not captured in this scan.]
References
DEWS, P. B. and BERKSON, J. (1954). Statistics and mathematics in biology,
(Eds. O. Kempthorne, Th. A. Bancroft, J. W. Gowen, and J. L. Lush), pp.
361-70. Iowa State College Press.
Documenta Geigy scientific tables, 6th edn (1962). J. R. Geigy S.A., Basle, Switzer-
land.
DOWD, J. E. and RIGGS, D. S. (1965). A comparison of estimates of Michaelis-
Menten kinetic constants from various linear transformations. J. biol. Chem.
240, 863-9.
DRAPER, N. R. and SMITH, H. (1966). Applied regression analysis. Wiley, New
York.
DUNNETT, C. W. (1964). New tables for multiple comparisons with a control.
Biometrics 20, 482-91.
DURBIN, J. (1951). Incomplete blocks in ranking experiments. Br. J. statist.
Psychol. 4, 85-90.
FELLER, W. (1957). An introduction to probability theory and its applications,
Vol. 1, 2nd edn. Wiley, New York.
—— (1966). An introduction to probability theory and its applications, Vol. 2,
2nd edn. Wiley, New York.
FINNEY, D. J. (1964). Statistical method in biological assay, 2nd edn. Griffin,
London.
—— LATSCHA, R., BENNETT, B. M., and HSU, P. (1963). Tables for testing
significance in a 2 × 2 table. Cambridge University Press.
FISHER, R. A. (1951). The design of experiments, 6th edn. Oliver and Boyd,
Edinburgh.
—— and YATES, F. (1963). Statistical tables for biological, agricultural and medical
research, 6th edn. Oliver and Boyd, Edinburgh.
GOULDEN, C. H. (1952). Methods of statistical analysis, 2nd edn. Wiley, New York.
GUILFORD, J. P. (1954). Psychometric methods, 2nd edn. McGraw-Hill, New York.
HEMELRIJK, J. (1961). Experimental comparison of Student's and Wilcoxon's
two sample tests. In Quantitative methods in pharmacology (Ed. H. de Jonge).
North Holland, Amsterdam.
HOOKE, R. and JEEVES, T. A. (1961). 'Direct search' solution of numerical and
statistical problems. J. Ass. comput. Mach. 8, 212-29.
KATZ, B. (1966). Nerve, muscle and synapse. McGraw-Hill, New York.
KEMPTHORNE, O. (1952). The design and analysis of experiments. Wiley, New
York.
KENDALL, M. G. and STUART, A. (1961). The advanced theory of statistics, Vol. 2.
Griffin, London.
—— —— (1963). The advanced theory of statistics, Vol. 1, 2nd edn. Griffin, London.
—— —— (1966). The advanced theory of statistics, Vol. 3. Griffin, London.
LINDLEY, D. V. (1965). Introduction to probability and statistics from a Bayesian
viewpoint, Part 1. Cambridge University Press.
—— (1969). In his review of "The structure of inference" by D. A. S. Fraser.
Biometrika 56, 453-6.
MAINLAND, D. (1963). Elementary medical statistics, 2nd edn. Saunders, Philadel-
phia.
—— (1967a). Statistical ward rounds-1. Clin. Pharmac. Ther. 8, 139-46.
—— (1967b). Statistical ward rounds-2. Clin. Pharmac. Ther. 8, 346-55.
MARLOWE, C. (1604). The tragicall history of Doctor Faustus. London: Printed by
V. S. for Thomas Bushell.
MARTIN, A. R. (1966). Quantal nature of synaptic transmission. Physiol. Rev. 46,
51-66.
MASSEY, H. S. W. and KESTELMAN, H. (1964). Ancillary mathematics, 2nd
edn. Pitman, London.
MATHER, K. (1951). Statistical analysis in biology, 4th edn. Methuen, London.
MCDONALD, B. J. and THOMPSON, W. A. JR. (1967). Rank sum multiple com-
parisons in one- and two-way classifications. Biometrika 54, 487-97.
MOOD, A. M. and GRAYBILL, F. A. (1963). Introduction to the theory of statistics,
2nd edn. McGraw-Hill Kogakusha, New York.
NAIR, K. R. (1940). Table of confidence intervals for the median in samples from
any continuous population. Sankhya 4, 551-8.
OAKLEY, C. L. (1943). He-goats into young men: first steps in statistics. Univ.
Coll. Hosp. Mag. 28, 16-21.
OLIVER, F. R. (1970). Some asymptotic properties of Colquhoun's estimators
of a rectangular hyperbola. J. R. statist. Soc. (Series C, Applied Statistics) 19,
269-73.
PEARSON, E. S. and HARTLEY, H. O. (1966). Biometrika tables for statisticians,
Vol. 1, 3rd edn. Cambridge University Press.
POINCARÉ, H. (1892). Thermodynamique. Gauthier-Villars, Paris.
RANG, H. P. and COLQUHOUN, D. (1973). Drug receptors: theory and experiment.
In preparation.
SCHOR, S. and KARTEN, I. (1966). Statistical evaluation of medical journal
manuscripts. J. Am. med. Ass. 195, 1123-8.
SEARLE, S. R. (1966). Matrix algebra for the biological sciences. Wiley, New York.
SIEGEL, S. (1956a). Nonparametric statistics for the behavioural sciences. McGraw-
Hill, New York.
—— (1956b). A method for obtaining an ordered metric scale. Psychometrika 21,
207-16.
SNEDECOR, G. W. and COCHRAN, W. G. (1967). Statistical methods, 6th edn.
Iowa State University Press, Iowa.
STONE, M. (1969). The role of significance testing: some data with a message.
Biometrika 56, 485-93.
STUDENT (1908). The probable error of a mean. Biometrika 6, 1-25.
TAYLOR, D. (1957). The measurement of radioisotopes, 2nd edn. Methuen, London.
THOMPSON, SILVANUS P. (1965). Calculus made easy. Macmillan, London.
TIPPETT, L. H. C. (1944). The methods of statistics, 4th edn. Williams and Norgate,
London; Wiley, New York.
TREVAN, J. W. (1927). The error of determination of toxicity. Proc. R. Soc. B 101,
483-514.
TUKEY, J. W. (1954). Causation, regression and path analysis. In Statistics and
mathematics in biology (Eds. O. Kempthorne, Th. A. Bancroft, J. W. Gowen,
and J. L. Lush), p. 35. Iowa State College Press, Iowa.
WILCOXON, F. and WILCOX, ROBERTA A. (1964). Some rapid approximate
statistical procedures. Published and distributed by Lederle Laboratories, Pearl
River, New York.
WILDE, D. J. (1964). Optimum seeking methods. Prentice-Hall, Englewood Cliffs,
N.J.
WILLIAMS, E. J. (1959). Regression analysis. Wiley, New York; Chapman and
Hall, London.
Index
assays-(cont.)
  validity tests, 300-7
assumptions, 70-2, 86, 101-3, 111, 139,
  144, 148, 153, 158, 167, 172, 205-6,
  207; see also individual methods
  in fitting curves, 220, 234
confidence-(cont.)
  interpretation, 101-3, 108, 114, 333
  for median, nonparametric, 103, 396 (table)
  for new observations, 107
  on straight line, 227
[remaining entries on this page illegible in this scan]
data selection ('data snooping'), 166, 207
deduction, 1, 6
degrees of freedom, meaning, 29, 369
density, see probability density
dependent variable, 214, 216
discontinuous distribution, see distribution
distribution
  binomial, 43-52, 54, 69, 104, 109-14,
    124, 164, 359, 365, 398 (table)
  continuous, meaning, 64-9
  cumulative, see distribution function
  discontinuous, meaning of, 43-4, 64-9, 350
  exponential, 81-5, 367-8, 380, 383, 388-95
    stochastic interpretation, 81-5, 379-96
  function, 67-9, 367, 389
    examples of, 68, 82, 346-58, 380, 383, 389
    length-biased, 389-95
  Gaussian (normal), 69-76, 96-9, 101, 346-64, 366
    approximation to binomial, 62-3, 116, 124
    tests for fit, 80
    transformations to, 71, 78, 80, 221-2
  goodness-of-fit tests
    chi-squared, 132
    probit and rankit, 80
  length-biased, 85, 389-96
  lognormal, 78-80, 107, 176, 221, 289, 346-64
  meaning of, 43-4, 64-9
  multinomial, 44
  Poisson, 52-63, 81-5, 375-8, 388-95; see also quantal release
  skew and symmetrical, 78-80
  standard form of, 369
  standard Gaussian, 72-5, 126
  Student's t, 75-8, 148, 167
dose metameter, 280, 287
  ratio, 283
drug-receptor interaction, see adsorption
Dunnett's d statistic, 208

ED50, see median effective dose
efficiency of significance tests, 97
epinephrine, see adrenaline
error
  distribution of, see distribution
  estimates of, 1-8, 28-42; see also variance
  of the first kind, 98
  homogeneity of, see homoscedastic
  limits of, see confidence limits
  of the second kind, 93
  trustworthiness of estimates of, 1-8, 101-3
estimation, see bias, least squares, likelihood, and best estimate
exp(x), 69
expectation, 365-8
  of any function, 368
  of function of two variables, 370-3
  of mean squares in analysis of variance, 178, 186
  see also mean
experimental method, meaning, 3-8
exponential
  curve fitting, 234-43
  distribution, see distribution

F ratio, see variance ratio
factorial function, 9, 60
fiducial limits, see confidence limits
Fieller's theorem, 293
Fisher exact test for 2 × 2 table, 116, 117
  use of tables for, 122
four-point assay, see assays, parallel line, (2 + 2) dose
Friedman method, 200, 409 (table), 411 (table)
function
  expectation of, 365-8, 370-3; see also mean
  factorial, 10
  mathematical, meaning of, 9
  variance of, see variance

Gaussian (normal) distribution, see distribution
generalization, 1-8, 91, 102
Gosset, W. S., see 'Student'

half-life, 239
  confidence limits for, 242
  stochastic interpretation, 380, 386
heteroscedastic, see homoscedastic
Hill plot, 363
histogram, 44, 63, 64-8, 346-53
  area convention, 66, 360
homoscedastic, 167, 176, 221, 266, 269, 272, 281, 359
hyperbola, fitting of, 257-72, 361-4
hypothesis, 6, 87-96

IED, see individual effective dose
incomplete block designs, 206-7
  for assays, 286
independence, statistical, 20, 21, 22, 31, 44,
  54, 84, 276-7, 286, 375, 379, 381
  of contrasts, 302-7
independent
  samples
    classification measurements, 91, 116
    numerical measurements, 91, 137-51, 182
    rank measurements, 91, 137-48, 191
    see also significance tests, random, and sample
  variable in curve fitting, 214
individual effective dose, 112, 344-64
  relation with all-or-nothing response, 348
induction, 6
inference, scientific, 3-8
  precision of, 101-3, 114; see also variance, confidence limits, and bias
intervals between random events, 81-5, 374-95; see also lifetime
isotope, see radioisotope

Kruskal-Wallis method, 191, 406 (table)
  for testing all pairs, 410 (table)

Langmuir equation
  fitting of, 257-72, 361
  stochastic interpretation, 380-5
Latin square, 204
LD50, see median effective dose
least squares method
  for assays, 271
  and 'best' line, 216, 257-72
  for curve fitting, 218-20
  geometrical interpretation, 243-53, 259-62
  for means, 27
  for Michaelis-Menten hyperbola, 257-72
  without calculus, 27, 220
lifetime, 81-5
  of adrenaline molecule, 380
  of adsorbed molecule, 382
  of empty adsorption site, 384
  independence of when timing started, 388
  of isotope, 385-7
  length-biased sample, 84, 389-95
  meaning of, 81-5, 385
  residual, 84, 383, 388-95
  twice average length, 84, 389-95
likelihood
  maximum, 8, 268-72
  technical meaning, 7, 21-4
limits of error, see confidence limits
Lineweaver-Burk plot, 266-72
logarithm
  changing base of, 291
  negative, 325
  transformation, see transformation
logistic curve, 361-4
logit transformation, 361-4
lognormal distribution, see distribution

Mann-Whitney test, 148
mean
  arithmetic, population (expectation), 365-8
  arithmetic, sample
    least squares estimate, 27
    standard deviation of, 33-8, 39
    variance of, 33-8
    weighted, 24, 39
  of binomial distribution, 50, 365
  deviation, 28
  of exponential distribution, 81-5, 367
  of function of two variables, 370-1
  of Gaussian (normal) distribution, 69, 368
  geometric, 25
  of lognormal distribution, 78, 346-57
  of Poisson distribution, 54, 81, 368, 375
  relation with median and mode, 78-80, 101, 348, 368
  squares, 187, 217, 221, 238; see also analysis of variance
median
  effective dose (ED50), 346-64
  lifetime, see lifetime and stochastic processes
  population, 26
  relation with mean and mode, 78-80, 101, 346-64, 368
  sample, 26, 101, 103
metameter, dose and response, 280, 287; see also transformations
Michaelis-Menten equation, fitting of, 257-72, 361
minimization, see optimization
minimum
  effective dose, definition of, 360
  lethal dose, 360
mode, 27
  relation to mean and median, 78-80, 346-64, 368
models for observations
  fixed and random, 173, 178, 186
  mixed, 196
multinomial distribution, 44
multiple
  comparisons, 207
  linear regression, 264
    and analysis of variance, 268
multiplication
  operator, 10, 25-6
  rule, 20, 378

neuromuscular junction, see quantal release
non-linear regression, see curve fitting
nonparametric methods, characteristics, 96, 98-9
normal
  distribution, see distributions, Gaussian
  equivalent deviation, see probit
null hypothesis, 6, 87-96

observational method, meaning, 5; see also correlation
Occam's razor, 216
operation, meaning, 9; see also summation, etc.
optimism, of estimates of error, 101-3
optimization, 262-7
orthogonal contrasts, 302-7
  numerical examples, 311-43
  variance of, 308

P value, from significance test, meaning, 86-100, 207
parallelism, test for, see assays, parallel line
parameter, 4
pattern-search minimization, 263
permutations, see combinations
  random, vii, 18-19
permutation tests, see randomization tests
Poisson distribution, see distribution
polynomial curve fitting, 262-4, 388
population, 4, 16, 20, 43, 64-9; see also standard deviation and mean
power, of significance tests, 93-100
prior probabilities, see probability
probability
  addition rule, 19
  Bayes' theorem, 21-4, 96
  binomial, 46, 109-14
  confidence, see confidence limits
  density, 64-9
  direct, 6-8
  distribution, see distribution
  inverse, 6-8, 87
  meaning of, 16-18, 96
  multiplication rule, 20, 378
  posterior, 6-8, 21-4, 96
  prior, 6-8, 21-4, 96
  significance value, 88-96, 207
  subjective, 18, 96
probit transformation, 347, 353
  and haemolysis, 381
  linearizing sigmoid curves, 381
  test for Gaussian distribution, 80
purity in heart, test for, 111

quadratic equation
  fitting, see curve fitting, polynomial
  solution of, 264
quantal
  release of acetylcholine
    number of quanta per impulse, 57-60
    intervals between quanta, 81-5
  responses, see all-or-nothing responses and probit transformation
quantitative numerical measurements, 91

radiation, 'safe dose' of, 380
radioisotope disintegration
  errors in, 52, 60-3
  stochastic interpretation, 385-7
random
  blocks, 171, 196, 200, 207
  Latin square, 204
  permutations, vii, 18-19
  process, 52-63, 81-5, 374-95; see also lifetime and stochastic
  sample
    reasons for necessity, 119
    rejection of unacceptable, 123
    selection of, 3, 18-19, 43-5
  sampling numbers, use of, vii, 18-19
randomization tests, 98, 99
  classification measurements, 117
  Cushny and Peebles' data, 143
  numerical and rank measurements, 138, 143, 163, 167, 180, 191, 200
  rationale, 98, 117
  unacceptable randomizations, 123
  see also card shuffling analysis
randomized blocks, 171, 196, 200, 207
range, 28
rank measurements, 98, 99, 116, 137, 152, 171, 191, 200, 207-10
  correlation, 274
rankits, as test for Gaussian distribution, 80, 412 (table)
rate constant, 238
  stochastic interpretation, 380, 385
ratio
  dose, 283
  of maximum to minimum variance of a set, 176
  potency, see assays and confidence limits
  of two Gaussian variables, see confidence limits and variance of functions
  of two estimates of same variance, see variance ratio
receptor-drug interaction, see adsorption
regression
  analysis, see curve fitting
  equation, 214
  linear, 216-57
  non-linear, 243-72
related samples
  advantages of, 169
  classification measurements, 91, 134
  rank measurements, 91, 152-66, 200
  numerical measurements, 91, 152-70, 195, 200
  see also randomized blocks
root-mean-square deviation, see standard deviation

sample
  length-biased, 85, 389-95; see also lifetime and stochastic processes
  simple, 44
  small, 49, 75, 80, 89, 96-9
  strictly random, vii, 3-5, 16-19, 43-5, 117, 207; see also random
Scheffé's method, 210
scientific method, 3-8
sign test, 153
significance tests, see guide to particular tests on end sheet
  for all possible pairs, 191, 200, 207-10
  assumptions in, 70-2, 86, 101-3, 111, 139, 144, 148, 153, 158, 167, 172, 205-6, 207
  critique of, 93-5
  efficiency of, 97
  interpretation of, 1-8, 70-2, 86-100
  maximum variance/minimum variance, 176
  multiple, 191, 200, 207-10
  one-tail, 86
  parametric versus nonparametric, 96-9
  randomization, see randomization tests
  ranks, 96, 99, 116, 137, 152, 171
  ratio of maximum to minimum variance, 176
  relation
    with confidence limits, 151, 155, 168, 232
    between t tests and analysis of variance, 190, 196, 233
    between various methods, 116, 137, 152, 171
  relative efficiency, 97
  two-tail, 88
  for variance, population value of, 128
simulation, 268
six point assay, see assay, parallel line, (3 + 3) dose
skew distributions, 78-80, 101, 348, 368
standard
  deviation, see variance of functions
  of observation, see variance of functions
  error, 33, 35, 36, 38
  form of distribution, 369
  Gaussian (normal) distribution, see distribution
statistics
  expected normal-order, 412
  role of, 1-3, 86, 93, 96, 101, 214, 374
  technical meaning, 4
steepest-descent method, 262
stochastic processes, 1, 81-5, 374-95
  adsorption, 380-5
  catabolism, 379
  isotope disintegration, 52, 60-3, 385-7
  length bias, 85, 389-95
  lifetime, see lifetime
  meaning, 1, 81, 374
  o(Δt), 376, 378, 387
  Poisson, derivation, 375-8
  quantal release of acetylcholine, 57-60, 81-5
  residual lifetime, see lifetime
  see also distribution, exponential and distribution, Poisson
straight-line fitting, 214-57
'Student' (W. S. Gosset), 71
  paired t test, 167
  t distribution, 75-8
    tables of, 77
  t test, 148
    relation with analysis of variance, 179, 190, 196, 226, 232-4
    relation with confidence limits, 151, 168
sum
  of products, 31
    working formula, 32
  of squared deviations (SSD), 27, 184-90, 197, 217-20, 244-57
    additivity of, 189, 223
    working formula for, 30, 188, 224
  see also least squares method and analysis of variance
summation operator, Σ, 10-14
survey method, meaning, 5
tables, published, vii
tail of distribution, 67, 72
tests
  for additivity, 174
  of assumptions, see assumptions
  for equality of variances, 176
  for Gaussian (normal) distribution, probit and rankit, 80
  for goodness of fit, 132
  for Poisson distribution, 133
  of significance, see significance
threshold dose, 360, 364
time constant, 238
  stochastic interpretation, 380, 385
transformations
  for additivity, 174-6
  for analysis of variance, 176
  in assays, 280-3, 287, 340, 344-6
  in curve fitting, 221-2, 238, 243
  to Gaussian distribution, 71, 78, 80, 174-6, 221, 239, 287, 344-6
  linearizing, 221-2, 238, 266-72, 353
  logarithmic, 78, 176, 221-2, 238, 280-3, 287, 291, 344-6, 361-4
  logit, 361
  normalizing, see transformation, to Gaussian
  probit, 80, 347, 353-64
  rankit, 80
  reciprocal, 266-72
2 × 2 table
  independent samples, 116-34
  related samples, 134
two samples, difference between, see significance tests; and guide on end sheet

unacceptable randomizations, 123

validity of assays, 91
variability, measures of, 28
variance of functions of observations
  of any function (approx.), 39-40
  of any linear function, 39, 225, 307
  of difference, 37
  of function of correlated variables, 27, 41
  of linear functions, 39, 225, 307
  of logarithm of variable (approx.), 40
  of mean, 33, 35, 36, 38, 101
  meaning, 33-42
  multipliers, definition, 295
  population, xviii, 28, 29
    of binomial distribution, 50, 359, 368
    constancy of, 167, 175-6, 221, 266, 269, 272, 281, 359
    definition of, 368-9
    estimation from probit plot, 353-64
    examples, 51, 60-3
    of exponential distribution, 368
    of lognormal distribution, 78-9, 346-57
    of Poisson distribution, 55, 368; see also distribution and quantal release
  of potency ratio, see confidence limits
  of product of two variables, 40
  of ratio of two variables
    (approx.), 41, 107, 296
    (exact), see confidence limits
  of reciprocal of variable (approx.), 41, 272
  sample, xviii, 28, 29, 369
    bias of, 29, 307, 369
    ratio of maximum to minimum, 176
    ratio of two estimates, see variance ratio
    when population mean known, 29, 307, 309
    working formula for, 30
  of slope of straight line, 225
  of sum
    or difference, 37
    of N variables, 37, 307
    of variable number of variables, 41, 58-60, 370-3
  of value
    of x read off straight line, see confidence limits
    of Y read off straight line, 227
  of variable
    + constant, 38
    × constant, 38
  of variance, 128
  of weighted arithmetic mean, 39, 292
  see also confidence limits
variance ratio (F)
  less than one, 182
  meaning of, 176, 179
  relation
    with chi-squared, 180
    with Student's t, 179, 190, 196, 226, 232-4
  tables of, 181
virginity, 111

waiting time, see lifetime and stochastic processes
  paradox, 84, 374-95
weighting, 25, 220, 272, 292
Wilcoxon
  signed ranks test for two related samples, 160, 405 (table)
  test (Mann-Whitney) for two independent samples, 143, 402 (table)

Yates' correction for continuity, 126, 129, 132