
This book is provided in digital form with the permission of the rightsholder as part of a
Google project to make the world's books discoverable online.

The rightsholder has graciously given you the freedom to download all pages of this
book. No additional commercial or other uses have been granted.

Please note that all copyrights remain reserved.

About Google Books

Google's mission is to organize the world's information and to make it universally
accessible and useful. Google Books helps readers discover the world's books while
helping authors and publishers reach new audiences. You can search through the full
text of this book on the web at http://books.google.com/
Lectures on Biostatistics

D. COLQUHOUN

Lectures on Biostatistics
An Introduction to Statistics with
Applications in Biology and Medicine

CLARENDON PRESS · OXFORD · 1971

Digitized by Google
Oxford University Press, Ely House, London W.1
GLASGOW NEW YORK TORONTO MELBOURNE WELLINGTON
CAPE TOWN IBADAN NAIROBI DAR ES SALAAM LUSAKA ADDIS ABABA
DELHI BOMBAY CALCUTTA MADRAS KARACHI LAHORE DACCA
KUALA LUMPUR SINGAPORE HONG KONG TOKYO

© OXFORD UNIVERSITY PRESS 1971

PRINTED IN GREAT BRITAIN AT THE PITMAN PRESS, BATH


Preface

'In statistics, just like in the industry of consumer goods, there are producers
and consumers. The goods are statistical methods. These come in various kinds
and 'brands' and in great and often confusing variety. For the consumer, the
applier of statistical methods, the choice between alternative methods is often
difficult and too often depends on personal and irrational factors.
The advice of producers cannot always be trusted implicitly. They are apt
- it is only natural - to praise their own wares. The advice of consumers - based
on experience and personal impressions - cannot be trusted either. It is well
known among applied statisticians that in many fields of applied science, e.g.
in industry, experience, especially 'experience of a lifetime', compares unfavourably
with objective scientific research: tradition and aversion from innovation
are usually strong impediments for the introduction of new methods, even if
these are better than the old ones. This also holds for statistics'
(J. HEMELRIJK 1961).

DURING the preparation of the courses for final year students, mostly
of pharmacology, in Edinburgh and London on which this book is
based, I have often been struck by the extent to which most textbooks,
on the flimsiest of evidence, will dismiss the substitution of assumptions
for real knowledge as unimportant if it happens to be mathematically
convenient to do so. Very few books seem to be frank about, or perhaps
even aware of, how little the experimenter actually knows about the
distribution of errors of his observations, and about facts that are
assumed to be known for the purposes of making statistical calculations.
Considering that the purpose of statistics is supposed to be to help in
the making of inferences about nature, many texts seem, to the
experimenter, to take a surprisingly deductive approach (if assumptions
a, b, and c were true then we could infer such and such). It is also
noticeable that in the statistical literature, as opposed to elementary
textbooks, a vast number of methods have been proposed, but remarkably
few have been assessed to see how they behave under the conditions
(small samples, unknown distribution of errors, etc.) in which
they are likely to be used.
These considerations, which are discussed at greater length in the
text, have helped to determine the content and emphasis of the methods
in this book. Where possible, methods have been advocated that
involve a minimum of untested assumptions. These methods, which

occur mostly in Chapters 7-11, have the secondary advantage that
they are much easier to understand at the present level than the
methods, such as Student's t test and the chi-squared test (also described
and exemplified in this book) which have, until recently, been the main
sources of misery for students doing first courses in statistics.
In Chapter 12 and also in § 2.7 an attempt has been made to deal
with non-linear problems, as well as with the conventional linear ones.
Statistics is heavily dominated by linear models of one sort and another,
mainly for reasons of mathematical convenience. But the majority of
physical relationships to be tested in practical science are not straight
lines, and not linear in the wider sense described in § 12.7, and attempts
to make them straight may lead to unforeseen hazards (see § 12.8),
so it is unrealistic not to discuss them, even in an elementary book.
In Chapters 13 and 14, calibration curves and assays are discussed.
The step by step description of parallel line assays is intended to
bridge the gap between elementary books and standard works such as
that of Finney (1964).
In Chapter 5 and Appendix 2 some aspects of random ('stochastic')
processes are discussed. These are of rapidly increasing importance to
the practical scientist, but the textbooks on the subject have always
seemed to me to be among the most incomprehensible of all statistics
books, partly, perhaps, because there are not really any elementary
ones. Again I have tried to bridge the gap.
The basic ideas are described in Chapters 1-4. They may be boring,
but the ideas in them are referred to constantly in the later chapters
when these ideas are applied to real problems, so the reader is earnestly
advised to study them.
There is still much disagreement about the fundamental principles
of inference, but most statisticians, presented with the problems
described, would arrive at answers similar to those presented here,
even if they justified them differently, so I have felt free to choose the
justifications that make the most sense to the experimenter.
I have been greatly influenced by the writing of Professor Donald
Mainland. His Elementary medical statistics (1963), which is much more
concerned with statistical thinking than statistical arithmetic, should
be read not only by every medical practitioner, but by everyone who
has to interpret observations of any sort. If the influence of Professor
Mainland's wisdom were visible in this book, despite my greater
concern with methods, I should be very happy.
I am very grateful to many statisticians who have patiently put up

with my pestering over the last few years. If I may, invidiously,
distinguish two in particular they would be Professor Mervyn Stone
who read most of the typescript and Dr. A. G. Hawkes who helped
particularly with stochastic processes. I have also been greatly helped
by Professor D. R. Cox, Mr. I. D. Hill, Professor D. V. Lindley, and
Mr. N. W. Please, as well as many others. Needless to say, none of
these people has any responsibility for errors of judgment or fact that I
have doubtless persisted with, in spite of their best efforts. I am also
very grateful to Professor C. R. Oakley for permission to quote
extensively from his paper on the purity-in-heart index in § 7.8.
University College London  D. C.
April 1970

STATISTICAL TABLES FOR USE WITH THIS BOOK


The appendix contains those tables referred to in the text that are
not easily available elsewhere. Standard tables, such as normal distribution,
Student's t, variance ratio, and random sampling numbers,
are so widely available that they have not been included. Any tables
should do. Those most referred to in the text are Fisher and Yates
Statistical tables for biological, agricultural and medical research (6th
edn 1963, Oliver and Boyd), and Pearson and Hartley Biometrika
tables for statisticians (Vol. 1, 3rd edn 1966, Cambridge University
Press). The former has more about experimental designs; the latter
has tables of the Fisher exact test for a 2 × 2 contingency table (see § 8.2),
but anyone doing many of these should get the full tables: Finney,
Latscha, Bennett, and Hsu Tables for testing significance in a 2 × 2
table (1963, Cambridge University Press). The Cambridge elementary
statistical tables (Lindley and Miller, 1968, Cambridge University Press)
give the normal, t, chi-squared, and variance ratio distributions, and
some random sampling numbers.

Contents

INDEX OF SYMBOLS

1. IS THE STATISTICAL WAY OF THINKING WORTH KNOWING ABOUT?
1.1. How to avoid making a fool of yourself. The role of statistics
1.2. What is an experiment? Some basic ideas
1.3. The nature of scientific inference

2. FUNDAMENTAL OPERATIONS AND DEFINITIONS
2.1. Functions and operators. A beautiful notation for adding up
2.2. Probability
2.3. Randomization and random sampling
2.4. Three rules of probability
2.5. Averages
2.6. Measures of the variability of observations
2.7. What is a standard error? Variances of functions of the observations. A reference list

3. THEORETICAL DISTRIBUTIONS: BINOMIAL AND POISSON
3.1. The idea of a distribution
3.2. Simple sampling and the derivation of the binomial distribution through examples
3.3. Illustration of the danger of drawing conclusions from small samples
3.4. The general expression for the binomial distribution and for its mean and variance
3.5. Random events. The Poisson distribution
3.6. Some biological applications of the Poisson distribution
3.7. Theoretical and observed variances: a numerical example

4. THEORETICAL DISTRIBUTIONS. THE GAUSSIAN (OR NORMAL) AND OTHER CONTINUOUS DISTRIBUTIONS
4.1. The representation of continuous distributions in general
4.2. The Gaussian, or normal, distribution. A case of wishful thinking!
4.3. The standard normal distribution
4.4. The distribution of t (Student's distribution)
4.5. Skew distributions and the lognormal distribution
4.6. Testing for a Gaussian distribution. Rankits and probits

5. RANDOM PROCESSES. THE EXPONENTIAL DISTRIBUTION AND THE WAITING TIME PARADOX
5.1. The exponential distribution of random intervals
5.2. The waiting time paradox

6. CAN YOUR RESULTS BE BELIEVED? TESTS OF SIGNIFICANCE AND THE ANALYSIS OF VARIANCE
6.1. The interpretation of tests of significance
6.2. Which sort of test should be used, parametric or nonparametric?
6.3. Randomization tests
6.4. Types of sample and types of measurement

7. ONE SAMPLE OF OBSERVATIONS. THE CALCULATION AND INTERPRETATION OF CONFIDENCE LIMITS
7.1. The representative value: mean or median?
7.2. Precision of inferences. Can estimates of error be trusted?
7.3. Nonparametric confidence limits for the median
7.4. Confidence limits for the mean of a normally distributed variable
7.5. Confidence limits for the ratio of two normally distributed observations
7.6. Another way of looking at confidence intervals
7.7. What is the probability of 'success'? Confidence limits for the binomial probability
7.8. The black magical assay of purity in heart as an example of binomial sampling
7.9. Interpretation of confidence limits

8. CLASSIFICATION MEASUREMENTS
8.1. Two independent samples. Relationship between various methods
8.2. Two independent samples. The randomization method and the Fisher test
8.3. The problem of unacceptable randomizations
8.4. Two independent samples. Use of the normal approximation
8.5. The chi-squared (χ²) test. Classification measurements with two or more independent samples
8.6. One sample of observations. Testing goodness of fit with chi-squared
8.7. Related samples of classification measurements. Cross-over trials

9. NUMERICAL AND RANK MEASUREMENTS. TWO INDEPENDENT SAMPLES
9.1. Relationship between various methods
9.2. Randomization test applied to numerical measurements
9.3. Two sample randomization test on ranks. The Wilcoxon (or Mann-Whitney) test
9.4. Student's t test for independent samples. A parametric test

10. NUMERICAL AND RANK MEASUREMENTS. TWO RELATED SAMPLES
10.1. Relationship between various methods
10.2. The sign test
10.3. The randomization test for paired observations
10.4. The Wilcoxon signed ranks test for two related samples
10.5. A data selection problem arising in small samples
10.6. The paired t test
10.7. When will related samples (pairing) be an advantage?

11. THE ANALYSIS OF VARIANCE. HOW TO DEAL WITH TWO OR MORE SAMPLES
11.1. Relationship between various methods
11.2. Assumptions involved in the analysis of variance based on the Gaussian (normal) distribution. Mathematical models for real observations
11.3. Distribution of the variance ratio F
11.4. Gaussian analysis of variance for k independent samples (the one way analysis of variance). An illustration of the principle of the analysis of variance
11.5. Nonparametric analysis of variance for independent samples by randomization. The Kruskal-Wallis method
11.6. Randomized block designs. Gaussian analysis of variance for k related samples (the two-way analysis of variance)
11.7. Nonparametric analysis of variance for randomized blocks. The Friedman method
11.8. The Latin square and more complex designs for experiments
11.9. Data snooping. The problem of multiple comparisons

12. FITTING CURVES. THE RELATIONSHIP BETWEEN TWO VARIABLES
12.1. The nature of the problem
12.2. The straight line. Estimates of the parameters
12.3. Measurement of the error in linear regression
12.4. Confidence limits for a fitted line. The important distinction between the variance of y and the variance of Y
12.5. Fitting a straight line with one observation at each x value
12.6. Fitting a straight line with several observations at each x value. The use of a linearizing transformation for an exponential curve and the error of the half-life
12.7. Linearity, non-linearity, and the search for the optimum
12.8. Non-linear curve fitting and the meaning of 'best' estimate
12.9. Correlation and the problem of causality

13. ASSAYS AND CALIBRATION CURVES
13.1. Methods for estimating an unknown concentration or potency
13.2. The theory of parallel line assays. The response and dose metameters
13.3. The theory of parallel line assays. The potency ratio
13.4. The theory of parallel line assays. The best average slope

13.5. Confidence limits for the ratio of two normally distributed variables: derivation of Fieller's theorem
13.6. The theory of parallel line assays. Confidence limits for the potency ratio and the optimum design of assays
13.7. The theory of parallel line assays. Testing for non-validity
13.8. The theory of symmetrical parallel line assays. Use of orthogonal contrasts to test for non-validity
13.9. The theory of symmetrical parallel line assays. Use of contrasts in the analysis of variance
13.10. The theory of symmetrical parallel line assays. Simplified calculation of the potency ratio and its confidence limits
13.11. A numerical example of a symmetrical (2+2) dose parallel line assay
13.12. A numerical example of a symmetrical (3+3) dose parallel line assay
13.13. A numerical example of an unsymmetrical (3+2) dose parallel line assay
13.14. A numerical example of the standard curve (or calibration curve). Error of a value of x read off from the line
13.15. The (k+1) dose assay and rapid routine assays

14. THE INDIVIDUAL EFFECTIVE DOSE, DIRECT ASSAYS, ALL-OR-NOTHING RESPONSES, AND THE PROBIT TRANSFORMATION
14.1. The individual effective dose and direct assays
14.2. The relation between the individual effective dose and all-or-nothing (quantal) responses
14.3. The probit transformation. Linearization of the quantal dose response curve
14.4. Probit curves. Estimation of the median effective dose and quantal assays
14.5. Use of the probit transformation to linearize other sorts of sigmoid curve
14.6. Logits and other transformations. Relationship with the Michaelis-Menten hyperbola

APPENDIX 1. Expectation, variance and non-experimental bias
A1.1. Expectation: the population mean
A1.2. Variance
A1.3. Non-experimental bias
A1.4. Expectation and variance with two random variables. The sum of a variable number of random variables

APPENDIX 2. Stochastic (or random) processes
A2.1. Scope of the stochastic approach
A2.2. A derivation of the Poisson distribution
A2.3. The connection between the lifetimes of individual adrenaline molecules and the observed breakdown rate and half-life of adrenaline
A2.4. A stochastic view of the adsorption of molecules from solution
A2.5. The relation between the lifetime of individual radioisotope molecules and the interval between disintegrations
A2.6. Why the waiting time until the next event does not depend on when the timing is started for a Poisson process
A2.7. Length-biased sampling. Why the average length of the interval in which an arbitrary moment of time falls is twice the average length of all intervals for a Poisson process

TABLES
Table A1. Nonparametric confidence limits for the median
Table A2. Confidence limits for the parameter of a binomial distribution (i.e. the population proportion of 'successes')
Table A3. The Wilcoxon test for two independent samples
Table A4. The Wilcoxon signed ranks test for two related samples
Table A5. The Kruskal-Wallis one-way analysis of variance on ranks (independent samples)
Table A6. The Friedman two-way analysis of variance on ranks for randomized block experiments
Table A7. Table of the critical range (difference between rank sums for any two treatments) for comparing all pairs in the Kruskal-Wallis nonparametric one way analysis of variance
Table A8. Table of the critical range (difference between rank sums for any two treatments) for comparing all pairs in the Friedman nonparametric two way analysis of variance
Table A9. Rankits (expected normal order statistics)

REFERENCES

INDEX

GUIDE TO THE SIGNIFICANCE TESTS IN CHAPTERS 6-11
Index of symbols

Reference is to the main section in which the symbol is used, explained, or defined.

Mathematical symbols and operations

=  is equal to
≡  is equivalent to, is defined as
≈  is approximately equal to
>  is greater than
≫  is much greater than
≥  is equal to or greater than
a < x < b  x is between a and b (greater than a and less than b)
ⁿ√x = x^(1/n)  nth root of x
x!  factorial x (§ 2.1)
log x  logarithm of x. The power to which the base must be raised to equal x. If the base is important it is inserted (e.g. log_e x, log_10 x), otherwise the expression holds for any base
antilog x = a^x  where a is the base of the logarithms
Σ  add up all terms like the following (§ 2.1)
Π  multiply together all terms like the following (§ 2.1)
and  logical and (§ 2.4)
or  logical or (§ 2.4)
dy/dx, ∂y/∂x  see Thompson (1965) or Massey and Kestelman (1964)
Roman symbols
a  a constant (§ 2.7)
a  entry in a 2 × 2 table (§ 8.2)
a  estimate of α (value of y when x = x̄) (§ 12.2)
a′  estimate of value of y when x = 0 (§ 12.2)
a or â  least squares value of a (§§ 12.2, 12.7)
A  total in 2 × 2 table (§ 8.2)
and  logical 'and' (§ 2.4)
b  estimate of β, slope of straight line (§ 12.2)
b or b̂  least squares estimate of β (§§ 12.2, 12.7)
b_S  slope for standard line (§ 13.4)
b_U  slope for unknown line (§ 13.4)
b  denominator of a ratio (§ 13.5)
B  total in 2 × 2 table (§ 8.2)
c  a constant (§ 2.7)
c  concentration (§ A2.4)
𝒞  population value of coefficient of variation (§ 2.6)
C  sample estimate of 𝒞 (§ 2.6)
C  total in 2 × 2 table (§ 8.2)
𝒞ov(x, y)  population covariance of x and y (§ 2.6)
cov(x, y)  sample estimate of 𝒞ov(x, y)
d  difference between 2 observations (§§ 10.1, 10.3)
d̄  mean of d values (§ 10.6)
d  difference between Y and y (§ 12.2)
d  Dunnett's (1964) statistic (§ 11.9)
D  ratio of each dose to the next lowest dose (§ 13.2)
e  base of natural logs, 2.71829... (§ 2.7)
exp(x)  e^x
e  error part of the mathematical model for an observation (§ 11.2)
E  events (§ 2.4)
E(x)  expectation (long run mean) of x (§ A1.1)
ED50  effective dose in 50 per cent of subjects (§ 14.3)
f(x)  any function of x (§ 2.1)
f(x)  probability density function of x (§ 4.1)
f₁(x)  probability density function for length-biased samples (§ A2.7)
f  number of degrees of freedom (§§ 8.5, 11.3)
f  frequency of observing a particular value (§ 3.3)
F(x)  distribution function. Probability of making an observation of x or less (§ 4.1)
F₁(x)  length-biased distribution function (§ A2.7)
F  variance ratio (§ 11.3)
g  index of significance of b (§ 13.5)
g(x)  any function of x (§§ 2.1, A1.1)
H  Kruskal-Wallis statistic (§ 11.5)
i, j  counting subscripts (§ 2.1)
k  number of classes (§§ 8.5, 8.6)
k  number of groups, treatments (Chap. 11)
k  number of x values (§ 12.6)
k  rate constant (§§ 12.2, 12.6, A1.3)
k_S, k_U  number of dose levels for standard, unknown (§§ 13.1, 13.8)
k  k_S + k_U (§§ 13.1, 13.8)
𝒦  parameter of hyperbola (e.g. Michaelis constant) (§ 12.8)
K  a sample estimate of 𝒦 (§ 12.8)
K̂  the least squares estimate of 𝒦 (§ 12.8)
L  linear function of observations (§ 2.7)
L  orthogonal contrasts (§ 13.8)
LD50  lethal dose in 50 per cent of subjects (§ 14.3)
m  an estimate of the mean, μ (§ 2.5)
m  population mean number of events in unit time (space, etc.) for Poisson (§§ 3.5, 5.1, A2.2)
m  sample estimate of m (§ 3.6)
m  population median (§ 7.3)
m  number of new observations (§§ 7.4, 12.4)
m  estimate of a ratio, a/b (§ 7.5)
M  log(potency ratio) (§ 13.3)
n, N  number of observations, an integer (§ 2.1)
n  number of binomial 'trials' (§ 3.2)
n(t)  number of intervals ≥ t (§ 5.1)
n_j  number of observations in jth group (§ 11.4)
n  number of replicate responses to each dose (§§ 13.2, 13.8)
n_S, n_U  number of responses for standard, unknown (§§ 13.2, 13.8)
N  total number of adsorption sites (§ A2.4)
N(t)  number of sites occupied, or nuclei not disintegrated, at time t (§§ A2.4, A2.5)
NED  normal equivalent deviation (§ 14.3)
o(Δt)  any quantity that becomes negligible for small time intervals (§ A2.2)
or  logical 'or' (§ 2.4)
O  observed frequency (§§ 3.6, 8.4, 14.4)
P  cumulative form of a distribution (§ 14.2)
P( )  probability (true or estimated) of the event in brackets (§§ 2.2, 2.4)
P(E₁|E₂)  conditional probability of E₁ given that E₂ has happened (§ 2.4)
P  result of significance test (§ 6.1), confidence level (§§ 7.2, 7.9)
p  probability of a 'success' at each trial in binomial situation (§§ 3.2, 3.5)
p  high and low confidence limits for p (§ 7.7)
p  probability that a site is occupied, empty (§ A2.4)
r  number of 'successes' (§§ 3.1, 3.2), number of observed 'successes' (§ 7.7)
r  rank used in finding confidence limits for the median (§ 7.3)
r  Pearson correlation coefficient (§ 12.9)
r_s  Spearman rank correlation coefficient (§ 12.9)
r  base of logarithms for assays (§§ 13.2, 13.3)
R  sum of ranks (§§ 9.3, 11.3, 11.5)
R  potency ratio (§§ 13.1, 13.3)
s( )  sample standard deviation of the variable in brackets. An estimate of σ( )
s²( )  sample variance of the variable in brackets. The square of s( ). An estimate of σ²( ) (§§ 2.1, 2.6)
s²_max  largest of a set of sample variances (§ 11.2)
S  Friedman statistic (§ 11.7)
S  Scheffé's statistic (§ 11.9)
S  sum of squared deviations to be minimized (§ 12.2)
S  standard preparation (§ 13.1)
t  Student's statistic (§ 4.4)
t  time, time interval between events, lifetime (Chapter 5, Appendix 2)
t  time considered as a random variable, t denoting a particular value (§ A2.5)
Δt  a time interval
T  total of n observations
T  sum of ranks, whichever is the smaller (§ 10.4)
u  standard normal (Gaussian) deviate (§ 4.3)
U  unknown preparation (§ 13.1)
v₁₁, etc.  variance multipliers (§ 13.5)
𝒱ar( )  population variance of the variable in brackets. Same as σ²( ) (§§ 2.6, 2.7)
var( )  sample estimate of 𝒱ar( ). Same as s²( ) (§§ 2.6, 2.7)
𝒱  population (true) maximum value of y (§ 12.8)
V  a sample estimate of 𝒱 (§ 12.8)
V̂  least squares estimate of 𝒱 (§ 12.8)
w  weight (§ 2.5)
x  any variable (§§ 2.1, 2.7, 4.1)
x  x considered as a random variable, x denoting a particular value (§§ 4.1, A1.1)
x̄_g  geometric mean of x values (§ 2.5)
x  independent variable in curve fitting problems (§§ 12.1, 12.2)
x  log z (Chapter 13)
x_o, x_e  observed frequency (integer) and expected frequency (§ 8.5)
x̄, etc.  means of observations (§ 2.1)
y  observed value of dependent variable in curve-fitting problems (§§ 12.1, 12.2)
Y  value of dependent variable read off the fitted curve (§ 12.2)
z_S, z_U  doses of standard and unknown (§ 13.1)
z_S′, z_U′  doses giving equal responses (§ 13.3)

Greek symbols
α (alpha)  probability of an error of the first kind (§ 6.1)
α  population value of y when x = x̄, estimated by a (§§ 12.2, 12.7)
α  population value of numerator of ratio (§ 13.5)
α  orthogonal coefficient (§ 13.8)
β (beta)  probability of an error of the second kind (§ 6.1)
β  population value of slope, estimated by b (§§ 12.2, 12.7)
β_i  block effect for ith block in model observation (§ 11.2)
β  population value for denominator of ratio (§ 13.5)
Δ (delta)  change in, interval in, value of following variable (§§ 5.1, A2.2)
λ (lambda)  population (true) mean number of events in unit time (§§ 5.1, A2.2)
λ  measure of probability of catabolism, disintegration, adsorption in a short time interval (§§ A2.3-A2.6)
μ (mu)  population mean (§§ 2.5, 4.2, 12.2, A1.1)
μ  population (true) value of ratio (§ 13.5)
μ  measure of probability of desorption (§ A2.4)
π (pi)  3.141593...
Π (capital pi)  multiply the following (§ 2.1)
σ²( ) (sigma)  population (true) variance of variable in brackets. Same as 𝒱ar( ) (§§ 2.1, 2.6, 2.7, A1.2)
Σ (sigma)  add up the following (§ 2.1)
τ_j (tau)  treatment effect for jth treatment in model observation (§ 11.2)
χ² (chi)  chi-squared statistic with f degrees of freedom (§ 8.5)
χ²_r  rank statistic distributed approximately as χ² (§ 11.7)
ω (omega)  interblock standard deviation (§ 11.2)
1. Is the statistical way of thinking
worth bothering about?

'I wish to propose for the reader's favourable consideration a doctrine which
may, I fear, appear wildly paradoxical and subversive. The doctrine in question
is this: that it is undesirable to believe a proposition when there is no ground
whatever for supposing it true. I must of course, admit that if such an opinion
became common it would completely transform our social life and our political
system: since both are at present faultless, this must weigh against it. I am also
aware (what is more serious) that it would tend to diminish the incomes of
clairvoyants, bookmakers, bishops and others who live on the irrational hopes of
those who have done nothing to deserve good fortune here or hereafter. In spite
of these grave arguments, I maintain that a case can be made out for my paradox,
and I shall try to set it forth.'
BERTRAND RUSSELL, 1935
(On the Value of Scepticism)

1.1. How to avoid making a fool of yourself. The role of statistics


IT is widely held by non-statisticians, like the author, that if you do
good experiments statistics are not necessary. They are quite right.
At least they are right as long as one makes an exception of the import-
ant branch of statistics that deals with processes that are inherently
statistical in nature, so-called 'stochastic' processes (see Chapters 3
and 5 and Appendix 2). The snag, of course, is that doing good experi-
ments is difficult. Most people need all the help they can get to prevent
them making fools of themselves by claiming that their favourite
theory is substantiated by observations that do nothing of the sort.
And the main function of that section of statistics that deals with
tests of significance is to prevent people making fools of themselves.
From this point of view, the function of significance tests is to prevent
people publishing experiments, not to encourage them. Ideally, indeed,
significance tests should never appear in print, having been used, if at
all, in the preliminary stages to detect inadequate experiments, so that
the final experiments are so clear that no justification is needed.
The main aim of this book is to produce a critical way of thinking
about experimentation. This is particularly necessary when attempting

Digitized by Google
to measure abstract quantities such as pain, intelligence, or purity in
heart (§ 7.8). As Mainland (1964) points out, most of us find arithmetic
easier than thinking. A particular effort has therefore been made to
explain the rational basis of as many methods as possible. This has
been made much easier by starting with the randomization approach
to significance testing (Chapters 6-11), because this approach is easy to
understand, before going on to tests like Student's t test. The numerical
examples have been made as self-contained as possible for the benefit
of those who are not interested in the rational basis.
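The appeal of the randomization approach is that the whole test can be stated in a few lines of arithmetic. The sketch below (invented data, not one of the book's worked examples) carries out a two-sample randomization test of the kind treated in Chapter 9, using the difference between means as the test statistic: if the treatment had no effect, every split of the pooled observations into groups of the original sizes was equally probable, so the P value is just the proportion of splits at least as extreme as the one observed.

```python
from itertools import combinations

def randomization_test(group_a, group_b):
    """Exact two-sample randomization test on the difference of means.

    Under the null hypothesis the observed values are fixed and only the
    labels were decided by the randomization, so every division of the
    pooled values into groups of the original sizes is equally probable.
    The two-sided P value is the proportion of divisions whose absolute
    difference of means is at least as large as that observed."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        total += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / total

# Invented measurements for two groups of four
p = randomization_test([120, 118, 125, 130], [103, 99, 110, 107])
print(p)  # 2/70: only the observed split and its mirror image are as extreme
```

With eight observations there are only 70 divisions, so the test is exact; for larger samples one samples the divisions at random instead of enumerating them.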
Although it is difficult to achieve these aims without a certain
amount of arithmetic, all the mathematical ideas needed will have
been learned by the age of 15. The only difficulty may be the occasional
use of longer formulae than the reader may have encountered
previously, but for the vast majority of what follows you do not need
to be able to do anything but add up and multiply. Adding up is so
frequent that a special notation for it is described in detail in § 2.1.
You may find this very dull and boring until familiarity has revealed its
beauty and power, but do not on any account miss out this section.
In a few sections some elementary calculus is used, though anything at
all daunting has been confined to the appendices. These parts can be
omitted without affecting understanding of most of the book. If you
know no calculus at all, and there are far more important reasons for
no biologist being in this position than the ability to understand the
method of least squares, try Silvanus P. Thompson's Calculus Made Easy
(1965).
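The adding-up notation of § 2.1 amounts to nothing more than keeping a running total: Σx_i is an instruction to add x_1 + x_2 + ... + x_n. A one-line preview, with invented numbers:

```python
# The summation sign is shorthand for a loop that keeps a running total:
# sum over i = 1 ... n of x_i means x_1 + x_2 + ... + x_n.
x = [4.0, 7.0, 5.0, 9.0]   # invented observations x_1 ... x_4
n = len(x)

total = 0.0                # running total for the sum
for xi in x:               # one pass over i = 1 ... n
    total += xi

mean = total / n           # the sample mean, (sum of x_i) / n
print(total, mean)         # 25.0 6.25
```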
A list of the uses and scope of statistical methods in laboratory and
clinical experimentation is necessarily arbitrary and personal. Here is
mine.
(1) Statistical prudence (Lancelot Hogben's phrase) encourages the
design of experiments in a way that allows conclusions to be drawn from
them. Some of the ideas, such as the central importance of randomization
(see §§ 2.3, 6.3, and Chapters 8-11) are far from intuitively obvious
to most people at first.
(2) Some processes are inherently probabilistic in nature. There is no
alternative to a statistical approach in these cases (see Chapter 5 and
Appendix 2).
(3) Statistical methods allow an estimate (usually optimistic, see
§ 7.2) of the uncertainty of the conclusions drawn from inexact observations.
When results are assessed by hopeful intuition it is not uncommon
for more to be inferred from them than they really imply. For example,

Schor and Karten (1966) found that, in no less than 72 per cent of a
sample of 149 articles selected from 10 highly regarded medical journals,
conclusions were drawn that were not justified by the results presented.
The most common single error was to make a general inference from
results that could quite easily have arisen by chance.
(4) Statistical methods can only cope with random errors and in
real experiments systematic errors (bias) may be quite as important as
random ones. No amount of statistics will reveal whether the pipette
used throughout an experiment was wrongly calibrated. Tippett (1944)
put it thus: 'I prefer to regard a set of experimental results as a biased
sample from a population, the extent of the bias varying from one kind
of experiment and method of observation to another, from one experimenter
to another, and, for any one experimenter, from time to time.'
It is for this reason, and because the assumptions made in statistical
analysis are not likely to be exactly true, that Mainland (1964) emphasizes
that the great value of statistical analysis, and in particular
of the confidence limits discussed in Chapter 7, is that 'they provide
a kind of minimum estimate of error, because they show how little a
particular sample would tell us about its population, even if it were a
strictly random sample.'
(5) Even if the observations were unbiased, the method of calculating
the results from them may introduce bias, as discussed in §§ 2.6 and
12.8 and Appendix 1. For example, some of the methods used by
biochemists to calculate the Michaelis constant from observations of the
initial velocity of enzymic reactions give a biased result even from
unbiased observations (see § 12.8). This is essentially a statistical
phenomenon. It would not happen if the observations were exact.
(6) The important point to realize is that by their nature statistical
methods can never prove anything. The answer always comes out as
a probability. And exactly the same applies to the assessment of
results by intuition, except that the probability is not calculated but
guessed.

1.2. What is an experiment? Some basic ideas


Statistics originally meant state records (births, deaths, etc.) and its
popular meaning is still much the same. However, as is often the case,
the scientific meaning of the word is much narrower. It may be illustrated
by an example.
Imagine a solution containing an unknown concentration of a drug.
If the solution is assayed many times, the resulting estimate of
concentration will, in general, be different at every attempt. An
unknown true value such as the unknown true concentration of the
drug is called a parameter. The mean value from all the assays gives an
estimate of this parameter. An approximate experimental estimate
(the mean in this example) of a parameter is called a statistic. It is
calculated from a sample of observations from the population of all
possible observations.
In the example just discussed the individual assay results differed
from the parameter value only because of experimental error. However,
there is another slightly different situation, one that is particularly
common in the biological sciences. For example, if identical doses of
a drug are given to a series of people and in each case the fall in blood
sugar level is measured then, as before, each observation will be differ-
ent. But in this case it is likely that most of the difference is real.
Different individuals really do have different falls in blood sugar level,
and the scatter of the results will result largely from this fact and only
to a minor extent from experimental errors in the determination of the
blood sugar level. The average fall of blood sugar level may still be of
interest if, for example, it is wished to compare the effects of two
different hypoglycaemic drugs. But in this case, unlike the first, the
parameter of which this average is an estimate, the true fall in blood
sugar level, is no longer a physical reality, whereas the true concentra-
tion was. Nevertheless, it is still perfectly all right to use this average as
an estimate of a parameter (the value that the mean fall in blood
sugar level would approach if the sample size were increased indefin-
itely) that is used simply to define the distribution (see §§ 3.1 and
4.1) of the observations. Whereas in the first case the average of all
the assays was the only thing of interest, the individual values being
unimportant, in the second case it is the individual values that are of
importance, and the average of these values is only of interest in so far
as it can be used, in conjunction with their scatter, to make predictions
about individuals.
In short, there are two problems, the older one of estimating a true
value by imperfect methods, and the now common problem of measur-
ing effects that are really variable (e.g. in different people) by relatively
very accurate methods. Both these problems can be treated by the
same statistical methods, but the interpretation of the results may be
different for each.
With few exceptions, scientific methods were applied in medicine and
biology only in the nineteenth century and in education and the social
sciences only very recently. It is necessary to distinguish two sorts of
scientific method, often called the observational method and the
experimental method. Claude Bernard wrote: 'we give the name observer
to the man who applies methods of investigation, whether simple or
complex, to the study of phenomena which he does not vary and which
he therefore gathers as nature offers them. We give the name experimenter
to the man who applies methods of investigation, whether
simple or complex, so as to make natural phenomena vary.' In more
modern terms Mainland (1964) writes: 'the distinctive feature of an
experiment, in the strict sense, is that the investigator, wishing to
compare the effects of two or more factors (independent variables),
assigns them himself to the individuals (e.g. human beings, animals or
batches of a chemical substance) that comprise his test material.'
For example, the type and dose of a drug, or the temperature of an
enzyme system, are independent variables.
The observational method, or survey method as Mainland calls it,
usually leads to a correlation; for example, a correlation between
smoking habits and death from lung cancer, or between educational
attainment and type of school. But the correlation, however perfect
it may be, does not give any information at all about causation,† such
as whether smoking causes lung cancer. The method lends itself only
too easily to the confusion of sequence with consequence. 'It is the
post hoc, ergo propter hoc of the doctors, into which we may very easily
let ourselves be led' (Claude Bernard).
This very important distinction is discussed further in §§ 12.7 and
12.9. Probably the most useful precaution against the wrong interpretation
of correlations is to imagine the experiment that might in principle
be carried out to decide the issue. It can then be seen that bias in the
results is controlled by the randomization process inherent in experiments.
If all that is known is that pupils from type A schools do better
than those from type B schools it could well have nothing to do with
the type of school but merely, for example, be that children of educated
parents go to type A schools and those of uneducated parents to type B
schools. If proper experimental methods were applied in the situations
mentioned above the first step would be to divide the population (or a
random sample from it) by a random process into two groups. One
group would be instructed to smoke (or to go to a particular sort of
school), the other group would be instructed not to smoke (or go to
† It is not even necessarily true that zero correlation rules out causation, because
lack of correlation does not necessarily imply independence (see § 12.9).
a different sort of school). The difficulty in the medical and social
sciences is usually that an experiment may be considered unethical.
Since it can hardly be assumed a priori that there is an equal chance of
smoking having good or bad effects on health, it is not possible to
instruct one randomly selected group of people to smoke (though it
might perhaps be permissible to leave one addicted group to continue,
and to enforce the giving up of smoking by some group) and to either
prevent or dissuade the other group from smoking.
The situation is not like this, however, when there is genuine
doubt about the relative merits of different sorts of school, and, very
often, about different sorts of therapy, so in these cases it is not merely
ethical to do a proper experiment, but it would be unethical, though
not unusual, not to do the experiment.

These problems arise in an acute form in the testing of new
drugs, not to mention the foundations of starting on new forms of treatment.

1.3. The nature of scientific inference

The earliest natural philosophers based their work largely on


deductive arguments from axioms. The only criterion for a valid set of
axioms is that it should not be possible to deduce contradictory conclusions
from them, i.e. that the axioms should be consistent. Even if
this is so it has no bearing on whether or not the axioms are true.
Later it came to be thought that true knowledge of the world could only be
obtained by induction of general theories from particular observations,
and not, as had previously been assumed, by prediction of each particular
case from a general theory. The process of induction was early seen
to be subject to uncertainty,
but it was not until much later that these uncertainties were investigated
and attempts made to measure them. During the seventeenth century
the study of probability theory was started by Fermat and Pascal. This
was, and still is, a branch of mathematics, wholly deductive in nature.
Probability theory and experimental method grew up alongside
one another, but largely independently. One of the first attempts at a
synthesis came when astronomers
wanted to find out whether the stars
were scattered randomly or showed some sort of order. What was needed
was a measure of the probability of a hypothesis being true,
given some experimental observations relevant to it. For example:
(1) the hypothesis that the stars are randomly distributed; (2) the
hypothesis that morphine is a better analgesic than aspirin; or (3)
the hypothesis that state schools provide a better education than
private schools.
The use to which natural scientists wanted to put probability theory
was, it seems, of a quite different kind from that for which the theory
was designed. All that probability theory would answer were questions
such as: Given certain premises about the thorough shuffling of the pack
and the honesty of the players, what is the probability of drawing four
consecutive aces? This is a statement of the probability of making
some observations, given an hypothesis (that the cards are well shuffled,
and the players honest), a deductive statement of direct probability.
What was needed was a statement of the probability of the hypothesis,
given some observations, an inductive statement of inverse probability.
An answer to the problem was provided by the Rev. Thomas Bayes
in his Essay towards solving a Problem in the Doctrine of Chances, published
in 1763, two years after his death. Bayes' theorem states:

posterior probability of a hypothesis = constant × likelihood
of hypothesis × prior probability of the hypothesis.   (1.3.1)

In this equation prior (or a priori) probability means the probability
of the hypothesis being true before making the observations under
consideration, the posterior (or a posteriori) probability is the probability
after making the observations, and the likelihood of the hypothesis is
defined as the probability of making the given observations if the
hypothesis under consideration were in fact true. This technical
definition of likelihood will be encountered again later.
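As an arithmetical illustration of (1.3.1), the following Python sketch computes posterior probabilities for two mutually exclusive hypotheses. The prior probabilities and likelihoods are invented numbers, chosen only for illustration; the 'constant' in (1.3.1) is simply the factor that makes the posteriors add to one.

```python
# Sketch of Bayes' theorem (1.3.1) for mutually exclusive hypotheses.
# All numbers are hypothetical.

def posterior(priors, likelihoods):
    """Posterior probability of each hypothesis: proportional to
    likelihood x prior, normalized so the posteriors sum to one."""
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    constant = 1.0 / sum(unnormalized)  # the 'constant' of (1.3.1)
    return [u * constant for u in unnormalized]

priors = [0.9, 0.1]       # prior probabilities of hypotheses A and B
likelihoods = [0.2, 0.8]  # P(observations | hypothesis), for A and B

post = posterior(priors, likelihoods)
# posterior for A is 0.9*0.2 / (0.9*0.2 + 0.1*0.8) = 0.18/0.26
```

Note that although the prior for B was small, its large likelihood raises its posterior considerably; this is exactly the interplay described in the text.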
The wrangle about the interpretation of Bayes' theorem continues
to the present day. Is 'the probability of an hypothesis being true' a
meaningful idea? The great mathematician Laplace assumed that if
nothing were known of the merits of rival hypotheses then their prior
probabilities should be considered equal ('the equipartition of ignorance').
Later it was suggested that Bayes' theorem was not really
applicable except in a small proportion of cases in which valid prior
probabilities were known. This view is still probably the most common,
but there is now a strong school of thought that believes the only
sound method of inference is Bayesian. An uncontroversial use of
Bayes' theorem, in medical diagnosis, is mentioned in § 2.4.
Fortunately, in most, though not all, cases the practical results are
the same whatever viewpoint is adopted. If the prior probabilities of
several mutually exclusive hypotheses are known or assumed to be
equal then the hypothesis with the maximum posterior probability
will also be that with the maximum likelihood. In fact a popular
procedure is to ignore the prior probability altogether and to select the
hypothesis with the maximum likelihood. This procedure avoids
altogether the making of statements of inverse probability that many
people think to be invalid, but loses something in interpretability.
The probability considered is the probability of the observations calcu-
lated assuming the hypothesis in question to be true, a statement of
direct probability.
It has been argued strongly by Karl Popper that scientific inference
is a wholly deductive process. A hypothesis is framed by inspired guess-
work. It consequences are ded~ and then tested experimentally. This
is certainly just how things should be done. But, as A. J. Ayer points
out, the experiment is only useful if it is supposed that it will give the
same result when it is repeated, and the argument leading to this
supposition is the sort of inductive inference with which much of
statistics is concerned.

2. Fundamental operations and
definitions

'Considering how many fools can calculate, it is surprising that it should be
thought either a difficult or a tedious task for any other fool to learn how to
master the same tricks.'   SILVANUS P. THOMPSON

2.1. Functions and operators. A beautiful notation for adding up

Functional notation
IF the value of one variable, say y, depends on the value of another,
say x, then y is said to be a function of x. For example, the response to a
drug is a function of the dose. The usual algebraic way of saying this is
y = f(x), where f denotes the function. This equation is read 'y equals a
function of x'. If it is required to distinguish different functions of the
same variable then different symbols are chosen to represent each
function. For example, y1 = g(x), y2 = φ(x). If the function f were the
square root, g were the logarithm, and φ the tangent then the above
equations could be written in a less abstract form as y = √x, y1 = log x,
and y2 = tan x. This notation can be extended to several variables.
If the value of y depends on the value of two different variables, x1
and x2 say, this could be denoted y = f(x1, x2). An example of such a
function is y = x1² + x2.
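In a programming language the distinction between an operation and a number is enforced automatically: a function is an object that is applied to an argument, not a factor to multiply by. A Python sketch of the functions mentioned above (the names f, g, phi, and f2 are arbitrary, and the logarithm is taken here as the common logarithm):

```python
import math

# f, g, and phi are operations, not numbers: y = f(x) means
# 'apply f to x', not 'f times x'.
def f(x):
    return math.sqrt(x)      # f = square root

def g(x):
    return math.log10(x)     # g = (common) logarithm

def phi(x):
    return math.tan(x)       # phi = tangent

# A function of two variables, y = f(x1, x2) = x1**2 + x2:
def f2(x1, x2):
    return x1 ** 2 + x2

y = f(9.0)     # square root of 9 is 3
y2 = f2(3, 4)  # 3 squared plus 4 is 13
```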
Needless to say, the symbols f, g, and φ do not stand for numbers
and, for example, it is very important to distinguish y = f(x) from
'y equals f times x'. In the present case f, g, and φ stand for operations
carried out on the argument x in just the same way as the symbol
'+' stands for the operation of addition of the quantities on each side
of the plus sign, or the symbol d/dx stands for 'find the differential
coefficient with respect to x'.
In the following pages this operational notation is used frequently.
For example, s(x) will stand for 'the estimated standard deviation of
x' (not 's times x').† The square of the standard deviation is called the
† See § 2.6 for the definitions. Although it is commonly used, this is not really a
consistent use of the notation. The sample standard deviation, s(x), is not a function of
a single variable x, but of the whole set of x values making up the sample. And in the
case of the population standard deviation, σ(x), σ is really an operator on the probability
distribution of x (see Appendix 1).
variance. The variance of x is thus [s(x)]², which is usually written
s²(x). The situation may look even more confusing if it is wished to
denote the estimated standard deviation of a quantity like x1 − x2, i.e.
a measure of the scatter of the values of the quantity x1 − x2. Using the
notation given above this number would be written s(x1 − x2), but this
is not the same as s(x1) − s(x2); s is an operator, not a number. To add
to the difficulties it is quite common for s(x) or f(x) to be abbreviated
to s and f, the argument, x, being understood. So in this case s and f
do stand for numbers; the numbers s(x) and f(x). Brackets rather than
parentheses are sometimes used to make the notation clearer, so the
standard deviation of x1 − x2 is written s[x1 − x2].
Two important operators are those used to denote the formation of
sums and products, viz. Σ and Π (Greek capitals, sigma and pi). For
example, Σx means find the sum of all the values of x, and Πx means
find the product of all the values of x. These operations occur often
and are discussed in more detail below.

Factorial notation
Another operation that will occur in the following pages is written
n!, which is read as 'factorial n'. When n is an integer this has the
value n(n−1)(n−2)...1. For example, 4! = 4×3×2×1 = 24. A
more general definition (the gamma function) is valid also for non-integers
and occurs often in more advanced work than is dealt with
here. In the light of this more general definition, as well as for reasons of
convenience that will be apparent later, 0! (factorial zero) is defined as
having the value 1.

The use of the summation operator


The operation of adding up occurs very often. The arithmetic is
familiar, but the notation used may not be. In the following pages the
summation operator is used often. Frequently it is written with a full
panoply of superscripts and subscripts. This makes the operation
unambiguous, at the expense of looking a bit complicated. It is very
well worth while (for far wider reasons than merely understanding this
book) making sure that you can add up, so the temptation to skip this
section should be resisted. The use of the product operator Π is analogous,
+ being replaced by ×.
Given a set of observations, for example n replicate observations on
the same animal of the fall in blood pressure in response to a drug, an
observation can be denoted yi. This symbol stands for the ith fall in
blood pressure. There are n observations so in general an observation is
yi where i = 1, 2, ..., n. If n = 5, for example, then the five observations
are symbolized y1, y2, y3, y4, y5. Note that the subscript i has not necessarily
got any particular experimental significance. It is merely a method
for counting, or labelling, the individual observations.
The observations can be laid out in a table thus:

y1   y2   y3 ... yn.

Mathematicians would refer to such a table as a one-dimensional array
or a vector, but 'table' is a good enough name for now.
The instruction to add up all values of y from the first value to the
nth value is written

sum = Σ_{i=1}^{i=n} yi   or, more briefly,   Σ_{i=1}^{n} yi.

This expression symbolizes the number

sum = y1 + y2 + y3 + ... + yn.

Similarly,

Σ_{i=3}^{i=5} yi   stands for the number   y3 + y4 + y5.

Thus the arithmetic mean of n values of y is

ȳ = (Σ_{i=1}^{n} yi)/n.

Notice that after the summation operation, the counting, or subscripted,
variable, i, does not appear in the result.
A slightly more complicated situation arises when the observations
can be classified in two ways. For example, if n readings of blood
pressure (y) were taken on each of k animals the results would probably
be arranged in a table like this:

                         Animal (value of j)
                    1      2      3   ....   k

              1    y11    y12    y13  ....  y1k
observation   2    y21    y22    y23  ....  y2k
(value of i)  3    y31    y32    y33  ....  y3k
              n    yn1    yn2    yn3  ....  ynk

Two subscripts, say i and j, are now needed, one for keeping count of
the observations and one for the animals. i takes the values 1, 2, 3, ...,
n; and j takes the values 1, 2, 3, ..., k. The ith observation on the jth
animal is thus represented by the symbol yij. In more general terms,
yij stands for the value of y in the ith row and jth column of a table
(or two-dimensional array, or matrix) such as that shown above.
For example, a table with 3 columns and 4 rows could be written

y11  y12  y13                               2  4  3
y21  y22  y23     and a particular table    1  6  4
y31  y32  y33     of this size could be     5  7  6
y41  y42  y43                               8  4  5

In this case n = 4 and k = 3 and the n × k table contains nk = 12 observations.
The row and column totals and means can be represented by an
extension of the notation used above. For example the total of the
observations in the jth column, which may be called T.j for short,
would be written

T.j = Σ_{i=1}^{i=n} yij = y1j + y2j + y3j + ... + ynj.   (2.1.1)

Thus the total of the first column is T.1 = y11 + y21 + y31 + ... + yn1
(= 16 in the example). The mean of the readings in the jth column (the
mean fall in blood pressure in the jth animal in the example given
above), which is usually called ȳ.j, is thus

ȳ.j = (Σ_{i=1}^{i=n} yij)/n = T.j/n.   (2.1.2)

Again notice that after summing over the values of i (i.e. adding
up the numbers in a specified column) the answer does not involve i,
but does still involve the specified column number j. The symbol i, the
subscript operated on, is replaced by a dot in the symbols T.j and ȳ.j.
In an exactly similar way the total for the ith row, Ti., is written

Ti. = Σ_{j=1}^{j=k} yij = yi1 + yi2 + yi3 + ... + yik.   (2.1.3)

For example, for the second row T2. = y21 + y22 + ... + y2k (= 11 in
the example). The mean value for the ith row is

ȳi. = (Σ_{j=1}^{j=k} yij)/k = Ti./k.   (2.1.4)

Using the numbers in the 4 × 3 table above, the totals and means
are found to be

                Column number (value of j)       Row totals        Row means
                  1          2          3        Ti. = Σj yij      ȳi. = Ti./k

Row       1    y11 = 2    y12 = 4    y13 = 3     T1. = 9           ȳ1. = 9/3
number    2    y21 = 1    y22 = 6    y23 = 4     T2. = 11          ȳ2. = 11/3
(value    3    y31 = 5    y32 = 7    y33 = 6     T3. = 18          ȳ3. = 18/3
of i)     4    y41 = 8    y42 = 4    y43 = 5     T4. = 17          ȳ4. = 17/3

Column totals    T.1 = 16   T.2 = 21   T.3 = 18      Grand total
T.j = Σi yij                                         Σi Σj yij = 55

Column means     ȳ.1 = 16/4   ȳ.2 = 21/4   ȳ.3 = 18/4
ȳ.j = T.j/n

The grand total of all the observations in the table illustrates the
meaning of a double summation sign. The grand total (G, say†) could
be written as the sum of the row totals

G = Σ_{i=1}^{i=n} (Ti.).

Inserting the definition of Ti. from (2.1.3) gives

G = Σ_{i=1}^{i=n} (Σ_{j=1}^{j=k} yij).

Equally, the grand total could be written as the sum of the column
totals

G = Σ_{j=1}^{j=k} (T.j)

† It would be more consistent with earlier notation to replace both suffixes by dots
and call the grand total T.., but the symbol G is often used instead.

which, inserting the definition of T.j from (2.1.1), becomes

G = Σ_{j=1}^{j=k} (Σ_{i=1}^{i=n} yij).

Since the grand total is the same whichever order the additions are
carried out in, the parentheses are superfluous and the operation is
usually symbolized

G = Σ_{i=1}^{i=n} Σ_{j=1}^{j=k} yij   or   Σ_{j=1}^{j=k} Σ_{i=1}^{i=n} yij   or simply   ΣΣy.
What to do if you get stuck

If it is ever unclear how to manipulate the summation operator
simply write out the sum term by term and apply the ordinary rules of
algebra. For example, if k denotes a constant then

Σ_{i=1}^{n} kxi = k Σ_{i=1}^{n} xi   (2.1.5)

because the left-hand side, written out in full, is kx1 + kx2 + ... + kxn
= k(x1 + x2 + ... + xn), which is the right-hand side. Thus if k is the same
for every x it can be 'taken outside the summation sign'. However
Σ kixi, in which each x is multiplied by a different constant, is k1x1
+ k2x2 + ... + knxn, which cannot be further simplified.
It follows from what has been said that if the quantities to be added
do not contain the subscript then the summation becomes a simple
multiplication. If all the xi = 1 in (2.1.5) then

Σ_{i=1}^{i=n} k = k + k + ... + k = nk   (2.1.6)

and furthermore, if k = 1,

Σ_{i=1}^{n} 1 = n.   (2.1.7)

Another useful result is

Σ_{i=1}^{n} (xi − yi) = (x1 − y1) + (x2 − y2) + ... + (xn − yn)
                     = Σ_{i=1}^{n} xi − Σ_{i=1}^{n} yi.   (2.1.8)
These results will be used often in later sections.
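These rules can also be verified numerically. The Python sketch below checks (2.1.5) through (2.1.8) on a small set of invented numbers.

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 1.5, 2.5, 3.5]
k = 7.0
n = len(x)

# (2.1.5): the sum of k*xi equals k times the sum of xi.
lhs_5 = sum(k * xi for xi in x)
rhs_5 = k * sum(x)

# (2.1.6): summing the constant k over n terms gives n*k.
lhs_6 = sum(k for _ in range(n))

# (2.1.7): summing 1 over n terms simply counts the terms.
lhs_7 = sum(1 for _ in range(n))

# (2.1.8): the sum of differences is the difference of the sums.
lhs_8 = sum(xi - yi for xi, yi in zip(x, y))
rhs_8 = sum(x) - sum(y)
```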


2.2. Probability
The only rigorous definition of probability is a set of axioms defining
its properties, but the following discussion will be limited to the less
rigorous level that is usual among experimenters. For practical purposes
the probability of an event is a number between zero (implying
impossibility) and one (implying certainty). Although statisticians
differ in the way they define and interpret probability, there is complete
agreement about the rules of probability described in § 2.4. In most of
this book probability will be interpreted as a proportion or relative
frequency. An excellent discussion of the subject can be found in
Lindley (1965, Chapter 1).
The simplest way of defining probability is as a proportion, viz.
'the ratio of the number of favourable cases to the total number of
equiprobable cases'. This may be thought unsatisfactory because the
concept to be defined is introduced as part of its own definition by the
word 'equiprobable', though a non-numerical ordering of likeliness
more primitive than probability would be sufficient to define 'equally
likely', and hence 'random'. Nevertheless when the reference set of
'the total number of equiprobable cases' is finite this description is
used and accepted in practice. For example if 55 per cent of the popula-
tion of college students were male it would be asserted that the prob-
ability of a single individual chosen from this finite population being
male is 0·55, provided that the probability of being chosen was the
same for all individuals, i.e. provided that the choice was made at
random.
When the reference population is infinite the ratio just discussed
cannot be used. In this case the frequency definition of probability is
more useful. This identifies the probability P of an event as the limiting
value of the relative frequency of the event in a random sequence of
trials when the number of trials becomes very large (tends towards
infinity). For example, if an unbiased coin is tossed ten times it would
not be expected that there would be exactly five heads. If it were tossed
100 times the proportion of heads would be expected to be rather
closer to 0·5 and as the number of tosses was extended indefinitely the
proportion of heads would be expected to converge on exactly 0·5.
This type of definition seems reasonable, and is often invoked in
practice, but again it is by no means satisfactory as a complete,
objective definition. A random sequence cannot be proved to converge
in the mathematical sense (and in fact any outcome of tossing a true
coin a million times is possible), but it can be shown to converge in a
statistical sense.
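The settling down of the relative frequency can be illustrated by simulation. The Python sketch below uses a seeded pseudo-random generator as a stand-in for a real coin; it is an illustration of the tendency described above, not a proof of convergence.

```python
import random

def proportion_heads(n_tosses, seed=0):
    """Relative frequency of heads in n_tosses simulated tosses
    of a fair coin (pseudo-random, with a fixed seed)."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The proportion is rarely exactly 0.5 in a short run, but it is
# expected to lie close to 0.5 when the run is long:
short_run = proportion_heads(10)
long_run = proportion_heads(100000)
```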

Degrees of belief

It can be argued persuasively (e.g. Lindley (1965, p. 29)) that it is
valid and sometimes necessary to use a subjective definition of probability
as a numerical measure of one's degree of belief or strength of
conviction in a hypothesis ('personal probability'). This is required in
many applications of Bayes' theorem, which is mentioned in §§ 1.3 and
2.4 (see also § 6.1, para. (7)). However the application of Bayes' theorem
to medical diagnosis (§ 2.4) does not involve subjective probabilities,
but only frequencies.
2.3. Randomization and random sampling
The selection of random samples from the population under study
is the basis of the design of experiments, yet is an extraordinarily
difficult job. Any sort of statistical analysis (and any sort of intuitive
analysis) of observations depends on random selection and allocation
having been properly done. The very fundamental place of randomization
is particularly obvious in the randomization significance tests
described in Chapters 8-11.
It should never be out of mind that all calculations (and all intuitive
assessments) belong to an entirely imaginary world of perfect random
selection, unbiased measurement, and often many other ideal properties
(see § 11.2). The assumption that the real world resembles this imaginary
one is an extrapolation outside the scope of statistics or mathematics.
As mentioned in Chapter 1 it is safer to assume that samples
have some unknown bias.
For example, an anti-diabetic drug should ideally be tested on a
random sample of all diabetics in the world, or perhaps of all diabetics
in the world with a specified form and severity of the disease,
or all diabetics in countries where the drug is available. In fact, what is
likely to be available are the diabetic patients of one, or a few,
hospitals in one country. Selection should be done strictly at random
(see below) from this restricted population, but extension of inferences
from this population to a larger one is bound to be biased to an unknown
extent.
It is, however, quite easy, having obtained a sample, to divide it
strictly randomly (see below) into several groups (e.g. groups to receive
new drug, old drug, and control dummy drug). This is, nevertheless,
very often not done properly. The hospital numbers of the patients will
not do, and neither will their order of appearance at a clinic. It is very
important to realize that 'random' is not the same thing as 'haphazard'.
If two treatments are to be compared on a group of patients it is not
good enough for the experimenter, or even a neutral person, to allocate
a patient haphazardly to a treatment group. It has been shown re-
peatedly that any method involving human decisions is non-random.
For all practical purposes the following interpretation of randomness,
given by R. A. Fisher (1951, p. 11), should be taken as a fundamental
principle of experimentation: ' . . . not determined arbitrarily by
human choice, but by the actual manipulation of the physical apparatus
used in games of chance, cards, dice, roulettes, etc., or, more ex-
peditiously, from a published collection of random sampling numbers
purporting to give the actual results of such manipulation.'
Published random sampling numbers are, in practice, the only
reliable method. Samples selected in this way (see below) will be re-
ferred to as selected strictly at random. Superb discussions of the crucial
importance of, and the pitfalls involved in, random sampling have been
given by Fisher (1951, especially Chapters 2 and 3) and by Mainland
(1963, especially Chapters 1-7). Every experimenter should have read
these. They cannot be improved upon here.

How to select samples strictly at random using random number tables


This is, perhaps, the most important part of the book. There are
various ways of using random number tables (see, for example, the
introduction to the tables of Fisher and Yates (1963)). Two sorts of
tables are commonly encountered, and those of Fisher and Yates (1963)
will be used as examples. The first is a table of random digits in which
the digits from 0 to 9 occur in random order. The digits are usually
printed in groups of two to make them easier to read, but they can be
taken as single digits or as two, three, etc. digit numbers. If taken in
groups of three the integers from 000 to 999 will occur in random order
in the tables. The second form of table is the table of random permuta-
tions. Fisher and Yates (1963) give random permutations of 10 and 20
integers. In the former the integers from 0 to 9, and in the latter the
integers from 0 to 19, occur in random order, but each number appears
once only in each permutation.
To divide a group of subjects into several sub-groups strictly at
random the easiest method is to use the tables of random permutations,
as long as the total number of subjects is not more than 20 (or whatever
is the largest size of random permutation available). Suppose that 15
subjects are to be divided into group of size "1' ~, and na. First number
the subjects 0 to 14 in any convenient way. Then obtain a random
permutation of 15 by taking the first random permutation of 20
from the tables and deleting the numbers 15 to 19. (This permutation in
the table should then be crossed out so that it is not used again; use
each one once only.) Then allocate the first n1 of the subjects to the first
group, the next n2 to the second group, and the remainder to the third
group. For example, if the random permutation of 15 turned out to be
1, 6, 8, 5, 10, 12, 11,9, 2, 0, 3, 14, 7,4, 13 (the first permutation from
Fisher and Yates (1963), p. 142)) and the 15 subjects were to be divided
randomly into groups of 5, 4, and 6 subjects then subjects 1, 6, 8, 5, and
10 would go in the first group, 12, 11, 9, and 2 in the second group, and
the rest in the third group.
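With a computer to hand, the same strictly random allocation can be sketched with a pseudo-random shuffle standing in for a page of the printed tables (a sketch only; the function name and the use of Python's `random` module are my own illustration):

```python
import random

def allocate(n_subjects, group_sizes, seed=None):
    """Divide subjects numbered 0 .. n_subjects-1 into groups of the
    stated sizes by taking a single random permutation, exactly as with
    the printed tables of random permutations."""
    assert sum(group_sizes) == n_subjects
    subjects = list(range(n_subjects))
    random.Random(seed).shuffle(subjects)   # one random permutation
    groups, start = [], 0
    for size in group_sizes:
        groups.append(subjects[start:start + size])
        start += size
    return groups

# 15 subjects divided strictly at random into groups of 5, 4, and 6
groups = allocate(15, [5, 4, 6], seed=1)
```

The seed plays the role of choosing which tabulated permutation to use; in a real experiment it would not be chosen to suit the result.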
For larger numbers of subjeots the tables of random digits must be
used. For example, to divide 24 subjects into 4 groups of 6 the procedure
is as follows. First number the subjects in any convenient way with the
numbers 00 to 23. Take the digits in the table in groups of two. The
table then gives the integers from 00 to 99 in random order. One
procedure would be to delete all numbers from 24 to 99, but it is more
economical to delete only 96, 97, 98, and 99 (i.e. those equal to or larger
than 96, which is the biggest multiple of 24 that is not larger than 100).
Now the remaining numbers are a random sequence of the integers
from 00 to 95. From each number between 24 and 47 subtract 24; from
each number between 48 and 71 subtract 48; and from each number
between 72 and 95 subtract 72 (or, in other words, divide every number
in the sequence by 24 and write down the remainder). For example, if
the number in the table is 94 then write down 22; or in place of 55
write down 07. (The numbers from 96 to 99 must, of course, be omitted
because their presence would give the numbers 00 to 03 a larger chance
than the others of occurring.) Some numbers may appear several times
but repetitions are ignored. If the final sequence were 21, 04, 07, 13, 02,
02, 04, 09, 00, 23, 14, 13, 11, etc., then subjects 21, 04, 07, 13, 02, 09 are
allocated to the first group, subjects 00, 23, 14, 11, etc. are allocated to
the second group, and so on.
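The digit-table procedure is easy to mechanize. In this sketch (the function name is illustrative) a string of digits stands for the published table, and the steps are exactly those above: read two-digit numbers, discard 96 to 99, take the remainder on division by 24, and ignore repetitions:

```python
def allocate_by_digits(digits, n_subjects=24, n_groups=4):
    """Allocate subjects 00 .. n_subjects-1 to equal groups using a
    stream of random digits, as read from a printed table."""
    limit = (100 // n_subjects) * n_subjects   # 96: largest multiple of 24 below 100
    seen, order = set(), []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if number >= limit:
            continue                  # 96-99 omitted to avoid biasing 00-03
        subject = number % n_subjects # the remainder on division by 24
        if subject not in seen:       # repetitions are ignored
            seen.add(subject)
            order.append(subject)
        if len(order) == n_subjects:
            break
    size = n_subjects // n_groups
    return [order[g * size:(g + 1) * size] for g in range(n_groups)]

# The sequence from the text (96 is discarded, repeats are ignored),
# padded with the pairs 00-23 so that every subject eventually appears
digits = "9621040713020204090023141311" + "".join("%02d" % i for i in range(24))
groups = allocate_by_digits(digits)   # groups[0] is [21, 4, 7, 13, 2, 9]
```

The first group reproduces the allocation in the text: subjects 21, 04, 07, 13, 02, and 09.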
The method is simpler for the random block experiments described
in §§ 11.6 and 11.7. Blocks are never likely to contain more than 20
treatments so the order in which the treatments occur in each block is
taken from a random permutation found from the tables of random
permutations as above. For example, if there are four treatments in

each block number them 0 to 3, and for each block obtain a random
permutation of the numbers 0 to 3 by deleting 4 to 9 from the tabulated
random permutations of 10, crossing out each permutation from the
table as it is used.
The selection of a Latin square at random is more complicated
(see § 11.8).

2.4. Three rules of probability


The words and and or are printed in bold type when they are being
used in a restricted logical sense. For our purposes E1 and E2 means
that both the event E1 and the event E2 occur, and E1 or E2 means that
either E1 or E2 or both occur (in general, that at least one of several
events oocurs). More explanation and details of the following rules can
be found, for example, in Mood and Graybill (1963), Brownlee (1965),
or Lindley (1965).

(1) The addition rule of probability


This states that the probability of either or both of two events, E1
and E2, occurring is

P[E1 or E2] = P[E1] + P[E2] − P[E1 and E2].   (2.4.1)

If the events are mutually exclusive, i.e. if the probability of E1 and
E2 both occurring is zero, P[E1 and E2] = 0, then the rule reduces to

P[E1 or E2] = P[E1] + P[E2],   (2.4.2)
the sum of probabilities. Thus, if the probability that a drug will
increase the blood pressure is 0·9 and the probability that it will have
no effect or decrease it is 0·1 then the probability that it will either
(a) increase the blood pressure or (b) have no effect or decrease it is,
since the events are mutually exclusive, simply 0·9 + 0·1 = 1·0.
Because the events considered are exhaustive, the probabilities add up
to 1·0. It is certain that one of them will occur. That is
P[E occurs] = 1 − P[E does not occur].   (2.4.3)
This example suggests that the rule can be extended to more than two
events. For example the probability that the blood pressure will not
change might be 0·04 and the probability that it will decrease might be
0·06. Thus
P[no change or decrease] = 0·04 + 0·06 = 0·1 as before,
P[no change or increase] = 0·04 + 0·9 = 0·94,
P[no change or decrease or increase] = 0·04 + 0·06 + 0·9 = 1·0.

The simple addition rule holds because the events considered are
mutually exclusive. In the last case only, they are also exhaustive. An
example of the use of the full equation (2.4.1) is given below.

(2) The multiplication rule of probability


It is possible that the probability of event E1 happening depends on
whether E2 has happened or not. The conditional probability of E1
happening given that E2 has happened is written P[E1|E2], which is
usually read 'probability of E1 given E2'.
The probability that both E1 and E2 will happen is

P[E1 and E2] = P[E1].P[E2|E1] = P[E2].P[E1|E2].   (2.4.4)
If the events are independent in the probability sense (different
from functional independence) then, by definition of independence,

P[E1|E2] = P[E1] and similarly P[E2|E1] = P[E2],   (2.4.5)

so the multiplication rule reduces to

P[E1 and E2] = P[E1].P[E2],   (2.4.6)

the product of the separate probabilities. Events obeying (2.4.5) are
said to be independent. Independent events are necessarily uncorrelated
but the converse is not necessarily true (see § 12.9).

A numerical illustration of the probability rules. If the probability
that a British college student smokes is 0·3, and the probability that
the student attends London University is 0·01, then the probability
that a student, selected at random from the population of British
college students, is both a smoker and attends London University can
be found from (2.4.6) as†

P[smoker and London student] = P[smoker] × P[London student]
= 0·3 × 0·01 = 0·003

as long as smoking and attendance at London University are independent
so that, from (2.4.5),

P[smoker] = P[smoker|London student]
† Notice that P[smoker] could be written as P[smoker|British college student].
All probabilities are really conditional (on membership of a specified population). See,
for example, Lindley (1965).

or, equivalently,
P[London student] = P[London student|smoker].

The first of these conditions of independence can be interpreted in


words as 'the probability of a student being a smoker equals the
probability that a student is a smoker given that he attends London
University', that is to say 'the proportion of smoking students in the
whole population of British college students is the same as the propor-
tion of smoking students at London University', which in turn implies
that the proportion of smoking students is the same at London as at
any other British University (which is, no doubt, not true).
Because smoking and attendance at London University are not
mutually exclusive the full form of the addition rule, (2.4.1), must be
used. This gives

P[smoker or London student] = 0·3 + 0·01 − (0·3 × 0·01) = 0·307.

The meaning of this can be made clear by considering random samples,


each of 1000 British college students. On the average there would be
300 smokers and 10 London students in each sample of 1000. There
would be 3 students (1000 × 0·003) who were both smokers and London
students if the implausible condition of independence were met (see
above). Therefore there would be 297 students (300-3) who smoked but
were not from London, and 7 students (10-3) who were from London
but did not smoke. Therefore the number of students who either
smoked (but did not come from London), or came from London (but
did not smoke), or both came from London and smoked, would be
297+7+3 = 307, as calculated (1000 × 0·307) from (2.4.1).
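The arithmetic of this illustration can be checked directly (a sketch; the variable names are mine):

```python
p_smoker, p_london = 0.3, 0.01

# Multiplication rule (2.4.6), valid only if the events are independent
p_both = p_smoker * p_london                 # 0.003

# Full addition rule (2.4.1), needed because the events are not
# mutually exclusive
p_either = p_smoker + p_london - p_both      # 0.307

# Average composition of a random sample of 1000 students
n = 1000
both = n * p_both                            # 3 smoke and attend London
smoker_only = n * p_smoker - both            # 297
london_only = n * p_london - both            # 7
```

The three averages, 297 + 7 + 3, recover the 307 students counted in the text.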

(3) Bayes' theorem, illustrated by the problem of medical diagnosis


Bayes' theorem has already been given in words as (1.3.1) (see
§ 1.3). The theorem applies to any series of events H_j, and is a simple
consequence of the rules of probability already stated (see, for example,
Lindley (1965, p. 19 et seq.)). The interesting applications arise when the
events considered are hypotheses. If the jth hypothesis is denoted
H_j and the observations are denoted Y then (1.3.1) can be written
symbolically as

P[H_j|Y] = k.P[Y|H_j].P[H_j]   (2.4.7)

(posterior probability of hypothesis = k × likelihood of hypothesis × prior probability of hypothesis),

where k is a proportionality constant. If the set of hypotheses con-
sidered is exhaustive (one of them must be true), and the hypotheses are
mutually exclusive (not more than one can be true), the addition
rule states that the probability of (hypothesis 1) or (hypothesis 2) or
. . . (which must be equal to one, because one or another of the
hypotheses is true) is given by the total of the individual probabilities.
This allows the proportionality constant in (2.4.7) to be found. Thus
ΣP[H_j|Y] = kΣ(P[Y|H_j].P[H_j]) = 1, the sums being over all j, and therefore

k = 1/Σ(P[Y|H_j].P[H_j]).   (2.4.8)

Bayes' theorem has been used in medical diagnosis. This is an un-


controversial application of the theorem because all the probabilities
can be interpreted as proportions. Subjective, or personal, probabilities
are not needed in this case (see § 2.2).
If a patient has a set of symptoms S (the observations) then the
probability that he is suffering from disease D (the hypothesis) is,
from (2.4.7),
P[D|S] = k × P[S|D] × P[D].   (2.4.9)
In this equation the prior probability of a patient having disease D,
P[D], is found from the proportion of patients with this disease in the
hospital records. In principle the likelihood of D, i.e. the probability
of observing the set of symptoms S if a patient in fact has disease D,
P[S|D], could also be found from records of the proportion of patients
with D showing the particular set of symptoms observed. However, if
a realistic number of possible symptoms is considered the number of
different possible sets of symptoms will be vast and the records are not
likely to be extensive enough for P[S|D] to be found in this way. This
difficulty has been avoided by assuming that symptoms are independent
of each other so that the simple multiplication rule (2.4.6) can be
applied to find P[S|D] as the product of the separate probabilities of
patients with D having each individual symptom, i.e.

P[S|D] = P[s1|D] × P[s2|D] × ... × P[sn|D],   (2.4.10)

where S stands for the set of n symptoms (s1 and s2 and ... and sn)
and P[s1|D], for example, is found from the records as the proportion
of patients with disease D who have symptom 1. Although the
assumption of independence is very implausible this method seems

to have given some good results (see, for example, Bailey (1967,
Chapter 11)).

A numerical example
The simplest (to the point of naivety) example of the above argument
is the case when only one disease and one symptom is considered.
The example is modified from Wallis and Roberts (1956).
Suppose that a diagnostic test for cancer has a probability of 0·96
of being positive when the patient does have cancer. If S stands for the
event that the test is positive and S̄ for the event that it is negative
(the data), and if D stands for the event that the patient has cancer,
and D̄ for the event that he has not (the two hypotheses), then in
symbols P[S|D] = 0·96 (the likelihood of D if S is observed). Because
the test is either positive or not, a slight extension of (2.4.3) gives
P[S̄|D] = 1 − P[S|D] = 0·04 (the likelihood of D if S̄ is observed). The
proportion of patients with cancer giving a negative test (false negatives)
is 4 per cent. Suppose also that 95 per cent of patients without
cancer give a negative test, P[S̄|D̄] = 0·95. Similarly P[S|D̄] = 1 − 0·95
= 0·05, i.e. 5 per cent of patients without cancer give a positive test
(false positives). As diagnostic tests go, these proportions of false
results are not outrageous. But now consider what happens if the test
is applied to a population of patients of whom 1 in 200 (0·5 per cent)
suffer from cancer, i.e. P[D] = 0·005 (the prior probability of D) and
P[D̄] = 1 − 0·005 = 0·995 (from (2.4.3) again). What is the probability
that a patient reacting positively to the test actually has cancer?
In symbols this is P[D|S], the posterior probability of D after observing
S, and from (2.4.7) or (2.4.9), and (2.4.8) it is, using the probabilities
assumed above,
P[D|S] = P[S|D].P[D] / (P[S|D].P[D] + P[S|D̄].P[D̄])
       = (0·96 × 0·005) / ((0·96 × 0·005) + (0·05 × 0·995))
       = 0·0048 / (0·0048 + 0·04975)
       = 0·0048 / 0·05455
       = 0·0880.   (2.4.11)

In other words, only 8·80 per cent of positive reactors actually have
cancer, and 100 − 8·80 = 91·2 per cent do not have cancer. Not such
a good performance. It remains true that 96 per cent of those with
cancer are detected by the test, but a great many more without cancer
also give positive tests.
It is easy to see how this arises without the formality of Bayes'
theorem. Suppose that 100 000 patients are tested. On average 500
(= 100 000 × 0·005) will have cancer and 99 500 will not have cancer.
Of the 500 with cancer, 500 × 0·96 = 480 will give positive reactions on
average. Of the 99 500 without cancer, 99 500 × 0·05 = 4975 will give
positive reactions on average (a much smaller proportion, but a much
larger number than for the patients with cancer). Of the total number
of positive reactors, 480 + 4975 = 5455, the number with cancer is
480 and the proportion with cancer is 480/5455 = 0·0880 as above.
If these numbers are divided by the total number of patients, 100 000,
they are seen to coincide with the probabilities calculated by Bayes'
theorem in (2.4.11).
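Both routes to the answer, Bayes' theorem with the normalization (2.4.8) and the direct count of expected cases, can be sketched together (the function `posterior` is my own illustration, not from the text):

```python
def posterior(priors, likelihoods):
    """Bayes' theorem (2.4.7): posteriors proportional to
    likelihood * prior, with k fixed by the normalization (2.4.8)."""
    products = [lik * pri for lik, pri in zip(likelihoods, priors)]
    k = 1.0 / sum(products)
    return [k * p for p in products]

# Hypotheses (cancer, no cancer); datum S = a positive test
p_d_given_s = posterior(priors=[0.005, 0.995],
                        likelihoods=[0.96, 0.05])[0]   # about 0.0880

# The same result from expected counts in 100 000 patients
with_cancer = 100_000 * 0.005                          # 500
true_pos = with_cancer * 0.96                          # 480
false_pos = (100_000 - with_cancer) * 0.05             # 4975
proportion = true_pos / (true_pos + false_pos)         # 480/5455
```

Only about 8·8 per cent of positive reactors have cancer, however the calculation is arranged.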

2.5. Averages
If a number of replicate observations is made of a variable quantity
it is commonly found that the observations tend to cluster round some
central value. Some sort of average of the observations is taken as an
estimate of the true or population value (see § 1.2) of the quantity that
is being measured. Some of the possible sorts of average will be defined
now. It can be seen that there is no logical reason for the automatic
use of the ordinary unweighted arithmetic mean, (2.5.2). If the distri-
bution of the observations is not symmetrical it may be quite inappro-
priate, and nonparametric methods usually use the median (see §§
4.5,6.2, and 7.3 and Chapters 9, 10, and 14).

The arithmetic mean


The general form is the weighted arithmetic sample mean (using the
notation described in § 2.1),

x̄ = Σw_i x_i / Σw_i.   (2.5.1)

This provides an estimate, from a sample of observations, of the


unknown population mean value of x (as long as the sample was taken
strictly at random, see § 2.3). The population mean is the mean of all

the values of x in the population from which the sample was taken and
will be denoted μ (see § A1.1 for a more rigorous definition).

The weight of an observation. The weight, w_i, associated with the ith observation,
x_i, is a measure of the relative importance of the observation in the final
result. Usually the weight is taken as the reciprocal of the variance (see § 2.6 and
(2.7.12)), so the observations with the smallest scatter are given the greatest
weight. If the observations are uncorrelated, this procedure gives the best
estimate of the population mean, i.e. an unbiased estimate with minimum variance
(maximum precision). (See §§ 12.1 and 12.7, Appendix 1, and Brownlee (1965,
p. 95).) From (2.7.8) it is seen that halving the variance is equivalent to doubling
the number of observations. Both double the weight. See § 13.4 for an example.
Weights may also be arbitrarily decided degrees of relative importance. For
example, if it is decided in examination marking that the mark for an essay
paper should have twice the importance of the marks for the practical and oral
examinations, a weighted average mark could be found by assigning the essay
mark (say 70 per cent) a weight of 2 and the practical and oral marks (say
30 and 40 per cent) a weight of one each. Thus

x̄ = ((2×70) + (1×30) + (1×40)) / (2+1+1) = 52·5.

If the weights had been chosen as 1, 0·5, and 0·5 the result would have been
exactly the same.
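A sketch of (2.5.1) applied to the examination marks (the function name is illustrative):

```python
def weighted_mean(x, w):
    """Weighted arithmetic mean (2.5.1): sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

marks = [70, 30, 40]
print(weighted_mean(marks, [2, 1, 1]))      # 52.5
# Only the relative weights matter: halving every weight changes nothing
print(weighted_mean(marks, [1, 0.5, 0.5]))  # 52.5
```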

The definition of the weighted mean has the following properties.


(a) If all the weights are the same, say w_i = w, then the ordinary
unweighted arithmetic mean is found (using (2.1.5) and (2.1.6)):

x̄ = Σwx_i/Σw = wΣx_i/(Nw) = Σx_i/N.   (2.5.2)

In the above example the unweighted mean is Σx_i/N = (70+30+40)/3
= 46·7.
(b) If all the observations (values of x_i) are the same, then x̄ has this
value whatever the weights.
(c) If one value of x has a very large weight compared with the
others then x̄ approaches that value, and conversely if its weight is zero
an observation is ignored.

The geometric mean


The unweighted geometric mean of N observations is defined as the
Nth root of their product (cf. the arithmetic mean, which is the Nth part of
their sum):

x̄ = (∏ x_i)^{1/N}, the product being taken over i = 1 to N.   (2.5.3)

It will now be shown that this is the sort of mean found when the
arithmetic mean of the logarithms of the observations is calculated
(as, for example, in § 12.6), and the antilog of the result found.
Call the original observations z_i, and their logarithms x_i, so
x_i = log z_i. Then

arithmetic mean of log z = x̄ = Σx_i/N = Σ(log z_i)/N
= log(∏ z_i)/N, because the sum of the logs is the log of the product,
= log{(∏ z_i)^{1/N}}
= log(geometric mean of z)

or, taking antilogs,

antilog (arithmetic mean of log z) = geometric mean of z.   (2.5.4)

This relationship is the usual way of calculating the geometric mean.


For the figures used above the unweighted geometric mean is the cube
root of their product, (70 × 30 × 40)^{1/3} = 43·8. The geometric mean
of a set of figures is always less than their arithmetic mean, as in this
case. If even a single observation is zero, the geometric mean will be
zero.
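Relation (2.5.4) is how the geometric mean is computed in practice; a sketch:

```python
import math

def geometric_mean(x):
    """Antilog of the arithmetic mean of the logs, i.e. (2.5.4)."""
    return math.exp(sum(math.log(v) for v in x) / len(x))

g = geometric_mean([70, 30, 40])   # cube root of 84 000, about 43.8
a = (70 + 30 + 40) / 3             # arithmetic mean, about 46.7
```

As the text notes, g is less than a here; the two coincide only when every observation is the same.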

The median
The population (or true) median is the value of the variable such that
half the values in the population fall below it and half above it (i.e.
it is the value bisecting the area under the distribution curve, see
Chapters 3 and 4). It is not necessarily the same as the population mean
(see § 4.5). The population median is estimated by the

sample median = central observation. (2.5.5)

This is uniquely defined if the number of observations is odd. The


median of the 5 observations I, 4, 9, 7, 6, is seen, when they are ranked
in order of increasing size giving, I, 4, 6, 7, 9, to be 6. If there is an even
number of observations the sample median is taken half-way between
two central observations; for example the sample median of 1,4,6,7,
9, 12 is 6½.
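The two cases can be checked with the standard library's `statistics.median`, which follows exactly this rule:

```python
import statistics

# Odd number of observations: the central ranked value
assert statistics.median([1, 4, 9, 7, 6]) == 6

# Even number: half-way between the two central ranked values
assert statistics.median([1, 4, 6, 7, 9, 12]) == 6.5
```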

The mode
The sample mode is the most frequently observed value of a variable.
The population mode is the value corresponding to the peak of the
population distribution curve (Chapters 3 and 4). It may be different
from the mean and the median (see § 4.5).

The mean as a least squares estimate

This section anticipates the discussion of estimation in Chapter 12 and
§ A1.3. The arithmetic sample mean is said to be a least squares estimate
(see Chapter 12) because it is the value that best represents the sample in the
sense that the sum of the squares of the deviations of the observations from the
arithmetic mean, Σ(x_i − x̄)², is smaller than the sum of the squares of the deviations
from any other value. This can be shown without using calculus as follows.
Suppose, as above, that the sample consists of N observations, x_1, x_2, ..., x_N. It
is required to find a value of m that makes Σ(x_i − m)² as small as possible. This
follows immediately from the algebraic identity

Σ(x_i − m)² = Σ(x_i − x̄)² + N(x̄ − m)².   (2.5.6)

Whatever the values of m and x_i, this is clearly smallest when m is the arithmetic
mean, x̄, because this makes the second term, which cannot be negative, zero,
leaving the sum of squares as small as possible. For the example
following (2.5.2),

Σ(x_i − x̄)² = (70 − 46·7)² + (30 − 46·7)² + (40 − 46·7)² = 866·7,

and a few trials will show that inserting any value other than 46·7 makes the sum
of squares larger than 866·7.
The intermediate steps in establishing (2.5.6) are easy. By definition of the
arithmetic mean Nx̄ = Σx_i, so the right-hand side of (2.5.6) can be written,
completing the squares (2.1.6), as

Σx_i² − 2x̄Σx_i + Nx̄² + N(x̄² − 2x̄m + m²)
= Σx_i² − 2Nx̄² + 2Nx̄² − 2Nx̄m + Nm²
= Σx_i² − 2mΣx_i + Nm²
= Σ(x_i − m)²,
as stated in (2.5.6).
Using calculus the same result can be reached more elegantly. The usual way of
finding a minimum in calculus is to differentiate and equate the result to zero
(see Thompson (1965, p. 78)). This process is described in detail, and illustrated,
in Chapter 12. In this case Σ(x_i − m)² is to be minimized with respect to m.
Differentiating Σ(x_i − m)² = Σx_i² − 2mΣx_i + Nm², remembering that the x_i are
constants for a given sample, and equating to zero gives

dΣ(x_i − m)²/dm = −2Σx_i + 2Nm = 0,   (2.5.7)

and therefore 2Nm = 2Σx_i,

and m = Σx_i/N = x̄

as found above.
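Both the identity (2.5.6) and the minimizing property can be checked numerically; a sketch using the marks from the earlier example:

```python
x = [70, 30, 40]
N = len(x)
xbar = sum(x) / N                  # 46.7, to one decimal

def ss_about(m):
    """Sum of squared deviations of the observations about m."""
    return sum((xi - m) ** 2 for xi in x)

# The identity (2.5.6): ss_about(m) = ss_about(xbar) + N*(xbar - m)**2
for m in (40, 45, 50):
    assert abs(ss_about(m) - (ss_about(xbar) + N * (xbar - m) ** 2)) < 1e-9

# Hence the minimum (about 866.7) is at m = xbar
assert all(ss_about(xbar) <= ss_about(m) for m in (40, 45, 46, 47, 50))
```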

2.6. Measures of the variability of observations
When replicate observations are made of a variable quantity the
scatter of the observations, or the extent to which they differ from
each other, may be large or may be small. It is useful to have some
quantitative measure of this scatter. See Chapter 7 for a discussion
of the way this should be done. As there are many sorts of average
(or 'measures of location'), so there are many measures of scatter.
Again separate symbols will be used to distinguish the estimates of
quantities calculated from (more or less small) samples of observations
from the true values of these quantities, which could only be found if
the whole population of possible observations were available.

The range
The difference between the largest and smallest observations is the
simplest measure of scatter but it will not be used in this book.

The mean deviation


If the deviation of each observation from the mean of all observations
is measured, then the sum of these deviations is easily proved (and this
result will be needed later on) to be always zero. For example, consider
the figures 5, 1, 2, and 4 with mean = 3. The deviations from the
mean are respectively +2, −2, −1, and +1 so the total deviation is
zero. In general (using (2.1.6), (2.1.8), and (2.5.2)),

Σ(x_i − x̄) = Σx_i − Nx̄ = Nx̄ − Nx̄ = 0.   (2.6.1)

If, however, the deviations are all taken as positive their sum (or mean)
is a measure of scatter.

The standard deviation and variance


The standard deviation is also known, more descriptively, as the
root mean square deviation. The population (or true) value will be
denoted σ(x). It is defined more exactly in § A1.2. The estimate of the
population value calculated from a more or less small sample of, say,
N observations, the sample standard deviation, will be denoted s(x).
The square of this quantity is the estimated (or sample) variance (or
mean square deviation) of x, var(x) or s²(x). The population (or true)

variance will be denoted Var(x) or σ²(x). The estimates are calculated as

s(x) = √[Σ(x_i − x̄)²/(N − 1)].   (2.6.2)

The standard deviation and variance are said to have N − 1 degrees of
freedom. In calculating the mean value of (x_i − x̄)², N − 1, rather than N,
is used because the sample mean x̄ has been used in place of the population
mean μ. This would tend to make the estimate too small if N were
used (the deviations of the observations from μ will tend to be larger
than the deviations from x̄; this can be seen by putting m = μ in
(2.5.6)). It is not difficult to show that the use of N − 1 corrects this
tendency (see § A1.3).† It also shows that no information about scatter
can be obtained from a single observation if μ is unknown (in this case
the number of degrees of freedom, on which the accuracy of the estimate
of the scatter depends, will be N − 1 = 0). If μ were known even a
single observation could give information about scatter, and, as
expected from the foregoing remarks, the estimated variance would be
a straightforward mean square deviation using N not N − 1:

s²(x) = Σ(x_i − μ)²/N.   (2.6.3)
N

A numerical example of the calculation of the sample standard


deviation is provided by the following sample of N = 4 observations
with arithmetic mean x̄ = 12/4 = 3.

† If the 'obvious' quantity, N, were used as the denominator in (2.6.2) the estimate
of σ² would be biased even if the observations themselves were perfectly free of bias
(systematic errors). This sort of bias results only from the way the observations are
treated (another example occurs in § 12.8). Notice also that this implies that the mean
of a very large number of values of Σ(x − x̄)²/N would tend towards too small a value,
viz. σ²(x) × (N − 1)/N, as the number of values, each calculated from a small sample,
increases; whereas the same formula applied to a single very large sample would tend
towards σ²(x) itself as the size of the sample (N) increases. These results are proved in
§ A1.3. It should be mentioned that unbiasedness is not the only criterion of a good
statistic and other criteria give different divisors, for example N or N + 1.


x_i        x_i − x̄      (x_i − x̄)²

5          +2            4
1          −2            4
2          −1            1
4          +1            1

Totals 12   0           10

Thus Σ(x_i − x̄)² = 10 and, from (2.6.2), s(x) = √(10/3) = 1·83.
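The calculation can be sketched, and checked against the standard library, as follows (`statistics.stdev` also uses the N − 1 denominator):

```python
import math
import statistics

x = [5, 1, 2, 4]
N = len(x)
xbar = sum(x) / N                          # 3.0
ss = sum((xi - xbar) ** 2 for xi in x)     # 10.0
s = math.sqrt(ss / (N - 1))                # sqrt(10/3), about 1.83

assert abs(s - statistics.stdev(x)) < 1e-12
```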
The coefficient of variation

This is simply the standard deviation expressed as a proportion (or
percentage) of the mean (as long as the mean is not zero of course):

C(x) = s(x)/x̄ (sample value),

𝒞(x) = σ(x)/μ = √[Var(x)]/μ (population value),   (2.6.4)

where μ is the population mean value of x̄ (and of x). C(x) is an estimate,
from a sample, of 𝒞(x).
Whereas the standard deviation has the same dimensions (seconds,
metres, etc.) as the mean, the coefficient of variation is a dimensionless
ratio and gives the relative size of the standard deviation. If the scatter
of means (see § 2.7), rather than the scatter of individual observations,
were of interest the coefficient would be calculated with s(x̄) in the
numerator. In the numerical example above C(x) = s(x)/x̄ = 1·83/3 = 0·61, or
100C(x) = 100 × 0·61 = 61 per cent.
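Continuing the same numerical example (a sketch):

```python
import math

x = [5, 1, 2, 4]
xbar = sum(x) / len(x)                                           # 3.0
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1))  # about 1.83
cv = s / xbar        # dimensionless: about 0.61, i.e. 61 per cent
```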

The working formula for the sum of squared deviations

When using a desk calculating machine it is inconvenient to form
the individual deviations from the mean, and the sum of squared deviations
is usually found by using the following identity. Using (2.1.6) and
(2.1.8),

Σ(x_i − x̄)² = Σ(x_i² − 2x_i x̄ + x̄²) = Σx_i² − 2x̄Σx_i + Nx̄².

Now, since Σx_i = Nx̄, this becomes Σx_i² − 2Nx̄² + Nx̄² = Σx_i² − Nx̄²,
and thus

Σ(x_i − x̄)² = Σx_i² − (Σx_i)²/N.   (2.6.5)

In the above example Σx_i² = 5² + 1² + 2² + 4² = 46 and therefore

Σ(x_i − x̄)² = 46 − 12²/4 = 46 − 36 = 10, as found above.
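The working formula needs only the running totals Σx and Σx², which is why it suited desk calculators; a sketch of the check:

```python
x = [5, 1, 2, 4]
N = len(x)
sum_x = sum(x)                             # 12
sum_x2 = sum(xi ** 2 for xi in x)          # 46

# Working formula (2.6.5): no individual deviations required
ss = sum_x2 - sum_x ** 2 / N               # 46 - 144/4 = 10.0

# Agrees with the direct definition
xbar = sum_x / N
assert ss == sum((xi - xbar) ** 2 for xi in x)
```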

The covariance

This quantity is a measure of the extent to which two variables are
correlated. Uncorrelated events are defined as those that have zero
covariance, and statistically independent events are necessarily
uncorrelated (though uncorrelated events may not be independent; see
§ 12.9).
The true, or population, covariance of x with y will be denoted
Cov(x,y), and the estimate of this quantity from a sample of observations
is

cov(x,y) = Σ(x − x̄)(y − ȳ)/(N − 1).   (2.6.6)

The numerator is called the sum of products. That the value of this
expression will depend on the extent to which y increases with x is
clear from Fig. 2.6.1, in which, for example, y might represent body
weight and x calorie intake. Each point represents one pair of observations.
If the graph is divided into quadrants drawn through the point
(x̄, ȳ) it can be seen that any point in the top right or in the bottom left
quadrant will contribute a positive term (x − x̄)(y − ȳ) to the sum of
products, whereas any point in the other two quadrants will contribute
a negative value of (x − x̄)(y − ȳ). Therefore the points shown in Fig.
2.6.2(a) would have a large positive covariance, the points in
Fig. 2.6.2(b) would have a large negative covariance, and those in
Fig. 2.6.2(c) would have near zero covariance.
The working formula for the sum of products

A more convenient expression for the sum of products can be found
in a way exactly analogous to that used for the sum of squares (2.6.5).
It is

Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/N.   (2.6.7)
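A sketch checking the definition (2.6.6) against the working formula (2.6.7), with made-up figures in which y tends to rise with x:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 7]              # rises with x, so expect cov > 0
N = len(x)
xbar, ybar = sum(x) / N, sum(y) / N

# Sum of products by the definition in (2.6.6)
sp = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
cov = sp / (N - 1)

# Working formula (2.6.7) gives the same sum of products
sp2 = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / N
```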

[Fig. 2.6.1 here: a scatter diagram divided into quadrants through the point (x̄, ȳ), with the sign of (x − x̄) and of (y − ȳ) marked in each quadrant.]
FIG. 2.6.1. Illustration of covariance. For eleven of the thirteen observations
the product (x_i − x̄)(y_i − ȳ) is positive; for the other two it is negative.

[Fig. 2.6.2 here: three scatter diagrams, (a) points rising with x, (b) points falling with x, (c) points showing no trend.]
FIG. 2.6.2. Illustration of covariance: (a) positive covariance as in Fig. 2.6.1;
(b) negative covariance; (c) near zero covariance.


2.7. What is a standard error? Variances of functions of the
observations. A reference list

Prediction of variances without direct measurement


A problem that recurs continually is the prediction of the scatter of
some function calculated from the observations, using the internal
evidence of one experiment.
Suppose, for example, that it is wished to know what degree of
confidence can be placed in an observed sample mean, the mean, x̄, of a
single sample of, say, N values of x selected from some specified population
(see § 1.2). A numerical example is given below. The sample
mean, x̄, is intended as an estimate of the population mean, μ. How good
an estimate is it? The direct approach would be to repeat the experiment
many times, each experiment consisting of (a) making N observations
(i.e. selecting N values of x from the population) and (b) calculating
their mean, x̄. In this way a large set of means would be obtained.
It would be expected that these means would agree with each other
more closely (i.e. would be more closely grouped about the population
mean μ) than a large set of single observations. And it would be
expected that the larger the number (N) of observations averaged to
find each mean, the more closely the means would agree with each other.
If the set of means was large enough their distribution (see Chapter 4)
could be plotted. Its mean would be μ, as for the x values (= 'means of
samples of size N = 1'), but the standard deviation of the population
of x̄ values, σ(x̄) say, would be less than the standard deviation, σ(x),
of the population of x values, as shown in Fig. 2.7.1. The closeness with
which the means agree with each other is a measure of the confidence
that could be placed in a single mean as an estimate of μ. And this
closeness can be measured by calculating the variance of the set of
sample means, using the means (x̄ values) as the set of figures to which
(2.6.2) is applied, giving var(x̄), or s²(x̄), as an estimate of σ²(x̄), as
illustrated below.
If (2.6.2) was applied to a set of observations (x values), rather than
a set of sample means, the result would be var(x), an estimate of the
scatter of repeated observations. As it has been mentioned that a set
of means would be expected to agree with each other more closely
than a set of single observations, it would be expected that var(x̄)
would be smaller than var(x), and this is shown to be so below ((2.7.8)).
The standard deviation of the mean, s(x̄) = √var(x̄), is often called the
sample standard error of the mean to distinguish it from s(x), the sample
standard deviation of x, or 'sample standard deviation of the observations'.
This term is unnecessary and sample standard deviation of the
mean is a preferable name for s(x̄).
The sample standard deviation of the observations, s(x), is of
interest if one wishes to estimate the scatter of the individual x values
in the population. Taking a larger sample gives

0.8

!> 0·6
0·7
1 (a)

'a
CIl
"'C 0·5.
,,=4ES
I
I

tiEl
J
ffu ..

0·1

0'0~=--7---!:--~--:-----!:~~
I 6
x
FlO. 2.7.1 (a) Distribution of observations (z values) in the population.
ffuffuder the curve beJ1WJ1'1ti t.wo z values is tEffu of an
.. 1..,l11.....,111.1 ... n Ealling between 80 the total area
CE€'!Eter 4 for details). E.€'!...J1E.1ular distribution
chapter are vaE.E .11"'J1dbution (though Eevia-
nlmple interpretaJ1111n the distribution The
uf z is 4·0 and the d.ti~€'!tion, a(z), is 1·0. Eliwl1YEution
of:f values. :f is the mean of a sample of four z values from the population repre-
sented in (a). The area under this curve must be 1·0, like the distribution in (a).
To keep the area the same, the distribution must be taller, because it is narrower
{i.e. the :f values have less scatter than the z values). The ordinate and abscissa
are drawn on the same scale in (a) and (b). The mean value of:f is 4·0 and its
standa.rd deviation, a(x) (the 'standard error of the mean'), is 0·5.

a more QCC'UwJ1f€'! of the popul€'!fim1 O'(z).


hand, the deviation 8(i),
€'!mlllH,y of interesf to estimate of a
sample mean, i. It is used if the object of making the observations is
to estimate the population mean, rather than to estimate the inherent
variability of the population. Taking a larger sample makes 8(i) smaller,

on the average, because it is an estimate of σ(x̄) (the population
'standard error'), which is smaller than σ(x).
The standard deviation of the mean may sometimes be measured, for
special purposes, by making measurements of the sample mean (see
Chapter 11, for example). It is the purpose of this section to show that
the result of making repeated observations of the sample mean (or of
any other function calculated from the sample of observations), for
the purpose of measuring their scatter, can be predicted indirectly.
If the scatter of the means of four observations were required it could
be found by observing many such means and calculating the variance
of the resulting figures from (2.6.2), but alternatively it could be
predicted using (2.7.8) (below) even if there were only four observations
altogether, giving only a single mean. An illustration follows.

A numerical example to illustrate the idea of the standard deviation of the
mean
Suppose that one were interested in the precision of the mean found
by averaging 4 observations. It could be found by determining the
mean several times and seeing how closely the means agreed. Table
2.7.1 shows three sets of four observations. (It is not the purpose of this
section to show how results of this sort would be analysed in practice.

TABLE 2.7.1
Three random samples, each with N = 4 observations, from a population
with mean μ = 4·00 and standard deviation σ(x) = 1·00

                    Sample 1    Sample 2    Sample 3
x values              3·99        5·88        3·79
                      3·38        2·45        3·25
                      3·89        2·21        4·07
                      6·36        5·96        3·21

Sample mean, x̄        4·40        4·12        3·58     Grand mean = 4·04

Sample standard
deviation, s(x)       1·33        2·08        0·420

That is dealt with in Chapter 11.) The observations were all selected
randomly from a population known (because it was synthetic, not
experimentally observed) to have mean μ = 4·00 and standard
deviation σ(x) = 1·00, as shown in Fig. 2.7.1(a).

The sample means, x̄, of the three samples, 4·40, 4·12, and 3·58, are all
estimates of μ = 4·00. The grand sample mean, 4·04, is also an estimate
of μ = 4·00 (see Appendix 1). The standard deviations, s(x), of each
of the three samples, 1·33, 2·08, and 0·420, are all estimates of
the population standard deviation, σ(x) = 1·00 (a better estimate
could be found by averaging, or pooling, these three estimates as
described in § 11.4). The population standard deviation of the mean
can be estimated directly by calculating the standard deviation of the
sample of three means (4·40, 4·12, 3·58) using (2.6.2). This gives

s(x̄) = √{[(4·40−4·04)² + (4·12−4·04)² + (3·58−4·04)²]/(3−1)}
     = 0·420.

Now according to (2.7.9) (see below), if we had an infinite number of
x̄ values instead of only 3, their standard deviation would be
σ(x̄) = σ(x)/√N = 1·00/√4 = 0·5 (see Fig. 2.7.1(b)), and s(x̄) = 0·420 is
a sample estimate of this quantity. And, furthermore, if we had only one
sample of observations it would still be possible to estimate indirectly
σ(x̄) = 0·5, by using (2.7.9). For example, with only the first group,
s(x̄) = s(x)/√N = 1·33/√4 = 0·665 could be found as an estimate of
σ(x̄) = 0·500, i.e. as a prediction of what the scatter of means would
be if they were repeatedly determined. (This prediction refers to
repeated samples from the same population. If the repeated samples
were from different populations the prediction would be an under-
estimate, as described in Chapter 11.)
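The arithmetic of this example is easy to check by machine. The following sketch (Python, illustrative only, using the figures quoted above) recomputes the standard deviation of the three sample means directly from (2.6.2), and also predicts it indirectly from sample 1 alone using (2.7.9):

```python
import math

# The three sample means from Table 2.7.1
sample_means = [4.40, 4.12, 3.58]
mean_of_means = sum(sample_means) / len(sample_means)   # ~4.04, the grand mean here

# Direct estimate: standard deviation of the three means, eqn (2.6.2)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / (len(sample_means) - 1)
s_mean_direct = math.sqrt(var_of_means)                 # ~0.42

# Indirect prediction from sample 1 alone, eqn (2.7.9): s(x-bar) = s(x)/sqrt(N)
s_x_sample1 = 1.33                                      # sample standard deviation of sample 1
N = 4
s_mean_predicted = s_x_sample1 / math.sqrt(N)           # 0.665

print(round(s_mean_direct, 2), round(s_mean_predicted, 3))
```

Both figures are estimates of the same population quantity, σ(x̄) = 0·5; they differ because each is computed from only a handful of observations.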

A reference list
The problem dealt with throughout this section has been that of
predicting what the scatter of the values of various functions of the
observations (such as the mean) would be if repeated samples were
taken and the value of the function calculated from each. The aim is
to predict this, given a single sample containing any number of observa-
tions that happen to be available (not fewer than two of course).
The relationships listed below will be referred to frequently later on.
The derivations should really be carried out using the definition of
the population variance (§ A1.2 and Brownlee (1965, p. 57)), but it
will, for now, be sufficient to use the sample variance (2.6.2). The
results, however, are given properly in terms of population variances.
The notation was defined in §§ 2.1 and 2.6.

Variance of the sum or difference of two variables. Given the variance of
the values of a variable x, and of another variable y, what is the predicted variance of the figures found by adding an x value to a y value?
From (2.6.2),

var(x+y) = Σ[(x_j + y_j) − (x̄ + ȳ)]²/(N−1)

since the mean of the x+y values is Σx_j/N + Σy_j/N = x̄ + ȳ (from (2.1.8)).

This can be rearranged giving

var(x+y) = Σ[(x_j − x̄) + (y_j − ȳ)]²/(N−1)
         = Σ[(x_j − x̄)² + (y_j − ȳ)² + 2(x_j − x̄)(y_j − ȳ)]/(N−1)
         = Σ(x_j − x̄)²/(N−1) + Σ(y_j − ȳ)²/(N−1) + 2Σ(x_j − x̄)(y_j − ȳ)/(N−1)

suggesting, from (2.6.2) and (2.6.6), the general rule

var(x+y) = var(x) + var(y) + 2 cov(x,y).    (2.7.1)

By a similar argument the variance of the difference between two
variables is found to be

var(x−y) = var(x) + var(y) − 2 cov(x,y).    (2.7.2)

Thus if the variables are uncorrelated, i.e. cov(x,y) = 0, then the
variance of either the difference or the sum is simply the sum of the
separate variances,

var(x+y) = var(x−y) = var(x) + var(y).    (2.7.3)
If variables are independent they are necessarily uncorrelated (see
§§ 2.4 and 12.9), so (2.7.3) is valid for independent variables.
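Because (2.7.1) and (2.7.2) are exact identities for sample variances and covariances, they can be verified on any set of paired figures. A minimal sketch (the x and y values are invented purely for illustration):

```python
def var(v):
    """Sample variance with N-1 denominator, as in eqn (2.6.2)."""
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    """Sample covariance, as in eqn (2.6.6)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

x = [3.2, 4.1, 5.0, 3.8, 4.9]
y = [1.0, 2.5, 2.1, 1.7, 3.0]

lhs_sum  = var([xi + yi for xi, yi in zip(x, y)])
rhs_sum  = var(x) + var(y) + 2 * cov(x, y)        # eqn (2.7.1)
lhs_diff = var([xi - yi for xi, yi in zip(x, y)])
rhs_diff = var(x) + var(y) - 2 * cov(x, y)        # eqn (2.7.2)
```

The two sides agree to rounding error, whatever figures are used, because the relations follow algebraically from the definitions.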

Variance of the sum of N variables. By a simple extension of the
argument for two variables, if x₁, x₂, ..., x_N are uncorrelated variables then (2.7.3) can be generalized giving

var(x₁ + x₂ + ... + x_N)
    = var(x₁) + var(x₂) + ... + var(x_N)
    = Σvar(x_j), or N var(x),    (2.7.4)
the second form being appropriate (cf. (2.1.6)) if all the x_j have the
same variance, var(x).

The effect of multiplying each x by a constant factor. If a is a constant
then, by (2.6.2), the variance of the set of figures found by multiplying
each of a set of x values by the same figure, a, will be

var(ax) = Σ(ax_j − ax̄)²/(N−1) = Σ[a(x_j − x̄)]²/(N−1) = a²Σ(x_j − x̄)²/(N−1)

suggesting the general rule

var(ax) = a² var(x);    (2.7.5)

and similarly, from (2.6.6),

cov(ax, by) = ab cov(x,y)    (2.7.6)

where a and b are constants.

The effect of adding a constant to each x. By similar arguments to those
above it can be seen, from (2.6.2), that adding a constant has no effect
on the scatter:

var(a+x) = var(x).    (2.7.7)
The variance of the mean of N observations and the standard error. This
relationship, the answer to the problem of how to estimate indirectly
the scatter of repeated observations of the mean discussed, with a
numerical example, at the beginning of this section, follows from those
already given.

var(x̄) = var(Σx_j/N) = (1/N²) var(Σx_j)    (from (2.7.5))
       = (1/N²) N var(x)    (from (2.7.4))

and therefore the variance of the mean is

var(x̄) = var(x)/N    (2.7.8)

and the standard deviation of the mean (the standard error, see discussion
above) is

σ(x̄) = √[var(x̄)] = σ(x)/√N.    (2.7.9)
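The prediction (2.7.9) can also be checked by brute-force simulation: draw many samples of N = 4 from a population with σ(x) = 1 (as in Fig. 2.7.1) and compare the observed scatter of the sample means with σ(x)/√N. A minimal sketch (the random seed and number of samples are arbitrary choices made for the illustration):

```python
import math
import random
import statistics

random.seed(42)                      # fixed seed for reproducibility

MU, SIGMA, N = 4.0, 1.0, 4           # the population of Fig. 2.7.1
n_samples = 20000

# Draw many samples of size N and record each sample mean
means = [statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
         for _ in range(n_samples)]

sd_of_means = statistics.stdev(means)     # observed sigma(x-bar)
predicted = SIGMA / math.sqrt(N)          # eqn (2.7.9): 1.00/sqrt(4) = 0.5
print(round(sd_of_means, 3), predicted)
```

With 20 000 sample means the observed standard deviation comes out very close to the predicted value of 0·5.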

Notice that var(x), being an average (like x̄), will be more or less the
same whatever size sample is used to estimate it (though a larger
sample will give a more precise value), whereas var(x̄) becomes smaller
as the number of observations averaged increases, as expected from the
discussion at the beginning of this section and from (2.7.8).

The variance of a linear function of the observations. A linear function of the
observations x₁, x₂, ..., x_n is defined as

L = a₀ + a₁x₁ + a₂x₂ + ... + a_n x_n = a₀ + Σ a_j x_j

where the a_j are constants. From (2.7.7) it can be seen that a₀ has no effect on the
variance and can be ignored. Using (2.7.4) it can be seen that, if the observations
are uncorrelated,

var(L) = var(Σ a_j x_j) = Σ var(a_j x_j)

and using (2.7.5) this becomes

var(L) = Σ[a_j² var(x_j)],    (2.7.10)

or var(L) = var(x) Σa_j²
if the variances of all the x_j are the same, var(x) say.
If the x_j do not have zero covariances (are not uncorrelated) a more general
form is necessary. Using (2.7.5), (2.7.6), and an extension of (2.7.1) this is found
to be

var(L) = Σ a_j² var(x_j) + 2 ΣΣ_{j<l} a_j a_l cov(x_j, x_l),    (2.7.11)

where the second term is the sum of all possible pairs of covariances. For example,
if L = a₁x₁ + a₂x₂ + a₃x₃, then var(L) = a₁² var(x₁) + a₂² var(x₂) + a₃² var(x₃) + 2a₁a₂
cov(x₁,x₂) + 2a₁a₃ cov(x₁,x₃) + 2a₂a₃ cov(x₂,x₃).
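Like (2.7.1), the full form (2.7.11) is an exact identity when sample variances and covariances are used, so it can be checked directly. A sketch with three variables and arbitrary invented figures:

```python
def var(v):
    """Sample variance with N-1 denominator, eqn (2.6.2)."""
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    """Sample covariance, eqn (2.6.6)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

x1 = [1.0, 2.0, 4.0, 3.5]
x2 = [0.5, 1.5, 1.0, 2.0]
x3 = [5.0, 4.0, 6.5, 5.5]
a1, a2, a3 = 2.0, -1.0, 0.5

# Direct calculation: form L for each observation, then take its variance
L = [a1 * u + a2 * v + a3 * w for u, v, w in zip(x1, x2, x3)]
direct = var(L)

# Eqn (2.7.11): squared-coefficient terms plus all pairs of covariances
formula = (a1**2 * var(x1) + a2**2 * var(x2) + a3**2 * var(x3)
           + 2 * a1 * a2 * cov(x1, x2)
           + 2 * a1 * a3 * cov(x1, x3)
           + 2 * a2 * a3 * cov(x2, x3))
```

The direct and formula values agree exactly (to rounding error), as (2.7.11) requires.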

The variance of the weighted arithmetic mean. The variance of the weighted mean,
defined in (2.5.1), follows from (2.7.5) and (2.7.10):

var(Σw_j x_j / Σw_j) = [1/(Σw_j)²] var(Σw_j x_j) = Σ[w_j² var(x_j)]/(Σw_j)².

Now if w_j = 1/var(x_j), as discussed in § 2.5, then Σ[w_j² var(x_j)] = Σw_j, so

var(Σw_j x_j / Σw_j) = Σw_j/(Σw_j)² = 1/Σw_j,    (2.7.12)

and if all the weights (variances) are the same this reduces to (2.7.8).
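A short numerical illustration of (2.7.12); the three variances are invented for the example:

```python
# Hypothetical estimates with known variances
variances = [0.25, 1.0, 4.0]
weights = [1.0 / v for v in variances]     # w_j = 1/var(x_j), as in section 2.5

# Eqn (2.7.12): variance of the weighted mean is 1/sum of weights
var_weighted_mean = 1.0 / sum(weights)

# Check against the general form: sum(w_j^2 var_j) / (sum w_j)^2
general = sum(w * w * v for w, v in zip(weights, variances)) / sum(weights) ** 2
print(var_weighted_mean, general)
```

Both routes give 1/5·25 ≈ 0·190, and the weighted-mean variance is smaller than the variance of even the most precise single estimate (0·25), as expected.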
The approximate variance of any function. The variance of any function f(x₁,
x₂, ..., x_n) of the uncorrelated variables x₁, x₂, ..., x_n is given approximately
(taking only the linear terms in a Taylor series expansion of f) by

var(f) ≈ (∂f/∂x₁)² var(x₁) + (∂f/∂x₂)² var(x₂) + ... + (∂f/∂x_n)² var(x_n),    (2.7.13)

if the variances are reasonably small relative to the means, so that the function
can be represented approximately by a straight line in the range over which each
x varies. The derivatives should be evaluated at the true mean values of the x
variables. If the x variables are correlated then terms involving their covariances
must be added as shown below. For discussion and the derivation see, for example,
Lindley (1965, p. 184), Brownlee (1965, p. 144), and Kendall and Stuart (1968,
p. 281).
If f is a linear function then (2.7.13) reduces to (2.7.10), which is exact. If f is
not linear then the result is only approximate, and furthermore f will not have the
same distribution of errors as the x variables; so if, for example, the x values were
normally distributed (see § 4.2), f would not be normally distributed, so its variance, even if it were exact, could not be interpreted in any simple way.
The variance of log_e x. If the true mean value of x is μ then using the version
of (2.7.13) for a single variable gives

var(log_e x) ≈ (d log_e x/dx)²_{x=μ} var(x) = var(x)/μ² = 𝒞²(x).    (2.7.14)

Therefore the standard deviation of log_e x is approximately equal to the coefficient
of variation of x, 𝒞(x), defined by (2.6.4). If the standard deviation of x increases
in proportion to the true mean value of x, so that the coefficient of variation of x
is constant, the standard deviation of log_e x will be approximately constant
(cf. §§ 11.2 and 12.2).
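The relation (2.7.14) can be illustrated by simulation: for observations whose coefficient of variation is about 5 per cent, the standard deviation of log_e x should come out close to 0·05. A sketch (the population mean and scatter are arbitrary choices for the illustration):

```python
import math
import random
import statistics

random.seed(1)                       # fixed seed for reproducibility

mu, sigma = 100.0, 5.0               # coefficient of variation = 5 per cent
xs = [random.gauss(mu, sigma) for _ in range(50000)]

cv = statistics.stdev(xs) / statistics.fmean(xs)       # observed C(x), ~0.05
sd_log = statistics.stdev([math.log(x) for x in xs])   # eqn (2.7.14): ~C(x)
print(round(cv, 4), round(sd_log, 4))
```

The two figures agree closely because the coefficient of variation is small; the approximation deteriorates as the relative scatter grows.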

The variance of the product of two variables, x₁x₂. In this case an exact result
can be derived for the variance of values of x₁x₂, given the variances of x₁ and of
x₂. Suppose that x₁ and x₂ are independent of each other, and have population
means μ₁ and μ₂ respectively. Then†

var(x₁x₂) = var(x₁) var(x₂) + μ₂² var(x₁) + μ₁² var(x₂).

If this result is divided through by (μ₁μ₂)², it can be expressed in terms of coefficients of variation, defined in (2.6.4), as

𝒞²(x₁x₂) = 𝒞²(x₁) 𝒞²(x₂) + 𝒞²(x₁) + 𝒞²(x₂).    (2.7.15)

It is interesting to compare this with the result of applying the approximate
formula, (2.7.13), viz.

var(x₁x₂) ≈ (∂(x₁x₂)/∂x₁)² var(x₁) + (∂(x₁x₂)/∂x₂)² var(x₂)
          = μ₂² var(x₁) + μ₁² var(x₂);

or, again dividing through by (μ₁μ₂)² to get the result in terms of coefficients of
variation,

𝒞²(x₁x₂) ≈ 𝒞²(x₁) + 𝒞²(x₂).
† Proof. From appendix equation (A1.1.2), var(x₁x₂) = E(x₁²x₂²) − [E(x₁x₂)]². Now
E(x₁x₂) = μ₁μ₂; and E(x₁²x₂²) = E(x₁²) E(x₂²) if, as supposed, x₁ and x₂ are independent.
Also, from (A1.2.2), E(x₁²) = var(x₁) + μ₁², and similarly for x₂. Thus
var(x₁x₂) = E(x₁²) E(x₂²) − μ₁²μ₂²
          = (var(x₁) + μ₁²)(var(x₂) + μ₂²) − μ₁²μ₂²
          = var(x₁) var(x₂) + μ₂² var(x₁) + μ₁² var(x₂)
as stated above.

By comparison with (2.7.15) it appears that use of the approximate formula
involves neglecting the term 𝒞²(x₁) 𝒞²(x₂). The approximation involved can be
illustrated by two numerical examples.
First, suppose that both x₁ and x₂ have coefficients of variation of 50 per cent,
i.e. 𝒞(x₁) = 0·5, 𝒞(x₂) = 0·5. In this case (2.7.15) gives

𝒞(x₁x₂) = √[(0·5² × 0·5²) + 0·5² + 0·5²] = √(0·0625 + 0·25 + 0·25) = 0·750,

i.e. 75·0 per cent. The approximate form gives

𝒞(x₁x₂) ≈ √(0·5² + 0·5²) = 0·707,

i.e. 70·7 per cent. Secondly, consider more accurate observations, say 100 𝒞(x₁)
= 5 per cent and 100 𝒞(x₂) = 5 per cent. Similar calculations show that (2.7.15)
gives 100 𝒞(x₁x₂) = 7·075 per cent, whereas the approximate form gives 100
𝒞(x₁x₂) ≈ 7·071 per cent. The more accurate the observations, the better the
approximate version will be.
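The two numerical examples can be reproduced directly from (2.7.15) and its linearized form:

```python
import math

def cv_product_exact(c1, c2):
    """Eqn (2.7.15): exact CV of x1*x2 for independent x1, x2."""
    return math.sqrt(c1**2 * c2**2 + c1**2 + c2**2)

def cv_product_approx(c1, c2):
    """Linearized form, which drops the c1^2 * c2^2 term."""
    return math.sqrt(c1**2 + c2**2)

print(round(cv_product_exact(0.5, 0.5), 3))           # 50 per cent case, exact
print(round(cv_product_approx(0.5, 0.5), 3))          # 50 per cent case, approx
print(round(100 * cv_product_exact(0.05, 0.05), 3))   # 5 per cent case, exact
print(round(100 * cv_product_approx(0.05, 0.05), 3))  # 5 per cent case, approx
```

The outputs are 0·75 and 0·707 for the 50 per cent case, and 7·075 and 7·071 per cent for the 5 per cent case, matching the figures in the text.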

The variance of the ratio of two variables, x₁/x₂. Using (2.7.13) gives, in terms of
coefficients of variation, the approximate result

𝒞²(x₁/x₂) ≈ 𝒞²(x₁) + 𝒞²(x₂).    (2.7.16)

An exact treatment for the ratio of two normally distributed variables is given in
§ 13.6 and exemplified in §§ 13.11-13.15.

The variance of the reciprocal of a variable, 1/x. According to (2.7.13),

var(1/x) ≈ (d(1/x)/dx)²_{x=μ} var(x) = var(x)/μ⁴.    (2.7.17)

The weight (see § 2.5) to be attached to a value of 1/x is therefore approximately
proportional to the fourth power of x if var(x) is constant! This explains why
plots involving reciprocal transformations may give bad results (see § 12.8
for details) if not correctly weighted.

Correlated variables. In the simplest case of two correlated variables, x₁ and x₂,
the appropriate extension of (2.7.13) is

var(f) ≈ (∂f/∂x₁)² var(x₁) + (∂f/∂x₂)² var(x₂)
         + 2(∂f/∂x₁)(∂f/∂x₂) cov(x₁, x₂).    (2.7.18)

This relationship is referred to in § 13.5. For a linear function this reduces to the
two-variable case of (2.7.11). The n-variable extension of (2.7.18) involves all
possible pairs of x variables in the same way as (2.7.11).

Sum of a variable number of random variables. Let S denote the sum of a randomly
variable number of random variables,

S = Σ_{j=1}^{m} z_j,

where the z_j are independent variables with coefficient of variation 𝒞(z) and m is a
random variable with coefficient of variation 𝒞(m). If each S is made up of a
random sample (of variable size) from the population of z values, then it is shown
in § A1.4 that the coefficient of variation of S is

𝒞(S) = √[𝒞²(z)/μ_m + 𝒞²(m)],    (2.7.19)

where μ_m is the population mean value of m (the size of the sample). This result is
illustrated on p. 58.

3. Theoretical distributions: binomial
and Poisson

THE variability of experimental results is often assumed to be of a


known mathematical form (or distribution) to make statistical analysis
easy, though some methods of analysis need far more assumptions than
others. These theoretical distributions are mathematical idealizations.
The only reason for supposing that they will represent any phenomenon
in the real world is comparison of real observations with theoretical
predictions. This is only rarely done.

3.1. The idea of a distribution


If it were desired to discover the proportion of European adults who
drink Scotch whisky then the population involved is the set of all
European adults. From this population a strictly random sample
(see § 2.3) of, say, twenty-five Europeans might be examined and
the proportion of whisky drinkers in this sample taken as an estimate
of the proportion in the population.
A similar statistical problem is encountered when, for example, the
true concentration of a drug solution is estimated from the mean of a
few experimental estimates of its concentration. Although it is con-
venient still to regard the experimental observations as samples from a
population, it is apparent that in this case, unlike that discussed in the
previous paragraph, the population has no physical reality but consists
of the infinite set of all valid observations that might have been made.
The first example illustrates the idea of a discontinuous probability
distribution (it is not meant to illustrate the way in which a single
sample would be analysed). If very many samples, each of 25 Euro-
peans, were examined it would not be expected that all the samples
would contain exactly the same number of whisky drinkers. If the
proportion of whisky drinkers in the whole population of European
adults were 0·3 (i.e. 30 per cent) then it might reasonably be expected
that samples containing about 7 or 8 cases would appear more frequently
than samples containing any other number because 0·3 × 25 = 7·5.
However samples containing about 5 or 10 cases would be frequent,
and 3 (or fewer) or 13 (or more) drinkers would appear in roughly 1 in
20 samples. If a sufficient number of samples were taken it should be

possible to discover something approaching the true proportion of
samples containing r drinkers, r being any specified number between
0 and 25. These figures are called the probability distribution of the
proportion of whisky drinkers in a sample of 25 and since this propor-
tion is a discontinuous variable (the number of drinkers per sample
must be a whole number) the distribution is described as discontinuous.
The distribution is usually plotted as a block histogram as shown in
Fig. 3.4.1 (p. 52), the block representing, say, 6 drinkers extending
from 5·5 to 6·5 along the abscissa.
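The claims in the preceding paragraph (a mode of 7 or 8 drinkers per sample, and roughly 1 sample in 20 with 3 or fewer, or 13 or more, drinkers) can be checked numerically. The sketch below simply anticipates the binomial formula, eqn (3.4.3), which is not derived until § 3.4:

```python
from math import comb

def binom_p(r, n, p):
    """P(r successes out of n trials), anticipating eqn (3.4.3)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 25, 0.3                       # 25 Europeans, 30 per cent whisky drinkers
probs = [binom_p(r, n, p) for r in range(n + 1)]

most_likely = max(range(n + 1), key=lambda r: probs[r])
tail = sum(probs[r] for r in range(4)) + sum(probs[r] for r in range(13, n + 1))
print(most_likely, round(tail, 3))   # tail is roughly 1 in 20
```

The most probable count is 7 (with 8 close behind), and the probability of 3 or fewer, or 13 or more, drinkers comes out at about 0·05, i.e. roughly 1 sample in 20, as stated.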
The second example, concerning the estimation of the true concentra-
tion of a drug solution, leads to the idea of a continuous probability
distribution. If many estimates were made of the same concentration
it would be expected that the estimates would not be identical. By
analogy with the discontinuous case just discussed it should be possible,
if a large enough number of estimates were made, to find the proportion
of estimates having any given value. However, since the concentration
is a continuous variable the problem is more difficult because the
proportion of estimates having exactly any given value (e.g. exactly
12 μg/ml, that is 12·00000000...μg/ml) will obviously in principle be
indefinitely small (in fact experimental difficulties will mean that the
answer can only be given to, say, three significant figures so that in
practice the concentration estimate will be a discontinuous variable).
The way in which this difficulty is overcome is discussed in § 4.1.

3.2. Simple sampling and the derivation of the binomial


distribution through examples
The binomial distribution predicts the probability, P(r), of observing
any specified number (r) of 'successes' in a series of n independent trials
of an event, when the outcome of a trial can be of only two sorts
('success' or 'failure'), and when the probability of obtaining a 'success'
is constant from trial to trial. If the conditions of independence and
constant probability are fulfilled the process of taking a sample (of n
trials) is described as simple sampling. When there are more than two
possible outcomes a generalization of the binomial distribution known
as the multinomial distribution is appropriate. Often it will not be
possible a priori to assume that sampling is simple and when this is so
it must be found out by experiment whether the observations are
binomially distributed or not.
The example in this section is intended to illustrate the nature of the
binomial distribution. It would not be a well-designed experiment to

test a new drug because it does not include a control or placebo group.
Suitable experimental designs are discussed in Chapters 8-11.
Suppose that n trials are made of a new drug. In this case 'one trial
of an event' is one administration of the drug to a patient. After each
trial it is recorded whether the patient's condition is apparently better
(outcome B) or apparently worse (outcome W). It is assumed for the
moment that the method of measurement is sensitive enough to rule
out the possibility of no change being observed.
The derivation of the binomial distribution specifies that the probability of obtaining a success shall be the same at every trial. What
exactly does this mean? If the n trials were all conducted on the same
patient this would imply that the patient's reaction to the drug must
not change with time, and the condition of independence of trials
implies that the result of a trial must not be affected by the result of
previous trials. The result would be an estimate of the probability of
the drug producing an improvement in the single patient tested.
Under these conditions the proportion of successes in repeated sets of n
trials should follow the binomial distribution.
At first sight it might be thought, because it is doubtless true that
the probability of a success, outcome B, will differ from patient to
patient, that if the n trials were conducted on n different patients,
the proportion of successes in repeated sets of n trials would not follow
the binomial distribution. This would quite probably be so if, for
example, each set of n patients was selected in a different part of the
country. However, if the sets of n patients were selected strictly at
random (see § 2.3) from a large population of patients, then the proportion of patients in the population who will show outcome B (i.e. the
probability, given random sampling, of outcome B) would not change
between the selection of one patient and the next, or between the
selection of one sample of n patients and the next. Therefore the
conditions of constant probability and independence would be met in
spite of the fact that patients differ in their reactions to drugs. Notice
the critical importance of strictly random selection of samples, already
emphasized in § 2.3.
From the rules of probability discussed in § 2.4 it is easy to find
the probability of any specified result (number of successes out of n
trials) if 𝒫(B), the true (population) proportion of cases in which the
patient improves, is known. This is a deductive, rather than inductive,
procedure. A true probability is given and the probability of a particular
result calculated. The reverse process, the inference of the population
proportion from a sample, is discussed later in § 7.7 and exemplified
by the assay of purity in heart by the Oakley method, described in
§ 7.8.
Two different drugs will be considered.
Suppose that drug X is completely inactive, but that nevertheless
50 per cent of patients, in the long run, improve, i.e.

𝒫(B) = 0·5 and 𝒫(W) = 1−𝒫(B) = 0·5.    (3.2.1)

Suppose that drug Y is effective, and that the proportion of
patients improving in the long run is increased to 90 per cent. Thus

𝒫(B) = 0·9 and 𝒫(W) = 1−𝒫(B) = 0·1.    (3.2.2)

In both cases, because the outcomes B and W are mutually exclusive,
the special case of the addition rule (2.4.2) gives 𝒫(B or W) =
𝒫(B)+𝒫(W), and because B and W are exhaustive (no other outcome is
possible), 𝒫(B or W) = 1.

Two trial administrations of the drug (n = 2)

If two trials are made, r = 0, 1, or 2 successes might be observed. The possible
outcomes of the two trials are shown in Table 3.2.1 and from these the
probabilities, P(r), of observing r successes (r = 0, 1, or 2), are calculated
using the multiplication rule, (2.4.6), and the addition rule (2.4.2).

TABLE 3.2.1

r   1st trial   2nd trial                   P(r) when 𝒫(B) = 0·5   P(r) when 𝒫(B) = 0·9
0      W           W        𝒫(W)×𝒫(W)           0·25                    0·01
1      W           B        𝒫(W)×𝒫(B)           0·25 } 0·50             0·09 } 0·18
1      B           W        𝒫(B)×𝒫(W)           0·25 }                  0·09 }
2      B           B        𝒫(B)×𝒫(B)           0·25                    0·81

Total                                            1·0                     1·0

It will be seen that Σ_{r=0}^{r=n} P(r) = 1 in each case, as required by the
addition rule, because it is certain that r will take one of the values between
0 and n.
It is also clear from the table that the calculations are affected by

the number of ways in which a given result can occur. One success
out of two can occur in two ways, either at the first trial or at the
second, so the probability of one success out of two trials, if the order
in which they occur is immaterial, is 0·50 when 𝒫(B) = 0·5 (drug X),
and 0·18 when 𝒫(B) = 0·9 (drug Y). This follows from the addition
rule, being the probability of either (B at first trial and W at second)
or (W at first trial and B at second).
The mean number (expectation, see Appendix 1) of successes out of
n trials is n𝒫, i.e. 1 success out of 2 trials when 𝒫(B) = 0·5 (drug X),
and 1·8 successes out of two trials when 𝒫(B) = 0·9 (drug Y). The
results in the table are plotted in Figs. 3.2.1 and 3.2.2.

FIG. 3.2.1. Binomial distribution of r, the number of successes out of
n = 2 trials of an event, the probability of success at each trial
being 𝒫 = 0·5.

FIG. 3.2.2. As in Fig. 3.2.1, but 𝒫 = 0·9.

Three trial administrations of the drug (n = 3)

Calculations, exactly similar to those above, are shown in Table
3.2.2 for the case when three trial administrations of the drug are
performed. In this case there are three possible orders in which one
success may occur in three trials, and the same number for two successes
in three trials. Check the figures in the table to make sure you have got
the idea.
These distributions are plotted in Figs. 3.2.3 and 3.2.4.

FIG. 3.2.3. Binomial distribution of r, the number of successes out of n = 3
trials of an event, the probability of success at each trial being 𝒫 = 0·5.

FIG. 3.2.4. As in Fig. 3.2.3 but 𝒫 = 0·9.
TABLE 3.2.2

r   1st trial   2nd trial   3rd trial   P(r) when 𝒫(B) = 0·5   P(r) when 𝒫(B) = 0·9
                                           (Drug X)                (Drug Y)
0      W           W           W            0·125                   0·001
1      W           W           B            0·125 }                 0·009 }
1      W           B           W            0·125 } 0·375           0·009 } 0·027
1      B           W           W            0·125 }                 0·009 }
2      B           B           W            0·125 }                 0·081 }
2      B           W           B            0·125 } 0·375           0·081 } 0·243
2      W           B           B            0·125 }                 0·081 }
3      B           B           B            0·125                   0·729

Total                                        1·000                   1·000

3.3. Illustration of the danger of drawing conclusions from
small samples
Suppose that it is wished to compare the treatments, X and Y, used in the
previous section (see (3.2.1) and (3.2.2)). An experiment is performed by testing
three subjects with treatment X and three subjects with treatment Y, the subjects
being randomly selected from the population of subjects, and randomly allocated
to X or Y using random number tables (see § 2.3). The probabilities of obtaining
r successes in each set of 3 trials have already been given in Table 3.2.2 and
are reproduced in Table 3.3.1 together with the products which, by the multiplication rule, give the probabilities of observing both (r successes with X) and
(r successes with Y).

TABLE 3.3.1

r   P(r) when 𝒫(B) = 0·5   P(r) when 𝒫(B) = 0·9   Product
       (treatment X)          (treatment Y)

0         0·125                  0·001             0·000125
1         0·375                  0·027             0·010125
2         0·375                  0·243             0·091125
3         0·125                  0·729             0·091125

Totals    1·000                  1·000             0·1925

The sum of the products, 0·1925, gives, by the addition rule, the probability
of obtaining either (0 successes with both drugs) or (1 success with both) or
(2 successes with both) or (3 successes with both). Thus in 19·25 per cent of experiments, in the long run, treatment X will appear to be equi-effective with treatment
Y, though in fact the latter is considerably better.
Furthermore, in some experiments X will actually produce a better result than
Y. By enumerating the ways in which this can happen, and applying the addition
and multiplication rules, the probability of this outcome is seen to be

(0·375 × 0·001) + 0·375(0·027 + 0·001) + 0·125(0·243 + 0·027 + 0·001) = 0·04475.

For example, the second term is the probability of obtaining both (2 successes
with X) and (either 0 or 1 successes with Y). The treatments will be placed in the
wrong order of effectiveness in 4·475 per cent of trials in the long run.
The result of these calculations is the prediction that in the long run X will
appear to be as good as, or even better than, Y in 19·25+4·475 = 23·7 per cent of
experiments. It would thus be quite likely that a good new treatment would
remain undetected if an experiment were conducted with samples as small as
those in this illustration. The hazards of small samples are dealt with further in
§ 7.7 and in § 7.8, which describes the use of the binomial for the assay of purity
in heart.
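The 4·475 per cent figure can be reproduced by enumerating every pair of results (r successes with X, r successes with Y) in which the inactive treatment beats the effective one. A sketch:

```python
from math import comb

def binom(r, n, p):
    """Binomial probability of r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n = 3
px = [binom(r, n, 0.5) for r in range(n + 1)]   # inactive treatment X
py = [binom(r, n, 0.9) for r in range(n + 1)]   # effective treatment Y

# P(X gives strictly more successes than Y): the 'wrong order' outcome
p_wrong_order = sum(px[rx] * py[ry]
                    for rx in range(n + 1) for ry in range(rx))
print(round(p_wrong_order, 5))
```

The double sum simply spells out the three bracketed terms of the hand calculation, and gives 0·04475.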

3.4. The general expression for the binomial distribution and for
its mean and variance

The probability, P(r), of observing r successes out of n trials when the
probability of a success is 𝒫, and the probability of a failure is therefore
1−𝒫 (from (2.4.3)), can be inferred by generalization of the deductions
in § 3.2. It is

𝒫^r (1−𝒫)^(n−r)    (3.4.1)

if the order in which the successes occur is specified. Commonly the
order is of no interest, and therefore, by the addition rule, this must
be multiplied by the number of ways in which r successes can occur in
n trials, namely

n!/[r!(n−r)!],    (3.4.2)

which is the number of possible combinations of r objects selected from
n.† Thus, when the order of the successes is ignored,

P(r) = {n!/[r!(n−r)!]} 𝒫^r (1−𝒫)^(n−r).    (3.4.3)

The proof that the sum of these probabilities, for all possible values
of r from 0 to n, is 1 follows from the fact that (3.4.3) is a term in the
† This quantity is often denoted by the symbol (n over r), or by ⁿCᵣ. It is the number of
possible ways of dividing n objects into two groups containing r and n−r objects ('successes' and 'failures' in the present case). The n objects can be arranged in n! different
orders (permutations), and in each case the first r can be selected for one group, the remaining
n−r for the other. However, the r! permutations of the objects within the first group,
and the (n−r)! permutations within the second group, all result in the same division
into two groups, hence the denominator of (3.4.2).

expansion of (𝒬+𝒫)^n, where 𝒬 = 1−𝒫, by the binomial theorem. Thus

Σ_{r=0}^{n} P(r) = (𝒬+𝒫)^n = 1^n = 1.
Example 1. If n = 3 and 𝒫 = 0·5, then the probability of one
success (r = 1) out of three trials is, using (3.4.3),

P(1) = [3!/(1!2!)] 0·5¹ 0·5² = 3 × 0·125 = 0·375

as found already in Table 3.2.2.

Example 2. If n = 3 and 𝒫 = 0·9, then the probability of three
trials all being successful (r = 3) is, similarly,

P(3) = [3!/(3!0!)] 0·9³ 0·1⁰ = 1 × 0·729 = 0·729

as found in Table 3.2.2 (because 0! = 1, see § 2.1, p. 10).
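Eqn (3.4.3) translates directly into a few lines of code; the sketch below reproduces Examples 1 and 2:

```python
from math import comb

def P(r, n, p):
    """Binomial probability, eqn (3.4.3): comb(n, r) supplies
    the combinatorial factor n!/(r!(n-r)!) of eqn (3.4.2)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Example 1: n = 3, success probability 0.5, one success
print(P(1, 3, 0.5))            # 0.375

# Example 2: n = 3, success probability 0.9, three successes
print(round(P(3, 3, 0.9), 3))  # 0.729
```

Note that `comb(3, 3)` is 1, corresponding to the fact that 0! = 1 in the hand calculation.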

Estimation of the mean and variance of the binomial distribution.
When it is required to estimate the probability of a success from
experimental results the obvious method is to use the observed proportion of successes in the sample, r/n, as an estimate of 𝒫. Conversely,
the average number of successes in the long run will be n𝒫, as exemplified in § 3.2 (this can be found more rigorously using appendix equation
(A1.1.1)).
If many samples of n were taken it would be found that the number
of successes, r, varied from sample to sample (see § 3.2). Given a number
of values of r this scatter could be measured by estimating their variance
in the usual way, using (2.6.2). However, in the case of the binomial
distribution (unlike the Gaussian distribution) it can be shown (see
eqn (A1.2.7)) that the variance that would be found in this way can be
predicted even from a single value of r, using the formula

var(r) = n𝒫(1−𝒫)    (3.4.4)

into which the experimental estimate of 𝒫, viz. r/n, can be substituted.
The meaning of this equation can be illustrated numerically. Take
the case of n = 2 trials when 𝒫 = 0·5, which was illustrated in § 3.2.
The mean number of successes in 2 trials (long-run mean of r) will be
μ = n𝒫 = 2 × 0·5 = 1. Suppose that a sample of 4 sets of 2 trials were
performed, and that the results were r = 0, r = 1, r = 1, and r = 2
successes out of 2 trials (that is, by good luck, the results were exactly

typical of the population, each of these values of r being equiprobable,
see Table 3.2.1). The variance of r could now be estimated using
(2.6.3). N = 4 is used in the denominator, not N−1, because the
population mean, μ, not the sample mean, is being used (see § 2.6). It
would be

var(r) = Σ(r−μ)²/N = [(0−1)² + (1−1)² + (1−1)² + (2−1)²]/4 = 0·5,

which is exactly the result found using (3.4.4); thus

var(r) = n𝒫(1−𝒫) = 2 × 0·5 × (1−0·5) = 0·5.
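The same arithmetic can be sketched in a few lines of Python (using the four hypothetical r values from the text):

```python
# The four hypothetical samples of r successes out of n = 2 trials,
# with success probability P = 0.5, used in the text.
r_values = [0, 1, 1, 2]
n, P = 2, 0.5
mu = n * P                        # population mean of r

# Estimate of variance using the known population mean
# (denominator N, not N - 1, as in eqn (2.6.3))
var_observed = sum((r - mu)**2 for r in r_values) / len(r_values)

# Variance predicted by the binomial formula (3.4.4)
var_binomial = n * P * (1 - P)

print(var_observed, var_binomial)   # both 0.5
```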

Fig. 3.4.1. Histogram: binomial distribution with n = 25 and 𝒫 = 0·3.
Mean number of successes out of 25 trials = n𝒫 = 7·5. Variance of r, var(r)
= n𝒫(1−𝒫) = 5·25; σ(r) = √(5·25) = 2·29; var(r/n) = 𝒫(1−𝒫)/n = 0·0084;
σ(r/n) = √0·0084 = 0·0917. Continuous distribution: calculated Gaussian
('normal') distribution with μ = 7·5 and σ = 2·29.
The agreement is only exact because the sample happened to be perfectly
representative of the population. If the calculations are based on small
samples the estimate of variance obtained from (3.4.4) will agree
approximately, but not exactly, with the estimate from (2.6.2). A
similar situation arises in the case of the Poisson distribution and a
numerical example is given in § 3.7.
Results are often expressed as the proportion (r/n), rather than
the number (r), of successes out of n trials. The variance of the propor-
tion of successes follows directly from the rule (2.7.5) for the effect of
multiplying a variable (r in this case) by a constant (1/n in this case).
Thus, from (3.4.4),

var(r/n) = var(r)/n² = 𝒫(1−𝒫)/n.     (3.4.5)

The use of these expressions is illustrated in Fig. 3.4.1 in which the
abscissa is given in terms both of r and of r/n. As might be supposed
from Figs. 3.2.1–3.2.4, the binomial distribution is only symmetrical if
𝒫 = 0·5. However, Fig. 3.4.1 shows that as n increases the distribution
becomes more nearly symmetrical even when 𝒫 ≠ 0·5. The binomial
distribution in Fig. 3.4.1 is seen to be quite closely approximated by the
superimposed continuous and symmetrical Gaussian distribution (see
Chapter 4), which has been constructed to have the same mean and
variance as the binomial.
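The closeness of the approximation in Fig. 3.4.1 can be checked numerically; this sketch compares the two curves at a few values of r, assuming (as in the figure) n = 25 and 𝒫 = 0·3:

```python
from math import comb, exp, pi, sqrt

n, P = 25, 0.3
mu = n * P                         # 7.5
sigma = sqrt(n * P * (1 - P))      # sqrt(5.25) = 2.29...

def binom(r):
    # Binomial probability (3.4.3)
    return comb(n, r) * P**r * (1 - P)**(n - r)

def gauss(x):
    # Gaussian density with the same mean and variance
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

for r in (5, 7, 8, 10):
    print(r, round(binom(r), 4), round(gauss(r), 4))
```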

3.5. Random events. The Poisson distribution

Genesis of the distribution. Relationship to the binomial


The Poisson distribution describes the occurrence of purely random
events in a continuum of space or time. The sort of events that may be
described by the distribution (it is a matter for experimental observa-
tion) are the number of cells visible per square of haemocytometer,
the number of isotope disintegrations in unit time, or the number of
quanta of acetylcholine released at a nerve-ending in response to a
stimulus. The Poisson distribution is used as a criterion of the random-
ness of events of this sort (see § 3.6 for examples). It can be derived in
two ways.
First, it can be derived directly by considering random events, when
(3.5.1) follows (using the multiplication rule for independent events,
(2.4.6)) from the assumption that events occurring in non-overlapping

intervals of time or space are independent. This derivation is given in
§ A2.2 (Chapter 5 should be read first). The independence of time
intervals is part of the definition of random events (see Chapter 5 and
Appendix 2).
Secondly, the Poisson distribution can be derived from the binomial
distribution (§ 3.4). In the examples cited the number of 'successes'
(e.g. disintegrations per second) can be counted, but it does not obviously
make sense to talk about the 'number of trials of the event'. Consider
an interval of time Δt seconds long (or an interval of space) divided into
n small intervals. If the true (or population, or long-term) average
number of events in Δt seconds is called m, then the probability of one
event occurring ('success') in a small interval of length Δt/n is 𝒫
= m/n.† Because of the independence of time intervals the n intervals
are like n independent trials with a constant probability 𝒫 = m/n of
success at each trial, just like n tosses of a coin. These properties of
independence and constancy define (plausibly enough) what is meant by
'random'. If n is finite, the number of successes in n trials is therefore
given by the binomial distribution, (3.4.3), with 𝒫 = m/n. In order to
consider very short time intervals let n→∞ (and thus 𝒫→0) in eqn
(3.4.3), so that m = n𝒫 remains fixed. The result is (3.5.1), a limiting
form of the binomial distribution in which neither n nor 𝒫, but only m,
appears. The derivation is discussed by Feller (1957, p. 146), Mood and
Graybill (1963, p. 70), and Lindley (1965, p. 73). It is easy to follow if it
is remembered that as n→∞, lim (1−m/n)^n = e^−m. See Thompson (1965,
Chapter 14) if it is not remembered.
The distribution gives the true probability of observing r events per
unit of time (or space) as
P(r) = (m^r/r!) e^−m,     (3.5.1)
where m is the true mean number of events per unit of time or space.
(It is shown in Appendix 1, (A1.1.7), that m is the population mean
value of r.) This is a discontinuous distribution because r must be an
† You may object that m could be bigger than n, giving a probability bigger than 1!
But the argument only applies to very short intervals so that m < n and the chance of
more than one event occurring in a short interval (length Δt/n) is negligible. For example,
if Δt = 1 hour (3600 s) and m = 36 events/h, then if n = 3600 it follows that 𝒫 = 36/
3600 = 0·01. On average, 99 out of 100 1-s intervals contain no event ('failure'), 1 in
100 contains 1 event ('success') and a negligible proportion contains more than one
event. The 'negligible proportion' is dealt with more rigorously in Appendix 2. It be-
comes zero if the intervals are made infinitely short, which is why we let n→∞ in the
derivation.

integer. It has the basic property of all probability distributions that it
should be certain (P = 1) that one or other of the possible outcomes
(r = 0, r = 1, ...) will be observed. From the addition rule (2.4.2),
this means that P[r = 0 or r = 1 or ... ∞] is the sum of the separate
probabilities, i.e., from (3.5.1),

Σ (from r = 0 to ∞) P(r) = e^−m Σ (m^r/r!) = e^−m (1 + m + m^2/2! + ...) = e^−m e^m = 1.     (3.5.2)

(See Thompson (1965, p. 118) if you do not recognize the expansion of
e^m.)
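Both properties — that the Poisson probabilities (3.5.1) sum to 1, and that the binomial with large n and 𝒫 = m/n approaches the Poisson — can be illustrated with a short sketch (the values of m and n here are arbitrary):

```python
from math import comb, exp, factorial

def poisson_pmf(r, m):
    # Poisson probability (3.5.1)
    return (m**r / factorial(r)) * exp(-m)

m = 2.33
total = sum(poisson_pmf(r, m) for r in range(50))
print(total)                        # effectively 1, as in (3.5.2)

# Binomial with many short 'intervals', each with success probability m/n
n = 10_000
p = m / n
binom_3 = comb(n, 3) * p**3 * (1 - p)**(n - 3)
print(binom_3, poisson_pmf(3, m))   # nearly equal
```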

The variance of the Poisson distribution


According to (3.4.4) the variance of the number of 'successes', r, for
the binomial distribution, is var(r) = n𝒫(1−𝒫). Because m = n𝒫
this can be written m(1−𝒫), and because, as discussed above, the
Poisson distribution can be derived from the binomial by letting 𝒫→0,
the variance of the Poisson becomes simply

var(r) = m,     (3.5.3)

the same as the mean. As in the case of the binomial distribution,
but not the normal distribution, this allows an estimate of variance
to be made with even a single observation of r (a single estimate of m),
as well as by the conventional method of estimation. This is illustrated
numerically in § 3.7.

3.6. Some biological applications of the Poisson distribution


Cell distribution
If the number of cells per unit area of a counting chamber were
observed to be Poisson-distributed this would imply that the cells
were independent and randomly distributed, for example that they
have no tendency to clump.
Thus, if the number of red cells present in the volume represented by
one small square of a haemocytometer is r, and the number of squares
observed to contain r cells is f, then, using the observations in Table
3.6.1, the estimated mean number of cells per square is the total number
of cells divided by the total number of squares, i.e.

r̄ = Σfr/Σf = 531/80 = 6·64.

The Poisson distribution (3.5.1) gives the probability of a square
containing r cells as P(r) = e^−m m^r/r!, where m, the mean number of
cells per square, is estimated by r̄. For example, the probability of a
square containing 3 cells is predicted to be

P(3) = e^−6·64 (6·64)³/3! ≈ 0·064.

Multiplying this probability by the number of squares counted (80)
gives the predicted frequency (the 'calc. freq.' column of Table 3.6.1)
of squares containing 3 cells, viz. 80 × 0·064 = 5·1, i.e. about 5 squares.
The rest of the values are given in Table 3.6.1. The observed distribution
is slightly more clumped than the calculated Poisson distribution. In
§ 8.6, p. 133, a test is carried out to see whether this tendency can
reasonably be attributed to random errors. For this purpose some
categories are pooled as indicated by the brackets in Table 3.6.1.

TABLE 3.6.1

 r            obs. freq. (f)    calc. freq.    fr
 0                  0                0           0
 1                  0                1           0
 2                  1                2           2
 3                  3                5           9
 4                  5                9          20
 5                 10               11          50
 6                 15               12          90
 7                 20               12         140
 8                 17               10         136
 9                  6                7          54
10                  3                5          30
11                  0                3           0
12 or more          0                3           0

Totals             80               80         531
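The 'calc. freq.' column can be regenerated from the totals of the table; a sketch of the arithmetic:

```python
from math import exp, factorial

total_cells, total_squares = 531, 80
m = total_cells / total_squares        # estimated mean cells per square

def poisson(r):
    # Poisson probability (3.5.1) with the estimated mean
    return (m**r / factorial(r)) * exp(-m)

p3 = poisson(3)
print(round(p3, 3))                    # about 0.064
print(round(total_squares * p3, 1))    # about 5.1 squares
```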

Bacterial dilutions
If samples of a dilute suspension of bacteria are subcultured into
several replicate tubes then bacterial growth will result in those tubes
in which the added sample contained one or more viable bacteria. The
proportion of tubes showing growth is therefore an estimate of the
probability that a sample contains one or more organisms, P(r ≥ 1).
If the bacteria in the sample suspensions were randomly and independently
distributed throughout the suspending medium the number of bacteria
in unit volume of solution (r) would follow the Poisson distribution;
this enables an estimate of the mean number of cells per sample (m)
to be made from the observed proportion of subcultures showing
growth (P(r ≥ 1) = p, say).
From (3.5.1) the probability of the sample being sterile (r = 0)
is P(0) = e^−m and therefore, by (2.4.3) (cf. (3.5.2)),

p = P(r ≥ 1) = 1−P(0) = 1−e^−m.

By solving this for m the mean number of viable organisms per sample
is estimated to be

m = −log_e(1−p)

(remember log_e e^x = x). For example, if 40 per cent of cultures are non-
sterile, p is 0·4 and m = −log_e(1−0·4) = 0·51 organisms per sample.
The error of this estimate depends on the number of subcultures on
which the estimate of p is based and is usually quite large.
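The dilution calculation is a one-liner; a sketch using the 40 per cent example from the text:

```python
from math import log

p = 0.40              # observed proportion of non-sterile tubes
m = -log(1 - p)       # mean viable organisms per sample, since P(0) = e^-m
print(round(m, 2))    # 0.51
```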

The quantal release of acetylcholine at nerve terminals


In a low-calcium, high-magnesium medium the muscle end-plate (or
post-synaptic) potential elicited by nerve stimulation is reduced in size
because the number of quanta of acetylcholine released is reduced. A
certain proportion of stimuli produce no response at all ('failures').
The number of quanta of acetylcholine released per stimulus has
been found to be Poisson distributed (see Martin 1966). In other
words, the proportion of stimuli causing release of r quanta, P(r), is
observed to be predicted well by (3.5.1). This is illustrated by an example
given by Katz (1966). The mean response to a single quantum (mean of
78 spontaneous miniature end-plate potentials) was 0·4 mV. The mean
of the responses to 198 nerve impulses was 0·933 mV, the individual
responses tending to be either zero ('failures', r = 0) or integer
multiples of 0·4 mV corresponding to the release of an integral number
(r) of quanta. Assuming that the response (mV) is proportional to the
number of quanta released, the mean number released is estimated to
be m = 0·933/0·4 = 2·33 quanta per stimulus. The proportion of
stimuli releasing r quanta is therefore predicted, from (3.5.1), to be
(2·33)^r e^−2·33/r!. The predicted number of impulses out of 198 releasing r
quanta is simply 198 times this proportion. The results in Table 3.6.2
show that the Poisson prediction agrees well with observations.

TABLE 3.6.2
Comparison of observed and Poisson distributions of the number of quanta
of acetylcholine released per stimulus (Katz, 1966; based on Boyd and
Martin, 1956)

Number of quanta    Observed     Poisson
released (r)        frequency    frequency

0                      18           19
1                      44           44
2                      55           52
3                      36           40
4                      25           24
5                      12           11
6                       5            5
7 or more               3            3

Total                 198          198

The predicted frequencies are only approximate because the observed
mean has been substituted for the population mean, m, in (3.5.1).
The observed frequencies are also approximate because the response to a
single quantum is itself quite variable (the standard deviation
was about 100 × 0·086/0·4 = 21·5 per cent of the mean), so the responses
to different numbers of quanta overlap somewhat, and when the response is
large, it is no longer directly proportional to the number of quanta released.
Details are discussed by Martin (1966) and Katz (1966). When corrections are made for
these factors the observed distribution of responses (in mV) is fitted closely by
the calculated distribution.

Furthermore, assuming a Poisson distribution of r, m can be estimated
from the observed number of failures, viz. 18 (from Table 3.6.2), because
(from (3.5.1)) P(0) = e^−m, so m = −log_e P(0) = log_e(198/18) =
2·4 quanta, agreeing quite well with the estimate m = 2·33 from the
mean response, which was obtained without assuming a Poisson
distribution.
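Both estimates of the quantal content can be reproduced in a few lines (small differences from the printed table reflect rounding). A sketch:

```python
from math import exp, factorial, log

n_stimuli = 198
m = 0.933 / 0.4                        # direct estimate: 2.33 quanta

# Predicted numbers of stimuli releasing r = 0, 1, ..., 6 quanta (3.5.1)
predicted = [n_stimuli * (m**r / factorial(r)) * exp(-m) for r in range(7)]
print([round(x) for x in predicted])

# Estimate of m from the 18 observed failures (r = 0)
m_from_failures = -log(18 / n_stimuli)
print(round(m_from_failures, 1))       # 2.4
```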

Estimation of the quantal content, m, by the 'coefficient of variation' method

If the depolarization produced by a single quantum (miniature end-plate
potential) is denoted s, and the quantal content is m as above, then the end-plate
potential can be represented (when s is small enough for the effects to be additive) by

S = Σ (from i = 1 to m) s_i,     (3.6.1)
which is the sum of a variable number (m) of random variables (s). It is stated in
(2.7.19), and proved in § A1.4, that if the miniature end-plate potentials are
independent of each other (which is probably so), and if the end-plate potential, S,
is produced by a random sample (of variable size, m) from the population of
single quanta (which is less certain), then the square of the coefficient of variation
of the end-plate potential size is given by

𝒞²(S) = 𝒞²(s)/m̄ + 𝒞²(m),     (3.6.2)

where 𝒞(s) and 𝒞(m) are the population coefficients of variation of s and m
defined in (2.6.4). This result does not depend on assuming any particular distri-
bution for either s or m (see § A1.4).
Suppose, for example, that m is binomially distributed, which might be expected
if the nerve impulse caused there to be a constant probability 𝒫 of releasing each
of a population of N quanta, so the true mean number of quanta released is
m̄ = N𝒫 as in § 3.4, and, on average, a proportion 𝒫 of the population is released.
According to (3.4.4), var(m) = N𝒫(1−𝒫) = m̄(1−𝒫) and therefore 𝒞²(m)
= var(m)/m̄² = (1−𝒫)/m̄. Substituting this into (3.6.2) gives

𝒞²(S) = 𝒞²(s)/m̄ + (1−𝒫)/m̄.     (3.6.3)

Solving for m̄ gives, in this case of binomial distribution of m,

m̄ = (𝒞²(s) + 1 − 𝒫)/𝒞²(S).     (3.6.4)

The case where m is Poisson-distributed is obtained when 𝒫 tends to zero (see
§ 3.5), or directly from (3.6.2) using var(m) = m̄ from (3.5.3), i.e. 𝒞²(m) = 1/m̄,
giving

m̄ = (𝒞²(s) + 1)/𝒞²(S).     (3.6.5)

This, and the other results in this section, are discussed in the review by Martin
(1966). An estimate of m̄ is obtained by substituting the experimental estimates of
𝒞(s) and 𝒞(S) into (3.6.5).
Equations (3.6.4) and (3.6.5) do not entirely account for the experimental
observations and it was pointed out by del Castillo and Katz (see Martin 1966)
that if we drop the rather unreasonable assumption that all the quanta have the
same probability of release, then 𝒞²(m) will be less than the binomial value
(1−𝒫)/m̄, which in turn is less than the Poisson value, 1/m̄. It can be shown
(e.g. Kendall and Stuart (1963, p. 127)) that if each quantum has a different
probability (𝒫_i) of release, and that if these probabilities are constant from one
nerve impulse to the next, then 𝒞²(m) = (1−𝒫̄−var(𝒫)/𝒫̄)/m̄, where 𝒫̄ is the
mean probability of a quantum being released (i.e. Σ𝒫_i/N) and var(𝒫) is the
variance of the N values of 𝒫_i (zero in the binomial case when all the 𝒫_i are
identical). If this is substituted in (3.6.2), solving for m̄ gives

m̄ = (𝒞²(s) + 1 − 𝒫̄ − var(𝒫)/𝒫̄)/𝒞²(S),     (3.6.6)

which is smaller than is given by either (3.6.4) or (3.6.5). In the case where N
is very large (3.6.6), like (3.6.4), tends to the Poisson form (3.6.5) despite the
variability of 𝒫.
As an example consider the observations discussed above. The observed value
for the response to one quantum was s̄ = 0·4 mV with standard deviation 0·086
mV, i.e. coefficient of variation C(s) = 0·086/0·4 = 0·215. The observed mean end-
plate potential was S̄ = 0·933 mV with a standard deviation of 0·634 mV (this
value is taken for the purposes of illustration, the original figure not being
available) and hence C(S) = 0·634/0·933 = 0·680. If m were Poisson-distributed
its mean value could be estimated from (3.6.5) as

m̄ = (0·215² + 1)/0·680² = 1·046/0·462 = 2·26,

which agrees quite well with the estimate (viz. 2·4) from the proportion of failures,
which also assumes a Poisson distribution, and the direct estimate 0·933/0·4
= 2·33 which does not.
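The coefficient-of-variation arithmetic, using the figures quoted above (the tiny discrepancy with the printed 2·26 is rounding):

```python
s_mean, s_sd = 0.4, 0.086      # single-quantum response (mV)
S_mean, S_sd = 0.933, 0.634    # end-plate potential (mV)

C_s = s_sd / s_mean            # about 0.215
C_S = S_sd / S_mean            # about 0.680
m_hat = (C_s**2 + 1) / C_S**2  # Poisson form, eqn (3.6.5)
print(round(m_hat, 2))         # about 2.27
```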

The number of spontaneous miniature end-plate potentials in unit time

The number of single quanta released in unit time is observed to
follow the Poisson distribution, i.e. quanta appear to be released
spontaneously in a random fashion. This phenomenon is discussed in
Chapter 5, after continuous distributions have been dealt with, so
that the continuous distribution of intervals between random events can be
discussed.

3.7. Theoretical and observed variances: a numerical example
concerning random radioisotope disintegration
The number of unstable nuclei disintegrating in unit time is observed
to be Poisson-distributed over periods during which decay is negligible
(see Appendix A2.5), and disintegration is therefore a random process
in time.
Since the variance of the Poisson distribution (3.5.3) is estimated
by the mean number of disintegrations per unit time, the uncertainty
of a count depends only on the number of disintegrations counted and
not on how long they took to count, or on whether one long count or
several shorter counts were done. The example is based on one given by
Taylor (1957). The values of x listed are n = 10 replicate counts, each

over a period of 5 min, of a radioactive sample. The decay over the
period of the experiment is assumed to be negligible. The x values are

10536  10636  10398  10393  10586
10381  10479  10401  10262  10403

The total number of counts is Σx = 104475 counts/50 min, mean count
x̄ = Σx/n = 10447·5 counts/5 min, and count-rate = 10447·5/5 =
2089·5 counts/min. What is the uncertainty in the count-rate? Its
variance can be calculated in two ways.

(a) Theoretical Poisson variance


The number of counts observed in a 50-min unit of time was 104475,
so if the number of counts in unit time were Poisson-distributed the estimate
of the variance of the variable 'number of counts in 50 minutes' would
be 104475 (from (3.5.3)). In this case the total number of counts was the
sum of ten 5-min counts. In general, according to (2.7.4), var(Σx) =
n var(x), so if x is the number of counts in 5 min, the variance of a
single 5-min count is estimated to be

var(x) = var(Σx)/n = 104475/10 = 10447·5.

If there had only been one 5-min count, say the first one, its variance
would have been estimated as 10536, a similar figure.
However, what is really wanted is the variance of the count-rate
per minute, determined from 50 min of counting in the experiment,
not the variance of a 1-min count. The count-rate is Σx/50 counts/min.
In general, from (2.7.5), var(ax) = a²var(x), where a is a constant
(1/50 in this case), therefore

var(Σx/50) = var(Σx)/50² = 104475/2500 = 41·79.

The standard deviation of the mean count-rate (2089·5 counts/min) is
therefore √(41·79) = 6·46 counts/min.
If there had been only a single 5-min count, say 10536, the mean
count-rate would have been 10536/5 = 2107·2 counts/min, and, by a
similar argument, its estimated standard deviation would have been
√(10536/5²) = 20·5 counts/min. Thus when the number of observa-
tions is reduced tenfold, the standard deviation of the mean goes up by
√10, as expected from (2.7.9) (6·46 × √10 = 20·4).
It can be seen that the uncertainty in the count depends only on the
total number of counts. If it is known that the count-rate has a Poisson
distribution (as it will have if the counter is functioning correctly) its
uncertainty can be estimated without having to do replicate observa-
tions.
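The Poisson ('theoretical') calculation can be sketched directly from the ten counts:

```python
from math import sqrt

counts = [10536, 10636, 10398, 10393, 10586,
          10381, 10479, 10401, 10262, 10403]   # ten 5-min counts

total = sum(counts)            # 104475 counts in 50 min
rate = total / 50              # counts/min
var_rate = total / 50**2       # Poisson: variance = total count, rescaled
print(rate, round(sqrt(var_rate), 2))   # 2089.5 and about 6.46
```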

Observed variance

In this particular case there are replicate counts so the variance of
an observation (a 5-min count) can be estimated in the usual way
using (2.6.2),

var(x) = Σ(x−x̄)²/(n−1) = 111938/9 = 12437·5.

This is quite close to the estimate of 10447·5 found above. Because
there are ten 5-min counts the estimate of count-rate will be based on
the mean of these, the variance of which is estimated, using (2.7.8), to be

var(x̄) = var(x)/n = 12437·5/10 = 1243·75.

And the variance of the mean count-rate per minute will be, from
(2.7.5),

var(x̄/5) = var(x̄)/5² = 1243·75/25 = 49·75.
By using the scatter of replicate counts, the standard deviation of
the count-rate (2089·5 counts/min) is therefore estimated to be
√49·75 = 7·05 counts/min. This estimate, which has not involved any
assumption about the distribution of the observations, agrees well with
the estimate (6·46 counts/min) calculated assuming that the count-rate
was Poisson-distributed. This suggests that the assumption was not
far wrong. With either estimate the coefficient of variation of the
count-rate, by (2.6.4), comes out to about 0·3 per cent.
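The replicate ('observed') calculation, for comparison (the sum of squares differs trivially from the printed 111938 because of rounding in the text):

```python
from math import sqrt

counts = [10536, 10636, 10398, 10393, 10586,
          10381, 10479, 10401, 10262, 10403]
n = len(counts)
mean = sum(counts) / n                       # 10447.5 counts per 5 min

var_x = sum((x - mean)**2 for x in counts) / (n - 1)   # eqn (2.6.2)
var_rate = var_x / n / 5**2                  # variance of mean rate/min
print(round(var_x, 1), round(sqrt(var_rate), 2))       # about 12437 and 7.05
```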

The effect of allowing for background count-rate

Counting equipment registers a background rate even when there
is no sample in it and this must be subtracted from the sample count-
rate. There is uncertainty in the background count as well as the
sample count and this must be allowed for.
To illustrate what happens when the sample count-rate is not
much above the background rate suppose that 20000 background
counts were recorded in 10 min. The net count-rate is thus 2089·5−2000
= 89·5 counts/min.
By arguments similar to those above, the estimated variance of the
background count/min = var(count/10) = var(count)/10² = count/10²
= 20000/100 = 200.
The estimated variance of the net count-rate (sample minus back-
ground) is required. Because the counts are independent this is, by
(2.7.3), the sum of the variances of the two count-rates

var(sample−background) = var(sample)+var(background)
= 49·75+200 = 249·75,

and the estimated standard deviation of the net count-rate (89·5
counts/min) is therefore √249·75 = 15·8 counts/min. The coefficient
of variation of the net count, by (2.6.4), is now quite large (17·7 per
cent), and if the net count had been much smaller the difference be-
tween sample and background would have been completely swamped
by the counting error (for a fuller discussion see Taylor (1957)).
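The background correction can be sketched the same way: variances of the two independent rates add when one is subtracted from the other.

```python
from math import sqrt

sample_rate, var_sample = 2089.5, 49.75   # counts/min and its variance
bg_counts, bg_minutes = 20000, 10
bg_rate = bg_counts / bg_minutes          # 2000 counts/min
var_bg = bg_counts / bg_minutes**2        # Poisson: 200

net_rate = sample_rate - bg_rate          # 89.5 counts/min
sd_net = sqrt(var_sample + var_bg)        # about 15.8 counts/min
print(net_rate, round(sd_net, 1))
print(round(100 * sd_net / net_rate, 1))  # coefficient of variation, per cent
```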

4. Theoretical distributions. The
Gaussian (or normal) and other
continuous distributions

'When it was first proposed to establish laboratories at Cambridge, Todhunter,
the mathematician, objected that it was unnecessary for students to see experiments
performed, since the results could be vouched for by their teachers, all of
them men of the highest character, and many of them clergymen of the Church of
England.'
BERTRAND RUSSELL 1931
(The Scientific Outlook)

4.1. The representation of continuous distributions in general


So far only discontinuous variables have been dealt with. In many
cases it is more convenient, though, because only a few significant
figures are retained, not strictly correct, to treat the experimental
variables as continuous. For example, changes in blood pressure,
muscle tension, daily urinary excretion, etc. are regarded as potentially
able to have any value. The difficulties involved in dealing with
this situation have already been mentioned in § 3.1 and will now be
elucidated.
The discontinuous distributions so far discussed have been repre-
sented by histograms in which the height (along the ordinate) of the
blocks was a measure of the probability or frequency of observing a
particular value of the variable (along the abscissa). However, if one
asks 'What is the probability (or frequency) of observing a muscle ten-
sion of exactly 2·0000 ... g?', the answer must be that this probability
is infinitesimally small, and cannot therefore be plotted on the ordinate.
What can sensibly be asked is 'What is the probability (or frequency)
of observing a muscle tension between, say, 1·5 and 2·5 g?'. This fre-
quency will be finite, and if many observations of tension are made a
histogram can be plotted using the frequency (along the ordinate) of
making observations between 0 and 0·5 g, 0·5 and 1·0 g, etc., as shown
in Fig. 4.1.1. If there were enough observations it would be better to
reduce the width of the classes from 0·5 g to, say, 0·1 g as shown in
Fig. 4.1.2. This gives a smoother-looking histogram, but because there

are more classes the number (or probability) of observations falling
in a particular class will be reduced. The blocks will also be drawn
narrower though it will usually be convenient to keep them about the

Fig. 4.1.1. Histogram of muscle tensions; 'observations' grouped into
classes 0·5 g wide and the proportion of observations in each class plotted as
ordinate. Total height of all blocks = 1·0.

Fig. 4.1.2. Same 'observations' as Fig. 4.1.1 but grouped into narrower
classes (0·1 g), showing the shape of the distribution more clearly. Total height
of all blocks = 1·0 as before.

same height, as shown. This suggests that it might be more convenient
to represent the probability of an observation falling in a particular
class by the area of the block rather than its height. If the width of
the block (class interval) is constant then the area of the block is
proportional to its height so in ordinary histograms the area is in fact
proportional to the probability. When the class width is reduced, the
reduction in width of the blocks will reduce their area, and hence the
probability they represent, without having to reduce their height. An
example of a histogram with unequal block widths occurs in §§ 14.1–
14.3. Fig. 14.2.3(a) shows it drawn with height representing probability
and Fig. 14.2.3(b) shows the preferable representation of the histogram
with area representing probability.
Using the convention that the probability is represented by the area
of the blocks rather than their height, the condition that the sum of all
the probabilities must be 1·0 is expressed by defining the total area of
the blocks as 1·0 (see (3.5.2) for example).

Fig. 4.1.3. Continuous distribution of muscle tensions. Ordinate is probability
density, f(x), i.e. a function such that the area under the curve represents
probability. The total area under the curve is 1·0. The probability of observing
a value equal to or less than x is denoted P, or F(x), and is the area under the
curve up to x.

If the class intervals are made narrower and narrower, the probability
attached to each becomes smaller. The probability of observing a
muscle tension (x) between 1·999 and 2·001 g is small, and a very
large number of observations would be necessary to use intervals as
narrow as this. When the class interval becomes infinitesimally narrow
then the probability represented by each block (i.e. the probability
that x will lie within the interval dx) is also infinitesimal, say dP, and
the graph becomes continuous instead of being made up of finite
blocks, as shown in Fig. 4.1.3. It now represents an infinite population
and it can never be observed exactly. It is a mathematical idealization.

The area of a block, i.e. the probability of x falling in the interval of
width dx (between x and x+dx), must now be written

dP = f(x)dx,     (4.1.1)

where the function f(x) is the ordinate of the curve shown in Fig. 4.1.3
(i.e. the height of the block), and x is the continuous variable (e.g.
blood pressure or muscle tension) the distribution of which is being
defined (see, e.g., Thompson, 1965, if the notation of (4.1.1) is not
understood). The function f(x) is known as the probability density
function (or simply density) of x. A value of this function is called the
probability density of a particular value of x. It is not the probability
of that value of x, but merely a function that defines a curve such that
the area under the curve represents probability. For example, the
uniformly shaded area in Fig. 4.1.3, as a proportion of the whole
area under the curve, is the probability that a value of x will lie
between two specified values, x₁ and x₂. The summation of the in-
finitesimal blocks of which this area is made up is handled mathe-
matically by integration so this area can be written as the integrated
form of (4.1.1),

P(x₁ ≤ x ≤ x₂) = ∫ (from x₁ to x₂) f(x)dx.     (4.1.2)

Similarly, the probability that a value of x greater than x₂ will be
observed is equal to the area above the point x₂. How far along the
x-axis the distribution curve extends depends on the particular distri-
bution under consideration. The curve may reach the axis at some
finite minimum or maximum value of x, implying that observations
less or greater than this value are impossible; or the curve may, like
the Gaussian (or normal) distribution, be asymptotic to the x-axis so
that any value of x is allowed, though the probability of observing
values far removed from the mean soon becomes small. In the latter
case the probability of observing a value of x equal to or less than x₁
(the area under the distribution curve below x₁) would be written

P(x ≤ x₁) = ∫ (from −∞ to x₁) f(x)dx.     (4.1.3)

This area is said, in statistical jargon, to be the lower tail of the distri-
bution. It can be called P, or F(x₁), and is vertically shaded in Fig.
4.1.3. It depends, of course, on the value of x₁ chosen, i.e. it is a function
of x₁.

68 § 4.1
A more satisfactory way of writing the same thing is to use a special
symbol, say X, to distinguish x considered as a random variable, from
a particular value of the random variable, denoted simply x. The
probability of observing a value of the variable (e.g. musoletension) equal
to or less than some specified value x (e.g. 2·0 g) as in (4.1.3), is written
in this notation ast

P(x ~ x) = fICO J(x)dx == F(x), or p. (4.1.4)

This is referred to as the distribution Junction of x, or as the cumuiatiw


distribution. The area below x in Fig. 4.1.3, F(x), is plotted against x

Fig. 4.1.4. Distribution function, F(x), for the distribution shown in Fig.
4.1.3. The probability of observing a value of x or less is plotted against x. The
area between x₁ and x₂ in Fig. 4.1.3 is F(x₂)−F(x₁) = 0·988−0·894 = 0·094, the
probability of an observation falling between x₁ and x₂.

in Fig. 4.1.4. Examples of cumulative distributions occur in §§ 5.1 and
14.2. The area, F(x), approaches 1·0 as x becomes very large, i.e. it is
almost certain that the variable (e.g. muscle tension) will be less than

† Another, mathematically better, way of writing exactly the same thing is

P(X ≤ x) = ∫ (from −∞ to x) f(u)du.

The variable u does not appear in the final result.

a specified very large value (e.g. 100 kg). Differentiating (4.1.4) shows
that the distribution function is related to the probability density, as
suggested by (4.1.1), thus

dF(x)/dx = f(x).    (4.1.5)
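The relations (4.1.4) and (4.1.5) can be checked numerically. The sketch below is not from the book: it uses the standard normal density (defined in the next section) and an arbitrary grid size, approximating F(x) by the trapezoidal rule, and confirms that F approaches 1·0 for large x and that the slope of F recovers f.

```python
import math

def f(x):
    # probability density of a standard normal variable (any density would do)
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def F(x, lower=-10.0, n=20000):
    # distribution function F(x), eqn (4.1.4): area under f from effectively
    # minus infinity (here -10, where f is negligible) up to x, approximated
    # by the trapezoidal rule
    h = (x - lower) / n
    area = 0.5 * (f(lower) + f(x)) + sum(f(lower + i * h) for i in range(1, n))
    return area * h

p = F(1.0)                                        # lower tail area below x = 1
total = F(10.0)                                   # approaches 1.0 for large x
slope = (F(1.0 + 1e-4) - F(1.0 - 1e-4)) / 2e-4    # dF/dx, eqn (4.1.5)
```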

4.2. The Gaussian, or normal, distribution. A case of wishful thinking?

The assumption that the errors in real experimental observations
can be described by the normal distribution (4.2.1) has dominated
large areas of statistics for many years. The assumption is virtually
always untested, and the extent to which it is mere wishful thinking
will be discussed in this section, after the distribution has been defined,
and in § 6.2, where the merits of methods not involving the assumption
of Gaussian errors are considered.

Definition of the distribution

The Gaussian distribution, often, but inappropriately, known as
the normal distribution, is defined by the probability density function
(see § 4.1)

f(x) = {1/σ√(2π)} exp[−(x−μ)²/2σ²],    (4.2.1)

where π has its usual value, and μ and σ are constants. The factor
1/σ√(2π) is a constant such that the total area under the curve
(from x = −∞ to x = +∞) is 1·0. The notation exp(z) is used to
stand for e^z when the exponent, z, is a long expression that would be
inconvenient to write as a superscript. If f(x) is plotted against x the
graph comes out as shown in Fig. 4.2.1.
It is a symmetrical bell-shaped curve asymptotic to, i.e. never quite
reaching, the x-axis. Being continuous it represents an infinite population (see § 4.1). The constant μ is the population mean† and also the
population median and mode because the distribution is symmetrical and
unimodal; see §§ 2.5 and 4.5. The constant σ measures the width‡ of

† This is proved in § A1.1.

‡ The distance from μ to the point of inflection (maximum slope) on each side of
the mean. Differentiating (4.2.1) twice with respect to x and equating to zero gives
x = μ ± σ. The population variance is defined in § A1.2.

the curve as shown in Fig. 4.2.1, i.e. it is a measure of the scatter of the
values of x, and is the population standard deviation of x. An estimate
of σ could be made from a sample of observations, taken from the
population represented by Fig. 4.2.1, using (2.6.2). The distribution is
completely defined by the two parameters μ and σ.
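The last point can be illustrated with a small simulation (a hypothetical sketch; the sample size of 1000 and the seed are arbitrary choices): a sample is drawn from a Gaussian population with μ = 6 and σ = 3, and both parameters are then estimated from the sample, as eqn (2.6.2) does for σ.

```python
import random
import statistics

random.seed(1)
# sample of observations from a Gaussian population with mu = 6, sigma = 3
sample = [random.gauss(6.0, 3.0) for _ in range(1000)]

mean_est = statistics.fmean(sample)   # estimate of the population mean mu
sd_est = statistics.stdev(sample)     # estimate of sigma (divisor N - 1, as in eqn (2.6.2))
```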

Is the widespread use of the normal distribution justified?

'Everybody firmly believes in it [the normal distribution] because the mathematicians imagine that it is a fact of observation, and observers that it is a
theory of mathematics' (quoted by Poincaré 1892).

From the point of view of someone trying to interpret real observations (and who else is statistics for?) the only possible justification for
the common assumption of normality would be the experimental
FIG. 4.2.1. Gaussian ('normal') distribution. 4·6 per cent of the observations
in the population are more than two population standard deviations from the
mean (the shaded area is 4·6 per cent of the total area). The value 4·6 does not
apply to samples or, in general, to distributions other than the Gaussian (see
§§ 4.4 and 4.5).

demonstration that the methods based on this assumption give results
that are correct, or at least sufficiently nearly correct for the purpose in
hand.

The truth is that no such demonstration exists. The many textbooks,
elementary and not so elementary, describing methods that mostly
depend on this assumption evade this awkward fact in a variety of
ways. The more advanced books usually say something like 'If x were
normally distributed then . . . would follow', which is true but not
very helpful in real life. In more elementary books one often finds
(to quote two) remarks such as 'It is not infrequently found that a
population represented in this way [i.e. by a Gaussian curve] is sufficiently accurately specified for the purpose of the inquiry', or 'Many of
the frequency functions applicable to observed distributions do have a
normal form'. Such remarks are, at least as far as most laboratory
investigations are concerned, just wishful thinking. Anyone with
experience of doing experiments must know that it is rare for the
distribution of the observations to be investigated. The number of
observations from a single population needed to get an idea of the form
of the distribution is quite large (a hundred or two at least), so this is
not surprising. In the vast majority of cases the form of the distribution
is simply not known; and, in an even more overwhelming majority of
cases, there is no substantial evidence regarding whether or not the
Gaussian curve is a sufficiently good approximation for the purposes of
the inquiry. It is simply not known how often the assumption of
normality is seriously misleading. See § 4.6 for tests of normality.

That most eminent amateur statistician, W. S. Gosset ('Student', see
§ 4.4), wrote, in a letter dated June 1929 to R. A. Fisher, the great
mathematical statistician, '. . . although when you think about it you
agree that "exactness" or even appropriate use depends on normality,
in practice you don't consider the question at all when you apply your
tables to your examples: not one word.'

For these reasons some methods have been developed that do not
rely on the assumption of normality. They are discussed in § 6.2. However, many problems can still be tackled only by methods that involve
the normality assumption, and when such a problem is encountered
there is a strong temptation to forget that it is not known how nearly
true the assumption is. A possible reason for using the Gaussian method
in the absence of evidence one way or the other about the form of the
distribution, is that an important use of statistical methods is to
prevent the experimenter from making a fool of himself (see Chapters
1, 6, and 7). It would be a rash experimenter who presented results that
would not pass a Gaussian test, unless the distribution was definitely
known to be not Gaussian.

It is commonly said that if the distribution of a variable is not
normal, the variable may be transformed to make the distribution
normal (for example, by taking the logarithms of the observations, see
§ 4.5). As pointed out above, there are hardly ever enough observations
to find out whether the distribution is normal or not, so this approach
can rarely be used. Transformations are discussed again in §§ 4.6, 11.2
(p. 176) and 12.2 (p. 221).

Various other reasons are often given for using Gaussian methods.
One is that some Gaussian methods have been shown to be fairly immune
to some sorts of deviations from normality, if the samples are not too
small. Many methods involve the estimation of means and there is an
ingenious bit of mathematics known as the central limit theorem that
states that the distribution of the means of samples of observations will
tend more and more nearly to the Gaussian form as the sample size
increases whatever (almost) the form of the distribution of the observations themselves (even if it is skew or discontinuous). These remarks
suggest that when one is dealing with reasonably large samples,
Gaussian methods may be used as an approximation. The snag is that
it is impossible to say, in any particular case, what is a 'reasonable'
number of observations, or how approximate the approximation will
be.

Further discussion of the assumptions made in statistical calculations
will be found particularly in §§ 6.2 and 11.2.
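The tendency described by the central limit theorem can be watched directly. In the following sketch (the exponential population, sample sizes, and number of trials are all arbitrary choices, not from the text) the skewness of the distribution of sample means shrinks as the sample size grows.

```python
import random
import statistics

random.seed(42)

def skewness(xs):
    # a simple moment coefficient of skewness; zero for a symmetrical distribution
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def sample_means(n, trials=2000):
    # means of 'trials' samples, each of size n, drawn from a strongly
    # skewed (exponential) population with population mean 1.0
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

skew_small = skewness(sample_means(2))    # means of 2: still clearly skew
skew_large = skewness(sample_means(50))   # means of 50: much nearer the Gaussian form
```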

4.3. The standard normal distribution


Applications of the normal distribution often involve finding the
proportion of the total area under the normal curve that lies between
particular values of the abscissa x. This area must be obtained by
evaluating the integral (4.1.2), with the normal probability density
function (4.2.1) substituted for f(x). The integral cannot be explicitly
solved. The answer comes out as the sum of an infinite number of terms
(obtained by expanding the exponential). In practice the only convenient
method of obtaining areas is from tables. For example, the Biometrika
Tables, Pearson and Hartley (1966, Table 1), give the area under the
standard normal distribution (defined below) below u (or the area
above −u, which is the same), i.e. the area between −∞ and u (see
below). In this table u and the area are denoted X and P(X) respectively.
Fisher and Yates (1963, Table 1, p. 45) give the area above u (= area
below −u), the value of u being denoted x in this table.†

If tables had to be constructed for a wide enough range of values of
μ and σ to be useful they would be very voluminous. Fortunately
this is not necessary since it is found that the area lying within any
given number of standard deviations on either side of the mean is the

† Tables of Student's t (see § 4.4) give, on the line for infinite degrees of freedom, the
area below −u plus the area above +u, i.e. the area in both tails of the distribution of u.

same whatever the values of the mean and standard deviation. For
example it is found that:

(1) 68·3 per cent of the area under the curve lies within one standard
deviation on either side of the mean. That is, in the long run, 68·3
per cent of random observations from a Gaussian population would
be found to differ from the population mean by not more than one
population standard deviation.

(2) 95·4 per cent of the area lies within two standard deviations (or
95·0 per cent within ±1·96σ). The 4·6 per cent of the area outside
±2σ is shaded in Fig. 4.2.1.

(3) 99·7 per cent of the area lies within three standard deviations.
Only 0·3 per cent of random observations from a Gaussian population
are expected to differ from the mean by more than three standard
deviations.
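These three percentages follow from the error function, to which the normal integral is related by P(−k ≤ u ≤ +k) = erf(k/√2). A minimal check, using only the standard library (the function name is our own):

```python
import math

def area_within(k):
    # proportion of a Gaussian population lying within k population
    # standard deviations of the mean: P(-k <= u <= +k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2.0))

within_1 = area_within(1.0)   # about 0.683
within_2 = area_within(2.0)   # about 0.954
within_3 = area_within(3.0)   # about 0.997
```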

It follows that all normal distributions can be reduced to a single
distribution if the abscissa is measured in terms of deviations from the
mean, expressed as a number of population standard deviations. In
other words, instead of considering the distribution of x itself it is
simpler to consider the distribution of

u = (x − μ)/σ.    (4.3.1)

The distribution of u is called the standard normal distribution.

It is still a normal distribution because u is linearly related to the
normally distributed x (μ and σ being constants for any particular
distribution), but it necessarily† always has a mean of zero and a
standard deviation of 1·0. The numerator, x − μ, is a normally distributed variable with a population mean of zero (because the long run
average value of x is μ) and variance σ². To illustrate this consider a
normally distributed variable x with population mean μ = 6 and
population standard deviation σ = 3. It can be seen from Fig. 4.3.1 that
the distribution of (x−μ), i.e. of (x−6), has a mean of zero but a
standard deviation unchanged at 3 (cf. (2.7.7)), and that when this
quantity is divided by σ the standard normal distribution (mean = 0,
standard deviation = 1) results.

† See §§ A1.1 and A1.2. The standard form of a distribution is defined in § A1.2.

In terms of the standard normal distribution the areas (obtainable
from the tables referred to above) become

(1) 68·3 per cent of the area lies between u = −1 and u = +1
(and thus 15·85 per cent lies below −1, and 15·85 per cent above
+1),
FIG. 4.3.1. Relation of normal distribution to standard normal distribution.
(a) x is normally distributed with population mean μ = 6 and population standard
deviation σ = 3. (b) (x − μ) is normally distributed with population mean = 0
and population standard deviation = 3. (c) u = (x−μ)/σ = (x−6)/3 in this case
is normally distributed with population mean = 0 and population standard
deviation = 1.

(2) 95 per cent of the area lies between u = −1·96 and u = +1·96,

(3) 99·7 per cent of the area lies between u = −3 and u = +3.

In order to convert an observation x into a value of u = (x−μ)/σ, it
is, of course, necessary to know the values of μ and σ. In real life the
values of μ and σ will not generally be known, only more or less accurate
estimates of them, viz. x̄ and s, will be available. If the normal distribution is to be used for induction as well as deduction this fact must be
allowed for, and the method of doing this is discussed in the next
section.
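Standardization is easy to verify by simulation. In this hypothetical sketch (sample size and seed are arbitrary), observations with μ = 6 and σ = 3, the values used in Fig. 4.3.1, are converted to u = (x−μ)/σ; the standardized values have mean near 0, standard deviation near 1, and about 95 per cent of them fall within ±1·96.

```python
import random
import statistics

random.seed(3)
mu, sigma = 6.0, 3.0
xs = [random.gauss(mu, sigma) for _ in range(5000)]

# standardize each observation: u = (x - mu) / sigma, eqn (4.3.1)
us = [(x - mu) / sigma for x in xs]

u_mean = statistics.fmean(us)   # near 0
u_sd = statistics.pstdev(us)    # near 1
frac_95 = sum(1 for u in us if -1.96 <= u <= 1.96) / len(us)   # near 0.95
```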

4.4. The distribution of t (Student's distribution)

The variable t is defined as

t = (x − μ)/s(x),    (4.4.1)

where x is any normally distributed variable and s(x) is an estimate
of the standard deviation of x from a sample of observations of x
(see § 2.6). Tables of its distribution are referred to at the end of the
section. It is the same as u defined in (4.3.1), except that the deviation
of a value of x from the true mean (μ) is expressed in units of the
estimated or sample standard deviation of x, s(x) (eqn (2.6.2)), rather
than the population standard deviation σ(x). As in § 4.3 the numerator,
(x − μ), is a normally distributed variable with population mean zero
(because the long run average value of x is μ, see Appendix, eqn
(A1.1.8)), and estimated standard deviation s(x).

The 'distribution of t' means, as usual, a formula, too complicated
to derive here, for calculating the frequency with which the value of t
would be expected to fall between any specified limits; see example
below. The distribution of t was found by W. S. Gosset, who wrote
many papers on statistical subjects under the pseudonym 'Student', in
a classical paper called 'The probable error of a mean' which was
published in 1908.
Gosset was not a professional mathematician. After studying
chemistry and mathematics he went to work in 1899 as a brewer at the
Guinness brewery in Dublin, and he worked for this firm for the rest of
his life. His interest in statistics had strong practical motives. The
majority of statistical work being done at the beginning of the century
involved large samples and the drawing of conclusions from small
samples was regarded as a very dubious process. Gosset realized that
the methods used for dealing with large samples would need modification
if the results were to be applicable to the small samples he had to
work with in the laboratory.

Gosset spent a year (1906-7) away from the brewery, mostly working
in the Biometric Laboratory of University College London with
Karl Pearson, and in 1908 published a paper on the distribution of t.

As an example, suppose that the normally distributed variable of
interest is x̄, the mean of a sample of 4 observations selected randomly
from a population of normally distributed values of x with population
mean μ and population standard deviation σ(x). The population
standard deviation of x̄ (or 'standard error', see § 2.7) will be σ(x̄)

FIG. 4.4.1. Continuous line. Distribution of Student's t with 3 degrees of
freedom. Ninety-five per cent of values lie between −3·182 and +3·182 (see text
for example). The 5 per cent of the area outside these values is shaded. Broken
line. Standard Gaussian (normal) distribution. 95 per cent of u values lie
between u = −1·96 and u = +1·96. The 5 per cent of the area outside these
values is shaded vertically. As the sample size (degrees of freedom) becomes very
large, the t distribution becomes identical with the standard normal distribution.

= σ(x)/√4 (by (2.7.9)) and the population mean of x̄ will be μ, the
same as for x. (See Appendix 1, (A1.2.3).) Therefore if a very large
number of samples of 4 were taken, and if for each u = (x̄ − μ)/σ(x̄)
(from the definition (4.3.1)) were calculated, it would be found that in
the long run 95 per cent of the values of u would lie between u = −1·96
and u = +1·96, as discussed in § 4.3 and illustrated in Fig. 4.4.1.

However, if σ(x) were not known, an estimate of it, s(x), could be
calculated from each sample of 4 observations using (2.6.2) as in the
example in § 2.7, and from each sample s(x̄) = s(x)/√4 obtained by
(2.7.9). For each sample t = (x̄ − μ)/s(x̄) (from the definition (4.4.1))
could now be calculated. The values of x̄ would be the same as those
used for calculating u, but the value of s(x̄) would differ from sample
to sample (whereas the same population value, σ(x̄), would be used in
calculating every value of u). The extra variability introduced by
variability of s(x̄) from sample to sample means that t varies over a
wider range than u, and it can be found from the tables referred to
below that it would be expected that, in the long run, 95 per cent of
the values of t would lie between −3·182 and +3·182, as illustrated in
Fig. 4.4.1.

Notice that both the distributions in Fig. 4.4.1 are based on observations from the normal distribution with population standard deviation
σ. The distribution of t, unlike that of u, is not normal, though it is
based on the assumption that x is normally distributed.

Although the definition of t (4.4.1) takes account of the uncertainty
of the estimate of σ(x), it still involves knowledge of the true mean μ
and it might be thought at first that this is a big disadvantage. It will
be found when tests of significance and confidence limits are discussed
that, on the contrary, everything necessary can be done by giving μ a
hypothetical value.
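The widening of the distribution from u to t can be reproduced by simulation. In the sketch below (sample size 4, as in the example; the number of trials and seed are arbitrary choices) the limits ±1·96 enclose about 95 per cent of the u values but considerably fewer of the t values, which need the wider limits ±3·182.

```python
import math
import random
import statistics

random.seed(7)
mu, sigma = 6.0, 3.0
t_vals, u_vals = [], []
for _ in range(20000):
    sample = [random.gauss(mu, sigma) for _ in range(4)]
    xbar = statistics.fmean(sample)
    s_xbar = statistics.stdev(sample) / math.sqrt(4)      # s(xbar), eqns (2.6.2), (2.7.9)
    t_vals.append((xbar - mu) / s_xbar)                   # t, eqn (4.4.1)
    u_vals.append((xbar - mu) / (sigma / math.sqrt(4)))   # u, eqn (4.3.1)

frac_u = sum(1 for u in u_vals if abs(u) <= 1.96) / len(u_vals)           # about 0.95
frac_t_narrow = sum(1 for t in t_vals if abs(t) <= 1.96) / len(t_vals)    # well below 0.95
frac_t_wide = sum(1 for t in t_vals if abs(t) <= 3.182) / len(t_vals)     # about 0.95
```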

The use of tables of the distribution of t

The extent to which the distribution of t differs from that of u will
clearly depend on the size of the sample used to estimate s(x). The
appropriate measure of sample size, as discussed in § 2.6, is the number
of degrees of freedom associated with s(x). If s(x) is calculated from a
sample of N observations the number of degrees of freedom associated
with s(x) is N − 1 as in § 2.6. Clearly, t with an infinite number of
degrees of freedom is the same as u, because in this case the estimate
s(x) is very accurate and becomes the same as σ(x).

Fisher and Yates (1963, Table 3, p. 46, 'The distribution of t') denote
the number of degrees of freedom n, and tabulate values such that t
has the specified probability of falling above the tabulated value or
below minus the tabulated value. Looking in the table for n = 4 − 1
= 3 and P = 0·05 gives t = 3·182 as discussed in the example above,
and illustrated in Fig. 4.4.1 in which the 5 per cent of the area outside
t = ±3·182 is shaded.
The Biometrika Tables of Pearson and Hartley (1966, Table 12,
p. 146, 'Percentage points of the t distribution') give the same sort of
table. The number of degrees of freedom is denoted ν and the probability
2Q, Q being the shaded area in one tail of Fig. 4.4.1.
4.5. Skew distributions and the lognormal distribution

In § 4.2 it was stressed that the normal distribution is a mathematical convenience that cannot be supposed to represent real life
adequately, and that it is very rare in experimental work for the
distribution of observations to be known. In those cases where the
distribution has been investigated it has often been found to be non-normal. Distributions may be more flat-topped or more sharp-topped
than the normal distribution, and they may be unsymmetrical. Unsymmetrical distributions may have positive skew as in Fig. 4.5.1
(an even more extreme case is the exponential distribution, Fig. 5.1.2),
or negative skew, as in the mirror image of Fig. 4.5.1.

FIG. 4.5.1. The lognormal distribution; a positively skewed probability
distribution. The mean value of x is greater than the median, and the mode is
less than the median. The 50 per cent of the area that lies (by definition) below
the median is shaded. For the lognormal distribution, in general, mode = antilog₁₀
(μ − 2·3026σ²) (= 5·81 in this example), median = antilog₁₀ μ (= 10·0 in this
example), mean = antilog₁₀ (μ + 1·1513σ²) (= 13·1 in this example), where
μ and σ² are the mean and variance of the (normal) distribution of log₁₀x shown in
Fig. 4.5.2. Reproduced from Documenta Geigy scientific tables, 6th edn, by permission of J. R. Geigy S.A., Basle, Switzerland.

In the case of symmetrical distributions (such as the normal) the
population mean, median, and mode (see § 2.5) are all the same, but this
is not so for unsymmetrical distributions. For example, when the
distribution of x has a positive skew, as in Fig. 4.5.1, the population
mean is greater than the population median which is in turn larger
than the population mode. There is no particular reason to prefer the
mean to the median or mode as a measure of the 'average' value of the
variable in a case like this. A reason for preferring the median is mentioned below (see also Chapter 14). The distribution of personal incomes
has a positive skew so the most frequent income (the mode) is less than
the mean income, and more people earn less than the mean income than
earn more than the mean income, because incomes above the mean are,
FIG. 4.5.2. The distribution of log₁₀x, when x is lognormally distributed as
shown in Fig. 4.5.1. This distribution is normal (by definition of the lognormal
distribution). In this example the mean (= median = mode) of log₁₀x is
μ = 1·0, and the standard deviation of log₁₀x is σ = 0·32. See text and Chapter
14. Reproduced from Documenta Geigy scientific tables, 6th edn by permission of
J. R. Geigy S.A., Basle, Switzerland.

on the whole, further from the mean than incomes below it, i.e. more
than 50 per cent of the area under the curve is below the mean, as
shown by the shading in Fig. 4.5.1.
It is usually recommended that non-normal distributions be converted to normal distributions by transforming the scale of x (see
§§ 4.2, 11.2, and 12.2). This should be done when possible, but in
most experimental investigations there is not enough information to
allow the correct transformation to be ascertained. In Chapter 14
an example is given of a variable (individual effective dose of drug)
with a positively skewed distribution (Fig. 14.2.1). In this particular
example the logarithm of the variable is found to be approximately
normally distributed (Fig. 14.2.3). In general, a variable is said to
follow the lognormal distribution, which looks like Fig. 4.5.1, if the
logarithm of the variable is normally distributed, as in Fig. 4.5.2.

In Chapter 14 the median value of the variable (rather than the
mean) is estimated. The median is unchanged by transformation, i.e.
the population median of the (lognormal) distribution of x is the
antilog of the population median (= mean = mode) of the (normal)
distribution of log x, whereas the population mode of x is smaller, and
the population mean of x greater than this quantity (cf. (2.5.4)). For
example, in Fig. 4.5.2 the median = mean = mode of the distribution
of log₁₀x is 1·0, and the median of the distribution of x in Fig. 4.5.1
is antilog₁₀ 1 = 10, whereas the mode is less than 10 and the mean larger
than 10.

Because of the rarity of knowledge about the distribution of observations in real life these theoretical distributions will not be discussed
further here, but they occur often in theoretical work and good accounts
of them will be found in Bliss (1967, Chapters 5-7) and Kendall and
Stuart (1963, p. 168; 1966, p. 93).
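These relations between mean, median, and mode can be verified numerically. The sketch below (sample size and seed are arbitrary choices) simulates a lognormal variable with the parameters of Fig. 4.5.2 (μ = 1·0 and σ = 0·32 for log₁₀x) and checks the sample median and mean against antilog₁₀ μ = 10 and antilog₁₀(μ + 1·1513σ²) ≈ 13·1.

```python
import math
import random
import statistics

random.seed(11)
mu, sigma = 1.0, 0.32          # mean and s.d. of log10(x), as in Fig. 4.5.2
# lognormal sample: x = antilog10 of a normal deviate
xs = [10.0 ** random.gauss(mu, sigma) for _ in range(20000)]

sample_median = statistics.median(xs)   # near antilog10(mu) = 10.0
sample_mean = statistics.fmean(xs)      # near 13.1, above the median

# theoretical values quoted in the caption of Fig. 4.5.1
ln10 = math.log(10.0)
theor_mean = 10.0 ** (mu + (ln10 / 2.0) * sigma ** 2)   # antilog10(mu + 1.1513*sigma^2)
theor_mode = 10.0 ** (mu - ln10 * sigma ** 2)           # antilog10(mu - 2.3026*sigma^2)
```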

4.6. Testing for Gaussian distribution. Rankits and probits

If there are enough observations to be plotted as a histogram, like
Figs. 14.2.1 and 14.2.2, the probit plot described in §§ 14.2 and 14.3
can be used to test whether variables (e.g. x and log x in § 14.2) follow
the normal distribution. For smaller samples the rankit method
described, for example, by Bliss (1967, pp. 108, 232, 337) can be used.
For two way classifications and Latin squares (see Chapter 11) there is
no practicable test. It must be remembered that a small sample gives
very little information about the distribution; but consistent non-linearity of the rankit plot over a series of samples would suggest a
non-normal distribution. The N observations are ranked in ascending
order, and the rankit corresponding to each rank looked up in tables (see
Appendix, Table A9). Each observation (or any transformation of the
observation that is to be tested for normality) is then plotted against
its rankit.

The rankit corresponding to the smallest (next to smallest, etc.) observation
is defined as the long run average (expectation, see § A1.1) of the smallest (next
to smallest, etc.) value in a random sample of N standard normal deviates (values
of u, see § 4.3). Thus, if the observations (or their transformations) are normally
distributed, the observations and rankits should differ only in scale, and by
random sampling, so the plot should, on average, be straight.
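Where tables of rankits are not to hand, the definition itself suggests a way of computing them. The sketch below (sample size N = 5 and the number of trials are arbitrary choices, not from the text) estimates rankits as the long run averages of ordered standard normal samples; the ordered observations would then be plotted against these values.

```python
import random

random.seed(5)
N = 5            # sample size whose rankits we want
trials = 20000

# rankit i = long run average of the i-th smallest of N standard normal deviates
sums = [0.0] * N
for _ in range(trials):
    draws = sorted(random.gauss(0.0, 1.0) for _ in range(N))
    for i, d in enumerate(draws):
        sums[i] += d
rankits = [s / trials for s in sums]
# the rankits are symmetrical about zero; for N = 5 the largest is about 1.163
```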

5. Random processes. The exponential
distribution and the waiting time
paradox

5.1. The exponential distribution of random intervals


DYN ..uno processes involving probability theory suoh as queues.
Brownian motion. and birth and deaths are called 8tocha8tic prOCe88e8.
This subject is disoussed further in Appendix 2. An example of interest
in physiology is the apparently random occurrence of miniature
post-junctional potentials at many synaptio junctions (reviewed by
Hartin (1966». It has been found that when the observed number (n)
of the time intervals between events (miniature end-plate potentials).
of duration equal to or less that t seconds is plotted against t. the
curve has the form shown in Fig. 5.1.1. Similar results would be
obtained with the intervals between radio80tope disintegrations; see
fA2.5.
The observations are found to be fitted by an exponential curve.

(5.1.1)

where N = total number of intervals observed and '1' = mean duration


of all N intervals (an estimate of the population mean interval, :T).
H the events were ocourring randomly it would be expected that the
number of events in unit time would follow the Poisson distribution. as
desoribed in § § 3.5 and A2.2. How would the intervals between events
be expeoted to vary if this were 80 ,
The true mean number of events in t seconds (called ". in § 3'5) is
tl:T. whioh may be written as At, where ). = 1/:T is the true mean
number of events in 1 second. Thus :T = 1/), is the mean number of
seconds per event, i.e. the mean interval between events (see (AI. 1. 11».
Aocording to the Poisson distribution (3.5.1) the probability that no
event (,. = 0) occurs within time t from any specified starting point,
i.e. the probability that the interval before the first event is greater
than I, is P(O) = e-" = e-·u . The first event must oocur either at a

time greater than, or at a time equal to or less than t. Because these
cannot both happen it follows from the addition rule, (2.4.3), that

P[interval > t] + P[interval ≤ t] = 1

and thus

P[interval ≤ t] = F(t) = 1 − e^(−λt) (for t ≥ 0).    (5.1.2)

(The distribution function, F, was defined in (4.1.4).)
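Eqn (5.1.2) is easy to check against simulated random events. In this sketch (rate and sample size are arbitrary choices; `random.expovariate` generates exponentially distributed intervals) about 63·2 per cent of the intervals are shorter than the mean interval 1/λ, as (5.1.2) predicts.

```python
import math
import random

random.seed(9)
lam = 0.1    # lambda: mean number of events per second, so mean interval = 10 s
intervals = [random.expovariate(lam) for _ in range(50000)]

mean_interval = sum(intervals) / len(intervals)   # near 1/lambda = 10 s

# F(t) = 1 - exp(-lambda * t): fraction of intervals <= t, eqn (5.1.2)
observed = sum(1 for x in intervals if x <= 10.0) / len(intervals)
predicted = 1.0 - math.exp(-lam * 10.0)           # 0.6321, i.e. 1 - 1/e
```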

FIG. 5.1.1. The cumulative exponential distribution (eqn (5.1.2)). The
intervals between random events are observed to fall on a curve like this. The
abscissa is the interval duration expressed as a multiple of the population mean
interval, i.e. it is t/𝒯 = λt. For example, if the population mean were 𝒯 = 10 s
(i.e. λ = 0·1 s⁻¹), the graph shows that 63·21 per cent of intervals would be
10 s or shorter, and that 50 per cent of intervals (by definition of the median)
would be equal to or less than 6·93 s, the population median.

Multiplying this probability by N predicts the number of intervals
shorter than t as N(1 − e^(−λt)), as observed (see (5.1.1)).

This implies that the exponential distribution is the distribution of
the interval between any specified point of time and the point at
which the next event occurs. And, in particular, it is the distribution of
the time interval between successive events (see § 5.2 and Appendix 2).
Because the intervals can be of any length this is a continuous distribution, unlike the Poisson, and it has probability density (see § 4.1), using
(5.1.2) and (4.1.5),

f(t) = dF(t)/dt = (d/dt)(1 − e^(−λt)) = λe^(−λt) (for t ≥ 0),    (5.1.3)
     = 0 (for t < 0).

FIG. 5.1.2. The exponential distribution (an extreme case of the positive
skew illustrated in Fig. 4.5.1). Fifty per cent of the area under the curve lies
below the median. The area up to t is plotted against t in Fig. 5.1.1. The abscissa
is plotted in the same way as in Fig. 5.1.1. (If the abscissa is multiplied by 𝒯
= 1/λ to convert it to time units, the probability density would be divided by 𝒯,
so the area under the curve remains 1·0.)

This exponential distribution of the lengths of random intervals is
plotted in Fig. 5.1.2. It is an extreme form of positively skewed distribution (see § 4.5), the mode being zero, the mean 1/λ = 𝒯, and the
median 0·693𝒯 (this is proved in Appendix 1, (A1.1.11) and (A1.1.14)).

Fig. 5.1.1 is the cumulative form, F(t) (see (4.1.4)), of the exponential
distribution (cf. Fig. 4.1.4, which is the cumulative form of the normal
distribution in Fig. 4.1.3). To obtain Fig. 5.1.1 from Fig. 5.1.2 notice
that the probability of observing an interval ≤ t is given by the area
under the distribution curve (Fig. 5.1.2) below t, i.e. between 0 and t
(see § 4.1). This, using (4.1.4), is

P[0 ≤ interval ≤ t] = F(t) = ∫₀ᵗ λe^(−λt) dt = 1 − e^(−λt),    (5.1.4)

which is (5.1.2) again. Further discussion will be found in Appendix 2.

A more complete discussion of the Poisson process would require
consideration of the distribution of the sum of n intervals. When this
is done it is seen that the observation of an exponential distribution
does not necessarily imply a Poisson distribution of events in unit time
unless the intervals are independent of each other. Independence has
been checked experimentally by Burnstock and Holman (1962). This
independence is one of the defining properties of the Poisson process
(see § 3.5 and Appendix 2).

5.2. The waiting time paradox

It was implied in § 5.1 that, for completely random events, the
average length of time from a randomly selected arbitrary point of
time (midday, for example) until the next event is the same (viz. 𝒯)
as the average length of the interval between two events (both intervals
have the same exponential distribution). This is proved in § A2.6.
(An arbitrary point, in this context, means a point of time chosen by
any method that is independent of the occurrence of events.) It must
be so since the events in non-overlapping time intervals are supposed
independent, i.e. the process has no 'memory' of what has gone before†.
Yet it seems 'obvious' that, since the arbitrarily selected time is equally
likely to fall anywhere in the interval between two events, the average
waiting time from the selected time to the next event must be ½𝒯.
For example, if buses were to arrive at a bus stop at random intervals,
with a mean interval of 𝒯 = 10 min, then a person arriving at the
bus stop at an arbitrary time might be supposed, on the average, to
have to wait 5 min for the next bus.‡ In fact, the true average waiting
time would be 10 min.

† See § 3.5 and Appendix 2 for details.
‡ 5 min would be the right answer if the buses arrived regularly, not randomly, so
that all intervals were exactly 10 min.

The subtle flaw in the argument for a waiting time of ½τ lies in the
implicit assumption that the interval in which an arbitrarily selected
time falls is a random selection from all intervals. In fact, longer
intervals have a better chance of covering the selected time than
shorter ones, and it can be shown that the average length of the interval
in which an arbitrarily selected time falls is not the same as the average
length of all intervals, τ, but is actually 2τ (see § A2.7). Since the
selected time may fall anywhere in this interval, the average waiting
time is half of 2τ, i.e. it is τ, the average length of all intervals, as
originally supposed. The paradox is resolved. In the bus example this
means that a person arriving at the bus stop at an arbitrary time would,
on average, arrive in a 20-min interval. On average, the previous bus
would have passed 10 min before his arrival (as long as this was not
too near the time when buses started running) and, on average, it
would be another 10 min until the next bus.
These assertions, which surprise most people at first, are discussed
(with examples of biological importance), and proved, in Appendix 2.
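The resolution of the paradox can also be seen by simulation. The sketch below (illustrative values only: a mean interval τ = 10 min, as in the bus example) generates random events, picks arbitrary inspection times, and records both the waiting time to the next event and the length of the interval covering the inspection time; the means come out close to τ and 2τ respectively, not ½τ and τ:

```python
import bisect
import random

random.seed(2)
tau = 10.0                       # mean interval between events (min); illustrative

# Event times on a long timeline, with exponentially distributed gaps.
times = []
t = 0.0
for _ in range(200_000):
    t += random.expovariate(1 / tau)
    times.append(t)

waits = []      # time from an arbitrary moment to the next event
covering = []   # length of the interval in which that moment falls
for _ in range(20_000):
    s = random.uniform(times[0], times[-2])   # an arbitrary inspection time
    i = bisect.bisect_right(times, s)         # index of the next event
    waits.append(times[i] - s)
    covering.append(times[i] - times[i - 1])

mean_wait = sum(waits) / len(waits)              # close to tau, not tau/2
mean_covering = sum(covering) / len(covering)    # close to 2*tau
print(round(mean_wait, 1), round(mean_covering, 1))
```

Longer intervals are more likely to contain the inspection time, which is why the covering interval averages about 2τ.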

6. Can your results be believed?
Tests of significance and the analysis
of variance

'. . . before anything was known of Lydgate's skill, the judgements on it had
naturally been divided, depending on a sense of likelihood, situated perhaps in
the pit of the stomach, or in the pineal gland, and differing in its verdicts, but not
less valuable as a guide in the total deficit of evidence.'
GEORGE ELIOT
(Middlemarch, Chap. 45)

6.1. The interpretation of tests of significance


THIS has already been discussed in Chapter 1. It was pointed out
that the function of significance tests is to prevent you from making a
fool of yourself, and not to make unpublishable results publishable.
Some rather more technical points can now be discussed.

(1) Aids to judgement


Tests of significance are only aids to judgement. The responsibility
for interpreting the results and making decisions always lies with the
experimenter whatever statistical calculations have been done.
The result of a test of significance is always a probability and should
always be given as such, along with enough information for the reader
to understand what method was used to obtain the result. Terms such
as 'significant' and 'very significant' should never be used. If the reader
is unlikely to understand the result of a significance test then either
explain it fully or omit reference to it altogether.

(2) Assumptions
Assumptions about, for example, the distribution of errors, must
always be made before a significance test can be done. Sometimes
some of the assumptions are tested but usually none of them are
(see §§ 4.2 and 11.2). This means that the uncertainty indicated by the
test can be taken as only a minimum value (see §§ 1.1 and 7.2). The
assumptions of tests involving the Gaussian (normal) distribution are
discussed in §§ 11.2 and 12.2. Other assumptions are discussed when
the methods are described.

Some tests (nonparametric tests), which make fewer assumptions than
those based on a specified, for example normal, distribution (parametric
tests such as the t test and analysis of variance), are described in the
following sections. Their relative merits are discussed in § 6.2. Note,
however, that whatever test is used, it remains true that if the test
indicates that there is no evidence that, for example, an experimental
group differs from a control group then the experimenter cannot
reasonably suppose, on the basis of the experiment, that a real difference
exists.

(3) The basis and the results of tests


No statements of inverse probability (see § 1.3) are, or at any rate
need be, made as a result of significance tests. The result, P, is always
the probability that certain observations would be made given a
particular hypothesis, i.e. if that hypothesis were true. It is not the
probability that a particular hypothesis is true given the observations.
It is often convenient to start from the hypothesis that the effect
for which one is looking does not exist.† This is called a null hypothesis.
For example, if one wanted to compare two means (e.g. the mean
response of a group of patients to drug A with the mean response of
another group, randomly selected from the same population, to drug B)
the variable of interest would be the difference between the two means.
The null hypothesis would be that the true value of the difference was
zero. The amount of scatter that would be expected in the difference
between means if the experiment were repeated many times can be
predicted from the experimental observations (see § 2.7 for a full
discussion of this process), and a distribution constructed with this
amount of scatter and with the hypothetical mean value of zero, as
illustrated in Fig. 6.1.1. From this it can be predicted what would
happen if the null hypothesis that the true difference is zero were true.
In practice it will be necessary to allow for the inexactness of the
experimental estimate of error by considering, for example, the
distribution of Student's t, see §§ 4.4 and 9.4, rather than the distribu-
tion of the difference between means itself. If the differences are
supposed to have a continuous distribution, as in Fig. 6.1.1, it is clearly
not possible to calculate the probability of seeing exactly the observed
difference (see § 4.1); but it is possible to calculate the probability of
seeing a difference equal to or larger than the observed value. In the
example illustrated this is P = 0·04 (the vertically shaded area) and
† See p. 93 for a more critical discussion.

this figure is described as the result of a one-tail significance test. Its
interpretation is discussed in (4) below. It is the figure that would be
used to test the null hypothesis against the alternative hypothesis that
the true difference is positive. When the alternative hypothesis is that
the true difference is positive, the result of a one-tail test for the
difference between two means always has the following form.

If there were no difference between the true (population) means
then the probability of observing, because of random sampling
error, a difference between sample means equal to or greater
than that observed in the experiment would be P (assuming the
assumptions made in carrying out the test to be true).

Fig. 6.1.1. Basis of significance tests. [The figure shows the distribution of the
difference between sample means, centred on the hypothetical population
difference of zero. The observed difference cuts off 4 per cent of the area in the
upper (positive) tail; the corresponding 4 per cent of area in the opposite tail is
included for the two-tail P.] See text for explanation.

If the only possible alternative to the null hypothesis is that the
true difference is negative, then the interpretation is the same, except
that it is the probability (on the null hypothesis) of a difference being
equal to or less than the observed one that is of interest.
In practice, in research problems at least, the alternative to the null
hypothesis is usually not that the true difference is positive (or that it is
negative) but simply that it differs from zero† (in either direction),
because it is usually not reasonable to say in advance that only positive
(or negative) differences are possible (or that only positive differences
are of interest so the test is not required to detect negative differences).
† See also p. 93.

If the alternative to the null hypothesis is the hypothesis that the true
difference between means is, say, positive, this implies that however
large a negative difference was observed it would be attributed to
chance rather than a true (population) negative difference (or at least
that it would be considered of no interest if real).
Suppose now that it cannot be specified beforehand whether the true
difference between means is positive, zero, or negative. In the example
above there would be a probability of 0·04 of seeing a difference at least
as large as the positive difference observed in the experiment if the null
hypothesis were true. But there would also be a probability of 0·04 (the
horizontally shaded area) of seeing a deviation from the null hypothesis
at least as extreme as that actually observed but in the opposite direc-
tion. The total probability of observing a deviation from the null
hypothesis (in either direction) at least as extreme as that actually
observed would be P = 0·04+0·04 = 0·08 if the null hypothesis were
true. This is the appropriate probability because, if it were resolved
to reject the null hypothesis as false every time an experiment gave a
difference between means as large as, or larger than, that observed in
this experiment, then, if the null hypothesis were actually true it
would be rejected (wrongly) not in 4 per cent of repeated experiments,
but in 8 per cent. This is because negative observed differences in the
lower tail of Fig. 6.1.1, which would also lead to wrong rejection of the
null hypothesis, would be just as common, in the long run, as positive
differences. The probability is chosen so as to control the frequency of
this sort of error. This is discussed in more detail in subsection (6)
below.
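The doubling of P can be checked by simulation. In the sketch below (a hypothetical illustration in which the difference between sample means is taken to be a standard normal deviate under the null hypothesis), rejecting whenever the difference exceeds its upper 4 per cent point gives about 4 per cent wrong rejections, while rejecting for deviations of that magnitude in either direction gives about 8 per cent:

```python
import random

random.seed(3)
n = 200_000
# Simulated differences between sample means when the null hypothesis is true
# (standard normal units, purely for illustration).
diffs = sorted(random.gauss(0, 1) for _ in range(n))
cut = diffs[int(0.96 * n)]      # empirical upper 4 per cent point

one_tail = sum(d >= cut for d in diffs) / n         # wrong rejections, one tail
two_tail = sum(abs(d) >= cut for d in diffs) / n    # wrong rejections, both tails
print(round(one_tail, 2), round(two_tail, 2))
```

The symmetry of the distribution is what makes the two-tail frequency almost exactly twice the one-tail frequency here.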
The value P = 0·08 is described as the result of a two-tail test of
significance. Its interpretation is discussed in subsection (4) below. The
value of P is usually† twice that for a one-tail test. The result of a
two-tail test always has the following form.

If the null hypothesis were actually true then the probability of a
sample showing a deviation from it, in either direction, as
extreme,† or more extreme, than that observed in the experiment
would be P (assuming the assumptions made in carrying out the
test to be true).

† In the case of the normal distribution (§ 4.2), or any other distribution that is
symmetrical, whether continuous or discontinuous, for example the binomial distribution
with 𝒫 = 0·5 (§§ 3.2 and 3.4) or Student's distribution (§ 4.4), one could say here '. . .
deviation from it, in either direction, as large as, or larger than, that observed in the
Notice that P is not the probability that the null hypothesis is true
but the probability that certain observations would be made if it were.
Perhaps the best popular interpretation of P is that it is the 'proba-
bility of the results occurring by chance'. Although this is inaccurate
and vague, and should therefore be avoided, it is not too misleading.

(4) Interpretation of the results


If P is very small the conclusion drawn is that either
(a) an unlikely event has taken place, the null hypothesis being
true. As Fisher (1951) said: '. . . no isolated experiment, how-
ever significant in itself, can suffice for the experimental demon-
stration of any natural phenomenon; for the "one chance in a
million" will undoubtedly occur, with no less and no more than
its appropriate frequency, however surprised we may be that it
should occur to us,' or
(b) the assumptions on which the test was based were faulty, for
example the samples were not drawn randomly, or
(c) the null hypothesis is not true, for example the true (population)
means in the above example are different, so that the drugs do
in fact differ in their effects on patients (see also subsection (7),
below).
Whether (b) can be ruled out, and what level of improbability
is enough to make one favour explanation (c) rather than (a), are

experiment . . .' In general, this simpler statement is not possible, however. Two
other cases must be considered. (1) The sampling distribution (e.g. Fig. 6.1.1) is con-
tinuous but unsymmetrical (see § 4.5). In this case different sized positive and negative
deviations will be needed to cut off equal areas in the upper and lower tails (respectively)
of the distribution. It is the extremeness (i.e. rarity) of the deviation measured by the
area it cuts off in the tail of the distribution (rather than its size) that matters. The
two-tail probability is still twice the one-tail probability, however. (2) The sampling
distribution is both unsymmetrical and discontinuous (as often happens in the very
important sort of tests known as randomization tests, see §§ 8.2, 9.2, 9.3, and 10.2-
10.4). A greater difficulty arises in this case because the most extreme observations in the
opposite tail of the distribution (that not containing the observation) will not generally
cut off an area exactly the same as that cut off by the observation in its own tail, so P
for the two-tail test cannot be exactly twice that for the one-tail test. There is no
definite rule about what to do in this case. Most commonly a deviation is chosen in the
opposite direction to that observed that cuts off an area in the opposite tail not
greater than the value found in the one-tail test, so the two-tail P is not greater than
twice the one-tail P. However, it may be decided to choose a deviation that cuts off
an area in the opposite tail that is as near as possible to that of the one-tail test. This
is exemplified at the end of § 8.2 where the deviations from the null hypothetical
value are stated, to show exactly what has been done. With small unequal samples
the most extreme possible observation in the opposite tail may cut off an area far greater
than that in the one-tail test. This problem is discussed in § 8.2.

entirely matters for personal judgement. The calculations throw no
light whatsoever on these problems. It is often found in the biomedical
literature that P = 0·05 is taken as evidence for a 'significant differ-
ence'. However 1 in 20 is not a level of odds at which most people would
want to stake their reputation as an experimenter and, if there is no
other evidence, it would be wiser to demand a much smaller value
before choosing explanation (c).
A twofold change in the value of P given by a test should make
little difference to the inference made in practice. For example,
P = 0·03 and P = 0·06 mean much the same sort of thing, although
one is below and the other above the conventional 'significance level'
of 0·05. They both suggest that the null hypothesis may not be true
without being small enough for this conclusion to be reached with any
great confidence.
In any case, as mentioned above, no single test is ever enough.
To quote Fisher (1951) again: 'In relation to the test of significance, we
may say that a phenomenon is experimentally demonstrable when we
know how to conduct an experiment which will rarely fail to give us a
statistically significant result'.

(5) Generalization of the result


Whatever the interpretation of the statistical calculations it is
tempting to generalize the conclusion from the experimental sample to
other samples (e.g. to other patients); in fact this is usually the purpose
of the experiment. To do this it is necessary to assume that the new
samples are drawn randomly from the same population as that from
which the experimental samples were drawn. However, because of
differences of, for example, time or place this must usually remain an
untested assumption which will introduce an unknown amount of
bias into the generalization (see §§ 1.1 and 2.3).

(6) Types of error and the power of tests


If the null hypothesis is not rejected on the basis of the experimental
results (see subsection (7), below) this does not mean that it can be
accepted. It is only possible to say that the difference between two
means is not demonstrable, or that a biological assay is not demonstrably
invalid. The converse, that the means are identical or that the assay is
valid, can never be shown. If it could, it would always be possible to find
that there was, for example, 'no difference between two means' by
doing such a bad experiment that even a large real difference was not
apparent. Although this may seem gloomy, it is only common sense.
To show that two population means are identical exactly, the whole
population, usually infinite, is obviously needed.
An example. The supposition that a large P value constitutes evidence
in favour of the null hypothesis is, perhaps, one of the most frequent
abuses of 'significance' tests. A nice example appears in a paper just
received. The essence of it is as follows. Differences between membrane
potentials before and after applying three drugs were measured.
The mean differences (d̄) are shown in Table 6.1.1.

TABLE 6.1.1
d stands for the difference between the membrane potentials (millivolts) in the
presence and absence of the specified drug. The mean of n such differences
is d̄, and the observed standard deviation of d is s(d). The standard deviation
of the mean difference is s(d̄) = s(d)/√n and values of Student's t are calculated
as in § 10.6.

                   d̄     s(d)    n    s(d̄)    t    P (approx.)
Noradrenaline     2·7   10·1    40   1·60   1·7     0·1
Adrenaline        3·4   12·2    80   1·36   2·5    <0·02
Isoprenaline      3·9   10·8    60   1·39   2·8    <0·01

The potentials were about 90 mV so the percentage change is small,
but by doing many (n = 40-80) pairs of measurements, evidence was
found against the null hypothesis that adrenaline has no effect, using
the paired t test (see § 10.6). Similarly it was inferred that isoprenaline
increases membrane potential. These inferences are reasonable, though
the order in which treatments were applied was not randomized. In
contrast, the P value for noradrenaline was 0·1 and the authors there-
fore inferred that 'noradrenaline had no effect on membrane potential',
i.e. that the null hypothesis was true. This is completely unjustified.
The apparent effect of noradrenaline, 2·7 mV, was not much smaller
than that for the other drugs, and, although the significance test shows
that we cannot be sure that repeating the measurements would give
a similar result, it certainly does not show that we would not get
similar results. Suppose, perfectly plausibly, that 80 experiments had
been done with noradrenaline (as with adrenaline) instead of 40. And
suppose the mean difference was 2·7 mV and the standard deviation of
the differences was 10·1. In this case t = 2·7/(10·1/√80) = 2·4, giving
P < 0·02, a 'significant' result. The size of the difference, d̄ = 2·7 mV,
and the scatter of the observations, s(d) = 10·1, is just the same as in

Table 6.1.1, but despite this the authors would presumably have come
to the opposite conclusion. This is clearly absurd. But if the original
experiment with n = 40 differences had been interpreted as 'no evidence
for a real effect of noradrenaline' or 'effect, if any, masked by experi-
mental error' there would have been no trouble. It is reasonable that
the larger experiment should be capable of detecting differences that
escape detection in the smaller experiments.
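The t values in Table 6.1.1, and the hypothetical n = 80 case for noradrenaline, can be reproduced directly from t = d̄/(s(d)/√n); a short sketch of the arithmetic:

```python
import math

def paired_t(dbar, sd, n):
    """Student's t for a mean of n paired differences: t = dbar / (sd / sqrt(n))."""
    return dbar / (sd / math.sqrt(n))

print(round(paired_t(2.7, 10.1, 40), 1))   # noradrenaline, as in Table 6.1.1
print(round(paired_t(3.4, 12.2, 80), 1))   # adrenaline
print(round(paired_t(3.9, 10.8, 60), 1))   # isoprenaline

# The same mean difference and scatter for noradrenaline, but with n = 80:
print(round(paired_t(2.7, 10.1, 80), 1))   # now 'significant' at P < 0.02
```

Doubling n multiplies t by √2, which is all that separates the 'non-significant' from the 'significant' result here.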
These ideas can be formalized by considering the power of a signi-
ficance test, which is defined as the probability that the test will reject the
null hypothesis (e.g. that two population means are equal), this proba-
bility being considered as a function of the true difference between the
means. For example, if the null hypothesis was always rejected when-
ever a test gave P ≤ 0·05 then, if the null hypothesis really were true, it
would be rejected (wrongly) in 5 per cent of trials, as explained in
subsection (3) above (see subsection (7), below). The wrong rejection
of a correct hypothesis is called an error of the first kind, and, in this
case, the probability (α) of an error of the first kind would be α = 0·05.
If in fact there was a difference between true population means,
and this real difference was, for example, equal in size to the true
standard deviation of the difference between means (see §§ 2.7 and
9.4) (i.e. the difference, although real, is similar in size to the experi-
mental errors), then it can be shown that a two-tail normal deviate
test† would reject the null hypothesis (this time correctly) in 17 per
cent of experiments. However, if the null hypothesis was accepted as
true every time it was not rejected then it would be wrongly accepted
in 83 per cent of experiments. The wrong acceptance of a false hypothesis
is called an error of the second kind, and, in this case, the probability
(β) of this sort of error is β = 0·83.
The power curve for a two-tail normal deviate test for the difference
between two means is shown in Fig. 6.1.2 and compared with the
power curve for the (non-existent) ideal test that would always accept
true hypotheses and reject false ones. The power of even the best tests
to detect real differences that are similar in size to the experimental
error is quite small.
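The 17 and 83 per cent figures can be reproduced from the normal distribution: with α = 0·05 (critical value 1·96) and a true difference equal to one standard deviation of the difference, the power is Φ(−1·96+1)+Φ(−1·96−1) ≈ 0·17. A sketch of the calculation:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_crit = 1.96   # two-tail 5 per cent point of the normal distribution (alpha = 0.05)
delta = 1.0     # true difference = one standard deviation of the difference

# Reject when the observed deviate falls beyond +/- z_crit; with the true
# difference shifting the distribution by delta, the rejection probability is:
power = phi(-z_crit + delta) + phi(-z_crit - delta)
beta = 1 - power   # probability of an error of the second kind
print(round(power, 2), round(beta, 2))
```

Almost all of the power comes from the tail on the same side as the true difference; the contribution from the opposite tail is negligible here.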

(7) Some more subtle points about significance tests


The critical reader will, no doubt, have some objections to the arguments
presented in this section. It is difficult to give a consensus of informed opinion
† A t test (see § 9.4) in which the standard deviation is accurately known (e.g. because
the samples are large) so that the standard normal deviate, u (see § 4.3), can be used in place
of t (see § 4.4).


Fig. 6.1.2. [Power curves: (a) shows the power of a two-tail normal deviate test
rising from α = 0·05 at zero true difference, with β = 0·83 marked at a true
difference of one standard deviation; (b) shows the step-shaped power curve of the
ideal test.] In both figures the abscissa gives the difference between the
population means (expressed as a multiple of the standard deviation of the
difference between means: see § 9.4). (a) The power curve for a two-tail normal
deviate test for the difference between two means (see text) when α = 0·05, i.e. the
null hypothesis is rejected whenever P < 0·05, so if it were actually true it would
be wrongly rejected in 5 per cent of repeated experiments. If the null hypothesis
were false, i.e. there is a difference between the population means (in this example,
a difference equal in size to one standard deviation of the difference between
means: see § 9.4), the null hypothesis would be rejected (correctly) in 17 per cent
of experiments and not rejected (wrongly) in β = 83 per cent of experiments.
(b) Power curve for the (non-existent) ideal test that always rejects a hypothesis
(population means equal) when it is false, and never rejects it when it is true.

because, although there is much informed opinion, there is rather little consensus.
A personal view follows.
The first point concerns the role of the null hypothesis and the role of prior
knowledge, i.e. knowledge available before the experiment was done. It is widely
advocated nowadays (particularly by Bayesians, see §§ 1.3 and 2.4) that prior
information should be used in making statistical decisions. There is no doubt
that this is desirable. All relevant information should be taken into account in
the search for truth, and in some fields there are reasonable ways of doing this.
But in this book the view is taken that attention must be restricted to the infor-
mation that can be provided by the experiment itself. This is forced on us because,
in the sort of small-scale laboratory or clinical experiment with which we are
mostly concerned, no one has yet devised a way that is acceptable to the scientist,
as opposed to the mathematician, of putting prior information in a quantitative
form.
Now it has been mentioned already that in most real experiments it is unreal-
istic to suppose that the null hypothesis† could ever be true, that two treatments
could be exactly equi-effective. So is it reasonable to construct an experiment
to test a null hypothesis? The answer is that it is a perfectly reasonable way of
approaching our aim of preventing the experimenter from making a fool of
himself if, as recommended above, we say only that 'the experiment provides
evidence against the null hypothesis' (if P is small enough), or that 'the experiment
does not provide evidence against the null hypothesis' (if P is large enough).
The fact that there may be prior evidence, not from the experiment, against the
null hypothesis does not make it unreasonable to say that the experiment itself
provides no evidence against it, in those cases where the observations in the
experiment (or more extreme ones) would not have been unusual in the (admit-
tedly improbable) event that the null hypothesis was exactly true.
And, because it has been stressed that if there is no evidence against the null
hypothesis it does not imply that the null hypothesis is true, the inference from
a large P value does not contradict the prior ideas about the null hypothesis.
We may still be convinced on prior grounds that there is a real difference of some
sort, but as it is apparently not large enough, relative to the experimental error
and method of analysis, to be detected in the experiment, we have no idea of its
size or direction. So the prior knowledge is of no practical importance.
Another point concerns the discussion of power. It has been recommended
that the result of a significance test should be given as a value of P. It would be
silly to reject the null hypothesis automatically whenever P fell below an arbitrary
level (0·05, say). Each case must be judged on its merits. So what is the justifica-
tion for discussing, in subsections (3) and (6) above, what would happen 'if the
null hypothesis were always rejected when P ≤ 0·05'? As usual, the aim is to
prevent the experimenter making a fool of himself. Suppose, in a particular case,
that a significance test gave P = 0·007, and the experimenter decided that, all
things considered, this should be interpreted as meaning that the experiment
provided evidence against the null hypothesis; then it is certainly of interest to
the experimenter to know what would be the consequences of acting consistently
in this way, in a series of imaginary repetitions of the experiment in question.
This does not in any way imply that, given a different experiment, under differ-
ent circumstances, the experimenter should behave in the same way, i.e. use
P = 0·007 as a critical level.

† This remark applies to point hypotheses, i.e. those stating that means, populations,
etc., are identical. All the null hypotheses used in this book are of this sort.

6.2. Which sort of test should be used, parametric or
nonparametric?
Parametric tests, such as the t test and the analysis of variance, are
those based on an assumed form of distribution, usually the normal
distribution, for the population from which the experimental samples
are drawn. Nonparametric tests are those that, although they involve
some assumptions, do not assume a particular distribution. A discussion
of the relative 'advantages' of the tests is ludicrous. If the distribution
is known (not assumed, but known; see § 4.6 for tests of normality),
then use the appropriate parametric test. Otherwise do not. Neverthe-
less the following observations are relevant.

Characteristics of nonparametric methods


(1) Fewer untested assumptions are needed for nonparametric
methods. This is the main advantage because, as emphasized in § 4.2, there
is rarely any substantial evidence that observations follow a normal,
or any other, distribution. The assumptions involved in parametric
methods are discussed in § 11.2. Nonparametric methods do involve
some assumptions (e.g. that two distributions are of the same, but
unspecified, form), and these are mentioned in connection with in-
dividual methods.
(2) Nonparametric methods can be used for classification (Chapter 8)
or rank (Chapters 9-11) measurements. Parametric methods cannot.
(3) Nonparametric methods are usually easier to understand and use.

Characteristics of parametric methods


(1) Parametric methods are available for analysing far more sorts of
experimental results. For example there are, at the moment, no widely
available nonparametric methods for the more complex sorts of analysis
of variance or curve-fitting problems. This is not relevant when choosing
which method to use, because there is only a choice if a nonparametric
method is available.
(2) Many problems involving the estimation of population parameters
from a sample of observations have so far only been dealt with by
parametric methods.
(3) It is sometimes listed as an advantage of parametric methods that,
if the assumptions they involve (see § 11.2) are true, they are more
powerful (see § 6.1, para. (6)), i.e. more sensitive detectors of real
differences, than nonparametric methods. However, if the assumptions are not
true, which is normally not known, the nonparametric methods may

well be more powerful, so this cannot really be considered an advantage.
In any case, even when the assumptions of parametric methods are
fulfilled the nonparametric methods are often only slightly less powerful.
In fact the randomization tests described in §§ 9.2 and 10.3 are as
powerful as parametric tests even when the assumptions of the latter
are true, at least for large samples.
There is a considerable volume of knowledge about the asymptotic
relative efficiencies of various tests. These results refer to infinite sample
sizes and are therefore of no interest to the experimenter. There is less
knowledge about the relative efficiencies of tests in small samples. In
any case, it is always necessary to specify, among other things, the
distribution of the observations before the relative efficiencies of tests
can be deduced; and because it is part of the problem that nothing is
known about this distribution, even the results for small samples are not
of much practical help. Of the alternative tests to be described, each
can, for certain sorts of distribution, be more efficient than the others.
There is, however, one rather distressing consequence of lack of
knowledge of the distribution of error, which is, of course, not abolished
by assuming the distribution known when it is not.
As an example of the problem, consider the comparison of the effects
of two treatments, A and B. The experimenter will be very pleased if a
large and consistent difference between the effects of A and B is
observed, and will feel, reasonably, that not many observations are
necessary. But it turns out that with very small samples it is impossible
to find evidence against the hypothesis that A and B are equi-effective,
however large, and however consistent, the difference observed be-
tween their effects, unless something is known about the distributions
of the observations. Suppose, for the sake of argument, that the
experimenter is prepared to accept P = 1/20 (two tail) as small
enough to constitute evidence against the hypothesis of equi-effective-
ness (see § 6.1). If the experiment is conducted on two independent
samples, each sample must contain at least 4 observations (for all the
nonparametric tests described in Chapter 9, q.v., the minimum possible
two-tail P value with samples of 3 and 4 would be 2(3!4!/7!) = 1/17·5,
however large and consistent the difference between the samples).
Similarly, if the observations are paired, at least 6 pairs of observations
are needed; with 5 pairs of observations the nonparametric methods
described in Chapter 10, q.v., can never give a two-tail P less than
2(½)⁵ = 1/16. (See also the discussion in §§ 10.5 and 11.9.)
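These minimum P values follow from counting equally likely arrangements: with samples of 3 and 4 there are 7!/(3!4!) = 35 ways of choosing which observations fall in the smaller sample, and with 5 pairs there are 2⁵ = 32 equally likely sets of signs. A sketch of the arithmetic:

```python
from math import comb

# Two independent samples of 3 and 4: the most extreme possible arrangement
# of the 7 observations has probability 1/C(7,3) = 3!4!/7!, so the smallest
# attainable two-tail P is
min_two_tail_unpaired = 2 / comb(7, 3)
print(min_two_tail_unpaired)        # 2/35 = 1/17.5, which exceeds 1/20

# Five pairs: the most extreme possible set of signs has probability (1/2)**5,
# so the smallest attainable two-tail P is
min_two_tail_paired = 2 * (1 / 2) ** 5
print(min_two_tail_paired)          # 1/16, which also exceeds 1/20
```

Both minima lie above 1/20, which is why no result from such small samples can reach the chosen level, however extreme.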

In contrast, the parametric methods can give a very low P with the
smallest samples if the difference between A and B is sufficiently large
and consistent. Nevertheless, these facts mean that it is a disadvantage not
to know the distribution of the observations. They do not constitute a
disadvantage of nonparametric tests. The problem is less acute with
samples larger than the minimum sizes mentioned.
In view of these remarks it may be wondered why parametric tests
are used at all when there are nonparametric alternatives. In fact they
are still widely used even now. This is partly because of familiarity.
The t test and analysis of variance were in use for many years before
most nonparametric methods were developed. It probably also results
from the sacrifice of relevance to the real world for the sake of mathematical
elegance. Methods based on the assumption of a normal distribution
have been developed to cover a wide range of problems within a
single, admittedly elegant, mathematical framework.
It is not uncommon for those who are dubious about the assumptions
necessary for parametric tests to be told something along the lines of
'experience has shown that the t test (for example) will not mislead
us'. Unfortunately, as Mainland (1963) has pointed out, this is just
wishful thinking. There is no knowledge at all of the number of times
people have been misled by using the t test when they would not have
been misled by a nonparametric test (see §§ 4.2 and 4.6).
A plausible reason for using tests based on the normal distribution
is that some of them have been shown to be fairly insensitive to some
sorts of deviations from the assumptions on which they are based if the
samples are reasonably big. The tests are said to be fairly robust. But
this knowledge can usually be used only by intuition. One is never
sure how large is large enough for the purposes in hand. When the
nature and extent of deviations from the assumptions is unknown,
the amount of error resulting from assuming them true is also unknown.
It is much simpler to avoid as many as possible of the assumptions.

If a nonparametric test is available it should be used in preference
to the parametric test, unless there is experimental evidence about
the distribution of errors.

In spite of what has just been said, parametric methods are discussed
in the following chapters, even when nonparametric methods exist.
This is necessary as an approach to the more complex experimental
designs, curve-fitting problems, and biological assay for which there are

still hardly any nonparametric methods available, so parametric tests
or nothing must be used. Whichever test is used, it should be interpreted
as suggested in §§ 1.1, 1.2, 6.1, and 7.2, the uncertainty indicated
by the test being taken as the minimum uncertainty that it is reasonable
to feel.

6.3. Randomization tests


The principle of randomization tests, also known as permutation
tests, is of great importance because these tests are among the most
powerful of nonparametric tests (see §§ 6.1 and 6.2). Moreover, they are
easier to understand, at the present level, than almost all other sorts
of test and they make very clear the fundamental importance of
randomization. Examples are encountered in §§ 8.2, 8.3, 9.2, 9.3,
10.2, 10.3, 10.4, 11.5, 11.7, and 11.9.

6.4. Types of samples and types of measurement


When comparing two groups the groups may be related or inde-
pendent. For example, to compare drugs A and B two groups could be
selected randomly (see § 2.3) from the population of patients, and one
group given A, the other B. The two samples are independent. Independent
samples are discussed in Chapters 8 and 9, and in §§ 11.4,
11.5, and 11.9. On the other hand, the two drugs might both be given,
in random order, to the same patient, or to a patient randomly selected
from a pair of patients who had been matched in some way (e.g. by
age, sex, or prognosis). The samples of observations on drug A and
drug B are said to be related in this case. This is usually a preferable
arrangement if it is possible; but it may not be possible because, for
example, the effects of treatments are too long-lasting, or because of
ignorance of what characteristics to match. Related samples are
discussed in Chapter 10 and in §§ 8.6, 11.6, 11.7, and 11.9.
The method of analysis will also depend on what sort of measurements
are made. The three basic types of measurement are (1) classification
(the nominal scale), (2) ranking (the ordinal scale), and (3) numerical
measurements (the interval and ratio scales). For further details
see, for example, Siegel (1956a, pp. 21-30). If the best that can be
done is classification as, for example, improved or not improved, worse
or no change or better, passed or failed, above or below median, then
the methods of analysis in Chapter 8 are appropriate. If the measurements
cannot be interpreted in a quantitative numerical way but can

Digitized by Google
be arranged (ranked) in order of magnitude (as, for example, with arbitrary
scores such as those used for subjective measurements of the intensity
of pain) then the rank methods described in §§ 9.3, 10.4, 10.5, 11.5,
11.7, and 11.9 should be used. For quantitative numerical measurements
the methods described in the remaining sections of Chapters 9-11 are
appropriate.
Methods for dealing with a single sample are discussed in Chapter 7
and those for more than two samples in Chapter 11.

7. One sample of observations. The
calculation and interpretation of
confidence limits

'Eine Hauptursache der Armut in den Wissenschaften ist meist eingebildeter
Reichtum. Es ist nicht ihr Ziel, der unendlichen Weisheit eine Tür zu öffnen,
sondern eine Grenze zu setzen dem unendlichen Irrtum.'†
† 'One of the chief causes of poverty in science is usually imaginary wealth. The aim
of science is not to open a door to infinite wisdom, but to set a limit to infinite error.'
GALILEO in Brecht's Leben des Galilei

7.1. The representative value: mean or median?


It is second nature to calculate the arithmetic mean of a sample of
observations as the representative central value (see § 2.5). In fact this
is an arbitrary procedure. If the distribution of the observations were
normal it would be a reasonable thing to do since the sample mean
would be an estimate of the same quantity (the population mean
= population median) as the sample median (§§ 2.5 and 4.5), and it
would be a more precise estimate than the median. However, the
distribution will usually not be known, so there is usually no reason to
prefer the mean to the median. For more discussion of the estimation of
'best' values see §§ 12.2 and 12.8 and Appendix 1.

7.2. Precision of inferences. Can estimates of error be trusted?


The answer is that they cannot be trusted. The reasons why will
now be discussed. Having calculated an estimate of a population median
or mean, or other quantity of interest, it is necessary to give some sort
of indication of how precise the estimate is likely to be. Again it is
second nature to calculate the standard deviation of the mean-the
so-called 'standard error'-see § 2.7. This is far from ideal because
there is no simple way of interpreting the standard deviation unless the
distribution of observations is known. If it were normal then the
confidence limits, sometimes called confidence intervals, based on the
t distribution (§ 7.4) would be the ideal way of specifying precision
since it allows for the fact that the sample standard deviation is itself

only a more or less inaccurate estimate of the population value (see
§ 4.4).
As usual it must be emphasized that the distribution is hardly ever
known, so it will usually be preferable to use the nonparametric
confidence intervals for the median (§ 7.3), which do not assume a
normal distribution.
No sort of confidence interval, nonparametric or otherwise, can
make allowance for samples not having been taken in a strictly random
fashion (see §§ 1.1 and 2.3), or for systematic (non-random) errors.
For example, if a measuring instrument were wrongly calibrated so
that every reading was 20 per cent below its correct value, this error
would not be detectable and would not be allowed for by any sort of
confidence limits.
Therefore in the words of Mainland (1967a), confidence limits
'provide a kind of minimum estimate of error, because they show how
little a particular sample would tell us about its population, even if
it were a strictly random sample'. It seems then that estimates cannot
be trusted very far. To quote Mainland (1967 b) again,
'Any hesitation that I may have had about questioning error estimates in
biology disappeared when I recently learned more about error estimates in that
sanctuary of scientific precision-physics.
'One of the most disturbing things about scientific work is the failure of an
investigator to confirm results reported by an earlier worker. For example in
the period 1895 to 1961, some 15 observations were reported on the magnitude
of the astronomical unit (the mean distance from the earth to the sun). You will
find these summarized in a table . . . which lists the value obtained by each
worker and his estimates of plus or minus limits for the error of the estimate.
It is both entertaining and shocking to note that, in every case, a worker's
estimate is outside the limits set by his immediate predecessor. Clearly there is
an unresolved problem here, namely, that experimenters are apparently unable
to arrive at realistic estimates of experimental errors in their work" (Youden
1968).
If we add to the problems of the physicist the variability of biological and
human material, and the nonrandomness of our samples from it, we may well
marvel at the confidence with which "confidence intervals" are presented.'
Confidence limits purport to predict from the results of one experi-
ment what will happen when the experiment is repeated under the
same (as nearly as possible) conditions (see § 7.9). But the experimentalist
will not need much persuading that the only way to find out what will
happen is actually to repeat the experiment and see. And on the few
occasions when this has been done in the biological field the results
have been no more encouraging than those just quoted. For example,
Dews and Berkson (1954) found that the internal estimates of error

calculated in individual biological assays were mostly considerably
lower than the true error found by actual repetition of the assay. As
Dews and Berkson point out, if the assays were performed at different
times or in different laboratories it would probably be said that there
were 'inter-time' or 'inter-laboratory' differences; and if there were
no such 'obvious' reasons for the internal error estimates being too
low, then probably 'the animals would be stigmatized as "heterogeneous",
with more than a hint that there had been too little incestuous
activity among them'. The moral is once again that confidence limits,
or other estimates of error calculated from the internal evidence of an
experiment, must be interpreted as lower bounds for the real error.
Nevertheless, on the grounds that a minimum estimate of error is
better than none at all, examples follow. Their interpretation is
discussed further in § 7.9.

7.3. Nonparametric confidence limits for the median


Limits can be found very simply indeed, without any calculation
at all, using the table of Nair (1940) which is reproduced as Table A1.
Consider, for example, determinations of the glomerular filtration
rate (ml/min) from nine randomly selected dogs:
135 133 154 124 153 142 140 134 138.
The observations will be denoted in the usual way (§ 2.1), y_i (i = 1, 2,
..., n) and n = 9. Now rank the observations in ascending order: 124,
133, 134, 135, 138, 140, 142, 153, 154. These observations will be
denoted y_(i) (i = 1, 2, ..., 9), the parenthesized subscript being used to
indicate that the observations have been ranked, i.e. y_1 simply denotes
the first observation written down, whereas y_(1) indicates the smallest
of the observations. The sample estimate of the population median is
y_(5) = 138 ml/min (using (2.5.5)). Reference to Table A1, for the
approximately 95 per cent confidence limits, and for a sample size
n = 9, gives a value r = 2. This means that the second (i.e. the rth)
observation from each end, viz. 133 ml/min (= y_(2)) and 153 ml/min
(= y_(8)), are to be taken as the confidence limits for the estimated
median, 138 ml/min. The table also gives, in the next column after r,
the figure 96·1, which indicates that these are actually 96·1 per cent
confidence limits. The fact that r has to be a whole number makes
it impossible to get exactly 95 per cent limits. There is a probability of
0·961 that the population median is between y_(2) and y_(8) in the sense
explained in § 7.9.

The reasoning behind the construction of Table A1 is roughly as
follows (see Nair (1940) and Mood and Graybill (1963, p. 407)). Let m
denote the population (true) median. By definition of the median
(§ 2.5) the probability is 1/2 that an observation selected at random
from the population, which is assumed to follow any continuous
distribution, will be less than m. The probability that i observations
out of n fall below m follows directly from the binomial distribution
(3.4.3) with 𝒫 = ½, i.e.

n!/(i!(n−i)!) (½)ⁿ.   (7.3.1)

To find from this the probability that the rth ranked observation,
y_(r), in a sample of n observations, will be greater than the population
median, note that this will be the case if the sample contains either
i = 0 or 1 or ... or (r−1) observations below the median, so, by using the
addition rule (2.4.2),

P(y_(r) > m) = Σ_{i=0}^{r−1} n!/(i!(n−i)!) (½)ⁿ.   (7.3.2)

If a 95 per cent confidence limit is required, r is now chosen so as to
make this expression as near as possible to 0·025 (2·5 per cent). In the
above example this means taking r = 2, giving

P(y_(2) > m) = Σ_{i=0}^{1} 9!/(i!(9−i)!) (½)⁹ = 1/512 + 9/512 = 0·0195,

i.e. it is unlikely that y_(2) will be above the population median. Because
of the symmetry of the binomial distribution when 𝒫 = ½ (§ 3.4)
this is also the probability that y_(8) < m; it is equally unlikely that
y_(8) is less than the population median. Thus, in general, (7.3.2) also
gives P(y_(n−r+1) < m). So the probability of the event that either
m < y_(2) or m > y_(8) is, again by the addition rule (2.4.2), 0·0195
+ 0·0195 = 0·039. If this event does not occur then it must† be that
† If you find this argument takes you by surprise, in spite of its mathematical
impeccability, you may be relieved to find that this view is shared by some of the most
eminent mathematical statisticians. For example, Lindley (1969) says 'The procedure
which transfers a distribution on x to one on θ through a pivotal quantity such as x−θ
has always seemed to me to be reminiscent of a conjuring trick: it all looks very plausible,
but you cannot see how it is done. ... As a young man I remember asking E. C. Fieller
to suggest a really difficult problem. His answer was beautifully simple: "The probability
that an observation is less than the median is 1/2: explain why this means that the
probability that the median is greater than the observation is also 1/2." I could offer no
really sound explanation then, and I still cannot.' You may also be relieved to find that,
in spite of the difficulties, virtually all statisticians, faced with experimental results
such as those in this section, would reach a conclusion that differed little, if at all, from
that presented here.

y_(2) ≤ m ≤ y_(8), and the probability of this must be 1 − 0·039 = 0·961,
as discovered above from Table A1. The general result is

P[y_(r) ≤ m ≤ y_(n−r+1)] = 1 − 2 Σ_{i=0}^{r−1} n!/(i!(n−i)!) (½)ⁿ,   (7.3.3)

and r is chosen so that this is as near as possible, given that r must
be a whole number, to 0·95; or whatever other confidence probability
is required. A very similar sort of statement is found for the mean in
the next section.
The method assumes that the distribution of the observations is
continuous (see § 4.1) so it is not possible for two observations to be
exactly the same. In practice there may be ties because of rounding
errors but this does not matter even though very occasionally a sample
could give the same, say, 95 and 99 per cent limits. If the distribution
is really discontinuous then the method is not appropriate.
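The choice of r in (7.3.3) needs no special table if a computer is to hand: the binomial tail sum can be evaluated directly and r chosen so that the coverage lies nearest the desired level. A sketch in Python (not, of course, part of the text; the function names are invented for illustration) reproduces the n = 9 example:

```python
from math import comb

def coverage(n, r):
    # P[y_(r) <= m <= y_(n-r+1)] from eqn (7.3.3):
    # 1 - 2 * sum_{i=0}^{r-1} C(n, i) (1/2)^n
    tail = sum(comb(n, i) for i in range(r)) / 2 ** n
    return 1 - 2 * tail

def median_limits(obs, target=0.95):
    # Choose r so the coverage is as near as possible to the target,
    # then take the rth ranked observation from each end as the limits.
    n = len(obs)
    ranked = sorted(obs)
    r = min(range(1, n // 2 + 1), key=lambda r: abs(coverage(n, r) - target))
    return r, coverage(n, r), ranked[r - 1], ranked[n - r]

gfr = [135, 133, 154, 124, 153, 142, 140, 134, 138]  # ml/min, from the text
r, cov, lower, upper = median_limits(gfr)
```

This gives r = 2 with 96·1 per cent coverage, and limits 133 and 153 ml/min, in agreement with Table A1.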

7.4. Confidence limits for the mean of a normally distributed
variable†

Confidence limits for the population mean


In the improbable event that the glomerular filtration rate of dogs
was known to follow the normal distribution it would be possible to
calculate confidence limits for the mean of the nine observations given
in § 7.3. The sample mean is Σy/n = 1253/9 = 139·2 ml/min, compared
with the sample median of 138 ml/min. The sum of squared deviations
is given by (2.6.5) as

Σ_{i=1}^{n} (y_i − ȳ)² = Σ y_i² − (Σ y_i)²/n = 175 179 − (1253)²/9 = 733·56.

Therefore the variance of y is estimated to be s²(y) = 733·56/(9−1)
= 91·69; the variance of the mean is s²(ȳ) = 91·69/9 = 10·19 by eqn
(2.7.8), and the estimated standard deviation of the sample mean
glomerular filtration rate is s(ȳ) = √(10·19) ml/min = 3·192 ml/min.
These estimates have n−1 = 8 degrees of freedom (§ 2.6). From this
estimate of scatter, and the assumption that y (and therefore ȳ) is
normally distributed, limits can be calculated within which the mean
of the population from which the observations were drawn (which may

† The assumption of normality could be tested as in § 4.6 if there were more
observations, but with one sample of 9 no useful test can be made.

or may not be the population in which the investigator is really interested)
is likely to lie.
The limits must be based on Student's t distribution (§ 4.4) because
only the estimated standard deviation is available. Reference to tables
(see § 4.4) shows that, in the long run, 95 per cent of values of t (with
8 d.f.) will fall between t = −2·306 and t = +2·306. The definition of
t (eqn (4.4.1)) is (x−μ)/s(x) where x is normally distributed. In the
present example the (assumed) normally distributed variable of interest
is the sample mean, ȳ, so t is defined as (ȳ−μ)/s(ȳ).
It follows that in 95 per cent of experiments t = (ȳ−μ)/s(ȳ) is
expected to lie between −2·306 and +2·306, i.e.

P[−2·306 ≤ (ȳ−μ)/s(ȳ) ≤ +2·306] = 0·95,
∴ P[−2·306 s(ȳ) ≤ (ȳ−μ) ≤ +2·306 s(ȳ)] = 0·95,
∴ P[ȳ − 2·306 s(ȳ) ≤ μ ≤ ȳ + 2·306 s(ȳ)] = 0·95.†

This statement, which is analogous to (7.3.3), indicates our confidence
that the population mean, μ, lies between the P = 0·95 confidence
limits, viz. ȳ − 2·306 s(ȳ) = 139·2 − (2·306 × 3·192) = 131·8 ml/min and
ȳ + 2·306 s(ȳ) = 139·2 + (2·306 × 3·192) = 146·6 ml/min. Compare the
mean 139·2 ml/min and its P = 0·95 Gaussian confidence limits,
131·8 to 146·6 ml/min, with the median and its confidence limits found,
with fewer assumptions, in § 7.3.
Condensing the above argument into one formula, the Gaussian
confidence limits for μ (given an estimate of it, ȳ, the mean of a sample
of n normally distributed observations) are

ȳ ± t s(ȳ).   (7.4.1)

In general, the confidence limits for any normally distributed variable,
x, are

x ± t s(x),   (7.4.2)

where the value of Student's t is taken from tables (see § 4.4) for the
probability required and for the number of degrees of freedom associated
with s(x).
To be more sure that the limits will include μ (the population
value of ȳ; see § A1.1), they must be made wider. For example, the
value of t for P = 0·99 with 8 d.f. is, from tables, 3·355. That is, 0·5

† If this argument shakes you, see the footnote in § 7.3 (p. 104), reading ȳ for x and μ
for θ.

per cent (0·005) of the area under the curve for the distribution of t
with 8 d.f. lies below −3·355 and another 0·5 per cent above +3·355,
and 99 per cent lies between these figures. The 99 per cent Gaussian
confidence limits are then ȳ ± 3·355 s(ȳ), i.e. 128·5 to 149·9 ml/min.
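The whole calculation of this section condenses into a few lines of arithmetic. The Python sketch below (added for illustration; the value of Student's t is simply copied from tables, as in the text, rather than computed) reproduces the 95 per cent limits:

```python
from math import sqrt

gfr = [135, 133, 154, 124, 153, 142, 140, 134, 138]  # ml/min, from the text
n = len(gfr)
mean = sum(gfr) / n                               # 139.2 ml/min
ss = sum(y * y for y in gfr) - sum(gfr) ** 2 / n  # sum of squared deviations, eqn (2.6.5)
var_mean = ss / (n - 1) / n                       # s^2(ybar), eqn (2.7.8)
sem = sqrt(var_mean)                              # s(ybar) = 3.192 ml/min
t95 = 2.306                                       # Student's t, P = 0.95, 8 d.f., from tables
lower, upper = mean - t95 * sem, mean + t95 * sem # eqn (7.4.1)
```

The limits come out as 131·9 and 146·6 ml/min to one decimal place (the 131·8 quoted above reflects rounding the mean to 139·2 before subtracting).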

Confidence limits for new observations


The limits just found were those expected to contain μ, the population mean
value of y (and also of ȳ₁ and ȳ₂, see below). If limits are required within which a
new observation from the same population is expected to lie the result is rather
different. Suppose, as above, that n observations are made of a normally distributed
variable, y. The sample mean is ȳ₁ and the sample standard deviation s(y), say.
If a further m independent observations were to be made on the same population,
within what limits would their mean, ȳ₂, be expected to lie? The variable
ȳ₂ − ȳ₁ will be normally distributed with a population value μ − μ = 0, so t
= (ȳ₂−ȳ₁)/s[ȳ₂−ȳ₁] (see § 4.4). Using (2.7.3) and (2.7.8) the estimated variance
is s²[ȳ₂−ȳ₁] = s²(ȳ₁) + s²(ȳ₂) = s²(y)/n + s²(y)/m = s²(y)(1/n + 1/m). The best
prediction of the new observation, ȳ₂, will, of course, be the observed mean, ȳ₁.
This is the same as the estimate of μ, but the confidence limits must be wider
because of the error of the new observations. As above, P[−t < (ȳ₂−ȳ₁)/
s[ȳ₂−ȳ₁] < +t] = 0·95 so, by rearranging this as before, the confidence limits for
ȳ₂ are found to be

ȳ₁ ± t√[s²(y)(1/n + 1/m)].   (7.4.3)

For example, a single new observation (m = 1) of glomerular filtration rate
would have a 95 per cent chance (in the sense explained in § 7.9) of lying within
the limits calculated from (7.4.3), viz. 139·2 ± 2·306√[91·69(1/9 + 1)]; that is,
from 115·9 ml/min to 162·5 ml/min. These limits are far wider than those for μ.
Where m is very large, (7.4.3) reduces to the Gaussian limits for μ, eqn (7.4.1)
(as expected, because in this case ȳ₂ becomes the same thing as μ).
It is important to notice the condition that the m new observations are from
the same Gaussian population as the original n. As they are probably made
later in time there may have been a change that invalidates this assumption.
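Eqn (7.4.3) differs from (7.4.1) only in the extra 1/m inside the square root. Continuing the illustrative Python sketch (again with t = 2·306 copied from tables), the limits for a single new observation are:

```python
from math import sqrt

gfr = [135, 133, 154, 124, 153, 142, 140, 134, 138]  # ml/min, from the text
n = len(gfr)
mean = sum(gfr) / n
s2 = (sum(y * y for y in gfr) - sum(gfr) ** 2 / n) / (n - 1)  # s^2(y) = 91.69
t95 = 2.306         # Student's t, P = 0.95, 8 d.f., from tables
m = 1               # a single new observation
half_width = t95 * sqrt(s2 * (1 / n + 1 / m))                 # eqn (7.4.3)
lower, upper = mean - half_width, mean + half_width
```

These are the 115·9 and 162·5 ml/min quoted above; as m grows, the half-width shrinks towards t·s(ȳ), the limits of (7.4.1).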

7.5. Confidence limits for the ratio of two normally distributed
observations
If a and b are normally distributed variables, their ratio m = a/b
will not be normally distributed, so if the approximate variance of the
ratio is obtained from (2.7.16), it is only correct to calculate limits
by the methods of § 7.4 if the denominator is large compared with its
standard deviation (i.e. if g is small, see § 13.5). This problem is quite
a common one because it is often the ratio of two observations that is
of interest rather than, say, their difference.
If a and b were lognormally distributed (see § 4.5) then log a and
log b would be normally distributed, and so log m = log a − log b would
be normally distributed with var(log m) = var(log a) + var(log b)
from (2.7.3) (given independence). Thus confidence limits for log m
could be calculated as in § 7.4, log m ± t√[var(log m)], and the
antilogarithms found. See § 14.1 for a discussion of this procedure.
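As a numerical sketch of that log-transform procedure (in Python; every number here is invented purely for illustration and does not come from the text): suppose a = 150 and b = 100 were observed, with estimated variances of log₁₀ a and log₁₀ b of 0·001 and 0·002, and t = 2·306 taken from tables.

```python
from math import log10, sqrt

a, b = 150.0, 100.0                  # hypothetical observations
var_log_a, var_log_b = 0.001, 0.002  # hypothetical variances of log10 a and log10 b
t = 2.306                            # hypothetical Student's t from tables

log_m = log10(a) - log10(b)          # log of the ratio m = a/b
var_log_m = var_log_a + var_log_b    # eqn (2.7.3), assuming independence
half = t * sqrt(var_log_m)
lower, upper = 10 ** (log_m - half), 10 ** (log_m + half)  # antilogs give limits for m
```

Notice that the limits are not symmetrical about m = 1·5, as is characteristic of limits found on a logarithmic scale.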
When a and b are normally distributed the exact solution is not
difficult. But, because it looks more complicated at first sight, it
will be postponed until § 13.5 (see also § 14.1), nearer to the numerical
examples of its use in §§ 13.11-13.15.

7.6. Another way of looking at confidence limits


A more general method of arriving at confidence intervals will be
needed in §§ 7.7 and 7.8. The ingenious argument, showing that limits
found in the following way can be interpreted as described in § 7.9, is
discussed clearly by, for example, Brownlee (1965, pp. 121-32). It will
be enough here to show that the results in § 7.4 can be obtained by a
rather different approach.
For simplicity it will be supposed at first that the population standard
deviation, σ, is known. It is expected that in the long run 95 per cent
of randomly selected observations of a normally distributed variable
y, with population mean μ and population standard deviation σ, will
fall within μ ± 1·96σ (see § 4.2). In § 7.4 the normally distributed
variable of interest was ȳ, the mean of n observations, and similarly,
in the long run, 95 per cent of such means would be expected to fall
within μ ± 1·96σ(ȳ), where σ(ȳ) = σ/√n (by (2.7.9)). The problem is
to find limits that are likely to include the unknown value of μ.
Now consider various possible values of μ. It seems reasonable to
take as a lower limit, μ_L say, a value which, if it were the true value,
would make the observation of a mean as large as that actually
observed (ȳ_obs), or larger, a rare event, one that would occur in only
2·5 per cent of repeated trials in the long run, for example. In
Fig. 7.6.1(a) the normal distribution of ȳ is shown with the known
standard deviation σ(ȳ), and the hypothetical mean μ_L chosen in the
way just described.
Similarly, the highest reasonable value for μ, say μ_H, could be
chosen so that, if it were the true value, the observation of a mean
equal to or less than ȳ_obs would be a rare event (again P = 0·025, say).
This is shown in Fig. 7.6.1(b). It is clear from the graphs that ȳ_obs
= μ_L + 1·96σ(ȳ) = μ_H − 1·96σ(ȳ). Rearranging this gives μ_L = ȳ_obs
− 1·96σ(ȳ), and μ_H = ȳ_obs + 1·96σ(ȳ). If σ is not known but has to be
estimated from the observations, then σ(ȳ) must be replaced by
s(ȳ), and so 1·96 must be replaced by the appropriate value of Student's
t (for example 2·306 for P = 0·95 limits in the example in § 7.4; see
also § 4.4). When this is done μ_L and μ_H are the limits previously found
using (7.4.1).
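This argument can be checked numerically: instead of rearranging, one can solve for the value of μ that makes the observed mean a 2·5 per cent tail event. The Python sketch below does this by bisection for the known-σ case (the numbers are those of the § 7.4 example, with s(ȳ) = 3·192 treated, for illustration only, as if it were the known σ(ȳ)):

```python
from math import erf, sqrt

def upper_tail(mu, sigma, obs):
    # P(ybar >= obs) when ybar is normal with mean mu and s.d. sigma
    return 1 - 0.5 * (1 + erf((obs - mu) / (sigma * sqrt(2))))

def solve_mu_low(obs, sigma, alpha=0.025):
    # Find mu_L such that P(ybar >= obs | mu_L) = alpha, by bisection;
    # the tail probability increases as mu moves up towards obs.
    lo, hi = obs - 10 * sigma, obs
    for _ in range(100):
        mid = (lo + hi) / 2
        if upper_tail(mid, sigma, obs) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

obs, sigma = 139.2, 3.192
mu_low = solve_mu_low(obs, sigma)
mu_high = obs + (obs - mu_low)  # by the symmetry argued in the text
```

The solved μ_L agrees with the closed form ȳ_obs − 1·96σ(ȳ), as the rearrangement in the text requires.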

[Fig. 7.6.1: two sketched normal distributions of the sample mean ȳ, each with standard deviation σ(ȳ). In (a) the distribution is centred on the hypothetical mean μ_L, and 2·5 per cent of its area lies above the observed mean ȳ_obs, which is at μ_L + 1·96σ(ȳ); in (b) it is centred on μ_H, and 2·5 per cent of the area lies below ȳ_obs, at μ_H − 1·96σ(ȳ).]
FIG. 7.6.1. One way of looking at confidence limits. See text.

7.7. What is the probability of 'success'? Confidence limits for
the binomial probability
In §§ 3.2-3.5 it was described how the number of successes (r) out of
n trials of an event would be expected to vary in repeated sets of n
trials when the probability of 'success' was 𝒫 at each trial and the
probability of 'failure' was 1−𝒫. Usually, of course, the problem is
reversed. 𝒫 is unknown and must be estimated from the experimental
results. For example, if a drug were observed to cause improvement in
r = 7 out of n = 8 patients, who had been selected strictly randomly
(see § 2.3) from some population of patients, then the best estimate of
the proportion (𝒫) of patients in the population that would improve
when given the drug is r/n (as in § 3.4), i.e. 7/8 = 0·875 or 87·5 per
cent. What is the error of this estimate? Would it be unreasonable,
for example, to suppose that the population contained only 50 per cent
of 'improvers'? The answer can be found without any calculation at
all using Table A2, which is based on the following reasoning.
The approach described in § 7.6 can be used to find confidence limits for the
population value of 𝒫. For concreteness suppose that 95 per cent (or P = 0·95)
confidence limits are required for the population value of 𝒫 when r_obs 'successes'
have been observed out of n trials. The highest reasonable value of 𝒫, 𝒫_H say,
will be taken as the value that, if it were the true value, would make the observation
of r_obs or fewer successes a rare event (an event occurring in only 2·5 per cent
of repeated sets of n trials). Now the probability of r successes, P(r), is given
by (3.4.3), and r ≤ r_obs if r = 0 or 1 or ... or r_obs, so, using (3.4.3) and the
addition rule, (2.4.2), it is required that

P[r ≤ r_obs] = Σ_{r=0}^{r_obs} n!/(r!(n−r)!) 𝒫_H^r (1−𝒫_H)^{n−r} = 0·025.   (7.7.1)

The only unknown in this equation is 𝒫_H, the upper confidence limit for the
population proportion, so it can be solved for 𝒫_H. There is no simple way of
rearranging the equation to get 𝒫_H however, so tables are provided (Table A2)
giving the solution. Similarly, the lowest reasonable value, 𝒫_L, for the population
𝒫 (the lower confidence limit for 𝒫) is taken as the value that, if it were the
true value, would make the observation of r_obs successes or more (i.e. r = r_obs
or r_obs + 1 or ... or n) a rare event. Thus 𝒫_L is found by solving

P[r ≥ r_obs] = Σ_{r=r_obs}^{n} n!/(r!(n−r)!) 𝒫_L^r (1−𝒫_L)^{n−r} = 0·025.   (7.7.2)

Again the solution is tabulated in Table A2.
The use of Table A2


Confidence limits (95 and 99 per cent) for the population value of
100𝒫 are tabulated for any observed r, and sample sizes from n = 2
to n = 30, and also some values for n = 1000 for comparison. Other
sample sizes are tabulated in the Documenta Geigy Scientific Tables
(1962, pp. 86-103). In the example at the beginning of this section
r = 7 out of n = 8 patients improved (100r/n = 87·5 per cent
improvement). Consulting Table A2 with n = 8 and r = 7 shows
that the P = 0·95 confidence limits (100𝒫_L to 100𝒫_H from (7.7.1)
and (7.7.2)) are 47·35 to 99·68 per cent. In other words, if repeated
samples of 8 were taken from a population that actually contained
47·35 per cent of improvers, 2·5 per cent of the samples would contain
7 or more (i.e. 7 or 8) improvers. And if the population actually contained
99·68 per cent of improvers then, in the long run, 2·5 per cent
of samples would contain 7 or fewer improvers. Thus, if the drug were
tested on an infinite sample (rather than only 8) it would not be surprising
(see § 7.9 for a more precise interpretation) to find any proportion
of patients improving between 𝒫_L = 0·4735 and 𝒫_H = 0·9968.
The observation is compatible with any hypothetical population 𝒫
that lies between the confidence limits (see § 9.4) so the observation of
7 improving out of 8 cannot be considered incompatible with a true
improvement rate of 50 per cent (𝒫 = 0·5) at the P = 0·95 level of
significance. For greater certainty the P = 0·99 confidence limits
would be found from the tables. They are, of course, even wider,
36·85 to 99·94 per cent. A sample of 8 gives surprisingly little information
about the population it was drawn from, even when all the assumptions
of randomness and simple sampling (see § 3.2) are fulfilled.
The comparison of two observed binomial proportions is a different
problem. It is discussed in Chapter 8.
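Eqns (7.7.1) and (7.7.2) can also be solved numerically instead of consulting Table A2. The Python sketch below (function names invented for illustration) inverts the exact binomial tail sums by bisection for the r = 7, n = 8 example:

```python
from math import comb

def binom_cdf(r_obs, n, p):
    # P[r <= r_obs] for a binomial(n, p) variable: eqn (3.4.3) summed
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(r_obs + 1))

def invert(f, target):
    # Bisection on p in [0, 1] for a monotone function f of p
    lo, hi = 0.0, 1.0
    inc = f(hi) > f(lo)  # is f increasing in p?
    for _ in range(200):
        mid = (lo + hi) / 2
        if (f(mid) < target) == inc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, r_obs = 8, 7
# Upper limit: solve P[r <= r_obs] = 0.025, eqn (7.7.1)
p_high = invert(lambda p: binom_cdf(r_obs, n, p), 0.025)
# Lower limit: solve P[r >= r_obs] = 1 - P[r <= r_obs - 1] = 0.025, eqn (7.7.2)
p_low = invert(lambda p: 1 - binom_cdf(r_obs - 1, n, p), 0.025)
```

The result, 𝒫_L ≈ 0·4735 and 𝒫_H ≈ 0·9968, reproduces the 47·35 to 99·68 per cent limits read from Table A2.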

7.8. The black magical assay of purity in heart as an example of
binomial sampling
In a sadly neglected paper, Oakley (1943) proposed an assay method
for purity in heart. Oakley points out that lack of statistical knowledge
may vitiate a worth-while experiment, the apparent failure of which
may deter others from repeating it, and that this fate seems to have
overtaken an experiment carried out many years ago in Germany.
The only known source (Anon 1932) describes the experiment thus:
'The legend of the Brocken (the famous peak in the Harz Mountains noted for its
"spectre" and as the haunt of witches on Walpurgis Night), according to which a
"virgin he-goat" can be converted into "a youth of surpassing beauty" by
spells performed in a magic circle at midnight, was tested on June 17th by
British and German scientists and investigators, including Professor Joad and
Mr. Harry Price of the National Institute of Psychical Research. The object was
to expose the fallacy of Black Magic and also to pay a tribute to Goethe, who used
the legend in "Faust". Some wore evening dress. The goat was anointed with the
prescribed compound of scrapings from church bells, bats' blood, soot and honey.
The necessary "maiden pure in heart" who removed the white sheet from the
goat at the critical moment, was Fraulein Urta Bohn, daughter of one of the
German professors taking part in the test. Her mother was a Scotswoman (formerly
Miss Gordon). The scene was floodlit and filmed. As our photographs show,
the goat remained a goat, and the legend of the Brocken was dispelled!'

The main variables are the virgin he-goat and the maiden pure in
heart. Virginity may for the present be regarded as an absolute character,
but purity in heart no doubt varies from person to person.
Digitized by Google
112  The calculation and interpretation of confidence limits  § 7.8
Oakley therefore supposed that it might be possible to estimate the
purity in heart index (PHI) of a maiden by observing how many of a
group of he-goats are converted into young men. The original experi-
menters were clearly guilty of a grave scientific error in using only one
he-goat.
We shall assume, as Oakley did, that the conversion of he-goats into
young men is an all-or-nothing process; either complete conversion or
nothing occurs. Oakley supposed, on this basis, that a comparison
could be made between, on one hand, the percentage of he-goats
converted by maidens of various degrees of purity in heart, and, on the
other hand, the sort of pharmacological experiment that involves the
measurement of the percentage of individuals showing a specified
effect in response to various doses of a drug. In conformity with the
common pharmacological practice he supposed that a plot of percentage
he-goat conversion against log purity in heart index (log PHI)
would have the sigmoid form shown in Fig. 14.2.4. As explained in
Chapter 14, this implies that the log PHI required to convert individual
he-goats is a normally distributed variable. Furthermore it means that
infinite purity in heart is required to produce a population he-goat
conversion rate (HGCR) of 100 per cent.
Although there is a lack of experimental evidence on this point,
the present author feels that the assumption of a normal distribution
is, as so often happens, without foundation (see § 4.2). The implication
of the normality assumption, that there exist he-goats so resistant to
conversion that infinite purity in heart is needed to affect them, has
not been (and cannot be) experimentally verified. Furthermore the
very idea of infinite purity in heart seems likely to cause despondency
in most people, and should therefore be avoided until such time as its
necessity may be demonstrated experimentally. Oakley's treatment of
the problem requires, in addition, that PHI be treated as an independent
variable (in the regression sense, see Chapter 12), which raises problems
because there is no known method of measuring PHI other than he-goat
conversion.
In the light of these remarks it appears to the present author desirable
that the purity in heart index should be redefined simply as the
population percentage of he-goats converted.t This simple operational
definition means that the PHI of all maidens will fall between 0 and
100, and confidence limits for the true PHI can be found easily from

t i.e., in the more rigorous notation of Appendix 1, PHI = E[HGCR].


the observed conversion rate (which should be binomially distributed,
see §§ 3.2-3.5) using Table A2, as explained in § 7.7.
For example, if it were observed that a particular maiden caused
conversion of r = 2 out of n = 4 he-goats, the estimated PHI would be
100 × 2/4 = 50, and from Table A2 the confidence limits (P = 0·95)
for the true PHI would be 6·76-93·24 per cent. The information to
be gained from testing only four he-goats is so imprecise that it is
difficult to conceive of any use for it, and it could be recommended
that for preliminary work samples of at least ten he-goats should be
used. If r = 5 (50 per cent) of these were observed to be converted
Table A2 would give the confidence limits (P = 0·95) for the true PHI
as 18·71-81·29 per cent. While the most extreme forms of vice and of
virtue appear to be ruled out by this result, there is still considerable
uncertainty about the true value. If a greater degree of confidence were
required, as for example if a prospective husband
demanded a certain minimum (or, alternatively, maximum)
PHI before marriage, the P = 0·99 limits could be used. These
are 12·83-87·17 per cent, so even a
tolerant suitor might be forgiven for requiring a larger sample.
These calculations show that the assay is subject to considerable
experimental error; and the problem of measuring very high or very
low PHIs is even more difficult (because percentage responses around
50 per cent are the most accurately determined).t If the practical
difficulties involved in using large samples of he-goats could be
overcome, PHIs not far from 50 per cent could be determined with
reasonable accuracy. If r = 500 conversions out of n = 1000 were
observed, the confidence limits (P = 0·95) from Table A2 would be
46·85-53·15 per cent. If only r = 10
he-goats were converted out of 1000 (1 per cent) the confidence limits
(P = 0·95) should be 0·48-1·84 per cent. Although the relative error is
a good deal bigger than for conversion rates near 50 per cent, this is
likely to be precise enough for practical purposes.
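The exact limits tabulated in Table A2 can also be computed directly. The sketch below (in Python; the function names are mine, and it assumes Table A2 gives Clopper-Pearson exact limits, i.e. limits found by equating each binomial tail probability to α/2) reproduces the figures quoted above:

```python
from math import comb

def binom_cdf(r, n, p):
    """P(X <= r) when X is binomial with n trials, success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

def exact_limits(r, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence limits for a binomial proportion,
    found by bisection on the binomial tail probabilities."""
    def bisect(f):
        lo, hi = 0.0, 1.0
        for _ in range(60):              # 60 halvings: ample precision
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower limit solves P(X >= r | p) = alpha/2
    lower = 0.0 if r == 0 else bisect(lambda p: alpha/2 - (1 - binom_cdf(r - 1, n, p)))
    # upper limit solves P(X <= r | p) = alpha/2
    upper = 1.0 if r == n else bisect(lambda p: binom_cdf(r, n, p) - alpha/2)
    return lower, upper

print(exact_limits(2, 4))     # about (0.068, 0.932), i.e. 6.76-93.24 per cent
print(exact_limits(5, 10))    # about (0.187, 0.813), i.e. 18.71-81.29 per cent
```

The bisection works because each tail probability is monotone in p, so no special function tables are needed.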
A more precise and economical assay is clearly needed, but until
more experimental work has been done the present method will have to do.
However, as Oakley points out, 'All thoughtful people must regard
the indiscriminate conversion of he-goats into young men with concern,
for there is no knowing what educational, political or

t This depends on what is meant by accuracy. It is true if one is interested in the
relative error of the proportion converted (or not converted, whichever is the smaller).
It is also true if, as in Chapter 14, one is interested in the error of the dose producing a
specified proportion converted in quantal experiments.

economic views such young men might have, and it might well be that
their behaviour would bring scientific experiment into disrepute. This
is, however, a problem for necromancers rather than statisticians.'
7.9. Interpretation of confidence limits
The logical basis and interpretation of confidence limits are, even now,
a matter of controversy. However, few people would contest the
statement that if P = 0·95 (say) limits were calculated according to
the above rules in each of a large number of experiments then, in the

[Figure: repeated estimates with their confidence limits plotted against experiment number, 1-20]

Fig. 7.9.1. Interpretation of confidence limits. Repeated estimates (e.g.
sample mean) of a parameter (e.g. population mean), and their 95 per cent
confidence limits. In this (ideal) case one experiment (number 7) out of twenty
gave confidence limits that do not include the population value. One in twenty is
the predicted long-run frequency.

long run, 95 per cent of the intervals so calculated would include the
population mean (§ 7.4) or median (§ 7.3), μ, if the assumptions made
in the calculation were true. The limits must be regarded as optimistic,
as explained in § 7.2.
In any particular experiment a single confidence interval is calculated
which obviously either does or does not include μ. It might therefore
be thought that it could be said a priori that the probability that the
interval includes μ is either 0 or 1, but not some intermediate value.
However, in a series of identically conducted experiments, somewhat
different values of the sample median or mean, and of the sample
scatter, for example of s(x̄), will, in general, be found in every experiment.
The confidence limits will therefore be different from experiment
to experiment. The prediction is that, in the long run, 95 per cent (19
out of 20) of such limits will include μ, as illustrated in Fig. 7.9.1. It is
not predicted that in 95 per cent of experiments the true mean will
fall within the particular set of limits calculated in the one actual
experiment.
Thus, if one were willing to consider that the actual experiment
was a random sample from the population of experiments that might
have been done, i.e. that 'nature has done the shuffling', one could go
further and say that there was a 95 per cent chance of having done an
experiment in which the calculated limits include the true mean, μ.
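The long-run claim can be checked by simulation. The following sketch (in Python; the population values, sample size, and seed are arbitrary choices of mine, and 2·262 is the two-tail 5 per cent point of Student's t with 9 degrees of freedom) draws repeated samples from a normal population and counts how often the calculated interval covers the true mean:

```python
import random
from math import sqrt

random.seed(1)
MU, SIGMA, N = 50.0, 10.0, 10    # population mean and SD, sample size
T9 = 2.262                       # two-tail 5 per cent point of t, 9 d.f.

trials, covered = 10000, 0
for _ in range(trials):
    x = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = sum(x) / N
    s = sqrt(sum((xi - mean)**2 for xi in x) / (N - 1))   # sample SD
    half = T9 * s / sqrt(N)      # half-width of the 95 per cent interval
    covered += (mean - half <= MU <= mean + half)

print(covered / trials)          # close to 0.95 in the long run
```

In any one trial the interval either does or does not contain MU; it is only the long-run proportion that is predicted, as the text explains.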
Another interpretation of confidence intervals will be mentioned
later during the discussion of significance tests.

8. Classification measurements

'In your otherwise beautiful poem, there is a verse which reads:

"Every moment dies a man,


Every moment one is born."

It must be manifest that, were this true, the population of the world would be at
a standstill. In truth the rate of birth is slightly in excess of that of death. I
would suggest that in the next edition of your poem you have it read:

"Every moment dies a man,


Every moment 1 1/16 is born."

Strictly speaking this is not correct. The actual figure is a decimal so long that
I cannot get it in the line, but I believe 1 1/16 will be sufficiently accurate for
poetry.
I am etc.'

Letter said to have been written to Tennyson by Charles Babbage after reading
'The vision of sin' (Mathematical Gazette, 1927, p. 270)

8.1. Two independent samples. Relationship between various methods
CLASSIFICATION measurements and independent samples were dis-
cussed in § 6.4. Before starting any analysis § 8.7 should be read to
make sure that the results are not actually the incorrectly presented
results of an experiment with related samples. The fundamental
method of analysis for the 2 × 2 table (§ 8.2) is the randomization
method (see § 6.3), which is known as the Fisher exact test (see § 8.2).
There is an approximate method that gives similar results to the
exact test with sufficiently large samples. This method can be written
in two ways, as the normal approximation described in § 8.4 or as the
chi-squared test described in § 8.5. The exact test (§ 8.2) should be
used when the total number of observations, N, is up to 40. Published
tables, which only deal with N ≤ 40, make this easy. When N > 40
the exact test should be calculated directly from (8.2.1) if the frequency
in any cell of the table is very small. When the smallest expected value,
x (see § 8.5), is 5 or more there is reason to believe that the chi-squared
test (§ 8.5) corrected for continuity will be a good approximation
to the exact test (Cochran 1952).

8.2. Two independent samples. The randomization method and
the Fisher test
Randomization tests were introduced in § 6.3. As an example of the
result of classification measurements (see § 6.4), consider the clinical
comparison of two drugs, X and Y, on seven patients. It is funda-
mental to any analysis that the allocation of drug X to four of the
patients, and of Y to the other three be done in a strictly random way
using random number tables (see §§ 2.3 and 8.3). It is noted, by a
suitable blind method, whether each patient is improved (I) or not
improved (N). The result is shown in Table 8.2.1 (b).

TABLE 8.2.1
Possible results of the trial. Result (b) was actually observed

              (a)              (b)              (c)              (d)
           I  N  Total      I  N  Total      I  N  Total      I  N  Total
Drug X     4  0    4        3  1    4        2  2    4        1  3    4
Drug Y     0  3    3        1  2    3        2  1    3        3  0    3
Total      4  3    7        4  3    7        4  3    7        4  3    7

With drug X 75 per cent improve (3 out of 4), and with drug Y only
33 1/3 per cent improve. Would this result be likely to occur if X and Y
were really equi-effective? If the drugs are equi-effective then it follows
that whether an improvement is seen or not cannot depend on which
drug is given. In other words, each of the patients would have given
the same result even if he had been given the other drug, so the observed
difference in 'percentage improved' would be merely a result of the
particular way the random numbers in the table came up when the
drugs were being allocated to patients.t For example, for the experiment
in Table 8.2.1 the null hypothesis postulates that of the 7 patients, 4
would improve and 3 would not, quite independently of which drug was
given. If this were so, would it be reasonable to suppose that the
random numbers came up so as to put 3 of the 4 improvers, but only
1 of the 3 non-improvers, in the drug X group (as observed, Table
8.2.1(b))? Or would an allocation giving a result that appeared to
t Of course, if a subject who received treatment X during the trial were given an
equi-effective treatment Y at a later time, the response on the second occasion would
not be exactly the same as during the trial. But it is being postulated that if X and Y
are equi-effective then if one of them is given to a given subject at a given moment in
time, the response would have been exactly the same if the other had been given to the
same subject at the same moment.

favour drug X by as much as (or more than) this, be such a rare happening
as to make one suspect the premise of equi-effectiveness?
Now if the selection was really random, every possible allocation of
drugs to patients should have been equally probable. It is therefore
simply a matter of counting permutations (possible allocations) to
find out whether it is improbable that a random allocation will come
up that will give such a large difference between X and Y groups as
that observed (or a larger difference). Notice that attention is restricted
to the actual 7 patients tested without reference to a larger population
(see also § 8.4). Of the 7 patients, 4 improved and 3 did not.
Three ways of arriving at the answer will be described.
(a) Physical randomization. On four cards write 'improved' and on
three write 'not improved'. Then rearrange the cards in random order
using random number tables (or, less reliably, shuffle them), mimicking
exactly the method used in the actual experiment. Call the top four
cards drug X and the bottom three drug Y, and note whether or not
the difference between drugs resulting from this allocation of drugs to
patients is as large as, or larger than, that in the experiment. Repeat
this, say, 1000 times and count the proportion of randomizations that
result in a difference between drugs as large as or larger than that in the
experiment. This proportion is P, the result of the (one-tail) significance
test. If it is small it means that the observed result is unlikely to have
arisen solely because of the random allocation that happened to come
up in the real experiment, so the premise (null hypothesis) that the
drugs are equi-effective may have to be abandoned (see § 6.2). This
method would be tedious by hand, though not on a computer, but
fortunately there are easier ways of reaching the same results. The
two-tail test is discussed below.
(b) Counting permutations. As each possible allocation of drugs to
patients is equally probable, if the randomization was properly done,
the results of the procedure just described can be predicted in much
the same way that the results of coin tossing were predicted in § 3.2.
If the seven patients are distinguished by numbers, the four who
improve can be numbered 1, 2, 3, and 4, and those who do not can be
numbered 5, 6, and 7. According to the null hypothesis each patient
would have given the same response whichever drug had been given.
How many ways can the 7 be divided into groups of 3 and 4? The
answer is given, by (3.4.2), as 7!/(4!3!) = 35 ways. It is not necessary
to write out both groups since once the number improved has
been found in one group (say the smaller group, drug Y, for convenience),


TABLE 8.2.2
Enumeration of all 35 possible ways of selecting a group of 3 patients
from 7 to be given drug Y. Patients 1, 2, 3, and 4 improved and patients
5, 6, and 7 did not. Number of subjects improving with Y = b (see
Table 8.2.3(a))

Patients given drug Y              Result

5 6 7                              b = 0 improve.
                                   1 way giving
                                   Table 8.2.1(a).
                                   P = 1/35 = 0·029

1 5 6,  1 5 7,  1 6 7,
2 5 6,  2 5 7,  2 6 7,             b = 1 improves.
3 5 6,  3 5 7,  3 6 7,             12 ways all giving
4 5 6,  4 5 7,  4 6 7              Table 8.2.1(b).
                                   P = 12/35 = 0·343

1 2 5,  1 2 6,  1 2 7,
1 3 5,  1 3 6,  1 3 7,
1 4 5,  1 4 6,  1 4 7,             b = 2 improve.
2 3 5,  2 3 6,  2 3 7,             18 ways all giving
2 4 5,  2 4 6,  2 4 7,             Table 8.2.1(c).
3 4 5,  3 4 6,  3 4 7              P = 18/35 = 0·514

1 2 3,  1 2 4,  1 3 4,             b = 3 improve.
2 3 4                              4 ways all giving
                                   Table 8.2.1(d).
                                   P = 4/35 = 0·114

the number improved in the other group follows from the fact that the
total number improved is necessarily 4. All 35 ways in which the drug Y
group could have been constituted are listed systematically in Table
8.2.2. If the randomization was done properly each way should have
had an equal chance of being used in the experiment. Notice that
proper randomization in conducting the experiment is crucial for the
analysis of the results. It is seen that 12 out of the 35 result in one
improved, two not improved in the drug Y group, as was actually
observed. Furthermore, 1 out of 35 shows an even more extreme
result, no patient at all improving in the drug Y group, as shown in
Table 8.2.1(a).
Thus P = 12/35 + 1/35 = 0·343 + 0·029 = 0·372 for a one-tail testt
(see § 6.1). This is the probability (the long-run proportion of repeated
experiments) that a random allocation of drugs to patients would
be picked that would give the results in Table 8.2.1(a) or 8.2.1(b), i.e.
that would give results in which X would appear as superior to Y as in
the actual experiment (Table 8.2.1(b)), or even more superior (Table
8.2.1(a)), if X and Y were, in fact, equi-effective. This probability is
not low enough to suggest that X is really better than Y. Usually a
two-tail test will be more appropriate than this one-tail test, and this is
discussed below.
Using the results in Table 8.2.2, the sampling distribution under the
null hypothesis, which was assumed in constructing Table 8.2.2, is
plotted in Fig. 8.2.1. This is the form of Fig. 6.1.1 that it is appropriate
to consider when using the randomization approach. The variable on the
abscissa is the number of patients improved on drug Y, i.e. b in the
notation of Table 8.2.3{a). Given this figure the rest of the table can
be filled in, using the marginal totals, so each value of b corresponds to a
particular difference in percentage improvement between drugs X and
Y. Fig. 8.2.1 is described as the randomization (or permutation) distribu-
tion of b, and hence of the difference between samples, given the null
hypothesis. The result of a one-tail test of significance when the
experimentally observed value is b = 1 (Table 8.2.1(b)), is the shaded
area (as explained in § 6.1), i.e. P = 0·372 as calculated above.
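For readers with a computer at hand, the whole of Table 8.2.2 and the randomization distribution of Fig. 8.2.1 can be generated by brute-force enumeration. A minimal Python sketch (variable names are mine):

```python
from itertools import combinations
from collections import Counter

improvers = {1, 2, 3, 4}                 # under the null hypothesis these
patients = range(1, 8)                   # 4 of the 7 patients improve anyway

# b = number of improvers falling in the drug Y group (of size 3)
dist = Counter(len(improvers.intersection(y))
               for y in combinations(patients, 3))

total = sum(dist.values())               # 35 equally probable allocations
print({b: dist[b] for b in sorted(dist)})   # {0: 1, 1: 12, 2: 18, 3: 4}
print((dist[0] + dist[1]) / total)          # one-tail P = 13/35 = 0.371...
```

The counts 1, 12, 18, and 4 are exactly those of Table 8.2.2, and the one-tail P differs from the 0·372 quoted above only because the latter is a sum of two rounded terms.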
The two-tail test. Suppose now that the result in Table 8.2.1(a)
had been found in the experiment (b = 0). A one-tail test would give
P = 1/35 = 0·029, and this is low enough for the premise of equi-
effectiveness of the drugs to be suspect if it is known beforehand that
Y cannot possibly be better than X (the opposite result to that observed).
As this is usually not known a two-tail test is needed (see § 6.1). However,
the most extreme result in favour of drug Y (b = 3 as in Table
t This is a one-tail test of the null hypothesis that X and Y are equi-effective, when
the alternative hypothesis is that X is better than Y. If the alternative to the null
hypothesis had been that Y was better than X (the alternative hypothesis must, of
course, be chosen before the experiment) then the one-tail P would have been 12/35
+18/35+4/35 = 0·971, the probability of a result as favourable to Y as that observed,
or more favourable, when X and Y are really equi-effective.

8.2.1(d)) is seen to have P = 0·114. It is therefore impossible that a
verdict in favour of drug Y could have been obtained with these
patients. If the drugs were really equi-effective then, if the hypothesis
of equi-effectiveness were rejected every time b = 0 or b = 3 (the
two most extreme results), it would be (wrongly) rejected in 2·9+11·4
= 14·3 per cent of trials, far too high a level for the probability of an
error of the first kind (see § 6.1, para. 7). A two-tail test is therefore
not possible with such a small sample. This difficulty, which can only
occur with very small samples (it does not happen in the next example),
has been discussed in a footnote in § 6.1 (p. 89).

[Figure: histogram of the randomization distribution of b, with ordinates 0·029, 0·343, 0·514, and 0·114 at b = 0, 1, 2, 3]

FIG. 8.2.1. Randomization distribution of b (the number of patients improving
on drug Y), when X and Y are equi-effective (i.e. null hypothesis true).

(c) Direct calculation. The Fisher test. It would not be feasible to
write out all permutations for larger samples. For two samples of ten
there are 20!/(10!10!) = 184756 permutations. Fortunately it is not
necessary. If a general 2 × 2 table is symbolized as in Table 8.2.3(a)
TABLE 8.2.3

               success  failure  total            success  failure  total
treatment X       a       A-a      A                 8        7       15
treatment Y       b       B-b      B                 1       11       12
total             C        D       N                 9       18       27
                   (a)                                (b)

122 § S.2
then Fisher has shown that the proportion of permutations giving rise
to the table is
AIBIO!D!
p= . (S.2.1)
Nla!(A-a) !b!(B-b)!
For example, for Table S.2.1(b), P = 4!3!4 !3 !/(713!I1l !21) = 12/35
= 0·343 as already found. With larger figures (S.2.1) is mGSt con-
veniently evaluated using tables of logarithms of factorials (e.g. Fisher
and Yates, 1963).
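Equation (8.2.1) is also easily evaluated by machine instead of log-factorial tables. A Python sketch (the function name is mine):

```python
from math import factorial as fact

def table_prob(a, A, b, B):
    """Probability (8.2.1) of one particular 2 x 2 table:
    A!B!C!D! / (N! a!(A-a)! b!(B-b)!)."""
    C, D = a + b, (A - a) + (B - b)
    N = A + B
    num = fact(A) * fact(B) * fact(C) * fact(D)
    den = fact(N) * fact(a) * fact(A - a) * fact(b) * fact(B - b)
    return num / den

print(table_prob(3, 4, 1, 3))        # 12/35 = 0.3428..., Table 8.2.1(b)
# one-tail P for Table 8.2.3(b): observed table plus the more extreme one
print(table_prob(8, 15, 1, 12) + table_prob(9, 15, 0, 12))   # about 0.018
```

The second result is the 0·018 quoted below from the tables of Finney et al.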
In fact no calculation at all is necessary as tables have been published
(Finney, Latscha, Bennett, and Hsu, 1963) for testing any 2 × 2 table
with A and B, or C and D, both not more than 40. Unfortunately, to
keep the tables a reasonable size it is not possible to find the exact P
value for all 2 × 2 tables, but it is given for those 2 × 2 tables with
marginal totals up to 30 for which P ≤ 0·05 (one tail). The published
tables are for B ≤ A and b < a only, to avoid duplication. If the
table to be tested does not comply with this, rows and/or columns
must be interchanged until it does. As an example, the table in Table
8.2.3(b), which is from the introduction to the table of Finney et al.
(1963), is tested using the appropriate part of their table, which has been
reproduced in Table 8.2.4.
TABLE 8.2.4
Exact test for the 2 × 2 table (extract from tables of Finney et al. (1963))

                              Probability (nominal)
             a        0·05         0·025        0·01         0·005

A = 15      15      8 0·028      7 0·010      7 0·010      6 0·003
B = 12      14      7 0·043      6 0·016      5 0·006      4 0·002
            13      6 0·049      5 0·019      4 0·007      3 0·002
            ...
             9      2 0·028      1 0·007      1 0·007      0 0·001
             8      1 0·018      1 0·018      0 0·003      0 0·003
             7      1 0·038      0 0·007      0 0·007

The observed Table 8.2.3(b) has A = 15, B = 12, and a = 8.
Entering Table 8.2.4 with these values shows under each nominal
probability a figure in bold type which is the largest value of b that is just
'significant in a one-tail test at the 5 per cent (or 2·5, 1, or 0·5 per cent)
level', i.e. for which the one-tailt P ≤ 0·05 (or 0·025, 0·01, or 0·005).
The exact value of P is given in smaller type. It is the nearest value,
given that b must be a whole number, that is not greater than the
nominal value. In this example the one-tail P corresponding to the
observed b = 1 is 0·018. This is the sum of the P values calculated
from (8.2.1) for the observed table (a = 8, b = 1, P = 0·017), and the
only possible more extreme one with the same marginal totals (a = 9,
b = 0, P = 0·001). To find the two-tail P value (see § 6.1 and above)
consider the distribution of b analogous to Fig. 8.2.1. In this case b
can vary from 0 to 9 and if the null hypothesis were true it would be
4 on the average (see § 8.6). The one-tail P found is the tail of the
distribution for b ≤ 1. It is required to cut off an area as near as
possible to this in the other tail of the distribution (b > 4), as in Fig. 6.1.1.
No value of b cuts off exactly P = 0·018 but b = 7 cuts off an area of
P = 0·019 that is near enough (see footnote, § 6.1, p. 89). This is the
sum of the probabilities of b = 7 and all the more extreme (b = 8 and
b = 9) results. It can be found from the tables of Finney et al. by the
method described in their introduction. The table has a = 2, b = 7,
A - a = 13, B - b = 5 so columns are interchanged, as mentioned
above, and the table entered with 13 and 5 rather than 2 and 7, as
marked in Table 8.2.4. Therefore if it were resolved to reject the null
hypothesis whenever b ≤ 1 (as observed) or when b ≥ 7 (opposite tail)
then, if the null hypothesis was in fact true, the probability that it
would be rejected (wrongly), an error of the first kind, would be
P = 0·018+0·019 = 0·037. This result for the two-tail test is small
enough to make one question the null hypothesis, i.e. to suspect a
real difference between the treatments (see § 6.1).
In practice, if the samples are not too small, it would be adequate,
and much simpler, to double the one-tail P from the table to get the
required two-tail P.
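The same two-tail calculation can be sketched by summing the hypergeometric probabilities of b directly (in Python; variable names are mine):

```python
from math import comb

A, B, C = 15, 12, 9             # margins of Table 8.2.3(b); N = A + B = 27

def prob_b(b):
    """P(b successes in the Y sample | margins fixed): hypergeometric."""
    return comb(A, C - b) * comb(B, b) / comb(A + B, C)

lower_tail = sum(prob_b(b) for b in range(0, 2))        # b <= 1 (observed)
upper_tail = sum(prob_b(b) for b in range(7, C + 1))    # b >= 7 (opposite)
print(round(lower_tail, 3))                  # 0.018
print(round(upper_tail, 3))                  # 0.019
print(round(lower_tail + upper_tail, 3))     # 0.037
```

The three printed values are exactly those obtained from the tables of Finney et al. above.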

8.3. The problem of unacceptable randomizations


Sometimes it will be found that when two samples are selected at
random one sample contains, for example, all the men and the other
all the women. In fact if this does not happen sometimes, the selection
cannot be random. It seems silly to carry out an experiment in which
t For the case when it is decided, before the experiment, that the only alternative to
the null hypothesis is a difference between X and Y in the observed direction.

treatment X is given only to men and treatment Y only to women.
Yet the logical basis of significance tests will be destroyed if the
experimenter rejects randomizations producing results he does not
like. Often this will be preferable to the alternative of doing an experiment
that is, on scientific grounds, silly. But it should be realized that
the choice must be made.
There is a way round the problem if randomization tests are used.
If it is decided beforehand that any randomization that produces two
samples differing by more than a specified extent in sex composition
(or weight, age, prognosis, or any other criterion) is unacceptable then,
if such a randomization comes up when the experiment is being
designed, it can be legitimately rejected if, in the analysis of the results,
those of the possible randomizations that differ excessively, according to
the previously specified criteria, are also rejected, in exactly the same
way as when the real experiment was done. So, in the case of Table
8.2.2, the number of possible allocations of drugs to the 7 patients
could be reduced to less than 35. This can only be done when using
the method of physical randomization, or a computer simulation
of this process, or writing out the permutations as in Table 8.2.2.
The shorter methods using calculation (e.g. from (8.2.1)), or published
tables (e.g. for the Fisher exact test, § 8.2, or the Wilcoxon tests,
§§ 9.3 and 10.4), cannot be modified to allow for rejection of randomizations.
8.4. Two independent samples. Use of the normal approximation
Although the reasoning in § 8.2 is perfectly logical, and although
there is a great deal to be said for restricting attention to the observa-
tions actually made since it is usually impossible to ensure that any
further observations will come from the same population (see §§ 1.1
and 7.2), the exact test has nevertheless given rise to some controversy
among statisticians. It is possible to look at the problem differently.
If, in the example in Table 8.2.1, the 7 patients were thought of as
being selected from a larger population of patients then another sample
of 7 would not, in general, contain 4 who improved and 3 who did not.
This is considered explicitly in the approximate method described in
this section. However there is reason to suppose that the exact test
of § 8.2 is best, even for 2 X 2 tables in which the marginal totals are
not fixed (Kendall and Stuart 1961, p. 554).
Consider Table 8.2.3 again but this time imagine two infinite populations
(e.g. X-treated and Y-treated) with true probabilities of success
(e.g. improved) ℘₁ and ℘₂ respectively. From the first population a
sample of A individuals is drawn at random and is observed to contain a
successes (e.g. improved patients). Similarly b successes out of B are
observed in the sample from the second population. The experimental
estimates of ℘₁ and ℘₂ are, as in § 3.4, p₁ = a/A and p₂ = b/B, the
observed proportions of successes in the samples from the two populations.
In repeated trials a and b should vary as predicted by the binomial
distribution (see § 3.4).

Use of the normal approximation to the binomial

It is required to test the null hypothesis that ℘₁ = ℘₂, both being
℘, say. If this were so then on the average the observed proportions
would be the same too, so p₁ - p₂ would be distributed about a mean
value of zero (cf. Fig. 6.1.1). It was mentioned in § 3.4 (and illustrated
in Fig. 3.4.1) that if n is reasonably large the discontinuous binomial
distribution of p is quite well approximated by a continuous normal
distribution. It will therefore be supposed, as an approximation, that
p₁ and p₂ are both normally distributed. This implies that the difference
between them (p₁ - p₂) will be normally distributed with, according to
the null hypothesis, a population mean (μ) of zero. The standard
deviation of this distribution can now be found by using (3.4.5) to
find the true variances of p₁ and p₂ which, given the null hypothesis,
are

    var(p₁) = ℘(1-℘)/A,   var(p₂) = ℘(1-℘)/B.          (8.4.1)

If p₁ and p₂ are independent, as they will be if the samples are independent
as assumed (cf. § 6.4), the variance of their difference will be,
using (2.7.3),

    var(p₁-p₂) = var(p₁) + var(p₂) = ℘(1-℘)(1/A + 1/B).          (8.4.2)

The true value, ℘, is, of course, unknown, and it must be estimated
from the experimental results. No allowance is made for this, which is
another reason why the method is only approximate. The natural
estimate of ℘, under the null hypothesis, is to pool the two samples
and divide the total number of successes by the total number of trials
(e.g. total number improved by total number of patients), i.e.
p = (a+b)/(A+B). Thus, taking x = (p₁-p₂) as the normal variable,
with, according to the null hypothesis, μ = 0, an approximate normal
deviate (see § 4.3) can be calculated, using (4.3.1) and (8.4.2). This
value of u can then be referred to tables of the standard normal distribution
(see § 4.3).

    u = (x-μ)/σ(x) = (p₁-p₂)/√[p(1-p)(1/A + 1/B)].          (8.4.3)

Applying this method to the results in Table 8.2.3 gives p₁ = a/A
= 8/15, p₂ = b/B = 1/12, p = (a+b)/(A+B) = 9/27 and so, using
(8.4.3), the approximate normal deviate is

    u = (8/15 - 1/12)/√[(9/27)(1 - 9/27)(1/15 + 1/12)] = 2·4648.

According to Table 1 of the Biometrika tablest about 1·4 per cent of the
area of the standard normal distribution lies outside u = ±2·4648
(0·7 per cent in each tail). The result of the test, P = 0·014, is seen to
be a poor approximation to the exact result, P = 0·037, found at the
end of § 8.2. A better approximation can be found by using a 'correction
for continuity' and this should always be done.

Yates' correction for continuity

Say p = r/n in general. It can be shown (e.g. Brownlee (1965, pp.
139, 152)) that the approximation of the continuous normal distribution
to the discontinuous binomial is improved if 0·5 is added to or subtracted
from r (or 0·5/n is added to or subtracted from p), so as to
make the deviation from the null hypothesis smaller. Thus a better
approximation than (8.4.3) for the normal deviate is

    u = [(p₁ - 0·5/A) - (p₂ + 0·5/B)]/√[p(1-p)(1/A + 1/B)],          (8.4.4)

where p₁ > p₂. Using the results in Table 8.2.3 again gives u = 2·054.
Again using Table 1 of the Biometrika tables it is found that 4·0 per
cent of the total area of the standard normal distribution lies outside
u = ±2·054, as shown in Fig. 8.4.1 (cf. Fig. 6.1.1). In other words, in
repeated experiments it would be expected, if the null hypothesis were
true, that in 2·0 per cent of experiments u would be less than -2·054,
t This table actually gives the area below u = +2·4648, i.e. 1-0·007 = 0·993.
See § 4.3 for details.

and in 2·0 per cent u would be greater than +2·054. This is a two-tail
test (see § 6.1).
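The whole calculation, with and without the correction, can be sketched in a few lines of Python (using the identity that the two-tail normal area outside ±u is erfc(u/√2); variable names are mine):

```python
from math import sqrt, erfc

a, A = 8, 15                 # successes / sample size for treatment X
b, B = 1, 12                 # and for treatment Y
p1, p2 = a / A, b / B
p = (a + b) / (A + B)        # pooled estimate under the null hypothesis
se = sqrt(p * (1 - p) * (1 / A + 1 / B))

u = (p1 - p2) / se                                   # eqn (8.4.3)
u_corr = ((p1 - 0.5 / A) - (p2 + 0.5 / B)) / se      # Yates' correction (8.4.4)

def two_tail(u):
    """Area of the standard normal distribution outside -u..+u."""
    return erfc(abs(u) / sqrt(2))

print(round(u, 4), round(two_tail(u), 3))            # 2.4648 0.014
print(round(u_corr, 3), round(two_tail(u_corr), 3))  # 2.054 0.04
```

This reproduces both normal deviates quoted in the text and their two-tail P values, 0·014 uncorrected and 0·04 corrected.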
The result of the test. The probability of observing a difference (positive or
negative) in success rate between the sample from population 1 (X-treated) and
that from population 2 (Y-treated) as large as, or larger than, the observed
sample difference, if there were no real difference between the treatments
(populations), would be approximately 0·04, a 1 in 25 chance.
The corrected result, P = 0·04, is quite a good approximation to
the exact probability, P = 0·037, found at the end of § 8.2. It is low
enough to make one suspect (without very great confidence) a real
difference between the treatments.

[Figure: the standard Gaussian curve; ordinate, probability density; abscissa, standard normal deviate, u.]

FIG. 8.4.1. Normal approximation to the binomial. Difference between two
binomial proportions is converted to an approximate normal deviate, u, and
referred to the standard Gaussian curve shown in the figure.
Nonparametric nature of test. Although the normal distribution is
used, the test just described is still essentially a nonparametric test.
This is because the fundamental assumption is that the proportion of
successes is binomially distributed and this can be assured by proper
sampling methods. The normal distribution is used only as a mathe-
matical approximation to the binomial distribution.

8.5. The chi-squared (χ²) test. Classification measurements
with two or more independent samples
The probability distribution followed by the sum of squares of f independent
standard normal variates (i.e. Σ(x_i - μ_i)²/σ_i², where x_i is normally distributed with a
mean of μ_i and a standard deviation of σ_i, see § 4.3), is called the chi-squared
distribution with f degrees of freedom, denoted χ²(f). As suggested by the definition,
the scatter seen in estimates of the population variance, calculated from repeated
samples from a normally distributed population, follows the χ² distribution. In
fact χ² = fs²/σ² where s² is an estimate of σ² based on f degrees of freedom. The
consequent use of χ² for testing hypotheses about the true variance of such a
population is described, for example, by Brownlee (1965, p. 282).

In the special case of f = 1 d.f., one has χ²(1) = u², the square of a
single standard normal variate. Tables of the distribution of chi-
squared with one degree of freedom can therefore be used, by squaring
the values of u found in § 8.4, as an approximate test for the 2 × 2
table. In practice χ²(1) is not usually calculated by the method given
for the calculation of u, but by another method which, although it
does not look related at first sight, gives exactly the same answer, as
will be seen. The conventional method of calculation to be described
has the advantage that it can easily be extended to larger tables of
classification measurements than 2 × 2. An example is given below.
The form in which χ² is most commonly encountered is that appro-
priate for testing (approximately) goodness of fit, and tables of
classification measurements (contingency tables). If x_o is an observed
frequency and x_e is the expected value of the frequency on some
hypothesis, then it can be shown that the quantity

χ² = Σ (x_o - x_e)²/x_e          (8.5.1)

which measures the discrepancy between observation and hypothesis,
is distributed approximately like χ². This approach will be used to test
the 2 × 2 table (Table 8.2.3) that has already been analysed in §§ 8.2
and 8.4.
The expected values, x_e, of the frequencies, given the null hypothesis
that the proportion of successes is the same in both populations, are
calculated as follows. The best estimate of this proportion of successes
is, as found in § 8.4, p = (a+b)/(A+B) = 9/27 = 0·3333. Therefore,
if the null hypothesis were true, the best estimate of the number† of
successes in the sample from population 1 (e.g. number of patients
improved on drug X) would be 0·3333 × 15 = 5, and similarly the
expected number for population 2 (e.g. drug Y) would be 0·3333 × 12

† This need not be a whole number (see Table 8.5.1(b) for example). It is a predicted
long-run average frequency. The individual frequencies must, of course, be integers.

= 4. The original table of observations, and the table of values expected
on the null hypothesis, are thus:

              Observed frequencies (x_o)    Expected frequencies (x_e)

              success  failure  total       success  failure  total

Population 1     8        7       15           5       10       15
Population 2     1       11       12           4        8       12

Total            9       18       27           9       18       27

The summation in (8.5.1) is over all of the cells of the table. The
differences (x_o - x_e) are 8-5 = 3, 1-4 = -3, 7-10 = -3, and
11-8 = 3. Thus, from (8.5.1),

χ² = 3²/5 + (-3)²/4 + (-3)²/10 + 3²/8 = 6·075.

This is a value of χ² with one degree of freedom, because only one
expected value need be calculated, the rest following by difference
from the marginal totals. It is, as expected, exactly the square of the
value of u found in § 8.4: 2·4648² = 6·075.
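The conventional calculation just described can be reproduced in a few lines (a sketch only, not from the book; the expected frequency in each cell is row total × column total / grand total, which is the same as multiplying the pooled proportion by the sample size):

```python
# Observed 2x2 table (Table 8.2.3): rows = populations, columns = success/failure
obs = [[8, 7],
       [1, 11]]

row_totals = [sum(row) for row in obs]         # 15, 12
col_totals = [sum(col) for col in zip(*obs)]   # 9, 18
grand = sum(row_totals)                        # 27

# Expected frequency for each cell, from the marginal totals
exp = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
           for i in range(2) for j in range(2))
print(round(chi2, 3))   # 6.075, the square of u = 2.4648 found in section 8.4
```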

Correction for continuity


As in § 8.4, this approximate test for association in a 2 × 2 table
should not be applied without using the correction for continuity.
Simply reduce the absolute values of the deviations by 0·5, giving

χ²(1) = 2·5²/5 + (-2·5)²/4 + (-2·5)²/10 + 2·5²/8 = 4·219.

Again it is seen that this is exactly the square of the (corrected) value of
u found in § 8.4, u² = 2·054² = 4·219 = χ². This can be referred
directly to a table (e.g. Fisher and Yates, 1963, Table IV; or Pearson
and Hartley, 1966, Table 8) of the chi-squared distribution which,
for one degree of freedom, has the appearance shown in Fig. 8.5.1.
It is found that 4·0 per cent of the area under the curve lies above
χ² = 4·219 (because of the way the tables are constructed the most
accurate value that can be found from them is that the area is a little
less than 5·0 per cent, i.e. 0·05 > P > 0·025). This is exactly the same
as found in § 8.4, as it should be, since the χ² test for the 2 × 2 table is just
another way of writing the test using the normal approximation to the
binomial. The result of the test states that if the null hypothesis were
true, a value of χ²(1) as large as 4·219 or larger would be found in
4·0 per cent of repeated experiments in the long run. This casts a
certain amount of suspicion on the null hypothesis as explained in
§ 8.4.
It should be noticed that the probability found using χ² is that
appropriate for a two-tail test of significance (as shown in § 8.4,
Fig. 8.4.1, cf. § 6.1) in spite of the fact that only one tail of the χ²
distribution is considered in Fig. 8.5.1. This is because χ² involves the
squares of deviations, so deviations from the expected values in either
direction increase χ² in the same direction.

[Figure: probability density curves of chi-squared for 1 and 4 degrees of freedom; abscissa, χ², running from 0 to 9, with the observed value 4·219 marked.]

FIG. 8.5.1. The distribution of chi-squared. The observed value, 4·219, for
chi-squared with one degree of freedom (see text) would be exceeded in only 4
per cent of repeated experiments in the long run if the null hypothesis were true.
The distribution for 4 degrees of freedom is also shown. (See Chapter 4 for
explanation of probability density.)

Use of chi-squared for testing association in tables of classification measure-
ments larger than 2 × 2
If the results of treatments X and Y had been classified in more than
two ways, for example success, no change, or failure, the experiment
shown in Table 8.2.3(b) might have turned out as in Table 8.5.1(a).

TABLE 8.5.1

                          no
             success    change   failure

Treatment X      8         3        4      15
Treatment Y      1         5        6      12     (a) observed

                 9         8       10      27

                          no
             success    change   failure

Treatment X      5       4·44     5·56     15     (b) expected on
Treatment Y      4       3·56     4·44     12         null hypothesis

                 9         8       10      27

A proper randomization analysis could be done similar to that in
§ 8.2, but no tables exist to shorten the calculations for tables larger
than 2 × 2. Often two or more columns or rows can be pooled (giving,
for example, Table 8.2.3(b) again) to give 2 × 2 tables, which may
answer the relevant questions. For example, is the proportion of
success and of [no change or failure] the same for X and Y? This
question is answered by the test of Table 8.2.3(b).
Table 8.5.1(a) itself can be tested using the χ² approximation,
which is quite sufficiently accurate if all the expected frequencies are
at least 5. (They are not in this case; the test is not really safe with
such small numbers.) On the null hypothesis that the proportion of
successes is the same for treatments X and Y this proportion would
be estimated as 9/27 = 0·3333. So the number of successes expected on
the null hypothesis, when 15 individuals are treated with X, is 0·3333
× 15 = 5. Proceeding similarly for 'no change' and 'failure' gives
Table 8.5.1(b). Thus, from (8.5.1),

χ²(2) = (8-5)²/5 + (3-4·44)²/4·44 + … + (6-4·44)²/4·44 = 6·086.

Note that no correction for continuity is used for tables larger than
2 × 2. χ² has two degrees of freedom since only two cells can be filled
in Table 8.5.1(b), the rest then follow from the marginal totals. Consult-
ing a table of the χ² distribution (e.g. Fisher and Yates 1963, Table
IV) shows that a value of χ² (with 2 d.f.) equal to or larger than 6·086
would occur in slightly less than 5 per cent of trials in the long run, if
the null hypothesis were true; i.e. for a two-tail test 0·025 < P < 0·05.
This is small enough to cast some suspicion on the null hypothesis.
Independence of classifications (e.g. of treatment type and success
rate) is tested in larger tables in an exactly analogous way, χ² being the
sum of rk terms, and having (r-1)(k-1) degrees of freedom, for a
table with r rows and k columns.
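The same recipe extends to any r × k table; a short function along these lines (a sketch of mine, not the book's) reproduces the value 6·086 for Table 8.5.1(a):

```python
def chi_squared(table):
    """Chi-squared statistic and degrees of freedom for an r x k table
    of frequencies; expected values come from the marginal totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = sum((x - r * c / grand) ** 2 / (r * c / grand)
               for r, row in zip(row_totals, table)
               for c, x in zip(col_totals, row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Table 8.5.1(a): success / no change / failure for treatments X and Y
chi2, df = chi_squared([[8, 3, 4],
                        [1, 5, 6]])
print(round(chi2, 3), df)   # 6.086 2
```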

8.6. One sample of observations. Testing goodness of fit with
chi-squared
In examples in §§ 8.2-8.5 two (r in general) samples (treatments)
were considered. Each sample was classified into two or more (k in
general) ways. The chi-squared approximation is the most convenient
method of testing a single sample of classification measurements, to
see whether or not it is reasonable to suppose that the number of
subjects (or objects, or responses) that are observed to fall in each of the
k classes is consistent with some hypothetical allocation into classes
that one is interested in.
For example, suppose that it were wished to investigate the (null)
hypothesis that a die was unbiased. If it were tossed say 600 times the
expected frequencies, on the null hypothesis, of observing 1, 2, 3, 4, 5, and 6
(the k = 6 classes) would all be 100, so x_e is taken as 100 for each class in
calculating the value of eqn. (8.5.1). The observed frequencies are the
x_o values. The value of eqn. (8.5.1) would have, approximately, the
chi-squared distribution with k-1 = 5 degrees of freedom if the null
hypothesis were true, so the probability of finding a discrepancy be-
tween observation and expectation at least as large as that observed
could be found from tables of the chi-squared distribution as above.
(See also numerical example below.)
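As a sketch of the die example (the 600 tosses and the expected 100 per face are from the text; the observed counts below are invented purely for illustration):

```python
# Hypothetical result of 600 tosses; expected 100 per face on the null hypothesis
observed = [90, 105, 98, 112, 100, 95]
expected = [100] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1     # k - 1 = 5 degrees of freedom
print(chi2, df)            # 2.98 5, far below the 5 per cent point of chi-squared(5)
```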
As another example suppose that it were wished to investigate the
(null) hypothesis that all students in a particular college are equally
likely to have smoked, whatever their subject. Again the null hypothesis
specifies the number of subjects expected to fall into each of the k
classes (physics, medicine, law, etc.). If there are 500 smokers altogether
the observed numbers in physics, medicine, law, etc. are the x_o values
in eqn. (8.5.1). The expected numbers, on the null hypothesis, are
found much as above. The example is more complicated than in the
case of tossing a die, because different numbers of students will be
studying each subject and this must obviously be allowed for. The total
number of smokers divided by the total number of students in the
college gives the proportion of smokers that would be expected in each
class if the null hypothesis were true, so multiplying this proportion
by the number of people in each class (the number of physics students
in the college, etc.) gives the expected frequencies, x_e, for each class.
The value calculated from (8.5.1) can be referred to tables of the chi-
squared distribution with k-1 degrees of freedom, as before.

A numerical example. Goodness of fit of the Poisson distribution

The chi-squared approximation, it has been stated, can be used to
test whether the frequencies of observations in each class differ by an
unreasonable amount from the frequencies that would be expected if
the observations followed some theoretical distribution such as the
binomial, Poisson, or Gaussian distributions. In the examples just
mentioned, the theoretical distribution was the rectangular ('equally
likely') distribution, and only the total number of observations (e.g.
number of smokers) was needed to find the expected frequencies. In
§ 3.6 the question was raised of whether or not it was reasonable to
believe that the distribution of red blood cells in the haemocytometer
was a Poisson distribution, and this introduces a complication. The
determination of the expected frequencies, described in § 3.6, needed
not only the total number of observations (80, see Table 3.6.1), but
also the observed mean x̄ = 6·625 which was used as an estimate of m
in calculating the frequencies expected if the hypothesis of a Poisson
distribution were true. The fact that an arbitrary parameter estimated
from the observations (x̄ = 6·625) was used in finding the expected
frequencies gives them a better chance of fitting the observations than
if they were calculated without using any information from the observa-
tions themselves, and it can be shown that this means that the number of
degrees of freedom must be reduced by one, so in this sort of test chi-
squared has k-2 degrees of freedom rather than k-1.
Categories are pooled as shown in Table 3.6.1 to make all calculated
frequencies at least 5, because this is a condition (mentioned above)
for χ² to be a good approximation. Taking the observed frequency as
x_o and the calculated frequency (that expected on the hypothesis that
cells are Poisson distributed) as x_e gives, using (8.5.1) with the results
in Table 3.6.1,

χ² = (4-8)²/8 + (5-9)²/9 + … + (3-5)²/5 + (0-5)²/5 = 14·7.

The number of degrees of freedom, in this case, is the number of
classes (k = 9 after pooling) minus two, as mentioned above. There
are therefore 7 degrees of freedom. Looking up tables of the χ² distribu-
tion shows that P[χ²(7) ≥ 14·7] ≈ 0·05. This means that if the true
distribution of red cells were a Poisson distribution, then an apparent
deviation from the calculated distribution (measured by χ²) as large as,
or larger than, that observed in the present experiment would be
expected to arise by random sampling error in only about 5 per cent of
experiments. This is not low enough (see § 6.1) for one to feel sure that
the premise that the distribution is Poissonian must be wrong, though
it is low enough to suggest that further experiments might lead to that
conclusion.
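The calculation of the expected Poisson frequencies described here can be sketched as follows (only the mean, 6·625, and the number of observations, 80, are taken from Table 3.6.1; the observed counts, and the pooling of sparse classes, are not reproduced):

```python
from math import exp, factorial

m, n = 6.625, 80   # sample mean and number of squares counted (Table 3.6.1)

def poisson_expected(r):
    """Expected frequency of squares containing r cells, if the counts
    follow a Poisson distribution with mean m."""
    return n * exp(-m) * m ** r / factorial(r)

# Expected frequencies for r = 0, 1, 2, ...; over all r they sum to n
for r in range(12):
    print(r, round(poisson_expected(r), 2))
```

Each of these values plays the part of x_e in eqn. (8.5.1), after pooling so that none is less than about 5.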

8.7. Related samples of classification measurements. Cross-over
trials
Consider Table 8.7.1(a), which is based on an example discussed by
Mainland (1963, p. 236). It looks just like the 2 × 2 tables previously
presented. In fact it is not, because it was not based on two independent
samples of 12. There were actually only 12 patients and not 24. Each
patient was given both X and Y (in a random order). This is described
as a cross-over trial because those (randomly chosen) patients who
were given X first (period 1) were subsequently (period 2) given Y, and
vice versa. Table 8.7.1(a) is an incorrect presentation of the results
because it disguises this fact. Table 8.7.1(b) is a correct way of giving
the results, and it contains more information since 8.7.1(a) can be
constructed from it, whereas 8.7.1(b) cannot be constructed from
8.7.1(a). Table 8.7.1(b) cannot be tested in the way described for
independent samples either. The 5 patients who reacted in the same
way to both X and Y contribute no information about the difference
between the drugs; only the 7 who reacted differently to X and Y do so.
Furthermore, the possibility that the result depends on whether X or
Y was given first can be taken into account. The correct method of
analysis is described clearly by Mainland (1963, p. 236). The full results,
which have been condensed into 8.7.1(b), were as given in Table
8.7.2. Note that of the 12 patients half (6 selected at random from the
12) have been assigned to the XY sequence, the other half to the
YX sequence. These results can be arranged in a 2 × 2 table, 8.7.3(a),
consisting of two independent samples. A randomization (exact) test
or χ² approximation applied to this table will test the null hypothesis

TABLE 8.7.1

             improved   not improved
                (I)         (N)

Drug X          12           0        12
Drug Y           5           7        12

                17           7        24         8.7.1(a)

                      Drug Y
                    I        N

Drug X  { I         5        7        12
        { N         0        0         0

                    5        7        12         8.7.1(b)

TABLE 8.7.2

Patients showing       X in period (1)   X in period (2)   Totals
improvement

In both periods               3 
   2                 5
In period (1)
  not in (2)                  3                 0                 3
In period (2)
  not in (1)                  0                 4                 4
In neither period             0                 0                 0

                              6                 6                12

that the proportion improving in the first period is the same whether
X or Y was given in the first period, i.e. that the drugs are equi-effective.
The test has been described in detail in § 8.2 (the table is the same as
Table 8.2.1(a)), where it was found that P (one tail) = 0·029 but that
the sample is too small for a satisfactory two-tail test, which is what is
needed. In real life a larger sample would have to be used.

The subjects showing the same response in both periods give no
information about the difference between drugs but they do give
information about whether the order of administration matters.
Table 8.7.3(b) can be used to test (with the exact test or chi-squared)
the null hypothesis that the proportion of patients giving the same
result in both periods does not depend on whether X or Y was given
first. Clearly there is no evidence against this null hypothesis. If, for

TABLE 8.7.3

                   Improved in   Improved in
                   (1) not (2)   (2) not (1)

X in period (1)         3             0          3
X in period (2)         0             4          4

                        3             4          7        8.7.3(a)

                   Outcome in periods (1) and (2)

                      same       different

X in period (1)         3             3          6
X in period (2)         2             4          6

                        5             7         12        8.7.3(b)

example, drug X had a very long-lasting effect, it would have been


found that those patients given X first tended to give the same result
in both periods because they would still be under its influence during
period 2.
If the possibility that the order of administration affects the results
is ignored then the use of the sign test (see § 10.2) shows that the
probability of observing 7 patients out of 7 improving on drug X (on
the null hypothesis that if the drugs are equi-effective there is a
50 per cent chance, 𝒫 = 1/2, of a patient improving) is simply P
= (1/2)⁷ = 1/128. For a two-tail test (including the possibility of 7 out
of 7 not improving) P = 2/128 = 0·016, a far more optimistic result
than found above.
9. Numerical and rank measurements.
Two independent samples

'Heiße Magister, heiße Doktor gar,
Und ziehe schon an die zehen Jahr
Herauf, herab und quer und krumm
Meine Schüler an der Nase herum --
Und sehe, daß wir nichts wissen können!
Das will mir schier das Herz verbrennen.'†

† 'They call me master, even doctor, and for some ten years now I've led my students
by the nose, up and down, and around and in circles -- and all I see is that we cannot
know! It nearly breaks my heart.'
GOETHE
(Faust, Part 1, line 360)

9.1. Relationship between various methods


IN § 9.2 the randomization test (see §§ 6.3 and 8.2) is applied to
numerical observations. In § 9.3 the Wilcoxon, or Mann-Whitney,
test is described. This is a randomization test applied to the ranks
rather than the original observations. This has the advantage that
tables can be constructed to simplify calculations. These randomization
methods have the advantage of not assuming a normal distribution;
also they can cope with the rejection of particular allocations of treat-
ments to individuals that the experimenter finds unacceptable, as
described in § 8.3. They also emphasize the absolute necessity for
random selection of samples in the experiment if any analysis is to be
done. For large samples Student's t test, described in § 9.4, can be used,
though how large is large enough is always in doubt (see § 6.2). At
least four observations are needed in each sample, however large the
differences, unless the observations are known to be normally distri-
buted, as discussed in § 6.2.

9.2. Randomization test applied to numerical measurements


The principle involved is just the same as in the case of classification
measurements and § 8.2 should be read before this section, as the
arguments will not all be repeated.
Suppose that 4 patients are treated with drug A and 3 with drug B
as in § 8.2, but, instead of each being classified as improved or not
improved, a numerical measurement is made on each. For example,
the reduction of blood glucose concentration (mg/100 ml) following
treatment might be measured. Suppose the results were as in Table
9.2.1.
The numbering of the patients is arbitrary but notice that if a positive
response is counted as 'improved' and negative as 'not improved',
Table 9.2.1 is the same as Table 8.2.1(b) so, if the size of the improve-
ment is ignored, the results can be analysed exactly as in § 8.2.
However, with such a small sample it is easy to do the randomization
test on the measurements themselves. The argument is as in

TABLE 9.2.1
Responses (glucose reduction, mg/100 ml) to two drugs. The ranks of
the responses are given for use in § 9.3

          Drug A                          Drug B
Patient   Response            Patient   Response
number    (mg/100 ml)  Rank   number    (mg/100 ml)  Rank    Total

   1          10         5       4           5         4
   2          15         6       6          -3         2
   3          20         7       7          -5         1
   5          -2         3

Total         43        21                  -3         7       40

§ 8.2. See p. 117 for details. If the drugs were really equi-effective (the
null hypothesis) each patient would have shown the same response
whichever drug had been given, so the apparent difference between
drugs would depend solely on which patients happened to be selected
for the A group and which for the B group, i.e. on how the random
numbers happened to come up in the selection of 4 out of the 7 for drug
A. Again, as in § 8.2, the seven measurements could be written on
cards from which 4 are selected at random (just as in the real experi-
ment) and called A, the other 3 being B. The difference between the
mean for A and the mean for B is noted and the process repeated many
times. There is actually no need to calculate the difference between
means each time. It is sufficient to look at the total response for drug
B (taking the smaller group for convenience) because once this is
known the total for A follows (the total of all 7 being always 40), and
so the difference between means also follows. If the experimentally
observed total response for B (-3 in the example), or a more extreme
(i.e. smaller in this example) total, arises very rarely in the repeated
randomizations it will be preferred to suppose that the difference
between samples is caused by a real difference between drugs and the
null hypothesis will be rejected, just as in § 8.2.

TABLE 9.2.2
Enumeration of all 35 possible ways of selecting a group of 3 patients from
7 to be given drug B. The response for each patient is given in Table
9.2.1. The total ranks for drug B are given for use in § 9.3

Patients  Total         Total    Patients  Total         Total
given     response      rank     given     response      rank
drug B    (mg/100 ml)            drug B    (mg/100 ml)

5 6 7       -10           6      1 2 5        23           14
1 5 6         5          10      1 2 6        22           13
1 5 7         3           9      1 2 7        20           12
1 6 7         2           8      1 3 5        28           15
2 5 6        10          11      1 3 6        27           14
2 5 7         8          10      1 3 7        25           13
2 6 7         7           9      1 4 5        13           12
3 5 6        15          12      1 4 6        12           11
3 5 7        13          11      1 4 7        10           10
3 6 7        12          10      2 3 5        33           16
4 5 6         0           9      2 3 6        32           15
4 5 7        -2           8      2 3 7        30           14
4 6 7        -3           7      2 4 5        18           13
                                 2 4 6        17           12
                                 2 4 7        15           11
                                 3 4 5        23           14
                                 3 4 6        22           13
                                 3 4 7        20           12
                                 1 2 3        45           18
                                 1 2 4        30           15
                                 1 3 4        35           16
                                 2 3 4        40           17

With such small samples the result of such a physical randomization
can be predicted by enumerating all 7!/(3!4!) = 35 possible ways (see
eqn. (3.4.2)) of dividing 7 patients into samples of 3 and 4. This predic-
tion depends on each of the possible ways being equiprobable, i.e.
the one used for the actual experiment must have been picked at random if
the analysis is to be valid. The enumeration is done in Table 9.2.2. This
table is exactly analogous to Table 8.2.2 but instead of counting the
number improved, the total response is calculated. For example, if
patients 1, 5, and 6 had been allocated to drug B the total response
would have been 10+(-2)+(-3) = 5 mg/100 ml. The results from
Table 9.2.2 are collected in Table 9.2.3, which shows the randomization
distribution (on the null hypothesis) of the total response to drug B.
This is exactly analogous to Fig. 8.2.1. The observed total (-3) and
smaller totals (the only smaller one is -10) are seen to occur 2/35
(= 0·057) times, if the null hypothesis is true, and this is therefore the
one-tail P. For a two-tail test (see § 6.1) an equal area can be cut off
in the other tail (total for B ≥ 40), so the result of the two-tail test is
P = 4/35 = 0·114. This is not small enough to cast much suspicion on
the truth of the null hypothesis, but it is somewhat different from the
P = 0·372 (one tail) found in the analysis of Table 8.2.1(b), to which,
as mentioned above, Table 9.2.1 reduces if the sizes of the improve-
ments are ignored. In § 8.2 a one-tail P = 0·372 was found and a
two-tail test was not possible. The reason for the difference is that in
the results in Table 9.2.1 the 'improvements' on drug A are much
greater in size than the (negative) 'non-improvements' on drug B.
The two-tail test can be done since in § 8.2 all 35 randomizations
yielded only 4 different possible results (Table 8.2.1) for the trial, but
with numerical measurements the 35 randomizations have yielded
27 possible results, listed in Table 9.2.3, so it is possible to cut off
equal areas in each tail (cf. § 6.1). Notice that if patient 3 had been
in the B group and patient 4 in the A group (this leaves Tables 9.2.2
and 9.2.3 unchanged) the observed total for group B would have been
20+(-3)+(-5) = 12 and it is seen from Table 9.2.3 that a total
≤ 12 occurs in a proportion 13/35 = 0·372 of cases. This one-tail P
(when a large improvement, patient 4, is seen with drug B) is as large
as that found in § 8.2.
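The enumeration of Tables 9.2.2 and 9.2.3, and the tail probabilities just quoted, can be checked by a few lines of code (a sketch using the responses of Table 9.2.1):

```python
from itertools import combinations

# Responses (mg/100 ml) for patients 1-7 (Table 9.2.1)
response = {1: 10, 2: 15, 3: 20, 4: 5, 5: -2, 6: -3, 7: -5}

# Total response for drug B under every one of the 35 possible allocations
totals = [sum(response[p] for p in trio)
          for trio in combinations(response, 3)]
assert len(totals) == 35

observed = -3                                    # patients 4, 6, 7 got B
low = sum(1 for t in totals if t <= observed)    # as extreme or more, lower tail
high = sum(1 for t in totals if t >= 40)         # equal area in the upper tail

print(low / 35)            # 0.0571... = 2/35, the one-tail P
print((low + high) / 35)   # 0.1142... = 4/35, the two-tail P
```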
With larger samples there are too many permutations to enumerate
easily. For two samples of 10 there are (by (3.4.2)) 20!/(10!10!)
= 184756 ways of selecting samples of 10 from 20 individuals. However it
is not difficult for a computer to test a large sample of these possible
allocations by simulating the physical randomization (random assort-
ment of cards) mentioned at the beginning of this section, and of
§ 8.2. Programs for doing this do not seem to be widely available at
the moment but will doubtless become more common. This method has
the advantage that it can allow for the rejection of a random arrange-
ment that the experimenter finds unacceptable (e.g. all men in one
sample) as explained in § 8.3. The results in Table 9.2.4 are observations
TABLE 9.2.3
Randomization distribution of total response (mg/100 ml) of a group of
3 patients given drug B (according to the null hypothesis that A and B
are equi-effective). Constructed from Table 9.2.2

Total for
drug B        Frequency
(mg/100 ml)

   -10            1
    -3            1
    -2            1
     0            1
     2            1
     3            1
     5            1
     7            1
     8            1
    10            2
    12            2
    13            2
    15            2
    17            1
    18            1
    20            2
    22            2
    23            2
    25            1
    27            1
    28            1
    30            2
    32            1
    33            1
    35            1
    40            1
    45            1

Total            35

made by Cushny and Peebles (1905) on the sleep-inducing properties of
(-)-hyoscyamine (drug A) and (-)-hyoscine (drug B). They were
used in 1908 by W. S. Gosset ('Student') as an example to illustrate the
use of his t test, in the paper in which the test was introduced.†
If two randomly selected groups of ten patients had been used, a

† In this paper the names of the drugs were mistakenly given as (-)-hyoscyamine
and (+)-hyoscyamine. When someone pointed this out Student commented in a letter
to R. A. Fisher, dated 7 January 1935, 'That blighter is of course perfectly right and
of course it doesn't really matter two straws ...'
randomization test of the sort just described could be done as follows.
(In the original experiment the samples were not in fact independent
but related. The appropriate methods of analysis will be discussed in
Chapter 10.) A random sample of 12000 from the 184756 possible
permutations was inspected on a computer and the resulting randomiza-
tion distribution of the total response to drug A is plotted in Fig. 9.2.1

TABLE 9.2.4
Response in hours extra sleep (compared with controls) induced
by (-)-hyoscyamine (A) and (-)-hyoscine (B).
From Cushny and Peebles (1905)

   Drug A      Drug B

    +0·7        +1·9
    -1·6        +0·8
    -0·2        +1·1
    -1·2        +0·1
    -0·1        -0·1
    +3·4        +4·4
    +3·7        +5·5
    +0·8        +1·6
     0·0        +4·6
    +2·0        +3·4

  Σy_A = 7·5   Σy_B = 23·3
  n_A = 10     n_B = 10
  ȳ_A = 0·75   ȳ_B = 2·33

(cf. the distribution in Table 9.2.3 found for a very small experiment).
Of the 12000 permutations 488 gave a total response to drug A of less
than 7·5, the observed total (Table 9.2.4), so the result of a one-tail
randomization test is P = 488/12000 = 0·04067. With samples of this
size there are so many possible totals that the distribution in Fig. 9.2.1
is almost continuous, so it will be possible to cut off a virtually equal
area in the opposite (upper) tail of the distribution. Therefore the
result of a two-tail test can be taken as P = 2 × 0·04067 = 0·0813. This
is not low enough for the null hypothesis of equi-effectiveness of the
drugs to be rejected with safety because the observed results would not
be unlikely if the null hypothesis were true. The distribution in Fig.
9.2.1, unlike that in Table 9.2.3, looks quite like a normal (Gaussian)
distribution, and it will be found that the t test gives a similar result
to that just found.
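A Monte Carlo version of this test can be sketched as follows (the number of resamples and the seed are arbitrary choices of mine; the P obtained will differ slightly from the 0·04067 quoted above because a different random sample of allocations is inspected):

```python
import random

# Hours of extra sleep (Table 9.2.4)
drug_a = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug_b = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

pooled = drug_a + drug_b
observed_total_a = sum(drug_a)          # 7.5

random.seed(1)
n_resamples = 10000
count = 0
for _ in range(n_resamples):
    # simulate the random allocation: pick 10 of the 20 responses to be 'A'
    resample_a = random.sample(pooled, 10)
    if sum(resample_a) <= observed_total_a:
        count += 1

p_one_tail = count / n_resamples
print(p_one_tail)   # close to 0.041; double it for the two-tail P
```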


[Figure: histogram of the randomization distribution of the total response to drug A (abscissa, hours; ordinate, frequency), with the observed value, 7·5, marked. A second abscissa scale shows the corresponding difference between means (ȳ_A - ȳ_B), with the observed value, -1·58, marked.]

FIG. 9.2.1. Randomization distribution of the total response to drug A
for Cushny and Peebles' results, when A and B are equi-effective (null hypothesis
true). The values of the difference between means corresponding to each total
for A are also shown on the abscissa (the total of all responses is 30·8 for every
allocation so, for example, if the total for A were 10·4, the total for B must be
20·4 so the difference between means is -1·0). Constructed from a random sample
of 12000 from the 184756 ways of allocating 10 patients out of 20 to drug A.

9.3. Two sample randomization test on ranks. The Wilcoxon (or
Mann-Whitney) test

The difficulty with the method described in § 9.2 is that it is not
possible to prepare tables for all possible sets of observations. However,
if the observations are ranked in ascending order and each observation
replaced by its rank before performing the randomization test it is
possible to prepare tables, because now every experiment with N
observations will involve the same numbers, 1, 2, ..., N.
In addition to the fact that it is not necessary to assume a particular
form of distribution of the observations, another advantage is that the
method can be used for results that are themselves ranks, or results
that are not quantitative numerical measurements but can be ranked
in order of magnitude (e.g. subjective pain scores). Even with numerical

measurements the loss of information involved in converting them to
ranks is surprisingly small.
Assumptions. The null hypothesis is that both samples of observations
come from the same population. If this is rejected, then, if it is wished
to infer that the samples come from populations with different medians,
or means, it must be assumed that the populations are the same in all
other respects, for example that they have the same variance.
FIG. 9.3.1. Randomization distribution of the total of ranks for drug B (sum of 3 ranks from 7) if the null hypothesis is true. From Table 9.2.2. The mean is 12 and the standard deviation is 2·828 (from eqns (9.3.2) and (9.3.3)). This is the relevant distribution for any Wilcoxon two-sample test with samples of size 3 and 4.

The results in Table 9.2.1 will be analysed by this method. They
have been ranked in ascending order from 1 to 7, the ranks being shown
in the table. In Table 9.2.2 all 35 (equiprobable) ways of selecting
3 patients from the 7 to be given drug B are enumerated, and for each
way the total rank is given; for example, if patients 1, 5, and 6 had had
drug B then, on the null hypothesis that the response does not depend
on the treatment, the total rank for drug B would be 5+3+2 = 10.

The frequency of each rank total in Table 9.2.2 is plotted in Fig. 9.3.1,
which shows the randomization distribution of the total rank for drug
B (given the null hypothesis). This is exactly analogous to the distribu-
tions of total response shown in Table 9.2.3 and Fig. 9.2.1, but the
distributions of total response depend on the particular numerical
values of the observations, whereas the distribution of the rank sum
(given the null hypothesis) shown in Fig. 9.3.1 is the same for any
experiment with samples of 3 and 4 observations. The values of the
rank sum cutting off 2·5 per cent of the area in each tail can therefore
be tabulated (Table A3, see below).
The observed total rank for drug B was 7, and from Fig. 9.3.1, or
Table 9.2.2, it can be seen that there are two ways of getting a total
rank of 7 or less, so the result of a one-tail test is P = 2/35 = 0·057.
An equal probability, 2/35, can be taken in the other tail (total rank of
17 or more) so the result of a two-tail test is P = 4/35 = 0·114. This
is the probability that a random selection of 3 patients from the 7
would result in the potency of drug B (relative to A) appearing to be
as small as (total rank = 7), or smaller than (total rank < 7), was
actually observed, or an equally extreme result in the opposite direction,
if A and B were actually equi-effective. Since such an extreme apparent
difference between drugs would occur in 11·4 per cent of experiments
in the long run, this experiment might easily have been one of the
11·4 per cent, so there is no reason to suppose the drugs really differ
(see § 6.1). In this case, but not in general, the result is exactly the
same as that found in § 9.2.
A check can be applied to the rank sums, based on the fact that the
mean of the first N integers, 1, 2, 3, ..., N, is (N+1)/2 so therefore

    sum of the first N integers = N(N+1)/2.    (9.3.1)

In this case 7(7+1)/2 = 28, and this agrees with the sum of all ranks
(Table 9.2.1), which is 21+7 = 28.
The distribution of rank totals in Fig. 9.3.1 is symmetrical, and this
will be so as long as there are no ties. The result of a two-tail test will
therefore be exactly twice that for a one-tail test (see § 6.1).
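The enumeration that produces Fig. 9.3.1 is easily reproduced by machine. The following sketch (Python; the variable names are my own) lists all 35 ways of choosing 3 ranks from the integers 1 to 7 and recovers the one-tail and two-tail P values found above.

```python
from itertools import combinations

# All 35 equiprobable ways of choosing 3 ranks from 1..7 for drug B
totals = [sum(c) for c in combinations(range(1, 8), 3)]

observed = 7                                  # observed rank total for drug B
p_one = sum(t <= observed for t in totals) / len(totals)    # 2/35 = 0.057
p_two = p_one + sum(t >= 17 for t in totals) / len(totals)  # mirror tail: 4/35
print(len(totals), p_one, p_two)
```

The same few lines, with the range and sample size changed, serve for any pair of small samples.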

The use of tables for the Wilcoxon test


The results of Cushny and Peebles in Table 9.2.4, which were
analysed in § 9.2, are ranked in ascending order in Table 9.3.1. Where ties
occur each member is given the average rank, as shown. This method
of dealing with ties is only an approximation if Table A3 is used
because the table refers to the randomization distribution of the integers
1, 2, 3, 4, 5, 6, ..., 20, not the actual figures used, i.e. 1, 2, 3, 4½, 4½, 6,
etc. Such evidence as there is suggests that a moderate number of ties does
not cause serious error.
The rank sum for drug A is 1+2+3+4½+6+8+9½+14+15½+17
= 80½, and for drug B it is 129½. The sum of these, 80½+129½ = 210,

TABLE 9.3.1
The observations from Table 9.2.4 ranked in ascending order

           Observation
    Drug     (hours)      Rank

     A        -1·6          1
     A        -1·2          2
     A        -0·2          3
     B        -0·1          4½ }
     A        -0·1          4½ }  = (4+5)/2
     A         0·0          6
     B         0·1          7
     A         0·7          8
     B         0·8          9½ }
     A         0·8          9½ }  = (9+10)/2
     B         1·1         11
     B         1·6         12
     B         1·9         13
     A         2·0         14
     B         3·4         15½ }
     A         3·4         15½ }  = (15+16)/2
     A         3·7         17
     B         4·4         18
     B         4·6         19
     B         5·5         20

    Total                 210

checks with (9.3.1), which gives 20(20+1)/2 = 210. A randomization
distribution could be found, just as above and in § 9.2, for the sum of
10 ranks picked at random from 20. The proportion of possible alloca-
tions of patients to drug A giving a total rank of 80½ or less is the
one-tail P, as above. The two-tail P may be taken as twice this value
though, as mentioned, this may not be exact when there are ties.
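The average-rank treatment of ties in Table 9.3.1 can be sketched as follows (Python; the function name is my own). Applied to the pooled observations of Table 9.2.4 it reproduces the rank sums 80½ and 129½.

```python
def average_ranks(values):
    # Rank in ascending order; tied values share the average of their ranks
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1                     # extend j over the tied group
        avg = (i + j + 2) / 2          # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

yA = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]  # Table 9.2.4
yB = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]
r = average_ranks(yA + yB)
R_A, R_B = sum(r[:10]), sum(r[10:])
print(R_A, R_B)        # 80.5 and 129.5; their sum, 210, checks with (9.3.1)
```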

P (two tail) can be found (approximately) from Table A3, in which
n1 and n2 are the sample sizes (n1 ≤ n2). For each pair of sample sizes
two figures are given. If the rank sum for sample 1 (that with n1
observations) is equal to or less than the smaller tabulated figure, or
equal to or greater than the larger figure, then
P (two tail) is not greater than the figure at the head of the column.
In this case n1 = n2 = 10 and the figures are
82 and 128† for P = 0·1, and 78 and 132 for P = 0·05. The observed rank
sum of 80½ is less than 82 but greater than 78, so P lies between 0·1 and
0·05. This means that if the null hypothesis of equi-effectiveness were
true then the probability of observing a rank sum of 80½ or less would
be under 0·05, and the probability of observing a rank sum equally
extreme in the other direction would also be under 0·05, so the total
two-tail P (see § 6.1) is under 0·1. This result is similar to that found in
§ 9.2 using the randomization test on the
observations themselves, viz. not much evidence for
a real difference between the drugs.

Samples that are too large for the tables

Table A3 only deals with samples containing up to 20 observations.
For larger samples the randomization distribution of ranks (shown for a
small sample in Fig. 9.3.1) is well approximated by a normal distribu-
tion. If the null hypothesis is true, the distribution of the rank sum,
R1 say, for the n1 observations can be shown (see, for
example, Brownlee (1965)) to have mean

    μ1 = n1(N+1)/2,    (9.3.2)

where N = n1+n2 is the total number of observations. For example, in
the first example discussed in this section, n1 = 3 and N = 7, so μ1 = 3
(7+1)/2 = 12, as is obvious by inspection of Fig. 9.3.1. The standard
deviation of R1 is (loc. cit.)

    σ = √[n1n2(N+1)/12].    (9.3.3)

For the distribution in Fig. 9.3.1 the standard deviation is therefore
√[3×4(7+1)/12] = 2·828. Using these, an approximately
standard normal deviate (see § 4.3) can be calculated from (4.3.1) as

    u = (R1−μ1)/σ    (9.3.4)
† These are the rank sums that cut off 5 per cent of the area in each tail (10 per cent,
P = 0·1, altogether), in the analogue of Fig. 9.3.1 for samples of size 10 and 10.

and the rarity of the result judged from tables of the standard normal
distribution.
For example, the results in Table 9.3.1 gave n1 = 10, N = 20, and
R1 = 80·5. Thus, from (9.3.2)-(9.3.4),

    u = [80·5 − 10(20+1)/2] / √[10×10(20+1)/12] = −1·85.

This value is found from tables (see § 4.3) to cut off an area P = 0·032
in the lower tail of the standard normal distribution. The result of a
two-tail test (see § 6.1) is therefore this area, plus the equal area above
u = +1·85, i.e. P = 2×0·032 = 0·064, in good agreement, even for
samples of 10, with the exact result from Table A3. The two-tail result
can be found directly by referring the value u = 1·85 to a table of
Student's t with infinite degrees of freedom (when t becomes the
same as u, see § 4.4).
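Eqns (9.3.2)-(9.3.4) are easy to check numerically. The sketch below (Python) reproduces u = −1·85 for these ranks, and obtains the two-tail P from the standard normal distribution via the complementary error function.

```python
import math

n1, n2 = 10, 10
N = n1 + n2
R1 = 80.5                                    # observed rank sum for drug A

mu1 = n1 * (N + 1) / 2                       # eqn (9.3.2): 105
sigma = math.sqrt(n1 * n2 * (N + 1) / 12)    # eqn (9.3.3): about 13.23
u = (R1 - mu1) / sigma                       # eqn (9.3.4): about -1.85

# Two-tail P: the area below -|u| plus the equal area above +|u|
P = math.erfc(abs(u) / math.sqrt(2))
print(round(u, 2), round(P, 3))              # -1.85 and 0.064
```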

9.4. Student's t test for independent samples. A parametric test


This test, based on Student's t distribution (§ 4.4), assumes that
the observations are normally distributed. Since this is rarely known
it is safer to use the randomization test (§ 9.2) or, more conveniently,
the Wilcoxon two-sample test (§ 9.3) (see §§ 4.2, 4.6, and 6.2). It will
now be shown that when the results in Table 9.2.4 are analysed using the t
test the result is similar to that obtained using the more assumption-free
methods of §§ 9.2 and 9.3. But it cannot be assumed that the agreement
between methods will always be so good with two samples of 10. It
depends on the particular figures observed. If the observations were
very non-normal the t test might be quite misleading with samples of 10.
The assumptions of the test are explained in more detail in § 11.2 and
there is much to be said for always writing the test as an analysis of
variance as described at the end of § 11.4. There was no evidence that
the assumptions were true in this example.
To perform the t test it is necessary to assume that the observations
are independent, i.e. that the size of one is not affected by the size of
the others (this assumption is necessary for all the tests described),
that the observations are normally distributed, and that the standard
deviation is the same for both groups (drugs). The scatter is estimated
for each drug separately and the results pooled. The quantity of
interest is the difference between mean responses (ȳB−ȳA), so the
object is to estimate the standard deviation of the difference, s[ȳA−ȳB],†
so that it can be predicted (see example in § 2.7) how much scatter
would be seen in (ȳA−ȳB) if it were determined many times (this
prediction is likely to be optimistic, see § 7.2).

(1) For drug A the sum of squared deviations, using (2.6.5), is

    Σ(y−ȳ)² = Σy² − (Σy)²/n = 34·43 − (7·5)²/10 = 28·805

with nA−1 = 10−1 = 9 degrees of freedom (see § 2.6).

(2) For drug B the sum of squared deviations is similarly

    Σ(y−ȳ)² = 90·37 − (23·3)²/10 = 36·081

with nB−1 = 10−1 = 9 degrees of freedom.


(3) The pooled estimate of the variance of y (the response to either
drug) is

    s²[y] = total sum of squares / total degrees of freedom
          = (28·805+36·081)/(9+9) = 3·605

with 9+9 = 18 degrees of freedom. As it is necessary to assume that
the scatter of responses is the same for both groups, a single pooled
estimate of this scatter is made.
(4) Using (2.7.8), the variance of the mean of 10 observations on
drug A is estimated to be

    s²[ȳA] = s²[y]/nA = 3·605/10 = 0·3605,

and similarly the variance of the mean of 10 observations on drug B is
estimated as

    s²[ȳB] = 3·605/10 = 0·3605.

† Note that this means the estimated standard deviation, s, of the random variable
(ȳA−ȳB). It is the functional notation described in § 2.1. It does not mean s times
(ȳA−ȳB).

(5) Using (2.7.3) the variance of the difference between two such
means (assuming them to be independent, see also § 10.7) is

    s²[ȳA−ȳB] = s²[ȳA]+s²[ȳB] = 0·3605+0·3605 = 0·7210.

The standard deviation of the difference between means is therefore
√(0·7210) = 0·8491 hours = s[ȳA−ȳB], with 18 degrees of freedom.
(6) The definition of t, given in (4.4.1), is (x−μ)/s(x), where x is
normally distributed and s(x) is its estimated standard deviation. In
this case the normally distributed variable of interest is the difference
between mean responses, (ȳA−ȳB). It is required to test the null
hypothesis that the drugs are equi-effective, i.e. that the population
value of the difference between means is zero, μ = 0, and therefore
μ = 0 is used in the expression for t because, as usual, it is required to
find out what would happen if the null hypothesis were true. Inserting
these quantities gives, on the null hypothesis,

    t = [(ȳA−ȳB)−0]/s[ȳA−ȳB] = (2·33−0·75)/0·8491 = 1·861.

Reference to a table of the distribution of t (see § 4.4, p. 77) for 18
degrees of freedom shows that 5 per cent of the area lies outside
t = ±2·101 and 10 per cent lies outside t = ±1·734 (cf. §§ 4.4 and
6.1). Therefore, for a two-tail test, P is between 0·05 and 0·1. This would
be the probability of observing a value of t differing from zero by as
much as, or more than, 1·861, if the null hypothesis were true, and if
the assumptions of normality, etc. were correct. It is not small enough
to make one reject the null hypothesis that the drugs are really equi-
effective (μ = 0). See also § 6.1.
In general, to compare two independent samples (A and B) of
normally distributed mutually independent observations one calculates,
condensing the above argument into a single formula,

    t = |(ȳA−ȳB)−μ| / √{[Σ(yA−ȳA)² + Σ(yB−ȳB)²]/(nA+nB−2) × (1/nA + 1/nB)},    (9.4.1)

where nA and nB are the numbers of observations in each sample (not
necessarily equal); μ is the hypothetical value of the difference to be
tested (most often zero, but see example in § 12.5 in which it is not),
and the vertical bars round the numerator indicate that its sign is
ignored, i.e. t is taken as positive. This quantity is referred to tables of

the t distribution (with nA+nB−2 degrees of freedom) in order to
find P.
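Steps (1)-(6), i.e. eqn (9.4.1) with μ = 0, can be condensed into a few lines, as in this sketch (Python; `dev_ss` is my own name). From the raw figures of Table 9.2.4 it reproduces t = 1·861 with 18 degrees of freedom.

```python
import math

yA = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]  # Table 9.2.4
yB = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

def dev_ss(y):
    # Sum of squared deviations, eqn (2.6.5): sum(y^2) - (sum y)^2 / n
    return sum(v * v for v in y) - sum(y) ** 2 / len(y)

nA, nB = len(yA), len(yB)
s2 = (dev_ss(yA) + dev_ss(yB)) / (nA + nB - 2)  # pooled variance, 18 d.f.
sd_diff = math.sqrt(s2 * (1 / nA + 1 / nB))     # standard deviation of the difference
t = abs(sum(yB) / nB - sum(yA) / nA) / sd_diff  # eqn (9.4.1) with mu = 0
print(round(t, 3))                              # 1.861
```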

Use of confidence limits leads to the same conclusion as the t test


The variable of interest is the difference between means (ȳB−ȳA)
and its observed value in the example was 2·33−0·75 = 1·58 hours.
The standard deviation of this quantity was found to be 0·8491 hours.
The expression found in § 7.4 for the confidence limits for the population
mean of any normally distributed variable x, viz. x±ts(x), will be
used. For 90 per cent (or P = 0·9) confidence intervals the P = 0·1
value of t (with 18 d.f.) is found from tables. It is, as mentioned above,
1·734. Gaussian confidence limits for the population mean value of
(ȳB−ȳA) are therefore 1·58±(1·734×0·8491), i.e. from 0·11 to 3·05
hours. Because these do not include the hypothetical value of zero
(implying that the drugs are equi-effective) the observations are not
compatible with this hypothesis, if P = 0·9 is a sufficient level of
certainty. For a greater degree of certainty 95 per cent confidence
limits would be found. The value of t for P = 0·05 and 18 d.f. was
found above to be 2·101 so the Gaussian confidence limits are 1·58
±(2·101×0·8491), i.e. from −0·2 to +3·36 hours. At this level of
confidence the results are compatible with a population difference
between means of μ = 0, because the limits include zero. These results
imply that confidence limits can be thought of in terms of a significance
test. For any given probability level (α) the variable will be found
'significantly' different (at the P = α level) from any hypothetical
value (zero in this case) of the variable that falls outside the 100(1−α)
per cent confidence limits.
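The equivalence of the limits and the test is easy to see numerically. In this sketch the t values for 18 d.f. (1·734 and 2·101, from tables) and the standard deviation found above are simply typed in.

```python
diff = 2.33 - 0.75     # observed difference between means (hours)
sd = 0.8491            # standard deviation of the difference, 18 d.f.

for conf, t in [(90, 1.734), (95, 2.101)]:
    low, high = diff - t * sd, diff + t * sd
    print(f"{conf} per cent limits: {low:.2f} to {high:.2f}")
# The 90 per cent limits (0.11 to 3.05) exclude zero;
# the 95 per cent limits (-0.20 to 3.36) include it.
```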

10. Numerical and rank measurements.
Two related samples

10.1. Relationship between various methods


The observations on the soporific effect of two drugs in Table 9.2.4
were analysed in §§ 9.2-9.4 as though they had been made on two
independent samples of 10 patients. In fact both of the drugs were
tested on each patient,† so there were only 10 patients altogether.
The unit on which each pair of observations is made is called, in general,
a block (= patient in this case). It is assumed throughout that observa-
tions are independent of each other. This may not be true when the
pair of observations are both made on the same subject, as in Table
10.1.1, rather than on two different subjects who have been matched
in some way. The responses may depend on whether A or B is given
first, for example, because of a residual effect of the first treatment,
or because of the passage of time. It must be assumed that this does
not happen. See § 8.6 for a discussion of this point. Appropriate
analyses for related samples (see § 6.4) are discussed in this chapter.
Chapters 6-9 should be read first.
Because results of comparisons on a single patient are likely to be
more consistent than observations on two separate patients, it seems
sensible to restrict attention to the difference in response between
drugs A and B. These differences, denoted d, are shown in Table
10.1.1. The total for each pair is also tabulated for use in §§ 11.2 and
11.6.
These results will be analysed by four methods. The sign test (§ 10.2)
is quick and nonparametric and, alone in this chapter, it does not
need quantitative numerical measurements; scores or ranks will do.
The randomization test on the observed differences (§ 10.3) is best for
quantitative numerical measurements. It suffers from the fact that,
like the analogous test for independent samples (§ 9.2), it is impossible
to construct tables for all possible observations; so, except in extreme
cases (like this one), the procedure, though very simple, will be lengthy
unless done on a computer. In § 10.4 this problem is overcome, as in

† Whether A or B is given first should be decided randomly for each patient. See
§§ 8.4 and 2.3.



§ 9.3, by doing the randomization test on ranks instead of on the
original observations, the Wilcoxon signed-ranks test. This is the
best method for routine use (see § 6.2). In § 10.6 the test based on the

TABLE 10.1.1
The results from Table 9.2.4 presented in a way showing
how the experiment was really done

    Patient                    Difference       Total
    (block)    yA      yB     d = (yB−yA)     (yB+yA)

      1       +0·7    +1·9       +1·2           2·6
      2       -1·6    +0·8       +2·4          -0·8
      3       -0·2    +1·1       +1·3           0·9
      4       -1·2    +0·1       +1·3          -1·1
      5       -0·1    -0·1        0            -0·2
      6       +3·4    +4·4       +1·0           7·8
      7       +3·7    +5·5       +1·8           9·2
      8       +0·8    +1·6       +0·8           2·4
      9        0·0    +4·6       +4·6           4·6
     10       +2·0    +3·4       +1·4           5·4

    Totals     7·5    23·3       15·8          30·8
                              mean 1·58

assumption of a normal distribution, Student's paired t test, is described


(see §§ 6.2 and 9.4). Unless the distribution of the observations is
known to be normal, at least six pairs of observations are needed, as
discussed in § 6.2 (see also § 10.5).

10.2. The sign test


This test is based on the proposition that the difference between the
two readings of a pair is equally likely to be positive or negative if the
two treatments are equi-effective (the null hypothesis). This means
(if zero differences are ignored) that there is a 50 per cent chance (i.e.
𝒫 = 0·5) of a positive difference and a 50 per cent chance of a negative
difference. In other words, the null hypothesis is that the population
(true) median difference is zero. The argument is closely related to
that in §§ 7.3 and 7.7 (see below). It is sufficient to be able to rank the
members of each pair. Numerical measurements are not necessary.
Example (1). In Table 10.1.1 there are 9 positive differences out of
9 (the zero difference is ignored, though a better procedure is probably
to allocate to it a sign that is on the safe side, see footnote on p. 155).

If the probability of a positive difference is 1/2 (null hypothesis) then
the probability of observing 9 positive differences in 9 'trials of the
event' (just like 9 heads out of 9 tosses of an unbiased coin) is given by
the binomial distribution (3.4.3) as (1/2)⁹ = 1/512 ≈ 0·002. For a two-
tail test of significance (see § 6.1) equally extreme deviations in the
opposite direction (i.e. 9 negative signs out of 9) must be taken into
account, and for this P ≈ 0·002 also, so the result of a two-tail sign test
is P ≈ 0·004. This is substantially lower than the values obtained in
Chapter 9 (when it was not taken into account that the samples were
related) and suggests rejection of the null hypothesis, because results
deviating from it by as much as was actually observed would be rare if it
were true.
Example (2). If there had been one negative difference (however
small) and 9 positive ones, then the one-tail P (see § 6.1) would be the
probability of observing 9 or more positive signs out of 10. This would
be the situation if it were decided to count the zero difference in Table
10.1.1 as negative, to be on the safe side. From the binomial distribu-
tion, (3.4.3), the probability of observing 9 positive differences out of 10
is

    P(9) = [10!/(9! 1!)](0·5)⁹(0·5)¹ = 10(1/2)¹⁰ = 0·00976,

and the probability of 10 positive differences out of 10 (the only more
extreme result) is P(10) = (1/2)¹⁰ = 0·000976. Therefore the probability
of observing 9 or 10 positive signs out of 10 is 0·00976+0·000976
= 0·0107. The two-tail P (see § 6.1) includes equally extreme results
in the other direction (1 or fewer positive signs out of 10, i.e. 9 or more
negative signs) for which P = 0·0107 also, so the two-tail P = 0·0107
+0·0107 = 0·0214.
This means, in words, that if the null hypothesis (that 𝒫 = 0·5, imply-
ing equi-effectiveness of the treatments) were true, then, in the long run,
2·14 per cent of repeated experiments would give results differing in either
direction from the results expected on the null hypothesis (i.e. 5 negative
signs out of 10) by as much as, or more than, was actually observed in the
experiment. This is a sufficiently rare event to cast some doubt on the premise
of equi-effectiveness (see § 6.1).

The general result
Generalizing the argument shows that if r_obs differences out of n are
observed to be negative (or positive, if there are fewer positive signs
than negative), then the result of a two-tail test of the hypothesis that
the population median difference is zero is

    P = 2 Σ [from r = 0 to r_obs] {n!/[r!(n−r)!]}(1/2)ⁿ.    (10.2.1)
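Eqn (10.2.1) can be evaluated directly, as in this sketch (Python; the function name is my own). It reproduces the two-tail P values of both examples: 2/512 ≈ 0·0039 for 0 negative signs out of 9, and 22/1024 ≈ 0·0215 for 1 out of 10.

```python
from math import comb

def sign_test_two_tail(r_obs, n):
    # Eqn (10.2.1): r_obs is the number of signs of the rarer kind
    one_tail = sum(comb(n, r) for r in range(r_obs + 1)) / 2 ** n
    return 2 * one_tail

print(sign_test_two_tail(0, 9))    # Example (1): 0.0039...
print(sign_test_two_tail(1, 10))   # Example (2): 0.0214...
```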

How to find the results without calculation


There are several ways of using tables to get the result.
Method (1). One way is to find confidence limits for the median
difference (d value) from Table A1, as described in § 7.3. If the con-
fidence limits do not include zero (or, more generally, any other
hypothetical value that it is wished to test for consistency with the
observations), then, as explained in § 9.4, the observations are not
consistent with the null hypothesis. For example, the results (d values)
in Table 10.1.1 consist of n = 9 non-zero differences.† The method of
§ 7.3 shows, using Table A1, that 99·60 per cent confidence limits for the
population median are provided by the largest and smallest of the
nine observations, i.e. +0·8 to +4·6. These limits do not include zero,
so the results are not consistent with the null hypothesis and the result
for a two-tail test is that P is not greater than 1−0·996 = 0·004 (in
this case P = 0·004 as shown above).
Putting the matter more precisely, the exact value of P for the
confidence limits that just fail to include zero (e.g. such that the next
smallest observation below the lower limit would be negative) will
be the same as the exact value of P for a two-tail test (see method
(2) below). By way of example suppose that patient 5 (Table 10.1.1)
had given a difference of −0·01 (rather than zero), as in Example (2)
above. Table A1 shows that the 99·8 per cent confidence limits for the
population median difference, based on a sample of n = 10 differences,
are provided by the largest and smallest observations, −0·01 to +4·6.
These limits include zero. The 97·86 per cent confidence limits, from

† This situation shows the difficulties that can be introduced by ties. There is no reason
to exclude the zero difference when finding confidence limits for the median, but the
results will only agree exactly with the sign test (from which the zero was omitted) if
this is done. The best answer is probably to be on the safe side. This usually means
counting the zero difference as though it had the sign least conducive to rejection of the
null hypothesis. In the example discussed this means pretending that patient 5 actually
gave a negative difference. Example (2) shows that P = 0·0214 in this case.

Table A1, are the next-to-smallest and next-to-largest observations,
i.e. +0·8 and +2·4, which just fail to include zero. This agrees with the
exact two-tail result, P = 0·0214 (= 1−0·9786), found by direct
calculation above.
Method (2). The same result is obtained if Table A1 is entered with
r = r_obs+1. This is obvious if (10.2.1) is compared with (7.3.3).
Consideration of a few examples shows that if limits are taken as the
(r_obs+1)th observation from each end of the ranked observations, the
limits will just fail to include zero. For example, in Table 10.1.1, as
just discussed, r = r_obs+1 = 1 gives P = 1−0·996 = 0·004. Likewise
in the second example above, r_obs = 1 negative sign out of n = 10.
Entering Table A1 with n = 10 and r = r_obs+1 = 2 gives the result of
the two-tail significance test as P = 1−0·9786 = 0·0214, exactly as
found from first principles above.
Method (3). As might be expected, the same result can be obtained by
finding confidence limits, 𝒫_U and 𝒫_L, for the population proportion
(𝒫) of positive (or negative) differences and seeing whether these limits
include 𝒫 = 0·5 or not. The method has been described in § 7.7 and
the result can be obtained, as explained there, from Table A2. It will be
left to the reader to improve his moral fibre by showing (by comparing
(7.7.1), (7.7.2), and (7.3.2)) that if the upper confidence limit for
the population median difference, found above, just fails to include
zero, then it will be found that the upper confidence limit for 𝒫, 𝒫_U, is
equal to or less than 0·5. Similarly, if the lower confidence limit for the
population median just fails to include zero then it will be found that
𝒫_L ≥ 0·5.
For example, in Table 10.1.1, r_obs = 0 out of n = 9 differences were
negative, so 100r/n = 0 per cent negative differences were observed.
Entering Table A2 with r = 0 and n = 9 shows that 99 per cent
confidence limits for the population proportion of negative differences
are 𝒫_L = 0 and 𝒫_U = 0·445. These limits do not include 0·5 (as
expected) and this implies that for a two-tail significance test P
< 0·01 (i.e. 1−0·99), as found above.
In the second example above (r_obs = 1 negative difference out of
n = 10), consulting Table A2 with r = 1, n = 10, gives 95 per cent
confidence limits for the population proportion (𝒫) of negative differ-
ences as 0·0025 and 0·445, which do not include 0·5. This is as expected
from the fact that the 97·86 per cent (which is as near to 95 per cent
as it is possible to get, see § 7.3) confidence limits for the population
median difference, +0·8 to +2·4, found above, just fail to include
zero. The 99 per cent confidence limits for 𝒫 are 0·0005 to 0·5443,
which do include the null hypothetical value, 𝒫 = 0·5, as expected.
These results imply that the result of a two-tail sign test is 0·01 < P
< 0·05. The exact result is 0·0214, found above.

10.3. The randomization test for paired observations


The principle involved is that described in §§ 6.3, 8.2, 8.3, and 9.2.
As in § 9.2, it is not possible to prepare tables to facilitate the test
when it is done on the actual observations. However, in extreme
cases, like the present example, or when the samples are very small, as
in § 8.2, the test is easy to do (see § 10.1).
As before, attention is restricted to the subjects actually tested.
The members of a pair of observations may be tests at two different
times on the same subject (as in this example), or tests on the members
of a matched pair of subjects (see § 6.4). It is supposed that if the null
hypothesis (that the treatments are equi-effective) were true then the
observations on each member of the pair would have been the same
even if the other treatment (e.g. drug) had been given (see p. 117 for
details). In designing the experiment it was (or should have been)
decided strictly at random (see § 2.3) which member of the pair
received treatment A and which B, or, in the present case, whether A
or B was given first. If A had been given instead of B, and B instead of
A, the only effect on the difference in responses (d in Table 10.1.1),
if the null hypothesis were true, would be that its sign would be changed.
According to the null hypothesis then, the sign of the difference between
A and B that was observed must have been decided by the particular
way the random numbers came up during the random allocation of A
and B to members of a pair. In repeated random allocations it would be
equally probable that each difference would have a positive or a neg-
ative sign. For example, for patient 1 in Table 10.1.1, the randomization
decided whether +0·7 or +1·9 was labelled A and hence, according to
the null hypothesis, whether the difference was +1·2 or -1·2. It can
therefore be found out whether (if the null hypothesis were true) it would
be probable that a random allocation of drugs to these patients would
give rise to a mean difference as large as (or larger than) that observed
(1·58 hours), by inspecting the mean differences produced by all
possible allocations (i.e. all possible combinations of positive and
negative signs attached to the differences). If this is sufficiently im-
probable the null hypothesis will be rejected in the usual way (Chapter

6). In fact it can be shown that the same result can be obtained by
inspecting the sum of only the positive (or of only the negative) d
values resulting from random allocation of signs to the differences, so it
is not necessary to find the mean each time† (similar situations arose
in §§ 8.2 and 9.2).
Assumptions. Putting the matter a bit more rigorously, it can be
seen that the hypothesis that an observation (d value) is equally
likely to be positive or negative, whatever its magnitude, implies that
the distribution of d values is symmetrical (see § 4.5), with a mean of
zero. The null hypothesis is therefore that the distribution of d values
is symmetrical about zero, and this will be true either if the yA and yB
values have identical distributions (not necessarily symmetrical), or if
the distributions of yA and yB values are both symmetrical (not
necessarily identical) with the same mean. This makes it clear
that if the null hypothesis is rejected, then, if it is wished to infer from
this that the distributions of yA and yB have different population means,
it must be assumed either that their distributions both have the same
shape (i.e. are identical apart from the mean), or that they are both
symmetrical.
Note that when the analysis is done by enumerating possible alloca-
tions it is assumed that each is equi-probable, i.e. that an allocation
was picked at random for the experiment, the design of which is therefore
inextricably linked with its analysis (see § 2.3).
If there are n differences (10 in Table 10.1.1) then there are 2ⁿ
possible ways of allocating signs to them (because one difference can
be + or −, two can be ++, +−, −+, or −−, and each time another
is added the number of possibilities doubles). All of these combinations
could be enumerated, as in Table 8.2.2 and Fig. 8.2.1, and Tables 9.2.2
and 9.3.1. This is done, using ranks, in § 10.4. In the present example,
however, only the most extreme ones are needed.
Example (1). In the results in Table 10.1.1 there are 9 positive
differences out of 9 (the zero difference, even if included, would have no
effect because the total is the same whatever sign is attached to it).
The number of ways in which signs can be allocated is 2⁹ = 512. The
observed allocation is the most extreme (no other can give a mean of
1·58 or larger) so the chance that it will come up is 1/512. For a two-tail
test (taking into account the other most extreme possibility, all signs

† As before this is because the total of all differences is the same (15·8 in the example)
for all randomizations, so specifying the sum of negative differences also specifies the
mean difference.

negative, see § 6.1), the P value is therefore 2/512 ≈ 0·004. In this
most extreme case (though no other) the result is the same as given by
the sign test (§ 10.2). Consider, for example, what would have happened
if patient 5 had given a negative difference instead of zero. The result
of the randomization test will, unlike that of the sign test, depend on
how large the negative difference is.
Example (2). Suppose that patient 5 had given d = −0·9, the other
patients being as in Table 10.1.1. There are now 2¹⁰ = 1024 possible
ways of allocating signs to the n = 10 differences. How many of these
give a total for the negative differences (see above) equal to or less than
0·9? Apart from the observed allocation, only two: that in which
patient 8 is negative but 5 is positive, giving a sum of negative differ-
ences of 0·8, and that in which all differences are positive, giving a sum
of negative differences of zero. The probability of observing, on the
null hypothesis, a sum of negative differences as extreme as, or more
extreme than, 0·9 is thus 3/1024. For a two-tail test (see § 6.1), therefore,
P = 6/1024 = 0·0059† (see next example for the detailed interpreta-
tion).
Example (3). If, however, patient 5 had given d = -2·0, the mean
difference, d̄, would have been 13·8/10 = 1·38. In this case a sum of
negative differences equal to or less than 2 could arise in ten different
ways, as well as that observed, so P (one tail) = 11/1024 and P
(two tail) = 22/1024 = 0·0215.† The 11 possible ways are (a) all
differences positive (sum = 0), (b) one difference negative (patient
8, 6, 1, 3, 4, 10, 7, or 5) giving a sum of 0·8, 1·0, 1·2, 1·3, 1·3, 1·4, 1·8, or
2·0, depending on which patient has the negative difference, (c) two
differences negative, patients 6 and 8 giving a sum of negative differ-
ences of 1·0+0·8 = 1·8, or patients 1 and 8 giving a sum of 1·2+0·8
= 2·0.
This result means that if the null hypothesis were true then the
probability would be only 0·0215 that the random numbers would
come up, during the allocation of the treatments, in such a way as to
give a sum of negative differences of 2·0 or less (i.e. a mean difference
between B and A of 1·38 or more), or results equally extreme in the
other direction (A giving larger responses than B). This probability
is small enough to make one suspect the null hypothesis (see § 6.1).
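For checking, the enumeration used in these examples is easily done by brute force. The sketch below is an illustration only (the function name and the rounding tolerance are implementation details, not from the text); the differences are those of Example (2):

```python
from itertools import product

def randomization_count(diffs):
    """Count sign allocations whose sum of negative differences is as
    extreme as the observed one, in either tail, out of the 2**n
    equi-probable allocations (zero differences are dropped)."""
    mags = [abs(d) for d in diffs if d != 0]
    total = sum(mags)
    obs_neg = sum(-d for d in diffs if d < 0)
    eps = 1e-6  # guard against floating-point rounding
    count = 0
    for signs in product((1, -1), repeat=len(mags)):
        neg = sum(m for s, m in zip(signs, mags) if s < 0)
        if neg <= obs_neg + eps or neg >= total - obs_neg - eps:
            count += 1
    return count, 2 ** len(mags)

# Example (2): patient 5 gives d = -0.9
count, n_alloc = randomization_count(
    [0.8, -0.9, 1.0, 1.2, 1.3, 1.3, 1.4, 1.8, 2.4, 4.6])
# count/n_alloc = 6/1024, the two-tail P found above
```

Running the same function with -2·0 in place of -0·9 reproduces the 22/1024 of Example (3).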

† In general it is possible, though uncommon in this sort of test, that an exactly
equal area could not be cut off in the opposite tail, so twice the one-tail P may be a
maximum value for the two-tail P (see § 6.1).

10.4. The Wilcoxon signed-ranks test for two related samples
This test works on much the same principle as the randomization
test in § 10.3 except that ranks are used, and this allows tables to be
constructed, making the test very easy to do. The relation between the
methods of §§ 9.2 and 9.3 for independent samples was very similar.
However, the signed-ranks test, unlike the sign test (§ 10.2) or the
rank test for two independent samples (§ 9.3), will not work with
observations that themselves have the nature of ranks rather than
quantitative numerical measurements. The measurements must be
such that the values of the differences between the members of each
pair can properly be ranked. This would certainly not be possible if
the observations were ranks. If the observations were arbitrary scores
(e.g. for intensity of pain, or from a psychological test) they would
be suitable for this test if it could be said, for example, that a pair
difference of 80-70 = 10 corresponded, in some meaningful way, to a
smaller effect than a pair difference of 25-10 = 15. Siegel (1956a, b)
discusses the sorts of measurement that will do, but if you are in
doubt use the sign test, and keep Wilcoxon for quantitative numerical
measurements. Sections 9.2, 9.3, 10.3, and Chapter 6 should be read
before this section. The precise nature of the assumptions and null
hypothesis has been discussed already in § 10.3.
The method of ranking is to arrange all the differences in ascending
order regardless of sign, rank them 1 to n and then attach the sign of the
difference to the rank. Zero differences are omitted altogether. Differ-
ences equal in absolute value are allotted mean ranks as shown in
examples (2) and (3) below (and in § 9.2). To use Table A4 find T, which
is either the sum of the positive ranks or the sum of the negative ranks,
whichever sum is smaller. Consulting Table A4 with the appropriate
n and T gives the two-tail P at the head of the column. Examples are
given below. Of course for simple cases the analysis can be done
directly on the ranks as in § 10.3.
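The ranking rules just described can be sketched in a few lines (an illustrative fragment; the function name is invented, and the example uses the differences that appear in Example (2) below):

```python
def signed_ranks(diffs):
    """Rank differences in ascending order of absolute value (zeros
    omitted, mean ranks for ties), then attach each difference's sign."""
    d = sorted((x for x in diffs if x != 0), key=abs)
    ranks = []
    i = 0
    while i < len(d):
        j = i
        while j < len(d) and abs(d[j]) == abs(d[i]):
            j += 1                       # d[i:j] tie in absolute value
        mean_rank = (i + 1 + j) / 2      # mean of ranks i+1, ..., j
        ranks.extend(mean_rank if x > 0 else -mean_rank for x in d[i:j])
        i = j
    return ranks

r = signed_ranks([0.8, -0.9, 1.0, 1.2, 1.3, 1.3, 1.4, 1.8, 2.4, 4.6])
# r = [1, -2, 3, 4, 5.5, 5.5, 7, 8, 9, 10]
T = min(sum(x for x in r if x > 0), -sum(x for x in r if x < 0))  # T = 2
```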

How Table A4 is constructed


Suppose that n = 4 pairs of observations were made, the differences
(d) being +0·1, -1·1, -0·7, +0·4. Ranking, regardless of sign, gives

d       +0·1   +0·4   -0·7   -1·1
rank      1      2     -3     -4

The observed sum of positive ranks is 1+2 = 3, and the observed sum
of negative ranks is 3+4 = 7. The sum of all four ranks, from (9.3.1),

is n(n+1)/2 = 4(4+1)/2 = 10, which checks (3+7 = 10). Thus T = 3,
the smaller of the rank sums. Table A4 indicates that it is not possible
to find evidence against the null hypothesis with a sample as small as
4 differences. This is because there are only 2ⁿ = 2⁴ = 16 different
ways in which the results could have turned out (i.e. ways of allocating
signs to the differences, see § 10.3), when the null hypothesis is true.

TABLE 10.4.1
The 16 possible ways in which a trial on four pairs of subjects could turn
out if treatments A and B were equi-effective, so the sign of each difference
is decided by whether the randomization process allocates A or B to the
member of the pair giving the larger response. For example, on the second
line the smallest difference is negative and all the rest are positive, giving
sum of negative ranks = 1

Rank   1   2   3   4     Sum of       Sum of       T
                         pos. ranks   neg. ranks

       +   +   +   +        10            0        0

       -   +   +   +         9            1        1
       +   -   +   +         8            2        2
       +   +   -   +         7            3        3
       +   +   +   -         6            4        4

       -   -   +   +         7            3        3
       -   +   -   +         6            4        4
       -   +   +   -         5            5        5
       +   -   -   +         5            5        5
       +   -   +   -         4            6        4
       +   +   -   -         3            7        3

       +   -   -   -         1            9        1
       -   +   -   -         2            8        2
       -   -   +   -         3            7        3
       -   -   -   +         4            6        4

       -   -   -   -         0           10        0

Therefore, even the most extreme result, all differences positive, would
appear, in the long run, in 1/16 of repeated random allocations of
treatments to members of the pairs. Similarly 4 negative differences
out of 4 would be seen in 1/16 of experiments. The result of a two-tail
test cannot, therefore, be less than P = 2/16 = 0·125 with a sample of
four differences, however large the differences (see, however, §§ 6.1 and
10.5 for further comments). With a small sample like this, it is easy to
illustrate the principle of the method. More realistic examples are
given below.
The 2⁴ = 16 possible ways of allocating signs to the four differences
(i.e. the possible ways in which A and B could have been allocated to
members of a pair, see § 10.3 for a full discussion of this process), are
listed systematically in Table 10.4.1, together with the sums of positive
and negative ranks, and the value of T, corresponding to each allocation.
In Table 10.4.2, the frequencies of these quantities are listed from the

TABLE 10.4.2
The relative frequencies of observing various values of (i.e. the distributions
of) the rank sums, and T, with n = 4 pairs of observations when the null
hypothesis is true. Constructed from Table 10.4.1

Rank    Frequency for    Frequency for    Frequency for
sum     pos. ranks       neg. ranks       T

 0           1                1                2
 1           1                1                2
 2           1                1                2
 3           2                2                4
 4           2                2                4
 5           2                2                2
 6           2                2
 7           2                2
 8           1                1
 9           1                1
10           1                1

Total       16               16               16

results in Table 10.4.1, and in Fig. 10.4.1 the distribution of the sum
of positive ranks is plotted (that for negative ranks is identical).
(These are the paired-sample analogues of the rank distribution worked
out for two independent samples in Table 9.2.2 and Fig. 9.3.1.)
Now the observed sum of positive ranks was 3, and the probability
of observing a sum of 3 or less is seen from Table 10.4.2 or Fig. 10.4.1, to
be 5/16. The probability of an equally large deviation from the null
hypothesis in the other direction (sum of positive ranks ≥ 7) is also
5/16. (The distribution is symmetrical, like that in Fig. 9.3.1, unless
there are ties, so the result of a two-tail test is twice that for a one-tail
test. See § 6.1.) The result of a two-tail significance test is therefore
P = 10/16 = 0·625, so there is no evidence against the null hypothesis,
because results deviating from it by as much as, or more than, the
observed amount would be common if it were true. In other words, if
the null hypothesis were true it would be rejected (wrongly) in 62·5
per cent of repeated experiments in the long run, if it were rejected
whenever the sum of positive ranks was 3 or less, or when it was equally
extreme in the other direction (7 or greater). A value of T equal to or
less than the observed value (3) is seen, from Table 10.4.2, to occur in
4+2+2+2 = 10 of the 16 possible random allocations. The probability
(on the null hypothesis) of observing T ≤ 3 is therefore P = 10/16
FIG. 10.4.1. Distribution of the sum of positive ranks (frequency plotted
against rank sum, 0 to 10) when the null hypothesis is true for the Wilcoxon
signed ranks test with four pairs of observations. The distribution is identical
for the sum of negative ranks. Plotted from Table 10.4.2.

= 0·625, which is another way of putting the same result. As in § 9.3,
the calculations in Tables 10.4.1 and 10.4.2 would be the same for any
experiment with n = 4 pairs of observations. The values of T cutting off
suitably small tail areas (1 per cent, 5 per cent, etc.) can therefore be
tabulated for various sample sizes. (The smallest possible value,
T = 0, cuts off an area of P = 2/16 = 0·125 for the small sample in
Table 10.4.2, as mentioned above.) This is what is given in Table A4.
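The construction just described is easy to reproduce by enumeration. A minimal sketch (the function name is invented; for n = 4 it reproduces the T column of Table 10.4.2):

```python
from itertools import product
from collections import Counter

def t_null_distribution(n):
    """Frequencies of T (the smaller signed-rank sum) over all 2**n
    equi-probable sign allocations, given the null hypothesis."""
    total = n * (n + 1) // 2
    freq = Counter()
    for signs in product((1, -1), repeat=n):
        pos = sum(r for s, r in zip(signs, range(1, n + 1)) if s > 0)
        freq[min(pos, total - pos)] += 1
    return freq

freq4 = t_null_distribution(4)
# freq4 = {0: 2, 1: 2, 2: 2, 3: 4, 4: 4, 5: 2}, as in Table 10.4.2
```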
Example (1). In Table 10.1.1, there are 9 positive differences out of 9
so all ranks are positive and T = sum of negative ranks = 0. Consulting
Table A4 with n = 9, T = 0, shows P < 0·01 (because T is less than
2, the tabulated value for P = 0·01). In fact doing the test directly
it is seen that there is only one way (the observed one) of getting a sum
of negative ranks as extreme as zero, out of 2⁹ = 512 ways of allocating
signs (see § 10.3). So P (one tail) = 1/512, and P (two tail) = 2/512
≈ 0·004 (exactly as in §§ 10.2 and 10.3 for this extreme case, but not
in general). This is quite strong evidence against the null hypothesis.

This is, as usual, because if the null hypothesis were true, deviations
from it (in either direction) as large as, or larger than, those observed
in this experiment would occur only rarely (P = 0·004) because the
random numbers happened to come up so that all the subjects giving
big responses were given the same treatment.
Example (2). Suppose however, as in § 10.3, Example (2), that patient
5 had given d = -0·9 instead of zero. When the observations are ranked
regardless of sign the result is as follows:

d      0·8  -0·9  1·0  1·2  1·3  1·3  1·4  1·8  2·4  4·6
rank    1    -2    3    4   5½   5½    7    8    9   10
Thus, n = 10, sum of negative ranks = 2, and sum of positive ranks
= 53. Thus, T = 2, the smaller rank sum. The total of all 10 ranks
should be, from (9.3.1), n(n+1)/2 = 10(10+1)/2 = 55, and in fact
53+2 = 55. Consulting Table A4 with n = 10 and T = 2, again
shows P < 0·01 (two tail). An exact analysis is easily done in this case.
A sum of negative ranks of 2 or less could arise with only 2 combinations
of signs, in addition to the observed one (rank 1 negative giving T = 1
or all ranks positive giving T = 0), and there are 2ⁿ = 2¹⁰ = 1024
possible ways of allocating signs (see § 10.3). Thus P (one tail) = 3/1024
and P (two tail) = 6/1024 = 0·0059. Again quite strong evidence
against the null hypothesis.
Example (3). If patient 5 had had d = -2·0 (as discussed in § 10.3)
the ranking process would be as follows:

d      0·8  1·0  1·2  1·3  1·3  1·4  1·8  -2·0  2·4  4·6
rank    1    2    3   4½   4½    6    7    -8    9   10

Consulting Table A4 with n = 10 and T = 8 gives P = 0·05. Enumera-
tion of all possible ways of achieving a sum of negative ranks of 8 or
less shows there to be 25 ways† so the exact two-tail P is 50/1024
= 0·049.
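The counts quoted here (and in the footnote below, which repeats them with tied ranks) can be verified by listing the possible sets of negative ranks. A sketch, with an invented helper name:

```python
from itertools import combinations

def ways_at_most(ranks, limit):
    """Number of subsets of the ranks (candidate sets of negative
    ranks) whose sum does not exceed the limit."""
    ranks = list(ranks)
    return sum(1
               for r in range(len(ranks) + 1)
               for subset in combinations(ranks, r)
               if sum(subset) <= limit + 1e-9)

n_untied = ways_at_most(range(1, 11), 8)                       # 25 ways
n_tied = ways_at_most([1, 2, 3, 4.5, 4.5, 6, 7, 8, 9, 10], 8)  # 24 ways
```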
Example (4). Consider the following 12 differences observed in a
paired experiment, shown after ranking in ascending order, disregarding
the sign.

d     0·1  -0·7  0·8  1·1  -1·2  1·5  -2·1  -2·3  -2·4  -2·6  -2·7  -3·1
rank   1    -2    3    4    -5    6    -7    -8    -9   -10   -11   -12

† This is found by constructing a sum of 8 or less from the integers from 1 to n (= 10),
as was done in calculating Table A4. More properly it should be done with the figures 1, 2, 3,
4½, 4½, 6, 7, 8, 9, 10 and with these there are only 24 ways of getting a sum of 8 or
less.

In this case most of the observed differences are negative. Is the
population mean difference different from zero? The sum of the
negative ranks is 2+5+7+8+9+10+11+12 = 64, and the sum of
the positive ranks is 1+3+4+6 = 14, so T = 14, the smaller rank
sum. An arithmetical check is provided by (9.3.1) which gives n(n+1)/2
= 12(12+1)/2 = 78, and, correctly, 64+14 = 78. Table A4 shows that
when n = 12, a value of T = 14 corresponds to P = 0·05. Only
marginal evidence against the null hypothesis (see § 6.1).

How to deal with samples too large for Table A4


Table A4 deals only with samples up to n = 26 pairs of observa-
tions. For larger samples, as in § 9.3, it is a very good approximation
to assume that the distribution of the rank sum, shown for a small
sample in Fig. 10.4.1, is Gaussian (normal) with (given the null hypo-
thesis) a mean of

        μ = n(n+1)/4        (10.4.1)

and standard deviation

        σ = √{n(n+1)(2n+1)/24}        (10.4.2)

(the derivations of μ and σ are given, for example, by Brownlee
(1965, p. 258)). For example, for the distribution in Fig. 10.4.1, n = 4 so
the mean is μ = 4(4+1)/4 = 5, as is obvious from the figure, and
σ = √{4(4+1)(8+1)/24} = 2·74.
The results in Example (4) can be used to illustrate the normal
approximation. In this example n = 12, so μ = 12(12+1)/4 = 39, and
σ = √{12(12+1)(24+1)/24} = 12·75. An approximate standard nor-
mal deviate (see § 4.3) can therefore be calculated from (4.3.1) as

        u = |T-μ|/σ = |14-39|/12·75 = 25/12·75 = 1·96.        (10.4.3)

The vertical bars mean, as usual, that the numerator is taken as positive.
The same value would be obtained if the sum of negative ranks, 64,
were used in (10.4.3), because 64-39 = 25. This value can now be
referred to tables of the standard normal distribution (see § 4.3), or tables of
t (with infinite degrees of freedom) (see § 4.4). A value of u = 1·96 cuts
off an area P = 0·05 in the tails of the standard normal distribution, as
explained in § 4.3. In other words a value of u above +1·96, or less
than -1·96, would occur in 5 per cent of repeated experiments. In

this case the normal approximation gives the same value as the exact
result, P = 0·05, found above from Table A4.
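Eqns (10.4.1)-(10.4.3) are easily wrapped into a single calculation (a sketch with an invented function name; the figures reproduce Example (4)):

```python
import math

def wilcoxon_normal_deviate(T, n):
    """Approximate standard normal deviate for the Wilcoxon
    signed-ranks statistic, from eqns (10.4.1)-(10.4.3)."""
    mu = n * (n + 1) / 4                               # eqn (10.4.1)
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # eqn (10.4.2)
    return abs(T - mu) / sigma                         # eqn (10.4.3)

u = wilcoxon_normal_deviate(T=14, n=12)   # u = 25/12.75 = 1.96
```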

10.5. A data selection problem arising in small samples


Consider the paired observations of responses to treatments A and B shown
in Table 10.5.1.

TABLE 10.5.1

                 Treatment
Block        A        B       Difference

  1         1·7      0·5        +1·2
  2         1·2      0·5        +0·7
  3         1·8      0·9        +0·9
  4         1·0      0·7        +0·3

The experiment was designed exactly like that in Table 10.1.1. All the differences
are positive so the three nonparametric tests described in §§ 10.2-10.4 all give
P = 2/2⁴ = 1/8 for a two-tail test. In general, for n differences all with the
same sign, the result would be 2/2ⁿ.
It has been stated that the design of an experiment dictates the form its an-
alysis must take. Selection of particular features after the results have been seen
(data-snooping) can make significance tests very misleading. Methods of dealing
with the sort of data-snooping† problem that arise when comparing more than
two treatments are discussed in § 11.9. Nevertheless, it seems unreasonable to
ignore the fact that in these results, the observations are completely separated,
i.e. the smallest response to A is bigger than the largest response to B, a feature
of the results that has not been taken into account by the paired tests. (In
general, the statistician is not saying that experimenters should not look too
closely at the results of their experiments, but that proper allowance should be
made for selection of particular features.) This feature means that if the results
could be analysed by the two nonparametric methods designed for independent
samples (described in §§ 9.2 and 9.3), both methods would give the probability
of complete separation of the results, if the treatments were actually equi-
effective, as P = 2n!n!/(2n)! = 2(4!4!)/8! = 1/35 (two-tail), a far more 'signi-
ficant' result! The naive interpretation of this is that it would have been better
not to do a paired experiment. This is quite wrong. It has been shown by Stone
(1969) that the probability (given the null hypothesis of equi-effectiveness of
A and B) of complete separation of the two groups as observed, would be 1/35
even if there were no differences between the blocks, and even less than 1/35 if
there were such differences. This is not the same as the P = 1/8 found using the
paired tests because it is the probability of a different event. If the null hypo-
thesis were true, then, in the long run, 1 in 8 of repeated experiments would be
expected to show 4 differences all with the same sign out of 4, but only 1 in
35, or fewer, would have no overlap between groups as in this case.
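The 1/35 quoted above is just the two-tail probability of complete separation, which can be checked directly (a sketch for n = 4 observations per group):

```python
from math import factorial

n = 4
# Two groups of n observations can be interleaved in (2n)!/(n! n!)
# = 70 orderings; only 2 of them (one per direction) show complete
# separation, so P = 2 n! n!/(2n)! = 1/35.
p_separation = 2 * factorial(n) ** 2 / factorial(2 * n)
```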
It remains to be decided what should be done when faced with observations such as
† This is statisticians' jargon. 'Data selection' might be better.

those in Table 10.5.1. The snag is, of course, that any single specified arrange-
ment of the results is improbable. If the treatments were equi-effective (and
there were no differences between blocks) any of the 8!/(4!4!) = 70 possible
arrangements of the eight figures into two groups of 4 would have the same
probability, P = 1/70, of occurring. It is only because the particular arrange-
ment, with no overlap between A and B, corresponds to a preconceived idea,
that it is thought unusual, and what constitutes 'correspondence with a pre-
conceived idea' may be arguable. The problem is an old one:
'. . . when Dr Beattie observed, as something remarkable which had happened to him,
that he had chanced to see both No. 1 and No. 1000, of the hackney-coaches, the first
and the last; "Why, Sir, (said Johnson,) there is an equal chance for one's seeing those
two numbers as any other two". He was clearly right; yet the seeing of the two extremes,
each of which is in some degree more conspicuous than the rest, could not but strike
one in a stronger manner than the sight of any other two numbers'
(Boswell's Life of Johnson)

The only safe general rule that can be offered at the moment is to analyse the
experiment as a paired experiment if it was designed in that way. In other words
take P = 1/8 in the present case; not much evidence against the null hypothesis.
The problem is, however, a complicated one that is still not fully solved.†

10.6. The paired t test


As for the two-sample t test (§ 9.4), it is necessary to assume that the
distributions of responses to the two treatments are Gaussian (normal) in
form (see §§ 4.2 and 4.6), but it is no longer necessary to assume that they
have identical variances. The assumptions are explained in more detail in
§ 11.2 and it would be preferable always to write the calculations in the
form of the analysis of variance as described in § 11.6.
The method will be applied to the results in Table 10.1.1 which have
already been analysed properly in §§ 10.2-10.4 (and which were
analysed as though the two samples were independent in Chapter
9) although there is no evidence that the assumptions are fulfilled.
The analysis is carried out on the differences, d = yB-yA. The
variance of the differences is estimated to be, using (2.6.2) and (2.6.5),

        s²[d] = Σ(d-d̄)²/(n-1) = {38·58-(15·8)²/10}/(10-1)

              = 1·513 with (n-1) = 9 degrees of freedom.

† In some cases, such as this one, when the smallest P value (given, in this case, by
the Wilcoxon two-sample test) is smaller than the P value that the other tests under
consideration can ever reach, however large the difference between the treatments,
Stone (1969) has argued that it is proper to quote the smaller P, i.e. P ≤ 1/35 for the
results in Table 10.5.1.
Stone's method also introduces another factor of 1/2, i.e. he takes P ≤ 1/70, but this
feature has not yet come into wide use.

And the variance of the mean difference is, by (2.7.8),

        s²[d̄] = s²[d]/n = 1·513/10 = 0·1513 with 9 degrees of freedom.

This should be carefully distinguished from the variance of the
difference between means found in § 9.4, which was larger (0·7209) and
had more degrees of freedom (18). The disappearance of 9 degrees of
freedom will be explained when the results are looked at from the point
of view of the analysis of variance in § 11.6.
The standard deviation of the mean difference is estimated as

        s[d̄] = √0·1513 = 0·3890

and the null hypothesis is that the population (true) mean difference,
μ, is zero. Thus, from (4.4.1),

        t = (d̄-μ)/s[d̄] = d̄/s[d̄] = d̄/√[{Σd²-(Σd)²/n}/{n(n-1)}].        (10.6.1)

In this example

        t = 1·58/0·3890 = 4·062.
Referring to a table of the t distribution† with n-1 = 9 degrees of
freedom shows that a value of t (of either sign) as large as, or larger
than, 4·062 would be seen in less than 0·5 per cent of trials if the null
hypothesis that the population (true) mean difference μ = 0 were
true, and if the assumptions of normality, etc. were true, i.e. P (two
tail) < 0·005. This strongly suggests that the null hypothesis is not in
fact true and that there is a real difference between the means.
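The whole calculation can be summarized in a few lines (a sketch; the ten differences, listed here in ascending order, are those implied by the examples of this chapter, patient 5 giving d = 0):

```python
import math

d = [0.0, 0.8, 1.0, 1.2, 1.3, 1.3, 1.4, 1.8, 2.4, 4.6]  # d = yB - yA
n = len(d)
mean_d = sum(d) / n                                          # 1.58
var_d = (sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1)  # 1.513
se_mean = math.sqrt(var_d / n)                               # 0.3890
t = mean_d / se_mean                              # 4.062 with 9 d.f.
```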
This result is rather different from that found in § 9.4 and the other
sections in Chapter 9, when the same results were analysed as though
the drugs had been tested on independent samples, and the reasons for
this are discussed in §§ 10.7, 11.2, 11.4, and 11.6. It is in reasonable
agreement with the other analyses in this chapter but it cannot be
assumed that the t test will always give similar results to the more
assumption-free methods.
As in § 9.4, the same conclusion could be reached by calculating
confidence limits for the mean difference. The 99·5 per cent confidence
† For example, Fisher and Yates (1963), Table 3, or Pearson and Hartley (1966),
Table 12. Only the latter has P = 0·005 values. See § 4.4 for details.

limits for μ would be found not to include zero, the null hypothetical
value, but the 99·8 per cent confidence limits do include zero.
10.7. When will related samples (pairing) be an advantage?
In Chapter 9 the results in Table 9.2.4 and § 10.1 were analysed by
several different methods and in no case was good evidence found
against the hypothesis that the two drugs were equi-effective. The
methods all assumed that the measurements had been made on two
independent samples of ten subjects each. In § 10.1 it was revealed
that in fact the measurements were paired and the same results were
reanalysed making proper allowance for this in §§ 10.2-10.6. It was then
found that the evidence for a difference in effectiveness of the drugs
was actually quite strong. Why is this? On commonsense grounds the
difference between responses to A and B is likely to be more consistent
if both responses are measured on the same subject (or on members of
a matched pair), than if they are measured on two different patients.
This can be made a bit less intuitive if the correlation between the
two responses from each pair is considered. The correlation coefficient,
r (which is discussed in § 12.9, q.v.), is a standardized measure of the
extent to which one value (say yA) tends to be large if the other (yB) is
large (in so far as the tendency is linear, see § 12.9). It is closely related
to the covariance (see § 2.6), the sample estimate of the correlation
coefficient being r = cov[yA, yB]/(s[yA].s[yB]).
Now in § 9.4 the variance of the difference between two means was
found as s²[ȳA-ȳB] = s²[ȳA]+s²[ȳB] using (2.7.3), which assumes that
the two means are not correlated (which will be so if the samples are
independent). When the samples are not independent the full equation
(2.7.2) must be used, viz.

        s²[d̄] = s²[ȳB-ȳA] = s²[ȳA]+s²[ȳB]-2 cov[ȳA, ȳB].†

Using the correlation coefficient (r) this can be written

        s²[d̄] = s²[ȳA]+s²[ȳB]-2r s[ȳA]s[ȳB].        (10.7.1)

(This expression should ideally contain the population correlation
coefficient. If an experiment is carried out on properly selected in-
dependent samples, this will be zero, so the method given in § 9.4, which
ignores r, is correct even if the sample correlation coefficient is not
exactly zero.)
These relationships show that if there is a positive correlation between
† There are equal numbers in each sample so ȳB-ȳA = mean of (yB-yA) = d̄.

the two responses of a pair (r > 0) the variability of the difference
between means will be reduced (by subtraction of the last term in
(10.7.1)), as intuitively expected. In the present example r = +0·8,
and s²[ȳA-ȳB] is reduced from 0·7210 when the correlation is ignored
(§§ 9.4 and 11.4), to 0·1513 when it is not (§§ 10.6 and 11.6). The
correct value is 0·1513, of course.
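The sample identity behind eqn (10.7.1) can be checked on any paired data; the sketch below uses made-up figures (only the first two pairs echo Table 10.1.1), showing that the variance of the differences equals var(yA) + var(yB) - 2 cov(yA, yB):

```python
# Made-up paired responses to illustrate the identity behind (10.7.1).
yA = [0.7, -1.6, 0.2, 1.1, 0.5]
yB = [1.9, 0.8, 1.1, 2.3, 1.6]

def mean(x):
    return sum(x) / len(x)

def var(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

d = [b - a for a, b in zip(yA, yB)]
lhs = var(d)
rhs = var(yA) + var(yB) - 2 * cov(yA, yB)  # equal, apart from rounding
```

Dividing each term by n gives the corresponding statement for the variances of the means, as in (10.7.1).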
Although correlation between observations is a useful way of looking
at the problem of designing experiments so as to get the greatest
possible precision, this approach does not extend easily to more than
two groups and it does not make clear the exact assumptions involved
in the test. The only proper approach to the problem is to make clear the
exact mathematical model that is being assumed to describe the
observations, and this is discussed in § 11.2.

11. The analysis of variance. How to
deal with two or more samples

'. . . when I come to "Evidently" I know that it means two hours hard work at
least before I can see why.'
W. S. GOSSET ('Student')
in letter dated June 1922, to R. A. Fisher

11.1. Relationship between various methods


THE methods in this chapter are intended for the comparison of two or
more (k in general) groups. The methods described in Chapters 9 and
10 are special cases (for k = 2) of those to be described. The rationale
and assumptions of the two-sample methods will be made clearer
during discussion of their k-sample generalizations. The principles
discussed in Chapter 6, although some were put in two-sample language,
all apply here.
All that any of the methods will tell you is whether the null hypo-
thesis (that all k treatments produce the same effect) is plausible or
not. If this hypothesis is not acceptable, the analysis does not say
anything about which treatments differ from which. Methods for
comparing all possible pairs of the k treatments are described in
§ 11.9 and references are given to other multiple comparison methods.
When the samples are independent (as in Chapter 9) the experimental
design is described as a one-way classification because each experi-
mental measurement is classified only according to the treatment
given. Analyses are described in §§ 11.4 and 11.5. When the samples
are related, as in Chapter 10, each measurement is classified according
to the treatment applied and also according to the particular block
(patient in Chapter 10) it occurs in. Such two-way classifications are
discussed in §§ 11.6 and 11.7.
As in previous chapters nonparametric methods based on the
randomization principle (see §§ 6.2, 6.3, 8.2, 8.3, 9.2, 9.3, 10.2, 10.3, and
10.4) are available for the simplest experimental designs. As usual,
these experiments can also be analysed by methods involving the

assumption (among others, see § 11.2) that experimental errors follow a
normal (Gaussian) distribution (see § 4.2), but the nonparametric
methods of §§ 11.5 and 11.7 should be used in preference to the Gaussian
ones usually (see § 6.2). Unfortunately, nonparametric methods are not
available (or at least not, at the moment, practicable) for analysis of
the more complex and ingenious experimental designs (see § 11.8 and
later chapters) that have been developed in the context of normal
distribution theory. For this reason alone most of the chapters following
this one will be based on the assumption of a normal distribution (see
§§ 4.2 and 11.2).
When comparing two groups, the difference between their means or
medians was used to measure the discrepancy between the groups.
With more than two it is not obvious what measure to use and because
of this it will be useful to describe the normal theory tests (in which a
suitable measure is developed) before the nonparametric methods.
This does not mean that the former are to be preferred. Tests of
normality are discussed in § 4.6.

11.2. Assumptions involved in the analysis of variance based on
the Gaussian (normal) distribution. Mathematical models
for real observations
It was mentioned in § 10.7 that in order to see clearly the underlying
principles of the t test (§ 9.4) and paired t test (§ 10.6) it is necessary
to postulate that the observations can adequately be described by a
simple mathematical model. Unfortunately it is roughly true that the
more complex and ingenious the experimental design, the more
complex, and less plausible, is the model (see § 11.8).
In the case of the two-sample t test (§§ 9.4 and 11.4) and its k-sample
analogue it is assumed that (1) the observations are normally distri-
buted (§ 4.6), (2) the normal distributions have population means μⱼ
(j = 1, 2,..., k) which may differ from group to group (e.g. μ₁ for drug
A and μ₂ for drug B in § 9.4), (3) the population standard deviation,
σⱼ, is the same for every sample (group, treatment), i.e. σ₁ = σ₂ = ...
= σₖ = σ, say (it was mentioned in § 9.4 that it had to be assumed
that the variability of responses to drug A was the same as that of
responses to drug B), and (4) the responses are independent of each
other (independence of responses in different groups is part of the
assumption that the experiment was done on independent samples, but
in addition the responses within each group must not affect each other
in any way, cf. discussion in § 13.1, p. 286).
The additive model
These assumptions can be summarized by saying that the ith
observation in the jth group (e.g. jth drug in § 9.4) can be represented
as yᵢⱼ = μⱼ + eᵢⱼ (i = 1, 2,..., nⱼ, where nⱼ is the number of observations
in the jth group; reread § 2.1 if the meaning of the subscripts is not
clear). In this expression the μⱼ (constants) are the population mean
responses for the jth group, and eᵢⱼ, a random variable, is the error
of the individual observation, i.e. the difference between the individual
observation and the population mean. It is assumed that the eᵢⱼ are
independent of each other and normally distributed with a population
mean of zero (so in the long run† the mean yᵢⱼ = μⱼ) and standard
deviation σ. Usually the population mean for the jth group, μⱼ, is written
as μ + τⱼ where μ is a constant common to all groups and τⱼ is a con-
stant (the treatment effect) characteristic of the jth group (treatment).
The model is therefore usually written

        yᵢⱼ = μ + τⱼ + eᵢⱼ.        (11.2.1)

The paired t test (§§ 10.6 and 11.6), and its k-sample analogue
(§ 11.6), need a more elaborate model. The model must allow for the
possibility that, as well as there being systematic differences between
samples (groups, treatments), there may also be systematic differences
between the patients in § 9.4, i.e. between blocks in general (see § 11.6).
The analyses in § 11.6 assume that the observation on the jth sample
(group, treatment) in the ith block can be represented as

        yᵢⱼ = μ + βᵢ + τⱼ + eᵢⱼ        (11.2.2)

where μ is a constant common to all observations, βᵢ is a constant
characteristic of the ith block, τⱼ is a constant, as above, characteristic
of the jth sample (treatment), and eᵢⱼ is the error, a random variable,
values of which are independent of each other and are normally distri-
buted with a mean of zero (so the long-run average value of yᵢⱼ is
μ + βᵢ + τⱼ), and standard deviation σ. This model is a good deal more
restrictive than (11.2.1) and its implications are worth looking at.
Notice that the components are supposed to be additive. In the case
of the example in § 10.6, this means that the differences between the
responses of a pair (block in general) to drug A and drug B are supposed
to be the same (apart from random errors) on patients who are very
sensitive to the drugs (large βᵢ) as on patients who tend to give smaller
t In the notation of Appendix I, E(e) = 0 ao E{y) = E(p/HE(e) = PI from (11.2.1).
And E(y) = E(p)+E(/I.HE("/HE(e) = P+/I'+"I from (11.2.2).

responses (small β_i). Likewise the difference in response between
any two patients who receive the same treatment, and are therefore in
different pairs or blocks, is supposed to be the same whether they
receive a treatment (e.g. drug) producing a large observation (large τ_j)
or a treatment producing a small observation (small τ_j). These remarks
apply to differences between responses. It will not do if drug A always
gives a response larger than that to drug B by a constant percentage,
for example.
Consider the first two pairs of observations in Table 10.1.1. In
the notation just defined (see also § 2.1) they are y_11 = +0.7, y_12
= +1.9, y_21 = -1.6, and y_22 = +0.8. The first difference is assumed,
from (11.2.2), to be y_12 - y_11 = (μ+β_1+τ_2+e_12) - (μ+β_1+τ_1+e_11)
= (τ_2-τ_1) + (e_12-e_11) = +1.2. That is to say, apart from experimental
error, it measures only the difference between the two treatments
(drugs), viz. τ_2-τ_1, whatever the value of β_1. Similarly, the second
difference is y_22 - y_21 = (τ_2-τ_1) + (e_22-e_21) = 2.4, which is also an
estimate of exactly the same quantity, τ_2-τ_1, whatever the value of β_2,
i.e. whatever the sensitivity of the patient to the drugs.

Looking at the difference in response to drug A (treatment 1) between
patients (blocks) 1 and 2 shows y_11 - y_21 = (μ+β_1+τ_1+e_11) - (μ+β_2
+τ_1+e_21) = (β_1-β_2) + (e_11-e_21) = +0.7-(-1.6) = +2.3, and sim-
ilarly for drug B, y_12 - y_22 = (β_1-β_2) + (e_12-e_22) = 1.9-0.8 = 1.1.
Apart from experimental error, both estimate only the difference
between the patients, which is assumed to be the same whether the
treatment is effective or not.

The best estimate, from the experimental results, of τ_2-τ_1 will be
the mean difference, d̄ = 1.58 hours. If the treatment effect is not the
same in all blocks then block × treatment interactions are said to be
present, and a more complex model is needed (see below, § 11.6 and,
for example, Brownlee (1965, Chapters 10, 14, and 15)).
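The cancellations just described can be checked arithmetically. The following sketch (in Python, which is of course not part of the original text) repeats the calculation on the four observations quoted above; nothing is assumed beyond those four figures.

```python
# Numeric check of the additive model (11.2.2) using the four observations
# quoted above (from Table 10.1.1): within-pair differences estimate
# tau2 - tau1, whatever the block constants beta_i may be.
y = {(1, 1): 0.7, (1, 2): 1.9,    # patient (block) 1: drug A, drug B
     (2, 1): -1.6, (2, 2): 0.8}   # patient (block) 2: drug A, drug B

# Within-block differences: (mu+beta_i+tau2+e_i2) - (mu+beta_i+tau1+e_i1),
# so mu and beta_i cancel, leaving (tau2-tau1) plus error.
d1 = y[(1, 2)] - y[(1, 1)]    # +1.2
d2 = y[(2, 2)] - y[(2, 1)]    # +2.4

# Within-treatment differences: tau_j cancels, leaving (beta1-beta2) plus error.
b_drugA = y[(1, 1)] - y[(2, 1)]   # +2.3
b_drugB = y[(1, 2)] - y[(2, 2)]   # +1.1
print(d1, d2, b_drugA, b_drugB)
```

Both within-block differences estimate τ_2-τ_1, and both within-treatment differences estimate β_1-β_2; that is the point of the additive model.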
This additive model is completely arbitrary and quite restrictive.
There is no reason at all why any real observations should be repre-
sented by it. It is used because it is mathematically convenient. In the
case of paired observations the additivity of the treatment effect can
easily be checked graphically, because the pair differences should be
a measure of τ_2-τ_1, as above, and should be unaffected by whether
the pair (patient in Table 10.1.1) is giving a high or a low average
response. Therefore a plot of the difference, d = y_A - y_B, against the
total, y_A + y_B, or equivalently, the mean, for each pair should be, apart
from random experimental errors, a straight horizontal line. This
plot, for the results in Table 10.1.1, is shown in Fig. 11.2.1. No system-
atic deviation from a horizontal line is detectable with the available
results, but there are not enough observations to provide a good test
of the additive model. For methods of checking additivity in more
complex experiments see, for example, Bliss (1967, pp. 323-41).

"
3
::
~ ®

2
®

®
® ® ®
®
®

0
-2 -I
.0 2 5 6 7 8 9
3
" Y.+YA
10

FlO. 11. 2.1. Teet or additive model ror two way a.na1ysis or variance with
two aamples (i.e. paired t teet). Pair difterences are plotted. against pair sum
(or pair mean).
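The calculation behind Fig. 11.2.1 can be sketched as follows. The ten pairs of observations are assumed to be the Cushny and Peebles figures of Table 10.1.1 (not reproduced in this section); their totals, 7.5 and 23.3, and their mean difference, 1.58 hours, agree with the values quoted in the text.

```python
# Sketch of the additivity check of Fig. 11.2.1: plot pair differences
# d = yA - yB against pair totals yA + yB and look for a trend.
yA = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
yB = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

d = [a - b for a, b in zip(yA, yB)]   # pair differences
t = [a + b for a, b in zip(yA, yB)]   # pair totals

# Least-squares slope of d on t: under the additive model the line should
# be roughly horizontal (slope near zero), apart from random error.
n = len(d)
t_bar, d_bar = sum(t) / n, sum(d) / n
slope = sum((ti - t_bar) * (di - d_bar) for ti, di in zip(t, d)) / \
        sum((ti - t_bar) ** 2 for ti in t)
print(f"mean difference = {d_bar:.2f}, slope of d on total = {slope:.3f}")
```

With d taken as y_A - y_B the mean difference is -1.58; the sign simply depends on the order of subtraction.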

Homogeneity of errors
In the models for the Gaussian analysis of variance all random errors
are pooled into the single quantity, represented by e in (11.2.1) and
(11.2.2), which is supposed to be normally distributed with a mean of
zero and a variance of σ². In other words, if observations could be
made repeatedly using a given treatment (e.g. drug) and block (e.g.
patient) the scatter of the results would be the same whatever the
size of the observation and whatever treatment was applied. This means
that the scatter of the observations must be the same for every group
(sample, treatment) for experiments with independent samples,
represented by (11.2.1).
To test whether the variance estimates calculated from each group
can reasonably all be taken to be estimates of the same population
value, a quick test is to calculate the ratio of the largest variance to
the smallest one, s²max/s²min. This test assumes that the k samples are
independent and that the variation within each follows a normal
distribution. Under these conditions the distribution of s²max/s²min is
known (when k = 2 it is the same as the variance ratio, see § 11.3). For
the results in Table 11.4.1 it is seen that k = 4, s²max/s²min = 158.14/
34.29 = 4.61, and each group variance is based on 7-1 = 6 degrees of
freedom. Reference to tables (e.g. the Biometrika tables of Pearson and
Hartley (1966, pp. 63-7 and Table 31)) shows that a value of s²max/
s²min of 10.4 or larger would be expected to occur in 5 per cent of
repeated experiments if the 4 independent samples of 7 observations
were all from a single normally distributed population, and therefore
all had the same population variance (the null hypothesis). Thus
P > 0.05 and there are no grounds for believing that the population
variance is not the same for all groups, though the test is not very
sensitive. The tables only deal with the case of k groups of equal size.
If the sizes are not too unequal the average number of degrees of
freedom can be used to get an approximate result.
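The quick test just described amounts to very little arithmetic. A sketch, using the four group variances of Table 11.4.1:

```python
# Quick homogeneity-of-variance check: ratio of the largest to the smallest
# group variance. The four variances are those quoted for Table 11.4.1,
# each based on 6 degrees of freedom.
variances = [102.95, 158.14, 34.29, 109.24]
ratio = max(variances) / min(variances)
print(f"s2max/s2min = {ratio:.2f}")   # 4.61, well below the 5 per cent point (10.4)
```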

Use of transformations
If the original observations do not satisfy the assumptions, some
function of them may do so, although you will be lucky if you have
enough observations to find out which function. Aspects of this problem
are discussed in Bliss (1967, pp. 323-41) and in §§ 4.2, 4.5, 4.6, and 12.2.
For example, suppose the observations (1) were known to be log-
normally distributed (see § 4.5), and (2) were represented by a multi-
plicative model (e.g. one treatment always giving, say, a 50 per cent greater
response than another, rather than a fixed increment in response), and
(3) had standard deviations that were not constant, but which were
proportional to the treatment mean (i.e. each treatment had the same
coefficient of variation, eqn (2.6.4)). In this case the logarithms of the
observations would be normally distributed with constant standard
deviation, and would be represented by an additive model. The
constancy of the standard deviation follows from eqn (2.7.14). Therefore
the logarithm of each observation would be taken before doing any
calculations.
If the standard deviation for each treatment group is plotted against
the mean, as in Fig. 11.2.2, the line should be roughly horizontal.
This can be tested rapidly using s²max/s²min as above, given normality.
If it is a straight line passing through the origin then the coefficient
of variation is constant and the logs of the observations will have
approximately constant standard deviation, as just described. If the
line is straight, but does not pass through the origin, as shown in Fig.
11.2.2, then y_0 = a/b (where a is the intercept and b the slope of the
line, as shown) should be added to each observation before taking
logarithms. It will now be found that log(y + y_0) has an approximately
constant standard deviation, though this is no reason to suppose that
this variable fulfils the other assumptions of normality and additivity.

FIG. 11.2.2. Transformation of observations to a scale fulfilling the require-
ment of equal scatter in each treatment group (standard deviations of the
treatment groups plotted against the means of the treatment groups, ȳ). See text.
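A minimal sketch of this transformation follows; the intercept a and slope b are arbitrary illustrative values, not taken from any figure in the book.

```python
# If the group standard deviation is a straight-line function of the group
# mean, s = a + b*ybar, then log(y + y0) with y0 = a/b has an approximately
# constant standard deviation (this follows from eqn (2.7.14)).
import math

a, b = 2.0, 0.25          # hypothetical intercept and slope of s against ybar
y0 = a / b                # = 8.0, the quantity to add before taking logs

observations = [12.0, 35.0, 80.0]   # arbitrary illustrative values
transformed = [math.log10(y + y0) for y in observations]
print(y0, [round(z, 3) for z in transformed])
```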
It is quite possible that no transformation will simultaneously
satisfy all the assumptions. Bartlett (1947) discusses the problem.
A more advanced treatment is given by Box and Cox (1964). In the
discussion of the latter paper, Nelder remarks 'Looking through the
corpus of statistical writings one must be struck, I think, by how
relatively little effort has been devoted to these problems [checking of
assumptions]. The overwhelming preponderance of the literature
consists of deductive exercises from a priori starting points . . .
Frequently these prior assumptions are unjustifiably strong and amount
to an assertion that the scale adopted will give the required additivity
etc.' A good discussion is given by Bliss (1967, pp. 323-41).

What sort of model is appropriate for your experiment: fixed effects or
random effects?
In the discussion above, it was stated that the values of β_i and τ_j
were constants characteristic of the particular blocks (e.g. patients)
and treatments (e.g. drugs) used in the experiment. This implies that
when one speaks of what would happen in the long run if the experiment
were repeated many times, one has in mind repetitions carried out with
the same blocks and the same treatments as those used in the experi-
ment (model 1). Obviously in the case of two drugs the repetitions would
be with the same drugs, but it is not so obvious whether repetitions
on the same patients are the appropriate thing to imagine. It would
probably be preferred to imagine each repetition on a different set of
patients, each set randomly selected from the same population, and in
this case the β_i will not be constants but random variables (model 2).
It is usually assumed that the β_i, as well as the e_ij, are normally dis-
tributed, and the variance of their distribution (ω², say) will then
represent the variability of the population mean response for individual
patients (blocks) about the population response for all patients (ω² is
assumed to be the same for all treatments). Compare this with σ²,
which represents the variability of the individual responses of a patient
about the true mean response for all responses on that patient (σ² is
assumed to be the same for all patients and treatments).
The distinction between models based on fixed effects (model 1)
and those based on random effects (model 2) will not affect the inter-
pretation of the simple analyses described in this book as long as the
simple additive model (such as (11.2.2)) is assumed to be valid, i.e. there
are no interactions. But if interactions are present, and in more complex
analyses of variance, it is essential for the interpretation of the result
that the model be exactly specified, because the interpretation will depend
on knowing what the mean squares, which are all estimates of σ² when
the null hypothesis is true, are estimates of when the null hypothesis is
not true. In the language of Appendix 1, the first thing that must be
known for the interpretation of the more complicated analyses of
variance is the expectations of the mean squares. These are given, for
various common analyses, by, for example, Snedecor and Cochran
(1967, Chapters 10-12 and 16) or Brownlee (1965, Chapters 10, 14, and
15). See also § 11.4 (p. 186).

11.3. The distribution of the variance ratio F
This section describes the generalization of Student's t distribution
(§ 4.4) that is necessary to extend the two-sample normal distribution-
based tests (see §§ 9.4 and 10.6) to more than two (k in general) samples
of observations with normally distributed errors (see §§ 11.4 and 11.6).
The t tests of §§ 9.4 and 10.6 will be seen to be special cases of the
more general methods. In the t test for two samples the
discrepancy between the samples was measured by the difference
between the sample means, and if this was large enough, compared
with experimental errors, the null hypothesis that the two population
means were equal, i.e. that both samples were samples from the same
parent population, with variance σ², was rejected. If it is required to
test the hypothesis that more than two samples are all from the same
population (which must therefore have a single mean and variance),
some measure of the differences between all the sample means is needed.
The usual measure of scatter for the normal distribution is the
standard deviation, and it turns out that the standard deviation of the
k sample means is a suitable generalization of the difference between
two means.
The sensibleness of this is made apparent when it is realized that the
difference between two figures is proportional to their standard deviation.
Consider two observations, y_1 and y_2. What is their standard deviation?
The sum of squared deviations is

Σ(y - ȳ)² = y_1² + y_2² - (y_1 + y_2)²/2 = (y_1² + y_2² - 2y_1y_2)/2 = (y_1 - y_2)²/2,

so the standard deviation of the two figures is the square root of

s_1²[y] = (y_1 - y_2)²/2,   (11.3.1)

where the subscript 1 indicates that this is a standard deviation based
on one degree of freedom. Remember how t was defined (see §§ 4.4 and 9.4):
t = x/s[x], where x is a normally distributed variable with a population
mean of zero, and s[x], its estimated standard deviation, is based on f degrees
of freedom. If the variable of interest is x = y_1 - y_2, and one
wishes to test the null hypothesis that the population values† of y_1 and
y_2 are equal, i.e. μ = 0, then one calculates t = (y_1 - y_2)/s[y_1 - y_2]. Now

† In the language of Appendix 1, μ = E[y_1 - y_2] = E[y_1] - E[y_2] = 0.

y_1 and y_2 are assumed to be independent, and both are assumed to
have the same variance, of which an estimate s_f²[y] based on f degrees of
freedom is available (calculated, for example, from the variability
within groups as in § 9.4 if y_1 and y_2 were group means). The estimated
variance of y_1 - y_2 is therefore, by (2.7.3), s²[y_1 - y_2] = s²[y_1] + s²[y_2]
= 2s_f²[y], so s[y_1 - y_2] = √(2s_f²[y]). Using these values gives

t_f = (y_1 - y_2)/s[y_1 - y_2] = (y_1 - y_2)/√(2s_f²[y]) = s_1[y]/s_f[y], by (11.3.1).   (11.3.2)

Thus, if the null hypothesis (μ = 0) is true, t² is seen to be the ratio of
two independent estimates of the same population variance (σ², see
above), that in the numerator having one degree of freedom (in § 9.4
this was found from the difference between the sample means, ȳ.1 and
ȳ.2), and that in the denominator having f degrees of freedom (in
§ 9.4 this was found from the differences within samples). Compare this
approach with the discussion of predicted variances in § 2.7. In §§ 9.4
and 11.4 it is predicted from the observed scatter within samples what
the scatter between samples could reasonably be expected to be, and the
prediction is compared with the observed scatter between samples.

Now the ratio of two independent estimates of the same population
variance is called the variance ratio and is denoted F (after R. A. Fisher,
who discovered its distribution when the population is normally
distributed). If the estimate in the numerator has f_1 degrees of freedom,
and the estimate in the denominator has f_2 degrees of freedom, then F
is defined as

F(f_1, f_2) = (estimate of σ² based on f_1 d.f.)/(estimate of σ² based on f_2 d.f.).   (11.3.3)

From (11.3.2) it is immediately seen that t² with f degrees of freedom is
simply the special case† of the variance ratio with f_1 = 1 degree of
freedom in the numerator, i.e. t_f² = s_1²/s_f² = F(1, f). Because the
variance in the numerator can be found from (and used as a measure of
the discrepancy of) k sample means, F is the required generalization of
Student's t. Numerical examples occur in §§ 11.4 and 11.6.

† It is worth noting, in passing, that the chi-squared distribution is another special case
of the variance ratio. Since χ² with f degrees of freedom is the distribution of f s²/σ²
(see § 8.5) it follows that χ²/f is simply F(f, ∞), the population variance σ² being an
estimate with ∞ d.f.

Imagine repeated samples drawn from a single population of normally
distributed observations. From a sample of f_1 + 1 observations the
sample variance, s_1², is calculated as an estimate of the population
variance (with f_1 degrees of freedom). Another independent sample
of f_2 + 1 observations is drawn from the same population and its sample
variance, s_2², is also calculated. The ratio, F, of these two estimates
of the population variance is calculated. If this process were repeated
very many times the variability of the population of F values so

0·6

0·1 10 per cent of area

01 5
F(oI.6)

Flo. 11. 8.1. Distribut.lon of the variance ratio when there are , degreee of
fieedom for the estimate of II' in the numerator and 6 degreee of fieedom for
that in the denominator. In 10 per cent of repeated experiments in the long run,
the ratio of two moo estimates of the .me variance (null hypotheaia true) will be
8·18 or larger. The mode of the distribut.lon is lea than 1, and the mean is greater
than 1.

produced would be described by the distribution of the variance ratio,
tables of which are available.† The tables are more complicated than
those for the distribution of t because both f_1 and f_2 must be specified
as well as F and the corresponding P, so a three-dimensional table is
needed. An example of the distribution of F for the case when f_1 = 4
and f_2 = 6 is shown in Fig. 11.3.1. Reference to the tables shows that
in 10 per cent of repeated experiments F will be 3.18 or larger, as
illustrated. The distribution has a different shape for different numbers
of degrees of freedom, but it is always positively skewed, so mode and
mean differ (see § 4.5). Since numerator and denominator are estimates
of the same quantity, values of F would be expected to be around one.
As in the case of χ² (see § 8.5), deviations from the null hypothesis in
either direction tend to increase the value of F (because squaring
makes all deviations positive), so the area in one tail of the F distribu-
tion, as in Fig. 11.3.1, is appropriate for a two-tail test (see § 6.1) of
significance in the analysis of variance. This can be seen clearly in the
case of the t test. In § 9.4 it was found that the probability that t_18 will
be either less than -2.101 or greater than +2.101 was 0.05. Either of
these possibilities implies that t²_18 ≥ 2.101², i.e. F(1,18) ≥ 4.41.
Reference to the tables of F with f_1 = 1 and f_2 = 18 shows that
F = 4.41 cuts off 5 per cent of the area in the upper tail of the distribu-
tion, the same result as the two-tail t test.

† For example, (a) Fisher and Yates (1963), Table V. In these tables values of F are
given for P = 0.001, 0.01, 0.05, 0.1, and 0.2 (the '0.1 per cent, etc. percentage points'
of F). The degrees of freedom f_1 and f_2 are denoted n_1 and n_2, and the variance ratio F is,
for largely historical reasons, called e^(2z) (the tables of z on the facing pages should be
ignored). (b) Pearson and Hartley (1966), Table 18, give values of F for P = 0.001,
0.005, 0.01, 0.025, 0.05, 0.1, and 0.25. The degrees of freedom are denoted ν_1 and ν_2.
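This equivalence is easily verified numerically, assuming SciPy is available for the t and F distributions:

```python
# Check that the square of the two-tail 5 per cent point of t with 18 d.f.
# equals the upper 5 per cent point of F(1, 18), as stated in the text.
from scipy import stats

t_crit = stats.t.ppf(0.975, df=18)          # 2.101, as in section 9.4
f_crit = stats.f.ppf(0.95, dfn=1, dfd=18)   # 4.41
print(round(t_crit, 3), round(f_crit, 2))
```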

What to do if the variance ratio is less than one
When the null hypothesis is true F would be expected to be less than
one quite often, but the tables only deal with values of one or greater.
In this case look up the reciprocal of the variance ratio, which will now
have f_2 degrees of freedom for the numerator and f_1 for the denominator.
The resulting value of P is the probability of an F equal to or less than the
original observed value. For example, if F(4,6) = 0.25, then look up
F(6,4) = 1/0.25 = 4.0 with 6 d.f. for the numerator and 4 for the
denominator. From the tables it is found that P[F(6,4) ≥ 4.0] = 0.1,
and therefore P[F(4,6) ≤ 0.25] = 0.1; 10 per cent of the area lies
below F = 0.25 in Fig. 11.3.1. The probability required for the analysis
of variance is P[F(4,6) > 0.25] = 1 - 0.1 = 0.9. If a variance ratio is
observed that is so small as to be very rare, it can only be assumed that
either a rare event has happened, or else that the assumptions of the
analysis are not fulfilled. Deviations from the null hypothesis can only
result in a large variance ratio.
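The reciprocal rule can likewise be checked numerically (again assuming SciPy):

```python
# The probability of F(4,6) <= 0.25 is found from the upper tail of F(6,4)
# at 1/0.25 = 4.0, as described in the text.
from scipy import stats

p_upper = stats.f.sf(4.0, dfn=6, dfd=4)     # P[F(6,4) >= 4.0], about 0.1
p_lower = stats.f.cdf(0.25, dfn=4, dfd=6)   # P[F(4,6) <= 0.25], the same quantity
print(round(p_upper, 3), round(p_lower, 3))
```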

11.4. Gaussian analysis of variance for k independent samples
(the one-way analysis of variance). An illustration of the
principle of the analysis of variance

The use of the variance ratio distribution to extend the method of
§ 9.4 to more than two samples will be illustrated using observations on
the blood sugar level (mg/100 ml) of 28 rabbits shown in Table 11.4.1.
As usual, the rabbits are supposed to be randomly drawn from a
population of rabbits, and divided into four groups in a strictly random
way (see § 2.3). One of the four treatments (e.g. drug, type of diet, or
environment) is assigned to each group. Is there any evidence that the
treatments affect the blood sugar level? Or, in other words, do the
TABLE 11.4.1
Blood sugar level, (mg/100 ml) - 100, in four groups of seven rabbits.
See § 2.1 for explanation of notation. The figures in parentheses are the
ranks and rank sums for use in § 11.5

                          Treatment (j)
                1           2           3           4
             17 (10½)    37 (25)     35 (22½)     9 (5)
             16 (9)      36 (24)     22 (15)      8 (4)
             28 (18)     21 (13½)    35 (22½)    17 (10½)
              4 (3)      13 (7)      38 (26)     18 (12)
             21 (13½)    45 (28)     31 (19)      1 (2)
              0 (1)      23 (16½)    34 (20½)    34 (20½)
             23 (16½)    13 (7)      40 (27)     13 (7)

Total
T.j = Σ_i y_ij  109 (71½)  188 (121)  235 (152½)  100 (61)   Grand total
                                                             G = Σ_j Σ_i y_ij = 632
Mean
ȳ.j = T.j/n_j   15.571     26.857     33.571      14.286     Grand mean
                                                             ȳ.. = 632/28 = 22.5714
Variance
s_j²            102.95     158.14     34.29       109.24

four mean levels differ by more than could reasonably be expected if the
null hypothesis, that all 28 observations were randomly selected from
the same population (so that the population means are identical), were true?
The assumptions concerning normality, homogeneity of variances,
and the model involved in the following analysis have been discussed in
§ 11.2, which should be read before this section. Although the largest
group variance in Table 11.4.1 is 4.6 times the smallest, this ratio is not
large enough to provide evidence against homogeneity of the variances,
as shown in § 11.2. Tests of normality are discussed in § 4.6.

The nonparametric method described in § 11.5 should generally be
preferred to the methods of this section (see § 6.2).

The following discussion applies to any results consisting of k
independent groups, the number of observations in the jth group being
n_j (the groups do not have to be of equal size). In this case k = 4 and
all n_j = 7. The observation on the ith rabbit given the jth treatment is
denoted y_ij. The total and mean responses for the jth treatment are
T.j and ȳ.j. See § 2.1 for explanation of the notation.

The arithmetic has been simplified by subtracting 100 from every
observation. This does not alter the variances calculated in the analysis
(see (2.7.7)), or the differences between means.

The problem is to find whether the means (ȳ.j) differ by more than
could be expected if the null hypothesis were true. There are four
sample means and, as discussed in § 11.3, the extent to which they differ
from each other (their scatter) can be measured by calculating the
variance of the four figures.

The mean of the four figures is, in this case, the grand mean of all
the observations (this is true only when the number of observations is
the same in each group). The sum of squared deviations (SSD) is

Σ_{j=1}^{4} (ȳ.j - ȳ..)² = (15.571-22.571)² + (26.857-22.571)²
                         + (33.571-22.571)² + (14.286-22.571)²
                         = 257.02.

The sum of squares is based on four figures, i.e. 3 degrees of freedom,
so the variance of the group means, calculated directly (cf. § 2.7) from
their observed scatter, is

257.02/(4-1) = 85.673.

Is this figure larger than would be expected 'by chance', i.e. if the
null hypothesis were true? It would be zero if all treatments had
resulted in exactly the same blood sugar level, but of course this
result would not be expected in an experiment, even if the null hypo-
thesis were true. However, the result that would be expected can be
predicted, because the null hypothesis is that all the observations come
from a single Gaussian population. If the true mean and variance of
this hypothetical population were μ and σ², then group means, which are
means of 7 observations, would be predicted (from (2.7.8)) to have
variance σ²/7. If another estimate of σ², independent of differences
between treatments, were obtainable then it would be possible to see
whether this prediction was fulfilled. How this is done will be seen
shortly.
With greater generality, suppose that k groups of n observations each
are to be compared (in the example k = 4 and n = 7). If all the
observations were from a single population with variance σ², then the
variance of the group means should be σ²/n, so the variance of the k means
(85.673 in the example) calculated directly from their observed
scatter about the grand mean should be an estimate of σ²/n, calculated
from differences between groups. Multiplying through by n, it therefore
follows that

n Σ_{j=1}^{k} (ȳ.j - ȳ..)² / (k-1)

is an estimate of σ² if the null hypothesis is true.

Thus in the present example 7 × 85.673 = 599.71 is an estimate of
σ². It can be shown that if the numbers of observations were not equal,
say n_j observations in the jth group, then this expression would become

Σ_{j=1}^{k} n_j(ȳ.j - ȳ..)² / (k-1).   (11.4.1)

However, if the population means of the groups were not all the
same, i.e. if the null hypothesis were not true, then the above expression
would not be an estimate of σ². Its numerator would be inflated by
the real differences between means so, on the average, the estimate
calculated from differences between groups would be an estimate of
something larger. (The expectation of the between-groups mean
square will be greater than σ² if the null hypothesis is not true; see
p. 186.)
To test whether this has happened an estimate of σ², not dependent
on the assumption that the (population) group means are equal, is
needed. This can be obtained from differences within groups. The
estimate of σ² calculated from within the jth group, which is simply the
estimated variance of that group, is, as usual, found by summing the
squares of the deviations (SSD) of the individual observations in the
group from the mean for that group. Thus, for the jth group,

s²[y] = (SSD within jth group)/(degrees of freedom within jth group).

Because all groups are assumed to have the same variance, the
information about the variance from all k groups can be pooled to give
a single estimate. This is done by summing the separate SSDs and
dividing the result by the total number of degrees of freedom, thus:

s²[y] = Σ_{j=1}^{k} (SSD within the jth group) / Σ_{j=1}^{k} (n_j - 1)

      = Σ_{j=1}^{k} Σ_{i=1}^{n_j} (y_ij - ȳ.j)² / (N - k),   (11.4.2)

where N = Σ_{j=1}^{k} n_j is the total number of observations. This is the
required estimate of σ² calculated from differences within groups. In
the present example its value is 2427.714/(28-4) = 101.155. An easy
method of calculating the numerator is given below.

Furthermore, if all N observations were from the same population,
σ² could be estimated from the sum of squared deviations (SSD) of all
28 observations in this example from their grand mean. Thus, using
(2.6.5),

Total SSD = Σ_j Σ_i (y_ij - ȳ..)² = Σ Σ y_ij² - (Σ Σ y_ij)²/N   (11.4.3)
          = (17² + 16² + ... + 34² + 13²) - (632)²/28.
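With equal group sizes the pooled within-groups estimate just described reduces to the average of the group variances, each based on 6 degrees of freedom here. A sketch, using the variances quoted for Table 11.4.1:

```python
# Pooling the within-group estimates as in (11.4.2): weight each group
# variance by its degrees of freedom and divide by the total d.f.
variances = [102.95, 158.14, 34.29, 109.24]
df = [6, 6, 6, 6]
pooled = sum(f * v for f, v in zip(df, variances)) / sum(df)
print(round(pooled, 3))   # 101.155, as in the text
```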

Tabulation of the results

It has been shown that (11.4.1) and (11.4.2) would both be estimates
of σ² if the null hypothesis were true, so their ratio would follow the
variance ratio (F) distribution (see § 11.3).† If there is a real difference
between groups, a value of F will be observed that is,
on the null hypothesis, improbably large. It is shown below that the
sums of squares in the numerators of (11.4.1) and (11.4.2) add up to
the total sum of squares (11.4.3). Furthermore, the number of degrees
of freedom for between-groups (11.4.1) comparisons, k-1, and that for

† The expectation (long-run average, see Appendix 1) of MS_b in Table 11.4.2 is, in
general, E[MS_b] = σ² + Σ n_j(τ_j - τ̄)²/(k-1) for the fixed-effects model
(11.2.1) discussed in § 11.2, so if the null hypothesis, that all the τ_j are the same, is
true, E[MS_b] = σ² also. For the random-effects model (see p. 178) E[MS_b] = σ² + nω²
when all groups are the same size (n). And the null hypothesis is that ω² = 0, in which
case again E[MS_b] = σ². (See Brownlee (1965, pp. 310 and 318).)
11.4:.2
TA.BLB

Gemral one-way analy8i8 oj variance. N alice that iJ the .ull hypothe8i8


were tru.e all the ftguru i. the mea. BqVIJre column would be utimatu oj
the same quantity (as). Mean BqVIJre is jut a30ther name Jor variance
(8U § 2.6). Putting in the figuru in the example gitJe8 Table 11.4.3
Source of Sum of Mean equa.re Variance P
variation d.f. Bquaral (or variance) ratio

Ie
Between k-l 1:"I<9.I-g··)11 SSD/d.f. = MSII F = MStJMS.
groupe J

Within N-k let


1: (7III-g./)1I SSD/d.f. = MS.
groupe J «

Total N-l let


1: (7III-g"y~
J «

TABLE 11.4.3
Analysis of the rabbit blood sugar level observations in Table 11.4.1

Source of     d.f.          Sum of      Mean            Variance            P
variation                   squares     square          ratio

Between       4-1 = 3       1799.143    1799.143/3      599.71/101.155
treatments                              = 599.71        = 5.93              <0.005
Within        28-4 = 24     2427.714    2427.714/24
treatments                              = 101.155
Total         28-1 = 27     4226.857

within-groups (11.4.2) comparisons, Σ(n_j - 1) = N-k, add up to the
total number of degrees of freedom.

These results can be written out as an analysis of variance table.
All analysis of variance tables have the same column headings, but the
sources of variation considered depend on the design of the experiment.
For the one-way analysis the general result is as in Table 11.4.2.
If the null hypothesis were true 599.71 and 101.16 would both be
estimates of the same variance (σ²). Whether this is plausible or not is
found by referring their ratio, 5.93, to tables of the distribution of
F (see § 11.3) with f_1 = 3 and f_2 = 24 degrees of freedom. This
shows (see Fig. 11.3.1) that a value of F(3,24) = 5.93 would be exceeded
in only about 0.5 per cent of experiments in the long run (if the assump-
tions made are satisfied). Therefore, unless it is preferred to believe
that a 1 in 200 chance has come off, the premise that both 599.71 and
101.16 are estimates of the same variance will be rejected, and this
implies that all the observations cannot have come from the same
population, i.e. that the treatments really differ in their effect on blood
sugar level (see § 6.1).

Notice that this does not say anything about which treatments
differ from which others: whether all treatments differ, or whether
three are the same and one different, for example. The answering of this
question raises some problems, and a method of doing it is discussed in
§ 11.9. It is not correct to do t tests on all possible pairs of groups.
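The whole of Table 11.4.3 can be reproduced with a short calculation. The individual observations below are assumed to be those of Table 11.4.1 (they reproduce the group totals 109, 188, 235, and 100, the grand total 632, and the group variances quoted in § 11.2):

```python
# One-way analysis of variance of the rabbit blood sugar figures,
# reproducing the entries of Table 11.4.3 from first principles.
groups = [
    [17, 16, 28, 4, 21, 0, 23],      # treatment 1
    [37, 36, 21, 13, 45, 23, 13],    # treatment 2
    [35, 22, 35, 38, 31, 34, 40],    # treatment 3
    [9, 8, 17, 18, 1, 34, 13],       # treatment 4
]
N = sum(len(g) for g in groups)      # 28 observations
G = sum(sum(g) for g in groups)      # grand total, 632
grand_mean = G / N

ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = ss_total - ss_between

k = len(groups)
ms_between = ss_between / (k - 1)    # 599.71
ms_within = ss_within / (N - k)      # 101.155
F = ms_between / ms_within           # 5.93
print(round(ss_between, 3), round(ss_within, 3), round(F, 2))
```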

A practical method for calculating the sums of squares

A form of (11.4.1) that is more convenient for numerical computation
can be derived along the same lines as (2.6.5). The SSD between groups,
from (11.4.1), is

Σ_{j=1}^{k} n_j(ȳ.j - ȳ..)² = Σ_{j=1}^{k} (n_j ȳ.j² - 2n_j ȳ.j ȳ.. + n_j ȳ..²)

                            = Σ_j n_j ȳ.j² - 2ȳ.. Σ_j n_j ȳ.j + ȳ..² Σ_j n_j.   (11.4.4)

In this expression consider

(a) the first term: because ȳ.j = T.j/n_j (see Table 11.4.1 and § 2.1)
the first term can be written Σ_j n_j ȳ.j² = Σ_j (T.j²/n_j);

(b) the second term: again substituting the definition of ȳ.j shows
that the second term is 2ȳ.. Σ n_j ȳ.j = 2ȳ.. Σ T.j = 2ȳ..G = 2G²/N,
because Σ_j T.j = sum of group totals = grand total, G, and ȳ.. = G/N;

(c) the third term: because the sum of the group numbers Σ_j n_j = N,
the total number of observations, and ȳ.. = G/N, the third term is
ȳ..² Σ_j n_j = N ȳ..² = G²/N.

Substitution of these three results in (11.4.4) gives

SSD between groups = Σ_{j=1}^{k} (T.j²/n_j) - G²/N,

i.e., writing out the summation term by term,

SSD between groups = T.1²/n_1 + T.2²/n_2 + ... + T.k²/n_k - G²/N,   (11.4.5)

and this is the usual working formula. The formula for the total sum
of squares, (2.6.5), can be regarded as the special case in which each
total T contains n = 1 observation. In the present example

SSD between groups = 109²/7 + 188²/7 + 235²/7 + 100²/7 - (632)²/28
                   = 1799.143,

as shown in Table 11.4.3.

The SSD within groups can now be found most easily by difference:

SSD within groups = total SSD - between-groups SSD   (11.4.6)
                  = 4226.857 - 1799.143 = 2427.714,

as in Table 11.4.3.
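The working formula (11.4.5) and the subtraction (11.4.6) need only the group totals and the grand total:

```python
# Working formula (11.4.5): SSD between groups from group totals alone,
# then the within-groups SSD by difference, as in (11.4.6).
totals = [109, 188, 235, 100]   # T.j from Table 11.4.1
n = 7                           # observations per group
N = 4 * n
G = sum(totals)                 # grand total, 632
correction = G ** 2 / N         # G^2/N

ss_between = sum(T ** 2 / n for T in totals) - correction
ss_within = 4226.857 - ss_between   # total SSD quoted in the text, from (11.4.3)
print(round(ss_between, 3), round(ss_within, 3))
```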

A digression to show that sums of squares between and within groups must
add up to the total sum of squares

Consider the total sum of squared deviations (SSD) of the observations about
the grand mean (11.4.1), namely ΣΣ(y_ij − ȳ..)². At first consider the SSD of the
observations in the jth group about the grand mean, i.e.

    Σ_{i=1}^{n_j} (y_ij − ȳ..)² = Σ_{i=1}^{n_j} [(y_ij − ȳ.j) + (ȳ.j − ȳ..)]²

        = Σ_{i=1}^{n_j} [(y_ij − ȳ.j)² + 2(y_ij − ȳ.j)(ȳ.j − ȳ..) + (ȳ.j − ȳ..)²]

        = Σ_{i=1}^{n_j} (y_ij − ȳ.j)² + 2(ȳ.j − ȳ..) Σ_{i=1}^{n_j} (y_ij − ȳ.j) + n_j(ȳ.j − ȳ..)²

        = Σ_{i=1}^{n_j} (y_ij − ȳ.j)² + n_j(ȳ.j − ȳ..)².

The last step in this derivation follows from (2.6.1), which shows that Σ(y_ij − ȳ.j)
= 0, so the central term disappears.

190          The analysis of variance          § 11.4


If the above result is summed over the k groups, the required result is obtained,
thus

    Σ_j Σ_i (y_ij − ȳ..)² = Σ_j Σ_i (y_ij − ȳ.j)² + Σ_j n_j(ȳ.j − ȳ..)².    (11.4.7)
          (total)              (within groups)         (between groups)

This is a purely algebraic result and must hold for any set of numbers, but
unless the observations can really be represented by the postulated model
(see § 11.2) the components will not have the simple interpretation implied in the
'source of variation' column of Table 11.4.2.
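Because (11.4.7) is a pure identity it can be confirmed on any set of figures whatever. A sketch, with arbitrary made-up numbers:

```python
# Check that total SSD = within-groups SSD + between-groups SSD, eq. (11.4.7).
# The identity holds for any numbers at all; these are arbitrary.

groups = [[2.1, 3.4, 1.9], [4.0, 3.3], [5.2, 4.8, 6.1, 5.0]]

N = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / N

total = sum((y - grand) ** 2 for g in groups for y in g)
within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
```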

The t test on the results of Cushny and Peebles written as an analysis of
variance

The calculations in § 9.4 can usefully be written as an analysis of
variance on the lines just described, with k = 2 independent groups.
The results and necessary totals are given in Table 10.1.1. Again refer
to § 2.1 if in doubt about the notation.
The first step is usually to calculate G²/N as it appears several times
in the calculations. This quantity is often called the correction factor
for the mean, because, from (2.6.5), it corrects Σy² to Σ(y − ȳ)².
From Table 10.1.1:

(a) correction factor G²/N = (30·8)²/20 = 47·4320;    (11.4.8)

(b) total sum of squares, from (2.6.5) (cf. (11.4.3)),

    Σ_{j=1}^{2} Σ_{i=1}^{10} (y_ij − ȳ..)² = ΣΣy² − G²/N
        = 0·7² + 1·6² + … + 3·4² − 47·432
        = 77·3680;    (11.4.9)
(c) sum of squares between columns (i.e. between drugs A and B),
calculated from the working formula (11.4.5), is

    Σ_{j=1}^{2} n_j(ȳ.j − ȳ..)² = T.1²/n_1 + T.2²/n_2 − G²/N
        = 7·5²/10 + 23·3²/10 − 47·432
        = 12·4820;    (11.4.10)

and, as above, when divided by its number of degrees of freedom
this would give an estimate of σ² if the null hypothesis (that all
observations are from a single population with variance σ²)
were true;

(d) the sum of squares within groups can now be found by difference,
as in (11.4.6),

    Σ_{j=1}^{2} Σ_{i=1}^{10} (y_ij − ȳ.j)² = 77·3680 − 12·4820 = 64·8860.

These results can now be assembled in an analysis of variance table,


Table 11.4.4, which is just like Tables 11.4.2 and 11.4.3.

TABLE 11.4.4

    Source of variation        d.f.   Sum of squares     MS        F        P
    Between drugs                1       12·4820       12·4820   3·463   0·1-0·05
    Error (or within drugs)     18       64·8860        3·605

    Total                       19       77·3680

Reference to tables of the distribution of F (see § 11.3) shows that, if
the assumptions discussed in § 11.2 are true, a value of F(1,18) equal
to or greater than 3·463 would occur in between 5 and 10 per cent of trials
in the long run, if the null hypothesis, that the drugs are equi-effective,
were true, and if the assumptions of normality, etc. were true. This
is exactly the same result as found in § 9.4, and P is not small enough to
reject the null hypothesis. Because there are only two groups (k = 2),
F has one (= k−1) degree of freedom in the numerator, and is therefore
(see § 11.3) a value of t². Thus √F = √(3·463) = 1·861 is a value of
t with 18 d.f., and is in fact identical with the value of t found in § 9.4.
Furthermore, the error (within groups) variance of the observations,
from Table 11.4.4, is estimated to be 3·605, exactly the same as the
pooled estimate of s²(y) found in § 9.4. Table 11.4.4 is just another way
of writing the calculations of § 9.4.
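The whole of Table 11.4.4, and the identity t² = F, can be checked by computer. A sketch in Python, using the figures of Table 10.1.1 (the observations whose totals, 7·5 and 23·3, appear in (11.4.10) above):

```python
from math import sqrt

# Cushny & Peebles figures (Table 10.1.1): extra hours of sleep, 10 patients.
A = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]   # drug A
B = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]      # drug B

y = A + B
N, G = len(y), sum(y)
cf = G ** 2 / N                                      # correction factor (11.4.8)
ss_total = sum(v ** 2 for v in y) - cf               # (11.4.9)
ss_drugs = sum(A) ** 2 / 10 + sum(B) ** 2 / 10 - cf  # (11.4.10)
ss_error = ss_total - ss_drugs                       # within groups, 18 d.f.
F = ss_drugs / (ss_error / 18)                       # 1 and 18 d.f.

# Unpaired t with the pooled variance estimate, as in section 9.4
s2 = ss_error / 18
t = (sum(B) / 10 - sum(A) / 10) / sqrt(s2 / 10 + s2 / 10)
```

F comes out at 3·463 and t at 1·861, with t² = F, reproducing Table 11.4.4.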

11.5. Nonparametric analysis of variance for independent
samples by randomization. The Kruskal-Wallis method

As suggested in §§ 6.2 and 11.1, the methods in this section are
to be preferred to that of § 11.4.

The randomization method

The randomization method of § 9.2 is easily extended to k samples
and this is the preferred method, because it makes fewer assumptions
than Gaussian methods. The disadvantage, as in § 9.2, is that tables

cannot be prepared to facilitate the test, which will therefore be tedious
(though simple) to calculate without an electronic computer. As in
§ 9.3, this can be overcome, with only a small loss in sensitivity, by
replacing the original observations by their ranks, giving the Kruskal-
Wallis method described below.
The principle of the randomization method is exactly as in §§ 9.2,
9.3, and 8.2 so the arguments will not all be repeated. If all four treat-
ments were equi-effective then the observed differences between group
means in Table 11.4.1, for example, must be due solely to the way the
random numbers came up in the process of random allocation of a
treatment to each rabbit. Whether such large (or larger) differences
between treatment means are likely to have arisen in this way is
again found by finding the differences between the treatment means
that result from all possible ways of dividing the 28 observed figures
into 4 groups of 7. On the assumption that, when doing the experiment,
all ways were equiprobable, i.e. that the treatments were allocated
strictly at random, the value of P is once again simply the proportion
of possible allocations that give rise to discrepancies between treatment
means as large as (or larger than) the observed differences. In § 9.2 the
discrepancy between two means was measured by the difference
between them. As explained in §§ 11.3 and 11.4, when there are
more than two (say k) means, an appropriate thing to do is to measure
their discrepancy by the variance of the k figures, i.e. by calculating,
for each possible allocation, the 'between treatments sum of squares' as
described in § 11.4.
An approximation to the answer could be obtained by card shuffling,
as in § 8.2. The 28 observations from Table 11.4.1 would be written on
cards. The cards would then be shuffled, dealt into four groups of
seven, and the 'between treatments sum of squares' calculated. This
would be repeated until a reasonable estimate was obtained of the
proportion (P) of shufflings giving a sum of squares equal to or larger
than the value observed in the experiment. In fact, just as in § 9.2
it was found to be sufficient to calculate the total response for the
smaller group, so, in this case, it is sufficient to calculate ΣT.j²/n_j for
each possible allocation, because once this is known the between-
treatments sum of squares, or between-treatments F ratio, follows from
the fact that the total sum of squares is the same for every possible
allocation.
By a slight extension of (3.4.3), the number of possible allocations of
N objects into k groups of size n_1, n_2, …, n_k (Σn_j = N) is N!/(n_1! n_2! … n_k!).

Digitized by Google
§11.5 How to deal with two or more aamplu 193
For the results in Table 11.4.1 there are 28!/(7! 7! 7! 7!)
= 472518347558400 possible allocations. This is far too many to
enumerate by hand (doing one every 5 minutes it would take about 20
thousand million normal working years), though it is easy to select a
sufficiently large random sample of them with a computer. If one is not
available, the recommended procedure for k independent samples is,
just as in § 9.3, to replace the observations by ranks, allowing tables
to be constructed. This is known
as the Kruskal-Wallis method.
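The counting and the random sampling of allocations can both be sketched in a few lines of Python (the function name is invented for illustration, and the method shown is a Monte Carlo sample of allocations, not an exhaustive enumeration):

```python
import random
from math import factorial

# Number of ways of dividing N = 28 observations into 4 groups of 7
n_alloc = factorial(28) // factorial(7) ** 4

def randomization_p(groups, n_shuffles=2000, seed=1):
    """Estimate P as the proportion of random re-allocations whose
    between-groups statistic sum(T_j^2/n_j) is >= the observed one."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]

    def stat(obs):
        out, i = 0.0, 0
        for n in sizes:
            out += sum(obs[i:i + n]) ** 2 / n
            i += n
        return out

    observed = stat(pooled)          # pooled list is still in group order here
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)          # one random re-allocation
        if stat(pooled) >= observed - 1e-12:   # tolerance for round-off
            hits += 1
    return hits / n_shuffles
```

With identical observations in every group the estimated P is 1, as it should be; with widely separated groups it becomes small.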

The Kruskal-Wallis k sample randomization
analysis of variance

This is simply an extension to more than two (k, say) groups of the
Wilcoxon two-sample rank test (see above, § 9.3). As before, the null
hypothesis is that all N of the observations come from the same population,
and if this is rejected the conclusion will be that the populations differ.
If it is wished to conclude that the population medians differ then it must
be assumed that the underlying distributions (before ranking) for all
groups are the same. Apart from this no particular

TABLE 11.5.1
Pain scores, and their ranks (from 1 to 11 over all N = 11 observations,
found as described in the text), for three analgesic drugs

    Treatment        A            B            C
    Rank sum     R_1 = 23     R_2 = 6      R_3 = 37
                 (n_1 = 4)    (n_2 = 3)    (n_3 = 4)

form of distribution is assumed except that it must be continuous (see
§ 4.1). This implies that the variance is assumed to be the same for all
groups.
Again the method can be applied when the observations are
values that are not numerical measurements but ranks or scores, which
must be reduced to ranks, as well as when they are numerical measure-
ments.
All N observations are ranked in ascending order, ties being given
average ranks as in § 9.3. Table 11.5.1 shows the
results of an experiment in which 11 patients were divided randomly
into k = 3 groups, each being given a different analgesic drug (A, B,
and C). In each group the figure recorded is the total subjective pain
score recorded over a period of time by each patient. Such measure-
ments should not be treated like numerical measurements but should
be ranked. The ranks are shown in the table together with the rank
sum, R_j, for each group (j = 1, 2, …, k). The number of observations in
each group is n_j and the total number is N = Σn_j as in § 11.4.
The measure of the extent to which the treatments differ, analogous
to the rank sum of the smaller sample used in § 9.3, is the statistic H
defined as

    H = [12/(N(N+1))] Σ_{j=1}^{k} (R_j²/n_j) − 3(N+1)    (11.5.1)

as long as there are not too many ties (see below). Notice that the
term ΣR_j²/n_j makes H similar in character to a between-groups sum of
squares (11.4.5). For the results in Table 11.5.1, N = 11, n_1 = 4,
n_2 = 3, n_3 = 4, R_1 = 23, R_2 = 6, and R_3 = 37. Applying the check
(9.3.1) gives the sum of all ranks as N(N+1)/2 = 66 and in fact 23
+6+37 = 66. Using these values with (11.5.1) gives

    H = [12/(11(11+1))] (23²/4 + 6²/3 + 37²/4) − 3(11+1)
      = 8·227.
Table A5 gives the exact distribution of H found, as in § 9.3, by the
randomization method. It shows that for sample sizes 4, 4, and 3 (the
order of these figures is irrelevant) a value of H ⩾ 7·1439 would
occur in 1 per cent of trials (P = 0·01) in the long run if the null
hypothesis were true, therefore H = 8·227 must be even rarer, i.e.
P < 0·01. As in § 11.3, deviations from the null hypothesis in any
direction increase the size of H. Again, as in all analyses of variance,
this result does not give any information about differences between
individual pairs of groups (see § 11.9).
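The computation of H from the rank sums is easily mechanized. A sketch that reproduces the value just found, and the one found in the larger example below:

```python
# Kruskal-Wallis statistic (11.5.1) from rank sums R_j and group sizes n_j.

def kruskal_wallis_H(rank_sums, ns):
    N = sum(ns)
    return (12.0 / (N * (N + 1))
            * sum(R * R / n for R, n in zip(rank_sums, ns))
            - 3 * (N + 1))

H1 = kruskal_wallis_H([23, 6, 37], [4, 3, 4])            # Table 11.5.1
H2 = kruskal_wallis_H([71.5, 121, 152.5, 61], [7, 7, 7, 7])  # Table 11.4.1 ranks
```

H1 comes out at about 8·227 and H2 at about 11·66.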
Example with larger samples. Table A5 only deals with k = 3 groups
with not more than 5 observations in any group. For larger experi-
ments it is a sufficiently good approximation to assume that H is
distributed like chi-squared with k−1 degrees of freedom. P can then
be found from the chi-squared tables (see § 8.5). For example, the
results in Table 11.4.1 have been converted to ranks, shown in paren-
theses. In this case N = 28, n_1 = n_2 = n_3 = n_4 = 7, R_1 = 71·5,
R_2 = 121, R_3 = 152·5, and R_4 = 61. Applying the check N(N+1)/2
= 28(28+1)/2 = 406 and, correctly, R_1+R_2+R_3+R_4 = 406. Thus,
from (11.5.1),

    H = [12/(28(28+1))] (71·5²/7 + 121²/7 + 152·5²/7 + 61²/7) − 3(28+1) = 11·66.

Consulting a table of the chi-squared distribution (Fisher and Yates
(1963, Table IV) or Pearson and Hartley (1966, Table 8)) with k−1 = 3
degrees of freedom shows that χ² = 11·345 would be exceeded in 1
per cent of experiments in the long run if the null hypothesis were
true, so P < 0·01 for the observed value of 11·66. This is somewhat
larger than the value of P < 0·005 found when the assumptions of the
Gaussian analysis of variance were thought justified (Table 11.4.3),
but it is still small enough to cast considerable doubt on the null
hypothesis.
As in the Gaussian analysis, the finding that all k groups are unlikely
to be the same says nothing about which groups differ from which
others. A method for testing all pairs of groups to answer this question
is described in § 11.9. It is not correct to do two-sample Wilcoxon
tests on all possible pairs.
Correction for ties. Unless there is a very large number of ties the
correction factor (described, for example, by Brownlee (1965, p. 256))
has a negligible effect. It always makes H larger, and hence P smaller,
so there is no danger that neglecting the correction factor will lead to
rejection of the null hypothesis when it would otherwise not have been
rejected.

11.6. Randomized block designs. Gaussian analysis of variance
for k related samples (the two-way analysis of variance)

In §§ 10.1 and 6.4 it was pointed out that if the experimental units
(e.g. patients, periods of time) can be selected in some way to form
groups that give more homogeneous and reproducible responses than
units selected at random, then it will be advantageous if all the treat-
ments (k in number, say) to be compared, are compared on the units of
such a group. The group is known as a block. The units comprising the
block are sometimes, because of the agricultural origins of the design,
known as plots. It must clearly contain as many (k) experimental
units as there are treatments, or at least a multiple of k. The k treat-
ments must be allocated strictly randomly (see § 2.3) to the k units of
each block. Because every treatment is tested in every block the
blocks are described as complete (cf. § 11.8). This section deals with
randomized complete block experiments when the observations are
described by the simple additive model with normally distributed error
(11.2.2) described in § 11.2, which should be read before this section.
The analysis in § 10.6, Student's paired t test, was an example of
a randomized block experiment with k = 2 treatments and 2 units
(periods of time) in each block (patient). This test will now be reform-
ulated as an analysis of variance.

The paired t test written as an analysis of variance

The observations of Cushny and Peebles in Table 10.1.1 were
analysed by a paired t test in § 10.6. At the end of § 11.4 it was shown
how a sum of squares for differences between drugs could be obtained,
but no account was taken of the block arrangement actually used in the
experiment. As before (see §§ 11.3 and 11.4) the calculations are such
that if the null hypothesis (that all observations are from the same
Gaussian population with variance σ²) were true, then the quantities
in the mean-square column of the analysis of variance would all be
independent estimates of σ².†
Because of the symmetry of the design it is possible to obtain an
estimate of σ², on the null hypothesis that there is no real difference
between blocks (patients), by considering deviations of block means
from the grand mean, i.e. by analogy with (11.4.1), from Σ k(ȳ_i. − ȳ..)²/
(n−1), where k = number of treatments = number of observations
per block, and n = number of blocks = number of observations on
each treatment. Unlike the one-way analysis, n must be the same for
all treatments. N = kn is the total number of observations. From
(11.4.5), it can be seen that the numerator of this (sum of squares
between blocks) can most simply be calculated from the block totals,
as the sum of squares between treatments was found from treatment
totals in (11.4.5). From the results in Table 10.1.1

    SSD between blocks = Σ_{i=1}^{n} (T_i.²/k) − G²/N    (11.6.1)

        = 2·6²/2 + (−0·8)²/2 + … + 5·4²/2 − 47·4320

        = 58·0780.
† The expected values of the mean squares (see § 11.2, p. 173 and § 11.4, p. 186) are
derived by Brownlee (1965, Chapter 14). Often the mixed model, in which treatments
are fixed effects and blocks are random effects, is appropriate (loc. cit., p. 498).

In this, the group (row, block) totals, T_i., are squared and divided by
the number of observations per total just as in (11.4.5). See §§ 2.1 and
11.4 if clarification of the notation is needed. Since there are n = 10 groups
(rows, blocks) this sum of squares has n−1 = 9 degrees of freedom.
The values of G²/N (47·4320), and of the sum of squares between
drugs (treatments, columns) (12·4820) and the total sum of squares
(77·3680), are found exactly as in (11.4.8)-(11.4.10). The results are
assembled in Table 11.6.1. The residual or error sum of squares is again
TABLB 11.6.1
The paired t tul oJ § 10.5 tDt"iItm (J8 a3 analllN oJ vanaftC8. The mea"
IKJ'fU'rea are Jourul by dividi1l{/ the BUm oJ IKJ'fU'rea by their d.J. The F ratw.
are the ratio oJ eacA mea3 IKJ'fU're to the error mea" IKJ'fU're
8umof Mean F P
Source of variation d.f. aquarea aquare

Between treatments (drop) 1 12·4820 12·4820 16·5009 0·001-0'006


Between blocks (patients) 9 68·0780 6·4681 8·681 0'001-0·006
Enor 9 6·8080 0'7664
Total 19 77·3680

found by difference (77·3680 − 58·0780 − 12·4820 = 6·8080), and so is
its number of degrees of freedom (19−9−1 = 9). The error mean
square will be an estimate of the variance of the observations, σ²,
after the elimination of variability due to differences between treat-
ments (drugs) and blocks (patients), i.e. the variance the observations
would have because of sources of experimental variability if there were
no such differences. This is only true if there are no interactions and the
simple additive model (11.2.2) represents the observations. The other
mean squares would also be an estimate of σ² if the null hypothesis
were true, and therefore, when the null hypothesis is true the ratio of
each mean square to the error mean square should be distributed like
the variance ratio (see § 11.3). If the size of the F ratio is so large as to
make its occurrence a very rare event, the null hypothesis will be
abandoned in favour of the idea that the numerator of the ratio has
been inflated by real differences between treatments (or blocks).
The variance ratio for testing differences between drugs is 16·5009
with one d.f. in the numerator and 9 in the denominator. Reference to
tables of the F distribution (see § 11.3) shows that F(1,9) = 13·61
would be exceeded in 0·5 per cent, and F(1,9) = 22·86 in 0·1 per cent
of trials in the long run, if the null hypothesis were true. The observed
F falls between these figures so 0·001 < P < 0·005, just as in § 10.6.
As pointed out in § 11.3 (and exemplified in § 11.4), F with 1 d.f.
for the numerator is just a value of t², so √[F(1,9)] = √(16·5009)
= 4·062 = t(9), a value of t with 9 d.f., exactly the same value as
was found in the paired t test (§ 10.6). Furthermore, the error variance
from Table 11.6.1 is 0·7564 with 9 d.f. The error variance of the differ-
ence between two observations should therefore, by (2.7.3), be 0·7564
+0·7564 = 1·513, which is exactly the figure estimated directly in
§ 10.6.
If there were no real differences between blocks (patients) then
6·4531 would be an estimate of the same σ² as the error mean square
0·7564. Referring this ratio (8·531) to tables (see § 11.3) of the distribu-
tion of the F ratio (with f_1 = 9 d.f. for the numerator and f_2 = 9 d.f.
for the denominator) shows that the probability of an F ratio at least
as large as 8·531 would be between 0·001 and 0·005, if the null hypo-
thesis were true.
This analysis, and that in § 11.4, show clearly why there were 18 d.f. in the
unpaired t test (§ 9.4) but only 9 d.f. in the paired t test (§ 10.6). In the
latter, 9 d.f. were used up by comparisons between patients (blocks).
There is quite strong evidence (given the assumptions in § 11.2)
that there are real differences between the treatments (drugs), as
concluded in § 10.6. This is because an F ratio, i.e. difference between
treatments relative to experimental error, as large as, or larger than,
that observed (16·50) would be rare if there were no real (population)
difference between the treatments (see §§ 6.1 and 11.3). Similarly,
there is evidence of differences between blocks (patients).
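The whole of Table 11.6.1, and the identity t² = F for the paired case, can also be checked by computer. A sketch, using the (drug A, drug B) pairs of Table 10.1.1:

```python
from math import sqrt

# Paired observations (drug A, drug B) for the 10 patients of Table 10.1.1
pairs = [(0.7, 1.9), (-1.6, 0.8), (-0.2, 1.1), (-1.2, 0.1), (-0.1, -0.1),
         (3.4, 4.4), (3.7, 5.5), (0.8, 1.6), (0.0, 4.6), (2.0, 3.4)]

k, n = 2, len(pairs)                    # 2 treatments, 10 blocks
N = k * n
y = [v for p in pairs for v in p]
cf = sum(y) ** 2 / N                    # correction factor

ss_total = sum(v * v for v in y) - cf
TA, TB = sum(a for a, b in pairs), sum(b for a, b in pairs)
ss_drugs = TA ** 2 / n + TB ** 2 / n - cf                  # 1 d.f.
ss_blocks = sum((a + b) ** 2 / k for a, b in pairs) - cf   # 9 d.f., eq. (11.6.1)
ss_error = ss_total - ss_drugs - ss_blocks                 # 9 d.f.
F_drugs = ss_drugs / (ss_error / 9)

# Direct paired t test on the differences, as in section 10.6
d = [b - a for a, b in pairs]
dbar = sum(d) / n
s2d = sum((x - dbar) ** 2 for x in d) / (n - 1)   # variance of a difference
t = dbar / sqrt(s2d / n)
```

The block sum of squares comes out at 58·078, F at 16·50, the variance of a difference at 1·513, and t² = F, reproducing the figures above.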

An example of a randomized block experiment with four treatments

The following results are from an experiment designed to show
whether the response (weal size) to intradermal injection of antibody
followed by antigen depends on the method of preparation of the
antibody. Four different preparations (A, B, C, and D) were tested.
Each preparation was injected once into each of four guinea pigs. The
preparation to be given to each of the four sites on each animal was
decided strictly at random (see § 2.3). Guinea pigs are therefore blocks
in the sense described above. The results are in Table 11.6.2. (This is
actually an artificial example. The figures are taken from Table 13.11.1
to illustrate the analysis.)

TABLE 11.6.2
Weal diameters using four antibody preparations in guinea pigs

                        Antibody preparation (treatment)
    Guinea pig
    (block)          A       B       C       D      Totals (T_i.)
       1            41      61      62      43          207
       2            48      68      62      48          226
       3            53      70      66      53          242
       4            56      72      70      52          250

    Total (T.j)    198     271     260     196        G = 925
    Mean           49·5    67·75   65·0    49·0

The calculations follow exactly the same pattern as above.

(1) 'Correction factor' (see (11.4.8)). G²/N = (925)²/16 = 53476·5625.

(2) Between antibody preparations (treatments). From (11.4.5),

    SSD = 198²/4 + 271²/4 + 260²/4 + 196²/4 − 53476·5625 = 1188·6875.

(3) Between guinea pigs (blocks). From (11.6.1),

    SSD = 207²/4 + 226²/4 + 242²/4 + 250²/4 − 53476·5625 = 270·6875.

(4) Total sum of squared deviations. From (2.6.5) (or (11.4.3)),

    SSD = 41² + 48² + … + 53² + 52² − 53476·5625 = 1492·4375.

(5) Error sum of squares found by difference.

    SSD = 1492·4375 − (1188·6875 + 270·6875) = 33·0625.

There are 3 d.f. for blocks and treatments (because there are 4 blocks
and 4 treatments) and the total number of d.f. is N−1 = 15, so, by
difference, there are 15−(3+3) = 9 d.f. for error. These results are
assembled in Table 11.6.3. Comparison of each mean square with the
error mean square gives variance ratios (both with f_1 = 3 and f_2
= 9 d.f.) which, according to tables of the F distribution (see § 11.3),
would be very rare if the null hypothesis were true. It is concluded that
there is evidence for real differences between different antibody
preparations, and between different animals, for the same reasons as in
the previous example.

TABLE 11.6.3

    Source of variation        d.f.   Sum of squares   Mean square      F        P
    Between antibody             3      1188·6875        396·229     107·86   <0·001
      preps. (treatments)
    Between guinea pigs          3       270·6875         90·229      24·56   <0·001
      (blocks)
    Error                        9        33·0625          3·674

    Total                       15      1492·4375
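The arithmetic of Table 11.6.3 can be reproduced from the observations. A sketch of the randomized-block decomposition, using the figures of Table 11.6.2:

```python
# Weal diameters from Table 11.6.2: rows = guinea pigs (blocks),
# columns = antibody preparations A, B, C, D (treatments).
data = [[41, 61, 62, 43],
        [48, 68, 62, 48],
        [53, 70, 66, 53],
        [56, 72, 70, 52]]

n = len(data)          # blocks
k = len(data[0])       # treatments
N = n * k
G = sum(sum(row) for row in data)
cf = G ** 2 / N        # correction factor

ss_total = sum(v * v for row in data for v in row) - cf
ss_blocks = sum(sum(row) ** 2 / k for row in data) - cf
col_tot = [sum(row[j] for row in data) for j in range(k)]
ss_treat = sum(T ** 2 / n for T in col_tot) - cf
ss_error = ss_total - ss_treat - ss_blocks

df_error = (N - 1) - (n - 1) - (k - 1)            # 9 d.f.
F_treat = (ss_treat / (k - 1)) / (ss_error / df_error)
```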

Multiple comparisons. As in §§ 11.4 and 11.5, if it is wished to go
further and decide which antibody preparations differ from which
others the method described in § 11.9 must be used. It is not correct
to do paired t tests on all possible pairs of treatments.

11.7. Nonparametric analysis of variance for randomized blocks.
The Friedman method

Just as in §§ 9.2, 10.3, and 11.5 the best method of analysis is to
apply the randomization method to the original observations. The
principles of the method have been discussed in §§ 6.3, 8.2, 9.2, 9.3, 10.3,
10.4, and 11.5. Reasons for preferring this sort of test are discussed in
§ 6.2. As before, the drawback of the method is that tables cannot be
prepared so the calculations will be tedious without a computer, though
they are very simple; and as before, this disadvantage can be overcome
by using ranks in place of the original observations (the Friedman
method).

The randomization method

The argument, simply an extension to more than two samples of
that in § 10.3, is again that if the treatments were all equi-effective
each subject would have given the same measurement whichever
treatment had been administered (see p. 117 for details), so the observed
differences between treatments would be merely a result of the way the
random numbers came up when allocating the k treatments to the k
units in each block (see § 11.6). There are k! possible ways (permuta-
tions) of administering the treatments in each block, so if there are n
blocks there are (k!)ⁿ ways in which the randomization could come
out (an extension to k treatments of the 2ⁿ ways found in § 10.3).
If the randomization was done properly these would all be equi-
probable, and if the F ratio for 'between treatments' is calculated for all
of these ways (cf. § 11.5), the proportion of cases in which F is equal to
or larger than the observed value is the required P, as in § 10.3. As in
§ 11.5 it will give the same result if the sum of squared treatment
totals, rather than F, is calculated for each arrangement.
As in previous cases an approximation to this result could be obtained
by writing down the observations on cards. The cards for each block
would be separately shuffled and dealt. The first card in each block
would be labelled treatment A, the second treatment B, and so on.
If this process were repeated many times, an estimate of the proportion
(P) of cases giving 'between treatments' F ratios as large as, or larger
than, the observed ratio could be found. If this proportion was small
it would indicate that it was improbable that a random allocation
would give rise to the observed result, if the observation was not
dependent on the treatment given. In other words, an allocation that
happened to put into the same treatment group subjects that were, despite
any treatment, going to give a large observation, would be unlikely to
turn up.
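The card-shuffling scheme just described, shuffling within each block independently, can be sketched as follows (the function name is invented for illustration; the method is a Monte Carlo sample of the (k!)ⁿ outcomes, not an exhaustive enumeration):

```python
import random
from math import factorial

def friedman_randomization_p(blocks, n_shuffles=2000, seed=1):
    """Estimate P as the proportion of within-block re-allocations whose
    sum of squared treatment totals is >= the observed one."""
    rng = random.Random(seed)
    k = len(blocks[0])

    def stat(rows):
        return sum(sum(r[j] for r in rows) ** 2 for j in range(k))

    observed = stat(blocks)
    work = [list(r) for r in blocks]
    hits = 0
    for _ in range(n_shuffles):
        for r in work:
            rng.shuffle(r)       # permute treatments within each block
        if stat(work) >= observed - 1e-12:
            hits += 1
    return hits / n_shuffles

# Number of equiprobable outcomes for k = 4 treatments in n = 4 blocks
n_outcomes = factorial(4) ** 4
```

With k = 4 treatments and n = 4 blocks there are 24⁴ = 331776 equiprobable outcomes, few enough for a computer to sample thoroughly.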

The analysis of randomized blocks by ranks. The Friedman method

If the observations are replaced by ranks, tables can be constructed
to make the randomization test very simple.
The null hypothesis is that the observations in each block are all
from the same population, and if this is rejected it will be supposed that
the observations in any given block are not all from the same popula-
tion, because the treatments differ in their effects. If it is wished to
conclude that the median effects of the treatments differ it must be
assumed that the underlying distribution (before ranking) of the
observations is the same for observations in any given block, though the
form of the distribution need not be known, and it need not be the
same for different blocks.
As in the case of the sign test for two treatments (§ 10.2) the observa-
tions within each block (pair, in the case of the sign test) are ranked.
If the observations in each block are themselves not proper measure-
ments but ranks, or arbitrary scores which must be reduced to ranks,
the Friedman method, like the sign test, is still applicable. In fact the
Friedman method, in the special case of k = 2 treatments, becomes
identical with the sign test. Compare this with the Wilcoxon signed-
ranks test (§ 10.4) in which proper numerical measurements were
necessary because differences had to be formed between members of a
pair before the differences could be ranked.
Suppose, as in § 11.6, that k treatments are compared, in random
order, in n blocks. The method is to rank the k observations in each
block from 1 to k, if the observations are not already ranks. The rank
totals, R_j (see §§ 2.1, 11.4, and 11.5 for notation), are then found for
each treatment. If there were no difference between treatments these
totals would be approximately the same for all treatments. The sum
of the ranks (integers 1 to k) in each block should be k(k+1)/2, by
(9.3.1), and because there are n blocks the sum of the rank sums should
be

    Σ_{j=1}^{k} R_j = nk(k+1)/2.    (11.7.1)

As a measure of the discrepancy between rank sums now simply
calculate S, the sum of squared deviations of the rank sums for each
treatment from their mean (cf. (11.4.1)). From (2.6.5), this is

    S = Σ(R_j − R̄)² = Σ_{j=1}^{k} R_j² − (ΣR_j)²/k.    (11.7.2)

The exact distribution of this quantity, calculated according to
the randomization method (see sections referred to at the start of this
section), is given in Table A6, for various numbers of treatments and
blocks. For experiments with more treatments or blocks than are
dealt with in Table A6, it is a sufficiently good approximation to
calculate

    χ_r² = 12S/(nk(k+1))    (11.7.3)

and find P from tables of the chi-squared distribution (e.g. Fisher and
Yates (1963, Table IV) or Pearson and Hartley (1966, Table 8)) with
k−1 degrees of freedom.
As an example, consider the results in Table 11.6.2, with k = 4
treatments and n = 4 blocks. If the observations in each block (row)
are ranked in ascending order from 1 to 4, the results are as shown in
Table 11.7.1. Ties are given average ranks as in Table 9.3.1. This is an
approximation but it is not thought to affect the result seriously if the
number of ties is not too large.

Applying the check (11.7.1) shows that ΣR_j should be 4 × 4 × (4+1)/2
= 40, as found in Table 11.7.1. Now calculate, from (11.7.2),

    S = 6² + 15² + 13² + 6² − (40)²/4 = 66·00.

Consulting Table A6 shows that when k = 4 and n = 4, S = 64
corresponds to P = 0·0069. So the observed S = 66 corresponds to
P < 0·0069. This means that if the treatments were equi-effective
(null hypothesis) then in less than 0·69 per cent of experiments in the

TABLE 11.7.1
The observations within each block in Table 11.6.2 reduced to ranks

                      Antibody preparation (treatment)
    Guinea pig
    (block)            A        B        C        D
       1               1        3        4        2
       2               1·5      4        3        1·5
       3               1·5      4        3        1·5
       4               2        4        3        1

    Rank sum (R_j)   R_1 = 6  R_2 = 15  R_3 = 13  R_4 = 6    ΣR_j = 40

long run would a random allocation of treatments to the units of each
block be chosen that gave differences between treatment rank sums
(i.e. a value of S) as large as, or larger than, that observed (S = 66).
The null hypothesis of equi-effectiveness is therefore rejected, though
not with as much confidence as when the same results were analysed
by the Gaussian analysis of variance in § 11.6. In Table 11.6.3, it was
seen that if the assumptions made (see § 11.2) were correct, P < 0·001,
much lower than found by the present method.
If the experiment had been outside the scope of Table A6 then
(11.7.3) would have been used, giving χ_r² = 12 × 66/(4 × 4 × (4+1)) = 9·90.
Consulting tables of the chi-squared distribution (see above) with
k−1 = 3 degrees of freedom shows that a value of 9·837 would be
exceeded in 2 per cent of experiments in the long run so P ≃ 0·02.
Not a very good approximation, in such small samples, to the exact
value of P (just less than 0·0069) found from Table A6.
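The ranking, and the statistics S and χ_r², can be reproduced from the measurements of Table 11.6.2 directly. A sketch:

```python
# Friedman's S (11.7.2) and its chi-squared approximation (11.7.3),
# starting from the measurements of Table 11.6.2.
data = [[41, 61, 62, 43],
        [48, 68, 62, 48],
        [53, 70, 66, 53],
        [56, 72, 70, 52]]

def avg_ranks(row):
    """Rank a block from 1 to k, giving tied values their average rank."""
    srt = sorted(row)
    return [sum(i + 1 for i, v in enumerate(srt) if v == x) / srt.count(x)
            for x in row]

ranked = [avg_ranks(row) for row in data]
n, k = len(ranked), len(ranked[0])
R = [sum(row[j] for row in ranked) for j in range(k)]   # rank sums

S = sum(r * r for r in R) - sum(R) ** 2 / k             # (11.7.2)
chi2_r = 12 * S / (n * k * (k + 1))                     # (11.7.3)
```

This reproduces the rank sums 6, 15, 13, and 6 of Table 11.7.1, with S = 66 and χ_r² = 9·90.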
If it were of interest to find out whether there was a difference
between blocks, exactly the same method would be used (e.g. inter-
change the words block and treatment throughout this section).
Multiple comparisons. As in §§ 11.4-11.6, the conclusion that the
treatments do not all have the same effect says nothing about which
ones differ from which others. It would not be correct to perform sign
tests on all possible pairs of treatment groups, in order to find out, for
example, whether treatment B differs from treatment D. A method of
answering this question is given in § 11.9.
11.8. The Latin square and more complex designs for experiments
There is a vast literature describing ingenious designs for experiments
but the analysis of almost all of these depends on the assumption of a
normal distribution of errors and on elaborations of the models des-
cribed in § 11.2. If the experiments are large there is, in some cases,
some evidence that the methods will not be very sensitive to the
assumptions. As the assumptions are rarely checkable with the amount
of data available it may be as well to treat these more complex designs
with caution (see comments below about use of small Latin squares).
Certainly if they are used the advice of a critical professional statistician
should be sought about the exact nature of the assumptions being
made, and the interpretation of the results in the light of the mathe-
matical model (see § 11.2).
To emphasize the point it should be sufficient to quote Kendall and
Stuart (1966, p. 139): 'The fact that the evidence for the validity of
normal theory tests in randomized Latin squares is flimsy, together
with the even greater paucity of such evidence for most other, more
complicated, experiment designs, leads one to doubt the prevailing
serene assumption that randomization theory will always approximate
normal theory.'

The Latin square


The experiment summarized in Table 11.6.2 was actually arranged
so that each of the four injection sites (e.g. anterior and posterior on
each side) received every treatment once, according to the design
shown in Table 11.8.1(a). The measurements, from Table 11.6.2, are
given in Table 11.8.1(b).
In the randomized block design (§ 11.6) each treatment appeared
once, in random order, in each block (row). In the design shown in
Table 11.8.1, which is called a Latin square, there is the additional
restriction that each treatment appears once in each column so that the
column totals are comparable. The number of columns (injection sites)
as well as the number of blocks (rows, guinea pigs) must be the same
as (or a multiple of) the number of treatments. If a model like (11.2.2),

but with another additive component characteristic of each column
(injection site), is supposed to represent the real observations, then a
sum of squares (see §§ 11.2, 11.3, 11.4, and 11.6) can be found from the
observed scatter of the column totals (the corresponding mean square
would, as usual, estimate σ² if the null hypothesis were true), and used

TABLE 11.8.1
The Latin square design

(a)                              (b)
      Column (injection site)          Injection site
Row    1   2   3   4             Guinea pig   1    2    3    4   Total

1      A   B   C   D             1           44   61   62   40    207
2      C   A   D   B             2           62   48   48   68    226
3      D   C   B   A             3           53   63   73   53    242
4      B   D   A   C             4           69   55   53   73    250

                                 Total      228  227  236  234    925

in the Gaussian analysis of variance to eliminate errors due to system-
atic differences between columns (injection sites). The sum of squares
is found from the column totals, and the number of observations per total
(4 in this case), using (11.4.5) again.
SSD between injection sites (columns)
= 228²/4 + 227²/4 + 236²/4 + 234²/4 − 53476·5625 = 14·6875    (11.8.1)

TABLE 11.8.2
Analysis of variance for the Latin square

                                   Sums of      Mean
Source of variation         d.f.   squares      square      F        P

Between antibody prepara-
  tions (treatments)          3    1188·6875    396·23    129·4    <0·001
Between guinea pigs (rows)    3     270·6875     90·23     29·5    <0·001
Between sites (columns)       3      14·6875      4·90      1·6    >0·2
Error                         6      18·3750      3·06

Total                        15    1492·4375

with 3 degrees of freedom (because there are 4 columns). The sums of
squares for differences between treatments and between guinea pigs,
and the total sum of squares, are exactly as in § 11.6. When these
results are filled into Table 11.8.2, the error sum of squares and degrees
of freedom can be found by difference, and the rest of the table com-
pleted as in § 11.6. Referring the variance ratio, F(3,6) = 1·6, to
tables (see § 11.3) shows that there is no evidence for a population
difference between injection sites (P > 0·2).
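The arithmetic of eqn (11.8.1), and the site variance ratio in Table 11.8.2, can be reproduced in a few lines. In this Python sketch (an illustration, not part of the original text) the column totals, the grand total, and the error mean square (18·375/6 = 3·0625) are taken as given; everything else is arithmetic.

```python
# Column totals for injection sites 1-4, grand total, and error mean
# square are taken from the text; all else is arithmetic.
column_totals = [228, 227, 236, 234]
grand_total = 925
n_per_total = 4      # observations contributing to each column total
N = 16               # total number of observations

correction = grand_total ** 2 / N                 # 53476.5625
ssd_sites = sum(t ** 2 / n_per_total for t in column_totals) - correction
ms_sites = ssd_sites / 3                          # 3 degrees of freedom
ms_error = 18.375 / 6                             # 3.0625, Table 11.8.2
F_sites = ms_sites / ms_error                     # variance ratio F(3, 6)

print(ssd_sites)           # 14.6875, agreeing with eqn (11.8.1)
print(round(F_sites, 1))   # 1.6, agreeing with Table 11.8.2
```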

Choosing a Latin square at random


As usual it is essential that the treatments be applied randomly,
given the restraints of the design. This means the design, Table 11.8.1(a),
actually used in the experiment had the same chance of being
chosen as each of the 575 other possible 4 × 4 Latin squares. The
selection of a square at random is not as straightforward as it might
appear at first sight and is frequently not done correctly. Fisher and
Yates (1963) give a catalogue of Latin squares (Table 25), and instruc-
tions for choosing a square at random (introduction, p. 24).
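As an illustration of the sort of randomization involved (a sketch only: the full Fisher and Yates procedure also requires a random choice among the catalogued standard squares, so shuffling one square as below does not by itself give all 576 squares an equal chance), the rows, columns, and letters of a 4 × 4 square can be permuted in Python:

```python
import random

random.seed(1)   # reproducible illustration only

# A hypothetical 4 x 4 seed square (not from the Fisher-Yates catalogue).
square = [list(row) for row in ("ABCD", "BADC", "CDAB", "DCBA")]

random.shuffle(square)                                  # permute rows
cols = list(range(4))
random.shuffle(cols)                                    # permute columns
square = [[row[c] for c in cols] for row in square]
relabel = dict(zip("ABCD", random.sample("ABCD", 4)))   # permute letters
square = [[relabel[cell] for cell in row] for row in square]

# The Latin property survives all three permutations:
for line in square + [list(col) for col in zip(*square)]:
    assert sorted(line) == ["A", "B", "C", "D"]
```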

Are Latin squares reliable?


The answer is that if the assumptions of the mathematical model
are true, then they are an excellent way of eliminating experimental
errors of two sorts (e.g. guinea pigs and injection sites) from the
comparisons between treatments which are of primary interest. How-
ever, as usual, it is very rare that there is any information about
whether the model is correct or not. In the case of the t test the Gaussian
approach could be justified because it has been shown to be a good
approximation to the randomization method (§ 9.2) if the samples are
large enough. However, there is much less information on the sensitivity
of Latin squares (and more complex designs) to departures from the
assumptions. In the case of the 4 × 4 Latin square the randomization
method does not, in general, give results in agreement with the Gaussian
analysis so one is totally reliant on the assumptions of the latter being
sufficiently nearly true. It is thus doubtful whether Latin squares as
small as 4 × 4 should be used in most circumstances, though the larger
squares are thought to be safer (see Kempthorne 1952).

Incomplete block designs


In § 11.6, the randomized block method was described for eliminating
errors due to differences between blocks, from comparisons between

treatments. Sometimes it may not be possible to test every treatment
on every block as when, for example, four treatments are to be com-
pared, but each patient (= block) is only available for long enough
to receive two. It is sometimes still possible to eliminate differences
between blocks even when each block does not contain every treatment.
Catalogues of designs are given by Fisher and Yates (1963, pp. 25,
91-3) and Cochran and Cox (1957).
A nonparametric analysis of incomplete block experiments has been
given by Durbin (1951).
Examples of the use of balanced incomplete block designs for bio-
logical assays (see § 13.1) have been given by, for example, Bliss
(1947) and Finney (1964). General formulas for the simplest analysis
of biological assays based on balanced incomplete blocks are given by
Colquhoun (1963).
11.9. Data snooping. The problem of multiple comparisons
In all forms of analysis of variance discussed, it has been seen that
all that can be inferred is whether or not it is plausible that all of the
k treatments (or blocks, etc.) are really identical. If there are more
than two treatments the question of which ones differ from which
others is not answered. The obvious answer is never to bother with the
analysis of variance but to test all possible pairs of treatments by the
two sample methods of Chapters 9 and 10. However, it must be re-
membered that it is expected that the null hypothesis will sometimes
be rejected even when it is true (see § 6.1), so if a large number of
tests are done some will give the wrong answer. In particular, if several
treatments are tested and the results inspected for possible differences
between means, and the likely looking pairs tested ('data selection',
or as statisticians often call it 'data snooping'), the P value obtained
will be quite wrong.
This is made obvious by considering an extreme example. Imagine
that sets of, say, 100 samples are drawn repeatedly from the same
population (i.e. null hypothesis true), and each time the sample out of
the set of 100 with largest mean is tested, using a two-sample test,
against the sample with the smallest mean. With 100 samples the
largest mean is likely to be so different from the smallest that the
null hypothesis (that they come from the same population) would be
rejected (wrongly) almost every time the experiment was repeated,
not only in 1 or 5 per cent (according to what value of P is chosen as
low enough to reject the null hypothesis) of repeated experiments as it

should be (see § 6.1). If the particular treatments to be compared are
not chosen before the results are seen, allowance must be made for data
snooping. There are various approaches.
One way is to compare all possible pairs of treatments. This is
probably the most generally useful, and methods of doing it for both
nonparametric and Gaussian analysis of variance are described below.
Another case arises when one of the treatments is a control and it is
required to test the difference between each of the other treatment
means and the control mean. In the Gaussian analysis of variance
this is done by finding confidence intervals for the jth difference
as difference ± ds√(1/nc+1/nj) where nc is the number of control
observations, nj the number of observations on the jth treatment, s is
the square root of the error mean square from the analysis of variance,
and d is a quantity (analogous to Student's t) tabulated by Dunnett
(1964). Tables for doing the same sort of thing in the nonparametric
analyses of variance are given by Wilcoxon and Wilcox (1964).
A third possibility is to ask whether the largest of the treatment
means differs from the others. Nonparametric tables are given by
McDonald and Thompson (1967).

The critical range method for testing all possible pairs in the Kruskal-
Wallis nonparametric one way analysis of variance (§ 11.5)
Using this method, which is due to Wilcoxon, all possible pairs of
treatments can be compared validly using Table A7, though the table
only deals with equal sample sizes. The procedure is very simple.
Just calculate the difference between the rank sums for any pair of
groups that is of interest. If this difference is equal to (or larger than)
the critical range given in Table A7, the P value is equal to (or less
than) the value given in the table. For small samples exact probabilities
are given in the table (they cannot be made exactly the same as the
approximate P values at the head of the column because of the dis-
continuous nature of the problem, as in § 7.3 for example). For larger
samples use the approximate P value at the head of the column.
The first example of the Kruskal-Wallis analysis given in § 11.5
cannot be used to illustrate the method because it has unequal groups.
The second example in § 11.5, based on the (parenthesized) ranks in
Table 11.4.1, will be used. In this example there were k = 4 treatments
and n = 7 replicates, and evidence was found in § 11.5 that the treat-
ments were not equi-effective. Consulting Table A7 shows that a
difference between two rank sums (selected from four) of 79·1 or larger

would occur in about 5 per cent of random allocations of the 28 subjects
to 4 groups (i.e. in about 5 per cent of repeated experiments if the
null hypothesis were true), that is to say P ≃ 0·05 for a difference of
79·1. Similarly P ≃ 0·01 for a difference of 95·3.
The simplest way of writing down the differences between all six
possible pairs of rank sums is to construct a table of differences, with
the rank sums from § 11.5 (or Table 11.4.1), as in Table 11.9.1. The
treatments have been arranged in ascending order so the largest
differences occur together in the bottom left-hand corner of the table.

TABLE 11.9.1

Treatment      4      1      2      3
rank sum      61    71·5   121   152·5

4     61
1     71·5   10·5
2     121    60·0   49·5
3     152·5  91·5*  81·0*  31·5

The differences marked with an asterisk in Table 11.9.1 are larger than
79·1 but less than 95·3. So P is somewhere between 0·01 and 0·05 for
these differences suggesting (see § 6.1) that there is a real difference
between treatments 3 and 1, and between treatments 3 and 4. All
other differences are less than 79·1 so there is little evidence (P > 0·05)
for any other treatment differences.
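The comparisons in Table 11.9.1 amount to a handful of subtractions; a Python sketch, taking the rank sums from § 11.5 and the two critical ranges (79·1 and 95·3) from Table A7 as given:

```python
from itertools import combinations

# Rank sums for the k = 4 treatments (n = 7 replicates), from § 11.5.
rank_sums = {"4": 61.0, "1": 71.5, "2": 121.0, "3": 152.5}
crit_05, crit_01 = 79.1, 95.3     # critical ranges from Table A7

verdicts = {}
for a, b in combinations(rank_sums, 2):
    diff = abs(rank_sums[a] - rank_sums[b])
    if diff >= crit_01:
        verdicts[(a, b)] = "P <= 0.01"
    elif diff >= crit_05:
        verdicts[(a, b)] = "0.01 < P <= 0.05"
    else:
        verdicts[(a, b)] = "P > 0.05"

significant = sorted(pair for pair, v in verdicts.items()
                     if v != "P > 0.05")
print(significant)   # [('1', '3'), ('4', '3')]
```

Only the pairs (3, 1) and (3, 4) exceed the 5 per cent critical range, and neither reaches the 1 per cent range, agreeing with the conclusion above.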

The critical range method for testing all possible pairs in the Friedman
nonparametric two way analysis of variance (§ 11.7)
This method, also due to Wilcoxon, allows valid comparison of any
pair of treatments in the Friedman method (§ 11.7), using Table A8
in much the same way as just described for the one way analysis.
The results in Table 11.7.1 will be used to illustrate the method.
There are k = 4 treatments and n = 4 blocks (replicates), so reference
to Table A8 shows that a difference (between any two treatment rank
sums selected from the four) as large as, or larger, than 11 would be
expected in only 0·5 per cent of repeated experiments if the null
hypothesis (see § 11.7) were true, i.e. if ranks were allocated randomly
within blocks. Similarly a difference of 10 would correspond to P
= 0·026.
A table of all possible pair differences between the rank sums from
Table 11.7.1 can be constructed as above in Table 11.9.2.

All the six differences are less than 10, i.e. none reaches the P = 0·05
level of significance. Despite the evidence (in § 11.7) that the four
treatments are not equi-effective, it is not, in this case, possible to
detect with any certainty which treatments differ from which others.
This is not so surprising looking at the ranks in Table 11.7.1, but
looking at the original figures in Table 11.6.2 suggests strongly that

TABLE 11.9.2

Treatment      A      D      C      B
rank sum       6      6     14     14

A      6
D      6      0
C     14      8      8
B     14      8      8      0

treatments B and C give larger responses than A and D. In fact, if the
assumptions of Gaussian methods (see § 11.2) are thought justifiable,
the Scheffé method could be used, and it is shown below that it gives
just this result. The reason for the apparent lack of sensitivity of the
rank method with the small samples is similar to that discussed in
§ 10.5 for the two-sample case.
Scheffé's method for multiple comparisons in the Gaussian analysis of
variance
The Gaussian analogue of the critical range methods just described
is the method of Tukey (see, for example, Mood and Graybill (1963,
pp. 267-71)), but Scheffé's method is more general.
Suppose there are k treatments. Define, in general, a contrast (see
examples below, and also § 13.9) between the k means as

L = Σajȳj    (11.9.1)

where Σaj = 0. The values of aj are constants, some of which may be
zero. When the ȳj are the means of independent samples the estimated
variance of this contrast follows from (2.7.10) and is

var(L) = s²Σ(aj²/nj)    (11.9.2)

where s² is the variance of y (the error mean square from the analysis
of variance) and nj is the number of observations in the jth treatment

mean, ȳj. The method is to construct confidence limits (see Chapter 7)
for the population (true) mean value of L as

L ± S√[var(L)]    (11.9.3)

where S = √[(k−1)F], and F is the value of the variance ratio (see
§ 11.3) for the required probability. For the numerator F has (k−1)
degrees of freedom, and for the denominator the number of degrees of
freedom associated with s². If the confidence limits include any hypo-
thetical value of L the observations cannot be considered incompatible
with this value, as explained in § 9.4.

Three numerical examples


Example 1. Suppose that it were decided to test whether the largest
mean in Table 11.4.1 (ȳ3 = 33·57) really differs from the smallest
(ȳ4 = 14·29), this pair being chosen after the results were known. If
in (11.9.1) we take a1 = 0, a2 = 0, a3 = +1 and a4 = −1 then L
= ȳ3−ȳ4 = 19·28, the difference between means. From Table 11.4.3
it is seen that s² = 101·15 with 24 degrees of freedom. Thus, by (11.9.2),
var(L) = 101·15(0²/7+0²/7+1²/7+(−1)²/7) = 28·90. There are k = 4
treatments so to find the 99 per cent confidence limits, the variance
ratio for P = 0·01 (= 1−0·99) with 3 and 24 degrees of freedom is
required. From the tables (see § 11.3) this is found to be 4·72. Thus
S = √[(4−1)×4·72] = 3·763. The P = 0·99 confidence limits are
19·28 ± 3·763√(28·90) = 19·28 ± 20·23, i.e. −0·95 to +39·51. The
limits include 0, so the difference between the two means cannot be
considered to differ from 0 at the P = 0·99 level of confidence. In
other words a significance test (see § 6.1) for the difference between
the largest and smallest means would give P > 0·01 (compare § 9.4).
And, because S√[var(L)] is the same for any pair of means, the same
can be said of any pair of means differing by less than 20·23 (see
Example (3) also).
Now try the 97·5 per cent limits. From the Biometrika tables (see
§ 11.3) the value of F(3, 24) for P = 0·025 is 3·72, so S√[var(L)]
= √(3 × 3·72 × 28·90) = 17·96. This is less than the observed difference,
19·28, so the result of a significance test would be that P is between
0·025 and 0·01, suggesting, though not with great confidence, a real
difference.
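The arithmetic of Example 1 can be checked in a few lines; this Python sketch takes the two means, the error mean square, and the tabulated F value as given, and reproduces the 99 per cent Scheffé limits.

```python
from math import sqrt

k = 4          # number of treatments
n = 7          # observations per treatment mean
s2 = 101.15    # error mean square (24 d.f.), Table 11.4.3
F_99 = 4.72    # F(3, 24) for P = 0.01, from tables

L = 33.57 - 14.29                  # contrast: largest minus smallest mean
var_L = s2 * (1 / n + 1 / n)       # eqn (11.9.2) with a3 = +1, a4 = -1
S = sqrt((k - 1) * F_99)           # eqn (11.9.3); S = 3.763
half_width = S * sqrt(var_L)       # 20.23
limits = (L - half_width, L + half_width)

print(round(half_width, 2))              # 20.23
print([round(v, 2) for v in limits])     # [-0.95, 39.51]
```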
Example 2. As another example, suppose that it were wished to test
the null hypothesis that the mean of the two more effective treatments
(2 and 3) is equal to the mean of the other two treatments in Table
11.4.1. To do this, take a1 = −1, a2 = +1, a3 = +1 and a4 = −1 so
L = (ȳ2+ȳ3)−(ȳ1+ȳ4). The true (population) value of this will be
zero if the hypothesis to be tested is true. The sample value is L
= −15·57+26·86+33·57−14·29 = 30·57. From (11.9.2) var(L) =
101·15((−1)²/7+1²/7+1²/7+(−1)²/7) = 57·80. S = 3·763 exactly as
above, so the 99 per cent (P = 0·99) confidence limits for the population
value of L are 30·57 ± 3·763√(57·80) = 30·57 ± 28·61, i.e. +1·96
to +59·18. The limits do not include zero so the null hypothesis that
the true (population) value of L is zero would be rejected if P < 0·01
were considered sufficiently small (see § 6.1). The same could be said
of any difference (between the sum of any two means and the sum of
the other two), that exceeded 28·61.
Example 3. The method can be used, at least as an approximation,
for randomized block experiments also. For the results in Table 11.6.2,
s² = 3·674 with 9 d.f. (from Table 11.6.3). To test ȳ2 against ȳ1 take
a1 = −1, a2 = +1, a3 = 0, a4 = 0, as in Example (1). There are
n = 4 replicates, so var(L) = 3·674((−1)²/4+1²/4+0²/4+0²/4) = 1·837,
from (11.9.2). And this value will be the same for the difference
between any two means. There are k = 4 treatments so values of
F(3,9) are required. From the Biometrika tables (see § 11.3) the
P = 0·25 value is 1·63 and the P = 0·001 value is 13·90. Thus, S =
√(3×1·63) = 2·211 and S√[var(L)] = 2·211√(1·837) = 2·996 for
P = 0·25. And for P = 0·001, S = √(3×13·90) = 6·457, so S√[var
(L)] = 8·749. The differences between the six possible pairs of means
from Table 11.6.2 are, tabulating as above, shown in Table 11.9.3.

TABLE 11.9.3

Treatment     D      A      C      B
Mean        49·0   49·5   65·0   67·7

D   49·0
A   49·5    0·5
C   65·0   16·0*  15·5*
B   67·7   18·7*  18·2*   2·7

The four differences marked by an asterisk in Table 11.9.3 are greater
than 8·749 so the null hypothesis that the true values of these differ-
ences are zero can be 'rejected' at P < 0·001 (see § 6.1). The other
differences are less than 2·996 so there is no evidence (P > 0·25) that
these differences are due to anything but experimental error. It is
concluded that treatments B and C both give larger responses than

treatments A and D, but that no difference can be detected between
A and D, or between B and C. Compare this result with the rank analysis
of the same observations (§§ 11.7 and 11.9, above). Remember that a
normal (Gaussian) distribution has been assumed throughout these
calculations, despite the fact that no evidence was presented that this
assumption was justified.

12. Fitting curves. The relationship
between two variables

12.1. The nature of the problem
In previous examples, measurements of only one variable
have been involved (e.g. blood sugar level or change in duration of
sleep). However, experiments are often concerned with the relationship
between two (or more) variables; for example, dose of drug and response,
concentration and optical density, time and extent of chemical reaction,
or school and university examination results. The last of these examples
is rather different from the others, and suggests the two sorts of problem
that occur.
(a) One variable can be measured accurately and its value is selected by
the experimenter, for example when a measurement is made at each of a
series of doses of a drug. This variable is called the independent
variable (notice that independent in this context has a different meaning
from that encountered in §§ 2.4 and 2.7). The other variable, called the
dependent variable, is subject to experimental error, and its value
depends on the value chosen for the independent variable. For example,
a response (dependent variable) which is related to the dose (inde-
pendent variable) of a drug can only be measured with some experi-
mental error.
(b) Neither variable can be fixed by the experimenter, and both are
subject to random variation or measurement error. For example, abilities before
and after university (as measured by school and university exam
results) are both measured inaccurately.
In both of these cases the first thing usually done is to plot the
results and draw some sort of line through them.
Case (a) is described (for historical reasons, now irrelevant) as a
regression problem. The line through the points is called the regression
line, and the formula describing it is called the regression equation. This
problem is dealt with in §§ 12.2-12.8.
The second type of problem, (b), is a correlation problem. A plot
of the results is a scatter diagram (see §
12.9).

The expression 'fitting a curve to the observed points' means the
process of finding estimates of the parameters of the fitted equation
that result in a calculated curve which fits the observations 'best' (in a
sense to be specified below and in § 12.8). For example if a straight line
is to be fitted the 'best' estimates of its arbitrary parameters (the slope
and intercept) are wanted. The method of fitting the straight line is
discussed in detail in §§ 12.2 to 12.6 because it is the simplest problem.
But very often, especially if one has an idea of the physical mechanism
underlying the observations, the observations will not be represented
by a straight line and a more complex sort of curve must be fitted.
This situation is discussed in §§ 12.7 and 12.8. Often some way of
transforming the observations to a straight line is adopted, but this
may have considerable hazards as explained in §§ 12.2 and 12.8.
It is, however, usually (see § 13.14) not justified to fit anything but a
straight line if the deviations from the fitted line are no greater than
could reasonably be expected by chance (i.e. than could be expected if
the true line were straight). In general it is usually reasonable to use
the simplest relationship consistent with the observations. By simplest
is meant the equation containing the smallest number of arbitrary
parameters (e.g. slope), the values of which have to be estimated from
the observations. This is an application of 'Occam's razor' (one version
of which states 'It is vain to do with more what can be done with fewer' :
William of Occam, early fourteenth century). The reason for doing
this is not that the simplest relationship is likely to be the true one,
but rather because the simplest relationship is the easiest to refute
should it be wrong. (The opposite would, of course, be true if the
parameters were not arbitrary, and estimated from the observations,
but were specified numerically by the theory.)

The role of statistical methods


Statistical methods are useful
(1) for finding the best estimates of the parameters (see §§ 12.2, 12.7,
and 12.8) of the chosen regression equation, and confidence limits
(Chapter 7 and §§ 12.4-12.6) for these estimates,
(2) to test whether the deviations of the observed points from the
calculated points (the latter being obtained using the best estimates
of the parameters) are greater than could reasonably be expected by
chance, i.e. to test whether the type of curve chosen fits the observations
adequately. It is important to remember (see § 6.1) that if observations
do not deviate 'significantly' from, say, a straight line, this does not

mean that the true relationship can be inferred to be straight (see
§ 13.14 for an example of practical importance).
The best fitting curve (see § 12.8) is usually found using the method of
least squares. This means that the curve is chosen that minimizes the
'badness of fit' as measured by the sum of the squares of the deviations
of the observations (y) from the calculated values (Y) on the fitted
curve. In other words, the values of the parameters in the regression
equation must be adjusted so as to minimize this sum of squares. In
the case of the straight line and some simple curves such as the parabola
the best estimates of the parameters can be calculated directly (see
§§ 12.2 and 12.7). The principle of least squares can be applied to
any sort of curve but for non-linear problems (see § 12.8; but note that
fitting some sorts of curve is a linear problem in the statistical sense,
as explained in § 12.7) it may not have the optimum properties that it
can be shown to have for linear problems (those of providing unbiased
estimates of the parameters with minimum variance, see § 12.8, and
Kendall and Stuart (1961, p. 75)). For linear problems these optimum
properties are, surprisingly, not dependent on any assumption about
the distribution of the observations, but the construction of confidence
limits and all the analyses of variance depend on the assumption that
the errors of the observations follow the Gaussian (normal) distribution,
so all regression methods must (unfortunately) be classed as parametric
methods (see § 6.2). Tests for normality are discussed in § 4.6.

12.2. The straight line. Estimates of the parameters


It is assumed throughout this discussion of linear regression that the
independent variable, x (e.g. time, concentration of drug, see § 12.1)
can be measured reproducibly and its value fixed by the experimenter.
The experimental errors are assumed to be in the observations on the
dependent variable, y (e.g. response, see § 12.1). Suppose that several
(k, say) values of the independent variable, x1, x2, . . ., xk, are chosen
and that for each, observations are made on the dependent variable,
y1, y2, . . ., yN (there being N observations altogether; N may be bigger
than k if several observations are made at each value of x, as in § 12.6).
In order to find the 'best' straight line by the method of least squares
(see §§ 12.1, 12.7, and 12.8) it is necessary to find the line that will
minimize the badness of fit as measured by the sum of squares of
deviations of the dependent variable from the line, as shown in Fig.
12.2.1.

Thus the best fitting straight line is the one that minimizes the sum of
squared deviations

S = Σdj² = Σ(yj−Yj)²    (12.2.1)

(the sums here and below running over j = 1, 2, . . ., N), where yj is
the observed, and Yj the calculated, value of the dependent
variable corresponding to xj. The resulting line is called the regression
line of y on x. If the deviations of points from the fitted line were not
measured vertically as in (12.2.1), but, say, horizontally, the least
squares line would be different from that found in the way just described
Fig. 12.2.1. The dependent variable y, plotted against the independent
variable x. Definition of yj, xj, Yj, and dj for discussion of curve fitting.

(it would be called the regression line of x on y), but this would not be
the correct approach when the experimental errors are supposed to
affeot y only.
The general equation for a straight line can be written Y = a′+bx
where a′ is the intercept (i.e. the value of Y when x = 0). It will be
more convenient (for reasons explained in § 12.7) to write this in a
slightly different form, viz.

Y = a+b(x−x̄)    (12.2.2)

where b is the slope, and a is the value of Y when x = x̄ (so that
a−bx̄, which is a constant, is the same as a′). The left-hand side is

written as capital Y to emphasize that the evaluation of the equation
gives the calculated value of the dependent variable, which will in
general, differ from the observations (y) at the same value of x, unless
the observation happens to lie exactly on the calculated line.
The true (population) regression equation, assuming the line to be
really straight, can be written, for any specified value of x,

μ = population value of Y = α+β(x−x̄)    (12.2.3)

where α and β are the true parameters, of which the statistics a and b
are estimates made from a sample of observations. Because the in-
dependent variable, x, is assumed to be measured with negligible error
there is no distinction between the observed and true values of x.
The problem is now to find the least squares estimates of α and β,
from the observations. This will be done algebraically for the moment.
In § 12.7 the geometrical meaning of the algebra is explained. First
substitute the calculated value of Y at the jth value of x, which, from
(12.2.2), is Yj = a+b(xj−x̄), into (12.2.1) giving

S = Σ[yj−a−b(xj−x̄)]².    (12.2.4)

Squaring the term in brackets gives

S = Σ[yj²+a²+b²(xj−x̄)²−2ayj−2yjb(xj−x̄)+2ab(xj−x̄)]

and therefore, using (2.1.5), (2.1.6), and (2.1.8),

S = Σyj²+Na²+b²Σ(xj−x̄)²−2aΣyj−2bΣyj(xj−x̄)+2abΣ(xj−x̄).    (12.2.5)

The last term in this equation is zero because, by (2.6.1), Σ(x−x̄) = 0.
The object is to find, for the particular values of x used and the particular
values of y observed, the values of a and b which make S as small as
possible. For a particular set of results we are, for the moment, regarding
x and y values as fixed and a and b as variables. The usual procedure in
calculus for finding a minimum is to differentiate† and equate the
result to zero as illustrated (for a) in Fig. 12.2.2 (see Thompson (1965,
p. 78 et seq.)). A fuller explanation of this process is given in § 12.7.
† Because there are two variables, a and b, partial differential coefficients (with
curly ∂) are used. This makes the differentiation of (12.2.5) even simpler because it
means that when differentiating with respect to a, b is treated as a constant (and vice
versa). See § 12.7.

It is shown below (see (12.2.10)) how the least squares estimates can be
derived without using calculus at all. Thus, to find the least squares
value of a, differentiate (12.2.5) treating b as a constant

∂S/∂a = 2Na−2Σyj = 0

therefore Na = Σyj

so a = Σy/N = ȳ.    (12.2.6)

Fig. 12.2.2. The sum of squared deviations (S) plotted against various
values of a using eqn (12.2.5). The data (x and y values) are those in Table 12.7.1
and b was held constant at 3·00 (cf. Fig. 12.7.8). The slope of the curve, ∂S/∂a,
is zero at the minimum, which occurs at a = 8·00. The graph is discussed in
detail in § 12.7.

Similarly, to find the least squares estimate of b, differentiate (12.2.5)
with respect to b, treating a as a constant,

∂S/∂b = 2bΣ(xj−x̄)²−2Σyj(xj−x̄) = 0

therefore 2bΣ(xj−x̄)² = 2Σyj(xj−x̄)

so b = Σyj(xj−x̄)/Σ(xj−x̄)²    (12.2.7)

or b = Σ(yj−ȳ)(xj−x̄)/Σ(xj−x̄)².    (12.2.8)
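Eqns (12.2.6) and (12.2.7) translate directly into a few lines of code; a minimal Python sketch with an invented data set (not from the text):

```python
def fit_line(x, y):
    """Least squares estimates for the line Y = a + b(x - xbar):
    a from eqn (12.2.6), b from eqn (12.2.7)."""
    N = len(x)
    xbar = sum(x) / N
    a = sum(y) / N                      # a = ybar, eqn (12.2.6)
    b = (sum(yj * (xj - xbar) for xj, yj in zip(x, y))
         / sum((xj - xbar) ** 2 for xj in x))   # eqn (12.2.7)
    return a, b, xbar

# Invented example: y = 1 + 2x exactly, so b = 2 and a = ybar = 6.
a, b, xbar = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)   # 6.0 2.0
```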

Although it is not immediately obvious, the numerators of these two
expressions for b are identical, as shown by (12.2.9) below.
Using (2.6.2) and (2.6.6) shows that the estimated slope, (12.2.8), can
be written b = cov(x, y)/var(x). It was shown in § 2.6 that Σ(y−ȳ)(x−x̄),
and hence the slope, measures the extent to which y tends to
increase (or decrease) when x is increased. Furthermore,

Σ(yj−ȳ)(xj−x̄) = Σ[yjxj−ȳxj−yjx̄+ȳx̄]
             = Σ[yj(xj−x̄)−ȳ(xj−x̄)]
             = Σyj(xj−x̄)−ȳΣ(xj−x̄)
             = Σyj(xj−x̄).    (12.2.9)

The last term in the equation is, by (2.6.1), zero.

The argument is exactly analogous to that already used for the arithmetic
mean in § 2.6 (p. 27). The sum of squares to be minimized, (12.2.4), can be
written

S = Σ[yj−a−b(xj−x̄)]² = Σ[yj−ȳ−b̂(xj−x̄)]²+N(a−ȳ)²+(b−b̂)²Σ(xj−x̄)²
                                                            (12.2.10)

where a and b denote possible estimates of α and β, and b̂ denotes, as in § 12.7,
the least squares estimate of β given by expression (12.2.8). From this it can be
seen that the values that minimize S are a = ȳ and b = b̂, because these choices
reduce the last two terms, which cannot be negative, to their smallest possible
value, zero. It can be checked that (12.2.10) is an algebraic identity by
inserting b̂ from (12.2.8) and expanding
the right side, in the same sort of way as shown in detail for (12.2.9).
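That (12.2.10) holds for any trial values of a and b can be checked numerically; a Python sketch with invented numbers:

```python
# Numerical check of the identity (12.2.10), with invented data and
# arbitrary trial values of a and b.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
a, b = 2.0, 0.7

N = len(x)
xbar = sum(x) / N
ybar = sum(y) / N
# Least squares slope, eqn (12.2.8).
b_hat = (sum(yj * (xj - xbar) for xj, yj in zip(x, y))
         / sum((xj - xbar) ** 2 for xj in x))

# Left side of (12.2.10): S for the trial values of a and b.
lhs = sum((yj - a - b * (xj - xbar)) ** 2 for xj, yj in zip(x, y))
# Right side: residual sum of squares of the fitted line plus the two
# penalty terms, which vanish when a = ybar and b = b_hat.
rhs = (sum((yj - ybar - b_hat * (xj - xbar)) ** 2 for xj, yj in zip(x, y))
       + N * (a - ybar) ** 2
       + (b - b_hat) ** 2 * sum((xj - xbar) ** 2 for xj in x))

assert abs(lhs - rhs) < 1e-9   # the identity holds
```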

Assumptions made in the least squares fitting and analysis of straight
lines
(1) The standard deviation of y was assumed to be a constant. That
is to say that the observations have the same scatter at all points on the
line, so that equal weight can be attached to all of the observations.
Observations fulfilling this condition are described as
homoscedastic. Quite often the condition is not fulfilled, as in
Fig. 12.2.3. For instance, it is quite commonly found in practice that
there is a tendency for the smaller observations to have less scatter,
in a way that the relative scatter (e.g. the coefficient of variation,

gle
§ 12.2 TII~ relatioMkip bdween. two variablu 221
(2.6.4)) is more nearly constant than the absolute scatter (e.g. the standard deviation). If this is the case the observations (which are said to show heteroscedasticity) should not be given equal weight, and this makes the calculations more complicated (cf. Chapter 14).
(2) The population (true) relation between y and x has been assumed
to be a straight line. In § 12.6 it will be shown how it can be judged
whether deviations from linearity can reasonably be ascribed to
experimental error.

[Fig. 12.2.3. (a) A homoscedastic curve-fitting problem (idealized). (b) An example of heteroscedastic observations.]

(3) The independent variable has been assumed to be measured


with negligible error. For a discussion of what to do when it is not see, for example, Brownlee (1966, p. 391).
   (4) The analyses to be described will all assume that the errors in the observations (y, the dependent variable) at each of the selected x
values follow Gaussian (normal) distributions (see §§ 4.2, 4.6 and 12.1).

The use of transformations

This discussion is closely related to that in § 11.2, in which a method of choosing a transformation to equalize variances was described. Transformation (e.g. logarithm, square root, reciprocal) may be used to make results conform with the above assumptions. For example, if the observations are described by an exponential relationship, y = y₀e^(−kx), then taking natural logarithms gives log y = log y₀ − kx. The regression of log y on x should therefore be a straight line with intercept = log y₀ and slope = −k. An example is worked out in § 12.6. Notice, however, that if y were homoscedastic and normally distributed then log y would be neither, so it may not be possible to
satisfy all the assumptions simultaneously (see §§ 11.2, 12.8, and Bartlett (1947)). Tests for normality are discussed in § 4.6.
It is important to distinguish between the effects of transformations of the dependent variable, y, on one hand, and of the independent variable, x, on the other. Transformations of x are often used to make a line straight (e.g. the response, y, is often plotted against the log of the dose, x, in pharmacology). This merely alters the spacing at which points are plotted along the abscissa, but cannot have any effect on the homoscedasticity or distribution of errors of the observations, y. Transformations of y, on the other hand, affect these as well as linearity.
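As a numerical illustration of a transformation of y (an invented example, not from the book): data that follow an exact exponential become a straight line after taking logs, and the least squares slope of logₑy on x recovers −k.

```python
import math

# Invented example (not from the book): y = y0*exp(-k*x) becomes a straight
# line in log_e(y), so the least squares slope of log y on x recovers -k and
# the fitted value at x = 0 recovers log y0.
y0, k = 100.0, 0.05
x = [0.0, 10.0, 20.0, 30.0, 40.0]
y = [y0 * math.exp(-k * xi) for xi in x]

logy = [math.log(yi) for yi in y]
n = len(x)
xbar = sum(x) / n
slope = (sum(ly * (xi - xbar) for xi, ly in zip(x, logy))
         / sum((xi - xbar) ** 2 for xi in x))          # eqn (12.2.7) applied to log y
intercept_at_0 = sum(logy) / n + slope * (0.0 - xbar)  # fitted value at x = 0
```

For this exact (error-free) data the slope is −0·05 = −k and the intercept is logₑ100, as eqn (12.6.5) later predicts; with real, scattered data the same calculation gives estimates rather than exact values.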
12.3. Measurement of the error in linear regression
Consider the straight line fitted to the results in Fig. 12.3.1. As
before, y stands for the observation at a particular value of x, and Y

[Fig. 12.3.1. Definition of terms used in curve fitting. The values of the dependent variable (y) are plotted on the ordinate and the independent variable (x) on the abscissa (see §§ 12.1 and 12.2). The five observed values, y₁ to y₅, have been plotted against the corresponding x values, x₁ to x₅, and a straight line fitted to them. The nature of the terms (y − Y), (Y − ȳ), and (y − ȳ) occurring in eqns. (12.3.2) and (12.3.3) is illustrated for the fourth x value.]

for the predicted value of the dependent variable (i.e. that calculated from the estimated line) at a particular x. The equation for the estimated line, Y = a + b(x − x̄), can be written, using (12.2.6),

    Y = ȳ + b(x − x̄),                                   (12.3.1)

from which it can be seen that the line must go through the point (x̄, ȳ), because Y = ȳ when x = x̄ (i.e. when x − x̄ = 0).
   This section is concerned only with errors in y, because x has been assumed to be measured without error (§§ 12.1, 12.2). The total deviation of the observed point from the mean, in Fig. 12.3.1, can be divided into two parts: (y − Y) = deviation of the observed value from the line, and (Y − ȳ) = deviation of the predicted value on the line from the mean of all observations. This can be written

    (y − ȳ)    =    (y − Y)    +    (Y − ȳ)             (12.3.2)
    total           deviation       part of the total
    deviation       from the        deviation accounted
                    straight        for by the linear
                    line            relation between
                                    y and x

It is now possible to use the analysis of variance approach. The total sum of squared deviations (SSD) of each observation from the grand mean of all observations is Σ(yᵢ−ȳ)², and this total SSD can be divided into two components (compare (12.3.2)):

    Σ(yᵢ−ȳ)² = Σ(yᵢ−Yᵢ)² + Σ(Yᵢ−ȳ)²,                    (12.3.3)

in which the first term on the right-hand side measures the extent to which the observations deviate from the line and is called the SSD for deviations from linearity. It is this that is minimized in finding least squares estimates, see § 12.2. The second term on the right-hand side measures the amount of the total variability of y from ȳ that is accounted for by the linear relation between y and x, and is called the SSD due to linear regression. That (12.3.3) is merely an algebraic identity following from (12.3.2) will now be proved.

Digression to prove (12.3.3), and to obtain a working formula for the sum of squares due to linear regression

(1) To show that (12.3.3) follows from (12.3.2). The summations are, as before, over all N observations of y (there may be more than one y at each x value). From (12.3.2),

    total SSD = Σ(y−ȳ)² = Σ[(y−Y) + (Y−ȳ)]²
              = Σ[(y−Y)² + 2(y−Y)(Y−ȳ) + (Y−ȳ)²]
              = Σ(y−Y)² + 2Σ(y−Y)(Y−ȳ) + Σ(Y−ȳ)²
              = Σ(y−Y)²      +      Σ(Y−ȳ)².    Q.E.D.
                deviations from     due to linear
                linearity           regression

The central term in the penultimate equation is zero because

    2Σ(y−Y)(Y−ȳ) = 2Σ[y−ȳ−b(x−x̄)][ȳ+b(x−x̄)−ȳ]          (from (12.3.1))
                 = 2Σ[yb(x−x̄) − ȳb(x−x̄) − b²(x−x̄)²]
                 = 2bΣy(x−x̄) − 2b²Σ(x−x̄)²               (from (2.6.1))
                 = 2bΣy(x−x̄) − 2bΣy(x−x̄) = 0.   Q.E.D.  (from (12.2.7))
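The identity (12.3.3), and the working formula (12.3.4) derived next, can also be checked numerically. A sketch with invented data (not from the book):

```python
# Invented data: numerical check that the total SSD splits as in (12.3.3),
# and that the working formula (12.3.4) agrees with the direct calculation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y))
b = sxy / sxx
Y = [ybar + b * (xi - xbar) for xi in x]                 # fitted values, (12.3.1)

total_ssd = sum((yi - ybar) ** 2 for yi in y)
deviations_ssd = sum((yi - Yi) ** 2 for yi, Yi in zip(y, Y))
regression_ssd = sum((Yi - ybar) ** 2 for Yi in Y)

assert abs(total_ssd - (deviations_ssd + regression_ssd)) < 1e-9   # (12.3.3)
assert abs(regression_ssd - sxy ** 2 / sxx) < 1e-9                 # (12.3.4)
```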

(2) A working formula for the sum of squares due to linear regression. As usual, it is inconvenient to calculate the individual deviations (Y−ȳ), and a more convenient working formula is used. As before, the summations are over all N observations.

    Σ(Y−ȳ)² = Σ[ȳ+b(x−x̄)−ȳ]²                            (from (12.3.1))
            = Σ[b(x−x̄)]² = b²Σ(x−x̄)²                    (from (2.1.5)).

Substituting (12.2.8) for the slope, b, gives the alternative forms:

    SSD due to linear regression = b²Σ(x−x̄)²  or  [Σ(y−ȳ)(x−x̄)]²/Σ(x−x̄)².   (12.3.4)

12.4. Confidence limits for a fitted line. The important distinction between the variance of y and the variance of Y
It was stated in § 12.2 that the method of fitting a straight line there described involves the assumption that the scatter of the observations does not depend on their size (see Fig. 12.2.3), i.e. that their population (true) variance, σ²[y], is a constant, independent of the value of x (and of y). The estimated value of σ² from a sample is s²[y] or var[y], the error mean square from the analysis of variance table (see §§ 11.4, 12.5, and 12.6). The width of the confidence interval for the population value of an observation is therefore the same (± t s[y], see (7.4.2)) whatever the size of the observation.
In practice a straight line is usually fitted for one of the following
reasons.
(a) To estimate the slope or intercept and their confidence limits
(see §§ 7.2, 7.9, 12.5, and 12.6).
(b) To predict values of y for a given x. For example, it may be required to predict from the fitted line what response (Y) would be produced by a particular dose (x), or what optical density (Y) a solution of a particular concentration (x) will have. The error of such a prediction is discussed in this section. There are two forms of the problem, as in § 7.4.
   (c) To predict the value of x required to produce a given (observed or hypothetical) value of y. For example, the prediction of the dose (x) needed for a given response, or of the concentration (x) of a solution
of a particular optical density. This sort of problem is probably the
most important in practice but its solution is rather more complicated
than for (a) and (b). Its solution will be given in § 13.14.
In case (b) confidence limits are required for a value of Y calculated from the fitted line, not for an observed y (see § 12.2), and these will now be found. For a discussion of the meaning of confidence limits see §§ 7.2 and 7.9.
   The equation (12.2.2) for the fitted line is Y = a + b(x − x̄), where a = ȳ and b = Σy(x−x̄)/Σ(x−x̄)² (from (12.2.6) and (12.2.7)). Because the independent variable (x) is assumed to be measured without error (§ 12.2), terms involving only x can be treated, for the purposes of assessing error, as constants. Because, as shown below, a and b, and hence Y, are linear functions of the observations, it follows that if the observations are normally distributed then a, b, and Y will also be normally distributed. Imagine the experiment being repeated many times on repeated random samples from the same population, using the same x values. From each experiment a and b are estimated, and Y, for example the response for a particular dose, is calculated from the fitted line (12.2.2). The variation of the repeated a, b, and Y values should follow normal distributions with means α, β, and μ (see (12.2.3)), and variances var[a], var[b], and var[Y] say. Compare the non-normal distribution of parameter estimates found in the non-linear problem discussed in § 12.8. Because Y is normally distributed, confidence limits can be found as in § 7.4 if the variance of Y is known. To find this it is necessary to know the variances of a and b.

The variance of the estimated slope, b

The least squares estimate (b) of the true slope (β) given in eqn. (12.2.7) can be written out term by term in the form

    b = Σyⱼ(xⱼ−x̄)/Σ(xⱼ−x̄)².                             (12.4.1)

Therefore b is a linear function of the observations, and can be written in the form Σcⱼyⱼ = c₁y₁ + c₂y₂ + ... + c_N y_N, where the cⱼ are constants (for more comments on this use of the word 'linear' see § 12.7). In this case the constants are cⱼ = (xⱼ−x̄)/Σ(xⱼ−x̄)². The variance of b now follows

directly from (2.7.10) and is var[y]·Σcⱼ². Now cⱼ² = (xⱼ−x̄)²/[Σ(xⱼ−x̄)²]², so Σcⱼ² = Σ(xⱼ−x̄)²/[Σ(xⱼ−x̄)²]² = 1/Σ(xⱼ−x̄)², and therefore, cancelling,

    var[b] = var[y]/Σ(xⱼ−x̄)².                           (12.4.2)

This gives a prediction of what the variance of repeated estimates of b should be, based on the scatter of the observations, var[y], seen in the one experiment actually done (see §§ 2.7 and 7.2). Notice that the slope will be most accurately determined (var[b] smallest) when the values of x are widely spaced, making the values of (x−x̄) large, as common sense suggests.
Confidence limits for the slope can be obtained just as in § 7.4
(because b is normally distributed when the observations are, see above)
as
    b ± t√(var[b]),                                     (12.4.3)
where t is Student's t for the required P and with the number of d.f.
associated with var[y]. See §§ 12.5 and 12.6 for examples.
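A sketch of eqns (12.4.2) and (12.4.3) in code (the data are invented, and the t value, 2·776 for P = 0·95 and 4 d.f., is hard-coded rather than looked up in tables):

```python
# Invented data; t = 2.776 (P = 0.95, 4 d.f.) is hard-coded for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.2, 1.1, 1.9, 3.2, 3.8, 5.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) / sxx
fitted = [ybar + b * (xi - xbar) for xi in x]
var_y = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - 2)  # error MS, 4 d.f.
var_b = var_y / sxx                                                 # eqn (12.4.2)
t = 2.776
slope_limits = (b - t * var_b ** 0.5, b + t * var_b ** 0.5)         # eqn (12.4.3)
```

As the text notes, spreading the x values more widely would increase sxx and so narrow these limits.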

The variance of a

By (12.2.6), a = ȳ, so var[a] = var[ȳ] = var[y]/N, by (2.7.8).
Confidence limits for the true (population) straight line

The value of Y estimated from the line, Y = a + b(x − x̄), is a linear function of the observations because, as above, both a and b are. It will therefore be normally distributed when the observations are. The population mean value of Y at any given value of x is μ (see (12.2.3)), so the error of a value of Y is Y − μ, which has a population mean value† of μ − μ = 0 and variance var[Y] (because μ is a constant). The variance of Y is

    var[Y] = var[ȳ + b(x−x̄)]                            (by (12.3.1))
           = var[ȳ] + var[b(x−x̄)]                       (by (2.7.3))
           = var[ȳ] + (x−x̄)² var[b]                     (by (2.7.5))
           = var[y]/N + (x−x̄)² var[y]/Σ(xⱼ−x̄)²          (by (2.7.8) and (12.4.2))

           = var[y] (1/N + (x−x̄)²/Σ(xⱼ−x̄)²).            (12.4.4)

† See Appendix 1 for a rigorous definition. E[Y−μ] = E[Y]−E[μ] = μ−μ = 0.

Notice that the use of (2.7.3) assumes that ȳ and b are uncorrelated (i.e. in repeated experiments there will be no tendency for ȳ to be large in experiments when b is). This has not been proved, but a similar relationship is discussed in greater detail in §§ 13.8 and 13.10.
   Confidence limits for μ, the population value of Y at a given x, can therefore be found, as in § 7.4, from

    Y ± t√(var[Y]).                                     (12.4.5)
Several points about (12.4.4) are worth noticing. First, although the term Σ(xⱼ−x̄)² is a constant (depending on the particular values of x chosen) for a given experiment, the term (x−x̄)² is not. The presence of this latter term shows that the variance of Y, unlike that of y, is dependent on the value of x. The variance of Y will be at a minimum when x = x̄, because at this point the second term becomes zero, leaving var[Y] = var[ȳ], as expected (since Y = ȳ when x = x̄, see § 12.3). It can be seen that the variance of Y (and hence the width of the confidence limits) increases as x departs in either direction from x̄, because the deviation (x−x̄) is squared and therefore always positive.
   The common sense of these results is discussed further when they are illustrated numerically in §§ 12.5 and 13.14 (and plotted in Figs. 12.5.1 and 13.14.1).
Confidence limits for the mean of m new observations

It may be required to find, for, say, the concentration x, limits within which the mean (ȳ_m) of m new observations of the response to concentration x would be expected to lie. The best estimate of ȳ_m is the same as the best estimate of μ, viz. Y = a + b(x − x̄); but, as in § 7.4, its error is different. The error of the prediction is Y − ȳ_m, which will be normally distributed with a population mean of μ − μ = 0. Because the new observations are supposed to be from the same population as the old, Y and ȳ_m are independent, so, by (2.7.3), var[Y − ȳ_m] = var[Y] + var[ȳ_m], where var[Y] is given by (12.4.4) and var[ȳ_m] = var[y]/m. In exactly the same way as for (7.4.3), the confidence limits for ȳ_m will be

    Y ± t√[var[y](1/N + 1/m + (x−x̄)²/Σ(xⱼ−x̄)²)].       (12.4.6)
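Eqn (12.4.6) can be sketched as a small function (data and t again invented for illustration); note how the limits widen as x₀ moves away from x̄:

```python
# Invented data and t; a sketch of eqn (12.4.6) for the mean of m new
# observations at x0. The interval widens as x0 moves away from x-bar.
def new_mean_limits(x0, m, x, y, t):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) / sxx
    var_y = sum((yi - (ybar + b * (xi - xbar))) ** 2
                for xi, yi in zip(x, y)) / (n - 2)      # error mean square
    Y0 = ybar + b * (x0 - xbar)                         # best estimate of the mean
    half = t * (var_y * (1.0 / n + 1.0 / m + (x0 - xbar) ** 2 / sxx)) ** 0.5
    return Y0 - half, Y0 + half

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.2, 1.1, 1.9, 3.2, 3.8, 5.1]
near = new_mean_limits(2.5, 3, x, y, 2.776)   # at x0 = x-bar: narrowest interval
far = new_mean_limits(5.0, 3, x, y, 2.776)    # away from x-bar: wider interval
```

Letting m grow large removes the 1/m term, and the limits shrink to those of (12.4.5) for the line itself.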

As expected (and as in § 7.4) this reduces to (12.4.5) when m is very large, so that ȳ_m becomes the same as μ. The prediction is that if repeated experiments are conducted, and in each experiment the limits calculated, then in 95 per cent of experiments (or any other chosen proportion, depending on the value chosen for t) the mean of m new observations will fall within the limits. The limits, and ȳ_m, will of course vary from experiment to experiment (see § 7.9). This prediction is, as usual, likely to be optimistic (see § 7.2). The use of this method is illustrated in § 13.14 (and plotted in Fig. 13.14.1).
12.5. Fitting a straight line with one observation at each x value
The results in Table 12.5.1 show a single observation on the dependent variable, y, at each value of the independent variable, x. For example, y might be the plasma concentration of a drug at a precisely measured time x after administration. The common sense supposition that the times have not been chosen sensibly will be confirmed by the analysis. The assumptions necessary for the analysis have been discussed in §§ 11.2, 12.1, and 12.2, and the meaning of confidence limits has been discussed in §§ 7.2 and 7.9. These should be read before this section.
TABLE 12.5.1

    x       y
    160     59
    165     54
    169     64
    175     67
    180     85
    188     78
    -----------
    Totals  1037    407

There is a tendency for y to increase as x increases. Is this trend statistically significant? To put the question more precisely, does the estimated slope of this line, b, differ from zero to an extent that would be likely to occur by random experimental error if the true slope of the line, β, were in fact zero? In other words it is required to test the null hypothesis that β = 0.

Fitting the straight line Y = a + b(x − x̄)

The least squares estimate of α in (12.2.3) is, by (12.2.6),

    a = ȳ = 407/6 = 67·833.

The least squares estimate of the slope, β, is, by (12.2.8),

    b = Σ(y−ȳ)(x−x̄)/Σ(x−x̄)².

First calculate, by (2.6.6),

    Σ(x−x̄)² = Σx² − (Σx)²/N = 160² + 165² + ... + 188² − 1037²/6 = 526·833.

The sum of products is found using (2.6.7),

    Σ(y−ȳ)(x−x̄) = Σyx − ΣyΣx/N
                = (59 × 160) + ... + (78 × 188) − (407 × 1037)/6
                = 511·833.

Thus b = 511·833/526·833 = 0·9715. Also x̄ = 1037/6 = 172·833. Inserting these values in (12.2.2) gives the equation for the least squares straight line,

    Y = 67·833 + 0·9715(x − 172·833).                   (12.5.1)
This line is plotted in Fig. 12.5.1 together with the observed values. Does the estimated slope, b = 0·9715, differ from zero by more than could reasonably be expected if the population slope, β, were zero?
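The arithmetic just done can be reproduced in a few lines (a sketch; the variable names are mine):

```python
# A sketch (not from the book) reproducing the fit of eqn (12.5.1).
x = [160, 165, 169, 175, 180, 188]
y = [59, 54, 64, 67, 85, 78]

n = len(x)
a = sum(y) / n                                      # a = y-bar = 67.833, eqn (12.2.6)
xbar = sum(x) / n                                   # 172.833
sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n    # sum (x - xbar)^2 = 526.833
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # 511.833
b = sxy / sxx                                       # slope = 0.9715, eqn (12.2.8)
```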

The analysis of variance

The analysis is performed with the observations (y values). The independent variable, x, only comes in incidentally. The principle of the method is described in § 12.3.
   The total sum of squares, by (2.6.6), is

    Σ(y−ȳ)² = Σy² − (Σy)²/N = 59² + ... + 78² − 407²/6 = 682·833.

The sum of squares due to linear regression, by (12.3.4), is

    (511·833)²/526·833 = 497·260.

The sum of squares due to deviations from linearity is found, using (12.3.3), by difference, as 682·833 − 497·260 = 185·573.
   There are 6 values of y, so the total number of degrees of freedom is N − 1 = 5. The sum of squared deviations (SSD) due to linear regression has one d.f. because it corresponds to the calculation of one statistic (b) from the observations (this is made obvious by the identity of the analysis with a t test, shown at the end of this section). The analysis summarized by (12.3.3) is tabulated in Table 12.5.2, which is completed and interpreted in the same way as previous analyses of variance (e.g. Tables 11.3, 11.4, and 11.6). The two figures in the mean square column would be independent estimates of the same quantity (σ²) if all 6 observations were from a single population (with mean μ and variance σ²). This way
of stating the null hypothesis implies that the population mean of the observations is always μ (whatever the x value), i.e. it implies that β = 0, the way in which the null hypothesis was put above. The probability that the ratio of two independent estimates of the same variance will be 10·72 (as observed, Table 12.5.2), or larger, is 0·02 to 0·05 (see §§ 11.3 and 11.4), i.e. 10·72 would be exceeded in something
[Fig. 12.5.1. Observed points from Table 12.5.1. Solid line: least squares estimate of the straight line (eqn. (12.5.1)). Broken lines: 95 per cent confidence limits for Y, i.e. for the fitted line. Crosses (×) mark particular values of the confidence limits calculated in the text.]

between 2 and 5 per cent of experiments in the long run (the limitations of the tables of F, see § 11.3, prevent P being found to any greater accuracy).
   In this analysis there is no good estimate of the experimental error, because only one observation was made at each value of x. This analysis
should be compared with that in § 12.6, in which replication of the observations gives a proper estimate of σ². The best that can be done in this case is to assume that the line is straight, in which case the mean square for deviations from linearity, 46·393, will be an estimate of
TABLE 12.5.2

    Source of variation          d.f.        SSD       MS        F       P
    Linear regression            1           497·260   497·260   10·72   0·02-0·05
    Deviations from linearity    N−2 = 4     185·573   46·393
    Total                        N−1 = 5     682·833

the error variance (see § 12.6). Following this procedure shows that a value of b differing from zero by as much as or more than that observed (0·9715) would be expected in between 2 and 5 per cent of repeated experiments if β were zero. This suggests, though not very conclusively, that y really does increase with x (see § 6.1).
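The analysis of variance just tabulated can be reproduced as follows (a sketch of the working formulae; P would still be read from tables of F):

```python
# Sketch reproducing Table 12.5.2 from the data of Table 12.5.1.
x = [160, 165, 169, 175, 180, 188]
y = [59, 54, 64, 67, 85, 78]

n = len(y)
total_ssd = sum(yi * yi for yi in y) - sum(y) ** 2 / n             # 682.833
sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
regression_ssd = sxy ** 2 / sxx                                    # 497.260, eqn (12.3.4)
deviations_ssd = total_ssd - regression_ssd                        # 185.573, N-2 = 4 d.f.
error_ms = deviations_ssd / (n - 2)                                # 46.393
F = regression_ssd / error_ms                                      # 10.72, with 1 and 4 d.f.
```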

Gaussian confidence limits for the population line

The error variance of the 6 observations (the part of their variance not accounted for by a linear relationship with x) is estimated, from Table 12.5.2, to be var[y] = 46·393 with 4 d.f. The value of Student's t for P = 0·95 and 4 d.f. is 2·776 (from tables, see § 4.4). The confidence limits for the population value of Y at various values of x can be found from (12.4.5). To evaluate var[Y] from (12.4.4) at each value of x, the values var[y] = 46·393, N = 6, x̄ = 172·833, and Σ(x−x̄)² = 526·833 are needed. These have already been calculated and are the same at all values of x. Enough values of x must be used in (12.4.4) to allow smooth curves to be drawn (the broken lines in Fig. 12.5.1). Three representative calculations are given.

(a) x = 200. At this point, by (12.5.1), the estimated value of Y is

    Y = 67·833 + 0·9715(200 − 172·833) = 94·23

and, by (12.4.4),

    var[Y] = 46·393(1/6 + (200 − 172·833)²/526·833) = 72·72.

The Gaussian confidence limits for the population value of Y at x = 200 are thus, by (12.4.5), 94·23 ± 2·776√(72·72), i.e. from Y = 70·56 to Y = 117·90; these are plotted in Fig. 12.5.1 at x = 200.
   (b) x = 172·833 = x̄. At this point (x − x̄) = 0 so, from (12.5.1), Y = 67·833 = ȳ. From (12.4.4), var[Y] = var[y](1/N) = 46·393/6 = 7·73, and the confidence limits for the population value of Y are, by (12.4.5), 67·833 ± 2·776√(7·73), i.e. from 60·11 to 75·55.
   (c) x = 0. At this point, the intercept on the y axis, Y = 67·833 + 0·9715(0 − 172·833) = −100·1. This is, of course, a considerable extrapolation beyond the range of the experimental results. From (12.4.4),

    var[Y] = 46·393(1/6 + (0 − 172·833)²/526·833) = 2638,

which is far larger than when x is nearer x̄. The confidence limits are −100·1 ± 2·776√(2638), i.e. from −243 to +42·5.
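The three calculations can be reproduced with a small function (a sketch using the values already found in the text):

```python
# Sketch of calculations (a)-(c), using the constants already found above.
var_y, n, xbar, sxx, t = 46.393, 6, 172.833, 526.833, 2.776
a, b = 67.833, 0.9715

def population_line_limits(x0):
    """Return (lower, upper, var_Y) at x0, by eqns (12.4.4) and (12.4.5)."""
    Y0 = a + b * (x0 - xbar)                            # eqn (12.5.1)
    var_Y = var_y * (1.0 / n + (x0 - xbar) ** 2 / sxx)  # eqn (12.4.4)
    half = t * var_Y ** 0.5
    return Y0 - half, Y0 + half, var_Y

lo200, hi200, v200 = population_line_limits(200.0)  # roughly 70.6 to 117.9
lo0, hi0, v0 = population_line_limits(0.0)          # roughly -243 to +42.5
```

Evaluating the function over a grid of x values traces out the curved broken lines of Fig. 12.5.1.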
The confidence limits are much wider at the ends than at the central portion of the curve, which illustrates the grave uncertainty involved in extrapolation beyond the observations. Moreover it must be remembered that these confidence limits assume that the population (true) line is really straight in the region of extrapolation. There is, of course, no reason (from the evidence of this experiment) to assume this. In fact, with only one observation at each x value, linearity could not be tested even within the range of the observations. The uncertainty in the extrapolated intercept, Y = −100·1, at x = 0 is therefore really even greater than indicated by the very wide confidence limits, which extend from −243 to +42·5 (even apart from the further uncertainties discussed in § 7.2). The intercept does not differ 'significantly' from zero (or even from +40 or −240) 'at the P = 0·05 level'.
Testing a hypothetical value with the t test

As in § 9.4 the confidence limits can be interpreted as a t test, and this will make it clear that the (rather undesirable) expression 'not significant at the P = 0·05 level' means the result of the test is P > 0·05. For example, to test the hypothesis that the population value of the intercept is μ = +40, calculate, from (4.4.1),

    t = (Y − μ)/√var[Y] = (−100·1 − 40)/√(2638) = −2·728

with 4 degrees of freedom. Referring t = 2·728 to a table (see § 4.4) of Student's t distribution shows P > 0·05 (two tail; see § 6.1).

The curvature of the confidence limits for the population line is only common sense, because there is uncertainty in the value of a, i.e. in the vertical position of the line, as well as in b, its slope. If lines with the steepest and shallowest reasonable slopes (confidence limits for β) are drawn for the various reasonable values of a, the area of uncertainty will have the outline shown by the broken lines in Fig. 12.5.1. Another numerical example (with unequal numbers of observations at each point) is worked out in § 13.14.

Confidence limits for the slope. Identity of the analysis of variance with a t test

In § 12.4 it was mentioned that the slope will be normally distributed if the observations are, with variance given by (12.4.2). In this example b = 0·9715 and var[b] = 46·393/526·833 = 0·08806. The 95 per cent confidence limits, using t = 2·776 as above, are thus, by (12.4.3), 0·9715 ± 2·776√(0·08806), i.e. from 0·15 to 1·80. These limits do not include zero, indicating that b 'differs significantly from zero at the P = 0·05 level'.
   As above, and as in § 9.4, this can be put as a t test. The Gaussian (normal) variable of interest is b, and the hypothesis is that its population value (β) is zero, so, by (4.4.1),

    t = (b − β)/√var[b] = (0·9715 − 0)/√(0·08806) = 3·274

with 4 degrees of freedom (the number associated with var[y], from which var[b] was found). Referring to tables (see § 4.4) of Student's t distribution shows that the probability of a value of t as large as, or larger than, 3·274 occurring is between 0·02 and 0·05, as inferred from the confidence limits.
   It was mentioned in § 11.3 (and illustrated in § 11.4) that the variance ratio, F, with 1 d.f. for the numerator and f for the denominator is simply a value of t² with f degrees of freedom. In this case t² with 4 d.f. = 3·274² = 10·72 = F with 1 d.f. for the numerator and 4 for the denominator. This is exactly the value of F found in Table 12.5.2 (and P = 0·02-0·05, exactly as in Table 12.5.2). It is easy to show that this t test is, in general, algebraically identical with the analysis of variance, so the component in the analysis of variance labelled 'linear regression' is simply a test of the hypothesis that the population slope of the straight line through the results is zero. This approach also,
incidentally, makes it clear why this component in the analysis of variance should have one degree of freedom.
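A quick numerical check of the identity t² = F, using the values found above:

```python
# Sketch: t for the slope, squared, equals F for linear regression (1, 4 d.f.).
var_b = 46.393 / 526.833        # eqn (12.4.2), about 0.08806
t = 0.9715 / var_b ** 0.5       # about 3.274, with 4 d.f.
F = t ** 2                      # about 10.72 = F(1, 4), as in Table 12.5.2
```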

12.6. Fitting a straight line with several observations at each x value. The use of a linearizing transformation for an exponential curve and the error of the half-life

The figures in Table 12.6.1 are the results of an experiment on the destruction of adrenaline by liver tissue in vitro. Three replicate determinations (n = 3) of adrenaline concentration (the dependent variable, y) were made at each of k = 5 times (the independent variable, x). The figures are based on the experiments of Bain and Batty (1956).

TABLE 12.6.1
Values of adrenaline (epinephrine) concentration, y (μg/ml)

    Time, x (min):   6      18     30     42     54     Total
                     30·0   8·9    4·1    1·8    0·8
                     28·6   8·0    4·6    2·6    0·6
                     28·5   10·8   4·7    2·2    1·0
    Totals           87·1   27·7   13·4   6·6    2·4    137·2

The decrease in adrenaline concentration with time plotted in Fig. 12.6.1 is apparently not linear. Because there is more than one observation at each point in this experiment, it is possible to make an estimate of experimental error without assuming the true line to be straight (cf. § 12.5). Therefore it is possible to judge whether or not it is reasonable to attribute the observed deviations from linearity to experimental error. The assumptions of the analysis have been discussed in §§ 6.1, 7.2, 11.2, 12.1, and 12.2, which should be read first. There are not enough results for any of the assumptions to be checked satisfactorily (see §§ 4.6 and 11.2).
   The basic analysis is exactly the same as the one-way analysis of variance described in § 11.4, the 'treatments' in this case being the different x values (times). As in § 11.4, it is not necessary to have the same number of observations in each sample (at each time). If the three rows in Table 12.6.1 had corresponded to three blocks (e.g. if three different observers had been responsible for the observations in the first, second, and third rows) then the two-way analysis described in § 11.6 would have been appropriate, with a between-rows (between
blocks, between observers) component in the analysis of variance. The additional factor, compared with the one-way analysis in § 11.4, is

[Fig. 12.6.1. Observed mean adrenaline concentration (ȳ) plotted against time (x). Data of Bain and Batty (1956) from Table 12.6.1.]

that part of the differences 'between treatments' (i.e. between the mean concentrations at the five different times) can be accounted for by a linear change of concentration with time (see § 12.3).

Calculating the analysis of variance of y

The first part is exactly as in § 11.4 (where more details will be found).
   (1) Correction factor G²/N = (137·2)²/15 = 1254·923.
   (2) Total sum of squares, from (2.6.5) (cf. (11.4.3), and refer to § 2.1 if you are confused by the notation):

    ΣΣ(yᵢⱼ−ȳ)² = ΣΣyᵢⱼ² − G²/N
               = 30·0² + 8·9² + ... + 2·2² + 1·0² − 1254·923
               = 1612·037.

   (3) Sum of squares (SSD) between columns (i.e. between the concentrations at different times), by (11.4.5), is

    SSD between times = 87·1²/3 + 27·7²/3 + ... + 2·4²/3 − 1254·923 = 1605·937.

This SSD can be split into two components, just as in § 12.5. In this case the calculations could be made easier by transforming the independent variable (x), as shown at the end of this section. But, for generality, the full calculation will be given first.

(a) Sum of squares due to linear regression. This is found from (12.3.4) as

    SSD = [Σ(y−ȳ)(x−x̄)]²/Σ(x−x̄)².

It is easy to make a mistake at this stage by supposing that there are only five x values, when in fact there are N = 15 values. This will be avoided if the N = 15 pairs of observations are written out in full, as in Table 12.6.2, rather than in the condensed form shown in Table 12.6.1.

TABLE 12.6.2

    x       y
    6       30·0
    6       28·6
    6       28·5
    18      8·9
    .       .
    .       .
    54      0·8
    54      0·6
    54      1·0
    -------------
    Totals  450     137·2

Firstly find the sum of products, using (2.6.7) and Table 12.6.2:

    Σ(y−ȳ)(x−x̄) = Σxy − (Σx)(Σy)/N
                = (6 × 30·0) + (6 × 28·6) + ... + (54 × 1·0) − (450)(137·2)/15
                = −2286·000.

Notice that Σx = 3(6 + 18 + 30 + 42 + 54) = 450 (compare Table 12.6.2), because each x occurs three times; and also that the calculation of the sum of products can be shortened by using the group totals from Table 12.6.1, giving

    (6 × 87·1) + (18 × 27·7) + ... + (54 × 2·4) − (450)(137·2)/15
                = −2286·000.                            (12.6.1)

Secondly, find the sum of squares for x. From (2.6.6),

    Σ(x−x̄)² = Σx² − (Σx)²/N = 6² + 6² + ... + 54² + 54² − 450²/15
            = 3(6² + 18² + ... + 54²) − 450²/15
            = 4320·000.                                 (12.6.2)

From (12.3.4) the SSD due to linear regression now follows:

    SSD = (−2286·000)²/4320·000 = 1209·68.

(b) SSD for deviations from linearity. As in § 12.5 this is most easily found by difference (cf. (12.3.3)):

    SSD due to deviations from linearity
        = SSD between x values − SSD due to linear regression   (12.6.3)
        = 1605·937 − 1209·68
        = 396·26.

(4) SSD for error. This is simply the within-groups SSD of § 11.4. The experimental error is assessed from the scatter of the replicate observations at each x value. It has N − k = 15 − 5 = 10 degrees of freedom
TABLE 12.6.3
Gaussian analysis of variance of y

    Source                          d.f.        SSD        MS        F       P
    Linear regression               1           1209·68    1209·68   1983    <0·001
    Deviations from linearity       k−2 = 3     396·26     132·09    216·5   <0·001
    Between x values (times)        k−1 = 4     1605·937   401·48    658·2   <0·001
    Error (within x values (times)) N−k = 10    6·100      0·6100
    Total                           N−1 = 14    1612·037

and is most easily found by difference, as in § 11.4, thus 1612·037 − 1605·937 = 6·100.
These results can now be assembled in an analysis of variance table (Table 12.6.3), the bottom part resembling Table 11.4.2, the top part

resembling Table 12.5.2 (except that the number of different x values, k, is no longer the same as the total number of observations, N).
   The table is completed as described in Chapter 11 and § 12.5. Each mean square would be an estimate of σ² if the null hypothesis that all 15 observations came from a single normal population (with variance σ²) were true. The ratio of each mean square to the error mean square is referred to tables of the F ratio (see § 11.3), to see whether it is larger than could be expected by chance. Although a considerable part of the differences between the mean adrenaline concentrations at different times ('between times') is accounted for by a linear relationship between concentration (y) and time (x), the remainder ('deviations from linearity') is still much larger than could be reasonably expected if the true line were straight. P < 0·001, i.e. deviations from linearity as large as those observed, or larger, would occur in far fewer than 1 in 1000 repeated experiments if the true line were straight, and if the assumptions about normal distributions, etc. (see § 12.2), made in the calculations are sufficiently nearly true.
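The sums of squares of Table 12.6.3 can be reproduced as follows (a sketch; the F ratios follow by dividing each mean square by the error mean square):

```python
# Sketch reproducing the sums of squares of Table 12.6.3 from Table 12.6.1.
times = [6, 18, 30, 42, 54]
groups = [[30.0, 28.6, 28.5], [8.9, 8.0, 10.8], [4.1, 4.6, 4.7],
          [1.8, 2.6, 2.2], [0.8, 0.6, 1.0]]          # replicates at each time

ys = [yi for g in groups for yi in g]                # all N = 15 observations
xs = [ti for ti, g in zip(times, groups) for _ in g] # each time repeated n = 3 times
N, G = len(ys), sum(ys)                              # N = 15, G = 137.2

total_ssd = sum(yi * yi for yi in ys) - G ** 2 / N                     # 1612.037
between_ssd = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N   # 1605.937
error_ssd = total_ssd - between_ssd                                    # 6.100
sxx = sum(xi * xi for xi in xs) - sum(xs) ** 2 / N                     # 4320
sxy = sum(xi * yi for xi, yi in zip(xs, ys)) - sum(xs) * G / N         # -2286
regression_ssd = sxy ** 2 / sxx                                        # 1209.68
deviations_ssd = between_ssd - regression_ssd                          # 396.26
error_ms = error_ssd / (N - len(groups))                               # 0.6100, 10 d.f.
```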
There are now two possibilities. Either a curve can be fitted directly to the observations (see §§ 12.7 and 12.8), or a transformation can be sought that converts the graph to a straight line. The latter approach is now described.

A linearizing transformation. Does the catabolism of adrenaline follow an exponential time course?
If the rate of catabolism of adrenaline by liver tissue at any given
moment were proportional to the concentration of adrenaline (y)
present at that time (x), then the concentration of adrenaline (Table
12.6.1) would be expected to fall exponentially, i.e.

y = y0e^(-kx),    (12.6.4)

where y0 is the concentration present at time x = 0 and k is the rate
constant.† The reciprocal of k, the time constant, is the time taken
for the concentration to fall to 100/e ≃ 36·8 per cent of its original
value (when x = 1/k, it follows from (12.6.4) that y = y0/e). Taking
natural logarithms (logs to base e) of (12.6.4) gives

loge y = loge y0 - kx.    (12.6.5)

† The symbol k has already been used for the number of treatments (times), but
there should be no risk of confusion between its two meanings.

(Remember the log is the power to which the base must be raised to
give the argument, so loge e^(-kx) = -kx.) Therefore there is a straight
line relation between log y and x, with slope -k and intercept loge y0.
The half-life of adrenaline is related to the rate constant in a simple
way. Putting y = y0/2 in (12.6.5) gives the half-life as

x0·5 = loge 2/k = 0·69315/k.    (12.6.6)

The interpretation of the rate constant in molecular terms is discussed in § A2.3.
Common logarithms (to base 10) are more easily available than
natural logarithms so it will be convenient to write (12.6.5) in terms of
common logarithms. Dividing through by loge 10 ≃ 2·3026 gives, using
(13.3.5),

log10 y = log10 y0 - (k/2·3026)x,    (12.6.7)

a straight line with slope = -k/2·3026 and an intercept log10 y0.


In order to do the following analysis it is necessary to assume that the
values of log y at each x value are normally distributed (i.e. that y is
lognormal, § 4.5) and homoscedastic (see § 12.2). These assumptions,

TABLE 12.6.4
Values of log10 y found from Table 12.6.1

                          Time, x (min)
           6         18        30        42        54         Total

           1·4771    0·9494    0·6128    0·2553   -0·0969
           1·4564    0·9031    0·6628    0·4150   -0·2218
           1·4548    1·0334    0·6721    0·3424    0·0000

Total      4·3883    2·8859    1·9477    1·0127   -0·3187     9·9159

of course, contradict those just made in doing the analysis of variance
of y (Table 12.6.3), when y itself was supposed normal and homo-
scedastic (see the discussions of transformations in §§ 11.2 and 12.2;
this problem would not arise if the transformation was made on the in-
dependent variable x). There is no way of telling how likely it is that
this contradiction will give rise to misleading inferences in particular
cases. In the absence of real knowledge about the distributions of the
observations the analysis will, as previously emphasized (see §§ 4.2,
6.2, and 7.2), be in error to some unknown extent. If y were known to
be normally distributed the methods of § 12.8 would be preferred to
that now described.
To see whether the straight line defined by (12.6.7) fits the observa-
tions, the logarithms of the observations are tabulated in Table 12.6.4.
TABLE 12.6.5
Gaussian analysis of variance of log10 y

Source                       d.f.   SSD       MS         F       P

Linear regression            1      4·2467    4·2467     874·2   <0·001
Deviations from linearity    3      0·0337    0·0112     2·31    0·1-0·2
Between times                4      4·2804    1·0701     220·3   <0·001
Error                        10     0·04858   0·004858
Total                        14     4·3290

FIG. 12.6.2. Same data as Fig. 12.6.1, but the mean value of the log10
adrenaline concentration (from Table 12.6.4) is plotted against time. The line is
that found by the method of least squares, eqn (12.6.12).

The mean log concentrations are plotted against time in Fig. 12.6.2.
The graph looks much straighter than Fig. 12.6.1. The analysis of
variance of the log concentrations in Table 12.6.4 is now calculated in
exactly the same way as the calculation of the analysis of variance of
the concentrations themselves (Table 12.6.1). The result is Table
12.6.5. Compare Table 12.6.3.
The results in Table 12.6.5 show that almost all the variation of the
log concentrations between times is accounted for by a straight-line
relationship between log y and time, and the evidence against the null
hypothesis that the true slope of this line, β, is zero is very strong.
Deviations from linearity as large as, or larger than, those observed
would be expected to occur in between 10 and 20 per cent of repeated
experiments if the true population line were straight (if the assumptions
made are correct). There is, therefore, no compelling reason to believe
that the relation between log y and time is non-linear, i.e. the experi-
ment provides no evidence that eqn (12.6.7), and hence also eqn (12.6.4),
fit the observations inadequately. In other words there is no reason
to believe that the concentration of adrenaline does not decay
exponentially.
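The entries of Table 12.6.5 can be reproduced from the logarithms of Table 12.6.4 (a Python sketch, not part of the original text; the individual log values used here have digits chosen to be consistent with the printed column totals):

```python
# Gaussian analysis of variance of log10 y, reproducing Table 12.6.5.
logy = {6: [1.4771, 1.4564, 1.4548], 18: [0.9494, 0.9031, 1.0334],
        30: [0.6128, 0.6628, 0.6721], 42: [0.2553, 0.4150, 0.3424],
        54: [-0.0969, -0.2218, 0.0000]}

N = 15
grand = sum(sum(v) for v in logy.values())                 # 9.9159
mean = grand / N                                           # 0.66106
xbar = 30.0

ss_total = sum((y - mean) ** 2 for v in logy.values() for y in v)       # 4.3290
ss_between = sum(3 * (sum(v) / 3 - mean) ** 2 for v in logy.values())   # 4.2804

# sum of products: within each time all observations share the same x,
# so the column totals can be used directly
sxy = sum((t - xbar) * sum(v) for t, v in logy.items())    # -135.446
ss_linear = sxy ** 2 / 4320.0                              # 4.2467
ss_dev = ss_between - ss_linear                            # 0.0337
ss_error = ss_total - ss_between                           # 0.04858
F_dev = (ss_dev / 3) / (ss_error / 10)                     # about 2.31
```

The deviations-from-linearity F of about 2·31 on 3 and 10 d.f. is the value that corresponds to P between 0·1 and 0·2 in the table.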
Having established that it is reasonable to fit a straight line to the
log concentrations, the next step is to estimate the parameters (slope
and intercept) of the line.
Fitting the straight line
If the log observations are denoted y', i.e.

y' = log10 y,    (12.6.8)

the equation to be fitted (12.6.7) can be written as

log10 Y = Y' = a + b(x - x̄),    (12.6.9)

which has the same form as in previous examples in this chapter.
Using (12.2.6) the estimate of a is

a = ȳ' = 9·9159/15 = 0·66106.
To estimate the slope, the sum of products is first found as described
in the analysis of untransformed concentrations (eqns (12.6.1) and
(2.6.7))

Σ(y' - ȳ')(x - x̄) = Σxy' - (Σx)(Σy')/N
= (6 × 4·3883) + (18 × 2·8859) + ... + (54 × (-0·3187)) - (450)(9·9159)/15
= -135·446    (12.6.10)

This is negative because y' decreases with x (see § 2.6). The sum of
squares for x is found as in eqn (12.6.2), and is 4320·000 as before. The
slope is therefore estimated, by (12.2.8), to be

b = Σ(y' - ȳ')(x - x̄)/Σ(x - x̄)² = -135·446/4320·000 = -0·03135.    (12.6.11)

Putting these values, and x̄ = 450/15 = 30·000 as before, into (12.6.9)
gives the least squares estimate of the straight line as

log10 Y = 0·66106 - 0·03135(x - 30)
= 1·6016 - 0·03135x.    (12.6.12)

Comparing this with (12.6.7) gives the estimates of the parameters as

log10 y0 = 1·6016, so y0 = 39·96 μg/ml    (12.6.13)

and -k/2·3026 = -0·03135, so k = 0·07219 min⁻¹.    (12.6.14)
In its original form (12.6.4) the estimated regression equation is thus

y = 39·96e^(-0·07219x).    (12.6.15)

The time constant (discussed above) for adrenaline catabolism is
estimated to be

1/k = 1/0·07219 = 13·85 min,    (12.6.16)

and from (12.6.6) the half-life of adrenaline is

x0·5 = 0·69315/k = 9·602 min.    (12.6.17)

It is shown in § A2.3 that k⁻¹ = 13·85 min can be interpreted as the
mean lifetime of adrenaline molecules, and x0·5 = 9·602 min can be
interpreted as the median lifetime.
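The whole fit, eqns (12.6.10)-(12.6.17), can be reproduced in a few lines (a Python sketch, not part of the original text; the log values are those of Table 12.6.4, with digits consistent with the printed column totals):

```python
# Least squares fit of the straight line (12.6.9) to the log10 values.
import math

logy = {6: [1.4771, 1.4564, 1.4548], 18: [0.9494, 0.9031, 1.0334],
        30: [0.6128, 0.6628, 0.6721], 42: [0.2553, 0.4150, 0.3424],
        54: [-0.0969, -0.2218, 0.0000]}
pairs = [(t, y) for t, v in logy.items() for y in v]

N = len(pairs)                                       # 15
xbar = sum(t for t, _ in pairs) / N                  # 30.0
a = sum(y for _, y in pairs) / N                     # 0.66106
sxx = sum((t - xbar) ** 2 for t, _ in pairs)         # 4320.0
sxy = sum((t - xbar) * y for t, y in pairs)          # -135.446
b = sxy / sxx                                        # -0.03135

log_y0 = a - b * xbar                                # 1.6016, eqn (12.6.13)
y0 = 10 ** log_y0                                    # about 39.96 micrograms/ml
k = -b * math.log(10)                                # 0.07219 per min
half_life = math.log(2) / k                          # 9.602 min
time_constant = 1 / k                                # 13.85 min
```

The slope, intercept, rate constant, half-life, and time constant agree with the values quoted in the text.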
Confidence limits for the half-life. From the analysis of variance
(Table 12.6.5), the variance of the log observations is estimated as
var[y'] = 0·004858 (the error mean square with 10 d.f.). Thus, using
(12.4.2) and (12.6.2), the variance of the estimated slope (b in eqn
(12.6.9)) is var[b] = var[y']/Σ(x - x̄)² = 0·004858/4320·000 = 1·125
× 10⁻⁶. The value of Student's t for P = 0·95 and 10 d.f. (from tables,
see § 4.4) is 2·228, so 95 per cent confidence limits for b follow
from (12.4.3) as b ± t√(var[b]) = -0·03135 ± 2·228√(1·125 × 10⁻⁶)
= -0·03371 to -0·02899. The values for the half-life corresponding to
these values of b are now found as above (x0·5 = 0·69315/(-2·3026b)).
Because 2·3026 and 0·69315 are constants, not variables, no additional
error enters in the conversion of b to x0·5. The 95 per cent Gaussian
confidence limits for the true half-life are thus 8·930 to 10·38 min. As
usual these limits can be interpreted as in § 7.9 only if all the assump-
tions discussed in §§ 7.2, 11.2, and 12.2 are fulfilled. And, as usual,
the limits are likely to be optimistic (see § 7.2).
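The limit calculation can be sketched as follows (Python, not part of the original text; the t value 2·228 is taken from tables, as in the text):

```python
# Gaussian confidence limits for the half-life of adrenaline.
import math

var_y = 0.004858          # error mean square from Table 12.6.5, 10 d.f.
sxx = 4320.0              # sum of squares for x
b = -135.446 / sxx        # estimated slope, eqn (12.6.11)
t = 2.228                 # Student's t for P = 0.95, 10 d.f.

var_b = var_y / sxx                       # 1.125e-6
sd_b = math.sqrt(var_b)
b_lo = b - t * sd_b                       # -0.03371
b_hi = b + t * sd_b                       # -0.02899

# convert each limit for b to a half-life: x0.5 = loge 2 / (-2.3026 b)
half_short = math.log(2) / (-b_lo * math.log(10))   # 8.93 min
half_long = math.log(2) / (-b_hi * math.log(10))    # 10.38 min
```

The steeper limit for b gives the shorter half-life, so the interval for b maps directly onto the interval 8·93 to 10·38 min quoted in the text.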

A simplifying transformation of the x values

When, as in the present example, the x values are equally spaced and
there is the same number of observations at each, the values of x can be
transformed to make the arithmetic simpler. If x' is defined as x/12 - 2·5
the scale becomes

x'    -2    -1    0    +1    +2

Thus Σx' = -2-2-2-1-1-1+0+0+0+1+1+1+2+2+2 = 0,
so x̄' = 0. It follows that Σ(x' - x̄')² = Σx'² = 3(2² + 1² + 0² + 1² + 2²)
= 30 and Σ(y' - ȳ')(x' - x̄') = Σy'(x' - x̄') = Σy'x' = (-2 × 4·3883) +
... + (+2 × (-0·3187)) = -11·2872. These simplified calculations give, of
course, the regression equation Y' = a + b(x' - x̄') = a + bx', the plot
of log y against x'. The result is Y' = 0·6611 - 0·3762x'. Inserting the
definition of x' gives Y' = 0·6611 - 0·3762(x/12 - 2·5) = 1·602 -
0·03135x, exactly as above (eqn (12.6.12)).
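The coded-scale arithmetic can be verified directly (a Python sketch, not part of the original text; the column totals are those of Table 12.6.4):

```python
# The coded scale x' = x/12 - 2.5 maps the times 6, ..., 54 onto
# -2, ..., +2, and the column totals of Table 12.6.4 give the slope.
totals = {6: 4.3883, 18: 2.8859, 30: 1.9477, 42: 1.0127, 54: -0.3187}

coded = {t: t / 12 - 2.5 for t in totals}               # -2, -1, 0, 1, 2
assert sorted(coded.values()) == [-2.0, -1.0, 0.0, 1.0, 2.0]

sxx = 3 * sum(c ** 2 for c in coded.values())           # 30
sxy = sum(coded[t] * tot for t, tot in totals.items())  # -11.2872
b_coded = sxy / sxx                                     # -0.3762

# back on the original scale: x' changes by 1 when x changes by 12 min
b = b_coded / 12                                        # -0.03135, as before
```

The factor of 12 in the last line is just the spacing of the x values, which is why the coded slope of -0·3762 reproduces the slope -0·03135 of eqn (12.6.12).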

12.7. Linearity, non-linearity, and the search for the optimum


In real life most graphs are not straight lines. Sometimes, as in
§ 12.6, they can be converted to lines that are near enough straight but,
as will be shown in § 12.8, this may be a hazardous process. Most
elementary books do not discuss curves that are non-linear (in the
sense to be defined in this section) because the mathematics is incon-
venient to do by hand. Since most relationships that are based on some
sort of physical model are non-linear, this is unfortunate. A simple
computer method for fitting non-linear models will be given in § 12.8.
Before this the principles of finding least squares estimates will be
discussed, mainly in a pictorial way, and an attempt made to give an
idea of the scope of linear (in the general sense) models.

Finding least squares solutions. The geometrical meaning of the algebra

In § 12.2 the least squares estimates, â and b̂, of the parameters, α
and β, of the straight line (12.2.3) were found algebraically. (In this
section, and in § 12.8, the symbols â and b̂ will be used to distinguish
least squares estimates from other possible estimates of the parameters.)
It will be convenient to illustrate the approach to more complicated
curves by first going into the case of the straight line in greater
detail.
The intention is to find the values of the parameter estimates that
make the sum of the squares of the deviations of the observations
(y) from the calculated values (Y), S = Σ(y - Y)² (eqns (12.2.1) and
(12.2.5)), as small as possible. Notice that during the estimation procedure
the experimental observations are treated as constants (the particular
observations made) and various possible values of the parameters are
considered. The conventional way of finding a minimum, as in § 12.2,
is to differentiate and equate to zero. How this works was illustrated in
Fig. 12.2.2, in which S was plotted against various possible values for
a (b being held constant). The slope of this graph (i.e. ∂S/∂a) is zero at
the minimum, and the corresponding value of a is taken as the least
squares estimate of α. The curly ∂, indicating partial differentiation,
means that b is treated as a constant when differentiating (12.2.5) to
obtain (12.2.6). This means that b is given a fixed value which is
inserted, along with the experimental observations (from Table 12.7.1),
into (12.2.5) so that S can be calculated for various values of a,
giving the curve plotted in Fig. 12.2.2. It may occur to you to ask
whether the value at which b is held constant makes any difference to
the estimate of α. In fact it does not, because the expression found for
∂S/∂a did not involve b, and similarly the expression for ∂S/∂b did not
involve a. The geometrical meaning of this will be made clear using the
data in Table 12.7.1.
Fitting a straight line in the form

Y = a + b(x - x̄)    (12.7.1)

gives

Y = 8·000 + 2·107(x - 1·0),    (12.7.2)

which is plotted in Fig. 12.7.1. The calculations and interpretation are
the same as for the example in § 12.5. The corresponding analysis of
variance in Table 12.7.2 shows that, if the true line were straight and the
assumptions described in § 12.2 were true, then the slope of this line is

TABLE 12.7.1

  x      y

 -2      1
 -1      4
  0      6
  1      9
  2     11
  3     10
  4     15

Total    7     56
Mean   1·0    8·0

FIG. 12.7.1. A straight line (eqn (12.7.2)) fitted by the method of least
squares to the data in Table 12.7.1.

greater than could be reasonably expected if the population slope (β)
were zero.
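The estimates in eqn (12.7.2) and the entries of Table 12.7.2 can be checked numerically from the data of Table 12.7.1 (a Python sketch, not part of the original text):

```python
# Straight-line fit and analysis of variance for the data of Table 12.7.1.
xs = [-2, -1, 0, 1, 2, 3, 4]
ys = [1, 4, 6, 9, 11, 10, 15]
N = len(xs)

xbar = sum(xs) / N                     # 1.0
ybar = sum(ys) / N                     # 8.0
sxx = sum((x - xbar) ** 2 for x in xs)                         # 28
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))     # 59

b = sxy / sxx                          # 2.107
a = ybar                               # 8.000

ss_total = sum((y - ybar) ** 2 for y in ys)        # 132.000
ss_linear = sxy ** 2 / sxx             # 124.321
ss_dev = ss_total - ss_linear          # 7.679 on N - 2 = 5 d.f.
F = ss_linear / (ss_dev / (N - 2))     # about 80.95
```

The F ratio of about 80·95 on 1 and 5 d.f. is what gives P < 0·001 in the table.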
The least squares estimates given in (12.7.2) are â = ȳ = 8·000 and
b̂ = 2·107, calculated from eqns (12.2.6) and (12.2.7). If the values of x
and y from Table 12.7.1 are inserted in the expression for the sum of
TABLE 12.7.2

Source                       d.f.   SS        MS        F       P

Linear regression            1      124·321   124·321   80·95   <0·001
Deviations from linearity    5      7·679     1·536
Total                        6      132·000

squared deviations, S (eqn (12.2.5)), then S can be calculated for
various possible values of a and b. There are three variables here so
the results must be plotted as a three-dimensional graph. The most
convenient way to represent this on two-dimensional paper is to plot a
contour map, the contours representing values of S (the height), i.e.
the a- and b-axes are in the plane of the paper and the S-axis is sticking
up perpendicular to the paper. The result, calculated for the results in
Table 12.7.1 using eqn (12.2.5), is shown in Fig. 12.7.2. The graph is
seen to represent a valley with elliptical contours. The bottommost
point of the valley corresponds to â = 8·000 and b̂ = 2·107, i.e. the
least squares estimates already found.
It can now be shown why the value of b used in constructing Fig.
12.2.2 did not matter. In Fig. 12.7.3 sections across the valley are shown
for b = b̂ = 2·107, b = 3·0, and b = 3·4. The lines along which these
sections have been taken are shown in Fig. 12.7.2. It can be seen that
wherever the section is taken (i.e. whatever value b is held constant at),
the minimum in the curve occurs at the same place, viz. at a = ȳ
= 8·00. Similarly, if sections across the valley are taken at various
fixed a values (i.e. at 90° to the sections illustrated), each section will
give a plot of S against b, with slope ∂S/∂b. The minima (∂S/∂b = 0) of
these curves will clearly all be at b = b̂ = 2·107 whatever the value of a.
Clearly this independence of a and b arises because the axes of the
ellipses in Fig. 12.7.2 are at right angles to the coordinates of the
graph (the ellipses are said to be in canonical form).
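The sections across the valley can be imitated numerically (a Python sketch, not part of the original text): whatever value b is held at, the minimum of S over a falls at a = ȳ = 8·00.

```python
# With the line written as Y = a + b(x - xbar), every section across the
# valley (b held fixed) has its minimum at the same value of a, namely ybar.
xs = [-2, -1, 0, 1, 2, 3, 4]
ys = [1, 4, 6, 9, 11, 10, 15]
xbar = sum(xs) / len(xs)

def S(a, b):
    # sum of squared deviations, eqn (12.2.5)
    return sum((y - a - b * (x - xbar)) ** 2 for x, y in zip(xs, ys))

best_a = []
for b in (2.107, 3.0, 3.4):
    # scan a grid of a values from 4.00 to 12.00 for the section minimum
    grid = [a / 100 for a in range(400, 1201)]
    best_a.append(min(grid, key=lambda a: S(a, b)))
```

Because Σ(x - x̄) = 0, the term linking a and b vanishes and every section bottoms out at a = 8·0, exactly as in Fig. 12.7.3.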
The fundamental form of the straight-line equation is μ = α' + βx,

Digitized by Google
Rections taken along theae lines
[

5~------~----~--~r---~~~~~--~
o 2 3 3·4 "
b Values oU
FIG. 12.7.2. Contour map of the sum of squared deviations, S (on an &Xis
perpendicu1a.r to the paper), plotted against various values of a and b using eqn.
(12.2.5) and the data in Table 12.7.1 (plotted in Fig. 12.7.1). The contours for a
stra££lE£lEt in the fonn always have
ValUL'0:§ ililSkLked on the m%nimum value of S~
£Sues the least SqUiliLU as c'l = 8'000
,,£lEe valley, along are plotted in
lSk~uint on each llnL 12.7.3) is marlSk'0:§J

gle
where α' is the intercept, β the slope, and μ the population value of y.
Inserting the estimates of the parameters gives

Y = â' + b̂x    (12.7.3)

and, because (12.7.2) can be written as

Y = 5·893 + 2·107x,    (12.7.4)

FIG. 12.7.3. Sections across the valley along the lines indicated in Fig. 12.7.2.
The slope of the line, ∂S/∂a, is zero when S is at a minimum, as shown in Fig.
12.2.2. The value of S at the bottom of the valley is 7·679 as shown, and as found
in Table 12.7.2.

it is seen that â' = 5·893 and b̂ = 2·107. Comparison of (12.7.1) and
(12.7.3) shows that in general, as in § 12.2,

â' = â - b̂x̄.    (12.7.5)

It may be asked why (12.7.4) was arrived at indirectly, through the
seemingly more complicated form, (12.7.2). Why not apply the method
of least squares directly to (12.7.3)? The answer to this will become
clear when it is tried. The method of least squares will now be applied
to the straight line in the form of (12.7.3), in just the same way as it
was applied in § 12.2 to the straight line in the form of (12.7.1).
Denoting the observations y and the values calculated from (12.7.3)
as Y, as in § 12.2, gives the sum of squared deviations, which is to be
minimized, as

S = Σ(y - Y)² = Σ(y - a' - bx)²
= Σ(y² + a'² + b²x² - 2a'y - 2ybx + 2a'bx)
= Σy² + Na'² + b²Σx² - 2a'Σy - 2bΣyx + 2a'bΣx.    (12.7.6)
This is analogous to (12.2.5), but notice that this time the last term is
not zero. As in § 12.2, S is differentiated with respect to a', treating b as a
constant, giving

∂S/∂a' = 2Na' - 2Σy + 2bΣx,    (12.7.7)

and equating this to zero, to find the value of a' for which S is a minimum
(see Fig. 12.7.5), gives

Na' + bΣx = Σy.    (12.7.8)

The value of a' for which S is a minimum is no longer independent of b,
as shown by the presence of b in (12.7.8), the solution of which will
depend on the value of b chosen.
Differentiating (12.7.6) with respect to b, holding a' constant, gives

∂S/∂b = 2bΣx² - 2Σyx + 2a'Σx,    (12.7.9)

and again equating to zero gives

a'Σx + bΣx² = Σyx.    (12.7.10)

Again, unlike the result in § 12.2, the estimate of b is seen to depend on
the value of a'.
The required solutions for â' and b̂ are those for which (12.7.8) and
(12.7.10) are both true simultaneously. In fact, (12.7.8) and (12.7.10)
are a pair of (linear) simultaneous equations (known, in regression
analysis, as the normal equations), which can be solved for a' and b
by school-book methods giving (with the values of x and y in Table
12.7.1) â' = 5·893 and b̂ = 2·107 as found above.

What is the geometrical meaning of these results? If contours are
plotted from (12.7.6) (using the data in Table 12.7.1) the results are as
shown in Fig. 12.7.4.
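The school-book solution of the normal equations can be written out explicitly (a Python sketch, not part of the original text; the data are those of Table 12.7.1):

```python
# Solving the normal equations (12.7.8) and (12.7.10) by elimination.
xs = [-2, -1, 0, 1, 2, 3, 4]
ys = [1, 4, 6, 9, 11, 10, 15]

N = len(xs)                                # 7
Sx = sum(xs)                               # 7
Sy = sum(ys)                               # 56
Sxx = sum(x * x for x in xs)               # 35
Sxy = sum(x * y for x, y in zip(xs, ys))   # 115

# the pair of simultaneous equations:
#   N*a'  + Sx*b  = Sy
#   Sx*a' + Sxx*b = Sxy
# eliminate a' to get b, then substitute back
b = (N * Sxy - Sx * Sy) / (N * Sxx - Sx ** 2)    # 2.107
a_prime = (Sy - b * Sx) / N                      # 5.893
```

These are the same â' = 5·893 and b̂ = 2·107 obtained via the centred form (12.7.1), but here neither equation can be solved without the other.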

The contours are still elliptical, but their axes are no longer parallel
with the coordinates of the graph. When sections are made across the
valley at the values of b shown in Fig. 12.7.4, the results are as shown in
Fig. 12.7.5.

FIG. 12.7.4. Contour map of S (values marked on the contours) for the same
data as Fig. 12.7.2, but with the straight line fitted in the form Y = a' + bx. Sections
across the valley, along the lines shown, are plotted in Fig. 12.7.5. The lowest
point along each line (minima in Fig. 12.7.5) is marked ×.

The value of a' for which S is a minimum is seen to depend on the value
at which b was held constant when making the section across the valley,
as expected from (12.7.8). Of course, the slope of the curves in Fig.
12.7.5, ∂S/∂a', is zero at the minimum of each curve. But the only
point at which ∂S/∂b is simultaneously zero is at the bottommost
point of the valley in Fig. 12.7.4 (hence the simultaneous equations).
For example, on the curve for b = 3·4 in Fig. 12.7.5, S is at a minimum
(i.e. ∂S/∂a' = 0) at the point a' = 4·6. Inspection of Fig. 12.7.4 makes
it clear that if a section is made across the valley (at 90° to the sections
FIG. 12.7.5. Sections across the valley, along the lines shown in Fig. 12.7.4,
when a straight line is fitted in the form Y = a' + bx. The value of S at the bottom
of the valley is 7·679 as before.

in Fig. 12.7.5) at a' = 4·6, giving a plot of S against b (with slope
= ∂S/∂b), the minimum will not be at b = 3·4. That is to say, at the
point a' = 4·6, b = 3·4, ∂S/∂a' is zero but ∂S/∂b is not.

It is now clear that the effect of writing the straight line in the form
Y = a + b(x - x̄) is to make the estimates â and b̂ independent of each
other, so two simple independent equations (derived in § 12.2) can be
used for their estimation. If the line is written in the form Y = a' + bx,
then the estimates are no longer independent, but must be found by
solving simultaneous equations.

What does linear mean?

The term linear, as usually used by statisticians, embraces more
than the simple straight line. It includes any relationship of the form

Y = a + bx₁ + cx₂ + dx₃ + ...,    (12.7.11)

where x₁, x₂, x₃, ... are independent variables (see § 12.1; examples
are given below), and a, b, c, d, ... are estimates of parameters. This
relationship includes, as a special case, the straight line (Y = a + bx),
which has already been discussed at length. Equation (12.7.11) is des-
cribed as a multiple linear regression equation (the 'linear' bit is,
sad to say, often omitted). As well as describing straight-line
relationships for several variables (x₁, x₂, ...), (12.7.11) also includes,
for example, the parabola (or second degree polynomial, or quadratic),
Y = a + bx + cx², as the special case in which x₂ is the square of x₁.
(As discussed in § 12.1, an 'independent variable' in the regression
sense is simply one the value of which can be fixed precisely by the
experimenter; it does not matter that in this case x₁ and x₂ are not
independent in the sense of §§ 2.4 and 2.7 since their covariance is not
zero. All that is required is that the values of x₁, x₂, ... be known
precisely.) The parabola is not a straight line of course, but it is linear
in the sense that Y is a linear function (p. 39) of the parameters if the x
values are regarded as constants (they are fixed when the experiment is
designed). This is the sense in which 'linear' is usually used by
statisticians. It turns out that for (12.7.11) in general (and therefore
for the parabola), the estimates of the parameters are linear functions
of the observations. This has already been shown in the case of the
straight line, for which â = ȳ, and for which b̂ has also been shown
(eqn (12.4.1)) to be a linear function of the observations. This means
that the parameter estimates will be normally distributed if the
observations are, and the standard deviations of the estimates can be
found using (2.7.11). Also, if the parameter estimates are normally
distributed, it is a simple matter to interpret their standard deviations
in terms of significance tests or confidence limits. Furthermore, linear
problems (including polynomials) give rise to linear simultaneous
equations (like (12.7.8) and (12.7.10)) which are relatively easy to
solve (cf. § 12.8). They can be handled by the very elegant branch of
mathematics known as matrix algebra, or linear algebra (see, for example,
Searle (1966), if you want to know more about this). It is doubtless
partly the aesthetic pleasure to be found in deriving analytical solutions

in terms of matrix algebra that has accounted for the statistical litera-
ture being heavily dominated by empirical linear models with no
physical basis, and, much more dangerous, the widespread availability
of computer programs for fitting such models by people who do not
always understand their limitations (some of which are mentioned
below).

Polynomial curves

It does not change the nature of the problem if some x values in
(12.7.11) are powers of the others. Thus the general polynomial regression
equation

Y = a + bx + cx² + dx³ + ...    (12.7.12)

is still a linear statistical problem. Increasingly complex shapes can
be described by (12.7.12) by including higher powers of x. The highest
power of x is called the degree of the polynomial, so a straight line is a
first degree polynomial, the parabola is a second degree polynomial,
the cubic equation, Y = a + bx + cx² + dx³, is a third degree polynomial,
and so on. Just as a straight line can always be found to pass exactly
through any two points, it can be shown that a pth degree polynomial
can always be found that will pass exactly through any specified p + 1
points. Because of the linear nature of the problem discussed above,
polynomials are relatively easy to fit (especially if the x values are
equally spaced). Methods are given in many textbooks (e.g. Snedecor
equally spaced). Methods are given in many textbooks (e.g. Snedecor
and Cochran (1967, pp. 349-58 and Chapter 13); Williams (1959,
Chapter 3); Goulden (1952, Chapter 10); Brownlee (1965, Chapter 13);
and Draper and Smith (1966)) and will therefore not be repeated here.
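The claim that a pth degree polynomial passes exactly through any p + 1 points can be illustrated with a parabola (a Python sketch, not part of the original text; the three points are invented for illustration):

```python
# A parabola (p = 2) through three points: fitting Y = a + b*x + c*x**2
# is a linear problem, so the coefficients satisfy three linear
# equations, solved here by Gauss-Jordan elimination.
pts = [(0, 1), (1, 3), (2, 9)]

# each point gives one equation a + b*x + c*x**2 = y, written as a row
# [1, x, x^2 | y] of an augmented matrix
rows = [[1.0, float(x), float(x * x), float(y)] for x, y in pts]
for i in range(3):
    piv = rows[i][i]
    rows[i] = [v / piv for v in rows[i]]          # scale pivot row
    for j in range(3):
        if j != i:
            f = rows[j][i]                        # eliminate column i
            rows[j] = [vj - f * vi for vj, vi in zip(rows[j], rows[i])]

a, b, c = (rows[m][3] for m in range(3))
# the fitted parabola reproduces every point exactly
for x, y in pts:
    assert abs(a + b * x + c * x * x - y) < 1e-9
```

The pivots are non-zero here because the three x values are distinct; with repeated x values no interpolating polynomial of this degree exists.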
Although polynomials are the only sort of curves described in most
elementary books, they are, unfortunately, not of much interest to
experimenters in most fields. In most cases the reason for fitting a
curve is to estimate the values of the parameters in an equation based
on a physical model for the process being studied (for example the
Michaelis-Menten equation in biochemistry, which is discussed in
§ 12.8). Very few physical models give rise to polynomials, which are
therefore mainly used in a completely empirical way. In most situations
nothing more is learned by fitting an empirical curve (the parameters of
which have no physical meaning) than could be made obvious by drawing a
curve by eye. One possible exception is when the line is to be used for
prediction, for example a calibration curve, and an estimate of error is

required for the prediction (see § 13.14). In this case a polynomial
curve might be useful if the observed line was not straight.

Multiple linear regression

If, as is usually the case, the observation depends on several different
variables, it might be thought desirable to find an equation to describe
this dependence. For example, if the response of an isolated tissue
depended on the concentration of drug given (x₁, say), and also on the
concentration of calcium (x₂, say) present in the medium in which the
tissue was immersed, then the response, Y, might be described by a
multiple linear regression equation like (12.7.11), i.e.

Y = a + bx₁ + cx₂.    (12.7.13)

This implies that the relationship between response and drug concentra-
tion is a straight line at any given calcium concentration, and the
relationship between response and calcium concentration is a straight
line at any given drug concentration (so the three-dimensional graph of
Y against x₁ and x₂ is a flat plane). As already explained, x₁ could be
the log of the drug concentration, and x₂ could similarly be some
transformation of the calcium concentration, the transformation being
chosen so that (12.7.13) describes the observations with sufficient
accuracy. Even so the linear nature of (12.7.13) is a considerable
restriction on its usefulness. Furthermore all the assumptions described
in § 12.2 are still necessary here. The process of fitting multiple linear
regression equations is described, for example, by Snedecor and Cochran
(1967, Chapter 13); Williams (1959, Chapter 3); Goulden (1952,
Chapter 8); Brownlee (1965, Chapter 13); and Draper and Smith
(1966).
The really serious hazards of multiple linear regression arise when the
x values are not really independent variables in the regression sense
(see § 12.1), i.e. when they are not fixed precisely by the experimenter,
but are just observations of some variable thought to be related to Y,
the variable of interest. Data of this sort always used to be analysed
using the correlation methods described in § 12.9, but are now very
often dealt with using multiple regression methods. There is much to be
said for this as long as it is remembered that, however the results are
analysed, it is impossible to infer causal relationships from them (see
also §§ 1.2 and 12.9).

Consider the following example (which is inspired by one discussed
by Mainland (1963, p. 322)). It is required to study the number of
working days lost through illness per 1000 of population in various
areas of a large city. Call this number y. It is thought that this may
depend on the number of doctors per 1000 population (x₁) in the
area and the level of prosperity (say mean income, x₂) of the area.
Values of y, x₁, and x₂ are found by observations on a number of areas
and an equation of the form of (12.7.13) is fitted to the results. Even
supposing (and it is not a very plausible assumption) that such complex
results can be described adequately by a linear relationship, and that
the other assumptions (§ 12.2) are fulfilled, the result of such an
exercise is very difficult to interpret. Suppose it were found that areas
with more doctors (x₁) had fewer working days lost through illness (y).
(If (12.7.13) were to fit the observations this would have to be true
whatever the prosperity of the area.) This would imply that the co-
efficient b must be negative. Suppose it were also found that areas with
high incomes had few working days lost through illness (whatever
number of doctors were present in the area), so the coefficient c is also
negative. Inserting the values of a, b, and c found from the data into
(12.7.13) gives the required multiple regression equation. If x₁ in this
equation is increased y will decrease (because b is negative). If x₂ is
increased y will decrease (because c is negative). It might therefore be
inferred (and often is) that if more doctors were induced to go to an
area (increasing x₁), the number of working days lost (y) would decrease.
This inference implies that it is believed that the presence of a large
number of doctors is the cause of the low number of working days lost,
and the data provide no evidence for this at all. Whatever happens in
the equation, it is clear that in real life one still has no idea whatsoever
what will happen if doctors go to an area. The number of working days
lost might indeed decrease, but it might equally well increase. For
example, it might be that doctors are attracted to areas of the city
which are near to the large teaching hospitals, and that these areas also
tend to be more prosperous. It is quite likely, then, that most people in
these areas will do office jobs which do not involve much health hazard,
and this might be the real cause of the small number of working days
lost in such areas. Conversely, less prosperous areas, away from teaching
hospitals, where many people work at industrial jobs with a high
health hazard (and where, therefore, many working days are lost
through illness), attract fewer doctors. If the occupational health hazard
were the really important factor, then inducing more doctors to go to
an area might, far from decreasing the number of working days lost
according to the naive interpretation of the regression equation,

actually increase the number lost, because the occupational health
hazards would be unchanged, and the larger number of doctors might
increase the proportion of cases of occupational disease that were
diagnosed. Similarly it cannot be predicted what effect a change in the
prosperity of an area would have on the number of working days lost.
The regression equation describes (at most) only the observations;
it proves nothing at all but a correlation relationship. Statistical analysis of
survey data of this sort, in which the x values are correlated with
each other and with other variables that have not been included in the
regression equation because they have not been thought of or cannot
be measured, is the sort of thing for which it is possible to think up
half-a-dozen plausible explanations before breakfast. The only use of
such an analysis, apart from description, is that, if you are lucky,
the survey may provide hints about what sort of proper
experiments might be worth trying. The only way to find out whether
increasing the number of doctors in an area would decrease
the number of working days lost is to do an experiment. In a proper
experiment various numbers of doctors would
be allocated strictly at random (see § 2.3) to the areas being tested.
This point has been discussed already in Chapter 1.
Further discussions will be found in § 12.9, and in Mainland (1963,
p. 322). More quantitative descriptions of the hazards of multiple
regression will be found in Snedecor and Cochran (1967, pp. 462-4).

It is worth mentioning that the analysis of variance can
be written in the form of a multiple linear regression problem. Consider
for example the comparison of two treatments on two independent
samples of test objects (the problem discU88ed at length in Chapter 9).
It was pointed out in § 11.2 that in doing the analysis based on the
Gaussian distribution (i.e. Student's t test in the case of two samples)
it is assumed that the ith observation on the first treatment can be
represented as y_i1 = μ + τ_1 + e_i1, and that on the second treatment
as y_i2 = μ + τ_2 + e_i2 (see § 11.2), where μ is a constant and τ_j is a
quantity characteristic of the jth treatment. This model can be written
in the form of a multiple linear regression equation

    y_ij = μ + τ_1 x_1 + τ_2 x_2 + e_ij,    (12.7.14)

gle
§ 12.7
where x_1 is defined to have the value 1 for all responses to treatment
1 (j = 1) and 0 for all responses to treatment 2 (j = 2), and x_2 is 1 for
responses to treatment 2, and 0 for responses to treatment 1. Inserting
these values, (12.7.14) reduces to y_i1 = μ + τ_1 + e_i1 for treatment 1,
and to y_i2 = μ + τ_2 + e_i2 for treatment 2, exactly as in § 11.2. If the
estimates of τ_1 and τ_2 from the data are called b and c, and the estimate
of μ is called a, the estimated value for the ith response to the jth treatment
becomes Y = a + bx_1 + cx_2, identical with (12.7.13). The estimation of
treatment effects (values of τ) is the same problem as the estimation of
the regression coefficients. An intermediate-level discussion of this
approach will be found in the first (1960) edition of Brownlee's (1965)
book.
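The equivalence is easy to check numerically. The short sketch below (in Python rather than the Algol 60 used later in this chapter, and with invented responses purely for illustration) builds the two dummy variables and fits Y = a + bx_1 + cx_2 by least squares; the fitted values come out equal to the two treatment means, exactly as the model of § 11.2 requires.

```python
import numpy as np

# Invented responses for two treatments (two independent samples)
y1 = np.array([4.2, 5.1, 4.8, 5.5])      # treatment 1
y2 = np.array([6.9, 7.4, 6.5, 7.8])      # treatment 2
y = np.concatenate([y1, y2])

# Dummy variables as defined in the text: x1 marks treatment 1, x2 treatment 2
x1 = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])
x2 = 1.0 - x1
X = np.column_stack([np.ones(8), x1, x2])   # columns for a, b and c

# X is singular (x1 + x2 = 1), so a, b and c are not individually unique;
# lstsq returns one least squares solution, and the fitted values are unique
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

print(fitted[:4], y1.mean())   # fitted values for treatment 1 = mean of y1
print(fitted[4:], y2.mean())
```

Because x_1 + x_2 = 1 the design matrix is singular, so the individual coefficients are not unique, but the fitted values (and hence the estimated treatment difference) are.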

12.8. Non-linear curve fitting and the meaning of 'best' estimate
For the purposes of illustration, the problem of fitting the Michaelis-
Menten hyperbola† (or Clark equation, or Langmuir isotherm) will be
discussed. In biochemical terms the equation states that the velocity
of an enzyme-catalysed reaction is 𝒱x/(𝒦 + x), where x is the concentration
of substrate (the independent variable in the sense of § 12.1), and
the parameters (true or population values) of the equation are 𝒱 (the
maximum velocity, approached as x → ∞), and 𝒦, the Michaelis constant
(the substrate concentration necessary for half-maximum velocity;
if 𝒦 = x the velocity is 𝒱/2). The observed velocity, y say (the dependent
variable, see § 12.1), will differ from this by some error. If
V and K are estimates, from experimental results, of 𝒱 and 𝒦, the
estimated velocity of the reaction will be

    Y = Vx/(K + x).    (12.8.1)

The shape of this curve is shown in Fig. 12.8.1.
Notice that the parameters, 𝒱 and 𝒦, are not linearly related to Y,
so this is a non-linear problem in the sense defined in § 12.7.
There are many ways of estimating 𝒱 and 𝒦. The relative merits
of some of them will be considered below. First, the problem of finding
least squares estimates for non-linear models will be discussed. The

† The general formula for a hyperbola is (y − c1)(x − c2) = constant, where the
constants c1 and c2 are the asymptotes of the hyperbola. If c1 = V and c2 = −K,
rearranging (12.8.1) shows that (Y − V)(x + K) = −VK = constant, which has the same
form as the general formula.

problem of estimating the error of these estimates from the experimental
results is important but complicated, and it will not be considered here
(see Draper and Smith 1966). Oliver (1970) has given formulas for
calculating the asymptotic variances of V and K from the scatter of the
observations, s(y). If there were several observations (y values) at each
x then s(y) would be estimated from the scatter of these values 'within
[Figure 12.8.1: graph of velocity, y, against substrate concentration, x, showing the 'observed' points, the true hyperbola, and the least squares and Lineweaver-Burk fitted curves, with the true values 𝒱 and 𝒦 and their estimates marked.]

FIG. 12.8.1. Fitting the Michaelis-Menten hyperbola.
○ 'Observed' values from Table 12.8.1.
—— True (population) hyperbola (known only because the 'observations' were computer-simulated, not real; see discussion on p. 268). The population standard deviation is σ(y) = 1·0 at all x values.
—·— Least squares estimate of the population line found from the 'observed' values.
- - - Lineweaver-Burk (LB, or double reciprocal plot) estimate of the population line found from the same 'observations'.
The true values, 𝒱 and 𝒦, and their values estimated by the two methods (from Table 12.8.4) are marked on the graph.

:e values', but in the following example where there is only one observa-
tion at each :e, the best that can be done is to assume the population
curve follows (12.8.1), in which case the sum of squares of deviations
from the fitted curve, SmlD will be an estimate of r(lI). This is exactly
like the situation for a straight line discussed in § 12.6. The formulas
involve the population values -r and .1f' for which the experimental

Digitized by Google
--~----

§ 12.8 269
values V and K must be substituted. No allowance is made for the
uncertainty resulting from the use of sample values V, K, and 8(y) in
place of population values "/1",r, and a(y) so the formul808 are to some
extent optimistic. Using them is just like using the normal deviate "
instead of Student's' (see §§ 4.3 and 4.4).

Least squares estimates for non-linear models

The approach is exactly as in § 12.7. It is required to find the estimates
of the parameters that minimize the sum of the squares of the
deviations between the observed (y) and calculated (Y) velocities,
S = Σ(y − Y)². In this example these least squares estimates will be
denoted V̂ and K̂, as in § 12.7.
From (12.8.1),

    S = Σ(y − Y)² = Σ(y − Vx/(K + x))².    (12.8.2)
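The surface S(V, K) defined by eqn (12.8.2) is easy to explore numerically. The following sketch (in Python, used here as a modern stand-in for the chapter's Algol 60; the grid limits are arbitrary) evaluates S over a grid of V and K values. The population mean velocities of Table 12.8.1 are used as noise-free 'observations', so the position of the minimum, S ≃ 0 at 𝒱 = 30, 𝒦 = 15, is known beforehand.

```python
import numpy as np

x = np.array([2.5, 5.0, 10.0, 20.0, 40.0])           # substrate concentrations
y = np.array([4.2857, 7.5, 12.0, 17.1429, 21.8182])  # population means (Table 12.8.1)

def S(V, K):
    """Sum of squared deviations between observed and calculated velocities."""
    return float(np.sum((y - V * x / (K + x)) ** 2))

# Evaluate S over a grid covering the physically meaningful quadrant
Vs = np.linspace(10.0, 60.0, 101)                    # spacing 0.5
Ks = np.linspace(1.0, 40.0, 79)                      # spacing 0.5
grid = np.array([[S(V, K) for K in Ks] for V in Vs])
i, j = np.unravel_index(grid.argmin(), grid.shape)
print(Vs[i], Ks[j], grid[i, j])        # valley bottom lands at V = 30, K = 15

print(S(30.0, -2.4))   # close to the ridge at K = -2.5 the surface blows up
```

The last line illustrates the infinitely high ridges discussed below: as K approaches −x for any observed x, a denominator in (12.8.2) goes to zero and S explodes.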

If, as in § 12.7, this expression is differentiated first with respect to V
holding K constant (giving ∂S/∂V), and then with respect to K holding
V constant (giving ∂S/∂K), and the two derivatives are equated to zero,
the result is a pair of simultaneous equations (the normal equations)
that can be solved for V̂ and K̂, just as (12.7.8) and (12.7.10) could be
solved for a and b in § 12.7. The only snag is that in this case they are
non-linear simultaneous equations that cannot be solved by schoolbook
methods. Another difficulty is that there may well be (as in this
example) more than one set of solutions. The sort of difficulty that may
be encountered can be illustrated using a numerical example. The
figures in Table 12.8.1 represent the results of an enzyme kinetic
'experiment'.

Using these figures and eqn (12.8.2), contours for various S values
can be calculated and plotted against V and K as shown in Figs. 12.8.2(a)
and (b).

The contours are not simple ellipses like those found in § 12.7
(Figs. 12.7.2 and 12.7.4). The required solution is clearly the bottommost
point of the valley in Fig. 12.8.2(a) (where ∂S/∂V and ∂S/∂K
are simultaneously zero, see § 12.7), and it can be seen that this point
TABLE 12.8.1
Results of an enzyme kinetic 'experiment'. The population (true) velocities
are also given. They are known only because the 'experiment' was not
real, but was simulated on a computer, as discussed later in this section.

Substrate concentration (x)   'Observed' velocity (y)   Population mean velocity (μ)
         2·5                        6·678                      4·2857
         5·0                        7·282                      7·5000
        10·0                       12·621                     12·0000
        20·0                       18·183                     17·1429
        40·0                       28·219                     21·8182
[Figure 12.8.2(a): contour map of S against V and K in the positive quadrant, with the valley bottom at V̂ = 31·45, K̂ = 15·89 marked.]

FIG. 12.8.2(a). Fitting the Michaelis-Menten hyperbola. Contour map of the sum of squared deviations, S (on an axis perpendicular to the paper), against various values of K and V. This figure is analogous to Figs. 12.7.2 and 12.7.4, which referred to the fitting of a straight line. The values of S, calculated from eqn (12.8.2) using the observations in Table 12.8.1, are marked on the contours. (a) This covers the (physically important) positive values of V and K. The minimum value of S, 4·323, at the bottom of the valley corresponds to the least squares estimates V̂ = 31·45 and K̂ = 15·89.

FIG. 12.8.2(b). This shows the contour map in the region of negative (physically impossible) K values. There is seen to be a subminimum at V = 3·244 and K = −3·793, but this corresponds to S = 764·3, a far worse fit than the lowest minimum, S = 4·323.
The contours marked 1040 are actually for S = Σy² = 1040·4726. For values of S equal to or greater than this, the contour lines behave curiously.

corresponds to the least squares estimates V̂ = 31·45 and K̂ = 15·89
at the minimum value, S = 4·323. But the contours behave in a
curious and complicated fashion in the region of negative K values
shown in Fig. 12.8.2(b). There are infinitely high ridges at K = −2·5,
K = −5, etc., because at these points K + x in eqn (12.8.2) becomes
zero. The astonishing behaviour of the contour lines for high values of
S (≥ 1040·4726 = Σy²) can be seen in Fig. 12.8.2(b). The contours all
cross each other, and cross the infinitely high ridges at K = −2·5,
−5·0, etc. The points of intersection of the contours have curious
properties. The height, i.e. the value of S, at these points depends on
the direction from which the points are approached and, although
anyone who has climbed a mountain will feel that this fact is not
surprising, topographers might think that it was a warning against
pushing the geographical analogy too far.

There are, in fact, several solutions to the simultaneous 'normal
equations' in this case.† For example, there is another pit at the
point V = 3·244 and K = −3·793, shown in Fig. 12.8.2(b). Although
these values correspond to a minimum in S, the minimum is merely a
hollow in the mountain side. The value of S at this minimum, 764·3, is far
greater than the value of S at the least of all the minimums, 4·323, as
shown at the bottom of the valley in Fig. 12.8.2(a). If there are several
minimums then that with the smallest S, i.e. the best fitting curve, corresponds
to the least squares estimates. In this case (though not necessarily
in all problems) all of the subminimums correspond to negative values
of K that are physically impossible and can therefore be ruled out.

There are many methods of finding the least squares solutions (see,
for example, Draper and Smith (1966, Chapter 10), Wilde (1964)).
In almost all non-linear problems the solution involves successive
approximations (iteration). The procedure is to make a guess at the
solution and then to apply a method for correcting the guess to bring
it nearer to the correct solution. The method is applied repeatedly
until further corrections make no important difference. The final
solution should, of course, be independent of the initial guess. Geometrically,
the initial guess corresponds to some point on Fig. 12.8.2
(say V = 10, K = 2 for example). The mathematical procedure is
intended to proceed by steps down the valley until it reaches (sufficiently
nearly) the bottom, which corresponds to V = V̂ and K = K̂. One
method, which sounds intuitively pleasing, is to follow the direction of
steepest descent (which is perpendicular to the contours) from the initial
guess point to the minimum. However, inspection of Fig. 12.8.2 shows
that the direction of steepest descent often points nowhere near the
minimum. Furthermore, if the search for the minimum is started in the
precipitous terrain shown in Fig. 12.8.2(b), or if this region is reached
at some time during the search, the direction of steepest descent may be
completely misleading. Although this and other sophisticated methods
(see, e.g. Draper and Smith (1966, Chapter 10)) have had much success,
many people now favour simpler search methods which seem to be
rather more robust (see Wilde 1964). One such method which has proved
useful for curve fitting (Hooke and Jeeves 1961; Wilde 1964; Colquhoun
1968, 1969) will now be described.

† There is also, in general, the possibility of a saddle point or mountain pass, where a
minimum in the plot of S against one parameter coincides with a maximum in the plot
of S against the other. Such a point also satisfies the normal equations because both
derivatives are zero.

Patternsearch minimization

In Table 12.8.2 a computer program (an Algol 60 procedure) is

TABLE 12.8.2
Patternsearch procedure (in Algol 60) written by M. Bell (University of
London Institute of Computer Science), to whom I am grateful for permission
to reproduce it.
A Fortran IV version can be supplied on request.
For this procedure the following must be supplied:
k = number of variables on which the function to be minimized depends
bp[1:k] = basepoint, the initial guesses for the values of each variable (parameter
    estimate)
np[1:k] = newpoint, a real array
step[1:k] = initial step size for altering each variable in the search for better
    values
redfact[1:k] = step reduction factor for each variable (usually between 0·1 and
    0·5)
critstep[1:k] = smallest permissible step size for each variable. This controls the
    accuracy with which the minimum is located.
eps = half the smallest of the critsteps
eval = number of evaluations, an integer variable
evalim = maximum permissible number of evaluations of the function
pat = patternfactor (usually 1·0, but other values may help in some cases)
min = a real variable

The function to be minimized is declared as

real procedure function (P); real array P; (see Table 12.8.3 for an example)

On exit, after calling patternsearch,

min = minimum value of the function
np = values of the variables corresponding to the minimum (the least squares
    parameter estimates, for example)
eval = number of evaluations of the function during the search
procedure patternsearch (function, k, bp, np, step, redfact, critstep, eps, eval,
    evalim, pat, min); integer k, eval, evalim; real eps, pat, min;
real array bp, np, step, redfact, critstep; real procedure function;
begin real array move[1:k]; integer i, fails; real value, minstore;
    procedure explore;
    begin real home; integer j;
        fails := 0;
        for i := 1 step 1 until k do
        begin home := np[i]; j := 1;
AGAIN:      np[i] := home + step[i]; value := function (np); eval := eval + 1;
            if value < min then min := value
            else begin
                if j = 2 then begin np[i] := home; fails := fails + 1 end
                else begin step[i] := -step[i]; j := 2; goto AGAIN end
            end
        end
    end of explore;

    min := function (bp); eval := 1;
GO ON: for i := 1 step 1 until k do np[i] := bp[i];
TRY: explore;
    if fails = k then
    begin for i := 1 step 1 until k do
            if abs(step[i]) >= critstep[i] then goto CONT;
        goto EXIT;
CONT:   for i := 1 step 1 until k do step[i] := redfact[i] * step[i];
        goto TRY
    end;
    for i := 1 step 1 until k do move[i] := np[i] - bp[i];
PATTERNING: if eval > evalim then goto EXIT;
    for i := 1 step 1 until k do
    begin bp[i] := np[i]; np[i] := bp[i] + pat * move[i];
        if move[i] * step[i] < 0 then step[i] := -step[i]
    end;
    minstore := min; min := function (np); eval := eval + 1;
    explore;
    if min < minstore then
    begin for i := 1 step 1 until k do move[i] := np[i] - bp[i];
        for i := 1 step 1 until k do
            if abs(move[i]) > eps then goto PATTERNING
    end;
    min := minstore; goto GO ON;
EXIT:
end of patternsearch;

given that can be used to minimize any function, i.e. that will find
the values of the k variables (in the present example k = 2 variables,
viz. K and V) required to make the function (in the present example,
S given by (12.8.2)) a minimum. The procedure was written by Bell
on the basis of the work of Hooke and Jeeves (1961). The procedure
starts from the initial guess (basepoint) by trying steps (of specified
size) in each variable to see whether the function is reduced. The
size of the reduction is not taken into account. When a successful
pattern of moves has been found it is repeated, the step size increasing
while the moves are successful (i.e. while they reduce the function
value). When the function cannot be decreased any further the step
size is reduced (by a specified factor) and a further exploration carried
out. When the steps fall below a specified size the search terminates on
the assumption that a minimum has been found. Further details are
given by Wilde (1964).
Of course, if the surface has several pits patternsearch will locate only
one of them, which one depending on the initial guess, step sizes,
etc.
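For readers without an Algol 60 compiler, the logic just described can be sketched in a modern language. The Python below is a simplified paraphrase of the Hooke and Jeeves idea, not a line-for-line translation of Bell's procedure, and the convergence settings are arbitrary: explore each variable with a trial step, repeat a successful pattern of moves, and shrink the steps when nothing helps. Applied to noise-free Michaelis-Menten 'data' generated with 𝒱 = 30 and 𝒦 = 15, it should recover those values.

```python
import numpy as np

def pattern_search(f, start, step, redfact=0.2, critstep=1e-4, evalim=10000):
    """A simplified Hooke-Jeeves pattern search (a sketch, not Bell's code)."""
    base = np.asarray(start, float)
    step = np.asarray(step, float)
    fbase = f(base)
    evals = 1

    def explore(point, fpoint):
        """Try +/- the current step in each variable, keeping improvements."""
        nonlocal evals
        point = point.copy()
        for i in range(point.size):
            for s in (step[i], -step[i]):
                trial = point.copy()
                trial[i] += s
                ftrial = f(trial)
                evals += 1
                if ftrial < fpoint:
                    point, fpoint = trial, ftrial
                    break
        return point, fpoint

    while evals < evalim and np.any(np.abs(step) >= critstep):
        newpt, fnew = explore(base, fbase)
        while fnew < fbase and evals < evalim:   # a move succeeded: repeat the
            candidate = newpt + (newpt - base)   # pattern (doubled) move
            base, fbase = newpt, fnew
            fcand = f(candidate)
            evals += 1
            newpt, fnew = explore(candidate, fcand)
        step = step * redfact                    # nothing helped: shrink steps
    return base, fbase, evals

# Fit the Michaelis-Menten curve to noise-free 'data' on the true curve
x = np.array([2.5, 5.0, 10.0, 20.0, 40.0])
y = 30.0 * x / (15.0 + x)                        # population curve, V = 30, K = 15
S = lambda p: float(np.sum((y - p[0] * x / (p[1] + x)) ** 2))
est, Smin, evals = pattern_search(S, start=[2.0, 50.0], step=[1.0, 1.0])
print(est, Smin, evals)                          # close to V = 30, K = 15
```

As with the Algol original, the final point depends on the initial guess and step sizes if the surface has more than one pit.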
A typical procedure for calculating values of the function is shown in
Table 12.8.3. It calculates the sum of squared deviations (eqn (12.8.2))
for fitting the Michaelis-Menten equation. It incorporates a simple
device for preventing the search venturing into the craggy (and physically
impossible) region of negative V and K values.
When the patternsearch program was used for fitting the Michaelis-
Menten curve to the results in Table 12.8.1, a minimum of S = 4·32299
was found at V̂ = 31·45004 and K̂ = 15·89267 after 215 evaluations of
S (from Table 12.8.3) with various trial values of V and K. In this case
the initial guesses, bp in Table 12.8.2, were set to V = 2·0, K = 50·0,

TABLE 12.8.3
An Algol 60 procedure for calculating the function to be minimized for
fitting the Michaelis-Menten equation. The arrays containing the n
observations, y[1:n], and the n substrate concentrations, x[1:n], are
declared and read in before calling patternsearch. If the Boolean variable
constrained is set to true the search is restricted to non-negative values of
V and K

real procedure function (P); real array P;
begin integer i; real S, K, V, Ycalc;
    S := 0;
    if constrained then for i := 1, 2 do if P[i] < 0 then P[i] := 0;
    V := P[1]; K := P[2];
    for i := 1 step 1 until n do
    begin Ycalc := V * x[i]/(K + x[i]);
        S := S + (y[i] - Ycalc) ^ 2
    end;
    function := S
end of function;

step sizes were 1·0 for both V and K, the reduction factor was 0·2 for both
V and K, and critstep was 10⁻³ for both V and K. Patternfactor was
2·0. In another run, the same except that patternfactor was set to
1·0, virtually the same point was reached (S = 4·32299 at V̂ = 31·45019
and K̂ = 15·89286) after 228 evaluations of S (not quite as fast). If
the initial guesses were V = 1·0, K = 2·0 then again virtually the same
minimum (S = 4·32299 at V̂ = 31·45018 and K̂ = 15·89283) was
reached after 191 trial evaluations of S. On the other hand, if the initial
guesses are V = 2·5, K = −3·8 and the step sizes 0·01, then the program
locates the subminimum (S = 764·299 at V = 3·2443 and K =
−3·793) shown in Fig. 12.8.2(b), if not constrained.

Other uses for patternsearch

The program in Table 12.8.2 can be used for any sort of minimization
(or maximization) problem. It can, for instance, be used to solve any
set of simultaneous equations (linear or non-linear). If the n equations
are denoted f_i(x_1, ..., x_n) = 0 (i = 1, ..., n) then the values of x corresponding
to the minimum value of Σf_i² (which will be zero if the equations
have an exact solution) are the required solutions.
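As an illustration (with an invented pair of equations, and a crude search standing in for patternsearch), the system x + y = 3, xy = 2 can be solved by minimizing the sum of the squared residuals; the search settles on one of the two exact roots, (1, 2) or (2, 1), where the sum is zero.

```python
import numpy as np

# Solve x + y = 3 and x*y = 2 by minimizing the sum of squared residuals
def ssq(p):
    x, y = p
    return (x + y - 3.0) ** 2 + (x * y - 2.0) ** 2

# A crude coordinate search in the spirit of patternsearch: take steps of
# the current size whenever they lower the function, shrink when stuck
p = np.array([0.0, 0.0])
fval = ssq(p)
step = 1.0
while step > 1e-8:
    improved = False
    for i in (0, 1):
        for s in (step, -step):
            trial = p.copy()
            trial[i] += s
            ft = ssq(trial)
            if ft < fval:
                p, fval, improved = trial, ft, True
    if not improved:
        step *= 0.5
print(p, fval)   # one of the exact roots; here the residual sum reaches 0
```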

The meaning of 'best' estimate

The method of least squares has been used throughout Chapter 12
(and, implicitly, in earlier chapters). It was stated in § 12.1 that least
squares (LS) estimates have certain desirable properties (unbiasedness
and minimum variance; see below) in the case of linear (see § 12.7)
problems. It cannot automatically be assumed that least squares
estimates will be the best in the case of non-linear problems (and even if
they are best, they may not be so much better than others that it is
worth finding them if doing so is much more troublesome than the
alternatives). If the distribution of the observations is normal then
the method of least squares becomes the same as the method of maximum
likelihood (see Chapter 1) and this method gives estimates that
have some good properties. Maximum likelihood (ML) estimates,
however, are often biased, as in the case of the variance, for which the
maximum likelihood estimate is Σ(x − x̄)²/N, see § 2.6 and Appendix
1, eqn (A1.3.4). And in general ML estimates have minimum variance
only when the size of the sample is large (they are said to be asymptotically
efficient, meaning that as the size of the sample tends to infinity
the variance of the ML estimate is at least as small as that of any other
estimate). Such results for large samples (asymptotic results) are often
encountered but are not much help in practice because most experiments
are done with more or less small samples. There are few published
results about the relative merits of different sorts of estimates for
non-linear models when the estimates are based on small experiments.
Such knowledge as there is does not contradict the view that if the
errors are roughly constant (homoscedastic) and roughly normally
distributed, it is probably safest to prefer the LS estimates in the
absence of real knowledge. The ideas involved will be illustrated by
means of the Michaelis-Menten curve-fitting problem discussed above.

As in all estimation problems, there are many ways of estimating the
parameters (𝒱 and 𝒦 of (12.8.1) in the present example), given some
experimental results. And as usual all methods will, in general, give
different estimates. The methods most widely used in the Michaelis-
Menten case all depend on transformation of (12.8.1) to a straight line
(cf. § 12.6). The most widely used (and worst) method is the double
reciprocal plot (or Lineweaver-Burk plot). This depends on rearrangement
of (12.8.1) into the form

    1/Y = 1/V + (K/V)(1/x),    (12.8.3)

which shows that a plot of 1/Y against 1/x should be straight with
intercept 1/V and slope K/V. Such a plot is shown in Fig. 12.8.3.
A straight line has been fitted to the results by the simple (unweighted)
method of least squares described in § 12.5 (in laboratory practice
[Figure 12.8.3: plot of 100/y against 100/x for the 'observations', with the fitted and true straight lines.]

FIG. 12.8.3. Double reciprocal (or Lineweaver-Burk) plot (1/y against 1/x) for the 'observations' in Table 12.8.1. See also Table 12.8.4.
○ 'Observations'.
—— Straight line fitted (see text) to 'observations'. Intercept = 100/V = 100/22·58. Slope = K/V = 8·16/22·58.
···· True line corresponding to population mean velocities in Table 12.8.1 (i.e. 𝒱 = 30, 𝒦 = 15, see Table 12.8.4). Intercept = 100/30 = 3·33. Slope = 𝒦/𝒱 = 0·5.


usually either this is done, or a line is fitted by eye). From its slope and
intercept the estimates of V and K are found to be V = 22·58 and
K = 8·16.
Another method is based on the rearrangement of (12.8.1) in the
form

    y = V − K(y/x),    (12.8.4)

from which it is seen that a plot of y against y/x should be a straight
line with slope −K and intercept V. This plot is shown in Fig. 12.8.4.
Again a straight line was fitted using the method of § 12.5, in spite of
the fact that the abscissa, y/x, is not free of error as assumed in § 12.1
(because it now involves the observations, y). From the slope and
intercept of this line the estimates are found to be V = 25·76 and
K = 10·13.
The results of applying these various estimation methods to the
observations in Table 12.8.1 are compared in Table 12.8.4. They are not
very informative as they stand, but it will now be shown that they are
not untypical.

TABLE 12.8.4

                                            V        K
True population value                     30·00    15·00
Least squares estimate                    31·45    15·89
Lineweaver-Burk estimate (eqn (12.8.3))   22·58     8·16
y against y/x estimate (eqn (12.8.4))     25·76    10·13
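The algebra behind eqns (12.8.3) and (12.8.4) can be checked with a short computation. On noise-free 'observations' lying exactly on the curve with 𝒱 = 30 and 𝒦 = 15 every method must return the true values exactly; the methods only disagree, as in Table 12.8.4, once error is added. A sketch in Python (unweighted straight-line fitting as in § 12.5):

```python
import numpy as np

x = np.array([2.5, 5.0, 10.0, 20.0, 40.0])
y = 30.0 * x / (15.0 + x)   # noise-free 'observations': exactly on the true curve

def slope_intercept(u, v):
    """Unweighted least squares line v = a + b*u, as in § 12.5."""
    b = np.sum((u - u.mean()) * (v - v.mean())) / np.sum((u - u.mean()) ** 2)
    return v.mean() - b * u.mean(), b

# Lineweaver-Burk: 1/y = 1/V + (K/V)(1/x), eqn (12.8.3)
a, b = slope_intercept(1.0 / x, 1.0 / y)
V_lb, K_lb = 1.0 / a, b / a

# y against y/x: y = V - K(y/x), eqn (12.8.4)
a2, b2 = slope_intercept(y / x, y)
V_yx, K_yx = a2, -b2

print(V_lb, K_lb, V_yx, K_yx)   # all four recover the true values 30 and 15
```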

In fact the 'observations' in Table 12.8.1 were taken from a study in
which simulated† experiments were used to investigate various
methods of estimation under various conditions (Colquhoun 1969; cf.
Dowd and Riggs 1965). An 'experiment' was performed by picking at
random an observation from a normally distributed population known
to have the mean (μ) given in Table 12.8.1 (and plotted in Fig. 12.8.1),

† The simulation method avoids the mathematical difficulties of finding the distribution
of estimates, but the results are not very general. Fig. 12.8.5 would look different
for different sorts of error, different distributions for the observations and different
experimental designs (spacing and number of substrate concentrations, i.e. of x values).
and known to have a standard deviation σ(y) = 1·0 at every concentration
(i.e. the 'observations' were homoscedastic; see Fig. 12.2.3). The
'observations' were generated using computer methods. The observations
are thus known to be unbiased (their population means, μ, are
[Figure 12.8.4: plot of y against y/x for the 'observations', with the fitted and true straight lines.]

FIG. 12.8.4. Linearized plot using y against y/x.
○ 'Observations' from Table 12.8.1.
—— Straight line fitted (see text) to 'observations'. Intercept = V = 25·76, slope = −K = −10·13 (see Table 12.8.4).
···· True line corresponding to population mean velocities in Table 12.8.1, i.e. intercept = 𝒱 = 30, slope = −𝒦 = −15.

known to lie exactly on the calculated curve in Fig. 12.8.1) and, unlike
what happens in any real experiment, their distribution and population
means and standard deviations are known. Seven hundred and fifty
such 'experiments' were performed, and from each 'experiment'
estimates of V and K were calculated by five methods (three of which
have been mentioned above). The resulting 750 estimates of V and K
were grouped to form histograms. The distributions so obtained of the
estimates of 𝒱 are shown for three methods of estimation in Fig. 12.8.5.
[Figure 12.8.5: three histograms of the 750 estimates of 𝒱, one for each method, plotted as frequency against the estimate of 𝒱 (scale roughly 15 to 75), with the true value, 𝒱 = 30, marked.]

FIG. 12.8.5. Distributions of estimates of 𝒱 (true value 30·0) found using three methods, in 750 simulated experiments. Top: estimates from the plot of y against y/x (as shown for one 'experiment' in Fig. 12.8.4). Middle: double reciprocal (Lineweaver-Burk) plot (as shown for one 'experiment' in Fig. 12.8.3). Bottom: method of least squares.

The distributions of estimates of K are similar, which is expected in
the light of the finding that the estimates of V and K are highly correlated,
i.e. experiments that yield an estimate of V that is too high tend
to give an estimate of K that is too high also, whichever method
of estimation is used. Inspection of Fig. 12.8.5 shows that in this
particular case (the μ values shown in Table 12.8.1 and Fig. 12.8.1, with
normally distributed homoscedastic observations) the method of least
squares is in fact the best of the three methods. The LS estimates are
more closely grouped round the population value (𝒱 = 30·0) than the
estimates found by the other methods (i.e. they have the smallest
variance), and the average value of the LS estimates (viz. 30·4) is close
to the population value (i.e. they have little bias).

By comparison the Lineweaver-Burk method is clearly terrible,
the scatter of estimates being very much greater (near-infinite estimates
will be obtained when the plot in Fig. 12.8.3 goes nearly through the
origin, giving 1/V ≃ 0, and these distort the average estimate so much
that no realistic estimate of the bias is possible).

The plot of y against y/x falls in between these extremes. In spite of
breaking the rules for fitting straight lines by having error in the
quantity (y/x) plotted along the abscissa, the estimates are obviously
much less variable than those found by the Lineweaver-Burk method
(their standard deviation is only about 28 per cent greater than that of
the LS estimates in this case). The estimates from the y vs. y/x plot
are, however, clearly consistently too low: they have a negative bias.
The average of all 750 estimates is 28·0, well below the population
value of 30·0, and about 73 per cent of the estimates are too low (i.e. below
30·0). This bias is purely a property of the method of estimation. In these
simulated experiments the observations themselves were known to be
completely unbiased (a similar situation was seen in the case of the
standard deviation, see § 2.6 and Appendix 1). In real life there would
in addition be some unknown amount of bias in the observations themselves
(see §§ 1.2 and 7.2).

If, as is usually the case, experiments are repeated several times,
bias would be considered a more serious problem than large variance.
This is because the variance of an estimate can always be reduced by
doing a large enough number of experiments, whereas bias remains
however many experiments are averaged, and there is no way of
detecting the presence of bias from the results of repeated experiments.
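The flavour of these simulated experiments is easy to reproduce. The sketch below (in Python; the random seed is arbitrary, and only the y against y/x method is shown) draws 750 'experiments' with σ(y) = 1 around the population curve with 𝒱 = 30 and 𝒦 = 15, estimates 𝒱 from each by the y against y/x plot, and shows the negative bias described above: the mean estimate falls below 30, and well over half the estimates are too low.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
x = np.array([2.5, 5.0, 10.0, 20.0, 40.0])
mu = 30.0 * x / (15.0 + x)              # population means, V = 30, K = 15

def V_from_y_vs_yx(y):
    """Estimate of V (the intercept) from the y against y/x plot."""
    u = y / x
    b = np.sum((u - u.mean()) * (y - y.mean())) / np.sum((u - u.mean()) ** 2)
    return y.mean() - b * u.mean()

# 750 simulated 'experiments', each with sigma(y) = 1 at every x value
V_est = np.array([V_from_y_vs_yx(mu + rng.normal(0.0, 1.0, x.size))
                  for _ in range(750)])
print(V_est.mean(), (V_est < 30.0).mean())   # mean below 30: a negative bias
```

The bias appears even though every simulated observation is unbiased, because the 'independent' variable y/x contains the same error as y.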
These results are only valid for the particular conditions under which
they were obtained. In fact different results are obtained if the errors
are not constant or the observations not normally distributed (Dowd
and Riggs 1965). For example, if the observations are normally distributed
but heteroscedastic, i.e. they do not have the same standard
deviation at each x value, then it is found, in the case when the coefficient
of variation (standard deviation/mean) is the same at each
x value, that linear transformations give better estimates of 𝒱 and 𝒦 than
the least squares method (Colquhoun 1969). The only exception is the
Lineweaver-Burk plot, which is always awful.

Why are the Lineweaver-Burk estimates so bad?

The problem is mainly one of weighting. In fitting the straight line
to the plot of 1/y against 1/x, the dependent variable, 1/y, has been
treated as though it had constant variance (see §§ 12.1 and 12.2),
and if the straight line is fitted by eye rather than by the method of
§ 12.5, the result is usually much the same. In fact, in this example
y had constant variance (= 1·0 at every x value). The variance of 1/y
is therefore, from (2.7.14), approximately proportional to 1/μ⁴, very
far from constant. Inspection of Fig. 12.8.3 shows that in the particular
experiment illustrated the poor estimates were mainly the result of
the error in the top point of the graph (1/x = 0·4, x = 2·5). This observation
was somewhat too high (see Table 12.8.1), so 1/y is too low, and
this point has been given far too much weight in plotting the straight
line in Fig. 12.8.3. It has pulled the line down, distorting the parameter
estimates. From (2.7.14), and the values of μ in Table 12.8.1, it is seen
that the variance of 1/y at x = 2·5 is approximately var(y)/μ⁴ = 1·0/
(4·2857)⁴, and the variance of 1/y at the highest substrate concentration
(x = 40) is approximately 1·0/(21·8182)⁴, far more precise. Each
point should really have a weight inversely proportional to its variance
(see § 2.5), so the point for x = 40 (1/x = 0·025) should have (21·8182)⁴/
(4·2857)⁴ ≃ 670 times the weight of the point for x = 2·5 (1/x = 0·4),
not the equal weight it was given in Fig. 12.8.3. The impression that the
point for x = 2·5 has been given far too much importance in the
Lineweaver-Burk plot is confirmed. The correctly weighted Lineweaver-
Burk plot is quite satisfactory, but in real life the weights (population
variances) would not be known, so fitting it would be no less arithmetically
inconvenient than finding the LS estimates.
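The arithmetic of the weights can be checked directly. From (2.7.14), var(1/y) ≈ var(y)/μ⁴, so with σ(y) = 1 everywhere the appropriate weight for each point of the double reciprocal plot is proportional to μ⁴ (a small sketch; the μ values are those of Table 12.8.1):

```python
import numpy as np

x = np.array([2.5, 5.0, 10.0, 20.0, 40.0])
mu = 30.0 * x / (15.0 + x)        # population mean velocities, as in Table 12.8.1

# var(1/y) ~ var(y)/mu**4 (eqn (2.7.14)); with var(y) = 1 at every x, the
# weight (1/variance) of each Lineweaver-Burk point is proportional to mu**4
weights = mu ** 4
ratio = weights[-1] / weights[0]  # weight at x = 40 relative to x = 2.5
print(ratio)                      # about 670, as stated in the text
```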

12.9. Correlation and the problem of causality


So far in this chapter it has been assumed that the x variable (or
variables) can be fixed precisely by the experimenter. In many cases,
especially in the social and behavioural sciences, when often it is not
possible, or thought not to be possible, to do proper experiments (see
Chapter 1), two (or more) variables are measured, neither (or none)
of which can be fixed by the experimenter, or assigned by him to
particular individuals. Results of this sort are far more difficult to
interpret, and therefore far less satisfactory, than the results of proper
experiments as discussed in Chapter 1, but they are sometimes unavoidable.
Examples of the sort of questions usually treated by correlation
methods are (a) do people with good scores in school exams also have
[Figure 12.9.1: eight scatter diagrams, (a)-(h), showing various relationships between x and y, each labelled with its values of r_s and r.]

FIG. 12.9.1. Behaviour of the Spearman rank correlation coefficient, r_s, and the product-moment correlation coefficient, r, on various sorts of data. Clearly non-linearity can result in coefficients of almost any value even when there is a perfectly smooth relationship between x and y. In these small samples it can be seen from Table 12.9.2 that there is no evidence against the null hypothesis that the population value of r_s is zero in figures (d)-(h).

high scores in university exams? (b) are people who smoke a lot of
cigarettes more likely to die of lung cancer than those who smoke few?
(c) do parts of the country that have a large number of doctors per
1000 of population have more or fewer working days lost because of
illness than less well supplied areas? and so on. In each of these cases
there are two sets of figures (e.g. school and university exam scores for a
number of people) which can be plotted on a graph or scatter diagram
like those in Fig. 12.9.1. The tendency of one variable to increase (or
decrease) as the other variable increases can be measured by a correlation

coefficient. There are many different sorts of correlation coefficient, of
which two will be described briefly. For detailed descriptions of correla-
tion methods see, for example, Guilford (1954).
If a correlation is observed between two variables (A and B say),
and if it is large enough for it to be unlikely that it arose by chance,
then it can be concluded that
either (1) A causes B,
or (2) B causes A,
or (3) some other factor, directly or indirectly, causes both A and B,
or (4) an unlikely event has happened and a large correlation has
arisen by chance from an uncorrelated population (see § 6.1).

Usually there is no reason, other than the observer's prejudice, for


preferring one of these explanations to the others. As explained in
Chapter 1, the only way to choose between (1), (2), and (3) is to do a
proper experiment. For example, using the example already discussed
in § 12.7, if it were found that areas with more doctors (x) had fewer
working days (y) lost through illness, the relationship might be pre-
sented in the form of a correlation coefficient, which would be negative,
between x and y, or by fitting a curve to the graph of y against x. If a
straight line, Y = a+bx, was an adequate representation of the
observations the slope, b, would be negative. However, as mentioned in
§ 12.2, the least squares estimate (b) of the slope found by minimizing
Σ(y−Y)² will not be quite the same as the estimate found by using the
horizontal deviations from the line in Fig. 12.2.1 (i.e. treating x as the
dependent variable and minimizing Σ(x−X)²). Since there is no in-
dependent variable in this case it is not obvious which line to fit. This
problem is avoided with correlation coefficients, into which x and y
enter in a symmetrical way. The interpretation of the relationship,
however it is presented, is clearly very difficult because chosen numbers
of doctors were not allocated at random to selected areas. This has
already been discussed at length in § 12.7. As stated there, and in
Chapter 1, the only way out of the difficulty is to do a proper experi-
ment.

Correlation based on ranks. Spearman's coefficient, rs


This coefficient, like other methods based on ranks, does not depend
on assumptions about normal distributions or the straightness of lines.
And, like other correlation coefficients, a value of +1 corresponds to
perfect correlation between x and y, a value of 0 corresponds to no

correlation and a value of −1 corresponds to perfect negative correla-
tion (y decreasing as x increases). However, what is meant by 'perfect
correlation' is not the same for different coefficients (see Fig. 12.9.1).
In the case of the Spearman coefficient it means that the ranking of
individuals is the same for both criteria. As an example take the N = 6
pairs of observations shown in Table 12.5.1. These were analysed by
regression methods in § 12.5. They are reproduced in Table 12.9.1,
in which the ranks of the x and of the y values are given, and also
di = difference between ranks for the ith pair of observations. In this
case one variable might be a measure of the rarity of doctors in the ith
area, and the other variable a measure of the number of working days
lost through illness in that area.

TABLE 12.9.1

pair                       rank     rank
no. (i)     xi      yi     of xi    of yi      di     di²
   1       160      59       1        2        −1      1
   2       165      54       2        1        +1      1
   3       169      64       3        3         0      0
   4       175      67       4        4         0      0
   5       180      85       5        6        −1      1
   6       188      78       6        5        +1      1

Total     1037     407      21       21         0      4


The Spearman rank correlation coefficient, rs, is estimated using the
same formula (eqn. (12.9.3)) as used for the Pearson coefficient (see
below), but using the ranks rather than x and y themselves. It can be
shown (e.g. Siegel 1956) that the same answer is found more easily from

    rs = 1 − 6 Σ di² / N(N²−1),     (12.9.1)

where Σd² is the sum of the squares of the differences in rank for each
pair of observations (as shown in Table 12.9.1) and N = number of
pairs. From Table 12.9.1, N = 6 and Σd² = 4 so

    rs = 1 − (6×4)/6(36−1) = 0·886.
This is a less than perfect positive correlation, as expected. If the ranks
for y had been exactly the same as those for x, all the differences,
di, would have been zero, so it is obvious from (12.9.1) that rs would
have been +1. If the ranks for y had been in exactly the opposite order
to the ranks for x then rs would have been −1. And that is about all
that can be said. In no sense does a correlation coefficient (of any sort)
of 0·886 mean '88·6 per cent perfect correlation', and clearly rs does
not measure the slope of the line when the observations (or the ranks)
are plotted against each other as shown in Fig. 12.9.1, as rs can only
vary between +1 and −1. Some examples of the Spearman and
Pearson (see below) correlation coefficients calculated from particular
sets of observations are shown in Fig. 12.9.1, to give an idea of their
properties. It is obvious from this figure that far more information is to
be gained from plotting the graph than from calculating a correlation
coefficient.
Ties. Small numbers of ties can be given average ranks as in Chapters
8-10. For a description of the corrections necessary when there are
many ties see, for example, Siegel (1956).
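The calculation of rs from eqn. (12.9.1) can be verified with a short sketch in Python, using the data of Table 12.9.1 (the ranking code is written out for clarity, and assumes, as here, that there are no ties):

```python
# Spearman's rs for the data of Table 12.9.1, via eqn (12.9.1).
x = [160, 165, 169, 175, 180, 188]
y = [59, 54, 64, 67, 85, 78]
N = len(x)

def ranks(v):
    # rank 1 = smallest value; this simple version assumes no ties
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
rs = 1 - 6 * d2 / (N * (N ** 2 - 1))          # eqn (12.9.1)
print(d2, round(rs, 3))                       # 4 0.886
```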

Is it reasonable to suppose that the observed correlation arose by chance?


As usual this, put more precisely, means 'what is the probability that
a correlation coefficient differing from zero by as much as, or more than,
the observed value would be found by random sampling from an
uncorrelated population?' (see § 6.1). The exact probability can be
found in just the same sort of way as was used in Chapters 8-10. If
the observations are from an uncorrelated population each of the N!
possible rankings of y (permutations of the numbers 1 to N) would
have an equal chance of being observed in combination with a given
ranking of x. The probability of any particular ranking would therefore
be 1/N! so a correlation of +1 or −1 (when no more extreme values are
possible) will have P = 1/N! (one tail) or 2/N! (two tail, see Chapter
6). P can always be found by enumerating all N! possibilities and
seeing how many give rs equal to or larger than the observed value
(cf. Chapters 8-10). To save trouble, tables have been constructed
giving the critical values of rs corresponding to P (two tail) not more
than 0·1, 0·05, and 0·01. For samples up to N = 8 the values are shown
in Table 12.9.2.
In the present example N = 6 and rs = 0·886 so P = 0·05 (from
Table 12.9.2). For larger samples than 8 it is close enough to calculate

    t = r√[(N−2)/(1−r²)]     (12.9.2)
and refer the value of t found to tables (described in § 4.4) of Student's
t distribution with N -2 degrees of freedom. Equivalently, when N > 8,
rs can be referred to tables (e.g. Fisher and Yates, 1963, Table VII) of

TABLE 12.9.2
Critical values of rs. If the observed rs (taken as positive) is equal to or
larger than the tabulated value then P(two tail) is not more than the specified
value. Reproduced from Mainland (1963), by permission of author and
publisher.

Number of              P (two tail)
pairs, N        0·1       0·05      0·01

   4          1·000
   5          0·900     1·000
   6          0·829     0·886     1·000
   7          0·714     0·786     0·929
   8          0·643     0·738     0·881

critical values of Pearson's correlation coefficient. In this case t =
0·886√[(6−2)/(1−0·886²)] = 3·82 with 6−2 = 4 degrees of freedom.
Reference to tables of t (see § 4.4) gives P < 0·02, not a very good
approximation to the exact value (0·05) when N is as small as 6.
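The enumeration argument above, and the t approximation of eqn. (12.9.2), can both be carried out directly. A sketch in Python for the N = 6 example: all 6! = 720 rankings are enumerated, and the exact two-tail probability comes out somewhat below 0·05, the tabulated level at which 0·886 is the critical value.

```python
# Exact permutation test for Spearman's rs (N = 6 data of Table 12.9.1)
# and the t approximation of eqn (12.9.2).
from itertools import permutations
from math import sqrt, factorial

N = 6

def spearman(y_ranks):
    # rs from eqn (12.9.1), with the x ranks fixed at 1, 2, ..., N
    d2 = sum((i + 1 - r) ** 2 for i, r in enumerate(y_ranks))
    return 1 - 6 * d2 / (N * (N ** 2 - 1))

observed = spearman((2, 1, 3, 4, 6, 5))       # y ranks from Table 12.9.1

# two-tail P: fraction of the N! equally likely rankings giving |rs|
# at least as large as that observed
extreme = sum(1 for p in permutations(range(1, N + 1))
              if abs(spearman(p)) >= abs(observed) - 1e-12)
p_exact = extreme / factorial(N)

t = observed * sqrt((N - 2) / (1 - observed ** 2))   # eqn (12.9.2)
print(round(observed, 3), round(p_exact, 4), round(t, 2))   # 0.886 0.0333 3.82
```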

Linear correlation. Pearson's product moment correlation coefficient (r)


If x and y are both normally distributed† (see Chapter 4) the closeness
with which points cluster round a straight line is measured by Pearson's
product moment correlation coefficient, r. This measure has been met
already in § 10.7. The population value of r is estimated by

    r = Σ(y−ȳ)(x−x̄) / √[Σ(y−ȳ)².Σ(x−x̄)²]

      = cov(y,x) / √[var(y).var(x)].     (12.9.3)

The second form follows from the definition of variance and covariance
((2.6.2) and (2.6.6)). It was shown in § 2.6 that the covariance measures
the extent to which y increases as x increases. Pearson's r will be 1

† It is actually assumed that x and y follow a bivariate normal distribution (see, for
example, Mood and Graybill (1963), p. 198).

(or −1) only if the points lie exactly on a straight line as shown in
Fig. 12.9.1. The relationship between x and y may be perfectly pre-
dictable and yet have a low correlation coefficient if the relation is not
a straight line, as illustrated in Fig. 12.9.1 (c), (d), and (g). The informa-
tion to be gained from r is therefore limited.
Using the results in Table 12.5.1 and Table 12.9.1 as an example
once again, r can be estimated easily because the sums of squares and
products have already been calculated in § 12.5. Inserting their
values in (12.9.3) gives

    r = 511·833 / √(526·833 × 682·833) = 0·853,

a fairly large positive correlation. Its interpretation has been discussed
above.
To find what the probability of observing a Pearson correlation
coefficient as large as or larger than 0·853 would be, if the observations
were randomly selected from a normal population with zero correlation,
the procedure is to calculate t using (12.9.2). The value of t is referred
to the tables of Student's t distribution (described in § 4.4) with N−2
degrees of freedom where N is the number of pairs of observations.
In the present example N = 6 so

    t = 0·853√[(6−2)/(1−0·853²)] = 3·27.

Consulting the tables with 6−2 = 4 degrees of freedom shows that
the required probability is between P = 0·05 (corresponding to
t = 2·776) and P = 0·02 (corresponding to t = 3·747). This is low enough
to make one a little suspicious of the null hypothesis that the population
correlation is zero. (The inference was, for practical purposes, the
same when Spearman's coefficient was calculated using ranks.) The
same result can be obtained, without calculation, from tables of critical
values of r (e.g. Fisher and Yates (1963, Table VII)).
A little bit of algebra shows that the test of the hypothesis that the
population correlation coefficient is zero is identical with the test (in § 12.5)
that the population slope (regression coefficient) is zero. The value of
t just found is the same as that found at the end of § 12.5, and t² = 3·27²
= 10·7 is the value of F found in the analysis of variance of the observa-
tions shown in Table 12.5.2.
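The whole Pearson calculation for this example, including the identity t² = F noted in the last paragraph, can be checked numerically with a short Python sketch using the data of Table 12.9.1:

```python
# Pearson's r (eqn 12.9.3), the t test (eqn 12.9.2), and the identity
# t**2 = F for the data of Table 12.9.1.
from math import sqrt

x = [160, 165, 169, 175, 180, 188]
y = [59, 54, 64, 67, 85, 78]
N = len(x)
mx, my = sum(x) / N, sum(y) / N

sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # 511.833
sxx = sum((a - mx) ** 2 for a in x)                    # 526.833
syy = sum((b - my) ** 2 for b in y)                    # 682.833

r = sxy / sqrt(sxx * syy)
t = r * sqrt((N - 2) / (1 - r ** 2))
print(round(r, 3), round(t, 2), round(t ** 2, 1))      # 0.853 3.27 10.7
```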

13. Assays and calibration curves

'Il est vrai que certaines paroles et certaines cérémonies suffisent pour faire
périr un troupeau de moutons, pourvu qu'on y ajoute de l'arsenic.'†
VOLTAIRE 1771
(Questions sur l'Encyclopédie: 'Enchantement')

† 'Incantations will destroy a flock of sheep if administered with a certain
quantity of arsenic.'
(Translation: GEORGE ELIOT, Middlemarch, Chap. 17)

13.1. Methods for estimating an unknown concentration or potency
THE process of estimating an unknown concentration will be referred
to as an assay. All biological assays and most chemical assays depend
on comparison of the unknown substance with a standard so the
principles involved in both chemical and biological assays are the same.
The objects are to obtain (a) the 'best' (usually least squares, see
§§ 12.1, 12.2, 12.7, and 12.8) estimate of the unknown concentration,
(b) confidence limits for its true value, and (c) to test as many as
possible of the assumptions involved in the assay. Unfortunately
almost all the methods used involve the assumption of a Gaussian
(normal) distribution (see § 6.2). As usual it is no exaggeration to say
that there is rarely any reason to believe that this assumption is correct
so the results must be interpreted with caution as indicated in §§ 4.2, 4.6,
and 7.2. A detailed account of biological assay will be found in Finney
(1964), whose notation has been used in most places to make this
standard reference book as accessible as possible.
This chapter is pretty solid and it may help to go through the
numerical examples in §§ 13.11-13.15 before looking at the theory in
§§ 13.2-13.10. The object of the theoretical part is to derive the
formulas used in parallel line assays using simple algebra only. This
means putting all the steps in, avoiding 'evidently' and 'it is obvious
that'. One result is that the theoretical part is rather long and, by
mathematicians' standards, inelegant. Another result, I hope, is that
the basis of the analysis of parallel line assays is made available to

those who, like me, prefer to have the argument laid out in words of one
syllable.
The experimental designs according to which the various concentra-
tions of standard and unknown substance can be tested are discussed
at the end of this section.
All the methods to be discussed involve the assumption, which may
be tested, that the relationship between the measurement (y, e.g.
response) and the concentration (x) is a straight line. Some transforma-
tion of either the dependent variable, y, or the independent variable, x,
may be used to make the line straight. The effects of such transforma-
tions are discussed in § 12.2. In biological assay the transformed response
is called the response metameter (i.e. the measure of response used for
calculations) and the transformed concentration or dose is called the
dose metameter. Of course the response metameter may be the response
itself, when, as is often the case, no transformation is used.
Furthermore, all the methods to be discussed assume that the
standard and unknown behave as though they were identical, apart
from the concentration of the substance being assayed. Such assays are
called analytical dilution assays. When this condition is not fulfilled
the assay is called a comparative assay. Comparative assays occur
when, for example, the concentration of one protein is estimated
using a different protein as the standard, or when the potency of a
new drug relative to a different standard drug is wanted. (Relative
potency means the ratio of the concentrations or doses required to
produce the same response.) One difficulty with comparative assays is
that the estimate of relative concentration or potency may not be a
constant, i.e. independent of the response level chosen for the comparison,
so when a log dose scale is used the lines will not be parallel (see below).

Calibration curves
Chemical assays are often done by constructing a calibration curve,
plotting response metameter (e.g. optical density) against concentration
of standard. The concentration corresponding to the optical density
(or whatever) of the unknown solution is then read off from the calibra-
tion curve. This sort of assay is discussed in § 13.14.
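Reading an unknown off a calibration curve amounts to fitting the standard line by least squares and then inverting the fitted line. A minimal Python sketch with invented figures (the concentrations and optical densities are made up for illustration, and no allowance is made for the error of the inverse estimate, which is treated in § 13.14):

```python
# Fit the calibration line by least squares, then read the unknown
# concentration off the fitted line. All figures are invented.
concs = [0.0, 1.0, 2.0, 3.0, 4.0]        # standard concentrations
od    = [0.02, 0.21, 0.39, 0.62, 0.80]   # measured optical densities

n = len(concs)
mx, my = sum(concs) / n, sum(od) / n
b = (sum((x - mx) * (y - my) for x, y in zip(concs, od))
     / sum((x - mx) ** 2 for x in concs))     # least squares slope
a = my - b * mx                               # intercept

od_unknown = 0.50                             # reading for the unknown
conc_unknown = (od_unknown - a) / b           # inverse of Y = a + b x
print(round(conc_unknown, 2))                 # 2.47
```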

Continuous (or graded) and discontinuous (or quantal) responses


In chemical assays the 'response' is nearly always a continuous
variable (see §§ 3.1 and 4.1), for example volume of sodium hydroxide
or optical density. In biological assays this is often the case too.

For example the tension developed by a muscle, or the fall in blood
pressure, is measured in response to various concentrations of the
standard and unknown preparations. Assays based on continuous
responses are discussed in this chapter. Sometimes, however, the
proportion of individuals, out of a group of n individuals, that produce
a fixed response is measured. For example 10 animals might be given a
dose of drug and the number dying within 2 hours counted. This
response is a discontinuous variable: it can only take the values
0, 1, 2, ..., 10. The method of dealing with such responses is con-
sidered in Chapter 14, together with the closely related direct assay in
which the dose required to produce a fixed response is measured.
One of the assumptions involved in fitting a straight line by the
methods of Chapter 12, discussed in § 12.2, is the assumption that the
response metameter has the same scatter at each x value, i.e. is homo-
scedastic (see Fig. 12.2.3). This is usually assumed to be fulfilled for
assays based on continuous responses (it should be tested as described
in § 11.2). In the case of discontinuous (quantal) responses there is
reason (see Chapter 14) to believe that the homoscedasticity assumption
will not be fulfilled, and this makes the calculations more complicated.

Parallel line and slope ratio assays


In the case of the calibration curve described in § 13.14 the abscissa is
measured in concentration (e.g. mg/ml or molarity). It is usual in
biological assays to express the abscissa in terms of ml of solution
(or mg of solid) administered. In this way the unknown and standard
can both be expressed in the same units. The aim is to find the ratio
of the concentrations of the unknown and standard, i.e. the potency
ratio R.

    R = concentration of unknown / concentration of standard
      = amount of standard for given effect (zS) /
        amount of unknown for same effect (zU).     (13.1.1)

For example, if the unknown is twice as concentrated as the standard
only half as much, measured in ml or mg, will be needed to produce
the same effect, i.e. to contain the same amount of active material.
See also § 13.11.
Suppose it is found that the response metameter y, when plotted
against the amount or dose, in ml or mg, gives a straight line. Obviously

the response should be the same (zero, or control level) when the
dose of either the standard or unknown preparation is zero. The
straight line for standard can be written yS = a+bS zS, where bS is
the slope, zS the dose (amount) of standard, and a the response to
zero dose (zS = 0); similarly for the unknown yU = a+bU zU, the
response to zero dose being a, as for the standard. When yS = yU
it follows that a+bS zS = a+bU zU so the potency ratio, from (13.1.1),
is R = zS/zU = bU/bS, the ratio of the slopes of the lines, as illustrated
in Fig. 13.1.1(a). An assay in which the abscissa is the dose or amount
of substance is therefore called a slope ratio assay (cf. § 13.14). This
sort of assay is described in detail by Finney (1964).

[Fig. 13.1.1: two graphs of response metameter. (a) Against dose or amount (z): straight lines for standard (slope = bS) and unknown (slope = bU) through a common point at zero dose. (b) Against log dose (x): parallel lines for standard and unknown, separated horizontally by xS − xU = log R.]

FIG. 13.1.1. (a) Slope ratio assay. Response metameter plotted against dose.
(b) Parallel line assay. Response metameter plotted against log dose. See text for
discussion.
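The result R = bU/bS for the slope ratio assay can be illustrated numerically. A Python sketch with invented, error-free responses (so the separately fitted least squares slopes are exact; a real slope ratio assay would fit both lines with a common intercept, as described by Finney (1964)):

```python
# Slope ratio assay: R = bU / bS (the lines share the response a at
# zero dose). The doses and responses are invented and error-free, so
# the least squares slopes come out exactly.
def slope(z, y):
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    return (sum((a - mz) * (b - my) for a, b in zip(z, y))
            / sum((a - mz) ** 2 for a in z))

z = [1.0, 2.0, 3.0, 4.0]                  # doses (ml, say)
y_std = [2.0 + 1.5 * v for v in z]        # standard:  yS = a + bS zS
y_unk = [2.0 + 3.0 * v for v in z]        # unknown:   yU = a + bU zU

R = slope(z, y_unk) / slope(z, y_std)     # potency ratio, eqn (13.1.1)
print(R)                                  # 2.0: unknown twice as concentrated
```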

Consider now what happens if it is found empirically that, in order
to obtain straight lines, the response metameter must be plotted against
the logarithm of the dose, x = log z say. The ratio of doses required to
produce any arbitrary constant effect Y, in Fig. 13.1.1(b) is again the
potency ratio zS/zU from (13.1.1). Now from Fig. 13.1.1(b) the horizontal
distance between the two lines is xS−xU = log zS−log zU = log(zS/zU)
= log R. So the horizontal distance is the log of the potency
ratio, and because (for analytical dilution assays, see above) the potency
ratio (R) is a constant, the horizontal distance between the lines (log R)
must also be a constant. This will be so whether or not the lines are
straight (the argument has not involved the assumption that they are),
but when they are straight it implies that they will be parallel. Assays
in which the abscissa is on a logarithmic scale are therefore called
parallel line assays. The reason for using a logarithmic dose scale is to
produce a straight line. Parallelism is a consequence of using the log-
arithmic scale (see § 12.2 also). Another consequence of using the
logarithmic dose scale is that the ratio between doses is usually kept
constant so that the interval between the log doses will be constant.
The spacing of the doses is, of course, a consequence of using a log-
arithmic scale, and not a reason for using it as is sometimes implied.
Furthermore, the range covered by the doses has nothing to do with the
scale chosen. A wide range can be accommodated just as easily on an
arithmetic scale as on a logarithmic scale.
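That the horizontal separation of the parallel lines is log R can be checked with invented, error-free data (the potency ratio, slope, and doses below are made up for illustration; real data would need the full fitting procedure of the following sections):

```python
# Parallel line assay: on the log dose scale the unknown's line is the
# standard's line shifted horizontally by log R.
from math import log10

R = 4.0                                   # true potency ratio (invented)
b = 2.5                                   # common slope on the log10 scale
a_std = 10.0                              # intercept of the standard line

doses = [1.0, 2.0, 4.0, 8.0]
xs = [log10(z) for z in doses]
y_std = [a_std + b * x for x in xs]
y_unk = [a_std + b * (x + log10(R)) for x in xs]   # shifted by log R

# recover log R from the two (error-free) lines: difference of
# intercepts divided by the common slope
log_R = ((y_unk[0] - b * xs[0]) - (y_std[0] - b * xs[0])) / b
print(round(10 ** log_R, 6))              # 4.0
```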
A similar situation arises in pharmacological studies when the log
dose-response curve is plotted in the presence and absence of a drug
antagonist. The parallelism of the lines can be tested as described in
the following sections. If they are parallel the potency ratio can be
estimated. In this context the potency ratio is the ratio of the doses of
drug required to produce the same response in the presence and absence
of antagonist, and is called the dose ratio.
The rest of this chapter, except for § 13.14, will deal with parallel
line assays with continuous responses. Sections 13.2-13.10 deal with
the theory and numerical examples are worked out in §§ 13.11-13.16.

Types of parallel line assays


In biological assays, when the response, y, is plotted against log
dose, x, the line is usually found to be sigmoid rather than straight. But
it is often sufficiently nearly straight over a central portion for the
assumption to produce negligible error.
It is convenient to classify assays according to the number of dose
levels of each preparation used. If kS dose levels of the standard
preparation are used, and kU of the unknown, the assay is described as a
(kS+kU) dose assay. The properties of various types are, briefly, as
shown in Table 13.1.1.
The tests of validity possible in a (2+2) dose assay will now be con-
sidered in slightly more detail before starting on the theory of parallel
line assays. It is intuitively plausible that the following tests can be
done (see § 13.7 for details).
(1) For slope (i.e. due to linear regression, see § 12.3). The null

hypothesis that the slope of the response-log dose curve is zero is
tested. Obviously the assay is invalid unless it can be rejected. Possible
reasons for an increase in dose not causing an increase in response are
(a) insensitive test object; (b) doses not sufficiently widely spaced, or
(c) responses all supramaximal.
(2) For difference between standard and unknown preparations, i.e.
is the average response to the standard different from that to the

TABLE 13.1.1

Number of doses of
Std (kS)   Unknown (kU)

1          1            The responses to the two doses must be exactly
                        matched. If not too much exactness is demanded
                        this may be possible to achieve once, but a single
                        match would allow no estimate of error. If the
                        doses were given several times it is most improb-
                        able that the means would match and so no
                        result could be obtained. This matching assay is
                        therefore unsatisfactory.

2          1            The response-log dose line for the standard can
                        be drawn with two points, if it is already known
                        to be straight, and the dose of standard needed to
                        produce the same response as the unknown can
                        be interpolated. Error can be estimated. The
                        assumption that the slope of the line is not zero
                        can be tested but the assumptions of linearity and
                        parallelism cannot. (See § 13.15.)

2          2            The (2+2) dose assay is better because, in addition
                        to being able to test the slope of the dose response
                        lines, their parallelism can be tested (see Fig.
                        13.1.2). It is still necessary to assume that they are
                        straight.

3          3            With a (3+3) dose assay the assumptions of
                        non-zero slope, parallelism, and linearity can all
                        be tested.

test preparation? This is not usually of great interest in itself though
it helps precision if ȳS and ȳU are not too different (see § 13.6). It will
be seen later that this test emerges as a side effect of doing tests (1) and
(3).
(3) Deviations from parallelism. The null hypothesis that the true
(population) slopes for standard and test preparation are equal is
tested. If this hypothesis is rejected the assay must be considered in-
valid. In an analytical dilution assay the most probable cause of non-
parallelism is that one of the preparations is off the linear part of the
log dose-response curve. This is shown in Fig. 13.1.2.

[Fig. 13.1.2: two graphs, (a) and (b), of response metameter against log dose (x), each showing a pair of response-log dose curves for standard and unknown.]

FIG. 13.1.2. Apparent deviations from parallelism can result when some
doses are not on the straight part of the dose response curve, as shown in (b),
even when the horizontal distance between the two curves is constant.
Circles: observations. Dashed line: straight line fitted to observations.
Solid line: true response-log dose curve.

Symmetrical parallel line assays


In the following section it will become obvious that the calculations
can be very greatly simplified when the assay is symmetrical. In this
context symmetry means that the assay has (a) the same number of
dose levels of each preparation, either (2+2) or (3+3) usually, (b) each
dose is administered the same number of times, (c) the ratios between
all doses are equal, and the same for both standard and unknown,
i.e. the intervals between doses are equal on the logarithmic scale.
These conditions are summarized precisely in eqns. (13.8.1).

Designs for the administration of standard and unknown


Any of the usual experimental designs, some of which were described
in Chapter 11, may be used. The various concentrations of standard and
unknown are the treatments. See also § 13.8.

For example in a (3+3) dose assay there are 6 different solutions,
each of which is to be tested several (say n) times. The 6n tests may be
done in a completely random fashion as described in § 11.4. If each dose
is tested on a separate animal this means allocating the 6n doses to
6n animals strictly at random (see §§ 2.3 and 11.4). Often all observa-
tions are made on the same individual (e.g. the same spectrophotometer
or the same animal). In this case the order in which the 6n tests are
done must be strictly random (see § 2.3), and, in addition, the size of a
response must not be influenced by the size of previous responses (see
discussion of single subject assays below).
If, for example, all 6n responses could not be obtained on the same
animal, it might be possible to obtain 6 responses from each of n
animals, the animals being blocks as described in § 11.6. Examples of
assays based on randomized block designs (see § 11.6) are given in
§§ 13.11 and 13.12. A second source of error could be eliminated by
using a 6 × 6 Latin square design (this would force one to use n = 6
replicate observations on each of the 6 treatments). However it is
safer to avoid small Latin squares (see § 11.8).
If the natural blocks were not large enough to accommodate all the
treatments (for example, if the animals survived long enough to receive
only 2 of the 6 treatments), the balanced incomplete block design could
be used. References to examples are given in § 11.8 (p. 207).
The analysis of assays based on all of these designs is done using
Gaussian methods. Many untested assumptions are made in the analysis
and the results must therefore be treated with caution, as described in
§§ 4.2, 4.6, 7.2, 11.2, and 12.2. In particular, the estimate of the error
of the result is likely to be too small (see § 7.2).
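The completely randomized allocation described above can be done with a table of random numbers (§ 2.3) or by machine. A Python sketch for a (3+3) assay with n = 4 replicates (the treatment labels are illustrative, and the seed is fixed only to make the example reproducible):

```python
# Completely randomized allocation (§ 11.4): each of the 6 treatments
# of a (3+3) dose assay, replicated n times, is assigned to one of the
# 6n animals strictly at random.
import random

random.seed(1)                      # fixed only so the example is reproducible
n = 4                               # replicates of each treatment
treatments = ['S1', 'S2', 'S3', 'U1', 'U2', 'U3']

allocation = treatments * n         # the 6n doses to be given
random.shuffle(allocation)          # random order = allocation to animals 1..6n
print(allocation)
```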

Single subject assays


Assays in whioh all the doses are given, in random order, to a single
animal or preparation (e.g. in the example in § 13.11, a single rat
diaphragm) are partioularly open to the danger that the size of a
response will be affected by the size of the preceding responses(s).
Contrary to what is sometimes said, the faot that responses are evoked
in random order does not eliminate the requirement that they be
independent of each other. Special designs have been worked out to
make the a.llowance for the effect of one response on the next, but it is
neoeuary to a.ssume an arbitrary mathematical model for the inter-
action so it is muoh better to arrange the a.ssa.y so &8 to prevent the
effect of one dose on the next (see, for example, Colquhoun and Tattersall

Digitized by Google
113.1 287
(1969). If the doeea have to be well separated in time to prevent inter-
action it may not be poaaible to give all the treatments to one subject,
80 an incomplete block design may have to be used (see § 11.8 and; for
example, Colquhoun (1963». The problem is discussed by Finney
(1964, p. 291).

13.2. The theory of parallel line assays. The response and dose
metameters

Response metameter (y)


The object is to transform the response so that it becomes normally
distributed, homoscedastic, and produces a straight line when plotted
against log dose (see §§ 11.2, 12.2, p. 221, and 13.1). In many cases the
response itself is used. A linear transformation of the response, of the
form y = c1 + c2 (response), where c1 and c2 are constants, may be used
to simplify the arithmetic. This will not affect the distribution,
scedasticity, or linearity. For example, in § 11.4 each observation was
reduced by 100 to make the numbers smaller. For tests of normality, see § 4.6.

The dose metameter (x)

For parallel line assays this is, of course, by definition (see § 13.1),
the logarithm of the dose. The dose (measured in volume or weight) is
denoted z, as in § 13.1. Thus

    x = log z.     (13.2.1)

Usually logarithms to base 10 (common logs) will be used because the
tables are the most convenient; but it will be shown that for parallel
line assays which are symmetrical, as defined in § 13.1 and eqn. (13.8.1),
it will make the calculations much simpler to use a different base for
the logarithms. This will not, of course, affect the linearity or parallelism
of the lines. At this stage this only looks like an additional complication,
but the simplification will become apparent later. Numerical examples
are worked out in §§ 13.11, 13.12, 13.13, and 13.15.

The symmetrical (2+2) dose assay


Suppose that the ratio between the high and low doses is D, both for
the standard and for the unknown. Suppose further that each dose is
given n times so the total number of observations is N = 4n. Of these
nS = 2n = ½N are standards and nU = ½N are unknowns.

If the low doses of standard and unknown preparations are zLS and
zLU then, by definition of D, the high doses will be

    zHS = D zLS and zHU = D zLU.     (13.2.2)

The most convenient base for the logarithms is √D. This looks most
improbable at first sight, but the reason why it is so will now be shown.
Taking the logarithms to the base √D of the doses (remembering that
log√D D = 2 whatever the value of D, because the log is defined as the
power to which the base must be raised to give the argument) gives,
from (13.2.1) and (13.2.2),

    xLS = log√D zLS     (13.2.3)
    xHS = log√D zHS = log√D (D zLS)
        = log√D D + log√D zLS
        = 2 + xLS.     (13.2.4)

Similarly, for the unknown, xLU = log√D zLU, xHU = 2 + xLU.
The mean value of the log dose for the standard preparation, if the
high and low doses are given an equal number of times (n), will be,
using (13.2.4),

    x̄S = (n xLS + n xHS)/2n = (xLS + xHS)/2,
    and similarly x̄U = (xLU + xHU)/2.     (13.2.5)

Combining these results with (13.2.4) gives

    (xHS − x̄S) = +1,
    (xLS − x̄S) = −1,
    and similarly
    (xHU − x̄U) = +1,
    (xLU − x̄U) = −1.     (13.2.6)
Using logs to the base √D has made (x − x̄) take the value +1 for
the high doses (of both standard and unknown), and −1 for the low
doses. This means that (x − x̄)² = 1 for every dose; and since there are
½N doses of standard and ½N doses of unknown, it follows from (2.1.7)
that

    Σ(xS − x̄S)² = Σ(xU − x̄U)² = ½N,     (13.2.7)

where the summations are over all ½N doses of standard (or unknown).
Thus the total sum of squares for x, pooling standard and unknown, is

    Σ_{S,U} Σ(x − x̄)² = Σ(xS − x̄S)² + Σ(xU − x̄U)² = N,     (13.2.8)

where the symbol Σ_{S,U} means 'add the value of the following for the
standard to its value for the unknown' (as shown in the central expression
of (13.2.8)). The sums of squares are greatly simplified by using
logs to the base √D.
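This choice of base is easy to verify numerically. The sketch below (plain Python; the low dose, dose ratio, and replication are invented purely for illustration) codes the doses of a (2+2) assay to base √D and confirms that the deviations (x - x̄) are ±1 and that the pooled sum of squares, eqn (13.2.8), equals N.

```python
import math

def coded_deviations_2plus2(z_low, D):
    """Code the two log doses of one preparation to base sqrt(D),
    eqns (13.2.3)-(13.2.4), and return the deviations (x - xbar)."""
    base = math.sqrt(D)
    x_low = math.log(z_low, base)        # log to base sqrt(D)
    x_high = math.log(D * z_low, base)   # = x_low + 2, eqn (13.2.4)
    xbar = (x_low + x_high) / 2          # eqn (13.2.5)
    return x_low - xbar, x_high - xbar   # eqn (13.2.6): -1 and +1

n = 5                                    # replicates at each dose level
N = 4 * n                                # total observations in a (2+2) assay
dev_lo, dev_hi = coded_deviations_2plus2(0.25, 4.0)
# each of the 2n standard and 2n unknown doses contributes (x - xbar)^2 = 1,
# so the pooled sum of squares, eqn (13.2.8), is N
total_ss_x = 2 * n * (dev_lo**2 + dev_hi**2)
```

Any other positive low dose and any D > 1 give the same deviations ±1, which is the point of the choice of base.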

The symmetrical (3+3) dose assay


The most convenient base for the logarithms in this case is D rather
than √D. The low, middle, and high doses will be indicated by the
subscripts S1, S2, and S3 for standard, and U1, U2, and U3 for unknown.
The ratio between each dose and the one below it is D, as
before. Thus

zS2 = DzS1,
zS3 = DzS2 = D²zS1. (13.2.9)

Taking logarithms to the base D (remembering logD D = 1 and
logD D² = 2, whatever the value of D) gives

xS1 = logD zS1,
xS2 = logD zS2 = logD (DzS1) = logD D + logD zS1 = 1 + xS1, (13.2.10)
xS3 = logD zS3 = logD D² + logD zS1 = 2 + xS1.
The mean standard dose, if each dose level is given the same number
of times (n), will be, using (13.2.10),

x̄S = (nxS1 + nxS2 + nxS3)/3n = xS1 + 1,
and similarly x̄U = xU1 + 1. (13.2.11)

Combining this with (13.2.10) gives, for the standard,

(xS1 - x̄S) = -1,
(xS2 - x̄S) = 0, (13.2.12)
(xS3 - x̄S) = +1,

and similarly (xU - x̄U) = -1, 0, +1 for low, middle, and high doses of
unknown.
Because the assay is symmetrical (see § 13.1 and eqn (13.8.1)) each dose
is given the same number of times, n. The total number of observations
is N = 6n and the number of standards is nS = 3n = ½N, and of
unknowns nU = 3n = ½N. Now (x - x̄)² = +1 for all high and low
doses, and 0 for all middle doses so

Σ(xS - x̄S)² = Σ(xU - x̄U)² = ⅓N, (13.2.13)

the summations being over all ½N doses (n low, n middle, and n high)
of standard or unknown. The total sum of squares for x, pooling standard
and unknown, is

Σ_{S,U} Σ(x - x̄)² = Σ(xS - x̄S)² + Σ(xU - x̄U)² = ⅔N, (13.2.14)

where Σ_{S,U} means sum over preparations as in (13.2.8).
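The same check for the (3+3) case, again with invented illustrative numbers: coding to base D makes the deviations -1, 0, +1, so the pooled sum of squares, eqn (13.2.14), comes to ⅔N.

```python
import math

def coded_deviations_3plus3(z1, D):
    """Code the three log doses of one preparation to base D,
    eqn (13.2.10), and return the deviations (x - xbar)."""
    xs = [math.log(z1 * D**k, D) for k in range(3)]   # low, middle, high
    xbar = sum(xs) / 3                                # eqn (13.2.11)
    return [x - xbar for x in xs]                     # eqn (13.2.12)

n = 4                                  # replicates at each of the 6 dose levels
N = 6 * n
devs = coded_deviations_3plus3(0.1, 2.0)              # -1, 0, +1
ss_one_prep = n * sum(d**2 for d in devs)             # = 2n = N/3, eqn (13.2.13)
ss_pooled = 2 * ss_one_prep                           # = 2N/3, eqn (13.2.14)
```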
13.3. The theory of parallel line assays. The potency ratio
This discussion applies to any parallel line assay, symmetrical
(see §§ 13.1 and 13.8) or not. Numerical examples are given in §§ 13.11,
13.12, 13.13, and 13.15.
According to (13.1.1) the ratio of the concentration of unknown to
concentration of standard is

R = concentration of unknown / concentration of standard
  = amount of standard for given effect / amount of unknown for same effect
  = z′S/z′U, (13.3.1)

where the prime indicates doses estimated to produce identical responses.


As in § 13.1 the dose z will be measured in the same units (e.g. volume of
solution, or weight of solid) for both standard and unknown (what
happens when the units are different is explained in the numerical
example in § 13.11).
The conventional symbol for the log of the potency ratio is M so,
from (13.3.1),

M = log R = log z′S - log z′U = x′S - x′U. (13.3.2)

As in § 13.1, M = x′S - x′U, the difference between the logs of equieffective
doses, is the horizontal distance between the parallel lines as
shown in Fig. 13.3.1. The least squares estimate of this quantity will
now be derived.
When straight lines are fitted to the standard and unknown responses
the lines are constrained to be parallel, i.e. an average of the observed
slopes for S and U is used for both (see § 13.4). If this common slope
is called b, the linear regression equations (see §§ 12.1 and 12.7, and
eqn (12.3.1)) are written

YS = ȳS + b(xS - x̄S),
YU = ȳU + b(xU - x̄U). (13.3.3)

When the response is the same for standard and unknown YS = YU, so
these can be equated giving

ȳS + b(x′S - x̄S) = ȳU + b(x′U - x̄U),

where x′S and x′U are the log doses giving equal responses as above.
Rearranging this to give M = x′S - x′U, from (13.3.2), gives the result

M = log R = x′S - x′U = (x̄S - x̄U) + (ȳU - ȳS)/b. (13.3.4)

FIG. 13.3.1. Geometrical meaning of the equation (derived in the text)
for the log potency ratio (M = log R) in any parallel line assay
(response, y, plotted against log dose, x).

The geometrical meaning of the right-hand side is illustrated in Fig.
13.3.1. To find the potency ratio, R, the antilog of M can be found
from tables if common logarithms (to base 10) have been used. However
in symmetrical assays it has been shown that it is better to use logarithms
to a different base, say base r in general (it was shown in § 13.2
that r = √D is best for 2+2 dose assays and r = D for 3+3 dose
assays). Since antilog tables are available only for base 10 logarithms
it will be necessary to convert to base 10 before looking up antilogs.
The general formula† for changing the base of logarithms from a to b is

loga z = logb z · loga b, (13.3.5)

from which it follows that

log10 R = logr R · log10 r = M · log10 r. (13.3.6)

† Proof. From the definition of logs, antilogb z = b^z, and so b^(logb z) = z. Also, in general,
n log z = log z^n. Thus logb z · loga b = loga(b^(logb z)) = loga z.
Therefore, multiplying (13.3.4) by the conversion factor, log10 r, gives

log10 R = [(x̄S - x̄U) + (ȳU - ȳS)/b] · log10 r. (13.3.7)

This is a perfectly general formula whatever sort of logarithms are used.
If common logs were used, r = 10, so the conversion factor is log10 10
= 1.
For symmetrical assays (as defined in (13.8.1) and at the end of
§ 13.1) this expression can be simplified, as shown in § 13.10.
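Eqn (13.3.7) is mechanical to apply. In the sketch below the means, slope, and base are invented purely for illustration (logs already to base 10, so the conversion factor log10 r = 1).

```python
import math

def log10_potency_ratio(xbar_s, xbar_u, ybar_s, ybar_u, b, r=10.0):
    """Common log of the potency ratio, eqns (13.3.4) and (13.3.7).

    The means and slope are on the log-to-base-r dose scale; the result
    is multiplied by log10(r) to convert M to common logs."""
    M = (xbar_s - xbar_u) + (ybar_u - ybar_s) / b   # eqn (13.3.4)
    return M * math.log10(r)                        # eqn (13.3.7)

# invented assay summaries: common slope b = 20 response units per log10 dose
log10_R = log10_potency_ratio(0.5, 0.2, 40.0, 46.0, 20.0)
R = 10.0 ** log10_R   # potency ratio R = antilog of log10 R
```

Here M = 0.3 + 6/20 = 0.6, so R is about 3.98: with these made-up figures the unknown would be estimated to be roughly four times as potent as the standard.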

13.4. The theory of parallel line assays. The best average slope


For estimation of the potency ratio it is essential that the response-log
dose lines be parallel (see §§ 13.1, 13.3). Inevitably the line fitted
to the observations on the standard will not have exactly the same slope
as the line fitted to the unknown; but, if the deviation from parallelism
is not greater than might reasonably be expected on the basis of
experimental error, the observed slope for the standard (bS) is averaged
with that for the unknown (bU) to give a pooled estimate of the presumed
common value.
By 'best' is meant, as usual, least squares (see §§ 12.1, 12.2, 12.7,
and 12.8). A weighted average of the slopes for standard and unknown is
found using the weighted mean (§ 2.5). Calling the average slope b, this gives

b = (wSbS + wUbU)/(wS + wU), (13.4.1)

where the weights are the reciprocals of the variances of the slopes
(see § 2.5). The estimated variances of the individual slopes, by (12.4.2),
are

var[bS] = s²[y]/Σ(xS - x̄S)² and var[bU] = s²[y]/Σ(xU - x̄U)², (13.4.2)

where s²[y] is, as usual, the estimated error variance of the observations
(the error mean square from the analysis of variance).

Now in general the variance of the weighted mean ȳ = Σwᵢyᵢ/Σwᵢ
will be given by (2.7.12) as

var[ȳ] = 1/Σwᵢ. (13.4.3)

Taking wS = 1/var[bS] and wU = 1/var[bU] from (13.4.2), and inserting
the estimate of the slope from (12.2.7), gives

wSbS = [Σ(xS - x̄S)²/s²[y]] · [ΣyS(xS - x̄S)/Σ(xS - x̄S)²] = ΣyS(xS - x̄S)/s²[y], (13.4.4)

and similarly for the unknown. Inserting these results in (13.4.1) gives the
weighted average slope

b = [ΣyS(xS - x̄S) + ΣyU(xU - x̄U)] / [Σ(xS - x̄S)² + Σ(xU - x̄U)²]
  = Σ_{S,U} Σy(x - x̄) / Σ_{S,U} Σ(x - x̄)², (13.4.5)

where the symbol Σ_{S,U} means, as before, 'add the value of the following
quantity for the standard to its value for the unknown'. In other
words, the average slope is simply (pooled sum of products for S and U)/
(pooled sum of squares of x).
For symmetrical assays it was shown in § 13.2 that Σ(x - x̄)² is the
same for standard and unknown so, from (13.4.2), the weights are
equal and the two slopes (bS and bU) are simply averaged.
From (13.4.2) and (13.4.3) it follows that the variance of the average
slope is, in general, estimated as

var[b] = 1/Σw = s²[y]/[Σ(xS - x̄S)² + Σ(xU - x̄U)²] = s²[y]/Σ_{S,U}Σ(x - x̄)² (13.4.6)

(compare this with (12.4.2)). It is, of course, being assumed that the
variance of the observations, s²[y], is the same for standard and unknown
as well as for each dose level (see §§ 11.2, 12.2, and 13.1).
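Eqn (13.4.5) says: pool the sums of products and the sums of squares before dividing. A minimal sketch (the four short data lists are invented, chosen so the two separate slopes are 2 and 3):

```python
def pooled_slope(x_s, y_s, x_u, y_u):
    """Least-squares common slope for parallel lines, eqn (13.4.5):
    (pooled sum of products for S and U) / (pooled sum of squares of x)."""
    def sums(xs, ys):
        xbar = sum(xs) / len(xs)
        ybar = sum(ys) / len(ys)
        sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        sxx = sum((x - xbar) ** 2 for x in xs)
        return sxy, sxx
    sxy_s, sxx_s = sums(x_s, y_s)
    sxy_u, sxx_u = sums(x_u, y_u)
    return (sxy_s + sxy_u) / (sxx_s + sxx_u)

# separate slopes would be 2.0 (standard) and 3.0 (unknown); with equal
# sums of squares of x the weights are equal, so the average is 2.5
b = pooled_slope([-1, 1], [10, 14], [-1, 1], [20, 26])
```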

13.5. Confidence limits for the ratio of two normally distributed
variables: derivation of Fieller's theorem
The solution to the problem posed in § 7.5 will now be given. The
result, in its general form, looks rather complicated; but the numerical
examples in §§ 13.11-13.14 show how easy it is to use.
Although the sum or difference (or any linear combination, see p. 39)
of two normally distributed (Gaussian) variables is itself normally

distributed, their ratio is not. Therefore, as discussed in § 7.5, the
methods discussed so far cannot provide confidence limits for the
ratio. A solution of the problem will now be described.
The simplest application of the result is to find the confidence limits
for the ratio (= m, say) of two means (see § 14.1), the problem discussed
in § 7.5. It is shown below that if g (eqn (13.5.8)) is very small
compared with one, so (1 - g) ≃ 1, the result of using Fieller's theorem
is the same as the approximate result, m ± t√[var(m)], where var(m)
is given, approximately, by (2.7.16).
The theorem is needed to find confidence limits for the value of
the independent variable (x) necessary to produce a given value of the
dependent variable (y) as discussed in § 12.4. A numerical example of
this 'calibration curve problem' is given in § 13.14. The confidence
limits for a potency ratio are also found using Fieller's theorem.
Before considering a ratio, the argument of § 7.4 leading to confidence
limits for a single Gaussian variable, y, will be repeated in a
rather more helpful form. If y is normally distributed with population
mean μ and estimated variance s² then y - μ is normal with population
mean = μ - μ = 0 and variance s², so, as in § 4.4, t = (y - μ)/s. As in
§ 7.4 the 100α per cent confidence limits for the value of μ are based
on Student's t distribution (§ 4.4) which implies
P[-ts ≤ (y - μ) ≤ +ts] = α, (13.5.1)
or, in other words (see § 11.3, p. 182),
P[(y - μ)² ≤ t²s²] = α. (13.5.2)
The deviation (y - μ) will border on significance when it is equal to
-ts or +ts, i.e. when
(y - μ)² = t²s².

This is a quadratic equation in μ and solving it for μ using the usual
formula† gives as the two solutions μ = y - ts and μ = y + ts, the
confidence limits for μ found in § 7.4. This seems a long way round to
get the same answer as before, but the approach will now be used to
find the confidence limits for a ratio.
Consider, in general, any ratio μ = α/β. The estimate (m) of the
population value (μ) from the observations will be written m = a/b,
where a is the estimate of α, and b is the estimate of β. The case of
interest, or, at any rate, the case to be dealt with, is when a and b are
† In general, if ax² + bx + c = 0 then x = [-b ± √(b² - 4ac)]/2a.

normally distributed variables (with population means α and β). The
variances of a and b must be specified and this will be done using a
new notation. This notation is based on the fact that not only the
variances but also the covariances (in analysis of variance problems
that are linear in the general sense discussed in § 12.7) can be expressed
as multiples of the error variance of the observations, s²[y] (as usual
this is the error mean square from the analysis of variance). For example,
the variance of a mean, ȳ, is, by (2.7.8), 1/n times the error
variance. Similarly the variance of a slope, b, is, by (12.4.2), 1/Σ(x - x̄)²
times the error variance. If these multiplying factors are symbolized v
then one can define

var[a] = v11s²,
var[b] = v22s², (13.5.3)
cov[a, b] = v12s²,

where s² is written for s²[y]. The subscripts distinguishing the variance
multipliers, v, are arbitrary (cf. § 2.1), but the notation used emerges
naturally from a more advanced treatment, and is that used in Finney
(1964), who discusses Fieller's theorem and two of its extensions. For
example, if a was a mean, ȳ, then v11 = 1/n as above.
Since a and b are normally distributed and μ is a constant, the
variable (a - μb) is a linear function of normal variables, and is therefore
normally distributed. The population mean of (a - μb) will be α - μβ = 0
and its estimated variance will be, using (13.5.3), (2.7.2), (2.7.5), and
(2.7.6),

var[(a - μb)] = var[a] + var[μb] - 2 cov[a, μb]
= var[a] + μ² var[b] - 2μ cov[a, b]
= s²(v11 + μ²v22 - 2μv12). (13.5.4)

Now, by direct analogy with (13.5.2), it follows from the definition of
Student's t that

P[(a - μb)² ≤ t²s²(v11 + μ²v22 - 2μv12)] = α. (13.5.5)

And, again by analogy, the 100α per cent confidence limits for μ are
found by solving for μ the equation

(a - μb)² = t²s²(v11 + μ²v22 - 2μv12). (13.5.6)

This is again a quadratic equation in μ and when solved for μ by the
usual formula (see above) the two solutions are the required confidence
limits for μ. They are

(1/(1 - g))·[m - gv12/v22 ± (ts/b)√{v11 - 2mv12 + m²v22 - g(v11 - v12²/v22)}], (13.5.7)

where
g = t²s²v22/b². (13.5.8)

Simplifications of Fieller's theorem in special cases

If a and b are independent, i.e. v12 = 0, the result simplifies considerably,
giving the confidence limits for μ as

m/(1 - g) ± [ts/(b(1 - g))]·√[(1 - g)v11 + m²v22]. (13.5.9)

The quantity g defined in (13.5.8) can be considered an index of the
significance of the difference of b (the denominator of the ratio) from
zero. This is clearly important because if the denominator could be
zero, the ratio could be infinite. The effect of the (1 - g) in front of the
± sign is to raise both upper and lower limits, i.e. unless g is very small
the limits are not symmetrical about m. Since var(b) = v22s² by
(13.5.3), it follows that if b² < t²s²v22, i.e. if g > 1, then b would be
judged 'not significantly different' from zero at the level of significance
fixed by the value of t chosen, and useful limits could not be found. In
other words 1/g is the square of the ratio of b to t times the standard
deviation of b.
If g is very small, as it will be in good experiments, then the general
formula, (13.5.7), simplifies giving the (symmetrical) confidence limits
for μ as

m ± (ts/b)√(v11 - 2mv12 + m²v22). (13.5.10)

This is the result that would be obtained by treating m as roughly
normally distributed and calculating m ± t√[var(m)], as in § 7.4,
using the approximate formula, (2.7.18), together with (13.5.3) to give

var(m) ≃ m²[var(a)/a² + var(b)/b² - 2cov(a, b)/ab]. (13.5.11)

If a and b are uncorrelated (v12 = 0), as well as g ≪ 1, then the confidence
limits for μ can again be found as m ± t√[var(m)], the approximate
expression for var(m), (13.5.11), simplifying even further to

var(m) ≃ m²[var(a)/a² + var(b)/b²]
       = (s²/b²)(v11 + m²v22). (13.5.12)

This is the variance given by the approximate formula, (2.7.16)
(because the squared coefficient of variation of m is var(m)/m², etc.,
from the definition (2.6.4)).
Examples of the use of the results in this section will occur in §§ 13.6
and 13.10-13.14.
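Eqns (13.5.7) and (13.5.8) translate directly into code. The sketch below is a general-purpose implementation in the notation of (13.5.3); the example numbers at the end are arbitrary.

```python
import math

def fieller_limits(a, b, s2, v11, v22, v12, t):
    """Confidence limits for the ratio a/b by Fieller's theorem, eqn (13.5.7).

    s2            : error variance s^2[y]
    v11, v22, v12 : variance multipliers of var[a], var[b], cov[a, b] (13.5.3)
    t             : Student's t for the chosen confidence level
    """
    m = a / b
    g = t**2 * s2 * v22 / b**2                 # eqn (13.5.8)
    if g >= 1:
        # b not significantly different from zero: no useful limits
        raise ValueError("g >= 1, denominator could be zero")
    radicand = v11 - 2*m*v12 + m**2*v22 - g*(v11 - v12**2 / v22)
    half = (t * math.sqrt(s2) / b) * math.sqrt(radicand)
    centre = m - g * v12 / v22
    return (centre - half) / (1 - g), (centre + half) / (1 - g)

low, high = fieller_limits(a=10.0, b=5.0, s2=1.0, v11=1.0,
                           v22=0.01, v12=0.0, t=2.0)   # limits around m = 2
```

When t = 0 both limits collapse to m, and when g is tiny they approach the symmetrical approximation (13.5.10).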

13.6. The theory of parallel line assays. Confidence limits for the
potency ratio and the optimum design of assays
This discussion applies to any parallel line assay. The simplifications
possible in the case of symmetrical assays (see § 13.1) are given later in
§ 13.10, and numerical examples in § 13.11 onwards.
The logarithm of the potency ratio (R) is M = log R as in § 13.3. It
will be convenient to rearrange the formula for the potency ratio,
(13.3.4), to give

M - (x̄S - x̄U) = (ȳU - ȳS)/b. (13.6.1)

The term (x̄S - x̄U) has zero variance because x is supposed to be measured
with negligible error (see §§ 12.1, 12.2, and 12.4), and so can be
treated as a constant. The approach is therefore to find confidence
limits for the population value of (ȳU - ȳS)/b and then add the constant
(x̄S - x̄U) to the results. Now if the observations are normally distributed
then so are (ȳU - ȳS) and (as explained in § 12.4) the average slope, b.
The right-hand side of (13.6.1) is therefore the ratio of two normally
distributed variables, and confidence limits for it can be found using
Fieller's theorem (§ 13.5).
The variance multipliers defined in (13.5.3) are required first.
From (2.7.3) and (2.7.8) it follows that var[ȳU - ȳS] = s²[y]/nU + s²[y]/nS
and therefore

v11 = (1/nU + 1/nS), (13.6.2)

where nU and nS are the numbers of responses to the unknown and
standard preparations. The variance of the average slope, b, in the
denominator, is, from (13.4.6), var[b] = s²[y]/Σ_{S,U}Σ(x - x̄)², so

v22 = 1/Σ_{S,U}Σ(x - x̄)², (13.6.3)

the notation being explained in § 13.4. Because it can be shown (see
§ 13.10) that (ȳU - ȳS) and b are uncorrelated, i.e. have zero covariance,
it follows that v12 = 0 (see also § 12.7). Thus the simplified form of
Fieller's theorem, eqn (13.5.9), can be used to find confidence limits for
the ratio (ȳU - ȳS)/b. Using (13.6.2) and (13.6.3) these are

[(ȳU - ȳS)/b]/(1 - g) ± [ts/(b(1 - g))]·√[(1 - g)(1/nU + 1/nS) + (ȳU - ȳS)²/(b²Σ_{S,U}Σ(x - x̄)²)], (13.6.4)

where, from (13.5.8) and (13.6.3),

g = t²s²/(b²Σ_{S,U}Σ(x - x̄)²). (13.6.5)

From (13.6.1) it follows that (13.6.4) gives the confidence limits for
M - (x̄S - x̄U), so the confidence limits for the log potency ratio, M, are
(x̄S - x̄U) + [13.6.4].
To find the confidence limits for the potency ratio itself the antilogarithms
of these limits are required. Now, as discussed in § 13.3, the
calculations are often carried out not with logarithms to base 10 of the
dose, but with some other convenient base, say r. In this case M
= logr R, and, as explained in § 13.3, it is necessary to multiply by
log10 r to convert to logarithms to base 10, before looking up the antilogarithms.
The confidence limits for the true value of log10 R are thus

[(x̄S - x̄U) + (ȳU - ȳS)/(b(1 - g)) ± [ts/(b(1 - g))]·√{(1 - g)(1/nU + 1/nS) + (ȳU - ȳS)²/(b²Σ_{S,U}Σ(x - x̄)²)}]·log10 r. (13.6.6)

A numerical example of the use of this general equation occurs in
§ 13.13.

Simplification of the calculation for good assays

If the slope of the log dose-response line, b, is large compared with
the experimental error then g will be small (see § 13.5), so (1 - g) ≃ 1.
Inserting this into (13.6.6), together with the definition of M = logr R
from (13.3.4), gives the confidence limits for log10 R as approximately

[M ± (ts/b)·√{(1/nU + 1/nS) + (ȳU - ȳS)²/(b²Σ_{S,U}Σ(x - x̄)²)}]·log10 r. (13.6.7)

This is equivalent to treating the log potency ratio, M, as approximately
normally distributed and calculating the limits as M ± t√[var(M)], as
in § 7.4, with

var(M) = (s²/b²)·[(1/nU + 1/nS) + (ȳU - ȳS)²/(b²Σ_{S,U}Σ(x - x̄)²)], (13.6.8)

which can alternatively be inferred directly from (13.5.12).
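When g is negligible these approximate limits are quick to compute. A sketch with invented assay summaries (every number below is illustrative only; the dose logs are taken to base 10, so r = 10 and the conversion factor is 1):

```python
import math

def approx_limits_log10R(xbar_s, xbar_u, ybar_s, ybar_u, b,
                         n_s, n_u, pooled_ssx, s2, t, r=10.0):
    """Approximate confidence limits for log10 R, eqn (13.6.7), valid when
    g is small. pooled_ssx is the pooled sum of squares of x over S and U."""
    M = (xbar_s - xbar_u) + (ybar_u - ybar_s) / b          # eqn (13.3.4)
    var_M = (s2 / b**2) * ((1/n_u + 1/n_s)
            + (ybar_u - ybar_s)**2 / (b**2 * pooled_ssx))  # eqn (13.6.8)
    half = t * math.sqrt(var_M)
    conv = math.log10(r)                                   # base conversion
    return (M - half) * conv, (M + half) * conv

low, high = approx_limits_log10R(0.5, 0.2, 40.0, 46.0, b=20.0,
                                 n_s=8, n_u=8, pooled_ssx=16.0,
                                 s2=1.0, t=2.0)
```

By construction the limits are symmetrical about M (here 0.6); the full formula (13.6.6) would pull both limits up slightly when g is not negligible.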

The optimum design of assays

The aim is to make the confidence limits for the potency ratio as
narrow as possible, i.e. the result of the assay as precise as possible.
Ways of doing this can be inferred from (13.6.6).

(1) g should be small. In other words (see discussion at the end of
§ 13.5) the slope of the response-log dose lines should be as
large as possible relative to its standard deviation. If g is large
(approaching 1) the limits for the log potency ratio will become
wide because of the term involving g after the ± sign in (13.6.6),
and also unsymmetrical about M because of the g in the term
before the ± sign; both upper and lower limits are raised when g is
large.
(2) s, the error standard deviation, should be small. That is, the
responses should be as reproducible as possible; and the error
variance reduced, if possible, by giving the doses in designs such
as randomized blocks, as described in § 13.1 and illustrated in
§§ 13.11 and 13.12.
(3) b should be large, to minimize the term after the ± sign in
(13.6.6). A steep slope will also minimize g.

(4) (1/nU + 1/nS) should be small. That is, as many responses as
possible should be obtained. For a fixed total number of responses
(1/nU + 1/nS) is at a minimum when nU = nS, so a symmetrical
design (see § 13.1) is preferable.
(5) (ȳU - ȳS) should be small because it occurs after the ± sign in
(13.6.6). That is, the size of the responses to standard and unknown
should be as similar as possible. The assay will be more precise
if a good guess at its result is made beforehand.
(6) Σ_{S,U}Σ(x - x̄)² should be large. That is, the doses should be as far
apart as possible, making (x - x̄) large; but the responses must,
of course, remain on the straight part of the response-log dose
curve.

13.7. The theory of parallel line assays. Testing for non-validity

This discussion is perfectly general for any parallel line assay with at
least 2 dose levels for both standard and unknown, i.e. (2+2) and
larger assays. The simplifications possible in the case of symmetrical
assays are described in §§ 13.8 and 13.9, and numerical examples are
worked out in §§ 13.11-13.13. The (k+1), e.g. (2+1), dose assay is
discussed in § 13.15.
In the discussion in § 13.1 it was pointed out that it will be required
to test whether the slope of the response-log dose lines differs from
zero ('linear regression' as in § 12.5), and whether there is reason to
believe that the lines are not parallel. If more than 2 dose levels are used
for either standard or unknown it will also be possible to test the
hypothesis that the lines are really straight. The method of doing these
tests will now be outlined.
Each dose level gives rise to a group of comparable observations
and these can be analysed using an analysis of variance appropriate
to the design of the assay, the dose levels being the 'treatments'
of Chapter 11, as discussed in §§ 12.6 and 13.1. For example, a
(2+2) dose assay has 4 (= k, say) 'treatments' (high and low standard,
and high and low unknown), and a (3+4) dose assay has k = 7 'treatments'.
The number of degrees of freedom for the 'between treatments
sum of squares' will be one less than the number of 'treatments' (or
'doses'), i.e. at least 3, as this section deals only with (2+2) dose or
larger assays (cf. § 11.4). Now the 'between treatments' (or 'between
doses') sum of squares can be subdivided into components just as in
§ 12.6. This partition can be done in many different ways (see § 13.8
and, for example, Mather (1951) and Brownlee (1965, p. 517)), but
only one of these ways is of interest. Each component must be uncorrelated
with all others and this will be demonstrated, in the case of
symmetrical assays, in § 13.8. Three components, each with one degree
of freedom, can always be separated: (a) linear regression, (b) deviations
from parallelism, and (c) difference between standard and unknown
responses, as described in § 13.1. If there are more than 3 degrees of
freedom (i.e. more than 4 'treatments') the remainder can be lumped
together as 'deviations from linearity' (cf. § 12.6), as in Table 13.7.1,
or further subdivided as in §§ 13.10 and 13.12. The analysis thus has
the appearance of Table 13.7.1 if there are k dose levels ('treatments') and
N responses altogether.

TABLE 13.7.1

Source of variation                      Degrees of freedom    Sum of squares
Linear regression                               1                   A
Deviations from parallelism                     1                   B
Between standard and unknown                    1                   C
Deviations from linearity                      k-4              D-(A+B+C)
Between 'treatments' or dose levels            k-1                  D
Error (within 'treatments')                    N-k
Total                                          N-1

The bottom part of the analysis would look like Table 11.6.1 or Table
11.8.2 if a randomized block or Latin square design (respectively) were
used.
(1) Linear regression. To test whether the population value of the
slope differs from zero, the appropriate sum of squares (SSD) is, from
(13.4.5), by analogy with (12.3.4),

SSD for linear regression = [ΣyS(xS - x̄S) + ΣyU(xU - x̄U)]²/[Σ(xS - x̄S)² + Σ(xU - x̄U)²]. (13.7.1)
(2) Deviations from parallelism. To test whether the lines are parallel
it seems reasonable to calculate the difference between (a) the total
sum of squares for linear regression for lines fitted separately to
standard and unknown (from (12.3.4)), and (b) the sum of squares for
linear regression when the slopes are averaged (i.e. (13.7.1)), because
this difference will be zero if the lines are parallel. Thus

SSD for deviations from parallelism =
[ΣyS(xS - x̄S)]²/Σ(xS - x̄S)² + [ΣyU(xU - x̄U)]²/Σ(xU - x̄U)² - SSD for linear regression. (13.7.2)

(3) Between standard and unknown. This is found directly from
(11.4.5) as

SSD between S and U = (ΣyS)²/nS + (ΣyU)²/nU - G²/N. (13.7.3)

(4) Deviations from linearity. This is found as the difference between
the sum of squares between 'treatments' (from (11.4.5)), as in Table
11.4.3, and the total of the above 3 components. It must be zero for
a (2+2) dose assay (k = 4) when the sum of (13.7.1), (13.7.2), and
(13.7.3) can be shown to add up to the sum of squares between treatments.
A numerical example of the use of these relations is worked out in
§ 13.13.
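For a (2+2) assay the three components (13.7.1)-(13.7.3) must account exactly for the between-doses sum of squares. The sketch below checks this with invented responses, the doses being coded x = ±1 as in § 13.2 so that Σ(x - x̄)² = 2n for each preparation.

```python
def validity_ssd_2plus2(y_ls, y_hs, y_lu, y_hu):
    """Partition of the between-doses sum of squares for a (2+2) assay
    into regression, parallelism, and preparations components,
    eqns (13.7.1)-(13.7.3), with doses coded x = -1 (low), +1 (high)."""
    n = len(y_ls)                        # replicates per dose level
    N = 4 * n
    totals = [sum(y_ls), sum(y_hs), sum(y_lu), sum(y_hu)]
    grand = sum(totals)
    between = sum(T**2 for T in totals) / n - grand**2 / N   # between doses
    # sums of products with the coded x (xbar = 0 for each preparation)
    sp_s = totals[1] - totals[0]
    sp_u = totals[3] - totals[2]
    sxx = 2 * n                          # sum of squares of x per preparation
    ss_reg = (sp_s + sp_u)**2 / (2 * sxx)              # eqn (13.7.1)
    ss_par = sp_s**2 / sxx + sp_u**2 / sxx - ss_reg    # eqn (13.7.2)
    ss_prep = ((totals[0] + totals[1])**2 / (2 * n)
               + (totals[2] + totals[3])**2 / (2 * n)
               - grand**2 / N)                         # eqn (13.7.3)
    return ss_reg, ss_par, ss_prep, between

# invented responses, n = 2 at each dose level
ss_reg, ss_par, ss_prep, between = validity_ssd_2plus2(
    [1, 2], [3, 4], [2, 3], [5, 6])
# the three 1-degree-of-freedom components add up to the between-doses SS
```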

13.8. The theory of symmetrical parallel line assays. Use of
orthogonal contrasts to test for non-validity

Numerical examples are given in §§ 13.11 and 13.12. Symmetrical
(in this context) means, summarizing the definition in § 13.1,

n = number of responses at each of the k dose levels
(see § 13.7), the same for all,
N = kn = total number of responses,
kS = kU = ½k = number of dose levels for standard and
unknown (the same for both), (13.8.1)
D = ratio between each dose and the one below it. The same for all
doses, and for standard and unknown (see also §§ 13.1 and 13.2
and Fig. 13.8.1), so the doses are equally spaced (by log D) on
the logarithmic scale.

The symmetrical (2+2) dose assay. Contrasts

There are k = 4 dose levels, low standard (LS), high standard (HS),
low unknown (LU), and high unknown (HU). The k-1 = 3 degrees of
freedom between dose levels can be separated into 3 components as
described in §§ 13.1 and 13.7 (Table 13.7.1). A simpler approach than
that in § 13.7 is possible.
As usual a hypothesis is formulated. Then the probability that observations
would be made, deviating from the hypothesis by as much as, or
more than, the experimental results do, if the hypothesis were in fact
true, is calculated (cf. § 6.1).
(1) Linear regression. From Fig. 13.8.1 it is clear that if the null
hypothesis (that the true value, β, of the average slope, see § 13.4, is
zero) were true then, in the long run, the responses to the high doses

FIG. 13.8.1. The symmetrical 2+2 dose parallel line assay
(response, y, plotted against log dose, x).
○ Mean of n observed responses (e.g. ȳHU is the mean of the
n responses to zHU).
- - - Straight lines between observed points, with slope bU
for unknown and bS for standard.
—— Best fitting parallel lines with slope b (= average of
bS and bU, see § 13.4).

would be the same as those to the low doses, i.e. ȳHU + ȳHS = ȳLU + ȳLS.
It follows that if the regression contrast, L1, is defined as

L1 = -ΣyLS + ΣyHS - ΣyLU + ΣyHU (13.8.2)

(a linear combination of the responses), it will be a measure of departure
from the null hypothesis. If the null hypothesis were true the population
mean value of L1 would be zero (as long as each dose level is given the
same number of times so the total responses can be used in place of the
mean responses). In a small experiment L1 would not be exactly zero

even if the null hypothesis were true, and it is shown in § 13.9 how to
judge whether L1 is large enough for rejection of the null hypothesis.
(2) Deviations from parallelism. From Fig. 13.8.1 it is clear that if
the null hypothesis (that the population lines are parallel, βS = βU, see
§ 13.4) were true then, in the long run, ȳHU - ȳLU = ȳHS - ȳLS. Therefore
deviations from parallelism are measured, as above, by a deviations
from parallelism contrast, L1′, defined as

L1′ = ΣyLS - ΣyHS - ΣyLU + ΣyHU. (13.8.3)

Again the population value of L1′ will be zero if the null hypothesis is
true.
(3) Between standard and unknown preparations. If the null hypothesis
that the population mean response to standard is the same as
that for unknown were true then, in the long run, ȳLS + ȳHS = ȳLU + ȳHU.
Departure from the null hypothesis is therefore measured by the
between S and U (or between preparations) contrast, LP, defined as

LP = -ΣyLS - ΣyHS + ΣyLU + ΣyHU, (13.8.4)

which will have a population mean of zero if the null hypothesis is
true.
These contrasts are used for calculation of the analysis of variance
and potency ratio, as described below and in §§ 13.9 and 13.10.
The subdivision of a sum of squares (cf. § 13.7) using contrasts
is quite a general process described, for example, by Mather (1951) and
Brownlee (1965, p. 517). The set of contrasts used must satisfy two
conditions.
(1) The sum of the coefficients of the contrast must be zero. In
Table 13.8.1 the coefficients (which will be denoted aᵢ) of the response
totals for the contrasts defined in (13.8.2), (13.8.3), and (13.8.4) are
summarized. In each case Σaᵢ = 0 as required. This means that the
population mean value of the contrast will be zero when the null
hypothesis is true.†
(2) Each contrast must be independent of every other. A set of
mutually independent contrasts is described as a set of orthogonal
contrasts. It is easily shown (e.g. Brownlee (1965, p. 518)) that two
contrasts will be uncorrelated (and therefore, because a normal distribution
is assumed, independent, see § 2.4) when the sum of products
of corresponding coefficients for the two contrasts is zero. All results
† In the language of Appendix 1, E[L] = E[ΣaⱼTⱼ] = ΣaⱼE[Tⱼ], where Tⱼ is the total of the n
responses of the jth treatment (dose). If all the observations were from a single population,
E[Tⱼ] = nμ where E[y] = μ. Thus E[L] = nμΣaⱼ = 0 if Σaⱼ = 0.

necessary for the proof have been given in § 2.7. It is shown in the
lower part of Table 13.8.1 that this condition is fulfilled for all three
possible pairs of contrasts.

TABLE 13.8.1
The upper part summarizes the coefficients (aᵢ) of the response totals for
the validity tests for the symmetrical (2+2) dose assay. The lower part
demonstrates the orthogonality (i.e. independence) of the contrasts

                             ΣyLS   ΣyHS   ΣyLU   ΣyHU    Σaᵢ   Σaᵢ²
Linear regression L1          -1     +1     -1     +1      0     4
Parallelism L1′               +1     -1     -1     +1      0     4
Preparations (S and U) LP     -1     -1     +1     +1      0     4

a1 × a1′                      -1     -1     +1     +1      0
a1 × aP                       +1     -1     -1     +1      0
a1′ × aP                      -1     +1     -1     +1      0

These conditions mean that if two contrasts are defined, to measure
linear regression and deviations from parallelism, say, there is no
choice about the third, which happens to measure the difference
between ȳS and ȳU.
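Both conditions are mechanical to check. A few lines of Python reproduce the coefficients of Table 13.8.1 and verify that each contrast sums to zero and that every pair is orthogonal:

```python
# coefficients (a_i) of the response totals, in the order
# sum(y_LS), sum(y_HS), sum(y_LU), sum(y_HU), from Table 13.8.1
L1     = [-1, +1, -1, +1]   # linear regression contrast, eqn (13.8.2)
L1dash = [+1, -1, -1, +1]   # deviations from parallelism, eqn (13.8.3)
LP     = [-1, -1, +1, +1]   # between preparations, eqn (13.8.4)

contrasts = [L1, L1dash, LP]
col_sums = [sum(c) for c in contrasts]           # condition (1): all zero
dots = [sum(p * q for p, q in zip(c1, c2))       # condition (2): all zero
        for i, c1 in enumerate(contrasts) for c2 in contrasts[i + 1:]]
sum_sq = [sum(a * a for a in c) for c in contrasts]  # the sum-of-squares column
```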

The symmetrical (3+3) dose assay

There are k = 6 dose levels, say S1, S2, S3, U1, U2, and U3, where S1 is
the lowest, S2 the middle, and S3 the highest standard dose. There are
k-1 = 5 degrees of freedom between dose levels so, after separating
components for linear regression, deviations from parallelism, and
between S and U, there are two degrees of freedom left for deviations
from linearity (see Table 13.7.1). The first three contrasts are
constructed from response totals by the same sort of reasoning as for
the (2+2) assay and the coefficients (aᵢ) are given in Table 13.8.2.
Deviations from linearity can be further divided into two components,
each with one degree of freedom. If the average curve for S and U is
straight then (for a symmetrical assay) the responses to the middle
doses will be equal to the mean of the responses to the high and low
doses, i.e., in the long run, (ȳS1 + ȳS3 + ȳU1 + ȳU3)/4 = (ȳS2 + ȳU2)/2.

Therefore a deviations from linearity contrast, L2, measuring departure
from the hypothesis of straightness, can be defined as

L2 = ΣyS1 - 2ΣyS2 + ΣyS3 + ΣyU1 - 2ΣyU2 + ΣyU3, (13.8.5)

and this will be zero in the long run if the average line is straight. The
fifth contrast is dictated by the conditions mentioned above. It is
called L2′, and inspection of the coefficients in Table 13.8.2 shows that
TA.BLE 13.8.2
OoeJficient8 (at) oJ 1M rup0n8e foIal8 Jor 1M orthogonal contraats in a
symmetrical (3+3) do8e auay
ReepoD8e totaJa Eat Eat2

Contrast ~Sl ~S2 l:tiS8 l:tiVl l:tiu2 ~UlI

~ -1 0 1 -1 0 1 0 4
Li 1 0 -1 -1 0 1 0 4
Lp -1 -1 -1 1 1 1 0 6
La 1 -2 1 1 -2 1 0 12
LI -1 2 -1 1 -2 1 0 12

It can easily be checked that, as in Table 13.8.1, the sum of the products of the
coefficients of corresponding totals is zero for all possible pairs of contrasts, so
all pairs of contrasts are orthogonal.

it is a measure of the extent to which deviations from linearity are the
same for the standard and unknown. It is therefore called the difference
of curvature contrast (cf. L1', which measures the extent to which
the linear regressions differ between S and U, i.e. deviations from
parallelism).
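These orthogonality conditions are easy to verify numerically. The following Python fragment is an illustrative sketch (not part of the original calculations; the dictionary keys are just labels): it checks that every contrast of Table 13.8.2 has coefficients summing to zero, that the values of Σa² are as tabulated, and that every pair of contrasts is orthogonal.

```python
# Coefficients (a_i) of the five orthogonal contrasts for the symmetrical
# (3+3) assay, in the order S1, S2, S3, U1, U2, U3 (rows of Table 13.8.2).
contrasts = {
    "L1":  [-1, 0, 1, -1, 0, 1],   # linear regression
    "L1'": [1, 0, -1, -1, 0, 1],   # deviations from parallelism
    "Lp":  [-1, -1, -1, 1, 1, 1],  # between preparations (S and U)
    "L2":  [1, -2, 1, 1, -2, 1],   # deviations from linearity
    "L2'": [-1, 2, -1, 1, -2, 1],  # difference of curvature
}

# Each contrast has coefficients summing to zero ...
for name, a in contrasts.items():
    assert sum(a) == 0

# ... the tabulated values of sum(a^2) are 4, 4, 6, 12, 12 ...
assert [sum(ai * ai for ai in a) for a in contrasts.values()] == [4, 4, 6, 12, 12]

# ... and every pair of contrasts is orthogonal: the sum of products of
# corresponding coefficients is zero.
names = list(contrasts)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        dot = sum(p * q for p, q in zip(contrasts[names[i]], contrasts[names[j]]))
        assert dot == 0
```

The same check applied to the three contrasts of Table 13.8.1 shows why, for the (2+2) assay, the third contrast is forced once the first two are chosen.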

13.9. The theory of symmetrical parallel line assays. Use of
contrasts in the analysis of variance

The notation used is defined in § 2.1 and (13.8.1). In conformity with
the usual approach in the analysis of variance, it is required to calculate
from each contrast a quantity (the mean square) that will be an estimate
of the error variance, σ²[y], if the appropriate null hypothesis is true.
These estimates will then be compared with the error variance (which
estimates σ² whether or not the null hypotheses are true), in the usual
way (see § 11.4). Numerical examples are given in §§ 13.11 and 13.12.
The first step is to estimate the variance of a contrast. If T_j is used

to stand for the total of the n responses to the jth dose level, then the
contrasts defined in § 13.8 all have the form

L = Σ a_j T_j.    (13.9.1)

The variance of this, from (2.7.10), is Σ a_j² var(T_j), and from (2.7.4) it
follows that var(T_j) = n s²[y], where s²[y] is the estimated variance of
the observations and n is the number of observations in each total.
Thus

var[L] = n Σa² s²[y].    (13.9.2)
The values of Σa² are worked out in Tables 13.8.1 and 13.8.2.
It might be supposed that it is not possible to estimate the variance
of L directly from the observed scatter of values of L, because there is
only a single experimentally observed value for each contrast. However
it is what happens when the null hypothesis is true that is of interest,
and it was shown in § 13.8 that when it is true the population mean
value of each L will be zero. Now it was pointed out in § 2.6 (eqn
(2.6.3)) that if there are N observations of y, and if the population
mean value (μ) of y is known, then the estimate of the variance of y is
Σ(y−μ)²/N (the divisor is only N−1 when the sample mean is used in
place of μ). For a single value of L it follows that, on the null hypothesis,

var[L] = (L−0)²/1 = L².    (13.9.3)

Equating (13.9.2) and (13.9.3) shows that when the null hypothesis
(that the population value of L is zero) is true, an estimate of the error
variance is provided by

s² = L²/(n Σa²),    (13.9.4)

and this expression also gives the sum of squares required for the
analysis of variance, because each sum of squares has one degree of
freedom (see §§ 13.7, 13.8, 13.11, and 13.12), so the sum of squares is
the same as the mean square.
It is not difficult to show (try it) that, when the appropriate base is
used for the logarithms giving (13.2.6), (13.2.7), (13.2.12), and (13.2.13),
the sums of squares for testing validity given by the general formulae
(13.7.1), (13.7.2), and (13.7.3) are the same as those given by (13.9.4),
using the definitions of the contrasts in § 13.8. The demonstrations
follow the lines used in the next section.
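Eqn (13.9.4) translates directly into a one-line function. The sketch below (the function name is illustrative, not the book's) checks it with the linear regression and parallelism contrasts of the (2+2) example worked in § 13.11, where n = 4 and Σa² = 4:

```python
def contrast_ssd(L, n, sum_a2):
    """Sum of squares (with 1 degree of freedom) for a contrast,
    eqn (13.9.4): L^2 / (n * sum(a^2))."""
    return L ** 2 / (n * sum_a2)

# Contrast values from the example of section 13.11: L1 = 137, L1' = 9.
ssd_regression = contrast_ssd(137, 4, 4)   # 1173.0625
ssd_parallel = contrast_ssd(9, 4, 4)       # 5.0625
```

Because each such sum of squares has one degree of freedom, the value returned is also the mean square to be compared with the error mean square.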

13.10. The theory of symmetrical parallel line assays. Simplified
calculation of the potency ratio and its confidence limits

The general results in §§ 13.3, 13.4, and 13.6 can be simplified when
the appropriate dose metameter is used (see § 13.2). The notation, and
the definition of symmetrical, are given in (13.8.1). Numerical examples
are given in §§ 13.11 and 13.12. Try to suspend your belief that this is
a very complicated sort of simplification until you have compared the
calculations for symmetrical assays (in §§ 13.11 and 13.12) with those
for an unsymmetrical assay (§ 13.13).

The symmetrical (2+2) dose assay


The best dose metameter in this case was shown in § 13.2 to be
x = log_r z, where z = dose and r = √D. The consequences of using this
base for the logarithms, derived in § 13.2, can be used to simplify the
ratio (ȳU−ȳS)/b which occurs in the potency ratio and its confidence
limits. Taking the numerator first, (ȳU−ȳS) is, as expected, simply
related to the between-preparations contrast, Lp. Thus, from (13.8.1)
and (13.8.4),

(ȳU−ȳS) = (ΣyLU+ΣyHU−ΣyLS−ΣyHS)/(½N) = 2Lp/N.    (13.10.1)

The average slope, b (see § 13.4), is related to the regression contrast,
L1, as expected. From (13.4.5), (13.2.6), (13.2.8), and (13.8.2) it follows
that

b = [ΣyS(xS−x̄S) + ΣyU(xU−x̄U)] / [Σ(xS−x̄S)² + Σ(xU−x̄U)²]

  = [(xLS−x̄S)ΣyLS + (xHS−x̄S)ΣyHS + (xLU−x̄U)ΣyLU + (xHU−x̄U)ΣyHU] / ΣΣ(x−x̄)²

  = L1/N,    (13.10.2)

because, with this dose metameter, the deviations (x−x̄) are −1 for the
low doses and +1 for the high doses, and, from (13.2.8), ΣΣ(x−x̄)² = N.
Combining (13.10.1) and (13.10.2) gives

(ȳU−ȳS)/b = 2Lp/L1.    (13.10.3)

Furthermore, from (13.2.5),

x̄S − x̄U = xLS − xLU = log_r zLS − log_r zLU = log_r(zLS/zLU),    (13.10.4)

and from (13.3.6)

(x̄S − x̄U)·log10 r = log10(zLS/zLU).    (13.10.5)

The potency ratio

Substituting (13.10.3) and (13.10.5) into the general formula for the
log potency ratio, (13.3.7), gives

log10 R = log10(zLS/zLU) + (2Lp/L1)·log10 r.

Putting r = √D (remembering that log10 √D = log10 D^½ = ½ log10 D),
taking antilogarithms gives

R = (zLS/zLU)·antilog10[(Lp/L1)·log10 D].    (13.10.6)
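Eqn (13.10.6) can be evaluated mechanically. A sketch in Python (the function name is illustrative), checked against the (+)-tubocurarine example worked in § 13.11:

```python
import math

def potency_ratio_2x2(dose_ratio, Lp, L1, D):
    """Potency ratio for a symmetrical (2+2) dose assay, eqn (13.10.6).

    dose_ratio = z_LS / z_LU; D = ratio of high to low dose.
    antilog10(x) is simply 10**x.
    """
    return dose_ratio * 10 ** ((Lp / L1) * math.log10(D))

# Figures from the example of section 13.11: Lp = 13, L1 = 137,
# D = 0.32/0.28, z_LS/z_LU = 14.0 ug / 0.28 ml.
R = potency_ratio_2x2(14.0 / 0.28, Lp=13.0, L1=137.0, D=0.32 / 0.28)
```

If the standard dose is expressed in μg and the unknown in ml, R comes out in μg/ml, i.e. as the concentration of the unknown, for the reasons discussed in § 13.11.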

The confidence limits

It was mentioned in § 13.6 that (ȳU−ȳS) is not correlated with b and
so v12 = 0. This follows (using (13.10.1) and (13.10.2)) from the fact
that Lp and L1 were shown in § 13.8 to be uncorrelated.
From (13.6.2) and (13.8.1),

v11 = (1/nU + 1/nS) = (1/½N + 1/½N) = 4/N.    (13.10.7)

And from (13.6.3) and (13.2.8),

v22 = 1/ΣΣ(x−x̄)² = 1/N.    (13.10.8)

Substituting (13.10.2), (13.10.3), (13.10.5), (13.10.7), (13.10.8), and r =
√D (again log √D = ½ log D) into the general formula for the
confidence limits for log10 R (13.6.6), and taking antilogarithms, gives
the confidence limits for the population value of R, the potency ratio
in a symmetrical (2+2) dose assay, as

(zLS/zLU)·antilog10[{Lp/(L1(1−g)) ± (st/(L1(1−g)))·√(N(1−g)+N(Lp/L1)²)}·log10 D]

(13.10.9)

where, from (13.6.5), (13.10.2), and (13.10.8),

g = N s² t²/L1².    (13.10.10)

If g is very small, so (1−g) ≃ 1, then (13.10.9) can be further
simplified. As explained in § 13.6, this is equivalent to treating log10 R
as approximately normally distributed, and calculating confidence
limits for its population value as log10 R ± t s[log10 R], as in § 7.4,
where the approximate standard deviation of log10 R, from (13.10.9)
(or from (13.6.8)), is

s[log10 R] ≃ (log10 D/L1)·√[s² N{1+(Lp/L1)²}].    (13.10.11)

A numerical example of the use of (13.10.9) and (13.10.11) is given in
§ 13.11.
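The full limits (13.10.9), with g from (13.10.10), can be sketched as a short function (illustrative only, with names of our own choosing); the figures used in the check below are those of the example in § 13.11:

```python
import math

def fieller_limits_2x2(dose_ratio, Lp, L1, D, N, s2, t):
    """Confidence limits for the potency ratio of a symmetrical (2+2)
    dose assay, eqns (13.10.9) and (13.10.10). Illustrative sketch."""
    g = N * s2 * t ** 2 / L1 ** 2                      # eqn (13.10.10)
    m = Lp / (L1 * (1 - g))
    half = (math.sqrt(s2) * t / (L1 * (1 - g))) * math.sqrt(
        N * (1 - g) + N * (Lp / L1) ** 2)
    logD = math.log10(D)
    # Lower and upper limits on the antilog scale.
    return tuple(dose_ratio * 10 ** ((m + sgn * half) * logD) for sgn in (-1, 1))

# Figures from the example of section 13.11.
lo, hi = fieller_limits_2x2(50.0, Lp=13.0, L1=137.0, D=0.32 / 0.28,
                            N=16, s2=3.674, t=2.262)
```

When g is small the interval collapses towards the symmetrical Gaussian approximation of (13.10.11), as the numerical example in § 13.11 shows.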

The symmetrical (3+3) dose assay


The simplifications follow exactly the same lines as those just
described. From (13.8.1) and the definitions of the contrasts in Table
13.8.2,

(ȳU−ȳS) = (ΣyU1+ΣyU2+ΣyU3−ΣyS1−ΣyS2−ΣyS3)/(½N) = 2Lp/N,    (13.10.12)

and, from the general definition of the slope b, (13.4.5), using (13.2.12),
(13.2.14), and the definition of L1 in Table 13.8.2,

b = [1/ΣΣ(x−x̄)²]·[(xS1−x̄S)ΣyS1 + (xS2−x̄S)ΣyS2 + (xS3−x̄S)ΣyS3
      + (xU1−x̄U)ΣyU1 + (xU2−x̄U)ΣyU2 + (xU3−x̄U)ΣyU3]

  = 3L1/2N.    (13.10.13)

Combining (13.10.12) and (13.10.13) gives

(ȳU−ȳS)/b = 4Lp/3L1.    (13.10.14)

Furthermore, from (13.2.11),

(x̄S−x̄U) = (xS1−xU1) = log_r zS1 − log_r zU1 = log_r(zS1/zU1),    (13.10.15)

so, from (13.3.6),

(x̄S−x̄U)·log10 r = log10(zS1/zU1).    (13.10.16)

Substituting these results, together with r = D (see § 13.2), into the
general formulae, as above, gives the potency ratio, from (13.3.7), as

R = (zS1/zU1)·antilog10[(4Lp/3L1)·log10 D].    (13.10.17)

Confidence limits for the population value of R, from (13.6.6) (with
v11 = 4/N, v22 = 3/2N), are

(zS1/zU1)·antilog10[{4Lp/(3L1(1−g)) ± (4st/(3L1(1−g)))·√(N(1−g)+(2N/3)(Lp/L1)²)}·log10 D],

(13.10.18)

where

g = 2N s² t²/3L1².    (13.10.19)

Again, if g is small, so 1−g ≃ 1, further simplification is possible.
As explained above and in § 13.6, the confidence limits for the popula-
tion value of log10 R can be found, as in § 7.4, from log10 R ± t s[log10 R],
where the approximate standard deviation of log10 R, from (13.10.18)
or (13.6.8), can be written in the form

s[log10 R] ≃ (4 log10 D/3L1)·√[s²{N+(2N/3)(Lp/L1)²}].    (13.10.20)

A numerical example of the use of (13.10.18) and (13.10.20) is given in
§ 13.12.
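A sketch of (13.10.18) and (13.10.19) in the same illustrative style (the function name is ours); the quantities plugged in below are those of the example in § 13.12:

```python
import math

def fieller_limits_3x3(dose_ratio, Lp, L1, D, N, s2, t):
    """Confidence limits for the potency ratio of a symmetrical (3+3)
    dose assay, eqns (13.10.18) and (13.10.19). Illustrative sketch."""
    g = 2 * N * s2 * t ** 2 / (3 * L1 ** 2)            # eqn (13.10.19)
    m = 4 * Lp / (3 * L1 * (1 - g))
    half = (4 * math.sqrt(s2) * t / (3 * L1 * (1 - g))) * math.sqrt(
        N * (1 - g) + (2 * N / 3) * (Lp / L1) ** 2)
    logD = math.log10(D)
    return tuple(dose_ratio * 10 ** ((m + sgn * half) * logD) for sgn in (-1, 1))

# Quantities from the example of section 13.12: N = 30, s2 = 5.838,
# t = 2.086, Lp = -50, L1 = 203, D = 2, z_S1/z_U1 = 4/8.
g = 2 * 30 * 5.838 * 2.086 ** 2 / (3 * 203.0 ** 2)     # small, about 0.0123
lo, hi = fieller_limits_3x3(0.5, Lp=-50.0, L1=203.0, D=2.0,
                            N=30, s2=5.838, t=2.086)
```

Only the constants differ from the (2+2) version, reflecting the different values of v11, v22, and the slope in terms of L1.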

13.11. A numerical example of a symmetrical (2+2) dose
parallel line assay
The results of a symmetrical assay of (+)-tubocurarine based on a
randomized block design (see § 11.6) are shown in Table 13.11.1. The
mean responses are plotted in Fig. 13.11.1. The response, y, was the
percentage inhibition, caused by each dose, of the contraction (induced

by stimulation of the phrenic nerve) of the isolated rat diaphragm.
The four doses (or 'treatments') were allotted arbitrarily to the numbers
0, 1, 2, 3 as described in § 2.3:

Dose 0 = LU = 0·28 ml of unknown solution,
Dose 1 = HU = 0·32 ml of unknown solution,
Dose 2 = HS = 16·0 μg of (+)-tubocurarine,
Dose 3 = LS = 14·0 μg of (+)-tubocurarine.

The doses were given in sequence to the same tissue (see § 13.1, p. 286),
the blocks, in this case, corresponding to periods of time. The analysis

TABLE 13.11.1
Results of a symmetrical (2+2) dose assay of (+)-tubocurarine. The doses
were given in random order in each block (time period) as described in
the text, not in the order shown in the table

                 LS     HS     LU     HU    Totals

         1       43     62     41     61     207
Block    2       48     62     48     68     226
         3       53     66     53     70     242
         4       52     70     56     72     250

Totals          196    260    198    271     925

The arrangement in blocks should therefore help to eliminate error due
to changes of sensitivity with time (which did occur in this experiment).
However it seems most unlikely that the responses in one block (period
of time) will differ from the responses in another by a constant amount, as
specified in the model (eqn (11.2.2)) on which the analysis is based
(see §§ 11.2 and 11.6), so the analysis should be regarded as only an
approximation. The four doses were given in strictly random order
within each block, the order being decided with random number tables
(see § 2.3).
The assumptions of the analysis (normal distribution of errors, equal
scatter in all groups, independence of each response from previous
responses, additivity, etc.) have been discussed in §§ 11.2 and
13.1, p. 279. The analysis is the same as that for randomized block

experiments (§ 11.6), with the addition that the between-treatments
sum of squares can be split into components as described in § 13.7.
Because this assay is symmetrical (see (13.8.1)) the arithmetic can be
simplified using the results in §§ 13.8, 13.9, and 13.10. Remember that

FIG. 13.11.1. Results of symmetrical (2+2) dose assay from Table 13.11.1.
○ Observed mean responses.
—— Least squares lines constrained to be parallel (i.e. with
mean slope, see §§ 13.4, 13.10 and calculations at end of
this section).
Notice break on abscissa. The question of the units for the potency ratio,
50·64, is discussed later in this section.

the assumptions discussed in §§ 4.2, 7.2, 11.2, and 12.2 have not been
tested, so the results are more uncertain than they appear.

The analysis of variance of the response (y)

The figures in Table 13.11.1 are actually identical with those in Table
11.6.2, which were used to illustrate the randomized block analysis.
The lower part of the analysis of variance (Table 13.11.2) is therefore
identical with Table 11.6.3. The calculations were described on p. 199.
(A similar example is worked out in § 13.12.) All that remains to be
done is to partition the between-treatments sum of squares, 1188·6875,
using the simplifications made possible by the symmetry of the assay.
(a) Linear regression. The linear regression contrast defined in
(13.8.2) (or by the coefficients in Table 13.8.1) is found, using the
response totals from Table 13.11.1, to be

L1 = −196+260−198+271 = 137.

The sum of squared deviations (SSD) for linear regression is found
using eqn (13.9.4):

SSD = L1²/(nΣa²) = 137²/(4×4) = 1173·0625.

In this expression n is the number of responses in each total used
in the calculation of L (see (13.8.1)), and Σa² is the sum of the squares
of the coefficients of the response totals in L (given in Table 13.8.1;
in the particular case of the (2+2) dose assay it is 4 for all 3 contrasts).
(b) Deviations from parallelism. The deviations from parallelism
contrast defined in (13.8.3) is

L1' = 196−260−198+271 = 9·0.

The sum of squares is found, as above, from (13.9.4):

SSD = L1'²/(nΣa²) = 9²/(4×4) = 5·0625.
(c) Between standard and unknown preparations. The contrast,
defined in (13.8.4), is

Lp = −196−260+198+271 = 13·0,

and the sum of squares, using (13.9.4) as above, is

SSD = 13²/(4×4) = 10·5625.

(d) Check on arithmetical accuracy. The sum of the three components
is 1173·0625+5·0625+10·5625 = 1188·6875, the same as the between-
treatments sum of squares which was calculated independently.
These results are assembled in the analysis of variance, Table 13.11.2.
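The whole partition can be reproduced from the response totals of Table 13.11.1; a sketch (the helper function is illustrative only):

```python
# Response totals from Table 13.11.1, in the order LS, HS, LU, HU.
totals = [196, 260, 198, 271]
n = 4  # responses contributing to each total

def contrast(coeffs, totals, n):
    """Value of a contrast and its 1-d.f. sum of squares, eqn (13.9.4)."""
    L = sum(a * T for a, T in zip(coeffs, totals))
    return L, L ** 2 / (n * sum(a * a for a in coeffs))

L1,  ssd_reg  = contrast([-1, 1, -1, 1], totals, n)   # linear regression
L1p, ssd_par  = contrast([1, -1, -1, 1], totals, n)   # parallelism
Lp,  ssd_prep = contrast([-1, -1, 1, 1], totals, n)   # between preparations

# The three 1-d.f. components add up to the between-treatments SSD.
between_treatments = ssd_reg + ssd_par + ssd_prep     # 1188.6875
```

The check of step (d) corresponds to the last line: the three orthogonal components account exactly for the between-treatments sum of squares.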

Interpretation of the analysis of variance

Dividing each mean square by the error mean square, in the usual
way, gives the variance ratios F. As usual, all the mean squares would
be an estimate of the same variance σ² if all 16 observations were ran-
domly selected from a single population with variance σ². This is
the basic all-embracing form of the null hypothesis because if it were
true there would obviously be no differences between treatments,
blocks, preparations, etc. In fact, when the variance ratio for linear
regression, F = 319·3 with f1 = 1 and f2 = 9 degrees of freedom, is

looked up in tables of the distribution of the variance ratio, as described
in § 11.3, it is found that a value of F(1,9) as large as, or larger than,
319·3 would be very rarely (P ≪ 0·001) observed if both 1173·0625 and
3·674 were estimates of the same variance (σ²), i.e. if there were in
fact no tendency for the high doses to give larger responses than low
doses (see §§ 13.8 and 13.9). It is therefore preferred to reject the null
hypothesis in favour of the alternative hypothesis that response does

TABLB 13.11.2
AMlYN 01 tHJriance 01 re8pO'l&8U lor aymmetncal (2+2) do8e assay 01
( +) tubocurarine. The lower pari 01 the aMl,N i8 identical with Table
11.6.3 which was calcvlaIed uftng the _me ftguru
Source of variation d.f. SSD 1tIB F P

Linear regreaaion 1 1178·0625 1178·0625 819·8 <0·001


Deviations from
parallelism 1 5·0625 5·0625 1·88 >0·2
Between preps.
(Sand U) 1 10·5625 10·5625 2·87 0·1-0·2

Between treatments 8 1188·6875 896·229 107·86 <0·001


Between blocks 8 270·6875 90·229 24,·56 <0·001
Error 9 88·0625 8·674,

Total 15 a92·4,875

change with dose (i.e. that β, the population value of b, is not zero, cf.
§ 12.5). The logical reason for this preference was discussed in § 6.1.
Proceeding similarly for the other variance ratios shows that devia-
tions from parallelism such as those observed would be quite common
if the true (population) lines were parallel. The same (or larger) devia-
tions from parallelism would be expected in more than 20 per cent of
repeated experiments if the population lines had the same slope
(βS = βU). There is therefore no evidence against the hypothesis of
parallelism.
Similarly there is little evidence that the average responses are
different for standard and unknown. Of course it is most unlikely that
they are exactly equipotent, but differences as large as, or larger than,
those observed would not be very uncommon if they were (see p. 93).
There appears to be a real difference between blocks. Differences as
large as, or larger than, those observed would be expected in less than
1 in 1000 experiments if the population block means were equal; cf.
§ 11.6. Inspection of the results reveals a tendency for the responses to
get larger with time, and the analysis suggests that this cannot be
attributed to experimental error. The arrangement in blocks has
therefore helped to decrease the experimental error.
All these inferences depend on the assumptions of §§ 4.2, 7.2, 11.2,
and 12.2 being sufficiently nearly true. If they were, the conclusion
would be that there is no evidence that the assay is invalid, so it is
not unreasonable to carry on and calculate the potency ratio and its
confidence limits.

The potency ratio and the question of units

The simplified result for the symmetrical (2+2) dose parallel line
assay, eqn (13.10.6), gives the least squares estimate of the potency as

R = (zLS/zLU)·antilog10[(Lp/L1)·log10 D]

  = (14·0/0·28)·antilog10[(13·0/137·0)×0·05799]

  = 50·antilog10(0·005503) = 50·64 μg/ml.

D is the ratio between high and low doses (see (13.8.1)), i.e.
D = 0·32/0·28 = 16·0/14·0 = 1·14286 (so log10 D = 0·05799), and the
contrasts Lp and L1 have been already calculated. In § 13.3 and later
sections it was assumed that all doses were expressed in the same units.
This means that zLS/zLU, and hence R, is a dimensionless ratio. In
this case the dose of standard was given in μg, and that of unknown
in ml, so zLS/zLU = 14·0 μg/0·28 ml = 50·0 μg/ml. If these units are
used zLS/zLU, and hence R, will have the units μg/ml, suggesting that,
if these units are used, R is actually the potency (concentration in
μg/ml) of the unknown, rather than a potency ratio. It can easily be
seen that this is so by converting standard and unknown to the same
units. For example the doses of standard could be assumed to be
16·0 ml and 14·0 ml of a 1·0 μg/ml standard solution of (+)-tubocurarine
(the fact that they are more likely, in reality, to have been 0·16 ml and
0·14 ml of a 100 μg/ml solution does not alter the dose given). This
would give zLS/zLU = 14·0 ml/0·28 ml = 50 (a dimensionless ratio).
The potency ratio would therefore be 50·64, as above, also a dimension-
less ratio. The concentration of the unknown is, from the definition of
the potency ratio (13.3.1), R × concentration of standard = 50·64 ×
1·0 μg/ml = 50·64 μg/ml, as found above.

Confidence limits for the potency ratio

The simplified form of Fieller's theorem appropriate to this assay
is eqn (13.10.9), which gives confidence limits for the population
value of the potency ratio as

(zLS/zLU)·antilog10[{Lp/(L1(1−g)) ± (st/(L1(1−g)))·√(N(1−g)+N(Lp/L1)²)}·log10 D],

where g = Ns²t²/L1², according to (13.10.10).
If the doses are expressed in their original units the equation will
give confidence limits for the concentration of unknown, rather than
for the potency ratio, for exactly the reasons explained above in
connection with the potency ratio calculation. In this example:

zLS/zLU = 14·0/0·28 = 50·0 μg/ml, as above;
Lp/L1 = 13·0/137·0 = 0·09489;
log10 D = log10(0·32/0·28) = 0·05799;
s²[y] = 3·674, the error variance (from Table 13.11.2) with 9 degrees
of freedom;
t = 2·262 for P = 0·05 (as 95 per cent confidence limits are wanted)
with 9 d.f. (from tables of Student's t, see §§ 4.4 and 7.4); thus
g = 16×3·674×2·262²/137·0² = 0·0160, from equation (13.10.10),
and (1−g) = 1−0·016 = 0·984.

The fact that g is considerably less than one implies that the slope, b,
is much larger than its standard deviation (as inferred from the large
variance ratio for linear regression in Table 13.11.2). This means that
it is safe to use an approximate equation, based on (2.7.16), for the
variance of the log potency ratio (as discussed in §§ 13.5, 13.6, and
13.10, and illustrated below). However, it is very little trouble to use
the full equation. Substituting the above quantities into the equation
for the limits gives

50·antilog10[{0·09489/0·984 ± (1·917×2·262/(137·0×0·984))·√(16×0·984 + 16×0·09489²)}×0·05799]
= 50·antilog10[−0·00184 and +0·01303]†
= 49·79 μg/ml and 51·52 μg/ml.

† If necessary, see p. 325 for a footnote describing how to find the antilog of a
negative number.

Approximate confidence limits

Because g is much less than 1, the approximate formula for the limits,
eqn (13.10.11), can be used (see §§ 13.5, 13.6, and 13.10). Substituting
the quantities already calculated into (13.10.11) gives the estimated
standard deviation of log10 R as

s[log10 R] ≃ (0·05799/137·0)·√[3·674×16(1+0·09489²)] = 0·00326.

The approximate confidence limits are therefore, as in § 7.4,

log10 R ± t s[log10 R] = log10 50·64 ± (2·262×0·00326) = 1·6971
and 1·7118.

Taking antilogs gives the approximate 95 per cent Gaussian confidence
limits for the true value of R as 49·78 and 51·51 μg/ml, not much
different from the values found from the full equation, which are
themselves, of course, only approximate, as explained in § 7.2.

Summary of the results

There is no evidence that the assay is invalid, and the estimated
potency of the unknown tubocurarine solution is 50·64 μg/ml, with
95 per cent Gaussian confidence limits 49·79 μg/ml to 51·52 μg/ml. These
conclusions are based on the assumptions discussed in §§ 4.2, 7.2, 11.2,
and 12.2. The confidence limits are, as usual, likely to be too narrow
(see § 7.2). Notice that the confidence limits for R are not equally
spaced on each side of R, unlike the limits encountered in Chapter 7.
In fact even the limits for log R are not equally spaced on each side of
log R unless g is small (see §§ 13.5, 13.6, and 13.10).

How to plot the results. Conversion to convenient units

When the results of the assay are plotted, as in Fig. 13.11.1, it will
be preferable to plot the least squares lines. The calculated average
slope, b, has been found using logs to base √D (see § 13.2) so these
must be used in plotting the graph (they can be found from logs to
base 10 using (13.3.5)). Alternatively, if the graph is plotted with log10
(dose) along the abscissa, as in Fig. 13.11.1, the calculated slope must
be converted to the correct units. In this example b = L1/N = 137·0/
16 = 8·5625 (from eqn (13.10.2)).
To convert from logs to base √D to logs to base 10, it is necessary,
using (13.3.5), to multiply the former by log10 √D, as in § 13.3. Because

dose occurs in the denominator of the slope, b must be divided† by
log10 √D, i.e. by ½ log10 D = ½ log10(0·32/0·28) = 0·0290. The required
slope is therefore b′ = 8·5625/0·0290 = 295·3. The dose response
curves have the eqns (13.3.3),

YS = ȳS + b′(xS − x̄S),
YU = ȳU + b′(xU − x̄U),

where x is now being used to stand for log10(dose), the abscissa of
Fig. 13.11.1. The response means are, from Table 13.11.1, ȳS = (196
+260)/8 = 57·0 and ȳU = (198+271)/8 = 58·625. The dose means
have not been needed explicitly because of the simplifications resulting
from the choice of dose metameter. For the standard, log10 16·0
= 1·2041 and log10 14·0 = 1·1461 so x̄S = (4×1·2041+4×1·1461)/8
= 1·1751 (each dose occurs four times, remember). Similarly log10 zHU
= log10 0·32 = −0·4949 and log10 zLU = log10 0·28 = −0·5528, so x̄U
= (4×(−0·4949)+4×(−0·5528))/8 = −0·5238.
Substituting these results gives the lines plotted in Fig. 13.11.1 as

YS = 57·0+295·3(xS−1·1751),    YU = 58·625+295·3(xU+0·5238).
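The conversion of the slope and the calculation of the fitted parallel lines can be scripted; a sketch using the figures above (variable names are ours):

```python
import math

L1, N, D = 137.0, 16, 0.32 / 0.28

b = L1 / N                            # slope in log-to-base-sqrt(D) units, 8.5625
b_prime = b / (0.5 * math.log10(D))   # slope per unit of log10(dose), ~295.3

# Response and log10(dose) means (each dose occurs four times).
ybar_S = (196 + 260) / 8              # 57.0
ybar_U = (198 + 271) / 8              # 58.625
xbar_S = (math.log10(16.0) + math.log10(14.0)) / 2    # ~1.1751
xbar_U = (math.log10(0.32) + math.log10(0.28)) / 2    # ~-0.5238

# Fitted lines constrained to the common slope b_prime.
def Y_standard(x):
    return ybar_S + b_prime * (x - xbar_S)

def Y_unknown(x):
    return ybar_U + b_prime * (x - xbar_U)
```

Each line passes through the point (x̄, ȳ) for its own preparation, which is what the least squares constraint to a common slope requires.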

13.12. A numerical example of a symmetrical (3+3) dose
parallel line assay

The results in Table 13.12.1, which are plotted in Fig. 13.12.1, are
measures of the tension developed by the isolated guinea pig ileum in
response to pure histamine (standard), and to a solution of a histamine
preparation containing various impurities as well as an unknown
amount of histamine. Five replicates of each of the six doses were
given, all to the same tissue, so there is a danger that one response
may affect the size of the next, contrary to the necessary assumption
that this does not happen (see discussion in § 13.1, p. 286). The doses
were arranged in five randomized blocks. The purpose of this arrange-
ment is the same as in § 13.11, and, as in that example, the order in
which the six doses were given in each block was decided strictly
at random using random number tables (see § 2.3).
This is a symmetrical assay as defined in (13.8.1), there being n = 5
responses at each of the k = 6 dose levels; kS = kU = 3 dose levels
for standard and for unknown; nS = nU = 15 responses for standard
† More rigorously, the slope using log10(dose) is

dy/d log10 z = dy/d(log√D z · log10 √D) = (1/log10 √D)·(dy/d log√D z) = b/log10 √D.

TABLE 13.12.1
Responses of the isolated ileum. The doses were given in random order
(see text) in each block (time period), not in the order shown in the table

              Standard histamine              Unknown

           S1       S2       S3        U1       U2       U3
        4 ng/ml  8 ng/ml  16 ng/ml  8 ng/ml  16 ng/ml  32 ng/ml   Total

   1      20·5     27·0      …        18·5     30·0      …        169·0
   2      18·5     31·5      …        15·0     24·0      …        167·5
   3      20·0     26·0      …        13·0     26·0      …        158·5
   4      18·0     23·5     41·5      13·5     26·0     35·0      157·5
   5      20·0     25·0     38·5      12·0     25·0     32·0      152·5

Total     97·0    133·0    197·5      72·0    131·0    174·5      805·0

FIG. 13.12.1. Results of symmetrical (3+3) dose assay from Table 13.12.1.
○ Observed mean responses to standard.
● Observed mean responses to unknown.
—— Least squares lines constrained to be parallel (see §§ 13.4, 13.10).
The analysis indicates that these straight lines may well not fit the observations
adequately. The abscissa shows three equivalent ways of plotting the log dose
(log10 dose, 0·602 to 1·505; log2 dose, 2 to 5; dose, 4 to 32 ng/ml on a logarithmic
scale). Note that the ordinate does not start at zero.

and for unknown. The ratio between each dose and the one below it is
D = 2 throughout. The first stage is to perform an analysis of variance
on the responses to test the assay for non-validity. As for all assays,
this is a Gaussian analysis of variance, and the assumptions that
must be made have been discussed in §§ 4.2, 7.2, 11.2, and 12.2, which
should be read. Uncertainty about the assumptions means, as usual,
that the results are more uncertain than they appear.

Analysis of variance of the response (y)

The first thing to be done is, as in § 13.11, a conventional Gaussian
analysis of variance for randomized blocks. Proceeding as in § 11.6,

(1) correction factor = (Σy)²/N = 805²/30 = 21600·8333;
(2) sum of squares between doses (treatments), with k−1 = 5
degrees of freedom, from (11.4.5),
= 97·0²/5 + 133·0²/5 + … + 174·5²/5 − 21600·8333 = 2179·0667;
(3) sum of squares between blocks, with 5−1 = 4 degrees of freedom,
from (11.6.1),
= 169·0²/6 + … + 152·5²/6 − 21600·8333 = 32·8333;
(4) total sum of squares, from (2.6.5) or (11.4.3),
= Σ(y−ȳ)² = 20·5²+18·5²+…+35·0²+32·0²−21600·8333 =
2328·6667;
(5) error (or residual) sum of squares, by difference,
= 2328·6667−(2179·0667+32·8333) = 116·7667
with 29−5−4 = (6−1)(5−1) = 20 degrees of freedom.

These results can now be entered in the analysis of variance table,
Table 13.12.2. The next stage is to account for the differences observed
between the responses to the six doses, i.e. to partition the between-
doses sum of squares into components representing different sources
of variability, as described in § 13.7. The simplified method described
in § 13.8 can be used because the assay is symmetrical. The coefficients,
a_i, for construction of the contrasts are given in Table 13.8.2.
(a) Linear regression. From the coefficients in Table 13.8.2, and the
response totals in Table 13.12.1, the linear regression contrast is

L1 = −97·0+197·5−72·0+174·5 = 203·0.

The corresponding sum of squares for linear regression is found, using
(13.9.4), to be

SSD = L1²/(nΣa²) = 203²/(5×4) = 2060·45.

In this expression n = 5 is the number of responses at each dose level
(i.e. in each total), and Σa², the sum of squares of the coefficients, is
given in Table 13.8.2.
(b) Deviations from parallelism. The deviations from parallelism
contrast, from Table 13.8.2, is

L1' = 97·0−197·5−72·0+174·5 = 2·0.

The corresponding sum of squares is

SSD = L1'²/(nΣa²) = 2·0²/(5×4) = 0·20.
(c) Between standard and unknown preparations. The contrast, from
Table 13.8.2, is

Lp = −97·0−133·0−197·5+72·0+131·0+174·5 = −50·0.

The sum of squares, from (13.9.4) (using Σa² = 6 from Table 13.8.2), is

SSD = Lp²/(nΣa²) = 50·0²/(5×6) = 83·33.

(d) Deviations from linearity. The contrast, from Table 13.8.2, is

L2 = 97·0−(2×133·0)+197·5+72·0−(2×131·0)+174·5 = 13·0

and the corresponding sum of squares, as before, is

SSD = L2²/(nΣa²) = 13·0²/(5×12) = 2·82.

(e) Difference of curvature. The contrast, from Table 13.8.2, is

L2' = −97·0+(2×133·0)−197·5+72·0−(2×131·0)+174·5 = −44·0,

and the corresponding sum of squares

SSD = (L2')²/(nΣa²) = (−44·0)²/(5×12) = 32·27.

(f) Check on arithmetical accuracy. The total of the five sums of squares
just calculated is

2060·45+0·20+83·33+2·82+32·27 = 2179·07,

agreeing, as it should, with the sum of squares between doses which
was calculated independently above.
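The whole partition for this example can be reproduced from the dose totals of Table 13.12.1; a sketch (names illustrative):

```python
# Dose totals from Table 13.12.1 (S1, S2, S3, U1, U2, U3); n = 5 per total.
totals = [97.0, 133.0, 197.5, 72.0, 131.0, 174.5]
n, N = 5, 30

cf = sum(totals) ** 2 / N                              # correction factor
between_doses = sum(T ** 2 for T in totals) / n - cf   # 2179.0667

coeffs = {
    "L1":  [-1, 0, 1, -1, 0, 1],
    "L1'": [1, 0, -1, -1, 0, 1],
    "Lp":  [-1, -1, -1, 1, 1, 1],
    "L2":  [1, -2, 1, 1, -2, 1],
    "L2'": [-1, 2, -1, 1, -2, 1],
}
ssd = {}
for name, a in coeffs.items():
    L = sum(ai * Ti for ai, Ti in zip(a, totals))
    ssd[name] = L ** 2 / (n * sum(ai * ai for ai in a))   # eqn (13.9.4)

# The five 1-d.f. components account for the whole between-doses SSD.
assert abs(sum(ssd.values()) - between_doses) < 1e-6
```

This reproduces the check of step (f): the five orthogonal contrasts partition the between-doses sum of squares exactly.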
All these results are now assembled in an analysis of variance table,
Table 13.12.2, which is completed as usual (cf. §§ 11.6 and 13.7).
Divide each sum of squares by its number of degrees of freedom to find
TABLE 13.12.2
The P value marked † is found from the reciprocal, F = 5·838/0·2;
see text

Source                   d.f.      SSD         MS         F         P

Linear regression          1     2060·45    2060·45    352·9    ≪0·001
Deviations from
  parallelism              1        0·20       0·20      0·034   0·8–0·9†
Between S and U            1       83·33      83·33     14·27    ≃0·001
Deviations from
  linearity                1        2·82       2·82      0·48    >0·2
Difference of
  curvature                1       32·27      32·27      5·53    <0·05

Between doses              5     2179·07     435·813    74·65   ≪0·001
Between blocks             4       32·83       8·208     1·41    >0·2
Error                     20      116·77       5·838

Total                     29     2328·67

the mean squares. Then divide each mean square by the error mean
square to find the variance ratios. The value of P is found from tables
of the distribution of the variance ratio as described in § 11.3. As
usual P is the probability of seeing a variance ratio equal to or
greater than the observed value if the null hypothesis (that all 30
observations were randomly selected from a single population) were
true.
Interpretation of the analysis of variance

The interpretation of analyses of variance has been discussed in
§§ 6.1, 11.3, and 11.6 and in the preceding example, § 13.11. As usual
it is conditional on the assumptions being sufficiently nearly true, and
must be regarded as optimistic (see §§ 7.2, 11.2, and 12.2). There is no

evidence for differences between blocks, so little or nothing was gained,
and some degrees of freedom were lost, by using the block arrangement
in this particular case (of. § 13.11). The average slope of the dose
response curves, shown in Fig. 13.12.1, is clearly not likely to be zero
because if it were, a value of F ~ 362·9 would be exceedingly rare.
The question of pa.raJlelism is interesting, especially as the standard
and unknown were not identical substances. The variance ratio,
F(l,20) = 0·2/6·838 = 0·034, is very small so there is no hint of
deviations from pa.raJlelism. To find the P value for F < 1 the method
described in § 11.3 can be used. Looking up F(20,l) = 6·838/0·2
= 29·2 in tables of the variance ratio gives the probability of observing
an F value of 29·2 or larger as something between 0·1 and 0·2. Therefore
the probability of observing F(l,20) ~ 0·034 is 0·1-0·2,-not so rare
that the lines must be considered as more nearly parallel than would
be expected on the basis of the observed experimental error. Another
way of stating the result is that in 80-90 per cent of repeated experi-
ments the F value for deviations from parallelism would be predicted
to be greater than 0·034 if the population lines were parallel.
Though neither the standard nor the unknown observations lie
on straight lines, as seen in Fig. 13.12.1, the analysis of variance gives
no hint of deviations from linearity. This is because the average of the
two lines (to which the analysis refers) is very nearly straight. The
observations lie on lines that curve in opposite directions so the curvatures
cancel when the slopes are averaged. In fact an F value corresponding
to a difference in curvature as large as, or larger than, the
observed one would be expected to occur, as a result of experimental
error, in rather less than 5 per cent of repeated experiments. This
cannot be explained further without doing more experiments. There
could be a real difference in curvature as a result of the impurities in
the unknown solution. On intuitive pharmacological grounds this
does not seem very likely so perhaps there is no real difference in
curvature and a rarish (rather less than 1 in 20) chance has come off
(see § 6.1). More experiments would be needed to tell.
If the possibility of a real difference in curvature were not considered
to invalidate the assay, the potency ratio and its confidence limits
would be calculated as follows.

The potency ratio


In this example the doses of both standard and unknown are expressed
in the same units (ng/ml), so the units problem discussed in
§ 13.11 does not arise. The least squares estimate of the potency ratio,
from (13.10.17), is

    R = (zS1/zU1) . antilog₁₀[(4LP/3L1) × log₁₀D]

      = (4/8) . antilog₁₀[{4×(−50)/(3×203)} × log₁₀2]

      = 0·5 antilog₁₀(−0·09885) = 0·5×0·7964 = 0·398†


From the definition of the potency ratio, (13.3.1), conoentration of
unknown = R X concentration of standard. The unknown preparation
is thus estimated to contain 39·8 per oent by weight of histamine,
assuming that the impurities in it do not interfere with the assay.
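The arithmetic above is easy to check with a few lines of Python (a sketch; the variable names are mine, and the contrasts LP = −50 and L1 = 203 are the values quoted in the text from Table 13.12.2):

```python
import math

# Potency ratio for the symmetrical (3+3) dose assay, eqn (13.10.17).
z_s1, z_u1 = 4.0, 8.0    # lowest standard and unknown doses (ng/ml)
L_P, L_1 = -50.0, 203.0  # contrasts quoted in the text
log_D = math.log10(2)    # log10 of the ratio between adjacent doses

log_term = (4 * L_P) / (3 * L_1) * log_D  # the log to be antilogged, -0.09886
R = (z_s1 / z_u1) * 10 ** log_term        # antilog of a negative number (cf. footnote)
print(round(R, 3))  # 0.398, i.e. 39.8 per cent histamine by weight
```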

Confidence limits for the potency ratio


The simplified form of Fieller's theorem for the (3+3) dose symmetrical
assay is (13.10.18), which gives confidence limits for the
population value of the potency ratio as

    (zS1/zU1) . antilog₁₀[( 4LP/{3L1(1−g)} ± {4st/(3L1(1−g))} √{N(1−g) + (2N/3)(LP/L1)²} ) log₁₀D]

where g = 2Ns²t²/3L1² according to (13.10.19). For this example

    zS1/zU1 = 4/8 = 0·5,
    LP/L1 = −50/203 = −0·2463,
    log₁₀D = log₁₀2 = 0·3010,
    s²[y] = 5·838, the error variance (from Table 13.12.2) with 20 degrees of freedom,
    s = √(5·838) = 2·416,
    t = 2·086 for P = 0·05 (for 95 per cent confidence limits) and 20 d.f. (from tables of Student's t, see §§ 4.4 and 7.4).
† To find the antilog of a negative number write it as the sum of a negative integer and
a positive part between 0 and 1. Thus, to find antilog(−0·09885), write −0·09885 in
the form −1+0·9011, which is conventionally written 1̄·9011. Look up antilog 0·9011
= 7·964, and move the decimal point one place to the left (because of the 1̄) giving
antilog(−0·09885) = 0·7964. Working from first principles, antilog₁₀(−0·09885)
= 10^(−0·09885), from the definition of logarithms, and 10^(−0·09885) = 10^(−1)×10^(0·9011) = 10^(−1)
antilog(0·9011).

326 Assays and calibration curves § 13.12
Thus, g = 2×30×5·838×2·086²/(3×203²) = 0·01233
and (1−g) = 0·9877.
As in the last example, g is small so the approximate formula for the
limits could be used, but before doing this the full equation given above
will be used to make sure that the approximation is adequate. Substituting
the above quantities into the general formula gives

    0·5 antilog₁₀[( {4×(−0·2463)}/(3×0·9877) ± {(4×2·416×2·086)/(3×203×0·9877)} √{(30×0·9877) + (2×30/3)×(−0·2463)²} ) × 0·3010]

    = 0·5 antilog₁₀(−0·1561, −0·0440) = 0·349 to 0·452†.

Approximate confidence limits

Because g is much less than 1, the approximate formula for the
confidence limits (see §§ 13.5, 13.6, and 13.10) can be used, as in the
last example. Substituting into (13.10.20) gives the estimated standard
deviation of log₁₀R as

    s[log₁₀R] = {4×0·3010/(3×203)} √[5·838×30{1 + (2/3)(−0·2463)²}] = 0·02669.

The approximate 95 per cent confidence limits are therefore, as in § 7.4,

    log₁₀R ± t s[log₁₀R] = log₁₀0·3982 ± (2·086×0·02669)

    = −0·3999 ± 0·05568 = −0·4556 and −0·3442.

Taking antilogs† gives the confidence limits as 0·350 and 0·453, similar
to the values found from the full equation.
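Both sets of limits can be reproduced numerically. The Python sketch below uses the quantities listed above; note that the term under the square root follows the form of (13.10.18) as reconstructed here from the worked numbers, so it should be read as an assumption rather than a quotation of the book's formula:

```python
import math

s2, t, N = 5.838, 2.086, 30   # error variance, Student's t, total observations
L_P, L_1 = -50.0, 203.0
log_D = math.log10(2)
s = math.sqrt(s2)

# Full Fieller limits, eqn (13.10.18) with g from (13.10.19)
g = 2 * N * s2 * t ** 2 / (3 * L_1 ** 2)              # 0.01233
centre = 4 * L_P / (3 * L_1 * (1 - g))
half = (4 * s * t / (3 * L_1 * (1 - g))) * math.sqrt(
    N * (1 - g) + (2 * N / 3) * (L_P / L_1) ** 2)
lo = 0.5 * 10 ** ((centre - half) * log_D)            # 0.349
hi = 0.5 * 10 ** ((centre + half) * log_D)            # 0.452

# Approximate limits from (13.10.20), adequate because g is small
s_logR = (4 * log_D / (3 * L_1)) * math.sqrt(
    s2 * N * (1 + (2 / 3) * (L_P / L_1) ** 2))        # 0.02669
logR = math.log10(0.5) + (4 * L_P / (3 * L_1)) * log_D
lo_approx = 10 ** (logR - t * s_logR)                 # 0.350
hi_approx = 10 ** (logR + t * s_logR)                 # 0.453
```

The two pairs of limits agree to about the third decimal place, confirming that the approximation is adequate when g is this small.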

Summary of the result

The assay may have been invalid because of a difference in curvature
between the standard and unknown log dose-response curves. If this
difference were attributed to (a rather unlikely) chance the estimated
potency ratio would be 0·398, with 95 per cent Gaussian confidence
limits of 0·349 to 0·452. As usual, these confidence limits must be regarded
as optimistic (see § 7.2).

† See footnote p. 325.

Plotting the results

The slope of the response-log dose lines, from (13.10.13), is b = 203/20
= 10·15. This is the slope using x = log_D (dose) (see § 13.2). It must be
divided by log₁₀D = 0·3010, giving b′ = 33·72, the slope of the response
against log₁₀ (dose) lines, which are plotted in Fig. 13.12.1. The full
argument is similar to that for the 2+2 dose assay given in detail in
§ 13.11.

13.13. A numerical example of an unsymmetrical (3+2) dose parallel line assay
The general method of analysis for parallel line assays, when none of
the simplifications resulting from symmetry (defined in (13.8.1)) can be
used, will be illustrated using the results shown in Table 13.13.1 and
plotted in Fig. 13.13.1. The figures are not from a real experiment;
in real life a symmetrical design would be preferred. The 15 doses
should be allocated strictly at random (see § 2.3) so a one way analysis
of variance (see § 11.4) is appropriate (given the assumptions described
in § 11.2).

FIG. 13.13.1. Results of an unsymmetrical 3+2 dose assay from Table 13.13.1.
    o Observed mean responses to standard.
    ● Observed mean responses to unknown.
    ---- Least squares lines constrained to be parallel (see § 13.4 and this section).
TABLE 13.13.1
Results of a 3+2 dose assay

                      Standard doses             Unknown doses
Dose                  1·0    3·0     10·0        1·0    4·0
x = log₁₀ dose        0·0    0·4771  1·0         0·0    0·6021

Responses (y)         9·4    18·0    27·7        13·6   25·1
                      10·8   18·8    28·1        12·8   25·0
                      10·1   17·9    28·2               24·0
                             18·1

n                     3      4       3           2      3       (total 15)
Mean                  10·1   18·2    28·0        13·2   24·7
Total                 30·3   72·8    84·0        26·4   74·1
Group totals          standard 187·1             unknown 100·5  (grand 287·6)

The analysis of variance of the responses

The one way analysis of variance is exactly as in § 11.4.

(1) Correction factor = G²/N = 287·6²/15 = 5514·25066.

(2) Total sum of squares (from (2.6.5) or (11.4.3)), with N−1 = 14
degrees of freedom,
    = 9·4²+10·8²+...+24·0²−5514·25066
    = 650·16933.

(3) Sum of squares between doses (from (11.4.5)) with 5−1 = 4
degrees of freedom,
    = 30·3²/3 + 72·8²/4 + ... + 74·1²/3 − 5514·25066
    = 647·48933.

(4) Error sum of squares, by difference,
    = 650·16933−647·48933 = 2·6800
with 14−4 = 10 degrees of freedom.
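These sums of squares can be verified directly from the raw responses of Table 13.13.1. A short Python sketch (the grouping and names are mine):

```python
# One way analysis of variance sums of squares, as in § 11.4,
# for the 3+2 dose assay of Table 13.13.1.
groups = [
    [9.4, 10.8, 10.1],          # standard, dose 1.0
    [18.0, 18.8, 17.9, 18.1],   # standard, dose 3.0
    [27.7, 28.1, 28.2],         # standard, dose 10.0
    [13.6, 12.8],               # unknown, dose 1.0
    [25.1, 25.0, 24.0],         # unknown, dose 4.0
]
obs = [y for grp in groups for y in grp]
N, G = len(obs), sum(obs)                       # N = 15, G = 287.6
cf = G ** 2 / N                                 # correction factor, 5514.25067
ss_total = sum(y ** 2 for y in obs) - cf        # 650.169 (14 d.f.)
ss_between = sum(sum(grp) ** 2 / len(grp) for grp in groups) - cf  # 647.489 (4 d.f.)
ss_error = ss_total - ss_between                # 2.680 (10 d.f.)
```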
The next stage is to divide up the sum of squares between doses, as
described in § 13.7. It will be convenient first to calculate various
quantities from the results.

For the standard, the observations in Table 13.13.1 give

    nS = 10,
    ΣxS = (0×3)+(0·4771×4)+(1·0×3) = 4·9084

(remember that each dose occurs several times; cf. Table 12.6.2),

    x̄S = 4·9084/10 = 0·49084,
    ΣyS = 30·3+72·8+84·0 = 187·1,
    ȳS = 187·1/10 = 18·71,
    Σ(xS−x̄S)² = (0²×3)+(0·4771²×4)+(1·0²×3)−4·9084²/10
              = 1·50126 (from (2.6.5); again each x occurs several times),
    ΣyS(xS−x̄S) = (0×9·4)+(0×10·8)+...+(1·0×28·2)−(4·9084×187·1)/10
               = 26·89672 (found from (2.6.9) and (12.2.9), as in (12.6.1)).

Similarly, for the unknown preparation,

    nU = 5,
    ΣxU = (0×2)+(0·6021×3) = 1·80630,
    x̄U = 1·8063/5 = 0·36126,
    ΣyU = 26·4+74·1 = 100·5,
    ȳU = 100·5/5 = 20·10,
    Σ(xU−x̄U)² = (0²×2)+(0·6021²×3)−1·8063²/5 = 0·43503,
    ΣyU(xU−x̄U) = (0×26·4)+(0·6021×74·1)−(1·8063×100·5)/5 = 8·30898.

Now these results can be used to find the components of the sum of
squares between doses, described in § 13.7.

(1) Linear regression, from (13.7.1),

    SSD = (26·89672+8·30898)²/(1·50126+0·43503)
        = 640·111405.

(2) Deviations from parallelism, from (13.7.2),

    SSD = 26·89672²/1·50126 + 8·30898²/0·43503 − 640·111405 = 0·472584.

(3) Between standard and unknown, from (13.7.3),

    SSD = 187·1²/10 + 100·5²/5 − 5514·25066 = 6·4403.

(4) Deviations from linearity, by difference,

    SSD = 647·48933−640·111405−0·472584−6·4403
        = 0·465041.
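The same partition can be sketched in Python, using the Σ quantities computed above and the formulas (13.7.1)-(13.7.3) as given in the text (the small discrepancies in the last decimal place come from rounding in the intermediate quantities):

```python
# Partition of the between-dose sum of squares for the 3+2 dose assay.
Sxx_s, Sxy_s = 1.50126, 26.89672   # standard: sum((x-xbar)^2), sum(y(x-xbar))
Sxx_u, Sxy_u = 0.43503, 8.30898    # unknown
cf = 287.6 ** 2 / 15               # correction factor

ss_regression = (Sxy_s + Sxy_u) ** 2 / (Sxx_s + Sxx_u)           # 640.111
ss_parallelism = (Sxy_s ** 2 / Sxx_s + Sxy_u ** 2 / Sxx_u
                  - ss_regression)                               # 0.473
ss_s_vs_u = 187.1 ** 2 / 10 + 100.5 ** 2 / 5 - cf                # 6.440
ss_linearity = (647.48933 - ss_regression - ss_parallelism
                - ss_s_vs_u)                                     # 0.465
```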
These figures can now all be filled into the analysis of variance table
(Table 13.13.2), which has the form of Table 13.7.1.

TABLE 13.13.2
Source of variation            d.f.   SSD       MS        F       P
Linear regression               1     640·111   640·111   2388    ≪0·001
Deviations from parallelism     1     0·473     0·473     1·76    >0·2
Between S and U                 1     6·440     6·440     24·03   <0·001
Deviations from linearity       1     0·465     0·465     1·74    >0·2
Between doses                   4     647·489   161·872   604·0   ≪0·001
Error (within doses)           10     2·680     0·268
Total                          14     650·169

The interpretation of the analysis is just as in §§ 13.11 and 13.12.
There is no evidence of invalidity, though if the responses to standard
and unknown had been more nearly the same it would have increased
the precision slightly (see § 13.6).

Plotting the results

The average slope of the dose-response lines, from (13.4.5), is

    b = (26·89672+8·30898)/(1·50126+0·43503) = 18·18.

The slopes of lines fitted separately to standard and unknown would be
bS = 26·89672/1·50126 = 17·92, and bU = 8·30898/0·43503 = 19·10.
The lines plotted in Fig. 13.13.1 are therefore, from (13.3.3),

    YS = 18·71+18·18(xS−0·4908),
    YU = 20·10+18·18(xU−0·3613).

This calculation, but not the preceding ones, has been made rather
simpler than in §§ 13.11 and 13.12, because there is no simplifying
transformation to bother about.

The potency ratio

From (13.3.7), the potency ratio is estimated to be (because x = log₁₀
dose)

    R = antilog₁₀[(0·49084−0·36126) + (20·10−18·71)/18·18]

    = antilog₁₀(0·2060) = 1·607.
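In Python the common slope and the potency ratio come out as follows (a sketch using the quantities found above):

```python
# Common slope (13.4.5) and potency ratio (13.3.7) for the 3+2 dose assay.
b = (26.89672 + 8.30898) / (1.50126 + 0.43503)       # 18.18
xbar_s, xbar_u = 0.49084, 0.36126                    # mean log10 doses
ybar_s, ybar_u = 18.71, 20.10                        # mean responses
log_R = (xbar_s - xbar_u) + (ybar_u - ybar_s) / b    # 0.2060
R = 10 ** log_R                                      # 1.607
```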

Confidence limits for the potency ratio

Using the quantities already found,

    s²[y] = 0·2680 (the error mean square with 10 d.f. from Table 13.13.2),
    s = √(0·2680) = 0·5177,
    (ȳU−ȳS)/b = (20·10−18·71)/18·18 = 0·076458,
    Σ(xS−x̄S)² + Σ(xU−x̄U)² = 1·50126+0·43503 = 1·9363,
    t = 2·228 for P = 0·95 limits and 10 d.f. (from tables of Student's t;
see §§ 4.4 and 7.4).

Thus, from (13.6.6),

    g = (2·228²×0·2680)/(18·18²×1·9363) = 0·00208

so (1−g) = 0·9979.
Logs to the base 10 have been used, so the conversion factor
log₁₀10 = 1. The 95 per cent confidence limits for the population value
of R are therefore, from the general formula (13.6.6),

    antilog₁₀[(0·49084−0·36126) + 0·076458/0·9979 ± {(0·5177×2·228)/(18·18×0·9979)} √{0·9979(1/5 + 1/10) + 0·076458²/1·9363}]

    = antilog₁₀(0·2062 ± 0·03496)

    = 1·484 and 1·743.

Approximate confidence limits

Because g is small (even smaller than in the last two examples),
the approximate formula, in its general form, can be used. In this case
M = log₁₀R so, using (13.6.8),

    var[log₁₀R] ≈ (0·268/18·18²)(1/5 + 1/10 + 0·076458²/1·9363)

    = 2·4570×10⁻⁴.

The confidence limits for log₁₀R are therefore log₁₀R ± t√(var[log₁₀R]),
and √(var[log₁₀R]) = √(2·4570)×10⁻² = 0·015675, giving the limits
as 0·2060 ± 2·228×0·015675 = 0·2060 ± 0·03492 = 0·1711 and 0·2409.
Taking antilogs gives the approximate confidence limits as 1·483 and
1·742.
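Both the exact (Fieller) and approximate limits for this assay can be reproduced numerically. A sketch, applying the general formula in the form used in the worked substitution above:

```python
import math

s2, t = 0.2680, 2.228        # error mean square (10 d.f.) and Student's t
b, Sxx = 18.18, 1.9363       # common slope and pooled sum((x-xbar)^2)
ns, nu = 10, 5
m = (20.10 - 18.71) / b                        # 0.076458
g = t ** 2 * s2 / (b ** 2 * Sxx)               # 0.00208
diff_means = 0.49084 - 0.36126                 # xbar_s - xbar_u
s = math.sqrt(s2)

# Exact Fieller limits for log10(R)
half = (s * t / (b * (1 - g))) * math.sqrt(
    (1 - g) * (1 / ns + 1 / nu) + m ** 2 / Sxx)
centre = diff_means + m / (1 - g)
lo, hi = 10 ** (centre - half), 10 ** (centre + half)  # close to 1.484 and 1.743

# Approximate variance of log10(R), eqn (13.6.8)
var_logR = (s2 / b ** 2) * (1 / ns + 1 / nu + m ** 2 / Sxx)  # 2.457e-4
```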

Summary of the result

The assay is not demonstrably invalid. The potency ratio is estimated
to be 1·607, with 95 per cent Gaussian confidence limits of 1·484 to
1·743. The analysis depends on the assumptions discussed in §§ 4.2,
7.2, 11.2, and 12.2 and, as usual, the confidence limits are likely to be
too narrow (see § 7.2).

13.14. A numerical example of the standard curve (or calibration
curve). Error of a value of x read off from the line

In Chapter 12 the method for estimation of the error of a value of Y
(the dependent variable) read from the fitted line at a given value of x
was described. In § 12.4 it was mentioned that the reverse problem,
TABLE 13.14.1

           x    Observations (y)     Total   n   Mean
Standard   1    2·3   1·7            4·0     2   2·0
           2    5·4   4·7   4·9      15·0    3   5·0
           3    7·4   6·6            14·0    2   7·0
           4    9·7   8·9   8·4      27·0    3   9·0
Unknown         8·1   8·5            16·6    2   8·3

estimation of the error of a value of x interpolated from the fitted line
for a given (observed or hypothetical) value of y, is more complicated.
In fact the method is closely related to that used to estimate confidence
limits for the potency ratio, and an example will now be worked out.
The results in Table 13.14.1, which are plotted in Fig. 13.14.1, are
results of the sort that are obtained when measurements are made from
a standard calibration curve. This method is often used for chemical
assays. For example x could be concentration of solute, and y the
optical density of the solution measured in a spectrophotometer.
In this example x can be any independent variable (see § 12.1), or any
transformation of the measured variable, as long as y (the dependent

FIG. 13.14.1. The standard calibration curve plotted from the results in
Table 13.14.1. (The horizontal line at ȳU = 8·3 marks the mean unknown response.)
    o Observed mean responses to standard.
    ---- Fitted least squares straight line (see text).
    -·-·- 95 per cent Gaussian confidence limits for the population
          (true) line, i.e. for the population value of y at any
          given x value (see text).
    - - - 95 per cent Gaussian confidence limits for the mean of
          two new observations on y at any given x value (see text).
The graphical meaning of the confidence limits for the value of x corresponding
to the value of y observed for the unknown is illustrated.

variable) is linearly related to x (unlike most of the rest of this chapter,
in which the discussion has been confined to parallel line assays in which
x = log dose). It is quite possible to deal with non-linear calibration
curves using polynomials (see § 12.7 and Goulden (1952)) but the
straight line case only will be dealt with here.

Frequently the standard curve is determined first and it is assumed,
as in this section, that it has stayed constant during the subsequent
period in which measurements are made on the unknowns. This
requires separate verification, and it would obviously be better if
standards and unknowns were given in random order or in random blocks.
If this is done the unknowns can be incorporated in the analysis of
variance as described in § 13.15, the effect of this being to reduce the
risk of bias and to improve slightly the estimate of error by taking
into account the scatter of replicate observations on the unknown. It
will of course be an assumption that the scatter of responses is the same
for all of the standards and for the unknowns, in addition to the other
assumptions of the Gaussian analysis of variance which have been
described in §§ 11.2 and 12.2.

The straight line and its analysis of variance

First a straight line is fitted to the results for the standard. The
method has already been described in § 12.6, so only the bare bones
of the calculations will be given here. The basic design is a one way
classification with kS = 4 independent groups (see § 11.4).

(1) Correction factor

    nS = 10,
    ΣyS = 4·0+15·0+14·0+27·0 = 60·0,
    correction factor = 60·0²/10 = 360·0.

(2) Total sum of squares, from (2.6.5) (cf. (11.4.3))

    = 2·3²+1·7²+...+8·4²−360·0 = 65·6200.

(3) Sum of squares between groups, from (11.4.5),

    = 4²/2 + 15²/3 + 14²/2 + 27²/3 − 360·0 = 64·0000.

Although the x values are equally spaced, the simplifying transformation
described at the end of § 12.6 cannot be used, because the
number of observations is not the same at each x value.

(a) Sum of squares due to linear regression. First calculate

    ΣxS = (1×2)+(2×3)+(3×2)+(4×3) = 26·0

and x̄S = 26·0/10 = 2·60.
The sum of products, from (2.6.7) (see § 12.6), is

    ΣyS(xS−x̄S) = (1×4·0)+(2×15·0)+(3×14·0)+(4×27·0)−(26·0×60·0)/10 = 28·00.

The sum of squares for x is, from (2.6.6),

    Σ(xS−x̄S)² = (1²×2)+(2²×3)+(3²×2)+(4²×3)−26·0²/10 = 12·40.

Thus, sum of squares due to linear regression, from (12.3.4),

    = 28·0²/12·4 = 63·2258.

(b) Sum of squares for deviations from linearity. By difference

    SSD = 64·0000−63·2258 = 0·7742.

(4) Sum of squares for error (within groups sum of squares). By
difference

    SSD = 65·6200−64·0000 = 1·6200.

These results can now be entered in the analysis of variance table,
Table 13.14.2.

TABLE 13.14.2
Source                      d.f.   SSD       MS        F      P
Lin. regression              1     63·2258   63·2258   234    ≪0·001
Dev. from linearity          2     0·7742    0·3871    1·43   >0·2
Between groups (x values)    3     64·0000   21·3333   79·0
Error (within groups)        6     1·6200    0·2700
Total                        9     65·6200

The interpretation is as in § 12.6. There is strong evidence that y
increases with x. If the true line were straight then an F value for
deviations from linearity equal to or greater than 1·43 would be expected
in more than 20 per cent (P > 0·2) of repeated experiments (given the
assumptions; see §§ 11.2 and 12.2), so there is no reason to believe the

true line is not straight. However this analysis does not distinguish
between systematic and unsystematic deviations from linearity.
Looking at Fig. 13.14.1 suggests the deviations in this case, though no
larger than would be expected on the basis of experimental error, are of
a systematic sort. The line appears to be flattening out. Now physical
considerations and past experience suggest that this is just the sort of
nonlinearity that would be expected in a plot of, say, optical density
against concentration. In a case like this it would be rather rash to fit
a straight line, in spite of the fact that there are no grounds for rejecting
the null hypothesis that the true (population) line is straight. This is a
good example of the practical importance of the logical fact explained in
§ 6.1, that if there are no good grounds for rejecting a hypothesis this does not
mean that there are good grounds for accepting it. In a small experiment,
such as this, with substantial experimental errors, it is more than
likely that deviations from linearity that are real, and large enough to be
of practical importance, would not be detected with any certainty.
The verdict is not proven (see § 6.1). For purposes of illustration, a
straight line will now be fitted, though the foregoing remarks suggest that a
polynomial (see above) would be safer. The least squares estimates of the
parameters (see § 12.2) are thus, from (12.2.6),

    aS = ȳS = ΣyS/nS = 60·0/10 = 6·00

and, from (12.2.8),

    bS = ΣyS(xS−x̄S)/Σ(xS−x̄S)² = 28·00/12·40 = 2·2581,

so the fitted line is

    YS = aS+bS(xS−x̄S) = 6·00+2·2581(xS−2·60)    (13.14.1)
       = 0·1289+2·2581 xS

and this is the straight line plotted in Fig. 13.14.1.

Interpolation of the unknown

The mean of the two observations (nU = 2) on the unknown, from
Table 13.14.1, is ȳU = 8·30. The equation for the standard line, (13.14.1),
is Y = a+b(x−x̄), and rearranging this to find x gives

    x = x̄ + (Y−a)/b.    (13.14.2)

The estimate of xU (e.g. concentration) corresponding to the mean
observation ȳU (e.g. optical density) on the unknown is therefore, from
(13.14.1) and (13.14.2),

    xU = x̄S + (ȳU−ȳS)/bS
       = 2·60 + (8·30−6·00)/2·2581
       = 3·619,    (13.14.3)

as shown graphically in Fig. 13.14.1.

Gaussian confidence limits for the interpolated x value

The approach is exactly like that in § 13.6. In (13.14.3) x̄S is an
accurately measured constant. If the observations are normally
distributed (see § 4.2), then ȳU−a = ȳU−ȳS will be a normally distributed
variable, and so will the slope of the standard line, bS (see
§ 12.4, especially (12.4.1)). Therefore (ȳU−ȳS)/bS = m, say, will be the
ratio of two normally distributed variables and Fieller's theorem (see
§ 13.5) can be used, as in § 13.6, to find confidence limits for its true
value. If the error mean square from Table 13.14.2 (s² = 0·2700) is
taken as the variance of the observations on the unknown as well as
the variance of the observations on the standard then, from (2.7.9),
var(ȳS) = s²/nS, var(ȳU) = s²/nU. Because the observations on standard
and unknown are assumed independent, it follows from (2.7.3) that
var(ȳU−ȳS) = var(ȳU)+var(ȳS) = s²(1/nU + 1/nS), and so from (13.5.3),

    v11 = (1/nU + 1/nS),
    v22 = 1/Σ(xS−x̄S)² (from (12.4.2)),
    v12 = 0 (as in § 13.6).

In the present example

    s² = 0·2700 with 6 d.f. (from Table 13.14.2),
    s = √(0·2700) = 0·5196,
    t = 2·447 for P = 0·95 and 6 d.f. (from tables of Student's t; see
§ 4.4),
    m = (ȳU−ȳS)/b = (8·30−6·00)/2·2581 = 1·01856,
    g = t²s²v22/b² = (2·447²×0·2700)/(2·2581²×12·40) = 0·02557 (from (13.5.8)),
    (1−g) = 0·9744.

The 95 per cent confidence limits for the true value of xU therefore
follow from (13.5.9) (by adding x̄S to the confidence limits for (ȳU−ȳS)/b;
cf. § 13.6) and are, in the present case,

    2·60 + 1·01856/0·9744 ± {(0·5196×2·447)/(2·2581×0·9744)} √{0·9744(1/2 + 1/10) + 1·01856²/12·40}

    = 3·173 and 4·118.

Because g is fairly small, similar limits would have been found by
using the approximate formula, from (13.5.12), var(xU) ≈ s²(v11
+m²v22)/b². The limits are not symmetrical about xU = 3·619 unless g is
negligibly small (or unless ȳU = ȳS), as discussed in § 13.5. In this
case the limits expressed as percentage deviations from 3·619 are
−12·3 per cent to +13·8 per cent. The graphical meaning of the limits
is discussed below.
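The whole interpolation, and its Fieller limits, can be checked numerically. A sketch assuming the data of Table 13.14.1 (names are mine):

```python
import math

# Standard results of Table 13.14.1: x -> replicate observations y
std = {1: [2.3, 1.7], 2: [5.4, 4.7, 4.9], 3: [7.4, 6.6], 4: [9.7, 8.9, 8.4]}
n = sum(len(v) for v in std.values())                        # 10
xbar = sum(x * len(v) for x, v in std.items()) / n           # 2.60
ybar = sum(sum(v) for v in std.values()) / n                 # 6.00
Sxx = sum(len(v) * (x - xbar) ** 2 for x, v in std.items())  # 12.40
Sxy = sum((x - xbar) * sum(v) for x, v in std.items())       # 28.00
b = Sxy / Sxx                                                # 2.2581

yu = (8.1 + 8.5) / 2             # mean of the two unknown observations
xu = xbar + (yu - ybar) / b      # 3.619, eqn (13.14.3)

# Fieller limits for the true value of xu
s2, t, nu = 0.2700, 2.447, 2
m = (yu - ybar) / b
g = t ** 2 * s2 / (b ** 2 * Sxx)                             # 0.02557
half = (math.sqrt(s2) * t / (b * (1 - g))) * math.sqrt(
    (1 - g) * (1 / nu + 1 / n) + m ** 2 / Sxx)
lo = xbar + m / (1 - g) - half                               # 3.173
hi = xbar + m / (1 - g) + half                               # 4.118
```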

Summary of the result

The unknown value of x corresponding to ȳU = 8·3 is xU = 3·619
with 95 per cent Gaussian confidence limits from 3·173 to 4·118. These
results depend on the assumptions described in §§ 7.2, 11.2, and 12.2
and, as usual, must be considered to be optimistic (see § 7.2).

Confidence limits for the population calibration line

Assuming the true line to be straight, limits for its position can be
calculated as described in § 12.4. Another example was worked out in
§ 12.5. In this case var(y) = 0·2700, the error mean square with
6 d.f. from Table 13.14.2, N = 10 (the number of observations used
to fit the line), t = 2·447 as above, x̄S = 2·60 and Σ(x−x̄)² = 12·40 as
above. Using these values var(Y), and hence the confidence limits,
can be calculated at enough values of x to plot the limits, which are
shown as dot-dashed lines in Fig. 13.14.1. Two representative calculations
follow.

(1) At x = 1·0. At this point the estimated value of Y is, from (13.14.1),

    Y = 0·1289+(2·2581×1·0) = 2·387
and, from (12.4.4),

    var(Y) = 0·2700{1/10 + (1·0−2·60)²/12·40} = 0·082742.

The 95 per cent Gaussian confidence limits for the population value of
Y at x = 1·0 are therefore, from (12.4.5), 2·387 ± 2·447×√0·082742
= 1·683 and 3·091.

(2) At x = 2·0, Y = 0·1289+(2·2581×2·0) = 4·645,

    var(Y) = 0·2700{1/10 + (2·0−2·60)²/12·40} = 0·034839,

giving confidence limits of 4·645 ± (2·447×√0·034839) = 4·188
and 5·102.

Confidence limits for the mean of two new observations at a given x. The
graphical meaning of Fieller's theorem

In § 12.4 a method was described for finding limits within which new
observations on y (rather than the population value of y), at a given x,
would be expected to lie. In the present example there are nU = 2
new observations on the unknown. Using eqn (12.4.6) with m = 2,
N = 10, and the other values as above, these limits can be calculated
for enough values of x for them to be plotted. They are shown as
dashed lines in Fig. 13.14.1. Two representative calculations, using
(12.4.6), follow.

(1) At x = 1·0. At this point Y = 2·387 as above. The 95 per cent
confidence limits for the mean of two new observations are, from
(12.4.6),

    2·387 ± 2·447√[0·2700{1/2 + 1/10 + (1·0−2·60)²/12·40}] = 1·245 and 3·529.

(2) At x = 2·0, Y = 4·645 as above, and the limits are, from
(12.4.6),

    4·645 ± 2·447√[0·2700{1/2 + 1/10 + (2·0−2·60)²/12·40}] = 3·637 and 5·653.

These limits are seen to be wider than the limits for the population
value of Y, as would be expected when the uncertainty in the new
observations is taken into account. They are also less strongly curved.
The mean of the two observations on the unknown in Table 13.14.1
was ȳU = 8·3, and the corresponding value of xU read off from the line
was 3·619 as calculated above, and as shown in Fig. 13.14.1. The
95 per cent confidence limits for xU at y = 8·3 were found above to be
3·173 to 4·118. It can be seen in Fig. 13.14.1 that these are the points
where the line for y = 8·3 intersects the confidence limits just calculated
(the limits for the mean of two new observations at a given x).
The limits found from Fieller's theorem (13.5.9) are, in general, the
same as those found graphically via (12.4.6).
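This graphical equivalence can be verified numerically: at the Fieller limits for xU found above, the band for the mean of two new observations (eqn (12.4.6)) passes through y = 8·3. A sketch, using the fitted-line quantities from the worked example:

```python
import math

# Fitted line and error quantities from the worked example
a, b, xbar = 6.00, 2.2581, 2.60
s2, t, N, Sxx = 0.2700, 2.447, 10, 12.40
m_new = 2   # number of new observations being averaged

def band(x, sign):
    """Value of the 95 per cent limit (eqn 12.4.6) for the mean of
    m_new new observations at a given x."""
    Y = a + b * (x - xbar)
    h = t * math.sqrt(s2 * (1 / m_new + 1 / N + (x - xbar) ** 2 / Sxx))
    return Y + sign * h

upper_at_lower_limit = band(3.173, +1)   # close to 8.3, the observed mean
lower_at_upper_limit = band(4.118, -1)   # close to 8.3 again
```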

13.15. The (k+1) dose assay and rapid routine assays

In this section the (kS+1) dose parallel line assay will be illustrated
using the same results that were used to illustrate the calibration
curve analysis in § 13.14.
Routine assays

The (2+2) or (3+3) dose assays should be preferred for accurate
assays. The (kS+1) dose assay probably occurs most frequently in
the form of the (2+1) dose assay in which the unknown is interpolated
between 2 standards. This is the fastest method and is often used when
large numbers of unknowns have to be assayed. It is rare in practice
for the doses to be arranged randomly, or in random blocks of 3 doses
(kS+1 doses in general). Even worse, standard and unknowns are often
given alternately, so each standard is used to interpolate both the
unknown immediately before it and the unknown immediately after it.
This introduces correlation between duplicate estimates, making the
estimation of error difficult. Quite often the samples to be assayed will
come from an experiment in which replicate samples were obtained,
and several assays will be done on each of the replicate samples. In
this case a reasonable compromise between speed and statistical
purity is to do (2+1) dose assays with alternate standard and unknown,
and to interpolate each unknown response between the standard
responses (one high and one low) on each side of it. The replicate assays
on each sample are then simply averaged. An estimate of error can
then be obtained from the scatter of the average assay figures for
replicate samples rather than doing the calculations described below.
The treatments should have been applied in random order (see § 2.3)
in the original experiment and the samples should be assayed in random
order. If the ratio between the high and low standard doses is small
(say less than 2) it will usually be sufficiently accurate to interpolate
linearly (rather than logarithmically) between the standards. See
Colquhoun and Tattersall (1969) for further discussion.

A numerical example of a (4+1) dose parallel line assay

In a parallel line assay x = log(dose) by definition (see § 13.1),
unlike § 13.14 in which x could have been any independent variable.
In biological assays it is usual to specify the dose of unknown (e.g. in
ml or g of impure solid) and to compute a potency ratio R (see § 13.3),
[Figure: the § 13.14 calibration line re-plotted against log₁₀ dose, with ȳU = 8·3
read across to x = 3·619 and the unknown's log dose at x = 2·0, so that
log₁₀R = 3·619 − 2·0 = 1·619.]

FIG. 13.15.1. If x in § 13.14 (Table 13.14.1) were log dose, then the results
in Table 13.14.1 could be treated as a 4+1 dose parallel line assay, as illustrated,
as an alternative to the treatment as a standard curve problem which was worked
out in § 13.14. The observations and fitted line are as in Fig. 13.14.1 with the
addition that the dose of unknown required to produce the unknown responses
has been specified.

rather than to interpolate the unknown response on the standard curve
as in § 13.14. (Equation (13.14.3) together with (13.3.7) is seen to
imply log R = 0, i.e. R = 1, which simply means that a given dose, in
terms of the active substance, gives the same response whether it is
labelled standard or unknown.) Suppose, for example, that x in Table
13.14.1 represents the log₁₀ of the standard dose (measured in ml) in a
(4+1) dose parallel line assay. Suppose further that a log₁₀ (dose
in ml) of unknown, x = 2·0, is administered twice and produces
responses yU = 8·1 and 8·5, so ȳU = 8·3 as in Table 13.14.1. This
assay is plotted in Fig. 13.15.1. Using the general formula for the
log potency ratio, (13.3.7), gives, using (13.14.3),

    log₁₀R = x̄S−x̄U + (ȳU−ȳS)/b
           = 3·619−2·00 = 1·619.

Taking antilogs gives R = 41·59, which means that 41·59 ml of standard
must be given to produce the same effect as 1 ml of unknown.
It was mentioned in § 13.14 that it is dangerous to determine the
standard curve first and then to measure the unknowns later unless
there is very good reason to believe that the standard curve does not
change with time. It is preferable to do the standards and unknowns
(all 12 measurements in Table 13.14.1) in random order (or in random
blocks of 5 measurements, cf. §§ 13.11 and 13.12). If this had been done
the analysis of variance would follow the lines described in § 13.7,
except that there can obviously be no test of parallelism with only one
dose of standard (a 2+1 dose assay would have no test of deviations
from linearity either). There are now 5 groups and 12 observations. The
total, between group and error sums of squares are found in the usual way
(see §§ 11.4, 13.13, or 13.14) from the 5 groups of observations in Table
13.14.1. The results are shown in Table 13.15.1. The between groups sum
of squares can be split up into components using the general formulae
(13.7.1)-(13.7.3). In a (kS+1) dose assay there is only one unknown
dose so xU = x̄U, i.e. (xU−x̄U) = 0, so the expressions for the slope
(13.4.5) and the sum of squares for linear regression, (13.7.1), reduce
to those used already in § 13.14, which are entered in Table 13.15.1
(it is only common sense that the observations on the unknown can
give no information on the slope of the log dose-response line). The
sum of squares for differences in responses to standard and unknown,
from (13.7.3), is

    60²/10 + 16·6²/2 − 76·6²/12 = 8·8167.

When this is entered in Table 13.15.1 the sum of squares for deviations
from linearity can be found by difference. It is seen to be identical
with that in Table 13.14.2, as expected.

The error variance in Table 13.15.1 is 0·2429, less than the figure of
0·2700 from Table 13.14.2. Inclusion of the unknown responses has
slightly reduced the estimate of error because they are in relatively
good agreement. The interpretation is the same as in § 13.14.
The confidence limits for the log potency ratio can be found from the
general parallel line assay formula (13.6.6). The calculation is, with
TABLE 13.15.1
Source                       d.f.   SSD       MS        F      P
Linear regression             1     63·2258   63·2258   260    <0·001
Bet. std. and unknown         1     8·8167    8·8167    36·3   <0·001
Deviations from linearity     2     0·7742    0·3871    1·59   >0·2
Between doses                 4     72·8167   18·2042   74·9   ≪0·001
Error (within doses)          7     1·7000    0·2429
Total                        11     74·5167

any luck, seen to be exactly the same as in § 13.14 except that xU = 2·00
is subtracted from the result. The limits are therefore 3·173−2·00
= 1·173 and 4·118−2·00 = 2·118. Taking antilogs gives the 95 per cent
Gaussian confidence limits for the true value of R (estimated as
41·59) as 14·89 to 131·2; not a very good assay.
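The whole (4+1) dose calculation is only a shift of the § 13.14 result by the unknown's log dose, which a few lines of Python make explicit (a sketch using the § 13.14 values):

```python
# (4+1) dose parallel line assay: shift the interpolated log dose by
# the log dose of unknown actually administered (x_u = 2.0).
xu_interpolated = 3.619       # read off the standard line in § 13.14
x_unknown = 2.0               # log10(dose of unknown, ml)
log_R = xu_interpolated - x_unknown      # 1.619
R = 10 ** log_R                          # 41.59

# Confidence limits: shift the § 13.14 limits the same way, then antilog
lo = 10 ** (3.173 - x_unknown)           # 14.89
hi = 10 ** (4.118 - x_unknown)           # 131.2
```

The wide ratio between the limits (about 9-fold) is what makes this a poor assay.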

14. The individual effective dose, direct assays, all-or-nothing responses and the probit transformation

14.1. The individual effective dose and direct assays


THE quantity of, for example, a drug needed to just produce any
specified response (e.g. convulsions or heart failure) in an animal is
referred to as the individual effective dose (lED) for that animal and
will be denoted z. More generally, the amount or concentration of any
treatment needed to just produce any specified effect on a test object
can be treated as an lED. A standard preparation of a drug, and a
preparation of the same drug of unknown concentration, can be used
to estimate the unknown concentration. This sort of biological assay is
usually referred to as a direct assay.
A group of animals is divided randomly (see § 2.3) into two sub-
groups. On each animal (test object, in general) of one group the IED
of a standard solution of the substance to be assayed is measured. The
IED of the unknown solution is measured on each animal of the
other group.
It is important to notice that in this case the dose is the variable,
not the response as was the case in Chapter 13.
If the doses of both solutions are measured in the same units (see
§ 13.11) then the dose (z ml, say) needed for a given amount (in mg,
say) of substance is inversely proportional to the concentration of the
solution. The object of the assay is to find the potency ratio (R) of the
solutions, i.e. the ratio of their concentrations. Thus

    R = concentration of unknown / concentration of standard
      = population mean IED of standard / population mean IED of unknown.   (14.1.1)
In practice the population mean† IEDs must, of course, be replaced
† See Appendix 1.

§ 14.1 Probits 345
by sample estimates, the average, z̄, of the observed IEDs. The question
immediately arises as to what sort of average should be used.
If the IEDs were normally distributed there are theoretical reasons
(see §§ 2.5, 4.5, and 7.1) for preferring to calculate the arithmetic
mean IED for each preparation (standard and unknown). In this case
the estimated potency ratio would be R = z̄S/z̄U. Because the IED has
been supposed to be a normally distributed variable, this is the ratio
of two normally distributed variables. A pooled estimate of the variance
var(z) could be found from the scatter within groups (as in § 9.4). The
confidence limits for R could then be found from Fieller's theorem,
eqn (13.5.9), with v11 = 1/nS and v22 = 1/nU, where nS and nU are the
numbers of observations in each group. (Because each IED is supposed
to be independent of the others, v12 = 0.)
However, if the IEDs are lognormally distributed (see § 4.5) then
the problem is simpler. Tests of normality are discussed in § 4.6.

Use of the logarithmic dose scale for direct assays


In those cases in which it has been investigated it has often been
found that the logarithm of the IED (x = log z, say) is normally
distributed (i.e. z is lognormally distributed, see § 4.5). It therefore
tends to be assumed that this will always be so, though, as usual, there
is no evidence one way or the other in most cases. If it were so then it
would be appropriate to take the logarithm of each observation and
carry out the calculations on the x = log z values, because they will be
normally distributed. (In parallel line assays a logarithmic scale is
used for the dose, which is the independent variable and has no distribu-
tion, for a completely different reason: to make the dose-response
curve straight. See §§ 11.2, 12.2, and 13.1, p. 283.)
Taking logarithms of both sides of (14.1.1) gives the log of the
potency ratio (M, say) as

    M = log R = log (IED of standard / IED of unknown)
              = log (IED of S) − log (IED of U).

If the log IED is denoted x = log z then it follows that the estimated
log potency ratio will be

    M = log R = x̄S − x̄U.   (14.1.2)
The variance of this will, because the estimates of IED have been
assumed to be independent, be

    var(M) = var(x̄S) + var(x̄U) = var(x)/nS + var(x)/nU,   (14.1.3)

from (2.7.3) and (2.7.8). It is necessary, as in § 9.4, to assume that the
scatter of the measurements (x values) is the same in both groups, so a
pooled estimate of var(x) is calculated from the scatter of the logs of
the observations within groups as in § 9.4, and used as the best estimate
of var(x) for both groups. The confidence limits for the log potency
ratio are then M ± t√{var(M)} as in § 7.4. Taking antilogarithms of
these, and of (14.1.2), gives the estimates of R and its confidence limits.
A numerical example is given by Burn, Finney, and Goodwin (1950,
pp. 44-8).
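The whole log-scale calculation for a direct assay can be sketched as follows. The IED values and group sizes are invented for illustration (they are not from the text); the t value is Student's t for P = 0·95 with 8 degrees of freedom:

```python
import math
from statistics import mean, stdev

# Hypothetical direct-assay data: IEDs (ml) measured on two groups of animals
z_std = [0.30, 0.34, 0.28, 0.32, 0.36]   # standard solution
z_unk = [0.60, 0.70, 0.56, 0.66, 0.72]   # unknown solution
xs = [math.log10(z) for z in z_std]      # log IEDs, assumed normal (see § 4.5)
xu = [math.log10(z) for z in z_unk]

M = mean(xs) - mean(xu)                  # log potency ratio, eqn (14.1.2)
ns, nu = len(xs), len(xu)
# pooled within-group variance of x, as in § 9.4
s2 = ((ns - 1) * stdev(xs) ** 2 + (nu - 1) * stdev(xu) ** 2) / (ns + nu - 2)
var_M = s2 / ns + s2 / nu                # eqn (14.1.3)
t = 2.306                                # Student's t, P = 0.95, 8 d.f.
R = 10 ** M                              # estimated potency ratio
R_lims = (10 ** (M - t * math.sqrt(var_M)),
          10 ** (M + t * math.sqrt(var_M)))
```

Because the unknown IEDs are roughly twice the standard IEDs, the estimated potency ratio comes out near 0·5, i.e. the unknown is about half as concentrated as the standard.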
14.2. The relation between the individual effective dose and
all-or-nothing (quantal) responses
In the sort of experiment described in § 14.1 the individual effective
dose (IED) just sufficient to produce a given effect is measured directly
on each individual. For example, the amount of digitalis solution needed
to produce cardiac arrest can be measured on each of a group of animals
by giving it as a slow intravenous infusion and observing the volume
administered at the point when the heart stops. The results given in
Table 14.2.1 are an idealized version of experimental measurements
of 100 individual lethal doses (z) of cocaine cited by J. W. Trevan
(1927). The results have been grouped so that a histogram can be
plotted from them and the percentage of individual effective doses
falling in each dose interval is denoted f. The logarithms (x) of the
doses are also given (1 has been added to each of the values to make
them all positive).
From the results in Table 14.2.1 the mean individual effective dose is
the total of the fz values divided by the total number of observations†

    z̄ = Σfz/Σf = 51·45/100 ≈ 0·515 mg.
    The median effective dose (dose for p = 50 per cent)      (14.2.1)
    (interpolated from Fig. 14.2.2) ≈ 0·49 mg.
    The modal effective dose (interpolated from Fig.
    14.2.1) ≈ 0·44 mg.
† This mean is calculated from the grouped results, each IED being assumed to have
the central value of the group in which it falls. If the original ungrouped observations
were available, the mean of these would be preferred. If it is accepted that z is lognormal
(see below) then the mean can also be estimated using the equation on p. 78 with
μ = 1̄·707 and σ = 0·104 from Fig. 14.2.6. This gives antilog10 (1̄·707 + 1·1513 ×
0·104²) = 0·524 mg.
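The arithmetic of eqn (14.2.1) and of the footnote can be reproduced directly; the f and mid-point values are those of Table 14.2.1, and the bar-notation mean 1̄·707 is entered as −0·293:

```python
# Mean IED from the grouped data of Table 14.2.1, eqn (14.2.1)
mids = [0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]  # mid-points (mg)
f = [1, 15, 38, 25, 11, 6.5, 2.5, 1]                     # % of IEDs per interval
mean_ied = sum(fi * zi for fi, zi in zip(f, mids)) / sum(f)

# Mean assuming lognormality: antilog10(mu + 1.1513*sigma**2), with mu and
# sigma the mean and SD of log10(IED) read from Fig. 14.2.6
mu, sigma = -0.293, 0.104
mean_lognormal = 10 ** (mu + 1.1513 * sigma ** 2)
```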

A histogram of the distribution of the individual effective doses is
plotted in Fig. 14.2.1 and the estimated mean, median, and modal
IEDs (see § 2.5) plotted on it. The distribution looks positively skewed
and therefore, as expected, mean > median > mode (see § 4.5).

TABLE 14.2.1
Frequency f = percentage of animals responding in each dose interval.
Cumulative frequency p = total percentage of animals responding to a dose
equal to or less than the upper limit of each dose interval. Probits (see
§ 14.3) were obtained from Fisher and Yates' tables (1963, Table IX,
p. 69). The p values are found as the cumulative sum of the observed f
values. For example 54 = 38+15+1

Dose interval    Mid-point   log dose interval                        Probit
(mg of cocaine)     (z)          +1 (x)            f      p     fz     of p

0  -0·2            0·10       −∞  -0·301           0      0     0       −∞
0·2-0·3            0·25      0·301-0·477           1      1     0·25    2·674
0·3-0·4            0·35      0·477-0·602          15     16     5·25    4·006
0·4-0·5            0·45      0·602-0·699          38     54    17·10    5·100
0·5-0·6            0·55      0·699-0·778          25     79    13·75    5·806
0·6-0·7            0·65      0·778-0·845          11     90     7·15    6·282
0·7-0·8            0·75      0·845-0·903          6·5    96·5   4·875   6·812
0·8-0·9            0·85      0·903-0·954          2·5    99     2·125   7·326
0·9-1·0            0·95      0·954-1·000           1    100     0·95    +∞

                                          Σf = 100        Σfz = 51·45
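The p and probit columns of the table can be regenerated from the f values; `statistics.NormalDist` supplies the inverse of the standard normal distribution function in place of Fisher and Yates' printed tables:

```python
from statistics import NormalDist

nd = NormalDist()                          # standard normal curve
f = [0, 1, 15, 38, 25, 11, 6.5, 2.5, 1]    # % responding per dose interval
p, running = [], 0.0
for fi in f:
    running += fi
    p.append(running)                      # cumulative percentage, e.g. 54 = 38+15+1
# probit of p (finite only for 0 < p < 100; the end values are -inf and +inf)
probits = [5 + nd.inv_cdf(pi / 100) for pi in p if 0 < pi < 100]
```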

The individual effective dose is of course a continuous variable
and the distribution of IEDs is a continuous distribution (see § 4.1).
However, in order to get an idea of the shape of the distribution it has
been necessary to group the observed IEDs so that the histogram in
Fig. 14.2.1 can be plotted. A continuous line has been drawn by eye
through the histogram as an estimate of what the actual continuous
distribution should look like.
In Fig. 14.2.2 the histogram of cumulative frequency (p) is plotted
against dose. When a continuous line is drawn through the top right-
hand corner (see below) of each block an unsymmetrical sigmoid curve
is obtained. This is the cumulative distribution, or distribution func-
tion, F(z), (defined in (4.1.4)) corresponding to the distribution of IED
shown in Fig. 14.2.1. That is to say the ordinate of the curve in Fig.
14.2.2 for any specified value of the dose, z, is equal to the area under
the curve in Fig. 14.2.1 below z (cf. Fig. 5.1.2 and its cumulative form,
Fig. 5.1.1, and Figs. 4.1.3 and 4.1.4).
The relation between the IED and quantal responses can now be
illustrated. When a quantal response is obtained the IED itself is not
measured. A fixed dose, z, is given to a group of n subjects, and the
number, r, of subjects showing the chosen response is observed. The
proportion of subjects responding in the group is, of course, a
discontinuous variable, and if the same dose were given
repeatedly to many groups of n subjects then the number showing a
response, r, would be expected to vary from trial to trial according to
the discontinuous binomial distribution (see §§ 3.2-3.4). The subjects
responding will be those with IEDs equal to or less than
the dose given, z. Therefore, if the doses chosen were the upper limits
of each dose interval in Table 14.2.1 (i.e. doses of 0·2, 0·3, ..., 1·0 mg)
the values of r/n found for each of the 9 groups of n animals
would be the same (apart from experimental error) as the values of p
in the table (which is why p and probit [p] are plotted against the upper
limits of each dose interval in Figs. 14.2.2, 14.2.4, 14.2.5, and 14.2.6).

[Fig. 14.2.1: histogram of f against dose z (mg); mode ≈ 0·44 mg.]

FIG. 14.2.1. Histogram of individual effective dose measurements in
Table 14.2.1. The frequency, f, is plotted against dose (z), and a continuous line
has been drawn by eye through the histogram as an estimate of the (con-
tinuous) distribution of individual effective doses. The distribution is skewed, so
the median effective dose (shaded area = 50 per cent of total area under curve) is
less than the mean but greater than the modal effective dose (see § 4.5).

[Fig. 14.2.2: cumulative percentage responding against dose z (mg); sigmoid
curve with mode (maximum slope) ≈ 0·44 mg and median ≈ 0·49 mg marked.]

FIG. 14.2.2. Results from Table 14.2.1. The histogram is plotted, using the
cumulative frequency p, against dose z. The blocks, each of height f, from Fig.
14.2.1, have been put above each other so that the total height is p. The sigmoid
curve has been drawn by eye through the top right-hand corner of each block
(see text) as an estimate of the true (continuous) cumulative distribution (i.e.
the distribution function, see § 4.1) of individual effective doses, i.e. the ordinate
is the percentage of animals with an individual effective dose equal to or less
than z.


[Fig. 14.2.3(a) and (b): histograms of the individual effective doses plotted
against log dose (1 + log z or 1 + x), with a fitted Gaussian curve; mean,
median, and modal IED = antilog10 1̄·707 = 0·509 mg; mean log IED = 1̄·707,
SD = 0·104.]

FIG. 14.2.3. Results from Table 14.2.1. Histogram of individual effective
dose measurements with dose on a logarithmic scale, rather than on an arith-
metic scale as in Fig. 14.2.1 (1 has been added to the logs to avoid negative
values). Now that the blocks of the histogram are not of equal width, their area
is no longer proportional to their height, so a convention must be adopted as to
whether area or height shall represent frequency.
(a) In this figure height represents frequency, i.e. frequency (left-hand scale)
is plotted against log dose. The heights of the blocks are as in Fig. 14.2.1. The
continuous curve is a Gaussian (normal) distribution, calculated using the mean
of the log individual effective doses (1̄·707), and the standard deviation of the
log IED (0·104), estimated from Fig. 14.2.6 as described in the text. The prob-
ability density (right-hand) scale has been chosen to make the areas under the
histogram and the Gaussian curve equal, but the two are still not comparable
because area represents frequency for the continuous curve (see § 4.1), but not
for the histogram.
(b) The histogram in this figure has been constructed so that the area of each
block represents frequency, f, or, more precisely, the proportion f/Σf = f/100
in this example. The area is the height (h say) times the width of the log dose
interval (Δx say). For example, the first and last blocks each represent a frequency
of f = 1 per cent (see Table 14.2.1) so the first and last blocks are of equal height in
Figs. 14.2.1 and 14.2.3(a). However, in Fig. 14.2.3(b) they have equal areas (each
has 1 per cent of the total area), and therefore unequal heights. By definition,
proportion = f/100 = hΔx = area. For example, for the first block Δx = 0·477
−0·301 = 0·176, so the height (probability density) is h = f/(100Δx) = 1/17·6
= 0·05682, as plotted. For the last block Δx = 1·0 −0·954 = 0·046, so h =
f/(100Δx) = 1/4·6 = 0·2174, as plotted.
The area convention shown in Fig. 14.2.3(b) is the preferable one, because it
shows the shape of the distribution correctly when the widths of the groups are
not equal (though only at the expense of making it not obvious when frequencies
are equal, because it is more difficult to judge relative areas than relative heights
by eye). The continuous curve is a Gaussian curve with the same mean and
standard deviation as in Fig. 14.2.3(a), and it can now be compared directly
with the histogram because both have been plotted using the same (area) con-
vention (see § 4.1), and both have a total area of 1·0. The Gaussian curve is seen
to fit the observations reasonably well.

So if the quantal responses (values of r/n) were plotted against the
dose an unsymmetrical sigmoid dose-response curve like the continuous
line in Fig. 14.2.2 would be expected.
Thus when quantal responses are measured the dose is fixed by
the experimenter and the number (or proportion) of subjects responding
is the variable measured. On the other hand, in direct assays the dose
is not fixed but is the variable quantity measured by the experimenter.
The subjects responding in the quantal experiment are the subjects
in the group with an IED equal to or less than the fixed dose given. No
information is obtained about the IEDs of single animals so Fig. 14.2.1
cannot be plotted directly (though it can of course be obtained by
plotting the slope of the quantal dose-response curve, Fig. 14.2.2,
against dose, i.e. by differentiation of Fig. 14.2.2; this was shown in
(4.1.5)).
The cumulative curve in Fig. 14.2.2 is analogous to an ordinary
dose-response curve, for example the tension (a continuous variable)
developed by a smooth muscle preparation in response to various
doses of histamine. Because it is easier to handle a straight line than a
curve, it is usual to look for ways of converting dose-response curves to
straight lines. A method of doing this that often works in the case of


[Fig. 14.2.4: cumulative frequency p (per cent) against log dose (1 + log z);
mean, median, and modal IED = antilog10 1̄·707 = 0·509 mg.]

FIG. 14.2.4. Results from Table 14.2.1. Cumulative frequency, p, plotted
against log dose (x = log z). This figure is related to Fig. 14.2.3 in the same way as
Fig. 14.2.2 is related to Fig. 14.2.1.
The blocks (observations) are each the height of the f values in Table 14.2.1,
and they are put above each other so the total height gives the p value from
Table 14.2.1. The blocks are the same as those in Fig. 14.2.3(a), and the height
of each block is proportional to the area of each block (i.e. the frequency, f) in
Fig. 14.2.3(b).
The continuous curve is an estimate of the true (continuous) cumulative
distribution (i.e. the distribution function, see § 4.1) of log IED values. In other
words, the ordinate is the percentage of animals with a log IED equal to or less
than x. The continuous curve in this figure is related to that in Fig. 14.2.3(b)
in exactly the same way as the blocks are related; it is a calculated Gaussian
distribution function (see § 4.1 and text): the ordinate is the area to the left of x
under the calculated Gaussian distribution in Fig. 14.2.3(b), just as the ordinate
for the blocks is the total area of the blocks below x under the histogram in
Fig. 14.2.3(b). The calculated Gaussian function fits quite well (the continuous
curve in Fig. 14.2.2 fits exactly only because it was drawn through the observations
by eye).

[Fig. 14.2.5: probit of p against dose z (mg), with the corresponding percentage
scale on the right; the points follow a smooth curve, not a straight line.]

FIG. 14.2.5. Results from Table 14.2.1. Plot of the probit of p against the
dose (z). The corresponding percentage scale is shown on the right for comparison.
The non-linearity indicates that IED values are not normally distributed. A
smooth curve has been drawn through the points by eye and the median IED
(p = 50 per cent, probit [p] = 5) is estimated to be 0·49 mg, as was also found
by interpolation in Fig. 14.2.2 (cf. Fig. 14.2.6, which gives a slightly different
estimate).

quantal responses is discussed in § 14.3, and illustrated in Figs. 14.2.3-
14.2.6, which show various manipulations of the original results.

14.3. The probit transformation. Linearization of the quantal
dose-response curve
When dealing with continuously variable responses it is common
practice to plot the response against the logarithm of the dose (x
= log z, say) in the hope that this will produce a reasonably straight
dose-response line. If p (from Table 14.2.1) is plotted against the log


[Fig. 14.2.6: probit of p plotted against log dose (1 + log10 z), with a fitted
straight line; median IED = antilog10 1̄·707 = 0·509 mg.]

FIG. 14.2.6. Results from Table 14.2.1. Plot of the probit of p against log
dose (x = log z). The graph is reasonably straight, indicating that log IED
values are approximately normally distributed (i.e. IED values are approximately
lognormal, see § 4.5). The reciprocal of the slope (1/9·60 = 0·104) estimates the
standard deviation of the normal distribution of log IED values, and the dose
corresponding to p = 50 per cent (probit [p] = 5), i.e. antilog10 1̄·707 = 0·509 mg,
estimates the median (= mean = mode) of the distribution of IED values.
The distribution plotted with this mean and standard deviation is shown in
Fig. 14.2.3. The estimate of the median effective dose from this plot, 0·509 mg,
is different from that obtained from Figs. 14.2.2 and 14.2.5 (0·49 mg). This is
because a straight line has been drawn in this figure, using all the points, and
the dose corresponding to probit [p] = 5 has been interpolated from the straight
line even though it does not go exactly through the points. This would be the
best procedure if the true line were in fact straight (i.e. if the population of log
IED values were in fact Gaussian). In Figs. 14.2.2 and 14.2.5, curves were
drawn by eye to go exactly through all the points, so effectively only the observa-
tions on each side of probit [p] = 5 were being used for interpolation of the
median, whereas when a straight line (or other specified function) is fitted, all the
observations are taken into account. In a real quantal experiment the straight

dose, the result, shown in Fig. 14.2.4, is not a straight line but a
symmetrical sigmoid curve. (In fact similar results are often observed
with continuous responses also.)
A way of converting the results to a straight line is suggested by
Fig. 14.2.3, in which f rather than p is plotted against log dose. The
histogram has become roughly symmetrical compared with the skewed
distribution of IEDs seen in Fig. 14.2.1. The continuous line in Fig.
14.2.3 is a calculated normal (Gaussian) distribution with a mean
and standard deviation estimated as described below and illustrated
in Fig. 14.2.6. The calculated normal distribution is seen to fit the
observed histogram quite well, suggesting that the logarithms of the
IEDs (values of x = log z) are normally distributed, i.e. that the
IEDs (values of z) are lognormally distributed (see § 4.5). Any curve
can be linearized if the mathematical formula describing it is known.
The sigmoid curve in Fig. 14.2.4, the cumulative form of the distribu-
tion in Fig. 14.2.3, is a cumulative normal distribution. This was
illustrated in Fig. 4.1.4, which shows the cumulative form, p = F(x),
of the normal distribution in Fig. 4.1.3. If the abscissa in Fig. 4.1.4 is
some measure of the effective dose then the ordinate of the cumulative
normal distribution is

    p = F(x) = area under normal curve below x
             = proportion of animals for which IED ≤ x,   (14.3.1)

i.e. exactly what is plotted as the ordinate in Figs. 14.2.2 and 14.2.4.
The formula for the integral normal curve shown in Fig. 4.1.4,
from (4.1.4) and (4.2.1), is

    p = F(x) = ∫ from −∞ to x of (1/σ√(2π)) exp[−(x−μ)²/2σ²] dx.   (14.3.2)

This curve can be transformed to a straight line if, instead of plotting
p against x, the abscissa corresponding to p is read off from a standard
normal curve (see § 4.3) and this is plotted against x. For example,
if a dose of x = 3 produces an effect in 16 per cent of a group of animals,
the abscissa (viz. u = −1, see § 4.3) of the standard normal curve
corresponding to an area in the lower tail of the curve of 16 per cent

(Legend to Fig. 14.2.6, continued.) line would be fitted to the points in this
figure using the iterative method discussed in § 14.4. In this example, the
quantal data have been generated, for illustrative purposes, using actual IED
measurements rather than by giving fixed doses to groups of animals, so the
best that can be done is to fit an unweighted straight line (shown) as described
in § 12.6.

would be read off as shown in Fig. 14.3.1. This value of the abscissa
would then be plotted against the dose (or some transformation of it,
such as the logarithm of the dose), as shown in Fig. 14.3.2.
The abscissa of the standard normal curve is, as described in § 4.3,
u = (x−μ)/σ, where σ is the standard deviation of x (i.e. of the log
IED in the present case). So in effect, instead of plotting p against x,
the value of u corresponding to p (which is called the normal equivalent
deviation or NED) is plotted against x. But because the relation between
u and x,

    u = (x−μ)/σ = x/σ − μ/σ,   (14.3.3)

has the form of the general equation for a straight line u = bx+a, the
plot of NED against x will be a straight line with slope 1/σ and intercept
(−μ/σ) if, and only if, the values of x are normally distributed. This
is because the NED corresponding to the observed p was read from a
normal distribution curve.
The values of u are negative for p < 50 per cent response and so,
to avoid the inconvenience of handling negative values, 5·0 is added to
all values of the NED and the result is called the probit corresponding
to p, or probit [p]. Tables of the probit transformation are given, for
example, by Fisher and Yates (1963, Table IX, p. 68). From Fig.
14.3.1, it is seen that p = 50 per cent response corresponds to u = NED
= 0, i.e. probit [50 per cent] = 5. Thus

    probit [p] ≡ u+5 ≡ NED+5 = 5 + (x−μ)/σ   (14.3.4)

so the plot of probit [p] against x will be a straight line (if x is Gaussian)
with slope 1/σ (as above) and intercept (5−μ/σ). Here, as above, σ is the
standard deviation of the distribution of x, i.e. of the log IED in the
present case. It is therefore a measure of the heterogeneity of the
subjects (see § 14.4 also).
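On a computer the probit transformation of eqn (14.3.4) can replace the printed tables; Python's `statistics.NormalDist` supplies the inverse of the standard normal distribution function (a sketch, with `probit` an illustrative name):

```python
from statistics import NormalDist

nd = NormalDist()   # standard normal curve

def probit(p):
    """probit[p] = NED + 5, the NED being the abscissa of the standard
    normal curve that cuts off a lower-tail area p (eqn (14.3.4))."""
    return 5 + nd.inv_cdf(p)
```

For example, a 50 per cent response gives a probit of 5 exactly, and a 16 per cent response a probit of about 4, as in Fig. 14.3.1.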
From Fig. 14.3.1, it can be seen that the NED of a 16 per cent
response (i.e. 16 per cent of individuals affected) is −1 or, in other
words, the probit of a 16 per cent response is +4. This follows from the
fact (see § 4.3) that about 68 per cent of the area under a normal
distribution curve is within ±σ (i.e. within ±1 on a standard normal
[Fig. 14.3.1: standard normal probability density against abscissa u (the NED),
with the corresponding probit scale (= u + 5) beneath.]

FIG. 14.3.1. Standard Gaussian (normal) distribution (see Chapter 4).
Sixteen per cent of individuals responding corresponds to a value of u of −1
(the NED), i.e. to a probit of 4.

[Fig. 14.3.2: probit (and NED) of the percentage responding plotted against x,
showing the single point described in the text.]

FIG. 14.3.2. If the dose (or transformed dose, e.g. log dose) x = 3 caused
16 per cent of individuals to respond, the probit of 16 per cent, i.e. 4·0, from
Fig. 14.3.1, would be plotted against x = 3. See complete plots in Figs. 14.2.5 and
14.2.6.

curve), and of the remaining 32 per cent of the area, 16 per cent is
below u = −1 and 16 per cent is above +1.
In Fig. 14.2.5 the probit of the percentage response is plotted against
the dose z. The curve is not straight, implying that individual effective
doses do not follow the Gaussian distribution in the animal population.
This has already been inferred by inspection of the distribution shown
in Fig. 14.2.1 which is clearly skew. However, in the usual quantal
response experiment the distribution in Fig. 14.2.1 is not itself observed.
The directly observed results are of the form shown in Fig. 14.2.2; and
it is not immediately obvious from Fig. 14.2.2 that individual effective
doses are not normally distributed. When the probit of the percentage
response is plotted against log dose, in Fig. 14.2.6, the line is seen to be
approximately straight, showing that (in this particular instance) the
logarithms of the individual effective doses are approximately normally
distributed (cf. Fig. 14.2.3).
The use of this line is discussed in the next section.

14.4. Probit curves. Estimation of the median effective dose and
quantal assays
The probit transformation described in § 14.3 can be used to estimate
the median effective dose or concentration; that is, the dose (or con-
centration) estimated to produce the effect in 50 per cent of the popula-
tion of individuals (see § 2.6). This dose is referred to as the ED50
(or, if the effect happens to be death, as the LD50, the median lethal
dose). If the individual effective doses (IEDs) are measured on a
scale on which they are normally distributed the ED50 will be the
same as the mean effective dose (e.g. in § 14.3 the median effective log
dose is the same as the mean effective log dose, see § 4.5 and Figs. 14.2.1
and 14.2.3).
The procedure is to measure the proportion (p = r/n) of individuals
showing the effect in response to each of a range of doses. The propor-
tions are converted to probits and plotted against the dose or some
function (usually the logarithm) of the dose. If the curve does not
deviate from a straight line by more than is reasonable on the basis of
experimental error (see below) the ED50 can be read from the fitted
straight line. From Fig. 14.2.6 it can be seen that the graphical estimate
of the ED50 is 0·509 mg, the antilog of the log dose corresponding to a
probit of 5 as explained in § 14.3.
Furthermore, the slope of Fig. 14.2.6 is an estimate of 1/σ, according
to (14.3.4); so the reciprocal of this slope (i.e. 1/9·60 = 0·104 log10
units, from Fig. 14.2.6) is an estimate of σ, the standard deviation of
the distribution of the log IED, and this value of σ was used to plot
the distribution in Fig. 14.2.3. This standard deviation is a measure of
the variability of the individuals, i.e. of the extent to which they do not
all have the same individual effective dose.
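For example, using the slope read from Fig. 14.2.6 and the value of x at which the fitted line crosses probit 5 (both quoted in the figure legend):

```python
# ED50 and sigma from the fitted probit line of Fig. 14.2.6; the dose scale
# is x = 1 + log10(z), so 1 is subtracted before taking the antilog
b = 9.60                 # slope of probit[p] against x, from the figure
x50 = 0.707              # x at which the fitted line passes probit = 5
sigma = 1 / b            # SD of the log10 IED distribution, eqn (14.3.4)
ed50 = 10 ** (x50 - 1)   # back to mg (1 was added to the logs)
```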
Assays of the sort described in Chapter 13 can also be done using
quantal responses, and if a log dose scale is used they will be parallel
line assays (see § 13.1).
In all the applications discussed the problem arises of how to fit the
'best' line to the observed points. Methods for doing this have been
described in Chapters 12 and 13 but they all assume that the scatter of
the observations is the same at every value of x, i.e. that the results
are homoscedastic (see §§ 12.2 and 13.1). This is not the case for probit
plots (see § 14.5 for an exception) and this complicates the process of
curve fitting. Numerical examples of the methods are given by Burn,
Finney, and Goodwin (1950, p. 114), and Finney (1964, Chapters
17-21).
The reason for the heteroscedasticity is not difficult to see. The
number of individuals (r) responding, out of a randomly selected (notice
that random selection is, as usual, essential for the analysis) group of
n, should follow the binomial distribution (§§ 3.2-3.4), and the variance
of the proportion responding, p = r/n, would be estimated from
(3.4.5) to be var[p] = p(1−p)/n. Because the line is to be fitted to the
plot of probit [p] (= y, say) against the dose metameter, it is the variance of
y = probit [p] that is of interest. From (2.7.13) it is seen that var[y]
≃ var[p].(dy/dp)² = (dy/dp)².p(1−p)/n. Now the standard normal
curve in Fig. 14.3.1 can be written (by (4.1.1)) as dp = f dy, and thus
dy/dp = 1/f, where f is the ordinate of the standard normal curve
(the probability density, see § 4.1 and (4.2.1); f was used with a different
meaning in §§ 14.2 and 14.3). This result follows, slightly more rigorously,
from (14.3.1) and (4.1.5). Therefore var[y] ≃ p(1−p)/nf², and this is
not a constant but varies with p. The probit plot is therefore hetero-
scedastic and each probit (y value) must be given a weight 1/var[y]
≃ nf²/p(1−p) when fitting the dose-response lines (cf. §§ 2.5 and 13.4);
it is this that gives rise to the complications. When a line is fitted it will
lead to a better estimate of the y corresponding to each x, and hence to
better estimates of the weights and hence to a better-fitting line. The
calculation is therefore iterative.
It is because of the existence of this theoretical estimate (cf. § 3.7)
of var[y] that the deviations from linearity of Fig. 14.2.6 can be tested

even though there is only one observation (y value) at each x value
(cf. §§ 12.5 and 12.6).
If the weight is plotted against p (Fisher and Yates (1963, p. 71)
give a table of f²/p(1−p)), it is found to have a maximum value when
p = 0·5, i.e. 50 per cent response rate. This is the reason why the
ED50 is calculated as a measure of effectiveness. It is the quantity that
can be determined most precisely.

The minimum effective dose


This term is fairly obviously meaningless† as it stands (unless the
IED is the same for all individuals). The larger the sample the larger
the chance that it will contain a very susceptible individual (from the
lower tail of Fig. 14.2.3) so the lower the estimate of the minimum
effective dose will be. Clearly it is necessary to specify the proportion of
individuals affected in the population. It was 50 per cent in the discus-
sion above.
Unfortunately, it is often not of interest to know the ED50. If
one were interested in the proportion of individuals suffering harmful
radiation effects from the fall-out from nuclear explosions it is (or should
be) only of secondary interest to know what dose of radiation will
harm 50 per cent of the population. What is required is an estimate of
the dose of radiation that will not harm anyone. No answer other than
zero dose is consistent with the lognormal distribution of individual
effective radiation doses usually assumed, because the normal distribu-
tion of the log IED is asymptotic to the dose axis (see § 4.2), zero
effect being produced only by log dose = −∞, i.e. zero dose. The
question is not compatible with a normal distribution of doses either,
as this would imply the existence of negative doses. This is a very real
problem because when dealing with a very large population a very small
proportion harmed means a very large number of people harmed. Suppose
that it is decided that the ED0·01 shall be estimated, i.e. the dose
affecting 0·01 per cent of the population (about 0·0001 × 3500 million
= 350 000 people on a world scale!). The weight, f²/p(1−p), correspond-
ing to p = 0·0001 (i.e. probit [p] ≃ 1·3) is seen from the tables to be
0·00167, compared with 0·6366 (about 380 times larger) for p = 0·5.
Thus to estimate the ED0·01 with the same precision as the ED50, the

† This necessitates the abandonment of the conventional definition of the unit of
beauty (Marlowe 1604), viz. one milliHelen = that quantity of beauty just sufficient
to launch a single ship. An alternative definition could follow the lines of the purity in
heart index discussed in § 7.3.

sample size (n) would have to be much bigger. And this is not the
only problem. Working with small proportions means working with
the tails of the distribution where assumptions about its form are least
reliable. For example, the straight line in Fig. 14.2.6 might be extrapolated,
a very hazardous process, as was shown in § 12.6, Fig. 12.5.1.

14.5. Use of the probit transformation to linearize other sorts of sigmoid curve
The probit transformation can be tried quite empirically in attempting
to linearize any sigmoid curve if the ordinate can be expressed as a
proportion (see § 14.6). If this is done the variability will not usually be
binomial in origin so the method discussed in § 14.4 cannot be used,
and curves should not be fitted by the methods described in books on
quantal bioassay. It would be necessary to have several observations at
each x value to test for deviations from linearity, and the assumptions
discussed in § 12.2 must be tested empirically.
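As a purely numerical sketch of what the transformation does (pure Python, using the standard-library NormalDist; the tolerance distribution, with mean 2·0 and standard deviation 0·5, is invented for illustration), proportions lying on a cumulative normal curve give an exactly straight probit plot:

```python
from statistics import NormalDist

# Hypothetical tolerance distribution: log IED normal with mean 2.0 and
# standard deviation 0.5, so the expected proportion responding at log
# dose x is the cumulative normal probability at x.
mu, sigma = 2.0, 0.5
tolerance = NormalDist(mu, sigma)

xs = [1.0, 1.5, 2.0, 2.5, 3.0]
ps = [tolerance.cdf(x) for x in xs]                     # sigmoid curve
probits = [NormalDist().inv_cdf(p) + 5 for p in ps]     # probit = NED + 5

# The probit-log dose plot is straight, with slope 1/sigma = 2.0, so the
# standard deviation can be estimated as the reciprocal of the slope.
slopes = [(probits[i + 1] - probits[i]) / (xs[i + 1] - xs[i])
          for i in range(len(xs) - 1)]
print(slopes)
```

With real observations the points would scatter about the line, and the deviations from linearity would have to be tested empirically as just described.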
An example is provided by the osmotic lysis of red blood cells by
dilute salt solutions. It is often found that the plot of the probit of the
proportion (P) of cells not haemolysed against salt concentration
(not log concentration) is straight over a large part of its length.
This implies (see § 14.3) that the concentration of salt just sufficient to
prevent lysis of individual cells (the IED) is approximately normally
distributed in the population of cells with a standard deviation estimated,
by (14.3.4), as the reciprocal of the slope of the plot. In this
sort of experiment each test would usually be done on a very large
number (n) of cells so the variability expected from the binomial
distribution, p(l-p)/n, would be very small. However, in this case
most of the variability (random scatter of observations about the
probit-concentration line) would not be binomial in origin but would
be the result of factors that do not enter into the sort of animal experi-
ment described in § 14.3, such as variability of n from sample to sample,
and errors in counting the number of cells showing the specified response
(not lysed).

14.6. Logits and other transformations. Relationship with the Michaelis-Menten hyperbola
The use of the probit transformation for linearizing quantal dose
response curves is, in real life, completely empirical. The probit of the
proportion responding is plotted against some function (x, say) of the
dose, and the plot tested for deviations from linearity. However, there

are many other curves that closely resemble the sigmoid cumulative
normal curve in Figs. 4.1.4 and 14.2.4. One example is the logistic
curve defined† by

    p = 1/{1+e^−(a+bx)}        (14.6.1)

This is plotted in Fig. 14.6.1, curve (b), and is seen to be very like the
cumulative normal curve. If the relation between p and x was repre-
sented by (14.6.1) then it could be linearized by plotting logit[p]

[Graph: curves (a) and (b), with p = y/y_max (0 to 1·0) on the ordinate. Curve (a) is plotted against z on an ordinary arithmetic scale (up to z = 512); curve (b) is plotted against x = log z, with three equivalent logarithmic scales (x = log₂z, x = log₁₀z, and x = logₑz) shown as different ways of plotting the abscissa.]

FIG. 14.6.1. Curve (a) Plot of p against z from eqn (14.6.3). When b = 1
this curve is part of a hyperbola.
Curve (b) Plot of p against x from eqn (14.6.1). This curve is the same as
curve (a) with z plotted on a logarithmic scale (three equivalent ways of plotting
x = log z are shown). It is a logistic curve and can be linearized by plotting
logit[p] against x.
The particular values used to plot the graphs were K = 100, b = 1.

† An equivalent definition of the logistic curve that may be encountered is

    p = ½{1+tanh ½(a+bx)}

where the hyperbolic tangent is defined by tanh u = (e^2u−1)/(e^2u+1). In terms of this
definition logit[p] ≡ 2 tanh⁻¹(2p−1) = a+bx.

(instead of probit) against x, where logit[p] is defined as logₑ{p/(1−p)}.
This follows from (14.6.1) which implies

    logit[p] = logₑ(p/(1−p)) = logₑ(e^(a+bx)) = a+bx        (14.6.2)

which is a straight line with slope b and intercept a. (Remember that,
in general, logₑe^x = x because the log is defined as the power to which the
base must be raised to give the argument. This implies, also, that e^x can be
written, in general, as antilogₑx, which is used below in deriving
(14.6.3).)
(14.6.3).) The use of this and other transformations for analysing
quantal response experiments is described by Finney (1964). The
probit and logit transformations are too similar for it to be possible
to detect which fits the results better, with quantal experiments of the
usual size.
The logit transformation is also a linearizing transformation for
the hyperbola discussed in § 12.8, and plotted in Fig. 14.6.1, curve
(a). In this application the response, y, is a continuous variable, not a
quantal variable. The linearity follows by taking x = logₑz (using z
to represent dose, or concentration, rather than x which was used for
this purpose in § 12.8), and p = y/y_max, i.e. y expressed as a proportion
of its maximum possible value, the value approached as z becomes
very large (in § 12.8 y_max was called V). If the constant, a, is redefined
as −logₑK then putting these quantities into (14.6.1) gives

    p = y/y_max = 1/{1+e^−(a+bx)} = 1/{1+(K/z^b)} = z^b/(K+z^b)        (14.6.3)

which is the hyperbola (12.8.1), in the special case when b = 1. As
mentioned in § 12.8, this is the Michaelis-Menten equation of biochemistry
(when b = 1). The more general form, (14.6.3), has been
used, for example, in biochemistry and pharmacology (the Hill equation).
The plot of logit[p] against log z is known as the Hill plot. The
use and physical interpretation of the Hill plot are discussed by Rang
and Colquhoun (1973).
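As a numerical sketch of the Hill plot (pure Python; the parameter values y_max = 100, K = 100, and b = 2 are invented for illustration), responses generated from eqn (14.6.3) give logits that are exactly linear in logₑz:

```python
import math

# Responses following y = y_max * z**b / (K + z**b), i.e. eqn (14.6.3),
# with invented values y_max = 100, K = 100, b = 2.
y_max, K, b = 100.0, 100.0, 2.0
doses = [2.0, 5.0, 10.0, 20.0, 50.0]

logits, logdoses = [], []
for z in doses:
    p = (y_max * z**b / (K + z**b)) / y_max      # p = y/y_max
    logits.append(math.log(p / (1.0 - p)))       # logit[p] = log_e{p/(1-p)}
    logdoses.append(math.log(z))

# The Hill plot is straight: slope b = 2, intercept a = -log_e K.
slope = (logits[-1] - logits[0]) / (logdoses[-1] - logdoses[0])
intercept = logits[0] - slope * logdoses[0]
print(slope, intercept)
```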

Summarizing these arguments, if the response y, plotted against
dose or concentration z, follows (14.6.3) (which in the special case
b = 1 is the hyperbola plotted in Fig. 14.6.1, curve (a), and in Fig.
12.8.1), then the response plotted against log concentration, x = log z,
will be a sigmoid logistic curve defined by (14.6.1) and plotted in Fig.
14.6.1, curve (b). And logit[y/y_max] plotted against x will be a straight
line with intercept a = −logₑK, and slope b. Quite empirically,
equations like (14.6.3) are often found to represent dose-response
curves in pharmacology reasonably well (the extent to which this
justifies physical models is discussed by Rang and Colquhoun
(1973)), so plots of response against log dose are sigmoid like Fig.
14.6.1, curve (b). The central portion of this sigmoid curve is sufficiently
nearly straight to be not grossly incompatible with the assumption,
made in most of Chapter 13, that response is linearly related to log
dose.
It is worth noticing that the sigmoid plot of y against x in Figs.
14.2.4 or 4.1.4 (the cumulative normal curve, linearized by plotting
probit[p] against x) looks very like the sigmoid plot of y against x in
Fig. 14.6.1, curve (b) (the logistic curve, linearized by plotting logit[p]
against x). However, if x is log z, then the corresponding plots of y
against z (e.g. response against dose, rather than log dose) are quite
distinct. The corresponding plots are, respectively, that in Fig. 14.2.2
(the cumulative lognormal distribution, see § 4.5), which has an
obvious 'foot', i.e. it flattens off at low z values; and the hyperbola in
Fig. 14.6.1, curve (a), which rises straight from the origin with no
trace of a 'foot' or 'threshold'. This distinction is effectively concealed
when a logarithmic scale is used for the abscissa (e.g. dose).
In order to use the logit transformation for continuously variable
responses it is necessary to have an estimate of the maximum response,
y_max. This introduces statistical complications (see, for example,
Finney (1964, pp. 69-70)). A simple solution is not to bother with
linearizing transformations except as a convenient method for preliminary
assessment and display of results, but to estimate the parameters
y_max, K, and b directly by the method of least squares as described
in § 12.8.
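A minimal sketch of that direct approach (pure Python, with a crude grid search standing in for a proper least-squares minimization; the 'observations' are error-free values invented from eqn (14.6.3) with b = 1):

```python
# Error-free 'observations' generated from y = y_max*z**b/(K + z**b)
# with y_max = 100, K = 100, b = 1 (all values invented).
doses = [10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
ys = [100.0 * z / (100.0 + z) for z in doses]

def ssq(y_max, K, b):
    """Sum of squared deviations of the observations from the curve."""
    return sum((y - y_max * z**b / (K + z**b))**2 for z, y in zip(doses, ys))

# Coarse grid search for the parameter set minimizing the sum of squares.
best = min(((ssq(ym, K, b), ym, K, b)
            for ym in (80.0, 90.0, 100.0, 110.0)
            for K in (50.0, 75.0, 100.0, 125.0)
            for b in (0.5, 1.0, 1.5)),
           key=lambda t: t[0])
print(best)
```

In practice the minimization would be done with a proper optimizer, and with real (scattered) observations the minimum would not be zero; the point is only that y_max, K, and b can be estimated without any linearizing transformation.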

Appendix 1
Expectation, variance, and non-experimental bias

THE object of this appendix is to provide a brief account of some rather


more mathematical ideas which, although they are not necessary for following
the main body of the book, will be useful to anyone wanting to go further.
Also some of the following results will be useful in Appendix 2. Further
explanation will be found, for example, in Brownlee (1965, pp. 51, 57, and
87), Mood and Graybill (1963, p. 103), or Kendall and Stuart (1963, Chapter
2). All the ideas discussed in this section require that the distribution of the
variable be specified.
A1.1. Expectation - the population mean
The population mean value of a variable is called its expectation and is
defined as†

    E(x) = Σ x P(x)   (summed over all x)   for discontinuous distributions,        (A1.1.1)

    E(x) = ∫_−∞^+∞ x f(x) dx   for continuous distributions.        (A1.1.2)

This can be regarded as the arithmetic mean of an indefinitely large number
of observations on the variable x, the distribution of which is specified by
the probability P(x) (discontinuous), or the probability density f(x) (continuous),
as explained in §§ 3.1 and 4.1. The reasonableness of the definition
(A1.1.1) is obvious if a large but finite number of observations, N, is considered.
On the average a proportion P(x) of the observations will have the value x, so the
number of observations with the value x will be f = N P(x) and the total of
the f observations will be fx. The total of all N observations will be Σfx,
and their mean will therefore be Σfx/N, which is exactly eqn (A1.1.1) if
f/N is substituted for P(x). The form for continuous variables, (A1.1.2), is just
the same as (A1.1.1) except that the P is replaced by dP = f(x) dx (from
(4.1.1)), and consequently summation is replaced by integration.
As a numerical example take the binomial distribution with n = 3 and
𝒫(S) = 0·9 from Table 3.2.2 and Fig. 3.2.4. From (A1.1.1)

    E(r) = Σ_{r=0}^{r=3} r P(r) = (0 × 0·001)+(1 × 0·027)+(2 × 0·243)+(3 × 0·729) = 2·7,

† If x, considered as a random variable, is denoted x̃ to distinguish it from x considered
as an algebraic symbol (as in (4.1.4) and § A2.6), the definition of expectation can be
written in the preferable form E(x̃) = Σx P(x), etc.

which is n𝒫 (= 3 × 0·9 = 2·7), the population mean value of r (number
of successes in three trials), as mentioned in § 3.4. Notice that in this case
the mean value is never actually observed. All observations must be integers.
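The arithmetic of this example can be checked directly (pure Python, reproducing the n = 3, 𝒫 = 0·9 binomial of Table 3.2.2):

```python
from math import comb

# Binomial probabilities P(r) for n = 3 trials, probability of success 0.9.
n, p = 3, 0.9
P = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

# E(r) = sum of r*P(r), eqn (A1.1.1): equals n*p = 2.7, a value that can
# never actually be observed since every observation is an integer.
mean = sum(r * P[r] for r in range(n + 1))
print(mean)
```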
Several properties follow directly from the definition of expectation.
For example, for a linear function, where a and b are constants,

    E[a+bx] = E[a]+E[bx] = a+bE[x]        (A1.1.3)

and also, more generally,

    E[Σx_i] = ΣE[x_i]        (A1.1.4)

    = NE[x] if all x_i have the same mean.        (A1.1.5)

But for a nonlinear function, g(x) say,†

    E[g(x)] ≠ g(E[x]),        (A1.1.6)

so averaging a function of x will not give the same answer as averaging
x first and then finding the function of the average (cf. (2.5.4); the arithmetic
mean of log x is not the log of the arithmetic mean of x, but the log of the geometric
mean). See also (A1.2.2).
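A one-line numerical illustration of this point (pure Python; the three values are arbitrary):

```python
import math

# The mean of the logs is not the log of the mean, cf. eqn (A1.1.6).
xs = [1.0, 10.0, 100.0]
mean_of_log = sum(math.log10(x) for x in xs) / len(xs)   # = 1.0
log_of_mean = math.log10(sum(xs) / len(xs))              # = log10(37)

# 10**mean_of_log is the geometric mean (10), not the arithmetic mean (37).
print(mean_of_log, log_of_mean)
```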
Mean of the Poisson distribution
It was stated in §§ 3.5 and 5.1 that m in (3.5.1) was the mean of the
Poisson distribution. This follows, using (A1.1.1) and (3.5.1), giving

    E(r) = Σ_{r=0}^{∞} r P(r) = Σ_{r=0}^{∞} r (m^r e^−m / r!)        (A1.1.7)

    = m.
Mean of the normal distribution
Using (A1.1.2) the statement that the parameter μ in (4.2.1) can be interpreted
as the mean of the normal distribution can be justified. From (A1.1.2),

    E(x) = ∫_−∞^+∞ x f(x) dx = ∫_−∞^+∞ [μ+(x−μ)] f(x) dx

    = μ ∫_−∞^+∞ f(x) dx + ∫_−∞^+∞ (x−μ) f(x) dx

    = μ+0 = μ        (A1.1.8)

† The expectation of a function of x is defined in (A1.1.16) and (A1.1.17).

because the first integral is the area under the whole distribution curve,
i.e. 1. The second integral is zero because, using (4.2.1) for the density, f(x),
of the normal distribution and putting y = (x−μ)² so that dy = 2(x−μ) dx,
it becomes

    (1/σ√(2π)) ∫ (x−μ)e^−(x−μ)²/2σ² dx = (1/2σ√(2π)) ∫ e^−y/2σ² dy

    = −(σ/√(2π)) [e^−(x−μ)²/2σ²]_−∞^+∞ = 0.        (A1.1.9)

Mean and median of the exponential distribution
The exponential distribution of intervals between random events was
introduced in Chapter 5 and is discussed in more detail in Appendix 2. It
was defined in (5.1.3) by the probability density

    f(x) = λe^−λx for x ≥ 0,
    f(x) = 0 for x < 0,        (A1.1.10)

which is plotted in Fig. 5.1.2. It was argued in § 5.1 that the population
mean interval between events must be λ⁻¹. This follows from (A1.1.2) which
gives

    E(x) = ∫_0^∞ x λe^−λx dx.

The lower limit can be taken as 0 rather than −∞ because, from (A1.1.10),
f(x), and hence the integral, is 0 for x < 0. This can be evaluated using integration
by parts (see, for example, Thompson (1965, p. 188), or Massey and
Kestelman (1964, pp. 332 and 402)). Putting u = x, so du = dx, and dv
= λe^−λx dx so v = ∫λe^−λx dx = −e^−λx, gives

    E(x) = ∫u dv = [uv] − ∫v du

    = [−xe^−λx]_0^∞ − ∫_0^∞ (−e^−λx) dx

    = 0 − [e^−λx/λ]_0^∞ = 1/λ = λ⁻¹.        (A1.1.11)

To evaluate this notice that xe^−λx → 0 as x → ∞; see, for example, Massey
and Kestelman (1964, p. 122). The area under the distribution curve up to
any value x, i.e. the probability that an interval is equal to or less than x
is, from (5.1.4),

    F(x) = 1−e^−λx        (A1.1.12)

so the proportion of all intervals that are shorter than the mean interval,
putting x = λ⁻¹ in (A1.1.12), is

    F(λ⁻¹) = 1−e⁻¹ = 0·6321,        (A1.1.13)

i.e. 63·21 per cent of the area under the distribution in Fig. 5.1.2 lies below
the mean (1·0 in Fig. 5.1.2).

The median (see § 2.5, p. 26) length of the intervals between random
events is the length such that 50 per cent of intervals are longer, and 50 per
cent shorter than it, i.e. it is the value of x bisecting the area under the
distribution curve. If the population median value of x is denoted x_m then,
from (A1.1.12),

    1−e^−λx_m = 0·5,

i.e.

    x_m = λ⁻¹ logₑ2 = 0·69315 λ⁻¹.        (A1.1.14)

This is shown on Fig. 5.1.2. As expected for a positively skewed distribution
(see § 4.5), the population median is less than (in fact 69·315 per cent of)
the population mean, λ⁻¹. The mode of the distribution is even lower at
x = 0, as seen from Fig. 5.1.2.
The variance of an exponentially distributed variable, from (A1.2.2), is

    var(x) = λ⁻².        (A1.1.15)

For details see, e.g. Brownlee (1965, p. 59).
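These results are easy to check by simulation (pure Python; the rate λ = 2 and the sample size are arbitrary):

```python
import math
import random

# Simulated intervals between random events: exponential with rate 2,
# so the population mean is 1/2 and the median (log_e 2)/2.
random.seed(1)
lam = 2.0
xs = sorted(random.expovariate(lam) for _ in range(200_000))

mean = sum(xs) / len(xs)
median = xs[len(xs) // 2]
below_mean = sum(x < 1 / lam for x in xs) / len(xs)   # should be near 0.6321
print(mean, median, below_mean)
```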

The expectation of a function of x
The expectation, or long-run mean, of the value of any function of x, say
g(x), can be found without first finding the probability density of g(x).
The derivation is given, for example, by Brownlee (1965, p. 55). The results,
analogous to (A1.1.1) and (A1.1.2), are

    E[g(x)] = Σ g(x) P(x)   for discontinuous distributions,        (A1.1.16)

    E[g(x)] = ∫_−∞^+∞ g(x) f(x) dx   for continuous distributions.        (A1.1.17)

The expectation of a function of two random variables is discussed in
§ A1.4.
A1.2. Variance
For any variable x, with expectation μ, the population variance of x is
defined as the expected value (long-run mean) of the square of the deviation
of x from μ = E(x), i.e.

    var(x) = E[(x−μ)²]        (A1.2.1)

    = E[x²]−(E[x])².        (A1.2.2)

The second form of the definition follows from the first by expanding the
square. It shows that E[x²] is not the same, in general, as (E[x])² (this is an
example of the relation (A1.1.6)).
Use of this definition in conjunction with (A1.1.1) or (A1.1.2) gives, for
example, the variance of the binomial distribution as n𝒫(1−𝒫), of the
Poisson as m, of the exponential as λ⁻², and of the normal as σ², as asserted
in (3.4.4), (3.5.3), (A1.1.15), and (4.2.1). For details of the derivations see
the references at the beginning of this section. The mean and variance of a
function of two random variables is discussed in § A1.4.

The standardized form of any random variable, x, can be defined as X say,
where

    X = (x−E[x]) / √var(x)        (A1.2.3)

(see, for example, the standard normal distribution, § 4.3). X must always
have a population mean of zero, and population variance of one because

    E[X] = E[(x−E[x])/√var(x)] = (E[x]−E[x])/√var(x) = 0        (A1.2.4)

and, from (A1.2.2), (A1.2.4), and (A1.2.1),

    var(X) = E[X²]−(E[X])² = E[X²]

    = E[(x−E[x])²]/var(x) = var(x)/var(x) = 1.        (A1.2.5)
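A small numerical check of these two results (pure Python; the five values are arbitrary):

```python
# Standardizing an arbitrary set of values gives mean 0 and variance 1,
# as in (A1.2.4) and (A1.2.5).
xs = [2.0, 3.0, 7.0, 8.0, 10.0]
mu = sum(xs) / len(xs)                              # population mean
var = sum((x - mu)**2 for x in xs) / len(xs)        # population variance
X = [(x - mu) / var**0.5 for x in xs]               # standardized form

mean_X = sum(X) / len(X)
var_X = sum(x**2 for x in X) / len(X)
print(mean_X, var_X)
```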

A1.3. Non-experimental bias
It has been mentioned in § 2.6 in connection with the standard deviation,
and in § 12.8, that estimates of quantities calculated from observations may
be biased even when the observations themselves have no bias at all. In this
case the estimation method (i.e. the formula used to calculate the sample
estimate, say θ̂, of a parameter θ from the observations) is said to be biased.
An estimation method is said to have a bias = E[θ̂]−θ, and it is said to be
unbiased if

    E[θ̂] = θ.        (A1.3.1)

For example, the sample arithmetic mean is an unbiased estimate of the
parameter E[x] (whatever the distribution of x) because E[x̄] = E[x].
Using (A1.1.4),

    E[x̄] = E[Σx/N] = E[Σx]/N = Σ(E[x])/N = NE[x]/N = E[x] = μ.        (A1.3.2)

Furthermore, (2.6.3) gives an unbiased estimate of the population variance,
var(x) or σ²(x), for any distribution because, from (A1.1.3), (A1.1.4), and
(A1.2.1),

    E[Σ(x−μ)²/N] = E[Σ(x−μ)²]/N = ΣE[(x−μ)²]/N = N var(x)/N = var(x).        (A1.3.3)
However, if μ is replaced by its (unbiased) sample estimate, x̄, an unbiased
estimate of var(x) is no longer obtained (as discussed in § 2.6). If s² =
Σ(x−x̄)²/N then

    Ns² = Σ_{i=1}^{i=N} (x−x̄)² = Σ[(x−μ)−(x̄−μ)]² = Σ(x−μ)² − N(x̄−μ)²

because 2(x̄−μ).Σ(x−μ) = 2(x̄−μ)(Σx−Nμ) = 2(x̄−μ)(Nx̄−Nμ) =
2N(x̄−μ)². Thus, using (A1.1.3), (A1.1.4), (A1.2.1), and (2.7.8),

    NE[s²] = E[Σ(x−μ)²−N(x̄−μ)²]
    = ΣE[(x−μ)²]−NE[(x̄−μ)²]
    = NE[(x−μ)²]−NE[(x̄−μ)²]
    = N var(x)−N var(x̄) = N var(x)−N var(x)/N

and so

    E[s²] = var(x)(N−1)/N or σ²(N−1)/N.        (A1.3.4)

Because s² is a biased estimate of var(x), its expectation being less than
var(x), it is not used. Instead it is multiplied by N/(N−1) to correct the
bias, giving the usual estimate, (2.6.2), with N−1 rather than N in the
denominator.
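This bias is easily seen in a simulation (pure Python; samples of size N = 5 from a standard normal, so var(x) = 1 and the average of s² should be close to (N−1)/N = 0·8):

```python
import random

# Average the N-divisor variance estimate s**2 over many samples of size 5.
random.seed(2)
N, repeats = 5, 100_000

total = 0.0
for _ in range(repeats):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]
    xbar = sum(xs) / N
    total += sum((x - xbar)**2 for x in xs) / N     # biased estimate s**2

mean_s2 = total / repeats
print(mean_s2)   # close to 0.8, not 1.0
```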
A1.4. Expectation and variance with two random variables. The
sum of a variable number of random variables†
In dealing with a function of two random variables, a proper procedure is to
average over one of them holding the other fixed, and then to average over the
other. This is rather like averaging the rows in a square table, and then averaging
the row averages to find the grand average. The proof will be outlined for the
case of the sum of a randomly variable number of random variables, but the result
((A1.4.11) and (A1.4.12)) is general. The result is used and illustrated in § 3.6.
Relevant information will be found, for example, in Mood and Graybill (1963,
p. 117) and Bailey (1964).
It will be necessary, as on pp. 68 and 388, to distinguish between random
variables denoted s̃, m̃, etc., and particular values that these variables may
take, denoted s, m, etc.
Suppose we are interested, as in § 3.6, in the sum, itself a random variable
denoted S_m, of m values of s, where m and s are random variables, i.e.

    S_m = s₁+s₂+ … +s_m.        (A1.4.1)

The population means and variances of the variables will be denoted, for brevity,

    E(m̃) ≡ μ_m,   var(m̃) ≡ σ_m².        (A1.4.2)
    E(s̃) ≡ μ_s,   var(s̃) ≡ σ_s².        (A1.4.3)

We shall deal only with the case where the s values are independent, and each
S value is made up of a random sample (of variable size) from the population of
s values. It is assumed in (A1.4.3) that the s values all have the same mean and
variance, for example, that they are from a single population.
The probability that S_m is equal to or less than a specified value S, i.e. the
distribution function of the sum (see § 4.1), can be written, looking at each possible
value of m separately, as

    P[S_m ≤ S] = P[(m = 1 and S₁ ≤ S) or (m = 2 and S₂ ≤ S) or …].        (A1.4.4)

† I am very grateful to R. Galbraith, Department of Statistics, University College
London, for showing me how to obtain the results in this section.

The events in parentheses are mutually exclusive so, using the addition rule
(2.4.2), this becomes a sum (over all possible m values), viz.

    Σ_m P[m̃ = m and S_m ≤ S].        (A1.4.5)

Now, using the rule in its general form, (2.4.4) shows that
P[m̃ = m and S_m ≤ S] can be written in terms of conditional probabilities
as P[S_m ≤ S|m̃ = m].P[m̃ = m], and so (A1.4.5) becomes

    Σ_m P[S_m ≤ S|m̃ = m].P[m̃ = m]        (A1.4.6)

which can be written

    F(S) = Σ_m F(S|m).P[m̃ = m]        (A1.4.7)

and differentiating this with respect to S gives, as in (4.1.5), the probability
density function as

    f(S) = Σ_m f(S|m).P[m̃ = m].        (A1.4.8)

To find the expectation of a function of the sum, g(S_m) say, now simply use
the definition of expectation, (A1.1.17):

    E[g(S_m)] = ∫ g(S) f(S) dS

    = ∫ g(S).Σ_m f(S|m).P[m̃ = m].dS

    = Σ_m {∫ g(S) f(S|m) dS}.P[m̃ = m]        (A1.4.9)

    = Σ_m {E_S[g(S_m)|m̃ = m]}.P[m̃ = m].        (A1.4.10)

The last step follows because the term in curly brackets in (A1.4.9) is simply the
expectation of g(S_m) when m has a fixed value, m. The value of this
will of course depend on (be a function of) the value of m, so (A1.4.10) has the
form Σ_m (function of m).P[m̃ = m], just like (A1.1.16). This means that the
term in curly brackets is being averaged over all m values and (A1.4.10) can
therefore be written

    E[g(S_m)] = E_m{E_S[g(S_m)|m̃ = m]}        (A1.4.11)

which describes in symbols the two-stage averaging process mentioned at the
beginning of the section. The result is much more general than it appears from
this derivation. If x and y are any two random variables, continuous or discontinuous,
then, as in (A1.4.11), we have

    E[g(x,y)] = E_y{E_x[g(x,y)|y]}.        (A1.4.12)

The mean value of the sum, E[S_m], follows directly if the function
g(S_m) is simply identified with S_m. Averaging the sum for a fixed value of m,
using the definitions in (A1.4.1)-(A1.4.3), gives the term in curly brackets in
(A1.4.11) as

    E[S_m|m̃ = m] = mμ_s        (A1.4.13)

i.e. the average value of the total of a fixed number, m, of values of s is m times
the average value of s, fairly obviously. Actually, this step is not quite as obvious
as it looks. Written out in full we have

    E[S_m|m̃ = m] = E[(s₁+s₂+ … +s_m)|m̃ = m]

    = E[s₁|m̃ = m]+E[s₂|m̃ = m]+ … +E[s_m|m̃ = m]        (A1.4.14)

and only if m is independent of the s values, i.e. if the size of the s values does not
depend on whether m is large or small, can this be written

    = E[s₁]+E[s₂]+ … +E[s_m]        (A1.4.15)

and if all the s values have the same mean, μ_s, as assumed, this is simply mμ_s
as stated in (A1.4.13).
Having found this for a fixed value of m, we now do the second stage of
averaging, over m values, treating m as a random variable though μ_s is, of course,
a constant. This gives, using (A1.4.11) with (A1.4.13) and (A1.4.2),

    E[S_m] = E_m[mμ_s]
    = μ_s E_m[m] = μ_s μ_m,        (A1.4.16)

which is just what would be expected for the average value of the sum of m
values of s.
To find the variance of S_m we use the definition (A1.2.2) which is

    var(S_m) = E[S_m²]−(E[S_m])².        (A1.4.17)

The only thing needed now is to find the expectation of S_m². To do this we
use (A1.4.11) again, but this time g(S_m) is identified with S_m². So first we want to
find the term in curly brackets in (A1.4.11), the expectation of S_m² when m has a
fixed value, m. This we find by rearranging the definition of variance (A1.2.2)
to give the general relation

    E[y²] = var(y)+(E[y])² ≡ σ²+μ².        (A1.4.18)

The term in curly brackets is therefore

    E[S_m²|m̃ = m] = var(S_m|m̃ = m)+(E[S_m|m̃ = m])²

    = mσ_s²+m²μ_s²        (A1.4.19)

the first term being the variance of the sum of m independent variables, from
(2.7.4), and the second term following from (A1.4.13) above. (This step again
assumes that the s values are independent of m, as in (A1.4.15).) Now we average
over m values, i.e. we now treat m as a random variable though σ_s² and μ_s are
constants of course. Thus (A1.4.11) gives, using (A1.4.19) and (A1.4.2),

    E[S_m²] = E_m[mσ_s²+m²μ_s²]
    = σ_s² E_m[m]+μ_s² E_m[m²]
    = σ_s² μ_m+μ_s²(σ_m²+μ_m²),        (A1.4.20)

the last line following by the use of (A1.4.18) to find E[m²].
The variance of S_m can now be found by substituting (A1.4.16) and (A1.4.20)
into (A1.4.17) giving the required result

    var(S_m) = σ_s² μ_m+μ_s²(σ_m²+μ_m²)−μ_s² μ_m²
    = σ_s² μ_m+σ_m² μ_s².        (A1.4.21)
Using the coefficients of variation defined in (2.6.4), i.e.

    C(s̃) = σ_s/μ_s,   C(m̃) = σ_m/μ_m,   and   C(S_m) = √[var(S_m)]/E[S_m],

we get, using (A1.4.21) and (A1.4.16),

    C(S_m)² = C(s̃)²/μ_m + C(m̃)².        (A1.4.22)

An illustration of the use of this result is given in § 3.6 (p. 69).
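The result (A1.4.21) can also be checked by simulation (pure Python; m is drawn uniformly from 1 to 7, so μ_m = 4 and σ_m² = 4, and each s from a normal with μ_s = 2 and σ_s² = 0·25; all values invented):

```python
import random

# S is the sum of m values of s, with m itself random and independent of
# the s values. Expected results: E[S] = mu_s*mu_m = 8 and
# var(S) = var_s*mu_m + var_m*mu_s**2 = 0.25*4 + 4*4 = 17.
random.seed(3)
mu_s, sd_s = 2.0, 0.5

sums = []
for _ in range(100_000):
    m = random.randint(1, 7)          # mu_m = 4, var_m = (7**2 - 1)/12 = 4
    sums.append(sum(random.gauss(mu_s, sd_s) for _ in range(m)))

mean_S = sum(sums) / len(sums)
var_S = sum((S - mean_S)**2 for S in sums) / len(sums)
print(mean_S, var_S)
```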


Appendix 2
Stochastic (or random) processes
Some basic results and an attempt to explain the unexpected properties of random
processes

The Science of the age, in short, is physical, chemical, physiological; in all
shapes mechanical. Our favourite Mathematics, the highly prized exponent of all
these other sciences, has also become more and more mechanical. Excellence in
what is called its higher departments depends less on natural genius than on
acquired expertness in wielding its machinery. Without under-valuing the
wonderful results which a Lagrange or Laplace educes by means of it, we may
remark, that their calculus, differential and integral, is little else than a more
cunningly-constructed arithmetical mill; where the factors being put in, are, as it
were, ground into the true product, under cover, and without other effort on our
part than steady turning of the handle.
THOMAS CARLYLE 1829
(Signs of the Times, Edinburgh Review, No. 98).

THE following discussions require more calculus than is needed to follow
the main body of the book so they have been confined to an appendix to
avoid scaring the faint-hearted. However, the principles involved are the
important thing, so do not worry if you cannot see, for example, how an
integral is evaluated. That is merely a technical matter that can always be
cleared up if it becomes necessary.
A2.1. Scope of the stochastic approach
In many cases the probabilistic approach is necessary for, or at least is
enlightening in, the description of processes that are variable by their nature
rather than because of experimental error. This approach might, for example,
involve consideration of (1) the probability of birth and death in the study of
populations, (2) the probability of becoming ill in the study of epidemics,
(3) the probability that a queue (e.g. for hospital appointments) will have a
particular length and that the waiting time before a queue member is served
has a particular value, (4) the random movement of a molecule undergoing
Brownian motion in the study of diffusion processes, and (5) the probability
that a molecule will undergo a chemical reaction within a specified period of
time (see examples in §§ A2.3 and A2.4).
The appendix will deal with aspects of only one particular stochastic
process, the Poisson process, which has already been discussed in Chapters
3 and 5. It is a characteristic of this process that events occurring in non-
overlapping intervals of time are quite independent of each other. The same

idea can also be expressed by saying, at the risk of being anthropomorphic,
that the process has 'no memory' and therefore is unaffected by what has
happened in the past, or that the process 'does not age' (see also Cox
(1962, pp. 3-5 and 29)).
Examples of Poisson processes discussed in Chapters 3 and 5 were the
disintegration of radioactive atoms at random intervals and the random
occurrence of miniature end plate potentials (MEPP). Other examples are
(1) the random length of time that a molecule remains adsorbed on a mem-
brane before being desorbed (e.g. an atropine molecule on its cellular receptor
site, see § A2.4), and (2) the random length of time that elapses before a
drug molecule is broken down in the experiment described in § 12.6.
The lifetime of a molecule on its adsorption site (or of a drug molecule
in solution, or of a radioactive atom) is a random variable with the same
properties as the random intervals between MEPP (see § 5.1). In the case
of the adsorbed molecule, this implies that the complex formed between
molecule and adsorption site does not age, and the probability of the complex
breaking up in the next 5 seconds, say, is a constant and does not depend
on how long the molecule has already been adsorbed, just as the probability
of a penny coming down heads was supposed to be constant at each throw,
regardless of how many heads have already occurred, when discussing the
binomial distribution in Chapter 3. Consequently the Poisson distribution
can be derived from the binomial as explained in § 3.5. Another derivation
is given in § A2.2 below.
The arrival of buses would not, in general, be a Poisson process, although
it often seems pretty haphazard. The waiting time problem for randomly
arriving buses, discussed in § 5.2, is typical of the sort of result that is
usually surprising and puzzling to people who have not got used to the
properties of random processes. I certainly found it surprising and puzzling
until recently, and so I hope the reader will find the results presented below
as enlightening as I did.
For further reading on the subject see, for example, Cox (1962), Feller
(1957, 1966), Bailey (1964, 1967), Cox and Lewis (1966), and Brownlee
(1965, p. 190).

A2.2. A derivation of the Poisson distribution
As mentioned in § 3.5, the distribution follows directly from the condition
that events in non-overlapping intervals of time or space are independent of
each other, using the definition of independence discussed in § 2.4.
The probability of one event occurring in the time interval between t
and t+Δt can be defined as λΔt, if Δt is small enough. From the discussion
of the nature of the Poisson process in §§ 3.5 and A2.1, it follows that λ
must be a constant (i.e. it does not vary with time, and does not depend on
the past or present state of the system) that characterizes the rate of occurrence
of events. More properly, it should be said that the probability of
one occurrence in the infinitesimal time interval, dt, between t and t+dt
is constant and can be written λdt. This definition, plus the condition of
independence, is sufficient to define the Poisson distribution. If finite

time intervals, Δt, are considered then the probability of one event in the
interval between t and t+Δt should be written λΔt+o(Δt) (see (A2.2.9)).
Furthermore, the probability of more than one event occurring in the interval
Δt becomes negligible when the interval is very short, and so it is also
written o(Δt). The notation o(Δt), which is much used when discussing stochastic
processes, stands for any quantity that becomes negligible relative to Δt as the
interval length Δt becomes small (it does not stand for one particular
quantity, and may stand for different quantities even in the same expression).
More precisely, any quantity is written o(Δt) if it obeys the definition

    lim_{Δt→0} o(Δt)/Δt = 0        (A2.2.1)

so no approximation will be involved in the limit in ignoring o(Δt) terms.


lTImbubility that thene event8 between U%TId III thus,
uPdition rule I-probabiHt'H more
AIll-o(tll).
uH!%TI':::""111"'y that r b!!tween 0 (the timTITI ~ikeaaure-
Li%TI~rted) and I will be P(r, f), an extenTI'Ku%TI %TIutation
and Chapter 5, UTI'Y,'H notation, P(O,IU qer the
probability that zero events occur between 0 and t+ tll. For this to happen
there must be both
(zero events between 0 and f) and (zero events between t and t+Ill).
The probability of the first of these contingencies is P(O, t), and the prob-
abilit,Y of the second is, as above, l-Atll-O(tll). H the events in the non-
n:::",'!:::' nn!%! time intervals {mm f to t+tll are (ihis is
and very stro:rU, the probabililH con-
happen follun'Li multiplication Lind is
%%!%!" w,:::, of separate

P(O, t+ ll-Atll-O(Ill)l. (pq.2.2)


Rearranging this gives

    [P(0, t+Δt)−P(0, t)]/Δt = −λP(0, t)−P(0, t) o(Δt)/Δt.

Now, letting Δt → 0, the left-hand side becomes, by the definition of
differentiation, dP(0, t)/dt (see, for example, Massey and Kestelman (1964)),
and the second term on the right-hand side vanishes (from (A2.2.1)), so

    dP(0, t)/dt = −λP(0, t)        (A2.2.3)

and the solution of this differential equation is

    P(0, t) = e^−λt.        (A2.2.4)

This is found using the condition that P(0, 0) = e⁰ = 1 (i.e. it is certain that zero events will occur in zero time). The solution is easily checked by differentiating (A2.2.4), giving (A2.2.3) back again, thus dP(0, t)/dt = de^(−λt)/dt = −λP(0, t). Equation (A2.2.4) is just the probability of zero events occurring in time t given by the Poisson distribution (3.5.1), if λ is interpreted as the average number of events in unit time (see §§ 3.5 and 5.1, and eqn (A1.1.7)), so m = λt is the mean number of events in time t.
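The solution (A2.2.4) is easy to check by simulation. The sketch below (Python; the rate λ = 2·0, the period t = 0·8, and the seed are arbitrary illustrative choices, not values from the text) generates exponential first-waiting times and compares the observed fraction of runs with no event before t against e^(−λt).

```python
import math
import random

random.seed(0)

lam = 2.0        # event rate, events per unit time (illustrative)
t = 0.8          # length of the observation period
n_runs = 200_000

# "No event before t" means the first exponential gap exceeds t.
no_event = sum(random.expovariate(lam) > t for _ in range(n_runs))
p0_simulated = no_event / n_runs
p0_theory = math.exp(-lam * t)        # P(0, t) from (A2.2.4)

print(round(p0_simulated, 3), round(p0_theory, 3))
```

The two printed values agree to within sampling error.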
To find the Poisson distribution when r > 0 notice that r events will occur between 0 and t+Δt if either

[(r events occur between 0 and t) and (zero events occur between t and t+Δt)]

or

[(r−1 events occur between 0 and t) and (one event occurs between t and t+Δt)].

The probabilities of the four events in brackets have been defined as P(r, t), (1−λΔt−o(Δt)), P(r−1, t), and λΔt+o(Δt) respectively. Therefore, using the addition rule (2.4.2) and the multiplication rule for independent events (2.4.6), the probability of r events occurring between 0 and t+Δt becomes

P(r, t+Δt) = P(r, t).[1−λΔt−o(Δt)]+P(r−1, t).[λΔt+o(Δt)].    (A2.2.5)

Rearranging this gives

[P(r, t+Δt)−P(r, t)]/Δt = −λP(r, t)+λP(r−1, t)+[o(Δt)/Δt].[P(r−1, t)−P(r, t)].    (A2.2.6)

Again letting Δt→0 gives, using (A2.2.1) as above,

dP(r, t)/dt = −λP(r, t)+λP(r−1, t).    (A2.2.7)

This holds for any r greater than 0, so putting r = 1 gives an equation for P(1, t), the probability of r = 1 event occurring in a time interval of length t. Inserting the value of P(r−1, t) = P(0, t) = e^(−λt) from (A2.2.4) into (A2.2.7) results in an equation that can be solved giving P(1, t) = (λt)e^(−λt), which is the Poisson probability for r = 1 defined in eqn (3.5.1) and § 5.2. This can be inserted into (A2.2.7) with r = 2 to find P(2, t), the next term of the Poisson series. Alternatively, simply notice that the probability of r events in a time interval of length t, the solution of (A2.2.7) for any value of r (greater than 0), is

P(r, t) = (λt)^r e^(−λt)/r!    (A2.2.8)

Digitized by Google
which is the Poisson distribution defined in (3.5.1) (see also § 5.1), because it has been shown in (A2.2.4) that (A2.2.8) does actually hold for r = 0 as well. This solution is easily checked by differentiating (A2.2.8), giving

dP(r, t)/dt = d[(λt)^r e^(−λt)/r!]/dt = [rλ^r t^(r−1)/r!]e^(−λt)−λ[(λt)^r/r!]e^(−λt)

= λ[(λt)^(r−1)/(r−1)!]e^(−λt)−λ[(λt)^r/r!]e^(−λt)

= λP(r−1, t)−λP(r, t).

Thus (A2.2.8) is a solution of (A2.2.7).
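The whole distribution (A2.2.8) can be checked the same way: a hypothetical Poisson process is built from exponential gaps, the events falling in (0, t) are counted, and the observed frequencies of r = 0, 1, 2, ... are compared with (λt)^r e^(−λt)/r!. All parameter values below are illustrative.

```python
import math
import random
from collections import Counter

random.seed(1)

lam, t, n_runs = 3.0, 1.0, 100_000   # illustrative values

def count_events(rate, period):
    """Count events in (0, period) by accumulating exponential gaps."""
    n, elapsed = 0, random.expovariate(rate)
    while elapsed <= period:
        n += 1
        elapsed += random.expovariate(rate)
    return n

counts = Counter(count_events(lam, t) for _ in range(n_runs))

def poisson(r, mean):
    # eqn (A2.2.8) with mean = lam * t
    return mean ** r * math.exp(-mean) / math.factorial(r)

for r in range(5):
    print(r, round(counts[r] / n_runs, 3), round(poisson(r, lam * t), 3))
```

Each observed relative frequency falls close to the corresponding Poisson probability.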
Why the remainder terms can be neglected

Having derived the Poisson distribution, the remainder terms, which were written o(Δt) above, can be written explicitly, so it can be seen that they do in fact become negligible relative to Δt when Δt→0, as stated in (A2.2.1).

The probability of r = 1 event occurring in the interval Δt is found by putting r = 1 in (A2.2.8). The exponential is then expanded in series (as in (3.5.2)) giving

λΔt e^(−λΔt) = λΔt[1−λΔt+(λΔt)²/2!−(λΔt)³/3!+...]

= λΔt−(λΔt)²+(λΔt)³/2!−...

= λΔt+o(Δt)    (A2.2.9)

as stated at the beginning of this section. All the terms but the first on the penultimate line can be written as o(Δt) because they obey the definition (A2.2.1), thus

lim(Δt→0) [o(Δt)/Δt] = lim(Δt→0) [−λ²Δt+λ³(Δt)²/2!−...] = 0    (A2.2.10)

because every term is zero when Δt becomes zero.

The probability that more than one event (r > 1) occurs in Δt is, from (A2.2.8),

(λΔt)^r e^(−λΔt)/r!

and for all r > 1 this can also be written o(Δt). For example, for r = 2 we have, using the definition (A2.2.1),

lim(Δt→0) [o(Δt)/Δt] = lim(Δt→0) [(λΔt)²/2! . e^(−λΔt)/Δt]

= lim(Δt→0) [(λ²Δt/2!) e^(−λΔt)]

= 0    (A2.2.11)

as stated at the beginning of this section.
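The behaviour of the remainder terms can also be seen numerically: as Δt shrinks, both [λΔt−P(1, Δt)]/Δt and P(2, Δt)/Δt tend to zero, as (A2.2.10) and (A2.2.11) assert. (The value of λ is an arbitrary illustration.)

```python
import math

lam = 5.0                            # illustrative rate constant
for dt in (0.1, 0.01, 0.001, 0.0001):
    p1 = lam * dt * math.exp(-lam * dt)              # P(1, dt) from (A2.2.8)
    p2 = (lam * dt) ** 2 * math.exp(-lam * dt) / 2   # P(2, dt)
    # both ratios shrink towards zero as dt -> 0, as in (A2.2.10)-(A2.2.11)
    print(dt, round((lam * dt - p1) / dt, 6), round(p2 / dt, 6))
```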

A2.3. The connection between the lifetimes of individual adrenaline molecules and the observed breakdown rate and half-life of adrenaline

In the experiment analysed in § 12.6 it was found that when adrenaline was incubated with liver slices in vitro the concentration of adrenaline fell exponentially or, to be more precise, there was no evidence that the relationship was not exponential. The estimated rate constant was k = 0·07219 min⁻¹ (from (12.6.14)), i.e. the estimated time constant was 1/k = 13·85 min (from (12.6.16)), and the estimated half-life, from (12.6.6), was 0·69315/k = 9·602 min. The arguments in this section apply equally to the disintegration of radioisotopes since the number of radioactive nuclei is observed to fall with an exponential time course. One then considers the lifetimes of individual unstable nuclei.

Focus attention on single adrenaline molecules. Suppose that they are perfectly stable until, at zero time, the adrenaline solution is added to the liver preparation that contains enzymes catalysing its catabolism. Suppose that after the addition of enzymes at t = 0, there is a constant probability,† λΔt+o(Δt) say, that any individual adrenaline molecule will be catabolized in any short interval of time Δt. As before, λ is a constant (it does not vary with time) that characterizes the rate of catabolism. The probability that the molecule will not be catabolized, from (2.4.3), is therefore 1−λΔt−o(Δt). The argument is now exactly like that in § A2.2. Denote as P(t) the probability that the molecule is still intact at time t. The molecule will still be intact at time t+Δt if

(it is still intact at time t) and (it is not catabolized between t and t+Δt).

If these events are independent, then the multiplication rule of probability, (2.4.6), implies

P(t+Δt) = P(t).[1−λΔt−o(Δt)].    (A2.3.1)

This is like eqn (A2.2.2). Rearranging gives

[P(t+Δt)−P(t)]/Δt = −λP(t)−P(t).o(Δt)/Δt

and, using (A2.2.1) just as in § A2.2, when Δt→0 this becomes dP(t)/dt = −λP(t) (see, for example, Massey and Kestelman (1964, p. 59)). The solution (using the condition that P(0) = 1, i.e. it is certain that the molecule is still intact at zero time) is, as in § A2.2,

P(t) = e^(−λt).    (A2.3.2)

Now in a large population of molecules the probability that a molecule will be still intact at time t can be identified with the proportion of molecules

† See § A2.2. A fuller explanation of the nature of the term o(Δt), which becomes negligible for short enough time intervals, is given in § A2.5.

that are still intact at time t, i.e. y/y₀ where y is the concentration of adrenaline at time t, and y₀ is the initial concentration. Equation (A2.3.2) is now seen to be identical with the observed exponential decline of concentration (eqn (12.6.4)) if the rate constant, k, is identified with λ.

Furthermore, the probability that a molecule is still intact at time t, given by (A2.3.2), can be identified, just as in § 5.1, with the probability that a molecule has a lifetime greater than t (if it did not it would not still be intact). The probability that the lifetime is equal to or less than t is therefore, from the addition rule (2.4.3), and (A2.3.2),

1−P(t) = 1−e^(−λt) = F(t),    (A2.3.3)

which is exactly like (5.1.4) (the distribution function, F, was defined in (4.1.4)). This is consistent with (see § 5.1) the hypothesis that lifetimes of individual adrenaline molecules are random variables following the exponential distribution (see Figs. 5.1.1 and 5.1.2), with probability density (from (4.1.5) and (A2.3.3))

f(t) = dF(t)/dt = λe^(−λt)    (t ≥ 0)    (A2.3.4)

as previously defined ((5.1.3) and (A1.1.10)). In other words, the mean lifetime of molecules is λ⁻¹ (as explained in § 5.1 and proved in (A1.1.11)).

Referring again to the example in § 12.6, it can now be seen that the time constant for the observed exponential fall in adrenaline concentration, k⁻¹ = λ⁻¹ = 13·85 min (from (12.6.16)), can be interpreted as the mean value of the lifetimes of individual adrenaline molecules (measured from the time of addition of enzyme at t = 0, or, as shown in the following sections of this appendix, from any other arbitrary time). It follows from the arguments in § A2.6 that if adrenaline molecules were being synthesized in the system, their mean lifetime measured from the moment of synthesis to the moment of catabolism would also be λ⁻¹ = 13·85 min.

Furthermore, the half-time for the observed decay of concentration, 0·69315/k = 9·602 min (from (12.6.6) and (12.6.17)), can be interpreted as the median value of the lifetimes of individual adrenaline molecules, because it was shown in (A1.1.14) that the population median of the exponential distribution is 0·69315/λ. Fifty per cent of molecules survive longer than 9·602 min.
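Using the estimate k = 0·07219 min⁻¹ from § 12.6 as λ, the interpretation of the time constant as a mean lifetime and of the half-life as a median lifetime can be checked by simulating exponential lifetimes (Python sketch; the sample size and seed are arbitrary):

```python
import math
import random
import statistics

random.seed(2)

k = 0.07219                      # estimated rate constant, min^-1 (§ 12.6)
n = 100_000                      # sample size (illustrative)
lifetimes = [random.expovariate(k) for _ in range(n)]

mean_life = statistics.mean(lifetimes)      # close to 1/k = 13.85 min
median_life = statistics.median(lifetimes)  # close to 0.69315/k = 9.60 min
print(round(mean_life, 2), round(1 / k, 2))
print(round(median_life, 2), round(math.log(2) / k, 2))
```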

A2.4. A stochastic view of the adsorption of molecules from solution
Suppose that a surface (e.g. cell membrane) containing many identical
and independent adsorption sites is immersed in a solution and is continually
bombarded with solute molecules. Some of these will become adsorbed on to
adsorption sites, remain on the sites for a time, and then desorb back into
the solution. Macroscopic observations of the amount of material adsorbed
can be related to what happens to individual molecules using the same sort
of approach as in §§ A2.2 and A2.3. This is, for example, the simplest model

for the interaction of drug molecules with cell receptor sites and, as such, it is discussed by Rang and Colquhoun (1973).

Consider a single site. The probability that a site is occupied by an adsorbed molecule at time t will be denoted P₁(t), and the probability that the site is empty at time t will be denoted P₀(t). Thus, from (2.4.3),

P₀(t) = 1−P₁(t).    (A2.4.1)

The probability that an empty site will become occupied would be expected to be proportional to the rate at which solute molecules are bombarding the surface, i.e. to the concentration, c say, of the solute (assumed constant). The probability that an empty site will become occupied during the short interval of time Δt, between t and t+Δt, will therefore be written λcΔt where λ, as in §§ A2.2 and A2.3, is a constant (i.e. does not change with time). The probability that an occupied site becomes empty during the interval Δt will not depend on the concentration of solute, and so will be written μΔt, where μ is another constant. The probability that an occupied site does not become empty during Δt is therefore, from (2.4.3), 1−μΔt.†

Now a site will be occupied at time t+Δt if either [(site was empty at time t) and (site is occupied during interval between t and t+Δt)] or [(site was occupied at time t) and (site does not become empty between t and t+Δt)]. Now the probabilities of the four events in parentheses have been defined as P₀(t), λcΔt, P₁(t), and (1−μΔt) respectively. So, by application of the addition rule (2.4.2), and the multiplication rule (2.4.6) (assuming, as in §§ A2.2 and A2.3, that the events happening in the non-overlapping intervals of time, from 0 to t and from t to t+Δt, are independent), it follows that the probability that a site will be occupied at time t+Δt will be

P₁(t+Δt) = P₀(t).λcΔt+P₁(t).(1−μΔt)+o(Δt),    (A2.4.2)

where o(Δt) is a remainder term that includes the probability of several transitions between occupied and empty states during Δt. As in §§ A2.2 and A2.3, o(Δt) becomes negligible when Δt is made very small. Rearranging (A2.4.2) gives

[P₁(t+Δt)−P₁(t)]/Δt = λcP₀(t)−μP₁(t)+o(Δt)/Δt.

Now let Δt→0. As before the left-hand side becomes, by definition of differentiation (e.g. Massey and Kestelman (1964, p. 59)), dP₁/dt, so, using (A2.4.1) and (A2.2.1),

dP₁(t)/dt = λc[1−P₁(t)]−μP₁(t).    (A2.4.3)

† The probabilities should really be written λcΔt+o(Δt), μΔt+o(Δt), and 1−μΔt−o(Δt), as in §§ A2.2 and A2.3, if the time interval, Δt, is finite. Alternatively, it could be said, as in § A2.2, that the probability that an occupied site becomes empty during the infinitesimal interval between t and t+dt can be written μdt, etc. A fuller discussion of the nature of the o(Δt) terms is given in § A2.5. All these terms have been gathered together and written as o(Δt) in (A2.4.2), which holds for finite time intervals.
If P₁(t), the probability that an individual site is occupied at time t, is interpreted as the proportion of a large population of sites that is occupied at time t, then (A2.4.3) is exactly the same as the equation arrived at by a conventional deterministic approach through the law of mass action, if λ and μ are identified with the mass action adsorption and desorption rate constants. Thus λ/μ is the law of mass action affinity constant for the solute molecule–site reaction. The derivation and solution of (A2.4.3), and its experimental verification in pharmacology, is discussed by Rang and Colquhoun (1973).

The length of time for which an adsorption site is occupied; its distribution and mean

In order to investigate the length of time for which a molecule remains adsorbed, consider the special case of (A2.4.3) with λc = 0. The probability of an adsorbed molecule desorbing does not depend on the probability, λcΔt, that an empty site will be filled, or on the concentration of solute, so this does not spoil the generality of the argument. For example, at t = 0 the surface, with a certain number of adsorbed molecules, might be transferred to a solute-free medium (i.e. c = 0) so that adsorbed molecules are gradually desorbed, but no further molecules can be adsorbed, so that a site that becomes empty remains empty. When c = 0, (A2.4.3) becomes

dP₁(t)/dt = −μP₁(t).    (A2.4.4)

This equation has already been encountered in §§ A2.2 and A2.3. Integration gives the probability that a site will be occupied, at time t after transfer to solute-free medium, as

P₁(t) = P₁(0)e^(−μt),    (A2.4.5)

where P₁(0) is the probability that a site will be occupied at the moment of transfer (t = 0). In other words, the proportion of sites occupied, and therefore the amount of solute adsorbed, would be expected to fall exponentially with rate constant μ. Such exponential desorption has, in some cases, been observed experimentally.
Now if the total number of adsorption sites is N_tot, then the number of sites occupied at time t will be N(t) = N_tot P₁(t), and the number occupied at t = 0 will be N(0) = N_tot P₁(0). The proportion of initially occupied sites that are still occupied after time t will, from (A2.4.5), be

N(t)/N(0) = P₁(t)/P₁(0) = e^(−μt),    (A2.4.6)

and this will also be the probability that an individual site, that was occupied at t = 0, will still be occupied after time t.†

A site will only be occupied after time t if the length of time for which the molecule remains adsorbed (its lifetime) is greater than t, so (A2.4.6) is the probability

† i.e. has been continuously occupied between 0 and t.

that the lifetime of an adsorbed molecule is longer than t. Analogous situations were met in §§ 5.1 and A2.3. The probability that the lifetime of an adsorbed molecule is t or less is therefore, from (2.4.3),

P(0 ≤ lifetime ≤ t) ≡ F(t) = 1−e^(−μt).    (A2.4.7)

This is exactly like (5.1.4) and (A2.3.3), and is consistent with (see § 5.1) the hypothesis, implied by the physical model of identical and independent adsorption sites, that the lifetime of individual adsorbed molecules is an exponentially distributed variable, with probability density as before, from (4.1.5),

f(t) = dF(t)/dt = μe^(−μt).    (A2.4.8)

The mean lifetime of a molecule on an adsorption site is therefore μ⁻¹ (from (A1.1.11)), the observed time constant (see (12.6.4)) for desorption of adsorbed molecules into a solute-free medium; and, just as in § A2.3, the observed half-time for desorption, 0·69315/μ (from (12.6.6)), can be interpreted, using (A1.1.14), as the median lifetime of a molecule on an adsorption site. Fifty per cent of molecules stick for a longer time than 0·69315/μ.
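A simulation illustrates (A2.4.6) and the median result (the value of μ, the time t, and the sample size are arbitrary illustrative choices): the fraction of simulated lifetimes exceeding t matches e^(−μt), and about half exceed 0·69315/μ.

```python
import math
import random

random.seed(3)

mu = 0.5                         # desorption rate constant (illustrative)
n = 100_000
lifetimes = [random.expovariate(mu) for _ in range(n)]

t = 1.0
still_adsorbed = sum(life > t for life in lifetimes) / n
print(round(still_adsorbed, 3), round(math.exp(-mu * t), 3))   # eqn (A2.4.6)

half_time = math.log(2) / mu     # 0.69315/mu
longer = sum(life > half_time for life in lifetimes) / n
print(round(longer, 3))          # close to 0.5
```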
What is meant by lifetime? In the discussion above, the lifetime of an adsorbed molecule was measured from the arbitrary instant (t = 0) when the surface was transferred to solute-free medium until the instant when the molecule desorbed. The average length of this residual lifetime (see § A2.7) was μ⁻¹. It is of more fundamental interest to know the average length of time a molecule remains adsorbed, i.e. the lifetime measured from the instant of adsorption to the instant of desorption. The mean length of this lifetime is also μ⁻¹, as implied in § 5.1. It might be expected that, because the adsorbed molecules have already been adsorbed for some time at the time that the surface is transferred to the solute-free medium, the lifetime measured from moment of adsorption to moment of desorption would be longer than μ⁻¹ (see Fig. A2.7.3). This cannot be, because of the 'lack of memory' or 'lack of ageing' of the Poisson process. It is nevertheless surprising to most people, in just the same way as the analogous bus-waiting time 'paradox' described in § 5.2 is, at first sight, surprising.

If the mean interval between bus arrivals (supposed random) is 10 min then the waiting time from an arbitrary moment until the next bus was stated in § 5.2 to be 10 min also, just as the waiting time from an arbitrary moment until desorption (residual lifetime) is the same (μ⁻¹) as the mean time between adsorption and desorption (lifetime). In words, the reason for this is that if one looks at the surface at an arbitrary moment of time† it is more likely that it will contain long-lived molecule–adsorption site complexes than short-lived ones which, because they exist for only a short time, do not stand such a good chance of being in existence at any specified arbitrary moment. Similarly, in § 5.2, it is more probable that a person will arrive at

† An arbitrary moment of time means a time chosen by any method at all as long as it is independent of the occurrence of events, i.e. independent of the times when molecules move on and off adsorption sites in this case.

the bus stop during a long interval than a short one. In fact, the mean lifetime (from the moment of adsorption to the moment of desorption) of molecules present at an arbitrary moment (such as the moment when the surface with its adsorbed molecules is transferred to solute-free medium) is exactly twice the mean lifetime of all molecules, i.e. it is 2μ⁻¹, so the average residual waiting time until desorption is μ⁻¹ as stated, and the mean length of time that a molecule has already been adsorbed at the arbitrary moment is also μ⁻¹ (cf. § 5.2). These statements are further discussed and proved in §§ A2.6 and A2.7.
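The bus 'paradox' can be demonstrated directly. The sketch below (all parameters illustrative) builds a long stream of random arrivals with mean interval 1/λ = 10 min, drops arbitrary moments into it, and measures both the interval each moment falls in (mean near 2/λ) and the residual wait until the next arrival (mean near 1/λ):

```python
import bisect
import random
import statistics

random.seed(4)

lam = 0.1            # arrival rate per minute: mean interval 10 min
horizon = 1_000_000.0

# One long stream of Poisson arrivals (exponential gaps).
arrivals, t = [], 0.0
while t < horizon:
    t += random.expovariate(lam)
    arrivals.append(t)

containing, residual = [], []
for _ in range(20_000):
    u = random.uniform(0.0, horizon * 0.99)   # an arbitrary moment
    i = bisect.bisect_right(arrivals, u)      # index of next arrival after u
    prev = arrivals[i - 1] if i > 0 else 0.0
    containing.append(arrivals[i] - prev)     # interval straddling u
    residual.append(arrivals[i] - u)          # waiting time until next bus

print(round(statistics.mean(containing), 1))  # close to 2/lam = 20 min
print(round(statistics.mean(residual), 1))    # close to 1/lam = 10 min
```

Long intervals are more likely to contain an arbitrary moment, which is exactly the length-biased sampling discussed in § A2.7.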

The length of time for which an adsorption site is empty

The argument follows exactly the same lines as that presented above for the average length of time for which a site is occupied. As above, it is convenient to consider the special case when the combination of molecule with adsorption site is irreversible so that once occupied a site remains occupied, i.e. μ = 0 (the probability of an empty site becoming occupied does not depend on μ, so this does not spoil the generality of the argument). In this case, because it follows from (A2.4.1) that dP₀/dt = −dP₁/dt, equation (A2.4.3) becomes

dP₀(t)/dt = −λcP₀(t),    (A2.4.9)

which has exactly the same form as (A2.4.4). Using the same arguments as above, it follows that the length of time for which a site remains empty is an exponentially distributed random variable with a mean length of (λc)⁻¹. The mean length is inversely proportional to the concentration of solute (c). As above, this is the lifetime measured either from an arbitrary moment, or from the time when the site was last vacated by a desorbing molecule.
Adsorption at equilibrium

After a long time (t→∞) equilibrium will be reached, i.e. the rate at which molecules are desorbed will be the same as the rate at which they are adsorbed. Therefore, the proportion of sites occupied, P₁, will be constant, i.e. dP₁/dt = 0. Equation (A2.4.3) gives

λc(1−P₁)−μP₁ = 0

from which it follows that, at equilibrium,

P₁ = λc/(λc+μ) = Kc/(Kc+1)    (A2.4.10)

if K = λ/μ, the law of mass action affinity constant. This equation is the hyperbola discussed in §§ 12.8 and 14.6. Now it has been shown that the mean length of time for which an individual site is occupied is μ⁻¹, and the mean length of time for which it is empty is (λc)⁻¹. These values hold whether or not equilibrium has been reached.† After transferring a membrane with empty sites to a solution containing a constant concentration, c, of solute, the empty sites will have to wait, on average, (λc)⁻¹ seconds before they become occupied, so equilibration will take time; see Rang and Colquhoun (1973). Using these values, it follows that

P₁ = (mean occupied time)/(mean occupied time+mean empty time),    (A2.4.11)

which can therefore be written

P₁ = 1/[1+(mean empty time/mean occupied time)].    (A2.4.12)

For example, if the probability that a site is occupied is P₁ = 0·5 (this will be independent of time at equilibrium), i.e. 50 per cent of sites, on the average, are occupied at any moment of time, it follows from (A2.4.12) that empty time = occupied time, i.e. any given site is occupied for 50 per cent of the time. This state is attained, at equilibrium, when (λc)⁻¹ = μ⁻¹, i.e. when the concentration c = μ/λ = 1/K (as follows directly from (A2.4.10)).
A2.5. The relation between the lifetime of individual radioisotope molecules and the interval between disintegrations

The examples of intervals between miniature end plate potentials (MEPP) (discussed in § 5.1) and between bus arrivals (in § 5.2) were straightforward in that there was in each case a single continuous stream of events. In the case of radioisotope disintegration (§§ 3.5–3.7), catabolism of adrenaline (§ A2.3), or adsorption of solute molecules (§ A2.4) the situation is not quite the same. For each isotopic atom there is one event, disintegration. Nevertheless intervals between buses have the same sort of properties as intervals defined by the lifetimes of isotope atoms (or adrenaline molecules or solute molecule–adsorption site complexes).

The mean lifetime of molecules, measured from any arbitrary time (see §§ A2.3, A2.6, and A2.7), may be many years. For example the half-life (the median lifetime, see § A2.3) of carbon-14 molecules is 5760 years, so the mean lifetime of a molecule is λ⁻¹ = 5760/0·69315 = 8310 years (from (A1.1.11) and (A1.1.14)), i.e., multiplying by the number of seconds in a Gregorian year, 8310×3·155695×10⁷ = 2·6224×10¹¹ s. This is obviously independent of the amount of ¹⁴C present.

† If the movements could be observed, the length of time for which a site was occupied could be measured, but this would obviously have had to be taken over a long time, relative to μ⁻¹, even if many sites, rather than just one, were observed. It can be shown that the time constant for equilibration of the sites is (λc+μ)⁻¹ (see, for example, Rang and Colquhoun (1973)), so in fact the average can only be given a frequency interpretation over a period that is long relative to the time taken to reach equilibrium.

However, in § 3.7 the Poisson distribution considered was that of the number of disintegrations per second (it will be supposed, for the sake of example, that the isotope involved was ¹⁴C). Because this variable is Poisson-distributed with mean, as in § 3.7, λ′ = 2089·5 disintegrations per second, the mean number of events in t = 1 second (assuming that the counter detects all disintegrations), it follows from the arguments in § 5.1 that the intervals between disintegrations are exponentially distributed with mean interval (λ′)⁻¹ = 1/2089·5 = 0·000478583 second (this obviously depends on the amount of ¹⁴C present). Compare this with the lifetimes of individual molecules, which are also exponentially distributed, with mean lifetime λ⁻¹ = 8310 years. These two exponential distributions are, as expected, closely related. This will now be shown.

The probability that any individual ¹⁴C atom disintegrates in an interval of time of length Δt, from the arguments in §§ 5.1 and A2.3, must be λΔt.‡ Suppose that at time t a sample of ¹⁴C contains N(t) undisintegrated ¹⁴C atoms. Define as an 'event' the disintegration of any of these atoms, i.e. if the atoms could be numbered, the disintegration of either atom number 1 or atom number 2 or ... or atom number N(t). The probability of this event occurring in an interval of time of length Δt is, from the addition rule (2.4.1),

λΔt+λΔt+...+λΔt−o(Δt) = N(t)λΔt−o(Δt),    (A2.5.1)

where o(Δt) is a remainder term (see (A2.2.1)) that includes all the probabilities of more than one disintegration occurring during Δt, which will be negligible when Δt is made very small. The argument now follows exactly the same lines as in § A2.3. Define the probability that no event occurs up to time t as P(t). No event will occur up to time t+Δt if (no event occurs up to t) and (no event occurs between t and t+Δt), and the probability of this is, from the multiplication rule (2.4.6),

P(t+Δt) = P(t).[1−N(t)λΔt+o(Δt)].    (A2.5.2)

Rearranging this and allowing Δt→0 gives, as in (A2.2.2) and (A2.3.1), dP(t)/dt = −N(t)λP(t). Now if the length of time considered is short enough for the decay of the radioisotope to be negligible (as assumed in § 3.7) then N(t) can be treated as a constant. It follows that the solution for P(t), using the condition that P(0) = 1 (i.e. it is certain that no events will occur in zero time), will be, as before,

P(t) = e^(−N(t)λt),    (A2.5.3)

just as (A2.3.2). This probability, that no disintegration will occur up to time t, can be identified with the probability that the interval between disintegrations is longer than t. Using the same arguments as in § A2.3 it follows that the interval between disintegrations is an exponentially distributed variable with a mean length, defined above as (λ′)⁻¹, of (N(t)λ)⁻¹, and the mean number of disintegrations per second is therefore

λ′ = N(t)λ,    (A2.5.4)

which decreases, as expected, as the total number of isotope molecules, N(t), decreases. The intervals will, of course, only be exponentially distributed, and the disintegration rate will only be Poisson distributed, over time intervals short enough for N(t) to be substantially constant. Using (A2.5.4) and the figures given above for the example in § 3.7 shows that the number of ¹⁴C atoms present at the time the sample was counted must have been

N(t) = λ′/λ = 2089·5 (atoms s⁻¹)×2·6224×10¹¹ (s) = 5·4795×10¹⁴ atoms.

Therefore the weight of ¹⁴C was

5·4795×10¹⁴/6·023×10²³ = 9·098×10⁻¹⁰ gramme molecules, or 9·098×10⁻¹⁰×14 = 1·274×10⁻⁸ g.

‡ This probability should really be written λΔt+o(Δt), if Δt is finite, as in §§ A2.2 and A2.3. The nature of the o(Δt) terms, and a more rigorous derivation of (A2.5.1), are discussed at the end of this section.
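The arithmetic of this example is easily reproduced (the figures are those quoted above):

```python
half_life_years = 5760.0
seconds_per_year = 3.155695e7                 # Gregorian year

lam = 0.69315 / (half_life_years * seconds_per_year)  # decay constant, s^-1
rate = 2089.5                                 # disintegrations per second

n_atoms = rate / lam                          # eqn (A2.5.4): lambda' = N * lambda
grams = n_atoms / 6.023e23 * 14               # gramme molecules times atomic weight

print(f"{n_atoms:.4e} atoms, {grams:.3e} g")
```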
A more careful look at the nature of the o(Δt) terms in processes like the catabolism of adrenaline, the decay of radioisotopes, and the adsorption of molecules

The basic Poisson process consists of a continuous stream of events, such as the occurrence of miniature end plate potentials (see § 5.1) or the random arrivals of buses at a bus stop. It was shown in § A2.2 that in this sort of process the probability of one event occurring in a finite time interval Δt can be written as λΔt+o(Δt). Obviously this probability cannot be written simply as λΔt because this would become indefinitely large, if long enough time intervals were considered, whereas all probabilities must be less than 1.

In processes like the catabolism of adrenaline, the decay of radioisotopes, or the adsorption of molecules, the situation is not quite the same. Each adrenaline molecule can only be destroyed once, so one cannot consider the probability of it being 'destroyed r times during Δt' as in § A2.2. Nevertheless it clearly will not do to say that the probability of catabolism (decay, adsorption, etc.) during Δt is λΔt, because, as above, this can be greater than 1. Suppose that this probability can be written λΔt+o(Δt). The argument in the first part of this section can now be made more rigorous.

The catabolism (decay, etc.) of different atoms during a finite time Δt are not mutually exclusive events, so the simple addition rule cannot be used. Instead, the binomial theorem should be used. In the language of §§ 3.2–3.4, let a 'trial' be the observing of a molecule during the time Δt, and let a 'success' be the occurrence of catabolism (decay, etc.) during this period. If, as above, there are N molecules present altogether, then the probability that one of them will be catabolized (decay, etc.) during Δt can be identified with the probability of r = 1 success occurring in N trials, and this is given by the binomial distribution, (3.4.8), as N𝒫(1−𝒫)^(N−1), where it has been supposed that the probability of success at each trial can be written 𝒫 = λΔt+o(Δt). This probability is the same at every trial, as discussed in § 3.5. Substituting it for 𝒫 in N𝒫(1−𝒫)^(N−1), and expanding the resulting expression (use the binomial expansion on (1−𝒫)^(N−1)), it is found that the required probability of one of the N molecules being catabolized during Δt can indeed be written

N𝒫(1−𝒫)^(N−1) = NλΔt+o(Δt)    (A2.5.5)

as asserted, on the basis of a simplified argument, in (A2.5.1).

This argument can now be turned upside down, starting from the experimental observations and working backwards. The decay of radioisotopes, and,
in some circumstances at least, the catabolism of molecules, and the desorption of adsorbed molecules, are observed to follow an exponential time course. In each case the implication is that the probability that a molecule is still intact at time t, 1−F(t), is e^(−λt). This is consistent, as described in earlier sections, with the physical model that specifies that the lifetime of individual molecules is an exponentially distributed variable with mean λ⁻¹. In the case of radioisotope decay, this can be confirmed experimentally by the observation that the number of disintegrations in unit time is Poisson distributed (over times during which N is substantially constant). Now, if the number of molecules catabolized, etc. during Δt is Poisson distributed, and the mean number of events during Δt is λ′Δt as above, then the probability that one molecule of the N present will be catabolized, etc. during Δt is given by the Poisson distribution, (3.5.1), with r = 1 and m = λ′Δt, i.e. it is λ′Δt e^(−λ′Δt). Substituting λ′ = Nλ from (A2.5.4), and expanding the exponential term exactly as in (A2.2.9), gives the probability of one of the N molecules being catabolized, etc. during Δt as

NλΔt e^(−NλΔt) = NλΔt+o(Δt),    (A2.5.6)

just as in (A2.5.5) and (A2.5.1). Now, according to the argument above, this can be equated with N𝒫(1−𝒫)^(N−1), where 𝒫 is the probability of any individual molecule being catabolized during Δt. The only two solutions of this equation for 𝒫 are 𝒫 = λΔt or 𝒫 = λΔt+o(Δt). The former will not do, as explained above, so the probability must be written λΔt+o(Δt) as asserted in §§ A2.3–A2.5.

A2.6. Why the waiting time until the next event does not depend on when the timing is started, for a Poisson process

The assertion that waiting time does not depend on when timing was started has been made repeatedly in Chapter 5 and this appendix. For example, the mean waiting time until a molecule is desorbed does not depend on the arbitrary time when the timing is started, and will be the same, λ⁻¹, as if the timing were started from the moment the molecule was adsorbed.

Suppose that the interval from one event to the next is exponentially distributed with mean λ⁻¹. It will be convenient, as at the end of § 4.1, to use T to stand for time measured from the last event, considered as a random variable, and t, t₀, etc., to stand for particular values of T. Suppose that a time t₀ is known to have elapsed from the last event. Given this fact, what is the probability that the time from t₀ until the next event (the residual lifetime) is less than any specified time t, i.e. what is the probability that T < t₀+t (event E₁ say) given that T > t₀ (event E₂ say)? In symbols this is P(E₁|E₂), i.e. from the definition of conditional probability (2.4.4),

P(T < t₀+t | T > t₀) = P(T < t₀+t and T > t₀)/P(T > t₀).    (A2.6.1)

Now the event that (T < t₀+t and T > t₀) is the same as the event t₀ < T < t₀+t and, because the intervals between events are being supposed
to follow the exponential distribution (5.1.3), f(t) = λe^−λt, with mean interval
between events = λ⁻¹, the probability of this is, as in (4.1.2),

    P(t₀ < T < t₀+t) = ∫_{t₀}^{t₀+t} λe^−λt dt
                     = [−e^−λt]_{t₀}^{t₀+t}
                     = e^−λt₀(1 − e^−λt).                           (A2.6.2)

The denominator of (A2.6.1) is (cf. (5.1.4)),

    P(T > t₀) = ∫_{t₀}^∞ λe^−λt dt = [−e^−λt]_{t₀}^∞ = e^−λt₀.      (A2.6.3)

Substituting (A2.6.2) and (A2.6.3) into (A2.6.1) gives the required conditional
distribution function (cf. (4.1.4)) for the residual lifetime, t (measured from
t₀ to the next event), as

    P(T < t₀+t | T > t₀) = e^−λt₀(1 − e^−λt)/e^−λt₀ = 1 − e^−λt,    (A2.6.4)

which is identical with the distribution function ((5.1.4) or (A2.3.3)) for
the intervals between events (measured from the last event to the next event).
Differentiating, as in (A2.3.4), gives the probability density for the residual
lifetime, t, as f(t) = λe^−λt, the exponential distribution with mean λ⁻¹
(from (A1.1.11)), exactly the same as the distribution of intervals between
events. The common-sense reason for this curious result has been discussed
in words in §§ 5.2 and A2.4, and is proved in § A2.7.
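The memoryless property derived above is easy to check numerically. The following sketch (not part of the original text; the rate and elapsed time are invented for the example) simulates exponentially distributed intervals and shows that, given an interval has already lasted longer than t₀, the remaining time still has mean λ⁻¹.

```python
import random

# A minimal numerical check of the result above (illustrative only, not
# from the book): for an exponential interval T with rate lam, the
# residual lifetime T - t0, given T > t0, is again exponential with
# mean 1/lam -- it does not depend on t0.
random.seed(1)
lam = 0.5            # rate lambda; mean interval = 1/lam = 2.0
t0 = 1.5             # time assumed to have elapsed since the last event

residuals = []
while len(residuals) < 100_000:
    t = random.expovariate(lam)   # one complete interval, T
    if t > t0:                    # condition on T > t0
        residuals.append(t - t0)  # residual lifetime

mean_residual = sum(residuals) / len(residuals)
print(mean_residual)  # close to 1/lam = 2.0, not to 2.0 - t0
```

The same run with any other value of t₀ gives the same mean, which is the point of the derivation.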

A2.7. Length-biased sampling. Why the average length of the
interval in which an arbitrary moment of time falls is
twice the average length of all intervals for a Poisson
process
In § 5.2 it was stated that if buses arrive randomly with an average interval
of 10 min then, if a person arrives at the bus stop at an arbitrary time, the
mean length of the interval in which he arrives is 20 min. Similarly, in
§ A2.4 it was asserted that the mean lifetime of the adsorbed molecule–
adsorption site complexes in existence at a specified arbitrary moment of
time was twice the average lifetime. In each case this was explained by
saying that a long interval has a better chance than a short one of including
the arbitrary moment, i.e. the interval lengths are not randomly sampled
by choosing one that includes an arbitrary time, just as rods of different
length would, doubtless, not be randomly sampled by picking a rod out of
a bag containing well-mixed rods. The long rods would stand a better chance

of being picked. Sampling of this sort is described as length-biased sampling (see,
for example, Cox 1962, p. 65).

The specifying of the arbitrary moment of time constitutes the choice of
an interval (the interval in which the time falls) from the population of
intervals between events. Imagine that intervals are repeatedly chosen in
this way. What will their average length be? First, the distribution of their
lengths must be found.

The distribution of intervals chosen by length-biased sampling


One difficulty in deriving the required result arises because it is necessary
to consider an infinite population of intervals. It will be much easier to
start off with a finite population. Imagine a finite set of N intervals, and call
the length of an interval (the ith interval) tᵢ. The total length of time occupied
by the intervals is thus Σ_{all i} tᵢ. The fraction of this total time occupied by the
ith interval will be

    tᵢ / Σ_{all i} tᵢ.                                              (A2.7.1)

If these fractions are added up for all intervals that are longer than some
specified length t, the result is the proportion of time occupied by intervals
longer than t:

    Σ_{tᵢ>t} tᵢ / Σ_{all i} tᵢ
        = (time occupied by intervals longer than t)/(total time)   (A2.7.2)

        = probability that a point chosen at random†
          falls in an interval longer than t                        (A2.7.3)

        = 1 − F₁(t)                                                 (A2.7.4)

if F₁(t) stands for the distribution function of intervals chosen by length-
biased sampling (defined as the proportion of intervals thus chosen with
length less than the specified value, t, so 1 − F₁(t) is the proportion with
length greater than t; see (4.1.4) and (5.1.4)).

The crucial step, the equating of (A2.7.2) and (A2.7.3), certainly looks
reasonable. Another way of looking at it is to suppose that the probability
of choosing any particular interval is directly proportional to its length,
tᵢ, so longer intervals are more likely to be chosen. The proportionality
constant must be chosen so that all the probabilities add up to 1, because it is
certain that one interval or another will be chosen. The proportionality
constant is therefore 1/Σ_{all i} tᵢ, giving

† i.e. a point chosen at random with the uniform (or rectangular) distribution over
the interval from 0 to Σ_{all i} tᵢ.

the probability of choosing an interval of length tᵢ as

    constant × tᵢ = tᵢ / Σ_{all i} tᵢ.                              (A2.7.5)

It follows, using the addition rule, (2.4.2), that the probability of choosing
an interval longer than t is found by adding these probabilities for all inter-
vals longer than t, giving

    probability of choosing an interval longer than t
        = Σ_{tᵢ>t} tᵢ / Σ_{all i} tᵢ,                               (A2.7.6)

which is exactly the same as found above, eqn (A2.7.3).


Now suppose that in the finite population of N intervals, some of the
intervals are of identical lengths. There are fᵢ intervals of length tᵢ, say
(so Σfᵢ = N). The time occupied by the fᵢ intervals of length tᵢ must be fᵢtᵢ,
and the total time occupied by all N intervals must be Σ_{all tᵢ} fᵢtᵢ. The proportion
of the total time occupied by intervals longer than a specified value, t,
by modification of (A2.7.2) (or (A2.7.6)), must now be written

    (time occupied by intervals longer than t)/(total time)
        = probability that an interval chosen by
          length-biased sampling is longer than t                   (A2.7.7)
        = 1 − F₁(t)
        = Σ_{tᵢ>t} fᵢtᵢ / Σ_{all tᵢ} fᵢtᵢ
        = Σ_{tᵢ>t} pᵢtᵢ / Σ_{all tᵢ} pᵢtᵢ                           (A2.7.8)

if pᵢ is defined as fᵢ/N, the proportion of intervals of length tᵢ in the popula-
tion. The values of pᵢ define the (discontinuous) distribution of interval
lengths tᵢ in the finite population under consideration.
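The finite-population argument can be illustrated numerically. The sketch below (not from the book; the population of lengths is invented for the example) computes 1 − F₁(t) both as the fraction of total time occupied by intervals longer than t, as in (A2.7.2), and by repeatedly choosing intervals with probability proportional to their length; the two agree.

```python
import random

# Illustrative check with a made-up finite population: the time-fraction
# formula for 1 - F1(t) agrees with length-proportional sampling.
random.seed(2)
lengths = [1.0, 2.0, 2.0, 5.0, 10.0]   # the t_i of a finite population
t = 3.0                                 # threshold: 'longer than t'

# Fraction of total time occupied by intervals longer than t
frac_time = sum(x for x in lengths if x > t) / sum(lengths)

# The same probability estimated by length-biased (length-weighted) sampling
trials = 100_000
n_longer = sum(
    1 for _ in range(trials)
    if random.choices(lengths, weights=lengths)[0] > t
)
frac_sampled = n_longer / trials

print(frac_time)      # (5 + 10)/20 = 0.75
print(frac_sampled)   # close to 0.75
```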
It is now possible, at last, to revert to the real problem, in which there is
an infinite population of intervals and the intervals can potentially be of
any length, i.e. they have a continuous distribution (see Chapters 4 and 5).
All that is necessary is to replace pᵢ by dP = f(t) dt (from (4.1.1)). As described
in Chapter 4, dP is the probability that the length of an interval will lie
within the very narrow range between t and t+dt. When this is substituted
in (A2.7.8) the summations must, of course, be replaced by integrations.


    1 − F₁(t) = probability that an interval chosen by length-biased
                sampling is longer than t

              = ∫_t^∞ tf(t) dt / ∫_0^∞ tf(t) dt.                    (A2.7.9)

[Figure: probability density plotted against λt for f(t) and f₁(t).]

FIG. A2.7.1. Distributions of the length of random intervals. The abscissa
is plotted as in Figs. 5.1.1 and A2.7.2, i.e. as λt, the duration of the interval
as a multiple of the mean of all intervals. The distribution of durations in the
population, f(t), is the exponential distribution, exactly as in Fig. 5.1.2. The
distribution of the lengths of intervals chosen by length-biased sampling, f₁(t),
shows that relatively few short intervals will be chosen, and the mean interval is
twice as long as the mean of the whole population. If the abscissa is multiplied by
λ⁻¹ to convert it into time units, the probability density would be divided by
λ⁻¹, so the area under the curve remains 1·0.

For the exponential distribution of intervals in the population, which is
what we are interested in, substitute the definition of this distribution,
f(t) = λe^−λt, in (A2.7.9). The integral in the denominator of (A2.7.9) has
already been shown in (A1.1.11) to be λ⁻¹. The numerator of (A2.7.9),
integrating by parts exactly as in (A1.1.11), is

    ∫_t^∞ λte^−λt dt = [−te^−λt]_t^∞ − [e^−λt/λ]_t^∞
                     = −0 + te^−λt − 0 + λ⁻¹e^−λt
                     = (λ⁻¹ + t)e^−λt.                              (A2.7.10)

Substituting these results in (A2.7.9) gives

    1 − F₁(t) = (λ⁻¹ + t)e^−λt / λ⁻¹ = (1 + λt)e^−λt                (A2.7.11)

as the proportion of intervals longer than t, when the intervals are chosen
by length-biased sampling. Compare this with the proportion of intervals
longer than t in the whole population which, from (5.1.4) or (A1.1.12), is
1 − F(t) = e^−λt. The cumulative distributions are plotted in Fig. A2.7.2.

The proportion of intervals longer than the mean interval

The mean length of all intervals in an exponentially distributed population
is λ⁻¹, as proved in (A1.1.11). It was shown in (A1.1.13) that 63·21 per cent
of all intervals are shorter than the mean, λ⁻¹. Therefore 100−63·21 = 36·79
per cent of all intervals are longer than the mean length. The proportion of
time occupied by intervals that are longer than the mean follows directly
from (A2.7.11) and (A2.7.9), putting t = λ⁻¹, and is thus

    (1 + λλ⁻¹)e^−λλ⁻¹ = 2e⁻¹ = 0·7358, i.e. 73·58 per cent.         (A2.7.12)

Thus, although only 36·79 per cent of intervals in the population are
longer than the mean length, this 36·79 per cent occupy 73·58 per cent of
the time, and this is one way of looking at the reason for there being a greater
chance of an arbitrary time falling in a long interval than a short interval.
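These two percentages can be verified directly from the formulae just derived; the short sketch below (not in the original) evaluates them, and the result is the same whatever rate λ is chosen.

```python
import math

# Numerical check of the percentages quoted above (illustrative only).
lam = 1.0
mean = 1 / lam

# Proportion of intervals longer than the mean: 1 - F(t) = e^(-lam t)
p_intervals = math.exp(-lam * mean)                 # e^-1

# Proportion of time occupied by them: (1 + lam t) e^(-lam t), eqn (A2.7.11)
p_time = (1 + lam * mean) * math.exp(-lam * mean)   # 2 e^-1

print(round(100 * p_intervals, 2))   # 36.79 per cent of intervals
print(round(100 * p_time, 2))        # 73.58 per cent of the time
```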

The mean length of an interval chosen by length-biased sampling

The question posed at the beginning of this section can now be answered.
The probability density function (see § 4.1) defining the distribution of
lengths of intervals chosen by length-biased sampling follows from (A2.7.11),
using (4.1.5), and is

    f₁(t) = (d/dt)F₁(t) = (d/dt)[1 − (1 + λt)e^−λt]
          = λ²te^−λt  (for t ≥ 0).                                  (A2.7.13)

This distribution curve is drawn in Fig. A2.7.1, and compared with the
distribution curve, f(t) = λe^−λt, for all intervals in the population.


[Figure: cumulative probability plotted against λt for F(t) and F₁(t);
a dashed line marks 0·632.]

FIG. A2.7.2. Cumulative distributions of the lengths of random intervals.
The distribution function, F(t), for the lengths of all intervals is exactly as in
Fig. 5.1.1. The abscissa is the interval length as a multiple of the mean length of
all intervals, i.e. it is λt as in Fig. 5.1.1. If the mean length of all intervals were
λ⁻¹ = 10 s, the figures on the abscissa would be multiplied by 10 to convert them
to seconds. The cumulative distribution, F₁(t), for intervals chosen by length-
biased sampling, is seen to have more long intervals than there are in the whole
population, the mean being 2λ⁻¹ (i.e. 20 s in the example above).

The mean length of an interval chosen by length-biased sampling now
follows from (A1.1.2), and is

    E(t) = ∫_0^∞ tf₁(t) dt = ∫_0^∞ λ²t²e^−λt dt.                    (A2.7.14)

To solve this, integrate by parts (see, for example, Massey and Kestelman
(1964, pp. 332, 402)), as in (A1.1.11). Put u = t², so du = 2t dt, and put
dv = λ²e^−λt dt, so v = ∫λ²e^−λt dt = −λe^−λt. Thus

    E(t) = ∫_0^∞ u dv = [uv] − ∫ v du
         = [−t²λe^−λt]_0^∞ − ∫_0^∞ (−λe^−λt)(2t dt)
         = 0 + 2∫_0^∞ tλe^−λt dt
         = 2λ⁻¹,                                                    (A2.7.15)

i.e. twice the mean (λ⁻¹) of all intervals, as stated. In the evaluation of the
first term on the second line of (A2.7.15) notice that t²e^−λt → 0 as t → ∞;
see, for example, Massey and Kestelman (1964, p. 122). The integral on the
third line of (A2.7.15) is simply the mean of the exponential distribution,
shown in (A1.1.11) to be λ⁻¹.
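The conclusion E(t) = 2λ⁻¹ can also be checked by simulating the bus example of § 5.2. The sketch below (not from the book; the numbers are invented for the example) generates a Poisson process of bus arrivals with mean interval 10 min, drops an observer in at many random moments, and records the length of the interval he lands in: the mean is close to 20 min, not 10.

```python
import bisect
import random

# Illustrative simulation of length-biased sampling (not from the book).
random.seed(3)
mean_interval = 10.0        # mean gap between buses, in minutes
lam = 1 / mean_interval

# One long stretch of bus arrival times (a Poisson process)
arrivals = []
t = 0.0
while t < 1_000_000.0:
    t += random.expovariate(lam)
    arrivals.append(t)

# Arrive at many random moments; record the surrounding interval length
samples = 100_000
total = 0.0
for _ in range(samples):
    moment = random.uniform(arrivals[0], arrivals[-2])
    i = bisect.bisect_right(arrivals, moment)
    total += arrivals[i] - arrivals[i - 1]

print(total / samples)   # close to 2 * mean_interval = 20, not 10
```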

Tables

TABLE A1
Nonparametric confidence limits for the median

See §§ 7.3 and 10.2. Rank the n observations and take the rth from each end as
limits. With samples smaller than n = 6, 95 per cent limits cannot be found,
but the P values for the limits formed by the largest and smallest (r = 1) observa-
tions are given (Nair 1940).

Sample      P approx.       P approx.       Sample      P approx.       P approx.
size        95 per cent     99 per cent     size        95 per cent     99 per cent
n        r  100P         r  100P            n        r  100P         r  100P

31 10 17-oe 8 11-_
2 It 50-0 32 10 98·00 9 11-30
3 It 76·0 33 11 96·60 I II-M
87·6 Sf 11 17·61 10 11-10
"6 It
It 91·76 36 12 96·90 10 11-40

8 1 88·88 38 12 17-12 10 II-eo


7 1 88·44 37 13 96·30 11 II-M
8 1 99-22 1 99-22 38 13 96·M 11 11-60
9 2 88·10 1 99·eo 39 13 17-82 12 11-08
10 2 17·88 1 99-80 40 14 96·18 12 "·38
11 2 98-82 1 99·90 41 14 I7-M 12 99-62
12 3 88·14 2 99-38 42 16 96·M 13 "·20
13
14
3
3
97·78
98·70
2
2
".-
99-82
43
44
15
18
96·84
96·12
13
14
99-48
11-04
16 4 96·48 3 99·28 46 18 96·44 14 II-Sf

18 4 97·88 3 99-68 48 18 97·42 14 "-M


17 6 96·10 3 99·78 47 17 96·00 16 99·20
18 6 96·92 4 99·M 48 17 97·08 16 19·44
19 5 98·08 4 99-68 49 18 96·58 18 11-08
20 8 96-88 4 99·74 150 18 96-72 18 II-Sf

21 8 97-34 15 99-28 51 II 96·12 18 "·M


22 8 98-30 6 99·58 52 19 96·38 17 99-22
96·M 6 19·74 53 19 97·30 17
23
24
215
7
7
8
97-74
915·68
8
8
99-34
99·eo
M
155
20
20
96·98
97·00
18
18
"""
99-10
99-38

28 8 97-10 7 99-08 68 21 96·80 18 "·M


27 8 98-08 7 99-40 157 21 96·88 19 99·M
28 9 98-44 7 99·82 158 22 915·20 19 99·48
29 9 97-58 8 99·18 69 22 98·38 20 99-14
30 10 96·72 8 91-48 eo 22 97·28 20 99-38

Sample      P approx.       P approx.       Sample      P approx.       P approx.
size        95 per cent     99 per cent     size        95 per cent     99 per cent
n        r  100P         r  100P            n        r  100P         r  100P

81 23 98·04 21 99·02 71 27 98·80 215 99·14


82 23 97·00 21 99·28 72 28 915·158 215 99·38
83 24 915·70 21 99·48 73 28 98·158 28 99·04
84 24 98·72 22 99·18 74 29 915·28 28 99·30
815 215 915·38 22 99·40 715 29 98·30 28 99·48

66 25 98·44 23 99·08
67 26 915·02 23 99·32
88 26 98·18 23 99·150
89 28 97·08 24 99·24
70 27 915·88 24 99·44

TABLE A2
Confidence limits for the parameter of a binomial distribution, i.e. the
population proportion of 'successes'

See §§ 7.7, 7.8, 10.2 and 3.2–3.4. If r 'successes' are observed in a sample of
n 'trials', confidence limits (95 and 99 per cent, cf. (7.7.1) and (7.7.2))
for the proportion of 'successes' in the population from which the
sample was drawn can be found from the table. Reproduced from Documenta
Geigy Scientific Tables, 6th edn, by permission of J. R. Geigy S.A., Basle, Switzerland.
The Geigy tables give limits for all values of n up to 1000.

              95% limits          99% limits
r   100r/n    100P_L   100P_U     100P_L   100P_U


ft-9
0-00 o 2N222
1 60'00 1 11'11
2 100·00 2 22·22
S 8S'S8
8 4 %2%2'%2%2
6
o 0·00 6
1 SS'SS 7
2 &e·87 8
:i 200·00 9
n-4
o 0'00 ft - 10
26'00
:i 60·00 o
:i 76'00 1
4 100·00 2
8
6 4
6
:i 0·00 0'00- ,*,H8 6
1 20·00 0·61- 71·M 7 70'00
2 40·00 6'27- 86'S4 8 80'00
13 GO'OO 2'·M- 13,H8 9 90'00
:i 80·00 28·SI1-- 2&:2'49 20 12YHYi
22 200·00 27-112-122&:200

n - 11
-,---,- --~'-----

0·00
1 18·87
2 88·88
8 60·00
, &e·87
" 88·88
:i 100-00

0·00
14·29
28·67
42·86
67-14
71·43 -1st
86·71
7 100'00
ft-8
kk 0·00 0'00- ,Y'94
1 12·60 O'S2- tsts'66
2 26·00 S'l~ 66'09
8 S7'6O 8'62- 76'61
, 60·00 U'i'7D- tkk'SO
n 82·60 Y4"~ tl2'48
n 76·00 13"91- kkt'81
7 87'60 47'S6- 99'88
8 100·00 88·011--100·00
              95% limits        99% limits                    95% limits        99% limits
r   100r/n    100P_L  100P_U    100P_L  100P_U    r  100r/n   100P_L  100P_U    100P_L  100P_U
ft - 18 ft - 17 (continued)
o 0'00 0'00- 2',71 0'00- 38'47 5 29'41 10,81- 55'116 8'117- 63·10
1 7-611 0'1~ 88'03 0·04- "'110 6 35'211 14,21- 61'67 10'14- 68,'6
2 16'38 1·\12- '5,'5 0·83- 6HO
,
8
6
23'08
30'77
38,'8
5'04- 53·81
II'~ 61,'3
13·88- 68"2
2'78- 62·08
5'71- 611-13
11,'2- 75"6
7 'H8
8 47'08
II 62''''
18''''
2\1'118-
27'81-
67'08
72-111
77'02
18'71- 78·"
17'84- 78'07
21'112- 8\1·86
10 58'82 32'\12- 81'56 26'56- 86·\111
6 '6-15 111·22- 7',87 18·83- 81-13 11 114'71 38·83- 85'711 81'6'- 811·86
7 53'85 25-13- 80'78 18·87- 88'17 12 70'611 "'04- 811·611 86'110- 118·03
8 61·114 31'58- 88'14 2',54- 110·58 13 76,'7 60'10- 98'111 '2'88- 115'7'
II 611'23 88'57- 110·111 30'87- "'·211 l' 82·85 56'57- 116'20 48'\1&- 117'111
10 76·112 ,6-1 ~ ""116 37·114- 117·22 15 88·\14 63'56- 118'114 55'87- 118·37
11 84062 114'56- 118'08 '5·110- 118'17 16 ""12 71'81- 118'85 68'70- 118'117
12 112'81 63·117- 118·81 55·10- 118'116 17 100·00 8O"~100'00 78·22-100-00
18 100·00 75'2~100'00 66,53-100.00
.-18
ft - I t
0 0-00 0'00- 18'58 0·00- lUi'60
o 0'00 0·00- 23'16 0·00- 81'61 1 6'56 0'1'" 27'211 0·03- 14'68
1 7-1, C)-I8-88'87 0'04- 411'40

,
2
8
1t·211
21-48
23·57
1·78-
"86-
411·81
60·80
8'8~ 58·10
0'76-
2'67-
6'26-
51-l18
58'112
85'711
,
2
8
5
11-11
16'67
\12'\12
27-78
1·88-
8·68-
.'1-
14'71
'1'411
'7'114
II'~ 68'68
0'5~ 411'17
1'117- 48''''
,-00- 114·112
6·6'- eo'56
6 36·71 12·76- 114'86 8'86- 72·01 8 88'88 18'84- l1li001 11·61- 66'79
8 '2·86 17·86- 71-1' 12'87- 77'66 7 88'811 17·80- M·lUi 12·8'- 70068
7
8
II
60·00
57-1'
114·211
23·04- 78'116
28'86- 8\1·14
35-1... 87·\14
17'24-
22·84-
27'~
82'76
87'88
111·14
8
II
10
"."
60-00
56·56
21'53- 611·\14
28002- 78'118
80-76- 78"7
16"~ 76-26
20-47- 711·68
\14'7... 88'61
10 71"8 '1'110- 111·81 14'21- ""7' 11 61'11 35'7ft- 81'70 211·82- 87018
11 78·57 ,".\10- 115·14 41·08- 117"8 12 66'87 40.... 86'66 14·21- 110,'"
12 85·71 5N~ 118'22 48'77- 011·\14 13 72'\12 '8'62- 110'81 811"ft- 98'46
18 112'86 66'13- 118'82 67-60- 011'116 It 77'78 52'86- 118·l1li 411-08- Il6-oO
It 00'00 78'8'-100·00 68"~100'00 15 88'88 58'58- 116·41 61-16- 98-08
16 88·811 66'\10- 98·8\1 67'88- 118"1
ft - 15 17 ",." 72'71- 118'86 66'87- l1li·97
18 100'00 81-47-100'00 76-60-100-00
o 0·00 0·00- 21'80 0'00- 211'78
1 8'87 0'17- 81'116 0'03- 40'18 • -19
2 18·88 0,71- 68'68
,
8
6
20'00
28'87
88·88
1·86-
"83-
40-'8
68-011
7'~ 65-10
11'82- 81·8\1
2'8~ 68'06
"88- 82-78
8·01- 68'82
0
1
0·00
6'28
0-00-
0013-
17'66
28·08
0'00-
0'03-
\14·14
8B-11
2 10'68
6
7
8
40·00
'8'87
58·88
18·84- 87'71
21'27- 78·41
28'6~ 78·78
11·70- 74'311
15'87- 7"""
20'51- 8H8
,
8
6
16'711
21'06
16'81
1·80-
8·88-
8·or.-
II-lft-
88·1'
811'68
411·67
61'20
0·66-
1'86-
8·78-
6-17-
40'87
46'81
62·71
58'18
II eo·OO 82'2~ 88'66 25·81- 88'80 6 81'58 12'58- 56'55
10 66·87 88·88- 88'18 81-18-111'118 8'116- 68'\111
7 88·'" 1.2~ 81'114 12'07- 68-011
11 78·88 "'110- 112·21 87·27- 115'12 8 '2·11 2O.2ft- 66·60 IH~ 72·eo
12 110-00 61'111- 115·87 '8·116- 117·81 II '7'87 \I4.,ft- 71-1' 111'1~ 76·'"
18 86·87 511·6'- 118·14 61·87- 118'211 10 62'68 28'86- 76-66 28'16- 80·81
It 118·88 68-06- 118·88 611·8'- 011·117 11 67·811 88·60- 711'75 2No- ""61
16 100·00 78·\10-100-00 70·24-100·00 12 63'16 88·86- 88'71 81-111- 87-118
18 68'411 '8·4ft- 8N2 88,71- 111006
.-18 It 73'68 68'80- 110'85 '1'82- 118'88
15 78'116 114,'3- 118'95 '7'2~ 116·\12
o 0·00 0·00- 20·511 0'00- 23'111 16 ""21 eo"2- 116'8\1 68'18- 118·1t
1 8·25 0'18- 30'23 0·08- 88·1' 17 811-47 66·86- 118'70 69'63- l1li'"
2 12·60 1·56- 88·85 0·87- '8'28
,
3
6
18·75
25·00
31·25
"06- '5·85
7-27- 52·88
11·0\1- 58'66
2·23-
'·5ft-
704ft-
58'"
611·111
85·85
18
111
""7'
100·00
78,117- l1li'87
8I·Br.-l00·OO
66'8~ l1li'117
76'86-100'00

6 87·60 15·\10- 114·67 10·86- 71'3\1 . - 20


7 '8'75 111'7ft- 70'12 14'71- 78·38 0 0·00 0-00- 16·'" 0·00- 23·27
8 60'00 2"8ft- 76'35 18'117- 81'08 1 6·00 0·13- \14'87 0·03- 81·71
II 68·25 211·88- 80·25 23·82- 85'211 2 10'00 1·\13- 81'70 0·53- 88·71
10
11
12
82·60
68·76
75-00
8H3- 14·80
41·84- 88·118
47·82- 112'73
28'88-
M'Ift-
811·1t
112'66
40'~ 96,'6
,
8
5
16-00
20·00
25-00
1-21- 87-811
6'73- 48'66
1'78-
8'58-
6'83-
"'115
60'66
66'118
18 81·25 M'3ft-1IH5 46'66- 97·77 8'86- '"'10
6 80-00 11'8~ 114'28 8,'6- 80-116
l' 87·60 61·8ft- 118'411 58'72- 118'88 7 35·00 16'8~ l1li'\12 1l-8~ 66'66
15 98'76 611'77- 118·'" 61·86- 118·117 8 40·00 111·12- 68'116 14060- 70'011
16 100-00 79'41-100,00 71'81-10CHl0 II '5'00 23-06- 68"7 18'06- 7"28
10 60'00 27·\10- 72'80 21-77- 78'28
• -17 11 66'00 81'53- 78'''' 25'72- 81''''
12 CID-OO 88'0ft- 80-88 \111'91- 85'40
o 0000 0'00- 111'61 0'00- 28'78 18 66-00 40·78- "'·61 8',84- 88'61
1 5'88 0'1ft- 28'611 0-08- 88'30 It 70-00 411'72- 88'11 811004- IIl·M
2 11'76 0'63- 'H8 16 76-00 6D-IIG- 111'14
,
8 17'66
28·68
1'46-
8'80-
6,81-
88'"
41'48
'"·110
2'~ 61-«)4
"\16- 67-82
16
17
8O-DO
86-00
68·.... ""27
8\1'11- 116'711
"'02-
'"·....
65-Dr.-
""17
116·41
98·14


              95% limits        99% limits                    95% limits        99% limits
r   100r/n    100P_L  100P_U    100P_L  100P_U    r  100r/n   100P_L  100P_U    100P_L  100P_U


_ - 20 (contIDaed)
----
_-M
18 l1O-OO 88'ao- 08·77 81·.... l1li,'7 0 0-00 0-00- 14. . 0-00- 1.81
111 116-011 7!H1- l1li'87 88·.... IMI-t7 1 '·17 CH1-1t'11 0-01- 17-11
10 100-011 as·1IHOO-OII 78'71-100-0II I 8'as 1-G1- 17-011 0-..... as-u
_ -11 ,
I
6
11·50
18·87
2O·as
1 .... 11·18
'·7.. 17·18
HI- 41·16
1·46-
I .....
"79-
18·71
41·79
48·66
0 0-00 0-00- 111-11 0-00- 11·10 8 16-011 .77- "·n e-tI- 51-06
1 '·78 0-11- 21·st 0-01- 10-41 7 &17 12-111- 61-011 •• 67·11
I .61 1-17- 10-18 0-50- IH8 8 as·as 16·11S- 66·ft 11·88- 8...0
8 14-111 3~ 18-14 1·118- '.·11 II 87'50 18·80- 6e-U 14·ar.- 86·10
4 18-06 6·4r.- '1-91 3'8~ 48'78 10 11,11- 81'18 17'~ ...,.
6·61- 63·91 41·87
6 28'81 8·11- '7-17 11 '6·81 16,6(,- 8H8 570- 71-111
8 28·67 11·28- 62'111 8-01- 68·78 12 50-00 211,12- 70-88 21.... 78-IM
7 33'as 1"6~ 611-97 10-7&- 113·37 I. 64·17 32·82-- 7',46 27-l1&- 7e-1O
8 18·10 18,11- 81·511 1.'81- 87·72 14 68·33 18·.... 77-811 30·11&- 82,'1
II 41·88 21·81- 86·911 17-07- 71·86 16 82·50 to·!)~ 81·20 14·70- 86·S6
10 '7·82 26,71- 70·22 20'6(,- 76-78 18 88'87 ".es- 84·37 18..... 88'11
11 61·18 2D'7&- 7',211 24·24- 711'" 17 70'81 48'D1- 87·18 41.... DO·70
11 6H' 14-011- 78-18 28·Ir.- st·. 18 76-011 51·.... DO·21
II 81-90 18·..... 81·811 32·28- 88·111 " .... l1li-08
111 711·17 67·sr.- 91·87 61·tr.- 116·21
14 88·87 U-OS- 86·41 18·11S- 811·22 20 81'as 111·81- 116·18 611,21- 117-06
16 7.... '7,82- 88·72 '1·11- 111'l1li 21 87'50 87·.... 117·14 11-117- 08·M
18 78·111 62·81- 111'78 '8-0&- 114,'7 22 Dl·87 73-00- 08·117 88·78- l1li'611
17 SO·1I6 68'~ 114'66 61·24- 118·81 21 116·81 78·88- l1li-811 7I·7&- l1li_
18 86·71 81·88- 118'116 611·7&- IIMII M 100-011 86·7r.-100-011 8o-1~100-011
111 DO·48 89'81- 118'as 82·81- l1li·50
20 116'24 78·1&- l1li·88 89,67- l1li·98 _ -16
21 100-011 81'8~100-011 77'7CHOO-OO
_ - 22 0 0'00 0-00- 11·71 0-00- 111'10
1 4-011 0'1~ 2O'S6 0-011- 18·18
.- 2 8'00 o·gs- H-OS 0-41- ft'10
0
1
2
0-011
'·66
11-011
0-00-
0'11-
1-11-
16'"
11'84
211·18
0-00-
0·011-
0·48-
21'to
2II·M
.6'77
,
8 12'00
18'00
1·66--
"64-
81'22
88-08
1'~ 17'4.
2·82-- d'S6
,
8
6
18·M
18'18
22·78
1'111- 14·111
!H~ to'28
7·82-- 46·37
1'80-
S·2I-
6·18-
'1·81
"_
62-01
6
0
7
20-011
M·OO
28'00
0·81-
11'88-
12-07-
to'70
"'lS
,,·n
"6~ "·DB
O'IIS- 61·88
801111- 66'68
10·71- 50·22 7,81- 68-74 8 32-011 14·er.- 68'50 11·ar.- 611·61
0 17·27 II 38'00 17'117- 67·48 18'011- 81·S6
7 81'82 lS·88- M'87 10·24- 81·21 10 to-Oll 21-11- 01·88 18'7~ 87-02
8 88·38 17·.611·14 1"'~ 86,'11
20·71- 88'86 10'1&- 89·M 11 "'00 M'~ 86-07 111-7.. 7o-M
II to'1I1 12 48'00 27·80- 88·89 22·81- 78_
10 ",'6 24'S~ 87·7D 111·48- 78'to 13 62'00 81,81- 72'20 18'07- 7H7
11 50-011 28,22- 71-78 22·91- 77'07 U·DI- 76'80 211·46- 8o-H
32,21- 76·81 18·80- SO·M l' 511'00
12 M·66 16 80'00 38'87- 78·87 32'11&- 81·21
1. 6D'0II 36·3r.- 7D·2D 30·48- 81·st 18 42·61- 81·03 88·ar.- 88·01
to·88- st·so 84'61- 88'DO M·OO
14 88·M 17 88-011 "'50- 86-06 to·48- 88·86
16 88·18 '!HI- 88·14 88·77- 8D'78 18 60-81- 87·l1li
'D·7&- 8D·27 48·28- 112·8D 72-011 ""7- DB1
18 72·7. ID 78-011 M'87- DO·M 48·.... 08-87
17 77'27 M·81- 112'18 41·011- 114·74 68-011- 06'41
60·71- 114·81 6S-01- 118·77 20 SO-Oll 6II'ao- 08'17
18 81·82 21 84'00 83-91- 06'40 57·ar.- 9H8
10 88·18 86'~ 97'011 58'8~ 98'to
70·84- 98·88 M·2I- l1li'52 22 88-011 88'7&- D7'" 111'67- 08'80
20 DO·91 28 112-00 78'117- l1li-02 87·. l1li·68
21 06,'6 77-18- l1li'88 70·78- l1li'88 24 118·00 79-86- IIII·DO 78·82-- l1li·98
22 100'00 84'68-100'00 78'80-100-00 26 100-011 88·28-100-0II SO~l00-Oll
_ - 28 _ - 28
0 0'00 O·~ 14'82 O·~ 20'68
1 0,11- 21'96 0·011- 28'14 0 0-00 0-00- 1.·28 0-00-18'"
"S5 1 8·86 0'1~ 10'M 0-01-26·1e
2 8'70 1,07- 28-06 0·48- 84'48
,
8
6
11·04
17'311
11·74
2·7&-
4·Dr.-
7'48-
33'5D
38·78
'8·70
1·51-
3'OS-
5·011-
to'll
'6'84
60-22
I
,
8
7'89
11·M
15·38
o·er.- 26'18
I·tr.- 10'15
"88- 84'87
0'41-
1·...
.1-06
88·11
2·n- '1-011
8 18·011 10·21- 48·41 7·2r.- 54'as 6 19'28 8·6r.- 811·86 "~"'50
10". 18·21- 1i2·91 D·7.. 5D·ll 8 13'08 8,97- '8·86 8·ar.- 'D'77
7 7 18·111 11-117- n'7D 8·61- 68·.
9 84'78 18·88- 67·27 11·46- 83·38
89'18 ID'71- 81'" 16·S7- 87'36 8 80-77 14'88- 61'70 10,87- 67·76
D 9 1"81 17'11- 66·87 18·88- 81'50
10 '8,'8 13'1~ 86'61 18·48- 71-18
11 n·as 28·81- 89'41 21·78- 74·711 10 88'" 20·21- 60'48 18-Gr.- 86·10
11 61'17 3O'5~ 78-18 16,21- 78·24 11 '2'81 28·8r.- 88-08 18·88- 88·67
68·62 S4"~ 78·81 28·84- 81·62 12 48-16 2.~ 88'88 21,81- n-91
18 13 50'00 211·91- 70-07 24'~ 76-11
14 80'87 88'64- SO·29 82·.... 84·88
42·71- 88·82 88·81- 87·M 14 68·86 88,87- 78"1 28'~ 78-10
16 86'22 16 67'89 S8·111- 78'86 8...1- 81-1'
18 8D·67 n·os- 88'711 '0'7~ DO·28
17 78'D1 51'6~ 89·77 46-17- 111·76 18 81'M to·67- 79'77 14'.81'116
18 78'28 58'ao- III·M ,11-78- 114'98 17 86·18 "'81- 82·70 18·50- 88'.
1D 82·81 81·21- 96-06 M·88- 118·111 18 89·28 48'21- 86'87 42·2r.- 8e-18
20 88·118 88'41- D7'22 69·88- 98'41 10 78'08 61'21- 88'48 '8-1r.- 0...8
01·30 71·98- D8'DS 86·64- IIII·M 20 78'111 68·ar.- 91..,. 60-21- 08-U
21 21 SO'77 1IO·8r.- D8·" M·50- 06..,
22 96·86 78·0r.- 1III'8D 71'88- l1li·98
13 100'00 86'18-100'00 79·41-100-00 II 84'. 86'11- 116'M 6D-GO- 117-111

Digitized by Google
              95% limits        99% limits                    95% limits        99% limits
r   100r/n    100P_L  100P_U    100P_L  100P_U    r  100r/n   100P_L  100P_U    100P_L  100P_U

..
It - 10 CooaUnued) It - III CooaUnued)
18 88'" 011·86- 117066 08·711- 118_ 11 87'118 10'011- 67-7' 10·00- 08·10
111'81 7',87- 110-06 OS'1I6- 110·611 11 '1·88 18·6&- OHIO 111'18- 00·88
16 110'16 SO'8&- 110'110 7',71- 110·118 18 "·88 ZO'66- N·81 11'111- 011·"
10 100'00 80·77-100·00 81·6&-100-00 l' "·18 n'66- ON7 "'011- 7...8
16 61-71 81·68- 70'66 17'67- 76'81
" -17 10 66-17 86'011- 78·61\ 8O-M- 78'011
0-00 0-00- 11'77 0'00- 17·81 17 68'01 88·114- 70·" 88'0-.. SO'77
0 18 01-07 d·1&- 711-81 SO·SO- 88·M
1 8·70 0'08- 18'117 0·. . .·" 111 06·61 66'07- BI-OO .o-oe- 86'SO
I 7,'1 0,111- 1"111 0'811- 8O-ot
, 10 OS'1I7 '11-17- M·7I '8"11- 88·16

.
8 11-11 1·86- n'18 1·111- 86-o? 21 71"1 61'7&- 87·17 41·01- IIO·SO
U'81 HII- 88'78 I'eo- 811·78 22 76'80 I\O.,&- 811'70 60-117- III·"
6 18·1\1 11·80- SS-08 ,.lIS- "·11 23 711'81 00'18- 111-01 6,",&- 114·86
8 11'11 8'81- d'lO 11'10- ts·18 82·70 N·lIS- 114'11; 68·ta- 110'08
7 16·118 11'11- "·18 8·17- 61·111 16 80'11 OS'M- l1li-11 Ol·eo- 117·68
8 211·08 18'76- 60·18 10,,1- 1\0·08
II 88·88 10·62- 1\8·110 11'sa- 611'76 10 811'00 71'06- 117'81 07-0&- 118·110
10 87-ot II1'to- 67·08 16'88- 08'18 27 118'10 77'18- 110·16 71·77- IIO'N
18,07- 00'l1li 18 110'66 BI'U- 110·111 77iN- 110'118
11 .0'7' 11'811- 01'10
11
18
U
"."
"'16
61'86
16·t&- N·1I7
18'117- OS-06
81'116- 71'88
111-88- 011·118
23·81- 78-1'
10·8&- 711-111
n 100-00 88-0&-100-00
It -10
88·80-100-00

16 66·1\0 86'sa- 7',61 80·.711'11 0 0-00 0-00- 11-67 0-00- lO·ll1


10 611·10 88'SO- 77·01 88'81- 81'" 1 8'88 IHI8- 17·11 0-0&- 11'17
17 01·110 d·87- SO'OO 80·72- M'OI I 0·07 o-s-.. 11-07 0·86- 17'.0
18
111
10
00·07
70·87
7'-07
"'M- 88"8
'"'8&- 811·16
1\8·72- 88'811
.0.16- 8N7
"'91- 811'68
41'74- 111·88
,8
6
10-00
18'18
10'07
1-11- 10·1\8
B-1&- 80-71
6'M- 34'71
H&-
2'sa-
B·7&-
81-08
8O'M
.0'.0

..
11 "77'78 67'74- 1I1'SS 61·72- 118·110 0 10·00 7-71- 88'67
22 81'ts 01'91- 118'70 66·811- 116-77 6·66- "'IS
18 86'111 00,17- 116·81 00·17- 117·.0 7 1S·88 11·118- d·18 7·111- '7'110
88'811 70'M- 117'06 N'1I8- 118·71 8 10'07 11·18- 66'l1li II..... 61·1\0
16 111·611 76·71- IIO-GII 011·116- 110·01 II 80'00 1"78- '"•.0 11"&- 66-01
III 76·M- 110'118 10 sa'88 17'111- 61'81 18,07- 68'34
I18'SO 81·08- 110'111 11 80·07 111·118- 60·1' lOiN- 01-67
17 1011-00 87-1IB-l00'00 BI'I8-100-OO 11 .a-oo 11'00- 69-.0 18·1\0- N'70
,,- IS 18 "·18 16'.01'67 11-07- 07-78
U "'07 IS'M- 06·07 18-78- 70-07
0 0'00 0'00- 111·8' 0-00- 17'" 16 60'00 81'80- OS'70 1O·t&- 78'61
1 8'67 0'08- 18·86 0·.1S·01l 10 68'sa M·as- 71_ n·sa- 70·17
I 0·88- 88·60 0'88- n'l1 17 1\0'07 87·ta- 7"U 81'17- 78'"
,
8
6
N'
10'71
U'III
17'80
2'27- 18·18
4-08- 81·07
11-0&- SO'811
1·16- sa'lIO
2-61- 88·1\8
'-07- d'SO
18
111
10
00-00
OS·88
00-07
.o-eo- 77·M
'8'8&- SO-o?
'Nil- BI·71
86·80-
88·ta-
'1'00-
81'60
88·118
8O·as

..
0 11"8 8'80- .0·116 6'8&- ""87 21 70-00 6O.eo- 86'17 ,,-l1li- 88'68
7 16'00 10'1111- "·87 7'8&- 60·70 II 78'sa U'11- 87·71 "·.... IIO·n
8 18·57 18·22- "'117 10·. U"II 18 70·07 67'72- IIO-o? 61-01- lII·n
II 81'14 16'88- 62'86 11·a... 68·08 SO-OO 01·ta- 111·111 61\.72- 114·61\
10 86·71 18'Ot- 66'" U'77- 01-66 16 88·sa 06'18- 114·80 611·eo- 118·11
11 811·n 11·1\0- 611"2 17-88- N'IIO 10 80·07 011'18- 118·.. OS·OO- 117·07
11 d'811 ...,&- 01'81 10-0&- OS'U 27 110'00 78"7- 117'811 07'117- I18'M
18
U
16
"."
60-00
1\8'67
17'61- 00'18
80-06- 011'86
88'87- 71,'"
11'82- 71-10
16'72- 7"1S
1S'74- 7N8
18
III
80
118·88
118'07
100·00
77'118- 110'18
BI·7&- 110'111
88·ta-l00-OO
7I.eo- 110'06
77-78- 110-118
88'81-100-00
18 6N' 8N8- 76'U 81'8&- 711'118
17
18
111
00'71
N·n
07'80
.0.1\8- 78·60
"'07- 81·80
'7'06- M'11
86'10- BI'1I7
88"6- 86'1S
'1·91- 87'OS 0 ...- ,.,.
,,.,,-
,·tHI
" -1000C~)
,." ,...- ,..,
10 71·ta 61-88- 811·78 '6·61- 811'118 1
"1'
,." ,·OJ- ,." "'1-
"10- "'4
21
12
18
76·00
78·67
81·U
66-18- 811·81
611·06- 111·70
OS'11- 118·114
'"·14- 111'14
68-18- 114'14
67-10- 116'118
2
8 ,." "11-
,6 ,." '41.- ',8'1
141.
,·u
',01- 1·09
'·''1-1·111
I' 86·71 07'sa- 116'117 01'67-1170411 0'60 0-1&- 1'17 o-oe- 1'811
16 811·211 71-77- 117-78 00'01- 118'76 II 0'00 0·22- 1'81 0'14-1'U
10 111·80 70·1\0- 110'12 70'811- 110·01 7 0'70 0'18-1·" 0'18- 1·011
27 118"8 81'06- 110-91 70'81- 110·118 8 o-SO O'M- 1·68 o-U- 1·88
IS 100'00 87'00-100-00 81'7&-100·00 II 0-110 o·u- 1·71 0·111- 1·117
10 1-00 O-t&- I'M 0·86- I'll
11 1'10 0·66- 1'117 0"1- 1·16
" - n 12 1'10 0-0-.. 2·011 O·t&- 1·88
0 0'00 0-00- n'M 0-00- 10'70 18 1'80 0'011- 2·11 O'M- 1·61
1 B"6 o-oe- 17'70 0-.11·118 U 1'.0 0·77-1'34 o·eo- 1·06
2 0'110 0'86- 11·77 O·B&- 18·18 16 1'60 o·at- 1-'7 0·07- 1'78
,
B
6
10'M
18'711
2'111- 17-86
8'118- 81·00
6'86- 86·77
I'ZO- BI·1I8
2·&1- 87·.0
10
17
18
1'00
1'70
I'SO
0'91- 2'611
0'l1li- 1·71
1·07- I'M
0-74- 1'111
0'81- B-ot
0'88- B'17
17'" 8·91- '1-67
0 10'011 7'l1li- 811'72 6'06- '6'U 111 1'110 H6- 1'118 0'116- 8'80
7 "'14 10'80- ta·u N&- ,",as 10 1-00 1·.... B-08 I-G1- 8'd
8 17-611 11'78- '1'" lI'Ot- 61'110
II 81-08 16'18- 60-88 11·86- 1\0'61
10 h·" 17'114- U'88 u·zo- 611'111

TABLE A3
The Wilcoxon test for two independent samples

See § 9.3. The sample sizes are n₁ and n₂. If the sample sizes are not equal, n₁
is taken as the smaller. If the rank sum for sample 1 (that with n₁ observations)
is equal to or less than the smaller tabulated value, or equal to or greater than the
larger tabulated value, then P (two tail) is equal to or less than the figure at the
head of the column. If the null hypothesis were true, P would be the probability
of observing a rank sum equal to or greater than the larger figure, or equal to or
less than the smaller. If one or both samples contain more than 20 observations,
use the method described at the end of § 9.3. M. I. Sutcliffe's table reproduced
from Mainland (1963) by permission of the author and publisher.

            P (approx.)                         P (approx.)
n₁   n₂   0·10    0·05    0·01       n₁   n₂   0·10    0·05    0·01

2 4 3 18 16; 61 13; 63 8;68


6 3; 13 19 18; 63 13;68 9;80
8 3;16 20 17;66 14;68 9; 83
7 3; 17
8 4;18 3;19 4 4 11; 26 10; 28
6 12; 28 11; 29
9 4; 20 3; 21 8 13; 31 12; 32 10; 34
10 4;22 3; 23 7 14; 34 13; 36 10; 38
11 4; 24 3; 26 8 16; 37 14; 38 11; 41
12 6; 26 4; 28
13 6; 27 4; 28 9 18;40 14;42 11;45
10 17; 43 16;45 12;48
14 8 ..18 4;30 11 18;48 18; 48 12; 62
16 8;30 4; 32 12 19;49 17; 61 13;66
18 8; 32 4; 34 13 20;62 18;64 13;69
17 8; 34 6; 36
18 7; 36 6; 37 14 21;66 19;67 14; 82
16 22;68 20;80 16; 86
19 7; 37 6;39 3; 41
20 7; 39 6; 41 3;43 18 24;80 21; 83 16; 89
17 26; 83 21; 87 18;72
3 3 8;16 18 28; 88 22;70 18; 78
4 8;18
6 7; 20 8; 21 19 27; 89 23; 73 17; 79
8 8; 22 7;23 20 28; 72 24; 78 18; 82
7 8; 26 7; 28
6 6 19; 38 17; 38 16;40
8 9; 27 8; 28 8 20;40 18;42 18;44
9 10; 29 8; 31 8;33 7 21;44 20;46 18; 49
10 10; 32 9;33 8;38 8 23;47 21; 49 17; 63
11 11;34 9; 38 8; 39 9 24; 61 22; 63 18; 67
12 11; 37 10; 38 7;41
10 28; 64 23; 67 19; 81
13 12; 39 10; 41 7;44 11 27; 68 24; 81 20; 86
14 13; 41 11; 43 7; 47 12 28; 82 28; 84 21; 89
16 13;44 11; 48 8;49 13 30;86 27; 88 22; 73
18 14;48 12;48 8;62 14 31; 89 28;72 22; 78
17 16;48 12; 61 8;66

            P (approx.)                         P (approx.)
n₁   n₂   0·10    0·05    0·01       n₁   n₂   0·10    0·05    0·01

15 33,72 29,78 23,82 8 11 59,101 55; 106 49,111


18 34,78 30,80 24; 88 12 82,108 58,110 51, 117
17 36; 80 32; 83 25,90
18 37,83 33; 87 28;94 13 04;112 80; 118 53, 123
19 38; 87 14;91 27,98 14 87,117 82, 122 54; 130
15 89, 123 65; 127 58;138
20 40; 90 35; 96 28; 102 18 72; 128 87; 133 58,142
17 75;133 70; 138 80; 148
8 8 28;50 28; 52 23; 56
7 29; 56 27; 57 24; 80
18 77; 139 72; 1" 82; 1M
8 31;59 29,81 25; 65
19 80,1" 74;150 04, 180
9 33,83 31; 65 28; 70 77; 155 88; 188
20 83; 149
10 36,87 32;70 27,76

11 37; 71 34,74 28; 80 9 9 88,105 82; 109 68; 116


12 38; 78 35; 79 30; 84 10 89; 111 65,115 58; 122
13 40,80 37,83 31; 89 11 72; 117 88; 121 81; 128
14 42; 84 38; 88 32; 94 12 76,123 71; 127 83; 135
15 ",88 40; 92 33; 99 13 78;129 73; 1M 65;142

18 48,92 42; 98 34; 104 14 81;135 78; 140 87; 149


17 47; 97 43;101 38;108 15 84; 141 79; 148 89; 158
18 49; 101 45,106 37; 113 18 87; 147 82,162 72;182
19 51; 106 48; 110 38; 118 17 90;153 84,169 74,189
20 53; 109 48,114 39;123 18 93;169 87; 165 78; 178

7 7 39; 88 38; 89 32,73 19 98; 165 90; 171 78; 183


8 41; 71 38; 74 34; 78 20 99; 171 93,177 81; 189
9 43;78 40; 79 36; 84
10 46; 81 42,84 37,89 10 10 82,128 78;132 71; 139
11 47; 88 ",89 38; 96 11 88; 134 81;139 73,147
12 89; 141 84;148 78; 1M
12 49; 91 48,94 40, 100 13 92; 148 88,152 79; 181
13 62; 96 48; 99 41;108 14 98;1154 91; 159 81; 189
14 154; 100 60; 104 43, III
15 58,106 62;109 ",117 15 99; 181 94;188 84; 178
18 68,110 154; 114 48; 122 18 103;187 97; 173 88,184
17 108; 174 100,180 89; 191
17 81; 114 68,119 47,128 18 110; ISO 103; 187 92; 198
18 83; 119 58; 124 49; 133 19 113; 187 107; 193 94; 208
19 65; 124 80;129 50; 139
20 87; 129 82,114 52; 114 20 117;193 110; 200 97; 213

8 8 51; 85 49; 87 43; 93 11 11 100;153 98; 167 87; 188


9 154;90 51; 93 45; 99 12 104;180 99; 165 90;174
10 58; 98 58; 99 47; 106 13 108,187 103,172 93,182

P (approx.)                                P (approx.)

n1  n2    0·10     0·05     0·01           n1  n2    0·10     0·05     0·01
I. 17. 1{;'fI; 180 H6; 190 19 2M 293 l{;8; 308
16 116; 181 110; 187 99; 198 20 197; 293 188; 302 172; 318

16 188 195 CE2; 206 15 273 281 ;2M


17 196 202 {kll; 21. 16 ; 283 290 '75; 3011
18 127; 203 121;209 08; 222 17 203; 292 195; 300 180; 3111
19 131;210 12.; 217 11; 230 18 208;302 200; 310 1M; 326
20 1311; 217 128; 22. !oi; 238 19 21.; 311 2011; 320 189;336

12 180 ISli £<35; 1911 20 320 330 3.7


13 1211; 187 119; 193 109; 203
I. 129; 195 123; 201 112; 212 16 16 219; 309 211;317 196; 332
15 203 209 ill; 221 17 319 ; 327 ; 3.3
16 210 ; 217 L9;229 18 ; 329 338 354
19 ; 339 348 366
17 142; 218 1311; 2211 122; 238 20 2.3; 349 234; 3118 215; 377
18 146; 226 139;233 125; 247
19 23. 241 L{;9; 255 17 346 3511 372
20 241 ; 249 1{;2; 260i 18 357 366 3M
19 262; 367 2112; 377 234; 3911
13 13 142; 209 136;215 125; 226 20 268;378 258; 388 239; 407
14 ; 217 ; 223 1{;9; 235
15 225 232 Z{;3;U4 18 386 396 414
16 234 240 L{;6; 2M 19 ; 397 407 426
17 161; 2.2 15.; 249 140; 263 20 294; 408 283; 419 263; .39

18 250 258 ; 272 19 428 .38 458


19 ; 268 266 ; 282 20 440 451 471
20 1711; 267 167; 275 151; 291
20 20 M8; 472 337; 483 315; 5011
14 240 160;246 147; 259
III ; 249 2116 ; 269
16 258 265 279
17 182; 266 172; 276 159;289
18 187; 275 179; 283 163; 299
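The rank sums compared with these limits are simple to compute. As an illustration only (not part of the original tables), a minimal sketch in Python of the procedure of § 9.3: pool the two samples, rank the pooled observations (mid-ranks for ties), and sum the ranks of the smaller sample.

```python
def rank_sum(smaller, larger):
    """Sum of ranks of the smaller sample when both samples are
    pooled and ranked together (mid-ranks are used for ties)."""
    pooled = sorted(smaller + larger)

    def mid_rank(v):
        first = pooled.index(v) + 1          # rank of first occurrence
        return first + (pooled.count(v) - 1) / 2

    return sum(mid_rank(v) for v in smaller)
```

If the sum falls on or outside the pair of limits tabulated for the sample sizes n1 and n2, the two-tailed P is at most the value at the head of the column.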
TABLE A4

The Wilcoxon signed ranks test for two related samples

See § 10.4. The number of pairs of observations is n. The table gives the values
of T (defined as the sum of positive ranks, or the sum of negative ranks, which-
ever is the smaller) for various values of P (the probability of a value of T equal
to or less than the tabulated value if the null hypothesis is true). If there are
more than 25 pairs of observations, use the method described at the end of § 10.4.
Adapted from Wilcoxon and Wilcox (1964), with permission.

P value (two tail)

n        0·05      0·02      0·01

4    †0 (P = 0·125)
5    †0 (P = 0·0625)
6    0
7 2 0
8 4 2 0
9 6 3 2
10 8 5 3

11 11 7 5
12 14 10 7
13 17 13 10
14 21 16 13
15 25 20 16

16 30 24 20
17 35 28 23
18 40 33 28
19 46 38 32
20 52 43 38
21 59 49 43
22 66 55 49
23 73 62 56
24 81 69 61
25 89 77 68

† It is not possible to reach a value of P as small as 0·05 with such small samples
(see §§ 6.2 and 10.4). The values of P for T = 0 are given.
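The statistic T defined above can be checked by direct computation. A minimal sketch in Python, for illustration only (the tie-free case is assumed):

```python
def signed_rank_T(x, y):
    """Wilcoxon signed ranks statistic for two related samples:
    the smaller of the sums of positive and of negative ranks."""
    # Differences between pairs; zero differences are discarded
    d = [a - b for a, b in zip(x, y) if a != b]
    # Rank the absolute differences, smallest first (no ties assumed)
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    pos = sum(r for r, di in zip(ranks, d) if di > 0)
    neg = sum(r for r, di in zip(ranks, d) if di < 0)
    return min(pos, neg)
```

A computed T equal to or less than the value tabulated for n pairs is significant at the P shown at the head of the column.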

TABLE A5

The Kruskal-Wallis one way analysis of variance on ranks (independent samples)

See § 11.5. For each value of H the table gives the value of P (the
probability of observing a value of H equal to or greater than the tabulated value
if the null hypothesis is true, found from the randomization distribution of rank
sums). The table deals only with k = 3 groups, the number of observations in each
group (n1, n2, n3) being up to 5. For larger or more groups use the
method described at the end of § 11.5. From Kruskal and Wallis (1952, J. Amer.
statist. Ass. 47, 614; 48, 910) with permission of the author and publisher.

Sample sizes                               Sample sizes

n1  n2  n3        H          P             n1  n2  n3        H          P
2  1  1       2·7000     0·500
4 1 I'I214 4'%,057
4·5000 0·076
2 2 1 3·6000 0·200 4·0179 0·114
2 2 I,5714 0·067
:~'7143 0·200 4 2 I,oooo {l'014
5·3333 0·033
5·1260 0·052
3 1 3·2000 0·300 4·4583 0·100
),5:667 y,l05
3 I'2857 0·100
3·8571 0·133 4 3 1 5·8333 0·021
5·2083 0·050
3 2 I'3572 0·029 I ,0000 Y'057
Y,7143 0·048 Y'I~Y56 I,'093
4·5000 Y·067 ;SHi889 Y129
4·4643 0·105
4 3 2 6·4444 0·008
3 zH429 0·043 03,03000 03,011
03,5714 0·100 03,4«4 03'046
4·0000 0·129 5·4000 0·051
4·5111 0·098
3 2 £,s'25oo 0·011 4'%,4444 0,102
03,3611 0·032
4'%,1389 03·061
4·5556 0·100 4 3 3 6·7455 0·010
4·2600 0·121 6·7091 0·013
03,5:909 03,046
3 3 03,2000 ¥?ij04 4'%,)0373 03050
6·4889 0·011 4·7091 0·092
5·6889 0·029 4·7000 0·101
4'%,6000 0·050
030667 Y'086 4 1 03'4'%0367 Y'010
4'%,6222 EHoo 03'1667 03,022
4·9667 0·048
4 3·5714 0·200 4·8667 O·OM
Tables 407
Sample sizes                               Sample sizes

n1  n2  n3        H          P             n1  n2  n3        H          P

401867 0·082 6 3 3 7·0788


.0·009
,·0867 0'102 8·9818 0·011

4 , 2 7·03M 0·006
6·M86
6·6162
0·Of9
0·061
8·8727 0·011 4'6333 0·097
6·4646 0'Of6 4,'121 0·109
6.23M 0·062
'·6646 0·098 5 4 I 8·9645 0·008
"«55 0·103
, , 3 H439 0·010
8·MOO
'·9865
'·8800
0·011
O·Off
0·068
H3M 0·011
3·9873 0·098
6·5986 0·Of9
3·9800 0·102
5·5758 0·051
'·6456
4,'773
0·099
0.102 5 , 2 7·2Of6 0·009
, , 4 7011638 0·008
H182
6·2727
0·010
0'Of9
7·6386 0·011 5·2682 0·060
5·6923 0'Of9 '·6409 0·098
5·6638 0·064 '·6182 0·101
0·097
'·6639
'·6001 O'IOf
6 , 3 N"9 0'010
5 I I 3·8671 0,1'3 7·39f9 0·011
6·66M 0'Of9
5 2 I 6·2600 0·036 6·8308 0·050
6·0000 0·Of8 '·6487 0·099
'·4600 0·071 '·6231 0·103
,·2000
'·0600
0·096
0·119 5 4 , 7'78Of 0·009
7-7"0 0·011
5 2 2 6·5333 0·008 6·6671 0·Of9
6·1333 0·013 6·8178 0·060
6-1600 0·034 ,·8187 0·100
5·OfOO 0·066 '·6627 0·102
4·3733 0·090
,·2933 0·122
6 6 I 7·3091 0·009
5 3 I 6,'000 0·012 8'83M 0·011
4·9800 0·048 6-1273 0'Of8
,·8711 0·062 ,·9091 0·063
'·0178 0·096 H091 0·088
3·MOO 0'123 "03M 0·106

6 3 2 8·9091 0·009 6 6 2 7'3386 0·010


6·8218 0·010 7·2892 0'010
5·2609 0·049 6·3386 0'Of7
6-1066 0·062 6·2'82 0·061
'·6609 0·091 '·8231 0'097
"'9f6 0·101 '·5077 0'100

Sample sizes                               Sample sizes

n1  n2  n3        H          P             n1  n2  n3        H          P

6 6 3 7·6780 0·010 6·6429 0-000


7·6429 0·010 '·6229 0-099
6·7065 0-046 '·6200 0·101
6·6264 0·061
'·6461 0·100 6 6 5 8·0000 0-009
'·6363 0·102 7·9800 0·010

6 6 , 7·8229 0·010
6·7800
6·6800
0·049
0·061
7·791' 0·010 '·6600 0·100
6·6867 0·049 '·6000 0·102
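The statistic H tabulated above is found from the rank sums of the pooled observations: H = 12/[N(N + 1)] Σ Rj²/nj − 3(N + 1). A minimal sketch in Python, for illustration only (mid-ranks are used for ties, but no tie correction is applied):

```python
def kruskal_wallis_H(*groups):
    """Kruskal-Wallis statistic: all observations are pooled and
    ranked, and H is found from the rank sum of each group."""
    pooled = sorted(x for g in groups for x in g)

    def mid_rank(v):
        first = pooled.index(v) + 1
        return first + (pooled.count(v) - 1) / 2

    N = len(pooled)
    total = sum(
        sum(mid_rank(x) for x in g) ** 2 / len(g) for g in groups
    )
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)
```

For example, samples of sizes 2, 1 and 1 with no overlapping values give H = 2·7, in agreement with the first entry in the table.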

TABLE A6

The Friedman two way analysis of variance on ranks for randomized block experiments

See § 11.7. For each value of S the table gives the exact value of P (the probability of observing a value of S equal to or greater
than the tabulated value if the null hypothesis is true, found from the randomization distribution of rank sums). Approximate P
values are given at the head of the column. If the number of treatments, k, or the number of observations per treatment (= number
of blocks, n) is too large for this table, use the method described at the end of § 11.7. From Friedman, M. (1937, J. Amer. Statist. Ass.
32, 688), by permission of the author and publisher.

Number of treatments
          k = 3                                    k = 4                                    k = 5

No. of    P ≈ 0·05   P ≈ 0·01   P ≈ 0·001          P ≈ 0·05   P ≈ 0·01   P ≈ 0·001          P ≈ 0·05   P ≈ 0·01   P ≈ 0·001
blocks
n         S    P     S    P     S    P             S    P     S    P     S    P             S    P     S    P     S    P

2 20 0-042
3 18 0·028 37 0·033 64 0·046 78 0·0078 88 0·0009
4 26 0·042 32 0·0046 52 0·036 64 0·0069 74 0·0009
5 32 0·039 42 0·0085 50 0·0008 65 0·044 83 0·0087 105 0·0008
6 42 0·029 54 0·0081 72 0·0001 76 0·043 100 0·0100 128 0·0009
7 50 0·027 82 O-OOM 88 0-0003
8 60 0-047 72 ()O0099 98 0-0009
9  68 0·048  78 0·0100 114 0·0007
10 82 0·048  98 0·0075 128 0·0008

TABLE A7

Table of the critical range (difference between rank sums for any two
treatments) for comparing all pairs in the Kruskal-Wallis nonparametric
one way analysis of variance (see §§ 11.5 and 11.9)

Values for which an exact P is given are abridged from the tables of McDonald
and Thompson (1967), the remaining values are abridged from Wilcoxon and
Wilcox (1964). Reproduction by permission of the authors and publishers. †Not
attainable. Number of treatments (samples) = k. Number of observations (repli-
cates) per treatment = n.

P (approximate)                            P (approximate)
       0·01           0·05                        0·01           0·05
       crit.          crit.                       crit.          crit.
k   n  range    P     range    P           k   n  range    P     range    P
8 2 t 8 0·067 5 2 18 0·018 15 0·048
8 17 0·011 15 O·OM 3 32 0·007 28 0·060
4 27 0·011 24 0·043 4 50 0·010 44 0·068
6 89 0·009 33 0·048 6 76·8 83·6
8 61 0·011 43 0·049 8 99·3 88·2
7 87-8 154·4 7 124·8 104·8
8 82·4 88·3 8 162·2 127·8
9 98-1 78·9 9 18H 162·0
10 114·7 92·3 10 212·2 177·8
11 132-l 106·3 II 244·8 206·0
12 150·4 120·9 12 278·6 233-4
13 189·4 138·2 13 313·8 283·0
14 18H 152-l 14 360·6 298·8
16 209·8 188·8 16 388·6 326·7
18 230·7 186·8 18 427·9 368·8
17 262·6 203·1 17 488·4 392·8
18 276·0 221-2 18 510·2 427·8
19 298-1 239·8 19 563-l 483·8
20 321-8 268·8 20 597·2 500·6
4 2 t 12 0·029 8 2 20 0·010 19 0·030
3 24 0·012 22 0·043 3 39 0·009 35 0·066
4 38 0·012 34 0·049 4 87·3 67·0
5 58·2 48·1 I) 93·8 79·3
8 78·3 82·9 8 122·8 104·0
7 95·8 79·1 7 1154·4 130·8
8 118·8 98·4 8 188·4 159·8
9 139·2 114·8 9 224·5 190·2
10 182·8 134·3 10 282·7 222·8
II 187·8 164·8 II 302·9 268·8
12 213·6 178·2 12 344·9 292·2
13 240·8 198·6 13 388·7 329·3
14 288·7 221-7 14 434·2 387·8
16 297·8 246·7 16 481-3 407·8
18 327·9 270·8 18 63()O1 449·1
17 369·0 298·2 17 680·3 491-7
18 391·0 322·8 18 832-1 636·6
19 423·8 349·7 19 886·4 68008
20 467·8 377-8 20 740·0 828·9

TABLE A8

Table of the critical range (difference between rank sums for any two
treatments) for comparing all pairs in the Friedman nonparametric two
way analysis of variance (see §§ 11.7 and 11.9)

Values for which an exact P is given are abridged from McDonald and
Thompson (1967), the remaining values are abridged from Wilcoxon and Wilcox
(1964). Reproduction by permission of the authors and publishers. †Not attain-
able. Number of treatments = k. Number of replicates (= number of blocks) = n.

P (approximate)                            P (approximate)
       0·01           0·05                        0·01           0·05
       crit.          crit.                       crit.          crit.
k   n  range    P     range    P           k   n  range    P     range    P

3 3 t 6 0·028 6 2 t 8 0·060
4 8 0·005 7 0·042 3 12 0·002 10 0·067
6 9 0·008 8 0·039 4 14 0·006 12 0·064
6 10 0·009 9 0·029 6 16 0·006 14 0·040
7 II 0·008 9 0·061 6 17 0·013 16 0·049
8 12 0·007 10 0·039 7 19 0·009 16 0·062
9 12 0·013 10 0·048 8 20 0·012 18 0·036
9 22 0·008 19 0·037
10 13 0·010 11 0·037 10 23 0·009 20 0·038
11 14 0·008 II 0·049 II 24 0·010 21 0·038
12 14 0·012 12 0·038 12 26 O'OII 22 0·038
13 16 0·009 12 0·049 13 26 O'OII 23 0·036
14 16 0·007 13 0·038 14 27 o·on 24 0·034
16 16 0·010 13 0·047 16 28 0·010 24 0·046
16 16·6 13·3 16 29-1 24·4
17 17·0 13·7 17 30·0 26·2
18 17·6 14·1 18 30·9 26·9
19 18·0 14·4 19 3J07 26·6
20 18·4 14·8 20 32·6 27·3

4 2 t 6 0·083 6 2 t 10 0·033
3 9 0·007 8 0·049 3 14 0'008 13 0·030
4 II 0·006 10 0·026 4 17 0·006 16 0·047
5 12 0·013 II 0·037 6 19 0·010 17 0·047
6 14 0·006 12 0·037 6 21 0·010 19 0'040
7 16 0·008 13 0·037 7 23 0·010 20 0·049
8 16 0·009 14 0·034 8 26 0·008 22 0·039
9 17 0·010 15 0·032 9 26 0·012 23 0·043

10 18 0·010 15 0·046 10 28 0·009 24 0·047


II 19 0·009 16 0·041 II 29 0·012 26 0·036
12 20 0·008 17 0·038 12 31 0·009 27 0·039
13 21 0·008 18 0·032 13 32 0·010 28 0·039
14 21 0·011 18 0·042 14 33 O'OII 29 0·040
15 22 0·010 19 0·037 15 34 0·012 30 0·040
16 22·7 19 16 35·6 30·2
17 23·4 19·3 17 36·7 3J.l
18 24-1 19·9 18 37·8 32·0
19 24·8 20·4 19 38·8 32·9
20 25·4 21·0 20 39·8 33·7

TABLE A9

Rankits (expected normal order statistics)

The use of rankits to test for a normal (Gaussian) distribution is described
in § 4.6. The observations are ranked, the rankit is found from the table, and
plotted against the value of the observation (or any desired transformation of the
observation). Negative values are omitted for samples larger than 10. By analogy
with the smaller samples the rankit for the seventh observation in a sample of 11
is clearly -0·225 and that for the seventh in a sample of 12 is -0·103. The
table is Bliss's (1967) adaptation of that of Harter (1961, Biometrika 48, 151-65).
Reproduced with permission.
Rank          Size of sample = N
order     2       3       4       5       6       7       8       9      10
1      0·564   0·846   1·029   1·163   1·267   1·352   1·424   1·485   1·539
2     -0·564   0·000   0·297   0·495   0·642   0·757   0·852   0·932   1·001
3             -0·846  -0·297   0·000   0·202   0·353   0·473   0·572   0·656
4                     -1·029  -0·495  -0·202   0·000   0·153   0·275   0·376
5                             -1·163  -0·642  -0·353  -0·153   0·000   0·123
6                                     -1·267  -0·757  -0·473  -0·275  -0·123
7                                             -1·352  -0·852  -0·572  -0·376
8                                                     -1·424  -0·932  -0·656
9                                                             -1·485  -1·001
10                                                                    -1·539
          11      12      13      14      15      16      17      18      19      20
1 1·51l6 1'629 1·668 1'708 1·786 1·766 1·794 1'820 1'844 1'867
2 1'062 1'116 1-164 1·208 1·248 1·285 1·819 1·350 1'880 1'408
3 0'729 0'703 0'850 0·001 0·048 0'000 1'029 1'066 1'099 1·131
4 0'462 0·537 0·603 0·662 0·716 0'763 0·807 0·848 0'886 0'921
6 0·225 0'312 0·388 0'456 0·516 0'570 0·6111 0·666 0'707 0'746
6 0·000 0·103 0·101 0·267 0·335 0·306 0·451 0·502 0'648 0'600
7 0·000 0'088 0·165 0'234 0·295 0·351 0'402 0'448
8 0·000 0·077 0'146 0'208 0'264 0·815
9 0·000 0'069 0·131 0·187
10 0·000 0·062
          21      22      23      24      25      26      27      28      29      30
1 1-1180 1'910 1·929 1'048 1·966 1·082 1'908 2'014 2'029 20(M3
2 1'434 1·468 1·41l1 1'503 1'524 1·544 1·563 1'581 1'599 1·616
3 H60 1·18K 1·214 1·239 1·263 1'285 1'306 1·327 1'846 1·366
4 0·954 0·118& 1'014 1·041 1·067 1'091 1'11& 1-137 1·158 1-179
6 0'782 0·815 0'847 0·877 0'005 0'932 0·957 0'981 1'004 1-()!6
6 0·630 0·667 0·701 0·734 0·764 0·793 0·820 0·846 0'871 0·894
7 0·401 0'532 0·569 0·604 0·637 0·668 0·607 0'725 0'752 0'777
8 0·362 0'406 0·446 0·484 0·619 0·063 0'584 0·614 0·642 0'6611
0 0·238 0·286 0·330 0'370 0'409 0'444 0·478 0·510 0·540 0·668
10 0'118 0·170 0·218 0·262 0·303 0'341 0·377 0,'11 0'443 0·473
11 0·000 0·056 0·108 0·156 0·200 0·241 0'280 0'316 0'350 0·382
12 0·000 0·052 0·100 0·144 0·185 0·224 0·260 0'294
13 0'000 0·048 0'092 0·134 0'172 0·20\1
a 0·000 0·044 0'086 0'125
15 0·000 0'041

TABLE A9 (Continued)

Rank          Size of sample = N
order     31      32      33      34      35      36      37      38      39      40
1 2'OS6 2·070 2·082 2'OIIS 2'107 2'118 2·1211 2-140 2·1S1 2·161
2 1'632 1·647 1'662 1·676 1·690 1'704 1·717 1·7211 1'741 1'753
3 1'383 1·400 1'416 10432 l'U8 1·462 1'477 1·491 1·004 I'S17
4 1·198 1'217 1·23S 1'2S2 1·2611 1'2!!S 1·300 1·315 1·330 1·3..
S 1'047 1'067 1'087 1-105 1'123 1-140 1-1S7 1-173 1·188 1'203
6 0·917 0·938 0·050 0'079 0·098 1'018 1'034 1'051 1'087 1'083
7 0'801 0'824 0'846 0'867 0'R87 0·008 0·925 0'043 0'960 0'977
8 0·694 0·719 0'742 0·764 0'786 0·806 0·826 0'845 0·t!63 0·881
9 0'595 0'621 0'646 0'670 0·692 0·714 0'735 0'755 0·774 0'793
10 0'5O'l 0'S29 0·SS6 0'580 0'604 0·627 0'649 0'870 0·690 0'710
11 0'413 0·442 0'469 0'406 0'521 0'545 0·568 0·590 0'611 0·632
12 0·327 0·358 0·387 0'414 0'441 0'466 0'490 0'514 0'536 0'557
13 0·243 0·276 0·307 0'336 0·364 0·390 0'416 0'440 0'463 0·486
14 0'161 0·106 0·228 0'2611 0·289 0'317 0·343 0'369 0·3113 0·U7
16 0·080 0·117 0'IS1 0·184 0·215 0'24S 0'273 0'300 0'325 0'300
16 0'000 0·0311 0·076 0·110 0·143 0·174 0·203 0·232 0·2li8 0·284
17 0·000 0·037 0·071 0·104 0'135 0'165 0·193 0·220
18 0'000 0·035 0'067 0'0119 0.128 0'158
19 0·000 0·033 0·064 0·0114
20 0'000 0·081
          41      42      43      44      45      46      47      48      49      50
I 2·171 2'180 2·1110 2'100 2·208 2'216 2'22S 2·233 2'241 2·2411
2 1·765 1·776 1'71l7 1·797 1'807 HIl7 1'827 I-I!:i7 1·846 1'855
3 !-li30 1'542 1·564 1-565 1'577 1·588 1'598 1'609 1·619 1·629
4 1'357 1·370 1'3l13 1·396 1·408 1·420 1'431 1·442 1·4S3 1-464
5 1'218 1'232 1·246 1'269 1'272 1·284 1·206 1·308 1'320 1·331
6 1'009 H14 1·12R 1·142 1·1SO 1'1611 1-182 H94 1·207 1·218
7 0'11113 1·00II 1-11'24 1·039 1·054 1'068 1·081 1·094 1·107 1-119
8 O'llllll 0·915 0·031 0·946 0'1161 0·076 0·090 1·004 1·017 1'030
9 0·811 0'1!2l! 0·845 0'861 0'R77 0'89'2 0·907 0'021 0'93S 0·9411
10 0·720 0'747 0·764 0·781 0·798 0·814 0'829 0·844 0·860 0'878
11 0'651 0·671 0'6R9 0'707 0·724 0·740 0'7S7 0'772 0·787 0·802
12 0·578 0·508 0·617 0·636 0·854 0'871 0'6R8 0'704 0'720 0'73S
13 0·007 0'528 O'MR O-[,6R 0-588 0·804 0'822 0-639 0'65S 0·871
14 0-430 0·461 0·4R2 0-502 0-522 O'S40 0-650 0-576 0'S93 0'610
15 0·373 0'398 0-418 0'430 0·459 0'479 0'498 0-516 0'584 0·561
18 0·8011 0·383 0·355 0·377 0-398 0'410 0'438 0-457 0·478 0-494
17 0'246 0·270 0'294 0'317 0'339 0-360 0-381 0'400 0'419 0'438
18 0·183 0-200 0·234 0·258 0-281 0-303 0-324 0'345 0'384 0'384
19 0'122 0-149 0'175 0-200 0·224 0-247 0·289 0-290 0'310 0·330
20 0-061 0-0811 0·118 0-142 0-167 0-191 0·214 0·238 0-2S7 0·278
21 0-000 0-030 0·058 0'OR5 0-111 0-138 0'160 0'183 0·205 0-227
22 0·000 0·028 0-055 0-081 0-108 0'130 0'153 0·176
23 0-000 0'027 0·053 0'078 0·102 0·125
24 0·000 0'026 O'OM 0'07:;
25 0'000 0'025
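Each rankit is the expected value of an order statistic from the standard Gaussian distribution. As an illustration only (this is not Harter's method, merely a direct check), a minimal sketch in Python reproducing the tabulated values by numerical integration of the order-statistic density:

```python
import math

def phi(z):
    """Standard Gaussian density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def rankit(i, n, lo=-8.0, hi=8.0, steps=4000):
    """Expected value of the i-th largest of n standard Gaussian
    observations, by trapezoidal integration of z times the
    density of the i-th largest order statistic."""
    coef = n * math.comb(n - 1, i - 1)
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        z = lo + s * h
        f = coef * z * Phi(z) ** (n - i) * (1 - Phi(z)) ** (i - 1) * phi(z)
        total += f * (h / 2 if s in (0, steps) else h)
    return total
```

For instance rankit(1, 2) gives 0·564 and rankit(1, 10) gives 1·539 to three decimals, in agreement with the first rows of the table.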

References

ACTON, F. S. (1959). Analysis of straight-line data. Wiley, New York.
BAILEY, N. T. J. (1964). The elements of stochastic processes. Wiley, New York.
-- (1967). The mathematical approach to biology and medicine. Wiley, London.
BAIN, W. A. and BATTY, J. E. (1956). Inactivation of adrenaline and noradrena-
line by human and other mammalian liver in vitro. Br. J. Pharmacol. 11,
52-7.
BARTLETT, M. S. (1947). The use of transformations. Biometrics 3, 39-52.
BAYES, T. (1763). An essay towards solving a problem in the doctrine of chances.
Phil. Trans. R. Soc. 53, 370.
g""",,""ABD, C. (1965)" i~ileT"""~",,,,,:i1on ro the 8ludll oj medicine. Collier
Boob edition (Wlclj" g'=:w:b, New York"
C. I. (194n experiments g:w~up8 for use in
hiological assays" 69-88.
C. I. (1967). il£,logy, Vol. I. MCftc""m~g:wlL
gnYD, I. A. and (1956). The in mammalian
muscle. J. Phllriol., Lond. IIJ, 74-91.
BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations. Jl R.
statist. Soc. B26, 211-43.
BROWNLEE, K. A. (1965). Statistical theory and methodology in science and
engineering, 2nd edn. Wiley, New York.
BURN, J. H., FINNEY, D. J., and GOODWIN, L. G. (1950). Biological standardiza-
tion, 2nd edn. Oxford University Press.
BURNSTOCK, G. and HOLMAN, M. E. (1962). Spontaneous potentials at sympath-
etic nerve endings in smooth muscle. J. Physiol., Lond. 160, 446-60.
COCHRAN, W. G. (1952). The χ² test of goodness of fit. Ann. math. Stat. 23,
315-45.
-- and COX, G. M. (1957). Experimental designs. Wiley, New York;
Chapman and Hall, London.
COLQUHOUN, D. (1963). Balanced incomplete block designs in biological assay
illustrated by the assay of gastrin using a Youden square. Br. J. Pharmac.
Chemother. 21, 67-77.
-- (1968). The rate of equilibration in a competitive n drug system and the
auto-inhibitory equations of enzyme kinetics: some properties of simple
models for passive drug transfer. Proc. R. Soc. B …
-- (1969). A comparison of estimators for a two-parameter hyperbola. Jl R.
statist. Soc. Ser. C (Applied Statistics) 18, 130-40.
-- Rapid histamine assay: … method and some
theoretical considerations. … Pharmac. …
COX, D. R. (1962). Renewal theory. Methuen, London. Paperback (1967).
-- and LEWIS, P. A. W. (1966). The statistical analysis of a series of events.
Methuen, London.
CUSHNY, A. R. and PEEBLES, A. R. (1905). The action of optical isomers. II.
Hyoscines. J. Physiol., Lond. 32, 501-10.
DEWS, P. B. and BERKSON, J. (1954). Statistics and mathematics in Biology,
(Eds. O. Kempthorne, Th.A. Bancroft, J. W. Gowen, and J. L. Lush), pp.
361-70. Iowa State College Press.
Documenta Geigy scientific tables, 6th edn (1962). J. R. Geigy, S. A. Basle, Switzer-
land.
DOWD, J. E. and RIGGS, D. S. (1965). A comparison of estimates of Michaelis-
Menten kinetic constants from various linear transformations. J. biol. Chem.
240, 863-9.
DRAPER, N. R. and SMITH, H. (1966). Applied regression analysis. Wiley, New
York.
DUNNETT, C. W. (1964). New tables for multiple comparisons with a control.
Biometrics 20, 482-91.
DURBIN, J. (1951). Incomplete blocks in ranking experiments. Br. J. statist.
Psychol. 4, 85-90.
FELLER, W. (1957). An introduction to probability theory and its applications,
Vol. 1, 2nd edn. Wiley, New York.
-- (1966). An introduction to probability theory and its applications, Vol. 2.
Wiley, New York.
FINNEY, D. J. (1964). Statistical method in biological assay, 2nd edn. Griffin,
London.
-- LATSCHA, R., BENNETT, B. M., and HSU, P. (1963). Tables for testing
significance in a 2 × 2 table. Cambridge University Press.
FISHER, R. A. (1951). The design of experiments, 6th edn. Oliver and Boyd,
Edinburgh.
-- and YATES, F. (1963). Statistical tables for biological, agricultural and medical
research, 6th edn. Oliver and Boyd, Edinburgh.
GOULDEN, C. H. (1952). Methods of statistical analysis, 2nd edn. Wiley, New York.
GUILFORD, J. P. (1954). Psychometric methods, 2nd edn. McGraw-Hill, New York.
HEMELRIJK, J. (1961). Experimental comparison of Student's and Wilcoxon's
two sample tests. In Quantitative methods in pharmacology (Ed. H. de Jonge).
North Holland, Amsterdam.
HOOKE, R. and JEEVES, T. A. (1961). 'Direct search' solution of numerical and
statistical problems. J. Ass. comput. Mach. 8, 212-29.
KATZ, B. (1966). Nerve, muscle and synapse. McGraw-Hill, New York.
KEMPTHORNE, O. (1952). The design and analysis 0/ experiments. Wiley, New
York.
KENDALL, M. G. and STUART, A. (1961). The advanced theory of statistics, Vol. 2.
Griffin, London.
-- -- (1963). The advanced theory of statistics, Vol. 1, 2nd edn. Griffin, London.
-- -- (1966). The advanced theory of statistics, Vol. 3. Griffin, London.
LINDLEY, D. V. (1965). Introduction to probability and statistics /rom a Bayesian
viewpoint, Part 1. Cambridge University Press.
-- (1969). In his review of 'The structure of inference' by D. A. S. Fraser.
Biometrika 56, 453-6.
MAINLAND, D. (1963). Elementary medical statistics, 2nd edn. Saunders, Philadel-
phia.
-- (1967a). Statistical ward rounds-1. Clin. Pharmac. Ther. 8, 139-46.
-- (1967b). Statistical ward rounds-2. Clin. Pharmac. Ther. 8, 346-55.
MARLOWE, C. (1604). The tragicall history of Doctor Faustus. London: Printed by
V. S. for Thomas Bushell.
MARTIN, A. R. (1966). Quantal nature of synaptic transmission. Physiol. Rev. 46,
51-66.

MASSEY, H. S. W. and KESTELMAN, H. (1964). Ancillary mathematics, 2nd
edn. Pitman, London.
MATHER, K. (1951). Statistical analysis in biology, 4th edn. Methuen, London.
McDONALD, B. J. and THOMPSON, W. A. JR. (1967). Rank sum multiple com-
parisons in one- and two-way classifications. Biometrika 54, 487-97.
MOOD, A. M. and GRAYBILL, F. A. (1963). Introduction to the theory of statistics,
2nd edn. McGraw-Hill Kogakusha, New York.
NAIR, K. R. (1940). Table of confidence intervals for the median in samples from
any continuous population. Sankhya 4, 551-8.
OAKLEY, C. L. (1943). He-goats into young men: first steps in statistics. Univ.
Coll. Hosp. Mag. 28, 16-21.
OLIVER, F. R. (1970). Some asymptotic properties of Colquhoun's estimators
of a rectangular hyperbola. J. R. statist. Soc. (Series C, Applied Statistics) 19,
269-73.
PEARSON, E. S. and HARTLEY, H. O. (1966). Biometrika tables for statisticians,
Vol. 1, 3rd edn. Cambridge University Press.
POINCARÉ, H. (1892). Thermodynamique. Gauthier-Villars, Paris.
RANG, H. P. and COLQUHOUN, D. (1973). Drug receptors: theory and experiment.
In preparation.
SCHOR, S. and KARTEN, I. (1966). Statistical evaluation of medical journal
manuscripts. J. Am. med. Ass. 195, 1123-8.
SEARLE, S. R. (1966). Matrix algebra for the biological sciences. Wiley, New York.
SIEGEL, S. (1956a). Nonparametric statistics for the behavioural sciences. McGraw-
Hill, New York.
-- (1956b). A method for obtaining an ordered metric scale. Psychometrika 21,
207-16.
SNEDECOR, G. W. and COCHRAN, W. G. (1967). Statistical methods, 6th edn.
Iowa State University Press, Iowa.
STONE, M. (1969). The role of significance testing: some data with a message.
Biometrika 56, 485-93.
STUDENT (1908). The probable error of a mean. Biometrika 6, 1-25.
TAYLOR, D. (1957). The measurement of radioisotopes, 2nd edn. Methuen, London.
THOMPSON, SILVANUS P. (1965). Calculus made easy. Macmillan, London.
TIPPETT, L. H. C. (1944). The methods of statistics, 4th edn. Williams and Norgate,
London; Wiley, New York.
TREVAN, J. W. (1927). The error of determination of toxicity. Proc. R. Soc. B101,
483-514.
TUKEY, J. W. (1954). Causation, regression and path analysis. In Statistics and
mathematics in biology (Eds. O. Kempthorne, Th. A. Bancroft, J. W. Gowen,
and J. L. Lush), p. 35. Iowa State College Press, Iowa.
WILCOXON, F. and WILCOX, ROBERTA A. (1964). Some rapid approximate
statistical procedures. Published and distributed by Lederle Laboratories, Pearl
River, New York.
WILDE, D. J. (1964). Optimum seeking methods. Prentice-Hall, Englewood Cliffs,
N.J.
WILLIAMS, E. J. (1959). Regression analysis. Wiley, New York; Chapman and
Hall, London.

Index

acetylcholine release, see quantal release
adding up, see summation operations
addition rule, 19; see also probability
additivity assumption, 173
adrenaline catabolism
    fitting exponential, 234-43
    stochastic interpretation, 379
adsorption, stochastic interpretation, 374-96
all-or-nothing responses, 344-54
    linearization, see probit
    relation with IED, 348
analysis of variance, 171-213
    assumptions in, 172
    control group in, 208
    curve fitting, 214-67
    expectation of mean squares, 178, 188
    Friedman rank, 200, 209, 409 (table)
    Gaussian
        independent samples, 182, 191, 210, 234, 327
        randomized blocks, 195, 210, 288, 311, 319
    homogeneity of group variances, 176
    Kruskal-Wallis, 191, 208, 406 (table)
    mean squares, 187, 197, 229, 238
    models for observations, 172-8, 186, 196
    multiple significance tests, 207-13
    multiple regression approach, 258
    nonparametric
        independent samples, 191, 208, 406
        randomized blocks, 200, 209, 409
    one way, 182, 210
    purpose, 171, 191, 200, 208-10
    relation
        with chi-squared, 180
        with t tests, 179, 190, 196, 226, 232-4
    sum of squares, 27, 184-90, 217-20, 244-53
        additivity of, 189, 223
        working formulae, 30, 188, 224
    testing all pairs, 207-13, 410-11 (table)
    two way, 182, 210
    variance, maximum/minimum, 176
arbitrary moment in time, 84, 383
area under distribution curve, 64-9
assays, 279-364
    analytical dilution, 280
    comparative, 280
    continuous (graded) responses, 280
    designs for, 285
    direct, 344-8
    discontinuous (quantal) responses, 280, 344-64
    incomplete block, 206, 286
    interaction between responses, 288, 319
    Latin square, 288
    metameters for dose and response, 280, 287, 285, 327
    random, 285, 327
    randomized blocks, 288, 311, 319
    rapid routine, 340
    single subject, 288
    slope ratio, 281
    parallel line
        average slope, 292
        confidence limits for, 297, 308
        confidence limits, examples, 313, 317, 325, 331
        convenient base for logs, 287
        designs for, 285
        four point, see (2 + 2) dose
        interpretation, 314, 323
        (k + 1) dose, 340
        logits in, 361
        matching, 284
        numerical examples, 311-43
        optimum design, 299
        orthogonal contrasts, 303-7
        parallelism test, 300
        plotting results, 318, 330
        potency ratio, 282, 290, 308
        potency ratio, examples, 316, 324, 331, 341
        six point, see (3 + 3) dose
        slope (linear regression test), 300, 303, 306
        symmetrical, 285, 287, 289, 302, 308
        symmetrical, examples, 311, 319
        (3 + 3) dose, 284, 289, 305, 310
        (3 + 3) dose, example, 319
        (3 + 2) dose, 327
        (2 + 1) dose, 284, 340
        (2 + 2) dose, 283, 287, 302, 308
        (2 + 2) dose, example, 311
        units for doses, 316
        unsymmetrical, 285, 290, 300
        unsymmetrical, example, 327
Digitized by Google
assays (cont.)
    validity tests, 300-7
assumptions, 70-2, 86, 101-3, 111, 139,
    144, 148, 153, 158, 167, 172, 205-6,
    207; see also individual methods
    in fitting curves, 220, 234
    in multiple regression, …
asymptotic relative efficiency of significance tests, …
averages, see mean
bacterial dilutions, 41, 107
balanced incomplete blocks, see incomplete blocks
Bayes' method, 6-8, 21, 95; see also
    probability and significance tests
    example in medical diagnosis, 21-4
best estimate, 101, 216, 257-72; see also
    bias, least squares, and likelihood
bias
    of estimates in curve fitting, 216, 266-72
    in sampling, …
    statistical, 3, …
binomial distribution, see distribution
bio-assays, see assays
calibration curves, 280, 332
    relation with assays, 340
card-shuffling analysis, 118, 138, 192; see
    also randomization tests
catabolism, exponential, see adrenaline
cell distribution, 55
chi-squared
    rank, 202
    tables, 129, …
    test
        continuity correction, 129
        for goodness of fit, 132
        for more than two samples, 131
        relation with other methods, 116
        for two samples, 116, 127-32
        written as normal deviate test, 126
classification measurements, 99, 116-36
    two independent samples, 116
    two related samples, 134
coefficient of variation, …
    method for …, 58-60
    population, …
    sample, 30
    use, 40, 220
combinations, 50, 140, 152, 158, 167, …, 262, 336
confidence limits, 101-15
    for binomial p, 109, 398 (table)
    for fitted straight line, 224, 231
    for half-life, 242
    interpretation, 101-3, 108, 114, 333
    for median, nonparametric, 103, 396 (table)
    for new observations, 107
    on straight line, 227
    for ratio of normally distributed variables, 297, 308, 344-5
    for potency ratio, 325, 331, 343
    for … heart index, 111
    for rate constant, 242
    exact, 293, 332-40, 345
    for slope of straight line, 224
    for time constant, 242
    trustworthiness, 101-3
    for variance, 128
    for x value read off straight line, 224, 293, 332-40
    for y value read off straight line, …
control group, see analysis of variance
correlation, 5, 31, 169, 272-8
    coefficient
        Pearson, 109, 273
        Spearman (rank), 274, 277 (table)
    interpretation of, 5, 256, 273
covariance
    population (expectation) of, …
    formula, 32
cumulative distribution, see distribution function
curve fitting
    assumptions in, 220; see also analysis of variance
    best estimate, meaning of, 101, 216, 266-72
    confidence limits, see separate entry
    definition of sum of squares, 217
    errors in, 222
    exponential curve, 234-43
    least squares method, see separate entry
    estimates, meaning of, 252
    hyperbola, 257-72

data selection ('data snooping'), 166, 207
deduction, 1, 6
degrees of freedom, meaning, 29, 369
density, see probability density
dependent variable, 214, 216
discontinuous distribution, see distribution
distribution
    binomial, 43-52, 54, 69, 104, 109-14,
        124, 164, 359, 365, 398 (table)
    continuous, meaning, 64-9
    cumulative, see distribution function
    discontinuous, meaning of, 43-4, 64-9, 350
    exponential, 81-5, 367-8, 380, 383, 388-95
        stochastic interpretation, 81-5, 379-96
    function, 67-9, 367, 389
        examples of, 68, 82, 346-58, 380, 383, 389
        length-biased, 389-95
    Gaussian (normal), 69-76, 96-9, 101, 346-64, 366
        approximation to binomial, 62-3, 116, 124
        tests for fit, 80
        transformations to, 71, 78, 80, 221-2
    goodness-of-fit tests
        chi-squared, 132
        probit and rankit, 80
    length-biased, 85, 389-96
    lognormal, 78-80, 107, 176, 221, 289, 346-64
    meaning of, 43-4, 64-9
    multinomial, 44
    Poisson, 52-63, 81-5, 375-8, 388-95;
        see also quantal release
    skew and symmetrical, 78-80
    standard form of, 369
    standard Gaussian, 72-5, 126
    Student's t, 75-8, 148, 167
dose metameter, 280, 287
    ratio, 283
drug-receptor interaction, see adsorption
Dunnett's d statistic, 208
ED50, see median effective dose
efficiency of significance tests, 97
epinephrine, see adrenaline
error
    distribution of, see distribution
    estimates of, 1-8, 28-42; see also variance
    of the first kind, 98
    homogeneity of, see homoscedastic
    limits of, see confidence limits
    of the second kind, 93
    trustworthiness of estimates of, 1-8, 101-8
estimation, see bias, least squares, likelihood, and best estimate
exp(x), 69
expectation, 365-8
    of any function, 368
    of function of two variables, 370-3
    of mean squares in analysis of variance, 178, 186
    see also mean
experimental method, meaning, 3-8
exponential
    curve fitting, 234-43
    distribution, see distribution
F ratio, see variance ratio
factorial function, 9, 60
fiducial limits, see confidence limits
Fieller's theorem, 293
Fisher exact test for 2 × 2 table, 116, 117
    use of tables for, 122
four-point assay, see assays, parallel line, (2 + 2) dose
Friedman method, 200, 409 (table), 411 (table)
function
    expectation of, 365-8, 370-3; see also mean
    factorial, 10
    mathematical, meaning of, 9
    variance of, see variance
Gaussian (normal) distribution, see distribution
generalization, 1-8, 91, 102
Gosset, W. S., see 'Student'
half-life, 239
    confidence limits for, 242
    stochastic interpretation, 380, 386
heteroscedastic, see homoscedastic
Hill plot, 363
histogram, 44, 63, 64-8, 346-53
    area convention, 66, 350
homoscedastic, 167, 176, 221, 266, 269, 272, 281, 359
hyperbola, fitting of, 257-72, 361-4
hypothesis, 6, 87-96
IED, see individual effective dose
incomplete block designs, 206-7
    for assays, 286

independence, statistical, 20, 21, 22, 31, 44, 54, 84, 96, 275-7, 286, 375, 379, 381
  of contrasts, 302-7
independent
  samples
    classification measurements, 91, 116
    numerical measurements, 91, 137-51, 182
    rank measurements, 91, 137-48, 191
    see also significance tests, random, and sample
  variable in curve fitting, 214
individual effective dose, 112, 344-64
  relation with all-or-nothing response, 348
induction, 6
inference, scientific, 3-8
  precision of, 101-3, 114; see also variance, confidence limits, and bias
intervals between random events, 81-5, 374-96; see also lifetime
isotope, see radioisotope
Kruskal-Wallis method, 191, 406 (table)
  for testing all pairs, 410 (table)
Langmuir equation
  fitting of, 257-72, 361
  stochastic interpretation, 380-5
Latin square, 204
LD50, see median effective dose
least squares method
  for assays, 271
  and 'best' straight line, 216, 257-72
  for curve fitting, 218-20
  geometrical interpretation, 243-53, 259-62
  for means, 27
  for Michaelis-Menten hyperbola, 257-72
  without calculus, 27, 220
lifetime, 81-5
  of adrenaline molecule, 380
  of adsorbed molecule, 382
  of empty adsorption site, 384
  independence of when timing started, 388
  of isotope, 385-7
  length-biased sample, 84, 389-95
  meaning of, 81-5, 385
  residual, 84, 383, 388-95
  twice average length, 84, 389-95
likelihood
  maximum, 8, 268-72
  technical meaning, 7, 21-4
limits of error, see confidence limits
Lineweaver-Burk plot, 266-72
logarithm
  changing base of, 291
  negative, 325
  transformation, see transformations
logistic curve, 361-4
logit transformation, 361-4
lognormal distribution, see distribution
Mann-Whitney test, 143
mean
  of any function, 365, 368
  arithmetic, population (expectation), 365-8
  arithmetic, sample
    least squares estimate, 27
    standard deviation of, 33-8, 39
    variance of, 33-8, 51
    weighted, 24, 39
  of binomial distribution, 50, 365
  deviation, 28
  of exponential distribution, 81-5, 367
  of function of two variables, 370-1
  of Gaussian (normal) distribution, 71, 368
  geometric, 25
  lifetime and residual lifetime, see lifetime
  of lognormal distribution, 78, 346-57
  of Poisson distribution, 54, 81, 368, 375
  relation with median and mode, 78-80, 101, 348, 368
  squares, 187, 217, 221, 238; see also analysis of variance
median
  effective dose (ED50), 346-64
  lifetime, see lifetime and stochastic processes
  population, 26
  relation with mean and mode, 78-80, 101, 346-64, 368
  sample, 26, 101, 103
metameter, dose and response, 280, 287; see also transformations
Michaelis-Menten equation, fitting of, 257-72, 361
minimization, see optimization
minimum
  effective dose, definition of, 360
  lethal dose, 360
mode, 27
  relation to mean and median, 78-80, 346-64, 368
models for observations
  fixed and random, 173, 178, 186
  mixed, 196
multinomial distribution, 44

multiple
  comparisons, 207
  linear regression, 264-8
    and analysis of variance, 268
multiplication
  operator, 10, 25-6
  rule, 20, 378
neuromuscular junction, see quantal release
non-linear regression, see curve fitting
nonparametric methods, characteristics, 96, 98-9
normal
  distribution, see distribution, Gaussian
  equivalent deviation, see probit
null hypothesis, 6, 87-96
observational method, meaning, 6; see also correlation
Occam's razor, 216
operation, meaning, 9; see also summation, etc.
optimism, of estimates of error, 101-3
optimization, 282-7
orthogonal contrasts, 302-7
  numerical examples, 311-43
  variance of, 308
P value, from significance test, meaning, 88-100, 207
parallelism, test for, see assays, parallel line
parameter, 4
pattern-search minimization, 283
permutations, see combinations
  random, vii, 18-19
permutation tests, see randomization tests
Poisson distribution, see distribution
polynomial curve fitting, 262-4, 388
population, 4, 16, 20, 43, 64-9; see also standard deviation and mean
power, of significance tests, 93-100
prior probabilities, see probability
probability
  addition rule, 19
  Bayes' theorem, 21-4, 96
  binomial, 46, 109-14
  confidence, see confidence limits
  density, 64-9
  direct, 6-8
  distribution, see distribution
  inverse, 6-8, 87
  meaning of, 16-18, 96
  multiplication rule, 20, 378
  posterior, 6-8, 21-4, 96
  prior, 6-8, 21-4, 96
  significance value, 88-96, 207
  subjective, 18, 96
probit transformation, 347, 353-64
  and haemolysis, 351
  linearizing sigmoid curves, 351
  test for Gaussian distribution, 80
purity in heart, assay for, 111
quadratic equation
  fitting, see curve fitting, polynomial
  solution of, 264
quantal
  release of acetylcholine
    number of quanta per impulse, 57-60
    intervals between quanta, 81-5
  responses, see all-or-nothing responses and probit transformation
quantitative numerical measurements, 91
radiation, 'safe dose' of, 380
radioisotope disintegration
  errors in, 52, 60-3
  stochastic interpretation, 385-7
random
  blocks, 171, 195, 200, 207
  Latin square, 204
  permutations, vii, 18-19
  process, 52-63, 81-5, 374-96; see also lifetime and stochastic
  sample
    reasons for necessity, 119
    rejection of unacceptable, 123
    selection of, 3, 18-19, 43-5
  sampling numbers, use of, vii, 18-19
randomization tests, 98, 117
  classification measurements, 117
  Cushny and Peebles' data, 143
  numerical and rank measurements, 138, 143, 153, 167, 180, 191, 200
  rationale, 98, 117
  unacceptable randomizations, 123
  see also card shuffling analysis
randomized blocks, 171, 196, 200, 207
range, 28
rank measurements, 96, 99, 116, 137, 152, 171, 191, 200, 207-10
  correlation, 274
rankits, as test for Gaussian distribution, 80, 412 (table)
rate constant, 238
  stochastic interpretation, 380, 386

ratio
  dose, 283
  of maximum to minimum variance of set, 176
  potency, see assays and confidence limits
  of two Gaussian variables, see confidence limits and variance of functions
  of two estimates of same variance, see variance ratio
receptor-drug interaction, see adsorption
regression
  analysis, see curve fitting
  equation, 214
  linear, 216-57
  non-linear, 243-72
related samples
  advantages of, 169
  classification measurements, 91, 134
  numerical measurements, 91, 152-70, 195, 200
  rank measurements, 91, 152-66, 200
  see also randomized blocks
root-mean-square deviation, see standard deviation
sample
  length-biased, 85, 389-95; see also lifetime and stochastic processes
  simple, 44
  small, 49, 75, 80, 89, 96-9
  strictly random, vii, 3-6, 16-19, 43-5, 117, 207; see also random
Scheffé's method, 210
scientific method, 3-8
sign test, 153
significance tests, see guide to particular tests on end sheet
  for all possible pairs, 191, 200, 207-10
  assumptions in, 70-2, 86, 101-3, 111, 139, 144, 148, 153, 158, 167, 172, 205-6, 207
  critique of, 93-5
  efficiency of, 97
  interpretation of, 1-8, 70-2, 86-100
  maximum variance/minimum variance, 176
  multiple, 191, 200, 207-10
  one-tail, 86
  parametric versus nonparametric, 96-9
  randomization, see randomization tests
  ranks, 96, 99, 116, 137, 152, 171
  ratio of maximum to minimum variance, 176
  relation
    with confidence limits, 151, 155, 168, 232
    between t tests and analysis of variance, 190, 196, 233
    between various methods, 116, 137, 152, 171
  relative efficiency, 97
  two-tail, 88
  for variance, population value of, 128
simulation, 268
six-point assay, see assay, parallel line, (3 + 3) dose
skew distributions, 78-80, 101, 348, 368
standard
  deviation, see variance of functions
    of observation, see variance of functions
  error, 33, 35, 36, 38
  form of distribution, 369
  Gaussian (normal) distribution, see distribution
statistics
  expected normal-order, 412
  role of, 1-3, 86, 93, 96, 101, 214, 374
  technical meaning, 4
steepest-descent method, 262
stochastic processes, 1, 81-5, 374-95
  adsorption, 380-5
  catabolism, 379
  isotope disintegration, 52, 60-3, 385-7
  length bias, 85, 389-95
  lifetime, see lifetime
  meaning, 1, 81, 374
  of o(Δt), 376, 378, 387
  Poisson, derivation, 375-8
  quantal release of acetylcholine, 57-60, 81-5
  residual lifetime, see lifetime
  see also distribution, exponential and distribution, Poisson
straight-line fitting, 214-57
'Student' (W. S. Gosset), 71
  paired t test, 167
  t distribution, 75-8
    tables of, 77
  t test, 148
    relation with analysis of variance, 179, 190, 196, 226, 232-4
    relation with confidence limits, 151, 168
sum
  of products, 31
    working formula, 32
  of squared deviations (SSD), 27, 184-90, 197, 217-20, 244-57
    additivity of, 189, 223
    working formula for, 30, 188, 224
    see also least squares method and analysis of variance
summation operator, Σ, 10-14
survey method, meaning, 5

tables, published, vii
tail of distribution, 67, 72
tests
  for additivity, 174
  of assumptions, see assumptions
  for equality of variances, 176
  for Gaussian (normal) distribution, probit and rankit, 80
  for goodness of fit, 132
  for Poisson distribution, 133
  of significance, see significance
threshold dose, 360, 364
time constant, 238
  stochastic interpretation, 380, 385
transformations
  for additivity, 174
  for analysis of variance, 176
  in assays, 280-3, 287, 340, 344-6
  in curve fitting, 221-2, 238, 243
  to Gaussian distribution, 71, 78, 80, 176, 221, 239, 287, 344-6
  linearizing, 221-2, 238, 266-72, 353
  logarithmic, 78, 176, 221-2, 238, 280-3, 287, 291, 344-6, 361-4
  logit, 361
  normalizing, see transformations, to Gaussian
  probit, 80, 347, 353-64
  rankit, 80
  reciprocal, 266-72
2 × 2 table
  independent samples, 116-34
  related samples, 134
two samples, difference between, see significance tests and guide on end sheet
  see also confidence limits
unacceptable randomizations, 123
validity of assays, 91
variability, measures of, 28
variance
  of functions of observations
    of any function (approx.), 39-40
    of any linear function, 39, 225, 307
    of difference, 37
    of function of correlated variables, 27, 41
    of linear functions, 39, 225, 307
    of logarithm of variable (approx.), 40
    of mean, 33, 35, 36, 38, 101
    meaning, 33-42
    multipliers, definition, 295
  population, xviii, 28, 29
    of binomial distribution, 50, 359, 368
    constancy of, 167, 175-6, 221, 266, 269, 272, 281, 359
    definition of, 368-9
    estimation from probit plot, 353-64
    examples, 51, 60-3
    of lognormal distribution, 78-9, 346-57
    of Poisson distribution, 55, 368; see also distribution and quantal release
  of potency ratio, see confidence limits
  of product of two variables, 40
  of ratio of two variables
    (approx.), 41, 107, 296
    (exact), see confidence limits
  of reciprocal of variable (approx.), 41, 272
  sample, xviii, 28, 29, 369
    bias of, 29, 307, 369
    ratio of maximum to minimum, 176
    ratio of two estimates, see variance ratio
    when population mean known, 29, 307, 309
    working formula for, 30
  of slope of straight line, 225
  of sum
    or difference, 37
    of N variables, 37, 307
    of variable number of variables, 41, 58-60, 370-3
  of value
    of x read off straight line, see confidence limits
    of Y read off straight line, 227
  of variable
    + constant, 38
    × constant, 38
  of variance, 128
  of weighted arithmetic mean, 39, 292
variance ratio (F)
  less than one, 182
  meaning of, 176, 179
  relation
    with chi-squared, 180
    with Student's t, 179, 190, 196, 226, 232-4
  tables of, 181
virginity, 111
waiting time, see lifetime and stochastic processes
  paradox, 84, 374-95
weighting, 25, 220, 272, 292
Wilcoxon
  signed ranks test for two related samples, 160, 405 (table)
  test (Mann-Whitney) for two independent samples, 143, 402 (table)
Yates' correction for continuity, 126, 129, 132
