Cochran 1947 Some Consequences When The Assumptions For The Analysis of Variance Are Not Satisfied
Author(s): W. G. Cochran
Source: Biometrics, Vol. 3, No. 1 (Mar., 1947), pp. 22-38
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/3001535
concerned with this item. Writing in 1938, Hey (8) gives a bibliography of 36 papers, most of which deal with non-normality, while several
theoretical investigations were outside the scope of his bibliography.
Although space does not permit a detailed survey of this literature,
some comments on the nature of the work are relevant.
The work is almost entirely confined to a single aspect, namely the
effect on what we have called the validity of tests of significance. Further, insofar as the t-test is discussed, this is either the test of a single
mean or of the difference between the means of two groups. As will be
seen later, it is important to bear this restriction in mind when evaluating the scope of the results.
Some writers, e.g., Bartlett (1), investigated by mathematical methods the theoretical frequency distribution of F or t, assuming the null
hypothesis true, when sampling from an infinite population that was
non-normal. As a rule, it is extremely difficult to obtain the distributions in such cases. Others, e.g., E. S. Pearson (9), drew mechanically
500 or 1000 numerical samples from an infinite non-normal population,
calculated the value of F or t for each sample, and thus obtained empirically some idea of their frequency distributions. Where this
method was used, the number of samples was seldom large enough to
allow more than a chi-square goodness of fit test of the difference between the observed and the standard distributions. A very large number of samples is needed to determine the 5 percent point, and more
so the 1 percent point, accurately. A third method, of which Hey's
paper contains several examples, is to take actual data from experiments and generate the F or t distribution by means of randomization
similar to that which would be practiced in an experiment. The data
are chosen, of course, because they represent some type of departure
from normality.
The consensus from these investigations is that no serious error is
introduced by non-normality in the significance levels of the F-test or
of the two-tailed t-test. While it is difficult to generalize about the
range of populations that were investigated, this appears to cover most
cases encountered in practice. If a guess may be made about the limits
of error, the true probability corresponding to the tabular 5 percent
significance level may lie between 4 and 7 percent. For the 1 percent
level, the limits might be taken as ½ percent and 2 percent. As a rule,
the tabular probability is an underestimate: that is, by using the ordinary F and t tables we tend to err in the direction of announcing too
many significant results.
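The empirical sampling method described above can be imitated in a short modern sketch: repeated samples from a skewed (exponential) parent population, an ordinary two-tailed t-test of the true mean, and a count of rejections at the nominal 5 percent level. The parent population, sample size, and number of samples are illustrative choices, not taken from the paper.

```python
# Empirical sampling experiment: true type-I error of the t-test under a
# non-normal (exponential) parent population.
import math
import random
import statistics

random.seed(1)
n, trials = 10, 20000
crit = 2.262          # two-tailed 5% point of t on 9 d.f.
pop_mean = 1.0        # mean of the unit exponential population

rejections = 0
for _ in range(trials):
    sample = [random.expovariate(1.0) for _ in range(n)]
    t = ((statistics.fmean(sample) - pop_mean)
         / (statistics.stdev(sample) / math.sqrt(n)))
    if abs(t) > crit:
        rejections += 1

true_level = rejections / trials
print(f"true significance level at the nominal 5%: {true_level:.3f}")
```

With this degree of skewness the observed level typically lands modestly above 5 percent, consistent with the limits guessed above.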
(i) evidence of changes in the variance from one part of the experiment to another. This case will be discussed in section 6.
(ii) evidence of gross errors.
5. Effects of Gross Errors. The effects of gross errors, if undetected, are obvious. The means of the treatments that are affected will
be poorly estimated, while if a pooled error is used the standard errors
of other treatment means will be over-estimated. An extreme example
is illustrated by the data in Table I, which come from a randomized
blocks experiment with four replicates.
TABLE I
WHEAT: RATIO OF DRY TO WET GRAIN

                 Nitrogen applied
Block      None     Early    Middle    Late
  1        .718     .732     .734      .792
  2        .725     .781     .725      .716
  3        .704    1.035     .763      .758
  4        .726     .765     .738      .781

                            d.f.     S.S.      M.S.
Error                        9      .04729    .00525
  Aberrant value (1.035)     1      .04205    .04205
  Remainder                  8      .00524    .000655
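The subdivision shown in Table I can be verified from the plot values themselves. The following sketch (data re-keyed from the table; the attribution of 1 d.f. to the aberrant value follows the usual missing-plot argument) computes the error sum of squares and the part due to the suspect observation.

```python
# Table I re-keyed (rows = blocks 1-4; columns = None, Early, Middle, Late).
data = [
    [.718, .732, .734, .792],
    [.725, .781, .725, .716],
    [.704, 1.035, .763, .758],
    [.726, .765, .738, .781],
]
r, c = 4, 4
grand = sum(map(sum, data)) / (r * c)
block_means = [sum(row) / c for row in data]
trt_means = [sum(data[i][j] for i in range(r)) / r for j in range(c)]

# Residuals from the additive blocks + treatments fit.
resid = [[data[i][j] - block_means[i] - trt_means[j] + grand
          for j in range(c)] for i in range(r)]
error_ss = sum(e * e for row in resid for e in row)

# 1 d.f. for the suspect value 1.035 (block 3, Early): the reduction in the
# error S.S. if it were replaced by its missing-plot estimate.
d = resid[2][1]
suspect_ss = d * d * r * c / ((r - 1) * (c - 1))

print(f"error S.S. (9 d.f.)     = {error_ss:.5f}")               # cf. .04729
print(f"aberrant value (1 d.f.) = {suspect_ss:.5f}")             # cf. .04205
print(f"remainder (8 d.f.)      = {error_ss - suspect_ss:.5f}")  # cf. .00524
```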
There is no theoretical difficulty in extending the analysis of variance so as to take account of variations in error variances. The usual
analysis is replaced by a weighted analysis in which each observation
is weighted in proportion to the inverse of its error variance. The extension postulates, however, a knowledge of the relative variances of
any two observations and this knowledge is seldom available in practice. Nevertheless, the more exact theory can sometimes be used with
profit in cases where we have good estimates of these relative variances.
Suppose for instance, the situation were such that the observations
could be divided into three parts the error variances being constant
within each part. If unbiased estimates of the variances within each
part could be obtained and if these were each based on, say, at least
15 degrees of freedom, we could recover most of the loss in efficiency
by weighting inversely as the observed variances. This device is therefore worth keeping in mind, though in complex analyses the weighted
solution involves heavy computation.
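The device can be illustrated with a small sketch in modern code. The three parts, their variances, and the group sizes are invented for illustration; each part's mean is weighted inversely as that part's observed variance.

```python
# Three "parts" of an experiment with unequal error variances; each part's
# mean is weighted inversely as its observed variance.
import random
import statistics

random.seed(2)
true_mean = 10.0
sigmas = [1.0, 3.0, 9.0]                      # one error s.d. per part
parts = [[random.gauss(true_mean, s) for _ in range(20)] for s in sigmas]

weights = [1.0 / statistics.variance(p) for p in parts]   # 19 d.f. each
weighted_mean = (sum(w * statistics.fmean(p) for w, p in zip(weights, parts))
                 / sum(weights))
unweighted_mean = statistics.fmean([x for p in parts for x in p])

print(f"weighted estimate:   {weighted_mean:.3f}")
print(f"unweighted estimate: {unweighted_mean:.3f}")
```

The weighted estimate largely discounts the most variable part, and so has a much smaller standard error than the unweighted mean.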
TABLE II
MANGOLDS: PLANT NUMBERS PER PLOT

Block    Control  Control  Chalk 1  Chalk 2  Chalk 3  Lime 1  Lime 2  Lime 3   Total
I          140       49       98      135      117       81     147     130      897
II         142       37      132      151      137      129     131     112      971
III         36      114      130      143      137      135     103     130      928
IV         129      125      153      146      143      104     147     121     1068
Total      447      325      513      575      534      449     528     493     3864
Range      106       88       55       16       26       54      44      18
                 d.f.      S.S.      M.S.
Blocks             3      2,079     ......
Treatments         6      8,516     ......
Error             22     18,939     860.9
Total             31     29,534
          Diff. between   Total - 4     Chalk 1 -   (C2+L2+C3+L3)
Block       Controls      (Controls)     Lime 1     - 2(C1+L1)
I               91           141           17           171
II             105           255            3             9
III            -78           328           -5           -17
IV               4            52           49            43
Total          ...           776           64           206
Divisor          2            24            2            12
The first two columns are used to separate the contribution of the
controls to the error. This has 7 d.f. of which 4 represent differences
between the two controls in each block. The sum of squares of the
first column is divided by 2 as indicated. There remain 3 d.f. which
come from a comparison within each block of the total yield of the controls with the total yield of the dressings. Since there are 6 dressed
plots to 2 controls per block we take
(Dressing total) - 3 (Control total)
                               d.f.     S.S.     M.S.
Total                           22     18,939     861
Between controls                 4     12,703    3,176
Controls v. Dressings            3      1,860      620
Chalk 1 v. Lime 1                3        850      283
Single v. Higher Dressings       3      1,738      579
Double and Triple Dressings      9      1,788      199
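The subdivision above can be checked numerically. The following sketch re-keys the Table II plot numbers; the column order (two controls, then Chalk 1-3, then Lime 1-3) is an assumption of the reconstruction.

```python
# Subdivision of the 22 error d.f. by per-block treatment contrasts.
plots = [
    [140,  49,  98, 135, 117,  81, 147, 130],  # block I
    [142,  37, 132, 151, 137, 129, 131, 112],  # block II
    [ 36, 114, 130, 143, 137, 135, 103, 130],  # block III
    [129, 125, 153, 146, 143, 104, 147, 121],  # block IV
]

def contrast_ss(coeffs, interaction_only=True):
    # S.S. from applying a treatment contrast within each block; with
    # interaction_only, the 1 d.f. for the overall contrast (a treatment
    # d.f.) is removed, leaving the block-by-contrast part of the error.
    divisor = sum(c * c for c in coeffs)
    vals = [sum(c * x for c, x in zip(coeffs, row)) for row in plots]
    ss = sum(v * v for v in vals) / divisor
    if interaction_only:
        ss -= sum(vals) ** 2 / (divisor * len(plots))
    return ss

# Full error S.S. (22 d.f.): total minus blocks minus treatments, the two
# control columns forming a single treatment with 8 plots.
cf = sum(map(sum, plots)) ** 2 / 32
total_ss = sum(x * x for row in plots for x in row) - cf
blocks_ss = sum(sum(row) ** 2 for row in plots) / 8 - cf
control = sum(row[0] + row[1] for row in plots)
dressings = [sum(row[j] for row in plots) for j in range(2, 8)]
trt_ss = control ** 2 / 8 + sum(t * t for t in dressings) / 4 - cf
error_ss = total_ss - blocks_ss - trt_ss

parts = {
    "Between controls (4 d.f.)":
        contrast_ss([1, -1, 0, 0, 0, 0, 0, 0], interaction_only=False),
    "Controls v. Dressings (3 d.f.)":
        contrast_ss([-3, -3, 1, 1, 1, 1, 1, 1]),
    "Chalk 1 v. Lime 1 (3 d.f.)":
        contrast_ss([0, 0, 1, 0, 0, -1, 0, 0]),
    "Single v. Higher (3 d.f.)":
        contrast_ss([0, 0, -2, 1, 1, -2, 1, 1]),
}
for name, ss in parts.items():
    print(f"{name}: {ss:,.0f}")
print(f"Remainder (9 d.f.): {error_ss - sum(parts.values()):,.0f}")
```

The printed values agree with the table above (12,703; 1,860; 850; 1,738; remainder 1,788).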
vance from the nature of the experiment. In such cases the data may
be inspected carefully to decide whether the actual amount of variation
in the error variance seems enough to justify special methods. In fact,
such inspection is worthwhile as a routine procedure and is, of course,
the only method for detecting heterogeneity when it has not been anticipated. The principal weapons for dealing with this irregular type
of heterogeneity are subdivision of the error variance or omission of
parts of the experiment. Unfortunately, in complex analyses the computations may be laborious. For the Latin square, Yates (12) has
given methods for omitting a single treatment, row or column, while
Yates and Hale (14) have extended the process to a pair of treatments,
rows or columns.
In addition, there is a common type of heterogeneity that is more
regular. In this type, which usually arises from non-normality in the
distribution of errors, the variance of an observation is some simple
function of its mean value, irrespective of the treatment or block concerned. For instance, in counts whose error distribution is related to
the Poisson, the variance of an observation may be proportional to its
mean value. Such cases, which have been most successfully handled
by means of transformations, are discussed in more detail in Dr. Bartlett's paper.
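For counts of the Poisson type the variance equals the mean, and the square-root transformation makes it roughly constant. A small sketch (the means and sample size are illustrative; the standard library has no Poisson generator, so one is sketched via Knuth's multiplicative method):

```python
# Variance of Poisson counts before and after the square-root transformation.
import math
import random
import statistics

random.seed(4)

def poisson(lam):
    # Knuth's multiplicative method for Poisson variates
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

results = {}
for lam in (4, 16, 64):
    sample = [poisson(lam) for _ in range(4000)]
    raw_var = statistics.variance(sample)
    root_var = statistics.variance([math.sqrt(x) for x in sample])
    results[lam] = (raw_var, root_var)
    print(f"mean {lam:2d}: variance of counts {raw_var:6.1f}, "
          f"of root-counts {root_var:.2f}")
```

The variance of the raw counts grows with the mean, while that of the transformed counts stays near 1/4 throughout.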
7. Effects of Correlations Amongst the Errors. These effects may
be illustrated by a simple theoretical example. Suppose that the errors
e1, e2, . . ., er of the r observations on a treatment in a simple group comparison have constant variance σ² and that every pair has a correlation
coefficient ρ. The error of the treatment total, (e1 + e2 + . . . + er), will
have a variance

    rσ² + r(r - 1)ρσ² = rσ²{1 + (r - 1)ρ},

since there are r(r - 1)/2 cross-product terms, each of which will contribute 2ρσ². Hence the true variance of the treatment mean is

    σ²{1 + (r - 1)ρ}/r.

Now in practice we would estimate this variance by means of the sum
of squares of deviations within the group, divided by r(r - 1). But
the expected value of this estimate is

    σ²(1 - ρ)/r,

so that the usual procedure underestimates the true variance whenever ρ is positive.
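The contrast between the true and the estimated variance can be exhibited by a sampling experiment. In the sketch below (equicorrelated errors are built from a shared component; r, ρ, σ, and the number of trials are illustrative choices) the usual within-group estimate averages σ²(1 - ρ)/r, far below the true variance σ²{1 + (r - 1)ρ}/r.

```python
# Equicorrelated errors: e_i has variance sigma^2, every pair correlates rho.
import math
import random
import statistics

random.seed(3)
r, rho, sigma = 5, 0.4, 1.0
trials = 40000

means, est_vars = [], []
for _ in range(trials):
    shared = random.gauss(0.0, 1.0)
    e = [sigma * (math.sqrt(rho) * shared
                  + math.sqrt(1.0 - rho) * random.gauss(0.0, 1.0))
         for _ in range(r)]
    means.append(statistics.fmean(e))
    # the usual estimate of the variance of the mean: s^2 / r
    est_vars.append(statistics.variance(e) / r)

true_var = sigma ** 2 * (1 + (r - 1) * rho) / r   # sigma^2{1+(r-1)rho}/r
naive = sigma ** 2 * (1 - rho) / r                # sigma^2(1-rho)/r
print(f"observed variance of treatment means: {statistics.variance(means):.3f}")
print(f"theoretical true variance:            {true_var:.3f}")
print(f"average of usual estimates:           {statistics.fmean(est_vars):.3f}")
print(f"their expectation under the theory:   {naive:.3f}")
```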
TABLE V
WEIGHT OF DRY HEADS PER PLOT
(Unit, 10 grams)

Block 1               Block 2
Plot   Weight  x      Plot   Weight  x
LA1      84    1      A1       63    1
LF1     148    0      F1      168    0
LF2      66    1      LF1     191    0
A2      137    0      FA1     133    0
F2       70    1      L1       97    1
A1      146    0      LA2     158    0
LFA2    179    0      L2      195    0
O       124    0      LFA2    145    0
F1      218    0      A2       56    1
L2      166    0      LA1     189    0
LFA1    247    0      LF2     189    0
O       177    0      O       141    0
L1       81    1      O        64    1
FA2     171    0      LFA1    152    0
LA2     228    0      FA2     179    0
FA1     153    0      F2      130    0
cides. The data presented are for the fourth year of the experiment,
which was conducted at the Woburn Experimental Farm, England.
The weights of dry heads are shown immediately underneath the
treatment symbols. It is evident that the first row of plots is of poor
fertility: treatments appearing in that row have only about half the
yields that they give elsewhere. Further, there are indications that
every row differs in fertility, the last row being second worst and the
third row best. The fertility gradients are especially troublesome in
that the four untreated controls all happen to lie in outside rows. The
two replications give practically identical totals and remove none of
this variation.
There is clearly little hope of obtaining information about the treatment effects unless weights are adjusted for differences in fertility
from row to row. The adjustment may be made by covariance.
For simplicity, adjustments for the first row only will be shown:
these remove the most serious environmental disturbance. As x variable we choose a variable that takes the value 1 for all plots in the first
row and zero elsewhere. The x values are shown under the weights
in Table V. The rest of the analysis follows the usual covariance technique, Snedecor (10).
TABLE VI
SUMS OF SQUARES AND PRODUCTS
(y = weights, x = dummy variate)

              d.f.      y²         xy        x²
Blocks          1       657        0.0      0.00
Treatments     13    33,323     -200.2      1.75
Error          17    46,486     -380.0      4.25
Total          31    80,466     -580.2      6.00
Note that there are only 14 distinct treatments, since L1 is the same
as L2. The reduction in the error S.S. due to covariance is (380.0)²/
4.25, or 33,976. The error mean square is reduced from 2,734 to 782
by means of the covariance, i.e., to less than one-third of its original
value. The regression coefficient is -380.0/4.25, or -89.4 units.
Treatment means are adjusted in the usual way. For L1, which
was unlucky in having two plots in the first row, the unadjusted mean
is 89. The mean x value is 1, whereas the mean x value for the whole
experiment is 8/32, or 1/4. Hence the adjustment increases the L1 mean
by (3/4) (89.4), the adjusted value being 156. For L2, which had no
plots in the first row, the x mean is 0, and the adjustment reduces the
mean from 180 to 158. It may be observed that the unadjusted mean
of L2 was double that of L1, while the two adjusted means agree closely,
as is reasonable since the two treatments are in fact identical.
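The adjustments above can be reproduced directly from the Table VI error line. A sketch (the function and variable names are illustrative):

```python
# Covariance adjustment from the Table VI error line: Sxy and Sxx are the
# error sums of products for weight (y) and the first-row dummy (x).
sxy, sxx = -380.0, 4.25
b = sxy / sxx              # regression of weight on the first-row dummy
x_bar_all = 8 / 32         # 8 of the 32 plots lie in the first row

def adjusted(mean_y, mean_x):
    # usual adjustment: remove b times the excess of the treatment's
    # x-mean over the overall x-mean
    return mean_y - b * (mean_x - x_bar_all)

print(f"regression coefficient b = {b:.1f}")               # -89.4
print(f"reduction in error S.S.  = {sxy ** 2 / sxx:,.0f}")  # 33,976
print(f"adjusted L1 mean = {adjusted(89, 1.0):.0f}")        # 156
print(f"adjusted L2 mean = {adjusted(180, 0.0):.0f}")       # 158
```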
If it were desired to adjust separately for every row, a multiple
covariance with four x variables could be computed. Each x would
take the value 1 for all plots in the corresponding row and 0 elsewhere.
It will be realized that the covariance technique, if misused, can lead
to an underestimation of errors. It is, however, worth keeping in mind
as an occasional weapon for difficult cases.
8. Effects of Non-Additivity. Suppose that in a randomized blocks
experiment, with two treatments and two replicates, the treatment and
block effects are multiplicative rather than additive. That is, in either
replicate, treatment B exceeds treatment A by a fixed percentage, while
for either treatment, replicate 2 exceeds replicate 1 by a fixed percentage. Consider treatment percentages of 20% and 100% and replicate
percentages of 10% and 50%. These together provide four combinations. Taking the observation for treatment A in replicate 1 as 1.0, the
other observations are shown in Table VII.
TABLE VII
HYPOTHETICAL DATA FOR FOUR CASES WHERE EFFECTS ARE MULTIPLICATIVE

        T 20%, R 10%   T 100%, R 10%   T 20%, R 50%   T 100%, R 50%
Rep.      A      B       A      B        A      B        A      B
 1       1.0    1.2     1.0    2.0      1.0    1.2      1.0    2.0
 2       1.1    1.32    1.1    2.2      1.5    1.8      1.5    3.0

 d           .02            .10             .10             .50
 σd          .01            .05             .05             .25
Thus, in the first case, 1.32 for B in replicate 2 is 1.2 times 1.1. Since
no experimental error has been added, the error variance in a correct
analysis should be zero. If the usual analysis of variance is applied to
each little table, the calculated error in each case will have 1 d.f. If d
is the sum of two corners minus the other two corners, the error S.S.
is d²/4, so that the standard error σd is d/2 (taken as positive). The
values of d and of σd are shown below each table.
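Table VII can be generated from the multiplicative model itself, as the sketch below does; the spurious "error" d is manufactured entirely by forcing an additive analysis onto multiplicative effects.

```python
# Four cases of purely multiplicative treatment (t) and replicate (r) effects.
cases = [(0.20, 0.10), (1.00, 0.10), (0.20, 0.50), (1.00, 0.50)]
results = []
for t, r in cases:
    a1, b1 = 1.0, 1.0 * (1 + t)             # replicate 1
    a2, b2 = 1.0 + r, (1.0 + r) * (1 + t)   # replicate 2
    d = (a1 + b2) - (a2 + b1)               # corner sum: zero only if additive
    results.append(round(d, 2))
    print(f"T {t:.0%}, R {r:.0%}:  d = {d:.2f},  s.e. = {d / 2:.3f}")
```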
Consequently, in the first experiment, say, the usual analysis would
lead to the statement that the average increase due to B is 0.21 units ± 0.01,
instead of to the correct statement that the increase due to B is 20%. The
standard error, although due entirely to the failure of the additive rela-
REFERENCES
1. Bartlett, M. S. "The Effect of Non-Normality on the t Distribution," Proceedings of the Cambridge Philosophical Society (1935), 31, 223-231.
2. Cochran, W. G. "Some Difficulties in the Statistical Analysis of Replicated Experiments," Empire Journal of Experimental Agriculture (1938), 6, 157-175.
3. Finney, D. J. "On the Distribution of a Variate Whose Logarithm is Normally Distributed," Journal of the Royal Statistical Society, Suppl. (1941), 7, 155-161.
4. Fisher, R. A. "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, A, 222 (1922), 309-368.
5. Fisher, R. A. Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh, § 14.
6. Fisher, R. A. The Design of Experiments. Oliver and Boyd, Edinburgh, § 21.
7. Jones, H. L. "Linear Regression Functions with Neglected Variables," Journal of the American Statistical Association (1946), 41, 356-369.
8. Hey, G. B. "A New Method of Experimental Sampling Illustrated on Certain Non-Normal Populations," Biometrika (1938), 30, 68-80.
9. Pearson, E. S. "The Analysis of Variance in Cases of Non-Normal Variation," Biometrika (1931), 23, 114.
10. Snedecor, G. W. Statistical Methods. Iowa State College Press, Ames, Ia., 4th ed. (1946), Chaps. 12 and 13.
11. Yates, F. "The Analysis of Replicated Experiments When the Field Results Are Incomplete," Empire Journal of Experimental Agriculture (1933), 1, 129-142.
12. Yates, F. "Incomplete Latin Squares," Journal of Agricultural Science (1936), 26, 301-315.
13. Yates, F. "The Formation of Latin Squares for Use in Field Experiments," Empire Journal of Experimental Agriculture (1933), 1, 235-244.
14. Yates, F., and Hale, R. W. "The Analysis of Latin Squares When Two or More Rows, Columns or Treatments Are Missing," Journal of the Royal Statistical Society, Suppl. (1939), 6, 67-79.