The Analysis of Variance: ISMT 2002
Introduction
Researchers often perform experiments to compare two treatments, for example, two different
fertilizers, machines, methods or materials. The objectives are to determine whether there is
any real difference between these treatments, to estimate the difference, and to measure the
precision of the estimate. In the Descriptive Statistics course we have discussed comparisons of
two means. It is often important to compare more than two means. For example, we may be
interested in determining if there is any evidence for real differences among the mean values
associated with various different treatments that have been randomly allocated to the
experimental units. This corresponds to a hypothesis of the form
$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$
$H_A:$ at least two of the $\mu_i$'s are different,
where µ i is the mean of the ith treatment or population. Analysis of variance (ANOVA)
provides the framework to test hypotheses like the one above, on the supposition that the data
can be treated as random samples from I normal populations having the same variance σ 2 and
possibly only differing in their means. The sample sizes for the treatment groups are possibly
different, say J i . The analysis resulting from these assumptions may be approximately
justified by randomisation, to guarantee inferential validity. It is normally the case in
performing an ANOVA that the data come from an experiment rather than an observational
study, since the experimental conditions imply that balance has been achieved through
randomisation. The calculations to explore these hypotheses are set out in an analysis of
variance table. Essentially, this calculation determines whether the discrepancies between the
treatment averages are greater than could reasonably be expected from the variation that
occurs within the treatment classifications.
The model underlying the one-way analysis of variance can be written as
\[
Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \ldots, I;\; j = 1, \ldots, J_i, \qquad \varepsilon_{ij}\ \text{i.i.d.}\ N(0, \sigma^2). \tag{1}
\]
Note that this model has the same main assumption as the standard linear model in that the
unobservable random errors ε ij are independent and follow a normal distribution with mean
zero and unknown constant variance. If this assumption is not satisfied, then the validity of
the results of an ANOVA is in question.
The hypotheses to be tested are thus

$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$
$H_A:$ at least two of the $\mu_i$'s are different.

Model (1) is often rewritten in the equivalent form
\[
Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, \ldots, I;\; j = 1, \ldots, J_i, \qquad \varepsilon_{ij}\ \text{i.i.d.}\ N(0, \sigma^2), \tag{2}
\]
where µ i = µ + α i . The parameter µ is viewed as a grand mean, while α i is an effect for the
ith treatment group. The hypothesis associated with this model is then specified as
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0$
$H_A:$ at least one $\alpha_i \neq 0$.    (3)
It is important to note that the parameters µ and {α i } are not uniquely defined in model (2).
We say that the parameters µ and {α i } are not completely specified by the model. However,
we can assume without loss of generality that $\alpha_{\cdot} = \sum_{i=1}^{I} \alpha_i = 0$, since we can write
\[
\eta_{ij} = E(y_{ij}) = \mu + \alpha_i = (\mu + \bar{\alpha}) + (\alpha_i - \bar{\alpha}),
\]
where $\bar{\alpha} = \alpha_{\cdot}/I$, and the new parameters $\tilde{\mu} = \mu + \bar{\alpha}$ and $\tilde{\alpha}_i = \alpha_i - \bar{\alpha}$ satisfy the constraint. It follows from the general theory that there is a unique solution satisfying $\sum_i \hat{\alpha}_i = 0$, and that every parametric function of the new parameters $\tilde{\mu}$ and $\{\tilde{\alpha}_i\}$ is estimable.
ANOVA table

Source                         df       SS      MS                 F
Average (correction factor)    1        SSA
Between treatments             I - 1    SST     MST = SST/(I-1)    F_C = MST/MSE
Within treatments (error)      N - I    SSE     MSE = SSE/(N-I)
Total                          N        TSS
In the table above SSA, SST, SSE, TSS, MST and MSE are calculated from the observed
data. Furthermore, I is the number of treatments, and N is the total number of
observations. The last column contains the value of the test statistic for the hypothesis
in (3). We will discuss these quantities below without going into too much
mathematical detail.
In order to give a short explanation of the above ANOVA table, we start off by noting
that a measure of the overall variation could have been obtained by ignoring the
separation into treatments and calculating the sample variance for the aggregate of N
observations. This would be done by calculating the total sum of squares of deviations
about the overall mean $\bar{y}$:
\[
\mathrm{SSD} = \sum_{i=1}^{I} \sum_{j=1}^{J_i} \left( y_{ij} - \bar{y} \right)^2 .
\]
This sum of squares can be decomposed as
\[
\mathrm{SSD} = \sum_{i=1}^{I} \sum_{j=1}^{J_i} \left( y_{ij} - \bar{y} \right)^2
             = \sum_{i=1}^{I} J_i \left( \bar{y}_i - \bar{y} \right)^2
             + \sum_{i=1}^{I} \sum_{j=1}^{J_i} \left( y_{ij} - \bar{y}_i \right)^2 ,
\]
that is,
the total sum of squares of deviations from the overall mean can be divided into
the between treatment sum of squares (SST) and the within treatment or residual sum
of squares (SSE). The SSD can also be written as the total sum of squares (TSS) minus
the sum of squares due to the average or correction factor (SSA)
\[
\mathrm{SSD} = \sum_{i=1}^{I} \sum_{j=1}^{J_i} y_{ij}^2 - N\bar{y}^2 = \mathrm{TSS} - \mathrm{SSA}.
\]
Therefore we can split up the sum of squares of the original N observations into three
additive parts:
\[
\mathrm{TSS} = \mathrm{SSA} + \mathrm{SST} + \mathrm{SSE}.
\]
In words this means that the total sum of squares (TSS) can be written as the sum of
squares due to the average, the between treatment sum of squares and the residual
sum of squares.
The corresponding degrees of freedom are partitioned in the same way:
\[
N = 1 + (I - 1) + (N - I).
\]
The test statistic is $F_C = \mathrm{MST}/\mathrm{MSE}$, which under $H_0$ follows an F-distribution on I-1 and N-I degrees of freedom. Intuitively, we would
expect the test statistic, $F_C$, to be approximately 1 if there is no difference between the
treatments, and considerably greater than 1 if there is a difference. If we wish to test
$H_0$ at the $100(1-\alpha)\%$ level, then the criterion for the test is to reject $H_0$ if
$F_C > F_{\alpha,\,I-1,\,N-I}$, where $F_{\alpha,\,I-1,\,N-I}$ is the $100(1-\alpha)$ percentile of an F-distribution with I-1 and N-I degrees
of freedom. Its value can be obtained from standard tables or from a software package.
Software packages usually provide an exact p-value for the test.
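To make these calculations concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are available) of how the one-way ANOVA quantities and the p-value could be computed from a list of treatment samples; the function name and the data are purely illustrative.

```python
import numpy as np
from scipy import stats

def one_way_anova(groups):
    """Return the one-way ANOVA F statistic and p-value for a list of samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    I = len(groups)                                  # number of treatments
    N = sum(len(g) for g in groups)                  # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-treatment (SST) and within-treatment (SSE) sums of squares,
    # as in the decomposition of SSD above.
    sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

    mst = sst / (I - 1)
    mse = sse / (N - I)
    f_c = mst / mse
    p_value = stats.f.sf(f_c, I - 1, N - I)          # upper-tail probability
    return f_c, p_value

# Illustrative data for three treatments with unequal sample sizes.
f_c, p = one_way_anova([[51, 54, 56], [60, 62, 59, 61], [55, 53, 57]])
print(f"F = {f_c:.3f}, p-value = {p:.4f}")
```

The same F statistic and p-value can also be obtained directly with scipy.stats.f_oneway.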
Example
We analysed this data set and obtained the following ANOVA table as output:
ANOVA: YIELD

                   Sum of Squares    df    Mean Square        F      Sig.
Between Groups           1290.951     3        430.317   23.458     .000
Within Groups             366.888    20         18.344
Total                    1657.840    23
YIELD

Levene Statistic    df1    df2    Sig.
           2.544      3     20    .085
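For reference, a Levene test of equal variances can be carried out in Python with SciPy; the sketch below uses made-up yields for four varieties rather than the actual data.

```python
from scipy import stats

# Hypothetical yields for four melon varieties (values for illustration only).
variety1 = [20.5, 22.1, 19.8, 21.0, 23.4, 20.2]
variety2 = [25.0, 27.3, 26.1, 28.4, 24.9, 26.7]
variety3 = [30.2, 29.8, 31.5, 30.9, 28.7, 32.0]
variety4 = [35.1, 33.8, 36.2, 34.5, 35.9, 33.0]

# center="mean" gives the classical Levene statistic; center="median"
# gives the Brown-Forsythe variant.
stat, p = stats.levene(variety1, variety2, variety3, variety4, center="mean")
print(f"Levene statistic = {stat:.3f}, p-value = {p:.3f}")
```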
We now verify the assumptions underlying the above model by considering residual
plots discussed above and a homogeneity of variance test. From the quantile-quantile
plot of the standardised residuals in figure 1 it seems as though the assumption of
normality for the residuals is a reasonable one. Figure 2 shows that variability of the
residuals may be a little larger for the first variety or perhaps a little smaller for the
third variety, but in general the variability of the residuals seems to be reasonably
similar across the different varieties. For a more formal analysis we can consider the
Levene test for equal variances given above. From the above SPSS output we see that
the test does not reject the null hypothesis of equal variances at a 5% significance level,
but it does reject it at a 10% level. Thus, it seems that the variability of the residuals is
somewhat different for the different varieties, but that the difference is not significant
at a 5% level.

1 Data taken from Borja, M. C. Introduction to Statistical Modelling for Research: Course notes at the Department of Statistics at Oxford University.
In this example, the result of the hypothesis test is highly significant, as indicated by
the very small p-value. Thus, the null hypothesis that all of the variety effects are zero
(equivalently, that the mean yields of the varieties are all equal) will be rejected in
favour of the alternative that at least one variety has a non-zero effect.
Figure 1: QQ-plot for the standardised residuals of the melon yield example (standardised residuals against the quantiles of the standard normal).
Figure 2: Residuals plotted against the fitted values of the expected response.
Two contrasts with coefficients $\{\lambda_{i1}\}$ and $\{\lambda_{i2}\}$ are orthogonal if
\[
\sum_{i=1}^{I} \frac{\lambda_{i1}\lambda_{i2}}{N_i} = 0 .
\]
Contrasts are only of interest when they define interesting functions of the µ i s. There
are many ways to choose contrasts and these depend on the question the researcher is
interested in answering. Treatment contrasts are commonly used contrasts that compare
all of the treatments to a control. Other commonly used contrasts are polynomial
contrasts and Helmert contrasts. The choice of contrasts can be a rather technical
subject and so we will not go into too much detail.
As an example, consider a one-way ANOVA set-up where there are five different diets
A, B, C, D and E. Suppose that A is an existing standard diet that serves as the control
and that B, C, D and E are new diets. An example of an interesting contrast may be to
compare the control diet, A, with the four new diets. This means that we are interested
in comparing the control to the average of the other four diets. This contrast would
then be
\[
\mu_A - \frac{\mu_B + \mu_C + \mu_D + \mu_E}{4},
\]
or, equivalently, written with integer coefficients,
\[
4\mu_A - \mu_B - \mu_C - \mu_D - \mu_E .
\]
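As a sketch of how such a contrast could be estimated and tested in Python (the diet data and the helper function are hypothetical), the contrast estimate is compared with its standard error based on the pooled error variance from the one-way ANOVA:

```python
import numpy as np
from scipy import stats

def test_contrast(groups, coeffs):
    """t-test of H0: sum_i coeffs[i] * mu_i = 0 in a one-way layout."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    coeffs = np.asarray(coeffs, dtype=float)
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    N, I = n.sum(), len(groups)

    # Pooled error variance (the MSE from the one-way ANOVA).
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - I)

    estimate = coeffs @ means
    se = np.sqrt(mse * np.sum(coeffs ** 2 / n))
    t = estimate / se
    p = 2 * stats.t.sf(abs(t), N - I)
    return estimate, t, p

# Contrast comparing the control diet A with the average of the new diets B-E.
coeffs = [1, -0.25, -0.25, -0.25, -0.25]
groups = [[5.1, 4.8, 5.0], [5.6, 5.9, 5.7], [5.4, 5.5, 5.2],
          [6.0, 5.8, 6.1], [5.3, 5.6, 5.5]]          # illustrative data
print(test_contrast(groups, coeffs))
```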
One very important final remark we will make involves the issue of multiple
comparisons. The specification of several different contrasts leads to multiple hypothesis
tests that are performed using the same data set. This means that we have to remember
to adjust the significance level of any multiple hypothesis tests that we conduct to
ensure that the overall level of significance for carrying out all of the tests is equal to
the desired level of significance. Recall that the reason for this is that the probability of
making at least one type I error is greater than the original significance level when we
conduct multiple tests using the same data. There are a number of different
significance correction methods described in the literature. Five of the most common
ones include adjustments suggested by Bonferroni, Scheffe, Dunnett, Tukey and Sidak.
The advantages and disadvantages of using different methods are quite complex and it
is common practice to use all of the available methods and then report the most
conservative one. Bonferroni’s method is perhaps the simplest and most widely used
method. We have discussed this method in the Descriptive Statistics Course.
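As a minimal illustration of the Bonferroni idea (the significance level and p-values below are made up):

```python
# To keep the overall (family-wise) significance level at alpha when m tests
# are carried out, Bonferroni compares each individual p-value with alpha/m,
# or equivalently multiplies each p-value by m (capped at 1).
alpha, m = 0.05, 6                                        # e.g. six pairwise comparisons
p_values = [0.004, 0.020, 0.150, 0.700, 0.013, 0.048]     # illustrative p-values
adjusted = [min(p * m, 1.0) for p in p_values]
reject = [p < alpha / m for p in p_values]
print(adjusted, reject)
```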
The intuitive idea for the Kruskal-Wallis test is that if you rank all of the data and sum
the ranks in each group, then, if the groups have no real differences, the average rank
should be roughly the same in each group. If there were differences between the groups,
then you would expect the average ranks within the groups to differ.
52 36 52 62
67 34 55 71
54 47 66 41
69 125 50 118
116 30 58 48
79 31 176 82
68 30 91 65
47 59 66 72
120 33 61 49
73 98 63
The main interest is to see whether or not diet has any effect on the mean value of
LDL cholesterol level. If we consider the boxplots in figure 3 of the LDL level for each
diet, there appear to be clear differences between the diets. We start off by fitting
the usual one-way ANOVA model to the data. It is clear from the normal Q-Q plot of
the standardised residuals in figure 4 that the residuals do not satisfy the assumption of
normality. Any inferences drawn from the results of this ANOVA model will not be
valid. Consequently, we perform the Kruskal-Wallis test. The results from this
procedure are also presented below.
2 Data taken from Hettmansperger, T. P. and McKean, J. W. (1998). Robust Nonparametric Statistical Methods: Kendall's
Library of Statistics 5. London: Arnold.
Figure 3: Boxplots of LDL cholesterol level for each of the four diets.

Figure 4: Normal Q-Q plot of the standardised residuals from the one-way ANOVA fit, plotted against the quantiles of the standard normal.
LDL
Chi-Square 7.188
df 3
Asymp. Sig. .066
a. Kruskal Wallis Test
b. Grouping Variable: DIET
The p-value of 0.066 for the Kruskal-Wallis test, although not significant at a 5% level,
indicates some difference between the diets, as suggested by the boxplots in figure 3. A
standard one-way ANOVA gives a p-value of 0.35 for the F-test of equal treatment
effects. This does not agree with the boxplots in figure 3. The reason for this is that the
long right tail of the errors shown in figure 4 adversely affects the test statistic.
Unfortunately, the Kruskal-Wallis test does not extend easily to the problem of multiple
comparisons, which we will introduce in more detail in a later section.
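For reference, the Kruskal-Wallis statistic reported above can be reproduced in Python with SciPy. The sketch below keys in the values from the data table, one list per diet; the assignment of columns to diets, and the placement of the final shorter row, are assumptions.

```python
from scipy import stats

# LDL cholesterol values, one list per diet (taken column-wise from the table).
diet1 = [52, 67, 54, 69, 116, 79, 68, 47, 120, 73]
diet2 = [36, 34, 47, 125, 30, 31, 30, 59, 33, 98]
diet3 = [52, 55, 66, 50, 58, 176, 91, 66, 61, 63]
diet4 = [62, 71, 41, 118, 48, 82, 65, 72, 49]

h, p = stats.kruskal(diet1, diet2, diet3, diet4)
print(f"Kruskal-Wallis H = {h:.3f}, p-value = {p:.3f}")
```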
Introduction
In this section we extend the ideas of the previous section by comparing more than two
treatments, using randomised designs with larger block sizes. In blocked designs there
are two kinds of effects of interest. The first is the treatment effects, which are of
primary interest to the experimenter. The second is the blocks, whose contribution
must be accounted for. In practice, blocks might be, for example, different litters of
animals, blends of chemical material, strips of land, or contiguous periods of time. In
the next section we will consider a replicated factorial design in which the main effects
of two factors and the interaction are all of equal interest.
3 Example taken from Box, Hunter and Hunter (1978). Statistics for Experimenters: An Introduction to Design, Data
Analysis and Model Building. New York: John Wiley.
treatments in random order within each block. The randomised block design is given
in Table 3.
Table 3: The randomised block design (treatments A, B, C and D within each block).
The model for the randomised complete block design is
\[
y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, I;\; j = 1, \ldots, J, \qquad \varepsilon_{ij}\ \text{i.i.d.}\ N(0, \sigma^2).
\]
There are J blocks with I treatments observed within each block. As before the
parameter µ is viewed as a grand mean, α i is an unknown fixed effect for the ith
treatment, and β j is an unknown fixed effect for the jth block. The theoretical basis for
the analysis of this model is precisely as in the balanced one-way ANOVA. As before
the computations can be summarised in an ANOVA table, as we will show in the
following section.
ANOVA table
As was the case for the one-way layout, the results of fitting the model for a complete
randomised block design are typically represented in an ANOVA table, which is a summary of
the modelling procedure and can be calculated using most statistical software packages. As we
are interested in the interpretation rather than theory, we only consider an example of what an
ANOVA table looks like for a complete block design, thereby avoiding unnecessary
mathematics.
4 The superscripts in parentheses associated with the observations indicate the random order in which the
experiments were run within each blend.
Source                         df            SS      MS                        F
Average (correction factor)    1             SSA
Treatments                     I - 1         SST     MST = SST/(I-1)           F_T = MST/MSE
Blocks                         J - 1         SSB     MSB = SSB/(J-1)           F_B = MSB/MSE
Residual (error)               (I-1)(J-1)    SSE     MSE = SSE/((I-1)(J-1))
Total                          IJ            TSS
In the table above SSA, SST, SSB, SSE, TSS, MST, MSB and MSE are simply summaries calculated
from the observed data, similar to what we saw for the one-way ANOVA table.
Furthermore, I is the number of treatments and J is the number of blocks. The two test
statistics are produced in a similar way to the one-way case. We will consider the two
hypotheses involved in a little more detail in the following section.
The F-statistic, $F_T = \mathrm{MST}/\mathrm{MSE}$, is used to test whether there are significant treatment
effects, i.e., it is used to test

$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0$
$H_A:$ at least one $\alpha_i \neq 0$.

If we wish to test $H_0$ at the $100(1-\alpha)\%$ level, the criterion for the test is to reject $H_0$ if
$F_T > F_{\alpha,\,I-1,\,(I-1)(J-1)}$, where $F_{\alpha,\,I-1,\,(I-1)(J-1)}$ is the $100(1-\alpha)$ percentile of an F-distribution with I-1 and (I-1)(J-1) degrees
of freedom and its value can be obtained from standard tables or from a software
package.
Similarly, the F-statistic, $F_B = \mathrm{MSB}/\mathrm{MSE}$, is used to test whether there are significant
block effects, i.e., it is used to test
$H_0: \beta_1 = \beta_2 = \cdots = \beta_J = 0$
$H_A:$ at least one $\beta_j \neq 0$.
The F-statistic provides a test of whether we can isolate comparative differences in the
block effects. Thus, a significant test indicates that blocking was a worthwhile exercise.
If we wish to test $H_0$ at the $100(1-\alpha)\%$ level, then the criterion for the test is to reject
$H_0$ if $F_B > F_{\alpha,\,J-1,\,(I-1)(J-1)}$,
where $F_{\alpha,\,J-1,\,(I-1)(J-1)}$ is the $100(1-\alpha)$ percentile of an F-distribution with J-1 and (I-1)(J-1) degrees
of freedom.
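As an illustration, the randomised block analysis can be carried out in Python with statsmodels; the data frame below is a made-up example with the same layout (one observation per treatment-blend combination), and the column names are purely illustrative.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Illustrative layout: 4 treatments (A-D) observed once in each of 5 blends.
df = pd.DataFrame({
    "y":         [88, 90, 95, 93, 85, 78, 91, 80, 82, 86,
                  88, 84, 87, 93, 90, 83, 80, 82, 79, 89],
    "treatment": list("ABCD") * 5,
    "blend":     [b for b in range(1, 6) for _ in range(4)],
})

# Additive model: response = grand mean + treatment effect + block (blend) effect.
model = smf.ols("y ~ C(treatment) + C(blend)", data=df).fit()
print(sm.stats.anova_lm(model, typ=1))   # F tests for treatments and for blocks
```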
Example
Let us return to the penicillin example that we introduced earlier in this section. We
will fit the randomised block design model on this data set and validate the underlying
assumptions. Below we include the resulting ANOVA table and some diagnostic plots.
The residual plots for the penicillin example in figures 5 and 6 do not reveal anything
of special interest. The assumptions of normality and constant variance for the
residuals seem to be reasonable. Sometimes the plot of the residuals versus the
predicted values shows a curvilinear relationship. For example the residuals may tend
to be positive for low values of ŷ , become negative for intermediate values, and be
positive again for high values. This often suggests nonadditivity between the block and
treatment effects and might be eliminated by a suitable transformation of the response.
However, this is not the case in this example.
Figure 5: Residuals plotted against the fitted values for the penicillin example.

Figure 6: Normal Q-Q plot of the standardised residuals for the penicillin example.
The p-value of 0.339 for the hypothesis of zero treatment effects suggests that the four
different treatments have not resulted in different yields. The variability among the
treatment averages can be reasonably attributed to experimental error. We reject the
null hypothesis of no blend-to-blend variation, as suggested by the small p-value
(0.041); thus there are significant block effects.
Introduction
In this section we are interested in applying two different treatments (each
treatment having a number of levels). We are trying to discover whether there are any
differences between the levels of each treatment and whether the treatments interact. The main
effects of the two treatments and their interaction are all of equal interest. An easy way
to understand this topic is by means of an example. Consider an agricultural
experiment where the investigator is interested in the corn yield when three different
fertilisers are available, and corn is planted in four different soil types. The researcher
would like to establish if the fertiliser has an effect on crop yield, if the soil type has an
effect on crop yield and whether the two treatments interact. The presence of an
interaction in this example means that there may be no difference between fertiliser 1
and fertiliser 2 in soil type 1, but that fertiliser 1 may produce a greater crop yield than
fertiliser 2 in soil type 2. We will only consider balanced designs here, although the
theory extends to non-balanced designs.
ANOVA table
Again we consider the ANOVA table for the above model by avoiding unnecessary
mathematical detail. The general form of the ANOVA table is presented in table 5.
Table 5: ANOVA table for a balanced two-way factorial design with interaction
Source                 df            SS        MS        F
Factor A               a - 1         SSA       MSA       F_A = MSA/MSE
Factor B               b - 1         SSB       MSB       F_B = MSB/MSE
Interaction (A x B)    (a-1)(b-1)    SS(AB)    MS(AB)    F_AB = MS(AB)/MSE
Error                  ab(n-1)       SSE       MSE
Total                                TSS
As before the quantities in the table above are simply summaries calculated from the
observed data, similar to what we saw for the one-way ANOVA table and the block
design. The three test statistics are produced in a similar way to before, and are based
on the same intuitive approach. That is, we are estimating how much of
the overall variation each factor and the interaction explain, compared to the residual
(error) variation. The next subsection will look at the relevant hypotheses in some
more detail.
The F-statistic, $F_A = \mathrm{MSA}/\mathrm{MSE}$, is used to test whether there are significant treatment
effects for factor A, i.e.
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$
$H_A:$ at least one $\alpha_i \neq 0$.
Similar to the one-way ANOVA, the statistic follows an F-distribution on a-1 and
ab(n-1) degrees of freedom. If we wish to test $H_0$ at the $100(1-\alpha)\%$ level, then the
criterion for the test is to reject $H_0$ if $F_A > F_{1-\alpha,\,a-1,\,ab(n-1)}$,
where $F_{1-\alpha,\,a-1,\,ab(n-1)}$ is the $100(1-\alpha)$ percentile of an F-distribution with a-1 and
ab(n-1) degrees of freedom.
Similarly, the F-statistic, $F_B = \mathrm{MSB}/\mathrm{MSE}$, is used to test whether there are significant
treatment effects for factor B, i.e.
$H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0$
$H_A:$ at least one $\beta_j \neq 0$.
If we wish to test $H_0$ at the $100(1-\alpha)\%$ level, then the criterion for the test is to reject
$H_0$ if $F_B > F_{1-\alpha,\,b-1,\,ab(n-1)}$,
where $F_{1-\alpha,\,b-1,\,ab(n-1)}$ is the $100(1-\alpha)$ percentile of an F-distribution with b-1 and
ab(n-1) degrees of freedom.
The third hypothesis that we want to test is that of no significant interaction effects.
The F-statistic, $F_{AB} = \mathrm{MS(AB)}/\mathrm{MSE}$, provides us with the basis for doing that. The
corresponding hypothesis can be formulated as
$H_0: (\alpha\beta)_{ij} = 0, \quad i = 1, 2, \ldots, a;\; j = 1, 2, \ldots, b$
$H_A:$ at least one $(\alpha\beta)_{ij} \neq 0$.
When we test the null hypothesis of no interaction at the $100(1-\alpha)\%$ level, the criterion
for the test is to reject $H_0$ if $F_{AB} > F_{1-\alpha,\,(a-1)(b-1),\,ab(n-1)}$,
where $F_{1-\alpha,\,(a-1)(b-1),\,ab(n-1)}$ is the $100(1-\alpha)$ percentile of an F-distribution with (a-1)(b-1) and
ab(n-1) degrees of freedom. Normally all these values are provided by the software
package and thus there is no need to calculate them yourself.
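A sketch of how the balanced two-way model with interaction could be fitted in Python with statsmodels (the data frame, file name and column names are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed layout: one row per observation, with columns hemoglobin (response),
# method ('A' or 'B') and rate (1-4); the file name is hypothetical.
df = pd.read_csv("trout_hemoglobin.csv")

# 'C(method) * C(rate)' expands to both main effects plus their interaction.
model = smf.ols("hemoglobin ~ C(method) * C(rate)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F tests for method, rate and method:rate
```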
Example
Our example involves an experiment in which haemoglobin levels in the blood of
brown trout were measured after treatment with four rates of sulfamerazine. Two
methods of administering the sulfamerazine were used. Ten fish were measured for
each rate and each method. The data are given in Table 6.
Table 6: Haemoglobin levels for each method (rows) at rates 1 to 4 (columns).
If we now fit a two-way ANOVA to the data we obtain the following results:
5 This data is taken from Rencher, A. C. (2000). Linear Models in Statistics. New York: Wiley.
ANOVA output
The approach is generally to first test the hypothesis that there is an interaction, since
the significance of the main effects cannot be tested in the presence of an interaction.
If there is a significant interaction we can use something called an interaction plot to
allow us to examine the interaction and explain what is happening. If there is no
interaction we then consider testing for the effects of the two treatments.
In the above example the interaction term is not significant. If we just consider the
effect of method on its own, it is clearly non-significant, whereas the effect of
rate on its own is highly significant. The underlying normality and constant variance
assumptions have to be verified in order for the ANOVA results to be valid. This has
been done for this example, but the results are not reported here.
Interactions
Although the interaction term is not significant in the above example, we present two
interaction plots from the example to illustrate how interaction effects can be
graphically represented. An interaction plot basically plots the mean of each level of
one treatment variable, at each level of the other treatment variable. If all the means
follow the same general pattern, then there will be no interaction. If some levels follow
different patterns, there will be an interaction. In the haemoglobin example we get the
two plots in Figure 7.
Figure 7: Interaction plots for the haemoglobin example: mean haemoglobin level plotted against method for each rate (left) and against rate for each method (right).
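An interaction plot of this kind can be produced, for example, with the interaction_plot helper in statsmodels; the sketch below uses a small made-up data set with the same structure.

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.factorplots import interaction_plot

# Tiny illustrative data set: two methods, four rates, two fish per cell.
df = pd.DataFrame({
    "method": ["A"] * 8 + ["B"] * 8,
    "rate":   [1, 1, 2, 2, 3, 3, 4, 4] * 2,
    "hemoglobin": [7.0, 7.4, 9.8, 10.1, 10.4, 10.7, 9.9, 10.2,
                   7.8, 8.0, 8.4, 8.6, 9.0, 9.3, 9.7, 9.5],
})

# Mean haemoglobin at each rate, one line per method; roughly parallel lines
# suggest little evidence of an interaction.
fig, ax = plt.subplots()
interaction_plot(x=df["rate"], trace=df["method"], response=df["hemoglobin"],
                 ax=ax, xlabel="rate", ylabel="mean of hemoglobin")
plt.show()
```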
The means in the above plots all follow the same general pattern, as we would expect
from the non-significant interaction term. Interaction plots can be very
useful for exploring the practical importance of interactions between two variables. As an
example, let us consider the following hypothetical graphs in Figure 8.
Figure 8: Four hypothetical interaction plots of average response against factor A, with separate lines for levels 1, 2 and 3 of factor B, illustrating different combinations of main effects and interaction.
These graphs make quite clear the effect that a significant main effect of a variable, or
an interaction, would have on the interaction plot.
You should always pay particular attention to interaction terms and their
interpretation. In particular, you should never consider a model that has an interaction
between two variables without having both of the variables included; not including the
main effects of an interaction does not make intuitive sense and is dangerous, although
a lot of software packages will allow you to do this. You should also be careful when
interpreting the results of any particular modelling procedure: if you have a significant
interaction between two variables, you cannot say anything about the effect of one
variable on the response by itself; you can, however, say that your variable of interest,
together with its interaction with another variable, has an effect on the response. In many
observational studies, interactions can often be one of the most interesting findings, since
they imply that the two variables do not act in isolation on the response; they act together,
and this joint action can give a researcher great insight into the effect of the variables on
the response.
We initially consider an interaction plot for this data to see whether there is any
evidence that there may be an interaction.
Interaction plot of the mean score against cognitive style (FD, FI), with one line for each study technique (SN, CO, PO, NN).
It appears from the above plot that there is an interaction between cognitive style and
study technique, so we consider the ANOVA with the interaction term included. The
ANOVA table is as follows:
6 The data were collected by Frank (1984) “Effects of field independence-dependence and study technique on
learning from a lecture” Amer.Educ.Res.J., 21, 669-678
ANOVA output
Clearly from this table, the interaction term is significant. It is now of interest to
perform an analysis of effects. In this example we are interested in testing for
differences in each of the four study techniques. However, as these techniques occur in
each of the two cognitive style groups we will have to carry out the multiple
comparisons in each group. The results from this analysis are given below. The
interesting result from this analysis is that it appears that the differences between study
techniques are the same in each of the cognitive style groups, suggesting that
most of the differences in the response variable come from differences in study
technique rather than cognitive style.
95 % simultaneous confidence intervals for specified
linear combinations, by the Sidak method
CO.adj1-NN.adj1 ( )
CO.adj1-PO.adj1 ( )
CO.adj1-SN.adj1 ( )
NN.adj1-PO.adj1 ( )
NN.adj1-SN.adj1 ( )
PO.adj1-SN.adj1 ( )
CO.adj2-NN.adj2 ( )
CO.adj2-PO.adj2 ( )
CO.adj2-SN.adj2 ( )
NN.adj2-PO.adj2 ( )
NN.adj2-SN.adj2 ( )
PO.adj2-SN.adj2 ( )
simultaneous 95 % confidence limits, Sidak method
response variable: cogstyle
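The Sidak method controls the family-wise error rate by testing each of the m comparisons at an adjusted level of 1 − (1 − α)^(1/m), which is slightly less conservative than Bonferroni's α/m; a one-line illustration:

```python
alpha, m = 0.05, 12                          # twelve pairwise comparisons above
sidak = 1 - (1 - alpha) ** (1 / m)           # per-comparison level, Sidak
bonferroni = alpha / m                       # per-comparison level, Bonferroni
print(f"Sidak: {sidak:.5f}, Bonferroni: {bonferroni:.5f}")
```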
Analysis of covariance
Introduction
Analysis of covariance incorporates one or more regression variables into an analysis of
variance. The regression variables are typically continuous and are referred to as
covariates, hence the name analysis of covariance. In this course we will only examine
the use of a single covariate. We discuss the analysis of covariance by means of an
example that involves a one-way analysis of variance and a covariate.
Example
We consider data on the body weights (in kilograms) and heart weights (in grams) for
domestic cats of both sexes that were given digitalis. Part of the original data is
presented in Table 7 7, 8. The primary interest is to determine whether females’ heart
weights differ from males’ when both have received digitalis. A first step in the
analysis might be to fit a one-way ANOVA model by ignoring the body weight
variable. Such a model would be given by
\[
y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, 2;\; j = 1, \ldots, 24, \qquad \varepsilon_{ij}\ \text{i.i.d.}\ N(0, \sigma^2),
\]
where the $y_{ij}$ are the heart weights. The ANOVA output for this model is given below.
Table 7: Body weights (kg) and heart weights (g) of domestic cats with digitalis.
7 The data was originally published by Fisher, R. A. (1947). The analysis of covariance method for the relation
between a part and the whole. Biometrics, 3, 65-68.
8 The data is taken from Christensen, R. (1996). Analysis of Variance, Design and Regression: Applied statistical methods.
New York: Chapman & Hall. Here a subset of the original data is used to illustrate the application of the analysis of
covariance.
ANOVA: BRAIN

                   Sum of Squares    df    Mean Square        F      Sig.
Between Groups             56.117     1         56.117   23.444     .000
Within Groups             110.106    46          2.394
Total                     166.223    47
We see that the effect due to sexes is highly significant in the one-way ANOVA. Until
now we have not made use of the extra value of body weight that we have in addition
to the heart weight variable. It is natural to consider whether effective use can be made
of this value.
In order to make use of the body weight observations, we add a regression term to the
one-way ANOVA model above and fit the traditional analysis of covariance model,
\[
y_{ij} = \mu + \alpha_i + \gamma z_{ij} + \varepsilon_{ij}, \qquad i = 1, 2;\; j = 1, \ldots, 24, \qquad \varepsilon_{ij}\ \text{i.i.d.}\ N(0, \sigma^2).
\]
In the above model the z ij ’s are the body weights and γ is a slope parameter
associated with body weights. Note that this model is an extension of the simple linear
regression model between the $y$'s and $z$'s in which we allow a different intercept $\mu_i$ for
each sex. The ANOVA table for this model is given below.
The interpretation of the above ANOVA table is different from the earlier ANOVA
tables. We note that the sum of squares for body weights, sex and error do not add up
to the total sum of squares, for example. The sums of squares in the above ANOVA
table are referred to as adjusted sums of squares because the body weight sum of squares
is adjusted for sexes and the sex sum of squares is adjusted for body weights. We do
not discuss the computation here. Interested readers can refer to Christensen (1996)
for the computational details. The error line in the table above is simply the error from
fitting the covariance model. The only difference between the one-way model and the
covariance model is that the one-way model does not involve the regression on body
weights, so by comparing the two models we are testing whether there is a significant effect due
to the regression on body weights. The standard way of comparing a full and a
reduced model is by comparing their error terms. We see from the ANOVA table
above that there is a major effect due to the regression on body weights. Figures 9, 10
and 11 contain residual plots for the covariance model. The plot of the residuals versus
the predicted values (figure 9) looks good, while figure 10 shows slightly less
variability for females than males. The difference is not very large though and we need
perhaps not worry about it too much. The normal plot of the residuals (figure 11) also
seems reasonably acceptable.
Figure 9: Residuals plotted against the fitted values for the covariance model.
Figure 10: Box plots of the residuals of the covariance model, by sex.
Figure 11: Normal Q-Q plot of the standardised residuals of the covariance model.
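A sketch in Python of how the covariance model could be fitted and compared with the plain one-way model using statsmodels (the file name and column names are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed columns: heart (heart weight, g), body (body weight, kg), sex.
cats = pd.read_csv("cat_hearts.csv")                         # hypothetical file

reduced = smf.ols("heart ~ C(sex)", data=cats).fit()         # one-way ANOVA
full = smf.ols("heart ~ C(sex) + body", data=cats).fit()     # covariance model

# Adjusted (Type II) sums of squares: sex adjusted for body weight and vice versa.
print(sm.stats.anova_lm(full, typ=2))

# F test comparing the reduced and full models, i.e. a test of whether the
# regression on body weight is needed.
print(sm.stats.anova_lm(reduced, full))
```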
In a designed experiment, we want to investigate the effects of the treatments and not
the treatments adjusted for some covariates. Cox (1958) refers to a supplementary
observation that may be used to increase precision as a concomitant observation. It is
stated that an important condition for the use of a concomitant observation is that after
its use, estimated effects for the desired main observation shall still be obtained. This
condition means that the concomitant observations should be unaffected by the
treatments. In practice this means that either the concomitant observations are taken
before the assignment of the treatments, or the concomitant observations are made
after the assignment of the treatments, but before the effect of treatments has had time
to develop. These requirements on the nature of covariates in a designed experiment
are imposed so that the treatment effects do not depend on the presence or absence of
the covariates in the analysis. The treatment effects are then logically independent of
the covariates, regardless of whether the covariates are measured or incorporated in the analysis.
References
Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978). Statistics for Experimenters: An
Introduction to Design, Data Analysis, and Model Building. New York: John Wiley.
Fisher, R. A. (1947). The analysis of covariance method for the relation between a part
and the whole. Biometrics, 3, 65-68.