Anova

9-1
COMPLETE BUSINESS STATISTICS
by AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN
6th edition.
9-2
Chapter 9
Analysis of Variance
9-3
9 Analysis of Variance
Using Statistics
The Hypothesis Test of Analysis of Variance
The Theory and Computations of ANOVA
The ANOVA Table and Examples
Further Analysis
Models, Factors, and Designs
Two-Way Analysis of Variance
Blocking Designs
9-4
9 LEARNING OBJECTIVES
After studying this chapter you should be able to:
Explain the purpose of ANOVA
Describe the model and computations behind ANOVA
Explain the test statistic F
Conduct a 1-way ANOVA
Report ANOVA results in an ANOVA table
Apply the Tukey test for pairwise comparisons
Conduct a 2-way ANOVA
Explain blocking designs
Apply templates to conduct 1-way and 2-way ANOVA
9-5
9-1 Using Statistics

ANOVA (ANalysis Of VAriance) is a statistical method for determining the existence of differences among several population means.
ANOVA is designed to detect differences among means from populations subject to different treatments.
ANOVA is a joint test: the equality of several population means is tested simultaneously, or jointly.
ANOVA tests for the equality of several population means by looking at two estimators of the population variance (hence, analysis of variance).
9-6
9-2 The Hypothesis Test of Analysis of Variance
In an analysis of variance:
We have r independent random samples, each one corresponding to a population subject to a different treatment.
We have:
$n = n_1 + n_2 + n_3 + \dots + n_r$ total observations.
r sample means: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots, \bar{x}_r$
These r sample means can be used to calculate an estimator of the population variance. If the population means are equal, we expect the variance among the sample means to be small.
r sample variances: $s_1^2, s_2^2, s_3^2, \dots, s_r^2$
These sample variances can be used to find a pooled estimator of the population variance.
9-7
9-2 The Hypothesis Test of Analysis of Variance (continued): Assumptions
We assume independent random sampling from each of the r populations.
We assume that the r populations under study:
are normally distributed,
with means $\mu_i$ that may or may not be equal,
but with equal variances, $\sigma_i^2 = \sigma^2$.
[Figure: three normal populations (Population 1, 2, 3) with means $\mu_1$, $\mu_2$, $\mu_3$ and common standard deviation $\sigma$]
9-8
9-2 The Hypothesis Test of Analysis of Variance (continued)
The hypothesis test of analysis of variance:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \dots = \mu_r$
$H_1$: Not all $\mu_i$ ($i = 1, \dots, r$) are equal
The test statistic of analysis of variance:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$
That is, the test statistic in an analysis of variance is the ratio of two estimators of a population variance, and therefore, when the null hypothesis is true, it follows an F distribution with (r-1) degrees of freedom in the numerator and (n-r) degrees of freedom in the denominator.
9-9
When the Null Hypothesis Is True

When the null hypothesis is true, $H_0: \mu_1 = \mu_2 = \mu_3$:
We would expect the sample means to be nearly equal, as in this illustration. And we would expect the variation among the sample means (between sample) to be small, relative to the variation found around the individual sample means (within sample).
If the null hypothesis is true, the numerator in the test statistic is expected to be small, relative to the denominator:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$

9-10
When the Null Hypothesis Is False
When the null hypothesis is false:
$\mu_1$ is equal to $\mu_2$ but not to $\mu_3$,
$\mu_1$ is equal to $\mu_3$ but not to $\mu_2$,
$\mu_2$ is equal to $\mu_3$ but not to $\mu_1$, or
$\mu_1$, $\mu_2$, and $\mu_3$ are all unequal.
In any of these situations, we would not expect the sample means to all be nearly equal. We would expect the variation among the sample means (between sample) to be large, relative to the variation around the individual sample means (within sample).
If the null hypothesis is false, the numerator in the test statistic is expected to be large, relative to the denominator:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$

9-11
The ANOVA Test Statistic for r = 4 Populations and n = 54 Total Sample Observations
Suppose we have 4 populations, from each of which we draw an independent random sample, with $n_1 + n_2 + n_3 + n_4 = 54$. Then our test statistic is:
$$F_{(4-1,\,54-4)} = F_{(3,\,50)} = \frac{\text{Estimate of variance based on means from 4 samples}}{\text{Estimate of variance based on all 54 sample observations}}$$
[Figure: F distribution with 3 and 50 degrees of freedom; the right-tail area $\alpha = 0.05$ lies beyond the critical point 2.79]
The nonrejection region (for $\alpha = 0.05$) in this instance is $F \le 2.79$, and the rejection region is $F > 2.79$. If the test statistic is less than 2.79, we would not reject the null hypothesis: the data would not provide evidence that the 4 population means differ. If the test statistic is greater than 2.79, we would reject the null hypothesis and conclude that the four population means are not all equal.
9-12
Example 9-1
Randomly chosen groups of customers were served different types of coffee and asked to rate the
coffee on a scale of 0 to 100: 21 were served pure Brazilian coffee, 20 were served pure Colombian
coffee, and 22 were served pure African-grown coffee.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all three means are equal
$n_1 = 21$, $n_2 = 20$, $n_3 = 22$, $n = 21 + 20 + 22 = 63$, $r = 3$
The critical point for $\alpha = 0.05$ is:
$$F_{(r-1,\,n-r)} = F_{(3-1,\,63-3)} = F_{(2,\,60)} = 3.15$$
The resulting test statistic was $F = 2.02$. Since $F = 2.02 < F_{(2,\,60)} = 3.15$, $H_0$ cannot be rejected, and we cannot conclude that any of the population means differs significantly from the others.
[Figure: F distribution with 2 and 60 degrees of freedom; test statistic = 2.02, critical point $F_{(2,60)} = 3.15$ for $\alpha = 0.05$]
9-13
9-3 The Theory and the Computations of ANOVA: The Grand Mean
The grand mean, $\bar{\bar{x}}$, is the mean of all $n = n_1 + n_2 + n_3 + \dots + n_r$ observations in all r samples.
The mean of sample i (i = 1, 2, 3, ..., r):
$$\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i}$$
The grand mean, the mean of all data points:
$$\bar{\bar{x}} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} x_{ij}}{n} = \frac{\sum_{i=1}^{r} n_i\,\bar{x}_i}{n}$$
where $x_{ij}$ is the particular data point in position j within the sample from population i.
The subscript i denotes the population, or treatment, and runs from 1 to r. The subscript j denotes the data point within the sample from population i; thus, j runs from 1 to $n_i$.
9-14
Using the Grand Mean: Table 9-1
Treatment (i)      Sample point (j)   Value ($x_{ij}$)
i = 1 Triangle     1                  4
Triangle           2                  5
Triangle           3                  7
Triangle           4                  8
Mean of triangles                     6
i = 2 Square       1                  10
Square             2                  11
Square             3                  12
Square             4                  13
Mean of squares                       11.5
i = 3 Circle       1                  1
Circle             2                  2
Circle             3                  3
Mean of circles                       2
Grand mean of all data points         6.909
[Figure: the data points on a number line with sample means $\bar{x}_1 = 6$, $\bar{x}_2 = 11.5$, $\bar{x}_3 = 2$ and grand mean $\bar{\bar{x}} = 6.909$, marking the distance from each data point to its sample mean and the distance from each sample mean to the grand mean]
If the r population means are different (that is, at
least two of the population means are not equal),
then it is likely that the variation of the data
points about their respective sample means
(within sample variation) will be small relative
to the variation of the r sample means about the
grand mean (between sample variation).
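The two mean formulas can be checked against Table 9-1 with a short Python sketch. The names `samples`, `sample_mean`, and `grand_mean` are illustrative, not notation from the text; the grand mean is computed through the weighted form (sum of n_i times the sample mean, divided by n), which should agree with the plain average of all 11 points.

```python
# Table 9-1 data: three treatments (triangles, squares, circles)
samples = {
    "triangle": [4, 5, 7, 8],
    "square": [10, 11, 12, 13],
    "circle": [1, 2, 3],
}


def sample_mean(values):
    """Mean of one sample: the sum of x_ij over j, divided by n_i."""
    return sum(values) / len(values)


def grand_mean(groups):
    """Mean of all n data points, computed as sum(n_i * xbar_i) / n."""
    n = sum(len(v) for v in groups.values())
    return sum(len(v) * sample_mean(v) for v in groups.values()) / n


means = {name: sample_mean(v) for name, v in samples.items()}
print(means)                          # {'triangle': 6.0, 'square': 11.5, 'circle': 2.0}
print(round(grand_mean(samples), 3))  # 6.909
```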
9-15
The Theory and Computations of ANOVA: Error Deviation and Treatment Deviation
We define an error deviation as the difference between a data point and its sample mean. Errors are denoted by e, and we have:
$$e_{ij} = x_{ij} - \bar{x}_i$$
We define a treatment deviation as the deviation of a sample mean from the grand mean. Treatment deviations, $t_i$, are given by:
$$t_i = \bar{x}_i - \bar{\bar{x}}$$
The ANOVA principle says:
When the population means are not equal, the average error (within sample) is relatively small compared with the average treatment (between sample) deviation.
9-16
The Theory and Computations of ANOVA: The Total Deviation
The total deviation ($Tot_{ij}$) is the difference between a data point ($x_{ij}$) and the grand mean ($\bar{\bar{x}}$):
$$Tot_{ij} = x_{ij} - \bar{\bar{x}}$$
For any data point $x_{ij}$:
$$Tot_{ij} = t_i + e_{ij}$$
That is:
Total Deviation = Treatment Deviation + Error Deviation
Consider data point $x_{24} = 13$ from Table 9-1. The mean of sample 2 is 11.5, and the grand mean is 6.909, so:
$$e_{24} = x_{24} - \bar{x}_2 = 13 - 11.5 = 1.5$$
$$t_2 = \bar{x}_2 - \bar{\bar{x}} = 11.5 - 6.909 = 4.591$$
$$Tot_{24} = t_2 + e_{24} = 4.591 + 1.5 = 6.091$$
or
$$Tot_{24} = x_{24} - \bar{\bar{x}} = 13 - 6.909 = 6.091$$
[Figure: for $x_{24} = 13$, the total deviation $Tot_{24} = 6.091$ from the grand mean $\bar{\bar{x}} = 6.909$, split into the treatment deviation $t_2 = 4.591$ (from $\bar{\bar{x}}$ to $\bar{x}_2 = 11.5$) and the error deviation $e_{24} = 1.5$ (from $\bar{x}_2$ to $x_{24}$)]
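A minimal Python sketch can reproduce the worked numbers for $x_{24}$ and confirm that the decomposition Tot = t + e holds for every point of Table 9-1, not only the one shown. The helper `deviations` is a hypothetical name chosen for this illustration.

```python
samples = {
    "triangle": [4, 5, 7, 8],
    "square": [10, 11, 12, 13],
    "circle": [1, 2, 3],
}   # Table 9-1
n = sum(len(v) for v in samples.values())
grand = sum(sum(v) for v in samples.values()) / n   # grand mean


def deviations(values):
    """Return t_i and a list of (x_ij, e_ij, Tot_ij) for one sample."""
    xbar = sum(values) / len(values)
    t = xbar - grand                                    # treatment deviation
    rows = [(x, x - xbar, x - grand) for x in values]   # error and total deviations
    return t, rows


t2, rows2 = deviations(samples["square"])
x24, e24, tot24 = rows2[3]          # fourth point of sample 2
print(x24, e24, round(t2, 3), round(tot24, 3))   # 13 1.5 4.591 6.091

# Tot_ij = t_i + e_ij for every data point (up to floating-point rounding)
for values in samples.values():
    t, rows = deviations(values)
    assert all(abs(tot - (t + e)) < 1e-12 for _, e, tot in rows)
```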
9-17

The Theory and Computations of ANOVA: Squared Deviations
Total Deviation = Treatment Deviation + Error Deviation
The total deviation is the sum of the treatment deviation and the error deviation:
$$t_i + e_{ij} = (\bar{x}_i - \bar{\bar{x}}) + (x_{ij} - \bar{x}_i) = (x_{ij} - \bar{\bar{x}}) = Tot_{ij}$$
Notice that the sample mean term ($\bar{x}_i$) cancels out in the above addition, which simplifies the equation.
Squared Deviations
$$t_i^2 + e_{ij}^2 = (\bar{x}_i - \bar{\bar{x}})^2 + (x_{ij} - \bar{x}_i)^2$$
$$Tot_{ij}^2 = (x_{ij} - \bar{\bar{x}})^2$$
9-18

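The Sum of Squares Principle
Sums of Squared Deviations
$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} Tot_{ij}^2 = \sum_{i=1}^{r} n_i t_i^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} e_{ij}^2$$
$$\underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2}_{SST} = \underbrace{\sum_{i=1}^{r} n_i(\bar{x}_i - \bar{\bar{x}})^2}_{SSTR} + \underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}_{SSE}$$
The squared deviations do not add point by point, but their sums do: the cross-product term $2\sum_{i=1}^{r} t_i \sum_{j=1}^{n_i} e_{ij}$ vanishes because the errors within each sample sum to zero.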
The Sum of Squares Principle

The total sum of squares (SST) is the sum of two terms: the sum of
squares for treatment (SSTR) and the sum of squares for error (SSE).
SST = SSTR + SSE
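The identity SST = SSTR + SSE can be verified numerically. The sketch below, with an illustrative function name `sums_of_squares`, computes each sum of squares directly from its defining formula for the Table 9-1 data.

```python
samples = [[4, 5, 7, 8], [10, 11, 12, 13], [1, 2, 3]]   # Table 9-1


def sums_of_squares(groups):
    """Return (SST, SSTR, SSE) for a one-way layout, each from its definition."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    sst = sum((x - grand) ** 2 for g in groups for x in g)
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return sst, sstr, sse


sst, sstr, sse = sums_of_squares(samples)
print(round(sst, 3), round(sstr, 3), round(sse, 3))   # 176.909 159.909 17.0
assert abs(sst - (sstr + sse)) < 1e-9                 # SST = SSTR + SSE
```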
9-19

Picturing The Sum of Squares Principle
[Figure: the total sum of squares SST divided into SSTR and SSE]
SST measures the total variation in the data set, the variation of all individual data points from the grand mean.
SSTR measures the explained variation, the variation of individual sample means from the grand mean. It is that part of the variation that is possibly expected, or explained, because the data points are drawn from different populations. It is the variation between groups of data points.
SSE measures unexplained variation, the variation within each group that cannot be explained by possible differences between the groups.
9-20

ANOVA: Degrees of Freedom
The number of degrees of freedom associated with SST is (n - 1):
n total observations in all r groups, less one degree of freedom lost with the calculation of the grand mean.
The number of degrees of freedom associated with SSTR is (r - 1):
r sample means, less one degree of freedom lost with the calculation of the grand mean.
The number of degrees of freedom associated with SSE is (n - r):
n total observations in all groups, less one degree of freedom lost with the calculation of the sample mean of each of the r groups (r degrees of freedom in all).
The degrees of freedom are additive in the same way as are the sums of squares:
df(total) = df(treatment) + df(error)
(n - 1) = (r - 1) + (n - r)
9-21

ANOVA: The Mean Squares
Recall that the calculation of the sample variance involves the division of the sum of squared deviations from the sample mean by the number of degrees of freedom. This principle is applied as well to find the mean squared deviations within the analysis of variance.
Mean square treatment (MSTR):
$$MSTR = \frac{SSTR}{r-1}$$
Mean square error (MSE):
$$MSE = \frac{SSE}{n-r}$$
Mean square total (MST):
$$MST = \frac{SST}{n-1}$$
(Note that the additive properties of sums of squares do not extend to the mean squares: $MST \ne MSTR + MSE$.)
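Applying the three mean-square formulas to the Table 9-1 sums of squares shows directly that the mean squares are not additive. In this sketch the sums of squares are typed in, rounded to six decimal places, rather than recomputed.

```python
# Table 9-1: sums of squares from the sum-of-squares principle (rounded to 6 places)
sst, sstr, sse = 176.909091, 159.909091, 17.0
n, r = 11, 3                # 11 observations in 3 treatment groups

mstr = sstr / (r - 1)       # mean square treatment
mse = sse / (n - r)         # mean square error
mst = sst / (n - 1)         # mean square total

print(round(mstr, 2), mse, round(mst, 2))   # 79.95 2.125 17.69
print(round(mstr + mse, 2))                 # 82.08, which is not MST
```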
9-22

ANOVA: The Expected Mean Squares
$$E(MSE) = \sigma^2$$
and
$$E(MSTR) = \sigma^2 + \frac{\sum_{i=1}^{r} n_i(\mu_i - \mu)^2}{r-1}$$
so that $E(MSTR) = \sigma^2$ when the null hypothesis is true, and $E(MSTR) > \sigma^2$ when the null hypothesis is false,
where $\mu_i$ is the mean of population i and $\mu$ is the combined mean of all r populations.
That is, the expected mean square error (MSE) is simply the common population variance (remember the assumption of equal population variances), but the expected mean square for treatment (MSTR) is the common population variance plus a term related to the variation of the individual population means around the grand population mean.
If the null hypothesis is true so that the population means are all equal, the second term in the E(MSTR) formulation is zero, and E(MSTR) is equal to the common population variance.
9-23
Expected Mean Squares and the ANOVA Principle
When the null hypothesis of ANOVA is true and all r population means are equal, MSTR and MSE are two independent, unbiased estimators of the common population variance $\sigma^2$.
On the other hand, when the null hypothesis is false, MSTR will tend to be larger than MSE.
So the ratio MSTR/MSE can be used as an indicator of the equality or inequality of the r population means.
This ratio (MSTR/MSE) will tend to be near 1 if the null hypothesis is true, and greater than 1 if the null hypothesis is false. The ANOVA test, finally, is a test of whether (MSTR/MSE) is equal to, or greater than, 1.
9-24

ANOVA: The F Statistic
Under the assumptions of ANOVA, the ratio MSTR/MSE possesses an F distribution with (r-1) degrees of freedom for the numerator and (n-r) degrees of freedom for the denominator when the null hypothesis is true.
The test statistic in analysis of variance:
$$F_{(r-1,\,n-r)} = \frac{MSTR}{MSE}$$
9-25
9-4 The ANOVA Table and Examples

Treatment (i)   Value ($x_{ij}$)   $(x_{ij} - \bar{x}_i)$   $(x_{ij} - \bar{x}_i)^2$
Triangle        4                  -2                       4
Triangle        5                  -1                       1
Triangle        7                   1                       1
Triangle        8                   2                       4
Square          10                 -1.5                     2.25
Square          11                 -0.5                     0.25
Square          12                  0.5                     0.25
Square          13                  1.5                     2.25
Circle          1                  -1                       1
Circle          2                   0                       0
Circle          3                   1                       1
Sum of squared errors                                       17

Treatment   $(\bar{x}_i - \bar{\bar{x}})$   $(\bar{x}_i - \bar{\bar{x}})^2$   $n_i(\bar{x}_i - \bar{\bar{x}})^2$
Triangle    -0.909                          0.826281                          3.305124
Square       4.591                          21.077281                         84.309124
Circle      -4.909                          24.098281                         72.294843
Sum                                                                           159.909091

$$SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = 17$$
$$SSTR = \sum_{i=1}^{r} n_i(\bar{x}_i - \bar{\bar{x}})^2 = 159.9$$
$$MSTR = \frac{SSTR}{r-1} = \frac{159.9}{3-1} = 79.95$$
$$MSE = \frac{SSE}{n-r} = \frac{17}{8} = 2.125$$
$$F_{(2,\,8)} = \frac{MSTR}{MSE} = \frac{79.95}{2.125} = 37.62$$
Critical point ($\alpha = 0.01$): 8.65
$H_0$ may be rejected at the 0.01 level of significance.
9-26
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F Ratio
Treatment             SSTR = 159.9     (r-1) = 2            MSTR = 79.95   37.62
Error                 SSE = 17.0       (n-r) = 8            MSE = 2.125
Total                 SST = 176.9      (n-1) = 10           MST = 17.69
The ANOVA Table summarizes the ANOVA calculations.
[Figure: F distribution for 2 and 8 degrees of freedom; computed test statistic = 37.62, critical point 8.65 for $\alpha = 0.01$]
In this instance, since the test statistic is greater than the critical point for an $\alpha = 0.01$ level of significance, the null hypothesis may be rejected, and we may conclude that the means for triangles, squares, and circles are not all equal.
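The whole table can be produced in one pass. The sketch below uses a hypothetical helper `one_way_anova`; because the Python standard library has no F distribution, the critical point 8.65 is typed in from the slide rather than computed.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    n = sum(len(g) for g in groups)          # total observations
    r = len(groups)                          # number of treatments
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sstr + sse                         # sum of squares principle
    mstr, mse = sstr / (r - 1), sse / (n - r)
    return {
        "SSTR": sstr, "SSE": sse, "SST": sst,
        "df": (r - 1, n - r, n - 1),
        "MSTR": mstr, "MSE": mse, "MST": sst / (n - 1),
        "F": mstr / mse,
    }


table = one_way_anova([[4, 5, 7, 8], [10, 11, 12, 13], [1, 2, 3]])  # Table 9-1
critical = 8.65   # F(2, 8) critical point for alpha = 0.01, as read from the slide

print(table["df"], round(table["F"], 2))   # (2, 8, 10) 37.63
print("reject H0" if table["F"] > critical else "do not reject H0")   # reject H0
```

The unrounded F is about 37.63; the table's 37.62 comes from dividing the rounded MSTR = 79.95 by MSE. Either way the decision is the same.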
9-27
Template Output
9-28
Example 9-2: Club Med

Club Med has conducted a test to determine whether its Caribbean resorts are equally well liked by
vacationing club members. The analysis was based on a survey questionnaire (general satisfaction,
on a scale from 0 to 100) filled out by a random sample of 40 respondents from each of 5 resorts.
Resort            Mean Response ($\bar{x}_i$)
Guadeloupe        89
Martinique        75
Eleuthra          73
Paradise Island   91
St. Lucia         85

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F Ratio
Treatment             SSTR = 14208     (r-1) = 4            MSTR = 3552    7.04
Error                 SSE = 98356      (n-r) = 195          MSE = 504.39
Total                 SST = 112564     (n-1) = 199          MST = 565.65

[Figure: F distribution with 4 and 200 degrees of freedom; computed test statistic = 7.04, critical point 3.41 for $\alpha = 0.01$]
The resultant F ratio is larger than the critical point for $\alpha = 0.01$, so the null hypothesis may be rejected.
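When only the sums of squares are reported, the rest of the table follows from the degrees of freedom. This sketch (the helper name `anova_from_ss` is illustrative) rebuilds the Club Med mean squares and F ratio; the critical point 3.41 is copied from the slide, not computed.

```python
def anova_from_ss(sstr, sse, r, n):
    """Complete a one-way ANOVA table from SSTR, SSE, r groups, and n observations."""
    mstr = sstr / (r - 1)
    mse = sse / (n - r)
    return {"MSTR": mstr, "MSE": mse, "MST": (sstr + sse) / (n - 1), "F": mstr / mse}


club_med = anova_from_ss(sstr=14208, sse=98356, r=5, n=200)   # 5 resorts x 40 respondents
print({k: round(v, 2) for k, v in club_med.items()})
# {'MSTR': 3552.0, 'MSE': 504.39, 'MST': 565.65, 'F': 7.04}
print(club_med["F"] > 3.41)   # True: exceeds the slide's alpha = 0.01 critical point
```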
9-29
Example 9-3: Job Involvement

Source of Variation   Sum of Squares    Degrees of Freedom   Mean Square    F Ratio
Treatment             SSTR = 879.3      (r-1) = 3            MSTR = 293.1   8.52
Error                 SSE = 18541.6     (n-r) = 539          MSE = 34.4
Total                 SST = 19420.9     (n-1) = 542          MST = 35.83
Given the total number of observations (n = 543), the number of groups (r = 4), the MSE (34.4), and the F ratio (8.52), the remainder of the ANOVA table can be completed. The critical point of the F distribution for $\alpha = 0.01$ and (3, 400) degrees of freedom (a tabulated value close to the actual (3, 539) degrees of freedom) is 3.83. The test statistic in this example is much larger than this critical point, so the p-value associated with this test statistic is less than 0.01, and the null hypothesis may be rejected.
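The back-calculation described above runs in a fixed order: SSE from MSE, MSTR from F, SSTR from MSTR, then the totals. A minimal sketch:

```python
n, r = 543, 4               # total observations and number of groups
mse, f_ratio = 34.4, 8.52   # values given in the example

df_tr, df_err, df_tot = r - 1, n - r, n - 1
sse = mse * df_err          # SSE = MSE * (n - r)
mstr = f_ratio * mse        # MSTR = F * MSE
sstr = mstr * df_tr         # SSTR = MSTR * (r - 1)
sst = sstr + sse            # SST = SSTR + SSE
mst = sst / df_tot          # MST = SST / (n - 1)

print(round(sse, 1), round(mstr, 1), round(sstr, 1), round(sst, 1), round(mst, 2))
# 18541.6 293.1 879.3 19420.9 35.83
```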
9-30
9-5 Further Analysis

The ANOVA Diagram
[Diagram: Data, then ANOVA. Do not reject $H_0$: stop. Reject $H_0$: further analysis, using confidence intervals for population means and the Tukey pairwise comparisons test.]
The sample means are unbiased estimators of the population means.
The mean square error (MSE) is an unbiased estimator of the common population variance.
9-31
Confidence Intervals for Population Means
A $(1-\alpha)100\%$ confidence interval for $\mu_i$, the mean of population i:
$$\bar{x}_i \pm t_{\alpha/2}\sqrt{\frac{MSE}{n_i}}$$
where $t_{\alpha/2}$ is the value of the t distribution with (n - r) degrees of freedom that cuts off a right-tailed area of $\alpha/2$.
Resort            Mean Response ($\bar{x}_i$)
Guadeloupe        89
Martinique        75
Eleuthra          73
Paradise Island   91
St. Lucia         85
SST = 112564, SSE = 98356, $n_i = 40$, $n = (5)(40) = 200$, MSE = 504.39
For 95% confidence, the normal value 1.96 is used as an approximation to $t_{0.025}$ with 195 degrees of freedom:
$$\bar{x}_i \pm t_{\alpha/2}\sqrt{\frac{MSE}{n_i}} = \bar{x}_i \pm 1.96\sqrt{\frac{504.39}{40}} = \bar{x}_i \pm 6.96$$
89 ± 6.96 = [82.04, 95.96]
75 ± 6.96 = [68.04, 81.96]
73 ± 6.96 = [66.04, 79.96]
91 ± 6.96 = [84.04, 97.96]
85 ± 6.96 = [78.04, 91.96]
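The five intervals share one half-width, since all resorts have $n_i = 40$. This sketch recomputes them; the multiplier 1.96 is taken from the slide rather than from a t table.

```python
import math

# Club Med resort means (n_i = 40 respondents per resort) and MSE from the ANOVA table
means = {"Guadeloupe": 89, "Martinique": 75, "Eleuthra": 73,
         "Paradise Island": 91, "St. Lucia": 85}
mse, n_i = 504.39, 40
t_crit = 1.96   # value used on the slide (normal approximation to t with 195 df)

half_width = t_crit * math.sqrt(mse / n_i)
intervals = {name: (m - half_width, m + half_width) for name, m in means.items()}

print(round(half_width, 2))   # 6.96
for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.2f}, {high:.2f}]")   # e.g. Guadeloupe: [82.04, 95.96]
```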
9-32
The Tukey Pairwise-Comparisons Test
The Tukey Pairwise Comparisons test, or Honestly Significant Differences (HSD) test, allows us to compare every pair of population means with a single level of significance.
It is based on the studentized range distribution, q, with r and (n-r) degrees of freedom.
The critical point in a Tukey Pairwise Comparisons test is the Tukey Criterion:
$$T = q_\alpha\sqrt{\frac{MSE}{n_i}}$$
where $n_i$ is the smallest of the r sample sizes.
The test statistic is the absolute value of the difference between the appropriate sample means, and the null hypothesis is rejected if the test statistic is greater than the critical point of the Tukey Criterion.
Note that there are $\binom{r}{2} = \frac{r!}{2!(r-2)!}$ pairs of population means to compare. For example, if r = 3:
$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$
$H_0: \mu_1 = \mu_3$ versus $H_1: \mu_1 \ne \mu_3$
$H_0: \mu_2 = \mu_3$ versus $H_1: \mu_2 \ne \mu_3$
9-33
The Tukey Pairwise Comparison Test: The Club Med Example
The test statistic for each pairwise test is the absolute difference between the appropriate sample means.
i   Resort         Mean
1   Guadeloupe     89
2   Martinique     75
3   Eleuthra       73
4   Paradise Is.   91
5   St. Lucia      85
The critical point $T_{0.05}$ for r = 5 and (n-r) = 195 degrees of freedom is:
$$T = q_\alpha\sqrt{\frac{MSE}{n_i}} = 3.86\sqrt{\frac{504.4}{40}} = 13.7$$
I.     $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$: |89 - 75| = 14 > 13.7*
II.    $H_0: \mu_1 = \mu_3$, $H_1: \mu_1 \ne \mu_3$: |89 - 73| = 16 > 13.7*
III.   $H_0: \mu_1 = \mu_4$, $H_1: \mu_1 \ne \mu_4$: |89 - 91| = 2 < 13.7
IV.    $H_0: \mu_1 = \mu_5$, $H_1: \mu_1 \ne \mu_5$: |89 - 85| = 4 < 13.7
V.     $H_0: \mu_2 = \mu_3$, $H_1: \mu_2 \ne \mu_3$: |75 - 73| = 2 < 13.7
VI.    $H_0: \mu_2 = \mu_4$, $H_1: \mu_2 \ne \mu_4$: |75 - 91| = 16 > 13.7*
VII.   $H_0: \mu_2 = \mu_5$, $H_1: \mu_2 \ne \mu_5$: |75 - 85| = 10 < 13.7
VIII.  $H_0: \mu_3 = \mu_4$, $H_1: \mu_3 \ne \mu_4$: |73 - 91| = 18 > 13.7*
IX.    $H_0: \mu_3 = \mu_5$, $H_1: \mu_3 \ne \mu_5$: |73 - 85| = 12 < 13.7
X.     $H_0: \mu_4 = \mu_5$, $H_1: \mu_4 \ne \mu_5$: |91 - 85| = 6 < 13.7
Reject the null hypothesis if the absolute value of the difference between the sample means
is greater than the critical value of T. (The hypotheses marked with * are rejected.)
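All ten comparisons can be generated mechanically. The sketch below enumerates the pairs with `itertools.combinations` and applies the Tukey criterion; the studentized range value 3.86 is copied from the slide, because the Python standard library does not provide the q distribution.

```python
import math
from itertools import combinations

means = {1: 89, 2: 75, 3: 73, 4: 91, 5: 85}   # Club Med resort means, i = 1..5
mse, n_i = 504.4, 40                          # MSE and the (common) sample size
q_alpha = 3.86   # studentized range critical point used on the slide (r = 5, 195 df)

T = q_alpha * math.sqrt(mse / n_i)            # Tukey criterion
pairs = list(combinations(sorted(means), 2))  # all C(5, 2) pairs (i, j), i < j
rejected = [(i, j) for i, j in pairs if abs(means[i] - means[j]) > T]

print(len(pairs), math.comb(len(means), 2), round(T, 1))   # 10 10 13.7
print(rejected)   # [(1, 2), (1, 3), (2, 4), (3, 4)]
```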
9-34
Picturing the Results of a Tukey Pairwise Comparisons Test: The Club Med Example
We rejected the null hypotheses comparing the means of populations 1 and 2, 1 and 3, 2 and 4, and 3 and 4. On the other hand, we did not reject the null hypotheses of equality of the means of populations 1 and 4, 1 and 5, 2 and 3, 2 and 5, 3 and 5, and 4 and 5.
[Figure: the five population means $\mu_1, \dots, \mu_5$ on a line, with bars linking means not found to differ]
The bars indicate the three groupings of populations with possibly equal means: 2 and 3; 2, 3, and 5; and 1, 4, and 5.
9-35
Picturing the Results of a Tukey Pairwise Comparisons Test: The Club Med Example
9-36
9-6 Models, Factors and Designs

A statistical model is a set of equations and assumptions that capture the essential characteristics of a real-world situation.
The one-factor ANOVA model:
$$x_{ij} = \mu_i + \epsilon_{ij} = \mu + \alpha_i + \epsilon_{ij}$$
where $\alpha_i = \mu_i - \mu$ is the effect of treatment i and $\epsilon_{ij}$ is the error associated with the jth member of the ith population. The errors are assumed to be normally distributed with mean 0 and variance $\sigma^2$.
9-37
9-6 Models, Factors and Designs (Continued)
A factor is a set of populations or treatments of a single kind. For
example:
One factor models based on sets of resorts, types of airplanes, or
kinds of sweaters
Two factor models based on firm and location
Three factor models based on color and shape and size of an ad.
Fixed-Effects and Random Effects
A fixed-effects model is one in which the levels of the factor under
study (the treatments) are fixed in advance. Inference is valid only
for the levels under study.
A random-effects model is one in which the levels of the factor
under study are randomly chosen from an entire population of levels
(treatments). Inference is valid for the entire population of levels.
9-38
Experimental Design
A completely randomized design is one in which the elements are assigned to treatments completely at random. That is, any element chosen for the study has an equal chance of being assigned to any treatment.
In a blocking design, elements are assigned to treatments
after first being collected into homogeneous groups.
In a completely randomized block design, all members of each
block (homogeneous group) are randomly assigned to the treatment
levels.
In a repeated measures design, each member of each block is
assigned to all treatment levels.
9-39
9-7 Two-Way Analysis of Variance
In a two-way ANOVA, the effects of two factors or treatments can be investigated simultaneously. Two-way ANOVA also permits the investigation of the effects of either factor alone and of the two factors together.
The effect on the population mean that can be attributed to the levels of either factor alone is called a main effect.
An interaction effect between two factors occurs if the total effect at some pair of levels of the two factors or treatments differs significantly from the simple addition of the two main effects. Factors that do not interact are called additive.
Three questions answerable by two-way ANOVA:
Are there any factor A main effects?
Are there any factor B main effects?
Are there any interaction effects between factors A and B?
For example, we might investigate the effects on vacationers' ratings of resorts by looking at five different resorts (factor A) and four different resort attributes (factor B). In addition to the five main factor A treatment levels and the four main factor B treatment levels, there are (5 × 4 = 20) interaction treatment levels.
9-40
The Two-Way ANOVA Model
$$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$$
where $\mu$ is the overall mean;
$\alpha_i$ is the effect of level i (i = 1, ..., a) of factor A;
$\beta_j$ is the effect of level j (j = 1, ..., b) of factor B;
$(\alpha\beta)_{ij}$ is the interaction effect of levels i and j;
$\epsilon_{ijk}$ is the error associated with the kth data point from level i of factor A and level j of factor B.
$\epsilon_{ijk}$ is assumed to be distributed normally with mean zero and variance $\sigma^2$ for all i, j, and k.
9-41
Two-Way ANOVA Data Layout: Club Med Example
Factor A: Resort (rows); Factor B: Attribute (columns)
                  Friendship   Sports   Culture   Excitement
Guadeloupe        n11          n12      n13       n14
Martinique        n21          n22      n23       n24
Eleuthra          n31          n32      n33       n34
Paradise Island   n41          n42      n43       n44
St. Lucia         n51          n52      n53       n54
[Figure: Graphical Display of Effects, plotting rating against attribute (Friendship, Excitement, Sports, Culture) for each resort, and rating against resort for each attribute; an annotation marks the Eleuthra/sports interaction: combined effect greater than additive main effects]
9-42
Hypothesis Tests in a Two-Way ANOVA
Factor A main effects test:
$H_0: \alpha_i = 0$ for all i = 1, 2, ..., a
$H_1$: Not all $\alpha_i$ are 0
Factor B main effects test:
$H_0: \beta_j = 0$ for all j = 1, 2, ..., b
$H_1$: Not all $\beta_j$ are 0
Test for (AB) interactions:
$H_0: (\alpha\beta)_{ij} = 0$ for all i = 1, 2, ..., a and j = 1, 2, ..., b
$H_1$: Not all $(\alpha\beta)_{ij}$ are 0
9-43
Sums of Squares
In a two-way ANOVA:
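$$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$$
SST = SSTR + SSE
SST = SSA + SSB + SS(AB) + SSE
$$\underbrace{\sum (x - \bar{\bar{x}})^2}_{SST} = \underbrace{\sum (\bar{x}_{ij} - \bar{\bar{x}})^2}_{SSTR} + \underbrace{\sum (x - \bar{x}_{ij})^2}_{SSE}$$
$$\underbrace{\sum (\bar{x}_{ij} - \bar{\bar{x}})^2}_{SSTR} = \underbrace{\sum (\bar{x}_i - \bar{\bar{x}})^2}_{SSA} + \underbrace{\sum (\bar{x}_j - \bar{\bar{x}})^2}_{SSB} + \underbrace{\sum (\bar{x}_{ij} - \bar{x}_i - \bar{x}_j + \bar{\bar{x}})^2}_{SS(AB)}$$
where each sum runs over all abn data points, $\bar{x}_{ij}$ is the mean of cell (i, j), $\bar{x}_i$ is the mean of level i of factor A, $\bar{x}_j$ is the mean of level j of factor B, and $\bar{\bar{x}}$ is the grand mean.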
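For a balanced layout (n observations in every cell), the partition can be computed directly from the cell, row, and column means. This is a sketch under that balance assumption; the function name `two_way_ss` and the 2 × 2 data set with n = 2 are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import fmean


def two_way_ss(data):
    """Sums of squares for a balanced two-way layout.

    data[i][j] is the list of n observations for level i of factor A and
    level j of factor B; every cell is assumed to hold the same n.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = fmean(x for row in data for cell in row for x in cell)
    cell = [[fmean(c) for c in row] for row in data]                  # xbar_ij
    mean_a = [fmean(row_means) for row_means in cell]                  # xbar_i
    mean_b = [fmean(cell[i][j] for i in range(a)) for j in range(b)]   # xbar_j

    ssa = b * n * sum((m - grand) ** 2 for m in mean_a)
    ssb = a * n * sum((m - grand) ** 2 for m in mean_b)
    ssab = n * sum((cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((x - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for x in data[i][j])
    sst = sum((x - grand) ** 2 for row in data for c in row for x in c)
    return {"SSA": ssa, "SSB": ssb, "SS(AB)": ssab, "SSE": sse, "SST": sst}


# Hypothetical 2 x 2 layout with n = 2 observations per cell
ss = two_way_ss([[[1, 3], [5, 7]], [[2, 4], [10, 12]]])
print(ss)   # {'SSA': 18.0, 'SSB': 72.0, 'SS(AB)': 8.0, 'SSE': 8.0, 'SST': 106.0}
```

Multiplying by bn, an, and n converts sums over level or cell means into the sums over all abn data points used in the formulas.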
9-44
The Two-Way ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                    F Ratio
Factor A              SSA              a-1                  MSA = SSA/(a-1)                F = MSA/MSE
Factor B              SSB              b-1                  MSB = SSB/(b-1)                F = MSB/MSE
Interaction           SS(AB)           (a-1)(b-1)           MS(AB) = SS(AB)/[(a-1)(b-1)]   F = MS(AB)/MSE
Error                 SSE              ab(n-1)              MSE = SSE/[ab(n-1)]
Total                 SST              abn-1
A Main Effect Test: F(a-1, ab(n-1))
B Main Effect Test: F(b-1, ab(n-1))
(AB) Interaction Effect Test: F((a-1)(b-1), ab(n-1))
9-45
Example 9-4: Two-Way ANOVA (Location and Artist)
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Location              1824             2                    912           8.94
Artist                2230             2                    1115          10.93
Interaction           804              4                    201           1.97
Error                 8262             81                   102
Total                 13120            89
$\alpha = 0.01$, $F_{(2,\,81)} = 4.88$: both main effect null hypotheses are rejected.
$\alpha = 0.05$, $F_{(4,\,81)} = 2.48$: the interaction effect null hypothesis is not rejected.
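Each F ratio in the table is its mean square divided by MSE, and each test has its own critical point. This sketch recomputes the three ratios and decisions; the critical points are copied from the slide (4.88 for the main effects at 0.01, 2.48 for the interaction at 0.05), not computed.

```python
# Example 9-4: (sum of squares, degrees of freedom) for each source of variation
effects = {"Location": (1824, 2), "Artist": (2230, 2), "Interaction": (804, 4)}
sse, df_error = 8262, 81
critical = {"Location": 4.88, "Artist": 4.88, "Interaction": 2.48}   # from the slide

mse = sse / df_error                                   # 102.0
f_ratios = {name: (ss / df) / mse for name, (ss, df) in effects.items()}

for name, f in f_ratios.items():
    decision = "reject H0" if f > critical[name] else "do not reject H0"
    print(f"{name}: F = {f:.2f} -> {decision}")
# Location: F = 8.94 -> reject H0
# Artist: F = 10.93 -> reject H0
# Interaction: F = 1.97 -> do not reject H0
```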
9-46
Hypothesis Tests

[Figure: F distributions for the three tests. Main effects: Location test statistic = 8.94 and Artist test statistic = 10.93, against the critical point $F_{0.01} = 4.88$. Interaction: test statistic = 1.97, against the critical point $F_{0.05} = 2.48$.]
9-47
Overall Significance Level and Tukey Method for Two-Way ANOVA
Kimball's Inequality gives an upper limit on the true probability of at least one Type I error in the three tests of a two-way analysis:
$$\alpha \le 1 - (1-\alpha_1)(1-\alpha_2)(1-\alpha_3)$$
Tukey Criterion for factor A:
$$T = q_\alpha\sqrt{\frac{MSE}{bn}}$$
where the degrees of freedom of the q distribution are now a and ab(n-1). Note that MSE is divided by bn.
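Kimball's bound is a one-line product. As an illustration, the sketch below applies it to the levels used in Example 9-4 (0.01 for each main-effect test, 0.05 for the interaction test); the function name is hypothetical.

```python
def kimball_bound(*alphas):
    """Kimball's upper bound 1 - (1 - a1)(1 - a2)...(1 - ak) on P(at least one Type I error)."""
    prob_none = 1.0
    for a in alphas:
        prob_none *= 1 - a     # product of the (1 - alpha_k) terms
    return 1 - prob_none


# Levels used in Example 9-4: 0.01 for each main-effect test, 0.05 for the interaction test
print(round(kimball_bound(0.01, 0.01, 0.05), 4))   # 0.0689
```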
9-48
Template for a Two-Way ANOVA
9-49
Extension of ANOVA to Three Factors

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                           F Ratio
Factor A              SSA              a-1                  MSA = SSA/(a-1)                       F = MSA/MSE
Factor B              SSB              b-1                  MSB = SSB/(b-1)                       F = MSB/MSE
Factor C              SSC              c-1                  MSC = SSC/(c-1)                       F = MSC/MSE
Interaction (AB)      SS(AB)           (a-1)(b-1)           MS(AB) = SS(AB)/[(a-1)(b-1)]          F = MS(AB)/MSE
Interaction (AC)      SS(AC)           (a-1)(c-1)           MS(AC) = SS(AC)/[(a-1)(c-1)]          F = MS(AC)/MSE
Interaction (BC)      SS(BC)           (b-1)(c-1)           MS(BC) = SS(BC)/[(b-1)(c-1)]          F = MS(BC)/MSE
Interaction (ABC)     SS(ABC)          (a-1)(b-1)(c-1)      MS(ABC) = SS(ABC)/[(a-1)(b-1)(c-1)]   F = MS(ABC)/MSE
Error                 SSE              abc(n-1)             MSE = SSE/[abc(n-1)]
Total                 SST              abcn-1
9-50
Two-Way ANOVA with One Observation per Cell
The case of one data point in every cell presents a problem in two-way ANOVA: there will be no degrees of freedom for the error term, since ab(n - 1) = 0 when n = 1.
What can be done?
If we can assume that there are no interactions between the main effects, then we can use SS(AB) and its associated degrees of freedom (a - 1)(b - 1) in place of SSE and its degrees of freedom.
We can then conduct main effects tests using MS(AB).
See the next slide for the ANOVA table.
9-51
Two-Way ANOVA with One Observation per Cell
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                    F Ratio
Factor A              SSA              a-1                  MSA = SSA/(a-1)                F = MSA/MS(AB)
Factor B              SSB              b-1                  MSB = SSB/(b-1)                F = MSB/MS(AB)
Error                 SS(AB)           (a-1)(b-1)           MS(AB) = SS(AB)/[(a-1)(b-1)]
Total                 SST              ab-1
9-52
9-8 Blocking Designs

A block is a homogeneous set of subjects, grouped to
minimize within-group differences.

A completely randomized design is one in which the elements are assigned to treatments completely at random. That is, any element chosen for the study has an equal chance of being assigned to any treatment.
In a blocking design, elements are assigned to treatments after first being collected into homogeneous groups.
In a completely randomized block design, all members of each block (homogeneous group) are randomly assigned to the treatment levels.
In a repeated measures design, each member of each block is
assigned to all treatment levels.
9-53
Model for Randomized Complete Block Design
$$x_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$$
where $\mu$ is the overall mean;
$\alpha_i$ is the effect of level i (i = 1, ..., a) of factor A;
$\beta_j$ is the effect of block j (j = 1, ..., b);
$\epsilon_{ij}$ is the error associated with $x_{ij}$;
$\epsilon_{ij}$ is assumed to be distributed normally with mean zero and variance $\sigma^2$ for all i and j.
9-54
ANOVA Table for Blocking Designs: Example 9-5
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square              F Ratio
Blocks                SSBL             n-1                  MSBL = SSBL/(n-1)        F = MSBL/MSE
Treatments            SSTR             r-1                  MSTR = SSTR/(r-1)        F = MSTR/MSE
Error                 SSE              (n-1)(r-1)           MSE = SSE/[(n-1)(r-1)]
Total                 SST              nr-1

Source of Variation   Sum of Squares   df    Mean Square   F Ratio
Blocks                2750             39    70.51         0.69
Treatments            2640             2     1320          12.93
Error                 7960             78    102.05
Total                 13350            119
$\alpha = 0.01$, $F_{(2,\,78)} = 4.88$
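The Example 9-5 entries follow from the general blocking table. In this sketch the number of blocks, 40, is read off the 39 block degrees of freedom, and the critical point 4.88 is copied from the slide.

```python
n_blocks, r = 40, 3              # block df = n - 1 = 39, treatment df = r - 1 = 2
ssbl, sstr, sse = 2750, 2640, 7960

df_bl, df_tr = n_blocks - 1, r - 1
df_err = (n_blocks - 1) * (r - 1)
msbl, mstr, mse = ssbl / df_bl, sstr / df_tr, sse / df_err

print(df_bl, df_tr, df_err, n_blocks * r - 1)      # 39 2 78 119
print(round(msbl, 2), mstr, round(mse, 2))         # 70.51 1320.0 102.05
print(round(msbl / mse, 2), round(mstr / mse, 2))  # 0.69 12.93
print(mstr / mse > 4.88)   # True: treatment F exceeds the slide's alpha = 0.01 critical point
```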
9-55
Template for the Randomized Complete Block Design
