
9-1

COMPLETE BUSINESS STATISTICS
by AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN
6th edition

9-2

Chapter 9
Analysis of Variance

9-3

9 Analysis of Variance
Using Statistics
The Hypothesis Test of Analysis of Variance
The Theory and Computations of ANOVA
The ANOVA Table and Examples
Further Analysis
Models, Factors, and Designs
Two-Way Analysis of Variance
Blocking Designs

9-4

9 LEARNING OBJECTIVES
After studying this chapter you should be able to:

Explain the purpose of ANOVA
Describe the model and computations behind ANOVA
Explain the test statistic F
Conduct a one-way ANOVA
Report ANOVA results in an ANOVA table
Apply the Tukey test for pairwise analysis
Conduct a two-way ANOVA
Explain blocking designs
Apply templates to conduct one-way and two-way ANOVA

9-5

9-1 Using Statistics


ANOVA (ANalysis Of VAriance) is a statistical
method for determining the existence of
differences among several population means.

ANOVA is designed to detect differences among means
from populations subject to different treatments.
ANOVA is a joint test: the equality of several population
means is tested simultaneously, or jointly.
ANOVA tests for the equality of several population
means by looking at two estimators of the population
variance (hence, analysis of variance).

9-6

9-2 The Hypothesis Test of Analysis of Variance
In an analysis of variance:

We have r independent random samples, each one corresponding to a
population subject to a different treatment.
We have:
$n = n_1 + n_2 + n_3 + \cdots + n_r$ total observations.
r sample means: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_r$
These r sample means can be used to calculate an estimator of
the population variance. If the population means are equal,
we expect the variance among the sample means to be small.

r sample variances: $s_1^2, s_2^2, s_3^2, \ldots, s_r^2$
These sample variances can be used to find a pooled
estimator of the population variance.

9-7

9-2 The Hypothesis Test of Analysis of Variance (continued): Assumptions
We assume independent random sampling from each of the r populations.
We assume that the r populations under study:

are normally distributed,
with means $\mu_i$ that may or may not be equal,
but with equal variances, $\sigma_i^2 = \sigma^2$.

[Figure: three normal curves labeled Population 1, Population 2, and Population 3, with means $\mu_1$, $\mu_2$, $\mu_3$ and a common standard deviation $\sigma$.]

9-8

9-2 The Hypothesis Test of Analysis of Variance (continued)
The hypothesis test of analysis of variance:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \cdots = \mu_r$
$H_1$: Not all $\mu_i$ (i = 1, ..., r) are equal
The test statistic of analysis of variance:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$

That is, the test statistic in an analysis of variance is based on the ratio of
two estimators of a population variance, and is therefore based on the F
distribution, with (r-1) degrees of freedom in the numerator and (n-r)
degrees of freedom in the denominator.

9-9

When the Null Hypothesis Is True

When the null hypothesis is true:
$H_0: \mu_1 = \mu_2 = \mu_3$

We would expect the sample means to be nearly equal. And we would
expect the variation among the sample means (between sample) to be
small, relative to the variation found around the individual sample
means (within sample).
If the null hypothesis is true, the numerator in the test statistic is
expected to be small, relative to the denominator:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$

9-10

When the Null Hypothesis Is False

When the null hypothesis is false (for three populations):
$\mu_1$ is equal to $\mu_2$ but not to $\mu_3$,
$\mu_1$ is equal to $\mu_3$ but not to $\mu_2$,
$\mu_2$ is equal to $\mu_3$ but not to $\mu_1$, or
$\mu_1$, $\mu_2$, and $\mu_3$ are all unequal.

In any of these situations, we would not expect the sample means to all be nearly
equal. We would expect the variation among the sample means (between
sample) to be large, relative to the variation around the individual sample means
(within sample).
If the null hypothesis is false, the numerator in the test statistic is expected to be
large, relative to the denominator:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$

9-11

The ANOVA Test Statistic for r = 4 Populations and n = 54 Total Sample Observations

Suppose we have 4 populations, from each of which we draw an independent
random sample, with $n_1 + n_2 + n_3 + n_4 = 54$. Then our test statistic is:
$$F_{(4-1,\,54-4)} = F_{(3,50)} = \frac{\text{Estimate of variance based on means from 4 samples}}{\text{Estimate of variance based on all 54 sample observations}}$$

[Figure: density f(F) of the F distribution with 3 and 50 degrees of freedom; the right-tail area $\alpha = 0.05$ lies beyond the critical point 2.79.]

The nonrejection region (for $\alpha = 0.05$) in this instance is $F \le 2.79$,
and the rejection region is $F > 2.79$. If the test statistic is less than or
equal to 2.79, we would not reject the null hypothesis that the four
population means are equal. If the test statistic is greater than 2.79, we
would reject the null hypothesis and conclude that the four population
means are not all equal.

9-12

Example 9-1
Randomly chosen groups of customers were served different types of coffee and asked to rate the
coffee on a scale of 0 to 100: 21 were served pure Brazilian coffee, 20 were served pure Colombian
coffee, and 22 were served pure African-grown coffee.
The resulting test statistic was F = 2.02.

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all three means are equal
$n_1 = 21$, $n_2 = 20$, $n_3 = 22$, $n = 21 + 20 + 22 = 63$, $r = 3$
The critical point for $\alpha = 0.05$ is:
$$F_{(r-1,\,n-r)} = F_{(3-1,\,63-3)} = F_{(2,60)} = 3.15$$

Since $F = 2.02 < F_{(2,60)} = 3.15$, $H_0$ cannot be rejected, and we cannot conclude
that any of the population means differs significantly from the others.

[Figure: density f(F) of the F distribution with 2 and 60 degrees of freedom; the test statistic 2.02 lies below the critical point $F_{(2,60)} = 3.15$ for $\alpha = 0.05$.]
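
The bookkeeping in Example 9-1 can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `anova_decision` is hypothetical, and the critical point 3.15 is copied from the slide rather than computed from the F distribution.

```python
def anova_decision(f_stat, f_crit):
    """Return the test decision for a right-tailed F test."""
    return "reject H0" if f_stat > f_crit else "do not reject H0"

sample_sizes = [21, 20, 22]      # Brazilian, Colombian, African-grown
r = len(sample_sizes)            # number of treatments
n = sum(sample_sizes)            # total number of observations
df = (r - 1, n - r)              # numerator and denominator degrees of freedom

f_stat = 2.02                    # test statistic reported in the example
f_crit = 3.15                    # tabulated critical point for alpha = 0.05 (from the slide)

decision = anova_decision(f_stat, f_crit)
print(df, decision)
```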

9-13

9-3 The Theory and the Computations of ANOVA: The Grand Mean
The grand mean, $\bar{\bar{x}}$, is the mean of all $n = n_1 + n_2 + n_3 + \cdots + n_r$ observations
in all r samples.

The mean of sample i (i = 1, 2, 3, ..., r):
$$\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i}$$
The grand mean, the mean of all data points:
$$\bar{\bar{x}} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} x_{ij}}{n} = \frac{\sum_{i=1}^{r} n_i \bar{x}_i}{n}$$
where $x_{ij}$ is the particular data point in position j within the sample from population i.
The subscript i denotes the population, or treatment, and runs from 1 to r. The subscript j
denotes the data point within the sample from population i; thus, j runs from 1 to $n_i$.

9-14

Using the Grand Mean: Table 9-1

Treatment (i)     Sample point (j)   Value ($x_{ij}$)
i = 1 Triangle    1                  4
      Triangle    2                  5
      Triangle    3                  7
      Triangle    4                  8
Mean of Triangles                    6
i = 2 Square      1                  10
      Square      2                  11
      Square      3                  12
      Square      4                  13
Mean of Squares                      11.5
i = 3 Circle      1                  1
      Circle      2                  2
      Circle      3                  3
Mean of Circles                      2
Grand mean of all data points        6.909

[Figure: the data points plotted on a number line, marking $\bar{x}_1 = 6$, $\bar{x}_2 = 11.5$, $\bar{x}_3 = 2$, and $\bar{\bar{x}} = 6.909$, and showing the distance from each data point to its sample mean and the distance from each sample mean to the grand mean.]

If the r population means are different (that is, at
least two of the population means are not equal),
then it is likely that the variation of the data
points about their respective sample means
(within sample variation) will be small relative
to the variation of the r sample means about the
grand mean (between sample variation).
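
As a quick check of Table 9-1, the sample means and the grand mean can be recomputed from the raw values. The sketch below uses plain Python; the names `samples`, `sample_means`, and `grand_mean` are illustrative choices, not part of the original slides.

```python
# Data from Table 9-1
samples = {
    "Triangle": [4, 5, 7, 8],
    "Square": [10, 11, 12, 13],
    "Circle": [1, 2, 3],
}

def sample_means(groups):
    """Mean of each sample: x-bar_i = sum_j x_ij / n_i."""
    return {name: sum(xs) / len(xs) for name, xs in groups.items()}

def grand_mean(groups):
    """Mean of all n data points across all r samples."""
    total = sum(sum(xs) for xs in groups.values())
    n = sum(len(xs) for xs in groups.values())
    return total / n

means = sample_means(samples)
gm = grand_mean(samples)
print(means)
print(round(gm, 3))
```

The grand mean is the size-weighted average of the sample means, which is why it is not simply the average of 6, 11.5, and 2.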

9-15

The Theory and Computations of ANOVA: Error Deviation and Treatment Deviation
We define an error deviation as the difference between a data point
and its sample mean. Errors are denoted by e, and we have:
$$e_{ij} = x_{ij} - \bar{x}_i$$

We define a treatment deviation as the deviation of a sample mean
from the grand mean. Treatment deviations, $t_i$, are given by:
$$t_i = \bar{x}_i - \bar{\bar{x}}$$

The ANOVA principle says:
When the population means are not equal, the average error
(within sample) is relatively small compared with the average
treatment (between sample) deviation.

9-16

The Theory and Computations of ANOVA: The Total Deviation
The total deviation ($Tot_{ij}$) is the difference between a data point ($x_{ij}$) and the grand mean ($\bar{\bar{x}}$):
$$Tot_{ij} = x_{ij} - \bar{\bar{x}}$$
For any data point $x_{ij}$:
$$Tot_{ij} = t_i + e_{ij}$$
That is:
Total Deviation = Treatment Deviation + Error Deviation

Consider data point $x_{24} = 13$ from Table 9-1. The mean of sample 2 is 11.5,
and the grand mean is 6.909, so:
$$e_{24} = x_{24} - \bar{x}_2 = 13 - 11.5 = 1.5$$
$$t_2 = \bar{x}_2 - \bar{\bar{x}} = 11.5 - 6.909 = 4.591$$
$$Tot_{24} = t_2 + e_{24} = 4.591 + 1.5 = 6.091$$
or
$$Tot_{24} = x_{24} - \bar{\bar{x}} = 13 - 6.909 = 6.091$$

[Figure: number line showing $x_{24} = 13$, $\bar{x}_2 = 11.5$, and $\bar{\bar{x}} = 6.909$, with the total deviation (6.091) split into the treatment deviation (4.591) and the error deviation (1.5).]

9-17

The Theory and Computations of ANOVA: Squared Deviations
Total Deviation = Treatment Deviation + Error Deviation
The total deviation is the sum of the treatment deviation and the error deviation:
$$t_i + e_{ij} = (\bar{x}_i - \bar{\bar{x}}) + (x_{ij} - \bar{x}_i) = (x_{ij} - \bar{\bar{x}}) = Tot_{ij}$$
Notice that the sample mean term ($\bar{x}_i$) cancels out in the above addition, which
simplifies the equation.

Squared Deviations
$$t_i^2 + e_{ij}^2 = (\bar{x}_i - \bar{\bar{x}})^2 + (x_{ij} - \bar{x}_i)^2$$
$$Tot_{ij}^2 = (x_{ij} - \bar{\bar{x}})^2$$

9-18

The Theory and Computations of ANOVA: The Sum of Squares Principle
Sums of Squared Deviations
$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} Tot_{ij}^2 = \sum_{i=1}^{r} n_i t_i^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} e_{ij}^2$$
$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2 = \sum_{i=1}^{r} n_i (\bar{x}_i - \bar{\bar{x}})^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$
$$SST = SSTR + SSE$$

The Sum of Squares Principle
The total sum of squares (SST) is the sum of two terms: the sum of
squares for treatment (SSTR) and the sum of squares for error (SSE).
SST = SSTR + SSE
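
The sum of squares principle can be verified numerically on the Table 9-1 data. The following is a minimal sketch in plain Python; the function name `sums_of_squares` is an illustrative assumption.

```python
def sums_of_squares(groups):
    """Return (SST, SSTR, SSE) for a list of samples."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Total: every data point against the grand mean
    sst = sum((x - grand) ** 2 for g in groups for x in g)
    # Treatment: each sample mean against the grand mean, weighted by n_i
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Error: every data point against its own sample mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return sst, sstr, sse

table_9_1 = [[4, 5, 7, 8], [10, 11, 12, 13], [1, 2, 3]]
sst, sstr, sse = sums_of_squares(table_9_1)
print(round(sst, 3), round(sstr, 3), round(sse, 3))
```

Up to floating-point rounding, the first value equals the sum of the other two.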

9-19

The Theory and Computations of ANOVA: Picturing the Sum of Squares Principle

[Figure: SST partitioned into SSTR and SSE.]

SST measures the total variation in the data set, the variation of all individual data
points from the grand mean.
SSTR measures the explained variation, the variation of individual sample means
from the grand mean. It is that part of the variation that is possibly expected, or
explained, because the data points are drawn from different populations. It is the
variation between groups of data points.
SSE measures unexplained variation, the variation within each group that cannot be
explained by possible differences between the groups.

9-20

The Theory and Computations of ANOVA: Degrees of Freedom
The number of degrees of freedom associated with SST is (n - 1).
    n total observations in all r groups, less one degree of freedom
    lost with the calculation of the grand mean
The number of degrees of freedom associated with SSTR is (r - 1).
    r sample means, less one degree of freedom lost with the
    calculation of the grand mean
The number of degrees of freedom associated with SSE is (n - r).
    n total observations in all groups, less one degree of freedom
    lost with the calculation of the sample mean from each of r groups

The degrees of freedom are additive in the same way as are the sums of squares:
df(total) = df(treatment) + df(error)
(n - 1)   = (r - 1)       + (n - r)

9-21

The Theory and Computations of ANOVA: The Mean Squares
Recall that the calculation of the sample variance involves the division of the sum of
squared deviations from the sample mean by the number of degrees of freedom. This
principle is applied as well to find the mean squared deviations within the analysis of
variance.
Mean square treatment (MSTR):
$$MSTR = \frac{SSTR}{r - 1}$$
Mean square error (MSE):
$$MSE = \frac{SSE}{n - r}$$
Mean square total (MST):
$$MST = \frac{SST}{n - 1}$$
(Note that the additive properties of sums of squares do not extend to the mean
squares: $MST \ne MSTR + MSE$.)

9-22

The Theory and Computations of ANOVA: The Expected Mean Squares
$$E(MSE) = \sigma^2$$
and
$$E(MSTR) = \sigma^2 + \frac{\sum_{i=1}^{r} n_i (\mu_i - \mu)^2}{r - 1} \quad \begin{cases} = \sigma^2 & \text{when the null hypothesis is true} \\ > \sigma^2 & \text{when the null hypothesis is false} \end{cases}$$
where $\mu_i$ is the mean of population i and $\mu$ is the combined mean of all r populations.
That is, the expected mean square error (MSE) is simply the common population variance
(remember the assumption of equal population variances), but the expected mean square
treatment (MSTR) is the common population variance plus a term related to the variation of the
individual population means around the grand population mean.
If the null hypothesis is true so that the population means are all equal, the second term in
the E(MSTR) formulation is zero, and E(MSTR) is equal to the common population variance.

9-23

Expected Mean Squares and the ANOVA Principle
When the null hypothesis of ANOVA is true and all r population means are
equal, MSTR and MSE are two independent, unbiased estimators of the
common population variance $\sigma^2$.
On the other hand, when the null hypothesis is false, then MSTR will tend to
be larger than MSE.
So the ratio of MSTR and MSE can be used as an indicator of the
equality or inequality of the r population means.
This ratio (MSTR/MSE) will tend to be near 1 if the null hypothesis is
true, and greater than 1 if the null hypothesis is false. The ANOVA test,
finally, is a test of whether (MSTR/MSE) is equal to, or greater than, 1.

9-24

The Theory and Computations of ANOVA: The F Statistic
Under the assumptions of ANOVA, the ratio (MSTR/MSE)
possesses an F distribution with (r - 1) degrees of freedom for the
numerator and (n - r) degrees of freedom for the denominator
when the null hypothesis is true.
The test statistic in analysis of variance:
$$F_{(r-1,\,n-r)} = \frac{MSTR}{MSE}$$
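
Putting the mean squares and the F ratio together, a one-way ANOVA can be sketched as a single function. This is a hedged illustration in plain Python, not the template used in the slides; the name `one_way_anova` and the returned dictionary keys are assumptions.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA quantities for a list of samples."""
    r = len(groups)                          # number of treatments
    n = sum(len(g) for g in groups)          # total observations
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mstr = sstr / (r - 1)                    # mean square treatment
    mse = sse / (n - r)                      # mean square error
    return {
        "df": (r - 1, n - r),
        "SSTR": sstr, "SSE": sse,
        "MSTR": mstr, "MSE": mse,
        "F": mstr / mse,
    }

# Triangles, squares, and circles from Table 9-1
result = one_way_anova([[4, 5, 7, 8], [10, 11, 12, 13], [1, 2, 3]])
print(result["df"], round(result["F"], 2))
```

Because the function carries full precision for MSTR, the F ratio it reports can differ in the second decimal place from a hand calculation that rounds MSTR first.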

9-25

9-4 The ANOVA Table and Examples

Treatment   i   j   Value ($x_{ij}$)   $x_{ij} - \bar{x}_i$   $(x_{ij} - \bar{x}_i)^2$
Triangle    1   1   4                  -2                     4
Triangle    1   2   5                  -1                     1
Triangle    1   3   7                  1                      1
Triangle    1   4   8                  2                      4
Square      2   1   10                 -1.5                   2.25
Square      2   2   11                 -0.5                   0.25
Square      2   3   12                 0.5                    0.25
Square      2   4   13                 1.5                    2.25
Circle      3   1   1                  -1                     1
Circle      3   2   2                  0                      0
Circle      3   3   3                  1                      1
Total               76                                        17

Treatment   $\bar{x}_i - \bar{\bar{x}}$   $(\bar{x}_i - \bar{\bar{x}})^2$   $n_i(\bar{x}_i - \bar{\bar{x}})^2$
Triangle    -0.909                        0.826281                          3.305124
Square      4.591                         21.077281                         84.309124
Circle      -4.909                        24.098281                         72.294843
Total                                                                       159.909091

$$SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = 17$$
$$SSTR = \sum_{i=1}^{r} n_i (\bar{x}_i - \bar{\bar{x}})^2 = 159.9$$
$$MSTR = \frac{SSTR}{r - 1} = \frac{159.9}{3 - 1} = 79.95$$
$$MSE = \frac{SSE}{n - r} = \frac{17}{8} = 2.125$$
$$F_{(2,8)} = \frac{MSTR}{MSE} = \frac{79.95}{2.125} = 37.62$$
Critical point ($\alpha = 0.01$): 8.65
$H_0$ may be rejected at the 0.01 level of significance.

9-26

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square     F Ratio
Treatment             SSTR = 159.9     (r - 1) = 2          MSTR = 79.95    37.62
Error                 SSE = 17.0       (n - r) = 8          MSE = 2.125
Total                 SST = 176.9      (n - 1) = 10         MST = 17.69

The ANOVA table summarizes the ANOVA calculations.

[Figure: density f(F) of the F distribution with 2 and 8 degrees of freedom; the computed test statistic 37.62 lies far beyond the critical point 8.65, which cuts off a right-tail area of 0.01.]

In this instance, since the test statistic is greater than the critical point
for an $\alpha = 0.01$ level of significance, the null hypothesis may be rejected,
and we may conclude that the means for triangles, squares, and circles are
not all equal.

9-27

Template Output

9-28

Example 9-2: Club Med

Club Med has conducted a test to determine whether its Caribbean resorts are equally well liked by
vacationing club members. The analysis was based on a survey questionnaire (general satisfaction,
on a scale from 0 to 100) filled out by a random sample of 40 respondents from each of 5 resorts.

Resort            Mean Response ($\bar{x}_i$)
Guadeloupe        89
Martinique        75
Eleuthra          73
Paradise Island   91
St. Lucia         85

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square      F Ratio
Treatment             SSTR = 14208     (r - 1) = 4          MSTR = 3552      7.04
Error                 SSE = 98356      (n - r) = 195        MSE = 504.39
Total                 SST = 112564     (n - 1) = 199        MST = 565.65

[Figure: density f(F) of the F distribution with 4 and 200 degrees of freedom; the computed test statistic 7.04 lies beyond the critical point 3.41, which cuts off a right-tail area of 0.01.]

The resultant F ratio is larger than the critical point for $\alpha = 0.01$, so the
null hypothesis may be rejected.

9-29

Example 9-3: Job Involvement

Source of Variation   Sum of Squares    Degrees of Freedom   Mean Square     F Ratio
Treatment             SSTR = 879.3      (r - 1) = 3          MSTR = 293.1    8.52
Error                 SSE = 18541.6     (n - r) = 539        MSE = 34.4
Total                 SST = 19420.9     (n - 1) = 542        MST = 35.83

Given the total number of observations (n = 543), the number of groups
(r = 4), the MSE (34.4), and the F ratio (8.52), the remainder of the ANOVA
table can be completed. The critical point of the F distribution for $\alpha = 0.01$
and (3, 400) degrees of freedom is 3.83. The test statistic in this example is
much larger than this critical point, so the p value associated with this test
statistic is less than 0.01, and the null hypothesis may be rejected.
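
The way the rest of the table follows from n, r, MSE, and F can be sketched as follows. This is an illustrative reconstruction of the arithmetic, not the authors' template; the function name `complete_anova_table` is hypothetical.

```python
def complete_anova_table(n, r, mse, f_ratio):
    """Fill in a one-way ANOVA table from n, r, MSE, and the F ratio."""
    df_treatment = r - 1
    df_error = n - r
    df_total = n - 1
    mstr = f_ratio * mse            # because F = MSTR / MSE
    sstr = mstr * df_treatment      # because MSTR = SSTR / (r - 1)
    sse = mse * df_error            # because MSE = SSE / (n - r)
    sst = sstr + sse                # sum of squares principle
    return {
        "df": (df_treatment, df_error, df_total),
        "SSTR": sstr, "SSE": sse, "SST": sst,
        "MSTR": mstr, "MST": sst / df_total,
    }

# Inputs given in Example 9-3
table = complete_anova_table(n=543, r=4, mse=34.4, f_ratio=8.52)
for key, value in table.items():
    print(key, value)
```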

9-30

9-5 Further Analysis

The ANOVA Diagram:
Data -> ANOVA
    Do not reject H0 -> Stop
    Reject H0 -> Further analysis:
        Confidence intervals for population means
        Tukey pairwise comparisons test

The sample means are unbiased estimators of the population means.
The mean square error (MSE) is an unbiased estimator of the common
population variance.

9-31

Confidence Intervals for Population Means
A $(1 - \alpha)100\%$ confidence interval for $\mu_i$, the mean of population i:
$$\bar{x}_i \pm t_{\alpha/2} \sqrt{\frac{MSE}{n_i}}$$
where $t_{\alpha/2}$ is the value of the t distribution with (n - r) degrees of
freedom that cuts off a right-tailed area of $\alpha/2$.

Resort            Mean Response ($\bar{x}_i$)
Guadeloupe        89
Martinique        75
Eleuthra          73
Paradise Island   91
St. Lucia         85

SST = 112564, SSE = 98356, $n_i = 40$, n = (5)(40) = 200, MSE = 504.39

$$\bar{x}_i \pm t_{\alpha/2} \sqrt{\frac{MSE}{n_i}} = \bar{x}_i \pm 1.96 \sqrt{\frac{504.39}{40}} = \bar{x}_i \pm 6.96$$
89 ± 6.96 = [82.04, 95.96]
75 ± 6.96 = [68.04, 81.96]
73 ± 6.96 = [66.04, 79.96]
91 ± 6.96 = [84.04, 97.96]
85 ± 6.96 = [78.04, 91.96]
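
The five intervals above can be reproduced with a short Python sketch. The critical value 1.96 is the one used on the slide and is passed in as a constant rather than computed; the function name `mean_confidence_interval` is an illustrative assumption.

```python
import math

def mean_confidence_interval(xbar, mse, n_i, t_crit):
    """Interval xbar +/- t_crit * sqrt(MSE / n_i) for one population mean."""
    half_width = t_crit * math.sqrt(mse / n_i)
    return xbar - half_width, xbar + half_width

resort_means = {
    "Guadeloupe": 89, "Martinique": 75, "Eleuthra": 73,
    "Paradise Island": 91, "St. Lucia": 85,
}
MSE, N_I, T_CRIT = 504.39, 40, 1.96   # values taken from the slide

intervals = {
    name: mean_confidence_interval(xbar, MSE, N_I, T_CRIT)
    for name, xbar in resort_means.items()
}
for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.2f}, {high:.2f}]")
```

Every interval has the same half-width because the sample sizes are equal and all intervals share the pooled MSE.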

9-32

The Tukey Pairwise-Comparisons Test
The Tukey Pairwise Comparisons test, or Honestly Significant Differences (HSD) test, allows us
to compare every pair of population means with a single level of significance.
It is based on the studentized range distribution, q, with r and (n - r) degrees of freedom.
The critical point in a Tukey Pairwise Comparisons test is the Tukey Criterion:
$$T = q_\alpha \sqrt{\frac{MSE}{n_i}}$$
where $n_i$ is the smallest of the r sample sizes.

The test statistic is the absolute value of the difference between the appropriate sample means, and
the null hypothesis is rejected if the test statistic is greater than the critical point of the Tukey
Criterion.
Note that there are $\binom{r}{2} = \frac{r!}{2!\,(r-2)!}$ pairs of population means to compare. For example, if r = 3:
$H_0: \mu_1 = \mu_2$      $H_0: \mu_1 = \mu_3$      $H_0: \mu_2 = \mu_3$
$H_1: \mu_1 \ne \mu_2$    $H_1: \mu_1 \ne \mu_3$    $H_1: \mu_2 \ne \mu_3$

9-33

The Tukey Pairwise Comparison Test: The Club Med Example
The test statistic for each pairwise test is the absolute difference between the appropriate
sample means.

i   Resort         Mean
1   Guadeloupe     89
2   Martinique     75
3   Eleuthra       73
4   Paradise Is.   91
5   St. Lucia      85

The critical point $T_{0.05}$ for r = 5 and (n - r) = 195 degrees of freedom is:
$$T = q_\alpha \sqrt{\frac{MSE}{n_i}} = 3.86 \sqrt{\frac{504.4}{40}} = 13.7$$

Test    $H_0$               $H_1$                  Test statistic
I       $\mu_1 = \mu_2$     $\mu_1 \ne \mu_2$      |89 - 75| = 14 > 13.7 *
II      $\mu_1 = \mu_3$     $\mu_1 \ne \mu_3$      |89 - 73| = 16 > 13.7 *
III     $\mu_1 = \mu_4$     $\mu_1 \ne \mu_4$      |89 - 91| = 2 < 13.7
IV      $\mu_1 = \mu_5$     $\mu_1 \ne \mu_5$      |89 - 85| = 4 < 13.7
V       $\mu_2 = \mu_3$     $\mu_2 \ne \mu_3$      |75 - 73| = 2 < 13.7
VI      $\mu_2 = \mu_4$     $\mu_2 \ne \mu_4$      |75 - 91| = 16 > 13.7 *
VII     $\mu_2 = \mu_5$     $\mu_2 \ne \mu_5$      |75 - 85| = 10 < 13.7
VIII    $\mu_3 = \mu_4$     $\mu_3 \ne \mu_4$      |73 - 91| = 18 > 13.7 *
IX      $\mu_3 = \mu_5$     $\mu_3 \ne \mu_5$      |73 - 85| = 12 < 13.7
X       $\mu_4 = \mu_5$     $\mu_4 \ne \mu_5$      |91 - 85| = 6 < 13.7

Reject the null hypothesis if the absolute value of the difference between the sample means
is greater than the critical value of T. (The hypotheses marked with * are rejected.)
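
The ten pairwise comparisons can be enumerated programmatically. The sketch below takes the studentized range value 3.86 from the slide as a given constant (it is not computed here); the variable names are illustrative.

```python
import math
from itertools import combinations

# Sample means indexed as on the slide
means = {1: 89, 2: 75, 3: 73, 4: 91, 5: 85}
Q_ALPHA = 3.86            # studentized range critical value from the slide
MSE, N_I = 504.4, 40      # values used on the slide

# Tukey criterion T = q_alpha * sqrt(MSE / n_i)
T = Q_ALPHA * math.sqrt(MSE / N_I)

pairs = list(combinations(sorted(means), 2))   # all r(r-1)/2 pairs
rejected = [
    (i, j) for i, j in pairs
    if abs(means[i] - means[j]) > T
]
print(round(T, 1), len(pairs), rejected)
```

`itertools.combinations` yields each unordered pair exactly once, matching the count $r!/(2!(r-2)!)$ for r = 5.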

9-34

Picturing the Results of a Tukey Pairwise Comparisons Test: The Club Med Example
We rejected the null hypotheses which compared the means of populations 1
and 2, 1 and 3, 2 and 4, and 3 and 4. On the other hand, we did not reject the null
hypotheses of the equality of the means of populations 1 and 4, 1 and 5, 2
and 3, 2 and 5, 3 and 5, and 4 and 5.

[Figure: the five population means $\mu_1, \ldots, \mu_5$ arranged on a line, with bars joining the means that were not found to differ.]

The bars indicate the three groupings of populations with possibly equal
means: 2 and 3; 2, 3, and 5; and 1, 4, and 5.

9-36

9-6 Models, Factors and Designs

A statistical model is a set of equations and assumptions
that capture the essential characteristics of a real-world
situation.
The one-factor ANOVA model:
$$x_{ij} = \mu_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where $\varepsilon_{ij}$ is the error associated with the jth member of
the ith population. The errors are assumed to be
normally distributed with mean 0 and variance $\sigma^2$.

9-37

9-6 Models, Factors and Designs (Continued)
A factor is a set of populations or treatments of a single kind. For example:
One factor models based on sets of resorts, types of airplanes, or
kinds of sweaters
Two factor models based on firm and location
Three factor models based on color and shape and size of an ad.
Fixed-Effects and Random Effects
A fixed-effects model is one in which the levels of the factor under
study (the treatments) are fixed in advance. Inference is valid only
for the levels under study.
A random-effects model is one in which the levels of the factor
under study are randomly chosen from an entire population of levels
(treatments). Inference is valid for the entire population of levels.

9-38

Experimental Design
A completely randomized design is one in which the elements are
assigned to treatments completely at random. That is, any element
chosen for the study has an equal chance of being assigned to any
treatment.
In a blocking design, elements are assigned to treatments
after first being collected into homogeneous groups.
    In a randomized complete block design, all members of each
    block (homogeneous group) are randomly assigned to the treatment
    levels.
    In a repeated measures design, each member of each block is
    assigned to all treatment levels.

9-39

9-7 Two-Way Analysis of Variance

In a two-way ANOVA, the effects of two factors or treatments can be investigated
simultaneously. Two-way ANOVA also permits the investigation of the effects of
either factor alone and of the two factors together.

The effect on the population mean that can be attributed to the levels of either factor alone
is called a main effect.
An interaction effect between two factors occurs if the total effect at some pair of levels of
the two factors or treatments differs significantly from the simple addition of the two main
effects. Factors that do not interact are called additive.

Three questions answerable by two-way ANOVA:
Are there any factor A main effects?
Are there any factor B main effects?
Are there any interaction effects between factors A and B?

For example, we might investigate the effects on vacationers' ratings of resorts by
looking at five different resorts (factor A) and four different resort attributes (factor
B). In addition to the five main factor A treatment levels and the four main factor B
treatment levels, there are (5 × 4 = 20) interaction treatment levels.

9-40

The Two-Way ANOVA Model

$$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

where $\mu$ is the overall mean;
$\alpha_i$ is the effect of level i (i = 1, ..., a) of factor A;
$\beta_j$ is the effect of level j (j = 1, ..., b) of factor B;
$(\alpha\beta)_{ij}$ is the interaction effect of levels i and j;
$\varepsilon_{ijk}$ is the error associated with the kth data point from
level i of factor A and level j of factor B.
$\varepsilon_{ijk}$ is assumed to be distributed normally with mean
zero and variance $\sigma^2$ for all i, j, and k.

9-41

Two-Way ANOVA Data Layout: Club Med Example

                     Factor B: Attribute
Factor A: Resort     Friendship   Sports     Culture    Excitement
Guadeloupe           $n_{11}$     $n_{12}$   $n_{13}$   $n_{14}$
Martinique           $n_{21}$     $n_{22}$   $n_{23}$   $n_{24}$
Eleuthra             $n_{31}$     $n_{32}$   $n_{33}$   $n_{34}$
Paradise Island      $n_{41}$     $n_{42}$   $n_{43}$   $n_{44}$
St. Lucia            $n_{51}$     $n_{52}$   $n_{53}$   $n_{54}$

[Figure: Graphical Display of Effects. Two panels plot Rating, one against Resort with a line per attribute and one against Attribute with a line per resort. Annotation: Eleuthra/sports interaction: combined effect greater than additive main effects.]

9-42

Hypothesis Tests in a Two-Way ANOVA

Factor A main effects test:
$H_0: \alpha_i = 0$ for all i = 1, 2, ..., a
$H_1$: Not all $\alpha_i$ are 0

Factor B main effects test:
$H_0: \beta_j = 0$ for all j = 1, 2, ..., b
$H_1$: Not all $\beta_j$ are 0

Test for (AB) interactions:
$H_0: (\alpha\beta)_{ij} = 0$ for all i = 1, 2, ..., a and j = 1, 2, ..., b
$H_1$: Not all $(\alpha\beta)_{ij}$ are 0

9-43

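Sums of Squares

In a two-way ANOVA:
$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
SST = SSTR + SSE
SST = SSA + SSB + SS(AB) + SSE

With every sum taken over all observations (i = 1, ..., a; j = 1, ..., b; k = 1, ..., n):
$$SST = SSTR + SSE: \quad \sum (x_{ijk} - \bar{\bar{x}})^2 = \sum (\bar{x}_{ij} - \bar{\bar{x}})^2 + \sum (x_{ijk} - \bar{x}_{ij})^2$$
$$SSTR = SSA + SSB + SS(AB) = \sum (\bar{x}_i - \bar{\bar{x}})^2 + \sum (\bar{x}_j - \bar{\bar{x}})^2 + \sum (\bar{x}_{ij} - \bar{x}_i - \bar{x}_j + \bar{\bar{x}})^2$$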
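
The two-way decomposition can be checked numerically on a small balanced layout. The data below are hypothetical (a = 2, b = 2, n = 2, invented purely for illustration), and the function name `two_way_sums_of_squares` is an assumption. Sums over all observations are written with the equivalent multipliers bn, an, and n.

```python
def two_way_sums_of_squares(cells, a, b, n):
    """Return SSA, SSB, SS(AB), SSE, SST for a balanced two-way layout.

    cells maps (i, j) -> list of n observations in that cell.
    """
    all_x = [x for xs in cells.values() for x in xs]
    grand = sum(all_x) / len(all_x)
    cell_mean = {k: sum(xs) / n for k, xs in cells.items()}
    a_mean = [sum(cell_mean[(i, j)] for j in range(b)) / b for i in range(a)]
    b_mean = [sum(cell_mean[(i, j)] for i in range(a)) / a for j in range(b)]
    ssa = b * n * sum((m - grand) ** 2 for m in a_mean)
    ssb = a * n * sum((m - grand) ** 2 for m in b_mean)
    ssab = n * sum(
        (cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    sse = sum((x - cell_mean[k]) ** 2 for k, xs in cells.items() for x in xs)
    sst = sum((x - grand) ** 2 for x in all_x)
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse, "SST": sst}

# Hypothetical data: two levels of each factor, two observations per cell
cells = {(0, 0): [1, 3], (0, 1): [5, 7], (1, 0): [2, 4], (1, 1): [10, 12]}
ss = two_way_sums_of_squares(cells, a=2, b=2, n=2)
print(ss)
```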

9-44

The Two-Way ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                           F Ratio
Factor A              SSA              a - 1                MSA = SSA/(a - 1)                     F = MSA/MSE
Factor B              SSB              b - 1                MSB = SSB/(b - 1)                     F = MSB/MSE
Interaction           SS(AB)           (a - 1)(b - 1)       MS(AB) = SS(AB)/[(a - 1)(b - 1)]      F = MS(AB)/MSE
Error                 SSE              ab(n - 1)            MSE = SSE/[ab(n - 1)]
Total                 SST              abn - 1

A Main Effect Test: F(a - 1, ab(n - 1))
B Main Effect Test: F(b - 1, ab(n - 1))
(AB) Interaction Effect Test: F((a - 1)(b - 1), ab(n - 1))

9-45

Example 9-4: Two-Way ANOVA (Location and Artist)

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Location              1824             2                    912           8.94
Artist                2230             2                    1115          10.93
Interaction           804              4                    201           1.97
Error                 8262             81                   102
Total                 13120            89

$\alpha = 0.01$, $F_{(2,81)} = 4.88$: both main effect null hypotheses are rejected.
$\alpha = 0.05$, $F_{(4,81)} = 2.48$: the interaction effect null hypothesis is not rejected.
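
The mean squares and F ratios in Example 9-4 follow mechanically from the sums of squares and the design sizes. The sketch below assumes a = 3 locations, b = 3 artists, and n = 10 observations per cell; these sizes are not stated on the slide and are inferred here only because they reproduce the 81 error and 89 total degrees of freedom. The function name `two_way_table` is hypothetical.

```python
def two_way_table(ss, a, b, n):
    """Degrees of freedom, mean squares, and F ratios for two-way ANOVA."""
    df = {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
    }
    ms = {source: ss[source] / df[source] for source in df}
    f = {source: ms[source] / ms["Error"] for source in ("A", "B", "AB")}
    return df, ms, f

# Sums of squares from Example 9-4 (A = location, B = artist)
ss_example = {"A": 1824, "B": 2230, "AB": 804, "Error": 8262}
df, ms, f = two_way_table(ss_example, a=3, b=3, n=10)  # sizes are assumed
print(df)
print({k: round(v, 2) for k, v in f.items()})
```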

9-46

Hypothesis Tests

[Figure: F distribution with 2 and 81 degrees of freedom; the location test statistic (8.94) and the artist test statistic (10.93) both exceed the critical point $F_{0.01} = 4.88$.]

[Figure: F distribution with 4 and 81 degrees of freedom; the interaction test statistic (1.97) falls below the critical point $F_{0.05} = 2.48$.]

9-47

Overall Significance Level and Tukey Method for Two-Way ANOVA
Kimball's Inequality gives an upper limit on the true probability of at least
one Type I error in the three tests of a two-way analysis:
$$\alpha \le 1 - (1 - \alpha_1)(1 - \alpha_2)(1 - \alpha_3)$$
Tukey Criterion for factor A:
$$T = q_\alpha \sqrt{\frac{MSE}{bn}}$$
where the degrees of freedom of the q distribution are now a and ab(n - 1). Note
that MSE is divided by bn.
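
Kimball's bound is easy to evaluate for any choice of the three individual significance levels. In the sketch below the levels 0.05, 0.05, and 0.05 are hypothetical inputs chosen for illustration, and the function name `kimball_upper_bound` is an assumption.

```python
def kimball_upper_bound(alphas):
    """Upper limit 1 - prod(1 - alpha_k) on the overall Type I error probability."""
    product = 1.0
    for alpha in alphas:
        product *= (1.0 - alpha)
    return 1.0 - product

# Hypothetical levels for the factor A, factor B, and interaction tests
bound = kimball_upper_bound([0.05, 0.05, 0.05])
print(round(bound, 6))
```

With each test run at 0.05, the overall probability of at least one Type I error is bounded by about 0.143, which is noticeably larger than any single test's level.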

9-48

Template for a Two-Way ANOVA

9-49

Extension of ANOVA to Three Factors

Source of Variation   Sum of Squares   Degrees of Freedom       Mean Square                                    F Ratio
Factor A              SSA              a - 1                    MSA = SSA/(a - 1)                              F = MSA/MSE
Factor B              SSB              b - 1                    MSB = SSB/(b - 1)                              F = MSB/MSE
Factor C              SSC              c - 1                    MSC = SSC/(c - 1)                              F = MSC/MSE
Interaction (AB)      SS(AB)           (a - 1)(b - 1)           MS(AB) = SS(AB)/[(a - 1)(b - 1)]               F = MS(AB)/MSE
Interaction (AC)      SS(AC)           (a - 1)(c - 1)           MS(AC) = SS(AC)/[(a - 1)(c - 1)]               F = MS(AC)/MSE
Interaction (BC)      SS(BC)           (b - 1)(c - 1)           MS(BC) = SS(BC)/[(b - 1)(c - 1)]               F = MS(BC)/MSE
Interaction (ABC)     SS(ABC)          (a - 1)(b - 1)(c - 1)    MS(ABC) = SS(ABC)/[(a - 1)(b - 1)(c - 1)]      F = MS(ABC)/MSE
Error                 SSE              abc(n - 1)               MSE = SSE/[abc(n - 1)]
Total                 SST              abcn - 1

9-50

Two-Way ANOVA with One Observation per Cell
The case of one data point in every cell presents a problem in two-way ANOVA.
There will be no degrees of freedom for the error term.
What can be done?
If we can assume that there are no interactions between
the main effects, then we can use SS(AB) and its
associated degrees of freedom (a - 1)(b - 1) in place of
SSE and its degrees of freedom.
We can then conduct main effects tests using MS(AB).
See the next slide for the ANOVA table.

9-51

Two-Way ANOVA with One Observation per Cell

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                           F Ratio
Factor A              SSA              a - 1                MSA = SSA/(a - 1)                     F = MSA/MS(AB)
Factor B              SSB              b - 1                MSB = SSB/(b - 1)                     F = MSB/MS(AB)
Error                 SS(AB)           (a - 1)(b - 1)       MS(AB) = SS(AB)/[(a - 1)(b - 1)]
Total                 SST              ab - 1
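
A minimal sketch of this table in Python follows, assuming no interaction so that MS(AB) serves as the error term. The 3 × 2 data grid is hypothetical, invented for illustration, and the function name `two_way_no_replication` is an assumption.

```python
def two_way_no_replication(x):
    """Two-way ANOVA with one observation per cell; x[i][j] is the value
    at level i of factor A and level j of factor B."""
    a, b = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (a * b)
    row_means = [sum(row) / b for row in x]
    col_means = [sum(x[i][j] for i in range(a)) / a for j in range(b)]
    ssa = b * sum((m - grand) ** 2 for m in row_means)
    ssb = a * sum((m - grand) ** 2 for m in col_means)
    # The residual (interaction) sum of squares plays the role of SSE
    ss_ab = sum(
        (x[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ms_ab = ss_ab / ((a - 1) * (b - 1))
    return {
        "SSA": ssa, "SSB": ssb, "SSAB": ss_ab,
        "F_A": (ssa / (a - 1)) / ms_ab,
        "F_B": (ssb / (b - 1)) / ms_ab,
    }

# Hypothetical data: 3 levels of factor A (rows), 2 levels of factor B (columns)
grid = [[1, 2], [3, 6], [5, 7]]
res = two_way_no_replication(grid)
print(res)
```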

9-52

9-8 Blocking Designs

A block is a homogeneous set of subjects, grouped to minimize
within-group differences.
A completely randomized design is one in which the
elements are assigned to treatments completely at
random. That is, any element chosen for the study has an
equal chance of being assigned to any treatment.
In a blocking design, elements are assigned to treatments
after first being collected into homogeneous groups.
    In a randomized complete block design, all members of each
    block (homogeneous group) are randomly assigned to the
    treatment levels.
    In a repeated measures design, each member of each block is
    assigned to all treatment levels.

9-53

Model for Randomized Complete Block Design

$$x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$

where $\mu$ is the overall mean;
$\alpha_i$ is the effect of level i (i = 1, ..., a) of factor A;
$\beta_j$ is the effect of block j (j = 1, ..., b);
$\varepsilon_{ij}$ is the error associated with $x_{ij}$.
$\varepsilon_{ij}$ is assumed to be distributed normally with
mean zero and variance $\sigma^2$ for all i and j.

9-54

ANOVA Table for Blocking Designs: Example 9-5

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                      F Ratio
Blocks                SSBL             n - 1                MSBL = SSBL/(n - 1)              F = MSBL/MSE
Treatments            SSTR             r - 1                MSTR = SSTR/(r - 1)              F = MSTR/MSE
Error                 SSE              (n - 1)(r - 1)       MSE = SSE/[(n - 1)(r - 1)]
Total                 SST              nr - 1

Source of Variation   Sum of Squares   df     Mean Square   F Ratio
Blocks                2750             39     70.51         0.69
Treatments            2640             2      1320          12.93
Error                 7960             78     102.05
Total                 13350            119

$\alpha = 0.01$, $F_{(2,78)} = 4.88$
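
The numbers in Example 9-5 can be reproduced from the sums of squares and the design sizes. The sketch below assumes n = 40 blocks and r = 3 treatments, values inferred from the degrees of freedom in the table (39 and 2) rather than stated on the slide; the function name `block_design_table` is hypothetical.

```python
def block_design_table(ssbl, sstr, sse, n_blocks, r_treatments):
    """Mean squares and F ratios for a randomized complete block design."""
    df_blocks = n_blocks - 1
    df_treat = r_treatments - 1
    df_error = df_blocks * df_treat
    msbl = ssbl / df_blocks
    mstr = sstr / df_treat
    mse = sse / df_error
    return {
        "df": (df_blocks, df_treat, df_error),
        "MSBL": msbl, "MSTR": mstr, "MSE": mse,
        "F_blocks": msbl / mse,
        "F_treatments": mstr / mse,
    }

# Sums of squares from Example 9-5; block and treatment counts are inferred
blk = block_design_table(2750, 2640, 7960, n_blocks=40, r_treatments=3)
print(blk["df"], round(blk["F_blocks"], 2), round(blk["F_treatments"], 2))
```

The treatment F ratio of about 12.93 exceeds the tabulated critical point 4.88, while the block F ratio is below 1.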

9-55

Template for the Randomized Complete Block Design
