
4.1 Introduction To Studying Two Factors: ESGC 6112: Lecture 4 Two-Factor Cross-Classification Designs


ESGC 6112: Lecture 4

Two-Factor Cross-Classification Designs

4.1 Introduction to Studying Two Factors

One-factor designs are experiments designed to determine whether the level
of a factor (the independent variable) affects the value of some quantity
of interest (the dependent variable).
- E.g., whether device/usage influences battery life.

Suppose we want to determine whether battery life varies by battery brand
(a different issue from the brand of the device in which the battery is
used). We could design another one-factor experiment and note its results
along with the results of the first experiment, which considered the impact
of the test device. This approach is frequently used in practice.
Unfortunately, it is unwise: the conclusions may be incorrect.
For example, one brand of battery may be superior when used in one device,
while another brand is superior in a different device. Device and brand of
battery may be synergistic (or interactive) in their effect on battery
lifetime. Even if there were no interaction effect between the two factors
under study, studying the two factors one at a time would result in less
reliable estimates of effects given the same total number of data values.
And if we study one factor while holding the other factor constant, we may
not be able to generalize the results.

- Goal: develop the preferred procedure for examining the possible effect
of two cross-classified factors on some quantity of interest. Two factors
are said to be "crossed" or cross-classified when each level of one factor
occurs in combination with each level of the other factor.

4.2 Designs with Replication

Example 4.1. Battery brand as a second factor

Two independent variables (factors): battery brand and the device in which
the battery is used.
We study 3 devices and 4 brands, and decide to run each combination of
levels of factors for two batteries; that is, each combination of levels of
factors, or "treatment combination", is replicated twice.

Assumption: each treatment combination has the same number of replicates, n.

If the treatment combinations do not all include the same number of
replicates, $n_{ij}$ is used for the number of replicates of treatment
combination (i, j).

Table 4.1 Battery Lifetime in Hours by Brand and Device

                           Brand
Device   1            2            3            4
1        17.9, 18.1   17.8, 17.8   18.1, 18.2   17.8, 17.9
2        18.2, 18.0   18.0, 18.3   18.4, 18.1   18.1, 18.5
3        18.0, 17.8   17.8, 18.0   18.1, 18.3   18.1, 17.9

When we ran the experiment (observing the lifetime of 2 batteries) at the
combination of device 1 and brand 1, the battery life values were 17.9 hours
and 18.1 hours, respectively, for the two replicates.

Table 4.1 has three rows and four columns, with two replicates (n = 2) for
each cell, or treatment combination.

The Model:

$$Y_{ijk} = \mu + \rho_i + \tau_j + I_{ij} + \varepsilon_{ijk}$$

where

i = 1, 2, 3, ..., R; i indexes rows
j = 1, 2, 3, ..., C; j indexes columns
k = 1, 2, 3, ..., n; k indexes replicates within a cell
$Y_{ijk}$ = the data point that corresponds to the kth replicate in the cell
at the intersection of the ith row and jth column; e.g., $Y_{231}$ = 18.4 in
Table 4.1
$\mu$ = the grand mean
$\rho_i$ = the difference between the ith row mean and the grand mean
$\tau_j$ = the difference between the jth column mean and the grand mean
$I_{ij}$ = a measure of the interaction associated with the ith row and the
jth column
$\varepsilon_{ijk}$ = the "error" or "noise" in the data value (i, j, k): the
difference between the data value and the true mean of that cell

We have n observations per cell and RC cells, for a total of nRC data
values and nRC - 1 total degrees of freedom.

Parameter Estimates

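$$Y_{ijk} = \bar{Y}_{...} + (\bar{Y}_{i..} - \bar{Y}_{...}) + (\bar{Y}_{.j.} - \bar{Y}_{...}) + (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}) + (Y_{ijk} - \bar{Y}_{ij.})$$

$\bar{Y}_{...}$ - grand mean
$\bar{Y}_{i..}$ - mean of row i
$\bar{Y}_{.j.}$ - mean of column j
$\bar{Y}_{ij.}$ - mean of cell (i, j)

$$(\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}) = (\bar{Y}_{ij.} - \bar{Y}_{...}) - (\bar{Y}_{i..} - \bar{Y}_{...}) - (\bar{Y}_{.j.} - \bar{Y}_{...})$$

$(\bar{Y}_{ij.} - \bar{Y}_{...})$ - the difference between the cell mean and the grand mean
$(\bar{Y}_{i..} - \bar{Y}_{...})$ - the difference between the row mean and the grand mean
$(\bar{Y}_{.j.} - \bar{Y}_{...})$ - the difference between the column mean and the grand mean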
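
As an illustration of the estimates above, the following minimal Python sketch computes the grand mean, row and column effects, and interaction estimates for the battery data of Table 4.1. The variable names (`data`, `cell_mean`, `row_effect`, and so on) are illustrative assumptions, not part of the lecture; the row and column means are taken as averages of cell means, which is valid here only because every cell has the same n.

```python
# Table 4.1: data[i][j] holds the n = 2 replicates for device i+1, brand j+1.
data = [
    [[17.9, 18.1], [17.8, 17.8], [18.1, 18.2], [17.8, 17.9]],
    [[18.2, 18.0], [18.0, 18.3], [18.4, 18.1], [18.1, 18.5]],
    [[18.0, 17.8], [17.8, 18.0], [18.1, 18.3], [18.1, 17.9]],
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

R, C = len(data), len(data[0])
cell_mean = [[mean(cell) for cell in row] for row in data]
# Averaging cell means is valid because all cells have equal n.
row_mean = [mean(cell_mean[i]) for i in range(R)]
col_mean = [mean(cell_mean[i][j] for i in range(R)) for j in range(C)]
grand_mean = mean(row_mean)

row_effect = [m - grand_mean for m in row_mean]   # row mean minus grand mean
col_effect = [m - grand_mean for m in col_mean]   # column mean minus grand mean
interaction = [
    [cell_mean[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(C)]
    for i in range(R)
]

print("grand mean:", round(grand_mean, 4))
print("row effects:", [round(e, 4) for e in row_effect])
print("column effects:", [round(e, 4) for e in col_effect])
```

By construction, the row effects sum to zero, as do the column effects.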

Interaction

Suppose we have two factors, A and B, each at two levels, high (H) and low
(L), and an infinite amount of replication (and hence no error in the cell
means). Further suppose that the cell means are as follows:

      BL   BH
AL     5    8
AH    10    ?

Start from row 1 and column 1, (AL, BL), where the yield is 5. As we change
the level of factor A (AL → AH), holding the level of factor B constant (at
BL), the yield increases by 5 (= 10 - 5). If we hold the level of factor A
constant (at AL) and change the level of factor B (BL → BH), the yield
increases by 3 (= 8 - 5). What happens when the levels of both factors, A
and B, change (AL → AH, BL → BH)?

Three possibilities:
- If the yield at (AH, BH) is 13 (an increase of 8 = 3 + 5), there is no
interaction.
- If the yield at (AH, BH) is greater than 13 (an increase of more than 8),
there is positive interaction.
- If the yield at (AH, BH) is less than 13 (an increase of less than 8, or
no increase at all), there is negative interaction.

Interaction = degree of difference from the sum of the separate effects.
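
The definition above can be sketched in a few lines of Python. The function name `interaction_2x2` and its argument names are hypothetical, chosen only for this illustration; the function returns how far the (AH, BH) cell mean lies from the additive prediction.

```python
def interaction_2x2(ll, lh, hl, hh):
    """Interaction for a 2x2 table of cell means.

    ll = mean at (AL, BL), lh = (AL, BH), hl = (AH, BL), hh = (AH, BH).
    """
    effect_a = hl - ll                       # change A, holding B low
    effect_b = lh - ll                       # change B, holding A low
    additive_prediction = ll + effect_a + effect_b
    return hh - additive_prediction          # 0 means no interaction

# Try a few candidate values for the unknown (AH, BH) cell mean.
for hh in (13, 17, 7):
    print(hh, interaction_2x2(5, 8, 10, hh))
```

A positive return value corresponds to positive interaction and a negative value to negative interaction.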

To conceptualize interaction, suppose we have the following cell means:

      BL   BH
AL     5    8
AH    10   17

Holding the level of factor B constant at BL, as AL → AH, the yield
increases by 5.
Holding the level of factor B constant at BH, as AL → AH, the yield
increases by 9.
⇒ If the effect of one factor varies depending on the level of another
factor, there is interaction between the two factors.

Holding the level of factor A constant:
At AL, BL → BH increases the yield by 3.
At AH, BL → BH increases the yield by 7.
[Figure: interaction plot of yield against the level of factor A (A low,
A high), with one line for B low and one line for B high.]

The lines are not parallel, so there is non-zero interaction. If there is
interaction, the interpretation of main effects becomes problematic. Going
from A low to A high gives an average increase in yield of 7; however, it
is really either 5 or 9, depending on the level of B.

Now suppose the cell means are:

      BL   BH
AL     5    8
AH    10    7

The main effect of A is 2; however, it is 5 when B is low and -1 when B is
high. The main effect of B is zero, but it is either +3 or -3 depending on
the level of A. The interaction plot:

[Figure: interaction plot of yield against the level of factor A (A low,
A high); the B low line rises from 5 to 10 and the B high line falls from
8 to 7.]
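
To make the distinction between main effects and effects at a fixed level of the other factor concrete, here is a small Python sketch. The function `effects_2x2` and its dictionary keys are hypothetical names introduced only for this illustration.

```python
def effects_2x2(ll, lh, hl, hh):
    """Simple and main effects for a 2x2 table of cell means.

    ll = (AL, BL), lh = (AL, BH), hl = (AH, BL), hh = (AH, BH).
    """
    a_at_b_low = hl - ll
    a_at_b_high = hh - lh
    b_at_a_low = lh - ll
    b_at_a_high = hh - hl
    return {
        "A at B low": a_at_b_low,
        "A at B high": a_at_b_high,
        "main A": (a_at_b_low + a_at_b_high) / 2,   # average over levels of B
        "B at A low": b_at_a_low,
        "B at A high": b_at_a_high,
        "main B": (b_at_a_low + b_at_a_high) / 2,   # average over levels of A
    }

print(effects_2x2(5, 8, 10, 17))   # table with (AH, BH) = 17
print(effects_2x2(5, 8, 10, 7))    # table with (AH, BH) = 7
```

In the second table the main effect of B averages to zero even though B changes the yield by 3 units at each level of A, which is why main effects are hard to interpret when interaction is present.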

Statistical Model: Sums of Squares

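$$Y_{ijk} = \bar{Y}_{...} + (\bar{Y}_{i..} - \bar{Y}_{...}) + (\bar{Y}_{.j.} - \bar{Y}_{...}) + (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}) + (Y_{ijk} - \bar{Y}_{ij.})$$

Moving the grand mean to the left-hand side:

$$(Y_{ijk} - \bar{Y}_{...}) = (\bar{Y}_{i..} - \bar{Y}_{...}) + (\bar{Y}_{.j.} - \bar{Y}_{...}) + (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}) + (Y_{ijk} - \bar{Y}_{ij.})$$

Squaring both sides and summing over i, j, and k (the cross-product terms
sum to zero):

$$\sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2 = \sum_i \sum_j \sum_k (\bar{Y}_{i..} - \bar{Y}_{...})^2 + \sum_i \sum_j \sum_k (\bar{Y}_{.j.} - \bar{Y}_{...})^2 + \sum_i \sum_j \sum_k (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2 + \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2$$

$(\bar{Y}_{i..} - \bar{Y}_{...})^2$ does not depend on the indices j and k, so

$$\sum_i \sum_j \sum_k (\bar{Y}_{i..} - \bar{Y}_{...})^2 = \sum_j \sum_k \left[ \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2 \right] = nC \left[ \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2 \right]$$

Applying the same argument to the other terms:

$$\sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2 = nC \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2 + nR \sum_j (\bar{Y}_{.j.} - \bar{Y}_{...})^2 + n \sum_i \sum_j (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2 + \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2$$

Without replication, $\sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2 = 0$, since
for every value of (i, j), $\bar{Y}_{ij.} = Y_{ijk}$; i.e., with no replication,
a cell mean equals the individual data value.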

The above equation can be symbolically written as:


TSS = SSBr + SSBc + SSIr,c + SSW
d.f.: nRC-1 = (R-1) + (C-1) + (R-1)(C-1) + RC(n-1)
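
As a quick check of the degrees-of-freedom identity, the following Python sketch partitions the total df for given R, C, and n. The function name `df_partition` is a hypothetical helper introduced for this illustration; the call uses R = 3, C = 4, n = 2 from Example 4.1.

```python
def df_partition(R, C, n):
    """Split the total degrees of freedom nRC - 1 into its four components."""
    parts = {
        "rows": R - 1,
        "columns": C - 1,
        "interaction": (R - 1) * (C - 1),
        "error": R * C * (n - 1),
    }
    # The components must add up to the total df.
    assert sum(parts.values()) == n * R * C - 1
    return parts

print(df_partition(3, 4, 2))   # battery example: 3 devices, 4 brands, 2 replicates
```

Note that with n = 1 the error component is zero, which is the no-replication case discussed in Section 4.4.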

Example 4.2: Calculating Sums of Squares, Battery example: (Pg166)

                              Brand
Device            1            2            3            4            $\bar{Y}_{i..}$
1                 17.9, 18.1   17.8, 17.8   18.1, 18.2   17.8, 17.9
                  (18.0)       (17.8)       (18.15)      (17.85)      17.95
2                 18.2, 18.0   18.0, 18.3   18.4, 18.1   18.1, 18.5
                  (18.1)       (18.15)      (18.25)      (18.3)       18.20
3                 18.0, 17.8   17.8, 18.0   18.1, 18.3   18.1, 17.9
                  (17.9)       (17.9)       (18.2)       (18.0)       18.00
$\bar{Y}_{.j.}$   18.00        17.95        18.20        18.05        18.05

Cell means are in parentheses, column means are in the bottom row, and row
means are in the last column.

$$SSB_r = (2)(4)[(17.95 - 18.05)^2 + (18.20 - 18.05)^2 + (18.00 - 18.05)^2] = 8[0.01 + 0.0225 + 0.0025] = 0.28$$

$$SSB_c = (2)(3)[(18.00 - 18.05)^2 + (17.95 - 18.05)^2 + (18.20 - 18.05)^2 + (18.05 - 18.05)^2] = 6[0.0025 + 0.01 + 0.0225 + 0] = 0.21$$

$$SSI_{r,c} = 2[(18.0 - 17.95 - 18.00 + 18.05)^2 + (17.8 - 17.95 - 17.95 + 18.05)^2 + \cdots + (18.0 - 18.0 - 18.05 + 18.05)^2] = 2[0.055] = 0.11$$

$$SSW = [(17.9 - 18.0)^2 + (18.1 - 18.0)^2 + (17.8 - 17.8)^2 + \cdots + (17.9 - 18.0)^2] = 0.30$$

TSS = 0.28 + 0.21 + 0.11 + 0.30 = 0.90.
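
The hand calculation above can be reproduced with a short Python sketch that applies the sum-of-squares formulas directly to the battery data. The variable names are illustrative assumptions; only the standard library is used.

```python
# Battery data: data[i][j] = replicates for device i+1, brand j+1.
data = [
    [[17.9, 18.1], [17.8, 17.8], [18.1, 18.2], [17.8, 17.9]],
    [[18.2, 18.0], [18.0, 18.3], [18.4, 18.1], [18.1, 18.5]],
    [[18.0, 17.8], [17.8, 18.0], [18.1, 18.3], [18.1, 17.9]],
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

R, C, n = len(data), len(data[0]), len(data[0][0])
cell = [[mean(data[i][j]) for j in range(C)] for i in range(R)]
row = [mean(cell[i]) for i in range(R)]                       # equal n per cell
col = [mean(cell[i][j] for i in range(R)) for j in range(C)]
grand = mean(row)

ssb_r = n * C * sum((m - grand) ** 2 for m in row)
ssb_c = n * R * sum((m - grand) ** 2 for m in col)
ssi = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
              for i in range(R) for j in range(C))
ssw = sum((y - cell[i][j]) ** 2
          for i in range(R) for j in range(C) for y in data[i][j])
tss = sum((y - grand) ** 2 for r in data for c in r for y in c)

for name, value in [("SSBr", ssb_r), ("SSBc", ssb_c), ("SSI", ssi),
                    ("SSW", ssw), ("TSS", tss)]:
    print(name, round(value, 4))
```

The total computed directly from the raw data agrees with the sum of the four components, which is a useful check on the arithmetic.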


ANOVA Table: Two-Factor Study of Battery Life

Source of Variability   SSQ    df   MS       Fcalc
Rows                    0.28    2   0.14     5.6
Columns                 0.21    3   0.07     2.8
Interaction             0.11    6   0.0183   0.73
Error                   0.30   12   0.025
Total                   0.90   23

For the row factor, we test
$H_0$: All row means are equal
$H_1$: Not all row means are equal
With α = 0.05 and df = (2, 12), the critical value is c = 3.89. Since
Fcalc = 5.6 > 3.89, we reject $H_0$ (p < 0.05); the row factor is
significant: not all true row means are equal.

For the column factor, we test
$H_0$: All column means are equal
$H_1$: Not all column means are equal
With α = 0.05 and df = (3, 12), the critical value is c = 3.49. Since
Fcalc = 2.8 < 3.49, we accept $H_0$ (p > 0.05); the column factor is not
significant, i.e., we cannot reject the hypothesis that all true column
means are equal.

For interaction, we test
$H_0$: There is no interaction between row and column factors
$H_1$: There is interaction between row and column factors
With α = 0.05 and df = (6, 12), the critical value is c = 3.00. Since
Fcalc = 0.73 < 3.00, we accept $H_0$ (p > 0.05); the interaction effect
between the two factors is not significant, i.e., we cannot reject the
hypothesis that there is no interaction between the two factors.
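
The three tests can be sketched in Python as follows. The sums of squares, degrees of freedom, and critical values are copied from the text above (the critical values are table look-ups for α = 0.05, not computed here); the dictionary names are illustrative assumptions.

```python
# Values taken from the ANOVA table and the tests above.
ss = {"rows": 0.28, "columns": 0.21, "interaction": 0.11, "error": 0.30}
df = {"rows": 2, "columns": 3, "interaction": 6, "error": 12}
critical = {"rows": 3.89, "columns": 3.49, "interaction": 3.00}  # alpha = 0.05

ms = {source: ss[source] / df[source] for source in ss}          # mean squares
f_calc = {source: ms[source] / ms["error"] for source in critical}
reject = {source: f_calc[source] > critical[source] for source in critical}

for source in critical:
    decision = "reject H0" if reject[source] else "do not reject H0"
    print(f"{source}: Fcalc = {f_calc[source]:.2f}, c = {critical[source]}, {decision}")
```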

$$E(MSI) = \sigma^2 + V_{int} \quad \text{and} \quad E(MSW) = \sigma^2$$

$V_{int}$ cannot be negative. Given some strong evidence that $V_{int} = 0$,
we have $E(MSI) = \sigma^2$, so MSI and MSW are both estimating the same
quantity, $\sigma^2$. One should then pool the two estimates for use as the
denominator of Fcalc when testing for significance of the row and column
factors. The pooling leads to a modified ANOVA table:
Source of Variability   SSQ                    df              MS       Fcalc
Rows                    0.28                   2               0.14     6.15
Columns                 0.21                   3               0.07     3.07
Error                   0.41 (= 0.11 + 0.30)   18 (= 6 + 12)   0.0228
Total                   0.90                   23

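With α = 0.05, the critical values are c = 3.55 for df = (2, 18) and
c = 3.16 for df = (3, 18).
Row factor: Fcalc = 6.15 > 3.55, so we reject $H_0$.
Column factor: Fcalc = 3.07 < 3.16, so we still accept $H_0$.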
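
A minimal Python sketch of the pooling step, using the sums of squares and degrees of freedom from the ANOVA tables above; the variable names are illustrative assumptions.

```python
# Interaction and within-cell (error) lines from the original ANOVA table.
ss_int, df_int = 0.11, 6
ss_w, df_w = 0.30, 12

# Pool the two estimates of sigma^2 into a single error term.
ss_pooled = ss_int + ss_w
df_pooled = df_int + df_w
ms_pooled = ss_pooled / df_pooled

ms_rows = 0.28 / 2
ms_cols = 0.21 / 3
f_rows = ms_rows / ms_pooled
f_cols = ms_cols / ms_pooled

print(f"pooled MS = {ms_pooled:.4f} on {df_pooled} df")
print(f"Fcalc rows = {f_rows:.2f}, Fcalc columns = {f_cols:.2f}")
```

Pooling increases the error degrees of freedom from 12 to 18, which is the source of the slightly larger Fcalc values.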

Example 4.3: Analysis using SPSS

Data Table 6.7, pg 171 .

4.3 Fixed Levels versus Random Levels

A factor has fixed levels if the levels of the factor implemented in the
experiment are chosen by the experimenter, and these are the only levels
about which inferences are to be made.
Example of a fixed-level factor: sex (male, female).

A factor has random levels if the levels of the factor implemented in the
experiment are randomly selected (i.e., they are a random sample) from a
large number of possibilities. Example: amount of rainfall.

Fixed model: each factor in the design is a fixed-level factor.
Random model: each factor in the design is a random-level factor.
Mixed model: a design in which some factors are fixed-level factors and
others are random-level factors. E.g., one factor is type of fertilizer and
another factor is amount of rainfall.

Expected Mean Square


Mean Square   Fixed                  Random                       Mixed: Col. Fixed, Row Random
MSBr          $\sigma^2 + V_r$       $\sigma^2 + V_r + V_{int}$   $\sigma^2 + V_r$
MSBc          $\sigma^2 + V_c$       $\sigma^2 + V_c + V_{int}$   $\sigma^2 + V_c + V_{int}$
MSIr,c        $\sigma^2 + V_{int}$   $\sigma^2 + V_{int}$         $\sigma^2 + V_{int}$
MSW           $\sigma^2$             $\sigma^2$                   $\sigma^2$

Calculation of Fcalc

                Fixed          Random          Mixed: Col. Fixed, Row Random
Row Factor      MSBr/MSW       MSBr/MSIr,c     MSBr/MSW
Column Factor   MSBc/MSW       MSBc/MSIr,c     MSBc/MSIr,c
Interaction     MSIr,c/MSW     MSIr,c/MSW      MSIr,c/MSW
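
The Fcalc table can be encoded as a small lookup, sketched below in Python. The names `F_DENOMINATOR` and `f_ratios`, the model labels, and the mean-square values in the usage lines are all hypothetical, introduced only to illustrate how the denominator changes with the model.

```python
# Which mean square serves as the denominator of Fcalc, per the table above.
F_DENOMINATOR = {
    "fixed": {"row": "MSW", "column": "MSW", "interaction": "MSW"},
    "random": {"row": "MSI", "column": "MSI", "interaction": "MSW"},
    "mixed_col_fixed_row_random": {"row": "MSW", "column": "MSI", "interaction": "MSW"},
}

def f_ratios(model, ms_row, ms_col, ms_int, ms_w):
    """Return Fcalc for rows, columns, and interaction under the given model."""
    ms = {"MSI": ms_int, "MSW": ms_w}
    denom = F_DENOMINATOR[model]
    return {
        "row": ms_row / ms[denom["row"]],
        "column": ms_col / ms[denom["column"]],
        "interaction": ms_int / ms[denom["interaction"]],
    }

# Made-up mean squares, for illustration only.
for model in F_DENOMINATOR:
    print(model, f_ratios(model, ms_row=0.14, ms_col=0.07, ms_int=0.02, ms_w=0.025))
```

In each case the denominator is the mean square whose expectation matches that of the numerator when the tested effect is zero.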

Example 4.4: Brand Name Appeal for Men and Women

Two hundred college students participated in an experiment that was
allegedly to test-market a new cigarette, to determine what consumers
thought of it and whether they would purchase it (the aim was to determine
whether the proposed brand name of a new cigarette affected its
attractiveness to potential customers).

Two brand names were selected: Frontiersman and April.

There were 4 separate groups of 50 people: 2 groups were all male and 2
groups were all female. 50 men and 50 women (2 groups) were told the brand
name was Frontiersman and were asked for their opinions about the
cigarette. Another 50 men and 50 women (2 groups) were told the brand name
was April and were asked for their opinions about the cigarette.

Two-factor experiment with R = 2, C = 2, and n = 50 (replicates per cell):

                         Sex
Brand Name      Male    Female   Both Sexes
Frontiersman    4.44    2.04     3.24
April           3.50    4.52     4.01
Both brands     3.97    3.28

The entries in the cells represent average "intent to purchase" (that is,
cell means) for each group on a seven-point scale, with 7 representing
nearly a certain purchase of the cigarette and 1 representing nearly zero
chance of purchase of the cigarette.
ANOVA Table
Source of Variability   SSQ      df    MS      Fcalc
Sex                     23.80      1   23.80    5.61
Brand Name              29.64      1   29.64    6.99
Interaction             146.2      1   146.2   34.48
Error                   831.0    196   4.24
Total                   1031     199

Model: fixed model. For α = 0.05 and df = (1, 196), c = 3.84, and all three
effects are significant. The interaction effect is dominant. The main
effect of brand name is -0.77 (= 3.24 - 4.01); it is +0.94 (= 4.44 - 3.50)
for males and -2.48 (= 2.04 - 4.52) for females. The difference in the
effects between the two sexes, 3.42 [= 0.94 - (-2.48)], jumps out.
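
The sums of squares for sex, brand name, and interaction can be recovered from the cell means alone, as the following Python sketch suggests. The error mean square (4.24) cannot be recovered from the cell means and is copied from the ANOVA table; the variable names are illustrative assumptions, and small differences from the table come from rounding of the published cell means.

```python
# Cell means: rows = brand name (Frontiersman, April), columns = sex (Male, Female).
cell_mean = [[4.44, 2.04],
             [3.50, 4.52]]
R, C, n = 2, 2, 50
ms_w = 4.24                      # error mean square, taken from the ANOVA table

row_mean = [sum(r) / C for r in cell_mean]                                   # brand means
col_mean = [sum(cell_mean[i][j] for i in range(R)) / R for j in range(C)]    # sex means
grand = sum(row_mean) / R

ss_brand = n * C * sum((m - grand) ** 2 for m in row_mean)
ss_sex = n * R * sum((m - grand) ** 2 for m in col_mean)
ss_int = n * sum((cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                 for i in range(R) for j in range(C))

# Each effect has 1 df, so its mean square equals its sum of squares.
f_sex, f_brand, f_int = ss_sex / ms_w, ss_brand / ms_w, ss_int / ms_w
print(round(ss_sex, 2), round(ss_brand, 2), round(ss_int, 2))
print(round(f_sex, 2), round(f_brand, 2), round(f_int, 2))
```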

4.4 Two Factors with No Replication and No Interaction

Without replication, error cannot be measured directly: error is measured
by considering more than one observation (replication) at the same
treatment combination.

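$$Y_{ijk} = \mu + \rho_i + \tau_j + I_{ij} + \varepsilon_{ijk}$$

$$\sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2 = nC \sum_i (\bar{Y}_{i..} - \bar{Y}_{...})^2 + nR \sum_j (\bar{Y}_{.j.} - \bar{Y}_{...})^2 + n \sum_i \sum_j (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2 + \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2$$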

TSS = SSBr + SSBc + SSIr,c + SSW

Without replication, n = 1 and the last term of both equations equals zero,
since, for every (i, j), $Y_{ijk} = \bar{Y}_{ij.}$; i.e., with no replication,
a cell mean equals the individual data value. Then we have:

$$\sum_i \sum_j (Y_{ij} - \bar{Y}_{..})^2 = C \sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 + R \sum_j (\bar{Y}_{.j} - \bar{Y}_{..})^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2$$

and TSS = SSBr + SSBc + SSIr,c
df: RC-1 = (R-1) + (C-1) + (R-1)(C-1)

Suppose there is no interaction, $V_{int} = 0$; then $E(MSI_{r,c}) = \sigma^2$,
and we can use SSIr,c and MSIr,c in the roles of SSW and MSW, respectively.
Therefore,
TSS = SSBr + SSBc + SSW

Example 4.5: An Unreplicated Numerical Example

Level of      Level of Factor A
Factor B      1     2     3
1             7     3     4
2            10     6     8
3             6     2     5
4             9     5     7

Source of variability   SSQ     df   MS      Fcalc
Row                     28.67    3   9.55    43
Column                  32.00    2   16.0    72
Error                    1.33    6   0.22
Total                   62.00   11

This is an unreplicated two-factor experiment, and we assume there is no
interaction between the two factors.

SSBr = 28.67   SSBc = 32   SSW (= "SSIr,c") = 1.33

If α = 0.01, c = 9.78 for df = (3, 6) and c = 10.93 for df = (2, 6). Both
the row and column factors are highly significant.
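
The ANOVA for this unreplicated example can be reproduced with the following Python sketch, in which the interaction sum of squares plays the role of the error term, as assumed above. The variable names are illustrative assumptions.

```python
# Example 4.5 data: rows = levels of factor B (1..4), columns = levels of factor A (1..3).
y = [[7, 3, 4],
     [10, 6, 8],
     [6, 2, 5],
     [9, 5, 7]]
R, C = len(y), len(y[0])

row_mean = [sum(r) / C for r in y]
col_mean = [sum(y[i][j] for i in range(R)) / R for j in range(C)]
grand = sum(sum(r) for r in y) / (R * C)

ssb_r = C * sum((m - grand) ** 2 for m in row_mean)
ssb_c = R * sum((m - grand) ** 2 for m in col_mean)
# With n = 1, the "interaction" sum of squares is used as the error term.
ss_error = sum((y[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
               for i in range(R) for j in range(C))
tss = sum((v - grand) ** 2 for r in y for v in r)

ms_error = ss_error / ((R - 1) * (C - 1))
f_row = (ssb_r / (R - 1)) / ms_error
f_col = (ssb_c / (C - 1)) / ms_error
print(round(ssb_r, 2), round(ssb_c, 2), round(ss_error, 2), round(tss, 2))
print(round(f_row, 1), round(f_col, 1))
```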

Suppose we were wrong in assuming that there is no interaction. What
happens to the conclusions? Are they all useless?

Fcalc was taken to estimate a ratio of numerator and denominator expected
mean squares of $(\sigma^2 + V_r)/\sigma^2$; however, with the interaction
not zero, we actually estimated $(\sigma^2 + V_r)/(\sigma^2 + V_{int})$.
Therefore Fcalc is, on average, smaller than it deserves to be.

If $H_0$ was rejected under the assumption of no interaction, the conclusion
stands: there is even less chance of rejecting $H_0$ if, indeed, $H_0$ is
true.

The consequences of having inappropriately assumed there was no interaction
are, on average, potentially harmful only for a factor that is judged to
have no effect.
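
The direction of the bias follows from a one-line inequality on the ratios of expected mean squares. The numerical values in the second line are made up purely for illustration.

```latex
% Since V_int >= 0, the ratio actually estimated is never larger than the intended one:
\frac{\sigma^2 + V_r}{\sigma^2 + V_{int}} \;\le\; \frac{\sigma^2 + V_r}{\sigma^2},
\qquad \text{with equality only when } V_{int} = 0.

% Hypothetical illustration: \sigma^2 = 1,\; V_r = 4,\; V_{int} = 1
\frac{1 + 4}{1 + 1} = 2.5 \;<\; \frac{1 + 4}{1} = 5.
```

So a wrongly assumed absence of interaction makes the test conservative: it can hide a real effect, but it does not make a rejection of $H_0$ less trustworthy.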

Using SPSS

4.5 Blocking

One reason for having a second factor in an experiment is the difficulty of
studying the levels of the prime factor under homogeneous conditions.
For example, suppose that we are studying worker absenteeism as a function
of the age of the worker, and have three levels of age: 25-30, 40-46,
and 55-60. However, we may be concerned that a worker's gender may also
affect his or her amount of absenteeism. Even though we are not
particularly concerned with this potential impact of gender, we want to
ensure that the gender factor does not pollute our conclusions about the
effect of age. For example, if the absenteeism data of one age group
happen to include a higher proportion of women than the absenteeism data
of the other age groups, the absentee-rate differences attributed to the
factor "age group" could include a gender effect.

One solution is to study one gender only, or to run two separate studies,
but these will not give an overall measure of the impact of age, and the
result for each gender would not have the reliability that a combined study
might have. In addition, there may be an interaction effect.

Therefore blocking is necessary. To block, a second factor (gender) is
introduced and a two-factor study is performed. Blocking on gender
eliminates gender as a nuisance factor by accounting for the effects of a
factor whose level may matter.

4.6 Friedman Nonparametric Test

Example 4.6: Analysis of Angioplasty Equipment


The experiment was intended to see if there is a difference in burst
pressure (i.e., the pressure at which the item bursts) as a function of
type of angioplasty unit, for various types of balloon dilation catheters.
Data for this unreplicated two-factor experiment:

Balloon Dilation     Angioplasty Unit Type
Catheter Type        A     B     C     D
1                    24    26    25    22
2                    27    27    26    24
3                    19    22    20    16
4                    24    27    25    23
5                    22    25    22    21
6                    26    27    24    24
7                    27    26    22    23
8                    25    27    24    21
9                    22    23    20    19

The Friedman test considers differences in levels only for the column
factor, here the type of angioplasty unit.

If we want to examine differences in the levels of balloon dilation
catheter, rewrite the table in 4 x 9 form, so that the column factor is the
level of balloon dilation catheter.

Hypotheses:

$H_0$: There are no differential effects among the different
angioplasty-unit types with respect to burst pressure.
$H_1$: There are differential effects among the different angioplasty-unit
types with respect to burst pressure.

Convert the data to ranks within each row: replace each data value by its
rank within its row (1 through 4, as there are 4 data values in a row). If
there is a tie, average the ranks.
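
The ranking rule can be sketched in Python as follows. The function name `average_ranks` is a hypothetical helper; it uses the fact that a block of tied values occupying ranks s+1 through s+e has average rank s + (e + 1)/2, where s is the number of smaller values and e the number of tied values.

```python
def average_ranks(row):
    """Rank the values in a row (1 = smallest), averaging the ranks of ties."""
    ranks = []
    for value in row:
        smaller = sum(1 for other in row if other < value)
        equal = sum(1 for other in row if other == value)   # includes value itself
        # Tied values occupy ranks smaller+1 .. smaller+equal; take their mean.
        ranks.append(smaller + (equal + 1) / 2)
    return ranks

print(average_ranks([24, 26, 25, 22]))   # catheter type 1: no ties
print(average_ranks([27, 27, 26, 24]))   # catheter type 2: tie for the top rank
print(average_ranks([26, 27, 24, 24]))   # catheter type 6: tie for the bottom rank
```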

Rank-Ordered Data for Friedman Test


Balloon Dilation     Angioplasty Unit Type
Catheter Type        A      B      C      D
1                    2      4      3      1
2                    3.5    3.5    2      1
3                    2      4      3      1
4                    2      4      3      1
5                    2.5    4      2.5    1
6                    3      4      1.5    1.5
7                    4      3      1      2
8                    3      4      2      1
9                    3      4      2      1
$R_{.j}$             25     34.5   20     10.5

$R_{.j}$: sum of the ranks for column j

Test statistic:
$$F_R = \frac{12}{RC(C+1)} \sum_{j=1}^{C} R_{.j}^2 - 3R(C+1)$$

$$= \frac{12}{(9)(4)(5)} \left( 25^2 + 34.5^2 + 20^2 + 10.5^2 \right) - 3(9)(5)$$

= 155.03 - 135 = 20.03.

Under the null hypothesis, $F_R$ is well approximated by a $\chi^2$
distribution with (C-1) df.

With α = 0.05 and df = 3, the critical value is 7.815.

Since $F_R$ = 20.03 > 7.815, we reject $H_0$. Conclusion: there are
differences among angioplasty-unit types with respect to burst pressure.
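
The whole procedure for Example 4.6 can be sketched in Python as follows. The helper `average_ranks` and the variable names are illustrative assumptions; the statistic is the formula given above, without any correction for ties, so software that applies a tie correction may report a somewhat different value. The critical value 7.815 is copied from the text.

```python
# Burst-pressure data: rows = catheter types 1..9, columns = unit types A..D.
burst = [
    [24, 26, 25, 22], [27, 27, 26, 24], [19, 22, 20, 16],
    [24, 27, 25, 23], [22, 25, 22, 21], [26, 27, 24, 24],
    [27, 26, 22, 23], [25, 27, 24, 21], [22, 23, 20, 19],
]

def average_ranks(row):
    """Rank values within a row (1 = smallest), averaging the ranks of ties."""
    return [sum(1 for w in row if w < v) + (sum(1 for w in row if w == v) + 1) / 2
            for v in row]

R, C = len(burst), len(burst[0])
ranks = [average_ranks(row) for row in burst]
rank_sums = [sum(ranks[i][j] for i in range(R)) for j in range(C)]   # R_.j

# Friedman statistic as given in the text (no tie correction).
f_r = 12 / (R * C * (C + 1)) * sum(s ** 2 for s in rank_sums) - 3 * R * (C + 1)

critical = 7.815   # chi-square, alpha = 0.05, df = C - 1 = 3 (from the text)
print("rank sums:", rank_sums)
print("F_R =", round(f_r, 2), "reject H0" if f_r > critical else "do not reject H0")
```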

Friedman Test Using SPSS (Pg 190)

Exercises:

Pg 193-196: Q2-5, 6-7, Q12-15, 17.
