Statistics for Management Assignment - 2: One-Way ANOVA Test
INTRODUCTION
Analysis of Variance (ANOVA) is a method that allows us to compare the mean
score of a continuous (or ordinal with several scale points) variable between a
number of groups.
Why is the method called Analysis of Variance when we want to compare the
means? Because ANOVA works by comparing the spread (or variance) of the
group means (called the between-groups sum of squares) with the spread
(or variance) of values within the groups (called the within-groups sum of
squares). If the variance of the group means is larger than you would predict
from the within-group variance alone, then it is likely that the means differ.
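The comparison described above can be sketched directly. The sketch below uses made-up scores in three hypothetical groups and contrasts the spread of the group means with the typical spread inside a group:

```python
import numpy as np

# Hypothetical scores in three groups (made-up numbers for illustration)
groups = [
    np.array([5.1, 4.8, 5.4, 5.0]),
    np.array([6.9, 7.2, 7.1, 6.7]),
    np.array([5.3, 5.0, 4.9, 5.2]),
]

# Spread (variance) of the group means
group_means = np.array([g.mean() for g in groups])
spread_of_means = group_means.var(ddof=1)

# Typical spread (variance) of values inside a group
spread_within = np.mean([g.var(ddof=1) for g in groups])

print(f"variance of group means:   {spread_of_means:.4f}")
print(f"average within-group var:  {spread_within:.4f}")
```

If the means spread out much more than the within-group noise would predict, as they do for this made-up data, the group means likely differ.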
GENESIS OF ANOVA
Despite its name (Analysis of Variance), ANOVA is a test of whether multiple
means are equal or not. The procedure gets its name because it breaks up the
overall variance in the combined data into two parts: the variance within the
groups and the variance between the groups. If the variance between the groups
is significantly larger than the variance within the groups, then we conclude that
the groups have significantly different means.
The sample mean of group i is the sum of its observations divided by its sample size:

    Ybar_i = (sum_j Y_ij) / n_i

In more detail:
ANOVA breaks up the total sum of squares (the sum of the squared deviations of
each Y_ij from the overall mean Ybar) into within-population and between-population
sums of squares as follows:

    sum_i sum_j (Y_ij - Ybar)^2 = sum_i sum_j (Y_ij - Ybar_i)^2 + sum_i n_i (Ybar_i - Ybar)^2
            (total SS)               (within populations)           (between populations)
The Total Sum of Squares is abbreviated simply as SS. The Sum of Squares within
Populations is also known as Sum of Squares due to Error and is often
abbreviated SSE. Sum of Squares between Populations is variously known as
Sum of Squares due to the Model (SSM), Sum of Squares due to Treatment (SST),
and Sum of Squares for the Groups (SSG). The fundamental equation for ANOVA
then becomes SS = SSM + SSE.
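The identity SS = SSM + SSE can be verified numerically; here is a minimal sketch using made-up data for three groups:

```python
import numpy as np

# Made-up observations in three groups (illustrative only)
groups = [np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0]),
          np.array([1.0, 2.0, 3.0])]

all_y = np.concatenate(groups)
grand = all_y.mean()

ss_total = ((all_y - grand) ** 2).sum()                       # SS (total)
ssm = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # between groups
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)        # within groups (error)

print(f"SS = {ss_total:.1f}, SSM = {ssm:.1f}, SSE = {sse:.1f}")
# The two components add back up to the total sum of squares
```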
The question of whether the means of the groups are different or not becomes a
question of whether the Sum of Squares due to Groups/Treatment is large
compared to the Sum of Squares due to Error. The appropriate test statistic is an
F-test as follows:

    F = MSM / MSE = [SSM / (I - 1)] / [SSE / (n - I)]

where I is the number of groups and n is the total sample size. The resulting
F-statistic has I - 1 numerator degrees of freedom and n - I denominator
degrees of freedom.
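The F-test above is available in SciPy as `scipy.stats.f_oneway`, which returns the F statistic and its p value under the F(I-1, n-I) distribution. A minimal sketch with made-up data:

```python
from scipy import stats

# Made-up data for three groups (illustrative only)
g1 = [2.0, 3.0, 4.0]
g2 = [5.0, 6.0, 7.0]
g3 = [1.0, 2.0, 3.0]

# One-way ANOVA: I = 3 groups, n = 9 observations,
# so the F statistic has 2 and 6 degrees of freedom
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

For this data, F = SSM/(I-1) divided by SSE/(n-I) = (26/2)/(6/6) = 13, and the small p value suggests the group means differ.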
Assumptions for ANOVA: the data are normal, the variances within each population
are equal, and every observation is independent of every other observation. With
larger sample sizes, normality is not so critical. With a balanced design (equal
sample sizes in each group), equal variances are not so critical. For an
unbalanced design, equal variances are crucial. Independence is crucial, but it
should result from the experimental design.
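The first two assumptions can be checked with standard tests; a sketch using SciPy's Shapiro-Wilk test for normality and Levene's test for equal variances, on made-up data:

```python
from scipy import stats

# Made-up data for three groups (illustrative only)
groups = [[2.0, 3.0, 4.0, 3.5],
          [5.0, 6.0, 7.0, 5.5],
          [1.0, 2.0, 3.0, 2.5]]

# Shapiro-Wilk within each group: H0 is that the data are normal
for i, g in enumerate(groups, start=1):
    w, p_norm = stats.shapiro(g)
    print(f"group {i}: Shapiro-Wilk p = {p_norm:.3f}")

# Levene's test across groups: H0 is that all variances are equal
lev_stat, p_var = stats.levene(*groups)
print(f"Levene p = {p_var:.3f}")
```

Large p values are consistent with the assumptions holding; independence, as noted above, must come from the design and cannot be tested this way.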
A different way of thinking about ANOVA is to think about models for the data.
ANOVA corresponds to a comparison of two models for the data: a full model or
Separate Means model, where every population gets its own mean, and a reduced
model or Equal Means model, where there is one common mean for all populations.
These models can be represented mathematically as

    Full (separate means):   Y_ij = mu_i + e_ij
    Reduced (equal means):   Y_ij = mu + e_ij
In this formulation, we identify the Sum of Squares due to the Model (Treatment,
Group) with the Extra Sum of Squares that the reduced model has in its error as
compared to what the full model has in its error:

    SSM = SSE_reduced - SSE_full

The model degrees of freedom (I - 1) are also the extra degrees of freedom that
the reduced model has as compared to the full model:

    df_extra = (n - 1) - (n - I) = I - 1
This is also simply the number of parameters in the full model minus the number
of parameters in the reduced model.
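The model-comparison view can be sketched directly: fit the equal-means (reduced) model and the separate-means (full) model, and take SSM as the extra error the reduced model carries (made-up data for illustration):

```python
import numpy as np

# Made-up observations in three groups (illustrative only)
groups = [np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0]),
          np.array([1.0, 2.0, 3.0])]
all_y = np.concatenate(groups)

# Reduced (equal means) model: one common mean for everyone
sse_reduced = ((all_y - all_y.mean()) ** 2).sum()

# Full (separate means) model: each group gets its own mean
sse_full = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Extra sum of squares = SSM; extra df = (n - 1) - (n - I) = I - 1
ssm = sse_reduced - sse_full
df_extra = (len(all_y) - 1) - (len(all_y) - len(groups))

print(f"SSM = {ssm:.1f}, extra df = {df_extra}")
```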
Thinking in terms of models can be useful. The above generalizes, for instance, to
a model

    Y_1j = mu_1 + e_1j,    Y_ij = mu + e_ij  for i = 2, ..., I,

where all the means are equal except that the first mean can be different.
For ANOVA, you should check normality and, when appropriate, apply a log
transform to all the data to correct skewness. For a balanced design, ANOVA is
robust to violations of the assumption of equal variances. If the design is
unbalanced and the variances are unequal, you should not use ANOVA.
COMPETING TOOLS
While the independent sample t-test is limited to comparing the means of two
groups, the one-way ANOVA (Analysis of Variance) can compare more than two
groups. Therefore, the t-test is considered a special case of the one-way ANOVA.
These analyses do not, however, necessarily imply any causal relationship
between the left-hand and right-hand side variables. The F statistic of ANOVA
equals the squared t statistic (F = t^2) when the numerator degrees of freedom
is one, that is, when only two groups are compared. Whether the data are
balanced or not does not matter in the t-test and one-way ANOVA. The table
below compares the independent sample t-test and one-way ANOVA.
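The F = t^2 relationship for two groups can be checked numerically; a sketch with made-up data, using SciPy's pooled-variance t-test and one-way ANOVA:

```python
from scipy import stats

# Two made-up groups (illustrative only)
a = [2.0, 3.0, 4.0, 5.0]
b = [6.0, 7.0, 8.0, 9.0]

t_stat, _ = stats.ttest_ind(a, b)   # pooled-variance two-sample t-test
f_stat, _ = stats.f_oneway(a, b)    # one-way ANOVA with two groups

print(f"t^2 = {t_stat ** 2:.4f}")
print(f"F   = {f_stat:.4f}")
# The two values agree: the t-test is a special case of one-way ANOVA
```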
                             Independent Sample T-test   One-way ANOVA
LHS (Dependent)              Continuous variable         Continuous variable
RHS (Independent)            Binary variable             Categorical variable
Null Hypothesis              mu1 = mu2                   mu1 = mu2 = mu3 = ...
Probability Distribution     T distribution              F distribution
Sample Size                  Balanced/unbalanced         Balanced/unbalanced
b) Move the dependent variable (in this example, rt) to the Dependent List
box, and the independent variable (e.g., relation) to the Factor box.
Then select Options.
The second table gives the ANOVA results. In this example, the F value is .313
and the p value is .736, which means there is no significant difference among
the three conditions.
EXAMPLE
In this example, we will test whether the response to the question "If you could
not be a psychology major, which of these majors would you choose? (Math,
English, Visual Arts, or History)" influences the person's GPA. We will follow the
standard steps for performing hypothesis tests:
In the list at the left, click on the variable that corresponds to your dependent
variable (the one that was measured.) Move it into the Dependent List by clicking
on the upper arrow button. In this example, the GPA is the variable that we
recorded, so we click on it and the upper arrow button:
Now select the (quasi) independent variable from the list at the left and click on
it. Move it into the Factor box by clicking on the lower arrow button.
Click on the Post Hoc button to specify the type of multiple comparisons that you
would like to perform. The Post Hoc dialog box appears: Select Tukey
Click on the Continue Button to return to the One-Way ANOVA dialog box. Click
on the Options button in the One-Way ANOVA dialog box. The One-Way ANOVA
Options dialog box appears: click the check boxes to the left of Descriptive (to
get descriptive statistics), Homogeneity of Variance (to get a test of the
assumption of homogeneity of variance), and Means plot (to get a graph of the
means of the conditions):
Click on the Continue button to return to the One-Way ANOVA dialog box. In the
One Way ANOVA dialog box, click on the OK button to perform the analysis of
variance. The SPSS output window will appear. The output consists of six major
sections. First, the descriptive section appears:
For each dependent variable (e.g. GPA), the descriptives output gives the sample
size, mean, standard deviation, minimum, maximum, standard error, and
confidence interval for each level of the (quasi) independent variable. In this
example, there were 7 people who responded that they would be a math major if
they could not be a psychology major, and their mean GPA was 3.144, with a
standard deviation of 0.496. There were 16 people who would be an English
major if they could not be a psychology major, and their mean GPA was 2.937
with a standard deviation of 0.5788.
The Test of Homogeneity of Variances output gives Levene's statistic and its p
value. If the p value is less than or equal to your α level for this test, then you
can reject the H0 that the variances are equal. If the p value is greater than the
α level for this test, then we fail to reject H0, which increases our confidence that
the variances are equal and that the homogeneity of variance assumption has been
met. Here the p value is .402. Because the p value is greater than the α level, we
fail to reject H0, implying that there is little evidence that the variances are not
equal and that the homogeneity of variance assumption is reasonably satisfied.
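The same homogeneity-of-variance check can be run outside SPSS; `scipy.stats.levene` tests H0 that all group variances are equal. The GPA-like numbers below are hypothetical stand-ins, not the actual data from this example:

```python
from scipy import stats

# Hypothetical GPA-like data for three majors (illustrative only)
math_gpa    = [3.1, 3.6, 2.8, 3.4, 3.0, 3.5, 2.9]
english_gpa = [2.9, 3.2, 2.5, 3.1, 2.8, 3.0, 2.7]
art_gpa     = [3.3, 2.9, 3.1, 2.6, 3.4]

# Levene's test: H0 is that the group variances are equal
stat, p = stats.levene(math_gpa, english_gpa, art_gpa)
print(f"Levene statistic = {stat:.3f}, p = {p:.3f}")
# A p value greater than the alpha level (e.g. 0.05) gives no evidence
# that the variances differ, so the assumption is reasonably satisfied
```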
The ANOVA output gives us the analysis of variance summary table. There are six
columns in the output:

Column: Unlabeled (Source of variance)
Description: The first column describes each row of the ANOVA summary table.
It tells us that the first row corresponds to the between-groups estimate of
variance (the estimate that measures the effect and error). The between-groups
estimate of variance forms the numerator of the F ratio. The second row
corresponds to the within-groups estimate of variance (the estimate of error).
The within-groups estimate of variance forms the denominator of the F ratio.
The final row describes the total variability in the data.

Column: Sum of Squares
Description: The Sum of Squares column gives the sum of squares for each of
the estimates of variance. The sum of squares corresponds to the numerator of
the variance ratio.

Column: df
Description: The third column gives the degrees of freedom for each estimate of
variance. The degrees of freedom for the between-groups estimate of variance is
given by the number of levels of the IV minus 1. In this example there are four
levels of the quasi-IV, so there are 4 - 1 = 3 degrees of freedom for the
between-groups estimate of variance. The degrees of freedom for the
within-groups estimate of variance is calculated by subtracting one from the
number of people in each condition/category and summing across the
conditions/categories. In this example, there are 7 people in the Math category,
so that category has 7 - 1 = 6 degrees of freedom. There are 16 people in the
English category, so that category has 16 - 1 = 15 degrees of freedom.

Column: Mean Square
Description: The Mean Square column gives the mean square for each estimate
of variance: the sum of squares divided by its degrees of freedom.

Column: F
Description: The F column gives the F ratio: the between-groups mean square
divided by the within-groups mean square.

Column: Sig.
Description: The final column gives the significance of the F ratio. This is the p
value. If the p value is less than or equal to your α level, then you can reject H0
that all the means are equal. In this example, the p value is .511, which is
greater than the α level, so we fail to reject H0. That is, there is insufficient
evidence to claim that some of the means may be different from each other.
The Multiple Comparisons output gives the results of the Post-Hoc tests that you
requested. In this example, we requested Tukey multiple comparisons, so the
output reflects that choice.
The output includes a separate row for each level of the independent variable. In
this example, there are four rows corresponding to the four levels of the quasi-IV.
Let's consider the first row, the one with major equal to Art. There are three
sub-rows within this row. Each sub-row corresponds to one of the other levels of
the quasi-IV. Thus, there are three comparisons described in this row:

Comparison        H0                        H1
Art vs English    mu_Art = mu_English       mu_Art ≠ mu_English
Art vs History    mu_Art = mu_History       mu_Art ≠ mu_History
Art vs Math       mu_Art = mu_Math          mu_Art ≠ mu_Math
The second column in the output gives the difference between the means. In this
example, the difference between the GPA of the people who would be art majors
and those who would be English majors is 0.2532. The third column gives the
standard error of the mean difference. The fourth column is the p value for the
multiple comparisons. In this example, the p value for comparing the GPAs of
people who would be art majors with those who would be English majors is
0.565, meaning that it is unlikely that these means are different (as you would
expect, given that the difference (0.2532) is small). If the p value is less than or
equal to the α level, then you can reject the corresponding H0. In this example,
the p value is .565, which is larger than the α level of .05, so we fail to reject H0
that the mean GPA of the people who would be art majors equals the mean GPA
of the people who would be English majors. The final two columns give the 95%
confidence interval.
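Tukey multiple comparisons like SPSS's Post Hoc output can also be produced with statsmodels' `pairwise_tukeyhsd`, which reports the mean differences, adjusted p values, and 95% confidence intervals for every pair of groups. The GPA numbers and group sizes below are hypothetical, not the actual data from this example:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical GPA data with a major label for each person (illustrative only)
gpa = np.array([3.1, 3.6, 2.8, 3.4,    # Math
                2.9, 3.2, 2.5, 3.1,    # English
                3.3, 2.9, 3.1, 2.6])   # Art
major = ["Math"] * 4 + ["English"] * 4 + ["Art"] * 4

# Tukey HSD: one comparison per pair of groups, at alpha = 0.05
result = pairwise_tukeyhsd(gpa, major, alpha=0.05)
print(result)  # mean diff, p-adj, 95% CI, and reject decision per pair
```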