
Statistics for Management Assignment - 2: One-Way ANOVA Test



INTRODUCTION
Analysis of Variance (ANOVA) is a method that allows us to compare the mean
score of a continuous (or ordinal, with several scale points) variable across a
number of groups.
Why is the method called Analysis of Variance when what we want to compare is
means? Because ANOVA works by comparing the spread (or variance) of the
group means (called the between-groups sum of squares) with the spread (or
variance) of values within the groups (called the within-groups sum of squares).
If the variance of the group means is larger than you would predict from the
within-group variance alone, then it is likely that the means differ.

GENESIS OF ANOVA
Despite its name (Analysis of Variance), ANOVA is a test of whether multiple
means are equal or not. The procedure gets its name because it breaks up the
overall variance in the combined data into two parts: the variance within the
groups and the variance between the groups. If the variance between the groups
is significantly larger than the variance within the groups, then we conclude that
the groups have significantly different means.

How does ANOVA work? It is based on a few simple quantities. For group i with
observations Y_i1, ..., Y_in_i, the sample mean is the sum of the observations
divided by the sample size:

Y̅_i = (1/n_i) Σ_j Y_ij,   and the grand mean is Y̅ = (1/n) Σ_i Σ_j Y_ij.

ANOVA breaks up the total sum of squares (the sum of the squared deviations of
each Y_ij from the overall mean Y̅) into within-population and between-population
sums of squares as follows:

SS = Σ_i Σ_j (Y_ij − Y̅)²,   SSE = Σ_i Σ_j (Y_ij − Y̅_i)²,   SSM = Σ_i n_i (Y̅_i − Y̅)².

The Total Sum of Squares is abbreviated simply as SS. The Sum of Squares within
Populations is also known as Sum of Squares due to Error and is often
abbreviated SSE. Sum of Squares between Populations is variously known as
Sum of Squares due to the Model (SSM), Sum of Squares due to Treatment (SST),
and Sum of Squares for the Groups (SSG). The fundamental equation for ANOVA
then becomes SS = SSM + SSE.
The question of whether the means of the groups are different or not becomes a
question of whether the Sum of Squares due to Groups/Treatment is large
compared to the Sum of Squares due to Error. The appropriate test statistic is an
F-test as follows:

F = [SSM / (I − 1)] / [SSE / (n − I)]

The resulting F-statistic has I − 1 numerator degrees of freedom and n − I
denominator degrees of freedom, where I is the number of groups and n is the
total number of observations.
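The decomposition SS = SSM + SSE and the F statistic can be checked by hand. The sketch below uses a small made-up data set (three groups of three observations; all numbers are hypothetical, not from this document):

```python
import numpy as np

# Hypothetical data: three groups (I = 3), n = 9 observations in total.
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([6.0, 7.0, 8.0]),
          np.array([8.0, 9.0, 10.0])]

all_y = np.concatenate(groups)
grand_mean = all_y.mean()
n, I = len(all_y), len(groups)

# Total sum of squares: squared deviations of every observation from the grand mean.
SS = ((all_y - grand_mean) ** 2).sum()

# Within-group (error) sum of squares: deviations from each group's own mean.
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Between-group (model) sum of squares: group means around the grand mean,
# weighted by group size.
SSM = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# The decomposition SS = SSM + SSE holds exactly.
F = (SSM / (I - 1)) / (SSE / (n - I))
print(SS, SSM + SSE, F)   # 30.0 30.0 12.0
```

With these numbers the group means are 5, 7 and 9 around a grand mean of 7, giving SSM = 24, SSE = 6, and F = (24/2)/(6/6) = 12 on (2, 6) degrees of freedom.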

Assumptions for ANOVA: the data are normal, the variances within each population
are equal, and every observation is independent of every other observation. With
larger sample sizes, normality is not so critical. With a balanced design (equal
sample sizes in each group), equal variances are not so critical. For an
unbalanced design, equal variances are crucial. Independence is crucial, but it
should result from the experimental design.
A different way of thinking about ANOVA is in terms of models for the data.
ANOVA corresponds to a comparison of two models: a Full model, or Separate
Means model, where every population gets its own mean (Y_ij = μ_i + ε_ij), and
a Reduced model, or Equal Means model, where there is one common mean for
every population (Y_ij = μ + ε_ij).

In this formulation, we identify the Sum of Squares due to the Model (Treatment,
Group) with the Extra Sum of Squares that the reduced model has in its error as
compared to what the full model has in its error:

SSM = SSE(reduced) − SSE(full)

The model degrees of freedom (I − 1) are also the Extra degrees of freedom that
the reduced model has as compared to the full model:

df(model) = (n − 1) − (n − I) = I − 1

and this is also simply the number of parameters in the full model minus the
number of parameters in the reduced model.
Thinking in terms of models can be useful. The above generalizes, for instance,
to a model where all the means are equal except that the first mean can be
different.
For ANOVA, you should check normality and, when appropriate, apply a log
transform to the data to correct skewness. For a balanced design, ANOVA is
robust to violations of the equal-variance assumption. If the design is
unbalanced and the variances are unequal, you should not use ANOVA.
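A quick way to carry out this check is a normality test before and after the log transform. The sketch below uses the Shapiro-Wilk test on a hypothetical right-skewed sample (the data are simulated, not from this document):

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed sample (e.g. reaction times), simulated as lognormal.
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.8, size=40)

# Shapiro-Wilk test of normality: a small p value suggests non-normality.
_, p_raw = stats.shapiro(y)

# A log transform often corrects right skew; re-check normality afterwards.
_, p_log = stats.shapiro(np.log(y))
print(f"raw p = {p_raw:.4f}, log-transformed p = {p_log:.4f}")
```

For data like these, the raw sample typically fails the normality test while the log-transformed sample passes it, which is exactly the situation where a log transform is appropriate.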

COMPETING TOOLS

While the independent-sample t-test is limited to comparing the means of two
groups, the one-way ANOVA (Analysis of Variance) can compare more than two
groups. Therefore, the t-test is considered a special case of the one-way ANOVA.
These analyses do not, however, necessarily imply any causal relationship
between the left-hand and right-hand side variables. The F statistic of ANOVA
equals t squared (t²) when the numerator degrees of freedom is one. Whether the
data are balanced or not does not matter for either the t-test or the one-way
ANOVA. The table below compares the independent-sample t-test and the
one-way ANOVA.
                          Independent Sample t-test     One-way ANOVA
LHS (Dependent)           Interval or ratio variable    Interval or ratio variable
RHS (Independent)         Binary variable               Categorical variable
Null Hypothesis           mu1 = mu2                     mu1 = mu2 = mu3 ...
Probability Distribution  t distribution                F distribution
Sample Size               Balanced/unbalanced           Balanced/unbalanced

ONE WAY ANOVA PROCEDURE


STEP 1: BUILD THE DATA SET
The table below provides an example of a data set involving one factor with
three levels. The first column lists the case number, which is optional. The
second column has the independent variable (e.g., prime-target relation), which
has three levels (e.g., translation, related, and control). The next column lists
the dependent variable, which is reaction time (rt).

STEP 2: RUN THE PROCEDURE


a) Go to SPSS, select Analyze, then Compare Means, then One-Way ANOVA,
and you will see this dialogue box.

b) Move the dependent variable (in this example, rt) to the Dependent List
space, and the independent variable (e.g., relation) to the Factor space.
Then select Options.

c) In the Options Dialogue box, select Descriptive, and then click on


Continue, and then click on OK.

STEP 3: READ THE OUTPUT


You will get two tables showing the results of the analysis. The first one gives you
the descriptive results:

The second one gives you the ANOVA results. In this example, the F value is .313
and the p value is .736, which means there is no significant difference among the
three conditions.
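The same one-way ANOVA that SPSS runs here can be reproduced with scipy. The sketch below uses made-up reaction-time values for the three relation levels (the actual data set from the SPSS example is not reproduced in this document):

```python
from scipy import stats

# Hypothetical reaction-time (rt) data for the three levels of the
# prime-target relation factor; values are illustrative only.
translation = [512, 480, 534, 501, 525]
related     = [530, 498, 517, 541, 509]
control     = [522, 505, 538, 519, 531]

# One-way ANOVA: the statistic and p value play the same role as the
# F and Sig. columns of the SPSS ANOVA table.
result = stats.f_oneway(translation, related, control)
print(f"F = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

As in the SPSS output, a p value above .05 would mean no significant difference among the three conditions.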

EXAMPLE
In this example, we will test whether the response to the question "If you could
not be a psychology major, which of these majors would you choose? (Math,
English, Visual Arts, or History)" influences the person's GPA. We will follow
the standard steps for performing hypothesis tests:

1. Write the null hypothesis:
   H0: μMath = μEnglish = μVisual Arts = μHistory
   where μ represents the mean GPA.
2. Write the alternative hypothesis:
   H1: not H0
   (Remember that the alternative hypothesis must be mutually exclusive
   and exhaustive of the null hypothesis.)
3. Specify the α level: α = .05
4. Determine the statistical test to perform: In this case, GPA is
   approximately ratio scaled, and we have multiple (4) groups, so the
   between-subjects ANOVA is appropriate.
5. Calculate the appropriate statistic
Once you have recoded the independent variable, you are ready to perform the
ANOVA. Click on Analyze | Compare Means | One-Way ANOVA:

The One-Way ANOVA dialog box appears:

In the list at the left, click on the variable that corresponds to your dependent
variable (the one that was measured.) Move it into the Dependent List by clicking
on the upper arrow button. In this example, the GPA is the variable that we
recorded, so we click on it and the upper arrow button:

Now select the (quasi) independent variable from the list at the left and click on
it. Move it into the Factor box by clicking on the lower arrow button.

Click on the Post Hoc button to specify the type of multiple comparisons that you
would like to perform. The Post Hoc dialog box appears: Select Tukey

Click on the Continue Button to return to the One-Way ANOVA dialog box. Click
on the Options button in the One-Way ANOVA dialog box. The One-Way ANOVA
Options dialog box appears: Click in the check box to the left of Descriptive (to
get descriptive statistics), Homogeneity of Variance (to get a test of the
assumption of homogeneity of variance) and Means plot (to get a graph of the
means of the conditions.):

Click on the Continue button to return to the One-Way ANOVA dialog box. In the
One Way ANOVA dialog box, click on the OK button to perform the analysis of
variance. The SPSS output window will appear. The output consists of six major
sections. First, the descriptive section appears:

For each dependent variable (e.g. GPA), the descriptives output gives the sample
size, mean, standard deviation, minimum, maximum, standard error, and
confidence interval for each level of the (quasi) independent variable. In this
example, there were 7 people who responded that they would be a math major if
they could not be a psychology major, and their mean GPA was 3.144, with a
standard deviation of 0.496. There were 16 people who would be an English
major if they could not be a psychology major, and their mean GPA was 2.937
with a standard deviation of 0.5788.

The Test of Homogeneity of Variances output tests H0: σ²Math = σ²English =
σ²Art = σ²History. Equality of variances is an important assumption made by the
analysis of variance. To interpret this output, look at the column labelled Sig.
This is the p value. If the p value is less than or equal to your α level for this
test, then you can reject the H0 that the variances are equal. If the p value is
greater than the α level for this test, then we fail to reject H0, which
increases our confidence that the variances are equal and the homogeneity of
variance assumption has been met. Here the p value is .402. Because the p value
is greater than the α level, we fail to reject H0, implying that there is little
evidence that the variances are unequal and that the homogeneity of variance
assumption may be reasonably satisfied.
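The homogeneity-of-variance check can be sketched outside SPSS with Levene's test. The GPA values below are hypothetical stand-ins (the real data set from the example is not reproduced here), with group sizes matching the ones reported in the text:

```python
from scipy import stats

# Hypothetical GPA samples for the four "other major" groups;
# group sizes (7, 16, 15, 7 in the document) are only loosely imitated here.
math    = [3.1, 3.5, 2.9, 3.3, 3.0, 3.6, 2.6]
english = [2.8, 3.0, 2.5, 3.2, 2.9, 3.4, 2.7, 3.1]
art     = [3.2, 2.9, 3.0, 3.3, 2.8, 3.1]
history = [2.7, 3.0, 3.2, 2.9, 3.1]

# Levene's test of H0: equal variances across groups -- the same idea as
# SPSS's "Test of Homogeneity of Variances" output.
stat, p = stats.levene(math, english, art, history)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")
```

A p value above the α level (e.g. .05) means we fail to reject equal variances, so the homogeneity assumption looks reasonable, just as with the .402 reported in the example.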

The ANOVA output gives us the analysis of variance summary table. There are six
columns in the output:

Column

Description

Unlabeled
(Source of
variance)

The first column describes each row of the ANOVA summary table.
It tells us that the first row corresponds to the between-groups
estimate of variance (the estimate that measures the effect and
error). The between-groups estimate of variance forms the
numerator of the F ratio. The second row corresponds to the
within-groups estimate of variance (the estimate of error). The
within-groups estimate of variance forms the denominator of the F ratio.
The final row describes the total variability in the data.

Sum of
Squares

The Sum of squares column gives the sum of squares for each of
the estimates of variance. The sum of squares corresponds to the
numerator of the variance ratio.

df

The third column gives the degrees of freedom for each estimate of
variance.
The degrees of freedom for the between-groups estimate of
variance is given by the number of levels of the IV - 1. In this
example there are four levels of the quasi-IV, so there are 4 - 1 = 3
degrees of freedom for the between-groups estimate of variance.
The degrees of freedom for the within-groups estimate of variance
is calculated by subtracting one from the number of people in each
condition / category and summing across the conditions /
categories. In this example, there are 7 people in the Math
category, so that category has 7 - 1 = 6 degrees of freedom. There
are 16 people in the English category, so that category has 16 - 1 =
15 degrees of freedom. For Art, there are 15 people, so 15 - 1 = 14
degrees of freedom. For History, there are 7 people, so 7 - 1 = 6
degrees of freedom. Summing the dfs together, we find there are
6 + 15 + 14 + 6 = 41 degrees of freedom for the within-groups
estimate of variance. The final row gives the total degrees of
freedom, which is the total number of scores - 1. There are 45
scores, so there are 44 total degrees of freedom.

Mean
Square

The fourth column gives the estimates of variance (the mean
squares). Each mean square is calculated by dividing its sum of
squares by its degrees of freedom:
MS(between groups) = SS(between groups) / df(between groups)
MS(within groups) = SS(within groups) / df(within groups)

F

The fifth column gives the F ratio. It is calculated by dividing the
mean square between groups by the mean square within groups:
F = MS(between groups) / MS(within groups)

Sig.

The final column gives the significance of the F ratio. This is the p
value. If the p value is less than or equal to your α level, then you
can reject the H0 that all the means are equal. In this example, the
p value is .511, which is greater than the α level, so we fail to
reject H0. That is, there is insufficient evidence to claim that any
of the means differ from each other.

We would write the F ratio as: The one-way, between-subjects analysis of
variance failed to reveal a reliable effect of other major on GPA, F(3, 41) =
0.781, p = .511, MSerror = 0.292, α = .05.
The 3 is the between-groups degrees of freedom, 41 is the within-groups degrees
of freedom, 0.781 is the F ratio from the F column, .511 is the value in the Sig.
column (the p value), and 0.292 is the within-groups mean square estimate of
variance.
Decide whether to reject H0: If the p value associated with the F ratio is less
than or equal to the α level, then you can reject the null hypothesis that all
the means are equal. In this case, the p value equals .511, which is greater
than the α level (.05), so we fail to reject H0.
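The Sig. value SPSS reports is just the upper tail of the F distribution, so it can be recomputed from the F ratio and the two degrees of freedom alone. A sketch:

```python
from scipy import stats

# P(F(3, 41) > 0.781): the upper-tail probability of the F distribution,
# which is what the Sig. column of the ANOVA table reports.
p = stats.f.sf(0.781, dfn=3, dfd=41)
print(f"p = {p:.3f}")   # should be close to the reported Sig. of .511
```

This is a handy sanity check when reading any ANOVA table: F, the two dfs, and p are mutually redundant, so one can always be recovered from the others.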
When the F ratio is statistically significant, we need to look at the multiple
comparisons output. Even though our F ratio is not statistically significant, we
will look at the multiple comparisons to see how they are interpreted.

The Multiple Comparisons output gives the results of the Post-Hoc tests that you
requested. In this example, we requested Tukey multiple comparisons, so the
output reflects that choice.
The output includes a separate row for each level of the independent variable. In
this example, there are four rows corresponding to the four levels of the
quasi-IV. Let's consider the first row, the one with major equal to Art. There
are three sub-rows within this row. Each sub-row corresponds to one of the other
levels of the quasi-IV. Thus, there are three comparisons described in this row:
Comparison        H0                       H1
Art vs English    H0: μArt = μEnglish      H1: μArt ≠ μEnglish
Art vs History    H0: μArt = μHistory      H1: μArt ≠ μHistory
Art vs Math       H0: μArt = μMath         H1: μArt ≠ μMath

The second column in the output gives the difference between the means. In this
example, the difference between the mean GPA of the people who would be art
majors and those who would be English majors is 0.2532. The third column gives
the standard error of the mean difference. The fourth column is the p value for
the multiple comparison. In this example, the p value for comparing the GPAs of
people who would be art majors with those who would be English majors is .565,
meaning that it is unlikely that these means are different (as you would expect,
given that the difference, 0.2532, is small). If the p value is less than or
equal to the α level, then you can reject the corresponding H0. In this example,
the p value is .565, which is larger than the α level of .05, so we fail to
reject the H0 that the mean GPA of the people who would be art majors equals the
mean GPA of the people who would be English majors. The final two columns give
the 95% confidence interval for the mean difference.
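Tukey's HSD comparisons like the ones in this output can also be sketched outside SPSS. The GPA values below are hypothetical (the real data set is not reproduced here), and `stats.tukey_hsd` requires a recent SciPy version:

```python
from scipy import stats

# Hypothetical GPA samples for the four "other major" groups;
# values are illustrative only.
math    = [3.1, 3.5, 2.9, 3.3, 3.0, 3.6, 2.6]
english = [2.8, 3.0, 2.5, 3.2, 2.9, 3.4, 2.7, 3.1]
art     = [3.2, 2.9, 3.0, 3.3, 2.8, 3.1]
history = [2.7, 3.0, 3.2, 2.9, 3.1]

# Tukey's HSD: every pairwise comparison of group means, with p values
# adjusted for the number of comparisons, as in SPSS's Post Hoc output.
res = stats.tukey_hsd(math, english, art, history)
print(res)   # mean differences, adjusted p values, and 95% CIs per pair
```

Each pair of groups gets a mean difference, an adjusted p value, and a 95% confidence interval, matching the columns of the SPSS Multiple Comparisons table.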
