Lab 6 - Shell

This document describes a lab on performing ANOVA in R. The learning objectives are to learn how to perform ANOVA in R using both step-by-step methods and functions, and to perform investigations of the ANOVA model assumptions. The document contains exercises using a dataset of video game reviews to determine if different platforms have different average review scores, and using the iris dataset to determine if species have different average sepal lengths. The results of these analyses support rejecting the null hypotheses and concluding that platforms and species differ in their average scores/lengths.

Lab 6 - ANOVA 1

Mansi Kumari (7908159)

2023-03-03

Learning Objectives

By the end of this lab, you should have a grasp on the following concepts:

• How to perform ANOVA in R, both step-by-step and with an easy R function.


• How to perform a simple investigation of the model assumptions.

Instructions

To complete this worksheet, add code as needed into the R code chunks given below. Do not delete the
question text. All text should be in complete English sentences. Be sure to change the author of this file to
reflect your name and student number.
To properly see the questions, knit this .Rmd file to .pdf and view the output. You will have a link in your
email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it
to .pdf and upload your output to Crowdmark.

Exercises
Import the Games200 dataset. This dataset contains a random sample of 200 games released in 2019, along
with the metascore (average critic review), the userscore (average user review), and platform of release.

Games200 <- read.csv("~/Downloads/Games200.csv")

Our goal is to determine whether each video game platform receives the same metascore on average, or not,
based on this sample.
Make a boxplot comparing the metascores for each platform.

boxplot(Metascore ~ Platform, data = Games200)


[Figure: boxplot of Metascore (y-axis, roughly 50 to 90) by Platform (x-axis: PC, PlayStation 4, Switch, Xbox One)]

Use aggregate to calculate the mean of each group.

aggregate(Metascore ~ Platform, data = Games200, FUN = mean)

##        Platform Metascore
## 1            PC  74.63462
## 2 PlayStation 4  71.48889
## 3        Switch  72.24675
## 4      Xbox One  78.11538

Use aggregate to determine the sample size of each group.

aggregate(Metascore ~ Platform, data = Games200, FUN = length)

##        Platform Metascore
## 1            PC        52
## 2 PlayStation 4        45
## 3        Switch        77
## 4      Xbox One        26

Calculate the overall mean.

mean(Games200$Metascore)

## [1] 73.46
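
As a cross-check, the overall mean can be recovered from the group sizes and group means printed above, because it is the size-weighted average of the group means. The following is a minimal sketch that assumes the printed values are typed in by hand; the names `n`, `group_means`, `overall_mean`, and `unweighted_mean` are illustrative and not part of the lab.

```r
# Group sizes and group means copied from the aggregate() output above
n <- c(PC = 52, PS4 = 45, Switch = 77, XboxOne = 26)
group_means <- c(PC = 74.63462, PS4 = 71.48889, Switch = 72.24675, XboxOne = 78.11538)

# The overall mean weights each group mean by its sample size
overall_mean <- sum(n * group_means) / sum(n)
round(overall_mean, 2)

# A plain average of the four group means ignores the unequal group sizes
unweighted_mean <- mean(group_means)
round(unweighted_mean, 2)
```

The weighted version agrees with mean(Games200$Metascore), while the unweighted average does not, which is why the SSG formula measures deviations from the overall mean of all 200 observations.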

Calculate the SSG by hand, using your earlier calculations.

my.SSG <- 52*(74.63-73.46)^2 + 45*(71.48-73.46)^2 + 77*(72.25-73.46)^2 + 26*(78.12-73.46)^2


my.SSG

## [1] 924.9421
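
The hand calculation above plugs in group means shortened to two decimals, so its result carries some rounding error. The sketch below, which assumes the unrounded group means are copied from the aggregate() output, compares the two versions; `ssg_rounded` and `ssg_full` are illustrative names.

```r
# Group sizes and unrounded group means from the aggregate() output
n <- c(52, 45, 77, 26)
group_means <- c(74.63462, 71.48889, 72.24675, 78.11538)
overall_mean <- 73.46

# SSG exactly as typed in the hand calculation (two-decimal means)
ssg_rounded <- 52*(74.63 - 73.46)^2 + 45*(71.48 - 73.46)^2 +
  77*(72.25 - 73.46)^2 + 26*(78.12 - 73.46)^2

# SSG using the unrounded means: sum of n_i * (group mean - overall mean)^2
ssg_full <- sum(n * (group_means - overall_mean)^2)

c(rounded = ssg_rounded, full = ssg_full)
```

The two values differ by roughly 1.5, which is small relative to the sum of squares but is worth keeping in mind when comparing a hand calculation against software output.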

Calculate the MSG by hand, using your earlier calculations.

my.MSG <- my.SSG/(4 - 1)


my.MSG

## [1] 308.314

Use the aggregate function with var to find the sample variances, and then from there find the SSE.

aggregate(Metascore ~ Platform, FUN = var, data = Games200)

##        Platform Metascore
## 1            PC  58.78544
## 2 PlayStation 4  68.84646
## 3        Switch  57.42515
## 4      Xbox One  43.06615

my.SSE <- 51*58.79 + 44*68.85 + 76*57.43 + 25*43.07


my.SSE

## [1] 11469.12

Calculate the MSE by hand, using your earlier calculations.

my.MSE <- my.SSE/(200 - 4)


my.MSE

## [1] 58.51592
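
The SSE and MSE steps can also be written with vectors, which avoids retyping rounded variances. This is a minimal sketch that assumes the group sizes and sample variances are copied from the output above; `group_vars`, `sse`, and `mse` are illustrative names.

```r
# Group sizes and sample variances from the aggregate(..., FUN = var) output
n <- c(52, 45, 77, 26)
group_vars <- c(58.78544, 68.84646, 57.42515, 43.06615)

# SSE adds up (n_i - 1) * s_i^2 over the groups
sse <- sum((n - 1) * group_vars)

# MSE divides by the error degrees of freedom, N - k
mse <- sse / (sum(n) - length(n))

c(SSE = sse, MSE = mse)
```

Written this way, the MSE is visibly a pooled variance: a weighted average of the four group variances with weights n_i - 1.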

Calculate the F test statistic, using your earlier calculations.

my.F <- my.MSG/my.MSE
my.F

## [1] 5.268892

Use pf to find the P-value for this test.

1 - pf(my.F, df1 = 3, df2 = 196)

## [1] 0.001622573
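
An equivalent way to get the upper-tail area is the `lower.tail = FALSE` argument of pf, which avoids the explicit subtraction from 1. A short sketch, assuming the F statistic is typed in from the output above (`f_stat`, `p_subtract`, and `p_upper` are illustrative names):

```r
# F statistic from the by-hand calculation, with its degrees of freedom
f_stat <- 5.268892

# Upper-tail probability by subtraction, as in the lab
p_subtract <- 1 - pf(f_stat, df1 = 3, df2 = 196)

# The same probability requested directly from pf()
p_upper <- pf(f_stat, df1 = 3, df2 = 196, lower.tail = FALSE)

all.equal(p_subtract, p_upper)
```

For very small P-values the direct form is usually preferable, because subtracting a number close to 1 from 1 loses precision.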

What is your conclusion?


The p-value is 0.00162, which is below 0.05, so we reject the null hypothesis at the 5% level of significance. We
have sufficient evidence to conclude that not all platforms have the same mean metascore.
Repeat the earlier test, using the aov function.

my.aov <- aov(Metascore ~ Platform, data = Games200)

Use the summary function to print out the ANOVA results.

summary(my.aov)

##              Df Sum Sq Mean Sq F value  Pr(>F)
## Platform      3    923  307.80   5.261 0.00164 **
## Residuals   196  11468   58.51
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Create a histogram of the residuals of the ANOVA model

hist(my.aov$residuals)

[Figure: "Histogram of my.aov$residuals"; x-axis my.aov$residuals (about −20 to 20), y-axis Frequency (0 to 50)]

What does this tell you about your Normality assumption?

Use the aggregate function with sd to find the standard deviations of each group.

aggregate(Metascore ~ Platform, FUN = sd, data = Games200)

##        Platform Metascore
## 1            PC  7.667167
## 2 PlayStation 4  8.297377
## 3        Switch  7.577939
## 4      Xbox One  6.562481

What does this tell you about your equal-variances assumption?

Next we will do ANOVA on the iris dataset. Use the data function to load in this dataset.

data(iris)

This dataset contains the petal and sepal lengths and widths (in cm) for a sample of 150 iris flowers. They
are divided by their species: iris setosa, iris virginica, and iris versicolor.
We will do an analysis to determine if their sepal lengths differ significantly, on average.
Exercise: Write the hypotheses for this test in TeX

$H_0: \mu_{\text{Setosa}} = \mu_{\text{Virginica}} = \mu_{\text{Versicolor}}$ vs. $H_a$: Not all means are equal

Exercise: Use the aov function to conduct a hypothesis test at the 5% level of significance to
determine whether the mean sepal lengths are equal for all species.

my_aov <- aov(Sepal.Length ~ Species, data = iris)
summary(my_aov)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Species       2  63.21  31.606   119.3 <2e-16 ***
## Residuals   147  38.96   0.265
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
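
The same F statistic can be rebuilt step by step from group sizes, means, and variances, mirroring the by-hand calculation used for the Games200 data. The following is a minimal sketch on the built-in iris data; names such as `f_by_hand` and `f_from_aov` are illustrative, and the summary table is indexed by position because its row names are padded with spaces.

```r
data(iris)
y <- iris$Sepal.Length
g <- iris$Species

# Per-group sample sizes, means, and variances
n <- tapply(y, g, length)
group_means <- tapply(y, g, mean)
group_vars <- tapply(y, g, var)
k <- length(n)   # number of groups
N <- length(y)   # total sample size

# Between-group and within-group sums of squares
ssg <- sum(n * (group_means - mean(y))^2)
sse <- sum((n - 1) * group_vars)

# F statistic as MSG / MSE
f_by_hand <- (ssg / (k - 1)) / (sse / (N - k))

# F statistic taken from the aov() summary table (first row is the factor)
aov_table <- summary(aov(Sepal.Length ~ Species, data = iris))[[1]]
f_from_aov <- aov_table[["F value"]][1]

all.equal(f_by_hand, f_from_aov)
```

Agreement between the two values confirms that aov performs the same decomposition as the manual SSG and SSE steps.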

Exercise: Give a fully-worded conclusion to this test.


Because the p-value (< 2e-16) is below the 5% level of significance, we reject our null hypothesis. There is
sufficient evidence at the 5% level of significance to conclude that the mean sepal lengths are not equal for
all species.
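
The decision rule in this conclusion can also be checked in code by pulling the P-value out of the summary table and comparing it with the significance level. A minimal sketch, assuming a 5% level; `alpha`, `fit`, `p_value`, and `reject_null` are illustrative names.

```r
# Significance level chosen for the test
alpha <- 0.05

# Fit the one-way ANOVA on the built-in iris data
fit <- aov(Sepal.Length ~ Species, data = iris)

# P-value for the Species row of the ANOVA table
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

# TRUE means the null hypothesis of equal means is rejected
reject_null <- p_value < alpha
reject_null
```
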
Exercise: Check whether the ANOVA model assumptions appear to be accurate.

hist(my_aov$residuals)

[Figure: "Histogram of my_aov$residuals"; x-axis my_aov$residuals (about −2.0 to 1.5), y-axis Frequency (0 to 60)]

aggregate(Sepal.Length ~ Species, data = iris, FUN = sd)

##      Species Sepal.Length
## 1     setosa    0.3524897
## 2 versicolor    0.5161711
## 3  virginica    0.6358796

The residuals appear to have an approximately normal shape, and the largest standard deviation (0.636) is less
than twice the smallest (0.352), so the conditions of the test appear to be satisfied.
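
The "largest standard deviation less than twice the smallest" rule of thumb can be computed directly rather than read off the table. A minimal sketch on the built-in iris data; `group_sds` and `sd_ratio` are illustrative names, and the cutoff of 2 is the rule of thumb used in this lab rather than a formal test.

```r
# Standard deviation of sepal length within each species
group_sds <- tapply(iris$Sepal.Length, iris$Species, sd)

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- max(group_sds) / min(group_sds)
round(sd_ratio, 2)

# TRUE when the equal-variances rule of thumb is met
sd_ratio < 2
```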
