Lab 6 - Shell
2023-03-03
Learning Objectives
By the end of this lab, you should have a grasp of the following concepts:
Instructions
To complete this worksheet, add code as needed into the R code chunks given below. Do not delete the
question text. All text should be in complete English sentences. Be sure to change the author of this file to
reflect your name and student number.
To properly see the questions, knit this .Rmd file to .pdf and view the output. You will have a link in your
email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it
to .pdf and upload your output to Crowdmark.
Exercises
Import the Games200 dataset. This dataset contains a random sample of 200 games released in 2019, along
with each game's metascore (average critic review score), userscore (average user review score), and platform
of release. Our goal is to use this sample to determine whether the mean metascore is the same for every
video game platform.
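In symbols, the hypotheses for this comparison can be written as below, where the four $\mu$ values (subscript names chosen here for illustration) are the true mean metascores of 2019 games on each platform:

```latex
\begin{aligned}
H_0 &: \mu_{\mathrm{PC}} = \mu_{\mathrm{PS4}} = \mu_{\mathrm{Switch}} = \mu_{\mathrm{Xbox}} \\
H_a &: \text{not all of the four platform means are equal}
\end{aligned}
```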
Make a boxplot comparing the metascores for each platform.
[Figure: boxplots of metascore for each platform; x-axis: Platform]
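A side-by-side boxplot takes a formula of the form `response ~ group`. Games200 is not bundled with R, so this minimal sketch uses the built-in iris data as a stand-in; the axis labels are illustrative choices, and the analogous lab call is shown in a comment.

```r
data(iris)

# One box per group: numeric response on the left of ~, grouping factor on the right.
# For the lab data the analogous call would be:
#   boxplot(Metascore ~ Platform, data = Games200)
boxplot(Sepal.Length ~ Species, data = iris,
        xlab = "Species", ylab = "Sepal length (cm)")
```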
## Platform Metascore
## 1 PC 74.63462
## 2 PlayStation 4 71.48889
## 3 Switch 72.24675
## 4 Xbox One 78.11538
aggregate(Metascore ~ Platform, data = Games200, FUN = length)
## Platform Metascore
## 1 PC 52
## 2 PlayStation 4 45
## 3 Switch 77
## 4 Xbox One 26
mean(Games200$Metascore)
## [1] 73.46
## [1] 924.9421
## [1] 308.314
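The between-group quantities follow from the group counts and means: $SSG = \sum_i n_i(\bar{y}_i - \bar{y})^2$ and $MSG = SSG/(k-1)$. Because Games200 is not bundled with R, this sketch runs the same `aggregate` steps on the built-in iris data, with `Sepal.Length` standing in for `Metascore` and `Species` for `Platform`; the variable names are illustrative, not the lab's.

```r
data(iris)

# Group sizes and group means via aggregate(), as in the lab
n_i    <- aggregate(Sepal.Length ~ Species, data = iris, FUN = length)$Sepal.Length
mean_i <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)$Sepal.Length
grand_mean <- mean(iris$Sepal.Length)

k   <- length(n_i)                          # number of groups
SSG <- sum(n_i * (mean_i - grand_mean)^2)   # between-group sum of squares
MSG <- SSG / (k - 1)                        # mean square for groups, df = k - 1
c(SSG = SSG, MSG = MSG)
```

Replacing the formula and data with `Metascore ~ Platform, data = Games200` applies the same steps to the lab data.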
Use the aggregate function with var to find the sample variances, and then use them to find the SSE.
## Platform Metascore
## 1 PC 58.78544
## 2 PlayStation 4 68.84646
## 3 Switch 57.42515
## 4 Xbox One 43.06615
## [1] 11469.12
## [1] 58.51592
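Each group contributes $(n_i - 1)s_i^2$ to the within-group variation, so $SSE = \sum_i (n_i - 1)s_i^2$ and $MSE = SSE/(N - k)$. A minimal sketch of that step, again using iris as an assumed stand-in for Games200:

```r
data(iris)

# Group sizes and sample variances via aggregate()
n_i   <- aggregate(Sepal.Length ~ Species, data = iris, FUN = length)$Sepal.Length
var_i <- aggregate(Sepal.Length ~ Species, data = iris, FUN = var)$Sepal.Length

N   <- sum(n_i)                  # total sample size
k   <- length(n_i)               # number of groups
SSE <- sum((n_i - 1) * var_i)    # within-group (error) sum of squares
MSE <- SSE / (N - k)             # mean square error, df = N - k
c(SSE = SSE, MSE = MSE)
```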
my.F <- my.MSG/my.MSE
my.F
## [1] 5.268892
## [1] 0.001622573
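The p-value is the area to the right of the observed F statistic under an F distribution with $k - 1$ and $N - k$ degrees of freedom. This sketch plugs in the MSG and MSE printed above, with $k = 4$ platforms and $N = 200$ games taken from the group counts; it assumes the p-value above was obtained from this upper tail, which the output does not show explicitly.

```r
# MSG and MSE as printed above for Games200; k = 4 platforms, N = 200 games
my.MSG <- 308.314
my.MSE <- 58.51592
k <- 4
N <- 200

my.F <- my.MSG / my.MSE
# p-value: upper-tail area of F(k - 1, N - k) beyond the observed F
p_value <- pf(my.F, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
p_value
p_value < 0.05   # TRUE means H0 is rejected at the 5% level
```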
summary(my.aov)
hist(my.aov$residuals)
[Figure: Histogram of my.aov$residuals; x-axis: my.aov$residuals, y-axis: Frequency]
Use the aggregate function with sd to find the standard deviations of each group.
## Platform Metascore
## 1 PC 7.667167
## 2 PlayStation 4 8.297377
## 3 Switch 7.577939
## 4 Xbox One 6.562481
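A common rule of thumb for the equal-variance condition is that the largest group standard deviation should be less than twice the smallest. A short sketch of that check, using the standard deviations printed above (the vector name `sds` is illustrative):

```r
# Group standard deviations of Metascore, as printed above for Games200
sds <- c(PC = 7.667167, `PlayStation 4` = 8.297377,
         Switch = 7.577939, `Xbox One` = 6.562481)

ratio <- max(sds) / min(sds)   # largest SD divided by smallest SD
ratio
ratio < 2   # TRUE suggests the equal-variance condition is reasonable
```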
Next we will do ANOVA on the iris dataset. Use the data function to load this dataset.
data(iris)
This dataset contains the petal and sepal lengths and widths (in cm) for a sample of 150 iris flowers. The
flowers are grouped by species: Iris setosa, Iris virginica, and Iris versicolor.
We will determine whether their mean sepal lengths differ significantly across species.
Exercise: Write the hypotheses for this test in TeX.
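One way to write these hypotheses in TeX, using $\mu$ with a species subscript (notation assumed here) for the true mean sepal length of each species:

```latex
\begin{aligned}
H_0 &: \mu_{\text{setosa}} = \mu_{\text{versicolor}} = \mu_{\text{virginica}} \\
H_a &: \text{not all of } \mu_{\text{setosa}},\ \mu_{\text{versicolor}},\ \mu_{\text{virginica}} \text{ are equal}
\end{aligned}
```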
Exercise: Use the aov function to conduct a hypothesis test at the 5% level of significance to
determine whether the mean sepal lengths are equal for all species.
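A minimal sketch of this test, assuming sepal length is the response and species the factor. The object name `my_aov` is chosen to match the histogram code that follows, and pulling the p-value out with `[["Pr(>F)"]]` is one convenient option, not necessarily the lab's intended method.

```r
data(iris)

# One-way ANOVA: sepal length as the response, species as the factor
my_aov <- aov(Sepal.Length ~ Species, data = iris)
summary(my_aov)

# Extract the p-value from the first row (Species) of the ANOVA table
p_value <- summary(my_aov)[[1]][["Pr(>F)"]][1]
p_value < 0.05   # TRUE means reject H0 at the 5% level
```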
hist(my_aov$residuals)
[Figure: Histogram of my_aov$residuals; x-axis: my_aov$residuals, y-axis: Frequency]
## Species Sepal.Length
## 1 setosa 0.3524897
## 2 versicolor 0.5161711
## 3 virginica 0.6358796
The residuals appear approximately normally distributed, and no group's standard deviation is more than
twice another's (the largest, 0.636, is about 1.8 times the smallest, 0.352), so the conditions for the test
appear to be satisfied.