MATH08051: Statistics (Year 2) Worksheet 5 2018-2019
Worksheet 5 2018–2019
Hand in attempts to the assessed questions by 14.00 Monday 1st April. These should be
placed inside the collection cabinet situated outside room 5312 (The Maths Hub). Please
clearly mark your name and workshop number/time on your solution. If you are
handing in more than one page, make sure that your name is on every page and/or that
the pages are securely stapled together.
Question 1 will contribute to your final mark. The number of marks for each question is
given in square brackets [ ] at the end of each question (or part question). The best four
marks out of the five worksheets will be used for assessment. In total these will account
for 20% of the final mark for the course.
In the associated workshop you will have the opportunity to work on any question that
you had problems with and obtain help from your tutor. You should attempt the additional
(non-assessed) questions and/or identify where you need help before the workshop.
Assessed Questions
1. An experiment is conducted to assess differences in mean yields (tonnes per hectare) of
three varieties of turnip (A, B and C). Each variety was grown on eight plots. The data
are given by:
        Variety
   A      B      C
  1.28   1.61   1.54
  1.18   1.71   1.65
  1.64   1.50   1.81
  1.21   1.54   1.76
  1.30   1.41   1.65
  1.43   1.31   1.67
  1.51   1.76   1.78
  1.30   1.82   1.86
The farmer wishes to test whether or not the mean yields of the different turnip
varieties are the same.
(a) State the null and alternative hypotheses for the one-way ANOVA to be conducted,
clearly defining any notation that you use. [2]
(b) State the assumptions of this one-way ANOVA. [2]
(c) Conduct the appropriate analysis in R. Provide the R commands used and the
associated R output of the one-way ANOVA table. [2]
(d) Considering a 5% significance level, state any conclusions that you make. If you
reject the null hypothesis, clearly state what you can conclude with regard to any
differences in yield for the different turnip varieties. [4]
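One possible way to enter the data above into R is sketched below. The object names `yield` and `variety` are assumptions chosen for illustration, not names prescribed by the worksheet. The sketch only sets up the data and checks the layout; the analysis itself is left to part (c).

```r
# Yields (tonnes per hectare), entered column by column from the table:
# eight plots for variety A, then eight for B, then eight for C.
yield <- c(1.28, 1.18, 1.64, 1.21, 1.30, 1.43, 1.51, 1.30,   # variety A
           1.61, 1.71, 1.50, 1.54, 1.41, 1.31, 1.76, 1.82,   # variety B
           1.54, 1.65, 1.81, 1.76, 1.65, 1.67, 1.78, 1.86)   # variety C

# Variety labels stored as a factor so R treats them as groups, not numbers.
variety <- factor(rep(c("A", "B", "C"), each = 8))

# Sanity checks: 24 observations in total, eight per variety.
length(yield)
table(variety)

# Sample mean yield for each variety.
tapply(yield, variety, mean)
```

Storing the group labels as a factor up front avoids having to wrap them in `as.factor()` in every later model call.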
Additional Questions for Workshop
Response: data
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(treatment) 3 123.583 41.194 6.0285 0.01890 *
Residuals 8 54.667 6.833
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
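The entries of an ANOVA table such as the one above are linked by simple arithmetic, which can be checked in R. The sketch below uses only the printed (rounded) values, so its results agree with the table up to rounding; the variable names are illustrative assumptions.

```r
# Values read from the ANOVA table above (rounded as printed).
df_treat <- 3;  ss_treat <- 123.583
df_res   <- 8;  ss_res   <- 54.667

# Mean square = sum of squares / degrees of freedom.
ms_treat <- ss_treat / df_treat
ms_res   <- ss_res / df_res

# F statistic = treatment mean square / residual mean square.
f_value <- ms_treat / ms_res

# p-value = upper-tail probability of the F distribution
# with (df_treat, df_res) degrees of freedom.
p_value <- pf(f_value, df_treat, df_res, lower.tail = FALSE)

c(ms_treat = ms_treat, ms_res = ms_res, f_value = f_value, p_value = p_value)
```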
3. Which of the following statements is the most reasonable interpretation of the analysis
of the previous question?
(a) We can accept the null hypothesis that the treatments do not differ.
(b) There is no evidence to suggest that the treatments differ.
(c) There is evidence that the treatments differ.
(d) The treatments are not all the same.
4. The following R session analyses the ice breakup time for the Tanana River in Alaska.
Time is given in days beginning from 1st January, for each of the years 1917–2008. A
climate expert proposes a model for the ice breakup data. She suggests that there was a
step change in 1970. Consider the following analyses.
Call: lm(formula = days ~ as.factor(step))
Residuals:
Min 1Q Median 3Q Max
-15.6528 -3.7627 -0.4528 4.1623 15.2472
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 126.2528 0.7891 159.994 <2e-16 ***
as.factor(step)2 -3.0605 1.2120 -2.525 0.0133 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(a) Write down a mathematical representation of the model proposed by the climate
expert. Clearly define all the terms in your model.
(b) State how the estimates of the coefficients in the output relate to the parameters of
the model you specified in part (a).
(c) Calculate a 95% confidence interval for the difference in mean date of breakup before
and after 1970.
5. In an experiment to compare the effects on growth of piglets of three diets A, B and C, a
randomised block design was used. Four litters of three-month-old piglets were available,
and three piglets were selected from each litter to form a block. Diets A, B and C were
then randomly allocated to the piglets in each block. After six months of controlled diet,
the increase in weight was measured for each piglet. The results, in kg, are shown below.
             Litter
Diet     I    II   III    IV
A       89    78   114    79
B       68    59    85    61
C       62    61    83    82
Response: wtgains
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(diet) 2 1081.50 540.75 8.1418 0.01952 *
as.factor(litter) 3 1304.25 434.75 6.5458 0.02545 *
Residuals 6 398.50 66.42
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(a) Describe what analysis has been conducted and any conclusions that can be drawn.
In another experiment, the same three diets A, B and C were compared, each being
given to four piglets, selected at random without reference to which litter they came
from. Suppose the weights gained were exactly the same as those in the first experiment
(e.g. the four piglets on diet A gained 78, 79, 89 and 114 kg). The following analysis was
again conducted in R.
> anova(lm(wtgains~as.factor(diet)))
Analysis of Variance Table
Response: wtgains
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(diet) 2 1081.50 540.75 2.8582 0.1094
Residuals 9 1702.75 189.19
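Both ANOVA tables for the piglet data can be reproduced from the data table. The following is a minimal sketch, assuming the weight gains are entered diet by diet with the litters in the order I to IV. The vector names `diet` and `litter` and the object names `blocked` and `unblocked` are assumptions; only `wtgains` and the `anova(lm(...))` call appear in the output above.

```r
# Weight gains (kg), entered row by row from the table:
# diet A for litters I-IV, then diet B, then diet C.
wtgains <- c(89, 78, 114, 79,    # diet A
             68, 59,  85, 61,    # diet B
             62, 61,  83, 82)    # diet C
diet   <- rep(c("A", "B", "C"), each = 4)
litter <- rep(c("I", "II", "III", "IV"), times = 3)

# Randomised block analysis: diet is the treatment, litter is the block.
blocked <- anova(lm(wtgains ~ as.factor(diet) + as.factor(litter)))
blocked

# Analysis ignoring litter: the litter sum of squares is absorbed
# into the residual sum of squares.
unblocked <- anova(lm(wtgains ~ as.factor(diet)))
unblocked
```

The diet sum of squares is the same in both tables, while the residual sum of squares in the second table equals the litter plus residual sums of squares in the first (1304.25 + 398.50 = 1702.75).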
6. A farmer neighbouring the one in Q1 is also interested in the mean yield of turnips and
wishes to maximise it. The farmer conducts a two-way analysis of variance (with
replicates) to assess differences in mean yields (tonnes per hectare) of three varieties of
turnip, grown using some organic fertiliser. Each variety was grown on four plots in each
of two fields. The analyses, conducted in R, are shown below.
Response: yield
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(variety) ### 1.2058 ### ### 0.0003057 ***
as.factor(field) ### 0.9600 ### ### 0.0002431 ***
Residuals 20 0.9675 0.04838
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> qt(0.975,20)
[1] 2.085963
> mean(yield[variety==1])
[1] 5.4
> mean(yield[variety==2])
[1] 4.9375
> mean(yield[variety==3])
[1] 4.9125
(a) Six values have been replaced by ###. What are the missing values?
(b) Interpret the results of the analysis, as fully as possible, to the farmer (i.e. a
non-expert) with regard to any differences of the mean yields for the different varieties
of turnip.
7. (From May 2018: exam marked out of 100; marks per question given in [ ] brackets). The
effect of vitamin C on tooth growth in guinea pigs has been studied. The response variable is
the length of odontoblasts (cells responsible for tooth growth) of 36 guinea pigs. Each
animal received one of three dose levels of vitamin C (dose level 1 = 0.5, dose level 2 =
1, and dose level 3 = 2 mg/day) by one of two delivery methods (method 1 = orange
juice; method 2 = ascorbic acid). Each possible combination of dose and delivery method was
given to six different guinea pigs. The following analyses were conducted in R:
length <- c(4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 16.5, 16.5, 15.2, 17.3, 22.5,
17.3, 23.6, 18.5, 33.9, 25.5, 26.4, 32.5, 15.2, 21.5, 17.6,
9.7, 14.5, 10.0, 19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.5,
26.4, 22.4, 24.5, 24.8, 30.9)
dose <- c(rep(1,6), rep(2,6), rep(3,6),rep(1,6), rep(2,6), rep(3,6))
delivery <- c(rep(1,18),rep(2,18))
> model <- lm(length ~ as.factor(delivery)+as.factor(dose))
> anova(model)
(a) What is the name of the analysis being conducted? State the corresponding model
assumptions. [5]
(b) State the null and alternative hypotheses for the two tests that have been conducted
in R. [4]
(c) State your conclusions regarding whether different doses and/or different delivery
methods affect the tooth growth at the 5% significance level. Explain your
answer. [2]
(d) Calculate the least significant differences for the delivery methods and dose levels
at significance level α = 0.05, as appropriate. [4] The following R output may be of
use:
> qt(0.975, 1)
[1] 12.7062
> qt(0.975, 2)
[1] 4.302653
> qt(0.975, 31)
[1] 2.039513
> qt(0.975, 32)
[1] 2.036933
(e) Conclude which delivery methods and/or dose levels differ from each other in terms
of mean length of odontoblasts. [4] The following R output may be of use:
> c(mean(length[delivery==1]), mean(length[delivery==2]))
[1] 17.27222 21.17778
> c(mean(length[dose==1]), mean(length[dose==2]),
mean(length[dose==3]))
[1] 11.14167 20.29167 26.24167
(f) If there is evidence of a difference between the delivery methods and/or
doses with regard to tooth growth, state which delivery method and/or dose should
be recommended for quicker tooth growth. [1]