Midterm 2022 Sol
Name: Student #:
• Put your name and student ID in the upper-right corner of every sheet.
• Correct answers are usually short. Answer questions in brief but complete sentences.
For example, if we ask: Calculate $SS_{trt}$, a satisfactory answer is:
The sum of squares of the treatment is given by
$$SS_{trt} = \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = 4 \times (5 - 3.2)^2 + 6 \times (2 - 3.2)^2 = 21.6.$$
• Use R for simple calculations such as the sample mean and sample variance (as in the
assignments). Answers obtained using one-line R functions will not be accepted.
• Save the R code you used in a .doc, .docx, .rtf, or .txt file. Include comments
describing which question the code block is used for. Leave sufficient space between code
for different questions. Submit your code to Canvas when instructed to.
• Use the conventional 5% level for tests, two-sided alternatives for hypotheses, and the
95% confidence level.
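The sample answer in the instructions above can be reproduced step by step in R. This is only a minimal sketch: the variable names `n_i`, `ybar_i`, `ybar`, and `ss_trt` are chosen here for illustration, and the grand mean is assumed to be the size-weighted average of the group means, which agrees with the value 3.2 used in the example.

```r
# Group sizes and group means taken from the sample answer above.
n_i    <- c(4, 6)
ybar_i <- c(5, 2)

# Grand mean, assumed to be the size-weighted average of the group means.
ybar <- sum(n_i * ybar_i) / sum(n_i)

# Treatment sum of squares: sum over groups of n_i * (group mean - grand mean)^2.
ss_trt <- sum(n_i * (ybar_i - ybar)^2)
print(c(grand_mean = ybar, SS_trt = ss_trt))
```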
1. [6] List the three principles of design of experiments we discussed in STAT 404.
Explain each principle in 1–2 complete sentences.
Answer.
(b) Replication: repeating the same treatment on several experimental units, which
improves the precision of the estimated treatment effects.
(c) Blocking: grouping similar experimental units so that different treatments are
compared under similar conditions, which removes the effect of a factor that is not of interest.
2. [8] The standard two-sample t-test is formulated under strict model assumptions.
(a) [4] Name two of the model assumptions. Describe each assumption in one
sentence.
Answer. Any two of the following (or other relevant assumptions) are acceptable:
• Identically distributed: all observations in the same sample have the same
distribution.
(b) [4] We recommend the Welch test when two populations have different variances.
Yet, we commented that this test is (1) mathematically invalid but (2) statistically
acceptable. Explain these two points.
Answer.
• The test is mathematically invalid because the test statistic for Welch’s test
does not have a t-distribution.
• The test is statistically acceptable (and widely recommended) because the
distribution of Welch's test statistic is well approximated by the recommended t-distribution,
which leads to null rejection probabilities close to the nominal level.
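To make the approximation concrete, the sketch below computes Welch's statistic and the approximating degrees of freedom by hand. It assumes the usual Welch-Satterthwaite formula for the degrees of freedom; the data vectors `x` and `y` are hypothetical values made up for illustration and are not part of the exam.

```r
# Hypothetical samples, made up for illustration only.
x <- c(5.1, 4.8, 6.0, 5.5, 4.9)
y <- c(6.2, 7.9, 5.4, 8.8, 7.1, 6.5, 9.0)

n1 <- length(x)
n2 <- length(y)

# Estimated variances of the two sample means (no pooling).
v1 <- var(x) / n1
v2 <- var(y) / n2

# Welch's statistic: difference in means over its estimated standard error.
t_obs <- (mean(x) - mean(y)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom for the approximating t-distribution.
df_w <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value based on the approximate reference distribution.
p_val <- 2 * pt(-abs(t_obs), df = df_w)
print(c(t = t_obs, df = df_w, p = p_val))
```

The degrees of freedom are generally not an integer, which reflects that the t-distribution is used only as an approximation to the null distribution.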
3. [20] A linear regression model assumes that the response values in a study can be
expressed as
$$y_i = x_i^\top \beta + \epsilon_i \quad \text{for } i = 1, 2, \ldots, n,$$
where $x_i^\top \beta$ is the expected value of $y_i$ and the $\epsilon_i$'s are iid $N(0, \sigma^2)$ random variables.
Use the R commands provided in the file “Midterm2022.txt” on the Canvas main
page to load the data. This file also provides a few lines of code to save time.
(b) [4] Estimate the error variance $\sigma^2$ (use the method given in class).
(c) [4] Estimate the variance matrix of β̂.
= 0.0317 .
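The computations asked for in parts (b) and (c) can be sketched with matrix operations. The sketch assumes that the method given in class is the usual unbiased estimator, the residual sum of squares divided by n − p, and that the variance matrix of β̂ is estimated by s²(XᵀX)⁻¹. The data below are hypothetical and made up for illustration; the exam data come from "Midterm2022.txt", which is not reproduced here.

```r
# Hypothetical data, made up for illustration only.
x1 <- c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
y  <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.1)

# Design matrix with an intercept column.
X <- cbind(1, x1)
n <- nrow(X)
p <- ncol(X)

# Least squares estimate: (X'X)^{-1} X'y.
XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y

# Error variance estimate: residual sum of squares over n - p (assumed method).
resid <- y - X %*% beta_hat
s2    <- sum(resid^2) / (n - p)

# Estimated variance matrix of beta_hat.
var_beta <- s2 * XtX_inv
print(var_beta)
```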
Use the R commands provided in the file “Midterm2022.txt” on the Canvas main
page to load the data. This file also provides a few lines of code to save time.
and the grand mean is 1.64875. We find
$$SS_{trt} = \sum_{i=1}^{6} \left[ 4 (\bar{y}_i - \bar{y})^2 \right] = 0.93.$$
(b) [4] Compute the error (or residual) sum of squares SSerr .
(c) [4] Complete the one-way layout ANOVA table. Not every cell needs to be filled.
Answer.
Source      DF   SS       MSS       F
Treatment    5   0.9304   0.1861    40.7367
Error       18   0.0822   0.00457
Total       23   1.0127
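The entries of the table follow from the two sums of squares and their degrees of freedom. The sketch below rebuilds the table from the rounded values printed above; the variable names are chosen here for illustration. Because the inputs are rounded to four decimals, the last digits of the F ratio and of the total can differ slightly from the table.

```r
# Sums of squares and degrees of freedom as reported in the table above.
ss_trt <- 0.9304
df_trt <- 5
ss_err <- 0.0822
df_err <- 18

# Mean squares and the F ratio.
ms_trt <- ss_trt / df_trt
ms_err <- ss_err / df_err
f_obs  <- ms_trt / ms_err

# Assemble the one-way ANOVA table; cells that are not needed are left as NA.
anova_tab <- data.frame(
  Source = c("Treatment", "Error", "Total"),
  DF     = c(df_trt, df_err, df_trt + df_err),
  SS     = c(ss_trt, ss_err, ss_trt + ss_err),
  MSS    = c(ms_trt, ms_err, NA),
  F      = c(f_obs, NA, NA)
)
print(anova_tab)
```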
(d) [4] Test the hypothesis that all treatment means are equal at the 10% level.
State the null and alternative hypotheses, the test statistic and its reference distri-
bution, and your conclusions.
Answer. The null hypothesis is that all treatment means are equal, i.e.,
$$H_0 : \tau_1 = \cdots = \tau_6.$$
The alternative is that at least two of them are not equal, i.e.,
$$H_1 : \tau_i \neq \tau_j \text{ for some } i \neq j.$$
The test statistic is
$$F_{obs} = \frac{MSS_{trt}}{MSS_{err}} = \frac{0.1861}{0.0046} = 40.73.$$
The reference distribution is F with degrees of freedom (5, 18). The p-value of the
test is
$$p = 1 - \mathrm{pf}(40.73, 5, 18) = 3.38 \times 10^{-9},$$
which is below the nominal level 0.1. We reject the null hypothesis and conclude that
at least two means are not equal.
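The decision can be checked in R in two equivalent ways: by comparing the p-value with the level, or by comparing the observed statistic with the critical value. This is a minimal sketch that takes the observed value 40.73 from the answer above; the variable names are chosen here for illustration.

```r
# Observed F statistic and test level taken from the answer above.
f_obs <- 40.73
alpha <- 0.10

# Upper-tail probability of the F(5, 18) reference distribution.
p_val <- pf(f_obs, df1 = 5, df2 = 18, lower.tail = FALSE)

# Equivalent check through the critical value of F(5, 18) at level alpha.
f_crit <- qf(1 - alpha, df1 = 5, df2 = 18)

reject <- p_val < alpha
print(c(p_value = p_val, critical_value = f_crit))
print(reject)
```

Using `lower.tail = FALSE` avoids the subtraction `1 - pf(...)`, which can lose precision when the p-value is very small.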
(e) [4] Estimate the 6 treatment effects and the error variance (i.e., $\hat{\tau}_j$ and $\hat{\sigma}^2$). Use
complete sentences.
Answer. The estimated effects were calculated in a previous question and are
$(\hat{\tau}_i : i = 1, 2, \ldots, 6)$
$s^2 = MSS_{err} = 0.00457$.
(f ) [6] Construct simultaneous 90% CIs for the mean differences using Tukey’s
method. Pretend that you are computing all simultaneous CIs but show only the
first 3 (1 vs 2 ; 1 vs 3 ; 2 vs 3 ) in writing.
We have
$$\frac{q s}{\sqrt{2}} \sqrt{\frac{1}{4} + \frac{1}{4}} = 0.1347.$$
Hence, the 90% simultaneous CIs are
1 vs 2 : (−0.312, −0.043) ,
1 vs 3 : (−0.582, −0.313) ,
2 vs 3 : (−0.405, −0.135) .
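The half-width 0.1347 used in these intervals can be reproduced with `qtukey`. The sketch assumes that q is the 90% quantile of the studentized range distribution with 6 means and 18 error degrees of freedom, that s² is the error mean square 0.00457, and that each treatment has 4 replicates, as the sum of squares formula and the total of 23 degrees of freedom suggest.

```r
# Error variance estimate and design sizes taken from the answers above.
s2     <- 0.00457
k      <- 6    # number of treatments
df_err <- 18   # error degrees of freedom
n_per  <- 4    # replicates per treatment (assumed equal)

# 90% quantile of the studentized range distribution (assumed meaning of q).
q <- qtukey(0.90, nmeans = k, df = df_err)

# Half-width of each simultaneous CI: (q / sqrt(2)) * s * sqrt(1/n + 1/n).
half_width <- q / sqrt(2) * sqrt(s2) * sqrt(1 / n_per + 1 / n_per)
print(c(q = q, half_width = half_width))
```

Each interval is then the observed difference of the two treatment means plus or minus `half_width`.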
5. [8] The two-sample problem is a special case of the one-way layout. You may find
the following formulas helpful for this question:
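$$SS_{tot} = \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{\cdot\cdot})^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_{\cdot\cdot})^2,$$
$$E(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})^2 = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} + (\mu_1 - \mu_2)^2.$$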
(b) [4] Prove the formula for the decomposition of the sum of squares:
Remark: while this is not a bonus question, do not start this problem unless you
have extra time.
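$$\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{\cdot\cdot})^2 = \sum_{j=1}^{n_1} \{(y_{1j} - \bar{y}_{1\cdot}) + (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})\}^2 = \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot})^2 + n_1 (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})^2,$$
where the cross term vanishes because $\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot}) = 0$.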
The same identity holds for the second sample. Summing the two identities gives the desired decomposition. This completes the proof.
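A quick numerical check can accompany the proof. The sketch below assumes the identity intended in part (b) is the standard one-way decomposition, total sum of squares equals error sum of squares plus treatment sum of squares, specialized to two samples. The data vectors `y1` and `y2` are hypothetical and made up for illustration.

```r
# Hypothetical two-sample data, made up for illustration only.
y1 <- c(4.1, 5.3, 4.8, 5.0)
y2 <- c(6.2, 5.9, 7.1, 6.6, 6.0)

n1 <- length(y1)
n2 <- length(y2)
ybar1 <- mean(y1)
ybar2 <- mean(y2)
ybar  <- mean(c(y1, y2))   # grand mean over both samples

# Total sum of squares around the grand mean.
ss_tot <- sum((y1 - ybar)^2) + sum((y2 - ybar)^2)

# Within-sample (error) and between-sample (treatment) sums of squares.
ss_err <- sum((y1 - ybar1)^2) + sum((y2 - ybar2)^2)
ss_trt <- n1 * (ybar1 - ybar)^2 + n2 * (ybar2 - ybar)^2

# The two sides agree up to floating-point error.
print(all.equal(ss_tot, ss_err + ss_trt))
```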