
Midterm 2022 Sol


DO NOT OPEN THIS EXAM UNTIL INSTRUCTED TO

Name: Student #:

STAT 404 Midterm Exam


September–December, 2022
Instructor: Jiahua Chen
Total marks: 68.

• Put your name and student ID in the upper-right corner of every sheet.

• Correct answers are usually short. Answer questions in brief but complete sentences.
For example, if we ask you to calculate $SS_{trt}$, a satisfactory answer is:
The treatment sum of squares is given by

$$SS_{trt} = \sum_{i=1}^{k} n_i (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = 4 \times (5 - 3.2)^2 + 6 \times (2 - 3.2)^2 = 21.6.$$

An unsatisfactory answer is:


21.6.

• Use R for simple calculations such as the sample mean and sample variance (as in the
assignments). Answers obtained using one-line R functions will not be accepted.

• Save the R code you used in a .doc, .docx, .rtf, or .txt file. Include comments
describing which question each code block is used for. Leave sufficient space between the code
for different questions. Submit your code to Canvas when instructed to.

• Unless otherwise specified,

1. assume common notations and model assumptions;

2. use the conventional 5% significance level for tests, two-sided alternative hypotheses, and
a 95% confidence level.

1. [6] List the three principles of design of experiments we discussed in STAT 404.
Explain each principle in 1–2 complete sentences.

Answer.

(a) Randomization: assigning treatments to experimental units at random, which prevents
the effects of lurking factors from distorting the comparison (either part alone is fine).

(b) Replication: applying the same treatment to several experimental units, which improves
the precision of the estimated treatment effects (either part alone is fine).

(c) Blocking: grouping similar experimental units so that treatments are compared under
similar conditions, which removes the effect of a factor that is not of interest (either part alone is fine).
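The three principles above translate directly into an assignment plan. Below is a minimal sketch in Python rather than the course's R. The treatment labels, the number of replicates, the number of blocks, and the seed are all hypothetical. In the first function, replication means each label appears `reps` times and randomization means the labels are shuffled over the units. In the second, blocking means a separate shuffle inside each block.

```python
import random

def completely_randomized(treatments, reps, seed=None):
    """Replicate each treatment `reps` times, then shuffle the labels so that
    experimental unit u (0-based) receives treatment plan[u]."""
    rng = random.Random(seed)
    plan = [t for t in treatments for _ in range(reps)]  # replication
    rng.shuffle(plan)                                     # randomization
    return plan

def randomized_blocks(treatments, n_blocks, seed=None):
    """Randomized complete blocks: every treatment appears once per block,
    in an independently shuffled order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # independent randomization within each block
        layout.append(order)
    return layout

# Hypothetical example: 3 treatments, 4 replicates each; 3 blocks
plan = completely_randomized(["A", "B", "C"], reps=4, seed=404)
blocks = randomized_blocks(["A", "B", "C"], n_blocks=3, seed=404)
print(plan)
print(blocks)
```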

2. [8] The standard two-sample t-test is formulated under strict model assumptions.

(a) [4] Name two of the model assumptions. Describe each assumption in one
sentence.

Answer. Any two of the following (or other relevant assumptions) are acceptable:

• Independence: all observations are independent of each other.

• Normality: all observations have normal distributions.

• Identical means (under the null hypothesis): the two populations have the same mean.

• Identical variances: the two populations have the same variance.

• Identically distributed: all observations in the same sample have the same
distribution.

(b) [4] We recommend the Welch test when two populations have different variances.
Yet, we commented that this test is (1) mathematically invalid but (2) statistically
acceptable. Explain these two points.

Answer.

• The test is mathematically invalid because the test statistic for Welch's test
does not have an exact t-distribution under the null hypothesis.

• The test is statistically acceptable (and widely recommended) because the distribution
of Welch's test statistic is well approximated by a t-distribution with the recommended
(Welch–Satterthwaite) degrees of freedom, which leads to null rejection probabilities close to the nominal level.
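Point (2) rests on a specific approximation. The statistic divides the mean difference by $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, and the reference t-distribution uses the Welch–Satterthwaite degrees of freedom. Below is a minimal Python sketch of both quantities, assuming sample variances with divisor $n - 1$. The data in the example are hypothetical.

```python
import math
from statistics import mean, variance

def welch(x, y):
    """Return Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # estimated variance of mean(x); variance() uses n - 1
    v2 = variance(y) / n2  # estimated variance of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical samples with clearly different spreads
t, df = welch([1, 2, 3, 4], [2, 4, 6, 8, 10])
print(t, df)
```

The degrees of freedom are generally not an integer and always lie between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$. This is one way to see that the reference t-distribution is an approximation rather than an exact result.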

3. [20] A linear regression model assumes that the response values in a study can be
expressed as

$$y_i = x_i^\top \beta + \epsilon_i \quad \text{for } i = 1, 2, \ldots, n,$$

where $x_i^\top \beta$ is the expected value of $y_i$ and the $\epsilon_i$'s are iid $N(0, \sigma^2)$ random variables.

Use the R commands provided in the file “Midterm2022.txt” on the Canvas main
page to load the data. This file also provides a few lines of code to save time.

The dataset contains


• x1 : the assignment mark,
• x2 : the midterm mark, and
• y: the final exam mark
of n = 39 students in some course. Regard x1 and x2 as predictors and y as the
response variable.

Note: in this case, $x = (x_0 = 1, x_1, x_2)^\top$ and $\beta = (\beta_0, \beta_1, \beta_2)^\top$.

(a) [4] Obtain the least squares estimator β̂ of β.

Answer. The LSE of $\beta$ is

$$\hat\beta = (X^\top X)^{-1} X^\top y = (11.9091, 0.4910, 0.4028)^\top.$$
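In practice $\hat\beta$ is computed by solving the normal equations $(X^\top X)\beta = X^\top y$ rather than forming the inverse explicitly. Both routes give the same estimate. The sketch below does this in plain Python with Gaussian elimination. It uses a small hypothetical, noise-free dataset built from $y = 1 + 2x_1 + 3x_2$, so the fit should recover $(1, 2, 3)$. The course data from "Midterm2022.txt" are not reproduced here.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def lse(X, y):
    """Least squares estimate: solve the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
X = [[1, a, b] for a, b in zip(x1, x2)]  # first column is x0 = 1
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
beta = lse(X, y)
print(beta)  # approximately [1.0, 2.0, 3.0]
```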

(b) [4] Estimate the error variance $\sigma^2$ (use the method given in class).

Answer. The error variance is estimated as

$$\hat\sigma^2 = \sum_{i=1}^{39} (y_i - \hat y_i)^2 / (39 - 3) = 54.59.$$
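The divisor $n - p$ is here $39 - 3$, one degree of freedom lost per estimated coefficient. It is what makes $\hat\sigma^2$ unbiased for $\sigma^2$. Below is a minimal Python sketch; the responses and fitted values in the example are hypothetical.

```python
def sigma2_hat(y, fitted, p):
    """Unbiased error-variance estimate: residual sum of squares / (n - p)."""
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return rss / (len(y) - p)

# Hypothetical responses and fitted values from a model with p = 2 coefficients
est = sigma2_hat([3, 5, 7, 10], [3, 5, 8, 9], p=2)
print(est)  # RSS = 2, n - p = 2
```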

(c) [4] Estimate the variance matrix of β̂.

Answer. The variance matrix of $\hat\beta$ is estimated as

$$\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1} = \begin{pmatrix} 86.8395 & -0.7029 & -0.4398 \\ -0.7029 & 0.0131 & -0.0041 \\ -0.4398 & -0.0041 & 0.0104 \end{pmatrix}.$$
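The estimated covariance matrix is the inverse of $X^\top X$ scaled by $\hat\sigma^2$. Below is a minimal plain-Python sketch that inverts $X^\top X$ by Gauss–Jordan elimination. The design matrix in the example is a hypothetical simple-regression design with $x = 0, 1, 2, 3$. For that design, $X^\top X = \begin{pmatrix} 4 & 6 \\ 6 & 14 \end{pmatrix}$, whose inverse is easy to check by hand.

```python
def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # augment A with the identity matrix
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]  # scale pivot row so the pivot is 1
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]  # right half now holds A^{-1}

def cov_beta(X, sigma2):
    """Estimated covariance matrix of the LSE: sigma2 * (X^T X)^{-1}."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    return [[sigma2 * v for v in row] for row in inverse(XtX)]

# Hypothetical design: intercept plus x = 0, 1, 2, 3, with sigma2_hat = 2
V = cov_beta([[1, 0], [1, 1], [1, 2], [1, 3]], sigma2=2.0)
print(V)  # approximately [[1.4, -0.6], [-0.6, 0.4]]
```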

(d) [4] Estimate the variance of β̂2 − β̂1 (both LS estimators).

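Answer. The variance of $\hat\beta_2 - \hat\beta_1$ is estimated to be

$$\widehat{\mathrm{Var}}(\hat\beta_2 - \hat\beta_1) = \hat\sigma^2 \left\{ (X^\top X)^{-1}_{22} + (X^\top X)^{-1}_{33} - 2 (X^\top X)^{-1}_{23} \right\}$$

and, using the entries of the matrix in (c),

$$\widehat{\mathrm{Var}}(\hat\beta_2 - \hat\beta_1) = 0.0131 + 0.0104 - 2(-0.0041) = 0.0317.$$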
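The same number follows from the general rule $\mathrm{Var}(c^\top \hat\beta) = c^\top \, \mathrm{Var}(\hat\beta) \, c$ with $c = (0, -1, 1)^\top$. Below is a minimal Python sketch that applies this rule to the matrix reported in (c). The helper name `var_linear` is illustrative only.

```python
# Estimated covariance matrix of beta_hat, copied from part (c)
V = [[86.8395, -0.7029, -0.4398],
     [-0.7029,  0.0131, -0.0041],
     [-0.4398, -0.0041,  0.0104]]

def var_linear(c, V):
    """Variance of the linear combination c^T beta_hat: c^T V c."""
    k = len(c)
    return sum(c[i] * V[i][j] * c[j] for i in range(k) for j in range(k))

var_diff = var_linear([0, -1, 1], V)  # beta2_hat - beta1_hat
print(round(var_diff, 4))  # 0.0317
```

Flipping the sign of $c$ leaves the variance unchanged. So the same value also serves for $\hat\beta_1 - \hat\beta_2$ in part (e).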

(e) [4] Construct a two-sided, non-simultaneous 95% CI for $\beta_1 - \beta_2$ (the difference between the
regression coefficients for the assignment and midterm marks).
Hint: remember the general recipe for constructing CIs.

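Answer. The 95% CI for $\beta_1 - \beta_2$ is given by

$$(\hat\beta_1 - \hat\beta_2) \pm \texttt{qt}(0.975, 39 - 3) \sqrt{\widehat{\mathrm{Var}}(\hat\beta_1 - \hat\beta_2)} = 0.0882 \pm 0.361 = [-0.273, 0.449],$$

where $\widehat{\mathrm{Var}}(\hat\beta_1 - \hat\beta_2) = \widehat{\mathrm{Var}}(\hat\beta_2 - \hat\beta_1) = 0.0317$ from (d).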
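The "general recipe" is estimate $\pm$ quantile $\times$ standard error. Below is a minimal Python sketch that reproduces the interval above from the reported numbers. Python's standard library has no t quantile function. So the value of R's `qt(0.975, 36)` (about 2.028094) is hard-coded as an assumption.

```python
import math

def t_interval(estimate, variance, t_quantile):
    """Two-sided CI: estimate +/- t_quantile * sqrt(variance)."""
    half = t_quantile * math.sqrt(variance)
    return estimate - half, estimate + half

T_975_36 = 2.028094  # assumed value of R's qt(0.975, 36); not computed here
lo, hi = t_interval(0.4910 - 0.4028, 0.0317, T_975_36)
print(round(lo, 3), round(hi, 3))
```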

4. [26] Consider a hypothetical one-way layout comparing $k = 6$ treatments under
standard model assumptions and notations.

Use the R commands provided in the file “Midterm2022.txt” on the Canvas main
page to load the data. This file also provides a few lines of code to save time.

(a) [4] Compute the treatment sum of squares SStrt .

Answer. The mean responses of these treatments are

1.5675, 1.7450, 2.0150, 1.3800, 1.5475, 1.6375

and the grand mean is 1.64875. Since each treatment has 4 observations, we find

$$SS_{trt} = \sum_{i=1}^{6} 4 (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = 0.93.$$
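The formula needs only the treatment means and group sizes. Below is a minimal Python sketch, assuming the grand mean is the size-weighted average of the treatment means. It reproduces both the value above and the 21.6 from the instructions on the cover page.

```python
def ss_trt(means, sizes):
    """Treatment sum of squares: sum of n_i * (mean_i - grand_mean)^2."""
    N = sum(sizes)
    grand = sum(n * m for n, m in zip(sizes, means)) / N  # weighted grand mean
    return sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))

means = [1.5675, 1.7450, 2.0150, 1.3800, 1.5475, 1.6375]
print(round(ss_trt(means, [4] * 6), 4))  # 0.9304
print(round(ss_trt([5, 2], [4, 6]), 1))  # 21.6 (the example in the instructions)
```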

(b) [4] Compute the error (or residual) sum of squares SSerr .

Answer. Let $s_i^2$ be the sample variance for treatment $i$. Since each treatment has $n_i - 1 = 3$ degrees of freedom, we compute

$$SS_{err} = \sum_{i=1}^{6} 3 s_i^2 = 0.0822.$$
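For unequal group sizes, the factor 3 above generalizes to $n_i - 1$. Below is a minimal Python sketch. The two groups in the example are hypothetical, because the course data are not reproduced here.

```python
from statistics import variance

def ss_err(groups):
    """Within-treatment sum of squares: sum of (n_i - 1) * s_i^2."""
    return sum((len(g) - 1) * variance(g) for g in groups)

groups = [[1, 2, 3], [4, 6]]  # hypothetical data, not the course data
print(ss_err(groups))  # 2 * 1 + 1 * 2 = 4
```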

(c) [4] Complete the one-way layout ANOVA table. Not every cell needs to be filled.

Answer.

Source      DF   SS       MSS       F
Treatment    5   0.9304   0.1861    40.7367
Error       18   0.0822   0.00457
Total       23   1.0127
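Every remaining cell of the table follows from the two sums of squares, $k$, and $N$. Below is a minimal Python sketch. It is fed the rounded SS values shown above, so its F value (about 40.75) differs slightly from the table's 40.7367, which was computed from unrounded sums of squares.

```python
def anova_table(ss_trt, ss_err, k, N):
    """One-way ANOVA rows (DF, SS, MSS, F) built from the two sums of squares."""
    df_trt, df_err = k - 1, N - k
    ms_trt, ms_err = ss_trt / df_trt, ss_err / df_err
    return {
        "Treatment": (df_trt, ss_trt, ms_trt, ms_trt / ms_err),
        "Error": (df_err, ss_err, ms_err),
        "Total": (N - 1, ss_trt + ss_err),
    }

# Rounded SS values from the table; k = 6 treatments, N = 24 observations
table = anova_table(0.9304, 0.0822, k=6, N=24)
for source, row in table.items():
    print(source, row)
```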

(d) [4] Test the hypothesis that all treatment means are equal at the 10% level.
State the null and alternative hypotheses, the test statistic and its reference distri-
bution, and your conclusions.

Answer. The null hypothesis is that all treatment means are equal, i.e.,

$$H_0: \tau_1 = \cdots = \tau_6.$$

The alternative is that at least two of them are not equal, i.e.,

$$H_1: \tau_i \neq \tau_j \text{ for some } i \neq j.$$

The test statistic (and its value) is

$$F_{obs} = \frac{MSS_{trt}}{MSS_{err}} = \frac{0.1861}{0.00457} \approx 40.73.$$

The reference distribution is F with degrees of freedom (5, 18). The p-value of the
test is

$$p = 1 - \texttt{pf}(40.73, 5, 18) = 3.38 \times 10^{-9},$$

which is below the nominal level 0.1. We therefore reject the null hypothesis and conclude that
at least two treatment means are not equal.
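Python's standard library has no counterpart to R's `pf`. The F upper tail can instead be written through the regularized incomplete beta function: $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2, d_1/2)$. Below is a minimal sketch that evaluates it with the standard continued-fraction expansion (modified Lentz method). It is an illustrative substitute, not the routine R uses internally. Its accuracy is checked against the closed form $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, which holds when $d_1 = 2$.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)

def f_upper_tail(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2); plays the role of R's 1 - pf(f, d1, d2)."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

p = f_upper_tail(40.73, 5, 18)
print(p)  # of order 1e-9, far below 0.1
```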

(e) [4] Estimate the 6 treatment effects and the error variance (i.e., $\hat\tau_j$ and $\hat\sigma^2$). Use
complete sentences.

Answer. The estimated effects are $\hat\tau_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}$ ($i = 1, 2, \ldots, 6$). They follow from the
treatment means and the grand mean computed in (a):

−0.081, 0.096, 0.366, −0.269, −0.101, −0.011.

The estimated error variance is $s^2 = MSS_{err} = 0.00457$.
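The effect estimates are just deviations of the treatment means from the grand mean. Below is a minimal Python sketch using the means from (a). As a check, it confirms that the weighted effects sum to zero, which is the usual side constraint $\sum_i n_i \hat\tau_i = 0$.

```python
def effects(means, sizes):
    """Estimated treatment effects tau_i = mean_i - grand_mean (size-weighted grand mean)."""
    N = sum(sizes)
    grand = sum(n * m for n, m in zip(sizes, means)) / N
    return [m - grand for m in means]

tau = effects([1.5675, 1.7450, 2.0150, 1.3800, 1.5475, 1.6375], [4] * 6)
print([round(t, 5) for t in tau])
```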

(f) [6] Construct simultaneous 90% CIs for the mean differences using Tukey's
method. Pretend that you are computing all simultaneous CIs but show only the
first 3 (1 vs 2 ; 1 vs 3 ; 2 vs 3 ) in writing.

Answer. The mean differences are estimated as

(ȳ1 − ȳ2 , ȳ1 − ȳ3 , ȳ2 − ȳ3 ) = (−0.1775, −0.4475, −0.2700) .

The estimated error standard deviation is

$$s = \sqrt{MSS_{err}} = 0.0676.$$

Tukey’s 90% quantile is given by

q = qtukey(0.9, 6, 18) = 3.9836 .

We have

$$\frac{q s}{\sqrt{2}} \sqrt{\frac{1}{4} + \frac{1}{4}} = 0.1347.$$
Hence, the 90% simultaneous CIs are

1 vs 2 : (−0.312, −0.043) ,

1 vs 3 : (−0.582, −0.313) ,

2 vs 3 : (−0.405, −0.135) .
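The question asks to pretend that all pairs are computed. Below is a minimal Python sketch that builds all $\binom{6}{2} = 15$ intervals with the half-width $\frac{q s}{\sqrt 2}\sqrt{1/n_i + 1/n_j}$ used above. Python's standard library has no studentized range quantile. So $q = 3.9836$ is taken from the R call `qtukey(0.9, 6, 18)` reported above rather than computed.

```python
import math
from itertools import combinations

def tukey_intervals(means, sizes, ms_err, q):
    """Simultaneous CIs for all pairwise differences mean_i - mean_j.
    q is the studentized range quantile (assumed supplied, e.g. from R's qtukey)."""
    s = math.sqrt(ms_err)
    out = {}
    for i, j in combinations(range(len(means)), 2):
        half = q * s / math.sqrt(2) * math.sqrt(1 / sizes[i] + 1 / sizes[j])
        d = means[i] - means[j]
        out[(i + 1, j + 1)] = (d - half, d + half)  # 1-based treatment labels
    return out

means = [1.5675, 1.7450, 2.0150, 1.3800, 1.5475, 1.6375]
ci = tukey_intervals(means, [4] * 6, 0.00457, q=3.9836)
for pair in [(1, 2), (1, 3), (2, 3)]:
    print(pair, [round(v, 3) for v in ci[pair]])
```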

5. [8] The two-sample problem is a special case of the one-way layout. You may find
the following formulas helpful for this question:
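$$SS_{tot} = \sum_{j=1}^{n_1} (y_{1j} - \bar y_{\cdot\cdot})^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar y_{\cdot\cdot})^2,$$

$$SS_{trt} = n_1 (\bar y_{1\cdot} - \bar y_{\cdot\cdot})^2 + n_2 (\bar y_{2\cdot} - \bar y_{\cdot\cdot})^2,$$

$$SS_{err} = \sum_{j=1}^{n_1} (y_{1j} - \bar y_{1\cdot})^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar y_{2\cdot})^2.$$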

(a) [4] Suppose $\mu_1 \neq \mu_2$. Compute $E[(\bar y_{1\cdot} - \bar y_{2\cdot})^2]$.

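Answer. The difference $\bar y_{1\cdot} - \bar y_{2\cdot}$ has mean $\mu_1 - \mu_2$ and, by independence, variance $\sigma^2/n_1 + \sigma^2/n_2$. Using the relationship $E[X^2] = \mathrm{Var}(X) + E^2[X]$, we have

$$E[(\bar y_{1\cdot} - \bar y_{2\cdot})^2] = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} + (\mu_1 - \mu_2)^2.$$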
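A quick simulation can be used to sanity-check the formula; it does not replace the derivation. Below is a minimal Python sketch. The parameter values ($\mu_1 = 1$, $\mu_2 = 0$, $\sigma = 1$, $n_1 = 4$, $n_2 = 5$), the number of replications, and the seed are all hypothetical. For these values the formula gives $1/4 + 1/5 + 1 = 1.45$.

```python
import random
from statistics import fmean

def mc_mean_sq_diff(mu1, mu2, sigma, n1, n2, reps=50_000, seed=1):
    """Monte Carlo estimate of E[(ybar1 - ybar2)^2] for independent normal samples."""
    rng = random.Random(seed)
    sq = []
    for _ in range(reps):
        y1 = fmean(rng.gauss(mu1, sigma) for _ in range(n1))
        y2 = fmean(rng.gauss(mu2, sigma) for _ in range(n2))
        sq.append((y1 - y2) ** 2)
    return fmean(sq)

theory = 1.0 ** 2 / 4 + 1.0 ** 2 / 5 + (1.0 - 0.0) ** 2  # sigma^2/n1 + sigma^2/n2 + (mu1 - mu2)^2
sim = mc_mean_sq_diff(1.0, 0.0, 1.0, 4, 5)
print(theory, sim)
```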

(b) [4] Prove the formula for the decomposition of the sum of squares:

$$SS_{tot} = SS_{trt} + SS_{err}.$$

Remark: while this is not a bonus question, do not start this problem unless you
have extra time.

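Answer. Use the well-known fact $\sum_i x_i^2 = \sum_i (x_i - \bar x)^2 + n \bar x^2$. Apply it to $x_j = y_{1j} - \bar y_{\cdot\cdot}$, whose sample mean is $\bar y_{1\cdot} - \bar y_{\cdot\cdot}$. This gives

$$\sum_{j=1}^{n_1} (y_{1j} - \bar y_{\cdot\cdot})^2 = \sum_{j=1}^{n_1} \{(y_{1j} - \bar y_{1\cdot}) + (\bar y_{1\cdot} - \bar y_{\cdot\cdot})\}^2 = \sum_{j=1}^{n_1} (y_{1j} - \bar y_{1\cdot})^2 + n_1 (\bar y_{1\cdot} - \bar y_{\cdot\cdot})^2.$$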

For the same reason, we have

$$\sum_{j=1}^{n_2} (y_{2j} - \bar y_{\cdot\cdot})^2 = \sum_{j=1}^{n_2} (y_{2j} - \bar y_{2\cdot})^2 + n_2 (\bar y_{2\cdot} - \bar y_{\cdot\cdot})^2.$$

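Now add the two identities. The left-hand sides sum to $SS_{tot}$ and the right-hand sides sum to $SS_{err} + SS_{trt}$, which is the desired decomposition. This completes the proof.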
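A numerical check does not prove the identity, but it is a cheap way to catch algebra slips. Below is a minimal Python sketch that evaluates the three sums of squares exactly as defined in this question. The two samples are hypothetical.

```python
def decomposition(y1, y2):
    """Return (SS_tot, SS_trt, SS_err) for a two-sample layout."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    grand = (sum(y1) + sum(y2)) / (n1 + n2)
    ss_tot = sum((v - grand) ** 2 for v in y1 + y2)
    ss_trt = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ss_err = sum((v - m1) ** 2 for v in y1) + sum((v - m2) ** 2 for v in y2)
    return ss_tot, ss_trt, ss_err

# Hypothetical samples
tot, trt, err = decomposition([4.1, 5.3, 6.0], [2.2, 3.9, 3.1, 2.8])
print(tot, trt + err)
```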
