Solutions Chapter6
Chapter 6
In order to analyze the effect of reducing nitrate loading in a Danish fjord, it was
decided to formulate a linear model that describes the nitrate concentration in
the fjord as a function of nitrate loading. It was further decided to correct for
fresh water runoff. The resulting model was
a) Which of the following statements are assumed fulfilled in the usual
multiple linear regression model?
Solution
1) $\varepsilon_i$ follows a normal distribution with expectation zero, but its
realizations are not zero. Further, $\beta_j$ is deterministic and hence does not
follow a distribution ($\hat\beta_j$ does). Hence 1) is not correct.
2)-3) There are no assumptions on the expectation of $x_j$, and the variance of
$\varepsilon$ equals $\sigma^2$, not $\beta_1^2$. Hence 2) and 3) are not correct.
4) Is correct; this is the usual assumption about the errors.
5) Is incorrect, since $\varepsilon_j$ follows a normal distribution. Further, there
are no distributional assumptions on $x_j$; in fact we assume that $x_j$ is known.
Chapter 6 6.1 NITRATE CONCENTRATION 4
The parameters in the model were estimated in R and the following results are
available (slightly modified output from summary):
> summary(lm(y ~ x1 + x2))
Call:
lm(formula = y ~ x1 + x2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.36500 0.22184 -10.661 < 2e-16
x1 0.47621 0.06169 7.720 3.25e-13
x2 0.08269 0.06977 1.185 0.237
---
Residual standard error: 0.3064 on 237 degrees of freedom
Multiple R-squared: 0.3438,Adjusted R-squared: 0.3382
F-statistic: 62.07 on 2 and 237 DF, p-value: < 2.2e-16
b) What are the parameter estimates for the model parameters ($\hat\beta_i$ and $\hat\sigma^2$)
and how many observations are included in the estimation?
Solution
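The number of degrees of freedom is equal to $n - (p + 1)$, and since the number of
degrees of freedom is 237 and $p = 2$, we get $n = 237 + 2 + 1 = 240$. The parameter
estimates are given in the first column of the coefficient matrix, i.e.
$\hat\beta_0 = -2.365$, (6-2)
$\hat\beta_1 = 0.476$, (6-3)
$\hat\beta_2 = 0.083$, (6-4)
and the estimate of the error variance is the squared residual standard error,
$\hat\sigma^2 = 0.3064^2 \approx 0.0939$.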
Solution
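From Theorem 6.5 we know that the confidence intervals can be calculated by
$\hat\beta_i \pm t_{1-\alpha/2}\,\hat\sigma_{\beta_i}$,
where $t_{1-\alpha/2}$ is based on 237 degrees of freedom, and with $\alpha = 0.05$ we get
$t_{0.975} = 1.97$. The standard errors of the estimates are given in the second column
of the coefficient matrix, and the confidence intervals become
$\beta_0$: $-2.365 \pm 1.97 \cdot 0.22184 = [-2.802, -1.928]$,
$\beta_1$: $0.476 \pm 1.97 \cdot 0.06169 = [0.355, 0.598]$,
$\beta_2$: $0.083 \pm 1.97 \cdot 0.06977 = [-0.055, 0.220]$.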
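The interval formula can be evaluated directly from the numbers printed in the summary output. The following is a minimal sketch, assuming only the estimates, standard errors and degrees of freedom shown above; the object names `est`, `se`, `tq` and `ci` are chosen here for illustration and are not part of the original solution.

```r
# Estimates and standard errors copied from the summary output above
est <- c("(Intercept)" = -2.36500, x1 = 0.47621, x2 = 0.08269)
se  <- c("(Intercept)" =  0.22184, x1 = 0.06169, x2 = 0.06977)

df    <- 237                        # residual degrees of freedom
alpha <- 0.05
tq    <- qt(1 - alpha / 2, df = df) # 0.975 quantile of the t-distribution

# Lower and upper limits of the 95% confidence intervals
ci <- cbind(lower = est - tq * se, upper = est + tq * se)
round(ci, 3)
```

Only the interval for x2 contains zero, which is what the significance discussion in the next part relies on.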
Solution
We can see directly from the confidence intervals above that $\beta_0$ and $\beta_1$ are
significantly different from zero (the confidence intervals do not cover zero), while we
cannot reject that $\beta_2 = 0$ (the confidence interval covers zero). The p-values can
be read directly from the R output: the p-value for $\beta_0$ is less than $2 \cdot 10^{-16}$
and the p-value for $\beta_1$ is $3.25 \cdot 10^{-13}$, i.e. very strong evidence against
the null hypothesis in both cases.
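The t statistics and p-values in the output can also be recomputed by hand, which makes the link between the two columns explicit. This is a sketch based only on the printed estimates and standard errors; the names `tobs` and `pval` are illustrative.

```r
# Estimates and standard errors copied from the summary output
est <- c("(Intercept)" = -2.36500, x1 = 0.47621, x2 = 0.08269)
se  <- c("(Intercept)" =  0.22184, x1 = 0.06169, x2 = 0.06977)
df  <- 237

# Test statistic for H0: beta_i = 0 and the two-sided p-value
tobs <- est / se
pval <- 2 * pt(-abs(tobs), df = df)

signif(cbind(t = tobs, p = pval), 3)
```

Because the inputs are rounded, the recomputed values agree with the summary table only up to rounding.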
Chapter 6 6.2 MULTIPLE LINEAR REGRESSION MODEL 6
No. 1 2 3 4 5 6 7 8 9 10 11 12 13
y 1.45 1.93 0.81 0.61 1.55 0.95 0.45 1.14 0.74 0.98 1.41 0.81 0.89
x1 0.58 0.86 0.29 0.20 0.56 0.28 0.08 0.41 0.22 0.35 0.59 0.22 0.26
x2 0.71 0.13 0.79 0.20 0.56 0.92 0.01 0.60 0.70 0.73 0.13 0.96 0.27
No. 14 15 16 17 18 19 20 21 22 23 24 25
y 0.68 1.39 1.53 0.91 1.49 1.38 1.73 1.11 1.68 0.66 0.69 1.98
x1 0.12 0.65 0.70 0.30 0.70 0.39 0.72 0.45 0.81 0.04 0.20 0.95
x2 0.21 0.88 0.30 0.15 0.09 0.17 0.25 0.30 0.32 0.82 0.98 0.00
D <- data.frame(
x1=c(0.58, 0.86, 0.29, 0.20, 0.56, 0.28, 0.08, 0.41, 0.22,
0.35, 0.59, 0.22, 0.26, 0.12, 0.65, 0.70, 0.30, 0.70,
0.39, 0.72, 0.45, 0.81, 0.04, 0.20, 0.95),
x2=c(0.71, 0.13, 0.79, 0.20, 0.56, 0.92, 0.01, 0.60, 0.70,
0.73, 0.13, 0.96, 0.27, 0.21, 0.88, 0.30, 0.15, 0.09,
0.17, 0.25, 0.30, 0.32, 0.82, 0.98, 0.00),
y=c(1.45, 1.93, 0.81, 0.61, 1.55, 0.95, 0.45, 1.14, 0.74,
0.98, 1.41, 0.81, 0.89, 0.68, 1.39, 1.53, 0.91, 1.49,
1.38, 1.73, 1.11, 1.68, 0.66, 0.69, 1.98)
)
Solution
The question is answered using R. Start by loading the data into R and then estimate
the parameters:
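The call that produces the output below is not shown in the text. A call of the following form is consistent with the printed `Call:` line; the object name `fit` is an assumption taken from the later `confint(fit)` call.

```r
# Data from the exercise
D <- data.frame(
  x1 = c(0.58, 0.86, 0.29, 0.20, 0.56, 0.28, 0.08, 0.41, 0.22,
         0.35, 0.59, 0.22, 0.26, 0.12, 0.65, 0.70, 0.30, 0.70,
         0.39, 0.72, 0.45, 0.81, 0.04, 0.20, 0.95),
  x2 = c(0.71, 0.13, 0.79, 0.20, 0.56, 0.92, 0.01, 0.60, 0.70,
         0.73, 0.13, 0.96, 0.27, 0.21, 0.88, 0.30, 0.15, 0.09,
         0.17, 0.25, 0.30, 0.32, 0.82, 0.98, 0.00),
  y  = c(1.45, 1.93, 0.81, 0.61, 1.55, 0.95, 0.45, 1.14, 0.74,
         0.98, 1.41, 0.81, 0.89, 0.68, 1.39, 1.53, 0.91, 1.49,
         1.38, 1.73, 1.11, 1.68, 0.66, 0.69, 1.98)
)

# Multiple linear regression of y on x1 and x2
fit <- lm(y ~ x1 + x2, data = D)
summary(fit)
```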
Call:
lm(formula = y ~ x1 + x2, data = D)
Residuals:
Min 1Q Median 3Q Max
-0.155 -0.078 -0.020 0.050 0.301
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.43355 0.06598 6.57 1.3e-06 ***
x1 1.65299 0.09525 17.36 2.5e-14 ***
x2 0.00394 0.07485 0.05 0.96
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Solution
The parameter estimates are given in the first column of the coefficient matrix, i.e.
$\hat\beta_0 = 0.434$,
$\hat\beta_1 = 1.653$,
$\hat\beta_2 = 0.0039$,
and the error variance estimate is $\hat\sigma^2 = 0.11^2$. The confidence intervals can
either be calculated using the second column of the coefficient matrix and the value of
$t_{0.975}$ (with 22 degrees of freedom), or directly in R:
confint(fit)
2.5 % 97.5 %
(Intercept) 0.2967 0.5704
x1 1.4555 1.8505
x2 -0.1513 0.1592
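The first route mentioned in the solution (standard errors combined with the t quantile) can be checked against `confint`. The sketch below refits the model so that it is self-contained; `ctab`, `tq` and `manual` are illustrative names.

```r
D <- data.frame(
  x1 = c(0.58, 0.86, 0.29, 0.20, 0.56, 0.28, 0.08, 0.41, 0.22,
         0.35, 0.59, 0.22, 0.26, 0.12, 0.65, 0.70, 0.30, 0.70,
         0.39, 0.72, 0.45, 0.81, 0.04, 0.20, 0.95),
  x2 = c(0.71, 0.13, 0.79, 0.20, 0.56, 0.92, 0.01, 0.60, 0.70,
         0.73, 0.13, 0.96, 0.27, 0.21, 0.88, 0.30, 0.15, 0.09,
         0.17, 0.25, 0.30, 0.32, 0.82, 0.98, 0.00),
  y  = c(1.45, 1.93, 0.81, 0.61, 1.55, 0.95, 0.45, 1.14, 0.74,
         0.98, 1.41, 0.81, 0.89, 0.68, 1.39, 1.53, 0.91, 1.49,
         1.38, 1.73, 1.11, 1.68, 0.66, 0.69, 1.98)
)
fit <- lm(y ~ x1 + x2, data = D)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
ctab <- coef(summary(fit))

# 0.975 quantile of the t-distribution with n - (p + 1) = 22 degrees of freedom
tq <- qt(0.975, df = df.residual(fit))

# Manual intervals: estimate -/+ quantile * standard error
manual <- cbind(ctab[, "Estimate"] - tq * ctab[, "Std. Error"],
                ctab[, "Estimate"] + tq * ctab[, "Std. Error"])
manual
confint(fit)   # should agree with the manual calculation
```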
Solution
Since the confidence interval for $\beta_2$ covers zero (and the p-value is much larger
than 0.05), the parameter should be removed from the model to get the simpler model
$Y_i = \beta_0 + \beta_1 x_{1,i} + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$.
Call:
lm(formula = y ~ x1, data = D)
Residuals:
Min 1Q Median 3Q Max
-0.1563 -0.0763 -0.0215 0.0516 0.2999
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4361 0.0440 9.91 9.0e-10 ***
x1 1.6512 0.0871 18.96 1.5e-15 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
c) Carry out a residual analysis to check that the model assumptions are
fulfilled.
Solution
We are interested in inspecting a q-q plot of the residuals and a plot of the residuals
as a function of the fitted values
par(mfrow=c(1,2))
qqnorm(fit$residuals, pch=19)
qqline(fit$residuals)
plot(fit$fitted.values, fit$residuals, pch=19,
xlab="Fitted.values", ylab="Residuals")
[Figure: normal q-q plot of the residuals (left; y-axis: Sample Quantiles) and residuals plotted against the fitted values (right; y-axis: Residuals).]
There is no strong evidence against the assumptions: the points in the q-q plot lie
close to a straight line and there is no obvious dependence between the residuals and
the fitted values, so we conclude that the assumptions are fulfilled.
d) Make a plot of the fitted line and 95% confidence and prediction intervals
of the line for x1 ∈ [0, 1] (it is assumed that the model was reduced above).
Solution
[Figure: fitted line (legend: Prediction) with 95% confidence band and prediction band for x1 in [0, 1].]
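The code behind this plot is not included in the text. One way to draw the fitted line with both bands is sketched below, assuming the reduced model y ~ x1; the grid `xnew`, the colours and the line types are choices made here, not taken from the original figure.

```r
D <- data.frame(
  x1 = c(0.58, 0.86, 0.29, 0.20, 0.56, 0.28, 0.08, 0.41, 0.22,
         0.35, 0.59, 0.22, 0.26, 0.12, 0.65, 0.70, 0.30, 0.70,
         0.39, 0.72, 0.45, 0.81, 0.04, 0.20, 0.95),
  y  = c(1.45, 1.93, 0.81, 0.61, 1.55, 0.95, 0.45, 1.14, 0.74,
         0.98, 1.41, 0.81, 0.89, 0.68, 1.39, 1.53, 0.91, 1.49,
         1.38, 1.73, 1.11, 1.68, 0.66, 0.69, 1.98)
)
fit <- lm(y ~ x1, data = D)   # reduced model

# Grid of x1 values covering [0, 1]
xnew <- data.frame(x1 = seq(0, 1, by = 0.01))

# 95% confidence and prediction intervals along the grid
CI <- predict(fit, newdata = xnew, interval = "confidence", level = 0.95)
PI <- predict(fit, newdata = xnew, interval = "prediction", level = 0.95)

plot(D$x1, D$y, pch = 19, xlim = c(0, 1), ylim = range(PI, D$y),
     xlab = "x1", ylab = "y")
lines(xnew$x1, CI[, "fit"])                          # fitted line
lines(xnew$x1, CI[, "lwr"], lty = 2, col = "blue")   # confidence band
lines(xnew$x1, CI[, "upr"], lty = 2, col = "blue")
lines(xnew$x1, PI[, "lwr"], lty = 3, col = "red")    # prediction band
lines(xnew$x1, PI[, "upr"], lty = 3, col = "red")
legend("topleft", lty = 1:3, col = c("black", "blue", "red"),
       legend = c("Prediction", "Confidence band", "Prediction band"))
```

The prediction band is always wider than the confidence band, since it also accounts for the variance of a new observation around the line.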
Nr. 1 2 3 4 5 6 7 8
y 9.29 12.67 12.42 0.38 20.77 9.52 2.38 7.46
x1 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00
x2 4.00 12.00 16.00 8.00 32.00 24.00 20.00 28.00
D <- data.frame(
y=c(9.29,12.67,12.42,0.38,20.77,9.52,2.38,7.46),
x1=c(1.00,2.00,3.00,4.00,5.00,6.00,7.00,8.00),
x2=c(4.00,12.00,16.00,8.00,32.00,24.00,20.00,28.00)
)
Solution
par(mfrow=c(1,2))
plot(D$x1, D$y, xlab="x1", ylab="y")
plot(D$x2, D$y, xlab="x2", ylab="y")
Chapter 6 6.3 MLR SIMULATION EXERCISE 12
[Figure: scatter plots of y against x1 (left) and y against x2 (right).]
$Y_i = \beta_0 + \beta_1 x_{1,i} + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$,
and
$Y_i = \beta_0 + \beta_1 x_{2,i} + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$,
and report the 95% confidence intervals for the parameters. Are any of the
parameters significantly different from zero at a 5% significance level?
Solution
2.5 % 97.5 %
(Intercept) -0.5426 24.898
x1 -3.1448 1.893
confint(fit2)
2.5 % 97.5 %
(Intercept) -7.5581 15.9659
x2 -0.2958 0.8688
Since all confidence intervals cover zero, we cannot reject that the parameters are in
fact zero, and we would conclude that neither x1 nor x2 explains the variation in y.
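The calls that fit the two simple regressions are not shown in the text. They can be sketched as follows; the name `fit2` appears in the printed output, while `fit1` is an assumed name for the first model.

```r
D <- data.frame(
  y  = c(9.29, 12.67, 12.42, 0.38, 20.77, 9.52, 2.38, 7.46),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(4, 12, 16, 8, 32, 24, 20, 28)
)

# Two simple linear regression models, one per explanatory variable
fit1 <- lm(y ~ x1, data = D)
fit2 <- lm(y ~ x2, data = D)

# 95% confidence intervals for the parameters of each model
confint(fit1)
confint(fit2)
```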
and go through the steps of Method 6.16 (use confidence level 0.05 in all
tests).
Solution
Call:
lm(formula = y ~ x1 + x2, data = D)
Residuals:
1 2 3 4 5 6 7 8
0.9622 0.1783 -0.3670 -1.0963 -0.3448 -0.2842 0.0178 0.9339
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.0325 0.6728 11.9 0.0000727 ***
x1 -3.5734 0.1955 -18.3 0.0000090 ***
x2 0.9672 0.0489 19.8 0.0000061 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Solution
par(mfrow=c(1,2))
qqnorm(fit$residuals)
qqline(fit$residuals)
plot(fit$fitted.values, fit$residuals,
xlab="Fitted values", ylab="Residuals")
[Figure: normal q-q plot of the residuals (left; Theoretical Quantiles against Sample Quantiles) and residuals plotted against the fitted values (right).]
There are no obvious structures in the residuals as a function of the fitted values, and
there does not seem to be any serious departure from normality, but let's look
at the residuals as a function of the independent variables anyway.
Solution
par(mfrow=c(1,2))
plot(D$x1, fit$residuals, xlab="x1", ylab="Residuals")
plot(D$x2, fit$residuals, xlab="x2", ylab="Residuals")
[Figure: residuals plotted against x1 (left) and against x2 (right).]
The plot of the residuals as a function of x1 suggests that there could be a quadratic
dependence.
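A quadratic dependence on x1 can be handled by adding the squared variable as an extra regressor. The sketch below assumes the new column is called `x3` and the extended fit `fit3`, which matches the names used in the output and code later in this solution.

```r
D <- data.frame(
  y  = c(9.29, 12.67, 12.42, 0.38, 20.77, 9.52, 2.38, 7.46),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(4, 12, 16, 8, 32, 24, 20, 28)
)

# Add the squared term as a new explanatory variable
D$x3 <- D$x1^2

# Extended model with x1, x2 and x1^2
fit3 <- lm(y ~ x1 + x2 + x3, data = D)
summary(fit3)
```

The model is still linear in the parameters, so the usual multiple linear regression machinery applies unchanged.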
Solution
Call:
lm(formula = y ~ x1 + x2 + x3, data = D)
Residuals:
1 2 3 4 5 6 7 8
0.0417 -0.0233 -0.0107 -0.0754 -0.0252 0.1104 0.0585 -0.0758
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.1007 0.1212 83.3 1.2e-07 ***
x1 -5.0024 0.0709 -70.5 2.4e-07 ***
x2 1.0006 0.0054 185.2 5.1e-09 ***
x3 0.1474 0.0070 21.1 3.0e-05 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
We can see that all parameters are still significant, and we can do the residual
analysis of the resulting model.
Solution
par(mfrow=c(2,2))
qqnorm(fit3$residuals)
qqline(fit3$residuals)
plot(fitted.values(fit3), fit3$residuals,
xlab="Fitted values", ylab="Residuals")
plot(D$x1, fit3$residuals, xlab="x1", ylab="Residuals")
plot(D$x2, fit3$residuals, xlab="x2", ylab="Residuals")
[Figure: normal q-q plot of the residuals (top left), residuals against fitted values (top right), residuals against x1 (bottom left) and residuals against x2 (bottom right).]
There are no obvious structures left and there is no apparent departure from normality,
so we can report the finally selected model as
$Y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{1,i}^2 + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$.
d) Find the standard error for the line, and the confidence and prediction
intervals for the line for the points $(\min(x_1), \min(x_2))$, $(\bar{x}_1, \bar{x}_2)$, $(\max(x_1), \max(x_2))$.
Solution
## New data
Dnew <- data.frame(x1=c(min(D$x1),mean(D$x1),max(D$x1)),
x2=c(min(D$x2),mean(D$x2),max(D$x2)),
x3=c(min(D$x1),mean(D$x1),max(D$x1))^2)
1 2 3
0.07306 0.04785 0.07985
## Confidence interval
predict(fit3, newdata=Dnew, interval="confidence")
## Prediction interval
predict(fit3, newdata=Dnew, interval="prediction")
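The call that produced the three standard errors printed above is not shown. A plausible reconstruction, offered as an assumption rather than the original code, uses the `se.fit` argument of `predict`; the name `pred` is illustrative.

```r
D <- data.frame(
  y  = c(9.29, 12.67, 12.42, 0.38, 20.77, 9.52, 2.38, 7.46),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(4, 12, 16, 8, 32, 24, 20, 28)
)
D$x3 <- D$x1^2
fit3 <- lm(y ~ x1 + x2 + x3, data = D)

# New data: (min, min), (mean, mean), (max, max), with x3 = x1^2
Dnew <- data.frame(x1 = c(min(D$x1), mean(D$x1), max(D$x1)),
                   x2 = c(min(D$x2), mean(D$x2), max(D$x2)),
                   x3 = c(min(D$x1), mean(D$x1), max(D$x1))^2)

# Standard error of the fitted line at the new points
pred <- predict(fit3, newdata = Dnew, se.fit = TRUE)
pred$se.fit
```

The standard error is smallest at the mean point and grows towards the edges of the design, which is the usual pattern for the uncertainty of a regression line.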
e) Plot the observed values together with the fitted values (e.g. as a function
of x1 ).
Solution
[Figure: observed values (y) and fitted values (fitted.values) plotted against x1.]
Notice that we have an almost perfect fit when including $x_1$, $x_2$ and $x_1^2$ in the model,
while neither $x_1$ nor $x_2$ alone could predict the outcomes.
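The plotting code for this last part is not included in the text. A minimal sketch is given below; the plotting symbols, colours and legend position are choices made here and may differ from the original figure.

```r
D <- data.frame(
  y  = c(9.29, 12.67, 12.42, 0.38, 20.77, 9.52, 2.38, 7.46),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(4, 12, 16, 8, 32, 24, 20, 28)
)
D$x3 <- D$x1^2
fit3 <- lm(y ~ x1 + x2 + x3, data = D)

# Observed values as filled dots, fitted values as crosses
plot(D$x1, D$y, pch = 19, xlab = "x1", ylab = "y")
points(D$x1, fitted(fit3), pch = 3, col = "red")
legend("topright", legend = c("y", "fitted.values"),
       pch = c(19, 3), col = c("black", "red"))
```

The fitted values do not lie on a smooth curve in x1 because they also depend on x2, which varies from observation to observation.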