Homework 5 Solution
March 9, 2010
Due: 03/04/10
Instructor: Frank Wood
1. (20 points) In order to get a maximum likelihood estimate of the parameters of a Box-Cox transformed simple linear regression model ($Y_i^{\lambda} = \beta_0 + \beta_1 X_i + \epsilon_i$), we need to find the gradient of the likelihood with respect to its parameters (the gradient consists of the partial derivatives of the likelihood function w.r.t. all of the parameters). Derive the partial derivatives of the likelihood w.r.t. all parameters, assuming that $\epsilon_i \sim N(0, \sigma^2)$. (N.B. the parameters here are $\lambda, \beta_0, \beta_1, \sigma^2$.)
(Extra Credit: Given this collection of partial derivatives (the gradient), how would you then proceed to arrive at final estimates of all the parameters? Hint: consider how to increase the likelihood function by making small changes in the parameter settings.)
Answer:
The gradient of a multivariate function is defined to be the vector consisting of the partial derivatives w.r.t. every single variable. So we need to write down the full likelihood first:
$$L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y_i^{\lambda} - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}}$$
Then the log-likelihood function is:
$$l = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\sum_i (y_i^{\lambda} - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}$$
Taking derivatives w.r.t. all four parameters, we have the following:
$$\frac{\partial l}{\partial \lambda} = \frac{1}{\sigma^2} \sum_i (y_i^{\lambda} - \beta_0 - \beta_1 x_i)\, y_i^{\lambda} \ln y_i \quad (1)$$
$$\frac{\partial l}{\partial \beta_0} = \frac{1}{\sigma^2} \sum_i (y_i^{\lambda} - \beta_0 - \beta_1 x_i) \quad (2)$$
$$\frac{\partial l}{\partial \beta_1} = \frac{1}{\sigma^2} \sum_i (y_i^{\lambda} - \beta_0 - \beta_1 x_i)\, x_i \quad (3)$$
$$\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum_i (y_i^{\lambda} - \beta_0 - \beta_1 x_i)^2}{2\sigma^4} \quad (4)$$
From the above equation array we have the gradient.
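For the extra credit: with the gradient in hand, one natural procedure is gradient ascent, i.e. repeatedly take a small step in the direction of the gradient until the log-likelihood stops increasing. A minimal Matlab sketch of this idea follows; the data (borrowed from problem 3 for illustration), starting values, step size eta, and stopping rule are all illustrative assumptions, not part of the derivation above.
Matlab Code:
% Gradient ascent for the Box-Cox MLE (extra-credit sketch).
x=[4;1;2;3;3;4]; y=[16;5;10;15;13;22]      % illustrative data; y must be positive
n=length(y);
lambda=1; b0=0; b1=1; s2=1;                % illustrative starting values
eta=1e-4;                                  % small fixed step size
for iter=1:100000
    r=y.^lambda-b0-b1*x;                   % residuals of the transformed model
    gl=sum(r.*y.^lambda.*log(y))/s2;       % eq. (1)
    g0=sum(r)/s2;                          % eq. (2)
    g1=sum(r.*x)/s2;                       % eq. (3)
    gs=-n/(2*s2)+sum(r.^2)/(2*s2^2);       % eq. (4)
    if max(abs([gl g0 g1 gs]))<1e-6, break; end
    lambda=lambda+eta*gl;                  % step uphill on the log-likelihood
    b0=b0+eta*g0;
    b1=b1+eta*g1;
    s2=max(s2+eta*gs,1e-8);                % keep the variance positive
end
In practice one would typically profile out $\beta_0, \beta_1, \sigma^2$ for each fixed $\lambda$ (they have closed-form maximizers given $\lambda$), but the plain ascent above illustrates the hint.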
2. (15 points) Derive an extension of the Bonferroni inequality for the case of three statements, each with statement confidence coefficient $1 - \alpha$. (This is problem 4.22 in Applied Linear Regression Models (4th edition) by Kutner et al.)
Answer:
Let $A_1, A_2, A_3$ denote the events that the three statements are correct, so $P(\bar{A}_i) = \alpha$ for each $i$. By the union bound,
$$P(A_1 \cap A_2 \cap A_3) = 1 - P(\bar{A}_1 \cup \bar{A}_2 \cup \bar{A}_3) \ge 1 - \left[P(\bar{A}_1) + P(\bar{A}_2) + P(\bar{A}_3)\right] = 1 - 3\alpha.$$
So the family confidence coefficient for the three statements jointly is at least $1 - 3\alpha$.
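As a quick numerical illustration (not part of the original solution): if the three statements happened to be independent, the exact joint confidence would be $(1-\alpha)^3$, which never falls below the Bonferroni bound $1-3\alpha$. A short Matlab check over an illustrative grid of $\alpha$ values:
Matlab Code:
alpha=0.01:0.01:0.10            % illustrative per-statement error rates
bound=1-3*alpha                 % Bonferroni lower bound for three statements
exact=(1-alpha).^3              % exact joint coverage under independence
disp([alpha' bound' exact'])    % every row satisfies exact >= bound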
3. This is problem 5.24 in Applied Linear Regression Models (4th edition) by Kutner et al.: using matrix methods on the consumer finance data below, obtain $(X'X)^{-1}$, $\beta$, the residuals, the hat matrix $H$, and the associated variance estimates.
Answer:
(a)
$$X = \begin{bmatrix} 1 & 4 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}; \quad Y = \begin{bmatrix} 16 \\ 5 \\ 10 \\ 15 \\ 13 \\ 22 \end{bmatrix}; \quad X'X = \begin{bmatrix} 6 & 17 \\ 17 & 55 \end{bmatrix}; \quad (X'X)^{-1} = \frac{1}{41}\begin{bmatrix} 55 & -17 \\ -17 & 6 \end{bmatrix};$$
$$(X'X)^{-1}X' = \frac{1}{41}\begin{bmatrix} 55 & -17 \\ -17 & 6 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 4 & 1 & 2 & 3 & 3 & 4 \end{bmatrix} = \frac{1}{41}\begin{bmatrix} -13 & 38 & 21 & 4 & 4 & -13 \\ 7 & -11 & -5 & 1 & 1 & 7 \end{bmatrix}$$
$$H = X(X'X)^{-1}X' = \frac{1}{41}\begin{bmatrix} 1 & 4 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}\begin{bmatrix} -13 & 38 & 21 & 4 & 4 & -13 \\ 7 & -11 & -5 & 1 & 1 & 7 \end{bmatrix} = \frac{1}{41}\begin{bmatrix} 15 & -6 & 1 & 8 & 8 & 15 \\ -6 & 27 & 16 & 5 & 5 & -6 \\ 1 & 16 & 11 & 6 & 6 & 1 \\ 8 & 5 & 6 & 7 & 7 & 8 \\ 8 & 5 & 6 & 7 & 7 & 8 \\ 15 & -6 & 1 & 8 & 8 & 15 \end{bmatrix}$$
(1) $\beta = (X'X)^{-1}X'Y = \frac{1}{41}\begin{bmatrix} -13 & 38 & 21 & 4 & 4 & -13 \\ 7 & -11 & -5 & 1 & 1 & 7 \end{bmatrix}\begin{bmatrix} 16 \\ 5 \\ 10 \\ 15 \\ 13 \\ 22 \end{bmatrix} = \frac{1}{41}\begin{bmatrix} 18 \\ 189 \end{bmatrix} = \begin{bmatrix} 0.4390 \\ 4.6098 \end{bmatrix}$

(2) Residual $= Y - X\beta = \begin{bmatrix} 16 \\ 5 \\ 10 \\ 15 \\ 13 \\ 22 \end{bmatrix} - \begin{bmatrix} 18.8780 \\ 5.0488 \\ 9.6585 \\ 14.2683 \\ 14.2683 \\ 18.8780 \end{bmatrix} = \begin{bmatrix} -2.8780 \\ -0.0488 \\ 0.3415 \\ 0.7317 \\ -1.2683 \\ 3.1220 \end{bmatrix}$

$s^2\{\beta\} = MSE \cdot (X'X)^{-1} = \begin{bmatrix} 6.8055 & -2.1035 \\ -2.1035 & 0.7424 \end{bmatrix}$, so $s\{\beta_1\} = \sqrt{0.7424} = 0.8616$.

(c) As calculated in part (a), the hat matrix
$$H = X(X'X)^{-1}X' = \frac{1}{41}\begin{bmatrix} 15 & -6 & 1 & 8 & 8 & 15 \\ -6 & 27 & 16 & 5 & 5 & -6 \\ 1 & 16 & 11 & 6 & 6 & 1 \\ 8 & 5 & 6 & 7 & 7 & 8 \\ 8 & 5 & 6 & 7 & 7 & 8 \\ 15 & -6 & 1 & 8 & 8 & 15 \end{bmatrix} = \begin{bmatrix} 0.3659 & -0.1463 & 0.0244 & 0.1951 & 0.1951 & 0.3659 \\ -0.1463 & 0.6585 & 0.3902 & 0.1220 & 0.1220 & -0.1463 \\ 0.0244 & 0.3902 & 0.2683 & 0.1463 & 0.1463 & 0.0244 \\ 0.1951 & 0.1220 & 0.1463 & 0.1707 & 0.1707 & 0.1951 \\ 0.1951 & 0.1220 & 0.1463 & 0.1707 & 0.1707 & 0.1951 \\ 0.3659 & -0.1463 & 0.0244 & 0.1951 & 0.1951 & 0.3659 \end{bmatrix}$$
and $s^2\{e\} = MSE(I - H)$, whose first and last rows are
$\begin{bmatrix} 3.2171 & 0.7424 & -0.1237 & -0.9899 & -0.9899 & -1.8560 \end{bmatrix}$ and
$\begin{bmatrix} -1.8560 & 0.7424 & -0.1237 & -0.9899 & -0.9899 & 3.2171 \end{bmatrix}$.
Matlab Code:
X=[1 4;1 1;1 2;1 3;1 3;1 4]
Y=[16;5;10;15;13;22]
J=ones(6,6)
I=eye(6,6)
[n,m]=size(Y)
Z=inv(X'*X)                  % (X'X)^{-1}
H=X*Z*X'                     % hat matrix
beta=Z*X'*Y                  % least-squares estimates
residual=Y-H*Y
SSR=Y'*(H-(1/n)*J)*Y
SSE=Y'*(I-H)*Y
MSE=SSE/(n-2)
cov=MSE*Z                    % s^2{beta}
s2e=MSE*(I-H)                % s^2{e}
Xh=[1;4]
Yhhat=Xh'*beta               % fitted value at X_h = 4
s2pred=MSE*(1+Xh'*Z*Xh)      % prediction variance at X_h
4. (25 points) In a small-scale regression study, the following data were obtained. (This is problem 6.27 in Applied Linear Regression Models (4th edition) by Kutner et al.)

i:    1   2   3   4   5   6
Xi1:  7   4   16  3   21  8
Xi2:  33  41  7   49  5   31
Yi:   42  33  75  28  91  55

Assume that the regression model
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i \quad (5)$$
with independent normal error terms is appropriate. Using matrix methods, obtain (a) $b$; (b) $e$; (c) $H$; (d) SSR; (e) $s^2\{b\}$; (f) $\hat{Y}_h$ when $X_{h1} = 10$, $X_{h2} = 30$; (g) $s^2\{\hat{Y}_h\}$ when $X_{h1} = 10$, $X_{h2} = 30$.
Answer:
(a) $b = (X'X)^{-1}X'Y = \begin{bmatrix} 33.9321 \\ 2.7848 \\ -0.2644 \end{bmatrix}$

(b) $e = Y - Xb = \begin{bmatrix} -2.6996 \\ -1.2300 \\ -1.6374 \\ -1.3299 \\ -0.0900 \\ 6.9868 \end{bmatrix}$

(c) $H = X(X'X)^{-1}X' = \begin{bmatrix} 0.2314 & 0.2517 & 0.2118 & 0.1489 & -0.0548 & 0.2110 \\ 0.2517 & 0.3124 & 0.0944 & 0.2663 & -0.1479 & 0.2231 \\ 0.2118 & 0.0944 & 0.7044 & -0.3192 & 0.1045 & 0.2041 \\ 0.1489 & 0.2663 & -0.3192 & 0.6143 & 0.1414 & 0.1483 \\ -0.0548 & -0.1479 & 0.1045 & 0.1414 & 0.9404 & 0.0163 \\ 0.2110 & 0.2231 & 0.2041 & 0.1483 & 0.0163 & 0.1971 \end{bmatrix}$

(d) $SSR = Y'(H - \frac{1}{n}J)Y = 3009.93$

(e) $s^2\{b\} = MSE \cdot (X'X)^{-1} = \begin{bmatrix} 715.48 & -34.16 & -13.60 \\ -34.16 & 1.66 & 0.64 \\ -13.60 & 0.64 & 0.26 \end{bmatrix}$

(f) At $X_{h1} = 10$ and $X_{h2} = 30$, $\hat{Y}_h = X_h'b = 33.9321 + 2.7848(10) - 0.2644(30) = 53.8471$

(g) At $X_{h1} = 10$ and $X_{h2} = 30$, $s^2\{\hat{Y}_h\} = X_h'\, s^2\{b\}\, X_h = 5.4246$
Matlab Code:
X=[1 7 33;1 4 41;1 16 7;1 3 49;1 21 5;1 8 31]
Y=[42;33;75;28;91;55]
J=ones(6,6)
I=eye(6,6)
[n,m]=size(Y)
Z=inv(X'*X)                  % (X'X)^{-1}
H=X*Z*X'                     % hat matrix
beta=Z*X'*Y                  % least-squares estimates
residual=Y-H*Y
SSR=Y'*(H-(1/n)*J)*Y
SSE=Y'*(I-H)*Y
MSE=SSE/(n-3)
cov=MSE*Z                    % s^2{b}
s2e=MSE*(I-H)                % s^2{e}
Xh=[1;10;30]
Yhhat=Xh'*beta               % fitted value at X_h
s2yhat=Xh'*cov*Xh            % variance of the fitted value at X_h
5. (15 points) Consider the classic regression model in matrix form, i.e.
$$Y = X\beta + \epsilon$$
where $X$ is an $n \times p$ design matrix whose first column is an all-ones vector, $\epsilon \sim N(0, I)$, and $I$ is an identity matrix. Prove the following:
a. The residual sum of squares $RSS = e'e$ can be written in matrix form:
$$RSS = y'(I - X(X'X)^{-1}X')y \quad (6)$$
b. We call the RHS of (6) a sandwich. Prove that the matrix in the middle layer of the sandwich, $N = I - X(X'X)^{-1}X'$, is an idempotent matrix.
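Though no substitute for the proofs, a quick Matlab check (reusing the problem 3 design matrix as illustrative data; this snippet is not part of the original assignment) is consistent with both claims:
Matlab Code:
X=[1 4;1 1;1 2;1 3;1 3;1 4]      % illustrative design matrix (from problem 3)
y=[16;5;10;15;13;22]
N=eye(6)-X*inv(X'*X)*X'          % middle layer of the sandwich
b=inv(X'*X)*X'*y
e=y-X*b                          % residual vector
norm(N*N-N)                      % ~0, consistent with N being idempotent
e'*e-y'*N*y                      % ~0, consistent with RSS = y'Ny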