STA302 Week11 Full

STA302/1001 - Methods of Data Analysis I
(Week 11 lecture notes)
Wei (Becky) Lin
Nov. 21, 2016
1/49
Last Week
• Geometric perspective of least squares regression

• F test for regression coefficients.
• Coefficient of Multiple Determination.
• Inferences about Regression Parameters.
• Interval Estimation of —k , E(Yh ).
• Extra sum of squares
2/49
Week 11- Learning objectives & Outcomes
,¥fK
#
itk"!,j"
¥
!
a
striped
ml
:7=atAKtE →
Y=potp,X,tRXzE
MZ : !
.is#xyNsExk
→
sSTo= # Hitp :
same form !mz ,)
i
g-
fixed SSRCKLX, )
I
may given his
SSRLXZIX ,
.
• Review of Extra sum of squares ) ⇒ SEN , )
• Type I and Type III SS =ssRlk.Xy SSRLX

-
Elk .k)
• Use of Extra Sum of Squares in Tests for Regression coefficients

• Coefficient of Partial Determination
• Summary of Tests Concerning Regression Coefficients
• Multicollinearity and Its Effects
3/49
Review on Extra Sum of Squares
• Extra Sum of Squares measures the marginal decrease in SSE

(equivalently, the marginal increase in SSR) when one or several
predictor variables are added to the regression model, given that other
variables are already in the model.
• Extra: SSE goes down by the amount of x , SSR goes up by the same
amount of x since SSTO=SSR+SSE.
• Examples:
• SSR(X1 , X2 , X3 ) is the total variation explained by X1 X2 , and X3 in a
model
• SSR(X1 |X2 ) is the additional variation explained by X1 added to a
model already containing X2 .
• SSR(X1 , X4 |X2 , X3 ) is the additional variation explained by X1 and X4
when they are added to a model already containing X2 and X3 .
• Subscripts after bar ( | ) represent variables already in model.
4/49
Review on Extra Sum of Squares
SSEZ =/
#
a
1
:
1*1×4×3>4*4
ml :
Y= pot BXZFBXHE
m2iY=
SSTO under ml :
SSRIK # }
ssTounderm2@sSRlXl.k.X3.Xd-SSRlXz.X
ftp.XHAXstp.xitpyxytq
, ,
} )
= ,
SSECXS ,X} ) SSECXI ,XzX},X41 SSRCX ,X41k,X ,) 5/49

= !
-
, 1
Sequential SS (type I SS)
• SSR (Type I SS) decomposition
SSR df
Extra
X1 1 )
sum of squares
X2 |X1 1 t SSRIX } IX.
given previous X 's { X |X , X 1
Xz
)
3 1 2
(X1 , X2 , X3 ) 3
D= D=
lssrlx.in#=SsRlXi)tSSRlXz1Xi
ISSRCX.be#=SsRlXzHSSR(XiIXz
same
SSR df
X2 1 )
X1 |X2 1
T
X3 |X1 , X2 1 SSRlX3|X , ,X2 )
(X1 , X2 , X3 ) 3
• Depends on variable order

6/49
Extra Sum of Squares (type I SS)
Y = —0 + —1 Xi + . . . + —p≠1 Xp≠1 + ‘
q
• SSR = i (Ŷi ≠ Ȳ )2 = SSR(X1 , . . . , Xp≠1 )
q
• SSE = i (Yi ≠ Ŷi )2 = SSE (X1 , . . . , Xp≠1 )
• Extra Sum of Squares
• Break down the SSR to contributions from different X’s sequentially
• SSR(X1 ), SSR(X2 |X1 ), SSR(X3 |X1 , X2 ), . . .
• SSR(X2 |X1 ) = SSR(X1 , X2 ) ≠ SSR(X1 ) = SSE (X1 ) ≠ SSE (X1 , X2 )
• SSR = SSR(X1 ) + SSR(X2 |X1 ) + . . . SSR(Xp≠1 |X1 , . . . , Xp≠2 )

• Degrees of freedom of extra SSR equal to number of extra variables
• e.g. dfSSR(X3 |X1 ) = 1, dfSSR(X2 ,X3 |X1 ) = 2
• Contributions are not additive
SSR(X1 , X2 , X3 ) ”= SSR(X1 ) + SSR(X2 ) + SSR(X3 )
7/49
Extended ANOVA Table
.
capers
Source of Variance SS df MS
ELIMX 1 SSR(X1 ) 1 MSR(X1 )
X2 |X1 SSR(X2 |X1 ) 1 MSR(X2 |X1 )
X3 |X1 , X2 SSR(X3 |X1 , X2 ) 1 MSR(X3 |X1 , X2 )
Error SSE(X1 , X2 , X3 ) n-4 MSR(X1 , X2 , X3 )
Total SSTO n-1
t R
default anova output in
using annoy
8/49
F test in anova output (type I SS)
• Type I SS: variables added in order. Sum of sequential SSR gives SSR.
• F-tests are testing each variable given previous variables already in
model.
9/49
Type III/II SS
• SSR (Type III SS) decomposition: refers to variable added last.

• These do NOT add to the SSR.
• F-tests are testing variable given that all of the other variables already
in the model.
SSR df
XKIX K
X1 |X2 , X3 1
-
Xi :
-
xz :
X2 |X1 , X3 1 given all the rest
X3 :
X3 |X1 , X2 1
(X1 , X2 , X3 ) 3 X 's
sutmssrlxklkk ) # SSR
• Does not depend on variable order
• Type II SS are pretty much the same as Type III, except they ignore
interaction terms.
10/49
Type I vs Type III
• Estimates using Type I SS tell us how much of the variation in Y can

be explained by X1 , how much of the residual variability (SSE (X1 ))
can be explained by X2 , how much of the remaining residual
(SSE (X1 , X2 )) can be explained by X3 and so on, in order.
• Estimates using Type III SS tell us how much of the residual variability
in Y can be accounted for by X1 after having accounted for everything
else, and how much of the residual variability in Y can be accounted
for X2 after having accounted for everything else as well, and so on.
11/49
7.2 Use of Extra Sums of Squares in Tests for
Regression Coefficients
12/49
Partial F test: Test whether several —k = 0
• Consider a regression model, we call Full model:
Y = —0 + —1 X1 + . . . + —q Xq + —q+1 Xq+1 + . . . + —q+p Xq+p + ‘
• We want to test the null hypothesis that some of the —k are zero
H0 : —q+1 = . . . = —q+p = 0
• The alternative hypothesis is
HA : not all —q+1 , . . . , —q+p equal zero
• The general linear test approach:
Fú = mnmgt
Cnn
(SSER ≠ SSEF )/(dfR ≠ dfF )
SSEF /dfF
• Decision: reject H0 , in favor of Ha at – significance level if
F ú Ø F1≠–;dfR ≠dfF ,dfF

13/49
Partial F test: Test whether several —k = 0 (contd.)
• Full model:
Y = —0 + —1 X1 + . . . + —q Xq + —q+1 Xq+1 + . . . + —q+p Xq+p + ‘

Full model
• Reduced model (under H0 ):
"
Y = —0 + —1 X1 + . . . + —q Xq + ‘
• From Full model, we get:
SSR(X1 , . . . , Xq , . . . , Xq+p ), SSE (X1 , . . . , Xq , . . . , Xq+p ) =

SSEF
• From Reduced model, we get:
SSR(X1 , . . . , Xq ), SSE (X1 , . . . , Xq ) =

SSER
14/49
Partial F test: Test whether several —k = 0 (contd.)
• The extra sum of squares is obtained as
SSR(Xq+1 , . . . , Xp+q |X1 , . . . , Xq ) =

SSER -
SSEF
= SSE (X1 , . . . , Xq ) ≠ SSE (X1 , . . . , Xq , . . . , Xq+p )
• Alternatively
SSR(Xq+1 , . . . , Xp+q |X1 , . . . , Xq ) =

SSRF SSRR -
= SSR(X1 , . . . , Xq , . . . , Xq+p ) ≠ SSR(X1 , . . . , Xq )

• Hence, the test statistic is dfr =
n .
cqtil
dtf = n -
(
qtptt )
(SSER ≠ SSEF )/p p=dfr At
Fú =
-
SSE (X1 , . . . , Xp+q )/(n ≠ p ≠ q ≠ 1)

SSR(Xq+1 , . . . , Xp+q |X1 , . . . , Xq )/p
=
SSE (X1 , . . . , Xp+q )/(n ≠ p ≠ q ≠ 1)
• Decision: Reject H0 , conclude Ha at – significance level if
F ú Ø F1≠–;p,n≠p≠q≠1 15/49
Body fat Example:Testing a single —3 = 0
Full model Yi = —i + —1 X1 + —2 X2 + —3 X3 + ‘
• Test:
H0 : —3 = 0 Ha : — ”= 0
• Reduced model under H0 :
Reduced model : Yi = —i + —1 X1 + —2 X2 + ‘
• The test whether or not —3 = 0 is a marginal test, given X1 , X2 are

already in the the model.
16/49
Body fat Example:Testing a single —3 = 0 (contd.)
• SSEF = SSE (X1 , X2 , X3 ), df = n ≠ 4

• SSER = SSE (X1 , X2 ), df = n ≠ 3
• The general linear test statistics
(SSER ≠ SSEF )/(dfR ≠ dfF )

Fú =
SSEF /dfF
[SSE (X1 , X2 ) ≠ SSE (X1 , X2 , X3 )]/[(n ≠ 3) ≠ (n ≠ 4)]
=
SSE (X1 , X2 , X3 )/(n ≠ 4)
SSR(X3 |X1 , X2 )/1
=
SSE (X1 , X2 , X3 )/(n ≠ 4)
MSR(X3 |X1 , X2 )
=
MSEF
17/49
}
MSEF MSRCX } IX. ,
Xz )
• Test statistics:
F ú = MSR(X3 |X1 , X2 )/MSEF = 11.54/6.15 = 1.876423
• Decision: F ú = 1.876 Æ 4.494 = F1≠0.05,1,16 , failed to reject H0 .
18/49
Body fat Example:Testing a single —k = 0 (contd.)
body=read.table("/Users/Wei/TA/Teaching/0-STA302-2016F/Week10-Nov14/bodyfat.txt",header=T)
n <- dim(body)[1]
fmod <- lm(Y~. ,data=body) ← f- ftp.XH/zXzTBX3tE
'
anova(fmod)
type Iss
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X1 1 352.27 352.27 57.2768 1.131e-06 ***
## X2 1 33.17 33.17 5.3931 0.03373 *
## X3 1 11.55 11.55 1.8773 0.18956
## Residuals 16 98.40 6.15
## ---
## Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
rmod <- lm(Y~X1+X2,data=body)

Ho :
P3=o Us Ha :
past 0
SSEf = deviance(fmod)
Method I :
SSEr = deviance(rmod)
Ft <- ((SSEr-SSEf)/1)/(SSEf/(n-4)) Hersseflkdfr dfe ,
.
-
Ft - p*=
Feldt
## [1] 1.877289
~¥¥€n÷
pf(Ft,1,n-4,lower.tail=F)
. -
## [1] 0.1895628
19/49
n <- dim(body)[1]
fmod <- lm(Y~. ,data=body) → Full model
rmod <- lm(Y~X1+X2,data=body) → Reduced model
anova(rmod,fmod)

##
## Model 1: Y ~ X1 + X2
## Model 2: Y ~ X1 + X2 + X3
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 17 109.951
## 2 16 98.405 1 11.546 1.8773 0.1896
¥* ICE
,
.by#I=o.l896
20/49
Body fat Example:Testing —1 = —3 = 0
H0 : —1 = —3 = 0 Ha : not both —1 , —3 equal zero.
• Full model:
Yi = —i + —1 X1 + —2 X2 + —3 X3 + ‘
• Reduced model (under H0 ):
Yi = —i + —2 X2 + ‘
• Test statistics
[SSE (X2 ) ≠ SSE (X1 , X2 , X3 )]/[(n ≠ 2) ≠ (n ≠ 4)]

Fú =
SSE (X1 , X2 , X3 )/(n ≠ 4)
SSR(X1 , X3 |X2 )/2
=
SSEF /(n ≠ 4)
MSR(X1 , X3 |X2 )
=
MSEF
21/49
Body fat Example:Testing —1 = —3 = 0
n <- dim(body)[1]; fmod <- lm(Y~X2+X1+X3,data=body)
anova(fmod)

##
## Response: Y
## X2 1 381.97 381.97 62.1052 6.735e-07 ***
## X1 1 3.47 3.47 0.5647 0.4633 SSRCXI ,X}|X2/
## X3 1 11.55 11.55 1.8773 0.1896
## Residuals 16 98.40 6.15 = SSRIX .kz/tSSRlX31Xi.Xy
)
## ---
## Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 = 3.47+11.55
u
sum(anova(fmod)[2:3,2])/2/anova(fmod)[4,3] = 15.02
mm
-
SSRCXI ,X}Hz)/z
MSEF =
## [1] 1.22098
MSEp(¥
"
-
-
p*=(sser-ssEf)kdtR-df=-)
-
' -
-
rmod <- lm(Y~X2,data=body) -
⇒
anova(rmod,fmod)
*
##
## Model 1: Y ~ X2
## Model 2: Y ~ X2 + X1 + X3
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 18 113.424
## 2 16 98.405 2 15.019 1.221 0.321 22/49
Comments
• Testing whether a single —k equals zero:

• the t ú test statistic
.
→
refer slides
25 oelq
• the F ú general linear test statistic
ú 2
• F. = (t ) when Xk is the last predictor in the full model using Type
ú
I SS.
• F ú = (t ú )2 for ’k when use Type III SS. refer slides 25126
=
• Testing whether several —k equal zero:

• the F ú general linear test statistic ( partial F test )
• General linear test statistic can be expressed in term of the the
coefficients of multiple determination R 2
(SSER ≠ SSEF )/(dfR ≠ dfF ) (R 2 ≠ RR2 )/(dfR ≠ dfF )

Fú = = F
SSEF /dfF (1 ≠ RF2 )/dfF
• The latter formula using R 2 is not appropriate when the full and
reduced models do not contain —0
23/49
Show

Fú = = F
.
For given Y , ssto are the same for full model and reduced model
ssdefh.su?glssto sser RE
p*= t
.
to
=
stage lssto
]µ€ SEE =
TRI
rarities
=
÷ thee
24/49
Body Fat example: F ú = (t ú )2 using Type III SS
fmod <- lm(Y~X1+X2+X3,data=body)
summary(fmod)
##
## Call:
## lm(formula = Y ~ X1 + X2 + X3, data = body)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7263 -1.6111 0.3923 1.4656 4.1277
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 117.085 99.782 1.173 0.258
## X1 4.334 3.016 1.437 0.170
## X2 -2.857 2.582 -1.106 0.285
## X3 -2.186 1.595 -1.370 0.190
##
## Residual standard error: 2.48 on 16 degrees of freedom
## Multiple R-squared: 0.8014, Adjusted R-squared: 0.7641
## F-statistic: 21.52 on 3 and 16 DF, p-value: 7.343e-06
(summary(fmod)$coef[,"t value"])^2 ← ¥2
as F- value
same
## (Intercept) X1 X2 X3
## 1.376868 2.065734 1.224212 1.877289
→
in next slide
25/49
Body Fat example: F ú = (t ú )2 using Type III SS
fmod <- lm(Y~X1+X2+X3,data=body)
library(car)
Anova(fmod,type=3)
-
-type
## Anova Table (Type III tests) It ss
##
## Response: Y
## Sum Sq Df F value Pr(>F)
## (Intercept) 8.468 1 1.3769 0.2578
## X1 12.705 1 2.0657 0.1699
## X2 7.529 1 1.2242 0.2849
## X3 11.546 1 1.8773 0.1896
## Residuals 98.405 16
sqrt(Anova(fmod,type=3)[1:4,3])
## [1] 1.173400 1.437266 1.106441 1.370142
26/49
7.3 Summary of Tests concerning Regression
coefficients
27/49
Summary
• Test whether all —k = 0

• overall F test:
MSR
Fú = ≥ Fp≠1,n≠p
MSE
• Test whether a single —k = 0
• Partial F test:
MSR(Xk |X1 , . . . , Xk≠1 , Xk+1 , . . . , Xp≠1 )
Fú = ≥ F1,n≠p
MSE
bk
• F ú = (t ú )2 = s{b k}
true for last predictor using Type I SS, and true
for any k using type III SS.
28/49
Summary (contd.)
• Test whether some —k = 0
H0 : —q = —q+1 = . . . = —p≠1 = 0, Ha : not all —k in H0 equal zero.
• partial F test:

Fú = = F
MSR(Xq , . . . , Xp≠1 |X1 , . . . , Xp≠2 )
= ≥ Fp≠q,n≠p
MSEF
29/49
Summary (contd.)
• Other test using general linear test
Full :Y = —0 + —1 X1 + —2 X2 + —3 X3 + ‘ ← dff = n -4
• H0 : —1 = 2—2 , Ha : —1 ”= 2—2
• Reduced:Y = —0 + —c (2X1 + X2 ) + —3 X3 + ‘ ←
dfr = n
-
• The general F ú test statistics ≥ F1,n≠4

"
dtp dtf
.
=
(h
-
3)
-
( h 4) -
=/
• H0 : —1 = 3; —3 = 5, Ha : not both equalities in H0 hold
• Reduced:Y ≠ 3X1 ≠ 5X3 = —0 + —2 X2 + ‘ ←
dfr= n 2
• The general F ú test statistics ≥ F2,n≠4
-
dfrtdfp =
( h 2) .
-
In -41=2
30/49
7.4 Coefficient of Partial Determination
31/49
Coefficient of Partial Determination
• Coefficient of determination
SSR SSE
R2 = =1≠
SSTO SSTO
• the percentage of the total variation in Y that has been explained by

the model.
• Partial Determination: the amount of remaining variation explained
by a variable given other variables already in the model, this is called
partial determination.
• Coefficient of Partial Determination:
SSE (X2 , X3 ) ≠ SSE (X1 , X2 , X3 ) SSR(X1 |X2 , X3 )
RY2 1|23 = =
SSE (X2 , X3 ) SSE (X2 , X3 )
*
isms -1223*3
# ,
R: .name#xYM=-=sserIsIg
32/49
Coefficient of Partial Determination (contd.)
• Full model: Y = —0 + —1 X1 + —2 X2 + —3 X3 + ‘
• Reduced model: Y = —0 + —1 X2 + —2 X3 + ‘
SSER SSEF
SSE (X2 , X3 ) ≠ SSE (X1 , X2 , X3 ) SSR(X1 |X2 , X3 )
RY2 1|23 = =
SSE (X2 , X3 ) SSE (X2 , X3 )
"
SSER
• Measure relative reduction in Y variance after introducing X1 to
model with X2 , X3 .
• Takes values in [0,1]
• RY2 1|23 = R 2 of regressing residuals of reduced model to residuals of
E (X1 ) = —0 + —1 X2 + —2 X3 .
↳ .
mk lml 't xztx } )
{ TmImYsYrdId×Y×mY$
!raid ) → her ?
' in
33/49
Coefficient of Partial Determination (contd.)
• RY2 1|23 = R 2 of regressing residuals of reduced model to residuals of

E (X1 ) = —0 + —1 X2 + —2 X3 .
• Regress Y on X2 , X3 to get Ŷi (X2 , X3 ) and ei (Y |X2 , X3 )
{ • Regress X1 on X2 , X3 to get X̂i (X2 , X3 ) and ei (X1 |X2 , X3 )
• R 2 between ei (Y |X2 , X3 ) and ei (X1 |X2 , X3 ) will be the same as
RY2 1|23 .
• added variable plots or partial regression plot: the strength of the
relationship between Y and X1 adjusted for X2 , X3 .
ei (Y |X2 , X3 ) vs ei (X1 |X2 , X3 )
• More generally
SSR(Xp , . . . , Xm |X1 , . . . , Xp , . . . Xm ) SSER ≠ SSEF

RY2 p,...,m|1,2,...,p≠1 = =
SSE (X1 , . . . , Xp , . . . , Xm ) SSER
• Coefficient of Partial Correlation

Ò
rY k|1,...,p≠1 = sign(bk ) RY2 k|1,...,p≠1
34/49
7.6 Multicollinearity and its effect
35/49
What is multicollinearity
• Multicollinearity, is also called collinearity or intercorrelation: the

predictor variables are correlated among themselves.
• Uncorrelated predictor variables: the marginal reduction in the SSE
when the other predictor variables are in the model is exactly the
same when the predictor variable is in the model.
• eg. X1 , X2 are uncorrelated, then
SSR(X1 |X2 ) = SSR(X1 ), SSR(X2 |X1 ) = SSR(X2 )
36/49
Uncorrelated predictors
X1 = c(rep(4,4),rep(6,4)) # crew size

X2=c(2,2,3,3,2,2,3,3) # bonus pay
Y=c(42,39,48,51,49,53,61,60) # crew productivity
cor(X1,X2)
## [1] 0 Xi xz :
uncorrelated
,
anova(lm(Y~X1+X2))

##
## Response: Y
## X1 1 231.125 231.125 65.567 0.0004657 ***
## X2 1 171.125 171.125 48.546 0.0009366 ***
## Residuals 5 17.625 3.525 J
## --- SSRCXZIX , 1=171.125
## Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
37/49
Uncorrelated predictors
X1 = c(rep(4,4),rep(6,4)) # crew size
X2=c(2,2,3,3,2,2,3,3) # bonus pay
Y=c(42,39,48,51,49,53,61,60) # crew productivity
cor(X1,X2)
## [1] 0
anova(lm(Y~X1))

##
## Response: Y
## X1 1 231.12 231.125 7.347 0.03508 *
## Residuals 6 188.75 31.458
## ---
## Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
anova(lm(Y~X2))

##
## Response: Y
## X2 1 171.12 171.125 4.1276 0.08846 . still 37
##
##
Residuals 6 248.75 41.458 >
--- sSRlX2 )= ssrcxzlx ,
,k
## Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
38/49
Predictors are perfect correlated
X2 = 5 + 0.5X1
E (Y ) = —0 + —1 X1 + —2 (5 + 0.5X1 ) = (—0 + 5—2 ) + (—1 + 0.5—2 )X1
39/49
Predictors are perfect correlated
• When two predictor variables are perfectly correlated, many response

functions will lead to the same fitted values for the observations.
• The perfect relation between X1 and X2 did not inhabit the ability to
obtain a good fit to the data.
• Since many different response functions provided the same good fit,
we can not interpret any one set of regression coefficients as reflecting
the effects of the different predictor variables.
40/49
Collinearity and its effects: body fat data
41/49
• Collinearity effect on regression coefficients
Variables in model b1 ( aoefotx ) b2 ( wet otk )

,
pvalcaol
.
Ml :
X1 0.8572(ú ú ú) - evidence
strong
X2 - 0.8565(ú ú ú)
←
d.
43 :
X1 , X2 0.2224 0.6594 (ú) moderate
X1 , X2 , X3 4.334 -2.857
a-
M4 evidence
:
aokp ,uol< 0.05

Ml : Y~X ,
MZ :
Y ~X2
M3 : Y -
XHXZ
M4 :
Y~XitXztX3
42/49
• Collinearity effect on s(bk )
Variables in model s(b1 ) s(b2 )

X1 0.1288 -
X2 - 0.1100
X1 , X2 0.3034 0.2912
X1 , X2 , X3 3.016 2.582
The high degree of multicollinearity among the predictor variables is

responsible for the inflated variability of the estimated regression
coefficients.
43/49
• Collinearity effect on fitted values and predictions
Variables in model MSE

X1 7.95
X1 , X2 6.47
X1 , X2 , X3 6.15
• Estimated means and predicted values are not affected.

-
44/49
• Collinearity effect on simultaneous tests of —k

• it is possible that when individual t tests are performed, neither —1 or
—2 is significant.
•
be significant. ÷
However, when the F test is performed for both, the results may still
.
• Need for more powerful diagnostics for multicollinearity.
45/49
Summary of Multicollinearity
• When X’s are orthogonal, XkÕ Xj = 0, so X T X is a diagonal matrix

• parameter estimates are independent, s 2 (b) = MSE (X T X )≠1 is also
diagonal.
• marginal contribution of each X is additive
SSR(X1 , .., Xp≠1 ) = SSR(X1 ) + . . . + SSR(Xp≠1 )
• Type I and Type III SS are equivalent.
q
• When X’s are collinear, Xk = j”=k aj Xj
• Different set of parameters identical mean response function
• marginal contribution of each X depends on which of the other
variables are already in model
• The X T X matrix is not invertible.
46/49
Summary of Multicollinearity
• Effects of multicollinearity
• b’s are highly correlated. cor (bk , bj ) ¥ 1
• b’s have high variance: s(bk ) is high.
• individual estimates appear insignificant
• signs of b’s contrary to intuition.

• problems with parameter interpretation
• bk is the rate change in E(Y) per unit change in Xk , keeping other X’s
fixed, but X Õ s change with Xk .
• inference for E(Y) and predictions remains valid.
• Type I and Type III SS are different, except for the last Type I SS.
47/49
VIF: Diagnostic for multicollinearity
• VIF: Variance Inflation Factors
VIFk = [rXX
≠1
]k,k , k = 1, . . . , p ≠ 1
• k th diagonal element of rXX

≠1
• Alternative definition
1
VIFk =
1 ≠ Rk2
q
• Rk2 is R 2 for E (Xk ) = j”=k
—j Xj
• VIFk measures how much bigger is S 2 (bk ) as compared to a model
.
with independent X’s.
Use of VIF
• If X’s are linearly independent, then VIFk ¥ 1
• If X’s are collinear, then VIF >> 1
• Rule of thumb: if maxk=1,...,p≠1 (VIFk ) > 10, then multicollinearity.
48/49
Practice problems after Week 11 lectures
• keep trying all problems that we have covered today in Ch7:

• 7.2, 7.3, 7.7, 7.8, 7.12, 7.15, 7.20, 7.22, 7.23, 7.27, 7.31.
• Upcoming topic:
• model selection.
• Final review
49/49

STA302 Week11 Full

Uploaded by

Copyright:

Available Formats

STA302 Week11 Full

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

STA302 Week11 Full

Uploaded by

Copyright:

Available Formats

STA302/1001 - Methods of Data Analysis I

(Week 11 lecture notes)

Wei (Becky) Lin

Nov. 21, 2016

• Geometric perspective of least squares regression

• Review of Extra sum of squares ) ⇒ SEN , )

• Type I and Type III SS =ssRlk.Xy SSRLX

• Use of Extra Sum of Squares in Tests for Regression coefficients

• Extra Sum of Squares measures the marginal decrease in SSE

SSECXS ,X} ) SSECXI ,XzX},X41 SSRCX ,X41k,X ,) 5/49

• Depends on variable order

• SSR = SSR(X1 ) + SSR(X2 |X1 ) + . . . SSR(Xp≠1 |X1 , . . . , Xp≠2 )

• Contributions are not additive

SSR(X1 , X2 , X3 ) ”= SSR(X1 ) + SSR(X2 ) + SSR(X3 )

• SSR (Type III SS) decomposition: refers to variable added last.

• Estimates using Type I SS tell us how much of the variation in Y can

Y = —0 + —1 X1 + . . . + —q Xq + —q+1 Xq+1 + . . . + —q+p Xq+p + ‘

• The alternative hypothesis is

HA : not all —q+1 , . . . , —q+p equal zero

• The general linear test approach:

• Decision: reject H0 , in favor of Ha at – significance level if

F ú Ø F1≠–;dfR ≠dfF ,dfF

Y = —0 + —1 X1 + . . . + —q Xq + —q+1 Xq+1 + . . . + —q+p Xq+p + ‘

• From Full model, we get:

SSR(X1 , . . . , Xq , . . . , Xq+p ), SSE (X1 , . . . , Xq , . . . , Xq+p ) =

• From Reduced model, we get:

SSR(X1 , . . . , Xq ), SSE (X1 , . . . , Xq ) =

SSR(Xq+1 , . . . , Xp+q |X1 , . . . , Xq ) =

SSR(Xq+1 , . . . , Xp+q |X1 , . . . , Xq ) =

= SSR(X1 , . . . , Xq , . . . , Xq+p ) ≠ SSR(X1 , . . . , Xq )

SSE (X1 , . . . , Xp+q )/(n ≠ p ≠ q ≠ 1)

• The test whether or not —3 = 0 is a marginal test, given X1 , X2 are

• SSEF = SSE (X1 , X2 , X3 ), df = n ≠ 4

(SSER ≠ SSEF )/(dfR ≠ dfF )

rmod <- lm(Y~X1+X2,data=body)

## Analysis of Variance Table

H0 : —1 = —3 = 0 Ha : not both —1 , —3 equal zero.

[SSE (X2 ) ≠ SSE (X1 , X2 , X3 )]/[(n ≠ 2) ≠ (n ≠ 4)]

## Analysis of Variance Table

rmod <- lm(Y~X2,data=body) -

• Testing whether a single —k equals zero:

• Testing whether several —k equal zero:

(SSER ≠ SSEF )/(dfR ≠ dfF ) (R 2 ≠ RR2 )/(dfR ≠ dfF )

(SSER ≠ SSEF )/(dfR ≠ dfF ) (R 2 ≠ RR2 )/(dfR ≠ dfF )

## [1] 1.173400 1.437266 1.106441 1.370142

• Test whether all —k = 0

• Test whether some —k = 0

H0 : —q = —q+1 = . . . = —p≠1 = 0, Ha : not all —k in H0 equal zero.

(SSER ≠ SSEF )/(dfR ≠ dfF ) (R 2 ≠ RR2 )/(dfR ≠ dfF )

• Other test using general linear test

• The general F ú test statistics ≥ F1,n≠4

• the percentage of the total variation in Y that has been explained by

• RY2 1|23 = R 2 of regressing residuals of reduced model to residuals of

ei (Y |X2 , X3 ) vs ei (X1 |X2 , X3 )

SSR(Xp , . . . , Xm |X1 , . . . , Xp , . . . Xm ) SSER ≠ SSEF

• Coefficient of Partial Correlation

• Multicollinearity, is also called collinearity or intercorrelation: the

X1 = c(rep(4,4),rep(6,4)) # crew size

## Analysis of Variance Table

## Analysis of Variance Table