Generalized Additive Model
gam1=lm(wage~ns(year,4)+ns(age,5)+education,data=Wage)
summary(gam1)
## Call:
## lm(formula = wage ~ ns(year, 4) + ns(age, 5) + education, data = Wage)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -120.513  -19.608   -3.583   14.112  214.535
##
## Coefficients: [table not preserved]
##
## Residual standard error: 35.16 on 2986 degrees of freedom
## Multiple R-squared: 0.293, Adjusted R-squared: 0.2899
## F-statistic: 95.2 on 13 and 2986 DF, p-value: < 2.2e-16
par(mfrow =c(1,3))
plot.gam(gam1,se=TRUE,col ="blue")
[Figure: three panels from plot.gam(gam1): ns(year, 4) against year, ns(age, 5) against age, and the education effect by level, each with standard-error bands.]
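Since natural-spline terms are just sets of basis columns, an additive model built from them can be fit by ordinary least squares, which is why `lm` is used above. The following is a minimal sketch on simulated data, not the Wage data: the data frame `sim` and its variables `x1`, `x2`, `y` are hypothetical. It checks that the formula interface and an explicitly built basis give the same fit.

```r
library(splines)  # bundled with R; provides ns()

set.seed(1)
n <- 200
# Hypothetical simulated data standing in for Wage
sim <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 20, 80))
sim$y <- sin(sim$x1) + 0.001 * (sim$x2 - 50)^2 + rnorm(n, sd = 0.3)

# Additive model with natural-spline terms, fit by least squares
fit_formula <- lm(y ~ ns(x1, 4) + ns(x2, 5), data = sim)

# The same model with the basis columns built explicitly
B1 <- ns(sim$x1, 4)
B2 <- ns(sim$x2, 5)
fit_manual <- lm(sim$y ~ B1 + B2)

# Intercept + 4 + 5 basis columns
n_coef <- length(coef(fit_formula))

# Largest discrepancy between the two sets of fitted values
max_diff <- max(abs(fitted(fit_formula) - fitted(fit_manual)))
```

The two fits should agree up to floating-point error, because both regress `y` on the same basis columns.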
##      Pr(>F)
##   2.877e-06 ***
##   < 2.2e-16 ***
##   < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow =c(1,3))
plot.gam(gam.m3,se=TRUE,col ="red")
[Figure: three panels from plot.gam(gam.m3): s(year, 4) against year, s(age, 5) against age, and the education effect by level, each with standard-error bands.]
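Model 2 is preferred: among the three nested models compared below, the F-tests support adding a linear term in year (Model 2) but not replacing it with a smooth function of year (Model 3).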
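# Model 1: no year term
gam.m1=gam(wage~s(age,5)+education,data=Wage)
# Model 2: year enters linearly
gam.m2=gam(wage~year+s(age,5)+education,data=Wage)
# Model 3: year enters as a smoothing spline with 4 degrees of freedom
gam.m3=gam(wage~s(year,4)+s(age,5)+education,data=Wage)
# Sequential F-tests across the nested models
anova(gam.m1,gam.m2,gam.m3,test="F")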
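The logic of the comparison can be sketched on simulated data where the truth is known. This is only an illustration under stated assumptions: it uses `lm` with `ns()` from the `splines` package instead of `gam` with `s()` so that it runs with base R alone, and the data frame `sim` (with a response that is linear in `year` and nonlinear in `age`) is hypothetical.

```r
library(splines)  # bundled with R; provides ns()

set.seed(2)
n <- 500
# Hypothetical data: y is linear in year and nonlinear in age
sim <- data.frame(year = runif(n, 0, 6), age = runif(n, 18, 80))
sim$y <- 2 * sim$year + 10 * sin(sim$age / 15) + rnorm(n)

m1 <- lm(y ~ ns(age, 5), data = sim)                # no year term
m2 <- lm(y ~ year + ns(age, 5), data = sim)         # linear in year
m3 <- lm(y ~ ns(year, 4) + ns(age, 5), data = sim)  # spline in year

# Each row tests the added terms against the previous, smaller model
cmp <- anova(m1, m2, m3, test = "F")
p_linear <- cmp[["Pr(>F)"]][2]  # m2 vs m1: is year needed at all?
p_smooth <- cmp[["Pr(>F)"]][3]  # m3 vs m2: is a nonlinear year term needed?
print(cmp)
```

Here `p_linear` should be very small, since `year` truly affects `y`. Because the true effect is linear, `p_smooth` carries no systematic signal and will usually not be significant, which is the pattern that leads to preferring the middle model.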
## Call: gam(formula = … education, data …
## Deviance Residuals:
##      Min       1Q       3Q      Max
## -116.997  -19.319   14.121  214.445
par(mfrow=c(1,3))
plot.gam(gam.lo,se=TRUE,col="green")
[Figure: three panels from plot.gam(gam.lo): s(year, df = 4) against year, the fitted function of age, and the education effect by level, each with standard-error bands.]
## Call: gam(formula = wage ~ lo(year, age, span = 0.5) + education, data = Wage)
## Deviance Residuals:
##      Min       1Q   Median       3Q      Max
## -121.293  -19.659   -3.303   13.911  213.067
##
## (Dispersion Parameter for gaussian family taken to be 1234.897)
##
## Null Deviance: 5222086 on 2999 degrees of freedom
## Residual Deviance: 3688928 on 2987.235 degrees of freedom
## AIC: 29884.6
##
## Number of Local Scoring Iterations: 2
##
## Anova for Parametric Effects [table not preserved]
# akima is needed to plot the two-dimensional lo(year, age) surface
library(akima)
par(mfrow=c(1,2))
plot(gam.lo.i)
[Figure: two panels from plot(gam.lo.i): the fitted surface lo(year, age, span = 0.5) over year and age, and the education effect by level.]
## Anova for Parametric Effects
##                  Df  Sum Sq Mean Sq F value  Pr(>F)
## s(age, df = 5)    1    3.83  3.8262  4.7345 0.02964 *
## education         4   65.81 16.4514 20.3569 < 2e-16 ***
## Residuals      2989 2415.55  0.8081
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Anova for Nonparametric Effects
##                Npar Df Npar Chisq  P(Chi)
## (Intercept)
## year
## s(age, df = 5)       4     10.364 0.03472 *
## education
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow =c(1,3))
plot(gam.lr,se=T,col ="green")
[Figure: three panels from plot(gam.lr): the fitted effects of year, s(age, df = 5), and education, with standard-error bands.]
Counts of wage > 250 (TRUE) by education level:
##
##                      FALSE TRUE
##   1. < HS Grad         268    0
##   2. HS Grad           966    5
##   3. Some College      643    7
##   4. College Grad      663   22
##   5. Advanced Degree   381   45
Refit without the "1. < HS Grad" category, since no one in this category has wage > 250.
gam.lr.s=gam(I(wage>250)~year+s(age,df=5)+education,family=binomial,
             data=Wage,subset=(education!="1. < HS Grad"))
summary(gam.lr.s)
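Why a category with no positive outcomes is dropped can be seen in a small simulation. This is a sketch under stated assumptions, not the fit above: it uses base R's `glm` instead of `gam`, and the data frame `sim`, its factor `g`, and the group probabilities are hypothetical. When one level has zero events, its logistic coefficient drifts toward minus infinity and its standard error becomes enormous; removing that level gives stable estimates for the rest.

```r
set.seed(3)
n <- 600
# Hypothetical data: group "a" never has a positive outcome
sim <- data.frame(g = factor(rep(c("a", "b", "c"), each = n / 3)))
p <- c(a = 0, b = 0.1, c = 0.3)[as.character(sim$g)]
sim$y <- rbinom(n, 1, p)

# Cross-tabulate to spot the empty cell, as in the education table
tab <- table(sim$g, sim$y)
print(tab)

# Use "b" as the baseline so that a coefficient for "a" is estimated
sim$g <- relevel(sim$g, ref = "b")
fit_all <- suppressWarnings(glm(y ~ g, family = binomial, data = sim))

# Drop the zero-event group; droplevels() removes the unused factor level
sim_sub <- droplevels(subset(sim, g != "a"))
fit_sub <- glm(y ~ g, family = binomial, data = sim_sub)

coef_a <- coef(fit_all)[["ga"]]
se_all <- summary(fit_all)$coefficients["ga", "Std. Error"]
se_sub <- summary(fit_sub)$coefficients["gc", "Std. Error"]
```

In the full fit, the coefficient for group `a` should be a large negative number with a standard error far larger than that of any other term, which is what distorts plots with standard-error bands. The refit on the subset has no such term.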
par(mfrow=c(1,3))
plot(gam.lr.s,se=T,col="green")
[Figure: three panels from plot(gam.lr.s): the fitted effects of year, s(age, df = 5), and education (2. HS Grad to 5. Advanced Degree), with standard-error bands.]
Reference:
James, Gareth, et al. An Introduction to Statistical Learning. New York: Springer, 2013.