Chapter 3 Multiple Regression
2023/2024
Model with k explanatory variables
In Population (PRF):
E(yi ) = β0 + β1 xi1 + · · · + βk xik
yi = β0 + β1 xi1 + · · · + βk xik + ui
In Sample (SRF):
ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik
yi = β̂0 + β̂1 xi1 + · · · + β̂k xik + ei
Intercept: β0 = E(y | x1 = 0, . . . , xk = 0)
Slope: βj = ∂E(y)/∂xj
If β1 = · · · = βk = 0, the model is overall insignificant.
y1 = β0 + β1 x11 + · · · + βk x1k + ε1
. . .
yn = β0 + β1 xn1 + · · · + βk xnk + εn
In matrix form, with y = (y1 , . . . , yn )′ , u = (ε1 , . . . , εn )′ , β = (β0 , β1 , . . . , βk )′ , and X the n × (k + 1) matrix whose i-th row is (1, xi1 , . . . , xik ):
y = Xβ + u
ŷ = X β̂
y = X β̂ + e
The OLS estimator solves
min over β̂ of S(β̂) = ∥y − X β̂∥²
We have S(β̂) = (y − X β̂)′ (y − X β̂) = y′y − 2 β̂′X′y + β̂′X′X β̂.
Taking the first-order condition (FOC) with respect to β̂, one gets
X ′X β̂ = X ′y
If X has full column rank (no multicollinearity of x1 , . . . , xk ), then
β̂OLS = (X′X)−1 X′y    (1)
We focus on β̂1 ("partialling out"). Let r̂i1 denote the residuals from regressing x1 on x2 , . . . , xk (with an intercept). We have
β̂1 = (∑ r̂i1 yi ) / (∑ r̂i1²), where both sums run over i = 1, . . . , n.
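A small R check of this partialling-out formula, using the built-in mtcars data purely as illustrative material (not the lecture's example):

# Partialling-out sketch on mtcars (illustrative data)
full <- lm(mpg ~ wt + hp, data = mtcars)
# Residuals from regressing x1 (wt) on the other regressor (hp), with intercept
r1 <- resid(lm(wt ~ hp, data = mtcars))
# beta1_hat = sum(r1 * y) / sum(r1^2)
sum(r1 * mtcars$mpg) / sum(r1^2)
coef(full)["wt"]        # same value as the wt coefficient of the full model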
We have
0 = X ′y − X ′X β̂ = X ′e
This means that e is orthogonal to every column vector of the matrix X, i.e. to the vector space spanned by the column vectors of X.
The condition X′e = 0 is called the system of normal equations.
Notice that ŷ = X β̂ = PX y, where PX = X(X′X)−1 X′ is the orthogonal projector onto the vector space spanned by the columns of X.
Let MX = I − PX be the orthogonal projector onto the orthogonal complement of that space; then e = MX y.
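A short R sketch of PX and MX (again on mtcars, purely for illustration):

X <- model.matrix(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg
P <- X %*% solve(t(X) %*% X) %*% t(X)   # orthogonal projector onto col(X)
M <- diag(nrow(X)) - P                  # projector onto the orthogonal complement
yhat <- P %*% y                         # fitted values
e <- M %*% y                            # residuals
max(abs(t(X) %*% e))                    # numerically ~ 0: X'e = 0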
s² = e′e / (n − (k + 1)) = ∑ ei² / (n − k − 1)
V̂ar(β̂) = s² (X′X)−1
Se(β̂j ) = √( V̂ar(β̂j ) )
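These formulas can be checked against summary(lm()) in R; a sketch on mtcars (illustrative data, not the lecture's example):

fit <- lm(mpg ~ wt + hp, data = mtcars)
e <- resid(fit)
n <- nrow(mtcars); k <- 2
s2 <- sum(e^2) / (n - k - 1)                # s^2 = e'e / (n - k - 1)
X <- model.matrix(fit)
V <- s2 * solve(t(X) %*% X)                 # estimated Var(beta_hat)
sqrt(diag(V))                               # Se(beta_hat_j)
summary(fit)$coefficients[, "Std. Error"]   # same values from lm()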
Let ȳ = (∑ yi ) / n.
Total sum of squares: TSS = ∑ (yi − ȳ)², df = n − 1.
Explained/Regression sum of squares: ESS = ∑ (ŷi − ȳ)², df = k.
Residual sum of squares: RSS = ∑ (yi − ŷi )² = ∑ ei², df = n − 1 − k.
TSS = ESS + RSS.
R² = ESS/TSS = 1 − RSS/TSS. Adjusted R²:
Ra² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − [RSS/(n − k − 1)] / [TSS/(n − 1)] = 1 − s²/sy²
where sy² = TSS/(n − 1) is the sample variance of y.
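A quick R check of the decomposition and of Ra² (mtcars used as illustrative data):

fit <- lm(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg; yhat <- fitted(fit)
n <- length(y); k <- 2
TSS <- sum((y - mean(y))^2)
ESS <- sum((yhat - mean(y))^2)
RSS <- sum(resid(fit)^2)
c(TSS, ESS + RSS)                        # TSS = ESS + RSS
R2 <- 1 - RSS / TSS
R2a <- 1 - (1 - R2) * (n - 1) / (n - k - 1)
c(R2, summary(fit)$r.squared)            # same value
c(R2a, summary(fit)$adj.r.squared)       # same value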
OLS estimator
Under Assumptions 1-4, the OLS estimator is unbiased: E(β̂OLS ) = β.
Under Assumptions 1-6, Var(β̂OLS ) = σ² (X′X)−1 .
Moreover,
Var(β̂jOLS ) = σ² / (TSSj (1 − Rj²))
where TSSj = ∑i (xij − x̄j )² is the total sample variation in xj , and Rj² is the R-squared from regressing xj on all other independent variables (and including an intercept).
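One can verify this variance formula directly in R; a sketch on mtcars, with s² in place of the unknown σ²:

fit <- lm(mpg ~ wt + hp, data = mtcars)
s2 <- summary(fit)$sigma^2                              # s^2 estimating sigma^2
TSS_wt <- sum((mtcars$wt - mean(mtcars$wt))^2)          # TSS_j for x_j = wt
R2_wt <- summary(lm(wt ~ hp, data = mtcars))$r.squared  # R_j^2
sqrt(s2 / (TSS_wt * (1 - R2_wt)))                       # Se(beta_wt) from the formula
summary(fit)$coefficients["wt", "Std. Error"]           # same value from lm()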
BLUE
Under Assumptions 1-6, the OLS estimator, β̂ OLS , is a best linear
unbiased estimator (BLUE) of β .
y = β0 + β1 x1 + β2 x2 + β3 x3 + u (full model)
y = β̃0 + β̃1 x1 (reduced model, omitting x2 and x3 )
Se(β̂j ) = σ̂ / √( TSSj (1 − Rj²) )
Using what we know about the distribution of β̂j (normal) and of Se(β̂j ) (χ²), we get:
(β̂j − βj ) / Se(β̂j ) ∼ tn−k−1
Important t-test
H0 : βj = 0 vs H1 : βj ̸= 0, j = 1, . . . , k
If |t| = |β̂j | / Se(β̂j ) > tα/2 (the critical value of tn−k−1 ), reject H0 : the coefficient is significant.
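In R the t statistics and the 5% critical value can be computed as follows (mtcars again used only for illustration):

fit <- lm(mpg ~ wt + hp, data = mtcars)
ct <- summary(fit)$coefficients
tstat <- ct[, "Estimate"] / ct[, "Std. Error"]   # t = beta_hat_j / Se(beta_hat_j)
tcrit <- qt(0.975, df = df.residual(fit))        # t_{alpha/2} with n - k - 1 df
abs(tstat) > tcrit                               # TRUE => reject H0: beta_j = 0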
Model: y = β0 + β1 x1 + · · · + βk xk + u
The correlation of xk with y and the estimated β̂k may have different signs.
Added Variable plot
Regress y = β0 + β1 x1 + · · · + βk−1 xk−1 + u1 , obtain residuals e1
Regress xk = α0 + α1 x1 + · · · + αk−1 xk−1 + u2 , obtain residuals e2
Plot e1 against e2 → the Added Variable plot shows the relationship of y versus xk after controlling for the other regressors (see the sketch below)
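A minimal added-variable plot in R, on mtcars, with xk = hp and x1 = wt as the retained regressor:

e1 <- resid(lm(mpg ~ wt, data = mtcars))   # y purged of the other regressors
e2 <- resid(lm(hp ~ wt, data = mtcars))    # x_k purged of the other regressors
plot(e2, e1, xlab = "e2 (hp | wt)", ylab = "e1 (mpg | wt)")
abline(lm(e1 ~ e2))   # slope equals the hp coefficient in the full regression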
Partial Correlation
ry,xk |x̸=k = t(β̂k ) / √( t(β̂k )² + n − k − 1 )
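A sketch checking this formula against the correlation of the two residual series from the added-variable construction (mtcars):

fit <- lm(mpg ~ wt + hp, data = mtcars)
t_hp <- summary(fit)$coefficients["hp", "t value"]
n <- nrow(mtcars); k <- 2
t_hp / sqrt(t_hp^2 + n - k - 1)                 # partial correlation from t
cor(resid(lm(mpg ~ wt, data = mtcars)),
    resid(lm(hp ~ wt, data = mtcars)))          # same value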
Forecast at x1 = x1∗ , . . . , xk = xk∗ , i.e. at the vector x∗ = (1, x1∗ , . . . , xk∗ )′ .
Point estimate: ŷ∗ = β̂0 + β̂1 x1∗ + · · · + β̂k xk∗ = x∗′ β̂
Standard error: Se(pred) = s √( 1 + x∗′ (X′X)−1 x∗ )
Confidence interval for the forecast: ŷ∗ ± tα/2,n−k−1 × Se(pred)
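In R, predict() gives the same forecast and interval; the manual Se(pred) is shown for comparison (mtcars, with a hypothetical x∗ of wt = 3, hp = 150):

fit <- lm(mpg ~ wt + hp, data = mtcars)
xnew <- data.frame(wt = 3, hp = 150)             # hypothetical x*
predict(fit, newdata = xnew, interval = "prediction", level = 0.95)
# Manual Se(pred) = s * sqrt(1 + x*'(X'X)^{-1} x*)
X <- model.matrix(fit)
xs <- c(1, 3, 150)
s <- summary(fit)$sigma
s * sqrt(1 + t(xs) %*% solve(t(X) %*% X) %*% xs)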
Reduced model: y = β0 + β1 x1 + · · · + βk−p xk−p + u
Hypotheses
H0 : βk−p+1 = · · · = βk = 0: the reduced model is correct
H1 : not H0 : the reduced model is not correct
Statistic
Fstat = [(RSSReduced − RSSFull )/p] / [RSSFull /(n − k − 1)] = (RSSReduced − RSSFull ) / (p × s²Full ), which follows Fp,n−k−1 under H0 .
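A sketch of the same F statistic in R, by hand and via anova() on nested models (mtcars used for illustration):

full <- lm(mpg ~ wt + hp, data = mtcars)
reduced <- lm(mpg ~ wt, data = mtcars)           # drops hp (p = 1 restriction)
RSS_f <- sum(resid(full)^2); RSS_r <- sum(resid(reduced)^2)
p <- 1; n <- nrow(mtcars); k <- 2
((RSS_r - RSS_f) / p) / (RSS_f / (n - k - 1))    # Fstat from the formula
anova(reduced, full)                             # same F statistic and p-value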
At significance level 5%:
1. Test for the overall significance of the model.
2. Remove the variable male; then R² = 0.723, RSS = 26.202. Test for removing male.
3. Regress wage on exp only; then R² = 0.52, RSS = 45.423. Test for this reduction of the model.
4. Test the hypothesis that the sum of the coefficients of exp and edu is 1, given that the reduced model has R² = 0.6883, RSS = 24.597.
exp 1 2 2 3 4 5 7 10 10 12 15 16
edu 13 12 16 11 15 15 10 15 13 11 13 15
male 1 1 0 0 1 0 1 0 0 1 1 0
wage 6 6 12 6 11 8 8 10 11 10 15 13
# Enter the data
exp <- c(1,2,2,3,4,5,7,10,10,12,15,16)
edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)

# Build the design matrix X (with an intercept column) and the response y
intercept <- rep(1, 12)
explanatory <- data.frame(intercept, exp, edu, male)
X <- data.matrix(explanatory)
y <- data.matrix(wage)

# OLS by the closed-form formula: beta_hat = (X'X)^{-1} X'y
beta <- solve(t(X) %*% X) %*% (t(X) %*% y)
beta
# The same estimates via lm()
reg1 <- lm(wage ~ exp + edu + male)
summary(reg1)

# Variance-covariance matrix of the coefficient estimates
round(vcov(reg1), 4)

# Test of a linear restriction (here H0: coefficient of exp equals 1)
install.packages("AER")
library(AER)
linearHypothesis(reg1, "exp = 1")
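The remaining exercises can be run along the same lines: linearHypothesis() accepts a restriction such as "exp + edu = 1" for a sum of coefficients, and nested models can be compared with anova().

# Exercise 4: H0: coefficient of exp + coefficient of edu = 1
linearHypothesis(reg1, "exp + edu = 1")

# Exercises 2 and 3 as nested-model F tests
reg2 <- lm(wage ~ exp + edu)   # model without male
reg3 <- lm(wage ~ exp)         # model with exp only
anova(reg2, reg1)
anova(reg3, reg1)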