Lecture 4 MLR - 1
Fu Ouyang
University of Queensland
Outline
Omitted Variable Bias (SW Section 6.1)
In the test score example:
TestScorei = β0 + β1 STRi + β2 Zi + ei
where Zi is the proportion of ESL (English as a Second Language) students in district i. Now, let’s assume that the following stories are reasonable:
▶ STR ↑ ⇒ TestScore ↓. That is, β1 < 0
▶ Zi ↑ ⇒ English skill ↓ ⇒ TestScorei ↓. So, Zi is part of ui. Indeed, β2 < 0
▶ Zi ↑ ⇒ Educ Budget ↓ ⇒ STRi ↑. So, corr(Zi, STRi) > 0
Hence, if the equation without Zi is estimated,
TestScorei = β0 + β1 STRi + ui ,   where ui = β2 Zi + ei ,
the effect of Zi on TestScore will be partially absorbed into the effect of STR
on TestScore.
That is, since β2 < 0 and corr(Zi, STRi) > 0, the OLS estimate of β1 is biased downward: it overstates (in absolute value) the negative effect of STR on TestScore.
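To see this mechanically, here is a minimal R simulation sketch (all numbers hypothetical) with β1 = −2, β2 = −20, and corr(Zi, STRi) > 0; the regression that omits Z delivers an estimate of β1 well below −2:

set.seed(42)
n   <- 1000
Z   <- runif(n)                          # proportion of ESL students
STR <- 18 + 4 * Z + rnorm(n)             # budget story: corr(Z, STR) > 0
e   <- rnorm(n, sd = 5)
TestScore <- 700 - 2 * STR - 20 * Z + e  # true beta1 = -2, beta2 = -20

coef(lm(TestScore ~ STR + Z))  # long regression: estimate of beta1 near -2
coef(lm(TestScore ~ STR))      # short regression: well below -2 (biased down)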
What does the sample say about this?
Causality and regression analysis
What precisely do we want to estimate when we run a regression?
What is a causal effect?
Back to class size
How does our observational data differ from this ideal?
▶ (Randomization + control group) ⇒ any differences between the
treatment and control groups are random – not systematically related to
the treatment
▶ We can eliminate the difference in PctEL between the large (control) and
small (treatment) groups by examining the effect of class size among
districts with the same PctEL.
▶ If the only systematic difference between the large and small class size
groups is in PctEL, then we are back to the randomized controlled
experiment – within each PctEL group.
▶ This is one way to control for the effect of PctEL when estimating the effect of STR (see the R sketch below).
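As a concrete sketch of this idea, assuming the CASchools data from R’s AER package (variable names english, students, teachers, read, math; any comparable dataset works the same way), one can estimate the STR effect separately within PctEL quartiles:

library(AER)                 # assumed source of the California school data
data("CASchools")
CASchools$STR   <- CASchools$students / CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math) / 2
CASchools$elq   <- cut(CASchools$english, quantile(CASchools$english),
                       include.lowest = TRUE)   # PctEL quartiles

# STR effect estimated separately within each PctEL quartile:
by(CASchools, CASchools$elq, function(d) coef(lm(score ~ STR, data = d))["STR"])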
Return to omitted variable bias
The Population Multiple Regression Model (SW Section 6.2)
Yi = β0 + β1 X1i + β2 X2i + ui , i = 1, . . . , n
▶ Y is the dependent variable (or LHS variable)
▶ X1, X2 are the two independent variables (regressors, RHS variables)
▶ (Yi, X1i, X2i) denote the i-th observation on Y, X1, and X2.
▶ β0 = unknown population intercept
▶ β1 = effect on Y of a change in X1 , holding X2 constant
▶ β2 = effect on Y of a change in X2 , holding X1 constant
▶ ui = the regression error (omitted factors)
Interpretation of coefficients in multiple regression
Yi = β0 + β1 X1i + β2 X2i + ui , i = 1, . . . , n
▶ Consider changing X1 by ∆X1 while holding X2 constant.
▶ Population regression line before the change:
Y = β0 + β1 X1 + β2 X2
▶ Population regression line after the change:
Y + ∆Y = β0 + β1 (X1 + ∆X1) + β2 X2
▶ Difference: ∆Y = β1 ∆X1. So,
β1 = ∆Y/∆X1, holding X2 constant
(similarly, β2 = ∆Y/∆X2, holding X1 constant)
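A quick numeric check of this derivation, with hypothetical coefficient values:

beta0 <- 700; beta1 <- -2; beta2 <- -20   # hypothetical values
X1 <- 20; X2 <- 0.1; dX1 <- 2

Y_before <- beta0 + beta1 * X1 + beta2 * X2
Y_after  <- beta0 + beta1 * (X1 + dX1) + beta2 * X2   # X2 held constant
Y_after - Y_before                                    # equals beta1 * dX1 = -4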
The OLS Estimator in Multiple Regression (SW Section 6.3)
Example: the California test score data
Multiple regression in R
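A minimal sketch of the regression behind this slide, again assuming the CASchools data prepared as above:

fit <- lm(score ~ STR + english, data = CASchools)
summary(fit)   # coefficients, SER ("residual standard error"), R^2, adjusted R^2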
Measures of Fit for Multiple Regression (SW Section 6.4)
SER and RMSE
▶ As in regression with a single regressor, the SER and the RMSE are
measures of the spread of the Y ’s around the regression line:
SER = √( (1/(n − k − 1)) Σi ûi² )

RMSE = √( (1/n) Σi ûi² )

where the sums run over i = 1, . . . , n, ûi is the i-th OLS residual, and k is the number of regressors (here, k = 2).
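Both measures can be computed directly from the residuals (a sketch; fit and CASchools as defined earlier, so k = 2):

u_hat <- residuals(fit)
n <- length(u_hat); k <- 2                # two regressors: STR and english

SER  <- sqrt(sum(u_hat^2) / (n - k - 1))  # matches summary(fit)'s residual standard error
RMSE <- sqrt(sum(u_hat^2) / n)            # same, without the degrees-of-freedom correction
c(SER = SER, RMSE = RMSE)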
R² and adjusted R²
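As a sketch, both measures can be computed by hand from the fit above and checked against summary(fit); the adjusted R² = 1 − ((n − 1)/(n − k − 1)) · SSR/TSS penalizes adding regressors:

SSR <- sum(residuals(fit)^2)
TSS <- sum((CASchools$score - mean(CASchools$score))^2)
n <- length(residuals(fit)); k <- 2

R2     <- 1 - SSR / TSS
adj_R2 <- 1 - ((n - 1) / (n - k - 1)) * SSR / TSS
c(R2, summary(fit)$r.squared)          # should agree
c(adj_R2, summary(fit)$adj.r.squared)  # should agree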
Measures of fit (continued)
The Least Squares Assumptions for Multiple Regression (SW Section 6.5)
The Sampling Distribution of the OLS Estimator (SW Section 6.6)
Multicollinearity, Perfect and Imperfect (SW Section 6.7)
Perfect multicollinearity (continued)
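A minimal R sketch of perfect multicollinearity: the added regressor below is an exact linear function of STR, so it carries no independent information and lm() drops it, reporting an NA coefficient:

CASchools$STR_frac <- CASchools$STR / 100                      # exact linear function of STR
coef(lm(score ~ STR + STR_frac + english, data = CASchools))   # STR_frac comes back NA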
Imperfect multicollinearity
Imperfect multicollinearity (continued)
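A small simulation sketch (hypothetical data) of imperfect multicollinearity: two nearly collinear regressors are still identified, but each coefficient is estimated much less precisely:

set.seed(1)
m  <- 200
x1 <- rnorm(m)
x2 <- x1 + rnorm(m, sd = 0.05)          # corr(x1, x2) is close to 1
y  <- 1 + 2 * x1 + rnorm(m)             # x2 has no independent effect

summary(lm(y ~ x1))$coefficients        # small standard error on x1
summary(lm(y ~ x1 + x2))$coefficients   # much larger standard errors on both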