Lecture 3 Multiple Regression Model-Estimation
The multiple regression model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$

$y$: dependent variable, explained variable, response variable, regressand, left-hand-side variable
$x_1, \dots, x_k$: independent variables, explanatory variables, regressors, covariates, control variables, right-hand-side variables
$\beta_0$: intercept, constant term
$\beta_1, \dots, \beta_k$: slope parameters, coefficients
$u$: error term, disturbance, unobservables
Motivation for multiple regression
1. Incorporate more explanatory factors into the model
2. Explicitly hold fixed other factors that otherwise
would be in u
3. Allow for more flexible functional forms
Example: quadratic consumption function

$$cons = \beta_0 + \beta_1\, inc + \beta_2\, inc^2 + u$$

$$\frac{\Delta cons}{\Delta inc} = \beta_1 + 2\beta_2\, inc$$

By how much does consumption increase if income is increased by one unit? It depends on how much income is already there.
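As a quick numerical illustration (the coefficient values below are hypothetical, chosen only for illustration, not estimates from any data), the marginal effect of income can be evaluated at different income levels:

```python
# Marginal effect of income in the quadratic model
# cons = b0 + b1*inc + b2*inc^2 + u  =>  d(cons)/d(inc) = b1 + 2*b2*inc
b1, b2 = 0.8, -0.002   # hypothetical values

def marginal_effect(inc):
    """Change in consumption per one-unit increase in income."""
    return b1 + 2 * b2 * inc

for inc in (10, 50, 100):
    print(f"inc = {inc:>3}: d(cons)/d(inc) = {marginal_effect(inc):.3f}")
```

With these values the marginal effect shrinks as income grows, which is exactly the kind of flexible functional form motivation 3 refers to.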
Example 4:
CEO salary, sales and CEO tenure
Regression residuals
$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \dots - \hat{\beta}_k x_{ik}$$
Minimize sum of squared residuals
$$\min_{\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k} \sum_{i=1}^{n} \hat{u}_i^2$$
In practice, the minimization is carried out by statistical software.
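A minimal sketch of what that computation does (simulated data with hypothetical coefficients, numpy only; in practice one would use a package such as statsmodels):

```python
import numpy as np

# Simulate a sample from y = 1 + 2*x1 - 0.5*x2 + u (hypothetical values)
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=n)

# OLS minimizes the sum of squared residuals; lstsq solves this
# least-squares problem directly.
X = np.column_stack([np.ones(n), x1, x2])   # constant column for the intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
print("beta_hat:", beta_hat)                # close to (1.0, 2.0, -0.5)
print("SSR:", residuals @ residuals)
```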
Interpretation of the multiple regression model

$$\frac{\Delta y}{\Delta x_j} = \beta_j$$

Ideally, we want to interpret it as: "By how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables constant?"

This is the "ceteris paribus" interpretation.
FORMAL INTERPRETATION

Formally, if

$$E(u \mid x_1, \dots, x_k) = 0,$$

then

$$E(y \mid x_1, \dots, x_k) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$

and

$$\beta_j = \frac{\partial\, E(y \mid x_1, \dots, x_k)}{\partial x_j}$$

"By how much does the expected (average) value of the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables constant?"
Interpretation

Holding ACT fixed, one more point of hsGPA is associated with .453 more points of college grade point average.

Or: if we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 points higher than that of student B.
Algebraic properties of OLS:

$$\sum_{i=1}^{n} \hat{u}_i = 0 \qquad\qquad \sum_{i=1}^{n} \hat{u}_i\, x_{ij} = 0$$

The residuals sum to zero and are uncorrelated with each regressor in the sample. The OLS regression line is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k$$
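Both conditions are just the OLS first-order conditions, so they can be checked numerically; a small sketch refitting the same simulated sample as above:

```python
import numpy as np

# Refit the simulated example and check the OLS first-order conditions.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat

print(u_hat.sum())        # ~0: residuals sum to zero
print(X.T @ u_hat)        # ~0 vector: residuals orthogonal to each regressor
```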
Interpretation:

A longer average prior sentence increases the number of arrests (?)

The additional explanatory power is limited, as R-squared increases only slightly.
General remark on R-squared

Even if R-squared is small (as in the given example), the regression may still provide good estimates of ceteris paribus effects.
QUESTION
3. Which of the following is true of R2?
a. R2 is also called the standard error of regression.
b. A low R2 indicates that the Ordinary Least Squares line fits
the data well.
c. R2 usually decreases with an increase in the number of
independent variables in a regression.
d. R2 shows what percentage of the total variation in the
dependent variable, Y, is explained by the explanatory
variables.
Each data point therefore follows the population equation:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$$
Standard Assumptions
Assumption MLR.3 (No perfect collinearity)
In the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables.
Remarks on MLR.3

The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed.

Example 2 (perfect collinearity):

$$cons = \beta_0 + \beta_1\, husbIncome + \beta_2\, wifeIncome + \beta_3\, familyIncome + u$$

(violates MLR.3 because $familyIncome = husbIncome + wifeIncome$, an exact linear relationship)

Example 3 (perfect collinearity):

$$expenditure = \beta_0 + \beta_1\, deathrate + \beta_2\, surviverate + u$$

(violates MLR.3 because the death rate and the survival rate sum to a constant)
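A short numerical sketch (hypothetical income figures) of why this is fatal for estimation: when one regressor is an exact linear combination of the others, the design matrix loses rank and the OLS coefficients are not uniquely determined.

```python
import numpy as np

# Hypothetical incomes for three households.
husb = np.array([30.0, 45.0, 50.0])
wife = np.array([25.0, 40.0, 60.0])
family = husb + wife          # exact linear combination -> perfect collinearity

X = np.column_stack([np.ones(3), husb, wife, family])
print(np.linalg.matrix_rank(X))   # 3, not 4: the columns are linearly dependent
```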
QUESTION
5. If an independent variable in a multiple linear regression
model is an exact linear combination of other independent
variables, the model suffers from the problem of _____.
a. perfect collinearity
b. homoskedasticity
c. heteroskedasticity
d. omitted variable bias
Assumption MLR.4 (Zero conditional mean)

$$E(u \mid x_1, x_2, \dots, x_k) = 0 \quad \text{in the population}$$

This is the key assumption for unbiasedness of the estimators and for the "ceteris paribus" interpretation.
Omitted variable bias
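In the simplest case, the true model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$ but $x_2$ is omitted from the estimated regression. The standard result for the resulting slope estimator $\tilde{\beta}_1$ is

$$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1, \qquad E(\tilde{\beta}_1) = \beta_1 + \beta_2 \delta_1,$$

where $\tilde{\delta}_1$ is the slope from regressing $x_2$ on $x_1$. The bias $\beta_2 \delta_1$ vanishes only if $x_2$ is irrelevant ($\beta_2 = 0$) or if $x_1$ and $x_2$ are uncorrelated ($\delta_1 = 0$).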
Omitted variable bias: more general cases
True model
(contains x1, x2 and x3)
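A minimal simulation sketch of this situation (the data-generating process, coefficient values, and the choice of x3 as the omitted variable are all hypothetical):

```python
import numpy as np

# True model: y = 1 + 2*x1 + 1*x2 + 3*x3 + u, with x3 correlated with x1.
rng = np.random.default_rng(42)
n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.5 * x1 + rng.normal(size=n)        # corr(x1, x3) > 0
y = 1 + 2 * x1 + 1 * x2 + 3 * x3 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(np.column_stack([np.ones(n), x1, x2, x3]), y)
short = ols(np.column_stack([np.ones(n), x1, x2]), y)   # x3 omitted

print("full model  b1:", full[1])    # ~2.0 (unbiased)
print("short model b1:", short[1])   # ~2.0 + 3*0.5 = 3.5 (biased upward)
```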
QUESTION
7. Exclusion of a relevant variable from a multiple linear
regression model leads to the problem of _____.
a. misspecification of the model
b. multicollinearity
c. perfect collinearity
d. homoskedasticity
All explanatory variables are
collected in a random vector
Theorem 3.2 (Sampling variances of OLS slope estimators)
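For reference, the variance formula the following discussion refers to is the standard one: under assumptions MLR.1 - MLR.5,

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j\, (1 - R_j^2)}, \qquad j = 1, \dots, k,$$

where $\sigma^2 = \operatorname{Var}(u)$ is the error variance, $SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ is the total sample variation in $x_j$, and $R_j^2$ is the R-squared from regressing $x_j$ on all other explanatory variables.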
Components of OLS Variances:

1) The error variance

A high error variance increases the sampling variance because there is more "noise" in the equation.

A large error variance makes estimates imprecise, other things being equal.

The error variance does not decrease with sample size.

For precise estimates of the differential effects, one would need information about situations where expenditure categories change differentially.
Discussion of the multicollinearity problem
In the above example, it might be better to lump all expenditure
categories together because effects cannot be disentangled.
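In terms of the variance formula above, multicollinearity works through the factor $1/(1 - R_j^2)$, the variance inflation factor (VIF), which blows up as the regressors become more highly correlated. A small sketch (simulated data) computing it directly:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor 1/(1 - R_j^2): regress column j on the rest."""
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 / (1 - r2)

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # highly correlated with x1
x3 = rng.normal(size=n)                      # unrelated

X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])  # large VIFs for x1 and x2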
The estimated sampling variation of the estimated slope coefficient $\hat{\beta}_j$ is its standard error:

$$se(\hat{\beta}_j) = \frac{\hat{\sigma}}{\sqrt{SST_j\, (1 - R_j^2)}}$$

where $\hat{\sigma}^2 = SSR/(n - k - 1)$ is the estimated error variance.
Efficiency of OLS: The Gauss-Markov Theorem

Under assumptions MLR.1 - MLR.5, OLS is unbiased.

However, under these assumptions there may be many other estimators that are unbiased.

Which one is the unbiased estimator with the smallest variance?

In order to answer this question, one usually limits oneself to linear estimators, i.e. estimators that are linear in the dependent variable.

Answer (Gauss-Markov Theorem): under MLR.1 - MLR.5, OLS is the best linear unbiased estimator (BLUE) of the regression coefficients.
QUESTION
9. Find the degrees of freedom in a regression model that has 10 observations and 7 independent variables.
a. 17
b. 2
c. 3
d. 4