
5 - 7: Multiple Regression Analysis - Estimation∗
∗ Reference: WR, Chapter 3

Learning Objectives

• Multiple Regression Analysis


• Mechanics & Interpretation of OLS
• Expected Value of the OLS Estimators
• Variance of OLS Estimators
• Efficiency of OLS

Multiple Regression (MR) Analysis

• Allows us to explicitly control for many factors that affect the dependent variable.
• Hence, more amenable to ceteris paribus analysis.
• Naturally, if we add more factors to our model, then more of the variation in y can be
explained.
• It can also incorporate fairly general functional form relationships.

MR: Model with Two Independent Variables

• Wage Example.
– wage = β0 + β1 educ + β2 exper + u.
– Wage is determined by the two independent variables.
– Other unobserved factors are contained in u.
– MR effectively takes exper out of u and puts it explicitly in the equation.
– We still have to make assumptions about how u is related to educ and exper (zero
conditional mean).
– However, we can be confident of one thing.
– Because the equation contains exper explicitly, we will be able to measure the effect of educ
on wage, holding exper fixed.
– In a simple regression analysis - which puts exper in the error term - we would have to
assume that exper is uncorrelated with educ, a tenuous assumption.

MR: Model with Two Independent Variables

• Average Test Score Example.


– Effect of per-student spending (expend) on the average standardized test score (avgscore)
at the high school level.
– avgscore = β0 + β1 expend + β2 avginc + u.
– avginc: average family income.
– By including avginc explicitly in the model, we are able to control for its effect on
avgscore.
– In simple regression, avginc would be included in u, which would likely be correlated
with expend, causing the OLS estimator of β1 to be biased.

MR: Model with Two Independent Variables

• General model with two independent variables


– y = β0 + β1 x1 + β2 x2 + u.
– The key assumption about how u is related to x1 and x2 is the zero conditional mean assumption:
– E(u|x1 , x2 ) = 0.

MR & Functional Forms

• Family Consumption Example.


– Suppose family consumption (cons) is a quadratic function of family income (inc).
– cons = β0 + β1 inc + β2 inc2 + u.
– Mechanically, there will be no difference in using the method of OLS.
– However, there is an important difference in how one interprets the parameters.
– It makes no sense to measure the effect of inc on cons while holding inc2 fixed.
– Instead, the change in consumption with respect to the change in income - the marginal
propensity to consume - is approximated by:
– ∆cons/∆inc ≈ β1 + 2β2 inc.
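
A small numerical sketch of this approximation; the coefficient values below are hypothetical, chosen only for illustration:

```python
# Hypothetical coefficients for cons = b0 + b1*inc + b2*inc^2 + u (illustrative values only)
b1, b2 = 0.80, -0.002

def mpc(inc):
    """Approximate marginal propensity to consume at income level `inc`: b1 + 2*b2*inc."""
    return b1 + 2 * b2 * inc

for inc in (10, 50, 100):   # income in arbitrary units
    print(f"inc = {inc:>3}: MPC ≈ {mpc(inc):.3f}")
```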

MR with k Independent Variables

• Multiple regression analysis allows many observed factors to affect y.


• The general multiple linear regression (MLR) model
– y = β0 + β1 x1 + β2 x2 + ... + βk xk + u.
– The terminology is similar to that for simple regression.
– There will always be factors that we cannot include; these are contained in u.
– The key assumption in terms of a conditional expectation:
– E(u|x1 , x2 , ..., xk ) = 0.
– At a minimum, this requires that all factors in u be uncorrelated with the explanatory
variables.
– It also means that we have correctly accounted for the functional relationships.

Mechanics & Interpretation of OLS

• OLS for two independent variables.


– ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
– The method of OLS chooses the estimates to minimize the sum of squared residuals:
– ∑_{i=1}^n ûi² = ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2)².

Mechanics & Interpretation of OLS

• Case with k independent variables.


– OLS: ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk .

– The method of OLS chooses the estimates to minimize the sum of squared residuals:
– ∑_{i=1}^n ûi² = ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − ... − β̂k xik)².
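
A minimal sketch of this minimization on simulated data, assuming numpy is available; all variable names and parameter values are illustrative. numpy's least-squares routine returns the β̂j that minimize the sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                                   # sample size and number of regressors
X = rng.normal(size=(n, k))                     # simulated explanatory variables
beta_true = np.array([1.0, 0.5, -2.0, 3.0])     # assumed [b0, b1, b2, b3], for illustration
u = rng.normal(size=n)
y = beta_true[0] + X @ beta_true[1:] + u

Xd = np.column_stack([np.ones(n), X])           # add an intercept column
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # minimizes the sum of squared residuals
resid = y - Xd @ beta_hat

print("beta_hat:", np.round(beta_hat, 3))
print("SSR:", np.round(resid @ resid, 3))
print("mean residual (≈ 0 with an intercept):", np.round(resid.mean(), 10))
```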

Mechanics & Interpretation of OLS: Interpretation

• Two independent variables.


– ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
– β̂1 and β̂2 have partial effect interpretations. We can obtain the predicted change in y given
the changes in x1 and x2 .
– ∆ŷ = β̂1 ∆x1 + β̂2 ∆x2 .
– In particular, when x2 is held fixed.
– ∆ŷ = β̂1 ∆x1 .

Mechanics & Interpretation of OLS: Interpretation

• Two independent variables: Determinants of College GPA.


– OLS line to predict college GPA from high school GPA and achievement test score.
– colGPÂ = 1.29 + 0.453 hsGPA + 0.009 ACT.
– Because no one has either a zero on hsGPA or ACT, the intercept in this equation is
not, by itself, meaningful.
– Positive partial relationship between colGP A and hsGP A.
– Holding ACT fixed, another point on hsGPA is associated with .453 of a point higher
colGPA.

Mechanics & Interpretation of OLS: Interpretation

• More than two independent variables.


– ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk .
– ∆ŷ = β̂1 ∆x1 + β̂2 ∆x2 + ... + β̂k ∆xk .
– β̂1 measures the change in ŷ due to a one-unit increase in x1 , holding all other variables
fixed.
– ∆ŷ = β̂1 ∆x1 .

Mechanics & Interpretation of OLS: Interpretation

• More than two independent variables: Hourly Wage Example.


– log(wagê) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure.
– Holding exper and tenure fixed, another year of education is predicted to increase the
wage by 9.2%.

MR: Holding other Factors Fixed

• College GPA Example.


– colGPÂ = 1.29 + 0.453 hsGPA + 0.009 ACT.
– The coefficient on ACT measures the predicted difference in colGP A, holding hsGP A
fixed.
– MR provides this ceteris paribus interpretation even though the data have not been
collected in a ceteris paribus fashion.

– It is as if we went out and sampled people with the same high school GPA but possibly
different ACT scores.
– If we could collect a sample of individuals with the same high school GPA, then we
could perform a simple regression analysis relating colGP A to ACT .
– MR effectively allows us to mimic this situation without restricting the values of any
independent variables.
– MR allows us to keep other factors fixed in nonexperimental environments.

MR: Changing More Than One x Simultaneously

• Hourly Wage Example


– log(wagê) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure.
– Estimated effect on wage when exper and tenure both increase by 1 year (holding educ
fixed).
– ∆log(wagê) = 0.0041 ∆exper + 0.022 ∆tenure = 0.0041 + 0.022 = 0.0261 ≈ 2.61%.
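
A quick arithmetic check of this calculation, using the coefficients from the fitted equation above; the exact percentage change uses the exponential, while 2.61% is the usual log approximation:

```python
import math

b_exper, b_tenure = 0.0041, 0.022                  # coefficients from the fitted equation above
d_log_wage = b_exper * 1 + b_tenure * 1            # exper and tenure each rise by 1 year
print(f"approximate change: {100 * d_log_wage:.2f}%")            # 2.61%
print(f"exact change: {100 * (math.exp(d_log_wage) - 1):.2f}%")  # about 2.64%
```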

MR: Fitted Values & Residuals

• Fitted values
– ŷi = β̂0 + β̂1 xi1 + β̂2 xi2 + ... + β̂k xik .
• Residuals
– ûi = yi − ŷi .

MR: “Partialling Out” Interpretation

• Consider the case of two independent variables.


– ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
– One way to express β̂1 is: β̂1 = (∑_{i=1}^n r̂i1 yi) / (∑_{i=1}^n r̂i1²).
– r̂i1 are the OLS residuals from a simple regression of x1 on x2 .
– We regress x1 on x2 , and then obtain the residuals.
– Then do a simple regression of y on r̂1 to obtain β̂1 .
• Partial effect interpretation.
– r̂i1 is xi1 after the effect of xi2 has been partialled out, or netted out.
– Thus, β̂1 measures the sample relationship between y and x1 after x2 has been partialled
out.
– In the general model with k explanatory variables, r̂i1 comes from the regression of x1 on
x2 , ..., xk .
– Thus, β̂1 measures the relationship between y and x1 after x2 , ..., xk have been partialled
out.
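
A minimal sketch of the partialling-out result on simulated data (names and values are illustrative): the coefficient on x1 from the multiple regression equals ∑ r̂i1 yi / ∑ r̂i1².

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)          # x1 correlated with x2
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients with an intercept prepended to the regressors."""
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

# Full multiple regression: element [1] is the coefficient on x1
beta_full = ols(np.column_stack([x1, x2]), y)

# Step 1: regress x1 on x2 and keep the residuals r1
g = ols(x2, x1)
r1 = x1 - (g[0] + g[1] * x2)

# Step 2: beta1_hat = sum(r1 * y) / sum(r1^2)
beta1_partial = (r1 @ y) / (r1 @ r1)

print("beta1 from multiple regression:", round(beta_full[1], 6))
print("beta1 from partialling out:   ", round(beta1_partial, 6))   # identical
```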

Comparison of SR & MR

• Equations
– SR: ỹ = β̃0 + β̃1 x1 .
– MR: ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
• Relationship between β̃1 and β̂1 .
– β̃1 = β̂1 + β̂2 δ̃1 .
– δ̃1 is the slope coefficient from the simple regression of x2 on x1 .

• Two cases when they are equal.
– The partial effect of x2 on ŷ is zero in the sample. That is, β̂2 = 0.
– x1 and x2 are uncorrelated in the sample. That is, δ̃1 = 0.
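
A quick numerical check of the identity β̃1 = β̂1 + β̂2 δ̃1 on simulated data (names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # x1 and x2 correlated in the sample
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

b0h, b1h, b2h = ols(np.column_stack([x1, x2]), y)   # multiple regression
_, b1t = ols(x1, y)                                  # simple regression of y on x1
_, d1t = ols(x1, x2)                                 # delta1: slope from regressing x2 on x1

print("simple-regression slope :", round(b1t, 6))
print("b1_hat + b2_hat * delta1:", round(b1h + b2h * d1t, 6))   # identical
```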

Comparison of SR & MR

• Example: Participation in 401(k) pension plans.


– The effect of a plan’s match rate (mrate) on the participation rate (prate) in its 401(k)
pension plan.
– OLS: pratê = 80.12 + 5.52 mrate + 0.243 age.
– age is the age of the 401(k) plan.
– What happens if we do not control for age?
– OLS: pratê = 83.08 + 5.86 mrate.
– There is a difference in the coefficient on mrate, but it is not large.
– This can be explained by the fact that the sample correlation between mrate and age is
only .12.

Comparison of SR & MR

• In case with k independent variables.


• SR of y on x1 and MR of y on x1 , x2 , ..., xk produce an identical estimate of the coefficient
on x1 only if
– the OLS coefficients on x2 through xk are all zero, or
– x1 is uncorrelated with each of x2 , ..., xk .

MR: Goodness of Fit

• A way of measuring how well the independent variables explain the dependent variable.
• For a sample, we can define the following:
– Total Sum of Squares (SST) = ∑_{i=1}^n (yi − ȳ)².
– Explained Sum of Squares (SSE) = ∑_{i=1}^n (ŷi − ȳ)².
– Residual Sum of Squares (SSR) = ∑_{i=1}^n ûi².

MR: Goodness of Fit

• R-squared of the regression (also called coefficient of determination).


– R2 = SSE/SST = 1 − SSR/SST.
– Fraction of the sample variation in y that is explained by x1 , .., xk .
– The value of R2 is always between zero and one, because SSE can be no greater than
SST.
– R2 can also be shown to equal the squared correlation coefficient between the actual yi
and the fitted values ŷi :
– R2 = [∑_{i=1}^n (yi − ȳ)(ŷi − ȳ̂)]² / [∑_{i=1}^n (yi − ȳ)² · ∑_{i=1}^n (ŷi − ȳ̂)²],
where ȳ̂ denotes the sample average of the fitted values.
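
A short sketch computing the three sums of squares and both expressions for R² on simulated data (names and values are illustrative); with an intercept in the model the printed values coincide:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.7 * x1 - 0.3 * x2 + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(Xd, y, rcond=None)[0]
y_hat = Xd @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)        # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
SSR = np.sum(u_hat ** 2)                 # residual sum of squares

r2 = SSE / SST                                  # = 1 - SSR/SST
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2      # squared correlation form
print(round(r2, 6), round(1 - SSR / SST, 6), round(r2_corr, 6))   # all equal
```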

MR: Goodness of Fit

• Example: Determinants of College GPA.


– colGPÂ = 1.29 + 0.453 hsGPA + 0.009 ACT.
– n = 141, R2 = 0.176.

– This means that hsGP A and ACT together explain about 17.6% of the variation in
colGP A.

MR: Goodness of Fit

• Important fact about R2 .


– It never decreases, and it usually increases, when another independent variable is added
to a regression and the same set of observations is used for both regressions.
– This is an algebraic fact: SSR never increases when additional regressors are added to
the model.
– However, it assumes we do not have missing data on the explanatory variables.
– If two regressions use different sets of observations, then, in general, we cannot tell how
the R2 will compare.

MR: Regression through Origin

• OLS estimation when the intercept is zero.


– ỹ = β̃1 x1 + β̃2 x2 + ... + β̃k xk .
– The OLS estimates minimize the sum of squared residuals, but with the intercept set at
zero.
– However, several properties of OLS no longer hold for regression through the origin.
– In particular, the OLS residuals no longer have a zero sample average.
– One serious drawback: If β0 in the population model is different from 0, then the β̂j will
be biased.

Expected Value of the OLS Estimators

• Assumptions, under which the OLS estimators are unbiased for the population parameters.
– Assumption MLR 1: Linear in Parameters.
– Assumption MLR 2: Random Sampling.
– Assumption MLR 3: No Perfect Collinearity.
– Assumption MLR 4: Zero Conditional Mean.

Expected Value of the OLS Estimators

• Assumption MLR 1: Linear in Parameters.


– Simply defines the MLR model.
– The population model can be written as:
– y = β0 + β1 x1 + β2 x2 + ... + βk xk + u.

Expected Value of the OLS Estimators

• Assumption MLR 2: Random Sampling.


– We have a random sample following the population model: {(xi1 , xi2 , ..., xik , yi ) : i =
1, 2, .., n}.
– Equation of a particular observation i, in terms of population model.
– yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + ui .
– The term ui contains the unobserved factors for ith observation.
– OLS chooses the estimates β̂0 , β̂1 , ..., β̂k so that the residuals average to zero and the sample
correlation between each independent variable and the residuals is zero.

Expected Value of the OLS Estimators

• Assumption MLR 3: No Perfect Collinearity.


– None of the independent variables is constant, and there are no exact linear relationships
among the independent variables.
– If an independent variable is an exact linear combination of the other independent
variables, then we say the model suffers from perfect collinearity, and it cannot be
estimated by OLS.
– MLR.3 does allow the independent variables to be correlated; they just cannot be
perfectly correlated.
– The simplest way that two independent variables can be perfectly correlated is when
one variable is a constant multiple of another.
– e.g., including the same variable measured in different units in a regression equation.

Expected Value of the OLS Estimators

• Assumption MLR 3: No Perfect Collinearity.


– Nonlinear functions of the same variable can appear among the regressors. e.g. inc and
inc2 .
– Some caution is needed in log models.
– log(cons) = β0 + β1 log(inc) + β2 log(inc2 ) + u.
– x1 = log(inc) and x2 = log(inc2 )
– Using basic properties of natural log: log(inc2 ) = 2log(inc).
– This means, x2 = 2x1 . Hence perfect collinearity.
– Instead, include [log(inc)]2 in the equation.

Expected Value of the OLS Estimators

• Assumption MLR 3: No Perfect Collinearity.


– The solution to the perfect collinearity is simple: drop one of the linearly related
variables.
– MLR.3 also fails if the sample size is too small in relation to the number of parameters
being estimated.
– If the model is carefully specified and n ≥ k + 1, Assumption MLR.3 fails only in rare
cases.

Expected Value of the OLS Estimators

• Assumption MLR 4: Zero Conditional Mean.


– E(u|x1 , x2 , .., xk ) = 0.
– One way that MLR.4 can fail is if the functional relationship is misspecified.
– e.g., omitting a needed quadratic or logarithmic term.
– Omitting an important factor that is correlated with any of x1 , x2 , ..., xk causes MLR.4
to fail also.
– Problem of measurement error in an explanatory variable: Failure of MLR.4.
– When Assumption MLR.4 holds, we say that we have exogenous explanatory variables.
– If xj is correlated with u for any reason, then xj is said to be an endogenous explanatory
variable.

Expected Value of the OLS Estimators

• Unbiasedness of OLS: Under Assumptions MLR.1 through MLR.4.


– E(β̂j ) = βj , j = 0, 1, 2, ..., k.
– The OLS estimators are unbiased estimators of the population parameters.
– Exact meaning: the procedure by which the OLS estimates are obtained is unbiased when
we view the procedure as being applied across all possible random samples.
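
A minimal Monte Carlo sketch of what unbiasedness means (the population parameters below are assumed, purely for illustration): averaging the OLS estimates across many random samples drawn from the same population model recovers the true coefficients.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, 2.0, -0.5])      # assumed population parameters [b0, b1, b2]
n, reps = 100, 5000

est = np.empty((reps, 3))
for r in range(reps):
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    u = rng.normal(size=n)                          # E(u|x1, x2) = 0 holds by construction
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    Xd = np.column_stack([np.ones(n), x1, x2])
    est[r] = np.linalg.lstsq(Xd, y, rcond=None)[0]

print("average estimates:", np.round(est.mean(axis=0), 3))   # close to [1.0, 2.0, -0.5]
```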

Expected Value of the OLS Estimators

• Including Irrelevant Variables in a Regression Model.


– Overspecifying the model.
– Population model: y = β0 + β1 x1 + β2 x2 + β3 x3 + u.
– This model satisfies Assumptions MLR.1 through MLR.4.
– However, x3 has no effect on y after x1 and x2 have been controlled for.
– E(y|x1 , x2 , x3 ) = E(y|x1 , x2 ) = β0 + β1 x1 + β2 x2
– The variable x3 may or may not be correlated with x1 and x2 .
– Including one or more irrelevant variables does not affect the unbiasedness of the OLS
estimators.
– However, there are undesirable effects on the variances of the OLS estimators.

Expected Value of the OLS Estimators

• Omitted Variable Bias: The Simple Case.


– The problem of excluding a relevant variable, or underspecifying the model (misspecification
analysis).
– Suppose, the true population model is: y = β0 + β1 x1 + β2 x2 + u.
– However, due to our ignorance or data unavailability, we estimate the model by excluding
x2 .
– ỹ = β̃0 + β̃1 x1 .
– Deriving expected value of β̃1 .
– E(β̃1 ) = β1 + β2 δ̃1 .
– δ̃1 is the slope from the simple regression of x2 on x1 .
– Hence, bias in β̃1 can be derived as:
– Bias(β̃1 ) = β2 δ̃1 (also called the omitted variable bias).

Expected Value of the OLS Estimators

• Omitted Variable Bias: The Simple Case.


– Two cases in which β̃1 is unbiased:
– β2 = 0.
– δ̃1 = 0: x1 and x2 are uncorrelated in the sample.
– However, in reality we do not observe x2 .
– Hence, we can usually only reason about the likely sign of β2 and the sign of the correlation
between x1 and x2 .

Expected Value of the OLS Estimators

• Omitted Variable Bias: The Simple Case.


– Summary of bias: the sign of Bias(β̃1 ) is the sign of β2 δ̃1 (see the sketch below).
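
The sign of the bias follows directly from Bias(β̃1) = β2 δ̃1, since the sign of δ̃1 is the sign of the sample correlation between x1 and x2; a tiny illustrative helper (hypothetical, not from the source) makes the cases explicit:

```python
def bias_direction(sign_beta2: int, sign_corr_x1_x2: int) -> str:
    """Sign of Bias(beta1_tilde) = beta2 * delta1, where sign(delta1) = sign(corr(x1, x2))."""
    s = sign_beta2 * sign_corr_x1_x2
    return {1: "positive (upward) bias", -1: "negative (downward) bias", 0: "no bias"}[s]

# e.g. beta2 > 0 and corr(x1, x2) > 0  ->  upward bias
print(bias_direction(+1, +1))   # positive (upward) bias
print(bias_direction(+1, -1))   # negative (downward) bias
```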

Expected Value of the OLS Estimators

• Omitted Variable Bias: Hourly Wage Example


– Suppose true model is: log(wage) = β0 + β1 educ + β2 abil + u.
– However, we do not have data for abil: ability.
– Hence we obtain: log(wagê) = 0.584 + 0.083 educ.
– This is the result from only a single sample, so we cannot say that .083 is greater than
β1 .
– Nevertheless, the average of the estimates across all random samples would be too large.

Expected Value of the OLS Estimators

• Omitted Variable Bias.


– Upward Bias: E(β̃1 ) > β1 .
– Downward Bias: E(β̃1 ) < β1 .
– Biased towards zero: Cases where E(β̃1 ) is closer to 0 than is β1 .
– If β1 is positive, then β̃1 is biased toward zero if it has a downward bias.
– If β1 is negative, then β̃1 is biased toward zero if it has an upward bias.

Expected Value of the OLS Estimators

• Omitted Variable Bias: General Case.


– Deriving the sign of bias when there are multiple regressors is more difficult.
– Correlation between a single x and u generally results in all OLS estimators being biased.
• Wage Example.
– Suppose, true model: wage = β0 + β1 educ + β2 exper + β3 abil + u.
– If abil is omitted, the estimators of both β1 and β2 are generally biased, even if we assume
exper is uncorrelated with abil.
– We can get an idea of the direction of the bias in the estimator of β1 only if we also assume
that educ and exper are uncorrelated.
– Then we can say: Because β3 > 0 and educ and abil are positively correlated, β̃1 would
have an upward bias.

Variance of OLS Estimators

• Assumption MLR 5: Homoscedasticity.
– Var(u|x1 , x2 , ..., xk ) = σ².
– Formulas are simplified.
– OLS has an important efficiency property.
– Assumptions MLR.1 through MLR.5 are collectively known as the Gauss-Markov
assumptions.
– We use x (bold x) to denote all the independent variables.
– Var(y|x) = σ².

Variance of OLS Estimators

• Sampling variances of the OLS slope estimators.


– Under Assumptions MLR.1 through MLR.5,
– Var(β̂j ) = σ² / [SSTj (1 − Rj²)].
– SSTj = ∑_{i=1}^n (xij − x̄j )².
– Rj2 is the R-squared from regressing xj on all other independent variables.
– A large variance means a less precise estimator: larger confidence intervals and less
accurate hypothesis testing.

Variance: Components of OLS Variances

• The Error Variance, σ 2 .


– Larger σ 2 means larger sampling variances.
– More “noise” in the equation (larger σ 2 ) makes it difficult to estimate the partial effect.
– However, it is unknown and we need to estimate it.
• The total sample variance in xj , SSTj .
– The larger the total variation in xj , the smaller the variance of the estimator.
– One way to increase sample variation in xj : Increase the sample size.

Variance: Components of OLS Variances

• Linear relationships among the independent variables, Rj2 .


– Rj2 is the R-squared from regressing xj on all other independent variables.
– Two independent variables: y = β0 + β1 x1 + β2 x2 + u.
– Var(β̂1 ) = σ² / [SST1 (1 − R1²)].
– R12 is the R-squared from regressing x1 on x2 .
– R12 close to 1: Much of the variation in x1 is explained by x2 . High correlation.

Variance: Components of OLS Variances

• Linear relationships among the independent variables, Rj2 .


– General Case: Rj2 is the proportion of variation in xj that can be explained by the other
independent variables.
– Rj2 close to 0: when xj has near 0 sample correlation with every other independent
variable.

Variance: Components of OLS Variances

• Linear relationships among the independent variables, Rj2 .


– Multicollinearity: High (but not perfect) correlation between two or more independent
variables.
– Not a violation of Assumption MLR.3.
– Way to deal with the problem of multicollinearity: Increase the sample size.
– In addition, a high correlation between certain variables might be irrelevant.
– If we are interested in β1 , then a high correlation between x2 and x3 has no direct
effect on Var(β̂1 ).

Variance: Components of OLS Variances

• Linear relationships among the independent variables, Rj2 .


– A statistic to determine the severity of multicollinearity (however, it is easy to misuse):
– Variance Inflation Factor.
– VIFj = 1 / (1 − Rj²).
– Hence the variance can be written as:
– Var(β̂j ) = (σ² / SSTj ) · VIFj .
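
A short sketch of computing VIFj for each regressor on simulated data (names and values are illustrative), by regressing xj on the other regressors to obtain Rj²:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # x2 highly correlated with x1
x3 = rng.normal(size=n)                    # x3 roughly unrelated to the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), from regressing x_j on the other regressors (with intercept)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Xd = np.column_stack([np.ones(len(y)), others])
    y_hat = Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

for j in range(X.shape[1]):
    print(f"VIF_{j + 1} = {vif(X, j):.2f}")   # VIF_1 and VIF_2 large, VIF_3 near 1
```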

Variance: Variances in Misspecified models

• The choice of whether to include a particular variable in a regression model.


– The tradeoff between bias and variance.
• Suppose the True Model is: y = β0 + β1 x1 + β2 x2 + u.
• Two estimators of β1 .
– β̂1 from multiple regression: ŷ = β̂0 + β̂1 x1 + β̂2 x2 .
– β̃1 from simple regression: ỹ = β̃0 + β̃1 x1 .
– If β2 ≠ 0, β̃1 is biased, unless x1 and x2 are uncorrelated.
– However, β̂1 is unbiased for any value of β2 .
– Hence, if bias were the only criterion, β̂1 would be preferred.

Variance: Variances in Misspecified models

• However, variance is also important.


• Var(β̃1 ) is always smaller than Var(β̂1 ); if x1 and x2 are uncorrelated, the two are the same.
• Assuming x1 and x2 are correlated, the following conclusions can be drawn:
– When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, and Var(β̃1 ) < Var(β̂1 ).
– When β2 = 0, β̃1 and β̂1 are both unbiased, and Var(β̃1 ) < Var(β̂1 ).
• If β2 = 0, then including x2 will only increase the variance of the estimator of β1 .
• If β2 ≠ 0, then excluding x2 will result in a biased estimator of β1 .
– The choice therefore comes down to comparing the likely size of the omitted variable bias
with the reduction in variance from excluding x2 .

Variance: Variances in Misspecified models

• However, when β2 ≠ 0, there are two favorable reasons for including x2 in the model.
– The bias in β̃1 does not shrink as the sample size increases, whereas the variances of both
estimators shrink toward zero.
– The error variance σ² increases when x2 is dropped from the equation.

Variance: Estimating σ 2

• Unbiased estimator of σ 2 in general multiple regression.


• Under the Gauss-Markov assumptions MLR.1 through MLR.5, E(σ̂ 2 ) = σ 2 .
• σ̂² = (∑_{i=1}^n ûi²) / (n − k − 1) = SSR / (n − k − 1).
• σ̂ is called the standard error of the regression (also: the standard error of the estimate, or
the root mean squared error).

• se(β̂j ) = σ̂ / √[SSTj (1 − Rj²)]. This formula is invalid in the presence of heteroscedasticity.

• Thus, heteroscedasticity does not cause bias in β̂j , but it does cause bias in the usual formula
for Var(β̂j ), which then invalidates the standard errors.
• Standard errors can also be written as:
– se(β̂j ) = σ̂ / [√n · sd(xj ) · √(1 − Rj²)].
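
A minimal sketch on simulated data (names and values are illustrative; homoscedasticity holds by construction) that computes σ̂² = SSR/(n − k − 1) and the standard errors from the formula above:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=2.0, size=n)   # homoscedastic errors

X = np.column_stack([x1, x2])
k = X.shape[1]
Xd = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(Xd, y, rcond=None)[0]
u_hat = y - Xd @ beta_hat

sigma2_hat = (u_hat @ u_hat) / (n - k - 1)       # unbiased estimator of sigma^2

def se_slope(j):
    """se(beta_hat_j) = sqrt(sigma2_hat / (SST_j * (1 - R_j^2))), for slope j (0-based)."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)
    Zd = np.column_stack([np.ones(n), others])
    xj_hat = Zd @ np.linalg.lstsq(Zd, xj, rcond=None)[0]
    r2_j = 1 - np.sum((xj - xj_hat) ** 2) / np.sum((xj - xj.mean()) ** 2)
    sst_j = np.sum((xj - xj.mean()) ** 2)
    return np.sqrt(sigma2_hat / (sst_j * (1 - r2_j)))

print("sigma_hat:", round(float(np.sqrt(sigma2_hat)), 3))
print("se(b1), se(b2):", round(float(se_slope(0)), 4), round(float(se_slope(1)), 4))
```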

Efficiency of OLS: The Gauss-Markov Theorem

• Justifies the use of OLS rather than a variety of competing estimators.
• Gauss-Markov Theorem.
– Under assumptions MLR 1 through MLR 5, the OLS estimator β̂j for βj is the best
linear unbiased estimator (Efficient).
– Best: Having the smallest variance.
– Linear: the estimator can be expressed as a linear function of the data on the dependent variable.
– Unbiased: E(β̂j ) = βj .
– Estimator: It is a rule that can be applied to any sample of data to produce an estimate.
