Chap 7: Multiple Regression Analysis: The Problem of Estimation
The partial regression coefficient β2 in Eq. (7.1.13) measures the change in the mean value of Y per unit change in X2, holding X3 constant. Put differently, it gives the "direct" or the "net" effect of a unit change in X2 on the mean value of Y.
OLS estimators. To find the OLS estimators, let us first write the sample regression function (SRF) corresponding to the PRF of Eq. (7.1.1) as follows:

Yi = β̂1 + β̂2 X2i + β̂3 X3i + ûi    (7.4.1)
The OLS procedure consists of choosing the values of the unknown parameters so that the residual sum of squares (RSS) is as small as possible:

Σ ûi² = Σ (Yi − β̂1 − β̂2 X2i − β̂3 X3i)²    (7.4.2)

The simplest way is to differentiate Eq. (7.4.2) with respect to the unknowns, set the resulting expressions to zero, and solve them simultaneously. This procedure gives the following normal equations.
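The normal equations can be solved numerically. A minimal sketch with NumPy, using synthetic data with illustrative coefficients (not data from the text): the OLS estimators come from solving X'Xb = X'Y.

```python
import numpy as np

# Synthetic three-variable model: Y = 1 + 2*X2 + 3*X3 + u (illustrative values).
rng = np.random.default_rng(0)
n = 200
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the intercept beta1.
X = np.column_stack([np.ones(n), X2, X3])

# Solving the normal equations X'X b = X'Y yields the OLS estimators.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)  # approximately [1, 2, 3]
```

With n = 200 and a small error variance, the estimates land close to the true parameter values used in the simulation.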
Basic Econometrics, Haleema Sadia
1. The sample regression plane passes through the sample means Ȳ, X̄2, and X̄3: from Eq. (7.4.3) we get

   β̂1 = Ȳ − β̂2 X̄2 − β̂3 X̄3    (7.4.22)

2. Summing the SRF over the sample and dividing by n (and using Σ ûi = 0), we get Ȳ̂ = Ȳ: the mean of the fitted values Ŷi equals the mean of the actual Yi. Notice that by virtue of Eq. (7.4.22) we can write the SRF in deviation form as ŷi = β̂2 x2i + β̂3 x3i, where lowercase letters denote deviations from sample means.
3. Σ ûi = 0: the residuals sum, and hence average, to zero.
4. The residuals ûi are uncorrelated with X2i and X3i: Σ ûi X2i = Σ ûi X3i = 0.
5. The residuals ûi are uncorrelated with Ŷi: Σ ûi Ŷi = 0.
6. From Eqs. (7.4.12) and (7.4.15) it is evident that as r23, the correlation coefficient between X2 and X3, increases toward 1, the variances of β̂2 and β̂3 increase for given values of σ². In the limit, when r23 = 1 (i.e., perfect collinearity), these variances become infinite.
7. The variances of the OLS estimators are directly proportional to σ².
8. The OLS estimators of the partial regression coefficients are not only linear and unbiased but also have minimum variance in the class of all linear unbiased estimators. In short, they are BLUE; put differently, they satisfy the Gauss–Markov theorem.
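Property 6 can be seen directly from the variance formula var(β̂2) = σ² / [Σ x2i² (1 − r23²)] of Eq. (7.4.12). A small sketch, holding σ² and Σ x2i² fixed at 1 to isolate the role of r23:

```python
# Variance of beta2_hat from Eq. (7.4.12): sigma^2 / (sum(x2^2) * (1 - r23^2)).
def var_beta2(r23, sigma2=1.0, sum_x2_sq=1.0):
    return sigma2 / (sum_x2_sq * (1.0 - r23**2))

for r in [0.0, 0.5, 0.9, 0.99]:
    print(r, var_beta2(r))
# The variance rises without bound as r23 approaches 1 (perfect collinearity).
```

At r23 = 0.99 the variance is already about fifty times its value at r23 = 0, previewing the multicollinearity problem discussed in a later chapter.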
7.5 The Multiple Coefficient of Determination R2 and the Multiple
Coefficient of Correlation R
• Note that R², like r², lies between 0 and 1. If it is 1, the fitted regression line explains 100 percent of the variation in Y. On the other hand, if it is 0, the model does not explain any of the variation in Y. Typically, however, R² lies between these extreme values. The fit of the model is said to be "better" the closer R² is to 1.
• Recall that in the two-variable case we defined the quantity r as the
coefficient of correlation and indicated that it measures the degree
of (linear) association between two variables. The three-or-more
variable analogue of r is the coefficient of multiple correlation,
denoted by R, and it is a measure of the degree of association
between Y and all the explanatory variables jointly.
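Both quantities are easy to compute from a fitted regression. A sketch on synthetic data (illustrative coefficients, not from the text): R² = 1 − RSS/TSS, and R equals the simple correlation between Y and the fitted values Ŷ.

```python
import numpy as np

# Hypothetical three-variable regression on synthetic data.
rng = np.random.default_rng(1)
n = 100
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
Y = 2.0 + 1.5 * X2 - 0.8 * X3 + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), X2, X3])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b

rss = np.sum((Y - Y_hat) ** 2)        # residual sum of squares
tss = np.sum((Y - Y.mean()) ** 2)     # total sum of squares
R2 = 1.0 - rss / tss
R = np.sqrt(R2)

# R coincides with the correlation between Y and Y_hat.
print(R2, R, np.corrcoef(Y, Y_hat)[0, 1])
```

The last line illustrates the interpretation given above: the multiple correlation R measures how strongly Y co-moves with the best linear combination of all regressors.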
• Recall that assumption (7.1.10) of the classical linear regression model states
that the regression model used in the analysis is “correctly” specified; that is,
there is no specification bias or specification error.
• Assume that Eq. (7.6.1) is the "true" model explaining the behavior of child mortality (CM) in relation to per capita GNP (PGNP) and the female literacy rate (FLR). But suppose we disregard FLR and estimate the following simple regression:

CMi = α1 + α2 PGNPi + ui    (7.7.1)

• Eq. (7.7.1) would constitute a specification error; the error here consists in omitting the variable X3, the female literacy rate.
• Knowing that we have omitted the variable X3 (FLR) from the model, in general α̂2 will not be an unbiased estimator of the true β2. To give a glimpse of the bias, let us run the regression (7.7.1), which gives the following results.
7.7 Simple Regression in the Context of Multiple Regression: Introduction to Specification Bias Cont…
• Observe several things about this regression compared to the “true” multiple
regression (7.6.1):
1. In absolute terms (i.e., disregarding the sign), the PGNP coefficient has increased
from 0.0056 to 0.0114, almost a two-fold increase.
2. The standard errors are different.
3. The intercept values are different.
4. The R² values are dramatically different, although it is generally the case that, as the number of regressors in the model increases, the R² value increases.
• Now suppose that you regress child mortality on the female literacy rate, disregarding the influence of PGNP. You will obtain the following results:
• Again, if you compare the results of this (misspecified) regression with the "true" multiple regression, you will see that the results are different, although the difference here is not as noticeable as in the case of regression (7.7.2). The important point to note is that serious consequences can ensue if you fit the wrong model.
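The mechanics of omitted-variable bias can be demonstrated by simulation. A sketch with synthetic data (not the child-mortality data of the text), where the true model has two correlated regressors, mimicking the PGNP–FLR situation:

```python
import numpy as np

# True model: Y = 1 - 0.5*X2 - 2.0*X3 + u, with X2 and X3 correlated.
rng = np.random.default_rng(42)
n = 5000
X2 = rng.normal(size=n)
X3 = 0.6 * X2 + rng.normal(scale=0.8, size=n)   # correlated with X2
Y = 1.0 - 0.5 * X2 - 2.0 * X3 + rng.normal(size=n)

def ols(cols, y):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = ols([X2, X3], Y)   # close to the true (-0.5, -2.0)
b_short = ols([X2], Y)      # X3 omitted: its effect leaks into the X2 slope

print(b_full[1], b_short[1])
# Short-regression slope is roughly b2 + b3 * cov(X2, X3)/var(X2)
# = -0.5 + (-2.0)(0.6) = -1.7, far from the true -0.5.
```

The comment gives the standard omitted-variable-bias formula: the misspecified slope absorbs the omitted regressor's effect in proportion to how strongly the two regressors are correlated, which is exactly why the PGNP coefficient changed so much between (7.6.1) and (7.7.1).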
7.8 R² and the Adjusted R²

R̄² = 1 − (1 − R²) (n − 1)/(n − k)    (7.8.4)

It is immediately apparent from Eq. (7.8.4) that:
• For k > 1, R̄² < R², which implies that as the number of X variables increases, the adjusted R² increases less than the unadjusted R².
• R̄² can be negative, although R² is necessarily nonnegative. In case R̄² turns out to be negative in an application, its value is taken as zero.
Theil's criterion
• It is good practice to use R̄² rather than R² because R² tends to give an overly optimistic picture of the fit of the regression, particularly when the number of explanatory variables is not very small compared with the number of observations.
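Eq. (7.8.4) is a one-line computation. A sketch, with hypothetical R², n, and k values chosen to show both the penalty and the possibility of a negative result:

```python
# Adjusted R^2 from Eq. (7.8.4): R_bar^2 = 1 - (1 - R^2)(n - 1)/(n - k),
# where k is the number of parameters including the intercept.
def adjusted_r2(r2, n, k):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

print(adjusted_r2(0.70, n=30, k=3))   # penalized below 0.70
print(adjusted_r2(0.05, n=20, k=6))   # comes out negative here
```

With a weak fit (R² = 0.05) and many parameters relative to n, the adjusted value goes negative, in which case, as noted above, it is conventionally reported as zero.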
Critique of Theil's view:
• He offered no general theoretical justification for the "superiority" of R̄².
Modified R²
• Goldberger argues that the following formula, called the modified R², is better:

modified R² = (1 − k/n) R²

His advice is to report R², n, and k and let the reader decide how to adjust R² by allowing for n and k.
• Besides R² and adjusted R² as goodness-of-fit measures, other criteria are often used to judge the adequacy of a regression model. Two of these are Akaike's information criterion (AIC) and the Schwarz information criterion (SIC), which are used to choose between competing models. We will discuss these criteria when we consider the problem of model selection in greater detail in a later chapter.
Comparing two R² values
7.10 Polynomial regression function
• We now consider a class of multiple regression models, the polynomial regression models, which have found extensive use in econometric research relating to cost and production functions. In introducing these models, we further extend the range of models to which the classical linear regression model can easily be applied.
• To fix the ideas, consider Figure 7.1, which relates the short-run marginal
cost (MC) of production (Y) of a commodity to the level of its output (X). The
visually-drawn MC curve in the figure, the textbook U-shaped curve, shows
that the relationship between MC and output is nonlinear.
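A quadratic polynomial, MC = β0 + β1 X + β2 X² + u, is the simplest specification that can produce such a U shape. A sketch on synthetic data (illustrative coefficients, not from the text); note that the model stays linear in the parameters, so OLS still applies after adding X² as an extra regressor:

```python
import numpy as np

# Synthetic U-shaped marginal cost data: MC = 30 - 8*X + 0.9*X^2 + noise.
rng = np.random.default_rng(7)
X = np.linspace(1, 10, 50)
MC = 30.0 - 8.0 * X + 0.9 * X**2 + rng.normal(scale=1.0, size=X.size)

# Treat X and X^2 as two regressors and fit by ordinary least squares.
D = np.column_stack([np.ones_like(X), X, X**2])
b = np.linalg.lstsq(D, MC, rcond=None)[0]
print(b)  # approximately [30, -8, 0.9]

# The fitted curve bottoms out where d(MC)/dX = b1 + 2*b2*X = 0.
x_min = -b[1] / (2 * b[2])
print(x_min)  # near 8 / (2 * 0.9), about 4.4
```

The turning point calculation shows why such fits are useful for cost curves: the estimated coefficients directly locate the output level at which marginal cost is minimized.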
7.10 Polynomial regression function Cont…
EXAMPLE 7.4 Estimating the Total Cost Function