Unit-2
Structure
2.0 Objectives
2.1 Introduction
2.2 Omission of Relevant Variables
2.2.1 Specification Errors: Illustration
2.2.2 Regression Output
2.0 OBJECTIVES
After reading this unit, you will be able to:
• state the factors which need to be considered while specifying a model for
econometric analysis;
• delineate, theoretically, the consequences of omitting relevant variables in an
exercise of econometric modelling;
• explain why it is more important to accord due attention to the underlying
theoretical considerations in specifying a model for empirical investigation;
• write a note on the useful ‘indicators’ in a ‘regression output’;
• show why it is better to err by including an ‘irrelevant variable’ as compared to
omitting a ‘relevant variable’ in a regression model;
• outline why the consequences of ‘errors in measurement of the dependent
variable’ are less serious than those of errors in the ‘independent variables’; and
• discuss the consequences of ‘errors in measurement’ in ‘independent variables’.
2.1 INTRODUCTION
Model specification is the very first step in the process of developing
a regression model. Here, we decide which variables should be included in the
empirical investigation, which of these can justifiably be treated as
‘independent variables’ [to appear on the ‘right hand side’ (RHS) of the
equation (or the regression model)], whether these variables are
quantitative or qualitative, etc. We also decide on the ‘dependent variable’,
which appears on the ‘left hand side’ (LHS) of the equation. Although we
technically assume that the regression model is correctly specified, an exact
specification of the model is difficult in practice. Economic theory helps us
in specifying the regression model, but theory can itself be questioned, or may
be too ambiguous to guide the selection of suitable variables.
Therefore, unknowingly, we commit specification errors such as: omitting a
variable that should be included in the model, including an irrelevant
variable in the model, mis-specifying the functional form of the model,
or making errors of measurement. Thus, specification errors may occur at two
stages, viz. (i) when the functional form considered (vis-à-vis the explanatory
variables to be included in the regression model) is not close to the true
relationship in the population or (ii) when errors are committed while
measuring the variables.
* Rimpy Kaushal, PGDAV College, Delhi.
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ (2.1)
Not knowing this, suppose we wrongly construe and specify the model as:
$Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i$ (2.2)
We know that $\hat{\alpha}_2$, the OLS estimator of $\alpha_2$, can be
obtained from Equation (2.2) as:
$\hat{\alpha}_2 = \dfrac{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)\, Y_i}{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}$ (2.3)
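The slope formula in (2.3) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `ols_slope` and `ols_intercept` and the five data points are hypothetical and serve only as an illustration.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple regression of y on x, using deviations of x from its mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return float(np.sum(dx * y) / np.sum(dx ** 2))

def ols_intercept(x, y):
    """Intercept implied by the slope: mean(y) - slope * mean(x)."""
    return float(np.mean(y) - ols_slope(x, y) * np.mean(x))

# Hypothetical illustrative data (not taken from the unit)
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope = ols_slope(x2, y)          # approximately 1.99
intercept = ols_intercept(x2, y)  # approximately 0.05
```

The deviation form is used because it mirrors the way the estimator is written in the text; a general least-squares routine should give the same numbers.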
ii) The OLS estimates will also be inconsistent to the extent that even in
large samples, the estimates would remain biased.
iii) If X2 and X3 are not related in the sample, the estimated slope from the
regression of X3 on X2 will be zero and the estimate for the slope coefficient
will be unbiased as well as consistent. However, the estimate for the intercept
would remain biased, unless the mean of X3 is zero.
iv) The error variance estimated from the mis-specified model will also be a
biased estimator of the true error variance σ². Consequently, the estimates
for the variance of the slope coefficient will also be biased, and the variance
of the estimated slope coefficient of the mis-specified model will be
overestimated.
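The consequences listed above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the parameter values, the sample size, the strength of the link between X2 and X3 (0.8) and the helper `fit` are all hypothetical choices, not taken from the unit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
beta1, beta2, beta3 = 1.0, 2.0, 1.5   # hypothetical true parameters

x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)    # X3 is related to X2
u = rng.normal(size=n)
y = beta1 + beta2 * x2 + beta3 * x3 + u

def fit(columns, dep):
    """OLS coefficients (intercept first) of dep on the given regressors."""
    X = np.column_stack([np.ones(len(dep))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return coef

full = fit([x2, x3], y)   # correctly specified model
short = fit([x2], y)      # X3 wrongly omitted
aux = fit([x2], x3)       # regression of X3 on X2

# In any sample, the short-regression slope equals the full-regression slope
# on X2 plus (coefficient on X3) times (slope of X3 on X2).
implied = full[1] + full[2] * aux[1]
```

With these settings the slope from the short regression should settle near 2.0 + 1.5 × 0.8 = 3.2 rather than 2.0, and a larger sample does not remove the gap. Setting the 0.8 to zero makes X2 and X3 unrelated, which corresponds to case (iii).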
Check Your Progress 1 [answer within the space given in about 50-100
words]
$Y_i = \beta_1 + \beta_2 X_{2i} + u_i$ (2.6)
$Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + v_i$ (2.7)
iii) The estimator of the error variance $\sigma^2$, i.e. $\hat{\sigma}^2$, is unbiased and
correctly estimated.
Thus, the difference between the two variances grows as the correlation
coefficient $r_{X_2 X_3}$ gets closer to 1 or −1. This difference will be zero
if the correlation coefficient is equal to zero. In that case, the variance of
the estimator of the coefficient of X2 from (2.6) and (2.7) will be
identical. We can illustrate this by considering the regression of
LGAEOF (i.e. logarithm of total annual household expenditure on food) on
LGTAHEXP (i.e. logarithm of total annual household expenditure) and
LGNOP (i.e. logarithm of the number of persons in the household) [from the data
collected for the 6334 households in the consumer expenditure survey
considered in sub-section 2.2.1]. Recall that this model was assumed to be
specified correctly. Now, if we include another variable LGHOUS (logarithm
of annual expenditure on housing services) without any theoretical
justification for its inclusion, we get the result shown in Table 2.3. It is just
by chance that the coefficient of the variable LGHOUS is statistically
significant. Further, although the estimates of the slope coefficient remain
unbiased despite the inclusion of this variable (the coefficients of
LGTAHEXP from the two regressions are not far apart, being 0.58 and
0.64 respectively), their standard errors have increased (from 0.0097 to
0.0126), leading to a loss in efficiency. Note that this is not the case for
LGNOP, since both its slope coefficients and their ‘standard errors’ (SEs) are
close (i.e. 0.33 & 0.32 and 0.0128 & 0.0129 respectively).
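The loss of efficiency from an irrelevant regressor can also be seen in simulated data. The sketch below does not use the household survey data; the variables, the correlation of 0.9 between X2 and the irrelevant X3, and the helper `ols` are hypothetical. It relies on the standard result that the variance of the slope on X2 is inflated by the factor 1 / (1 − r²), where r is the sample correlation between X2 and X3.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)          # estimated error variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return coef, se

rng = np.random.default_rng(1)
n = 5_000
x2 = rng.normal(size=n)
# X3 plays no role in generating y but is strongly correlated with X2
x3 = 0.9 * x2 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
y = 1.0 + 0.6 * x2 + rng.normal(size=n)

const = np.ones(n)
coef_true, se_true = ols(np.column_stack([const, x2]), y)
coef_over, se_over = ols(np.column_stack([const, x2, x3]), y)

r = np.corrcoef(x2, x3)[0, 1]
variance_ratio = (se_over[1] / se_true[1]) ** 2   # close to 1 / (1 - r**2)
```

The coefficient on X2 stays close to 0.6 in both regressions, while its standard error rises sharply once X3 is included. Lowering the correlation towards zero should bring `variance_ratio` back towards 1.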
If we omit a variable that is relevant for the model, the estimates of the
regression coefficients become biased, inconsistent and inefficient, with the
result that the usual hypothesis testing procedures (based on t and F-tests)
become invalid. In other words, the estimates of the model lose their
relevance. On the other hand, if we include an irrelevant or unnecessary
variable, the OLS estimators remain unbiased and consistent, and
the hypothesis testing procedures remain valid. However, the efficiency of
the estimates of the regression coefficients is compromised, in the sense
that larger variances lead to wider confidence intervals. As a result, in some
cases, we may fail to reject the null hypothesis of no significance. We can,
therefore, conclude that it is better to include an irrelevant variable than to
omit a relevant one. But this approach should not be stretched, as such
inclusion carries a cost in terms of both a loss in efficiency and a loss of
degrees of freedom. The best approach is to include only those variables that
are theoretically justified.
$Y_i = \beta_1 + \beta_2 X_{2i} + u_i$ (2.8)
Since Y is the value that is actually sought to be measured empirically, with
say $v_i$ as the measurement error, we have for the observed value $Y_i^{*}$:
$Y_i^{*} = Y_i + v_i$ or $Y_i = Y_i^{*} - v_i$
$Y_i^{*} = \beta_1 + \beta_2 X_{2i} + e_i$ (2.9)
where $e_i = u_i + v_i$. Note that (2.9) is different from (2.8) in the sense that
the error term $e_i$ has two components: (i) the error term from the original
model ($u_i$) and (ii) the error of measurement ($v_i$). Since the explanatory
variable remains unaffected, the OLS estimates of the regression coefficients
remain unbiased [as $E(\hat{\beta}_1) = \beta_1$ and $E(\hat{\beta}_2) = \beta_2$] provided the regressors
are non-stochastic. However, the OLS estimates will have a larger variance.
Specifically, the variance of the slope coefficient will be:
$\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_e^2}{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2} = \dfrac{\sigma_u^2 + \sigma_v^2}{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}$
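The claim that measurement error in the dependent variable leaves the slope unbiased but raises its variance can be checked by repeated sampling with a fixed regressor. This is a sketch under assumed values: the standard deviations of the model error and of the measurement error, the grid of X values and the number of replications are hypothetical, and the two errors are drawn independently of each other.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 20_000
beta1, beta2 = 1.0, 0.5
sigma_u, sigma_v = 1.0, 2.0        # model error and measurement error (assumed)

x = np.linspace(0.0, 10.0, n)      # non-stochastic regressor, same in every sample
dx = x - x.mean()
sxx = np.sum(dx ** 2)

u = rng.normal(scale=sigma_u, size=(reps, n))
v = rng.normal(scale=sigma_v, size=(reps, n))
y_true = beta1 + beta2 * x + u     # dependent variable without measurement error
y_obs = y_true + v                 # dependent variable as actually recorded

# OLS slope in each replication (one row per sample)
slopes_true = (y_true @ dx) / sxx
slopes_obs = (y_obs @ dx) / sxx

theory_true = sigma_u ** 2 / sxx
theory_obs = (sigma_u ** 2 + sigma_v ** 2) / sxx
```

Across replications both sets of slopes should average close to 0.5, while the sampling variance of `slopes_obs` should be close to `theory_obs`, which here is five times `theory_true`.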
$Y_i = \beta_1 + \beta_2 X'_{2i} + u_i$ (2.10)
$X_{2i} = X'_{2i} + w_i$ (2.11)
Now, if $w_i$ is also distributed independently of $X'_{2i}$ and has zero mean and
variance $\sigma_w^2$, we would have $\mathrm{Cov}(X'_{2i}, w_i) = 0$ and $\mathrm{Cov}(u_i, w_i) = 0$. Substituting
(2.11) in (2.10), we get:
$Y_i = \beta_1 + \beta_2 (X_{2i} - w_i) + u_i = \beta_1 + \beta_2 X_{2i} + u_i - \beta_2 w_i$ (2.12)
In (2.12) there are two random components: (i) the disturbance term from the
original model ($u_i$) and (ii) the measurement error ($w_i$) multiplied by $-\beta_2$.
Denoting the composite disturbance term $(u_i - \beta_2 w_i)$ as $z_i$, (2.12) can be
re-written as:
$Y_i = \beta_1 + \beta_2 X_{2i} + z_i$ (2.13)
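A simulation can show why the composite disturbance causes trouble. It contains the measurement error, which is also part of the observed regressor, so the two are correlated. The sketch below uses hypothetical parameter values and variable names. It also uses the standard large-sample result, assumed here rather than quoted from the unit, that the OLS slope tends to the true slope multiplied by var(true X) / [var(true X) + var(measurement error)].

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta1, beta2 = 1.0, 2.0
sigma_xtrue, sigma_w = 1.0, 0.5    # spread of true X and of its measurement error (assumed)

x_true = rng.normal(scale=sigma_xtrue, size=n)
w = rng.normal(scale=sigma_w, size=n)
u = rng.normal(size=n)

x_obs = x_true + w                 # regressor as actually recorded
y = beta1 + beta2 * x_true + u     # y is generated by the true regressor

# Composite disturbance of the model written in terms of observed X
z = y - beta1 - beta2 * x_obs      # equals u - beta2 * w
cov_xz = np.cov(x_obs, z)[0, 1]    # close to -beta2 * sigma_w**2, not zero

dx = x_obs - x_obs.mean()
slope_obs = np.sum(dx * y) / np.sum(dx ** 2)
attenuation = sigma_xtrue ** 2 / (sigma_xtrue ** 2 + sigma_w ** 2)
```

With these values the estimated slope should come out near 2.0 × 0.8 = 1.6 even though the sample is very large, which is what inconsistency means in practice.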
Check Your Progress 2 [answer within the space given in about 50-100
words]
.....................................................................................................................
.....................................................................................................................
4) What are the broad reasons due to which we commonly encounter errors
of measurement in variables in economics? What are their
consequences?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
5) State the consequences of ‘measurement errors’ in the dependent
variable.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
6) State the consequences of ‘measurement errors’ in the independent
variables.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
same in the explanatory variables are more serious [since they destroy the
properties of OLS estimators (viz. unbiasedness, consistency and efficiency)].
2.7 REFERENCES
1) Gujarati, D. N. and Porter, D. C. (2010). Essentials of Econometrics
(Fourth Edition), McGraw Hill.
1) (i) omitting a variable from the model that should be included, (ii)
including an irrelevant variable in the model, (iii) mis-specification of
the functional form of the model and (iv) errors of measurement.
The second term is ‘zero’ because the term within brackets is ‘zero’.
5) (i) OLS estimates remain unbiased, (ii) variances of the estimators are
also unbiased, but the variances of the estimators will be larger, leading
to loss in precision.