
Empirical Issues in Econometric Research

UNIT 2 SPECIFICATION ISSUES*

Structure

2.0 Objectives
2.1 Introduction
2.2 Omission of Relevant Variables
2.2.1 Specification Errors: Illustration
2.2.2 Regression Output

2.3 Inclusion of Irrelevant Variables


2.4 Errors of Measurement
2.4.1 Errors of Measurement in Dependent Variable
2.4.2 Errors of Measurement in Explanatory Variables

2.5 Let Us Sum Up


2.6 Key Words
2.7 Suggested Books for Further Reading
2.8 Answers/Hints to Check Your Progress Exercises

2.0 OBJECTIVES
After reading this unit, you will be able to:
• state the factors which need to be considered while specifying a model for
econometric analysis;
• delineate, theoretically, the consequences of omitting relevant variables in an
exercise of econometric modelling;
• explain why it is more important to accord due attention to the underlying
theoretical considerations in specifying a model for empirical investigation;
• write a note on the useful ‘indicators’ in a ‘regression output’;
• show why it is better to err by including an ‘irrelevant variable’ as compared to
omitting a ‘relevant variable’ in a regression model;
• outline why the consequences of ‘errors in measurement of the dependent variable’
are less serious than those in the ‘independent variables’; and
• discuss the consequences of ‘errors in measurement’ in ‘independent variables’.

2.1 INTRODUCTION
Model specification refers to the very beginning of the process of developing
a regression model. Here, we decide which variables should be included for
empirical investigation, which of these are justified to be treated as
‘independent variables’ [to appear on the ‘right hand side’ (RHS) of the
equation (or the regression model)], whether the nature of these variables is

* Rimpy Kaushal, PGDAV College, Delhi.
quantitative or qualitative, etc. We also decide on the ‘dependent variable’
which appears on the ‘left hand side’ (LHS) of the equation. In this process,
while we technically assume that the regression model is correctly specified,
in practice, an exact specification of the model is difficult. Though economic
theory helps us in deciding on the specification of the regression model, theory
can itself be questioned or may prove ambiguous in selecting suitable variables.
Therefore, unknowingly, we commit specification errors like: omitting a
variable from the model that should be included, including an irrelevant
variable in the model, mis-specifying the functional form of the model
or making errors of measurement. Thus, specification errors may occur at two
stages viz. (i) when the functional form considered (vis-à-vis the explanatory
variables to be included in the regression model) is not close to the true
relationship in the population or (ii) when errors are committed while
measuring the variables.

We are aware that the properties of the estimated regression coefficients


closely depend on the validity of the assumption that there is no
‘specification error’ in the model. Therefore, if we leave out a variable that is
crucial, the estimated regression coefficients would be biased. In this case,
the estimated standard errors, the confidence intervals for the estimated
coefficients and hence the computed ‘test statistic’ itself would be incorrect
(or far from the unknown true value in the population). As a result, the results
of the hypothesis testing would not be reliable. On the other hand, if we
include an irrelevant variable in the model, the regression estimates could be
unbiased but inefficient (i.e. the standard errors and confidence intervals
could be unduly large). The objective of the present unit is to learn about the
consequences of various types of mis-specification in an estimated
regression model and, more importantly, how to avoid making such errors.

2.2 OMISSION OF RELEVANT VARIABLES


Quite often, unknowingly, we may omit a relevant explanatory variable from
being included in the regression model. Consider a regression model in which
the dependent variable Y is actually related to two variables X2 and X3, like:

\(Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i\)   (2.1)

Not knowing this, suppose we wrongly construe and specify the model as:

\(Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i\)   (2.2)

We know that \(\hat{\alpha}_2\), the OLS estimator of \(\alpha_2\), can be
obtained from Equation (2.2) as:

\(\hat{\alpha}_2 = \dfrac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)\,Y_i}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\)   (2.3)

Substituting for \(Y_i\) from (2.1) in (2.3) we get:


\(\hat{\alpha}_2 = \dfrac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)\left(\beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i\right)}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\)

\(= \dfrac{\beta_1 \sum_{i=1}^{n}(X_{2i}-\bar{X}_2) + \beta_2 \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)X_{2i} + \beta_3 \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)X_{3i} + \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)u_i}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\)

Note that \(\sum_{i=1}^{n}(X_{2i}-\bar{X}_2) = 0\) and

\(\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)X_{2i} = \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2\)   (2.3*)

(see exercise 4, CYP 1, for this step)
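The identity in (2.3*) can also be verified numerically. A minimal sketch in Python, using arbitrary made-up values:

```python
# Numerical check of the identity used in (2.3*):
# sum((x - xbar) * x) equals sum((x - xbar)**2), because sum(x - xbar) = 0.
x = [2.0, 5.0, 7.0, 10.0]          # arbitrary illustrative values
xbar = sum(x) / len(x)             # here xbar = 6.0

lhs = sum((xi - xbar) * xi for xi in x)
rhs = sum((xi - xbar) ** 2 for xi in x)
# lhs and rhs are both 34.0 for these values
```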


Hence \(\hat{\alpha}_2 = \beta_2 + \beta_3 \dfrac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)X_{3i}}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2} + \dfrac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)u_i}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\)   (2.4)

Taking expectations on both sides of (2.4) we get:

\(E(\hat{\alpha}_2) = \beta_2 + \beta_3 \dfrac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)X_{3i}}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\)

Using the assumption from the true model that \(E(u|X) = 0\):

\(E(\hat{\alpha}_2) = \beta_2 + \beta_3 \hat{b}_{32}\)   (2.5)


where \(\hat{b}_{32} = \dfrac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2} = \dfrac{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)X_{3i}}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\)

Therefore, the expectation of \(\hat{\alpha}_2\) is not equal to \(\beta_2\):
there is a bias in the estimate, called the Omitted Variable Bias, that depends
on \(\beta_3\), i.e. the effect of the excluded variable X3 on the dependent
variable Y, with \(\hat{b}_{32}\) representing the slope from regressing the
omitted variable X3 on the included variable X2. Other similar consequences of
omitting a variable can be stated as the following.

i) By omitting a variable that is correlated with the dependent variable, and
with any of the explanatory variables, the estimates of the regression
coefficients will be biased. The nature of the bias depends on the nature
of the correlations between: (a) the dependent variable and the omitted
variable and (b) the independent variables and the omitted variable. If the
true regression model is as in (2.1), in which X3 is the omitted variable, the
nature of the bias can be summarised as below.

Relationship                      X2 and X3 negatively     X2 and X3 positively
                                  correlated               correlated
Y and X3 negatively correlated    β2 is overestimated      β2 is underestimated
Y and X3 positively correlated    β2 is underestimated     β2 is overestimated

ii) The OLS estimates will also be inconsistent, to the extent that even in
large samples the estimates would remain biased.

iii) If X2 and X3 are not correlated in the sample, the value of \(\hat{b}_{32}\)
will be zero and the estimate of the slope coefficient will be unbiased as well
as consistent. However, the estimate of the intercept would remain biased,
unless the mean of X3 is zero.

iv) The error variance estimated from the mis-specified model will also be a
biased estimator of the true error variance \(\sigma^2\). Consequently, the
estimate of the variance of the slope coefficient will also be biased, and the
variance of the estimated slope coefficient \(\hat{\alpha}_2\) of the
mis-specified model will be overestimated.

v) As a result of all these, the confidence intervals and the results of
hypothesis testing will be seriously compromised, to the extent that they
are unreliable.
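The relationship \(E(\hat{\alpha}_2) = \beta_2 + \beta_3 \hat{b}_{32}\) in (2.5) can be illustrated with a small simulation. The sketch below (in Python, with hypothetical parameter values, not drawn from any survey data in this unit) regresses Y on X2 alone even though the true model also contains X3:

```python
import random

# Illustrative simulation of omitted variable bias (hypothetical numbers).
# True model: Y = 1 + 2*X2 + 3*X3 + u, with X3 = 0.5*X2 + noise, so X2 and
# X3 are positively correlated. Regressing Y on X2 alone should then give a
# slope near beta2 + beta3*b32 = 2 + 3*(0.5) = 3.5, not the true beta2 = 2.
random.seed(42)
n = 10_000
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.5 * v + random.gauss(0, 1) for v in x2]
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x2, x3)]

def ols_slope(x, y):
    """Simple-regression OLS slope: sum((x-xbar)*y) / sum((x-xbar)^2)."""
    xbar = sum(x) / len(x)
    num = sum((xi - xbar) * yi for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

b32 = ols_slope(x2, x3)        # slope of the omitted X3 on the included X2
alpha2_hat = ols_slope(x2, y)  # slope from the under-specified model,
                               # close to 2 + 3*b32 rather than to 2
```

The upward bias here matches the table above: Y and X3 positively correlated, X2 and X3 positively correlated, so β2 is overestimated.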

2.2.1 Specification Errors: Illustration


The above explanation on the consequences of ‘specification errors’ is
theoretical. We can understand this better with the help of an illustration.
Consider a ‘consumer expenditure survey’ conducted for a sample of 6334
individuals. We can take data on ‘log of annual expenditure on food’ (i.e.
LGAEOF) as the dependent variable Y. We might consider regressing
LGAEOF on two explanatory variables viz. (i) log of total annual household
expenditure (LGTAHEXP: X1) and (ii) log of the number of persons in the
household (LGNOP: X2). Let us consider the results of two separate
regressions: one regressing Y on only X1 (an under-specified model:
Model 1) and the second regressing Y on both X1 and X2 (Model 2). Note
that by taking logarithmic values of both the dependent and the independent
variables, we are focusing on investigating the impact of relative changes
in Xi on Y, and the regression coefficients give us a measure of ‘elasticity’.
For our present purpose, we consider the estimated values of the two
regressions as in Table 2.1.

Table 2.1: Consequence of Specification Error

Dependent Variable: LGAEOF

                          Model 1                       Model 2
Intercept/Variables    Value (β)   S.E.   t-Ratio    Value (β)   S.E.   t-Ratio
Intercept                0.70      0.08     8.3        1.2       0.08    14.1
LGTAHEXP (X1)            0.67      0.01    68.7        0.58      0.01    60.1
LGNOP (X2)                 -         -       -         0.33      0.01    26.2
Adjusted R2              0.43                          0.48

A standard regression output generated by common software usually presents


a number of values. For our present purpose, in this sub-section, we shall
confine ourselves to the three basic values viz. estimated value of the
coefficients of the ‘independent/explanatory variables’, the S. E. (standard
error) of the estimated value and the value of the t-statistic (which is the
ratio of the ‘estimated value’ to the S. E., i.e. t = value ÷ S.E.). For easy
comprehension, the values are presented up to one or two decimal places in
Table 2.1.

Clearly, in an empirical exercise on estimating the annual expenditure on
food, the number of persons in a family is an important explanatory variable
(as more persons would mean higher expenditure on food and vice versa).
From this angle, Model 2 ought to be the better specified model. Let us
examine whether this is actually the case. You may observe that despite the
standard errors of both the intercept term and X1 being the same across the
two models (0.08 and 0.01 respectively), there is considerable deviation in
the estimated values of both terms. Specifically, relative to the better
specified Model 2, the under-specified Model 1 under-estimates the intercept
(0.70 against 1.2) and over-estimates the coefficient of LGTAHEXP (0.67
against 0.58): omitting LGNOP imparts a downward bias to the estimate of the
intercept and an upward bias to that of LGTAHEXP. Further, the overall
explanatory power of Model 2 is only marginally higher, by 0.05. The
illustration therefore underscores: (i) the importance of carefully specifying
a regression model and (ii) the need to consider other variables which too
might influence our dependent variable.
For instance, we can first calculate the average expenditure on food in a
household and then take its logarithmic value as Y. Taking such ratios of per
capita values quite often makes for a better specification of a Model. You are
aware from your study of the course on Introductory Econometrics that many
times such ‘ratio transformation’ helps us in controlling for the problem of
multicollinearity in the original variables. The low value of R2 in the
illustration above is indicative of the fact that there are possibly other
variables (e.g. total income level of the household) which are important to be
included in the model. In other words, understanding the underlying theory is
more important in specifying a model for empirical investigation. We must
however also note that inclusion of many explanatory variables can introduce
‘over-specification errors’. We shall study on this in the subsequent section
of this unit. Hence, due care is required to be taken for a judicious choice of
explanatory variables in the model. The example considered here is of a
cross-section data in which the sample size is usually large. However, when
we consider time series data (e.g. annual time series data), our sample size (n)
will be much smaller. In such cases, as a general rule, ‘k’ (the number of
parameters estimated) should be much smaller than ‘n’, so that adequate
degrees of freedom (n − k) are left for estimation.

2.2.2 Regression Output


A standard regression output presented by any software package presents the
results up to many decimal points. Besides, the values of estimated value of
parameters or coefficient of regression, their standard error and t-ratio (both
for the intercept term and each one of the independent variables), many other
summary statistics are also presented. Specifically, these relate to: (i) mean of
the dependent variable, (ii) standard deviation of the dependent variable, (iii)
sum of squares of residuals [\(\sum e_i^2\), where \(e_i = Y_i - \hat{Y}_i\)], (iv) standard error of the
regression [i.e. \(\hat{\sigma}\)], (v) R-square and adjusted R-square, (vi) F-value and the P-
value of F, which indicates the joint significance of the regression model, (vii)
log-likelihood value (which indicates the fitness of the model, such that a model
with a higher value of log-likelihood is preferred), (viii) Akaike criterion, (ix)
Schwarz criterion and (x) Hannan-Quinn criterion (which are all criteria used
for model selection). An illustration of the results from a standard regression
analysis is presented in Table 2.2. Let us now learn the usefulness of some of
these values and criteria here.

Table 2.2: Illustration of the Results from a Standard Regression Output

Ordinary Least Square


Number of observations: 6334
Dependent variable: LGAEOF

Coefficient Std. Error t-ratio p-value


Constant 1.15833 0.0820119 14.12 0.0000***
LGEXP 0.584210 0.00971737 60.12 0.0000***
LGSIZE 0.334348 0.0127587 26.21 0.0000***
Mean dependent variable 6.474297 S.D. dependent var 0.779391
Sum squared residuals 1988.365 S.E. of regression 0.560418
R-squared 0.483136 Adjusted R-squared 0.482973
F(2, 6331) 2958.936 P-value(F) 0.000000
Log-likelihood −5318.209 Akaike criterion 10642.42
Schwarz criterion 10662.68 Hannan-Quinn 10649.43
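Several of the summary statistics listed above can be computed directly from the residuals. A minimal sketch, using made-up observed and fitted values (not the survey data behind Table 2.2):

```python
# Sketch of how R-squared, adjusted R-squared and the standard error of the
# regression are computed from residuals. All numbers are hypothetical.
y     = [2.0, 3.1, 4.2, 4.9, 6.1, 7.0]   # hypothetical observed values
y_hat = [2.1, 3.0, 4.0, 5.1, 6.0, 7.0]   # hypothetical fitted values
k = 2                                     # parameters estimated (intercept + slope)
n = len(y)

y_bar = sum(y) / n
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # sum of squared residuals
tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares

r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)   # penalises additional regressors
ser = (rss / (n - k)) ** 0.5                # standard error of the regression
```

Note that adjusted R-square is always below R-square once k > 1, which is why it is the preferred measure when comparing models with different numbers of regressors.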

Check Your Progress 1 [answer within the space given in about 50-100
words]

1) State the four types of specification errors.


……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
2) What are the consequences of omitting an important variable from
inclusion in a regression model?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
3) State the consequences of including an irrelevant variable in a regression
model.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
4) In Equation (2.3*), show that: \(\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)X_{2i} = \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2\).
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
5) In what way is taking a ‘ratio transformation’ helpful in empirical
investigation?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………

2.3 INCLUSION OF IRRELEVANT VARIABLES


Sometimes, in order to avoid the consequences of omitting a relevant
variable, we may include some variables even though the theoretical
justification may be lacking. The rationale behind this approach is that over-
specification of a model (i.e. inclusion of unnecessary variables not justified
by theory), does not harm the basic properties of the model. In other words,
(i) the regression estimates still remain unbiased and consistent and (ii)
standard confidence intervals and hypothesis testing also remain valid. For a
theoretical account of this fact, consider a simple two-variable model:

\(Y_i = \beta_1 + \beta_2 X_{2i} + u_i\)   (2.6)

While we assume (2.6) to be correctly specified, let us say, instead of (2.6),
we proceed to estimate the following regression equation by including X3:

\(Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + v_i\)   (2.7)

where X3 is not a relevant variable, there being no theoretical backing for any
relationship between Y and X3. The consequences of committing this
specification error are the following.

i) The OLS estimators of \(\alpha_2\) and \(\alpha_3\) in (2.7) are unbiased
[i.e. \(E(\hat{\alpha}_2) = \beta_2\) and \(E(\hat{\alpha}_3) = 0\), the true
value of the coefficient of the irrelevant variable X3]. They are also
consistent.

ii) The confidence interval and hypothesis-testing procedures are valid.

iii) The estimator of the error variance, \(\hat{\sigma}^2\), is unbiased and
correctly estimated.
However, a negative consequence of (2.7) is that the OLS estimators are


inefficient, since the variance of \(\hat{\alpha}_2\) in (2.7) will be larger
than that of \(\hat{\beta}_2\) in (2.6). To verify this, consider the variance
of \(\hat{\beta}_2\) of (2.6):

\(\mathrm{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\)

The variance of the OLS estimate of the coefficient of X2 estimated from (2.7) is:

\(\mathrm{var}(\hat{\alpha}_2) = \dfrac{\sigma^2}{\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2}\cdot\dfrac{1}{1-r_{23}^2}\)

where \(r_{23}\) is the sample correlation coefficient between X2 and X3. Thus,
the difference between the two variances will be large depending on how close
to 1 or −1 the correlation coefficient \(r_{23}\) is. This difference will be
zero if the correlation coefficient is equal to zero; in that case, the variance
of the estimator of the coefficient of X2 from both (2.6) and (2.7) will be
identical.
identical. We can illustrate this by considering the regression result of
LGAEOF (i.e. logarithm of annual expenditure on food) on
LGTAHEXP (i.e. logarithm of total annual household expenditure) and
LGNOP (i.e. logarithm of number of persons in the household) [from the data
collected for the 6334 households in the consumer expenditure survey
considered in sub-section 2.2.1]. Recall that this model was assumed to be
specified correctly. Now, if we include another variable LGHOUS (logarithm
of annual expenditure on housing services) without the theoretical
justification for its inclusion, we get the following result (Table 2.3). It is just
by chance that the coefficient of the variable LGHOUS is statistically
significant. Further, despite the inclusion of this variable, although the
estimates of the slope coefficient remain unbiased (since the coefficients of
LGTAHEXP from the two regressions are not much apart: being 0.58 and
0.64 respectively), their standard errors have increased (from 0.0097 to
0.0126) leading to a loss in efficiency. Note that this is not the case for
LGNOP since both the slope coefficient and their ‘standard errors’ (SEs) are
close (i.e. 0.33 & 0.32 and 0.0128 & 0.0129 respectively).
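The loss of efficiency just described is governed by the factor \(1/(1-r_{23}^2)\) derived above. A small sketch, with hypothetical data rather than the survey data, computes this variance-inflation factor for a correlated but irrelevant regressor:

```python
import random

# Illustrative (hypothetical) data: X3 plays no role in Y, but is built to be
# correlated with X2 (population correlation 0.8). Including X3 inflates the
# variance of the estimate of the X2 coefficient by 1 / (1 - r23^2).
random.seed(1)
n = 500
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.8 * v + 0.6 * random.gauss(0, 1) for v in x2]

def corr(a, b):
    """Sample correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

r23 = corr(x2, x3)               # near 0.8 for this design
inflation = 1 / (1 - r23 ** 2)   # ratio var(alpha2_hat) / var(beta2_hat),
                                 # near 1/(1 - 0.64) ≈ 2.8
```

With \(r_{23}\) around 0.8 the variance of the slope estimate nearly triples, which is exactly the kind of efficiency loss seen in the larger standard error of LGTAHEXP in Table 2.3.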

Table 2.3: Results of Regression by Including an Additional Variable

Ordinary Least Square
Number of observations: 1–6334 (n = 6223)
Dependent variable: LGAEOF

             Coefficient   Std. error    t-ratio    p-value
Constant       1.04448     0.0839901      12.44     4.36e-35 ***
LGTAHEXP       0.63554     0.0126350      50.30     0.0000 ***
LGHOUS        −0.0474      0.00803035     −5.897    3.89e-09 ***
LGNOP          0.324457    0.0128652      25.22     1.05e-133 ***
Mean of dependent variable 6.478563   S.D. of dependent variable 0.777861
Sum of squared residuals 1935.247     S.E. of regression 0.557838
R-squared 0.485953                    Adjusted R-squared 0.485705
F(3, 6219) 1959.707                   P-value(F) 0.000000

If we omit a variable that is relevant for the model, the estimates of the
regression coefficients become biased, inconsistent and inefficient, with the
result that the usual hypothesis-testing procedures (based on the t and F-tests)
become invalid. In other words, the estimates of the model lose their
relevance. On the other hand, if we include an irrelevant or unnecessary
variable, not only do the OLS estimators remain unbiased and consistent,
but the hypothesis-testing procedures also remain valid. However, the
efficiency of the estimates of the regression coefficients gets compromised in
the sense that larger variances lead to wider confidence intervals. As a result,
in some cases, we may fail to reject the null hypothesis of no significance. We
can, therefore, conclude that it is better to include an irrelevant variable than
to omit a relevant one. But this approach should not be stretched, as there is a
cost for such inclusion in terms of both loss in efficiency and degrees of
freedom. The best approach is to include only those variables that are
theoretically justified.

2.4 ERRORS OF MEASUREMENT


Very often, while investigating relationships between variables in economics,
the variables involved are not measured correctly. Most of the time,
macroeconomic data on variables such as gross domestic product, inflation,
etc. are measured through sample surveys and hence tend to be
approximations. Even microeconomic surveys are based on information
collected from individual units and thus might have been measured
inaccurately. While a variable is defined in a certain way, the data available
through secondary sources may not exactly correspond to such a definition.
Thus, in practice, there can be several reasons for errors of measurement in
the variables. These could be grouped under: (i) errors in reporting, (ii)
missing observations or (iii) human errors. Whatever may be the reason for
such errors, they cause specification errors leading to serious consequences.

2.4.1 Errors of Measurement in Dependent Variable


In case of errors in measuring the dependent variable, the consequences can
be thought of as being absorbed into the stochastic term included in the
model. Consequently, the model tends to become imprecise, i.e. it leads to a
loss in the precision of the regression estimates. However, the estimates will
remain unbiased. Let us consider the true value of the dependent variable to
be Z; its relationship with \(X_i\) can be expressed as:

\(Z_i = \beta_1 + \beta_2 X_i + u_i\)   (2.8)

Since Y is the value that is actually measured empirically, with say
\(\varepsilon_i\) as the measurement error, we have:

\(Y_i = Z_i + \varepsilon_i\) or \(Z_i = Y_i - \varepsilon_i\)

Hence, (2.8) can be re-written as:

\(Y_i = \beta_1 + \beta_2 X_i + v_i\)   (2.9)

where \(v_i = u_i + \varepsilon_i\). Note that (2.9) differs from (2.8) in that
its error term \(v_i\) has two components: (i) the error term from the original
model (\(u_i\)) and (ii) the error of measurement (\(\varepsilon_i\)). Since the
explanatory variable remains unaffected, the OLS estimates of the regression
coefficients remain unbiased [as \(E(\hat{\beta}_1) = \beta_1\) and
\(E(\hat{\beta}_2) = \beta_2\)], provided the regressors are non-stochastic.
However, there will be a larger variance of the OLS estimates. Specifically,
the variance of the slope coefficient will be:

\(\mathrm{var}(\hat{\beta}_2) = \dfrac{\sigma_v^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \dfrac{\sigma_u^2 + \sigma_\varepsilon^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\)

which is larger than the variance when there is no error of measurement of the
dependent variable. Thus, the consequences of errors in measurement of the
dependent variable are: (i) the OLS estimates remain unbiased, (ii) the
estimators of the variances are also unbiased, but, (iii) the variances of the
estimators will be larger, leading to a loss in precision. In sum, we may
therefore conclude that errors of measurement in the dependent variable do
not matter much in practice, i.e. they are less serious. Note that this is in a
relative sense, as the next sub-section (2.4.2) shows.
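These consequences can be checked with a small simulation. In the sketch below (Python, hypothetical parameter values), the dependent variable is observed with substantial measurement noise, yet the OLS slope stays close to its true value:

```python
import random

# Sketch (hypothetical numbers): measurement error in the dependent variable
# leaves the OLS slope unbiased, only noisier. True model: Z = 1 + 2X + u;
# we observe Y = Z + eps, with eps having twice the standard deviation of u.
random.seed(7)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
z = [1 + 2 * xi + random.gauss(0, 1) for xi in x]   # true dependent variable
y = [zi + random.gauss(0, 2) for zi in z]           # mismeasured version of Z

def ols_slope(x, y):
    """Simple-regression OLS slope: sum((x-xbar)*y) / sum((x-xbar)^2)."""
    xbar = sum(x) / len(x)
    num = sum((xi - xbar) * yi for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

slope = ols_slope(x, y)   # close to the true beta2 = 2 despite the noise in Y
```

Re-running with different seeds would show the estimates scattering more widely than without measurement error, but centred on the true value, which is precisely the "unbiased but less precise" result stated above.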

2.4.2 Errors of Measurement in Explanatory Variables

Unlike errors of measurement in the dependent variable, errors of
measurement in the explanatory variables of the model are more serious in
nature. This is because the estimators of the regression coefficients remain
neither unbiased nor consistent. Suppose the true relationship between Y and
X′ is:

\(Y_i = \beta_1 + \beta_2 X'_i + u_i\)   (2.10)

where the disturbance term \(u_i\) is distributed independently of X′ with
zero mean and variance \(\sigma_u^2\). If X is the inaccurately measured value
of X′, with \(w_i\) as the measurement error, then we can write:

\(X_i = X'_i + w_i\)   (2.11)

Now, if \(w_i\) is also distributed independently of X′ with zero mean and
variance \(\sigma_w^2\), we would have \(\mathrm{cov}(X'_i, w_i) = 0\) and
\(\mathrm{cov}(u_i, w_i) = 0\). Substituting (2.11) in (2.10), we get:

\(Y_i = \beta_1 + \beta_2 (X_i - w_i) + u_i = \beta_1 + \beta_2 X_i + u_i - \beta_2 w_i\)   (2.12)

In (2.12) there are two random components: (i) the disturbance term from the
original model (\(u_i\)) and (ii) the measurement error (\(w_i\)) multiplied by
\(-\beta_2\). Denoting the composite disturbance term \((u_i - \beta_2 w_i)\)
as \(z_i\), (2.12) can be re-written as:

\(Y_i = \beta_1 + \beta_2 X_i + z_i\)   (2.13)

Now, for the OLS estimation of (2.13) to yield unbiased and consistent
estimates, there should be no systematic association between \(X_i\) and
\(z_i\), i.e. \(\mathrm{cov}(X_i, z_i) = 0\). However, even when the
disturbance term of (2.10) satisfies the assumptions of the classical linear
regression model, we have:

\(\mathrm{cov}(X_i, z_i) = \mathrm{cov}(X'_i + w_i,\; u_i - \beta_2 w_i) = -\beta_2 \sigma_w^2\)   (2.14)

Thus, the CLRM assumption of no systematic association between \(X_i\) and
\(z_i\) is violated, with the consequence that \(\hat{\beta}_2\) becomes a
biased and inconsistent estimator of \(\beta_2\). The consequences of
measurement errors in explanatory variables may therefore be summarised as:
(i) the OLS estimators are biased and (ii) the estimators are inconsistent,
i.e. the bias does not disappear even as the sample size increases. Thus, the
error of measurement in an explanatory variable is more serious than that in
the dependent variable. If errors of measurement happen in both the dependent
as well as the explanatory variables, the problem would be even more serious.
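The bias arising from (2.14) can be seen in a short simulation. The sketch below uses hypothetical parameter values; with the true slope \(\beta_2 = 2\), \(\mathrm{var}(X') = 1\) and \(\mathrm{var}(w) = 1\), the well-known attenuation result pulls the estimated slope towards \(\beta_2 \cdot \sigma_{X'}^2 / (\sigma_{X'}^2 + \sigma_w^2) = 1\):

```python
import random

# Sketch of attenuation bias from measurement error in the regressor
# (hypothetical numbers). True model: Y = 1 + 2*X' + u, but we only observe
# X = X' + w. With var(X') = var(w) = 1, the slope estimate tends towards
# 2 * 1 / (1 + 1) = 1 instead of the true value 2, however large n is.
random.seed(3)
n = 20_000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x_true]
x_obs = [xi + random.gauss(0, 1) for xi in x_true]   # mismeasured regressor

def ols_slope(x, y):
    """Simple-regression OLS slope: sum((x-xbar)*y) / sum((x-xbar)^2)."""
    xbar = sum(x) / len(x)
    num = sum((xi - xbar) * yi for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

slope = ols_slope(x_obs, y)   # near 1, well below the true value of 2
```

Unlike the dependent-variable case, increasing n here does not help: the estimate converges to the attenuated value, which is what inconsistency means.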

Check Your Progress 2 [answer within the space given in about 50-100
words]

1) State the two basic properties of a regression model.


.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
2) What is the consequence of omitting a relevant explanatory variable in a
regression model?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
3) State the consequences of inclusion of an irrelevant explanatory variable in
a regression model.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................

.....................................................................................................................
.....................................................................................................................
4) What are the broad reasons due to which we commonly encounter errors
of measurement in variables in economics? What are their
consequences?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
5) State the consequences of ‘measurement errors’ in the dependent
variable.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
6) State the consequences of ‘measurement errors’ in the independent
variables.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................

2.5 LET US SUM UP


The CLRM assumes that the regression model is ‘correctly specified’. The term
‘correctly specified’ means that all the theoretically relevant variables are
included in the model, irrelevant variables are excluded and there are no
errors of measurement. If an important explanatory variable is excluded from
the model, the coefficients of the model become not only biased but also
inconsistent and inefficient. As a result, the hypothesis-testing procedures
become invalid. On the other hand, if an irrelevant variable is included in the
model, the estimated coefficients remain unbiased and consistent but there
will be a loss in the precision of the estimators (since the standard errors of
the coefficients will be larger). In case of errors in measurement of variables,
while the measurement error in the dependent variable is not very serious, the
same in the explanatory variables is more serious [since it destroys the
properties of OLS estimators (viz. unbiasedness, consistency and efficiency)].

2.6 KEY WORDS

Relevant Variable : A variable with the theoretical justification for
inclusion in the model.

Irrelevant Variable : A variable without theoretical backing but about
which the researcher is unsure and hence prefers to include in the regression
model. Such inclusion, called over-specification, does not harm the basic
properties of the model. This means that the estimates of the regression
coefficients remain unbiased and consistent, so that the results of hypothesis
testing remain valid.

Omission of a Relevant Variable : The consequences of this are: (i) the
estimated slope coefficients will be biased, (ii) the OLS estimates are
inconsistent, so much so that even large samples would not eliminate this,
and (iii) both the confidence intervals and the results of hypothesis testing
are seriously compromised.

Inclusion of Irrelevant Variable : This results in a situation which, in
comparative terms, is far less serious than the omission of a relevant
variable. This is because the three basic properties, viz. unbiasedness,
consistency and validity of test results, are sustained.

Measurement Error in Dependent Variable : This is a situation in which both
the estimates of the coefficients and the variance estimates remain unbiased.
The variances will, however, be larger, leading to a loss in precision. Hence,
viewed relatively, this is less serious.

Measurement Error in Independent Variables : This is a situation where the
estimators would be biased and inconsistent, with the bias persisting even as
the sample size increases. Hence, viewed relatively, this is more serious.

2.7 REFERENCES
1) Gujarati, D. N. and Porter, D. C. (2010). Essentials of Econometrics
(Fourth Edition), McGraw Hill.

2) Dougherty, C. (2011). Introduction to Econometrics, Oxford University


Press.

3) Gujarati, D. N. and Porter, D. C. (2009). Basic Econometrics (Fifth
Edition), McGraw Hill.

4) Wooldridge, J. M. (2009). Introductory Econometrics: A Modern Approach
(Fourth Edition), Cengage Learning.

2.8 ANSWERS/HINTS TO CHECK YOUR


PROGRESS EXERCISES
Check Your Progress 1

1) (i) omitting a variable from the model that should be included, (ii)
including an irrelevant variable in the model, (iii) mis-specification of
the functional form of the model and (iv) errors of measurement.

2) The estimated regression coefficients would be biased and their standard


errors (and hence the confidence intervals for the estimated coefficients)
would be wide. As a consequence, the computed ‘test statistic’ would be
incorrect and the results of the hypothesis testing would be unreliable.

3) The estimates of regression coefficients could be unbiased but inefficient


(i.e. the standard errors and confidence intervals could be unduly large).

4) LHS = \(\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)X_{2i} = \sum_{i=1}^{n}(X_{2i}-\bar{X}_2)(X_{2i}-\bar{X}_2) + \bar{X}_2\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)\).

The second term is ‘zero’ because \(\sum_{i=1}^{n}(X_{2i}-\bar{X}_2) = 0\).
Hence LHS = \(\sum_{i=1}^{n}(X_{2i}-\bar{X}_2)^2\) = RHS.

5) It helps in reducing or eliminating the collinearity effect sometimes.

Check Your Progress 2

1) (i) the estimates of regression coefficients should be unbiased and


consistent; (ii) confidence interval should be valid so that the results of
the hypothesis testing too are valid.

2) The estimates of regression coefficients will be biased, inconsistent and


inefficient. As a result, the results of the hypothesis testing (based on t
and F tests) will be invalid.

3) The OLS estimators would be unbiased and consistent. The results of


hypothesis testing too remain valid. However, efficiency is compromised
with larger variances of the estimates and hence wider confidence
intervals.

4) (i) errors in reporting, (ii) missing observations or (iii) human errors.


The consequence is that they lead to specification errors with their
attendant consequences on estimates and results of testing.

5) (i) OLS estimates remain unbiased, (ii) variances of the estimators are
also unbiased, but the variances of the estimators will be larger, leading
to loss in precision.

6) Estimators of regression coefficients are neither unbiased nor consistent.
Moreover, the bias does not disappear even with an increase in sample size.
Hence, the consequence of errors in measuring explanatory variables is
more serious than that of errors in measuring the dependent variable.
