UNIT 9 ERRORS IN VARIABLES

Structure

9.0 Objectives
9.1 Introduction
...
9.8 Let Us Sum Up
9.9 Key Words
9.10 Some Useful Books/References
9.11 Answers/Hints to Check Your Progress Exercises
9.0 OBJECTIVES

After going through this unit, you should be in a position to:

• explain the concept of errors in variables;
• describe the consequences of measurement error in the dependent variable, the independent variable, or both;
• explain how the method of instrumental variables can be used to obtain consistent estimates in the presence of measurement error; and
• describe how the Hausman specification test is used to detect measurement error.
9.1 INTRODUCTION
In the ordinary least squares model we assume that sample observations are measured accurately. All our formulae are based upon the presumption that variables (both explained and explanatory) are measured without error. The only form of error admitted into our model is the disturbance term, which represents the influence of various explanatory variables that have not been included in the model. However, this assumption may not be realistic, particularly in the case of secondary data.
Under the classical assumptions the ordinary least squares (OLS) estimators are best linear unbiased. One of the major underlying assumptions is the independence of the regressors from the disturbance term. If this condition does not hold, OLS estimators are biased and inconsistent. This statement may be illustrated by a simple errors-in-variables model.
In the following we discuss the consequences when the error appears in the measurement of the dependent variable, the independent variable, or both.

Suppose the true relationship is

y_i = βx_i + ε_i ...(9.1)

where ε_i represents errors associated with the specification of the model (the effects of omitted variables, etc.). Instead of y_i we observe

y_i* = y_i + u_i ...(9.2)

where u_i is the error of measurement in y_i. The measurement error u_i is not associated with the regressor. Thus we have

Cov(u_i, x_i) = 0
The regression model is estimated with y* as the dependent variable, with no account being taken of the fact that y* is not an accurate measure of y. Therefore, instead of estimating Eq. (9.1), we estimate

y_i* = βx_i + (ε_i + u_i) ...(9.3)
For simplicity let us assume that E(ε_i) = E(u_i) = 0, Cov(x_i, ε_i) = 0 (which is the assumption of classical linear regression), Cov(x_i, u_i) = 0, i.e., the errors of measurement in y* are uncorrelated with x_i, and Cov(ε_i, u_i) = 0, i.e., the equation error and the measurement error are uncorrelated.
With these assumptions, it can be shown that β estimated from either (9.1) or (9.3) will be an unbiased estimator of the true β. Thus, the errors of measurement in the dependent variable y* do not destroy the unbiasedness property of the OLS estimators. However, the variances and standard errors of β̂ estimated from (9.1) and (9.3) will be different because, employing the usual formulae, we obtain from (9.1)

Var(β̂) = σ_ε² / Σx_i² ...(9.4)

and from (9.3)

Var(β̂) = (σ_ε² + σ_u²) / Σx_i² ...(9.5)

Obviously, the variance given at (9.5) is larger than the variance given at (9.4). Therefore, although the errors of measurement in the dependent variable still give unbiased estimates of the parameters, the estimated variances are now larger than in the case where there are no such errors of measurement.
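To see these two results numerically, here is a minimal simulation sketch in Python (the parameter values, sample size, and variable names are illustrative assumptions, not part of the text above):

    import numpy as np

    rng = np.random.default_rng(0)
    beta, n, reps = 2.0, 100, 5000
    x = rng.normal(0.0, 1.0, n)              # fixed regressor in deviation form

    est_true, est_noisy = [], []
    for _ in range(reps):
        eps = rng.normal(0.0, 1.0, n)        # equation error
        u = rng.normal(0.0, 1.0, n)          # measurement error in y
        y = beta * x + eps                   # accurately measured y
        y_star = y + u                       # observed y*
        est_true.append(x @ y / (x @ x))         # OLS slope using y
        est_noisy.append(x @ y_star / (x @ x))   # OLS slope using y*

    # Both averages are close to beta = 2 (unbiasedness is preserved), but the
    # sampling variance roughly doubles when y is mismeasured, as (9.4)-(9.5) imply.
    print(np.mean(est_true), np.var(est_true))
    print(np.mean(est_noisy), np.var(est_noisy))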
Now let us assume that the explanatory variable x_i is measured with error and the observed value becomes x_i* such that

x_i* = x_i + v_i ...(9.6)

or, x_i = x_i* - v_i

Putting this value of x_i in (9.1), we have

y_i = β(x_i* - v_i) + ε_i = βx_i* + w_i ...(9.7)

where w_i = ε_i - βv_i. We have assumed earlier that the measurement error in x is normally distributed with zero mean, has no serial correlation, and is uncorrelated with ε_i. However, we can no longer assume that the composite error term w_i is independent of the explanatory variable x_i*.
The composite error term w_i has mean zero, since E(w_i) = E(ε_i - βv_i) = E(ε_i) - βE(v_i) = 0. However,

Cov(w_i, x_i*) = E[w_i - E(w_i)][x_i* - E(x_i*)]
             = E[(ε_i - βv_i)(x_i + v_i)]
             = -βE(v_i²) = -βσ_v²

Thus, the explanatory variable and the error term are correlated, which violates the crucial assumption of the classical linear regression model that the explanatory variable is uncorrelated with the stochastic disturbance term. If this assumption is violated, then OLS estimates are not only biased but also inconsistent; they remain biased even when the sample size increases indefinitely.
The OLS estimator of β obtained by regressing y_i on x_i* is

β̂ = Σx_i* y_i / Σx_i*² = Σ(x_i + v_i)(βx_i + ε_i) / Σ(x_i + v_i)²

From the above we find that β̂ is a biased estimator, as E(β̂) ≠ β. Let us see the asymptotic properties of β̂. Since v_i and ε_i are stochastic and are uncorrelated with each other as well as with x_i, we can say that

plim β̂ = plim [ βΣx_i² / (Σx_i² + Σv_i²) ]
       = βVar(x) / (Var(x) + σ_v²)    (since x_i = X_i - X̄ by definition)
       = βσ_x² / (σ_x² + σ_v²)

Since the ratio σ_x²/(σ_x² + σ_v²) is less than one, β̂ is inconsistent and, in the limit, underestimates the true β.
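The attenuation result can be checked by simulation; a sketch, again with purely illustrative parameter values:

    import numpy as np

    rng = np.random.default_rng(1)
    beta, n = 2.0, 200_000                 # large n approximates the probability limit
    sigma_x, sigma_v = 2.0, 1.0

    x = rng.normal(0.0, sigma_x, n)        # true regressor (deviation form)
    v = rng.normal(0.0, sigma_v, n)        # measurement error in x
    eps = rng.normal(0.0, 1.0, n)

    y = beta * x + eps
    x_star = x + v                         # observed regressor

    b_ols = (x_star @ y) / (x_star @ x_star)
    plim = beta * sigma_x**2 / (sigma_x**2 + sigma_v**2)   # = 2 * 4/5 = 1.6
    print(b_ols, plim)                     # both close to 1.6, well below beta = 2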
Now suppose both X and Y have errors of measurement, so that we observe x_i* and y_i* instead of x_i and y_i, such that

y_i* = y_i + u_i  and  x_i* = x_i + v_i

where u_i and v_i represent the errors in the values of y_i and x_i respectively. We make the following assumptions about the error terms:

(i) There is no correlation between the error term and the corresponding variable, i.e., Cov(u_i, y_i) = Cov(v_i, x_i) = 0.

(ii) There is no correlation between the error of one variable and the measurement of the other variable, i.e., Cov(u_i, x_i) = Cov(v_i, y_i) = 0.
On the basis of the above assumptions, our estimated regression equation will be

y_i* = βx_i* + (ε_i + u_i - βv_i) ...(9.8)

and, proceeding exactly as before,

plim β̂ = βVar(x) / (Var(x) + σ_v²) ...(9.9)

Thus, β̂ will not be a consistent estimator of β. The presence of measurement error of the type in question will lead to an underestimate of the true regression parameter if ordinary least squares is used.
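A quick numerical check of the combined case (with the same illustrative variances σ_x² = 4 and σ_v² = 1, so that plim β̂ = 0.8β):

    import numpy as np

    rng = np.random.default_rng(2)
    beta, n = 2.0, 200_000
    x = rng.normal(0.0, 2.0, n)            # sigma_x^2 = 4
    u = rng.normal(0.0, 1.0, n)            # measurement error in y
    v = rng.normal(0.0, 1.0, n)            # measurement error in x, sigma_v^2 = 1

    y_star = beta * x + rng.normal(0.0, 1.0, n) + u   # observed y*
    x_star = x + v                                    # observed x*

    b = (x_star @ y_star) / (x_star @ x_star)
    print(b)   # close to 2 * 4/5 = 1.6: the error in x attenuates the slope,
               # while the error in y adds noise but no further bias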
A well-known illustration of errors in variables is the permanent income hypothesis of consumption, where observed income Y and observed consumption C are the sum of a permanent and a transitory component (superscripts P and T respectively):

Y = Y^P + Y^T ...(9.10)
C = C^P + C^T ...(9.11)

It is also usually assumed that the errors (the transitory components) are correlated neither with each other nor with the permanent components:

Cov(Y^T, C^T) = 0 ...(9.12)
Cov(Y^T, Y^P) = Cov(C^T, C^P) = 0 ...(9.13)

The hypothesis further states that

C^P = β₁Y^P + β₂ + u ...(9.14)

that is, permanent consumption is a linear function of permanent income, where β₁ and β₂ are constant parameters and u is a stochastic disturbance term. Notice that the parameter β₁ in (9.14) represents the marginal propensity to consume (MPC). Combining (9.14) with (9.10) and (9.11) yields

C = β₁Y + β₂ + v, where v = u + C^T - β₁Y^T ...(9.15)
Equation (9.15) is a simple linear regression model of the same form as (9.8). Since observed income Y contains the transitory component Y^T, while the composite error v contains -β₁Y^T, the regressor and the error term in (9.15) are correlated, and OLS yields a biased and inconsistent estimator. To solve this problem, it is necessary to construct a measure of permanent income Y^P before estimating the consumption function.
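A simulation makes the attenuation of the MPC concrete; a sketch in which every number (a true MPC of 0.8, the variances of the components) is a purely illustrative assumption:

    import numpy as np

    rng = np.random.default_rng(3)
    b1, b2, n = 0.8, 50.0, 200_000        # illustrative MPC and intercept
    Yp = rng.normal(500.0, 50.0, n)       # permanent income
    Yt = rng.normal(0.0, 30.0, n)         # transitory income
    Ct = rng.normal(0.0, 10.0, n)         # transitory consumption
    u = rng.normal(0.0, 10.0, n)          # disturbance in (9.14)

    C = b1 * Yp + b2 + u + Ct             # observed consumption, via (9.11) and (9.14)
    Y = Yp + Yt                           # observed income, via (9.10)

    Yd, Cd = Y - Y.mean(), C - C.mean()
    mpc_hat = (Yd @ Cd) / (Yd @ Yd)       # OLS slope of C on Y
    print(mpc_hat)                        # about 0.59 here, well below the true MPC of 0.8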
Check Your Progress 1
1) Explain the concept of errors in variables. What are its consequences?
.................................................................................................................
.................................................................................................................
The method of instrumental variables involves the search for a new variable Z which is highly correlated with the independent variable X and at the same time uncorrelated with the error term in the equation (as well as with the errors of measurement of both variables). In practice, we are concerned with the consistency of parameter estimates and therefore concentrate on the relationship between the variable Z and the remaining variables in the model when the sample size gets large. We define the random variable Z to be an instrument if the following conditions are met:

(i) Z is uncorrelated with the equation error and with the errors of measurement, i.e., Cov(Z, ε) = 0; and
(ii) Z is correlated with the explanatory variable, i.e., Cov(Z, X) ≠ 0.
Assuming for the moment that such a variable can be found, we can alter the least squares regression procedure to obtain estimated parameters that are consistent. Unfortunately, there is no guarantee that the estimation process will yield unbiased parameter estimates. To simplify the matter, let us consider the case of measurement errors in the independent variable, such that y_i = βx_i + ε_i and only x is measured with error (as x* = x + v). In order to solve the problem we use the instrumental variable Z when estimating the regression of y on x*. The instrumental variables estimator of the regression slope in the above model is

β̂_IV = Σz_i y_i / Σz_i x_i* ...(9.16)
The choice of this particular slope formula is made so that the resulting estimator will be consistent. To see this, we can derive the relationship between the instrumental variables estimator and the true slope parameter as follows:

β̂_IV = Σz_i(βx_i* + ε_i*) / Σz_i x_i* = β + Σz_i ε_i* / Σz_i x_i*

where ε_i* = ε_i - βv_i is the composite error in the regression of y on x*. Clearly, the choice of Z as an instrument guarantees that β̂_IV will approach β as the sample size gets large [Cov(z, ε*) approaches 0 while Cov(z, x*) does not] and will therefore be a consistent estimator of β. Remember that the variable x_i* in (9.16) was not replaced by z_i; the instrument enters only through the cross products.
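The following sketch contrasts the OLS and instrumental variables slopes on simulated data (the way the instrument is generated, and all parameter values, are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(4)
    beta, n = 2.0, 200_000
    z = rng.normal(0.0, 1.0, n)                  # instrument
    x = 0.8 * z + rng.normal(0.0, 1.0, n)        # true regressor, correlated with z
    v = rng.normal(0.0, 1.0, n)                  # measurement error in x
    x_star = x + v                               # observed regressor
    y = beta * x + rng.normal(0.0, 1.0, n)

    b_ols = (x_star @ y) / (x_star @ x_star)     # attenuated towards zero
    b_iv = (z @ y) / (z @ x_star)                # the estimator (9.16)
    print(b_ols, b_iv)                           # b_iv is close to beta = 2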
To test for the presence of measurement error, consider again the model

y_i = βx_i + ε_i ...(9.1)

There is a possibility that x might be measured with error, in which case we observe x_i* = x_i + v_i and actually estimate

y_i = βx_i* + (ε_i - βv_i) ...(9.17)

If x is measured with error, we have seen that a consistent estimator of β can be obtained by using an instrument z which is correlated with x* but uncorrelated with ε and v. Suppose the relationship between z and x* is given by the first-stage regression

x_i* = γz_i + η_i ...(9.19)

where η̂_i are the regression residuals. Substituting the value of x_i* from eq. (9.19) into eq. (9.17), and adding the residual η̂_i to the regression as a separate variable with coefficient δ, we have
plim δ̂ = plim [ (1/N)Ση̂_i(ε_i - βv_i) ] / [ (1/N)Ση̂_i² ] = -βσ_v²/σ_η²

which is zero when there is no measurement error (σ_v² = 0) and nonzero otherwise, so that a test of δ = 0 is a test for measurement error.
The Hausman specification test is as follows: if there are two estimators β̂₁ and β̂₂ that converge to the true value β under the null hypothesis but converge to different values under the alternative, the null hypothesis can be verified by testing whether the probability limit of the difference of the two estimators is zero.
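In practice the comparison is usually made through a statistic of the form m = (β̂₁ - β̂₂)² / [Var(β̂₂) - Var(β̂₁)], where β̂₁ is the estimator that is efficient under the null hypothesis; m follows an asymptotic χ² distribution with one degree of freedom. (This is the standard form of the Hausman statistic; the regression version used in the example below is an equivalent way of carrying out the test.)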
Example 9.2

In the first stage the suspect explanatory variable AID is regressed on the instruments and the residuals η̂ are obtained. In the second stage η̂ is added to the original regression to correct for measurement error. The resulting equation is

EXP = -138.51 + 0.00174 AID + 0.00018 INC - 0.275 POP + 1.372 η̂

with a t statistic of 1.73 on the coefficient of η̂.
A two-tailed t test of the null hypothesis that there is no measurement error would be accepted at the 5 per cent level, since 1.73 < 1.96. However, measurement error would appear important if we were using either a one-tailed test, or a two-tailed test at the 10 per cent significance level. Note that correcting for the possibility of measurement error has substantially lowered the coefficient on the AID variable, suggesting that measurement error causes the effect of AID on spending to be overstated.
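The two-step logic of the example can be reproduced on artificial data; a sketch only, not the EXP/AID data themselves (which are not reported here):

    import numpy as np

    rng = np.random.default_rng(5)
    beta, n = 2.0, 1_000
    z = rng.normal(0.0, 1.0, n)                  # instrument
    x = 0.8 * z + rng.normal(0.0, 1.0, n)
    x_star = x + rng.normal(0.0, 1.0, n)         # suspect, mismeasured regressor
    y = beta * x + rng.normal(0.0, 1.0, n)

    # First stage: regress x* on the instrument z and keep the residuals
    gamma_hat = (z @ x_star) / (z @ z)
    resid = x_star - gamma_hat * z

    # Second stage: add the first-stage residuals to the original regression;
    # a significantly nonzero coefficient on resid signals measurement error
    X = np.column_stack([x_star, resid])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coef)                                  # the coefficient on resid is clearly nonzero here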
When we have the variables y and x both measured with errors (the observed values being y* and x*), we consider two regression equations: the direct regression of y on the observed regressors and the inverse regression discussed below. As an illustration, consider the problem of measuring gender discrimination in salaries:

y = β₁x₁ + β₂x₂ + ε ...(9.22)

where

y = salary
x₁ = true qualifications
x₂ = gender: 1 for men, 0 for women

and qualifications are measured with error, so that we observe x₁* = x₁ + v₁.
Thus β₂ > 0 implies that men are paid more than women with the same qualifications, and hence that there is gender discrimination. A direct least squares estimation of eq. (9.22) with x₁* substituted for x₁, together with a finding of β̂₂ > 0, has frequently been used as evidence of gender discrimination. With x₁ measured with error, however, β̂₂ is biased and can be positive even when there is no discrimination, as the sketch below illustrates.
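The danger can be seen in a small simulation in which there is no discrimination at all (β₂ = 0) but men happen to have higher average qualifications, and qualifications are measured with error; all numbers are illustrative:

    import numpy as np

    rng = np.random.default_rng(6)
    b1, b2, n = 1.0, 0.0, 200_000                 # b2 = 0: no true discrimination
    men = rng.integers(0, 2, n).astype(float)     # gender dummy x2
    q = rng.normal(0.0, 1.0, n) + 0.5 * men       # men have higher average qualifications
    q_star = q + rng.normal(0.0, 1.0, n)          # qualifications measured with error
    y = b1 * q + b2 * men + rng.normal(0.0, 0.5, n)   # salary

    X = np.column_stack([np.ones(n), q_star, men])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coef)   # the coefficient on men comes out positive although b2 = 0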
In the inverse regression we take the mismeasured qualifications x₁* as the dependent variable and regress it on salary y and gender x₂.
Check Your Progress 2
1) Show that in the two-variable model the estimator β̂ = Σy_i z_i / Σz_i² (where z is an instrument) will not yield a consistent estimate of the true slope parameter.
.................................................................................................................
.................................................................................................................
2) What is the Hausman specification test? How would you carry out a Hausman specification test to evaluate the presence or absence of measurement error?
...................................................................................................................
9.8 LET US SUM UP
In the ordinary least squares model we assume that sample observations are measured without error, which is not always true. When this assumption does not hold, OLS estimators are biased and inconsistent. Errors may appear in the measurement of the dependent variable, the independent variable, or both. Error in the dependent variable does not destroy the unbiasedness property of the OLS estimators, but the estimated variances are larger than in the case where there are no such errors of measurement. On the other hand, when there is error in the independent variable, or in both the dependent and independent variables, OLS estimates are not only biased but also inconsistent.
We can test for the presence of measurement error by using the Hausman specification test. One technique which is available and can solve the measurement error problem is the technique of instrumental variables estimation.
9.11 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES

2) a) true b) true.