Chapter 16: Measurement Error Models
A fundamental assumption in all statistical analyses is that the observations are correctly measured. In the context of the multiple regression model, it is assumed that the observations on the study and explanatory variables are observed without any error. In many situations, this basic assumption is violated, and there can be several reasons for such a violation.
For example, the variables may not be directly measurable, e.g., taste, climatic conditions, intelligence, education, ability etc. In such cases, dummy variables are used, and the observations are recorded in terms of the values of the dummy variables.
Sometimes the variables are clearly defined, but it is hard to take correct observations. For example, age is generally reported in completed years or in multiples of five.
Sometimes the variable is conceptually well defined, but it is not possible to take a correct observation on it. Instead, observations are obtained on a closely related proxy variable, e.g., the level of education is measured by the number of years of schooling.
Sometimes the variable is well understood, but it is qualitative in nature. For example, intelligence is measured by intelligence quotient (IQ) scores.
In all such cases, the true value of the variable cannot be recorded; instead, it is observed with some error. The difference between the observed and true values of the variable is called measurement error or errors-in-variables.
Consider the multiple regression model
y* = X*β,
where y* is an (n × 1) vector of true observations on the study variable, X* is an (n × k) matrix of true observations on the explanatory variables and β is a (k × 1) vector of regression coefficients. The values of y* and X* are not observable due to the presence of measurement errors. Instead, the values of y* and X* are observed with additive measurement errors as
y = y* + u,
X = X* + V,
where y is the (n × 1) vector of observed values of the study variable, observed with an (n × 1) vector u of measurement errors in y*, and X is the (n × k) matrix of observed values of the explanatory variables, observed with an (n × k) matrix V of measurement errors in X*. In such a case, the usual disturbance term can be assumed to be subsumed in u without loss of generality. Since our aim is to see the impact of the measurement errors, it is not considered separately in the present case.
We assume that
E(u) = 0,  E(uu') = σ²I,
E(V) = 0,  E(V'u) = 0,  plim (1/n) V'V = Ω.
Suppose we ignore the measurement errors and obtain the OLSE. Note that ignoring the measurement errors in the data does not mean that they are not present. We now study the properties of such an OLSE under the setup of the measurement error model. Writing
y = y* + u = X*β + u = (X − V)β + u = Xβ + ε,  ε = u − Vβ,
the OLSE is
b = (X'X)^{-1} X'y = β + (X'X)^{-1} X'ε.
Then
E(b − β) = E[(X'X)^{-1} X'ε] ≠ (X'X)^{-1} X' E(ε) = 0,
because X is a random matrix which is correlated with ε, so the expectation cannot be taken inside term by term. So b becomes a biased estimator of β. Further,
plim(b − β) = [plim (1/n) X'X]^{-1} [plim (1/n) X'ε],
where
(1/n) X'ε = (1/n) X*'ε + (1/n) V'ε = (1/n) X*'(u − Vβ) + (1/n) V'(u − Vβ).
Since plim (1/n) X*'u = 0, plim (1/n) X*'V = 0, plim (1/n) V'u = 0 and plim (1/n) V'V = Ω, we get
plim (1/n) X'ε = −Ωβ ≠ 0.
Thus b is an inconsistent estimator of β. Such inconsistency arises essentially due to the correlation between X and ε.
Note: It should not be misunderstood that the OLSE b = (X'X)^{-1} X'y is obtained by minimizing S = ε'ε = (y − Xβ)'(y − Xβ) in the model y = Xβ + ε. In fact, ε'ε cannot be minimized as in the case of the classical regression framework, because here ε = u − Vβ involves the unobserved measurement errors and is correlated with X.
To see the nature of the inconsistency, consider the simple linear regression model with measurement error
y_i* = β_0 + β_1 x_i*,  i = 1, 2, ..., n,
y_i = y_i* + u_i,
x_i = x_i* + v_i.
Now, in matrix notation, X, X* and V have rows (1, x_i), (1, x_i*) and (0, v_i), i = 1, 2, ..., n, respectively,
and assuming that
plim (1/n) Σ_{i=1}^n x_i* = μ,
plim (1/n) Σ_{i=1}^n (x_i* − μ)² = σ_x²,
we have
Σ_xx = plim (1/n) X*'X* = [[1, μ], [μ, σ_x² + μ²]].
Also,
Ω_vv = plim (1/n) V'V = [[0, 0], [0, σ_v²]].
Now
plim(b − β) = −(Σ_xx + Ω_vv)^{-1} Ω_vv β,
i.e.,
plim [b_0 − β_0; b_1 − β_1] = −[[1, μ], [μ, σ_x² + μ² + σ_v²]]^{-1} [[0, 0], [0, σ_v²]] [β_0; β_1]
= −(1/(σ_x² + σ_v²)) [[σ_x² + μ² + σ_v², −μ], [−μ, 1]] [0; σ_v² β_1]
= [μ σ_v² β_1/(σ_x² + σ_v²); −σ_v² β_1/(σ_x² + σ_v²)],
so that
plim b_0 = β_0 + (μ σ_v²/(σ_x² + σ_v²)) β_1,
plim b_1 = (σ_x²/(σ_x² + σ_v²)) β_1 = β_1/(1 + σ_v²/σ_x²).
Thus we find that the OLSEs of β_0 and β_1 are biased and inconsistent. So if a variable is subject to measurement errors, this affects not only the estimate of its own parameter but also the estimates of the parameters associated with the other variables, even though those variables are measured without any error. In other words, the presence of measurement errors in even a single variable renders the OLSEs of all the regression coefficients inconsistent.
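As a quick numerical check, the following minimal simulation (not part of the notes; all parameter values are arbitrary choices for illustration) verifies the attenuation result plim b_1 = β_1 σ_x²/(σ_x² + σ_v²) and the corresponding bias in b_0.

```python
import numpy as np

# Monte Carlo illustration of the attenuation of OLS under measurement error in x.
rng = np.random.default_rng(0)
n = 200_000                        # large n so sample estimates approximate the plims
beta0, beta1 = 2.0, 1.5            # true coefficients (arbitrary)
mu, sigma_x, sigma_u, sigma_v = 5.0, 2.0, 1.0, 1.5

x_star = rng.normal(mu, sigma_x, n)                      # true explanatory variable
y = beta0 + beta1 * x_star + rng.normal(0, sigma_u, n)   # observed y = y* + u
x = x_star + rng.normal(0, sigma_v, n)                   # observed x = x* + v

X = np.column_stack([np.ones(n), x])                     # design with the error-ridden x
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

kappa = sigma_x**2 / (sigma_x**2 + sigma_v**2)           # attenuation factor
print("b1:", round(b1, 3), " theory:", round(beta1 * kappa, 3))
print("b0:", round(b0, 3), " theory:", round(beta0 + mu * beta1 * (1 - kappa), 3))
```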
Forms of the measurement error model:
1. Functional form: When the x_i*'s are unknown constants (fixed), the measurement error model is said to be in the functional form.
2. Structural form: When the x_i*'s are identically and independently distributed random variables, say, with mean μ and variance σ² (σ² > 0), the measurement error model is said to be in the structural form.
3. Ultrastructural form: When the x_i*'s are independently distributed random variables with different means, say μ_i, and variance σ² (σ² > 0), the model is said to be in the ultrastructural form. This form is a synthesis of the functional and structural forms in the sense that both forms are particular cases of the ultrastructural form.
Instrumental variable estimation:
The instrumental variable (IV) estimator of β based on an (n × k) matrix Z of instruments is
β̂_IV = (Z'X)^{-1} Z'y = β + (Z'X)^{-1} Z'ε,
so that
plim(β̂_IV − β) = [plim (1/n) Z'X]^{-1} [plim (1/n) Z'ε] = Σ_ZX^{-1} · 0 = 0,
where Σ_ZX = plim (1/n) Z'X.
Any instrument that fulfils the requirements of being uncorrelated with the composite disturbance term and correlated with the explanatory variables will result in a consistent estimator of the parameter. However, various sets of variables may satisfy these conditions and so qualify as instrumental variables, and different choices of instruments give different consistent estimators. It is difficult to assert which choice of instruments will give an instrumental variable estimator having the minimum asymptotic variance, and it is also difficult to decide which choice of instrumental variable is more appropriate than another. An additional difficulty is to check whether the chosen instruments are indeed uncorrelated with the disturbance term or not.
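The following minimal sketch (not from the notes; the data generating process and the instrument are made up for illustration) shows the IV estimator (Z'X)^{-1} Z'y recovering β when an instrument correlated with the true regressor but independent of the measurement errors is available.

```python
import numpy as np

# Sketch: IV estimation under measurement error with a valid external instrument z.
rng = np.random.default_rng(1)
n = 200_000
beta0, beta1 = 2.0, 1.5

z = rng.normal(0.0, 1.0, n)                        # instrument (assumed available)
x_star = 5.0 + 2.0 * z + rng.normal(0, 1.0, n)     # true regressor, correlated with z
x = x_star + rng.normal(0, 1.5, n)                 # observed regressor with error v
y = beta0 + beta1 * x_star + rng.normal(0, 1.0, n) # observed y with error u

X = np.column_stack([np.ones(n), x])               # observed design matrix
Z = np.column_stack([np.ones(n), z])               # instrument matrix

b_ols = np.linalg.solve(X.T @ X, X.T @ y)          # inconsistent under measurement error
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)           # instrumental variable estimator

print("OLS:", b_ols.round(3), "  IV:", b_iv.round(3), "  true:", [beta0, beta1])
```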
Choice of instrument:
We discuss some popular choices of instruments in a univariate measurement error model. Consider the
model
y_i = β_0 + β_1 x_i + ε_i,  ε_i = u_i − β_1 v_i,  i = 1, 2, ..., n.
A variable that is likely to satisfy the two requirements of an instrumental variable is the discrete grouping
variable. Wald's, Bartlett's and Durbin's methods are based on different choices of the discrete grouping variable.
1. Wald’s method
Find the median of the given observations x_1, x_2, ..., x_n. Now classify the observations into two groups: one group with those x_i's below the median and another group with those x_i's above the median, and find the means of the x_i's and y_i's in each group, say (x̄_1, ȳ_1) for the group below the median and (x̄_2, ȳ_2) for the group above the median. Define the instrumental variable
Z_i = 1 if x_i is above the median,
Z_i = −1 if x_i is below the median.
Then, with n/2 observations in each group,
Z'X = [[Σ_i 1, Σ_i x_i], [Σ_i Z_i, Σ_i Z_i x_i]] = [[n, n x̄], [0, (n/2)(x̄_2 − x̄_1)]],
Z'y = [Σ_i y_i; Σ_i Z_i y_i] = [n ȳ; (n/2)(ȳ_2 − ȳ_1)].
Hence
[β̂_0IV; β̂_1IV] = (Z'X)^{-1} Z'y
= (1/(n (n/2)(x̄_2 − x̄_1))) [[(n/2)(x̄_2 − x̄_1), −n x̄], [0, n]] [n ȳ; (n/2)(ȳ_2 − ȳ_1)]
= [ȳ − x̄ (ȳ_2 − ȳ_1)/(x̄_2 − x̄_1); (ȳ_2 − ȳ_1)/(x̄_2 − x̄_1)],
so that
β̂_1IV = (ȳ_2 − ȳ_1)/(x̄_2 − x̄_1),
β̂_0IV = ȳ − β̂_1IV x̄.
If n is odd, then the middle observation can be deleted. Under fairly general conditions, the estimators are consistent but are likely to have a large sampling variance. This is the limitation of this method.
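A minimal simulation sketch of Wald's estimator follows (illustrative only; the x_i* are generated in two well-separated groups with a modest measurement error, so that the median split is essentially unaffected by the errors, and all parameter values are arbitrary).

```python
import numpy as np

# Wald's two-group (median-split) estimator on simulated data.
rng = np.random.default_rng(2)
n = 100_000
beta0, beta1 = 2.0, 1.5

# True regressor drawn from two well-separated clusters (illustrative design).
x_star = np.concatenate([rng.normal(2.0, 0.5, n // 2), rng.normal(8.0, 0.5, n // 2)])
x = x_star + rng.normal(0, 1.0, n)                  # observed x with measurement error
y = beta0 + beta1 * x_star + rng.normal(0, 1.0, n)  # observed y with measurement error

med = np.median(x)
low, high = x < med, x >= med                       # the two groups
b1_wald = (y[high].mean() - y[low].mean()) / (x[high].mean() - x[low].mean())
b0_wald = y.mean() - b1_wald * x.mean()

b1_ols = np.cov(x, y, bias=True)[0, 1] / x.var()    # OLS slope for comparison

print("Wald b1:", round(b1_wald, 3), " b0:", round(b0_wald, 3))
print("OLS  b1:", round(b1_ols, 3), "  true b1:", beta1)
```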
2. Bartlett's method
Arrange the observations x_1, x_2, ..., x_n in ascending order. Now three groups can be formed, each containing n/3 observations. Define the instrumental variable as
Z_i = 1 if the observation is in the top group,
Z_i = 0 if the observation is in the middle group,
Z_i = −1 if the observation is in the bottom group.
Now discard the observations in the middle group and compute the means of the x_i's and y_i's in the bottom and top groups, say (x̄_1, ȳ_1) and (x̄_3, ȳ_3) respectively.
Substituting the values of X and Z in β̂_IV = (Z'X)^{-1} Z'y and solving, we get
β̂_1IV = (ȳ_3 − ȳ_1)/(x̄_3 − x̄_1),
β̂_0IV = ȳ − β̂_1IV x̄.
These estimators are consistent. No conclusive evidence is available for comparing Bartlett's method with Wald's method, but the three-group method generally provides more efficient estimates than the two-group method in many cases.
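A minimal sketch of Bartlett's three-group estimator is given below (again illustrative: the x_i* are generated in three well-separated, equally sized clusters so that the three groups essentially coincide with the clusters; all parameter values are arbitrary).

```python
import numpy as np

# Bartlett's three-group estimator on simulated data.
rng = np.random.default_rng(3)
m = 60_000                                   # observations per cluster, n = 3m
beta0, beta1 = 2.0, 1.5

x_star = np.concatenate([rng.normal(c, 0.5, m) for c in (0.0, 10.0, 20.0)])
x = x_star + rng.normal(0, 2.0, 3 * m)       # observed x with measurement error
y = beta0 + beta1 * x_star + rng.normal(0, 1.0, 3 * m)

order = np.argsort(x)                        # rank the observations by observed x
bottom, top = order[:m], order[-m:]          # bottom and top thirds; middle discarded

b1_bart = (y[top].mean() - y[bottom].mean()) / (x[top].mean() - x[bottom].mean())
b0_bart = y.mean() - b1_bart * x.mean()
b1_ols = np.cov(x, y, bias=True)[0, 1] / x.var()

print("Bartlett b1:", round(b1_bart, 3), " b0:", round(b0_bart, 3))
print("OLS      b1:", round(b1_ols, 3), "  true b1:", beta1)
```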
3. Durbin’s method
Let x_1, x_2, ..., x_n be the observations. Arrange them in ascending order and define the instrumental variable Z_i as the rank of x_i. Then, substituting the suitable values of Z and X in β̂_IV = (Z'X)^{-1} Z'y, we get
β̂_1IV = Σ_{i=1}^n Z_i (y_i − ȳ) / Σ_{i=1}^n Z_i (x_i − x̄).
Since this estimator uses more information, it is believed to be superior in efficiency to the other grouping methods. However, nothing definite is known about the efficiency of this method. In general, the instrumental variable estimators may have fairly large standard errors in comparison to the ordinary least squares estimator, which is the price paid for consistency. However, inconsistent estimators have little appeal.
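A minimal sketch of the computation of Durbin's estimator follows (simulated data with arbitrary parameter values; the measurement error is taken small relative to the spread of the true values so that the observed ranks stay close to the true ranks).

```python
import numpy as np

# Durbin's estimator: use the rank of the observed x_i as the instrument Z_i.
rng = np.random.default_rng(4)
n = 100_000
beta0, beta1 = 2.0, 1.5

x_star = rng.normal(10.0, 5.0, n)            # true regressor (spread large relative to error)
x = x_star + rng.normal(0, 0.5, n)           # observed x with a small measurement error
y = beta0 + beta1 * x_star + rng.normal(0, 1.0, n)

Z = np.argsort(np.argsort(x)) + 1.0          # ranks 1..n of the observed x_i
b1_durbin = np.sum(Z * (y - y.mean())) / np.sum(Z * (x - x.mean()))
b0_durbin = y.mean() - b1_durbin * x.mean()

print("Durbin b1:", round(b1_durbin, 3), " b0:", round(b0_durbin, 3), " true:", beta1, beta0)
```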
Maximum likelihood estimation:
Assume
E(u_i) = 0,  E(u_i u_j) = σ_u² if i = j and 0 if i ≠ j,
E(v_i) = 0,  E(v_i v_j) = σ_v² if i = j and 0 if i ≠ j,
E(u_i v_j) = 0 for all i = 1, 2, ..., n; j = 1, 2, ..., n.
For the application of the method of maximum likelihood, we assume the normal distribution for u_i and v_i. We consider the estimation of parameters in the structural form of the model, in which the x_i*'s are stochastic. So assume
x_i* ~ N(μ, σ²).
Then
E(x_i) = E(x_i*) + E(v_i) = μ,
Var(x_i) = σ² + σ_v²,
E(y_i) = β_0 + β_1 E(x_i*) = β_0 + β_1 μ,
Var(y_i) = E[y_i − E(y_i)]² = E[β_0 + β_1 x_i* + u_i − β_0 − β_1 μ]² = β_1² σ² + σ_u²,
Cov(x_i, y_i) = E[(x_i* + v_i − μ)(β_1 (x_i* − μ) + u_i)] = β_1 σ² + 0 + 0 + 0 = β_1 σ².
So the likelihood function is
L = (2π σ_u²)^{−n/2} (2π σ_v²)^{−n/2} exp[−(1/(2σ_u²)) Σ_{i=1}^n (y_i − β_0 − β_1 x_i*)²] exp[−(1/(2σ_v²)) Σ_{i=1}^n (x_i − x_i*)²].
The log-likelihood is
L* = ln L = constant − (n/2) ln σ_u² − (n/2) ln σ_v² − (1/(2σ_u²)) Σ_{i=1}^n (y_i − β_0 − β_1 x_i*)² − (1/(2σ_v²)) Σ_{i=1}^n (x_i − x_i*)².
The normal equations are obtained by equating the partial derivatives of L* to zero:
(1) ∂L*/∂β_0 = (1/σ_u²) Σ_{i=1}^n (y_i − β_0 − β_1 x_i*) = 0,
(2) ∂L*/∂β_1 = (1/σ_u²) Σ_{i=1}^n x_i* (y_i − β_0 − β_1 x_i*) = 0,
(3) ∂L*/∂x_i* = (β_1/σ_u²)(y_i − β_0 − β_1 x_i*) + (1/σ_v²)(x_i − x_i*) = 0,  i = 1, 2, ..., n,
(4) ∂L*/∂σ_u² = −n/(2σ_u²) + (1/(2σ_u⁴)) Σ_{i=1}^n (y_i − β_0 − β_1 x_i*)² = 0,
(5) ∂L*/∂σ_v² = −n/(2σ_v²) + (1/(2σ_v⁴)) Σ_{i=1}^n (x_i − x_i*)² = 0.
These are (n + 4) equations in (n + 4) parameters. Summing equation (3) over i = 1, 2, ..., n and using equation (1) gives Σ_{i=1}^n (x_i − x_i*) = 0. Proceeding in this way, the normal equations lead to relations between the parameters and the sample moments of the observed (x_i, y_i); these can be used to estimate the two means μ and β_0 + β_1 μ, the two variances and the one covariance of (x_i, y_i).
The six parameters μ, β_0, β_1, σ_u², σ_v² and σ² can be estimated from the following five structural relations derived from these normal equations:
(i) x̄ = μ,
(ii) ȳ = β_0 + β_1 μ,
(iii) m_xx = σ² + σ_v²,
(iv) m_yy = β_1² σ² + σ_u²,
(v) m_xy = β_1 σ²,
where x̄ = (1/n) Σ_{i=1}^n x_i, ȳ = (1/n) Σ_{i=1}^n y_i, m_xx = (1/n) Σ_{i=1}^n (x_i − x̄)², m_yy = (1/n) Σ_{i=1}^n (y_i − ȳ)² and m_xy = (1/n) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ). These relations equate the sample moments with the corresponding population moments
E(x) = μ,  E(y) = β_0 + β_1 μ,  Var(x) = σ² + σ_v²,  Var(y) = β_1² σ² + σ_u²,  Cov(x, y) = β_1 σ².
We observe that there are six parameters β_0, β_1, μ, σ², σ_u² and σ_v² to be estimated on the basis of the five structural equations (i)-(v), so no unique solution exists. Only μ can be uniquely determined, while the remaining parameters cannot. Thus only μ is identifiable and the remaining parameters are unidentifiable; this is called the problem of identification. One relation is short of what is needed for a unique solution, so an additional a priori restriction relating any of the six parameters is required.
Note: The same equations (i)-(v) can also be derived using the method of moments, in which the structural equations are obtained by equating the sample and population moments. The assumption of a normal distribution for u_i, v_i and x_i* is not needed in the case of the method of moments.
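The following minimal simulation sketch (not part of the notes; the parameter values β_0 = 2, β_1 = 1.5, μ = 5, σ² = 4, σ_u² = 1, σ_v² = 2.25 are arbitrary choices) illustrates relations (i)-(v) by comparing the sample moments with their population counterparts.

```python
import numpy as np

# Check of the moment relations (i)-(v) for the structural form on simulated data.
rng = np.random.default_rng(5)
n = 500_000
beta0, beta1 = 2.0, 1.5
mu, sigma, sigma_u, sigma_v = 5.0, 2.0, 1.0, 1.5   # sigma^2 = 4, sigma_u^2 = 1, sigma_v^2 = 2.25

x_star = rng.normal(mu, sigma, n)
x = x_star + rng.normal(0, sigma_v, n)
y = beta0 + beta1 * x_star + rng.normal(0, sigma_u, n)

xbar, ybar = x.mean(), y.mean()
m_xx = ((x - xbar) ** 2).mean()
m_yy = ((y - ybar) ** 2).mean()
m_xy = ((x - xbar) * (y - ybar)).mean()

print("(i)   xbar =", round(xbar, 3), " vs mu                      =", mu)
print("(ii)  ybar =", round(ybar, 3), " vs beta0 + beta1*mu        =", beta0 + beta1 * mu)
print("(iii) m_xx =", round(m_xx, 3), " vs sigma^2 + sigma_v^2     =", sigma**2 + sigma_v**2)
print("(iv)  m_yy =", round(m_yy, 3), " vs beta1^2 sigma^2 + s_u^2 =", beta1**2 * sigma**2 + sigma_u**2)
print("(v)   m_xy =", round(m_xy, 3), " vs beta1 * sigma^2         =", beta1 * sigma**2)
```

With these values the population moments are x̄ ≈ 5, ȳ ≈ 9.5, m_xx ≈ 6.25, m_yy ≈ 10 and m_xy ≈ 6, and these rounded figures are used as hypothetical sample moments in the worked illustrations of the cases below.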
Once β̂_1 is uniquely determined, μ is estimated by μ̂ = x̄ and β_0 by β̂_0 = ȳ − β̂_1 x̄. So we consider the estimation of β_1, σ², σ_u² and σ_v² only. Some additional information is required for the unique determination of these parameters, and we now consider various types of additional information which are used for estimating the parameters uniquely.
1. σ_v² is known
Suppose σ_v² is known a priori. Now the remaining parameters can be estimated as follows:
m_xx = σ² + σ_v²  ⟹  σ̂² = m_xx − σ_v²,
m_xy = β_1 σ²  ⟹  β̂_1 = m_xy/(m_xx − σ_v²),
m_yy = β_1² σ² + σ_u²  ⟹  σ̂_u² = m_yy − β̂_1² σ̂² = m_yy − m_xy²/(m_xx − σ_v²).
Note that σ̂² = m_xx − σ_v² can be negative because σ_v² is known while m_xx is based upon the sample, so we assume that m_xx > σ_v². Similarly, σ̂_u² is also assumed to be positive under a suitable condition. All the estimators β̂_1, σ̂² and σ̂_u² are consistent estimators of β_1, σ² and σ_u² respectively. Note that β̂_1 looks as if the direct regression estimator of β_1 has been adjusted by σ_v² for its inconsistency, so it is also termed the adjusted estimator.
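As a quick worked illustration (the sample moments here are hypothetical, taken as the rounded population values from the simulation sketch above): if m_xx = 6.25, m_yy = 10, m_xy = 6 and σ_v² = 2.25 is known, then σ̂² = 6.25 − 2.25 = 4, β̂_1 = 6/4 = 1.5 and σ̂_u² = 10 − 6²/4 = 1.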
2. σ_u² is known
Suppose σ_u² is known a priori. Then, using m_xy = β_1 σ² in m_yy = β_1² σ² + σ_u², we can write
m_yy = β_1 m_xy + σ_u²,
so that
β̂_1 = (m_yy − σ_u²)/m_xy,  provided m_yy > σ_u²,
σ̂² = m_xy/β̂_1,
σ̂_v² = m_xx − σ̂².
The estimators β̂_1, σ̂² and σ̂_v² are consistent estimators of β_1, σ² and σ_v² respectively. Note that β̂_1 looks as if the reverse regression estimator of β_1 has been adjusted by σ_u² for its inconsistency, so it is also termed an adjusted estimator.
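With the same hypothetical sample moments (m_xx = 6.25, m_yy = 10, m_xy = 6) and σ_u² = 1 known: β̂_1 = (10 − 1)/6 = 1.5, σ̂² = 6/1.5 = 4 and σ̂_v² = 6.25 − 4 = 2.25.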
3. λ = σ_u²/σ_v² is known
Suppose the ratio λ = σ_u²/σ_v² is known a priori. Consider
m_yy = β_1² σ² + σ_u²
= β_1 m_xy + λ σ_v²                    (using (v))
= β_1 m_xy + λ (m_xx − σ²)             (using (iii))
= β_1 m_xy + λ (m_xx − m_xy/β_1)       (using (v)),
which gives the quadratic equation
β_1² m_xy − β_1 (m_yy − λ m_xx) − λ m_xy = 0.
Its roots are
β̂_1 = [(m_yy − λ m_xx) ± √((m_yy − λ m_xx)² + 4 λ m_xy²)]/(2 m_xy) = [(m_yy − λ m_xx) ± √U]/(2 m_xy), say,
where U = (m_yy − λ m_xx)² + 4 λ m_xy². Since λ > 0 and m_xy² ≥ 0, U must be nonnegative. Choosing the positive sign of the square root, so that β̂_1 has the same sign as m_xy, gives the consistent solution
β̂_1 = [(m_yy − λ m_xx) + √U]/(2 m_xy).
The other estimates are
σ̂_v² = (m_yy − 2 β̂_1 m_xy + β̂_1² m_xx)/(λ + β̂_1²),
σ̂_u² = λ σ̂_v²,
σ̂² = m_xy/β̂_1.
Note that the same estimator β̂_1 of β_1 can be obtained by orthogonal regression. This amounts to transforming x_i to x_i/σ_v and y_i to y_i/σ_u, so that the two measurement errors have equal variance, and using the orthogonal regression estimation with the transformed variables.
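For the same hypothetical sample moments (m_xx = 6.25, m_yy = 10, m_xy = 6) with λ = σ_u²/σ_v² = 1/2.25 = 4/9 known: m_yy − λ m_xx = 10 − 25/9 = 65/9, U = (65/9)² + 4(4/9)(36) = 9409/81, √U = 97/9, so β̂_1 = (65/9 + 97/9)/12 = 1.5, σ̂_v² = (10 − 2(1.5)(6) + (1.5)²(6.25))/((4/9) + 2.25) = (97/16)/(97/36) = 2.25, σ̂_u² = (4/9)(2.25) = 1 and σ̂² = 6/1.5 = 4.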
4. The reliability ratio is known
The reliability ratio associated with the explanatory variable is defined as
K_x = σ²/(σ² + σ_v²),
i.e., the ratio of the variances of the true and observed values of the explanatory variable, so that 0 ≤ K_x ≤ 1. The value K_x = 1 means σ_v² = 0, i.e., there is no measurement error in the explanatory variable, and K_x = 0 means σ² = 0, which means the explanatory variable is fixed. A higher value of K_x is obtained when σ_v² is small, i.e., when the impact of measurement errors is small. Suppose the reliability ratio K_x is known a priori. Then, using
m_xx = σ² + σ_v²,
m_xy = β_1 σ²,
we have
m_xy/m_xx = β_1 σ²/(σ² + σ_v²) = β_1 K_x,
so that
β̂_1 = m_xy/(K_x m_xx),
σ̂² = K_x m_xx,
σ̂_v² = (1 − K_x) m_xx.
Note that β̂_1 = K_x^{-1} b, where b = m_xy/m_xx is the ordinary least squares estimator of β_1.
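With the same hypothetical sample moments and K_x = 4/6.25 = 0.64 known: b = 6/6.25 = 0.96, β̂_1 = 0.96/0.64 = 1.5, σ̂² = 0.64 × 6.25 = 4 and σ̂_v² = 0.36 × 6.25 = 2.25.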
5. β_0 is known
Suppose β_0 is known a priori and E(x) = μ ≠ 0. Then, from ȳ = β_0 + β_1 μ and μ̂ = x̄,
β̂_1 = (ȳ − β_0)/x̄,
σ̂² = m_xy/β̂_1,
σ̂_u² = m_yy − β̂_1 m_xy,
σ̂_v² = m_xx − m_xy/β̂_1.
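With the same hypothetical moments together with x̄ = 5 and ȳ = 9.5, and β_0 = 2 known: β̂_1 = (9.5 − 2)/5 = 1.5, σ̂² = 6/1.5 = 4, σ̂_u² = 10 − 1.5 × 6 = 1 and σ̂_v² = 6.25 − 4 = 2.25.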
Note: In each of the cases 1-6, the form of the estimate depends on the type of available information, which is needed for consistent estimation of the parameters. Such information can be available from various sources, e.g., the long association of the experimenter with the experiment, similar types of studies conducted in the past, some external source, etc.
Estimation of parameters in the functional form:
The functional form treats the x_i*'s as fixed. This may appear unrealistic in the sense that when the x_i*'s are unobservable and unknown, it is difficult to know if they are fixed or not, and it cannot be ensured even in repeated sampling that the same value is repeated. All that can be said is that the inference, in this case, is conditional upon the x_i*'s. So assume that the x_i*'s are fixed and the analysis is conditional on them. With u_i and v_i normally distributed, the likelihood function is
L = (2π σ_u²)^{−n/2} exp[−(1/(2σ_u²)) Σ_{i=1}^n (y_i − β_0 − β_1 x_i*)²] (2π σ_v²)^{−n/2} exp[−(1/(2σ_v²)) Σ_{i=1}^n (x_i − x_i*)²].
The log-likelihood is
L* = ln L = constant − (n/2) ln σ_u² − (n/2) ln σ_v² − (1/(2σ_u²)) Σ_{i=1}^n (y_i − β_0 − β_1 x_i*)² − (1/(2σ_v²)) Σ_{i=1}^n (x_i − x_i*)².
The normal equations are obtained by partially differentiating L* and equating to zero:
(I) ∂L*/∂β_0 = (1/σ_u²) Σ_{i=1}^n (y_i − β_0 − β_1 x_i*) = 0,
(II) ∂L*/∂β_1 = (1/σ_u²) Σ_{i=1}^n x_i* (y_i − β_0 − β_1 x_i*) = 0,
(III) ∂L*/∂σ_u² = −n/(2σ_u²) + (1/(2σ_u⁴)) Σ_{i=1}^n (y_i − β_0 − β_1 x_i*)² = 0,
(IV) ∂L*/∂σ_v² = −n/(2σ_v²) + (1/(2σ_v⁴)) Σ_{i=1}^n (x_i − x_i*)² = 0,
(V) ∂L*/∂x_i* = (β_1/σ_u²)(y_i − β_0 − β_1 x_i*) + (1/σ_v²)(x_i − x_i*) = 0,  i = 1, 2, ..., n.
Squaring equation (V), summing over i = 1, 2, ..., n and using equations (III) and (IV), we get
n β_1²/σ_u² = n/σ_v²,  i.e.,  β_1 = σ_u/σ_v,
which is unacceptable because β_1 can be negative also; in the present case, as σ_u > 0 and σ_v > 0, β_1 would always be positive. Thus the maximum likelihood estimation breaks down because of insufficient information in the model, and increasing the sample size n does not solve the problem. If restrictions like σ_u² known, σ_v² known or σ_u²/σ_v² known are incorporated, then the maximum likelihood estimation is similar to that in the case of the structural form, and similar estimates may be obtained. For example, if σ_u²/σ_v² is known, then substitute it in the likelihood function and maximize it; the same solutions as in the case of the structural form are obtained.