Measurement error
A fundamental assumption in statistical analysis is that all the observations are measured correctly. In
the context of the multiple regression model, it is assumed that the study and explanatory
variables are observed without any error. In many situations, this basic assumption is violated. There can be
several reasons for such a violation.
1. The variables may not be measurable, e.g., taste, climatic conditions, intelligence,
education, ability, etc. In such cases, dummy variables are used, and the observations are
recorded in terms of the values of the dummy variables.
2. Sometimes the variables are clearly defined, but it is hard to take correct observations. For example,
age is generally reported in completed years or in multiples of five.
3. Sometimes the variable is conceptually well defined, but it is not possible to take a correct
observation on it. Instead, the observations are obtained on a closely related proxy variable, e.g., the
level of education is measured by the number of years of schooling.
4. Sometimes the variable is well understood, but it is qualitative in nature. For example, intelligence is
measured by intelligence quotient (IQ) scores.
In all such cases, the true value of the variable cannot be recorded. Instead, it is observed with some error.
The difference between the observed and true values of the variable is called the measurement error or
errors-in-variables.
Consider the model

$$\tilde{y} = \tilde{X}\beta$$

where $\tilde{y}$ is an $n \times 1$ vector of true observations on the study variable, $\tilde{X}$ is an $n \times k$ matrix of true observations
on the explanatory variables and $\beta$ is a $k \times 1$ vector of regression coefficients. The values $\tilde{y}$ and $\tilde{X}$ are not
observable due to the presence of measurement errors. Instead, the values $y$ and $X$ are observed with
additive measurement errors as

$$y = \tilde{y} + u$$
$$X = \tilde{X} + V$$

where $y$ is an $n \times 1$ vector of observed values of the study variable, which are observed with the $n \times 1$
measurement error vector $u$. Similarly, $X$ is an $n \times k$ matrix of observed values of the explanatory variables,
which are observed with the $n \times k$ matrix $V$ of measurement errors in $\tilde{X}$. In such a case, the usual disturbance
term can be assumed to be subsumed in $u$ without loss of generality. Since our aim is to see the impact of
measurement errors, the disturbance term is not considered separately in the present case.
We assume that
$$E(u) = 0, \quad E(uu') = \sigma^2 I,$$
$$E(V) = 0, \quad E(V'V) = \Omega, \quad E(V'u) = 0.$$
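The setup above can be made concrete with a small simulation. The following is a minimal sketch, not part of the original notes: the normal distributions, the parameter values and the function name `simulate_measurement_error_data` are all assumptions chosen for illustration. Substituting $\tilde{X} = X - V$ into $y = \tilde{X}\beta + u$ gives $y = X\beta + \omega$ with composite disturbance $\omega = u - V\beta$, and the sketch checks numerically that the observed regressors are correlated with $\omega$.

```python
import numpy as np

def simulate_measurement_error_data(n, beta, sigma_u, sigma_v, seed=0):
    """Draw data from y~ = X~ beta, y = y~ + u, X = X~ + V.

    Assumptions for illustration only: true regressors are N(1, 2^2),
    errors are normal, and the columns of V are independent with
    standard deviations given by the vector sigma_v.
    """
    rng = np.random.default_rng(seed)
    k = len(beta)
    X_true = rng.normal(loc=1.0, scale=2.0, size=(n, k))  # unobserved X~
    u = rng.normal(0.0, sigma_u, size=n)                   # error in y
    V = rng.normal(0.0, sigma_v, size=(n, k))              # errors in X, one scale per column
    y = X_true @ beta + u                                  # observed study variable
    X = X_true + V                                         # observed regressors
    return X_true, X, y, u, V

beta = np.array([2.0, -1.0])
sigma_v = np.array([0.5, 1.0])
X_true, X, y, u, V = simulate_measurement_error_data(
    200_000, beta, sigma_u=1.0, sigma_v=sigma_v
)

# Composite disturbance of the observed model y = X beta + omega.
omega = u - V @ beta

# Sample cross-moment between observed regressors and omega.
cross_moment = X.T @ omega / len(y)

# Under the assumptions above this should be close to -Sigma_v beta,
# where Sigma_v is the per-observation covariance matrix of the rows of V.
expected = -np.diag(sigma_v**2) @ beta

print("X'omega / n :", cross_moment)
print("-Sigma_v beta:", expected)
```

Because the cross-moment does not vanish, the usual regression assumption that regressors and disturbances are uncorrelated fails in the observed model.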
Suppose we ignore the measurement errors and obtain the ordinary least squares estimator (OLSE). Note that ignoring the measurement errors
in the data does not mean that they are not present. We now examine the properties of such an OLSE under
the measurement error model.
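Substituting $\tilde{X} = X - V$ into $y = \tilde{X}\beta + u$ gives $y = X\beta + \omega$, where $\omega = u - V\beta$ is the composite disturbance term. The OLSE of $\beta$ is $b = (X'X)^{-1}X'y$, so

$$b - \beta = (X'X)^{-1}X'(X\beta + \omega) - \beta = (X'X)^{-1}X'\omega$$

$$E(b - \beta) = E\left[(X'X)^{-1}X'\omega\right] \neq (X'X)^{-1}X'E(\omega) = 0$$

as $X$ is a random matrix which is correlated with $\omega$ (both depend on $V$). So $b$ is a biased estimator of $\beta$.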
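To check the consistency of the OLSE, assume that

$$\operatorname{plim} \frac{1}{n}\tilde{X}'\tilde{X} = \Sigma_{xx}, \quad \operatorname{plim} \frac{1}{n}V'V = \Sigma_{vv}, \quad \operatorname{plim} \frac{1}{n}\tilde{X}'V = 0, \quad \operatorname{plim} \frac{1}{n}\tilde{X}'u = 0, \quad \operatorname{plim} \frac{1}{n}V'u = 0.$$

Then $\operatorname{plim} \frac{1}{n}X'X = \Sigma_{xx} + \Sigma_{vv}$ and, with $\omega = u - V\beta$,

$$\frac{1}{n}X'\omega = \frac{1}{n}\tilde{X}'\omega + \frac{1}{n}V'\omega = \frac{1}{n}\tilde{X}'(u - V\beta) + \frac{1}{n}V'(u - V\beta),$$

so that $\operatorname{plim} \frac{1}{n}X'\omega = -\Sigma_{vv}\beta$. Hence

$$\operatorname{plim}(b - \beta) = \operatorname{plim}\left[\left(\frac{X'X}{n}\right)^{-1}\frac{X'\omega}{n}\right] = -(\Sigma_{xx} + \Sigma_{vv})^{-1}\Sigma_{vv}\beta \neq 0.$$

Thus $b$ is an inconsistent estimator of $\beta$. Such inconsistency arises essentially due to the correlation between
$X$ and $\omega$.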
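The inconsistency can be seen numerically. The sketch below is illustrative only: the distributions, sample size and parameter values are assumptions, not taken from the notes. It fits OLS to error-contaminated regressors and compares the estimate with the limit $\beta - (\Sigma_{xx} + \Sigma_{vv})^{-1}\Sigma_{vv}\beta$, where $\Sigma_{xx}$ and $\Sigma_{vv}$ are the probability limits of $\tilde{X}'\tilde{X}/n$ and $V'V/n$.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
beta = np.array([2.0, -1.0])

# Assumed moments of the true regressors and of the measurement errors.
mean_x = np.array([1.0, 1.0])
cov_x = 4.0 * np.eye(2)
Sigma_vv = np.diag([0.25, 1.0])

X_true = rng.multivariate_normal(mean_x, cov_x, size=n)    # unobserved X~
V = rng.multivariate_normal(np.zeros(2), Sigma_vv, size=n) # measurement errors
u = rng.normal(0.0, 1.0, size=n)

X = X_true + V            # observed regressors
y = X_true @ beta + u     # observed study variable

# OLSE computed from the observed data, ignoring the measurement errors.
b = np.linalg.solve(X.T @ X, X.T @ y)

# plim of X~'X~ / n is the uncentered second moment: covariance + mean outer product.
Sigma_xx = cov_x + np.outer(mean_x, mean_x)
predicted_limit = beta - np.linalg.solve(Sigma_xx + Sigma_vv, Sigma_vv @ beta)

print("true beta      :", beta)
print("OLSE b         :", b)
print("predicted limit:", predicted_limit)
```

With a large sample, `b` should settle near `predicted_limit` rather than near `beta`, and increasing `n` does not close the gap.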
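Note: It should not be misunderstood that the OLSE $b = (X'X)^{-1}X'y$ is obtained by minimizing
$S = \omega'\omega = (y - X\beta)'(y - X\beta)$ in the model $y = X\beta + \omega$. In fact, $\omega'\omega$ cannot be minimized as in the case of
the usual linear regression model, because the composite disturbance $\omega = u - V\beta$ is itself a function of $\beta$.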
To see the nature of the inconsistency, consider the simple linear regression model with measurement error as

$$\tilde{y}_i = \beta_0 + \beta_1 \tilde{x}_i, \quad i = 1, 2, \ldots, n$$
$$y_i = \tilde{y}_i + u_i$$
$$x_i = \tilde{x}_i + v_i.$$
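Now

$$\tilde{X} = \begin{pmatrix} 1 & \tilde{x}_1 \\ 1 & \tilde{x}_2 \\ \vdots & \vdots \\ 1 & \tilde{x}_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad
V = \begin{pmatrix} 0 & v_1 \\ 0 & v_2 \\ \vdots & \vdots \\ 0 & v_n \end{pmatrix}$$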
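and assuming that

$$\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} \tilde{x}_i = \mu,$$
$$\operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} (\tilde{x}_i - \mu)^2 = \sigma_x^2,$$

we have

$$\Sigma_{xx} = \operatorname{plim} \frac{1}{n}\tilde{X}'\tilde{X} = \begin{pmatrix} 1 & \mu \\ \mu & \sigma_x^2 + \mu^2 \end{pmatrix}.$$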
Also,

$$\Sigma_{vv} = \operatorname{plim} \frac{1}{n}V'V = \begin{pmatrix} 0 & 0 \\ 0 & \sigma_v^2 \end{pmatrix}.$$
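Now

$$\operatorname{plim}(b - \beta) = -(\Sigma_{xx} + \Sigma_{vv})^{-1}\Sigma_{vv}\beta$$

$$\operatorname{plim}\begin{pmatrix} b_0 - \beta_0 \\ b_1 - \beta_1 \end{pmatrix}
= -\begin{pmatrix} 1 & \mu \\ \mu & \sigma_x^2 + \mu^2 + \sigma_v^2 \end{pmatrix}^{-1}
\begin{pmatrix} 0 & 0 \\ 0 & \sigma_v^2 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$

$$= -\frac{1}{\sigma_x^2 + \sigma_v^2}
\begin{pmatrix} \sigma_x^2 + \mu^2 + \sigma_v^2 & -\mu \\ -\mu & 1 \end{pmatrix}
\begin{pmatrix} 0 \\ \sigma_v^2 \beta_1 \end{pmatrix}$$

$$= \begin{pmatrix} \dfrac{\mu \sigma_v^2}{\sigma_x^2 + \sigma_v^2}\,\beta_1 \\[2ex] -\dfrac{\sigma_v^2}{\sigma_x^2 + \sigma_v^2}\,\beta_1 \end{pmatrix}.$$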
Thus we find that the OLSEs of $\beta_0$ and $\beta_1$ are biased and inconsistent. So if a variable is subject to
measurement error, the error affects not only the estimator of its own parameter but also the estimators of the
parameters associated with variables that are measured without any error. Here the intercept column is measured
without error, yet the OLSE of $\beta_0$ is inconsistent whenever $\mu\beta_1 \neq 0$. The presence of measurement error in
even a single variable can therefore make the OLSEs of all the regression coefficients inconsistent.
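The limits for the simple regression case can be checked with a short simulation. This is a sketch under assumed values: the normal distributions and the numbers chosen for $\beta_0$, $\beta_1$, $\mu$, $\sigma_x$, $\sigma_v$ and $\sigma_u$ are illustrative and not from the notes. Adding the derived limits to the true values gives $\operatorname{plim} b_1 = \beta_1 \sigma_x^2 / (\sigma_x^2 + \sigma_v^2)$ and $\operatorname{plim} b_0 = \beta_0 + \mu \sigma_v^2 \beta_1 / (\sigma_x^2 + \sigma_v^2)$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000

# Assumed parameter values for illustration.
beta0, beta1 = 1.0, 2.0
mu, sigma_x, sigma_v, sigma_u = 3.0, 2.0, 1.0, 0.5

x_true = rng.normal(mu, sigma_x, size=n)                        # unobserved x~
x = x_true + rng.normal(0.0, sigma_v, size=n)                   # observed x = x~ + v
y = beta0 + beta1 * x_true + rng.normal(0.0, sigma_u, size=n)   # observed y = y~ + u

# OLS of y on the error-contaminated x; polyfit returns [slope, intercept].
b1, b0 = np.polyfit(x, y, deg=1)

# Probability limits implied by the derivation.
denom = sigma_x**2 + sigma_v**2
limit_b1 = beta1 * sigma_x**2 / denom
limit_b0 = beta0 + mu * sigma_v**2 * beta1 / denom

print(f"b1 = {b1:.3f}, limit = {limit_b1:.3f}, true = {beta1}")
print(f"b0 = {b0:.3f}, limit = {limit_b0:.3f}, true = {beta0}")
```

The slope is pulled toward zero by the factor $\sigma_x^2 / (\sigma_x^2 + \sigma_v^2)$, and the intercept absorbs the compensating shift even though the constant term itself carries no measurement error.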
Depending on the assumptions made about the true values $\tilde{x}_i$ of the explanatory variable, the measurement error model takes one of the following three forms.
1. Functional form: When the $\tilde{x}_i$'s are unknown constants (fixed), the measurement error model is
said to be in the functional form.
2. Structural form: When the $\tilde{x}_i$'s are identically and independently distributed random variables, say, with
mean $\mu$ and variance $\sigma^2$ ($\sigma^2 > 0$), the measurement error model is said to be in the structural form.
3. Ultrastructural form: When the $\tilde{x}_i$'s are independently distributed random variables with different
means, say $\mu_i$, and variance $\sigma^2$ ($\sigma^2 > 0$), the model is said to be in the ultrastructural form. This form is
a synthesis of the functional and structural forms in the sense that both forms are particular cases of the
ultrastructural form.
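The relation between the three forms can be sketched in code. The helper `draw_true_x` below is hypothetical, and the normal distribution and the particular means are assumptions for illustration. It draws the true values with individual means $\mu_i$ and common variance $\sigma^2$. Setting all means equal gives the structural form, and letting the variance shrink to zero leaves fixed constants, as in the functional form.

```python
import numpy as np

def draw_true_x(means, sigma, rng):
    """Draw independent true values x~_i ~ N(means[i], sigma^2).

    Hypothetical helper for illustration; normality is an assumption.
    """
    means = np.asarray(means, dtype=float)
    return rng.normal(loc=means, scale=sigma)

rng = np.random.default_rng(0)
n = 10_000
grid = np.linspace(-2.0, 2.0, n)   # assumed individual means mu_i

# Ultrastructural form: different means mu_i, common variance sigma^2 > 0.
x_ultra = draw_true_x(grid, sigma=2.0, rng=rng)

# Structural form: the special case in which all means are equal.
x_structural = draw_true_x(np.full(n, 1.5), sigma=2.0, rng=rng)

# Functional form: the limiting case sigma = 0, so the x~_i are fixed constants.
x_functional = draw_true_x(grid, sigma=0.0, rng=rng)

print("structural sample mean:", x_structural.mean())
print("functional equals its means:", np.allclose(x_functional, grid))
```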