Chapter 11
Autocorrelation
One of the basic assumptions in the linear regression model is that the random error components or disturbances are identically and independently distributed. So in the model $y = X\beta + u$, it is assumed that
$$E(u_t u_{t-s}) = \begin{cases} \sigma_u^2 & \text{if } s = 0 \\ 0 & \text{if } s \neq 0, \end{cases}$$
i.e., the correlation between successive disturbances is zero.
When the part $E(u_t u_{t-s}) = \sigma_u^2,\ s = 0$ of this assumption is violated, i.e., the variance of the disturbance term does not remain constant, the problem of heteroskedasticity arises. When $E(u_t u_{t-s}) = 0,\ s \neq 0$ is violated, i.e., the variance of the disturbance term remains constant but successive disturbance terms are correlated, the problem is termed autocorrelation.
When autocorrelation is present, some or all off-diagonal elements in $E(uu')$ are nonzero.
Sometimes the study and explanatory variables have a natural sequence order over time, i.e., the data are collected with respect to time. Such data are termed time series data. The disturbance terms in time series data are often serially correlated.
The autocovariance at lag $s$ is defined as
$$\gamma_s = E(u_t u_{t-s}); \quad s = 0, \pm 1, \pm 2, \ldots.$$
At zero lag, we have constant variance, i.e.,
$$\gamma_0 = E(u_t^2) = \sigma_u^2.$$
The autocorrelation coefficient at lag $s$ is defined as
$$\rho_s = \frac{E(u_t u_{t-s})}{\sqrt{\operatorname{Var}(u_t)\operatorname{Var}(u_{t-s})}} = \frac{\gamma_s}{\gamma_0}; \quad s = 0, \pm 1, \pm 2, \ldots$$
Assume $\rho_s$ and $\gamma_s$ are symmetrical in $s$, i.e., these coefficients are constant over time and depend only on the length of lag $s$. The autocorrelation between the successive terms $(u_2, u_1), (u_3, u_2), \ldots, (u_n, u_{n-1})$ gives the autocorrelation of order one, i.e., $\rho_1$. Similarly, the autocorrelation between the terms $(u_3, u_1), (u_4, u_2), \ldots, (u_n, u_{n-2})$ gives the autocorrelation of order two, i.e., $\rho_2$.
Regression Analysis | Chapter 11 | Autocorrelation | Shalabh, IIT Kanpur
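The definition of $\rho_s$ translates directly into a sample analogue. The sketch below is a minimal illustration, assuming a mean-zero series (as the disturbances are) and the divisor-$n$ convention for both $\hat\gamma_s$ and $\hat\gamma_0$; the function name `sample_autocorrelation` is hypothetical.

```python
def sample_autocorrelation(u, s):
    """Sample analogue of rho_s = gamma_s / gamma_0 for a mean-zero series u.

    gamma_s is estimated by (1/n) * sum_t u_t * u_{t-s} and gamma_0 by (1/n) * sum_t u_t^2.
    """
    n = len(u)
    s = abs(s)  # rho_s is symmetric in s
    if s >= n:
        raise ValueError("lag must be smaller than the series length")
    gamma_0 = sum(v * v for v in u) / n
    gamma_s = sum(u[t] * u[t - s] for t in range(s, n)) / n
    return gamma_s / gamma_0


# A strictly alternating series: neighbours have opposite signs, terms two apart agree.
u = [1.0, -1.0] * 50
print(sample_autocorrelation(u, 1), sample_autocorrelation(u, 2))  # prints -0.99 0.98
```

With the divisor-$n$ convention the estimate is shrunk slightly towards zero at larger lags (here $-99/100$ and $98/100$ rather than exactly $-1$ and $1$).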
Source of autocorrelation
Some of the possible reasons for the introduction of autocorrelation in the data are as follows:
1. Carryover of effect, at least in part, is an important source of autocorrelation. For example, monthly data on household expenditure are influenced by the expenditure of the preceding month. Autocorrelation can be present in cross-section data as well as in time series data. In cross-section data, neighboring units tend to be similar with respect to the characteristic under study. In time series data, time is the factor that produces autocorrelation. Whenever some ordering of the sampling units is present, autocorrelation may arise.
2. Another source of autocorrelation is the deletion of some variables. In regression modeling, it is not possible to include all the variables in the model. There can be various reasons for this, e.g., some variables may be qualitative, or direct observations may not be available on a variable. The joint effect of such deleted variables gives rise to autocorrelation in the data.
3. Misspecification of the form of the relationship can also introduce autocorrelation in the data. It is assumed that the relationship between the study and explanatory variables is linear. If log or exponential terms are present in the true relationship, so that the linearity of the model is questionable, this also gives rise to autocorrelation in the data.
4. The difference between the observed and true values of a variable is called measurement error or errors-in-variables. The presence of measurement errors in the dependent variable may also introduce autocorrelation in the data.
Structure of disturbance term:
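Consider the situation where the disturbances are autocorrelated,
$$E(uu') = \begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{n-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{n-1} & \gamma_{n-2} & \cdots & \gamma_0 \end{pmatrix}
= \gamma_0 \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \cdots & \rho_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n-1} & \rho_{n-2} & \cdots & 1 \end{pmatrix}
= \sigma_u^2 \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \cdots & \rho_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n-1} & \rho_{n-2} & \cdots & 1 \end{pmatrix}.$$
Observe that now there are $(n + k)$ parameters: $\beta_1, \beta_2, \ldots, \beta_k, \sigma_u^2, \rho_1, \rho_2, \ldots, \rho_{n-1}$. These $(n + k)$ parameters are to be estimated on the basis of the available $n$ observations. Since the number of parameters exceeds the number of observations, the situation is not good from the statistical point of view. In order to handle the situation, some special form and structure of the disturbance term needs to be assumed so that the number of parameters in the covariance matrix of the disturbance term can be reduced.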
The following structures are popular in autocorrelation:

1. Autoregressive (AR) process.
2. Moving average (MA) process.
3. Autoregressive moving average (ARMA) process.
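Estimation under the first order autoregressive process:
Consider a simple linear regression model
$$y_t = \beta_0 + \beta_1 X_t + u_t, \quad t = 1, 2, \ldots, n.$$
Assume the $u_t$'s follow a first order autoregressive scheme defined as
$$u_t = \rho u_{t-1} + \varepsilon_t$$
where $|\rho| < 1$, $E(\varepsilon_t) = 0$,
$$E(\varepsilon_t \varepsilon_{t+s}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } s = 0 \\ 0 & \text{if } s \neq 0 \end{cases}$$
for all $t = 1, 2, \ldots, n$, where $\rho$ is the first order autocorrelation between $u_t$ and $u_{t-1}$, $t = 1, 2, \ldots, n$. Now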
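$$\begin{aligned}
u_t &= \rho u_{t-1} + \varepsilon_t \\
&= \rho(\rho u_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\
&= \varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \cdots \\
&= \sum_{r=0}^{\infty} \rho^r \varepsilon_{t-r},
\end{aligned}$$
$$E(u_t) = 0,$$
$$\begin{aligned}
E(u_t^2) &= E(\varepsilon_t^2) + \rho^2 E(\varepsilon_{t-1}^2) + \rho^4 E(\varepsilon_{t-2}^2) + \cdots \\
&= (1 + \rho^2 + \rho^4 + \cdots)\,\sigma_\varepsilon^2 \qquad (\varepsilon_t\text{'s are serially independent}).
\end{aligned}$$
Since $|\rho| < 1$, the geometric series sums to $1/(1 - \rho^2)$, so
$$E(u_t^2) = \sigma_u^2 = \frac{\sigma_\varepsilon^2}{1 - \rho^2} \quad \text{for all } t,$$
$$\begin{aligned}
E(u_t u_{t-1}) &= E\big[(\varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \cdots)(\varepsilon_{t-1} + \rho\varepsilon_{t-2} + \rho^2\varepsilon_{t-3} + \cdots)\big] \\
&= E\big[\{\varepsilon_t + \rho(\varepsilon_{t-1} + \rho\varepsilon_{t-2} + \cdots)\}\{\varepsilon_{t-1} + \rho\varepsilon_{t-2} + \cdots\}\big] \\
&= \rho\, E\big[(\varepsilon_{t-1} + \rho\varepsilon_{t-2} + \cdots)^2\big] \\
&= \rho\,\sigma_u^2.
\end{aligned}$$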
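Similarly,
$$E(u_t u_{t-2}) = \rho^2 \sigma_u^2.$$
In general,
$$E(u_t u_{t-s}) = \rho^s \sigma_u^2,$$
$$E(uu') = \Omega = \sigma_u^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}.$$
Note that the disturbance terms are no longer independent and $E(uu') \neq \sigma^2 I$. The disturbances are nonspherical.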
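The moments derived above can be checked by simulation. This is a minimal sketch using only Python's standard library, assuming Gaussian $\varepsilon_t$, $\rho = 0.6$, $\sigma_\varepsilon = 1$ and an arbitrary seed; the helper `simulate_ar1` is a hypothetical name. Starting $u_1$ from $N(0, \sigma_\varepsilon^2/(1-\rho^2))$ makes every $u_t$ share the variance $\sigma_u^2$.

```python
import random


def simulate_ar1(n, rho, sigma_eps, seed=0):
    """Simulate u_t = rho * u_{t-1} + eps_t with eps_t ~ N(0, sigma_eps^2) i.i.d.

    The first value is drawn from the stationary distribution N(0, sigma_eps^2 / (1 - rho^2)),
    so all u_t have the same variance sigma_u^2 (requires |rho| < 1).
    """
    rng = random.Random(seed)
    sigma_u = sigma_eps / (1.0 - rho ** 2) ** 0.5
    u = [rng.gauss(0.0, sigma_u)]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0.0, sigma_eps))
    return u


rho, sigma_eps = 0.6, 1.0
u = simulate_ar1(200_000, rho, sigma_eps)
n = len(u)
var_hat = sum(v * v for v in u) / n                                   # estimates E(u_t^2)
cov1_hat = sum(u[t] * u[t - 1] for t in range(1, n)) / (n - 1)      # estimates E(u_t u_{t-1})
sigma_u2 = sigma_eps ** 2 / (1 - rho ** 2)                            # theoretical sigma_u^2
print(f"E(u_t^2):       simulated {var_hat:.3f}, theory {sigma_u2:.3f}")
print(f"E(u_t u_(t-1)): simulated {cov1_hat:.3f}, theory {rho * sigma_u2:.3f}")
```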
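Consequences of autocorrelated disturbances:
Consider the model with first order autoregressive disturbances
$$\underset{n\times 1}{y} = \underset{n\times k}{X}\,\underset{k\times 1}{\beta} + \underset{n\times 1}{u}, \qquad u_t = \rho u_{t-1} + \varepsilon_t, \quad t = 1, 2, \ldots, n$$
with assumptions
$$E(u) = 0, \quad E(uu') = \Omega, \quad E(\varepsilon_t) = 0, \quad E(\varepsilon_t \varepsilon_{t+s}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } s = 0 \\ 0 & \text{if } s \neq 0 \end{cases}$$
where $\Omega$ is a positive definite matrix.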
The ordinary least squares estimator of $\beta$ is
$$\begin{aligned} b &= (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u), \\ b - \beta &= (X'X)^{-1}X'u, \\ E(b - \beta) &= 0. \end{aligned}$$
So the OLSE remains unbiased under autocorrelated disturbances.
The covariance matrix of $b$ is
$$\begin{aligned} V(b) &= E(b - \beta)(b - \beta)' \\ &= (X'X)^{-1}X'E(uu')X(X'X)^{-1} \\ &= (X'X)^{-1}X'\Omega X(X'X)^{-1} \\ &\neq \sigma_u^2 (X'X)^{-1}. \end{aligned}$$
The residual vector is
$$e = y - Xb = Hy = Hu,$$
$$e'e = y'Hy = u'Hu,$$
$$E(e'e) = E(u'u) - E\big[u'X(X'X)^{-1}X'u\big] = n\sigma_u^2 - \operatorname{tr}\big[(X'X)^{-1}X'\Omega X\big].$$
Since $s^2 = \dfrac{e'e}{n-1}$,
$$E(s^2) = \frac{n\sigma_u^2}{n-1} - \frac{1}{n-1}\operatorname{tr}\big[(X'X)^{-1}X'\Omega X\big],$$
so $s^2$ is a biased estimator of $\sigma_u^2$. In fact, $s^2$ has a downward bias.
Application of OLS fails in the case of autocorrelation in the data and leads to serious consequences such as
- an overly optimistic view from $R^2$,
- narrow confidence intervals,
- misleading results from the usual $t$-ratio and $F$-ratio tests,
- predictions with large variances.
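One way to see why the usual $t$-ratios and confidence intervals mislead is to compare the true variance of the OLS slope, taken from $V(b) = (X'X)^{-1}X'\Omega X(X'X)^{-1}$, with the usual formula based on $\sigma_u^2(X'X)^{-1}$. The sketch below does this for the simple regression $y_t = \beta_0 + \beta_1 x_t + u_t$ under the AR(1) covariance $\Omega_{ts} = \sigma_u^2\rho^{|t-s|}$; the trending regressor $x_t = t$, the values of $\rho$ and the function name are illustrative assumptions.

```python
def slope_variance_ratio(x, rho):
    """True Var(b1) under AR(1) disturbances divided by the usual formula sigma_u^2 / Sxx.

    For y_t = beta0 + beta1 x_t + u_t the OLS slope is b1 = sum_t w_t y_t with
    w_t = (x_t - xbar) / Sxx, so Var(b1) = w' Omega w = sigma_u^2 * sum_{t,s} w_t w_s rho^|t-s|.
    The common factor sigma_u^2 cancels in the ratio.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    w = [(xi - xbar) / sxx for xi in x]
    true_var = sum(w[t] * w[s] * rho ** abs(t - s) for t in range(n) for s in range(n))
    usual_var = 1.0 / sxx
    return true_var / usual_var


x = list(range(1, 51))  # a slowly changing (trending) regressor, n = 50
for rho in (0.0, 0.5, 0.8):
    print(f"rho = {rho}: true Var(b1) / usual Var(b1) = {slope_variance_ratio(x, rho):.2f}")
```

A ratio above 1 means the usual formula understates the sampling variance of $b_1$, which is what produces the overly narrow confidence intervals and inflated $t$-ratios.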
Since the disturbances are nonspherical, the generalized least squares estimator of $\beta$ yields more efficient estimates than the OLSE.
The GLSE of $\beta$ is
$$\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y,$$
$$E(\hat\beta) = \beta,$$
$$V(\hat\beta) = (X'\Omega^{-1}X)^{-1}.$$
The GLSE is the best linear unbiased estimator of $\beta$.
Durbin-Watson test:
The Durbin-Watson (D-W) test is used for testing the hypothesis of lack of first order autocorrelation in the disturbance term. The null hypothesis is
$$H_0: \rho = 0.$$
Use OLS to estimate $\beta$ in $y = X\beta + u$ and obtain the residual vector
$$e = y - Xb = Hy$$
where $b = (X'X)^{-1}X'y$ and $H = I - X(X'X)^{-1}X'$.
The D-W test statistic is
$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = \frac{\sum_{t=2}^{n} e_t^2}{\sum_{t=1}^{n} e_t^2} + \frac{\sum_{t=2}^{n} e_{t-1}^2}{\sum_{t=1}^{n} e_t^2} - 2\,\frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}.$$
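For large $n$, the first two ratios in the expression for $d$ are each close to 1 and the third ratio is close to $r$, so
$$d \approx 1 + 1 - 2r, \qquad d \approx 2(1 - r),$$
where $r$ is the sample autocorrelation coefficient from residuals based on the OLSE and can be regarded as the regression coefficient of $e_t$ on $e_{t-1}$. Here
- positive autocorrelation of $e_t$'s $\Rightarrow d < 2$,
- negative autocorrelation of $e_t$'s $\Rightarrow d > 2$,
- zero autocorrelation of $e_t$'s $\Rightarrow d \approx 2$.
As $-1 < r < 1$,
- if $-1 < r < 0$, then $2 < d < 4$, and
- if $0 < r < 1$, then $0 < d < 2$.
So $d$ lies between 0 and 4.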
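The statistic $d$ and its large-sample link to $r$ can be computed directly from OLS residuals. This is a minimal sketch for the simple regression case, assuming simulated AR(1) disturbances with $\rho = 0.7$; the helpers `ols_simple`, `durbin_watson` and `residual_autocorrelation` are hypothetical names, and $r$ is taken as the no-intercept slope of $e_t$ on $e_{t-1}$.

```python
import random


def ols_simple(x, y):
    """Intercept and slope of the OLS fit y_t = b0 + b1 x_t."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b1 * xbar, b1


def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)


def residual_autocorrelation(e):
    """r = sum_{t=2}^n e_t e_{t-1} / sum_{t=2}^n e_{t-1}^2 (slope of e_t regressed on e_{t-1})."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    return num / sum(e[t - 1] ** 2 for t in range(1, len(e)))


# Illustrative data: y_t = 1 + 2 x_t + u_t with positively autocorrelated u_t (rho = 0.7).
rng = random.Random(1)
n, rho = 2000, 0.7
x = [rng.uniform(0, 10) for _ in range(n)]
u = [0.0]
for _ in range(n - 1):
    u.append(rho * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

b0, b1 = ols_simple(x, y)
e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
d, r = durbin_watson(e), residual_autocorrelation(e)
print(f"d = {d:.3f}, 2(1 - r) = {2 * (1 - r):.3f}")
```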
Since $e$ depends on $X$, different data sets yield different values of $d$, so the sampling distribution of $d$ depends on $X$. Consequently, exact critical values of $d$ cannot be tabulated owing to their dependence on $X$. Durbin and Watson therefore obtained two statistics $\underline{d}$ and $\bar{d}$ such that
$$\underline{d} < d < \bar{d}$$
and their sampling distributions do not depend upon $X$.
Considering the distributions of $\underline{d}$ and $\bar{d}$, they tabulated the critical values as $d_L$ and $d_U$, respectively. They prepared tables of critical values for $15 < n < 100$ and $k \le 5$. Now tables are available for $6 < n < 200$ and $k \le 10$.
The test procedure is as follows:
$$H_0: \rho = 0$$

| Nature of $H_1$ | Reject $H_0$ when | Retain $H_0$ when | The test is inconclusive when |
|---|---|---|---|
| $H_1: \rho > 0$ | $d < d_L$ | $d > d_U$ | $d_L < d < d_U$ |
| $H_1: \rho < 0$ | $d > 4 - d_L$ | $d < 4 - d_U$ | $4 - d_U < d < 4 - d_L$ |
| $H_1: \rho \neq 0$ | $d < d_L$ or $d > 4 - d_L$ | $d_U < d < 4 - d_U$ | $d_L < d < d_U$ or $4 - d_U < d < 4 - d_L$ |

Values of $d_L$ and $d_U$ are obtained from tables.
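The decision table can be encoded as a small function. This is a sketch only: $d_L$ and $d_U$ must still be read from published Durbin-Watson tables for the relevant $n$ and $k$, and the values used in the example are hypothetical, chosen just to exercise each branch. The function name `dw_decision` is also hypothetical.

```python
def dw_decision(d, d_L, d_U, alternative="positive"):
    """Decision rule from the D-W table.

    d_L and d_U must come from published tables for the given n and k; nothing is looked up here.
    alternative: "positive" (H1: rho > 0), "negative" (H1: rho < 0) or "two-sided" (H1: rho != 0).
    """
    if alternative == "positive":
        if d < d_L:
            return "reject H0"
        return "retain H0" if d > d_U else "inconclusive"
    if alternative == "negative":
        if d > 4 - d_L:
            return "reject H0"
        return "retain H0" if d < 4 - d_U else "inconclusive"
    if alternative == "two-sided":
        if d < d_L or d > 4 - d_L:
            return "reject H0"
        return "retain H0" if d_U < d < 4 - d_U else "inconclusive"
    raise ValueError("alternative must be 'positive', 'negative' or 'two-sided'")


# Hypothetical critical values, chosen only to exercise the rule (not taken from a table).
d_L, d_U = 1.50, 1.60
for d in (1.20, 1.55, 2.00, 2.45, 2.70):
    print(d, dw_decision(d, d_L, d_U, "two-sided"))
```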
Limitations of D-W test
1. If $d$ falls in the inconclusive zone, then no conclusive inference can be drawn. This zone becomes fairly large for low degrees of freedom. One solution is to reject $H_0$ if the test is inconclusive. A better solution is to modify the test as:
   Reject $H_0$ when $d < d_U$.
   Accept $H_0$ when $d \ge d_U$.
   This test gives a satisfactory solution when the values of the $x_i$'s change slowly, e.g., price, expenditure, etc.
2. The D-W test is not applicable when the intercept term is absent in the model. In such a case, one can use other critical values, say $d_M$, in place of $d_L$. Tables of the critical values $d_M$ are available.
3. The test is not valid when lagged dependent variables appear as explanatory variables. For example,
$$y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_r y_{t-r} + \beta_{r+1} x_{t1} + \cdots + \beta_k x_{t,k-r} + u_t,$$
$$u_t = \rho u_{t-1} + \varepsilon_t.$$
   In such a case, Durbin's h-test is used, which is given as follows.
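Durbin's h-test
Apply OLS to
$$y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_r y_{t-r} + \beta_{r+1} x_{t1} + \cdots + \beta_k x_{t,k-r} + u_t,$$
$$u_t = \rho u_{t-1} + \varepsilon_t$$
and find the OLSE $b_1$ of $\beta_1$. Let its variance be $\operatorname{Var}(b_1)$ and its estimator be $\widehat{\operatorname{Var}}(b_1)$. Then Durbin's $h$-statistic is
$$h = r\sqrt{\frac{n}{1 - n\,\widehat{\operatorname{Var}}(b_1)}},$$
which is asymptotically distributed as $N(0, 1)$, and
$$r = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=2}^{n} e_t^2}.$$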
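Durbin's $h$ is a one-line computation once $r$, $n$ and $\widehat{\operatorname{Var}}(b_1)$ are available. The sketch below (hypothetical function `durbin_h`, illustrative inputs) returns `None` when $1 - n\widehat{\operatorname{Var}}(b_1) \le 0$, where the square root is not defined, and compares $|h|$ with the two-sided 5% standard normal critical value 1.96.

```python
import math


def durbin_h(r, n, var_b1):
    """Durbin's h = r * sqrt(n / (1 - n * Var_hat(b1))).

    r: first order autocorrelation of the OLS residuals; n: number of observations;
    var_b1: estimated variance of the OLSE b1 of the coefficient of y_{t-1}.
    Returns None when 1 - n * var_b1 <= 0, where the statistic is undefined.
    """
    denom = 1.0 - n * var_b1
    if denom <= 0:
        return None
    return r * math.sqrt(n / denom)


# Hypothetical inputs, for illustration only.
h = durbin_h(r=0.2, n=100, var_b1=0.005)
print(f"h = {h:.3f}; reject H0 at the 5% level (two-sided)? {abs(h) > 1.96}")
print(durbin_h(r=0.2, n=100, var_b1=0.02))  # 1 - n * Var < 0: the test breaks down, prints None
```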

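This test is applicable when $n$ is large. When $1 - n\,\widehat{\operatorname{Var}}(b_1) < 0$, the test breaks down. In such cases, the following test procedure can be adopted.
Introduce a new variable $\varepsilon_{t-1}$ into $u_t = \rho u_{t-1} + \varepsilon_t$. In practice this amounts to regressing the OLS residual $e_t$ on $e_{t-1}$ together with the explanatory variables of the model (including the lagged $y$'s); let $\delta$ denote the coefficient of $e_{t-1}$.
Now apply OLS to this model and test $H_{0A}: \delta = 0$ versus $H_{1A}: \delta \neq 0$ using a $t$-test. If $H_{0A}$ is accepted, then accept $H_0: \rho = 0$.
If $H_{0A}: \delta = 0$ is rejected, then reject $H_0: \rho = 0$.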
4. If $H_0: \rho = 0$ is rejected by the D-W test, it does not necessarily mean the presence of first order autocorrelation in the disturbances. It could happen for other reasons also, e.g.,
- the disturbances may follow a higher order AR process,
- some important variables are omitted,
- the dynamics of the model are misspecified,
- the functional form of the model is incorrect.
Estimation procedures with autocorrelated errors when autocorrelation coefficient is known
Consider the estimation of the regression coefficients under first order autoregressive disturbances when the autocorrelation coefficient is known. The model is
$$y = X\beta + u, \qquad u_t = \rho u_{t-1} + \varepsilon_t,$$
and assume that $E(u) = 0$, $E(uu') = \sigma_\varepsilon^2 \Psi$, $E(\varepsilon) = 0$, $E(\varepsilon\varepsilon') = \sigma_\varepsilon^2 I$, where the $(t, s)$th element of $\Psi$ is $\rho^{|t-s|}/(1 - \rho^2)$.
The OLSE of $\beta$ is unbiased but not, in general, efficient, and the estimate of $\sigma^2$ is biased. So we use the generalized least squares estimation procedure, and the GLSE of $\beta$ is
$$\hat\beta = (X'\Psi^{-1}X)^{-1}X'\Psi^{-1}y$$
where
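$$\Psi^{-1} = \begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1 + \rho^2 & -\rho & \cdots & 0 & 0 \\ 0 & -\rho & 1 + \rho^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 + \rho^2 & -\rho \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}.$$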
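To employ this, we proceed as follows:
1. Find a matrix $P$ such that $P'P = \Psi^{-1}$. In this case
$$P = \begin{pmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 & 0 \\ -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}.$$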
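2. Transform the variables as
$$y^* = Py, \qquad X^* = PX, \qquad u^* = Pu.$$
Such a transformation yields
$$y^* = \begin{pmatrix} \sqrt{1-\rho^2}\, y_1 \\ y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix}, \qquad
X^* = \begin{pmatrix} \sqrt{1-\rho^2} & \sqrt{1-\rho^2}\, x_{12} & \cdots & \sqrt{1-\rho^2}\, x_{1k} \\ 1 - \rho & x_{22} - \rho x_{12} & \cdots & x_{2k} - \rho x_{1k} \\ 1 - \rho & x_{32} - \rho x_{22} & \cdots & x_{3k} - \rho x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 - \rho & x_{n2} - \rho x_{n-1,2} & \cdots & x_{nk} - \rho x_{n-1,k} \end{pmatrix}.$$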
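For a small $n$, one can check numerically that $P'P$ reproduces the tridiagonal matrix and that this matrix is the inverse of $\Psi$, taking the $(t,s)$th element of $\Psi$ as $\rho^{|t-s|}/(1-\rho^2)$ (so that $E(uu') = \sigma_\varepsilon^2\Psi$ under the AR(1) scheme). A sketch with plain Python lists; $n = 6$, $\rho = 0.7$ and the helper names are arbitrary choices.

```python
import math


def p_matrix(n, rho):
    """P: sqrt(1 - rho^2) in position (1,1), ones on the rest of the diagonal, -rho just below it."""
    P = [[0.0] * n for _ in range(n)]
    P[0][0] = math.sqrt(1 - rho ** 2)
    for t in range(1, n):
        P[t][t] = 1.0
        P[t][t - 1] = -rho
    return P


def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


n, rho = 6, 0.7
P = p_matrix(n, rho)
psi_inv = matmul(transpose(P), P)                        # should be the tridiagonal Psi^{-1}
psi = [[rho ** abs(i - j) / (1 - rho ** 2) for j in range(n)] for i in range(n)]
product = matmul(psi, psi_inv)                           # should be the identity matrix
err = max(abs(product[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))
print(f"max |Psi * P'P - I| = {err:.2e}")
for row in psi_inv:
    print(" ".join(f"{v:6.2f}" for v in row))
```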
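Note that the first observation is treated differently from the other observations. For the first observation,
$$\sqrt{1-\rho^2}\, y_1 = \sqrt{1-\rho^2}\, x_1'\beta + \sqrt{1-\rho^2}\, u_1,$$
whereas for the other observations
$$y_t - \rho y_{t-1} = (x_t - \rho x_{t-1})'\beta + (u_t - \rho u_{t-1}); \quad t = 2, 3, \ldots, n,$$
where $x_t'$ is a row vector of $X$. Also, $\sqrt{1-\rho^2}\, u_1$ and $(u_t - \rho u_{t-1})$ have the same properties: both have mean zero and variance $\sigma_\varepsilon^2$. So we expect these two errors to be uncorrelated and homoscedastic.
If the first column of $X$ is a vector of ones, then the first column of $X^*$ is not constant. Its first element is $\sqrt{1-\rho^2}$, while the remaining elements are $1 - \rho$.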
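Now apply OLS to the observations $y^*$ and $X^*$; the OLSE of $\beta$ is
$$\beta^* = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*,$$
which equals the GLSE $\hat\beta$ because $X^{*\prime}X^* = X'P'PX = X'\Psi^{-1}X$ and $X^{*\prime}y^* = X'\Psi^{-1}y$. Its covariance matrix is
$$\operatorname{Var}(\hat\beta) = \sigma_\varepsilon^2 (X^{*\prime}X^*)^{-1} = \sigma_\varepsilon^2 (X'\Psi^{-1}X)^{-1}$$
and its estimator is
$$\hat V(\hat\beta) = \hat\sigma_\varepsilon^2 (X'\Psi^{-1}X)^{-1}$$
where
$$\hat\sigma_\varepsilon^2 = \frac{(y - X\hat\beta)'\Psi^{-1}(y - X\hat\beta)}{n - k}.$$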
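The two steps can be combined into a GLS routine for the simple regression $y_t = \beta_0 + \beta_1 x_t + u_t$ (so $k = 2$). This is a minimal sketch, assuming $\rho$ is known exactly; the function name `gls_known_rho`, the simulated data and the seed are illustrative. It solves the $2 \times 2$ normal equations of the transformed model by Cramer's rule rather than forming $\Psi^{-1}$ explicitly.

```python
import math
import random


def gls_known_rho(x, y, rho):
    """GLSE of (beta0, beta1) in y_t = beta0 + beta1 x_t + u_t with AR(1) errors and known rho.

    Transforms with P (first observation scaled by sqrt(1 - rho^2), the others quasi-differenced),
    then runs OLS on y* and the two columns of X*. Returns the estimates, the estimate of
    sigma_eps^2 with divisor n - k, and the estimated covariance matrix sigma_eps^2 (X*'X*)^{-1}.
    """
    n = len(x)
    c = math.sqrt(1 - rho ** 2)
    z0 = [c] + [1 - rho] * (n - 1)                                   # transformed column of ones
    z1 = [c * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, n)]
    ys = [c * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, n)]
    s00 = sum(a * a for a in z0)
    s01 = sum(a * b for a, b in zip(z0, z1))
    s11 = sum(b * b for b in z1)
    t0 = sum(a * v for a, v in zip(z0, ys))
    t1 = sum(b * v for b, v in zip(z1, ys))
    det = s00 * s11 - s01 ** 2
    b0 = (s11 * t0 - s01 * t1) / det                                # Cramer's rule for X*'X* b = X*'y*
    b1 = (s00 * t1 - s01 * t0) / det
    rss = sum((v - b0 * a - b1 * b) ** 2 for a, b, v in zip(z0, z1, ys))  # (y - Xb)' Psi^{-1} (y - Xb)
    sigma2 = rss / (n - 2)
    cov = [[sigma2 * s11 / det, -sigma2 * s01 / det],
           [-sigma2 * s01 / det, sigma2 * s00 / det]]
    return (b0, b1), sigma2, cov


# Illustrative data: y_t = 1 + 2 x_t + u_t, u_t = 0.8 u_{t-1} + eps_t, eps_t ~ N(0, 1).
rng = random.Random(2)
n, rho = 500, 0.8
x = [rng.uniform(0, 10) for _ in range(n)]
u = [rng.gauss(0, 1) / math.sqrt(1 - rho ** 2)]   # stationary start
for _ in range(n - 1):
    u.append(rho * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

(b0, b1), sigma2, cov = gls_known_rho(x, y, rho)
print(f"beta0 = {b0:.3f}, beta1 = {b1:.3f}, sigma_eps^2 = {sigma2:.3f}, se(beta1) = {math.sqrt(cov[1][1]):.4f}")
```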
Estimation procedures with autocorrelated errors when autocorrelation coefficient is unknown
Several procedures have been suggested to estimate the regression coefficients when the autocorrelation coefficient is unknown. The feasible GLSE of $\beta$ is
$$\hat\beta_F = (X'\hat\Psi^{-1}X)^{-1}X'\hat\Psi^{-1}y$$
where $\hat\Psi^{-1}$ is the $\Psi^{-1}$ matrix with $\rho$ replaced by its estimator $\hat\rho$.
1. Use of sample correlation coefficient
The most common method is to use the sample correlation coefficient $r$ as the natural estimator of $\rho$. The sample correlation can be estimated using the residuals in place of the disturbances as
$$r = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=2}^{n} e_t^2}$$
where $e_t = y_t - x_t'b$, $t = 1, 2, \ldots, n$, and $b$ is the OLSE of $\beta$.
Two modifications are suggested for $r$ which can be used in place of $r$:
1. $r^* = \dfrac{n-k}{n-1}\, r$ is Theil's estimator.
2. $r^{**} = 1 - \dfrac{d}{2}$ for large $n$, where $d$ is the Durbin-Watson statistic for $H_0: \rho = 0$.
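Given OLS residuals, the three estimates of $\rho$ above take only a few lines. The residual vector in the sketch is made-up illustrative data (it does not come from any fitted model), $k$ is the assumed number of regressors, and the function name `rho_estimates` is hypothetical.

```python
def rho_estimates(e, k):
    """Estimates of rho from OLS residuals e_1..e_n of a model with k regressors.

    r       = sum_{t=2}^n e_t e_{t-1} / sum_{t=2}^n e_t^2
    theil   = (n - k) / (n - 1) * r
    dw_based = 1 - d / 2, with d the Durbin-Watson statistic
    """
    n = len(e)
    r = sum(e[t] * e[t - 1] for t in range(1, n)) / sum(e[t] ** 2 for t in range(1, n))
    d = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / sum(v * v for v in e)
    return {"r": r, "theil": (n - k) / (n - 1) * r, "dw_based": 1 - d / 2}


residuals = [0.9, 0.7, 0.5, 0.1, -0.3, -0.6, -0.8, -0.4, 0.2, -0.3]  # illustrative numbers only
print(rho_estimates(residuals, k=2))
```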
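2. Durbin procedure:
In the Durbin procedure, the model
$$y_t - \rho y_{t-1} = \beta_0(1 - \rho) + \beta(x_t - \rho x_{t-1}) + \varepsilon_t, \quad t = 2, 3, \ldots, n$$
is expressed as
$$\begin{aligned} y_t &= \beta_0(1 - \rho) + \rho y_{t-1} + \beta x_t - \rho\beta x_{t-1} + \varepsilon_t \\ &= \beta_0^* + \rho y_{t-1} + \beta x_t + \beta^* x_{t-1} + \varepsilon_t, \quad t = 2, 3, \ldots, n \qquad (*) \end{aligned}$$
where $\beta_0^* = \beta_0(1 - \rho)$ and $\beta^* = -\rho\beta$.
Now run the regression of model (*) by OLS and take the estimated coefficient of $y_{t-1}$ as the estimate $r^*$ of $\rho$.
Another possibility: since $\rho \in (-1, 1)$, search for a suitable $\rho$ which gives a smaller error sum of squares.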
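Durbin's first step is an ordinary regression with four regressors. The sketch below uses a small normal-equations solver written for the purpose (a hypothetical `ols` helper, not a library function) and simulated data with $\rho = 0.6$; `durbin_rho` returns only the coefficient of $y_{t-1}$ as the estimate of $\rho$.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gauss-Jordan with partial pivoting)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))  # pivot row for column c
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def durbin_rho(x, y):
    """Fit model (*) by OLS: y_t on 1, y_{t-1}, x_t, x_{t-1} for t = 2..n.
    Returns the coefficient of y_{t-1}, the Durbin estimate of rho."""
    X = [[1.0, y[t - 1], x[t], x[t - 1]] for t in range(1, len(y))]
    return ols(X, y[1:])[1]


# Illustrative data: y_t = 1 + 2 x_t + u_t with u_t = 0.6 u_{t-1} + eps_t.
rng = random.Random(3)
n, rho = 5000, 0.6
x = [rng.uniform(0, 10) for _ in range(n)]
u = [0.0]
for _ in range(n - 1):
    u.append(rho * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
print(f"Durbin estimate of rho: {durbin_rho(x, y):.3f}")
```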
3. Cochrane-Orcutt procedure:
This procedure utilizes the $P$ matrix defined while estimating $\beta$ when $\rho$ is known. It has the following steps:
(i) Apply OLS to $y_t = \beta_0 + \beta_1 x_t + u_t$ and obtain the residual vector $e$.
(ii) Estimate $\rho$ by
$$r = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=2}^{n} e_{t-1}^2}.$$
Note that $r$ is a consistent estimator of $\rho$.
(iii) Replace $\rho$ by $r$ in
$$y_t - \rho y_{t-1} = \beta_0(1 - \rho) + \beta(x_t - \rho x_{t-1}) + \varepsilon_t$$
and apply OLS to the transformed model
$$y_t - r y_{t-1} = \beta_0^* + \beta(x_t - r x_{t-1}) + \text{disturbance term}$$
to obtain the estimators of $\beta_0^*$ and $\beta$ as $\hat\beta_0^*$ and $\hat\beta$, respectively.
This is the Cochrane-Orcutt procedure. Since two successive applications of OLS are involved, it is also called a two-step procedure.
This application can be repeated as follows:
(I) Put $\hat\beta_0 = \hat\beta_0^*/(1 - r)$ and $\hat\beta$ into the original model.
(II) Calculate the residuals and the residual sum of squares.
(III) Calculate $\rho$ by
$$r = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=2}^{n} e_{t-1}^2}$$
and substitute it in the model
$$y_t - \rho y_{t-1} = \beta_0(1 - \rho) + \beta(x_t - \rho x_{t-1}) + \varepsilon_t$$
to obtain the transformed model again.
(IV) Apply OLS to this model and calculate the regression coefficients.
This procedure is repeated until convergence is achieved, i.e., the process is iterated until two successive estimates are nearly the same, so that the estimator stabilizes.
This is an iterative and numerically convergent procedure. Such estimates are asymptotically efficient, and there is a loss of one observation.
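The iterated procedure can be sketched for the simple regression case as follows, assuming simulated data with $\rho = 0.6$ and a convergence tolerance of $10^{-8}$ on successive values of $r$ (both illustrative choices; the function names are hypothetical). The intercept of the original model is recovered as $\hat\beta_0 = \hat\beta_0^*/(1 - r)$, and each transformed regression uses only $t = 2, \ldots, n$, so one observation is lost.

```python
import random


def ols_simple(x, y):
    """Intercept and slope of the OLS fit y = a + b x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b * xbar, b


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterated Cochrane-Orcutt for y_t = beta0 + beta x_t + u_t, u_t = rho u_{t-1} + eps_t.
    Returns (beta0, beta, rho)."""
    beta0, beta = ols_simple(x, y)                                   # step (i)
    prev = None
    for _ in range(max_iter):
        e = [yi - beta0 - beta * xi for xi, yi in zip(x, y)]          # residuals of the original model
        r = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
             / sum(e[t - 1] ** 2 for t in range(1, len(e))))          # step (ii) / (III)
        ys = [y[t] - r * y[t - 1] for t in range(1, len(y))]          # transformed model, t = 2..n
        xs = [x[t] - r * x[t - 1] for t in range(1, len(x))]
        beta0_star, beta = ols_simple(xs, ys)                        # step (iii) / (IV)
        beta0 = beta0_star / (1 - r)                                 # since beta0* = beta0 (1 - rho)
        if prev is not None and abs(r - prev) < tol:
            break
        prev = r
    return beta0, beta, r


rng = random.Random(4)
n, rho = 2000, 0.6
x = [rng.uniform(0, 10) for _ in range(n)]
u = [0.0]
for _ in range(n - 1):
    u.append(rho * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
beta0_co, beta_co, rho_co = cochrane_orcutt(x, y)
print(f"beta0 = {beta0_co:.3f}, beta = {beta_co:.3f}, rho = {rho_co:.3f}")
```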
4. Hildreth-Lu procedure or grid-search procedure:
The Hildreth-Lu procedure has the following steps:
(i) Apply OLS to
$$(y_t - \rho y_{t-1}) = \beta_0(1 - \rho) + \beta(x_t - \rho x_{t-1}) + \varepsilon_t, \quad t = 2, 3, \ldots, n$$
using different values of $\rho$ $(-1 \le \rho \le 1)$ such as $\rho = \pm 0.1, \pm 0.2, \ldots$.
(ii) Calculate the residual sum of squares in each case.
(iii) Select the value of $\rho$ for which the residual sum of squares is smallest.
Suppose we get $\rho = 0.4$. Now choose a finer grid. For example, choose $\rho$ such that $0.3 < \rho < 0.5$, consider $\rho = 0.31, 0.32, \ldots, 0.49$, and pick the $\rho$ with the smallest residual sum of squares. Such iteration can be repeated until a suitable value of $\rho$ corresponding to the minimum residual sum of squares is obtained. The selected final value of $\rho$ can be used for transforming the model as in the Cochrane-Orcutt procedure. The estimators obtained with this procedure are as efficient as those obtained by the Cochrane-Orcutt procedure, and there is a loss of one observation.
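A grid search that mirrors steps (i)-(iii), starting from $\rho = -0.9, -0.8, \ldots, 0.9$ and refining by a factor of 10 around the best value in each round, might look like this. The number of refinement rounds, the simulated data and the function names are assumptions for illustration.

```python
import random


def ols_simple(x, y):
    """Intercept and slope of the OLS fit y = a + b x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b * xbar, b


def transformed_rss(x, y, rho):
    """OLS of y_t - rho y_{t-1} on (x_t - rho x_{t-1}) with an intercept, t = 2..n.
    Returns (intercept, slope, residual sum of squares); the intercept estimates beta0 (1 - rho)."""
    ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    a, b = ols_simple(xs, ys)
    return a, b, sum((v - a - b * w) ** 2 for w, v in zip(xs, ys))


def hildreth_lu(x, y, rounds=3):
    """Coarse grid rho = -0.9, ..., 0.9, then grids 10 times finer around the current best.
    Returns (beta0, beta, rho)."""
    grid = [i / 10 for i in range(-9, 10)]
    step = 0.1
    for _ in range(rounds):
        best = min(grid, key=lambda r: transformed_rss(x, y, r)[2])
        step /= 10
        grid = [best + i * step for i in range(-9, 10) if abs(best + i * step) < 1]
    a, b, _ = transformed_rss(x, y, best)
    return a / (1 - best), b, best


rng = random.Random(5)
n, rho = 2000, 0.6
x = [rng.uniform(0, 10) for _ in range(n)]
u = [0.0]
for _ in range(n - 1):
    u.append(rho * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
beta0_hl, beta_hl, rho_hl = hildreth_lu(x, y)
print(f"beta0 = {beta0_hl:.3f}, beta = {beta_hl:.3f}, rho = {rho_hl:.3f}")
```

Three rounds give $\rho$ to about three decimal places; more rounds refine it further at the cost of more regressions.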

5. Prais-Winsten procedure
This is also an iterative procedure based on a two-step transformation.
(i) Estimate $\rho$ by
$$\hat\rho = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=3}^{n} e_{t-1}^2}$$
where the $e_t$'s are residuals based on the OLSE.
(ii) Replace $\rho$ by $\hat\rho$ in the model as in the Cochrane-Orcutt procedure, but keep the first observation:
$$\sqrt{1-\hat\rho^2}\, y_1 = \sqrt{1-\hat\rho^2}\, \beta_0 + \sqrt{1-\hat\rho^2}\, x_1\beta + \sqrt{1-\hat\rho^2}\, u_1,$$
$$y_t - \hat\rho y_{t-1} = (1 - \hat\rho)\beta_0 + \beta(x_t - \hat\rho x_{t-1}) + (u_t - \hat\rho u_{t-1}), \quad t = 2, 3, \ldots, n.$$
(iii) Use OLS for estimating the parameters.
The estimators obtained with this procedure are asymptotically as efficient as the best linear unbiased estimators. There is no loss of any observation.
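An iterated version of the procedure for the simple regression case is sketched below. It follows the text in using the denominator $\sum_{t=3}^{n} e_{t-1}^2$ for $\hat\rho$ and in keeping the first observation through the $\sqrt{1-\hat\rho^2}$ scaling; the stopping rule, simulated data and function names are assumptions.

```python
import math
import random


def ols_simple(x, y):
    """Intercept and slope of the OLS fit y = a + b x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b * xbar, b


def pw_fit(x, y, rho):
    """OLS on all n transformed observations: row 1 scaled by sqrt(1 - rho^2), rows 2..n
    quasi-differenced. The column of ones becomes z0, so no extra intercept is added."""
    n = len(x)
    c = math.sqrt(1 - rho ** 2)
    z0 = [c] + [1 - rho] * (n - 1)
    z1 = [c * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, n)]
    ys = [c * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, n)]
    s00 = sum(a * a for a in z0)
    s01 = sum(a * b for a, b in zip(z0, z1))
    s11 = sum(b * b for b in z1)
    t0 = sum(a * v for a, v in zip(z0, ys))
    t1 = sum(b * v for b, v in zip(z1, ys))
    det = s00 * s11 - s01 ** 2
    return (s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det


def prais_winsten(x, y, tol=1e-8, max_iter=100):
    """Iterated Prais-Winsten for y_t = beta0 + beta x_t + u_t. Returns (beta0, beta, rho_hat)."""
    beta0, beta = ols_simple(x, y)               # residuals based on the OLSE start step (i)
    n = len(y)
    prev = None
    for _ in range(max_iter):
        e = [yi - beta0 - beta * xi for xi, yi in zip(x, y)]
        # step (i): numerator over t = 2..n, denominator over t = 3..n (1-based, as in the text)
        rho_hat = (sum(e[t] * e[t - 1] for t in range(1, n))
                   / sum(e[t - 1] ** 2 for t in range(2, n)))
        if abs(rho_hat) >= 1:
            raise ValueError("estimated rho is outside (-1, 1)")
        beta0, beta = pw_fit(x, y, rho_hat)      # steps (ii)-(iii); no observation is dropped
        if prev is not None and abs(rho_hat - prev) < tol:
            break
        prev = rho_hat
    return beta0, beta, rho_hat


rng = random.Random(6)
n, rho = 2000, 0.6
x = [rng.uniform(0, 10) for _ in range(n)]
u = [rng.gauss(0, 1) / math.sqrt(1 - rho ** 2)]
for _ in range(n - 1):
    u.append(rho * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
beta0_pw, beta_pw, rho_pw = prais_winsten(x, y)
print(f"beta0 = {beta0_pw:.3f}, beta = {beta_pw:.3f}, rho = {rho_pw:.3f}")
```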
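6. Maximum likelihood procedure
Assuming that $y \sim N(X\beta, \sigma^2\Psi)$, the likelihood function for $\beta$, $\rho$ and $\sigma^2$ is
$$L = \frac{1}{(2\pi)^{n/2}(\sigma^2)^{n/2}\,|\Psi|^{1/2}} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'\Psi^{-1}(y - X\beta)\right].$$
Ignoring the constant and using $|\Psi| = \dfrac{1}{1-\rho^2}$, the log-likelihood is
$$\ln L(\beta, \sigma^2, \rho) = -\frac{n}{2}\ln\sigma^2 + \frac{1}{2}\ln(1 - \rho^2) - \frac{1}{2\sigma^2}(y - X\beta)'\Psi^{-1}(y - X\beta).$$
The maximum likelihood estimators of $\beta$, $\rho$ and $\sigma^2$ can be obtained by solving the normal equations
$$\frac{\partial \ln L}{\partial \beta} = 0, \qquad \frac{\partial \ln L}{\partial \rho} = 0, \qquad \frac{\partial \ln L}{\partial \sigma^2} = 0.$$
These normal equations turn out to be nonlinear in the parameters and cannot be easily solved.
One solution is to
- first derive the maximum likelihood estimator of $\sigma^2$,
- substitute it back into the likelihood function to obtain the likelihood as a function of $\beta$ and $\rho$,
- maximize this likelihood function with respect to $\beta$ and $\rho$.
Thus
$$\frac{\partial \ln L}{\partial \sigma^2} = 0 \;\Rightarrow\; -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'\Psi^{-1}(y - X\beta) = 0$$
$$\Rightarrow\; \hat\sigma^2 = \frac{1}{n}(y - X\beta)'\Psi^{-1}(y - X\beta)$$
is the estimator of $\sigma^2$.
Substituting $\hat\sigma^2$ in place of $\sigma^2$ in the log-likelihood function yields
$$\begin{aligned}
\ln L^*(\beta, \rho) &= -\frac{n}{2}\ln\left[\frac{1}{n}(y - X\beta)'\Psi^{-1}(y - X\beta)\right] + \frac{1}{2}\ln(1 - \rho^2) - \frac{n}{2} \\
&= -\frac{n}{2}\left[\ln\{(y - X\beta)'\Psi^{-1}(y - X\beta)\} - \frac{1}{n}\ln(1 - \rho^2)\right] + k \\
&= k - \frac{n}{2}\ln\left[\frac{(y - X\beta)'\Psi^{-1}(y - X\beta)}{(1 - \rho^2)^{1/n}}\right]
\end{aligned}$$
where $k = \dfrac{n}{2}\ln n - \dfrac{n}{2}$.
Maximization of $\ln L^*$ is equivalent to minimizing the function
$$\frac{(y - X\beta)'\Psi^{-1}(y - X\beta)}{(1 - \rho^2)^{1/n}}.$$
Using optimization techniques of nonlinear regression, this function can be minimized and estimates of $\beta$ and $\rho$ can be obtained.
If $n$ is large and $\rho$ is not too close to one, then the factor $(1 - \rho^2)^{-1/n}$ is close to one and can be neglected, and the estimates of $\beta$ will be the same as those obtained by nonlinear least squares estimation.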
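Because the denominator $(1-\rho^2)^{1/n}$ does not involve $\beta$, for any fixed $\rho$ the minimizing $\beta$ is the GLSE, so the problem reduces to a one-dimensional search over $\rho$. The sketch below replaces the general nonlinear optimizer with a simple refining grid search over $\rho$ (a deliberate simplification) for the simple regression case. The data are simulated with a stationary start so that $\operatorname{Cov}(u) = \sigma^2\Psi$ holds, and all names are hypothetical.

```python
import math
import random


def gls_quadratic_form(x, y, rho):
    """For fixed rho, minimize (y - X b)' Psi^{-1} (y - X b) over b = (beta0, beta1) via the P transform.
    Returns (beta0, beta1, S) with S the minimized quadratic form."""
    n = len(x)
    c = math.sqrt(1 - rho ** 2)
    z0 = [c] + [1 - rho] * (n - 1)
    z1 = [c * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, n)]
    ys = [c * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, n)]
    s00 = sum(a * a for a in z0)
    s01 = sum(a * b for a, b in zip(z0, z1))
    s11 = sum(b * b for b in z1)
    t0 = sum(a * v for a, v in zip(z0, ys))
    t1 = sum(b * v for b, v in zip(z1, ys))
    det = s00 * s11 - s01 ** 2
    b0 = (s11 * t0 - s01 * t1) / det
    b1 = (s00 * t1 - s01 * t0) / det
    s = sum((v - b0 * a - b1 * b) ** 2 for a, b, v in zip(z0, z1, ys))
    return b0, b1, s


def ml_ar1(x, y, rounds=4):
    """Minimize S(rho) / (1 - rho^2)^(1/n) over rho by a refining grid search
    (a stand-in for a general nonlinear optimizer). Returns (beta0, beta1, rho, sigma2)."""
    n = len(y)

    def objective(rho):
        return gls_quadratic_form(x, y, rho)[2] / (1 - rho ** 2) ** (1 / n)

    grid = [i / 10 for i in range(-9, 10)]
    step = 0.1
    for _ in range(rounds):
        best = min(grid, key=objective)
        step /= 10
        grid = [best + i * step for i in range(-9, 10) if abs(best + i * step) < 1]
    b0, b1, s = gls_quadratic_form(x, y, best)
    return b0, b1, best, s / n          # sigma^2 MLE: (1/n)(y - X b)' Psi^{-1} (y - X b)


rng = random.Random(7)
n, rho = 2000, 0.6
x = [rng.uniform(0, 10) for _ in range(n)]
u = [rng.gauss(0, 1) / math.sqrt(1 - rho ** 2)]   # stationary start
for _ in range(n - 1):
    u.append(rho * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
b0_ml, b1_ml, rho_ml, sigma2_ml = ml_ar1(x, y)
print(f"beta0 = {b0_ml:.3f}, beta1 = {b1_ml:.3f}, rho = {rho_ml:.3f}, sigma^2 = {sigma2_ml:.3f}")
```

For large $n$ the factor $(1-\rho^2)^{1/n}$ barely moves the minimizer, which is the numerical counterpart of the remark that the ML and nonlinear least squares estimates then coincide.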
