12 Autocorrelation
Definition:
The classical linear regression model assumes that the disturbance term relating to any one observation is not influenced by the disturbance term relating to any other observation, that is, $E(\mu_i \mu_j) = 0$ for $i \neq j$. Autocorrelation (or serial correlation) is the violation of this assumption: $E(\mu_i \mu_j) \neq 0$ for $i \neq j$.
Reasons for autocorrelation:
Inertia:
A salient feature of most economic time series is inertia, or sluggishness. As is well known, time series such as GNP, price indexes, production, employment, and unemployment exhibit cycles. Starting at the bottom of a recession, when economic recovery begins, most of these series start moving upward. In this upswing, the value of a series at one point in time is greater than its previous value. Thus there is a momentum built into them, and it continues until something happens to slow it down. Therefore, in regressions involving time series data, successive observations are likely to be interdependent.
Specification bias (excluded variables case):
Suppose the true demand-for-beef model is
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \mu_t \quad \ldots (1)$$
but we omit the price-of-pork variable $X_{4t}$ and instead run
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \vartheta_t \quad \ldots (2)$$
Now if eq. (1) is the correct model, or true relation, running (2) is tantamount to letting $\vartheta_t = \beta_4 X_{4t} + \mu_t$. And to the extent that the price of pork affects the consumption of beef, the error or disturbance term $\vartheta_t$ will reflect a systematic pattern, thus creating autocorrelation.
Specification bias (incorrect functional form):
[Figure: Marginal cost of production plotted against output, showing the true (nonlinear) marginal cost model and the incorrect linear model, which intersect at points A and B.]
The marginal cost curve corresponding to the true model is shown in the figure above along with the incorrect linear cost curve. The figure shows that between points A and B the linear marginal cost curve will consistently overestimate the true marginal cost, whereas beyond these points it will consistently underestimate it. This result is to be expected, because the disturbance term $\vartheta_i$ is, in fact, equal to $\beta_2\,\text{output}_i^2 + \mu_i$ and hence will catch the systematic effect of the squared output term on marginal cost. In this case, $\vartheta_i$ will reflect autocorrelation because of the use of an incorrect functional form.
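To see this mechanically, here is a minimal simulation sketch (Python with NumPy; all numbers are made up for illustration): we generate marginal cost data from a quadratic relation, fit the incorrect straight line by OLS, and print the signs of the residuals, which switch in long systematic runs rather than randomly.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (quadratic) marginal cost: MC = 2 + 0.05*output^2 + noise
output = np.linspace(1, 20, 40)
mc = 2 + 0.05 * output**2 + rng.normal(0, 0.5, output.size)

# Fit the incorrect linear model MC = a1 + a2*output by OLS
X = np.column_stack([np.ones_like(output), output])
a, *_ = np.linalg.lstsq(X, mc, rcond=None)
resid = mc - X @ a

# Signs of successive residuals: long runs of + and - reveal
# the systematic pattern left behind by the omitted squared term
print("".join("+" if e > 0 else "-" for e in resid))
```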
Cobweb Phenomenon:
The supply of many agricultural commodities reflects the so-called cobweb phenomenon,
where supply reacts to the price with a lag of one time period because supply decisions take
time to implement. Thus, at the beginning of this year’s planting of crops, farmers are
influenced by the price prevailing last year, so that their supply function is
$$\text{Supply}_t = \beta_1 + \beta_2 P_{t-1} + \mu_t$$
Suppose at the end of period t, price $P_t$ turns out to be lower than $P_{t-1}$. Therefore, in period t+1 farmers may very well decide to produce less than they did in period t. Obviously, in this situation the disturbances $\mu_t$ are not expected to be random, because if farmers overproduce in year t, they are likely to reduce their production in t+1, and so on, leading to a cobweb pattern.
Lags:
In a time series regression of consumption expenditure on income, it is not uncommon to find that consumption expenditure in the current period depends, among other things, on the consumption expenditure of the previous period. That is,
$$\text{Consumption}_t = \beta_1 + \beta_2\,\text{Income}_t + \beta_3\,\text{Consumption}_{t-1} + \mu_t$$
Consumers do not change their consumption habits readily for psychological, technological, or institutional reasons. Now if we neglect the lagged term, the resulting error term will reflect a systematic pattern due to the influence of lagged consumption on current consumption.
Nonstationarity:
A time series is stationary if its characteristics (e.g., mean, variance, covariance) are time invariant; that is, they do not change over time. If that is not the case, we have a nonstationary time series. In a two-variable regression model, it is quite possible that both Y and X are nonstationary and therefore the error term is also nonstationary. In that case, the error term will exhibit autocorrelation.
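A quick way to see this is to regress one independent random walk on another. The sketch below (Python/NumPy, purely illustrative) shows that the residuals from such a regression are themselves strongly autocorrelated:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Two independent random walks (nonstationary series)
y = np.cumsum(rng.normal(size=n))
x = np.cumsum(rng.normal(size=n))

# OLS of y on x
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ b

# Lag-1 autocorrelation of the residuals: typically close to 1
r1 = np.corrcoef(u[1:], u[:-1])[0, 1]
print(f"lag-1 residual autocorrelation: {r1:.3f}")
```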
Consider the two-variable regression model
$$Y_t = \beta_1 + \beta_2 X_t + \mu_t \quad \ldots (1)$$
To make any headway, we must assume the mechanism that generates $\mu_t$, for $E(\mu_t \mu_{t+s}) \neq 0$ ($s \neq 0$) is too general an assumption to be of any practical use. As a starting point, or first approximation, one can assume that the disturbance, or error, terms are generated by the following mechanism:
$$\mu_t = \rho\,\mu_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1 \quad \ldots (2)$$
where the $\varepsilon_t$ satisfy
$$E(\varepsilon_t) = 0$$
$$\operatorname{var}(\varepsilon_t) = \sigma_\varepsilon^2$$
$$\operatorname{cov}(\varepsilon_t, \varepsilon_{t+s}) = 0, \qquad s \neq 0$$
In the engineering literature, an error term with the preceding properties is often called a white noise error term. Equation (2) postulates that the value of the disturbance term in period t is equal to $\rho$ times its value in the previous period plus a purely random error term. The scheme (2) is known as a Markov first-order autoregressive scheme, or simply a first-order autoregressive scheme, usually denoted AR(1). Note that $\rho$, the coefficient of autocovariance in (2), can also be interpreted as the first-order coefficient of autocorrelation, or more accurately, the coefficient of autocorrelation at lag 1. Given the AR(1) scheme, it can be shown that
$$\operatorname{var}(\mu_t) = E(\mu_t^2) = \frac{\sigma_\varepsilon^2}{1 - \rho^2}$$
$$\operatorname{cov}(\mu_t, \mu_{t+s}) = E(\mu_t\,\mu_{t+s}) = \rho^s\,\frac{\sigma_\varepsilon^2}{1 - \rho^2}$$
$$\operatorname{cor}(\mu_t, \mu_{t+s}) = \rho^s$$
We use the AR(1) process not only because of its simplicity compared with higher-order AR schemes, but also because in many applications it has proved to be quite useful.
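As a sanity check on these formulas, the following sketch (Python/NumPy; $\rho$ and $\sigma_\varepsilon$ are arbitrary choices) simulates a long AR(1) disturbance series and compares the sample variance and lag-1 correlation with the theoretical values $\sigma_\varepsilon^2/(1-\rho^2)$ and $\rho$:

```python
import numpy as np

rng = np.random.default_rng(42)
rho, sigma_eps, n = 0.7, 1.0, 200_000

# Generate mu_t = rho * mu_{t-1} + eps_t  (white-noise eps_t)
eps = rng.normal(0, sigma_eps, n)
mu = np.empty(n)
mu[0] = eps[0]
for t in range(1, n):
    mu[t] = rho * mu[t - 1] + eps[t]

print("sample var :", mu.var())                      # ~ sigma^2/(1-rho^2) = 1.96
print("theory var :", sigma_eps**2 / (1 - rho**2))
print("lag-1 corr :", np.corrcoef(mu[1:], mu[:-1])[0, 1])  # ~ rho = 0.7
```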
Now for the two variable regression model we know that
$$\hat\beta_2 = \frac{\sum x_t y_t}{\sum x_t^2}$$
and its variance is given by
$$\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_t^2}$$
where the small letters as usual denote deviation from the mean values.
Now under the AR (1) scheme, it can be shown that the variance of this estimator is
$$\operatorname{var}(\hat\beta_2)_{AR(1)} = \frac{\sigma^2}{\sum x_t^2}\left[1 + 2\rho\,\frac{\sum x_t x_{t-1}}{\sum x_t^2} + 2\rho^2\,\frac{\sum x_t x_{t-2}}{\sum x_t^2} + \cdots + 2\rho^{n-1}\,\frac{x_1 x_n}{\sum x_t^2}\right]$$
where $\operatorname{var}(\hat\beta_2)_{AR(1)}$ denotes the variance of $\hat\beta_2$ under the first-order autoregressive scheme.
For the regression model $Y_t = \beta_1 + \beta_2 X_t + \mu_t$, and assuming the AR(1) process, we can show that the BLUE estimator of $\beta_2$ is given by the following expression:
$$\hat\beta_2^{GLS} = \frac{\sum_{t=2}^{n} (x_t - \rho x_{t-1})(y_t - \rho y_{t-1})}{\sum_{t=2}^{n} (x_t - \rho x_{t-1})^2} + C$$
where C is a correction factor that may be disregarded in practice. And its variance is given
by
$$\operatorname{var}(\hat\beta_2^{GLS}) = \frac{\sigma^2}{\sum_{t=2}^{n} (x_t - \rho x_{t-1})^2} + D$$
where D, too, is a correction factor that may be disregarded in practice.
In the presence of autocorrelation, the OLS estimators are still linear, unbiased, consistent, and asymptotically normally distributed, but they are no longer efficient. Therefore, the usual t and F tests of significance are no longer valid and, if applied, are likely to give seriously misleading conclusions about the statistical significance of the estimated regression coefficients.
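A small Monte Carlo sketch (Python/NumPy; the sample size and parameters are arbitrary) illustrates the danger: with positively autocorrelated errors, the conventional OLS standard error of $\hat\beta_2$ understates its true sampling variability, so t tests reject far too often.

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho, reps = 50, 0.8, 2000
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])

betas, naive_se = [], []
for _ in range(reps):
    # AR(1) errors
    eps = rng.normal(size=n)
    u = np.empty(n)
    u[0] = eps[0]
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    y = 1.0 + 2.0 * x + u
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)
    xd = x - x.mean()
    betas.append(b[1])
    naive_se.append(np.sqrt(s2 / (xd @ xd)))

print("true sd of beta2-hat :", np.std(betas))
print("avg naive OLS s.e.   :", np.mean(naive_se))  # much smaller
```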
RUN TEST:
Under the null hypothesis that successive outcomes (here, the signs of the residuals) are independent, and assuming $N_1 > 10$ and $N_2 > 10$, the number of runs R is asymptotically normally distributed with
Mean: $E(R) = \dfrac{2 N_1 N_2}{N} + 1$
Variance: $\sigma_R^2 = \dfrac{2 N_1 N_2 (2 N_1 N_2 - N)}{N^2 (N - 1)}$
where $N = N_1 + N_2$ is the total number of observations, $N_1$ the number of + symbols (positive residuals), and $N_2$ the number of − symbols (negative residuals).
If the null hypothesis of randomness is sustainable, then, following the properties of the normal distribution, we should expect that
$$\Pr\big[E(R) - 1.96\,\sigma_R \le R \le E(R) + 1.96\,\sigma_R\big] = 0.95$$
That is, the probability is 95% that the preceding interval will include R. Hence the rule: do not reject the null hypothesis of randomness with 95% confidence if R, the number of runs, lies in the preceding confidence interval; reject the null hypothesis if the estimated R lies outside these limits.
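A minimal implementation sketch of the runs test (Python/NumPy; the residual series below is artificial):

```python
import numpy as np

def runs_test(resid):
    """Asymptotic runs test on the signs of a residual series."""
    signs = resid > 0
    n1 = int(signs.sum())        # number of + residuals
    n2 = len(signs) - n1         # number of - residuals
    n = n1 + n2
    r = 1 + int((signs[1:] != signs[:-1]).sum())  # number of runs
    mean_r = 2 * n1 * n2 / n + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    lo, hi = mean_r - 1.96 * var_r**0.5, mean_r + 1.96 * var_r**0.5
    return r, (lo, hi)

# Illustrative use on artificial, positively autocorrelated residuals
rng = np.random.default_rng(3)
u = np.empty(100)
u[0] = rng.normal()
for t in range(1, 100):
    u[t] = 0.8 * u[t - 1] + rng.normal()
r, (lo, hi) = runs_test(u)
print(f"R = {r}, 95% interval = ({lo:.1f}, {hi:.1f})")  # too few runs -> reject
```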
DURBIN-WATSON d TEST:
The Durbin-Watson statistic is defined as
$$d = \frac{\sum_{t=2}^{n} (\hat\mu_t - \hat\mu_{t-1})^2}{\sum_{t=1}^{n} \hat\mu_t^2} \quad \ldots (1)$$
which is simply the ratio of the sum of squared differences in successive residuals to the residual sum of squares (RSS).
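In code the statistic is one line; a sketch in Python/NumPy follows (statsmodels ships an equivalent durbin_watson helper):

```python
import numpy as np

def durbin_watson(resid):
    """d = sum of squared successive differences / residual sum of squares."""
    resid = np.asarray(resid)
    return np.sum(np.diff(resid)**2) / np.sum(resid**2)

# d is roughly 2(1 - rho_hat): near 2 means no first-order autocorrelation,
# near 0 strong positive, near 4 strong negative autocorrelation.
print(durbin_watson([0.5, -0.3, 0.4, -0.6, 0.2]))
```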
Assumptions:
The d test assumes, among other things, that the regression model includes an intercept term, that the explanatory variables are nonstochastic (fixed in repeated sampling), that the disturbances follow the AR(1) scheme, and that the model does not include lagged values of the dependent variable.
Durbin and Watson were successful in deriving a lower bound $d_L$ and an upper bound $d_U$ such that if the computed d lies outside these critical values, a decision can be made regarding the presence of autocorrelation.
Decision rule:
The d statistic always lies between 0 and 4, and the bounds $d_L$ and $d_U$ divide this range into five zones:
- $0 < d < d_L$: reject $H_0$ (evidence of positive autocorrelation)
- $d_L \le d \le d_U$: zone of indecision
- $d_U < d < 4 - d_U$: do not reject $H_0$ or $H_0^*$
- $4 - d_U \le d \le 4 - d_L$: zone of indecision
- $4 - d_L < d < 4$: reject $H_0^*$ (evidence of negative autocorrelation)
where
H0: No positive autocorrelation
H0*: No negative autocorrelation
Correcting for autocorrelation (the method of generalized difference):
Consider again the two-variable model
$$Y_t = \beta_1 + \beta_2 X_t + \mu_t \quad \ldots (1)$$
and assume that the error term follows the AR(1) scheme, namely,
$$\mu_t = \rho\,\mu_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1$$
When ρ is known:
If the coefficient of first-order autocorrelation $\rho$ is known, the problem of autocorrelation can be easily solved. If equation (1) holds at time t, it also holds at time t−1. Hence
$$Y_{t-1} = \beta_1 + \beta_2 X_{t-1} + \mu_{t-1} \quad \ldots (2)$$
Multiplying equation (2) by ρ on both sides, we obtain
$$\rho Y_{t-1} = \rho\beta_1 + \rho\beta_2 X_{t-1} + \rho\mu_{t-1} \quad \ldots (3)$$
Subtracting (3) from (1) we get
$$(Y_t - \rho Y_{t-1}) = \beta_1 (1 - \rho) + \beta_2 (X_t - \rho X_{t-1}) + (\mu_t - \rho\mu_{t-1}) \quad \ldots (4)$$
$$(Y_t - \rho Y_{t-1}) = \beta_1 (1 - \rho) + \beta_2 (X_t - \rho X_{t-1}) + \varepsilon_t, \quad \text{where } \varepsilon_t = \mu_t - \rho\mu_{t-1}$$
We can express (4) as
$$Y_t^{*} = \beta_1^{*} + \beta_2^{*} X_t^{*} + \varepsilon_t$$
where
$$\beta_1^{*} = \beta_1 (1 - \rho), \quad Y_t^{*} = Y_t - \rho Y_{t-1}, \quad X_t^{*} = X_t - \rho X_{t-1}, \quad \text{and} \quad \beta_2^{*} = \beta_2$$
Since the error term in equation (4) satisfies the usual OLS assumptions, we can apply OLS to the transformed variables $Y^{*}$ and $X^{*}$ and obtain estimators with all the optimum properties, namely BLUE. Running OLS on these quasi-differenced variables is the method of generalized difference.
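A sketch of this generalized difference procedure when ρ is known (Python/NumPy; the data and ρ are illustrative):

```python
import numpy as np

def generalized_difference_ols(y, x, rho):
    """OLS on quasi-differenced data: y* = y_t - rho*y_{t-1}, likewise x*."""
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]
    X = np.column_stack([np.ones(len(x_star)), x_star])
    b, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    b1_star, b2 = b
    return b1_star / (1 - rho), b2   # recover beta1 from beta1* = beta1*(1-rho)

# Illustrative data with AR(1) errors, rho assumed known
rng = np.random.default_rng(11)
n, rho = 100, 0.6
x = rng.uniform(0, 10, n)
u = np.empty(n)
u[0] = rng.normal()
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.normal()
y = 3.0 + 1.5 * x + u
print(generalized_difference_ols(y, x, rho))  # approx (3.0, 1.5)
```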
When ρ is not known:
The method of generalized difference is difficult to implement because ρ is rarely known in practice. Therefore, we need to find ways of estimating ρ. We have several possibilities.
The first difference method:
Since ρ lies between −1 and +1, one could start from two extreme positions. At one extreme, one could assume that ρ = 0, that is, no (first-order) serial correlation; at the other extreme we could let ρ = ±1, that is, perfect positive or negative correlation. As a matter of fact, when a regression is run, one generally assumes that there is no autocorrelation and then lets the Durbin-Watson or another test show whether this assumption is justified. If, however, ρ = +1, the generalized difference equation (4) reduces to the first difference equation:
$$(Y_t - Y_{t-1}) = \beta_2 (X_t - X_{t-1}) + (\mu_t - \mu_{t-1})$$
$$\Delta Y_t = \beta_2\,\Delta X_t + \varepsilon_t \quad \ldots (5)$$
Since the error term in (5) is free from autocorrelation, to run the regression (5) all one has to do is form the first differences of both the regressand and regressor(s) and run the regression on these first differences.
This transformation may be appropriate if the coefficient of autocorrelation is very high, say in excess of 0.8, or the Durbin-Watson d is quite low. Maddala has proposed the rough rule of thumb of using the first difference form whenever d < R². An interesting feature of (5) is that there is no intercept in it. Hence, to estimate (5), one has to use the regression-through-the-origin routine. If, however, you forget to drop the intercept term and instead estimate the model that includes it,
$$\Delta Y_t = \beta_1 + \beta_2\,\Delta X_t + \varepsilon_t$$
then the original model must have had a trend in it, and $\beta_1$ represents the coefficient of the trend variable.
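A minimal sketch of the first difference method (Python/NumPy; regression through the origin on the differenced data):

```python
import numpy as np

def first_difference_ols(y, x):
    """Regression through the origin on first differences: dY = b2*dX + e."""
    dy, dx = np.diff(y), np.diff(x)
    b2 = (dx @ dy) / (dx @ dx)   # no-intercept OLS slope
    return b2

# Illustrative data whose errors are a random walk (rho = 1 case)
rng = np.random.default_rng(5)
n = 200
x = np.linspace(0, 20, n)
u = np.cumsum(rng.normal(0, 0.3, n))   # rho = 1 errors
y = 4.0 + 2.0 * x + u
print(first_difference_ols(y, x))      # approx 2.0
```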