Sources of Autocorrelation
The natural question is: Why does serial correlation occur? There are several
reasons, some of which are as follows:
Inertia. A salient feature of most economic time series is inertia, or sluggishness.
As is well known, time series such as GNP, price indexes, production, employment,
and unemployment exhibit (business) cycles.
Starting at the bottom of the recession, when economic recovery starts, most of these
series start moving upward. In this upswing, the value of a series at one point in time
is greater than its previous value. Thus there is a "momentum" built into them, and
it continues until something happens (e.g., increase in interest rate or taxes or both)
to slow them down. Therefore, in regressions involving time series data, successive
observations are likely to be interdependent.
Now if (12.1.2) is the "correct" model or the "truth" or true relation, running (12.1.3)
is tantamount to letting v_t = β_4 X_{4t} + u_t. And to the extent the price of pork affects
the consumption of beef, the error or disturbance term v will reflect a systematic
pattern, thus creating (false) autocorrelation. A simple test of this would be to run
both (12.1.2) and (12.1.3) and see whether autocorrelation, if any, observed in model
(12.1.3) disappears when (12.1.2) is run.5 The actual mechanics of detecting
autocorrelation will be discussed in Section 12.6 where we will show that a plot of
the residuals from regressions (12.1.2) and (12.1.3) will often shed considerable light
on serial correlation.
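As a rough illustration of this diagnostic idea, the following sketch uses simulated data and hypothetical variable names (income, pork_price, beef), not the text's actual beef-consumption regressions: it fits the model with and without one of its true regressors and plots both residual series against time, so that the systematic pattern inherited from the omitted variable becomes visible.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
T = 60
income = rng.normal(100, 10, T)                    # hypothetical income series
pork_price = np.cumsum(rng.normal(0, 1, T)) + 20   # hypothetical, slowly trending pork price
u = rng.normal(0, 1, T)
beef = 10 + 0.3 * income - 0.8 * pork_price + u    # data generated by the "full" model

def ols_residuals(y, X):
    """Residuals from an OLS fit of y on X (intercept added automatically)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

res_full = ols_residuals(beef, np.column_stack([income, pork_price]))  # pork price included
res_omit = ols_residuals(beef, income)                                 # pork price omitted

fig, ax = plt.subplots()
ax.plot(res_full, label="residuals, pork price included")
ax.plot(res_omit, label="residuals, pork price omitted")
ax.axhline(0, color="k", lw=0.5)
ax.legend()
plt.show()
```

In the misspecified fit the residuals track the omitted trending variable and so appear serially correlated, while the residuals from the full model look essentially random.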
The marginal cost curve corresponding to the "true" model is shown in Figure 12.2
along with the "incorrect" linear cost curve.
As Figure 12.2 shows, between points A and B the linear marginal cost curve
will consistently overestimate the true marginal cost, whereas beyond these points it
will consistently underestimate the true marginal cost. This result is to be expected,
because the disturbance term v_i is, in fact, equal to output_i^2 + u_i, and hence will
catch the systematic effect of the output^2 term on marginal cost. In this case v_i will
reflect autocorrelation because of the use of an incorrect functional form. In Chapter
13 we will consider several methods of detecting specification bias.
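To make the functional-form point concrete, here is a small numerical sketch (the cost figures and coefficients are hypothetical, not taken from the text): marginal cost is generated from a quadratic relation, the "incorrect" straight line is fitted to it, and the signs of the resulting residuals cluster exactly as described above.

```python
import numpy as np

rng = np.random.default_rng(1)
output = np.linspace(1, 10, 50)
u = rng.normal(0, 1, 50)
# hypothetical "true" quadratic relation: MC_i = 5 + 2*output_i + 0.5*output_i**2 + u_i
mc = 5 + 2 * output + 0.5 * output**2 + u

# fit the "incorrect" linear cost curve MC_i = a1 + a2*output_i + v_i
X = np.column_stack([np.ones_like(output), output])
coef, *_ = np.linalg.lstsq(X, mc, rcond=None)
v = mc - X @ coef  # v_i picks up the omitted output**2 term

# The residual signs cluster: negative in the middle of the output range (the linear
# curve overestimates MC between the crossover points) and positive at both ends
# (it underestimates beyond them), mirroring Figure 12.2.
print(np.sign(v).astype(int))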
Suppose at the end of period t, price P_t turns out to be lower than P_{t-1}. Therefore, in
period t + 1 farmers may very well decide to produce less than they did in period t.
Obviously, in this situation the disturbances u_t are not expected to be random
because if the farmers overproduce in year t, they are likely to reduce their
production in t + 1, and so on, leading to a Cobweb pattern.
A regression such as (12.1.7) is known as autoregression because one of the
explanatory variables is the lagged value of the dependent variable. (We shall study
such models in Chapter 17.) The rationale for a model such as (12.1.7) is simple.
Consumers do not change their consumption habits readily for psychological,
technological, or institutional reasons. Now if we neglect the lagged term in (12.1.7),
the resulting error term will reflect a systematic pattern due to the influence of lagged
consumption on current consumption.
Y_t = β_1 + β_2 X_t + u_t    (12.1.8)
where, say, Y = consumption expenditure and X = income. Since (12.1.8) holds true at
every time period, it also holds true in the previous time period, (t - 1). So we can
write (12.1.8) as

Y_{t-1} = β_1 + β_2 X_{t-1} + u_{t-1}    (12.1.9)

Y_{t-1}, X_{t-1}, and u_{t-1} are known as the lagged values of Y, X, and u, respectively,
here lagged by one period. We will see the importance of the lagged values later in
the chapter as well as in several places in the text.
Now if we subtract (12.1.9) from (12.1.8), we obtain

Y_t - Y_{t-1} = β_2 (X_t - X_{t-1}) + (u_t - u_{t-1})
The point of the preceding example is that sometimes autocorrelation may be
induced as a result of transforming the original model.
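A short calculation shows why. Write the differenced disturbance as v_t = u_t - u_{t-1} and assume, as in the classical model, that u_t is serially uncorrelated with constant variance σ_u². Then:

```latex
\begin{aligned}
\operatorname{Var}(v_t) &= \operatorname{Var}(u_t) + \operatorname{Var}(u_{t-1}) = 2\sigma_u^2,\\
\operatorname{Cov}(v_t, v_{t-1}) &= \operatorname{Cov}(u_t - u_{t-1},\; u_{t-1} - u_{t-2})
  = -\operatorname{Var}(u_{t-1}) = -\sigma_u^2,\\
\operatorname{Corr}(v_t, v_{t-1}) &= \frac{-\sigma_u^2}{2\sigma_u^2} = -\frac{1}{2}.
\end{aligned}
```

So even when the level equation (12.1.8) satisfies the classical assumptions, the first-difference form has disturbances with a correlation of -0.5 between successive periods.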
procedure. As Kmenta shows, this result is likely to be the case even if the
sample size increases indefinitely.14 That is, β̂_2 is not asymptotically efficient.
The implication of this finding for hypothesis testing is clear. We are likely to
declare a coefficient statistically insignificant (i.e., not different from zero)
even though in fact (i.e., based on the correct GLS procedure) it may be significant. This
difference can be seen clearly from Figure 12.4. In this figure we show the
95% OLS [AR(1)] and GLS confidence intervals assuming that the true β_2 = 0.
Consider a particular estimate of β_2, say, b_2. Since b_2 lies in the OLS
confidence interval, we could accept the hypothesis that the true β_2 is zero with
95% confidence. But if we were to use the (correct) GLS confidence interval,
we would reject the null hypothesis that the true β_2 is zero, for b_2 lies in the region
of rejection.
The message is: To establish confidence intervals and to test hypotheses,
one should use GLS and not OLS even though the estimators derived
from the latter are unbiased and consistent. (However, see Section 12.11
later.)
The situation is potentially very serious if we not only use β̂_2 but also continue to
use var(β̂_2) = σ² / Σ x_t², which completely disregards the problem of
autocorrelation; that is, we mistakenly believe that the usual assumptions of the
classical model hold true. Errors will arise for the following reasons:
4. Therefore, the usual t and F tests of significance are no longer valid, and if applied,
are likely to give seriously misleading conclusions about the statistical significance
of the estimated regression coefficients.
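A small Monte Carlo sketch can illustrate point 4. All the numbers here are hypothetical choices, not values from the text: the errors follow an AR(1) scheme with coefficient rho = 0.8, the regressor is a demeaned random walk held fixed across replications, and the true slope is zero. Because the textbook formula var(β̂_2) = σ²/Σx_t² ignores the autocorrelation, the nominal 5% t test rejects a true null far more often than 5 percent of the time.

```python
import numpy as np

rng = np.random.default_rng(42)
T, reps, rho = 100, 2000, 0.8          # hypothetical sample size and AR(1) coefficient
x = np.cumsum(rng.normal(size=T))      # slowly evolving regressor, fixed across replications
x = x - x.mean()

rejections = 0
for _ in range(reps):
    # AR(1) errors; true beta2 = 0, so every "significant" slope is a false positive
    u = np.zeros(T)
    eps = rng.normal(size=T)
    for t in range(1, T):
        u[t] = rho * u[t - 1] + eps[t]
    y = 1.0 + 0.0 * x + u

    b2 = np.sum(x * y) / np.sum(x**2)        # OLS slope (x already demeaned)
    resid = (y - y.mean()) - b2 * x
    s2 = np.sum(resid**2) / (T - 2)
    se_ols = np.sqrt(s2 / np.sum(x**2))      # textbook var(b2) = sigma^2 / sum(x_t^2)
    if abs(b2 / se_ols) > 1.96:
        rejections += 1

# With rho = 0.8 this empirical rejection rate is far above the nominal 5 percent.
print("empirical size of the 5% t test:", rejections / reps)
```

The exaggerated rejection rate is exactly the "seriously misleading conclusions" warned about above.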
There are various ways of examining the residuals. We can simply plot them
against time, the time sequence plot, as we have done in Figure 12.8, which
shows the residuals obtained from the wages-productivity regression (12.5.1).
The values of these residuals are given in Table 12.5 along with some other
data.
II. The Runs Test
If we carefully examine Figure 12.8, we notice a peculiar feature: Initially, we
have several residuals that are negative, then there is a series of positive
residuals, and then there are several residuals that are negative. If these
residuals were purely random, could we observe such a pattern? Intuitively, it
seems unlikely. This intuition can be checked by the so-called runs test,
sometimes also known as the Geary test, a nonparametric test.20
To explain the runs test, let us simply note down the signs (+ or -) of the
residuals obtained from the wages-productivity regression, which are given in
the first column of Table 12.5.
(-----------------)(++++++++++++++++++++++++) (----------------)
(12.6.1)
Now let

N = total number of observations = N_1 + N_2
N_1 = number of + symbols (i.e., + residuals)
N_2 = number of - symbols (i.e., - residuals)
R = number of runs

Then under the null hypothesis that the successive outcomes (here, residuals) are
independent, and assuming that N_1 > 10 and N_2 > 10, the number of runs is
(asymptotically) normally distributed with

Mean: E(R) = (2 N_1 N_2)/N + 1    (12.6.2)

Variance: σ_R² = [2 N_1 N_2 (2 N_1 N_2 - N)] / [N² (N - 1)]

Note: N = N_1 + N_2.
If the null hypothesis of randomness is sustainable, following the properties of the
normal distribution, we should expect that

Prob[E(R) - 1.96 σ_R ≤ R ≤ E(R) + 1.96 σ_R] = 0.95

That is, the probability is 95 percent that the preceding interval will include R.
Therefore we have this rule: do not reject the null hypothesis of randomness with 95%
confidence if R, the number of runs, lies in the preceding confidence interval; reject
it otherwise.
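The rule is easy to mechanize. The following sketch (the function name runs_test and the illustrative residual series are made up for this example) computes R, E(R) from (12.6.2), the standard error σ_R from the variance formula given above, and the 95% interval E(R) ± 1.96 σ_R:

```python
import numpy as np

def runs_test(residuals):
    """Runs (Geary) test of randomness on the signs of a residual series.

    Returns the observed number of runs R, its expected value, the standard
    error, the approximate 95% interval E(R) +/- 1.96*sigma_R, and whether
    randomness is rejected. (Normal approximation; appropriate when N1 > 10
    and N2 > 10.)
    """
    signs = np.sign(residuals)
    signs = signs[signs != 0]                        # drop exact zeros, if any
    n1 = int(np.sum(signs > 0))                      # number of + residuals
    n2 = int(np.sum(signs < 0))                      # number of - residuals
    n = n1 + n2
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))  # sign changes + 1

    mean_r = 2.0 * n1 * n2 / n + 1.0                                 # (12.6.2)
    var_r = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n**2 * (n - 1.0))
    se_r = np.sqrt(var_r)
    lower, upper = mean_r - 1.96 * se_r, mean_r + 1.96 * se_r

    reject = runs < lower or runs > upper            # reject randomness at the 5% level
    return runs, mean_r, se_r, (lower, upper), reject

# Example with an artificial residual series whose signs clearly cluster:
res = np.concatenate([-np.ones(15), np.ones(20), -np.ones(15)])
print(runs_test(res))
```

Here only 3 runs are observed against an expected value of about 25, so the interval excludes R and the hypothesis of randomness is rejected, just as the clustered signs in (12.6.1) suggest.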