Autocorrelation
Three types of data are available for empirical analysis: (1) cross section, (2) time series, and (3) a combination of cross section and time series, also known as pooled data.
The assumption of homoscedasticity (or equal error variance) may not always be tenable in cross-sectional data, as such data are often plagued by the problem of heteroscedasticity.
In cross-section studies, data are collected from a random sample of cross-sectional units, such as households (in a consumption-function analysis) or firms (in an investment study), so there is no prior reason to believe that the error term pertaining to one household (or firm) is correlated with the error term of another household (or firm). If such a correlation is nevertheless observed among cross-sectional units, it is called spatial autocorrelation, that is, correlation in space rather than over time (temporal correlation).
However, it is important to remember that, in cross-sectional analysis, the ordering of the data
must have some logic, or economic interest, to make sense of any determination of whether (spatial)
autocorrelation is present or not.
The situation is different with time series data, for the observations in such data follow a natural ordering over time, so that successive observations are likely to exhibit intercorrelations, especially if the time interval between them is short, such as a day, a week, or a month rather than a year. If one observes stock price indices of a company over successive days, they often move up or down for several days in succession. Obviously, in such situations, the assumption of no auto-, or serial, correlation in the error terms that underlies the CLRM will be violated.
The Nature of the Problem
Autocorrelation may be defined as “correlation between members of a series of observations
ordered in time [as in time series data] or space [as in cross-sectional data]”. In the regression context, the
classical linear regression model assumes that such autocorrelation does not exist in the disturbances ui.
Symbolically, cov(ui, uj) = E(ui uj) = 0 for i ≠ j.
The classical model assumes that the disturbance term relating to any observation is not
influenced by the disturbance term relating to any other observation.
Ex: i) If we are dealing with quarterly time series data involving the regression of output on labor and
capital inputs and if, say, there is a labor strike affecting output in one quarter, there is no reason to
believe that this disruption will be carried over to the next quarter. That is, if output is lower this
quarter, there is no reason to expect it to be lower next quarter.
ii) While dealing with cross-sectional data involving the regression of family consumption
expenditure on family income, the effect of an increase of one family’s income on its consumption
expenditure is not expected to affect the consumption expenditure of another family.
However, if there is such a dependence, we have autocorrelation.
Symbolically, cov(ui, uj) = E(ui uj) ≠ 0 for some i ≠ j.
In this situation, the disruption caused by a strike this quarter may very well affect output next
quarter, or the increases in the consumption expenditure of one family may very well prompt another
family to increase its consumption expenditure if it wants to keep up with the Joneses.
Although the terms autocorrelation and serial correlation are often used synonymously, some authors, such as Tintner, differentiate them. He defines autocorrelation as “lag correlation of a given series with itself, lagged by a number of time units,” whereas he reserves serial correlation for “lag correlation between two different series.” Thus, correlation between two time series such as u1, u2, . . . , u10 and u2, u3, . . . , u11, where the former is the latter series lagged by one time period, is autocorrelation, whereas correlation between time series such as u1, u2, . . . , u10 and v2, v3, . . . , v11, where u and v are two different time series, is called serial correlation.
Figures 1a to 1d show distinct patterns among the u’s: Fig. 1a shows a cyclical pattern; Figs. 1b and 1c suggest an upward or downward linear trend in the disturbances; and Fig. 1d indicates that both linear and quadratic trend terms are present in the disturbances. Only Fig. 1e shows no systematic pattern, supporting the non-autocorrelation assumption of the classical linear regression model.
Causes for autocorrelation:
1. Inertia: A salient feature of most economic time series is inertia, or sluggishness. As is well known, time series such as GNP, price indices, production, employment, and unemployment exhibit (business) cycles.
Starting at the bottom of a recession, when economic recovery begins, most of these series start moving upward. In this upswing, the value of a series at one point in time is greater than its previous value. This “momentum” built into them continues until something happens (e.g., an increase in interest rates or taxes, or both) to slow them down. Therefore, in regressions involving time series data, successive observations are likely to be interdependent.
2. Specification Bias - Excluded Variables Case:
The researcher often starts with a plausible regression model that may not be the most “perfect” one. After the regression analysis, the researcher checks whether the results accord with a priori expectations. If the plotted residuals ûi obtained from the fitted regression follow patterns such as those shown in Figs. 1a to 1d, these residuals (which are proxies for ui) may suggest that some variables that ought to be included in the model were excluded for a variety of reasons. This is the case of excluded-variable specification bias.
Often the inclusion of such variables removes the correlation pattern observed among the residuals. Ex: Consider the following demand model:
Yt = β1 + β2X2t + β3X3t + β4X4t + ut ………. (1)
Where,
Y = Quantity of beef demanded,
X2 = Price of beef,
X3 = Consumer income,
X4 = Price of pork, and
t = time.
However, for some reason, if we run the following regression:
Yt = β1 + β2X2t + β3X3t + vt ………. (2)
If (1) is the “correct’’ model or true relation, running (2) is equivalent to letting vt = β4X4t + ut.
And to the extent the price of pork affects the consumption of beef, the error term v will reflect a
systematic pattern, thus creating (false) autocorrelation. A simple test of this would be to run both (1) and
(2) and see whether autocorrelation, if any, observed in model (2) disappears when (1) is run.
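To make the excluded-variable mechanism concrete, the following is a minimal simulation sketch in Python (numpy and statsmodels assumed available). The data are simulated for illustration only, not the beef-demand data of model (1); the point is simply that omitting a regressor that moves systematically over time leaves a pattern in the residuals, which shows up in the Durbin–Watson d statistic discussed later in this handout.

```python
# Minimal simulation sketch (simulated data, not the beef-demand data of model (1)):
# omitting a regressor that drifts over time leaves a systematic pattern in the
# residuals, which the Durbin-Watson d statistic (discussed later) picks up.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
n = 80
x2 = rng.normal(10, 2, n)                            # stand-in for "price of beef"
x3 = rng.normal(50, 5, n)                            # stand-in for "consumer income"
x4 = np.linspace(5, 15, n) + rng.normal(0, 0.3, n)   # stand-in for "price of pork", drifting over time
u = rng.normal(0, 1, n)                              # well-behaved disturbance
y = 20 - 1.5 * x2 + 0.4 * x3 + 0.8 * x4 + u          # data generated from the "true" model (1)

full = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3, x4]))).fit()
omitted = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()  # like running (2)

print("d with x4 included:", durbin_watson(full.resid))    # close to 2: no pattern
print("d with x4 omitted :", durbin_watson(omitted.resid)) # well below 2: patterned residuals
```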
3. Specification bias - Incorrect functional form
Suppose the “true’’ or correct model in a cost-output study is as follows:
Marginal Costi = β1 + β2 Outputi + β3 Outputi² + ui ……….(3)
but we fit the following model:
Marginal Costi = α1 + α2 Outputi + vi ……….(4)
Fig.2: Specification bias: incorrect functional form
The marginal cost curve corresponding to the “true’’ model is shown in Fig. 2 along with the
“incorrect’’ linear cost curve. As Fig.2 shows, between points A and B the linear marginal cost curve will
consistently overestimate the true marginal cost, whereas beyond these points it will consistently
underestimate the true marginal cost. This result is to be expected, because the disturbance term vi is, in fact, equal to β3 Outputi² + ui and hence will capture the systematic effect of the Output² term on marginal cost. In this case, vi will reflect autocorrelation because of the use of an incorrect functional form.
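A similar sketch can illustrate the incorrect-functional-form case of Fig. 2. The cost data below are simulated with assumed coefficient values; fitting the linear model (4) when the true relation (3) is quadratic pushes the Output² term into the error and leaves a systematic, U-shaped pattern in the residuals.

```python
# Sketch of the Fig. 2 situation with simulated cost data (assumed coefficients):
# fitting the linear model (4) when the true relation (3) is quadratic produces
# a systematic pattern in the residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
output = np.linspace(1, 10, 60)
mc = 5 - 2 * output + 0.4 * output**2 + rng.normal(0, 0.5, 60)   # "true" model (3)

quad = sm.OLS(mc, sm.add_constant(np.column_stack([output, output**2]))).fit()
lin = sm.OLS(mc, sm.add_constant(output)).fit()                  # fitted model (4)

print("d, quadratic fit:", durbin_watson(quad.resid))   # near 2
print("d, linear fit   :", durbin_watson(lin.resid))    # far below 2: U-shaped residuals
```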
4. Cobweb Phenomenon:
The supply of many agricultural commodities reflects the so-called cobweb phenomenon, where supply reacts to price with a lag of one time period because supply decisions take time to implement (the gestation period). Thus, at the beginning of this year’s planting of crops, farmers are influenced by the price prevailing last year, so that their supply function is
Supplyt = β1 + β2Pt−1 + ut ……….(5)
Suppose at the end of period t, price Pt turns out to be lower than Pt−1.
Therefore, in period t+1 farmers may very well decide to produce less than they did in period t.
Obviously, in this situation the disturbances ut are not expected to be random because if the farmers
overproduce in year t, they are likely to reduce their production in t + 1, and so on, leading to a Cobweb
pattern.
5. Lags
In a time series regression of consumption expenditure on income, it is not uncommon to find that
the consumption expenditure in the current period depends, among other things, on the consumption
expenditure of the previous period. That is,
Consumptiont = β1 + β2Incomet + β3Consumptiont−1 + ut ……….(6)
The above regression is known as an autoregression because one of the explanatory variables is the lagged value of the dependent variable.
Consumers do not change their consumption habits readily for psychological, technological, or
institutional reasons. Now if we neglect the lagged term in above equation, the resulting error term will
reflect a systematic pattern due to the influence of lagged consumption on current consumption.
6. Manipulation of Data:
In empirical analysis, the raw data are often “manipulated.’’
Ex: In time series regressions involving quarterly data, the quarterly figures are often obtained by adding up the three monthly values and dividing the sum by 3. This averaging introduces smoothness into the data by dampening the fluctuations in the monthly data. Therefore, the graph plotting the quarterly data looks much smoother than that of the monthly data, and this smoothness may itself lend a systematic pattern to the disturbances, thereby introducing autocorrelation.
Another source of manipulation is interpolation or extrapolation of data.
Ex: The Population Census is conducted every 10 years, as in 1991 and 2001. If data are needed for some year within the intercensal period 1991–2001, the common practice is to interpolate on the basis of some ad hoc assumptions. All such data “massaging” techniques might impose upon the data a systematic pattern that might not exist in the original data.
7. Data Transformation:
Consider the following model:
Yt = β1 + β2Xt + ut ……….(7)
where,
Y = Consumption expenditure and X = Income.
Since (7) holds true at every time period, it also holds true in the previous time period, (t − 1). So we can write
Yt−1 = β1 + β2Xt−1 + ut−1 ……….(8)
where Yt−1, Xt−1, and ut−1 are the values of Y, X, and u lagged by one period. Equations (7) and (8) are known as level form regressions. Subtracting (8) from (7), we obtain
∆Yt = β2∆Xt + ∆ut ……….(9)
where ∆, the first difference operator, tells us to take successive differences of the variables; (9) is known as the difference form regression. For empirical purposes, we write this equation as
∆Yt = β2∆Xt + vt ……….(10)
where vt = ∆ut = ut − ut−1. (By contrast, models such as (6), which include a lagged value of the regressand among the explanatory variables, are known as dynamic regression models.)
If in (8) Y and X represent the logarithms of consumption expenditure and income, then in (9) ∆Y and
∆X will represent changes in the logs of consumption expenditure and income. A change in the log of a
variable is a relative or a percentage change, if the former is multiplied by 100. So, instead of studying
relationships between variables in the level form, we may be interested in their relationships in the growth
form.
Now if the error term in (7) satisfies the standard OLS assumptions, particularly the assumption
of no autocorrelation, it can be shown that the error term vt in (10) is autocorrelated.
Since vt = ut − ut−1,
E(vt) = E(ut − ut−1) = E(ut) − E(ut−1) = 0, since E(ut) = 0 for each t.
Now, var(vt) = var(ut − ut−1) = var(ut) + var(ut−1) = 2σ²,
since the variance of each ut is σ² and the u’s are independently distributed. Hence, vt is homoscedastic.
But cov(vt, vt−1) = E(vt vt−1) = E[(ut − ut−1)(ut−1 − ut−2)] = E(ut ut−1) − E(ut ut−2) − E(ut−1²) + E(ut−1 ut−2) = 0 − 0 − σ² + 0 = −σ², which is obviously nonzero. Therefore, although the u’s are not autocorrelated, the v’s are.
8. Nonstationarity:
A time series is stationary if its characteristics (e.g., mean, variance, and covariance) are time
invariant; that is, they do not change over time. If that is not the case, we have a nonstationary time
series. In a regression model (7), it is quite possible that both Y and X are nonstationary and therefore the
error u is also nonstationary. In that case, the error term will exhibit autocorrelation.
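A quick way to see the nonstationarity point is the classic spurious-regression experiment: regressing one random walk on another, independent random walk. The sketch below uses simulated data (not any series from this handout); the Durbin–Watson d of the residuals, introduced later in this handout, typically comes out far below 2, signalling heavy positive autocorrelation.

```python
# Spurious-regression sketch (simulated data): regressing one random walk on another,
# independent random walk typically yields residuals with a Durbin-Watson d far below 2.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 200
y = np.cumsum(rng.normal(size=n))   # nonstationary random walk
x = np.cumsum(rng.normal(size=n))   # independent random walk

res = sm.OLS(y, sm.add_constant(x)).fit()
print("Durbin-Watson d:", durbin_watson(res.resid))   # typically far below 2
```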
Autocorrelation can be positive (Fig. 3a) as well as negative, although most economic time series generally exhibit positive autocorrelation because most of them either move upward or downward over extended time periods and do not exhibit a constant up-and-down movement such as that shown in Fig. 3b.
Fig.3: (a) Positive and (b) negative autocorrelation
Assume that the disturbances are generated by the mechanism
ut = ρut−1 + εt, −1 < ρ < 1 ……….(12)
where εt satisfies the standard OLS assumptions: E(εt) = 0, var(εt) = σε², and cov(εt, εt+s) = 0 for s ≠ 0. An error term with these properties is called a white noise error term. Equation (12) postulates that the value of the disturbance term in period t is equal to rho times its value in the previous period plus a purely random error term. The scheme (12) is known as a Markov first-order autoregressive scheme, or simply a first-order autoregressive scheme, usually denoted AR(1). The name autoregressive is appropriate because (12) can be interpreted as the regression of ut on itself lagged one period. It is first order because ut and its immediate past value are involved; that is, the maximum lag is 1.
If the model is ut = ρ1ut−1 + ρ2ut−2 + εt, it would be an AR(2), or second-order, autoregressive
scheme, and so on.
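As a small illustration, the sketch below simulates an AR(1) disturbance with ρ = 0.8 (an arbitrary illustrative value) and checks that the sample autocorrelation at lag s is close to ρ^s, the theoretical value given in (15) below.

```python
# Sketch: simulate the AR(1) scheme u_t = rho*u_{t-1} + eps_t with white-noise eps_t
# and check that the sample autocorrelation at lag s is close to rho**s (eq. (15)).
import numpy as np

rng = np.random.default_rng(7)
rho, n = 0.8, 50_000          # illustrative values
eps = rng.normal(0, 1, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

for s in (1, 2, 3):
    r = np.corrcoef(u[s:], u[:-s])[0, 1]
    print(f"lag {s}: sample autocorr = {r:.3f}, rho**{s} = {rho**s:.3f}")
```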
Rho (coefficient of autocovariance) in (12), can also be interpreted as the first-order coefficient
of autocorrelation, or more accurately, the coefficient of autocorrelation at lag 1.
Given the AR(1) scheme, it can be shown that
var(ut) = E(ut²) = σε²/(1 − ρ²) ……….(13)
cov(ut, ut+s) = E(ut ut+s) = ρ^s σε²/(1 − ρ²) ……….(14)
cor(ut, ut+s) = ρ^s ……….(15)
Where, cov (ut, ut+s) means covariance between error terms s periods apart and where cor (ut, ut+s) means
correlation between error terms s periods apart. Because of the symmetry property of covariances and
correlations, cov (ut , ut+s) = cov (ut, ut−s) and cor(ut, ut+s) = cor(ut, ut−s) .
Since ρ is a constant between −1 and +1, (13) shows that under the AR(1) scheme, the variance of
ut is still homoscedastic, but ut is correlated not only with its immediate past value but its values several
periods in the past. It is critical to note that |ρ| < 1, that is, the absolute value of rho is less than one.
If ρ =1, the variances and covariances listed above are not defined.
If |ρ| < 1, we say that the AR(1) process given in (12,i.e., ut = ρut−1 + εt) is stationary; that is, the
mean, variance, and covariance of ut do not change over time.
If |ρ| is less than one, then it is clear from (14) that the value of the covariance will decline as we
go into the distant past.
We use the AR(1) process not only because of its simplicity compared to higher-order AR schemes, but also because in many applications it has proved to be quite useful. Additionally, a considerable amount of theoretical and empirical work has been done on the AR(1) scheme.
Consider again the two-variable regression model Yt = β1 + β2Xt + ut and the usual OLS slope estimator
β̂2 = Σxt yt / Σxt² ……(16)
where the lowercase letters denote deviations from sample means. Under the standard OLS assumptions, its variance is
var(β̂2) = σ² / Σxt² ……(17)
Now under the AR(1) scheme, it can be shown that the variance of this estimator is:
var(β̂2)AR1 = σ²/Σxt² [1 + 2ρ Σxt xt+1/Σxt² + 2ρ² Σxt xt+2/Σxt² + · · · + 2ρ^(n−1) x1xn/Σxt²] ……(18)
Where, var (ˆβ2)AR1 means the variance of ˆβ2 under first-order autoregressive scheme.
A comparison of (18) with (17) shows the former is equal to the latter times a term that depends
on ρ as well as the sample autocorrelations between the values taken by the regressor X at various lags.
In general we cannot predict whether var (ˆβ2) is less than or greater than var (ˆβ2)AR1. Of course,
if rho is zero, the two formulas will coincide. Also, if the correlations among the successive values of the
regressor are very small, the usual OLS variance of the slope estimator will not be seriously biased. But,
as a general principle, the two variances will not be the same.
To give some idea about the difference between the variances given in (17) and (18), assume that
the regressor X also follows the first-order autoregressive scheme with a coefficient of autocorrelation of
r. Then it can be shown that (18) reduces to:
var(β̂2)AR1 = σ²/Σxt² · [(1 + rρ)/(1 − rρ)] ……(19)
If, for example, r = 0.6 and ρ = 0.8, using (19)
var (ˆβ2)AR1 = 2.8461 var (ˆβ2)OLS.
To put it another way,
var (ˆβ2)OLS = 1/2.8461var (ˆβ2)AR1 = 0.3513 var (ˆβ2)AR1 .
That is, the usual OLS formula [i.e.,(17)] will underestimate the variance of (ˆβ2)AR1 by about 65 percent.
[This answer is specific for the given values of r and ρ].
Warning:
A blind application of the usual OLS formulas to compute the variances and standard errors of the
OLS estimators could give seriously misleading results.
Suppose we continue to use the OLS estimator β̂2 but adjust the usual variance formula to take the AR(1) scheme into account; that is, we use β̂2 given by (16) together with the variance formula given by (18). What now are the properties of β̂2? It can be proved that β̂2 is still linear and unbiased.
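The following Monte Carlo sketch (simulated data, with numpy and statsmodels assumed available) illustrates both points: with AR(1) disturbances (ρ = 0.8) and a regressor that is itself AR(1) with r = 0.6, the OLS slope estimator stays centered on the true value, but the usual OLS standard error is noticeably smaller than the actual sampling variability of the estimator, in line with (19).

```python
# Monte Carlo sketch (simulated data): AR(1) disturbances (rho = 0.8) and an AR(1)
# regressor (r = 0.6). OLS beta2_hat stays unbiased, but the usual OLS standard
# error understates its actual sampling variability, in line with (19).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(123)
n, reps, rho, r, beta2 = 50, 2000, 0.8, 0.6, 1.0

def ar1(coef, size):
    e = rng.normal(0, 1, size)
    z = np.zeros(size)
    for t in range(1, size):
        z[t] = coef * z[t - 1] + e[t]
    return z

x = ar1(r, n)                  # one fixed, autocorrelated regressor
X = sm.add_constant(x)
slopes, usual_se = [], []
for _ in range(reps):
    u = ar1(rho, n)            # fresh AR(1) disturbances each replication
    fit = sm.OLS(2.0 + beta2 * x + u, X).fit()
    slopes.append(fit.params[1])
    usual_se.append(fit.bse[1])

print("mean of beta2_hat         :", np.mean(slopes))   # close to 1.0 (unbiased)
print("actual sd of beta2_hat    :", np.std(slopes))    # true sampling variability
print("average usual OLS std err :", np.mean(usual_se)) # noticeably smaller
```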
Detecting Autocorrelation
Ex: Relationship between wages and productivity in the business sector of the US, 1959–1998
Table-1: Indices of Real Compensation and Productivity, United States, 1959–1998
Table 1 above contains the data on indices of real compensation per hour (Y) and output per hour (X) in the business sector of the U.S. economy for the period 1959–1998 (base: 1992 = 100).
Fig.4: Index of real compensation (Y) plotted against index of productivity (X), 1959–1998
Plotting the data on Y against X, we obtain Fig. 4. Since the relationship between the two variables is expected to be positive, it is not surprising that they are positively related; what is surprising is that the relationship is almost linear, although there is some hint that, at higher values of productivity, the relationship may be slightly nonlinear. Therefore, both a linear and a log–linear model were estimated, with the following results.
Qualitatively, both the models give similar results. In both cases the estimated coefficients are
“highly” significant, as indicated by the high t values.
In the linear model, if the index of productivity goes up by a unit, on average, the index of
compensation goes up by about 0.71 units.
In the log–linear model, if the index of productivity goes up by 1 percent, on average, the index
of real compensation goes up by about 0.67 percent.
How reliable are the results if there is autocorrelation?
Tests / Methods to detect Autocorrelation
If there is autocorrelation, the estimated standard errors are biased, and as a result the estimated t ratios are unreliable. Hence there is a need to find out whether the data suffer from autocorrelation, using the different methods detailed below with the LINEAR MODEL.
I. Graphical Method
The assumption of non-autocorrelation of the classical model relates to the population
disturbances ut, which are not directly observable. What we have instead are their proxies, the residuals
ˆut, which can be obtained by the usual OLS procedure. Although the ˆut are not the same thing as ut, very
often a visual examination of the ˆu’s gives us some clues about the likely presence of autocorrelation in
the u’s. Actually, a visual examination of ˆut or (ˆu2t) can provide useful information not only about
autocorrelation but also about heteroscedasticity, model inadequacy, or specification bias.
There are various ways of examining the residuals. One is simply to plot them against time, giving the time sequence plot: Fig. 5 shows the residuals obtained from the wages–productivity regression (1). The values of these residuals are given in Table 2 along with some other data.
Fig.5: Residuals and standardized residuals from the wages–productivity regression (1)
Alternatively, we can plot the standardized residuals against time, which are also shown in
Fig.5 and Table 2.
The standardized residuals are the residuals (ˆut) divided by the standard error of the regression
(√ˆσ2), that is, they are (ˆut/ˆσ ). [Note that ˆut and ˆσ are measured in the units in which the regressand Y
is measured]. The values of the standardized residuals will therefore be pure numbers (devoid of units of
measurement) and can be compared with the standardized residuals of other regressions. Moreover, the
standardized residuals, like ˆut, have zero mean and approximately unit variance. In large samples (ˆut/ˆσ)
is approximately normally distributed with zero mean and unit variance. For our example, ˆσ = 2.6755.
Examining the time sequence plot given in Fig 5, one can observe that both ˆut and the standardized ˆut
exhibit a pattern observed in Figure 1d, suggesting that perhaps ut are not random.
To see this differently, we can plot ût against ût−1, that is, plot the residuals at time t against their value at time (t − 1), a kind of empirical test of the AR(1) scheme. If the residuals are nonrandom, we should obtain pictures similar to those shown in Fig. 3. This plot for the wages–productivity regression is shown in Fig. 6 for the data in Table 2. As this figure reveals, most of the residuals are bunched in the northeast and southwest quadrants, suggesting a strong positive correlation in the residuals.
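For readers working in software, the two plots just described can be produced with a few lines of code. The sketch below assumes the OLS residuals are available in time order in an array called resid (the name is illustrative); it draws the time sequence plot of the residuals and standardized residuals (as in Fig. 5) and the plot of ût against ût−1 (as in Fig. 6).

```python
# Sketch of the two diagnostic plots, assuming the OLS residuals are in time order
# in an array `resid` (name is illustrative). Requires numpy and matplotlib.
import numpy as np
import matplotlib.pyplot as plt

def residual_plots(resid):
    resid = np.asarray(resid)
    sigma_hat = np.sqrt(np.sum(resid**2) / (len(resid) - 2))   # standard error of regression (two-variable model)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Time sequence plot of residuals and standardized residuals (as in Fig. 5)
    ax1.plot(resid, label="residuals")
    ax1.plot(resid / sigma_hat, "--", label="standardized residuals")
    ax1.axhline(0, color="black", linewidth=0.5)
    ax1.set_xlabel("time")
    ax1.legend()

    # Plot of u_hat(t) against u_hat(t-1) (as in Fig. 6)
    ax2.scatter(resid[:-1], resid[1:])
    ax2.axhline(0, color="black", linewidth=0.5)
    ax2.axvline(0, color="black", linewidth=0.5)
    ax2.set_xlabel("u_hat(t-1)")
    ax2.set_ylabel("u_hat(t)")
    plt.show()
```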
II. Runs Test
Another, nonparametric, way to check the residuals is the runs test. Examine the signs (+ or −) of the residuals in time order: a run is an uninterrupted sequence of one sign, and R denotes the number of runs. Let N1 be the number of negative residuals, N2 the number of positive residuals, and N = N1 + N2. Under the null hypothesis that successive residuals are independent, and assuming that N1 and N2 are reasonably large, the number of runs R is approximately normally distributed with
Mean: E(R) = 2N1N2/N + 1
Variance: σR² = 2N1N2(2N1N2 − N) / [N²(N − 1)]
If the null hypothesis of randomness is sustainable, following the properties of the normal distribution, we should expect that
Prob [E(R) − 1.96σR ≤ R ≤ E(R) + 1.96σR] = 0.95
That is, the probability is 95 percent that the preceding interval will include R.
Decision Rule:
Do not reject the null hypothesis of randomness with 95% confidence if R, the number of runs,
lies in the preceding confidence interval; reject the null hypothesis if the estimated R lies outside these
limits. (can choose any level of confidence)
In this example,
N1 - the number of minuses = 19
N2 - the number of pluses = 21
R = 3.
Then, using the above formulas,
E(R) = 20.95
σ2R = 9.6936
σR = 3.1134
The 95% confidence interval for R is thus:
[20.95 ± 1.96(3.1134)] = (14.8477, 27.0523)
Obviously, this interval does not include 3. Hence, we can reject the hypothesis that the residuals in our wages–productivity regression are random with 95% confidence. In other words, the residuals exhibit autocorrelation.
As a general rule, if there is
Positive autocorrelation- the number of runs will be few, whereas
Negative autocorrelation - the number of runs will be many.
Swed and Eisenhart have developed special tables that give critical values of the runs expected in a
random sequence of N observations if N1 or N2 is smaller than 20.
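A minimal implementation of the runs test based on the signs of the residuals is sketched below, using the large-sample mean and variance of R given above; for small N1 or N2 the Swed–Eisenhart tables should be used instead, so the normal approximation here is only a rough guide in small samples.

```python
# Minimal runs-test sketch on the signs of the residuals, using the large-sample
# mean and variance of R given above (for small N1 or N2 use the Swed-Eisenhart tables).
import numpy as np

def runs_test(resid, z=1.96):
    signs = np.sign(np.asarray(resid))
    signs = signs[signs != 0]                          # drop exact zeros, if any
    n1 = int((signs < 0).sum())                        # number of minuses
    n2 = int((signs > 0).sum())                        # number of pluses
    n = n1 + n2
    runs = 1 + int((signs[1:] != signs[:-1]).sum())    # number of runs R
    mean_r = 2 * n1 * n2 / n + 1                       # E(R)
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    lo, hi = mean_r - z * var_r**0.5, mean_r + z * var_r**0.5
    reject_randomness = not (lo <= runs <= hi)
    return runs, mean_r, (lo, hi), reject_randomness
```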
III. Durbin–Watson d Test
The most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin–Watson d statistic, which is defined as
d = Σ(t=2 to n) (ût − ût−1)² / Σ(t=1 to n) ût²
that is, the ratio of the sum of squared differences in successive residuals to the residual sum of squares. Since Σût² and Σût−1² differ in only one observation, they are approximately equal. Therefore, setting Σût−1² ≈ Σût², the above equation may be written as
d ≈ 2(1 − Σût ût−1/Σût²) = 2(1 − ρ̂)
where ρ̂ is the sample first-order coefficient of autocorrelation of the residuals. Hence d is about 2 when ρ̂ = 0, approaches 0 as ρ̂ → +1 (positive autocorrelation), and approaches 4 as ρ̂ → −1 (negative autocorrelation).
Illustration: For the wages–productivity regression and the data given in Table 2, the estimated d value works out to 0.1229, suggesting positive serial correlation in the residuals. From the Durbin–Watson tables, we find that for 40 observations and one explanatory variable, dL = 1.44 and dU = 1.54 at the 5 percent level. Since the computed d of 0.1229 lies below dL, we cannot reject the hypothesis that there is positive serial correlation in the residuals.
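Computationally, d is straightforward to obtain from the residuals; statsmodels also provides a durbin_watson function that returns the same number. The sketch below assumes the residuals are held, in time order, in an array named resid (an illustrative name); the critical bounds dL and dU must still be taken from the Durbin–Watson tables.

```python
# Sketch: computing d from residuals held in time order in an array `resid`
# (illustrative name). statsmodels' durbin_watson gives the same value; the bounds
# dL and dU must still be read from the Durbin-Watson tables for the given n and k.
import numpy as np
from statsmodels.stats.stattools import durbin_watson

def dw(resid):
    resid = np.asarray(resid)
    d = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)            # definition of d
    rho_hat = np.sum(resid[1:] * resid[:-1]) / np.sum(resid ** 2)   # sample lag-1 autocorrelation
    return d, 2 * (1 - rho_hat)   # second value: the approximation d = 2(1 - rho_hat)
```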
However, a great drawback of the d test is that if the computed value falls in the indecisive zone, one cannot conclude whether (first-order) autocorrelation does or does not exist. In that case, the following modified d test can be used.
Given the level of significance α,
1. H0: ρ = 0 versus H1:ρ > 0. Reject H0 at α level if d < dU. That is, there is statistically significant
positive autocorrelation.
2. H0: ρ = 0 versus H1:ρ < 0. Reject H0 at α level if the estimated (4 − d) < dU, that is, there is
statistically significant evidence of negative autocorrelation.
3. H0: ρ = 0 versus H1: ρ ≠ 0. Reject H0 at 2α level if d < dU or (4 − d) < dU, that is, there is
statistically significant evidence of autocorrelation, positive or negative.
It may be pointed out that the indecisive zone narrows as the sample size increases, which can be
seen clearly from the Durbin–Watson tables. For example, with 4 regressors and 20 observations, the 5
percent lower and upper d values are 0.894 and 1.828, respectively, but these values are 1.515 and 1.739
if the sample size is 75.
The computer program Shazam performs an exact d test, that is, it gives the p value, the exact
probability of the computed d value. With modern computing facilities, it is no longer difficult to find the
p value of the computed d statistic. Using SHAZAM (version 9) the wages–productivity regression, the p
value of the computed d of 0.1229 is practically zero, thereby reconfirming our earlier conclusion based
on the Durbin–Watson tables.
The Durbin–Watson d test has become so venerable that practitioners often forget the
assumptions underlying the test. In particular, the assumptions that (1) the explanatory variables, or
regressors, are nonstochastic; (2) the error term follows the normal distribution; and (3) that the regression
models do not include the lagged value(s) of the regressand are very important for the application of the d
test.
If a regression model contains lagged value(s) of the regressand, the d value in such cases is often
around 2, which would suggest that there is no (first-order) autocorrelation in such models. Thus, there is
a built-in bias against discovering (first-order) autocorrelation in such models.
IV. Breusch–Godfrey (BG) Test
The BG test, also known as the LM test, avoids some of the pitfalls of the Durbin–Watson d test. It allows for
(1) regressors that include the lagged values of the regressand;
(2) higher-order autoregressive schemes, such as AR(1), AR(2), etc.; and
(3) simple or higher-order moving averages of white noise error terms, such as εt in Equation (12).
The test proceeds as follows: estimate the original regression by OLS and obtain the residuals ût; regress ût on the original regressors and on the lagged residuals ût−1, ût−2, . . . , ût−p [the auxiliary regression, eqn (4)], where p is the assumed order of the autoregressive error scheme ut = ρ1ut−1 + ρ2ut−2 + · · · + ρput−p + εt [eqn (2)]; and obtain R² from this auxiliary regression. Then it can be shown that, in large samples,
(n − p)R² ~ χ²(p) ………….(5)
That is, asymptotically, n − p times the R² value obtained from the auxiliary regression (4) follows the chi-square distribution with p degrees of freedom. If, in an application, (n − p)R² exceeds the critical chi-square value at the chosen level of significance, we reject the null hypothesis H0: ρ1 = ρ2 = · · · = ρp = 0, in which case at least one rho in eqn (2) is statistically significantly different from zero.
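In practice the BG test is available in standard software. The sketch below uses the statsmodels implementation, assuming results is a fitted OLS results object and p is the chosen autoregressive order; note that the statistic the library reports is computed from the auxiliary regression and may differ slightly in finite samples from the (n − p)R² form in (5), though the two are asymptotically equivalent.

```python
# Sketch of the BG (LM) test via statsmodels, assuming `results` is a fitted OLS
# results object and p is the chosen autoregressive order.
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

def bg_test(results, p=2, alpha=0.05):
    lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(results, nlags=p)
    reject = lm_pvalue < alpha   # reject H0: rho_1 = ... = rho_p = 0 if the p value is small
    return lm_stat, lm_pvalue, reject
```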