This unit introduces the notion of large-sample properties. These are properties that
estimators possess as the sample size hypothetically increases to infinity.
Objectives
Students are introduced to the optimal properties that estimators possess in large
samples.
Students appreciate the importance of large samples in regression analysis.
By assumption the least squares estimators have some optimal properties in finite samples. In
addition to the BLUE properties, the least squares estimators are also assumed to be normally
distributed. However, it can be shown that when some assumptions are relaxed the estimators
cease to possess some of these desirable properties. For instance, when the error terms are not
normally distributed the least squares estimators cannot be said to be normally distributed
(recall that this assumption is essential for the purposes of hypothesis testing). It can also be
shown that if the explanatory variables are not distributed independently of the error terms,
then the least squares estimators are not unbiased.
As the sample size hypothetically increases to infinity, the least squares estimators possess
certain optimal properties known as large-sample (asymptotic) properties. Below we briefly
discuss some of these properties.
a) Asymptotic Unbiasedness
An estimator θ̂_n is asymptotically unbiased if its bias vanishes as the sample size grows, that is, if

lim_{n→∞} E(θ̂_n) = θ

For example, the sample variance computed with divisor n,

S² = Σ(X_i − X̄)² / n

is biased in finite samples, since E(S²) = σ²(1 − 1/n), but the bias disappears as n grows:

lim_{n→∞} E(S²) = σ²
b) Consistency
An estimator θ̂ is consistent if it converges in probability to the true parameter value as the
sample size increases; formally, for every δ > 0,

lim_{n→∞} P{|θ̂ − θ| < δ} = 1

which is written compactly as plim_{n→∞} θ̂ = θ. (A short simulation illustrating these
properties is given after this list.)
c) Asymptotic Efficiency
An estimator is asymptotically efficient if it is consistent and no other consistent estimator has
a smaller asymptotic variance.
d) Asymptotic Normality
An estimator is asymptotically normal if its sampling distribution approaches the normal
distribution as the sample size increases.
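The following Stata sketch makes these ideas concrete. It is a minimal simulation, with a
hypothetical program name (biasvar) and standard normal data so that the true variance is
σ² = 1: for each sample size n it computes the divisor-n sample variance S² over 500
replications, whose average approaches 1 as n grows, illustrating asymptotic unbiasedness.

capture program drop biasvar
program define biasvar, rclass
    args n
    * draw a fresh N(0,1) sample of size n
    drop _all
    set obs `n'
    gen x = rnormal(0, 1)
    quietly summarize x
    * divisor-n variance: (n-1)/n times Stata's divisor-(n-1) variance r(Var)
    return scalar s2 = r(Var) * (`n' - 1) / `n'
end

set seed 12345
foreach n in 10 100 1000 10000 {
    quietly simulate s2 = r(s2), reps(500) nodots: biasvar `n'
    quietly summarize s2
    display "n = `n'   average S^2 = " %6.4f r(mean)
}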
UNIT 4: AUTOCORRELATION
In this unit we examine in detail one of the violations of the assumptions of the Classical Linear
Regression Model.
Objectives:
Students understand the notion of serial correlation and why violation of the assumption
of no serial correlation has serious implications for hypothesis testing.
Recall that one of the assumptions of the classical linear regression model is that the error
terms are not correlated; that is, the errors are not interdependent. This assumption may be
tenable in cross-sectional data but not in time series data. Recall that the assumption of no
autocorrelation is simply specified as:

E(U_t U_{t+s}) = 0 for s ≠ 0

This means that two errors s periods apart are uncorrelated. As mentioned, this assumption
may be violated in time series data, implying that E(U_t U_{t+s}) ≠ 0 for s ≠ 0. There are
several reasons why autocorrelation may occur:
Inertia. Most economic time series exhibit inertia or sluggishness. Data such as GNP,
price indexes, production and employment exhibit cycles. Starting at the bottom of a
recession, when economic recovery begins, such series start to move upward; momentum
is thus built into them, and it continues until something happens to slow them down.
Therefore, in regressions involving time series data, successive observations are likely
to be interdependent.
Specification Bias: Excluded Variables Case. When a relevant variable is omitted
from the estimated model, its influence may show up as autocorrelation in the error terms.
Specification Bias: Incorrect Functional Form. Sometimes the researcher may
estimate the econometric model using the wrong functional form, for instance
estimating a linear cost function instead of a quadratic cost function.
Cobweb Phenomenon. In markets where supply reacts to price with a lag, as in many
agricultural markets, disturbances in one period carry over into subsequent periods,
producing correlated errors.
Lags. Some time series models require that a lagged regressand be included among the
regressors. Omission of the lagged regressand from the regressors may show up as
autocorrelation in the error terms.
“Manipulation” of Data. Raw data are often smoothed, averaged or interpolated before
publication, and such manipulation can impose a systematic pattern on the disturbances.
Data Transformation. Transforming a model, for example by taking first differences,
can likewise induce autocorrelation in the transformed error terms.
Nature of Autocorrelation
Having studied the causes of autocorrelation, it is important to know the precise form that the
autocorrelation takes. In other words, we want to impose a structure on how the error terms are
related.
The most commonly assumed structure of autocorrelation is the first-order autoregressive
scheme, AR(1). To illustrate this structure, we assume the simple linear regression model

Y_t = β_1 + β_2 X_t + U_t
The first-order autoregressive scheme assumes that the error terms are related as¹:

U_t = ρU_{t−1} + ε_t,  −1 < ρ < 1

where the disturbance ε_t satisfies the standard assumptions:

E(ε_t) = 0
var(ε_t) = σ_ε²
cov(ε_t, ε_{t+s}) = 0 for s ≠ 0

Under this scheme the error term U_t has the following moments:

var(U_t) = E(U_t²) = σ_ε² / (1 − ρ²)
cov(U_t, U_{t+s}) = E(U_t U_{t+s}) = ρ^s σ_ε² / (1 − ρ²)
cor(U_t, U_{t+s}) = ρ^s
Since ρ is a constant between −1 and +1, under the AR(1) scheme the error term U_t is still
homoscedastic. However, U_t is correlated not only with its immediate past value but also with
values several periods in the past.
¹ The error terms may be correlated through higher-order autoregressive schemes, such as
U_t = ρ_1 U_{t−1} + ρ_2 U_{t−2} + ε_t (a second-order autoregressive scheme) or, more generally,
U_t = ρ_1 U_{t−1} + ρ_2 U_{t−2} + … + ρ_n U_{t−n} + ε_t. For expository purposes the first-order
autoregressive scheme will suffice.
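These moments can be verified by repeated substitution. Assuming |ρ| < 1 and that the process
extends into the indefinite past, substituting for U_{t−1}, U_{t−2}, … gives

U_t = ε_t + ρε_{t−1} + ρ²ε_{t−2} + …

and, since the ε's are uncorrelated with common variance σ_ε²,

var(U_t) = σ_ε²(1 + ρ² + ρ⁴ + …) = σ_ε² / (1 − ρ²)

Multiplying U_t by U_{t+s} = ρ^s U_t + (terms in ε_{t+1}, …, ε_{t+s}) and taking expectations
gives cov(U_t, U_{t+s}) = ρ^s σ_ε² / (1 − ρ²), as stated above.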
It must be noted that, because |ρ| < 1, the mean, variance and covariance of U_t are
stationary; that is, they do not change over time. It can also be seen that the value of the
covariance declines as we move further into the past.
Recall that under the assumption of no autocorrelation the variance of β̂_2 is:

var(β̂_2) = σ² / Σx_t²

Under the first-order autoregressive scheme, the standard result (see Gujarati for the
derivation) is:

var(β̂_2)_{AR1} = (σ² / Σx_t²) [1 + 2ρ(Σx_t x_{t−1} / Σx_t²) + 2ρ²(Σx_t x_{t−2} / Σx_t²) + … + 2ρ^{n−1}(x_1 x_n / Σx_t²)]

Hence the variance of β̂_2 under the first-order autoregressive scheme is the no-autocorrelation
variance var(β̂_2) multiplied by a term that depends on ρ and on the sample correlations
between the values taken by the regressor X at various lags. Therefore, under autocorrelation,
the variance of β̂_2 depends on the value of ρ and on these sample correlations.
In the presence of autocorrelation, the least squares estimator is still linear and unbiased.
However, if we obtain the estimator without adjusting for autocorrelation, and instead merely
adjust the variance of the estimator for autocorrelation as shown above, the least squares
estimator computed under such conditions is not efficient; that is, in the class of linear
unbiased estimators, it does not have minimum variance.
To obtain the BLUE when there is autocorrelation in the error terms we resort to the Generalized
Least Squares (GLS) method. The BLUE estimator for the regression model above is given as:

β̂_2^{GLS} = [Σ_{t=2}^{n} (x_t − ρx_{t−1})(y_t − ρy_{t−1})] / [Σ_{t=2}^{n} (x_t − ρx_{t−1})²] + C,
where C is a correction factor that is usually disregarded in practice. Its variance is given as:

var(β̂_2^{GLS}) = σ² / [Σ_{t=2}^{n} (x_t − ρx_{t−1})²] + D,

where D is likewise a correction factor that is usually disregarded in practice.
Note that the Generalized Least Squares method utilizes the information given by the
autocorrelation structure, as summarized by the value of ρ. When ρ = 0, the Generalized Least
Squares method is identical to the Least Squares method.
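In practice, feasible versions of this GLS estimator are built into standard software. As a
minimal sketch, assuming a time series dataset with hypothetical variables y, x and a time
variable year, Stata's prais command fits the AR(1) GLS regression; the corc option requests
the Cochrane-Orcutt transformation, which drops the first observation and thus mirrors the
t = 2, …, n summations in the formula above.

tsset year
prais y x, corc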
It can be shown that the confidence interval of the estimator computed using the variance
var(β̂_2)_{AR1} is much wider than that computed using the variance var(β̂_2^{GLS}). Even as
the sample size increases, the confidence intervals remain much wider; hence the estimator β̂_2
is not asymptotically efficient, even after allowing for autocorrelation. The implication for
hypothesis testing is that we are likely to declare a coefficient statistically insignificant even
though in fact it may be significant.
Disregarding autocorrelation entails that we carry out our estimation and hypothesis testing as
though there were no autocorrelation in the model. In this instance we use the least squares
estimator β̂_2 with variance var(β̂_2) = σ² / Σx_t². A number of errors arise from this omission:
a. The residual variance is likely to underestimate the true σ².
b. As a result, we are likely to overestimate R².
c. Even if σ² is not underestimated, var(β̂_2) may underestimate its true variance under
autocorrelation, var(β̂_2)_{AR1}.
d. Therefore, the usual t and F tests of significance are no longer valid, and if applied, are
likely to give seriously misleading conclusions about the statistical significance of the
estimated regression coefficients.
DETECTING AUTOCORRELATION
a. Graphical Method
Plot the residuals against time. Any systematic pattern in the residuals over time is a good
indication that the model may be suffering from autocorrelation in the error terms.
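As a minimal Stata sketch (the variable names y, x and year are hypothetical), the residuals can
be obtained and plotted against time as follows:

tsset year
regress y x
* store the least squares residuals and plot them over time
predict e, residuals
tsline e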
b. The Durbin-Watson d Test
The most widely used formal test is based on the Durbin-Watson d statistic, computed from the
least squares residuals as:

d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²
The Durbin-Watson test involves two null hypotheses: the hypothesis that there is no positive
autocorrelation and the hypothesis that there is no negative serial correlation. Using the
tabulated lower and upper critical values d_L and d_U, the decision rules are as follows. If the
Durbin-Watson statistic lies in:

0 < d < d_L: reject the null hypothesis of no positive autocorrelation;
d_L ≤ d ≤ d_U: the test is inconclusive;
d_U < d < 4 − d_U: do not reject either null hypothesis (no evidence of autocorrelation);
4 − d_U ≤ d ≤ 4 − d_L: the test is inconclusive;
4 − d_L < d < 4: reject the null hypothesis of no negative autocorrelation.
c. The Breusch-Godfrey (BG) Test
To avoid some of the limitations of the Durbin-Watson d test, the more general Breusch-Godfrey
test may be used. It allows for: i) stochastic regressors, such as the lagged values of the
regressand; ii) higher-order autoregressive schemes; and iii) simple or higher-order moving
averages of white noise error terms. (See Gujarati for details.)
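Both tests are available in Stata after a regression on tsset data. A minimal sketch with
hypothetical variable names follows; the lags(1 2) option asks the Breusch-Godfrey test to
check for first- and second-order serial correlation.

tsset year
regress y x
estat dwatson
estat bgodfrey, lags(1 2)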
REMEDIAL MEASURES
Autocorrelation may be pure or a result of model mis-specification. If the autocorrelation is due
to mis-specification, the remedy is simply to correct the mis-specification. However, if the
autocorrelation is pure, the following remedies are proposed.
The Method of Generalized Differences. To understand this method, we assume a simple
regression model that is plagued by autocorrelation of the first-order autoregressive form;
that is:

Y_t = β_1 + β_2 X_t + U_t    (1)

U_t = ρU_{t−1} + ε_t,  −1 < ρ < 1

The researcher is typically faced with two cases: when ρ is known and when ρ is unknown.

When ρ is known. If equation (1) holds at time t, then it also holds at time (t − 1); hence
equation (2) holds:

Y_{t−1} = β_1 + β_2 X_{t−1} + U_{t−1}    (2)
Multiplying equation (2) by ρ gives:

ρY_{t−1} = ρβ_1 + ρβ_2 X_{t−1} + ρU_{t−1}    (3)

Subtracting (3) from (1) gives:

Y_t − ρY_{t−1} = β_1(1 − ρ) + β_2(X_t − ρX_{t−1}) + (U_t − ρU_{t−1})    (4)

Since U_t − ρU_{t−1} = ε_t, equation (4) can be written as:

Y*_t = β*_1 + β*_2 X*_t + ε_t    (5)

where β*_1 = β_1(1 − ρ), Y*_t = (Y_t − ρY_{t−1}), X*_t = (X_t − ρX_{t−1}), and β*_2 = β_2.
The error term in equation (5) satisfies all the usual OLS assumptions. Least squares estimation
is then applied to the transformed variables Y*_t and X*_t, and the estimators obtained satisfy
all the optimum properties; that is, they are BLUE.
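A minimal Stata sketch of this transformation, assuming ρ is known (the value 0.7, like the
variable names y, x and year, is purely illustrative):

tsset year
scalar rho = 0.7
* generate the quasi-differenced (starred) variables; the first observation is lost
gen ystar = y - rho*L.y
gen xstar = x - rho*L.x
regress ystar xstar
* the intercept estimates beta1* = beta1(1 - rho); recover beta1 as
display _b[_cons] / (1 - rho)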
When ρ is unknown. In practice the value of ρ is rarely known. Econometricians therefore have
to find a way of estimating ρ, or of transforming the model so that knowledge of ρ is not
required. Several approaches are described below.
The First-Difference Method. If ρ = +1, the generalized difference equation reduces to the
first-difference equation (the intercept drops out because β_1(1 − ρ) = 0):

Y_t − Y_{t−1} = β_2(X_t − X_{t−1}) + (U_t − U_{t−1})    (7)

∆Y_t = β_2 ∆X_t + ε_t    (8)

The first-difference transformation may be appropriate if the coefficient of autocorrelation is
very high, say in excess of 0.8, or if the Durbin-Watson d is quite low. Maddala proposed the
rule of thumb: use the first-difference form whenever d < R².
Estimating ρ from the Durbin-Watson d Statistic. ρ can be estimated from the relationship
between ρ and the Durbin-Watson d statistic:

ρ̂ ≈ 1 − d/2

For example, a Durbin-Watson statistic of d = 0.8 suggests ρ̂ ≈ 1 − 0.8/2 = 0.6.

Estimating ρ from the Residuals. Alternatively, ρ can be estimated by regressing the least
squares residuals on their own lagged values:

e_t = ρe_{t−1} + v_t

The estimated coefficient ρ̂ is then used to apply the generalized difference transformation,
yielding a feasible Generalized Least Squares estimator.
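As a sketch with hypothetical variable names, this residual regression can be run in Stata as
follows; the noconstant option reflects the zero mean of the errors.

tsset year
regress y x
predict e, residuals
* regress the residuals on their first lag to estimate rho
regress e L.e, noconstant
display "estimated rho = " _b[L.e]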
Instead of using the feasible Generalized Least Squares procedures discussed above, the least
squares estimators can still be used, but with the standard errors corrected by implementing a
procedure developed by Newey and West. The corrected standard errors are known as
Newey-West standard errors, or heteroscedasticity and autocorrelation consistent (HAC)
standard errors.
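In Stata the Newey-West correction is implemented by the newey command. A minimal sketch
with hypothetical variable names follows; the required lag() option sets the maximum lag used
in the autocorrelation correction, and its choice is left to the researcher.

tsset year
newey y x, lag(2)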
Example
In macroeconomics you learnt that output is a function of labour and capital inputs, and hence
the production function can be specified as:

Y = f(K, L)

The corresponding regression model is:

Y_t = β̂_1 + β̂_2 K_t + β̂_3 L_t + e_t

where K is capital input and L is labour input.
We estimate a simple production function for Zambia using the Stata software and data
obtained from the Penn World Tables.
As a measure of output we utilize the variable Expenditure-side real GDP at chained PPPs (in
mil. 2005 US$) (rgdpe).
As a measure of labour input we use the variable Human capital index, based on years of
schooling (Barro/Lee, 2010) (hc).
As a measure of capital input we utilize the variable Capital stock at current PPPs (in mil.
2005 US$) (ck).
The estimated model is:

rgdpe_t = β̂_1 + β̂_2 hc_t + β̂_3 ck_t + e_t
. reg rgdpe hc ck
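To reproduce the diagnostics discussed below, the residual plot and the Durbin-Watson statistic
can be obtained after this regression as follows (assuming the dataset contains a year variable
identifying time):

tsset year
quietly regress rgdpe hc ck
predict e, residuals
tsline e
estat dwatson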
As usual, after estimating the model we have to test whether the model is statistically significant.
Using the F test and inspecting the p-value of the F statistic, we conclude that at the 5%
level of significance our model is statistically significant.
An inspection of the partial regression coefficients shows that at the 5% level of significance
all our partial regression coefficients are statistically significant.
However, before we proceed to the interpretation of the model, we carry out diagnostic tests.
The first test we carry out is a test to determine whether the error terms are serially correlated.
Recall that there are a number of tests that we can employ, but as a first approximation, we plot
the residuals (obtained from the regression) against time. Such a plot yields the graph below:
[Figure: least squares residuals plotted against time; vertical axis "Residuals", ranging from about −5000 to 10000]
The graph shows that the residuals display a systematic pattern, and hence there is strong
evidence that the error terms may be serially correlated, or perhaps our model may be
mis-specified. To determine whether there is serial correlation we implement the Durbin-Watson
test and obtain the test statistic.
To make use of the Durbin-Watson statistic in deciding whether the error terms are serially
correlated of order 1, we have to determine the critical values (I leave this to the students to do).
After determining the critical values, the test statistic indicates that the error terms are
negatively serially correlated.
Hence we have to find a remedy to cure the serial correlation in the error terms. To do so, we
re-estimate the regression and obtain robust standard errors. The resulting model is given as:
. reg rgdpe hc ck, vce(robust)
[Stata output: coefficient table with robust standard errors omitted]
After estimating the model, we use the Durbin-Watson test to determine whether the error terms
are still serially correlated. The Durbin-Watson test statistic obtained indicates that there is
still serial correlation in our error terms. Most likely the serial correlation being detected is
due to model mis-specification and not pure serial correlation.