
Econometrics II

The Generalized Regression Model and Autocorrelation


A refresher on the multiple regression model
The Generalized Regression Model and Autocorrelation
OLS Properties under Autocorrelation and/or
Heteroskedasticity
Autocorrelation
Estimation of the asymptotic covariance matrix of the OLS
estimator
Inference using a HAC estimator
Efficient Estimation by Generalized Least Squares (GLS)
Feasible GLS
Dynamically Complete Models
Misspecification

1 / 68

A refresher on the multiple regression model

You need to review the following assumptions on the Multiple Regression model (see the slides of Econometria I):
Finite Samples

Assumption (FS1 - Linearity)

y_i = β_1 + β_2 x_{i2} + … + β_K x_{iK} + ε_i. The model specifies a linear relationship between y and x_1, …, x_K (the model is linear in the parameters).

Assumption (FS2 - Full column rank)

There is no exact linear relationship among any of the independent variables in the model, rank(X) = K.

2 / 68
A refresher on the multiple regression model

Assumption (FS3 - Exogeneity of the independent variables - Strict Exogeneity)

E(ε_i | x_{j1}, …, x_{jK}) = 0, ∀i, j ⟺ E(ε_i | X) = 0, ∀i ⟺ E(ε | X) = 0.

Remark: Recall that strict exogeneity rules out models with lagged dependent variables as regressors.

Assumption (FS4 - Homoskedasticity and no autocorrelation)

Var(ε_i | X) = E(ε_i² | X) = σ² > 0, ∀i;  Cov(ε_i, ε_j | X) = 0, ∀i, j, i ≠ j.

Assumption (FS5 - Normal Distribution)

ε_i | X ~ N(0, σ²).

3 / 68

A refresher on the multiple regression model

Large Sample (n → ∞)

Assumption (LS1 - Linearity, S&WD)

The model is linear, y_i = x_i'β + ε_i, and {(y_i, x_i)} is jointly stationary and weakly dependent.

Assumption (LS2 - Rank Condition)

X'X/n = (1/n) ∑_{i=1}^n x_i x_i' →_p E(x_i x_i') = Q, and Q is nonsingular.

Assumption (LS3 - Predetermined regressors)

All the regressors are predetermined, i.e. they are orthogonal to the contemporaneous error term: E(x_{ik} ε_i) = 0, ∀i, k.

4 / 68
A refresher on the multiple regression model

Assumption (LS4 - {x_i ε_i} is an MDS with Finite Second Moments)

{w_i}, where w_i := x_i ε_i, is a martingale difference sequence (so, a fortiori, E(x_i ε_i) = 0). The K × K matrix of cross moments, E(ε_i² x_i x_i'), is nonsingular.

Remark: Sometimes, instead of E(x_i ε_i) = 0, the stronger condition E(ε_i | x_i) = 0 is required, which, in the time-series context, is known as contemporaneous exogeneity of the regressors x_i.

Assumption (LS5 - Conditional Homoskedasticity)

Var(ε_i | x_i) = σ², ∀i.

5 / 68

The Generalized Regression Model and Autocorrelation
Introduction

The generalized linear regression model is

y = Xβ + ε,
E(ε | X) = 0,
E(εε' | X) = σ²Ω = Σ,  where Ω ≠ I.

6 / 68
The Generalized Regression Model and
Autocorrelation
Introduction

Homoskedasticity and no Autocorrelation (Spherical Disturbances):

\[
E(\varepsilon\varepsilon'|X) =
\begin{bmatrix}
E(\varepsilon_1^2|X) & E(\varepsilon_1\varepsilon_2|X) & \cdots & E(\varepsilon_1\varepsilon_n|X) \\
E(\varepsilon_2\varepsilon_1|X) & E(\varepsilon_2^2|X) & \cdots & E(\varepsilon_2\varepsilon_n|X) \\
\vdots & \vdots & \ddots & \vdots \\
E(\varepsilon_n\varepsilon_1|X) & E(\varepsilon_n\varepsilon_2|X) & \cdots & E(\varepsilon_n^2|X)
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \sigma^2 I
\]

7 / 68

The Generalized Regression Model and Autocorrelation
Introduction

Heteroskedasticity and no Autocorrelation:

\[
E(\varepsilon\varepsilon'|X) =
\begin{bmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{bmatrix}
= \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)
\]

8 / 68
The Generalized Regression Model and
Autocorrelation
Introduction

Autocorrelation (and Homoskedasticity):

\[
E(\varepsilon\varepsilon'|X) = \sigma^2
\begin{bmatrix}
1 & \omega_{12} & \cdots & \omega_{1n} \\
\omega_{12} & 1 & \cdots & \omega_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\omega_{1n} & \omega_{2n} & \cdots & 1
\end{bmatrix}
\]

(at least one ω_{ij} ≠ 0, i, j = 1, 2, …, n)

Remark: Autocorrelation is also known as serial correlation.

Autocorrelation and Heteroskedasticity:

\[
E(\varepsilon\varepsilon'|X) = \sigma^2
\begin{bmatrix}
\omega_{11} & \omega_{12} & \cdots & \omega_{1n} \\
\omega_{12} & \omega_{22} & \cdots & \omega_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\omega_{1n} & \omega_{2n} & \cdots & \omega_{nn}
\end{bmatrix}
\]

9 / 68

OLS Properties under Autocorrelation and/or Heteroskedasticity
Finite-Sample Properties of OLS

Assumptions FS1-FS3 may hold under Autocorrelation and/or Heteroskedasticity, so the OLS estimator is unbiased:

E(b) = E(b | X) = β.

However, it can be proved [board] that

Var(b | X) = (X'X)^{-1} X' (σ²Ω) X (X'X)^{-1},
Var(b) = E[ (X'X)^{-1} X' (σ²Ω) X (X'X)^{-1} ].

Additionally, if Assumption FS5 holds (the disturbances are normally distributed), then

b | X ~ N( β, (X'X)^{-1} X' (σ²Ω) X (X'X)^{-1} )

and statistical inference can be carried out from this result.


10 / 68
OLS Properties under Autocorrelation and/or
Heteroskedasticity
Asymptotic Properties of OLS

Assumptions LS1-LS3 may hold under Autocorrelation and/or Heteroskedasticity, so the OLS estimator is consistent:

b →_p β.

However, usual inference is not valid. To see why, consider the following. We know that b − β = (X'X)^{-1} X'ε. Hence,

√n (b − β) = (X'X/n)^{-1} (1/√n) X'ε = ( (1/n) ∑_i x_i x_i' )^{-1} (1/√n) ∑_i x_i ε_i.

Under "some regularity conditions" we have

( (1/n) ∑_i x_i x_i' )^{-1} →_p Q^{-1},  where Q := E(x_i x_i'),

and

(1/√n) ∑_i x_i ε_i →_d N(0, S),  where S := AVar( (1/√n) ∑_i x_i ε_i ).

Hence, √n (b − β) →_d N(0, Q^{-1} S Q^{-1}).

11 / 68

OLS Properties under Autocorrelation and/or Heteroskedasticity

Remarks:
The Gauss-Markov Theorem (based on Assumptions FS1-FS4) no longer holds for the OLS estimator, because FS4 does not hold. The BLUE is some other estimator.
However, the OLS estimator b is unbiased and can still be used even if FS4 does not hold.
Because the variance of the least squares estimator is not σ²(X'X)^{-1}, statistical inference based on σ²(X'X)^{-1} is incorrect. The usual t-ratio is not distributed as the t distribution. The same comment applies to the F-test.

12 / 68
OLS Properties under Autocorrelation and/or
Heteroskedasticity
Properties of OLS

Remarks:
It can be proved that s² is a biased estimator of σ²; however, s² is consistent for σ² under certain conditions.
Therefore s²(X'X)^{-1} is (completely) inadequate to estimate Var(b | X). There is usually no way to know whether σ²(X'X)^{-1} is larger or smaller than the true variance of b.
If Ω is known we may develop the theory under Assumptions FS1-FS3 and FS5. Otherwise we need Assumptions LS1-LS4 to estimate Ω through a consistent estimator.

13 / 68

Heteroskedasticity

See ECONOMETRIA I

14 / 68
Autocorrelation

Because the issue of serial correlation arises almost always in time-series models, we use the subscript "t" instead of "i" in this section.
We consider now the case

\[
E(\varepsilon\varepsilon'|X) = \sigma^2
\begin{bmatrix}
1 & \omega_{12} & \cdots & \omega_{1n} \\
\omega_{12} & 1 & \cdots & \omega_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\omega_{1n} & \omega_{2n} & \cdots & 1
\end{bmatrix}
\]

or

Cov(ε_t, ε_s | X) ≠ 0 for some t ≠ s.

In most cases of interest Cov(ε_t, ε_s | X) = Cov(ε_t, ε_s). To simplify, we assume that

Cov(ε_t, ε_s | X) = Cov(ε_t, ε_s).

Note that Cov(ε_t, ε_s) = E(ε_t ε_s) since E(ε_t) = 0.

15 / 68

Autocorrelation

Example: Consider

y_t = β_1 + β_2 x_{t2} + ε_t,
ε_t = ρ ε_{t-1} + u_t,  |ρ| < 1,

where {u_t} is a sequence of i.i.d. r.v. independent of x_{t2} with E(u_t) = 0, Var(u_t) = σ_u², and E(ε_{t-k} u_t) = 0 ∀k ∈ ℕ. We say that {ε_t} follows an AR(1) [autoregressive process of order 1]. We have:

Var(ε_t | X) = … = σ_u² / (1 − ρ²),   E(ε_t ε_{t-j} | X) = … = ρ^j σ_u² / (1 − ρ²),  j ≥ 0,

\[
E(\varepsilon\varepsilon'|X) = E(\varepsilon\varepsilon') = \frac{\sigma_u^2}{1-\rho^2}
\begin{bmatrix}
1 & \rho & \cdots & \rho^{n-1} \\
\rho & 1 & \cdots & \rho^{n-2} \\
\vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \cdots & 1
\end{bmatrix}.
\]

16 / 68
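To see what such an AR(1) error looks like in practice, here is a small simulation sketch in Stata (the sample size, seed and ρ = 0.95 are illustrative values, not taken from the slides):

* Simulate eps_t = 0.95*eps_{t-1} + u_t with i.i.d. u_t (illustrative values)
clear
set seed 12345
set obs 500
gen t = _n
tsset t
gen u = rnormal(0, 1)
gen eps = u/sqrt(1 - 0.95^2) in 1      // stationary start: Var(eps_1) = sigma_u^2/(1 - rho^2)
replace eps = 0.95*L.eps + u if t > 1  // replace runs through the observations in order
corrgram eps, lags(10)                 // sample autocorrelations decay roughly like 0.95^j

The sample autocorrelation at lag j should be close to ρ^j = 0.95^j, matching the covariance matrix above.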
Autocorrelation

Example: the previous example with ρ = 0.95 (plot of the simulated errors omitted). It is clear that the assumption Cov(ε_i, ε_j) = 0 fails.

17 / 68

Autocorrelation

Another example where E(ε_t ε_{t-j}) ≠ 0 arises under misspecification (see below); the corresponding residual plot is omitted.

18 / 68
Autocorrelation

Autocorrelated errors could arise for several reasons:


Prolonged influence of shocks. In time series data, random shocks
(disturbances) have effects that often persist over more than one
time period. An earthquake, flood, strike, pandemic, or war, for
example, will probably affect the economy’s operation in periods
following the period in which it occurs.
Inertia. Owing to inertia or psychological conditioning, past
actions often have a strong effect on current actions, so that a
positive disturbance in one period is likely to influence activity
in succeeding periods.

19 / 68

Autocorrelation

Autocorrelated errors could arise for several reasons (cont.):


Spatial autocorrelation. In regional cross-section data, a random
shock affecting economic activity in one region may cause
economic activity in an adjacent region to change because of
close economic ties between the regions.
Data manipulation. Published data often undergo interpolation or
smoothing, procedures that average true disturbances over
successive time periods.
Misspecification. An omitted relevant independent variable that is
autocorrelated will make the disturbance (associated with the
misspecified model) autocorrelated. An incorrect functional
form or a misspecification of the equation’s dynamics could do
the same. In these instances, the appropriate procedure is to
correct the misspecification.

20 / 68
Testing for Autocorrelation
We assume a more general structure of autocorrelation:

ε_t = ρ_1 ε_{t-1} + … + ρ_p ε_{t-p} + u_t,

where {u_t} is a white noise process and ρ_1, …, ρ_p are such that {ε_t} is stationary (we will see later under what conditions {ε_t} is stationary, as a function of the ρ_i).

Testing with Strictly Exogenous Regressors
Under strictly exogenous regressors (FS3),

E(ε_t | X) = 0, t = 1, 2, …, n,

it can be shown that the hypothesis H_0: ρ_1 = ρ_2 = … = ρ_p = 0 can be tested through the following auxiliary regression:

regress e_t on e_{t-1}, …, e_{t-p}.   (1)

(without intercept). Under the null,

LM = nR² →_d χ²_(p),

where R² refers to the auxiliary regression (1).
21 / 68

Testing for Autocorrelation

Example 1: The general fertility rate (gfr) is the number of children


born to every 1,000 women of childbearing age. For the years 1913
through 1984, the equation,

gfr_t = β_1 + β_2 pe_t + β_3 pe_{t-1} + β_4 pe_{t-2} + β_5 ww2_t + β_6 pill_t + ε_t

explains gfr in terms of the average real dollar value of the personal
tax exemption (pe) for periods t, t−1 and t−2, and two dummy
variables. The variable ww2 takes on the value unity during the years
1941 through 1945, when the United States was involved in World
War II. The variable pill is unity from 1963 onward, when the birth
control pill was made available for contraception.

22 / 68
Testing for Autocorrelation

Example 1:
The above model was estimated in Stata.

23 / 68

Testing for Autocorrelation

Example 1 (cont.):

Test H0 : ρ1 = ρ2 = ρ3 = 0 at 5% level.

24 / 68
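A sketch of how this test could be run in Stata, assuming strictly exogenous regressors and p = 3. The data-loading line is an assumption (the Wooldridge fertil3 dataset with a year variable); the rest follows the auxiliary-regression recipe of the previous slides:

* Example 1: LM test of H0: rho1 = rho2 = rho3 = 0 (strict exogeneity assumed)
use http://fmwww.bc.edu/ec-p/data/wooldridge/fertil3, clear   // assumed data source
tsset year

reg gfr pe L.pe L2.pe ww2 pill                  // original regression
predict ehat, resid                             // OLS residuals e_t

reg ehat L.ehat L2.ehat L3.ehat, noconstant     // auxiliary regression (1), no intercept
display "LM = n*R2 = " e(N)*e(r2)
display "5% critical value, chi2(3) = " invchi2tail(3, .05)

Reject H_0 at the 5% level whenever nR² exceeds the χ²(3) critical value (about 7.81).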
Testing for Autocorrelation

Testing for autocorrelation without requiring strict exogeneity of the regressors (Breusch-Godfrey test)
If the regressors are not strictly exogenous, for example, if there is a lagged endogenous variable (y_{t-1}, or y_{t-2}, etc.) as explanatory variable, the test presented in the previous slide is not valid. The reason is somewhat technical and is explained in Hayashi's book, pp. 144-146.
The trick consists in removing the effect of x_t in the regression of e_t on e_{t-1}, …, e_{t-p} by considering now the regression

e_t on x_t, e_{t-1}, …, e_{t-p}   (2)

and then calculating the LM statistic for the hypothesis that the p coefficients of e_{t-1}, …, e_{t-p} are all zero. This test is still valid when the regressors are strictly exogenous.

25 / 68

Testing for Autocorrelation

Given

e_t = θ_1 + θ_2 x_{t2} + … + θ_K x_{tK} + ρ_1 e_{t-1} + … + ρ_p e_{t-p} + error_t,

the null hypothesis can be formulated as

H_0: ρ_1 = ρ_2 = … = ρ_p = 0.

Under the null,

LM = nR² →_d χ²_(p),

where R² refers to the auxiliary regression (2).

26 / 68
Testing for Autocorrelation

To sum up, to test H_0: ρ_1 = ρ_2 = … = ρ_p = 0:

Select p.
If the x_t are strictly exogenous:
  Run the regression of e_t on e_{t-1}, …, e_{t-p}.
  Test that all coefficients are zero using the χ²_(p) test.
If the x_t are not strictly exogenous:
  Run the regression of e_t on x_t and e_{t-1}, …, e_{t-p}.
  Test that all coefficients associated with the e_{t-j} are zero using the χ²_(p) test.

27 / 68
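Continuing the Stata sketch above, the Breusch-Godfrey version (needed once the lagged dependent variable of Example 2 is included) can be obtained either with the built-in estat bgodfrey or by running regression (2) by hand; the two may differ slightly because of how the first p observations are handled:

* Breusch-Godfrey test with p = 3 for the model that includes L.gfr
reg gfr pe L.pe L2.pe ww2 pill L.gfr
estat bgodfrey, lags(3)                 // built-in LM test

* "By hand": auxiliary regression (2), residuals on x_t and lagged residuals
predict ehat2, resid
reg ehat2 pe L.pe L2.pe ww2 pill L.gfr L.ehat2 L2.ehat2 L3.ehat2
display "LM = n*R2 = " e(N)*e(r2)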

Testing for Autocorrelation


Example 2: We consider now the same data as in Example 1, but the following regression model:

gfr_t = β_1 + β_2 pe_t + β_3 pe_{t-1} + β_4 pe_{t-2} + β_5 ww2_t + β_6 pill_t + β_7 gfr_{t-1} + ε_t

Estimating the model in Stata yields:
. reg gfr pe L.pe L2.pe ww2 pill L.gfr

Source SS df MS Number of obs = 70


F(6, 63) = 312.94
Model 25148.6244 6 4191.4374 Prob > F = 0.0000
Residual 843.809324 63 13.3937988 R-squared = 0.9675
Adj R-squared = 0.9644
Total 25992.4337 69 376.701938 Root MSE = 3.6598

gfr Coefficient Std. err. t P>|t| [95% conf. interval]

pe
--. -.067659 .032529 -2.08 0.042 -.132663 -.002655
L1. .0425119 .0399539 1.06 0.291 -.0373297 .1223535
L2. .0479673 .0323838 1.48 0.144 -.0167466 .1126812

ww2 3.405305 2.879548 1.18 0.241 -2.349012 9.159623


pill -5.441894 1.333312 -4.08 0.000 -8.106306 -2.777482

gfr
L1. .9039273 .0299643 30.17 0.000 .8440484 .9638062

_cons 7.597187 3.044825 2.50 0.015 1.512589 13.68178

. predict ehat, res


(2 missing values generated)
28 / 68
Testing for Autocorrelation
Example 2:
. reg ehat pe L.pe L2.pe ww2 pill L.gfr L.ehat L2.ehat L3.ehat

Source SS df MS Number of obs = 67


F(9, 57) = 0.59
Model 69.6441662 9 7.73824069 Prob > F = 0.7990
Residual 746.272602 57 13.0925018 R-squared = 0.0854
Adj R-squared = -0.0591
Total 815.916768 66 12.3623753 Root MSE = 3.6184

ehat Coefficient Std. err. t P>|t| [95% conf. interval]

pe
--. .0111744 .0329507 0.34 0.736 -.0548084 .0771571
L1. -.0027292 .0400544 -0.07 0.946 -.0829367 .0774783
L2. -.0020058 .0323179 -0.06 0.951 -.0667212 .0627096

ww2 -1.045472 3.05387 -0.34 0.733 -7.160742 5.069798


pill -.5600795 1.401743 -0.40 0.691 -3.367021 2.246862

gfr
L1. -.0237148 .0353775 -0.67 0.505 -.0945571 .0471274

ehat
L1. .1871415 .1381683 1.35 0.181 -.0895358 .4638187
L2. -.2297485 .1326608 -1.73 0.089 -.4953972 .0359002
L3. .1034538 .1363822 0.76 0.451 -.1696468 .3765545

_cons 1.673988 3.444333 0.49 0.629 -5.223169 8.571145

Test H0 : ρ1 = ρ2 = ρ3 = 0 at 5% level.
29 / 68
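A quick way to finish the exercise from the output above (n = 67 and R² = 0.0854 in the auxiliary regression):

display "LM = n*R2 = " 67*0.0854                              // about 5.72
display "5% critical value, chi2(3) = " invchi2tail(3, .05)   // about 7.81

Since 5.72 < 7.81, H_0: ρ_1 = ρ_2 = ρ_3 = 0 is not rejected at the 5% level: once gfr_{t-1} is included, there is no evidence of remaining serial correlation.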

Autocorrelation

If you conclude that the errors are serially correlated you have a few options:
(a) You don't know the form of autocorrelation, so you rely on OLS but use the consistent estimator of the asymptotic covariance matrix of the OLS estimator, Q^{-1} S Q^{-1}.
(b) You know (at least approximately) the form of autocorrelation and so you use a feasible GLS estimator (requires strict exogeneity of the regressors).
(c) You are concerned only with the dynamic specification of the model and with forecasting. You may try to convert your model into a dynamically complete model.
(d) Your model may be misspecified: you respecify the model and the autocorrelation disappears.

30 / 68
Estimation of the asymptotic covariance matrix of the
OLS estimator

Assumptions LS1-LS3 may hold under serial correlation, so the OLS estimator may be consistent even if the error is autocorrelated. However, usual inference is not valid because

√n (b − β) →_d N(0, Q^{-1} S Q^{-1}),

where

Q := E(x_i x_i'),
S := AVar( X'ε/√n ) = lim_{n→∞} (1/n) Var( ∑_{i=1}^n x_i ε_i ).

31 / 68

Estimation of the asymptotic covariance matrix of the OLS estimator

Remarks:
When the regressors include a constant (true in virtually all known applications), Assumption LS4 implies that the error term is a scalar martingale difference sequence, so if the error is found to be serially correlated (or autocorrelated), that is an indication of a failure of Assumption LS4.
We have Cov(x_t ε_t, x_{t-j} ε_{t-j}) ≠ 0. In fact,

Cov(x_t ε_t, x_{t-j} ε_{t-j}) = E( x_t ε_t x_{t-j}' ε_{t-j} )
  = E[ E( x_t ε_t x_{t-j}' ε_{t-j} | x_{t-j}, x_t ) ]
  = E[ x_t x_{t-j}' E( ε_t ε_{t-j} | x_{t-j}, x_t ) ].

Therefore E(ε_t ε_{t-j} | x_{t-j}, x_t) ≠ 0 ⟹ Cov(x_t ε_t, x_{t-j} ε_{t-j}) ≠ 0.
32 / 68
Estimation of the asymptotic covariance matrix of the
OLS estimator

Summary of what we know:

If errors are homoskedastic and not autocorrelated, then

S = Var(x_i ε_i) = σ² E(x_i x_i'),    Ŝ = (s²/n) ∑_{t=1}^n x_t x_t'.

If errors are heteroskedastic and not autocorrelated, then

S = Var(x_i ε_i) = E(ε_i² x_i x_i'),    Ŝ = (1/n) ∑_{t=1}^n e_t² x_t x_t'.
33 / 68
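For reference, these two cases correspond to familiar Stata options (a sketch, using the Example 1 variable names):

* Homoskedastic, non-autocorrelated errors: classical OLS standard errors
reg gfr pe L.pe L2.pe ww2 pill

* Heteroskedastic, non-autocorrelated errors: White/robust standard errors,
* based on S_hat = (1/n) * sum_t e_t^2 x_t x_t'
reg gfr pe L.pe L2.pe ww2 pill, vce(robust)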

Estimation of the asymptotic covariance matrix of the OLS estimator

If the errors are autocorrelated:

S = AVar( X'ε/√n ) = … = lim (1/n) Var( ∑_{i=1}^n x_i ε_i )
  = lim (1/n) ∑_{t=1}^n Var(x_t ε_t)
    + lim (1/n) ∑_{j=1}^{n-1} ∑_{t=j+1}^n [ E(x_t ε_t x_{t-j}' ε_{t-j}) + E(x_{t-j} ε_{t-j} x_t' ε_t) ]
  = Var(x_t ε_t) + lim (1/n) ∑_{j=1}^{n-1} ∑_{t=j+1}^n [ E(ε_t ε_{t-j} x_t x_{t-j}') + E(ε_{t-j} ε_t x_{t-j} x_t') ].

Hence, if the errors are autocorrelated we cannot use (s²/n) ∑_{t=1}^n x_t x_t' or (1/n) ∑_{t=1}^n e_t² x_t x_t' (robust to conditional heteroskedasticity) as a consistent estimator of S.
34 / 68
Estimation of the asymptotic covariance matrix of the
OLS estimator

For the sake of generality, assume that we also have a problem of heteroskedasticity.
Given

S = E(ε_t² x_t x_t') + lim (1/n) ∑_{j=1}^{n-1} ∑_{t=j+1}^n [ E(ε_t ε_{t-j} x_t x_{t-j}') + E(ε_{t-j} ε_t x_{t-j} x_t') ],

a possible estimator of S based on the analogy principle would be

(1/n) ∑_{t=1}^n e_t² x_t x_t' + (1/n) ∑_{j=1}^{n-1} ∑_{t=j+1}^n [ e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t' ].
35 / 68

Estimation of the asymptotic covariance matrix of the OLS estimator

This estimator has a serious problem. To estimate (1/n) ∑_{t=j+1}^n E(ε_t ε_{t-j} x_t x_{t-j}') and (1/n) ∑_{t=j+1}^n E(ε_{t-j} ε_t x_{t-j} x_t') it uses (1/n) ∑_{t=j+1}^n e_t e_{t-j} x_t x_{t-j}' and (1/n) ∑_{t=j+1}^n e_{t-j} e_t x_{t-j} x_t' respectively, and these estimators are not consistent if j is large.
A possible solution is to use

(1/n) ∑_{t=1}^n e_t² x_t x_t' + (1/n) ∑_{j=1}^{L_n} ∑_{t=j+1}^n [ e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t' ],

where L_n goes to infinity with n, but L_n < n − 1.

Autocorrelations at lags larger than L_n are ignored.
A major problem with this estimator is that it is not guaranteed to be positive semi-definite and hence may fail to be a well-defined variance-covariance matrix.

36 / 68
Estimation of the asymptotic covariance matrix of the
OLS estimator

Newey and West show that, with a suitable weighting function ω(j), the estimator below is consistent and positive semi-definite:

Ŝ_HAC = (1/n) ∑_{t=1}^n e_t² x_t x_t' + (1/n) ∑_{j=1}^{L_n} ∑_{t=j+1}^n ω(j) [ e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t' ],

where the weighting function ω(j) is

ω(j) = 1 − j/(L_n + 1).

The maximum lag L_n must be determined in advance. Newey and West require L_n to be chosen such that lim_{n→+∞} L_n = +∞ and lim_{n→+∞} n^{-1/4} L_n = 0.

37 / 68

Estimation of the asymptotic covariance matrix of the OLS estimator

Estimators of this type are known as HAC (heteroskedasticity- and autocorrelation-consistent) covariance matrix estimators and are valid when both conditional heteroskedasticity and serial correlation are present but of an unknown form. The term HAC estimator also applies to AVar-hat( √n (b − β) ) = Q̂^{-1} Ŝ_HAC Q̂^{-1}, where Q̂ = ∑_i x_i x_i' / n.
Remark: There are alternative estimators to the Newey-West estimator, although this is the most popular estimator in empirical work.

38 / 68
Estimation of the asymptotic covariance matrix of the
OLS estimator
Example: For x_t = 1, n = 9, L_n = 3 we have

∑_{j=1}^{L_n} ∑_{t=j+1}^n ω(j) [ e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t' ]
  = ∑_{j=1}^{L_n} ∑_{t=j+1}^n ω(j) 2 e_t e_{t-j}
  = ω(1) (2e_1e_2 + 2e_2e_3 + 2e_3e_4 + 2e_4e_5 + 2e_5e_6 + 2e_6e_7 + 2e_7e_8 + 2e_8e_9)
  + ω(2) (2e_1e_3 + 2e_2e_4 + 2e_3e_5 + 2e_4e_6 + 2e_5e_7 + 2e_6e_8 + 2e_7e_9)
  + ω(3) (2e_1e_4 + 2e_2e_5 + 2e_3e_6 + 2e_4e_7 + 2e_5e_8 + 2e_6e_9),

with

ω(1) = 1 − 1/4 = 0.75,
ω(2) = 1 − 2/4 = 0.50,
ω(3) = 1 − 3/4 = 0.25.
39 / 68

Estimation of the asymptotic covariance matrix of the OLS estimator

Selecting L_n is an empirical question. Some authors recommend

L_n = int( 4 (n/100)^{2/9} ),   (int(x): integer part of x).

Clearly this sequence satisfies lim_{n→+∞} L_n = +∞ and lim_{n→+∞} n^{-1/4} L_n = 0.
There are more involved rules of thumb for how to choose L_n.

40 / 68
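For Example 1 (n = 70) this rule of thumb can be evaluated directly in Stata (int() is Stata's integer-part function):

display "L_n = " int(4*(70/100)^(2/9))    // = 3, the lag length used with newey below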
Inference using a HAC estimator

Recall that

√n (b − β) →_d N(0, Q^{-1} S Q^{-1}).

A consistent estimator for AVar( √n (b − β) ) = Q^{-1} S Q^{-1} is

AVar-hat( √n (b − β) ) = Q̂^{-1} Ŝ_HAC Q̂^{-1}.

Under H_0: β_k = β_k^0 we have

t_k^0 = (b_k − β_k^0) / ( σ̂_{b_k} / √n ) →_d N(0, 1),

where σ̂²_{b_k} = AVar-hat( √n (b_k − β_k^0) ) = [ AVar-hat( √n (b − β) ) ]_{kk}.

41 / 68

Inference using a HAC estimator

Under H_0: Rβ = r with rank(R) = p, we have (Wald test)

W = n (Rb − r)' [ R AVar-hat( √n (b − β) ) R' ]^{-1} (Rb − r) →_d χ²_(p).

This result is extremely important and useful. It implies that, without actually specifying the type of autocorrelation and heteroskedasticity, we can still make appropriate inferences based on the results of least squares. This implication is especially useful if we are unsure of the precise nature of the autocorrelation and heteroskedasticity (which is probably most of the time).

42 / 68
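In Stata, a Wald test of this kind based on the HAC covariance matrix can be obtained with test after newey; a sketch using the Example 1 variable names (Stata reports an F-statistic version of the Wald test):

* Joint test that the coefficients on pe, pe_{t-1} and pe_{t-2} are all zero,
* using the Newey-West (HAC) covariance matrix
newey gfr pe L.pe L2.pe ww2 pill, lag(3)
test pe L.pe L2.pe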
Inference using a HAC estimator

Example 1 (cont.): To compute the standard errors in Stata using the HAC estimator you should use the command newey.

. newey gfr pe L.pe L2.pe ww2 pill, lag(3)

Regression with Newey–West standard errors Number of obs = 70


Maximum lag = 3 F( 5, 64) = 14.33
Prob > F = 0.0000

Newey–West
gfr Coefficient std. err. t P>|t| [95% conf. interval]

pe
--. .0726719 .0821657 0.88 0.380 -.0914731 .2368168
L1. -.0057796 .0745476 -0.08 0.938 -.1547056 .1431465
L2. .0338268 .0797214 0.42 0.673 -.125435 .1930886

ww2 -22.1265 7.930112 -2.79 0.007 -37.96872 -6.284283


pill -31.30499 5.540035 -5.65 0.000 -42.37248 -20.2375
_cons 95.8705 7.901181 12.13 0.000 80.08607 111.6549

43 / 68

Efficient Estimation by Generalized Least Squares (GLS)

There are many forms of autocorrelation and each one leads to a different structure for the error covariance matrix Σ. The most popular form is known as the first-order autoregressive process. In this case the error term in

y_t = x_t'β + ε_t

is assumed to follow the AR(1) model

ε_t = ρ ε_{t-1} + u_t,  |ρ| < 1,

where {u_t} satisfies E(u_t) = 0, Var(u_t) = σ_u² for all t, and E(u_t u_j) = 0 for t ≠ j.
We assume in this section that Σ is known. For this reason we may develop the relevant theory under Assumptions FS1-FS3.

44 / 68
Efficient Estimation by Generalized Least Squares (GLS)
Derivation of GLS

As in the heteroskedastic case, to obtain the GLS estimator we need to find a full-rank n × n matrix P such that

Py = PXβ + Pε
y* = X*β + ε*

and

E(ε*ε*' | X) = E(Pεε'P' | X) = P E(εε' | X) P' = σ² PΩP' = σ² I.

Thus, P is such that

PΩP' = I  ⟺  Ω^{-1} = P'P.

We use this matrix P to obtain y* = Py and X* = PX.

Remark: Note that the decomposition Ω^{-1} = P'P exists as Ω is positive definite, but it is not unique. For the discussion below, however, the choice of P does not matter.
45 / 68

Efficient Estimation by Generalized Least Squares (GLS)

The GLS estimator is the OLS estimator applied to the transformed model y* = X*β + ε*, i.e.

β̂_GLS = (X*'X*)^{-1} X*'y*.

It follows that

E(β̂_GLS) = E(β̂_GLS | X) = β,
Var(β̂_GLS | X) = σ² (X*'X*)^{-1}.

We may express β̂_GLS and Var(β̂_GLS | X) as

β̂_GLS = (X*'X*)^{-1} X*'y* = (X'P'PX)^{-1} X'P'Py = (X'Ω^{-1}X)^{-1} X'Ω^{-1}y   [note: Ω^{-1} = P'P],

and

Var(β̂_GLS | X) = σ² (X*'X*)^{-1} = σ² (X'Ω^{-1}X)^{-1},
Var(β̂_GLS) = E[ σ² (X'Ω^{-1}X)^{-1} ].
46 / 68
Efficient Estimation by Generalized Least Squares (GLS)

Question: how do we obtain Ω?
Consider again the case in which the error term in

y_t = x_t'β + ε_t

follows the AR(1) model

ε_t = ρ ε_{t-1} + u_t,  |ρ| < 1,

where {u_t} satisfies E(u_t) = 0, Var(u_t) = σ_u² for all t, and E(u_t u_j) = 0 for t ≠ j.
In this case the matrix Ω is given by

\[
\Omega = \frac{\sigma_u^2}{1-\rho^2}
\begin{bmatrix}
1 & \rho & \cdots & \rho^{n-1} \\
\rho & 1 & \cdots & \rho^{n-2} \\
\vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \cdots & 1
\end{bmatrix}
\]
47 / 68

Efficient Estimation by Generalized Least Squares (GLS)

It can be proven (this is not straightforward) that

\[
\Omega^{-1} =
\begin{bmatrix}
1 & -\rho & 0 & \cdots & 0 & 0 \\
-\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\
0 & -\rho & 1+\rho^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho \\
0 & 0 & 0 & \cdots & -\rho & 1
\end{bmatrix}.
\]
48 / 68
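A quick numerical check of this inverse, as a Mata sketch. It takes Ω = R/(1 − ρ²) with R_{ij} = ρ^{|i−j|}, i.e. with the σ_u² factor absorbed into σ²; n and ρ are arbitrary illustrative values:

mata
rho = 0.5
n = 5
Omega = J(n, n, .)
for (i=1; i<=n; i++) {
    for (j=1; j<=n; j++) {
        Omega[i, j] = rho^abs(i - j) / (1 - rho^2)
    }
}
Oinv = (1 + rho^2) * I(n)      // candidate inverse: the tridiagonal matrix above
Oinv[1, 1] = 1
Oinv[n, n] = 1
for (i=1; i<n; i++) {
    Oinv[i, i+1] = -rho
    Oinv[i+1, i] = -rho
}
Omega * Oinv                    // prints (approximately) the identity matrix
end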
Efficient Estimation by Generalized Least Squares (GLS)

To obtain the transformed model, assuming that ε_t follows an AR(1):

\[
Py =
\begin{bmatrix}
\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 & 0 \\
-\rho & 1 & 0 & \cdots & 0 & 0 \\
0 & -\rho & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & -\rho & 1
\end{bmatrix}
\begin{bmatrix}
y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n
\end{bmatrix}
=
\begin{bmatrix}
\sqrt{1-\rho^2}\, y_1 \\
y_2 - \rho y_1 \\
y_3 - \rho y_2 \\
\vdots \\
y_{n-1} - \rho y_{n-2} \\
y_n - \rho y_{n-1}
\end{bmatrix}
=
\begin{bmatrix}
\tilde y_1 \\ \tilde y_2 \\ \tilde y_3 \\ \vdots \\ \tilde y_{n-1} \\ \tilde y_n
\end{bmatrix}
\]
49 / 68

Efficient Estimation by Generalized Least Squares (GLS)

\[
PX =
\begin{bmatrix}
\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 & 0 \\
-\rho & 1 & 0 & \cdots & 0 & 0 \\
0 & -\rho & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & -\rho & 1
\end{bmatrix}
\begin{bmatrix}
1 & x_{12} & \cdots & x_{1K} \\
1 & x_{22} & \cdots & x_{2K} \\
1 & x_{32} & \cdots & x_{3K} \\
\vdots & \vdots & & \vdots \\
1 & x_{n-1,2} & \cdots & x_{n-1,K} \\
1 & x_{n2} & \cdots & x_{nK}
\end{bmatrix}
\]
50 / 68
Efficient Estimation by Generalized Least Squares (GLS)

\[
PX =
\begin{bmatrix}
\sqrt{1-\rho^2} & x_{12}\sqrt{1-\rho^2} & \cdots & x_{1K}\sqrt{1-\rho^2} \\
1-\rho & x_{22} - \rho x_{12} & \cdots & x_{2K} - \rho x_{1K} \\
1-\rho & x_{32} - \rho x_{22} & \cdots & x_{3K} - \rho x_{2K} \\
\vdots & \vdots & & \vdots \\
1-\rho & x_{n-1,2} - \rho x_{n-2,2} & \cdots & x_{n-1,K} - \rho x_{n-2,K} \\
1-\rho & x_{n2} - \rho x_{n-1,2} & \cdots & x_{nK} - \rho x_{n-1,K}
\end{bmatrix}
=
\begin{bmatrix}
\tilde x_1' \\ \tilde x_2' \\ \tilde x_3' \\ \vdots \\ \tilde x_{n-1}' \\ \tilde x_n'
\end{bmatrix}
\]

51 / 68

Efficient Estimation by Generalized Least Squares (GLS)

Let us write the transformed model in scalar form:

ỹ_t = x̃_t'β + u_t,

where

ỹ_t = √(1−ρ²) y_1 for t = 1, and ỹ_t = y_t − ρ y_{t-1} for t > 1;
x̃_t' = √(1−ρ²) x_1' for t = 1, and x̃_t' = (x_t − ρ x_{t-1})' for t > 1.

Without the first observation, the transformed model is

y_t − ρ y_{t-1} = (x_t − ρ x_{t-1})'β + u_t,  t > 1.
52 / 68
Efficient Estimation by Generalized Least Squares (GLS)

The GLS estimator is the OLS estimator applied to the transformed model, so the GLS estimator can also be expressed as

β̂_GLS = ( ∑_{t=1}^n x̃_t x̃_t' )^{-1} ∑_{t=1}^n x̃_t ỹ_t,

which is the same as β̂_GLS = (X'Ω^{-1}X)^{-1} X'Ω^{-1}y.

GLS is BLUE if FS1 through FS5 hold in the transformed model.
53 / 68

Feasible GLS

Problem: we don't know ρ, so we need an estimate first.
Run OLS on the original model and then regress the residuals e_t on the lagged residuals e_{t-1} (by OLS). The resulting estimator ρ̂ is the estimator of ρ.
Let

ỹ_t = √(1−ρ̂²) y_1 for t = 1, and ỹ_t = y_t − ρ̂ y_{t-1} for t > 1;
x̃_t' = √(1−ρ̂²) x_1' for t = 1, and x̃_t' = (x_t − ρ̂ x_{t-1})' for t > 1.

The FGLS estimator is

β̂_FGLS = ( ∑_{t=1}^n x̃_t x̃_t' )^{-1} ∑_{t=1}^n x̃_t ỹ_t.

This estimator is also known as the Prais-Winsten estimator.
If we ignore the first observation (t = 1) we obtain the Cochrane-Orcutt estimator (also an FGLS estimator).
These FGLS estimators are not unbiased, but they are consistent under some regularity conditions.
The asymptotic distributions of the Prais-Winsten estimator and the Cochrane-Orcutt estimator are the same.
54 / 68
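In Stata, both estimators are available through the prais command; a sketch with the Example 1 variable names (by default prais iterates the estimate of ρ; the twostep option corresponds to the single-pass procedure described above):

* Prais-Winsten FGLS (keeps the transformed first observation)
prais gfr pe L.pe L2.pe ww2 pill, twostep

* Cochrane-Orcutt FGLS (drops the first observation)
prais gfr pe L.pe L2.pe ww2 pill, corc twostep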
Feasible GLS

t and F tests from the transformed equations are valid (asymptotically).
FGLS is asymptotically more efficient than OLS.
This basic method can be extended to allow for higher-order serial correlation, AR(q), in the error term.

55 / 68

Feasible GLS
Example 1 (cont.): Computation of the Cochrane-Orcutt estimator in
Stata

. reg uhat L.uhat, nocon

Source SS df MS Number of obs = 69


F(1, 68) = 299.10
Model 9927.20746 1 9927.20746 Prob > F = 0.0000
Residual 2256.90876 68 33.1898347 R-squared = 0.8148
Adj R-squared = 0.8120
Total 12184.1162 69 176.581394 Root MSE = 5.7611

uhat Coefficient Std. err. t P>|t| [95% conf. interval]

uhat
L1. .875014 .0505946 17.29 0.000 .774054 .9759739

. gen double gfrx=gfr-.875014*L.gfr


(1 missing value generated)

. gen double pex=pe-.875014*L.pe


(1 missing value generated)

56 / 68
Feasible GLS

. gen double ww2x=ww2-.875014*L.ww2


(1 missing value generated)

. gen double pillx=pill-.875014*L.pill


(1 missing value generated)

. gen double b0=1-.875014

. reg gfrx pex L.pex L2.pex ww2x pillx b0, nocon

Source SS df MS Number of obs = 69


F(6, 63) = 84.51
Model 8798.63881 6 1466.4398 Prob > F = 0.0000
Residual 1093.14881 63 17.3515684 R-squared = 0.8895
Adj R-squared = 0.8790
Total 9891.78762 69 143.359241 Root MSE = 4.1655

gfrx Coefficient Std. err. t P>|t| [95% conf. interval]

pex
--. -.0285285 .0335738 -0.85 0.399 -.0956204 .0385635
L1. -.0082986 .0314497 -0.26 0.793 -.0711457 .0545485
L2. .1265701 .0350112 3.62 0.001 .0566057 .1965345

ww2x 1.388913 4.0764 0.34 0.734 -6.757122 9.534948


pillx -9.087774 3.911138 -2.32 0.023 -16.90356 -1.271988
b0 81.92976 6.147842 13.33 0.000 69.64428 94.21524

57 / 68
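The hand computation above can be checked against the built-in command; the following one-liner should reproduce essentially the same estimates (up to minor differences in how ρ̂ is computed):

prais gfr pe L.pe L2.pe ww2 pill, corc twostep   // two-step Cochrane-Orcutt in one command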

Dynamically Complete Models

If we are concerned only with the dynamic specification of the model and with forecasting, we may try to convert a model with autocorrelation into a dynamically complete model.
Consider

y_t = x_t'β + ε_t

such that E(ε_t | x_t) = 0. This condition, although it guarantees consistency of b (if other conditions are also met), does not preclude autocorrelation. You may try to increase the number of regressors (the elements of x_t) and get a new regression model

y_t = x_t'β + ε_t such that

E(y_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, …) = E(y_t | x_t) = x_t'β.

Written in terms of ε_t:

E(ε_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, …) = 0.

58 / 68
Dynamically Complete Models

Definition
The model y_t = x_t'β + ε_t is dynamically complete (DC) if

E(y_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, …) = E(y_t | x_t), or equivalently
E(ε_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, …) = 0,

holds.
If a model is DC then, once x_t has been controlled for, no lags of either y or x help to explain current y_t.

59 / 68

Dynamically Complete Models

Theorem
If a model is DC then the errors are not correlated. Moreover, {x_t ε_t} is an MDS.
Notice that E(ε_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, …) = 0 can be rewritten as

E(ε_t | F_t) = 0,  where F_t = {ε_{t-1}, ε_{t-2}, …, ε_1, x_t, x_{t-1}, …, x_1}.

60 / 68
Dynamically Complete Models

Example: Consider

y_t = β_1 + β_2 x_{t2} + u_t,   u_t = φ u_{t-1} + ε_t,

where {ε_t} satisfies E(ε_t | x_{t2}, y_{t-1}, x_{t-1,2}, y_{t-2}, …) = 0. Set x_t' = (1, x_{t2}). The above model is not DC since the errors are autocorrelated. Notice that

E(y_t | x_{t2}, y_{t-1}, x_{t-1,2}, y_{t-2}, …) = β_1 + β_2 x_{t2} + φ u_{t-1}

does not coincide with

E(y_t | x_t) = E(y_t | x_{t2}) = β_1 + β_2 x_{t2}.

61 / 68

Dynamically Complete Models


However, it is easy to obtain a DC model. Since

u_t = y_t − (β_1 + β_2 x_{t2})  ⟹  u_{t-1} = y_{t-1} − (β_1 + β_2 x_{t-1,2}),

we have

y_t = β_1 + β_2 x_{t2} + u_t
    = β_1 + β_2 x_{t2} + φ u_{t-1} + ε_t
    = β_1 + β_2 x_{t2} + φ ( y_{t-1} − (β_1 + β_2 x_{t-1,2}) ) + ε_t.

This equation can be written in the form

y_t = γ_1 + γ_2 x_{t2} + γ_3 y_{t-1} + γ_4 x_{t-1,2} + ε_t.

Define x_t' = (x_{t2}, y_{t-1}, x_{t-1,2}). The previous equation is a DC model as

E(y_t | x_t, y_{t-1}, x_{t-1}, …) = E(y_t | x_t) = γ_1 + γ_2 x_{t2} + γ_3 y_{t-1} + γ_4 x_{t-1,2}.
62 / 68
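A simulation sketch of this example in Stata (parameter values are illustrative): the static regression of y_t on x_{t2} is consistent but has autocorrelated errors, while the dynamically complete form does not.

* y_t = 1 + 0.5*x_t + u_t, u_t = 0.7*u_{t-1} + eps_t, with x_t and eps_t i.i.d.
clear
set seed 99
set obs 300
gen t = _n
tsset t
gen x = rnormal()
gen eps = rnormal()
gen u = eps in 1
replace u = 0.7*L.u + eps if t > 1
gen y = 1 + 0.5*x + u

reg y x                        // not DC: Breusch-Godfrey should reject
estat bgodfrey, lags(1)

reg y x L.y L.x                // DC form: gamma_3 near 0.7, gamma_4 near -0.35
estat bgodfrey, lags(1)        // no remaining serial correlation (up to sampling error)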
Misspecification

In many cases the finding of autocorrelation is an indication that the model is misspecified. If this is the case, the most natural route is not to change your estimator (from OLS to GLS) but to change your model.
Types of misspecification that may lead to a finding of autocorrelation in your OLS residuals:
functional form misspecification;
dynamic misspecification;
omitted variables (that are autocorrelated) - see exercise 8 of Exercise Sheet 1.

63 / 68

Misspecification
Functional form misspecification

Functional form misspecification. Suppose that the true linear relationship is

y_t = β_1 + β_2 log t + ε_t.

In the figure (omitted here) we estimate a misspecified functional form, y_t = β_1 + β_2 t + ε_t. The residuals are clearly autocorrelated.
64 / 68
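This effect is easy to reproduce; a small simulation sketch in Stata (the parameter values and sample size are illustrative):

* True model: y_t = 10 + 5*ln(t) + e_t; the fitted model uses a linear trend instead
clear
set seed 2024
set obs 100
gen t = _n
tsset t
gen y = 10 + 5*ln(t) + rnormal(0, 1)

reg y t                        // misspecified functional form
estat bgodfrey, lags(1)        // residuals appear strongly autocorrelated

gen lnt = ln(t)
reg y lnt                      // correct functional form
estat bgodfrey, lags(1)        // the spurious "autocorrelation" disappears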
Misspecification
Dynamic misspecification

Serial correlation might lead to inconsistency if we have lagged dependent variables as regressors, but not always.
Example A:

y_t = β_1 + β_2 y_{t-1} + u_t,

where E(u_t | y_{t-1}) = 0. In this case OLS is consistent and there might be autocorrelation. To see this, note that u_{t-1} = y_{t-1} − β_1 − β_2 y_{t-2}. Therefore, since cov(u_t, y_{t-1}) = 0,

cov(u_t, u_{t-1}) = cov(u_t, y_{t-1}) − β_2 cov(u_t, y_{t-2}) = −β_2 cov(u_t, y_{t-2}).

If β_2 ≠ 0 and cov(u_t, y_{t-2}) ≠ 0, there is autocorrelation.
Example B:

y_t = β_1 + β_2 y_{t-1} + u_t

and

u_t = ρ u_{t-1} + ε_t,

t = 2, …, n, where the ε_t are i.i.d., |ρ| < 1 and E(ε_t | u_{t-1}, u_{t-2}, …) = E(ε_t | y_{t-1}, y_{t-2}, …) = 0.
65 / 68

Misspecification
Dynamic misspecification

Example B (cont.):
Then,

Cov(y_{t-1}, u_t) = E[ y_{t-1} (ρ u_{t-1} + ε_t) ]
  = ρ E( y_{t-1} u_{t-1} )
  = ρ E[ y_{t-1} ( y_{t-1} − β_1 − β_2 y_{t-2} ) ] ≠ 0

unless ρ = 0.
In this case the OLS estimators are not consistent for β_1, β_2. This is a special form of autocorrelation.

66 / 68
Misspecification

Models with lagged dependent variables and serial correlation in the errors can often be easily transformed into models without serial correlation in the errors.
Example B (cont.): Notice that

y_t = β_1 + β_2 y_{t-1} + u_t
    = β_1 + β_2 y_{t-1} + ρ ( y_{t-1} − β_1 − β_2 y_{t-2} ) + ε_t
    = β_1(1 − ρ) + (β_2 + ρ) y_{t-1} − ρ β_2 y_{t-2} + ε_t
    = α_0 + α_1 y_{t-1} + α_2 y_{t-2} + ε_t,   with α_0 := β_1(1 − ρ), α_1 := β_2 + ρ, α_2 := −ρ β_2,

where E(ε_t | y_{t-1}, y_{t-2}, …) = 0 and

E(y_t | y_{t-1}, y_{t-2}, …) = E(y_t | y_{t-1}, y_{t-2}) = α_0 + α_1 y_{t-1} + α_2 y_{t-2}.
67 / 68

Misspecification

Thus, the "relevant" model is an AR(2) model for y_t. With further conditions on the parameters we can estimate the α_j's consistently.
Hence, if you have serial correlation you can add a lagged dependent variable to the model, and that might lead to a model with no serial correlation.

68 / 68
