
Block 2

Regression models based on time series


Stationarity, cointegration, ECM,
Distributed lag models, PAM, AEH

Advanced econometrics 1 4EK608


Pokročilá ekonometrie 1 4EK416

Vysoká škola ekonomická v Praze


Outline

1 Time series-based LRMs – repetition from BSc. courses

2 Stationarity, unit root tests, cointegration


Unit root tests
Cointegration, ECM

3 TS & forecasting

4 Finite and infinite distributed lag models


Polynomial distributed lag
Geometric distributed lag (Koyck)
Rational distributed lag (RDL)
Partial adjustment model (PAM)
Adaptive expectations hypothesis (AEH)
Rational expectations
Time series-based LRMs – repetition from BSc. courses
TS is a stochastic (random) process, a sequence of observations
indexed by time.
Observed TS: one realization of a stochastic process.

Static models
yt = β1 + β2 xt + ut , t = 1, 2, . . . , n

Dynamic models
Finite distributed Lag (FDL) model:
yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + ut
Infinite distributed lag (IDL) model:
yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + · · · + ut
Dynamic models: lag order
For convenience, β subscripts follow lag order,
Impact / lagged / long-run multiplier,
Effect of temporary (one-off) × permanent increase in x,
Lag distribution (function).
G-M assumptions for TSRMs

TS.1 Linearity
The stochastic process {(xt1 , xt2 , . . . , xtk , yt ); t = 1, 2, . . . , n}
follows a linear model yt = β1 + β2 xt2 + · · · + βK xtK + ut .

TS.2 No perfect collinearity


There is no perfect collinearity among regressors.
Comment: imperfect (non-perfect) collinearity among regressors is allowed.

TS.3 Strict exogeneity


For each t, the expected value of error conditionally on the
explanatory variables at all time periods is zero:
E(ut |X) = 0, t = 1, 2, . . . , n

Note: Compare strict exogeneity to a relaxed version,


contemporaneous exogeneity (discussed next):
E(ut |xt1 , xt2 , . . . , xtk ) = E(ut |xt ) = 0.
G-M assumptions for TSRMs

TS.4 Conditional homoscedasticity

var(ut |X) = var(ut ) = σ 2 , t = 1, 2, . . . , n

TS.5 Serial correlation (autocorrelation) is not present

corr(ut , us |X) = 0, t ≠ s

TS.6 Normality
ut are independent of X and i.i.d.: ut ∼ N(0, σ²)

CLRM: Classical linear regression model


TS.1 - TS.6 conditions hold
Properties of OLS estimators

Under TS.1 - TS.3, OLS estimators are unbiased.

Under assumptions TS.1 - TS.5,


var(β̂) = σ̂²(X′X)⁻¹ and σ̂² = SSR/(n − K), where (n − K) = d.f.

Under assumptions TS.1 - TS.5,


OLS estimators are BLUE (conditional on X).

Under assumptions TS.1 - TS.6, β̂j are normally distributed.


Under H0 , each t statistic has a t distribution and the F statistic
has an F distribution (small-sample and asymptotically). The
usual construction of confidence intervals is also valid.
Trends and spurious regression

Regression: y on t
If there is a linear trend in y

Regression: log y on t
Exponential trend, constant rate of growth of y

Spurious regression:
We may find a relationship between two or more trending variables
even if none exists in reality (non-stationarity and
cointegration topics discussed next).
Detrending and deseasonalizing

Detrending algorithm (based on FWL theorem):

ŷt = β̂1 + β̂2 xt2 + β̂3 xt3 + β̂4 t

Regress each variable on a constant and time, and save residuals

ÿt , ẍt2 , ẍt3 , t = 1, 2, . . . , n

Regress ÿt on ẍt2 , ẍt3

ÿˆt = βˆ2 ẍt2 + βˆ3 ẍt3

Coefficients βˆ2 , βˆ3 from this regression are the same as in the
original regression
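
A minimal R sketch of the detrending algorithm (simulated data, illustrative variable names), confirming that the slope coefficients from the detrended regression match those from the regression that includes the time trend:

# FWL-based detrending, illustrative simulated data
set.seed(42)
n  <- 100
trend <- 1:n
x2 <- 0.5 * trend + rnorm(n)
x3 <- 0.2 * trend + rnorm(n)
y  <- 1 + 0.8 * x2 - 0.3 * x3 + 0.1 * trend + rnorm(n)

coef(lm(y ~ x2 + x3 + trend))        # original regression with a time trend

# regress each variable on a constant and the trend, keep residuals
y.dd  <- resid(lm(y ~ trend))
x2.dd <- resid(lm(x2 ~ trend))
x3.dd <- resid(lm(x3 ~ trend))

coef(lm(y.dd ~ x2.dd + x3.dd))       # slopes on x2, x3 are identical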
Coefficient of determination when y is trending

With trending y, coefficient of determination overshoots.

R² = 1 − σ̂u² / σ̂y²

σ̂u² is an unbiased estimator (if a trend is among the regressors).

With trending y, σ̂y² = SST/(n − 1), where SST = Σ_{t=1}^{n} (yt − ȳ)²,
is neither an unbiased nor a consistent estimator of var(yt ).

Better approach: regress ÿt on ẍt1 , ẍt2 . The corresponding
coefficient of determination from this regression, i.e.:

R² = 1 − SSR / Σ_{t=1}^{n} ÿt²

is more reliable (does not overshoot) when compared to the
original regression.
Detrending and deseasonalizing

Deseasonalizing algorithm (based on FWL theorem):


Example based on quarterly data

ŷt = β̂1 + β̂2 xt2 + β̂3 xt3 + γ̂1 dummy1 + γ̂2 dummy2 + γ̂3 dummy3

Regress variables on constant, seasonal dummies and save residuals:

ÿt , ẍt2 , ẍt3 , t = 1, 2, . . . , n

Regress ÿt on ẍt2 , ẍt3

ÿˆt = βˆ2 ẍt2 + βˆ3 ẍt3

Coefficients βˆ2 , βˆ3 from this regression are the same as in the original
regression
Stationary and weakly dependent time series

Time series-based LRMs:


Strict exogeneity, homoscedasticity, absence of serial correlation
and normality assumptions are very limiting

With large samples, weaker assumptions are sufficient

For large samples, key assumptions are:


Covariance stationarity and weak dependency
A time series is strictly stationary if its marginal and all joint
distributions are invariant across time.

Covariance stationarity: first two moments and


auto-covariance do not change over time.
E(yt ) = µ, var(yt ) = σ 2 , cor(yt , yt+h ) = f (h)

Weak dependency: correlation between yt and yt+h


“quickly” converges to zero with h growing to infinity.
Stationary and weakly dependent time series

For Central Limit Theorem (CLT) and Law of Large Numbers


(LLN) to hold, dependency between observations must not be
too strong and must sufficiently quickly decrease with growing
time distance between them.

Time series can be non-stationary and weakly dependent.

Notes on CLT & LLN


CLT implies that the sum of independent random variables
(or weakly dependent RVs) – if centered and standardized by
its s.d. – has an asymptotic distribution N (0, 1).
LLN theorem implies that the average taken from a random sample
converges in probability to the population average; LLN holds for
stationary and weakly dependent series.
Stationary and weakly dependent time series
Examples of weakly dependent time series:

Moving average process of order one: ma(1)

yt = et + α1 et−1 ,

where et is an iid time series.


corr(yt , yt+1 ) = α1 /(1 + α1²),
observations with higher time distance than 1 are uncorrelated.

Stable autoregressive process of order 1: ar(1)


Under stability condition |ρ| < 1, it can be demonstrated that
(Wooldridge, Introductory econometrics, ch. 11.1):

yt = ρyt−1 + et ⇒ corr(yt , yt+h ) = ρ^h

If the stability condition holds, TS is weakly dependent because


correlation converges to zero with growing h.
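
A short R sketch (simulated, illustrative) of the two weakly dependent processes above; the sample autocorrelations cut off after lag 1 for the ma(1) and decay geometrically for the stable ar(1):

set.seed(1)
y.ma1 <- arima.sim(model = list(ma = 0.5), n = 200)   # ma(1) with alpha1 = 0.5
y.ar1 <- arima.sim(model = list(ar = 0.8), n = 200)   # stable ar(1) with rho = 0.8
acf(y.ma1)   # roughly zero beyond lag 1
acf(y.ar1)   # approximately rho^h decay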
Asymptotic properties of OLS estimators

TS.1’ Linearity
The stochastic process {(xt1 , xt2 , . . . , xtk , yt ); t = 1, 2, . . . , n}
follows the linear model yt = β0 + β1 xt1 + · · · + βk xtk + ut
We assume both dependent and independent variables are
stationary and weakly dependent.

TS.2’ No perfect collinearity


There is no perfect collinearity among regressors.
Comment: the same assumption as TS.2

TS.3’ Contemporaneous exogeneity


Null conditional expected value of errors:

E(ut |xt1 , . . . , xtk ) = E(ut |xt ) = 0


Asymptotic properties of OLS estimators

TS.4’ Contemporaneous homoscedasticity

var(ut |xt ) = var(ut ) = σ 2

TS.5’ No serial correlation


(autocorrelation in residuals is not present)

corr(ut , us |xt , xs ) = 0, t ≠ s
Asymptotic properties of OLS estimators
Under assumptions TS.1’, TS.2’ and TS.3’,

OLS estimators are consistent (not unbiased)


TS.1’ - TS.3’ ⇒ plim β̂j = βj , j = 0, 1, . . . , k

Removing strict exogeneity (TS.3 → TS.3’): no restriction on how ut


is related to regressors in other time periods. Hence:

We allow for feedback from (lagged) explained variable to


“future” values of explanatory variables

We can use lagged dependent variable as a regressor.

Theorem: Asymptotic normality of OLS: Under assumptions TS.1’ –


TS.5’, OLS estimators are asymptotically normally distributed.

Usual OLS standard errors, t-statistics and F -statistics are


asymptotically valid.
β̂ → N(β, σ̂²(X′X)⁻¹) as n → ∞
Serial correlation in TSRM

Causes and effects of autocorrelation

Causes:
DGP, dynamic incompleteness of models

Effects on OLS estimates:


Unbiased β̂j (only with strictly exogenous regressors),
Consistent β̂j (contemporaneous exogeneity holding),
Biased inference.

FGLS (only with strictly exogenous regressors)

OLS + robust inference (Newey-West & other HAC s.e.)


Serial correlation in TSRM
Testing AR(1) with strictly exogenous regressors

ρ estimation:
yt = β0 + β1 xt1 + · · · + βk xtk + ut
ut = ρut−1 + et
we estimate ρ from:
ût = ρût−1 + error
H0 : ρ = 0

Durbin-Watson test:

d = Σ_{t=2}^{n} (ût − ût−1 )² / Σ_{t=1}^{n} ût²

small-sample validity (conditions apply)
d-statistic: symmetric distribution on ⟨0, 4⟩, E(d) = 2
d ≈ 2(1 − ρ̂), i.e. ρ̂ ≈ 1 − d/2. Test for: H0 : ρ = 0.
Serial correlation

Testing serial correlation with general regressors

Testing AR(1):

ût = α0 + α1 xt1 + · · · + αk xtK + ρût−1 + error

H0 : ρ = 0
We can use a heteroscedasticity-robust version of the t-test.

Breusch-Godfrey test for AR(q) serial correlation:

ût = α0 + α1 xt1 + · · · + αk xtK + ρ1 ût−1 + · · · + ρq ût−q + error

H0 : ρ 1 = · · · = ρ q = 0

Use F -test or LM-test: (n − q)Ra² ∼ χ²(q) under H0 .
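
Both tests are available in R, e.g. in the lmtest package; a hedged sketch on simulated data (the model and parameter values are illustrative):

library(lmtest)                      # dwtest(), bgtest()
set.seed(7)
n <- 100
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))   # AR(1) errors
y <- 1 + 2 * x + u
fit <- lm(y ~ x)
dwtest(fit)                          # Durbin-Watson test for AR(1)
bgtest(fit, order = 4)               # Breusch-Godfrey LM test for AR(4)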
Adjusting LRMs for serial correlation
LRM on TS with autocorrelation (for |ρ| < 1):

yt = x′t β + ut ,  ut = ρut−1 + εt ,

we can substitute the autoregressive process into the main LRM:

yt = x′t β + ρut−1 + εt    note: ut−1 = yt−1 − x′t−1 β,

hence:

yt = ρyt−1 + x′t β − ρ(x′t−1 β) + εt ,


which is non-linear in parameters (ρβ) and can be

(a) Estimated by NLS (general regressors)

(b) Re-arranged into the generalized first differences specification:

(yt − ρyt−1 ) = (x′t − ρx′t−1 )β + εt ,

i.e. each regressor is transformed as (xtj − ρxt−1,j ), where |ρ| < 1,

and estimated by the ‘Hildreth-Lu’, ‘Cochrane-Orcutt’
or ‘Prais-Winsten’ methods.
Adjusting LRMs for serial correlation
LRM on TS with autocorrelation (for |ρ| < 1):

yt = x′t β + ut ,  ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x′t − ρx′t−1 )β + εt .

Hildreth-Lu:

a) Start by repeated OLS estimation of the generalized first


differences specification, using different (fixed) values of
ρ ∈ (−1, 1).
b) From such auxiliary regressions (i.e. for different ρ values), select
one that minimizes RSS.

Potentially computationally expensive (fine-grain iterations


along ρ-values). No advantage over NLS methods
(Hildreth-Lu is a form of “brute-force” NLS estimator).
Global optimum (min RSS) can be reached.
Adjusting LRMs for serial correlation
LRM on TS with autocorrelation (for |ρ| < 1):

yt = x′t β + ut ,  ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x′t − ρx′t−1 )β + εt .

Cochrane-Orcutt:

a) Start by OLS estimation of original LRM and obtain residuals


ût . Estimate ρ̂ from the ar(1) model ût = ρût−1 + εt .
b) Use ρ̂ to calculate generalized first differences (GFD) and
estimate β parameters of the GFD model.
c) Vector β̂ from step (b) can be used to produce ‘improved’
residuals ût = yt − x′t β̂ and subsequent improved ρ̂ estimates.
d) β and ρ estimates are iteratively improved until no substantial
change in the estimated value of ρ is observed.

Computationally accessible, typically just a few iterations.


Convergence to a local (not global) minimum of RSS can happen.
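
A minimal R sketch of the Cochrane-Orcutt iteration on simulated data (illustrative only; dedicated FGLS routines exist in several packages, which is an assumption about the reader's environment):

set.seed(11)
n <- 120
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- 2 + 1.5 * x + u

b   <- coef(lm(y ~ x))                              # (a) OLS on the original LRM
rho <- 0
for (i in 1:25) {
  uhat    <- y - b[1] - b[2] * x                    # current residuals
  rho.new <- coef(lm(uhat[-1] ~ 0 + uhat[-n]))[1]   # rho from ar(1) on residuals
  y.gd <- y[-1] - rho.new * y[-n]                   # (b) generalized first differences
  x.gd <- x[-1] - rho.new * x[-n]
  g    <- coef(lm(y.gd ~ x.gd))
  b    <- c(g[1] / (1 - rho.new), g[2])             # recover beta1 = intercept/(1 - rho)
  if (abs(rho.new - rho) < 1e-6) break              # (d) stop when rho stabilizes
  rho  <- rho.new
}
c(rho.hat = rho.new, beta1.hat = b[1], beta2.hat = b[2])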
Adjusting LRMs for serial correlation
LRM on TS with autocorrelation (for |ρ| < 1):

yt = x′t β + ut ,  ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x′t − ρx′t−1 )β + εt .

Prais-Winsten:

Small alteration to the Cochrane-Orcutt (CO) method.


CO disregards first observation from the dataset – due to
(generalized) first differences.

PW transformation retains the first observation with weight
√(1 − ρ²).
All remaining steps and features of the PW method follow
directly from CO.
In small samples (short TS), PW is more efficient than CO, as the sample
size for estimation of the transformed model does not decrease.
Adjusting LRMs for serial correlation

SLRM on TS with autocorrelation, example outline (for |ρ| < 1):

yt = β1 + β2 xt + ut , ut = ρut−1 + εt ,

(yt − ρyt−1 ) = β1 (1 − ρ) + β2 (xt − ρxt−1 ) + εt .


y∗t = β∗1 + β2 x∗t + εt

where

y = (y1 , y2 , y3 , . . . , yn )′,
y∗CO = (y2 − ρy1 , y3 − ρy2 , . . . , yn − ρyn−1 )′,
y∗PW = (y1 √(1 − ρ²), y2 − ρy1 , y3 − ρy2 , . . . , yn − ρyn−1 )′,

X-matrix elements are transformed by analogy,

FGLS (CO/PW) is the OLS on transformed data,

β̂1 = β̂∗1 /(1 − ρ̂) and β̂2 is used “directly” from the GFD form.
Stationarity, unit root tests, cointegration

Stationarity

Unit root tests

Cointegration
Weakly and strongly dependent TS

Weakly dependent time series

Moving average process of order one ma(1) yt = et + α1 et−1 ,


where et is i.i.d. time series.
yt and yt+h observations with distance h ≥ 2 are uncorrelated.
This process is stationary.

For stable autoregressive process of order 1 ar(1):


yt = ρyt−1 + et ⇒ cor(yt , yt+h ) = ρ^h

If stability condition |ρ| < 1 holds, the process is weakly


dependent because correlation converges to zero with growing h.
Also, this process is stationary for y0 = 0.
Weakly and strongly dependent TS
Strongly dependent time series:
Random walk:

yt = yt−1 + et ,  et ∼ Distr(0, σ²), iid


by consecutive substitutions:
yt = yt−2 + et−1 + et ,
yt = yt−3 + et−2 + et−1 + et ,
···
yt = y0 + e1 + · · · + et−1 + et .

Shocks have permanent effects, the series is not covariance stationary


and is strongly dependent.

E(yt ) = E(y0 )
var(yt ) = σe² t
cor(yt , yt+h ) = √(t/(t + h))

Correlation decreases very slowly and speed depends on t.


Strongly dependent TS

Two realizations of a random walk

Strongly dependent TS

Random walk with a drift

yt = α0 + yt−1 + et ⇒ yt = α0 t + et + et−1 + · · · + e1 + y0

A linear trend with random walk around the trend.


It is neither covariance stationary nor weakly dependent.

E(yt ) = α0 t + E(y0 )
var(yt ) = σe² t
cor(yt , yt+h ) = √(t/(t + h))

Correlation decreases very slowly and decline speed depends on t.


Weakly and strongly dependent TS

Realization of a random walk with a drift



Different realizations of trending TS (weakly dependent around


the trend) may produce similar time series.
Weakly and strongly dependent TS

yt = 1 · yt−1 + ut = yt−1 + ut

Unit root process: yt = yt−1 + ut ;


where: ut is a weakly dependent series.
Random walk is a special case of the unit root process
where: ut ∼ Distr (0, σu2 ), iid

We need to distinguish strongly and weakly dependent TS:


Economic reasons:
In strongly dependent series, shocks or policy changes have long
or permanent effects; in weakly dependent series, their effects are
only temporary.
Statistical reasons:
Analysis with strongly dependent series must be handled in
specific ways.
Integrated series

Terminology - Order of integration

Weakly dependent TS are integrated of order zero: I(0).

If we have to difference a TS once to get a weakly dependent TS,


then it is integrated of order 1: I(1).

Random walk – example of an I(1) process:

yt = yt−1 + et ⇒ ∆yt = yt − yt−1 = et


log yt = log yt−1 + et ⇒ ∆ log yt = et

A time series is integrated of order d: I(d), if it becomes


a weakly dependent TS after being differenced d times.
Unit roots tests

Unit root tests help to decide if a time series is I(0) or not

Use either some informal procedure or a unit root test

Informal procedures
Analyze autocorrelation of the first order

ρ̂ = estimated corr(yt , yt−1 )

If ρ̂ approaches 1, it indicates that the series can have unit


root. Alternatively, it could have a deterministic trend.
We can analyze sample autocorrelations using
a correlogram
Unit root tests

Correlogram: ρh = cov(yt , yt−h ) / (σyt · σyt−h )

[Figure: correlograms of an I(1)-like series vs. an I(0)-like series]


Unit root tests

Dickey-Fuller (DF) test – motivation


Unit root test in an ar(1) process:
yt = α + ρyt−1 + et
H0 : ρ = 1, H1 : ρ < 1

Under H0 , yt has a unit root.


◦ For ρ = 1 ∧ α = 0 → yt is a random walk.
◦ For ρ = 1 ∧ α ≠ 0 → yt is a random walk with a drift
and E(yt ) is a linear function of t .
Under H1 , yt is a weakly dependent ar(1) process.
Unit root tests

Dickey-Fuller (DF) test – motivation


Unit root test in an ar(1) process:
yt = α + ρyt−1 + et
H0 : ρ = 1, H1 : ρ < 1
For DF tests, H1 : ρ < 1 is a common simplification to the full space
of alternatives to H0 : ρ = 1.
For |ρ| < 1, yt is weakly dependent (as ρ^h → 0 for h → ∞)
However, if a unit root is likely to be present, the probability of
ρ < 0 is negligible.
We usually ignore the possibility of ρ > 1 , as it would lead to
explosive behavior in yt .
. . . |ρ| > 1 would allow for explosive oscillations in yt .
Dickey Fuller (DF) test

Basic equation for unit root test in an ar(1) process:


yt = α + ρyt−1 + et
For DF test, we apply a suitable transformation to yt :
we subtract yt−1 from both sides of the equation:

∆yt = α + (ρ − 1)yt−1 + et ; apply substitution: θ = (ρ − 1)


i.e. H0 : ρ = 1 ⇔ H0 : θ = 0
∆yt = α + θyt−1 + et ; now: H1 : ρ < 1 ⇔ H1 : θ < 0

We use a t-ratio for testing H0 : θ = 0. However:


Under H0 , t-ratios don’t have a t-distribution, but follow a
DF-distribution (its negative critical values are much farther
from zero).
Critical values for the DF distribution are available from
statistical tables and implemented in most relevant SW packages.
DF test & ADF test

TS with unit root can manifest various levels of complexity. Hence, DF test
for a given yt TS is usually performed using the following specifications:

∆yt = θyt−1 + et
random walk
∆yt = α + θyt−1 + et random walk with a drift
∆yt = α + θyt−1 + δt + et random walk with a drift and trend

DF test is the same (H0 : θ = 0) for all specifications /critical values differ/
Augmented Dickey-Fuller (ADF) test is a common generalization of DF test
(example: Augmentation of the DF test for the 2nd specification)

∆yt = α + θyt−1 + γ1 ∆yt−1 + · · · + γp ∆yt−p + et

When estimating θ, we control for possible ar(p) behavior in ∆yt .


ADF test has the same null hypothesis as a DF test → H0 : θ = 0.
Unit root tests in R: package {urca}

Description of the options for the ur.df() function:

1 type "none"
∆yt = θyt−1 + et
tau1: we test for H0 : θ = 0 (unit root)

2 type "drift"
∆yt = α + θyt−1 + et
tau2: H0 : θ = 0 (unit root)
phi1: H0 : θ = α = 0 (unit root and no drift)

3 type "trend"
∆yt = α + θyt−1 + δt + et
tau3: H0 : θ = 0 (unit root)
phi2: H0 : θ = α = δ = 0 (unit root, no drift, no trend)
phi3: H0 : θ = δ = 0 (unit root and no trend)
Multiple other unit root tests exist:
(KPSS, tests for seasonal data, break in the DGP, etc.).
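
A short R sketch using ur.df() from {urca} on a simulated series (illustrative only; the relevant critical values are reported by summary()):

library(urca)
set.seed(3)
y <- cumsum(rnorm(100))                            # random walk, i.e. an I(1) series
summary(ur.df(y, type = "drift", lags = 2))        # test in levels (tau2, phi1 statistics)
summary(ur.df(diff(y), type = "drift", lags = 2))  # test on the differenced series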
Unit root tests

ADF test for TS with trend

∆yt = α + θyt−1 + δt + γ1 ∆yt−1 + · · · + γp ∆yt−p + et

Under the alternative hypothesis of no unit root, the process is


trend-stationary.

The critical values in the ADF distribution with time trend are
even more negative as compared to random walk and random
walk with a drift.

When using DF/ADF specification 1 or 2 (R-W, R-W with drift)


to test for unit root in a clearly trending TS, the test would not
have sufficient power (we would not reject H0 for trending
weakly dependent TS).
Unit roots and trend-stationary series

∆yt = α + θyt−1 + δt + γ1 ∆yt−1 + · · · + γp ∆yt−p + et

Terminology:

Stochastic trend: θ = 0
Also called difference-stationary process: yt can be
turned into I(0) series by differencing. Terminology
emphasizes stationarity after differencing yt instead of weak
dependence in differenced TS.
Deterministic trend: δ 6= 0, θ < 0
Also called trend-stationary process: has a linear trend,
not a unit root. yt is weakly dependent - I(0) - around its
trend. We can use such series in LRMs, if trend is also used
as regressor.

DF/ADF tests are not precise tools. Distinguishing between


stochastic and deterministic trend is not easy (sample size!).
Stationarity in ar(p) processes

Introduction: lag operators:


for some ar(p) TS xt :

Lxt = xt−1
L(Lxt ) = L²xt = xt−2
···
Lᵖxt = xt−p

Using lag operators,

ar(p) process : xt = α + φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + ut

can be rewritten as:

(1 − φ1 L − φ2 L² − · · · − φp Lᵖ)xt = α + ut
Stationarity in ar(p) processes

(1 − φ1 L − φ2 L² − · · · − φp Lᵖ)xt = α + ut    (1)

Stochastic process (1) will only be stationary if the roots of the
corresponding equation (2) are all greater than unity in absolute value:

1 − φ1 L − φ2 L² − · · · − φp Lᵖ = 0    (2)

Illustration 1 – ar(1) process:


xt = α + φxt−1 + ut (3)
(1 − φL)xt = α + ut

1 − φL = 0
L = 1/φ
For (3) to be stationary, |L| > 1 ↔ −1 < φ < 1
Stationarity in ar(p) processes

Illustration 2 – ar(3) process:

xt = 2 + 3.9xt−1 + 0.6xt−2 − 0.8xt−3 + ut


To evaluate stationarity of xt , we use

1 − 3.9L − 0.6L² + 0.8L³ = 0,

which can be factorized:

(1 − 0.4L)(1 + 0.5L)(1 − 4L) = 0

1st root :L = 2.5


2nd root :L = −2
3rd root :L = 0.25 ⇒ TS is non-stationary
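
The roots can be verified numerically with base R's polyroot(), which takes the coefficients in increasing powers of L:

# 1 - 3.9L - 0.6L^2 + 0.8L^3 = 0
roots <- polyroot(c(1, -3.9, -0.6, 0.8))
roots          # 2.5, -2 and 0.25 (up to numerical precision)
abs(roots)     # stationarity needs all moduli > 1; here one root is 0.25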
Handling trend-stationary time series

Trend-stationary TS fulfill TS.1’ assumption. We can use them


in regressions if we have time trend among regressors.

Strongly dependent time series do not fulfill TS.1’ assumption.


We cannot use them in regressions directly.

Sometimes, we can transform such series into weakly dependent


time series.
Sometimes, taking logarithms helps.
Differencing is popular, but it has drawbacks.
As a special case, LRMs can be estimated and inference
performed if TS in the model are cointegrated.
Handling strongly dependent time series
Example - differencing TS:
yt = β0 + β1 xt + εt yt , xt ∼ I(1) (4)
yt−1 = β0 + β1 xt−1 + εt−1 εt ∼ i.i.d. (5)
∆yt = β1 ∆xt + vt vt = εt − εt−1 (6)

Coefficient β1 does not change between (4) and (6).


However, equations (4) and (6) are different.
⇒ β1 has two interpretations now: it is the change in yt for a
unit change in xt , but it is also the change in the growth of y for
a unit change in the growth of x.
Problems involved in the approach:
1 Differenced errors vt are no longer i.i.d.
2 In model (6), we lose information linked with the levels of
variables; short-term relations are stressed.
3 Information drawn from (6) often generates bad long-term
predictions of level variables: ∆ŷt = β̂1 ∆xt ; . . . what if
β0 ≠ 0?
Handling strongly dependent time series

Cointegrated TS – motivation
Some properties of integrated processes

1 The sum of stationary and non-stationary series must be


non-stationary.

2 Consider a process yt = α + βxt :


· If xt is stationary then yt will be stationary.
· If xt is non-stationary then yt will be non-stationary.

3 If two time series are integrated of different orders, then any


linear combination of the series will be integrated at the higher
of the two orders of integration.

4 Sometimes (“special case scenario”) it turns out that a linear


combination of two I(d) series is integrated of order less than d.
Spurious regression or cointegration

Spurious regression Regressing one I(1)-series on another


I(1)-series may lead to extremely high t-statistics even if the
series are completely independent. Similarly, the R² of such
regressions tends to be very high.
Regression analysis involving time series that have a unit root
may generate completely misleading inferences.

Cointegration Fortunately, regressions with I(1)-variables are


not always spurious: If there is a stable relationship between
time series that, individually, display unit root behavior, these
time series are called “cointegrated”.
Spurious regression or cointegration

General definition of cointegration

Two I(1)-time series yt , xt are said to be cointegrated if there exists a


stable relationship between them, where:

yt = α + βxt + et , et ∼ I(0)

Cointegration (CI) test if CI parameters are known

For residuals of the known CI relationship:

et := yt − α − βxt ,

test whether the residuals have a unit root


(DF/ADF and other unit root tests may be applied “directly”).
If the unit root H0 is rejected, yt , xt are cointegrated.
Spurious regression or cointegration

Testing for CI if the parameters are unknown


If the potential relationship is unknown, it can be estimated by
OLS. After that, we test whether the regression residuals have a
unit root. If the unit root is rejected, this “means” (test result
interpretation at a given significance level) that yt & xt are
cointegrated. Due to the pre-estimation of parameters, critical
values are different than in the case of known parameters.
(Software handles this automatically.)
The CI relationship may include a time trend
If the two series have differential time trends (drifts in this case),
the deviation between them may still be I(0) but with a linear
time trend. In this case one should include a time trend in the
CI-regression. Also, we have to use different critical values when
testing residuals.
(Software handles this automatically.)
Cointegration tests based on regression residuals

Engle-Granger test estimates a p-lag ADF equation:

∆ût = θ ût−1 + γ1 ∆ût−1 + · · · + γp ∆ût−p + et

Essentially, this is an ADF test on ût [θ = (ρ − 1)]

Specific critical values apply (farther from 0 than t or DF ).

Phillips-Ouliaris test estimates a DF equation:


∆ût = θ ût−1 + et

The t-ratio is based on robust standard errors,


different estimators exist for the robust standard errors.

In both cases (EG and PO), H0 of unit root in û


i.e. “no-cointegration” is tested.
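
A hedged R sketch of residual-based cointegration testing on simulated I(1) series. Note that once the CI parameters are estimated (next slide), the standard DF/ADF critical values printed by ur.df() do not apply to the residuals and Engle-Granger critical values must be used; ca.po() in {urca} implements the Phillips-Ouliaris test:

library(urca)
set.seed(5)
n <- 200
x <- cumsum(rnorm(n))                 # I(1) regressor
y <- 1 + 0.5 * x + rnorm(n)           # cointegrated with x (I(0) errors)

uhat <- resid(lm(y ~ x))              # residuals of the (estimated) CI relationship
summary(ur.df(uhat, type = "none", lags = 1))     # ADF-type regression on the residuals
                                                  # (use EG critical values, not DF tables)

summary(ca.po(cbind(y, x), demean = "constant"))  # Phillips-Ouliaris test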
Error correction model (ECM)

It can be shown that when variables are cointegrated, i.e. when


there exists a long-term relationship among them, their
short-term dynamics are related as in a so-called error correction
model (ECM).
Error correction model (ECM)

Autoregressive distributed lag models

Autoregressive distributed lag model with one regressor


ADL(p, q) : yt = β0 + Σ_{i=1}^{p} βi yt−i + Σ_{j=0}^{q} γj xt−j + ut ,   ut ∼ iid(0, σ²)

There are many useful modifications/simplifications/restrictions


to the general ADL(p, q) process. For example, ADL(1,1) is
equivalent to an ECM specification with one lag used in the
error-correction mechanism (shown next):

ADL(1, 1) : yt = β0 + β1 yt−1 + γ0 xt + γ1 xt−1 + ut . (7)

Additional ADL(1, 1) restrictions: β1 = 1 and γ1 = −γ0

give a model in 1st diffs.: ∆yt = β0 + γ0 ∆xt + ut .


Error correction model (ECM)

For ADL(1,1) model (7), suppose there is an equilibrium value x◦ and


in the absence of shocks, xt → x◦ as t → ∞. Then, assuming absence
of ut errors, yt converges to steady state: y ◦ .
Hence, the ADL(1,1) model (7) can be re-written as:

y ◦ = β0 + β1 y ◦ + (γ0 + γ1 )x◦

Solving this for y ◦ as a function of x◦ , we get


y◦ = β0 /(1 − β1 ) + [(γ0 + γ1 )/(1 − β1 )] x◦ = β0 /(1 − β1 ) + λx◦

where λ ≡ (γ0 + γ1 )/(1 − β1 ) and |β1 | < 1 is assumed.
Error correction model (ECM)

y◦ = β0 /(1 − β1 ) + λx◦ ,    λ ≡ (γ0 + γ1 )/(1 − β1 ),    |β1 | < 1

λ is the long-run derivative of y ◦ with respect to x◦ .

λ is an elasticity if both y ◦ and x◦ are in logs.

λ̂ can be computed directly from the estimated parameters of the


ADL(1,1) model (7).
Error correction model (ECM)

The ADL(1,1) equation (7) - repeated here for convenience:

yt = β0 + β1 yt−1 + γ0 xt + γ1 xt−1 + ut ,

can be equivalently rewritten as follows:

∆yt = β0 + (β1 − 1)(yt−1 − λxt−1 ) + γ0 ∆xt + ut . (8)


Again, λ ≡ (γ0 + γ1 )/(1 − β1 ) and |β1 | < 1 is assumed.
Equation (8) is an error-correction model (ECM).
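
To verify the re-parameterization, subtract yt−1 from both sides of (7) and add and subtract γ0 xt−1 on the right-hand side:

∆yt = β0 + (β1 − 1)yt−1 + γ0 ∆xt + (γ0 + γ1 )xt−1 + ut
    = β0 + (β1 − 1)(yt−1 − λxt−1 ) + γ0 ∆xt + ut ,

since (β1 − 1)(−λ) = (1 − β1 )λ = γ0 + γ1 .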
Error correction model (ECM)

ECM: ∆yt = β0 + (β1 − 1)(yt−1 − λxt−1 ) + γ0 ∆xt + ut .

(yt−1 − λxt−1 ) measures the extent to which the long run


equilibrium between yt and xt is not satisfied (at t − 1).
Consequently, (β1 − 1) can be interpreted as the proportion of
the disequilibrium (yt−1 − λxt−1 ) that is reflected in the
movement of yt , i.e. in ∆yt .
(β1 − 1)(yt−1 − λxt−1 ) is the error-correction term.
Many ADL(p, q) specifications can be re-written as ECMs.
ECMs can be used with both stationary & cointegrated
non stationary TS.
The ECM term (β1 − 1) is essentially the same as θ from the
Partial adjustment model (discussed next).
Error correction model (ECM)

Some more complicated ECMs:

1) We can use higher order lags, e.g. ADL(2,2):

yt = β0 + β1 yt−1 + β2 yt−2 + γ0 xt + γ1 xt−1 + γ2 xt−2 + ut ,

to establish ECMs. It is again possible to rearrange and


re-parametrize ADL(2,2) to get an ECM. More than one
re-parameterization is possible.

2) More than two variables can enter into an equilibrium


relationship.
ECM: non-stationary & cointegrated series

Superconsistency: yt = β0 + β1 xt + ut
1 Provided xt and yt are cointegrated, the OLS estimators β̂0 and
β̂1 will be consistent.
2 β̂j converge in probability to their true values βj more quickly in
the cointegrated non-stationary case than in the stationary case
(asymptotic efficiency).

Consequences:
For simple static regression between two cointegrated variables:
yt , xt ∼ C(1, 1), super-consistency applies (with deterministic
regressors such as intercept and trend added upon relevance).
Dynamic misspecifications do not necessarily have serious
consequences. This is a large sample property - in small samples, OLS
estimators are biased.
(Specific statistical inference applies to cointegrating vectors.)
ECM: non-stationary & cointegrated series

Granger representation theorem:¹


If two TS xt and yt are cointegrated, the short-term disequilibrium
relationship between them can be expressed in the ECM form
∆yt = lagged(∆y, ∆x) − δut−1 + εt (9)

where ut−1 = yt−1 − β0 − β1 xt−1 is the disequilibrium error and δ is a


short-run adjustment parameter.
Note: as u is on the scale of y, δ can be interpreted in percentages.
Example: δ = 0.8 → 80% of the disequilibrium error gets corrected between
t − 1 and t (on average).
Two implications:
1 The general-to-specific model search can focus on ECMs
2 Engle-Granger two-stage procedure

¹ Engle and Granger (1987)
ECM: non-stationary & cointegrated series

Engle-Granger two-stage procedure:


We short-cut the search of an ECM from a general model:

1st stage: Estimation of the cointegrating (static) regression and


saving residuals
ût = yt − β̂0 − β̂1 xt

2nd stage: Use residuals ût−1 in (9) instead of ut−1 and estimate by
OLS

Estimators are consistent and asymptotically efficient, but biased in


small samples.
Assumptions: yt and xt are non-stationary and cointegrated.
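
A minimal R sketch of the two stages on simulated cointegrated series (illustrative):

set.seed(8)
n <- 200
x <- cumsum(rnorm(n))                 # I(1)
y <- 1 + 0.5 * x + rnorm(n)           # cointegrated with x

# 1st stage: cointegrating (static) regression, save residuals
uhat <- resid(lm(y ~ x))

# 2nd stage: ECM by OLS, using the lagged residuals as the disequilibrium error
dy     <- diff(y)
dx     <- diff(x)
u.lag1 <- uhat[-n]                    # u_{t-1}, aligned with dy and dx
summary(lm(dy ~ dx + u.lag1))         # coefficient on u.lag1 estimates -delta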
ECM: non-stationary & cointegrated series

Possibility of more cointegrating vectors:


Long-run relationship: yt = β0 + β1 xt + β2 wt + β3 zt + ut ,
assume all observed variables are I(1) and cointegrating relationship
exists – then the disequilibrium error is:

ut = [yt − β0 − β1 xt − β2 wt − β3 zt ] ∼ I(0). (10)

If a linear combination of variables such as (10) is stationary, then the


coefficients in this relationship form a cointegrating vector, e.g.
(1, −β1 , −β2 , −β3 ).
In the multivariate case, there may be more than one linearly
independent stationary combination linking the cointegrated variables
(topic discussed separately along VAR/VECM estimators).
Cointegration: the existence of at least one cointegrating vector.
Cointegration among more than two variables

Testing and estimation


Cointegration can be tested using the EG and/or PO tests
Only one cointegrating vector exists
Estimation can proceed by the Engle-Granger two-stage method for
ECMs.
Two or more cointegrating vectors
Engle-Granger two-stage method is not applicable. Johansen (1988)
suggests a maximum likelihood approach.
TS & forecasting

Chow tests

Forecasts from TS-based models


Chow tests

For any LRM: y = Xβ + u

Say, the sample (time series) for a period t = 1, 2, . . . , T may be


conveniently divided into two groups: T1 + T2 = T .
[ consider two periods: fixed vs. floating F/X rates ]
[ pre-EU accession vs. post-EU accession period ]
[ applies to CS data as well; e.g Male/Female ]

Now, the LRM’s vectors and matrices may be partitioned as


follows:
     
y1 X1 u
= β+ 1
y2 X2 u2
where y10 = (y1 , . . . , yT1 ), y20 = (yT1 +1 , . . . , yT ), etc.
i.e. {y1 , X1 } ∈ T1 , {y2 , X2 } ∈ T2 .
Chow tests

For any LRM: y = Xβ + u, Chow test can be based on an auxiliary


regression (unrestricted model for the F test):

[ y1 ]   [ X1 ]       [ 0  ]       [ u1 ]
[ y2 ] = [ X2 ] β  +  [ X2 ] γ  +  [ u2 ]

where 0 is a zero-matrix of the same dimensions as X1 ,


i.e. (T1 ×K).

Also, we can see that:


T1 : ŷ = X β̂
T2 : ŷ = X(β̂ + γ̂)

Note: Power of the test depends on proper T1 vs. T2 cutoff.


Chow test may be generalized for 3+ time periods (groups).
Chow tests

For our unrestricted model:

[ y1 ]   [ X1 ]       [ 0  ]       [ u1 ]
[ y2 ] = [ X2 ] β  +  [ X2 ] γ  +  [ u2 ]

We can formulate the null of no structural change in model dynamics


between the two time periods (groups) as follows:
H0 : γ = 0, i.e.: γ1 = γ2 = γ3 = · · · = γK = 0
H1 : ¬H0

This can be tested using an F -test (or its HC version):

F = (SSRr − SSRur )/SSRur × (n − 2K)/K  ∼  F [K, (n − 2K)]  under H0

Note: Here, β1 and γ1 both relate to the intercept.


Chow test - CS-based example

A simple Chow test example for CS data:


(to assess whether parameters are equal for M/F students.)

Original model (Chow test restricted model):


. . . based on the well known Wooldridge dataset.

cumgpa = β1 + β2 sat + β3 hsperc + β4 tothrs + u

Auxiliary model (Chow test unrestricted model):

cumgpa = β1 + γ1 female
+ β2 sat + γ2 (female×sat)
+ β3 hsperc + γ3 (female×hsperc)
+ β4 tothrs + γ4 (female×tothrs) + u
Chow test - CS-based example (contd.)

Null hypothesis H0 : γ1 = γ2 = γ3 = γ4 = 0

If all interactions effects are zero, we have the same regression


function for both groups.

Estimate of the unrestricted model


cumgpa^ = 1.48 − .353 female + .0011 sat + .0075 (female×sat)
         (.21)  (.411)        (.0002)     (.00039)

        − .0085 hsperc − .00055 (female×hsperc)
         (.0014)         (.00316)

        + .0023 tothrs − .00012 (female×tothrs)
         (.0009)         (.00163)

. . . t-tests cannot be used to evaluate the joint H0 .


Chow test - CS-based example (contd.)

F -statistic:
F = [(SSRr − SSRur )/K] / [SSRur /(n − 2K)] = [(85.515 − 78.355)/4] / [78.355/(366 − 8)] ≈ 8.18

. . . using p-value, we reject the null hypothesis

Important: Chow tests (all types) assume constant error


variance across groups.
Chow 1: stability test for TS

Here, the F -statistic for the Chow test is calculated in an alternative


way (Chow 1):
For a suitable (potential) “breakpoint”, we divide our sample
{t = 1, 2, . . . , T } in two groups:
“T1 ” with {t = 1, 2, . . . , T1 } and
“T2 ” with {t = T1 +1, T1 +2, . . . , T }
. . . note that the choice of T1 is arbitrary
. . . (breakpoint-searching algorithms can be used)
Run separate regressions for both T1 , T2 groups;
the SSRur is given by the sum of the SSRs of the two separately
estimated regression models.
. . . sufficient observations in T1 and T2 are required (d.f.)
Run the original (restricted) regression model on the whole
sample T and store SSRr .
Chow 1: stability test for TS

F = (SSRr − SSRur )/SSRur · (T − 2K)/K  ∼  F ( K , T − 2K )  under H0

where
SSRur = SSR T1 + SSR T2
SSRr = SSR T
K is the number of parameters (including intercept) in LRM

H0 : stable structure of coefficients - no statistically significant


differences between parameters in T1 and T2 .
H1 : ¬H0 . H1 suggests a structural change in parameters over time.
Hence, separate regressions fit the data significantly better and
SSRur < SSRr (the difference is statistically significant).

Note: Chow 1 can be generalized for G time periods (G − 1 “breakpoints”).
. . . In such case, SSRur = Σ_{g=1}^{G} SSRg , d.f. = T − GK
. . . and we assume Tg > K for all time groups.
. . . (only usable for small G-values, problematic setup of breakpoints)
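
A hedged R sketch of the Chow 1 computation on simulated data with an arbitrary breakpoint (package-based alternatives such as strucchange exist; that is an assumption about the environment):

set.seed(9)
T.all <- 80; T1 <- 40                        # arbitrary breakpoint
x <- rnorm(T.all)
y <- c(1 + 0.5 * x[1:T1], 2 + 1.5 * x[(T1 + 1):T.all]) + rnorm(T.all)

K   <- 2                                     # intercept + one slope
ssr <- function(fit) sum(resid(fit)^2)
SSR.r  <- ssr(lm(y ~ x))                     # restricted: whole sample
SSR.ur <- ssr(lm(y[1:T1] ~ x[1:T1])) +       # unrestricted: separate regressions
          ssr(lm(y[(T1 + 1):T.all] ~ x[(T1 + 1):T.all]))
F.stat <- (SSR.r - SSR.ur) / SSR.ur * (T.all - 2 * K) / K
c(F = F.stat, p.value = pf(F.stat, K, T.all - 2 * K, lower.tail = FALSE))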
Chow 2: prediction test for TS

Sometimes, we do not have enough observations to estimate the LRM


separately for T1 and T2 as in the Chow 1 test.

In such case, we can use Chow 2: test of prediction unsuitability


(slightly different F -statistics).

The whole period is again divided into two subsets: T = T1 + T2 .


T1 is the “base” period (sample size)
T2 is the number of “additional” observations, it usually
corresponds to an ex-post prediction period
Chow 2: prediction test for TS

F = (SSRr − SSRur )/SSRur · (T1 − K)/T2  ∼  F ( T2 , T1 − K )  under H0

where
SSRur = SSR T1 (from LRM estimated for “base” period)
SSRr = SSR T (from LRM estimated for the whole period)
K is the number of parameters (including intercept) in LRM

H0 : additional (T2 ) observations come from the same DGP as


in T1 .
H1 : ¬H0 (assume significant differences between samples)
. . . If H0 is rejected, we would expect large differences
. . . between predictions and actual observations of yt .

If enough T1 and T2 observations are available, Chow 1 is preferred


(compared to Chow 2) as it has more “power”.
TS & forecasting

One-step-ahead forecast: ft
ft is the forecast of yt+1 made at time t
Forecast error et+1 = yt+1 − ft
Information set: It
Loss function: e²t+1 or |et+1 |
In forecasting, we minimize E(e²t+1 |It ) = E[(yt+1 − ft )²|It ]
Solution: E(yt+1 |It )

Multiple-step-ahead forecast ft,h


ft,h is the forecast of yt+h made at time t
Solution: E(yt+h |It )
TS & forecasting

For some processes, E(yt+1 |It ) is easy to obtain:


1 Martingale process (MP):
If E(yt+1 |yt , yt−1 , . . . , y0 ) = yt , ∀t ≥ 0, then {yt } is a MP and ft = yt
If a process {yt } is a martingale then {∆yt } is martingale
difference sequence (MDS)

E(∆yt+1 |yt , yt−1 , . . . , y0 ) = 0

2 Process with exponential smoothing:

E(yt+1 |It ) = αyt + α(1 − α)yt−1 + · · · + α(1 − α)^t y0 ;  0 < α < 1.

If we set f0 = y0 , then for t ≥ 1 : ft = αyt + (1 − α)ft−1


(Simplification: forecast for t + 1 is a weighted average of yt and
forecast of yt made at t − 1.)
TS & forecasting

3 Regression models
Static model: yt = β0 + β1 xt + ut
E(yt+1 |It ) = β0 + β1 xt+1 → Conditional forecasting
It contains xt+1 , yt , xt , . . . , y1 , x1
Here, knowledge of xt+1 is assumed (forecast condition).
E(yt+1 |It ) = β0 + β1 E(xt+1 |It ) → Unconditional forecasting
It contains yt , xt , . . . , y1 , x1
Here, xt+1 needs to be estimated before yt+1

Dynamic models depending on lagged variables only:


yt = δ0 + α1 yt−1 + γ1 xt−1 + ut
E(ut |It−1 ) = 0
E(yt+1 |It ) = δ0 + α1 yt + γ1 xt
Also, we can use more lags, drop or add regressors . . .
TS & forecasting

One-Step-Ahead Forecasting with


yt = δ0 + α1 yt−1 + γ1 xt−1 + ut :

point forecast: fˆt = δ̂0 + α̂1 yt + γ̂1 xt


forecast error: êt+1 = yt+1 − fˆt
s.e. of forecast: s.e.(êt+1 ) = {[s.e.(f̂t )]² + σ̂²}^{1/2}

forecast interval: essentially the same as prediction interval


approximate 95% forecast interval is: fˆt ± 1.96×s.e.(êt+1 )
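
A minimal R sketch of a one-step-ahead forecast and its standard error, using predict() on an estimated lag model (simulated data, illustrative):

set.seed(10)
n <- 60
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y <- numeric(n)
for (t in 2:n) y[t] <- 1 + 0.6 * y[t - 1] + 0.3 * x[t - 1] + rnorm(1)

dat <- data.frame(y = y[2:n], y.lag = y[1:(n - 1)], x.lag = x[1:(n - 1)])
fit <- lm(y ~ y.lag + x.lag, data = dat)

new <- data.frame(y.lag = y[n], x.lag = x[n])        # information available at time n
pr  <- predict(fit, newdata = new, se.fit = TRUE)
f.hat   <- pr$fit                                     # point forecast of y_{n+1}
se.fore <- sqrt(pr$se.fit^2 + pr$residual.scale^2)    # {[s.e.(f.hat)]^2 + sigma.hat^2}^(1/2)
c(forecast = f.hat,
  lower = f.hat - 1.96 * se.fore, upper = f.hat + 1.96 * se.fore)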
TS & forecasting

Example: File PHILLIPS


Forecasting US unemployment rate

unem^t = 1.572 + .732 unemt−1
         (.577)  (.097)

n = 48, R² = .544

unem^t = 1.304 + .647 unemt−1 + .184 inft−1
         (.490)  (.084)        (.041)

n = 48, R² = .677
Note that these regressions are not meant as causal equations. The
hope is that the linear regressions approximate well the conditional
expectation.
TS & forecasting

Evaluating forecast quality


We can measure how forecasted values fit to actual observations
(in-sample criteria, e.g. R2 )

It is better, however, to evaluate the forecasting performance


when forecasting out-of-sample values (out-of-sample criteria).
For this purpose, use first n observations for estimation, and the
remaining m observations to calculate the forecast errors ên+h

Forecast evaluation measures:


Mean Absolute Error: MAE = m⁻¹ Σ_{h=1}^{m} |ên+h |
Root Mean Squared Error: RMSE = (m⁻¹ Σ_{h=1}^{m} ê²n+h )^{1/2}
k-Fold Cross-Validation (kFCV) approach
. . . has limited empirical benefits in TS analysis
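
An out-of-sample evaluation sketch in R (illustrative): the first n observations are used for estimation, one-step-ahead forecasts are produced for the remaining m observations, and MAE / RMSE are computed from the forecast errors:

set.seed(12)
T.all <- 80; n <- 60; m <- T.all - n
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = T.all)) + 5

e.hat <- numeric(m)
for (h in 1:m) {
  est <- 1:(n + h - 1)                                 # expanding estimation window
  fit <- lm(y[est][-1] ~ y[est][-length(est)])         # y_t on y_{t-1}
  f   <- coef(fit)[1] + coef(fit)[2] * y[n + h - 1]    # one-step-ahead forecast
  e.hat[h] <- y[n + h] - f                             # forecast error
}
c(MAE = mean(abs(e.hat)), RMSE = sqrt(mean(e.hat^2)))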
TS & forecasting

Additional comments
Multiple-step-ahead forecasts are possible, but necessarily
less precise.
Forecasts may make use of deterministic trends, but the
error made by extrapolating time trends too far into the
future may be large.
Similarly, seasonal patterns may be incorporated into
forecasts.
It is possible to calculate confidence intervals for the point
multiple-step-ahead forecasts.
Forecasting I(1) time series can be based on adding
predicted changes (which are I(0)) to base levels.
Forecast intervals for I(0) series converge to the
unconditional variance, whereas for integrated series, they
are unbounded.
Finite and infinite distributed lag models

Finite and infinite distributed lag models

Polynomial distributed lag

Geometric distributed lag (Koyck transformation)

Rational distributed lag (RDL)

Partial adjustment model (PAM)

Adaptive expectations hypothesis (AEH)

Rational expectations
Finite and infinite distributed lag models

Finite distributed Lag (FDL) model:

yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + ut

Infinite distributed lag (IDL) model:

yt = α0 + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut

Dynamically complete (FDL) models

Model is dynamically complete if we have a “sufficient” number


of lags of regressors, so that no more additional lags would help
with explanation of variability in the dependent variable.

In dynamically incomplete models, we usually detect


autocorrelation in the error term of the LRM.
FDL: Polynomial distributed lag (Almon)
Used in Finite distributed lag models
. . . example below also extends to higher order polynomials

yt = α + β0 xt + β1 xt−1 + · · · + βm xt−m + ut (11)


yt = α + Σ_{i=0}^{m} βi xt−i + ut ,    (i is the lag index here)

Simplifying assumption: polynomial order = 2, m = 8

βi = k0 + k1 i + k2 i²    (12)

β0 = k0
β1 = k0 + k1 + k2
β2 = k0 + k1 ·2 + k2 ·4
...
βm = k0 + k1 m + k2 m²

[Figure: lag distribution βi plotted against lag i = 0, . . . , 8]
Polynomial distributed lag

Almon-type transformation of (11) for m = 8 and k = 2:

yt = α + k0 xt + (k0 + k1 + k2 )xt−1 + (k0 + 2k1 + 4k2 )xt−2 +
     + · · · + (k0 + 8k1 + 64k2 )xt−8 + ut ,    (13)

yt = α + k0 (xt + xt−1 + · · · + xt−8 ) +
     + k1 (xt−1 + 2xt−2 + · · · + 8xt−8 ) +
     + k2 (xt−1 + 4xt−2 + · · · + 64xt−8 ) + ut , i.e.

yt = α + k0 Σ_{i=0}^{8} xt−i + k1 Σ_{i=1}^{8} i xt−i + k2 Σ_{i=1}^{8} i² xt−i + ut ,    (14)

where the three sums define W0t , W1t and W2t , so that

yt = α + k0 W0t + k1 W1t + k2 W2t + ut    (15)

We estimate (15), then calculate βi (lags 0 to 8) as in (12)


. . . note the reduction in estimated parameters (10 vs 4).
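
The slide's construction in base R (simulated data; the true lag profile below is only illustrative). The W-variables are built as in (14), equation (15) is estimated by OLS, and the individual βi are recovered via (12):

set.seed(14)
T.all <- 200; m <- 8
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = T.all))
beta.true <- 0.5 - 0.1 * (0:m) + 0.005 * (0:m)^2        # a quadratic lag profile
X.lags <- embed(x, m + 1)                               # columns: x_t, x_{t-1}, ..., x_{t-m}
y <- as.numeric(1 + X.lags %*% beta.true + rnorm(T.all - m))

i  <- 0:m
W0 <- as.numeric(X.lags %*% i^0)                        # W0 = sum of x_{t-i}
W1 <- as.numeric(X.lags %*% i)                          # W1 = sum of i * x_{t-i}
W2 <- as.numeric(X.lags %*% i^2)                        # W2 = sum of i^2 * x_{t-i}

fit <- lm(y ~ W0 + W1 + W2)                             # estimate (15): alpha, k0, k1, k2
k   <- coef(fit)[2:4]
beta.hat <- k[1] + k[2] * i + k[3] * i^2                # recover beta_i via (12)
cbind(lag = i, beta.hat, beta.true)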
Polynomial distributed lag

Method developed by Shirley Almon in the 1960s.


Equations (14) & (15) can be generalized easily: for m lags, sums
go to m, for higher order polynomials, we add more W -terms.

Advantages of this approach:


Saves degrees of freedom
Removes the problem of multicollinearity
Does not affect the assumptions for u, because errors do not
change during transformation

In EViews, transformation is slightly modified.


In R, routines are available.
Infinite distributed lag models

Infinite distributed lag (IDL) models

Lagged regressors extend back to infinity


We cannot estimate IDL models without the use of simplifying
restrictions on parameters,
i.e. restrictions on lag distribution

IDL models are useful under the assumption of lagged


coefficients converging to zero as lag increases

Order of the IDL model (∞),


impact multiplier vs. long-run multiplier,
temporary vs. permanent change in x,
. . . all analogous to FDL models
Geometric distributed lag (Koyck)

IDL linear regression model: yt = f (xt , xt−1 , xt−2 , . . . ):

yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + δ3 xt−3 + · · · + ut

Assumption for the geometric δj weights:

δj = γρ^j ,  0 < ρ < 1,  j = 0, 1, 2, . . .
• (j is the lag index here)
• δ0 ≡ γ for convenience of notation (see RDL)
• |ρ| < 1 is not used (it would allow sign oscillations in δj ).
δj = δj−1 ρ,  0 < ρ < 1.
• follows from the assumption above

Instantaneous propensity (multiplier): δ0 ≡ γ = γρ⁰

Long-run propensity (multiplier):
δ0 + δ1 + δ2 + · · · = γ + γρ + γρ² + · · · = γ(1 + ρ + ρ² + . . . ) = γ/(1 − ρ)
Geometric distributed lag (Koyck)

Koyck transformation of the IDL model:

yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut    | δj = γρ^j

yt = α + γxt + γρxt−1 + γρ²xt−2 + · · · + ut    (16)

yt−1 = α + γxt−1 + γρxt−2 + γρ²xt−3 + · · · + ut−1    | × ρ    (17)

ρyt−1 = αρ + γρxt−1 + γρ²xt−2 + · · · + ρut−1    (18)

Now, we subtract (18) from (16):

yt − ρyt−1 = α(1 − ρ) + γxt + ut − ρut−1 ,   where α0 ≡ α(1 − ρ) and vt ≡ ut − ρut−1    (19)

yt = α0 + γxt + ρyt−1 + vt    (20)


Geometric distributed lag (Koyck)
IDL model: yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut
Koyck transf.: yt = α0 + γxt + ρyt−1 + vt
Using the Koyck transformation, we can calculate parameters
of the IDL model from the estimated model after Koyck
transformation:
δ̂0 = γ̂
δ̂j = γ̂ ρ̂^j ;  j = 1, 2, 3, . . .
α̂ = α̂0 /(1 − ρ̂)

Problems of the Koyck transformation:


in (20), regressor yt−1 is not exogenous (vt = ut − ρut−1 )
. . . IVR discussed separately
vt = ut − ρut−1 is not i.i.d.
Koyck transformation extends to models with multiple regressors
only if common ρ (geometric decay) can be assumed.
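
A small R sketch of estimating the Koyck-transformed equation (20) by OLS and backing out the IDL parameters. For simplicity the data are generated directly from (20) with i.i.d. errors, which sidesteps the endogeneity problem listed above; with genuine IDL data (MA(1) error vt ), an IV estimator would be needed, so treat this as illustrative only:

set.seed(15)
T.all <- 300
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = T.all))
alpha0 <- 1; gamma <- 0.8; rho <- 0.5
y <- numeric(T.all)
for (t in 2:T.all) y[t] <- alpha0 + gamma * x[t] + rho * y[t - 1] + rnorm(1)

fit <- lm(y[-1] ~ x[-1] + y[-T.all])          # y_t on x_t and y_{t-1}
b   <- coef(fit)
gamma.hat <- b[2]; rho.hat <- b[3]
alpha.hat <- b[1] / (1 - rho.hat)             # alpha.hat = alpha0.hat / (1 - rho.hat)
delta.hat <- gamma.hat * rho.hat^(0:5)        # first few IDL lag coefficients
LRP.hat   <- gamma.hat / (1 - rho.hat)        # long-run propensity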
Rational distributed lag (RDL)

The geometric distributed lag is a special case of


rational distributed lag (RDL) model:

yt = α0 + γxt + ρyt−1 + vt (geometric distributed lag)


yt = α0 + γ0 xt + γ1 xt−1 + ρyt−1 + vt (RDL) (21)

This can be shown by successive substitution for the yt−1 term


of RDL equation (21), where we get:

yt = α + γ0 xt + (ργ0 + γ1 )xt−1 + ρ(ργ0 + γ1 )xt−2 +
     + ρ²(ργ0 + γ1 )xt−3 + ρ³(ργ0 + γ1 )xt−4 + . . .
     + ρ^{h−1}(ργ0 + γ1 )xt−h + · · · + ut    (22)

After estimating (21), we can calculate lag distribution for (22)


Rational distributed lag (RDL)

RDL specification:
yt = α0 + γ0 xt + γ1 xt−1 + ρyt−1 + vt
can be used to calculate δh in the IDL model:
yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut
With RDLs, impact propensity γ0 ≡ δ0 can differ in sign from lagged
coefficients.
δh = ρ^{h−1}(ργ0 + γ1 ) corresponds to the xt−h variable for h ≥ 1.
. . . δ0 may differ in sign from “lags”, even if 0 < ρ < 1.
. . . for ρ > 0, δh doesn’t change sign with growing h ≥ 1.
Long-run propensity: LRP = (γ0 + γ1 )/(1 − ρ),
where (1 − ρ) > 0 ⇒ the sign of LRP follows the sign of (γ0 + γ1 ).
Also, ut = vt + ρvt−1 + ρ²vt−2 + · · · , an MA(∞) process.
Koyck transformation and RDL model – summary

IDL model specification:


yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + δ3 xt−3 + · · · + ut

Koyck: yt = α0 + γxt + ρyt−1 + vt


RDL: yt = α0 + γ0 xt + γ1 xt−1 + ρyt−1 + vt

Table 1: Koyck vs. RDL coefficients

                              Koyck            RDL
Impact multiplier             δ0 = γ           δ0 = γ0
Lagged multiplier (lag h)     δh = γρ^h        δh = ρ^{h−1}(ργ0 + γ1 )
LRP                           γ/(1 − ρ)        (γ0 + γ1 )/(1 − ρ)
Partial adjustment model

Partial adjustment model (PAM)


is based on two main assumptions:
1 LRM describes long-run behavior of yt∗ : the unobserved,
expected/equilibrium/target/optimum value of yt :

yt∗ = α + βxt + ut (23)

2 Between two time periods, yt follows the process:

yt − yt−1 = θ(yt∗ − yt−1 ), 0<θ<1 (24)

Hence, the actual ∆yt is only a fraction of the “desirable” change


from yt−1 to the optimum value of yt∗ .
. . . in the special case of θ = 1, ∆yt leads to optimum.
Note: (24) can be re-written as: yt = θyt∗ + (1 − θ)yt−1
Partial adjustment model (PAM)
Parameter estimation of PAM:

yt∗ = α + βxt + ut (23)


yt = θyt∗ + (1 − θ)yt−1 , 0<θ<1 (24)

1 Substitute for yt∗ in (24) from (23):

yt = αθ + βθxt + (1 − θ)yt−1 + θut
yt = β′0 + β′1 xt + β′2 yt−1 + θut    (25)

2 Estimate (25) and then calculate sought parameters of the PAM
in (23) and (24):
θ̂ = 1 − β̂′2
α̂ = β̂′0 /θ̂
β̂ = β̂′1 /θ̂

Note that θut can be independent of yt−1 and i.i.d.


Partial adjustment model (PAM)

Parameter interpretation of PAM:

yt∗ = α + βxt + ut
yt = θyt∗ + (1 − θ)yt−1 , 0<θ<1

yt = αθ + βθxt + (1 − θ)yt−1 + θut
yt = β′0 + β′1 xt + β′2 yt−1 + θut

Parameters:

θ : θ = (1 − β′2 ) is the adjustment coefficient; higher θ̂ indicates
higher speed of adjustment towards equilibrium.
β′1 : impact multiplier (short-run marginal propensity)
β : β = β′1 /θ is the long-run multiplier.
Adaptive expectations hypothesis (AEH)
Adaptive expectations hypothesis (AEH)
model is based on two main assumptions:

1 LRM describes behavior of yt , as a function of x∗t : the


unobserved, expected/equilibrium/target/optimum value
of xt (permanent income, potential output, etc.):

yt = α + βx∗t + ut . (26)

2 The unobserved x∗t process is defined as:

x∗t − x∗t−1 = φ(xt − x∗t−1 ), 0<φ<1


⇓ (27)
x∗t = φxt + (1 − φ)x∗t−1 .

with φ = 0 for static expectations


and φ = 1 for immediate adjustment.
Note: alternative 2nd hypothesis: x∗t = φxt−1 + (1 − φ)x∗t−1 .
Adaptive expectations hypothesis (AEH)
Parameter estimation of AEH model:

yt = α + βx∗t + ut , (26)

x∗t = φxt + (1 − φ)x∗t−1 (27)


Successive substitution for x∗t from (27) to (26): IDL process

yt = α + βφxt + βφ(1 − φ)xt−1 + βφ(1 − φ)²xt−2 + · · · + ut    (28)

After applying Koyck transformation, we get

yt = αφ + βφxt + (1 − φ)yt−1 + vt
yt = β′0 + β′1 xt + β′2 yt−1 + vt    (29)

Estimate (29), then calculate parameters in (26) and (27):
φ̂ = 1 − β̂′2   (φ is the “adaptive expectations coefficient”)
α̂ = β̂′0 /φ̂
β̂ = β̂′1 /φ̂   (β′1 and β are the SR and LR propensities)
Note: Problems of Koyck-transformed model estimation apply.
Koyck, PAM, AEH: regression of yt on xt and yt−1

The same underlying regression model (statistical form) is used:


The Koyck transformation:
yt = α0 + γxt + ρyt−1 + vt
The Partial adjustment model (PAM):
yt = αθ + βθxt + (1 − θ)yt−1 + θut
The Model with adaptive expectations (AEH):
yt = αφ + βφxt + (1 − φ)yt−1 + vt

We can make three different interpretations from one estimated


equation.

Of course, not all interpretations are always relevant, we must choose


according to application and have to test the assumptions made.
Koyck, PAM, AEH: regression of yt on xt and yt−1

Example: Model for ct ← f (xt , xt−1 , xt−2 , . . . ):


private consumption (ct ) as a function of disposable income (xt ).
Same estimated model for Koyck, PAM, AEH:

ĉt = 1, 038 + 0.404xt + 0.501ct−1


Koyck: ĉt = α̂0 + γ̂xt + ρ̂ct−1

Koyck: IDL, geometric decay in δ parameters assumed:


ρ̂ = 0.501
α̂0 = 1,038 = α̂(1 − ρ̂) = α̂(1 − 0.501) ⇒ α̂ = 1,038/0.499 = 2,080
δ̂j = γ̂ ρ̂^j = 0.404 × 0.501^j
LRP = γ̂/(1 − ρ̂) = 0.404/0.499 ≈ 0.81

IDL: ĉt = 2,080 + 0.404 xt + 0.202 xt−1 + 0.101 xt−2 + . . .
     (coefficients: γ̂ ρ̂⁰, γ̂ ρ̂, γ̂ ρ̂², . . . )
Koyck, PAM, AEH: regression of yt on xt and yt−1

Example continued:

ĉt = 1, 038 + 0.404xt + 0.501ct−1


PAM: ĉt = α̂θ̂ + β̂ θ̂xt + (1 − θ̂)ct−1

(1 − θ̂) = 0.501 ⇒ θ̂ = 0.499
α̂θ̂ = 1,038 ⇒ α̂ = 1,038/0.499 = 2,080
β̂ θ̂ = 0.404 ⇒ β̂ = 0.404/0.499 ≈ 0.81

PAM: ĉ∗t = 2, 080 + 0.81xt


ct − ct−1 = 0.499 · (c∗t − ct−1 )

If ct has a prominent inertia and ∆ct significantly follows


changes in habits, we might use the PAM approach.
Koyck, PAM, AEH: regression of yt on xt and yt−1

Example continued:

ĉt = 1, 038 + 0.404xt + 0.501ct−1


AEH: ĉt = α̂φ̂ + β̂ φ̂xt + (1 − φ̂)ct−1

(1 − φ̂) = 0.501 ⇒ φ̂ = 0.499
α̂φ̂ = 1,038 ⇒ α̂ = 1,038/0.499 = 2,080
β̂ φ̂ = 0.404 ⇒ β̂ = 0.404/0.499 ≈ 0.81

AEH: ĉt = 2, 080 + 0.81x∗t


x∗t = 0.499xt + 0.501x∗t−1

If ct is formed as a function of expected (e.g. permanent)


income, we might prefer AEH.
Rational expectations

Rational expectations

Et−1 (xt ) = a0 + a1 xt−1 + b1 z1,t−1 + b2 z2,t−2 + . . .

Et−1 (xt ): expected value of xt at time t − 1


zk,t−j : exogenous variables with impact on Et−1 (xt )

We put x∗t = Et−1 (xt ) into (26): yt = α + βx∗t + ut

We assume that agents:


know all relevant information
know how to use this information

Agents can make prediction errors (vt ), so:

xt = x∗t + vt
Rational vs. adaptive expectations

Under rational expectations:

Expected value of prediction errors [vt = xt − x∗t ] must be zero.


If they were systematically different from zero, rational agents
would immediately adjust their forecasting methods accordingly.

[vt = xt − x∗t ] prediction error must be uncorrelated with any


information available when the prediction is made (t − 1). If not,
this would imply that the forecaster has not made use of all
available information.

These properties can be used for testing the rational


expectations hypothesis in different applications
(e.g. through ex-post simulated dynamic forecasts).
Some economic application that use expectations

Phillips curve (Expectations-augmented Phillips curve)


Efficient market hypothesis (EMH)
Consumption function - Permanent income hypothesis
