
Block 2

Regression models based on time series


Stationarity, cointegration, ECM,
Distributed lag models, PAM, AEH

Advanced econometrics 1 4EK608


Pokročilá ekonometrie 1 4EK416

Vysoká škola ekonomická v Praze


Outline

1 Time series-based LRMs – repetition from BSc. courses

2 Stationarity, unit root tests, cointegration


Unit root tests
Cointegration, ECM

3 TS & forecasting

4 Finite and infinite distributed lag models


Polynomial distributed lag
Geometric distributed lag (Koyck)
Rational distributed lag (RDL)
Partial adjustment model (PAM)
Adaptive expectations hypothesis (AEH)
Rational expectations
Time series-based LRMs – repetition from BSc. courses
TS is a stochastic (random) process, a sequence of observations
indexed by time.
Observed TS: one realization of a stochastic process.

Static models
yt = β1 + β2 xt + ut , t = 1, 2, . . . , n

Dynamic models
Finite distributed Lag (FDL) model:
yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + ut
Infinite distributed lag (IDL) model:
yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + · · · + ut
Dynamic models: lag order
For convenience, β subscripts follow lag order,
Impact / lagged / long-run multiplier,
Effect of temporary (one-off) × permanent increase in x,
Lag distribution (function).
G-M assumptions for TSRMs

TS.1 Linearity
The stochastic process {(xt1 , xt2 , . . . , xtk , yt ); t = 1, 2, . . . , n}
follows a linear model yt = β1 + β2 xt2 + · · · + βK xtK + ut .

TS.2 No perfect collinearity


There is no perfect collinearity among regressors.
Comment: imperfect (non-perfect) collinearity among regressors is allowed.

TS.3 Strict exogeneity


For each t, the expected value of error conditionally on the
explanatory variables at all time periods is zero:
E(ut |X) = 0, t = 1, 2, . . . , n

Note: Compare strict exogeneity to a relaxed version,


contemporaneous exogeneity (discussed next):
E(ut |xt1 , xt2 , . . . , xtk ) = E(ut |xt ) = 0.
G-M assumptions for TSRMs

TS.4 Conditional homoscedasticity

var(ut |X) = var(ut ) = σ 2 , t = 1, 2, . . . , n

TS.5 Serial correlation (autocorrelation) is not present

corr(ut , us |X) = 0, t ≠ s

TS.6 Normality
ut are independent of X and i.i.d.: ut ∼ N(0, σ²)

CLRM: Classical linear regression model


TS.1 - TS.6 conditions hold
Properties of OLS estimators

Under TS.1 - TS.3, OLS estimators are unbiased.

Under assumptions TS.1 - TS.5,


var(β̂) = σ̂²(X′X)⁻¹ and σ̂² = SSR/(n − K), where (n − K) = d.f.

Under assumptions TS.1 - TS.5,


OLS estimators are BLUE (conditional on X).

Under assumptions TS.1 - TS.6, β̂j are normally distributed.


Under H0 , each t statistic has a t distribution and the F statistic
has an F distribution (small-sample and asymptotically). The
usual construction of confidence intervals is also valid.
Trends and spurious regression

Regression: y on t
If there is a linear trend in y

Regression: log y on t
Exponential trend, constant rate of growth of y

Spurious regression:
We may find a relationship between two or more trending variables
even if none exists in reality (non-stationarity and
cointegration topics discussed next).
Detrending and deseasonalizing

Detrending algorithm (based on FWL theorem):

ŷt = β̂1 + β̂2 xt2 + β̂3 xt3 + β̂4 t

Regress each variable on a constant and time, and save residuals

ÿt , ẍt2 , ẍt3 , t = 1, 2, . . . , n

Regress ÿt on ẍt2 , ẍt3

ÿˆt = βˆ2 ẍt2 + βˆ3 ẍt3

Coefficients βˆ2 , βˆ3 from this regression are the same as in the
original regression
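
A minimal R sketch of the detrending algorithm (simulated data, illustrative variable names), confirming that the slope coefficients from the detrended regression match those from the regression that includes the time trend:

# FWL-based detrending, illustrative simulated data
set.seed(42)
n  <- 100
trend <- 1:n
x2 <- 0.5 * trend + rnorm(n)
x3 <- 0.2 * trend + rnorm(n)
y  <- 1 + 0.8 * x2 - 0.3 * x3 + 0.1 * trend + rnorm(n)

coef(lm(y ~ x2 + x3 + trend))        # original regression with a time trend

# regress each variable on a constant and the trend, keep residuals
y.dd  <- resid(lm(y ~ trend))
x2.dd <- resid(lm(x2 ~ trend))
x3.dd <- resid(lm(x3 ~ trend))

coef(lm(y.dd ~ x2.dd + x3.dd))       # slopes on x2, x3 are identical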
Coefficient of determination when y is trending

With trending y, coefficient of determination overshoots.

R² = 1 − σ̂u² / σ̂y²

σ̂u² is an unbiased estimator (if a trend is among the regressors).

With trending y, σ̂y² = SST/(n − 1), where SST = Σ_{t=1}^{n} (yt − ȳ)²,
is neither an unbiased nor a consistent estimator of var(yt ).

Better approach: regress ÿt on ẍt1 , ẍt2 . The corresponding
coefficient of determination from this regression, i.e.:

R² = 1 − SSR / Σ_{t=1}^{n} ÿt²

is more reliable (does not overshoot) when compared to the
original regression.
Detrending and deseasonalizing

Deseasonalizing algorithm (based on FWL theorem):


Example based on quarterly data

ŷt = β̂1 + β̂2 xt2 + β̂3 xt3 + γ̂1 dummy1 + γ̂2 dummy2 + γ̂3 dummy3

Regress variables on constant, seasonal dummies and save residuals:

ÿt , ẍt2 , ẍt3 , t = 1, 2, . . . , n

Regress ÿt on ẍt2 , ẍt3

ÿˆt = βˆ2 ẍt2 + βˆ3 ẍt3

Coefficients βˆ2 , βˆ3 from this regression are the same as in the original
regression
Stationary and weakly dependent time series

Time series-based LRMs:


Strict exogeneity, homoscedasticity, absence of serial correlation
and normality assumptions are very limiting

With large samples, weaker assumptions are sufficient

For large samples, key assumptions are:


Covariance stationarity and weak dependency
A time series is strictly stationary if its marginal and all joint
distributions are invariant across time.

Covariance stationarity: first two moments and


auto-covariance do not change over time.
E(yt ) = µ, var(yt ) = σ 2 , cor(yt , yt+h ) = f (h)

Weak dependency: correlation between yt and yt+h


“quickly” converges to zero with h growing to infinity.
Stationary and weakly dependent time series

For Central Limit Theorem (CLT) and Law of Large Numbers


(LLN) to hold, dependency between observations must not be
too strong and must sufficiently quickly decrease with growing
time distance between them.

Time series can be non-stationary and weakly dependent.

Notes on CLT & LLN


CLT implies that the sum of independent random variables
(or weakly dependent RVs) – if centered and standardized by
its s.d. – has an asymptotic distribution N (0, 1).
LLN theorem implies that the average taken from a random sample
converges in probability to the population average; LLN holds for
stationary and weakly dependent series.
Stationary and weakly dependent time series
Examples of weakly dependent time series:

Moving average process of order one: ma(1)

yt = et + α1 et−1 ,

where et is an iid time series.


corr(yt , yt+1 ) = α1 /(1 + α1²),
observations with higher time distance than 1 are uncorrelated.

Stable autoregressive process of order 1: ar(1)


Under stability condition |ρ| < 1, it can be demonstrated that
(Wooldridge, Introductory econometrics, ch. 11.1):

yt = ρyt−1 + et ⇒ corr(yt , yt+h ) = ρ^h

If the stability condition holds, TS is weakly dependent because


correlation converges to zero with growing h.
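
A short R sketch (simulated, illustrative) of the two weakly dependent processes above; the sample autocorrelations cut off after lag 1 for the ma(1) and decay geometrically for the stable ar(1):

set.seed(1)
y.ma1 <- arima.sim(model = list(ma = 0.5), n = 200)   # ma(1) with alpha1 = 0.5
y.ar1 <- arima.sim(model = list(ar = 0.8), n = 200)   # stable ar(1) with rho = 0.8
acf(y.ma1)   # roughly zero beyond lag 1
acf(y.ar1)   # approximately rho^h decay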
Asymptotic properties of OLS estimators

TS.1’ Linearity
The stochastic process {(xt1 , xt2 , . . . , xtk , yt ); t = 1, 2, . . . , n}
follows the linear model yt = β0 + β1 xt1 + · · · + βk xtk + ut
We assume both dependent and independent variables are
stationary and weakly dependent.

TS.2’ No perfect collinearity


There is no perfect collinearity among regressors.
Comment: the same assumption as TS.2

TS.3’ Contemporaneous exogeneity


Null conditional expected value of errors:

E(ut |xt1 , . . . , xtk ) = E(ut |xt ) = 0


Asymptotic properties of OLS estimators

TS.4’ Contemporaneous homoscedasticity

var(ut |xt ) = var(ut ) = σ 2

TS.5’ No serial correlation


(autocorrelation in residuals is not present)

corr(ut , us |xt , xs ) = 0, t ≠ s
Asymptotic properties of OLS estimators
Under assumptions TS.1’, TS.2’ and TS.3’,

OLS estimators are consistent (not unbiased)


TS.1’ - TS.3’ ⇒ plim β̂j = βj , j = 0, 1, . . . , k

Removing strict exogeneity (TS.3 → TS.3’): no restriction on how ut


is related to regressors in other time periods. Hence:

We allow for feedback from (lagged) explained variable to


“future” values of explanatory variables

We can use lagged dependent variable as a regressor.

Theorem: Asymptotic normality of OLS: Under assumptions TS.1’ –


TS.5’, OLS estimators are asymptotically normally distributed.

Usual OLS standard errors, t-statistics and F -statistics are


asymptotically valid.
β̂ → N(β, σ̂²(X′X)⁻¹) as n → ∞
Serial correlation in TSRM

Causes and effects of autocorrelation

Causes:
DGP, dynamic incompleteness of models

Effects on OLS estimates:


Unbiased β̂j (only with strictly exogenous regressors),
Consistent β̂j (contemporaneous exogeneity holding),
Biased inference.

FGLS (only with strictly exogenous regressors)

OLS + robust inference (Newey-West & other HAC s.e.)


Serial correlation in TSRM
Testing AR(1) with strictly exogenous regressors

ρ estimation:
yt = β0 + β1 xt1 + · · · + βk xtk + ut
ut = ρut−1 + et
we estimate ρ from:
ût = ρût−1 + error
H0 : ρ = 0

Durbin-Watson test:

d = Σ_{t=2}^{n} (ût − ût−1 )² / Σ_{t=1}^{n} ût²

small-sample validity (conditions apply)
d-statistic: symmetric distribution on ⟨0, 4⟩, E(d) = 2
d ≈ 2(1 − ρ̂), i.e. ρ̂ ≈ 1 − d/2. Test for: H0 : ρ = 0.
Serial correlation

Testing serial correlation with general regressors

Testing AR(1):

ût = α0 + α1 xt1 + · · · + αk xtK + ρût−1 + error

H0 : ρ = 0
We can use a heteroscedasticity-robust version of the t-test.

Breusch-Godfrey test for AR(q) serial correlation:

ût = α0 + α1 xt1 + · · · + αk xtK + ρ1 ût−1 + · · · + ρq ût−q + error

H0 : ρ 1 = · · · = ρ q = 0

Use F -test or LM-test: (n − q)Ra² ∼ χ²(q) under H0 .
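
Both tests are available in R, e.g. in the lmtest package; a hedged sketch on simulated data (the model and parameter values are illustrative):

library(lmtest)                      # dwtest(), bgtest()
set.seed(7)
n <- 100
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))   # AR(1) errors
y <- 1 + 2 * x + u
fit <- lm(y ~ x)
dwtest(fit)                          # Durbin-Watson test for AR(1)
bgtest(fit, order = 4)               # Breusch-Godfrey LM test for AR(4)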
Adjusting LRMs for serial correlation
LRM on TS with autocorrelation (for |ρ| < 1):

yt = x′t β + ut ,  ut = ρut−1 + εt ,

we can substitute the autoregressive process into the main LRM:

yt = x′t β + ρut−1 + εt    note: ut−1 = yt−1 − x′t−1 β,

hence:

yt = ρyt−1 + x′t β − ρ(x′t−1 β) + εt ,


which is non-linear in parameters (ρβ) and can be

(a) Estimated by NLS (general regressors)

(b) Re-arranged into the generalized first differences specification:

(yt − ρyt−1 ) = (x′t − ρx′t−1 )β + εt ,

i.e. each regressor is transformed as (xtj − ρxt−1,j ), where |ρ| < 1,

and estimated by the ‘Hildreth-Lu’, ‘Cochrane-Orcutt’
or ‘Prais-Winsten’ methods.
Adjusting LRMs for serial correlation
LRM on TS with autocorrelation (for |ρ| < 1):

yt = x′t β + ut ,  ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x′t − ρx′t−1 )β + εt .

Hildreth-Lu:

a) Start by repeated OLS estimation of the generalized first


differences specification, using different (fixed) values of
ρ ∈ (−1, 1).
b) From such auxiliary regressions (i.e. for different ρ values), select
one that minimizes RSS.

Potentially computationally expensive (fine-grain iterations


along ρ-values). No advantage over NLS methods
(Hildreth-Lu is a form of “brute-force” NLS estimator).
Global optimum (min RSS) can be reached.
Adjusting LRMs for serial correlation
LRM on TS with autocorrelation (for |ρ| < 1):

yt = x′t β + ut ,  ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x′t − ρx′t−1 )β + εt .

Cochrane-Orcutt:

a) Start by OLS estimation of original LRM and obtain residuals


ût . Estimate ρ̂ from the ar(1) model ût = ρût−1 + εt .
b) Use ρ̂ to calculate generalized first differences (GFD) and
estimate β parameters of the GFD model.
c) Vector β̂ from step (b) can be used to produce ‘improved’
residuals ût = yt − x′t β̂ and subsequent improved ρ̂ estimates.
d) β and ρ estimates are iteratively improved until no substantial
change in the estimated value of ρ is observed.

Computationally accessible, typically just a few iterations.


Convergence to a local (not global) minimum of RSS can happen.
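
A minimal R sketch of the Cochrane-Orcutt iteration on simulated data (illustrative only; dedicated FGLS routines exist in several packages, which is an assumption about the reader's environment):

set.seed(11)
n <- 120
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- 2 + 1.5 * x + u

b   <- coef(lm(y ~ x))                              # (a) OLS on the original LRM
rho <- 0
for (i in 1:25) {
  uhat    <- y - b[1] - b[2] * x                    # current residuals
  rho.new <- coef(lm(uhat[-1] ~ 0 + uhat[-n]))[1]   # rho from ar(1) on residuals
  y.gd <- y[-1] - rho.new * y[-n]                   # (b) generalized first differences
  x.gd <- x[-1] - rho.new * x[-n]
  g    <- coef(lm(y.gd ~ x.gd))
  b    <- c(g[1] / (1 - rho.new), g[2])             # recover beta1 = intercept/(1 - rho)
  if (abs(rho.new - rho) < 1e-6) break              # (d) stop when rho stabilizes
  rho  <- rho.new
}
c(rho.hat = rho.new, beta1.hat = b[1], beta2.hat = b[2])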
Adjusting LRMs for serial correlation
LRM on TS with autocorrelation (for |ρ| < 1):

yt = x′t β + ut ,  ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x′t − ρx′t−1 )β + εt .

Prais-Winsten:

Small alteration to the Cochrane-Orcutt (CO) method.


CO disregards first observation from the dataset – due to
(generalized) first differences.

PW transformation retains the first observation with weight
√(1 − ρ²).
All remaining steps and features of the PW method follow
directly from CO.
In small samples (short TS), PW is more efficient than CO, as the sample
size for estimation of the transformed model does not decrease.
Adjusting LRMs for serial correlation

SLRM on TS with autocorrelation, example outline (for |ρ| < 1):

yt = β1 + β2 xt + ut , ut = ρut−1 + εt ,

(yt − ρyt−1 ) = β1 (1 − ρ) + β2 (xt − ρxt−1 ) + εt .


y∗t = β∗1 + β2 x∗t + εt

where

y = (y1 , y2 , y3 , . . . , yn )′,
y∗CO = (y2 − ρy1 , y3 − ρy2 , . . . , yn − ρyn−1 )′,
y∗PW = (y1 √(1 − ρ²), y2 − ρy1 , y3 − ρy2 , . . . , yn − ρyn−1 )′,

X-matrix elements are transformed by analogy,

FGLS (CO/PW) is the OLS on transformed data,

β̂1 = β̂∗1 /(1 − ρ̂) and β̂2 is used “directly” from the GFD form.
Stationarity, unit root tests, cointegration

Stationarity

Unit root tests

Cointegration
Weakly and strongly dependent TS

Weakly dependent time series

Moving average process of order one ma(1) yt = et + α1 et−1 ,


where et is i.i.d. time series.
yt and yt+h observations with distance h ≥ 2 are uncorrelated.
This process is stationary.

For stable autoregressive process of order 1 ar(1):


yt = ρyt−1 + et ⇒ cor(yt , yt+h ) = ρ^h

If stability condition |ρ| < 1 holds, the process is weakly


dependent because correlation converges to zero with growing h.
Also, this process is stationary for y0 = 0.
Weakly and strongly dependent TS
Strongly dependent time series:
Random walk:

yt = yt−1 + et ,  et ∼ Distr(0, σ²), iid


by consecutive substitutions:
yt = yt−2 + et−1 + et ,
yt = yt−3 + et−2 + et−1 + et ,
···
yt = y0 + e1 + · · · + et−1 + et .

Shocks have permanent effects, the series is not covariance stationary


and is strongly dependent.

E(yt ) = E(y0 )
var(yt ) = σe² t
cor(yt , yt+h ) = √(t/(t + h))

Correlation decreases very slowly and speed depends on t.


Strongly dependent TS

Two realizations of a random walk

Strongly dependent TS

Random walk with a drift

yt = α0 + yt−1 + et ⇒ yt = α0 t + et + et−1 + · · · + e1 + y0

A linear trend with random walk around the trend.


It is neither covariance stationary nor weakly dependent.

E(yt ) = α0 t + E(y0 )
var(yt ) = σe² t
cor(yt , yt+h ) = √(t/(t + h))

Correlation decreases very slowly and decline speed depends on t.


Weakly and strongly dependent TS

Realization of a random walk with a drift



Different realizations of trending TS (weakly dependent around


the trend) may produce similar time series.
Weakly and strongly dependent TS

yt = 1 · yt−1 + ut = yt−1 + ut

Unit root process: yt = yt−1 + ut ;


where: ut is a weakly dependent series.
Random walk is a special case of the unit root process
where: ut ∼ Distr (0, σu2 ), iid

We need to distinguish strongly and weakly dependent TS:


Economic reasons:
In strongly dependent series, shocks or policy changes have long
or permanent effects; in weakly dependent series, their effects are
only temporary.
Statistical reasons:
Analysis with strongly dependent series must be handled in
specific ways.
Integrated series

Terminology - Order of integration

Weakly dependent TS are integrated of order zero: I(0).

If we have to difference a TS once to get a weakly dependent TS,


then it is integrated of order 1: I(1).

Random walk – example of an I(1) process:

yt = yt−1 + et ⇒ ∆yt = yt − yt−1 = et


log yt = log yt−1 + et ⇒ ∆ log yt = et

A time series is integrated of order d: I(d), if it becomes


a weakly dependent TS after being differenced d times.
Unit roots tests

Unit root tests help to decide if a time series is I(0) or not

Use either some informal procedure or a unit root test

Informal procedures
Analyze autocorrelation of the first order

ρ̂ = estimated corr(yt , yt−1 )

If ρ̂ approaches 1, it indicates that the series can have unit


root. Alternatively, it could have a deterministic trend.
We can analyze sample autocorrelations using
a correlogram
Unit root tests

Correlogram: ρh = cov(yt , yt−h ) / (σyt · σyt−h )

[Figure: correlograms of an I(1)-like series vs. an I(0)-like series]


Unit root tests

Dickey-Fuller (DF) test – motivation


Unit root test in an ar(1) process:
yt = α + ρyt−1 + et
H0 : ρ = 1, H1 : ρ < 1

Under H0 , yt has a unit root.


◦ For ρ = 1 ∧ α = 0 → yt is a random walk.
◦ For ρ = 1 ∧ α ≠ 0 → yt is a random walk with a drift
and E(yt ) is a linear function of t .
Under H1 , yt is a weakly dependent ar(1) process.
Unit root tests

Dickey-Fuller (DF) test – motivation


Unit root test in an ar(1) process:
yt = α + ρyt−1 + et
H0 : ρ = 1, H1 : ρ < 1
For DF tests, H1 : ρ < 1 is a common simplification to the full space
of alternatives to H0 : ρ = 1.
For |ρ| < 1, yt is weakly dependent (as ρ^h → 0 for h → ∞)
However, if a unit root is likely to be present, the probability of
ρ < 0 is negligible.
We usually ignore the possibility of ρ > 1 , as it would lead to
explosive behavior in yt .
. . . |ρ| > 1 would allow for explosive oscillations in yt .
Dickey Fuller (DF) test

Basic equation for unit root test in an ar(1) process:


yt = α + ρyt−1 + et
For DF test, we apply a suitable transformation to yt :
we subtract yt−1 from both sides of the equation:

∆yt = α + (ρ − 1)yt−1 + et ; apply substitution: θ = (ρ − 1)


i.e. H0 : ρ = 1 ⇔ H0 : θ = 0
∆yt = α + θyt−1 + et ; now: H1 : ρ < 1 ⇔ H1 : θ < 0

We use a t-ratio for testing H0 : θ = 0. However:


Under H0 , t-ratios don’t have a t-distribution, but follow a
DF-distribution (its negative critical values are much farther
from zero).
Critical values for the DF distribution are available from
statistical tables and implemented in most relevant SW packages.
DF test & ADF test

TS with unit root can manifest various levels of complexity. Hence, DF test
for a given yt TS is usually performed using the following specifications:

∆yt = θyt−1 + et
random walk
∆yt = α + θyt−1 + et random walk with a drift
∆yt = α + θyt−1 + δt + et random walk with a drift and trend

DF test is the same (H0 : θ = 0) for all specifications /critical values differ/
Augmented Dickey-Fuller (ADF) test is a common generalization of DF test
(example: Augmentation of the DF test for the 2nd specification)

∆yt = α + θyt−1 + γ1 ∆yt−1 + · · · + γp ∆yt−p + et

When estimating θ, we control for possible ar(p) behavior in ∆yt .


ADF test has the same null hypothesis as a DF test → H0 : θ = 0.
Unit root tests in R: package {urca}

Description of the options for the ur.df() function:

1 type "none"
∆yt = θyt−1 + et
tau1: we test for H0 : θ = 0 (unit root)

2 type "drift"
∆yt = α + θyt−1 + et
tau2: H0 : θ = 0 (unit root)
phi1: H0 : θ = α = 0 (unit root and no drift)

3 type "trend"
∆yt = α + θyt−1 + δt + et
tau3: H0 : θ = 0 (unit root)
phi2: H0 : θ = α = δ = 0 (unit root, no drift, no trend)
phi3: H0 : θ = δ = 0 (unit root and no trend)
Multiple other unit root tests exist:
(KPSS, tests for seasonal data, break in the DGP, etc.).
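
A short R sketch using ur.df() from {urca} on a simulated series (illustrative only; the relevant critical values are reported by summary()):

library(urca)
set.seed(3)
y <- cumsum(rnorm(100))                            # random walk, i.e. an I(1) series
summary(ur.df(y, type = "drift", lags = 2))        # test in levels (tau2, phi1 statistics)
summary(ur.df(diff(y), type = "drift", lags = 2))  # test on the differenced series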
Unit root tests

ADF test for TS with trend

∆yt = α + θyt−1 + δt + γ1 ∆yt−1 + · · · + γp ∆yt−p + et

Under the alternative hypothesis of no unit root, the process is


trend-stationary.

The critical values in the ADF distribution with time trend are
even more negative as compared to random walk and random
walk with a drift.

When using DF/ADF specification 1 or 2 (R-W, R-W with drift)


to test for unit root in a clearly trending TS, the test would not
have sufficient power (we would not reject H0 for trending
weakly dependent TS).
Unit roots and trend-stationary series

∆yt = α + θyt−1 + δt + γ1 ∆yt−1 + · · · + γp ∆yt−p + et

Terminology:

Stochastic trend: θ = 0
Also called difference-stationary process: yt can be
turned into I(0) series by differencing. Terminology
emphasizes stationarity after differencing yt instead of weak
dependence in differenced TS.
Deterministic trend: δ 6= 0, θ < 0
Also called trend-stationary process: has a linear trend,
not a unit root. yt is weakly dependent - I(0) - around its
trend. We can use such series in LRMs, if trend is also used
as regressor.

DF/ADF tests are not precise tools. Distinguishing between


stochastic and deterministic trend is not easy (sample size!).
Stationarity in ar(p) processes

Introduction: lag operators:


for some ar(p) TS xt :

Lxt = xt−1
L(Lxt ) = L²xt = xt−2
···
Lᵖxt = xt−p

Using lag operators,

ar(p) process : xt = α + φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + ut

can be rewritten as:

(1 − φ1 L − φ2 L² − · · · − φp Lᵖ)xt = α + ut
Stationarity in ar(p) processes

(1 − φ1 L − φ2 L² − · · · − φp Lᵖ)xt = α + ut    (1)

Stochastic process (1) will only be stationary if the roots of the
corresponding equation (2) are all greater than unity in absolute value:

1 − φ1 L − φ2 L² − · · · − φp Lᵖ = 0    (2)

Illustration 1 – ar(1) process:


xt = α + φxt−1 + ut (3)
(1 − φL)xt = α + ut

1 − φL = 0
L = 1/φ
For (3) to be stationary, |L| > 1 ↔ −1 < φ < 1
Stationarity in ar(p) processes

Illustration 2 – ar(3) process:

xt = 2 + 3.9xt−1 + 0.6xt−2 − 0.8xt−3 + ut


To evaluate stationarity of xt , we use

1 − 3.9L − 0.6L² + 0.8L³ = 0,

which can be factorized:

(1 − 0.4L)(1 + 0.5L)(1 − 4L) = 0

1st root :L = 2.5


2nd root :L = −2
3rd root :L = 0.25 ⇒ TS is non-stationary
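
The roots can be verified numerically with base R's polyroot(), which takes the coefficients in increasing powers of L:

# 1 - 3.9L - 0.6L^2 + 0.8L^3 = 0
roots <- polyroot(c(1, -3.9, -0.6, 0.8))
roots          # 2.5, -2 and 0.25 (up to numerical precision)
abs(roots)     # stationarity needs all moduli > 1; here one root is 0.25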
Handling trend-stationary time series

Trend-stationary TS fulfill TS.1’ assumption. We can use them


in regressions if we have time trend among regressors.

Strongly dependent time series do not fulfill TS.1’ assumption.


We cannot use them in regressions directly.

Sometimes, we can transform such series into weakly dependent


time series.
Sometimes, taking logarithms helps.
Differencing is popular, but it has drawbacks.
As a special case, LRMs can be estimated and inference
performed if TS in the model are cointegrated.
Handling strongly dependent time series
Example - differencing TS:
yt = β0 + β1 xt + εt yt , xt ∼ I(1) (4)
yt−1 = β0 + β1 xt−1 + εt−1 εt ∼ i.i.d. (5)
∆yt = β1 ∆xt + vt vt = εt − εt−1 (6)

Coefficient β1 does not change between (4) and (6).


However, equations (4) and (6) are different.
⇒ β1 has two interpretations now: it is the change in yt for a
unit change in xt , but it is also the change in the growth of y for
a unit change in the growth of x.
Problems involved in the approach:
1 Differenced errors vt are no longer i.i.d.
2 In model (6), we lose information linked with the levels of
variables; short-term relations are stressed.
3 Information drawn from (6) often generates bad long-term
predictions of level variables: ∆ŷt = β̂1 ∆xt ; . . . what if
β0 ≠ 0?
Handling strongly dependent time series

Cointegrated TS – motivation
Some properties of integrated processes

1 The sum of stationary and non-stationary series must be


non-stationary.

2 Consider a process yt = α + βxt :


· If xt is stationary then yt will be stationary.
· If xt is non-stationary then yt will be non-stationary.

3 If two time series are integrated of different orders, then any


linear combination of the series will be integrated at the higher
of the two orders of integration.

4 Sometimes (“special case scenario”) it turns out that a linear


combination of two I(d) series is integrated of order less than d.
Spurious regression or cointegration

Spurious regression Regressing one I(1)-series on another


I(1)-series may lead to extremely high t-statistics even if the
series are completely independent. Similarly, the R² of such
regressions tends to be very high.
Regression analysis involving time series that have a unit root
may generate completely misleading inferences.

Cointegration Fortunately, regressions with I(1)-variables are


not always spurious: If there is a stable relationship between
time series that, individually, display unit root behavior, these
time series are called “cointegrated”.
Spurious regression or cointegration

General definition of cointegration

Two I(1)-time series yt , xt are said to be cointegrated if there exists a


stable relationship between them, where:

yt = α + βxt + et , et ∼ I(0)

Cointegration (CI) test if CI parameters are known

For residuals of the known CI relationship:

et := yt − α − βxt ,

test whether the residuals have a unit root


(DF/ADF and other unit root tests may be applied “directly”).
If the unit root H0 is rejected, yt , xt are cointegrated.
Spurious regression or cointegration

Testing for CI if the parameters are unknown


If the potential relationship is unknown, it can be estimated by
OLS. After that, we test whether the regression residuals have a
unit root. If the unit root is rejected, this “means” (test result
interpretation at a given significance level) that yt & xt are
cointegrated. Due to the pre-estimation of parameters, critical
values are different than in the case of known parameters.
(Software handles this automatically.)
The CI relationship may include a time trend
If the two series have differential time trends (drifts in this case),
the deviation between them may still be I(0) but with a linear
time trend. In this case one should include a time trend in the
CI-regression. Also, we have to use different critical values when
testing residuals.
(Software handles this automatically.)
Cointegration tests based on regression residuals

Engle-Granger test estimates a p-lag ADF equation:

∆ût = θ ût−1 + γ1 ∆ût−1 + · · · + γp ∆ût−p + et

Essentially, this is an ADF test on ût [θ = (ρ − 1)]

Specific critical values apply (farther from 0 than t or DF ).

Phillips-Ouliaris test estimates a DF equation:


∆ût = θ ût−1 + et

The t-ratio is based on robust standard errors,


different estimators exist for the robust standard errors.

In both cases (EG and PO), H0 of unit root in û


i.e. “no-cointegration” is tested.
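
A hedged R sketch of residual-based cointegration testing on simulated I(1) series. Note that once the CI parameters are estimated (next slide), the standard DF/ADF critical values printed by ur.df() do not apply to the residuals and Engle-Granger critical values must be used; ca.po() in {urca} implements the Phillips-Ouliaris test:

library(urca)
set.seed(5)
n <- 200
x <- cumsum(rnorm(n))                 # I(1) regressor
y <- 1 + 0.5 * x + rnorm(n)           # cointegrated with x (I(0) errors)

uhat <- resid(lm(y ~ x))              # residuals of the (estimated) CI relationship
summary(ur.df(uhat, type = "none", lags = 1))     # ADF-type regression on the residuals
                                                  # (use EG critical values, not DF tables)

summary(ca.po(cbind(y, x), demean = "constant"))  # Phillips-Ouliaris test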
Error correction model (ECM)

It can be shown that when variables are cointegrated, i.e. when


there exists a long-term relationship among them, their
short-term dynamics are related as in a so-called error correction
model (ECM).
Error correction model (ECM)

Autoregressive distributed lag models

Autoregressive distributed lag model with one regressor


ADL(p, q) : yt = β0 + Σ_{i=1}^{p} βi yt−i + Σ_{j=0}^{q} γj xt−j + ut ,   ut ∼ iid(0, σ²)

There are many useful modifications/simplifications/restrictions


to the general ADL(p, q) process. For example, ADL(1,1) is
equivalent to an ECM specification with one lag used in the
error-correction mechanism (shown next):

ADL(1, 1) : yt = β0 + β1 yt−1 + γ0 xt + γ1 xt−1 + ut . (7)

Additional ADL(1, 1) restrictions: β1 = 1 and γ1 = −γ0

give a model in 1st diffs.: ∆yt = β0 + γ0 ∆xt + ut .


Error correction model (ECM)

For ADL(1,1) model (7), suppose there is an equilibrium value x◦ and


in the absence of shocks, xt → x◦ as t → ∞. Then, assuming absence
of ut errors, yt converges to steady state: y ◦ .
Hence, the ADL(1,1) model (7) can be re-written as:

y ◦ = β0 + β1 y ◦ + (γ0 + γ1 )x◦

Solving this for y ◦ as a function of x◦ , we get


y◦ = β0 /(1 − β1 ) + [(γ0 + γ1 )/(1 − β1 )] x◦ = β0 /(1 − β1 ) + λx◦

where λ ≡ (γ0 + γ1 )/(1 − β1 ) and |β1 | < 1 is assumed.
Error correction model (ECM)

y◦ = β0 /(1 − β1 ) + λx◦ ,    λ ≡ (γ0 + γ1 )/(1 − β1 ),    |β1 | < 1

λ is the long-run derivative of y ◦ with respect to x◦ .

λ is an elasticity if both y ◦ and x◦ are in logs.

λ̂ can be computed directly from the estimated parameters of the


ADL(1,1) model (7).
Error correction model (ECM)

The ADL(1,1) equation (7) - repeated here for convenience:

yt = β0 + β1 yt−1 + γ0 xt + γ1 xt−1 + ut ,

can be equivalently rewritten as follows:

∆yt = β0 + (β1 − 1)(yt−1 − λxt−1 ) + γ0 ∆xt + ut . (8)


Again, λ ≡ (γ0 + γ1 )/(1 − β1 ) and |β1 | < 1 is assumed.
Equation (8) is an error-correction model (ECM).
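
To verify the re-parameterization, subtract yt−1 from both sides of (7) and add and subtract γ0 xt−1 on the right-hand side:

∆yt = β0 + (β1 − 1)yt−1 + γ0 ∆xt + (γ0 + γ1 )xt−1 + ut
    = β0 + (β1 − 1)(yt−1 − λxt−1 ) + γ0 ∆xt + ut ,

since (β1 − 1)(−λ) = (1 − β1 )λ = γ0 + γ1 .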
Error correction model (ECM)

ECM: ∆yt = β0 + (β1 − 1)(yt−1 − λxt−1 ) + γ0 ∆xt + ut .

(yt−1 − λxt−1 ) measures the extent to which the long run


equilibrium between yt and xt is not satisfied (at t − 1).
Consequently, (β1 − 1) can be interpreted as the proportion of
the disequilibrium (yt−1 − λxt−1 ) that is reflected in the
movement of yt , i.e. in ∆yt .
(β1 − 1)(yt−1 − λxt−1 ) is the error-correction term.
Many ADL(p, q) specifications can be re-written as ECMs.
ECMs can be used with both stationary & cointegrated
non stationary TS.
The ECM term (β1 − 1) is essentially the same as θ from the
Partial adjustment model (discussed next).
Error correction model (ECM)

Some more complicated ECMs:

1) We can use higher order lags, e.g. ADL(2,2):

yt = β0 + β1 yt−1 + β2 yt−2 + γ0 xt + γ1 xt−1 + γ2 xt−2 + ut ,

to establish ECMs. It is again possible to rearrange and


re-parametrize ADL(2,2) to get an ECM. More than one
re-parameterization is possible.

2) More than two variables can enter into an equilibrium


relationship.
ECM: non-stationary & cointegrated series

Superconsistency: yt = β0 + β1 xt + ut
1 Provided xt and yt are cointegrated, the OLS estimators β̂0 and
β̂1 will be consistent.
2 β̂j converge in probability to their true values βj more quickly in
the cointegrated non-stationary case than in the stationary case
(asymptotic efficiency).

Consequences:
For simple static regression between two cointegrated variables:
yt , xt ∼ C(1, 1), super-consistency applies (with deterministic
regressors such as intercept and trend added upon relevance).
Dynamic misspecifications do not necessarily have serious
consequences. This is a large sample property - in small samples, OLS
estimators are biased.
(Specific statistical inference applies to cointegrating vectors.)
ECM: non-stationary & cointegrated series

Granger representation theorem:¹


If two TS xt and yt are cointegrated, the short-term disequilibrium
relationship between them can be expressed in the ECM form
∆yt = lagged(∆y, ∆x) − δut−1 + εt (9)

where ut−1 = yt−1 − β0 − β1 xt−1 is the disequilibrium error and δ is a


short-run adjustment parameter.
Note: as u is on the scale of y, δ can be interpreted in percentages.
Example: δ = 0.8 → 80% of the disequilibrium error gets corrected between
t − 1 and t (on average).
Two implications:
1 The general-to-specific model search can focus on ECMs
2 Engle-Granger two-stage procedure

¹ Engle and Granger (1987)
ECM: non-stationary & cointegrated series

Engle-Granger two-stage procedure:


We short-cut the search of an ECM from a general model:

1st stage: Estimation of the cointegrating (static) regression and


saving residuals
ût = yt − β̂0 − β̂1 xt

2nd stage: Use residuals ût−1 in (9) instead of ut−1 and estimate by
OLS

Estimators are consistent and asymptotically efficient, but biased in


small samples.
Assumptions: yt and xt are non-stationary and cointegrated.
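
A minimal R sketch of the two stages on simulated cointegrated series (illustrative):

set.seed(8)
n <- 200
x <- cumsum(rnorm(n))                 # I(1)
y <- 1 + 0.5 * x + rnorm(n)           # cointegrated with x

# 1st stage: cointegrating (static) regression, save residuals
uhat <- resid(lm(y ~ x))

# 2nd stage: ECM by OLS, using the lagged residuals as the disequilibrium error
dy     <- diff(y)
dx     <- diff(x)
u.lag1 <- uhat[-n]                    # u_{t-1}, aligned with dy and dx
summary(lm(dy ~ dx + u.lag1))         # coefficient on u.lag1 estimates -delta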
ECM: non-stationary & cointegrated series

Possibility of more cointegrating vectors:


Long-run relationship: yt = β0 + β1 xt + β2 wt + β3 zt + ut ,
assume all observed variables are I(1) and cointegrating relationship
exists – then the disequilibrium error is:

ut = [yt − β0 − β1 xt − β2 wt − β3 zt ] ∼ I(0). (10)

If a linear combination of variables such as (10) is stationary, then the


coefficients in this relationship form a cointegrating vector, e.g.
(1, −β1 , −β2 , −β3 ).
In the multivariate case, there may be more than one linearly
independent stationary combination linking the cointegrated variables
(topic discussed separately along VAR/VECM estimators).
Cointegration: the existence of at least one cointegrating vector.
Cointegration among more than two variables

Testing and estimation


Cointegration can be tested using the EG and/or PO tests
Only one cointegrating vector exists
Estimation can proceed by the Engle-Granger two-stage method for
ECMs.
Two or more cointegrating vectors
Engle-Granger two-stage method is not applicable. Johansen (1988)
suggests a maximum likelihood approach.
TS & forecasting

Chow tests

Forecasts from TS-based models


Chow tests

For any LRM: y = Xβ + u

Say, the sample (time series) for a period t = 1, 2, . . . , T may be


conveniently divided into two groups: T1 + T2 = T .
[ consider two periods: fixed vs. floating F/X rates ]
[ pre-EU accession vs. post-EU accession period ]
[ applies to CS data as well; e.g Male/Female ]

Now, the LRM’s vectors and matrices may be partitioned as


follows:
     
y1 X1 u
= β+ 1
y2 X2 u2
where y10 = (y1 , . . . , yT1 ), y20 = (yT1 +1 , . . . , yT ), etc.
i.e. {y1 , X1 } ∈ T1 , {y2 , X2 } ∈ T2 .
Chow tests

For any LRM: y = Xβ + u, Chow test can be based on an auxiliary


regression (unrestricted model for the F test):

[ y1 ]   [ X1 ]       [ 0  ]       [ u1 ]
[ y2 ] = [ X2 ] β  +  [ X2 ] γ  +  [ u2 ]

where 0 is a zero-matrix of the same dimensions as X1 ,


i.e. (T1 ×K).

Also, we can see that:


T1 : ŷ = X β̂
T2 : ŷ = X(β̂ + γ̂)

Note: Power of the test depends on proper T1 vs. T2 cutoff.


Chow test may be generalized for 3+ time periods (groups).
Chow tests

For our unrestricted model:

[ y1 ]   [ X1 ]       [ 0  ]       [ u1 ]
[ y2 ] = [ X2 ] β  +  [ X2 ] γ  +  [ u2 ]

We can formulate the null of no structural change in model dynamics


between the two time periods (groups) as follows:
H0 : γ = 0, i.e.: γ1 = γ2 = γ3 = · · · = γK = 0
H1 : ¬H0

This can be tested using an F -test (or its HC version):

F = (SSRr − SSRur )/SSRur × (n − 2K)/K  ∼  F [K, (n − 2K)]  under H0

Note: Here, β1 and γ1 both relate to the intercept.


Chow test - CS-based example

A simple Chow test example for CS data:


(to assess whether parameters are equal for M/F students.)

Original model (Chow test restricted model):


. . . based on the well known Wooldridge dataset.

cumgpa = β1 + β2 sat + β3 hsperc + β4 tothrs + u

Auxiliary model (Chow test unrestricted model):

cumgpa = β1 + γ1 female
+ β2 sat + γ2 (female×sat)
+ β3 hsperc + γ3 (female×hsperc)
+ β4 tothrs + γ4 (female×tothrs) + u
Chow test - CS-based example (contd.)

Null hypothesis H0 : γ1 = γ2 = γ3 = γ4 = 0

If all interactions effects are zero, we have the same regression


function for both groups.

Estimate of the unrestricted model


cumgpa^ = 1.48 − .353 female + .0011 sat + .0075 (female×sat)
         (.21)  (.411)        (.0002)     (.00039)

        − .0085 hsperc − .00055 (female×hsperc)
         (.0014)         (.00316)

        + .0023 tothrs − .00012 (female×tothrs)
         (.0009)         (.00163)

. . . t-tests cannot be used to evaluate the joint H0 .


Chow test - CS-based example (contd.)

F -statistic:
F = [(SSRr − SSRur )/K] / [SSRur /(n − 2K)] = [(85.515 − 78.355)/4] / [78.355/(366 − 8)] ≈ 8.18

. . . using p-value, we reject the null hypothesis

Important: Chow tests (all types) assume constant error


variance across groups.
Chow 1: stability test for TS

Here, the F -statistic for the Chow test is calculated in an alternative


way (Chow 1):
For a suitable (potential) “breakpoint”, we divide our sample
{t = 1, 2, . . . , T } in two groups:
“T1 ” with {t = 1, 2, . . . , T1 } and
“T2 ” with {t = T1 +1, T1 +2, . . . , T }
. . . note that the choice of T1 is arbitrary
. . . (breakpoint-searching algorithms can be used)
Run separate regressions for both T1 , T2 groups;
the SSRur is given by the sum of the SSRs of the two separately
estimated regression models.
. . . sufficient observations in T1 and T2 are required (d.f.)
Run the original (restricted) regression model on the whole
sample T and store SSRr .
Chow 1: stability test for TS

F = (SSRr − SSRur )/SSRur · (T − 2K)/K  ∼  F ( K , T − 2K )  under H0

where
SSRur = SSR T1 + SSR T2
SSRr = SSR T
K is the number of parameters (including intercept) in LRM

H0 : stable structure of coefficients - no statistically significant


differences between parameters in T1 and T2 .
H1 : ¬H0 . H1 suggests a structural change in parameters over time.
Hence, separate regressions fit the data significantly better and
SSRur < SSRr (the difference is statistically significant).

Note: Chow 1 can be generalized for G time periods (G − 1 “breakpoints”).
. . . In such case, SSRur = Σ_{g=1}^{G} SSRg , d.f. = T − GK
. . . and we assume Tg > K for all time groups.
. . . (only usable for small G-values, problematic setup of breakpoints)
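
A hedged R sketch of the Chow 1 computation on simulated data with an arbitrary breakpoint (package-based alternatives such as strucchange exist; that is an assumption about the environment):

set.seed(9)
T.all <- 80; T1 <- 40                        # arbitrary breakpoint
x <- rnorm(T.all)
y <- c(1 + 0.5 * x[1:T1], 2 + 1.5 * x[(T1 + 1):T.all]) + rnorm(T.all)

K   <- 2                                     # intercept + one slope
ssr <- function(fit) sum(resid(fit)^2)
SSR.r  <- ssr(lm(y ~ x))                     # restricted: whole sample
SSR.ur <- ssr(lm(y[1:T1] ~ x[1:T1])) +       # unrestricted: separate regressions
          ssr(lm(y[(T1 + 1):T.all] ~ x[(T1 + 1):T.all]))
F.stat <- (SSR.r - SSR.ur) / SSR.ur * (T.all - 2 * K) / K
c(F = F.stat, p.value = pf(F.stat, K, T.all - 2 * K, lower.tail = FALSE))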
Chow 2: prediction test for TS

Sometimes, we do not have enough observations to estimate the LRM


separately for T1 and T2 as in the Chow 1 test.

In such case, we can use Chow 2: test of prediction unsuitability


(slightly different F -statistics).

The whole period is again divided into two subsets: T = T1 + T2 .


T1 is the “base” period (sample size)
T2 is the number of “additional” observations, it usually
corresponds to an ex-post prediction period
Chow 2: prediction test for TS

F = (SSRr − SSRur )/SSRur · (T1 − K)/T2  ∼  F ( T2 , T1 − K )  under H0

where
SSRur = SSR T1 (from LRM estimated for “base” period)
SSRr = SSR T (from LRM estimated for the whole period)
K is the number of parameters (including intercept) in LRM

H0 : additional (T2 ) observations come from the same DGP as


in T1 .
H1 : ¬H0 (assume significant differences between samples)
. . . If H0 is rejected, we would expect large differences
. . . between predictions and actual observations of yt .

If enough T1 and T2 observations are available, Chow 1 is preferred


(compared to Chow 2) as it has more “power”.
TS & forecasting

One-step-ahead forecast: ft
ft is the forecast of yt+1 made at time t
Forecast error et+1 = yt+1 − ft
Information set: It
Loss function: e²t+1 or |et+1 |
In forecasting, we minimize E(e²t+1 |It ) = E[(yt+1 − ft )²|It ]
Solution: E(yt+1 |It )

Multiple-step-ahead forecast ft,h


ft,h is the forecast of yt+h made at time t
Solution: E(yt+h |It )
TS & forecasting

For some processes, E(yt+1 |It ) is easy to obtain:


1 Martingale process (MP):
If E(yt+1 |yt , yt−1 , . . . , y0 ) = yt , ∀t ≥ 0, then {yt } is a MP and ft = yt
If a process {yt } is a martingale then {∆yt } is martingale
difference sequence (MDS)

E(∆yt+1 |yt , yt−1 , . . . , y0 ) = 0

2 Process with exponential smoothing:

E(yt+1 |It ) = αyt + α(1 − α)yt−1 + · · · + α(1 − α)^t y0 ;  0 < α < 1.

If we set f0 = y0 , then for t ≥ 1 : ft = αyt + (1 − α)ft−1


(Simplification: forecast for t + 1 is a weighted average of yt and
forecast of yt made at t − 1.)
TS & forecasting

3 Regression models
Static model: yt = β0 + β1 xt + ut
E(yt+1 |It ) = β0 + β1 xt+1 → Conditional forecasting
It contains xt+1 , yt , xt , . . . , y1 , x1
Here, knowledge of xt+1 is assumed (forecast condition).
E(yt+1 |It ) = β0 + β1 E(xt+1 |It ) → Unconditional forecasting
It contains yt , xt , . . . , y1 , x1
Here, xt+1 needs to be estimated before yt+1

Dynamic models depending on lagged variables only:


yt = δ0 + α1 yt−1 + γ1 xt−1 + ut
E(ut |It−1 ) = 0
E(yt+1 |It ) = δ0 + α1 yt + γ1 xt
Also, we can use more lags, drop or add regressors . . .
TS & forecasting

One-Step-Ahead Forecasting with


yt = δ0 + α1 yt−1 + γ1 xt−1 + ut :

point forecast: fˆt = δ̂0 + α̂1 yt + γ̂1 xt


forecast error: êt+1 = yt+1 − fˆt
s.e. of forecast: s.e.(êt+1 ) = {[s.e.(f̂t )]² + σ̂²}^{1/2}

forecast interval: essentially the same as prediction interval


approximate 95% forecast interval is: fˆt ± 1.96×s.e.(êt+1 )
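
A minimal R sketch of a one-step-ahead forecast and its standard error, using predict() on an estimated lag model (simulated data, illustrative):

set.seed(10)
n <- 60
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y <- numeric(n)
for (t in 2:n) y[t] <- 1 + 0.6 * y[t - 1] + 0.3 * x[t - 1] + rnorm(1)

dat <- data.frame(y = y[2:n], y.lag = y[1:(n - 1)], x.lag = x[1:(n - 1)])
fit <- lm(y ~ y.lag + x.lag, data = dat)

new <- data.frame(y.lag = y[n], x.lag = x[n])        # information available at time n
pr  <- predict(fit, newdata = new, se.fit = TRUE)
f.hat   <- pr$fit                                     # point forecast of y_{n+1}
se.fore <- sqrt(pr$se.fit^2 + pr$residual.scale^2)    # {[s.e.(f.hat)]^2 + sigma.hat^2}^(1/2)
c(forecast = f.hat,
  lower = f.hat - 1.96 * se.fore, upper = f.hat + 1.96 * se.fore)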
TS & forecasting

Example: File PHILLIPS


Forecasting US unemployment rate

unem^t = 1.572 + .732 unemt−1
         (.577)  (.097)

n = 48, R² = .544

unem^t = 1.304 + .647 unemt−1 + .184 inft−1
         (.490)  (.084)        (.041)

n = 48, R² = .677
Note that these regressions are not meant as causal equations. The
hope is that the linear regressions approximate well the conditional
expectation.
TS & forecasting

Evaluating forecast quality


We can measure how forecasted values fit to actual observations
(in-sample criteria, e.g. R2 )

It is better, however, to evaluate the forecasting performance


when forecasting out-of-sample values (out-of-sample criteria).
For this purpose, use first n observations for estimation, and the
remaining m observations to calculate the forecast errors ên+h

Forecast evaluation measures:


Mean Absolute Error: MAE = m⁻¹ Σ_{h=1}^{m} |ên+h |
Root Mean Squared Error: RMSE = (m⁻¹ Σ_{h=1}^{m} ê²n+h )^{1/2}
k-Fold Cross-Validation (kFCV) approach
. . . has limited empirical benefits in TS analysis
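
An out-of-sample evaluation sketch in R (illustrative): the first n observations are used for estimation, one-step-ahead forecasts are produced for the remaining m observations, and MAE / RMSE are computed from the forecast errors:

set.seed(12)
T.all <- 80; n <- 60; m <- T.all - n
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = T.all)) + 5

e.hat <- numeric(m)
for (h in 1:m) {
  est <- 1:(n + h - 1)                                 # expanding estimation window
  fit <- lm(y[est][-1] ~ y[est][-length(est)])         # y_t on y_{t-1}
  f   <- coef(fit)[1] + coef(fit)[2] * y[n + h - 1]    # one-step-ahead forecast
  e.hat[h] <- y[n + h] - f                             # forecast error
}
c(MAE = mean(abs(e.hat)), RMSE = sqrt(mean(e.hat^2)))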
TS & forecasting

Additional comments
Multiple-step-ahead forecasts are possible, but necessarily
less precise.
Forecasts may make use of deterministic trends, but the
error made by extrapolating time trends too far into the
future may be large.
Similarly, seasonal patterns may be incorporated into
forecasts.
It is possible to calculate confidence intervals for the point
multiple-step-ahead forecasts.
Forecasting I(1) time series can be based on adding
predicted changes (which are I(0)) to base levels.
Forecast intervals for I(0) series converge to the
unconditional variance, whereas for integrated series, they
are unbounded.
Finite and infinite distributed lag models

Finite and infinite distributed lag models

Polynomial distributed lag

Geometric distributed lag (Koyck transformation)

Rational distributed lag (RDL)

Partial adjustment model (PAM)

Adaptive expectations hypothesis (AEH)

Rational expectations
Finite and infinite distributed lag models

Finite distributed Lag (FDL) model:

yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + ut

Infinite distributed lag (IDL) model:

yt = α0 + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut

Dynamically complete (FDL) models

Model is dynamically complete if we have a “sufficient” number


of lags of regressors, so that no more additional lags would help
with explanation of variability in the dependent variable.

In dynamically incomplete models, we usually detect


autocorrelation in the error term of the LRM.
FDL: Polynomial distributed lag (Almon)
Used in Finite distributed lag models
. . . example below also extends to higher order polynomials

yt = α + β0 xt + β1 xt−1 + · · · + βm xt−m + ut (11)


yt = α + Σ_{i=0}^{m} βi xt−i + ut ,    (i is the lag index here)

Simplifying assumption: polynomial order = 2, m = 8

βi = k0 + k1 i + k2 i²    (12)

β0 = k0
β1 = k0 + k1 + k2
β2 = k0 + k1 ·2 + k2 ·4
...
βm = k0 + k1 m + k2 m²

[Figure: lag distribution βi plotted against lag i = 0, . . . , 8]
Polynomial distributed lag

Almon-type transformation of (11) for m = 8 and k = 2:

yt = α + k0 xt + (k0 + k1 + k2 )xt−1 + (k0 + 2k1 + 4k2 )xt−2 +
     + · · · + (k0 + 8k1 + 64k2 )xt−8 + ut ,    (13)

yt = α + k0 (xt + xt−1 + · · · + xt−8 ) +
     + k1 (xt−1 + 2xt−2 + · · · + 8xt−8 ) +
     + k2 (xt−1 + 4xt−2 + · · · + 64xt−8 ) + ut , i.e.

yt = α + k0 Σ_{i=0}^{8} xt−i + k1 Σ_{i=1}^{8} i xt−i + k2 Σ_{i=1}^{8} i² xt−i + ut ,    (14)

where the three sums define W0t , W1t and W2t , so that

yt = α + k0 W0t + k1 W1t + k2 W2t + ut    (15)

We estimate (15), then calculate βi (lags 0 to 8) as in (12)


. . . note the reduction in estimated parameters (10 vs 4).
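
The slide's construction in base R (simulated data; the true lag profile below is only illustrative). The W-variables are built as in (14), equation (15) is estimated by OLS, and the individual βi are recovered via (12):

set.seed(14)
T.all <- 200; m <- 8
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = T.all))
beta.true <- 0.5 - 0.1 * (0:m) + 0.005 * (0:m)^2        # a quadratic lag profile
X.lags <- embed(x, m + 1)                               # columns: x_t, x_{t-1}, ..., x_{t-m}
y <- as.numeric(1 + X.lags %*% beta.true + rnorm(T.all - m))

i  <- 0:m
W0 <- as.numeric(X.lags %*% i^0)                        # W0 = sum of x_{t-i}
W1 <- as.numeric(X.lags %*% i)                          # W1 = sum of i * x_{t-i}
W2 <- as.numeric(X.lags %*% i^2)                        # W2 = sum of i^2 * x_{t-i}

fit <- lm(y ~ W0 + W1 + W2)                             # estimate (15): alpha, k0, k1, k2
k   <- coef(fit)[2:4]
beta.hat <- k[1] + k[2] * i + k[3] * i^2                # recover beta_i via (12)
cbind(lag = i, beta.hat, beta.true)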
Polynomial distributed lag

Method developed by Shirley Almon in the 1960s.


Equations (14) & (15) can be generalized easily: for m lags, sums
go to m, for higher order polynomials, we add more W -terms.

Advantages of this approach:


Saves degrees of freedom
Removes the problem of multicollinearity
Does not affect the assumptions for u, because errors do not
change during transformation

In EViews, transformation is slightly modified.


In R, routines are available.
Infinite distributed lag models

Infinite distributed lag (IDL) models

Lagged regressors extend back to infinity


We cannot estimate IDL models without the use of simplifying
restrictions on parameters,
i.e. restrictions on lag distribution

IDL models are useful under the assumption of lagged


coefficients converging to zero as lag increases

Order of the IDL model (∞),


impact multiplier vs. long-run multiplier,
temporary vs. permanent change in x,
. . . all analogous to FDL models
Geometric distributed lag (Koyck)

IDL linear regression model: yt = f (xt , xt−1 , xt−2 , . . . ):

yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + δ3 xt−3 + · · · + ut

Assumption for the geometric δj weights:

δj = γρ^j ,  0 < ρ < 1,  j = 0, 1, 2, . . .
• (j is the lag index here)
• δ0 ≡ γ for convenience of notation (see RDL)
• |ρ| < 1 is not used (it would allow sign oscillations in δj ).
δj = δj−1 ρ,  0 < ρ < 1.
• follows from the assumption above

Instantaneous propensity (multiplier): δ0 ≡ γ = γρ⁰

Long-run propensity (multiplier):
δ0 + δ1 + δ2 + · · · = γ + γρ + γρ² + · · · = γ(1 + ρ + ρ² + . . . ) = γ/(1 − ρ)
Geometric distributed lag (Koyck)

Koyck transformation of the IDL model:

yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut    | δj = γρ^j

yt = α + γxt + γρxt−1 + γρ²xt−2 + · · · + ut    (16)

yt−1 = α + γxt−1 + γρxt−2 + γρ²xt−3 + · · · + ut−1    | × ρ    (17)

ρyt−1 = αρ + γρxt−1 + γρ²xt−2 + · · · + ρut−1    (18)

Now, we subtract (18) from (16):

yt − ρyt−1 = α(1 − ρ) + γxt + ut − ρut−1 ,   where α0 ≡ α(1 − ρ) and vt ≡ ut − ρut−1    (19)

yt = α0 + γxt + ρyt−1 + vt    (20)


Geometric distributed lag (Koyck)
IDL model: yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut
Koyck transf.: yt = α0 + γxt + ρyt−1 + vt
Using the Koyck transformation, we can calculate parameters
of the IDL model from the estimated model after Koyck
transformation:
δ̂0 = γ̂
δ̂j = γ̂ ρ̂^j ;  j = 1, 2, 3, . . .
α̂ = α̂0 /(1 − ρ̂)

Problems of the Koyck transformation:


in (20), regressor yt−1 is not exogenous (vt = ut − ρut−1 )
. . . IVR discussed separately
vt = ut − ρut−1 is not i.i.d.
Koyck transformation extends to models with multiple regressors
only if common ρ (geometric decay) can be assumed.
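
A small R sketch of estimating the Koyck-transformed equation (20) by OLS and backing out the IDL parameters. For simplicity the data are generated directly from (20) with i.i.d. errors, which sidesteps the endogeneity problem listed above; with genuine IDL data (MA(1) error vt ), an IV estimator would be needed, so treat this as illustrative only:

set.seed(15)
T.all <- 300
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = T.all))
alpha0 <- 1; gamma <- 0.8; rho <- 0.5
y <- numeric(T.all)
for (t in 2:T.all) y[t] <- alpha0 + gamma * x[t] + rho * y[t - 1] + rnorm(1)

fit <- lm(y[-1] ~ x[-1] + y[-T.all])          # y_t on x_t and y_{t-1}
b   <- coef(fit)
gamma.hat <- b[2]; rho.hat <- b[3]
alpha.hat <- b[1] / (1 - rho.hat)             # alpha.hat = alpha0.hat / (1 - rho.hat)
delta.hat <- gamma.hat * rho.hat^(0:5)        # first few IDL lag coefficients
LRP.hat   <- gamma.hat / (1 - rho.hat)        # long-run propensity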
Rational distributed lag (RDL)

The geometric distributed lag is a special case of


rational distributed lag (RDL) model:

yt = α0 + γxt + ρyt−1 + vt (geometric distributed lag)


yt = α0 + γ0 xt + γ1 xt−1 + ρyt−1 + vt (RDL) (21)

This can be shown by successive substitution for the yt−1 term


of RDL equation (21), where we get:

yt = α + γ0 xt + (ργ0 + γ1 )xt−1 + ρ(ργ0 + γ1 )xt−2 +
     + ρ²(ργ0 + γ1 )xt−3 + ρ³(ργ0 + γ1 )xt−4 + . . .
     + ρ^{h−1}(ργ0 + γ1 )xt−h + · · · + ut    (22)

After estimating (21), we can calculate lag distribution for (22)


Rational distributed lag (RDL)

RDL specification:
yt = α0 + γ0 xt + γ1 xt−1 + ρyt−1 + vt
can be used to calculate δh in the IDL model:
yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut
With RDLs, impact propensity γ0 ≡ δ0 can differ in sign from lagged
coefficients.
δh = ρ^{h−1}(ργ0 + γ1 ) corresponds to the xt−h variable for h ≥ 1.
. . . δ0 may differ in sign from “lags”, even if 0 < ρ < 1.
. . . for ρ > 0, δh doesn’t change sign with growing h ≥ 1.
Long-run propensity: LRP = (γ0 + γ1 )/(1 − ρ),
where (1 − ρ) > 0 ⇒ the sign of LRP follows the sign of (γ0 + γ1 ).
Also, ut = vt + ρvt−1 + ρ²vt−2 + · · · , an MA(∞) process.
Koyck transformation and RDL model – summary

IDL model specification:


yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + δ3 xt−3 + · · · + ut

Koyck: yt = α0 + γxt + ρyt−1 + vt


RDL: yt = α0 + γ0 xt + γ1 xt−1 + ρyt−1 + vt

Table 1: Koyck vs. RDL coefficients

                              Koyck            RDL
Impact multiplier             δ0 = γ           δ0 = γ0
Lagged multiplier (lag h)     δh = γρ^h        δh = ρ^{h−1}(ργ0 + γ1 )
LRP                           γ/(1 − ρ)        (γ0 + γ1 )/(1 − ρ)
Partial adjustment model

Partial adjustment model (PAM)


is based on two main assumptions:
1 LRM describes long-run behavior of yt∗ : the unobserved,
expected/equilibrium/target/optimum value of yt :

yt∗ = α + βxt + ut (23)

2 Between two time periods, yt follows the process:

yt − yt−1 = θ(yt∗ − yt−1 ), 0<θ<1 (24)

Hence, the actual ∆yt is only a fraction of the “desirable” change


from yt−1 to the optimum value of yt∗ .
. . . in the special case of θ = 1, ∆yt leads to optimum.
Note: (24) can be re-written as: yt = θyt∗ + (1 − θ)yt−1
Partial adjustment model (PAM)
Parameter estimation of PAM:

yt∗ = α + βxt + ut (23)


yt = θyt∗ + (1 − θ)yt−1 , 0<θ<1 (24)

1 Substitute for yt∗ in (24) from (23):

yt = αθ + βθxt + (1 − θ)yt−1 + θut
yt = β′0 + β′1 xt + β′2 yt−1 + θut    (25)

2 Estimate (25) and then calculate sought parameters of the PAM
in (23) and (24):
θ̂ = 1 − β̂′2
α̂ = β̂′0 /θ̂
β̂ = β̂′1 /θ̂

Note that θut can be independent of yt−1 and i.i.d.


Partial adjustment model (PAM)

Parameter interpretation of PAM:

yt∗ = α + βxt + ut
yt = θyt∗ + (1 − θ)yt−1 , 0<θ<1

yt = αθ + βθxt + (1 − θ)yt−1 + θut
yt = β′0 + β′1 xt + β′2 yt−1 + θut

Parameters:

θ : θ = (1 − β′2 ) is the adjustment coefficient; higher θ̂ indicates
higher speed of adjustment towards equilibrium.
β′1 : impact multiplier (short-run marginal propensity)
β : β = β′1 /θ is the long-run multiplier.
Adaptive expectations hypothesis (AEH)
Adaptive expectations hypothesis (AEH)
model is based on two main assumptions:

1 LRM describes behavior of yt , as a function of x∗t : the


unobserved, expected/equilibrium/target/optimum value
of xt (permanent income, potential output, etc.):

yt = α + βx∗t + ut . (26)

2 The unobserved x∗t process is defined as:

x∗t − x∗t−1 = φ(xt − x∗t−1 ), 0<φ<1


⇓ (27)
x∗t = φxt + (1 − φ)x∗t−1 .

with φ = 0 for static expectations


and φ = 1 for immediate adjustment.
Note: alternative 2nd hypothesis: x∗t = φxt−1 + (1 − φ)x∗t−1 .
Adaptive expectations hypothesis (AEH)
Parameter estimation of AEH model:

yt = α + βx∗t + ut , (26)

x∗t = φxt + (1 − φ)x∗t−1 (27)


Successive substitution for x∗t from (27) to (26): IDL process

yt = α + βφxt + βφ(1 − φ)xt−1 + βφ(1 − φ)²xt−2 + · · · + ut    (28)

After applying Koyck transformation, we get

yt = αφ + βφxt + (1 − φ)yt−1 + vt
yt = β′0 + β′1 xt + β′2 yt−1 + vt    (29)

Estimate (29), then calculate parameters in (26) and (27):
φ̂ = 1 − β̂′2   (φ is the “adaptive expectations coefficient”)
α̂ = β̂′0 /φ̂
β̂ = β̂′1 /φ̂   (β′1 and β are the SR and LR propensities)
Note: Problems of Koyck-transformed model estimation apply.
Koyck, PAM, AEH: regression of yt on xt and yt−1

The same underlying regression model (statistical form) is used:


The Koyck transformation:
yt = α0 + γxt + ρyt−1 + vt
The Partial adjustment model (PAM):
yt = αθ + βθxt + (1 − θ)yt−1 + θut
The Model with adaptive expectations (AEH):
yt = αφ + βφxt + (1 − φ)yt−1 + vt

We can make three different interpretations from one estimated


equation.

Of course, not all interpretations are always relevant, we must choose


according to application and have to test the assumptions made.
Koyck, PAM, AEH: regression of yt on xt and yt−1

Example: Model for ct ← f (xt , xt−1 , xt−2 , . . . ):


private consumption (ct ) as a function of disposable income (xt ).
Same estimated model for Koyck, PAM, AEH:

ĉt = 1, 038 + 0.404xt + 0.501ct−1


Koyck: ĉt = α̂0 + γ̂xt + ρ̂ct−1

Koyck: IDL, geometric decay in δ parameters assumed:


ρ̂ = 0.501
α̂0 = 1,038 = α̂(1 − ρ̂) = α̂(1 − 0.501) ⇒ α̂ = 1,038/0.499 = 2,080
δ̂j = γ̂ ρ̂^j = 0.404 × 0.501^j
LRP = γ̂/(1 − ρ̂) = 0.404/0.499 ≈ 0.81

IDL: ĉt = 2,080 + 0.404 xt + 0.202 xt−1 + 0.101 xt−2 + . . .
     (coefficients: γ̂ ρ̂⁰, γ̂ ρ̂, γ̂ ρ̂², . . . )
Koyck, PAM, AEH: regression of yt on xt and yt−1

Example continued:

ĉt = 1, 038 + 0.404xt + 0.501ct−1


PAM: ĉt = α̂θ̂ + β̂ θ̂xt + (1 − θ̂)ct−1

(1 − θ̂) = 0.501 ⇒ θ̂ = 0.499
α̂θ̂ = 1,038 ⇒ α̂ = 1,038/0.499 = 2,080
β̂ θ̂ = 0.404 ⇒ β̂ = 0.404/0.499 ≈ 0.81

PAM: ĉ∗t = 2, 080 + 0.81xt


ct − ct−1 = 0.499 · (c∗t − ct−1 )

If ct has a prominent inertia and ∆ct significantly follows


changes in habits, we might use the PAM approach.
Koyck, PAM, AEH: regression of yt on xt and yt−1

Example continued:

ĉt = 1, 038 + 0.404xt + 0.501ct−1


AEH: ĉt = α̂φ̂ + β̂ φ̂xt + (1 − φ̂)ct−1

(1 − φ̂) = 0.501 ⇒ φ̂ = 0.499
α̂φ̂ = 1,038 ⇒ α̂ = 1,038/0.499 = 2,080
β̂ φ̂ = 0.404 ⇒ β̂ = 0.404/0.499 ≈ 0.81

AEH: ĉt = 2, 080 + 0.81x∗t


x∗t = 0.499xt + 0.501x∗t−1

If ct is formed as a function of expected (e.g. permanent)


income, we might prefer AEH.
Rational expectations

Rational expectations

Et−1 (xt ) = a0 + a1 xt−1 + b1 z1,t−1 + b2 z2,t−2 + . . .

Et−1 (xt ): expected value of xt at time t − 1


zk,t−j : exogenous variables with impact on Et−1 (xt )

We put x∗t = Et−1 (xt ) into (26): yt = α + βx∗t + ut

We assume that agents:


know all relevant information
know how to use this information

Agents can make prediction errors (vt ), so:

xt = x∗t + vt
Rational vs. adaptive expectations

Under rational expectations:

Expected value of prediction errors [vt = xt − x∗t ] must be zero.


If they were systematically different from zero, rational agents
would immediately adjust their forecasting methods accordingly.

[vt = xt − x∗t ] prediction error must be uncorrelated with any


information available when the prediction is made (t − 1). If not,
this would imply that the forecaster has not made use of all
available information.

These properties can be used for testing the rational


expectations hypothesis in different applications
(e.g. through ex-post simulated dynamic forecasts).
Some economic application that use expectations

Phillips curve (Expectations-augmented Phillips curve)


Efficient market hypothesis (EMH)
Consumption function - Permanent income hypothesis
