
Block 5

Multivariate Time Series analysis


VAR models & IRFs, VECM

Advanced econometrics 1 4EK608


Pokročilá ekonometrie 1 4EK416

Vysoká škola ekonomická v Praze


Content

1 VAR: Vector autoregression models

2 VAR: Model setup and testing

3 VAR: Forecasting

4 VAR: Impulse-response functions (IRFs)

5 VAR: Variance decomposition

6 VECM

7 VAR & VECM – Other extensions


VAR: Vector autoregression models

VAR model: introduction

Univariate autoregressive models (AR models) describe specific time-varying processes in nature, the economy, etc.
AR processes/models may be either stationary or non-stationary.
AR(p) model (autoregressive model of order p): the modelled variable depends linearly on its own previous values and a stochastic term:

  AR(p): yt = c + ρ1 yt−1 + ρ2 yt−2 + · · · + ρp yt−p + ut = c + Σ(i=1..p) ρi yt−i + ut

VAR models generalize the univariate autoregressive model (AR model) by allowing for more than one evolving variable:
yt becomes the vector yt , where yt′ = (y1t , y2t , . . . , ymt )
VAR models capture linear interdependencies among multiple time series.
Atheoretical: the only prior knowledge required for VAR modeling is a list of variables which can be hypothesized to affect each other inter-temporally.
VAR: Vector autoregression models

VAR models: origins


C. Sims (Nobel prize) reacted in the 1980s against SEMs (simultaneous equations models). His arguments:
Large scale macroeconomic models failed in providing governments
with adequate economic forecasts.
For identification, some SEM variables must be omitted (e.g. through
zero restrictions on parameters). Such omissions are often arbitrary
and lack justification (hence undermine model credibility).
Endogenous/exogenous division of variables tends to be arbitrary.
VAR models vs. SEMs:
Even very simple VAR models usually provide more realistic
predictions of variables involved (compared to SEMs).
In VAR models, all variables are treated as endogenous.
No zero-restrictions placed on parameters (such restrictions may be
easily applied if empirically convenient).
VAR models make for a versatile toolbox, many extensions and
generalizations are possible.
VAR: Vector autoregression models
VAR model: notation
We work with m-variable (m-dimensional) VAR(p) models
A two-variable VAR(3) model may be denoted as follows:
  yt = c + A1 yt−1 + A2 yt−2 + A3 yt−3 + ut = c + Σ(i=1..3) Ai yt−i + ut

or:

  [ y1t ]   [ c1 ]   [ a11,1  a12,1 ] [ y1,t−1 ]   [ a11,2  a12,2 ] [ y1,t−2 ]   [ a11,3  a12,3 ] [ y1,t−3 ]   [ u1t ]
  [ y2t ] = [ c2 ] + [ a21,1  a22,1 ] [ y2,t−1 ] + [ a21,2  a22,2 ] [ y2,t−2 ] + [ a21,3  a22,3 ] [ y2,t−3 ] + [ u2t ]
or:
y1t = c1 + a11,1 y1t−1 + a12,1 y2t−1 + a11,2 y1t−2 + a12,2 y2t−2 + a11,3 y1t−3 +
+ a12,3 y2t−3 + u1t
y2t = c2 + a21,1 y1t−1 + a22,1 y2t−1 + a21,2 y1t−2 + a22,2 y2t−2 + a21,3 y1t−3 +
+ a22,3 y2t−3 + u2t
VAR: Vector autoregression models

VAR model: notation


Any VAR(p) specification can be equivalently rewritten as a VAR(1) by stacking the lags of the VAR(p) variables and by appending identities to complete the number of equations.
For example, a VAR(2) model yt = c + A1 yt−1 + A2 yt−2 + ut can be written as a VAR(1):

  [ yt   ]   [ c ]   [ A1  A2 ] [ yt−1 ]   [ ut ]
  [ yt−1 ] = [ 0 ] + [ I   0  ] [ yt−2 ] + [ 0  ]

In general, any m-dimensional VAR(p) model may be re-written as:

  Yt = v + A Yt−1 + Ut , where:

  Yt := (yt′, yt−1′, . . . , yt−p+1′)′ ,  v := (c′, 0′, . . . , 0′)′ ,  Ut := (ut′, 0′, . . . , 0′)′ ,

        [ A1  A2  . . .  Ap−1  Ap ]
        [ Im  0   . . .  0     0  ]
  A :=  [ 0   Im  . . .  0     0  ]
        [ ...                 ... ]
        [ 0   0   . . .  Im    0  ]

dim: Yt , v, Ut are (mp × 1); A is (mp × mp).


VAR: Vector autoregression models
y1t = c1 + a11,1 y1t−1 + a12,1 y2t−1 + a11,2 y1t−2 + a12,2 y2t−2 + a11,3 y1t−3 +
+ a12,3 y2t−3 + u1t
y2t = c2 + a21,1 y1t−1 + a22,1 y2t−1 + a21,2 y1t−2 + a22,2 y2t−2 + a21,3 y1t−3 +
+ a22,3 y2t−3 + u2t

All regressors are lagged variables: they can be assumed contemporaneously uncorrelated with the disturbances u1t , u2t . Hence, each equation can be consistently estimated by OLS on an individual basis.
We have identical regressors in all individual VAR model equations. Often, we observe contemporaneous correlation between endogenous variables; hence, in practical applications, the elements of ut tend to be contemporaneously correlated. However, FGLS methodology does not bring any improvement to model estimation (refer to the SURE-related discussion in Block 3).
For forecasting into the t + 1 period, only current and past values of the y variables are required (generally, these are readily available).
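As an illustration (a hedged sketch, not part of the original slides): in R, the {vars} package estimates a VAR equation by equation via OLS; the data and dimensions below are hypothetical.

library(vars)
set.seed(42)
# hypothetical two-variable data set; in practice, use observed stationary TS
y <- matrix(rnorm(400), ncol = 2, dimnames = list(NULL, c("y1", "y2")))
var_fit <- VAR(y, p = 3, type = "const")   # OLS applied to each equation separately
summary(var_fit$varresult$y1)              # the y1 equation is a standard lm fit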
VAR: Model setup and testing
What do we need to set up a VAR model?

A (small) set of endogenous variables:
In empirical applications, we would use up to six variables (roughly). However, there is no theoretical upper limit for m, given adequately long TS are available for model estimation.

Decision on a lag length:
The VAR approach assumes a single lag-length for the whole model (the same for each equation).
Longer is preferable with this method: sufficient lags have to be included to ensure non-autocorrelated residuals in all equations.
On the other hand, because of the degrees-of-freedom problem (over-parameterization), variables frequently have to be excluded from the model and a limit has to be placed on the length of lags.

Decision on the inclusion of deterministic/exogenous regressors:
Trend, seasonal dummies, “pulse” variables (often, time dummies controlling for one-off events), exogenous regressors (e.g. oil price).
VAR(p) models augmented by deterministic (and lagged exogenous) regressors are often called VARX(p) models.
VAR: Model setup and testing

How do we choose endogenous variables for a VAR model?

Prior information (economic theory) is applied to select model variables.

Granger Causality tests are applied for setup verification:
Definition of Granger Causality (GC): X is said to be a Granger cause of Y if present Y can be predicted with greater accuracy by using past values of X rather than by not using such past values, all other relevant information being identical.
The definition easily extends to the situation where X and Y are multidimensional processes.
GC is a statistical concept (it is not an “actual” causality).
Different types of GC tests exist (Sims test, modified Sims test).
GC tests apply to stationary series!
VAR: Model setup and testing

Granger Causality tests: F test for a 2-variable VAR(p) model (direct GC test: X → Y)

In a VAR(p) model with two variables (Y, X), we can use a simple F test for multiple linear restrictions to test the H0 of no Granger causality.

We start with a full (unrestricted) VAR(p) equation:

  yt = c + Σ(i=1..p) αi yt−i + Σ(i=1..p) βi xt−i + ut

Under H0 of no GC, all βi = 0, and the restricted equation is:

  yt = c + Σ(i=1..p) αi yt−i + ut

Test statistic:

  F = [(SSRR − SSRUR)/q] / [SSRUR/(T − k)] ∼ F(q; T − k) under H0

where:
  k – number of estimated parameters in the UR model,
  q – number of restrictions imposed on the R model,
  T – number of observations.
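A hedged R illustration: the same type of hypothesis can be tested with causality() from {vars} (the model object and series name are hypothetical, continuing the sketch above):

causality(var_fit, cause = "y2")$Granger   # Wald-type test of H0: y2 does not Granger-cause y1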
VAR: Model setup and testing

Wald test for GC

In an m-dimensional VAR(p) model, we can partition yt into two processes: xt and zt .
Then, the H0 of non-causality between xt and zt (GC-type) may be characterized – and tested – using specific zero constraints on the coefficients of the estimated VAR(p) system.
The Wald test may be used as an asymptotic test for such constraints.

(see Lütkepohl: “New introduction to multiple time series analysis”)


VAR: Model setup and testing

Limitations and drawbacks in Granger Causality testing

In practical applications, the “all other relevant information being identical” clause in the GC definition may cause problems, as the results of GC testing between X and Y are sensitive to the information (variables, lags) included in the system.

Data frequency can have an important impact. For example, if GC is found in monthly data, this does not necessarily imply GC in daily/weekly/quarterly/annual series of the same variables. The same applies to seasonally adjusted/unadjusted series.

GC tests are performed on estimated rather than known systems.
VAR: Model setup and testing

How do we decide on the lag-length of a VAR(p) model?

When estimating VARs or conducting GC tests, results can be sensitive to the lag length of the VAR.
Sometimes the VAR model lag length corresponds to the data frequency, such that models with quarterly data have 4 lags, monthly data are used with 12 lags, etc.
A more rigorous way to determine the optimal lag length is to use the Akaike or Schwarz-Bayesian information criteria (IC).
However, VAR model estimates tend to be sensitive to the presence of autocorrelation. In such cases, after using an IC, if there is any evidence of autocorrelation, further lags are added, above the number indicated by the IC, until the autocorrelation is removed.
VAR: Model setup and testing

How do we decide on the lag-length of a VAR(p) model?

The main information criteria are the Schwarz-Bayesian criterion (SIC, SBIC) and the Akaike criterion (AIC).
They operate on the basis that there are two competing factors related to adding more lags to a model: more lags will reduce the RSS, but also generate a loss of degrees of freedom (a penalty for complexity).
The aim is to minimize the IC value: adding an extra lag will only benefit the model if the reduction in the RSS outweighs the loss of degrees of freedom.
In general, the SBIC has a harsher complexity penalty term than the AIC (sometimes leading to a smaller selected lag order p).
VAR: Model setup and testing

How do we decide on the lag-length of a VAR(p) model?

Single-equation statistics:

  AIC  = log(σ̂²) + 2k/T
  SBIC = log(σ̂²) + (k/T)·log(T)

Multivariate statistics:

  MAIC  = log|Σ̂| + 2k′/T
  MSBIC = log|Σ̂| + (k′/T)·log(T)

where:
  σ̂² – residual variance; T – sample size (number of observations); k – number of parameters;
  Σ̂ – covariance matrix of the residuals; k′ – total number of regressors in all equations.
VAR: Model setup and testing

Lag-length selection – empirical example (Stata output):


varsoc oilp igae_s ex_rate impi ppi cpi i_rate if time>= tm(2001m7), maxlag(6)
Selection-order criteria
Sample: 2001m7 - 2013m2 Number of obs = 140
lag LL LR df p FPE AIC HQIC SBIC
0 1951.92 2.0e-21 -27.7845 -27.7247 -27.6374
1 3360.47 2817.1 49 0.000 7.4e-30 -47.2067 -46.7285 -46.03*
2 3458.16 195.39 49 0.000 3.7e-30* -47.9023* -47.0058* -45.6961
3 3494.82 73.311 49 0.014 4.5e-30 -47.7259 -46.411 -44.4901
4 3528.78 67.922 49 0.038 5.7e-30 -47.5111 -45.7778 -43.2457
5 3562.2 66.846 49 0.046 7.4e-30 -47.2886 -45.1369 -41.9936
6 3603.7 82.998 49 0.002 8.8e-30 -47.1814 -44.6113 -40.8569
Endogenous: oilp igae_s ex_rate impi ppi cpi i_rate
Exogenous: _cons
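For comparison, a hedged one-line R equivalent of this lag-selection step with {vars} (the data object y is hypothetical):

library(vars)
VARselect(y, lag.max = 6, type = "const")   # reports AIC(n), HQ(n), SC(n) and FPE(n) per lag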
VAR: Model setup and testing

Stability testing of a VAR model:

A VAR(1) process yt = v + A1 yt−1 + ut is stable if the following condition is met:

  det(Im − A1 z) ≠ 0 for |z| ≤ 1,

alternatively:

  lim(n→∞) A1^n = 0m

where:
  z is a scalar (number),
  Im – identity matrix, where m is the number of variables in the VAR,
  0m – (m × m) zero matrix.

Note: Any VAR(p) model may be re-written as VAR(1) . . .

[Figure: graphical representation of the stability condition (example) – if the moduli of the eigenvalues of A1, plotted against the unit circle, are all less than one, then the VAR(p)-process is stable.]
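In R, the stability condition can be checked numerically; a hedged sketch with the hypothetical var_fit from above:

roots(var_fit)   # moduli of the companion-matrix eigenvalues; all < 1 => stable VAR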
VAR: Model setup and testing

VAR model testing (residuals):

Serial correlation tests: Portmanteau, Breusch & Godfrey
Heteroskedasticity tests: ARCH
Normality tests: Jarque & Bera, etc.
Structural stability: CUSUM

Functions serial.test(), arch.test(), normality.test() and stability() in the R package {vars} (sketched below).
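A hedged sketch of these residual diagnostics, reusing the hypothetical var_fit from above:

serial.test(var_fit, lags.pt = 16, type = "PT.asymptotic")   # Portmanteau test
serial.test(var_fit, lags.bg = 5, type = "BG")               # Breusch-Godfrey LM test
arch.test(var_fit)                                           # multivariate ARCH-LM test
normality.test(var_fit)                                      # Jarque-Bera type tests
plot(stability(var_fit, type = "OLS-CUSUM"))                 # CUSUM stability plots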
VAR: Forecasting

VAR-based Forecasting: Introduction

Wold decomposition of stable VAR(p) models:
. . . a concept useful for forecasting and IRF construction.

A simplified stable VAR(p) model yt = A1 yt−1 + A2 yt−2 + · · · + Ap yt−p + ut can be written as:

  yt = Φ0 ut + Φ1 ut−1 + Φ2 ut−2 + . . . ,

where Φs = Σ(j=1..s) Φs−j Aj for s = 1, 2, . . . ; Aj = 0 for j > p;
Φ0 = Im is the (m × m) identity matrix.

See Lütkepohl: “New introduction to multiple time series analysis” for derivation and discussion.
VAR: Forecasting

Wold decomposition examples:

Stable VAR(2) model (simplified: no intercept term):
  yt = A1 yt−1 + A2 yt−2 + ut
may be written as:
  yt = Φ0 ut + Φ1 ut−1 + Φ2 ut−2 + . . . ,
where:
  Φ0 = Im
  Φ1 = Φ0 A1
  Φ2 = Φ1 A1 + Φ0 A2
  Φ3 = Φ2 A1 + Φ1 A2
  . . .
  Φs = Σ(j=1..s) Φs−j Aj = Φs−1 A1 + Φs−2 A2
  (s = 1, 2, . . . ; Aj = 0 for j > p)

Stable VAR(1) model (any VAR(p) may be written as a VAR(1)):
  yt = A1 yt−1 + ut
may be written as:
  yt = Φ0 ut + Φ1 ut−1 + Φ2 ut−2 + . . . ,
where:
  Φ0 = Im
  Φ1 = Φ0 A1 = A1
  Φ2 = Φ1 A1 = A1 A1 = A1^2
  Φ3 = Φ2 A1 = A1^3
  . . .
  Φs = A1^s  (note that Φ0 = A1^0 = Im)
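The recursion Φs = Σ(j=1..s) Φs−j Aj is straightforward to compute; a hedged R sketch with hypothetical VAR(2) coefficient matrices:

A <- list(matrix(c(0.5, 0.1, 0.0, 0.3), 2, 2),   # hypothetical A1
          matrix(c(0.2, 0.0, 0.1, 0.1), 2, 2))   # hypothetical A2
p <- 2; m <- 2; s_max <- 5
Phi <- vector("list", s_max + 1)
Phi[[1]] <- diag(m)                              # Phi_0 = I_m
for (s in 1:s_max) {
  Phi[[s + 1]] <- matrix(0, m, m)
  for (j in 1:min(s, p)) {                       # A_j = 0 for j > p
    Phi[[s + 1]] <- Phi[[s + 1]] + Phi[[s - j + 1]] %*% A[[j]]
  }
}
# For an estimated model, {vars} provides Phi(var_fit, nstep = s_max).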
VAR: Forecasting

Iterative forecasting

A VAR(p) model is estimated using observations t = 1, 2, . . . , T:
  yt = Â1 yt−1 + Â2 yt−2 + · · · + Âp yt−p + ût
and for t = T:
  yT = Â1 yT−1 + Â2 yT−2 + · · · + Âp yT−p + ûT .

Arbitrarily long forecasts (t = T+1, T+2, . . . , T+h) can be iteratively produced using the estimated A-matrices and the observed (t = 1, 2, . . . , T) and predicted (t = T+1, T+2, . . . , T+h) values of yt :

  ŷT+1 = Â1 yT + Â2 yT−1 + · · · + Âp yT−p+1
  ŷT+2 = Â1 ŷT+1 + Â2 yT + · · · + Âp yT−p+2
  ŷT+3 = Â1 ŷT+2 + Â2 ŷT+1 + · · · + Âp yT−p+3
  ......

As we move through the prediction time period, predicted values are used as regressors for subsequent periods & predictions . . .
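In R, this iterative scheme is what the predict() method implements for a fitted VAR; a hedged sketch with the hypothetical var_fit from above:

fc <- predict(var_fit, n.ahead = 8, ci = 0.95)   # iterative 8-step forecasts
fanchart(fc)                                     # forecast plot with confidence bands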
VAR: Forecasting

Forecast error covariance matrix:

      [ yT+1 − ŷT+1 ]
  cov [     ...     ] = B (Ih ⊗ Σu) B′ ,
      [ yT+h − ŷT+h ]

where

       [ I     0     . . .  0 ]
       [ Φ1    I     . . .  0 ]
  B =  [ ...                  ]
       [ Φh−1  Φh−2  . . .  I ]

and:
  Σu = cov(ut) is the white noise covariance matrix,
  (Ih ⊗ Σu) is a Kronecker product; Ih is (h × h),
  Φi are the coefficient matrices of the Wold moving average representation of a stable VAR(p)-process.

[Figure: forecast from a VAR(5) – example, confidence levels included.]
VAR: Impulse-response functions (IRFs)

IRFs are based on the Wold decomposition of a stable VAR(p).

IRFs describe the dynamic interactions between endogenous variables (provided yt is stationary!).

The [i, j]-th elements of the matrices Φs are (interpreted as) the expected response of variable yi,t+s to a unit change in variable yj,t .

IRFs can be cumulated through time: the [i, j]-th elements of Cs = Σ(l=1..s) Φl measure the accumulated response of variable yi,t+s to a unit change in variable yj,t .

IRFs are used for policy analysis: for individual shocks (shocks in different model equations), we can study the dynamic effects on all variables in the model.

Disturbances in different model equations tend to be contemporaneously correlated: we cannot realistically simulate isolated shocks. Solution: model transformation/orthogonalization . . .
VAR: Impulse-response functions (IRFs)
IRF example:
(from Lütkepohl: “New introduction to multiple time series analysis”)

We start with an estimated 3-dimensional VAR(1) system:

  y1,t = 0.5 y1,t−1 + u1,t
  y2,t = 0.1 y1,t−1 + 0.1 y2,t−1 + 0.3 y3,t−1 + u2,t
  y3,t =              0.2 y2,t−1 + 0.3 y3,t−1 + u3,t

This may be re-written as

  [ y1,t ]   [ 0.5  0    0   ] [ y1,t−1 ]   [ u1,t ]
  [ y2,t ] = [ 0.1  0.1  0.3 ] [ y2,t−1 ] + [ u2,t ]
  [ y3,t ]   [ 0    0.2  0.3 ] [ y3,t−1 ]   [ u3,t ]

Hence:
        [ 0.5  0    0   ]
  A1 =  [ 0.1  0.1  0.3 ]      Also, for a VAR(1) model: Φs = A1^s.
        [ 0    0.2  0.3 ]

How IRFs work: say, at time t, we have a unit disturbance in y2,t (an isolated contemporaneous disturbance through u2,t ). At time t + 1, it causes: y2,t+1 changes by 0.1, y3,t+1 changes by 0.2, and y1,t+1 is unaffected.
VAR: Impulse-response functions (IRFs)
IRF example contd.

Using Φs = A1^s, the IRFs may be generated as follows:

              [ 1      0      0     ]
  Φ0 = A1^0 = [ 0      1      0     ]
              [ 0      0      1     ]

              [ 0.500  0      0     ]
  Φ1 = A1   = [ 0.100  0.100  0.300 ]
              [ 0      0.200  0.300 ]

              [ 0.250  0      0     ]
  Φ2 = A1^2 = [ 0.060  0.070  0.120 ]
              [ 0.020  0.080  0.150 ]

              [ 0.125  0      0     ]
  Φ3 = A1^3 = [ 0.037  0.031  0.057 ]
              [ 0.018  0.038  0.069 ]
  ...

The [3, 1]-th elements of the Φs = A1^s matrices [VAR(1) model . . . ] measure the response of variable y3,t+s to a unit change in variable y1,t (a unit u1,t disturbance occurs at t = 0).

Φs: “impulses/shocks in columns and responses in rows”.
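These matrix powers are easy to verify; a brief R sketch reproducing the Φ matrices above:

# Reproducing the example: for a VAR(1), Phi_s = A1^s.
A1 <- matrix(c(0.5, 0,   0,
               0.1, 0.1, 0.3,
               0,   0.2, 0.3), nrow = 3, byrow = TRUE)
round(A1 %*% A1, 3)          # Phi_2, matches the matrix above
round(A1 %*% A1 %*% A1, 3)   # Phi_3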
VAR: Impulse-response functions (IRFs)
IRF example contd.

Cumulative IRFs may be easily produced and plotted using Cs = Σ(l=1..s) Φl .

[Figure: two panels – IRF (CPI → GDP) on the left; cumulative IRF (CPI → GDP) on the right.]

In a stable VAR(p) model, responses to a one-off shock die out over time (shown left). Hence, the accumulated IRF converges to some constant value (shown right).

Important extensions to VARs are based on assumptions (zero restrictions) on the (asymptotic) behavior of Cs (Blanchard-Quah decomposition; see Lütkepohl: “New introduction to multiple time series analysis”).
VAR: Impulse-response functions (IRFs)

IRF – orthogonalization example

The previous example is based on a – very strong – assumption of uncorrelated random elements of the vector ut . Usually, we cannot realistically simulate isolated shocks on observed variables.

Example: a 2-dimensional VAR(1) model with correlated errors:

  [ y1t ]   [ a11  a12 ] [ y1,t−1 ]   [ u1t ]
  [ y2t ] = [ a21  a22 ] [ y2,t−1 ] + [ u2t ]

where var(u1t) = σ1² and var(u2t) = σ2²,
and, most importantly, cov(u1t , u2t) = E(u1t · u2t) = cov12 ≠ 0.

⇒ It is unrealistic to simulate isolated unit disturbances to ut such as (1, 0)′.

If cov12 ≠ 0, then – for a unit disturbance in u1t – we have:

  dist(ut) = (1, E(1 · u2t))′ = (1, cov12)′
VAR: Impulse-response functions (IRFs)

IRF – orthogonalization example contd.

Base VAR model:
  [ y1t ]   [ a11  a12 ] [ y1,t−1 ]   [ u1t ]
  [ y2t ] = [ a21  a22 ] [ y2,t−1 ] + [ u2t ]

In our VAR(1) model, responses to a unit disturbance in u1t are:

For a ut disturbance (1, 0)′:
  E∆ (y1,t+1 , y2,t+1)′ = [ a11  a12 ; a21  a22 ] (1, 0)′ = (a11 , a21)′

For a ut disturbance (1, cov12)′:
  E∆ (y1,t+1 , y2,t+1)′ = [ a11  a12 ; a21  a22 ] (1, cov12)′ = (a11 + a12 cov12 , a21 + a22 cov12)′

It is virtually impossible to study the isolated effects of individual disturbances (the analysis is even more complicated for p > 1 & m > 2).
VAR: Impulse-response functions (IRFs)

IRF – orthogonalization example contd.

Our sample VAR(1) model:
  [ y1t ]   [ a11  a12 ] [ y1,t−1 ]   [ u1t ]
  [ y2t ] = [ a21  a22 ] [ y2,t−1 ] + [ u2t ]

with var(u1t) = σ1², var(u2t) = σ2² and cov12 ≠ 0; the errors may be transformed (orthogonalized) as follows:

  [ y1t        ]   [ a11          a12         ] [ y1,t−1 ]   [ u1t        ]
  [ y2t − δy1t ] = [ a21 − δa11   a22 − δa12  ] [ y2,t−1 ] + [ u2t − δu1t ]

where δ = cov12/σ1² and cov(u1t , (u2t − δu1t)) = 0.

Hence, IRFs based on the transformed model depict the isolated effects of a given unit disturbance.
VAR: Impulse-response functions (IRFs)

IRF orthogonalization – a generalized approach & notation:

In a VAR(p) model, Σu = cov(ut) is the error-term covariance matrix and it may be expressed as Σu = P P′, where P is a lower triangular matrix (assumptions on Σu apply).

Orthogonalized IRFs from a VAR(p) model
  yt = A1 yt−1 + A2 yt−2 + · · · + Ap yt−p + ut
may be calculated using the transformed MA representation:
  yt = Ψ0 εt + Ψ1 εt−1 + Ψ2 εt−2 + . . . ,
where:
  εt = P⁻¹ ut ,
  Ψi = Φi P for i = 1, 2, . . . ,
  Ψ0 = P.

(Use bootstrapped confidence intervals for the orthogonalized IRFs.)
. . . see Lütkepohl: “New introduction to multiple time series analysis” for a detailed description.
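A hedged R sketch of both steps (the lower-triangular factor P and orthogonalized, bootstrapped IRFs); var_fit and the impulse name are hypothetical, continuing the earlier sketches:

# Hedged sketch: Sigma_u = P P' and orthogonalized IRFs with {vars}.
Sigma_u <- summary(var_fit)$covres   # residual covariance matrix
P <- t(chol(Sigma_u))                # lower triangular: Sigma_u = P %*% t(P)
irf_o <- irf(var_fit, impulse = "y1", n.ahead = 20,
             ortho = TRUE, boot = TRUE, runs = 500)   # bootstrapped CIs
plot(irf_o)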
VAR: Impulse-response functions (IRFs)

IRF orthogonalization – final remarks:

The Σu = P P′ (P-based) transformation is sensitive to the ordering of equations,
i.e. a different ordering of the variables in y often yields different orthogonalized IRFs!

Orthogonalized innovations are difficult to interpret. Even if y1t and y2t are well defined, the dimension of (y2t − δy1t) /see the previous example/ often has no economic interpretation.
We treat orthogonalized IRFs as dimensionless series . . .

Generalized orthogonalization approaches exist (IRFs independent of the ordering of y).
. . . see Lütkepohl: “New introduction to multiple time series analysis”.
VAR: Variance decomposition

Forecast Error Variance Decomposition (FEVD)

FEVD is based on the orthogonalized impulse-response coefficient matrices Ψ.
It is used to analyse the contribution of variable j to the h-step FEV of variable k.
R: use fevd() in {vars} – a sketch follows.
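A hedged sketch, reusing the hypothetical var_fit from above:

fevd_fit <- fevd(var_fit, n.ahead = 10)   # share of each variable j in the
plot(fevd_fit)                            # h-step FEV of each variable k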
VAR: Final remarks

Are VAR models atheoretical?
Essentially yes: there are no prior restrictions on parameters in VARs.
Often, estimated VARs may lead to models consistent with economic theory. For example, for yt′ = (Unemplt , CPIt ), VAR(p) models often generate IRFs consistent with Phillips’ theory. For verification, we use causality tests.

IRF critique
If “important” variables are dropped from a VAR model, IRFs may be significantly distorted.
In most practical cases (yet, not generally), predictions from such “reduced” VARs might remain largely unaffected.
VAR: Final remarks

Selected VAR-related topics & extensions not covered in this course:
  Structural VAR models: SVARs
  Time-varying (and/or) factor-augmented VARs
  Blanchard-Quah decomposition
  . . . many extensions to VARs exist
VECM: Introduction

VAR models may be applied to systems of nonstationary, cointegrated variables:
  forecasting is possible,
  but IRFs do not converge to zero over time if the underlying series are non-stationary . . .
For cointegrated series, we can instead use the error correction mechanism (ECM) to model the short-run dynamics. Such models are named Vector Error Correction Models (VECMs).
The long-term dynamics in the m-dimensional system of variables [given cointegrating relationship(s) exist(s)] are used in a VECM: an ECM-like model in first differences.
Most of the previous discussion of VAR models can be adequately applied to VECMs.
VECM-specific topics follow.
VECM: Number of cointegrating vectors
For an m-dimensional I(1) /non-stationary/ vector y = (y1 , y2 , . . . , ym)′, there are
  0 ≤ r < m possible cointegrating vectors [r = 0 ⇒ non-CI series]
and (m − r) common stochastic trends [if r is nonzero].
(The proof comes from a Beveridge-Nelson decomposition of ∆y.)

y = (y1 , y2)′: at most 1 linearly independent cointegrating vector: r ∈ {0; 1}
  If any (αy1 + βy2) ∼ I(0), we can find infinitely many linear combinations (α, β) that lead to I(0) processes.
  Hence, it is easy and common to normalize the CI relationship by setting α = 1.
  If y1 , y2 ∼ CI(1, 1), a cointegrating vector β = (1, −β2)′ exists, such that (y1 − β2 y2) ∼ I(0).

Example of a CI system for y = (y1 , y2)′ with β = (1, −β2)′ (see the simulation sketch below):
  1. y1t = β2 y2t + ut ,    ⇐ one CI relationship
  2. y2t = y2,t−1 + vt ,    ⇐ one common “stochastic trend”
where ut , vt ∼ I(0)
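A hedged R simulation of this bivariate CI system (β2 and the sample size are hypothetical):

set.seed(1)
n <- 200; beta2 <- 0.8
y2 <- cumsum(rnorm(n))        # y2t = y2,t-1 + vt: the common stochastic trend
y1 <- beta2 * y2 + rnorm(n)   # y1t = beta2*y2t + ut: the CI relationship
# y1 and y2 are each I(1), while y1 - beta2*y2 is stationary, I(0)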
VECM: Number of cointegrating vectors

Example: a CI system for y = (y1 , y2 , y3)′ with r = 1 and β = (1, −β2 , −β3)′:
  1. y1t = β2 y2t + β3 y3t + ut ,  ⇐ one CI relationship
  2. y2t = y2,t−1 + vt ,           ⇐ first common “stochastic trend”
  3. y3t = y3,t−1 + wt ,           ⇐ second common “stochastic trend”
where ut , vt , wt ∼ I(0).
The 1st equation describes the long-run equilibrium; the 2nd and 3rd equations describe the (m − r) common stochastic trends.
Possible economic setup for the example:
  y1t ∼ I(1) nominal F/X rate index for 2 currencies (domestic/foreign),
  y2t ∼ I(1) domestic inflation index (e.g. CPI),
  y3t ∼ I(1) foreign inflation.
According to Zivot and Wang, we can test the hypothesis of a stationary combination of the three series: real exchange rate ∼ I(0).
(With equal domestic and foreign inflation, the F/X rate is unchanged.)
VECM: Number of cointegrating vectors
Example of a CI system for y = (y1 , y2 , y3)′ with r = 2:
  1. y1t = β13 y3t + ut ,  ⇐ first CI relationship
  2. y2t = β23 y3t + vt ,  ⇐ second CI relationship
  3. y3t = y3,t−1 + wt ,   ⇐ a common “stochastic trend”
where ut , vt , wt ∼ I(0).
Here, we have two CI vectors: β1 = (1, 0, −β13)′ and β2 = (0, 1, −β23)′.
Remember: β1 and β2 are normalized – yet not unique – stationary combinations of the variables.
  Any linear combination β3 = c1 β1 + c2 β2 is also a CI vector.
  β1 and β2 form the basis of (span the space of) cointegrating vectors . . .
Financial market-based setup for the example:
  y1t ∼ I(1) long-term interest rate “1” (say, 3M),
  y2t ∼ I(1) long-term interest rate “2” (say, 6M),
  y3t ∼ I(1) short-term interest rate (say, overnight).
The CI relationships indicate that the spreads between the short-run and long-run rates are stationary, i.e. I(0).
VECM: Construction

Start with a VAR(p) model for an m-dimensional I(1) /non-stationary/ vector y = (y1 , y2 , . . . , ym)′:

  yt = ΘDt + A1 yt−1 + A2 yt−2 + · · · + Ap yt−p + ut ,  t = 1, 2, . . . , T,

where Dt contains deterministic terms (constant, trend, seasonality, etc.).

Even if the y series are cointegrated through some CI vector β, the cointegrating relationship is “hidden” in the VAR(p) representation – it only becomes apparent in the first-differences-based VECM.

The VECM model is defined as follows:

  ∆yt = ΘDt + Πyt−1 + Γ1 ∆yt−1 + · · · + Γp−1 ∆yt−p+1 + εt ,

where Πyt−1 is the error correction element of the VECM specification, and:
  Π = A1 + A2 + · · · + Ap − Im is the long-run impact matrix;
  0 ≤ rank(Π) < m defines r, the number of CI vectors;
  Γk = − Σ(j=k+1..p) Aj , k = 1, . . . , p − 1, are the short-run impact matrices.

Note that a VAR(p) is transformed into a “VECM(p − 1)” – the mapping is sketched below.
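A minimal R sketch of this VAR(p) → VECM(p − 1) mapping for p = 2; the coefficient matrices are hypothetical, chosen so that rank(Π) = 1:

# Hedged sketch: VAR(2) -> VECM(1) mapping; A1, A2 are hypothetical.
A1 <- matrix(c(0.7, 0.1, 0.2, 0.8), 2, 2)
A2 <- diag(0.1, 2)
Pi     <- A1 + A2 - diag(2)   # long-run impact matrix: Pi = A1 + A2 - I
Gamma1 <- -A2                 # short-run impact matrix: Gamma_1 = -A2
qr(Pi)$rank                   # here 1, i.e. 0 < rank(Pi) < m: one CI vector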


VECM: Construction

A VAR(1) model for a 2-dimensional vector y = (y1 , y2)′ with y1 , y2 ∼ CI(1, 1):

  yt = A1 yt−1 + ut ,  t = 1, 2, . . . , T

For a bivariate VAR, only one /normalized/ cointegrating vector β = (1, −β2)′ can exist. The VECM(0) model – with no (Γ ∆yt−j) terms – is defined as follows:

  ∆yt = Πyt−1 + εt ,

where Π = (A1 − I2) is a (2 × 2) matrix with rank r = 1. Π may be decomposed as follows:

  Π = αβ′ = [ α1 ] (1, −β2) = [ α1  −α1 β2 ]
            [ α2 ]            [ α2  −α2 β2 ]

To understand the decomposition, we may re-write the VECM:

  ∆y1t = α1 (y1,t−1 − β2 y2,t−1) + ε1t
  (the 1st equation relates changes in y1t to the disequilibrium error (y1,t−1 − β2 y2,t−1))

  ∆y2t = α2 (y1,t−1 − β2 y2,t−1) + ε2t
  (the 2nd equation relates changes in y2t to the same disequilibrium error)
VECM: Johansen’s methodology

1. Specify & estimate the m-dimensional VAR(p) model for yt .
2. Construct (likelihood ratio) tests to determine the rank of Π, i.e. to determine the number of cointegrating vectors.
3. Impose normalization and identifying restrictions on the cointegrating vectors (if necessary).
4. Given the normalized CI vectors, estimate the resulting cointegrated VECM using maximum likelihood.
✓ IRFs and forecasts may be generated after a VECM is estimated . . .
VECM: Johansen’s methodology

∆yt = ΘDt + Πyt−1 + Γ1 ∆yt−1 + · · · + Γp−1 ∆yt−p+1 + εt

If yt ∼ I(1) ⇒ Π is a singular matrix: 0 ≤ rank(Π) < m:

1. rank(Π) = 0 ⇒ Π = 0:
   yt is not cointegrated and the VECM reduces to a VAR in 1st differences.

2. 0 < rank(Π) = r < m:
   yt is cointegrated with r linearly independent CI vectors and (m − r) common stochastic trends.

3. . . . rank(Π) = m ⇔ yt ∼ I(0):
   full rank of Π means that yt is in fact stationary.
VECM: Johansen’s methodology

Johansen’s Trace Statistic & test

Based on the estimated eigenvalues of the matrix Π:
  λ̂1 > λ̂2 > · · · > λ̂m , where 0 ≤ λ̂j < 1

  H0(r0): r = r0
  H1(r0): r > r0
(where r0 is the number of nonzero eigenvalues under H0)

Under H0, the eigenvalues λ̂r0+1 , . . . , λ̂m should be close to zero (as should the LRtr statistic):

  LRtr(r0) = −T Σ(i=r0+1..m) log(1 − λ̂i)

Under H0, LRtr follows a multivariate Dickey-Fuller distribution.
VECM: Johansen’s methodology

Testing sequence for Johansen’s Trace-Statistic test:

  1. H0: r = 0      vs. H1: 0 < r ≤ m
  2. H0: r = 1      vs. H1: 1 < r ≤ m
  3. H0: r = 2      vs. H1: 2 < r ≤ m
  · · ·
  m. H0: r = m − 1  vs. H1: r = m

We keep increasing r until we no longer reject the null.
VECM: Johansen’s methodology

Johansen’s Maximum Eigenvalue statistic & test

Based on the estimated eigenvalues of the matrix Π:
  λ̂1 > λ̂2 > · · · > λ̂m , where 0 ≤ λ̂j < 1

  H0(r0): r = r0
  H1(r0): r = r0 + 1

  LRmax(r0) = −T log(1 − λ̂r0+1)

Under H0, LRmax follows a complex multivariate distribution (critical values are implemented in common software).

The testing sequence is analogous to the Trace test; an R sketch of both tests follows.
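A hedged R sketch of both Johansen tests and the subsequent estimation, via the {urca} package; the data object y_lev (a matrix of I(1) series) and the lag order K are hypothetical:

# Hedged sketch: Johansen cointegration tests with {urca}.
library(urca)
jo_tr <- ca.jo(y_lev, type = "trace", ecdet = "const", K = 2)
summary(jo_tr)                      # trace statistics vs. critical values
jo_ev <- ca.jo(y_lev, type = "eigen", ecdet = "const", K = 2)
summary(jo_ev)                      # maximum-eigenvalue variant
vecm_fit <- cajorls(jo_tr, r = 1)   # estimation of the VECM given r CI vector(s)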


VAR & VECM – Other extensions

Advanced analysis of univariate and multivariate time series (with examples in R):

http://faculty.washington.edu/ezivot/econ589/manual.pdf
