
Block 5

Multivariate Time Series analysis


VAR models & IRFs, VECM

Advanced econometrics 1 4EK608


Pokročilá ekonometrie 1 4EK416

Vysoká škola ekonomická v Praze


Content

1 VAR: Vector autoregression models

2 VAR: Model setup and testing

3 VAR: Forecasting

4 VAR: Impulse-response functions (IRFs)

5 VAR: Variance decomposition

6 VECM

7 VAR & VECM – Other extensions


VAR: Vector autoregression models

VAR model: introduction

Univariate autoregressive models (AR models) describe specific time-varying processes in nature, the economy, etc.
AR processes/models may be either stationary or non-stationary.
AR(p) model (autoregressive model of order p): the modelled variable depends linearly on its own previous values and a stochastic term:

  AR(p): yt = c + ρ1 yt−1 + ρ2 yt−2 + · · · + ρp yt−p + ut = c + Σ(i=1..p) ρi yt−i + ut

VAR models generalize the univariate autoregressive model (AR model) by allowing for more than one evolving variable:
yt becomes the vector yt , where yt′ = (y1t , y2t , . . . , ymt )
VAR models capture linear interdependencies among multiple time series.
Atheoretical: the only prior knowledge required for VAR modeling is a list of variables which can be hypothesized to affect each other inter-temporally.
VAR: Vector autoregression models

VAR models: origins


C. Sims (Nobel prize) reacted in the 1980s against SEMs (simultaneous equations models). His arguments:
Large scale macroeconomic models failed in providing governments
with adequate economic forecasts.
For identification, some SEM variables must be omitted (e.g. through
zero restrictions on parameters). Such omissions are often arbitrary
and lack justification (hence undermine model credibility).
Endogenous/exogenous division of variables tends to be arbitrary.
VAR models vs. SEMs:
Even very simple VAR models usually provide more realistic
predictions of variables involved (compared to SEMs).
In VAR models, all variables are treated as endogenous.
No zero-restrictions placed on parameters (such restrictions may be
easily applied if empirically convenient).
VAR models make for a versatile toolbox, many extensions and
generalizations are possible.
VAR: Vector autoregression models
VAR model: notation
We work with m-variable (m-dimensional) VAR(p) models
A two-variable VAR(3) model may be denoted as follows:
  yt = c + A1 yt−1 + A2 yt−2 + A3 yt−3 + ut = c + Σ(i=1..3) Ai yt−i + ut

or:

  [ y1t ]   [ c1 ]   [ a11,1  a12,1 ] [ y1,t−1 ]   [ a11,2  a12,2 ] [ y1,t−2 ]   [ a11,3  a12,3 ] [ y1,t−3 ]   [ u1t ]
  [ y2t ] = [ c2 ] + [ a21,1  a22,1 ] [ y2,t−1 ] + [ a21,2  a22,2 ] [ y2,t−2 ] + [ a21,3  a22,3 ] [ y2,t−3 ] + [ u2t ]
or:
y1t = c1 + a11,1 y1t−1 + a12,1 y2t−1 + a11,2 y1t−2 + a12,2 y2t−2 + a11,3 y1t−3 +
+ a12,3 y2t−3 + u1t
y2t = c2 + a21,1 y1t−1 + a22,1 y2t−1 + a21,2 y1t−2 + a22,2 y2t−2 + a21,3 y1t−3 +
+ a22,3 y2t−3 + u2t
VAR: Vector autoregression models

VAR model: notation


Any VAR(p) specification can be equivalently rewritten as a VAR(1) by stacking the lags of the VAR(p) variables and by appending identities to complete the number of equations.
For example, a VAR(2) model yt = c + A1 yt−1 + A2 yt−2 + ut can be written as a VAR(1):

  [ yt   ]   [ c ]   [ A1  A2 ] [ yt−1 ]   [ ut ]
  [ yt−1 ] = [ 0 ] + [ I   0  ] [ yt−2 ] + [ 0  ]

In general, any m-dimensional VAR(p) model may be re-written as:

  Yt = v + A Yt−1 + Ut , where:

  Yt := (yt′, yt−1′, . . . , yt−p+1′)′ ,  v := (c′, 0′, . . . , 0′)′ ,  Ut := (ut′, 0′, . . . , 0′)′ ,

        [ A1  A2  . . .  Ap−1  Ap ]
        [ Im  0   . . .  0     0  ]
  A :=  [ 0   Im  . . .  0     0  ]
        [ ...                 ... ]
        [ 0   0   . . .  Im    0  ]

dim: Yt , v, Ut are (mp × 1); A is (mp × mp).


VAR: Vector autoregression models
y1t = c1 + a11,1 y1t−1 + a12,1 y2t−1 + a11,2 y1t−2 + a12,2 y2t−2 + a11,3 y1t−3 +
+ a12,3 y2t−3 + u1t
y2t = c2 + a21,1 y1t−1 + a22,1 y2t−1 + a21,2 y1t−2 + a22,2 y2t−2 + a21,3 y1t−3 +
+ a22,3 y2t−3 + u2t

All regressors are lagged variables: they can be assumed contemporaneously uncorrelated with the disturbances u1t , u2t . Hence, each equation can be consistently estimated by OLS on an individual basis.
We have identical regressors in all individual VAR model equations. Often, we observe contemporaneous correlation between endogenous variables; hence, in practical applications, the elements of ut tend to be contemporaneously correlated. However, FGLS methodology does not bring any improvement to model estimation (refer to the SURE-related discussion in Block 3).
For forecasting into the t + 1 period, only current and past values of the y variables are required (generally, these are readily available).
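As an illustration (a hedged sketch, not part of the original slides): in R, the {vars} package estimates a VAR equation by equation via OLS; the data and dimensions below are hypothetical.

library(vars)
set.seed(42)
# hypothetical two-variable data set; in practice, use observed stationary TS
y <- matrix(rnorm(400), ncol = 2, dimnames = list(NULL, c("y1", "y2")))
var_fit <- VAR(y, p = 3, type = "const")   # OLS applied to each equation separately
summary(var_fit$varresult$y1)              # the y1 equation is a standard lm fit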
VAR: Model setup and testing
What do we need to set up a VAR model?

A (small) set of endogenous variables:
In empirical applications, we would use up to six variables (roughly). However, there is no theoretical upper limit for m, given adequately long TS are available for model estimation.

Decision on a lag length:
The VAR approach assumes a single lag-length for the whole model (the same for each equation).
Longer is preferable with this method: sufficient lags have to be included to ensure non-autocorrelated residuals in all equations.
On the other hand, because of the degrees-of-freedom problem (over-parameterization), variables frequently have to be excluded from the model and a limit has to be placed on the length of lags.

Decision on the inclusion of deterministic/exogenous regressors:
Trend, seasonal dummies, “pulse” variables (often, time dummies controlling for one-off events), exogenous regressors (e.g. oil price).
VAR(p) models augmented by deterministic (and lagged exogenous) regressors are often called VARX(p) models.
VAR: Model setup and testing

How do we choose endogenous variables for a VAR model?

Prior information (economic theory) is applied to select model variables.

Granger Causality tests are applied for setup verification:
Definition of Granger Causality (GC): X is said to be a Granger cause of Y if present Y can be predicted with greater accuracy by using past values of X rather than by not using such past values, all other relevant information being identical.
The definition easily extends to the situation where X and Y are multidimensional processes.
GC is a statistical concept (it is not an “actual” causality).
Different types of GC tests exist (Sims test, modified Sims test).
GC tests apply to stationary series!
VAR: Model setup and testing

Granger Causality tests: F test for a 2-variable VAR(p) model (direct GC test: X → Y)

In a VAR(p) model with two variables (Y, X), we can use a simple F test for multiple linear restrictions to test the H0 of no Granger causality.

We start with a full (unrestricted) VAR(p) equation:

  yt = c + Σ(i=1..p) αi yt−i + Σ(i=1..p) βi xt−i + ut

Under H0 of no GC, all βi = 0, and the restricted equation is:

  yt = c + Σ(i=1..p) αi yt−i + ut

Test statistic:

  F = [(SSRR − SSRUR)/q] / [SSRUR/(T − k)] ∼ F(q; T − k) under H0

where:
  k – number of estimated parameters in the UR model,
  q – number of restrictions imposed on the R model,
  T – number of observations.
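A hedged R illustration: the same type of hypothesis can be tested with causality() from {vars} (the model object and series name are hypothetical, continuing the sketch above):

causality(var_fit, cause = "y2")$Granger   # Wald-type test of H0: y2 does not Granger-cause y1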
VAR: Model setup and testing

Wald test for GC

In an m-dimensional VAR(p) model, we can partition yt into two processes: xt and zt .
Then, the H0 of non-causality between xt and zt (GC-type) may be characterized – and tested – using specific zero constraints on the coefficients of the estimated VAR(p) system.
The Wald test may be used as an asymptotic test for such constraints.

(see Lütkepohl: “New introduction to multiple time series analysis”)


VAR: Model setup and testing

Limitations and drawbacks in Granger Causality testing

In practical applications, the “all other relevant information being identical” clause in the GC definition may cause problems, as the results of GC testing between X and Y are sensitive to the information (variables, lags) included in the system.

Data frequency can have an important impact. For example, if GC is found in monthly data, this does not necessarily imply GC in daily/weekly/quarterly/annual series of the same variables. The same applies to seasonally adjusted/unadjusted series.

GC tests are performed on estimated rather than known systems.
VAR: Model setup and testing

How do we decide on the lag-length of a VAR(p) model?

When estimating VARs or conducting GC tests, results can be sensitive to the lag length of the VAR.
Sometimes the VAR model lag length corresponds to the data frequency, such that models with quarterly data have 4 lags, monthly data are used with 12 lags, etc.
A more rigorous way to determine the optimal lag length is to use the Akaike or Schwarz-Bayesian information criteria (IC).
However, VAR model estimates tend to be sensitive to the presence of autocorrelation. In such cases, after using an IC, if there is any evidence of autocorrelation, further lags are added, above the number indicated by the IC, until the autocorrelation is removed.
VAR: Model setup and testing

How do we decide on the lag-length of a VAR(p) model?

The main information criteria are the Schwarz-Bayesian criterion (SIC, SBIC) and the Akaike criterion (AIC).
They operate on the basis that there are two competing factors related to adding more lags to a model: more lags will reduce the RSS, but also generate a loss of degrees of freedom (a penalty for complexity).
The aim is to minimize the IC value: adding an extra lag will only benefit the model if the reduction in the RSS outweighs the loss of degrees of freedom.
In general, the SBIC has a harsher complexity penalty term than the AIC (sometimes leading to a smaller selected lag order p).
VAR: Model setup and testing

How do we decide on the lag-length of a VAR(p) model?

Single-equation statistics:

  AIC  = log(σ̂²) + 2k/T
  SBIC = log(σ̂²) + (k/T)·log(T)

Multivariate statistics:

  MAIC  = log|Σ̂| + 2k′/T
  MSBIC = log|Σ̂| + (k′/T)·log(T)

where:
  σ̂² – residual variance; T – sample size (number of observations); k – number of parameters;
  Σ̂ – covariance matrix of the residuals; k′ – total number of regressors in all equations.
VAR: Model setup and testing

Lag-length selection – empirical example (Stata output):


varsoc oilp igae_s ex_rate impi ppi cpi i_rate if time>= tm(2001m7), maxlag(6)
Selection-order criteria
Sample: 2001m7 - 2013m2 Number of obs = 140
lag LL LR df p FPE AIC HQIC SBIC
0 1951.92 2.0e-21 -27.7845 -27.7247 -27.6374
1 3360.47 2817.1 49 0.000 7.4e-30 -47.2067 -46.7285 -46.03*
2 3458.16 195.39 49 0.000 3.7e-30* -47.9023* -47.0058* -45.6961
3 3494.82 73.311 49 0.014 4.5e-30 -47.7259 -46.411 -44.4901
4 3528.78 67.922 49 0.038 5.7e-30 -47.5111 -45.7778 -43.2457
5 3562.2 66.846 49 0.046 7.4e-30 -47.2886 -45.1369 -41.9936
6 3603.7 82.998 49 0.002 8.8e-30 -47.1814 -44.6113 -40.8569
Endogenous: oilp igae_s ex_rate impi ppi cpi i_rate
Exogenous: _cons
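For comparison, a hedged one-line R equivalent of this lag-selection step with {vars} (the data object y is hypothetical):

library(vars)
VARselect(y, lag.max = 6, type = "const")   # reports AIC(n), HQ(n), SC(n) and FPE(n) per lag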
VAR: Model setup and testing

Stability testing of a VAR model:

A VAR(1) process yt = v + A1 yt−1 + ut is stable if the following condition is met:

  det(Im − A1 z) ≠ 0 for |z| ≤ 1,

alternatively:

  lim(n→∞) A1^n = 0m

where:
  z is a scalar (number),
  Im – identity matrix, where m is the number of variables in the VAR,
  0m – (m × m) zero matrix.

Note: Any VAR(p) model may be re-written as VAR(1) . . .

[Figure: graphical representation of the stability condition (example) – if the moduli of the eigenvalues of A1, plotted against the unit circle, are all less than one, then the VAR(p)-process is stable.]
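In R, the stability condition can be checked numerically; a hedged sketch with the hypothetical var_fit from above:

roots(var_fit)   # moduli of the companion-matrix eigenvalues; all < 1 => stable VAR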
VAR: Model setup and testing

VAR model testing (residuals):

Serial correlation tests: Portmanteau, Breusch & Godfrey
Heteroskedasticity tests: ARCH
Normality tests: Jarque & Bera, etc.
Structural stability: CUSUM

Functions serial.test(), arch.test(), normality.test() and stability() in the R package {vars} (sketched below).
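A hedged sketch of these residual diagnostics, reusing the hypothetical var_fit from above:

serial.test(var_fit, lags.pt = 16, type = "PT.asymptotic")   # Portmanteau test
serial.test(var_fit, lags.bg = 5, type = "BG")               # Breusch-Godfrey LM test
arch.test(var_fit)                                           # multivariate ARCH-LM test
normality.test(var_fit)                                      # Jarque-Bera type tests
plot(stability(var_fit, type = "OLS-CUSUM"))                 # CUSUM stability plots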
VAR: Forecasting

VAR-based Forecasting: Introduction

Wold decomposition of stable VAR(p) models:
. . . a concept useful for forecasting and IRF construction.

A simplified stable VAR(p) model yt = A1 yt−1 + A2 yt−2 + · · · + Ap yt−p + ut can be written as:

  yt = Φ0 ut + Φ1 ut−1 + Φ2 ut−2 + . . . ,

where Φs = Σ(j=1..s) Φs−j Aj for s = 1, 2, . . . ; Aj = 0 for j > p;
Φ0 = Im is the (m × m) identity matrix.

See Lütkepohl: “New introduction to multiple time series analysis” for derivation and discussion.
VAR: Forecasting

Wold decomposition examples:

Stable VAR(2) model (simplified: no intercept term):
  yt = A1 yt−1 + A2 yt−2 + ut
may be written as:
  yt = Φ0 ut + Φ1 ut−1 + Φ2 ut−2 + . . . ,
where:
  Φ0 = Im
  Φ1 = Φ0 A1
  Φ2 = Φ1 A1 + Φ0 A2
  Φ3 = Φ2 A1 + Φ1 A2
  . . .
  Φs = Σ(j=1..s) Φs−j Aj = Φs−1 A1 + Φs−2 A2
  (s = 1, 2, . . . ; Aj = 0 for j > p)

Stable VAR(1) model (any VAR(p) may be written as a VAR(1)):
  yt = A1 yt−1 + ut
may be written as:
  yt = Φ0 ut + Φ1 ut−1 + Φ2 ut−2 + . . . ,
where:
  Φ0 = Im
  Φ1 = Φ0 A1 = A1
  Φ2 = Φ1 A1 = A1 A1 = A1^2
  Φ3 = Φ2 A1 = A1^3
  . . .
  Φs = A1^s  (note that Φ0 = A1^0 = Im)
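The recursion Φs = Σ(j=1..s) Φs−j Aj is straightforward to compute; a hedged R sketch with hypothetical VAR(2) coefficient matrices:

A <- list(matrix(c(0.5, 0.1, 0.0, 0.3), 2, 2),   # hypothetical A1
          matrix(c(0.2, 0.0, 0.1, 0.1), 2, 2))   # hypothetical A2
p <- 2; m <- 2; s_max <- 5
Phi <- vector("list", s_max + 1)
Phi[[1]] <- diag(m)                              # Phi_0 = I_m
for (s in 1:s_max) {
  Phi[[s + 1]] <- matrix(0, m, m)
  for (j in 1:min(s, p)) {                       # A_j = 0 for j > p
    Phi[[s + 1]] <- Phi[[s + 1]] + Phi[[s - j + 1]] %*% A[[j]]
  }
}
# For an estimated model, {vars} provides Phi(var_fit, nstep = s_max).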
VAR: Forecasting

Iterative forecasting

A VAR(p) model is estimated using observations t = 1, 2, . . . , T:
  yt = Â1 yt−1 + Â2 yt−2 + · · · + Âp yt−p + ût
and for t = T:
  yT = Â1 yT−1 + Â2 yT−2 + · · · + Âp yT−p + ûT .

Arbitrarily long forecasts (t = T+1, T+2, . . . , T+h) can be iteratively produced using the estimated A-matrices and the observed (t = 1, 2, . . . , T) and predicted (t = T+1, T+2, . . . , T+h) values of yt :

  ŷT+1 = Â1 yT + Â2 yT−1 + · · · + Âp yT−p+1
  ŷT+2 = Â1 ŷT+1 + Â2 yT + · · · + Âp yT−p+2
  ŷT+3 = Â1 ŷT+2 + Â2 ŷT+1 + · · · + Âp yT−p+3
  ......

As we move through the prediction time period, predicted values are used as regressors for subsequent periods & predictions . . .
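In R, this iterative scheme is what the predict() method implements for a fitted VAR; a hedged sketch with the hypothetical var_fit from above:

fc <- predict(var_fit, n.ahead = 8, ci = 0.95)   # iterative 8-step forecasts
fanchart(fc)                                     # forecast plot with confidence bands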
VAR: Forecasting

Forecast error covariance matrix:

      [ yT+1 − ŷT+1 ]
  cov [     ...     ] = B (Ih ⊗ Σu) B′ ,
      [ yT+h − ŷT+h ]

where

       [ I     0     . . .  0 ]
       [ Φ1    I     . . .  0 ]
  B =  [ ...                  ]
       [ Φh−1  Φh−2  . . .  I ]

and:
  Σu = cov(ut) is the white noise covariance matrix,
  (Ih ⊗ Σu) is a Kronecker product; Ih is (h × h),
  Φi are the coefficient matrices of the Wold moving average representation of a stable VAR(p)-process.

[Figure: forecast from a VAR(5) – example, confidence levels included.]
VAR: Impulse-response functions (IRFs)

IRFs are based on the Wold decomposition of a stable VAR(p).

IRFs describe the dynamic interactions between endogenous variables (provided yt is stationary!).

The [i, j]-th elements of the matrices Φs are (interpreted as) the expected response of variable yi,t+s to a unit change in variable yj,t .

IRFs can be cumulated through time: the [i, j]-th elements of Cs = Σ(l=1..s) Φl measure the accumulated response of variable yi,t+s to a unit change in variable yj,t .

IRFs are used for policy analysis: for individual shocks (shocks in different model equations), we can study the dynamic effects on all variables in the model.

Disturbances in different model equations tend to be contemporaneously correlated: we cannot realistically simulate isolated shocks. Solution: model transformation/orthogonalization . . .
VAR: Impulse-response functions (IRFs)
IRF example:
(from Lütkepohl: “New introduction to multiple time series analysis”)

We start with an estimated 3-dimensional VAR(1) system:

  y1,t = 0.5 y1,t−1 + u1,t
  y2,t = 0.1 y1,t−1 + 0.1 y2,t−1 + 0.3 y3,t−1 + u2,t
  y3,t =              0.2 y2,t−1 + 0.3 y3,t−1 + u3,t

This may be re-written as

  [ y1,t ]   [ 0.5  0    0   ] [ y1,t−1 ]   [ u1,t ]
  [ y2,t ] = [ 0.1  0.1  0.3 ] [ y2,t−1 ] + [ u2,t ]
  [ y3,t ]   [ 0    0.2  0.3 ] [ y3,t−1 ]   [ u3,t ]

Hence:
        [ 0.5  0    0   ]
  A1 =  [ 0.1  0.1  0.3 ]      Also, for a VAR(1) model: Φs = A1^s.
        [ 0    0.2  0.3 ]

How IRFs work: say, at time t, we have a unit disturbance in y2,t (an isolated contemporaneous disturbance through u2,t ). At time t + 1, it causes: y2,t+1 changes by 0.1, y3,t+1 changes by 0.2, and y1,t+1 is unaffected.
VAR: Impulse-response functions (IRFs)
IRF example contd.

Using Φs = A1^s, the IRFs may be generated as follows:

              [ 1      0      0     ]
  Φ0 = A1^0 = [ 0      1      0     ]
              [ 0      0      1     ]

              [ 0.500  0      0     ]
  Φ1 = A1   = [ 0.100  0.100  0.300 ]
              [ 0      0.200  0.300 ]

              [ 0.250  0      0     ]
  Φ2 = A1^2 = [ 0.060  0.070  0.120 ]
              [ 0.020  0.080  0.150 ]

              [ 0.125  0      0     ]
  Φ3 = A1^3 = [ 0.037  0.031  0.057 ]
              [ 0.018  0.038  0.069 ]
  ...

The [3, 1]-th elements of the Φs = A1^s matrices [VAR(1) model . . . ] measure the response of variable y3,t+s to a unit change in variable y1,t (a unit u1,t disturbance occurs at t = 0).

Φs: “impulses/shocks in columns and responses in rows”.
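These matrix powers are easy to verify; a brief R sketch reproducing the Φ matrices above:

# Reproducing the example: for a VAR(1), Phi_s = A1^s.
A1 <- matrix(c(0.5, 0,   0,
               0.1, 0.1, 0.3,
               0,   0.2, 0.3), nrow = 3, byrow = TRUE)
round(A1 %*% A1, 3)          # Phi_2, matches the matrix above
round(A1 %*% A1 %*% A1, 3)   # Phi_3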
VAR: Impulse-response functions (IRFs)
IRF example contd.

Cumulative IRFs may be easily produced and plotted using Cs = Σ(l=1..s) Φl .

[Figure: two panels – IRF (CPI → GDP) on the left; cumulative IRF (CPI → GDP) on the right.]

In a stable VAR(p) model, responses to a one-off shock die out over time (shown left). Hence, the accumulated IRF converges to some constant value (shown right).

Important extensions to VARs are based on assumptions (zero restrictions) on the (asymptotic) behavior of Cs (Blanchard-Quah decomposition; see Lütkepohl: “New introduction to multiple time series analysis”).
VAR: Impulse-response functions (IRFs)

IRF – orthogonalization example

The previous example is based on a – very strong – assumption of uncorrelated random elements of the vector ut . Usually, we cannot realistically simulate isolated shocks on observed variables.

Example: a 2-dimensional VAR(1) model with correlated errors:

  [ y1t ]   [ a11  a12 ] [ y1,t−1 ]   [ u1t ]
  [ y2t ] = [ a21  a22 ] [ y2,t−1 ] + [ u2t ]

where var(u1t) = σ1² and var(u2t) = σ2²,
and, most importantly, cov(u1t , u2t) = E(u1t · u2t) = cov12 ≠ 0.

⇒ It is unrealistic to simulate isolated unit disturbances to ut such as (1, 0)′.

If cov12 ≠ 0, then – for a unit disturbance in u1t – we have:

  dist(ut) = (1, E(1 · u2t))′ = (1, cov12)′
VAR: Impulse-response functions (IRFs)

IRF – orthogonalization example contd.

Base VAR model:
  [ y1t ]   [ a11  a12 ] [ y1,t−1 ]   [ u1t ]
  [ y2t ] = [ a21  a22 ] [ y2,t−1 ] + [ u2t ]

In our VAR(1) model, responses to a unit disturbance in u1t are:

For a ut disturbance (1, 0)′:
  E∆ (y1,t+1 , y2,t+1)′ = [ a11  a12 ; a21  a22 ] (1, 0)′ = (a11 , a21)′

For a ut disturbance (1, cov12)′:
  E∆ (y1,t+1 , y2,t+1)′ = [ a11  a12 ; a21  a22 ] (1, cov12)′ = (a11 + a12 cov12 , a21 + a22 cov12)′

It is virtually impossible to study the isolated effects of individual disturbances (the analysis is even more complicated for p > 1 & m > 2).
VAR: Impulse-response functions (IRFs)

IRF – orthogonalization example contd.

Our sample VAR(1) model:
  [ y1t ]   [ a11  a12 ] [ y1,t−1 ]   [ u1t ]
  [ y2t ] = [ a21  a22 ] [ y2,t−1 ] + [ u2t ]

with var(u1t) = σ1², var(u2t) = σ2² and cov12 ≠ 0; the errors may be transformed (orthogonalized) as follows:

  [ y1t        ]   [ a11          a12         ] [ y1,t−1 ]   [ u1t        ]
  [ y2t − δy1t ] = [ a21 − δa11   a22 − δa12  ] [ y2,t−1 ] + [ u2t − δu1t ]

where δ = cov12/σ1² and cov(u1t , (u2t − δu1t)) = 0.

Hence, IRFs based on the transformed model depict the isolated effects of a given unit disturbance.
VAR: Impulse-response functions (IRFs)

IRF orthogonalization – a generalized approach & notation:

In a VAR(p) model, Σu = cov(ut) is the error-term covariance matrix and it may be expressed as Σu = P P′, where P is a lower triangular matrix (assumptions on Σu apply).

Orthogonalized IRFs from a VAR(p) model
  yt = A1 yt−1 + A2 yt−2 + · · · + Ap yt−p + ut
may be calculated using the transformed MA representation:
  yt = Ψ0 εt + Ψ1 εt−1 + Ψ2 εt−2 + . . . ,
where:
  εt = P⁻¹ ut ,
  Ψi = Φi P for i = 1, 2, . . . ,
  Ψ0 = P.

(Use bootstrapped confidence intervals for the orthogonalized IRFs.)
. . . see Lütkepohl: “New introduction to multiple time series analysis” for a detailed description.
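A hedged R sketch of both steps (the lower-triangular factor P and orthogonalized, bootstrapped IRFs); var_fit and the impulse name are hypothetical, continuing the earlier sketches:

# Hedged sketch: Sigma_u = P P' and orthogonalized IRFs with {vars}.
Sigma_u <- summary(var_fit)$covres   # residual covariance matrix
P <- t(chol(Sigma_u))                # lower triangular: Sigma_u = P %*% t(P)
irf_o <- irf(var_fit, impulse = "y1", n.ahead = 20,
             ortho = TRUE, boot = TRUE, runs = 500)   # bootstrapped CIs
plot(irf_o)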
VAR: Impulse-response functions (IRFs)

IRF orthogonalization – final remarks:

The Σu = P P′ (P-based) transformation is sensitive to the ordering of equations,
i.e. a different ordering of the variables in y often yields different orthogonalized IRFs!

Orthogonalized innovations are difficult to interpret. Even if y1t and y2t are well defined, the dimension of (y2t − δy1t) /see the previous example/ often has no economic interpretation.
We treat orthogonalized IRFs as dimensionless series . . .

Generalized orthogonalization approaches exist (IRFs independent of the ordering of y).
. . . see Lütkepohl: “New introduction to multiple time series analysis”.
VAR: Variance decomposition

Forecast Error Variance Decomposition (FEVD)

FEVD is based on the orthogonalized impulse-response coefficient matrices Ψ.
It is used to analyse the contribution of variable j to the h-step FEV of variable k.
R: use fevd() in {vars} – a sketch follows.
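A hedged sketch, reusing the hypothetical var_fit from above:

fevd_fit <- fevd(var_fit, n.ahead = 10)   # share of each variable j in the
plot(fevd_fit)                            # h-step FEV of each variable k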
VAR: Final remarks

Are VAR models atheoretical?
Essentially yes: there are no prior restrictions on parameters in VARs.
Often, estimated VARs may lead to models consistent with economic theory. For example, for yt′ = (Unemplt , CPIt ), VAR(p) models often generate IRFs consistent with Phillips’ theory. For verification, we use causality tests.

IRF critique
If “important” variables are dropped from a VAR model, IRFs may be significantly distorted.
In most practical cases (yet, not generally), predictions from such “reduced” VARs might remain largely unaffected.
VAR: Final remarks

Selected VAR-related topics & extensions not covered in this course:
  Structural VAR models: SVARs
  Time-varying (and/or) factor-augmented VARs
  Blanchard-Quah decomposition
  . . . many extensions to VARs exist
VECM: Introduction

VAR models may be applied to systems of nonstationary, cointegrated variables:
  forecasting is possible,
  but IRFs do not converge to zero over time if the underlying series are non-stationary . . .
For cointegrated series, we can instead use the error correction mechanism (ECM) to model the short-run dynamics. Such models are named Vector Error Correction Models (VECMs).
The long-term dynamics in the m-dimensional system of variables [given cointegrating relationship(s) exist(s)] are used in a VECM: an ECM-like model in first differences.
Most of the previous discussion of VAR models can be adequately applied to VECMs.
VECM-specific topics follow.
VECM: Number of cointegrating vectors
For an m-dimensional I(1) /non-stationary/ vector y = (y1 , y2 , . . . , ym)′, there are
  0 ≤ r < m possible cointegrating vectors [r = 0 ⇒ non-CI series]
and (m − r) common stochastic trends [if r is nonzero].
(The proof comes from a Beveridge-Nelson decomposition of ∆y.)

y = (y1 , y2)′: at most 1 linearly independent cointegrating vector: r ∈ {0; 1}
  If any (αy1 + βy2) ∼ I(0), we can find infinitely many linear combinations (α, β) that lead to I(0) processes.
  Hence, it is easy and common to normalize the CI relationship by setting α = 1.
  If y1 , y2 ∼ CI(1, 1), a cointegrating vector β = (1, −β2)′ exists, such that (y1 − β2 y2) ∼ I(0).

Example of a CI system for y = (y1 , y2)′ with β = (1, −β2)′ (see the simulation sketch below):
  1. y1t = β2 y2t + ut ,    ⇐ one CI relationship
  2. y2t = y2,t−1 + vt ,    ⇐ one common “stochastic trend”
where ut , vt ∼ I(0)
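A hedged R simulation of this bivariate CI system (β2 and the sample size are hypothetical):

set.seed(1)
n <- 200; beta2 <- 0.8
y2 <- cumsum(rnorm(n))        # y2t = y2,t-1 + vt: the common stochastic trend
y1 <- beta2 * y2 + rnorm(n)   # y1t = beta2*y2t + ut: the CI relationship
# y1 and y2 are each I(1), while y1 - beta2*y2 is stationary, I(0)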
VECM: Number of cointegrating vectors

Example: a CI system for y = (y1 , y2 , y3)′ with r = 1 and β = (1, −β2 , −β3)′:
  1. y1t = β2 y2t + β3 y3t + ut ,  ⇐ one CI relationship
  2. y2t = y2,t−1 + vt ,           ⇐ first common “stochastic trend”
  3. y3t = y3,t−1 + wt ,           ⇐ second common “stochastic trend”
where ut , vt , wt ∼ I(0).
The 1st equation describes the long-run equilibrium; the 2nd and 3rd equations describe the (m − r) common stochastic trends.
Possible economic setup for the example:
  y1t ∼ I(1) nominal F/X rate index for 2 currencies (domestic/foreign),
  y2t ∼ I(1) domestic inflation index (e.g. CPI),
  y3t ∼ I(1) foreign inflation.
According to Zivot and Wang, we can test the hypothesis of a stationary combination of the three series: real exchange rate ∼ I(0).
(With equal domestic and foreign inflation, the F/X rate is unchanged.)
VECM: Number of cointegrating vectors
Example of a CI system for y = (y1 , y2 , y3)′ with r = 2:
  1. y1t = β13 y3t + ut ,  ⇐ first CI relationship
  2. y2t = β23 y3t + vt ,  ⇐ second CI relationship
  3. y3t = y3,t−1 + wt ,   ⇐ a common “stochastic trend”
where ut , vt , wt ∼ I(0).
Here, we have two CI vectors: β1 = (1, 0, −β13)′ and β2 = (0, 1, −β23)′.
Remember: β1 and β2 are normalized – yet not unique – stationary combinations of the variables.
  Any linear combination β3 = c1 β1 + c2 β2 is also a CI vector.
  β1 and β2 form the basis of (span the space of) cointegrating vectors . . .
Financial market-based setup for the example:
  y1t ∼ I(1) long-term interest rate “1” (say, 3M),
  y2t ∼ I(1) long-term interest rate “2” (say, 6M),
  y3t ∼ I(1) short-term interest rate (say, overnight).
The CI relationships indicate that the spreads between the short-run and long-run rates are stationary, i.e. I(0).
VECM: Construction

Start with a VAR(p) model for an m-dimensional I(1) /non-stationary/ vector y = (y1 , y2 , . . . , ym)′:

  yt = ΘDt + A1 yt−1 + A2 yt−2 + · · · + Ap yt−p + ut ,  t = 1, 2, . . . , T,

where Dt contains deterministic terms (constant, trend, seasonality, etc.).

Even if the y series are cointegrated through some CI vector β, the cointegrating relationship is “hidden” in the VAR(p) representation – it only becomes apparent in the first-differences-based VECM.

The VECM model is defined as follows:

  ∆yt = ΘDt + Πyt−1 + Γ1 ∆yt−1 + · · · + Γp−1 ∆yt−p+1 + εt ,

where Πyt−1 is the error correction element of the VECM specification, and:
  Π = A1 + A2 + · · · + Ap − Im is the long-run impact matrix;
  0 ≤ rank(Π) < m defines r, the number of CI vectors;
  Γk = − Σ(j=k+1..p) Aj , k = 1, . . . , p − 1, are the short-run impact matrices.

Note that a VAR(p) is transformed into a “VECM(p − 1)” – the mapping is sketched below.
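A minimal R sketch of this VAR(p) → VECM(p − 1) mapping for p = 2; the coefficient matrices are hypothetical, chosen so that rank(Π) = 1:

# Hedged sketch: VAR(2) -> VECM(1) mapping; A1, A2 are hypothetical.
A1 <- matrix(c(0.7, 0.1, 0.2, 0.8), 2, 2)
A2 <- diag(0.1, 2)
Pi     <- A1 + A2 - diag(2)   # long-run impact matrix: Pi = A1 + A2 - I
Gamma1 <- -A2                 # short-run impact matrix: Gamma_1 = -A2
qr(Pi)$rank                   # here 1, i.e. 0 < rank(Pi) < m: one CI vector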


VECM: Construction

A VAR(1) model for a 2-dimensional vector y = (y1 , y2)′ with y1 , y2 ∼ CI(1, 1):

  yt = A1 yt−1 + ut ,  t = 1, 2, . . . , T

For a bivariate VAR, only one /normalized/ cointegrating vector β = (1, −β2)′ can exist. The VECM(0) model – with no (Γ ∆yt−j) terms – is defined as follows:

  ∆yt = Πyt−1 + εt ,

where Π = (A1 − I2) is a (2 × 2) matrix with rank r = 1. Π may be decomposed as follows:

  Π = αβ′ = [ α1 ] (1, −β2) = [ α1  −α1 β2 ]
            [ α2 ]            [ α2  −α2 β2 ]

To understand the decomposition, we may re-write the VECM:

  ∆y1t = α1 (y1,t−1 − β2 y2,t−1) + ε1t
  (the 1st equation relates changes in y1t to the disequilibrium error (y1,t−1 − β2 y2,t−1))

  ∆y2t = α2 (y1,t−1 − β2 y2,t−1) + ε2t
  (the 2nd equation relates changes in y2t to the same disequilibrium error)
VECM: Johansen’s methodology

1. Specify & estimate the m-dimensional VAR(p) model for yt .
2. Construct (likelihood ratio) tests to determine the rank of Π, i.e. to determine the number of cointegrating vectors.
3. Impose normalization and identifying restrictions on the cointegrating vectors (if necessary).
4. Given the normalized CI vectors, estimate the resulting cointegrated VECM using maximum likelihood.
✓ IRFs and forecasts may be generated after a VECM is estimated . . .
VECM: Johansen’s methodology

∆yt = ΘDt + Πyt−1 + Γ1 ∆yt−1 + · · · + Γp−1 ∆yt−p+1 + εt

If yt ∼ I(1) ⇒ Π is a singular matrix: 0 ≤ rank(Π) < m:

1. rank(Π) = 0 ⇒ Π = 0:
   yt is not cointegrated and the VECM reduces to a VAR in 1st differences.

2. 0 < rank(Π) = r < m:
   yt is cointegrated with r linearly independent CI vectors and (m − r) common stochastic trends.

3. . . . rank(Π) = m ⇔ yt ∼ I(0):
   full rank of Π means that yt is in fact stationary.
VECM: Johansen’s methodology

Johansen’s Trace Statistic & test

Based on the estimated eigenvalues of the matrix Π:
  λ̂1 > λ̂2 > · · · > λ̂m , where 0 ≤ λ̂j < 1

  H0(r0): r = r0
  H1(r0): r > r0
(where r0 is the number of nonzero eigenvalues under H0)

Under H0, the eigenvalues λ̂r0+1 , . . . , λ̂m should be close to zero (as should the LRtr statistic):

  LRtr(r0) = −T Σ(i=r0+1..m) log(1 − λ̂i)

Under H0, LRtr follows a multivariate Dickey-Fuller distribution.
VECM: Johansen’s methodology

Testing sequence for Johansen’s Trace-Statistic test:

  1. H0: r = 0      vs. H1: 0 < r ≤ m
  2. H0: r = 1      vs. H1: 1 < r ≤ m
  3. H0: r = 2      vs. H1: 2 < r ≤ m
  · · ·
  m. H0: r = m − 1  vs. H1: r = m

We keep increasing r until we no longer reject the null.
VECM: Johansen’s methodology

Johansen’s Maximum Eigenvalue statistic & test

Based on the estimated eigenvalues of the matrix Π:
  λ̂1 > λ̂2 > · · · > λ̂m , where 0 ≤ λ̂j < 1

  H0(r0): r = r0
  H1(r0): r = r0 + 1

  LRmax(r0) = −T log(1 − λ̂r0+1)

Under H0, LRmax follows a complex multivariate distribution (critical values are implemented in common software).

The testing sequence is analogous to the Trace test; an R sketch of both tests follows.
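A hedged R sketch of both Johansen tests and the subsequent estimation, via the {urca} package; the data object y_lev (a matrix of I(1) series) and the lag order K are hypothetical:

# Hedged sketch: Johansen cointegration tests with {urca}.
library(urca)
jo_tr <- ca.jo(y_lev, type = "trace", ecdet = "const", K = 2)
summary(jo_tr)                      # trace statistics vs. critical values
jo_ev <- ca.jo(y_lev, type = "eigen", ecdet = "const", K = 2)
summary(jo_ev)                      # maximum-eigenvalue variant
vecm_fit <- cajorls(jo_tr, r = 1)   # estimation of the VECM given r CI vector(s)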


VAR & VECM – Other extensions

Advanced analysis of univariate and multivariate time series (with examples in R):

http://faculty.washington.edu/ezivot/econ589/manual.pdf
