Vector autoregressions
Based on the book "New Introduction to Multiple Time Series Analysis" by Helmut Lütkepohl
Robert M. Kunst
robert.kunst@univie.ac.at
University of Vienna
and
Institute for Advanced Studies Vienna
November 3, 2011
Outline
Introduction
Stable VAR Processes
Basic assumptions and properties
Forecasting
Structural VAR analysis
Objectives of analyzing multiple time series
Main objectives of time series analysis may be:
1. Forecasting: prediction of the unknown future by looking at the known past:
$\hat{y}_{T+h} = f(y_T, y_{T-1}, \ldots)$
denotes the $h$-step prediction for the variable $y$;
2. Quantifying the dynamic response to an unexpected shock to a variable by the same variable $h$ periods later and also by other related variables: impulse-response analysis;
3. Control: how to set a variable in order to achieve a given time path in another variable;
4. Description of system dynamics without further purpose.
Some basics: stochastic process
Assume a probability space $(\Omega, \mathcal{F}, \Pr)$. A (discrete) stochastic process is a real-valued function
$$y : Z \times \Omega \to \mathbb{R},$$
such that, for each fixed $t \in Z$, $y(t, \cdot)$ is a random variable. $Z$ is a useful index set that represents time, for example $Z = \mathbb{Z}$ or $Z = \mathbb{N}$.
Some basics: multivariate stochastic process
A (discrete) $K$-dimensional vector stochastic process is a real-valued function
$$y : Z \times \Omega \to \mathbb{R}^K,$$
such that, for each fixed $t \in Z$, $y(t, \cdot)$ is a $K$-dimensional random vector.
A realization is a sequence of vectors $y_t(\omega)$, $t \in Z$, for a fixed $\omega$. It is a function $Z \to \mathbb{R}^K$. A multiple time series is assumed to be a finite portion of a realization.
Given such a realization, the underlying stochastic process is called the data generation process (DGP).
Vector autoregressive processes
Let $y_t = (y_{1t}, \ldots, y_{Kt})'$, $\nu = (\nu_1, \ldots, \nu_K)'$, and
$$A_j = \begin{pmatrix} \alpha_{11,j} & \cdots & \alpha_{1K,j} \\ \vdots & \ddots & \vdots \\ \alpha_{K1,j} & \cdots & \alpha_{KK,j} \end{pmatrix}.$$
Then, a vector autoregressive process (VAR) satisfies the equation
$$y_t = \nu + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t,$$
with $u_t$ a sequence of independently identically distributed random $K$-vectors with zero mean (conditions relaxed later).
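To make the definition concrete, here is a minimal numpy sketch that simulates this defining equation; all parameter values ($K$, $p$, $\nu$, $A_1$, $\Sigma_u$) are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Minimal sketch: simulate y_t = nu + A_1 y_{t-1} + ... + A_p y_{t-p} + u_t.
# All parameter values are assumed for illustration only.
rng = np.random.default_rng(0)
K, p, T = 2, 1, 500
nu = np.array([1.0, 0.5])                  # intercept vector
A = [np.array([[0.5, 0.1],
               [0.2, 0.3]])]               # A_1, ..., A_p (chosen stable)
Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 1.0]])           # nonsingular error covariance

y = np.zeros((T, K))
for t in range(p, T):
    u_t = rng.multivariate_normal(np.zeros(K), Sigma_u)  # i.i.d. zero-mean errors
    y[t] = nu + sum(A[j] @ y[t - 1 - j] for j in range(p)) + u_t
```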
Forecasting using a VAR
Assume $y_t$ follows a VAR(p). Then, the forecast $\hat{y}_{T+1}$ is given by
$$\hat{y}_{T+1} = \nu + A_1 y_T + \ldots + A_p y_{T-p+1},$$
i.e. the systematic part of the defining equation. Note that this also defines a forecast for each component of $y_{T+1}$.
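As a sketch (function name and argument layout are my own), the one-step forecast is just this systematic part evaluated at the last $p$ observations:

```python
import numpy as np

def var_one_step_forecast(nu, A, y_hist):
    """One-step VAR(p) forecast: nu + A_1 y_T + ... + A_p y_{T-p+1}.

    A      -- list [A_1, ..., A_p] of (K x K) coefficient matrices
    y_hist -- sequence of observations with the most recent last (y_hist[-1] = y_T)
    """
    return nu + sum(A_j @ y_hist[-1 - j] for j, A_j in enumerate(A))
```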
A flowchart for VAR analysis
[Flowchart: specification and estimation of the VAR model → model checking; if the model is rejected, return to specification and estimation; if accepted, proceed to forecasting and structural analysis.]
Basic assumptions and properties
The VAR(p) model
The object of interest is the vector autoregressive process of order $p$ that satisfies the equation
$$y_t = \nu + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t, \quad t = 0, \pm 1, \pm 2, \ldots,$$
with $u_t$ assumed to be $K$-dimensional white noise, i.e. $E u_t = 0$, $E u_s u_t' = 0$ for $s \neq t$, and $E u_t u_t' = \Sigma_u$ with $\Sigma_u$ nonsingular (conditions relaxed).
First we concentrate on the VAR(1) model
$$y_t = \nu + A_1 y_{t-1} + u_t.$$
Substituting in the VAR(1)
Continuous substitution in the VAR(1) model yields
$$y_1 = \nu + A_1 y_0 + u_1,$$
$$y_2 = (I_K + A_1)\nu + A_1^2 y_0 + A_1 u_1 + u_2,$$
$$\vdots$$
$$y_t = (I_K + A_1 + \ldots + A_1^{t-1})\nu + A_1^t y_0 + \sum_{j=0}^{t-1} A_1^j u_{t-j},$$
such that $y_1, \ldots, y_t$ can be represented as a function of $y_0, u_1, \ldots, u_t$. All $y_t$, $t \geq 0$, are a function of just one starting value and the errors.
The Wold representation of the VAR(1)
If all eigenvalues of $A_1$ have modulus less than one, substitution can be continued using the $y_j$, $j < 0$, and the limit exists:
$$y_t = (I_K - A_1)^{-1}\nu + \sum_{j=0}^{\infty} A_1^j u_{t-j}, \quad t = 0, \pm 1, \pm 2, \ldots,$$
and the constant portion can be denoted by $\mu$.
The matrix sequence converges according to linear algebra results. The random vector converges in mean square due to an important statistical lemma.
Convergence of sums of stochastically bounded processes
Theorem
Suppose $(A_j)$ is an absolutely summable sequence of real $(K \times K)$ matrices and $(z_t)$ is a sequence of $K$-dimensional random variables that are bounded by a common $c \in \mathbb{R}$ in the sense of
$$E(z_t' z_t) \leq c, \quad t = 0, \pm 1, \pm 2, \ldots.$$
Then there exists a sequence of random variables $(y_t)$, such that
$$\sum_{j=-n}^{n} A_j z_{t-j} \to y_t$$
as $n \to \infty$, in quadratic mean. $(y_t)$ is uniquely defined except on a set of probability 0.
Aspects of the convergent sum
The matrices converge geometrically and hence absolutely, and the theorem applies. The limit in the Wold representation is well defined.
Note that the white-noise property was not used. The sum would even converge for time-dependent $u_t$.
Expectation of the stationary VAR(1)
The Wold-type representation implies
$$E(y_t) = (I_K - A_1)^{-1}\nu = \mu.$$
This is due to the fact that $E u_t = 0$ for the white-noise terms and a statistical theorem that permits exchanging the limit and expectation operations under the conditions of the lemma. Note that the white-noise property (uncorrelated sequence) is not used.
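Numerically, the mean can be obtained by solving the linear system $(I_K - A_1)\mu = \nu$ rather than inverting; a small sketch with assumed values:

```python
import numpy as np

# E(y_t) = (I_K - A_1)^{-1} nu, computed by solving (I_K - A_1) mu = nu.
# A_1 and nu are assumed example values.
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
nu = np.array([1.0, 0.5])
mu = np.linalg.solve(np.eye(2) - A1, nu)
```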
Second moments of the stationary VAR(1)
Lütkepohl presents a derivation of the cross-covariance function
$$\Gamma_y(h) = E(y_t - \mu)(y_{t-h} - \mu)' = \lim_{n \to \infty} \sum_{i=0}^{n} \sum_{j=0}^{n} A_1^i E(u_{t-i} u_{t-j-h}') (A_1^j)' = \lim_{n \to \infty} \sum_{i=0}^{n} A_1^{h+i} \Sigma_u (A_1^i)' = \sum_{i=0}^{\infty} A_1^{h+i} \Sigma_u (A_1^i)',$$
which uses $E(u_t u_s') = 0$ for $s \neq t$, $E(u_t u_t') = \Sigma_u$, and a corollary to the lemma that permits evaluation of second moments under the same conditions. Here, the white-noise property of $u_t$ is used.
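A sketch that evaluates this sum with a finite truncation (the truncation length is an assumption; the terms decay geometrically for a stable $A_1$):

```python
import numpy as np

def var1_autocov(A1, Sigma_u, h, n_terms=500):
    """Truncated evaluation of Gamma_y(h) = sum_{i>=0} A_1^{h+i} Sigma_u (A_1^i)'."""
    K = A1.shape[0]
    Ah = np.linalg.matrix_power(A1, h)   # fixed factor A_1^h
    Ai = np.eye(K)                       # running power A_1^i
    Gamma = np.zeros((K, K))
    for _ in range(n_terms):
        Gamma += Ah @ Ai @ Sigma_u @ Ai.T
        Ai = A1 @ Ai
    return Gamma
```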
The definition of a stable VAR(1)
Definition
A VAR(1) is called stable iff all eigenvalues of $A_1$ have modulus less than one. By a mathematical lemma, this condition is equivalent to
$$\det(I_K - A_1 z) \neq 0 \quad \text{for } |z| \leq 1.$$
No roots within or on the unit circle. Note that this definition differs from stability as defined by other authors. Stability is not equivalent to stationarity: a stable process started in $t = 1$ is not stationary; a backward-directed entirely unstable process is stationary.
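The eigenvalue condition is straightforward to check numerically; a minimal sketch:

```python
import numpy as np

def is_stable_var1(A1):
    """Stable iff all eigenvalues of A_1 have modulus less than one,
    equivalently det(I_K - A_1 z) != 0 for |z| <= 1."""
    return bool(np.all(np.abs(np.linalg.eigvals(A1)) < 1))
```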
Representation of VAR(p) as VAR(1)
All VAR(p) models of the form
$$y_t = \nu + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t$$
can be written as VAR(1) models
$$Y_t = \boldsymbol{\nu} + \mathbf{A} Y_{t-1} + U_t,$$
with
$$\mathbf{A} = \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I_K & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I_K & 0 \end{pmatrix}.$$
More on the state-space VAR(1) form
In the VAR(1) representation of a VAR(p), the vectors $Y_t$, $\boldsymbol{\nu}$, and $U_t$ have length $Kp$:
$$Y_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix}, \quad \boldsymbol{\nu} = \begin{pmatrix} \nu \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad U_t = \begin{pmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
The big matrix $\mathbf{A}$ has dimension $Kp \times Kp$. This state-space form permits using all results from VAR(1) for the general VAR(p).
Stability of the VAR(p)
Definition
A VAR(p) is called stable iff all eigenvalues of $\mathbf{A}$ have modulus less than one. By a mathematical lemma, this condition is equivalent to
$$\det(I_{Kp} - \mathbf{A} z) \neq 0 \quad \text{for } |z| \leq 1.$$
This condition is equivalent to the stability condition
$$\det(I_K - A_1 z - \ldots - A_p z^p) \neq 0 \quad \text{for } |z| \leq 1,$$
which is usually more efficient to check. Equivalence follows from the determinant properties of partitioned matrices.
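A sketch that builds the companion matrix $\mathbf{A}$ from $A_1, \ldots, A_p$ and checks stability via its eigenvalues (the polynomial-root check would work equally well):

```python
import numpy as np

def companion(A):
    """Stack [A_1, ..., A_p] into the (Kp x Kp) companion matrix of the VAR(1) form."""
    K, p = A[0].shape[0], len(A)
    top = np.hstack(A)                                            # [A_1 ... A_p]
    below = np.hstack([np.eye(K * (p - 1)), np.zeros((K * (p - 1), K))])
    return np.vstack([top, below])

def is_stable_varp(A):
    """Stable iff all eigenvalues of the companion matrix have modulus < 1."""
    return bool(np.all(np.abs(np.linalg.eigvals(companion(A))) < 1))
```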
The infinite-order MA representation of the VAR(p)
The stationary stable VAR(p) can be represented in the convergent infinite-order MA form
$$Y_t = \boldsymbol{\mu} + \sum_{j=0}^{\infty} \mathbf{A}^j U_{t-j}.$$
This is, however, still an inconvenient process of dimension $Kp$. Formally, the first $K$ entries of the vector $Y_t$ are obtained via the $(K \times Kp)$ matrix
$$J = [I_K : 0 : \ldots : 0]$$
as $y_t = J Y_t$.
The Wold representation of the VAR(p)
Using $J$, it follows that
$$y_t = J\boldsymbol{\mu} + J \sum_{j=0}^{\infty} \mathbf{A}^j U_{t-j} = \mu + \sum_{j=0}^{\infty} J \mathbf{A}^j J' J U_{t-j} = \mu + \sum_{j=0}^{\infty} \Phi_j u_{t-j}$$
for the stable and stationary VAR(p), a Wold representation with $\Phi_j = J \mathbf{A}^j J'$. The autocovariance function follows as
$$\Gamma_y(h) = E(y_t - \mu)(y_{t-h} - \mu)' = E\left(\sum_{i=0}^{h-1} \Phi_i u_{t-i} + \sum_{i=0}^{\infty} \Phi_{h+i} u_{t-h-i}\right)\left(\sum_{j=0}^{\infty} \Phi_j u_{t-h-j}\right)' = \sum_{i=0}^{\infty} \Phi_{h+i} \Sigma_u \Phi_i'.$$
The Wold-type representation with lag operators
Using the operator $L$ defined by $L y_t = y_{t-1}$ permits writing the VAR(p) model as
$$y_t = \nu + (A_1 L + \ldots + A_p L^p) y_t + u_t$$
or, with $A(L) = I_K - A_1 L - \ldots - A_p L^p$,
$$A(L) y_t = \nu + u_t.$$
Then, one may write $\Phi(L) = \sum_{j=0}^{\infty} \Phi_j L^j$ and
$$y_t = \mu + \Phi(L) u_t = A^{-1}(L)(\nu + u_t),$$
thus formally $A(L)\Phi(L) = I_K$ or $\Phi(L) = A^{-1}(L)$. Note that $A(L)$ is a polynomial and $\Phi(L)$ is a power series.
Remarks on the lag operator representation
Note that $\mu = A^{-1}(L)\nu = A^{-1}(1)\nu$ and that $A(1) = I_K - A_1 - \ldots - A_p$;
It is possible that $A^{-1}(L)$ is a finite-order polynomial, while this is impossible for scalar processes;
Covariance stationarity means that $E(y_t) = \mu$ and $E(y_t - \mu)(y_{t-h} - \mu)' = \Gamma_y(h)$ do not depend on $t$, for $h = 0, \pm 1, \pm 2, \ldots$.
Strict stationarity is defined by time invariance of all finite-dimensional joint distributions. Here, stationarity refers to covariance stationarity, for example in the proposition:
Proposition
A stable VAR(p) process $y_t$, $t \in \mathbb{Z}$, is stationary.
Yule-Walker equations for VAR(1) processes
Assume the VAR(1) is stable and stationary. The equation
$$y_t - \mu = A_1 (y_{t-1} - \mu) + u_t$$
can be multiplied by $(y_{t-h} - \mu)'$ and expectations taken:
$$E(y_t - \mu)(y_{t-h} - \mu)' = A_1 E\{(y_{t-1} - \mu)(y_{t-h} - \mu)'\} + E u_t (y_{t-h} - \mu)'$$
or
$$\Gamma_y(h) = A_1 \Gamma_y(h-1) \quad \text{for } h \geq 1.$$
The system of Yule-Walker equations for VAR(1)
For the case $h = 0$, the last term is not 0:
$$E(y_t - \mu)(y_t - \mu)' = A_1 E\{(y_{t-1} - \mu)(y_t - \mu)'\} + E u_t (y_t - \mu)'$$
or
$$\Gamma_y(0) = A_1 \Gamma_y(-1) + \Sigma_u = A_1 \Gamma_y(1)' + \Sigma_u,$$
which by substitution from the equation for $h = 1$ yields
$$\Gamma_y(0) = A_1 \Gamma_y(0) A_1' + \Sigma_u,$$
which can be transformed to
$$\operatorname{vec} \Gamma_y(0) = (I_{K^2} - A_1 \otimes A_1)^{-1} \operatorname{vec} \Sigma_u,$$
an explicit formula to obtain the process variance from a given coefficient matrix and error variance.
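The vec formula translates directly into code via the Kronecker product; note that vec stacks columns, which is numpy's Fortran order. Parameter values are assumed for illustration.

```python
import numpy as np

# vec Gamma_y(0) = (I_{K^2} - A_1 kron A_1)^{-1} vec Sigma_u, with assumed values.
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
K = A1.shape[0]

vec_Gamma0 = np.linalg.solve(np.eye(K**2) - np.kron(A1, A1),
                             Sigma_u.flatten(order="F"))  # vec = column stacking
Gamma0 = vec_Gamma0.reshape((K, K), order="F")
```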
How to use the Yule-Walker equations for VAR(1)
Given $A_1$ and $\Sigma_u$, the Yule-Walker equations deliver $\Gamma_y(0)$ and then, recursively, all $\Gamma_y(h)$;
Conversely, $A_1 = \Gamma_y(1)\Gamma_y(0)^{-1}$ can be used to estimate $A_1$ from the correlogram.
Autocorrelations of stable VAR processes
Autocorrelations are often preferred to autocovariances. Formally, they are defined via
$$\rho_{ij}(h) = \frac{\gamma_{ij}(h)}{\sqrt{\gamma_{ii}(0)}\sqrt{\gamma_{jj}(0)}}$$
from the autocovariances for $i, j = 1, \ldots, K$ and $h \in \mathbb{Z}$. The matrix formula
$$R_y(h) = D^{-1} \Gamma_y(h) D^{-1}$$
with $D = \operatorname{diag}(\gamma_{11}(0)^{1/2}, \ldots, \gamma_{KK}(0)^{1/2})$ is given for completeness.
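The matrix formula is a one-liner in code, since dividing elementwise by the outer product of the standard deviations equals pre- and post-multiplying by $D^{-1}$; a sketch:

```python
import numpy as np

def autocorr_matrix(Gamma_h, Gamma_0):
    """R_y(h) = D^{-1} Gamma_y(h) D^{-1} with D = diag(gamma_ii(0)^{1/2})."""
    d = np.sqrt(np.diag(Gamma_0))
    return Gamma_h / np.outer(d, d)
```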
Forecasting
The forecasting problem
Based on an information set $\Omega_t \supseteq \{y_s, s \leq t\}$ available at $t$, the forecaster searches an approximation $y_t(h)$ to the unknown $y_{t+h}$ that minimizes some expected loss or cost
$$E\{g(y_{t+h} - y_t(h)) \mid \Omega_t\}.$$
The most common loss function $g(x) = x^2$ minimizes the forecast mean squared error (MSE). $t$ is the forecast origin, $h$ is the forecast horizon, and $y_t(h)$ is an $h$-step predictor.
Conditional expectation
Proposition
The $h$-step predictor that minimizes the forecast MSE is the conditional expectation
$$y_t(h) = E(y_{t+h} \mid y_s, s \leq t).$$
Often, the casual notation $E_t(y_{t+h})$ is used.
This property (proof constructive) also applies to vector processes and to VARs, where the MSE is defined by
$$\mathrm{MSE}(y_t(h)) = E\{y_{t+h} - y_t(h)\}\{y_{t+h} - y_t(h)\}'.$$
Conditional expectation in a VAR
Assume $u_t$ is independent white noise (a martingale difference sequence with $E(u_{t+1} \mid u_s, s \leq t) = 0$ suffices); then for a VAR(p)
$$E_t(y_{t+1}) = \nu + A_1 y_t + A_2 y_{t-1} + \ldots + A_p y_{t-p+1},$$
and, recursively,
$$E_t(y_{t+2}) = \nu + A_1 E_t(y_{t+1}) + A_2 y_t + \ldots + A_p y_{t-p+2},$$
etc., which allows the iterative evaluation for all horizons.
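A sketch of this recursion (names are my own): unknown future values are replaced by their own forecasts as the horizon grows.

```python
import numpy as np

def var_forecast(nu, A, y_hist, h):
    """Iterate E_t(y_{t+1}), ..., E_t(y_{t+h}) for a VAR(p) with A = [A_1, ..., A_p]."""
    hist = [np.asarray(v) for v in y_hist]   # most recent observation last
    forecasts = []
    for _ in range(h):
        f = nu + sum(A_j @ hist[-1 - j] for j, A_j in enumerate(A))
        hist.append(f)                       # the forecast stands in for the unknown value
        forecasts.append(f)
    return np.array(forecasts)
```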
Larger horizons for a VAR(1)
By repeated insertion, the following formula is easily obtained:
$$E_t(y_{t+h}) = (I_K + A_1 + \ldots + A_1^{h-1})\nu + A_1^h y_t,$$
which implies that the forecast tends to become trivial (it approaches the process mean $\mu$) as $h$ increases, given the geometric convergence in the last term.
Forecast MSE for VAR(1)
The MA representation $y_t = \mu + \sum_{j=0}^{\infty} A_1^j u_{t-j}$ clearly decomposes $y_{t+h}$ into the predictor known in $t$ and the remaining error, such that
$$y_{t+h} - y_t(h) = \sum_{j=0}^{h-1} A_1^j u_{t+h-j},$$
and
$$\Sigma_y(h) = \mathrm{MSE}(y_t(h)) = E\left(\sum_{j=0}^{h-1} A_1^j u_{t+h-j}\right)\left(\sum_{j=0}^{h-1} A_1^j u_{t+h-j}\right)' = \sum_{j=0}^{h-1} A_1^j \Sigma_u (A_1^j)' = \mathrm{MSE}(y_t(h-1)) + A_1^{h-1} \Sigma_u (A_1^{h-1})',$$
such that the MSE increases in $h$.
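The recursion in $h$ is convenient numerically, since each step adds one term; a sketch:

```python
import numpy as np

def var1_forecast_mse(A1, Sigma_u, h):
    """Sigma_y(h) = sum_{j=0}^{h-1} A_1^j Sigma_u (A_1^j)', accumulated term by term."""
    K = A1.shape[0]
    mse, Aj = np.zeros((K, K)), np.eye(K)   # Aj tracks A_1^j
    for _ in range(h):
        mse += Aj @ Sigma_u @ Aj.T
        Aj = A1 @ Aj
    return mse
```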
Forecast MSE for general VAR(p)
Using the Wold-type MA representation $y_t = \mu + \sum_{j=0}^{\infty} \Phi_j u_{t-j}$, a scheme analogous to $p = 1$ works for the VAR(p) with $p > 1$, using $J$. The forecast error variance is
$$\Sigma_y(h) = \mathrm{MSE}(y_t(h)) = \sum_{j=0}^{h-1} \Phi_j \Sigma_u \Phi_j',$$
which converges to $\Sigma_y = \Gamma_y(0)$ for $h \to \infty$.
These MSE formulae can also be used to determine interval forecasts (confidence intervals).
Structural VAR analysis
There are three (interdependent) approaches to the interpretation
of VAR models:
1. Granger causality
2. Impulse response analysis
3. Forecast error variance decomposition (FEVD)
Granger causality
Assume two $M$- and $N$-dimensional sub-processes $x$ and $z$ of a $K$-dimensional process $y$, such that $y = (z', x')'$.
Definition
The process $x_t$ is said to cause $z_t$ in Granger's sense iff
$$\Sigma_z(h \mid \Omega_t) < \Sigma_z(h \mid \Omega_t \setminus \{x_s, s \leq t\})$$
for some $t$ and $h$.
The set $\Omega_t$ is an information set containing $y_s$, $s \leq t$; the matrix inequality $<$ is defined via positive definiteness of the difference; the correct interpretation of the $\setminus$ operator is doubtful.
The property is not antisymmetric: $x$ may cause $z$ and $z$ may also cause $x$: feedback.
Instantaneous Granger causality
Again, assume two $M$- and $N$-dimensional sub-processes $x$ and $z$ of a $K$-dimensional process $y$.
Definition
There is instantaneous causality between the processes $x_t$ and $z_t$ in Granger's sense iff
$$\Sigma_z(1 \mid \Omega_t \cup \{x_{t+1}\}) < \Sigma_z(1 \mid \Omega_t).$$
The property is symmetric: $x$ and $z$ can be exchanged in the definition: instantaneous causality knows no direction.
Granger causality in a MA model
Assume the representation
$$y_t = \begin{pmatrix} z_t \\ x_t \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} \Phi_{11}(L) & \Phi_{12}(L) \\ \Phi_{21}(L) & \Phi_{22}(L) \end{pmatrix} \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}.$$
It is easily motivated that $x$ does not cause $z$ iff $\Phi_{12,j} = 0$ for all $j$.
Granger causality in a VAR
A stationary stable VAR has an MA representation, so Granger causality can be checked on that one. Alternatively, consider the partitioned VAR
$$y_t = \begin{pmatrix} z_t \\ x_t \end{pmatrix} = \begin{pmatrix} \nu_1 \\ \nu_2 \end{pmatrix} + \sum_{j=1}^{p} \begin{pmatrix} A_{11,j} & A_{12,j} \\ A_{21,j} & A_{22,j} \end{pmatrix} \begin{pmatrix} z_{t-j} \\ x_{t-j} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}.$$
It is easily shown that $x$ does not cause $z$ iff $A_{12,j} = 0$, $j = 1, \ldots, p$ (block inverse of matrix).
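In practice, the zero restrictions $A_{12,j} = 0$ are tested with a Wald or F test; a sketch using statsmodels, as I recall its interface (`data` is assumed to be a DataFrame with columns named "z" and "x", and the lag order 2 is an assumption):

```python
from statsmodels.tsa.api import VAR

# Sketch: test whether "x" Granger-causes "z" in a fitted VAR(2).
results = VAR(data).fit(2)
test = results.test_causality(caused="z", causing="x", kind="f")
print(test.summary())
```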
Remarks on testing for Granger causality in a VAR
There is no instantaneous causality between $x$ and $z$ iff $E(u_{1t} u_{2t}') = 0$.
This condition is certainly symmetric.
Instantaneous causality and the non-unique MA representation
Consider the Cholesky factorization $\Sigma_u = PP'$, with $P$ lower triangular. Then, it holds that
$$y_t = \mu + \sum_{j=0}^{\infty} \Phi_j P P^{-1} u_{t-j} = \mu + \sum_{j=0}^{\infty} \Theta_j w_{t-j},$$
with $\Theta_j = \Phi_j P$ and $w = P^{-1} u$ and $\Sigma_w = P^{-1} \Sigma_u (P^{-1})' = I_K$.
In this form, the absence of instantaneous causality corresponds to $\Theta_{21,0} = 0$, which looks asymmetric. An analogous form and condition is achieved by exchanging $x$ and $z$.
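A sketch of the orthogonalization step ($\Phi$ is assumed to be a list of the Wold coefficient matrices $\Phi_0, \Phi_1, \ldots$):

```python
import numpy as np

def orthogonalize(Phi, Sigma_u):
    """Theta_j = Phi_j P with Sigma_u = P P' (Cholesky, P lower triangular);
    the implied shocks w = P^{-1} u then satisfy Sigma_w = I_K."""
    P = np.linalg.cholesky(Sigma_u)
    return [Phi_j @ P for Phi_j in Phi]
```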
Impulse response analysis: the idea
The researcher wishes to add detail to the Granger-causality analysis and to quantify the effect of an impulse in a component variable $y_{j,t}$ on another component variable $y_{k,t}$.
The derivative $\partial y_{k,t+h} / \partial y_{j,t}$ cannot be determined from the VAR model. The derivative
$$\frac{\partial y_{k,t+h}}{\partial u_{j,t}}$$
corresponds to the $(k, j)$ entry in the matrix $\Phi_h$ of the MA representation. It is not uniquely determined. The matrix of graphs of $\phi_{kj,h}$ versus $h$ is called the impulse response function (IRF).
Impulse response analysis: general properties
If $y_j$ does not Granger-cause $y_k$, the corresponding impulse response in $(k, j)$ is constant zero;
since $\Phi_0 = I_K$, the instantaneous responses are $\phi_{kj,0} = 0$ for $k \neq j$ and $\phi_{kk,0} = 1$;
orthogonalized responses use $\Sigma_u = PP'$, $\Theta_j = \Phi_j P$, $w = P^{-1} u$, that is,
$$y_t = \mu + \sum_{j=0}^{\infty} \Theta_j w_{t-j}.$$
Because of $\Sigma_w = I_K$, shocks are orthogonal. Note that $w_j$ is a linear function of $u_k$, $k \leq j$. The resulting matrix of graphs of $\theta_{kj,h}$ versus $h$ is an orthogonal impulse response function (OIRF).
Orthogonal impulse response: properties
$\Theta_0$ has diagonal ones and $\Sigma_w$ is diagonal;
using $y_t = \mu + \sum_{i=0}^{\infty} \Theta_i w_{t-i}$, the error of an $h$-step forecast is
$$y_{t+h} - y_t(h) = \sum_{i=0}^{h-1} \Theta_i w_{t+h-i},$$
and for the $j$th component
$$y_{j,t+h} - y_{j,t}(h) = \sum_{i=0}^{h-1} \sum_{k=1}^{K} \theta_{jk,i} w_{k,t+h-i}.$$
All $hK$ terms are orthogonal, and this error can be decomposed into the $K$ contributions from the component errors.
Forecast error variance decomposition
Consider the variance of the $j$th forecast component
$$\mathrm{MSE}(y_{j,t}(h)) = \sum_{i=0}^{h-1} \sum_{k=1}^{K} \theta_{jk,i}^2.$$
The share that is due to the $k$th component error,
$$\omega_{jk,h} = \frac{\sum_{i=0}^{h-1} \theta_{jk,i}^2}{\mathrm{MSE}(y_{j,t}(h))},$$
defines the forecast error variance decomposition (FEVD) and is often tabulated or plotted versus $h$ for $j, k = 1, \ldots, K$.
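A sketch computing these shares from the orthogonalized coefficients (`Theta` is assumed to be the list $\Theta_0, \Theta_1, \ldots$ with $\Sigma_w = I_K$):

```python
import numpy as np

def fevd(Theta, h):
    """omega_{jk,h}: share of MSE(y_{j,t}(h)) attributed to shock k."""
    contrib = sum(Th**2 for Th in Theta[:h])              # sum_i theta_{jk,i}^2
    return contrib / contrib.sum(axis=1, keepdims=True)   # each row sums to one
```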
Invariants and others in structural analysis
1. Granger causality is independent of the choice of Wold-type MA representation. It is there or it is not;
2. Impulse response functions depend on the chosen representation. OIRFs may differ for distinct orderings of the component variables;
3. Forecast error variance decomposition inherits the problems of IRF analysis: unique only in the absence of instantaneous causality.