Bayesian Vector Autoregressive Models

Bayesian VARs 1 / 67
Introduction
There are many popular time series models and all cannot be covered
in a short course.
In this course, we will focus on models popular with empirical
macroeconomists, characterized by:
i) Multivariate in nature (macroeconomists interested in relationships
between variables, not properties of a single variable).
ii) Allow for parameters to change (e.g. over time, across business
cycle, etc.).
We will not cover univariate time series nor nonlinear time series
models such as Markov switching, TAR, STAR, etc.
See Bayesian Econometric Methods Chapters 17 and 18 for treatment
of some of these models.
We will discuss state space models (which can be used to model
nonlinearities).
Bayesian VARs 2 / 67
Time Series Modelling for Empirical Macroeconomics

Vector Autoregressive (VAR) models popular way of summarizing
inter-relationships between macroeconomic variables.
Used for forecasting, impulse response analysis, etc.
Economy is changing over time. Is model in 1970s same as now?
Thus, time-varying parameter VARs (TVP-VARs) are of interest.
Great Moderation of business cycle leads to interest in modelling error
variances
TVP-VARs with multivariate stochastic volatility is our end goal.
Begin with Bayesian VARs
A common theme: These models are over-parameterized so need
shrinkage to get reasonable results (shrinkage = prior).

Bayesian VARs 3 / 67
Bayesian VARs
One way of writing the VAR(p) model:

y_t = a_0 + ∑_{j=1}^{p} A_j y_{t−j} + ε_t

y_t is an M × 1 vector
ε_t is an M × 1 vector of errors
a_0 is an M × 1 vector of intercepts
A_j is an M × M matrix of coefficients
ε_t is i.i.d. N(0, Σ)
Exogenous variables or more deterministic terms can be added (but
we do not, to keep notation simple).

Bayesian VARs 4 / 67
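As a quick check of the notation above, here is a minimal sketch (all numbers are illustrative, not from the lecture) that simulates a stationary bivariate VAR(1) of this form:

```python
# Simulate y_t = a0 + A1 y_{t-1} + eps_t, eps_t ~ N(0, Sigma),
# for M = 2 variables. Illustrative parameter values only.
import numpy as np

rng = np.random.default_rng(0)
M, T = 2, 500
a0 = np.array([0.1, -0.2])
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])          # eigenvalues inside unit circle -> stationary
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
L = np.linalg.cholesky(Sigma)        # to draw correlated errors

y = np.zeros((T, M))
for t in range(1, T):
    eps = L @ rng.standard_normal(M)
    y[t] = a0 + A1 @ y[t - 1] + eps
```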
Several alternative ways of writing the VAR exist (and we will use some
alternatives below).
One way: let y be the MT × 1 vector stacking the T observations on each
dependent variable in turn (y = vec(Y) for the Y defined on the next slide),
and let ε be stacked conformably. Define

x_t = (1, y′_{t−1}, .., y′_{t−p})

and let X be the T × K matrix with rows x_1, x_2, .., x_T.

K = 1 + Mp is the number of coefficients in each equation of the VAR.
The VAR can be written as:

y = (I_M ⊗ X) α + ε

ε ∼ N(0, Σ ⊗ I_T).

Bayesian VARs 5 / 67
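Building the T × K matrix X from the raw data is mechanical; a small sketch (the 104 × 3 `data` array and the helper name `build_X` are illustrative, not from the lecture):

```python
# Build Y (T x M) and X (T x K) with rows x_t = (1, y'_{t-1}, .., y'_{t-p}),
# K = 1 + M p, losing the first p observations to lags.
import numpy as np

def build_X(data, p):
    T0, M = data.shape
    T = T0 - p                      # effective sample size
    K = 1 + M * p
    X = np.ones((T, K))
    for j in range(1, p + 1):       # lag j occupies columns 1+(j-1)M .. jM
        X[:, 1 + (j - 1) * M : 1 + j * M] = data[p - j : T0 - j]
    Y = data[p:]                    # T x M dependent-variable matrix
    return Y, X

rng = np.random.default_rng(1)
data = rng.standard_normal((104, 3))
Y, X = build_X(data, p=4)           # K = 1 + 3*4 = 13, T = 100
```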
Another way of writing VAR:
Let Y and E be T × M matrices placing the T observations on each
variable in columns next to one another.
Then can write VAR as
Y = XA + E
In the first way of writing the VAR, α is the KM × 1 vector of VAR
coefficients; here A is K × M.
Relationship between the two: α = vec(A)
We will use both notations below (and later on, when working with
restricted VAR need to introduce yet more notation).

Bayesian VARs 6 / 67
Likelihood Function
Likelihood function can be derived and shown to be of a form that
breaks into two parts (see Bayesian Econometric Methods Exercise
17.6)
The first of these parts is for α given Σ and the other is for Σ:

α|Σ, y ∼ N(α̂, Σ ⊗ (X′X)^{−1})

Σ^{−1} has a Wishart form:

Σ^{−1}|y ∼ W(S^{−1}, T − K − M − 1)

where Â = (X′X)^{−1} X′Y is the OLS estimate of A, α̂ = vec(Â) and

S = (Y − XÂ)′(Y − XÂ)

Bayesian VARs 7 / 67
Digression
Remember the regression model had parameters β and σ².
There it proved convenient to work with the error precision h = 1/σ².
In the VAR it proves convenient to work with Σ^{−1}.
In the regression, h typically had a Gamma distribution.
With the VAR, Σ^{−1} will typically have a Wishart distribution.
The Wishart is a matrix generalization of the Gamma.
For details see the appendix to the textbook.
If Σ^{−1} is W(C, c), then its mean is cC and c is the degrees of freedom.
Note: easy to take random draws from Wishart.

Bayesian VARs 8 / 67
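To illustrate the last two points, a sketch using the constructive definition of the Wishart (a W(C, c) draw is a sum of c outer products of N(0, C) vectors); the particular C and c are illustrative:

```python
# Draw from W(C, c) and check the mean is close to c * C.
import numpy as np

C = np.array([[1.0, 0.2],
              [0.2, 0.5]])
c = 10                                  # degrees of freedom
rng = np.random.default_rng(0)
Lc = np.linalg.cholesky(C)

def draw_wishart(c, Lc, rng):
    # sum of c outer products x x' with x ~ N(0, C)
    Z = Lc @ rng.standard_normal((Lc.shape[0], c))
    return Z @ Z.T

mean_draw = np.mean([draw_wishart(c, Lc, rng) for _ in range(20000)], axis=0)
# mean_draw should be close to c * C
```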
Prior Issue 1
VARs are not parsimonious models: α contains KM parameters
For a VAR(4) involving 5 dependent variables: 105 parameters.
Macro data sets: number of observations on each variable might be a
few hundred.
Without prior information, hard to obtain precise estimates.
Features such as impulse responses and forecasts will tend to be
imprecisely estimated.
Desirable to “shrink” estimates, and prior information offers a sensible
way of doing this shrinkage.
Different priors do shrinkage in different ways.

Bayesian VARs 9 / 67
Prior Issue 2
Some priors lead to analytical results for the posterior and predictive
densities.
Other priors require MCMC methods (which raise computational
burden).
E.g. recursive forecasting exercise typically requires repeated
calculation of posterior and predictive distributions
In this case, MCMC methods can be very computationally demanding.
May want to go with not-so-good prior which leads to analytical
results, if ideal prior leads to slow computation.

Bayesian VARs 10 / 67
Prior Issue 3
Priors differ in how easily they can handle extensions of the VAR
defined above.
Restricted VARs: different equations have different explanatory
variables.
TVP-VARs: Allowing for VAR coefficients to change over time.
Heteroskedasticity
Such extensions typically require MCMC, so no need to restrict
consideration to priors which lead to analytical results in basic VAR

Bayesian VARs 11 / 67
The Minnesota Prior
The classic shrinkage priors developed by researchers (Litterman,
Sims, etc.) at the University of Minnesota and the Federal Reserve
Bank of Minneapolis.
They use an approximation which simplifies prior elicitation and
computation: replace Σ with an estimate, Σ̂.
The original Minnesota prior simplifies even further by assuming Σ̂ to be a
diagonal matrix with σ̂_ii = s_i².
s_i² is the OLS estimate of the error variance in the i-th equation.
If Σ̂ is not diagonal, can use, e.g., Σ̂ = S/T.

Bayesian VARs 12 / 67
Minnesota prior assumes

α ∼ N (αMin , V Min )

Minnesota prior is way of automatically choosing αMin and V Min


Note: explanatory variables in any equation can be divided as:
own lags of the dependent variable
the lags of the other dependent variables
exogenous or deterministic variables

Bayesian VARs 13 / 67
αMin = 0 implies shrinkage towards zero (a nice way of avoiding
over-fitting).
When working with differenced data (e.g. GDP growth), the Minnesota
prior sets α_Min = 0.
When working with levels data (e.g. the level of GDP), the Minnesota prior
sets the element of α_Min for the first own lag of the dependent variable to 1.
Idea: Centred over a random walk. Shrunk towards random walk
(specification which often forecasts quite well)
Other values of αMin also used, depending on application.

Bayesian VARs 14 / 67
Prior mean: “towards what should we shrink?”
Prior variance: “by how much should we shrink?”
Minnesota prior: V_Min is diagonal.
Let V_i denote the block of V_Min for coefficients in equation i, with
diagonal elements V_{i,jj}.
A common implementation of the Minnesota prior (for lags r = 1, .., p):

V_{i,jj} = a_1 / r²                   for coefficients on own lags
V_{i,jj} = (a_2 / r²) (σ_ii / σ_jj)   for coefficients on lags of variable j ≠ i
V_{i,jj} = a_3 σ_ii                   for coefficients on exogenous variables

Typically, σ_ii = s_i².

Bayesian VARs 15 / 67
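The variance rules above are easy to code up; a sketch for one possible layout of each equation's coefficients (intercept first, then lags), with illustrative values of a_1, a_2, a_3 and s_i²:

```python
# Minnesota prior variances V_{i,jj} for a VAR(p) with M variables and
# an intercept in each equation (treated as the exogenous term here).
import numpy as np

def minnesota_variances(s2, p, a1, a2, a3):
    M = len(s2)
    V = []                                   # one vector of variances per equation i
    for i in range(M):
        v = [a3 * s2[i]]                     # intercept / exogenous term: a3 * sigma_ii
        for r in range(1, p + 1):            # lag r
            for j in range(M):
                if j == i:
                    v.append(a1 / r**2)      # own lag
                else:
                    v.append(a2 / r**2 * s2[i] / s2[j])   # lag of variable j
        V.append(np.array(v))
    return V

V = minnesota_variances(s2=np.array([1.0, 4.0]), p=2, a1=0.5, a2=0.05, a3=100.0)
```

Note how the 1/r² factor automatically shrinks longer lags harder, and a_1 > a_2 favors own lags.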
The problem of choosing the KM(KM+1)/2 elements of V_Min is reduced to
simply choosing a_1, a_2, a_3.
Property: as lag length increases, coefficients are increasingly shrunk
towards zero.
Property: by setting a_1 > a_2, own lags are more likely to be important
than lags of other variables.
See Litterman (1986) for motivation and discussion of these choices
(e.g. an explanation of how σ_ii/σ_jj adjusts for differences in the units
in which the variables are measured).
Minnesota prior seems to work well in practice.
Recent paper by Giannone, Lenza and Primiceri (in ReStat) develops
methods for estimating prior hyperparameters from the data

Bayesian VARs 16 / 67
Posterior Inference with the Minnesota Prior


Simple analytical results involving only the Normal distribution:

α|y ∼ N(ᾱ_Min, V̄_Min)

V̄_Min = [V_Min^{−1} + Σ̂^{−1} ⊗ (X′X)]^{−1}

ᾱ_Min = V̄_Min [V_Min^{−1} α_Min + (Σ̂^{−1} ⊗ X)′ y]

Bayesian VARs 17 / 67
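A minimal numerical sketch of these posterior formulas, with Σ̂ a crude sample-covariance stand-in and toy dimensions (all inputs illustrative):

```python
# Posterior moments under a Minnesota-style prior with Sigma replaced
# by an estimate. y = vec(Y) stacks the data equation by equation.
import numpy as np

rng = np.random.default_rng(2)
T, K, M = 50, 3, 2
X = rng.standard_normal((T, K))
Y = rng.standard_normal((T, M))
y = Y.flatten(order='F')                     # y = vec(Y)

alpha_prior = np.zeros(K * M)                # shrink towards zero
V_prior_inv = np.eye(K * M) / 0.25           # diagonal prior variance 0.25
Sig_inv = np.linalg.inv(np.cov(Y.T))         # crude stand-in for Sigma-hat^{-1}

V_post = np.linalg.inv(V_prior_inv + np.kron(Sig_inv, X.T @ X))
alpha_post = V_post @ (V_prior_inv @ alpha_prior + np.kron(Sig_inv, X.T) @ y)
```

A useful check: as the prior becomes noninformative, alpha_post approaches the OLS estimate vec(Â).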
Natural Conjugate Prior


A drawback of Minnesota prior is its treatment of Σ.
Ideally want to treat Σ as unknown parameter
Natural conjugate prior allows us to do this in a way that yields
analytical results.
But (as we shall see) it has some drawbacks.
In practice, noninformative limiting version of natural conjugate prior
sometimes used (but noninformative prior does not do shrinkage)

Bayesian VARs 18 / 67
An examination of the likelihood function (see also the similar derivations
for the Normal linear regression model, where the Normal-Gamma prior was the
natural conjugate) suggests the VAR natural conjugate prior:

α|Σ ∼ N(α, Σ ⊗ V)

Σ^{−1} ∼ W(S^{−1}, ν)

α, V, ν and S are prior hyperparameters chosen by the researcher.

Noninformative prior: ν = 0 and S = V^{−1} = cI with c → 0.

Bayesian VARs 19 / 67
Posterior when using natural conjugate prior

The posterior has an analytical form:

α|Σ, y ∼ N(ᾱ, Σ ⊗ V̄)

Σ^{−1}|y ∼ W(S̄^{−1}, ν̄)

where

V̄ = (V^{−1} + X′X)^{−1}

Ā = V̄ (V^{−1} A + X′X Â)

S̄ = S + Ŝ + Â′X′XÂ + A′V^{−1}A − Ā′(V^{−1} + X′X)Ā

ν̄ = T + ν

(Here A, V, S, ν are the prior hyperparameters and Ŝ = (Y − XÂ)′(Y − XÂ)
is the OLS residual sum of squares.)

Bayesian VARs 20 / 67
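A sketch of these posterior formulas with toy data, checked against the noninformative limit (V^{−1} → 0, S → 0), where Ā collapses to OLS and S̄ to Ŝ (prior values illustrative):

```python
# Natural-conjugate posterior moments for a VAR written as Y = X A + E.
import numpy as np

rng = np.random.default_rng(3)
T, K, M = 60, 4, 2
X = rng.standard_normal((T, K))
Y = rng.standard_normal((T, M))

A_prior = np.zeros((K, M))
V_prior = 10.0 * np.eye(K)
S_prior = np.eye(M)
nu_prior = M + 2

XtX = X.T @ X
A_hat = np.linalg.solve(XtX, X.T @ Y)                 # OLS estimate
S_hat = (Y - X @ A_hat).T @ (Y - X @ A_hat)           # OLS residual SS

V_post = np.linalg.inv(np.linalg.inv(V_prior) + XtX)
A_post = V_post @ (np.linalg.inv(V_prior) @ A_prior + XtX @ A_hat)
S_post = (S_prior + S_hat + A_hat.T @ XtX @ A_hat
          + A_prior.T @ np.linalg.inv(V_prior) @ A_prior
          - A_post.T @ (np.linalg.inv(V_prior) + XtX) @ A_post)
nu_post = T + nu_prior
```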
Remember: in regression model joint posterior for ( β, h ) was
Normal-Gamma, but marginal posterior for β had t-distribution
Same thing happens with VAR coefficients.
Marginal posterior for α is a multivariate t-distribution.
Posterior mean is α
Degrees of freedom parameter is ν
Posterior covariance matrix:

var(α|y) = 1/(ν̄ − M − 1) S̄ ⊗ V̄
Posterior inference can be done using (analytical) properties of
t-distribution.
Predictive inference can also be done analytically (for one-step ahead
forecasts)

Bayesian VARs 21 / 67
Problems with the Natural Conjugate Prior


Natural conjugate prior has great advantage of analytical results, but
has some problems which make it rarely used in practice.
To make problems concrete consider a macro example:
The VAR involves variables such as output growth and the growth in
the money supply
Researcher wants to impose the neutrality of money.
Implies: coefficients on the lagged money growth variables in the
output growth equation are zero (but coefficients of lagged money
growth in other equations would not be zero).

Bayesian VARs 22 / 67
Problem 1: Cannot simply impose neutrality of money restriction.
The (IM ⊗ X ) form of the explanatory variables in VAR means every
equation must have same set of explanatory variables.
But if we do not maintain the (I_M ⊗ X) form, we lose these analytical
results (see Kadiyala and Karlsson, JAE, 1997 for details).

Bayesian VARs 23 / 67
Problem 2: Cannot “almost impose” neutrality of money restriction
through the prior.
Cannot set prior mean over neutrality of money restriction and set
prior variance to very small value.
To see why, let individual elements of Σ be σij .
Prior covariance matrix has form Σ ⊗ V
This implies prior covariance of coefficients in equation i is σii V .
Thus prior covariance of the coefficients in any two equations must be
proportional to one another.
So can “almost impose” coefficients on lagged money growth to be
zero in ALL equations, but cannot do it in a single equation.
Note also that Minnesota prior form V Min is not consistent with
natural conjugate prior.

Bayesian VARs 24 / 67
Some interesting approaches I will not discuss


Choosing prior hyperparameters by using dummy observations
(fictitious prior data set), see Sims and Zha (1998, IER).
Using prior information from macro theory (e.g. DSGE models), see
Ingram and Whiteman (1994, JME) and Del Negro and Schorfheide
(2004, IER).
Villani (2009, JAE): priors about means of dependent variables
Useful since researchers often have prior information on these.
Write the VAR as:

Ã(L)(y_t − ã_0) = ε_t

where Ã(L) = I − Ã_1 L − .. − Ã_p L^p and L is the lag operator.
ã_0 contains the unconditional means of the dependent variables.
Gibbs sampling required.

Bayesian VARs 25 / 67
The Independent Normal-Wishart Prior


Natural conjugate prior had α|Σ being Normal and Σ−1 being
Wishart and VAR had same explanatory variables in every equation.
Want more general setup without these restrictive features.
Can do this with a prior for VAR coefficients and Σ−1 being
independent (hence name “independent Normal-Wishart prior”)
And using a more general formulation for the VAR

Bayesian VARs 26 / 67
To allow for different equations in the VAR to have different
explanatory variables, modify notation.
To avoid confusion, we use “β” notation for the VAR coefficients now
instead of α.
Each equation (for m = 1, .., M) of the VAR is:

y_mt = z′_mt β_m + ε_mt

If we set z_mt = (1, y′_{t−1}, .., y′_{t−p})′ for m = 1, .., M, then we have
exactly the same VAR as before.
However, here zmt can contain different lags of dependent variables,
exogenous variables or deterministic terms.

Bayesian VARs 27 / 67
Vector/matrix notation:

y_t = (y_1t, .., y_Mt)′, ε_t = (ε_1t, .., ε_Mt)′

β = (β′_1, .., β′_M)′

Z_t is the M × k block-diagonal matrix with z′_1t, .., z′_Mt on the diagonal:

Z_t = diag(z′_1t, z′_2t, .., z′_Mt)

β is a k × 1 vector, where k = ∑_{m=1}^{M} k_m and k_m is the number of
coefficients in equation m.
ε_t is i.i.d. N(0, Σ).
Can write the VAR as:

y_t = Z_t β + ε_t

Bayesian VARs 28 / 67
Stacking over t:

y = (y′_1, .., y′_T)′, ε = (ε′_1, .., ε′_T)′, Z = (Z′_1, .., Z′_T)′

The VAR can be written as:

y = Zβ + ε

ε is N(0, I_T ⊗ Σ).

Bayesian VARs 29 / 67
Thus, the VAR can be written as a Normal linear regression model with an
error covariance matrix of a particular (SUR) form.
Independent Normal-Wishart prior:

p(β, Σ^{−1}) = p(β) p(Σ^{−1})

where

β ∼ N(β, V_β)

and

Σ^{−1} ∼ W(S^{−1}, ν)

V_β can be anything the researcher chooses (not the restrictive Σ ⊗ V
form of the natural conjugate prior).
β and V_β could be set as in the Minnesota prior.
A noninformative prior is obtained by setting ν = S = V_β^{−1} = 0.

Bayesian VARs 30 / 67
Posterior inference in the VAR with independent Normal-Wishart prior

p(β, Σ^{−1}|y) does not have a convenient form allowing analytical results.
But a Gibbs sampler can be set up.
The conditional posterior distributions p(β|y, Σ^{−1}) and p(Σ^{−1}|y, β)
do have convenient forms:

β|y, Σ^{−1} ∼ N(β̄, V̄_β)

where

V̄_β = (V_β^{−1} + ∑_{t=1}^{T} Z′_t Σ^{−1} Z_t)^{−1}

and

β̄ = V̄_β (V_β^{−1} β + ∑_{t=1}^{T} Z′_t Σ^{−1} y_t)

Bayesian VARs 31 / 67
Σ^{−1}|y, β ∼ W(S̄^{−1}, ν̄)

where

ν̄ = T + ν

S̄ = S + ∑_{t=1}^{T} (y_t − Z_t β)(y_t − Z_t β)′

Remember: for any Gibbs sampler, the resulting draws can be used to
calculate posterior properties of any function of the parameters (e.g.
impulse responses), marginal likelihoods (for model comparison)
and/or to do prediction.

Bayesian VARs 32 / 67
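The two conditional posteriors above can be strung together into a Gibbs sampler. A compact sketch on toy data (identical regressors in both equations so the posterior mean can be sanity-checked against OLS; the prior values and true coefficients are illustrative):

```python
# Gibbs sampler for the independent Normal-Wishart prior in the
# y_t = Z_t beta + eps_t form, Z_t = block-diag(z_1t', .., z_Mt').
import numpy as np

rng = np.random.default_rng(4)
T, M, k_m = 200, 2, 3
k = M * k_m
Ztilde = rng.standard_normal((T, k_m))             # common regressors z_mt
B_true = np.array([[1.0, -0.5], [0.5, 0.2], [0.0, 0.3]])
ys = Ztilde @ B_true + rng.standard_normal((T, M))

Z = np.array([np.kron(np.eye(M), Ztilde[t]) for t in range(T)])   # T x M x k

beta_prior, Vb_inv = np.zeros(k), np.eye(k) / 100.0               # loose prior
S_prior, nu_prior = np.eye(M), M + 2

beta, draws = np.zeros(k), []
for s in range(600):
    # draw Sigma^{-1} | y, beta from W(S_bar^{-1}, nu_bar)
    resid = ys - np.einsum('tmk,k->tm', Z, beta)
    S_post = S_prior + resid.T @ resid
    L = np.linalg.cholesky(np.linalg.inv(S_post))
    W = L @ rng.standard_normal((M, T + nu_prior))
    Sig_inv = W @ W.T
    # draw beta | y, Sigma^{-1}
    Vb_post = np.linalg.inv(Vb_inv + np.einsum('tmk,mn,tnl->kl', Z, Sig_inv, Z))
    b_post = Vb_post @ (Vb_inv @ beta_prior
                        + np.einsum('tmk,mn,tn->k', Z, Sig_inv, ys))
    beta = b_post + np.linalg.cholesky(Vb_post) @ rng.standard_normal(k)
    if s >= 100:                                   # discard burn-in
        draws.append(beta)
post_mean = np.mean(draws, axis=0)
```

With a loose prior and identical regressors per equation, the posterior mean should sit very close to equation-by-equation OLS.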
Prediction in VARs
I will use prediction and forecasting to mean the same thing
Goal: predict y_τ for some period τ using data available at time τ − 1.
For the VAR, Z_τ contains information dated τ − 1 or earlier.
For predicting at time τ given information through τ − 1, can use:

y_τ|Z_τ, β, Σ ∼ N(Z_τ β, Σ)

This result and Gibbs draws β^(s), Σ^(s) for s = 1, .., S allow for
predictive inference.
E.g. the predictive mean (a popular point forecast) can be obtained as:

E(y_τ|Z_τ) ≈ (1/S) ∑_{s=1}^{S} Z_τ β^(s)
Other predictive moments can be calculated in a similar fashion
Bayesian VARs 33 / 67
Prediction in VARs
Or can do predictive simulation:
For each Gibbs draw β^(s), Σ^(s), simulate one (or more) y_τ^(s)
The result will be draws y_τ^(s) for s = 1, .., S
Plot them to produce the predictive density
Average them to produce the predictive mean
Take their standard deviation to produce the predictive standard deviation
etc.

Bayesian VARs 34 / 67
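A sketch of predictive simulation, assuming arrays of Gibbs draws `beta_draws` and `Sig_draws` already exist (here they are faked with illustrative values, so the summaries can be checked):

```python
# For each posterior draw, simulate y_tau ~ N(Z_tau beta, Sigma),
# then summarize the simulated draws.
import numpy as np

rng = np.random.default_rng(5)
S, M, k = 1000, 2, 6
beta_draws = rng.standard_normal((S, k)) * 0.1      # fake posterior draws
Sig_draws = np.repeat(np.eye(M)[None], S, axis=0)   # fake: Sigma = I each draw
Z_tau = np.kron(np.eye(M), np.array([1.0, 0.5, -0.3]))   # M x k

y_sim = np.empty((S, M))
for s in range(S):
    mean = Z_tau @ beta_draws[s]
    Ls = np.linalg.cholesky(Sig_draws[s])
    y_sim[s] = mean + Ls @ rng.standard_normal(M)

pred_mean = y_sim.mean(axis=0)       # point forecast
pred_sd = y_sim.std(axis=0)          # predictive standard deviation
```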
Prediction in VARs
Preceding material was about predicting yτ using data available at
time τ − 1
This is one-period-ahead forecasting.
But what about h-period-ahead forecasts?
h is the forecast horizon.
E.g. with quarterly data, forecasting a year ahead means h = 4.
Can do direct or iterated forecasting

Bayesian VARs 35 / 67
Direct Forecasting in VARs


Direct forecasting is straightforward: simply redefine Z_τ.
Above we defined each equation using z_mτ = (1, y′_{τ−1}, .., y′_{τ−p})′
Replace this by z_mτ = (1, y′_{τ−h}, .., y′_{τ−p−h+1})′
Then your model is always predicting y_τ using data available at time
τ − h.
All posterior and predictive formulae are as above.
If forecasting (e.g.) for h = 1, 2, 3, 4, must re-estimate the model for
each h.

Bayesian VARs 36 / 67
Iterated Forecasting in VARs

Estimate the model once using z_mτ = (1, y′_{τ−1}, .., y′_{τ−p})′
Remember the result that

y_τ|Z_τ, β, Σ ∼ N(Z_τ β, Σ)   (**)

When forecasting y_τ using information available at time τ − h for
h > 1, you face a problem using (**).
Use h = 2 and p = 2 to illustrate.
In the model, y_τ depends on y_{τ−1} and y_{τ−2}.
But as a forecaster, you do not know y_{τ−1} yet.
E.g. suppose you have data through 2015Q4.
When forecasting 2016Q1 (h = 1) you will have data for 2015Q4 and
2015Q3.
So Z_τ is known for h = 1.
But when forecasting 2016Q2 (h = 2) you will not have data for 2016Q1
and will not know Z_τ.
Bayesian VARs 37 / 67
Iterated Forecasting in VARs

Solution to the problem:
Do predictive simulation beginning with h = 1.
Use the draw y^(s)_{τ−1} (along with y_{τ−2}, β^(s), Σ^(s)) to plug into (**).
This is called iteration.
For h > 2 just keep on iterating.
The strategy above will provide you with draws y^(s)_{τ−1} and y^(s)_{τ−2}.
For h = 3 can use these to define the appropriate Z_τ for use in (**).
etc.
Which of iterated or direct forecasting is better?
This seems to depend on the data set being used.

Bayesian VARs 38 / 67
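A sketch of the iteration for a VAR(1) with no intercept: each simulated value is fed back in as the next period's lag. For simplicity all "posterior draws" of A_1 are identical here; in a real application they would come from the Gibbs sampler:

```python
# Iterated h-step forecasting: per draw, simulate forward h periods.
import numpy as np

rng = np.random.default_rng(6)
S, M, h = 2000, 2, 4
y_last = np.array([0.5, -1.0])                   # last observed y
A1_draws = np.repeat(np.array([[0.5, 0.1],
                               [0.0, 0.4]])[None], S, axis=0)
Sig_chol = np.linalg.cholesky(np.array([[1.0, 0.3],
                                        [0.3, 1.0]]))

paths = np.empty((S, h, M))
for s in range(S):
    y = y_last.copy()
    for step in range(h):                        # iterate: simulated y becomes the lag
        y = A1_draws[s] @ y + Sig_chol @ rng.standard_normal(M)
        paths[s, step] = y

point_forecasts = paths.mean(axis=0)             # one row per horizon 1..h
```

With fixed A_1 and zero-mean errors, the horizon-h point forecast should be close to A_1^h times the last observation.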
Recursive and Rolling Forecasts


Data runs from t = 1, .., T
E.g. annual data set from 1960 through 2015
Sometimes researcher is interested in out-of-sample forecasting:
Forecasting 2016 (or 2017, 2018, etc.)
2016 is not yet observed = out of sample
Sometimes researcher wants to know how well model might have
forecast in past
E.g. given data I had in 1995 how well would I have forecast 1996?
In general, given data available at time τ − h, how well would I
forecast τ?
Pseudo out-of-sample forecast evaluation

Bayesian VARs 39 / 67
Recursive and Rolling Forecasts


For pseudo out-of-sample forecast evaluation proceed as follows:
choose a forecast evaluation period: τ = τ0 , .., T
E.g. 1970 to 2015
Note τ0 > 1 since you need at least some data to sensibly estimate
the VAR
Recursive forecasting involves:
use data for t = 1, .., τ − h to forecast yτ
Repeat for τ = τ0 , .., T
Note: can be computationally demanding (esp. if MCMC and
predictive simulation used)
Repeatedly estimate model on “expanding window” of data

Bayesian VARs 40 / 67
Recursive and Rolling Forecasts


Recursive forecasting uses all data available at τ − h to forecast
But what if parameter change has occurred (e.g. 1960s data
irrelevant for 1990s forecasting)?
E.g. Recursive forecasts in 1990s will be contaminated with 1960s
data
Best solution: build parameter/regime change into your model (more
in future on this)
Rolling forecasts: same as recursive forecasts but use data from
t = τ − h − τ1 , .., τ − h to estimate VAR for forecasting yτ
Fixed window of data (always use most recent τ1 observations)

Bayesian VARs 41 / 67
Evaluating Forecasts
Suppose you have produced forecasts somehow (direct or
iterated/recursive or rolling) for τ = τ0 , .., T and have
Predictive densities p(y_τ|Z_τ)
Predictive means (point forecasts): E(y_τ|Z_τ)
Note: in past point forecasts popular, now huge interest in
uncertainty about future (e.g. Bank of England inflation fan charts)
Predictive densities (or density forecasts) hot topic
Usually will have forecasts from several models (e.g. comparing VAR
to other modelling approaches)
How do you decide whether your forecasts are good?
Large literature exists on forecast evaluation
Necessary to distinguish between a random variable and its realization.
E.g. y_it is the random variable and y_it^R is the observed value (e.g.
observed inflation in 2015).
Here I will define two common approaches
Bayesian VARs 42 / 67
Mean Squared Forecast Error (MSFE)


MSFE is the most common way of measuring the performance of point
forecasts for a variable in the VAR (e.g. y_it = inflation):

MSFE = ∑_{τ=τ_0}^{T} (y_iτ^R − E(y_iτ|Z_τ))² / (T − τ_0 + 1)

Many related variants exist, such as the Mean Absolute Forecast Error (MAFE):

MAFE = ∑_{τ=τ_0}^{T} |y_iτ^R − E(y_iτ|Z_τ)| / (T − τ_0 + 1)

Bayesian VARs 43 / 67
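The computation is a one-liner each; a sketch with illustrative realizations and point forecasts over a five-period evaluation window:

```python
# MSFE and MAFE over an evaluation period tau_0 .. T.
import numpy as np

y_real = np.array([2.1, 1.8, 2.4, 2.0, 1.6])   # realizations y^R (illustrative)
y_fc = np.array([2.0, 2.0, 2.0, 2.0, 2.0])     # point forecasts (illustrative)

err = y_real - y_fc
msfe = np.mean(err ** 2)
mafe = np.mean(np.abs(err))
```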
Predictive Likelihoods

The most common way of evaluating the performance of the entire predictive
density is with the predictive likelihood.
The predictive likelihood is the predictive density evaluated at the actual
realization.
Predictive likelihood for variable i at time τ: p(y_iτ = y_iτ^R | Z_τ)
Common to present cumulative sums of log predictive likelihoods as a
measure of forecast performance:

∑_{τ=τ_0}^{T} log[p(y_iτ = y_iτ^R | Z_τ)]

Can show that if τ_0 = 1, the cumulative sum of log predictive likelihoods
equals the log marginal likelihood.
They have an interpretation similar to marginal likelihoods over the
forecast evaluation period.
Bayesian VARs 44 / 67
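A sketch of the cumulative log predictive likelihood when each one-step predictive density is (approximately) Normal with known mean and variance; all numbers are illustrative:

```python
# Sum of log predictive densities evaluated at the realizations.
import numpy as np

def log_norm_pdf(x, mu, var):
    # log density of N(mu, var) at x
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

y_real = np.array([2.1, 1.8, 2.4])       # realizations over evaluation period
pred_mean = np.array([2.0, 2.0, 2.0])    # predictive means
pred_var = np.array([0.5, 0.5, 0.5])     # predictive variances

cum_lpl = np.sum(log_norm_pdf(y_real, pred_mean, pred_var))
```

Higher (less negative) values indicate better density forecasts; differences across models play the role of log Bayes factors over the evaluation period.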
Stochastic Search Variable Selection (SSVS) in VARs
There are many approaches which seek parsimony/shrinkage in VARs,
take SSVS as a representative example
SSVS is usually done in VAR where every equation has same
explanatory variables
Hence, return to our initial notation for VARs where X contains
lagged dependent variable, α are VAR coefficients, etc.
SSVS can be interpreted as a prior shrinks some VAR coefficients to
zero
Or as a model selection device (select the model with explanatory
variables with non-zero coefficients)
Or as a model averaging device (which averages over models with
different non-zero coefficients).
Can be implemented in various ways, here we follow George, Sun and
Ni (2008, JoE)
Bayesian VARs 45 / 67
First, the basic idea for a single VAR coefficient, α_j.
SSVS is a hierarchical prior: a mixture of two Normal distributions:

α_j|γ_j ∼ (1 − γ_j) N(0, κ²_0j) + γ_j N(0, κ²_1j)

γ_j is a dummy variable.
If γ_j = 1 then α_j has prior N(0, κ²_1j)
If γ_j = 0 then α_j has prior N(0, κ²_0j)
The prior is hierarchical since γ_j is an unknown parameter, estimated in a
data-based fashion.
κ²_0j is “small” (so the coefficient is shrunk to be virtually zero)
κ²_1j is “large” (implying a relatively noninformative prior for α_j)

Bayesian VARs 46 / 67
Below we describe a Gibbs sampler for this model which provides
draws of γ and other parameters
SSVS can select a single restricted model.
Run Gibbs sampler and calculate Pr (γj = 1|y ) for j = 1, .., KM
Set to zero all coefficients with Pr (γj = 1|y ) < a (e.g. a = 0.5).
Then re-run Gibbs sampler using this restricted model
Alternatively, if the Gibbs sampler for unrestricted VAR is used to
produce posterior results for the VAR coefficients, result will be
Bayesian model averaging (BMA).

Bayesian VARs 47 / 67
Gibbs Sampling with the SSVS Prior

The SSVS prior for the VAR coefficients, α, can be written as:

α|γ ∼ N(0, DD)

γ is a vector with elements γ_j ∈ {0, 1},
D is a diagonal matrix with (j, j)-th element d_j:

d_j = κ_0j if γ_j = 0
d_j = κ_1j if γ_j = 1

A “default semi-automatic approach” to selecting κ_0j and κ_1j:
Set κ_0j = c_0 √(var̂(α_j)) and κ_1j = c_1 √(var̂(α_j))
var̂(α_j) is an estimate from an unrestricted VAR,
e.g. OLS or a preliminary Bayesian estimate from a VAR with a
noninformative prior.
The constants c_0 and c_1 must satisfy c_0 ≪ c_1 (e.g. c_0 = 0.1 and c_1 = 10).
Bayesian VARs 48 / 67
We need a prior for γ and a simple one is:

Pr(γ_j = 1) = q_j
Pr(γ_j = 0) = 1 − q_j

q_j = 1/2 for all j implies each coefficient is a priori equally likely to be
included as excluded.
Can use same Wishart prior for Σ−1
Note: George, Sun and Ni also show how to do SSVS on off-diagonal
elements of Σ

Bayesian VARs 49 / 67
The Gibbs sampler sequentially draws from p(α|y, γ, Σ), p(γ|y, α, Σ) and
p(Σ^{−1}|y, γ, α):

α|y, γ, Σ ∼ N(ᾱ, V̄_α)

where

V̄_α = [Σ^{−1} ⊗ (X′X) + (DD)^{−1}]^{−1}

ᾱ = V̄_α [(ΨΨ′) ⊗ (X′X)] α̂

with ΨΨ′ = Σ^{−1}, Â = (X′X)^{−1} X′Y and α̂ = vec(Â).
Bayesian VARs 50 / 67
p(γ|y, α, Σ) has the γ_j being independent Bernoulli random variables:

Pr(γ_j = 1|y, α, Σ) = q̄_j
Pr(γ_j = 0|y, α, Σ) = 1 − q̄_j

where

q̄_j = [(1/κ_1j) exp(−α²_j / 2κ²_1j) q_j] /
      [(1/κ_1j) exp(−α²_j / 2κ²_1j) q_j + (1/κ_0j) exp(−α²_j / 2κ²_0j) (1 − q_j)]

p(Σ^{−1}|y, γ, α) has a similar Wishart form as previously, so I will not
repeat it here.

Bayesian VARs 51 / 67
Illustration of Bayesian VAR Methods in a Small VAR


Data set: standard quarterly US data set from 1953Q1 to 2006Q3.
Inflation rate ∆πt , the unemployment rate ut and the interest rate rt
yt = (∆πt , ut , rt )0 .
These three variables are commonly used in New Keynesian VARs.
The data are plotted in Figure 1.
We use unrestricted VAR with intercept and 4 lags

Bayesian VARs 52 / 67
[Figure 1: time series plots of the data]

Bayesian VARs 53 / 67
We consider 6 priors:
Noninformative: Noninformative version of natural conjugate prior
Natural conjugate: Informative natural conjugate prior with
subjectively chosen prior hyperparameters
Minnesota: Minnesota prior
Independent Normal-Wishart: Independent Normal-Wishart prior with
subjectively chosen prior hyperparameters
SSVS-VAR: SSVS prior for VAR coefficients and Wishart prior for Σ−1
SSVS: SSVS on both VAR coefficients and error covariance

Bayesian VARs 54 / 67
Point estimates for VAR coefficients often are not that interesting,
but Table 1 presents them for 2 priors
With SSVS priors, Pr (γj = 1|y ) is the “posterior inclusion
probability” for each coefficient, see Table 2
Model selection using Pr(γ_j = 1|y) > 1/2 restricts 25 of 39
coefficients to zero.

Bayesian VARs 55 / 67
Table 1. Posterior mean of VAR Coefficients for Two Priors
Noninformative SSVS - VAR
∆πt ut rt ∆πt ut rt
Intercept 0.2920 0.3222 -0.0138 0.2053 0.3168 0.0143
∆πt −1 1.5087 0.0040 0.5493 1.5041 0.0044 0.3950
ut −1 -0.2664 1.2727 -0.7192 -0.142 1.2564 -0.5648
rt −1 -0.0570 -0.0211 0.7746 -0.0009 -0.0092 0.7859
∆πt −2 -0.4678 0.1005 -0.7745 -0.5051 0.0064 -0.226
ut −2 0.1967 -0.3102 0.7883 0.0739 -0.3251 0.5368
rt −2 0.0626 -0.0229 -0.0288 0.0017 -0.0075 -0.0004
∆πt −3 -0.0774 -0.1879 0.8170 -0.0074 0.0047 0.0017
ut −3 -0.0142 -0.1293 -0.3547 0.0229 -0.0443 -0.0076
rt −3 -0.0073 0.0967 0.0996 -0.0002 0.0562 0.1119
∆πt −4 0.0369 0.1150 -0.4851 -0.0005 0.0028 -0.0575
ut −4 0.0372 0.0669 0.3108 0.0160 0.0140 0.0563
rt −4 -0.0013 -0.0254 0.0591 -0.0011 -0.0030 0.0007

Bayesian VARs 56 / 67
Table 2. Posterior Inclusion Probabilities for
VAR Coefficients: SSVS-VAR Prior
∆πt ut rt
Intercept 0.7262 0.9674 0.1029
∆πt −1 1 0.0651 0.9532
ut −1 0.7928 1 0.8746
rt −1 0.0612 0.2392 1
∆πt −2 0.9936 0.0344 0.5129
ut −2 0.4288 0.9049 0.7808
rt −2 0.0580 0.2061 0.1038
∆πt −3 0.0806 0.0296 0.1284
ut −3 0.2230 0.2159 0.1024
rt −3 0.0416 0.8586 0.6619
∆πt −4 0.0645 0.0507 0.2783
ut −4 0.2125 0.1412 0.2370
rt −4 0.0556 0.1724 0.1097

Bayesian VARs 57 / 67
Impulse Response Analysis
Impulse response analysis is commonly done with VARs
Given my focus on the Bayesian econometrics, as opposed to
macroeconomics, I will not explain in detail
The VAR so far is a reduced-form model:

y_t = a_0 + ∑_{j=1}^{p} A_j y_{t−j} + ε_t

where var(ε_t) = Σ.
Macroeconomists often work with structural VARs:

C_0 y_t = c_0 + ∑_{j=1}^{p} C_j y_{t−j} + u_t

where var(u_t) = I.
The u_t are shocks which have an economic interpretation (e.g. a monetary
policy shock).
Bayesian VARs 58 / 67
Macroeconomist interested in effect of (e.g.) monetary policy shock
now on all dependent variables in future = impulse response analysis
Need to restrict C0 to identify model.
We assume C0 lower triangular
This is a standard identifying assumption used, among many others,
by Bernanke and Mihov (1998), Christiano, Eichenbaum and Evans
(1999) and Primiceri (2005).
Allows for the interpretation of interest rate shock as monetary policy
shock.
Aside: sign-restricted impulse responses of Uhlig (2005) are
increasingly popular

Bayesian VARs 59 / 67
Figures 2 and 3 present impulse responses of all variables to shocks
Use two priors: the noninformative one and the SSVS prior.
Posterior median is solid line and dotted lines are 10th and 90th
percentiles.
Priors give similar results, but a careful examination reveals SSVS
leads to slightly more precise inferences (evidenced by a narrower
band between the 10th and 90th percentiles) due to the shrinkage it
provides.

Bayesian VARs 60 / 67
Impulse Responses for Noninformative Prior
Bayesian VARs 61 / 67
Impulse Responses for SSVS Prior
Bayesian VARs 62 / 67
Large VARs: A Promising Way of Dealing with Fat Data
Pioneering paper: Banbura, Giannone and Reichlin (2010, JAE)
”Large Bayesian Vector Autoregressions”
Banbura et al paper has 131 dependent variables (standard US macro
variables)
Many others, here is a sample (note range of types of applications in
macro/finance and internationally):
Carriero, Kapetanios and Marcellino (2009, IJF): exchange rates for
many countries
Carriero, Kapetanios and Marcellino (2012, JBF): US government
bond yields of different maturities
Giannone, Lenza, Momferatou and Onorante (2010): euro area
inflation forecasting (components of inflation)
Koop and Korobilis (2016, EER) eurozone sovereign debt crisis
Bloor and Matheson (2010, EE): macro application for New Zealand
Jarociński and Maćkowiak (2016, ReStat): Granger causality
Banbura, Giannone and Lenza (2014, ECB): conditional forecasting
Bayesian VARs 63 / 67
Why large VARs?
Availability of more data
More data means more information, makes sense to include it
Concerns about missing out important information (omitted variables
bias, fundamentalness, etc.)
The main alternatives are factor models
Factors squeeze information in large number of variables to small
number of factors
But this squeezing is done without reference to explanatory power (i.e.
squeeze first then put in regression model or VAR): “unsupervised”
Large VAR methods are supervised and can easily see role of
individual variables
And they work: often beating factor methods in forecasting
competitions
Bayesian VARs 64 / 67
BGR “medium” VAR has 20 dep vars and “large” VAR has 130
Usually, when working with so many macroeconomic variables, factor
methods are used
However, BGR find that medium and large Bayesian VARs can
forecast better than factor methods
Perhaps Bayesian VARs should be used even when researcher has
dozens or hundreds of variables?
Dimensionality of α is key
Large VAR with quarterly data might have n = 100 and p = 4 so α
contains over 40000 coefficients.
With monthly data it would have over 100000 coefficients.
For a medium VAR, α might have about 1500 coefficients with
quarterly data.
Σ is parameter rich: n(n+1)/2 elements.

Bayesian VARs 65 / 67
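The arithmetic behind the counts quoted above (the 13 lags for monthly data are my assumption for illustration):

```python
# Parameter counts for a large VAR with n = 100 dependent variables.
n = 100
p_quarterly = 4
K_q = 1 + n * p_quarterly              # 401 coefficients per equation
n_alpha_q = K_q * n                    # 40100 VAR coefficients (quarterly)

p_monthly = 13                         # assumed monthly lag length
n_alpha_m = (1 + n * p_monthly) * n    # 130100 VAR coefficients (monthly)

n_sigma = n * (n + 1) // 2             # 5050 free elements in Sigma
```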
Number of parameters may far exceed the number of observations.
In theory, this is no problem for Bayesian methods.
These combine likelihood function with prior.
Even if parameters in likelihood function are not identified, combining
with prior will (under weak conditions) lead to valid posterior density
But how well do they work in practice?
Role of prior information becomes more important as likelihood is less
informative
Methods I have discussed have been found to work well
But very active area of research (both for econometric theory and
empirical practice)

Bayesian VARs 66 / 67
Summary
Lecture began with summary of basic methods and issues which arise
with Bayesian VAR modelling and addressed questions such as:
Why is shrinkage necessary?
How should shrinkage be done?
With recent explosion of interest in large VARs, need for answers for
such questions is greatly increased
Many researchers now developing models/methods to address them
I have described one popular category focussing on SSVS methods
But many more exist (e.g. variants on LASSO) and are coming out
all the time
Recent survey paper: Sune Karlsson: Forecasting with Bayesian Vector
Autoregressions (in Handbook of Economic Forecasting, volume 2)

Bayesian VARs 67 / 67
