A Bayesian Approach to
Identification of Structural VAR
Models
Christiane Baumeister
University of Notre Dame
CEPR and NBER
Training Course
Central Reserve Bank of Peru
March 13-15, 2023
What is Econometrics All About?
• Econometrics is concerned with the use of sample data to
learn about a phenomenon the researcher is interested in
use data to learn about unknown economic parameters
that capture the relationship between macro variables
2
Why Bayesian?
• In many applications, the econometrician possesses, in
addition to the sample, other information about the
parameters:
theoretical constraints on the parameter space: integrate theory
with empirics (e.g. identifying restrictions, stability constraints)
previous empirical research: past samples, data from other
countries, micro data (e.g. surveys)
• Bayesian analysis allows us to:
include non-sample information in estimation in a flexible way
Vector autoregressions (VARs)
Short time series, measurement error
Lag length
account for uncertainty in a decision-making context (e.g. policy)
analyze models that are intractable using classical methods
What Are The Goals of This Course?
• Chris Sims (2007):
Being Bayesian is more than a basket of “methods”,
it is a mindset.
4
Course Overview
5
Vector Autoregressive Models
• Workhorse models in empirical macroeconomics
capture the dynamic interrelationships between variables
that represent the economy
used for data description, forecasting, structural
dynamics, and policy & counterfactual analysis
3
How do Classical and Bayesian Analysis
Differ?
2. Bayesian analysis
μ is treated as a random variable → it has a
probability distribution
The distribution summarizes our knowledge about
the model parameter. Two sources of information:
Prior information (before seeing the data):
subjective belief about how likely different
parameter values are
Sample information: leads the researcher to
revise/update his prior beliefs
Probabilities are subjective and not necessarily
related to the relative frequency of an event.
Explicit use of probabilities to quantify
uncertainty.
Key Ingredients for Bayesian Analysis
1. Probabilities
Review some probability rules to derive Bayes’
rule
2. Initial information
What is the reason for using prior information?
How to specify a prior distribution for
parameters?
3. How to combine data and non-data (prior)
information?
Bayesian estimation in practice
5
Some Rules of Probability
Consider two random variables: A and B
7
A Closer Look at Each Component
Key object of interest: p(θ | y)
9
Skills for Bayesian Inference
Bayesian inference requires a good knowledge of:
• Probability distributions
to formulate prior distributions
to generate draws from them
to analyze posterior distributions
10
More on Priors
• Two decisions with regard to priors:
1. Family of the prior distribution
2. Hyperparameters of the prior distribution
• In principle any distribution can be combined with the
likelihood to form the posterior.
• Conjugate priors
If a prior is conjugate, then the posterior belongs to the same family
of distributions as the prior. Very convenient
• Natural conjugate priors
Additional property: they have the same functional form as the
likelihood function. The prior can be interpreted as arising from
earlier data analysis.
The Linear Regression Model
• Consider the linear regression model with fixed
regressors:
Y = Xβ + ε,  ε ~ N(0, σ²I_T)
• Likelihood:
L(β, σ²|Y) = (2πσ²)^(−T/2) exp[−(1/(2σ²)) (Y − Xβ)′(Y − Xβ)]
Bayesian Analysis
• Idea 1: The parameters are random variables
with a probability distribution.
• Idea 2: A Bayesian estimate of this distribution combines
prior beliefs and information from the data.
Step 1: Form prior beliefs about parameters (based on past
experience or other studies) and express them in the form of
a probability distribution: p(θ)
Step 2: Information contained in the data is summarized by the
likelihood function: L(θ|Y)
Step 3: Bayes’ Rule gives the posterior distribution of the
parameters: p(θ|Y) ∝ L(θ|Y) p(θ)
13
Example 1: Inference on β when σ² known
Prior distribution of β
p(β|σ²) ~ N(β₀, Σ₀)
Prior density: (2π)^(−K/2) |Σ₀|^(−1/2) exp[−½ (β − β₀)′Σ₀⁻¹(β − β₀)]
so p(β|σ²) ∝ exp[−½ (β − β₀)′Σ₀⁻¹(β − β₀)]
Example: β₀ = (1, 1)′ and Σ₀ = [1 0; 0 10]
Likelihood
L(β|σ², Y) ∝ exp[−(1/(2σ²)) (Y − Xβ)′(Y − Xβ)]
Combining prior density and likelihood
p(β|σ², Y) ∝ p(β|σ²) L(β|σ², Y)
∝ exp[−½ (β − β₀)′Σ₀⁻¹(β − β₀) − (1/(2σ²)) (Y − Xβ)′(Y − Xβ)]
Posterior distribution of β
p(β|σ², Y) ~ N(β₁, Σ₁)
where
β₁ = (Σ₀⁻¹ + σ⁻²X′X)⁻¹ (Σ₀⁻¹β₀ + σ⁻²X′Y)
   = (Σ₀⁻¹ + σ⁻²X′X)⁻¹ (Σ₀⁻¹β₀ + σ⁻²X′Xb)  with b = (X′X)⁻¹X′Y
Σ₁ = (Σ₀⁻¹ + σ⁻²X′X)⁻¹
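These posterior formulas can be checked numerically; below is a minimal Python sketch (the course itself uses Matlab), with simulated data and the prior values from the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for Y = X beta + eps with known sigma2 (illustrative only)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta_true = np.array([1.0, 2.0])
sigma2 = 0.5                      # treated as known in Example 1
y = X @ beta_true + np.sqrt(sigma2) * rng.normal(size=T)

# Prior from the example: beta0 = (1, 1)', Sigma0 = diag(1, 10)
beta0 = np.array([1.0, 1.0])
Sigma0_inv = np.linalg.inv(np.diag([1.0, 10.0]))

# Posterior: Sigma1 = (Sigma0^-1 + sigma^-2 X'X)^-1
#            beta1  = Sigma1 (Sigma0^-1 beta0 + sigma^-2 X'Y)
Sigma1 = np.linalg.inv(Sigma0_inv + X.T @ X / sigma2)
beta1 = Sigma1 @ (Sigma0_inv @ beta0 + X.T @ y / sigma2)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta1, b_ols)
```

With a fairly loose prior and T = 200 observations, the posterior mean β₁ is pulled only slightly away from the OLS estimate b toward β₀.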
Example 2: Inference on σ² when β known
• Recall: if zᵢ ~ N(0, λ⁻¹) i.i.d. for i = 1, ..., ν,
then W = z₁² + z₂² + ⋯ + z_ν² ~ Γ(ν/2, λ/2)
with the density for the Gamma distribution Γ(ν, δ) given by:
p(W) = [δ^ν ⁄ Γ(ν)] W^(ν−1) e^(−δW)  for W > 0
where E(W) = ν/δ and Var(W) = ν/δ²
Why Use this Prior?
1) guarantees σ² > 0: the density is zero for 1/σ² ≤ 0
2) flexible family (different shapes)
Gamma Distributions with Mean Unity
ν = 1, δ = 1
ν = 2, δ = 2
ν = 4, δ = 4
Why Use this Prior?
3) It is the “natural conjugate prior” given the likelihood,
meaning that if the prior is 1/σ² ~ Γ(ν₀/2, δ₀/2),
then the posterior turns out to be 1/σ² | Y ~ Γ(ν₁/2, δ₁/2)
Example 2: Inference on σ² when β known
Prior density: p(1/σ²|β) ∝ (1/σ²)^(ν₀/2 − 1) exp[−δ₀/(2σ²)]
Likelihood
L(1/σ²|β, Y) ∝ (1/σ²)^(T/2) exp[−(1/(2σ²)) (Y − Xβ)′(Y − Xβ)]
Combining prior density and likelihood
p(1/σ²|β, Y) ∝ p(1/σ²|β) L(1/σ²|β, Y)
∝ (1/σ²)^(ν₀/2 − 1) exp[−δ₀/(2σ²)] · (1/σ²)^(T/2) exp[−(1/(2σ²)) (Y − Xβ)′(Y − Xβ)]
= (1/σ²)^((ν₀+T)/2 − 1) exp[−(1/(2σ²)) (δ₀ + (Y − Xβ)′(Y − Xβ))]
so that 1/σ² | β, Y ~ Γ((ν₀ + T)/2, (δ₀ + (Y − Xβ)′(Y − Xβ))/2)
What If All Parameters Are Unknown?
• Calculating the joint posterior distribution
p(β, 1/σ²|Y) ∝ p(Y, β, 1/σ²)
∝ (1/σ²)^(T/2) exp[−(1/(2σ²)) (Y − Xβ)′(Y − Xβ)]
× (1/σ²)^(ν₀/2 − 1) exp[−δ₀/(2σ²)]
× (1/σ²)^(K/2) exp[−(1/(2σ²)) (β − β₀)′Σ₀⁻¹(β − β₀)]
Posterior for β
β | σ², Y ~ N(m*, σ²M*)
m* = (Σ₀⁻¹ + X′X)⁻¹ (Σ₀⁻¹β₀ + X′Y)
M* = (Σ₀⁻¹ + X′X)⁻¹
Posterior for β
m* = (Σ₀⁻¹ + X′X)⁻¹ (Σ₀⁻¹β₀ + X′Y)
M* = (Σ₀⁻¹ + X′X)⁻¹
• Diffuse prior: Σ₀⁻¹ → 0
m* → (X′X)⁻¹X′Y = b
M* → (X′X)⁻¹
= usual OLS formulas
Posterior for β
• Dogmatic prior: Σ₀ → 0
m* → β₀
M* → 0
posterior = prior
Posterior for β
• In general: m* is a matrix-weighted average of
β₀ and b, where the weights depend on the
confidence in the prior (Σ₀⁻¹) and the strength of
evidence from the data (X′X)
Another Way to Interpret the Prior
• Suppose I had observed an earlier sample of T₀
observations (Y₀, X₀).
• Then my OLS estimate based on all information
would be:
b_all = (X₀′X₀ + X′X)⁻¹ (X₀′Y₀ + X′Y)
• Let β₀ = (X₀′X₀)⁻¹X₀′Y₀ be the OLS estimate based
on the prior sample alone.
• Setting Σ₀⁻¹ = X₀′X₀ gives
b_all = (Σ₀⁻¹ + X′X)⁻¹ (Σ₀⁻¹β₀ + X′Y) = m*
identical to the formula for the posterior mean, and
(X₀′X₀ + X′X)⁻¹ = (Σ₀⁻¹ + X′X)⁻¹ = M*
for the posterior variance defined earlier
What About the Marginal Posterior for β?
• To make inference on β, we need to know the marginal
posterior:
p(β|Y) = ∫₀^∞ p(β, 1/σ²|Y) d(1/σ²)
• For this simple model under the natural conjugate prior
analytical results can be obtained:
multivariate Student t with ν* degrees of
freedom, mean m*, and scale matrix (δ*/ν*)M* as
defined before
• BUT » integration is hard
» with other prior distributions analytical derivation
of joint and marginal posterior is not possible
Solution: Gibbs Sampling
• Suppose the parameter vector θ can be partitioned
as θ = (θ₁′, θ₂′)′ with the property that p(θ₁, θ₂|Y) is
of unknown form but the conditionals are known:
(1) Start from an initial guess θ₂⁽⁰⁾
(2) Generate θ₁⁽¹⁾ from p(θ₁|θ₂⁽⁰⁾, Y)
and θ₂⁽¹⁾ from p(θ₂|θ₁⁽¹⁾, Y)
(3) Repeat step (2) for j = 2, ..., D
(4) Throw out first D₀ draws (for D₀ large) and use
remaining draws for inference
Back to our Regression Model
• Idea: By sampling repeatedly from the conditional
distributions p(β|σ², Y) and p(1/σ²|β, Y), we can
approximate the joint and marginal distributions of our
parameters of interest
• Steps:
1. Set priors and initial guess for σ²
2. Sample β conditional on 1/σ²
3. Sample 1/σ² conditional on β
4. Cycle through steps (2) and (3) a large number of times and keep
only the last D − D₀ draws
37
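The steps above can be sketched as follows (a Python stand-in for the course's Matlab code; the data and prior values are illustrative). Note that numpy's Gamma draw uses a shape/scale parameterization, so the rate (δ₀ + SSR)/2 becomes the scale 2/(δ₀ + SSR):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: y_t = beta * x_t + eps_t (Application 1 setup)
T = 300
x = rng.normal(size=T)
y = 2.0 * x + rng.normal(size=T)   # true beta = 2, true sigma2 = 1

# Hypothetical prior values chosen for illustration
beta0, Sigma0 = 0.0, 10.0          # beta | sigma2 ~ N(beta0, Sigma0)
nu0, delta0 = 2.0, 2.0             # 1/sigma2 ~ Gamma(nu0/2, delta0/2)

D, D0 = 5000, 1000                 # total draws, burn-in
sigma2 = 1.0                       # initial guess
draws_beta, draws_sigma2 = [], []

for j in range(D):
    # Step 2: beta | sigma2, Y ~ N(beta1, Sigma1)
    Sigma1 = 1.0 / (1.0 / Sigma0 + x @ x / sigma2)
    beta1 = Sigma1 * (beta0 / Sigma0 + x @ y / sigma2)
    beta = beta1 + np.sqrt(Sigma1) * rng.normal()
    # Step 3: 1/sigma2 | beta, Y ~ Gamma((nu0 + T)/2, (delta0 + SSR)/2)
    ssr = np.sum((y - beta * x) ** 2)
    precision = rng.gamma((nu0 + T) / 2.0, 2.0 / (delta0 + ssr))
    sigma2 = 1.0 / precision
    draws_beta.append(beta)
    draws_sigma2.append(sigma2)

# Step 4: discard burn-in, use remaining draws for inference
post_beta = np.mean(draws_beta[D0:])
post_sigma2 = np.mean(draws_sigma2[D0:])
print(post_beta, post_sigma2)
```

The posterior means should land near the true values (β = 2, σ² = 1) used to simulate the data.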
Application 1
• Linear regression model with one exogenous variable:
y_t = βx_t + ε_t,  ε_t ~ i.i.d. N(0, σ²)
(1) Set the prior hyperparameters β₀, Σ₀, ν₀, δ₀ and an
initial value (σ²)⁽⁰⁾
(2) At iteration j, conditional on draw (σ²)⁽ʲ⁻¹⁾, draw
β⁽ʲ⁾ ~ N(β₁, Σ₁),
where β₁ and Σ₁ are the posterior moments from Example 1;
then, conditional on β⁽ʲ⁾, draw
1/(σ²)⁽ʲ⁾ ~ Γ(ν₁, δ₁),
where ν₁ and δ₁ are the posterior parameters from Example 2
How to Take Draws
• Normal distribution
To sample a vector β from N(μ, Σ), generate
draws z ~ N(0, I) from the standard normal distribution (randn in
Matlab) and then apply the following transformation:
β = μ + Pz,  where P is the Cholesky factor of Σ (Σ = PP′)
42
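A minimal sketch of this transformation, with Python's rng.normal playing the role of Matlab's randn (μ and Σ below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# P is the lower Cholesky factor: Sigma = P P'
P = np.linalg.cholesky(Sigma)

# beta = mu + P z with z ~ N(0, I)
z = rng.normal(size=(2, 100000))
beta = mu[:, None] + P @ z

print(beta.mean(axis=1), np.cov(beta))
```

The sample mean and covariance of the draws should match μ and Σ up to simulation noise.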
Bayesian Analysis of
Structural VAR Models
The Identification Problem
• Classic questions in empirical macroeconomics:
What is the effect of a policy intervention (interest
rate increase, fiscal stimulus) on macroeconomic
aggregates (output, inflation, employment,…)?
Which structural shocks drive aggregate
fluctuations?
Dynamic structural model:
A y_t = B₁ y_{t−1} + ⋯ + B_m y_{t−m} + D^{1/2} v_t
(A is n×n, y_t is n×1, each Bᵢ is n×n, v_t is n×1)
v_t ~ i.i.d. N(0, I_n)
x_{t−1} = (1, y′_{t−1}, y′_{t−2}, ..., y′_{t−m})′
D^{1/2} is the diagonal matrix with entries d₁₁^{1/2}, d₂₂^{1/2}, ..., d_nn^{1/2}
3
Example: supply and demand
q_t = α_s p_t + b_{s11} p_{t−1} + b_{s12} q_{t−1} + b_{s21} p_{t−2}
+ b_{s22} q_{t−2} + ⋯ + b_{sm1} p_{t−m} + b_{sm2} q_{t−m} + d_s^{1/2} v_{st}
q_t = α_d p_t + b_{d11} p_{t−1} + b_{d12} q_{t−1} + b_{d21} p_{t−2}
+ b_{d22} q_{t−2} + ⋯ + b_{dm1} p_{t−m} + b_{dm2} q_{t−m} + d_d^{1/2} v_{dt}
y_t = (q_t, p_t)′,  A = [1 −α_s; 1 −α_d]
4
Reduced-form (can easily estimate):
y_t = c + Φ₁ y_{t−1} + ⋯ + Φ_m y_{t−m} + ε_t
ε_t ~ i.i.d. N(0, Ω)
Φ̂_T = [Σ_{t=1}^T y_t x′_{t−1}] [Σ_{t=1}^T x_{t−1} x′_{t−1}]⁻¹
x_{t−1} = (1, y′_{t−1}, y′_{t−2}, ..., y′_{t−m})′
Φ = [c Φ₁ Φ₂ ⋯ Φ_m]
ε̂_t = y_t − Φ̂_T x_{t−1}
Ω̂_T = T⁻¹ Σ_{t=1}^T ε̂_t ε̂′_t
5
Reminder of Bayesian Principles
• Bayesian idea: before observing a data sample ,
the researcher has some beliefs about how likely
different parameter values are which can be
expressed in the form of a distribution
prior density
• Combine prior information with information in the data
through the likelihood function to obtain the posterior
distribution:
p(θ|Y_T) ∝ p(Y_T|θ) p(θ)
For the reduced-form VAR coefficients φ = vec(Φ) with prior N(m₀, V₀):
V* = (V₀⁻¹ + Ω⁻¹ ⊗ X′X)⁻¹
m* = V* (V₀⁻¹ m₀ + (Ω⁻¹ ⊗ X′X) φ̂_OLS)
Prior for Variance Matrix
• Univariate regression:
σ² = E(ε_t²)
Let zᵢ ~ N(0, λ⁻¹) for i = 1, 2, ..., ν
then W = z₁² + z₂² + ⋯ + z_ν² ~ Γ(ν/2, λ/2)
• Multivariate regression:
Ω = E(ε_t ε′_t)
Now zᵢ ~ N(0, Λ⁻¹) where zᵢ is n×1
then W = z₁z₁′ + z₂z₂′ + ⋯ + z_ν z_ν′ ~ W(ν, Λ⁻¹)
Prior for Variance Matrix
• The conjugate prior for the VAR covariance matrix
is an inverse Wishart distribution:
p(Ω) = IW(ν₀, S₀)
where ν₀ is the prior degrees of freedom
S₀ is the prior scale matrix
12
Minnesota Prior
Structured prior beliefs:
1) The endogenous variables in the VAR follow a
random walk or an AR(1) process.
Example: bivariate VAR(2) model
[y_t; x_t] = [0; 0] + [b₁₁ 0; 0 b₂₂][y_{t−1}; x_{t−1}] + [0 0; 0 0][y_{t−2}; x_{t−2}] + [ε₁; ε₂]
Prior mean: m₀ = (0, b₁₁⁰, 0, 0, 0, 0, 0, b₂₂⁰, 0, 0)′
Under the RW assumption: b₁₁⁰ = b₂₂⁰ = 1
13
Minnesota Prior
2) The variance of the prior for the VAR coefficients
is set based on the following observations:
14
To make this operational define a set of hyperparameters that
control the tightness of the prior:
• (λ₁/l^{λ₃})² for own lag l
• (σᵢλ₁λ₂/(σⱼ l^{λ₃}))² for lag l of other variable j in equation i
• (σᵢλ₄)² for the constant
σᵢ and σⱼ are the standard deviations of the error
terms from AR regressions estimated via OLS
the ratio of σᵢ and σⱼ accounts for the possibility that
variables i and j may have different scales
What does V₀ look like?
For the bivariate VAR(2), V₀ is the 10×10 diagonal matrix
V₀ = diag( (σ₁λ₄)², λ₁², (σ₁λ₁λ₂/σ₂)², (λ₁/2^{λ₃})², (σ₁λ₁λ₂/(σ₂·2^{λ₃}))²,
(σ₂λ₄)², (σ₂λ₁λ₂/σ₁)², λ₁², (σ₂λ₁λ₂/(σ₁·2^{λ₃}))², (λ₁/2^{λ₃})² )
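The diagonal of V₀ can be built mechanically from the hyperparameters. The helper below is a sketch (the function name and hyperparameter values are illustrative, following the per-equation ordering constant, lag 1, lag 2, ... used above):

```python
import numpy as np

def minnesota_variances(sig, lags, lam1, lam2, lam3, lam4):
    """Diagonal of the Minnesota prior variance matrix V0.

    Per equation i: constant first, then for each lag l the variances
    for every variable j (own lag vs. cross lag treated differently).
    """
    n = len(sig)
    diag = []
    for i in range(n):                                  # equation i
        diag.append((sig[i] * lam4) ** 2)               # constant
        for l in range(1, lags + 1):                    # lag l
            for j in range(n):                          # variable j
                if i == j:
                    diag.append((lam1 / l ** lam3) ** 2)
                else:
                    diag.append((sig[i] * lam1 * lam2 / (sig[j] * l ** lam3)) ** 2)
    return np.array(diag)

# Bivariate VAR(2) as in the slides; hyperparameter values are illustrative.
v = minnesota_variances(sig=[1.0, 2.0], lags=2, lam1=0.2, lam2=0.5, lam3=1.0, lam4=100.0)
V0 = np.diag(v)
print(V0.shape)
```

Larger λ₄ makes the prior on the constant looser, while λ₃ > 0 shrinks higher-order lags more tightly toward zero.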
• For example, the prior can be implemented with dummy observations (Y_D, X_D):
Prior mean: m₀ = (X_D′X_D)⁻¹X_D′Y_D
Posterior mean: m* = (X*′X*)⁻¹X*′Y*, with (Y*, X*) the actual data augmented by the dummy observations
ξ_t ≡ [y_t − μ; y_{t−1} − μ; ⋮; y_{t−m+1} − μ]   (nm×1)
e_t ≡ [ε_t; 0; ⋮; 0]   (nm×1)
21
Rewriting a VAR(m) as a VAR(1)
Companion matrix:
F ≡ [Φ₁ Φ₂ ⋯ Φ_{m−1} Φ_m;
I_n 0 ⋯ 0 0;
0 I_n ⋯ 0 0;
⋮;
0 0 ⋯ I_n 0]   (nm×nm)
ξ_t = F ξ_{t−1} + e_t
22
Stability Condition
ξ_t = F ξ_{t−1} + e_t
The VAR is stable (covariance stationary) if all eigenvalues of F lie
inside the unit circle.
23
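The companion-matrix construction and the eigenvalue stability check can be sketched as follows (coefficient matrices below are illustrative):

```python
import numpy as np

def companion(Phis):
    """Stack VAR(m) coefficient matrices Phi_1..Phi_m (each n x n)
    into the nm x nm companion matrix F."""
    m = len(Phis)
    n = Phis[0].shape[0]
    F = np.zeros((n * m, n * m))
    F[:n, :] = np.hstack(Phis)          # first block row: [Phi_1 ... Phi_m]
    F[n:, :-n] = np.eye(n * (m - 1))    # identity blocks below the diagonal
    return F

def is_stable(Phis):
    """Stable iff all eigenvalues of F are strictly inside the unit circle."""
    return bool(np.all(np.abs(np.linalg.eigvals(companion(Phis))) < 1.0))

# Illustrative bivariate VAR(2)
Phi1 = np.array([[0.5, 0.1], [0.0, 0.4]])
Phi2 = np.array([[0.1, 0.0], [0.0, 0.1]])
print(is_stable([Phi1, Phi2]))
```

A coefficient matrix with a root outside the unit circle (e.g. an AR coefficient of 1.05) fails the check.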
Vector MA(∞) Representation
y_{t+s} = ε_{t+s} + Ψ₁ ε_{t+s−1} + Ψ₂ ε_{t+s−2} + ⋯
24
Nonorthogonalized Impulse Responses
Ψ_s = ∂y_{t+s} ⁄ ∂ε′_t
25
Structural model:
A y_t = B x_{t−1} + D^{1/2} v_t,  v_t ~ i.i.d. N(0, I_n)
Structural impulse responses:
∂y_{t+s}/∂v′_t = (∂y_{t+s}/∂ε′_t)(∂ε_t/∂v′_t) = Ψ_s H,  Ψ₀ = I_n
H = ∂ε_t/∂v′_t = A⁻¹D^{1/2}
Reduced form:
y_t = Φ x_{t−1} + ε_t,  ε_t ~ i.i.d. N(0, Ω)
Φ = A⁻¹B
ε_t = A⁻¹D^{1/2} v_t = H v_t
Ω = E(ε_t ε′_t) = A⁻¹D(A⁻¹)′
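These relationships are easy to verify numerically for the supply-and-demand example (the parameter values below are illustrative):

```python
import numpy as np

# Supply and demand: A holds alpha_s, alpha_d; D holds d_s, d_d
# (4 structural parameters in total).
alpha_s, alpha_d = 0.2, -0.5
d_s, d_d = 1.0, 2.0

A = np.array([[1.0, -alpha_s],
              [1.0, -alpha_d]])
D = np.diag([d_s, d_d])

A_inv = np.linalg.inv(A)
H = A_inv @ np.sqrt(D)          # structural impact matrix H = A^{-1} D^{1/2}
Omega = A_inv @ D @ A_inv.T     # reduced-form covariance

print(Omega)
```

Ω = HH′ holds by construction, yet Ω contains only 3 distinct elements (ω₁₁, ω₁₂, ω₂₂) against 4 structural parameters, which is exactly the identification problem discussed next.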
The Identification Problem
Ω = A⁻¹D(A⁻¹)′
Supply and demand example:
4 structural parameters in A and D:
α_s, α_d, d_s, d_d
BUT we can only estimate 3 parameters in Ω by OLS:
ω₁₁, ω₁₂, ω₂₂
27
The Identification Problem
28
What is the Problem?
Structural model:
A y_t = B₁ y_{t−1} + ⋯ + B_m y_{t−m} + u_t,  u_t = D^{1/2} v_t
u_t ~ i.i.d. N(0, D),  D diagonal
Intuition:
If we knew row i of A (denoted a′ᵢ),
then we could estimate the coefficients of the
ith structural equation (b′ᵢ) by OLS
regression of a′ᵢ y_t on x_{t−1}:
b̂ᵢ = [Σ_{t=1}^T x_{t−1} x′_{t−1}]⁻¹ [Σ_{t=1}^T x_{t−1} y′_t aᵢ] = Φ̂′_T aᵢ
d̂ᵢᵢ = a′ᵢ Ω̂_T aᵢ
D̂ = Â Ω̂_T Â′ (diagonal)
Traditional Approach to Identification
Point identification
30
Popular Identification Strategies for
Exact Identification
• Recursive ordering of the variables based on timing
assumptions (Sims 1986)
• Short-run structural relationships (Bernanke 1986,
Blanchard and Watson 1986)
• Separating transitory from permanent components by
assuming long-run structural relationships (Blanchard
and Quah 1989)
• Combination of short-run and long-run restrictions
(Galí 1992)
31
Example
• Assume that the short-run price elasticity of supply α_s is known:
A = [1 −α_s; 1 −α_d]
With A known, H = A⁻¹D^{1/2} follows.
Then we infer the dynamic structural responses:
∂y_{t+s}/∂v′_t = (∂y_{t+s}/∂ε′_t)(∂ε_t/∂v′_t) = Ψ_s H
33
Point Identification: Example
• Application 2: Simple bivariate supply and demand
model of the global oil market
y_t = (Δq_t, p_t)′
Δq_t = oil production growth
p_t = real price of oil
monthly VAR(24) for 1975M2 to 2007M12
35
Identification Using Inequality Constraints
• We may have confidence in signs:
H = [∂q_t/∂v_dt  ∂q_t/∂v_st; ∂p_t/∂v_dt  ∂p_t/∂v_st]
with sign pattern [+ −; + +]
36
How Do We Obtain Such an H?
• Claim: the set of all H such that Ω = HH′
is the set defined by H = PQ where P is the
Cholesky factor of Ω and Q is any element of the set
of all orthogonal matrices (all n×n matrices
satisfying QQ′ = I_n):
HH′ = PQ(PQ)′ = PQQ′P′ = P I_n P′ = Ω
• Q is called a rotation matrix because it allows us
to "rotate" the initial Cholesky factor (recursive
matrix) while maintaining the property that
shocks are uncorrelated
One strategy:
(1) Generate a million matrices Q⁽ʲ⁾, j = 1, ..., 10⁶,
drawn "uniformly" from the set of all orthogonal
matrices.
(2) For each Q⁽ʲ⁾ calculate H⁽ʲ⁾ = P̂Q⁽ʲ⁾, where
P̂ is the Cholesky factor of Ω̂ (Ω̂ = P̂P̂′).
(3) Keep H⁽ʲ⁾ if it satisfies the restrictions.
38
How to Generate a Draw for Q?
40
Solution:
(i) Generate (Ω⁽ʲ⁾)⁻¹ from a Wishart with
T − p degrees of freedom and scale matrix (TΩ̂)⁻¹.
(ii) Calculate P⁽ʲ⁾ with P⁽ʲ⁾P⁽ʲ⁾′ = Ω⁽ʲ⁾ and H⁽ʲ⁾ = P⁽ʲ⁾Q⁽ʲ⁾.
41
Traditional Sign Restriction Algorithm
Step 1: Take a draw (Φ, Ω) from the posterior
of the reduced-form coefficients
Step 2: Compute the Cholesky factor P of Ω
Step 3: Generate an n×n matrix X = [x_ij] with x_ij ~ N(0, 1)
Step 4: Take the QR decomposition of X: X = QR, with
Q orthogonal and R upper triangular. Normalize
the elements in Q such that the diagonal entries
of R are positive.
Step 5: Compute IRFs using H = PQ
Step 6: Keep H if it satisfies the sign restrictions; otherwise,
discard it.
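Steps 2 through 6 can be sketched as follows (a Python stand-in; Ω is fixed at a single illustrative value rather than drawn from a posterior, and the sign pattern is the demand/supply pattern used above, allowing each shock's sign to be flipped before checking):

```python
import numpy as np

rng = np.random.default_rng(7)

def draw_rotation(n):
    """Draw Q uniformly from the orthogonal group via QR of a
    standard-normal matrix, normalizing so diag(R) is positive."""
    X = rng.normal(size=(n, n))
    Q, R = np.linalg.qr(X)
    return Q @ np.diag(np.sign(np.diag(R)))

# Illustrative reduced-form covariance (stands in for a posterior draw)
Omega = np.array([[2.28, -0.47],
                  [-0.47, 32.26]])
P = np.linalg.cholesky(Omega)

# Sign pattern: demand shock raises q and p; supply shock lowers q, raises p
signs = np.array([[1, -1],
                  [1,  1]])

kept = []
for _ in range(2000):
    H = P @ draw_rotation(2)
    # flipping the sign of a column (shock) is always allowed
    for flip in ([1, 1], [-1, 1], [1, -1], [-1, -1]):
        Hf = H * np.array(flip)
        if np.all(np.sign(Hf) == signs):
            kept.append(Hf)
            break

print(len(kept))
```

Every retained Hf still satisfies Hf Hf′ = Ω, since column sign flips and rotations both preserve the covariance factorization.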
Let’s Look at the Algorithm
• Application 2: Simple bivariate supply and demand
model of the global oil market
y_t = (Δq_t, p_t)′
Δq_t = oil production growth
p_t = real price of oil
monthly VAR(24) for 1975M2 to 2007M12
45
Histogram of impact effect of one-standard-
deviation shocks: bivariate model
46
Histogram of impact effect of one-standard-
deviation shocks: 6-variable VAR
q₁₁ = x₁₁ / √(x₁₁² + ⋯ + x_n1²)
⋮
q_n1 = x_n1 / √(x₁₁² + ⋯ + x_n1²)
47
Bivariate Case
[Figure: unit circle with the point (q₁₁, q₂₁) at angle θ from (1, 0)]
q₁₁ = x₁₁ / √(x₁₁² + x₂₁²)
q₂₁ = x₂₁ / √(x₁₁² + x₂₁²)
θ is the angle between (1, 0)′ and (x₁₁, x₂₁)′
Some Trigonometry
[Figure: right triangle with hypotenuse (c), opposite (b), adjacent (a), angle θ]
Pythagoras: a² + b² = c²  ⇒  c = √(a² + b²)
Rotation Matrix
q₁₁ = x₁₁ / √(x₁₁² + x₂₁²) = cos θ
q₂₁ = x₂₁ / √(x₁₁² + x₂₁²) = sin θ
Q = [cos θ  −sin θ; sin θ  cos θ]   with prob 1/2
Q = [cos θ  sin θ; sin θ  −cos θ]   with prob 1/2
θ ~ U(−π, π)
q_i1 = x_i1 / √(x₁₁² + ⋯ + x_n1²)
q²_i1 = x²_i1 / (x₁₁² + ⋯ + x_n1²)
Recall: x₁₁ ~ N(0, 1) ⇒ x₁₁² ~ χ²(1)
x₁₁² + ⋯ + x_n1² ~ χ²(n)
In general: if X ~ χ²(ν₁) and Y ~ χ²(ν₂) are independent,
then X/(X + Y) ~ Beta(ν₁/2, ν₂/2)
⇒ q²_i1 ~ Beta(1/2, (n − 1)/2)
p(q_i1) = [Γ(n/2) / (Γ(1/2)Γ((n−1)/2))] (1 − q²_i1)^{(n−3)/2}  if q_i1 ∈ (−1, 1)
p(q_i1) = 0 otherwise
h₁₁ = p₁₁ q₁₁ = √ω₁₁ q₁₁
51
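A quick simulation confirms the Beta result: since q²ᵢ₁ ~ Beta(1/2, (n−1)/2), its mean is 1/n. The sketch below checks this for n = 2 and n = 6, using the fact that the first column of a Haar-uniform orthogonal matrix is a normalized standard normal vector:

```python
import numpy as np

rng = np.random.default_rng(8)

def q11_draws(n, draws):
    """First entry of the first column of a Haar-uniform orthogonal
    matrix: a standard normal vector divided by its norm."""
    X = rng.normal(size=(draws, n))
    return X[:, 0] / np.sqrt(np.sum(X ** 2, axis=1))

# E[q11^2] = 1/n under Beta(1/2, (n-1)/2)
means = {n: np.mean(q11_draws(n, 200000) ** 2) for n in (2, 6)}
print(means)
```

As n grows the mass of q₁₁ concentrates near zero, which is why a "uniform" prior on the rotation is far from uniform on the impact effects (Take-Away #1).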
Impact effect of one-standard-deviation
shock on variable i: analytic distribution
52
Impact effect of one-standard-deviation
shock on variable i: evidence
Ω̂_T = [2.28  −0.47; −0.47  32.26]  ⇒  √ω̂₁₁ = 1.51 (bivariate)
ω̂₁₁ = 8.66  ⇒  √ω̂₁₁ = 2.94 (6-variable VAR)
53
Take-Away #1
• A prior that is UNINFORMATIVE about a parameter
(here: the angle of rotation θ) is in general
informative about nonlinear transformations of θ
55
Other Objects of Interest
• Suppose we are interested in the effect of a shock that
raises the price by 1% on quantity
• In the bivariate case with n = 2, we normalize
the impact matrix H by dividing the first column by its
first element:
H̃ = [1 ⋯; h₂₁/h₁₁ ⋯]
56
What’s the implicit prior distribution here?
H = PQ
[h₁₁ h₁₂; h₂₁ h₂₂] = [p₁₁ 0; p₂₁ p₂₂][q₁₁ q₁₂; q₂₁ q₂₂]
If we normalize shock 1 as something
that raises variable 1 by 1 unit:
h*₂₁ = h₂₁/h₁₁ = (p₂₁q₁₁ + p₂₂q₂₁)/(p₁₁q₁₁) = p₂₁/p₁₁ + (p₂₂/p₁₁)(x₂₁/x₁₁)
x₂₁/x₁₁ ~ Cauchy(0, 1)
⇒ h*_ij | Ω ~ Cauchy(c*_ij, σ*_ij)
location parameter: c*_ij = ω_ij/ω_jj
scale parameter: σ*_ij = √(ω_ii − ω²_ij/ω_jj) ⁄ √ω_jj
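The Cauchy result can be checked by simulation: the ratio of two independent standard normals has no finite mean, but its median is 0 and its quartiles sit at ±1. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(9)

# Ratio of two independent standard normals is Cauchy(0, 1)
x11 = rng.normal(size=500000)
x21 = rng.normal(size=500000)
ratio = x21 / x11

# Cauchy(0, 1): median 0, quartiles at -1 and +1 (tan(pi/4) = 1)
q25, q50, q75 = np.percentile(ratio, [25, 50, 75])
print(q25, q50, q75)
```

Since h*₂₁ = p₂₁/p₁₁ + (p₂₂/p₁₁)(x₂₁/x₁₁), the normalized impact effect inherits a Cauchy distribution with the location and scale given above, so the implicit prior on elasticities is extremely heavy-tailed.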
Impact effect on variable i of shock
that increases j by one unit
58
What Happens If We Impose
Sign Restrictions?
• Sign restrictions confine these distributions to particular
regions but do not change their basic features.
• Apply traditional algorithm to 8-lag VAR fit to growth rates
of U.S. real compensation per worker and of U.S.
employment for the period 1970:Q1-2014:Q2
Application 3
• Identify supply and demand shocks by sign restrictions:
Δw_t: demand +, supply +
Δn_t: demand +, supply −
59
Implied elasticity of labor demand (h*₂₂)
H = [p₁₁ cos θ   p₁₁ sin θ;
p₂₁ cos θ + p₂₂ sin θ   p₂₁ sin θ − p₂₂ cos θ]
variable 1 = wage, variable 2 = employment
shock 1 = demand, shock 2 = supply
sign pattern: [h₁₁ h₁₂; h₂₁ h₂₂] = [+ +; + −]
h₁₁, h₁₂ ≥ 0 ⇒ θ ∈ [0, π/2]
What’s the Nature of this Truncation?
[h₁₁ h₁₂; h₂₁ h₂₂] = [p₁₁ cos θ   p₁₁ sin θ;
p₂₁ cos θ + p₂₂ sin θ   p₂₁ sin θ − p₂₂ cos θ]
63
What Does This Imply for the Elasticities?
h*₂₁ = h₂₁/h₁₁,  h*₂₂ = h₂₂/h₁₂
for θ ∈ (0, θ̄) with θ̄ = arctan(p₂₂/p₂₁):
h*₂₁ ∈ (ω₂₁/ω₁₁, ω₂₂/ω₂₁)
h*₂₂ ∈ (−∞, 0)
64
for θ → 0:  h*₂₁ → p₂₁/p₁₁  since tan 0 = 0
h*₂₂ → −∞  since cot θ → ∞
for θ → θ̄ = arctan(p₂₂/p₂₁):
h*₂₁ → (p₂₁ + p₂₂(p₂₂/p₂₁))/p₁₁ = (p₂₁² + p₂₂²)/(p₁₁p₂₁)
since tan θ = 1/cot θ and cot θ̄ = p₂₁/p₂₂
h*₂₂ → (p₂₁ − p₂₂(p₂₁/p₂₂))/p₁₁ = 0
since cot θ̄ = p₂₁/p₂₂
Recall Ω = PP′:
[p₁₁ 0; p₂₁ p₂₂][p₁₁ p₂₁; 0 p₂₂] = [p₁₁²  p₂₁p₁₁; p₂₁p₁₁  p₂₁² + p₂₂²] = [ω₁₁ ω₂₁; ω₂₁ ω₂₂]
Take-Away #2
• The sign restrictions may end up implying no or
trivial restrictions on the feasible set:
ε_d ∈ (−∞, 0)
ε_s ∈ (0.04, 4.06)
⇒ the allowable set is uselessly large
68
Bayesian Inference in Set-Identified
Structural VAR Models
u_t ~ i.i.d. N(0, D),  D diagonal
68
Bayesian approach:
Summarize whatever information we have
that helps identify A in the form of a density p(A).
p(A) is highest for values of A we think are
most plausible.
p(A) = 0 for values of A we rule out altogether.
p(A) can also impose sign restrictions
and zeros
69
A Bayesian begins with prior beliefs
before seeing the data:
p(A, D, B) = p(A) p(D|A) p(B|D, A)
70
Prior for p(D|A)
d⁻¹ᵢᵢ | A ~ Γ(κᵢ, τᵢ)
where
E(d⁻¹ᵢᵢ | A) = κᵢ/τᵢ
Var(d⁻¹ᵢᵢ | A) = κᵢ/τᵢ²
uninformative priors: κᵢ, τᵢ → 0
71
Prior for p(B|D, A)
B = [B₁ B₂ ⋯ B_m]
bᵢ | A, D ~ N(mᵢ, dᵢᵢ Mᵢ)
uninformative priors: Mᵢ⁻¹ → 0
72
Likelihood:
p(Y_T|A, D, B) = (2π)^{−Tn/2} |det A|^T |D|^{−T/2}
× exp[−(1/2) Σ_{t=1}^T (Ay_t − Bx_{t−1})′ D⁻¹ (Ay_t − Bx_{t−1})]
prior:
p(A, D, B) = p(A) p(D|A) p(B|A, D)
posterior:
p(A, D, B|Y_T) = p(Y_T|A, D, B) p(A, D, B) ⁄ ∫∫∫ p(Y_T|A, D, B) p(A, D, B) dA dD dB
= p(A|Y_T) p(D|A, Y_T) p(B|A, D, Y_T)
73
74
prior:
d⁻¹ᵢᵢ | A ~ Γ(κᵢ, τᵢ)
posterior:
d⁻¹ᵢᵢ | A, Y_T ~ Γ(κ*ᵢ, τ*ᵢ)
As T → ∞, dᵢᵢ converges in probability to the true value
75
Posterior distribution for A
prior: p(A)
If Mᵢ⁻¹ = 0 and κᵢ = τᵢ = 0,
posterior: p(A|Y_T) = k_T p(A) |det(AΩ̂_T A′)|^{T/2} ⁄ {det diag(AΩ̂_T A′)}^{T/2}
76
77
Posterior distribution for A
demand:
Δn_t = k_d + β_d Δw_t + b_{d11} Δw_{t−1} + b_{d12} Δn_{t−1} + b_{d21} Δw_{t−2}
+ b_{d22} Δn_{t−2} + ⋯ + b_{dm1} Δw_{t−m} + b_{dm2} Δn_{t−m} + u_dt
supply:
Δn_t = k_s + β_s Δw_t + b_{s11} Δw_{t−1} + b_{s12} Δn_{t−1} + b_{s21} Δw_{t−2}
+ b_{s22} Δn_{t−2} + ⋯ + b_{sm1} Δw_{t−m} + b_{sm2} Δn_{t−m} + u_st
81
Prior for the Elasticities
for y_t = (Δw_t, Δn_t)′:  A = [−β_d 1; −β_s 1]
82
What do we know about the short-run wage
elasticity of labor demand?
• Hamermesh (1996) surveys microeconometric
studies: 0.1 to 0.75
• Lichter et al. (2014) conduct meta-analysis of
942 estimates: lower end of Hamermesh range
• Theoretical macro models can imply value
above 2.5 (Akerlof and Dickens, 2007; Galí,
Smets and Wouters 2012)
83
84
Student t prior for labor demand elasticity
85
What do we know about the wage elasticity
of labor supply?
• Long run: often assumed to be zero because
income and substitution effects cancel (e.g.,
Kydland and Prescott, 1982)
• Short run: often interpreted as Frisch elasticity
• Reichling and Whalen's survey of microeconometric
studies: 0.27 to 0.53
• Chetty et al. (2013) review 15 quasi-experimental
studies: < 0.5
• Macro models often assume value greater than 2
(Kydland and Prescott, 1982, Cho and Cooley,
1994, Smets and Wouters, 2007) 86
87
Student t prior for labor supply elasticity
88
Prior for the inverse of the structural
variances
• Recall: d⁻¹ᵢᵢ | A ~ Γ(κᵢ, τᵢ)
• Considerations:
The prior should in part reflect the scale of the data
Scales of individual innovations are obtained from
residuals of univariate autoregressions, denoted ê_t
S is the sample variance matrix of these
residuals
The prior mean E(d⁻¹ᵢᵢ | A) is set equal to the
reciprocal of the ith diagonal element of ASA′
Φ = A⁻¹B
E(Φ) = [I_n  0_{n×(k−n)}]   (n×k, random-walk prior)
E(B|A) = A [I_n  0]
mᵢ(A) = E(bᵢ|A) = [I_n  0]′ aᵢ = (a′ᵢ, 0, ..., 0)′
91
Prior variance
Variance reflects increasing confidence in the prior
expectation as the lag order increases
(confidence in higher-order lags ≈ 0):
v′₁ = (1/1^{2λ₁}, 1/2^{2λ₁}, ..., 1/m^{2λ₁})   (m×1)
s₁₁, ..., s_nn are the diagonal elements of S
v′₂ = (s₁₁⁻¹, s₂₂⁻¹, ..., s_nn⁻¹)   (n×1)
Mᵢ is built from diag(v₁ ⊗ v₂), with a separate variance v₃ for the constant
d⁻¹ᵢᵢ | A, Y_T ~ Γ(κ*ᵢ, τ*ᵢ)
κ*ᵢ = κᵢ + T/2
τ*ᵢ = τᵢ + ζ*ᵢ(A)/2
ζ*ᵢ(A) = Ỹ′ᵢỸᵢ − Ỹ′ᵢX̃ᵢ (X̃′ᵢX̃ᵢ)⁻¹ X̃′ᵢỸᵢ
95
Posterior Distribution for A
p(A|Y_T) = k_T p(A) [det(AΩ̂_T A′)]^{T/2} ⁄ ∏ᵢ₌₁ⁿ [(2/T) τ*ᵢ(A)]^{κ*ᵢ}
96
Baumeister-Hamilton Algorithm
• Goal:
Generate draws from the joint posterior distribution
p(A, D, B|Y_T)
• Procedure:
Draw A⁽ℓ⁾ from p(A|Y_T)
Draw D⁽ℓ⁾ from p(D|A⁽ℓ⁾, Y_T)
Draw B⁽ℓ⁾ from p(B|A⁽ℓ⁾, D⁽ℓ⁾, Y_T)
Repeat for ℓ = 1, ..., 10⁶
• {A⁽ℓ⁾, D⁽ℓ⁾, B⁽ℓ⁾}_{ℓ=1}^N is a draw from the joint posterior
97
How to Generate Draws for A?
• Problem:
The posterior distribution for A is of unknown form
⇒ cannot directly sample from it
• Solution:
Use a random-walk Metropolis-Hastings algorithm to
approximate the posterior distribution p(A|Y_T)
98
Metropolis-Hastings Algorithm
• Goal:
Draw samples from a distribution with unusual
form (referred to as the target density p(θ|Y)),
where θ is a K×1 vector of parameters
• How can we do that?
Specify a candidate-generating (proposal)
density q(θ^{G+1}|θ^G) from which we
can generate draws easily.
Evaluate the ratio
[p(θ^{G+1}|Y)/q(θ^{G+1}|θ^G)] ⁄ [p(θ^G|Y)/q(θ^G|θ^{G+1})]
to decide whether to accept the candidate draw.
104
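A random-walk Metropolis-Hastings sketch on a toy bimodal target (a stand-in for p(A|Y_T)); with a symmetric random-walk proposal the q terms cancel, so the acceptance probability reduces to min(1, ratio of target densities):

```python
import numpy as np

rng = np.random.default_rng(10)

def log_target(theta):
    """Toy target of 'unusual form': an equal mixture of N(2, 1) and N(-2, 1)."""
    return np.logaddexp(-0.5 * (theta - 2.0) ** 2, -0.5 * (theta + 2.0) ** 2)

theta, step = 0.0, 2.5     # step size of the random-walk proposal
draws = []
for _ in range(100000):
    cand = theta + step * rng.normal()          # symmetric proposal
    # accept with probability min(1, p(cand)/p(theta))
    if np.log(rng.uniform()) < log_target(cand) - log_target(theta):
        theta = cand
    draws.append(theta)

draws = np.array(draws[10000:])                 # discard burn-in
print(draws.mean(), draws.std())
```

The chain should visit both modes, with sample mean near 0 and standard deviation near √5 ≈ 2.24; working with log densities avoids numerical underflow, which matters for the high-dimensional posterior p(A|Y_T).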
Generating draws for A
• To enhance the efficiency of the algorithm, find the
value θ̂ that maximizes the target function numerically
and scale the proposal by the inverse Hessian at the mode:
Σ̂ = −[∂² log q(A) ⁄ ∂θ∂θ′ |_{θ̂}]⁻¹
105
A Bayesian Interpretation of
Traditional Approaches to
Identification
Structural model of interest:
A y_t = B₁ y_{t−1} + ⋯ + B_m y_{t−m} + u_t
(A is n×n, y_t is n×1)
u_t ~ i.i.d. N(0, D),  D diagonal
2
Identification Strategy
• Traditional approach:
assume perfect knowledge of structure to achieve
identification
• Bayesian approach:
represent imperfect information about elements in A
in the form of a Bayesian prior distribution pA
3
How does this relate to non-Bayesian approaches?
(1) Traditional point-identified structural VARs
are a special case of a Bayesian prior that
is dogmatic.
A. Cholesky identification
B. Long-run restrictions
(2) Set-identified VARs with implicit informative
priors that cannot be chosen by the user
⇒ Application 5
5
Structural Model of the Global Oil Market
oil supply:
q_t = α_qy y_t + α_qp p_t + b′₁ x_{t−1} + u_1t
economic activity:
y_t = α_yq q_t + α_yp p_t + b′₂ x_{t−1} + u_2t
inverse of oil demand curve:
p_t = α_pq q_t + α_py y_t + b′₃ x_{t−1} + u_3t
Note: α_pq = inverse of short-run
price-elasticity of oil demand
6
Put in Canonical Form
A = [1 −α_qy −α_qp; −α_yq 1 −α_yp; −α_pq −α_py 1]
7
Example 1: Kilian (AER 2009)
What Does Cholesky Identification Imply?
(Cholesky identification)
α_qy = α_qp = α_yp = 0
oil supply:
q_t = α_qy y_t + α_qp p_t + b′₁ x_{t−1} + u_1t
economic activity:
y_t = α_yq q_t + α_yp p_t + b′₂ x_{t−1} + u_2t
inverse of oil demand curve:
p_t = α_pq q_t + α_py y_t + b′₃ x_{t−1} + u_3t
8
Bayesian Translation of Cholesky
Identification
• I put absolutely zero possibility on any A
unless the (1,2), (1,3), and (2,3) elements
are all zero.
dogmatic prior: degenerate distribution
A = [1 0 0; −α_yq 1 0; −α_pq −α_py 1]
10
How to Represent “No Information”?
• Requirement:
Prior has to be a proper density
• Proposal:
(2,1) element: p(α_yq) = Student t with location 0,
scale 100, d.f. 3
11
Prior for Lower-Triangular Elements of A
15
Prior (red) and posterior (blue) distributions for
unknown elements of A
16
Posterior density of short-run oil demand elasticity
Correlation between ũ_dt and q_t would bias the OLS estimate
upward (closer to zero)
⇒ implies a bias of the estimated demand elasticity toward a
larger absolute value once accounted for (Baumeister and Hamilton, ET)
What do we know about the price elasticity
of demand?
• Cross-country regression of log of petroleum use per
dollar of GDP on real price of gasoline for 23 OECD
countries in 2004
[Scatter plot: log petroleum use (gallons/year/real GDP) in 2004
against log gasoline price ($ per gallon) in 2004, 23 OECD countries]
What do we know about the price elasticity
of demand?
• Cross-sectional evidence based on household
surveys
Newey and Hausman (1995): -0.81
Yatchew and No (2001): -0.9
21
Take-Away #4
22
B. Bayesian Interpretation of
Sign-Restricted VARs
Application 6: Kilian and Murphy JEEA (2012)
• Know with certainty the signs of the impact matrix
𝑯 = 𝑨−1
• Know with certainty interval in which elasticities fall
(boundary restrictions)
23
Sign Restrictions
24
What Prior Beliefs on the α′s Produce
Those Signs?
oil supply: α_qy = 0, α_qp ≥ 0
q_t = α_qy y_t + α_qp p_t + b′₁ x_{t−1} + u_1t
economic activity: α_yq = 0, α_yp ≤ 0
y_t = α_yq q_t + α_yp p_t + b′₂ x_{t−1} + u_2t
inverse of oil demand curve: α_pq ≤ 0, α_py ≥ 0
p_t = α_pq q_t + α_py y_t + b′₃ x_{t−1} + u_3t
25
A = [1 0 −α_qp; 0 1 −α_yp; −α_pq −α_py 1]
26
How Do We Know?
H = A⁻¹ = (1/det A) ×
[1 − α_yp α_py    α_qp α_py    α_qp;
α_yp α_pq    1 − α_qp α_pq    α_yp;
α_pq    α_py    1]
det A = 1 − α_qp α_pq − α_yp α_py
29
(2) Prior for the Impact Effect
This is a statement about the (2,3) element of the IRF
prior: U(-1.5, 0)
30
(2) Prior for the Impact Effect
31
Summary of Prior Beliefs
Prior for A:
α_qy = α_yq = 0
α_qp ~ U(0, 0.0258)
α_yp ~ Student t(0, 100, 3)
truncated to be negative
α_pq ~ Student t(0, 100, 3)
truncated to be negative
α_py ~ Student t(0, 100, 3)
truncated to be positive
Blue: posterior median IRFs calculated using BH algorithm
Red: IRFs calculated using Kilian-Murphy (JEEA, 2012)
33
sign and boundary restrictions
Prior (red) and posterior (blue) distributions
for unknown elements of A and H
34
Posterior density of short-run
price elasticity of demand
• Measurement error:
is pervasive but tends to be ignored in most
structural inference
supply: q_t = ⋯ y_t + ⋯ p_t + u_st
income: y_t = ⋯ q_t + ⋯ p_t + u_yt
demand: q_t = ⋯ y_t + β p_t + u_dt
• Meaning of demand elasticity β:
If price were to increase 1% with income held
constant, by how much would quantity
demanded change?
Effects of shocks in this 3-variable system
⇨ Use inverse of H!
A Fully Bayesian
Approach: Estimation
and Inference
What Have We Learned So Far?
• Better to use nondogmatic priors and use all
available information
⇒ Be an economist!
2
Another advantage of being openly
Bayesian
3
Structural model of interest:
A y_t = B₁ y_{t−1} + ⋯ + B_m y_{t−m} + u_t
(A is n×n, y_t is n×1)
u_t ~ i.i.d. N(0, D),  D diagonal
4
Application 7:
A Three-Variable Macro Model
5
Model Description
6
Commonly used Taylor Rule
r_t = (1 − ρ)(r* + ψ_y y_t + ψ_π π_t) + ρ r_{t−1} + u_mt
is a special case of our equation
r_t = k^m + γ^m_y y_t + γ^m_π π_t + b^{m}′ x_{t−1} + u_mt
with
γ^m_y = (1 − ρ)ψ_y
γ^m_π = (1 − ρ)ψ_π
7
Prior information:
Taylor (1993) proposed values of
ψ_y = 0.5 and ψ_π = 1.5
Prior information for smoothing parameter ρ:
Lubik and Schorfheide (2004) and
Del Negro and Schorfheide (2004)
ρ ~ Beta(2.6, 2.6)
mean = 0.5, std dev = 0.2
10
A = [1  −α^s  0;
1  −β^d  −γ^d;
−(1 − ρ)ψ_y  −(1 − ρ)ψ_π  1]
11
Commonly used dynamic IS curve
y_t = b y_{t+1|t} − γ(r_t − π_{t+1|t}) + u_dt
where γ = intertemporal elasticity of substitution
Forecasts are linear in current information:
y_{t+1|t} = c_y + φ′_y x_t
π_{t+1|t} = c_π + φ′_π x_t
13
Minnesota prior: the most useful variable for
predicting any variable is its own lagged value:
y_{t+1|t} ≈ φ_y y_t
π_{t+1|t} ≈ φ_π π_t
Minnesota prior: everything is a random walk
φ_y = φ_π = 1
For our variables (output gap, inflation) we
instead expect
φ_y = φ_π = 0.75
14
y_t = c^d + b y_{t+1|t} − γ(r_t − π_{t+1|t}) + u_dt
≈ c^d + bφ_y y_t − γ r_t + γφ_π π_t + u_dt
⇒ y_t = [γφ_π/(1 − bφ_y)] π_t − [γ/(1 − bφ_y)] r_t + ũ_dt
So for our AD equation
y_t = k^d + β^d π_t + γ^d r_t + b^{d}′ x_{t−1} + u_dt
we expect
γ^d = −γ/(1 − bφ_y) ≈ −0.5/0.5 = −1
β^d = γφ_π/(1 − bφ_y) ≈ 0.75
Bayesian prior:
γ^d ~ Student t(−1, 0.4, 3) truncated to be ≤ 0
β^d ~ Student t(0.75, 0.4, 3), no sign restriction
Phillips Curve
y_t = k^s + α^s π_t + b^{s}′ x_{t−1} + u_st
17
Priors for Contemporaneous Structural
Coefficients
A = [1  −α^s  0;
1  −β^d  −γ^d;
−(1 − ρ)ψ_y  −(1 − ρ)ψ_π  1]
18
Additional Considerations
19
Solutions
H = A⁻¹ = (1/det A) ×
[−β^d − γ^d(1 − ρ)ψ_π    α^s    α^s γ^d;
−(1 − γ^d(1 − ρ)ψ_y)    1    γ^d;
−(1 − ρ)ψ_π − β^d(1 − ρ)ψ_y    (1 − ρ)ψ_π + α^s(1 − ρ)ψ_y    α^s − β^d]
det A = α^s − β^d − γ^d(1 − ρ)ψ_π − α^s γ^d(1 − ρ)ψ_y
21
Priors for Impacts of Shocks
sign(H): the priors on the elasticities pin down the signs of some
elements of H; the remaining elements are ambiguous (?)
22
What Information on Impact Effects?
• Sign restriction on response of output gap
to supply shock:
h₁ = β^d + γ^d(1 − ρ)ψ_π > 0
⇒ expect increase of output after a
favorable supply shock
Represent this belief with an asymmetric t density,
p(h₁) ∝ v((h₁ − μ_{h₁})/σ_{h₁}) Φ(λ_{h₁}(h₁ − μ_{h₁})/σ_{h₁}),
where
v(x) is a standard Student t density and
Φ(x) is the cumulative distribution function
for a standard N(0, 1) variable
Features of the Asymmetric t Distribution
26
How to Determine the Parameters of
This Distribution?
• By simulation:
1. Take draws from distributions for 𝛽𝑑 , 𝛾 𝑑 , 𝜓 𝜋 , and 𝜌
2. Compute for each draw ℎ1 = 𝛽𝑑 + 𝛾 𝑑 (1 − 𝜌)𝜓 𝜋
3. Compute mean and standard deviation from
empirical distribution
• By economic theory:
The output gap is unlikely to move one-for-one with a
change in the policy rate on impact.
27
Prior for ℎ1
• Simulation results: h 1 0. 1 and h 1 1
• Set: h 1 3, h 1 4 and h 1 1
28
Prior for ℎ2
• Economic intuition: h 2 0. 3 and h 2 0. 5
• Set: h 2 3, h 2 2 and h 2 1
29
Joint Prior
Add to
log p(A) = log p(α^s) + log p(β^d) + log p(γ^d)
+ log p(ψ_y) + log p(ψ_π) + log p(ρ)
the following two terms:
ζ_{h₁} log p(h₁(A)) and ζ_{h₂} log p(h₂(A))
where
ζ_{h₁} and ζ_{h₂} govern the overall weight put on the
priors for h₁ and h₂ (here ζ_{h₁} = ζ_{h₂} = 1)
30
Joint Prior
• Resulting prior is no longer independent across the
individual elements of A, but includes some joint
information about their interaction
favors combinations of parameters that are in
line with ℎ1 and ℎ2 over those that are not
31
Prior Toolbox
• To visualize prior densities that best reflect your prior
beliefs
• To calculate and simulate moments of prior
distributions
• Toolbox contains a set of useful distributions:
(truncated) Student t
Gamma
Beta
Asymmetric Student t
• Toolbox contains function to compute impact matrix
32
Priors for Structural Variances
• d⁻¹ᵢᵢ | A ~ Γ(κᵢ, τᵢ(A)) with τᵢ(A) = κᵢ a′ᵢ S aᵢ,
so that E(d⁻¹ᵢᵢ | A) = 1/(a′ᵢ S aᵢ)
33
Priors for Lagged Structural Coefficients
Minnesota prior:
• coefficient on own lag in reduced form is 0.75
⇒ first three elements of bᵢ may be close
to 0.75aᵢ (a′ᵢ the ith row of A)
• all other coefficients ≈ 0
35
Impulse-Response Functions
Solid blue lines: posterior median. Shaded regions: 68% posterior credibility set. 36
Dotted blue lines: 95% posterior credibility set. Dashed red lines: prior median.
Prior and posterior probabilities that effect
of shock is positive at horizon s
Supply shock Demand shock Monetary policy shock
(1) (2) (3) (4) (5) (6)
Prior Posterior Prior Posterior Prior Posterior
Variable
s=0
y 0.851 1.000 1.000 1.000 0.000 0.000
π 0.000 0.000 1.000 1.000 0.000 0.000
r 0.008 0.229 1.000 1.000 0.999 1.000
s=1
y 0.717 1.000 0.994 1.000 0.037 0.079
π 0.006 0.000 0.961 1.000 0.117 0.046
r 0.054 0.374 0.965 1.000 0.981 1.000
s=2
y 0.617 1.000 0.974 1.000 0.143 0.206
π 0.021 0.000 0.879 1.000 0.272 0.078
r 0.156 0.478 0.869 1.000 0.916 1.000
37
Historical Decompositions
y_{t+s} = ŷ_{t+s|t} + Σ_{m=0}^{s−1} Ψ_m ε_{t+s−m}
= ŷ_{t+s|t} + Σ_{m=0}^{s−1} Ψ_m A⁻¹ u_{t+s−m}
This decomposes the value of y_{t+s} into the forecast
at t and the n structural shocks between t
and t + s.
⇒ Answers questions such as:
• How would a particular variable have evolved if only a
specific shock had occurred historically?
• What is the relative contribution of different types of
structural shocks to fluctuations in observed variable?38
Historical Decomposition of the Output Gap
Dashed red: actual data in deviation from mean. Solid blue: portion
attributed to indicated structural shock. Shaded regions: 68% posterior
39
credibility sets. Dotted blue: 95% posterior credibility sets.
Historical Decomposition of Inflation
Dashed red: actual data in deviation from mean. Solid blue: portion
attributed to indicated structural shock. Shaded regions: 68% posterior
40
credibility sets. Dotted blue: 95% posterior credibility sets.
Historical Decomposition of Fed Funds Rate
Dashed red: actual data in deviation from mean. Solid blue: portion
attributed to indicated structural shock. Shaded regions: 68% posterior
41
credibility sets. Dotted blue: 95% posterior credibility sets.
Variance Decompositions
What is the contribution of structural shock j to
the s-period-ahead mean squared forecast
error of the ith element of y_{t+s}?
Q_js = d_jj Σ_{m=0}^{s−1} (Ψ_m h_j)(Ψ_m h_j)′
where h_j is the jth column of H = A⁻¹
contribution of shock j = [Q_js]ᵢᵢ ⁄ Σ_{j=1}^n [Q_js]ᵢᵢ
43
Variance decomposition of 4-quarter-
ahead forecast errors
45
Plot of Student t density with location parameter 0.75,
3 degrees of freedom, and scale parameter of 0.4, 2,
or 10.
46
Response of output gap
to monetary shock
with an uninformative prior
for indicated parameter.
Solid blue:
Posterior median.
Dashed red lines:
Benchmark posterior.
Parentheses:
Median contribution of MP
shock to 4-quarter-ahead
squared forecast error of
output gap.
47
Priors for Larger Systems
• Add to 3-dimensional system, 3 variables:
corporate bond spread, commodity spot price
and wages
Use informative priors for original set of
parameters
Use relatively uninformative priors for parameters
on additional variables: Student t(0,1,3)
Use sign restrictions for impact effects of
monetary policy shock
Identify only subset of structural shocks
48
A = [1  a₁₂  a₁₃  a₁₄  a₁₅  a₁₆;
a₂₁  1  a₂₃  a₂₄  a₂₅  a₂₆;
a₃₁  a₃₂  1  a₃₄  a₃₅  a₃₆;
a₄₁  a₄₂  a₄₃  1  a₄₅  a₄₆;
a₅₁  a₅₂  a₅₃  a₅₄  1  a₅₆;
a₆₁  a₆₂  a₆₃  a₆₄  a₆₅  1]
49
Responses to a monetary policy shock in 6-variable VAR
with sign restrictions on impact matrix
50
Conclusions
(1) Structural interpretation of correlations only possible
by drawing on prior understanding of economic
structure.
52