Filtering and Likelihood Inference

                                    Jesús Fernández-Villaverde

                                       University of Pennsylvania


                                           July 10, 2011




Introduction


Motivation

       Filtering, smoothing, and forecasting problems are pervasive in
       economics.

       Examples:

          1   Macroeconomics: evaluating likelihood of DSGE models.

          2   Microeconomics: structural models of individual choice with
              unobserved heterogeneity.

          3   Finance: time-varying variance of asset returns.


       However, filtering is a complicated endeavor with no simple and exact
       algorithm.

Environment I

       Discrete time t ∈ {1, 2, ...}.

       Why discrete time?

          1   Economic data is discrete.

          2   Easier math.


       Comparison with continuous time:

          1   Discretize observables.

          2   More involved math (stochastic calculus) but often we have extremely
              powerful results.
Environment II

       States St .

       We will focus on continuous state spaces.

       Comparison with discrete states:

          1   Markov-Switching models.

          2   Jumps and continuous changes.


       Initial state S_0 is either known or it comes from p(S_0; γ).

       Properties of p(S_0; γ)? Stationarity?

State Space Representations


State Space Representations

       Transition equation:

                                   S_t = f( S_{t-1}, W_t; γ )

       Measurement equation:

                                   Y_t = g( S_t, V_t; γ )

       f and g are measurable functions.

       Interpretation. Modelling origin.

       Note the Markov structure.
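
To fix notation with something concrete, here is a small Python sketch of a purely illustrative nonlinear state space model and its simulation; the particular choices of f, g, and the parameter values are assumptions for the example, not the model used later in the slides.

```python
import numpy as np

# Hypothetical example of a scalar nonlinear state space model:
#   S_t = f(S_{t-1}, W_t; gamma) = rho * S_{t-1} + sigma_w * W_t
#   Y_t = g(S_t, V_t; gamma)     = exp(S_t / 2) * V_t
# (an illustrative choice, not the model analyzed in the slides)

def f(s_prev, w, rho=0.95, sigma_w=0.2):
    """Transition equation."""
    return rho * s_prev + sigma_w * w

def g(s, v):
    """Measurement equation."""
    return np.exp(s / 2.0) * v

def simulate(T, s0=0.0, seed=0):
    """Simulate a path of states and observables."""
    rng = np.random.default_rng(seed)
    s, states, obs = s0, [], []
    for _ in range(T):
        s = f(s, rng.standard_normal())           # process noise W_t
        states.append(s)
        obs.append(g(s, rng.standard_normal()))   # measurement noise V_t
    return np.array(states), np.array(obs)

states, ys = simulate(200)
```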
Shocks

       {W_t} and {V_t} are independent of each other.

       {W_t} is known as process noise and {V_t} as measurement noise.

       W_t and V_t have zero mean.

       No assumptions on the distribution beyond that.

       Often, we assume that the variance of W_t is given by Q_t and the
       variance of V_t by R_t.

DSGE Models and State Space Representations

       We have the solution of a DSGE model:

                                   S_t = P_1 S_{t-1} + P_2 Z_t
                                   Y_t = R_1 S_{t-1} + R_2 Z_t

       This has nearly the same form as

                                   S_t = f( S_{t-1}, W_t; γ )
                                   Y_t = g( S_t, V_t; γ )

       We only need to be careful with:

          1   Rewriting the measurement equation in terms of S_t instead of S_{t-1}.

          2   How we partition Z_t into W_t and V_t.

       Later, we will present an example.

Generalizations I

We can accommodate many generalizations by playing with the state
definition:

   1   Serial correlation of shocks.

   2   Contemporaneous correlation of shocks.

   3   Time-changing state space equations.


Often, even infinite histories (for example, in a dynamic game) can be
tracked by a Lagrange multiplier.

Generalizations II


       However, some generalizations can be tricky to accommodate.

       Take the model:

                                   S_t = f( S_{t-1}, W_t; γ )
                                   Y_t = g( S_t, V_t, Y_{t-1}; γ )

       Y_t will be an infinite-memory process.

Conditional Densities


       From S_t = f( S_{t-1}, W_t; γ ), we can compute p( S_t | S_{t-1}; γ ).

       From Y_t = g( S_t, V_t; γ ), we can compute p( Y_t | S_t; γ ).

       From S_t = f( S_{t-1}, W_t; γ ) and Y_t = g( S_t, V_t; γ ), we have:

                                   Y_t = g( f( S_{t-1}, W_t; γ ), V_t; γ )

       and hence we can compute p( Y_t | S_{t-1}; γ ).

Filtering


Filtering, Smoothing, and Forecasting



       Filtering: we are concerned with what we have learned up to the current
       observation.


       Smoothing: we are concerned with what we learn with the full sample.


       Forecasting: we are concerned with future realizations.




Goal of Filtering I

       Compute conditional densities: p( S_t | y^{t-1}; γ ) and p( S_t | y^t; γ ).

       Why?

          1   It allows probability statements regarding the situation of the system.

          2   Compute conditional moments: means, s_{t|t} and s_{t|t-1}, and variances,
              P_{t|t} and P_{t|t-1}.

          3   Other functions of the states. Examples of interest.


       Theoretical point: do the conditional densities exist?

Goals of Filtering II
       Evaluate the likelihood function of the observables y^T at parameter
       values γ:

           p( y^T; γ )

       Given the Markov structure of our state space representation:

           p( y^T; γ ) = ∏_{t=1}^T p( y_t | y^{t-1}; γ )

       Then:

           p( y^T; γ ) = p( y_1 | γ ) ∏_{t=2}^T p( y_t | y^{t-1}; γ )

                       = ∫ p( y_1 | S_1; γ ) p( S_1; γ ) dS_1  ∏_{t=2}^T ∫ p( y_t | S_t; γ ) p( S_t | y^{t-1}; γ ) dS_t

       Hence, knowledge of { p( S_t | y^{t-1}; γ ) }_{t=1}^T and p( S_1; γ ) allows the
       evaluation of the likelihood of the model.

Two Fundamental Tools

   1   Chapman-Kolmogorov equation:

           p( S_t | y^{t-1}; γ ) = ∫ p( S_t | S_{t-1}; γ ) p( S_{t-1} | y^{t-1}; γ ) dS_{t-1}

   2   Bayes' theorem:

           p( S_t | y^t; γ ) = p( y_t | S_t; γ ) p( S_t | y^{t-1}; γ ) / p( y_t | y^{t-1}; γ )

       where:

           p( y_t | y^{t-1}; γ ) = ∫ p( y_t | S_t; γ ) p( S_t | y^{t-1}; γ ) dS_t


Interpretation


       All filtering problems have two steps: prediction and update.

          1   The Chapman-Kolmogorov equation is a one-step-ahead predictor.

          2   Bayes' theorem updates the conditional density of states given the new
              observation.

       We can think of those two equations as operators that map measures
       into measures.

Recursion for Conditional Distribution

       Combining the Chapman-Kolmogorov equation and Bayes' theorem:

                               ∫ p( S_t | S_{t-1}; γ ) p( S_{t-1} | y^{t-1}; γ ) dS_{t-1} · p( y_t | S_t; γ )
           p( S_t | y^t; γ ) = ─────────────────────────────────────────────────────────────────────────────────
                               ∫ [ ∫ p( S_t | S_{t-1}; γ ) p( S_{t-1} | y^{t-1}; γ ) dS_{t-1} ] p( y_t | S_t; γ ) dS_t

       To initiate that recursion, we only need a value for s_0 or p( S_0; γ ).

       Applying the Chapman-Kolmogorov equation once more, we get
       { p( S_t | y^{t-1}; γ ) }_{t=1}^T to evaluate the likelihood function.

Initial Conditions I

       From the previous discussion, we know that we need a value for s_1 or
       p( S_1; γ ).

       Stationary models: ergodic distribution.

       Non-stationary models: more complicated. Importance of
       transformations.

       Initialization in the case of the Kalman filter.

       Forgetting conditions.

       Non-contraction properties of the Bayes operator.

Smoothing


       We are interested in the distribution of the state conditional on all
       the observations, i.e., in p( S_t | y^T; γ ) and p( y_t | y^T; γ ).

       We compute:

           p( S_t | y^T; γ ) = p( S_t | y^t; γ ) ∫ [ p( S_{t+1} | y^T; γ ) p( S_{t+1} | S_t; γ ) / p( S_{t+1} | y^t; γ ) ] dS_{t+1}

       a backward recursion that we initialize with p( S_T | y^T; γ ) and the
       { p( S_t | y^t; γ ) }_{t=1}^T and { p( S_t | y^{t-1}; γ ) }_{t=1}^T we obtained from filtering.

Forecasting


       Applying the Chapman-Kolmogorov equation recursively, we can get
       p( S_{t+j} | y^t; γ ), j ≥ 1.

       Integrating recursively:

           p( y_{l+1} | y^l; γ ) = ∫ p( y_{l+1} | S_{l+1}; γ ) p( S_{l+1} | y^l; γ ) dS_{l+1}

       from t + 1 to t + j, we get p( y_{t+j} | y^t; γ ).

       Clearly, smoothing and forecasting require solving the filtering
       problem first!

Problem of Filtering
       We have the recursion

                               ∫ p( S_t | S_{t-1}; γ ) p( S_{t-1} | y^{t-1}; γ ) dS_{t-1} · p( y_t | S_t; γ )
           p( S_t | y^t; γ ) = ─────────────────────────────────────────────────────────────────────────────────
                               ∫ [ ∫ p( S_t | S_{t-1}; γ ) p( S_{t-1} | y^{t-1}; γ ) dS_{t-1} ] p( y_t | S_t; γ ) dS_t

       A lot of complicated and high-dimensional integrals (plus the one
       involved in the likelihood).

       In general, we do not have a closed-form solution for them.

       The conditional densities translate, spread, and deform (TSD) in ways that
       make it impossible to fit them within any known parametric family.

Exception

       There is one exception: linear and Gaussian case.


       Why? Because if the system is linear and Gaussian, all the conditional
       probabilities are also Gaussian.


       Linear and Gaussian state space models translate and spread the
       conditional distributions, but they do not deform them.


       For Gaussian distributions, we only need to track the mean and variance
       (sufficient statistics).


       The Kalman filter accomplishes this goal efficiently.

Kalman Filtering


Linear Gaussian Case

       Consider the following system:

              Transition equation

                                   s_t = F s_{t-1} + G ω_t,   ω_t ∼ N(0, Q)

              Measurement equation

                                   y_t = H s_t + υ_t,   υ_t ∼ N(0, R)

       Assume we want to write the likelihood function of y^T = {y_t}_{t=1}^T.

The State Space Representation is Not Unique

       Take the previous state space representation.

       Let B be a non-singular square matrix conforming with F.

       Then, if s*_t = B s_t, F* = B F B^{-1}, G* = B G, and H* = H B^{-1}, we can
       write a new, equivalent representation:

              Transition equation

                                   s*_t = F* s*_{t-1} + G* ω_t,   ω_t ∼ N(0, Q)

              Measurement equation

                                   y_t = H* s*_t + υ_t,   υ_t ∼ N(0, R)

Example I


       AR(2) process:

           y_t = ρ_1 y_{t-1} + ρ_2 y_{t-2} + σ_υ υ_t,   υ_t ∼ N(0, 1)

       The model is apparently not Markovian.

       However, it is trivial to write it in state space form.

       In fact, we have many different state space forms.

Example I

       State Space Representation I:

           [ y_t         ]   [ ρ_1  1 ] [ y_{t-1}     ]   [ σ_υ ]
           [ ρ_2 y_{t-1} ] = [ ρ_2  0 ] [ ρ_2 y_{t-2} ] + [ 0   ] υ_t

           y_t = [ 1  0 ] [ y_t ; ρ_2 y_{t-1} ]

       State Space Representation II:

           [ y_t     ]   [ ρ_1  ρ_2 ] [ y_{t-1} ]   [ σ_υ ]
           [ y_{t-1} ] = [ 1    0   ] [ y_{t-2} ] + [ 0   ] υ_t

           y_t = [ 1  0 ] [ y_t ; y_{t-1} ]

       The rotation B = [ 1  0 ; 0  ρ_2 ] applied to the second system delivers the first one.
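
As a quick check of the claim that the two representations are equivalent, the sketch below builds both systems for arbitrary (assumed) values of ρ_1, ρ_2, and σ_υ and verifies that the rotation B maps the second one into the first; the numerical values are placeholders.

```python
import numpy as np

rho1, rho2, sigma_v = 0.5, 0.3, 1.0  # illustrative parameter values

# Representation I: state (y_t, rho2 * y_{t-1})
F1 = np.array([[rho1, 1.0], [rho2, 0.0]])
G1 = np.array([[sigma_v], [0.0]])
H1 = np.array([[1.0, 0.0]])

# Representation II: state (y_t, y_{t-1})
F2 = np.array([[rho1, rho2], [1.0, 0.0]])
G2 = np.array([[sigma_v], [0.0]])
H2 = np.array([[1.0, 0.0]])

# Rotation mapping representation II into representation I
B = np.array([[1.0, 0.0], [0.0, rho2]])
Binv = np.linalg.inv(B)

assert np.allclose(B @ F2 @ Binv, F1)   # F* = B F B^{-1}
assert np.allclose(B @ G2, G1)          # G* = B G
assert np.allclose(H2 @ Binv, H1)       # H* = H B^{-1}
```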
Example II
       MA(1) process:

           y_t = υ_t + θ υ_{t-1},   υ_t ∼ N(0, σ²_υ),  and E[υ_t υ_s] = 0 for s ≠ t.

       State Space Representation I:

           [ y_t   ]   [ 0  1 ] [ y_{t-1}   ]   [ 1 ]
           [ θ υ_t ] = [ 0  0 ] [ θ υ_{t-1} ] + [ θ ] υ_t

           y_t = [ 1  0 ] [ y_t ; θ υ_t ]

       State Space Representation II:

           s_t = υ_{t-1}
           y_t = θ s_t + υ_t

       Again, both representations are equivalent!
Example III
       Now we explore a different issue.

       Random walk plus drift process:

           y_t = y_{t-1} + β + σ_υ υ_t,   υ_t ∼ N(0, 1)

       This is even more interesting: we have a unit root and a constant
       parameter (the drift).

       State Space Representation:

           [ y_t ]   [ 1  1 ] [ y_{t-1} ]   [ σ_υ ]
           [ β   ] = [ 0  1 ] [ β       ] + [ 0   ] υ_t

           y_t = [ 1  0 ] [ y_t ; β ]

Some Conditions on the State Space Representation


       We only consider stable systems.


       A system is stable if, for any initial state s_0, the vector of states, s_t,
       converges to some unique s*.

       A necessary and sufficient condition for the system to be stable is
       that:
                                   |λ_i(F)| < 1
       for all i, where λ_i(F) stands for an eigenvalue of F.

Introducing the Kalman Filter


       Developed by Kalman and Bucy.


       Wide application in science.


       Basic idea.


       Prediction, smoothing, and control.


       Different derivations.

Some Definitions


Definition
Let s_{t|t-1} = E[ s_t | y^{t-1} ] be the best linear predictor of s_t given the history
of observables until t-1, i.e., y^{t-1}.

Definition
Let y_{t|t-1} = E[ y_t | y^{t-1} ] = H s_{t|t-1} be the best linear predictor of y_t given
the history of observables until t-1, i.e., y^{t-1}.

Definition
Let s_{t|t} = E[ s_t | y^t ] be the best linear predictor of s_t given the history of
observables until t, i.e., y^t.

What is the Kalman Filter Trying to Do?

       Let us assume we have s_{t|t-1} and y_{t|t-1}.

       We observe a new y_t.

       We need to obtain s_{t|t}.

       Note that s_{t+1|t} = F s_{t|t} and y_{t+1|t} = H s_{t+1|t}, so we can go back to
       the first step and wait for y_{t+1}.

       Therefore, the key question is how to obtain s_{t|t} from s_{t|t-1} and y_t.

A Minimization Approach to the Kalman Filter
       Assume we use the following equation to get s_{t|t} from y_t and s_{t|t-1}:

           s_{t|t} = s_{t|t-1} + K_t ( y_t − y_{t|t-1} ) = s_{t|t-1} + K_t ( y_t − H s_{t|t-1} )

       This formula will have some probabilistic justification (to follow).

       K_t is called the Kalman filter gain and it measures how much we
       update s_{t|t-1} as a function of our error in predicting y_t.

       The question is how to find the optimal K_t.

       The Kalman filter is about how to build K_t such that we optimally
       update s_{t|t} from s_{t|t-1} and y_t.

       How do we find the optimal K_t?

Some Additional Definitions

Definition
Let Σ_{t|t-1} ≡ E[ (s_t − s_{t|t-1})(s_t − s_{t|t-1})' | y^{t-1} ] be the predicting error
variance-covariance matrix of s_t given the history of observables until
t-1, i.e., y^{t-1}.

Definition
Let Ω_{t|t-1} ≡ E[ (y_t − y_{t|t-1})(y_t − y_{t|t-1})' | y^{t-1} ] be the predicting
error variance-covariance matrix of y_t given the history of observables until
t-1, i.e., y^{t-1}.

Definition
Let Σ_{t|t} ≡ E[ (s_t − s_{t|t})(s_t − s_{t|t})' | y^t ] be the predicting error variance-
covariance matrix of s_t given the history of observables until t, i.e., y^t.
The Kalman Filter Algorithm I


       Given Σ_{t|t-1}, y_t, and s_{t|t-1}, we can now set up the Kalman filter
       algorithm.

       Given Σ_{t|t-1}, we compute:

           Ω_{t|t-1} ≡ E[ (y_t − y_{t|t-1})(y_t − y_{t|t-1})' | y^{t-1} ]

                     = E[ ( H(s_t − s_{t|t-1}) + υ_t )( H(s_t − s_{t|t-1}) + υ_t )' | y^{t-1} ]

                     = E[ H(s_t − s_{t|t-1})(s_t − s_{t|t-1})'H' + υ_t (s_t − s_{t|t-1})'H'
                          + H(s_t − s_{t|t-1}) υ_t' + υ_t υ_t' | y^{t-1} ]

                     = H Σ_{t|t-1} H' + R

The Kalman Filter Algorithm II

       Given Σ_{t|t-1}, we compute:

           E[ (y_t − y_{t|t-1})(s_t − s_{t|t-1})' | y^{t-1} ]
               = E[ ( H(s_t − s_{t|t-1}) + υ_t )(s_t − s_{t|t-1})' | y^{t-1} ] = H Σ_{t|t-1}

       Given Σ_{t|t-1}, we compute:

           K_t = Σ_{t|t-1} H' ( H Σ_{t|t-1} H' + R )^{-1}

       Given Σ_{t|t-1}, s_{t|t-1}, K_t, and y_t, we compute:

           s_{t|t} = s_{t|t-1} + K_t ( y_t − H s_{t|t-1} )

Finding the Optimal Gain
       We want the K_t that minimizes Σ_{t|t}.

       Thus:

           K_t = Σ_{t|t-1} H' ( H Σ_{t|t-1} H' + R )^{-1}

       with the optimal update of s_{t|t} given y_t and s_{t|t-1} being:

           s_{t|t} = s_{t|t-1} + K_t ( y_t − H s_{t|t-1} )

       Intuition: note that we can rewrite K_t in the following way:

           K_t = Σ_{t|t-1} H' Ω_{t|t-1}^{-1}

          1   If we made a big mistake forecasting s_{t|t-1} using past information (Σ_{t|t-1}
              large), we give a lot of weight to the new information (K_t large).

          2   If the new information is noise (R large), we give a lot of weight to the
              old prediction (K_t small).

Example
       Assume the following model in state space form:

              Transition equation:

                                   s_t = μ + ω_t,   ω_t ∼ N(0, σ²_ω)

              Measurement equation:

                                   y_t = s_t + υ_t,   υ_t ∼ N(0, σ²_υ)

       Let σ²_υ = q σ²_ω.

       Then, if Σ_{1|0} = σ²_ω (s_1 is drawn from the ergodic distribution of s_t):

           K_1 = σ²_ω ( σ²_ω + σ²_υ )^{-1} = 1 / (1 + q)

       Therefore, the bigger σ²_υ relative to σ²_ω (the bigger q), the lower K_1
       and the less we trust y_1.

The Kalman Filter Algorithm III


       Given Σ_{t|t-1}, s_{t|t-1}, K_t, and y_t, we compute:

           Σ_{t|t} ≡ E[ (s_t − s_{t|t})(s_t − s_{t|t})' | y^t ]

                   = E[ (s_t − s_{t|t-1})(s_t − s_{t|t-1})'
                        − (s_t − s_{t|t-1})(y_t − H s_{t|t-1})' K_t'
                        − K_t (y_t − H s_{t|t-1})(s_t − s_{t|t-1})'
                        + K_t (y_t − H s_{t|t-1})(y_t − H s_{t|t-1})' K_t' | y^t ]

                   = Σ_{t|t-1} − K_t H Σ_{t|t-1}

       where

           s_t − s_{t|t} = ( s_t − s_{t|t-1} ) − K_t ( y_t − H s_{t|t-1} ).

The Kalman Filter Algorithm IV

       Given Σ_{t|t-1}, s_{t|t-1}, K_t, and y_t, we compute:

           Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'

       Given s_{t|t}, we compute:

          1   s_{t+1|t} = F s_{t|t}

          2   y_{t+1|t} = H s_{t+1|t}

       Therefore, from s_{t|t-1}, Σ_{t|t-1}, and y_t we compute s_{t|t} and Σ_{t|t}.

       We also compute y_{t|t-1} and Ω_{t|t-1} to help (later) calculate the
       likelihood function of y^T = {y_t}_{t=1}^T.

The Kalman Filter Algorithm: A Review
We start with s_{t|t-1} and Σ_{t|t-1}. Then, we observe y_t and:

       Ω_{t|t-1} = H Σ_{t|t-1} H' + R

       y_{t|t-1} = H s_{t|t-1}

       K_t = Σ_{t|t-1} H' ( H Σ_{t|t-1} H' + R )^{-1}

       Σ_{t|t} = Σ_{t|t-1} − K_t H Σ_{t|t-1}

       s_{t|t} = s_{t|t-1} + K_t ( y_t − H s_{t|t-1} )

       Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'

       s_{t+1|t} = F s_{t|t}

We finish with s_{t+1|t} and Σ_{t+1|t}.
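
The review above maps one-for-one into code. Below is a minimal NumPy sketch of a single Kalman filter iteration in the same order as the equations; the function name and interface are my own choices, not part of the slides.

```python
import numpy as np

def kalman_step(s_pred, Sigma_pred, y, F, G, H, Q, R):
    """One Kalman filter iteration: from (s_{t|t-1}, Sigma_{t|t-1}) and y_t
    to (s_{t+1|t}, Sigma_{t+1|t}), also returning the prediction error and
    its variance, which the likelihood evaluation needs."""
    Omega = H @ Sigma_pred @ H.T + R                  # Omega_{t|t-1}
    y_pred = H @ s_pred                               # y_{t|t-1}
    K = Sigma_pred @ H.T @ np.linalg.inv(Omega)       # Kalman gain K_t
    Sigma_filt = Sigma_pred - K @ H @ Sigma_pred      # Sigma_{t|t}
    s_filt = s_pred + K @ (y - y_pred)                # s_{t|t}
    Sigma_next = F @ Sigma_filt @ F.T + G @ Q @ G.T   # Sigma_{t+1|t}
    s_next = F @ s_filt                               # s_{t+1|t}
    return s_next, Sigma_next, y - y_pred, Omega
```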
Writing the Likelihood Function

The likelihood function of y^T = {y_t}_{t=1}^T is:

       log p( y^T | F, G, H, Q, R ) = ∑_{t=1}^T log p( y_t | y^{t-1}; F, G, H, Q, R )

           = ∑_{t=1}^T [ − (N/2) log 2π − (1/2) log |Ω_{t|t-1}| − (1/2) ς_t' Ω_{t|t-1}^{-1} ς_t ]

where:

       ς_t = y_t − y_{t|t-1} = y_t − H s_{t|t-1}

is white noise and:

       Ω_{t|t-1} = H Σ_{t|t-1} H' + R
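
A sketch of how the likelihood evaluation would sit on top of the kalman_step routine from the previous sketch (same assumed interface); s1_0 and Sigma1_0 are the initial conditions discussed on the next slide.

```python
import numpy as np

def kalman_log_likelihood(ys, s1_0, Sigma1_0, F, G, H, Q, R):
    """Evaluate log p(y^T | F, G, H, Q, R) with the Kalman filter recursions."""
    s_pred, Sigma_pred = s1_0, Sigma1_0
    N = H.shape[0]                       # dimension of the observables
    loglik = 0.0
    for y in ys:                         # ys is an iterable of observation vectors
        s_pred, Sigma_pred, resid, Omega = kalman_step(
            s_pred, Sigma_pred, y, F, G, H, Q, R)
        loglik += -0.5 * (N * np.log(2 * np.pi)
                          + np.log(np.linalg.det(Omega))
                          + resid @ np.linalg.solve(Omega, resid))
    return loglik
```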


Initial conditions for the Kalman Filter
       An important step in the Kalman filter is to set the initial conditions,
       s_{1|0} and Σ_{1|0}.

       Where do they come from?

Since we only consider stable systems, the standard approach is to set:

       s_{1|0} = s*

       Σ_{1|0} = Σ*

where s* and Σ* solve:

       s* = F s*

       Σ* = F Σ* F' + G Q G'

       How do we find Σ*?

           vec(Σ*) = [ I − F ⊗ F ]^{-1} vec( G Q G' )
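
A small sketch of that vec/Kronecker computation (the helper name is hypothetical):

```python
import numpy as np

def unconditional_moments(F, G, Q):
    """Solve s* = F s* and vec(Sigma*) = (I - F kron F)^{-1} vec(G Q G')
    for a stable system (all eigenvalues of F strictly inside the unit circle)."""
    n = F.shape[0]
    s_star = np.zeros(n)                 # the unique fixed point of s = F s
    GQG = G @ Q @ G.T
    vec_Sigma = np.linalg.solve(np.eye(n * n) - np.kron(F, F),
                                GQG.flatten(order="F"))
    Sigma_star = vec_Sigma.reshape((n, n), order="F")
    return s_star, Sigma_star
```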
Initial conditions for the Kalman Filter II


Under the following conditions:

   1   The system is stable, i.e. all eigenvalues of F are strictly less than one
       in absolute value.

   2   G Q G' and R are p.s.d. symmetric.

   3   Σ_{1|0} is p.s.d. symmetric.


Then Σ_{t+1|t} → Σ*.



Remarks


   1   There are more general theorems than the one just described.


   2   Those theorems are based on non-stable systems.


   3   Since we are going to work with stable systems, the former theorem is
       enough.


   4   The last theorem gives us a way to find Σ*, as Σ_{t+1|t} → Σ* for any Σ_{1|0} we
       start with.



The Kalman Filter and DSGE models


The Kalman Filter and DSGE models
       Basic real business cycle model:

           max E_0 ∑_{t=0}^∞ β^t { log c_t + ψ log(1 − l_t) }

           c_t + k_{t+1} = k_t^α (e^{z_t} l_t)^{1−α} + (1 − δ) k_t

           z_t = ρ z_{t-1} + σ ε_t,   ε_t ∼ N(0, 1)

       Equilibrium conditions:

           1/c_t = β E_t [ (1/c_{t+1}) ( α k_{t+1}^{α−1} (e^{z_{t+1}} l_{t+1})^{1−α} + 1 − δ ) ]

           ψ ( l_t / (1 − l_t) ) c_t = (1 − α) k_t^α (e^{z_t} l_t)^{1−α}

           c_t + k_{t+1} = k_t^α (e^{z_t} l_t)^{1−α} + (1 − δ) k_t

           z_t = ρ z_{t-1} + σ ε_t

The Kalman Filter and Linearized DSGE Models
       We loglinearize (or linearize) the equilibrium conditions around the
       steady state.

       Alternative: the particle filter.

       We assume that we have data on:

          1   log output_t
          2   log l_t
          3   log c_t

       subject to a linearly additive measurement error V_t = ( v_{1,t}  v_{2,t}  v_{3,t} )'.

       Why measurement error? Stochastic singularity.

       Degrees of freedom in the measurement equation.

Policy Functions

       We need to write the model in state space form.

       Remember that a loglinear solution has the form:

           k̂_{t+1} = p_1 k̂_t + p_2 z_t

       and

           ôutput_t = q_1 k̂_t + q_2 z_t
           l̂_t      = r_1 k̂_t + r_2 z_t
           ĉ_t      = u_1 k̂_t + u_2 z_t

Writing the Likelihood Function

       Transition equation:

           [ 1    ]   [ 1  0    0   ] [ 1        ]   [ 0 ]
           [ k̂_t ] = [ 0  p_1  p_2 ] [ k̂_{t-1} ] + [ 0 ] ε_t
           [ z_t  ]   [ 0  0    ρ   ] [ z_{t-1}  ]   [ σ ]

           i.e., s_t = F s_{t-1} + G ω_t with s_t = (1, k̂_t, z_t)' and ω_t = ε_t.

       Measurement equation:

           [ log output_t ]   [ log y  q_1  q_2 ] [ 1    ]   [ v_{1,t} ]
           [ log l_t      ] = [ log l  r_1  r_2 ] [ k̂_t ] + [ v_{2,t} ]
           [ log c_t      ]   [ log c  u_1  u_2 ] [ z_t  ]   [ v_{3,t} ]

           i.e., y_t = H s_t + υ_t.
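
For concreteness, a sketch of how these matrices could be assembled and handed to the Kalman filter routines sketched earlier; all numerical coefficients below are placeholders, not output from an actual solved model.

```python
import numpy as np

# Placeholder loglinear-solution coefficients and steady-state levels
# (in practice these come from solving the DSGE model at a given gamma).
p1, p2 = 0.95, 0.1
q1, q2, r1, r2, u1, u2 = 0.3, 1.2, -0.1, 0.8, 0.5, 0.4
log_y, log_l, log_c = 0.0, -1.0, -0.5
rho, sigma = 0.9, 0.01
sigma_v = np.array([0.01, 0.01, 0.01])   # measurement error std. deviations

F = np.array([[1.0, 0.0, 0.0],
              [0.0, p1,  p2 ],
              [0.0, 0.0, rho]])
G = np.array([[0.0], [0.0], [sigma]])
H = np.array([[log_y, q1, q2],
              [log_l, r1, r2],
              [log_c, u1, u2]])
Q = np.eye(1)                            # variance of epsilon_t
R = np.diag(sigma_v ** 2)                # variance of the measurement errors

# With data ys and initial conditions s1_0, Sigma1_0:
# loglik = kalman_log_likelihood(ys, s1_0, Sigma1_0, F, G, H, Q, R)
```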



The Solution to the Model in State Space Form

       Now, with y^T, F, G, H, Q, and R as defined before...

       ...we can use the Riccati equations to evaluate the likelihood function:

           log p( y^T | γ ) = log p( y^T | F, G, H, Q, R )

       where γ = {α, β, ρ, ψ, δ, σ}.

       Cross-equation restrictions implied by the equilibrium solution.

       With the likelihood, we can do inference!

Nonlinear Filtering


Nonlinear Filtering

       Different approaches.

       Deterministic filtering:

          1   Kalman family.

          2   Grid-based filtering.


       Simulation filtering:

          1   MCMC.

          2   Sequential Monte Carlo.

Kalman Family of Filters

       Use ideas of Kalman filtering in NLGF problems.

       Non-optimal filters.

       Different implementations:

          1   Extended Kalman filter.

          2   Iterated extended Kalman filter.

          3   Second-order extended Kalman filter.

          4   Unscented Kalman filter.
The Extended Kalman Filter




       The EKF is historically the first descendant of the Kalman filter.

       The EKF deals with nonlinearities by taking a first-order approximation to
       the system and applying the Kalman filter to this approximation.

       Non-Gaussianities are ignored.

Algorithm
       Given s_{t-1|t-1}, set s_{t|t-1} = f( s_{t-1|t-1}, 0; γ ).

       Then:

           P_{t|t-1} = Q_{t-1} + F_t P_{t-1|t-1} F_t'

       where

           F_t = df( S_{t-1}, W_t; γ ) / dS_{t-1}   evaluated at S_{t-1} = s_{t-1|t-1}, W_t = 0

       The Kalman gain, K_t, is:

           K_t = P_{t|t-1} G_t' ( G_t P_{t|t-1} G_t' + R_t )^{-1}

       where

           G_t = dg( S_t, V_t; γ ) / dS_t   evaluated at S_t = s_{t|t-1}, V_t = 0

       Then

           s_{t|t} = s_{t|t-1} + K_t ( y_t − g( s_{t|t-1}, 0; γ ) )
           P_{t|t} = P_{t|t-1} − K_t G_t P_{t|t-1}
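
A compact sketch of one EKF iteration, using finite-difference Jacobians so that it works for any differentiable f and g with the shocks evaluated at zero; the helper names and the numerical-derivative shortcut are my own choices, not from the slides.

```python
import numpy as np

def jacobian(fun, x, eps=1e-6):
    """Numerical Jacobian of fun at x (forward differences)."""
    fx = np.atleast_1d(fun(x))
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.atleast_1d(fun(x + dx)) - fx) / eps
    return J

def ekf_step(s_filt, P_filt, y, f, g, Q, R):
    """One extended Kalman filter iteration (prediction + update).
    f and g take the state only, with W_t = V_t = 0 already imposed."""
    s_pred = np.atleast_1d(f(s_filt))               # s_{t|t-1} = f(s_{t-1|t-1}, 0)
    F_t = jacobian(f, s_filt)                       # df/dS at s_{t-1|t-1}, W_t = 0
    P_pred = Q + F_t @ P_filt @ F_t.T               # P_{t|t-1}
    G_t = jacobian(g, s_pred)                       # dg/dS at s_{t|t-1}, V_t = 0
    K = P_pred @ G_t.T @ np.linalg.inv(G_t @ P_pred @ G_t.T + R)
    s_new = s_pred + K @ (np.atleast_1d(y) - np.atleast_1d(g(s_pred)))
    P_new = P_pred - K @ G_t @ P_pred
    return s_new, P_new
```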
Problems of EKF

   1   It ignores the non-Gaussianities of Wt and Vt .


   2   It ignores the non-Gaussianities of the distribution of the states.


   3   Approximation error incurred by the linearization.


   4   Biased estimate of the mean and variance.


   5   We need to compute Jacobians and Hessians.


As the sample size grows, those errors accumulate and the filter diverges.
Iterated Extended Kalman Filter I
       Compute s_{t|t-1} and P_{t|t-1} as in the EKF.

       Iterate N times on:

           K_t^i = P_{t|t-1} G_t^{i'} ( G_t^i P_{t|t-1} G_t^{i'} + R_t )^{-1}

       where

           G_t^i = dg( S_t, V_t; γ ) / dS_t   evaluated at the i-th iterate of the update, V_t = 0

       and

           s^i_{t|t} = s_{t|t-1} + K_t^i ( y_t − g( s_{t|t-1}, 0; γ ) )

       Why are we iterating? How many times?

       Then:

           s_{t|t} = s_{t|t-1} + K_t^N ( y_t − g( s_{t|t-1}, 0; γ ) )
           P_{t|t} = P_{t|t-1} − K_t^N G_t^N P_{t|t-1}

Second-order Extended Kalman Filter


       We keep the second-order terms of the Taylor expansion of the transition
       and measurement equations.


       Theoretically, less biased than EKF.


       Messy algebra.


       In practice, not much improvement.




Unscented Kalman Filter I
       Recent proposal by Julier and Uhlmann (1996).
       Based around the unscented transform.
       A set of sigma points is selected to preserve some properties of the
       conditional distribution (for example, the first two moments).

       Then, those points are transformed and the properties of the new
       conditional distribution are computed.

       The UKF computes the conditional mean and variance accurately up
       to a third-order approximation if the shocks W_t and V_t are Gaussian
       and up to a second order if they are not.

       The sigma points are chosen deterministically and not by simulation
       as in a Monte Carlo method.

       The UKF has the advantage over the EKF that no Jacobians or Hessians
       are required, objects that may be difficult to compute.

New State Variable
       We modify the state space by creating a new augmented state
       variable:

           S̃_t = [ S_t, W_t, V_t ]

       that includes the pure state space and the two random variables W_t
       and V_t.

       We initialize the filter with

           s̃_{0|0} = E( S̃_0 ) = E( S_0, 0, 0 )

           P̃_{0|0} = diag( P_{0|0}, Q_0, R_0 )

Sigma Points



       Let L be the dimension of the augmented state variable S̃_t.

       For t = 1, we calculate the 2L + 1 sigma points:

           S_{0,t-1|t-1} = s̃_{t-1|t-1}

           S_{i,t-1|t-1} = s̃_{t-1|t-1} − ( (L + λ) P̃_{t-1|t-1} )^{0.5}_i       for i = 1, ..., L

           S_{i,t-1|t-1} = s̃_{t-1|t-1} + ( (L + λ) P̃_{t-1|t-1} )^{0.5}_{i-L}   for i = L + 1, ..., 2L

       (the subscript on the matrix square root picks the corresponding column).

Parameters


       λ = α²(L + κ) − L is a scaling parameter.

       α determines the spread of the sigma points and it must belong to the
       unit interval.

       κ is a secondary parameter usually set equal to zero.

       Notation for each of the elements of S̃:

           S̃_i = [ S_i^s, S_i^w, S_i^v ]   for i = 0, ..., 2L

Weights

       Weights for each point:

           W_0^m = λ / (L + λ)

           W_0^c = λ / (L + λ) + 1 − α² + β

           W_i^m = W_i^c = 1 / ( 2(L + λ) )   for i = 1, ..., 2L

       β incorporates knowledge regarding the conditional distributions.

       For Gaussian distributions, β = 2 is optimal.
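
A sketch of the sigma-point and weight construction just described, for a given mean and covariance (function and argument names are mine; the Cholesky factor is one valid choice of matrix square root).

```python
import numpy as np

def sigma_points_and_weights(mean, cov, alpha=1e-3, beta=2.0, kappa=0.0):
    """Build the 2L+1 sigma points and their weights for the unscented transform."""
    L = mean.size
    lam = alpha ** 2 * (L + kappa) - L
    sqrt_cov = np.linalg.cholesky((L + lam) * cov)   # matrix square root of (L+lambda)P

    points = np.empty((2 * L + 1, L))
    points[0] = mean
    for i in range(L):
        points[1 + i] = mean - sqrt_cov[:, i]        # minus branch, i = 1, ..., L
        points[1 + L + i] = mean + sqrt_cov[:, i]    # plus branch, i = L+1, ..., 2L

    w_mean = np.full(2 * L + 1, 1.0 / (2 * (L + lam)))
    w_cov = w_mean.copy()
    w_mean[0] = lam / (L + lam)
    w_cov[0] = lam / (L + lam) + 1 - alpha ** 2 + beta
    return points, w_mean, w_cov
```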


Algorithm I: Prediction of States

       We compute the transition of the pure states:

           S^s_{i,t|t-1} = f( S^s_{i,t-1|t-1}, S^w_{i,t-1|t-1}; γ )

       Weighted state:

           s_{t|t-1} = ∑_{i=0}^{2L} W_i^m S^s_{i,t|t-1}

       Weighted variance:

           P_{t|t-1} = ∑_{i=0}^{2L} W_i^c ( S^s_{i,t|t-1} − s_{t|t-1} )( S^s_{i,t|t-1} − s_{t|t-1} )'

Algorithm II: Prediction of Observables



       Predicted sigma observables:

           Y_{i,t|t-1} = g( S^s_{i,t|t-1}, S^v_{i,t|t-1}; γ )

       Predicted observable:

           y_{t|t-1} = ∑_{i=0}^{2L} W_i^m Y_{i,t|t-1}

Algorithm III: Update


       Variance-covariance matrices:

           P_{yy,t} = ∑_{i=0}^{2L} W_i^c ( Y_{i,t|t-1} − y_{t|t-1} )( Y_{i,t|t-1} − y_{t|t-1} )'

           P_{xy,t} = ∑_{i=0}^{2L} W_i^c ( S^s_{i,t|t-1} − s_{t|t-1} )( Y_{i,t|t-1} − y_{t|t-1} )'

       Kalman gain:

           K_t = P_{xy,t} P_{yy,t}^{-1}

Algorithm IV: Update

       Update of the state:

           s_{t|t} = s_{t|t-1} + K_t ( y_t − y_{t|t-1} )

       Update of the variance:

           P_{t|t} = P_{t|t-1} − K_t P_{yy,t} K_t'

       Finally, the augmented covariance is rebuilt as:

           P̃_{t|t} = diag( P_{t|t}, Q_t, R_t )

Grid-Based Filtering

       Remember that we have the recursion

                               ∫ p( s_t | s_{t-1}; γ ) p( s_{t-1} | y^{t-1}; γ ) ds_{t-1} · p( y_t | s_t; γ )
           p( s_t | y^t; γ ) = ─────────────────────────────────────────────────────────────────────────────────
                               ∫ [ ∫ p( s_t | s_{t-1}; γ ) p( s_{t-1} | y^{t-1}; γ ) ds_{t-1} ] p( y_t | s_t; γ ) ds_t

       This recursion requires the evaluation of three integrals.

       This suggests the possibility of addressing the problem by computing
       those integrals with a deterministic procedure such as a grid method.

       Kitagawa (1987) and Kramer and Sorenson (1988).

Grid-Based Filtering I



       We divide the state space into N cells, with center points s_t^i,
       { s_t^i : i = 1, ..., N }.

       We substitute the exact conditional densities by discrete densities that
       put all the mass at the points { s_t^i }_{i=1}^N.

       We denote by δ(x) a Dirac delta function with mass at 0.

Grid-Based Filtering II

       Then, approximated distributions and weights:

           p( s_t | y^{t-1}; γ ) ≈ ∑_{i=1}^N ω^i_{t|t-1} δ( s_t − s_t^i )

           p( s_t | y^t; γ )     ≈ ∑_{i=1}^N ω^i_{t|t}   δ( s_t − s_t^i )

           ω^i_{t|t-1} = ∑_{j=1}^N ω^j_{t-1|t-1} p( s_t^i | s_{t-1}^j; γ )

           ω^i_{t|t} = ω^i_{t|t-1} p( y_t | s_t^i; γ ) / ∑_{j=1}^N ω^j_{t|t-1} p( y_t | s_t^j; γ )
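
To make the discrete approximation concrete, here is a small sketch of one prediction-update step on a fixed one-dimensional grid; trans_density and obs_density stand in for p(s_t | s_{t-1}; γ) and p(y_t | s_t; γ) and are assumptions of the example.

```python
import numpy as np

def grid_filter_step(grid, weights_prev, y, trans_density, obs_density):
    """One grid-filter step. weights_prev are omega^j_{t-1|t-1} on the grid points.
    trans_density(s_new, s_old) ~ p(s_t | s_{t-1}; gamma),
    obs_density(y, s) ~ p(y_t | s_t; gamma)."""
    N = grid.size
    # Prediction: omega^i_{t|t-1} = sum_j omega^j_{t-1|t-1} p(s^i | s^j)
    w_pred = np.array([
        sum(weights_prev[j] * trans_density(grid[i], grid[j]) for j in range(N))
        for i in range(N)
    ])
    # Update: reweight by the likelihood of y_t and normalize
    w_filt = w_pred * np.array([obs_density(y, grid[i]) for i in range(N)])
    return w_filt / w_filt.sum()
```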



Approximated Recursion


           p( s_t | y^t; γ ) =

            N    [ ∑_{j=1}^N ω^j_{t-1|t-1} p( s_t^i | s_{t-1}^j; γ ) ] p( y_t | s_t^i; γ )
            ∑   ─────────────────────────────────────────────────────────────────────────────  δ( s_t − s_t^i )
           i=1   ∑_{j=1}^N [ ∑_{k=1}^N ω^k_{t-1|t-1} p( s_t^j | s_{t-1}^k; γ ) ] p( y_t | s_t^j; γ )

Compare with

                               ∫ p( s_t | s_{t-1}; γ ) p( s_{t-1} | y^{t-1}; γ ) ds_{t-1} · p( y_t | s_t; γ )
           p( s_t | y^t; γ ) = ─────────────────────────────────────────────────────────────────────────────────
                               ∫ [ ∫ p( s_t | s_{t-1}; γ ) p( s_{t-1} | y^{t-1}; γ ) ds_{t-1} ] p( y_t | s_t; γ ) ds_t

given that

           p( s_{t-1} | y^{t-1}; γ ) ≈ ∑_{i=1}^N ω^i_{t-1|t-1} δ( s_{t-1} − s_{t-1}^i )

Problems

       Grid filters require a constant readjustment to small changes in the
       model or its parameter values.

       Too computationally expensive to be of any practical benefit beyond
       very low dimensions.

       Grid points are fixed ex ante and the results are very dependent on
       that choice.

Can we overcome those difficulties and preserve the idea of integration?
Yes, through Monte Carlo integration.

Particle Filtering

       Remember,

          1   Transition equation:

                                   S_t = f( S_{t-1}, W_t; γ )

          2   Measurement equation:

                                   Y_t = g( S_t, V_t; γ )

       Some assumptions:

          1   We can partition {W_t} into two independent sequences {W_{1,t}} and
              {W_{2,t}}, s.t. W_t = (W_{1,t}, W_{2,t}) and
              dim(W_{2,t}) + dim(V_t) ≥ dim(Y_t).

          2   We can always evaluate the conditional densities
              p( y_t | W_1^t, y^{t-1}, S_0; γ ).

          3   The model assigns positive probability to the data.

Rewriting the Likelihood Function

       Evaluate the likelihood function of a sequence of realizations of
       the observable y^T at a particular parameter value γ:

           p( y^T; γ )

       We factorize it as:

           p( y^T; γ ) = ∏_{t=1}^T p( y_t | y^{t-1}; γ )

                       = ∏_{t=1}^T ∫∫ p( y_t | W_1^t, y^{t-1}, S_0; γ ) p( W_1^t, S_0 | y^{t-1}; γ ) dW_1^t dS_0

A Law of Large Numbers

If $\left\{ \left\{ s_0^{t|t-1,i}, w_1^{t|t-1,i} \right\}_{i=1}^{N} \right\}_{t=1}^{T}$ are N i.i.d. draws from
$\left\{ p\left(W_1^{t}, S_0 \mid y^{t-1}; \gamma\right) \right\}_{t=1}^{T}$, then:

$$p\left(y^T; \gamma\right) \simeq \prod_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N} p\left(y_t \mid w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; \gamma\right)$$

The problem of evaluating the likelihood is equivalent to the problem of
drawing from

$$\left\{ p\left(W_1^{t}, S_0 \mid y^{t-1}; \gamma\right) \right\}_{t=1}^{T}$$



Jesús Fernández-Villaverde (PENN)                    Filtering and Likelihood                               July 10, 2011   73 / 79
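In practice the product above is accumulated in logs to avoid numerical underflow. A minimal sketch, assuming a hypothetical array `p_y` of shape (T, N) that already stores $p(y_t \mid w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; \gamma)$ for each period and particle:

```python
import numpy as np

def loglik_from_particle_densities(p_y):
    """p_y[t, i] = p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; gamma).
    Returns log p(y^T; gamma) ~ sum_t log( (1/N) sum_i p_y[t, i] )."""
    return float(np.sum(np.log(np.mean(p_y, axis=1))))
```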
Nonlinear Filtering


Introducing Particles
       $\left\{ s_0^{t-1,i}, w_1^{t-1,i} \right\}_{i=1}^{N}$: N i.i.d. draws from $p\left(W_1^{t-1}, S_0 \mid y^{t-1}; \gamma\right)$.

       Each $\left( s_0^{t-1,i}, w_1^{t-1,i} \right)$ is a particle and $\left\{ s_0^{t-1,i}, w_1^{t-1,i} \right\}_{i=1}^{N}$ a swarm of
       particles.

       $\left\{ s_0^{t|t-1,i}, w_1^{t|t-1,i} \right\}_{i=1}^{N}$: N i.i.d. draws from $p\left(W_1^{t}, S_0 \mid y^{t-1}; \gamma\right)$.

       Each $\left( s_0^{t|t-1,i}, w_1^{t|t-1,i} \right)$ is a proposed particle and
       $\left\{ s_0^{t|t-1,i}, w_1^{t|t-1,i} \right\}_{i=1}^{N}$ a swarm of proposed particles.

       Weights:
       $$q_t^i = \frac{p\left(y_t \mid w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; \gamma\right)}{\sum_{i=1}^{N} p\left(y_t \mid w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; \gamma\right)}$$
Jesús Fernández-Villaverde (PENN)                   Filtering and Likelihood                                July 10, 2011   74 / 79
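Computationally, the weights are just the conditional densities of $y_t$ under each proposed particle, renormalized to sum to one. A minimal sketch; working in logs with the log-sum-exp trick is a standard numerical refinement, not something stated on the slide:

```python
import numpy as np

def normalized_weights(log_p_y):
    """log_p_y[i] = log p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; gamma).
    Returns q_t^i = p_i / sum_j p_j, computed stably via log-sum-exp."""
    m = np.max(log_p_y)
    w = np.exp(log_p_y - m)
    return w / w.sum()
```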
Nonlinear Filtering


A Proposition
Let $\left\{ \tilde{s}_0^{i}, \tilde{w}_1^{i} \right\}_{i=1}^{N}$ be a draw with replacement from $\left\{ s_0^{t|t-1,i}, w_1^{t|t-1,i} \right\}_{i=1}^{N}$
and probabilities $q_t^i$. Then $\left\{ \tilde{s}_0^{i}, \tilde{w}_1^{i} \right\}_{i=1}^{N}$ is a draw from $p\left(W_1^{t}, S_0 \mid y^{t}; \gamma\right)$.

Importance of the Proposition:

   1. It shows how a draw $\left\{ s_0^{t|t-1,i}, w_1^{t|t-1,i} \right\}_{i=1}^{N}$ from $p\left(W_1^{t}, S_0 \mid y^{t-1}; \gamma\right)$
      can be used to draw $\left\{ s_0^{t,i}, w_1^{t,i} \right\}_{i=1}^{N}$ from $p\left(W_1^{t}, S_0 \mid y^{t}; \gamma\right)$.

   2. With a draw $\left\{ s_0^{t,i}, w_1^{t,i} \right\}_{i=1}^{N}$ from $p\left(W_1^{t}, S_0 \mid y^{t}; \gamma\right)$ we can use
      $p\left(W_{1,t+1}; \gamma\right)$ to get a draw $\left\{ s_0^{t+1|t,i}, w_1^{t+1|t,i} \right\}_{i=1}^{N}$ and iterate the
      procedure.
Jesús Fernández-Villaverde (PENN)             Filtering and Likelihood                  July 10, 2011      75 / 79
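The proposition licenses the resampling step: draw N indices with replacement according to the weights $q_t^i$ and keep the corresponding particles. A minimal sketch of multinomial resampling (systematic or stratified resampling are common lower-variance alternatives, not discussed on the slide):

```python
import numpy as np

def resample(particles, q, rng):
    """particles: array whose first axis indexes the N proposed particles
    (w_1^{t|t-1,i}, s_0^{t|t-1,i}); q: length-N weights summing to one.
    Returns N particles drawn with replacement, i.e. (by the proposition)
    a draw from p(W_1^t, S_0 | y^t; gamma)."""
    N = len(q)
    idx = rng.choice(N, size=N, replace=True, p=q)
    return particles[idx]
```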
Nonlinear Filtering


Sequential Monte Carlo
Step 0, Initialization: Set $t \leftarrow 1$ and set
$p\left(W_1^{t-1}, S_0 \mid y^{t-1}; \gamma\right) = p\left(S_0; \gamma\right)$.

Step 1, Prediction: Sample N values $\left\{ s_0^{t|t-1,i}, w_1^{t|t-1,i} \right\}_{i=1}^{N}$ from
the density $p\left(W_1^{t}, S_0 \mid y^{t-1}; \gamma\right) = p\left(W_{1,t}; \gamma\right) p\left(W_1^{t-1}, S_0 \mid y^{t-1}; \gamma\right)$.

Step 2, Weighting: Assign to each draw $\left( s_0^{t|t-1,i}, w_1^{t|t-1,i} \right)$ the
weight $q_t^i$.

Step 3, Sampling: Draw $\left\{ s_0^{t,i}, w_1^{t,i} \right\}_{i=1}^{N}$ with replacement from
$\left\{ s_0^{t|t-1,i}, w_1^{t|t-1,i} \right\}_{i=1}^{N}$ with probabilities $\left\{ q_t^i \right\}_{i=1}^{N}$. If $t < T$ set
$t \leftarrow t + 1$ and go to step 1. Otherwise go to step 4.

Step 4, Likelihood: Use $\left\{ \left\{ s_0^{t|t-1,i}, w_1^{t|t-1,i} \right\}_{i=1}^{N} \right\}_{t=1}^{T}$ to compute:

$$p\left(y^T; \gamma\right) \simeq \prod_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N} p\left(y_t \mid w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; \gamma\right)$$
Jesús Fernández-Villaverde (PENN)               Filtering and Likelihood                                July 10, 2011       76 / 79
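Steps 0-4 fit in one short function. The sketch below is a hedged illustration, not the slides' code: for a Markovian model it is equivalent (and simpler) to carry the current state $s_t^i$ rather than the full histories $(w_1^{t,i}, s_0^{t,i})$, and `sim_transition`, `meas_logpdf`, and `draw_s0` are hypothetical user-supplied functions for simulating the transition equation, evaluating $\log p(y_t \mid s_t; \gamma)$, and drawing from $p(S_0; \gamma)$.

```python
import numpy as np

def particle_filter_loglik(y, N, sim_transition, meas_logpdf, draw_s0, rng):
    """Sequential Monte Carlo estimate of log p(y^T; gamma)."""
    T = len(y)
    s = draw_s0(N, rng)                                       # Step 0: initialization
    loglik = 0.0
    for t in range(T):
        s = np.array([sim_transition(si, rng) for si in s])   # Step 1: prediction (draws W_{1,t})
        logw = np.array([meas_logpdf(y[t], si) for si in s])  # Step 2: weighting
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())                        # Step 4 term: log of (1/N) sum_i p(y_t | s_t^i)
        q = w / w.sum()                                       # normalized weights q_t^i
        idx = rng.choice(N, size=N, replace=True, p=q)        # Step 3: resampling with replacement
        s = s[idx]
    return loglik
```

A call would look like `particle_filter_loglik(y, 10_000, sim, logpdf, draw0, np.random.default_rng(0))`, with the three model-specific functions supplied by the user.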
Nonlinear Filtering


A “Trivial” Application



How do we evaluate the likelihood function $p\left(y^T \mid \alpha, \beta, \sigma\right)$ of the nonlinear,
non-Gaussian process:

$$s_t = \alpha + \beta \frac{s_{t-1}}{1 + s_{t-1}} + w_t$$
$$y_t = s_t + v_t$$

where $w_t \sim N(0, \sigma)$ and $v_t \sim t(2)$, given some observables
$y^T = \left\{ y_t \right\}_{t=1}^{T}$ and $s_0$?




Jesús Fernández-Villaverde (PENN)                Filtering and Likelihood              July 10, 2011   77 / 79
Nonlinear Filtering




   1. Let $s_0^{0,i} = s_0$ for all $i$.

   2. Generate N i.i.d. draws $\left\{ s_0^{1|0,i}, w^{1|0,i} \right\}_{i=1}^{N}$ from $N(0, \sigma)$.

   3. Evaluate
      $$p\left(y_1 \mid w_1^{1|0,i}, y^0, s_0^{1|0,i}\right) = p_{t(2)}\left( y_1 - \left( \alpha + \beta \frac{s_0^{1|0,i}}{1 + s_0^{1|0,i}} + w^{1|0,i} \right) \right).$$

   4. Evaluate the relative weights
      $$q_1^i = \frac{p_{t(2)}\left( y_1 - \left( \alpha + \beta \frac{s_0^{1|0,i}}{1 + s_0^{1|0,i}} + w^{1|0,i} \right) \right)}{\sum_{i=1}^{N} p_{t(2)}\left( y_1 - \left( \alpha + \beta \frac{s_0^{1|0,i}}{1 + s_0^{1|0,i}} + w^{1|0,i} \right) \right)}.$$

   5. Resample with replacement N values of $\left\{ s_0^{1|0,i}, w^{1|0,i} \right\}_{i=1}^{N}$ with
      relative weights $q_1^i$. Call those sampled values $\left\{ s_0^{1,i}, w^{1,i} \right\}_{i=1}^{N}$.

   6. Go to step 2 and iterate steps 2-5 until the end of the sample.


Jesús Fernández-Villaverde (PENN)                    Filtering and Likelihood                                      July 10, 2011       78 / 79
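For the model above, the generic steps specialize immediately. A minimal sketch, assuming that $\sigma$ in $w_t \sim N(0, \sigma)$ denotes a standard deviation and that $s_0$ is known, as on the slide; `alpha`, `beta`, `sigma`, `s0` are the values at which the likelihood is evaluated:

```python
import numpy as np
from scipy import stats

def loglik_trivial_model(y, alpha, beta, sigma, s0, N=10_000, seed=0):
    """Particle-filter log-likelihood for
        s_t = alpha + beta * s_{t-1} / (1 + s_{t-1}) + w_t,   w_t ~ N(0, sigma)
        y_t = s_t + v_t,                                       v_t ~ t(2)
    with s_0 known (sigma treated as a standard deviation, an assumption)."""
    rng = np.random.default_rng(seed)
    s = np.full(N, float(s0))                               # slide step 1: s_0^{0,i} = s_0 for all i
    loglik = 0.0
    for y_t in y:
        w = rng.normal(0.0, sigma, size=N)                  # slide step 2: draw the shocks
        s = alpha + beta * s / (1.0 + s) + w                # proposed states s^{t|t-1,i}
        p_y = stats.t.pdf(y_t - s, df=2)                    # slide step 3: f_{t(2)}(y_t - s^{t|t-1,i})
        loglik += np.log(p_y.mean())                        # contribution to log p(y^T | alpha, beta, sigma)
        q = p_y / p_y.sum()                                 # slide step 4: relative weights
        s = s[rng.choice(N, size=N, replace=True, p=q)]     # slide step 5: resample with replacement
    return loglik
```

Maximizing this function over (alpha, beta, sigma), or embedding it in a posterior sampler, then delivers likelihood-based inference.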
Nonlinear Filtering


A Law of Large Numbers

A law of large numbers delivers:

$$p\left(y_1 \mid y^0, \alpha, \beta, \sigma\right) \simeq \frac{1}{N} \sum_{i=1}^{N} p\left(y_1 \mid w_1^{1|0,i}, y^0, s_0^{1|0,i}\right)$$

and consequently:

$$p\left(y^T \mid \alpha, \beta, \sigma\right) \simeq \prod_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N} p\left(y_t \mid w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}\right)$$




Jesús Fernández-Villaverde (PENN)            Filtering and Likelihood                                  July 10, 2011   79 / 79
