On Gibbs Sampling for State Space Models


Author(s): C. K. Carter and R. Kohn
Source: Biometrika, Vol. 81, No. 3 (Aug., 1994), pp. 541-553
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2337125

Biometrika (1994), 81, 3, pp. 541-53
Printed in Great Britain

On Gibbs sampling for state space models

BY C. K. CARTER AND R. KOHN
Australian Graduate School of Management, University of New South Wales, PO Box 1,
Kensington, N.S.W., Australia, 2033

SUMMARY
We show how to use the Gibbs sampler to carry out Bayesian inference on a linear state space model with errors that are a mixture of normals and coefficients that can switch over time. Our approach simultaneously generates the whole of the state vector given the mixture and coefficient indicator variables, and simultaneously generates all the indicator variables conditional on the state vectors. The states are generated efficiently using the Kalman filter. We illustrate our approach by several examples and empirically compare its performance to another Gibbs sampler where the states are generated one at a time. The empirical results suggest that our approach is both practical to implement and dominates the Gibbs sampler that generates the states one at a time.

Some key words: Diffuse parameter; Kalman filter; Markov chain Monte Carlo; Mixture of normals; Spline smoothing; Switching regression; Trend plus seasonal model.

1. INTRODUCTION
Consider the linear state space model

y(t) = h(t)'x(t) + e(t),  (1.1)
x(t+1) = F(t+1)x(t) + u(t+1),  (1.2)

where y(t) is a scalar observation and x(t) is an m x 1 state vector. We assume that the error sequences {e(t), t ≥ 1} and {u(t), t ≥ 1} are mixtures of normals. Let θ be a parameter vector whose value determines h(t) and F(t) and also the distributions of e(t) and u(t). Further details of the structure of the model are given in § 2.1. Equation (1.1) is called the observation equation and (1.2) the state transition equation. When e(t) and u(t) are independent Gaussian sequences, unknown parameters are usually estimated by maximum likelihood following Schweppe (1965). The Kalman filter and state space smoothing algorithms are used to carry out the computations.
There are a number of applications in the literature where it is necessary to go beyond the Gaussian linear state space model: e.g. Harrison & Stevens (1976), Gordon & Smith (1990), Hamilton (1989) and Shumway & Stoffer (1991). Meinhold & Singpurwalla (1989) robustify the Kalman filter by taking both e(t) and u(t) to be t distributed. A general approach to estimating non-Gaussian and nonlinear state space models is given by Kitagawa (1987). Except when the dimension of the state vector is very small, Kitagawa's approach appears computationally intractable at this stage. Various approximate filtering and smoothing algorithms for nonlinear and non-Gaussian state space models have been given in the literature. See, for example, Anderson & Moore (1979, Ch. 8) and West & Harrison (1989).

Using the Gibbs sampler, Carlin, Polson & Stoffer (1992) provide a general approach to Bayesian statistical inference in state space models allowing the errors e(t) and u(t) to be non-Gaussian and the dependence on x(t) in (1.1) and (1.2) to be nonlinear. They generate the states one at a time, utilizing the Markov properties of the state space model to condition on neighbouring states. In this paper we take a different Gibbs sampling approach, generating all the states at once by taking advantage of the time ordering of the state space model. We show how to carry out all the necessary computations using standard Gaussian filtering and smoothing algorithms. Although our approach is less general than that of Carlin et al. (1992), for the class of models considered in this paper our approach will be more efficient than theirs, in the sense that convergence to the posterior distribution will be faster and estimates of the posterior moments will have smaller variances. To quantify the difference between our approach and that of Carlin et al. (1992), we study empirically the performance of both algorithms for two simple and commonly used trend and seasonal models. For both examples generating the states simultaneously produces Gibbs iterates which converge rapidly to the posterior distribution from arbitrary starting points. In contrast, when the states are generated one at a time there is slow convergence to the posterior distribution for one of the examples, and the estimates of the posterior means are far less efficient than the corresponding estimates obtained when generating the states simultaneously. In the second example there is no convergence to the posterior distribution when the states are generated one at a time because the resulting Markov chain is reducible. Our approach is supported theoretically by the results of Liu, Wong & Kong (1994), who show that, when measured in a suitable norm, generating variables simultaneously produces faster convergence than generating them one at a time.
Section 2 discusses Gibbs sampling and how to generate the states and the indicator variables. Section 3 illustrates the general theory with four examples and empirically compares the performance of our algorithm with that of generating the states one at a time. Appendix 1 shows how to generate the state vector using a state space filtering algorithm and Appendix 2 shows how to generate the indicator variables.

2. THE GIBBS SAMPLER

2.1. General
Let Y^n = {y(1), ..., y(n)}' be the vector of observations and X = {x(1)', ..., x(n)'}' the total state vector. Let K(t) be a vector of indicator variables showing which members of the mixture each of e(t) and u(t) belong to and which values h(t) and F(t) take, and let K = {K(1), ..., K(n)}'. We write the parameter vector θ = {θ_1, ..., θ_p}. We assume that, conditional on K and θ, e(t) and u(t) are independent Gaussian sequences which are also independent of each other. To illustrate our notation we consider the following simple example. Let

y(t) = x(t) + e(t),   x(t) = x(t-1) + u(t),

with x(t) univariate. The errors e(t) are a mixture of two normals with e(t) ~ N(0, σ²) with probability p_1 and e(t) ~ N(0, Cσ²) with probability 1 - p_1, where C > 1 and p_1 are assumed known. The disturbance u(t) ~ N(0, τ²). Then θ = (σ², τ²) is the unknown parameter vector. We define the indicator variable K(t) by K(t) = 0 if var{e(t)} = σ² and K(t) = 1 if var{e(t)} = Cσ².
Let p(X, K, θ | Y^n) be the joint posterior density of X, K and θ. The Gibbs sampler (Gelfand & Smith, 1990) is an iterative Monte Carlo technique that, in our case, successively
generates X, K and θ from the conditional densities p(X | Y^n, K, θ), p(K | Y^n, X, θ) and p(θ_i | Y^n, X, K, θ_j, j ≠ i) for i = 1, ..., p, until eventually (X, K, θ) is generated from the joint posterior distribution p(X, K, θ | Y^n). Tierney (1994) proves the convergence of the Gibbs sampler under appropriate regularity conditions. For any given example it is usually straightforward to check whether these conditions hold.
We assume that θ_i can be generated from p(θ_i | Y^n, X, K, θ_j, j ≠ i) for i = 1, ..., p. Efficient ways of doing so will be determined on a case by case basis. Sections 2.2 and 2.3 show how to generate from p(X | Y^n, K, θ) and p(K | Y^n, X, θ).
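As an illustration only, one sweep of the sampler can be organized as in the following sketch (Python), where draw_states, draw_indicators and draw_thetas are placeholder routines implementing § 2.2 with Appendix 1, § 2.3 with Appendix 2, and the model-specific conditionals for the θ_i.

    import numpy as np

    def gibbs_sampler(y, X0, K0, theta0, draw_states, draw_indicators, draw_thetas,
                      n_sweeps, seed=0):
        # One sweep draws X ~ p(X | Y^n, K, theta), then K ~ p(K | Y^n, X, theta),
        # then theta from its conditionals given X and K.
        rng = np.random.default_rng(seed)
        X, K, theta = X0, K0, theta0
        draws = []
        for _ in range(n_sweeps):
            X = draw_states(y, K, theta, rng)          # Section 2.2 and Appendix 1
            K = draw_indicators(y, X, theta, rng)      # Section 2.3 and Appendix 2
            theta = draw_thetas(y, X, K, theta, rng)   # case by case, Section 2.1
            draws.append((X, K, theta))
        return draws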

2.2. Generating the state vector
We assume that x(1) has a proper distribution and that, conditional on K and the parameter vector θ, h(t) and F(t) are known, and e(t) and u(t) are Gaussian with known means and variances. For notational convenience we usually omit dependence on K and θ in this section. For t = 1, ..., n let Y^t consist of all y(j) (j ≤ t). The following lemma shows how to generate the whole of X given Y^n, K and θ. Its proof is straightforward and is omitted.
LEMMA 2.1. We have

p(X | Y^n) = p{x(n) | Y^n} ∏_{t=1}^{n-1} p{x(t) | Y^t, x(t+1)}.

Thus to generateX fromp(X Iy") we first generatex(n) from p{x(n)IYn} and then for
t = n - 1, ..., 1 we generate x(t) from p{x(t)IYt, x(t + 1)}. Because p{x(n)IYnf} and
p{x(t)IYt, x(t + 1)} are Gaussian densities,in order to generateall the x(t) we need to
compute E{x(n) Iyn } and var {x(n)IynI} and
E{x(t)IJYt x(t + 1)}, var {x(t)I Yt, x(t + 1)} (t = n-1,.. ., 1).
Let x(tIj)=E{x(t)IYi} and S(tIj)=var{x(t)IYi}. We obtain x(tlt) and S(tlt) for t=
1, ... , n using the Kalman filter (Anderson & Moore, 1979, p. 105). To obtain
E{x(t)l IYt,x(t + 1)} and var {x(t)j IYt,x(t + 1)} we treat the equation
x(t+ 1)=F(t+ 1)x(t)+u(t+ 1)
as m additionalobservationson the state vectorx(t) and applythe Kalmanfilterto them.
Details are given in Appendix1.
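For reference, a minimal Kalman filter producing the moments x(t|t) and S(t|t) is sketched below. It is an illustration only: it assumes a proper N(m1, P1) distribution for x(1) and known values of h(t), F(t), var{e(t)} and var{u(t)} held in the placeholder arrays h, F, Re, Q; the partially diffuse case discussed next requires instead the modified algorithms of Ansley & Kohn (1990).

    import numpy as np

    def kalman_filter(y, h, F, Re, Q, m1, P1):
        # Model: y(t) = h(t)'x(t) + e(t),  x(t+1) = F(t+1)x(t) + u(t+1).
        # y: (n,); h: (n, m); Re[t] = var{e(t)}; F[t], Q[t] are the transition
        # matrix and disturbance covariance used to move from step t to t+1.
        # Returns x(t|t) and S(t|t) for t = 1, ..., n (0-based arrays).
        n, m = len(y), len(m1)
        xf, Sf = np.zeros((n, m)), np.zeros((n, m, m))
        x_pred, S_pred = m1, P1                       # moments of x(1) before y(1)
        for t in range(n):
            v = y[t] - h[t] @ x_pred                  # innovation
            R = h[t] @ S_pred @ h[t] + Re[t]          # innovation variance
            gain = S_pred @ h[t] / R
            xf[t] = x_pred + gain * v                 # observation update
            Sf[t] = S_pred - np.outer(gain, h[t] @ S_pred)
            if t < n - 1:                             # time update
                x_pred = F[t] @ xf[t]
                S_pred = F[t] @ Sf[t] @ F[t].T + Q[t]
        return xf, Sf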
In many applications the distribution of the initial state vector x(1) is partly unknown, and this part is usually taken as a constant to be estimated or, equivalently, to have a diffuse distribution, making x(1) partially diffuse. By this we mean that x(1) ~ N(0, S_0 + kS_1) with k → ∞. The generation algorithm can be applied as outlined above and in Appendix 1, except that now we use the modified filtering and smoothing algorithms of Ansley & Kohn (1990).
Remark. In specific models it may be possible to use a faster filtering algorithm than the Kalman filter to obtain x(t|t) and S(t|t). See, for example, the fast filtering algorithms of Anderson & Moore (1979, Ch. 6) when e(t) and u(t) are Gaussian and F(t) and h(t) are constant. A referee has suggested the use of a Metropolis step within the Gibbs sampler to speed up the generation of the states (Tierney, 1994). To do so it is necessary to find a candidate density q(X | K, θ) for generating X which is faster to generate from than p(X | Y^n, K, θ) and yet is close enough to it so that the rejection rate in the Metropolis step is
not too high. We tried using the prior for X as q(X | K, θ), but this resulted in huge rejection rates and was therefore not practical.

2.3. Generating the indicator variables
Recall that K(t) (t = 1, ..., n) is a vector of indicator variables showing which members of the mixture each of e(t) and u(t) belong to and which values h(t) and F(t) take. Let K^t = {K(1), ..., K(t)} and X^t = {x(1), ..., x(t)}. For notational convenience we omit dependence on θ. Conditionally on K and θ, e(t) and u(t) are independent for t = 1, ..., n in (1.1) and (1.2). This implies that

p{y(t) | Y^{t-1}, X^t, K^t} = p{y(t) | x(t), K(t)},   p{x(t) | X^{t-1}, K^t} = p{x(t) | x(t-1), K(t)}.

We assume that the prior distribution of K is Markov. The next lemma shows how to generate the whole of K given Y^n, X and θ. We omit its proof as it is straightforward.

LEMMA 2.2. We have

p(K | Y^n, X) = p{K(n) | Y^n, X} ∏_{t=1}^{n-1} p{K(t) | Y^t, X^t, K(t+1)}.

Thus to generate K from p(K | Y^n, X) we first generate K(n) from p{K(n) | Y^n, X} and then for t = n-1, ..., 1 we generate K(t) from p{K(t) | Y^t, X^t, K(t+1)}. Because p{K(n) | Y^n, X} and p{K(t) | Y^t, X^t, K(t+1)} are discrete valued we can generate from them easily, once we have calculated them. To calculate p{K(n) | Y^n, X} and p{K(t) | Y^t, X^t, K(t+1)} we use recursive filtering equations (Anderson & Moore, 1979, Ch. 8) in a similar way to our use of the Kalman filter in § 2.2. Details are in Appendix 2. Because K(t) is discrete valued, the filtering equations can be evaluated efficiently.

3. EXAMPLES

3.1. General
We illustrate the results in § 2 and Appendices 1 and 2 by applying them to four examples. The first is a stochastic trend model giving a cubic spline smoothing estimate of the signal. The second example is a trend plus seasonal model. In the third example the errors e(t) are a discrete mixture of normals with Markov dependence. The fourth example discusses switching regression. The first two examples compare empirically the performance of the approach that generates all the states simultaneously with the approach that generates the states one at a time.

3.2. Example 1: Cubic smoothing spline
Our first example is a continuous time stochastic trend model for which the signal estimate is a cubic smoothing spline. We implement the Gibbs sampler using the first element of the state vector and make the important point that in many applications only a subset of the elements of the state vector is needed.
Suppose we have observations on the signal plus noise model

y(i) = g(t_i) + e(i)   (i = 1, ..., n),  (3.1)

with the e(i) independent N(0, σ²) and with the signal g(t) generated by the stochastic differential equation

d²g(t)/dt² = τ dW(t)/dt;  (3.2)

W(t) is a Wiener process with W(0) = 0 and var{W(1)} = 1, and τ is a scale parameter. We assume that the initial conditions on g(t) and dg(t)/dt are diffuse; that is, with k → ∞,

{g(t_1), dg(t_1)/dt}' ~ N(0, kI_2).  (3.3)

We take 0 < t_1 < t_2 < ... < t_n. Following Kohn & Ansley (1987) we can write (3.1) and (3.2) in state space form as

y(i) = h'x(t_i) + e(i),   x(t_i) = F(δ_i)x(t_{i-1}) + u(i),

where the state vector x(t) = {g(t), dg(t)/dt}', the increments δ_i = t_i - t_{i-1} (t_0 = 0), and the u(i) are independent N{0, τ²U(δ_i)}. The vector h = (1, 0)' and the 2 x 2 matrices F(δ) and U(δ) are given by

F(δ) = ( 1   δ )          U(δ) = ( δ³/3   δ²/2 )
       ( 0   1 ),                ( δ²/2    δ   ).

The vector of unknown parameters is θ = (σ², τ²)' and from (3.3) the initial state vector x(t_1) has a diffuse distribution. For a further discussion of this model and its connection with spline smoothing see Wahba (1983) and Kohn & Ansley (1987).
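As an illustrative sketch only, the state space ingredients of this example can be constructed as below; u(i) then has covariance matrix τ²U(δ_i).

    import numpy as np

    def spline_state_space(delta):
        # State x(t) = {g(t), dg(t)/dt}' for the trend model (3.2).
        h = np.array([1.0, 0.0])                      # y(i) = g(t_i) + e(i)
        F = np.array([[1.0, delta],
                      [0.0, 1.0]])
        U = np.array([[delta**3 / 3.0, delta**2 / 2.0],
                      [delta**2 / 2.0, delta]])       # scale by tau^2 for var{u(i)}
        return h, F, U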
To complete the Bayesian specification of the model we impose the improper priors p(σ²) ∝ σ⁻² exp{-β_σ/(2σ²)}, with β_σ small, and p(τ²) ∝ 1. As we only use the first element of x(t) in the Gibbs sampler, let G = {g(t_1), ..., g(t_n)}. The vectors G and θ are generated as follows. For given θ, X is generated as explained in § 2.2 and Appendix 1, and we then extract G. To generate σ², we can show that

p(σ² | Y^n, G, τ²) ∝ (σ²)^{-n/2-1} exp[-{Σ_i e(i)² + β_σ}/(2σ²)],

where e(i) = y(i) - h'x(t_i). Hence σ² is generated from an inverse gamma distribution with parameters n/2 and ½{Σ_i e(i)² + β_σ}. To generate τ², note that for given k > 0

p(τ² | Y^n, G, σ²; k) = p(τ² | G; k) ∝ p(G | τ²; k)p(τ²) ∝ p(G | τ²; k).

It follows from Ansley & Kohn (1985) that

lim_{k→∞} p(τ² | G; k) ∝ (τ²)^{-n/2+1} exp{-(1/2τ²) Σ_i ε(i)²/R(i)},

where ε(i) and R(i) are the innovations and innovation variances respectively obtained from running the modified Kalman filter on the state space model

g(t_i) = h'x(t_i),   x(t_i) = F(δ_i)x(t_{i-1}) + u(i).

Hence τ² is generated from an inverse gamma distribution with parameters n/2 - 2 and ½ Σ_i ε(i)²/R(i).
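As an illustration, the two conditional draws can be coded as below once e(i), the innovations ε(i) and the innovation variances R(i) are available. The sketch assumes the diffuse filter is run at unit scale, so that R(i) does not depend on τ², and uses the fact that if G ~ Gamma(a, 1) then b/G is inverse gamma with parameters a and b.

    import numpy as np

    def draw_sigma2(e, beta_sigma, rng):
        # sigma^2 ~ inverse gamma with parameters n/2 and (sum e(i)^2 + beta_sigma)/2,
        # where e(i) = y(i) - h'x(t_i).
        shape = 0.5 * e.size
        scale = 0.5 * (np.sum(e**2) + beta_sigma)
        return scale / rng.gamma(shape)

    def draw_tau2(n, eps, R, rng):
        # tau^2 ~ inverse gamma with parameters n/2 - 2 and (1/2) sum eps(i)^2 / R(i);
        # eps, R are the innovations and innovation variances from the modified filter.
        shape = 0.5 * n - 2.0
        scale = 0.5 * np.sum(eps**2 / R)
        return scale / rng.gamma(shape)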
For this model we now describe the Gibbs sampler approach of Carlin et al. (1992). The state vector x(t) is generated from p{x(t) | y(t), x(t-1), x(t+1), σ², τ²}. The error variance σ² is generated as above by noting that p(σ² | Y^n, X, τ²) = p(σ² | Y^n, G, τ²). To generate τ² note that for given k > 0

p(τ² | Y^n, X, σ²; k) = p(τ² | X; k) ∝ p(X | τ²; k)p(τ²) ∝ p(X | τ²; k).

It follows from Ansley & Kohn (1985) that

lim_{k→∞} p(τ² | X; k) ∝ (τ²)^{-(n-1)} exp{-(1/2τ²) Σ_i u(i)'U(δ_i)^{-1}u(i)},
where u(i) = x(t_i) - F(δ_i)x(t_{i-1}). Hence we generate τ² given Y^n, X and σ² from an inverse gamma distribution with parameters n - 2 and ½ Σ_i u(i)'U(δ_i)^{-1}u(i).
We now compare empirically the approach generating all the states at once to the approach that generates the states one at a time. The data are generated by (3.1) with the function

g(t) = 3β_{10,5}(t) + 3β_{7,7}(t) + 3β_{5,10}(t),

where β_{p,q} is a beta density with parameters p and q, and 0 ≤ t ≤ 1. This function was used by Wahba (1983) in her simulations. The error standard deviation is σ = 0.2, the sample size is n = 50 and the design is equally spaced with t_i = i/50 (i = 1, ..., 50). For both algorithms we first ran the Gibbs sampler with the starting values

(σ²)^[0] = 1,   x(t)^[0] = E{x(t) | σ² = 1, τ² = 1}.

The value of τ² was generated by the Gibbs sampler. Figure 1(a) is a plot of the iterates of σ² and Fig. 1(b) a plot of the iterates of τ² when the states are generated one at a time; the horizontal axis is the iteration number. It appears that for this approach the Gibbs sampler takes about 15 000 iterations to converge. Figures 1(c) and (d) are plots of the first 2000 iterates of σ² and τ² respectively when the states are generated simultaneously. The same starting values are used for both approaches. The Gibbs sampler now appears to converge after 100 iterations. Similar results are obtained for other arbitrary starting values.

Fig. 1. Example 1: generated values of σ² and τ² against iteration number, with starting values σ² = 1 and X = E(X | σ² = 1, τ² = 1). In (a) and (b) the states are generated one at a time and in (c) and (d) they are generated simultaneously.

To study the relative efficiencies of the two algorithms once the Gibbs sampler has converged we use the marginal likelihood estimates of σ² and τ² as starting values. For a
definition and discussion of the marginal likelihood estimates see Kohn & Ansley (1987). For both algorithms we ran the Gibbs sampler for a warm-up period of 1000 iterations followed by a sampling run of 10 000 iterations. Using the final 10 000 iterates we computed the first 300 autocorrelations of the signal estimate at the abscissa t = 0.25, which we call g(0.25), and also of τ². Figures 2(a) and (b) are plots of the autocorrelations of the iterates of g(0.25) and τ² respectively when the states are generated one at a time, and Figs 2(c) and (d) are the corresponding plots for the algorithm where the states are generated simultaneously. Clearly the autocorrelations for the first algorithm are much higher than for the second.

Fig. 2. Example 1: sample autocorrelation function (ACF) against lag for g(0.25) and τ². In (a) and (b) the states are generated one at a time and in (c) and (d) they are generated simultaneously.

Using the sampling run of 10 000 iterates we now present the relative efficiencies of the two algorithms in estimating the posterior mean E{g(t) | Y^n} of the signal. There are two ways to estimate the posterior mean. The first is to use the sample moments of the Gibbs iterates to form what is called a histogram estimate. The second way is to form a mixture estimate. When generating all the states simultaneously the histogram and mixture estimates of the posterior mean of g(t) are respectively

(1/N) Σ_{j=1}^{N} g^[j](t),   (1/N) Σ_{j=1}^{N} E{g(t) | Y^n, θ^[j]},  (3.4)

where N = 10 000 and g^[j](t) is the jth Gibbs iterate of g(t) during the sampling period. The smoothed values E{g(t) | Y^n, θ^[j]} in (3.4) are obtained using the smoothing algorithm of Ansley & Kohn (1990). For the algorithm generating the states one at a time the histogram estimates are as in (3.4) while the mixture estimates are computed as in § 2 of Carlin et al. (1992). The results of Gelfand & Smith (1990) and Liu et al. (1994) suggest that mixture estimates will usually have smaller variance than histogram estimates.
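In code the two estimators are simple averages over the sampling run; the sketch below assumes the iterates g^[j](t) and the smoothed values E{g(t) | Y^n, θ^[j]} have been stored as N x n arrays (placeholder names).

    import numpy as np

    def histogram_estimate(g_draws):
        # Average of the Gibbs iterates g^[j](t); g_draws has shape (N, n).
        return g_draws.mean(axis=0)

    def mixture_estimate(smoothed_means):
        # Average of the smoothed values E{g(t) | Y^n, theta^[j]}; shape (N, n).
        return smoothed_means.mean(axis=0)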

We first consider the efficiency of the histogram estimates of the signal by estimating the posterior mean of the signal at the abscissae t = 0.02, 0.25 and 0.5, and calling ĝ(t) the estimate at t. We assume that in the sampling period the Gibbs sampler has converged, so that the g(t)^[j] form a stationary sequence for each t. For a given t let γ_{i,t} = cov{g(t)^[j], g(t)^[j+i]} be the ith autocovariance of g(t)^[j], with corresponding sample autocovariance γ̂_{i,t}. We estimate N var{ĝ(t)} by

Σ_{|i| ≤ 1000} (1 - |i|/N) γ̂_{i,t},

using the first 1000 sample autocovariances. For a discussion of variance estimation from simulation experiments see Moran (1975).
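A sketch of this variance estimate, with the stored iterates for a fixed abscissa t held in the one-dimensional array g_draws (a placeholder name) and the first 1000 sample autocovariances used:

    import numpy as np

    def nvar_estimate(g_draws, max_lag=1000):
        # Estimate N var{g_hat(t)} by the sum over |i| <= max_lag of
        # (1 - |i|/N) gamma_hat_i, where gamma_hat_i is the ith sample
        # autocovariance of the (stationary) iterates.
        N = g_draws.shape[0]
        c = g_draws - g_draws.mean()
        total = np.dot(c, c) / N                      # gamma_hat_0
        for i in range(1, max_lag + 1):
            gamma_i = np.dot(c[:N - i], c[i:]) / N
            total += 2.0 * (1.0 - i / N) * gamma_i    # lags i and -i
        return total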
Table 1 presents the results for the histogram estimates. The first column gives the abscissa t, the second column the sample variance estimate γ̂_{0,t}, and the third column the estimate of N var{ĝ(t)} when the states are generated simultaneously. The fourth and fifth columns have the same interpretation as the second and third columns except that now the states are generated one at a time. The sixth column is the ratio of the fifth and third columns and is an estimate of the factor by which the number of Gibbs iterates for the approach which generates one state at a time would have to increase in order to have the same accuracy as the approach which generates all the states at once. We take it to be the measure of the relative efficiency of the two algorithms. Table 1 shows that the efficiencies range from 91 to 358, so that the number of iterates of the algorithm that generates the states one at a time would need to increase by a factor of up to about 350 to achieve the same accuracy as that which generates the states simultaneously. We also note from the table that the sample variances γ̂_{0,t} are approximately the same for both algorithms, suggesting that we are generating from the correct distribution for the algorithm that generates the states one at a time.
Table 2 has the same interpretation as Table 1 but now deals with the mixture estimates. The efficiencies now range from 498 to 178 000.
Table 1. Histogram estimates of E{g(t) | Y^n}

               Simultaneous                    One at a time
  t        γ̂_{0,t}      N var{ĝ(t)}      γ̂_{0,t}      N var{ĝ(t)}      Ratio
 0.02    2.3 x 10^-2    4.4 x 10^-2    2.2 x 10^-2         4             91
 0.25    6.8 x 10^-2    7.7 x 10^-3    7.6 x 10^-2        2.5           318
 0.5     7.0 x 10^-2    7.0 x 10^-3    9.3 x 10^-2        2.5           358

Table 2. Mixture estimates of E{g(t) | Y^n}

               Simultaneous                    One at a time
  t        γ̂_{0,t}      N var{ĝ(t)}      γ̂_{0,t}      N var{ĝ(t)}      Ratio
 0.02    2.9 x 10^-3    7.8 x 10^-3    2.1 x 10^-2        3.9           498
 0.25    5.8 x 10^-6    1.4 x 10^-5    7.5 x 10^-2        2.5       178 731
 0.5     1.1 x 10^-4    1.4 x 10^-4    9.2 x 10^-2        2.5        17 455

We repeated this study with different functions g(t), different sample sizes and different values of the error standard deviation, and obtained similar results to those reported above. We conclude that for this simple model the approach that generates the states one at a time is far slower to converge and far less efficient than the approach that generates all the states simultaneously.

3.3. Example 2: Trend plus seasonal components time series model
A popular model for quarterly economic time series is

y(t) = g(t) + T(t) + e(t)   (t = 1, ..., n),  (3.5)

with the errors e(t) independent N(0, σ²) and with the seasonal g(t) and trend T(t) generated by the stochastic difference equations

Σ_{j=0}^{3} g(t - j) = v(t),  (3.6)
T(t) - 2T(t-1) + T(t-2) = w(t),  (3.7)

where the v(t) are independent N(0, τ²) and the w(t) are independent N(0, ω²). This model is proposed by Kitagawa & Gersch (1984), who regard (3.6) and (3.7) as priors for the seasonal and trend components and who express (3.5)-(3.7) in state space form with the state vector

x(t) = {g(t), g(t-1), g(t-2), T(t), T(t-1)}'.

For the approach generating the states simultaneously, estimation of the model using the Gibbs sampler can be done as in § 3.2. For the approach generating the states one at a time the state vector x(t) is known, for 1 < t < n, if we condition on x(t-1) and x(t+1), and so new variation is only introduced when generating x(1) and x(n). Thus the resulting Gibbs sampler does not converge to the posterior distribution.
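As an illustration, one way of writing (3.5)-(3.7) in the form (1.1)-(1.2) with this state vector is sketched below; note that the disturbance covariance var{u(t)} is singular. The construction and names are placeholders rather than a reproduction of Kitagawa & Gersch (1984).

    import numpy as np

    def trend_seasonal_state_space(tau2, omega2):
        # State x(t) = {g(t), g(t-1), g(t-2), T(t), T(t-1)}'.
        h = np.array([1.0, 0.0, 0.0, 1.0, 0.0])         # y(t) = g(t) + T(t) + e(t)
        F = np.array([[-1.0, -1.0, -1.0, 0.0,  0.0],    # g(t) = -g(t-1)-g(t-2)-g(t-3)+v(t)
                      [ 1.0,  0.0,  0.0, 0.0,  0.0],
                      [ 0.0,  1.0,  0.0, 0.0,  0.0],
                      [ 0.0,  0.0,  0.0, 2.0, -1.0],    # T(t) = 2T(t-1)-T(t-2)+w(t)
                      [ 0.0,  0.0,  0.0, 1.0,  0.0]])
        Q = np.diag([tau2, 0.0, 0.0, omega2, 0.0])      # var{u(t)}: singular
        return h, F, Q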
To study empirically the approach that generates the states simultaneously we consider for simplicity the pure seasonal model

y(t) = g(t) + e(t).

We generated 50 observations using g(t) = sin(2πt/4.1) and σ = 0.2. The priors for σ² and τ² are the same as in § 3.2. Figure 3 shows the output from the Gibbs sampler with the starting values (σ²)^[0] = 1 and x(t)^[0] = E{x(t) | σ² = 1, τ² = 1}. Figure 3(a) shows the data together with the function g(t) and the mixture estimates of g(t); Figs 3(b) and (c) show the generated values of σ² and τ² respectively. The warm-up period was 1000 iterations and the sampling period was 1000 iterations. From Fig. 3 the algorithm that generates the states simultaneously appears to converge within a hundred iterations. Similar results were obtained for other arbitrary starting values.
To understand the difference in performance of the two algorithms we view the state transition equation (1.2) as a prior for the state vector. If this prior is tight then generating the states one at a time will produce Gibbs iterates of the states which are highly dependent and so will tend to move very little in successive iterations. An extreme case is the seasonal plus trend model (3.5)-(3.7).

3.4. Normal mixture errors with Markov dependence
We illustrate our algorithm to generate the indicator variables by considering the case where e(t) and u(t) are normal mixtures. It is sufficient to discuss the case where e(t) is a mixture of two normals and u(t) is normal, as the general case can be handled similarly. We assume that we have a linear state space model as given in (1.1) and (1.2), and that e(t) has mean zero and variance equal to either σ² or K²σ². Such a normal mixture model for e(t) with K > 1 has been used by Box & Tiao (1968) to handle outliers. Let K(t) = 1 if e(t) has variance σ², and let K(t) = 2 otherwise, and let K = {K(1), ..., K(n)}'. We note that, conditionally on K, the e(t) and u(t) are Gaussian, so that the results in §§ 2 and 3.2 apply to the generation of X and θ. Thus we will only consider the generation of K.


Fig. 3. Example 2: (a) shows the data (dots) together with the function g(t) (dashes) and the mixture estimates of g(t) (solid); (b) and (c) show the generated values of σ² and τ² against iteration number. The states are generated simultaneously.

We first assume that a priori the K(t) come from a Markov chain with

p{K(t+1) = 2 | K(t) = i} = p_i   (i = 1, 2).

For simplicity we take the probabilities p_1 and p_2 as fixed; for example, we could take p_1 = 0.05 and p_2 = 0.5 for detecting outliers that cause a temporary increase in variance. Alternatively we could place a prior on p_1 and p_2, for example a beta prior. To simplify notation we omit dependence on θ. From Lemma 2.2 the distribution of K given Y^n and X is

p(K | Y^n, X) = p{K(n) | Y^n, X} ∏_{t=1}^{n-1} p{K(t) | Y^t, X^t, K(t+1)}.

We show how to calculate each term in Appendix 2.
We now consider the simpler case where a priori the K(t) are independent. In this case Lemma 2.2 becomes

p(K | Y^n, X) = ∏_{t=1}^{n} p{K(t) | y(t), x(t)} ∝ ∏_{t=1}^{n} p{y(t) | x(t), K(t)} p{K(t)},

so that the K(t) are independent and binomial, and it is straightforward to generate them.
Our approach can also handle errors that are general normal scale mixtures, for example t distributed errors. Some further details and examples are given by Carlin et al. (1992).
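As an illustration of the case of a priori independent K(t) described above, the draw of K can be coded in a few lines; fitted, kappa2 and p2 are placeholder names for h(t)'x(t), the variance inflation factor written K² above, and the prior probability p{K(t) = 2}.

    import numpy as np

    def normal_pdf(x, var):
        return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)

    def draw_independent_indicators(y, fitted, sigma2, kappa2, p2, rng):
        # p{K(t) | y(t), x(t)} is proportional to p{y(t) | x(t), K(t)} p{K(t)},
        # independently over t; K(t) = 2 means e(t) has the inflated variance.
        resid = y - fitted
        w1 = (1.0 - p2) * normal_pdf(resid, sigma2)
        w2 = p2 * normal_pdf(resid, kappa2 * sigma2)
        prob2 = w2 / (w1 + w2)
        return np.where(rng.random(y.size) < prob2, 2, 1)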

3.5. Switching regression model
In the switching regression model the coefficients {h(t), F(t), t = 1, ..., n} take on a small number of different values determined by some probabilistic mechanism. Shumway & Stoffer (1991) use a switching regression model to identify targets when a large number of targets with unknown identities is observed. To show how our results apply it is sufficient to discuss the simplest case in which F(t) is constant for all t and h(t) takes on just two values, h_1 and h_2 say. Let K(t) = 1 if h(t) = h_1, and let K(t) = 2 otherwise. As in § 3.4 we assume that a priori the K(t) come from a Markov chain with parameters p_1 and p_2. If p_1 and p_2 are unknown we would place a beta prior on them. Given K = {K(1), ..., K(n)} we generate X as in § 2.2. Generating K given Y^n, X and θ is very similar to the way we generated it in § 3.4 and we omit the details.

ACKNOWLEDGEMENT
We would like to thank the Division of Mathematics and Statistics, CSIRO, and the
Australian Research Council for partial support. We would also like to thank David Wong
for help with the computations.

APPENDIX 1
Algorithm to generate the state vector
We show how to generate X conditional on Y^n, K and θ. We omit dependence on K and θ and, as in § 2.2, let

x(t|j) = E{x(t) | Y^j},   S(t|j) = var{x(t) | Y^j}.

For t = 1, ..., n the conditional mean x(t|t) and the conditional variance S(t|t) are obtained using the Kalman filter (Anderson & Moore, 1979, p. 105).
Using Lemma 2.1 we show how to generate x(n), ..., x(1) in that order conditioning on Y^n. First, p{x(n) | Y^n} is normal with mean x(n|n) and variance S(n|n). To generate x(t) conditional on Y^t and x(t+1) we note that we can regard the equation

x(t+1) = F(t+1)x(t) + u(t+1)

as m additional observations on x(t). If U(t+1) is diagonal then the u_i(t+1) (i = 1, ..., m) are independent and we can apply the observation update step of the Kalman filter m times as shown below to obtain

E{x(t) | Y^t, x(t+1)},   var{x(t) | Y^t, x(t+1)}.

More generally we can factorize U(t+1) = L(t+1)Λ(t+1)L(t+1)' using the Cholesky decomposition, with L(t+1) a lower triangular matrix with ones on the diagonal and Λ(t+1) a diagonal matrix. Let

x̃(t+1) = L(t+1)⁻¹x(t+1),   F̃(t+1) = L(t+1)⁻¹F(t+1),   ũ(t+1) = L(t+1)⁻¹u(t+1).

We can then write

x̃(t+1) = F̃(t+1)x(t) + ũ(t+1),

so that, for i = 1, ..., m,

x̃_i(t+1) = F̃_i(t+1)'x(t) + ũ_i(t+1),  (A1)

where F̃_i(t+1)' is the ith row of F̃(t+1) and x̃_i(t+1) and ũ_i(t+1) are the ith elements of x̃(t+1)
and ũ(t+1). The elements ũ_i(t+1) are independent N{0, Λ_i(t+1)}, where Λ_i(t+1) is the ith diagonal element of Λ(t+1). For i = 1, ..., m let

x(t|t, i) = E{x(t) | Y^t, x̃_1(t+1), ..., x̃_i(t+1)},
S(t|t, i) = var{x(t) | Y^t, x̃_1(t+1), ..., x̃_i(t+1)},

and define x(t|t, 0) = x(t|t) and S(t|t, 0) = S(t|t). We now apply the observation update step of the Kalman filter m times to (A1) as follows. For i = 1, ..., m let

ε(t, i) = x̃_i(t+1) - F̃_i(t+1)'x(t|t, i-1),
R(t, i) = F̃_i(t+1)'S(t|t, i-1)F̃_i(t+1) + Λ_i(t+1).

Then

x(t|t, i) = x(t|t, i-1) + S(t|t, i-1)F̃_i(t+1)ε(t, i)/R(t, i),
S(t|t, i) = S(t|t, i-1) - S(t|t, i-1)F̃_i(t+1)F̃_i(t+1)'S(t|t, i-1)/R(t, i).

We therefore obtain

x(t|t, m) = E{x(t) | Y^t, x(t+1)},   S(t|t, m) = var{x(t) | Y^t, x(t+1)}.

It is now straightforward to generate x(t) conditionally on Y^t and x(t+1), as it is normally distributed with mean x(t|t, m) and variance S(t|t, m).
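The backward pass can be coded directly from these recursions; the following sketch pairs with the Kalman filter sketch in § 2.2, with xf and Sf holding x(t|t) and S(t|t), and F[t], U[t] holding F(t+1) and var{u(t+1)} (placeholder names). For simplicity the factorization U = LΛL' is obtained from a Cholesky factor with a tiny ridge added, which is only an approximation when U(t+1) is exactly singular.

    import numpy as np

    def ldl(U, ridge=1e-12):
        # U = L diag(lam) L' with L unit lower triangular (via a Cholesky factor).
        C = np.linalg.cholesky(U + ridge * np.eye(U.shape[0]))
        d = np.diag(C)
        return d**2, C / d

    def draw_states_backward(xf, Sf, F, U, rng):
        # Generate x(n), ..., x(1) as in Appendix 1, given the filter output.
        n, m = xf.shape
        X = np.zeros((n, m))
        X[-1] = rng.multivariate_normal(xf[-1], Sf[-1])      # x(n) ~ N{x(n|n), S(n|n)}
        for t in range(n - 2, -1, -1):
            lam, L = ldl(U[t])
            xtil = np.linalg.solve(L, X[t + 1])              # x~(t+1) = L^{-1} x(t+1)
            Ftil = np.linalg.solve(L, F[t])                  # F~(t+1) = L^{-1} F(t+1)
            x_ti, S_ti = xf[t].copy(), Sf[t].copy()
            for i in range(m):                               # treat (A1) as m observations
                f = Ftil[i]
                eps = xtil[i] - f @ x_ti                     # epsilon(t, i)
                R = f @ S_ti @ f + lam[i]                    # R(t, i)
                gain = S_ti @ f / R
                x_ti = x_ti + gain * eps
                S_ti = S_ti - np.outer(gain, f @ S_ti)
            S_ti = 0.5 * (S_ti + S_ti.T)                     # keep covariance symmetric
            X[t] = rng.multivariate_normal(x_ti, S_ti)
        return X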

APPENDIX 2
Algorithm to generate the indicator variables
We show how to generate K conditional on Y^n, X and θ. We omit dependence on θ. Let k_1, ..., k_m be the possible values assumed by K(t) (t = 1, ..., n) and suppose that the transition matrices specifying p{K(t) | K(t-1)} (t = 2, ..., n) are known. We note that if y(t) is observed then
p{K(t) | Y^t, X^t} ∝ p{y(t) | x(t), K(t)} p{x(t) | x(t-1), K(t)} p{K(t) | Y^{t-1}, X^{t-1}},

and if y(t) is not observed then

p{K(t) | Y^t, X^t} ∝ p{x(t) | x(t-1), K(t)} p{K(t) | Y^{t-1}, X^{t-1}}.

The following algorithm uses recursive filtering equations following Anderson & Moore (1979, Ch. 8) to calculate p{K(t) | Y^t, X^t}.

Discrete filter. For t = 1, ..., n:

Step 1. Performed for t > 1:

p{K(t) | Y^{t-1}, X^{t-1}} = Σ_{j=1}^{m} p{K(t) | K(t-1) = k_j} p{K(t-1) = k_j | Y^{t-1}, X^{t-1}}.

Step 2a. If y(t) is observed set

p*{K(t) | Y^t, X^t} = p{y(t) | x(t), K(t)} p{x(t) | x(t-1), K(t)} p{K(t) | Y^{t-1}, X^{t-1}}.

Step 2b. If y(t) is not observed set

p*{K(t) | Y^t, X^t} = p{x(t) | x(t-1), K(t)} p{K(t) | Y^{t-1}, X^{t-1}}.

Step 3. Obtain p{K(t) | Y^t, X^t} using

p{K(t) | Y^t, X^t} = p*{K(t) | Y^t, X^t} / Σ_{j=1}^{m} p*{K(t) = k_j | Y^t, X^t}.

We note that p{y(t) | x(t), K(t)} and p{x(t) | x(t-1), K(t)} for t = 1, ..., n are known from the specification of the state space model.
Using Lemma 2.2 we show how to generate K(n), ..., K(1) in that order conditioning only on Y^n and X. First, we calculate p{K(t) | Y^t, X^t} for t = 1, ..., n using the discrete filter shown above. To generate K(t) conditional on Y^t, X^t and K(t+1) we use the following result for t = n-1, ..., 1:

p{K(t) | Y^t, X^t, K(t+1)} = p{K(t+1) | K(t)} p{K(t) | Y^t, X^t} / p{K(t+1) | Y^t, X^t}.
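As an illustration, the discrete filter and the backward generation of K can be coded as below for K(t) taking values 1, ..., M. The arrays log_py, log_px, P and prior1 are placeholders holding log p{y(t) | x(t), K(t)} (set to zero when y(t) is not observed, as in Step 2b), log p{x(t) | x(t-1), K(t)} (with the first entry the log density of x(1) given K(1)), the transition matrix p{K(t+1) | K(t)} and the prior on K(1).

    import numpy as np

    def draw_indicators(log_py, log_px, P, prior1, rng):
        n, M = log_py.shape
        filt = np.zeros((n, M))                     # p{K(t) = j | Y^t, X^t}
        pred = prior1                               # p{K(1) = j}; Step 1 thereafter
        for t in range(n):
            logw = np.log(pred) + log_py[t] + log_px[t]      # Steps 2a / 2b
            w = np.exp(logw - logw.max())
            filt[t] = w / w.sum()                            # Step 3: normalize
            pred = filt[t] @ P                               # Step 1 for time t + 1
        K = np.zeros(n, dtype=int)
        K[-1] = rng.choice(M, p=filt[-1])                    # draw K(n)
        for t in range(n - 2, -1, -1):
            # p{K(t) | Y^t, X^t, K(t+1)} is proportional to
            # p{K(t+1) | K(t)} p{K(t) | Y^t, X^t}
            w = P[:, K[t + 1]] * filt[t]
            K[t] = rng.choice(M, p=w / w.sum())
        return K + 1                                         # report values 1, ..., M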

REFERENCES
ANDERSON, B. D. O. & MOORE, J. B. (1979). Optimal Filtering. Englewood Cliffs, New Jersey: Prentice Hall.
ANSLEY, C. F. & KOHN, R. (1985). Estimation, filtering and smoothing in state space models with incompletely specified initial conditions. Ann. Statist. 13, 1286-316.
ANSLEY, C. F. & KOHN, R. (1990). Filtering and smoothing in state space models with partially diffuse initial conditions. J. Time Ser. Anal. 11, 277-93.
BOX, G. E. P. & TIAO, G. C. (1968). A Bayesian approach to some outlier problems. Biometrika 55, 119-29.
CARLIN, B. P., POLSON, N. G. & STOFFER, D. S. (1992). A Monte Carlo approach to nonnormal and nonlinear state space modeling. J. Am. Statist. Assoc. 87, 493-500.
GELFAND, A. E. & SMITH, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85, 398-409.
GORDON, K. & SMITH, A. F. M. (1990). Monitoring and modeling biomedical time series. J. Am. Statist. Assoc. 85, 328-37.
HAMILTON, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57, 357-84.
HARRISON, P. J. & STEVENS, C. F. (1976). Bayesian forecasting (with discussion). J. R. Statist. Soc. B 38, 205-47.
KITAGAWA, G. (1987). Non-Gaussian state space modeling of nonstationary time series (with discussion). J. Am. Statist. Assoc. 82, 1032-63.
KITAGAWA, G. & GERSCH, W. (1984). A smoothness priors-state space approach to time series with trend and seasonalities. J. Am. Statist. Assoc. 79, 378-89.
KOHN, R. & ANSLEY, C. F. (1987). A new algorithm for spline smoothing based on smoothing a stochastic process. SIAM J. Sci. Statist. Comput. 8, 33-48.
LIU, J., WONG, W. H. & KONG, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparison of estimators and augmentation schemes. Biometrika 81, 27-40.
MEINHOLD, R. J. & SINGPURWALLA, N. D. (1989). Robustification of Kalman filter models. J. Am. Statist. Assoc. 84, 479-86.
MORAN, P. A. P. (1975). The estimation of standard errors in Monte Carlo simulation experiments. Biometrika 62, 1-4.
SCHWEPPE, F. C. (1965). Evaluation of likelihood functions for Gaussian signals. IEEE Trans. Info. Theory 11, 61-70.
SHUMWAY, R. H. & STOFFER, D. S. (1991). Dynamic linear models with switching. J. Am. Statist. Assoc. 86, 763-9.
TIERNEY, L. (1994). Markov chains for exploring posterior distributions. Ann. Statist. To appear.
WAHBA, G. (1983). Bayesian 'confidence intervals' for the cross-validated smoothing spline. J. R. Statist. Soc. B 45, 133-50.
WEST, M. & HARRISON, J. (1989). Bayesian Forecasting and Dynamic Models, Springer Series in Statistics. New York: Springer-Verlag.

[Received June 1992. Revised July 1993]
