SUMMARY
We show how to use the Gibbs sampler to carry out Bayesian inference on a linear state space model with errors that are a mixture of normals and coefficients that can switch over time. Our approach simultaneously generates the whole of the state vector given the mixture and coefficient indicator variables and simultaneously generates all the indicator variables conditional on the state vectors. The states are generated efficiently using the Kalman filter. We illustrate our approach by several examples and empirically compare its performance to another Gibbs sampler where the states are generated one at a time. The empirical results suggest that our approach is both practical to implement and dominates the Gibbs sampler that generates the states one at a time.
Some key words: Diffuse parameter; Kalman filter; Markov chain Monte Carlo; Mixture of normals; Spline smoothing; Switching regression; Trend plus seasonal model.
1. INTRODUCTION
Consider the linear state space model

    y(t) = h(t)'x(t) + e(t),    (1·1)
    x(t + 1) = F(t + 1)x(t) + u(t + 1),    (1·2)

where y(t) is a scalar observation and x(t) is an m × 1 state vector. We assume that the error sequences {e(t), t ≥ 1} and {u(t), t ≥ 1} are mixtures of normals. Let θ be a parameter vector whose value determines h(t) and F(t) and also the distributions of e(t) and u(t). Further details of the structure of the model are given in § 2·1. Equation (1·1) is called the observation equation and (1·2) the state transition equation. When e(t) and u(t) are independent Gaussian sequences, unknown parameters are usually estimated by maximum likelihood following Schweppe (1965). The Kalman filter and state space smoothing algorithms are used to carry out the computations.
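As an illustration, a minimal sketch of the Kalman filter recursions for (1·1)-(1·2) in the Gaussian case is given below (Python; the names, and the assumption of time-invariant h, F and error variances sigma2 and U, are illustrative only and are not fixed by the paper).

    import numpy as np

    def kalman_filter(y, h, F, sigma2, U, x0, S0):
        """Kalman filter for y(t) = h'x(t) + e(t), x(t+1) = F x(t) + u(t+1),
        with e(t) ~ N(0, sigma2) and u(t) ~ N(0, U).
        Returns the filtered means x(t|t) and variances S(t|t)."""
        n, m = len(y), len(x0)
        x_filt = np.zeros((n, m))
        S_filt = np.zeros((n, m, m))
        x_pred, S_pred = x0, S0                 # x(1|0), S(1|0)
        for t in range(n):
            # measurement update: condition x(t) on y(t)
            R = h @ S_pred @ h + sigma2         # innovation variance
            K = S_pred @ h / R                  # Kalman gain
            x_filt[t] = x_pred + K * (y[t] - h @ x_pred)
            S_filt[t] = S_pred - np.outer(K, h @ S_pred)
            # time update: predict x(t+1) given Y^t
            x_pred = F @ x_filt[t]
            S_pred = F @ S_filt[t] @ F.T + U
        return x_filt, S_filt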
There are a number of applications in the literature where it is necessary to go beyond the Gaussian linear state space model: e.g. Harrison & Stevens (1976), Gordon & Smith (1990), Hamilton (1989) and Shumway & Stoffer (1991). Meinhold & Singpurwalla (1989) robustify the Kalman filter by taking both e(t) and u(t) to be t distributed. A general approach to estimating non-Gaussian and nonlinear state space models is given by Kitagawa (1987). Except when the dimension of the state vector is very small, Kitagawa's approach appears computationally intractable at this stage. Various approximate filtering and smoothing algorithms for nonlinear and non-Gaussian state space models have been given in the literature. See, for example, Anderson & Moore (1979, Ch. 8) and West & Harrison (1989).
Thus to generateX fromp(X Iy") we first generatex(n) from p{x(n)IYn} and then for
t = n - 1, ..., 1 we generate x(t) from p{x(t)IYt, x(t + 1)}. Because p{x(n)IYnf} and
p{x(t)IYt, x(t + 1)} are Gaussian densities,in order to generateall the x(t) we need to
compute E{x(n) Iyn } and var {x(n)IynI} and
E{x(t)IJYt x(t + 1)}, var {x(t)I Yt, x(t + 1)} (t = n-1,.. ., 1).
Let x(tIj)=E{x(t)IYi} and S(tIj)=var{x(t)IYi}. We obtain x(tlt) and S(tlt) for t=
1, ... , n using the Kalman filter (Anderson & Moore, 1979, p. 105). To obtain
E{x(t)l IYt,x(t + 1)} and var {x(t)j IYt,x(t + 1)} we treat the equation
x(t+ 1)=F(t+ 1)x(t)+u(t+ 1)
as m additionalobservationson the state vectorx(t) and applythe Kalmanfilterto them.
Details are given in Appendix1.
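Concretely, the backward pass can be sketched as follows (Python, illustrative names; the filtered moments x_filt and S_filt are assumed to come from a Kalman filter pass such as the one sketched in § 1, and F and U are the time-invariant transition matrix and state noise covariance assumed there). The conditioning on x(t + 1) is done here in a single vector step rather than row by row as in Appendix 1.

    import numpy as np

    def backward_sample_states(x_filt, S_filt, F, U, rng):
        """Draw X = {x(1), ..., x(n)} from p(X | Y^n): sample x(n) from
        N{x(n|n), S(n|n)}, then x(t) from p{x(t) | Y^t, x(t+1)} for
        t = n-1, ..., 1."""
        n, m = x_filt.shape
        X = np.zeros((n, m))
        X[-1] = rng.multivariate_normal(x_filt[-1], S_filt[-1])
        for t in range(n - 2, -1, -1):
            # regard x(t+1) = F x(t) + u(t+1) as extra observations on x(t)
            Sp = F @ S_filt[t] @ F.T + U                # var{x(t+1) | Y^t}
            G = S_filt[t] @ F.T @ np.linalg.inv(Sp)     # regression of x(t) on x(t+1)
            mean = x_filt[t] + G @ (X[t + 1] - F @ x_filt[t])
            var = S_filt[t] - G @ F @ S_filt[t]
            var = (var + var.T) / 2                     # guard against round-off asymmetry
            X[t] = rng.multivariate_normal(mean, var)
        return X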
In many applications the distribution of the initial state vector x(1) is partly unknown and this part is usually taken as a constant to be estimated, or equivalently to have a diffuse distribution, making x(1) partially diffuse. By this we mean that x(1) ~ N(0, S^[0] + kS^[1]) with k → ∞. The generation algorithm can be applied as outlined above and in Appendix 1, except that now we use the modified filtering and smoothing algorithms of Ansley & Kohn (1990).
Remark. In specific models it may be possible to use a faster filtering algorithm than the Kalman filter to obtain x(t|t) and S(t|t). See, for example, the fast filtering algorithms of Anderson & Moore (1979, Ch. 6) when e(t) and u(t) are Gaussian and F(t) and h(t) are constant. A referee has suggested the use of a Metropolis step within the Gibbs sampler to speed up the generation of the states (Tierney, 1994). To do so it is necessary to find a candidate density q(X | K, θ) for generating X which is faster to generate from than p(X | Y^n, K, θ) and yet is close enough to it so that the rejection rate in the Metropolis step is small.
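In outline, such a step is a standard independence Metropolis update for the block of states; a generic sketch (Python, with the log densities supplied by the user as functions, all names illustrative) is:

    import numpy as np

    def metropolis_step(X_current, draw_candidate, log_p, log_q, rng):
        """One independence Metropolis step for the block of states X within
        the Gibbs sampler: log_p(X) is log p(X | Y^n, K, theta) up to a constant,
        log_q(X) is the log candidate density, draw_candidate() samples from q."""
        X_new = draw_candidate()
        log_alpha = (log_p(X_new) - log_p(X_current)) + (log_q(X_current) - log_q(X_new))
        if np.log(rng.uniform()) < min(0.0, log_alpha):
            return X_new          # accept the candidate block of states
        return X_current          # reject: keep the current states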
2·3. Generating the indicator variables
Recall that K(t) (t = 1, ..., n) is a vector of indicator variables showing which members of the mixture each of e(t) and u(t) belong to and which values h(t) and F(t) take. Let K^t = {K(1), ..., K(t)} and X^t = {x(1), ..., x(t)}. For notational convenience we omit dependence on θ. Conditionally on K and θ, e(t) and u(t) are independent for t = 1, ..., n in (1·1) and (1·2). This implies that

    p{y(t) | Y^{t-1}, X^t, K^t} = p{y(t) | x(t), K(t)},  p{x(t) | X^{t-1}, K^t} = p{x(t) | x(t - 1), K(t)}.

We assume that the prior distribution of K is Markov. The next lemma shows how to generate the whole of K given Y^n, X and θ. We omit its proof as it is straightforward.
LEMMA 2·2. We have

    p(K | Y^n, X) = p{K(n) | Y^n, X} ∏_{t=1}^{n-1} p{K(t) | Y^t, X^t, K(t + 1)}.
Thus to generate K from p(K | Y^n, X) we first generate K(n) from p{K(n) | Y^n, X} and then for t = n - 1, ..., 1 we generate K(t) from p{K(t) | Y^t, X^t, K(t + 1)}. Because p{K(n) | Y^n, X} and p{K(t) | Y^t, X^t, K(t + 1)} are discrete valued we can generate from them easily, once we have calculated them. To calculate p{K(n) | Y^n, X} and p{K(t) | Y^t, X^t, K(t + 1)} we use recursive filtering equations (Anderson & Moore, 1979, Ch. 8) in a similar way to our use of the Kalman filter in § 2·2. Details are in Appendix 2. Because K(t) is discrete valued, the filtering equations can be evaluated efficiently.
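A sketch of this filter-then-sample-backwards recursion for K (Python, illustrative names; it assumes a time-homogeneous transition matrix P, a uniform initial distribution for K(1), and that the log of p{y(t) | x(t), K(t)} p{x(t) | x(t - 1), K(t)} has been precomputed for every possible value of K(t)) is:

    import numpy as np

    def sample_indicators(log_lik, P, rng):
        """Draw K = {K(1), ..., K(n)} given Y^n, X and theta.
        log_lik[t, k] = log[p{y(t)|x(t), K(t)=k} p{x(t)|x(t-1), K(t)=k}],
        P[j, k] = p{K(t)=k | K(t-1)=j}.  Forward filter, then sample backwards."""
        n, M = log_lik.shape
        filt = np.zeros((n, M))                     # p{K(t) | Y^t, X^t}
        prior = np.full(M, 1.0 / M)                 # assumed uniform p{K(1)}
        for t in range(n):
            pred = prior if t == 0 else filt[t - 1] @ P   # p{K(t) | Y^{t-1}, X^{t-1}}
            w = pred * np.exp(log_lik[t] - log_lik[t].max())
            filt[t] = w / w.sum()
        K = np.zeros(n, dtype=int)
        K[-1] = rng.choice(M, p=filt[-1])
        for t in range(n - 2, -1, -1):
            # p{K(t) | Y^t, X^t, K(t+1)} is proportional to p{K(t+1)|K(t)} p{K(t)|Y^t, X^t}
            w = filt[t] * P[:, K[t + 1]]
            K[t] = rng.choice(M, p=w / w.sum())
        return K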
3. EXAMPLES
3·1. General

We illustrate the results in § 2 and Appendices 1 and 2 by applying them to four examples. The first is a stochastic trend model giving a cubic spline smoothing estimate of the signal. The second example is a trend plus seasonal model. In the third example the errors e(t) are a discrete mixture of normals with Markov dependence. The fourth example discusses switching regression. The first two examples compare empirically the performance of the approach that generates all the states simultaneously with the approach that generates the states one at a time.
The vector of unknown parameters is θ = (σ², τ²)' and from (3·3) the initial state vector x(t_1) has a diffuse distribution. For a further discussion of this model and its connection with spline smoothing see Wahba (1983) and Kohn & Ansley (1987).
To complete the Bayesian specification of the model we impose the improper priors p(σ²) ∝ (1/σ²) exp(-β_σ/2σ²), with β_σ small, and p(τ²) ∝ 1. As we only use the first element of x(t) in the Gibbs sampler let G = {g(t_1), ..., g(t_n)}. The vectors G and θ are generated as follows. For given θ, X is generated as explained in § 2·2 and Appendix 1. We then extract G. To generate τ², we can show that

    lim_{k→∞} p(τ² | G; k) ∝ (τ²)^{-n/2+1} exp{ -(1/2τ²) Σ_i e(i)²/R(i) }.
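Both variance parameters are then drawn from inverse gamma full conditionals of this type. As an illustration, and assuming the spline model's observation equation y(t_i) = g(t_i) + e(t_i) with e(t_i) ~ N(0, σ²) together with the prior on σ² stated above (both are assumptions made only for this sketch), the σ² draw can be written as (Python, illustrative names):

    import numpy as np

    def draw_sigma2(y, g, beta_sigma, rng):
        """Draw sigma^2 from its inverse gamma full conditional given Y^n and G,
        assuming y(t_i) = g(t_i) + e(t_i), e(t_i) ~ N(0, sigma^2), and the prior
        p(sigma^2) proportional to sigma^{-2} exp(-beta_sigma / 2 sigma^2)."""
        n = len(y)
        ss = np.sum((np.asarray(y) - np.asarray(g)) ** 2)
        shape = n / 2.0                       # inverse gamma shape
        scale = (beta_sigma + ss) / 2.0       # inverse gamma scale
        return 1.0 / rng.gamma(shape, 1.0 / scale)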
Fig. 1. Example 1: generated values of σ² and τ² against iteration j, with starting values σ² = 1 and X = E(X | σ² = 1, τ² = 1). In (a) and (b) the states are generated one at a time and in (c) and (d) they are generated simultaneously.
To study the relative efficiencies of the two algorithms once the Gibbs sampler has converged we use the marginal likelihood estimates of σ² and τ² as starting values. For a
Fig. 2. Example 1: sample autocorrelation function (ACF) against lag for g(0·25) and τ². In (a) and (b) the states are generated one at a time and in (c) and (d) they are generated simultaneously.
where N = 106 000 and g_j(t) is the jth Gibbs iterate of g(t) during the sampling period. The smoothed values E{g(t) | Y^n, θ_j} in (3·4) are obtained using the smoothing algorithm of Ansley & Kohn (1990). For the algorithm generating the states one at a time the histogram estimates are as in (3·4) while the mixture estimates are computed as in § 2 of Carlin et al. (1992). The results of Gelfand & Smith (1990) and Liu et al. (1994) suggest that mixture estimates will usually have smaller variance than histogram estimates. We
Fig. 3. Example 2: (a) shows the data (dots) together with the function g(t) (dashes) and the mixture estimates of g(t) (solid); (b) and (c) show the generated values of σ² and τ² against iteration j. The states are generated simultaneously.
§§ 2 and 3·2 apply to the generation of X and θ. Thus we will only consider the generation of K. We first assume that a priori the K(t) come from a Markov chain with
ACKNOWLEDGEMENT
We would like to thank the Division of Mathematics and Statistics, CSIRO, and the
Australian Research Council for partial support. We would also like to thank David Wong
for help with the computations.
APPENDIX 1
Algorithm to generate state vector

We show how to generate X conditional on Y^n, K and θ. We omit dependence on K and θ, and, as in § 2·2, let

    x(t|j) = E{x(t) | Y^j},  S(t|j) = var{x(t) | Y^j}.
For t = 1, ..., n the conditional mean x(t|t) and the conditional variance S(t|t) are obtained using the Kalman filter (Anderson & Moore, 1979, p. 105).
Using Lemma 2·1 we show how to generate x(n), ..., x(1) in that order conditioning on Y^n. First, p{x(n) | Y^n} is normal with mean x(n|n) and variance S(n|n). To generate x(t) conditional on Y^t and x(t + 1) we note that we can regard the equation
    S(t|t, i) = S(t|t, i - 1) - S(t|t, i - 1)F_i(t + 1)F_i(t + 1)'S(t|t, i - 1)/R(t, i).
We therefore obtain

    x(t|t, m) = E{x(t) | Y^t, x(t + 1)},  S(t|t, m) = var{x(t) | Y^t, x(t + 1)}.

It is now straightforward to generate x(t) conditionally on Y^t and x(t + 1), as it is normally distributed with mean x(t|t, m) and variance S(t|t, m).
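A sketch of this sequential conditioning (Python, illustrative names) is given below. For simplicity the state noise covariance is assumed diagonal here, so that the m rows of the transition equation can be processed as conditionally independent scalar observations; otherwise the transition equation would be pre-whitened first. This is an assumption made only for the sketch.

    import numpy as np

    def condition_on_next_state(x_tt, S_tt, x_next, F, U_diag):
        """Compute x(t|t, m) = E{x(t) | Y^t, x(t+1)} and S(t|t, m) =
        var{x(t) | Y^t, x(t+1)} by treating the m rows of
        x(t+1) = F(t+1) x(t) + u(t+1) as m additional scalar observations
        on x(t), processed one at a time.  U_diag holds the (assumed
        diagonal) state noise variances."""
        x, S = x_tt.copy(), S_tt.copy()
        m = len(x_next)
        for i in range(m):
            Fi = F[i]                                  # ith row of F(t+1)
            R = Fi @ S @ Fi + U_diag[i]                # R(t, i)
            K = S @ Fi / R
            x = x + K * (x_next[i] - Fi @ x)           # x(t|t, i)
            S = S - np.outer(K, Fi @ S)                # S(t|t, i)
        return x, S

The draw of x(t) is then from N{x(t|t, m), S(t|t, m)}.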
APPENDIX 2
Algorithm to generate indicator variables

We show how to generate K conditional on Y^n, X and θ. We omit dependence on θ. Let k_1, ..., k_m be the possible values assumed by K(t) (t = 1, ..., n) and suppose that the transition matrices specifying p{K(t) | K(t - 1)} (t = 2, ..., n) are known. We note that if y(t) is observed then

    p{K(t) | Y^t, X^t} ∝ p{y(t) | x(t), K(t)}p{x(t) | x(t - 1), K(t)}p{K(t) | Y^{t-1}, X^{t-1}},
REFERENCES
ANDERSON, B. D. O. & MOORE, J. B. (1979). Optimal Filtering. Englewood Cliffs, New Jersey: Prentice Hall.
ANSLEY, C. F. & KOHN, R. (1985). Estimation, filtering and smoothing in state space models with partially diffuse initial conditions. Ann. Statist. 13, 1286-316.
ANSLEY, C. F. & KOHN, R. (1990). Filtering and smoothing in state space models with partially diffuse initial conditions. J. Time Ser. Anal. 11, 277-93.
BOX, G. E. P. & TIAO, G. C. (1968). A Bayesian approach to some outlier problems. Biometrika 55, 119-29.
CARLIN, B. P., POLSON, N. G. & STOFFER, D. S. (1992). A Monte Carlo approach to nonnormal and nonlinear state space modeling. J. Am. Statist. Assoc. 87, 493-500.
GELFAND, A. E. & SMITH, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85, 398-409.
GORDON, K. & SMITH, A. F. M. (1990). Monitoring and modeling biomedical time series. J. Am. Statist. Assoc. 85, 328-37.
HAMILTON, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57, 357-84.
HARRISON, P. J. & STEVENS, C. F. (1976). Bayesian forecasting (with discussion). J. R. Statist. Soc. B 38, 205-47.
KITAGAWA, G. (1987). Non-Gaussian state space modeling of nonstationary time series (with discussion). J. Am. Statist. Assoc. 82, 1032-63.
KITAGAWA, G. & GERSCH, W. (1984). A smoothness priors-state space approach to time series with trend and seasonalities. J. Am. Statist. Assoc. 79, 378-89.
KOHN, R. & ANSLEY, C. F. (1987). A new algorithm for spline smoothing based on smoothing a stochastic process. SIAM J. Sci. Statist. Comput. 8, 33-48.
LIU, J., WONG, W. H. & KONG, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparison of estimators and augmentation schemes. Biometrika 81, 27-40.
MEINHOLD, R. J. & SINGPURWALLA, N. D. (1989). Robustification of Kalman filter models. J. Am. Statist. Assoc. 84, 479-86.
MORAN, P. A. P. (1975). The estimation of standard errors in Monte Carlo simulation experiments. Biometrika 62, 1-4.
SCHWEPPE, C. F. (1965). Evaluation of likelihood functions for Gaussian signals. IEEE Trans. Info. Theory 11, 61-70.
SHUMWAY, R. H. & STOFFER, D. S. (1991). Dynamic linear models with switching. J. Am. Statist. Assoc. 86, 763-9.
TIERNEY, L. (1994). Markov chains for exploring posterior distributions. Ann. Statist. To appear.
WAHBA, G. (1983). Bayesian 'confidence intervals' for the cross-validated smoothing spline. J. R. Statist. Soc. B 45, 133-50.
WEST, M. & HARRISON, J. (1989). Bayesian Forecasting and Dynamic Models, Springer Series in Statistics. New York: Springer-Verlag.