Modelling covariance matrices by the trigonometric separation strategy with application to hidden Markov models

Original Paper

Abstract

Bayesian inference on a covariance matrix is usually performed after placing an inverse-Wishart or a multivariate Jeffreys prior on it, but both priors, for different reasons, have drawbacks. As an alternative, the covariance matrix can be modelled by separating out the standard deviations and the correlations. This separation strategy takes advantage of the fact that it is usually more straightforward and flexible to set priors on the standard deviations and the correlations than on the covariance matrix directly. On the other hand, the priors must preserve the positive definiteness of the correlation matrix. This can be achieved by considering the Cholesky decomposition of the correlation matrix, whose entries are reparameterized using trigonometric functions. The efficiency of the trigonometric separation strategy (TSS) is shown through an application to hidden Markov models (HMMs) whose conditional distributions are multivariate normal. In the case of an unknown number of hidden states, estimation is conducted using a reversible jump Markov chain Monte Carlo algorithm based on split-and-combine and birth-and-death moves, whose design is straightforward because of the use of the TSS. Finally, an example in remote sensing is described, in which an HMM containing the TSS is used for the segmentation of a multi-colour satellite image.


References

  • Barnard J, McCulloch R, Meng X-L (2000) Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Stat Sin 10:1281–1311

  • Cappé O, Moulines E, Rydén T (2005) Inference in hidden Markov models. Springer, New York

  • Cappé O, Robert CP, Rydén T (2003) Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers. J R Stat Soc Ser B 65:679–700

  • Celeux G, Hurn M, Robert CP (2000) Computational and inferential difficulties with mixture posterior distributions. J Am Stat Assoc 95:957–970

  • Daniels MJ, Kass RE (1999) Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. J Am Stat Assoc 94:1254–1263

  • Daniels MJ, Pourahmadi M (2002) Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika 89:553–566

  • Daniels MJ, Pourahmadi M (2009) Modeling covariance matrices via partial autocorrelations. J Multivar Anal 100:2352–2363

  • Dellaportas P, Papageorgiou I (2006) Multivariate mixtures of normals with unknown number of components. Stat Comput 16:57–68

  • Dellaportas P, Plataniotis A, Titsias MK (2015) Scalable inference for a full multivariate stochastic volatility model. arXiv:1510.05257v1. Accessed 25 Aug 2017

  • Friel N, Pettitt AN, Reeves R, Wit E (2009) Bayesian inference in hidden Markov random fields for binary data defined on large lattices. J Comput Graph Stat 18:243–261

  • Frühwirth-Schnatter S (2001) Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J Am Stat Assoc 96:194–209

  • Gelman A, Meng X-L (1998) Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Stat Sci 13:163–185

  • Giordana N, Pieczynski W (1997) Estimation of generalised multisensor hidden Markov chains and unsupervised image segmentation. IEEE Trans Pattern Anal Mach Intell 19:465–475

  • Green PJ, Richardson S (2002) Hidden Markov models and disease mapping. J Am Stat Assoc 97:1055–1070

  • Hamilton JD (1994) Time series analysis. Princeton University Press, Princeton

  • Hoff PD (2009) A hierarchical eigenmodel for pooled covariance estimation. J R Stat Soc Ser B 71:971–992

  • Kamary K, Robert CP (2014) Reflecting about selecting noninformative priors. arXiv:1402.6257v3. Accessed 25 Aug 2017

  • Kim C-J (1993) Dynamic linear models with Markov-switching. J Econom 60:1–22

  • Krolzig H-M (1997) Markov-switching vector autoregressions: modelling, statistical inference and applications to business cycle analysis. Springer, Berlin

  • Leonard T, Hsu JST (1992) Bayesian inference for a covariance matrix. Ann Stat 20:1669–1696

  • Liechty JC, Liechty MW, Müller P (2004) Bayesian correlation estimation. Biometrika 91:1–14

  • Marin JM, Mengersen KL, Robert CP (2005) Bayesian modelling and inference on mixtures of distributions. In: Dey D, Rao CR (eds) Handbook of statistics 25. Elsevier, Amsterdam, pp 459–507

  • Møller J, Pettitt AN, Berthelsen KK, Reeves RW (2006) An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93:451–458

  • Murray I, Ghahramani Z, MacKay DJC (2006) MCMC for doubly-intractable distributions. In: Dechter R, Richardson T (eds) Proceedings of the twenty-second conference on uncertainty in artificial intelligence. AUAI Press, Arlington, pp 359–366

  • Paroli R, Spezia L (2010) Reversible jump MCMC methods and segmentation algorithms in hidden Markov models. Aust N Z J Stat 52:151–166

  • Pinheiro JC, Bates DM (1996) Unconstrained parametrizations for variance-covariance matrices. Stat Comput 6:289–296

  • Qian W, Titterington DM (1991) Estimation of parameters in hidden Markov models. Philos Trans R Soc Lond Ser A 337:407–428

  • Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion). J R Stat Soc Ser B 59:731–792

  • Scott SL, James GM, Sugar CA (2005) Hidden Markov models for longitudinal comparisons. J Am Stat Assoc 100:359–369

  • Seaman JW III, Seaman JW Jr, Stamey JD (2012) Hidden dangers of specifying noninformative priors. Am Stat 66:77–84

  • Smith M, Kohn R (2002) Parsimonious covariance matrix estimation for longitudinal data. J Am Stat Assoc 97:1141–1153

  • Spezia L (2010) Bayesian analysis of multivariate Gaussian hidden Markov models with an unknown number of regimes. J Time Ser Anal 31:1–11

  • Spezia L, Friel N, Gimona A (2017) Spatial hidden Markov models and species distribution. J Appl Stat, published online

  • Wang H, Pillai NS (2013) On a class of shrinkage priors for covariance matrix estimation. J Comput Graph Stat 22:689–707

  • Yang R, Berger JO (1994) Estimation of a covariance matrix using the reference prior. Ann Stat 22:1195–1211

  • Zucchini W, MacDonald IL, Langrock R (2016) Hidden Markov models for time series: an introduction using R, 2nd edn. Chapman & Hall/CRC Press, Boca Raton


Acknowledgements

This research was funded by the Scottish Government’s Rural and Environment Science and Analytical Services Division. The images in Fig. 6 were kindly produced by Laura Origgi. The satellite image was provided by Carlos Padovani. A discussion with Laura Poggio helped to clarify a few problems related to multispectral sensors. Comments from Mark Brewer, Glenn Marion, and two anonymous referees improved the quality of the final paper.

Author information


Corresponding author

Correspondence to Luigi Spezia.

Appendices

Appendix A

The vector \(\left( \mu ,S,\alpha ,\Omega ,m\right) ^{\prime }\) is estimated in a single RJMCMC run, by using the values of \(\mu \), S, \(\alpha \), and \(\Omega \) collected in those sweeps in which the number of hidden states equals the most frequent value of m. Nevertheless, to make the algorithm easier to follow, we first present the details of the MCMC algorithm for a given number m of states and then describe the RJMCMC when m is unknown.

Known number of states

In each sweep of the MCMC algorithm, the parameters \(\mu \), S, \(\alpha \), \(\Omega \) are accepted or rejected, after generating their elements by random walk moves. Random walks can only update real-valued parameters; thus, the positive quantities \(\omega _{j,i}\) and the standard deviations \(\sigma _{i,h}\) are mapped onto the real line through the logarithmic transformation, while the angles, which belong to the interval \(( A_{i,k,l};B_{i,k,l}) \), are mapped via the natural logarithm of \([ \alpha _{i,k,l}-A_{i,k,l}] /[ B_{i,k,l}-\alpha _{i,k,l}] \). The values \(\mu _{i,h}\), \(\ln \sigma _{i,h}\), \(\ln [ \alpha _{i,k,l}-A_{i,k,l}] /[ B_{i,k,l}-\alpha _{i,k,l}] \), and \(\ln \omega _{j,i}\), for any \(i,j=1,\ldots ,m\), \(h=1,\ldots ,p\), \(k=1,\ldots ,p-1\), and \(l=k+1,\ldots ,p\), belong to the interval \(\left( -\,\infty ;+\,\infty \right) \) and can be generated by the following random walk proposals:

$$\begin{aligned}&\mu _{i,h}=\mu _{i,h}^{(old)}+U_{M} \\&\ln \sigma _{i,h}=\ln \sigma _{i,h}^{(old)}+U_{\Sigma } \\&\ln \left[ \alpha _{i,k,l}-A_{i,k,l}\right] /\left[ B_{i,k,l}-\alpha _{i,k,l} \right] =\ln \left[ \alpha _{i,k,l}^{(old)}-A_{i,k,l}\right] /\left[ B_{i,k,l}-\alpha _{i,k,l}^{(old)}\right] +U_{A} \\&\ln \omega _{j,i}=\ln \omega _{j,i}^{(old)}+U_{\Omega }, \end{aligned}$$

where \(U_{\Psi }\sim \mathcal {N}\left( 0;\sigma _{\Psi }^{2}\right) \), with \( \Psi \in \left\{ M;\Sigma ;A;\Omega \right\} \).
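As an illustration, the following Python sketch performs one such random-walk sweep on the transformed scales; the step-size dictionary `step`, the arrays of angle bounds `A` and `B`, and the array shapes are assumptions made for the example, not quantities fixed by the paper.

```python
import numpy as np

rng = np.random.default_rng()

def propose(mu, sigma, alpha, omega, A, B, step):
    """One random-walk sweep on the transformed scales (a sketch).

    mu    : (m, p) means, updated on the natural scale
    sigma : (m, p) standard deviations, updated on the log scale
    alpha : angles in (A, B), updated via ln[(alpha - A)/(B - alpha)]
    omega : (m, m) positive transition weights, updated on the log scale
    """
    mu_new = mu + rng.normal(0.0, step["M"], size=mu.shape)
    sigma_new = np.exp(np.log(sigma) + rng.normal(0.0, step["S"], size=sigma.shape))
    z = (np.log((alpha - A) / (B - alpha))
         + rng.normal(0.0, step["A"], size=alpha.shape))
    alpha_new = (A + B * np.exp(z)) / (1.0 + np.exp(z))   # back-transform into (A, B)
    omega_new = np.exp(np.log(omega) + rng.normal(0.0, step["O"], size=omega.shape))
    return mu_new, sigma_new, alpha_new, omega_new
```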

The proposals \(\mu \), S, \(\alpha \), \(\Omega \) are accepted if \(u_{\Psi }\le \min \left\{ 1;A_{\Psi }\right\} \), where \(u_{\Psi }\) is a random number generated from the uniform distribution \(\mathcal {U}\left( 0;1\right) \) and \(A_{\Psi }\) are the acceptance ratios (\( \Psi \in \left\{ M;\Sigma ;A;\Omega \right\} \)), i.e.

$$\begin{aligned} A_{\Psi }=\text {likelihood ratio }\times \text { prior ratio }\times \text { ratio of the products of the Jacobians,} \end{aligned}$$

where the likelihood ratio is

$$\begin{aligned} p(y^{T}\mid \mu ,S,\alpha ,\Omega ,m) /p(y^{T}\mid \mu ^{(old)},S ^{(old)},\alpha ^{(old)},\Omega ^{(old)},m) , \end{aligned}$$

the prior ratios, respectively, are

$$\begin{aligned} \begin{array}{cccc} p\left( \mu \right) /p\left( \mu ^{(old)}\right) ;&p\left( S\right) /p\left( S^{(old)}\right) ;&p\left( \alpha \right) /p\left( \alpha ^{(old)}\right) ;&p\left( \Omega \right) /p\left( \Omega ^{(old)}\right) , \end{array} \end{aligned}$$

and the ratios of the products of the Jacobians of the logarithmic transformations of the \(\sigma _{i,h}\), the \(\alpha _{i,k,l}\), and the \(\omega _{j,i}\), respectively, are

$$\begin{aligned} \overset{m}{\underset{i=1}{\prod }}\overset{p}{\underset{h=1}{\prod }} \sigma _{i,h}\bigg / \overset{m}{\underset{i=1}{\prod }}\overset{p}{ \underset{h=1}{\prod }}\sigma _{i,h}^{(old)} , \quad \frac{{{\prod }_{i=1}^m}{{\prod }_{k=1}^{p-1}} {{\prod }_{l=k+1}^p}\left( B_{i,k,l}-\alpha _{i,k,l}\right) \left( \alpha _{i,k,l}-A_{i,k,l}\right) }{{{\prod }_{i=1}^m} {{\prod }_{k=1}^{p-1}}{{\prod }_{l=k+1}^p}\left( B_{i,k,l}-\alpha _{i,k,l}^{(old)}\right) \left( \alpha _{i,k,l}^{(old)}-A_{i,k,l}\right) }, \end{aligned}$$

and

$$\begin{aligned} \overset{m}{\underset{i=1}{\prod }}\overset{m}{\underset{j=1}{\prod }}\omega _{j,i}\bigg / \overset{m}{\underset{i=1}{\prod }}\overset{m}{\underset{j=1}{ \prod }}\omega _{j,i}^{(old)}. \end{aligned}$$
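In log form, each acceptance ratio is just the sum of three differences. Below is a minimal sketch for the standard-deviation update, assuming the log-likelihood and log-prior values are supplied elsewhere (e.g. by the forward recursion for the HMM likelihood and by the prior on S).

```python
import numpy as np

def log_accept_sigma(loglik_new, loglik_old, logprior_new, logprior_old,
                     sigma_new, sigma_old):
    """Log of the acceptance ratio A_Sigma for the standard-deviation update.

    The Jacobian of the transformation sigma -> ln(sigma) contributes the
    factor prod(sigma_new) / prod(sigma_old) to the acceptance ratio.
    """
    log_jac = np.sum(np.log(sigma_new)) - np.sum(np.log(sigma_old))
    return (loglik_new - loglik_old) + (logprior_new - logprior_old) + log_jac

# the proposal is accepted when log(u) <= min(0, log_A), with u ~ U(0, 1)
```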

At the end of each iteration, the MCMC sample is post-processed as in Marin et al. (2005). Assume that at iteration k of the MCMC algorithm we store the values \(\{\mu ^{(k)},S^{(k)}, \alpha ^{(k)},\Omega ^{(k)}\} \). Let H be the class of the m! permutations \(\eta _{j}\) of the labels (\(\eta _{j}\in H\), for any \( j=1,\ldots ,m!\)), so that \(\eta _{j}\ (\mu ^{(k)},S ^{(k)},\alpha ^{(k)},\Omega ^{(k)}) \) is the corresponding permutation of the parameters obtained at the k-th iteration, by which the means, the standard deviations, the angles, and the rows and columns of the transition matrix assume a new order.

After the burn-in, if a sample of size N (\(k=1,\ldots ,N\)) is simulated, the post-processing algorithm works as follows:

(i) compute the posterior mode \(\left\{ \mu ^{*}, S^{*},\alpha ^{*},\Omega ^{*}\right\} \), such that

$$\begin{aligned} \left\{ \mu ^{*},S^{*},\alpha ^{*}, \Omega ^{*}\right\} =\arg \underset{k=1,\ldots ,N}{\max } p(\mu ^{(k)},S^{(k)},\alpha ^{(k)},\Omega ^{(k)}\mid y^{T},m) \end{aligned}$$
(ii) for any \(k=1,\ldots ,N\), compute \(\eta ^{*}\) such that

$$\begin{aligned} \eta ^{*}=\arg \underset{\eta _{j}\in H}{\min }\left\| \eta _{j}(\mu ^{(k)},S^{(k)},\alpha ^{(k)}, \Omega ^{(k)}) -(\mu ^{*},S^{*},\alpha ^{*},\Omega ^{*}) \right\| \end{aligned}$$

and place

$$\begin{aligned} (\mu ^{(k)},S^{(k)},\alpha ^{(k)},\Omega ^{(k)}) =\eta ^{*}(\mu ^{(k)},S ^{(k)},\alpha ^{(k)},\Omega ^{(k)}) . \end{aligned}$$

In step (ii), for each entry of the MCMC sample, we first compute the Euclidean norm between every permuted vector of parameters and the posterior mode; then, we select the reordered vector nearest to the posterior mode. Therefore, the label switching problem is circumvented without imposing any artificial identifiability constraint.
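A minimal sketch of this post-processing, assuming the per-sweep parameters have been stacked state by state into a single array and that the unnormalised log-posterior of each draw is available; for brevity it permutes the state index only, whereas the full algorithm permutes the rows and columns of the transition matrix together.

```python
import itertools

import numpy as np

def relabel(draws, log_post):
    """Permute each MCMC draw so it is closest (in Euclidean norm) to the
    posterior mode, as in the post-processing of Marin et al. (2005).

    draws    : (N, m, d) array; draws[k, i] stacks the d parameters of
               state i (means, standard deviations, angles, ...) at sweep k
    log_post : (N,) unnormalised log-posterior of each draw
    """
    N, m, _ = draws.shape
    mode = draws[np.argmax(log_post)]                   # step (i): posterior mode
    perms = [np.array(p) for p in itertools.permutations(range(m))]
    out = np.empty_like(draws)
    for k in range(N):                                  # step (ii): best permutation
        dists = [np.linalg.norm(draws[k][p] - mode) for p in perms]
        out[k] = draws[k][perms[int(np.argmin(dists))]]
    return out
```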

Finally, the hidden sequence of the states is reconstructed. Each state is the maximizer of the corresponding smoothed probabilities: after obtaining the parameter estimates, we can compute backwards the smoothed probabilities of the states (Kim 1993), that is, the probabilities of each state, at each time, given all the observations and the estimates of the parameters, i.e. \(\widehat{ \mu }\), \(\widehat{\Sigma }\), \(\widehat{\Gamma }\):

$$\begin{aligned} x_{t}=\arg \max _{j}P( X_{t}=j\mid y^{T},\widehat{\mu } ,\widehat{\Sigma },\widehat{\Gamma },m) , \end{aligned}$$

with \(t=1,\ldots ,T\), where

$$\begin{aligned}&P( X_{t}=j\mid y^{T},\widehat{\mu },\widehat{ \Sigma },\widehat{\Gamma },m) \nonumber \\&\quad =P( X_{t}=j\mid y ^{t},\widehat{\mu },\widehat{\Sigma },\widehat{ \Gamma },m) \overset{m}{\underset{i=1}{\sum }}\frac{\gamma _{j,i} \text { }P( X_{t+1}=i\mid y^{T},\widehat{\mu }, \widehat{\Sigma },\widehat{\Gamma },m) }{P( X_{t+1}=i\mid y^{t},\widehat{\mu },\widehat{\Sigma },\widehat{\Gamma },m) }, \end{aligned}$$
(3)

for any \(t=T-1,\ldots ,1\) and any \(j=1,\ldots ,m\), starting from \(P( X_{T}\mid y^{T},\widehat{\mu },\widehat{\Sigma }, \widehat{\Gamma },m) \), which can be obtained by means of the filtered probabilities (2).
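A minimal sketch of recursion (3), assuming the \((T, m)\) matrix of filtered probabilities from (2) is already available from a forward pass.

```python
import numpy as np

def kim_smoother(filtered, gamma):
    """Backward recursion (3) for the smoothed state probabilities (Kim 1993).

    filtered : (T, m) filtered probabilities P(X_t = j | y^t, ...)
    gamma    : (m, m) estimated transition matrix, gamma[j, i] = P(j -> i)

    Returns the (T, m) smoothed probabilities P(X_t = j | y^T, ...).
    """
    T, m = filtered.shape
    smoothed = np.empty_like(filtered)
    smoothed[-1] = filtered[-1]                      # start from the filtered P(X_T | y^T)
    for t in range(T - 2, -1, -1):
        predicted = filtered[t] @ gamma              # P(X_{t+1} = i | y^t)
        smoothed[t] = filtered[t] * (gamma @ (smoothed[t + 1] / predicted))
    return smoothed

# the restored sequence is then x_t = argmax_j smoothed[t, j]
```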

Unknown number of states

Our RJMCMC algorithm is based on three main moves, which allow changes in the number of hidden states:

[i] update the parameters as described in the previous subsection;

[ii] split one state of the MVN-HMM into two, or merge two states into one;

[iii] give birth to a new state or kill an existing one.

In move [ii], the split is randomly chosen with probability \(b_{m}= \mathbb {I}(m=1)+0.5\cdot \mathbb {I}(2\le m<m_{\max })\), whereas the combine is randomly chosen with probability \(d_{m}=1-b_{m}\).

In the combine move, two adjacent states, e.g. \(i_{1}\) and \(i_{2}=i_{1}+1\), are randomly selected and merged into state \(i^{*}\), reducing the number of hidden states by one; the corresponding parameters are combined as follows:

$$\begin{aligned} \begin{array}{ll} \mu _{i^{*},h}=( \mu _{i_{1},h}+\mu _{i_{2},h}) /2 &{}\quad \text { for any }h=1,\ldots ,p \\ \sigma ^{2}_{i^{*},h}=( \sigma ^{2}_{i_{1},h}\cdot \sigma ^{2} _{i_{2},h}) ^{1/2} &{}\quad \text {for any }h=1,\ldots ,p \\ \alpha _{i^{*},k,l}=\alpha _{i_{1},k,l}+\alpha _{i_{2},k,l} &{}\quad \text {for any }k=1,\ldots ,p-1\text { and any }l=k+1,\ldots ,p \\ \omega _{i,i^{*}}=\omega _{i,i_{1}}+\omega _{i,i_{2}} &{}\quad \text {for any } i\ne i^{*} \\ \omega _{i^{*},j}=( \omega _{i_{1},j}\cdot \omega _{i_{2},j}) ^{1/2} &{}\quad \text {for any }j\ne i^{*} \\ \omega _{i^{*},i^{*}}=( \omega _{i_{1},i_{1}}\cdot \omega _{i_{2},i_{1}}) ^{1/2}+( \omega _{i_{1},i_{2}}\cdot \omega _{i_{2},i_{2}}) ^{1/2} &{} \end{array} \end{aligned}$$
(4)
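The merging rules in (4) transcribe directly into code. A sketch follows, in which `mu` and `var` are the \((m, p)\) arrays of state means and variances, `alpha` the \((m, q)\) array of angles with \(q = p(p-1)/2\), and `omega` the \((m, m)\) matrix of positive transition weights; deleting rows and columns \(i_1\), \(i_2\) and inserting \(i^{*}\) is left to the caller.

```python
import numpy as np

def combine_states(mu, var, alpha, omega, i1):
    """Merge adjacent states i1 and i2 = i1 + 1 into i* following (4)."""
    i2 = i1 + 1
    mu_star = 0.5 * (mu[i1] + mu[i2])            # arithmetic mean of the means
    var_star = np.sqrt(var[i1] * var[i2])        # geometric mean of the variances
    alpha_star = alpha[i1] + alpha[i2]           # angles add up
    omega_in = omega[:, i1] + omega[:, i2]       # omega_{i,i*}, for i != i*
    omega_out = np.sqrt(omega[i1] * omega[i2])   # omega_{i*,j}, for j != i*
    omega_diag = (np.sqrt(omega[i1, i1] * omega[i2, i1])
                  + np.sqrt(omega[i1, i2] * omega[i2, i2]))
    return mu_star, var_star, alpha_star, omega_in, omega_out, omega_diag
```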

In the split move, a state \(i^{*}\) is picked at random and split into the two adjacent states \(i_{1}\) and \(i_{2}\); the corresponding parameters are split as follows, respecting the six equalities in (4). First, we generate the following \(p(p+3)/2+2m+1\) random values:

  • \(u_{1,h}\) from \(\mathcal {N}\left( 0;0.5\right) \), for any \( h=1,\ldots ,p\);

  • \(u_{2,h}\) from \(\mathcal {G}\left( 1;5\right) \), for any \( h=1,\ldots ,p\);

  • \(u_{3,k,l}\) from \(\mathcal {U}\left( 0;1\right) \), for any \( k=1,\ldots ,p-1\) and any \(l=k+1,\ldots ,p\);

  • \(v_{i}\) from \(\mathcal {U}\left( 0;1\right) \), for any \(i\ne i^{*}\);

  • \(w_{j}\) from \(\mathcal {G}\left( 1;5\right) \), for any \(j\ne i^{*}\);

  • \(\rho \) from \(\mathcal {U}\left( 0;1\right) \);

  • \(\tau _{1}\) and \(\tau _{2}\) from \(\mathcal {G}\left( 1;5\right) \).

Then, we set:

$$\begin{aligned} \begin{array}{ll} \mu _{i_{1},h}=\mu _{i^{*},h}-\sigma _{i^{*},h,h}\cdot u_{1,h} &{}\quad \mu _{i_{2},h}=\mu _{i^{*},h}+\sigma _{i^{*},h,h}\cdot u_{1,h} \\ \sigma ^{2}_{i_{1},h}=\sigma ^{2}_{i^{*},h}\cdot u_{2,h} &{}\quad \sigma ^{2}_{i_{2},h}=\sigma ^{2}_{i^{*},h}/u_{2,h} \\ \alpha _{i_{1},k,l}=\alpha _{i^{*},k,l}\cdot u_{3,k,l} &{}\quad \alpha _{i_{2},k,l}=\alpha _{i^{*},k,l}\cdot \left( 1-u_{3,k,l}\right) \\ \omega _{i,i_{1}}=\omega _{i,i^{*}}\cdot v_{i} &{}\quad \omega _{i,i_{2}}=\omega _{i,i^{*}}\cdot \left( 1-v_{i}\right) \\ \omega _{i_{1},j}=\omega _{i^{*},j}\cdot w_{j} &{}\quad \omega _{i_{2},j}=\omega _{i^{*},j}/w_{j} \\ \omega _{i_{1},i_{1}}=\omega _{i^{*},i^{*}}\cdot \rho \cdot \tau _{1} &{}\quad \omega _{i_{2},i_{1}}=\omega _{i^{*},i^{*}}\cdot \rho /\tau _{1} \\ \omega _{i_{1},i_{2}}=\omega _{i^{*},i^{*}}\cdot \left( 1-\rho \right) \cdot \tau _{2} &{}\quad \omega _{i_{2},i_{2}}=\omega _{i^{*},i^{*}}\cdot \left( 1-\rho \right) /\tau _{2} \end{array} \end{aligned}$$
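A sketch of the emission part of the split move follows (the \(\omega \) entries follow the same pattern). Two conventions are assumptions about the notation: \(\mathcal {N}(0;0.5)\) is read with variance 0.5, and \(\mathcal {G}(1;5)\) with shape 1 and rate 5.

```python
import numpy as np

rng = np.random.default_rng()

def split_state(mu_s, var_s, alpha_s):
    """Draw the auxiliary variables and split state i* into i1 and i2.

    mu_s, var_s : (p,) mean vector and variances of the split state i*
    alpha_s     : (q,) angles of state i*, q = p(p-1)/2
    """
    p, q = mu_s.size, alpha_s.size
    u1 = rng.normal(0.0, np.sqrt(0.5), size=p)          # u_{1,h} ~ N(0; 0.5)
    u2 = rng.gamma(shape=1.0, scale=1.0 / 5.0, size=p)  # u_{2,h} ~ G(1; 5), rate 5
    u3 = rng.uniform(size=q)                            # u_{3,k,l} ~ U(0; 1)
    sd = np.sqrt(var_s)
    state1 = (mu_s - sd * u1, var_s * u2, alpha_s * u3)
    state2 = (mu_s + sd * u1, var_s / u2, alpha_s * (1.0 - u3))
    return state1, state2
```

Note that the six equalities in (4) hold by construction: for instance, the arithmetic mean of the two new means returns \(\mu _{i^{*},h}\), and the geometric mean of the two new variances returns \(\sigma ^{2}_{i^{*},h}\).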

The split move is accepted with probability \(\min \left\{ 1;A\right\} \), while the combine move is accepted with probability \(\min \left\{ 1;A^{-1}\right\} \). Let the tilde mark the parameters of the model with \(m+1\) states, as opposed to those entering the model with m states; the analytic expression of A is

$$\begin{aligned}&\frac{p\left( y^{T}\mid \widetilde{\mu },\widetilde{S}, \widetilde{\alpha },\widetilde{\Omega },m+1\right) }{ p\left( y^{T}\mid \mu ,S,\alpha ,\Omega ,m\right) }\cdot \frac{p(m+1)}{p(m)}\cdot \frac{p\left( \mu \right) \cdot p\left( \widetilde{S}\right) \cdot p\left( \widetilde{ \alpha }\right) \cdot p\left( \widetilde{\Omega }\right) }{p\left( \widetilde{\mu }\right) \cdot p\left( S\right) \cdot p\left( \alpha \right) \cdot p\left( \Omega \right) }\cdot \frac{d_{m+1}/m}{b_{m}/m}\cdot \nonumber \\&\quad \cdot \frac{m+1}{{{\prod }_{h=1}^p}p( u_{1,h}) \cdot {{\prod }_{h=1}^p}p( u_{2,h}) \cdot {{\prod }_{k=1}^{p-1}}{{\prod }_{l=k+1}^p} p( u_{3,k,l}) \cdot {{\prod }_{i\ne i^*}}p( v_{i}) \cdot {{\prod }_{j\ne i^*}}p( w_{j}) \cdot p( \rho ) \cdot p( \tau _{1}) \cdot p( \tau _{2}) }\cdot |J|, \nonumber \\ \end{aligned}$$
(5)

where \(p(m+1)/p(m)\) cancels out; \(b_{m}/m\) is the probability of splitting the special state \(i^{*}\), while \(d_{m+1}/m\) is the probability of merging one of the m pairs \(\left( i_{1};i_{2}\right) \) of adjacent states; the factor \(\left( m+1\right) \) is the ratio \(\left( m+1\right) !/m!\), in which the factorials arise from the exchangeability assumption on the states; J is the Jacobian of the transformation from \((\omega _{i,i^{*}},v_{i},\omega _{i^{*},j},w_{j},\omega _{i^{*},i^{*}},\rho ,\tau _{1},\tau _{2},\mu _{i^{*},h},u_{1,h},\sigma ^{2}_{i^{*},h},u_{2,h},\alpha _{i^{*},k,l},u_{3,k,l})\) to \((\widetilde{\omega } _{i,i_{1}},\widetilde{\omega }_{i,i_{2}},\widetilde{\omega }_{i_{1},j}, \widetilde{\omega }_{i_{2},j},\widetilde{\omega }_{i_{1},i_{1}},\widetilde{ \omega }_{i_{1},i_{2}},\widetilde{\omega }_{i_{2},i_{1}},\widetilde{\omega } _{i_{2},i_{2}},\widetilde{\mu }_{i_{1},h},\widetilde{\mu }_{i_{2},h},\widetilde{ \sigma }^{2}_{i_{1},h},\widetilde{\sigma }^{2}_{i_{2},h},\widetilde{\alpha }_{i_{1},k,l},\widetilde{\alpha }_{i_{2},k,l})\).

Note that the Jacobian can be decomposed into the product of five subdeterminants, i.e. \(J_{1}\) for the transformation from \(\left( \omega _{i,i^{*}},v_{i}\right) \) to \(( \widetilde{\omega }_{i,i_{1}}, \widetilde{\omega }_{i,i_{2}}) \), \(J_{2}\) for the transformation from \( ( \omega _{i^{*},j},w_{j}) \) to \(( \widetilde{\omega } _{i_{1},j},\widetilde{\omega }_{i_{2},j}) \), \(J_{3}\) for the transformation from \(( \omega _{i^{*},i^{*}},\rho ,\tau _{1},\tau _{2}) \) to \(( \widetilde{\omega }_{i_{1},i_{1}}, \widetilde{\omega }_{i_{1},i_{2}},\widetilde{\omega }_{i_{2},i_{1}}, \widetilde{\omega }_{i_{2},i_{2}}) \), \(J_{4}\) for the transformation from \(( \mu _{i^{*},h},u_{1,h},\sigma ^{2}_{i^{*},h},u_{2,h}) \) to \(( \widetilde{\mu }_{i_{1},h},\widetilde{\mu }_{i_{2},h}, \widetilde{\sigma }^{2}_{i_{1},h},\widetilde{\sigma }^{2}_{i_{2},h}) \), \( J_{5}\) for the transformation from \(( \alpha _{i^{*},k,l},u_{3,k,l}) \) to \(( \widetilde{\alpha }_{i_{1},k,l}, \widetilde{\alpha }_{i_{2},k,l}) \). Hence, we have \(|J|=|J_{1}\cdot J_{2}\cdot J_{3}\cdot J_{4}\cdot J_{5}|\), where

$$\begin{aligned} J_{1}= & {} \underset{i\ne i^{*}}{\prod }\left( -\,\omega _{i,i^{*}}\right) \qquad J_{2}=\left( -\,2\right) ^{m-1}\cdot \underset{j\ne i^{*} }{\prod }\frac{\omega _{i^{*},j}}{w_{j}}\qquad J_{3}=-\,2\cdot \omega _{i^{*},i^{*}}^{3}\cdot \frac{\rho \cdot \left( 1-\rho \right) }{ \tau _{1}\cdot \tau _{2}} \\ J_{4}= & {} \left( -\,4\right) ^{p}\cdot \overset{p}{\underset{h=1}{\prod }}\frac{ \sigma _{i^{*},h}^{3}}{u_{2,h}}\qquad J_{5}=\overset{p-1}{\underset{ k=1}{\prod }}\underset{l=k+1}{\overset{p}{\prod }}\left( -\,\alpha _{i^{*},k,l}\right) . \end{aligned}$$
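For numerical stability, \(|J|\) is best accumulated on the log scale; the following sketch does so under the decomposition above, dropping the signs since only \(|J|\) enters (5). The argument layout (in particular the ordering of `w` against the indices \(j\ne i^{*}\)) is an assumption for the example.

```python
import numpy as np

def log_abs_jacobian(omega, w, rho, tau1, tau2, var_s, u2, alpha_s, i_star):
    """log|J| = log|J1 * J2 * J3 * J4 * J5| for the split move.

    omega   : (m, m) transition weights of the m-state model
    w       : (m-1,) auxiliary values w_j, ordered as j runs over j != i_star
    var_s   : (p,) variances sigma^2 of state i_star (so sigma^3 = var^{3/2})
    u2      : (p,) auxiliary values u_{2,h}
    alpha_s : (q,) angles of state i_star, q = p(p-1)/2
    """
    m, p = omega.shape[0], var_s.size
    mask = np.arange(m) != i_star
    logJ1 = np.sum(np.log(omega[mask, i_star]))
    logJ2 = (m - 1) * np.log(2.0) + np.sum(np.log(omega[i_star, mask] / w))
    logJ3 = (np.log(2.0) + 3.0 * np.log(omega[i_star, i_star])
             + np.log(rho * (1.0 - rho)) - np.log(tau1 * tau2))
    logJ4 = p * np.log(4.0) + np.sum(1.5 * np.log(var_s) - np.log(u2))
    logJ5 = np.sum(np.log(np.abs(alpha_s)))
    return logJ1 + logJ2 + logJ3 + logJ4 + logJ5
```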

In move [iii], birth and death are chosen with probability \(b_{m}\) and \(d_{m}\), respectively. In a death move, a state is selected at random and then suppressed along with the corresponding parameters. In a birth move, a new state \(i^{*}\) is added to the previous m states and the new parameters are drawn from their respective priors; the position of the new state is generated at random. The birth move is accepted with probability \(\min \left\{ 1;A\right\} \), while the death move is accepted with probability \( \min \left\{ 1;A^{-1}\right\} \); the analytic expression of A is

$$\begin{aligned}&\frac{p\left( y^{T}\mid \widetilde{\mu },\widetilde{S}, \widetilde{\alpha },\widetilde{\Omega },m+1\right) }{ p\left( y^{T}\mid \mu ,S,\alpha ,\Omega ,m\right) }\cdot \frac{p(m+1)}{p(m)}\cdot \frac{p\left( \mu \right) \cdot p\left( \widetilde{S}\right) \cdot p\left( \widetilde{ \alpha }\right) \cdot p\left( \widetilde{\Omega }\right) }{p\left( \widetilde{\mu }\right) \cdot p\left( S\right) \cdot p\left( \alpha \right) \cdot p\left( \Omega \right) }\\&\quad \cdot \frac{d_{m+1}/\left( m+1\right) }{b_{m}/\left( m+1\right) }\cdot \\&\quad \cdot \frac{m+1}{p( \mu _{i^{*}}) \cdot p( S_{i^{*}}) \cdot p( \alpha _{i^{*}}) \cdot {{\prod }_{i\ne i^*}}p( \omega _{i,i^{*}}) \cdot {{\prod }_{j\ne i^{*}}}p( \omega _{i^{*},j}) \cdot p( \omega _{i^{*},i^{*}}) }\cdot |J|, \end{aligned}$$

where \(p(m+1)/p(m)\) cancels out; the ratio of the products of the prior densities, multiplied by the reciprocal of the product of the densities of the new-born parameters, is equal to 1; \(b_{m}/\left( m+1\right) \) is the probability of giving birth to a new state in the special position \(i^{*} \), while \(d_{m+1}/\left( m+1\right) \) is the probability of killing a special state; the factor \(\left( m+1\right) \) has the same meaning as in (5); the Jacobian J is 1.
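These cancellations make the birth acceptance ratio especially simple, as the following sketch shows; the encoding of the move probabilities as sequences `b` and `d` indexed by the current number of states is an assumption for the example.

```python
import numpy as np

def log_accept_birth(loglik_new, loglik_old, m, b, d):
    """Log acceptance ratio of the birth move.

    Because the new-born parameters are drawn from their priors, the prior
    ratio times the reciprocal of the proposal density equals 1 and the
    Jacobian is 1; only the likelihood ratio, the move probabilities, and
    the exchangeability factor (m + 1) survive.
    """
    return (loglik_new - loglik_old
            + np.log(d[m + 1]) - np.log(b[m])
            + np.log(m + 1.0))
```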

Appendix B

Estimates (posterior means) of the transition matrix:

$$\begin{aligned} \mathbf {\Gamma }=\left[ \begin{array}{ccc} 0.960 &{}\quad 0.029 &{}\quad 0.011 \\ 0.045 &{}\quad 0.921 &{}\quad 0.034 \\ 0.017 &{}\quad 0.032 &{}\quad 0.951 \end{array} \right] \end{aligned}$$

Estimates (posterior means) of the mean vectors:

$$\begin{aligned} \mathbf {\mu }_{1}= & {} \left( -\,1.292,-\,0.026,1.222\right) \quad \mathbf {\mu } _{2}=\left( -\,0.282,-\,0.064,0.467\right) \\ \mathbf {\mu }_{3}= & {} \left( -\,1.416,0.068,1.318\right) \end{aligned}$$

Estimates (posterior means) of the covariance matrices:

$$\begin{aligned} \mathbf {\Sigma }_{1}= & {} \left[ \begin{array}{ccc} 0.144 &{}\quad 0.273 &{}\quad 0.117 \\ 0.273 &{}\quad 1.309 &{}\quad 0.260 \\ 0.117 &{}\quad 0.260 &{}\quad 0.165 \end{array} \right] \quad \mathbf {\Sigma }_{2}=\left[ \begin{array}{ccc} 0.402 &{}\quad 0.249 &{}\quad 0.170 \\ 0.249 &{}\quad 0.836 &{}\quad 0.210 \\ 0.170 &{}\quad 0.210 &{}\quad 0.254 \end{array} \right] \\ \mathbf {\Sigma }_{3}= & {} \left[ \begin{array}{ccc} 0.195 &{}\quad 0.210 &{}\quad 0.120 \\ 0.210 &{}\quad 0.789 &{}\quad 0.190 \\ 0.120 &{}\quad 0.190 &{}\quad 0.105 \end{array} \right] \end{aligned}$$

About this article

Cite this article

Spezia, L. Modelling covariance matrices by the trigonometric separation strategy with application to hidden Markov models. TEST 28, 399–422 (2019). https://doi.org/10.1007/s11749-018-0580-8

