Abstract
Mixtures of common \(t\)-factor analyzers (MCtFA) have been shown to be effective in robustifying mixtures of common factor analyzers (MCFA) for model-based clustering of high-dimensional data with heavy tails. However, the MCtFA model may still lack robustness against observations whose distributions are highly asymmetric. This paper presents a further robust extension of the MCFA and MCtFA models, called the mixture of common restricted skew-t factor analyzers (MCrstFA), obtained by assuming a restricted multivariate skew-t distribution for the common factors. The MCrstFA model accommodates severely non-normal (skewed and leptokurtic) random phenomena while preserving the parsimony of the factor-analytic representation and allowing graphical visualization in low-dimensional plots. A computationally feasible expectation conditional maximization either (ECME) algorithm is developed to carry out maximum likelihood estimation. The numbers of factors and mixture components are determined simultaneously using common penalized likelihood criteria. The usefulness of the proposed model is illustrated with simulated and real datasets, and experimental results demonstrate its superiority over some existing competitors.
References
Aitken AC (1926) On Bernoulli’s numerical solution of algebraic equations. Proc R Soc Edinb 46:289–305
Arellano-Valle RB, Genton MG (2005) On fundamental skew distributions. J Multivar Anal 96:93–116
Azzalini A (2014) The skew-normal and related families. IMS monographs series. Cambridge University Press, Cambridge
Azzalini A, Browne RP, Genton MG, McNicholas PD (2016) On nomenclature for, and the relative merits of, two formulations of skew distributions. Stat Probab Lett 110:201–206
Baek J, McLachlan GJ (2011) Mixtures of common \(t\)-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27:1269–1276
Baek J, McLachlan GJ, Flack LK (2010) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32:1–13
Barndorff-Nielsen O, Shephard N (2001) Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. J Roy Stat Soc Ser B 63:167–241
Beal MJ (2003) Variational algorithms for approximate Bayesian inference. Ph.D. thesis, The University of London, London, UK
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22:719–725
Cabral CR, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Comput Stat Data Anal 56:126–142
Castro LM, Costa DR, Prates MO, Lachos VH (2015) Likelihood-based inference for Tobit confirmatory factor analysis using the multivariate Student-\(t\) distribution. Stat Comput 25:1163–1183
Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, Lai KM, Ji J, Dudoit S, Ng IO, Van De Rijn M, Botstein D, Brown PO (2002) Gene expression patterns in human liver cancers. Mol Biol Cell 13:1929–1939
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc B 39:1–38
Ghahramani Z, Beal M (2000) Variational inference for Bayesian mixture of factor analysers. In: Solla S, Leen T, Muller K-R (eds) Advances in neural information processing systems. MIT Press, Cambridge
Ghahramani Z, Hinton GE (1997) The EM algorithm for factor analyzers. Technical Report No. CRG-TR-96-1, The University of Toronto, Toronto
Hartigan JA, Wong MA (1979) Algorithm AS 136: a K-means clustering algorithm. J R Stat Soc C 28:100–108
Hubert LJ, Arabie P (1985) Comparing partitions. J Classif 2:193–218
Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37:183–233
Lachos VH, Moreno EJL, Chen K, Cabral CRB (2017) Finite mixture modeling of censored data using the multivariate Student-\(t\) distribution. J Multivar Anal 159:151–167
Lee SX, McLachlan GJ (2014) Finite mixtures of multivariate skew \(t\)-distributions: some recent and new results. Stat Comput 24:181–202
Lee SX, McLachlan GJ (2016) Finite mixtures of canonical fundamental skew \(t\)-distributions: the unification of the restricted and unrestricted skew \(t\)-mixture models. Stat Comput 26:573–589
Lee YW, Poon SH (2011) Systemic and systematic factors for loan portfolio loss distribution. Econometrics and applied economics workshops, pp 1–61. School of Social Science, University of Manchester
Lee WL, Chen YC, Hsieh KS (2003) Ultrasonic liver tissues classification by fractal feature vector based on M-band wavelet transform. IEEE Trans Med Imaging 22:382–392
Lin TI (2014) Learning from incomplete data via parameterized \(t\) mixture models through eigenvalue decomposition. Comput Stat Data Anal 71:183–195
Lin TI, Wu PH, McLachlan GJ, Lee SX (2015) A robust factor analysis model using the restricted skew-\(t\) distribution. TEST 24:510–531
Lin TI, McLachlan GJ, Lee SX (2016) Extending mixtures of factor models using the restricted multivariate skew-normal distribution. J Multivar Anal 143:398–413
Lin TI, Wang WL, McLachlan GJ, Lee SX (2018) Robust mixtures of factor analysis models using the restricted multivariate skew-\(t\) distribution. Stat Model 28:50–72
Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81:633–648
McLachlan GJ, Basford KE (1988) Mixture models: inference and applications to clustering. Marcel Dekker, New York
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, New York
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
McNicholas PD, Murphy TB (2008) Parsimonious Gaussian mixture models. Stat Comput 18:285–296
McNicholas PD, Murphy TB, McDaid AF, Frost D (2010) Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput Stat Data Anal 54:711–723
Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278
Murray PM, Browne RP, McNicholas PD (2014a) Mixtures of skew-\(t\) factor analyzers. Comput Stat Data Anal 77:326–335
Murray PM, McNicholas PD, Browne RP (2014b) Mixtures of common skew-\(t\) factor analyzers. Stat 3:68–82
Murray PM, Browne RP, McNicholas PD (2017a) A mixture of SDB skew-\(t\) factor analyzers. Econom Stat 3:160–168
Murray PM, Browne RP, McNicholas PD (2017b) Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. J Multivar Anal 161:141–156
Ouyang M, Welsh W, Georgopoulos P (2004) Gaussian mixture clustering and imputation of microarray data. Bioinformatics 20:917–923
Prates MO, Cabral CR, Lachos VH (2013) mixsmsn: fitting finite mixture of scale mixture of skew-normal distributions. J Stat Softw 54:1–20
Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, De Jager PL, Mesirov JP (2009) Automated high-dimensional flow cytometric data analysis. Proc Natl Acad Sci USA 106:8519–8524
Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with application to Bayesian regression models. Can J Stat 31:129–150
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Subedi S, McNicholas PD (2014) Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions. Adv Data Anal Classif 8:167–193
Teschendorff A, Wang Y, Barbosa-Morais N, Brenton J, Caldas C (2005) A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data. Bioinformatics 21:3025–3033
Tortora C, McNicholas P, Browne R (2016) A mixture of generalized hyperbolic factor analyzers. Adv Data Anal Classif 10:423–440
Ueda N, Nakano R, Ghahramani Z, Hinton GE (2000) SMEM algorithm for mixture models. Neural Comput 12:2109–2128
Wang WL (2013) Mixtures of common factor analyzers for high-dimensional data with missing information. J Multivar Anal 117:120–133
Wang WL (2015) Mixtures of common \(t\)-factor analyzers for modeling high-dimensional data with missing values. Comput Stat Data Anal 83:223–235
Wang WL, Lin TI (2016) Maximum likelihood inference for the multivariate t mixture model. J Multivar Anal 149:54–64
Wang WL, Lin TI (2017) Flexible clustering via extended mixtures of common \(t\)-factor analyzers. AStA Adv Stat Anal 101:227–252
Wang K, McLachlan GJ, Ng SK, Peel D (2009) EMMIX-skew: EM algorithm for mixture of multivariate skew normal/\(t\) distributions. R package version 1.0-12
Wang WL, Castro LM, Lin TI (2017a) Automated learning of \(t\) factor analysis models with complete and incomplete data. J Multivar Anal 161:157–171
Wang WL, Liu M, Lin TI (2017b) Robust skew-\(t\) factor analysis models for handling missing data. Stat Methods Appl 26:649–672
Waterhouse S, MacKay D, Robinson T (1996) Bayesian methods for mixture of experts. In: Touretzky DS, Mozer MC, Hasselmo ME (eds) Advances in neural information processing systems, vol 8. MIT Press, Cambridge
Acknowledgements
The authors gratefully acknowledge the Coordinating Editor, Maurizio Vichi, the Associate Editor and three anonymous referees for their comments and suggestions that greatly improved this paper. W.L. Wang and T.I. Lin would like to acknowledge the support of the Ministry of Science and Technology of Taiwan under Grant Nos. MOST 105-2118-M-035-004-MY2 and MOST 105-2118-M-005-003-MY2, respectively. L.M. Castro acknowledges support from Grant FONDECYT 1170258 from the Chilean government.
Appendices
Appendix A: Proof of hierarchical representation (8)
It follows from (7) that
and
This gives rise to the following joint distribution:
We then have the following standard results:
and
where \({\varvec{\beta }}_i={\varvec{\varSigma }}_i^{-1}{\varvec{A}}{\varvec{\varOmega }}_i\). Using the characterization of the multivariate normal distribution, we can obtain
With similar arguments, we have
Hence, it is trivial to establish that \(\gamma _j\mid ({\varvec{y}}_j,\tau _j,Z_{ij}=1) \sim TN(h_i,\tau _j^{-1}\sigma _i^2;(0,\infty ))\). Furthermore, standard calculation gives
Appendix B: Proof of Proposition 1
-
(a)
Standard calculation of conditional expectation yields
$$\begin{aligned}&E(\tau _j\mid {\varvec{y}}_j,\,z_{ij}=1)=\int _{0}^{\infty }\tau _j f(\tau _j\mid {\varvec{y}}_j,\,z_{ij}=1)d\tau _j\nonumber \\&\quad =\int _{0}^{\infty }\tau _j\frac{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }g\left( \tau _j;\frac{\nu _i+p}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j\nonumber \\&\quad =\frac{\left( \frac{\nu _i+p}{\nu _i+\delta _{ij}}\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\int _{0}^{\infty }{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) g\left( \tau _j;\frac{\nu _i+p+2}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j\nonumber \\&\quad =\left( \frac{\nu _i+p}{\nu _i+\delta _{ij}}\right) \frac{T\left( M_{ij}\sqrt{\frac{\nu _i+p+2}{\nu _i+\delta _{ij}}};\nu _i+p+2\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$(B.1) -
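Identity (B.1) lends itself to a quick numerical sanity check: since \(f(\tau _j\mid {\varvec{y}}_j,\,z_{ij}=1)\) is proportional to \({\varPhi }(\sqrt{\tau _j}M_{ij})\) times a gamma density, the conditional mean can be approximated by importance sampling with the gamma density as proposal. A minimal Python sketch, using hypothetical values for \(\nu _i\), \(p\), \(\delta _{ij}\) and \(M_{ij}\):

```python
import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(0)
nu, p, delta, M = 5.0, 3.0, 2.5, 0.8   # hypothetical nu_i, p, delta_ij, M_ij

a, b = (nu + p) / 2, (nu + delta) / 2  # Gamma shape and rate of the proposal
tau = rng.gamma(a, 1.0 / b, size=500_000)
w = norm.cdf(np.sqrt(tau) * M)         # importance weights Phi(sqrt(tau) M)
mc = np.sum(tau * w) / np.sum(w)       # Monte Carlo estimate of E(tau | y, z=1)

# closed form (B.1): a ratio of Student-t cdfs with shifted degrees of freedom
closed = ((nu + p) / (nu + delta)
          * student_t.cdf(M * np.sqrt((nu + p + 2) / (nu + delta)), nu + p + 2)
          / student_t.cdf(M * np.sqrt((nu + p) / (nu + delta)), nu + p))
assert abs(mc - closed) < 1e-2         # the two estimates agree
```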
(b)
Because \(\gamma _j\mid ({\varvec{y}}_j,\tau _j,Z_{ij}=1) \sim TN(h_i,\tau _j^{-1}\sigma _i^2;(0,\infty ))\), we obtain
$$\begin{aligned} E(\gamma _j\mid {\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)=h_{ij}+\frac{\sigma _i}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }. \end{aligned}$$(B.2) -
(c)
We first need to show
$$\begin{aligned}&E\left( \tau _j^{\frac{k}{2}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\bigg |{\varvec{y}}_j,\,z_{ij}=1\right) \nonumber \\&\quad =\frac{1}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\int _{0}^{\infty }\tau _j^{\frac{k}{2}}\phi \left( \sqrt{\tau _j}M_{ij}\right) g\left( \tau _j;\frac{\nu _i+p}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j\nonumber \\&\quad =\frac{1}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\int _{0}^{\infty }\tau _j^{\frac{k-1}{2}}\phi \big (M_{ij};0,\,\tau _j^{-1}\big )g\left( \tau _j;\frac{\nu _i+p}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j\nonumber \\&\quad =\frac{{\varGamma }\left( \frac{\nu _i+p+k-1}{2}\right) \int _{0}^{\infty }\phi \left( M_{ij};0,\,\tau _j^{-1}\right) g\left( \tau _j;\frac{\nu _i+p+k-1}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j}{{\varGamma }\left( \frac{\nu _i+p}{2}\right) \left( \frac{\nu _i+\delta _{ij}}{2}\right) ^{\frac{k-1}{2}}T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\nonumber \\&\quad =\frac{{\varGamma }\left( \frac{\nu _i+p+k-1}{2}\right) \sqrt{\frac{\nu _i+p+k-1}{\nu _i+\delta _{ij}}}\,t\left( M_{ij}\sqrt{\frac{\nu _i+p+k-1}{\nu _i+\delta _{ij}}};\nu _i+p+k-1\right) }{{\varGamma }\left( \frac{\nu _i+p}{2}\right) \left( \frac{\nu _i+\delta _{ij}}{2}\right) ^{\frac{k-1}{2}}T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$(B.3)Applying the result in (B.3) with \(k=-\,1\) and (B.2) yields
$$\begin{aligned}&E(\gamma _j\mid {\varvec{y}}_j,\,z_{ij}=1) =E[E(\gamma _j\mid {\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)\mid {\varvec{y}}_j,\,z_{ij}=1]\nonumber \\&\quad =E\left[ h_{ij}+\frac{\sigma _i}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\bigg |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =h_{ij}+\sigma _iE\left( \frac{1}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\bigg |{\varvec{y}}_j,\,z_{ij}=1\right) \nonumber \\&\quad =h_{ij}+\frac{\sigma _i}{\sqrt{\frac{\nu _i+p-2}{\nu _i+\delta _{ij}}}}\frac{t\left( M_{ij}\sqrt{\frac{\nu _i+p-2}{\nu _i+\delta _{ij}}};\nu _i+p-2\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$(B.4) -
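The same importance-sampling device verifies (B.4): averaging the truncated-normal mean (B.2) over the conditional distribution of \(\tau _j\) reproduces the closed form. A brief sketch with hypothetical values of \(h_{ij}\), \(\sigma _i\), \(\nu _i\), \(p\), \(\delta _{ij}\) and \(M_{ij}\):

```python
import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(0)
nu, p, delta, M = 5.0, 3.0, 2.5, 0.8      # hypothetical nu_i, p, delta_ij, M_ij
h, sigma = 0.4, 1.2                       # hypothetical h_ij, sigma_i

a, b = (nu + p) / 2, (nu + delta) / 2
tau = rng.gamma(a, 1.0 / b, size=500_000)
w = norm.cdf(np.sqrt(tau) * M)            # importance weights Phi(sqrt(tau) M)
# average (B.2) over tau: the Phi terms cancel against the weights
mc = h + sigma * np.sum(norm.pdf(np.sqrt(tau) * M) / np.sqrt(tau)) / np.sum(w)

# closed form (B.4)
c = np.sqrt((nu + p - 2) / (nu + delta))
closed = h + sigma * student_t.pdf(M * c, nu + p - 2) / (
    c * student_t.cdf(M * np.sqrt((nu + p) / (nu + delta)), nu + p))
assert abs(mc - closed) < 1e-2
```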
(d)
Using (B.1), (B.2) and (B.3) with \(k=1\), we have
$$\begin{aligned}&E(\tau _j\gamma _j|{\varvec{y}}_j,\,z_{ij}=1) =E[E(\tau _j\gamma _j|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1]\nonumber \\&\quad =E[\tau _jE(\gamma _j|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1]\nonumber \\&\quad =E\left[ \tau _j\left( h_{ij}+\frac{\sigma _i}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\right) \bigg |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =h_{ij} E(\tau _j|{\varvec{y}}_j,\,z_{ij}=1)+\sigma _i E\left[ \sqrt{\tau _j}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\bigg |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =h_{ij}\left[ \frac{\nu _i+p}{\nu _i+\delta _{ij}}\frac{T\left( M_{ij}\sqrt{\frac{\nu _i+p+2}{\nu _i+\delta _{ij}}};\nu _i+p+2\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\right] \nonumber \\&\qquad +\,\sigma _i\left[ \sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}}\frac{t\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\right] . \end{aligned}$$(B.5) -
(e)
Using the result of (B.2), the second moment of a truncated normal distribution is given by
$$\begin{aligned} E\left( \gamma _j^2|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1\right)= & {} h_{ij}E(\gamma _j|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)+\frac{\sigma _i^2}{\tau _j}\nonumber \\= & {} h_{ij}\left( h_{ij}+\frac{\sigma _i}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\right) +\frac{\sigma _i^2}{\tau _j}. \end{aligned}$$(B.6) -
(f)
Applying the double expectation and using (B.5) and (B.6), we have
$$\begin{aligned} E\left( \tau _j\gamma _j^2|{\varvec{y}}_j,\,z_{ij}=1\right)= & {} E\left[ E\left( \tau _j\gamma _j^2|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1\right) |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\= & {} E\left[ \tau _jE\left( \gamma _j^2|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1\right) |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\= & {} E\left\{ \tau _j\left[ h_{ij}E(\gamma _j|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)+\tau _j^{-1}\sigma _i^2\right] |{\varvec{y}}_j,\,z_{ij}=1\right\} \nonumber \\= & {} h_{ij} E\left( \tau _j\gamma _j|{\varvec{y}}_j,\,z_{ij}=1\right) +\sigma _i^2. \end{aligned}$$(B.7) -
(g)
Applying the double expectation and the result of (B.4), we have
$$\begin{aligned}&E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1)=E\left[ E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =E\left[ {\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad ={\varvec{\xi }}_i+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i)+({\varvec{\lambda }}_i-{\varvec{\beta }}_i^{\top }{\varvec{\alpha }}_i)E(\gamma _j|{\varvec{y}}_j,\,z_{ij}=1). \end{aligned}$$(B.8) -
(h)
Applying the double expectation and using (B.1) and (B.5), we have
$$\begin{aligned}&E(\tau _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1)=E\left[ E(\tau _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =E\left[ \tau _j E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =E\left\{ \tau _j\left[ {\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j)\right] |{\varvec{y}}_j,\,z_{ij}=1\right\} \nonumber \\&\quad =\left[ {\varvec{\xi }}_i+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i)\right] E(\tau _j|{\varvec{y}}_j,\,z_{ij}=1)\nonumber \\&\qquad +\,\left( {\varvec{\lambda }}_i-{\varvec{\beta }}_i^{\top }{\varvec{\alpha }}_i\right) E(\tau _j\gamma _j|{\varvec{y}}_j,\,z_{ij}=1). \end{aligned}$$(B.9) -
(i)
Applying the double expectation and using (B.5) and (B.7), we have
$$\begin{aligned}&E\left( \tau _j\gamma _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1\right) \\&\quad =E\left[ E(\tau _j\gamma _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =E\left[ \tau _j\gamma _j E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \\&\quad =E\left\{ \tau _j\gamma _j[{\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^\top ({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j)]|{\varvec{y}}_j,\,z_{ij}=1\right\} \nonumber \\&\quad =\left[ {\varvec{\xi }}_i+{\varvec{\beta }}_i^\top ({\varvec{y}}_j-{\varvec{\mu }}_i)\right] E(\tau _j\gamma _j|{\varvec{y}}_j,\,z_{ij}=1)\\&\qquad +\,({\varvec{\lambda }}_i-{\varvec{\beta }}_i^\top {\varvec{\alpha }}_i)E(\tau _j\gamma _j^2|{\varvec{y}}_j,\,z_{ij}=1). \end{aligned}$$ -
(j)
Applying the double expectation and using (B.8) and (B.9), we have
$$\begin{aligned}&E\left( \tau _j{\varvec{U}}_{ij}{\varvec{U}}_{ij}^{\top }|{\varvec{y}}_j,\,z_{ij}=1\right) =E\left[ E\left( \tau _j{\varvec{U}}_{ij}{\varvec{U}}_{ij}^{\top }|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1\right) |{\varvec{y}}_j,\,z_{ij}=1\right] \\&\quad =E\left[ \tau _jE({\varvec{U}}_{ij}{\varvec{U}}_{ij}^{\top }|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \\&\quad =E\big \{\tau _j[E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)E({\varvec{U}}_{ij}^{\top }|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)\\&\qquad +\,\text{ cov }({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)]|{\varvec{y}}_j,\,z_{ij}=1\big \}\\&\quad =E\big \{\tau _j[E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j))^{\top }\\&\qquad +\,\tau _j^{-1}({\varvec{I}}_q-{\varvec{\beta }}_i^{\top }{\varvec{A}}){\varvec{\varOmega }}_i]|{\varvec{y}}_j,\,z_{ij}=1\big \}\\&\quad =E(\gamma _j\tau _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1)({\varvec{\lambda }}_i-{\varvec{\beta }}_i^{\top }{\varvec{\alpha }}_i)^{\top }\\&\qquad +\,E(\tau _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1)[{\varvec{\xi }}_i+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i)]^{\top }+({\varvec{I}}_q-{\varvec{\beta }}_i^{\top }{\varvec{A}}){\varvec{\varOmega }}_i. \end{aligned}$$ -
(k)
It is known that \(\int _0^{\infty }f(\tau _j|{\varvec{y}}_j,\,Z_{ij}=1)d\tau _j=1\), that is,
$$\begin{aligned} \int _0^{\infty }\frac{{\varPhi }\big (\sqrt{\tau _j}M_{ij}\big )}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\frac{{(\frac{\nu _i+\delta _{ij}}{2})}^{(\frac{\nu _i+p}{2})}}{{\varGamma }(\frac{\nu _i+p}{2})}\tau _j^{\frac{\nu _i+p}{2}-1}\exp \left\{ -\frac{\nu _i+\delta _{ij}}{2}\tau _j\right\} d\tau _j =1. \end{aligned}$$Then
$$\begin{aligned} \frac{d}{d\nu _i}\int _0^{\infty }b_j{\varPhi }\big (\sqrt{\tau _j}M_{ij}\big )\tau _j^{\frac{\nu _i+p}{2}-1}\exp \left\{ -\frac{\nu _i+\delta _{ij}}{2}\tau _j\right\} d\tau _j=0, \end{aligned}$$where
$$\begin{aligned} b_j=\frac{{\left( \frac{\nu _i+\delta _{ij}}{2}\right) }^{\left( \nu _i+p\right) /2}}{{\varGamma }\left( \frac{\nu _i+p}{2}\right) T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$By Leibniz’s rule, we can obtain
$$\begin{aligned}&E(\log \tau _j|{\varvec{y}}_j,\,z_{ij}=1)-E(\tau _j|{\varvec{y}}_j,\,z_{ij}=1)+\log \left( \frac{\nu _i+\delta _{ij}}{2}\right) +\left( \frac{\nu _i+p}{\nu _i+\delta _{ij}}\right) \\&\quad -\,\mathrm{DG}\left( \frac{\nu _i+p}{2}\right) -\frac{\int _{-\infty }^{M_{ij}}t\left( x;0,\frac{\nu _i+\delta _{ij}}{\nu _i+p},\nu _i+p\right) f_{\nu _i}(x)dx}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }=0, \end{aligned}$$where
$$\begin{aligned} f_{\nu _i}(x)= & {} \mathrm{DG}\left( \frac{\nu _i+p+1}{2}\right) -\mathrm{DG}\left( \frac{\nu _i+p}{2}\right) -\frac{1}{\pi (\nu _i+\delta _{ij})}\nonumber \\&-\,\log \left( 1+\frac{x^2}{\nu _i+\delta _{ij}}\right) +\frac{(\nu _i+p+1)x^2}{(\nu _i+\delta _{ij})(x^2+\nu _i+\delta _{ij})}. \end{aligned}$$(B.10)It follows that
$$\begin{aligned} E(\log \tau _j|{\varvec{y}}_j,\,z_{ij}=1)= & {} E(\tau _j|{\varvec{y}}_j,\,z_{ij}=1)-\log \left( \frac{\nu _i+\delta _{ij}}{2}\right) -\left( \frac{\nu _i+p}{\nu _i+\delta _{ij}}\right) \\&+\,\mathrm{DG}\left( \frac{\nu _i+p}{2}\right) +\,\frac{\int _{-\infty }^{M_{ij}}t\left( x;0,\frac{\nu _i+\delta _{ij}}{\nu _i+p},\nu _i+p\right) f_{\nu _i}(x)dx}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$
Appendix C: Proof of CM-steps
-
(a)
By the Lagrange multiplier method, we define
$$\begin{aligned} L(\pi _i,\lambda )=Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})-\lambda \left( \sum _{i=1}^g\pi _i-1\right) , \end{aligned}$$and then take partial derivatives, yielding
$$\begin{aligned} \frac{\partial L(\pi _i,\lambda )}{\partial \pi _i} = \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\frac{1}{\pi _i}-\lambda =0,\quad \text{ and } \quad \frac{\partial L(\pi _i,\lambda )}{\partial \lambda } = -\left( \sum _{i=1}^g\pi _i-1\right) =0. \end{aligned}$$Since \(\sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}=n\), we obtain \(\hat{\pi }_i^{(k+1)}=\sum _{j=1}^n{\hat{z}}_{ij}^{(k)}/n\).
-
(b)
Differentiating \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{\xi }}_i\) leads to
$$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\xi }}_i}= & {} -\frac{1}{2}\frac{\partial }{\partial {\varvec{\xi }}_i}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}\mathrm{tr}\bigg [-{\hat{{\varvec{\eta }}}_{ij}}^{(k)}{\varvec{\xi }}_i^\top -{\varvec{\xi }}_i{\hat{{\varvec{\eta }}}_{ij}}^{(k)^\top }\\&+\,{\varvec{\xi }}_i{\hat{\tau }_{ij}}^{(k)}{\varvec{\xi }}_i^\top +{\varvec{\xi }}_i{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i^\top +{\varvec{\lambda }}_i{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i^\top \bigg ]\\= & {} \mathrm{tr}\bigg \{{\varvec{\varOmega }}_i^{-1}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\left[ {\hat{{\varvec{\eta }}}_{ij}}^{(k)}-{\hat{\tau }_{ij}}^{(k)}{\varvec{\xi }}_i-{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i\right] \bigg \}. \end{aligned}$$Moreover, the partial derivative of \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{\lambda }}_i\) is
$$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\lambda }}_i}= & {} -\frac{1}{2}\frac{\partial }{\partial {\varvec{\lambda }}_i}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}\mathrm{tr}\bigg [-{\hat{{\varvec{\zeta }}}_{ij}}^{(k)}{\varvec{\lambda }}_i^\top +{\varvec{\xi }}_i{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i^\top \\&-\,{\varvec{\lambda }}_i{\hat{{\varvec{\zeta }}}_{ij}}^{(k)^\top }+{\varvec{\lambda }}_i{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i^\top +{\varvec{\lambda }}_i{{\hat{s}}_{2ij}}^{(k)}{\varvec{\lambda }}_i^\top \bigg ]\\= & {} \mathrm{tr}\bigg \{{\varvec{\varOmega }}_i^{-1}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\left[ {\hat{{\varvec{\zeta }}}_{ij}}^{(k)}-{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i-{{\hat{s}}_{2ij}}^{(k)}{\varvec{\lambda }}_i\right] \bigg \}. \end{aligned}$$Solving the above two equations, we get
$$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\xi }}_i}= & {} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}({\hat{{\varvec{\eta }}}_{ij}}^{(k)}-{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i)-\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}{\hat{\tau }_{ij}}^{(k)}{\varvec{\xi }}_i=\mathbf 0, \end{aligned}$$(C.1)$$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\lambda }}_i}= & {} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}({\hat{{\varvec{\zeta }}}_{ij}}^{(k)}-{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i)-\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}{{\hat{s}}_{2ij}}^{(k)}{\varvec{\lambda }}_i=\mathbf 0. \end{aligned}$$(C.2)After rearrangement, (C.1) and (C.2) can be rewritten as
$$\begin{aligned} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{\tau }_{ij}}^{(k)}{\varvec{\xi }}_i+\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i= & {} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\eta }}}_{ij}}^{(k)},\\ \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i+\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{{\hat{s}}_{2ij}}^{(k)}{\varvec{\lambda }}_i= & {} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\zeta }}}_{ij}}^{(k)}. \end{aligned}$$Using Cramer’s rule, the solutions of these two linear equations are
$$\begin{aligned} \hat{{\varvec{\xi }}}_i^{\left( k+1\right) }= \frac{ \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{{\varvec{\eta }}}_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{2ij}}^{\left( k\right) }\right) - \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{{\varvec{\zeta }}}_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{1ij}}^{\left( k\right) }\right) }{ \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{\tau }_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{2ij}}^{\left( k\right) }\right) - \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{1ij}}^{\left( k\right) }\right) ^2}, \end{aligned}$$and
$$\begin{aligned} \hat{{\varvec{\lambda }}}_i^{\left( k+1\right) }=\frac{ \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{\tau }_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{{\varvec{\zeta }}}_{ij}}^{\left( k\right) }\right) - \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{1ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{{\varvec{\eta }}}_{ij}}^{\left( k\right) }\right) }{ \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{\tau }_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{2ij}}^{\left( k\right) }\right) - \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{1ij}}^{\left( k\right) }\right) ^2}. \end{aligned}$$ -
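Because the two equations share scalar coefficients, the Cramer's-rule expressions coincide with directly solving a \(2\times 2\) linear system with \(q\)-vector right-hand sides. A short numerical sketch with hypothetical sufficient statistics:

```python
import numpy as np

rng = np.random.default_rng(1)
q = 3
# hypothetical weighted sums over j (scalar coefficients and q-vector RHS)
S_tau, S_s1, S_s2 = 4.2, 1.1, 3.0
S_eta, S_zeta = rng.normal(size=q), rng.normal(size=q)

# Cramer's rule, matching the closed-form updates for xi_i and lambda_i
det = S_tau * S_s2 - S_s1 ** 2
xi = (S_eta * S_s2 - S_zeta * S_s1) / det
lam = (S_tau * S_zeta - S_s1 * S_eta) / det

# cross-check: solve the 2x2 block system directly
coef = np.array([[S_tau, S_s1], [S_s1, S_s2]])
sol = np.linalg.solve(coef, np.stack([S_eta, S_zeta]))
assert np.allclose(xi, sol[0]) and np.allclose(lam, sol[1])
```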
(c)
The partial derivative of \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{A}}\) is
$$\begin{aligned} \frac{\partial Q}{\partial {\varvec{A}}}= & {} -\frac{1}{2}\frac{\partial }{\partial {\varvec{A}}}\sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\mathrm{tr}\bigg (-{\varvec{D}}_i^{-1}{\varvec{y}}_j{\hat{{\varvec{\eta }}}_{ij}}^{(k)^\top }{\varvec{A}}^\top \nonumber \\&-\,{\varvec{D}}_i^{-1}{\varvec{A}}{\hat{{\varvec{\eta }}}_{ij}}^{(k)}{\varvec{y}}_j^\top +{\varvec{D}}_i^{-1}{\varvec{A}}{\hat{{\varvec{\varPsi }}}_{ij}}^{(k)}{\varvec{A}}^\top \bigg )\nonumber \\= & {} \mathrm{tr}\left( \sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{D}}_i^{-1}{\varvec{y}}_j{\hat{{\varvec{\eta }}}_{ij}}^{(k)^\top }-\sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{D}}_i^{-1}{\hat{{\varvec{\varPsi }}}_{ij}}^{(k)}{\varvec{A}}\right) . \end{aligned}$$(C.3)Equating (C.3) to the zero matrix, we have
$$\begin{aligned} \hat{{\varvec{A}}}^{(k+1)}=\left( \sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{y}}_j{\hat{{\varvec{\eta }}}_{ij}}^{(k)^\top }\right) \left( \sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\varPsi }}}_{ij}}^{(k)}\right) ^{-1}. \end{aligned}$$ -
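By construction, \(\hat{{\varvec{A}}}^{(k+1)}\) is the root of the score (C.3). This can be illustrated numerically; the E-step quantities below are hypothetical (a single component, for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 4, 2, 50
z = rng.uniform(0.2, 1.0, size=n)           # hypothetical z_hat_ij (one component)
Y = rng.normal(size=(n, p))                 # observations y_j
eta = rng.normal(size=(n, q))               # hypothetical eta_hat_ij
B = rng.normal(size=(n, q, q))
Psi = B @ B.transpose(0, 2, 1) + np.eye(q)  # hypothetical Psi_hat_ij (SPD)

num = np.einsum('j,jp,jq->pq', z, Y, eta)   # sum_j z_j y_j eta_j^T
den = np.einsum('j,jab->ab', z, Psi)        # sum_j z_j Psi_j
A_hat = num @ np.linalg.inv(den)            # the closed-form update

# A_hat zeroes the score (C.3): sum_j z_j (y_j eta_j^T - A Psi_j) = 0
assert np.allclose(num - A_hat @ den, 0.0)
```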
(d)
The partial derivative of \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{\varOmega }}_i\) is
$$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\varOmega }}_i^{-1}}= & {} \frac{1}{2}\frac{\partial }{\partial {\varvec{\varOmega }}_i^{-1}}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\left\{ \log |{\varvec{\varOmega }}_i^{-1}|-\mathrm{tr}\left( {\varvec{\varOmega }}_i^{-1}{\varvec{\varLambda }}_{ij}\right) \right\} \nonumber \\= & {} \frac{1}{2}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\Big [2{\varvec{\varOmega }}_i-\mathrm{Diag}\{{\varvec{\varOmega }}_i\}-\big (2{\varvec{\varLambda }}_{ij}-\mathrm{Diag}\{{\varvec{\varLambda }}_{ij}\}\big )\Big ]. \end{aligned}$$(C.4)Equating (C.4) to the zero vector gives
$$\begin{aligned} {\hat{{\varvec{\varOmega }}}_i}^{(k+1)}=\frac{\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\varLambda }}}_{ij}}^{(k+1)}}{\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}}. \end{aligned}$$ -
(e)
Taking the partial derivative of \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{D}}_i\) yields
$$\begin{aligned} \frac{\partial Q}{\partial {\varvec{D}}_i^{-1}}= & {} \frac{1}{2}\frac{\partial }{\partial {\varvec{D}}_i^{-1}}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\left[ \log |{\varvec{D}}_i^{-1}|-\mathrm{tr}\left( {\varvec{D}}_i^{-1}{\varvec{\varUpsilon }}_{ij}\right) \right] \nonumber \\= & {} \frac{1}{2}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}({\varvec{D}}_i-{\varvec{\varUpsilon }}_{ij}). \end{aligned}$$(C.5)We have the following estimator
$$\begin{aligned} {\hat{{\varvec{D}}}_i}^{(k+1)}=\frac{\mathrm{Diag}\left\{ \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\varUpsilon }}}_{ij}}^{(k+1)}\right\} }{\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}} \end{aligned}$$obtained by equating (C.5) to the zero matrix.
Appendix D: Parameter estimation for the MCghstFA model using the ECM algorithm
According to Table 5, the MCghstFA model admits a three-level hierarchy:
From (D.1), it can be verified that
and
where \({\varvec{\mu }}_{2\cdot 1}={\varvec{\xi }}_i+W_j{\varvec{\lambda }}_i+ {\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }({\varvec{A}}{\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}({\varvec{y}}_j-{\varvec{A}}{\varvec{\xi }}_i-W_j{\varvec{A}}{\varvec{\lambda }}_i)\) and \({\varvec{{\varSigma }}}_{22\cdot 1}=W_j({\varvec{{\varOmega }}}_i-{\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }({\varvec{A}}{\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}{\varvec{A}}{\varvec{{\varOmega }}}_i)= W_j({\varvec{{\varOmega }}}^{-1}_i+{\varvec{A}}^{\top }{\varvec{D}}^{-1}{\varvec{A}})^{-1}\).
A positive random variable \(W\) is said to follow the generalized inverse Gaussian (GIG) distribution (Good 1953), denoted by \(W\sim \mathrm{GIG}(\psi ,\chi ,r)\), if it has the pdf
where \(\psi ,\chi \in {\mathbb {R}}^+\), \(r\in {\mathbb {R}}\), and \(K_r\) is the modified Bessel function of the third kind with index \(r\). Some particular moments of the GIG distribution have tractable forms, for instance,
and
where
By Bayes’ Theorem, the conditional pdf of \(W_j\) given \({\varvec{y}}_j\) can be written as
where \({\varDelta }_{ij}=({\varvec{y}}_j-{\varvec{A}}{\varvec{\xi }}_i)^{\top }({\varvec{A}}{\varvec{\varOmega }}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}({\varvec{y}}_j-{\varvec{A}}{\varvec{\xi }}_i)\). It follows from (D.3) that
Alternatively, the MCghstFA model can be represented by a four-level hierarchy:
From (D.7), the complete-data log-likelihood function for \({\varvec{\varTheta }}\) on the basis of \({\varvec{Y}}_{c}=\{{\varvec{y}}_{j},{\varvec{U}}_{ij},W_{j},{\varvec{Z}}_j\}^{n}_{j=1}\), for \(i=1,\ldots ,g\), is given by
To evaluate the expected value of (D.8), called the Q function, we first calculate
which is the posterior probability of \({\varvec{y}}_j\) belonging to the ith component of the mixture. In addition, we utilize the results (D.4) and (D.5) to calculate the following conditional expectations:
where \(\hat{\omega }_{ij}=\sqrt{(\hat{\nu }_i+\hat{{\varDelta }}_{ij})\hat{{\varvec{\lambda }}}^{\top }_i\hat{{\varvec{A}}}^{\top }(\hat{{\varvec{A}}}\hat{{\varvec{\varOmega }}}_i\hat{{\varvec{A}}}^{\top }+\hat{{\varvec{D}}})^{-1}\hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i}\) and \(K^{\prime }_{-\frac{(\hat{\nu }_i+p)}{2}}(\hat{\omega }_{ij})\) is evaluated via (D.6). By (D.2), we obtain
and
where \(\hat{{\varvec{\gamma }}}_i = (\hat{{\varvec{A}}} \hat{{\varvec{{\varOmega }}}}_i \hat{{\varvec{A}}}^{\top } + \hat{{\varvec{D}}})^{-1}\hat{{\varvec{A}}}\hat{{\varvec{{\varOmega }}}}_i\).
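The conditional expectations above rest on GIG moments, which satisfy the standard identity \(E(X^a)=(\chi /\psi )^{a/2}\,K_{r+a}(\sqrt{\psi \chi })/K_{r}(\sqrt{\psi \chi })\) (quoted here from the general GIG literature, since the displayed moment formulas are not reproduced in this extract). A self-contained NumPy check of this identity for \(a=1\), with illustrative constants, evaluates \(K_r\) through its integral representation and the mean by direct quadrature:

```python
import numpy as np

def bessel_k(r, x, t_max=30.0, m=200_000):
    """K_r(x) via the integral representation
    K_r(x) = int_0^inf exp(-x*cosh(t)) * cosh(r*t) dt (trapezoid rule)."""
    t = np.linspace(0.0, t_max, m)
    f = np.exp(-x * np.cosh(t)) * np.cosh(r * t)
    return np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0

def gig_mean_numeric(psi, chi, r, x_max=60.0, m=400_000):
    """E(X) for X ~ GIG(psi, chi, r) by quadrature of the unnormalised
    density x^(r-1) * exp(-(psi*x + chi/x)/2)."""
    x = np.linspace(1e-6, x_max, m)
    f = x ** (r - 1.0) * np.exp(-(psi * x + chi / x) / 2.0)
    w = np.diff(x)
    num = np.sum((x[1:] * f[1:] + x[:-1] * f[:-1]) * w) / 2.0
    den = np.sum((f[1:] + f[:-1]) * w) / 2.0
    return num / den

psi, chi, r = 2.0, 2.0, 0.5
closed_form = np.sqrt(chi / psi) * bessel_k(r + 1, np.sqrt(psi * chi)) \
    / bessel_k(r, np.sqrt(psi * chi))
numeric = gig_mean_numeric(psi, chi, r)
```

For these half-integer parameters the Bessel ratio reduces analytically, since \(K_{1.5}(2)/K_{0.5}(2)=1+1/2\), so both computations should agree closely with \(3/2\).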
After some algebraic manipulations, and discarding terms that do not involve the parameters, the resulting Q function is given by
Taking partial derivatives of (D.14) with respect to \({\varvec{\xi }}_i\) and \({\varvec{\lambda }}_i\) and equating them to zero vectors yields
In summary, the ECM algorithm for estimating the parameters of MCghstFA proceeds as follows:
- E-step: Given the current value \({\varvec{\varTheta }}=\hat{{\varvec{\varTheta }}}\), compute \({\hat{z}}_{ij}\), \({\hat{s}}_{1ij}\), \({\hat{s}}_{2ij}\), \({\hat{s}}_{3ij}\), \(\hat{{\varvec{u}}}_{ij}\), \(\hat{{\varvec{\eta }}}_{ij}\) and \(\hat{{\varvec{{\varPsi }}}}_{ij}\) as defined in (D.9)–(D.13) for \(i=1,\ldots ,g\) and \(j=1,\ldots ,n\).
- CM step 1: Maximizing (D.14) with respect to \(\pi _i\) subject to \(\sum _{i=1}^g\pi _i=1\) via a Lagrange multiplier gives \(\hat{\pi }_i={\hat{n}}_i/n\), where \({\hat{n}}_i=\sum _{j=1}^n {\hat{z}}_{ij}\).
- CM step 2: Update \({\varvec{\xi }}_i\) and \({\varvec{\lambda }}_i\) by solving Eqs. (D.15) and (D.16) simultaneously. Simple matrix algebra yields
$$\begin{aligned} \hat{{\varvec{\xi }}}_i= & {} \frac{\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{1ij}\big )\big (\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{\eta }}}_{ij}\big )- {\hat{n}}_i\big (\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{u}}}_{ij}\big )}{\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{1ij}\big )\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{2ij}\big )-{\hat{n}}^2_i} \end{aligned}$$

and
$$\begin{aligned} \hat{{\varvec{\lambda }}}_i= & {} \frac{\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{2ij}\big )\big (\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{u}}}_{ij}\big )- {\hat{n}}_i\big (\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{\eta }}}_{ij}\big )}{\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{1ij}\big )\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{2ij}\big )-{\hat{n}}^2_i}. \end{aligned}$$
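These two coupled updates amount to inverting a \(2\times 2\) system in the component's sufficient statistics. A hedged NumPy sketch (input names and shapes are assumptions for illustration):

```python
import numpy as np

def update_xi_lambda(z, s1, s2, u, eta):
    """Joint CM update for (xi_i, lambda_i) of one mixture component.
    z, s1, s2: (n,) E-step scalars; u, eta: (n, q) E-step vectors."""
    n_i = z.sum()
    a1 = z @ s1          # sum_j z_ij s1_ij
    a2 = z @ s2          # sum_j z_ij s2_ij
    Su = z @ u           # sum_j z_ij u_ij,   shape (q,)
    Se = z @ eta         # sum_j z_ij eta_ij, shape (q,)
    denom = a1 * a2 - n_i ** 2
    xi = (a1 * Se - n_i * Su) / denom
    lam = (a2 * Su - n_i * Se) / denom
    return xi, lam
```

The returned pair satisfies the linear system \(a_2\,\hat{{\varvec{\xi }}}_i + \hat{n}_i\,\hat{{\varvec{\lambda }}}_i = \sum_j \hat{z}_{ij}\hat{{\varvec{\eta }}}_{ij}\) and \(\hat{n}_i\,\hat{{\varvec{\xi }}}_i + a_1\,\hat{{\varvec{\lambda }}}_i = \sum_j \hat{z}_{ij}\hat{{\varvec{u}}}_{ij}\), which is the form one expects from the stationarity conditions.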
- CM step 3: The updates for \({\varvec{A}}\), \({\varvec{{\varOmega }}}_i\) and \({\varvec{D}}\) are given by
$$\begin{aligned} \hat{{\varvec{A}}}= & {} \left( \sum _{i=1}^g\sum _{j=1}^n{\hat{z}}_{ij}{\varvec{y}}_j\hat{{\varvec{\eta }}}_{ij}^{\top }\right) \left( \sum _{i=1}^g\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{{\varPsi }}}}_{ij}\right) ^{-1},\\ \hat{{\varvec{{\varOmega }}}}_i= & {} \frac{1}{{\hat{n}}_i}\sum _{j=1}^n{\hat{z}}_{ij}\Big [\hat{{\varvec{{\varPsi }}}}_{ij}-\hat{{\varvec{\eta }}}_{ij}\hat{{\varvec{\xi }}}_{i}^{\top } -\hat{{\varvec{\xi }}}_{i}\hat{{\varvec{\eta }}}_{ij}^{\top }+{\hat{s}}_{2ij}\hat{{\varvec{\xi }}}_{i}\hat{{\varvec{\xi }}}_{i}^{\top }+{\hat{s}}_{1ij}\hat{{\varvec{\lambda }}}_i\hat{{\varvec{\lambda }}}^{\top }_i\\&-\,(\hat{{\varvec{u}}}_{ij}-\hat{{\varvec{\xi }}}_i)\hat{{\varvec{\lambda }}}^{\top }_i-\hat{{\varvec{\lambda }}}_i(\hat{{\varvec{u}}}_{ij}-\hat{{\varvec{\xi }}}_i)^{\top }\Big ],\\ \hat{{\varvec{D}}}= & {} \frac{1}{n}\mathrm{Diag}\Bigg \{\sum _{i=1}^g\sum _{j=1}^n{\hat{z}}_{ij}\big ({\hat{s}}_{2ij}{\varvec{y}}_j{\varvec{y}}^{\top }_j-{\varvec{y}}_j\hat{{\varvec{\eta }}}_{ij}^{\top }\hat{{\varvec{A}}}^{\top }\big )\Bigg \}. \end{aligned}$$
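The update for the common loading matrix \({\varvec{A}}\), for example, is a weighted least-squares solve. A minimal sketch under assumed array shapes (again, names and layout are illustrative, not the authors' code):

```python
import numpy as np

def update_A(z, Y, eta, Psi):
    """CM update for the common loading matrix A.
    z: (g, n) responsibilities, Y: (n, p) data rows,
    eta: (g, n, q) conditional factor means, Psi: (g, n, q, q) second moments."""
    num = np.einsum('ij,jp,ijq->pq', z, Y, eta)   # sum_ij z_ij y_j eta_ij^T
    den = np.einsum('ij,ijab->ab', z, Psi)        # sum_ij z_ij Psi_ij
    # num @ inv(den), computed with a linear solve; den is symmetric
    return np.linalg.solve(den.T, num.T).T
```

Using a solve rather than an explicit inverse is numerically preferable when the \(q\times q\) matrix on the right is close to singular.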
- CM step 4: Calculate \(\hat{\nu }_i\) as the root of the following equation:
$$\begin{aligned} \log \left( \frac{\nu _i}{2}\right) - \mathrm{DG}\left( \frac{\nu _i}{2}\right) + 1 -\frac{1}{{\hat{n}}_i} \sum _{j=1}^n {\hat{z}}_{ij}({\hat{s}}_{2ij} + {\hat{s}}_{3ij})=0. \end{aligned}$$
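Since \(\nu _i\) enters only through this scalar equation, whose left-hand side is strictly decreasing in \(\nu _i\) (here \(\mathrm{DG}(\cdot )\) denotes the digamma function), a simple bisection suffices. A stdlib-only Python sketch, where the digamma approximation and the search bracket are illustrative choices rather than the authors' implementation:

```python
import math

def digamma(x, h=1e-6):
    """Central-difference derivative of log-gamma; accurate enough here."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def update_nu(c, lo=1e-3, hi=200.0, tol=1e-10):
    """Solve log(nu/2) - digamma(nu/2) + 1 - c = 0 by bisection, where
    c = (1/n_i) sum_j z_ij (s2_ij + s3_ij).  The left-hand side decreases
    from +inf towards 1 - c, so a root exists in (lo, hi) when c > 1."""
    f = lambda nu: math.log(nu / 2.0) - digamma(nu / 2.0) + 1.0 - c
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

This mirrors the degrees-of-freedom update familiar from EM for mixtures of t distributions, where the same transcendental equation arises.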
Wang, WL., Castro, L.M., Chang, YT. et al. Mixtures of restricted skew-t factor analyzers with common factor loadings. Adv Data Anal Classif 13, 445–480 (2019). https://doi.org/10.1007/s11634-018-0317-2