Flexible mixture modelling using the multivariate skew-t-normal distribution

Statistics and Computing

Abstract

This paper presents a robust probabilistic mixture model based on the multivariate skew-t-normal distribution, a skew extension of the multivariate Student’s t distribution that is better able to model data whose distribution deviates seriously from normality. The proposed model includes mixtures of normal, t and skew-normal distributions as special cases and provides a flexible alternative to recently proposed skew t mixtures. We develop two analytically tractable EM-type algorithms for computing maximum likelihood estimates of the model parameters, in which the skewness parameters and degrees of freedom are asymptotically uncorrelated. Standard errors for the parameter estimates can be obtained via a general information-based method. We also present a procedure for merging mixture components to automatically identify the number of clusters by fitting a piecewise linear regression to the rescaled entropy plot. The effectiveness and performance of the proposed methodology are illustrated by two real-life examples.
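
The special cases named in the abstract can be illustrated with a small univariate sketch. It is not the authors' implementation: it assumes the usual skew-t-normal form, in which a Student's t density is multiplied by twice a normal distribution function evaluated at a scaled argument. The names t_pdf, norm_cdf, stn_pdf and mixture_pdf, and the parameter names xi, sigma, lam and nu, are hypothetical and chosen only for this example.

```python
import math


def t_pdf(z, nu):
    """Standard Student's t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu))


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stn_pdf(y, xi=0.0, sigma=1.0, lam=0.0, nu=5.0):
    """Assumed univariate skew-t-normal density.

    xi: location, sigma: scale, lam: skewness, nu: degrees of freedom.
    lam = 0 gives the t density; large nu approaches the skew-normal.
    """
    z = (y - xi) / sigma
    return 2.0 / sigma * t_pdf(z, nu) * norm_cdf(lam * z)


def mixture_pdf(y, weights, params):
    """Finite mixture density: weighted sum of component densities."""
    return sum(w * stn_pdf(y, **p) for w, p in zip(weights, params))


if __name__ == "__main__":
    # Two-component mixture with opposite skewness (illustrative values only).
    comps = [dict(xi=-1.0, sigma=1.0, lam=2.0, nu=5.0),
             dict(xi=2.0, sigma=0.5, lam=-1.0, nu=8.0)]
    print(mixture_pdf(0.0, [0.4, 0.6], comps))
```

Setting lam to zero in every component reduces this sketch to a t mixture, and letting nu grow large as well gives a normal mixture, which mirrors the nesting described in the abstract.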

Acknowledgements

The authors are grateful to the Associate Editor and two anonymous referees for their insightful comments and valuable suggestions, which led to substantial improvements in the presentation of this work. This research was supported by the National Science Council of Taiwan (Grant no. NSC101-2118-M-005-006-MY2).

Author information

Corresponding author

Correspondence to Tsung-I Lin.

About this article

Cite this article

Lin, TI., Ho, H.J. & Lee, CR. Flexible mixture modelling using the multivariate skew-t-normal distribution. Stat Comput 24, 531–546 (2014). https://doi.org/10.1007/s11222-013-9386-4

  • DOI: https://doi.org/10.1007/s11222-013-9386-4