Abstract
Initialisation of the EM algorithm in model-based clustering is often crucial. Various starting points in the parameter space often lead to different local maxima of the likelihood function and, so to different clustering partitions. Among the several approaches available in the literature, model-based agglomerative hierarchical clustering is used to provide initial partitions in the popular mclust R package. This choice is computationally convenient and often yields good clustering partitions. However, in certain circumstances, poor initial partitions may cause the EM algorithm to converge to a local maximum of the likelihood function. We propose several simple and fast refinements based on data transformations and illustrate them through data examples.
Similar content being viewed by others
References
Auder B, Lebret R, Lovleff S, Langrognet F (2014) Rmixmod: an interface for MIXMOD. http://CRAN.R-project.org/package=Rmixmod, R package version 2.0.2
Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3):561–575
Biernacki C, Celeux G, Govaert G, Langrognet F (2006) Model-based cluster and discriminant analysis with the MIXMOD software. Comput Stat Data Anal 51:587–600
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Series B Stat Methodol 39:1–38
Everitt B, Landau S, Leese M, Stahl D (2011) Cluster analysis, 5th edn. Wiley, Chichester, UK
Flury B (1997) A first course in multivariate statistics. Springer, New York
Forina M, Armanino C, Castino M, Ubigli M (1986) Multivariate data analysis as a discriminating method of the origin of wines. Vitis 25:189–201
Fraley C (1998) Algorithms for model-based Gaussian hierarchical clustering. SIAM J Sci Compu 20(1):270–281
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631
Fraley C, Raftery AE, Murphy TB, Scrucca L (2012) MCLUST version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report 597, Department of Statistics, University of Washington
Fraley C, Raftery AE, Scrucca L (2015) mclust: normal mixture modelling for model-based clustering, classification, and density estimation. http://CRAN.R-project.org/package=mclust, R package version 5.0.1
Gordon AD (1999) Classification, 2nd edn. Chapman & Hall/CRC
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Inc
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, UK
Maitra R (2009) Initializing partition-optimization algorithms. IEEE/ACM Trans Comput Biol Bioinform 6(1):144–157
McLachlan G, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley-Interscience, Hoboken, New Jersey
McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York
McLachlan GJ (1988) On the choice of starting values for the EM algorithm in fitting mixture models. Statistician 37(4/5):417
McNicholas PD, ElSherbiny A, McDaid AF, Murphy TB (2015) pgmm: Parsimonious Gaussian Mixture Models. http://CRAN.R-project.org/package=pgmm, R package version 1.2
Melnykov V, Maitra R (2010) Finite mixture models and model-based clustering. Stat Surv 4:80–116
Melnykov V, Melnykov I (2012) Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Comput Stat Data Anal 56(6):1381–1395
Milligan GW, Cooper MC (1986) A study of the comparability of external criteria for hierarchical cluster analysis. Multivar Behav Res 21(4):441–458
Raftery AE, Dean N (2006) Variable selection for model-based clustering. J Am Stat Assoc 101(473):168–178
Schwartz G (1978) Estimating the dimension of a model. Ann Stat 6:31–38
Wu CJ (1983) On the convergence properties of the EM algorithm. Ann Stat 11(1):95–103
Acknowledgments
The authors are grateful to the Coordinating Editor and two referees for their very helpful comments. This work was supported by NIH Grants R01-HD054511, R01-HD070936 and U54-HL127624, and by Science Foundation Ireland Walton Research Fellowship Number 11/W.1/I2079.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Scrucca, L., Raftery, A.E. Improved initialisation of model-based clustering using Gaussian hierarchical partitions. Adv Data Anal Classif 9, 447–460 (2015). https://doi.org/10.1007/s11634-015-0220-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11634-015-0220-z