Improved initialisation of model-based clustering using Gaussian hierarchical partitions

Scrucca, Luca; Raftery, Adrian E.

doi:10.1007/s11634-015-0220-z

Regular Article
Published: 26 October 2015

Volume 9, pages 447–460 (2015)

Advances in Data Analysis and Classification

Luca Scrucca¹ &
Adrian E. Raftery²

Abstract

Initialisation of the EM algorithm in model-based clustering is often crucial. Different starting points in the parameter space often lead to different local maxima of the likelihood function and hence to different clustering partitions. Among the several approaches available in the literature, model-based agglomerative hierarchical clustering is used to provide initial partitions in the popular mclust R package. This choice is computationally convenient and often yields good clustering partitions. However, in certain circumstances, poor initial partitions may cause the EM algorithm to converge to a local maximum of the likelihood function. We propose several simple and fast refinements based on data transformations and illustrate them through data examples.
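To make the dependence on the starting partition concrete, the following is a minimal Python sketch, not the authors' method and not the mclust implementation. It runs EM for a Gaussian mixture with unconstrained covariances from three initial partitions: a random one, one from a naive Ward agglomeration (used here only as a simple stand-in for model-based agglomerative clustering), and one from the same agglomeration applied to z-scored data. The z-scoring is purely an illustrative transformation chosen for this sketch; the abstract does not say which transformations the paper proposes. The function names, the toy data, and the small ridge added to the covariances are all assumptions made for illustration.

```python
import numpy as np

def ward_partition(X, G):
    """Naive agglomerative clustering (Ward's criterion); returns G-group labels."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > G:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                cost = na * nb / (na + nb) * (diff @ diff)  # increase in within-group SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the cheapest pair
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for g, members in enumerate(clusters):
        labels[members] = g
    return labels

def log_gauss(X, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_from_partition(X, labels, G, n_iter=50, ridge=1e-6):
    """EM started with an M-step on a hard partition.

    Returns the log-likelihood after each M-step and the final hard labels.
    The ridge is a numerical safeguard added for this sketch only.
    """
    n, d = X.shape
    Z = np.zeros((n, G))
    Z[np.arange(n), labels] = 1.0  # hard memberships from the initial partition
    history = []
    for _ in range(n_iter):
        # M-step: mixing proportions, means and covariances from memberships
        nk = Z.sum(axis=0)
        props = nk / n
        means = (Z.T @ X) / nk[:, None]
        covs = []
        for k in range(G):
            diff = X - means[k]
            covs.append((Z[:, k, None] * diff).T @ diff / nk[k] + ridge * np.eye(d))
        # E-step: posterior memberships, computed on the log scale
        logp = np.column_stack(
            [np.log(props[k]) + log_gauss(X, means[k], covs[k]) for k in range(G)]
        )
        m = logp.max(axis=1, keepdims=True)
        log_mix = m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))
        history.append(log_mix.sum())
        Z = np.exp(logp - log_mix[:, None])
    return np.array(history), Z.argmax(axis=1)

# Toy data (an assumption of this sketch): three well-separated 2-D groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(c, 1.0, size=(20, 2)) for c in centers])
G = 3

starts = {
    "random partition": rng.integers(0, G, size=len(X)),
    "hierarchical (Ward)": ward_partition(X, G),
    "hierarchical on z-scores": ward_partition((X - X.mean(axis=0)) / X.std(axis=0), G),
}
results = {name: em_from_partition(X, init, G) for name, init in starts.items()}
for name, (history, _) in results.items():
    print(f"{name}: final log-likelihood = {history[-1]:.2f}")
```

Comparing the printed log-likelihoods across starts shows whether the runs reached the same maximum. On harder data than this toy example, the starts may well disagree, which is the situation the abstract describes.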

Acknowledgments

The authors are grateful to the Coordinating Editor and two referees for their very helpful comments. This work was supported by NIH Grants R01-HD054511, R01-HD070936 and U54-HL127624, and by Science Foundation Ireland Walton Research Fellowship Number 11/W.1/I2079.

Author information

Authors and Affiliations

Dipartimento di Economia, Università degli Studi di Perugia, Via A. Pascoli 20, 06123, Perugia, Italy
Luca Scrucca
Department of Statistics, University of Washington, Box 354322, Seattle, Washington, 98195-4322, USA
Adrian E. Raftery

Corresponding author

Correspondence to Luca Scrucca.

About this article

Cite this article

Scrucca, L., Raftery, A.E. Improved initialisation of model-based clustering using Gaussian hierarchical partitions. Adv Data Anal Classif 9, 447–460 (2015). https://doi.org/10.1007/s11634-015-0220-z

Received: 30 November 2014
Revised: 03 October 2015
Accepted: 12 October 2015
Published: 26 October 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s11634-015-0220-z

Keywords

Mathematics Subject Classification

62H30 (Classification and discrimination; cluster analysis)
