Abstract
We analyze the influence of cluster shape on the performance of four cluster validation criteria: AIC, BIC, ICL and NI. First, we introduce a method to generate unimodal, radially symmetric clusters whose shape can be interpolated between peaked, long-tailed distributions and flat distributions using a single parameter. Normally distributed clusters are obtained as a special case. We then systematically study the performance of AIC, BIC, ICL and NI when validating clusters of arbitrary shape. Using problems with two clusters, different inter-cluster distances and different dimensions, we show that, while BIC provides the best results for normally distributed clusters, ICL or NI may be a better choice in a general context with high-dimensional data and unknown cluster distributions.
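To make the idea of a single shape parameter concrete, the following is a minimal sketch of one possible generator. It is not necessarily the construction used in the paper: it assumes an exponential power density, \(p(x) \propto \exp(-\tfrac{1}{2}\Vert x - c\Vert^{2s})\), and the names `sample_cluster` and `shape` are hypothetical. Under this assumption, \(s = 1\) recovers a normal cluster, \(s < 1\) gives a peaked, long-tailed cluster, and large \(s\) approaches a flat cluster.

```python
import math
import random


def sample_cluster(n, dim, shape, center=None, rng=None):
    """Draw n points from a radially symmetric, unimodal cluster in R^dim.

    Assumed density (an illustration, not taken from the paper):
        p(x) ∝ exp(-0.5 * ||x - center|| ** (2 * shape))
    shape = 1 gives a standard normal cluster, shape < 1 a peaked,
    long-tailed cluster, and large shape an almost flat cluster.
    """
    rng = rng or random.Random()
    center = center if center is not None else [0.0] * dim
    points = []
    for _ in range(n):
        # Uniform random direction: normalize a standard normal vector.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        # Radius: u = r ** (2 * shape) follows Gamma(dim / (2 * shape), scale=2).
        u = rng.gammavariate(dim / (2.0 * shape), 2.0)
        r = u ** (1.0 / (2.0 * shape))
        points.append([c + r * v / norm for c, v in zip(center, g)])
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    for s in (0.5, 1.0, 10.0):
        pts = sample_cluster(5000, 2, s, rng=rng)
        radii = sorted(math.sqrt(sum(v * v for v in p)) for p in pts)
        # Median and maximum radius hint at how heavy the tail is.
        print(f"shape={s}: median r={radii[len(radii) // 2]:.2f}, max r={radii[-1]:.2f}")
```

The direction and the radius are sampled independently, which is what makes the cluster radially symmetric; only the radial law depends on `shape`. Two such clusters placed at a chosen distance give a test problem of the kind described above.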
Notes
1. Note that \(\hat{\beta }(1) = \beta (1)\), as there is only one trivial solution for \(n_{c} = 1\).
2. In the original formulation in [18], there is an additional constant term that has been omitted here for the sake of simplicity. This term depends only on the covariance matrix of the entire data set, which is constant for any particular problem.
References
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–38 (1977)
Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. B 59(4), 731–792 (1997)
Rasmussen, C.E.: The infinite Gaussian mixture model. In: Solla, S.A., Leen, T.K., Müller, K.-R. (eds.) Advances in Neural Information Processing Systems [NIPS Conference, Denver, Colorado, USA, 29 November–4 December 1999], vol. 12, pp. 554–560. The MIT Press (1999)
Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9(2), 249–265 (2000)
Figueiredo, M.A.T., Jain, A.K.: Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 381–396 (2002)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Series in Probability and Statistics. Wiley, New York (2000)
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Fraley, C., Raftery, A.E.: How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput. J. 41(8), 578–588 (1998)
Gordon, A.D.: Cluster validation. In: Hayashi, C., Yajima, K., Bock, H.H., Ohsumi, N., Tanaka, Y., Baba, Y. (eds.) Data Science, Classification and Related Methods, pp. 22–39. Springer, Heidelberg (1998)
Bozdogan, H.: Choosing the number of component clusters in the mixture-model using a new information complexity criterion of the inverse-Fisher information matrix. In: Opitz, O., Lausen, B., Klar, R. (eds.) Data Analysis and Knowledge Organization, pp. 40–54. Springer, Heidelberg (1993). https://doi.org/10.1007/978-3-642-50974-2_5
Biernacki, C., Celeux, G., Govaert, G.: An improvement of the NEC criterion for assessing the number of clusters in a mixture model. Pattern Recognit. Lett. 20(3), 267–272 (1999)
Bezdek, J.C., Li, W., Attikiouzel, Y., Windham, M.P.: A geometric approach to cluster validity for normal mixtures. Soft Comput. 1(4), 166–179 (1997)
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive comparative study of cluster validity indices. Pattern Recognit. 46(1), 243–256 (2013)
Rodriguez, M.Z., et al.: Clustering algorithms: a comparative approach. PLoS ONE 14, e0210236 (2019)
Biernacki, C., Celeux, G., Govaert, G.: Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22(7), 719–725 (2000)
Samé, A., Ambroise, C., Govaert, G.: An online classification EM algorithm based on the mixture model. Stat. Comput. 17(3), 209–218 (2007)
Lago-Fernández, L.F., Corbacho, F.J.: Normality-based validation for crisp clustering. Pattern Recognit. 43(3), 782–795 (2010)
Lago-Fernández, L.F., Sánchez-Montañés, M.A., Corbacho, F.J.: The effect of low number of points in clustering validation via the negentropy increment. Neurocomputing 74(16), 2657–2664 (2011)
Lago-Fernández, L.F., Sánchez-Montañés, M., Corbacho, F.: Fuzzy cluster validation using the partition negentropy criterion. In: Alippi, C., Polycarpou, M., Panayiotou, C., Ellinas, G. (eds.) ICANN 2009. LNCS, vol. 5769, pp. 235–244. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04277-5_24
Acknowledgments
This work was funded by grant S2017/BMD-3688 from Comunidad de Madrid, and by Spanish projects MINECO/FEDER TIN2017-84452-R and DPI2015-65833-P (http://www.mineco.gob.es/).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lago-Fernández, L.F., Aragón, J., Sánchez-Montañés, M. (2019). Validation of Unimodal Non-Gaussian Clusters. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_50
DOI: https://doi.org/10.1007/978-3-030-20518-8_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20517-1
Online ISBN: 978-3-030-20518-8
eBook Packages: Computer Science, Computer Science (R0)