Abstract
Center-based clustering algorithms, such as K-means (KM) and EM, form one of the most popular classes of clustering algorithms in use today. The author developed another variation in this family, K-Harmonic Means (KHM). It has been demonstrated on a small number of “benchmark” datasets that KHM is more robust than K-means and EM. In this paper, we compare their performance statistically. We run K-means, K-Harmonic Means and EM on each of 3600 pairs of (dataset, initialization) to compare the statistical average and variation of the performance of these algorithms. The results show that, for low-dimensional datasets, KHM performs consistently better than KM, and KM performs consistently better than EM, over a wide range of clustered-ness of the datasets and a wide range of initializations. Some of the reasons that contribute to this difference are explained.
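The experimental design described above (every algorithm run on every (dataset, initialization) pair, then summarized by average and variation) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual setup: the dataset generator `make_dataset`, the harness `run_experiment`, the grid sizes, and the use of the K-means objective as a common quality measure are all assumptions made for the example, and only plain K-means is plugged in. KHM and EM would be added as further entries in the `algorithms` dictionary.

```python
import random
import statistics


def make_dataset(rng, k=3, points_per_cluster=30, spread=0.5):
    """Hypothetical generator: k Gaussian blobs in 2-D.

    A smaller `spread` gives a more strongly clustered dataset.
    """
    centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(k)]
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers for _ in range(points_per_cluster)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(data, centers):
    """K-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in data)


def kmeans(data, centers, iterations=50):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)),
                          key=lambda j: sq_dist(x, centers[j]))
            groups[nearest].append(x)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def run_experiment(algorithms, n_datasets=5, n_inits=4, k=3, seed=0):
    """Run every algorithm on every (dataset, initialization) pair.

    Returns {name: (mean cost, standard deviation of cost)}.
    """
    rng = random.Random(seed)
    costs = {name: [] for name in algorithms}
    for _ in range(n_datasets):
        data = make_dataset(rng, k=k)
        for _ in range(n_inits):
            init = rng.sample(data, k)  # the same start is given to all algorithms
            for name, algo in algorithms.items():
                costs[name].append(kmeans_cost(data, algo(data, init)))
    return {name: (statistics.mean(c), statistics.stdev(c))
            for name, c in costs.items()}


summary = run_experiment({"KM": kmeans})
```

Giving every algorithm the same initialization within a pair keeps the comparison paired, so differences in the summary reflect the algorithms rather than luck in the starting centers.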
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, B. (2003). Comparison of the Performance of Center-Based Clustering Algorithms. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_7
DOI: https://doi.org/10.1007/3-540-36175-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04760-5
Online ISBN: 978-3-540-36175-6
eBook Packages: Springer Book Archive