Abstract
This paper proposes assigning points to clusters, represented by their centers, based on weighted expected distances. The proposed clustering algorithm includes mechanisms to create new clusters, to merge two nearby clusters, to remove very small clusters, and to identify points as ‘noise’ when they lie beyond a reasonable neighborhood of a center or belong to a cluster with very few points. The algorithm is evaluated on four randomly generated and two well-known data sets. The resulting clustering is compared with those of other clustering algorithms through visualization of the clusters, the value of the Davies-Bouldin (DB) validity measure, and the sum of within-cluster distances. This preliminary comparison indicates that the proposed clustering algorithm is efficient and effective.
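The two numerical criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean (not squared) distances, takes each cluster center to be the mean of its members, and uses the common form of the DB measure (average distance to the center as within-cluster scatter, Euclidean distance between centers as separation). The function names and the toy data are hypothetical.

```python
import math


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def centroid(members):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def sum_within_cluster_distances(points, labels):
    """Sum of Euclidean distances from each point to its cluster center.

    Assumption: the center is the mean of the cluster's members.
    """
    total = 0.0
    for members in group_by_label(points, labels).values():
        center = centroid(members)
        total += sum(math.dist(p, center) for p in members)
    return total


def davies_bouldin(points, labels):
    """DB index: mean over clusters of the worst (S_i + S_j) / M_ij ratio.

    S_i is the average distance of cluster i's points to its center and
    M_ij is the distance between centers i and j. Lower values indicate
    compact, well-separated clusters. Coincident centers would cause a
    division by zero; this sketch does not guard against that.
    """
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("the DB index needs at least two clusters")
    centers = {k: centroid(m) for k, m in clusters.items()}
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in m) / len(m)
        for k, m in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


# Toy data (hypothetical): two tight, well-separated clusters in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
print(sum_within_cluster_distances(points, labels))
print(davies_bouldin(points, labels))
```

For both criteria a smaller value is better, so two clusterings of the same data set can be ranked by computing each value on their label assignments.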
This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Unit Project Scope UIDB/00319/2020.
Notes
1. Mostapha Kalami Heris, Evolutionary Data Clustering in MATLAB, Yarpiz, 2015. Available at https://yarpiz.com/64/ypml101-evolutionary-clustering
Acknowledgments
The authors wish to thank three anonymous referees for their comments and suggestions to improve the paper.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rocha, A.M.A.C., Costa, M.F.P., Fernandes, E.M.G.P. (2021). A Simple Clustering Algorithm Based on Weighted Expected Distances. In: Pereira, A.I., et al. Optimization, Learning Algorithms and Applications. OL2A 2021. Communications in Computer and Information Science, vol 1488. Springer, Cham. https://doi.org/10.1007/978-3-030-91885-9_7
DOI: https://doi.org/10.1007/978-3-030-91885-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91884-2
Online ISBN: 978-3-030-91885-9
eBook Packages: Computer Science (R0)