Abstract
Nature is a source of concepts, mechanisms, and principles for designing artificial computing systems that handle complex computational problems. Most heuristic and metaheuristic algorithms are modeled on the behavior of biological or physical systems in nature. Clustering is the process of partitioning a set of data into groups of similar examples. Because the clustering problem is NP-hard, metaheuristics are an appropriate tool for solving it. Indeed, clustering is a special case of an optimization problem. In classic clustering, the number of clusters must be known before clustering begins. This paper presents an algorithm that requires no such prior knowledge to group the data. We propose a swarm-based Emperor Penguins Colony (EPC) algorithm to solve both classic and automatic clustering problems. The proposed approach is compared with six state-of-the-art, popular, and improved nature-inspired algorithms, a partitioning-based heuristic algorithm, and a hierarchical clustering method on ten real-world datasets. The results show that classic and automatic clustering using the EPC algorithm performs better than the competing algorithms.
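To make the view of clustering as an optimization problem concrete, the following minimal sketch scores a candidate set of centroids by its within-cluster sum of squared distances, which a metaheuristic could then minimize. It is an illustration under stated assumptions and not the EPC algorithm itself: the function names, the toy data, and the activation-threshold encoding used to let a candidate choose its own number of clusters are all hypothetical.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def sse(points, centroids):
    """Within-cluster sum of squared Euclidean distances (lower is better)."""
    labels = assign(points, centroids)
    return sum(math.dist(p, centroids[j]) ** 2 for p, j in zip(points, labels))

def decode(candidate, threshold=0.5):
    """Keep only the centroids whose activation value exceeds the threshold.

    `candidate` is a list of (activation, centroid) pairs. This encoding is
    an illustrative assumption, not necessarily the one used by EPC.
    """
    return [c for a, c in candidate if a > threshold]

# Toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

# Classic clustering: the number of centroids (2) is fixed in advance.
good = [(0.0, 0.5), (10.0, 10.5)]
bad = [(5.0, 5.0), (5.0, 6.0)]
print(sse(points, good), sse(points, bad))

# Automatic clustering: the candidate itself decides how many centroids are active.
candidate = [(0.9, (0.0, 0.5)), (0.2, (5.0, 5.0)), (0.7, (10.0, 10.5))]
active = decode(candidate)
print(len(active), sse(points, active))
```

Because the sum of squared distances generally shrinks as more centroids are added, a fitness function for the automatic case would normally be a cluster validity index rather than this raw score; the sketch uses it only to keep the example short.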
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Harifi, S., Khalilian, M. & Mohammadzadeh, J. Swarm based automatic clustering using nature inspired Emperor Penguins Colony algorithm. Evolving Systems 14, 1083–1099 (2023). https://doi.org/10.1007/s12530-023-09507-y
DOI: https://doi.org/10.1007/s12530-023-09507-y