Abstract
The objective of a clustering ensemble is to fuse multiple base partitions (BPs) to uncover the underlying data structure. It has been observed that a sample can change its neighbors across different BPs, and that the stability of these neighbor relationships differs from sample to sample. This difference suggests that samples may contribute differently to the detection of the underlying data structure. In addition, a clustering ensemble aims to integrate the inconsistent parts of the BPs by first extracting the consistent parts. However, existing clustering ensemble methods treat all samples equally: they consider neither the stability of a sample’s relationships nor whether the sample belongs to the consistent or the inconsistent part of the BPs. To address these deficiencies, we introduce the certainty of a sample to quantify the stability of its neighbor relationships and propose a formula to calculate this certainty. We then develop a clustering ensemble algorithm based on the sample’s certainty. It rests on the following idea: the neighbor relationships of cluster cores are more stable across BPs, and different cluster cores usually do not form neighbor relationships in BPs. According to each sample’s certainty, the algorithm divides a dataset into two subsets: cluster core samples and cluster halo samples. It then discovers a clear core structure from the cluster core samples and gradually assigns the cluster halo samples to that structure. Experiments on six synthetic datasets illustrate how the algorithm works. On twelve real datasets, the algorithm outperforms twelve state-of-the-art clustering ensemble algorithms.
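The core/halo procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the certainty formula, so everything below is an assumption made for illustration: the certainty measure (mean of |2c − 1| over a sample's co-association values c), the fixed `threshold` of 0.8, the use of connected components to form the core structure, and all function names are hypothetical and are not the authors' method.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]


def certainty(co):
    """Hypothetical certainty (an assumption, not the paper's formula):
    mean of |2c - 1| over a sample's pairs; 1.0 means every BP agrees."""
    n = len(co)
    return [sum(abs(2 * co[i][j] - 1) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def consensus(partitions, threshold=0.8):
    """Split samples into core and halo by certainty, cluster the core,
    then attach halo samples one at a time (threshold is an assumed parameter)."""
    co = co_association(partitions)
    cert = certainty(co)
    n = len(co)
    core = [i for i in range(n) if cert[i] >= threshold]
    # Halo samples are processed from most to least certain.
    halo = sorted((i for i in range(n) if cert[i] < threshold),
                  key=lambda i: -cert[i])
    if not core:
        raise ValueError("no core samples; lower the threshold")

    # Core structure: connected components of core samples that share a
    # cluster in more than half of the base partitions.
    labels = [None] * n
    clusters = []
    for seed in core:
        if labels[seed] is not None:
            continue
        labels[seed] = len(clusters)
        members, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            for j in core:
                if labels[j] is None and co[i][j] > 0.5:
                    labels[j] = len(clusters)
                    members.append(j)
                    stack.append(j)
        clusters.append(members)

    # Each halo sample joins the cluster with the highest mean co-association
    # to its current members, and then counts as a member itself.
    for i in halo:
        best = max(range(len(clusters)),
                   key=lambda c: sum(co[i][j] for j in clusters[c]) / len(clusters[c]))
        labels[i] = best
        clusters[best].append(i)
    return labels


base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, permuted label names
    [0, 0, 1, 1, 1, 1],  # sample 2 switches neighbors
]
print(consensus(base_partitions))
```

In this toy input, sample 2 changes neighbors in one of the three BPs, so its certainty under the assumed measure is 1/3 and it is treated as a halo sample; the other five samples form two cores. The script prints `[0, 0, 0, 1, 1, 1]`.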
Change history
23 November 2021
A Correction to this paper has been published: https://doi.org/10.1007/s12559-021-09957-z
Funding
This work was supported in part by the Natural Science Foundation of China under Grants 61402004 and 61672034, in part by the Key Research and Development Program of Anhui Province under Grant 1804d08020309, in part by the Natural Science Foundation of Anhui Province under Grant 1908085MF188, and in part by the Key Project of Natural Science Foundation of Anhui Provincial Department of Education under Grant KJ2020A0041.
Ethics declarations
Ethical Approval
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Conflict of Interest
The authors declare no competing interests.
Additional information
The original version of this article has been revised. Funding information has been corrected.
Cite this article
Ji, X., Liu, S., Zhao, P. et al. Clustering Ensemble Based on Sample’s Certainty. Cogn Comput 13, 1034–1046 (2021). https://doi.org/10.1007/s12559-021-09876-z
DOI: https://doi.org/10.1007/s12559-021-09876-z