Clustering Ensemble Based on Sample’s Certainty

Ji, Xia; Liu, Shuaishuai; Zhao, Peng; Li, Xuejun; Liu, Qiong

doi:10.1007/s12559-021-09876-z


Published: 25 May 2021

Cognitive Computation, Volume 13, pages 1034–1046 (2021)


A Correction to this article was published on 23 November 2021

This article has been updated

Abstract

The objective of clustering ensemble is to fuse multiple base partitions (BPs) to find the underlying data structure. It has been observed that a sample can change its neighbors across different BPs, and that the neighbor relationships of different samples differ in stability. This difference suggests that samples may contribute differently to the detection of the underlying data structure. In addition, clustering ensemble aims to integrate the inconsistent parts of BPs by first extracting their consistent parts. However, existing clustering ensemble methods treat all samples equally: they consider neither the stability of a sample's neighbor relationships nor whether the sample belongs to the consistent or the inconsistent part of the BPs. To tackle these deficiencies, we introduce the certainty of a sample to quantify the stability of its neighbor relationships and propose a formula to calculate this certainty. We then develop a clustering ensemble algorithm based on the sample's certainty. It rests on the following idea: the neighbor relationships of cluster core samples are more stable across BPs, and samples from different cluster cores usually do not become neighbors in BPs. This idea forms the basis of the clustering ensemble process. According to the sample's certainty, the algorithm divides a dataset into two subsets: cluster core samples and cluster halo samples. It then discovers a clear core structure from the cluster core samples and gradually assigns the cluster halo samples to that core structure. Experiments on six synthetic datasets illustrate how the algorithm works. The algorithm performs well and outperforms twelve state-of-the-art clustering ensemble algorithms on twelve real datasets.
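The core/halo pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the paper's certainty formula is not reproduced on this page, so `certainty` below is a hypothetical proxy built on the co-association matrix (the fraction of BPs in which two samples share a cluster). The function names `co_association`, `certainty`, and `ensemble`, the parameters `core_threshold` and `link_threshold`, and the connected-component and greedy halo-assignment steps are all illustrative choices.

```python
import numpy as np


def co_association(base_partitions):
    """Fraction of BPs in which each pair of samples shares a cluster.

    base_partitions: array-like of shape (m, n); row b holds the labels of BP b.
    """
    bps = np.asarray(base_partitions)
    m, n = bps.shape
    C = np.zeros((n, n))
    for labels in bps:
        C += labels[:, None] == labels[None, :]
    return C / m


def certainty(C):
    """Hypothetical certainty proxy (NOT the paper's formula).

    For sample i, average |2*C[i, j] - 1| over every j that was i's neighbor
    (same cluster) in at least one BP. Neighbors kept in every BP score 1;
    neighbors that flip between BPs (C near 0.5) score near 0.
    """
    n = C.shape[0]
    cert = np.zeros(n)
    for i in range(n):
        mask = C[i] > 0
        mask[i] = False  # ignore the sample itself
        cert[i] = np.mean(np.abs(2 * C[i, mask] - 1)) if mask.any() else 1.0
    return cert


def ensemble(base_partitions, core_threshold=0.6, link_threshold=1.0):
    C = co_association(base_partitions)
    cert = certainty(C)
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)

    # Split: samples whose certainty reaches the threshold form the cores.
    core = np.flatnonzero(cert >= core_threshold)
    if core.size == 0:  # assumption: fall back to the single most certain sample
        core = np.array([int(np.argmax(cert))])

    # Core structure: connected components of core samples, linking two
    # core samples when they share a cluster in >= link_threshold of the BPs.
    n_clusters = 0
    for start in core:
        if labels[start] != -1:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in core:
                if labels[j] == -1 and C[i, j] >= link_threshold:
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1

    # Halo assignment, one sample at a time: attach the unassigned sample
    # with the strongest mean co-association to any existing cluster.
    while (labels == -1).any():
        halo = np.flatnonzero(labels == -1)
        scores = np.array([[C[i, labels == k].mean() for k in range(n_clusters)]
                           for i in halo])
        r, k = np.unravel_index(np.argmax(scores), scores.shape)
        labels[halo[r]] = k
    return labels
```

For example, with four BPs over seven samples where samples 0–2 and 4–6 always stay together while sample 3 drifts between them, `ensemble([[0,0,0,1,1,1,1], [0,0,0,0,1,1,1], [2,2,2,1,0,0,0], [0,0,0,1,1,1,1]])` treats sample 3 as a halo sample and returns the labels `[0, 0, 0, 1, 1, 1, 1]`. Assigning halo samples one at a time lets each newly attached sample enlarge its cluster before the next decision, loosely mirroring the abstract's "gradually assigns" step.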


Change history

23 November 2021
A Correction to this paper has been published: https://doi.org/10.1007/s12559-021-09957-z


Funding

This work was supported in part by the Natural Science Foundation of China under Grants 61402004 and 61672034, in part by the Key Research and Development Program of Anhui Province under Grant 1804d08020309, in part by the Natural Science Foundation of Anhui Province under Grant 1908085MF188, and in part by the Key Project of Natural Science Foundation of Anhui Provincial Department of Education under Grant KJ2020A0041.

Author information

Authors and Affiliations

School of Computer Science and Technology, Anhui University, Hefei, 230601, China
Xia Ji, Shuaishuai Liu, Peng Zhao & Xuejun Li
Science and Technology Information Office, Department of Public Security of Anhui Province, Hefei, 230601, China
Qiong Liu


Corresponding author

Correspondence to Xuejun Li.

Ethics declarations

Ethical Approval

This article does not contain any studies with human or animal subjects performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Conflict of Interest

The authors declare no competing interests.

Additional information

The original version of this article has been revised. Funding information has been corrected.


About this article

Cite this article

Ji, X., Liu, S., Zhao, P. et al. Clustering Ensemble Based on Sample’s Certainty. Cogn Comput 13, 1034–1046 (2021). https://doi.org/10.1007/s12559-021-09876-z


Received: 17 October 2020
Accepted: 29 April 2021
Published: 25 May 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s12559-021-09876-z
