DOI: 10.1145/3318299.3318308

Ensemble-Initialized k-Means Clustering

Published: 22 February 2019

Abstract

As one of the most classical clustering techniques, k-means clustering has been widely used in various areas over the past few decades. Despite its significant success, several challenging issues remain in k-means clustering research, one of which is its high sensitivity to the selection of the initial cluster centers. In this paper, we propose a new cluster center initialization method for k-means based on ensemble learning. Specifically, an ensemble of base clusterings is first constructed by running multiple k-means clusterers with random initializations. A co-association matrix is then computed from the base clusterings, upon which the agglomerative clustering algorithm is performed to build a pre-clustering result. From this pre-clustering, the set of initial cluster centers is obtained and then used for the final k-means clustering process. Experiments on multiple real-world datasets demonstrate the superiority of the proposed method.


Cited By

  • (2022) "A privacy-preserving recommendation method with clustering and locality-sensitive hashing." Computational Intelligence, 39(1):121-144. DOI: 10.1111/coin.12549. Online publication date: 16 Sep 2022.
  • (2022) "Web table data integration based on smart campus scenarios to resolve name disambiguation of scientific research personnel." 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), pages 602-607. DOI: 10.1109/COMPSAC54236.2022.00106. Online publication date: Jun 2022.


Published In

ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing
February 2019
563 pages
ISBN:9781450366007
DOI:10.1145/3318299
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Data clustering
  2. consensus clustering
  3. ensemble clustering
  4. k-means

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NSFC



