DOI: 10.1145/3318299.3318308

Ensemble-Initialized k-Means Clustering

Published: 22 February 2019

Abstract

As one of the most classical clustering techniques, k-means clustering has been widely used in various areas over the past few decades. Despite its significant success, several challenging issues remain in k-means clustering research, one of which is its high sensitivity to the selection of the initial cluster centers. In this paper, we propose a new cluster center initialization method for k-means based on ensemble learning. Specifically, an ensemble of base clusterings is first constructed by running multiple k-means clusterers with random initializations. A co-association matrix is then computed from the base clusterings, upon which the agglomerative clustering algorithm is performed to build a pre-clustering result. From this pre-clustering, the set of initial cluster centers is obtained and then used for the final k-means clustering process. Experiments on multiple real-world datasets demonstrate the superiority of the proposed method.


Cited By

  • (2022) "A privacy-preserving recommendation method with clustering and locality-sensitive hashing." Computational Intelligence, 39(1):121-144. DOI: 10.1111/coin.12549. Online publication date: 16 Sep 2022.
  • (2022) "Web table data integration based on smart campus scenarios to resolve name disambiguation of scientific research personnel." 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), pages 602-607. DOI: 10.1109/COMPSAC54236.2022.00106. Online publication date: Jun 2022.


Published In

ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing
February 2019
563 pages
ISBN:9781450366007
DOI:10.1145/3318299
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Data clustering
  2. consensus clustering
  3. ensemble clustering
  4. k-means

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NSFC



