Cluster’s Quality Evaluation and Selective Clustering Ensemble

Published: 27 June 2018

Abstract

    Clustering ensembles have drawn much attention in recent years because they can produce high-quality, robust partitions. Weighted clustering ensembles and selective clustering ensembles are two general ways to further improve the performance of a clustering ensemble method. Existing weighted clustering ensemble methods assign the same weight to every cluster in a partition of the ensemble. Because the clusters in a partition differ in quality, they should be weighted differently. To address this issue, this article proposes a new measure of the similarity between a cluster and a partition. Theoretically, this measure handles two problems that arise in measuring the quality of a cluster, defined here as the symmetric problem and the context meaning problem. Some properties of the proposed measure are also analyzed. The measure can be easily extended to a clustering performance measure that calculates the similarity between two partitions. Building on this measure, we propose a novel selective clustering ensemble framework that accounts for the difference between the objective of the ensemble selection stage and the objective of the ensemble integration stage. To verify the performance of the new measure, we compare it with two existing measures for weighting clusters; the experiments show that the proposed measure is more effective. To verify the performance of the novel framework, we use four existing state-of-the-art selective clustering ensemble frameworks as references; the experiments show that the proposed framework is statistically better than the others on 17 UCI benchmark datasets, 8 document datasets, and the Olivetti Face Database.
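
    The idea of weighting each cluster separately, rather than each partition, can be illustrated with a minimal sketch. The abstract does not define the proposed cluster-to-partition measure, so the code below swaps in a simple placeholder (the best Jaccard overlap between a cluster and any cluster of another partition). All function names are hypothetical, and this is not the article's method; it only shows where a per-cluster weight would enter a weighted co-association matrix.

```python
from itertools import combinations


def clusters_of(labels):
    """Group object indices by cluster label; returns a list of index sets."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    return list(groups.values())


def cluster_partition_similarity(cluster, partition_clusters):
    """PLACEHOLDER measure (an assumption, not the article's):
    best Jaccard overlap between `cluster` and any cluster of the partition."""
    return max(len(cluster & other) / len(cluster | other)
               for other in partition_clusters)


def cluster_weights(ensemble):
    """weights[i][j] = mean similarity of cluster j of partition i
    to every other partition in the ensemble (needs >= 2 partitions)."""
    parts = [clusters_of(labels) for labels in ensemble]
    weights = []
    for i, clusters in enumerate(parts):
        others = [p for k, p in enumerate(parts) if k != i]
        weights.append([
            sum(cluster_partition_similarity(c, o) for o in others) / len(others)
            for c in clusters
        ])
    return parts, weights


def weighted_coassociation(ensemble):
    """Co-association matrix in which each co-occurrence of two objects
    counts with the weight of the cluster that contains them."""
    n = len(ensemble[0])
    parts, weights = cluster_weights(ensemble)
    matrix = [[0.0] * n for _ in range(n)]
    for clusters, ws in zip(parts, weights):
        for cluster, w in zip(clusters, ws):
            for a, b in combinations(sorted(cluster), 2):
                matrix[a][b] += w / len(ensemble)
                matrix[b][a] += w / len(ensemble)
    return matrix


# Three base partitions of four objects, given as label vectors.
ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
coassoc = weighted_coassociation(ensemble)
```

    Under these assumptions, a consensus partition could then be obtained by running any similarity-based clustering algorithm on `coassoc`; a selective ensemble would instead drop low-scoring partitions before this integration step.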

    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 12, Issue 5
    October 2018
    354 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3234931
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 June 2018
    Accepted: 01 April 2018
    Revised: 01 April 2018
    Received: 01 October 2017
    Published in TKDD Volume 12, Issue 5

    Author Tags

    1. Clustering ensemble
    2. cluster quality
    3. selective clustering ensemble
    4. weighted clustering ensemble

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Natural Science Fund of China
    • Program for the Outstanding Innovative Teams of Higher Learning Institutions of Shanxi
    • Program for New Century Excellent Talents in University
    • Program for the Young San Jin Scholars of Shanxi
    • Hong Kong SAR Government
