
A Framework for Hierarchical Ensemble Clustering

Published: 23 September 2014

    Abstract

    Ensemble clustering, an important extension of the clustering problem, combines different (input) clusterings of a given dataset into a final (consensus) clustering that is a better fit, in some sense, than the existing clusterings. Over the past few years, many ensemble clustering approaches have been developed. However, most of them are designed for partitional clustering methods, and few research efforts have been reported for ensemble hierarchical clustering methods. In this article, we propose a hierarchical ensemble clustering framework that can naturally combine both partitional and hierarchical clustering results. In addition, we develop a novel method that learns the ultra-metric distance from the aggregated distance matrices and generates a final hierarchical clustering with enhanced cluster separation. We study three important problems: dendrogram description, dendrogram combination, and dendrogram selection. We develop two approaches for dendrogram selection based on tree distances, and we investigate various dendrogram distances for representing dendrograms. We provide a systematic empirical study of the ensemble hierarchical clustering problem. Experimental results demonstrate the effectiveness of our proposed approaches.
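The idea of aggregating distance matrices and then fitting an ultra-metric can be illustrated with a minimal sketch. It is not the learning method developed in the article: it assumes each flat partition is described by a 0/1 co-membership distance, assumes each dendrogram has already been described by a pairwise distance matrix (for example, normalized cophenetic distances), averages the matrices, and then substitutes the classical subdominant ultra-metric (minimax-path closure, equivalent to single linkage) for the fitting step. All function names and the toy data are hypothetical.

```python
def partition_distance(labels):
    """0/1 distance matrix of a flat clustering: 0 if same cluster, else 1."""
    n = len(labels)
    return [[0.0 if labels[i] == labels[j] else 1.0 for j in range(n)]
            for i in range(n)]


def aggregate(matrices):
    """Element-wise mean of equally sized distance matrices."""
    n = len(matrices[0])
    m = len(matrices)
    return [[sum(d[i][j] for d in matrices) / m for j in range(n)]
            for i in range(n)]


def subdominant_ultrametric(dist):
    """Minimax-path closure: u[i][j] is the smallest possible 'largest edge'
    over all paths from i to j (Floyd-Warshall with max in place of +)."""
    n = len(dist)
    u = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_ultrametric(u, tol=1e-12):
    """Check the strong triangle inequality u(i,j) <= max(u(i,k), u(k,j))."""
    n = len(u)
    return all(u[i][j] <= max(u[i][k], u[k][j]) + tol
               for i in range(n) for j in range(n) for k in range(n))


# Toy ensemble on four objects: two flat partitions and one dendrogram that is
# assumed to be given as a normalized cophenetic distance matrix.
p1 = partition_distance([0, 0, 1, 1])
p2 = partition_distance([0, 0, 0, 1])
dendro = [
    [0.0, 0.2, 0.6, 1.0],
    [0.2, 0.0, 0.6, 1.0],
    [0.6, 0.6, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]

agg = aggregate([p1, p2, dendro])
u = subdominant_ultrametric(agg)
print(is_ultrametric(agg), is_ultrametric(u))
```

Because an ultra-metric corresponds to a dendrogram, the matrix `u` can be read as a consensus hierarchy: objects merge at the height given by their entry in `u`. The closure used here only ever lowers distances, which is one reason a learned ultra-metric may be preferable in practice.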



        Published In

        ACM Transactions on Knowledge Discovery from Data  Volume 9, Issue 2
        November 2014
        193 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/2672614
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 23 September 2014
        Accepted: 01 March 2014
        Revised: 01 January 2014
        Received: 01 July 2013
        Published in TKDD Volume 9, Issue 2


        Author Tags

        1. Hierarchical ensemble clustering
        2. ensemble selection
        3. ultra-metric

        Qualifiers

        • Research-article
        • Research
        • Refereed
