A Framework for Hierarchical Ensemble Clustering

Published: 23 September 2014

Abstract

Ensemble clustering, an important extension of the clustering problem, refers to the problem of combining different (input) clusterings of a given dataset to generate a final (consensus) clustering that is, in some sense, a better fit than the existing clusterings. Many ensemble clustering approaches have been developed over the past few years. However, most of them are designed for partitional clustering methods, and few research efforts have been reported for ensemble hierarchical clustering methods. In this article, we propose a hierarchical ensemble clustering framework that naturally combines both partitional and hierarchical clustering results. In addition, based on the ultra-metric distance for hierarchical clustering, we develop a novel method that learns an ultra-metric distance from the aggregated distance matrices and generates a final hierarchical clustering with enhanced cluster separation. We study three important problems: dendrogram description, dendrogram combination, and dendrogram selection. We develop two approaches for dendrogram selection based on tree distances, and we investigate various dendrogram distances for representing dendrograms. We also provide a systematic empirical study of the ensemble hierarchical clustering problem. Experimental results demonstrate the effectiveness of the proposed approaches.
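The abstract names dendrogram description, dendrogram combination, ultra-metric fitting, and dendrogram selection only at a high level. The following is a minimal sketch of how such a pipeline can fit together, under stated assumptions; it is not the authors' method. Each input is described by an ultrametric distance matrix: a flat partition becomes a two-level ultrametric (0 within a cluster, 1 across), and a dendrogram is assumed to be supplied as its cophenetic matrix scaled to [0, 1]. The matrices are averaged, and because an average of ultrametrics is generally not itself ultrametric, the aggregate is projected onto its subdominant ultrametric (the largest ultrametric below it, which equals the single-linkage cophenetic distances). That standard projection is swapped in for the paper's learned ultra-metric with enhanced cluster separation, and the "nearest input" selection rule is a hypothetical stand-in for the paper's tree-distance-based selection. All function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]


def partition_to_ultrametric(labels: Sequence[int]) -> Matrix:
    """Describe a flat partition as a two-level ultrametric: 0 within a cluster, 1 across."""
    n = len(labels)
    return [[0.0 if labels[i] == labels[j] else 1.0 for j in range(n)] for i in range(n)]


def average_matrices(mats: Sequence[Matrix]) -> Matrix:
    """Combine the input descriptions by element-wise averaging."""
    n, k = len(mats[0]), len(mats)
    return [[sum(m[i][j] for m in mats) / k for j in range(n)] for i in range(n)]


def subdominant_ultrametric(d: Matrix) -> Matrix:
    """Largest ultrametric below d, i.e. the minimax (bottleneck) path distance.

    A Floyd-Warshall pass over the (min, max) semiring; the result equals the
    cophenetic distances of single-linkage clustering on d. This is a stand-in,
    not the paper's learned ultra-metric.
    """
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])  # bottleneck of the path i -> k -> j
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_ultrametric(u: Matrix, tol: float = 1e-12) -> bool:
    """Check the strong triangle inequality u(i, j) <= max(u(i, k), u(k, j))."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j]) + tol
        for i in range(n) for j in range(n) for k in range(n)
    )


def matrix_distance(a: Matrix, b: Matrix) -> float:
    """Euclidean distance between the upper triangles of two distance matrices."""
    n = len(a)
    return sum((a[i][j] - b[i][j]) ** 2 for i, j in combinations(range(n), 2)) ** 0.5


# Usage: two partitions and one dendrogram (as a cophenetic matrix) of 4 points.
p1 = partition_to_ultrametric([0, 0, 1, 1])
p2 = partition_to_ultrametric([0, 0, 0, 1])
h = [[0.0, 0.2, 0.8, 0.8],
     [0.2, 0.0, 0.8, 0.8],
     [0.8, 0.8, 0.0, 0.5],
     [0.8, 0.8, 0.5, 0.0]]
inputs = [p1, p2, h]

aggregate = average_matrices(inputs)
consensus = subdominant_ultrametric(aggregate)

# Hypothetical selection rule: keep the input whose matrix is nearest the consensus.
best = min(range(len(inputs)), key=lambda t: matrix_distance(inputs[t], consensus))
print("aggregate is ultrametric:", is_ultrametric(aggregate))
print("consensus is ultrametric:", is_ultrametric(consensus))
print("closest input:", best)
```

Because partitions and dendrograms share one matrix representation in this sketch, both kinds of input can be mixed in the same ensemble; trying a different ultrametric-fitting step or tree distance only means replacing `subdominant_ultrametric` or `matrix_distance`.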

Cited By

  • (2024) Clustering Methods for Multidimensional Data from Social Media. 2024 MIT Art, Design and Technology School of Computing International Conference (MITADTSoCiCon), 1-7. DOI: 10.1109/MITADTSoCiCon60330.2024.10575244. Online publication date: 25-Apr-2024.
  • (2024) Ensemble clustering via fusing global and local structure information. Expert Systems with Applications 237, 121557. DOI: 10.1016/j.eswa.2023.121557. Online publication date: Mar-2024.
  • (2023) LSEC: Large-scale spectral ensemble clustering. Intelligent Data Analysis 27:1, 59-77. DOI: 10.3233/IDA-216240. Online publication date: 30-Jan-2023.

      Information

      Published In

      ACM Transactions on Knowledge Discovery from Data, Volume 9, Issue 2
      November 2014
      193 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2672614
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 23 September 2014
      Accepted: 01 March 2014
      Revised: 01 January 2014
      Received: 01 July 2013
      Published in TKDD Volume 9, Issue 2

      Author Tags

      1. Hierarchical ensemble clustering
      2. ensemble selection
      3. ultra-metric

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 40
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 17 Oct 2024.

      Citations

      • (2022) Toward Multidiversified Ensemble Clustering of High-Dimensional Data: From Subspaces to Metrics and Beyond. IEEE Transactions on Cybernetics 52:11, 12231-12244. DOI: 10.1109/TCYB.2021.3049633. Online publication date: Nov-2022.
      • (2022) Weighted clustering ensemble. Pattern Recognition 124:C. DOI: 10.1016/j.patcog.2021.108428. Online publication date: 1-Apr-2022.
      • (2022) Ensemble clustering of longitudinal bivariate HIV biomarker profiles to group patients by patterns of disease progression. International Journal of Data Science and Analytics 14:3, 305-318. DOI: 10.1007/s41060-022-00323-2. Online publication date: 4-May-2022.
      • (2021) Manifold regularization ensemble clustering with many objectives using unsupervised extreme learning machines. Intelligent Data Analysis 25:4, 847-862. DOI: 10.3233/IDA-205362. Online publication date: 9-Jul-2021.
      • (2021) A Domain Adaptive Density Clustering Algorithm for Data With Varying Density Distribution. IEEE Transactions on Knowledge and Data Engineering 33:6, 2310-2321. DOI: 10.1109/TKDE.2019.2954133. Online publication date: 1-Jun-2021.
      • (2021) A new method for weighted ensemble clustering and coupled ensemble selection. Connection Science, 1-22. DOI: 10.1080/09540091.2020.1866496. Online publication date: 7-Jan-2021.
      • (2020) Ultra-Scalable Spectral Clustering and Ensemble Clustering. IEEE Transactions on Knowledge and Data Engineering 32:6, 1212-1226. DOI: 10.1109/TKDE.2019.2903410. Online publication date: 1-Jun-2020.