DOI: 10.1109/ICDE.2005.34

Clustering Aggregation

Published: 05 April 2005

Abstract

We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the problem: each categorical variable can be viewed as a clustering of the input rows. Moreover, clustering aggregation can be used as a meta-clustering method to improve the robustness of clusterings. The problem formulation does not require a-priori information about the number of clusters, and it gives a natural way of handling missing values. We give a formal statement of the clustering-aggregation problem, we discuss related work, and we suggest a number of algorithms. For several of the methods we provide theoretical guarantees on the quality of the solutions. We also show how sampling can be used to scale the algorithms for large data sets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.
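
To make the problem statement concrete, here is a minimal sketch, not the paper's implementation. It assumes that "agreement" is measured by counting the object pairs on which two clusterings disagree (one puts the pair together, the other separates it); the abstract does not spell out the measure, so treat that choice as an assumption. The function names and the toy table are hypothetical. As a simple baseline, the sketch picks the input clustering that disagrees least with all the inputs, and it follows the abstract's categorical-data example by reading each column as a clustering of the rows.

```python
from itertools import combinations


def disagreement(c1, c2):
    """Count object pairs that one clustering co-clusters and the other separates.

    Each clustering is a list of labels, one per object (assumed representation).
    """
    assert len(c1) == len(c2)
    return sum(
        (c1[i] == c1[j]) != (c2[i] == c2[j])
        for i, j in combinations(range(len(c1)), 2)
    )


def total_disagreement(candidate, clusterings):
    """Sum of pairwise disagreements between a candidate and every input."""
    return sum(disagreement(candidate, c) for c in clusterings)


def best_input_clustering(clusterings):
    """Baseline: return the input clustering with the smallest total disagreement."""
    return min(clusterings, key=lambda c: total_disagreement(c, clusterings))


# Toy categorical table (hypothetical data): each column is one clustering of the rows.
rows = [
    ("red", "small", "yes"),
    ("red", "small", "no"),
    ("blue", "large", "no"),
    ("blue", "small", "no"),
]
columns = [list(col) for col in zip(*rows)]
best = best_input_clustering(columns)
```

The pairwise count is quadratic in the number of objects, which is one reason a sampling step, as the abstract mentions, becomes attractive on large data sets. Note that nothing here fixes the number of clusters in advance: it is whatever the chosen clustering happens to contain.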

Published In

ICDE '05: Proceedings of the 21st International Conference on Data Engineering
April 2005
8301 pages
ISBN: 0769522858

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Cited By

  • (2024) A novel tree structure-based multi-prototype clustering algorithm. Journal of King Saud University - Computer and Information Sciences 36:3. DOI: 10.1016/j.jksuci.2024.102002. Online publication date: 1-Mar-2024
  • (2021) Towards improving machine learning algorithms accuracy by benefiting from similarities between cases. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 40:1 (947-972). DOI: 10.3233/JIFS-201077. Online publication date: 1-Jan-2021
  • (2018) LBIRCH. Proceedings of the 2018 10th International Conference on Machine Learning and Computing (74-78). DOI: 10.1145/3195106.3195158. Online publication date: 26-Feb-2018
  • (2017) Evolutionary role mining in complex networks by ensemble clustering. Proceedings of the Symposium on Applied Computing (1053-1060). DOI: 10.1145/3019612.3019815. Online publication date: 3-Apr-2017
  • (2017) Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy. Knowledge-Based Systems 133:C (208-220). DOI: 10.1016/j.knosys.2017.07.010. Online publication date: 1-Oct-2017
  • (2016) Towards distributed ensemble clustering for networked sensing systems. Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing (1-10). DOI: 10.1145/2942358.2942391. Online publication date: 5-Jul-2016
  • (2016) PatchWork, a scalable density-grid clustering algorithm. Proceedings of the 31st Annual ACM Symposium on Applied Computing (824-831). DOI: 10.1145/2851613.2851643. Online publication date: 4-Apr-2016
  • (2015) Instance-wise weighted nonnegative matrix factorization for aggregating partitions with locally reliable clusters. Proceedings of the 24th International Conference on Artificial Intelligence (4091-4097). DOI: 10.5555/2832747.2832819. Online publication date: 25-Jul-2015
  • (2014) Ensemble learning with trees and rules. Intelligent Data Analysis 18:5 (857-872). DOI: 10.5555/2691093.2691099. Online publication date: 1-Sep-2014
  • (2014) A Framework for Hierarchical Ensemble Clustering. ACM Transactions on Knowledge Discovery from Data 9:2 (1-23). DOI: 10.1145/2611380. Online publication date: 23-Sep-2014