Article

Clustering Aggregation

Authors:

Aristides Gionis,

Heikki Mannila,

Panayiotis Tsaparas

ICDE '05: Proceedings of the 21st International Conference on Data Engineering

Pages 341 - 352

https://doi.org/10.1109/ICDE.2005.34

Published: 05 April 2005

Abstract

We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the problem: each categorical variable can be viewed as a clustering of the input rows. Moreover, clustering aggregation can be used as a meta-clustering method to improve the robustness of clusterings. The problem formulation does not require a priori information about the number of clusters, and it gives a natural way of handling missing values. We give a formal statement of the clustering-aggregation problem, we discuss related work, and we suggest a number of algorithms. For several of the methods we provide theoretical guarantees on the quality of the solutions. We also show how sampling can be used to scale the algorithms for large data sets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.
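
To make the problem statement concrete, here is a minimal Python sketch. The abstract does not say how agreement is measured, so the sketch assumes a pairwise-disagreement count: the number of object pairs that one clustering places together and the other separates. The function names and the toy table are hypothetical, and picking the best of the input clusterings is only a naive baseline for illustration; it is not presented here as one of the paper's algorithms.

```python
from itertools import combinations


def disagreements(a, b):
    """Count object pairs grouped together by one clustering but not the other.

    Assumed measure: the abstract does not define "agreement" formally.
    """
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def total_disagreement(candidate, clusterings):
    """Sum of the candidate's disagreements with every input clustering."""
    return sum(disagreements(candidate, c) for c in clusterings)


def best_input_clustering(clusterings):
    """Naive baseline: return the input clustering with the least total disagreement."""
    return min(clusterings, key=lambda c: total_disagreement(c, clusterings))


# Hypothetical categorical table: six rows, three categorical variables.
rows = [
    ("red", "small", "yes"),
    ("red", "small", "yes"),
    ("red", "large", "yes"),
    ("blue", "large", "no"),
    ("blue", "large", "no"),
    ("blue", "small", "no"),
]

# Each categorical variable (column) is viewed as one clustering of the rows.
clusterings = [tuple(row[k] for row in rows) for k in range(3)]

best = best_input_clustering(clusterings)
print(best, total_disagreement(best, clusterings))
```

In this toy table the first and third columns induce the same partition of the rows, so either of them minimizes the total disagreement among the inputs. Note that nothing in the sketch fixes the number of clusters in advance, which matches the abstract's remark that the formulation needs no a priori cluster count.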

Index Terms

Clustering Aggregation
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

ICDE '05: Proceedings of the 21st International Conference on Data Engineering

April 2005

8301 pages

ISBN:0769522858

Publisher

IEEE Computer Society

United States

Cited By

Li T, Li B, Xin X, Ma Y, Yang Q (2024). A novel tree structure-based multi-prototype clustering algorithm. Journal of King Saud University - Computer and Information Sciences, 36:3. Online publication date: 1-Mar-2024.
https://dl.acm.org/doi/10.1016/j.jksuci.2024.102002

Mostafa S (2021). Towards improving machine learning algorithms accuracy by benefiting from similarities between cases. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 40:1, pages 947-972. Online publication date: 1-Jan-2021.
https://dl.acm.org/doi/10.3233/JIFS-201077

Guo D, Chen J, Chen Y, Li Z (2018). LBIRCH. Proceedings of the 2018 10th International Conference on Machine Learning and Computing, pages 74-78. Online publication date: 26-Feb-2018.
https://dl.acm.org/doi/10.1145/3195106.3195158

Choobdar S, Ribeiro P, Silva F, Shin S, Shin D, Lencastre M (2017). Evolutionary role mining in complex networks by ensemble clustering. Proceedings of the Symposium on Applied Computing, pages 1053-1060. Online publication date: 3-Apr-2017.
https://dl.acm.org/doi/10.1145/3019612.3019815

Yaohui L, Zhengming M, Fang Y (2017). Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy. Knowledge-Based Systems, 133:C, pages 208-220. Online publication date: 1-Oct-2017.
https://dl.acm.org/doi/10.1016/j.knosys.2017.07.010

Ding H, Su L, Xu J, Dressler F, auf der Heide F (2016). Towards distributed ensemble clustering for networked sensing systems. Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing, pages 1-10. Online publication date: 5-Jul-2016.
https://dl.acm.org/doi/10.1145/2942358.2942391

Gouineau F, Landry T, Triplet T, Ossowski S (2016). PatchWork, a scalable density-grid clustering algorithm. Proceedings of the 31st Annual ACM Symposium on Applied Computing, pages 824-831. Online publication date: 4-Apr-2016.
https://dl.acm.org/doi/10.1145/2851613.2851643

Zheng X, Zhu S, Gao J, Mamitsuka H (2015). Instance-wise weighted nonnegative matrix factorization for aggregating partitions with locally reliable clusters. Proceedings of the 24th International Conference on Artificial Intelligence, pages 4091-4097. Online publication date: 25-Jul-2015.
https://dl.acm.org/doi/10.5555/2832747.2832819

Akdemir D, Jannink J (2014). Ensemble learning with trees and rules. Intelligent Data Analysis, 18:5, pages 857-872. Online publication date: 1-Sep-2014.
https://dl.acm.org/doi/10.5555/2691093.2691099

Zheng L, Li T, Ding C (2014). A Framework for Hierarchical Ensemble Clustering. ACM Transactions on Knowledge Discovery from Data, 9:2, pages 1-23. Online publication date: 23-Sep-2014.
https://dl.acm.org/doi/10.1145/2611380
