Chromatic Correlation Clustering

Published: 01 June 2015

Abstract

We study a novel clustering problem in which the pairwise relations between objects are categorical. This problem can be viewed as clustering the vertices of a graph whose edges are of different types (colors). We introduce an objective function that favors clusters whose internal edges have, as much as possible, the same color. We show that the problem is NP-hard and propose a randomized algorithm with an approximation guarantee proportional to the maximum degree of the input graph. The algorithm iteratively picks a random edge as a pivot, builds a cluster around it, and removes the cluster from the graph. Although fast, easy to implement, and parameter-free, this algorithm tends to produce a relatively large number of clusters. To overcome this issue, we introduce a variant algorithm that modifies how the pivot is chosen and how the cluster is built around the pivot. Finally, to address the case where a fixed number of output clusters is required, we devise a third algorithm that directly optimizes the objective function based on the alternating-minimization paradigm.
We also extend our objective function to handle cases where objects’ relations are described by multiple labels. We modify our randomized approximation algorithm to optimize this extended objective function and show that its approximation guarantee remains proportional to the maximum degree of the graph.
We test our algorithms on synthetic and real data from the domains of social media, protein-interaction networks, and bibliometrics. Results reveal that our algorithms outperform a baseline algorithm both in the task of reconstructing a ground-truth clustering and in terms of objective-function value.
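
The pivot-based procedure outlined in the abstract (pick a random edge, build a cluster around it, remove the cluster) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not state how the cluster is grown or how the objective is defined, so the code assumes (a) a vertex joins the pivot's cluster when it is connected to both pivot endpoints by edges of the pivot's color, and (b) the cost counts intra-cluster pairs that are not edges of the cluster's color plus edges that cross clusters. The names `chromatic_pivot` and `chromatic_cost` and the edge representation are hypothetical.

```python
import random
from itertools import combinations


def chromatic_pivot(nodes, edges, seed=None):
    """Cluster an edge-colored graph around randomly chosen pivot edges.

    `edges` maps frozenset({u, v}) -> color (no self-loops).
    Returns a list of (members, color) pairs; singletons get color None.
    """
    rng = random.Random(seed)
    remaining = set(nodes)
    live = dict(edges)  # edges whose endpoints are both still unclustered
    clusters = []
    while live:
        # Sort the candidates so that a fixed seed gives a reproducible run.
        pivot = rng.choice(sorted(live, key=sorted))
        color = live[pivot]
        u, v = tuple(pivot)
        members = {u, v}
        # ASSUMED growth rule: z joins if (u, z) and (v, z) both exist
        # and both carry the pivot's color.
        for z in remaining - {u, v}:
            if (live.get(frozenset((u, z))) == color
                    and live.get(frozenset((v, z))) == color):
                members.add(z)
        clusters.append((members, color))
        remaining -= members
        live = {e: c for e, c in live.items() if e <= remaining}
    # Vertices with no remaining edges become singleton clusters.
    clusters.extend(({x}, None) for x in remaining)
    return clusters


def chromatic_cost(nodes, edges, clusters):
    """Count disagreements under the ASSUMED objective described above."""
    where = {x: i for i, (members, _) in enumerate(clusters) for x in members}
    cost = 0
    for x, y in combinations(sorted(nodes), 2):
        edge_color = edges.get(frozenset((x, y)))
        if where[x] == where[y]:
            # Same cluster: the pair should be an edge of the cluster's color.
            cost += edge_color != clusters[where[x]][1]
        else:
            # Different clusters: any edge between them is a disagreement.
            cost += edge_color is not None
    return cost


# Toy input: a red triangle a-b-c with a blue edge c-d attached.
demo_nodes = ["a", "b", "c", "d"]
demo_edges = {
    frozenset(("a", "b")): "red",
    frozenset(("a", "c")): "red",
    frozenset(("b", "c")): "red",
    frozenset(("c", "d")): "blue",
}
demo_clusters = chromatic_pivot(demo_nodes, demo_edges, seed=0)
print(demo_clusters, chromatic_cost(demo_nodes, demo_edges, demo_clusters))
```

On this toy input the result depends on which edge is drawn first: a red pivot yields the cluster {a, b, c} and leaves d alone, whereas the blue pivot yields {c, d} followed by {a, b}. Even a four-vertex graph can therefore end up with more clusters than necessary, which is consistent with the abstract's remark that the basic algorithm tends to produce many clusters.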

    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 9, Issue 4
    June 2015
    261 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/2786971
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2015
    Accepted: 01 January 2015
    Revised: 01 June 2014
    Received: 01 June 2013
    Published in TKDD Volume 9, Issue 4

    Author Tags

    1. Clustering
    2. correlation clustering
    3. edge-labeled graphs

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Yahoo! Internship program

    Cited By

    • (2025) A Local Search Algorithm for the Radius-Constrained k-Median Problem. Theory of Computing Systems 69:1. DOI: 10.1007/s00224-024-10211-w. Online publication date: 30-Jan-2025
    • (2024) Accurate Multi-view Clustering to Seek the Cross-viewed yet Uniform Sample Assignment via Tensor Feature Matching. Information Sciences (120305). DOI: 10.1016/j.ins.2024.120305. Online publication date: Feb-2024
    • (2023) Optimal LP rounding and linear-time approximation algorithms for clustering edge-colored hypergraphs. Proceedings of the 40th International Conference on Machine Learning (34924-34951). DOI: 10.5555/3618408.3619862. Online publication date: 23-Jul-2023
    • (2023) FirmTruss Community Search in Multilayer Networks. Proceedings of the VLDB Endowment 16:3 (505-518). DOI: 10.14778/3570690.3570700. Online publication date: 23-Jan-2023
    • (2022) Approximation Algorithms for the Capacitated Min–Max Correlation Clustering Problem. Asia-Pacific Journal of Operational Research 40:01. DOI: 10.1142/S0217595922400085. Online publication date: 4-Jan-2022
    • (2022) A Literature Review on Correlation Clustering: Cross-disciplinary Taxonomy with Bibliometric Analysis. Operations Research Forum 3:3. DOI: 10.1007/s43069-022-00156-6. Online publication date: 3-Sep-2022
    • (2022) Approximation algorithms for the lower bounded correlation clustering problem. Journal of Combinatorial Optimization 45:1. DOI: 10.1007/s10878-022-00976-6. Online publication date: 31-Dec-2022
    • (2021) A Color-blind 3-Approximation for Chromatic Correlation Clustering and Improved Heuristics. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (882-891). DOI: 10.1145/3447548.3467446. Online publication date: 14-Aug-2021
    • (2021) Approximation Algorithms for the Lower Bounded Correlation Clustering Problem. Computational Data and Social Networks (39-49). DOI: 10.1007/978-3-030-91434-9_4. Online publication date: 15-Nov-2021
    • (2020) Core Decomposition in Multilayer Networks. ACM Transactions on Knowledge Discovery from Data 14:1 (1-40). DOI: 10.1145/3369872. Online publication date: 3-Feb-2020
