DOI: 10.1145/3366423.3380152
research-article

Clustering in graphs and hypergraphs with categorical edge labels

Published: 20 April 2020

Abstract

Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called “higher-order interactions” that involve more than two nodes at a time. However, fewer rigorous methods exist for extracting insight from such representations. Here, we develop a computational framework for the problem of clustering hypergraphs with categorical edge labels (that is, different interaction types), where clusters correspond to groups of nodes that frequently participate in the same type of interaction.
Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time, using an algorithm based on minimum cuts. Minimizing our objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that have theoretical cluster quality guarantees. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis.
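
To make the objective concrete, here is a minimal sketch in Python. It assumes (the abstract does not spell this out) that each node receives exactly one label and that a hyperedge counts as a mistake unless all of its nodes receive that hyperedge's own label. The function names, the toy hypergraph, and the exhaustive search are illustrative assumptions. The exhaustive search is not the min-cut or LP-based algorithm described above and is only practical for tiny inputs.

```python
from itertools import product


def edge_mistakes(hyperedges, labels, assignment):
    """Count hyperedges whose nodes are not all assigned the hyperedge's own label.

    Assumed objective: an edge is satisfied only if every node in it
    carries the same label as the edge itself.
    """
    return sum(
        1
        for nodes, label in zip(hyperedges, labels)
        if any(assignment[v] != label for v in nodes)
    )


def brute_force_clustering(num_nodes, num_labels, hyperedges, labels):
    """Try every node-to-label assignment and keep the first one with fewest mistakes.

    Runs in O(num_labels ** num_nodes) time, so it is a reference check only.
    """
    best_cost, best_assignment = None, None
    for assignment in product(range(num_labels), repeat=num_nodes):
        cost = edge_mistakes(hyperedges, labels, assignment)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Hypothetical toy hypergraph: 6 nodes, 5 hyperedges, 2 label types (0 and 1).
hyperedges = [(0, 1, 2), (1, 2), (2, 3), (3, 4), (3, 4, 5)]
labels = [0, 0, 1, 1, 1]

cost, assignment = brute_force_clustering(6, 2, hyperedges, labels)
print(cost, assignment)  # prints: 1 (0, 0, 0, 1, 1, 1)
```

In this toy instance, node 2 sits in both a label-0 edge and a label-1 edge, so at least one edge must be a mistake. The optimum puts nodes 0 to 2 in the label-0 cluster and nodes 3 to 5 in the label-1 cluster, leaving only the edge (2, 3) unsatisfied.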

Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423
Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '20
WWW '20: The Web Conference 2020
April 20 - 24, 2020
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
