research-article

Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation

Published: 13 November 2023

Abstract

A graph models the connections among objects. One important graph analytics task is clustering, which partitions a data graph into clusters with dense inner-cluster connections. One line of clustering methods maximizes a function called modularity. Modularity-based clustering is widely adopted on dyadic graphs because of its scalability and clustering quality, which depend heavily on its choice of random graph model. The random graph model determines not only which clustering is preferred (modularity measures the quality of a clustering by its alignment with the edges of a random graph) but also the cost of computing that alignment. Existing random hypergraph models either measure the hyperedge-cluster alignment in an All-Or-Nothing (AON) manner, losing important group-wise information, or introduce expensive alignment computation that prevents the clustering from scaling up. This paper proposes a new random hypergraph model called the Hyperedge Expansion Model (HEM); a non-AON hypergraph modularity function based on HEM, called Partial Innerclusteredge modularity (PI); a clustering algorithm called Partial Innerclusteredge Clustering (PIC) that optimizes PI; and novel computation optimizations. PIC is a scalable modularity-based hypergraph clustering algorithm that effectively captures the non-AON hyperedge-cluster relation. Our experiments show that PIC outperforms eight state-of-the-art methods on real-world hypergraphs in both clustering quality and scalability, and is up to five orders of magnitude faster than the baseline methods.
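
To make the abstract's two central notions concrete, here is a minimal Python sketch under stated assumptions. The first function computes the classic Newman-Girvan modularity of a dyadic graph, which compares inner-cluster edges with their expectation under an expected-degree random graph model. The other two contrast an All-Or-Nothing count for a hyperedge with a simple partial score. The names `modularity`, `aon_inner`, and `majority_fraction` are hypothetical, and `majority_fraction` is only an illustration of a non-AON measure; it is not the paper's HEM model or PI function, which this page does not define.

```python
from collections import Counter


def modularity(edges, cluster_of):
    """Newman-Girvan modularity of an undirected, unweighted dyadic graph.

    edges: list of (u, v) pairs; cluster_of: dict mapping node -> cluster label.
    Uses Q = sum_c [ e_c / m - (vol_c / 2m)^2 ], where e_c counts edges inside
    cluster c and vol_c is the total degree of the nodes in c.
    """
    m = len(edges)
    inner = Counter()   # edges with both endpoints in cluster c
    volume = Counter()  # sum of node degrees in cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            inner[cu] += 1
    return sum(inner[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


def aon_inner(hyperedge, cluster_of):
    """All-Or-Nothing: 1 only if every node of the hyperedge shares one cluster."""
    return int(len({cluster_of[v] for v in hyperedge}) == 1)


def majority_fraction(hyperedge, cluster_of):
    """Illustrative non-AON score (an assumption, not the paper's PI):
    the share of the hyperedge's nodes that fall in its most common cluster."""
    counts = Counter(cluster_of[v] for v in hyperedge)
    return max(counts.values()) / len(hyperedge)
```

For two triangles joined by one edge and clustered as the two triangles, `modularity` returns 5/14 (about 0.357), while placing all six nodes in one cluster gives 0. A four-node hyperedge with three nodes in one cluster and one node in another scores 0 under `aon_inner` but 0.75 under `majority_fraction`, which is the kind of group-wise information an AON measure discards.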

Cited By

  • (2024) Hypergraph animals. Physical Review E, 110:4. DOI: 10.1103/PhysRevE.110.044125. Online publication date: 17-Oct-2024
  • (2024) AntiFormer: graph enhanced large language model for binding affinity prediction. Briefings in Bioinformatics, 25:5. DOI: 10.1093/bib/bbae403. Online publication date: 20-Aug-2024

Published In

Proceedings of the ACM on Management of Data, Volume 1, Issue 3
PACMMOD
September 2023
472 pages
EISSN: 2836-6573
DOI: 10.1145/3632968
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cardinality
  2. modularity
  3. random graph

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 276
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 25 Jan 2025
