MiDaS: Representative Sampling from Real-world Hypergraphs

Choe, Minyoung; Yoo, Jaemin; Lee, Geon; Baek, Woonsung; Kang, U; Shin, Kijung

doi:10.1145/1122445.1122456

arXiv:2202.01587 (cs)

[Submitted on 3 Feb 2022 (v1), last revised 5 Feb 2022 (this version, v2)]

Abstract: Graphs are widely used for representing pairwise interactions in complex systems. Since such real-world graphs are large and often ever-growing, sampling a small representative subgraph is indispensable for various purposes: simulation, visualization, stream processing, representation learning, and crawling, to name a few. However, many complex systems consist of group interactions (e.g., collaborations of researchers and discussions on online Q&A platforms), and thus they can be represented more naturally and accurately by hypergraphs (i.e., sets of sets) than by ordinary graphs. Motivated by the prevalence of large-scale hypergraphs, we study the problem of representative sampling from real-world hypergraphs, aiming to answer (Q1) what a representative sub-hypergraph is and (Q2) how we can find a representative one rapidly without an extensive search. Regarding Q1, we propose to measure the goodness of a sub-hypergraph by comparing it with the entire hypergraph in terms of ten graph-level, hyperedge-level, and node-level statistics. Regarding Q2, we first analyze the characteristics of six intuitive approaches in 11 real-world hypergraphs. Then, based on the analysis, we propose MiDaS, which draws hyperedges with a bias towards those with high-degree nodes. Through extensive experiments, we demonstrate that MiDaS is (a) Representative: finding overall the most representative samples among 13 considered approaches, (b) Fast: several orders of magnitude faster than the strongest competitors, which perform an extensive search, and (c) Automatic: rapidly searching for a proper degree of bias.
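The idea of drawing hyperedges with a bias towards high-degree nodes can be illustrated with a minimal sketch. The abstract does not specify the weighting, so everything below is an assumption made for illustration: each hyperedge is scored by the smallest degree among its nodes, raised to a hypothetical bias exponent `alpha`, and hyperedges are drawn without replacement in proportion to that score. The function names are also hypothetical, and the automatic search for a proper degree of bias is not shown.

```python
import random
from collections import Counter


def node_degrees(hyperedges):
    """Return, for each node, the number of hyperedges that contain it."""
    deg = Counter()
    for edge in hyperedges:
        deg.update(set(edge))
    return deg


def biased_hyperedge_sample(hyperedges, k, alpha=1.0, seed=None):
    """Draw up to k distinct hyperedges, favoring those whose nodes have high degree.

    Assumption (not taken from the abstract): the weight of a hyperedge is
    (minimum degree of its nodes) ** alpha. alpha = 0 gives uniform sampling;
    a larger alpha gives a stronger bias. Hyperedges must be non-empty.
    """
    rng = random.Random(seed)
    deg = node_degrees(hyperedges)
    remaining = list(range(len(hyperedges)))
    weights = [min(deg[v] for v in hyperedges[i]) ** alpha for i in remaining]
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Pick one position in proportion to the current weights, then
        # remove it so that the same hyperedge is never drawn twice.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(hyperedges[remaining[pos]])
        remaining.pop(pos)
        weights.pop(pos)
    return chosen


if __name__ == "__main__":
    # Toy hypergraph: each hyperedge is a set of node ids.
    toy = [{1, 2, 3}, {1, 2}, {2, 3, 4}, {5, 6}, {1, 2, 4}]
    print(biased_hyperedge_sample(toy, k=3, alpha=2.0, seed=0))
```

In this sketch, a hyperedge such as `{5, 6}`, whose nodes appear nowhere else, receives the lowest weight, so it is the least likely to be drawn when `alpha` is positive.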

Comments:	Accepted to WWW 2022 - The Web Conference 2022
Subjects:	Social and Information Networks (cs.SI)
Cite as:	arXiv:2202.01587 [cs.SI]
	(or arXiv:2202.01587v2 [cs.SI] for this version)
	https://doi.org/10.48550/arXiv.2202.01587
Related DOI:	https://doi.org/10.1145/1122445.1122456

Submission history

From: Minyoung Choe
[v1] Thu, 3 Feb 2022 13:50:23 UTC (13,054 KB)
[v2] Sat, 5 Feb 2022 05:36:50 UTC (13,054 KB)
