DOI: 10.1145/2463676.2463704
research-article

Massive graph triangulation

Published: 22 June 2013

Abstract

This paper studies I/O-efficient algorithms for the classic triangle listing problem, whose solution is a basic operator in many other graph problems. Specifically, given an undirected graph G, the objective of triangle listing is to find all the cliques involving 3 vertices in G. The problem has been well studied in internal memory, but remains an urgent and difficult challenge when G does not fit in memory, which forces any algorithm to perform frequent I/O accesses. Although previous research has attempted to tackle the challenge, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption about the input G. The algorithm uses ideas drastically different from all previous approaches, and outperformed the existing competitors by more than an order of magnitude in our extensive experiments.
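
To make the problem statement concrete, here is a minimal in-memory baseline, not the paper's algorithm. It assumes the whole graph fits in memory, which is exactly the case the paper moves beyond. The function name `list_triangles_in_memory` and the edge-pair input format are illustrative assumptions.

```python
from collections import defaultdict


def list_triangles_in_memory(edges):
    """List each triangle (u, v, w) with u < v < w exactly once.

    Illustrative baseline only: it assumes the whole graph fits in memory.
    """
    # Orient every edge from its smaller endpoint to its larger endpoint,
    # dropping self-loops and duplicate edges.
    larger_nbrs = defaultdict(set)
    for a, b in edges:
        if a != b:
            larger_nbrs[min(a, b)].add(max(a, b))

    triangles = []
    for u in sorted(larger_nbrs):
        for v in sorted(larger_nbrs[u]):
            # w closes a triangle if it is a larger neighbor of both u and v.
            common = larger_nbrs[u] & larger_nbrs.get(v, set())
            for w in sorted(common):
                triangles.append((u, v, w))
    return triangles


if __name__ == "__main__":
    # One triangle (1, 2, 3) plus a dangling edge (3, 4).
    print(list_triangles_in_memory([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

Orienting each edge toward its larger endpoint is one common way to report every triangle once rather than six times.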

Reviews

Amitabha Roy

A triangle of an undirected graph is simply three vertices of the graph, u, v, and w, which are also connected by the edges (u, v), (v, w), and (w, u). Spatially, this arrangement looks like a triangle. Listing all of the triangles of a graph has a number of important applications in network analysis and knowledge discovery. For example, vertices with a high triangle count are usually extremely significant in social networks, such as Twitter, often associated with people who act as central figures in communities.

This 2013 SIGMOD Best Paper Award winner deals with the problem of counting triangles in graphs that are too large to be stored entirely in the main memory. The algorithm uses as input a graph whose edge list is sorted by source vertex, thereby clustering vertices of an edge together. The edge list is loaded sequentially, one chunk of edges at a time, into memory. This chunk is then compared to a scan of the entire edge list. The process identifies all triangles with one edge in the loaded chunk (say (u, v) in the example above) and the opposite vertex (w in the example above).

This paper provides an algorithmic complexity analysis of both data movement from disk as well as of work done in the memory, showing that both are done efficiently by connecting the amount of work done to the arboricity of the graph. The paper is a must-read for both theorists and practitioners in the large-scale graph processing area.

Online Computing Reviews Service
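
The chunk-and-scan procedure described in the review can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the "disk" is an ordinary Python list, `chunk_size` stands in for the memory budget, and the names `orient` and `list_triangles_chunked` are hypothetical. Edges are oriented from the smaller to the larger vertex so that each triangle is reported once, when the chunk holds the edge between its two larger vertices.

```python
from itertools import groupby


def orient(edges):
    """Deduplicate edges, orient each as (smaller, larger), sort by source."""
    return sorted({(min(a, b), max(a, b)) for a, b in edges if a != b})


def list_triangles_chunked(edges, chunk_size):
    """List each triangle once by pairing an in-memory chunk with a full scan.

    Sketch only: `edges` plays the role of the on-disk edge list, and
    `chunk_size` is an assumed stand-in for the available memory.
    """
    edge_list = orient(edges)
    triangles = []
    for start in range(0, len(edge_list), chunk_size):
        # Load one chunk of edges "into memory", indexed by source vertex.
        chunk_adj = {}
        for v, w in edge_list[start:start + chunk_size]:
            chunk_adj.setdefault(v, set()).add(w)

        # Scan the entire edge list; sorting by source keeps the
        # out-neighbors of each vertex u contiguous.
        for u, group in groupby(edge_list, key=lambda e: e[0]):
            out_nbrs = {v for _, v in group}
            for v in out_nbrs:
                for w in chunk_adj.get(v, ()):
                    # Chunk edge (v, w) plus scanned edges (u, v), (u, w).
                    if w in out_nbrs:
                        triangles.append((u, v, w))
    return triangles


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(sorted(list_triangles_chunked(k4, chunk_size=2)))
```

Each chunk triggers one full pass over the edge list, so in this sketch the number of scans grows as the chunk size shrinks. The output does not depend on the chunk size.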

Published In

SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
June 2013
1322 pages
ISBN:9781450320375
DOI:10.1145/2463676
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph
  2. i/o-efficient algorithm
  3. triangle

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'13
Acceptance Rates

SIGMOD '13 Paper Acceptance Rate 76 of 372 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 22
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 12 Jan 2025

Cited By

  • (2024) Efficient Maximal Motif-Clique Enumeration over Large Heterogeneous Information Networks. Proceedings of the VLDB Endowment 17:11 (2946-2959). DOI: 10.14778/3681954.3681975. Online publication date: 30-Aug-2024
  • (2024) Efficient Algorithms for Pseudoarboricity Computation in Large Static and Dynamic Graphs. Proceedings of the VLDB Endowment 17:11 (2722-2734). DOI: 10.14778/3681954.3681958. Online publication date: 1-Jul-2024
  • (2024) FRESH: Towards Efficient Graph Queries in an Outsourced Graph. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (4545-4557). DOI: 10.1109/ICDE60146.2024.00346. Online publication date: 13-May-2024
  • (2024) I/O Efficient Max-Truss Computation in Large Static and Dynamic Graphs. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (3217-3229). DOI: 10.1109/ICDE60146.2024.00249. Online publication date: 13-May-2024
  • (2024) GraphIdx: An efficient indexing technique for accelerating graph data mining. Software Impacts 20 (100632). DOI: 10.1016/j.simpa.2024.100632. Online publication date: May-2024
  • (2024) Parallelization of butterfly counting on hierarchical memory. The VLDB Journal 33:5 (1453-1484). DOI: 10.1007/s00778-024-00856-x. Online publication date: 7-Jun-2024
  • (2023) I/O-Efficient Butterfly Counting at Scale. Proceedings of the ACM on Management of Data 1:1 (1-27). DOI: 10.1145/3588714. Online publication date: 30-May-2023
  • (2023) Hypergraph motifs and their extensions beyond binary. The VLDB Journal 33:3 (625-665). DOI: 10.1007/s00778-023-00827-8. Online publication date: 26-Dec-2023
  • (2022) Incremental Influential Community Detection in Large Networks. Proceedings of the 34th International Conference on Scientific and Statistical Database Management (1-12). DOI: 10.1145/3538712.3538724. Online publication date: 6-Jul-2022
  • (2022) sGrapp: Butterfly Approximation in Streaming Graphs. ACM Transactions on Knowledge Discovery from Data 16:4 (1-43). DOI: 10.1145/3495011. Online publication date: 8-Jan-2022
