research-article

OPT: a new framework for overlapped and parallel triangulation in large-scale graphs

Published: 18 June 2014

Abstract

Graph triangulation, which finds all triangles in a graph, has been actively studied because of its wide range of applications in network analysis and data mining. With the rapid growth of graph data, disk-based triangulation methods are in demand but have received little research attention. To handle a large-scale graph that does not fit in memory, we must iteratively load small parts of the graph. In the existing literature, achieving the ideal cost has been considered impossible for billion-scale graphs because of the memory size constraint. In this paper, we propose OPT, an overlapped and parallel disk-based triangulation framework for billion-scale graphs, which achieves the ideal cost through (1) full overlap of CPU and I/O operations and (2) full parallelism of multi-core CPU and FlashSSD I/O. In OPT, triangles in memory are called internal triangles, while triangles that consist of vertices in memory and vertices in external memory are called external triangles. At the macro level, OPT overlaps internal triangulation with external triangulation; at the micro level, it overlaps CPU and I/O operations. As a result, the cost of OPT is close to the ideal cost. Moreover, OPT instantiates both the vertex-iterator and edge-iterator models and benefits from multi-thread parallelism in both types of triangulation. Extensive experiments on large-scale datasets show that (1) OPT achieves an elapsed time close to that of the ideal method, with less than 7% overhead under a limited memory budget, (2) OPT achieves linear speed-up as the number of CPU cores increases, (3) OPT outperforms the state-of-the-art parallel method by up to an order of magnitude with 6 CPU cores, and (4) for the first time in the literature, triangulation results are reported for a real-world graph at billion-vertex scale.
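
To make the terminology in the abstract concrete, the following is a minimal in-memory sketch of the two iterator models and of the internal/external distinction. It is not OPT itself: it does no disk I/O, no overlapping, and no multi-threading. All function names are hypothetical, and splitting triangles by membership in an `in_memory` vertex set is an assumption based only on the wording of the abstract.

```python
from itertools import combinations


def build_oriented_adjacency(edges):
    """Orient each undirected edge from the smaller to the larger vertex id.

    Orienting edges this way means every triangle {u, v, w} with u < v < w
    is found exactly once by the iterators below.
    """
    succ = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        lo, hi = (u, v) if u < v else (v, u)
        succ.setdefault(lo, set()).add(hi)
        succ.setdefault(hi, set())
    return succ


def edge_iterator_triangles(succ):
    """Edge-iterator model: for each oriented edge (u, v), intersect the
    successor sets of u and v; every common successor w closes a triangle."""
    triangles = []
    for u in sorted(succ):
        for v in sorted(succ[u]):
            for w in sorted(succ[u] & succ[v]):
                triangles.append((u, v, w))
    return triangles


def vertex_iterator_triangles(succ):
    """Vertex-iterator model: for each vertex u, check every pair (v, w)
    of its successors and test whether the edge (v, w) exists."""
    triangles = []
    for u in sorted(succ):
        for v, w in combinations(sorted(succ[u]), 2):
            if w in succ[v]:
                triangles.append((u, v, w))
    return triangles


def split_internal_external(triangles, in_memory):
    """Assumed classification: a triangle is internal if all three vertices
    are in the in-memory set, and external if it mixes in-memory vertices
    with vertices that are not in memory."""
    internal, external = [], []
    for t in triangles:
        hits = sum(1 for x in t if x in in_memory)
        if hits == 3:
            internal.append(t)
        elif hits > 0:
            external.append(t)
    return internal, external


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (4, 5)]
    succ = build_oriented_adjacency(edges)
    found = edge_iterator_triangles(succ)
    print(found)
    print(split_internal_external(found, in_memory={1, 2, 3}))
```

Both iterators return the same triangles on the same oriented graph; they differ only in whether the outer loop is driven by edges (set intersection) or by vertices (pairwise edge tests).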


    Published In

    SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
    June 2014
    1645 pages
    ISBN:9781450323765
    DOI:10.1145/2588555
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. big data
    2. parallel processing
    3. triangulation

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'14

    Acceptance Rates

    SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 22
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 09 Nov 2024

    Citations

    Cited By

    • (2023) DecoMine: A Compilation-Based Graph Pattern Mining System with Pattern Decomposition. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 47-61. DOI: 10.1145/3567955.3567956. Online publication date: 25-Mar-2023
    • (2023) Hypergraph motifs and their extensions beyond binary. The VLDB Journal 33:3, pp. 625-665. DOI: 10.1007/s00778-023-00827-8. Online publication date: 26-Dec-2023
    • (2022) sGrapp: Butterfly Approximation in Streaming Graphs. ACM Transactions on Knowledge Discovery from Data 16:4, pp. 1-43. DOI: 10.1145/3495011. Online publication date: 8-Jan-2022
    • (2021) CoCoS: Fast and Accurate Distributed Triangle Counting in Graph Streams. ACM Transactions on Knowledge Discovery from Data 15:3, pp. 1-30. DOI: 10.1145/3441487. Online publication date: 21-Apr-2021
    • (2021) Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable. ACM Transactions on Parallel Computing 8:1, pp. 1-70. DOI: 10.1145/3434393. Online publication date: 22-Apr-2021
    • (2021) TCStream: Large-Scale Graph Triangle-Counting on a single Machine using GPUs. IEEE Transactions on Parallel and Distributed Systems, pp. 1-1. DOI: 10.1109/TPDS.2021.3135329. Online publication date: 2021
    • (2020) Hypergraph motifs. Proceedings of the VLDB Endowment 13:12, pp. 2256-2269. DOI: 10.14778/3407790.3407823. Online publication date: 14-Sep-2020
    • (2020) HOSA. Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence, pp. 121-130. DOI: 10.1145/3409501.3409507. Online publication date: 3-Jul-2020
    • (2020) Improving I/O Complexity of Triangle Enumeration. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2020.3003259. Online publication date: 2020
    • (2020) Temporal locality-aware sampling for accurate triangle counting in real graph streams. The VLDB Journal — The International Journal on Very Large Data Bases 29:6, pp. 1501-1525. DOI: 10.1007/s00778-020-00624-7. Online publication date: 12-Aug-2020
