Research article. DOI: 10.1145/2783258.2783413

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

Published: 10 August 2015
    Abstract

    We study the problem of approximating the 3-profile of a large graph. 3-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that can be used to approximate the full 3-profile counts for a given large graph. Further, we study the problem of estimating local and ego 3-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph.
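
    To make the definition concrete, the following is a minimal sketch, not the paper's distributed implementation. It counts the global 3-profile by brute force and then estimates it from a sparsified copy of the graph. It assumes the sparsifier keeps each edge independently with probability p and that the graph is simple (no self-loops or duplicate edges); the function names `three_profile` and `estimate_from_sparsifier` are hypothetical. The correction step inverts the expected sampled counts, for example E[Y3] = p^3 * n3 for triangles.

```python
import random
from itertools import combinations


def three_profile(n, edges):
    """Return [n0, n1, n2, n3]: the number of vertex triples whose induced
    subgraph has 0, 1, 2, or 3 edges (empty, single edge, wedge, triangle).
    Brute force over all triples, so only suitable for small graphs."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(pair) in edge_set for pair in ((a, b), (a, c), (b, c)))
        counts[k] += 1
    return counts


def estimate_from_sparsifier(n, edges, p, seed=0):
    """Estimate the 3-profile from a sparsified graph.

    Assumption: each edge is kept independently with probability p.
    A triple with j edges then keeps i of them with binomial probability,
    which gives a triangular linear system solved here by back-substitution.
    """
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    y0, y1, y2, y3 = three_profile(n, kept)
    q = 1.0 - p
    x3 = y3 / p**3                               # E[Y3] = p^3 n3
    x2 = y2 / p**2 - 3 * q * x3                  # E[Y2] = p^2 n2 + 3 p^2 q n3
    x1 = y1 / p - 2 * q * x2 - 3 * q**2 * x3     # E[Y1] = p n1 + 2 p q n2 + 3 p q^2 n3
    x0 = y0 - q * x1 - q**2 * x2 - q**3 * x3     # E[Y0] = n0 + q n1 + q^2 n2 + q^3 n3
    return [x0, x1, x2, x3]
```

    For a triangle on vertices 0, 1, 2 with a pendant edge (2, 3), `three_profile(4, [(0, 1), (1, 2), (0, 2), (2, 3)])` returns `[0, 1, 2, 1]`. With p = 1 the estimate reproduces the exact profile; for smaller p it is an unbiased estimate under the stated sampling assumption, and individual entries can be negative on small graphs.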
    Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting which allows us to collect 2-hop information without maintaining an explicit 2-hop neighborhood list at each vertex. This enables the computation of all the local 3-profiles in parallel with minimal communication.
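
    The idea of collecting 2-hop information from edges can be illustrated with a single-machine sketch. This is a simplified reading of edge pivoting, not the PowerGraph vertex program from the paper: each edge looks only at the neighbor sets of its two endpoints, and each vertex then sums what its incident edges report. The function name `local_counts_by_edge_pivot` and the four count labels are hypothetical, the graph is assumed simple, and triples in which the vertex touches no edge are left out.

```python
from collections import defaultdict


def local_counts_by_edge_pivot(n, edges):
    """For every vertex, count the triples containing it in which it is:
    part of a triangle, the center of a wedge, an end of a wedge, or an
    endpoint of the only edge. Assumes a simple undirected graph on
    vertices 0..n-1 with each edge listed once."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    keys = ("triangle", "wedge_center", "wedge_end", "edge_member")
    stats = {v: dict.fromkeys(keys, 0) for v in range(n)}

    for u, v in edges:
        # Everything below uses only the two endpoint neighbor sets.
        common = len(nbrs[u] & nbrs[v])          # third vertices closing a triangle
        u_only = len(nbrs[u] - nbrs[v]) - 1      # adjacent to u only (minus v itself)
        v_only = len(nbrs[v] - nbrs[u]) - 1      # adjacent to v only (minus u itself)
        neither = n - len(nbrs[u] | nbrs[v])     # adjacent to neither endpoint
        for me, mine, theirs in ((u, u_only, v_only), (v, v_only, u_only)):
            s = stats[me]
            s["triangle"] += common
            s["wedge_center"] += mine            # wedges centered at `me`
            s["wedge_end"] += theirs             # wedges where `me` is an end (2-hop)
            s["edge_member"] += neither

    # Triangles and centered wedges are seen from two incident edges each.
    for s in stats.values():
        s["triangle"] //= 2
        s["wedge_center"] //= 2
    return stats
```

    The `wedge_end` count is the 2-hop quantity: vertex v learns how many vertices sit two hops away through each neighbor without ever storing a list of them, because the edge supplies only a set-difference size.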
    We test our implementation in several experiments scaling up to 640 cores on Amazon EC2. We find that our algorithm can estimate the 3-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego 3-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes.

    Supplementary Material

    MP4 File (p229.mp4)

    Published In

    KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN:9781450336642
    DOI:10.1145/2783258
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3-profiles
    2. graph analytics
    3. graph engines
    4. graph sparsifiers
    5. graphlab
    6. motifs

    Funding Sources

    • DARPA
    • NSF
    • Google
    • Docomo

    Conference

    KDD '15

    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
