DOI: 10.1145/3038912.3052597
research-article

ESCAPE: Efficiently Counting All 5-Vertex Subgraphs

Published: 03 April 2017

    Abstract

    Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex or 5-vertex patterns is highly challenging, and few known practical methods scale to massive graphs.
    We introduce an algorithmic framework that can be adopted to count any small pattern in a graph, and we apply it to compute exact counts for all 5-vertex subgraphs. The framework cuts a pattern into smaller ones and uses the counts of the smaller patterns to obtain the larger counts. We further exploit degree orientations of the graph to reduce running time. These methods avoid the combinatorial explosion that typical subgraph counting algorithms face. We prove that enumerating only four specific subgraphs (three of which have fewer than 5 vertices) suffices to count all 5-vertex patterns exactly.
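
    The two ideas in this paragraph can be illustrated on much smaller patterns. The sketch below is not the ESCAPE implementation: it is a minimal Python example, with hypothetical function names and an assumed adjacency-set input with integer vertex ids, that (a) counts triangles over a degree orientation and (b) obtains non-induced wedge and 3-path counts from degrees and the triangle count, without enumerating the larger pattern.

```python
from itertools import combinations
from math import comb


def orient_by_degree(adj):
    """Direct every edge from its lower-ranked endpoint to its higher-ranked one.

    Rank is (degree, vertex id), so ties in degree are broken by id.
    `adj` maps each vertex to the set of its neighbours (undirected, simple graph).
    """
    rank = {v: (len(adj[v]), v) for v in adj}
    return {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}


def count_triangles(adj):
    """Count triangles by checking pairs of out-neighbours in the oriented graph.

    Each triangle is found exactly once, at its lowest-ranked vertex.
    """
    out = orient_by_degree(adj)
    total = 0
    for v in out:
        for u, w in combinations(out[v], 2):
            if w in adj[u]:
                total += 1
    return total


def count_wedges(adj):
    """Non-induced 2-paths: choose two neighbours of each centre vertex."""
    return sum(comb(len(adj[v]), 2) for v in adj)


def count_three_paths(adj):
    """Non-induced 3-edge paths on 4 distinct vertices.

    Cut the path at its middle edge (u, v): pick one other neighbour of u and
    one other neighbour of v. The two picks coincide exactly when they close a
    triangle, and each triangle is over-counted once per edge, hence the 3 * T.
    """
    middle = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                middle += (len(adj[u]) - 1) * (len(adj[v]) - 1)
    return middle - 3 * count_triangles(adj)
```

    On the complete graph with 4 vertices, these functions return 4 triangles, 12 wedges, and 12 three-paths. Only the triangle count requires any enumeration; the other two follow from degree information alone, which is the flavor of saving that the cutting approach aims for on larger patterns.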
    We perform extensive empirical experiments on a variety of real-world graphs. We compute counts for graphs with tens of millions of edges in minutes on a commodity machine. To the best of our knowledge, this is the first practical algorithm for 5-vertex pattern counting that runs at this scale. A stepping stone to our main algorithm is a fast method for counting all 4-vertex patterns. This method is typically ten times faster than state-of-the-art 4-vertex counters.


    Published In

    WWW '17: Proceedings of the 26th International Conference on World Wide Web
    April 2017
    1678 pages
    ISBN:9781450349130

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. graph orientations
    2. motif analysis
    3. subgraph counting

    Qualifiers

    • Research-article

    Conference

    WWW '17
    Sponsor:
    • IW3C2

    Acceptance Rates

    WWW '17 Paper Acceptance Rate 164 of 966 submissions, 17%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
