ESCAPE: Efficiently Counting All 5-Vertex Subgraphs

Authors:

Ali Pinar, C. Seshadhri, and Vaidyanathan Vishal

WWW '17: Proceedings of the 26th International Conference on World Wide Web

April 2017

Pages 1431 - 1440

https://doi.org/10.1145/3038912.3052597

Published: 03 April 2017

Abstract

Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex or 5-vertex patterns is highly challenging, and there are few practical results known that can scale to massive sizes.

We introduce an algorithmic framework that can be adopted to count any small pattern in a graph and apply this framework to compute exact counts for all 5-vertex subgraphs. Our framework is built on cutting a pattern into smaller ones and using counts of the smaller patterns to obtain counts of the larger one. We also exploit degree orientations of the graph to reduce runtimes further. These methods avoid the combinatorial explosion that typical subgraph counting algorithms face. We prove that it suffices to enumerate only four specific subgraphs (three of which have fewer than 5 vertices) to exactly count all 5-vertex patterns.
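
The two ideas in this paragraph can be illustrated on much smaller patterns than the 5-vertex ones the paper targets. The following Python sketch is not the authors' implementation; it is a minimal illustration under stated assumptions. It orients edges from lower to higher degree to enumerate triangles, then obtains non-induced counts of wedges, 3-stars, and 3-paths from degrees and the triangle count alone, without enumerating those patterns. All function names are hypothetical, and vertex ids are assumed to be mutually comparable (for example, integers).

```python
from itertools import combinations
from math import comb


def build_adjacency(edges):
    """Return {vertex: set(neighbors)} for a simple undirected graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def orient_by_degree(adj):
    """Direct each edge from its lower-ranked endpoint to its higher-ranked one.

    Rank is (degree, vertex id), so ties in degree are broken by id.
    """
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    return {v: {w for w in nbrs if rank[v] < rank[w]} for v, nbrs in adj.items()}


def count_triangles(adj, out):
    """Count each triangle once, at its lowest-ranked vertex."""
    total = 0
    for v in adj:
        for a, b in combinations(out[v], 2):
            if b in adj[a]:
                total += 1
    return total


def count_small_patterns(edges):
    """Return non-induced counts of a few 3- and 4-vertex patterns."""
    adj = build_adjacency(edges)
    out = orient_by_degree(adj)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}

    triangles = count_triangles(adj, out)
    # A wedge (2-path) is a center vertex plus two of its neighbors.
    wedges = sum(comb(d, 2) for d in deg.values())
    # A 3-star is a center vertex plus three of its neighbors.
    three_stars = sum(comb(d, 3) for d in deg.values())
    # Cut a 3-path at its middle edge (u, v): pick one other neighbor of u
    # and one other neighbor of v. When both picks coincide we get a
    # triangle, and each triangle is hit once per edge, hence the 3x term.
    middle_edge_pairs = sum(
        (deg[u] - 1) * (deg[v] - 1) for u in adj for v in out[u]
    )
    three_paths = middle_edge_pairs - 3 * triangles

    return {
        "triangles": triangles,
        "wedges": wedges,
        "three_stars": three_stars,
        "three_paths": three_paths,
    }
```

Only triangles are enumerated here; the other three counts follow from vertex degrees and the triangle count. That is the flavor of the "count larger patterns from smaller ones" idea, though the paper's formulas for 5-vertex patterns are considerably more involved.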

We perform extensive empirical experiments on a variety of real-world graphs. We are able to compute counts for graphs with tens of millions of edges in minutes on a commodity machine. To the best of our knowledge, this is the first practical algorithm for 5-vertex pattern counting that runs at this scale. A stepping stone to our main algorithm is a fast method for counting all 4-vertex patterns. This algorithm is typically ten times faster than state-of-the-art 4-vertex counters.

Index Terms

ESCAPE: Efficiently Counting All 5-Vertex Subgraphs
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms
2. Theory of computation
  1. Design and analysis of algorithms
    1. Graph algorithms analysis

Published In

WWW '17: Proceedings of the 26th International Conference on World Wide Web

April 2017

1678 pages

ISBN: 9781450349130

General Chairs: Rick Barrett (W3Events), Rick Cummings (Murdoch University)

Program Chairs: Eugene Agichtein (Emory University), Evgeniy Gabrilovich (Google Research)

Copyright © 2017 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Qualifiers

Research-article

Conference

WWW '17

Sponsor:

IW3C2

WWW '17: 26th International World Wide Web Conference

April 3 - 7, 2017

Perth, Australia

Acceptance Rates

WWW '17 paper acceptance rate: 164 of 966 submissions (17%)

Overall acceptance rate: 1,899 of 8,196 submissions (23%)

Article Metrics

Total Citations: 84
Total Downloads: 721
Downloads (Last 12 months): 138
Downloads (Last 6 weeks): 27
