DOI: 10.1145/3366423.3380264
research-article
Open access

Provably and Efficiently Approximating Near-cliques using the Turán Shadow: PEANUTS

Published: 20 April 2020

Abstract

Clique and near-clique counts are important graph properties with applications in graph generation, graph modeling, graph analytics, and community detection, among others. They are the archetypal examples of dense subgraphs. While there are several different definitions of near-cliques, most of them share the attribute that they are cliques that are missing a small number of edges. Clique counting is itself considered a challenging problem. Counting near-cliques is significantly harder, because the search space for near-cliques is orders of magnitude larger than that of cliques.
We give a formulation of a near-clique as a clique that is missing a constant number of edges. We exploit the fact that a near-clique contains a smaller clique, and use techniques for clique sampling to count near-cliques. This method allows us to count near-cliques with 1 or 2 missing edges in graphs with tens of millions of edges. To the best of our knowledge, there was no known efficient method for this problem, and we obtain a 10x to 100x speedup over existing algorithms for counting near-cliques.
Our main technique is a space-efficient adaptation of the Turán Shadow sampling approach, recently introduced by Jain and Seshadhri (WWW 2017). This approach constructs a large recursion tree (called the Turán Shadow) that represents cliques in a graph. We design a novel algorithm that builds an estimator for near-cliques, using an online, compact construction of the Turán Shadow.
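
The observation that a near-clique contains a smaller clique can be illustrated with a small exact counter. The sketch below is not the PEANUTS algorithm and does no sampling; it assumes a near-clique means a set of k vertices missing exactly r edges, and all function names (`build_adj`, `missing_edges`, `count_near_cliques_brute`, `count_one_missing_via_cliques`) are hypothetical. For r = 1, every such set consists of a (k-2)-clique plus two non-adjacent vertices that are both adjacent to the whole clique, so counting those pairs per (k-2)-clique should agree with a brute-force check of the definition.

```python
from itertools import combinations

def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def missing_edges(adj, vertices):
    """Number of non-adjacent pairs among `vertices`."""
    return sum(1 for u, v in combinations(vertices, 2) if v not in adj[u])

def count_near_cliques_brute(adj, k, r):
    """Count k-vertex sets missing exactly r edges by testing every k-set."""
    return sum(1 for s in combinations(sorted(adj), k)
               if missing_edges(adj, s) == r)

def count_one_missing_via_cliques(adj, k):
    """Count k-vertex sets missing exactly one edge (assumes k >= 2).

    Each such set is a (k-2)-clique plus two non-adjacent vertices that
    are adjacent to every clique vertex, and that decomposition is unique,
    so each set is counted once.
    """
    total = 0
    for core in combinations(sorted(adj), k - 2):
        if missing_edges(adj, core) != 0:
            continue  # core is not a clique
        # Vertices outside the core that are adjacent to every core vertex.
        common = [w for w in sorted(adj)
                  if w not in core and all(w in adj[c] for c in core)]
        total += missing_edges(adj, common)
    return total

# A 4-clique on {0, 1, 2, 3} with edge (2, 3) removed, plus a pendant vertex 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (3, 4)]
adj = build_adj(edges)
print(count_near_cliques_brute(adj, 4, 1), count_one_missing_via_cliques(adj, 4))
```

Both counters enumerate exhaustively, so they are only practical on toy graphs; the point is the decomposition, which lets a clique-finding routine drive the near-clique count.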

References

[1]
Noga Alon, Raphy Yuster, and Uri Zwick. 1994. Color-coding: A New Method for Finding Simple Paths, Cycles and Other Small Subgraphs Within Large Graphs. In Symposium on the Theory of Computing (STOC) (Montreal, Quebec, Canada). 326–335. https://doi.org/10.1145/195058.195179
[2]
J Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro Vespignani. 2006. Large scale networks fingerprinting and visualization using the k-core decomposition. In Advances in neural information processing systems. 41–50.
[3]
R. Andersen and K. Chellapilla. 2009. Finding Dense Subgraphs with Size Bounds. In Workshop on Algorithms and Models for the Web-Graph (WAW). 25–37.
[4]
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. 2008. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD’08. 16–24. https://doi.org/10.1145/1401890.1401898
[5]
Mansurul A Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. 2012. Guise: Uniform sampling of graphlets for large graph analysis. In 2012 IEEE 12th International Conference on Data Mining. IEEE, 91–100.
[6]
I. Bordino, D. Donata, A. Gionis, and S. Leonardi. 2008. Mining Large Networks with Subgraph Counting. In Proceedings of International Conference on Data Mining. 737–742.
[7]
Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. 2018. Motif Counting Beyond Five Nodes. ACM Transactions on Knowledge Discovery from Data (TKDD) 12, 4 (2018), 48.
[8]
Jie Chen and Yousef Saad. 2010. Dense subgraph extraction with application to community detection. IEEE Transactions on knowledge and data engineering 24, 7 (2010), 1216–1230.
[9]
Norishige Chiba and Takao Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM J. Comput. 14, 1 (1985), 210–223. https://doi.org/10.1137/0214017
[10]
Radu Curticapean, Holger Dell, and Dániel Marx. 2017. Homomorphisms are a good basis for counting small subgraphs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. ACM, 210–223.
[11]
Maximilien Danisch, Oana Balalau, and Mauro Sozio. 2018. Listing k-cliques in Sparse Real-World Graphs. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 589–598.
[12]
Devdatt Dubhashi and Alessandro Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press.
[13]
Ethan R Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, and Alexandros G Dimakis. 2016. Distributed estimation of graph 4-profiles. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 483–493.
[14]
Irene Finocchi, Marco Finocchi, and Emanuele G. Fusco. 2015. Clique Counting in MapReduce: Algorithms and Experiments. ACM Journal of Experimental Algorithmics 20 (2015). https://doi.org/10.1145/2794080
[15]
Eugene Fratkin, Brian T Naughton, Douglas L Brutlag, and Serafim Batzoglou. 2006. MotifCut: regulatory motifs finding with maximum density subgraphs. Bioinformatics 22, 14 (2006), e150–e157.
[16]
Guyue Han and Harish Sethu. 2016. Waddling random walk: Fast and accurate mining of motif statistics in large graphs. In Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 181–190.
[17]
Tomaž Hočevar and Janez Demšar. 2017. Combinatorial algorithm for counting small induced graphs and orbits. PloS one 12, 2 (2017), e0171428.
[18]
P. Holland and S. Leinhardt. 1970. A method for detecting structure in sociometric data. Amer. J. Sociology 76 (1970), 492–513.
[19]
Shweta Jain and C Seshadhri. 2017. A Fast and Provable Method for Estimating Clique Counts Using Turán’s Theorem. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 441–449.
[20]
M. Jha, C. Seshadhri, and A. Pinar. 2015. Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts. In World Wide Web (WWW). 495–505.
[21]
Daniel M Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. 2012. Counting arbitrary subgraphs in data streams. In International Colloquium on Automata, Languages, and Programming. Springer, 598–609.
[22]
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. 1999. Trawling the Web for emerging cyber-communities. Computer networks 31, 11-16 (1999), 1481–1493.
[23]
Guimei Liu and Limsoon Wong. 2008. Effective pruning techniques for mining quasi-cliques. In Joint European conference on machine learning and knowledge discovery in databases. Springer, 33–49.
[24]
David W Matula and Leland L Beck. 1983. Smallest-last ordering and clustering and graph coloring algorithms. Journal of the ACM (JACM) 30, 3 (1983), 417–427.
[25]
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (2002), 824–827.
[26]
Ashwin Paranjape, Austin R Benson, and Jure Leskovec. 2017. Motifs in temporal networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 601–610.
[27]
Jeffrey Pattillo, Nataly Youssef, and Sergiy Butenko. 2012. Clique relaxation models in social network analysis. In Handbook of Optimization in Complex Networks. Springer, 143–162.
[28]
Ali Pinar, C Seshadhri, and Vaidyanathan Vishal. 2017. Escape: Efficiently counting all 5-vertex subgraphs. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1431–1440.
[29]
Nataša Pržulj. 2007. Biological network comparison using graphlet degree distribution. Bioinformatics 23, 2 (2007), e177–e183.
[30]
Ryan A Rossi, David F Gleich, and Assefaw H Gebremedhin. 2015. Parallel Maximum Clique Algorithms with Applications to Network Analysis. SIAM Journal on Scientific Computing 37, 5 (2015), C589–C616.
[31]
Ahmet Erdem Sariyuce, C Seshadhri, Ali Pinar, and Umit V Catalyurek. 2015. Finding the hierarchy of dense subgraphs using nucleus decompositions. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 927–937.
[32]
Ahmet Erdem Sariyüce, C. Seshadhri, Ali Pinar, and Ümit V. Çatalyürek. 2015. Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions. (2015), 927–937.
[33]
C. Seshadhri, Tamara G. Kolda, and Ali Pinar. 2012. Community structure and scale-free collections of Erdös-Rényi graphs. Physical Review E 85, 5 (May 2012), 056109. https://doi.org/10.1103/PhysRevE.85.056109
[34]
Miguel EP Silva, Pedro Paredes, and Pedro Ribeiro. 2017. Network motifs detection using random networks with prescribed subgraph frequencies. In Workshop on Complex Networks CompleNet. Springer, 17–29.
[35]
Ann Sizemore, Chad Giusti, and Danielle S. Bassett. 2016. Classification of weighted networks through mesoscale homological features. Journal of Complex Networks 10.1093 (2016).
[36]
SNAP [n.d.]. Stanford Network Analysis Project (SNAP). Available at http://snap.stanford.edu/.
[37]
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD’08. 990–998.
[38]
C. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. Tsiarli. 2013. Denser Than the Densest Subgraph: Extracting Optimal Quasi-cliques with Quality Guarantees. In Knowledge Data and Discovery (KDD).
[39]
Charalampos E. Tsourakakis. 2015. The K-clique Densest Subgraph Problem. In Proceedings of the Conference on World Wide Web WWW. 1122–1132. https://doi.org/10.1145/2736277.2741098
[40]
Charalampos E. Tsourakakis, Jakub W. Pachocki, and Michael Mitzenmacher. 2016. Scalable motif-aware graph clustering. CoRR abs/1606.06235 (2016). http://arxiv.org/abs/1606.06235
[41]
Johan Ugander, Lars Backstrom, and Jon M. Kleinberg. 2013. Subgraph frequencies: mapping the empirical and extremal geography of large graph collections. In WWW, Daniel Schwabe, Virgílio A. F. Almeida, Hartmut Glaser, Ricardo A. Baeza-Yates, and Sue B. Moon (Eds.). International World Wide Web Conferences Steering Committee / ACM, 1307–1318.
[42]
Pinghui Wang, John Lui, Bruno Ribeiro, Don Towsley, Junzhou Zhao, and Xiaohong Guan. 2014. Efficiently estimating motif statistics of large networks. ACM Transactions on Knowledge Discovery from Data (TKDD) 9, 2 (2014), 8.
[43]
Pinghui Wang, Junzhou Zhao, Xiangliang Zhang, Zhenguo Li, Jiefeng Cheng, John CS Lui, Don Towsley, Jing Tao, and Xiaohong Guan. 2018. MOSS-5: A fast method of approximating counts of 5-node graphlets in large graphs. IEEE Transactions on Knowledge and Data Engineering 30, 1 (2018), 73–86.
[44]
Sebastian Wernicke. 2006. Efficient Detection of Network Motifs. IEEE/ACM Trans. Comput. Biology Bioinform. 3, 4 (2006), 347–359.
[45]
Hao Yin, Austin R Benson, Jure Leskovec, and David F Gleich. 2017. Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 555–564.
[46]
Haiyuan Yu, Alberto Paccanaro, Valery Trifonov, and Mark Gerstein. 2006. Predicting interactions in protein networks by completing defective cliques. Bioinformatics 22, 7 (2006), 823–829.

      Information & Contributors

      Information

      Published In

      WWW '20: Proceedings of The Web Conference 2020
      April 2020
      3143 pages
      ISBN:9781450370233
      DOI:10.1145/3366423
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Cliques
      2. Turán Shadow
      3. defective-cliques
      4. graphs
      5. near-cliques
      6. sampling

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      WWW '20
      WWW '20: The Web Conference 2020
      April 20 - 24, 2020
      Taipei, Taiwan

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)156
      • Downloads (Last 6 weeks)13
      Reflects downloads up to 07 Nov 2024

      Citations

      Cited By

      • (2024) A Counting-based Approach for Efficient k-Clique Densest Subgraph Discovery. Proceedings of the ACM on Management of Data 2:3 (1-27). DOI: 10.1145/3654922. Online publication date: 30-May-2024
      • (2024) Efficient k-Clique Counting on Large Graphs: The Power of Color-Based Sampling Approaches. IEEE Transactions on Knowledge and Data Engineering 36:4 (1518-1536). DOI: 10.1109/TKDE.2023.3314643. Online publication date: Apr-2024
      • (2024) DeepDense. Expert Systems with Applications: An International Journal 238:PB. DOI: 10.1016/j.eswa.2023.121816. Online publication date: 27-Feb-2024
      • (2023) Scaling Up k-Clique Densest Subgraph Detection. Proceedings of the ACM on Management of Data 1:1 (1-26). DOI: 10.1145/3588923. Online publication date: 30-May-2023
      • (2022) Lightning Fast and Space Efficient k-clique Counting. Proceedings of the ACM Web Conference 2022 (1191-1202). DOI: 10.1145/3485447.3512167. Online publication date: 25-Apr-2022
      • (2022) Sequential stratified regeneration: MCMC for large state spaces with an application to subgraph count estimation. Data Mining and Knowledge Discovery 36:1 (414-447). DOI: 10.1007/s10618-021-00802-3. Online publication date: 1-Jan-2022
      • (2021) Graph Theory Approach to Detect Examinees Involved in Test Collusion. Applied Psychological Measurement 45:4 (253-267). DOI: 10.1177/01466216211013902. Online publication date: 12-May-2021
