Efficient semi-streaming algorithms for local triangle counting in massive graphs

Published: 24 August 2008

Abstract

In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V, E), we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks.
For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log |V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.
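As a rough illustration of how min-wise hashing can lead to local triangle counts, the following Python sketch estimates the Jaccard coefficient J of the neighbor sets of the two endpoints of each edge, converts it to an estimated number of common neighbors via |N(u) ∩ N(v)| = J / (1 + J) · (d_u + d_v), and gives half of that estimate to each endpoint. It is a minimal sketch under stated assumptions, not the algorithms proposed in the paper: the function name `estimate_local_triangles`, the parameter `num_hashes`, and the linear hash family are hypothetical choices made here, linear hashes only approximate min-wise independence, and the sketch keeps all signatures in memory at once instead of trading memory for O(log |V|) passes.

```python
import random
from collections import defaultdict


def estimate_local_triangles(edges, num_hashes=200, seed=0):
    """Estimate the number of triangles at each node of an undirected graph.

    Assumptions (not from the paper): `edges` lists each undirected edge
    once as a pair of distinct non-negative integers, and the hash family
    h(x) = (a*x + b) mod prime stands in for random permutations.
    """
    rng = random.Random(seed)
    prime = (1 << 61) - 1  # Mersenne prime, an arbitrary illustrative choice
    coeffs = [(rng.randrange(1, prime), rng.randrange(prime))
              for _ in range(num_hashes)]

    # Scan 1: for every node keep its degree and, per hash function,
    # the minimum hash value seen over its neighbors.
    degree = defaultdict(int)
    signature = defaultdict(lambda: [prime] * num_hashes)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        for i, (a, b) in enumerate(coeffs):
            signature[u][i] = min(signature[u][i], (a * v + b) % prime)
            signature[v][i] = min(signature[v][i], (a * u + b) % prime)

    # Scan 2: the fraction of matching minima estimates the Jaccard
    # coefficient J of N(u) and N(v); J/(1+J) * (d_u + d_v) then
    # estimates the number of common neighbors of u and v.
    triangles = defaultdict(float)
    for u, v in edges:
        matches = sum(1 for x, y in zip(signature[u], signature[v]) if x == y)
        jaccard = matches / num_hashes
        common = jaccard / (1.0 + jaccard) * (degree[u] + degree[v])
        # Each triangle at a node is seen through two of its incident edges.
        triangles[u] += common / 2.0
        triangles[v] += common / 2.0
    return dict(triangles)
```

Given an estimate T(v) and the degree d_v, the local clustering coefficient mentioned above follows from the standard definition 2 · T(v) / (d_v · (d_v − 1)) for d_v ≥ 2. Raising `num_hashes` reduces the variance of the estimates at the cost of more memory per node.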

References

[1]
N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.
[2]
Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, 2002.
[3]
V. Batagelj and A. Mrvar. A subquadratic triad census algorithm for large sparse networks with small maximum degree. Social Networks, 23:237--243, 2001.
[4]
T. Bohman, C. Cooper, and A. M. Frieze. Min-wise independent linear permutations. Electr. J. Comb, 7, 2000.
[5]
P. Boldi and S. Vigna. The webgraph framework I: compression techniques. In WWW, 2004.
[6]
A. Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, IEEE Computer Society, 1998.
[7]
A. Z. Broder. Identifying and filtering near-duplicate documents. In CPM. Springer, 2000.
[8]
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In STOC, New York, NY, USA, 1998.
[9]
A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. In WWW, 1997.
[10]
L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In PODS, 2006.
[11]
C. Castillo, D. Donato, L. Becchetti, P. Boldi, S. Leonardi, M. Santini, and S. Vigna. A reference collection for web spam. SIGIR Forum, 40(2):11--24, December 2006.
[12]
C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri. Know your neighbors: Web spam detection using the web topology. In SIGIR, 2007.
[13]
D. Coppersmith and R. Kumar. An improved data stream algorithm for frequency moments. In SODA, 2004.
[14]
D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251--280, 1990.
[15]
C. Demetrescu, I. Finocchi, and A. Ribichini. Trading off space for passes in graph streaming problems. In SODA, 2006.
[16]
J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS, 99(9):5825--5829, 2002.
[17]
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. In ICALP, 2004.
[18]
D. Fetterly, M. Manasse, and M. Najork. Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages. In WebDB, 2004.
[19]
D. Fogaras and B. Racz. Scaling link-based similarity search. In WWW, 2005.
[20]
D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, 2005.
[21]
A. Gulli and A. Signorini. The indexable Web is more than 11.5 billion pages. In WWW, 2005.
[22]
T. Haveliwala. Efficient computation of pagerank. Technical report, Stanford University, 1999.
[23]
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Dimacs Series In Discrete Mathematics And Theoretical Computer Science, pages 107--118, 1999.
[24]
P. Indyk. A small approximately min-wise independent family of hash functions. In SODA, 1999.
[25]
A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal of Computing, 7(4):413--423, 1978.
[26]
G. Jeh and J. Widom. Simrank: a measure of structural-context similarity. In KDD, New York, NY, USA, 2002.
[27]
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for emerging cyber-communities. Computer Networks, 31(11--16):1481--1493, 1999.
[28]
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, 2005.
[29]
M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167--256, 2003.
[30]
T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Proceedings of the 4th International Workshop on Experimental and Efficient Algorithms (WEA), 2005.
[31]
J. S. Vitter. External memory algorithms and data structures. ACM Computing Surveys, 33(2):209--271, 2001.
[32]
H. T. Welser, E. Gleave, D. Fisher, and M. Smith. Visualizing the signatures of social roles in online discussion groups. The Journal of Social Structure, 8(2), 2007.

Published In

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN:9781605581934
DOI:10.1145/1401890
General Chair: Ying Li
Program Chairs: Bing Liu, Sunita Sarawagi

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph mining
  2. probabilistic algorithms
  3. semi-streaming

Qualifiers

  • Research-article

Conference

KDD08

Acceptance Rates

KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) Efficient k-Clique Counting on Large Graphs: The Power of Color-Based Sampling Approaches. IEEE Transactions on Knowledge and Data Engineering 36:4 (1518-1536). DOI: 10.1109/TKDE.2023.3314643. Online publication date: Apr-2024
  • (2024) Privacy-Preserving Approximate Calculation of Subgraphs in Social Networking Services. 2024 IEEE International Conference on Web Services (ICWS) (383-394). DOI: 10.1109/ICWS62655.2024.00060. Online publication date: 7-Jul-2024
  • (2024) Generic network sparsification via degree- and subgraph-based edge sampling. Information Sciences: an International Journal 679:C. DOI: 10.1016/j.ins.2024.121096. Online publication date: 1-Sep-2024
  • (2024) Balanced parallel triangle enumeration with an adaptive algorithm. Distributed and Parallel Databases 42:1 (103-141). DOI: 10.1007/s10619-023-07437-x. Online publication date: 1-Mar-2024
  • (2023) CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1-6). DOI: 10.23919/DATE56975.2023.10136997. Online publication date: Apr-2023
  • (2023) Triangular Stability Maximization by Influence Spread over Social Networks. Proceedings of the VLDB Endowment 16:11 (2818-2831). DOI: 10.14778/3611479.3611490. Online publication date: 24-Aug-2023
  • (2023) Scalable Approximate Butterfly and Bi-triangle Counting for Large Bipartite Networks. Proceedings of the ACM on Management of Data 1:4 (1-26). DOI: 10.1145/3626753. Online publication date: 12-Dec-2023
  • (2023) SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (923-933). DOI: 10.1145/3583780.3615044. Online publication date: 21-Oct-2023
  • (2023) Khuzdul: Efficient and Scalable Distributed Graph Pattern Mining Engine. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (413-426). DOI: 10.1145/3575693.3575743. Online publication date: 27-Jan-2023
  • (2023) DecoMine: A Compilation-Based Graph Pattern Mining System with Pattern Decomposition. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (47-61). DOI: 10.1145/3567955.3567956. Online publication date: 25-Mar-2023
