research-article

FURL: Fixed-memory and uncertainty reducing local triangle counting for multigraph streams

Published: 01 September 2019

Abstract

Given a multigraph stream (e.g., Facebook messages or network traffic) where duplicate edges arrive continuously, how can we accurately and memory-efficiently estimate local triangles for all nodes? Local triangle counting in a graph stream is one of the fundamental tasks in graph mining, with important applications such as anomaly detection, social role identification, and community detection. Many recent graph streams include duplicate edges and hence form multigraph streams: e.g., many network packets might have the same (source, destination) pair. Although there have been several local triangle counting methods for multigraph streams, they have problems in terms of accuracy and memory efficiency; furthermore, most methods support either binary or weighted counting, but not both, and thus cannot find anomalies whose detection requires both types of counting. In this paper, we propose FURL, a memory-efficient and accurate local triangle counting method for multigraph streams. FURL has two main advantages. First, FURL improves accuracy by (1) reducing the variance of its estimation via a regularization strategy, and (2) sampling more triangles than the state-of-the-art methods do, by using its memory space efficiently. Second, FURL finds anomalies which state-of-the-art methods cannot discover. Experimental results show that FURL outperforms state-of-the-art methods in terms of accuracy and memory efficiency. Using FURL, we discover interesting anomalies in a Bitcoin network.
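
The difference between binary and weighted counting can be made concrete with a small exact computation. The sketch below is not FURL and does no sampling or streaming estimation. It assumes that binary counting ignores duplicate edges and that weighted counting credits each triangle with the product of its edge multiplicities; this is one common reading of the two terms, not a definition given in the abstract. The function name `local_triangles` and the toy stream are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_triangles(stream):
    """Exact binary and weighted local triangle counts of a small multigraph stream.

    Assumption (not from the abstract): a triangle contributes 1 to each of its
    nodes under binary counting, and the product of its three edge
    multiplicities under weighted counting.
    """
    # Multiplicity of every undirected edge; self-loops are skipped.
    mult = Counter()
    for u, v in stream:
        if u != v:
            mult[frozenset((u, v))] += 1

    # Adjacency sets of the underlying simple graph.
    adj = {}
    for edge in mult:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    binary, weighted = Counter(), Counter()
    # Brute force over node triples: fine for a toy graph, not for real streams.
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            w = (mult[frozenset((a, b))]
                 * mult[frozenset((a, c))]
                 * mult[frozenset((b, c))])
            for node in (a, b, c):
                binary[node] += 1
                weighted[node] += w
    return binary, weighted


# Toy stream in which the edge (a, b) arrives three times.
toy_stream = [("a", "b"), ("b", "c"), ("a", "c"),
              ("a", "b"), ("c", "d"), ("a", "b")]
binary, weighted = local_triangles(toy_stream)
```

On this toy stream, node `a` has a binary count of 1 and a weighted count of 3, because the repeated edge (a, b) triples the weight of the single triangle. A node whose weighted count is far larger than its binary count is the kind of signal that needs both counts at once.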

Published In

Data Mining and Knowledge Discovery  Volume 33, Issue 5
Sep 2019
282 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Local triangle counting
  2. Graph stream
  3. Edge sampling

Qualifiers

  • Research-article

Cited By

  • (2024) FABLE: Approximate Butterfly Counting in Bipartite Graph Stream with Duplicate Edges. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp 2158–2167. https://doi.org/10.1145/3627673.3679812. Online publication date: 21-Oct-2024
  • (2023) Top-k heavy weight triangles listing on graph stream. World Wide Web 26(4):1827–1851. https://doi.org/10.1007/s11280-022-01117-z. Online publication date: 1-Jul-2023
  • (2023) Global triangle estimation based on first edge sampling in large graph streams. The Journal of Supercomputing 79(13):14079–14116. https://doi.org/10.1007/s11227-023-05205-3. Online publication date: 1-Sep-2023
  • (2023) Sliding window-based approximate triangle counting with bounded memory usage. The VLDB Journal 32(5):1087–1110. https://doi.org/10.1007/s00778-023-00783-3. Online publication date: 1-Sep-2023
  • (2022) Distributed Triangle Approximately Counting Algorithms in Simple Graph Stream. ACM Transactions on Knowledge Discovery from Data 16(4):1–43. https://doi.org/10.1145/3494562. Online publication date: 8-Jan-2022
  • (2022) SPAC: Scalable Pattern Approximate Counting in Graph Mining. Algorithms and Architectures for Parallel Processing, pp 214–232. https://doi.org/10.1007/978-3-031-22677-9_12. Online publication date: 10-Oct-2022
  • (2021) Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate Edges. Proceedings of the 2021 International Conference on Management of Data, pp 645–657. https://doi.org/10.1145/3448016.3452800. Online publication date: 9-Jun-2021
  • (2020) Temporal locality-aware sampling for accurate triangle counting in real graph streams. The VLDB Journal 29(6):1501–1525. https://doi.org/10.1007/s00778-020-00624-7. Online publication date: 12-Aug-2020
