DOI: 10.1145/3394486.3403074

Incremental Lossless Graph Summarization

Published: 20 August 2020

Abstract

Given a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot? As large-scale graphs are prevalent, representing them concisely is essential for efficient storage and analysis. Lossless graph summarization is an effective graph-compression technique with many desirable properties. It aims to compactly represent the input graph as (a) a summary graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e., edges between supernodes), which provide a rough description, and (b) edge corrections, which fix the errors induced by the rough description. While a number of batch algorithms, suited for static graphs, have been developed for rapid and compact graph summarization, they are highly inefficient in terms of time and space for dynamic graphs, which are common in practice.
In this work, we propose MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. In response to each change in the input graph, MoSSo updates the output representation by repeatedly moving nodes among supernodes. MoSSo chooses which nodes to move, and where to move them, carefully but rapidly, based on several novel ideas. Through extensive experiments on 10 real graphs, we show that MoSSo is (a) Fast and 'any time': processing each change in near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude faster than running state-of-the-art batch methods, (b) Scalable: summarizing graphs with hundreds of millions of edges, requiring sub-linear memory during the process, and (c) Effective: achieving compression ratios comparable even to state-of-the-art batch methods.
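The representation described in the abstract (supernodes, superedges, and edge corrections) can be illustrated with a minimal Python sketch. This is not the authors' code. The per-pair rule used here, which adds a superedge when more than half of the possible node pairs are edges, is a convention from the batch summarization literature and is an assumption, since this page does not state it. All function names are hypothetical, and `try_move` re-encodes the whole graph, so it only illustrates the idea of moving a node between supernodes, not MoSSo's near-constant-time procedure.

```python
from itertools import combinations


def pairs_between(members, key):
    """All node pairs covered by a supernode pair (or inside one supernode)."""
    ids = tuple(key)
    if len(ids) == 1:  # both endpoints fall in the same supernode
        return {frozenset(p) for p in combinations(members[ids[0]], 2)}
    a, b = ids
    return {frozenset((u, v)) for u in members[a] for v in members[b]}


def encode(edges, partition):
    """Build (members, superedges, c_plus, c_minus) for a simple undirected graph.

    edges: set of frozenset({u, v}); partition: dict node -> supernode id.
    """
    members = {}
    for node, sid in partition.items():
        members.setdefault(sid, set()).add(node)
    between = {}  # supernode pair -> edges actually present
    for e in edges:
        u, v = tuple(e)
        key = frozenset((partition[u], partition[v]))
        between.setdefault(key, set()).add(e)
    superedges, c_plus, c_minus = set(), set(), set()
    for key, present in between.items():
        possible = pairs_between(members, key)
        if len(present) > (len(possible) + 1) / 2:  # assumed encoding rule
            superedges.add(key)
            c_minus |= possible - present  # pairs the superedge wrongly implies
        else:
            c_plus |= present  # cheaper to list these edges individually
    return members, superedges, c_plus, c_minus


def decode(members, superedges, c_plus, c_minus):
    """Recover the exact edge set, which is what makes the summary lossless."""
    edges = set()
    for key in superedges:
        edges |= pairs_between(members, key)
    return (edges | c_plus) - c_minus


def cost(summary):
    """Size of the representation: superedges plus both correction sets."""
    _, superedges, c_plus, c_minus = summary
    return len(superedges) + len(c_plus) + len(c_minus)


def try_move(edges, partition, node, target):
    """Naive illustration: keep a move only if it shrinks the encoding."""
    candidate = dict(partition)
    candidate[node] = target
    if cost(encode(edges, candidate)) < cost(encode(edges, partition)):
        return candidate
    return partition


edges = {frozenset(p) for p in [("a", "c"), ("a", "d"), ("b", "c"), ("a", "e")]}
partition = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
summary = encode(edges, partition)

biclique = {frozenset(p) for p in [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]}
singletons = {n: n for n in "abcd"}
moved = try_move(biclique, singletons, "b", "a")
```

In the first example, the four edges are stored as one superedge between {a, b} and {c, d}, one negative correction (b, d), and one positive correction (a, e). In the second, merging b into a's supernode halves the encoding of a complete bipartite graph, which is the kind of gain that node moves aim for.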

Supplementary Material

MP4 File (3394486.3403074.mp4)
Concisely representing large-scale graphs is essential for efficient storage and analysis, and it can be achieved by lossless graph summarization, which has many desirable properties. This technique yields a summary graph, consisting of supernodes and superedges, together with edge corrections. However, the algorithms developed for static graphs are inefficient in terms of time and space for dynamic graphs, which are represented as a stream of edge insertions and deletions. In this video, we present MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. Given a change in the input graph, MoSSo updates the output representation by moving nodes among supernodes based on novel ideas. Extensive experiments on real graphs show that MoSSo processes a change in near-constant time, up to 10^7 times faster than running the fastest batch methods, summarizes graphs with up to 0.3 billion edges, requires sublinear memory during the process, and achieves compression ratios comparable with up-to-date batch methods.


Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph compression
  2. graph summarization
  3. incremental algorithm

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation of Korea
  • Institute of Information & Communications Technology Planning & Evaluation

Conference

KDD '20
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
