DOI: 10.1145/3698038.3698524
research-article
Open access

IncBoost: Scaling Incremental Graph Processing for Edge Deletions and Weight Updates

Published: 20 November 2024

Abstract

Incremental query evaluation is key to efficiently processing rapidly changing graph data. By focusing on the parts of the query results affected by updates, it avoids unnecessary computation and allows for faster query evaluation. While this technique works well for edge insertions, its benefit quickly diminishes as the volumes of edge deletions and edge weight updates increase.
To address this scalability issue, this work introduces several techniques for handling large update batches that include many edge deletions and weight updates. First, for edge deletions, this work introduces a bottom-up dependency tracing method to identify the affected vertices. Unlike the existing top-down tracing, it completely avoids traversing the underlying graph and is thus more scalable for large deletion batches. Second, for edge weight updates, existing graph systems treat each weight change as an edge deletion (with the old weight) followed by an edge insertion (with the new weight). This "two-round" method is computationally excessive. This work shows that it is, in fact, possible to handle weight updates directly. Finally, this work shows the benefits of adjusting the processing strategy according to the update volume.
We integrated the above ideas into a graph system called IncBoost. Based on our evaluation, IncBoost can scale incremental query evaluation to large update batches that represent 30-60% of the graph size. By contrast, the state-of-the-art streaming graph system (RisGraph) typically fails to yield benefits when the batch size reaches 5-15% of the graph size. In terms of absolute processing time, IncBoost consistently outperforms RisGraph on large batches, with speedups of 3.1× for edge deletions and 5.2× for weight updates.
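To make the idea of touching only the affected results concrete, here is a minimal sketch of incremental single-source shortest paths under one edge insertion. It is an illustration only and is not taken from IncBoost or RisGraph: the function names `sssp` and `insert_edge`, the dictionary-based graph layout, and the assumption that the inserted edge is new and has a non-negative weight are all hypothetical choices made for this example.

```python
import heapq

def sssp(graph, source):
    """Dijkstra from scratch. graph maps u -> {v: weight}, weights >= 0."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def insert_edge(graph, dist, u, v, w):
    """Insert a NEW edge (u, v, w) and repair dist in place.

    Only vertices whose distance actually improves are visited.
    Returns the set of vertices whose distance changed.
    """
    graph[u][v] = w
    changed = set()
    if dist[u] + w >= dist[v]:
        return changed  # the new edge improves nothing
    dist[v] = dist[u] + w
    heap = [(dist[v], v)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        changed.add(x)
        for y, wy in graph[x].items():
            if d + wy < dist[y]:
                dist[y] = d + wy
                heapq.heappush(heap, (dist[y], y))
    return changed

if __name__ == "__main__":
    g = {"a": {"b": 4, "c": 1}, "b": {"d": 1}, "c": {}, "d": {}}
    dist = sssp(g, "a")
    print(sorted(insert_edge(g, dist, "c", "b", 1)))  # vertices that were revisited
    print(dist == sssp(g, "a"))                        # matches a full recomputation
```

An insertion can only improve distances, so the repair starts at the edge's target and spreads outward. A deletion or a weight increase may invalidate values that other vertices were derived from, which is why those cases need the extra machinery the abstract goes on to describe.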
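The contrast between top-down and bottom-up dependency tracing can be sketched as follows. The abstract does not give the algorithm, so this is a hypothetical reading, not IncBoost's implementation: it assumes each vertex records a single `parent` (the neighbor its current value was derived from, forming a dependency tree), and the names `trace_top_down` and `trace_bottom_up` are invented for this example.

```python
from collections import deque

def trace_top_down(graph, parent, deleted_edges):
    """Baseline: start at the targets of deleted dependency edges and
    walk DOWN, scanning graph out-edges to find dependent vertices."""
    affected = {v for (u, v) in deleted_edges if parent.get(v) == u}
    queue = deque(affected)
    while queue:
        x = queue.popleft()
        for y in graph[x]:  # traverses the underlying graph
            if parent.get(y) == x and y not in affected:
                affected.add(y)
                queue.append(y)
    return affected

def trace_bottom_up(parent, deleted_edges):
    """Assumed bottom-up variant: from every vertex, climb parent
    pointers UP until a vertex with a known status is reached.
    Reads only the parent map, never the graph adjacency."""
    status = {}  # True = affected, False = safe
    for v, p in parent.items():
        if p is None:
            status[v] = False  # root: depends on no edge
        elif (p, v) in deleted_edges:
            status[v] = True   # its dependency edge was deleted
    for v in parent:
        path = []
        x = v
        while x not in status:
            path.append(x)
            x = parent[x]
        for y in path:         # memoize the verdict along the path
            status[y] = status[x]
    return {v for v, hit in status.items() if hit}

if __name__ == "__main__":
    # Graph after deleting edge (a, b); parent holds the pre-deletion
    # shortest-path tree rooted at s.
    graph = {"s": {"a": 1, "b": 5}, "a": {"c": 4}, "b": {"c": 1},
             "c": {"d": 1}, "d": {}}
    parent = {"s": None, "a": "s", "b": "a", "c": "b", "d": "c"}
    deleted = {("a", "b")}
    print(sorted(trace_bottom_up(parent, deleted)))  # ['b', 'c', 'd']
    print(trace_bottom_up(parent, deleted) ==
          trace_top_down(graph, parent, deleted))    # True
```

In this sketch the bottom-up pass labels each vertex once, so its cost is linear in the number of vertices and independent of the number of edges, whereas the top-down pass scans the out-edges of every affected vertex. Whether IncBoost organizes the work this way is an assumption. The sketch only shows why a parent-pointer walk can avoid touching the graph itself.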

Published In

SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing
November 2024
1062 pages
ISBN: 9798400712869
DOI: 10.1145/3698038
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph processing
  2. incremental query evaluation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '24: ACM Symposium on Cloud Computing
November 20 - 22, 2024
Redmond, WA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
