Incremental Sliding Window Connectivity over Streaming Graphs

Published: 06 August 2024

Abstract

We study index-based processing for connectivity queries within sliding windows on streaming graphs. These queries, which determine whether two vertices belong to the same connected component, are fundamental operations in real-time graph data processing and demand high throughput and low latency. While indexing methods that leverage data structures for fully dynamic connectivity can facilitate efficient query processing, they encounter significant challenges when deleting expired edges during window updates. We introduce a novel indexing approach that eliminates the need to physically perform edge deletions. This is achieved through a bidirectional incremental computation framework, referred to as the BIC model. The BIC model runs two distinct incremental computations over the connected components within the window, one operating along the timeline and the other against it. The results of these computations are then merged to answer queries in the window efficiently. We propose techniques for optimized index storage, incremental index updates, and efficient query processing to improve the effectiveness of BIC. Empirically, BIC achieves a 14× increase in throughput and a reduction in P95 latency by up to 3900× when compared to state-of-the-art indexes.
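The idea of combining a backward and a forward incremental computation can be illustrated with a minimal Python sketch. This is not the authors' index: the chunking scheme and the names `DSU`, `backward_forest`, and `connected` are assumptions made for illustration, and the optimized storage and query techniques mentioned in the abstract are not reproduced. The sketch assumes the window covers a suffix of an older chunk of edges and a prefix of a newer chunk. A backward scan of the older chunk records which edges join two components, a forward union-find absorbs newly arriving edges, and a query merges the two without ever deleting an edge.

```python
class DSU:
    """Plain union-find with path halving; vertices are added lazily."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def backward_forest(chunk):
    """Scan a chunk against the timeline and keep the edges that join
    two components. The kept edges with position >= p form a spanning
    forest of chunk[p:], so expiring edges only raises p."""
    dsu = DSU()
    kept = []
    for pos in range(len(chunk) - 1, -1, -1):
        u, v = chunk[pos]
        if dsu.union(u, v):
            kept.append((pos, u, v))
    return kept


def connected(kept, p, forward, u, v):
    """Answer a query over chunk_a[p:] plus the edges already added to
    `forward`. Backward forest edges are applied on top of the forward
    components in a throwaway union-find, leaving `forward` unchanged
    as a partition."""
    merged = DSU()
    for pos, a, b in kept:
        if pos >= p:
            merged.union(forward.find(a), forward.find(b))
    return merged.find(forward.find(u)) == merged.find(forward.find(v))


if __name__ == "__main__":
    chunk_a = [(1, 2), (2, 3), (3, 4)]   # older edges, in arrival order
    chunk_b = [(5, 6), (4, 5)]           # newer edges, in arrival order

    kept = backward_forest(chunk_a)      # built once per chunk
    forward = DSU()
    for u, v in chunk_b:                 # insert-only, along the timeline
        forward.union(u, v)

    # Window = chunk_a[2:] + chunk_b: the edges (3,4), (5,6), (4,5).
    print(connected(kept, 2, forward, 3, 6))
    print(connected(kept, 2, forward, 1, 3))
```

In this sketch a query costs time proportional to the number of unexpired backward forest edges, which is far from the performance reported above. It only shows why sliding the window never requires a physical deletion: expiry is expressed by advancing `p`.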

Published In

Proceedings of the VLDB Endowment  Volume 17, Issue 10
June 2024
276 pages
Publisher

VLDB Endowment

Publication History

Published: 06 August 2024
Published in PVLDB Volume 17, Issue 10
