research-article

Optimizing differentially-maintained recursive queries on dynamic graphs

Published: 01 July 2022

    Abstract

    Differential computation (DC) is a highly general incremental computation and view maintenance technique that can maintain the output of an arbitrary, possibly recursive, dataflow computation upon changes to its base inputs. As such, it is a promising technique for graph database management systems (GDBMSs) that support continuous recursive queries over dynamic graphs. Although DC can maintain these queries very efficiently, it can require a prohibitively large amount of memory. This paper studies how to reduce the memory overhead of DC, with the goal of increasing the scalability of systems that adopt it. We propose a suite of optimizations based on dropping the differences of operators, either completely or partially, and recomputing these differences when necessary. We propose deterministic and probabilistic data structures to keep track of the dropped differences. Extensive experiments demonstrate that the optimizations can improve the scalability of a DC-based continuous query processor.
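The abstract describes dropping an operator's stored differences and tracking which ones were dropped so they can be recomputed on demand. The Python sketch below is a hypothetical, simplified illustration of that general idea, not the paper's implementation. `BloomFilter`, `DifferenceStore`, `recompute_from_log`, and the out-degree example are names and data invented here; timestamps are plain integers rather than DC's partially ordered, multi-dimensional timestamps; and "recomputation" just replays a base-input log instead of re-running the dataflow. A Bloom filter serves as the probabilistic tracker: since it has no false negatives, every dropped key is always recomputed, and a false positive only costs an unnecessary recomputation, never a wrong answer.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Hashable


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: Hashable):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: Hashable) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: Hashable) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class DifferenceStore:
    """Hypothetical per-operator difference store keyed by vertex.

    value(key, t) is the sum of the key's differences at timestamps <= t.
    Keys whose differences were dropped are tracked in a Bloom filter.
    """

    def __init__(self, recompute: Callable[[Hashable, int], int]) -> None:
        self.diffs: Dict[Hashable, Dict[int, int]] = defaultdict(dict)
        self.dropped = BloomFilter()
        self.recompute = recompute  # fallback: rebuild a value from base inputs

    def add_difference(self, key: Hashable, t: int, delta: int) -> None:
        if self.dropped.might_contain(key):
            return  # key is (possibly) dropped: value() recomputes it anyway
        self.diffs[key][t] = self.diffs[key].get(t, 0) + delta

    def drop(self, key: Hashable) -> None:
        self.diffs.pop(key, None)  # free the memory held by this key's differences
        self.dropped.add(key)

    def value(self, key: Hashable, t: int) -> int:
        if self.dropped.might_contain(key):
            # Dropped or a false positive: recomputing is always correct.
            return self.recompute(key, t)
        # Never dropped: the stored differences are complete.
        return sum(d for ts, d in self.diffs.get(key, {}).items() if ts <= t)


# Usage: out-degree per vertex as +1/-1 differences, logged as (t, vertex, delta).
edge_log = [(1, "a", +1), (2, "a", +1), (3, "a", -1), (2, "b", +1)]


def recompute_from_log(key: Hashable, t: int) -> int:
    return sum(d for ts, v, d in edge_log if v == key and ts <= t)


store = DifferenceStore(recompute_from_log)
for ts, v, d in edge_log:
    store.add_difference(v, ts, d)

store.drop("a")
print(store.value("a", 2))  # 2, recomputed from the log
print(store.value("a", 3))  # 1
print(store.value("b", 2))  # 1
```

Because the filter's bits are only ever set, a key that tests negative at lookup time also tested negative at every earlier insertion, so its stored differences are complete. A deterministic alternative would replace the Bloom filter with an exact set of dropped keys, trading extra memory for no spurious recomputations.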


    Cited By

    • (2024) Optimizing Differential Computation for Large-Scale Graph Processing. Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), pp. 1-9. DOI: 10.1145/3661304.3661900. Online publication date: 14 Jun 2024.
    • (2024) An Overview of Continuous Querying in (Modern) Data Systems. Companion of the 2024 International Conference on Management of Data, pp. 605-612. DOI: 10.1145/3626246.3654679. Online publication date: 9 Jun 2024.


    Information

    Published In

    Proceedings of the VLDB Endowment, Volume 15, Issue 11
    July 2022
    980 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment

    Publication History

    Published in PVLDB Volume 15, Issue 11


    Bibliometrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 26 Jul 2024
