DOI: 10.1145/3661304.3661900
research-article

Optimizing Differential Computation for Large-Scale Graph Processing

Published: 09 June 2024
Abstract

    Differential computation (DC) has emerged as a powerful general technique for maintaining computations over evolving datasets, even those containing arbitrarily nested loops, which makes DC particularly well-suited for graph computations. However, the general maintenance technique used by DC makes it less efficient for application-specific workloads. This paper shows how application-specific optimizations can improve both the runtime and memory characteristics of Differential Dataflow (DD), the reference implementation of DC. We present three optimizations for DD that make it more suitable for graph processing. Our first two optimizations redesign the in-memory indices that DD uses to maintain operator state, making them more read-friendly and decreasing the amount of data that operators need to scan from the indices. Next, we observe that DD's Reduce operator performs an expensive recomputation to determine whether there are any new outputs for changed inputs, even if the outputs have not actually changed. Our third optimization, called Fast Empty Difference Verification, detects when there are no output changes without performing DD's default rerunning logic. We present experiments on a variety of graph computation workloads demonstrating that our optimizations improve DD's runtime by up to 19× and reduce memory consumption by up to 1.7×.
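
    The idea of skipping recomputation when the output difference is empty can be illustrated with a small sketch. This is not the paper's Fast Empty Difference Verification algorithm and does not use the Differential Dataflow API; it is a hypothetical, single-threaded toy for one aggregation (a per-key minimum), and the class name `MinReduce`, the `update` method, and the `recomputations` counter are all assumptions made for illustration. Inputs arrive as (key, value, diff) differences, and the operator falls back to a scan of its state only when the change could affect the current minimum.

```python
from collections import Counter


class MinReduce:
    """Toy per-key minimum maintained over (key, value, diff) input differences.

    Hypothetical illustration only: it is neither Differential Dataflow's
    Reduce operator nor the paper's optimization.
    """

    def __init__(self):
        self.state = {}           # key -> Counter mapping value -> multiplicity
        self.output = {}          # key -> minimum currently emitted for that key
        self.recomputations = 0   # number of full scans of a key's state

    def update(self, key, value, diff):
        """Apply one input difference and return the resulting output differences."""
        bag = self.state.setdefault(key, Counter())
        bag[value] += diff
        if bag[value] == 0:
            del bag[value]

        old = self.output.get(key)
        # Cheap check: the minimum cannot change if the touched value is larger
        # than it, or if the minimum itself is still present after the update.
        if old is not None and (value > old or (value == old and value in bag)):
            return []

        # Fallback: rescan the key's state to find the new minimum.
        self.recomputations += 1
        new = min(bag) if bag else None
        if new == old:
            return []

        diffs = []
        if old is not None:
            diffs.append((key, old, -1))   # retract the previous output
        if new is not None:
            diffs.append((key, new, +1))   # assert the new output
            self.output[key] = new
        else:
            self.output.pop(key, None)
            self.state.pop(key, None)
        return diffs
```

    In this sketch, inserting or deleting a value above the current minimum returns an empty list of output differences without touching `recomputations`, while deleting the last copy of the minimum triggers the rescan.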

    Published In

    GRADES-NDA '24: Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
    June 2024
    62 pages
    ISBN:9798400706530
    DOI:10.1145/3661304
    • Editors:
    • Olaf Hartig,
    • Zoi Kaoudi
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Differential computation
    2. Graph processing
    3. Incremental dataflow computation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGMOD/PODS '24
    Acceptance Rates

    Overall Acceptance Rate 29 of 61 submissions, 48%
