
Space & Time Efficient Leapfrog Triejoin

Published: 09 June 2024

    Abstract

    Leapfrog Triejoin (LTJ) is arguably the most practical and popular worst-case-optimal (wco) algorithm for solving basic graph patterns in graph databases. Its main drawback is that it needs the database triples (subject, predicate, object) represented as paths in a trie, once for each of the six orders of subject, predicate, and object. The resulting space blowup leads most systems to disregard LTJ or to implement it only partially, so that their algorithms are no longer wco. In this paper we show that, by using compact data structures, it is possible to build an index that at the same time matches the query-time performance of the fastest classic wco index and uses half the space of non-wco indices (which are much slower). Concretely, we use compact tree representations to store functional tries using one bit per trie edge, instead of one pointer. The resulting structure, called compactLTJ, uses 25% of the space of classic wco implementations and 45%-65% of the space of classic non-wco systems. When solving queries, it is on par with the fastest classic wco system and two orders of magnitude faster than non-wco systems. We further incorporate improved query resolution strategies into compactLTJ, which make it considerably faster than any alternative at displaying the first query results.
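To make the "one bit per trie edge" idea concrete, here is a minimal Python sketch of one possible level-wise encoding of a depth-3 trie of triples. It is an illustration under assumptions, not the layout used by compactLTJ, which the abstract does not specify: each level stores one label per edge plus one bit per edge marking whether that edge is the first child of its parent, and children are located with a `select1` operation. The names `build_levels`, `select1`, `children`, and `paths` are hypothetical, and `select1` is a linear scan here, whereas real compact data structures answer it in constant time with little extra space.

```python
from itertools import permutations


def build_levels(triples):
    """Encode triples as a depth-3 trie, one (labels, bits) pair per level.

    labels[k] is the label of the k-th edge of the level, left to right;
    bits[k] is 1 iff that edge is the first child of its parent.
    """
    rows = sorted(set(triples))
    levels = []
    for depth in range(3):
        labels, bits = [], []
        prev_prefix = prev_parent = None
        for row in rows:
            prefix, parent = row[:depth + 1], row[:depth]
            if prefix == prev_prefix:
                continue  # same trie edge as the previous row
            labels.append(row[depth])
            bits.append(1 if parent != prev_parent else 0)
            prev_prefix, prev_parent = prefix, parent
        levels.append((labels, bits))
    return levels


def select1(bits, k):
    """Position of the k-th 1 (0-based), or len(bits) if there is none.

    Linear scan for clarity only (an assumption of this sketch).
    """
    seen = -1
    for pos, b in enumerate(bits):
        seen += b
        if b and seen == k:
            return pos
    return len(bits)


def children(levels, depth, k):
    """Edge range [lo, hi) on level depth+1 holding the children of edge k."""
    _, bits = levels[depth + 1]
    return select1(bits, k), select1(bits, k + 1)


def paths(levels):
    """Enumerate all root-to-leaf paths, i.e. recover the stored triples."""
    l0, l1, l2 = (labels for labels, _ in levels)
    out = []
    for i in range(len(l0)):
        for j in range(*children(levels, 0, i)):
            for k in range(*children(levels, 1, j)):
                out.append((l0[i], l1[j], l2[k]))
    return out


# Toy data: integer-coded (subject, predicate, object) triples.
triples = [(1, 2, 3), (1, 2, 4), (1, 5, 3), (2, 2, 3)]
levels = build_levels(triples)

# One trie per attribute order, as LTJ requires: six tries in total.
all_orders = {
    order: build_levels([tuple(t[i] for i in order) for t in triples])
    for order in permutations(range(3))
}
```

In this toy instance the trie has nine edges and the three bit arrays hold nine bits in total, so navigation needs one bit per edge rather than one pointer per edge. The `all_orders` dictionary shows where the space blowup mentioned above comes from: the same triples are indexed six times, once per order.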

    Published In

    GRADES-NDA '24: Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
    June 2024
    62 pages
    ISBN: 9798400706530
    DOI: 10.1145/3661304
    Editors: Olaf Hartig, Zoi Kaoudi
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Leapfrog Triejoin
    2. Worst-case optimal joins
    3. graph databases
    4. graph indexing
    5. graph patterns
    6. nearest-neighbor graphs
    7. similarity joins

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGMOD/PODS '24

    Acceptance Rates

    Overall Acceptance Rate 29 of 61 submissions, 48%
