DOI:10.1145/3623278.3624754
research-article

Supporting Descendants in SIMD-Accelerated JSONPath

Published: 07 February 2024

    Abstract

    Harnessing the power of SIMD can bring tremendous performance gains in data processing. In querying streamed JSON data, the state of the art leverages SIMD to fast-forward over significant portions of the document. However, it does not support the descendant selector, which rules out many real-life queries and makes many others hard to formulate. In this work, we aim to change this: we consider the fragment of JSONPath that supports child, descendant, wildcard, and labels. We propose a modular approach based on novel depth-stack automata that process a stream of events produced by a state-driven classifier, which allows fast-forwarding over parts of the input document that are irrelevant at the current stage of the computation. We implement our solution in Rust and compare it with the state of the art, confirming that our approach supports descendants without sacrificing performance, and that reformulating natural queries using descendants brings impressive performance gains in many cases.
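
    To make the query fragment concrete, the following is a minimal sketch of how child, descendant, wildcard, and label steps could be evaluated over an event stream while keeping a sparse stack of (depth, state) pairs. It is an illustration under stated assumptions, not the authors' implementation: the `Step` and `Event` types, the bitmask state encoding, and the functions `advance`, `count_matches`, and `sample_events` are all hypothetical names introduced here, and the events are supplied as a ready-made slice rather than produced by a SIMD classifier.

```rust
/// One query step: child (`.label`) or descendant (`..label`).
/// A `None` label stands for the wildcard `*`. (Hypothetical type.)
#[derive(Clone, Copy)]
pub enum Step<'a> {
    Child(Option<&'a str>),
    Descendant(Option<&'a str>),
}

/// Simplified event stream (an assumption; a real classifier works on raw bytes).
#[derive(Clone, Copy)]
pub enum Event<'a> {
    /// An object or array opens; the payload is its key in the parent, if any.
    Open(Option<&'a str>),
    /// An atomic value; the payload is its key in the parent, if any.
    Leaf(Option<&'a str>),
    /// The innermost open object or array closes.
    Close,
}

fn label_matches(want: Option<&str>, got: Option<&str>) -> bool {
    match want {
        None => true,              // wildcard: any member or array element
        Some(w) => got == Some(w), // a label matches object keys only
    }
}

/// Advances a bitmask of active query positions over one node.
/// Returns the positions active inside the node and whether the node matches.
/// Assumes the query has fewer than 64 steps.
fn advance(query: &[Step], active: u64, label: Option<&str>) -> (u64, bool) {
    let mut next = 0u64;
    for (i, step) in query.iter().enumerate() {
        if (active & (1u64 << i)) == 0 {
            continue;
        }
        let (want, recursive) = match *step {
            Step::Child(l) => (l, false),
            Step::Descendant(l) => (l, true),
        };
        if recursive {
            next |= 1u64 << i; // a descendant step stays active at deeper levels
        }
        if label_matches(want, label) {
            next |= 1u64 << (i + 1);
        }
    }
    let accept = 1u64 << query.len();
    (next & !accept, (next & accept) != 0)
}

/// Counts the nodes selected by `query` in a well-formed event stream.
pub fn count_matches(query: &[Step], events: &[Event]) -> usize {
    let mut active = 1u64; // only position 0 is active at the root
    let mut depth = 0usize;
    // Sparse stack: an entry is pushed only when the state actually changes.
    let mut stack: Vec<(usize, u64)> = Vec::new();
    let mut matches = 0usize;
    for &event in events {
        match event {
            Event::Open(_) if depth == 0 => depth = 1, // the root is `$` itself
            Event::Leaf(_) if depth == 0 => {}         // atomic root: no members
            Event::Open(label) => {
                let (next, hit) = advance(query, active, label);
                matches += hit as usize;
                depth += 1;
                if next != active {
                    stack.push((depth, active));
                    active = next;
                }
                // When `active == 0` nothing inside this subtree can match,
                // so a real engine could skip ahead to the matching close.
            }
            Event::Leaf(label) => matches += advance(query, active, label).1 as usize,
            Event::Close => {
                if stack.last().map_or(false, |&(d, _)| d == depth) {
                    active = stack.pop().unwrap().1;
                }
                depth -= 1;
            }
        }
    }
    matches
}

/// Events for {"a": {"b": 1, "c": {"b": 2}}, "b": [3, {"b": 4}]}.
pub fn sample_events() -> Vec<Event<'static>> {
    use Event::*;
    vec![
        Open(None),
        Open(Some("a")), Leaf(Some("b")), Open(Some("c")), Leaf(Some("b")), Close, Close,
        Open(Some("b")), Leaf(None), Open(None), Leaf(Some("b")), Close, Close,
        Close,
    ]
}
```

    On the sample stream, `count_matches(&[Step::Descendant(Some("b"))], &sample_events())` evaluates `$..b` and returns 4, while the child-only query `$.a.b`, written as `[Step::Child(Some("a")), Step::Child(Some("b"))]`, returns 1. The descendant form needs no knowledge of where `b` sits in the document, which is the convenience the abstract refers to.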

    Published In

    ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4
    March 2023
    430 pages
    ISBN:9798400703942
    DOI:10.1145/3623278
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. json
    2. jsonpath
    3. simd
    4. query language
    5. data management

    Qualifiers

    • Research-article

    Conference

    ASPLOS '23

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%
