research-article

TraNCE: transforming nested collections efficiently

Published: 01 July 2021

    Abstract

    Nested relational query languages have long been seen as an attractive tool for scenarios involving large hierarchical datasets. There has been a resurgence of interest in nested relational languages. One driver has been the affinity of these languages for large-scale processing platforms such as Spark and Flink.
    This demonstration gives a tour of TraNCE, a new system for processing nested data on top of distributed processing systems. The core innovation of the system is a compiler that processes nested relational queries in a series of transformations; these include variants of two prior techniques, shredding and unnesting, as well as a materialization transformation that customizes the way levels of the nested output are generated. The TraNCE platform builds on these techniques by adding components for users to create and visualize queries, as well as data exploration and notebook execution targets to facilitate the construction of large-scale data science applications. The demonstration will both showcase the system from the viewpoint of usability by data scientists and illustrate the data management techniques employed.
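The two prior techniques named in the abstract can be illustrated with a toy example. The following is a minimal sketch in plain Python, not TraNCE's compiler or its API: the `customers` data and the helper names `shred`, `unshred`, and `unnest` are hypothetical, and it handles only one level of nesting. Shredding is shown as replacing each inner bag with a label and storing the inner rows in a separate flat collection keyed by that label; unnesting is shown as pairing each parent with each of its children. As a simplification, this `unnest` drops parents whose inner bag is empty.

```python
# Hypothetical nested input: each customer carries a bag of orders.
customers = [
    {"name": "alice", "orders": [{"item": "pen", "qty": 2}, {"item": "ink", "qty": 1}]},
    {"name": "bob", "orders": []},
]


def shred(rows, nested_field):
    """Split a one-level nested collection into two flat collections.

    Returns (top, inner): `top` holds the parent rows with the inner bag
    replaced by an integer label; `inner` holds the child rows tagged with
    the label of the bag they came from.
    """
    top, inner = [], []
    for label, row in enumerate(rows):
        flat = {k: v for k, v in row.items() if k != nested_field}
        flat[nested_field] = label  # the label stands in for the inner bag
        top.append(flat)
        for child in row[nested_field]:
            inner.append({"label": label, **child})
    return top, inner


def unshred(top, inner, nested_field):
    """Rebuild the nested collection by grouping inner rows by label."""
    groups = {}
    for child in inner:
        rest = {k: v for k, v in child.items() if k != "label"}
        groups.setdefault(child["label"], []).append(rest)
    return [{**row, nested_field: groups.get(row[nested_field], [])} for row in top]


def unnest(rows, nested_field):
    """Flatten by pairing each parent with each child of its inner bag.

    Parents with an empty bag produce no output rows (a simplification).
    """
    return [
        {**{k: v for k, v in row.items() if k != nested_field}, **child}
        for row in rows
        for child in row[nested_field]
    ]


top, inner = shred(customers, "orders")
roundtrip = unshred(top, inner, "orders")
flat_pairs = unnest(customers, "orders")
```

In this sketch both `top` and `inner` are flat, so each could be processed as an ordinary distributed collection, and the nested form is only rebuilt at the end by `unshred`. How the levels of the output are actually produced in TraNCE is the subject of the materialization transformation mentioned above, which this sketch does not model.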


    Published In

    Proceedings of the VLDB Endowment, Volume 14, Issue 12
    July 2021
    587 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment
