DOI: 10.1145/3629104.3672430

Continuous Data Ingestion and Transformation in Snowflake

Published: 22 July 2024

    Abstract

    Continuous data growth and the need for close-to-now analytics challenge today's data loading patterns for data warehouses. Traditionally, data is ingested into data warehouses with batch-style loading processes. This also used to be the way for cloud-based data platforms, such as Snowflake. Data was chunked into files, which were loaded into cloud storage and registered to a table. Subsequently, derived results were fully recomputed if they were based on data sets with new data. Processing inefficiencies and cost concerns prohibited frequent ingestion and refreshes, leaving many latency-sensitive use cases unaddressed.
    Over the past years, Snowflake has released and improved several features to continuously ingest and transform data. These features ease the definition of declarative data pipelines and enable our users to continuously load huge amounts of data and frequently update derived results in a cost-efficient manner. Snowflake's user interface (UI) provides detailed insights into the operational aspects of data pipelines and supports debugging and failure investigations.
    With this demo, we show how to seamlessly combine Snowflake's data ingestion and transformation features to build sophisticated data pipelines that cost-efficiently process large amounts of incoming data. Our demo scenario resembles the ingestion and transformation steps of an order analytics system of a B2B retail business. We show and explain the declarative definition of the data pipeline, observe its continuous operation, and leverage Snowflake's UI to dig into details such as query plans for incremental result maintenance.
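
    The contrast the abstract draws between fully recomputing derived results and incrementally maintaining them can be illustrated with a minimal Python sketch. This is not Snowflake's implementation: the function names, the in-memory state, and the sample orders are hypothetical, chosen only to show why applying a small change set can replace a scan over all ingested data. Amounts are kept in integer cents, and a row count is tracked per customer so that a group disappears once its last order is deleted.

```python
def full_recompute(orders):
    """Batch style: scan every order and rebuild the aggregate from scratch."""
    state = {}
    for customer, cents in orders:
        total, count = state.get(customer, (0, 0))
        state[customer] = (total + cents, count + 1)
    return state


def apply_delta(state, inserted, deleted=()):
    """Incremental style: touch only the rows that changed since the last refresh."""
    updated = dict(state)
    for sign, rows in ((1, inserted), (-1, deleted)):
        for customer, cents in rows:
            total, count = updated.get(customer, (0, 0))
            updated[customer] = (total + sign * cents, count + sign)
    # Drop groups whose last contributing row was deleted.
    return {c: v for c, v in updated.items() if v[1] > 0}


# Hypothetical order data: (customer, amount in cents).
base = [("acme", 10000), ("globex", 4000), ("acme", 2500)]
totals = full_recompute(base)

# A small change set arriving after the initial load.
new_rows = [("globex", 1000), ("initech", 500)]
removed_rows = [("acme", 2500)]
incremental = apply_delta(totals, new_rows, removed_rows)

# The incremental result matches a full recomputation over the current data.
current = [row for row in base if row not in removed_rows] + new_rows
assert incremental == full_recompute(current)
```

    The work done by `apply_delta` grows with the size of the change set rather than with the size of the base data, which is the property that makes frequent refreshes affordable in this simplified setting.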

    Published In

    DEBS '24: Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems
    June 2024
    239 pages
    ISBN: 9798400704437
    DOI: 10.1145/3629104
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Snowflake
    2. data ingestion
    3. data pipeline
    4. data transformation
    5. data warehouse
    6. dynamic table
    7. incremental maintenance
    8. stream

    Qualifiers

    • Demonstration
    • Research
    • Refereed limited

    Conference

    DEBS '24

    Acceptance Rates

    DEBS '24 Paper Acceptance Rate 15 of 30 submissions, 50%;
    Overall Acceptance Rate 145 of 583 submissions, 25%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 22
    • Downloads (Last 12 months): 22
    • Downloads (Last 6 weeks): 22
    Reflects downloads up to 10 Aug 2024
