
Lazy ETL in action: ETL technology dates scientific data

Published: 01 August 2013

Abstract

Both scientific data and business data have analytical needs. In the scientific setting, analysis typically takes place only after a scientific data warehouse has been eagerly filled with all data from external data sources (repositories). This is similar to the initial loading stage of the Extract, Transform, and Load (ETL) processes that drive business intelligence, so ETL can also help scientific data analysis. However, the initial loading is a time- and resource-consuming operation, and it might not be entirely necessary, e.g., if the user is interested in only a subset of the data.
We propose to demonstrate Lazy ETL, a technique that lowers the cost of initial loading by integrating ETL into the query processing of the scientific data warehouse. For each query, only the required data items are extracted, transformed, and loaded, transparently and on the fly.
The demo is built around concrete implementations of Lazy ETL for seismic data analysis. The seismic data warehouse is ready for query processing without waiting for a long initial load. The audience fires analytical queries and observes the internal mechanisms and modifications that realize each of the steps: lazy extraction, lazy transformation, and lazy loading.
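The idea of extracting, transforming, and loading only what a query touches can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not the system shown in the demo: the `REPOSITORY` dictionary, the `LazyWarehouse` class, the `station,time,sample` record format, and the one-file-per-station naming are all hypothetical and chosen only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-in for an external repository: file name -> raw records.
# The "station,time,sample" text format is illustrative, not a real seismic format.
REPOSITORY: Dict[str, List[str]] = {
    "ISK.txt": ["ISK,0,12", "ISK,1,15"],
    "HGN.txt": ["HGN,0,7", "HGN,1,9", "HGN,2,8"],
    "WIT.txt": ["WIT,0,3"],
}


@dataclass
class LazyWarehouse:
    """Toy warehouse that runs ETL per file, and only when a query needs that file."""

    repository: Dict[str, List[str]]
    loaded: Dict[str, List[dict]] = field(default_factory=dict)  # files already loaded
    extract_calls: List[str] = field(default_factory=list)       # trace of files read

    def _extract(self, name: str) -> List[str]:
        # Lazy extraction: read one file from the repository and record the access.
        self.extract_calls.append(name)
        return self.repository[name]

    @staticmethod
    def _transform(raw: List[str]) -> List[dict]:
        # Lazy transformation: parse raw text records into typed rows.
        rows = []
        for line in raw:
            station, t, sample = line.split(",")
            rows.append({"station": station, "t": int(t), "sample": int(sample)})
        return rows

    def query(self, station: str) -> List[dict]:
        # Assumption: the file needed by a query can be derived from the station name.
        name = f"{station}.txt"
        if name not in self.loaded:
            # Lazy loading: keep the transformed rows so later queries reuse them.
            self.loaded[name] = self._transform(self._extract(name))
        return self.loaded[name]


def average_sample(warehouse: LazyWarehouse, station: str) -> float:
    """Example analytical query: mean sample value for one station."""
    rows = warehouse.query(station)
    return sum(row["sample"] for row in rows) / len(rows)


if __name__ == "__main__":
    wh = LazyWarehouse(REPOSITORY)
    print(average_sample(wh, "HGN"))
    print(average_sample(wh, "HGN"))  # second call is served from already loaded rows
    print(wh.extract_calls)           # only the HGN file was ever read
```

In this sketch the warehouse is usable as soon as it is constructed, and the files for stations that no query mentions are never read. An eager variant would instead loop over every file in the repository before answering the first query.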


Published In

Proceedings of the VLDB Endowment, Volume 6, Issue 12
August 2013
264 pages

Publisher

VLDB Endowment
