DOI: 10.1145/3242153.3242157
research-article

Real-time ETL in Striim

Published: 27 August 2018

Abstract

In the new digital economy, on-demand access to real-time enterprise data is critical to modernizing cross-organizational, cross-partner, and online consumer functions. In addition to on-premise legacy data, enterprises are producing an enormous amount of real-time data through new hybrid cloud applications; these event streams need to be collected, transformed, and analyzed in real time to make critical business decisions. Traditional Extract-Transform-Load (ETL) processes are no longer sufficient and need to be re-architected to account for streaming, heterogeneity, usability, extensibility (custom processing), and continuous validity.
Striim is a novel end-to-end distributed streaming ETL and intelligence platform that enables rapid development and deployment of streaming applications. Striim's real-time ETL engine has been architected from the ground up to enable both business users and developers to build and deploy streaming applications. In this paper, we describe some of the core features of Striim's ETL engine: (i) built-in adapters to extract and load data in real time from legacy and new cloud sources/targets; (ii) an extensible SQL-based transformation engine to transform events, into which users can inject custom logic via a component called Open Processor; (iii) new primitives such as MODIFY, BEFORE, and AFTER; and (iv) built-in data validation that continuously checks whether all data is reaching the destination.
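To make the features listed in the abstract more concrete, the following is a minimal Python sketch of a streaming ETL loop with a pluggable custom-logic step and a delivery check. It is an illustration under stated assumptions, not Striim's API or its SQL-based language: `ChangeEvent`, `mask_email`, `run_pipeline`, and `validate` are hypothetical names, and the before/after row images only loosely mirror the idea behind the BEFORE and AFTER primitives.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Sequence, Set


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event carrying before/after row images."""
    key: int
    before: Optional[dict]  # row image before the change (None for inserts)
    after: Optional[dict]   # row image after the change (None for deletes)


# A custom processor maps an event to an event, or to None to drop it.
# This stands in for user-injected logic; it is not Striim's Open Processor API.
Processor = Callable[[ChangeEvent], Optional[ChangeEvent]]


def mask_email(event: ChangeEvent) -> Optional[ChangeEvent]:
    """Example custom logic: redact the email column in the after image."""
    if event.after is None:
        return event
    return ChangeEvent(event.key, event.before, {**event.after, "email": "***"})


def run_pipeline(source: Iterable[ChangeEvent],
                 processors: Sequence[Processor]) -> Iterator[ChangeEvent]:
    """Apply each processor in order to every event, one event at a time."""
    for event in source:
        out: Optional[ChangeEvent] = event
        for process in processors:
            out = process(out)
            if out is None:  # a processor chose to drop the event
                break
        if out is not None:
            yield out


def validate(source_keys: Iterable[int], target_keys: Iterable[int]) -> Set[int]:
    """Return keys that were extracted but never reached the target."""
    return set(source_keys) - set(target_keys)


# Hypothetical sample data: one insert and one delete.
events = [
    ChangeEvent(1, None, {"name": "Ann", "email": "ann@example.com"}),
    ChangeEvent(2, {"name": "Bob", "email": "bob@example.com"}, None),
]
target = list(run_pipeline(events, [mask_email]))
missing = validate((e.key for e in events), (e.key for e in target))
```

Here `missing` is empty because both events reach the target. A real validator would run continuously against the source and target systems rather than over in-memory lists.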

Published In

BIRTE '18: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics
August 2018
59 pages
ISBN: 9781450366076
DOI: 10.1145/3242153
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • NSF: National Science Foundation
  • Google Inc.
  • Microsoft

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ETL
  2. Real-time
  3. data validation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

BIRTE '18

Acceptance Rates

Overall Acceptance Rate 12 of 21 submissions, 57%

Article Metrics

  • Downloads (last 12 months): 15
  • Downloads (last 6 weeks): 1
Reflects downloads up to 13 Jan 2025

Cited By

  • (2024) Metadata-Driven Cloud-Agnostic Data Integration Framework. 2024 47th MIPRO ICT and Electronics Convention (MIPRO), pages 862-868. DOI: 10.1109/MIPRO60963.2024.10569972. Online publication date: 20-May-2024
  • (2023) A Modular Framework for Data Processing at the Edge: Design and Implementation. Sensors 23:17, article 7662. DOI: 10.3390/s23177662. Online publication date: 4-Sep-2023
  • (2022) Distributed real-time ETL architecture for unstructured big data. Knowledge and Information Systems 64:12, pages 3419-3445. DOI: 10.1007/s10115-022-01757-7. Online publication date: 16-Sep-2022
  • (2022) Dynamic multi-variant relational scheme-based intelligent ETL framework for healthcare management. Soft Computing 27:1, pages 605-614. DOI: 10.1007/s00500-022-06938-8. Online publication date: 21-Mar-2022
  • (2021) Integration of ETL in Cloud Using Spark for Streaming Data. Advanced Techniques for IoT Applications, pages 172-182. DOI: 10.1007/978-981-16-4435-1_18. Online publication date: 3-Aug-2021
  • (2020) Challenges and Solutions for Processing Real-Time Big Data Stream: A Systematic Literature Review. IEEE Access 8, pages 119123-119143. DOI: 10.1109/ACCESS.2020.3005268. Online publication date: 2020
  • (2020) Integration of IoT Streaming Data With Efficient Indexing and Storage Optimization. IEEE Access 8, pages 47456-47467. DOI: 10.1109/ACCESS.2020.2980006. Online publication date: 2020
  • (2020) IoT streaming data integration from multiple sources. Computing. DOI: 10.1007/s00607-020-00830-9. Online publication date: 8-Jul-2020
  • (2019) A Demonstration of Striim A Streaming Integration and Intelligence Platform. Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems, pages 236-239. DOI: 10.1145/3328905.3332519. Online publication date: 24-Jun-2019
  • (2019) ISDI: A New Window-Based Framework for Integrating IoT Streaming Data from Multiple Sources. Advanced Information Networking and Applications, pages 498-511. DOI: 10.1007/978-3-030-15032-7_42. Online publication date: 15-Mar-2019
