DOI: 10.1145/1516360.1516362

Data integration flows for business intelligence

Published: 24 March 2009

Abstract

Business Intelligence (BI) refers to technologies, tools, and practices for collecting, integrating, analyzing, and presenting large volumes of information to enable better decision making. Today's BI architecture typically consists of a data warehouse (or one or more data marts), which consolidates data from several operational databases and serves a variety of front-end querying, reporting, and analytic tools. The back end of the architecture is a data integration pipeline that populates the data warehouse by extracting data from distributed and usually heterogeneous operational sources; cleansing, integrating, and transforming the data; and loading it into the data warehouse. Since BI systems have been used primarily for off-line, strategic decision making, the traditional data integration pipeline is a one-way, batch process, usually implemented by extract-transform-load (ETL) tools. The design and implementation of the ETL pipeline is largely a labor-intensive activity, and typically consumes a large fraction of the effort in data warehousing projects. Increasingly, as enterprises become more automated, data-driven, and real-time, the BI architecture is evolving to support operational decision making. This imposes additional requirements and tradeoffs, resulting in even more complexity in the design of data integration flows. These requirements include reducing latency so that near real-time data can be delivered to the data warehouse, extracting information from a wider variety of data sources, extending the rigidly serial ETL pipeline to more general data flows, and considering alternative physical implementations. We describe the requirements for data integration flows in this next generation of operational BI systems, the limitations of current technologies, the research challenges in meeting these requirements, and a framework for addressing these challenges. The goal is to facilitate the design and implementation of optimal flows to meet business requirements.
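
To make the traditional one-way, batch pipeline described in the abstract concrete, the following is a minimal sketch and not the authors' framework. It assumes two hypothetical operational sources (a CSV export named `CRM_CSV` and a list of billing records named `BILLING_ROWS`), a hypothetical warehouse table `customer_revenue`, and an in-memory SQLite database standing in for the data warehouse. All of these names and the sample data are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical operational sources with different layouts (illustrative only).
CRM_CSV = "cust_id,name,country\n1, Alice ,us\n2,Bob,DE\n,Ghost,fr\n"
BILLING_ROWS = [
    {"customer": 1, "amount_cents": 1250},
    {"customer": 2, "amount_cents": 990},
    {"customer": 1, "amount_cents": 300},
]


def extract():
    """Extract: read raw records from each source without changing them."""
    customers = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return customers, list(BILLING_ROWS)


def transform(customers, payments):
    """Cleanse and integrate: drop rows without a key, normalize text,
    and aggregate payments per customer into one common schema."""
    totals = {}
    for p in payments:
        totals[p["customer"]] = totals.get(p["customer"], 0) + p["amount_cents"]
    rows = []
    for c in customers:
        if not c["cust_id"].strip():
            continue  # cleansing: reject records with a missing key
        cid = int(c["cust_id"])
        rows.append((cid, c["name"].strip(), c["country"].strip().upper(),
                     totals.get(cid, 0) / 100.0))
    return rows


def load(conn, rows):
    """Load: write the integrated rows into the (assumed) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_revenue VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()


def run_batch(conn):
    """Run one batch: extract, then transform, then load, strictly in series."""
    customers, payments = extract()
    rows = transform(customers, payments)
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(run_batch(warehouse), "rows loaded")
    for row in warehouse.execute("SELECT * FROM customer_revenue"):
        print(row)
```

Each call to `run_batch` reprocesses the full source snapshot, which is the batch behavior the abstract contrasts with near real-time delivery. A lower-latency variant would have to process only changed records, and the sketch does not attempt that.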



    Published In

    EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
    March 2009
    1180 pages
    ISBN:9781605584225
    DOI:10.1145/1516360
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ETL
    2. business intelligence
    3. data integration
    4. data warehousing

    Qualifiers

    • Research-article

    Conference

    EDBT/ICDT '09: joint conference
    March 24 - 26, 2009
    Saint Petersburg, Russia

    Acceptance Rates

    Overall Acceptance Rate 7 of 10 submissions, 70%
