Abstract
Data analytics plays a vital role in contemporary organizations: through analytics, organizations derive knowledge and intelligence from data to support strategic decisions. An important step in data analytics is data integration, during which historical data is gathered from various sources and integrated into a centralized repository called a data warehouse. Although there are various approaches to data integration, Extract, Transform and Load (ETL) has become one of the most efficient and popular. Over the decades, ETL has been applied to a wide range of domains, including finance, health and telecommunications. As the popularity and use of ETL grow, it becomes important to analyze and identify trends in the research and practice of ETL. In this paper, we perform a systematic literature review to identify and analyze: (1) the approaches used to implement existing ETL solutions; (2) the quality attributes to be considered when adopting any ETL approach; (3) the depth of coverage in ETL research and practice with regard to application domains, publication frequency and the geographical locations of papers; and (4) the prevailing challenges in developing ETL solutions. Furthermore, we discuss the implications of our findings for ETL researchers and practitioners.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nwokeji, J.C., Matovu, R. (2021). A Systematic Literature Review on Big Data Extraction, Transformation and Loading (ETL). In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 284. Springer, Cham. https://doi.org/10.1007/978-3-030-80126-7_24
DOI: https://doi.org/10.1007/978-3-030-80126-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80125-0
Online ISBN: 978-3-030-80126-7
eBook Packages: Intelligent Technologies and Robotics (R0)