Conference Paper
Data Lake Architecture for a Banking Data Model
Suggested Citation: Golec, Darko (2019) : Data Lake Architecture for a Banking Data Model,
In: Proceedings of the ENTRENOVA - ENTerprise REsearch InNOVAtion Conference, Rovinj,
Croatia, 12-14 September 2019, IRENET - Society for Advancing Innovation and Research in
Economy, Zagreb, Vol. 5, pp. 144-148
Documents in EconStor may be saved and copied for your personal and scholarly
purposes. You are not to copy documents for public or commercial purposes, to
exhibit the documents publicly, to make them publicly available on the internet,
or to distribute or otherwise use the documents in public.
If the documents have been made available under an Open Content Licence
(especially Creative Commons Licences), you may exercise further usage rights as
specified in the indicated licence.
https://creativecommons.org/licenses/by-nc/4.0/
ENTRENOVA 12-14, September 2019 Rovinj, Croatia
Abstract
Industry models offer an opportunity to accelerate development by building on the
best practices and standards they embody. One such model is a banking model for
the data warehouse. Traditional data warehousing technologies are based on
relational database engines, data consistency and high normalization, but more
recently the data lake has attracted growing interest. The main advantages of the
data lake landscape are commodity hardware, open source technologies with
cost-free software, and elastic scalability. In this paper we present how a data
lake can be used in addition to a data warehouse. The aim of the paper is to
present a possible data lake architecture for the banking industry model that is
under consideration at an international banking company.
Introduction
Banking Data Warehouse is a family of business and technical models that accelerate
the design of enterprise vocabularies, data warehouses, data lakes, and analytics
solutions, driven by financial-services business requirements (IBM Ireland, 2006).
Making better decisions faster can make the difference between surviving and
thriving in an increasingly competitive marketplace. The financial services industry
needs to respond to challenges such as globalization, deregulation and customer
expectations.
This paper describes a possible end-to-end architecture for a banking data
model. Tools are not covered. The architecture is based on popular trends, such as
scalability, performance, distribution and open source.
The research methodology is a literature review. The research question is: what
does a reference architecture for a data lake look like?
Industry Models
Industry models are a comprehensive set of predesigned models that form the
basis of a business and software solution. Industry models (Figure 1) consist of a set of
industry-specific integrated models that are optimized for business challenges in a
Figure 1
Industry Models
Architecture Consideration
This section presents the reasons for using a data warehouse as well as the reasons
for using a data lake in banking. Moreover, reference architectures and the
importance of requirements are described.
Data Warehouse
A data warehouse model consists of an atomic model and a dimensional model. The
atomic model is used for enterprise data, while the dimensional model is used for
data marts. Figure 2 depicts two important layers:
o Atomic warehouse model – used as the basis for the Inmon-style central
relational data warehouse deployment.
o Dimensional warehouse model – used as the basis for the Kimball-style
relational data warehouse deployment.
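The relationship between the two layers can be illustrated with a minimal sketch. It is not taken from the banking model itself: the table names (customer, account, txn, dim_customer, fact_daily_amount), the sample rows and the use of an in-memory SQLite database are assumptions made only for illustration. The atomic layer holds normalized entities, and the dimensional layer is derived from it as a small star schema for a data mart.

```python
import sqlite3


def build_demo():
    """Create a toy atomic layer and derive a dimensional layer from it."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Atomic (normalized) layer: one table per business entity.
    # All table and column names here are hypothetical.
    cur.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT);
        CREATE TABLE account (
            account_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            product TEXT);
        CREATE TABLE txn (
            txn_id INTEGER PRIMARY KEY,
            account_id INTEGER REFERENCES account(account_id),
            txn_date TEXT, amount REAL);
    """)
    cur.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                    [(1, "Ana", "retail"), (2, "Ivan", "corporate")])
    cur.executemany("INSERT INTO account VALUES (?, ?, ?)",
                    [(10, 1, "savings"), (20, 2, "loan")])
    cur.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)",
                    [(100, 10, "2019-09-12", 50.0),
                     (101, 10, "2019-09-13", 25.0),
                     (102, 20, "2019-09-12", 1000.0)])

    # Dimensional layer: a dimension table and an aggregated fact table,
    # both populated from the atomic layer.
    cur.executescript("""
        CREATE TABLE dim_customer AS
            SELECT customer_id, name, segment FROM customer;
        CREATE TABLE fact_daily_amount AS
            SELECT a.customer_id, t.txn_date, SUM(t.amount) AS total_amount
            FROM txn t JOIN account a ON a.account_id = t.account_id
            GROUP BY a.customer_id, t.txn_date;
    """)
    conn.commit()
    return conn


def total_by_segment(conn):
    """Typical data mart query: join the fact table to one dimension."""
    rows = conn.execute("""
        SELECT d.segment, SUM(f.total_amount)
        FROM fact_daily_amount f
        JOIN dim_customer d ON d.customer_id = f.customer_id
        GROUP BY d.segment
    """).fetchall()
    return dict(rows)
```

In this sketch the reporting query touches only the dimensional tables, while the atomic tables remain the single normalized source from which the data mart is rebuilt.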
Figure 2
Reference Architecture for a Data Warehouse
Data Lake
At the core of the data lake is a set of repositories, which can range from
traditional RDBMS information warehouses to operational data hubs to HDFS clusters.
An architecture for a data lake is shown in Figure 3. Typically, components are
design-time artifacts and are used to underpin the related development activities.
The critical data lake components in relation to the banking model are:
o Catalog – Business Term content
o Deep data – historical data from the systems of record
o Sandboxes – stores of data for experimentation purposes
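How the three components relate to each other can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the architecture of any product: the class DataLake, its method names, and the example business term and dataset name are all hypothetical. The catalog maps business terms to datasets, deep data keeps the historical records, and a sandbox receives a copy that can be modified freely.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    # Catalog: business term -> name of the dataset that holds it.
    catalog: dict = field(default_factory=dict)
    # Deep data: dataset name -> historical records from systems of record.
    deep_data: dict = field(default_factory=dict)
    # Sandboxes: sandbox name -> {dataset name -> copied records}.
    sandboxes: dict = field(default_factory=dict)

    def ingest(self, dataset, records, business_term):
        """Append records to deep data and register the term in the catalog."""
        self.deep_data.setdefault(dataset, []).extend(records)
        self.catalog[business_term] = dataset

    def lookup(self, business_term):
        """Resolve a business term to its historical records via the catalog."""
        return self.deep_data[self.catalog[business_term]]

    def copy_to_sandbox(self, sandbox, business_term):
        """Copy records into a sandbox so experiments cannot alter deep data."""
        dataset = self.catalog[business_term]
        copied = [dict(record) for record in self.deep_data[dataset]]
        self.sandboxes.setdefault(sandbox, {})[dataset] = copied
        return copied
```

A usage example under the same assumptions: after `lake.ingest("core_banking.arrangement", [{"id": 1, "balance": 100.0}], "Arrangement")`, a call to `lake.copy_to_sandbox("churn_study", "Arrangement")` returns records that an analyst can change without affecting what `lake.lookup("Arrangement")` returns.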
Figure 3
Reference Architecture for a Data Lake
Figure 4
Coexistence of Data Lake and Data Warehouse
for business users can be consumed with Consumption and Delivery in which all of the
zone areas are accessible.
Figure 5
Data Lake Architecture for a Banking Data Model
Conclusion
As described in this paper, banks are interested in both data warehouse and data
lake implementations. The paper has described use cases and requirements for which
a data lake can be better than a data warehouse. Both have advantages and
drawbacks, hence they cannot replace each other, but rather coexist as
complementary and harmonized solutions. As its main goal, the paper has presented a
data lake architecture for a banking data model based on several zone areas.
References
1. Awadallah, A., Graham, D. (2011), “Hadoop and the Data Warehouse: When to Use
Which”, available at: marketing.teradata.com/When-to-Use-Hadoop (05 April 2019).
2. O’Brien, H. (2015), Agile Project Management: A Quick Start Beginner’s Guide To
Mastering Agile Project Management, CreateSpace Publishing.
3. Clifford, A., Murphy, D., Fritzsimons, G., Meehan, P., O’Suilleabhain, R., Abed, S. (2012),
Best Practices, Transforming IBM Industry Models into a production data warehouse.
4. IBM (2006), IBM Industry Models for Financial Services, The Information FrameWork (IFW)
Overview.
5. Documentation from the project (International banking company).