DM104 - Evaluation of Business Performance
Welcome Notes:
WELCOME BSIS STUDENTS!
I. INTRODUCTION:
This module discusses the three types of data warehousing architecture. Data warehouse architecture
refers to the design of an organization's data collection and storage framework. This module will guide you
to fully understand the three architectures of data warehousing and the different layers working within every
structure.
II. OBJECTIVES:
Before you proceed to the main lesson, test yourself in this activity.
GREAT!!!
You may now proceed to the main lesson.
Based on the preliminary activities, what did you notice about it?
________________________________________________________
CONGRATULATIONS!
You may now proceed to the lesson.
A data warehouse architecture defines the overall arrangement of data communication, processing, and
presentation that exists for end-user computing within the enterprise. Each data warehouse is different,
but all are characterized by standard vital components.
Data Warehouse Architecture
Production applications such as payroll, accounts payable, product purchasing, and inventory control are
designed for online transaction processing (OLTP). Such applications gather detailed data from day-to-day
operations.
Data warehouse applications are designed to support users' ad-hoc data requirements, an activity
known as online analytical processing (OLAP). These include applications such as forecasting,
profiling, summary reporting, and trend analysis.
Production databases are updated continuously, either by hand or via OLTP applications. In contrast,
a warehouse database is updated from operational systems periodically, usually during off-hours. As OLTP
data accumulates in production databases, it is regularly extracted, filtered, and then loaded into a dedicated
warehouse server that is accessible to users. As the warehouse is populated, it must be restructured: tables
are de-normalized, data is cleansed of errors and redundancies, and new fields and keys are added to reflect
the needs of the user for sorting, combining, and summarizing data. Data warehouses and their architectures
vary depending upon the elements of an organization's situation.
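The extract, filter, and load cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables, field names, and values are invented for the example, and a real warehouse load would use a database and an ETL tool rather than in-memory lists.

```python
# Hypothetical operational (OLTP) data; all names and values are invented.
customers = {
    1: {"name": "Ana", "city": "Cebu"},
    2: {"name": "Ben", "city": "Davao"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0, "date": "2024-01-05"},
    {"order_id": 101, "customer_id": 2, "amount": None, "date": "2024-01-05"},  # error: no amount
    {"order_id": 102, "customer_id": 1, "amount": 80.0, "date": "2024-02-10"},
    {"order_id": 102, "customer_id": 1, "amount": 80.0, "date": "2024-02-10"},  # redundant copy
]


def load_warehouse(orders, customers):
    """Extract, filter, and de-normalize order records for the warehouse."""
    seen_ids = set()
    rows = []
    for order in orders:
        if order["amount"] is None:        # cleanse: skip records with errors
            continue
        if order["order_id"] in seen_ids:  # cleanse: skip redundant records
            continue
        seen_ids.add(order["order_id"])
        customer = customers[order["customer_id"]]
        rows.append({
            "order_id": order["order_id"],
            "customer_name": customer["name"],  # de-normalized from customers
            "customer_city": customer["city"],
            "amount": order["amount"],
            "month": order["date"][:7],         # new field added for summarizing
        })
    return rows


warehouse_rows = load_warehouse(orders, customers)
for row in warehouse_rows:
    print(row)
```

Each warehouse row now carries the customer details and a month field directly, so sorting, combining, and summarizing no longer require a join against the operational tables.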
Three common architectures are:
Data Warehouse Architecture: Basic
Data Warehouse Architecture: With Staging Area
Data Warehouse Architecture: With Staging Area and Data Marts
Operational System
An operational system is a term used in data warehousing to refer to a system that processes the
day-to-day transactions of an organization.
Flat Files
A flat file system is a system of files in which transactional data is stored, and every file in the system
must have a different name.
Meta Data
Metadata is a set of data that defines and gives information about other data. Metadata summarizes
necessary information about data, which makes finding and working with particular instances of data
easier. For example, author, date created, date modified, and file size are examples of very basic
document metadata. Metadata is used to direct a query to the most appropriate data source.
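As a rough illustration of how metadata can direct a query to the most appropriate data source, the sketch below keeps a small catalog describing each table and picks the most summarized table that can still answer a request. The catalog entries, table names, and the `choose_source` function are all hypothetical.

```python
# Hypothetical metadata catalog: each entry describes a table, not its contents.
GRAIN_ORDER = {"day": 0, "month": 1, "year": 2}

catalog = [
    {"table": "sales_detail", "grain": "day", "rows": 1_000_000},
    {"table": "sales_monthly", "grain": "month", "rows": 36},
    {"table": "sales_yearly", "grain": "year", "rows": 3},
]


def choose_source(requested_grain):
    """Return the coarsest table that is still detailed enough for the query."""
    needed = GRAIN_ORDER[requested_grain]
    # A table can answer the query only if its grain is at least as fine.
    candidates = [e for e in catalog if GRAIN_ORDER[e["grain"]] <= needed]
    best = max(candidates, key=lambda e: GRAIN_ORDER[e["grain"]])
    return best["table"]


print(choose_source("day"))    # needs daily detail
print(choose_source("month"))  # a monthly summary is enough
```

The query itself never inspects the data to make this choice; it relies only on the descriptions held in the catalog.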
Lightly and highly summarized data
This area of the data warehouse stores all the predefined lightly and highly summarized (aggregated)
data generated by the warehouse manager. The goal of the summarized information is to speed up
query performance. The summarized data is updated continuously as new information is loaded into
the warehouse.
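A minimal sketch of the idea, assuming that "lightly summarized" means daily totals and "highly summarized" means monthly totals (the module does not fix these levels, and the sample figures are invented):

```python
from collections import defaultdict

# Hypothetical detail records: (date, sale amount).
detail = [
    ("2024-01-05", 250.0),
    ("2024-01-05", 50.0),
    ("2024-01-20", 100.0),
    ("2024-02-10", 80.0),
]


def summarize(rows, key_length):
    """Total the amounts, grouping by the first key_length characters of the date."""
    totals = defaultdict(float)
    for date, amount in rows:
        totals[date[:key_length]] += amount
    return dict(totals)


daily_totals = summarize(detail, 10)   # lightly summarized: one total per day
monthly_totals = summarize(detail, 7)  # highly summarized: one total per month

print(daily_totals)
print(monthly_totals)
```

A report that needs monthly sales can read the two monthly totals instead of scanning every detail record, which is where the gain in query performance comes from.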
End-User access Tools
The principal purpose of a data warehouse is to provide information to business managers for
strategic decision-making. These users interact with the warehouse using end-user access tools.
Examples of end-user access tools include:
- Reporting and Query Tools
- Application Development Tools
- Executive Information Systems Tools
- Online Analytical Processing Tools
- Data Mining Tools
A staging area simplifies data cleansing and consolidation for operational data coming from multiple
source systems, especially for enterprise data warehouses where all relevant data of an enterprise is
consolidated.
A data warehouse staging area is a temporary location to which records from source systems are copied.
The figure illustrates an example where purchasing, sales, and stocks are separated. In this example, a
financial analyst wants to analyze historical data for purchases and sales or mine historical information to
make predictions about customer behavior.
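The separation of purchasing, sales, and stocks can be pictured as splitting the warehouse into one data mart per subject. The sketch below is a simplified illustration; the record layout, the `subject` field, and the amounts are assumptions made for the example.

```python
# Hypothetical warehouse records, each tagged with the subject it belongs to.
warehouse = [
    {"subject": "purchasing", "item": "paper", "amount": 500.0},
    {"subject": "sales", "item": "notebook", "amount": 1200.0},
    {"subject": "stocks", "item": "notebook", "amount": 40.0},
    {"subject": "sales", "item": "pen", "amount": 300.0},
]


def build_data_marts(rows):
    """Group warehouse rows into one data mart per subject."""
    marts = {}
    for row in rows:
        marts.setdefault(row["subject"], []).append(row)
    return marts


data_marts = build_data_marts(warehouse)

# The analyst works only with the mart relevant to the question at hand.
sales_total = sum(row["amount"] for row in data_marts["sales"])
print(sorted(data_marts))
print(sales_total)
```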
1. Separation: Analytical and transactional processing should be kept apart as much as possible.
2. Scalability: Hardware and software architectures should be simple to upgrade as the data volume
that has to be managed and processed, and the number of user requirements that have to be
met, progressively increase.
3. Extensibility: The architecture should be able to accommodate new operations and technologies without
redesigning the whole system.
4. Security: Monitoring access is necessary because of the strategic data stored in data
warehouses.
5. Administrability: Data warehouse management should not be complicated.
Single-Tier Architecture
Single-tier architecture is not frequently used in practice. Its purpose is to minimize the amount of data
stored; to reach this goal, it removes data redundancies. The figure shows that the only layer physically
available is the source layer. In this method, data warehouses are virtual. This means that the data
warehouse is implemented as a multidimensional view of operational data created by specific middleware, or
an intermediate processing layer.
The weakness of this architecture lies in its failure to meet the requirement for separation between
analytical and transactional processing. Analysis queries are submitted to operational data after the
middleware interprets them. In this way, queries affect transactional workloads.
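A virtual warehouse of this kind can be approximated with an ordinary SQL view over an operational table. The sketch below uses Python's built-in `sqlite3` module; the table, view, and column names are invented, and a plain view is only a stand-in for the multidimensional view that real middleware would provide.

```python
import sqlite3

# Source layer: a hypothetical operational table held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 100.0), ("South", 40.0), ("North", 60.0)],
)

# The "warehouse" is only a view: no analytical data is stored separately.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)


def region_totals():
    """Run the analytical query; it is evaluated against the operational table."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region").fetchall())


before = region_totals()
conn.execute("INSERT INTO orders (region, amount) VALUES ('South', 10.0)")  # new transaction
after = region_totals()

print(before)
print(after)
```

Because the view is recomputed from `orders` on every call, nothing is duplicated, but each analysis query also runs directly on the transactional table. That is the lack of separation described above.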
Two-Tier Architecture
The requirement for separation plays an essential role in defining the two-tier architecture for a data
warehouse system, as shown in figure:
Although it is typically called a two-layer architecture to highlight the separation between physically
available sources and data warehouses, it in fact consists of four successive data flow stages:
1. Source layer: A data warehouse system uses heterogeneous sources of data. That data is stored
initially in corporate relational databases or legacy databases, or it may come from an information
system outside the corporate walls.
2. Data Staging: The data stored in the sources should be extracted, cleansed to remove inconsistencies
and fill gaps, and integrated to merge heterogeneous sources into one standard schema. The so-named
Extraction, Transformation, and Loading (ETL) tools can combine heterogeneous schemata and extract,
transform, cleanse, validate, filter, and load source data into a data warehouse.
3. Data Warehouse layer: Information is saved to one logically centralized individual repository: a data
warehouse. The data warehouse can be directly accessed, but it can also be used as a source for
creating data marts, which partially replicate data warehouse contents and are designed for specific
enterprise departments. Metadata repositories store information on sources, access procedures,
data staging, users, data mart schemas, and so on.
4. Analysis: In this layer, integrated data is efficiently and flexibly accessed to issue reports,
dynamically analyze information, and simulate hypothetical business scenarios. It should feature
aggregate information navigators, complex query optimizers, and user-friendly GUIs.
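The four stages can be traced end to end in a short sketch. Everything here is an assumption made for illustration: two invented source systems that disagree on field names, capitalization, and units, a staging function that maps both onto one standard schema, a list standing in for the warehouse, and a small report as the analysis step.

```python
# 1. Source layer: two hypothetical systems with different layouts.
crm_rows = [
    {"cust": "Ana", "total_php": "250.00"},
    {"cust": "Ben", "total_php": ""},          # incomplete record
]
legacy_rows = [("ANA", 8000), ("CARL", 12050)]  # (name, amount in centavos)


# 2. Data staging: extract, cleanse, and integrate into one standard schema.
def stage(crm_rows, legacy_rows):
    staged = []
    for row in crm_rows:
        if not row["total_php"]:  # cleanse: drop records with missing amounts
            continue
        staged.append({"customer": row["cust"].title(),
                       "amount": float(row["total_php"]),
                       "source": "crm"})
    for name, centavos in legacy_rows:
        staged.append({"customer": name.title(),  # standardize capitalization
                       "amount": centavos / 100,  # standardize units to pesos
                       "source": "legacy"})
    return staged


# 3. Data warehouse layer: one logically centralized repository.
warehouse = stage(crm_rows, legacy_rows)


# 4. Analysis: report total amount per customer across all sources.
def total_by_customer(rows):
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


report = total_by_customer(warehouse)
print(report)
```

Only after staging can the records for "Ana" and "ANA" be recognized as the same customer and added together, which is the point of integrating heterogeneous sources into one schema before loading.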
Three-Tier Architecture
The three-tier architecture consists of the source layer (containing multiple source systems), the
reconciled layer, and the data warehouse layer (containing both data warehouses and data marts). The
reconciled layer sits between the source data and the data warehouse.
The main advantage of the reconciled layer is that it creates a standard reference data model for the
whole enterprise. At the same time, it separates the problems of source data extraction and integration
from those of data warehouse population. In some cases, the reconciled layer is also used directly to
better accomplish some operational tasks, such as producing daily reports that cannot be satisfactorily
prepared using the corporate applications, or generating data flows that periodically feed external
processes so that they benefit from cleaning and integration.
This architecture is especially useful for extensive, enterprise-wide systems. A disadvantage of this
structure is the extra storage space used by the redundant reconciled layer. It also moves the
analytical tools a little further away from being real-time.
1. Bottom Tier
The Bottom Tier in the three-tier architecture of a data warehouse consists of the Data Repository.
The Data Repository is the storage space for the data extracted from various data sources, which undergoes
a series of activities as part of the ETL process. ETL stands for Extract, Transform, and Load. As a
preliminary step, before the data is loaded into the repository, all the relevant and required data is
identified from the several sources of the system. This data is then cleaned up to remove repeated or junk
data carried over from its current storage units. The next step is to transform all this data into a single
storage format. The final step of ETL is to load the data into the repository.
2. Middle Tier
The Middle tier here is the tier with the OLAP servers. The Data Warehouse can have more than
one OLAP server, and it can have more than one type of OLAP server model as well, which depends on
the volume of the data to be processed and the type of data held in the bottom tier.
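To give a feel for what an OLAP server in the middle tier does, the sketch below performs two common OLAP operations, roll-up and slice, on a tiny set of sales facts. The facts, dimension names, and function names are invented for the example; a real OLAP server works on far larger data and with dedicated storage models.

```python
from collections import defaultdict

# Hypothetical facts from the bottom tier: three dimensions and one measure.
facts = [
    {"region": "North", "product": "A", "quarter": "Q1", "sales": 100},
    {"region": "North", "product": "B", "quarter": "Q1", "sales": 50},
    {"region": "South", "product": "A", "quarter": "Q1", "sales": 70},
    {"region": "North", "product": "A", "quarter": "Q2", "sales": 120},
    {"region": "South", "product": "B", "quarter": "Q2", "sales": 30},
]


def roll_up(rows, dimensions):
    """Aggregate sales over every dimension except the ones listed."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["sales"]
    return dict(totals)


def slice_facts(rows, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    return [row for row in rows if row[dimension] == value]


sales_by_region = roll_up(facts, ["region"])
q1_sales_by_product = roll_up(slice_facts(facts, "quarter", "Q1"), ["product"])

print(sales_by_region)
print(q1_sales_by_product)
```

The top tier's reporting and analysis tools would issue requests like these and present the results, without having to know how the facts are stored in the bottom tier.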
3. Top Tier
The Top Tier is a front-end layer, that is, the user interface that allows the user to connect with the
database systems. This user interface is usually a tool or an API call, which is used to fetch the required
data for Reporting, Analysis, and Data Mining purposes. The type of tool depends purely on the form of
outcome expected. It could be a Reporting tool, an Analysis tool, a Query tool or a Data mining tool.
1. Data warehouse __________ refers to the design of an organization's data collection and storage
framework.
2. __________ applications such as payroll, accounts payable, product purchasing, and inventory control are
designed for online transaction processing (OLTP).
3. __________ system is a term used in data warehousing to refer to a system that processes
the day-to-day transactions of an organization.
4. __________ is data that defines and gives information about other data.
5. __________ is a temporary location where a record from source systems is copied.
6. __________ is a segment of a data warehouse that can provide information for reporting and analysis.
7. __________ architecture consists of the source, the reconciled layer and the data warehouse layer.
8. __________ here is the tier with the OLAP servers.
9. __________ architecture is used to minimize the amount of data stored; to reach this goal, it removes
data redundancies.
10. __________ system is a system of files in which transactional data is stored, and every file in the system
must have a different name.
VI. GENERALIZATION
Middle Tier
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Top Tier
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
KUDOS!
You have come to an end of Module 2.
OOPS! Don’t forget that you have still an assignment to do.
Here it is….
VII. ASSIGNMENT
After your long journey of reading and accomplishing the module, let us now
challenge your mind by answering the evaluation part of this module.
VIII. EVALUATION