Data Warehouse
INTRODUCTION
Data Warehouse Ralph Kimball
Def: -
DECISION SUPPORT SYSTEM (DSS): Since a DWH is designed to support the decision-making process,
it is known as a Decision Support System.
HISTORICAL DATABASE: -
READ ONLY DATABASE: Since the database is designed only for querying the data for analysis,
and not for transactional processing, it is called a Read Only Database.
DATA ACQUISITION: It is the process of extracting the data from multiple OLTP source
systems, integrating the data into a homogeneous format, and loading it into the
Data Warehouse.
There are two types of ETL used to build data acquisition:
i) Code-based ETL
ii) GUI-based ETL
CODE-BASED ETL: An ETL application developed using programming languages
such as SQL and PL/SQL.
Ex: - SAS Base, SAS Access, Teradata ETL utilities.
DATA EXTRACTION: It is the process of reading the data from multiple OLTP source
systems. The following are the different source systems:
I. Mainframes
II. Oracle Applications
III. SAP
IV. PeopleSoft
V. XML Files
VI. Flat Files
DATA TRANSFORMATION: It is the process of converting the data into the required business
format.
DATA SCRUBBING: It is the process of deriving new data which is not available in the
source.
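As a minimal sketch of data scrubbing, the example below derives a new attribute that the source does not carry. The record layout and the names `source_rows`, `scrub`, and `revenue` are assumptions made for illustration only.

```python
# Hypothetical source records; the column names are illustrative only.
source_rows = [
    {"order_id": 1, "quantity": 3, "unit_price": 20.0},
    {"order_id": 2, "quantity": 1, "unit_price": 99.5},
]


def scrub(row):
    """Return a copy of the row with a derived 'revenue' attribute."""
    derived = dict(row)  # copy so the source record is left unchanged
    derived["revenue"] = row["quantity"] * row["unit_price"]
    return derived


# 'revenue' exists only in the scrubbed output, not in the source.
scrubbed_rows = [scrub(r) for r in source_rows]
```

The source rows are copied rather than modified, so the derived attribute appears only on the way to the target.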
DATA MERGING: It is the process of integrating the data records from multiple sources.
I. Vertical Merging
II. Horizontal Merging
VERTICAL MERGING: It is the process of integrating the data records from sources with
similar definitions (the same structure).
HORIZONTAL MERGING: It is the process of integrating the data records horizontally using
the process called JOIN (based on common column values).
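The two kinds of merging can be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: the sample data and the helper names `vertical_merge` and `horizontal_merge` are hypothetical, and the join shown is an inner join on one common column.

```python
# Two sources with the same structure (hypothetical regional extracts).
customers_east = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ben"}]
customers_west = [{"cust_id": 3, "name": "Chen"}]

# A second source that shares the cust_id column with the customers.
orders = [{"cust_id": 1, "amount": 250}, {"cust_id": 3, "amount": 75}]


def vertical_merge(*sources):
    """Stack records from sources with the same structure (a union)."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def horizontal_merge(left, right, key):
    """Join records side by side on a common column (an inner join)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


all_customers = vertical_merge(customers_east, customers_west)
customer_orders = horizontal_merge(all_customers, orders, "cust_id")
```

Vertical merging increases the number of rows, while horizontal merging increases the number of columns per row.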
DATA AGGREGATION: -
DATA LOADING: It is the process of inserting the data into the target system. There are two
types of loads:
I. Initial Load
II. Incremental Load
INITIAL LOAD: It is the process of inserting the data into an empty target table.
INCREMENTAL LOAD: It is the process of loading only the new records after the initial load.
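A minimal sketch of the two load types, assuming the target is an in-memory list and that new records are recognized by a key column named `id`. Both are assumptions for illustration, and the function names are hypothetical.

```python
def initial_load(target, source_rows):
    """Insert all source rows into an empty target table."""
    if target:
        raise ValueError("initial load expects an empty target")
    target.extend(source_rows)
    return len(source_rows)


def incremental_load(target, source_rows, key="id"):
    """Insert only the rows whose key is not yet present in the target."""
    existing = {row[key] for row in target}
    new_rows = [row for row in source_rows if row[key] not in existing]
    target.extend(new_rows)
    return len(new_rows)


target_table = []
day1 = [{"id": 1, "item": "pen"}, {"id": 2, "item": "ink"}]
day2 = day1 + [{"id": 3, "item": "pad"}]  # one new record since day 1

loaded_initial = initial_load(target_table, day1)
loaded_incremental = incremental_load(target_table, day2)
```

After both runs the target holds three rows, and the second run has inserted only the record that was not already loaded.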
ETL CLIENT SERVER TECHNOLOGY
ETL CLIENT: An ETL client is a graphical application which allows the user to
design the plan of the ETL process.
An ETL plan is designed with the following components:
i) Source definition
ii) Target definition
ETL REPOSITORY: -
DWH
Historical data
Summarized data
Denormalized data
Normalized data
Bus Schema)
A. Initial Load
B. Incremental Load
SNOWFLAKE SCHEMA: A very large dimension table is split into one (or) more smaller dimension
tables, which reduces the table space considerably.
It can improve the query performance.
Disadvantage: as the number of tables increases, the number of joins
increases, and as a result the query performance may also degrade.
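The trade-off can be sketched with an in-memory SQLite database. The table and column names (`dim_product`, `dim_category`, `fact_sales`) and the sample rows are assumptions for illustration. The category attributes are split out of the product dimension, so a query by category needs one more join than it would against a single wide dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attributes are split out of the product dimension.
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity   INTEGER
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Electronics');
INSERT INTO dim_product VALUES (10, 'Pen', 1), (11, 'Notebook', 1), (12, 'Mouse', 2);
INSERT INTO fact_sales VALUES (10, 5), (11, 2), (12, 1), (10, 3);
""")

# Two joins are needed to reach the category name from the fact table.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
conn.close()
```

Each category name is stored once in `dim_category` instead of being repeated on every product row, which is where the space saving comes from.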
Note: -***
GALAXY SCHEMA
FACT CONSTELLATION: It is the process of joining two FACT tables from multiple schemas.
CONFORMED DIMENSIONS: A dimension table which is shared by multiple FACT tables is known
as a Conformed Dimension.
Ex: - Customer and Time.
JUNK DIMENSION: Dimensions with type flags (0 or 1) and Boolean values (YES or NO) which are
not used to describe the Key Performance Indicators are known as Junk
Dimensions.
Ex: - Gender_Flag, Product_Promotion_Flag.
DIRTY DIMENSION: A dimension table in which a record exists more than once with a
change in a non-key attribute is known as a Dirty Dimension.
I. TYPE1 DIMENSION
II. TYPE2 DIMENSION
III. TYPE3 DIMENSION
TYPE1 DIMENSION: A type1 dimension stores only current changes in the target. It does
not store history.
TYPE2 DIMENSION: A type2 dimension stores complete historical data in the target.
For each update in the OLTP it inserts a new record in the target. Each
record is identified by a surrogate key, a system generated sequence
number that is defined as the Primary Key.
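The difference between the first two types can be sketched as below. This is a simplified illustration under assumptions: the dimension is an in-memory list, `cust_id` stands for the key coming from the OLTP system, `cust_key` stands for the surrogate key, and the function names are hypothetical.

```python
import itertools


def apply_type1(dimension, cust_id, changes):
    """Type 1: overwrite the existing record, so no history is kept."""
    for row in dimension:
        if row["cust_id"] == cust_id:
            row.update(changes)
            return
    dimension.append({"cust_id": cust_id, **changes})


def apply_type2(dimension, cust_id, changes, surrogate_keys):
    """Type 2: insert a new record per update, keyed by a surrogate key."""
    dimension.append(
        {"cust_key": next(surrogate_keys), "cust_id": cust_id, **changes}
    )


type1_dim = []
apply_type1(type1_dim, 101, {"city": "Pune"})
apply_type1(type1_dim, 101, {"city": "Delhi"})  # replaces 'Pune'

type2_dim = []
keys = itertools.count(1)  # system generated sequence numbers
apply_type2(type2_dim, 101, {"city": "Pune"}, keys)
apply_type2(type2_dim, 101, {"city": "Delhi"}, keys)  # 'Pune' row is kept
```

The same `cust_id` appears twice in the type 2 table, which is why the OLTP key cannot serve as the primary key there and a surrogate key is needed.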
TYPE3 DIMENSION: -
TYPES OF FACT TABLES
Detailed Fact Table: A FACT table which contains details of the transactions is known as
a Detailed Fact Table.
Summarized Fact Table: A FACT table which contains aggregate facts is known as a Summarized
FACT table.
Fact-Less Fact Table: A FACT table without any FACTS is known as FACT-Less FACT Table.
TYPES OF FACTS: There are three types of FACTS that a fact table can have.
Additive FACTS: A FACT which can be summarized for all the dimensions is known as
Additive FACT.
Ex: - Quantity, Revenue
Semi-Additive FACTS: A FACT which can be summarized for a few dimensions but not for
all the dimensions is known as Semi-Additive FACT.
Ex: - Current Balance
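The distinction can be illustrated with a small worked example. The sample rows and the column names `deposits` and `balance` are assumptions: deposits behave as an additive fact, while the balance can be summed across accounts but not across days.

```python
# Hypothetical daily account snapshots; balance is the end-of-day balance.
rows = [
    {"day": 1, "account": "A", "deposits": 100, "balance": 100},
    {"day": 1, "account": "B", "deposits": 50, "balance": 50},
    {"day": 2, "account": "A", "deposits": 20, "balance": 120},
    {"day": 2, "account": "B", "deposits": 0, "balance": 50},
]

# Additive: summing over every dimension (day and account) is meaningful.
total_deposits = sum(r["deposits"] for r in rows)

# Semi-additive: sum across accounts, but only within a single day.
last_day = max(r["day"] for r in rows)
closing_balance = sum(r["balance"] for r in rows if r["day"] == last_day)

# Summing the balance across days as well double counts the money held.
naive_balance_sum = sum(r["balance"] for r in rows)
```

Here `closing_balance` is 170, the amount actually held on day 2, whereas `naive_balance_sum` is 320 and does not correspond to any real quantity.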
Non-Additive FACTS: -
ONLINE ANALYTICAL PROCESSING (OLAP): OLAP is a set of specifications which allows the decision maker
(or) end-users to query the data from a database and present the data for
analysis in a template called a Report.
Relational OLAP (R-OLAP): An OLAP which can query the data from relational data sources is
known as R-OLAP.
Ex: - Cognos, Business Objects, MicroStrategy.
Hybrid OLAP (H-OLAP): An OLAP which supports the combined properties of R-OLAP and
M-OLAP is known as H-OLAP.
Ex: - Cognos, Business Objects, MicroStrategy.
Desktop OLAP (D-OLAP): An OLAP which can query the data from desktop databases such as
Text Files, XML Files and EXCEL is known as D-OLAP.
Ex: - Cognos, Business Objects, MicroStrategy.