Class15 - Data Warehousing
Data
Raw piece of information that is capable of being moved
and stored.
Database
An organized collection of such data in which data are
managed in tabular form with relationships.
Data Warehouse
A system that organizes all the data available in an
organization, makes it accessible and usable for all kinds
of data analysis, and allows the creation of many reports
by the use of mining tools.
CH#2, Data Warehousing By: Babu Ram Dawadi
Data Warehouse…
“A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile collection
of data in support of management’s decision-
making process.”
Data warehousing:
The process of constructing and using data
warehouses.
It is the process of extracting operational data, transforming
it into informational data, and loading it into a central data
store (the warehouse).
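The extract, transform, and load steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the operational records, their field names (`order_id`, `amount`, `date`, `region`), and the in-memory list standing in for the central store are all hypothetical.

```python
from datetime import date

# Hypothetical operational (OLTP) records, as an order-entry system might hold them.
operational_orders = [
    {"order_id": 1, "amount": "120.50", "date": "2023-01-15", "region": "north "},
    {"order_id": 2, "amount": "80.00", "date": "2023-01-20", "region": "North"},
    {"order_id": 3, "amount": "200.00", "date": "2023-04-02", "region": "south"},
]

def extract(source):
    """Extract: read raw rows from the operational source."""
    return list(source)

def transform(rows):
    """Transform: clean and convert operational rows into informational form."""
    out = []
    for row in rows:
        d = date.fromisoformat(row["date"])
        out.append({
            "region": row["region"].strip().title(),  # consistent naming
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,         # derived time attribute
            "amount": float(row["amount"]),            # text -> number
        })
    return out

def load(rows, warehouse):
    """Load: append the transformed rows to the central store."""
    warehouse.extend(rows)
    return warehouse

warehouse = load(transform(extract(operational_orders)), [])
for fact in warehouse:
    print(fact)
```

Note that the two spellings `"north "` and `"North"` end up as one region value, which is the kind of consistency the warehouse relies on.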
Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources:
relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding structures,
attribute measures, etc. among different data sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
[Figure: source systems: Sales system, Payroll system, Customer data, Purchasing system]
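To make the hotel-price example concrete: two sources may report the same kind of price in different currencies, one with tax included and one without. The sketch below converts both to a single convention (EUR, tax included) before they enter the warehouse. The exchange rate, tax rate, and source layouts are hypothetical values chosen only for illustration; a breakfast flag could be handled the same way.

```python
# Hypothetical conversion constants; real values would come from reference data.
USD_TO_EUR = 0.9   # illustrative rate, not a real quote
TAX_RATE = 0.10    # illustrative 10% tax

def normalize_price(price, currency, tax_included):
    """Convert a source price to the warehouse convention: EUR, tax included."""
    if currency == "USD":
        price = price * USD_TO_EUR
    elif currency != "EUR":
        raise ValueError(f"unsupported currency: {currency}")
    if not tax_included:
        price = price * (1 + TAX_RATE)
    return round(price, 2)

# Two sources describing prices with different conventions.
source_a = {"hotel": "H1", "price": 200.0, "currency": "USD", "tax_included": True}
source_b = {"hotel": "H2", "price": 100.0, "currency": "EUR", "tax_included": False}

integrated = [
    {"hotel": r["hotel"],
     "price_eur_incl_tax": normalize_price(r["price"], r["currency"], r["tax_included"])}
    for r in (source_a, source_b)
]
print(integrated)
```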
Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer, product, sales.
Focusing on data modeling and analysis ...
[Figure: source systems: Sales system, Employee data]
[Figure: Data warehouse architecture. Internal sources are read by Extractor/Monitor components and passed to a Data Integration Component, which loads the Data Warehouse and its Metadata. A Query and Analysis Component serves queries/reports and data mining. Monitoring & Administration supports construction & maintenance.]
3 main phases
Data acquisition
• Relevant data collection
• Recovering: transformation into the data warehouse model from existing models
• Loading: cleaning and loading into the DWH
Storage
Data extraction
• Tool examples: query reports, SQL, multidimensional analysis (OLAP tools), data mining
+ evolution and maintenance
DW Monitoring
Identify growth factors and rate
Identify what data is being used
Identify who is using the data, and when
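The three monitoring questions can be answered from usage and size logs. A minimal sketch, assuming a hypothetical query log of `(user, table, timestamp)` entries and hypothetical month-end row counts; a real warehouse would read these from its own audit and catalog tables.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical query log: (user, table accessed, timestamp).
query_log = [
    ("alice", "sales_fact", "2024-03-01 09:15"),
    ("bob", "sales_fact", "2024-03-01 14:02"),
    ("alice", "customer_dim", "2024-03-02 09:40"),
    ("carol", "sales_fact", "2024-03-02 17:30"),
]

# Hypothetical warehouse size (rows) at the end of each month.
monthly_rows = {"2024-01": 1_000_000, "2024-02": 1_100_000, "2024-03": 1_210_000}

# What data is being used: access counts per table.
table_usage = Counter(table for _, table, _ in query_log)

# Who is using the data, and when: hour of each access, per user.
user_hours = defaultdict(list)
for user, _, ts in query_log:
    user_hours[user].append(datetime.strptime(ts, "%Y-%m-%d %H:%M").hour)

# Growth rate: month-over-month percentage increase in row count.
months = sorted(monthly_rows)
growth = {
    m: round(100 * (monthly_rows[m] - monthly_rows[p]) / monthly_rows[p], 1)
    for p, m in zip(months, months[1:])
}

print(table_usage.most_common())
print(dict(user_hours))
print(growth)
```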
DATA WAREHOUSING: THE USE OF A DATA WAREHOUSE
STEP 1: Load the Data Warehouse
[Figure: Inventory database, Newcastle sales DB, and London sales DB loaded into the data warehouse (DW); data marts]
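The loading step can be sketched with Python's built-in `sqlite3` module: rows from two regional sales databases are copied into one warehouse table tagged with their source city, and a small subject-specific data mart is then derived from the warehouse. The table layouts, the product-level mart, and all quantities are hypothetical; in practice the regional databases and the mart would usually be separate systems rather than in-memory connections.

```python
import sqlite3

def make_sales_db(rows):
    """Create a hypothetical regional sales database (in memory for the sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (product TEXT, qty INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return db

newcastle = make_sales_db([("socks", 10), ("shorts", 2)])
london = make_sales_db([("socks", 5), ("jumpers", 7)])

# Step 1: load the data warehouse, tagging each row with its source city.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales (city TEXT, product TEXT, qty INTEGER)")
for city, src in (("Newcastle", newcastle), ("London", london)):
    rows = src.execute("SELECT product, qty FROM sales").fetchall()
    dw.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                   [(city, product, qty) for product, qty in rows])

# A data mart: a smaller, subject-specific extract of the warehouse
# (here, hypothetically, total quantity per product).
dw.execute("""CREATE TABLE product_mart AS
              SELECT product, SUM(qty) AS total_qty
              FROM sales GROUP BY product""")
mart = dw.execute("SELECT * FROM product_mart ORDER BY product").fetchall()
print(mart)
```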
Decision Support Systems
Methodology (or series of methodologies) designed to
extract information from data and to use such
information as a basis for decision making
DEFINITION:
‘OLAP applications and tools are those that are designed to ask
ad hoc, complex queries of large multidimensional collections
of data. It is for this reason that OLAP is often mentioned in the
context of Data Warehouses’.
[Figure: Sales in 3 dimensions, each with a granularity hierarchy: Region; Year > Quarter; Product category > Product type > Product]
OLAP: MULTIDIMENSIONAL DATA MODEL
[Figure: sales cube with dimensions city (London, Glasgow, Newcastle), product (Socks, Jumpers, T-Shirts, Shorts, Pyjamas), and season (Spring, Summer, Autumn, Winter)]
[Figure: sales cube with dimensions region (Vaud, Fribourg, Neuchatel), year (1997, 1998, 1999), and product type; highlighted cell: sales of standard telephones in 1997 in the Vaud region]
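One way to picture a cube in code is as a mapping from a coordinate (one value per dimension) to the measure, so that "sales of standard telephones in 1997 in the Vaud region" is a single lookup. This is a sketch only: the sales figures are hypothetical, and storing only non-empty cells is an assumed design choice that suits sparse cubes.

```python
# Sparse cube: only non-empty cells are stored (hypothetical sales figures).
# Coordinate order: (year, region, product_type).
cube = {
    (1997, "Vaud", "standard"): 120,
    (1997, "Vaud", "mobiles"): 80,
    (1997, "Fribourg", "standard"): 60,
    (1998, "Neuchatel", "fax"): 15,
}

def cell(year, region, product_type):
    """Return the measure at one cube coordinate; empty cells count as 0."""
    return cube.get((year, region, product_type), 0)

print(cell(1997, "Vaud", "standard"))  # 120
print(cell(1999, "Vaud", "fax"))       # 0
```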
OLAP Terminology
A data cube supports viewing/modelling of a
variable (a set of variables) of interest. Measures
are used to report the values of the particular
variable with respect to a given set of dimensions.
dimensions = 3
Two-dimensional cuboid (item by location, location by country):
            Türkiye   Almanya
PC             50        90
Printer        20        30
One-dimensional cuboid (location = All):
PC            140
Printer        50
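The one-dimensional cuboid is obtained from the two-dimensional one by summing over the location dimension, and summing once more gives the zero-dimensional (apex) total. A small sketch using the PC/Printer figures from the slide; the dictionary representation is an assumption made for illustration.

```python
from collections import defaultdict

# Two-dimensional cuboid (item x country), figures from the slide.
cuboid_2d = {
    ("PC", "Türkiye"): 50, ("PC", "Almanya"): 90,
    ("Printer", "Türkiye"): 20, ("Printer", "Almanya"): 30,
}

# One-dimensional cuboid: aggregate location away (location = All).
cuboid_1d = defaultdict(int)
for (item, _country), sales in cuboid_2d.items():
    cuboid_1d[item] += sales

# Zero-dimensional (apex) cuboid: aggregate every dimension.
apex = sum(cuboid_2d.values())

print(dict(cuboid_1d))  # {'PC': 140, 'Printer': 50}
print(apex)             # 190
```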
Roll-up and Drill-down algebraic operators
[Figure: sales cube by year (1997, 1998, 1999), product (mobiles, fax, standard), and city (Lausanne and Vevey in Vaud; Brig and Sion in Valais), shown rolled up to region (Vaud, Valais)]
Roll-up: less detailed; go up in the granularity hierarchy (e.g., city → region).
Drill-down: more detailed; go down in the granularity hierarchy (e.g., region → city).
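Roll-up and drill-down along the location hierarchy can be sketched as follows. The city-to-region mapping follows the slide (Lausanne and Vevey in Vaud; Brig and Sion in Valais); the sales figures are hypothetical. Note that drill-down only works if the detailed, city-level data is kept alongside the rolled-up view.

```python
from collections import defaultdict

# Concept hierarchy for the location dimension: city -> region.
CITY_TO_REGION = {"Lausanne": "Vaud", "Vevey": "Vaud", "Brig": "Valais", "Sion": "Valais"}

# Base data at city granularity: (year, city, product) -> units sold (hypothetical).
by_city = {
    (1997, "Lausanne", "mobiles"): 40, (1997, "Vevey", "mobiles"): 25,
    (1997, "Brig", "mobiles"): 10, (1997, "Sion", "mobiles"): 15,
}

def roll_up(cells, hierarchy):
    """Roll-up: replace each city by its region and sum the measure (less detail)."""
    out = defaultdict(int)
    for (year, city, product), v in cells.items():
        out[(year, hierarchy[city], product)] += v
    return dict(out)

def drill_down(region_cell, cells, hierarchy):
    """Drill-down: expand one region-level cell into its city-level cells (more detail)."""
    year, region, product = region_cell
    return {k: v for k, v in cells.items()
            if k[0] == year and hierarchy[k[1]] == region and k[2] == product}

by_region = roll_up(by_city, CITY_TO_REGION)
print(by_region)
print(drill_down((1997, "Vaud", "mobiles"), by_city, CITY_TO_REGION))
```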
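Cube: A Lattice of Cuboids
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)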
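Each cuboid in the lattice corresponds to one subset of the dimensions, so a cube with n dimensions has 2^n cuboids. A short sketch that enumerates the lattice for the four dimensions on the slide using `itertools.combinations`:

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")

# Every subset of the dimensions is one cuboid, grouped by its number of dimensions.
lattice = {k: list(combinations(DIMENSIONS, k)) for k in range(len(DIMENSIONS) + 1)}

for k, cuboids in lattice.items():
    label = {0: " (apex)", len(DIMENSIONS): " (base)"}.get(k, "")
    print(f"{k}-D{label}: {cuboids}")

total = sum(len(c) for c in lattice.values())
print(total)  # 16
```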
CONCEPTUAL MODELING OF DATA WAREHOUSES
Fact Table
EXAMPLE OF SNOWFLAKE SCHEMA
Sales Fact Table: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables:
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
    supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
    city: city_key, city, province_or_state, country
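The snowflake schema above can be written as SQL tables and queried with Python's `sqlite3` module. In this sketch the `time` dimension is named `time_dim` (a naming choice to avoid confusion with SQL time functions), and all sample rows are hypothetical. The query shows the characteristic cost of snowflaking: reaching `country` takes two joins, fact to location to city.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables; item -> supplier and location -> city are the snowflaked links.
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_key INTEGER REFERENCES supplier);
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                   province_or_state TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                       month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
-- Fact table: one foreign key per dimension, plus the measures.
CREATE TABLE sales_fact (time_key INTEGER, item_key INTEGER, branch_key INTEGER,
                         location_key INTEGER, units_sold INTEGER,
                         dollars_sold REAL, avg_sales REAL);

-- Hypothetical sample rows.
INSERT INTO supplier VALUES (1, 'wholesale');
INSERT INTO item VALUES (1, 'laptop', 'BrandA', 'computer', 1),
                        (2, 'printer', 'BrandB', 'peripheral', 1);
INSERT INTO city VALUES (1, 'Vancouver', 'British Columbia', 'Canada'),
                        (2, 'Chicago', 'Illinois', 'USA');
INSERT INTO location VALUES (1, 'Main St', 1), (2, 'Lake St', 2);
INSERT INTO time_dim VALUES (1, 15, 'Monday', 1, 1, 2024);
INSERT INTO branch VALUES (1, 'B1', 'retail');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 2, 2000.0, 1000.0),
                              (1, 2, 1, 2, 1, 300.0, 300.0),
                              (1, 1, 1, 2, 1, 900.0, 900.0);
""")

# Aggregating sales by country needs two joins: fact -> location -> city.
rows = db.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN location l ON f.location_key = l.location_key
    JOIN city c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(rows)
```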
EXAMPLE OF FACT CONSTELLATION
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
Sales Fact Table: time_key, item_key, branch_key, ...
Shipping Fact Table: time_key, item_key, shipper_key, from_location, ...
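In a fact constellation, several fact tables share dimension tables, so measures from different facts can be compared through a common dimension. A minimal `sqlite3` sketch with a shared `item` dimension; the measure columns `units_sold` and `units_shipped` and all rows are assumptions, since the slide's column lists are cut off. Each fact table is aggregated separately before joining, which avoids the double counting a direct fact-to-fact join would cause.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Shared dimension table used by both fact tables.
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables; the measure columns are assumed for this sketch.
CREATE TABLE sales_fact (time_key INTEGER, item_key INTEGER, branch_key INTEGER,
                         units_sold INTEGER);
CREATE TABLE shipping_fact (time_key INTEGER, item_key INTEGER, shipper_key INTEGER,
                            from_location INTEGER, units_shipped INTEGER);
INSERT INTO item VALUES (1, 'laptop'), (2, 'printer');
INSERT INTO sales_fact VALUES (1, 1, 1, 5), (1, 2, 1, 3), (2, 1, 1, 4);
INSERT INTO shipping_fact VALUES (1, 1, 7, 1, 10), (1, 2, 7, 1, 2);
""")

# Aggregate each fact table on its own, then join both on the shared item dimension.
rows = db.execute("""
    SELECT i.item_name, s.sold, h.shipped
    FROM item i
    JOIN (SELECT item_key, SUM(units_sold) AS sold
          FROM sales_fact GROUP BY item_key) s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS shipped
          FROM shipping_fact GROUP BY item_key) h ON h.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(rows)
```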
OLTP vs. OLAP
Online Transaction Processing (OLTP)
Maintain a database that is an accurate model of some
real-world enterprise
Short simple transactions
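The contrast can be sketched with `sqlite3`: an OLTP transaction is short and touches a few rows to keep the database an accurate model of the enterprise, while an OLAP-style query reads many rows and aggregates them. The `stock` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
db.executemany("INSERT INTO stock VALUES (?, ?)", [("pc", 10), ("printer", 4)])
db.commit()

def place_order(conn, item, qty):
    """OLTP-style transaction: short, touches few rows, keeps the model accurate."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
        conn.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))

place_order(db, "pc", 2)
place_order(db, "printer", 1)
place_order(db, "pc", 3)

# OLAP-style query: scans and aggregates many rows for analysis.
totals = db.execute("SELECT item, SUM(qty) FROM orders GROUP BY item ORDER BY item").fetchall()
pc_stock = db.execute("SELECT qty FROM stock WHERE item = 'pc'").fetchone()[0]
print(totals, pc_stock)
```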
[Figure labels: Verification, Multidimensional, Statistical; Data Mart; Data Warehouse]
Knowledge discovery = data preparation + data mining + evaluation/interpretation of discovered patterns/relationships
Data mining: the core of the knowledge discovery process.
[Figure: KDD process: Data Cleaning → Data Integration → Data Warehouse → Data Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
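The KDD stages can be chained as plain functions. This sketch assumes hypothetical purchase baskets from two shops, uses "two or more items" as a hypothetical relevance criterion for selection, counts co-purchased item pairs as the mining step, and keeps pairs above an assumed support threshold of 0.5 as the evaluation step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw sources: purchase baskets from two shops, with some dirty rows.
shop_a = [["Milk", "bread"], ["milk", "bread", "eggs"], []]
shop_b = [["bread", "EGGS"], ["milk", "bread"], None]

def clean(source):
    """Data cleaning: drop missing or empty baskets, normalize item names."""
    return [sorted({i.strip().lower() for i in b}) for b in source if b]

def integrate(*sources):
    """Data integration: combine cleaned sources into one store."""
    return [basket for src in sources for basket in src]

def select(store, min_items=2):
    """Data selection: keep only task-relevant baskets (here, two or more items)."""
    return [b for b in store if len(b) >= min_items]

def mine(baskets):
    """Data mining: count how often each pair of items is bought together."""
    return Counter(pair for b in baskets for pair in combinations(b, 2))

def evaluate(pair_counts, n, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}

warehouse = integrate(clean(shop_a), clean(shop_b))
relevant = select(warehouse)
patterns = evaluate(mine(relevant), len(relevant))
print(patterns)
```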
[Table columns: Client No. | Age | Income | Credit | Car owner | House owner | Region | Car Mag. | House Mag. | Sport Mag. | Music Mag. | Comic Mag.]
Classification tasks
• Association Rules
• K-nearest neighbor
• Genetic Algorithms
• Decision Trees
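For association rules, the two standard measures are support (how often an itemset occurs) and confidence (how often the right-hand side holds when the left-hand side does). A sketch over hypothetical magazine subscriptions, in the spirit of the magazine columns in the client table above.

```python
# Hypothetical transactions: magazines each client subscribes to.
transactions = [
    {"car", "sports"},
    {"car", "sports", "music"},
    {"car", "house"},
    {"sports", "music"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

print(support({"car", "sports"}))                # 0.5
print(round(confidence({"car"}, {"sports"}), 2))  # 0.67
```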
Classification tasks:
Classification is the process of dividing data into a number of
classes, e.g., classes of customers.
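As one example of the listed techniques, k-nearest neighbor assigns a new customer to the majority class among the k most similar known customers. A minimal sketch, assuming hypothetical labelled clients described by `(age, income in thousands)` and Euclidean distance via `math.dist` (Python 3.8+). Features are left unscaled here for brevity; in practice they are usually normalized first.

```python
import math
from collections import Counter

# Hypothetical labelled clients: (age, income in thousands) -> class.
training = [
    ((25, 30), "low_credit"),
    ((30, 35), "low_credit"),
    ((45, 80), "high_credit"),
    ((50, 90), "high_credit"),
    ((40, 70), "high_credit"),
]

def knn_classify(point, data, k=3):
    """Assign the majority class among the k nearest training points (Euclidean)."""
    nearest = sorted(data, key=lambda row: math.dist(point, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((28, 32), training))  # low_credit
print(knn_classify((44, 75), training))  # high_credit
```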
Problem Solving Tasks:
It involves finding solutions or remedies to problems
that arise, e.g., why are people not going to cinema halls?
KDD…
For finding useful patterns in databases, it is
necessary to choose the right algorithms and the right
tools.