Data Warehousing: Data Models and OLAP Operations
OLAP operations
Topics Covered
1. Understanding the term “Data Warehousing”
5. ROLAP
6. MOLAP
7. HOLAP
9. Conclusion
Understanding the term Data Warehousing
Data Warehouse:
Bill Inmon coined the term Data Warehouse in 1990 and defined it as
follows: "A warehouse is a subject-oriented, integrated, time-variant
and non-volatile collection of data in support of management's
decision making process". He defined the terms in the sentence as
follows:
Subject Oriented:
Data that gives information about a particular subject instead of about
a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of
sources and merged into a coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular time
period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is
never removed. This enables management to gain a consistent picture
of the business.
Data Warehouse Architecture
Other important terminology
Enterprise Data Warehouse
Collects all information about subjects (customers, products, sales, assets,
personnel) that span the entire organization
Data Mart
Departmental subsets that focus on selected subjects
[Figure: data warehouse architecture. Recoverable labels: Operational DBs; extract, transform, load, refresh; Data Marts; serve; e.g., ROLAP; Query/Reporting; Data Mining]
Approaches to OLAP Servers
Three possibilities for OLAP servers
(1) Relational OLAP (ROLAP)
Relational and specialized relational DBMS to store and manage
warehouse data
OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
Array-based storage structures
Direct access to array data structures
(3) Hybrid OLAP (HOLAP)
Storing detailed data in RDBMS
Storing aggregated data in MDBMS
User access via MOLAP tools
The Multi-Dimensional Data Model
“Sales by product line over the past six months”
“Sales by store between 1990 and 1995”
...
ROLAP: Dimensional Modeling Using Relational DBMS
Special schema design: star, snowflake
Products
IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
Star Schema (in RDBMS)
Star Schema Example
The “Classic” Star Schema
Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins, low
maintenance, very simple metadata
Star Schema with Sample Data
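The sample-data figure for this slide is not available, so the following is only a minimal sketch of a star schema, not the slide's data. One fact table references three dimension tables by key, and a report needs a single join per dimension it uses. All table names, column names and rows here (store_dim, product_dim, period_dim, sales_fact, the regions and brands) are hypothetical and chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one denormalized table per dimension (hypothetical columns).
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE period_dim  (period_key INTEGER PRIMARY KEY, day TEXT, quarter TEXT);
-- Fact table: one row per (store, product, period) holding the measures.
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES period_dim(period_key),
    dollars REAL,
    units   INTEGER
);
INSERT INTO store_dim   VALUES (1, 'LA', 'West'), (2, 'NY', 'East');
INSERT INTO product_dim VALUES (1, 'Bread', 'BrandA'), (2, 'Milk', 'BrandB');
INSERT INTO period_dim  VALUES (1, 'M', 'Q1'), (2, 'T', 'Q1');
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 112.0, 56), (1, 2, 1, 34.0, 34), (2, 1, 2, 20.0, 10);
""")

# Units by region: the region attribute lives in the store dimension,
# so one join from the fact table is enough.
units_by_region = conn.execute("""
    SELECT s.region, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(units_by_region)
```

Keeping region inside store_dim is what saves joins here; the snowflake variant trades that for normalized dimension tables.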
The “Snowflake” Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, Region_ID, Regional Mgr.
District table (keyed by District_ID): District_ID, District Desc., Region_ID
Region table (keyed by Region_ID): Region_ID, Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
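A hedged sketch of how the snowflaked store dimension might be queried. It assumes the store table keeps only its district key, so region attributes are reached through district; the slide's Store Dimension also carries Region_ID and Regional Mgr., which this sketch leaves out. The lower-case table and column names and every row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized ("snowflaked") store dimension: store -> district -> region.
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_desc TEXT, regional_mgr TEXT);
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT,
                       region_id INTEGER REFERENCES region(region_id));
CREATE TABLE store    (store_key INTEGER PRIMARY KEY, store_desc TEXT, city TEXT, state TEXT,
                       district_id INTEGER REFERENCES district(district_id));
CREATE TABLE store_fact (store_key INTEGER REFERENCES store(store_key),
                         product_key INTEGER, period_key INTEGER,
                         dollars REAL, units INTEGER, price REAL);
INSERT INTO region   VALUES (1, 'North', 'mgr_n'), (2, 'South', 'mgr_s');
INSERT INTO district VALUES (10, 'D10', 1), (20, 'D20', 1), (30, 'D30', 2);
INSERT INTO store    VALUES (100, 'Store 100', 'city_a', 'st_a', 10),
                            (101, 'Store 101', 'city_b', 'st_b', 20),
                            (102, 'Store 102', 'city_c', 'st_c', 30);
INSERT INTO store_fact VALUES (100, 1, 1, 50.0, 5, 10.0),
                              (101, 1, 1, 30.0, 3, 10.0),
                              (102, 1, 1, 20.0, 2, 10.0);
""")

# Dollars by region now needs three joins instead of the single join
# a star schema would use.
dollars_by_region = conn.execute("""
    SELECT r.region_desc, SUM(f.dollars)
    FROM store_fact AS f
    JOIN store    AS s ON s.store_key   = f.store_key
    JOIN district AS d ON d.district_id = s.district_id
    JOIN region   AS r ON r.region_id   = d.region_id
    GROUP BY r.region_desc
    ORDER BY r.region_desc
""").fetchall()
print(dollars_by_region)
```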
Aggregates
Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
rollup
drill-down
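The query above can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: the slides name only the SALE table and its amt and date columns, so the prod and store columns are hypothetical, and the rows reuse the cell values of the roll-up example elsewhere in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# prod and store are assumed column names; amt and date come from the slide.
conn.execute("CREATE TABLE SALE (prod TEXT, store TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO SALE VALUES (?, ?, ?, ?)",
    [
        ("p1", "s1", 1, 12), ("p1", "s3", 1, 50),
        ("p2", "s1", 1, 11), ("p2", "s2", 1, 8),
        ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
    ],
)

# The slide's query: add up amounts for day 1.
day1_total = conn.execute("SELECT sum(amt) FROM SALE WHERE date = 1").fetchone()[0]

# Grouping by date gives one total per day; dropping the GROUP BY would
# roll these up further into a single grand total.
per_day = conn.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date"
).fetchall()

print(day1_total)  # 81
print(per_day)     # [(1, 81), (2, 48)]
```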
Points to be noticed about ROLAP
Defines complex, multi-dimensional data with simple model
Reduces the number of joins a query has to process
Allows the data warehouse to evolve with relatively low maintenance
Can contain both detailed and summarized data.
ROLAP is based on familiar, proven, and already selected
technologies.
BUT: SQL is poorly suited to multi-dimensional manipulation of calculations.
MOLAP: Dimensional Modeling Using the Multi Dimensional Model
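[Figure: a 2-D table (dimensions = 2) and a 3-D cube (dimensions = 3)]
3-D Cube
Example
Dimensions: Time, Product, Store
Attributes: Product (upc, price, …), Store …
Hierarchies:
Product -> Brand -> …
Day -> Week -> Quarter
Store -> Region -> Country
[Figure: 3-D cube with axes Product (Juice, Milk, Coke, Cream, Soap, Bread), Store (NY, SF, LA) and Time (M T W Th F S S); arrows mark roll-up to brand, roll-up to region and roll-up to week; visible cells include Milk 34, Cream 32, Soap 12, Bread 56]
Sample cell: 56 units of bread sold in LA on M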
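A minimal sketch of array-based cube storage with direct cell access, using plain nested lists. Only the cell "56 units of bread sold in LA on M" comes from the slide; every other cell is left at 0 as a placeholder. The day labels Sa and Su are an assumption to tell the two S columns apart, and the helper names cell and roll_up_to_week are hypothetical.

```python
# Axis labels from the cube figure; "Sa"/"Su" are assumed labels.
products = ["Juice", "Milk", "Coke", "Cream", "Soap", "Bread"]
stores = ["NY", "SF", "LA"]
days = ["M", "T", "W", "Th", "F", "Sa", "Su"]

# Map each label to its array offset so a cell is found by arithmetic, not search.
p_idx = {name: i for i, name in enumerate(products)}
s_idx = {name: i for i, name in enumerate(stores)}
d_idx = {name: i for i, name in enumerate(days)}

# Dense 3-D array: cube[product][store][day], placeholder value 0 everywhere.
cube = [[[0] * len(days) for _ in stores] for _ in products]

# The one cell stated on the slide.
cube[p_idx["Bread"]][s_idx["LA"]][d_idx["M"]] = 56


def cell(product, store, day):
    """Direct access to one cell by its three coordinates."""
    return cube[p_idx[product]][s_idx[store]][d_idx[day]]


def roll_up_to_week(product, store):
    """Roll the Time dimension up from day to week by summing the day axis."""
    return sum(cube[p_idx[product]][s_idx[store]])


print(cell("Bread", "LA", "M"))
print(roll_up_to_week("Bread", "LA"))
```

A dense array like this reserves space for every combination, including the empty ones, which is one reason cube size matters for MOLAP.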
Cube Aggregation: Roll-up
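Example: computing sums (- marks an empty cell)

day 1     s1   s2   s3
p1        12    -   50
p2        11    8    -

day 2     s1   s2   s3
p1        44    4    -
p2         -    -    -

Roll-up over days:
          s1   s2   s3
p1        56    4   50
p2        11    8    -

Roll-up over days and products:
          s1   s2   s3
sum       67   12   50

Roll-up over days and stores:
          sum
p1        110
p2         19

Roll-up over all dimensions: sum = 129

Roll-up moves from detail toward these sums; drill-down moves from the sums back toward the detail.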
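The sums in this example can be reproduced with a small sketch. It assumes each non-empty cell is stored under a (product, store, day) key; the function name roll_up and its keep parameter are hypothetical.

```python
from collections import defaultdict

# Non-empty cells of the example, keyed (product, store, day).
sales = {
    ("p1", "s1", 1): 12, ("p1", "s3", 1): 50,
    ("p2", "s1", 1): 11, ("p2", "s2", 1): 8,
    ("p1", "s1", 2): 44, ("p1", "s2", 2): 4,
}


def roll_up(cells, keep):
    """Sum out every dimension whose position is not listed in keep."""
    totals = defaultdict(int)
    for key, amount in cells.items():
        totals[tuple(key[i] for i in keep)] += amount
    return dict(totals)


by_product_store = roll_up(sales, keep=(0, 1))  # days summed out
by_store = roll_up(sales, keep=(1,))            # days and products summed out
by_product = roll_up(sales, keep=(0,))          # days and stores summed out
grand_total = roll_up(sales, keep=())           # everything summed out

print(by_product_store)
print(by_store)
print(by_product)
print(grand_total)
```

Drill-down is the reverse direction: going from by_product back to by_product_store requires the more detailed cells, which is why the detail has to be kept somewhere.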
Cube Operators for Roll-up
[Figure: the roll-up example (day 1 and day 2 tables with their sums), annotated with cube operators]
Each cell is written sale(store, product, day); * stands for the sum over all values of that dimension:
sale(s1,*,*) = 67
sale(s2,p2,*) = 8
sale(*,*,*) = 129
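One way to read the sale(...) notation is as a lookup in which * matches every value of a dimension. The sketch below is an illustration of that reading, not a real OLAP API; the cells are the example's, keyed (store, product, day) to match the argument order on the slide.

```python
# Non-empty cells of the example, keyed (store, product, day).
SALES = {
    ("s1", "p1", 1): 12, ("s3", "p1", 1): 50,
    ("s1", "p2", 1): 11, ("s2", "p2", 1): 8,
    ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}


def sale(store, product, day):
    """Sum all cells matching the query; "*" matches any value in that position."""
    query = (store, product, day)
    return sum(
        amount
        for key, amount in SALES.items()
        if all(q == "*" or q == k for q, k in zip(query, key))
    )


print(sale("s1", "*", "*"))   # 67
print(sale("s2", "p2", "*"))  # 8
print(sale("*", "*", "*"))    # 129
```

This version scans every cell on each call; the extended cube avoids that by storing the * results alongside the base cells.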
Extended Cube
(* marks the aggregate over a dimension; - marks an empty cell)

day 1     s1   s2   s3    *
p1        12    -   50   62
p2        11    8    -   19
*         23    8   50   81

day 2     s1   s2   s3    *
p1        44    4    -   48
p2         -    -    -    -
*         44    4    -   48

*         s1   s2   s3    *
p1        56    4   50  110
p2        11    8    -   19
*         67   12   50  129

Example: sale(*,p2,*) = 19
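A hedged sketch of how an extended cube could be materialized: every base cell contributes to each key obtained by replacing any subset of its coordinates with *. The function name extend is hypothetical, and real MOLAP servers use more elaborate storage than a dictionary.

```python
from itertools import product as cartesian

# Non-empty base cells of the example, keyed (store, product, day).
base = {
    ("s1", "p1", 1): 12, ("s3", "p1", 1): 50,
    ("s1", "p2", 1): 11, ("s2", "p2", 1): 8,
    ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}


def extend(cells):
    """Precompute every aggregate: each coordinate is either kept or replaced by "*"."""
    extended = {}
    for key, amount in cells.items():
        # 2**n keep/replace patterns per cell; the all-False pattern is the cell itself.
        for mask in cartesian((False, True), repeat=len(key)):
            ext_key = tuple("*" if starred else k for starred, k in zip(mask, key))
            extended[ext_key] = extended.get(ext_key, 0) + amount
    return extended


cube = extend(base)
print(cube[("*", "p1", 1)])     # 62
print(cube[("*", "*", 1)])      # 81
print(cube[("*", "p2", "*")])   # 19
print(cube[("*", "*", "*")])    # 129
```

After this step any sale(...) query with * arguments is a single dictionary lookup, at the cost of storing the extra aggregate cells.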
Aggregation Using Hierarchies
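Hierarchy: store -> region -> country

Store level (- marks an empty cell):

day 1     s1   s2   s3
p1        12    -   50
p2        11    8    -

day 2     s1   s2   s3
p1        44    4    -
p2         -    -    -

Roll-up to region (summed over days):

          region A   region B
p1              56         54
p2              11          8

(store s1 in Region A; stores s2, s3 in Region B)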
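The region totals can be checked with a short sketch that maps each store to its parent in the hierarchy before summing. The store-to-region mapping is the one stated on the slide; the names REGION_OF and roll_up_to_region are hypothetical.

```python
from collections import defaultdict

# Non-empty store-level cells, keyed (product, store, day).
sales = {
    ("p1", "s1", 1): 12, ("p1", "s3", 1): 50,
    ("p2", "s1", 1): 11, ("p2", "s2", 1): 8,
    ("p1", "s1", 2): 44, ("p1", "s2", 2): 4,
}

# One level of the store -> region -> country hierarchy, as given on the slide.
REGION_OF = {"s1": "A", "s2": "B", "s3": "B"}


def roll_up_to_region(cells):
    """Replace each store by its region and sum over days."""
    totals = defaultdict(int)
    for (product, store, _day), amount in cells.items():
        totals[(product, REGION_OF[store])] += amount
    return dict(totals)


by_region = roll_up_to_region(sales)
print(by_region)
```

Rolling up again to country would apply a second mapping from region to country in the same way.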
Points to be noticed about MOLAP
Multidimensional databases (MDDs) are great candidates for department data marts under 50 GB.
1) Performance:
While MDD servers can handle up to 50GB of storage, RDBMS servers can
handle hundreds of gigabytes and terabytes.
An experiment with the Relational and the Multidimensional models on a data set
* This may include the calculation of many other derived data without any
additional I/O.
What-if analysis
IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You’re developing a general-purpose application for inventory movement or assets management
THEN
Consider an MDD/MOLAP solution for your data mart
IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN
Consider an RDBMS/ROLAP solution for your data mart.
IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN
Consider a HOLAP solution for your data mart
Examples
ROLAP
Telecommunication startup: call data records (CDRs)
E-commerce site
Credit Card Company
MOLAP
Analysis and budgeting in a financial department
Sales analysis
HOLAP
Sales department of a multi-national company
Banks and Financial Service Providers
Tools available
ROLAP:
ORACLE 8i
ORACLE Reports; ORACLE Discoverer
ORACLE Warehouse Builder
MicroStrategy’s DSS Server
Platinum Technologies’ Platinum InfoBeacon
MOLAP:
ORACLE Express Server
ORACLE Express Clients (C/S and Web)
Arbor Software’s Essbase
HOLAP:
ORACLE 8i
ORACLE Express Server
ORACLE Relational Access Manager
ORACLE Express Clients (C/S and Web)
Conclusion
ROLAP: RDBMS -> star/snowflake schema
ROLAP or MOLAP: Data models used play major role in performance differences
The choice is requirement specific, though currently data warehouses are predominantly built
using RDBMSs/ROLAP.
References
http://dimlab.usc.edu/csci599/Fall2002/paper/I2_P064.pdf
OLAP, Relational, and Multidimensional Database Systems, by George Colliat,
Arbor Software Corporation
http://www.donmeyer.com/art3.html
Data warehousing Services, Data Mining & Analysis, LLC
http://www.cs.man.ac.uk/~franconi/teaching/2001/CS636/CS636-olap.ppt
Data Warehouse Models and OLAP Operations, by Enrico Franconi
http://www.promatis.com/mediacenter/papers
- ROLAP, MOLAP, HOLAP: How to determine which technology is appropriate,
by Holger Frietch, PROMATIS Corporation