Lect 5
— Chapter 5—
transaction records
Data cleaning and data integration techniques
are applied.
Ensure consistency in naming conventions,
24/12/5 Data Mining: Concepts and Techniques
Cuboids Corresponding to the Cube
0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product, date; product, country; date, country
3-D (base) cuboid: product, date, country
3-D cuboids: time, item, location; time, item, supplier; time, location, supplier; item, location, supplier
4-D (base) cuboid: time, item, location, supplier
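The lattice of cuboids can be enumerated mechanically: each cuboid is one subset of the dimensions, so n dimensions without concept hierarchies give 2^n cuboids. The following is a minimal sketch of that idea; the function name `cuboids` is a hypothetical helper, not something from the slides.

```python
from itertools import combinations


def cuboids(dimensions):
    """Group every subset of the dimensions by its size (the cuboid's dimensionality)."""
    lattice = {}
    for k in range(len(dimensions) + 1):
        # All k-element subsets form the k-D level of the lattice.
        lattice[k] = list(combinations(dimensions, k))
    return lattice


dims = ["time", "item", "location", "supplier"]
lattice = cuboids(dims)

for k, level in lattice.items():
    print(f"{k}-D cuboids:", level)

# Total number of cuboids across all levels (2 ** 4 for four dimensions).
total = sum(len(level) for level in lattice.values())
print("total cuboids:", total)
```

The 0-D level holds the single empty grouping (the apex cuboid, "all"), and the 4-D level holds the base cuboid.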
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema: A fact table in the middle connected
to a set of dimension tables
Snowflake schema: A refinement of star schema
where some dimensional hierarchy is normalized
into a set of smaller dimension tables, forming a
shape similar to a snowflake
Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact
constellation
Example of Star Schema
Sales Fact Table
  keys: time_key, item_key, branch_key, location_key
  measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, state_or_province, country
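As a rough illustration of how a star schema is queried, the sketch below builds the sales fact table and its four dimension tables in an in-memory SQLite database and aggregates a measure by two dimension attributes. The table and column names follow the slide; the sample rows are invented for illustration, and avg_sales is left out on the assumption that it would be derived at query time rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                   month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_type TEXT);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       state_or_province TEXT, country TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales (time_key INTEGER REFERENCES time,
                    item_key INTEGER REFERENCES item,
                    branch_key INTEGER REFERENCES branch,
                    location_key INTEGER REFERENCES location,
                    units_sold INTEGER, dollars_sold REAL);

-- Made-up sample rows, for illustration only.
INSERT INTO time VALUES (1, 5, 'Mon', 1, 'Q1', 2024), (2, 9, 'Tue', 4, 'Q2', 2024);
INSERT INTO item VALUES (1, 'TV', 'BrandA', 'electronics', 'wholesale');
INSERT INTO branch VALUES (1, 'B1', 'retail');
INSERT INTO location VALUES (1, '1 Main St', 'Vancouver', 'BC', 'Canada'),
                            (2, '2 Oak Ave', 'Chicago', 'IL', 'U.S.A.');
INSERT INTO sales VALUES (1, 1, 1, 1, 10, 5000.0), (2, 1, 1, 1, 4, 2000.0),
                         (1, 1, 1, 2, 6, 3000.0), (1, 1, 1, 2, 2, 1000.0);
""")

# Each dimension is one join away from the fact table.
rows = conn.execute("""
    SELECT l.country, t.quarter, SUM(s.dollars_sold), SUM(s.units_sold)
    FROM sales s
    JOIN time t ON s.time_key = t.time_key
    JOIN location l ON s.location_key = l.location_key
    GROUP BY l.country, t.quarter
    ORDER BY l.country, t.quarter
""").fetchall()

for row in rows:
    print(row)
```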
Example of Snowflake Schema
Sales Fact Table
  keys: time_key, item_key, branch_key, location_key
  measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, state_or_province, country
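The practical effect of the snowflake refinement is an extra join: once city attributes move out of location into their own table, reaching country from a sale takes two hops instead of one. The sketch below shows only that part of the schema in an in-memory SQLite database; the sample rows are invented, and the other dimensions are omitted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- City attributes are stored once per city instead of once per location.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                   state_or_province TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city);
CREATE TABLE sales (location_key INTEGER REFERENCES location,
                    units_sold INTEGER, dollars_sold REAL);

-- Made-up sample rows, for illustration only.
INSERT INTO city VALUES (1, 'Vancouver', 'BC', 'Canada'), (2, 'Chicago', 'IL', 'U.S.A.');
INSERT INTO location VALUES (1, '1 Main St', 1), (2, '9 Elm St', 1), (3, '2 Oak Ave', 2);
INSERT INTO sales VALUES (1, 10, 5000.0), (2, 4, 2000.0), (3, 6, 3000.0);
""")

# sales -> location -> city: the normalized hierarchy costs one more join.
by_country = conn.execute("""
    SELECT c.country, SUM(s.dollars_sold)
    FROM sales s
    JOIN location l ON s.location_key = l.location_key
    JOIN city c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()

print(by_country)
```

The trade-off is less redundancy in the dimension tables in exchange for more joins at query time.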
Example of Fact Constellation
Sales Fact Table: time_key, item_key, branch_key, ...
Shipping Fact Table: time_key, item_key, shipper_key, from_location, ...
Shared dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
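To make the shared-dimension idea concrete, the sketch below keeps one item dimension table and points both a sales fact table and a shipping fact table at it. Only a few columns are shown; `units_shipped` is a hypothetical measure chosen for illustration, since the shipping measures are not visible here, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table shared by two fact tables.
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales (time_key INTEGER, item_key INTEGER REFERENCES item,
                    units_sold INTEGER);
-- units_shipped is a hypothetical measure used only for this sketch.
CREATE TABLE shipping (time_key INTEGER, item_key INTEGER REFERENCES item,
                       units_shipped INTEGER);

-- Made-up sample rows, for illustration only.
INSERT INTO item VALUES (1, 'TV'), (2, 'PC');
INSERT INTO sales VALUES (1, 1, 10), (2, 1, 5), (1, 2, 3);
INSERT INTO shipping VALUES (1, 1, 8), (1, 2, 3), (2, 2, 1);
""")

# Aggregate each fact table separately per item, so rows from one fact
# table are not multiplied by matching rows from the other.
rows = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold) FROM sales s WHERE s.item_key = i.item_key),
           (SELECT SUM(sh.units_shipped) FROM shipping sh WHERE sh.item_key = i.item_key)
    FROM item i
    ORDER BY i.item_name
""").fetchall()

print(rows)
```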
A Sample Data Cube
[Figure: 3-D data cube. Date axis: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum. Product axis: TV, PC, VCR, sum. Country axis: U.S.A., Canada, Mexico, sum. Highlighted cell: total annual sales of TV in U.S.A.]
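The "sum" positions on each axis of the cube hold aggregates over that axis. A minimal sketch of that lookup follows, assuming the base cells are stored in a dictionary keyed by (product, date, country); the function `cell` is a hypothetical helper and the sales figures are invented for illustration.

```python
# Base cuboid cells: (product, date, country) -> sales. Made-up numbers.
base = {
    ("TV", "1Qtr", "U.S.A."): 100,
    ("TV", "2Qtr", "U.S.A."): 120,
    ("TV", "3Qtr", "U.S.A."): 90,
    ("TV", "4Qtr", "U.S.A."): 150,
    ("PC", "1Qtr", "U.S.A."): 200,
    ("TV", "1Qtr", "Canada"): 50,
}


def cell(prod, date, country):
    """Return one cube cell; passing 'sum' for an axis aggregates over that axis."""
    return sum(
        value
        for (p, d, c), value in base.items()
        if prod in ("sum", p) and date in ("sum", d) and country in ("sum", c)
    )


# Total annual sales of TV in U.S.A.: aggregate over the date axis only.
tv_usa_annual = cell("TV", "sum", "U.S.A.")
# Apex cuboid: aggregate over all three axes.
grand_total = cell("sum", "sum", "sum")

print(tv_usa_annual, grand_total)
```

A real OLAP engine would typically precompute or index such aggregates instead of scanning the base cells on every lookup.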
[Figure: layered architecture (partial). Layer 1: data repository (databases, data warehouse), with data cleaning and data integration. Layer 2: MDDB and metadata, with database API and filtering & integration.]
Chapter 3: Data Warehousing and OLAP Technology: An Overview
What is a data warehouse?
Summary