Unit 3
Structure
3.0 Introduction
3.1 Objectives
3.2 Dimensional Modeling
3.2.1 Strengths of Dimensional Modeling
3.3 Identifying Facts and Dimensions
3.4 Star Schema
3.4.1 Features of Star Schema
3.5 Advantages and Disadvantages of Star Schema
3.6 Snowflake Schema
3.6.1 Features of Snowflake Schema
3.7 Advantages and Disadvantages of Snowflake Schema
3.7.1 Star Schema Vs Snowflake Schema
3.8 Fact Constellation Schema
3.8.1 Advantages and Disadvantages of Fact Constellation Schema
3.9 Aggregate Tables
3.10 Need for Building Aggregate Fact Tables
Limitations of Aggregate Fact Tables
3.11 Aggregate Fact Tables and Derived Dimension Tables
3.12 Summary
3.13 Solutions/Answers
3.14 Further Readings
3.0 INTRODUCTION
In the earlier unit, we studied the Data Warehouse Architecture and Data Marts.
In this unit, let us focus on the modeling aspects: dimensional modeling, the star
schema, the snowflake schema, aggregate tables and the fact constellation schema.
3.1 OBJECTIVES
After going through this unit, you shall be able to:
• understand the purpose of dimensional modeling;
• identify the measures, facts, and dimensions;
• discuss the fact and dimension tables and their pros and cons;
• discuss the star and snowflake schemas;
• compare the star and snowflake schemas;
• describe aggregate facts and the fact constellation schema; and
• discuss various examples of star and snowflake schemas.
Data Warehouse Fundamentals and Architecture
3.2 DIMENSIONAL MODELING
Dimensional modeling is a data model design technique adopted when building a data
warehouse. Put simply, dimensional modeling reduces the response time of queries
compared with traditional normalized relational designs. Dimensional modeling is
primarily concerned with conceptual design. First, let us introduce dimensional
modeling and see how it differs from a traditional data model design. A data model
is a representation of how data is stored in a database; it is usually drawn as a
diagram of a few tables and the relationships that exist between them. Dimensional
modeling is designed for reading, summarizing and computing numeric data from a
data warehouse. A data warehouse is an example of a system that requires a small
number of large tables, because many users use the application to read large
amounts of data. A characteristic of a data warehouse is that data is written once
and read many times over, so read operations dominate in a data warehouse. For
example, if a data warehouse keeps customer-related information in a single table,
counting the number of customers by country becomes a simple query on that one
table; using fewer tables in this way simplifies query processing. The main
objective of dimensional modeling is to provide an easy architecture for end users
to write queries, and to reduce the number of relationships between the tables and
dimensions, hence providing efficient query handling.
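To make the customers-by-country example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name customer, its columns and the sample rows are hypothetical, chosen only for illustration; the point is that when all customer attributes sit in one table, the count needs a single GROUP BY and no joins.

```python
import sqlite3

# Hypothetical single customer table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customer (customer_id, name, country) VALUES (?, ?, ?)",
    [(1, "Asha", "India"), (2, "Ravi", "India"), (3, "Mei", "Singapore"), (4, "Tom", "UK")],
)

# All customer attributes live in one table, so the count needs no joins.
counts = conn.execute(
    "SELECT country, COUNT(*) FROM customer GROUP BY country ORDER BY country"
).fetchall()
print(counts)  # [('India', 2), ('Singapore', 1), ('UK', 1)]
```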
Dimensional modeling organizes data as a cube, a logical representation used for
OLAP data management. The concept was developed by Ralph Kimball. Its two central
concepts are “facts” and “dimensions”. A transaction record is divided into
“facts”, which hold the numeric business transaction data, and “dimensions”, which
hold the reference information that gives context to the facts. Facts and
dimensions are explained in more detail in the subsequent sections.
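The division of a transaction record into facts and dimensions can be sketched in a few lines of Python. The field names, the grouping of fields into time, item and branch dimensions, and the helper split_record are all hypothetical assumptions made for illustration, not part of any particular tool.

```python
# A flat transaction record before modeling (hypothetical field names).
record = {
    "date": "2024-01-15",
    "item_name": "Pen",
    "brand": "Acme",
    "branch_name": "Central",
    "units_sold": 3,
    "dollars_sold": 4.5,
}

# Assumed design decision: which fields are numeric facts (measures) and
# which descriptive fields belong to which dimension.
MEASURES = ["units_sold", "dollars_sold"]
DIMENSIONS = {
    "time": ["date"],
    "item": ["item_name", "brand"],
    "branch": ["branch_name"],
}


def split_record(rec: dict) -> tuple[dict, dict]:
    """Split one record into its facts and its dimension (context) attributes."""
    facts = {m: rec[m] for m in MEASURES}
    dims = {name: {col: rec[col] for col in cols} for name, cols in DIMENSIONS.items()}
    return facts, dims


facts, dims = split_record(record)
print(facts)  # {'units_sold': 3, 'dollars_sold': 4.5}
print(dims)
```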
The steps in dimensional modeling, shown in Figure 1, are:
1. Identify Business Process
2. Identify Grain (level of detail)
3. Identify dimensions and attributes
4. Identify facts
5. Build Schema
The model should describe the Why, How much, When/Where/Who and What of
your business process.
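The steps above can be sketched as a small Python data structure that records the decision made in each step and then generates star-schema DDL for SQLite. The class DimensionalDesign, the _dim/_fact naming convention and the sample sales design are hypothetical; a real design would also choose column types, surrogate-key generation and indexes more carefully.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DimensionalDesign:
    """Hypothetical container for the decisions made while modeling."""
    business_process: str              # the business process, e.g. "sales"
    grain: str                         # what one fact row represents
    dimensions: dict[str, list[str]]   # dimension name -> attributes
    measures: list[str] = field(default_factory=list)  # numeric facts

    def build_schema(self) -> list[str]:
        """Build schema: generate star-schema DDL (SQLite dialect)."""
        ddl = []
        for dim, attrs in self.dimensions.items():
            cols = [f"{dim}_key INTEGER PRIMARY KEY"] + [f"{a} TEXT" for a in attrs]
            ddl.append(f"CREATE TABLE {dim}_dim ({', '.join(cols)})")
        # The fact table holds one foreign key per dimension plus the measures.
        fact_cols = [
            f"{dim}_key INTEGER REFERENCES {dim}_dim({dim}_key)" for dim in self.dimensions
        ]
        fact_cols += [f"{m} REAL" for m in self.measures]
        ddl.append(f"CREATE TABLE {self.business_process}_fact ({', '.join(fact_cols)})")
        return ddl


design = DimensionalDesign(
    business_process="sales",
    grain="one row per item sold per branch per day",
    dimensions={
        "time": ["day", "month", "year"],
        "item": ["item_name", "brand"],
        "branch": ["branch_name"],
    },
    measures=["units_sold", "dollars_sold"],
)

conn = sqlite3.connect(":memory:")
for statement in design.build_schema():
    conn.execute(statement)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['branch_dim', 'item_dim', 'sales_fact', 'time_dim']
```

Keeping the grain as an explicit field is a deliberate choice in this sketch: every measure added later should be checked against it.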
Dimensional Modeling
3.5 ADVANTAGES AND DISADVANTAGES OF STAR SCHEMA
3.5.1 Advantages of Star Schema
Star schemas are easy for end users and applications to understand and navigate.
With a well-designed schema, users can quickly analyze large, multidimensional
data sets. The main advantages of star schemas in a decision-support environment
are:
• Query performance
Because a star schema database has a small number of tables and clear join paths,
queries run faster than they do against an OLTP system. Small single-table queries,
usually of dimension tables, are almost instantaneous. Large join queries that
involve multiple tables take only seconds or minutes to run.
In a star schema database design, the dimensions are linked only through the central
fact table. When two dimension tables are used in a query, only one join path,
intersecting the fact table, exists between those two tables. This design feature
enforces accurate and consistent query results.
• Load performance and administration
Structural simplicity also reduces the time required to load large batches of data
into a star schema database. By defining facts and dimensions and separating them
into different tables, the impact of a load operation is reduced. Dimension tables
can be populated once and occasionally refreshed. You can add new facts regularly
and selectively by appending records to a fact table.
• Built-in referential integrity
A star schema has referential integrity built in when data is loaded. Referential
integrity is enforced because each record in a dimension table has a unique primary
key, and all keys in the fact tables are legitimate foreign keys drawn from the
dimension tables. A record in the fact table that is not related correctly to a dimension
cannot be given the correct key value to be retrieved.
• Easily understood
A star schema is easy to understand and navigate, with dimensions joined only
through the fact table. These joins are more significant to the end user, because they
represent the fundamental relationship between parts of the underlying business.
Users can also browse dimension table attributes before constructing a query.
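The load-performance and referential-integrity points can be illustrated with SQLite through Python's sqlite3 module. This is a minimal sketch under stated assumptions: the tables item_dim and sales_fact are hypothetical, and in this sketch the integrity check is performed by the database's FOREIGN KEY constraint, which SQLite enforces only when PRAGMA foreign_keys is on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces FOREIGN KEY constraints only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT)")
conn.execute("""CREATE TABLE sales_fact (
    item_key INTEGER NOT NULL REFERENCES item_dim(item_key),
    units_sold INTEGER)""")

# Dimension rows are loaded once (and refreshed occasionally) ...
conn.executemany("INSERT INTO item_dim VALUES (?, ?)", [(1, "Pen"), (2, "Ink")])
# ... while new facts are simply appended to the fact table.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 10), (2, 4), (1, 6)])

# A fact row whose key has no matching dimension row is rejected at load time.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan fact rejected:", rejected)

totals = conn.execute("""
    SELECT d.item_name, SUM(f.units_sold)
    FROM sales_fact AS f JOIN item_dim AS d ON f.item_key = d.item_key
    GROUP BY d.item_name ORDER BY d.item_name""").fetchall()
print(totals)  # [('Ink', 4), ('Pen', 16)]
```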
3.5.2 Disadvantages of Star Schema
While a star schema improves read queries and analysis, it can also involve
certain challenges:
• Decreased data integrity: Because of the denormalized data structure, star
schemas do not enforce data integrity very well. Although star schemas use
countermeasures to prevent anomalies from developing, a simple insert or
update command can still cause data incongruities.
• Less capable of handling diverse and complex queries: Database
designers build and optimize star schemas for specific analytical needs.
As denormalized data sets, they work best with a relatively narrow set of
simple queries. Comparatively, a normalized schema permits a far wider
variety of more complex analytical queries.
• No many-to-many relationships: Because they offer a simple dimension
schema, star schemas do not work well for many-to-many data relationships.
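The update anomaly described under decreased data integrity can be reproduced in a few lines. In this hypothetical sketch, a denormalized location dimension repeats the state name on every row for a city, so updating only one of those rows leaves the dimension inconsistent. The table, column names and sample values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized location dimension: the state is repeated on every row for a city.
conn.execute("CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, "
             "street TEXT, city TEXT, state TEXT)")
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?)", [
    (1, "MG Road", "Bengaluru", "Karnataka"),
    (2, "Brigade Road", "Bengaluru", "Karnataka"),
])

# A simple update touches only one of the rows that repeat the value ...
conn.execute("UPDATE location_dim SET state = 'KA' WHERE location_key = 1")

# ... so the same city now maps to two different states.
states = conn.execute("SELECT DISTINCT state FROM location_dim "
                      "WHERE city = 'Bengaluru' ORDER BY state").fetchall()
print(states)  # [('KA',), ('Karnataka',)]
```

A normalized design would store the state once in a separate table, so the same update could not produce two values.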
Example 1: Suppose a star schema is composed of a Sales fact table as shown in
Figure 3a and several dimension tables connected to it for Time, Branch, Item and
Location.
Fact Table
Sales is the Fact table.
Dimension Tables
The Time table has a column for each day, month, quarter, year, etc.
The Item table has columns for each item_key, item_name, brand, type and
supplier_type.
The Branch table has columns for each branch_key, branch_name and branch_type.
The Location table has columns of geographic data, including street, city, state,
and country. Unit_Sold and Dollars_Sold are the measures.
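Example 1 can be sketched in SQLite through Python's sqlite3 module. Table names such as sales_fact and time_dim, the surrogate-key columns and the sample rows are hypothetical; the measure columns unit_sold and dollars_sold follow the measures Unit_Sold and Dollars_Sold named above. The query reaches the Time and Location dimensions only through the fact table, which is the single join path described under query performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Four dimension tables around one central fact table (names are hypothetical).
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                       quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT,
                         branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state TEXT, country TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    unit_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 15, 1, 'Q1', 2023), (2, 10, 4, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Pen', 'Acme', 'stationery', 'wholesale');
INSERT INTO branch_dim VALUES (1, 'Central', 'retail');
INSERT INTO location_dim VALUES
    (1, 'MG Road', 'Bengaluru', 'Karnataka', 'India'),
    (2, 'High Street', 'Oxford', 'England', 'UK');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 10, 25.0), (2, 1, 1, 1, 4, 10.0), (2, 1, 1, 2, 6, 18.0);
""")

# Star join: each dimension is reached only through the central fact table.
rows = conn.execute("""
    SELECT l.country, t.year, SUM(f.unit_sold), SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY l.country, t.year
    ORDER BY l.country, t.year
""").fetchall()
for row in rows:
    print(row)
```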
Dimension Tables
The Store table consists of columns like store_id, store_address, city, region, state
and country.
Customer table has columns for each product_id, product_time and product_type.
Sales_Type includes sales_type_id and type_name columns.
Product table consists of product_id, product_name and product_type.
Time table consists of columns like time_id, action_date, action_week,
action_month, action_year and action_weekday.
The measures may be the amount spent and the number of items bought.
Check Your Progress 2
3.12 SUMMARY
This unit presented the basics of data warehouse design, focusing on the various
kinds of modeling and schemas. It explored the grain, facts, and dimensions of the
schemas. It is important to understand dimensional modeling, as the appropriate
modeling technique yields correct responses to queries.
Dimensional modeling is a design technique used to optimize a data warehouse for
query retrieval operations. There are various schema designs; this unit discussed
the star, snowflake, and fact constellation schemas. These schemas, ranging from
denormalized to normalized, use dimension tables, fact tables, derived dimension
tables and aggregate fact tables. Each table serves a purpose and contributes to
efficient design in terms of space and query handling. The unit discussed the pros
and cons of each type of table and used a number of examples to explain the design
in different scenarios.
3.13 SOLUTIONS/ANSWERS
Check Your Progress 1:
1) Characteristics of Star Schema:
• Every dimension in a star schema is represented with only one
dimension table.
• The dimension table should contain the set of attributes.
• The dimension table is joined to the fact table using a foreign key.
• The dimension tables are not joined to each other.
• The fact table contains keys and measures.
• The star schema is easy to understand and provides optimal disk
usage.
• The dimension tables are not normalized. For instance, in the above
figure, Country ID does not have a Country lookup table as an OLTP
design would have.
• The schema is widely supported by BI tools.
2)