Unit 3
Uploaded by Ancelia Patrao
© All Rights Reserved
UNIT 3 DIMENSIONAL MODELING

Structure
3.0 Introduction
3.1 Objectives
3.2 Dimensional Modeling
3.2.1 Strengths of Dimensional Modeling
3.3 Identifying Facts and Dimensions
3.4 Star Schema
3.4.1 Features of Star Schema
3.5 Advantages and Disadvantages of Star Schema
3.6 Snowflake Schema
3.6.1 Features of Snowflake Schema
3.7 Advantages and Disadvantages of Snowflake Schema
3.7.1 Star Schema Vs Snowflake Schema
3.8 Fact Constellation Schema
3.8.1 Advantages and Disadvantages of Fact Constellation Schema
3.9 Aggregate Tables
3.10 Need for Building Aggregate Fact Tables
Limitations of Aggregate Fact Tables
3.11 Aggregate Fact Tables and Derived Dimension Tables
3.12 Summary
3.13 Solutions/Answers
3.14 Further Readings

3.0 INTRODUCTION
In the earlier unit, we studied Data Warehouse Architecture and Data Marts. In this unit, let us focus on the modeling aspects: dimensional modeling, the star schema, the snowflake schema, aggregate tables and the fact constellation schema.

3.1 OBJECTIVES
After going through this unit, you shall be able to:
• understand the purpose of dimensional modeling;
• identify measures, facts and dimensions;
• discuss fact and dimension tables and their pros and cons;
• discuss the star and snowflake schemas;
• compare the star and snowflake schemas;
• describe aggregate facts and the fact constellation schema; and
• discuss various examples of star and snowflake schemas.
Data Warehouse Fundamentals and Architecture

3.2 DIMENSIONAL MODELING
Dimensional modeling is a data model design technique adopted when building a data warehouse. Put simply, it reduces the response time of queries compared with a normalized relational (OLTP) design. The idea behind dimensional modeling lies mainly in conceptual design. First, let us see what dimensional modeling is and how it differs from a traditional data model design. A data model is a representation of how data is stored in a database; it is usually a diagram of a few tables and the relationships that exist between them. Dimensional modeling is designed to read, summarize and compute numeric data from a data warehouse. A data warehouse is an example of a system that requires a small number of large tables, because many users run applications that read a lot of data. A characteristic of a data warehouse is that data is written once and read many times, so read operations dominate. For example, if a data warehouse keeps customer-related information in a single table, counting the number of customers by country becomes a simple query on that one table; organizing the warehouse's tables in this way simplifies query processing. The main objective of dimensional modeling is to provide an easy architecture for end users to write queries, and to reduce the number of relationships between tables and dimensions, thereby providing efficient query handling.
Dimensional modeling represents data logically as a cube, managed with OLAP. The concept was developed by Ralph Kimball. Its two key building blocks are “facts” and “dimensions”. A transaction record is divided into “facts”, which are the numerical business transaction data, and “dimensions”, which are the reference information that gives context to the facts. Facts and dimensions are explained in more detail in the subsequent sections.
The following are the steps in dimensional modeling, as shown in Figure 1.
1. Identify the business process
2. Identify the grain (level of detail)
3. Identify the dimensions and attributes
4. Build the schema
The model should describe the Why, How much, When/Where/Who and What of
your business process.


Figure 1: Steps in Dimension Modeling

Step 1: Identify the Business Objectives


Selecting the right business process for the data warehouse and identifying the business objectives is the first step in dimensional modeling. This is a very important step; getting it wrong can lead to rework and software defects.
Step 2: Identifying Granularity
The grain is the level of detail of the business problem, that is, what a single row of the fact table represents. Identifying it means decomposing a large and complex problem into its lowest level of information. For example, if data is kept month-wise, the table contains details for each of the months in a year. The choice of grain depends on the reports to be submitted to management, and it affects the size of the data warehouse.
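To see how the grain drives table size, here is a minimal Python sketch. The data are hypothetical (one product selling 10 units every day of 2023); the function rolls daily-grain rows up to a monthly grain, which is the kind of decision this step is about.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily-grain fact rows: one product selling 10 units every day of 2023.
start = date(2023, 1, 1)
daily_rows = [(start + timedelta(days=i), 10) for i in range(365)]  # (sale_date, units_sold)


def to_monthly_grain(rows):
    """Roll daily rows up to a monthly grain: one row per (year, month)."""
    monthly = defaultdict(int)
    for sale_date, units in rows:
        monthly[(sale_date.year, sale_date.month)] += units
    return dict(monthly)


monthly_rows = to_monthly_grain(daily_rows)
print(len(daily_rows), "daily rows ->", len(monthly_rows), "monthly rows")
print("February 2023 units:", monthly_rows[(2023, 2)])
```

A finer grain can answer more detailed questions but multiplies the row count: 365 rows against 12 here, for a single product.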
Step 3: Identifying Dimensions and attributes
The dimensions of the data warehouse can be understood as the entities of the database, such as items, products, date, stocks and time. The primary keys and foreign keys are also identified and specified in this step.
Step 4: Build the Schema
The database structure, that is, the arrangement of tables and of the columns within them, determines the schema. Popular schemas include the star, snowflake and fact constellation schemas. In summary, the steps run from selecting the business process down to identifying the finest level of detail of the business transactions; identifying the significant dimensions and attributes then helps to build the schema.
3.2.1 Strengths of Dimensional Modeling
Following are some of the strengths of Dimensional Modeling:
• Its simple architecture is easy for various stakeholders, from warehouse designers to business clients, to understand and handle.
• It reduces the number of relationships between different data elements.
• It promotes data quality by enforcing foreign key constraints as a form of referential integrity check on the data warehouse. Dimensional modeling helps database administrators maintain the reliability of the data.
• Aggregate functions used with the schemas optimize the performance of queries posed by users. Since the size of a data warehouse keeps increasing, optimization becomes a concern, and dimensional modeling makes it easier.

3.3 IDENTIFYING FACTS AND DIMENSIONS


We studied the steps of dimensional modeling in the previous section; the last step is to build the schema. So, let us look at the elementary building blocks of a schema.
Facts and Fact table: A fact is a business event, such as a sale or a registration, recorded together with its associated context data. A measure is a numeric attribute of a fact, representing the performance or behavior of the business relative to the dimensions. The fact table contains the primary keys of all the dimension tables used in the business process, each of which acts as a foreign key in the fact table, together with the measures to which aggregate functions (such as SUM or COUNT) are applied when the business process is analyzed. The fact table has fewer columns than a dimension table, and it is in a more normalized form.
Dimensions and Dimension table: A dimension table is a collection of data that describes one business dimension. Dimensions provide the contextual background for the facts, and they are the framework over which OLAP is performed. Dimension tables establish the context of the facts: they store fields that describe the facts. The data in a dimension table is in denormalized form, so it contains a larger number of columns than the fact table. The attributes in a dimension table are used as row and column headings in a report or in query results.
Example: Consider a case study of student registration for courses. The fact table can have attributes such as student_id, course_id, program_id, date_of_registration and fee_id. The course dimension can hold the course name, the duration of the course and so on, and the student dimension can contain personal details about the student, such as name, address and contact details.
Student Registration
Fact table: (student_id, course_id, program_id, date_of_registration, fee_id)
Measure: Sum(Fee_amount)
Dimension tables: Student_details, Course_details, Program_details, Fee_details, Date
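The Student Registration star can be sketched in SQL, run here through Python's built-in sqlite3 module. Only the key columns come from the text; fee_amount (placed in Fee_details, following the text's fee_id link), the other descriptive columns and the sample rows are assumptions for illustration, and the Date dimension is represented simply by the date_of_registration column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (descriptive columns are illustrative assumptions).
CREATE TABLE Student_details (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Course_details  (course_id INTEGER PRIMARY KEY, course_name TEXT, duration_months INTEGER);
CREATE TABLE Program_details (program_id INTEGER PRIMARY KEY, program_name TEXT);
CREATE TABLE Fee_details     (fee_id INTEGER PRIMARY KEY, fee_amount REAL);

-- Fact table: one row per registration, made up of foreign keys plus the date.
CREATE TABLE Registration_fact (
    student_id INTEGER REFERENCES Student_details(student_id),
    course_id INTEGER REFERENCES Course_details(course_id),
    program_id INTEGER REFERENCES Program_details(program_id),
    date_of_registration TEXT,
    fee_id INTEGER REFERENCES Fee_details(fee_id)
);

INSERT INTO Student_details VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO Course_details VALUES (10, 'DBMS', 6), (11, 'Data Warehousing', 6);
INSERT INTO Program_details VALUES (100, 'MCA');
INSERT INTO Fee_details VALUES (1000, 5000.0), (1001, 7500.0);
INSERT INTO Registration_fact VALUES
    (1, 10, 100, '2023-07-01', 1000),
    (1, 11, 100, '2023-07-01', 1001),
    (2, 11, 100, '2023-07-03', 1001);
""")

# Measure: Sum(Fee_amount), sliced by the Course dimension.
fees_by_course = conn.execute("""
    SELECT c.course_name, SUM(f.fee_amount)
    FROM Registration_fact r
    JOIN Course_details c ON r.course_id = c.course_id
    JOIN Fee_details f ON r.fee_id = f.fee_id
    GROUP BY c.course_name
    ORDER BY c.course_name
""").fetchall()
print(fees_by_course)
```

Storing the fee amount directly in the fact table, as a measure, would be the more usual dimensional design; the sketch keeps the text's fee_id link instead.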

3.4 STAR SCHEMA


There are three basic popular models which are used for dimensional modeling:
• Star Model
• Snowflake Model
• Fact Constellation Schema
Star Schema: The star schema represents the multidimensional model. In this model, data is organized into facts and dimensions. The star model is the underlying structure for a dimensional model: it has one broad central table (the fact table) and a set of smaller tables (the dimensions) arranged in a star design. This design is shown logically in Figure 2.

Figure 2 : Star Schema

3.4.1 Features of Star Schema


• Data is stored in denormalized dimension tables.
• It provides quick query response.
• The star schema is flexible: it can be changed or extended easily.
• It reduces the complexity of metadata for developers and end users.

3.5 ADVANTAGES AND DISADVANTAGES OF STAR SCHEMA
3.5.1 Advantages of Star Schema
Star schemas are easy for end users and applications to understand and navigate.
With a well-designed schema, users can quickly analyze large, multidimensional
data sets. The main advantages of star schemas in a decision-support environment
are:
• Query performance
Because a star schema database has a small number of tables and clear join paths,
queries run faster than they do against an OLTP system. Small single-table queries,
usually of dimension tables, are almost instantaneous. Large join queries that
involve multiple tables take only seconds or minutes to run.
In a star schema database design, the dimensions are linked only through the central
fact table. When two dimension tables are used in a query, only one join path,
intersecting the fact table, exists between those two tables. This design feature
enforces accurate and consistent query results.

• Load performance and administration
Structural simplicity also reduces the time required to load large batches of data
into a star schema database. By defining facts and dimensions and separating them
into different tables, the impact of a load operation is reduced. Dimension tables
can be populated once and occasionally refreshed. You can add new facts regularly
and selectively by appending records to a fact table.
• Built-in referential integrity
A star schema has referential integrity built in when data is loaded. Referential integrity is enforced because each record in a dimension table has a unique primary key, and all keys in the fact table are legitimate foreign keys drawn from the dimension tables. A fact record that is not correctly related to a dimension cannot be given a valid key value, so it cannot be retrieved through that dimension.
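Built-in referential integrity relies on the database rejecting fact rows whose foreign keys have no matching dimension row. A minimal sketch with Python's sqlite3 module; the table names, column names and the load_fact helper are hypothetical, and note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT)")
conn.execute("""
    CREATE TABLE sales_fact (
        item_key INTEGER NOT NULL REFERENCES item_dim(item_key),
        units_sold INTEGER
    )""")
conn.execute("INSERT INTO item_dim VALUES (1, 'Pen')")


def load_fact(item_key, units_sold):
    """Try to append one fact row; return False if the key has no dimension row."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (item_key, units_sold))
        return True
    except sqlite3.IntegrityError:  # raised as "FOREIGN KEY constraint failed"
        return False


valid_loaded = load_fact(1, 5)     # item 1 exists in item_dim
invalid_loaded = load_fact(99, 7)  # item 99 does not, so the row is rejected
fact_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(valid_loaded, invalid_loaded, fact_count)
```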
• Easily understood
A star schema is easy to understand and navigate, with dimensions joined only
through the fact table. These joins are meaningful to the end user, because they
represent the fundamental relationship between parts of the underlying business.
Users can also browse dimension table attributes before constructing a query.
3.5.2 Disadvantages of Star Schema
Optimizing a star schema for read queries and analysis involves certain trade-offs:
• Decreased data integrity: Because of the denormalized data structure, star schemas do not enforce data integrity very well. Although star schemas use countermeasures to prevent anomalies from developing, a simple insert or update command can still cause data inconsistencies.
• Less capable of handling diverse and complex queries: Database designers build and optimize star schemas for specific analytical needs. As denormalized data sets, they work best with a relatively narrow set of simple queries. Comparatively, a normalized schema permits a far wider variety of more complex analytical queries.
• No many-to-many relationships: Because they offer a simple dimension schema, star schemas do not work well for many-to-many data relationships.
Example 1: Suppose a star schema is composed of a Sales fact table, as shown in Figure 3a, and several dimension tables connected to it for Time, Branch, Item and Location.
Fact Table
Sales is the fact table.
Dimension Tables
The Time table has a column for each of day, month, quarter, year, etc.
The Item table has columns for item_key, item_name, brand, type and supplier_type.
The Branch table has columns for branch_key, branch_name and branch_type.
The Location table has columns of geographic data, including street, city, state and country.

Figure 3a: Example of Star Schema

The measures may be unit_sold and dollars_sold.
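Example 1 can be written out as tables to show how a query reaches two dimensions only through the central fact table, the single join path described under query performance. A minimal sketch using Python's sqlite3 module; the measures are written as units_sold and dollars_sold, and the sample rows, item names and brand names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                       quarter INTEGER, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state TEXT, country TEXT);
-- Central fact table: one foreign key per dimension plus the two measures.
CREATE TABLE sales (time_key INTEGER, item_key INTEGER, branch_key INTEGER,
                    location_key INTEGER, units_sold INTEGER, dollars_sold REAL);

INSERT INTO time_dim VALUES (1, 15, 1, 1, 2023), (2, 20, 7, 3, 2023);
INSERT INTO item_dim VALUES (1, 'Laptop', 'Acme', 'Computer', 'Wholesale'),
                            (2, 'Phone', 'Zenix', 'Mobile', 'Retail');
INSERT INTO branch_dim VALUES (1, 'Main', 'Flagship');
INSERT INTO location_dim VALUES (1, 'MG Road', 'Pune', 'Maharashtra', 'India');
INSERT INTO sales VALUES (1, 1, 1, 1, 2, 1800.0), (2, 1, 1, 1, 1, 900.0),
                         (2, 2, 1, 1, 3, 1200.0);
""")

# Time and Item are never joined to each other directly: the only path
# between them runs through the sales fact table.
rows = conn.execute("""
    SELECT t.quarter, i.brand, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales s
    JOIN time_dim t ON s.time_key = t.time_key
    JOIN item_dim i ON s.item_key = i.item_key
    GROUP BY t.quarter, i.brand
    ORDER BY t.quarter, i.brand
""").fetchall()
print(rows)
```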


Example 2:
The star schema works by dividing data into measurements and the “who, what,
where, when, why, and how” descriptive context. Broadly, these two groups are
facts and dimensions.
By doing this, the star schema methodology allows the business user to restructure their transactional database into smaller tables that are easier to fit together. Fact tables are then linked to their associated dimension tables through primary key/foreign key relationships. An example of this would be a quick grocery store purchase. The amount you spent and how many items you bought would be considered facts, but what you bought, when you bought it and the specific grocery store’s location would all be considered dimensions.
Once these two groups have been established, we can connect them by the unique transaction number associated with your specific purchase. Note that each fact, or measurement, is associated with multiple dimensions. This is what forms the star shape: the fact in the center, with the dimensions drawn out around it. The dimensions relating to the grocery store, the products you bought and you as the customer are each carefully separated into their own table with their own attributes.
This example is modeled as follows, and its star schema is depicted in Figure 3b.
Fact Table
Sales is the Fact Table.

Dimension Tables
The Store table consists of columns such as store_id, store_address, city, region, state and country.
The Customer table holds the descriptive attributes of the customer.
The Sales_Type table includes the sales_type_id and type_name columns.
The Product table consists of product_id, product_name and product_type.
The Time table consists of columns such as time_id, action_date, action_week, action_month, action_year and action_weekday.
The measures may be the amount spent and the number of items bought.

Figure 3b: Example of Star Schema
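The separation in Example 2, where a single purchase is split into measurements and descriptive context, can be sketched in plain Python. The flat record, its field names and the convention that the first column of each dimension is its key are assumptions for illustration, loosely following the Store, Product and Time tables above; Customer and Sales_Type are left out for brevity.

```python
# A hypothetical flat purchase record, as a point-of-sale system might emit it.
purchase = {
    "transaction_id": 5001,
    "store_id": 7, "city": "Pune", "country": "India",
    "product_id": 42, "product_name": "Rice 5kg", "product_type": "Staples",
    "time_id": 20230715, "action_date": "2023-07-15",
    "amount_spent": 540.0, "items_bought": 2,
}

MEASURES = ["amount_spent", "items_bought"]
# Each dimension lists its columns; the first column is its key.
DIMENSIONS = {
    "Store": ["store_id", "city", "country"],
    "Product": ["product_id", "product_name", "product_type"],
    "Time": ["time_id", "action_date"],
}


def split_record(record):
    """Split a flat record into one fact row and one row per dimension table."""
    dimension_rows = {
        name: {col: record[col] for col in cols} for name, cols in DIMENSIONS.items()
    }
    fact_row = {"transaction_id": record["transaction_id"]}
    for cols in DIMENSIONS.values():
        fact_row[cols[0]] = record[cols[0]]  # foreign key to the dimension
    for measure in MEASURES:
        fact_row[measure] = record[measure]
    return fact_row, dimension_rows


fact_row, dimension_rows = split_record(purchase)
print(fact_row)
print(dimension_rows["Product"])
```

The fact row ends up holding only keys and the two measures, while every descriptive attribute moves into a dimension row.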

Check Your Progress 1


1) Discuss the characteristics of a star schema.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………...
2) Draw a star schema for a marketing employee based in New York City, USA, who buys products and wants to compute the total number of products sold and the total sales amount.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………...
3.6 SNOWFLAKE SCHEMA
The other popular modeling technique is the snowflake schema. You can think of the flakes as the chocolate flakes on pastries and ice creams, which add extra flavor to them. Similarly, the snowflake schema is an extension of the star schema that adds further levels of dimension tables to give more meaning to the logical view of the database. These additional tables are more normalized than in a star schema. The centralized fact table relates to multiple dimension tables, which in turn relate to further dimension tables. This can become more complex if the dimensions are more detailed and span multiple levels. In the conceptual hierarchy, a child table may have multiple parent tables. Keep in mind that we are only extending, or flaking, the dimension tables, not the fact tables.
Snowflake Model
The snowflake model is the result of decomposing one or more of the dimensions. A snowflake schema in a data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape. A snowflake schema is an extension of a star schema that adds additional dimension tables: the dimension tables are normalized, which splits their data into additional tables.
In the following snowflake schema example, Country is further normalized into an individual table.
3.6.1 Features of Snowflake Schema
Following are the important features of the snowflake schema:
1. It has normalized tables.
2. It occupies less disk space.
3. It requires more lookup time, as many tables are interconnected through the extended dimensions.
Example
Figure 4 shows a snowflake schema for a case study of customers, sales and products, in which the location-wise quantity sold and number of items sold are calculated. The primary keys of the customer, product, date and store dimensions are saved in the fact table, where they act as foreign keys.
You will observe that two aggregate functions can be applied to calculate the quantity sold and the amount sold. Further, some dimensions are extended: customer to the type of customer, and store to territory-wise store information. Note that date has been expanded into date, month and year. This schema gives you more opportunity to perform detailed queries.


Figure 4: Snowflake Schema
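To make the normalization concrete, here is a minimal sketch of a snowflaked store/location dimension, in the spirit of Country being normalized into its own table. It uses Python's sqlite3 module; all table names, columns and rows are hypothetical. Answering a question by country now needs three joins instead of the single join a denormalized star dimension would need, which is the extra lookup cost listed among the snowflake features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimension: Store -> City -> Country, each level in its own table.
CREATE TABLE country_dim (country_key INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE city_dim    (city_key INTEGER PRIMARY KEY, city_name TEXT,
                          country_key INTEGER REFERENCES country_dim(country_key));
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city_key INTEGER REFERENCES city_dim(city_key));
-- The fact table is unchanged: it still references only the store level.
CREATE TABLE sales_fact  (store_key INTEGER REFERENCES store_dim(store_key),
                          quantity_sold INTEGER);

INSERT INTO country_dim VALUES (1, 'India'), (2, 'USA');
INSERT INTO city_dim VALUES (1, 'Pune', 1), (2, 'Mumbai', 1), (3, 'Boston', 2);
INSERT INTO store_dim VALUES (1, 'S1', 1), (2, 'S2', 2), (3, 'S3', 3);
INSERT INTO sales_fact VALUES (1, 10), (2, 5), (3, 7), (1, 3);
""")

# Quantity sold per country: fact -> store -> city -> country (three joins).
by_country = conn.execute("""
    SELECT co.country_name, SUM(f.quantity_sold)
    FROM sales_fact f
    JOIN store_dim s ON f.store_key = s.store_key
    JOIN city_dim ci ON s.city_key = ci.city_key
    JOIN country_dim co ON ci.country_key = co.country_key
    GROUP BY co.country_name
    ORDER BY co.country_name
""").fetchall()
print(by_country)
```

In a star schema, city and country would instead be repeated as columns on every store row; normalizing them out removes that redundancy and saves space, at the cost of the extra joins.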

3.7 ADVANTAGES AND DISADVANTAGES OF SNOWFLAKE SCHEMA
Following are the advantages of the snowflake schema:
• A snowflake schema occupies much less disk space than a star schema, and less disk space means lower storage requirements.
• A snowflake schema offers some protection from various data integrity issues, because normalization avoids storing the same descriptive data in several places. Many designers prefer the snowflake schema for this reason.
• Data is easier to maintain and more structured.
• Data quality is better than in a star schema.
Disadvantages of Snowflake Schema
• Complex data schemas: As you might imagine, snowflake schemas create
many levels of complexity while normalizing the attributes of a star schema.
This complexity results in more complicated source query joins. In offering
a more efficient way to store data, snowflake can result in performance
declines while browsing these complex joins. Still, processing technology
advancements have resulted in improved snowflake schema query
performance in recent years, which is one of the reasons why snowflake
schemas are rising in popularity.
• Slower at processing cube data: In a snowflake schema, the complex joins
result in slower cube data processing. The star schema is generally better
for cube data processing.
• Lower data integrity levels: While snowflake schemas offer greater normalization and fewer risks of data corruption after performing UPDATE and INSERT commands, they do not provide the level of transactional assurance that comes with a traditional, highly normalized database structure. Therefore, when loading data into a snowflake schema, it is vital to be careful and double-check the quality of the information after loading.
3.7.1 Star Schema Vs Snowflake Schema
Following are the differences between the star and snowflake schemas.

Normalized dimension tables
Star schema: The dimension tables are not normalized, so they may contain redundancies.
Snowflake schema: The dimension tables are normalized.

Queries
Star schema: Queries execute relatively faster, as fewer joins are needed to form a query.
Snowflake schema: Complex queries execute more slowly than in a star schema, as many joins and foreign key relations are needed to form a query, which affects performance.

Performance
Star schema: Faster execution and response time.
Snowflake schema: Slower performance compared with the star schema.

Storage space
Star schema: Requires more storage space than a snowflake schema, due to unnormalized tables.
Snowflake schema: Tables are easy to maintain and save storage space, due to normalized tables.

Usage
Star schema: Preferred when the dimension tables have fewer rows.
Snowflake schema: Preferred when the dimension tables contain a large number of rows.

Type of data warehouse
Star schema: Suitable for 1:1 or 1:many relationships, such as in data marts.
Snowflake schema: Used for complex relationships, such as many:many in enterprise data warehouses.

Dimension tables
Star schema: A single table for each dimension.
Snowflake schema: May have more than one dimension table for each dimension.

3.8 FACT CONSTELLATION SCHEMA


There is another schema for representing a multidimensional model. The term fact constellation comes from the image of a galaxy containing several stars. A fact constellation is a collection of star schemas whose fact tables share one or more dimension tables, as shown in Figure 5 below. This logical representation is mainly used in designing complex database systems.

Figure 5: Fact Constellation Schema


In Figure 5, it can be observed that there are two fact tables, and the two dimension tables in the pink boxes are the common dimension tables connecting both star schemas.
For example, suppose we are designing a fact constellation schema for placements and workshops in a university. Consider the following fact tables:
Placement (Stud_roll, Company_id, TPO_id): we need to calculate the number of students eligible and the number of students placed.
Workshop (Stud_roll, Institute_id, TPO_id): we need to find facts such as the number of students selected and the number of students who attended the workshop.
So, there are two fact tables, Placement and Workshop, which are part of two different star schemas having:
i) dimension tables Company, Student and TPO, in the star schema with fact table Placement, and
ii) dimension tables Training Institute, Student and TPO, in the star schema with fact table Workshop.
Both star schemas have two dimension tables (Student and TPO) in common, and hence they form a fact constellation or galaxy schema, as shown in Figure 6.

Figure 6: Fact Constellation of placement and workshop
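A minimal sketch of the Placement and Workshop constellation in Python's sqlite3 module. The two fact tables and their keys follow the text; the 0/1 measure columns (is_eligible, is_placed, is_selected, attended), the names and the sample rows are hypothetical. The shared TPO and Student dimensions are what let a single query combine facts from both stars.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions shared by both stars.
CREATE TABLE Student (stud_roll INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE TPO (tpo_id INTEGER PRIMARY KEY, tpo_name TEXT);
-- Dimensions used by one star each.
CREATE TABLE Company (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE Institute (institute_id INTEGER PRIMARY KEY, institute_name TEXT);
-- Two fact tables; the 0/1 measure columns are illustrative assumptions.
CREATE TABLE Placement (stud_roll INTEGER, company_id INTEGER, tpo_id INTEGER,
                        is_eligible INTEGER, is_placed INTEGER);
CREATE TABLE Workshop (stud_roll INTEGER, institute_id INTEGER, tpo_id INTEGER,
                       is_selected INTEGER, attended INTEGER);

INSERT INTO Student VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO TPO VALUES (1, 'TPO-A');
INSERT INTO Company VALUES (1, 'Infotech');
INSERT INTO Institute VALUES (1, 'Skills Academy');
INSERT INTO Placement VALUES (1, 1, 1, 1, 1), (2, 1, 1, 1, 0), (3, 1, 1, 0, 0);
INSERT INTO Workshop VALUES (1, 1, 1, 1, 1), (2, 1, 1, 1, 1), (3, 1, 1, 1, 0);
""")

# Facts from each star, reported side by side per TPO (a shared dimension).
per_tpo = conn.execute("""
    SELECT t.tpo_name,
           (SELECT SUM(is_eligible) FROM Placement p WHERE p.tpo_id = t.tpo_id),
           (SELECT SUM(is_placed) FROM Placement p WHERE p.tpo_id = t.tpo_id),
           (SELECT SUM(is_selected) FROM Workshop w WHERE w.tpo_id = t.tpo_id),
           (SELECT SUM(attended) FROM Workshop w WHERE w.tpo_id = t.tpo_id)
    FROM TPO t
""").fetchall()

# Students who attended a workshop and were placed, linked through Student.
placed_after_workshop = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM Student s
    JOIN Placement p ON p.stud_roll = s.stud_roll
    JOIN Workshop w ON w.stud_roll = s.stud_roll
    WHERE p.is_placed = 1 AND w.attended = 1
    ORDER BY s.name
""")]
print(per_tpo, placed_after_workshop)
```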

3.8.1 Advantages and Disadvantages of Fact Constellation Schema


Advantage
This schema is more flexible and gives wider perspective about the data warehouse
system.
Disadvantage
Because this schema connects two or more fact tables to form a constellation, it is complex to implement and maintain.

Check Your Progress 2

1. Compare and contrast the star schema with the snowflake schema.


……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………...
2. Suppose that a data warehouse consists of the dimensions time, doctor, ward and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit. Enumerate three classes of schemas that are popularly used for modeling data warehouses, and:
a) Draw a star schema diagram.
b) Draw a snowflake schema diagram.
……………………………………………………………………………..
…………………………………………………………………………..…
……………………………………………………………………………...

3.9 AGGREGATE TABLES


In a data warehouse, data is stored logically as a multidimensional cube. In the information technology industry, various tools are available to process the queries posed to the data warehouse engine. These tools are called business intelligence (BI) tools; they help to answer complex queries and to take decisions. The term aggregate here is very similar to aggregation over relational tables, which you should already be familiar with. Aggregate fact tables roll up the base fact tables of the schema to improve query processing, and BI tools can select the appropriate level of aggregation automatically to improve query performance. Aggregate fact tables contain foreign keys referring to dimension tables.
Points to note about aggregate tables:
1) They are also called summary tables.
2) They contain the pre-computed results of queries on the data warehouse schema.
3) They reduce the dimensionality of the base fact tables.
4) They can be used to respond to queries on the dimensions that they retain.

3.10 NEED FOR BUILDING AGGREGATE FACT TABLES


Let us understand the need for building aggregate tables. Aggregate tables are also referred to as pre-computed tables holding partially summarized data.
• Put in one word, it is about speed: quick responses to queries. You can understand an aggregate table as an intermediate table that stores the results of queries on disk. It is built using aggregate functions.
For example, suppose a company, ABC Corporation Limited, takes orders online and has millions of customer transactions placing orders. The dimension tables for the company could be Customer, Product and Order_date, and the fact table, say Fact_Orders, maintains all the orders placed. Generating a report of monthly orders by product type and by region needs aggregates, that is, summary tables, which can be obtained with a SQL GROUP BY query.
• An aggregate table occupies less space than the atomic fact table, and queries against it take considerably less time than the same queries against the base fact table.
• One of the more popular uses of aggregates is to adjust the granularity
of a dimension. When the granularity of a dimension is changed, the fact
table must be partially summarized to match the current grain of the new
dimension, resulting in the creation of new dimensional and fact tables that
fit this new grain standard.
• The Roll-up OLAP operation of the base fact tables generates aggregate
tables. Hence the query performance increases as it reduces the number of
rows to be accessed for the retrieval of data of a query.
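The ABC Corporation example can be sketched as follows with Python's sqlite3 module. The dimension names Customer, Product and Order_date and the fact table Fact_Orders come from the text; the columns (region, product_type, month, quantity, amount), the aggregate table name Agg_Orders_Month and the sample rows are assumptions. The aggregate is built once with GROUP BY, a roll-up to month, product type and region, and the monthly report is then read from it without touching the base fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, product_type TEXT);
CREATE TABLE Order_date (date_id INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE Fact_Orders (customer_id INTEGER, product_id INTEGER, date_id INTEGER,
                          quantity INTEGER, amount REAL);

INSERT INTO Customer VALUES (1, 'North'), (2, 'South');
INSERT INTO Product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO Order_date VALUES (1, '2023-01-05', '2023-01'), (2, '2023-01-20', '2023-01'),
                              (3, '2023-02-02', '2023-02');
INSERT INTO Fact_Orders VALUES (1, 1, 1, 2, 40.0), (1, 1, 2, 1, 20.0), (2, 1, 1, 3, 60.0),
                               (2, 2, 3, 1, 15.0), (1, 2, 3, 2, 30.0), (1, 1, 3, 1, 20.0);

-- Aggregate (summary) table: base facts rolled up to month x product type x region.
CREATE TABLE Agg_Orders_Month AS
SELECT d.month, p.product_type, c.region,
       SUM(f.quantity) AS total_quantity,
       SUM(f.amount) AS total_amount,
       COUNT(*) AS order_lines
FROM Fact_Orders f
JOIN Order_date d ON f.date_id = d.date_id
JOIN Product p ON f.product_id = p.product_id
JOIN Customer c ON f.customer_id = c.customer_id
GROUP BY d.month, p.product_type, c.region;
""")

base_rows = conn.execute("SELECT COUNT(*) FROM Fact_Orders").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM Agg_Orders_Month").fetchone()[0]

# The monthly report now reads only the small aggregate table.
report = conn.execute("""
    SELECT month, product_type, region, total_amount
    FROM Agg_Orders_Month
    ORDER BY month, product_type, region
""").fetchall()
print(base_rows, agg_rows)
for row in report:
    print(row)
```

With six sample rows the saving is small, but with millions of order lines the aggregate's size is bounded by the number of month, product type and region combinations rather than by the number of orders.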

3.11 AGGREGATE FACT TABLES AND DERIVED DIMENSION TABLES
Aggregate facts are produced by calculating measures from more atomic fact tables. They are computed with SQL aggregate functions such as SUM, AVG, MIN, MAX and COUNT, usually together with GROUP BY. Aggregate fact tables produce summary statistics, and whenever speedy query handling is required, they are often the best option.
• Basically, aggregates allow you to store intermediate results, or to pre-calculate the subqueries or queries fired on a data warehouse, by summing data up to higher levels and storing them in a separate star.
• You can understand an aggregate fact table as a conformed copy of the fact table, since it should give you the same query results as the detailed fact table.
• Aggregate fact tables can be used with large datasets or when there is a large number of queries. They reduce the response time of queries fired by users or customers, and they are very useful in business intelligence application tools.
The levels at which facts are stored become especially important when you begin to have complex queries with multiple facts in multiple tables that are stored at levels different from one another, and when a reporting request involves still a different level. You must be able to support fact reporting at the business levels which users require. There is nothing wrong with enhancing an aggregate with new facts or deriving new dimensions. For measures, the only issue is whether the new measures are atomic in the context of the aggregate fact. If, however, the new measures are received at a lower grain, you would be better off creating a new atomic fact for those measures before incorporating the summarized measures into the aggregate. This allows the new measures to be used for other purposes without having to go back to the source.
Let's say we have a fact table, FactBillReceipt, that holds the transactions of each month. There can be different types of transaction receipts during a month for each supplier, and this huge volume of data would result in a lot of calculations. So, we would build another aggregate table derived from the base table.
FactBillMonthReceipt: It contains aggregated receipts per month, per supplier. The problem is that it has additional foreign keys, such as supplier_status for the month. To solve this, we have the concept of derived tables, which contain additional measures and foreign keys that are not present in the base fact table.
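A minimal sketch of deriving FactBillMonthReceipt from FactBillReceipt, using Python's sqlite3 module. The column names and rows are hypothetical, and supplier_status is left out to keep the example short. The final comparison illustrates the property stated earlier for aggregate fact tables: for queries at or above its grain, the aggregate must return the same answers as the detailed fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base fact table: one row per receipt (column names are illustrative).
CREATE TABLE FactBillReceipt (receipt_id INTEGER PRIMARY KEY, supplier_id INTEGER,
                              receipt_date TEXT, amount INTEGER);
INSERT INTO FactBillReceipt VALUES
    (1, 1, '2023-01-03', 100), (2, 1, '2023-01-17', 250), (3, 2, '2023-01-09', 80),
    (4, 1, '2023-02-11', 120), (5, 2, '2023-02-25', 60), (6, 2, '2023-02-27', 40);

-- Aggregate derived from the base table: one row per month per supplier.
CREATE TABLE FactBillMonthReceipt AS
SELECT substr(receipt_date, 1, 7) AS receipt_month, supplier_id,
       SUM(amount) AS total_amount, COUNT(*) AS receipt_count
FROM FactBillReceipt
GROUP BY substr(receipt_date, 1, 7), supplier_id;
""")

# The same question (total per supplier) answered from both tables.
from_base = conn.execute(
    "SELECT supplier_id, SUM(amount) FROM FactBillReceipt "
    "GROUP BY supplier_id ORDER BY supplier_id").fetchall()
from_aggregate = conn.execute(
    "SELECT supplier_id, SUM(total_amount) FROM FactBillMonthReceipt "
    "GROUP BY supplier_id ORDER BY supplier_id").fetchall()
aggregate_rows = conn.execute("SELECT COUNT(*) FROM FactBillMonthReceipt").fetchone()[0]

print(from_base, from_aggregate, aggregate_rows)
assert from_base == from_aggregate  # the aggregate must agree with the detail
```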
Conformed Dimension
A conformed dimension is a dimension that is shared across multiple data marts or subject areas. An organization may use the same dimension table across different projects without making any changes to it.
Derived Tables
Derived tables are a significant addition to the data warehouse. They are used to create second-level data marts for cross-functional analysis.
Consolidated fact tables: A consolidated fact table combines data from different fact tables into a schema with a common grain.
For example, consider designing a data warehouse schema for a sales department, assuming the following entities and their respective grains:
Sales: employee, date and product.
Budget: department, financial year and quarter.
Product can have various attributes, such as product size, product_category, etc. Note that the product attributes keep changing as requirements change, but the product dimension remains the same, so it is better to keep Product as a separate dimension.
Let's design the tables and their grains.

Figure 7: Aggregate Tables and Derived tables


Derived tables are very useful because they put less calculation load on the data warehouse engine.
Check Your Progress 3
1. Discuss the limitations of Aggregate Fact tables.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………...

3.12 SUMMARY
This unit presented the basics of data warehouse design, focusing on the various kinds of modeling and schemas. It explored the grains, facts and dimensions of schemas. It is important to understand dimensional modeling, as the appropriate modeling technique yields correct responses to queries.
Dimensional modeling is a design technique used to optimize a data warehouse for query retrieval operations. There are various schema designs; this unit discussed the star, snowflake and fact constellation schemas. These schemas, ranging from denormalized to normalized, use dimension, fact, derived and aggregate fact tables. Every table has a purpose and is used for efficient design in terms of space and query handling. This unit discussed the pros and cons of each, and a number of examples were used to explain the design in different scenarios.

3.13 SOLUTIONS/ANSWERS
Check Your Progress 1:
1) Characteristics of Star Schema:
• Every dimension in a star schema is represented with only one dimension table.
• The dimension table should contain the set of attributes.
• The dimension table is joined to the fact table using a foreign key.
• The dimension tables are not joined to each other.
• The fact table contains keys and measures.
• The star schema is easy to understand and navigate.
• The dimension tables are not normalized. For instance, in the above figure, Country ID does not have a Country lookup table as an OLTP design would have.
• The schema is widely supported by BI tools.

2)

Figure 8: Star Schema

Check Your Progress 2:


1:
Logical arrangement
Star schema: One fact table surrounded by dimension tables, like a star.
Snowflake schema: One fact table with dimension tables, which are further normalized into other dimension tables.

Joins
Star schema: Fewer joins are needed to fetch the data (one join between the fact table and each dimension used).
Snowflake schema: Many joins are needed to fetch the data.

Design and query response
Star schema: Simple database design; query response time is low.
Snowflake schema: Complex database design; query response time is high.

Normalization
Star schema: The data is not normalized, so the level of redundancy is high.
Snowflake schema: The data is normalized, so the level of redundancy is low.
2: a. Star Schema of Hospital Management

Figure 9: Star Schema of Hospital Management System


b. Snowflake Schema of Hospital Management

Figure 10: Snowflake Schema of Hospital Management System

Check Your Progress 3:


1.
Limitations of aggregate fact tables: Building aggregate tables takes a lot of time, because the rows of the base fact table must be scanned. There are also more tables to manage, and computing the aggregates can be costly in terms of size. Based on a greedy approach, the size of the aggregates is decided using a hashing technique. If there are n dimensions in the table, then there can be $2^n$ possible aggregates. As a result, loading the data warehouse becomes more complex.
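The $2^n$ count follows from the fact that each aggregate keeps some subset of the n dimensions. A short Python sketch enumerating these subsets for the four hospital dimensions from Check Your Progress 2; it ignores hierarchy levels within a dimension and counts both the full base level and the grand total.

```python
from itertools import combinations


def possible_aggregates(dimensions):
    """Every subset of dimensions an aggregate could group by (2 ** n subsets)."""
    return [
        subset
        for size in range(len(dimensions) + 1)
        for subset in combinations(dimensions, size)
    ]


dims = ["time", "doctor", "ward", "patient"]
aggregates = possible_aggregates(dims)
print(len(aggregates))   # 16, i.e. 2 ** 4
print(aggregates[0])     # () : the grand total, with no dimension kept
print(aggregates[-1])    # ('time', 'doctor', 'ward', 'patient') : the base level
```

If each dimension also has hierarchy levels (for example day, month and year for time), the number of candidate aggregates grows further, which is why choosing which aggregates to build matters.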

3.14 FURTHER READINGS


• Building the Data Warehouse, William H. Inmon, Wiley, 4th Edition, 2005.
• Data Warehousing Fundamentals, Paulraj Ponniah, Wiley Student Edition.
• Data Warehousing, Reema Thareja, Oxford University Press.
• Data Warehousing, Data Mining & OLAP, Alex Berson and Stephen J. Smith, Tata McGraw-Hill Edition, 2016.
