
Dimensional Modeling and Schemas: Data Modeling Research Paper



Report By,
Rajasree

Abstract
The data modeling technique to use depends on the objective of the resulting data model. A data model helps to clearly identify the objective of a project. Dimensional modeling is a technique generally used for data warehouses and data mining. Because a data warehouse serves many analyses and queries, a data model makes its contents easier to understand.

Introduction
The dimensional data model is mostly used in data warehousing systems. It differs from the entity-relationship model in that it is built from facts and dimensions. The terms commonly used in this type of modeling are:
Dimension: A category of information. For example, Time is a dimension.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time dimension.
Hierarchy: The levels that represent the relationship between different attributes within a dimension. For example, a hierarchy in the Time dimension can be represented as Year → Quarter → Month → Day.
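The three terms above can be sketched in a few lines of Python. This is only an illustration: the names TIME_HIERARCHY, time_attributes and roll_up are assumptions made for this sketch and do not come from any particular tool.

```python
from datetime import date

# Levels of the Time dimension hierarchy, from coarsest to finest.
TIME_HIERARCHY = ["year", "quarter", "month", "day"]

def time_attributes(d: date) -> dict:
    """Return the attribute value at every level of the hierarchy for one date."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "month": d.month,
        "day": d.day,
    }

def roll_up(d: date, level: str) -> tuple:
    """Return the attribute values from the top of the hierarchy down to `level`."""
    attrs = time_attributes(d)
    depth = TIME_HIERARCHY.index(level) + 1
    return tuple(attrs[name] for name in TIME_HIERARCHY[:depth])

print(roll_up(date(2024, 5, 17), "quarter"))  # (2024, 2)
```

Rolling a date up to a coarser level simply drops the finer attributes, which is what a query does when it groups by quarter instead of by day.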

Dimensional Model Tables


Fact Table: A table that holds numbers, or measures. For example, sales_amount is a measure. A measure is recorded in the fact table at a particular granularity. For instance, sales_amount can be recorded per store and per day. In that case the table has three attributes: date (to identify each day's sales), store (to identify each store's sales) and sales_amount (the actual measure).
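As a minimal sketch of the example above, the fact rows can be written as (date, store, sales_amount) tuples and summed along either attribute. The values and the helper name total_by are made up for illustration.

```python
from collections import defaultdict

# One fact row per (date, store): (date, store, sales_amount).
# All values are invented for this sketch.
sales_fact = [
    ("2024-01-01", "store_a", 120.0),
    ("2024-01-01", "store_b", 80.0),
    ("2024-01-02", "store_a", 100.0),
    ("2024-01-02", "store_b", 50.0),
]

def total_by(rows, position):
    """Sum sales_amount grouped by the column at `position` (0 = date, 1 = store)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[position]] += row[2]
    return dict(totals)

sales_per_day = total_by(sales_fact, 0)
sales_per_store = total_by(sales_fact, 1)
print(sales_per_day)
print(sales_per_store)
```

The same rows answer both questions (sales per day and sales per store) because the measure is stored at the finest level that either question needs.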

Lookup Table: A table that provides detailed information about the attributes present in the fact table. For example, consider a bank model. The bank entity can have a branch attribute that covers the many branches associated with that particular bank. Each branch has a unique branch code, and further fields hold its details.

A dimensional model has both fact and lookup tables. A single fact table can have relationships with many lookup tables, but two fact tables cannot be connected directly with each other.
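The relationship between a fact table and a lookup table might look like the following sketch, which uses Python's built-in sqlite3 module. The table and column names (branch, deposit_fact, deposit_amount) and the data are assumptions chosen to match the bank example, not a schema from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: one row per branch, identified by a unique branch code.
    CREATE TABLE branch (
        branch_code TEXT PRIMARY KEY,
        branch_name TEXT,
        city        TEXT
    );
    -- Fact table: a measure recorded per branch and day (hypothetical).
    CREATE TABLE deposit_fact (
        date           TEXT,
        branch_code    TEXT REFERENCES branch(branch_code),
        deposit_amount REAL
    );
    INSERT INTO branch VALUES ('B001', 'Main Street', 'Springfield'),
                              ('B002', 'Harbour', 'Shelbyville');
    INSERT INTO deposit_fact VALUES ('2024-01-01', 'B001', 500.0),
                                    ('2024-01-01', 'B002', 300.0),
                                    ('2024-01-02', 'B001', 200.0);
""")

# The fact table stores only the code; the lookup table supplies the details.
rows = conn.execute("""
    SELECT b.branch_name, SUM(f.deposit_amount)
    FROM deposit_fact AS f
    JOIN branch AS b ON b.branch_code = f.branch_code
    GROUP BY b.branch_name
    ORDER BY b.branch_name
""").fetchall()
print(rows)
```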

Dimensional Model Schemas


A schema is a collection of database objects, including tables, views and indexes. Schema objects can be arranged in different ways in the schema models. The most common schema types used for designing data models for data warehouses and data marts are the star schema and the snowflake schema. The third normal form schema is also used for data warehousing, and there are many other schemas as well. The schema model is selected based on the requirements of the data warehouse project. The different types of schemas are described below.

Third Normal Form
It is similar to classical relational database modeling. Normalization reduces redundancy in the data, so this schema contains no redundant data, and the normalization process produces a large number of tables. For example, the orders and order items tables together contain information similar to that of the sales table in the star schema.
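To illustrate the last point, the sketch below builds two normalized tables and derives sales-like rows from them with a join. The column names (order_date, store, product, quantity, unit_price) and the data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized tables: order header and order lines are kept apart.
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT,
        store      TEXT
    );
    CREATE TABLE order_items (
        order_id   INTEGER REFERENCES orders(order_id),
        product    TEXT,
        quantity   INTEGER,
        unit_price REAL
    );
    INSERT INTO orders VALUES (1, '2024-01-01', 'store_a'),
                              (2, '2024-01-01', 'store_a');
    INSERT INTO order_items VALUES (1, 'pen', 2, 1.5),
                                   (1, 'ink', 1, 4.0),
                                   (2, 'pen', 4, 1.5);
""")

# Joining the two tables yields the kind of rows a sales fact table would hold.
sales_like = conn.execute("""
    SELECT o.order_date, o.store, i.product,
           SUM(i.quantity)                AS quantity_sold,
           SUM(i.quantity * i.unit_price) AS amount
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
    GROUP BY o.order_date, o.store, i.product
    ORDER BY i.product
""").fetchall()
print(sales_like)
```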

3NF schemas are generally used for modeling large data warehouses, especially environments with significant data-loading requirements that are used to feed data marts and execute long-running queries. The main advantages of 3NF schemas:
It provides a neutral schema design.
It is independent of any application or data-usage considerations.
It may require less data transformation.
It is highly normalized.

The main disadvantage of 3NF schemas is that they include many tables.

Star Schema
It is the simplest data warehouse schema. The name comes from the fact that the schema diagram looks like a star. Generally, a fact table is located in the center and is connected to many dimension (lookup) tables. The fact table contains the attributes with the primary information, and the lookup tables contain the details of the attributes present in the fact table. A single fact table can have many dimension or lookup tables. There is no direct relationship between the dimension or lookup tables. The diagram below represents the model of a star schema.

For example, in the sample schema, the fact table, sales, contains the measures quantity_sold, amount, and average, and the keys time_key, item_key, branch_key, and location_key. The dimension tables are time, branch, item and location.
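A minimal sketch of this sample schema in SQLite, driven from Python, might look as follows. The fact table name, its keys and the measures quantity_sold and amount follow the text (the average measure is left out for brevity). The _dim suffix on the dimension tables, their descriptive columns and all data are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (descriptive columns are hypothetical).
    CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day TEXT, month INTEGER, year INTEGER);
    CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT);

    -- Fact table in the center: one key per dimension plus the measures.
    CREATE TABLE sales (
        time_key      INTEGER REFERENCES time_dim(time_key),
        item_key      INTEGER REFERENCES item_dim(item_key),
        branch_key    INTEGER REFERENCES branch_dim(branch_key),
        location_key  INTEGER REFERENCES location_dim(location_key),
        quantity_sold INTEGER,
        amount        REAL
    );

    INSERT INTO time_dim VALUES (1, '2024-01-01', 1, 2024), (2, '2024-02-01', 2, 2024);
    INSERT INTO item_dim VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO branch_dim VALUES (1, 'north');
    INSERT INTO location_dim VALUES (1, '1 High St', 'Springfield'),
                                    (2, '9 Low Rd', 'Shelbyville');
    INSERT INTO sales VALUES (1, 1, 1, 1, 10, 15.0),
                             (1, 2, 1, 2, 2, 8.0),
                             (2, 1, 1, 1, 4, 6.0);
""")

# A typical star query: join the fact table to the dimensions it needs,
# then aggregate the measures by dimension attributes.
star_query = """
    SELECT l.city, t.month, SUM(s.quantity_sold), SUM(s.amount)
    FROM sales AS s
    JOIN time_dim AS t     ON t.time_key = s.time_key
    JOIN location_dim AS l ON l.location_key = s.location_key
    GROUP BY l.city, t.month
    ORDER BY l.city, t.month
"""
result = conn.execute(star_query).fetchall()
print(result)
```

Every dimension is reached with exactly one join from the fact table, which is why star queries stay short.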

Advantages of star schema:
It provides a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
It provides highly optimized performance for typical star queries.
It is widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contains dimension tables.

Disadvantages of star schema:
It occupies more space.
It is highly denormalized.

Snowflake Schema
This schema is more complex than a star schema, and is a variation of it. It is called a snowflake schema because its diagram looks like a snowflake. Snowflake schemas remove redundancy by normalizing the dimensions. As in third normal form, multiple tables are used instead of a single table to eliminate redundancy, which is done by dividing one large dimension table into several smaller ones. As in a star schema, more than one dimension table can be linked to a single fact table. Unlike in a star schema, there can be direct relationships between dimension tables.

The diagram below represents the snowflake schema model:

For example, the location dimension table shown in the star schema can be normalized by dividing it into a location table and a city table, as in the snowflake schema diagram. This saves space even though it increases the number of dimension tables, and it requires more complex queries.
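The location example can be sketched as follows. The split into location and city tables comes from the text; the column names (street, city_name, country) and the data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- City details are stored once and referenced from location.
    CREATE TABLE city (city_key INTEGER PRIMARY KEY, city_name TEXT, country TEXT);
    CREATE TABLE location (
        location_key INTEGER PRIMARY KEY,
        street       TEXT,
        city_key     INTEGER REFERENCES city(city_key)
    );
    CREATE TABLE sales (
        location_key INTEGER REFERENCES location(location_key),
        amount       REAL
    );

    INSERT INTO city VALUES (1, 'Springfield', 'USA');
    INSERT INTO location VALUES (1, '1 High St', 1), (2, '2 Elm St', 1);
    INSERT INTO sales VALUES (1, 10.0), (2, 5.0), (1, 2.5);
""")

# Reaching the city name now takes two joins instead of one.
by_city = conn.execute("""
    SELECT c.city_name, SUM(s.amount)
    FROM sales AS s
    JOIN location AS l ON l.location_key = s.location_key
    JOIN city AS c     ON c.city_key = l.city_key
    GROUP BY c.city_name
""").fetchall()
city_rows = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
print(by_city, city_rows)
```

Both locations share a single city row, which is where the space saving comes from, and the extra join is the price paid for it.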

Advantages of snowflake schema:
In some cases it may improve performance, because smaller tables are joined.
It is easier to maintain.
It increases flexibility.

Disadvantages of snowflake schema:
It increases the number of tables.
It makes queries more complex, because more tables need to be joined.

Comparison between Star and Snowflake Schemas:

Joins
Star schema: Fewer joins, as there are no links between dimension tables.
Snowflake schema: More joins, as there are links between dimension tables.

Ease of use
Star schema: Less complex queries that are easy to understand, as it includes only a few dimension tables.
Snowflake schema: More complex queries that are harder to understand, as the normalized dimension tables increase the number of tables.

Query performance
Star schema: Fewer foreign keys and hence shorter query execution time, because there is only a single level of dimension tables.
Snowflake schema: More foreign keys and hence longer query execution time, because of the multiple levels of dimension tables and normalization.

Ease of maintenance/change
Star schema: Has redundant data and is hence harder to maintain and change, as it lacks normalization.
Snowflake schema: Has no redundancy and is hence easier to maintain and change, as it is normalized.

Type of data warehouse
Star schema: Good for large data warehouses, as it is simple to understand.
Snowflake schema: Good for small data warehouses and data marts, as it is complex.

Dimension table
Star schema: Contains only a single dimension table for each dimension.
Snowflake schema: May have more than one dimension table for each dimension.

Other Types of Schemas


Fact Constellation Schema
This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. A fact table is split only when we want to focus on aggregation over a few facts and dimensions. This schema can have multiple fact tables, and a single dimension table can be shared by more than one fact table.

As shown in the figure, the Sales fact table has a location_key attribute and the Shipping fact table has from_location and to_location attributes. The attributes from both fact tables can be looked up in the location dimension table, because that table describes them.
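A minimal sketch of two fact tables sharing one location dimension is given below. The key names location_key, from_location and to_location follow the text; the table names sales_fact and shipping_fact, the measures amount and units_shipped, and the data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One dimension table shared by both fact tables.
    CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales_fact (
        location_key INTEGER REFERENCES location(location_key),
        amount       REAL
    );
    CREATE TABLE shipping_fact (
        from_location INTEGER REFERENCES location(location_key),
        to_location   INTEGER REFERENCES location(location_key),
        units_shipped INTEGER
    );
    INSERT INTO location VALUES (1, 'Springfield'), (2, 'Shelbyville');
    INSERT INTO sales_fact VALUES (1, 20.0), (2, 5.0);
    INSERT INTO shipping_fact VALUES (1, 2, 30), (2, 1, 7);
""")

# The shipping fact joins the same dimension twice, once per role.
routes = conn.execute("""
    SELECT src.city, dst.city, sh.units_shipped
    FROM shipping_fact AS sh
    JOIN location AS src ON src.location_key = sh.from_location
    JOIN location AS dst ON dst.location_key = sh.to_location
    ORDER BY sh.units_shipped DESC
""").fetchall()

# The sales fact uses the very same dimension table.
sales_by_city = conn.execute("""
    SELECT l.city, SUM(s.amount)
    FROM sales_fact AS s
    JOIN location AS l ON l.location_key = s.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
print(routes, sales_by_city)
```

Note that the two fact tables are never joined to each other directly; they only meet through the shared dimension.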

The main disadvantage of this schema is that it is a complicated model, as it includes many aggregations.

Starflake Schema
A starflake schema is a hybrid of the star schema and the snowflake schema, and takes the benefits of both. Some dimension tables are normalized and some are denormalized. To normalize the schema, the shared dimensional hierarchies are placed in outriggers. The model of the starflake schema is shown below:

Example:

As we can see here, sales transaction is the fact table. Locations, Products and Time are dimension tables that represent the star schema. The other dimension tables represent the snowflake schema.

Galaxy Schema
A galaxy schema contains many fact tables with some common dimensions. This schema is a combination of many data marts.

Dimensional Modeling Process


There is a process for designing a dimensional model schema. The steps are described below:
Choose the business process: State the business process for which the schema model is being built. This helps to ensure that the dimensional data model, and the data warehouse built on it, is actually usable.
Declare the grain: State exactly what a single row of the fact table represents, for example the sales of one store on one day. The grain fixes the level of detail at which the measures are recorded.
Identify the dimensions: Identifying the dimensions is very important, as they are the foundation of the fact table. Examples of dimensions are time and date. The dimensions hold the detailed information about the attributes of the fact table.
Identify the facts: Facts are numeric values, also called measures. Identifying the facts is important because they are closely related to the business users of the system.
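The four steps can be written down as a small design record, as in the sketch below. It assumes that the grain is read as the set of columns that identifies one fact row, so that a declared grain can be checked against sample data. The names design and grain_violations and all values are hypothetical.

```python
from collections import Counter

def grain_violations(fact_rows, grain):
    """Return the grain-key values that occur in more than one fact row."""
    keys = Counter(tuple(row[col] for col in grain) for row in fact_rows)
    return [key for key, count in keys.items() if count > 1]

# Hypothetical outcome of the four design steps.
design = {
    "business_process": "retail sales",   # step 1
    "grain": ("date", "store"),           # step 2: one row per store per day
    "dimensions": ["date", "store"],      # step 3
    "facts": ["sales_amount"],            # step 4
}

rows = [
    {"date": "2024-01-01", "store": "store_a", "sales_amount": 120.0},
    {"date": "2024-01-01", "store": "store_b", "sales_amount": 80.0},
    {"date": "2024-01-01", "store": "store_a", "sales_amount": 30.0},
]

# The third row repeats the (date, store) key of the first one.
print(grain_violations(rows, design["grain"]))  # [('2024-01-01', 'store_a')]
```

A non-empty result suggests that either the data needs aggregating to the declared grain or the grain itself was declared too coarsely.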

Advantages of Dimensional Model


It uses standard types of joins.
All dimensions are symmetrically equal entry points into the fact table, and query strategies are symmetrical as well.
The model is flexible, as it can easily accommodate new data elements and design strategies.
It can easily handle common modeling situations, such as slowly changing dimensions and heterogeneous products in the business, with standard approaches.
It provides better understanding.
Because of denormalization, query performance is high.

Conclusion
A dimensional model is used to map the aspects of each process within the business. Schemas modeled according to dimensional modeling principles are very beneficial and help to read large amounts of data quickly and easily. This makes the data easier to query and to analyze.

