Dimensional Modeling and Schemas: Data Modeling Research Paper
Report By,
Rajasree
Abstract
The choice of data modeling technique depends on the objective of the resulting data model. A data model helps to identify the objective of a project clearly. Dimensional modeling is a technique generally used for data warehouses and data mining. Because a data warehouse supports many analyses and queries, a data model makes its contents easier to understand.
Introduction
The dimensional data model is mostly used in data warehousing systems. It differs from the entity-relationship model in that it is built from facts and dimensions. The terms commonly used in this type of modeling are:
Dimension: A category of information. For example, Time is a dimension.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time dimension.
Hierarchy: The levels that represent the relationships between different attributes within a dimension. For example, a hierarchy in the Time dimension can be represented as Year → Quarter → Month → Day.
Lookup Table: A table that provides detailed information about the attributes present in the fact table. For example, consider a bank model. The bank entity can have a branch attribute that covers the many branches associated with that particular bank. Each branch has a field holding a unique branch code and other fields holding its details.
A dimensional model has fact tables and lookup tables. A single fact table can be related to many lookup tables, but two fact tables cannot be connected directly with each other.
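The Time hierarchy described above (Year, Quarter, Month, Day) can be sketched in a few lines of Python. This is only an illustration: the names TIME_HIERARCHY, time_attributes, and roll_up are hypothetical and do not come from any particular tool, and the calendar-quarter rule is an assumption about how quarters are defined.

```python
from datetime import date

# Levels of the Time hierarchy, from coarsest to finest (as listed in the text).
TIME_HIERARCHY = ["year", "quarter", "month", "day"]


def time_attributes(d: date) -> dict:
    """Derive one attribute value per hierarchy level for a calendar date."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,  # assumes calendar quarters
        "month": d.month,
        "day": d.day,
    }


def roll_up(attrs: dict, level: str) -> tuple:
    """Return the values from the top of the hierarchy down to `level`."""
    depth = TIME_HIERARCHY.index(level) + 1
    return tuple(attrs[name] for name in TIME_HIERARCHY[:depth])


attrs = time_attributes(date(2024, 8, 15))
quarter_level = roll_up(attrs, "quarter")
print(quarter_level)  # (2024, 3)
```

Rolling up to a coarser level simply drops the finer attributes, which is what grouping by Quarter instead of Day does in a query.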
Third normal form (3NF) schemas are generally used to model large data warehouses, especially environments with significant data-loading requirements that feed data marts and execute long-running queries. The main advantages of 3NF schemas:
It provides a neutral schema design
It is independent of any application or data-usage considerations
It may require less data transformation
It is highly normalized
The main disadvantage of 3NF schemas is that they include many tables.
Star Schema
The star schema is the simplest data warehouse schema. It gets its name because its diagram looks like a star. Generally, a fact table sits at the center and is connected to many dimension (lookup) tables. The fact table contains the attributes with the primary information. The lookup tables contain the details of the attributes present in the fact table. A single fact table can have many dimension or lookup tables, but there is no direct relationship between the dimension or lookup tables themselves. The diagram below represents the model of a star schema.
For example, in the sample schema, the fact table, sales, contains the measures quantity_sold, amount, and average, and the keys time_key, item_key, branch_key, and location_key. The dimension tables are time, branch, item, and location.
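The sample star schema can be sketched with Python's built-in sqlite3 module. The table names and key columns follow the text; the non-key columns (year, month, item_name, branch_name, city, country) and all data rows are hypothetical, and the average measure is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension (lookup) tables: one denormalized table per dimension.
# Non-key columns are hypothetical; "time" is quoted only as a precaution.
conn.executescript("""
CREATE TABLE "time"   (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table at the center of the star: foreign keys plus measures.
CREATE TABLE sales (
    time_key      INTEGER REFERENCES "time"(time_key),
    item_key      INTEGER REFERENCES item(item_key),
    branch_key    INTEGER REFERENCES branch(branch_key),
    location_key  INTEGER REFERENCES location(location_key),
    quantity_sold INTEGER,
    amount        REAL
);
""")

# Hypothetical sample rows.
conn.executemany('INSERT INTO "time" VALUES (?, ?, ?)', [(1, 2024, 1), (2, 2024, 2)])
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "Laptop"), (2, "Phone")])
conn.executemany("INSERT INTO branch VALUES (?, ?)", [(1, "Main")])
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?)",
    [(1, "City A", "Country X"), (2, "City B", "Country X")],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 2, 2000.0),
        (1, 2, 1, 1, 1, 500.0),
        (2, 1, 1, 1, 1, 1000.0),
        (2, 1, 1, 2, 1, 1000.0),
        (2, 2, 1, 2, 3, 900.0),
    ],
)

# A typical star query: each dimension is exactly one join away from the fact table.
rows = conn.execute("""
SELECT l.city, i.item_name, SUM(s.quantity_sold), SUM(s.amount)
FROM sales AS s
JOIN location AS l ON l.location_key = s.location_key
JOIN item     AS i ON i.item_key = s.item_key
GROUP BY l.city, i.item_name
ORDER BY l.city, i.item_name
""").fetchall()

for row in rows:
    print(row)
```

Note that the query never joins one dimension table to another, which matches the rule that dimension tables in a star schema have no direct relationship with each other.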
Advantages of Star Schema:
Provides a direct and intuitive mapping between the business entities being analyzed by end users and the schema design
Provides highly optimized performance for typical star queries
Is widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contain dimension tables
Disadvantages of Star Schema:
Occupies more space
Highly denormalized
Snowflake Schema
The snowflake schema is a more complex model than the star schema, and is a variation of it. It gets its name because its diagram looks like a snowflake. Snowflake schemas remove redundancy by normalizing the dimensions. That is, as in third normal form, multiple tables are used instead of a single table to eliminate redundancy. This is done by dividing one large table into multiple dimension tables. As in the star schema, more than one dimension table can be linked to a single fact table. Unlike the star schema, there can also be direct relationships between dimension tables.
For example, the location dimension table shown in the star schema can be normalized by dividing it into a location table and a city table, as shown in the snowflake schema below. This saves space even though it increases the number of dimension tables. It also requires more complex queries.
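The location example can be sketched with Python's sqlite3 module to show the extra join that normalization introduces. The split into location and city tables follows the text; the column names (street, city_key, city, country) and the data rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- City details are stored once instead of being repeated on every location row.
CREATE TABLE city (
    city_key INTEGER PRIMARY KEY,
    city     TEXT,
    country  TEXT
);

-- The location dimension now points at the city table (dimension-to-dimension link).
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city(city_key)
);

CREATE TABLE sales (
    location_key INTEGER REFERENCES location(location_key),
    amount       REAL
);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [(1, "City A", "Country X"), (2, "City B", "Country X")],
)
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?)",
    [(1, "1 First St", 1), (2, "2 Second St", 1), (3, "3 Third St", 2)],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100.0), (2, 50.0), (3, 75.0), (1, 25.0)],
)

# Reaching the city name now takes two joins: sales -> location -> city.
rows = conn.execute("""
SELECT c.city, SUM(s.amount)
FROM sales AS s
JOIN location AS l ON l.location_key = s.location_key
JOIN city     AS c ON c.city_key = l.city_key
GROUP BY c.city
ORDER BY c.city
""").fetchall()

print(rows)  # [('City A', 175.0), ('City B', 75.0)]
```

In a star schema the same question would need only the join from sales to location, because the city name would be stored directly on each location row.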
Advantages of Snowflake Schema:
In some cases it may improve performance because smaller tables are joined
It is easier to maintain
It increases flexibility
Disadvantages of Snowflake Schema:
Increases the number of tables
Makes queries more complex because more tables need to be joined
Comparison between Star and Snowflake Schemas:
Joins
Star schema: Fewer joins (there are no links between dimension tables)
Snowflake schema: More joins (there are links between dimension tables)
Ease of use
Star schema: Less complex queries that are easy to understand (it includes only a few dimension tables)
Snowflake schema: More complex queries that are harder to understand (normalized dimension tables increase the number of tables)
Query performance
Star schema: Fewer foreign keys and hence shorter query execution time (only a single level of dimension tables)
Snowflake schema: More foreign keys and hence longer query execution time (multiple levels of dimension tables and normalization)
Ease of maintenance/change
Star schema: Has redundant data and hence is harder to maintain and change (lacks normalization)
Snowflake schema: No redundancy and hence is easier to maintain and change (includes normalization)
Type of data warehouse
Star schema: Good for large data warehouses (it is simple to understand)
Snowflake schema: Good for small data warehouses and data marts (it is complex)
Dimension table
Star schema: Contains only a single dimension table for each dimension
Snowflake schema: May have more than one dimension table for each dimension
As shown in the figure, the Sales fact table has a location_key attribute and the Shipping fact table has from_location and to_location attributes. The attributes from both fact tables can be looked up in the location dimension table, because the location table describes these attributes.
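The shared lookup can be sketched with Python's sqlite3 module: both fact tables reference the same location table, and the Shipping fact joins it twice under different aliases. The key columns follow the text; the city, amount, and units_shipped columns and all data rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- One location dimension shared by both fact tables.
CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT);

CREATE TABLE sales (
    location_key INTEGER REFERENCES location(location_key),
    amount       REAL
);

-- Both shipping keys point at the same dimension table.
CREATE TABLE shipping (
    from_location INTEGER REFERENCES location(location_key),
    to_location   INTEGER REFERENCES location(location_key),
    units_shipped INTEGER
);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO location VALUES (?, ?)",
    [(1, "City A"), (2, "City B"), (3, "City C")],
)
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100.0), (2, 40.0)])
conn.executemany(
    "INSERT INTO shipping VALUES (?, ?, ?)",
    [(1, 2, 10), (2, 3, 5), (1, 3, 7)],
)

# Sales fact: a single lookup against the location dimension.
sales_rows = conn.execute("""
SELECT l.city, SUM(s.amount)
FROM sales AS s
JOIN location AS l ON l.location_key = s.location_key
GROUP BY l.city
ORDER BY l.city
""").fetchall()

# Shipping fact: the same dimension is joined twice, once per role.
shipping_rows = conn.execute("""
SELECT src.city, dst.city, sh.units_shipped
FROM shipping AS sh
JOIN location AS src ON src.location_key = sh.from_location
JOIN location AS dst ON dst.location_key = sh.to_location
ORDER BY src.city, dst.city
""").fetchall()

print(sales_rows)
print(shipping_rows)
```

The two fact tables are never joined to each other directly; they are related only through the dimension they have in common.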
The main disadvantage of this schema is that it is a complicated model, as it includes many aggregations.
Starflake Schema
A starflake schema is a hybrid of the star schema and the snowflake schema, and takes the benefits of both. Some of its dimension tables are normalized and some are denormalized. To normalize the schema, the shared dimensional hierarchies are placed in outriggers. The model of the starflake schema is shown below:
Example:
As we can see here, Sales Transaction is the fact table. Locations, Products, and Time are dimension tables that represent the star schema. The other dimension tables represent the snowflake schema.
Galaxy Schema
A galaxy schema contains many fact tables with some common dimensions. This schema is a combination of many data marts.
Conclusion
A dimensional model is used to map the aspects of each process within the business. Schemas modeled according to dimensional modeling principles are very beneficial: they help to read large amounts of data quickly and easily, which makes the data easier to query and to analyze.