Business Information Systems
Dimensional Analysis
Prithwis Mukerjee, Ph.D.
Dimensional Models
A denormalized relational model
Made up of tables with attributes
Relationships defined by keys and foreign keys
Organized for understandability and ease of reporting rather than update
Queried and maintained by SQL or special purpose management tools
From Relational to Dimensional
Relational Model:
Designed from the perspective of process efficiency (Marketing, Sales)
“Normalised” data structures (Entity Relationship Model)
Used for transactional, or operational, systems (OLTP: OnLine Transaction Processing)
Based on data that is current and non-redundant
Dimensional Model:
Designed from the perspective of subject (Sales, Customers)
“De-normalised” data structures, in blatant violation of normalisation
Used for analysis of aggregated data (OLAP: OnLine Analytical Processing)
Based on data that is historical and may be redundant
ER vs. Dimensional Models
ER Model:
One table per entity
Minimize data redundancy
Optimize update
The transaction processing model
Dimensional Model:
One fact table for data organization
Maximize understandability
Optimized for retrieval
The data warehousing model
Strengths of the Dimensional Model
Predictable, standard framework
Responds well to changes in user reporting needs
Relatively easy to add data without reloading tables
Standard design approaches have been developed
A number of products support the dimensional model
“The Data Warehouse Toolkit” by Ralph Kimball & Margy Ross
“The Data Warehouse Lifecycle Toolkit” by Ralph Kimball & Margy Ross
A Transactional Database
OrderDetails: OrderHeaderID, ProductID, Amount
OrderHeader: OrderHeaderID, CustomerID, OrderDate, FreightAmount
Products: ProductID, Description, Size
Customers: CustomerID, AddressID, Name
Addresses: AddressID, StateID, Street
States: StateID, CountryID, Desc
Countries: CountryID, Description
A Dimensional Model
FactSales: CustomerID, ProductID, TimeID, SalesAmount
Products: ProductID, Description, Size, Subcategory, Category
Customers: CustomerID, Name, Street, State, Country
Time: TimeID, Date, Month, Quarter, Year
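The star schema on this slide can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the author's implementation: the table and column names come from the slide, while the column types, the sample rows, and the function names `build_star_schema` and `sales_by_country_and_year` are assumptions made for the example.

```python
import sqlite3


def build_star_schema():
    """Create the slide's four tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Products (
            ProductID INTEGER PRIMARY KEY, Description TEXT, Size TEXT,
            Subcategory TEXT, Category TEXT);
        CREATE TABLE Customers (
            CustomerID INTEGER PRIMARY KEY, Name TEXT, Street TEXT,
            State TEXT, Country TEXT);
        CREATE TABLE "Time" (
            TimeID INTEGER PRIMARY KEY, "Date" TEXT, Month INTEGER,
            Quarter INTEGER, Year INTEGER);
        -- The fact table holds one foreign key per dimension plus the measure.
        CREATE TABLE FactSales (
            CustomerID INTEGER REFERENCES Customers(CustomerID),
            ProductID INTEGER REFERENCES Products(ProductID),
            TimeID INTEGER REFERENCES "Time"(TimeID),
            SalesAmount REAL);
    """)
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?, ?)", [
        (1, "Widget", "S", "Gadgets", "Hardware"),
        (2, "Gizmo", "L", "Gadgets", "Hardware"),
    ])
    conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
        (1, "Asha", "1 Park St", "WB", "India"),
        (2, "Bob", "2 Main St", "NY", "USA"),
    ])
    conn.executemany('INSERT INTO "Time" VALUES (?, ?, ?, ?, ?)', [
        (1, "2023-01-15", 1, 1, 2023),
        (2, "2023-04-20", 4, 2, 2023),
    ])
    conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 100.0),
        (1, 2, 2, 50.0),
        (2, 1, 1, 75.0),
    ])
    return conn


def sales_by_country_and_year(conn):
    """Answer a 'by' question: total sales by customer country and year."""
    return conn.execute("""
        SELECT c.Country, t.Year, SUM(f.SalesAmount)
        FROM FactSales f
        JOIN Customers c ON c.CustomerID = f.CustomerID
        JOIN "Time" t ON t.TimeID = f.TimeID
        GROUP BY c.Country, t.Year
        ORDER BY c.Country, t.Year
    """).fetchall()


if __name__ == "__main__":
    print(sales_by_country_and_year(build_star_schema()))
```

Every report follows the same shape: join the fact table to the dimensions of interest, group by dimension attributes, and aggregate the measure. Country sits directly on the Customers row, so no further joins through address, state, and country tables are needed.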
Extract, Transform, Load
Relational Model: process oriented, transactional, current
Dimensional Model: subject oriented, aggregate, historic
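As a rough sketch of the transform step, the following Python flattens rows shaped like the transactional tables shown earlier into a denormalized customer dimension and a sales fact list. The sample rows, the function names, and the YYYYMMDD integer used for TimeID are assumptions for illustration; the slides do not specify how keys are generated. The product dimension is left out to keep the example short.

```python
from datetime import date

# Hypothetical source rows in the shape of the transactional schema.
countries = {1: {"Description": "India"}}
states = {10: {"CountryID": 1, "Desc": "West Bengal"}}
addresses = {100: {"StateID": 10, "Street": "1 Park St"}}
customers = {7: {"AddressID": 100, "Name": "Asha"}}
order_headers = {500: {"CustomerID": 7, "OrderDate": date(2023, 4, 20)}}
order_details = [
    {"OrderHeaderID": 500, "ProductID": 1, "Amount": 100.0},
    {"OrderHeaderID": 500, "ProductID": 2, "Amount": 50.0},
]


def build_customer_dimension(customers, addresses, states, countries):
    """Denormalize Customers -> Addresses -> States -> Countries into one row."""
    dimension = {}
    for customer_id, customer in customers.items():
        address = addresses[customer["AddressID"]]
        state = states[address["StateID"]]
        country = countries[state["CountryID"]]
        dimension[customer_id] = {
            "CustomerID": customer_id,
            "Name": customer["Name"],
            "Street": address["Street"],
            "State": state["Desc"],
            "Country": country["Description"],
        }
    return dimension


def time_row(day):
    """Build a Time dimension row; TimeID as YYYYMMDD is an assumed convention."""
    return {
        "TimeID": day.year * 10000 + day.month * 100 + day.day,
        "Date": day.isoformat(),
        "Month": day.month,
        "Quarter": (day.month - 1) // 3 + 1,
        "Year": day.year,
    }


def build_fact_sales(order_details, order_headers):
    """Turn each order line into a fact row keyed by customer, product, time."""
    time_dimension, facts = {}, []
    for line in order_details:
        header = order_headers[line["OrderHeaderID"]]
        time = time_row(header["OrderDate"])
        time_dimension[time["TimeID"]] = time
        facts.append({
            "CustomerID": header["CustomerID"],
            "ProductID": line["ProductID"],
            "TimeID": time["TimeID"],
            "SalesAmount": line["Amount"],
        })
    return time_dimension, facts


customer_dimension = build_customer_dimension(customers, addresses, states, countries)
time_dimension, fact_sales = build_fact_sales(order_details, order_headers)
```

A real load would also preserve history across runs and typically assign surrogate keys; this sketch only shows how the join path through four normalized tables collapses into a single dimension row.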
Facts & Dimensions
There are two main types of objects in a dimensional model:
Facts are quantitative measures that we wish to analyse and report on.
Dimensions contain textual descriptors of the business. They provide context for the facts.
Fact & Dimension Tables
FACTS:
Contain two or more foreign keys
Tend to have huge numbers of records
Useful facts tend to be numeric and additive
DIMENSIONS:
Contain text and descriptive information
Are the “1” side of a 1-M relationship with the fact table
Generally the source of interesting constraints
Typically contain the attributes for the SQL answer set
GB Video E-R Diagram
Customer: #Cust No, F Name, L Name, Ads1, Ads2, City, State, Zip, Tel No, CC No, Expire
Rental: #Rental No, Date, Clerk No, Pay Type, CC No, Expire, CC Approval
Line: #Line No, Due Date, Return Date, OD charge, Pay type
Video: #Video No, One-day fee, Extra days, Weekend
Title: #Title No, Name, Vendor No, Cost
Relationship labels in the diagram: Requestor of, Owner of, Name for, Holder of
GB Video Data Mart
Customer: CustID, Cust No, F Name, L Name
Rental: RentalID, Rental No, Clerk No, Store, Pay Type
Line: LineID, OD Charge, OneDayCharge, ExtraDaysCharge, WeekendCharge, DaysReserved, DaysOverdue, CustID, AddressID, RentalId, VideoID, TitleID, RentalDateID, DueDateID, ReturnDateID
Video: VideoID, Video No
Title: TitleID, TitleNo, Name, Cost, Vendor Name
Rental Date: RentalDateID, SQLDate, Day, Week, Quarter, Holiday
Due Date: DueDateID, SQLDate, Day, Week, Quarter, Holiday
Return Date: ReturnDateID, SQLDate, Day, Week, Quarter, Holiday
Address: AddressID, Address1, Address2, City, State, Zip, AreaCode, Phone
Fact Table
Measurements associated with a specific business process
Grain: level of detail of the table
Process events produce fact records
Facts (attributes) are usually numeric and additive
Derived facts included
Foreign (surrogate) keys refer to dimension tables (entities)
Classification values help define subsets
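A small Python sketch can show why additive facts are preferred. An additive measure such as a sales amount gives the same grand total whichever dimension it is grouped by, whereas a ratio such as unit price has to be recomputed from additive components. The rows, the Quantity measure, and the helper name `total_by` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows, already joined to two dimension attributes.
facts = [
    {"Category": "Hardware", "Year": 2022, "SalesAmount": 100.0, "Quantity": 10},
    {"Category": "Hardware", "Year": 2023, "SalesAmount": 300.0, "Quantity": 20},
    {"Category": "Software", "Year": 2023, "SalesAmount": 200.0, "Quantity": 5},
]


def total_by(rows, attribute, measure):
    """Sum an additive measure for each value of a dimension attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[attribute]] += row[measure]
    return dict(totals)


by_year = total_by(facts, "Year", "SalesAmount")
by_category = total_by(facts, "Category", "SalesAmount")
grand_total = sum(row["SalesAmount"] for row in facts)

# Additive: every grouping rolls up to the same grand total.
assert sum(by_year.values()) == grand_total
assert sum(by_category.values()) == grand_total

# Not additive: averaging per-row unit prices gives a different answer from
# dividing total sales by total quantity, so the ratio is derived at query
# time from the two additive measures.
naive_average_price = sum(r["SalesAmount"] / r["Quantity"] for r in facts) / len(facts)
true_average_price = grand_total / sum(r["Quantity"] for r in facts)
```

The design choice this suggests is to store the additive components (amount and quantity) in the fact table and compute ratios from their sums.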
Dimension Tables
Entities describing the objects of the process
Conformed dimensions cross processes
Attributes are descriptive (text or numeric)
Surrogate keys
Less volatile than facts (1:m with the fact table)
Null entries
Date dimensions
Produce “by” questions

Editor's Notes

  1. A simplistic transactional schema showing 7 tables relating to sales orders
  2. This is a star schema (later on we will discuss snowflake schemas) showing 4 tables that relate to the previous transactional schema. State and Country have been denormalized under Customer. Dimensions are in blue; these are the things that we analyse “by” (e.g. by Time, by Customer, by Region). The fact is in yellow; facts are usually quantitative things that we are interested in.
  3. We already have the data in a data model, so why create another data model? What is currently called “Data Warehousing” or “Business Intelligence” was originally often called “Decision Support Systems”. We already have all the data in the OLTP system, so why replicate it in a dimensional model? The two models differ in purpose: atomic vs. summary data; supporting transaction throughput vs. supporting aggregate queries; current vs. historic data.
  4. Facts work best if they are additive. Dimensions allow us to “slice & dice” the facts into meaningful groups. They provide context.
  5. Designing the Perfect Data Warehouse (the paper formerly known as: Data Modeling for Data Warehouses), Frank McGuff, http://members.aol.com/fmcguff/dwmodel/
  6. There are some changes where it is valid to overwrite history. When someone gets married and changes their name, they may want to carry the history of their previous purchases over to their new name rather than see a split history.
  7. This makes inserts into your fact table more expensive, as you always need to match on the effective dates as well as the business key. Sometimes people keep a “Current” flag. Another approach, rather than putting nulls in the end date, is to put an arbitrary date well in the future; this can make the join logic a bit simpler.
  8. This type of change tracking is more useful when there is a one-off change, such as a change in sales regions, where you want to see history re-cast into the new regions but may also want to compare the old and new regions.
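The effective-date matching described in note 7 can be sketched as follows, using Python's sqlite3 module. This pattern is commonly known as a Type 2 slowly changing dimension. The table name `DimCustomer`, its columns, the sample rows, and the far-future date `9999-12-31` are all assumptions chosen for illustration; the notes do not name a specific schema.

```python
import sqlite3

FAR_FUTURE = "9999-12-31"  # assumed stand-in for "no end date yet"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimCustomer (
        CustomerKey   INTEGER PRIMARY KEY,  -- surrogate key, one per version
        CustomerID    INTEGER,              -- business key from the source
        Region        TEXT,
        EffectiveDate TEXT,                 -- ISO dates compare correctly as text
        EndDate       TEXT,
        IsCurrent     INTEGER);             -- optional "Current" flag
""")
# Made-up history: customer 7 moved from East to West on 2022-07-01.
conn.executemany("INSERT INTO DimCustomer VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 7, "East", "2020-01-01", "2022-06-30", 0),
    (2, 7, "West", "2022-07-01", FAR_FUTURE, 1),
])


def lookup_customer_key(conn, customer_id, sale_date):
    """Find the dimension version that was in effect on the sale date.

    With a far-future EndDate the match is a plain BETWEEN; with NULL end
    dates the query would need an extra 'OR EndDate IS NULL' branch.
    """
    row = conn.execute(
        """SELECT CustomerKey FROM DimCustomer
           WHERE CustomerID = ?
             AND ? BETWEEN EffectiveDate AND EndDate""",
        (customer_id, sale_date),
    ).fetchone()
    return row[0] if row else None


def current_customer_key(conn, customer_id):
    """Use the 'Current' flag when only the latest version is wanted."""
    row = conn.execute(
        "SELECT CustomerKey FROM DimCustomer WHERE CustomerID = ? AND IsCurrent = 1",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None
```

Each fact row inserted needs one such lookup, matching on the dates as well as the business key, which is the extra insert cost the note refers to.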