The document discusses dimensional modeling and data warehousing. It describes how dimensional models are designed for understandability and ease of reporting rather than updates. Key aspects include facts and dimensions, with facts being numeric measures and dimensions providing context. Slowly changing dimensions are also covered, with types 1-3 handling changes to dimension attribute values over time.
5. From Relational to Dimensional. Relational Model: designed from the perspective of process efficiency. Marketing
6. Sales. “Normalised” data structures. Entity Relationship Model. Used for transactional, or operational, systems. OLTP: OnLine Transaction Processing. Based on data that is current.
22. There exist a number of products supporting the dimensional model. “The Data Warehouse Toolkit” by Ralph Kimball & Margy Ross; “The Data Warehouse Lifecycle Toolkit” by Ralph Kimball & Margy Ross.
23. A Transactional Database. OrderDetails (OrderHeaderID, ProductID, Amount); OrderHeader (OrderHeaderID, CustomerID, OrderDate, FreightAmount); Products (ProductID, Description, Size); Customers (CustomerID, AddressID, Name); Addresses (AddressID, StateID, Street); States (StateID, CountryID, Desc); Countries (CountryID, Description).
24. A Dimensional Model. FactSales (CustomerID, ProductID, TimeID, SalesAmount); Products (ProductID, Description, Size, Subcategory, Category); Customers (CustomerID, Name, Street, State, Country); Time (TimeID, Date, Month, Quarter, Year).
25. Extract, Transform, Load. Relational model: process oriented, transactional, current. Dimensional model: subject oriented, aggregate, historic.
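The transform step between the two models can be sketched as follows. This is a minimal illustration, not the deck's own method: it assumes the transactional tables of slide 23 are held as plain Python dictionaries keyed by ID, and the sample rows and the function name `build_customer_dimension` are hypothetical. It flattens Customers, Addresses, States and Countries into the single Customers dimension of slide 24.

```python
# Hypothetical sample rows shaped like the transactional tables on slide 23.
countries = {1: {"Description": "Australia"}}
states = {10: {"CountryID": 1, "Desc": "NSW"}}
addresses = {100: {"StateID": 10, "Street": "1 Example St"}}
customers = {1000: {"AddressID": 100, "Name": "Miranda Kerr"}}


def build_customer_dimension(customers, addresses, states, countries):
    """Follow the foreign keys and flatten each customer into one wide row."""
    dimension = []
    for customer_id, customer in customers.items():
        address = addresses[customer["AddressID"]]
        state = states[address["StateID"]]
        country = countries[state["CountryID"]]
        dimension.append({
            "CustomerID": customer_id,
            "Name": customer["Name"],
            "Street": address["Street"],
            "State": state["Desc"],             # denormalized from States
            "Country": country["Description"],  # denormalized from Countries
        })
    return dimension


customer_dim = build_customer_dimension(customers, addresses, states, countries)
print(customer_dim)
```

The four normalized tables collapse into one row per customer, which is what lets a report filter or group by State or Country without extra joins.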
26. Facts & Dimensions. There are two main types of objects in a dimensional model. Facts are quantitative measures that we wish to analyse and report on.
27. Dimensions contain textual descriptors of the business. They provide context for the facts.
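How facts and dimensions work together can be shown with a small query. This is a sketch only, assuming SQLite through Python's standard `sqlite3` module and made-up sample rows. Table and column names follow the dimensional model on slide 24, with the Customers dimension left out and the other tables trimmed to the columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Description TEXT, Category TEXT);
CREATE TABLE Time (TimeID INTEGER PRIMARY KEY, Date TEXT, Year INTEGER);
CREATE TABLE FactSales (ProductID INTEGER, TimeID INTEGER, SalesAmount REAL);
""")

# Hypothetical sample data: two products, two dates, four sales.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [(1, "Road bike", "Bikes"), (2, "Helmet", "Accessories")])
conn.executemany("INSERT INTO Time VALUES (?, ?, ?)",
                 [(1, "2009-01-15", 2009), (2, "2010-03-02", 2010)])
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?)",
                 [(1, 1, 500.0), (2, 1, 40.0), (1, 2, 650.0), (1, 1, 500.0)])

# The fact (SalesAmount) is summed; the dimensions supply the "by" columns.
rows = conn.execute("""
SELECT t.Year, p.Category, SUM(f.SalesAmount)
FROM FactSales f
JOIN Products p ON p.ProductID = f.ProductID
JOIN Time t ON t.TimeID = f.TimeID
GROUP BY t.Year, p.Category
ORDER BY t.Year, p.Category
""").fetchall()

for year, category, total in rows:
    print(year, category, total)
```

Changing the `GROUP BY` columns re-slices the same additive fact by a different combination of dimension attributes.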
34. GB Video E-R Diagram. Customer (#Cust No, F Name, L Name, Ads1, Ads2, City, State, Zip, Tel No, CC No, Expire); Rental (#Rental No, Date, Clerk No, Pay Type, CC No, Expire, CC Approval); Line (#Line No, Due Date, Return Date, OD charge, Pay type); Video (#Video No, One-day fee, Extra days, Weekend); Title (#Title No, Name, Vendor No, Cost). Relationship labels: Requestor of, Owner of, Name for, Holder of.
35. GB Video Data Mart. Customer (CustID, Cust No, F Name, L Name); Rental (RentalID, Rental No, Clerk No, Store, Pay Type); Line (LineID, OD Charge, OneDayCharge, ExtraDaysCharge, WeekendCharge, DaysReserved, DaysOverdue, CustID, AddressID, RentalId, VideoID, TitleID, RentalDateID, DueDateID, ReturnDateID); Video (VideoID, Video No); Title (TitleID, TitleNo, Name, Cost, Vendor Name); Rental Date (RentalDateID, SQLDate, Day, Week, Quarter, Holiday); Due Date (DueDateID, SQLDate, Day, Week, Quarter, Holiday); Return Date (ReturnDateID, SQLDate, Day, Week, Quarter, Holiday); Address (AddressID, Address1, Address2, City, State, Zip, AreaCode, Phone).
36. Fact Table. Measurements associated with a specific business process. Grain: the level of detail of the table.
51. The Bus Matrix. Columns: Process, Date, Product, Store, Promotion, Warehouse, Vendor, Contract, Shipper. Rows: Retail Sales (X X X X); Retail Inventory (X X X); Retail Deliveries (X X X); Warehouse Inventory (X X X X); Warehouse Deliveries (X X X X); Purchase Orders (X X X X X X).
52. The Business Model Identify the data structure, attributes and constraints for the client’s data warehousing environment. Stable
55. Business Model. As always in life, there are some disadvantages to 3NF. Performance can be truly awful: most of the work performed on denormalizing a data model is an attempt to reach performance objectives.
56. The structure can be overwhelmingly complex. We may wind up creating many small relations which the user might think of as a single relation or group of data.
57. The 4 Step Design Process: Choose the Data Mart
61. Building a Data Warehouse from a Normalized Database. The steps: Develop a normalized entity-relationship business model of the data warehouse.
62. Translate this into a dimensional model. This step reflects the information and analytical characteristics of the data warehouse.
63. Translate this into the physical model. This reflects the changes necessary to reach the stated performance objectives.
64. Structural Dimensions The first step is the development of the structural dimensions. This step corresponds very closely to what we normally do in a relational database.
65. The star architecture that we will develop here depends upon taking the central intersection entities as the fact tables and building the foreign key => primary key relations as dimensions.
86. Fact Tables. Represent a process or reporting environment that is of value to the organization. It is important to determine the identity of the fact table and specify exactly what it represents.
92. Non-key attributes in the fact table. Attributes in dimension tables are constants. Facts vary with the granularity of the fact table.
93. Dimensions. A table (or hierarchy of tables) connected with the fact table with keys and foreign keys. Preferably single valued for each fact record (1:m).
122. Type 1 Slowly Changing Dimension
Before: CustomerID=1, Code=K001, Name=Miranda Kerr, State=NSW, Gender=F
After: CustomerID=1, Code=K001, Name=Miranda Kerr, State=VIC, Gender=F
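The Type 1 behaviour on this slide (overwrite the attribute, keep no history) can be sketched as below. It assumes the dimension is held as a list of dictionaries, and the function name `apply_type1_change` is hypothetical.

```python
def apply_type1_change(dimension_rows, code, attribute, new_value):
    """Overwrite the attribute in place; the previous value is lost."""
    for row in dimension_rows:
        if row["Code"] == code:
            row[attribute] = new_value
    return dimension_rows


type1_dim = [
    {"CustomerID": 1, "Code": "K001", "Name": "Miranda Kerr",
     "State": "NSW", "Gender": "F"},
]

apply_type1_change(type1_dim, "K001", "State", "VIC")
print(type1_dim)
```

The row count and the surrogate key are unchanged, so every existing fact row now reports under the new state.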
123. Type 2 Slowly Changing Dimension Allows the recording of changes of state over time
126. Type 2 Slowly Changing Dimension (change on 23/2/09)
Before: CustomerID=1, Code=K001, Name=Miranda Kerr, State=NSW, Gender=F, Start=1/1/09, End=<NULL>
After: CustomerID=1, Code=K001, Name=Miranda Kerr, State=NSW, Gender=F, Start=1/1/09, End=23/2/09
After: CustomerID=2, Code=K001, Name=Miranda Kerr, State=VIC, Gender=F, Start=24/2/09, End=<NULL>
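The Type 2 change on this slide can be sketched as: close the current row, then add a new row with a new surrogate key. The code assumes a list-of-dictionaries dimension, `None` standing in for `<NULL>`, and a new surrogate key generated as "highest existing key plus one". The function name `apply_type2_change` and the key-generation rule are illustrative assumptions.

```python
from datetime import date, timedelta


def apply_type2_change(dimension_rows, code, attribute, new_value, change_date):
    """End-date the open row for `code` and append a new open row."""
    current = next(r for r in dimension_rows
                   if r["Code"] == code and r["End"] is None)
    new_row = dict(current)  # copy while End is still None
    new_row["CustomerID"] = max(r["CustomerID"] for r in dimension_rows) + 1
    new_row[attribute] = new_value
    new_row["Start"] = change_date + timedelta(days=1)
    current["End"] = change_date  # close the old version
    dimension_rows.append(new_row)
    return new_row


type2_dim = [
    {"CustomerID": 1, "Code": "K001", "Name": "Miranda Kerr", "State": "NSW",
     "Gender": "F", "Start": date(2009, 1, 1), "End": None},
]

apply_type2_change(type2_dim, "K001", "State", "VIC", date(2009, 2, 23))
for row in type2_dim:
    print(row)
```

The business key (Code) stays the same across both rows; only the surrogate key (CustomerID) distinguishes the two versions.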
127. Type 3 Slowly Changing Dimension De-normalized change tracking
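130. Type 3 Slowly Changing Dimension
Before: CustomerID=1, Code=K001, Name=Miranda Kerr, Current State=NSW, Gender=F, Prev State=<NULL>
After: CustomerID=1, Code=K001, Name=Miranda Kerr, Current State=VIC, Gender=F, Prev State=NSW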
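A Type 3 change keeps one prior value in a dedicated column on the same row. A minimal sketch, assuming the row is a dictionary whose `CurrentState` and `PrevState` keys mirror the slide's Current State and Prev State columns; the function name `apply_type3_change` is hypothetical.

```python
def apply_type3_change(row, new_state):
    """Shift the current value into the 'previous' column, then overwrite it."""
    row["PrevState"] = row["CurrentState"]
    row["CurrentState"] = new_state
    return row


type3_row = {"CustomerID": 1, "Code": "K001", "Name": "Miranda Kerr",
             "CurrentState": "NSW", "Gender": "F", "PrevState": None}

apply_type3_change(type3_row, "VIC")
print(type3_row)
```

Only one level of history survives: a second change would overwrite NSW in the previous-state column.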
Editor's Notes
A simplistic transactional schema showing 7 tables relating to sales orders
This is a star schema (later on we will discuss snowflake schemas) showing 4 tables that relate to the previous transactional schema. State and Country have been denormalized under Customer. Dimensions are in blue: these are the things that we analyse “by” (e.g. by Time, by Customer, by Region). The fact is in yellow: these are usually the quantitative things that we are interested in.
We already have the data in a data model, so why create another one? What is currently called “Data Warehousing” or “Business Intelligence” was originally often called “Decision Support Systems”. We already have all the data in the OLTP system, so why replicate it in a dimensional model? Because the two serve different purposes: the OLTP model holds atomic data, supports transaction throughput and is current; the dimensional model holds summary data, supports aggregate queries and is historic.
Facts work best if they are additive. Dimensions allow us to “slice & dice” the facts into meaningful groups. They provide context.
Designing the Perfect Data Warehouse (the paper formerly known as: Data Modeling for Data Warehouses), Frank McGuff, http://members.aol.com/fmcguff/dwmodel/
There are some changes where it is valid to overwrite history. When someone gets married and changes their name, they may want to carry the history of their previous purchases over to their new name rather than see a split history.
This makes inserts into your fact table more expensive, as you always need to match on the effective dates as well as the business key. Sometimes people keep a “Current” flag. Another approach, rather than putting nulls in the End date, is to put an arbitrary date well in the future; this can make the join logic a bit simpler.
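The lookup described in this note can be sketched as follows, assuming a Type 2 customer dimension held as a list of dictionaries and a far-future sentinel in place of a null End date. The sentinel value `FAR_FUTURE` and the function name `lookup_surrogate_key` are illustrative assumptions.

```python
from datetime import date

FAR_FUTURE = date(9999, 12, 31)  # assumed sentinel for "still current"

lookup_dim = [
    {"CustomerID": 1, "Code": "K001", "State": "NSW",
     "Start": date(2009, 1, 1), "End": date(2009, 2, 23)},
    {"CustomerID": 2, "Code": "K001", "State": "VIC",
     "Start": date(2009, 2, 24), "End": FAR_FUTURE},
]


def lookup_surrogate_key(dimension_rows, code, transaction_date):
    """Match on the business key and on the effective date range."""
    for row in dimension_rows:
        if row["Code"] == code and row["Start"] <= transaction_date <= row["End"]:
            return row["CustomerID"]
    return None  # no dimension row covers this date


print(lookup_surrogate_key(lookup_dim, "K001", date(2009, 2, 1)))
print(lookup_surrogate_key(lookup_dim, "K001", date(2009, 3, 1)))
```

With the sentinel in place, the range test is a single comparison; with a null End date, the same test would need an extra "or End is null" branch.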
This type of change tracking is more useful when there is a one-off change, such as a change in sales regions, where you want to see history re-cast into the new regions but may also want to compare the old and new regions.