Data Mining

The document discusses dimensional modeling concepts for data warehousing. It defines dimensional modeling and describes its key components: facts, measures, and dimensions. It explains the different types of dimensional schemas including star schemas, snowflake schemas, and fact constellations. It also covers the dimensional modeling process and differentiates dimensional modeling from entity-relationship modeling.

Uploaded by

rapinmystyle

Lovely Professional University

Termpaper Of Data Warehouse and Modeling

Topic: Data Warehouse Dimensional Modeling Concepts
Reg no.: 11005440
Roll no.: D1R06A07
Course code: CAP-618
Course Instructor: Mrs. Rajni Bhalla
Date of Submission: 16 Nov, 2012
Signature: Neha Kapoor

Contents

Dimensional Modeling Concept
Fact Table- The central linkage in Dimensional Modeling
Dimension Table- What does and should it contain
Dimensional Modeling Process
Dimensional Model Star Schema using Star Query
Snow-Flake Schema in Dimensional Modeling
Fact Constellation Schema
How data modeling is different from an ER diagram?
Benefits of Data Modeling
Conclusion

Dimensional Modeling Concept

Dimensional modeling (DM) is the name of a set of techniques and concepts used in data warehouse design. It is considered to be different from entity-relationship (ER) modeling. Dimensional modeling does not necessarily involve a relational database: the same modeling approach, at the logical level, can be used for any physical form, such as a multidimensional database or even flat files.

According to data warehousing consultant Ralph Kimball,[1] DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance. According to him, although transaction-oriented ER modeling is very useful for transaction capture, it should be avoided for end-user delivery.

Dimensional modeling always uses the concepts of facts (measures) and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.

A dimensional model is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions. Every dimensional model is composed of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multi-part key in the fact table (see Figure). This characteristic star-like structure is often called a star join.

A fact table, because it has a multi-part primary key made up of two or more foreign keys, always expresses a many-to-many relationship. The most useful fact tables also contain one or more numerical measures, or 'facts', that occur for the combination of keys that define each record. In the figure, the facts are Units_Sold, Dollars_Sold, and Avg_sales.

The most useful facts in a fact table are numeric and additive. Additivity is crucial because data warehouse applications almost never retrieve a single fact table record; rather, they fetch back hundreds, thousands, or even millions of these records at a time, and the only useful thing to do with so many records is to add them up. Dimension tables, by contrast, most often contain descriptive textual information in the form of attributes (also called classification attributes), which are used for analysis. Dimension attributes are the source of most of the interesting constraints in data warehouse queries, and they are virtually always the source of the row headers in the SQL answer set.
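The role of additive facts and dimension attributes can be illustrated with a minimal sketch. The table contents, the `category` attribute, and the `totals_by` helper are assumptions made for illustration; they do not come from the figure.

```python
from collections import defaultdict

# Hypothetical dimension table: single-part key -> descriptive attributes.
product_dim = {
    1: {"name": "Standard Desktop", "category": "Computers"},
    2: {"name": "Laptop", "category": "Computers"},
    3: {"name": "Office Chair", "category": "Furniture"},
}

# Hypothetical fact rows: (product_key, units_sold, dollars_sold).
sales_fact = [
    (1, 10, 5000.0),
    (2, 4, 4800.0),
    (3, 7, 700.0),
    (1, 5, 2500.0),
]


def totals_by(attribute):
    """Sum the additive facts, grouped by one dimension attribute."""
    totals = defaultdict(lambda: [0, 0.0])
    for product_key, units, dollars in sales_fact:
        # The attribute value becomes the row header of the answer set.
        header = product_dim[product_key][attribute]
        totals[header][0] += units
        totals[header][1] += dollars
    return {header: tuple(values) for header, values in totals.items()}


category_totals = totals_by("category")
```

Many fact rows collapse into a few summed rows, and the dimension attribute supplies both the grouping and the row header.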

Fact Table and Dimension Tables in a Dimensional Model Schema

Let's consider a data warehouse cube with four dimensions and two measures. Every combination of values of the four dimensions (a coordinate) then maps to two measure values. For example:

[City (X), Product (Y), Channel (Z), Month] = [Sales (Quantity), Sales (Value)]
[NY, Standard Desk-top, Mail, September 2005] = [2000 units, $15000]

In the dimensional modeling schema, the fact table contains the measure values for each coordinate at the lowest granularity of all the possible combinations of dimensions. The dimension tables contain the details of the dimensions, which include the attributes of the dimensions and all the higher-level hierarchies. The link between the fact table and each associated dimension table is a dimension key, which is the primary key of the dimension table at its lowest level of granularity.
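The cube example can be written down as a mapping from coordinates to measures. This is only a sketch: the first cell is the one given in the text, the second cell and the `roll_up` helper are hypothetical.

```python
from typing import Dict, Tuple

Coordinate = Tuple[str, str, str, str]  # (city, product, channel, month)
Measures = Tuple[int, float]            # (sales quantity, sales value)

cube: Dict[Coordinate, Measures] = {
    # Cell taken from the example in the text.
    ("NY", "Standard Desk-top", "Mail", "September 2005"): (2000, 15000.0),
    # Hypothetical second cell, added so there is something to aggregate.
    ("NY", "Standard Desk-top", "Web", "September 2005"): (500, 4000.0),
}


def roll_up(cells, keep):
    """Aggregate the cube, keeping only the dimension positions in `keep`."""
    result = {}
    for coord, (qty, value) in cells.items():
        reduced = tuple(coord[i] for i in keep)
        old_qty, old_value = result.get(reduced, (0, 0.0))
        result[reduced] = (old_qty + qty, old_value + value)
    return result


# Drop product and channel; keep city (position 0) and month (position 3).
by_city_month = roll_up(cube, keep=(0, 3))
```

The fact table stores the cells at the lowest granularity; any coarser view is obtained by summing over the dimensions that are dropped.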

Fact Table- The central linkage in Dimensional Modeling

A fact table holds the values of all the measures for the set of dimensions linked to it. It stores the measure values for each combination of dimensions at their lowest level of granularity. The measures are typically numeric, so they can undergo mathematical aggregation and analysis.

Families of fact tables:
- Chains and circles
- Heterogeneous products
- Transactions and snapshots
- Aggregates

Dimension Table- What does and should it contain

The dimension table contains all the information on the dimension. This includes:
a. The primary key (the equivalent foreign key is in the fact table).
b. All attributes of the dimension. These include:
- The hierarchy attributes. Consider a business hierarchy for the location dimension: pin code to city to district to state to country. Each hierarchy element will be an attribute.
- Textual as well as code attributes, for example the location code as well as the name of the location. Both are required because different users could use them for different reasons: a power user could be looking for the location code, whereas an end user could be looking for a more explicit header.
- All parallel hierarchies. A product could have different hierarchies depending on whether the CFO or the head of sales is looking at it. Including them enables analysis on all hierarchies as well as across hierarchies.

The dimension table carries two kinds of key:
- Surrogate primary key: the link to the fact table. Surrogate keys are used because the production keys could change or could be reused. For example, a bill number could be reused after 5 years, or a part number (especially in FMCG) could be reused after a few years.
- Production or source system key: required for auditability and as the link to the extraction data and source systems.
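The reason for keeping a surrogate key next to the production key can be sketched as follows. `DimensionKeyMap` is a hypothetical helper written for this illustration, not part of any library, and the part numbers are invented.

```python
import itertools


class DimensionKeyMap:
    """Hands out surrogate keys for dimension members (illustrative helper)."""

    def __init__(self):
        self._next = itertools.count(1)
        self._current = {}  # production key -> surrogate key of the current row
        self.rows = []      # dimension rows: (surrogate_key, production_key, attributes)

    def add_member(self, production_key, attributes):
        """Register a new member, even if the production key was used before."""
        surrogate = next(self._next)
        self._current[production_key] = surrogate
        self.rows.append((surrogate, production_key, attributes))
        return surrogate

    def lookup(self, production_key):
        """Return the surrogate key that newly loaded fact rows should carry."""
        return self._current[production_key]


parts = DimensionKeyMap()
old_key = parts.add_member("P-100", {"name": "Soap 75g"})
# The source system reuses part number P-100 for a different product.
new_key = parts.add_member("P-100", {"name": "Shampoo 200ml"})
```

Fact rows loaded before the reuse keep pointing at `old_key`, so history is not rewritten, while the production key stays on each row for audit and for tracing back to the source system.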

Dimensional Modeling Process

The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design model is used:
1. Choose the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the facts

Choose the business process

The process of dimensional modeling builds on a 4-step design method that helps to ensure the usability of the dimensional model and the use of the data warehouse. The basics of the design build on the actual business process which the data warehouse should cover. Therefore the first step in the model is to describe the business process which the model builds on. This could for instance be a sales situation in a retail store. To describe the business process, one can use plain text, basic Business Process Modeling Notation (BPMN), or other design guides like the Unified Modeling Language (UML).

Declare the grain

After describing the business process, the next step in the design is to declare the grain of the model. The grain of the model is the exact description of what the dimensional model should be focusing on. This could for instance be "an individual line item on a customer slip from a retail store". To clarify what the grain means, you should pick the central process and describe it with one sentence. Furthermore, the grain (sentence) is what you are going to build your dimensions and fact table from. You might find it necessary to go back to this step and alter the grain as you learn more about what your model is supposed to deliver.

Identify the dimensions

The third step in the design process is to define the dimensions of the model. The dimensions must be defined within the grain from the second step of the 4-step process. Dimensions are the foundation of the fact table, and are where the data for the fact table is collected. Typically dimensions are nouns like date, store, inventory, etc. These dimensions are where all the data is stored. For example, the date dimension could contain data such as year, month and weekday.

Identify the facts

After defining the dimensions, the next step in the process is to make keys for the fact table. This step is to identify the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore most of the fact table rows are numerical, additive figures such as quantity or cost per unit, etc.
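The four steps can be sketched for the retail example, assuming the grain "one fact row per line item on a customer slip". The class name, field names, and slip layout below are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesLineFact:
    # Step 3: keys of the dimensions that apply at this grain.
    date_key: int
    store_key: int
    product_key: int
    # Step 4: numeric facts measured at this grain.
    quantity: int
    extended_amount: float


def slip_to_facts(slip):
    """Step 2 in action: emit exactly one fact row per line item."""
    return [
        SalesLineFact(
            date_key=slip["date_key"],
            store_key=slip["store_key"],
            product_key=line["product_key"],
            quantity=line["quantity"],
            extended_amount=line["quantity"] * line["unit_price"],
        )
        for line in slip["lines"]
    ]


# Step 1: the business process is a retail sale, represented by one slip.
slip = {
    "date_key": 20121116,
    "store_key": 7,
    "lines": [
        {"product_key": 1, "quantity": 2, "unit_price": 3.5},
        {"product_key": 2, "quantity": 1, "unit_price": 10.0},
    ],
}
facts = slip_to_facts(slip)
```

Declaring the grain first is what makes the row count predictable: a slip with two line items yields two fact rows, each carrying the same date and store keys.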

Dimensional Model Star Schema using Star Query

The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center of the star consists of a large fact table and the points of the star are the dimension tables. A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table. A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. A cost-based optimizer can recognize star queries and generate efficient execution plans for them.

A typical fact table contains keys and measures. For example, in the sample schema, the fact table, sales, contains the measures quantity_sold, amount, and average, and the keys time_key, item_key, branch_key, and location_key. The dimension tables are time, branch, item and location. A star join is a primary key to foreign key join of the dimension tables to a fact table. The main advantages of star schemas are that they:

- Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
- Provide highly optimized performance for typical star queries.
- Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contains dimension tables.
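A star query over the sample schema might look like the sketch below, which uses Python's built-in sqlite3 module. The `_dim` table suffixes, the dimension attribute columns, and all row values are assumptions; only the fact table name, its keys, and the quantity_sold and amount measures come from the text, and the average measure is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales (
    time_key      INTEGER REFERENCES time_dim(time_key),
    item_key      INTEGER REFERENCES item_dim(item_key),
    branch_key    INTEGER REFERENCES branch_dim(branch_key),
    location_key  INTEGER REFERENCES location_dim(location_key),
    quantity_sold INTEGER,
    amount        REAL,
    PRIMARY KEY (time_key, item_key, branch_key, location_key)
);
INSERT INTO time_dim VALUES (1, 'September', 2005), (2, 'October', 2005);
INSERT INTO item_dim VALUES (1, 'Standard Desk-top'), (2, 'Laptop');
INSERT INTO branch_dim VALUES (1, 'Mail');
INSERT INTO location_dim VALUES (1, 'NY'), (2, 'Boston');
INSERT INTO sales VALUES
    (1, 1, 1, 1, 2000, 15000.0),
    (1, 2, 1, 1,  100,  9000.0),
    (2, 1, 1, 2,  300,  2400.0);
""")

# Each dimension joins to the fact table on its key; the dimensions are
# never joined to each other.
star_query = """
SELECT l.city, t.month, SUM(s.quantity_sold) AS quantity, SUM(s.amount) AS amount
FROM sales AS s
JOIN time_dim     AS t ON s.time_key = t.time_key
JOIN location_dim AS l ON s.location_key = l.location_key
GROUP BY l.city, t.month
ORDER BY l.city, t.month
"""
rows = conn.execute(star_query).fetchall()
```

Only the dimensions that the question needs are joined in; the item and branch dimensions stay out of this query even though their keys are part of the fact table's primary key.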

Snow-Flake Schema in Dimensional Modeling

The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data is grouped into multiple tables instead of one large table. For example, a location dimension table in a star schema might be normalized into a location table and a city table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. The figure presents a graphical representation of a snowflake schema.
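The location example can be sketched with sqlite3 to show the extra join that snowflaking introduces. Table names, columns, and rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star version: one denormalized location dimension.
CREATE TABLE location_star (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT
);
-- Snowflake version: city attributes moved into their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE location_snow (
    location_key INTEGER PRIMARY KEY, street TEXT,
    city_key INTEGER REFERENCES city(city_key)
);
CREATE TABLE sales (location_key INTEGER, amount REAL);

INSERT INTO location_star VALUES (1, '1 Main St', 'NY', 'New York'),
                                 (2, '9 Broadway', 'NY', 'New York');
INSERT INTO city VALUES (10, 'NY', 'New York');
INSERT INTO location_snow VALUES (1, '1 Main St', 10), (2, '9 Broadway', 10);
INSERT INTO sales VALUES (1, 100.0), (2, 250.0);
""")

# Star: one join from the fact table to the dimension.
star_rows = conn.execute("""
    SELECT l.city, SUM(s.amount) FROM sales s
    JOIN location_star l ON s.location_key = l.location_key
    GROUP BY l.city
""").fetchall()

# Snowflake: the same question needs a second join to reach the city name.
snow_rows = conn.execute("""
    SELECT c.city, SUM(s.amount) FROM sales s
    JOIN location_snow l ON s.location_key = l.location_key
    JOIN city c ON l.city_key = c.city_key
    GROUP BY c.city
""").fetchall()
```

Both queries return the same answer. The snowflake version stores the city and state once instead of once per location, at the price of one more join.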

Fact Constellation Schema

This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. A fact table is split only when we want to focus on aggregation over a few facts and dimensions.
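One way to picture an aggregate fact table sitting beside its base fact table is the sketch below. The daily and monthly grains, the data, and the `build_monthly_fact` helper are assumptions for illustration.

```python
from collections import defaultdict

# Dimension shared by both fact tables (hypothetical data).
product_dim = {1: "Desktop", 2: "Laptop"}

# Base fact table at daily grain: (date as 'YYYY-MM-DD', product_key, units_sold).
daily_sales_fact = [
    ("2005-09-01", 1, 10),
    ("2005-09-02", 1, 5),
    ("2005-09-02", 2, 3),
    ("2005-10-01", 1, 4),
]


def build_monthly_fact(daily_rows):
    """Derive an aggregate fact table at (month, product) grain."""
    totals = defaultdict(int)
    for day, product_key, units in daily_rows:
        month = day[:7]  # 'YYYY-MM'
        totals[(month, product_key)] += units
    return [(month, key, units) for (month, key), units in sorted(totals.items())]


monthly_sales_fact = build_monthly_fact(daily_sales_fact)
```

The two fact tables differ in grain but refer to the same product keys, so a report can move between the monthly summary and the daily detail without changing dimensions.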

How Dimensional model is different from an E-R diagram?

- An E-R diagram (used in OLTP or transactional systems) has a highly normalized model (even at a logical level), whereas a dimensional model aggregates most of the attributes and hierarchies of a dimension into a single entity.
- An E-R diagram is a complex maze of hundreds of entities linked with each other, whereas a dimensional model has a logically grouped set of star schemas.
- An E-R diagram is split as per the entities. A dimensional model is split as per the dimensions and facts.
- In an E-R diagram all attributes of an entity, textual as well as numeric, belong to the entity table. In a dimensional model, a 'dimension' entity has mostly textual attributes, and the 'fact' entity has mostly numeric attributes.

Dimensional modeling is a better approach for a data warehouse than a standard data model.

The dimensional model has a number of important data warehouse advantages that the ER model lacks. The first advantage of the dimensional model is that it offers a standard, predictable framework with standard types of joins. All dimensions can be thought of as symmetrically equal entry points into the fact table. The logical design can be done independently of expected query patterns. The user interfaces are symmetrical, the query strategies are symmetrical, and the SQL generated against the dimensional model is symmetrical. In other words, you will never find attributes in fact tables or facts in dimension tables. If you see a non-fact field in the fact table, you can assume that it is a key to a dimension table.

The second advantage of the dimensional model is that it is smoothly extensible to accommodate unexpected new data elements and new design decisions. All existing tables (both fact and dimension) can be changed in place by simply adding new data rows to the table. Data should not have to be reloaded. Typically, no query tool or reporting tool needs to be reprogrammed to accommodate the change, and all old applications continue to run without yielding different results. You can make the following graceful changes to the design after the data warehouse is up and running:

- Adding new unanticipated facts (that is, new additive numeric fields in the fact table), as long as they are consistent with the fundamental grain of the existing fact table.
- Adding completely new dimensions, as long as there is a single value of that dimension defined for each existing fact record.
- Adding new, unanticipated dimensional attributes.
- Breaking existing dimension records down to a lower level of granularity from a certain point in time forward.
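Two of the graceful changes listed above, a new fact and a new dimension attribute, can be tried out with sqlite3. This is a sketch under assumed table names, column names, and data; it only shows that an existing query keeps returning the same result after the schema grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact (product_key INTEGER, units_sold INTEGER);
INSERT INTO product_dim VALUES (1, 'Desktop'), (2, 'Laptop');
INSERT INTO sales_fact VALUES (1, 10), (2, 4), (1, 5);
""")

# A report query written before the schema change.
OLD_QUERY = """
SELECT p.name, SUM(f.units_sold) FROM sales_fact f
JOIN product_dim p ON f.product_key = p.product_key
GROUP BY p.name ORDER BY p.name
"""
before = conn.execute(OLD_QUERY).fetchall()

# Add a new additive fact at the same grain and a new dimension attribute.
conn.execute("ALTER TABLE sales_fact ADD COLUMN discount_amount REAL DEFAULT 0")
conn.execute("ALTER TABLE product_dim ADD COLUMN brand TEXT")

# The old query is rerun unchanged against the extended tables.
after = conn.execute(OLD_QUERY).fetchall()
```

Because the old query names its columns explicitly, the added columns are invisible to it, which is the behaviour the text describes for old applications.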

The third advantage of the dimensional model is that there is a body of standard approaches for handling common modeling situations in the business world. Each of these situations has a well-understood set of alternatives that can be specifically programmed in report writers, query tools, and other user interfaces. These modeling situations include:

- Slowly changing dimensions, where a 'constant' dimension such as Product or Customer actually evolves slowly and asynchronously. Dimensional modeling provides specific techniques for handling slowly changing dimensions, depending on the business environment.
- Heterogeneous products, where a business such as a bank needs to:
  o Track a number of different lines of business together within a single common set of attributes and facts, but at the same time
  o Describe and measure the individual lines of business in highly idiosyncratic ways using incompatible measures.
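The text does not say which slowly changing dimension techniques it has in mind. As one commonly used option, the sketch below keeps history by closing the current dimension row and appending a new version (often called a Type 2 change). The function name, the `valid_from` and `valid_to` columns, and the customer data are assumptions.

```python
from datetime import date


def apply_type2_change(rows, production_key, new_attrs, change_date, next_key):
    """Expire the current row for production_key and append a new version."""
    for row in rows:
        if row["production_key"] == production_key and row["valid_to"] is None:
            row["valid_to"] = change_date  # close the old version
    rows.append({
        "surrogate_key": next_key,
        "production_key": production_key,
        **new_attrs,
        "valid_from": change_date,
        "valid_to": None,  # open-ended: this is now the current version
    })
    return rows


customer_dim = [{
    "surrogate_key": 1, "production_key": "C-42", "city": "NY",
    "valid_from": date(2005, 1, 1), "valid_to": None,
}]

# Customer C-42 moves from NY to Boston on 1 September 2005.
apply_type2_change(customer_dim, "C-42", {"city": "Boston"},
                   date(2005, 9, 1), next_key=2)
```

Facts recorded before the move keep surrogate key 1 and still report under NY, while later facts use surrogate key 2. Other approaches overwrite the attribute in place and give up the history.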

Benefits of dimensional modeling

Understandability - Compared to the normalized model, the dimensional model is easier to understand and more intuitive. In dimensional models, information is grouped into coherent business categories or dimensions, making it easier to read and interpret. Simplicity also allows software to navigate databases efficiently. In normalized models, data is divided into many discrete entities, and even a simple business process might result in dozens of tables joined together in a complex way.

Query performance - Dimensional models are more denormalized and optimized for data querying, while normalized models seek to eliminate data redundancies and are optimized for transaction loading and updating. The predictable framework of a dimensional model allows the database to make strong assumptions about the data that aid performance. Each dimension is an equivalent entry point into the fact table, and this symmetrical structure allows effective handling of complex queries. Query optimization for star join databases is simple, predictable, and controllable.

Extensibility - Dimensional models are extensible and easily accommodate unexpected new data. Existing tables can be changed in place either by simply adding new data rows to the table or by executing SQL ALTER TABLE commands. No queries or other applications that sit on top of the warehouse need to be reprogrammed to accommodate changes. Old queries and applications continue to run without yielding different results. In normalized models, by contrast, each modification should be considered carefully because of the complex dependencies between database tables.
