Data Modeling Best Practices
DATA MODELING
DATA TUTORIALS YOUTUBE CHANNEL
https://www.youtube.com/@datatutorials1
Report Development Flow
Every component is equally important for producing a robust solution
Power Query
DAX
Entities
Dimension Table:
Contains descriptive information used to slice and dice data from fact tables (e.g.
branch_name, branch_type)
Also holds relationship/key fields used to connect the dimension to the fact table
(e.g. branch_key)
Wide tables with a small number of rows
Fact Table:
Contains facts/details, which are fields used as values in a visualization (e.g. dollars_sold,
units_sold)
Also holds relationship/key fields used to connect the fact table to its dimensions
(e.g. time_key, item_key, branch_key, location_key)
Narrow tables with a large number of rows
Golden Rule:
Avoid using a single table that includes everything (both facts and dimensions)
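The golden rule can be illustrated with a minimal Python sketch that splits a single flat table into a dimension table and a fact table. The sample rows and the helper name split_star are hypothetical, and the surrogate branch_key is generated here purely for illustration; in Power BI this reshaping would normally be done in Power Query or at the source.

```python
# Hypothetical flat table: every row repeats the branch description.
flat_rows = [
    {"branch_name": "Downtown", "branch_type": "Retail", "dollars_sold": 120.0, "units_sold": 3},
    {"branch_name": "Downtown", "branch_type": "Retail", "dollars_sold": 80.0, "units_sold": 2},
    {"branch_name": "Airport", "branch_type": "Kiosk", "dollars_sold": 45.5, "units_sold": 1},
]


def split_star(rows):
    """Split flat rows into a branch dimension and a sales fact table."""
    branch_keys = {}  # (branch_name, branch_type) -> surrogate branch_key
    dim_branch = []
    fact_sales = []
    for row in rows:
        desc = (row["branch_name"], row["branch_type"])
        if desc not in branch_keys:
            # First time this branch is seen: assign the next key and
            # store its descriptive attributes once in the dimension.
            branch_keys[desc] = len(branch_keys) + 1
            dim_branch.append({
                "branch_key": branch_keys[desc],
                "branch_name": desc[0],
                "branch_type": desc[1],
            })
        # The fact row keeps only the key and the numeric values.
        fact_sales.append({
            "branch_key": branch_keys[desc],
            "dollars_sold": row["dollars_sold"],
            "units_sold": row["units_sold"],
        })
    return dim_branch, fact_sales


dim_branch, fact_sales = split_star(flat_rows)
```

After the split, the descriptive text for each branch is stored once in dim_branch, while fact_sales carries only branch_key and the numeric fields.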
Relationships
• Connections between two tables (usually
fact and dimension tables), made using columns
from each, are called relationships
Bi-Directional Relationship
‐ Allows filters to pass in both directions
‐ This is different from a many-to-many relationship
‐ There is a significant performance penalty for bi-directional filtering
Section B
Data Model Schemas, Normalization, DAX Calculated
Columns and Measures
Phases in Building a Power BI Desktop File
Data Model Brings Facts and Dimensions Together
Data Models
Flat or Denormalized Schema
Star Schema
Snowflake Schema
Flat or Denormalized Schema
• Highly inefficient
Example:
One row per order or per Item
Daily or Monthly date grain
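The effect of choosing a date grain can be sketched in Python as follows. The sample data and the helper name to_monthly are hypothetical; the sketch only shows that a coarser grain means fewer rows in the fact table, at the cost of losing daily detail.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows at a daily grain: (order date, dollars sold).
daily = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 50.0),
    (date(2024, 2, 3), 75.0),
]


def to_monthly(rows):
    """Roll daily rows up to one row per (year, month)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


monthly = to_monthly(daily)
```

Here three daily rows collapse into two monthly rows, so the grain should be chosen according to the level of detail the reports actually need.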
What is a Calculated Column?
A Calculated Column is evaluated as a new column in the table in which it resides, and its value does not change until the
underlying data is refreshed.
Best Practices – Calculated Columns
What is a Measure?
Measures are calculations that have no result until they are used in a visualization.
They may use sums, averages, minimum or maximum values, counts, or more advanced calculations, and their values change
in response to your interaction with your reports.
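The contrast between a calculated column and a measure can be sketched in Python. This is an analogy only, not DAX: the table, the column line_total, and the function total_sales are hypothetical names chosen for illustration.

```python
# Hypothetical sales table.
sales = [
    {"region": "East", "units_sold": 3, "unit_price": 10.0},
    {"region": "East", "units_sold": 1, "unit_price": 25.0},
    {"region": "West", "units_sold": 2, "unit_price": 10.0},
]

# "Calculated column": computed once per row (at refresh time) and stored
# with the table, so it takes up space in the model.
for row in sales:
    row["line_total"] = row["units_sold"] * row["unit_price"]


# "Measure": nothing is stored; the result is computed on demand from
# whichever rows the current filter (e.g. a slicer selection) leaves visible.
def total_sales(rows, region=None):
    visible = [r for r in rows if region is None or r["region"] == region]
    return sum(r["units_sold"] * r["unit_price"] for r in visible)


all_regions = total_sales(sales)
east_only = total_sales(sales, region="East")
```

The stored line_total values stay fixed until the data is reloaded, whereas total_sales returns a different result each time the filter changes.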
Designing good data models
Key takeaways to design a good Power BI Desktop data model
• If a fact table contains an ID field which is unique for each record, remove it unless needed as a connector key
• Ex. Transaction ID
• The DateTime data type is usually not needed, unless you are specifically using the Time component
➢ If you really need Time, try splitting Date & Time into
Knowledge Check
2. What are some advantages of a star schema over a flat or denormalized model?
• Dimension tables save space by reducing the amount of data that needs to be repeated over and
over in every row
• Relationships between tables can be leveraged for more complex measures
Choosing storage mode: Import vs DirectQuery
- Import: the connection ingests/pulls all the data from the source and
makes it part of the PBI file
Best Practices
Data Modeling
An inefficient model can completely slow down a report, even with very small data
volumes
GOALS:
Why is it undesired?
• Calculated columns don’t compress as well as physical columns
Proposed Solution
• Perform the calculation in Power Query, ideally pushed down to the data source
Remove unused tables and columns
Scenario
• Model contains tables/columns that are not used for reporting/analysis or
calculations
Why is it undesired?
• Increases model size
• Increases time to load into memory
• Increases refresh time
• May affect usability
Avoid high precision/cardinality columns
Scenario
• Model contains columns at a higher precision than needed for analysis e.g. datetime
in milliseconds, weight to 6 decimal places
• Model contains columns that are highly unique
Why is it undesired?
• Less compression with high precision/cardinality
• Increases time to load into memory
• Increases refresh time
Proposed Solution
• Remove if not needed
• Reduce precision
• Split datetime into date and time
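The benefit of splitting datetime into date and time can be checked with a small Python sketch that counts distinct values. The data is hypothetical (one event per minute for 30 days); the point is only that two low-cardinality columns replace one high-cardinality column.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1)

# Hypothetical event log: one timestamp per minute for 30 days.
stamps = [start + timedelta(minutes=i) for i in range(30 * 24 * 60)]

# Cardinality of the combined column versus the two split columns.
distinct_datetimes = len(set(stamps))
distinct_dates = len({s.date() for s in stamps})
distinct_times = len({s.time() for s in stamps})
```

In this sample the combined column has 43,200 distinct values, while the split columns have 30 distinct dates and 1,440 distinct times.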
Use integers instead of strings
Why is it undesired?
• Strings use dictionary (hash) encoding; integers can use value encoding, which
compresses more efficiently
Proposed Solution
• Check data types and set to integer if known to be numerical
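The extra cost of storing numbers as text can be sketched with a toy dictionary encoder in Python. This is a simplified illustration under stated assumptions, not the storage engine's actual implementation; the function name dictionary_encode and the sample account IDs are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with an integer id and return the lookup dictionary."""
    dictionary = {}
    ids = []
    for value in values:
        # Assign the next free id the first time a value is seen.
        ids.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, ids


# The same IDs stored as text and as integers.
account_ids_as_text = ["1001", "1002", "1001", "1003"]
account_ids_as_int = [int(v) for v in account_ids_as_text]

# Text needs both a dictionary and a column of ids pointing into it.
dictionary, ids = dictionary_encode(account_ids_as_text)
```

The text column ends up as a dictionary plus a list of ids, whereas the integer values can be stored as they are, without the extra lookup structure.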
Be careful with bi-directional relationships
Scenario
• Most relationships in the model are set to
bi-directional
Why is it undesired?
• Applying filters/slicers traverses many
relationships and can be slower
• Some filter chains are unlikely to add business
value
Proposed Solution
• Only use bi-directional relationships where the
business scenario requires it
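How far a filter travels under each setting can be sketched as a graph traversal in Python. This is a deliberately simplified picture: the table names and the function tables_reached are hypothetical, and real filter propagation in Power BI has additional rules (for example around ambiguous paths) that are not modeled here.

```python
from collections import deque

# Hypothetical model: (dimension side, fact side, bi-directional?)
relationships = [
    ("Customer", "Sales", False),
    ("Product", "Sales", False),
    ("Date", "Sales", False),
]


def tables_reached(start, rels):
    """Return the set of tables a filter on `start` propagates to."""
    edges = {}
    for dim_side, fact_side, both in rels:
        # Filters always flow from the dimension to the fact table.
        edges.setdefault(dim_side, []).append(fact_side)
        if both:
            # A bi-directional relationship also lets them flow back.
            edges.setdefault(fact_side, []).append(dim_side)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


single = tables_reached("Customer", relationships)
all_bidi = tables_reached(
    "Customer", [(dim, fact, True) for dim, fact, _ in relationships]
)
```

With single-direction relationships a Customer filter reaches only Sales; with every relationship set to bi-directional it also reaches Product and Date, so each slicer selection has more tables to filter.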
Set Default Summarization
Scenario
• Numeric columns in model that are purely
informational (e.g. Account ID)
• Default summarization is Sum
Why is it undesired?
• Power BI will try to sum the number when
dropped into visuals.
• Detailed tables/matrixes can be slower
Proposed Solution
• Set the default summarization to None
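Why summing an informational column is meaningless can be seen in a short Python sketch. The rows and column names are hypothetical.

```python
# Hypothetical rows: account_id is an identifier, balance is a real value.
rows = [
    {"account_id": 1001, "balance": 250.0},
    {"account_id": 1002, "balance": 100.0},
]

# What a default summarization of Sum would do to each column.
summed_ids = sum(r["account_id"] for r in rows)    # not a valid account
summed_balance = sum(r["balance"] for r in rows)   # a meaningful total
```

The summed identifier does not correspond to any account, which is why such columns are better left with no default summarization.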