Week 05 Implementing Dimensional Models
Assignment 05:
Implementing Dimensional Models
Part 1: Overview
This assignment introduces you to the process of data warehouse development, specifically dimensional
data store (DDS) models from the Kimball technical architecture. You will implement the physical model
(database tables, keys, and constraints) for your Detailed Dimensional Modeling workbook, and then test
that model by performing an initial data load using SQL queries, a fairly common practice in dimensional
model development.
Goals
The goals of this assignment are to:
Demonstrate general table, index, and key management in the SQL Query language, as a review
of your SQL experience
Teach you how to translate a detailed dimensional design specification into a ROLAP star-schema
implementation, including creating tables, defining primary and foreign keys, and populating
your schema with sample data
Reinforce that the nature of development is iterative, so you must be able to re-create your
structures and repopulate them with data at will
Effort
This assignment can be done individually or with a partner. If you work with a partner, do not simply
divide up the work. Collaborate with each other throughout the exercise as if you were working on the
same data warehousing team.
Technical Requirements
To complete this assignment you will need the following:
Access to the course ist-cs-dw1.ad.syr.edu SQL Server, and specifically the Northwind Traders
database. You should connect to this server before starting the assignment.
For Part 2: The completed dimensional modeling Excel workbook, titled
COMPLETED-Northwind-Detailed-Dimensional-Modeling-Workbook.xlsm,
available in the same place you got this lab.
For Part 3: Your own completed Detailed Dimensional Modeling workbook from the previous
assignment.
Microsoft Excel 2007 or higher for viewing and editing the worksheets.
IST722—Data Warehouse Assignment 05
Michael A. Fudge, Jr. Implementing Dimensional Models
1) A means to automate the process. You can construct the entire dimensional model simply by
executing a script.
2) Replayability. You can quickly reproduce a dimensional model in your test, development, and
production environments.
3) Source code control. SQL is code; code can be tracked using a software configuration
management (SCM) tool like Git or Subversion (SVN).
Considering the advantages above, coupled with how easy it is to learn SQL and to create objects in it, I
strongly recommend using SQL for database design projects. I will require its use for this course!
In addition to these commands are the types of database objects you can manipulate with them. Here
are some of the objects we’ll use in this class:
You can pair any DDL command with a database object to produce the command you need, for
example CREATE VIEW, DROP INDEX, or ALTER TABLE. At this point all that’s left are the details.
Create Table
To make a new table, use the CREATE TABLE statement. The syntax is:
CREATE TABLE tableName (
    column1 datatype NULL | NOT NULL,
    column2 datatype NULL | NOT NULL,
    ...
    columnN datatype NULL | NOT NULL,
    CONSTRAINT pkTableNameColumn PRIMARY KEY (column1)
);
Okay, let’s create some tables that might be used in a simple data mart.
Then press Ctrl+S to save your SQL script as college-data-mart.sql. When you’re done, it should have
the name of the SQL code file in your tab (see screen shot above).
What do these commands do? The first line in the script (USE) selects which database to work in. GO is
a batch separator: everything before it is sent to SQL Server as one batch, guaranteeing it is executed
before the script advances to the next command.
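For reference, a script of the shape described above might look like the following sketch. The database name ist722_netid_dw is a placeholder for your own data warehouse database, and the DimStudent columns are hypothetical, chosen only for illustration; they are not necessarily the columns shown in the original screenshot.

```sql
-- Select the database that will hold the table (placeholder name).
USE ist722_netid_dw;
GO

-- Hypothetical student dimension; every column name here is an assumption.
CREATE TABLE DimStudent (
    StudentKey   int IDENTITY(1,1) NOT NULL,  -- surrogate key generated by the database
    StudentID    varchar(20)       NOT NULL,  -- business key from the source system
    StudentName  varchar(100)      NOT NULL,
    StudentMajor varchar(50)       NOT NULL,
    CONSTRAINT pkDimStudentStudentKey PRIMARY KEY (StudentKey)
);
GO
```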
When you’re ready to execute your script, press the [F5] key. This will run your script and create your
table. If it works, you should see Command(s) completed successfully. in the Messages area below.
If you have an error, see if you can troubleshoot the issue by comparing the screenshot to your code on
the line number in question.
If you got it to work, press [F5] to execute your code again; you’ll notice that this time you will get an
error, because the table already exists. If you need to re-create it, you’ll have to execute a command to
get rid of it first.
Drop Table
The DROP TABLE command is used to remove a table (and all of the data therein) from the database.
DROP TABLE should be used with caution, and in general is only useful when building out a database
design.
Let’s modify our SQL script so that we can reexecute it by adding a DROP TABLE before the CREATE
TABLE.
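A minimal sketch of the re-executable script, assuming the same placeholder database name (ist722_netid_dw) and hypothetical DimStudent columns used for illustration only:

```sql
USE ist722_netid_dw;  -- placeholder database name
GO

-- Remove the old copy first so the script can be run repeatedly.
-- Plain DROP TABLE raises an error if the table does not exist yet;
-- on SQL Server 2016 or later, DROP TABLE IF EXISTS DimStudent; avoids that.
DROP TABLE DimStudent;
GO

-- Hypothetical columns, as an illustration only.
CREATE TABLE DimStudent (
    StudentKey  int IDENTITY(1,1) NOT NULL,
    StudentID   varchar(20)       NOT NULL,
    StudentName varchar(100)      NOT NULL,
    CONSTRAINT pkDimStudentStudentKey PRIMARY KEY (StudentKey)
);
GO
```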
NOTE: If your DimStudent has a red line under it, it’s probably because you didn’t create the table, or
you may simply need to refresh your IntelliSense cache. Try pressing Ctrl+Shift+R to see if that corrects
the problem.
Where Is My Table?
The really cool thing about a SQL script is that it creates real database objects that you can view
through the GUI of the database management system. Let’s see if we can find and open our
DimStudent table.
The Rest
Here’s the entire script with two dimension tables and one fact table. You will notice that I create my
fact table last, but my DROP TABLE for my fact table comes first. This is because of the fact table’s
foreign key dependencies. To preserve referential integrity, the DBMS prevents you from dropping any
table that is referenced by a foreign key (and believe me, this is a good thing). The fact table is full of
foreign keys, and so it must be dropped before the dimension tables.
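The drop-then-create ordering can be sketched as follows. This is not the original script: DimCourse, FactEnrollment, and all column names are hypothetical, and the OBJECT_ID guards are an addition so the script also succeeds on its very first run.

```sql
USE ist722_netid_dw;  -- placeholder database name
GO

-- Drop in dependency order: the fact table holds the foreign keys, so it goes first.
IF OBJECT_ID('dbo.FactEnrollment', 'U') IS NOT NULL DROP TABLE dbo.FactEnrollment;
IF OBJECT_ID('dbo.DimCourse', 'U') IS NOT NULL DROP TABLE dbo.DimCourse;
IF OBJECT_ID('dbo.DimStudent', 'U') IS NOT NULL DROP TABLE dbo.DimStudent;
GO

-- Create in the opposite order: dimensions first, fact table last.
CREATE TABLE dbo.DimStudent (
    StudentKey  int IDENTITY(1,1) NOT NULL,
    StudentName varchar(100)      NOT NULL,
    CONSTRAINT pkDimStudentStudentKey PRIMARY KEY (StudentKey)
);

CREATE TABLE dbo.DimCourse (
    CourseKey  int IDENTITY(1,1) NOT NULL,
    CourseName varchar(100)      NOT NULL,
    CONSTRAINT pkDimCourseCourseKey PRIMARY KEY (CourseKey)
);

CREATE TABLE dbo.FactEnrollment (
    StudentKey       int NOT NULL,
    CourseKey        int NOT NULL,
    CreditsAttempted int NOT NULL,
    CONSTRAINT pkFactEnrollment PRIMARY KEY (StudentKey, CourseKey),
    CONSTRAINT fkFactEnrollmentStudentKey FOREIGN KEY (StudentKey)
        REFERENCES dbo.DimStudent (StudentKey),
    CONSTRAINT fkFactEnrollmentCourseKey FOREIGN KEY (CourseKey)
        REFERENCES dbo.DimCourse (CourseKey)
);
GO
```

Reversing the first two DROP statements with the last one would fail on a second run, because DimStudent would still be referenced by the fact table's foreign key.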
Part 2: Walk-Through
In this part we will implement, populate, and test a DDS star schema for the Northwind Sales Reporting
business process. Here’s the excerpt from the Bus Matrix:
Our plan is to turn the business process you see above into the following dimensional data store (DDS)
star schema:
We will accomplish this by generating SQL code from the Northwind Detailed Dimensional-Modeling
workbook, which contains the detailed specification for the star schema. On to the steps!
Let’s start staging the source data. Open a query window, and save the query as
part2b-northwind-sales-data-stage.sql. Switch to your stage database (the name will vary from the
screenshot, of course):
Type it in and execute it to create the stage table, and then populate it with data from the source. We
want to save all the stage queries into this one file.
1. What data do we need? For the answer to this question, consult the DimEmployee table (the
eventual target)
It looks like we need EmployeeID, EmployeeName, and EmployeeTitle. When in doubt, refer to
the detailed design worksheet where you specified the source to target map.
2. Next write an SQL Select statement to acquire the data. Execute this:
Take a look at the output and make sure it’s the data you need.
NOTE: You might be tempted to combine first name and last name, as required by the target.
DO NOT DO THIS. Always stage data exactly as it appears from the source. Our goal is to have an
exact version of the source pipeline without being dependent on the availability of the actual
source. This allows us to design and implement the transformation logic over several iterations
(which you will probably need) without taxing the source.
3. Finally, when the data is what you need, it’s time to sock it away into a stage table, adding it to
the stage script including the INTO clause:
NOTE: The INTO clause of the SELECT statement creates the table and populates it with data. If
for some reason you “mess this up,” you will need to drop the table before you can execute this
statement again.
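Steps 2 and 3 above can be sketched as follows. The stage database name (ist722_netid_stage), the source database name (Northwind), and the stage table name (stgNorthwindEmployees) are assumptions for illustration; the column names are those of the standard Northwind Employees table.

```sql
USE ist722_netid_stage;  -- placeholder; your stage database name will differ
GO

-- Step 2: inspect the source columns the dimension needs, exactly as they appear.
SELECT EmployeeID, FirstName, LastName, Title
FROM Northwind.dbo.Employees;

-- Step 3: the same query with an INTO clause, which creates and fills the stage table.
SELECT EmployeeID, FirstName, LastName, Title
INTO stgNorthwindEmployees   -- hypothetical stage table name
FROM Northwind.dbo.Employees;

-- To rebuild the stage table, drop it before re-running the SELECT ... INTO:
-- DROP TABLE stgNorthwindEmployees;
```

Note that FirstName and LastName are staged as separate columns; combining them is left to the load step.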
If we take a peek at the Product dimension, you’ll see that this dimension is sourced not from one
table but from three: Products, Suppliers, and Categories.
Suppliers Table
Category Table
Should we stage all three tables? Or stage the query output of the join? The answer is, “It depends.”
Will Supplier or Category be used as a dimension in another dimensional model? If so, stage the
tables independently.
It is more convenient to stage the query output, and as this is just an academic exercise, we will go
that route:
After you execute the query, add the INTO clause to stage the data, adding to our
part2b-northwind-sales-data-stage.sql script:
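A sketch of staging the joined query output, assuming the standard Northwind table and column names. The stage table name stgNorthwindProducts and the choice of columns are assumptions; take the actual column list from the source-to-target map in the workbook.

```sql
-- Run this first without the INTO line to inspect the output,
-- then with INTO to create and populate the stage table.
SELECT p.ProductID, p.ProductName, p.Discontinued,
       s.CompanyName, c.CategoryName
INTO stgNorthwindProducts    -- hypothetical stage table name
FROM Northwind.dbo.Products AS p
    JOIN Northwind.dbo.Suppliers  AS s ON p.SupplierID = s.SupplierID
    JOIN Northwind.dbo.Categories AS c ON p.CategoryID = c.CategoryID;
```

The inner joins assume every product has both a supplier and a category; a product missing either one would be silently left out of the stage table.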
While we could stage the entire date dimension, we’ll use this example to demonstrate how to only
stage the data we need.
NOTE: The following is an academic exercise. Normally you would not stage a date dimension, let alone
in this manner.
How many dates do we need? To answer this question, let’s query the Orders table for the min and max
Order and Ship dates:
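A query along these lines returns the four boundary dates, assuming the source database is named Northwind and using the standard Orders columns OrderDate and ShippedDate:

```sql
-- Earliest and latest order and ship dates in the source.
SELECT MIN(OrderDate)   AS MinOrderDate,
       MAX(OrderDate)   AS MaxOrderDate,
       MIN(ShippedDate) AS MinShippedDate,
       MAX(ShippedDate) AS MaxShippedDate
FROM Northwind.dbo.Orders;
```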
It looks like we’re in good shape grabbing the years 1996 through 1998.
Here’s the SQL you should add to the script to stage the dates we require:
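The original statement is not reproduced here, so the following is only a sketch of the idea: copy just the 1996 through 1998 rows of an existing date dimension into a stage table. The source table name (ExternalSources.dbo.date_dimension), its Year column, and the stage table name (stgNorthwindDates) are all hypothetical; substitute the names that exist on your server.

```sql
-- Stage only the date rows the sales data can reference.
SELECT *
INTO stgNorthwindDates                       -- hypothetical stage table name
FROM ExternalSources.dbo.date_dimension     -- hypothetical source table name
WHERE [Year] BETWEEN 1996 AND 1998;         -- hypothetical column name
```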
But the dimension keys like CustomerKey and ProductKey are not in the staged data! What do we do?
We stage the natural keys from the data sources instead, so that later in the pipeline we can look up
the dimension keys. This is called the surrogate key pipeline, and it is fairly common when loading a
fact table. Here’s the SQL to complete the stage:
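A sketch of the fact stage, assuming the standard Northwind Orders and [Order Details] tables. The stage table name stgNorthwindSales is hypothetical. The natural keys (ProductID, CustomerID, EmployeeID) are staged alongside the dates and the raw measures.

```sql
SELECT od.ProductID, o.OrderID, o.CustomerID, o.EmployeeID,
       o.OrderDate, o.ShippedDate,
       od.UnitPrice, od.Quantity, od.Discount
INTO stgNorthwindSales       -- hypothetical stage table name
FROM Northwind.dbo.Orders AS o
    JOIN Northwind.dbo.[Order Details] AS od ON o.OrderID = od.OrderID;
```

Each staged row is one order line, which is the grain the fact calculations (quantity, price, discount) require.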
…and one script that will stage the data on demand by re-creating the tables. Now that the extraction is
done, it’s time to write a script to load staged data into our star schema!
Important Tip: Document your findings in this step, and this will help you plan out the actual ETL
process later on. For example, if you need to combine two columns into one or replace NULL with a
value, then this will need to happen in your ETL tooling later on.
Since the fact table depends on the dimensions, we should load the dimensions first.
First open a new query and save it as part2c-northwind-sales-data-load.sql. Execute this command to
switch to your data warehouse database (again, this will be different based on your user id).
Loading DimEmployee
Since the dimension tables exist already and we do not want to replace them, we cannot use the SELECT
… INTO statement. Instead we will use the INSERT INTO … SELECT pattern to add data to the dimension.
That looks good. Now that the data shape of our source matches the target dimension, we can load it
with this statement:
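A minimal sketch of the INSERT INTO … SELECT pattern for this dimension. It assumes a stage table named stgNorthwindEmployees in a stage database named ist722_netid_stage (both hypothetical names), and that EmployeeKey is generated by the database, so it is not listed among the target columns.

```sql
USE ist722_netid_dw;  -- placeholder database name
GO

INSERT INTO DimEmployee (EmployeeID, EmployeeName, EmployeeTitle)
SELECT EmployeeID,
       FirstName + ' ' + LastName AS EmployeeName,  -- transformation happens at load time
       Title                      AS EmployeeTitle
FROM ist722_netid_stage.dbo.stgNorthwindEmployees;
```

Running the SELECT on its own first is a cheap way to confirm that the shape of the output matches the target columns before anything is inserted.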
NOTE: It’s mere coincidence that the values of EmployeeKey and EmployeeID match up. Do not assume
this will always be the case!
Loading DimCustomer
You probably have gotten this figured out by now. Here’s the SQL to load the Customer Dimension:
This is a fairly common error. The source permits null, but the dimension, as a best practice, does not. As
such, we need to replace NULL with a default value, such as ‘N/A’. To accomplish this, I use the CASE
WHEN expression. Here’s the updated code, fixing this problem in CustomerRegion and
CustomerPostalCode:
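The NULL replacement can be sketched like this. CustomerRegion and CustomerPostalCode come from the text above; the other target column names, the stage table name (stgNorthwindCustomers), and the stage database name are assumptions, so match them to your own workbook.

```sql
INSERT INTO DimCustomer
    (CustomerID, CustomerName, CustomerRegion, CustomerPostalCode)
SELECT CustomerID,
       CompanyName,
       -- Replace NULL with a default so the NOT NULL dimension columns accept the row.
       CASE WHEN Region     IS NULL THEN 'N/A' ELSE Region     END,
       CASE WHEN PostalCode IS NULL THEN 'N/A' ELSE PostalCode END
FROM ist722_netid_stage.dbo.stgNorthwindCustomers;
```

ISNULL(Region, 'N/A') or COALESCE(Region, 'N/A') would give the same result more compactly; CASE WHEN is shown because it is the expression named above.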
Page 15 of 25
IST722—Data Warehouse Assignment 05
Michael A. Fudge, Jr. Implementing Dimensional Models
Loading DimProduct
Try to load the product dimension on your own. Here’s your approach:
Loading DimDate
Next load the Date Dimension on your own, again mapping the columns from source to target, and
consult the Excel worksheet for the meaning of the columns. This step should be trivial.
For each foreign key in the fact table, match the source data business key to the dimension business key
so we can look up the dimension primary key.
The output:
If we repeat this process for the other dimensions of product and employee, we should get this:
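The three lookups can be sketched as one query that joins the staged sales rows to each dimension on its business key and returns the surrogate key. The stage table name (stgNorthwindSales), the stage database name, and the assumption that each dimension stores its business key in a column named like the source key (CustomerID, EmployeeID, ProductID) are all illustrative.

```sql
SELECT s.OrderID,
       c.CustomerKey,   -- looked up from CustomerID
       e.EmployeeKey,   -- looked up from EmployeeID
       p.ProductKey,    -- looked up from ProductID
       s.OrderDate, s.ShippedDate,
       s.UnitPrice, s.Quantity, s.Discount
FROM ist722_netid_stage.dbo.stgNorthwindSales AS s
    JOIN DimCustomer AS c ON s.CustomerID = c.CustomerID
    JOIN DimEmployee AS e ON s.EmployeeID = e.EmployeeID
    JOIN DimProduct  AS p ON s.ProductID  = p.ProductID;
```

If this query returns fewer rows than the stage table holds, some business key has no match in its dimension, which is worth investigating before loading the fact table.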
That gives us everything but the date dimension keys. This can be handled differently because date keys
are predictable. For example, the date key for Aug-12-2015 would be 20150812. You can generate a
date key with some simple formatting to YYYYMMDD. Fortunately, I’ve written a SQL function for you to
do this. It is accessible from the ExternalSources database.
Here’s the code completing the surrogate key pipeline, generating the Order and Shipping date keys:
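The course function is not reproduced here. As a stand-in, the same YYYYMMDD key can be produced with built-in T-SQL: CONVERT with style 112 formats a date as yyyymmdd, and CAST turns that text into an integer. The stage table and database names and the key column aliases are assumptions.

```sql
-- Built-in alternative to a date key function (not the function provided for the course).
SELECT OrderID,
       CAST(CONVERT(char(8), OrderDate,   112) AS int) AS OrderDateKey,
       CAST(CONVERT(char(8), ShippedDate, 112) AS int) AS ShippedDateKey  -- NULL if not shipped
FROM ist722_netid_stage.dbo.stgNorthwindSales;

-- Quick check against the example above: Aug-12-2015 becomes 20150812.
SELECT CAST(CONVERT(char(8), CAST('2015-08-12' AS date), 112) AS int) AS DateKey;
```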
Fact                   Source
Quantity               Quantity
ExtendedPriceAmount    Quantity * UnitPrice
DiscountAmount         Quantity * UnitPrice * Discount
SoldAmount             Quantity * UnitPrice * (1 - Discount)
Here’s the query with just the columns we need and the required facts:
And the final query, which should be added to the SQL file, now becomes trivial:
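Putting the pieces together, the final load might look like the sketch below. The fact names (Quantity, ExtendedPriceAmount, DiscountAmount, SoldAmount) follow the fact definitions given earlier; the fact table name FactSales, the date key column names, and the stage names are hypothetical, and the date keys are derived with built-in CONVERT style 112 in place of the course function.

```sql
USE ist722_netid_dw;  -- placeholder database name
GO

INSERT INTO FactSales
    (ProductKey, CustomerKey, EmployeeKey, OrderDateKey, ShippedDateKey,
     OrderID, Quantity, ExtendedPriceAmount, DiscountAmount, SoldAmount)
SELECT p.ProductKey, c.CustomerKey, e.EmployeeKey,
       CAST(CONVERT(char(8), s.OrderDate,   112) AS int),
       CAST(CONVERT(char(8), s.ShippedDate, 112) AS int),  -- NULL if not yet shipped
       s.OrderID,
       s.Quantity,
       s.Quantity * s.UnitPrice,                    -- ExtendedPriceAmount
       s.Quantity * s.UnitPrice * s.Discount,       -- DiscountAmount
       s.Quantity * s.UnitPrice * (1 - s.Discount)  -- SoldAmount
FROM ist722_netid_stage.dbo.stgNorthwindSales AS s
    JOIN DimCustomer AS c ON s.CustomerID = c.CustomerID
    JOIN DimEmployee AS e ON s.EmployeeID = e.EmployeeID
    JOIN DimProduct  AS p ON s.ProductID  = p.ProductID;
```

If ShippedDateKey is declared NOT NULL in your design, unshipped orders need a default key (for example, one pointing at an "unknown date" row) before this insert will succeed.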
This view should contain all the tables in our star schema:
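A sketch of such a view, joining the fact table to every dimension so that a reporting tool sees one flat result. The view name SalesMart appears later in this walk-through; the fact table name and all column names below are assumptions and should be replaced with the ones from your workbook.

```sql
USE ist722_netid_dw;  -- placeholder database name
GO

-- CREATE VIEW must be the first statement in its batch, hence the GO above.
CREATE VIEW SalesMart AS
SELECT f.OrderID, f.Quantity, f.ExtendedPriceAmount, f.DiscountAmount, f.SoldAmount,
       c.CustomerName, c.CustomerRegion,
       e.EmployeeName, e.EmployeeTitle,
       p.ProductName,
       d.[Year] AS OrderYear                        -- hypothetical DimDate column
FROM FactSales AS f
    JOIN DimCustomer AS c ON f.CustomerKey  = c.CustomerKey
    JOIN DimEmployee AS e ON f.EmployeeKey  = e.EmployeeKey
    JOIN DimProduct  AS p ON f.ProductKey   = p.ProductKey
    JOIN DimDate     AS d ON f.OrderDateKey = d.DateKey;
GO
```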
3. Enter our class SQL Server information into the Data Connection Wizard dialog, and then click
Next>
4. Select your ist722_netid_dw database and select the SalesMart view. Then click Next>
6. This final screen allows you to decide what you need to do with the data. For this example,
choose Table and then select Existing worksheet and click OK.
Drag and drop the fields into the row, column, or values area of the pivot table and see if you can
make the following pivot table reports:
The queries are easy; the possibilities are endless, and you don’t need to write a single line of SQL to get
your questions answered. That, friends, is business intelligence!
In this part you will repeat the process outlined in Part 2, but this time you’ll use your own detailed
dimensional design as opposed to the one I’ve provided.
Instructions
Make sure your name and email address appear at the top of each SQL script. I should be able to
execute your scripts, and they should work (create tables, import data, etc.).
Turning It In
Please turn in all files from Part 3. Make sure your name, NetID, and date appear somewhere on each of
the files you include.
If you worked with a partner, please indicate that in your assignment by including your partner’s name
and NetID. You should both submit the assignment individually.