SAS Data Integration Studio: Fast Track
Course Notes
SAS Data Integration Studio: Fast Track Course Notes was developed by Linda Jolley, Kari Richardson,
Eric Rossland, and Christine Vitron. Editing and production support was provided by the Curriculum
Development and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
Copyright © 2009 SAS Institute Inc., Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
Book code E1477, course code DIFT, prepared date 31Jul2009.
ISBN 978-1-60764-048-6
Table of Contents
Prerequisites ................................................................................................................................ ix
1.1 Exploring the Platform for SAS Business Analytics ....................................................... 1-3
Exercises.................................................................................................................. 3-82
Exercises.................................................................................................................. 5-68
7.2 Using Extract, Summary Statistics, and Loop Transformations ...................................... 7-7
Demonstration: Using the Extract and Summary Statistics Transformations ............. 7-9
Demonstration: Using the Loop Transformations ................................................... 7-34
Exercises.................................................................................................................. 7-54
7.5 Using Transpose, Sort, Append, and Rank Transformations ......................................... 7-95
Demonstration: Using the Transpose, Sort, Append, and Rank
Transformations............................................................................. 7-99
7.6 Basic Standardization with the Apply Lookup Standardization Transformation ......... 7-123
Demonstration: Using the Apply Lookup Standardization Transformation .......... 7-125
Exercises................................................................................................................ 7-138
Chapter 8 Working with Tables and the Table Loader Transformation ............. 8-1
8.3 Table Properties and Load Techniques of the Table Loader Transformation ................. 8-14
9.2 Using the SCD Type 2 Loader and Lookup Transformations ........................................ 9-15
Demonstration: Populate Star Schema Tables Using the SCD Type 2 Loader
with the Surrogate Key Method .................................................... 9-29
Course Description
This intensive training course provides accelerated learning for students who will register sources
and targets; create and deploy jobs; work with transformations; set up change management; work with
slowly changing dimensions; and understand status handling and change data capture. The course is for
individuals who are comfortable learning large amounts of information in a short period of time. The
&di1 and &di2 courses present the same information in a more detailed fashion over a longer period
of time.
To learn more
For information on other courses in the curriculum, contact the SAS Education
Division at 1-800-333-7660, or send e-mail to training@sas.com. You can also
find this information on the Web at support.sas.com/training/ as well as in the
Training Course Catalog.
For a list of other SAS books that relate to the topics covered in this
Course Notes, USA customers can contact our SAS Publishing Department at
1-800-727-3228 or send e-mail to sasbook@sas.com. Customers outside the
USA, please contact your local SAS office.
Also, see the Publications Catalog on the Web at support.sas.com/pubs for a
complete list of books and a convenient order form.
Prerequisites
Experience with SAS programming, SQL processing, and the SAS macro facility is required. This
experience can be gained by completing the SAS Programming 1: Essentials, SAS SQL 1: Essentials,
and SAS Macro Language 1: Essentials courses.
Chapter 1 Introduction
1.1 Exploring the Platform for SAS Business Analytics ................................................... 1-3
Objectives
Compare the two types of SAS installations.
Define the architecture of the platform for SAS
Business Analytics.
Describe the SAS platform applications used for data
integration, reporting, and analysis.
1.1 Exploring the Platform for SAS Business Analytics

The platform for SAS Business Analytics is also known as the SAS Enterprise Intelligence
Platform and the SAS Intelligence Platform. Its architecture is organized into tiers, including
a middle tier, a server tier, and a data tier.
SAS platform applications cannot execute SAS code on their own. They must request code
submission and other services from a SAS server.
SAS Add-In for Microsoft Office
   The SAS Add-In for Microsoft Office enables business users to transparently leverage the power
   of SAS analytics, reporting, and data access directly from Microsoft Office via integrated menus
   and toolbars.
SAS Data Integration Studio
   SAS Data Integration Studio enables a data warehouse developer to create and manage metadata
   objects that define sources, targets, and the sequence of steps for the extraction, transformation,
   and loading of data.
SAS Enterprise Guide
   SAS Enterprise Guide provides a guided mechanism to exploit the power of SAS and publish
   dynamic results throughout the organization. SAS Enterprise Guide can also be used for
   traditional SAS programming.
SAS Information Delivery Portal
   The SAS Information Delivery Portal is a Web application that can surface the different types of
   business analytic content such as information maps, stored processes, and reports.
SAS Information Map Studio
   SAS Information Map Studio is used to build information maps, which shield business users from
   the complexities of the underlying data by organizing and referencing data in business terms.
SAS Management Console
   SAS Management Console provides a single interface for managing the metadata of the SAS
   platform. Specific administrative tasks are supported by plug-ins to the SAS Management Console.
SAS OLAP Cube Studio
   SAS OLAP Cube Studio is used to create OLAP cubes, which are multidimensional structures of
   summarized data. The Cube Designer provides a point-and-click interface for cube creation.
SAS Visual BI (JMP)
   SAS Visual BI, powered by JMP software, provides dynamic business visualization, enabling
   business users to interactively explore ideas and information, investigate patterns, and discover
   previously hidden facts through visual queries.
SAS Web OLAP Viewer
   The SAS Web OLAP Viewer provides a Web interface for viewing and exploring OLAP data. It
   enables business users to look at data from multiple angles, view increasing levels of detail, and
   add linked graphs.
SAS Web Report Studio
   SAS Web Report Studio provides intuitive and efficient access to query and reporting capabilities
   on the Web.
dfPower Studio
   dfPower Studio from DataFlux (a SAS company) combines advanced data-profiling capabilities
   with proven data quality, integration, and augmentation tools for incorporating data quality into
   a data collection and management process.
Examples of metadata object types include channels, cubes, jobs, and libraries.
My Folder is a shortcut to the personal folder of the user who is currently logged on.
Products contains folders for individual SAS products. These folders contain content that is
installed along with the product. For example, some products have a set of initial
jobs, transformations, stored processes, or reports which users can modify for their
own purposes. Other products include sample content (for example, sample stored
processes) to demonstrate product capabilities. Where applicable, the content is
stored under the product's folder in subfolders that indicate the release number for
the product.
You can also create additional folders under SAS Folders in which to store
shared content.
Follow these best practices when interacting with SAS folders:
Use personal folders for personal content and shared folders for content that multiple users need to view.
Use folders instead of custom repositories to organize content.
Do not delete or rename the Users folder.
Do not delete or rename the home folder or personal folder of an active user.
Do not delete or rename the Products or System folders or their subfolders.
Use caution when renaming the Shared Data folder.
When you create new folders, the security administrator should set permissions.
Objectives
State the purpose of SAS Data Integration Studio.
State the purpose of dfPower Studio.
Explore the available interfaces.
1.2 Introduction to Data Integration Applications
(Figure: the SAS Data Integration Studio desktop, showing the menu bar, toolbar, and status bar)
The title bar shows the current version of SAS Data Integration Studio, as well as the name of the current
connection profile.
The menu bar provides access to the drop-down menus. The list of active options varies according to the
current work area and the kind of object that you select. Inactive options are disabled or hidden.
The toolbar provides access to shortcuts for items on the menu bar. The list of active options varies
according to the current work area and the kind of object that you select. Inactive options are disabled or
hidden.
The status bar displays the name of the currently selected object, the name of the default SAS Application
Server if one has been selected, the login ID and metadata identity of the current user, and the name of the
current SAS Metadata Server. To select a different SAS Application Server, double-click the name of that
server to display a dialog box. If the name of the SAS Metadata Server turns red, the connection is
broken. In that case, you can double-click the name of the metadata server to display a dialog box that
enables you to reconnect.
Tree View
The tree view provides access to the Basic Properties pane, Folders tree, Inventory tree, Transformations
tree, and Checkouts tree.
The Basic Properties pane displays the basic properties of an object selected in a tree view. To surface
this pane, select View → Basic Properties from the desktop.
The Folders tree organizes metadata into folders that are shared across a number of SAS applications.
The Inventory tree displays metadata for objects that are registered on the current metadata server, such as
tables and libraries. Metadata can be accessed in folders that group metadata by type, such as Table,
Library, and so on.
The Transformations tree displays transformations that can be dragged and dropped into SAS Data
Integration Studio jobs.
The Checkouts tree displays metadata that has been checked out for update, as well as any new metadata
that has not been checked in. The Checkouts tree is not displayed in the view of SAS Data Integration
Studio shown above; it automatically appears when you are working under change management.
Job Editor
The Job Editor window enables you to create, maintain, and troubleshoot SAS Data Integration Studio
jobs.
The Diagram tab is used to build and update the process flow for a job.
The Code tab is used to review or update code for a job.
The Log tab is used to review the log for a submitted job.
The Output tab is used to review the output of a submitted job.
The Details pane is used to monitor and debug a job in the Job Editor.
This demonstration illustrates logging on to SAS Data Integration Studio and investigating the interface by
using predefined metadata objects.
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Bruno's credentials.
a. Verify that the connection profile is My Server.
b. Click OK to close the Connection Profile window and to open the Log On window.
c. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
Some folders in the Folders tree are provided by default, such as My Folder, Products,
Shared Data, System, and Users.
Three folders (and subfolders) were added by an administrator: Chocolate Enterprises, Data
Mart Development, and Orion Star.
4. Click the plus sign in front of the Data Mart Development folder to expand the folder.
The DIFT Demo folder contains seven metadata objects: two library objects, four table objects, and
one job object.
Each metadata object has its own type of properties.
6. Single-click on the DIFT Test Table ORDER_ITEM table object. The Basic Properties pane
displays basic information for this table object.
7. Single-click on the DIFT Test Source Library library object. The Basic Properties pane displays
basic information for this library object.
8. Single-click on the DIFT Test Job OrderFact Table Plus job object. The Basic Properties pane
displays basic information for this job object.
The name of the metadata table object is shown on the General tab, as well as the metadata folder
location.
The Columns tab displays the column attributes of the physical table. Note that all columns are
numeric.
The Physical Storage tab displays the type of table, the library object name, and the name of the
physical table.
10. Right-click on DIFT Test Table ORDER_ITEM and select Open. The View Data window opens
and displays the data for this table.
The functions of the View Data window are controlled by the View Data toolbar:
Positions the data with the Go-to row as the first data line displayed.
Displays the Sort By Column tab in the View Data Options window.
11. To close the View Data window, select File → Close (or click the Close button).
The name of the metadata library object is shown on the General tab, as well as the metadata folder
location.
The Options tab displays the library reference (libref) and the physical path of this library.
13. Display the generated LIBNAME statement for this library object by right-clicking on
DIFT Test Source Library and selecting View Libname.
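Selecting View Libname displays the LIBNAME statement that SAS Data Integration Studio generates
from the library metadata. A minimal sketch of what such a statement looks like (the libref and path
here are illustrative assumptions, not the values registered in the demonstration metadata):

   /* Hypothetical generated LIBNAME statement: the libref, engine, */
   /* and physical path all come from the library metadata object.  */
   LIBNAME difttest BASE "S:\Workshop\dift\data";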
15. Access the Job Editor window to examine the properties of the job objects in more detail.
a. Right-click on DIFT Test Job OrderFact Table Plus and select Open.
This job joins two source tables and then loads the result into a target table. The target table is then
used as the source for the Rank transformation; the result of the ranking is loaded into a target table,
sorted, and then a report is generated based on the rankings.
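Behind the diagram, SAS Data Integration Studio generates ordinary SAS code for each node. A rough
hand-coded equivalent of this flow, assuming hypothetical librefs (diftsrc, difttgt) and columns
(Order_ID, Customer_ID, Order_Date, Total_Retail_Price):

   /* Join the two source tables and load the result into the fact table. */
   proc sql;
      create table difttgt.orderfact as
         select oi.*, o.Customer_ID, o.Order_Date
         from diftsrc.order_item as oi inner join
              diftsrc.orders as o
              on oi.Order_ID = o.Order_ID;
   quit;

   /* Rank the fact rows, then sort and report on the ranked table. */
   proc rank data=difttgt.orderfact out=difttgt.ranked_orderfact descending;
      var Total_Retail_Price;
      ranks Price_Rank;
   run;

   proc sort data=difttgt.ranked_orderfact;
      by Price_Rank;
   run;

   proc print data=difttgt.ranked_orderfact(obs=10);
   run;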
b. Click the DIFT Test Table ORDERS table object. Note that the Details area now has a
Columns tab.
The Columns tab in the Details area displays column attributes for the selected table object.
These attributes are fully editable in this location.
Similarly, selecting any of the table objects in the process flow diagram (DIFT Test Table
ORDERS, DIFT Test Table ORDER_ITEM, DIFT Test Target Order
Fact Table (which appears twice in the diagram), DIFT Test Target Ranked Order Fact)
displays a Columns tab for that table object.
c. Click the SQL Join transformation. Note that the Details area now has a Mappings tab.
The full functionality of the Mappings tab from the SQL Join Designer window is found on this
Mappings tab.
Similarly, selecting any of the transformations in the process flow diagram (SQL Join, Table
Loader, Rank, Sort, List Data) displays a Mappings tab for that transformation.
d. Run the job by clicking Run. As the transformations execute, they are highlighted to denote
which node is executing.
As each transformation finishes, the icon is decorated with a symbol to denote success or failure.
Those transformations that had errors are also outlined in red.
Also, the Status tab in the Details area provides the status for each part of the job that executed.
e. Double-click the word Error under Status for the Table Loader.
The Details area moves focus to the Warnings and Errors tab. The error indicates that the physical
location for the target library does not exist.
f. Select DIFT Test Target Library found in the Data Mart Development → DIFT Demo folder
on the Folders tab.
The Basic Properties pane displays a variety of information, including the physical path location.
h. Run the entire job again by clicking Run. The Details area shows that all but the List Data
transformation completed successfully.
i. Double-click the word Error under Status for the List Data transformation.
The Details area moves focus to the Warnings and Errors tab. The error indicates that the physical file
does not exist. However, because the file is to be created by the transformation, it is more likely
that the location for the file does not exist.
The Status tab of the Details pane shows that the transformation completed successfully.
l. Select File → Close to close the Job Editor window. If any changes were made while
viewing the job, a window opens asking whether to save them.
16. Investigate some of the options available for SAS Data Integration Studio by selecting Tools →
Options.
17. Examine the Show advanced property tabs option (this option is on the General tab of the Options
window).
a. If Show advanced property tabs is deselected,
then tabs such as Extended Attributes and Authorization do not appear in the
properties window for a specified object.
18. Examine the Enable row count on basic properties for tables option (this option is on the General
tab of the Options window).
a. If Enable row count on basic properties for tables is deselected,
then the Number of Rows field displays Row count is disabled for a selected
table object.
If it is selected, then the Number of Rows field displays the number of rows found for the selected
table object.
a. Click to establish and/or test the application server connection for SAS Data
Integration Studio. An information window opens verifying a successful connection:
The application server can also be set and tested via the status bar. If the application server
has not been defined, double-clicking that area of the status bar opens the Default Application
Server window, where a selection can be made and tested.
the resultant objects in the Diagram area are then drawn as the following:
b. Verify that the default selection in the layout area is Left To Right.
This results in process flow diagrams going horizontally, such as the following:
The options on this tab affect how data are displayed in the View Data window.
a. Verify the default selection for the Column headers area is Show column name in column
header.
If Show column description in column header is selected in the Column headers area
If both Show column name in column header and Show column description in column
header are selected in the Column headers area
a. Verify that the following fields are set appropriately in the Data Quality area:
Default Locale: ENUSA
DQ Setup Location: C:\Program Files\SAS\SASFoundation\9.2\dquality\sasmisc\dqsetup.txt
Scheme Repository: C:\Program Files\DataFlux\QltyKB\CI\2008A\scheme
b. Verify the path specified for the DataFlux Installation Folder field under the DataFlux
dfPower area.
23. Select the Tools menu. Note that there is an item, dfPower Tool, that provides direct access to many
of the DataFlux dfPower Studio applications.
1. From the SAS Data Integration Studio session, select Tools → dfPower Tool → dfPower Explorer.
2. Create a new project.
a. Select File → New Project.
If the DIFT Repository has not been created, then follow these steps to create and
initialize it:
From SAS Data Integration Studio, select Tools → dfPower Tool → dfPower Studio.
In the Navigation area, right-click on Repositories and select New Repository.
a. Click .
1) Type DIFT Orion Detail as the value for the Description field.
2) Click next to the Directory field. The Browse for Folder window opens.
3) Navigate to S:\Workshop\OrionStar\ordetail.
4) Click to close the Browse for Folder window. The Directory field displays
the selected path.
d. Click .
h. Click .
i. Type DIFT Orion Detail Project as the value for the Project name field.
j. Type DIFT Orion Detail Project as the value for the Description field.
k. Click .
The results are displayed in dfPower Explorer. Four tables were analyzed, and the four tables
contain thirty columns in total.
5. Click the ORDER_ITEM table in the Matching Tables area. Having both tables selected
displays the relationship between the two tables.
7. Click PRODUCT_LIST in the Matching Tables area. The Product_ID column could
potentially link these tables.
Before initiating any data warehousing project, it is important to first examine the data and identify any
potential issues that may exist.
1. From dfPower Explorer, right-click on the CUSTOMER table in the Database area and select
Add Table to Profile Task.
The table and all of its columns are added to the Profile Job Definition & Notes area.
2. From dfPower Explorer, right-click on the ORDER_ITEM table in the Database area and select
Add Table to Profile Task.
3. From dfPower Explorer, right-click on the PRODUCT_LIST table in the Database area and select
Add Table to Profile Task.
4. Collapse the listing of columns for each of the tables in the Profile Job Definition &
Notes area.
6. Type DIFT Orion Detail Information as the value for the Name field.
10. From the SAS Data Integration Studio session, select Tools → dfPower Tool →
dfPower Profile (Configurator).
13. Click .
If a dfPower Profile job is not available (for instance, one was not created using dfPower
Explorer), SAS data can be added by using the following steps:
Select Insert → SAS Data Set Directory.
Type DIFT Orion Detail Data as the value for the Description field.
Click next to the Directory field. The Browse for Folder window opens.
Navigate to S:\Workshop\OrionStar\ordetail.
Click to close the Browse for Folder window.
The Directory field displays the selected path.
Click to close the Insert SAS Data Set Directory window.
The link to the SAS Data Set Directory appears in the database listing.
14. Expand the DIFT Orion Detail data source. A list of available SAS tables is displayed. The ones
selected are the ones added from dfPower Explorer.
If you did not open an existing job in dfPower Profile (Configurator) and you attempt to run a
job, a warning window opens.
Clicking through the warning opens the Save As window. Typing a valid name and then saving
displays the Run Job window shown above.
24. Click to close the Run Job window. The Executor executes the job.
The columns from the CUSTOMER table are listed in the Tables area with a tabular view of each
column and its calculated statistics.
32. In the Metrics area, select only the Data Length, Maximum Length, and Minimum Length
statistics.
34. Click .
35. Select File → Exit to close the dfPower Profile (Viewer) window.
36. Select File → Exit to close the dfPower Profile (Configurator) window.
1.3 Introduction to Change Management
Objectives
Define the change management feature of
SAS Data Integration Studio.
Change Management
The Change Management facility in SAS Data Integration
Studio enables multiple SAS Data Integration Studio
users to work with the same metadata repository at the
same time without overwriting each other's changes.
Checkouts Tree
If you are authorized to work with a project repository, a
Checkouts tree is added to the desktop of SAS Data
Integration Studio.
The Checkouts tree displays metadata in your project
repository, which is an individual work area or playpen.
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Barbara's credentials to access her project repository.
a. Select Barbara's Work Repository as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
3. Double-click on the application server area of the status bar to open the Default Application Server
window.
4. Verify that SASApp is selected as the value for the Server field.
5. Click .
7. Click OK to close the Default Application Server window. The status bar updates to show the
selected server.
8. Verify that the tree view area now has a Checkouts tab.
This tab displays metadata objects checked out of the parent repository, as well as any new objects
that Barbara creates.
9. If necessary, click the Folders tab.
10. Expand the Data Mart Development → DIFT Demo folders.
11. Select the DIFT Test Job OrderFact Table Plus job, hold down the CTRL key, and select both
DIFT Test Source Library and DIFT Test Table ORDER_ITEM.
12. Right-click on one of the selected items and select Check Out.
The icons for the three objects are decorated with a check mark.
17. Right-click on DIFT Test Table ORDER_ITEM and select Check In (optionally, select
Check Outs → Check In with the table object selected).
The Check In Wizard opens.
18. Type Testing out Change Management as the value for the Title field.
20. Click .
22. Click .
24. Click .
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Ole's credentials to access his project repository.
a. Select Ole's Work Repository as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Ole as the value for the User ID field and Student1 as the value for the Password
field.
4. Verify that the tree view area now has a Checkouts tab.
7. Right-clicking on DIFT Test Source Library (or on DIFT Test Job OrderFact Table Plus) shows
that the Check Out option is not available for this checked out object.
Ole can tell that Barbara has the object checked out.
9. Select File → Close to close the History window.
Ole can tell that Barbara had this object checked out and that it was checked back in. The title and
description information filled in by Barbara in the Check In Wizard can give Ole an idea of what
updates Barbara made to this metadata object.
11. Select File → Close to close the History window.
12. Right-click on DIFT Test Table ORDER_ITEM and select Check Out.
13. Click the Checkouts tab and verify that the table object is available for editing.
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Ahmed's credentials (an administrator) to access the Foundation repository.
a. Select My Server as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Ahmed as the value for the User ID field and Student1 as the value for the
Password field.
Clearing a project repository unlocks checked out objects (any changes made to these checked out
objects will not be saved) and deletes any new objects that may have been created in the project
repository.
6. Select both repositories (that is, select Barbara's Work Repository, hold down the CTRL key, and
select Ole's Work Repository).
7. Click .
8. Verify that the checked out objects are no longer checked out.
10. Verify that the change Ole made to the Description field was not saved.
12. Select File → Exit to close Ahmed's SAS Data Integration Studio session.
14. Select View → Refresh. The Checkouts tab was active, so the metadata for the project repository is
refreshed.
17. Select File → Exit to close Ole's SAS Data Integration Studio session.
21. Select File → Exit to close Barbara's SAS Data Integration Studio session.
Chapter 2 Introduction to Course
Data and Course Scenario
Objectives
Define common job roles.
Define the classroom environment.
Explore the course scenario.
Business Analyst
Business User
Platform Administrator
Project Manager
2.1 Introduction to Classroom Environment and Course Data

Classroom Environment
During this course, you will use a classroom machine on
which the SAS platform has been installed and configured
in a single machine environment.
Course Data
The data used in the course is from a fictitious global
sports and outdoors retailer named Orion Star Sports &
Outdoors.
Course Data
The Orion Star data used in the course consists of the
following:
data ranging from 2003 through 2007
64 suppliers
Course Scenario
During this course, you will have the opportunity to learn
about SAS Data Integration Studio as a data integration
developer.
Objectives
Define the tasks for the course scenario.
Define the data model to be used for the data mart.
Course Tasks
There are several main steps you will accomplish during
this class.
Step 1: Register metadata for source tables.
Step 2: Register metadata for target tables.
Step 3: Create jobs to load tables.
Step 4: Investigate a variety of transformations.
Step 5: Investigate table relationships.
Step 6: Investigate slowly changing dimensions.
Step 7: Develop user-defined transformations.
Step 8: Deploy jobs.
2.2 Course Tasks
For this step, you will define metadata for several target tables.
(Figure: star schema with the Order Fact Table at the center, joined to the Organization,
Customer, Product, and Time Dimensions)
Exercises
Product Table
Place an X in the column to indicate whether the data item will be used to classify data or as an
analysis variable. Add any additional data items that you think are needed.
Product_Name
Product_Group
Product_Category
Product_Line
Supplier_ID
Supplier_Name
Supplier_Country
Discount
Total_Retail_Price
CostPrice_Per_Unit
The following table contains a data dictionary for the columns of the source tables.

Column               Table         Type  Length  Format      Label
Birth_Date           CUSTOMER      Num   4       DATE9.      Customer Birth Date
                     STAFF         Num   4       DATE9.      Employee Birth Date
City_ID              CITY          Num   8                   City ID
                     POSTAL_CODE   Num   8                   City ID
                     STREET_CODE   Num   8                   City ID
City_Name            CITY          Char  30                  City Name
                     POSTAL_CODE   Char  30                  City Name
                     STREET_CODE   Char  30                  City Name
Continent_ID         CONTINENT     Num   4                   Continent ID
                     COUNTRY       Num   4                   Numeric Rep. for Continent
Continent_Name       CONTINENT     Char  30                  Continent Name
CostPrice_Per_Unit   ORDER_ITEM    Num   8       DOLLAR13.2  Cost Price Per Unit
Count                STREET_CODE   Num   4                   Frequency
Country              CITY          Char  2       $COUNTRY.   Country
                     COUNTRY       Char  2                   Country Abbreviation
                     CUSTOMER      Char  2       $COUNTRY.   Customer Country
                     GEO_TYPE      Char  2                   Country Abbreviation
                     HOLIDAY       Char  2                   Country's Holidays
                     ORGANIZATION  Char  2       $COUNTRY.   Country Abbreviation
                     STATE         Char  2       $COUNTRY.   Abbreviated Country
                     STREET_CODE   Char  2       $COUNTRY.   Abbreviated Country
                     SUPPLIER      Char  2       $COUNTRY.   Country
Country_Former_Name  COUNTRY       Char  30                  Former Name of Country
Country_ID           COUNTRY       Num   4                   Country ID
a. Complete the table by listing the source tables and the columns in those tables that are involved in
determining the values that will be loaded in the ProdDim table.
Target Source Source Computed
Column Table Column Column? (X)
Product_ID
Product_Category
Product_Group
Product_Line
Product_Name
Supplier_Country
Supplier_ID
Supplier_Name
b. Sketch the diagram for the product dimension table. Show the input data source(s) as well as the
desired calculated columns, and the target table (product dimension table).
Diagram for the Product Dimension Table:
Target Column     Source Table     Source Column     Computed? (X)
Product_Category                                     X
Product_Group                                        X
Product_Line                                         X
Product_Name      PRODUCT_LIST     Product_Name
b.
Diagram for the Product Dimension Table:
Chapter 3 Creating Metadata for
Source Data
Objectives
Define some administrative tasks to be performed for
SAS Data Integration Studio.
Describe the New Library Wizard.
You will perform the first two tasks in this chapter. The last
three tasks will be discussed and demonstrated in the final
chapters of this course.
Custom Folders
The Folders tree is one of the tree views in the left panel
of the desktop. Like the Inventory tree, the Folders tree
displays metadata for objects that are registered on the
current metadata server, such as tables and libraries. The
Inventory tree, however, organizes metadata by type and
does not enable you to add custom folders. The Folders
tree enables you to add custom folders.
In general, an administrator sets up the custom folder
structure in the Folders tree and sets permissions on
those folders. Users simply save metadata to the
appropriate folders in that structure.
3.1 Setting Up the Environment
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Ahmed's credentials to access the Foundation repository.
a. Select My Server as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Ahmed as the value for the User ID field and Student1 as the value for the Password
field.
4. Right-click on the Data Mart Development folder and select New Folder.
6. Right-click on the Data Mart Development folder and select New Folder.
7. Type Orion Target Data and then press ENTER.
8. Right-click on the Data Mart Development folder and select New Folder.
9. Type Orion Jobs and then press ENTER.
10. Right-click on the Data Mart Development folder and select New Folder.
11. Type Orion Reports and then press ENTER.
12. Right-click on the Data Mart Development folder and select New Folder.
13. Type Orion SCD and then press ENTER.
Libraries
In SAS software, a library is a collection of one or more
files that are recognized by SAS and that are referenced
and stored as a unit.
Libraries are critical to SAS Data Integration Studio.
Metadata for sources, targets, or jobs cannot be finalized
until the appropriate libraries have been registered in a
metadata repository.
Accordingly, one of the first tasks in a SAS Data
Integration Studio project is to specify metadata for the
libraries that contain or will contain sources, targets, or
other resources. At some sites, an administrator adds and
maintains most of the libraries that are needed, and the
administrator tells SAS Data Integration Studio users
which libraries to use.
A library definition includes, among other things, a library reference (libref).
This demonstration illustrates defining metadata for a SAS library, a location that contains some of the
SAS source tables to be used throughout the rest of the course.
1. If necessary, select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Barbara's credentials to access Barbara's Work Repository.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
d. Click OK to close the Log On window. SAS Data Integration Studio opens.
6. Click .
7. Type DIFT Orion Source Tables Library as the value for the Name field.
8.
9. Verify that the location is set to /Data Mart Development/Orion Source Data.
10. Click .
13. Click .
16. Click to move the selected path to the Selected items pane.
The final settings for the library options window are shown here.
If the desired path does not exist in the Available items pane, click . In the New
Path Specification window, click next to Paths. In the Browse window, navigate
to the desired path. Click to close the Browse window. Click to close
the New Path Specification window.
17. Click .
Exercises
For this set of exercises, use Barbara's project repository to create the library object(s).
1. Specifying Folder Structure
If you did not follow along with the steps of the demonstration, complete steps 1-13 starting on page
3-5.
2. Specifying Orion Source Tables Library
If you did not follow along with the steps of the demonstration, complete steps 1-17 starting on page
3-10.
3. Specifying a Library for Additional SAS Tables
There are additional SAS tables that are needed for the course workshops. Therefore, a new library
object must be registered to access these tables. The specifics for the library are shown below:
Name: DIFT SAS Library
Folder Location: \Data Mart Development\Orion Source Data
SAS Server: SASApp
Libref: DIFTSAS
Path Specification: S:\Workshop\dift\data
4. Checking in New Library Objects
Check in the new library objects. Specify the following for check-in information:
Title: Adding two library objects
Description: Checking in new library objects of DIFT Orion
Source Tables Library and DIFT SAS Library.
Objectives
Use the Register Tables wizard to register SAS source
data.
Use the Register Tables wizard to register metadata for a
Microsoft Access database table using ODBC.
Register metadata for a comma-delimited external file.
Source Data
Tables are the inputs and outputs of many SAS Data
Integration Studio jobs. The tables can be SAS tables or
tables created by the database management systems that
are supported by SAS/ACCESS software.
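A library for DBMS tables uses the corresponding SAS/ACCESS engine rather than the default Base
engine. A minimal sketch with placeholder connection values (the engine, libref, and credentials here
are illustrative assumptions; options vary by DBMS):

   /* SAS/ACCESS LIBNAME for a DBMS source; all values are placeholders. */
   libname dwhsrc oracle user=myuser password=mypass path=orapath schema=sales;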
In this class, you will use source data from three different
types of data sources:
- SAS tables
- a Microsoft Access database table (accessed via ODBC)
- external files
3.2 Registering Source Data Metadata
c. Click OK to close the Connection Profile window and open the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
When the Register Tables wizard opens, only those data formats that are licensed for your
site are available for use.
The procedure for registering a table typically begins with a page that asks you to "Select the
type of tables that you want to import information about". This window is skipped when you
register a table through a library.
5. Click next to the SAS Library field and then select DIFT Orion Source Tables Library.
6. Click . The Define Tables and Select Folder Location window opens.
11. Click .
The metadata object for the table is found in the Checkouts tree.
12. Right-click the PRODUCT_LIST metadata table object and select Properties.
13. Type DIFT at the beginning of the default name.
15. Click the Columns tab to view some of the defined information.
22. Right-click the DIFT PRODUCT_LIST metadata table object and select Open. The View Data
window opens.
3-32 Chapter 3 Creating Metadata for Source Data
26. Verify that Equals is set as the value for the Filter type field.
28. Click .
The data returned to the View Data window are filtered based on the query specified.
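The filter corresponds to a simple WHERE expression applied to the table. An equivalent expressed
directly in SAS code (the column and value are assumptions for illustration):

   /* "Equals" filter expressed as a WHERE clause; column and value are hypothetical. */
   proc print data=diftsrc.product_list;
      where Supplier_ID = 1303;
   run;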
On a Windows operating system, the Control Panel's Administrative Tools enable you to add, remove,
and configure Open Database Connectivity (ODBC) data sources and drivers.
This demonstration uses the Control Panel's Administrative Tools to access the ODBC Data Source
Administrator. A Microsoft Access database will be defined as an ODBC data source to the operating
system.
To register the desired tables from the Microsoft Access database via ODBC connection, a library object
(metadata object) is needed, and this library object requires a server definition. This server definition
points to the newly defined ODBC system resources. On this image, Barbara does not have the
appropriate authority to create metadata about a server. So Ahmed will create this server definition for her
using SAS Management Console.
Finally, Barbara can use the Register Tables wizard to complete the registration of the desired table.
3. In the Administrative Tools window, double-click Data Sources (ODBC) to open the ODBC Data
Source Administrator window.
4. In the ODBC Data Source Administrator window, click the System DSN tab.
5. Click .
8. Type DIFT Course Data as the value for the Data Source Name field.
The path and database name are now specified in the Database area as shown here:
The System DSN tab in the ODBC Data Source Administrator now has the newly defined ODBC data
source.
Metadata for an ODBC data source requires a library object that will use the ODBC engine, and the
library object requires a metadata server object that will point to the system ODBC data source. Barbara
does not have the appropriate authorizations to create this server. Ahmed is an administrator and can
create this server using SAS Management Console.
1. Access SAS Management Console using Ahmed's credentials.
a. Select Start → All Programs → SAS → SAS Management Console 4.2.
b. Select My Server as the connection profile.
c. Click OK to close the Connection Profile window and open the Log On window.
d. Type Ahmed as the value for the User ID field and Student1 as the value for the Password
field.
4. Click .
5. Type DIFT Course Microsoft Access Database Server as the value for the Name
field.
6. Click .
7. Select ODBC Microsoft Access as the value for the Data Source Type field.
8. Click .
9. Click Datasrc.
10. Type "DIFT Course Data" (the quotes are necessary since the ODBC data source name has
spaces).
11. Click .
12. Click .
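The library object that Barbara defines next will resolve to a LIBNAME statement that uses the ODBC
engine and this DSN. A minimal sketch (the libref is an illustrative assumption; the quotation marks
are required because the data source name contains spaces):

   /* ODBC LIBNAME pointing at the system DSN defined above. */
   libname diftodbc odbc datasrc="DIFT Course Data";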
With the ODBC server defined, Barbara can now define the metadata object referencing a table in the
Microsoft Access database.
1. If necessary, access SAS Data Integration Studio using Barbara's credentials.
a. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
b. Select Barbara's Work Repository as the connection profile.
c. Click OK to close the Connection Profile window and open the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
7. Select ODBC Microsoft Access as the type of table to import information about.
8. Click .
There are no library metadata objects defined with an ODBC engine, so none appear in the selection
list.
10. Type DIFT Course Microsoft Access Database as the value for the Name field.
11. Verify that the location is set to /Data Mart Development/Orion Source Data.
The final specifications for the name and location window should be as follows:
12. Click .
16. Click .
17. Verify that DIFT Course Microsoft Access Database Server is the value for the Database
Server field.
18. Click .
19. Click . This finishes the metadata definition for the library object.
20. Click .
22. Click .
23. Click .
The metadata objects for the ODBC data source and the newly defined library object are found
in the Checkouts tree.
24. Right-click the CustType metadata table object and select Properties.
25. Type DIFT Customer Types as the new value for the Name field.
26. Click the Columns tab to view some of the defined information.
29. Right-click the DIFT Customer Types metadata table object and select Open. The View Data
window opens.
c. Click OK to close the Connection Profile window and open the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
6. Type DIFT Supplier Information as the value for the Name field.
7. Verify that the location is set to /Data Mart Development/Orion Source Data.
13. Click .
Previewing the file shows that the first record contains column names and that the values are
comma-delimited rather than space-delimited.
The final settings for the External File Location window are shown here:
19. Click .
21. Type 2 (the number two) as the value for the Start record field.
22. Click to close the Auto Fill Columns window. The top portion of the Column Definitions
window populates with 6 columns: 3 numeric and 3 character.
24. Select Get the column names from column headings in this file.
25. Verify that 1 is set as the value for The column headings are in file record field.
26. Click . The Name field populates with all the column names.
29. Click the Data tab in the bottom part of the Column Definitions window.
30. Click .
31. Click .
32. Click .
The metadata object for the external file is found on the Checkouts tab.
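The registered external file ultimately generates a DATA step that reads the delimited records. A
hand-coded sketch of the equivalent, assuming a hypothetical file name and column names (the real
file defines six columns, three numeric and three character):

   /* Comma-delimited file; record 1 holds column headings, data begin at record 2. */
   /* The file name and column names here are illustrative assumptions.             */
   data work.supplier_information;
      infile "S:\Workshop\dift\data\supplier.csv" dlm=',' dsd firstobs=2;
      length Supplier_ID 8 Supplier_Name $ 40 Street_ID 8
             Street_Number 8 Country $ 2 City $ 30;
      input Supplier_ID Supplier_Name $ Street_ID Street_Number Country $ City $;
   run;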
5. Click .
7. Click .
8. Click .
Exercises
c. Register the five tables (SAS tables) found in the DIFT SAS Library. Change the default
metadata names to DIFT <SAS-table-name>.
e. Click .
f. Type DIFT SAS Library as the value for the Name field.
g. Verify that the location is set to \Data Mart Development\Orion Source Data.
h. Click .
k. Click .
m. The desired path does not exist in the Available items pane. Click .
s. Click .
t. Verify that the information is correct in the review window and then click .
The new library metadata object is found in the Checkouts tree.
d. Click .
3.3 Solutions to Exercises
f. Click .
h. Click .
4) Click next to SAS Library field and then select DIFT Orion Source Tables Library.
5) Click . The Define Tables and Select Folder Location window opens.
6) Select the STAFF table, hold down the CTRL key, and select ORDER_ITEM and ORDERS.
7) Verify that /Data Mart Development/Orion Source Data is the folder listed for the
Location field.
10) Right-click the STAFF metadata table object and select Properties.
11) Type DIFT at the beginning of the default name.
13) Right-click the ORDER_ITEM metadata table object and select Properties.
14) Type DIFT at the beginning of the default name.
17) Right-click the ORDERS metadata table object and select Properties.
18) Type DIFT at the beginning of the default name.
c. Register the five tables (SAS tables) found in the DIFT SAS Library. Change the default
metadata names to DIFT <SAS-table-name>.
4) Click next to SAS Library field and then select DIFT SAS Library.
5) Click . The Define Tables and Select Folder Location window opens.
6) Click .
7) Verify that /Data Mart Development/Orion Source Data is the folder listed for the
Location field.
The metadata objects for the tables are found in the Checkouts tree.
10) Right-click the CUSTOMER_TRANS metadata table object and select Properties.
11) Type DIFT at the beginning of the default name.
13) Right-click the CUSTOMER_TRANS_OCT metadata table object and select Properties.
16) Right-click the NEWORDERTRANS metadata table object and select Properties.
17) Type DIFT at the beginning of the default name.
20) Right-click the STAFF_PARTIAL metadata table object and select Properties.
21) Type DIFT at the beginning of the default name.
The metadata objects for the tables are found in the Checkouts tree.
7) Click . The ODBC window opens and displays the one ODBC library definition in
the SAS Library field.
8) Click .
9) Select Contacts, hold down the CTRL key and select NewProducts.
10) Click .
11) Click .
12) Right-click the Contacts metadata table object and select Properties.
13) Type DIFT at the beginning of the default name.
15) Right-click the NewProducts metadata table object and select Properties.
6) Verify that the location is set to /Data Mart Development/Orion Source Data.
9) Navigate to S:\Workshop\dift\data.
10) Select profit.txt.
11) Click .
17) Click to add a new column specification. Enter the following information:
18) Click to add a new column specification. Enter the following information:
19) Click to add a new column specification. Enter the following information:
20) Click to add a new column specification. Enter the following information:
21) Click to add a new column specification. Enter the following information:
22) Click to add a new column specification. Enter the following information:
23) Click the Data tab and then click . Verify that the values are read in correctly.
25) Click . The review window displays general information for the external file.
26) Click . The metadata object for the external file is found in the Checkouts tree.
b. After the external file metadata is defined in the project repository, be sure to check it in.
1) Select Check Outs → Check In All.
2) Type Adding metadata for profit information external file as the
value for the Title field.
5) Click . The external file object should no longer be in the Checkouts tree.
5) Click .
8) Type DIFT Workshop Data as the value for the Data Source Name field.
3) Click OK to close the Connection Profile window and access the Log On window.
4) Type Ahmed as the value for the User ID field and Student1 as the value for the
Password field.
3) Click .
4) Type DIFT Workshop Microsoft Access Database Server as the value for
the Name field.
5) Click .
6) Select ODBC Microsoft Access as the value for the Data Source Type field.
7) Click .
8) Select Datasrc.
9) Type "DIFT Workshop Data" (the quotes are necessary since the ODBC data source
name has spaces).
10) Click .
11) Click .
3) Click OK to close the Connection Profile window and access the Log On window.
4) Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
7) Click .
a) Type DIFT Workshop Microsoft Access Database as the value for the
Name field.
b) Verify that the location is set to /Data Mart Development/Orion Source Data.
c) Click .
g) Click .
h) Verify that DIFT Workshop Microsoft Access Database Server is the value for the
Database Server field.
i) Click .
j) Click . This finishes the metadata definition for the library object.
9) Click .
10) Select Catalog_Orders, hold down the CTRL key, and select PRODUCTS and then
Web_Orders.
11) Click .
12) Click . The metadata objects for the three tables, as well as the newly defined
library object, are found in the Checkouts tree.
g. Update the metadata for Catalog_Orders.
4) Click .
7) Click .
Chapter 4 Creating Metadata for
Target Data
Objectives
Review features of the New Tables wizard.
(Figure: the course star schema – the Order Fact Table joined to the Organization, Customer,
Product, and Time Dimensions – annotated by how each table is built in the course: as a
demonstration, an exercise, or a case study)
4.1 Registering Target Data Metadata
(Figure: the Product_List table and supplier information are the sources for the product dimension)
This demonstration defines a metadata object for a single target table. The target is to be a SAS data set
named ProdDim, stored in the DIFT Orion Target Tables Library (the library object must be created
as well).
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Barbara's credentials to access her project repository.
a. Select Barbara's Work Repository as the connection profile.
b. Click OK to close the Connection Profile window and open the Log On window.
c. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
7. Type DIFT Product Dimension as the value for the Name field.
8. Verify that the location is set to /Data Mart Development/Orion Target Data.
The final specifications for the name and location window should be as follows:
9. Click .
11. Click next to the Library field. The target tables library is not yet defined.
a. Type DIFT Orion Target Tables Library as the value for the Name field.
b. Verify that the location is set to /Data Mart Development/Orion Target Data.
The final specifications for the name and location window are as follows:
c. Click .
f. Click .
p. Verify that the newly specified path is found in the Selected items pane.
The final settings for the library options window are shown here:
q. Click .
The new library metadata object can be found in the Library field.
The final settings for the Table Storage Information window are shown below:
13. Click .
14. Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
15. From the Orion Source Data folder, expand the DIFT PRODUCT_LIST table object.
16. Select the following columns from DIFT PRODUCT_LIST and click to move the columns to
the Selected pane:
Product_ID
Product_Name
Supplier_ID
19. Select the following columns from DIFT Supplier Information and click to move the columns
to the Selected pane:
Supplier_Name
Country
20. Click .
a. Click . Define two simple indexes: one for Product_ID and one for
Product_Group.
Neglecting to press ENTER results in the name of the index not being saved, which
produces an error when the table is generated because the name of the index and the
column being indexed do not match.
d. Select the Product_ID column and move it to the Indexes panel by clicking .
g. Select the Product_Group column and move it to the Indexes panel by clicking . The two
requested indexes are defined in the Define Indexes window.
h. Click to close the Define Indexes window and return to the Target Table Designer.
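For reference, index metadata like this typically becomes a PROC DATASETS step along the following lines when the table is created. This is a sketch only; the DIFTTGT libref and the ProdDim table name follow this demonstration's storage settings, and for a simple index the index name must match the column name (which is why the ENTER keystroke above matters):
   proc datasets library=difttgt nolist;   /* libref from this demonstration */
      modify ProdDim;
      /* simple indexes: each index name must equal its column name */
      index create Product_ID;
      index create Product_Group;
   quit;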
25. Click .
27. Click .
The new table object and new library object appear on the Checkouts tab.
e. Click .
The objects should appear in the Data Mart Development → Orion Target Data folders.
Exercises
Define metadata for the OrderFact table. Name the metadata object DIFT Order Fact.
Specify that the table should be created as a SAS table with the physical name of OrderFact.
Physically store the table in DIFT Orion Target Tables Library. Use the set of distinct columns
from DIFT ORDER_ITEM and DIFT ORDERS. Store the metadata object in the Data Mart
Development Orion Target Data folder. Check in the new table object.
2. Defining Additional Target Tables
Several additional tables must be defined for the demonstrations and exercises in subsequent sections.
Check in all of the metadata table objects.
Create a target table metadata object named DIFT Recent Orders that defines
column metadata for a SAS table that will be named Recent_Orders and stored in the
DIFT Orion Target Tables Library. The columns in Recent_Orders should be the same
columns that are defined in the OrderFact target table. Store the metadata object in the
Data Mart Development Orion Target Data folder.
Create a target table metadata object named DIFT Old Orders that defines
column metadata for a SAS table that will be named Old_Orders and stored in the
DIFT Orion Target Tables Library. The columns in Old_Orders should be the same
columns that are defined in the OrderFact target table. Store the metadata object in the
Data Mart Development Orion Target Data folder.
(Optional) Create a target table metadata object named DIFT US Suppliers that defines
column metadata for a SAS table that will be named US_Suppliers and stored in the
DIFT Orion Target Tables Library. The columns in US_Suppliers should be the same
columns that are defined in the DIFT Supplier Information external file object. Store the
metadata object in the Data Mart Development Orion Target Data folder.
Objectives
Discuss SAS packages.
Discuss importing and exporting of relational
metadata.
Types of Metadata
SAS Data Integration Studio enables you to import and
export metadata for individual objects or sets of related
objects. You can work with two kinds of metadata:
- SAS metadata in SAS Package format
Relational Metadata
By importing and exporting relational metadata in external
formats, you can reuse metadata from third-party
applications, and you can reuse SAS metadata in those
applications as well. For example, you can use third-party
data modeling software to specify a star schema for a set
of tables. The model can be exported in Common
Warehouse Metamodel (CWM) format. You can then use
a SAS Metadata Bridge to import that model into SAS
Data Integration Studio.
Relational Metadata
You can import and export relational metadata in any
format that is accessible with a SAS Metadata Bridge.
Relational metadata includes the metadata for the
following objects:
- data libraries
- tables
- columns
- indexes
This demonstration illustrates importing metadata that was exported in CWM format from Oracle
Designer.
1. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
2. Log on using Barbara's credentials to access her project repository.
a. Select Barbara's Work Repository as the connection profile.
b. Click to close the Connection Profile window and access the Log On window.
c. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
6. Select File → Import Metadata. The Metadata Importer wizard initializes and displays the
window to enable the user to select an import format.
7. Select Oracle Designer.
8. Click .
9. Click next to the File name field to open the Select a file window.
12. Click .
14. Verify that the folder location is set to /Data Mart Development/Orion Target Data.
The final settings for the File Location window of the Metadata Importer wizard should be as shown:
15. Click .
17. Click .
25. Click .
26. The finish window displays the final settings. Review and accept the settings.
27. Click .
30. The Checkouts tree displays two new metadata table objects.
31. Right-click the Current Staff metadata table object and select Properties.
32. Type DIFT at the beginning of the default name.
33. Click the Columns tab to view some of the defined information.
34. Update the formats for each of the columns.
Start_Date Date9.
End_Date Date9.
Job_Title <none>
Salary Dollar12.
Gender $Gender6.
Birth_Date Date9.
Emp_Hire_Date Date9.
Emp_Term_Date Date9.
Manager_ID 12.
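These assignments correspond to what a PROC DATASETS step would apply to the physical table. A minimal sketch, assuming the DIFTTGT libref from this course setup and a placeholder physical table name of Current_Staff; $GENDER6. is a course-supplied format:
   proc datasets library=difttgt nolist;     /* libref assumed */
      modify Current_Staff;                  /* placeholder physical name */
      format Start_Date End_Date Birth_Date
             Emp_Hire_Date Emp_Term_Date date9.
             Salary dollar12.
             Gender $gender6.                /* course-supplied format */
             Manager_ID 12.;
   quit;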
36. Right-click the Terminated Staff metadata table object and select Properties.
37. Type DIFT at the beginning of the default name.
38. Click the Columns tab to view some of the defined information.
39. Update the formats for each of the columns.
Start_Date Date9.
End_Date Date9.
Job_Title <none>
Salary Dollar12.
Gender $Gender6.
Birth_Date Date9.
Emp_Hire_Date Date9.
Emp_Term_Date Date9.
Manager_ID 12.
e. Verify that the location is set to /Data Mart Development/Orion Target Data.
f. Click .
h. Select DIFT Orion Target Tables Library as the value for the Library field.
j. Click .
k. Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
l. From the Orion Source Data folder, click the DIFT ORDER_ITEM table object.
m. Select all columns from DIFT ORDER_ITEM by clicking (all columns will be moved to
the Selected pane).
p. Select all columns from DIFT ORDERS by clicking (all columns will be moved to the
Selected pane). An Error window opens saying that Order_ID will not be added twice.
q. Click .
r. Click .
t. Review the metadata listed in the finish window and then click . The new table object
appears on the Checkouts tab.
u. Select Check Outs → Check In All.
v. Type Adding metadata for Order Fact table as the value for the Title field.
5) Verify that the location is set to /Data Mart Development/Orion Target Data.
6) Click .
8) Select DIFT Orion Target Tables Library as the value for the Library field.
10) Click .
11) Expand the Data Mart Development → Orion Target Data folder on the Folders tab.
12) From the Orion Target Data folder, locate the DIFT Order Fact table object.
13) Select the DIFT Order Fact table object and click to move all columns to the
Selected pane.
14) Click .
15) Accept the default attributes of the columns and then click .
16) Review the metadata listed in the finish window and then click . The new table
object appears on the Checkouts tab.
b. Define metadata for the DIFT Recent Orders table.
5) Verify that the location is set to /Data Mart Development/Orion Target Data.
6) Click .
8) Select DIFT Orion Target Tables Library as the value for the Library field.
10) Click .
11) Expand the Data Mart Development → Orion Target Data folder on the Folders tab.
12) From the Orion Target Data folder, locate the DIFT Order Fact table object.
13) Select the DIFT Order Fact table object and click to move all columns to the
Selected pane.
14) Click .
15) Accept the default attributes of the columns and then click .
16) Review the metadata listed in the finish window and then click . The new table
object appears on the Checkouts tab.
c. (Optional) Define metadata for the DIFT US Suppliers table.
5) Verify that the location is set to /Data Mart Development/Orion Target Data.
6) Click .
8) Select DIFT Orion Target Tables Library as the value for the Library field.
10) Click .
11) Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
12) From the Orion Source Data folder, locate the DIFT Supplier Information table object.
13) Select the DIFT Supplier Information table object and click to move all columns to the
Selected pane.
14) Click .
15) Accept the default attributes of the columns and then click .
16) Review the metadata listed in the finish window and then click . The new table
object appears on the Checkouts tab.
d. Check in the newly created table objects.
1) Select Check Outs → Check In All.
2) Type Adding metadata for various target table objects as the value for
the Title field.
3) Type Adding metadata for Old and Recent Orders, and US Suppliers
as the value for the Description field.
The metadata in the Orion Target Data folder should now resemble the following:
Chapter 5 Creating Metadata for
Jobs
Objectives
Define a job object.
Discuss various features of jobs and the Job Editor
window.
Overview
At this point, metadata is defined for the following:
various types of source tables
What Is a Job?
A job is a collection of SAS tasks that creates output. SAS
Data Integration Studio uses the metadata for each job to
generate SAS code that reads sources and creates
targets in physical storage.
A Quick Example
Before you proceed to further discussions on jobs and
the Process Designer window, look at the creation of a
simple job. The job creates two SAS data sets, one
containing the current employees and the other
containing the terminated employees.
Splitter Transformation
The Splitter transformation can be used to create one or more subsets of a source.
This demonstration shows the building of a job that uses the Splitter transformation.
The final process flow diagram will look like the following:
c. Click to close the Connection Profile window and access the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
6. Type DIFT Populate Current and Terminated Staff Tables as the value for the
Name field.
8. Click .
When a job window is active, objects can be added to the diagram by right-clicking and
selecting Add to Diagram.
10. Select File → Save to save diagram and job metadata to this point.
12. Select File → Save to save diagram and job metadata to this point.
b. To connect the DIFT STAFF table object to the Splitter transformation, place your cursor over
the connection selector until a pencil icon appears.
14. Select File → Save to save diagram and job metadata to this point.
d. Drag the two objects to the Diagram tab of the Job Editor.
When a job window is active, objects can be added to the diagram by right-clicking and
selecting Add to Diagram.
16. Select File → Save to save diagram and job metadata to this point.
b. Right-click on the second temporary table object of the Splitter transformation and select Delete.
18. Connect the Splitter transformation to each of the target table objects.
a. Place your cursor over the Splitter transformation until a pencil icon appears.
b. When the pencil icon appears, click and drag the cursor to the first output table,
DIFT Current Staff.
c. Again, place your cursor over the Splitter transformation until a pencil icon appears, and click
and drag the cursor to the second output table, DIFT Terminated Staff.
19. Select File → Save to save diagram and job metadata to this point.
2) Select Row Selection Conditions as the value for the Row Selection Type field.
7) Click .
d. Specify the subsetting criteria for the DIFT Terminated Staff table object.
2) Select Row Selection Conditions as the value for the Row Selection Type field.
3) Click below the Selection Conditions area. The Expression window opens.
7) Click .
f. Verify that all Target Table columns have an arrow coming in to them (that is, all target columns
will receive data from a source column).
21. Select File → Save to save diagram and job metadata to this point.
26. Scroll to view the note about the creation of the DIFTTGT.TERM_STAFF table:
27. View the data for the DIFT Current Staff table object.
a. Right-click on the DIFT Current Staff table object and select Open.
b. When finished viewing the data, select File → Close to close the View Data window.
28. View the data for the DIFT Terminated Staff table object.
a. Right-click on the DIFT Terminated Staff table object and select Open.
b. When finished viewing the data, select File → Close to close the View Data window.
29. Select File → Close to close the Job Editor. If necessary, save changes to the job. The new job object
appears on the Checkouts tab.
30. Select Check Outs → Check In All.
a. Type Adding job that populates current & terminated staff tables as
the value for the Title field.
New Jobs
New jobs are initialized by the New Job wizard. The New
Job wizard specifies a name and a metadata location for the job.
(A description can optionally be specified.)
Selecting creates an empty job.
Job Editor
The Job Editor window enables you to create, maintain,
and troubleshoot SAS Data Integration Studio jobs. To
display this window for an existing job, right-click a job in
the tree view and select Open.
Pane Description
Details Used to monitor and debug a job.
To display, select View → Details from the desktop.
Runtime Manager Displays the run-time status of the current job, the last
time that the job was executed in the current session, and the
SAS Application Server that was used to execute the job.
To display, select View → Runtime Manager from the desktop.
Actions History Displays low-priority errors and warnings.
To display, select View → Actions History from the desktop.
Introduction to Transformations
A transformation is a metadata object that specifies how
to extract data, transform data, or load data into data
stores. Each transformation that you specify in a process
flow diagram generates or retrieves SAS code. You can
also specify user-written code in the metadata for any
transformation in a process flow diagram.
Transformations Tree
The Transformations tree organizes transformations into a
set of folders. You can drag a transformation from the
Transformations tree to the Job Editor, where you can
connect it to source and target tables and update its
default metadata. By updating a transformation with the
metadata for actual sources, targets, and transformations,
you can quickly create process flow
diagrams for common scenarios.
The display shows the standard
Transformations tree.
Objectives
Discuss components of the SQL Join transformation's Designer window.
[Figure: the SQL Join Designer window, with the Navigate pane, SQL Clauses pane, and Properties pane labeled]
The Tables pane displays when a table object or the Select keyword is selected in the Navigate
pane. The Tables pane might also display when other aspects of particular joins are requested
(for instance, the surfacing of Having, Group by, and Order by information). The Tables pane is
displayed in the same location as the SQL Clauses pane.
Product_Category
Product_Line
Calculating Product_Category
Product_Category values are calculated by:
- performing a grouping using Product_ID
Calculating Product_Line
Product_Line values are calculated by:
- performing a grouping using Product_ID
Product_Category:
put(int(product_id/100000000)*100000000,product.)
Product_Line:
put(int(product_id/10000000000)*10000000000,product.)
Product_Category:
put(int(product_id/1e8)*1e8,product.)
Product_Line:
put(int(product_id/1e10)*1e10,product.)
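As a quick check of what the truncation does, the DATA step below uses a made-up 12-digit Product_ID; the course-supplied PRODUCT. format then maps the truncated values to category and line names:
   data _null_;
      product_id  = 240600100001;              /* hypothetical 12-digit ID */
      category_id = int(product_id/1e8)*1e8;   /* 240600000000 (last 8 digits zeroed)  */
      line_id     = int(product_id/1e10)*1e10; /* 240000000000 (last 10 digits zeroed) */
      put product_id= category_id= line_id=;
   run;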
In this demonstration, you can take advantage of the SQL Join transformation to join the DIFT
Product_List and DIFT Supplier Information source tables to create the target table
DIFT Product Dimension.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
c. Click .
7. Select File → Save to save diagram and job metadata to this point.
9. Rename the temporary table object associated with the File Reader transformation.
a. Right-click on the green temporary table object and select Properties.
10. Select File → Save to save diagram and job metadata to this point.
11. Add the SQL Join transformation to the diagram.
a. In the tree view, click the Transformations tab.
b. Expand the Data grouping.
c. Select the SQL Join transformation.
12. Select File → Save to save diagram and job metadata to this point.
13. Add inputs to the SQL Join transformation.
a. Place your cursor over the SQL Join transformation in the diagram to reveal the two default
ports.
b. Connect the DIFT PRODUCT_LIST table object to one of the input ports for the SQL Join.
c. Connect the File Reader transformation (click on the temporary table icon, , associated with
the File Reader and drag) to the other port of the SQL Join transformation.
14. Select File → Save to save diagram and job metadata to this point.
15. Add the DIFT Product Dimension table object as the output for the SQL Join.
a. Right-click on the temporary table object associated with the SQL Join transformation and select
Replace.
b. In the Table Selector window, expand the Data Mart Development → Orion Target Data
folders.
c. Select DIFT Product Dimension table object.
d. Select .
16. Select File → Save to save diagram and job metadata to this point.
17. Review properties of the File Reader transformation.
a. Right-click on the File Reader transformation and select Properties.
b. Click the Mappings tab.
c. Verify that all target columns have a column mapping.
18. Select File → Save to save diagram and job metadata to this point.
b. Select the Join item on the Diagram tab. Verify that the Join is an Inner join from the Properties
pane.
The type of join can also be verified or changed by right-clicking on the Join item
in the process flow of the SQL clauses. A pop-up menu appears with the current
type of join checked, and it also enables selection of another type of join.
c. Select the Where keyword in the Navigate pane to surface the Where tab.
d. Verify that the Inner join will be executed based on the values of Supplier_ID columns from
the sources being equal.
1) Click in the top portion of the Where tab. A row is added with the logical AND
as the Boolean operator.
2) Select Choose column(s) from the drop-down list under the first Operand column.
5) Type 1 (numeral one) in the field for the second Operand column and press ENTER.
f. Select the Select keyword in the Navigate pane to surface the Select tab.
g. Map the Country column to Supplier_Country by clicking on the Country column and
dragging to the Supplier_Country.
The Expression field must be filled in for the three columns.
i. Re-order the columns so that Product_Group is first, then Product_Category, and then
Product_Line.
5) Click .
6) Click .
2) In the Expression column, select Advanced from the drop-down list. The Expression
window opens.
3) If necessary, access the HelperFile.txt file in S:\Workshop\dift.
4) Copy the expression for Product_Category:
put(int(product_list.product_id/1e8)*1e8,product.)
6) Click .
7) Click .
2) In the Expression column, select Advanced from the drop-down list. The Expression
window is opened.
3) If necessary, access the HelperFile.txt file in S:\Workshop\dift.
4) Copy the expression for Product_Line:
Put(int(product_list.product_id/1e10)*1e10,product.)
6) Click .
7) Click .
m. Click to fold the target table info back to the right side.
n. Select File → Save to save changes to the SQL Join transformation.
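The code that the SQL Join generates is roughly equivalent to the PROC SQL step below. This is a sketch only: the WORK table names are placeholders (the supplier data actually arrives through the File Reader transformation), and the Product_Group expression is omitted:
   proc sql;
      create table difttgt.proddim as
      select p.Product_ID,
             p.Product_Name,
             put(int(p.Product_ID/1e8)*1e8, product.)   as Product_Category,
             put(int(p.Product_ID/1e10)*1e10, product.) as Product_Line,
             s.Supplier_Name,
             s.Country as Supplier_Country
         from work.product_list as p
              inner join
              work.supplier_info as s                   /* placeholder names */
              on p.Supplier_ID = s.Supplier_ID;
   quit;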
A warning occurred in the execution of the SQL Join. You see a change in the coloring of the
transformation in the process flow and the symbol overlay.
c. Double-click the Warning for the SQL Join. The Warnings and Errors tab is moved forward with
the warning message.
21. Edit the SQL Join transformation to fix the column mappings.
a. Right-click on the SQL Join transformation and select Open.
b. Click the Select keyword on the Navigate pane to surface the Select tab. Note the warning
symbol, , associated with each of the three calculated columns.
c. Map the Product_ID column to the Product_Group column (click on Product_ID in the
Source table side and drag to Product_Group in the Target table side).
22. Run the job by right-clicking in the background of the job and selecting Run.
24. View the log for the executed job by selecting the Log tab.
25. Scroll to view the note about the creation of the DIFTTGT.PRODDIM table:
Exercises
Some specifics for creating the job to load the OrderFact table are shown below:
Name the job DIFT Populate Order Fact Table.
Two tables should be joined together, DIFT ORDER_ITEM and DIFT ORDERS.
The SQL Join transformation will be used for the inner join based on Order_ID from the input
tables.
No additional processing is necessary beyond the SQL Join; therefore, the targets can be loaded
directly from the SQL Join transformation.
After verifying that the table is created successfully (OrderFact should have 951,669
observations and 12 variables), check in the job object.
2. Loading Recent and Old Orders Tables
Some specifics for creating the job to load the Old Orders and Recent Orders tables are
shown below:
Name the job DIFT Populate Old and Recent Orders Tables.
Use the SAS Splitter transformation to break apart the observations from the OrderFact table.
Old orders are defined to be orders placed before 2005. An expression that can be used
to find the observations for this data is the following:
Objectives
Investigate mapping and propagation.
Investigate chaining of jobs.
Work with performance statistics.
Generate reports on metadata for tables and jobs.
Automatic Mappings
By default, SAS Data Integration Studio automatically
creates a mapping when a source column and a target
column have the same column name, data type, and
length.
Events that trigger automatic mapping include:
- connecting a source and a target to the transformation
on the Diagram tab
- clicking Propagate on the toolbar or in the pop-up
menu in the Job Editor window
- clicking Propagate on the Mappings tab toolbar and
selecting a propagation option
- clicking Map all columns on the Mappings tab toolbar
Automatic Propagation
Automatic propagation sends column changes to tables
when process flows are created. If you disable automatic
propagation and refrain from using manual propagation,
you can propagate column changes on the Mappings tab
for a transformation; these changes are restricted to the
target tables for that transformation. Automatic propagation
can be controlled at various levels:
- Global
- Job
- Process flow
- Transformation
This demonstration investigates automatic and manual propagation and mappings. The propagation is
investigated from sources to targets only. Propagation can also be done from targets to sources.
1. If necessary, access SAS Data Integration Studio using Bruno's credentials.
a. Select Start → All Programs → SAS → SAS Data Integration Studio 4.2.
b. Verify that the connection profile is My Server.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
Note that four columns are character and two are numeric.
5. Create a target table with the same attributes as the DIFT PRODUCTS (Copy) table.
a. Right-click the DIFT PRODUCTS (Copy) table object and select Copy.
b. Right-click the DIFT Additional Examples folder and select Paste.
c. Right-click the Copy of DIFT PRODUCTS (Copy) table object and select Properties.
d. Type DIFT PRODUCTS Information as the value for the Name field (on the General tab).
k. Verify that the DBMS field appropriately updated to SAS with this new library selection.
The new table object appears under the DIFT Additional Examples folder.
6. Create a target table with different attributes from the DIFT PRODUCTS (Copy) table.
a. Right-click the DIFT PRODUCTS (Copy) table object and select Copy.
b. Right-click the DIFT Additional Examples folder and select Paste.
c. Right-click the Copy of DIFT PRODUCTS (Copy) table object and select Properties.
d. Type DIFT PRODUCTS Profit Information as the value for the Name field (on the
General tab).
m. Verify that the DBMS field appropriately updated to SAS with this new library selection.
n. Type PRODUCTSProfitInfo as the value for the Name field.
The new table object appears under the DIFT Additional Examples folder.
c. Click .
d. Select DIFT Orion Target Tables Library as the value for the Library field.
f. Click .
The new table object appears under the DIFT Additional Examples folder.
c. Verify that /Data Mart Development/DIFT Additional Examples is the value for the
Location field.
d. Click .
10. Add table objects and transformations to the Diagram tab of the Job Editor.
a. Click and drag the DIFT PRODUCTS (Copy) table object to the Diagram tab of the Job Editor.
b. Click and drag the DIFT PRODUCTS Information table object to the Diagram tab of the Job
Editor.
c. Click the Transformations tab.
d. Expand the Data grouping of transformations.
e. Click and drag the Extract transformation to the Diagram tab of the Job Editor.
f. Expand the Access grouping of transformations.
g. Click and drag the Table Loader transformation to the Diagram tab of the Job Editor.
A similar message regarding no mappings defined can be found for the Table Loader
transformation.
11. Connect the objects in the process flow diagram and investigate the mappings.
a. Connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
b. Connect the Extract transformation to the Table Loader transformation.
c. Connect the Table Loader transformation to the DIFT PRODUCTS Information table object.
The process flow diagram updates to the following:
Mappings can be investigated by opening the Properties window for each transformation.
Alternatively, the Details section displays defined mappings for the selected transformation.
d. If necessary, select View → Details to display the Details area within the Job Editor window.
(Optionally, the Details section can be displayed by clicking the tool in the Job Editor tools.)
Automatic mappings occur between a source column and a target column when these columns
have the same column name, data type, and length.
g. Click the Table Loader transformation on the Diagram tab.
h. View the mappings in the Details section.
All mappings were automatically established between the source columns and the target columns.
12. Add additional table objects and transformations to the Diagram tab of the Job Editor.
a. Click and drag the DIFT PRODUCTS (Copy) table object to the Diagram tab of the Job Editor.
b. Click and drag the DIFT PRODUCTS Profit Information table object to the Diagram tab of
the Job Editor.
c. Click the Transformations tab.
d. Expand the Data grouping of transformations.
e. Click and drag the Extract transformation to the Diagram tab of the Job Editor.
f. Expand the Access grouping of transformations.
g. Click and drag the Table Loader transformation to the Diagram tab of the Job Editor.
13. Connect the new objects in the process flow diagram and investigate the mappings.
a. Connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
b. Connect the Extract transformation to the Table Loader transformation.
c. Connect the Table Loader transformation to the DIFT PRODUCTS Profit Information table
object.
The process flow diagram updates to the following:
a. If necessary, select View → Details to display the Details area within the Job Editor window.
f. Manually map the TYPE columns. A Warning will appear. An expression could be written to
avoid this warning; otherwise, the Log for the job will contain a WARNING message as well.
i. Scroll in the target table side of the Mappings tab to view the automatic expression for the Sex
column.
15. Turn off automatic mappings for the job and rebuild the connections for the first part of this process
flow.
a. Right-click in the background of the Job Editor and select Settings → Automatically Map
Columns.
b. Click on each of the connections for the first flow, right-click and select Delete.
c. Re-connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
h. Click on the Mappings tab tool set. All columns are mapped.
i. Click on the Mappings tab tool set. All columns are no longer mapped.
l. Click on the Mappings tab tool set, and verify that Include Selected Columns in
Mapping is selected.
m. Click on the Mappings tab tool set. The selected column is manually mapped.
b. Click .
c. Verify that /Data Mart Development/DIFT Additional Examples is the value for the
Location field.
d. Click .
19. Add table objects and transformations to the Diagram tab of the Job Editor.
a. Click and drag the DIFT PRODUCTS (Copy) table object to the Diagram tab of the Job Editor.
b. Click and drag the DIFT PRODUCTS Profit Information (2) table object to the Diagram tab
of the Job Editor.
c. Click the Transformations tab.
d. Expand the Data grouping of transformations.
e. Click and drag the Extract transformation to the Diagram tab of the Job Editor.
f. Click and drag the Sort transformation to the Diagram tab of the Job Editor.
g. Expand the Access grouping of transformations.
h. Click and drag the Table Loader transformation to the Diagram tab of the Job Editor.
20. Connect the source to the first transformation and investigate the propagation.
a. Connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
b. Click the Extract transformation on the Diagram tab.
c. Click the Mappings tab in the Details section.
All columns from the source are propagated and mapped to the target.
21. Turn off automatic propagation and mappings for the job.
a. Right-click in the background of the job and select Settings → Automatically Propagate
Columns.
b. Right-click in the background of the job and select Settings → Automatically Map Columns.
These selections can also be made from the Job Editor's tool set.
22. Break the connection between the DIFT PRODUCTS (Copy) table object and the Extract
transformation (click on the connection, right-click, and select Delete).
23. Remove the already propagated columns from the output table of the Extract transformation.
a. Right-click on the output table object associated with the Extract transformation and select
Properties.
e. Click to close the properties window for the Extract output table.
24. Reconnect the source to the Extract transformation and investigate the propagation.
a. Connect the DIFT PRODUCTS (Copy) table object to the Extract transformation.
b. Click the Extract transformation on the Diagram tab.
c. Click the Mappings tab in the Details section.
All columns from the source are NOT propagated and mapped to the target.
25. Manually propagate all columns.
b. Click on the Mappings tab tool set. All columns are removed from the target table side.
Chaining Jobs
Existing jobs can be added to the Diagram tab of the Job
Editor window. These jobs are added to the control flow in
the order that they are added to the job. This sequence is
useful for jobs that are closely related - however, the jobs
do not have to be related. You can always change the
order of execution for the added jobs in the Control Flow
tab of the Details pane.
Chaining Jobs
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
6. Return to the Folders tab, and expand the folders to Data Mart Development →
Orion Jobs.
7. Select the DIFT Populate Old and Recent Orders Tables job and drag it to the Diagram tab of the
Job Editor window.
8. Select the DIFT Populate Order Fact Table job and drag it to the Diagram tab of the Job Editor
window.
The first job connects automatically to the second job.
The OrderFact table, created in the DIFT Populate Order Fact Table job, is the source table for
the DIFT Populate Old and Recent Orders Tables job. Therefore, the DIFT Populate Order Fact
Table job should run first and then the DIFT Populate Old and Recent Orders Tables job.
12. Select View → Layout → Left to Right. The diagram updates with the correct ordering and in the
horizontal view.
15. Click the Status tab in the Details pane and verify that both jobs ran successfully.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
The Collect Table Statistics choice populates the Records field; otherwise, a zero is listed.
10. Scroll to the right in the table of statistics.
11. Click on the Statistics tab toolbar. The Save window is displayed.
14. Click .
15. Access a Windows Explorer window by selecting Start → All Programs → Accessories →
Windows Explorer.
16. Navigate to S:\Workshop\dift\reports.
17. Double-click DIFTTestJobStatsRun1.csv. Microsoft Excel opens and displays the saved statistics.
19. Click on the Statistics tab toolbar. The table view changes to a graphical view, which is a line
graph by default.
20. Click between Line Graph and Bar Chart. All of the reported statistics are selected by
default.
22. Click . The line graph updates to the following view of the requested statistics:
25. Click . The line graph updates to the following view of the requested statistics:
28. Click . The line graph updates to the following view of the requested statistics:
29. Click Bar Chart on the Statistics tab toolbar. The table view changes to a bar chart view.
The bar chart quickly tells us that the Table Loader transformation took almost three times as long as
the Rank transformation. The SQL Join transformation ran very quickly compared to the other
transformations.
30. Place your cursor over the bar for the SQL Join transformation. Tooltip text appears with summarized
information about the processing of the SQL Join transformation.
31. Place your cursor over the bar for the Table Loader transformation. Tooltip text appears with
summarized information about the processing of the Table Loader transformation.
The bar chart updates to a single bar for just the Table Loader transformation. The scaling for times is
easier to read for this selected transformation.
34. Click on the Statistics tab toolbar. The Print window opens as displayed.
The graph can be written to a file, and then printed from the file.
35. Click .
36. Click on the Statistics tab toolbar. The Save to File window opens as displayed.
39. Click .
About Reports
The reports featured in SAS Data Integration Studio
enable metadata for tables and jobs to be reviewed in a
convenient format.
Reports enable you to:
- find information about a table or job quickly
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
4. Click from the Reports window's toolbar. The Report Options window is displayed.
Verify that the default report format is set to HTML. A valid CSS file can be specified, as well as
additional ODS HTML statement options.
a. Click next to the Default Location field. The Select a directory window is
opened.
b. Navigate to S:\Workshop\dift\reports.
d. Type JobsReport as the name of the new folder and press ENTER.
12. When done viewing the report, select File → Close from the browser window.
13. To create a document object, click Job Documentation and then select .
16. Click .
f. Select File → Save to save diagram and job metadata to this point.
g. Add the SQL Join transformation to the diagram.
1) In the tree view, select the Transformations tab.
2) Expand the Data grouping.
3) Select the SQL Join transformation.
4) Drag the SQL Join transformation to the diagram.
5) Center the SQL Join so that it is in the middle of the DIFT ORDER_ITEM table object and
the DIFT ORDERS table object.
5) Click .
k. Select File → Save to save diagram and job metadata to this point.
l. Review the properties of the SQL Join transformation.
1) Right-click on the SQL Join transformation and select Open. The Designer window opens.
2) Select the Join item on the Diagram tab. Verify that the Join is an Inner Join from the
Properties pane.
3) Verify that the Inner join will be executed based on the values of Order_ID columns from
the sources being equal.
4) Select the Select keyword on the Navigate pane to surface the Select tab.
5) Verify that all target columns are mapped.
5) Click .
6) Right-click on the other temporary output table for the Splitter and select Replace.
7) Verify that the Folders tab is selected.
8) Expand the Data Mart Development → Orion Target Data folder.
9) Select DIFT Recent Orders.
10) Click .
11) If necessary, separate the two target table objects. The process flow diagram should
resemble the following:
l. Select File → Save to save diagram and job metadata to this point.
b) Select Row Selection Conditions as the value for the Row Selection Type field.
g) Click .
i) Type '01jan2005'd.
j) Click .
k) Click .
The Selection Conditions area on the Row Selection tab updates to the following:
4) Specify the subsetting criteria for the DIFT Old Orders table object.
b) Select Row Selection Conditions as the value for the Row Selection Type field.
g) Click .
i) Type '01jan2005'd.
j) Click .
k) Click .
The Selection Conditions area on the Row Selection tab updates to the following:
n. Select File → Save to save the diagram and job metadata to this point.
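The Splitter's generated code is roughly equivalent to a DATA step with two conditional OUTPUT statements. A minimal sketch, assuming an Order_Date column carries the order date (the column name is an assumption):
   data difttgt.recent_orders difttgt.old_orders;
      set difttgt.orderfact;
      if Order_Date >= '01jan2005'd then output difttgt.recent_orders;
      else output difttgt.old_orders;       /* orders placed before 2005 */
   run;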
6) View the data for the DIFT Recent Orders table object.
a) Right-click on the DIFT Recent Orders table object and select Open.
b) When finished viewing the data, select File → Close to close the View Data window.
7) View the data for the DIFT Old Orders table object.
a) Right-click on the DIFT Old Orders table object and select Open.
b) When finished viewing the data, select File → Close to close the View Data window.
p. Select File → Close to close the Job Editor. The new job object appears on the Checkouts tab.
Chapter 6 Orion Star Case Study
6.1 Exercises
[Figure: star schema with the Order Fact Table at the center, joined to the Organization, Customer, Product, and Time Dimension tables]
[Figure: the ORGANIZATION and STAFF source tables]
In this exercise set you will define and load the three remaining dimension tables. For each of these
target tables, you will need to define:
metadata for the source tables
metadata for the target table
metadata for the process flow to move the data from the source(s) to the target
In addition, metadata objects for data sources needed for showing features of the software will be defined.
Column Expression
Customer_Age Floor(Yrdif(customer.birth_date, today(), 'actual'))
The metadata for the columns can be imported from the DIFT Customer Dimension
metadata object and the text for the expressions can be found in HelperFile.txt.
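As a quick check of the Customer_Age expression: YRDIF with the 'ACTUAL' basis returns the exact number of years between two dates, and FLOOR truncates that to a completed age. The birth date below is made up:
   data _null_;
      birth_date   = '15mar1978'd;   /* hypothetical value */
      Customer_Age = floor(yrdif(birth_date, today(), 'actual'));
      put Customer_Age=;
   run;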
4. Check-In the Metadata Objects for the Customer Dimension
After verifying that the job ran successfully, check in all the objects from the project repository.
The calculations for the desired computed columns are shown below:
Name Expression
Group put(calculated _Group, org.)
The metadata for the columns can be imported from the DIFT Organization Dimension
metadata object and the text for the expressions can be found in HelperFile.txt.
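The CALCULATED keyword in these expressions refers to a column that is computed earlier in the same SELECT list. A minimal PROC SQL sketch; the input table and the stand-in _Group expression are placeholders, and ORG. is a course-supplied format:
   proc sql;
      create table work.orgdim_sketch as
      select Employee_ID,
             Employee_ID as _Group,                 /* stand-in for the real _Group expression */
             put(calculated _Group, org.) as Group  /* reuses the column computed above */
         from work.organization;                    /* placeholder input table */
   quit;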
Verify that all columns for the Table Loader's target table have a defined mapping.
The warning message regarding a compressed data set occurs because the DIFT Organization
table has a COMPRESS=YES property set. Edit the table properties to set this to NO.
For Method 1:
Use the New Table wizard to create a metadata table object named DIFT Time Dimension.
Store the metadata object in the /Data Mart Development/Orion Target Data folder.
The following columns must be entered manually:
NAME LENGTH TYPE FORMAT
Date_ID 4 Numeric Date9.
WeekDay_Num 8 Numeric
WeekDay_Name 9 Character
Month_Num 8 Numeric
Year_ID 4 Character
Month_Name 9 Character
Quarter 6 Character
Holiday_US 26 Character
Fiscal_Year 4 Character
Fiscal_Month_Num 8 Numeric
Fiscal_Quarter 6 Character
Name the physical table, a SAS table, TimeDim, and store it in the DIFT Orion Target Tables
Library.
For Method 2:
In SAS Data Integration Studio, select Tools → Code Editor.
In the Enhanced Editor window, include the TimeDim.sas program from the
S:\Workshop\dift\SASCode directory.
Submit the program and verify that no errors were generated in the Log window.
In SAS Data Integration Studio, invoke the Register Tables wizard.
Store the metadata object in the /Data Mart Development/Orion Target Data folder.
Select SAS as the source type, and select DIFT Orion Target Tables Library.
The TimeDim table should be available.
Set the name of the metadata table object to DIFT Time Dimension.
Verify (update if necessary) that the length of Date_ID is 4.
In the Code Editor window, uncomment the PROC DATASETS step and run just that step. Verify
that the TimeDim table is deleted (check the Log). You will re-create it via a SAS Data Integration
Studio job. (A sketch of this cleanup step appears after this list.)
Close the Code Editor window. (Select File → Close.) Do not save any changes.
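A minimal sketch of what the commented-out PROC DATASETS cleanup step might look like; the libref is assumed:
   proc datasets library=difttgt nolist;   /* libref assumed */
      delete TimeDim;
   quit;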
10. Loading the Time Dimension Target Table
Some specifics for creating the job to load the DIFT Time Dimension table are shown below:
Name the job DIFT Populate Time Dimension Table.
Store the metadata object in the /Data Mart Development/Orion Jobs folder.
Use the User Written Code transformation to specify the code to load this table (a sketch of
such a program follows this list).
Add the Table Loader transformation to the process flow for visual effect but specify to exclude
this transformation from running.
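The actual program is supplied in TimeDim.sas. Purely as an illustration of the kind of code a user-written load can contain, here is a sketch that generates one row per date; the date range is assumed, and the Holiday_US and fiscal columns are omitted:
   data difttgt.timedim;
      length WeekDay_Name $ 9 Month_Name $ 9 Quarter $ 6 Year_ID $ 4;
      format Date_ID date9.;
      do Date_ID = '01jan2003'd to '31dec2007'd;    /* assumed range */
         WeekDay_Num  = weekday(Date_ID);
         WeekDay_Name = strip(put(Date_ID, downame9.));
         Month_Num    = month(Date_ID);
         Month_Name   = strip(put(Date_ID, monname9.));
         Quarter      = cats(year(Date_ID), 'Q', qtr(Date_ID));
         Year_ID      = put(year(Date_ID), 4.);
         output;
      end;
   run;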
d. Select next to the SAS Library field and then click DIFT Orion Source Tables Library.
e. Click . The Define Tables and Select Folder Location window displays.
g. Verify that /Data Mart Development/Orion Source Data is the folder listed for the Location
field.
e. Verify that the location is set to /Data Mart Development/Orion Target Data.
f. Click .
h. Select DIFT Orion Target Tables Library as the value for the Library field.
j. Click .
k. Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
l. From the Orion Source Data folder, expand the DIFT Customer Types table object.
m. Select the Customer_Type and Customer_Group columns from DIFT Customer Types and
click to move the columns to the Selected pane.
p. Select the following columns from DIFT CUSTOMER and click to move the columns to the
Selected pane:
Customer_ID
Country
Gender
Customer_Name
Customer_FirstName
Customer_LastName
Birth_Date
q. Click .
1) Click .
4) Select the Customer_ID column and move it to the Indexes panel by clicking .
5) Click .
u. Click .
v. Review the metadata listed in the finish window and then click . The new table object
appears on the Checkouts tab.
3. Loading the Customer Dimension Target Table
a. Select the Folders tab.
b. Expand Data Mart Development → Orion Jobs.
c. Verify that the Orion Jobs folder is selected.
d. Select File → New Job. The New Job window opens.
1) Type DIFT Populate Customer Dimension Table as the value for the Name
field.
2) Verify that the Location is set to /Data Mart Development/Orion Jobs.
4) Click .
k. Select File → Save to save diagram and job metadata to this point.
l. Review the properties of the SQL Join transformation.
1) Right-click on the SQL Join transformation and select Open. The Designer window opens.
2) Select the Join item on the Diagram tab.
3) In the Properties pane, set the Join to Left.
4) Establish the join criteria of the Customer_Type_ID columns from the sources being
equal.
a) Double-click the Left icon in the process flow diagram for SQL clauses.
b) Click .
c) Under the first Operand field, click and then Choose column(s).
d) Expand DIFT Customer Types and select Customer_Type_ID.
e) Click .
g) Under the second Operand field, click and then Choose column(s).
h) Expand DIFT Customer and select Customer_Type_ID.
i) Click .
a) Double-click the Where keyword on the Navigate pane to surface the Where tab.
b) Click .
c) Under the first Operand field, click and then Choose column(s).
d) Expand DIFT Customer Types and select Customer_Type_ID.
e) Click .
11) Manually map the Birth_Date source column to the Customer_Age column.
d. Select next to the SAS Library field and then click DIFT Orion Source Tables Library.
e. Click . The Define Tables and Select Folder Location window displays.
g. Verify that /Data Mart Development/Orion Source Data is the folder listed for the Location
field.
e. Verify that the location is set to /Data Mart Development/Orion Target Data.
f. Click .
h. Select DIFT Orion Target Tables Library as the value for the Library field.
j. Click .
k. Expand the Data Mart Development → Orion Source Data folder on the Folders tab.
l. From the Orion Source Data folder, expand the DIFT STAFF table object.
m. Select the following columns from DIFT STAFF and click to move the columns to the
Selected pane.
Job_Title
Salary
Gender
Birth_Date
Emp_Hire_Date
Emp_Term_Date
n. Select the Checkouts tab.
o. Expand DIFT ORGANIZATION table object.
p. Select the following columns from DIFT ORGANIZATION and click to move the columns
to the Selected pane.
Employee_ID
Org_Name
Country
q. Click .
1) Click .
4) Select the Employee_ID column and move it to the Indexes panel by clicking .
5) Click .
u. Click .
v. Review the metadata listed in the finish window and then click . The new table object
appears on the Checkouts tab.
3) Click .
4) To connect the DIFT ORGANIZATION table object to the SQL Join, place your cursor over
the connection selector until a pencil icon appears.
5) Click on this connection selector and drag to one of the input ports for the SQL Join
transformation.
h. Select File → Save to save diagram and job metadata to this point.
i. Add the Table Loader transformation to the diagram.
1) In the tree view, select the Transformations tab.
2) Expand the Access grouping.
3) Select the Table Loader transformation.
4) Drag the Table Loader transformation to the diagram.
5) Center the Table Loader so that it is to the right of the SQL Join transformation.
j. Connect the SQL Join transformation (click on the temporary table icon, , associated with the
SQL Join and drag) to the Table Loader transformation.
k. Select File → Save to save diagram and job metadata to this point.
l. Add the DIFT Organization Dimension table object to the process flow.
p. Select File → Save to save diagram and job metadata to this point.
q. Specify the properties of the SQL Join transformation.
1) Right-click on the SQL Join transformation and select Open. The Designer window opens.
2) Right-click the Join item on the Diagram tab and change the join to a Left join.
3) Double-click the Where keyword on the SQL Clauses pane.
4) Select the Where keyword on the Navigate pane to surface the Where tab.
a) Click .
b) Select Choose column(s) from the drop-down list under the first Operand column.
d) Click .
5) Select the Select keyword on the Navigate pane to surface the Select tab.
a) Remove all target columns by right-clicking over the target table side and choosing
Select All.
i) Map columns.
(1) Right-click in the panel between source and target columns, and select Map All.
Three of the target columns map.
(2) Map the Country column to Employee_Country by clicking on the Country
column and dragging to the Employee_Country.
(3) Map the Gender column to Employee_Gender by clicking on the Gender
column and dragging to the Employee_Gender.
(4) Map the Org_Name column to Employee_Name by clicking on the Org_Name
column and dragging to the Employee_Name.
(5) Map the Birth_Date column to Employee_Birth_Date by clicking on the
Birth_Date column and dragging to the Employee_Birth_Date.
(6) Map the Emp_Hire_Date column to Employee_Hire_Date by clicking on the
Emp_Hire_Date column and dragging to the Employee_Hire_Date.
j) Click to fold the target table over the source table. This provides more room to work on
the calculated expressions.
(1) Select Employee_Country (or the last column before the four that need expressions)
and then click on the toolbar.
(14) Locate the _Section column. In the Expression column, select Advanced
from the drop-down list. The Expression window opens as displayed.
(22) Locate the _Company column. In the Expression column, select Advanced
from the drop-down list. The Expression window opens as displayed:
(23) Copy the expression for _Company from HelperFile.txt.
Input(Put(calculated _department,orgdim.),12.)
(26) Locate the Group column. In the Expression column, select Advanced from
the drop-down list. The Expression window opens as displayed.
(27) Copy the expression for Group from HelperFile.txt.
Put(calculated _group,org.)
(30) Locate the Section column. In the Expression column, select Advanced
from the drop-down list. The Expression window opens as displayed:
(31) Copy the expression for Section from HelperFile.txt.
Put(calculated _section,org.)
(38) Locate the Company column. In the Expression column, select Advanced
from the drop-down list. The Expression window opens as displayed.
(39) Copy the expression for Company from HelperFile.txt.
Put(calculated _company,org.)
k) Click to fold the target table info back to the right side.
l) Map the Employee_ID column from the ORGANIZATION table to _Group.
s. Run the job by right-clicking in the background of the job and selecting Run. The job runs
without errors or warnings.
u. View the Log for the executed job by selecting the Log tab.
v. Scroll to view the note about the creation of the DIFTTGT.ORGDIM table:
e. Verify that the location is set to /Data Mart Development/Orion Target Data.
f. Click .
h. Select DIFT Orion Target Tables Library as the value for the Library field.
j. Click .
k. Click .
3) Select the Date_ID column and move it to the Indexes panel by clicking .
4) Click .
n. Click .
o. Click .
3) Click .
f. Select File → Save to save diagram and job metadata to this point.
g. Add the Table Loader transformation to the diagram.
1) In the tree view, select the Transformations tab.
2) Expand the Access grouping.
3) Select the Table Loader transformation.
4) Drag the Table Loader transformation to the diagram.
5) Center the Table Loader so that it is to the right of the User Written Code transformation.
h. Connect the User Written Code transformation to the Table Loader transformation.
i. Select File → Save to save diagram and job metadata to this point.
j. Add the target table to the diagram.
1) Select the Checkouts tab.
2) Select the DIFT Time Dimension table object.
3) Drag the DIFT Time Dimension table object to the diagram.
4) Connect the Table Loader transformation to the DIFT Time Dimension table object.
k. Select File → Save to save diagram and job metadata to this point.
l. Specify properties for the User Written Code transformation.
1) Right-click on the User Written Code transformation and select Properties.
2) Select the Code tab.
3) Select All user written as the value for the Code generation mode field.
5) Click .
6) Navigate to S:\Workshop\dift\SASCode.
7) Select TimeDim.sas.
8) Click . The path and filename of the code file are now listed in the Open
window.
3) Click (propagate from target to sources tool) from the tool set on the Mappings tab.
4) Verify that all columns are mapped. If not, right-click in the panel between the source
columns and the target columns and select Map All. The mappings will be updated.
5) Click the Code tab.
6) Click Exclude transformation from run.
n. Run the job by right-clicking in the background of the job and selecting Run. The job runs
without errors or warnings.
o. View the Log for the executed job by selecting the Log tab.
p. Select File → Save to save diagram and job metadata to this point.
q. Select File → Close to close the Job Editor.
11. Check-In the Metadata Objects for the Time Dimension
a. Select Check Outs → Check In All.
b. Type Adding target table & job for Time Dimension as the value for the
Title field.
Chapter 7 Working with Transformations
7.1 Introduction
Objectives
List transformations that are discussed in this chapter.
Transformations Tree
The Transformations tree organizes transformations into a
set of folders. You can drag a transformation from the
Transformations tree to the Job Editor, where you can
connect it to source and target tables and update its
default metadata. By updating a transformation with the
metadata for actual sources, targets,
and transformations, you can quickly
create process flow diagrams for
common scenarios.
Transformation Examples
This chapter has examples that use a number of
transformations available in SAS Data Integration Studio.
Here is a partial listing of those transformations:
Sort
Rank
Transpose
Data Validation
Extract
Append
Summary Statistics
One-Way Frequency
This demonstration creates a series of subfolders under the Orion Reports folder. The new folders will be
used to organize the various metadata objects created and used in the subsequent sections of this chapter.
1. If necessary, access SAS Data Integration Studio using Bruno's credentials.
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
b. Verify that the connection profile is My Server.
c. Click to close the Connection Profile window and open the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the Password
field.
6. Press ENTER.
7. Right-click on the Orion Reports folder and select New Folder.
8. Type Loop Transforms as the name for the new folder.
9. Press ENTER.
10. Right-click on the Orion Reports folder and select New Folder.
11. Type Data Validation as the name for the new folder.
19. Right-click on the Orion Reports folder and select New Folder.
20. Type Status Handling as the name for the new folder.
7.2 Using Extract, Summary Statistics, and Loop Transformations
Objectives
Discuss and use the Extract and Summary Statistics
transformation.
Discuss and use the Loop transformations.
Extract Transformation
The Extract transformation is
typically used to create a subset
from a source. It can also be used to
create columns in a target that are
derived from columns in a source.
This demonstration creates a report on customer order information for customers from the United States
who placed orders in 2007. The customer dimension information first needs to be joined to the order fact
table and then subset, which is done in a separate job. A second job is created that will extract the desired
rows, and then a summary statistics report will be created from this extracted data.
1. If necessary, access SAS Data Integration Studio using Bruno's credentials.
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and open the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the Password
field.
e. Type DIFT Populate Customer Order Information Table as the value for the
Name field.
3. Add source table metadata to the diagram for the process flow.
a. Select the Data Mart Development ⇒ Orion Target Data folder.
b. Drag the DIFT Customer Dimension table object to the Diagram tab of the Job Editor.
c. Drag the DIFT Order Fact table object to the Diagram tab of the Job Editor.
d. Connect the DIFT Customer Dimension table object to one input port for the SQL Join
transformation.
e. Connect the DIFT Order Fact table object to the second input port for the SQL Join
transformation.
h. Click .
j. Click .
6. Select File ⇒ Save to save diagram and job metadata to this point.
1) Click .
2) Under the first Operand field, click and then Advanced. The Expression Builder
window opens.
3) On the Functions tab, click the Date and Time folder under the Categories list.
5) Click .
9) Click .
10) Click .
12) In the second Operand field, click and type 2007 and then press ENTER.
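The completed filter should resemble the following expression (a sketch; Order_Date is assumed to be the date column supplied by the order fact table):
   YEAR(Order_Date) = 2007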
g. Verify that all 22 target columns will be mapped one-to-one using a source column.
h. Select .
8. Select File ⇒ Save to save diagram and job metadata to this point.
9. Run the job.
a. Right-click in background of the job and select Run.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
d. View the Log for the executed Job. Scroll to view the note about the creation of the
DIFTTGT.CUSTOMERORDERINFO:
4) Click .
g. Right-click on the DIFT Customer Order Information table and select Open.
10. When you are finished viewing the DIFT Customer Order Information table, close the View Data window by selecting File ⇒ Close.
11. Save and close the Job Editor window.
b. Select File ⇒ Save to save diagram and job metadata to this point.
e. Type DIFT Create Report for US Customer Order Information as the value
for the Name field.
2. Add source table metadata to the diagram for the process flow.
b. If necessary, expand Data Mart Development ⇒ Orion Reports ⇒ Extract and Summary.
c. Drag the DIFT Customer Order Information table object to the Diagram tab of the Job Editor.
3. Add the Extract transformation to the process flow.
b. Expand the Data folder and locate the Extract transformation template.
c. Drag the Extract transformation to the Diagram tab of the Job Editor. Place the transformation
next to the table object.
d. Connect the DIFT Customer Order Information table object to the Extract transformation.
b. Expand the Analysis folder and locate the Summary Statistics transformation template.
c. Drag the Summary Statistics transformation to the Diagram tab of the Job Editor. Place the
transformation next to the table object.
5. Select File ⇒ Save to save diagram and job metadata to this point.
6. Specify properties for the Extract transformation.
c. In the bottom portion of the Where tab, click the Data Sources tab.
e. Select Customer_Country.
7. Select File ⇒ Save to save diagram and job metadata to this point.
8. Specify properties for the Summary Statistics transformation.
2) Select Total Retail Price, hold down the CTRL key and select Quantity, and then click .
3) Click to close the Select Data Source Items window. The Select analysis
columns area updates as displayed:
4) Click in the Select columns to subgroup data area to open the Select Data
Source Items window.
5) Select Customer Gender, hold down the CTRL key and select Customer Age Group, and
then click .
6) Click to close the Select Data Source Items window. The Select columns
to subgroup data area updates as displayed:
3) Navigate to S:\Workshop\dift\reports.
4) Type UnitedStatesCustomerInfo.html in the Name field.
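The code that the Summary Statistics transformation generates is roughly equivalent to a PROC MEANS step wrapped in ODS HTML statements. A minimal sketch, assuming the extracted rows land in a work table named EXTRACT_OUT and that the default statistics are requested:
   ods html path="S:\Workshop\dift\reports" (url=none)
            file="UnitedStatesCustomerInfo.html";
   proc means data=work.extract_out n mean min max;
      class Customer_Gender Customer_Age_Group;   /* subgroup columns */
      var Total_Retail_Price Quantity;            /* analysis columns */
   run;
   ods html close;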
9. Select File ⇒ Save to save diagram and job metadata to this point.
10. Run the job.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
c. Expand to S:\Workshop\dift\reports.
e. Click .
g. When done viewing the report, select File ⇒ Close to close Internet Explorer.
12. Select File ⇒ Save to save diagram and job metadata to this point.
13. Select File ⇒ Close to close the Job Editor window. The Extract and Summary folder displays the two jobs and a target table:
Control Table
The control table can be any table that contains rows of
data that can be fed into an iteration. The creation of this
table can be an independent job, or it can be part of the job flow containing the Loop transformations.
This demonstration uses the Loop transformations to iterate through the distinct customer country values and create a separate summary report for each of the countries. Three basic steps will be accomplished:
Step 1: Create the control table.
Step 2: Create the parameterized job.
Step 3: Create the iterative job.
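Conceptually, the Loop transformation turns the inner job into a macro and calls it once for each control table row, with that row's values supplied as the parameter values. The following is a simplified sketch of the pattern, not the code that the transformation actually generates (it also assumes the data values contain no commas):
   %macro country_report(CtryName, CCtryName, CtryValue);
      /* the parameterized job's logic runs here, with &CtryValue   */
      /* resolving inside the Extract transformation's WHERE clause */
      %put NOTE: Creating report for &CtryName (&CtryValue);
   %mend country_report;

   data _null_;
      set difttgt.distinctcountries;
      call execute(cats('%nrstr(%country_report)(',
                        CountryName, ',', CCountryName, ',',
                        CountryValue, ')'));
   run;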
g. Click .
i. Select DIFT Orion Target Tables Library as the value for the Library field.
k. Click .
1) Click .
6) Click .
11) Click .
13) Type 2-Character Country Value as the Description of the new column.
n. Click .
d. Connect the DIFT Customer Order Information table object to the SQL Join transformation.
By default, the SQL Join expects at least two input tables. However, for this instance, we need
just one input.
e. Click the status indicator on the SQL Join transformation to discover that a source table is missing.
g. Right-click on the SQL Join transformation and select Ports Delete Input Port. The status
indicator now shows no errors.
Again, the status indicator for the SQL Join shows that there is a problem.
e. Click the status indicator on the SQL Join transformation to discover that mappings are needed.
6. Select File Save to save diagram and job metadata to this point.
d. On the Select tab, specify the following Expression information for the three target columns.
CCountryName compress(put(customer_country,$country.))
CountryValue put(customer_country,$2.)
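The equivalent PROC SQL step resembles the following sketch. (Only two of the three expressions are listed above; the CountryName expression shown here is an assumption.)
   proc sql;
      create table difttgt.distinctcountries as
      select distinct
             put(customer_country, $country.)           as CountryName,
             compress(put(customer_country, $country.)) as CCountryName,
             put(customer_country, $2.)                 as CountryValue
        from difttgt.customerorderinfo;
   quit;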
f. Click to return to the Job Editor. Note that the status indicator associated with the SQL
Join transformation now shows no errors.
8. Select File ⇒ Save to save diagram and job metadata to this point.
9. Run the job to generate the control table.
a. Right-click in background of the job and select Run.
b. Verify that the job runs successfully.
c. Click the Log tab and verify that DIFTTGT.DISTINCTCOUNTRIES is created with 45 observations and 3 variables.
c. Right-click DIFT Create Report for US Customer Order Information and select Copy.
e. Right-click DIFT Create Report for US Customer Order Information (the copied job located
in the Loop Transforms folder) and select Properties.
f. Type DIFT Parameterized Job for Country Reports as the value for the Name
field.
2. Double-click the job DIFT Parameterized Job for Country Reports and it opens in the Job Editor
window.
3. Edit the Extract transformation.
c. Type &CtryValue in place of US (in the Expression Text area); be sure that double quotation marks are used.
Be sure to type the period that separates the parameter name from the rest of the text.
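After these edits, the relevant fragments should resemble the following sketch (the file name expression is an assumption based on the parameter names defined later):
   /* Extract WHERE clause: double quotation marks are required */
   /* so that the macro variable resolves.                      */
   where customer_country = "&CtryValue"

   /* Report file name: the trailing period marks the end of    */
   /* the macro variable name.                                  */
   &CCountryName.CustomerInfo.html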
5. Select File ⇒ Save to save diagram and job metadata to this point.
c. Click .
2) Type Country Name as the value for the Displayed text field.
d. Click .
2) Type Compressed Country Name as the value for the Displayed text field.
e. Click .
2) Type Country Value as the value for the Displayed text field.
h. Click to close the DIFT Parameterized Job for Country Reports Properties window.
7. Select File ⇒ Save to save diagram and job metadata to this point.
The icon for the job object in the Loop Transforms folder is now decorated with an ampersand to
denote that the job is parameterized.
Parameterized jobs can be tested only if all parameters are supplied with default values.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
e. Scroll toward the top of the Log and note that the parameters are all defined with default values.
f. Scroll toward the end of the Summary Statistics code area and verify that the correct HTML file name is being generated, as well as the correct title2 text.
c. Expand to S:\Workshop\dift\reports.
d. Verify that UnitedStatesCustomerInfo.html exists (you can also check the date-time stamp to
verify that this HTML file was created with this job).
2. Add control table metadata to the diagram for the process flow.
a. Click the Folders tab.
b. If necessary, expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
c. Drag the DIFT Control Table - Countries table object to the Diagram tab of the Job Editor.
3. Add the Loop transformation to the process flow.
a. Click the Transformations tab.
b. Expand the Control folder and locate the Loop transformation template.
c. Drag the Loop transformation to the Diagram tab of the Job Editor.
d. Connect the DIFT Control Table - Countries table object as input to the Loop transformation.
c. For the Country Name parameter, select CountryName as the value for the Mapped Source
Column.
d. For the Compressed Country Name parameter, select CCountryName as the value for the
Mapped Source Column.
e. For the Country Value parameter, select CountryValue as the value for the Mapped Source
Column.
7. Select File ⇒ Save to save diagram and job metadata to this point.
8. Run the job.
a. Right-click in background of the job and select Run.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
and so on.
For each of these parameter sets, the inner job is executed and this execution results in an
HTML file.
Exercises
The Marketing Department has been asked to examine buying habits of various age groups across the
genders. The same kind of marketing analysis will be applied to each distinct gender/age group
combination. To make this task easier, a request has been made to create a separate SAS table for each of
the distinct gender/age group combinations. You first use the Extract transformation to create one of the
needed tables. This job can then be parameterized and used with the Loop transformations to create the
series of desired tables.
1. Using the Extract Transformation to Create Table for Female Customers Aged 15-30 Years
Create a job that uses the Customer Dimension table to load a new table to contain just the female
customers aged 15-30 years.
Place the job in the Data Mart Development ⇒ Orion Reports ⇒ Extract and Summary folder.
Name the job DIFT Populate Female15To30Years Table.
Use the Customer Dimension table as the source table for the job (the metadata for this table can be found in Data Mart Development ⇒ Orion Target Data).
Add the Extract transformation to the job and build the following WHERE clause:
Customer_Gender = "F" and Customer_Age_Group = "15-30 years"
Register the output table from the Extract transformation with the following attributes:
Name: Female15To30Years
Run the job and verify that the new table has 12,465 observations and 11 variables.
The final job flow should resemble the following:
Use the SQL Join transformation to populate the control table. The control table needs to
contain the distinct combinations of all gender and age group values from the DIFT Customer
Dimension.
Use the following table to help define the calculations for the columns:
Column Expression
GenVal put(customer_gender,$1.)
Run the job; the control table should have 8 observations and 3 columns. Verify that the data looks appropriate.
Create a table template to be used in the parameterized job.
Place the table in the Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms folder.
Name the table object DIFT Table Template for Gender Age Group Table.
Name the physical table &GdrAgeGrp._Customers and have the table created in the DIFT
Orion Target Tables Library as a SAS table.
The table template needs to have the same column specifications as the DIFT Customer
Dimension table.
Table Name                     Number of Observations   Number of Columns
Female15To30Years_Customers    12465                    11
Female31To45Years_Customers    9263                     11
Female46To60Years_Customers    9295                     11
Female61To75Years_Customers    9266                     11
Male15To30Years_Customers      15261                    11
Male31To45Years_Customers      11434                    11
Male46To60Years_Customers      11502                    11
Male61To75Years_Customers      11468                    11
7.3 Establishing Status Handling
Objectives
Discuss return codes and how to capture a return
code in a SAS Data Integration Studio job.
Investigate where status handling is available.
Return Codes
When a job is executed in SAS Data Integration Studio,
a return code for each transformation in the job is
captured in a macro variable. The return code for the job
is set according to the least successful transformation in
the job.
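In generated job code this bookkeeping is done with macro logic. A hedged sketch of the idea follows (the macro variable name trans_rc matches what SAS Data Integration Studio commonly generates, but treat the details as an assumption):
   %macro check_step_rc;
      %if &trans_rc > 4 %then %do;
         %put ERROR: A step failed with return code &trans_rc..;
         %abort cancel;   /* one possible action: terminate the job */
      %end;
   %mend check_step_rc;
   %check_step_rc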
Example Actions
The return code can be associated with an action that
performs one or more of these tasks:
terminate the job or transformation
This demonstration illustrates establishing Status Handling for an SQL Join transformation and for a job.
Also illustrated is the use of the Return Code Check transformation.
1. If necessary, access SAS Data Integration Studio using Bruno's credentials.
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
b. Verify that the connection profile is My Server.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
5. Establish two successful status handling conditions for the SQL Join transformation.
a. Click the Diagram tab.
b. Right-click on the SQL Join transformation and select Properties.
c. Click the Status Handling tab.
There are two other conditions that can be tested for: Warnings and Errors. The conditions of Warnings and Errors produce the same list of actions. The Errors condition has one additional option associated with it: Abort.
g. Specify S:\Workshop\dift\reports\SHforSQLJoin.txt for the File Name field.
h. Specify Successful running of SQL Join in Job &jobid for the Message
field.
o. Specify Successful run for SQL Join in Job &jobid as the Message field.
6. Select File ⇒ Save to save diagram and job metadata to this point.
7. Re-run the job.
a. Right-click in background of the job and select Run.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
The notes pertaining to the text file for a successful condition should resemble:
The notes pertaining to the data set for a successful condition should resemble the following:
c. Double-click SHforSQLJoin.txt.
The SAS data set received a new observation each time the job was run.
c. Select File ⇒ Exit to close SAS Enterprise Guide and do not save any changes.
11. Establish two successful status handling conditions for the job.
a. Right-click in the background of the job and select Properties.
b. On the General tab, change the name to DIFT Pop Cust Dim Table (SH).
e. Click in the Condition area, next to Successful, and select Send Job Status.
f. Select Send Job Status in the Action area. The Action Options window opens.
j. Click to close the DIFT Pop Cust Dim Table (SH) Properties window.
12. Select File ⇒ Save to save diagram and job metadata to this point.
13. Re-run the job.
a. Right-click in background of the job and select Run.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
d. View the Log for the executed Job. The new job status data set is created with one observation.
The SAS data set received a new observation each time the job was run.
The SAS data set can be used to gather some total time processing statistics for this job.
c. Select File ⇒ Exit to close SAS Enterprise Guide and do not save any changes.
17. Close the DIFT Pop Cust Dim Table (SH) job and save any changes.
This demonstration shows the use of the Return Code Check transformation for transformations such as
the Extract transformation that do not have a Status Handling tab.
1. Locate and open the Report on US Customer Order Information job.
a. Click the Folders tab.
b. Expand Data Mart Development ⇒ Orion Reports ⇒ Extract and Summary.
c. Right-click DIFT Create Report for US Customer Order Information and select Copy.
d. Expand Data Mart Development ⇒ Orion Reports ⇒ Status Handling.
e. Right-click the Status Handling folder and select Paste.
f. Right-click the pasted job, DIFT Create Report for US Customer Order Information, and select Properties.
g. On the General tab, change the name of the job to DIFT Create Report on US Cust
Info (SH).
i. Right-click DIFT Create Report on US Cust Info (SH) and select Open.
2. Right-click on the Extract transformation and select Properties. Note that the properties for this
transformation do not have a Status Handling tab.
The Return Code Check transformation can be used to take advantage of the status handling features
for those transformations that have no Status Handling tab. The Return Code Check transformation
captures the status of the previous transformation in the process flow, in this case, the Extract
transformation.
4. Add the Return Code Check transformation to the process flow diagram.
a. Click the Transformations tab.
b. Expand the Control group.
c. Locate the Return Code Check transformation.
d. Drag the Return Code Check transformation in to the process flow diagram.
h. Click to close the Action Options window. The Return Code Check Properties window
shows this one condition.
7.4 Using the Data Validation Transformation
Objectives
Discuss and use the Data Validation transformation.
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
e. Type DIFT Valid Products as the value for the Name field.
g. Click .
i. Select DIFT Orion Target Tables Library as the value for the Library field.
k. Click .
l. Expand the Data Mart Development ⇒ Orion Source Data folder on the Folders tab.
m. From the Orion Source Data folder, select DIFT NEWORDERTRANS table object.
o. Click .
e. Type DIFT Invalid Products as the value for the Name field.
g. Click .
i. Select DIFT Orion Target Tables Library as the value for the Library field.
k. Click .
l. Expand the Data Mart Development ⇒ Orion Source Data folder on the Folders tab.
m. From the Orion Source Data folder, select DIFT NEWORDERTRANS table object.
o. Click .
e. Type DIFT Populate Valid and Invalid Product Tables as the value for the
Name field.
5. Add source table metadata to the diagram for the process flow.
c. Drag the DIFT NEWORDERTRANS table to the Diagram tab of the Job Editor.
6. Add the Data Validation transformation to the process flow.
b. Expand the Data folder and locate the Data Validation transformation template.
c. Drag the Data Validation transformation to the Diagram tab of the Job Editor.
d. Connect the DIFT NEWORDERTRANS table object to the Data Validation transformation.
g. Click .
a. Right-click on the green temporary table object associated with the Data Validation
transformation and select Replace.
d. Click .
14) In the Action Options window, type difttgt as the value for the Libref field.
m. Keep the default Action if invalid value, Move row to error table.
n. Click to close the Invalid Values window. The Invalid Values tab shows the following:
r. Verify that Move row to error table is set as the value for Action if missing.
s. Click to close the Missing Values window. The Missing Values tab shows the
following:
9. Add a sticky note to the job. (A sticky note is a way to visually document within a job.)
b. Drag the sticky note and place it under the Data Validation transformation.
10. Select File ⇒ Save to save diagram and job metadata to this point.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
e. Scroll to view the note about the creation of the DIFTTGT.VALID_PRODUCTS table and the
creation of the DIFTTGT.INVALID_PRODUCTS table:
b. Right-click the DIFT Valid Products table object and select Open.
c. When you are finished viewing the DIFT Valid Products data set, close the View Data window by selecting File ⇒ Close.
13. View the DIFT Invalid Products table.
c. Right-click the DIFT Invalid Products table object and select Open.
d. When you are finished viewing the Invalid Products data set, close the View Data window by selecting File ⇒ Close.
c. When you are finished viewing the file, select File ⇒ Exit to close it.
15. View ValidInvalidProdEntryToFile.txt.
c. When you are finished viewing the file, select File ⇒ Exit to close it.
16. View the data set difttgt.proddataexcept.
a. Copy the LIBNAME statement for the DIFT Orion Target Tables Library.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Target Data.
3) Right-click on the DIFT Orion Target Tables Library object and select View Libname.
4) Right-click in the background of the Display Libname window and select Select All.
5) Right-click in the background of the Display Libname window and select Copy.
4) Click the Output tab to view the information in the data set.
Exercises
Create metadata for the target table named DIFT Valid Customers.
The target table should be physically stored in the DIFT Orion Target Tables Library with a
name of Valid_Customers.
The table object should contain the exact same columns as the DIFT Customer Dimension table object found in the Data Mart Development ⇒ Orion Target Data folder.
The target table object should end up in the Data Mart Development ⇒ Orion Reports ⇒ Data Validation folder.
Create metadata for the target table named DIFT Invalid Customers.
The target table should be physically stored in the DIFT Orion Target Tables Library with a
name of Invalid_Customers.
The table object should contain the exact same columns as the DIFT Customer Dimension table object found in the Data Mart Development ⇒ Orion Target Data folder.
The target table object should end up in the Data Mart Development ⇒ Orion Reports ⇒ Data Validation folder.
Create the job that will load Valid Customers from the DIFT Customer Dimension table using the
Data Validation transformation.
How many rows were moved to the error table because the value for Customer_Type was invalid?
Were there any duplicate values found for Customer_Name and Customer_Birth_Date?
7.5 Using Transpose, Sort, Append, and Rank Transformations
Objectives
Discuss and use the Rank, Transpose, Append, List,
and Sort transformations.
Transpose Transformation
The Transpose transformation
creates an output data set by
restructuring the values in a SAS
data set, and transposing selected
variables into observations.
The Transpose transformation is an
interface to the TRANSPOSE
procedure.
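A minimal PROC TRANSPOSE sketch follows; the work table names and the ColValue column are assumptions (only ColNames and Job_Title appear in the demonstration later in this section):
   proc transpose data=work.addlstaff_sorted out=work.addlstaff_wide;
      by Job_Title;     /* groups of records to transpose         */
      id ColNames;      /* values become the output column names  */
      var ColValue;     /* the column whose values are transposed */
   run;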
Sort Transformation
The Sort transformation provides an
interface for the SORT procedure.
The transformation can be used to
read data from a source, sort it, and
write the sorted data to a target in a
SAS Data Integration Studio job.
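A minimal PROC SORT sketch, with assumed data set names:
   proc sort data=work.addlstaff_raw out=work.addlstaff_sorted;
      by Job_Title;
   run;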
Append Transformation
The Append transformation can be
used to create a single target by
appending or concatenating two or
more sources.
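For example, concatenating two sources into a single target can be expressed as a DATA step with a SET statement (the physical table names here are assumptions):
   data difttgt.full_staff;
      set difttgt.staff_partial   /* first source               */
          work.addlstaff_wide;    /* second (transposed) source */
   run;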
Rank Transformation
The Rank transformation uses the
RANK procedure to rank one or
more numeric variables in the source
and store the ranks in the target.
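A minimal PROC RANK sketch that matches the salary ranking built later in this section (the data set names are assumptions):
   proc rank data=difttgt.full_staff out=work.full_staff_ranked;
      var Salary;          /* numeric column to rank         */
      ranks RankedSalary;  /* column that receives the ranks */
   run;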
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
2. Verify that the DIFT STAFF_PARTIAL metadata table object exists and has data loaded.
d. Select File ⇒ New ⇒ External File ⇒ Delimited. The New Delimited External File wizard opens:
1) Type DIFT Additional Staff Information as the value for the Name field.
1) Navigate to S:\Workshop\dift\data.
3) Select AddlStaff.csv.
4) Click .
Previewing the file shows that the first record contains column names and that the values are comma-delimited, not space-delimited.
1) Clear Blank.
2) Click Comma.
j. Click .
2) Type 2 (the number two) as the value for the Start record field.
3) Click to close the Auto Fill Columns window. The top portion of the Column
Definitions window populates with three columns, one of them numeric and two of them
character.
5) Select Get the column names from column headings in this file.
6) Verify that 1 is set as the value for The column headings are in the file
record field.
7) Click . The Name field populates with all the column names.
k. Change the length, informat, and format for the Job_Title column.
1) In the top portion of the Column Definitions window, locate the Job_Title column.
m. Click .
n. Click . The metadata object for the external file is found on the Checkouts tab.
4. Create a target table object that is a duplicate of metadata from DIFT STAFF_PARTIAL.
f. Click . The copied object is now in the Transpose and Rank folder.
a. Right-click the Copy of DIFT STAFF_PARTIAL table object and select Properties.
b. On the General tab, change the name of the metadata object to DIFT Full Staff.
d. Click next to the Library field. The Select a library window is displayed.
4) Click .
e. Type DIFT Populate Full Staff & Create Rank Report as the value for the
Name field.
7. Add source table metadata to the diagram for the process flow.
c. Drag the DIFT STAFF_PARTIAL table object to the Diagram tab of the Job Editor.
f. Drag the DIFT Additional Staff Information external file object to the Diagram tab of the Job
Editor.
b. Expand the Access folder and locate the File Reader transformation template.
c. Drag the File Reader transformation to the Diagram tab of the Job Editor.
9. Connect the DIFT Additional Staff Information external file object to the File Reader
transformation.
10. Add the Sort transformation to the process flow.
c. Expand the Data folder and locate the Sort transformation template.
d. Drag the Sort transformation to the Diagram tab of the Job Editor.
11. Connect the File Reader transformation to the Sort transformation.
The process flow diagram at this point should resemble the following:
b. Expand the Data folder and locate the Transpose transformation template.
c. Drag the Transpose transformation to the Diagram tab of the Job Editor.
13. Connect the Sort transformation to the Transpose transformation.
14. Add the Append transformation to the process flow.
b. Expand the Data folder and locate the Append transformation template.
c. Drag the Append transformation to the Diagram tab of the Job Editor.
15. Connect the Transpose transformation to one of the ports for the Append transformation.
16. Connect DIFT STAFF_PARTIAL table object to the other default port for the Append
transformation.
a. Right-click on the temporary output table for the Append and select Replace. The Table
Selector window is displayed.
d. Click .
18. Select File ⇒ Save to save diagram and job metadata to this point.
19. Add the Rank transformation to the process flow.
b. Expand the Data folder and locate the Rank transformation template.
c. Drag the Rank transformation to the Diagram tab of the Job Editor.
20. Connect the DIFT Full Staff table object to the Rank transformation.
21. Select File ⇒ Save to save diagram and job metadata to this point.
The process flow diagram should resemble the following:
d. Click .
23. Change the name of the work table that is output for the File Reader transformation.
a. Right-click on the green temporary table object associated with the File Reader transformation
and select Properties.
d. Click .
e. Click .
25. Change the name of the work table that is output for the first Sort transformation.
a. Right-click on the green temporary table object associated with the Sort transformation and select
Properties.
d. Click .
26. Change the name of the work table that is output for the Transpose transformation.
a. Right-click on the green temporary table object associated with the Transpose transformation and
select Properties.
e. Click .
a) Right-click in the background of the Target table area and select Select All.
d) Select to move all columns from the DIFT Full Staff table to the Selected area.
e) Click .
3) Map Job_Title from Source table area to Job_Title in the Target table
area.
a) Click in the Select columns to transpose area to open the Select Data
Source Items window.
c) Click to close the Select Data Source Items window. The Select
analysis columns area updates as displayed:
3) Establish ColNames for the Select a column for output column names area.
c) Click to close the Select a Data Source Item window. The Select a
column for output column names area updates as displayed:
4) Establish Job_Title for the Select columns whose values define groups
of records to transpose area.
c) Click to close the Select Data Source Items window. The Select
columns whose values define groups of records to transpose
area updates as displayed:
d. Click .
29. Change the name of the work table that is output for the Rank transformation.
a. Right-click on the green temporary table object associated with the Rank transformation and
select Properties.
d. Click .
4) Select the following columns (click on one, then hold down the CTRL key and click on each
subsequent column):
Start_Date
End_Date
Birth_Date
Emp_Hire_Date
Emp_Term_Date
Manager_ID
6) Re-order the columns so that Salary and RankedSalary are the last two columns (click
on the row number and drag a column to desired ordering).
7) Verify that all columns are mapped properly (the RankedSalary column will not have a one-
to-one column mapping).
8) Right-click on the RankedSalary column and select Propagate From Targets To End.
1) Select Salary in the Available source columns area, and then click to move
Salary to the Selected source columns area.
2) Select RankedSalary in the Available target columns area, and then click to move RankedSalary to the Selected target columns area.
31. Right-click in the background of the job and select Settings ⇒ Automatically Propagate Columns (this effectively disables automatic propagation for this job from this point on).
32. Add the Sort transformation to the process flow.
b. Expand the Data folder and locate the Sort transformation template.
c. Drag the Sort transformation to the Diagram tab of the Job Editor.
b. Expand the Output folder and locate the List Data transformation template.
c. Drag the List Data transformation to the Diagram tab of the Job Editor.
35. Connect the Sort transformation to the List Data transformation.
36. Select File Save to save diagram and job metadata to this point.
37. Change the name of the work table that is output for the second Sort transformation.
a. Right-click on the green temporary table object associated with the second Sort transformation
and select Properties.
d. Click .
f. Click .
2) Select Use column labels as column headings (LABEL) and then click .
5) Type NOOBS as the value for the Additional PROC PRINT options area.
6) Type format salary dollar12.; as the value for the Additional PROC PRINT
statements area.
a) Click in the Select other columns to print area to open the Select Data
Source Items window.
b) Select Job_Title, hold down the CTRL key and select Gender, hold down the CTRL key
and select Salary, and then click .
c) Click to close the Select Data Source Items window. The Select other
columns to print area updates as displayed:
3) Click Postcode.
4) Type options obs=MAX;.
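Taken together, these settings correspond roughly to the following PROC PRINT sketch (the input table name is an assumption):
   options obs=MAX;
   proc print data=work.full_staff_sorted label noobs;
      var Job_Title Gender Salary;
      format salary dollar12.;
   run;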
41. Select File ⇒ Save to save diagram and job metadata to this point.
42. Run the job.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
7.6 Basic Standardization with the Apply Lookup Standardization Transformation
Objectives
Discuss and use the Apply Lookup Standardization
transformation.
Discuss and use the One-Way Frequency
transformation.
A table of potential customers was defined in metadata (DIFT Contacts). This demonstration creates a job that initially reports on this data by creating one-way frequency reports for two of the columns.
The job is then updated by adding the Apply Lookup Standardization transformation to apply two
predefined standardization schemes. The final step is reporting on the newly transformed data. The final
process flow should resemble the following:
a. Select Start ⇒ All Programs ⇒ SAS ⇒ SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
2. Verify that the DIFT Contacts metadata table object exists and has data available.
e. Type DIFT Standardize and Report on Contacts Table as the value for the
Name field.
4. Add source table metadata to the diagram for the process flow.
c. Drag the DIFT Contacts table object to the Diagram tab of the Job Editor.
5. Add the One-Way Frequency transformation to the process flow.
b. Expand the Analysis folder and locate the One-Way Frequency transformation template.
c. Drag the One-Way Frequency transformation to the Diagram tab of the Job Editor.
6. Connect the DIFT Contacts table object to the One-Way Frequency transformation.
7. Select File ⇒ Save to save diagram and job metadata to this point.
b) Select OS, hold down the CTRL key and select DATABASE, and then click .
c) Click to close the Select Data Source Items window. The Select
columns to perform a one-way frequency distribution on area
updates as displayed:
4) Type nocum nopercent as the value for Specify other options for TABLES
statement area.
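The One-Way Frequency transformation generates a PROC FREQ step; with these settings it resembles the following sketch (the physical table name is an assumption):
   proc freq data=difttgt.contacts;
      tables OS Database / nocum nopercent;
   run;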
9. Select File ⇒ Save to save diagram and job metadata to this point.
10. Run the job.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
A quick glance verifies the initial suspicion that the OS and Database columns have not had
any standards imposed for data values.
11. Select File ⇒ Close to close the Job Editor.
Two schemes have been pre-built for these types of column data. The next steps will
establish the necessary options to access these schemes
add the Apply Lookup Standardization transformation
re-run the One-Way Frequency task against the standardized table.
1. Select Tools ⇒ Options.
2. Select the Data Quality tab.
3. Verify that the following fields are set appropriately in the Data Quality area:
c. Double-click on the job DIFT Standardize and Report on Contacts Table. The job opens in the Job Editor window.
6. Break the connection between the DIFT Contacts table object and the One-Way Frequency
transformation.
a. Select the connection line between the table and the transformation.
b. Expand the Data Quality folder and locate the Apply Lookup Standardization transformation
template.
c. Drag the Apply Lookup Standardization transformation to the Diagram tab of the Job Editor.
8. Connect the DIFT Contacts table object to the Apply Lookup Standardization transformation.
9. Connect the Apply Lookup Standardization transformation to the One-Way Frequency
transformation.
c. On the Diagram tab, click to auto-arrange the elements in the process flow.
11. Select File ⇒ Save to save diagram and job metadata to this point.
c. For the DATABASE column, select DIFT Database Scheme.sch.qkb as the value for the
Scheme field.
d. For the OS column, select DIFT OS Scheme.sch.qkb as the value for the Scheme field.
e. For the DATABASE column, select Phrase as the value for the Apply Mode field.
f. For the OS column, select Phrase as the value for the Apply Mode field.
13. Select File ⇒ Save to save diagram and job metadata to this point.
b) Select OS, hold down the CTRL key and select DATABASE, and then click .
c) Click to close the Select Data Source Items window. The Select
columns to perform a one-way frequency distribution on area
updates as displayed:
15. Select File ⇒ Save to save diagram and job metadata to this point.
16. Run the job.
b. Click the Status tab in the Details area. Note that all processes completed successfully.
Exercises
Partial Output:
b. Add source table metadata to the diagram for the process flow.
1) Select the Data Mart Development ⇒ Orion Target Data folder.
2) Drag the DIFT Customer Dimension table object to the Diagram tab of the Job Editor.
c. Add the Extract transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Data folder and locate the Extract transformation template.
3) Drag the Extract transformation to the Diagram tab of the Job Editor. Place it next to the
DIFT Customer Dimension table object.
4) Connect the DIFT Customer Dimension table object to the Extract transformation.
d. Add the target table to the process flow.
1) Right-click on the green temporary table object associated with the Extract transformation
and select Register Table.
2) Type DIFT Customers - Females 15-30 Years as the value for the Name field.
e. Select File ⇒ Save to save diagram and job metadata to this point.
f. Specify properties for the Extract transformation.
1) Right-click on the Extract transformation and select Properties.
2) Click the Where tab.
3) Construct the following expression:
7) Click .
9) Select DIFT Orion Target Tables Library as the value for the Library field.
11) Click .
12) No column metadata will be selected from existing metadata objects. Click .
a) Click .
f) Click .
k) Click .
14) Click .
15) Review the metadata listed in the finish window and click .
c. Add source table metadata to the diagram for the process flow.
1) Expand Data Mart Development ⇒ Orion Target Data.
2) Drag the DIFT Customer Dimension table object to the Diagram tab of the Job Editor.
d. Add the SQL Join transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Data folder and locate the SQL Join transformation template.
3) Drag the SQL Join transformation to the Diagram tab of the Job Editor. Place it next to the
DIFT Customer Dimension table object.
4) Connect the DIFT Customer Dimension table object to the SQL Join transformation.
5) Right-click on the SQL Join transformation and select Ports Delete Input Port. The status
indicator now shows no errors.
e. Add the target table to the process flow.
1) Right-click on the green icon (output table icon) for the SQL Join transformation and select
Replace.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Click the DIFT Control Table Gender Age Groups table object.
4) Click .
GdrAgeGrp compress(put(customer_gender,$gender.)||
tranwrd(customer_age_group,"-","To"))
7) Click to return to the Job Editor. Note that the status indicator associated with the
SQL Join transformation now shows no errors.
g. Select File ⇒ Save to save diagram and job metadata to this point.
h. Run the job to generate the control table.
1) Right-click in background of the job and select Run.
2) Verify that the job runs successfully.
3) Click the Log tab and verify that the control table is created with 8 observations and 3 variables.
i. Create a table object template that will be used to generate the individual gender-age group tables.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Verify that the Loop Transforms folder is selected.
4) Select File ⇒ New ⇒ Table.
5) Type DIFT Table Template for Gender Age Group Table as the value for the
Name field.
7) Click .
9) Select DIFT Orion Target Tables Library as the value for the Library field.
11) Click .
15) Click .
16) Click .
17) Review the metadata listed in the finish window and click .
j. Define the parameterized job metadata object to load the holding table.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Verify that the Loop Transforms folder is selected.
4) Select File ⇒ New ⇒ Job. The New Job window opens.
5) Type DIFT Parameterized Job for Gender Age Group Tables as the value for the Name field.
8) Add source table metadata to the diagram for the process flow.
a) Expand Data Mart Development ⇒ Orion Target Data.
b) Drag the DIFT Customer Dimension table object to the Diagram tab of the Job Editor.
9) Add the Extract transformation to the process flow.
a) Click the Transformations tab.
b) Expand the Data folder and locate the Extract transformation template.
c) Drag the Extract transformation to the Diagram tab of the Job Editor. Place it next to the
DIFT Customer Dimension table object.
d) Connect the DIFT Customer Dimension table object to the Extract transformation.
10) Add the target table to the process flow.
a) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
b) Drag DIFT Table Template for Gender Age Group Tables table object to the Diagram
tab of the Job Editor.
c) Right-click on the output table object (green icon) for the Extract transformation and
select Delete.
d) Connect the Extract transformation to the DIFT Table Template for Gender Age Group
Tables table object.
11) Select File ⇒ Save to save diagram and job metadata to this point.
12) Specify properties for the Extract transformation.
a) Right-click on the Extract transformation and select Properties.
b) Select Where tab.
c) In the bottom portion of the Where tab, click the Data Sources tab.
d) Expand CustDim table.
e) Select Customer_Gender.
f) Click .
g) In Expression Text area, type ="&genval" AND (that is, an equals sign, the text
&genval, a space, the text AND, and another space).
h) On the Data Sources tab, double-click Customer_Age_Group to add this to the
Expression Text area.
13) Select File ⇒ Save to save diagram and job metadata to this point.
14) Define job parameters.
a) Right-click in the background of the job and select Properties.
b) Click the Parameters tab.
c) Click .
e) Type Gender Value as the value for the Displayed text field.
i) Click .
k) Type Age Group Value as the value for the Displayed text field.
o) Click .
q) Type Gender Age Group Value as the value for the Displayed text field.
15) Select File ⇒ Save to save diagram and job metadata to this point.
16) Run the job.
a) Right-click in background of the job and select Run.
b) Click the Status tab in the Details area. Note that all processes completed successfully.
d) View the Log for the executed Job. Specifically, locate the note about the
FEMALE15TO30YEARS_CUSTOMERS table.
l. Add control table metadata to the diagram for the process flow.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Drag the DIFT Control Table Gender Age Groups table object to the Diagram tab of the
Job Editor.
m. Add the Loop transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Control folder and locate the Loop transformation template.
3) Drag the Loop transformation to the Diagram tab of the Job Editor.
4) Connect the DIFT Control Table - Gender-Age Groups table object as input to the Loop
transformation.
n. Add the parameterized job to the process flow.
1) Click the Folders tab.
2) Expand Data Mart Development ⇒ Orion Reports ⇒ Loop Transforms.
3) Drag the DIFT Parameterized Job for Gender Age Group Tables job to the Diagram tab of
the Job Editor.
3) For the Gender Value parameter, select GenVal as the value for the Mapped Source
Column.
4) For the Age Group Value parameter, select AgeGroup as the value for the Mapped
Source Column.
5) For the Gender Age Group Value parameter, select GdrAgeGrp as the value for the
Mapped Source Column.
q. Select File ⇒ Save to save diagram and job metadata to this point.
r. Run the job.
1) Right-click in background of the job and select Run.
2) Click the Status tab in the Details area. Note that all processes completed successfully.
7) Click .
9) Select DIFT Orion Target Tables Library as the value for the Library field.
11) Click .
12) Expand the Data Mart Development ⇒ Orion Source Data folder on the Folders tab.
13) From the Orion Source Data folder, select DIFT Customer Dimension table object.
15) Click .
17) Review the metadata listed in the finish window and click .
7) Click .
9) Select DIFT Orion Target Tables Library as the value for the Library field.
11) Click .
12) Expand the Data Mart Development ⇒ Orion Source Data folder on the Folders tab.
13) From the Orion Source Data folder, select DIFT Customer Dimension table object.
15) Click .
17) Review the metadata listed in the finish window and click .
d. Add source table metadata to the diagram for the process flow.
1) Click the Data Mart Development ⇒ Orion Target Data folder on the Folders tab.
2) Drag the DIFT Customer Dimension table to the Diagram tab of the Job Editor.
e. Add the Data Validation transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Data folder and locate the Data Validation transformation template.
3) Drag the Data Validation transformation to the Diagram tab of the Job Editor.
4) Connect the DIFT Customer Dimension table object to the Data Validation transformation.
4) Click .
d) Expand the DIFT Customer Types table in the Data Mart Development
Data folder and select Customer_Type.
f) Select Change value to as the value for the Action if invalid field.
g) Type Unknown Customer Type as the value for the New Value field.
c) Verify that Abort job is set as the value for Action if missing.
h. Add a sticky note to the job. (A sticky note is a way to visually document within a job.)
2) Drag the sticky note and place it under the Data Validation transformation.
3) Double-click the sticky note to expand to add some text.
4) Type The Invalid_Customers table is populated through the
execution of the Data Validation transformation. as the text for the
sticky note.
2) Click the Status tab in the Details area. Note that all processes completed successfully.
3) When you are finished viewing the file, select File ⇒ Exit to close it.
b. Add source table metadata to the diagram for the process flow.
1) Click the Folders tab.
2) Navigate to the Data Mart Development Orion Source Data folder.
3) Drag the DIFT Catalog_Orders table object to the Diagram tab of the Job Editor.
c. Add the One-Way Frequency transformation to the process flow.
1) Click the Transformations tab.
2) Expand the Analysis folder and locate the One-Way Frequency transformation template.
3) Drag the One-Way Frequency transformation to the Diagram tab of the Job Editor.
d. Connect the DIFT Catalog_Orders table object to the One-Way Frequency transformation.
e. Select File ⇒ Save to save diagram and job metadata to this point.
f. Specify properties for the One-Way Frequency transformation.
1) Right-click on the One-Way Frequency transformation and select Properties.
2) Click the Options tab.
3) Verify that Assign columns is selected in the selection pane.
4) Establish CATALOG in the Select columns to perform a one-way frequency
distribution on area.
g. Select File ⇒ Save to save diagram and job metadata to this point.
h. Run the job.
1) Right-click in background of the job and select Run.
2) Click the Status tab in the Details area. Note that all processes completed successfully.
n. Select File ⇒ Save to save diagram and job metadata to this point.
o. Specify properties for the Apply Lookup Standardization transformation.
1) Right-click on the Apply Lookup Standardization transformation and select Properties.
2) Click the Standardizations tab.
3) For the CATALOG column, select DIFT Catalog Orders.sch.qkb as the value for the
Scheme field.
4) For the CATALOG column, select Phrase as the value for the Apply Mode field.
5) For the OS column, select Phrase as the value for the Apply Mode field.
p. Select File ⇒ Save to save diagram and job metadata to this point.
q. Specify properties for the One-Way Frequency transformation.
1) Right-click on the One-Way Frequency transformation and select Properties.
2) Click the Options tab.
3) Verify that Assign columns is selected in the selection pane.
r. Select File ⇒ Save to save diagram and job metadata to this point.
s. Run the job.
1) Right-click in background of the job and select Run.
2) Click the Status tab in the Details area. Note that all processes completed successfully.
Chapter 8 Working with Tables and the Table Loader Transformation
8.1 Basics of the Table Loader Transformation
Objectives
Discuss available table loader techniques.
Discuss reasons to use the Table Loader
transformation.
Loader Transformations
SAS Data Integration Studio provides
three specific transformations to load data.
These Loader transformations are
designed to output to permanent,
registered tables (that is, tables
that are available in the Folder or
Inventory Tree).
Loaders can do the following:
create and replace tables
maintain indexes
Table Loader
No Table Loader?
SAS Data Integration Studio data transformations can perform a simple load of the transformation's output table. The transformation will drop and then replace the table.
8.2 Load Styles of the Table Loader Transformation
Objectives
Discuss various load styles provided by the Table
Loader transformation.
Important Step
An important step in an ETL process usually involves
loading data into a permanent physical table that is
structured to match your data model. The designer or
builder of an ETL process flow must identify the type of
load that the process requires in order to:
append all source data to any previously loaded data
replace all previously loaded data with the source data
use the source data to update and add to the previously loaded data based on specific key column(s)
Load Style
In SAS Data Integration Studio, the Table Loader
transformation can be used to perform any of the three
load types (the Load style field on the Load Technique
tab).
The APPEND procedure with the FORCE option is the default. If the source is a large table and the target
is in a database that supports bulk load, PROC APPEND can take advantage of the bulk-load feature.
Consider bulk loading the data into database tables by using the optimized SAS/ACCESS engine bulk
loaders. It is recommended that you use native SAS/ACCESS engine libraries instead of ODBC libraries
or OLE DB libraries for relational database data. SAS/ACCESS engines have native access to the
databases and have superior bulk-loading capabilities.
PROC SQL with the INSERT statement performs well when the source table is small (because the overhead needed to set up bulk loading is not incurred). PROC SQL with the INSERT statement adds one row at a time to the database.
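Minimal sketches of the two techniques, with assumed table names:
   /* Default: append with the FORCE option */
   proc append base=difttgt.target_table data=work.source_table force;
   run;

   /* Alternative for small source tables: row-at-a-time insert */
   proc sql;
      insert into difttgt.target_table
      select * from work.source_table;
   quit;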
Replace                    Description
All rows using delete      Uses PROC SQL with DELETE * to remove all rows.
Entire table               Replaces the entire table using PROC DATASETS.
Simulating truncate        Uses a DATA step with SET and STOP statements to remove all rows (available only for SAS tables).
All rows using truncate    Uses PROC SQL with TRUNCATE to remove all rows (only available for some databases).
17
When Entire table is selected, the table is removed and disk space is freed. Then the table is re-created
with 0 rows. Consider using this option unless your security requirements restrict table deletion
permissions (a restriction that is commonly imposed by a database administrator on database tables).
Also, avoid this method if the table has any indexes or constraints that SAS Data Integration Studio
cannot re-create from metadata (for example, check constraints).
If available, consider using All rows using truncate. Both All rows using selections enable you to
keep all indexes and constraints intact during the load. By design, using TRUNCATE is the quickest way
to remove all rows. The DELETE * syntax also removes all rows; however, based on the database and
table settings, this choice can incur overhead that will degrade performance. The database administrator
or database documentation should be consulted for a comparison of the two techniques.
Caution: When using All rows using delete repeatedly to clear a SAS table, the size of that table should be monitored over time. All rows using delete performs only logical deletes for SAS tables; therefore, a table's physical size will grow, and the increased size can negatively affect performance.
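As a sketch of what each replace technique amounts to in code (library and table names are illustrative):

/* All rows using delete: remove every row but keep the table, */
/* its indexes, and its constraints intact.                    */
proc sql;
   delete from orion.sales_history;
quit;

/* Entire table: delete and re-create the table, which frees   */
/* the disk space.                                             */
proc datasets library=orion nolist;
   delete sales_history;
quit;

/* Simulating truncate (SAS tables only): SET copies the       */
/* column definitions, and STOP ends the step before any row   */
/* is output, leaving a zero-row table.                        */
data orion.sales_history;
   set orion.sales_history;
   stop;
run;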
When Modify by Column(s) is selected, the Match by Column(s) group box, which enables you to select columns, is enabled.
When Modify Using Index is selected, the Modify Using Index group box, which enables you to select an index, is enabled. The Modify Using Index group box also enables a check box, Return to the top of the index for duplicate values coming from the input data.
The options Modify by Column(s) and Modify Using Index have the added benefit of being able to take unmatched records and add them to the target table during the same single pass through the source table.
Of these three choices, the DATA step MODIFY with KEY= method often outperforms the other update methods in tests conducted on loading SAS tables. The DATA step MODIFY with KEY= method can also perform adequately for database tables when indexes are used.
When the SQL procedure with the WHERE or SET statements is used, performance varies. Neither of
these statements in PROC SQL requires data to be indexed or sorted, but indexing on the key column(s)
can greatly improve performance. Both of these statements use WHERE processing to match each row of
the source table with a row in the target table.
The update technique chosen should depend on the percentage of rows being updated. If the majority of
target records are being updated, the DATA step with MERGE (or UPDATE) might perform better than
the DATA step with MODIFY BY or MODIFY KEY= or PROC SQL because MERGE makes full use of
record buffers.
Performance results can be hardware and operating environment dependent, so you should consider
testing more than one technique.
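A minimal sketch of the MODIFY with KEY= technique; the master table, transaction table, and index name are illustrative, and an index named Customer_ID is assumed to exist on the master table:

data orion.customer_dim;
   set work.customer_updates;                 /* transaction rows  */
   modify orion.customer_dim key=Customer_ID; /* indexed lookup    */
   if _iorc_ = 0 then
      replace;         /* key found: update the master row         */
   else do;
      _error_ = 0;     /* clear the key-not-found condition        */
      output;          /* key not found: append the row in the     */
                       /* same pass through the source             */
   end;
run;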
Objectives
Discuss various types of keys and how to define them in SAS Data Integration Studio.
Discuss indexes and how to define them in SAS Data Integration Studio.
Discuss Table Loader options for keys and indexes.
Keys
Several transformations available in SAS Data Integration Studio, including the Table Loader transformation, can take advantage of different types of keys that can be defined for tables:
Foreign Keys
Unique Keys
Surrogate Keys
Integrity constraints preserve the consistency and correctness of stored data. They restrict the data values
that can be updated or inserted into a table. Integrity constraints can be specified at table creation time or
after data already exists in the table. In the latter situation, all data are checked to verify that they satisfy
the candidate constraint before the constraint is added to the table. Integrity constraints are enforced
automatically by the SAS System for each add, update, and delete of data to the table containing the
constraint(s). Specifying constraints is the user's responsibility.
There are five basic types of integrity constraints:
Not Null (Required Data)
Check (Validity Checking)
Unique (Uniqueness)
Primary Key (Unique and Not Null)
Foreign Key (Referential)
The first four types of constraints are referred to as "general constraints" in this document. Foreign keys
and primary keys that are referenced by one or more foreign keys are referred to as "referential
constraints". Note that a primary key alone is insufficient for referential integrity. Referential integrity
requires a primary key and a foreign key.
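A minimal sketch of adding some of these constraints to an existing SAS table with PROC DATASETS; the library, table, and column names are illustrative:

proc datasets library=orion nolist;
   modify customer_dim;
   /* Primary Key: unique and not null */
   ic create pk_cust = primary key (Customer_ID);
   /* Not Null: required data */
   ic create nn_name = not null (Customer_Name);
   /* Check: validity checking */
   ic create ck_gender =
      check (where=(Customer_Gender in ('F','M')));
quit;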
Indexes
An index is an optional file that you can create for a SAS data file that does the following:
points to observations based on the values of one or more key variables
provides direct access to specific observations
Business Scenario
The SAS data set orion.sales_history is often queried with a WHERE statement.
Partial listing of orion.sales_history (columns include Customer_ID, Order_ID, Order_Type, Product_ID, and Product_Group).
Business Scenario
You need to create three indexes on the most frequently used subsetting columns.
Index Name      Index Variables
Customer_ID     Customer_ID
Product_Group   Product_Group
SaleID          Order_ID, Product_ID
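A sketch of how these three indexes could be defined with PROC DATASETS (in the demonstrations, indexes are defined through table metadata; this code is illustrative):

proc datasets library=orion nolist;
   modify sales_history;
   index create Customer_ID;                    /* simple index    */
   index create Product_Group;                  /* simple index    */
   index create SaleID = (Order_ID Product_ID); /* composite index */
quit;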
data customer14958;
set orion.sales_history;
where Customer_ID=14958;
run;
Diagram: Without an index, the WHERE statement selects observations by reading the data sequentially; all data pages are loaded into the input SAS buffers before each observation passes through the PDV to the output buffers.
data customer14958;
set orion.sales_history;
where Customer_ID=14958;
run;
Diagram: With an index, the WHERE statement selects observations by using direct access; only the necessary data pages are loaded into the input SAS buffers.
Condition Settings
The Constraint Condition and Index Condition options that are available will depend on the load technique specified (for example, Take off and Leave as is).
General Rule
Consider removing and re-creating indexes if more than 10% of the data in the table will be reloaded.
Chapter 9 Working with Slowly Changing Dimensions
9.2 Using the SCD Type 2 Loader and Lookup Transformations ................................... 9-15
Demonstration: Populate Star Schema Tables Using the SCD Type 2 Loader with the
Surrogate Key Method................................................................................ 9-29
Objectives
Explain slowly changing dimensions.
Define keys.
List benefits of slowly changing dimensions.
Define SCD types 1, 2, and 3.
Referential integrity ensures the following:
foreign keys (in the fact table) can only have values that exist in the primary key
a primary key value cannot be deleted if it exists in a foreign key
Business Keys
Often the business key in a dimension table can function as a primary key in that table:
Customer_ID in the Customer Dimension
Product_ID in the Product Dimension
and so on
Objectives
List the functions of the SCD Type 2 transformation.
Define business keys.
Define surrogate and retained keys.
Detect and track changes.
List the functions of the Lookup transformation.
The SCD Type 2 Loader transformation tracks changes.
Business Key
The business key consists of one or more columns that identify a business entity, like a customer, a product, or an employee.
The Business Key tab is used to specify one or more columns in a target dimension table that represent the business key.
Change Detection
The business key is used as the basis for change detection. The business keys in source rows are compared to the business keys in the target.
The Detect Changes tab is used to specify one or more columns in a dimension table that are monitored for changes.
Change Detection
By default, all columns are included in change detection, except the columns that are specified on the Change Tracking, Business Key, and Generated Key tabs.
Change Tracking
The SCD Type 2 Loader provides three methods for tracking historical records:
Beginning and End Date (or datetime) values
Version number
Current record indicator
Change Tracking
The Change Tracking tab is used to specify one or more methods and associated columns in a target dimension table to be used for tracking historical records. Multiple methods can be selected.
Generated Key
The SCD Type 2 Loader generates values for a Generated Key column in the target dimension table.
The Generated Key tab is used to specify a column for the generated key values as well as a method for generating the key values.
Generated Key
The generated key:
eliminates dependencies on the source data, as the business key may be subject to redefinition, reuse, or recycling
can be used as a primary key or as part of a composite primary key
is generated at run time for each new row that is added to the target
Generated Key
The generated key can be specified as a surrogate key or a retained key.
Cust Id is the business key in this example. The surrogate key method generates a new key value for
each added row.
The retained key method with new column generates a new retained key value if the added row
represents a new business key.
If the added row represents an existing business key:
the same retained key is assigned
the old row is closed out (end date assigned or current record turned off)
the new row is opened (begin date assigned or current record turned on)
The retained key method with existing column does not generate key values.
If the added row represents an existing business key:
the old row is closed out (end date assigned or current record turned off)
the new row is opened (begin date assigned or current record turned on)
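The two methods can be contrasted with a small hypothetical entry (the key and date values here are invented for illustration). Cust Id 101 changes once; Cust Id 102 never changes:

Cust Id   Surrogate Key   Retained Key   Begin Date   End Date
101       1               1              01JAN2007    14MAY2008
101       2               1              15MAY2008    31DEC5999
102       3               2              01JAN2007    31DEC5999

The surrogate key method assigns a new value to every added row; the retained key method assigns one value per business key, so the change-tracking columns distinguish the current row from its history.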
An entry consists of all the rows (the current row and the historical rows) for a business entity represented
by a business key value.
If a source row has a business key that does not exist in the target, then that row represents a new entry.
The new row is added to the target with appropriate change-tracking values.
If a source row has the same business key as a current row in the target, and a value in a column identified
as a Change Detection column differs, then that row represents an update to an existing entry. The source
row is added to the target. The source row becomes the new current row for that entry; it receives
appropriate change-tracking values. The superseded target row can also receive new change-tracking
values (closed out).
If a source row has the same business key and content as a current row in the target, it might indicate that
the entry is being closed out. The entry is closed out if change tracking is implemented with begin and
end datetime values, and if the end datetime value in the source is older than the same value in the target.
When this is the case, the new end date is written into the target to close out the entry.
If a source row has the same business key and the same content as a current row in the target, then that
source row is ignored.
The digest column contains a concatenated encryption of values from selected columns other than the key columns. It is a character column with a length of 32, named DIGEST_VALUE. The encrypted concatenation uses the MD5 algorithm.
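A minimal sketch of how such a digest can be computed in a DATA step; the table and the monitored columns are illustrative, and the SCD Type 2 Loader generates its own version of this logic:

/* Concatenate the monitored columns with a delimiter and     */
/* store the MD5 hash as a 32-character hexadecimal string.   */
data work.cust_digest;
   set orion.customer_dim;
   length digest_value $32;
   digest_value = put(md5(catx('|', Customer_Name,
                               Customer_Country, Customer_Type)),
                      $hex32.);
run;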
Lookup Transformation
The Lookup transformation can be used to load a target table with columns taken from a source and from a number of lookup tables.
When a job containing a Lookup transformation is run, each source row is processed as follows:
The key columns in the source row are compared to the key columns in the specified lookup tables.
If matches are found, specified lookup values and source values are added to the target row in the transformation's temporary output table.
The temporary output table rows are then loaded into the target.
Lookups Tab
The Lookups tab in the Lookup transformation is used to specify lookup properties:
Source to Lookup columns
Lookup to Target columns
Where expression
Exceptions
Error Tables
The Errors tab in the properties of the Lookup transformation is used to specify the following:
Errors Table: includes a generated column (source row number) and any column from the source data.
Exception Table: includes four generated columns (exception information) and any column from the source data.
a. Select Start All Programs SAS SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
1. Register the Source Data Template data set. The Source Data Template data set is a SAS data set that
was prepared with 29 column definitions for use in this workshop. The data set has no rows, so it
stores no data. It serves only as a repository for column definitions.
a. Select the Folders tab.
b. Expand Data Mart Development Orion SCD.
c. Verify that the Orion SCD folder is selected.
d. Select File Register Tables. The Register Tables wizard starts.
e. Select SAS as the type of table and click . The Select a SAS Library window is
displayed.
f. Select the DIFT SAS Library from the SAS Library drop-down list and click .
j. Type DIFT SCD Source Data Template in the Name field and click .
e. Type DIFT SCD Source Data as the value for the Name field.
2) Navigate to S:\Workshop\dift\data.
4) Select OrionSourceDataM01.csv.
The file has 7 records. Note that the first record has column names and that the data fields are
comma-delimited.
l. Click . The Column Definitions window is displayed. Increase the size of the window
by dragging the corners.
n. A dialog window indicates that there are only 7 records. Click to close the Warning
dialog window.
o. Import column definitions from a template data set.
2) Select Get the column definitions from other existing tables or external files.
6) Select the DIFT SCD Source Data Template table and click .
7) 29 columns from the DIFT SCD Source Data Template table are selected.
p. Definitions for 29 columns are imported. These column definitions will be forward propagated
through the process flow to the target tables.
r. Click .
e. Type DIFT SCD Customer Dimension as the value for the Name field.
f. Verify that the location is set to /Data Mart Development/ Orion SCD.
g. Click .
1) Click next to the Library field. The New Library Wizard opens.
2) Type DIFT SCD Target Tables Library as the value for the Name field.
3) Verify that the location is set to /Data Mart Development/ Orion SCD.
4) Click .
7) Click .
10) Click .
j. Change the default name of the new table. Type SCDCustDim in the Name field.
l. Do not select columns at this time. You propagate columns from sources to the targets in a later
step. Click .
n. Click .
p. Click .
f. Verify that the location is set to /Data Mart Development/ Orion SCD.
g. Click .
i. Select DIFT SCD Target Tables Library as the value for the Library field.
l. Do not select columns at this time. You propagate columns from the source to the targets in a later
step.
o. Click .
q. Click .
f. Verify that the location is set to /Data Mart Development/ Orion SCD.
g. Click .
i. Select DIFT SCD Target Tables Library as the value for the Library field.
l. Do not select columns at this time. You propagate columns from the source to the targets in a later
step.
o. Click .
q. Click .
1. Orient the process flow from top to bottom and turn off automatic column propagation.
e. Type Populate Star Schema with SCD-RK Processing as the value for the Name
field.
4. Add the source table to the diagram for the process flow.
b. Drag the DIFT SCD Source Data external file to the Diagram tab of the Job Editor.
5. Add a File Reader transformation to the process flow.
b. Expand the Access group and locate the File Reader transformation template.
c. Drag the File Reader transformation to the Diagram tab of the Job Editor. Place it under the DIFT SCD Source Data external file object.
d. Connect the DIFT SCD Source Data external file object to the File Reader transformation.
b. Expand the Data group and locate the Splitter transformation template.
c. Drag the Splitter transformation to the Diagram tab of the Job Editor. Place it under the File
Reader transformation.
b. Expand the Data group and locate the Sort transformation template.
c. Drag the Sort transformation to the Diagram tab of the Job Editor.
d. Connect one of the temporary table outputs from the Splitter to one of the Sort transformations.
e. Drag a second Sort transformation to the Diagram tab of the Job Editor.
f. Connect the second temporary table output from the Splitter to the other Sort transformation.
8. Select File Save to save diagram and job metadata to this point.
9. Add two SCD Type 2 Loader transformations to the process flow.
b. Expand the Data group and locate the SCD Type 2 Loader transformation template.
c. Drag the SCD Type 2 Loader transformation to the Diagram tab of the Job Editor.
d. Connect the temporary table output from one Sort transformation to one of the SCD Type 2
Loader transformations.
e. Drag a second SCD Type 2 Loader transformation to the Diagram tab of the Job Editor.
f. Connect the temporary table output from the second Sort transformation to the other SCD Type 2
Loader transformation.
b. Drag the DIFT SCD Customer Dimension table object to the Diagram tab of the Job Editor,
placing it under one of the SCD Type 2 Loader transformations.
c. Connect the SCD Type 2 Loader transformation to the DIFT SCD Customer Dimension table
object.
d. Drag the DIFT SCD Product Dimension table object (from the Checkouts tab) to the Diagram tab of the Job Editor, placing it under the other SCD Type 2 Loader transformation.
e. Connect the SCD Type 2 Loader transformation to the DIFT SCD Product Dimension table
object.
11. Select File Save to save diagram and job metadata to this point.
12. Add a third output table to the Splitter by right-clicking on the Splitter transformation and selecting
Add Work Table.
b. Expand the Data group and locate the Lookup transformation template.
c. Drag the Lookup transformation to the Diagram tab of the Job Editor.
d. Connect the third temporary output table from the Splitter transformation to the Lookup
transformation.
e. Next connect the DIFT SCD Customer Dimension table object to the Lookup transformation.
f. Add a third input port to the Lookup transformation by right-clicking on the Lookup
transformation and selecting Ports Add Input Port.
g. Connect the DIFT SCD Product Dimension table object to the Lookup transformation.
b. Expand the Access group and locate the Table Loader transformation template.
c. Drag the Table Loader transformation to the Diagram tab of the Job Editor.
15. Add the DIFT SCD Order Fact table as the final output for this process flow.
b. Drag the DIFT SCD Order Fact table object to the Diagram tab of the Job Editor, placing it under the Table Loader transformation.
c. Connect the Table Loader transformation to the DIFT SCD Order Fact table object.
1. There are currently no columns defined in the temporary output tables or the target tables. You
manually propagate columns forward starting at the source table.
2. Define columns for the temporary output table from the File Reader.
a. Right-click on the File Reader and select Properties.
b. Select the Mappings tab. The 29 columns in the DIFT SCD Source Data table are listed in
the Source table pane on the left. Propagate all 29 columns to the Target table pane.
All 29 columns are propagated forward from the DIFT SCD Source Data table to the
temporary output table of the File Reader transformation.
e. Right-click on the second temporary output table (leading to the DIFT SCD Product
Dimension table) from the Splitter and select Properties.
h. Right-click on the first temporary output table (leading to the Lookup transformation and the
DIFT SCD Order Fact table) from the Splitter and select Properties.
i. Select the Physical Storage tab.
4. Define columns for the temporary output tables from the Splitter.
a. Right-click on the Splitter and select Properties.
b. Select the General tab.
d. This is a reminder that the splitter is used to direct different columns from the data source to the
three target tables.
e. Select the Row Selection tab.
f. Verify that All Rows are selected for each of the three output tables.
g. Select the Mappings tab. The 29 columns in the temporary output table from the File Reader are
listed in the Source table pane on the left. Propagate only the necessary columns to each output
table.
h. Select the Splitter 0 (TempForCustDim) table from the Target table drop-down list.
i. Select columns 1 through 11 in the Source table pane. Use the Shift key to make the selection.
The selected columns should include only:
Customer_ID
Customer_Country
Customer_Gender
Customer_Name
Customer_FirstName
Customer_LastName
Customer_Birth_Date
Customer_Type
Customer_Group
Customer_Age
Customer_Age_Group
l. Select columns 12 through 19 in the Source table pane. Use the Shift key to make the selection.
The following 8 columns should be selected.
Product_ID
Product_Name
Supplier_ID
Supplier_Name
Supplier_Country
Product_Group
Product_Category
Product_Line
o. Select columns 20 through 29 in the Source table pane. Use the Shift key to make the selection. The 10 selected columns are these:
Order_ID
Order_Item_Num
Quantity
Total_Retail_Price
CostPrice_Per_Unit
Discount
Order_Type
Employee_ID
Order_Date
Delivery_Date
q. Select Customer_Id (column 1) and while holding the CTRL key, select Product_Id (column 12).
The two selected columns are these:
Customer_ID
Product_ID
5. Define columns for the temporary output tables from the Sort transformations.
a. Right-click on the first Sort transformation and select Properties.
b. Select the General tab.
This is a reminder that the sort transformation will be used to remove records with duplicate
Customer Id values.
This is a reminder that this sort transformation will be used to remove records with duplicate
Customer Id values.
e. Right-click on the DIFT SCD Customer Dimension table and select Properties.
f. Select the Columns tab and verify that it has the 11 propagated columns.
g. Add 4 new columns.
h. Select the Customer_ID column.
n. Right-click on the DIFT SCD Product Dimension table and select Properties.
o. Select the Columns tab and verify that it has the 8 propagated columns.
7. Add metadata for 4 new columns.
a. Select the Product_ID column.
8. Define columns for the temporary output table from the Lookup transformation.
a. Right-click on the Lookup transformation and select Properties.
b. Select the Mappings tab.
16 columns are defined for the temporary output table from the Lookup transformation.
f. Select the Columns tab and verify that it has the 16 propagated columns.
e. In the Remove duplicate records area, select Remove rows with duplicate keys
(NODUPKEY).
l. In the Remove duplicate records area select Remove rows with duplicate keys (NODUPKEY).
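A sketch of the kind of code this setting produces; the data set names are illustrative:

proc sort data=work.TempForCustDim
          out=work.CustDimSorted nodupkey;
   by Customer_ID;   /* keep only the first row per key value */
run;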
4. Update the properties of the first SCD Type 2 Loader transformation that will populate DIFT SCD
Customer Dimension. Apply the Retained Key method.
a. Right-click on the SCD Type 2 transformation (for DIFT SCD Customer Dimension
table) and select Properties.
b. Click the Change Tracking tab. Use beginning and end dates to track changes for each
Customer ID and use a current record indicator to keep track of the current record for each
Customer ID.
SAS Data Integration Studio provides default expressions for the Change Tracking columns. The DATETIME function is used to generate the beginning datetime value. A datetime constant with a future date is used to specify an open-ended value for the ending datetime. You can click in the Expression field to specify a custom expression.
Use Version Number and Use Current Indicator are provided as alternative methods to
track the current record.
c. Click the Business Key tab. Specify Customer_ID as the business key in the Customer
Dimension.
2) Select Customer_ID.
d. Click the Generated Key tab. Specify GenKeyCust as the retained key column.
1) Select GenKeyCust as the column to contain the generated key values.
2) Check Generate retained key to implement the retained key method.
A default expression is provided to generate retained key values in the New record
field. To specify a custom expression, click .
To implement the surrogate key method, uncheck Generate retained key. The
surrogate key method provides default expressions for new and changed records.
e. Click the Detect Changes tab. Specify Customer_Name as the column on which changes are
based.
If no columns are selected on the Detect Changes tab, then all columns are used to
detect changes, except those used for Change Tracking, Business Key, Generated
Key, and Type 1 columns.
h. Check Postcode.
5. Update the properties for the second SCD Type 2 Loader transformation that will populate DIFT
SCD Product Dimension.
a. Right-click on the SCD Type 2 Loader transformation (for DIFT SCD Product
Dimension table) and select Properties.
2) Select Product_ID.
h. Check Postcode.
6. Select File Save to save diagram and job metadata to this point.
7. Specify properties for the Lookup transformation that will populate DIFT SCD Order Fact.
b. Select the Lookups tab. Specify lookup mappings to the DIFT Customer Dimension table.
1) Select the row for DIFT SCD Customer Dimension and click Lookup Properties.
2) In the Lookup Properties - DIFT SCD Customer Dimension window, select the Source to
Lookup Mapping tab.
3) Click on the Customer_ID column in the Source table pane to select it and click on the
Customer_ID column in the Lookup table pane.
5) In the Lookup Properties - DIFT SCD Customer Dimension window, select the Lookup to
Target Mapping tab.
6) Click on the GenKeyCust column in the Lookup table pane to select it, and click on the GenKeyCust column in the Target table pane.
7) Click to map the selected columns.
8) Click on the BeginDateTimeCust column in the Lookup table pane to select it, and click on the BeginDateTimeCust column in the Target table pane.
The Lookup will retrieve the GenKeyCust and BeginDateTimeCust values from the
Customer Dimension table and assign them to the GenKeyCust and
BeginDateTimeCust columns in the target table. This links the transaction in the target
table to the current record for the customer ID in the Customer Dimension table.
10) In the Lookup Properties - DIFT SCD Customer Dimension window, select the Where tab.
11) Click the Data Sources tab.
12) Expand the SCDCustDim table.
13) Select the CurrRecCust column and click to insert the column reference in the
Expression Text pane.
15) In the Lookup Properties - DIFT SCD Customer Dimension window, select the Exceptions
tab.
16) Click to close the Lookup Properties - DIFT SCD Customer Dimension
window.
1) In the Lookup Properties window, select the row for DIFT SCD Product Dimension and
click Lookup Properties.
2) In the Lookup Properties - DIFT SCD Product Dimension window, select the Source to
Lookup Mapping tab.
3) Click on the Product_ID column in the Source table pane to select it, and click on the
Product_ID column in the Lookup table pane.
The Lookup transformation will use the Product_Id value in an incoming transaction to
do a lookup into the Product Dimension table to find a matching product id value.
5) In the Lookup Properties - DIFT SCD Product Dimension window, select the Lookup to
Target Mapping tab.
6) Click on the GenKeyProd column in the Lookup table pane to select it, and click on the GenKeyProd column in the Target table pane.
7) Click to map the selected columns.
8) Click on the BeginDateTimeProd column in the Lookup table pane to select it, and click on the BeginDateTimeProd column in the Target table pane.
The lookup will retrieve the GenKeyProd and BeginDateTimeProd values from the Product Dimension table and assign them to the GenKeyProd and BeginDateTimeProd columns in the target table. This links the transaction in the target table to the current record for the product ID in the Product Dimension table.
10) In the Lookup Properties - DIFT SCD Product Dimension window, select the Where tab.
11) Click the Data Sources tab.
12) Expand the SCDProdDim table.
13) Select the CurrRecProd column and click to insert the column reference in the
Expression Text pane.
15) In the Lookup Properties - DIFT SCD Product Dimension window, select the Exceptions tab.
16) Click to close the Lookup Properties - DIFT SCD Product Dimension window.
2) All source columns and a generated column are selected by default. Remove all columns
except Source Row Number, Order_ID, Customer_ID, and Product_ID from the
Selected columns pane.
5) Four generated columns and no source columns are selected by default. Accept the default
column selection.
These mappings for the Lookup transformation were established in an earlier step.
8. Select File Save to save diagram and job metadata to this point.
9. Specify properties for the Table Loader transformation that will populate DIFT SCD Order
Fact.
10. Select File Save to save diagram and job metadata to this point.
Update the Processing Order, Run the Job, and View the Results
1. If necessary, select View Detail to open the Details panel. It opens below the Diagram Editor.
2. In the Details panel, select the Control Flow tab.
a. Use the and arrows to arrange the transformations in the following order:
b. Verify the order of processing on the Diagram tab in the Job Editor window.
The number in the upper-left corner of the transformation indicates the order of processing.
3. Run the job.
a. Select the Status tab in the Details pane to monitor the execution of the job.
b. Click to run the job.
e. Select the Status tab in the Details pane to monitor the execution of the job.
f. Click to run the job.
a. Right-click on the DIFT SCD Order Fact table and select Open. The data is displayed in the
View Table window. Scroll to see the right-most columns:
e. Select the Status tab in the Details pane to monitor the execution of the job.
f. Click to run the job.
b. Right-click on the DIFT SCD Order Fact table and select Open. The data is displayed in the View Table window. Scroll to see the right-most columns:
Objectives
Define change data capture (CDC).
List the types of CDC transformations.
List functions of the CDC transformations.
CDC Transformations
SAS Data Integration Studio provides four CDC transformations:
Attunity CDC
DB2 CDC
Oracle CDC
General CDC
The separately licensed Attunity software enables you to generate source change tables from a variety of
relational databases running in a variety of operational environments.
Prerequisites (continued)
Oracle CDC: The Oracle CDC transformation has been validated on Oracle 10G with asynchronous CDC. The transformation requires that you license SAS/ACCESS to Oracle.
DB2 CDC: The DB2 CDC transformation has been validated on DB2/UDB, release 8.1, fixpak 3. The transformation requires that you license SAS/ACCESS to DB2.
General CDC: The General CDC transformation has no prerequisites.
Chapter 10 Defining Generated Transformations
Objectives
Define SAS code transformation templates.
Explain the prompting framework.
Describe the different types of prompts that make
up the prompting framework.
Transformation Templates
The Process Library tree contains two kinds of transformation templates:
Java Plug-In Transformation Templates: created with the Java programming language
SAS Code Transformation Templates: created with the Transformation Generator wizard
The pop-up menu for a Java plug-in transformation includes Analyze and Export items.
SAS code transformation templates can be used, for example, to transform data and create reports.
options &options;
title "&title";
proc gchart data=&syslast;
   vbar &classvar1 /
      sumvar=&analysisvar
      group=&classvar2;
run;
quit;

The generated code uses the macro variables &classvar1, &classvar2, &analysisvar, and &title, which are resolved from the values supplied in the transformation's prompts; &syslast resolves to the most recently created data set.
%let syslast=yy.xx;
%let options=;
%let classvar1=Customer_Age_Group;
%let classvar2=Customer_Gender;
%let analysisvar=Quantity;
%let title=Sum of Quantity across Gender and Age Group;
Options Window
The New Transformation wizard's Options window is the facility for creating the options to be used for the new transformation.
Objectives
Create a custom transformation.
This demonstration creates a report on customer order information. The HTML report must have a text-based output with summary statistics as well as a bar chart graphic. A transformation with this type of output does not currently exist. Hence, a new SAS code transformation is created and then used in a job.
1. If necessary, access SAS Data Integration Studio using Barbara's credentials.
a. Select Start All Programs SAS SAS Data Integration Studio 4.2.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Barbara as the value for the User ID field and Student1 as the value for the
Password field.
e. Type Summary Table and Vertical Bar Chart as the value for the Name field.
g. Type User Defined as the value for the Transformation Category field.
The General information for the new transformation should resemble the following:
h. Click .
j. Click .
k. Click .
The Options window is available to define the options to be used in this transformation.
1) Click .
2) Type Data Items as the value for the Displayed text field.
3) Click .
4) Click .
6) Click .
7) Click .
8) Type Other Options as the value for the Displayed text field.
9) Click .
b) Click .
d) Type Column to Chart for Vertical Bar Chart as the value for the
Displayed text field.
e) Type The column selected for this option will be the charting
column for GCHART and a classification column in the row
dimension for TABULATE. as the value for the Description field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
b) Click .
d) Type Column for Grouping Charting Variable as the value for the
Displayed text field.
e) Type The column selected for this option will be the grouping
column for GCHART and a classification column in the row
dimension for TABULATE. as the value for the Description field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
b) Click .
d) Type Column to Analyze for Vertical Bar Chart as the value for the
Displayed text field.
e) Type The column selected for this option will determine the
heights of the bars for GCHART and an analysis column for
TABULATE. as the value for the Description field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
The three options in the Data Items group should resemble the following:
b) Click .
d) Type Title for Table Report as the value for the Displayed text field.
e) Type Specify some text that will be used as the title for the
TABULATE output. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
b) Click .
d) Type Title for Graph Report as the value for the Displayed text field.
e) Type Specify some text that will be used as the title for the
GCHART output. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
The two options in the Titles group should resemble the following:
b) Click .
d) Type Specify SAS system options as the value for the Displayed text
field.
e) Type Specify a space separated list of global SAS system
options. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
b) Click .
d) Type Name of HTML file to be created as the value for the Displayed
text field.
e) Type Enter the name of the HTML file that will contain the
reports generated by this transformation. Do NOT enter the
HTML file extension! as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
The two options in the Other Options group should resemble the following:
Verify that the three items in the Data Items group are all required (note the *).
The descriptions entered for each of the parameters are displayed.
Clicking opens a dialog window to navigate the SAS Folders to a data source
from which a column can be selected.
2) Click Titles in the selection pane. The two options in the Titles group are displayed:
3) Click Other Options in the selection pane. The two options in the Other Options group are
displayed:
q. Click .
The Inputs group box values add a specified number of inputs to the transformation when
it is used in a job. If you later update the transformation to increase this minimum number
of inputs value, any jobs that have been submitted and saved use the original value. The
increased minimum number of inputs is enforced only for subsequent jobs. Therefore, you
can increase the minimum number of inputs without breaking existing jobs. The Maximum
number of inputs field is used to allow you to connect additional inputs into the input port.
For example, a setting of 3 allows you to have up to three inputs. The rules for inputs also
apply to outputs.
s. Click .
t. Click .
e. Type Report and Graphic for Customer Orders as the value for the Name field.
3. Add source table metadata to the diagram for the process flow.
c. Drag the DIFT Customer Order Information table object to the Diagram tab of the Job Editor.
4. Add the Summary Table and Vertical Bar Chart transformation to the process flow.
b. Drag the Summary Table and Vertical Bar Chart transformation to the Diagram tab of the Job
Editor.
5. Connect the DIFT Customer Order Information table object to the Summary Table and Vertical
Bar Chart transformation.
6. Select File Save to save diagram and job metadata to this point.
7. Specify properties for the Summary Table and Vertical Bar Chart transformation.
a. Right-click on the Summary Table and Vertical Bar Chart transformation and select
Properties.
1) Click for the Column to Chart for Vertical Bar Chart option.
2) Select Customer Age Group in the Select a Data Source Item window.
3) Click .
6) Click .
9) Click .
1) Type NODATE NONUMBER LS=80 as the value for the Specify SAS system
options field.
f. Click .
8. Select File Save to save diagram and job metadata to this point.
d. Click .
e. In the Microsoft Internet Explorer window, click to close the information bar.
Exercises
Name: Idvariable
Displayed text: ID Variable
Description: The column used to identify obs in input and output data sets. Its values are interpreted and extrapolated according to the values of the INTERVAL= option.
Required: Yes
Prompt type: Data Source Column
Other information: Do not allow character columns; only allow 1 column as a selection.

Name: Fcastvariable
Displayed text: Column to Forecast
Description: The column from the input data set that is to be forecasted.
Required: Yes
Prompt type: Data Source Column
Other information: Do not allow character columns; only allow one column as a selection.
Name: Alpha
Displayed text: Significance Level
Description: Specify significance level for confidence intervals (default is .05).
Required: No
Prompt type: Numeric
Other information: Allow values other than integers; provide a default value of .05.

Name: Lead
Displayed text: Number of Periods to Forecast
Description: Specify the number of periods ahead to forecast (default is 6).
Required: No
Prompt type: Numeric
Other information: Provide a default value of 6.

Name: Method
Displayed text: Method to Model the Series
Description: Specify the method to use to model the series and generate the forecasts. (Default is STEPAR)
Required: No
Prompt type: Text
Other information: Provide a list of values of STEPAR, EXPO, WINTERS, ADDWINTERS. Set STEPAR as the default.
Add the following options for the Titles and Other Options group:
Name: Title
Displayed text: Title for Forecast Graphic
Description: Specify text that will be used as the title for the FORECAST output.
Required: No
Prompt type: Text

Name: Options
Displayed text: Specify SAS system options
Description: Specify a space separated list of global SAS system options.
Required: No
Prompt type: Text

Name: File
Displayed text: Name of HTML file to be created
Description: Enter the name of the HTML file that will contain the report generated by this transformation. Do NOT enter the HTML file extension!
Required: No
Prompt type: Text
Use the YYMM column as the ID column and the Profit column as the column to forecast.
View the HTML file. The output should resemble the following:
3. Check In Objects
Check in the transformation and job objects.
The inner job should be a copy of the job from Exercise 3, but modified with parameters.
4) Click .
8) Click .
%macro ForeCastGraph;
   options mprint;
   /* Submit the user-supplied SAS system options only if the */
   /* Options prompt was given a value.                       */
   %if (%quote(&options) ne) %then
      %do;
         options &options;
      %end;
%mend ForeCastGraph;
%ForeCastGraph;
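A usage sketch: when the Options prompt supplies a value, the generated %LET statement makes the conditional OPTIONS statement execute (values illustrative):

%let options=NODATE NONUMBER LS=80;
%ForeCastGraph;   /* submits: options NODATE NONUMBER LS=80; */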
g. Click .
1) Click .
2) Type Data Items as the value for the Displayed text field.
3) Click .
4) Click .
5) Type Forecast Options as the value for the Displayed text field.
6) Click .
7) Click .
8) Type Titles and Other Options as the value for the Displayed text field.
9) Click .
b) Click .
e) Type The column used to identify obs in input & output data
sets. Its values are interpreted and extrapolated according
to the values of the INTERVAL= option. as the value for the
Description field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
b) Click .
d) Type Column to Forecast as the value for the Displayed text field.
h) Select Data source column as the value for the Prompt type field.
i) Verify that Select from source is selected in the Columns to select from area.
n) Click .
b) Click .
d) Type Significance Level as the value for the Displayed text field.
g) Select Numeric as the value for the Prompt type field.
j) Click .
b) Click .
d) Type Number of Periods to Forecast as the value for the Displayed text
field.
e) Type Specify the number of periods ahead to forecast (default
is 6). as the value for the Description field.
g) Select Numeric as the value for the Prompt type field.
i) Click .
3) Define metadata for the method to model the series forecast option.
a) Select the Forecast Options group.
b) Click .
d) Type Method to Model the Series as the value for the Displayed text
field.
e) Type Specify the method to use to model the series and
generate the forecasts. (Default is STEPAR) as the value for the
Description field.
h) Select User selects values from a static list as the value for the Method for
populating prompt field.
s) Click .
k. Define metadata for the options in the Titles and Other Options group.
1) Define metadata for the title to be used with forecast graphic.
a) Select the Titles and Other Options group.
b) Click .
d) Type Title for Forecast Graphic as the value for the Displayed text
field.
e) Type Specify text that will be used as the title for the
FORECAST output. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
b) Click .
d) Type Specify SAS system options as the value for the Displayed text
field.
e) Type Specify a space separated list of global SAS system
options. as the value for the Description field.
g) Verify that Text is specified as the value for the Prompt type field.
i) Click .
b) Click .
d) Type Name of HTML file to be created as the value for the Displayed
text field.
e) Type Enter the name of the HTML file that will contain the
report generated by this transformation. Do NOT enter the
HTML file extension! as the value for the Description field.
h) Verify that Text is specified as the value for the Prompt type field.
j) Click .
The three options in the Titles and Other Options group should resemble the following:
2) Click Forecast Options in the selection pane. The three options in the Forecast Options
group are displayed.
3) Verify that all fields have default values and that four values are available for selection for the
Method to Model the Series.
4) Click Titles and Other Options in the selection pane. The three options in the Titles and
Other Options group are displayed.
m. Click .
o. Click .
p. Click .
h. Select File Save to save diagram and job metadata to this point.
i. Specify properties for the Extract transformation.
1) Right-click on the Extract transformation and select Properties.
2) Select the Where tab.
3) Type Company = "Orion Australia" as the value for the Expression Text area.
j. Select File Save to save diagram and job metadata to this point.
Chapter 11 Implementing Data Quality Techniques (Self-Study)
Demonstration: Creating Jobs for Execution on DataFlux Integration Server ................... 11-20
Objectives
Define data quality.
Discuss data quality offerings from SAS.
Cleansing data will result in accurate reports.
Diagram: The Local Process group (PROC DQSCHEME, PROC DQMATCH, and 18 functions) reads definitions from the Quality Knowledge Base, which dfPower Studio also reads. The Server Process group (PROC DQSRVADM, PROC DQSRVSVC, and 8 functions) runs Profile and Architect jobs on the DataFlux Integration Server.
The language elements in the SAS Data Quality Server software can be separated into two functional
groups. As shown in the previous diagram, one group cleanses data in SAS, and the other group runs data
cleansing jobs and services on Integration Servers from DataFlux (a SAS company).
The language elements in the Local Process group read data definitions out of the Quality Knowledge
Base to, for example, create match codes, apply schemes, or parse text. The language elements in the
Server Process group start and stop jobs and services and manage log entries on DataFlux Integration
Servers.
The DataFlux Integration Servers and the related dfPower Profile and dfPower Architect applications are
made available with the SAS Data Quality Server software in various software bundles.
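As a small sketch of the Local Process group, the following PROC DQMATCH step creates match codes for a name column. The input table, column names, and QKB setup location are illustrative, and a Quality Knowledge Base locale must be loaded first.

/* Load a locale from the Quality Knowledge Base; the setup   */
/* location is site-specific (the path shown is illustrative).*/
%dqload(dqlocale=(ENUSA), dqsetuploc='C:\qkb\dqsetup.txt')

/* Create match codes for the Contact column by using the QKB */
/* Name match definition.                                     */
proc dqmatch data=work.contacts out=work.contacts_mc;
   criteria matchdef='Name' var=Contact matchcode=Contact_MatchCd;
run;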
All DataFlux jobs and real-time services run on DataFlux Integration Servers. To execute
DataFlux jobs and services from SAS Data Integration Studio jobs, you must first install a
DataFlux Integration Server and register that server in SAS metadata.
Objectives
Discuss the DataFlux Integration Server.
Batch jobs can be run on a server-grade machine, meaning the process is more scalable to larger data sources. Server-class machines supported by DataFlux Integration Server include the following:
Windows
UNIX (AIX, HP-UX, Solaris, and Linux)
The data cleansing processes, available as real-time services via a Service Oriented Architecture (SOA), are available to any Web-based application that can consume services (Web applications, ERP systems, operational systems, SAS, and more).
The data cleansing jobs and services registered to the DataFlux Integration Server are available (via procedures and functions) from within SAS. This gives the user the full power of dfPower Architect and dfPower Profile functionality from within SAS.
Real-Time Services
In addition, existing batch jobs can be converted to real-time services that can be invoked by any application that is Web service enabled. This provides users with the ability to reuse the business logic developed when building batch jobs for data migration or loading a data warehouse, and apply it at the point of data entry to ensure consistent, accurate, and reliable data across the enterprise.
1. Select Start All Programs DataFlux Integration Server 8.1 Integration Server Manager.
To configure a remote DataFlux Integration Server, specify a valid Server name and Server
port for the remote server.
The DataFlux Integration Server Manager window now displays the machine/port specified.
This demonstration illustrates the creation of a dfPower Profile job, how to run the job, and how to review
the generated metrics.
1. From within SAS Data Integration Studio, select Tools dfPower Tool dfPower Profile (Configurator).
2. In the data sources area, click on to expand the DataFlux Sample database.
3. Click the Contacts table. The right side of the window populates with a listing of columns found in
the Contacts table.
4. Click to the left of the Contacts table. This selects all the fields of the table to be part of the
Profile job.
b. Click to the left of Select/unselect all to select all the Column profiling metrics.
c. Click to save the metric selections and close the Metrics window.
e. Click to save the metric selections and close the Metrics window.
The ADDRESS field has a under the M field to identify that this column has metric overrides
specified.
e. Click to save the metric selections and close the Metrics window.
e. Click to save the metric selections and close the Metrics window.
e. Click to save the metric selections and close the Metrics window.
e. Click to save the metric selections and close the Metrics window.
Once the desired metrics are specified, the profile job is ready to run and produce the profile
report.
c. Type Profile fields for Contacts table as the value for the Description field.
d. Click .
The name is now displayed on the title bar for dfPower Profile (Configurator).
1. From within SAS Data Integration Studio, select Tools dfPower Tool dfPower Architect.
b. Click next to Data Inputs to expand the Data Inputs category of nodes.
c. Click the Data Source node, and then select Insert Node On Page. (Alternatively, you can double-click on the node and it will be automatically appended to the job flow, or you can drag and drop the node onto the job flow and perform a manual connection.)
The node is added to the job flow and a Data Source Properties window is opened.
d. In the Data Source Properties window, click next to the Input table field. The Select
Table window opens.
e. Click to expand the DataFlux Sample database.
f. Click the Contacts table.
g. Click .
The Contacts table is now listed as the value for the Input table field. The fields found in
the Contacts table are now listed as Available.
i. Click .
3. Add a Standardization node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Data Inputs category of nodes.
c. Click the Standardization node, and then select Insert Node ⇒ Auto Append. (Alternatively,
you can double-click the node and it will be automatically appended to the job flow, or you can
drag and drop the node onto the job flow and perform a manual connection.)
d. In the Standardization Properties window, move the ADDRESS and STATE fields from Available
to Selected.
1) In the Standardization fields area, click ADDRESS field in the Available list.
e. Specify the appropriate Definition and/or Scheme for the two selected columns.
1) Click in the Definition field for the ADDRESS field to allow selection of a valid
standardization definition.
3) Click in the Definition field for the STATE field to allow selection of a valid
standardization definition.
5) Verify that the default names given to the output fields are ADDRESS_Stnd and
STATE_Stnd.
1) Click .
3) Remove DATE, MATCH_CD, and DELETE_FLG from the Output fields list.
d) Click to move the selected fields from the Output fields list.
c. Verify that the ADDRESS and STATE field values are standardized by checking for the following:
The state values for the first two records originally were state names spelled out; the
STATE_Stnd field now has these values as abbreviations.
The address value for the first record has the word Street; the ADDRESS_Stnd field has St.
The address value for the second record has the word Road; the ADDRESS_Stnd field has
Rd.
Some of the original address values are all uppercase; the ADDRESS_Stnd field has these
values proper-cased.
5. Add an Identification Analysis node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Data Inputs category of nodes.
d. In the Identification Analysis Properties window, move the CONTACT field from Available to
Selected.
g. Verify that the default name given to the output column is CONTACT_Identity.
h. Click .
For the first set of records in the Contacts table, the CONTACT field values are identified as
INDIVIDUAL.
7. Add a Branch node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Quality category of nodes.
c. Click the Branch node, and then select Insert Node ⇒ Auto Append. The Branch Properties
window is displayed.
d. Click to accept the default settings and close the Branch Properties window.
8. Add a Data Validation node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Utilities category of nodes.
c. Click the Data Validation node, and then select Insert Node ⇒ Auto Append.
9. Add a Gender Analysis node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Profiling category of nodes.
1) In the Gender analysis fields area, click CONTACT field in the Available list.
g. Verify that the default name given to the output column is CONTACT_Gender.
h. Click .
a. From the Nodes tab in the Toolbox panel, click to collapse the Quality category of nodes.
d. In the Frequency Distribution Properties window, move the CONTACT_Gender field from
Available to Selected.
13. Add a second Data Validation node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, verify that the Profiling category of nodes is expanded.
b. Click the Data Validation node, and then select Insert Node ⇒ On Page.
d. Click on the new Data Validation node and drag it so that it is next to the first Data Validation
node.
e. Click on the Branch node and (without releasing the mouse button) drag the cursor to the second
Data Validation node. Release the mouse button.
15. Add a Text File Output node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Profiling category of nodes.
b. Click next to Data Outputs to expand the Data Outputs category of nodes.
1) Click next to the Output file field. The Save As window opens.
b. Click next to Data Outputs to expand the Data Outputs category of nodes.
c. Click the Frequency Distribution 1 node in the job flow.
d. Click the HTML Report node, and then select Insert Node ⇒ Auto Append.
e. In the HTML Report Properties window, specify the attributes of the file that will be created.
1) Type Frequency Counts from Gender Analysis as the value for the Report
title field.
a) Type Contacts Gender Frequencies as the value for the Name field.
b) Click .
The final settings for the HTML Report Properties window should resemble the following:
c. Click .
This demonstration illustrates the creation of a dfPower Architect job using the External Data Provider.
This job will be uploaded to the DataFlux Integration Server and then processed with the DataFlux IS
Service transformation in SAS Data Integration Studio.
1. From within SAS Data Integration Studio, select Tools ⇒ dfPower Tool ⇒ dfPower Architect.
2. Add an External Data Provider node to the job flow.
a. Locate the Nodes tab in the Toolbox panel.
b. Click next to Data Inputs to expand the Data Inputs category of nodes.
c. Click the External Data Provider node, and then select Insert Node ⇒ On Page.
(Alternatively, you can double-click the node and it will be automatically added to the job flow.)
The node is added to the job flow and an External Data Provider window is opened.
e. Select the first generic field (Field) and change the value of the Field Name field to
Field_1.
f. Type 20 as the value for the Field Length field for Field_3.
g. Type 25 as the value for the Field Length field for Field_4.
h. Click .
4. Add a Basic Statistics node to the job flow, and specify appropriate properties for it.
a. From the Nodes tab in the Toolbox panel, click to collapse the Data Inputs category of nodes.
c. Click the Basic Statistics node, and then select Insert Node ⇒ Auto Append.
e. Click .
c. Click .
1. Select Start ⇒ All Programs ⇒ DataFlux Integration Server 8.1 ⇒ Integration Server Manager.
8. (Optional) Select Actions ⇒ Upload. The Upload Architect Jobs window is displayed.
10. (Optional) Click to move the Contacts Table Analysis job to the Selected list.
13. Select Actions ⇒ Upload. The Upload Real-time Services window is displayed.
15. Click to move the LWDIWN EDP Basic Stats job to the Selected list.
b. Click to close the Connection Profile window and access the Log On window.
c. Type Ahmed as the value for the User ID field and Student1 as the value for the
Password field.
7. Click .
9. Click .
2) Type Default Base Path as the value for the Description field.
3) Click .
11. Click .
b. Select DefaultAuth.
d. Verify the value for the Port number field is set to 21036.
The final settings for connection properties should resemble the following:
13. Click .
15. Click .
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the Password
field.
3. Create initial job metadata that will use the DataFlux IS Job transformation.
a. Right-click the DataFlux IS Examples folder and select New ⇒ Job.
b. Type LWDIWN - Run Profile and Architect Jobs as the value for the Name field.
c. Verify that /Data Mart Development/DataFlux IS Examples is the value for the Location
field.
d. Click .
The Contact Profile.pfi job file should appear in the Job field.
k. Click .
The diagram tab of the Job Editor window now displays the following:
d. (Optional) Right-click on the second DataFlux IS Job transformation and select Properties.
e. (Optional) Type Architect Reports at the end of the default value for the Name field.
g. (Optional) Verify that Architect is the value for the Job type field.
h. (Optional) Verify that Contacts Table Analysis.dmc is the value for the Job field.
i. (Optional) Click .
The diagram tab of the Job Editor window now displays the following:
7. Run the job by clicking in the tool set of the Job Editor window.
For DataFlux IS Job transformations, "completed successfully" simply means that the process was
passed off successfully to the DataFlux Integration Server.
8. Select File ⇒ Close to close the Job Editor window.
9. Access DataFlux Integration Server Manager and verify the jobs ran successfully.
a. Select Start ⇒ All Programs ⇒ DataFlux Integration Server 8.1 ⇒
Integration Server Manager.
b. Verify that both the Profile job and the Architect job completed. The bottom portion of the
DataFlux Integration Server Manager displays this information on the Status of All Jobs tab.
g. Navigate to S:\Workshop\lwdiwn.
h. Select All Files (*.*) as the value for the Files of type field.
j. Click . The path and filename update in the Import File window.
k. Click .
The Contacts Profile report is imported to the DataFlux Default management resources location.
l. Double-click the Contacts Profile report and dfPower Profile (Viewer) is invoked with the profile
report.
c. (Optional) When done viewing, select File ⇒ Close to close the browser.
12. (Optional) View the generated text file output.
a. (Optional) Open Windows Explorer by selecting Start ⇒ All Programs ⇒ Accessories ⇒
Windows Explorer.
b. (Optional) Navigate to S:\Workshop\lwdiwn.
c. (Optional) Double-click the Unknown_Identities.txt file. The file opens in a Notepad window as
displayed.
d. (Optional) When done viewing, select File ⇒ Exit to close the Notepad window.
The job being created will use the previously registered DataFlux Integration Server service. Before
taking advantage of the DataFlux IS Service transformation, a table needs to be defined in metadata (this
table will be the source data for the DataFlux IS Service transformation).
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
d. Click .
e. The needed SAS Library does not exist in metadata. Click to invoke the New Library
wizard.
1) Type DIWN DataFlux Sample Database as the value for the Name field.
2) Click .
3) Double-click SASApp to move it from the Available servers list to the Selected
servers list.
4) Click .
6) Click .
7) The needed Database Server does not exist in metadata. Click to invoke the New
Server wizard.
a) Type DIWN DataFlux Sample Database Server as the value for the Name
field.
b) Click .
c) Select ODBC Microsoft Access as the value for the Data Source Type field.
d) Click .
e) Click Datasrc.
f) Type "DataFlux Sample" as the value for the Datasrc field.
g) Click .
i) Click .
8) Verify that the newly defined database server (DIWN DataFlux Sample Database Server)
appears in the Database Server field.
9) Click .
10) Review the final settings for the New Library Wizard.
11) Click .
f. Verify that the newly defined SAS library (DIWN DataFlux Sample Database) appears in the
SAS Library field. Also, the value specified for the Datasrc (DataFlux Sample)
information for the library server should appear in the Data Source field.
g. Click .
i. Click .
k. Click .
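For reference, a library registered this way corresponds roughly to a SAS/ACCESS Interface to ODBC
LIBNAME statement such as the following sketch; the libref dfsamp is a hypothetical name, and the
DATASRC value is the one entered in the wizard:

   /* assign a libref to the DataFlux Sample ODBC data source */
   libname dfsamp odbc datasrc="DataFlux Sample";

SAS Data Integration Studio generates an equivalent statement from the library metadata whenever a job
reads a table through this library.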
a. Right-click on the Contacts table (in the Data Mart Development ⇒ DataFlux IS Examples
folder on the Folders tab) and select Properties.
b. Type DIWN Contacts as the value for the Name field on the General tab.
c. Verify that /Data Mart Development/DataFlux IS Examples is the value for the Location
field.
d. Click .
i. On the target table side, change the name of the PHONE column to Field_2.
j. On the target table side, change the name of the OS column to Field_3.
k. On the target table side, change the name of the DATABASE column to Field_4.
l. Remove the remaining target columns from the target table side.
1) Click the ID column, hold down the SHIFT key, and click the CITY column.
2) Click Delete Target Columns from the tool set of the Mappings tab.
c. Right-click on the temporary table object associated with the DataFlux IS Service transformation
and select Open.
A Warning window opens:
d. Click .
The profiling metrics requested in the Architect job are displayed in the View Data window.
Exercises
Upload the new dfPower Architect job to the DataFlux Integration Server as a service.
Create a job in SAS Data Integration Studio.
Name the job LWDIWN Architect Service Exercise.
Place the job in the \Data Mart Development\DataFlux IS Examples folder.
Use the DIWN Contacts table as a source table.
Use the Extract transformation following the source table, renaming the target columns for STATE
and PHONE to Field_1 and Field_2. Remove the remaining target columns.
Add a DataFlux IS Service transformation to the job flow, connecting the output of the Extract
transformation to this new transformation. Be sure to specify the new service in the Properties
window.
Save and then run the job.
View the output to verify that the dfPower Architect job did produce basic pattern analysis results.
Chapter 12 Deploying Jobs
Objectives
Discuss the types of job deployment available
for SAS Data Integration Studio jobs:
for scheduling
as a stored process
as a Web service.
Deployment Techniques
You can also deploy a job in order to accomplish the
following tasks:
Divide a complex process flow into a set of smaller
flows that are joined together and can be executed
in a particular sequence.
Execute a job on a remote host.
Objectives
Provide an overview of the scheduling process.
Discuss the types of scheduling servers.
Discuss the Schedule Manager in SAS Management
Console.
Discuss batch servers.
Scheduling Requirements
The SAS scheduling tools enable you to automate the
scheduling and execution of SAS jobs across your
enterprise computing environment. Scheduling requires
four main components:
SAS Application
Schedule Manager
Scheduling Server
Batch Server
[Diagram: scheduling components. A flow (Flow_ABC) in the deployment directory contains events for
Job_A, Job_B, and Job_C. Batch server 1 supplies the command lines for Jobs A and B, and Batch
server 2 supplies the command line for Job C.]
Step 1: A SAS application (such as SAS Data Integration Studio) creates a job that needs to be
scheduled. If the job was created by SAS Data Integration Studio, the job is placed in a
deployment directory.
Step 2: A user set up to administer scheduling can use the Schedule Manager plug-in in SAS
Management Console to prepare the job for scheduling, or users can schedule jobs directly
from other SAS applications. The job is added to a flow, which can include other jobs and
events that must be met (such as the passage of a specific amount of time or the creation of a
specified file). The Schedule Manager also specifies which scheduling server should be used
to evaluate the conditions in the flow and which batch server should provide the command to
run each job. The type of events you can define depends on the type of scheduling server you
choose. When the Schedule Manager has defined all the conditions for the flow, the flow is
sent to the scheduling server, which retrieves the command that is needed to run each job from
the designated batch server.
Step 3: The scheduling server evaluates the conditions that are specified in the flow to determine
when to run a job. When the events specified in the flow for a job are met, the scheduling
server uses the command obtained from the appropriate batch server to run the job. If you
have set up a recurring scheduled flow, the flow remains on the scheduling server and the
events continue to be evaluated.
Step 4: The scheduling server uses the specified command to run the job in the batch server, and then
the results are sent back to the scheduling server.
Scheduling Servers
SAS supports scheduling through three types
of scheduling servers:
Platform Process Manager server
operating system scheduling server
in-process scheduling server
You can create a definition for a scheduling server by using the Server Manager plug-in in SAS
Management Console or an application that directly schedules jobs.
The Platform Process Manager server, which is part of Platform Suite for SAS, provides full-featured
enterprise scheduling capabilities, including features such as workload prioritization and policy-based
scheduling. The server enables you to schedule jobs using a variety of recurrence criteria and
dependencies on other jobs, time events, or file events. You can use the Flow Manager application (also
part of Platform Suite for SAS) to manage scheduled jobs, including deleting and stopping previously
scheduled jobs.
Because Platform Suite for SAS is a separate application, it requires an additional license fee. It also
requires you to perform additional tasks to install, configure, and maintain all components of the
application. However, the components included with the application also provide functions such as load
balancing and submission of jobs to a grid.
The metadata for a Process Manager Server includes the following information:
the network address or host name of a machine
the port number for the server
Operating system scheduling provides the ability to schedule jobs through the services provided by a
server's operating system. Operating system scheduling provides a basic level of scheduling at no
additional cost, because the service is provided by software that you already own. However, this type of
scheduling does not support advanced scheduling capabilities, such as the use of many types of
dependencies. The specific scheduling functions that are supported vary according to the operating system
used, which can make it more difficult to set up consistent scheduling criteria on several servers.
Managing scheduled jobs requires you to issue operating system commands rather than using a graphical
user interface (a sketch of such a command follows the list below). The metadata for an operating system
scheduling server includes the following:
the network address of a machine
the port number for the server
the directory on the server where scheduled flows should be stored (control directory)
the command to start a SAS session on the server
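As a hedged illustration of the kind of operating system command involved: on Windows, a deployed
job's .sas file might be scheduled with the schtasks utility. The task name, SAS installation path, and
schedule below are assumptions, not values taken from this environment:

   schtasks /create /tn "DIFT_Populate_Order_Fact_Table" ^
      /tr "\"C:\Program Files\SAS\SASFoundation\9.2\sas.exe\" -sysin S:\Workshop\dift\OrionStarJobs\DIFT_Populate_Order_Fact_Table.sas" ^
      /sc daily /st 02:00

On UNIX, the equivalent would be a cron entry that invokes the sas command with the same -sysin
option.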
In-process scheduling provides the ability to schedule jobs from certain Web-based SAS applications
without using a separate scheduling server. With in-process scheduling, the scheduling functions run as a
process within the application. Although in-process scheduling is supported only for certain applications
(such as SAS Web Report Studio), it offers basic scheduling capabilities without incurring any additional
cost or requiring many installation or configuration tasks. Because an in-process scheduling server runs as
part of the application, this type of scheduling also eliminates the need for the application to authenticate
scheduled jobs to a separate server. However, the application must be running at the time the scheduled
job attempts to run.
Schedule Manager
The Schedule Manager plug-in for SAS Management
Console is a user interface that enables you to create
flows, which consist of one or more SAS jobs. Each job
within a flow can be triggered to run based on criteria
such as a date and time, the state of a file on the file
system, or the status of another job within the flow.
The available scheduling criteria depend on the type
of scheduling server used.
Schedule Manager is designed as a scheduler-neutral interface. When you create a flow, you specify
which scheduling server the flow is to be associated with. Schedule Manager converts the flow
information to the appropriate format and submits it to the scheduling server (the Platform Computing
server, an operating system scheduling server, or an in-process scheduling server).
Batch Servers
Batch servers provide the command needed to run the
programs that have been submitted for scheduling.
Several batch server types are supported, each
of which provides the command to run a scheduled
SAS job from a specific application in a specific
environment.
The command is included in the metadata definition
for each server.
The batch servers' commands are independent
of the type of scheduling server used.
Batch server metadata objects are components of the
SAS Application Server (for example, SASApp), and
can be created by using the Server Manager plug-in in
SAS Management Console.
Job Metadata
Job metadata becomes available to the Schedule
Manager when you use a SAS application such as
SAS Data Integration Studio to schedule a job.
The job metadata includes the following information:
the command that is to be used to execute the job
Flow Metadata
Flow metadata is created when you use Schedule
Manager to create a flow. The flow metadata includes
the following information:
the name of the scheduling server that is to execute
the jobs in the flow
the triggers and dependencies that are associated
with the jobs in the flow
Depending on the scheduling server that the user
specifies, Schedule Manager converts the flow metadata
to the appropriate format and submits it to the scheduling
server.
The components of Platform Suite for SAS include the following:
Process Manager Server: Controls the submission of jobs to Platform Load Sharing Facility (LSF) and
manages all dependencies among jobs.
Platform Flow Manager: Provides a visual representation of flows that have been created for a Process
Manager Server. These include flows that were created and scheduled in the SAS Management Console
Schedule Manager, as well as reports that have been scheduled through SAS Web Report Studio.
Platform Flow Manager provides information about each flow's status and associated dependencies. You
can view or update the status of jobs within a flow, and you can run or rerun a single job regardless of
whether the job failed or completed successfully.
Platform Calendar Editor: A scheduling client for a Process Manager Server. This client enables you to
create new calendar entries. You can use it to create custom versions of calendars that are used to create
time dependencies for jobs that are scheduled to run on the server.
Platform Load Sharing Facility (LSF): Dispatches the jobs that are submitted to it and returns the status
of each job; LSF also manages resource requirements and performs load balancing.
Platform Grid Management Services: Manages jobs that are scheduled in a grid environment. This
software collects information on the jobs that are running on the grid and the nodes to which work has
been distributed. It makes the information available to the Grid Manager plug-in for SAS Management
Console. You can use Grid Manager to view and manage the grid workload information.
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
a. Right-click DIFT Populate Order Fact Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
1) Type Orion Star Jobs as the value for the Name field.
7) Click .
8) Click .
d. Accept the default value for the Deployed Job Name field.
e. Verify that /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window.
g. Click .
The Orion Jobs folder shows that the DIFT Populate Order Fact Table job icon has
been decorated to signify scheduling.
Also, a new object appears in the same folder, DIFT_Populate_Order_Fact_Table.
l. Right-click on the job object, DIFT Populate Order Fact Table. There are more options
available:
4. Deploy DIFT Populate Old and Recent Orders Tables for scheduling.
a. Right-click DIFT Populate Old and Recent Orders Tables and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
a. Right-click DIFT Populate Customer Dimension Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
a. Right-click DIFT Populate Organization Dimension Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
a. Right-click DIFT Populate Product Dimension Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
a. Right-click DIFT Populate Time Dimension Table and select Scheduling ⇒ Deploy.
b. Verify that SASApp - SAS DATA Step Batch Server is selected as the value for the
Batch Server field.
c. Click next to the Deployment Directory field and select Orion Star Jobs.
d. Accept the default value for the Deployed Job Name field.
e. Verify /Data Mart Development/Orion Jobs is the value for the Location field.
f. Click to save the information and close the Deploy a job for scheduling window. An
information message appears.
9. Access Windows Explorer and verify the creation of the code files.
a. Select Start ⇒ All Programs ⇒ Windows Explorer.
b. Navigate to S:\Workshop\dift\OrionStarJobs.
c. Verify that .sas files were created for each of the deployed jobs.
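Each generated .sas file is a complete batch program, so a deployed job can be tested outside the
scheduler. The following is a hedged sketch of the kind of command line a SAS DATA Step Batch Server
stores for a job; the SAS installation path and log location are assumptions:

   "C:\Program Files\SAS\SASFoundation\9.2\sas.exe" ^
      -sysin "S:\Workshop\dift\OrionStarJobs\DIFT_Populate_Order_Fact_Table.sas" ^
      -log "S:\Workshop\dift\OrionStarJobs\DIFT_Populate_Order_Fact_Table.log" ^
      -batch -noterminal

When a flow's events are met, the scheduling server retrieves a command like this from the batch server
definition and uses it to run the job.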
c. Click to close the Connection Profile window and access the Log On window.
d. Type Ahmed as the value for the User ID field and Student1 as the value for the
Password field.
The deployed jobs are displayed and available to be part of the new flow.
e. Click next to the Scheduling Server field and select Platform Process Manager.
f. Click to move all items from the available list to the selected list.
g. Click .
b. Click to verify that all six deployed jobs are found in the visual flow editor.
d. Accept Completes successfully as the value for the Event type field.
e. Click .
g. Click .
j. Click .
k. Drag the gate node so that it is to the right of and in between the two jobs
DIFT_Populate_Organization_Dimension_Table and
DIFT_Populate_Time_Dimension_Table.
m. Accept Completes successfully as the value for the Event type field.
n. Click .
p. Accept Completes successfully as the value for the Event type field.
q. Click .
13. Verify the dependencies established using the visual flow editor by using the standard interface.
a. Locate the deployed job DIFT_Populate_Time_Dimension_Table.
b. Right-click and select Manage Dependencies.
c. Click .
f. Click .
d. Click .
e. Click next to the Trigger field and select Manually in Scheduling Server.
d. Click .
Objectives
Describe SAS Stored Processes.
List the applications that can be used to create
and execute stored processes.
Describe deployment of SAS Data Integration Studio
jobs as a SAS Stored Process.
[Diagram: a stored process consists of metadata registered in the metadata repository on the metadata
server plus SAS source code stored in a source code repository.]
[Diagram: a SAS Stored Process executes on a server and returns ODS output or a results package to
client applications such as SAS Add-In for Microsoft Office and the SAS Information Delivery Portal.]
c. Click to close the Connection Profile window and access the Log On window.
d. Type Bruno as the value for the User ID field and Student1 as the value for the
Password field.
c. Click .
d. Click next to the SAS server field and select SASApp - Logical Stored Process Server.
e. Click next to the Source code repository field. The Manage Source Code
Repositories window is displayed.
g. Click Package.
h. Click .
A new metadata object, a stored process, now should appear in the Extract and Summary folder.
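The deployed source code is an ordinary SAS program wrapped for execution on a stored process server.
A minimal sketch of the typical pattern follows; the PROC PRINT step is a hypothetical stand-in for the
generated job code:

   *ProcessBody;
   %stpbegin;    /* initialize the ODS destinations for stored process output */
   proc print data=work.extract_summary;   /* generated job code goes here */
   run;
   %stpend;      /* finalize and deliver the ODS output */

%STPBEGIN and %STPEND are the standard stored process macros that open and close ODS
destinations for package or streaming results.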
5. Execute the SAS Stored Process using SAS Add-In for Microsoft Office.
a. Select Start ⇒ All Programs ⇒ Microsoft Office ⇒ Microsoft Office Excel 2007.
b. Click the SAS tab.
c. Click .
g. Click .
6. When finished viewing, select the Office button and then Exit Excel (do not save any changes).
Chapter 13 Learning More
Objectives
Identify areas of support that SAS offers.
List additional resources.
Education
Comprehensive training to deliver greater value to your
organization
http://support.sas.com/training/
SAS Publishing
SAS offers a complete selection of publications to help
customers use SAS software to its fullest potential:
http://support.sas.com/publishing/
Certification
Computer-based certification exams,
typically 60-70 questions
and 2-3 hours in length
Preparation materials and
practice exams available
Worldwide directory of
SAS Certified Professionals
http://support.sas.com/certify/
Support
SAS provides a variety of self-help and assisted-help
resources.
http://support.sas.com/techsup/
User Groups
SAS supports many local, regional, international,
and special-interest SAS user groups.
http://support.sas.com/usergroups/